Company profile: Complete Genomics Inc.
Reid, Clifford
2011-02-01
Complete Genomics Inc. is a life sciences company that focuses on complete human genome sequencing. It is taking a completely different approach to DNA sequencing than other companies in the industry. Rather than building a general-purpose platform for sequencing all organisms and all applications, it has focused on a single application - complete human genome sequencing. The company's Complete Genomics Analysis Platform (CGA™ Platform) comprises an integrated package of biochemistry, instrumentation and software that sequences human genomes at the highest quality, lowest cost and largest scale available. Complete Genomics offers a turnkey service that enables customers to outsource their human genome sequencing to the company's genome sequencing center in Mountain View, CA, USA. Customers send in their DNA samples, the company does all the library preparation, DNA sequencing, assembly and variant analysis, and customers receive research-ready data that they can use for biological discovery.
Complete Genome Sequence of Porcine Parvovirus 2 Recovered from Swine Sera
Kluge, M.; Franco, A. C.; Giongo, A.; Valdez, F. P.; Saddi, T. M.; Brito, W. M. E. D.; Roehe, P. M.
2016-01-01
A complete genomic sequence of porcine parvovirus 2 (PPV-2) was detected by viral metagenome analysis on swine sera. A phylogenetic analysis of this genome reveals that it is highly similar to previously reported North American PPV-2 genomes. The complete PPV-2 sequence is 5,426 nucleotides long. PMID:26823583
Complete Genome Sequence of Porcine Parvovirus 2 Recovered from Swine Sera.
Campos, F S; Kluge, M; Franco, A C; Giongo, A; Valdez, F P; Saddi, T M; Brito, W M E D; Roehe, P M
2016-01-28
A complete genomic sequence of porcine parvovirus 2 (PPV-2) was detected by viral metagenome analysis on swine sera. A phylogenetic analysis of this genome reveals that it is highly similar to previously reported North American PPV-2 genomes. The complete PPV-2 sequence is 5,426 nucleotides long. Copyright © 2016 Campos et al.
The complete sequence of Cymbidium mosaic virus from Vanilla fragrans in Hainan, China.
He, Zhen; Jiang, Dongmei; Liu, Aiqin; Sang, Liwei; Li, Wenfeng; Li, Shifang
2011-06-01
The complete nucleotide sequence of Cymbidium mosaic virus (CymMV) isolated from vanilla in Hainan province, China was determined for the first time. It comprised 6,224 nucleotides; sequence analysis suggested that the isolate was a member of the genus Potexvirus, and its sequence shared 86.67-96.61% identity with previously reported sequences. Phylogenetic analysis suggested that CymMV from Vanilla fragrans clustered into subgroup A and that the isolates in this subgroup displayed little regional difference.
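Identity percentages such as the 86.67-96.61% range reported above are typically computed over aligned sequences. The following is a minimal sketch of one common definition (matching columns divided by compared columns), not the method used by the authors; the function name `percent_identity` and the toy sequences are assumptions made for illustration.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences carry a gap ('-') are skipped; a gap
    opposite a base counts as a mismatch. This is one of several
    definitions in use, chosen here only for illustration.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # shared gap column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical toy alignment: 9 of 10 columns match.
print(f"{percent_identity('ACGTACGTAC', 'ACGTTCGTAC'):.2f}")
```

Published identity values depend on the alignment program and on how gaps are treated, so results from this sketch are not directly comparable to the figures in the abstract.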
Qin, Yanhong; Wang, Li; Zhang, Zhenchen; Qiao, Qi; Zhang, Desheng; Tian, Yuting; Wang, Shuang; Wang, Yongjiang; Yan, Zhaoling
2014-01-01
Background: Sweet potato chlorotic stunt virus (SPCSV; family Closteroviridae, genus Crinivirus) features a large bipartite, single-stranded, positive-sense RNA genome. To date, only three complete genomic sequences of SPCSV can be accessed through GenBank. SPCSV was first detected in China in 2011, but only partial genomic sequences have been determined in the country. No report on the complete genomic sequence and genome structure of Chinese SPCSV isolates or the genetic relation between isolates from China and other countries is available. Methodology/Principal Findings: The complete genomic sequences of five isolates from different areas in China were characterized. This study is the first to report the complete genome sequences of SPCSV from whitefly vectors. Genome structure analysis showed that isolates of WA and EA strains from China have the same coding protein as isolates Can181-9 and m2-47, respectively. Twenty cp genes and four RNA1 partial segments were sequenced and analyzed, and the nucleotide identities of complete genomic, cp, and RNA1 partial sequences were determined. Results indicated high conservation among strains and significant differences between WA and EA strains. Genetic analysis demonstrated that, except for isolates from Guangdong Province, SPCSVs from other areas belong to the WA strain. Genome organization analysis showed that the isolates in this study lack the p22 gene. Conclusions/Significance: We presented the complete genome sequences of SPCSV in China. Comparison of nucleotide identities and genome structures between these isolates and previously reported isolates showed slight differences. The nucleotide identities of different SPCSV isolates showed high conservation among strains and significant differences between strains. All nine isolates in this study lacked the p22 gene. WA strains were more extensively distributed than EA strains in China.
These data provide important insights into the molecular variation and genomic structure of SPCSV in China as well as genetic relationships among isolates from China and other countries. PMID:25170926
Complete genome sequence of the plant pathogen Erwinia amylovora strain ATCC 49946
USDA-ARS's Scientific Manuscript database
Erwinia amylovora causes the economically important disease fire blight that affects rosaceous plants, especially pear and apple. Here we report the complete genome sequence and annotation of strain ATCC 49946. The analysis of the sequence and its comparison with sequenced genomes of closely related...
Complete Genome Sequence of a Putative Densovirus of the Asian Citrus Psyllid, Diaphorina citri.
Nigg, Jared C; Nouri, Shahideh; Falk, Bryce W
2016-07-28
Here, we report the complete genome sequence of a putative densovirus of the Asian citrus psyllid, Diaphorina citri. Diaphorina citri densovirus (DcDNV) was originally identified through metagenomics, and here, we obtained the complete nucleotide sequence using PCR-based approaches. Phylogenetic analysis places DcDNV between viruses of the Ambidensovirus and Iteradensovirus genera. Copyright © 2016 Nigg et al.
Abayli, Hasan; Tonbak, Sukru; Azkur, Ahmet Kursat; Bulut, Hakan
2017-10-01
Relatively high prevalence and mortality rates of bovine ephemeral fever (BEF) have been reported in recent epidemics in some countries, including Turkey, when compared with previous outbreaks. A limited number of complete genome sequences of BEF virus (BEFV) are available in the GenBank Database. In this study, the complete genome of highly pathogenic BEFV isolated during an outbreak in Turkey in 2012 was analyzed for genetic characterization. The complete genome of the Turkish BEFV isolate was amplified by reverse transcription-polymerase chain reaction (RT-PCR) and sequenced. It was found that the complete genome of the Turkish BEFV isolate was 14,901 nt in length. The complete genome sequence obtained from the study showed 91-92% identity at nucleotide level to Australian (BB7721) and Chinese (Bovine/China/Henan1/2012) BEFV isolates. Phylogenetic analysis of the glycoprotein gene of the Turkish BEFV isolate also showed that Turkish isolates were closely related to Israeli isolates. Because of the limited number of complete BEFV genome sequences, the results from this study will be useful for understanding the global molecular epidemiology and geodynamics of BEF.
Huang, Youhua; Huang, Xiaohong; Liu, Hong; Gong, Jie; Ouyang, Zhengliang; Cui, Huachun; Cao, Jianhao; Zhao, Yingtao; Wang, Xiujie; Jiang, Yulin; Qin, Qiwei
2009-01-01
Background: Soft-shelled turtle iridovirus (STIV) is the causative agent of severe systemic diseases in cultured soft-shelled turtles (Trionyx sinensis). To our knowledge, the only molecular information available on STIV mainly concerns the highly conserved STIV major capsid protein. The complete sequence of the STIV genome is not yet available. Therefore, determining the genome sequence of STIV and providing a detailed bioinformatic analysis of its genome content and evolutionary status will facilitate further understanding of the taxonomic elements of STIV and the molecular mechanisms of reptile iridovirus pathogenesis. Results: We determined the complete nucleotide sequence of the STIV genome using 454 Life Science sequencing technology. The STIV genome is 105,890 bp in length with a base composition of 55.1% G+C. Computer-assisted analysis revealed that the STIV genome contains 105 potential open reading frames (ORFs), encoding polypeptides ranging from 40 to 1,294 amino acids, and 20 microRNA candidates. Among the putative proteins, 20 share homology with the ancestral proteins of the nuclear and cytoplasmic large DNA viruses (NCLDVs). Comparative genomic analysis showed that STIV has the highest degree of sequence conservation and a colinear arrangement of genes with frog virus 3 (FV3), followed by Tiger frog virus (TFV), Ambystoma tigrinum virus (ATV), Singapore grouper iridovirus (SGIV), Grouper iridovirus (GIV) and other iridovirus isolates. Phylogenetic analysis based on conserved core genes and the complete genome sequence of STIV with other virus genomes was performed. Moreover, analysis of the gene gain-and-loss events in the family Iridoviridae suggested that the genes encoded by iridoviruses have evolved to favor adaptation to different natural host species. Conclusion: This study has provided the complete genome sequence of STIV.
Phylogenetic analysis suggested that STIV and FV3 are strains of the same viral species belonging to the Ranavirus genus in the Iridoviridae family. Given virus-host co-evolution and the phylogenetic relationship among vertebrates from fish to reptiles, we propose that iridovirus might transmit between reptiles and amphibians and that STIV and FV3 are strains of the same viral species in the Ranavirus genus. PMID:19439104
Pfeiffer, Friedhelm; Zamora-Lagos, Maria-Antonia; Blettinger, Martin; Yeroslaviz, Assa; Dahl, Andreas; Gruber, Stephan; Habermann, Bianca H
2018-01-05
Due to the predominant usage of short-read sequencing to date, most bacterial genome sequences reported in the last years remain at the draft level. This precludes certain types of analyses, such as the in-depth analysis of genome plasticity. Here we report the finalized genome sequence of the environmental strain Aeromonas salmonicida subsp. pectinolytica 34mel, for which only a draft genome with 253 contigs is currently available. Successful completion of the transposon-rich genome critically depended on the PacBio long read sequencing technology. Using finalized genome sequences of A. salmonicida subsp. pectinolytica and other Aeromonads, we report the detailed analysis of the transposon composition of these bacterial species. Mobilome evolution is exemplified by a complex transposon, which has shifted from pathogenicity-related to environmental-related gene content in A. salmonicida subsp. pectinolytica 34mel. Obtaining the complete, circular genome of A. salmonicida subsp. pectinolytica allowed us to perform an in-depth analysis of its mobilome. We demonstrate the mobilome-dependent evolution of this strain's genetic profile from pathogenic to environmental.
Miyoshi-Akiyama, Tohru; Satou, Kazuhito; Kato, Masako; Shiroma, Akino; Matsumura, Kazunori; Tamotsu, Hinako; Iwai, Hiroki; Teruya, Kuniko; Funatogawa, Keiji; Hirano, Takashi; Kirikae, Teruo
2015-01-01
We report the completely annotated genome sequence of Mycobacterium tuberculosis (Zopf) Lehmann and Neumann (ATCC35812) (Kurono), which is used for virulence and/or immunization studies. The complete genome sequence of M. tuberculosis Kurono was determined, with a length of 4,415,078 bp and a G+C content of 65.60%. The chromosome was shown to contain a total of 4,340 protein-coding genes, 53 tRNA genes, one transfer messenger RNA for all amino acids, and 1 rrn operon. Lineage analysis based on large sequence polymorphisms indicated that M. tuberculosis Kurono belongs to the Euro-American lineage (lineage 4). Phylogenetic analysis using whole genome sequences of M. tuberculosis Kurono in addition to 22 M. tuberculosis complex strains indicated that H37Rv is the closest relative of Kurono. These findings provide a basis for research using M. tuberculosis Kurono, especially in animal models. Copyright © 2014 Elsevier Ltd. All rights reserved.
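A G+C content such as the 65.60% quoted above is the fraction of G and C bases among the unambiguous bases of a sequence. The sketch below shows that arithmetic on a short string; the function name `gc_content` and the toy input are assumptions for illustration, and the sketch does not reproduce the authors' pipeline.

```python
def gc_content(sequence: str) -> float:
    """Return G+C content as a percentage of unambiguous bases (A, C, G, T).

    Ambiguity codes such as N are excluded from the denominator; this is
    a simplifying assumption, and other tools may count them differently.
    """
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Hypothetical toy sequence: 4 of 6 unambiguous bases are G or C.
print(f"{gc_content('GGCCNNAT'):.2f}")
```

For a real genome the same function can be applied to the concatenated sequence read from a FASTA file.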
Molecular variability analysis of five new complete cacao swollen shoot virus genomic sequences.
Muller, E; Sackey, S
2005-01-01
Cacao swollen shoot virus (CSSV), a member of the family Caulimoviridae, genus Badnavirus, occurs in all the main cacao-growing areas of West Africa. We amplified, cloned and sequenced complete genomes of five new isolates, two originating from Togo and three originating from Ghana. The genomes of these five newly sequenced isolates all contain the five putative open reading frames I, II, III, X and Y described for the first sequenced CSSV isolate, Agou1, originating from Togo. Their genomes have been aligned with the genome of Agou1. The nucleotide and amino acid sequence identities between isolates have been calculated and a phylogenetic analysis has been made including other pararetroviruses. Maximum nucleotide sequence variability between complete genomes of CSSV isolates was 29.4%. Geographical differentiation between isolates appears more important than differentiation between mild and severe isolates. ORF X differs greatly in size and sequence between the Togolese isolates Nyongbo2 and Agou1 and the four other isolates; its functional role is therefore clearly questionable.
Genomic Diversity and Evolution of the Lyssaviruses
Delmas, Olivier; Holmes, Edward C.; Talbi, Chiraz; Larrous, Florence; Dacheux, Laurent; Bouchier, Christiane; Bourhy, Hervé
2008-01-01
Lyssaviruses are RNA viruses with single-strand, negative-sense genomes responsible for rabies-like diseases in mammals. To date, genomic and evolutionary studies have most often utilized partial genome sequences, particularly of the nucleoprotein and glycoprotein genes, with little consideration of genome-scale evolution. Herein, we report the first genomic and evolutionary analysis using complete genome sequences of all recognised lyssavirus genotypes, including 14 new complete genomes of field isolates from 6 genotypes and one genotype that is completely sequenced for the first time. In doing so we significantly increase the extent of genome sequence data available for these important viruses. Our analysis of these genome sequence data reveals that all lyssaviruses have the same genomic organization. A phylogenetic analysis reveals strong geographical structuring, with the greatest genetic diversity in Africa, and an independent origin for the two known genotypes that infect European bats. We also suggest that multiple genotypes may exist within the diversity of viruses currently classified as ‘Lagos Bat’. In sum, we show that rigorous phylogenetic techniques based on full length genome sequence provide the best discriminatory power for genotype classification within the lyssaviruses. PMID:18446239
Complete Genome Sequence of the Probiotic Strain Lactobacillus salivarius LPM01
Codoñer, Francisco M.; Martinez-Blanch, Juan F.; Acevedo-Piérart, Marcelo; Ormeño, M. Loreto; Ramón, Daniel
2016-01-01
Lactobacillus salivarius LPM01 (DSM 22150) is a probiotic strain able to improve health status in immunocompromised people. Here, we report its complete genome sequence deciphered by PacBio single-molecule real-time (SMRT) technology. Analysis of the sequence may provide insights into its functional activity and safety assessment. PMID:27881545
Behera, Bijay Kumar; Kumari, Kavita; Baisvar, Vishwamitra Singh; Rout, Ajaya Kumar; Pakrashi, Sudip; Paria, Prasenjet; Jena, J K
2017-01-01
In the present study, the complete mitochondrial genome sequence of Labeo gonius was determined using a PGM sequencer (Ion Torrent). The complete mitogenome of L. gonius, obtained by de novo assembly of genomic reads using the Torrent Mapping Alignment Program (TMAP), is 16,614 bp in length. The mitogenome of L. gonius comprises 13 protein-coding genes, 22 tRNAs, 2 rRNA genes, and a D-loop as the control region, with gene order and organization similar to those of most other fish mitogenomes in the NCBI databases. The mitogenome in the present study has 99% similarity to the previously reported complete mitogenome sequence of Labeo fimbriatus. The phylogenetic analysis of Cypriniformes showed that their mitogenomes are closely related to each other. The complete mitogenome sequence of L. gonius will be helpful in understanding the population genetics, phylogenetics, and evolution of Indian carps.
Xiao, Sa; Paldurai, Anandan; Nayak, Baibaswata; Samuel, Arthur; Bharoto, Eny E.; Prajitno, Teguh Y.; Collins, Peter L.
2012-01-01
Eight highly virulent Newcastle disease virus (NDV) strains were isolated from vaccinated commercial chickens in Indonesia during outbreaks in 2009 and 2010. The complete genome sequences of two NDV strains and the sequences of the surface protein genes (F and HN) of six other strains were determined. Phylogenetic analysis classified them into two new subgroups of genotype VII in the class II cluster that were genetically distinct from vaccine strains. This is the first report of complete genome sequences of NDV strains isolated from chickens in Indonesia. PMID:22532534
Saw, Jimmy H. W.; Yuryev, Anton; Kanbe, Masaomi; Hou, Shaobin; Young, Aaron G.; Aizawa, Shin-Ichi
2012-01-01
Saprospira grandis is a coastal marine bacterium that can capture and prey upon other marine bacteria using a mechanism known as ‘ixotrophy’. Here, we present the complete genome sequence of Saprospira grandis str. Lewin isolated from La Jolla beach in San Diego, California. The complete genome sequence comprises a chromosome of 4.35 Mbp and a plasmid of 54.9 Kbp. Genome analysis revealed incomplete pathways for the biosynthesis of nine essential amino acids but presence of a large number of peptidases. The genome encodes multiple copies of sensor globin-coupled rsbR genes thought to be essential for stress response and the presence of such sensor globins in Bacteroidetes is unprecedented. A total of 429 spacer sequences within the three CRISPR repeat regions were identified in the genome and this number is the largest among all the Bacteroidetes sequenced to date. PMID:22675601
Complete mitochondrial genome of the fennec fox (Vulpes zerda).
Yang, Xiufeng; Zhao, Chao; Zhang, Honghai; Zhang, Jin; Chen, Lei; Sha, Weilai; Liu, Guangshuai
2016-01-01
In this study, the complete mitochondrial genome of the fennec fox (Vulpes zerda) was sequenced using blood samples obtained from a female individual in Shanghai Wildlife Park. Sequence analysis showed that the content of T (26.7%) in the total composition was no more than that of C (27.2%), which differs from most Canidae individuals sequenced previously.
Complete Genome Sequence of the Probiotic Strain Lactobacillus salivarius LPM01.
Chenoll, Empar; Codoñer, Francisco M; Martinez-Blanch, Juan F; Acevedo-Piérart, Marcelo; Ormeño, M Loreto; Ramón, Daniel; Genovés, Salvador
2016-11-23
Lactobacillus salivarius LPM01 (DSM 22150) is a probiotic strain able to improve health status in immunocompromised people. Here, we report its complete genome sequence deciphered by PacBio single-molecule real-time (SMRT) technology. Analysis of the sequence may provide insights into its functional activity and safety assessment. Copyright © 2016 Chenoll et al.
Khamrin, Pattara; Okitsu, Shoko; Ushijima, Hiroshi; Maneekarn, Niwat
2013-07-01
Epidemiological surveillance of human bocavirus (HBoV) was conducted on fecal specimens collected from hospitalized children with diarrhea in Chiang Mai, Thailand in 2011. By partial sequence analysis of the VP1 gene, an unusual strain of HBoV (CMH-S011-11) was initially identified as HBoV4. The complete genome sequence of CMH-S011-11 was determined and analyzed further to clarify whether it was a recombinant strain or a new HBoV variant. Analysis of the complete genome sequence revealed that the coding sequence from NS1 through NP1 to VP1/VP2 was 4795 nucleotides long. Interestingly, the nucleotide sequence of the NS1 gene of CMH-S011-11 was most closely related to the HBoV2 reference strains detected in Pakistan, which contradicted the initial genotyping result for the partial VP1 region in the previous study. In addition, comparison of the NP1 nucleotide sequence of CMH-S011-11 with those of other HBoV1-4 reference strains also revealed a high level of sequence identity with HBoV2. On the other hand, the nucleotide sequence of the VP1/VP2 gene of CMH-S011-11 was most closely related to those of HBoV4 reference strains detected in Nigeria. The overall full-length sequence analysis revealed that CMH-S011-11 grouped within the HBoV4 species, but was located in a separate branch from other HBoV4 prototype strains. Recombination analysis revealed that CMH-S011-11 was the result of recombination between HBoV2 and HBoV4 strains, with the break point located near the start codon of VP2. Copyright © 2013 Elsevier B.V. All rights reserved.
2012-01-01
Background: The complete sequences of chloroplast genomes provide a wealth of information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, and powerful computational tools for annotating the genome sequences are urgently needed. Results: We have developed a web server, CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar, full-length protein, cDNA and rRNA sequences by integrating results from the Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes and inverted repeats (IR) are identified using tRNAscan, ARAGORN and vmatch, respectively. Third, it calculates the summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extraction of protein and mRNA sequences for a given list of genes and species. The annotation results in GFF3 format can be edited using any compatible annotation editing tool. The edited annotations can then be uploaded to CPGAVAS for repeated update and re-analysis. Using known chloroplast genome sequences as a test set, we show that CPGAVAS performs comparably to another application, DOGMA, while having several superior functionalities. Conclusions: CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensable tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas. PMID:23256920
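The abstract above mentions annotation results in GFF3 format and summary statistics for the annotated genome. As a rough illustration of how such a file might be summarized, the sketch below counts features by type from GFF3 text, relying only on the standard nine tab-separated columns. It does not use or reproduce CPGAVAS itself; the function name `count_feature_types` and the sample records are hypothetical.

```python
from collections import Counter


def count_feature_types(gff3_text: str) -> Counter:
    """Count GFF3 records by feature type (column 3).

    Blank lines, comment/directive lines starting with '#', and lines
    that do not have exactly nine tab-separated columns are skipped.
    """
    counts = Counter()
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != 9:
            continue
        counts[columns[2]] += 1
    return counts


# Hypothetical records, invented for illustration only.
SAMPLE_GFF3 = "\n".join([
    "##gff-version 3",
    "chloroplast\texample\tgene\t1\t100\t.\t+\t.\tID=gene1",
    "chloroplast\texample\tCDS\t1\t100\t.\t+\t0\tID=cds1;Parent=gene1",
    "chloroplast\texample\tgene\t200\t270\t.\t-\t.\tID=gene2",
    "chloroplast\texample\ttRNA\t200\t270\t.\t-\t.\tID=trna1;Parent=gene2",
])

for feature_type, n in sorted(count_feature_types(SAMPLE_GFF3).items()):
    print(feature_type, n)
```

Counting by feature type is only one possible summary; length distributions or per-strand counts could be derived from the same columns.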
Tu, Jianfeng; Yang, Ying; Yang, Fuhe; Xing, Xiumei
2017-03-01
Peking duck (Anas platyrhynchos) and Muscovy duck (Cairina moschata) are two types of domestic ducks and the most popular meat breeds in the world. In this study, we sequenced and compared the complete mitochondrial genomes of both breeds. To investigate the phylogeny of both breeds within Anseriformes, the concatenated sequences of 12 protein-coding genes were used for phylogenetic analysis. The result was consistent with most of the previous morphological and molecular studies. Our complete mitochondrial genome sequences of both breeds will be useful for phylogenetics and will be available as basic data for breeding and genetics.
Chenoll, Empar; Codoñer, Francisco M; Martinez-Blanch, Juan F; Ramón, Daniel; Genovés, Salvador; Menabrito, Marco
2016-04-21
Lactobacillus rhamnosus BPL5 (CECT 8800) is a probiotic strain suitable for the treatment of bacterial vaginosis. Here, we report its complete genome sequence deciphered by PacBio single-molecule real-time (SMRT) technology. Analysis of the sequence may provide insight into its functional activity. Copyright © 2016 Chenoll et al.
Sequencing and phylogenetic analysis of tobacco virus 2, a polerovirus from Nicotiana tabacum.
Zhou, Benguo; Wang, Fang; Zhang, Xuesong; Zhang, Lina; Lin, Huafeng
2017-07-01
The complete genome sequence of a new virus, provisionally named tobacco virus 2 (TV2), was determined and identified from leaves of tobacco (Nicotiana tabacum) exhibiting leaf mosaic, yellowing, and deformity, in Anhui Province, China. The genome sequence of TV2 comprises 5,979 nucleotides, with 87% nucleotide sequence identity to potato leafroll virus (PLRV). Its genome organization is similar to that of PLRV, containing six open reading frames (ORFs) that potentially encode proteins with putative functions in cell-to-cell movement and suppression of RNA silencing. Phylogenetic analysis of the nucleotide sequence placed TV2 alongside members of the genus Polerovirus in the family Luteoviridae. To the best of our knowledge, this study is the first report of a complete genome sequence of a new polerovirus identified in tobacco.
Li, Wen Hui; Jia, Wan Zhong; Qu, Zi Gang; Xie, Zhi Zhou; Luo, Jian Xun; Yin, Hong; Sun, Xiao Lin; Blaga, Radu; Fu, Bao Quan
2013-04-01
A total of 16 Taenia multiceps isolates collected from naturally infected sheep or goats in Gansu Province, China were characterized by sequences of mitochondrial cytochrome c oxidase subunit 1 (cox1) gene. The complete cox1 gene was amplified for individual T. multiceps isolates by PCR, ligated to pMD18T vector, and sequenced. Sequence analysis indicated that out of 16 T. multiceps isolates 10 unique cox1 gene sequences of 1,623 bp were obtained with sequence variation of 0.12-0.68%. The results showed that the cox1 gene sequences were highly conserved among the examined T. multiceps isolates. However, they were quite different from those of the other Taenia species. Phylogenetic analysis based on complete cox1 gene sequences revealed that T. multiceps isolates were composed of 3 genotypes and distinguished from the other Taenia species.
Li, Wen Hui; Jia, Wan Zhong; Qu, Zi Gang; Xie, Zhi Zhou; Luo, Jian Xun; Yin, Hong; Sun, Xiao Lin; Blaga, Radu
2013-01-01
A total of 16 Taenia multiceps isolates collected from naturally infected sheep or goats in Gansu Province, China were characterized by sequences of mitochondrial cytochrome c oxidase subunit 1 (cox1) gene. The complete cox1 gene was amplified for individual T. multiceps isolates by PCR, ligated to pMD18T vector, and sequenced. Sequence analysis indicated that out of 16 T. multiceps isolates 10 unique cox1 gene sequences of 1,623 bp were obtained with sequence variation of 0.12-0.68%. The results showed that the cox1 gene sequences were highly conserved among the examined T. multiceps isolates. However, they were quite different from those of the other Taenia species. Phylogenetic analysis based on complete cox1 gene sequences revealed that T. multiceps isolates were composed of 3 genotypes and distinguished from the other Taenia species. PMID:23710087
Analysis of the complete genome of subgroup A' hepatitis B virus isolates from South Africa.
Kramvis, Anna; Weitzmann, Louise; Owiredu, William K B A; Kew, Michael C
2002-04-01
A phylogenetic analysis is presented of six complete and seven pre-S1/S2/S gene sequences of hepatitis B virus (HBV) isolates from South Africa. Five of the full-length sequences and all of the pre-S2/S sequences have been previously reported. Four of the six complete genomes and three of the five incomplete sequences clustered with subgroup A', a unique segment of genotype A of HBV previously identified in 60% of South African isolates using analysis of the pre-S2/S region alone. This separation was also evident when the polymerase open reading frame was analysed, but not on analysis of either the X or pre-core/core genes. Amino acids were identified in the pre-S1 and polymerase regions specific to subgroup A'. In common with genotype D, 10 of 11 genotype A South African isolates had an 11 amino acid deletion in the amino end of the pre-S1 region. This deletion is also found in hepadnaviruses from non-human primates.
Ciotlos, Serban; Mao, Qing; Zhang, Rebecca Yu; Li, Zhenyu; Chin, Robert; Gulbahce, Natali; Liu, Sophie Jia; Drmanac, Radoje; Peters, Brock A
2016-01-01
The cell line BT-474 is a popular cell line for studying the biology of cancer and developing novel drugs. However, there is no complete, published genome sequence for this highly utilized scientific resource. In this study we sought to provide a comprehensive and useful data set for the scientific community by generating a whole genome sequence for BT-474. Five μg of genomic DNA, isolated from an early passage of the BT-474 cell line, was used to generate a whole genome sequence (114X coverage) using Complete Genomics' standard sequencing process. To provide additional variant phasing and structural variation data we also processed and analyzed two separate libraries of 5 and 6 individual cells to depths of 99X and 87X, respectively, using Complete Genomics' Long Fragment Read (LFR) technology. BT-474 is a highly aneuploid cell line with an extremely complex genome sequence. This ~300X total coverage genome sequence provides a more complete understanding of this highly utilized cell line at the genomic level.
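The "~300X total coverage" figure above is the sum of the three per-library depths given in the abstract (114X, 99X and 87X). A minimal sketch of that arithmetic follows; the dictionary keys and the function name `total_coverage` are labels invented for illustration.

```python
def total_coverage(depths: dict) -> int:
    """Sum per-library sequencing depths (in X) into a combined depth."""
    return sum(depths.values())


# Depths as stated in the abstract; the key names are illustrative labels.
LIBRARY_DEPTHS = {
    "standard whole-genome library": 114,
    "LFR library, 5 cells": 99,
    "LFR library, 6 cells": 87,
}

print(f"{total_coverage(LIBRARY_DEPTHS)}X")
```

Summing depths treats all libraries as covering the same genome uniformly, which is a simplification for a highly aneuploid cell line.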
Whole-genome random sequencing and assembly of Haemophilus influenzae Rd
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fleischmann, R.D.; Adams, M.D.; White, O.
1995-07-28
An approach for genome analysis based on sequencing and assembly of unselected pieces of DNA from the whole chromosome has been applied to obtain the complete nucleotide sequence (1,830,137 base pairs) of the genome from the bacterium Haemophilus influenzae Rd. This approach eliminates the need for initial mapping efforts and is therefore applicable to the vast array of microbial species for which genome maps are unavailable. The H. influenzae Rd genome sequence (Genome Sequence DataBase accession number L42023) represents the only complete genome sequence from a free-living organism. 46 refs., 4 figs., 4 tabs.
Trucco, Verónica; de Breuil, Soledad; Bejerman, Nicolás; Lenardon, Sergio; Giolitti, Fabián
2014-06-01
The complete nucleotide sequence of an Alfalfa mosaic virus (AMV) isolate infecting alfalfa (Medicago sativa L.) in Argentina, AMV-Arg, was determined. The virus genome has the typical organization described for AMV, and comprises 3,643, 2,593, and 2,038 nucleotides for RNA1, 2 and 3, respectively. The whole genome sequence and each encoding region were compared with those of four other isolates that have been completely sequenced, from China, Italy, Spain and the USA. The nucleotide identity percentages ranged from 95.9 to 99.1 % for the three RNAs and from 93.7 to 99 % for the protein 1 (P1), protein 2 (P2), movement protein and coat protein (CP) encoding regions, whereas the amino acid identity percentages of these proteins ranged from 93.4 to 99.5 %, the lowest value corresponding to P2. CP sequences of AMV-Arg were compared with those of 25 other available isolates, and a phylogenetic analysis based on the CP gene was carried out. The highest percentage of nucleotide sequence identity of the CP gene was 98.3 % with a Chinese isolate, and the highest at the amino acid level was 98.6 % with four isolates, two from Italy, one from Brazil and the remaining one from China. The phylogenetic analysis showed that AMV-Arg is closely related to subgroup I of AMV isolates. To our knowledge, this is the first report of a complete nucleotide sequence of AMV from South America and the first worldwide report of a complete nucleotide sequence of AMV isolated from alfalfa as a natural host.
Nakano, Shogo; Asano, Yasuhisa
2015-02-03
Development of software and methods for design of complete sequences of functional proteins could contribute to studies of protein engineering and protein evolution. To this end, we developed the INTMSAlign software, and used it to design functional proteins and evaluate their usefulness. The software could assign both consensus and correlation residues of target proteins. We generated three protein sequences with S-selective hydroxynitrile lyase (S-HNL) activity, which we call designed S-HNLs; these proteins folded as efficiently as the native S-HNL. Sequence and biochemical analysis of the designed S-HNLs suggested that accumulation of neutral mutations occurs during the process of S-HNLs evolution from a low-activity form to a high-activity (native) form. Taken together, our results demonstrate that our software and the associated methods could be applied not only to design of complete sequences, but also to predictions of protein evolution, especially within families such as esterases and S-HNLs.
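The abstract above refers to assigning consensus residues of target proteins. The sketch below illustrates the general idea of a per-column majority consensus over a multiple sequence alignment. It is not the INTMSAlign algorithm, it does not model the "correlation residues" the abstract mentions, and the function name `column_consensus` and the toy alignment are assumptions for illustration.

```python
from collections import Counter
from typing import Sequence


def column_consensus(aligned: Sequence[str]) -> str:
    """Return the most frequent non-gap residue in each alignment column.

    All sequences must have the same length. A column made only of gaps
    yields '-'. Ties are resolved by first appearance in the column,
    which is an arbitrary choice made for this sketch.
    """
    if not aligned:
        raise ValueError("alignment is empty")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same aligned length")
    consensus = []
    for column in zip(*aligned):
        residues = [r for r in column if r != "-"]
        if not residues:
            consensus.append("-")
            continue
        consensus.append(Counter(residues).most_common(1)[0][0])
    return "".join(consensus)


# Hypothetical toy alignment of three short protein fragments.
print(column_consensus(["MKTA", "MKSA", "MRTA"]))
```

A majority vote ignores residue co-variation between columns, which is one reason a plain consensus can differ from sequences designed with correlation information.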
NASA Astrophysics Data System (ADS)
Nakano, Shogo; Asano, Yasuhisa
2015-02-01
Development of software and methods for design of complete sequences of functional proteins could contribute to studies of protein engineering and protein evolution. To this end, we developed the INTMSAlign software, and used it to design functional proteins and evaluate their usefulness. The software could assign both consensus and correlation residues of target proteins. We generated three protein sequences with S-selective hydroxynitrile lyase (S-HNL) activity, which we call designed S-HNLs; these proteins folded as efficiently as the native S-HNL. Sequence and biochemical analysis of the designed S-HNLs suggested that accumulation of neutral mutations occurs during the process of S-HNLs evolution from a low-activity form to a high-activity (native) form. Taken together, our results demonstrate that our software and the associated methods could be applied not only to design of complete sequences, but also to predictions of protein evolution, especially within families such as esterases and S-HNLs.
Genetic and phylogenetic analysis of a novel parvovirus isolated from chickens in Guangxi, China.
Feng, Bin; Xie, Zhixun; Deng, Xianwen; Xie, Liji; Xie, Zhiqin; Huang, Li; Fan, Qin; Luo, Sisi; Huang, Jiaoling; Zhang, Yanfang; Zeng, Tingting; Wang, Sheng; Wang, Leyi
2016-11-01
A previously unidentified chicken parvovirus (ChPV) strain, associated with runting-stunting syndrome (RSS), is now endemic among chickens in China. To explore the genetic diversity of ChPV strains, we determined the first complete genome sequence of a novel ChPV isolate (GX-CH-PV-7) identified in chickens in Guangxi, China, which showed moderate genome sequence similarity to reference strains. Analysis showed that the viral genome sequence is 86.4 %-93.9 % identical to those of other ChPVs. Genetic and phylogenetic analyses showed that this newly emergent GX-CH-PV-7 is closely related to Gallus gallus enteric parvovirus isolate ChPV 798 from the USA, indicating that they may share a common ancestor. The complete DNA sequence is 4612 bp long with an A+T content of 56.66 %. We determined the first complete genome sequence of a previously unidentified ChPV strain to elucidate its origin and evolutionary status.
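The A+T content reported here is a simple base-composition statistic. The following Python sketch shows how such a percentage could be computed from a DNA sequence. The function name `at_content` and the decision to ignore ambiguous symbols such as N are assumptions for this example, not details taken from the study.

```python
def at_content(sequence: str) -> float:
    """Percentage of A and T among the unambiguous bases (A, C, G, T).

    Assumption: ambiguity codes (e.g. N) and gaps are ignored rather than
    counted in the denominator.
    """
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    at_count = sum(1 for b in bases if b in "AT")
    return 100.0 * at_count / len(bases)


# Toy sequence with 4 A/T bases out of 6.
print(round(at_content("AATTGC"), 2))
```

For a genome the input would be the full sequence string, for example one read from a FASTA record.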
Phylogenetic Analysis of Ruminant Theileria spp. from China Based on 28S Ribosomal RNA Gene
Gou, Huitian; Guan, Guiquan; Ma, Miling; Liu, Aihong; Liu, Zhijie; Xu, Zongke; Ren, Qiaoyun; Li, Youquan; Yang, Jifei; Chen, Ze
2013-01-01
Species identification using DNA sequences is the basis for DNA taxonomy. In this study, we sequenced the large-subunit ribosomal RNA genes (3,037-3,061 bp in length) of 13 Chinese Theileria stocks that were infective to cattle and sheep. The complete 28S rRNA gene is relatively difficult to amplify and its conserved region is not important for phylogenetic study. Therefore, we selected the D2-D3 region from the complete 28S rRNA sequences for phylogenetic analysis. Our analyses of 28S rRNA gene sequences showed that the 28S rRNA was useful as a phylogenetic marker for analyzing the relationships among Theileria spp. in ruminants. In addition, the D2-D3 region was a short segment that could be used instead of the whole 28S rRNA sequence during the phylogenetic analysis of Theileria, and it may be an ideal DNA barcode. PMID:24327775
Phylogenetic analysis of ruminant Theileria spp. from China based on 28S ribosomal RNA gene.
Gou, Huitian; Guan, Guiquan; Ma, Miling; Liu, Aihong; Liu, Zhijie; Xu, Zongke; Ren, Qiaoyun; Li, Youquan; Yang, Jifei; Chen, Ze; Yin, Hong; Luo, Jianxun
2013-10-01
Species identification using DNA sequences is the basis for DNA taxonomy. In this study, we sequenced the large-subunit ribosomal RNA genes (3,037-3,061 bp in length) of 13 Chinese Theileria stocks that were infective to cattle and sheep. The complete 28S rRNA gene is relatively difficult to amplify and its conserved region is not important for phylogenetic study. Therefore, we selected the D2-D3 region from the complete 28S rRNA sequences for phylogenetic analysis. Our analyses of 28S rRNA gene sequences showed that the 28S rRNA was useful as a phylogenetic marker for analyzing the relationships among Theileria spp. in ruminants. In addition, the D2-D3 region was a short segment that could be used instead of the whole 28S rRNA sequence during the phylogenetic analysis of Theileria, and it may be an ideal DNA barcode.
2014-01-01
Background: Recent innovations in sequencing technologies have provided researchers with the ability to rapidly characterize the microbial content of an environmental or clinical sample with unprecedented resolution. These approaches are producing a wealth of information that is providing novel insights into the microbial ecology of the environment and human health. However, these sequencing-based approaches produce large and complex datasets that require efficient and sensitive computational analysis workflows. Many tools for analyzing metagenomic-sequencing data have emerged recently; however, these approaches often suffer from issues of specificity and efficiency, and typically do not include a complete metagenomic analysis framework. Results: We present PathoScope 2.0, a complete bioinformatics framework for rapidly and accurately quantifying the proportions of reads from individual microbial strains present in metagenomic sequencing data from environmental or clinical samples. The pipeline performs all necessary computational analysis steps, including reference genome library extraction and indexing, read quality control and alignment, strain identification, and summarization and annotation of results. We rigorously evaluated PathoScope 2.0 using simulated data and data from the 2011 outbreak of Shiga-toxigenic Escherichia coli O104:H4. Conclusions: The results show that PathoScope 2.0 is a complete, highly sensitive, and efficient approach for metagenomic analysis that outperforms alternative approaches in scope, speed, and accuracy. The PathoScope 2.0 pipeline software is freely available for download at: http://sourceforge.net/projects/pathoscope/. PMID:25225611
Sailaja, B; Anjum, Najreen; Patil, Yogesh K; Agarwal, Surekha; Malathi, P; Krishnaveni, D; Balachandran, S M; Viraktamath, B C; Mangrauthia, Satendra K
2013-12-01
In this study, complete genome of a south Indian isolate of Rice tungro spherical virus (RTSV) from Andhra Pradesh (AP) was sequenced, and the predicted amino acid sequence was analysed. The RTSV RNA genome consists of 12,171 nt without the poly(A) tail, encoding a putative typical polyprotein of 3,470 amino acids. Furthermore, cleavage sites and sequence motifs of the polyprotein were predicted. Multiple alignment with other RTSV isolates showed a nucleotide sequence identity of 95% to east Indian isolates and 90% to Philippines isolates. A phylogenetic tree based on complete genome sequence showed that Indian isolates clustered together, while Vt6 and PhilA isolates of Philippines formed two separate clusters. Twelve recombination events were detected in RNA genome of RTSV using the Recombination Detection Program version 3. Recombination analysis suggested significant role of 5' end and central region of genome in virus evolution. Further, AP and Odisha isolates appeared as important RTSV isolates involved in diversification of this virus in India through recombination phenomenon. The new addition of complete genome of first south Indian isolate provided an opportunity to establish the molecular evolution of RTSV through recombination analysis and phylogenetic relationship.
The complete chloroplast genome sequence of the medicinal plant Salvia miltiorrhiza.
Qian, Jun; Song, Jingyuan; Gao, Huanhuan; Zhu, Yingjie; Xu, Jiang; Pang, Xiaohui; Yao, Hui; Sun, Chao; Li, Xian'en; Li, Chuyuan; Liu, Juyan; Xu, Haibin; Chen, Shilin
2013-01-01
Salvia miltiorrhiza is an important medicinal plant with great economic and medicinal value. The complete chloroplast (cp) genome sequence of Salvia miltiorrhiza, the first sequenced member of the Lamiaceae family, is reported here. The genome is 151,328 bp in length and exhibits a typical quadripartite structure of the large (LSC, 82,695 bp) and small (SSC, 17,555 bp) single-copy regions, separated by a pair of inverted repeats (IRs, 25,539 bp). It contains 114 unique genes, including 80 protein-coding genes, 30 tRNAs and four rRNAs. The genome structure, gene order, GC content and codon usage are similar to the typical angiosperm cp genomes. Four forward, three inverted and seven tandem repeats were detected in the Salvia miltiorrhiza cp genome. Simple sequence repeat (SSR) analysis among the 30 asterid cp genomes revealed that most SSRs are AT-rich, which contribute to the overall AT richness of these cp genomes. Additionally, fewer SSRs are distributed in the protein-coding sequences compared to the non-coding regions, indicating an uneven distribution of SSRs within the cp genomes. Entire cp genome comparison of Salvia miltiorrhiza and three other Lamiales cp genomes showed a high degree of sequence similarity and a relatively high divergence of intergenic spacers. Sequence divergence analysis discovered the ten most divergent and ten most conserved genes as well as their length variation, which will be helpful for phylogenetic studies in asterids. Our analysis also supports that both regional and functional constraints affect gene sequence evolution. Further, phylogenetic analysis demonstrated a sister relationship between Salvia miltiorrhiza and Sesamum indicum. The complete cp genome sequence of Salvia miltiorrhiza reported in this paper will facilitate population, phylogenetic and cp genetic engineering studies of this medicinal plant.
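The region lengths in this abstract can be cross-checked against the reported genome size: in a quadripartite chloroplast genome the total length is the sum of the large single-copy region, the small single-copy region, and two copies of the inverted repeat. A short Python sketch of that arithmetic follows; the function name `quadripartite_length` is hypothetical and used only for illustration.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite chloroplast genome.

    The genome consists of one LSC, one SSC, and two copies of the IR.
    """
    return lsc + ssc + 2 * ir


# Region lengths (bp) as reported for Salvia miltiorrhiza in the abstract.
total = quadripartite_length(lsc=82_695, ssc=17_555, ir=25_539)
print(total)  # 151328, matching the reported genome length of 151,328 bp
```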
Detection and characterization of hepatitis A virus circulating in Egypt.
Hamza, Hazem; Abd-Elshafy, Dina Nadeem; Fayed, Sayed A; Bahgat, Mahmoud Mohamed; El-Esnawy, Nagwa Abass; Abdel-Mobdy, Emam
2017-07-01
Hepatitis A virus (HAV) still poses a considerable problem worldwide. In the current study, hepatitis A virus was recovered from wastewater samples collected from three wastewater treatment plants over one year. Using RT-PCR, HAV was detected in 43 out of 68 samples (63.2%) representing both inlet and outlet. Eleven positive samples were subjected to sequencing targeting the VP1-2A junction region. Phylogenetic analysis revealed that all samples belonged to subgenotype IB with few substitutions at the amino acid level. The complete sequence of one isolate (HAV/Egy/BI-11/2015) showed that the similarity at the amino acid level was not reflected at the nucleotide level. However, the deduced amino acid sequence derived from the complete nucleotide sequence showed distinct substitutions in the 2B, 2C, and 3A regions. Recombination analysis revealed a recombination event between X75215 (subgenotype IA) and AF268396 (subgenotype IB) involving a portion of the 2B nonstructural protein coding region (nucleotides 3757-3868) assuming the herein characterized sequence an actual recombinant. Despite the role of recombination in picornaviruses evolution, its involvement in HAV evolution has rarely been reported, and this may be due to the limited available complete HAV sequences. To our knowledge, this represents the first characterized complete sequence of an Egyptian isolate and the described recombination event provides an important update on the circulating HAV strains in Egypt.
First complete genome sequence of infectious laryngotracheitis virus
2011-01-01
Background Infectious laryngotracheitis virus (ILTV) is an alphaherpesvirus that causes acute respiratory disease in chickens worldwide. To date, only one complete genomic sequence of ILTV has been reported. This sequence was generated by concatenating partial sequences from six different ILTV strains. Thus, the full genomic sequence of a single (individual) strain of ILTV has not been determined previously. This study aimed to use high throughput sequencing technology to determine the complete genomic sequence of a live attenuated vaccine strain of ILTV. Results The complete genomic sequence of the Serva vaccine strain of ILTV was determined, annotated and compared to the concatenated ILTV reference sequence. The genome size of the Serva strain was 152,628 bp, with a G + C content of 48%. A total of 80 predicted open reading frames were identified. The Serva strain had 96.5% DNA sequence identity with the concatenated ILTV sequence. Notably, the concatenated ILTV sequence was found to lack four large regions of sequence, including 528 bp and 594 bp of sequence in the UL29 and UL36 genes, respectively, and two copies of a 1,563 bp sequence in the repeat regions. Considerable differences in the size of the predicted translation products of 4 other genes (UL54, UL30, UL37 and UL38) were also identified. More than 530 single-nucleotide polymorphisms (SNPs) were identified. Most SNPs were located within three genomic regions, corresponding to sequence from the SA-2 ILTV vaccine strain in the concatenated ILTV sequence. Conclusions This is the first complete genomic sequence of an individual ILTV strain. This sequence will facilitate future comparative genomic studies of ILTV by providing an appropriate reference sequence for the sequence analysis of other ILTV strains. PMID:21501528
Zhang, Xiao-Yan; Xiang, Hai-Ying; Zhou, Cui-Ji; Li, Da-Wei; Yu, Jia-Lin; Han, Cheng-Gui
2014-08-01
For brassica yellows virus (BrYV), proposed to be a member of a new polerovirus species, two clearly distinct genotypes (BrYV-A and BrYV-B) have been described. In this study, the complete nucleotide sequences of two BrYV isolates from radish and Chinese cabbage were determined. Sequence analysis suggested that these isolates represent a new genotype, referred to here as BrYV-C. The full-length sequences of the two BrYV-C isolates shared 93.4-94.8 % identity with BrYV-A and BrYV-B. Further phylogenetic analysis showed that the BrYV-C isolates formed a subgroup that was distinct from the BrYV-A and BrYV-B isolates based on all of the proteins except P5.
Umetsu, Kazuo; Iwabuchi, Naruki; Yuasa, Isao; Saitou, Naruya; Clark, Paul F; Boxshall, Geoff; Osawa, Motoki; Igarashi, Keiji
2002-12-01
The complete mitochondrial DNA (mtDNA) of the tadpole shrimp Triops cancriformis was sequenced. The sequence consisted of 15,101 bp with an A+T content of 69%. Its gene arrangement was identical with those of the water flea (Daphnia pulex) and giant tiger prawn (Penaeus monodon), whereas it differed from that of the brine shrimp (Artemia franciscana) in the arrangement of its genes for tRNAs. Phylogenetic analysis revealed T. cancriformis to be more closely related to the water flea than to the brine shrimp and giant tiger prawn. We also compared the 16S rRNA sequences of five formalin-fixed tadpole shrimps that had been collected in five different locations and stored in a museum. The sequence divergence was in the range of 0-1.51%, suggesting that those samples were closely related to each other.
Ogura, Kohei; Watanabe, Shinya; Kirikae, Teruo; Miyoshi-Akiyama, Tohru
2017-01-01
Epidemiologic typing of Streptococcus pyogenes (GAS) is frequently based on the genotype of the emm gene, which encodes M/Emm protein. In this study, the complete genome sequence of GAS emm3 strain M3-b, isolated from a patient with streptococcal toxic shock syndrome (STSS), was determined. This strain exhibited 99% identity with other complete genome sequences of emm3 strains MGAS315, SSI-1, and STAB902. The complete genomes of five additional strains isolated from Japanese patients with and without STSS were also sequenced. Maximum-likelihood phylogenetic analysis showed that strains M3-b, M3-e, and SSI-1, all of which were isolated from STSS patients, were relatively close.
Complete genome sequence analysis of a duck circovirus from Guangxi pockmark ducks.
Xie, Liji; Xie, Zhixun; Zhao, Guangyuan; Liu, Jiabo; Pang, Yaoshan; Deng, Xianwen; Xie, Zhiqin; Fan, Qing
2012-12-01
We report here the complete genomic sequence of a novel duck circovirus (DuCV) strain, GX1104, isolated from Guangxi pockmark ducks in Guangxi, China. The whole nucleotide sequence had the highest homology (97.2%) with the sequence of strain TC/2002 (GenBank accession number AY394721.1) and had a low homology (76.8% to 78.6%) with the sequences of other strains isolated from China, Germany, and the United States. This report will help to understand the epidemiology and molecular characteristics of Guangxi pockmark duck circovirus in southern China.
Fomenkov, Alexey; Akimov, Vladimir N; Vasilyeva, Lina V; Andersen, Dale T; Vincze, Tamas; Roberts, Richard J
2017-03-16
This paper describes the complete genome sequences and methylome analysis of six psychrotrophic strains isolated from perennially ice-covered Lake Untersee in Antarctica. Copyright © 2017 Fomenkov et al.
Complete Genome Sequence of Genotype VI Newcastle Disease Viruses Isolated from Pigeons in Pakistan
Wajid, Abdul; Rehmani, Shafqat Fatima; Sharma, Poonam; Goraichuk, Iryna V.; Dimitrov, Kiril M.
2016-01-01
Two complete genome sequences of Newcastle disease virus (NDV) are described here. Virulent isolates pigeon/Pakistan/Lahore/21A/2015 and pigeon/Pakistan/Lahore/25A/2015 were obtained from racing pigeons sampled in the Pakistani province of Punjab during 2015. Phylogenetic analysis of the fusion protein genes and complete genomes classified the isolates as members of NDV class II, genotype VI. PMID:27540069
USDA-ARS's Scientific Manuscript database
Contigs with sequence similarities to several nucleorhabdoviruses were identified by high-throughput sequencing analysis from a black currant (Ribes nigrum L.) cultivar. The complete genomic sequence of this new nucleorhabdovirus is 14,432 nucleotides. Its genomic organization is typical of nucleorh...
Complete genome sequence of genotype VI Newcastle disease viruses isolated from pigeons in Pakistan
USDA-ARS's Scientific Manuscript database
Two complete genome sequences of Newcastle disease virus (NDV) are described here. Virulent isolates pigeon/Pakistan/Lahore/21A/2015 and pigeon/Pakistan/Lahore/25A/2015 were obtained from racing pigeons sampled in the Pakistani province of Punjab during 2015. Phylogenetic analysis of the fusion prot...
Zhao, J H; Tu, G J; Wu, X B; Li, C P
2018-05-01
Ortleppascaris sinensis (Nematoda: Ascaridida) is a dominant intestinal nematode of the captive Chinese alligator. However, the epidemiology, molecular ecology and population genetics of this parasite remain largely unexplored. In this study, the complete mitochondrial (mt) genome sequence of O. sinensis was first determined using a polymerase chain reaction (PCR)-based primer-walking strategy, and this is also the first sequencing of the complete mitochondrial genome of a member of the genus Ortleppascaris. The circular mitochondrial genome (13,828 bp) of O. sinensis contained 12 protein-coding, 22 transfer RNA and 2 ribosomal RNA genes, but lacked the ATP synthetase subunit 8 gene. Finally, phylogenetic analysis of mtDNAs indicated that the genus Ortleppascaris should be attributed to the family Heterocheilidae. It is necessary to sequence more mtDNAs of Ortleppascaris nematodes in the future to test and confirm our conclusion. The complete mitochondrial genome sequence of O. sinensis reported here should contribute to molecular diagnosis, epidemiological investigations and ecological studies of O. sinensis and other related Ascaridida nematodes.
Conservation and variability of West Nile virus proteins.
Koo, Qi Ying; Khan, Asif M; Jung, Keun-Ok; Ramdas, Shweta; Miotto, Olivo; Tan, Tin Wee; Brusic, Vladimir; Salmon, Jerome; August, J Thomas
2009-01-01
West Nile virus (WNV) has emerged globally as an increasingly important pathogen for humans and domestic animals. Studies of the evolutionary diversity of the virus over its known history will help to elucidate conserved sites, and characterize their correspondence to other pathogens and their relevance to the immune system. We describe a large-scale analysis of the entire WNV proteome, aimed at identifying and characterizing evolutionarily conserved amino acid sequences. This study, which used 2,746 WNV protein sequences collected from the NCBI GenPept database, focused on analysis of peptides of length 9 amino acids or more, which are immunologically relevant as potential T-cell epitopes. Entropy-based analysis of the diversity of WNV sequences revealed the presence of numerous evolutionarily stable nonamer positions across the proteome (entropy value of ≤ 1). The representation (frequency) of nonamers variant to the predominant peptide at these stable positions was, generally, low (≤ 10% of the WNV sequences analyzed). Eighty-eight fragments of length 9-29 amino acids, representing approximately 34% of the WNV polyprotein length, were identified to be identical and evolutionarily stable in all analyzed WNV sequences. Of the 88 completely conserved sequences, 67 are also present in other flaviviruses, and several have been associated with the functional and structural properties of viral proteins. Immunoinformatic analysis revealed that the majority (78/88) of conserved sequences are potentially immunogenic, while 44 contained experimentally confirmed human T-cell epitopes. This study identified a comprehensive catalogue of completely conserved WNV sequences, many of which are shared by other flaviviruses, and the majority of which are potential epitopes. The complete conservation of these immunologically relevant sequences through the entire recorded WNV history suggests they will be valuable as components of peptide-specific vaccines or other therapeutic applications, for sequence-specific diagnosis of a wide range of Flavivirus infections, and for studies of homologous sequences among other flaviviruses.
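The entropy-based analysis mentioned in this abstract scores the diversity of nonamer (9-residue) peptides at each position of an alignment. A minimal Python sketch of a per-position Shannon entropy calculation is shown below. It assumes pre-aligned, equal-length sequences and entropy in bits (log base 2); the function name `nonamer_entropy` is hypothetical, and any further corrections applied in the study itself are not described in the abstract and are not reproduced here.

```python
from collections import Counter
from math import log2


def nonamer_entropy(aligned_seqs: list[str], k: int = 9) -> list[float]:
    """Shannon entropy (bits) of the k-mer peptides starting at each position.

    Assumption: all sequences are already aligned and of equal length.
    An entropy of 0 means every sequence carries the same peptide there.
    """
    if not aligned_seqs:
        raise ValueError("at least one sequence is required")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    entropies = []
    for start in range(length - k + 1):
        # Count each distinct peptide observed at this start position.
        counts = Counter(s[start:start + k] for s in aligned_seqs)
        total = sum(counts.values())
        h = -sum((c / total) * log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


# Toy alignment: the first nonamer is fully conserved, the second has two
# equally frequent variants.
toy = ["ACDEFGHIKL", "ACDEFGHIKL", "ACDEFGHIKM", "ACDEFGHIKM"]
print(nonamer_entropy(toy))
```

Positions whose entropy falls at or below a chosen threshold (the abstract uses a value of 1) could then be flagged as evolutionarily stable.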
Molecular characterization of a novel Luteovirus from peach identified by high-throughput sequencing
USDA-ARS's Scientific Manuscript database
Contigs with sequence homologies to Cherry-associated luteovirus were identified by high-throughput sequencing analysis of two peach accessions undergoing quarantine testing. The complete genomic sequences of the two isolates of this virus are 5,819 and 5,814 nucleotides. Their genome organization i...
Droege, Marcus; Hill, Brendon
2008-08-31
The Genome Sequencer FLX System (GS FLX), powered by 454 Sequencing, is a next-generation DNA sequencing technology featuring a unique mix of long reads, exceptional accuracy, and ultra-high throughput. It has been proven to be the most versatile of all currently available next-generation sequencing technologies, supporting many high-profile studies in over seven application categories. GS FLX users have pursued innovative research in de novo sequencing, re-sequencing of whole genomes and target DNA regions, metagenomics, and RNA analysis. 454 Sequencing is a powerful tool for human genetics research: it has recently been used to re-sequence the genome of an individual human, is currently being used to re-sequence the complete human exome and targeted genomic regions using the NimbleGen sequence capture process, and has detected low-frequency somatic mutations linked to cancer.
Pang, Changlong; Li, Ang; Cui, Di; Yang, Jixian; Ma, Fang; Guo, Haijuan
2016-02-20
Klebsiella pneumoniae J1 is a Gram-negative bacterial strain that produces a protein-based microbial flocculant. However, little genetic information is known about this species. Here we carried out a whole-genome sequence analysis of this strain and report the complete genome sequence of this organism and its genetic basis for carbohydrate metabolism, capsule biosynthesis and transport system. Copyright © 2016 Elsevier B.V. All rights reserved.
Li, Yongqiang; Deng, Congliang; Bian, Yong; Zhao, Xiaoli; Zhou, Qi
2017-04-01
Apple stem grooving virus (ASGV), apple chlorotic leaf spot virus (ACLSV), and prunus necrotic ringspot virus (PNRSV) were identified in a crab apple tree by small RNA deep sequencing. The complete genome sequence of ACLSV isolate BJ (ACLSV-BJ) was 7554 nucleotides and shared 67.0%-83.0% nucleotide sequence identity with other ACLSV isolates. A phylogenetic tree based on the complete genome sequence of all available ACLSV isolates showed that ACLSV-BJ clustered with the isolates SY01 from hawthorn, MO5 from apple, and JB, KMS and YH from pear. The complete nucleotide sequence of ASGV-BJ was 6509 nucleotides (nt) long and shared 78.2%-80.7% nucleotide sequence identity with other isolates. ASGV-BJ and the isolate ASGV_kfp clustered together in the phylogenetic tree as an independent clade. Recombination analysis showed that isolate ASGV-BJ was a naturally occurring recombinant.
Ludgate, Jackie L; Wright, James; Stockwell, Peter A; Morison, Ian M; Eccles, Michael R; Chatterjee, Aniruddha
2017-08-31
Formalin fixed paraffin embedded (FFPE) tumor samples are a major source of DNA from patients in cancer research. However, FFPE is a challenging material to work with due to macromolecular fragmentation and nucleic acid crosslinking. FFPE tissue poses particular challenges for methylation analysis and for preparing sequencing-based libraries relying on bisulfite conversion. Successful bisulfite conversion is a key requirement for sequencing-based methylation analysis. Here we describe a complete and streamlined workflow for preparing next generation sequencing libraries for methylation analysis from FFPE tissues. This includes counting cells from FFPE blocks and extracting DNA from FFPE slides, testing bisulfite conversion efficiency with a polymerase chain reaction (PCR) based test, preparing reduced representation bisulfite sequencing libraries and massively parallel sequencing. The main features and advantages of this protocol are: an optimized method for extracting good quality DNA from FFPE tissues; an efficient bisulfite conversion and next generation sequencing library preparation protocol that uses 50 ng DNA from FFPE tissue; and incorporation of a PCR-based test to assess bisulfite conversion efficiency prior to sequencing. We provide a complete workflow and an integrated protocol for performing DNA methylation analysis at the genome-scale and we believe this will facilitate clinical epigenetic research that involves the use of FFPE tissue.
König, Caroline; Alquézar, René; Vellido, Alfredo; Giraldo, Jesús
2018-03-01
G-protein-coupled receptors (GPCRs) are a large and diverse super-family of eukaryotic cell membrane proteins that play an important physiological role as transmitters of extracellular signal. In this paper, we investigate Class C, a member of this super-family that has attracted much attention in pharmacology. The limited knowledge about the complete 3D crystal structure of Class C receptors makes necessary the use of their primary amino acid sequences for analytical purposes. Here, we provide a systematic analysis of distinct receptor sequence segments with regard to their ability to differentiate between seven class C GPCR subtypes according to their topological location in the extracellular, transmembrane, or intracellular domains. We build on the results from the previous research that provided preliminary evidence of the potential use of separated domains of complete class C GPCR sequences as the basis for subtype classification. The use of the extracellular N-terminus domain alone was shown to result in a minor decrease in subtype discrimination in comparison with the complete sequence, despite discarding much of the sequence information. In this paper, we describe the use of Support Vector Machine-based classification models to evaluate the subtype-discriminating capacity of the specific topological sequence segments.
Woo, Hannah L.; O’Dell, Kaela B.; Utturkar, Sagar; ...
2016-11-23
We isolated Thalassospira sp. strain KO164 from eastern Mediterranean seawater and sediment laboratory microcosms enriched on insoluble organosolv lignin under oxic conditions. Furthermore, the near-complete genome sequence, together with an analysis of the deep-ocean bacterium’s ability to degrade recalcitrant organics such as lignin, will be presented here.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Woo, Hannah L.; O’Dell, Kaela B.; Utturkar, Sagar
We isolated Thalassospira sp. strain KO164 from eastern Mediterranean seawater and sediment laboratory microcosms enriched on insoluble organosolv lignin under oxic conditions. Furthermore, the near-complete genome sequence, together with an analysis of the deep-ocean bacterium’s ability to degrade recalcitrant organics such as lignin, will be presented here.
The complete chloroplast genome sequence of Dodonaea viscosa: comparative and phylogenetic analyses.
Saina, Josphat K; Gichira, Andrew W; Li, Zhi-Zhong; Hu, Guang-Wan; Wang, Qing-Feng; Liao, Kuo
2018-02-01
The plant chloroplast (cp) genome is a highly conserved structure which is beneficial for evolution and systematic research. Currently, numerous complete cp genome sequences have been reported due to high throughput sequencing technology. However, no complete chloroplast genome of the genus Dodonaea has been reported before. To better understand the molecular basis of Dodonaea viscosa chloroplast, we used Illumina sequencing technology to sequence its complete genome. The whole length of the cp genome is 159,375 base pairs (bp), with a pair of inverted repeats (IRs) of 27,099 bp separated by a large single copy (LSC) region of 87,204 bp and a small single copy (SSC) region of 17,972 bp. The annotation analysis revealed a total of 115 unique genes, of which 81 were protein-coding, 30 tRNA, and four ribosomal RNA genes. Comparative genome analysis with other closely related Sapindaceae members showed conserved gene order in the inverted and single copy regions. Phylogenetic analysis clustered D. viscosa with other species of Sapindaceae with strong bootstrap support. Finally, a total of 249 SSRs were detected. Moreover, a comparison of the synonymous (Ks) and nonsynonymous (Ka) substitution rates in D. viscosa showed very low values. The availability of the cp genome reported here provides a valuable genetic resource for comprehensive further studies in genetic variation, taxonomy and phylogenetic evolution of the Sapindaceae family. In addition, the SSR markers detected will be used in further phylogeographic and population structure studies of the species in this genus.
Novel insect-specific flavivirus isolated from northern Europe
Huhtamo, Eili; Moureau, Gregory; Cook, Shelley; Julkunen, Ora; Putkuri, Niina; Kurkela, Satu; Uzcátegui, Nathalie Y.; Harbach, Ralph E.; Gould, Ernest A.; Vapalahti, Olli; de Lamballerie, Xavier
2012-01-01
Mosquitoes collected in Finland were screened for flaviviral RNA leading to the discovery and isolation of a novel flavivirus designated Hanko virus (HANKV). Virus characterization, including phylogenetic analysis of the complete coding sequence, confirmed HANKV as a member of the “insect-specific” flavivirus (ISF) group. HANKV is the first member of this group isolated from northern Europe, and therefore the first northern European ISF for which the complete coding sequence has been determined. HANKV was not transcribed as DNA in mosquito cell culture, which appears atypical for an ISF. HANKV shared highest sequence homology with the partial NS5 sequence available for the recently discovered Spanish Ochlerotatus flavivirus (SOcFV). Retrospective analysis of mitochondrial sequences from the virus-positive mosquito pool suggested an Ochlerotatus mosquito species as the most likely host for HANKV. HANKV and SOcFV may therefore represent a novel group of Ochlerotatus-hosted insect-specific flaviviruses in Europe and further afield. PMID:22999256
Behera, Bijay Kumar; Baisvar, Vishwamitra Singh; Kumari, Kavita; Rout, Ajaya Kumar; Pakrashi, Sudip; Paria, Prasenjet; Rao, A R; Rai, Anil
2017-03-01
In the present study, the complete mitochondrial genome sequence of Anabas testudineus is reported using a PGM sequencer (Ion Torrent, Life Technologies, La Jolla, CA). The complete mitogenome of the climbing perch, A. testudineus, was obtained by de novo sequence assembly of genomic reads using the Torrent Mapping Alignment Program (TMAP), and is 16,603 bp in length. The mitogenome of A. testudineus is composed of 13 protein-coding genes, two rRNAs, and 22 tRNAs. Here, 20 tRNA genes showed the typical clover leaf model, and the D-loop serves as the control region; the gene order and organization are closely similar to those of Osphronemidae and most other Perciformes fish mitogenomes in the NCBI databases. The mitogenome in the present study has 99% similarity to the complete mitogenome sequence of earlier reported A. testudineus. The phylogenetic analysis of Anabantidae depicted that their mitogenomes are closely related to each other. The complete mitogenome sequence of A. testudineus would be helpful in understanding the population genetics, phylogenetics, and evolution of Anabantidae.
Shittu, Ismaila; Sharma, Poonam; Joannis, Tony M.; Volkening, Jeremy D.; Odaibo, Georgina N.; Olaleye, David O.; Williams-Coplin, Dawn; Solomon, Ponman; Abolnik, Celia; Miller, Patti J.; Dimitrov, Kiril M.
2016-01-01
The first complete genome sequence of a strain of Newcastle disease virus (NDV) of genotype XVII is described here. A velogenic strain (duck/Nigeria/903/KUDU-113/1992) was isolated from an apparently healthy free-roaming domestic duck sampled in Kuru, Nigeria, in 1992. Phylogenetic analysis of the fusion protein gene and complete genome classified the isolate as a member of NDV class II, genotype XVII. PMID:26847901
The complete mitochondrial genome sequence of the maned wolf (Chrysocyon brachyurus).
Zhao, Chao; Yang, Xiufeng; Zhang, Honghai; Zhang, Jin; Chen, Lei; Sha, Weilai; Liu, Guangshuai
2016-01-01
In this study, the complete mitochondrial genome of the maned wolf (Chrysocyon brachyurus), the only species in the genus Chrysocyon, was sequenced and reported for the first time using blood samples obtained from a female individual in Shanghai Zoo, China. Sequence analysis showed that the genome structure was in accordance with that of other Canidae species and that it contained a 12S rRNA gene, a 16S rRNA gene, 22 tRNA genes, 13 protein-coding genes and 1 control region.
Christley, Scott; Scarborough, Walter; Salinas, Eddie; Rounds, William H; Toby, Inimary T; Fonner, John M; Levin, Mikhail K; Kim, Min; Mock, Stephen A; Jordan, Christopher; Ostmeyer, Jared; Buntzman, Adam; Rubelt, Florian; Davila, Marco L; Monson, Nancy L; Scheuermann, Richard H; Cowell, Lindsay G
2018-01-01
Recent technological advances in immune repertoire sequencing have created tremendous potential for advancing our understanding of adaptive immune response dynamics in various states of health and disease. Immune repertoire sequencing produces large, highly complex data sets, however, which require specialized methods and software tools for their effective analysis and interpretation. VDJServer is a cloud-based analysis portal for immune repertoire sequence data that provides access to a suite of tools for a complete analysis workflow, including modules for preprocessing and quality control of sequence reads, V(D)J gene segment assignment, repertoire characterization, and repertoire comparison. VDJServer also provides sophisticated visualizations for exploratory analysis. It is accessible through a standard web browser via a graphical user interface designed for use by immunologists, clinicians, and bioinformatics researchers. VDJServer provides a data commons for public sharing of repertoire sequencing data, as well as private sharing of data between users. We describe the main functionality and architecture of VDJServer and demonstrate its capabilities with use cases from cancer immunology and autoimmunity. VDJServer provides a complete analysis suite for human and mouse T-cell and B-cell receptor repertoire sequencing data. The combination of its user-friendly interface and high-performance computing allows large immune repertoire sequencing projects to be analyzed with no programming or software installation required. VDJServer is a web-accessible cloud platform that provides access through a graphical user interface to a data management infrastructure, a collection of analysis tools covering all steps in an analysis, and an infrastructure for sharing data along with workflows, results, and computational provenance. VDJServer is a free, publicly available, and open-source licensed resource.
[Sequencing and analysis of the complete genome of a rabies virus isolate from Sika deer].
Zhao, Yun-Jiao; Guo, Li; Huang, Ying; Zhang, Li-Shi; Qian, Ai-Dong
2008-05-01
One DRV strain was isolated from Sika deer brain and sequenced. Nine overlapping gene fragments were amplified by RT-PCR with the 3'-RACE and 5'-RACE methods, and the complete DRV genome sequence was assembled. The complete genome is 11,863 bp in length. The DRV genome organization was similar to that of other rabies viruses, comprising five genes, and the initiation and termination sites were highly conserved. There were mutated amino acids in important antigenic sites of the nucleoprotein and glycoprotein. The nucleotide and amino acid homologies of the N, P, M, G and L genes were compared among strains with completed genomic sequences. A phylogenetic tree was constructed by comparison with the N gene sequences of other typical rabies viruses. These results indicated that DRV belonged to genotype 1. The highest homology, 94%, was with the Chinese vaccine strain 3aG, and the lowest, 71%, was with WCBV. These findings provide a theoretical reference for further research on rabies virus.
Bandelt, Hans-Jürgen; Yao, Yong-Gang; Bravi, Claudio M; Salas, Antonio; Kivisild, Toomas
2009-03-01
Sequence analysis of the mitochondrial genome has become a routine method in the study of mitochondrial diseases. Quite often, the sequencing efforts in the search for pathogenic or disease-associated mutations are affected by technical and interpretive problems, caused by sample mix-up, contamination, biochemical problems, incomplete sequencing, misdocumentation and insufficient reference to previously published data. To assess data quality in case studies of mitochondrial diseases, it is recommended to compare any mtDNA sequence under consideration to its phylogenetically closest lineages available on the Web. The median network method has proven useful for visualizing potential problems with the data. We contrast some early reports of complete mtDNA sequences with more recent total mtDNA sequencing efforts in studies of various mitochondrial diseases. We conclude that the quality of complete mtDNA sequences generated in the medical field in the past few years is somewhat unsatisfactory and may even fall behind that of pioneer manual sequencing in the early nineties. Our study provides a paradigm for an a posteriori evaluation of sequence quality and for detection of potential problems with inferring a pathogenic status of a particular mutation.
Logacheva, Maria D; Samigullin, Tahir H; Dhingra, Amit; Penin, Aleksey A
2008-01-01
Background Chloroplast genome sequences are extremely informative about species-interrelationships owing to their non-meiotic and often uniparental inheritance over generations. The subject of our study, Fagopyrum esculentum, is a member of the family Polygonaceae belonging to the order Caryophyllales. An uncertainty remains regarding the affinity of Caryophyllales and the asterids that could be due to undersampling of the taxa. With that background, having access to the complete chloroplast genome sequence for Fagopyrum becomes quite pertinent. Results We report the complete chloroplast genome sequence of a wild ancestor of cultivated buckwheat, Fagopyrum esculentum ssp. ancestrale. The sequence was rapidly determined using a previously described approach that utilized a PCR-based method and employed universal primers, designed on the scaffold of multiple sequence alignment of chloroplast genomes. The gene content and order in the buckwheat chloroplast genome are similar to those of Spinacia oleracea. However, some unique structural differences exist: the presence of an intron in the rpl2 gene, a frameshift mutation in the rpl23 gene and extension of the inverted repeat region to include the ycf1 gene. Phylogenetic analysis of 61 protein-coding gene sequences from 44 complete plastid genomes provided strong support for the sister relationships of Caryophyllales (including Polygonaceae) to asterids. Further, our analysis also provided support for Amborella as sister to all other angiosperms, but interestingly, in the Bayesian phylogeny inference based on the first two codon positions Amborella united with Nymphaeales. Conclusion Comparative genomics analyses revealed that the Fagopyrum chloroplast genome harbors the characteristic gene content and organization as has been described for several other chloroplast genomes. However, it has some unique structural features distinct from previously reported complete chloroplast genome sequences. Phylogenetic analysis of the dataset, including this new sequence from non-core Caryophyllales, supports the sister relationship between Caryophyllales and asterids. PMID:18492277
Complete cDNA sequence and amino acid analysis of a bovine ribonuclease K6 gene.
Pietrowski, D; Förster, M
2000-01-01
The complete cDNA sequence of a ribonuclease k6 gene of Bos taurus has been determined. It codes for a protein with 154 amino acids and contains the invariant cysteine, histidine and lysine residues as well as the characteristic motifs specific to ribonuclease active sites. The deduced protein sequence is 27 residues longer than other known ribonucleases k6 and shows amino acid exchanges which could reflect a strain specificity or polymorphism within the bovine genome. Based on sequence similarity we have termed the identified gene bovine ribonuclease k6 b (brk6b).
Hand, Melanie L.; Spangenberg, German C.; Forster, John W.; Cogan, Noel O. I.
2013-01-01
Chloroplast genome sequences are of broad significance in plant biology, due to frequent use in molecular phylogenetics, comparative genomics, population genetics, and genetic modification studies. The present study used a second-generation sequencing approach to determine and assemble the plastid genomes (plastomes) of four representatives from the agriculturally important Lolium-Festuca species complex of pasture grasses (Lolium multiflorum, Festuca pratensis, Festuca altissima, and Festuca ovina). Total cellular DNA was extracted from either roots or leaves, was sequenced, and the output was filtered for plastome-related reads. A comparison between sources revealed fewer plastome-related reads from root-derived template but an increase in incidental bacterium-derived sequences. Plastome assembly and annotation indicated high levels of sequence identity and a conserved organization and gene content between species. However, frequent deletions within the F. ovina plastome appeared to contribute to a smaller plastid genome size. Comparative analysis with complete plastome sequences from other members of the Poaceae confirmed conservation of most grass-specific features. Detailed analysis of the rbcL–psaI intergenic region, however, revealed a “hot-spot” of variation characterized by independent deletion events. The evolutionary implications of this observation are discussed. The complete plastome sequences are anticipated to provide the basis for potential organelle-specific genetic modification of pasture grasses. PMID:23550121
USDA-ARS's Scientific Manuscript database
The first complete genome sequence of a strain of Newcastle disease virus from genotype XIV is reported here. Strain duck/Nigeria/NG-695/KG.LOM.11-16/2009 was isolated from an apparently healthy domestic duck from a live bird market in Kogi State, Nigeria, in 2009. This strain is classified as a m...
USDA-ARS's Scientific Manuscript database
The first complete genome sequence of a strain of Newcastle disease virus (NDV) of genotype XVII is described here. A velogenic strain (duck/Nigeria/903/KUDU-113/1992) was isolated from an apparently healthy free-roaming domestic duck sampled in Kuru, Nigeria, in 1992. Phylogenetic analysis of the f...
Metagenomic Analysis of Cucumber RNA from East Timor Reveals an Aphid lethal paralysis virus Genome
Maina, Solomon; Edwards, Owain R.; de Almeida, Luis; Ximenes, Abel
2017-01-01
We present here the first complete genomic Aphid lethal paralysis virus (ALPV) sequence isolated from cucumber plant RNA from East Timor. We compare it with two complete ALPV genome sequences from China, and one each from Israel, South Africa, and the United States. It most closely resembled the Chinese isolate LGH genome. PMID:28082492
Alexandraki, Voula; Kazou, Maria; Pot, Bruno; Tsakalidou, Effie; Papadimitriou, Konstantinos
2017-08-24
Lactobacillus delbrueckii subsp. bulgaricus is widely used in the production of yogurt and cheese. In this study, we present the complete genome sequence of L. delbrueckii subsp. bulgaricus ACA-DC 87 isolated from traditional Greek yogurt. Whole-genome analysis may reveal desirable technological traits of the strain for dairy fermentations. Copyright © 2017 Alexandraki et al.
Xiao, Sa; Paldurai, Anandan; Nayak, Baibaswata; Mirande, Armando; Collins, Peter L.
2013-01-01
The complete genome sequence was determined for a highly virulent Newcastle disease virus strain from vaccinated chicken farms in Mexico during outbreaks in 2010. On the basis of phylogenetic analysis this strain was classified into genotype V in the class II cluster that was closely related to Mexican strains that appeared in 2004–2006. PMID:23409252
Ali, Akhtar; Ali, Ijaz
2015-01-01
Dengue virus serotype 2 (DENV-2) isolates have been implicated in deadly outbreaks of dengue fever (DF) and dengue hemorrhagic fever (DHF) in several regions of the world. Phylogenetic analysis of DENV-2 isolates collected from particular countries has been performed using partial or individual genes but only a few studies have examined complete whole-genome sequences collected worldwide. Herein, 50 complete genome sequences of DENV-2 isolates, reported over the past 70 years from 19 different countries, were downloaded from GenBank. Phylogenetic analysis was conducted and evolutionary distances of the 50 DENV-2 isolates were determined using maximum likelihood (ML) trees or Bayesian phylogenetic analysis created from complete genome nucleotide (nt) and amino acid (aa) sequences or individual gene sequences. The results showed that all DENV-2 isolates fell into seven main groups containing five previously defined genotypes. A Cosmopolitan genotype showed further division into three groups (C-I, C-II, and C-III) with the C-I group containing two subgroups (C-IA and C-IB). Comparison of the aa sequences showed specific mutations among the various groups of DENV-2 isolates. A maximum number of aa mutations was observed in the NS5 gene, followed by the NS2A, NS3 and NS1 genes, while the smallest number of aa substitutions was recorded in the capsid gene, followed by the PrM/M, NS4A, and NS4B genes. Maximum evolutionary distances were found in the NS2A gene, followed by the NS4A and NS4B genes. Based on these results, we propose that genotyping of DENV-2 isolates in future studies should be performed on entire genome sequences in order to gain a complete understanding of the evolution of various isolates reported from different geographical locations around the world. PMID:26414178
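As a rough illustration of the per-gene distance comparison described above, the following Python sketch computes an uncorrected p-distance (the proportion of differing sites) between two aligned nucleotide sequences. This is a deliberately simpler measure than the maximum likelihood and Bayesian analyses the study reports; the function name `p_distance` and the toy sequences are assumptions made for illustration only.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance between two aligned nucleotide sequences.

    Columns containing a gap ('-') or an ambiguous base ('N') in either
    sequence are skipped, so only sites with a call in both are compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # site not comparable
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy aligned fragments (hypothetical, not real DENV-2 data)
print(p_distance("ACGTACGT", "ACGTTCGA"))
```

Applying such a function gene by gene to an alignment would give a crude ranking of which genes are most divergent; model-based distances would normally be preferred for publication-quality analysis.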
Peng, Rui; Zeng, Bo; Meng, Xiuxiang; Yue, Bisong; Zhang, Zhihe; Zou, Fangdong
2007-08-01
The complete mitochondrial genome sequence of the giant panda, Ailuropoda melanoleuca, was determined by long and accurate polymerase chain reaction (LA-PCR) with conserved primers and by primer walking sequencing. The complete mitochondrial DNA is 16,805 nucleotides in length and contains two ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes and one control region. The total length of the 13 protein-coding genes is longer than that of the American black bear, brown bear and polar bear by 3 amino acids at the end of the ND5 gene. The codon usage also followed the typical vertebrate pattern except for an unusual ATT start codon, which initiates the NADH dehydrogenase subunit 5 (ND5) gene. The molecular phylogenetic analysis was performed on the sequences of 12 concatenated heavy-strand encoded protein-coding genes, and suggested that the giant panda is most closely related to bears.
Kim, Kyunghee; Lee, Sang-Choon; Lee, Junki; Lee, Hyun Oh; Joh, Ho Jun; Kim, Nam-Hoon; Park, Hyun-Seung; Yang, Tae-Jin
2015-01-01
We report complete sequences of the chloroplast (cp) genome and 45S nuclear ribosomal DNA (45S nrDNA) for 11 Panax ginseng cultivars. We have obtained complete sequences of cp and 45S nrDNA, the representative barcoding target sequences for the cytoplasmic and nuclear genomes, respectively, based on low-coverage NGS sequence of each cultivar. The cp genome sizes ranged from 156,241 to 156,425 bp and the major size variation was derived from differences in copy number of tandem repeats in the ycf1 gene and in the intergenic regions of rps16-trnUUG and rpl32-trnUAG. The complete 45S nrDNA unit sequences were 11,091 bp, representing a consensus single transcriptional unit with an intergenic spacer region. Comparative analysis of these sequences as well as those previously reported for three Chinese accessions identified very rare but unique polymorphism in the cp genome within P. ginseng cultivars. There were 12 intra-species polymorphisms (six SNPs and six InDels) among 14 cultivars. We also identified five SNPs from 45S nrDNA of 11 Korean ginseng cultivars. From the 17 unique informative polymorphic sites, we developed six reliable markers for analysis of ginseng diversity and cultivar authentication. PMID:26061692
Shen, Kang-Ning; Chen, Ching-Hung; Hsiao, Chung-Der
2016-05-01
In this study, the complete mitogenome sequence of the hornlip mullet Plicomugil labiosus (Teleostei: Mugilidae) has been sequenced by the next-generation sequencing method. The assembled mitogenome, consisting of 16,829 bp, had the typical vertebrate mitochondrial gene arrangement, including 13 protein-coding genes, 22 transfer RNAs, 2 ribosomal RNA genes and a non-coding control region (D-loop). The D-loop, 1057 bp in length, is located between tRNA-Pro and tRNA-Phe. The overall base composition of P. labiosus is 28.0% for A, 29.3% for C, 15.5% for G and 27.2% for T. The complete mitogenome may provide essential and important DNA molecular data for further population, phylogenetic and evolutionary analysis for Mugilidae.
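The overall base composition quoted in mitogenome reports such as this one is a simple per-base percentage. A minimal Python sketch of that calculation follows; the function name `base_composition` and the toy sequence are assumptions for illustration and are not taken from the study.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Return the percentage of A, C, G and T in seq, rounded to one decimal.

    Characters other than A/C/G/T (gaps, ambiguity codes) are ignored.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {base: round(100 * counts[base] / total, 1) for base in "ACGT"}


# Toy sequence (hypothetical, not the P. labiosus mitogenome)
print(base_composition("AACCGTTACC"))
# {'A': 30.0, 'C': 40.0, 'G': 10.0, 'T': 20.0}
```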
Shen, Kang-Ning; Tsai, Shiou-Yi; Chen, Ching-Hung; Hsiao, Chung-Der; Durand, Jean-Dominique
2016-11-01
In this study, the complete mitogenome sequence of the largescale mullet (Teleostei: Mugilidae) has been sequenced by the next-generation sequencing method. The assembled mitogenome, consisting of 16,832 bp, had the typical vertebrate mitochondrial gene arrangement, including 13 protein-coding genes, 22 transfer RNAs, two ribosomal RNA genes, and a non-coding control region (D-loop). The D-loop, which has a length of 1094 bp, is located between tRNA-Pro and tRNA-Phe. The overall base composition of the largescale mullet is 27.8% for A, 30.1% for C, 16.2% for G, and 25.9% for T. The complete mitogenome may provide essential and important DNA molecular data for further phylogenetic and evolutionary analysis for Mugilidae.
Genetic Diversity of Crimean Congo Hemorrhagic Fever Virus Strains from Iran
Chinikar, Sadegh; Bouzari, Saeid; Shokrgozar, Mohammad Ali; Mostafavi, Ehsan; Jalali, Tahmineh; Khakifirouz, Sahar; Nowotny, Norbert; Fooks, Anthony R.; Shah-Hosseini, Nariman
2016-01-01
Background: Crimean Congo hemorrhagic fever virus (CCHFV) is a member of the Bunyaviridae family and Nairovirus genus. It has a negative-sense, single stranded RNA genome approximately 19.2 kb, containing the Small, Medium, and Large segments. CCHFVs are relatively divergent in their genome sequence and grouped in seven distinct clades based on S-segment sequence analysis and six clades based on M-segment sequences. Our aim was to obtain new insights into the molecular epidemiology of CCHFV in Iran. Methods: We analyzed partial and complete nucleotide sequences of the S and M segments derived from 50 Iranian patients. The extracted RNA was amplified using one-step RT-PCR and then sequenced. The sequences were analyzed using Mega5 software. Results: Phylogenetic analysis of partial S segment sequences demonstrated that clade IV-(Asia 1), clade IV-(Asia 2) and clade V-(Europe) accounted for 80 %, 4 % and 14 % of the circulating genomic variants of CCHFV in Iran respectively. However, one of the Iranian strains (Iran-Kerman/22) was associated with none of other sequences and formed a new clade (VII). The phylogenetic analysis of complete S-segment nucleotide sequences from selected Iranian CCHFV strains complemented with representative strains from GenBank revealed similar topology as partial sequences with eight major clusters. A partial M segment phylogeny positioned the Iranian strains in either association with clade III (Asia-Africa) or clade V (Europe). Conclusion: The phylogenetic analysis revealed subtle links between distant geographic locations, which we propose might originate either from international livestock trade or from long-distance carriage of CCHFV by infected ticks via bird migration. PMID:27308271
Whole genome sequence and comparative analysis of Borrelia burgdorferi MM1
Jabbari, Neda; Reddy, Panga Jaipal; Hood, Leroy
2018-01-01
Lyme disease is caused by spirochaetes of the Borrelia burgdorferi sensu lato genospecies. Complete genome assemblies are available for fewer than ten strains of Borrelia burgdorferi sensu stricto, the primary cause of Lyme disease in North America. MM1 is a sensu stricto strain originally isolated in the midwestern United States. Aside from a small number of genes, the complete genome sequence of this strain has not been reported. Here we present the complete genome sequence of MM1 in relation to other sensu stricto strains and in terms of its Multi Locus Sequence Typing. Our results indicate that MM1 is a new sequence type which contains a conserved main chromosome and 15 plasmids. Our results include the first contiguous 28.5 kb assembly of lp28-8, a linear plasmid carrying the vls antigenic variation system, from a Borrelia burgdorferi sensu stricto strain. PMID:29889842
Quantitative phenotyping via deep barcode sequencing.
Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey
2009-10-01
Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale.
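The core idea of barcode sequencing, counting reads per strain barcode and comparing abundances between conditions, can be sketched in a few lines of Python. This is only an illustrative sketch and not the authors' pipeline: the function names, the exact-match rule, the 4-base toy barcodes, the fixed barcode offset and the pseudocount are all assumptions.

```python
import math
from collections import Counter


def count_barcodes(reads, barcode_to_strain, start=0, length=4):
    """Count reads per strain by exact match of the barcode at a fixed offset.

    Reads whose barcode is not in barcode_to_strain are ignored.
    """
    counts = Counter()
    for read in reads:
        strain = barcode_to_strain.get(read[start:start + length])
        if strain is not None:
            counts[strain] += 1
    return counts


def log2_ratios(control, treatment, pseudocount=1.0):
    """log2(treatment / control) of depth-normalised counts per strain.

    Negative values mean the strain is relatively depleted under treatment.
    """
    c_total = sum(control.values())
    t_total = sum(treatment.values())
    if c_total == 0 or t_total == 0:
        raise ValueError("both samples need at least one assigned read")
    ratios = {}
    for strain in sorted(set(control) | set(treatment)):
        c = (control.get(strain, 0) + pseudocount) / c_total
        t = (treatment.get(strain, 0) + pseudocount) / t_total
        ratios[strain] = math.log2(t / c)
    return ratios


# Toy data (hypothetical barcodes and reads)
barcodes = {"ACGT": "strainA", "TTGA": "strainB"}
control = count_barcodes(["ACGTAA", "ACGTCC", "TTGAAA", "TTGACC", "GGGGAA"], barcodes)
treatment = count_barcodes(["ACGTAA", "ACGTCC", "ACGTGG", "TTGAAA"], barcodes)
print(log2_ratios(control, treatment))
```

Exact matching is the simplest possible assignment rule; since the abstract reports that a sizeable fraction of barcodes differed from expectation, a real analysis would need a validated barcode list and probably some mismatch tolerance.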
Sperber, Göran; Lövgren, Anders; Eriksson, Nils-Einar; Benachenhou, Farid; Blomberg, Jonas
2009-01-01
Background The rapid accumulation of genomic information in databases necessitates rapid and specific algorithms for extracting biologically meaningful information. More or less complete retroviral sequences, also called proviral or endogenous retroviral sequences (ERVs), constitute at least 5% of vertebrate genomes. After infecting the host, these retroviruses have integrated in germ line cells, and have then been carried in genomes for at least several 100 million years. A better understanding of structure and function of these sequences can have profound biological and medical consequences. Methods RetroTector© (ReTe) is a platform-independent Java program for identification and characterization of proviral sequences in vertebrate genomes. The full ReTe requires a local installation with a MySQL database. Although not overly complicated, the installation may take some time. A "light" version of ReTe (RetroTector online; ROL), which does not require specific installation procedures, is provided via the World Wide Web. Results ROL was implemented under the Batchelor web interface (A Lövgren et al). It allows submission of sequences (5 to 10 000 kilobases) by GenBank accession number, file upload, or FASTA cut-and-paste. Up to ten submissions can be done simultaneously, allowing batch analysis of <= 100 Megabases. Jobs are shown in an IP-number specific list. Results are text files, and can be viewed with the program RetroTectorViewer.jar (at the same site), which has the full graphical capabilities of the basic ReTe program. A detailed analysis of any retroviral sequences found in the submitted sequence is graphically presented, exportable in standard formats. With the current server, a complete analysis of a 1 Megabase sequence is complete in 10 minutes. It is possible to mask nonretroviral repetitive sequences in the submitted sequence, using host genome specific "brooms", which increase specificity. Discussion Proviral sequences can be hard to recognize, especially if the integration occurred many million years ago. Precise delineation of LTR, gag, pro, pol and env can be difficult, requiring manual work. ROL is a way of simplifying these tasks. Conclusion ROL provides 1. annotation and presentation of known retroviral sequences, 2. detection of proviral chains in unknown genomic sequences, with up to 100 Mbase per submission. PMID:19534753
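The submission limits stated in this abstract (sequences of 5 to 10 000 kilobases, up to ten simultaneous submissions, at most 100 Megabases per batch) can be checked locally before uploading. The Python sketch below is a hypothetical helper, not part of RetroTector or ROL; the function and constant names are invented for illustration, and it assumes 1 kilobase = 1,000 bases.

```python
# Limits as stated in the abstract, assuming 1 kilobase = 1,000 bases
MIN_BP = 5_000            # 5 kilobases
MAX_BP = 10_000_000       # 10 000 kilobases
MAX_SUBMISSIONS = 10
MAX_BATCH_BP = 100_000_000  # 100 Megabases


def check_batch(lengths_bp):
    """Return a list of human-readable problems; an empty list means the batch fits."""
    problems = []
    if len(lengths_bp) > MAX_SUBMISSIONS:
        problems.append(
            f"{len(lengths_bp)} submissions exceed the limit of {MAX_SUBMISSIONS}"
        )
    for index, length in enumerate(lengths_bp, start=1):
        if not MIN_BP <= length <= MAX_BP:
            problems.append(
                f"sequence {index}: {length} bp is outside {MIN_BP}-{MAX_BP} bp"
            )
    if sum(lengths_bp) > MAX_BATCH_BP:
        problems.append(f"batch total {sum(lengths_bp)} bp exceeds {MAX_BATCH_BP} bp")
    return problems


print(check_batch([1_000_000, 250_000]))  # fits the stated limits
print(check_batch([1_000]))               # too short
```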
Sperber, Göran; Lövgren, Anders; Eriksson, Nils-Einar; Benachenhou, Farid; Blomberg, Jonas
2009-06-16
The rapid accumulation of genomic information in databases necessitates rapid and specific algorithms for extracting biologically meaningful information. More or less complete retroviral sequences, also called proviral or endogenous retroviral sequences (ERVs), constitute at least 5% of vertebrate genomes. After infecting the host, these retroviruses have integrated in germ line cells, and have then been carried in genomes for at least several 100 million years. A better understanding of structure and function of these sequences can have profound biological and medical consequences. RetroTector (ReTe) is a platform-independent Java program for identification and characterization of proviral sequences in vertebrate genomes. The full ReTe requires a local installation with a MySQL database. Although not overly complicated, the installation may take some time. A "light" version of ReTe (RetroTector online; ROL), which does not require specific installation procedures, is provided via the World Wide Web. ROL http://www.fysiologi.neuro.uu.se/jbgs/ was implemented under the Batchelor web interface (A Lövgren et al). It allows submission of sequences (5 to 10,000 kilobases) by GenBank accession number, file upload, or FASTA cut-and-paste. Up to ten submissions can be done simultaneously, allowing batch analysis of
Kurata, Atsushi; Hirose, Yuu; Misawa, Naomi; Wakazuki, Sachiko; Kishimoto, Noriaki; Kobayashi, Tohru
2016-03-10
Here we report the complete genome sequence of Microcella alkaliphila JAM-AC0309, which was newly isolated from the deep subseafloor core sediment from offshore of the Shimokita Peninsula of Japan. An array of genes related to utilization of xylan in this bacterium was identified by whole genome analysis. Copyright © 2016 Elsevier B.V. All rights reserved.
Khanna, Namita; Ghosh, Ananta Kumar; Huntemann, Marcel; Deshpande, Shweta; Han, James; Chen, Amy; Kyrpides, Nikos; Mavrommatis, Kostas; Szeto, Ernest; Markowitz, Victor; Ivanova, Natalia; Pagani, Ioanna; Pati, Amrita; Pitluck, Sam; Nolan, Matt; Woyke, Tanja; Teshima, Hazuki; Chertkov, Olga; Daligault, Hajnalka; Davenport, Karen; Gu, Wei; Munk, Christine; Zhang, Xiaojing; Bruce, David; Detter, Chris; Xu, Yan; Quintana, Beverly; Reitenga, Krista; Kunde, Yulia; Green, Lance; Erkkila, Tracy; Han, Cliff; Brambilla, Evelyne-Marie; Lang, Elke; Klenk, Hans-Peter; Goodwin, Lynne; Chain, Patrick; Das, Debabrata
2013-12-20
Enterobacter sp. IIT-BT 08 belongs to Phylum: Proteobacteria, Class: Gammaproteobacteria, Order: Enterobacteriales, Family: Enterobacteriaceae. The organism was isolated from the leaves of a local plant near the Kharagpur railway station, Kharagpur, West Bengal, India. It has been extensively studied for fermentative hydrogen production because of its high hydrogen yield. For further enhancement of hydrogen production by strain development, complete genome sequence analysis was carried out. Sequence analysis revealed that the genome was linear, 4.67 Mbp long and had a GC content of 56.01%. The genome encodes 4,393 protein-coding genes and 179 RNA genes. Additionally, a putative pathway of hydrogen production was suggested based on the presence of the formate hydrogen lyase complex and other related genes identified in the genome. Thus, in the present study we describe the specific properties of the organism and the generation, annotation and analysis of its genome sequence, and discuss the putative pathway of hydrogen production by this organism.
Spencermartinsiella europaea gen. nov., sp. nov., a new member of the family Trichomonascaceae
USDA-ARS's Scientific Manuscript database
Ten strains of a novel heterothallic yeast species were isolated from rotten wood collected at different locations in Hungary. Analysis of gene sequences for the D1/D2 domain of the large subunit ribosomal RNA, as well as analysis of concatenated gene sequences for the nearly complete nuclear large...
Yang, Heng; Lv, Minna; Sun, Minfei; Lin, Liqin; Kou, Meilin; Gao, Lin; Liao, Defang; Xiong, Heli; He, Yuwen; Li, Huachun
2016-01-01
Bluetongue virus (BTV) mainly infects sheep but can be transmitted to other domestic and wild ruminants, resulting in a considerable financial burden and trade restriction. Our understanding of the origin, movement, and distribution of BTV has been hindered by the fact that this virus has a segmented genome with the possibility of reassortment, the existence of 27 identified serotypes, and a lack of complete sequences of viruses isolated from different parts of the world. BTV serotype 7 is one of the prevalent BTV serotypes in Asia. Nonetheless, no complete genomic sequence of an Asian isolate of this serotype is available. In an effort to understand the molecular epidemiology of BTV infection in China, for the first time, we report here the complete genome sequence of a BTV serotype 7 strain, GDST008, which was isolated in 2014 in China. This sequence also represents the first complete genome sequence of a BTV serotype 7 from Asia and the third one in the world. Sequence analysis suggests that GDST008 consists of segments from BTV viruses of African lineage as well as those from China. Together, these results improve our understanding of the origin, emergence/re-emergence, and movement of BTV and thus can be applied in the development of vaccines and diagnostics.
Kaján, Győző L; Kajon, Adriana E; Pinto, Alexis Castillo; Bartha, Dániel; Arnberg, Niklas
2017-10-15
A novel human adenovirus was isolated from a pediatric case of acute respiratory disease in Panama City, Panama in 2011. The clinical isolate was initially identified as an intertypic recombinant based on hexon and fiber gene sequencing. Based on the analysis of its complete genome sequence, the novel complex recombinant Human mastadenovirus D (HAdV-D) strain was classified into a new HAdV type: HAdV-84, and it was designated Adenovirus D human/PAN/P309886/2011/84[P43H17F84]. HAdV-D types usually possess an ocular or gastrointestinal tropism, and respiratory association is rarely reported. The virus has a novel fiber type, most closely related to, but still clearly distant from, that of HAdV-36. The predicted fiber is hypothesised to bind sialic acid with lower affinity compared to HAdV-37. Bioinformatic analysis of the complete genomic sequence of HAdV-84 revealed multiple homologous recombination events and provided deeper insight into HAdV evolution. Copyright © 2017 Elsevier B.V. All rights reserved.
The complete mitochondrial genome of the big-belly seahorse, Hippocampus abdominalis (Lesson 1827).
Wang, Lei; Chen, Zaizhong; Leng, Xiangjun; Gao, Jianzhong; Chen, Xiaowu; Li, Zhongpu; Sun, Peiying; Zhao, Yuming
2016-11-01
In this study, the complete mitogenome sequence of the big-belly seahorse, Hippocampus abdominalis (Lesson, 1827) (Syngnathiformes: Syngnathidae), was determined by next-generation sequencing. The assembled mitogenome is 16,521 bp in length and includes 13 protein-coding genes, 22 transfer RNA genes, and 2 ribosomal RNA genes. The overall base composition is 31.1% A, 23.6% C, 16.0% G, and 29.3% T, and the mitogenome shows 87% identity to that of the tiger tail seahorse, Hippocampus comes. The complete mitogenome of the big-belly seahorse provides essential DNA molecular data for further phylogeographic and evolutionary analysis of the seahorse family.
[Complete genome sequencing and sequence analysis of BCG Tice].
Wang, Zhiming; Pan, Yuanlong; Wu, Jun; Zhu, Baoli
2012-10-04
The objective of this study is to obtain the complete genome sequence of Bacille Calmette-Guérin Tice (BCG Tice), in order to provide more information about the molecular biology of BCG Tice and to design more reasonable vaccines to prevent tuberculosis. We assembled the high-throughput sequencing data with SOAPdenovo software, obtaining many contigs and scaffolds. Many sequence gaps and physical gaps remained as a result of regional low coverage and low quality. We designed primers at the ends of contigs and performed PCR amplification in order to link these contigs and scaffolds. By using various enzymes for PCR amplification, adjusting PCR reaction conditions, and constructing clones for sequencing, all the gaps were closed. We obtained the complete genome sequence of BCG Tice and submitted it to GenBank at the National Center for Biotechnology Information (NCBI). The genome of BCG Tice is 4,334,064 base pairs in length, with a GC content of 65.65%. The problems and strategies encountered during the finishing step of BCG Tice sequencing are described here, with the hope of offering some experience to those involved in the finishing step of genome sequencing. The microarray data were verified by our results.
Barcellos, Leonardo H; Palmeiro, Marina Lobato; Naconecy, Marcos M; Geremia, Tomás; Cervieri, André; Shinkai, Rosemary S
2018-05-17
To compare the effects of different screw-tightening sequences and torque applications on stresses in implant-supported fixed complete dentures supported by five abutments. Strain gauges fixed to the abutments were used to test the sequences 2-4-3-1-5; 1-2-3-4-5; 3-2-4-1-5; and 2-5-4-1-3 with direct 10-Ncm torque or progressive torque (5 + 10 Ncm). Data were analyzed using analysis of variance and standardized effect size. No effects of tightening sequence or torque application were found except for the sequence 3-2-4-1-5 and some small to moderate effect sizes. Screw-tightening sequences and torque application modes have only a marginal effect on residual stresses.
Liu, Guo-Hua; Li, Chun; Li, Jia-Yuan; Zhou, Dong-Hui; Xiong, Rong-Chuan; Lin, Rui-Qing; Zou, Feng-Cai; Zhu, Xing-Quan
2012-01-01
Sparganosis, caused by the plerocercoid larvae of members of the genus Spirometra, can cause significant public health problems and considerable economic losses. In the present study, the complete mitochondrial DNA (mtDNA) sequence of Spirometra erinaceieuropaei from China was determined, characterized and compared with that of S. erinaceieuropaei from Japan. The gene arrangement in the mt genome sequences of S. erinaceieuropaei from China and Japan is identical. The identity of the mt genomes was 99.1% between S. erinaceieuropaei from China and Japan, and the complete mtDNA sequence of S. erinaceieuropaei from China is slightly shorter (2 bp) than that from Japan. Phylogenetic analysis of S. erinaceieuropaei with other representative cestodes using two different computational algorithms [Bayesian inference (BI) and maximum likelihood (ML)], based on concatenated amino acid sequences of 12 protein-coding genes, revealed that S. erinaceieuropaei is closely related to Diphyllobothrium spp., supporting classification based on morphological features. The present study determined the complete mtDNA sequence of S. erinaceieuropaei from China, which provides novel genetic markers for studying the population genetics and molecular epidemiology of S. erinaceieuropaei in humans and animals. PMID:22553464
Teng, Y; Liu, H; Lv, J Q; Fan, W H; Zhang, Q Y; Qin, Q W
2007-01-01
The complete genome of spring viraemia of carp virus (SVCV) strain A-1 isolated from cultured common carp (Cyprinus carpio) in China was sequenced and characterized. Reverse transcription-polymerase chain reaction (RT-PCR) derived clones were constructed and the DNA was sequenced. It showed that the entire genome of SVCV A-1 consists of 11,100 nucleotide base pairs, the predicted size of the viral RNA of rhabdoviruses. However, the additional insertions in bp 4633-4676 and bp 4684-4724 of SVCV A-1 were different from the other two published SVCV complete genomes. Five open reading frames (ORFs) of SVCV A-1 were identified and further confirmed by RT-PCR and DNA sequencing of their respective RT-PCR products. The 5 structural proteins encoded by the viral RNA were ordered 3'-N-P-M-G-L-5'. This is the first report of a complete genome sequence of SVCV isolated from cultured carp in China. Phylogenetic analysis indicates that SVCV A-1 is closely related to the members of the genus Vesiculovirus, family Rhabdoviridae.
Ali, M. Rahmat; Alam, A. S. M. Rubayet Ul; Amin, M. Al; Ullah, Huzzat; Siddique, Mohammad Anwar; Momtaz, Samina; Sultana, Munawar
2017-01-01
The complete genome sequence of foot-and-mouth disease virus (FMDV) serotype Asia1 isolated from Bangladesh is reported here. Genome analysis revealed amino acid substitutions in the VP1 antigenic region and deletions in both the 5′ and 3′ untranslated regions (UTRs) compared to the genome of the existing vaccine strain (GenBank accession no. AY304994). PMID:29074654
Complete genome sequence of Corynebacterium glutamicum CP, a Chinese l-leucine producing strain.
Gui, Yongli; Ma, Yuechao; Xu, Qingyang; Zhang, Chenglin; Xie, Xixian; Chen, Ning
2016-02-20
Here, we report the complete genome sequence of Corynebacterium glutamicum CP, an industrial l-leucine producing strain in China. The whole genome consists of a circular chromosome and a plasmid. The comparative genomics analysis shows that there are many mutations in the key enzyme coding genes relevant to l-leucine biosynthesis compared to C. glutamicum ATCC 13032. Copyright © 2016 Elsevier B.V. All rights reserved.
Near-Complete Genome Sequence of a Novel Single-Stranded RNA Virus Discovered in Indoor Air
2018-01-01
Viral metagenomic analysis of heating, ventilation, and air conditioning (HVAC) filters recovered the near-complete genome sequence of a novel virus, named HVAC-associated RNA virus 1 (HVAC-RV1). The HVAC-RV1 genome is most similar to those of picorna-like viruses identified in arthropods but encodes a small domain observed only in negative-sense single-stranded RNA viruses. PMID:29567746
Shittu, Ismaila; Sharma, Poonam; Volkening, Jeremy D.; Solomon, Ponman; Sulaiman, Lanre K.; Joannis, Tony M.; Williams-Coplin, Dawn; Miller, Patti J.; Dimitrov, Kiril M.
2016-01-01
The first complete genome sequence of a strain of Newcastle disease virus (NDV) from genotype XIV is reported here. Strain duck/Nigeria/NG-695/KG.LOM.11-16/2009 was isolated from an apparently healthy domestic duck from a live bird market in Kogi State, Nigeria, in 2009. This strain is classified as a member of subgenotype XIVb of class II. PMID:26823576
Multifractal analysis of 2001 Mw 7.7 Bhuj earthquake sequence in Gujarat, Western India
NASA Astrophysics Data System (ADS)
Aggarwal, Sandeep Kumar; Pastén, Denisse; Khan, Prosanta Kumar
2017-12-01
The 2001 Mw 7.7 Bhuj mainshock seismic sequence in the Kachchh area, occurring during 2001 to 2012, has been analyzed using mono-fractal and multi-fractal dimension spectrum analysis techniques. This region was characterized by frequent moderate shocks of Mw ≥ 5.0 for more than a decade after the 2001 Bhuj earthquake. The present study is therefore important for precursory analysis using this sequence. The selected long sequence has been investigated for the first time for completeness magnitude Mc 3.0 using the maximum curvature method. Multi-fractal Dq spectrum (Dq ∼ q) analysis was carried out using an effective window length of 200 earthquakes with a moving window of 20 events overlapped by 180 events. The robustness of the analysis has been tested by applying a magnitude completeness correction term of 0.2 to Mc 3.0, giving Mc 3.2, and we have tested the error in the calculation of Dq for each magnitude threshold. In addition, the stability of the analysis has been investigated down to the minimum magnitude of Mw ≥ 2.6 in the sequence. The analysis shows that the multi-fractal dimension spectrum Dq decreases with increasing clustering of events with time before a moderate magnitude earthquake in the sequence, which alternatively accounts for non-randomness in the spatial distribution of epicenters and its self-organized criticality. Similar behavior is ubiquitous elsewhere around the globe, and warns of the proximity of a damaging seismic event in an area.
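The windowing scheme quoted in this abstract (200 events per window, advanced by 20 events, so that consecutive windows share 180 events) can be illustrated with a minimal sketch. The function name `sliding_windows` and the catalogue size of 260 events are assumptions chosen for illustration; the sketch only does the index bookkeeping and does not compute Dq.

```python
def sliding_windows(n_events, window=200, step=20):
    """Return (start, stop) index pairs for fixed-length event windows.

    Hypothetical helper: consecutive windows overlap by window - step events.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [(s, s + window) for s in range(0, n_events - window + 1, step)]


# A toy catalogue of 260 events yields four windows of 200 events each.
windows = sliding_windows(260)
print(windows)  # [(0, 200), (20, 220), (40, 240), (60, 260)]
```

Each window would then be passed to whatever Dq estimator is in use; a catalogue shorter than one window yields no windows at all.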
Study of mitochondria D-loop gene to detect the heterogeneity of gemak in Turnicidae family
NASA Astrophysics Data System (ADS)
Setiati, N.; Partaya
2018-03-01
As part of biodiversity, birds of the family Turnicidae should be protected from extinction and from a decline in heterogeneity. One way to provide a strategic basis for germplasm (plasma nutfah) conservation is the study of genetic heterogeneity. The aim of this research is to analyze, at the molecular level, the D-loop gene of mitochondrial DNA from gemak birds of the family Turnicidae, so that the genetic heterogeneity of gemak can be assessed from the D-loop gene sequence. Both types of gemak are still fairly easy to collect, since they can be found in rice fields after harvest, particularly gemak loreng (Turnix sylvatica); gemak tegalan (Turnix suscitator), meanwhile, is becoming difficult to find. Based on the DNA quantification standard, most of the gemak blood samples in this research are classed as pure (values ranging from 1.63 to 1.90) and are suitable for PCR analysis. The sequencing analysis did not recover the complete nucleotide sequence. However, it indicates base sequence polymorphism in the D-loop gene. The D-loop gene may identify genetic heterogeneity of gemak birds of the family Turnicidae, but further analysis with the PCR-RFLP technique is necessary. The complete nucleotide sequence can be obtained and is easy to detect after being cut with a restriction enzyme.
Graw, J; Liebstein, A; Pietrowski, D; Schmitt-John, T; Werner, T
1993-12-22
The murine genes, gamma B-cry and gamma C-cry, encoding the gamma B- and gamma C-crystallins, were isolated from a genomic DNA library. The complete nucleotide (nt) sequences of both genes were determined from 661 and 711 bp, respectively, upstream from the first exon to the corresponding polyadenylation sites, comprising more than 2650 and 2890 bp, respectively. The new sequences were compared to the partial cDNA sequences available for the murine gamma B-cry and gamma C-cry, as well as to the corresponding genomic sequences from rat and man, at both the nt and predicted amino acid (aa) sequence levels. In the gamma B-cry promoter region, a canonical CCAAT-box, a TATA-box, putative NF-I and C/EBP sites were detected. An R-repeat is inserted 366 bp upstream from the transcription start point. In contrast, the gamma C-cry promoter does not contain a CCAAT-box, but some other putative binding sites for transcription factors (AP-2, UBP-1, LBP-1) were located by computer analysis. The promoter regions of all six gamma-cry from mouse, rat and human, except human psi gamma F-cry, were analyzed for common sequence elements. A complex sequence element of about 70-80 bp was found in the proximal promoter, which contains a gamma-cry-specific and almost invariant sequence (crygpel) of 14 nt, and ends with the also invariant TATA-box. Within the complex sequence element, a minimum of three further features specific for the gamma A-, gamma B- and gamma D/E/F-cry genes can be defined, at least two of which were recently shown to be functional. In addition to these four sequence elements, a subtype-specific structure of inverted repeats with different-sized spacers can be deduced from the multiple sequence alignment. A phylogenetic analysis based on the promoter region, as well as the complete exon 3 of all gamma-cry from mouse, rat and man, suggests separation of only five gamma-cry subtypes (gamma A-, gamma B-, gamma C-, gamma D- and gamma E/F-cry) prior to species separation.
Liu, Maoyan; Liu, Xiangning; Li, Xun; Zhang, Deyong; Dai, Liangyin; Tang, Qianjun
2016-03-01
The genome sequence of pepper vein yellows virus (PeVYV) (PeVYV-HN, accession number KP326573), isolated from pepper plants (Capsicum annuum L.) grown at the Hunan Vegetables Institute (Changsha, Hunan, China), was determined by deep sequencing of small RNAs. The PeVYV-HN genome consists of 6244 nucleotides, contains six open reading frames (ORFs), and is similar to that of an isolate (AB594828) from Japan. Its genomic organization is similar to that of members of the genus Polerovirus. Sequence analysis revealed that PeVYV-HN shared 92% sequence identity with the Japanese PeVYV genome at both the nucleotide and amino acid levels. Evolutionary analysis based on the coat protein (CP), movement protein (MP), and RNA-dependent RNA polymerase (RdRP) showed that PeVYV could be divided into two major lineages corresponding to their geographical origins. The Asian isolates have a higher population expansion frequency than the African isolates. Negative selection and genetic drift (founder effect) were found to be the potential drivers of the molecular evolution of PeVYV. Moreover, recombination was not the distinct cause of PeVYV evolution. This is the first report of a complete genomic sequence of PeVYV in China.
Konami, Y; Yamamoto, K; Osawa, T; Irimura, T
1995-04-01
The complete amino acid sequence of a lactose-binding Cytisus sessilifolius anti-H(O) lectin II (CSA-II) was determined using a protein sequencer. After digestion of CSA-II with endoproteinase Lys-C or Asp-N, the resulting peptides were purified by reversed-phase high performance liquid chromatography (HPLC) and then subjected to sequence analysis. Comparison of the complete amino acid sequence of CSA-II with the sequences of other leguminous seed lectins revealed regions of extensive homology. The amino acid sequence of a putative carbohydrate-binding domain of CSA-II was found to be similar to those of several anti-H(O) leguminous lectins, especially to that of the L-fucose-binding Ulex europaeus lectin I (UEA-I).
Complete sequence analysis reveals two distinct poleroviruses infecting cucurbits in China.
Xiang, Hai-ying; Shang, Qiao-xia; Han, Cheng-gui; Li, Da-wei; Yu, Jia-lin
2008-01-01
The complete RNA genomes of a Chinese isolate of cucurbit aphid-borne yellows virus (CABYV-CHN) and a new polerovirus tentatively referred to as melon aphid-borne yellows virus (MABYV) were determined. The entire genome of CABYV-CHN shared 89.0% nucleotide sequence identity with the French CABYV isolate. In contrast, nucleotide sequence identities between MABYV and CABYV and other poleroviruses were in the range of 50.7-74.2%, with amino acid sequence identities ranging from 24.8 to 82.9% for individual gene products. We propose that CABYV-CHN is a strain of CABYV and that MABYV is a member of a tentative distinct species within the genus Polerovirus.
Complexity: an internet resource for analysis of DNA sequence complexity
Orlov, Y. L.; Potapov, V. N.
2004-01-01
The search for DNA regions with low complexity is one of the pivotal tasks of modern structural analysis of complete genomes. The low complexity may be preconditioned by strong inequality in nucleotide content (biased composition), by tandem or dispersed repeats or by palindrome-hairpin structures, as well as by a combination of all these factors. Several numerical measures of textual complexity, including combinatorial and linguistic ones, together with complexity estimation using a modified Lempel–Ziv algorithm, have been implemented in a software tool called ‘Complexity’ (http://wwwmgs.bionet.nsc.ru/mgs/programs/low_complexity/). The software enables a user to search for low-complexity regions in long sequences, e.g. complete bacterial genomes or eukaryotic chromosomes. In addition, it estimates the complexity of groups of aligned sequences. PMID:15215465
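As a rough illustration of complexity profiling along a sequence, the sketch below counts phrases in the classic Lempel-Ziv (1976) parsing inside sliding windows. This is the textbook measure, not the modified algorithm implemented in the 'Complexity' tool, whose details are not given in the abstract; the function names, window size and example sequence are assumptions.

```python
def lz76_phrase_count(seq):
    """Number of phrases in the Lempel-Ziv (1976) parsing of seq."""
    i, count, n = 0, 0, len(seq)
    while i < n:
        length = 1
        # Grow the phrase while it also occurs starting at an earlier position
        # (the earlier copy may overlap the phrase being built).
        while i + length <= n and seq.find(seq[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count


def windowed_complexity(seq, window, step):
    """(start, phrase count) for each window; low counts flag low complexity."""
    return [(s, lz76_phrase_count(seq[s:s + window]))
            for s in range(0, len(seq) - window + 1, step)]


# Assumed toy input: a mixed stretch followed by a homopolymer run.
profile = windowed_complexity("ACGTTGCATCAGGCTA" + "A" * 16, window=16, step=16)
print(profile)  # the second window (the poly-A run) scores 2
```

Windows whose phrase count falls well below that of their neighbours are candidates for biased composition or repeats; a real analysis would normalise the count by window length and alphabet size.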
Liu, Shikai; Zhang, Jiaren; Yao, Jun; Liu, Zhanjiang
2016-05-01
The complete mitochondrial genome of the armored catfish, Hypostomus plecostomus, was determined by next generation sequencing of genomic DNA without prior sample processing or primer design. Bioinformatics analysis yielded the entire mitochondrial genome sequence, with a length of 16,523 bp. The H. plecostomus mitochondrial genome consists of 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and 1 control region, showing the typical circular structure of mitochondrial genomes in other vertebrates. The whole genome base composition was estimated to be 31.8% A, 27.0% T, 14.6% G, and 26.6% C, with an A/T bias of 58.8%. This work provides the H. plecostomus mitochondrial genome sequence, which should be valuable for species identification, phylogenetic analysis and conservation genetics studies in catfishes.
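The base composition and A/T figure reported in this abstract are simple counts, which the following minimal sketch reproduces. The helper names `base_composition` and `at_content` are assumptions, and the short test sequence is invented; only the four percentages come from the abstract.

```python
from collections import Counter


def base_composition(seq):
    """Percentage of A, C, G and T among the unambiguous bases of seq."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def at_content(composition):
    """A + T percentage from a composition dictionary."""
    return composition["A"] + composition["T"]


# Percentages as reported for H. plecostomus in the abstract above.
reported = {"A": 31.8, "T": 27.0, "G": 14.6, "C": 26.6}
print(round(at_content(reported), 1))     # 58.8
print(round(sum(reported.values()), 1))   # 100.0
```

The same two functions can be applied to any assembled mitogenome string to check a reported composition against the sequence itself.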
Quantitative phenotyping via deep barcode sequencing
Smith, Andrew M.; Heisler, Lawrence E.; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J.; Chee, Mark; Roth, Frederick P.; Giaever, Guri; Nislow, Corey
2009-01-01
Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or “Bar-seq,” outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that ∼20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene–environment interactions on a genome-wide scale. PMID:19622793
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schulman, Al
2009-08-09
Three subfamilies of grasses, the Ehrhartoideae (rice), the Panicoideae (maize, sorghum, sugar cane and millet), and the Pooideae (wheat, barley and cool season forage grasses) provide the basis of human nutrition and are poised to become major sources of renewable energy. Here we describe the complete genome sequence of the wild grass Brachypodium distachyon (Brachypodium), the first member of the Pooideae subfamily to be completely sequenced. Comparison of the Brachypodium, rice and sorghum genomes reveals a precise sequence-based history of genome evolution across a broad diversity of the grass family and identifies nested insertions of whole chromosomes into centromeric regions as a predominant mechanism driving chromosome evolution in the grasses. The relatively compact genome of Brachypodium is maintained by a balance of retroelement replication and loss. The complete genome sequence of Brachypodium, coupled to its exceptional promise as a model system for grass research, will support the development of new energy and food crops.
Kotak, Malini; Isanapong, Jantiya; Goodwin, Lynne A.; ...
2015-03-05
The Opitutaceae bacterium strain TAV5, a member of the phylum Verrucomicrobia, was isolated from the wood-feeding termite hindgut. Here, we report its complete genome sequence, which contains a chromosome and a plasmid of 7,317,842 bp and 99,831 bp, respectively. Genomic analysis reveals genes for methylotrophy, lignocellulose degradation, and ammonia and sulfate assimilation.
Watanabe, Satoru; Shiwa, Yuh; Itaya, Mitsuhiro; Yoshikawa, Hirofumi
2012-12-01
Genome synthesis of existing or designed genomes is made feasible by the first successful cloning of a cyanobacterium, Synechocystis PCC6803, in Gram-positive, endospore-forming Bacillus subtilis. Whole-genome sequence analysis of the isolate and parental B. subtilis strains provides clues for identifying single nucleotide polymorphisms (SNPs) in the 2 complete bacterial genomes in one cell.
Ouwerkerk, Janneke P.; Schaap, Peter J.; Ritari, Jarmo; Paulin, Lars; Belzer, Clara
2017-01-01
Akkermansia glycaniphila is a novel Akkermansia species that was isolated from the intestine of the reticulated python and shares the capacity to degrade mucin with the human strain Akkermansia muciniphila MucT. Here, we report the complete genome sequence of strain PytT of 3,074,121 bp. The genomic analysis reveals genes for mucin degradation and aerobic respiration. PMID:28057747
DNA Translator and Aligner: HyperCard utilities to aid phylogenetic analysis of molecules.
Eernisse, D J
1992-04-01
DNA Translator and Aligner are molecular phylogenetics HyperCard stacks for Macintosh computers. They manipulate sequence data to provide graphical gene mapping, conversions, translations and manual multiple-sequence alignment editing. DNA Translator is able to convert documented GenBank or EMBL sequences into linearized, rescalable gene maps whose gene sequences are extractable by clicking on the corresponding map button or by selection from a scrolling list. Provided gene maps, complete with extractable sequences, consist of nine metazoan, one yeast, and one ciliate mitochondrial DNAs and three green plant chloroplast DNAs. Single or multiple sequences can be manipulated to aid in phylogenetic analysis. Sequences can be translated between nucleic acids and proteins in either direction with flexible support of alternate genetic codes and ambiguous nucleotide symbols. Multiple aligned sequence output from diverse sources can be converted to Nexus, Hennig86 or PHYLIP format for subsequent phylogenetic analysis. Input or output alignments can be examined with Aligner, a convenient accessory stack included in the DNA Translator package. Aligner is an editor for the manual alignment of up to 100 sequences that toggles between display of matched characters and normal unmatched sequences. DNA Translator also generates graphic displays of amino acid coding and codon usage frequency relative to all other, or only synonymous, codons for approximately 70 select organism-organelle combinations. Codon usage data are compatible with spreadsheet or UWGCG formats for incorporation of additional molecules of interest. The complete package is available via anonymous ftp and is free for non-commercial uses.
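Translation under alternate genetic codes, as mentioned in this abstract, can be sketched as a codon table with per-code overrides. The code below is not derived from the HyperCard stacks; it is an illustrative sketch that builds the standard code and the vertebrate mitochondrial code (which differs at AGA, AGG, ATA and TGA), and it makes the simplifying assumption that any codon containing an ambiguous symbol is reported as `X`.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order (standard genetic code).
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(codon): aa
                 for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)}
# Vertebrate mitochondrial code: four codons are reassigned.
VERTEBRATE_MT_CODE = {**STANDARD_CODE,
                      "AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}


def translate(dna, code=STANDARD_CODE):
    """Translate in frame 1; incomplete trailing codons are dropped."""
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    # Codons with ambiguous symbols are not resolved here, only flagged as X.
    return "".join(code.get(codon, "X") for codon in codons)


print(translate("ATGTGAATA"))                      # M*I
print(translate("ATGTGAATA", VERTEBRATE_MT_CODE))  # MWM
```

Keeping each alternate code as a small set of overrides on the standard table makes the differences between codes explicit and easy to audit.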
The complete mitochondrial genome of Conus tulipa (Neogastropoda: Conidae).
Chen, Po-Wei; Hsiao, Sheng-Tai; Huang, Chih-Wei; Chen, Kao-Sung; Tseng, Chen-Te; Wu, Wen-Lung; Hwang, Deng-Fwu
2016-07-01
The complete mitogenome sequence of the cone snail Conus tulipa (Linnaeus, 1758) has been determined by next-generation sequencing. The assembled mitogenome is 16,599 bp in length, including 13 protein-coding genes, 22 transfer RNA genes and 2 ribosomal RNA genes. The overall base composition of C. tulipa is 28.7% A, 15.2% C, 18.4% G and 37.7% T. It shows 81.1% identity to the cone snail C. consors, 78.5% to C. borgesi and 77.5% to C. textile. Using the 13 protein-coding genes and 2 ribosomal RNA genes of C. tulipa from this study, together with those of 18 other closely related species, we constructed a species phylogenetic tree to verify the accuracy and utility of the newly determined mitogenome sequence. The complete mitogenome of C. tulipa provides essential DNA molecular data for further phylogeographic and evolutionary analysis of cone snails.
Shen, Kang-Ning; Chen, Ching-Hung; Hsiao, Chung-Der; Durand, Jean-Dominique
2016-09-01
In this study, the complete mitogenome sequence of a cryptic species from East Australia (Mugil sp. H) belonging to the worldwide Mugil cephalus species complex (Teleostei: Mugilidae) has been determined by next-generation sequencing. The assembled mitogenome, consisting of 16,845 bp, had the typical vertebrate mitochondrial gene arrangement, including 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes and a non-coding control region (D-loop). The D-loop is 1067 bp in length and is located between tRNA-Pro and tRNA-Phe. The overall base composition of East Australia M. cephalus is 28.4% for A, 29.3% for C, 15.4% for G and 26.9% for T. The complete mitogenome may provide essential DNA molecular data for further phylogenetic and evolutionary analysis of the flathead mullet species complex.
Shen, Kang-Ning; Yen, Ta-Chi; Chen, Ching-Hung; Li, Huei-Ying; Chen, Pei-Lung; Hsiao, Chung-Der
2016-05-01
In this study, the complete mitogenome sequence of the Northwestern Pacific 2 (NWP2) cryptic species of flathead mullet, Mugil cephalus (Teleostei: Mugilidae), has been amplified by long-range PCR and determined by next-generation sequencing. The assembled mitogenome, consisting of 16,686 bp, had the typical vertebrate mitochondrial gene arrangement, including 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes and a non-coding control region (D-loop). The D-loop was 909 bp in length and was located between tRNA-Pro and tRNA-Phe. The overall base composition of NWP2 M. cephalus was 28.4% for A, 29.8% for C, 26.5% for T and 15.3% for G. The complete mitogenome may provide essential DNA molecular data for further phylogenetic and evolutionary analysis of the flathead mullet species complex.
Complete genome analysis of jasmine virus T from Jasminum sambac in China.
Tang, Yajun; Gao, Fangluan; Yang, Zhen; Wu, Zujian; Yang, Liang
2016-07-01
The genome of a potyvirus (isolate JaVT_FZ) recovered from jasmine (Jasminum sambac L.) showing yellow ringspot symptoms in Fuzhou, China, was sequenced. JaVT_FZ is closely related to seven other potyviruses with completely sequenced genomes, with which it shares 66-70 % nucleotide and 52-56 % amino acid sequence identity. However, the coat protein (CP) gene shares 82-92 % nucleotide and 90-97 % amino acid sequence identity with those of two partially sequenced potyviruses, named jasmine potyvirus T (JaVT-jasmine) and jasmine yellow mosaic potyvirus (JaYMV-India), respectively. This suggests that JaVT_FZ, JaVT-jasmine and JaYMV-India should be regarded as members of a single potyvirus species, for which the name "Jasmine virus T" has priority.
Su, Aiguo; Geng, Jianing; Grover, Corrinne E.; Hu, Songnian; Hua, Jinping
2013-01-01
Background: Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and repeated sequences that undergo frequent intramolecular recombination. Upland cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant worldwide. Sequencing of the cotton mitochondrial (mt) genome could aid research on the evolution of plant mt genomes. Methodology/Principal Findings: We sequenced the Gossypium hirsutum mt genome using 454 technology combined with screening of a Fosmid library and sequencing of positive clones, and conducted a series of evolutionary analyses on the mt genomes of Cycas taitungensis and 24 angiosperms. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G. hirsutum mt genome is 621,884 bp in length and contains 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are found conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes, and closely related species share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes. Conclusion: The G. hirsutum mt genome possesses most of the common characters of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes, suggests that evolution of mt genomes is consistent with plant taxonomy but independent among different species. PMID:23940520
Liu, Guozheng; Cao, Dandan; Li, Shuangshuang; Su, Aiguo; Geng, Jianing; Grover, Corrinne E; Hu, Songnian; Hua, Jinping
2013-01-01
Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and repeated sequences that undergo frequent intramolecular recombination. Upland cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant worldwide. Sequencing of the cotton mitochondrial (mt) genome could aid research on the evolution of plant mt genomes. We sequenced the Gossypium hirsutum mt genome using 454 technology combined with screening of a Fosmid library and sequencing of positive clones, and conducted a series of evolutionary analyses on the mt genomes of Cycas taitungensis and 24 angiosperms. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G. hirsutum mt genome is 621,884 bp in length and contains 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are found conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes, and closely related species share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes. The G. hirsutum mt genome possesses most of the common characters of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes, suggests that evolution of mt genomes is consistent with plant taxonomy but independent among different species.
Kuhn, Jens H.; Andersen, Kristian G.; Bào, Yīmíng; Bavari, Sina; Becker, Stephan; Bennett, Richard S.; Bergman, Nicholas H.; Blinkova, Olga; Bradfute, Steven; Brister, J. Rodney; Bukreyev, Alexander; Chandran, Kartik; Chepurnov, Alexander A.; Davey, Robert A.; Dietzgen, Ralf G.; Doggett, Norman A.; Dolnik, Olga; Dye, John M.; Enterlein, Sven; Fenimore, Paul W.; Formenty, Pierre; Freiberg, Alexander N.; Garry, Robert F.; Garza, Nicole L.; Gire, Stephen K.; Gonzalez, Jean-Paul; Griffiths, Anthony; Happi, Christian T.; Hensley, Lisa E.; Herbert, Andrew S.; Hevey, Michael C.; Hoenen, Thomas; Honko, Anna N.; Ignatyev, Georgy M.; Jahrling, Peter B.; Johnson, Joshua C.; Johnson, Karl M.; Kindrachuk, Jason; Klenk, Hans-Dieter; Kobinger, Gary; Kochel, Tadeusz J.; Lackemeyer, Matthew G.; Lackner, Daniel F.; Leroy, Eric M.; Lever, Mark S.; Mühlberger, Elke; Netesov, Sergey V.; Olinger, Gene G.; Omilabu, Sunday A.; Palacios, Gustavo; Panchal, Rekha G.; Park, Daniel J.; Patterson, Jean L.; Paweska, Janusz T.; Peters, Clarence J.; Pettitt, James; Pitt, Louise; Radoshitzky, Sheli R.; Ryabchikova, Elena I.; Saphire, Erica Ollmann; Sabeti, Pardis C.; Sealfon, Rachel; Shestopalov, Aleksandr M.; Smither, Sophie J.; Sullivan, Nancy J.; Swanepoel, Robert; Takada, Ayato; Towner, Jonathan S.; van der Groen, Guido; Volchkov, Viktor E.; Volchkova, Valentina A.; Wahl-Jensen, Victoria; Warren, Travis K.; Warfield, Kelly L.; Weidmann, Manfred; Nichol, Stuart T.
2014-01-01
Sequence determination of complete or coding-complete genomes of viruses is becoming common practice for supporting the work of epidemiologists, ecologists, virologists, and taxonomists. Sequencing duration and costs are rapidly decreasing, sequencing hardware is under modification for use by non-experts, and software is constantly being improved to simplify sequence data management and analysis. Thus, analysis of virus disease outbreaks on the molecular level is now feasible, including characterization of the evolution of individual virus populations in single patients over time. The increasing accumulation of sequencing data creates a management problem for the curators of commonly used sequence databases and an entry retrieval problem for end users. Therefore, utilizing the data to their fullest potential will require setting nomenclature and annotation standards for virus isolates and associated genomic sequences. The National Center for Biotechnology Information’s (NCBI’s) RefSeq is a non-redundant, curated database for reference (or type) nucleotide sequence records that supplies source data to numerous other databases. Building on recently proposed templates for filovirus variant naming [
NASA Astrophysics Data System (ADS)
Haryati, Sri; Agung Prasetyo, Afiono; Sari, Yulia; Dharmawan, Ruben
2018-05-01
Toxoplasma gondii Surface Antigen 1 (SAG1) is often used as a diagnostic tool because it is an immunodominant, specific antigen. However, data on the Toxoplasma gondii SAG1 protein from Indonesian isolates are limited. To study the protein, genomic DNA was isolated from blood samples of a Javanese patient with acute toxoplasmosis. A complete coding sequence of Toxoplasma gondii SAG1 was cloned, inserted into an Escherichia coli expression plasmid, and sequenced. The sequencing results were subjected to bioinformatics analysis. The Toxoplasma gondii SAG1 complete coding sequences were successfully cloned. Physicochemical analysis revealed that the 336-aa SAG1 protein had a molecular weight of 34.7 kDa. The isoelectric point and aliphatic index were 8.4 and 78.4, respectively. The N-terminal methionine half-life in Escherichia coli was more than 10 hours. The antigenicity, secondary structure, and HLA binding motifs are also discussed. The results of this study contribute information about Toxoplasma gondii SAG1 and may benefit further work to develop diagnostic and therapeutic strategies against the parasite.
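The aliphatic index reported above is conventionally defined (Ikai's formula, as used by tools such as ProtParam) as X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue. The abstract does not state which tool was used, so the following Python sketch is only an illustration of that standard formula; the function name and the toy peptide are assumptions, not data from the study.

```python
def aliphatic_index(protein: str) -> float:
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    protein = protein.upper()
    n = len(protein)
    if n == 0:
        raise ValueError("empty protein sequence")

    def mole_percent(residue: str) -> float:
        return 100.0 * protein.count(residue) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Toy peptide for illustration only (not the SAG1 sequence).
print(round(aliphatic_index("AVILAG"), 2))
```

Applied to the 336-aa SAG1 translation, a calculation of this kind would be expected to give a value comparable to the reported 78.4, provided the same formula was used.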
USDA-ARS's Scientific Manuscript database
We recently described the complete genome of enterohemorrhagic Escherichia coli (EHEC) O157:H7 strain NADC 6564, an isolate of strain 86-24 linked to the 1986 disease outbreak. In the current study, we compared the chromosomal sequence of NADC 6564 to the well-characterized chromosomal sequences of ...
Ruhlman, Tracey; Lee, Seung-Bum; Jansen, Robert K; Hostetler, Jessica B; Tallon, Luke J; Town, Christopher D; Daniell, Henry
2006-08-31
Carrot (Daucus carota) is a major food crop in the US and worldwide. Its capacity for storage and its lifecycle as a biennial make it an attractive species for the introduction of foreign genes, especially for oral delivery of vaccines and other therapeutic proteins. Until recently efforts to express recombinant proteins in carrot have had limited success in terms of protein accumulation in the edible tap roots. Plastid genetic engineering offers the potential to overcome this limitation, as demonstrated by the accumulation of BADH in chromoplasts of carrot taproots to confer exceedingly high levels of salt resistance. The complete plastid genome of carrot provides essential information required for genetic engineering. Additionally, the sequence data add to the rapidly growing database of plastid genomes for assessing phylogenetic relationships among angiosperms. The complete carrot plastid genome is 155,911 bp in length, with 115 unique genes and 21 duplicated genes within the IR. There are four ribosomal RNAs, 30 distinct tRNA genes and 18 intron-containing genes. Repeat analysis reveals 12 direct and 2 inverted repeats ≥ 30 bp with a sequence identity ≥ 90%. Phylogenetic analyses of nucleotide sequences for 61 protein-coding genes using both maximum parsimony (MP) and maximum likelihood (ML) were performed for 29 angiosperms. Phylogenies from both methods provide strong support for the monophyly of several major angiosperm clades, including monocots, eudicots, rosids, asterids, eurosids II, euasterids I, and euasterids II. The carrot plastid genome contains a number of dispersed direct and inverted repeats scattered throughout coding and non-coding regions. This is the first sequenced plastid genome of the family Apiaceae and only the second published genome sequence of the species-rich euasterid II clade. Both MP and ML trees provide very strong support (100% bootstrap) for the sister relationship of Daucus with Panax in the euasterid II clade.
These results provide the best taxon sampling of complete chloroplast genomes and the strongest support yet for the sister relationship of Caryophyllales to the asterids. The availability of the complete plastid genome sequence should facilitate improved transformation efficiency and foreign gene expression in carrot through utilization of endogenous flanking sequences and regulatory elements.
Xing, Wen-Rui; Hou, Bei-Wei; Guan, Jing-Jiao; Luo, Jing; Ding, Xiao-Yu
2013-04-01
The LEAFY (LFY) homologous gene of Dendrobium moniliforme (L.) Sw. was cloned using new primers designed from the conserved region of known orchid LEAFY gene sequences. A partial LFY homologous gene was cloned by conventional PCR, and the complete LFY homologous gene, DenLFY, was then obtained by TAIL-PCR. The complete sequence of the DenLFY gene was 3,575 bp and contained three exons and two introns. BLAST comparison of the exons of LFY homologous genes indicated that the DenLFY gene had high identity with orchid LFY homologs, including the related fragment of PhalLFY (84%) in a Phalaenopsis hybrid cultivar, the LFY homologous gene in Oncidium (90%) and those in other orchids (over 80%). Using MP analysis, Dendrobium is found to be the sister to Oncidium and Phalaenopsis. Homology analysis demonstrated that the C-terminal amino acids were highly conserved. When the exons and introns were considered separately, the exons and the amino acid sequence were good markers for functional research on the DenLFY gene. The second intron can be used in authentication research of Dendrobium based on the length polymorphism between Dendrobium moniliforme and Dendrobium officinale.
Complete sequence and comparative analysis of the chloroplast genome of Plinia trunciflora
Eguiluz, Maria; Yuyama, Priscila Mary; Guzman, Frank; Rodrigues, Nureyev Ferreira; Margis, Rogerio
2017-01-01
Plinia trunciflora is a Brazilian native fruit tree from the Myrtaceae family, also known as jaboticaba. This species has great potential for fruit production. Due to the high content of essential oils in its leaves and of anthocyanins in its fruits, there is also increasing interest from the pharmaceutical industry. Nevertheless, there are few studies focusing on its molecular biology and genetic characterization. We herein report the complete chloroplast (cp) genome of P. trunciflora using high-throughput sequencing and compare it to other previously sequenced Myrtaceae genomes. The cp genome of P. trunciflora is 159,512 bp in size, comprising inverted repeats of 26,414 bp and single-copy regions of 88,097 bp (LSC) and 18,587 bp (SSC). The genome contains 111 single-copy genes (77 protein-coding, 30 tRNA and four rRNA genes). Phylogenetic analysis using 57 cp protein-coding genes demonstrated that P. trunciflora, Eugenia uniflora and Acca sellowiana form a cluster with closer relationship to Syzygium cumini than with Eucalyptus. The complete cp sequence reported here can be used in evolutionary and population genetics studies, contributing to resolve the complex taxonomy of this species and fill the gap in genetic characterization. PMID:29111566
Wang, Yongkang; Song, Xiaodan; Li, Xiaorong; Yang, Sang-tian; Zou, Xiang
2017-01-04
To explore the genome sequence of Aureobasidium pullulans CCTCC M2012223, analyze the key genes related to the biosynthesis of important metabolites, and provide genetic background for metabolic engineering. The complete genome of A. pullulans CCTCC M2012223 was sequenced on the Illumina HiSeq high-throughput sequencing platform. Then, fragment assembly, gene prediction, functional annotation, and GO/COG clustering were analyzed in comparison with those of five other A. pullulans varieties. The complete genome sequence of A. pullulans CCTCC M2012223 was 30,756,831 bp with an average GC content of 47.49%, and 9,452 genes were successfully predicted. Genome-wide analysis showed that A. pullulans CCTCC M2012223 had the largest genome assembly size. Protein sequences involved in the pullulan and polymalic acid pathways were highly conserved in all six A. pullulans varieties. Although A. pullulans CCTCC M2012223 and A. pullulans var. melanogenum are closely related, some point mutations and insertions occurred in protein sequences involved in melanin biosynthesis. The genome of A. pullulans CCTCC M2012223 was annotated and genes involved in the melanin, pullulan and polymalic acid pathways were compared, which would provide a theoretical basis for genetic modification of metabolic pathways in A. pullulans.
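A summary statistic such as the average GC content quoted above is simply the share of G and C bases among the called bases of the assembly. A minimal Python sketch follows, assuming ambiguous bases such as N are excluded from the denominator (the abstract does not say how they were handled); the function name and the toy sequence are illustrative only.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / (gc + at)


# Made-up sequence for illustration, not taken from the A. pullulans assembly.
toy = "ATGCGCNNTA"
print(round(gc_content(toy), 2))  # 50.0
```

For a multi-contig assembly, the same counts would be accumulated over all contigs before dividing, rather than averaging per-contig percentages.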
Prevalence and genome characteristics of canine astrovirus in southwest China.
Li, Mingxiang; Yan, Nan; Ji, Conghui; Wang, Min; Zhang, Bin; Yue, Hua; Tang, Cheng
2018-05-30
The aim of this study was to investigate canine astrovirus (CaAstV) infection in southwest China. We collected 107 faecal samples from domestic dogs with obvious diarrhoea. Forty-two diarrhoeic samples (39.3 %) were positive for CaAstV by RT-PCR, and 41/42 samples showed co-infection with canine coronavirus (CCoV), canine parvovirus-2 (CPV-2) and canine distemper virus (CDV). Phylogenetic analysis based on 26 CaAstV partial ORF1a and ORF1b sequences revealed that most CaAstV strains showed unique evolutionary features. Interestingly, putative recombination events were observed among four of the five complete ORF2 sequences cloned in this study, and three of the five complete ORF2 sequences formed a single unique group, suggesting that these strains could be a novel genotype. We successfully sequenced the complete genome of one CaAstV strain (designated 2017/44/CHN), which was 6628 nt in length. The features of this genome include putative recombination events in the ORF1a, ORF1b and ORF2 genes, while the ORF2 gene had a continuous insertion of 7 aa in region II compared with the other complete ORF2 sequences available in GenBank. Phylogenetic analysis showed that 2017/44/CHN formed a single group based on genome sequences, suggesting that this strain might be a novel genotype. The results of this study revealed that CaAstV circulates widely in diarrhoeic dogs in southwest China and exhibits unique evolutionary events. To the best of our knowledge, this is the first report of recombination events in CaAstV, and it contributes to further understanding of the genetic evolution of CaAstV.
Ouwerkerk, Janneke P; Koehorst, Jasper J; Schaap, Peter J; Ritari, Jarmo; Paulin, Lars; Belzer, Clara; de Vos, Willem M
2017-01-05
Akkermansia glycaniphila is a novel Akkermansia species that was isolated from the intestine of the reticulated python and shares the capacity to degrade mucin with the human strain Akkermansia muciniphila MucT. Here, we report the complete genome sequence of strain PytT of 3,074,121 bp. The genomic analysis reveals genes for mucin degradation and aerobic respiration. Copyright © 2017 Ouwerkerk et al.
Near-Complete Genome Sequence of a Novel Single-Stranded RNA Virus Discovered in Indoor Air.
Rosario, Karyna; Fierer, Noah; Breitbart, Mya
2018-03-22
Viral metagenomic analysis of heating, ventilation, and air conditioning (HVAC) filters recovered the near-complete genome sequence of a novel virus, named HVAC-associated RNA virus 1 (HVAC-RV1). The HVAC-RV1 genome is most similar to those of picorna-like viruses identified in arthropods but encodes a small domain observed only in negative-sense single-stranded RNA viruses. Copyright © 2018 Rosario et al.
Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.
Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi
2017-07-01
PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II's sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is not only an effective tool for use in the basic biological sciences but also in the medical/clinical setting.
Molecular epidemiology of Plum pox virus in Japan.
Maejima, Kensaku; Himeno, Misako; Komatsu, Ken; Takinami, Yusuke; Hashimoto, Masayoshi; Takahashi, Shuichiro; Yamaji, Yasuyuki; Oshima, Kenro; Namba, Shigetou
2011-05-01
For a molecular epidemiological study based on complete genome sequences, 37 Plum pox virus (PPV) isolates were collected from the Kanto region in Japan. Pair-wise analyses revealed that all 37 Japanese isolates belong to the PPV-D strain, with low genetic diversity (less than 0.8%). In phylogenetic analysis of the PPV-D strain based on complete nucleotide sequences, the relationships of the PPV-D strain were reconstructed with high resolution: at the global level, the American, Canadian, and Japanese isolates formed their own distinct monophyletic clusters, suggesting that the routes of viral entry into these countries were independent; at the local level, the actual transmission histories of PPV were precisely reconstructed with high bootstrap support. This is the first description of the molecular epidemiology of PPV based on complete genome sequences.
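A diversity figure like the "less than 0.8%" quoted above comes from comparing every pair of aligned genome sequences. The abstract does not specify the distance measure, so the Python sketch below assumes the simplest one, the uncorrected p-distance (proportion of differing sites, skipping gap columns). The function names and the three toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def max_pairwise_distance(aligned: dict) -> float:
    """Largest p-distance over all pairs of isolates in the alignment."""
    return max(p_distance(a, b) for a, b in combinations(aligned.values(), 2))


# Toy alignment for illustration only (not PPV data).
toy = {"iso1": "ATGCATGCAT", "iso2": "ATGCATGCAA", "iso3": "ATGC-TGCAT"}
print(f"{100 * max_pairwise_distance(toy):.1f}%")
```

With 37 isolates there are 666 pairs, so an exhaustive comparison of this kind remains cheap even for complete genomes.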
Genomic characterization of two new enterovirus types, EV-A114 and EV-A121.
Deshpande, Jagadish M; Sharma, Deepa K; Saxena, Vinay K; Shetty, Sushmitha A; Qureshi, Tarique Husain I H; Nalavade, Uma P
2016-12-01
Enteroviruses cause a variety of illnesses of the gastrointestinal tract, central nervous system and cardiovascular system. Phylogenetic analysis of VP1 sequences has identified 106 different human enteroviruses classified into four enterovirus species within the genus Enterovirus of the family Picornaviridae. It is likely that not all enterovirus types have been discovered. Between September 2013 and October 2014, stool samples of 6274 apparently healthy children of up to 5 years of age residing in Gorakhpur district, Uttar Pradesh, India were screened for enteroviruses. Virus isolates obtained in RD and Hep-2c cells were identified by complete VP1 sequencing. Enteroviruses were isolated from 3042 samples. A total of 87 different enterovirus types were identified. Two isolates with 71 and 74 % nucleotide sequence similarity to all other known enteroviruses were recognized as novel types. In this paper we report identification and complete genome sequence analysis of these two isolates classified as EV-A114 and EV-A121.
Curated eutherian third party data gene data sets.
Premzl, Marko
2016-03-01
The freely available eutherian genomic sequence data sets have advanced the scientific field of genomics. Of note, future revisions of gene data sets were expected, due to incompleteness of public eutherian genomic sequence assemblies and potential genomic sequence errors. The eutherian comparative genomic analysis protocol was proposed as guidance in protection against potential genomic sequence errors in public eutherian genomic sequences. The protocol was applicable in updates of 7 major eutherian gene data sets, including 812 complete coding sequences deposited in the European Nucleotide Archive as curated third party data gene data sets.
Zhao, Ya-E; Xu, Ji-Ru; Hu, Li; Wu, Li-Ping; Wang, Zheng-Hang
2012-05-01
The study for the first time attempted to accomplish 18S ribosomal DNA (rDNA) complete sequence amplification and analysis for three Demodex species (Demodex folliculorum, Demodex brevis and Demodex canis) based on gDNA extraction from individual mites. The mites were treated by DNA Release Additive and Hot Start II DNA Polymerase so as to promote mite disruption and increase PCR specificity. Determination of D. folliculorum gDNA showed that the gDNA yield reached the highest at 1 mite, tending to descend with the increase of mite number. The individual mite gDNA was successfully used for 18S rDNA fragment (about 900 bp) amplification examination. The alignments of 18S rDNA complete sequences of individual mite samples and those of pooled mite samples (≥ 1000 mites/sample) showed over 97% identities for each species, indicating that the gDNA extracted from a single individual mite was as satisfactory as that from pooled mites for PCR amplification. Further pairwise sequence analyses showed that average divergence, genetic distance, transition/transversion or phylogenetic tree could not effectively identify the three Demodex species, largely due to the differentiation in the D. canis isolates. It can be concluded that the individual Demodex mite gDNA can satisfy the molecular study of Demodex. 18S rDNA complete sequence is suitable for interfamily identification in Cheyletoidea, but whether it is suitable for intrafamily identification cannot be confirmed until the ascertainment of the types of Demodex mites parasitizing in dogs. Copyright © 2012 Elsevier Inc. All rights reserved.
Chen, Caihui; Zheng, Yongjie; Liu, Sian; Zhong, Yongda; Wu, Yanfang; Li, Jiang; Xu, Li-An; Xu, Meng
2017-01-01
Cinnamomum camphora, a member of the Lauraceae family, is a valuable aromatic and timber tree that is indigenous to the south of China and Japan. All parts of Cinnamomum camphora have secretory cells containing different volatile chemical compounds that are utilized as herbal medicines and essential oils. Here, we report the complete sequencing of the chloroplast genome of Cinnamomum camphora using Illumina technology. The chloroplast genome of Cinnamomum camphora is 152,570 bp in length and characterized by a relatively conserved quadripartite structure containing a large single copy region of 93,705 bp, a small single copy region of 19,093 bp and two inverted repeat (IR) regions of 19,886 bp. Overall, the genome contained 123 coding regions, of which 15 were repeated in the IR regions. An analysis of chloroplast sequence divergence revealed that the small single copy region was highly variable among the different genera in the Lauraceae family. A total of 40 repeat structures and 83 simple sequence repeats were detected in both the coding and non-coding regions. A phylogenetic analysis indicated that Calycanthus is most closely related to Lauraceae, both being members of Laurales, which forms a sister group to Magnoliids. The complete sequence of the chloroplast of Cinnamomum camphora will aid in in-depth taxonomical studies of the Lauraceae family in the future. The genetic sequence information will also have valuable applications for chloroplast genetic engineering.
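The region lengths reported above can be checked for internal consistency, because a quadripartite chloroplast genome is the sum of the large single copy region, the small single copy region and two copies of the inverted repeat. A minimal Python sketch of that arithmetic, in which the function name is an illustrative assumption and the numbers are those given in the abstract:

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total plastome length: LSC + SSC + two copies of the inverted repeat."""
    return lsc + ssc + 2 * ir


# Region lengths (bp) as reported for Cinnamomum camphora.
total = quadripartite_length(lsc=93_705, ssc=19_093, ir=19_886)
print(total)  # 152570, which matches the reported genome length
```

The same check can be applied to any plastome abstract that reports all three region lengths alongside the total.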
Herrnstadt, Corinna; Elson, Joanna L; Fahy, Eoin; Preston, Gwen; Turnbull, Douglass M; Anderson, Christen; Ghosh, Soumitra S; Olefsky, Jerrold M; Beal, M Flint; Davis, Robert E; Howell, Neil
2002-05-01
The evolution of the human mitochondrial genome is characterized by the emergence of ethnically distinct lineages or haplogroups. Nine European, seven Asian (including Native American), and three African mitochondrial DNA (mtDNA) haplogroups have been identified previously on the basis of the presence or absence of a relatively small number of restriction-enzyme recognition sites or on the basis of nucleotide sequences of the D-loop region. We have used reduced-median-network approaches to analyze 560 complete European, Asian, and African mtDNA coding-region sequences from unrelated individuals to develop a more complete understanding of sequence diversity both within and between haplogroups. A total of 497 haplogroup-associated polymorphisms were identified, 323 (65%) of which were associated with one haplogroup and 174 (35%) of which were associated with two or more haplogroups. Approximately one-half of these polymorphisms are reported for the first time here. Our results confirm and substantially extend the phylogenetic relationships among mitochondrial genomes described elsewhere from the major human ethnic groups. Another important result is that there were numerous instances both of parallel mutations at the same site and of reversion (i.e., homoplasy). It is likely that homoplasy in the coding region will confound evolutionary analysis of small sequence sets. By a linkage-disequilibrium approach, additional evidence for the absence of human mtDNA recombination is presented here.
Nie, Xiaojun; Lv, Shuzuo; Zhang, Yingxin; Du, Xianghong; Wang, Le; Biradar, Siddanagouda S; Tan, Xiufang; Wan, Fanghao; Weining, Song
2012-01-01
Crofton weed (Ageratina adenophora) is one of the most hazardous invasive plant species, which causes serious economic losses and environmental damages worldwide. However, the sequence resource and genome information of A. adenophora are rather limited, making phylogenetic identification and evolutionary studies very difficult. Here, we report the complete sequence of the A. adenophora chloroplast (cp) genome based on Illumina sequencing. The A. adenophora cp genome is 150,689 bp in length including a small single-copy (SSC) region of 18,358 bp and a large single-copy (LSC) region of 84,815 bp separated by a pair of inverted repeats (IRs) of 23,755 bp. The genome contains 130 unique genes and 18 duplicated in the IR regions, with the gene content and organization similar to other Asteraceae cp genomes. Comparative analysis identified five DNA regions (ndhD-ccsA, psbI-trnS, ndhF-ycf1, ndhI-ndhG and atpA-trnR) containing parsimony-informative characters higher than 2%, which may be potential informative markers for barcoding and phylogenetic analysis. Repeat structure, codon usage and contraction of the IR were also investigated to reveal the pattern of evolution. Phylogenetic analysis demonstrated a sister relationship between A. adenophora and Guizotia abyssinica and supported a monophyly of the Asterales. We have assembled and analyzed the chloroplast genome of A. adenophora in this study, which was the first sequenced plastome in the Eupatorieae tribe. The complete chloroplast genome information is useful for plant phylogenetic and evolutionary studies within this invasive species and also within the Asteraceae family.
The complete mitochondrial genome sequence of the Tibetan red fox (Vulpes vulpes montana).
Zhang, Jin; Zhang, Honghai; Zhao, Chao; Chen, Lei; Sha, Weilai; Liu, Guangshuai
2015-01-01
In this study, the complete mitochondrial genome of the Tibetan red fox (Vulpes vulpes montana) was sequenced for the first time using blood samples obtained from a wild female red fox captured from Lhasa in Tibet, China. The Qinghai-Tibet Plateau is the highest plateau in the world, with an average elevation above 3500 m. Sequence analysis showed that the genome contains a 12S rRNA gene, a 16S rRNA gene, 22 tRNA genes, 13 protein-coding genes and 1 control region (CR). The variable tandem repeats in the CR are the main reason for the length variability of the mitochondrial genome among canids.
Premzl, Marko
2015-01-01
Using the eutherian comparative genomic analysis protocol and public genomic sequence data sets, the present work attempted to update and revise two gene data sets. The most comprehensive third party annotation gene data sets of eutherian adenohypophysis cystine-knot genes (128 complete coding sequences), and D-dopachrome tautomerases and macrophage migration inhibitory factor genes (30 complete coding sequences) were annotated. For example, the present study first described primate-specific cystine-knot Prometheus genes, as well as differential gene expansions of D-dopachrome tautomerase genes. Furthermore, new frameworks of future experiments of two eutherian gene data sets were proposed. PMID:25941635
Distéfano, Ana J; Bonacic Kresic, Ivan; Hopp, H Esteban
2010-11-01
Cotton blue disease is the most important virus disease of cotton in the southern part of America. The complete nucleotide sequence of the ssRNA genome of the cotton blue disease-associated virus was determined for the first time. It comprised 5,866 nucleotides, and the deduced genomic organization resembled that of members of the genus Polerovirus. Sequence homology comparison and phylogenetic analysis confirm that this virus (previously proposed name: cotton leafroll dwarf virus) is a member of a new species within the genus Polerovirus.
Angiuoli, Samuel V; White, James R; Matalka, Malcolm; White, Owen; Fricke, W Florian
2011-01-01
The widespread popularity of genomic applications is threatened by the "bioinformatics bottleneck" resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. 
Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers.
Angiuoli, Samuel V.; White, James R.; Matalka, Malcolm; White, Owen; Fricke, W. Florian
2011-01-01
Background The widespread popularity of genomic applications is threatened by the “bioinformatics bottleneck” resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. Results We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. 
Conclusions Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers. PMID:22028928
Fiallo-Olivé, Elvira; Martínez-Zubiaur, Yamila; Moriones, Enrique; Navas-Castillo, Jesús
2010-09-01
The complete genome sequence of two isolates of the bipartite begomovirus (genus Begomovirus, family Geminiviridae) Sida golden mosaic Florida virus (SiGMFV) is presented. We propose that both isolates, found infecting Malvastrum coromandelianum (family Malvaceae) in Cuba, belong to a new strain of SiGMFV. Phylogenetic analysis showed that SiGMFV DNA-A is located in a monophyletic cluster that includes begomoviruses infecting malvaceous weeds from the Caribbean.
MIPS: a database for genomes and protein sequences.
Mewes, H W; Heumann, K; Kaps, A; Mayer, K; Pfeiffer, F; Stocker, S; Frishman, D
1999-01-01
The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried near Munich, Germany, develops and maintains genome oriented databases. It is commonplace that the amount of sequence data available increases rapidly, but not the capacity of qualified manual annotation at the sequence databases. Therefore, our strategy aims to cope with the data stream by the comprehensive application of analysis tools to sequences of complete genomes, the systematic classification of protein sequences and the active support of sequence analysis and functional genomics projects. This report describes the systematic and up-to-date analysis of genomes (PEDANT), a comprehensive database of the yeast genome (MYGD), a database reflecting the progress in sequencing the Arabidopsis thaliana genome (MATD), the database of assembled, annotated human EST clusters (MEST), and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). MIPS provides access through its WWW server (http://www.mips.biochem.mpg.de) to a spectrum of generic databases, including the above mentioned as well as a database of protein families (PROTFAM), the MITOP database, and the all-against-all FASTA database. PMID:9847138
Xu, Qin; Xiong, Guanjun; Li, Pengbo; He, Fei; Huang, Yi; Wang, Kunbo; Li, Zhaohu; Hua, Jinping
2012-01-01
Background Cotton (Gossypium spp.) is a model system for the analysis of polyploidization. Although ascertaining the donor species of allotetraploid cotton has been intensively studied, sequence comparison of Gossypium chloroplast genomes is still of interest to understand the mechanisms underlying the evolution of Gossypium allotetraploids, while it is generally accepted that the parents were A- and D-genome containing species. Here we performed a comparative analysis of 13 Gossypium chloroplast genomes, twelve of which are presented here for the first time. Methodology/Principal Findings The size of 12 chloroplast genomes under study varied from 159,959 bp to 160,433 bp. The chromosomes were highly similar having >98% sequence identity. They encoded the same set of 112 unique genes which occurred in a uniform order with only slightly different boundary junctions. Divergence due to indels as well as substitutions was examined separately for genome, coding and noncoding sequences. The genome divergence was estimated as 0.374% to 0.583% between allotetraploid species and A-genome, and 0.159% to 0.454% within allotetraploids. Forty protein-coding genes were completely identical at the protein level, and 20 intergenic sequences were completely conserved. The 9 allotetraploids shared 5 insertions and 9 deletions in whole genome, and 7-bp substitutions in protein-coding genes. The phylogenetic tree confirmed a close relationship between allotetraploids and the ancestor of A-genome, and the allotetraploids were divided into four separate groups. Progenitor allotetraploid cotton originated 0.43–0.68 million years ago (MYA). Conclusion Despite high degree of conservation between the Gossypium chloroplast genomes, sequence variations among species could still be detected. Gossypium chloroplast genomes showed a preference for 5-bp indels, and 1–3-bp indels are mainly attributed to SSR polymorphisms.
This study supports that the common ancestor of diploid A-genome species in Gossypium is the maternal source of extant allotetraploid species and allotetraploids have a monophyletic origin. G. hirsutum AD1 lineages have experienced more sequence variations than other allotetraploids in intergenic regions. The available complete nucleotide sequences of 12 Gossypium chloroplast genomes should facilitate studies to uncover the molecular mechanisms of compartmental co-evolution and speciation of Gossypium allotetraploids. PMID:22876273
The complete genome sequence of freesia mosaic virus and its relationship to other potyviruses.
Choi, H I; Lim, H R; Song, Y S; Kim, M J; Choi, S H; Song, Y S; Bae, S C; Ryu, K H
2010-07-01
We have completed the genomic sequence of a potyvirus, freesia mosaic virus (FreMV), and compared it to those of other known potyviruses. The full-length genome sequence of FreMV consists of 9,489 nucleotides. The large protein contains 3,077 amino acids, with an AUG start codon and UAA stop codon, containing one open reading frame typical of a potyvirus polyprotein. The polyprotein of FreMV-Kr gives rise to eleven proteins (P1, HC-pro, P3, PIPO, 6K1, CI, 6K2, VPg, NIa, NIb and CP), and putative cleavage sites of each protein were identified by sequence comparison to those of other known potyviruses. Phylogenetic analysis of the polyprotein revealed that FreMV-Kr was most closely related to PeMoV and was related to BtMV, BaRMV and PeLMV, which belong to the BCMV subgroup. This is the first information on the complete genome structure of FreMV, and the sequence information clearly supports the status of FreMV as a member of a distinct species in the genus Potyvirus.
Analysis for complete genomic sequence of HLA-B and HLA-C alleles in the Chinese Han population.
Zhu, F; He, Y; Zhang, W; He, J; He, J; Xu, X; Lv, H; Yan, L
2011-08-01
In the present study, we have determined the complete genomic sequence and analysed the intron polymorphism of partial HLA-B and HLA-C alleles in the Chinese Han population. Over 3.0 kb DNA fragments of HLA-B and HLA-C loci were amplified by polymerase chain reaction from partial 5' untranslated region to 3' noncoding region respectively, and then the amplified products were sequenced. Full-length nucleotide sequences of 14 HLA-B alleles and 10 HLA-C alleles were obtained and have been submitted to GenBank and IMGT/HLA database. Two novel alleles of HLA-B*52:01:01:02 and HLA-B*59:01:01:02 were identified, and the complete genomic sequence of HLA-B*52:01:01:01 was reported for the first time. In total, 157 and 167 polymorphic positions were found in the full-length genomic sequence of HLA-B and HLA-C loci respectively. Our results suggested that many single nucleotide polymorphisms existed in the exon and intron regions, and the data can provide useful information for understanding the evolution of HLA-B and HLA-C alleles. © 2011 Blackwell Publishing Ltd.
Complete Genome Sequence of a Rhodococcus Species Isolated from the Winter Skate Leucoraja ocellata.
Wiens, Julia; Ho, Ryan; Fernando, Dinesh; Kumar, Ayush; Loewen, Peter C; Brassinga, Ann Karen C; Anderson, W Gary
2016-09-01
We report here a genome sequence for Rhodococcus sp. isolate UM008 isolated from the renal/interrenal tissue of the winter skate Leucoraja ocellata. Genome sequence analysis suggests that Rhodococcus bacteria may act in a novel mutualistic relationship with their elasmobranch host, serving as biocatalysts in the steroidogenic pathway of 1α-hydroxycorticosterone. Copyright © 2016 Wiens et al.
HLA genotyping by next-generation sequencing of complementary DNA.
Segawa, Hidenobu; Kukita, Yoji; Kato, Kikuya
2017-11-28
Genotyping of the human leucocyte antigen (HLA) is indispensable for various medical treatments. However, unambiguous genotyping is technically challenging due to high polymorphism of the corresponding genomic region. Next-generation sequencing is changing the landscape of genotyping. In addition to high throughput of data, its additional advantage is that DNA templates are derived from single molecules, which is a strong merit for the phasing problem. Although most currently developed technologies use genomic DNA, use of cDNA could enable genotyping with reduced costs in data production and analysis. We thus developed an HLA genotyping system based on next-generation sequencing of cDNA. Each HLA gene was divided into 3 or 4 target regions subjected to PCR amplification and subsequent sequencing with Ion Torrent PGM. The sequence data were then subjected to an automated analysis. The principle of the analysis was to construct candidate sequences generated from all possible combinations of variable bases and arrange them in decreasing order of the number of reads. Upon collecting candidate sequences from all target regions, 2 haplotypes were usually assigned. Cases not assigned 2 haplotypes were forwarded to 4 additional processes: selection of candidate sequences applying more stringent criteria, removal of artificial haplotypes, selection of candidate sequences with a relaxed threshold for sequence matching, and countermeasure for incomplete sequences in the HLA database. The genotyping system was evaluated using 30 samples; the overall accuracy was 97.0% at the field 3 level and 98.3% at the G group level. With one sample, genotyping of DPB1 was not completed due to short read size. We then developed a method for complete sequencing of individual molecules of the DPB1 gene, using the molecular barcode technology. The performance of the automatic genotyping system was comparable to that of systems developed in previous studies. 
Thus, next-generation sequencing of cDNA is a viable option for HLA genotyping.
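The candidate-ranking principle described in this abstract (build candidate sequences from all combinations of variable bases, then order them by read count) can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' implementation: the function name `rank_candidates`, the assumption of pre-aligned equal-length reads, and the toy reads are all hypothetical, and the stringency thresholds and artificial-haplotype filters mentioned in the abstract are omitted.

```python
from collections import Counter
from itertools import product

def rank_candidates(reads):
    """Rank candidate base combinations for one target region.

    Assumes (hypothetically) that `reads` are already aligned and of equal
    length. Returns the variable positions and a list of
    (combination, supporting_read_count) in decreasing order of support.
    """
    if not reads:
        raise ValueError("no reads given")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be aligned to equal length")

    # Positions at which more than one base is observed across reads.
    variable = [i for i in range(length) if len({r[i] for r in reads}) > 1]
    observed = [sorted({r[i] for r in reads}) for i in variable]

    # Number of reads carrying each combination at the variable positions.
    support = Counter(tuple(r[i] for i in variable) for r in reads)

    # All possible combinations, most strongly supported first.
    ranked = sorted(product(*observed), key=lambda combo: -support[combo])
    return variable, [(combo, support[combo]) for combo in ranked]

# Toy reads for illustration only.
positions, candidates = rank_candidates(
    ["ACGT", "ACGT", "ACGT", "ATGA", "ATGA", "ACGA"]
)
```

In this toy case the two best-supported combinations would correspond to the two haplotypes, while low-count combinations are the kind of candidates a real pipeline would need to filter as sequencing errors or PCR artefacts.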
ERIC Educational Resources Information Center
Wong, Wang-chan
2017-01-01
Systems Analysis and Design (SA&D) is the cornerstone course of a traditional information system curriculum. Conventionally, it is a sequence of two courses with the second course dedicated to the completion of a project. However, it has recently become more common to reduce the two-course sequence into one, especially for IS departments that…
Zeng, Cong; Thomas, Leighton J; Kelly, Michelle; Gardner, Jonathan P A
2016-05-01
The complete mitochondrial genome of a New Zealand specimen of the deep-sea sponge Poecillastra laminaris (Sollas, 1886) (Astrophorida, Vulcanellidae), from the Colville Ridge, New Zealand, was sequenced using the 454 Life Science pyrosequencing system. To identify homologous mitochondrial sequences, the 454 reads were mapped to the complete mitochondrial genome sequence of Geodia neptuni (GenBank No. NC_006990). The P. laminaris genome is 18,413 bp in length and includes 14 protein-coding genes, 24 transfer RNA genes and 2 ribosomal RNA genes. Gene order resembled that of other demosponges. The base composition of the genome is A (29.1%), T (35.2%), C (14.0%) and G (21.7%). This is the second published mitogenome for a sponge of the order Astrophorida and will be useful in future phylogenetic analysis of deep-sea sponges.
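A base composition like the one reported in this abstract can be tallied directly from a sequence string. The following is a minimal Python sketch, assuming a plain A/C/G/T sequence; the function name `base_composition` and the toy sequence are hypothetical and are not taken from the study.

```python
from collections import Counter

def base_composition(seq):
    """Return the percentage of A, T, C and G in `seq`.

    Symbols other than A/T/C/G (for example N) are ignored, which is an
    assumption of this sketch rather than a stated convention of the paper.
    """
    counts = Counter(base for base in seq.upper() if base in "ATCG")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/T/C/G bases")
    return {base: 100.0 * counts[base] / total for base in "ATCG"}

# Toy sequence for illustration only (not the P. laminaris mitogenome).
composition = base_composition("ATTTGCATTA")
```

Applied to a full mitogenome read from a FASTA file, the same tally would give the four percentages, which should sum to 100 up to rounding.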
USDA-ARS's Scientific Manuscript database
The Agrotis ipsilon multiple nucleopolyhedrovirus (AgipMNPV) is a group II nucleopolyhedrovirus (NPV) from the black cutworm, A. ipsilon, with potential as a biopesticide to control infestations of cutworm larvae. The genome of the Illinois strain of AgipMNPV was completely sequenced. The AgipMNPV...
USDA-ARS's Scientific Manuscript database
Current advances in sequencing technologies and bioinformatics make it possible to determine a nearly complete genomic background of rice, a staple food for poor people. Consequently, comprehensive databases of variation among thousands of varieties are currently being assembled and released. Proper analysi...
Mutation detection in the human HSP70B′ gene by denaturing high-performance liquid chromatography
Hecker, Karl H.; Asea, Alexzander; Kobayashi, Kaoru; Green, Stacy; Tang, Dan; Calderwood, Stuart K.
2000-01-01
Variances, particularly single nucleotide polymorphisms (SNP), in the genomic sequence of individuals are the primary key to understanding gene function as it relates to differences in the susceptibility to disease, environmental influences, and therapy. In this report, the HSP70B′ gene is the target sequence for mutation detection in biopsy samples from human prostate cancer patients undergoing combined hyperthermia and radiation therapy at the Dana-Farber Cancer Institute, using temperature-modulated heteroduplex analysis (TMHA). The underlying principles of TMHA for mutation detection using DHPLC technology are discussed. The procedures involved in amplicon design for mutation analysis by DHPLC are detailed. The melting behavior of the complete coding sequence of the target gene is characterized using WAVEMAKER™ software. Four overlapping amplicons, which span the complete coding region of the HSP70B′ gene, amenable to mutation detection by DHPLC were identified based on the software-predicted melting profile of the target sequence. TMHA was performed on PCR products of individual amplicons of the HSP70B′ gene on the WAVE® Nucleic Acid Fragment Analysis System. The criteria for mutation calling by comparing wild-type and mutant chromatographic patterns are discussed. PMID:11189446
Mutation detection in the human HSP70B' gene by denaturing high-performance liquid chromatography.
Hecker, K H; Asea, A; Kobayashi, K; Green, S; Tang, D; Calderwood, S K
2000-11-01
Variances, particularly single nucleotide polymorphisms (SNP), in the genomic sequence of individuals are the primary key to understanding gene function as it relates to differences in the susceptibility to disease, environmental influences, and therapy. In this report, the HSP70B' gene is the target sequence for mutation detection in biopsy samples from human prostate cancer patients undergoing combined hyperthermia and radiation therapy at the Dana-Farber Cancer Institute, using temperature-modulated heteroduplex analysis (TMHA). The underlying principles of TMHA for mutation detection using DHPLC technology are discussed. The procedures involved in amplicon design for mutation analysis by DHPLC are detailed. The melting behavior of the complete coding sequence of the target gene is characterized using WAVEMAKER software. Four overlapping amplicons, which span the complete coding region of the HSP70B' gene, amenable to mutation detection by DHPLC were identified based on the software-predicted melting profile of the target sequence. TMHA was performed on PCR products of individual amplicons of the HSP70B' gene on the WAVE Nucleic Acid Fragment Analysis System. The criteria for mutation calling by comparing wild-type and mutant chromatographic patterns are discussed.
Scheuch, Matthias; Höper, Dirk; Beer, Martin
2015-03-03
Fuelled by the advent and subsequent development of next generation sequencing technologies, metagenomics became a powerful tool for the analysis of microbial communities both scientifically and diagnostically. The biggest challenge is the extraction of relevant information from the huge sequence datasets generated for metagenomics studies. Although a plethora of tools are available, data analysis is still a bottleneck. To overcome the bottleneck of data analysis, we developed an automated computational workflow called RIEMS - Reliable Information Extraction from Metagenomic Sequence datasets. RIEMS assigns every individual read sequence within a dataset taxonomically by cascading different sequence analyses with decreasing stringency of the assignments using various software applications. After completion of the analyses, the results are summarised in a clearly structured result protocol organised taxonomically. The high accuracy and performance of RIEMS analyses were proven in comparison with other tools for metagenomics data analysis using simulated sequencing read datasets. RIEMS has the potential to fill the gap that still exists with regard to data analysis for metagenomics studies. The usefulness and power of RIEMS for the analysis of genuine sequencing datasets was demonstrated with an early version of RIEMS in 2011 when it was used to detect the orthobunyavirus sequences leading to the discovery of Schmallenberg virus.
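The cascading idea in this abstract (try the most stringent analysis first and pass only the still-unassigned reads to progressively more relaxed ones) can be illustrated with a small Python sketch. It is not the RIEMS code: the function `assign_reads`, the stage names, the toy reference table, and the exact-match and prefix-match classifiers are hypothetical stand-ins for the external tools the workflow actually chains together.

```python
def assign_reads(reads, stages):
    """Assign reads taxonomically by cascading classifiers.

    `stages` is a list of (stage_name, classifier) ordered from most to
    least stringent; each classifier returns a taxon label or None.
    Reads are assumed unique here because they are used as dictionary keys.
    """
    assignments = {}
    unassigned = list(reads)
    for stage_name, classify in stages:
        still_unassigned = []
        for read in unassigned:
            taxon = classify(read)
            if taxon is None:
                still_unassigned.append(read)  # hand on to the next stage
            else:
                assignments[read] = (taxon, stage_name)
        unassigned = still_unassigned
    return assignments, unassigned

# Hypothetical toy reference and classifiers, for illustration only.
REFERENCE = {"ACGTACGT": "virus A", "TTGGCCAA": "bacterium B"}

def exact_match(read):
    return REFERENCE.get(read)

def prefix_match(read):
    for ref, taxon in REFERENCE.items():
        if ref[:4] == read[:4]:
            return taxon
    return None

assigned, leftover = assign_reads(
    ["ACGTACGT", "ACGTTTTT", "GGGGGGGG"],
    [("strict", exact_match), ("relaxed", prefix_match)],
)
```

Recording the stage alongside each taxon keeps track of how stringent the evidence behind an assignment was, which is one plausible way to structure a taxonomically organised result summary.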
Probabilistic topic modeling for the analysis and classification of genomic sequences
2015-01-01
Background Studies on genomic sequences for classification and taxonomic identification have a leading role in the biomedical field and in the analysis of biodiversity. These studies are focusing on the so-called barcode genes, representing a well defined region of the whole genome. Recently, alignment-free techniques are gaining more importance because they are able to overcome the drawbacks of sequence alignment techniques. In this paper a new alignment-free method for DNA sequences clustering and classification is proposed. The method is based on k-mers representation and text mining techniques. Methods The presented method is based on Probabilistic Topic Modeling, a statistical technique originally proposed for text documents. Probabilistic topic models are able to find in a document corpus the topics (recurrent themes) characterizing classes of documents. This technique, applied on DNA sequences representing the documents, exploits the frequency of fixed-length k-mers and builds a generative model for a training group of sequences. This generative model, obtained through the Latent Dirichlet Allocation (LDA) algorithm, is then used to classify a large set of genomic sequences. Results and conclusions We performed classification of over 7000 16S DNA barcode sequences taken from Ribosomal Database Project (RDP) repository, training probabilistic topic models. The proposed method is compared to the RDP tool and Support Vector Machine (SVM) classification algorithm in an extensive set of trials using both complete sequences and short sequence snippets (from 400 bp to 25 bp). Our method reaches very similar results to RDP classifier and SVM for complete sequences. The most interesting results are obtained when short sequence snippets are considered. In these conditions the proposed method outperforms RDP and SVM with ultra short sequences and it exhibits a smooth decrease of performance, at every taxonomic level, when the sequence length is decreased. PMID:25916734
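The k-mer representation that this method starts from treats each sequence as a "document" whose "words" are overlapping fixed-length substrings. The Python sketch below shows only that counting step, under the assumption of a simple sliding window; the function names `kmer_counts` and `kmer_frequencies` are hypothetical, and the topic-model training with LDA described in the abstract is not reproduced here.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count overlapping k-mers of length `k` in `seq` (sliding window)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def kmer_frequencies(seq, k):
    """Relative k-mer frequencies; empty if the sequence is shorter than k."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}

# Toy sequence for illustration only.
counts = kmer_counts("ACGTAC", 2)
freqs = kmer_frequencies("ACGTAC", 2)
```

Stacking such count vectors for many sequences gives a document-term matrix, the usual input format for topic-model or SVM libraries.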
A complete Neandertal mitochondrial genome sequence determined by high-throughput sequencing
Green, Richard E.; Malaspinas, Anna-Sapfo; Krause, Johannes; Briggs, Adrian W.; Johnson, Philip L. F.; Uhler, Caroline; Meyer, Matthias; Good, Jeffrey M.; Maricic, Tomislav; Stenzel, Udo; Prüfer, Kay; Siebauer, Michael; Burbano, Hernán A.; Ronan, Michael; Rothberg, Jonathan M.; Egholm, Michael; Rudan, Pavao; Brajković, Dejana; Kućan, Željko; Gušić, Ivan; Wikström, Mårten; Laakkonen, Liisa; Kelso, Janet; Slatkin, Montgomery; Pääbo, Svante
2008-01-01
Summary A complete mitochondrial (mt) genome sequence was reconstructed from a 38,000-year-old Neandertal individual using 8,341 mtDNA sequences identified among 4.8 Gb of DNA generated from ~0.3 grams of bone. Analysis of the assembled sequence unequivocally establishes that the Neandertal mtDNA falls outside the variation of extant human mtDNAs and allows an estimate of the divergence date between the two mtDNA lineages of 660,000±140,000 years. Of the 13 proteins encoded in the mtDNA, subunit 2 of cytochrome c oxidase of the mitochondrial electron transport chain has experienced the largest number of amino acid substitutions in human ancestors since the separation from Neandertals. There is evidence that purifying selection in the Neandertal mtDNA was reduced compared to other primate lineages suggesting that the effective population size of Neandertals was small. PMID:18692465
Xin, Min; Zhang, Peipei; Liu, Wenwen; Ren, Yingdang; Cao, Mengji; Wang, Xifeng
2017-10-01
The complete nucleotide sequence of a novel positive single-stranded (+ss) RNA virus, tentatively named watermelon virus A (WVA), was determined using a combination of three methods: RNA sequencing, small RNA sequencing, and Sanger sequencing. The full genome of WVA is comprised of 8,372 nucleotides (nt), excluding the poly (A) tail, and contains four open reading frames (ORFs). The largest ORF, ORF1 encodes a putative replication-associated polyprotein (RP) with three conserved domains. ORF2 and ORF4 encode a movement protein (MP) and coat protein (CP), respectively. The putative product encoded by ORF3, of an estimated molecular mass of 25 kDa, has no significant similarity with other proteins. Identity and phylogenetic analysis indicate that WVA is a new virus, closely related to members of the family Betaflexiviridae. However, the final taxonomic allocation of WVA within the family is yet to be determined.
Krzeminska, Urszula; Wilson, Robyn; Rahman, Sadequr; Song, Beng Kah; Seneviratne, Sampath; Gan, Han Ming; Austin, Christopher M
2016-07-01
The complete mitochondrial genomes of two jungle crows (Corvus macrorhynchos) were sequenced. DNA was extracted from tissue samples obtained from shed feathers collected in the field in Sri Lanka and sequenced using the Illumina MiSeq Personal Sequencer. Jungle crow mitogenomes have a structural organization typical of the genus Corvus and are 16,927 bp and 17,066 bp in length, both comprising 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal subunit genes, and a non-coding control region. In addition, we complement already available house crow (Corvus splendens) mitogenome resources by sequencing an individual from Singapore. A phylogenetic tree constructed from Corvidae family mitogenome sequences available on GenBank is presented. We confirm the monophyly of the genus Corvus and propose to use complete mitogenome resources for further intra- and interspecies genetic studies.
Gutiérrez, Pablo A; Alzate, Juan F; Montoya, Mauricio Marín
2015-06-01
Transcriptome analysis of a Cape gooseberry (Physalis peruviana) plant with leaf symptoms of a mild yellow mosaic typical of a viral disease revealed an infection with Potato virus X (PVX). The genome sequence of the PVX-Physalis isolate comprises 6435 nt and exhibits higher sequence similarity to members of the Eurasian group of PVX (~95 %) than to the American group (~77 %). Genome organization is similar to other PVX isolates with five open reading frames coding for proteins RdRp, TGBp1, TGBp2, TGBp3, and CP. 5' and 3' untranslated regions revealed all regulatory motifs typically found in PVX isolates. The PVX-Physalis genome is the only complete sequence available for a Potexvirus in Colombia and is a new addition to the restricted number of available sequences of PVX isolates infecting plant species different to potato.
Arrebola, Eva; Carrión, Víctor J.; Gutiérrez-Barranquero, José Antonio; Pérez-García, Alejandro; Ramos, Cayo; Cazorla, Francisco M.; de Vicente, Antonio
2015-01-01
The genomes of more than 100 Pseudomonas syringae strains have been sequenced to date; however, only a few of them have been fully assembled, including P. syringae pv. syringae B728a. Different strains of pv. syringae cause different diseases and have different host specificities; so, UMAF0158 is a P. syringae pv. syringae strain related to B728a but instead of being a bean pathogen it causes apical necrosis of mango trees, and the two strains belong to different phylotypes of pv. syringae and clades of P. syringae. In this study we report the complete sequence and annotation of P. syringae pv. syringae UMAF0158 chromosome and plasmid pPSS158. A comparative analysis with the available sequenced genomes of 25 other P. syringae strains, both closed (the reference genomes DC3000, 1448A and B728a) and draft genomes was performed. The 5.8 Mb UMAF0158 chromosome has 59.3% GC content and comprises 5017 predicted protein-coding genes. Bioinformatics analysis revealed the presence of genes potentially implicated in the virulence and epiphytic fitness of this strain. We identified several genetic features, which are absent in B728a, that may explain the ability of UMAF0158 to colonize and infect mango trees: the mangotoxin biosynthetic operon mbo, a gene cluster for cellulose production, two different type III and two type VI secretion systems, and a particular T3SS effector repertoire. A mutant strain defective in the rhizobial-like T3SS Rhc showed no differences compared to wild-type during its interaction with host and non-host plants and worms. Here we report the first complete sequence of the chromosome of a pv. syringae strain pathogenic to a woody plant host. Our data also shed light on the genetic factors that possibly determine the pathogenic and epiphytic lifestyle of UMAF0158. This work provides the basis for further analysis on specific mechanisms that enable this strain to infect woody plants and for the functional analysis of host specificity in the P. syringae complex. PMID:26313942
Ors, Suna; Inci, Ercan; Turkay, Rustu; Kokurcan, Atilla; Hocaoglu, Elif
2017-12-01
To compare the efficacy of three-dimensional SPACE (sampling perfection with application-optimized contrasts using different flip-angle evolutions) and CISS (constructive interference in steady state) sequences in the imaging of the cisternal segments of cranial nerves V-XII. Temporal MRI scans from 50 patients (F:M ratio, 27:23; mean age, 44.5±15.9 years) admitted to our hospital with vertigo, tinnitus, and hearing loss were retrospectively analyzed. All patients had both CISS and SPACE sequences. Quantitative analysis of SPACE and CISS sequences was performed by measuring the ventricle-to-parenchyma contrast-to-noise ratio (CNR). Qualitative analysis of differences in visualization capability, image quality, and severity of artifacts was also conducted. A score ranging from 'no artifact' to 'severe artifacts and unreadable' was used for the assessment of artifacts, and from 'not visualized' to 'completely visualized' for the assessment of image quality. The distribution of variables was controlled by the Kolmogorov-Smirnov test. Samples t-test and McNemar's test were used to determine statistical significance. Rates of visualization of posterior fossa cranial nerves in cases of complete visualization were as follows: nerve V (100% for both sequences), nerve VI (94% in SPACE, 86% in CISS sequences), nerves VII-VIII (100% for both sequences), IX-XI nerve complex (96%, 88%); nerve XII (58%, 46%) (p<0.05). SPACE sequences showed fewer artifacts than CISS sequences (p<0.002). Copyright © 2017 Elsevier B.V. All rights reserved.
A NASTRAN primer for the analysis of rotating flexible blades
NASA Technical Reports Server (NTRS)
Lawrence, Charles; Aiello, Robert A.; Ernst, Michael A.; Mcgee, Oliver G.
1987-01-01
This primer provides documentation for using MSC NASTRAN in analyzing rotating flexible blades. The analysis of these blades includes geometrically nonlinear (large displacement) analysis under centrifugal loading, and frequency and mode shape (normal modes) determination. The geometrically nonlinear analysis using NASTRAN Solution sequence 64 is discussed along with the determination of frequencies and mode shapes using Solution Sequence 63. A sample problem with the complete NASTRAN input data is included. Items unique to rotating blade analyses, such as setting angle and centrifugal softening effects are emphasized.
Wen, Chiu-Ming
2017-08-01
An aquabirnavirus was isolated from diseased marbled eels (Anguilla marmorata; MEIPNV1310) with gill haemorrhages and associated mortality. Its genome segment sequences were obtained through next-generation sequencing and compared with published aquabirnavirus sequences. The results indicated that the genome sequence of MEIPNV1310 contains segment A (3099 nucleotides) and segment B (2789 nucleotides). Phylogenetic analysis showed that MEIPNV1310 is closely related to the infectious pancreatic necrosis Ab strain within genogroup II. This genome sequence is beneficial for studying the geographic distribution and evolution of aquabirnaviruses.
Ruhlman, Tracey; Lee, Seung-Bum; Jansen, Robert K; Hostetler, Jessica B; Tallon, Luke J; Town, Christopher D; Daniell, Henry
2006-01-01
Background Carrot (Daucus carota) is a major food crop in the US and worldwide. Its capacity for storage and its lifecycle as a biennial make it an attractive species for the introduction of foreign genes, especially for oral delivery of vaccines and other therapeutic proteins. Until recently efforts to express recombinant proteins in carrot have had limited success in terms of protein accumulation in the edible tap roots. Plastid genetic engineering offers the potential to overcome this limitation, as demonstrated by the accumulation of BADH in chromoplasts of carrot taproots to confer exceedingly high levels of salt resistance. The complete plastid genome of carrot provides essential information required for genetic engineering. Additionally, the sequence data add to the rapidly growing database of plastid genomes for assessing phylogenetic relationships among angiosperms. Results The complete carrot plastid genome is 155,911 bp in length, with 115 unique genes and 21 duplicated genes within the IR. There are four ribosomal RNAs, 30 distinct tRNA genes and 18 intron-containing genes. Repeat analysis reveals 12 direct and 2 inverted repeats ≥ 30 bp with a sequence identity ≥ 90%. Phylogenetic analysis of nucleotide sequences for 61 protein-coding genes using both maximum parsimony (MP) and maximum likelihood (ML) were performed for 29 angiosperms. Phylogenies from both methods provide strong support for the monophyly of several major angiosperm clades, including monocots, eudicots, rosids, asterids, eurosids II, euasterids I, and euasterids II. Conclusion The carrot plastid genome contains a number of dispersed direct and inverted repeats scattered throughout coding and non-coding regions. This is the first sequenced plastid genome of the family Apiaceae and only the second published genome sequence of the species-rich euasterid II clade. 
Both MP and ML trees provide very strong support (100% bootstrap) for the sister relationship of Daucus with Panax in the euasterid II clade. These results provide the best taxon sampling of complete chloroplast genomes and the strongest support yet for the sister relationship of Caryophyllales to the asterids. The availability of the complete plastid genome sequence should facilitate improved transformation efficiency and foreign gene expression in carrot through utilization of endogenous flanking sequences and regulatory elements. PMID:16945140
"First generation" automated DNA sequencing technology.
Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M
2011-10-01
Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.
Complete genome sequence of the European sheatfish virus.
Mavian, Carla; López-Bueno, Alberto; Fernández Somalo, María Pilar; Alcamí, Antonio; Alejo, Alí
2012-06-01
Viral diseases are an increasing threat to the thriving aquaculture industry worldwide. An emerging group of fish pathogens is formed by several ranaviruses, which have been isolated at different locations from freshwater and seawater fish species since 1985. We report the complete genome sequence of European sheatfish ranavirus (ESV), the first ranavirus isolated in Europe, which causes high mortality rates in infected sheatfish (Silurus glanis) and in other species. Analysis of the genome sequence shows that ESV belongs to the amphibian-like ranaviruses and is closely related to the epizootic hematopoietic necrosis virus (EHNV), a disease agent geographically confined to the Australian continent and notifiable to the World Organization for Animal Health.
Complete Genome Sequence of the European Sheatfish Virus
Mavian, Carla; López-Bueno, Alberto; Fernández Somalo, María Pilar; Alcamí, Antonio
2012-01-01
Viral diseases are an increasing threat to the thriving aquaculture industry worldwide. An emerging group of fish pathogens is formed by several ranaviruses, which have been isolated at different locations from freshwater and seawater fish species since 1985. We report the complete genome sequence of European sheatfish ranavirus (ESV), the first ranavirus isolated in Europe, which causes high mortality rates in infected sheatfish (Silurus glanis) and in other species. Analysis of the genome sequence shows that ESV belongs to the amphibian-like ranaviruses and is closely related to the epizootic hematopoietic necrosis virus (EHNV), a disease agent geographically confined to the Australian continent and notifiable to the World Organization for Animal Health. PMID:22570241
The complete mitochondrial genome of the stonefly Dinocras cephalotes (Plecoptera, Perlidae).
Elbrecht, Vasco; Poettker, Lisa; John, Uwe; Leese, Florian
2015-06-01
The complete mitochondrial genome of the perlid stonefly Dinocras cephalotes (Curtis, 1827) was sequenced using a combined 454 and Sanger sequencing approach using the known sequence of Pteronarcys princeps Banks, 1907 (Pteronarcyidae), to identify homologous 454 reads. The genome is 15,666 bp in length and includes 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and a control region. Gene order resembles that of basal arthropods. The base composition of the genome is A (33.5%), T (29.0%), C (24.4%) and G (13.1%). This is the second published mitogenome for the order Plecoptera and will be useful in future phylogenetic analysis.
The complete chloroplast genomes of two Wisteria species, W. floribunda and W. sinensis (Fabaceae).
Kim, Na-Rae; Kim, Kyunghee; Lee, Sang-Choon; Lee, Jung-Hoon; Cho, Seong-Hyun; Yu, Yeisoo; Kim, Young-Dong; Yang, Tae-Jin
2016-11-01
Wisteria floribunda and Wisteria sinensis are ornamental woody vines in the Fabaceae. The complete chloroplast genome sequences of the two species were generated by de novo assembly using whole genome next generation sequences. The chloroplast genomes of W. floribunda and W. sinensis were 130 960 bp and 130 561 bp long, respectively, and showed inverted repeat (IR)-lacking structures like those reported for the IRLC in the Fabaceae. The chloroplast genomes of both species contained the same number of protein-coding sequences (77), tRNA genes (30), and rRNA genes (4). The phylogenetic analysis with the reported chloroplast genomes confirmed the close taxonomic relationship of W. floribunda and W. sinensis.
Complete amino acid sequence of the myoglobin from the Pacific sei whale, Balaenoptera borealis.
Jones, B N; Rothgeb, T M; England, R D; Gurd, F R
1979-04-25
The complete amino acid sequence of the major component myoglobin from Pacific sei whale, Balaenoptera borealis, was determined by specific cleavage of the protein to obtain large peptides which are readily degraded by the automatic sequencer. The acetimidated apomyoglobin was selectively cleaved at its two methionyl residues with cyanogen bromide and at its three arginyl residues by trypsin. From the sequence analysis of four of these peptides and the apomyoglobin, over 75% of the covalent structure of the protein was obtained. The remainder of the primary structure was determined by the sequence analysis of peptides that resulted from further digestion of the amino-terminal and central cyanogen bromide fragments. The amino-terminal fragment was specifically cleaved at its two tryptophanyl residues with N-chlorosuccinimide and the central cyanogen bromide fragment was cleaved at its glutamyl residues with staphylococcal protease and at its single tyrosyl residue with N-bromosuccinimide. The primary structure of this myoglobin proved identical with that from the gray whale but differs from that of the finback whale at four positions, from that of the minke whale at three positions and from the myoglobin of the humpback whale at one position. The above sequence identities and differences reflect the close taxonomic relationship of these five species of Cetacea.
Bhattacharyya, Anamitra; Stilwagen, Stephanie; Reznik, Gary; Feil, Helene; Feil, William S; Anderson, Iain; Bernal, Axel; D'Souza, Mark; Ivanova, Natalia; Kapatral, Vinayak; Larsen, Niels; Los, Tamara; Lykidis, Athanasios; Selkov, Eugene; Walunas, Theresa L; Purcell, Alexander; Edwards, Rob A; Hawkins, Trevor; Haselkorn, Robert; Overbeek, Ross; Kyrpides, Nikos C; Predki, Paul F
2002-10-01
Draft sequencing is a rapid and efficient method for determining the near-complete sequence of microbial genomes. Here we report a comparative analysis of one complete and two draft genome sequences of the phytopathogenic bacterium, Xylella fastidiosa, which causes serious disease in plants, including citrus, almond, and oleander. We present highlights of an in silico analysis based on a comparison of reconstructions of core biological subsystems. Cellular pathway reconstructions have been used to identify a small number of genes, which are likely to reside within the draft genomes but are not captured in the draft assembly. These represented only a small fraction of all genes and were predominantly large and small ribosomal subunit protein components. By using this approach, some of the inherent limitations of draft sequence can be significantly reduced. Despite the incomplete nature of the draft genomes, it is possible to identify several phage-related genes, which appear to be absent from the draft genomes and not the result of insufficient sequence sampling. This region may therefore identify potential host-specific functions. Based on this first functional reconstruction of a phytopathogenic microbe, we spotlight an unusual respiration machinery as a potential target for biological control. We also predicted and developed a new defined growth medium for Xylella.
Complete genomic sequence of a Tobacco rattle virus isolate from Michigan-grown potatoes.
Crosslin, James M; Hamm, Philip B; Kirk, William W; Hammond, Rosemarie W
2010-04-01
Tobacco rattle virus (TRV) causes stem mottle on potato leaves and necrotic arcs and rings in potato tubers, known as corky ringspot disease. Recently, TRV was reported in Michigan potato tubers cv. FL1879 exhibiting corky ringspot disease. Sequence analysis of the RNA-1-encoded 16-kDa gene of the Michigan isolate, designated MI-1, revealed homology to TRV isolates from Florida and Washington. Here, we report the complete genomic sequence of RNA-1 (6,791 nt) and RNA-2 (3,685 nt) of TRV MI-1. RNA-1 is predicted to contain four open reading frames, and the genome structure and phylogenetic analyses of the RNA-1 nucleotide sequence revealed significant homologies to the known sequences of other TRV-1 isolates. The relationships based on the full-length nucleotide sequence were different from those based on the 16-kDa gene encoded on genomic RNA-1 and reflect sequence variation within a 20-25-aa residue region of the 16-kDa protein. MI-1 RNA-2 is predicted to contain three ORFs, encoding the coat protein (CP), a 37.6-kDa protein (ORF 2b), and a 33.6-kDa protein (ORF 2c). In addition, it contains a region of similarity to the 3' terminus of RNA-1, including a truncated portion of the 16-kDa cistron. Phylogenetic analysis of RNA-2, based on a comparison of nucleotide sequences with other members of the genus Tobravirus, indicates that TRV MI-1 and other North American isolates cluster as a distinct group. TRV MI-1 is only the second North American isolate for which there is a complete sequence of the genome, and it is distinct from the North American isolate TRV ORY. The relationship of the TRV MI-1 isolate to other tobravirus isolates is discussed.
Paterson, Andrew H.; Wang, Xuelin; Xu, Yiqing; Wu, Dongyang; Qu, Yanshu; Jiang, Anna; Ye, Qiaolin
2016-01-01
Cotton is one of the most important economic crops, the primary source of natural fiber, and an important protein source for animal feed. The complete nuclear and chloroplast (cp) genome sequences of G. raimondii are already available, but the mitochondrial genome sequence is not. Here, we assembled the complete mitochondrial (mt) DNA sequence of G. raimondii into a circular genome of 676,078 bp and performed comparative analyses with other higher plants. The genome contains 39 protein-coding genes, 6 rRNA genes, and 25 tRNA genes. We also identified four larger repeats (63.9 kb, 10.6 kb, 9.1 kb, and 2.5 kb) in this mt genome, which may be active in intramolecular recombination in the evolution of cotton. Strikingly, nearly all of the G. raimondii mt genome has been transferred to the nucleus on Chr1, and the transfer event must be very recent. Phylogenetic analysis reveals that G. raimondii, as a member of Malvaceae, is much closer to another cotton (G. barbadense) than other rosids, and the clade formed by two Gossypium species is sister to Brassicales. The G. raimondii mt genome may provide a crucial foundation for evolutionary analysis, molecular biology, and cytoplasmic male sterility in cotton and other higher plants. PMID:27847816
Ming, De-Song; Chen, Qing-Qing; Chen, Xiao-Tin
2018-05-14
To clarify the resistance mechanisms of Pannonibacter phragmitetus 31801, isolated from the blood of a liver abscess patient, at the genomic level, we performed whole genomic sequencing using a PacBio RS II single-molecule real-time long-read sequencer. Bioinformatic analysis of the resulting sequence was then carried out to identify any possible resistance genes. Analyses included Basic Local Alignment Search Tool searches against the Antibiotic Resistance Genes Database, ResFinder analysis of the genome sequence, and Resistance Gene Identifier analysis within the Comprehensive Antibiotic Resistance Database. Prophages, clustered regularly interspaced short palindromic repeats (CRISPR), and other putative virulence factors were also identified using PHAST, CRISPRfinder, and the Virulence Factors Database, respectively. The circular chromosome and single plasmid of P. phragmitetus 31801 contained multiple antibiotic resistance genes, including those coding for three different types of β-lactamase [NPS β-lactamase (EC 3.5.2.6), β-lactamase class C, and a metal-dependent hydrolase of β-lactamase superfamily I]. In addition, genes coding for subunits of several multidrug-resistance efflux pumps were identified, including those targeting macrolides (adeJ, cmeB), tetracycline (acrB, adeAB), fluoroquinolones (acrF, ceoB), and aminoglycosides (acrD, amrB, ceoB, mexY, smeB). However, apart from the tripartite macrolide efflux pump macAB-tolC, the genome did not appear to contain the complete complement of subunit genes required for production of most of the major multidrug-resistance efflux pumps.
Wang, Xiaodan; Ma, Dehong; Huang, Xinwei; Li, Lihua; Li, Duo; Zhao, Yujiao; Qiu, Lijuan; Pan, Yue; Chen, Junying; Xi, Juemin; Shan, Xiyun; Sun, Qiangming
2017-06-15
In the past few decades, dengue has spread rapidly and is an emerging disease in China. An unexpected dengue outbreak occurred in Xishuangbanna, Yunnan, China, resulting in 1331 patients in 2013. This study aimed to obtain the complete genome information and to perform mutation and evolutionary analysis of the causative agent of this largest outbreak of dengue fever. The viruses were isolated by cell culture and evaluated by genome sequence analysis. Phylogenetic trees were then constructed by Neighbor-Joining methods (MEGA6.0), followed by analysis of nucleotide mutation and amino acid substitution. Analysis of the diversity of secondary structure of the E and NS1 proteins was also performed. Selection pressures acting on the coding sequences were then estimated with PAML software. The complete genome sequences of two isolated strains (YNSW1, YNSW2) were 10,710 and 10,702 nucleotides in length, respectively. Phylogenetic analysis revealed that both strains were classified as genotype II of DENV-3. The results indicated that both isolated strains from Xishuangbanna in 2013 and the Laos 2013 strains (KF816161.1, KF816158.1, LC147061.1, LC147059.1, KF816162.1) were most similar to Bangladesh (AY496873.2) in 2002. Compared with DENV-3SS (H87), 62 amino acid substitutions were identified in translated regions, and 38 amino acid substitutions were identified in translated regions compared with the DENV-3 genotype II strain Bangladesh (AY496873.2). 27 (YNSW1) or 28 (YNSW2) single nucleotide changes were observed in structural protein sequences, with 7 (YNSW1) or 8 (YNSW2) non-synonymous mutations compared with AY496873.2. Of them, 4 non-synonymous mutations were identified in E protein sequences (2 in the β-sheet, 2 in the coil). Meanwhile, 117 (YNSW1) or 115 (YNSW2) single nucleotide changes were observed in non-structural protein sequences, with 31 (YNSW1) or 30 (YNSW2) non-synonymous mutations. In particular, 14 single nucleotide changes were observed in NS1 sequences, with 4/14 non-synonymous substitutions (4 in the coil). Selection pressure analysis revealed no positive selection in the amino acid sites of the genes encoding structural and non-structural proteins. This study may help understand the intrinsic geographical relatedness of dengue virus 3 and contributes further to research on its infectivity, pathogenicity and vaccine development. Copyright © 2017 Elsevier B.V. All rights reserved.
Zhu, Ruo-Lin; Zhang, Qi-Ya
2014-04-01
Paralichthys olivaceus rhabdovirus (PORV), which is associated with high mortality rates in flounder, was isolated in China in 2005. Here, we provide an annotated sequence record of PORV, the genome of which comprises 11,182 nucleotides and contains six genes in the order 3'-N-P-M-G-NV-L-5'. Phylogenetic analysis based on glycoprotein sequences of PORV and other rhabdoviruses showed that PORV clusters with viral haemorrhagic septicemia virus (VHSV), genus Novirhabdovirus, family Rhabdoviridae. Further phylogenetic analysis of the combined amino acid sequences of six proteins of PORV and VHSV strains showed that PORV clusters with Korean strains and is closely related to Asian strains, all of which were isolated from flounder. In a comparison in which the sequences of the six proteins were combined, PORV shared the highest identity (98.3 %) with VHSV strain KJ2008 from Korea.
Brzeziński, K; Janowski, R; Podkowiński, J; Jaskólski, M
2001-01-01
The coding sequences of two S-adenosyl-L-homocysteine hydrolases (SAHases) were identified in yellow lupine by screening of a cDNA library. One of them, corresponding to the complete protein, was sequenced and compared with 52 other SAHase sequences. Phylogenetic analysis of these proteins identified three groups of the enzymes. Group A comprises only bacterial sequences. Group B is subdivided into two subgroups, one of which (B1) is formed by animal sequences. Subgroup B2 consists of two distinct clusters, B2a and B2b. Cluster B2b comprises all known plant sequences, including the yellow lupine enzyme, which are distinguished by a 50-residue insert. Group C is heterogeneous and contains SAHases from Archaea as well as a new class of animal enzymes, distinctly different from those in group B1.
MIPS: a database for protein sequences and complete genomes.
Mewes, H W; Hani, J; Pfeiffer, F; Frishman, D
1998-01-01
The MIPS group [Munich Information Center for Protein Sequences of the German National Center for Environment and Health (GSF)] at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, is involved in a number of data collection activities, including a comprehensive database of the yeast genome, a database reflecting the progress in sequencing the Arabidopsis thaliana genome, the systematic analysis of other small genomes and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). Through its WWW server (http://www.mips.biochem.mpg.de ) MIPS provides access to a variety of generic databases, including a database of protein families as well as automatically generated data by the systematic application of sequence analysis algorithms. The yeast genome sequence and its related information was also compiled on CD-ROM to provide dynamic interactive access to the 16 chromosomes of the first eukaryotic genome unraveled. PMID:9399795
Sherry, Norelle L.; Porter, Jessica L.; Seemann, Torsten; Watkins, Andrew; Stinear, Timothy P.
2013-01-01
Next-generation sequencing (NGS) of bacterial genomes has recently become more accessible and is now available to the routine diagnostic microbiology laboratory. However, questions remain regarding its feasibility, particularly with respect to data analysis in nonspecialist centers. To test the applicability of NGS to outbreak investigations, Ion Torrent sequencing was used to investigate a putative multidrug-resistant Escherichia coli outbreak in the neonatal unit of the Mercy Hospital for Women, Melbourne, Australia. Four suspected outbreak strains and a comparator strain were sequenced. Genome-wide single nucleotide polymorphism (SNP) analysis demonstrated that the four neonatal intensive care unit (NICU) strains were identical and easily differentiated from the comparator strain. Genome sequence data also determined that the NICU strains belonged to multilocus sequence type 131 and carried the blaCTX-M-15 extended-spectrum beta-lactamase. Comparison of the outbreak strains to all publicly available complete E. coli genome sequences showed that they clustered with neonatal meningitis and uropathogenic isolates. The turnaround time from a positive culture to the completion of sequencing (prior to data analysis) was 5 days, and the cost was approximately $300 per strain (for the reagents only). The main obstacles to a mainstream adoption of NGS technologies in diagnostic microbiology laboratories are currently cost (although this is decreasing), a paucity of user-friendly and clinically focused bioinformatics platforms, and a lack of genomics expertise outside the research environment. Despite these hurdles, NGS technologies provide unparalleled high-resolution genotyping in a short time frame and are likely to be widely implemented in the field of diagnostic microbiology in the next few years, particularly for epidemiological investigations (replacing current typing methods) and the characterization of resistance determinants. 
Clinical microbiologists need to familiarize themselves with these technologies and their applications. PMID:23408689
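The genome-wide SNP comparison described in this abstract can be illustrated with a minimal Python sketch that counts pairwise differences between aligned sequences. The strain names and sequences below are made-up toy data, not data from the study, and the function names are hypothetical. Real analyses work from read mapping and variant calling, not from short pre-aligned strings.

```python
from itertools import combinations

def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned sequences carry different unambiguous bases."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a.upper(), b.upper())
        if x != y and x in "ACGT" and y in "ACGT"
    )

def pairwise_snp_distances(aligned: dict) -> dict:
    """Return the SNP distance for every unordered pair of named sequences."""
    return {
        (m, n): snp_distance(aligned[m], aligned[n])
        for m, n in combinations(sorted(aligned), 2)
    }

# Toy alignment (hypothetical): two identical strains and one comparator.
aligned = {
    "NICU-1": "ACGTACGT",
    "NICU-2": "ACGTACGT",
    "comparator": "ACCTACGA",
}
dist = pairwise_snp_distances(aligned)
for pair, d in dist.items():
    print(pair, d)
```

In this toy case the two "NICU" sequences have a distance of zero and each differs from the comparator at two positions, mirroring the pattern of identical outbreak strains and a distinguishable comparator.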
Wang, Guohong; Xiong, Yao; Xu, Qi; Yin, Jia; Hao, Yanling
2015-11-20
Lactobacillus paracasei CAUH35 was isolated from homemade koumiss, a traditional fermented dairy product with beneficial effects on human health. The genome consists of a circular 2,770,411 bp chromosome and four plasmids. Genome analysis revealed the presence of gene clusters involved in the production of exopolysaccharides and bacteriocin. The complete genome sequence of L. paracasei CAUH35 will provide a genetic basis for further comparative and functional genomic analyses. Copyright © 2015. Published by Elsevier B.V.
The complete chloroplast genome of Aconitum chiisanense Nakai (Ranunculaceae).
Lim, Chae Eun; Kim, Goon-Bo; Baek, Seunghoon; Han, Su-Min; Yu, Hee-Ju; Mun, Jeong-Hwan
2017-01-01
We determined the complete chloroplast DNA sequence of Aconitum chiisanense Nakai, a rare Aconitum species endemic to Korea. The chloroplast genome is 155 934 bp in length and contains 4 rRNA, 30 tRNA, and 78 protein-coding genes. Phylogenetic analysis revealed that the chloroplast genome of A. chiisanense is closely related to that of A. barbatum var. puberulum. Sequence comparison with other Ranunculaceae chloroplasts identified a unique deletion in the rps16 gene of A. chiisanense chloroplast DNA that can serve as a molecular marker for species identification.
Baxter, Laura L; Hsu, Benjamin J; Umayam, Lowell; Wolfsberg, Tyra G; Larson, Denise M; Frith, Martin C; Kawai, Jun; Hayashizaki, Yoshihide; Carninci, Piero; Pavan, William J
2007-06-01
As part of the RIKEN mouse encyclopedia project, two cDNA libraries were prepared from melanocyte-derived cell lines, using techniques of full-length clone selection and subtraction/normalization to enrich for rare transcripts. End sequencing showed that these libraries display over 83% complete coding sequence at the 5' end and 96-97% complete coding sequence at the 3' end. Evaluation of the libraries, derived from B16F10Y tumor cells and melan-c cells, revealed that they contain clones for a majority of the genes previously demonstrated to function in melanocyte biology. Analysis of genomic locations for transcripts revealed that the distribution of melanocyte genes is non-random throughout the genome. Three genomic regions identified that showed significant clustering of melanocyte-expressed genes contain one or more genes previously shown to regulate melanocyte development or function. A catalog of genes expressed in these libraries is presented, providing a valuable resource of cDNA clones and sequence information that can be used for identification of new genes important for melanocyte development, function, and disease.
Bruce, A. Gregory; Ryan, Jonathan T.; Thomas, Mathew J.; Peng, Xinxia; Grundhoff, Adam; Tsai, Che-Chung
2013-01-01
The complete sequence of retroperitoneal fibromatosis-associated herpesvirus Macaca nemestrina (RFHVMn), the pig-tailed macaque homolog of Kaposi's sarcoma-associated herpesvirus (KSHV), was determined by next-generation sequence analysis of a Kaposi's sarcoma (KS)-like macaque tumor. Colinearity of genes was observed with the KSHV genome, and the core herpesvirus genes had strong sequence homology to the corresponding KSHV genes. RFHVMn lacked homologs of open reading frame 11 (ORF11) and KSHV ORFs K5 and K6, which appear to have been generated by duplication of ORFs K3 and K4 after the divergence of KSHV and RFHV. RFHVMn contained positional homologs of all other unique KSHV genes, although some showed limited sequence similarity. RFHVMn contained a number of candidate microRNA genes. Although there was little sequence similarity with KSHV microRNAs, one candidate contained the same seed sequence as the positional homolog, kshv-miR-K12-10a, suggesting functional overlap. RNA transcript splicing was highly conserved between RFHVMn and KSHV, and strong sequence conservation was noted in specific promoters and putative origins of replication, predicting important functional similarities. Sequence comparisons indicated that RFHVMn and KSHV developed in long-term synchrony with the evolution of their hosts, and both viruses phylogenetically group within the RV1 lineage of Old World primate rhadinoviruses. RFHVMn is the closest homolog of KSHV to be completely sequenced and the first sequenced RV1 rhadinovirus homolog of KSHV from a nonhuman Old World primate. The strong genetic and sequence similarity between RFHVMn and KSHV, coupled with similarities in biology and pathology, demonstrate that RFHVMn infection in macaques offers an important and relevant model for the study of KSHV in humans. PMID:24109218
Li, Puyuan; Huang, Yong; Yu, Lan; Liu, Yannan; Niu, Wenkai; Zou, Dayang; Liu, Huiying; Zheng, Jing; Yin, Xiuyun; Yuan, Jing; Yuan, Xin; Bai, Changqing
2017-09-01
Heteroresistance is a phenomenon in which there are various responses to antibiotics from bacterial cells within the same population. Here, we isolated and characterised an imipenem heteroresistant Acinetobacter baumannii strain (HRAB-85). The genome of strain HRAB-85 was completely sequenced and analysed to understand its antibiotic resistance mechanisms. Population analysis and multilocus sequence typing were performed. Subpopulations grew in the presence of imipenem at concentrations of up to 64μg/mL, and the strain was found to belong to ST208. The total length of strain HRAB-85 was 4,098,585bp with a GC content of 39.98%. The genome harboured at least four insertion sequences: the common ISAba1, ISAba22, ISAba24, and newly reported ISAba26. Additionally, 19 antibiotic-resistance genes against eight classes of antimicrobial agents were found, and 11 genomic islands (GIs) were identified. Among them, GI3, GI10, and GI11 contained many ISs and antibiotic-resistance determinants. The existence of imipenem heteroresistant phenotypes in A. baumannii was substantiated in this hospital, and imipenem pressure, which could induce imipenem-heteroresistant subpopulations, may select for highly resistant strains. The complete genome sequencing and bioinformatics analysis of HRAB-85 could improve our understanding of the epidemiology and resistance mechanisms of carbapenem-heteroresistant A. baumannii. Copyright © 2017. Published by Elsevier Ltd.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fields, C.A.
1996-06-01
The objective of this project is the development of practical software to automate the identification of genes in anonymous DNA sequences from the human, and other higher eukaryotic genomes. A software system for automated sequence analysis, gm (gene modeler) has been designed, implemented, tested, and distributed to several dozen laboratories worldwide. A significantly faster, more robust, and more flexible version of this software, gm 2.0 has now been completed, and is being tested by operational use to analyze human cosmid sequence data. A range of efforts to further understand the features of eukaryotic gene sequences are also underway. This progress report also contains papers coming out of the project including the following: gm: a Tool for Exploratory Analysis of DNA Sequence Data; The Human THE-LTR(O) and MstII Interspersed Repeats are subfamilies of a single widely distributed highly variable repeat family; Information contents and dinucleotide compositions of plant intron sequences vary with evolutionary origin; Splicing signals in Drosophila: intron size, information content, and consensus sequences; Integration of automated sequence analysis into mapping and sequencing projects; Software for the C. elegans genome project.
Cho, Myong-Suk; Hyun Cho, Chung; Yeon Kim, Su; Su Yoon, Hwan; Kim, Seung-Chul
2016-09-01
The complete chloroplast genome sequence of the wild flowering cherry, Prunus yedoensis Matsum., which is native and endemic to Jeju Island, Korea, is reported in this study. The genome is 157 786 bp in length with 36.7% GC content, and is composed of an LSC region of 85 908 bp, an SSC region of 19 120 bp and two IR copies of 26 379 bp each. The cp genome contains 131 genes, including 86 coding genes, 8 rRNA genes and 37 tRNA genes. A maximum likelihood analysis was conducted to verify the phylogenetic position of the newly sequenced cp genome of P. yedoensis using 11 representatives of complete cp genome sequences within the family Rosaceae. The genus Prunus exhibited monophyly and the result of the phylogenetic relationship agreed with the previous phylogenetic analyses within Rosaceae.
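The region sizes reported in this abstract can be checked against the total genome length, because a quadripartite chloroplast genome is the sum of the LSC, the SSC and the two IR copies. A minimal Python sketch using the figures from the abstract (the function name is illustrative only):

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a chloroplast genome with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir

# Region sizes in bp as reported for P. yedoensis in the abstract.
total = quadripartite_length(lsc=85_908, ssc=19_120, ir=26_379)
print(total)  # 157786
```

The result matches the reported genome size of 157 786 bp.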
Complete mitogenome sequencing and phylogenetic analysis of PaLi yak (Bos grunniens).
Bao, Pengjia; Guo, Xian; Pei, Jie; Liang, Chunnian; Ding, Xuezhi; Min, Chu; Wang, Hongbo; Wu, Xiaoyun; Yan, Ping
2016-11-01
PaLi yak is a very important local breed in China; as a year-round grazing animal, it plays a very important role in the economy and for native herdsmen. The complete mitochondrial DNA of PaLi yak was sequenced in this study; the total length is 16,324 bp, containing 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes and a non-coding control region (D-loop region). The order and composition are similar to most of the other vertebrates. The base contents are: 33.72% A, 25.80% C, 13.21% G and 27.27% T; A + T (60.99%) was higher than G + C (39.01%). The phylogenetic relationships were analyzed using the complete mitogenome sequence; the results showed that the genetic relationship between yak and cattle is distinct. This information provides useful data for further study on protection of genetic resources and the taxonomy of Bovinae.
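Base composition figures such as those quoted for the PaLi yak mitogenome are simple per-base percentages, and the A + T and G + C values follow by addition. The Python sketch below is only an illustration. The helper names are hypothetical, and the percentages are copied from the abstract, not recomputed from the sequence itself.

```python
from collections import Counter

def base_composition(seq: str) -> dict:
    """Percentage of A, C, G and T among the unambiguous bases of a sequence."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}

def at_gc_content(comp: dict) -> tuple:
    """Return (A + T, G + C) from a per-base percentage table."""
    return comp["A"] + comp["T"], comp["G"] + comp["C"]

# Percentages as reported in the abstract.
reported = {"A": 33.72, "C": 25.80, "G": 13.21, "T": 27.27}
at, gc = at_gc_content(reported)
print(round(at, 2), round(gc, 2))  # 60.99 39.01
```

Applied to a full sequence, `base_composition` would give the per-base table directly, for example `base_composition("AACG")` gives 50% A, 25% C, 25% G and 0% T.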
Liu, Tianyu; Liang, Yinan; Zhong, Xiuqin; Wang, Ning; Hu, Dandan; Zhou, Xuan; Gu, Xiaobin; Peng, Xuerong; Yang, Guangyou
2014-01-01
Dirofilaria immitis (heartworm) is the causative agent of an important zoonotic disease that is spread by mosquitoes. In this study, molecular and phylogenetic characterization of D. immitis were performed based on complete ND1 and 16S rDNA gene sequences, which provided the foundation for more advanced molecular diagnosis, prevention, and control of heartworm diseases. The mutation rate and evolutionary divergence in adult heartworm samples from seven dogs in western China were analyzed to obtain information on genetic diversity and variability. Phylogenetic relationships were inferred using both maximum parsimony (MP) and Bayes methods based on the complete gene sequences. The results suggest that D. immitis formed an independent monophyletic group in which the 16S rDNA gene has mutated more rapidly than has ND1. PMID:24639299
Huang, Ya-Yi; Matzke, Antonius J. M.; Matzke, Marjori
2013-01-01
Coconut, a member of the palm family (Arecaceae), is one of the most economically important trees used by mankind. Despite its diverse morphology, coconut is recognized taxonomically as only a single species (Cocos nucifera L.). There are two major coconut varieties, tall and dwarf, the latter of which displays traits resulting from selection by humans. We report here the complete chloroplast (cp) genome of a dwarf coconut plant, and describe the gene content and organization, inverted repeat fluctuations, repeated sequence structure, and occurrence of RNA editing. Phylogenetic relationships of monocots were inferred based on 47 chloroplast protein-coding genes. Potential nodes for events of gene duplication and pseudogenization related to inverted repeat fluctuation were mapped onto the tree using parsimony criteria. We compare our findings with those from other palm species for which complete cp genome sequences are available. PMID:24023703
Shen, Kang-Ning; Yen, Ta-Chi; Chen, Ching-Hung; Ye, Jeng-Jia; Hsiao, Chung-Der
2016-05-01
In this study, the complete mitogenome sequence of the cryptic "lineage B" big-fin reef squid, Sepioteuthis lessoniana (Cephalopoda: Loliginidae) has been determined by a next-generation sequencing method. The assembled mitogenome, consisting of 16,694 bp, includes 13 protein-coding genes, 25 transfer RNAs, and 2 ribosomal RNA genes. The overall base composition of "lineage B" S. lessoniana is 36.7% for A, 18.9% for C, 34.5% for T and 9.8% for G and shows 90% identity to "lineage C" S. lessoniana. It also exhibits high T + A content (71.2%) and two non-coding regions with TA tandem repeats. The complete mitogenome of the cryptic "lineage B" S. lessoniana provides essential and important DNA molecular data for further phylogeography and evolutionary analysis for big-fin reef squid species complex.
Hsiao, Chung-Der; Shen, Kang-Ning; Ching, Tzu-Yun; Wang, Ya-Hsien; Ye, Jeng-Jia; Tsai, Shiou-Yi; Wu, Shan-Chun; Chen, Ching-Hung; Wang, Chia-Hui
2016-07-01
In this study, the complete mitogenome sequence of the cryptic "lineage A" big-fin reef squid, Sepioteuthis lessoniana (Cephalopoda: Loliginidae) has been determined by a next-generation sequencing method. The assembled mitogenome consists of 16,605 bp, which includes 13 protein-coding genes, 22 transfer RNAs, and 2 ribosomal RNA genes. The overall base composition of "lineage A" S. lessoniana is 37.5% for A, 17.4% for C, 9.1% for G, and 35.9% for T and shows 87% identity to "lineage C" S. lessoniana. It is also notable for its high T + A content (73.4%) and two non-coding regions with TA tandem repeats. The complete mitogenome of the cryptic "lineage A" S. lessoniana provides essential and important DNA molecular data for further phylogeography and evolutionary analysis for big-fin reef squid species complex.
Complete genome sequence of Menghai rhabdovirus, a novel mosquito-borne rhabdovirus from China.
Sun, Qiang; Zhao, Qiumin; An, Xiaoping; Guo, Xiaofang; Zuo, Shuqing; Zhang, Xianglilan; Pei, Guangqian; Liu, Wenli; Cheng, Shi; Wang, Yunfei; Shu, Peng; Mi, Zhiqiang; Huang, Yong; Zhang, Zhiyi; Tong, Yigang; Zhou, Hongning; Zhang, Jiusong
2017-04-01
Menghai rhabdovirus (MRV) was isolated from Aedes albopictus in Menghai county of Yunnan Province, China, in August 2010. Whole-genome sequencing of MRV was performed using an Ion PGM™ Sequencer. We found that MRV is a single-stranded, negative-sense RNA virus. The complete genome of MRV has 10,744 nt, with short inverted repeat termini, encoding five typical rhabdovirus proteins (N, P, M, G, and L) and an additional small hypothetical protein. Nucleotide BLAST analysis using the BLASTn method showed that the genome sequence most similar to that of MRV is that of Arboretum virus (NC_025393.1), with a Max score of 322, query coverage of 14%, and 66% identity. Genomic and phylogenetic analyses both demonstrated that MRV should be considered a member of a novel species of the family Rhabdoviridae.
NASA Technical Reports Server (NTRS)
Wheeler, Ward C.
2003-01-01
The problem of determining the minimum cost hypothetical ancestral sequences for a given cladogram is known to be NP-complete (Wang and Jiang, 1994). Traditionally, point estimations of hypothetical ancestral sequences have been used to gain heuristic, upper bounds on cladogram cost. These include procedures with such diverse approaches as non-additive optimization of multiple sequence alignment, direct optimization (Wheeler, 1996), and fixed-state character optimization (Wheeler, 1999). A method is proposed here which, by extending fixed-state character optimization, replaces the estimation process with a search. This form of optimization examines a diversity of potential state solutions for cost-efficient hypothetical ancestral sequences and can result in greatly more parsimonious cladograms. Additionally, such an approach can be applied to other NP-complete phylogenetic optimization problems such as genomic break-point analysis. © 2003 The Willi Hennig Society. Published by Elsevier Science (USA). All rights reserved.
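As a rough illustration of the fixed-state idea that the abstract builds on, the sketch below restricts candidate ancestral states to the observed leaf sequences and runs a Sankoff-style dynamic programme over a cladogram. The nested-tuple tree encoding, the unit-cost edit distance, and the function names are assumptions made for this example; it shows only the baseline optimization, not the search-based extension the paper proposes.

```python
import math

def edit_distance(a, b):
    """Unit-cost edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def fixed_state_cost(tree):
    """Minimum cladogram cost when ancestors may only take observed leaf states.

    `tree` is a nested tuple whose leaves are sequence strings
    (a hypothetical encoding chosen for this sketch).
    """
    leaves = []

    def collect(node):
        if isinstance(node, str):
            leaves.append(node)
        else:
            for child in node:
                collect(child)

    collect(tree)
    states = sorted(set(leaves))
    dist = {(s, t): edit_distance(s, t) for s in states for t in states}

    def cost_vector(node):
        # Leaves are fixed to their observed state.
        if isinstance(node, str):
            return {s: (0 if s == node else math.inf) for s in states}
        child_vectors = [cost_vector(child) for child in node]
        # For each candidate state, add the cheapest transition into each child.
        return {
            s: sum(min(dist[s, t] + vec[t] for t in states) for vec in child_vectors)
            for s in states
        }

    return min(cost_vector(tree).values())

example_tree = (("ACGT", "ACGA"), ("TCGT", "TCGA"))
print(fixed_state_cost(example_tree))
```

Because the state set is limited to observed sequences, the result is an upper bound on the true minimum cost, which is the sense in which the abstract calls such procedures heuristic.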
Cloning and sequence analysis of Hemonchus contortus HC58cDNA.
Muleke, Charles I; Ruofeng, Yan; Lixin, Xu; Xinwen, Bo; Xiangrui, Li
2007-06-01
The complete coding sequence of Hemonchus contortus HC58cDNA was generated by rapid amplification of cDNA ends and polymerase chain reaction using primers based on the 5' and 3' ends of the parasite mRNA, accession no. AF305964. The HC58cDNA gene was 851 bp long, with an open reading frame of 717 bp encoding a 239-amino-acid precursor of an approximately 27 kDa protein. Analysis of the amino acid sequence revealed conserved cysteine, histidine and asparagine residues, an occluding loop pattern, a hemoglobinase motif and the glutamine of the oxyanion hole, all characteristic of cathepsin B-like proteases (CBL). Comparison of the predicted amino acid sequences showed that the protein shared 33.5-58.7% identity with cathepsin B homologues in the papain clan CA family (family C1). Phylogenetic analysis revealed close evolutionary proximity of the protein sequence to counterpart sequences in the CBL, suggesting that HC58cDNA was a member of the papain family.
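The figures quoted above can be cross-checked with simple arithmetic: a 717 bp reading frame corresponds to 717 / 3 = 239 codons. The sketch below also estimates the mass using the common rule of thumb of about 110 Da per residue, which is an assumption of this example and not a value taken from the abstract.

```python
AVG_RESIDUE_MASS_DA = 110  # rule-of-thumb average residue mass (assumption)

def orf_summary(orf_bp):
    """Return (residue count, approximate mass in kDa) for an ORF length in bp."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    residues = orf_bp // 3
    mass_kda = residues * AVG_RESIDUE_MASS_DA / 1000
    return residues, mass_kda

residues, mass_kda = orf_summary(717)
print(residues, round(mass_kda, 1))
```

The estimate lands close to the approximately 27 kDa reported, which suggests the reported lengths and mass are mutually consistent.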
Kraková, Lucia; Šoltys, Katarína; Budiš, Jaroslav; Grivalský, Tomáš; Ďuriš, František; Pangallo, Domenico; Szemes, Tomáš
2016-09-01
Different protocols based on Illumina high-throughput DNA sequencing and denaturing gradient gel electrophoresis (DGGE)-cloning were developed and applied for investigating hot spring related samples. The study was focused on three target genes: archaeal and bacterial 16S rRNA and mcrA of methanogenic microflora. Shorter read lengths of the currently most popular technology of sequencing by Illumina do not allow analysis of the complete 16S rRNA region, or of longer gene fragments, as was the case of Sanger sequencing. Here, we demonstrate that there is no need for special indexed or tailed primer sets dedicated to short variable regions of 16S rRNA since the presented approach allows the analysis of complete bacterial 16S rRNA amplicons (V1-V9) and longer archaeal 16S rRNA and mcrA sequences. Sample augmented with transposon is represented by a set of approximately 300 bp long fragments that can be easily sequenced by Illumina MiSeq. Furthermore, a low proportion of chimeric sequences was observed. DGGE-cloning based strategies were performed combining semi-nested PCR, DGGE and clone library construction. Comparing both investigation methods, a certain degree of complementarity was observed confirming that the DGGE-cloning approach is not obsolete. Novel protocols were created for several types of laboratories, utilizing the traditional DGGE technique or using the most modern Illumina sequencing.
Equid herpesvirus 8: Complete genome sequence and association with abortion in mares
Garvey, Marie; Suárez, Nicolás M.; Kerr, Karen; Hector, Ralph; Moloney-Quinn, Laura; Arkins, Sean; Davison, Andrew J.
2018-01-01
Equid herpesvirus 8 (EHV-8), formerly known as asinine herpesvirus 3, is an alphaherpesvirus that is closely related to equid herpesviruses 1 and 9 (EHV-1 and EHV-9). The pathogenesis of EHV-8 is relatively little studied and to date has only been associated with respiratory disease in donkeys in Australia and horses in China. A single EHV-8 genome sequence has been generated for strain Wh in China, but is apparently incomplete and contains frameshifts in two genes. In this study, the complete genome sequences of four EHV-8 strains isolated in Ireland between 2003 and 2015 were determined by Illumina sequencing. Two of these strains were isolated from cases of abortion in horses, and were misdiagnosed initially as EHV-1, and two were isolated from donkeys, one with neurological disease. The four genome sequences are very similar to each other, exhibiting greater than 98.4% nucleotide identity, and their phylogenetic clustering together demonstrated that genomic diversity is not dependent on the host. Comparative genomic analysis revealed 24 of the 76 predicted protein sequences are completely conserved among the Irish EHV-8 strains. Evolutionary comparisons indicate that EHV-8 is phylogenetically closer to EHV-9 than it is to EHV-1. In summary, the first complete genome sequences of EHV-8 isolates from two host species over a twelve year period are reported. The current study suggests that EHV-8 can cause abortion in horses. The potential threat of EHV-8 to the horse industry and the possibility that donkeys may act as reservoirs of infection warrant further investigation. PMID:29414990
Azhar, Esam I; Hashem, Anwar M; El-Kafrawy, Sherif A; Abol-Ela, Said; Abd-Alla, Adly M M; Sohrab, Sayed Sartaj; Farraj, Suha A; Othman, Norah A; Ben-Helaby, Huda G; Ashshi, Ahmed; Madani, Tariq A; Jamjoom, Ghazi
2015-01-16
Dengue viruses (DENVs) are mosquito-borne viruses which can cause disease ranging from mild fever to severe dengue infection. These viruses are endemic in several tropical and subtropical regions. Multiple outbreaks of DENV serotypes 1, 2 and 3 (DENV-1, DENV-2 and DENV-3) have been reported from the western region of Saudi Arabia since 1994. Strains from at least two genotypes of DENV-1 (Asia and America/Africa genotypes) circulated in western Saudi Arabia until 2006. However, all previous studies reported from Saudi Arabia were based on partial sequencing data of the envelope (E) gene, without any reports of full genome sequences for any DENV serotypes circulating in Saudi Arabia. Here, we report the isolation and the first complete genome sequence of a DENV-1 strain (DENV-1-Jeddah-1-2011) isolated from a patient from Jeddah, Saudi Arabia in 2011. Whole genome sequence alignment and phylogenetic analysis showed high similarity between the DENV-1-Jeddah-1-2011 strain and the D1/H/IMTSSA/98/606 isolate (Asian genotype) reported from Djibouti in 1998. Further analysis of the full envelope gene revealed a close relationship between the DENV-1-Jeddah-1-2011 strain and isolates reported between 2004 and 2006 from Jeddah, as well as recent isolates from Somalia, suggesting that the Asian genotype is widespread in this region. These data suggest that strains belonging to the Asian genotype might have been introduced into Saudi Arabia long before 2004, most probably by African pilgrims, and continued to circulate in western Saudi Arabia at least until 2011. Most importantly, these results indicate that pilgrims from dengue endemic regions can play an important role in the spread of new DENVs in Saudi Arabia and the rest of the world. Therefore, availability of complete genome sequences would serve as a reference for future epidemiological studies of DENV-1 viruses.
Kang, Sang-Ho; Lee, Jeong-Hoon; Lee, Hyun Oh; Ahn, Byoung Ohg; Won, So Youn; Sohn, Seong-Han; Kim, Jung Sun
2017-10-06
Glycyrrhiza uralensis and G. glabra, members of the Fabaceae, are medicinally important species that are native to Asia and Europe. Extracts from these plants are widely used as natural sweeteners because of their much greater sweetness than sucrose. In this study, the three complete chloroplast genomes and five 45S nuclear ribosomal (nr)DNA sequences of these two licorice species and an interspecific hybrid are presented. The chloroplast genomes of G. glabra, G. uralensis and G. glabra × G. uralensis were 127,895 bp, 127,716 bp and 127,939 bp, respectively. The three chloroplast genomes harbored 110 annotated genes, including 76 protein-coding genes, 30 tRNA genes and 4 rRNA genes. The 45S nrDNA sequences were either 5,947 or 5,948 bp in length. Glycyrrhiza glabra and G. glabra × G. uralensis showed two types of nrDNA, while G. uralensis contained a single type. The complete 45S nrDNA sequence unit contains 18S rRNA, ITS1, 5.8S rRNA, ITS2 and 26S rRNA. We identified simple sequence repeat and tandem repeat sequences. We also developed four reliable markers for analysis of Glycyrrhiza diversity authentication.
Isolation and Complete Genome Sequencing of Bluetongue Virus Serotype 12 from India.
Rao, P P; Reddy, Y V; Hegde, N R
2015-10-01
Bluetongue virus (BTV) causes disease mainly in sheep, but can be transmitted via other domestic and wild ruminants, resulting in pecuniary burden and trade restrictions. Segmented genome with the possibility of reassortment, existence of 26 serotypes, geographical restriction in the distribution of many of the serotypes, use of live attenuated vaccines and the lack of complete sequences of viruses isolated from several parts of the globe have complicated our understanding of the origin, movement and distribution of BTV. Recent efforts in genome sequencing of several strains have helped in better comprehending BTV epidemiology. In an effort to contribute to the genetic epidemiology of BTV in India, we report the isolation and complete genome sequencing of a BTV serotype 12 virus (designated NMO1). This is the first BTV-12 isolated from India and the second BTV-12 to be sequenced worldwide. The analysis of sequences of this virus suggests that NMO1 derived its segments from viruses belonging to western topotype viruses, as well as those from South-East Asia and India. The results have implications for understanding the origin, emergence/re-emergence and movement of BTV as well as for the development of vaccines and diagnostics based on robust epidemiological data. © 2013 Blackwell Verlag GmbH.
Enriching public descriptions of marine phages using the Genomic Standards Consortium MIGS standard
Duhaime, Melissa Beth; Kottmann, Renzo; Field, Dawn; Glöckner, Frank Oliver
2011-01-01
In any sequencing project, the possible depth of comparative analysis is determined largely by the amount and quality of the accompanying contextual data. The structure, content, and storage of this contextual data should be standardized to ensure consistent coverage of all sequenced entities and facilitate comparisons. The Genomic Standards Consortium (GSC) has developed the “Minimum Information about Genome/Metagenome Sequences (MIGS/MIMS)” checklist for the description of genomes, and here we annotate all 30 publicly available marine bacteriophage sequences to the MIGS standard. These annotations build on existing International Nucleotide Sequence Database Collaboration (INSDC) records and confirm, as expected, that current submissions lack most MIGS fields. MIGS fields were manually curated from the literature and placed in XML format as specified by the Genomic Contextual Data Markup Language (GCDML). These “machine-readable” reports were then analyzed to highlight patterns describing this collection of genomes. Completed reports are provided in GCDML. This work represents one step towards the annotation of our complete collection of genome sequences and shows the utility of capturing richer metadata along with raw sequences. PMID:21677864
Lee, Eun ho; Song, Min-Suk; Shin, Jin-Young; Lee, Young-Min; Kim, Chul-Joong; Lee, Young Sik; Kim, Hyunggee; Choi, Young Ki
2007-09-01
Complete nucleotide sequences of two avian metapneumoviruses (aMPV), designated PL-1 and PL-2, were isolated from pheasants, revealing novel sequences of the first aMPV to be fully sequenced in Korea. The complete genome of both PL-1 and PL-2 was composed of 13,170 nucleotides. Phylogenetic analysis revealed that PL-1 belonged to aMPV subtype C, sharing higher homology in deduced amino acid sequence identities with hMPV, rather than with aMPV subtypes A and B. Replication of PL-1 in experimentally re-infected pheasants was confirmed by reverse transcription (RT)-polymerase chain reaction (PCR). Chickens and mice were experimentally inoculated with PL-1 to test the replication potential of PL-1 in other species. Although one specimen from the nasal turbinates of an inoculated chicken showed a slight trace of viral replication at 3 days post-infection (dpi), all of the infected mice were negative for aMPV by RT-PCR throughout the experiment, suggesting that PL-1 does not readily infect mammals. This is the first report of the isolation and complete genomic sequence of aMPV subtype C originating from pheasants.
Wu, L-P; Yang, T; Liu, H-W; Postman, J; Li, R
2018-05-01
A large contig with sequence similarities to several nucleorhabdoviruses was identified by high-throughput sequencing analysis from a black currant (Ribes nigrum L.) cultivar. The complete genome sequence of this new nucleorhabdovirus is 14,432 nucleotides long. Its genomic organization is very similar to those of unsegmented plant rhabdoviruses, containing six open reading frames in the order 3'-N-P-P3-M-G-L-5'. The virus, which is provisionally named "black currant-associated rhabdovirus", is 41-52% identical in its genome nucleotide sequence to other nucleorhabdoviruses and may represent a new species in the genus Nucleorhabdovirus.
Gene Identification Algorithms Using Exploratory Statistical Analysis of Periodicity
NASA Astrophysics Data System (ADS)
Mukherjee, Shashi Bajaj; Sen, Pradip Kumar
2010-10-01
Studying periodic patterns would be expected to be a standard line of attack for recognizing DNA sequence features in gene identification and similar problems, yet surprisingly little significant work has been done in this direction. This paper studies the statistical properties of DNA sequences of complete genomes using a new technique. A DNA sequence is converted to a numeric sequence using various types of mappings, and a standard Fourier technique is applied to study the periodicity. Distinct statistical behaviour of the periodicity parameters is found in coding and non-coding sequences, which can be used to distinguish between these parts. Here, DNA sequences of Drosophila melanogaster were analyzed with significant accuracy.
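The general approach described above (map DNA to numbers, then apply a Fourier transform) can be sketched as follows. The abstract does not say which mappings were used, so this example assumes the simple binary indicator mapping, with one 0/1 sequence per base, and evaluates the Fourier sum only at a chosen period. The function names are hypothetical, and this is not the authors' method.

```python
import cmath

def indicator(seq, base):
    """Binary indicator sequence: 1.0 where `base` occurs, else 0.0."""
    return [1.0 if c == base else 0.0 for c in seq.upper()]

def power_at_period(seq, period):
    """Summed spectral power of the four indicator sequences at frequency 1/period."""
    n = len(seq)
    total = 0.0
    for base in "ACGT":
        x = indicator(seq, base)
        # Discrete Fourier sum evaluated at the single frequency 1/period.
        coeff = sum(v * cmath.exp(-2j * cmath.pi * k / period) for k, v in enumerate(x))
        total += abs(coeff) ** 2
    return total / n

toy = "ATG" * 30  # artificial sequence with exact period 3
print(power_at_period(toy, 3) > power_at_period(toy, 4))
```

Protein-coding regions are widely reported to show elevated power at period 3 because of codon structure, which is one way a periodicity statistic could help separate coding from non-coding sequence.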
Ferles, Christos; Beaufort, William-Scott; Ferle, Vanessa
2017-01-01
The present study devises mapping methodologies and projection techniques that visualize and demonstrate biological sequence data clustering results. The Sequence Data Density Display (SDDD) and Sequence Likelihood Projection (SLP) visualizations represent the input symbolical sequences in a lower-dimensional space in such a way that the clusters and relations of data elements are depicted graphically. Both operate in combination/synergy with the Self-Organizing Hidden Markov Model Map (SOHMMM). The resulting unified framework is in position to analyze automatically and directly raw sequence data. This analysis is carried out with little, or even complete absence of, prior information/domain knowledge.
Cheng, Hui; Li, Jinfeng; Zhang, Hong; Cai, Binhua; Gao, Zhihong
2017-01-01
Compared with other members of the family Rosaceae, the chloroplast genomes of Fragaria species exhibit low variation, and this situation has limited phylogenetic analyses; thus, complete chloroplast genome sequencing of Fragaria species is needed. In this study, we sequenced the complete chloroplast genome of F. × ananassa ‘Benihoppe’ using the Illumina HiSeq 2500-PE150 platform and then performed a combination of de novo assembly and reference-guided mapping of contigs to generate complete chloroplast genome sequences. The chloroplast genome exhibits a typical quadripartite structure with a pair of inverted repeats (IRs, 25,936 bp) separated by large (LSC, 85,531 bp) and small (SSC, 18,146 bp) single-copy (SC) regions. The length of the F. × ananassa ‘Benihoppe’ chloroplast genome is 155,549 bp, representing the smallest Fragaria chloroplast genome observed to date. The genome encodes 112 unique genes, comprising 78 protein-coding genes, 30 tRNA genes and four rRNA genes. Comparative analysis of the overall nucleotide sequence identity among ten complete chloroplast genomes confirmed that for both coding and non-coding regions in Rosaceae, SC regions exhibit higher sequence variation than IRs. The Ka/Ks ratio of most genes was less than 1, suggesting that most genes are under purifying selection. Moreover, the mVISTA results also showed a high degree of conservation in genome structure, gene order and gene content in Fragaria, particularly among three octoploid strawberries which were F. × ananassa ‘Benihoppe’, F. chiloensis (GP33) and F. virginiana (O477). However, when the sequences of the coding and non-coding regions of F. × ananassa ‘Benihoppe’ were compared in detail with those of F. chiloensis (GP33) and F. virginiana (O477), a number of SNPs and InDels were revealed by MEGA 7. 
Six non-coding regions (trnK-matK, trnS-trnG, atpF-atpH, trnC-petN, trnT-psbD and trnP-psaJ) with a percentage of variable sites greater than 1% and no less than five parsimony-informative sites were identified and may be useful for phylogenetic analysis of the genus Fragaria. PMID:29038765
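The region sizes reported for the ‘Benihoppe’ chloroplast genome can be checked against its total length. In a quadripartite plastome the inverted repeat is present twice, so the total is LSC + SSC + 2 × IR. The helper name below is hypothetical; the numbers are those given in the abstract.

```python
def plastome_length(lsc_bp, ssc_bp, ir_bp):
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc_bp + ssc_bp + 2 * ir_bp

total = plastome_length(lsc_bp=85_531, ssc_bp=18_146, ir_bp=25_936)
print(total)  # 155549, matching the reported genome length
```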
Anthony Johnson, A M; Borah, B K; Sai Gopal, D V R; Dasgupta, I
2012-12-01
Citrus yellow mosaic badna virus (CMBV), a member of the family Caulimoviridae, genus Badnavirus, is the causative agent of mosaic disease among Citrus species in southern India. Despite its reported prevalence in several citrus species, functional genomic information and full-length genome sequences for all CMBV isolates infecting citrus species are not available in publicly accessible databases. CMBV isolates from Rough Lemon and Sweet Orange collected from a nursery were cloned and sequenced. The analysis revealed high sequence homology of the two CMBV isolates with previously reported CMBV sequences, implying that they represent new variants. Based on computational analysis of the predicted secondary structures, the possible functions of some CMBV proteins have been analyzed.
2009-01-01
Background Parthenium argentatum (guayule) is an industrial crop that produces latex, which was recently commercialized as a source of latex rubber safe for people with Type I latex allergy. The complete plastid genome of P. argentatum was sequenced. The sequence provides important information useful for genetic engineering strategies. Comparison to the sequences of plastid genomes from three other members of the Asteraceae, Lactuca sativa, Guizotia abyssinica and Helianthus annuus revealed details of the evolution of the four genomes. Chloroplast-specific DNA barcodes were developed for identification of Parthenium species and lines. Results The complete plastid genome of P. argentatum is 152,803 bp. Based on the overall comparison of individual protein coding genes with those in L. sativa, G. abyssinica and H. annuus, we demonstrate that the P. argentatum chloroplast genome sequence is most closely related to that of H. annuus. Similar to chloroplast genomes in G. abyssinica, L. sativa and H. annuus, the plastid genome of P. argentatum has a large 23 kb inversion with a smaller 3.4 kb inversion, within the large inversion. Using the matK and psbA-trnH spacer chloroplast DNA barcodes, three of the four Parthenium species tested, P. tomentosum, P. hysterophorus and P. schottii, can be differentiated from P. argentatum. In addition, we identified lines within P. argentatum. Conclusion The genome sequence of the P. argentatum chloroplast will enrich the sequence resources of plastid genomes in commercial crops. The availability of the complete plastid genome sequence may facilitate transformation efficiency by using the precise sequence of endogenous flanking sequences and regulatory elements in chloroplast transformation vectors. The DNA barcoding study forms the foundation for genetic identification of commercially significant lines of P. argentatum that are important for producing latex. PMID:19917140
Maurino, Fernanda; Dumón, Analía D; Llauger, Gabriela; Alemandri, Vanina; de Haro, Luis A; Mattio, M Fernanda; Del Vas, Mariana; Laguna, Irma Graciela; Giménez Pecci, María de la Paz
2018-01-01
A rhabdovirus infecting maize and wheat crops in Argentina was molecularly characterized. Through next-generation sequencing (NGS) of symptomatic leaf samples, the complete genome was obtained of two isolates of maize yellow striate virus (MYSV), a putative new rhabdovirus, differing by only 0.4% at the nucleotide level. The MYSV genome consists of 12,654 nucleotides for maize and wheat virus isolates, and shares 71% nucleotide sequence identity with the complete genome of barley yellow striate mosaic virus (BYSMV, NC028244). Ten open reading frames (ORFs) were predicted in the MYSV genome from the antigenomic strand and were compared with their BYSMV counterparts. The highest amino acid sequence identity of the MYSV and BYSMV proteins was 80% between the L proteins, and the lowest was 37% between the proteins 4. Phylogenetic analysis suggested that the MYSV isolates are new members of the genus Cytorhabdovirus, family Rhabdoviridae. Yellow striate, affecting maize and wheat crops in Argentina, is an emergent disease that presents a potential economic risk for these widely distributed crops.
Complete genomic sequence of a tobacco rattle virus isolate from Michigan-grown potatoes
USDA-ARS's Scientific Manuscript database
Tobacco rattle virus (TRV) causes stem mottle on potato leaves and necrotic arcs and rings in potato tubers, known as corky ringspot disease. Recently, TRV was reported in Michigan potato tubers cv. FL1879 exhibiting corky ringspot disease. Sequence analysis of the RNA-1-encoded 16 kDa gene of the...
Williams, Emma L; Bagg, Eleanor A L; Mueller, Michael; Vandrovcova, Jana; Aitman, Timothy J; Rumsby, Gill
2015-01-01
Definitive diagnosis of primary hyperoxaluria (PH) currently utilizes sequential Sanger sequencing of the AGXT, GRHPR, and HOGA1 genes, but its efficacy is unproven. This analysis is time-consuming and relatively expensive, and delays in diagnosis and inappropriate treatment can occur if it is not pursued early in the diagnostic work-up. We reviewed testing outcomes of Sanger sequencing in 200 consecutive patient samples referred for analysis. In addition, the Illumina Truseq custom amplicon system was evaluated for parallel next-generation sequencing (NGS) of AGXT, GRHPR, and HOGA1 in 90 known PH patients. AGXT sequencing was requested in all patients, permitting a diagnosis of PH1 in 50%. All remaining patients underwent targeted exon sequencing of GRHPR and HOGA1, with 8% diagnosed with PH2 and 8% with PH3. Complete sequencing of both GRHPR and HOGA1 was not requested in 25% of patients referred, leaving their diagnosis in doubt. NGS analysis showed 98% agreement with Sanger sequencing and both approaches had 100% diagnostic specificity. Diagnostic sensitivity of Sanger sequencing was 98% and for NGS it was 97%. NGS has comparable diagnostic performance to Sanger sequencing for the diagnosis of PH and, if implemented, would screen for all forms of PH simultaneously, ensuring prompt diagnosis at decreased cost. PMID:25629080
Cao, Guojie; Allard, Marc; Hoffmann, Maria; Muruvanda, Tim; Luo, Yan; Payne, Justin; Meng, Kevin; Zhao, Shaohua; McDermott, Patrick; Brown, Eric; Meng, Jianghong
2018-06-01
Multidrug-resistant (MDR) plasmids play an important role in disseminating antimicrobial resistance genes. To elucidate the antimicrobial resistance gene compositions in A/C incompatibility complex (IncA/C) plasmids carried by animal-derived MDR Salmonella Newport, and to investigate the spread mechanism of IncA/C plasmids, this study characterizes the complete nucleotide sequences of IncA/C plasmids by comparative analysis. Complete nucleotide sequencing of plasmids and chromosomes of six MDR Salmonella Newport strains was performed using PacBio RSII. Open reading frames were assigned using prokaryotic genome annotation pipeline (PGAP). To understand genomic diversity and evolutionary relationships among Salmonella Newport IncA/C plasmids, we included three complete IncA/C plasmid sequences with similar backbones from Salmonella Newport and Escherichia coli: pSN254, pAM04528, and peH4H, and additional 200 draft chromosomes. With the exception of canine isolate CVM22462, which contained an additional IncI1 plasmid, each of the six MDR Salmonella Newport strains contained only the IncA/C plasmid. These IncA/C plasmids (including references) ranged in size from 80.1 (pCVM21538) to 176.5 kb (pSN254) and carried various resistance genes. Resistance genes floR, tetA, tetR, strA, strB, sul, and mer were identified in all IncA/C plasmids. Additionally, bla CMY-2 and sugE were present in all IncA/C plasmids, excepting pCVM21538. Plasmid pCVM22462 was capable of being transferred by conjugation. The IncI1 plasmid pCVM22462b in CVM22462 carried bla CMY-2 and sugE. Our data showed that MDR Salmonella Newport strains carrying similar IncA/C plasmids clustered together in the phylogenetic tree using chromosome sequences and the IncA/C plasmids from animal-derived Salmonella Newport contained diverse resistance genes. 
In the current study, we analyzed genomic diversities and phylogenetic relationships among MDR Salmonella Newport using complete plasmids and chromosome sequences and provided possible spread mechanism of IncA/C plasmids in Salmonella Newport Lineage II.
Raventós, D; Jensen, A B; Rask, M B; Casacuberta, J M; Mundy, J; San Segundo, B
1995-01-01
Transient gene expression assays in barley aleurone protoplasts were used to identify a cis-regulatory element involved in the elicitor-responsive expression of the maize PRms gene. Analysis of transcriptional fusions between PRms 5' upstream sequences and a chloramphenicol acetyltransferase reporter gene, as well as chimeric promoters containing PRms promoter fragments or repeated oligonucleotides fused to a minimal promoter, delineated a 20 bp sequence which functioned as an elicitor-response element (ERE). This sequence contains a motif (-246 AATTGACC) similar to sequences found in promoters of other pathogen-responsive genes. The analysis also indicated that an enhancing sequence(s) between -397 and -296 is required for full PRms activation by elicitors. The protein kinase inhibitor staurosporine was found to completely block the transcriptional activation induced by elicitors. These data indicate that protein phosphorylation is involved in the signal transduction pathway leading to PRms expression.
Ensser, Armin; Großkopf, Anna K; Mätz-Rensing, Kerstin; Roos, Christian; Hahn, Alexander S
2018-06-02
SFVmmu-DPZ9524 represents the third completely sequenced rhesus macaque simian foamy virus (SFV) isolate, alongside SFVmmu_K3T with a similar SFV-1-type env, and R289HybAGM with a SFV-2-like env. Sequence analysis demonstrates that, in gag and pol, SFVmmu-DPZ9524 is more closely related to R289HybAGM than to SFVmmu_K3T, which, outside of env, is more similar to a Japanese macaque isolate than to the other two rhesus macaque isolates SFVmmu-DPZ9524 and R289HybAGM. Further, we identify bel as another recombinant locus in R289HybAGM, confirming that recombination contributes to sequence diversity in SFV.
Genomic analysis of WCP30 Phage of Weissella cibaria for Dairy Fermented Foods.
Lee, Young-Duck; Park, Jong-Hyun
2017-01-01
In this study, we report the morphogenetic analysis and genome sequence of a new WCP30 phage of Weissella cibaria, isolated from a fermented food. Based on its morphology, as observed by transmission electron microscopy, WCP30 phage belongs to the family Siphoviridae. Genomic analysis of WCP30 phage showed that it had a 33,697-bp double-stranded DNA genome with 41.2% G+C content. Bioinformatics analysis of the genome revealed 35 open reading frames. A BLASTN search showed that WCP30 phage had low sequence similarity compared to other phages infecting lactic acid bacteria. This is the first report of the morphological features and complete genome sequence of WCP30 phage, which may be useful for controlling the fermentation of dairy foods.
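G+C content, as quoted above for the WCP30 genome, is the percentage of G and C bases among all called bases. A minimal sketch follows, assuming ambiguous symbols such as N are excluded from the denominator; the abstract does not state how such symbols were handled, and the function name is hypothetical.

```python
def gc_content(seq):
    """Percentage of G and C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / acgt

print(gc_content("ATGCGCNNAT"))  # toy sequence; N is ignored
```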
Open Reading Frame Phylogenetic Analysis on the Cloud
2013-01-01
Phylogenetic analysis has become essential in researching the evolutionary relationships between viruses. These relationships are depicted on phylogenetic trees, in which viruses are grouped based on sequence similarity. Viral evolutionary relationships are identified from open reading frames rather than from complete sequences. Recently, cloud computing has become popular for developing internet-based bioinformatics tools. Biocloud is an efficient, scalable, and robust bioinformatics computing service. In this paper, we propose a cloud-based open reading frame phylogenetic analysis service. The proposed service integrates the Hadoop framework, virtualization technology, and phylogenetic analysis methods to provide a high-availability, large-scale bioservice. In a case study, we analyze the phylogenetic relationships among Norovirus. Evolutionary relationships are elucidated by aligning different open reading frame sequences. The proposed platform correctly identifies the evolutionary relationships between members of Norovirus. PMID:23671843
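Since the service described above works from open reading frames, a small ORF scan illustrates the first step of such a pipeline. This sketch is an assumption-laden simplification: it scans only the forward strand, uses ATG as the sole start codon and the standard stop codons, and is unrelated to the Hadoop-based implementation in the paper.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end) half-open intervals of forward-strand ORFs.

    Each ORF runs from an ATG to the first in-frame stop codon, inclusive.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)

toy = "CCATGAAATGATT"
for begin, end in find_orfs(toy):
    print(begin, end, toy[begin:end])
```

In a full workflow the extracted ORF sequences would then be aligned and passed to a tree-building method; those steps are outside the scope of this sketch.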
Xiang, Yu; Bernardy, Mike; Bhagwat, Basdeo; Wiersma, Paul A; DeYoung, Robyn; Bouthillier, Michel
2015-02-01
Strawberry decline disease, probably caused by synergistic reactions of mixed virus infections, threatens the North American strawberry industry. Deep sequencing of strawberry plant samples from eastern Canada resulted in the identification of a new virus genome resembling poleroviruses in sequence and genome structure. Phylogenetic analysis suggests that it is a new member of the genus Polerovirus, family Luteoviridae. The virus is tentatively named "strawberry polerovirus 1" (SPV1).
Hahnke, Sarah; Abendroth, Christian; Langer, Thomas; Codoñer, Francisco M; Ramm, Patrice; Porcar, Manuel; Luschnig, Olaf; Klocke, Michael
2018-04-05
A new Ruminococcaceae bacterium, strain HV4-5-B5C, participating in the anaerobic digestion of grass, was isolated from a mesophilic two-stage laboratory-scale leach bed biogas system. The draft annotated genome sequence presented in this study and 16S rRNA gene sequence analysis indicated the affiliation of HV4-5-B5C with the family Ruminococcaceae outside recently described genera. Copyright © 2018 Hahnke et al.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Willis, Leslie G.; Siepp, Robyn; Stewart, Taryn M.
2005-08-01
The genome of the Trichoplusia ni single nucleopolyhedrovirus (TnSNPV), a group II NPV which infects the cabbage looper (T. ni), has been completely sequenced and analyzed. The TnSNPV DNA genome consists of 134,394 bp and has an overall G + C content of 39%. Gene analysis predicted 144 open reading frames (ORFs) of 150 nucleotides or greater that showed minimal overlap. Comparisons with previously sequenced baculoviruses indicate that 119 TnSNPV ORFs were homologues of previously reported viral gene sequences. Ninety-four TnSNPV ORFs returned an Autographa californica multiple NPV (AcMNPV) homologue while 25 ORFs returned poor or no sequence matches with the current databases. A putative photolyase gene was also identified that had highest amino acid identity to the photolyase genes of Chrysodeixis chalcites NPV (ChchNPV) (47%) and Danio rerio (zebrafish) (40%). In addition unlike all other baculoviruses no obvious homologous repeat (hr) sequences were identified. Comparison of the TnSNPV and AcMNPV genomes provides a unique opportunity to examine two baculoviruses that are highly virulent for a common insect host (T. ni) yet belong to diverse baculovirus taxonomic groups and possess distinct biological features. In vitro fusion assays demonstrated that the TnSNPV F protein induces membrane fusion and syncytia formation and were compared to syncytia formed by AcMNPV GP64.
Lo, Wen-Sui; Lin, Chan-Pin; Kuo, Chih-Horng
2013-01-01
Phytoplasmas are a group of bacteria that are associated with hundreds of plant diseases. Due to their economical importance and the difficulties involved in the experimental study of these obligate pathogens, genome sequencing and comparative analysis have been utilized as powerful tools to understand phytoplasma biology. To date four complete phytoplasma genome sequences have been published. However, these four strains represent limited phylogenetic diversity. In this study, we report the shotgun sequencing and evolutionary analysis of a peanut witches'-broom (PnWB) phytoplasma genome. The availability of this genome provides the first representative of the 16SrII group and substantially improves the taxon sampling to investigate genome evolution. The draft genome assembly contains 13 chromosomal contigs with a total size of 562,473 bp, covering ∼90% of the chromosome. Additionally, a complete plasmid sequence is included. Comparisons among the five available phytoplasma genomes reveal the differentiations in gene content and metabolic capacity. Notably, phylogenetic inferences of the potential mobile units (PMUs) in these genomes indicate that horizontal transfer may have occurred between divergent phytoplasma lineages. Because many effectors are associated with PMUs, the horizontal transfer of these transposon-like elements can contribute to the adaptation and diversification of these pathogens. In summary, the findings from this study highlight the importance of improving taxon sampling when investigating genome evolution. Moreover, the currently available sequences are inadequate to fully characterize the pan-genome of phytoplasmas. Future genome sequencing efforts to expand phylogenetic diversity are essential in improving our understanding of phytoplasma evolution. PMID:23626855
Ordeig, Laura; Garcia-Cehic, Damir; Gregori, Josep; Soria, Maria Eugenia; Nieto-Aponte, Leonardo; Perales, Celia; Llorens, Meritxell; Chen, Qian; Riveiro-Barciela, Mar; Buti, Maria; Esteban, Rafael; Esteban, Juan Ignacio; Rodriguez-Frias, Francisco; Quer, Josep
2018-01-01
Hepatitis C virus (HCV) is a highly divergent virus currently classified into seven major genotypes and 86 subtypes (ICTV, June 2017), which can have differing responses to therapy. Accurate genotyping/subtyping using high-resolution HCV subtyping enables confident subtype identification, identifies mixed infections and allows detection of new subtypes. During routine genotyping/subtyping, one sample from an Equatorial Guinea patient could not be classified into any of the subtypes. The complete genomic sequence was compared to reference sequences by phylogenetic and sliding window analysis. Resistance-associated substitutions (RASs) were assessed by deep sequencing. The unclassified HCV genome did not belong to any of the existing genotype 1 (G1) subtypes. Sliding window analysis along the complete genome ruled out recombination phenomena suggesting that it belongs to a new HCV G1 subtype. Two NS5A RASs (L31V+Y93H) were found to be naturally combined in the genome which could limit treatment possibilities in patients infected with this subtype.
Escherichia coli K-12: a cooperatively developed annotation snapshot—2005
Riley, Monica; Abe, Takashi; Arnaud, Martha B.; Berlyn, Mary K.B.; Blattner, Frederick R.; Chaudhuri, Roy R.; Glasner, Jeremy D.; Horiuchi, Takashi; Keseler, Ingrid M.; Kosuge, Takehide; Mori, Hirotada; Perna, Nicole T.; Plunkett, Guy; Rudd, Kenneth E.; Serres, Margrethe H.; Thomas, Gavin H.; Thomson, Nicholas R.; Wishart, David; Wanner, Barry L.
2006-01-01
The goal of this group project has been to coordinate and bring up-to-date information on all genes of Escherichia coli K-12. Annotation of the genome of an organism entails identification of genes, the boundaries of genes in terms of precise start and end sites, and description of the gene products. Known and predicted functions were assigned to each gene product on the basis of experimental evidence or sequence analysis. Since both kinds of evidence are constantly expanding, no annotation is complete at any moment in time. This is a snapshot analysis based on the most recent genome sequences of two E.coli K-12 bacteria. An accurate and up-to-date description of E.coli K-12 genes is of particular importance to the scientific community because experimentally determined properties of its gene products provide fundamental information for annotation of innumerable genes of other organisms. Availability of the complete genome sequence of two K-12 strains allows comparison of their genotypes and mutant status of alleles. PMID:16397293
Kim, Byoung-Jun; Kim, Ga-Na; Kim, Bo-Ram; Shim, Tae-Sun; Kook, Yoon-Hoh; Kim, Bum-Joon
2017-01-01
Recent multi locus sequence typing (MLST) and genome based studies indicate that lateral gene transfer (LGT) events in the rpoB gene are prevalent between Mycobacterium abscessus complex strains. To check the prevalence of the M. massiliense strains subject to rpoB LGT (Rec-mas), we applied rpoB typing (711 bp) to 106 Korean strains of M. massiliense infection that had already been identified by hsp65 sequence analysis (603 bp). The analysis indicated 6 smooth strains in M. massiliense Type I (10.0%, 6/60) genotypes but no strains in M. massiliense Type II genotypes (0%, 0/46), showing a discrepancy between the 2 typing methods. Further MLST analysis based on the partial sequencing of seven housekeeping genes, argH, cya, glpK, gnd, murC, pta and purH, as well as erm(41) PCR proved that these 6 Rec-mas strains consisted of two distinct genotypes belonging to M. massiliense and not M. abscessus. The complete rpoB sequencing analysis showed that these 6 Rec-mas strains have an identical hybrid rpoB gene, of which a 478 bp partial rpoB fragment may be laterally transferred from M. abscessus. Notably, five of the 6 Rec-mas strains showed complete identical sequences in a total of nine genes, including the seven MLST genes, hsp65, and rpoB, suggesting their clonal propagation in South Korea. In conclusion, we identified 6 M. massiliense smooth strains of 2 phylogenetically distinct genotypes with a specific hybrid rpoB gene laterally transferred from M. abscessus from Korean patients. Their clinical relevance and bacteriological traits remain to be elucidated. PMID:28604829
Tsuchiaka, Shinobu; Rahpaya, Sayed Samim; Otomaru, Konosuke; Aoki, Hiroshi; Kishimoto, Mai; Naoi, Yuki; Omatsu, Tsutomu; Sano, Kaori; Okazaki-Terashima, Sachiko; Katayama, Yukie; Oba, Mami; Nagai, Makoto; Mizutani, Tetsuya
2017-01-17
Bovine enterovirus (BEV) belongs to the species Enterovirus E or F, genus Enterovirus and family Picornaviridae. Although numerous studies have identified BEVs in the feces of cattle with diarrhea, the pathogenicity of BEVs remains unclear. Previously, we reported the detection of novel kobu-like virus in calf feces, by metagenomics analysis. In the present study, we identified a novel BEV in diarrheal feces collected for that survey. Complete genome sequences were determined by deep sequencing in feces. Secondary RNA structure analysis of the 5' untranslated region (UTR), phylogenetic tree construction and pairwise identity analysis were conducted. The complete genome sequences of BEV were genetically distant from other EVs and the VP1 coding region contained novel and unique amino acid sequences. We named this strain as BEV AN12/Bos taurus/JPN/2014 (referred to as BEV-AN12). According to genome analysis, the genome length of this virus is 7414 nucleotides excluding the poly (A) tail and its genome consists of a 5'UTR, open reading frame encoding a single polyprotein, and 3'UTR. The results of secondary RNA structure analysis showed that in the 5'UTR, BEV-AN12 had an additional clover leaf structure and small stem loop structure, similarly to other BEVs. In pairwise identity analysis, BEV-AN12 showed high amino acid (aa) identities to Enterovirus F in the polyprotein, P2 and P3 regions (aa identity ≥82.4%). Therefore, BEV-AN12 is closely related to Enterovirus F. However, aa sequences in the capsid protein regions, particularly the VP1 encoding region, showed significantly low aa identity to other viruses in genus Enterovirus (VP1 aa identity ≤58.6%). In addition, BEV-AN12 branched separately from Enterovirus E and F in phylogenetic trees based on the aa sequences of P1 and VP1, although it clustered with Enterovirus F in trees based on sequences in the P2 and P3 genome region. 
We identified novel BEV possessing highly divergent aa sequences in the VP1 coding region in Japan. According to species definition, we proposed naming this strain as "Enterovirus K", which is a novel species within genus Enterovirus. Further genomic studies are needed to understand the pathogenicity of BEVs.
Complete genome sequence of a Watermelon silver mottle virus isolate from China.
Rao, Xueqin; Wu, Zhuyan; Li, Yuan
2013-06-01
The complete genome of a Watermelon silver mottle virus (WSMoV) (genus Tospovirus, family Bunyaviridae) isolate (WSMoV-GZ) from Guangdong province, China was sequenced. The genome of WSMoV-GZ consisted of small (S), medium (M), and large (L) RNA segments of 3,603, 4,909, and 8,914 nt, respectively, and had a genomic organization characteristic of members of the genus Tospovirus. The amino acid sequences of the nucleocapsid (N) protein, S RNA-encoded nonstructural (NSs) protein, M RNA-encoded nonstructural (NSm) protein, Gn/Gc glycoprotein precursor, and RNA-dependent RNA polymerase (RdRp) protein showed 94.3-97.5% identity with those of other WSMoV isolates. Phylogenetic analysis showed that the N protein of WSMoV-GZ clustered together with those of the other WSMoV isolates. The full sequence of WSMoV-GZ provides a reference genome for comparison with other tospoviruses.
Martínez-Romero, Esperanza
2012-01-01
We report the complete organelle genome sequences of Trebouxiophyceae sp. strain MX-AZ01, an acidophilic green microalga isolated from a geothermal field in Mexico. This eukaryote has the remarkable ability to thrive in a particular shallow lake with emerging hot springs at the bottom, extremely low pH, and toxic heavy metal concentrations. Trebouxiophyceae sp. MX-AZ01 represents one of few described photosynthetic eukaryotes living in such a hostile environment. The organelle genomes of Trebouxiophyceae sp. MX-AZ01 are remarkable. The plastid genome sequence currently presents the highest G+C content for a trebouxiophyte. The mitochondrial genome sequence is the largest reported to date for the Trebouxiophyceae class of green algae. The analysis of the genome sequences presented here provides insight into the evolution of organelle genomes of trebouxiophytes and green algae. PMID:23104370
Combined pituitary hormone deficiency (CPHD) due to a complete PROP1 deletion.
Abrão, M G; Leite, M V; Carvalho, L R; Billerbeck, A E C; Nishi, M Y; Barbosa, A S; Martin, R M; Arnhold, I J P; Mendonca, B B
2006-09-01
PROP1 mutations are the most common cause of genetic combined pituitary hormone deficiency (CPHD). The aim of this study was to investigate the PROP1 gene in two siblings with CPHD. Pituitary function and imaging assessment and molecular analysis of PROP1. Two siblings, born to consanguineous parents, presented with GH deficiency associated with other pituitary hormone deficiencies (TSH, PRL and gonadotrophins). The male sibling also had an evolving cortisol deficiency. Pituitary size was evaluated by magnetic resonance imaging (MRI). PROP1 gene analysis was performed by polymerase chain reaction (PCR), automatic sequencing and Southern blotting. Amplification of sequence tag sites (STS) and the Q8N6H0 gene flanking PROP1 were performed to define the extension of PROP1 deletion. MRI revealed a hypoplastic anterior pituitary in the girl at 14 years and pituitary enlargement in the boy at 18 years. The PROP1 gene failed to amplify in both siblings, whereas other genes were amplified. Southern blotting analysis revealed the PROP1 band in the controls and confirmed complete PROP1 deletion in both siblings. The extension of the deletion was 18.4 kb. The region flanking PROP1 contains several Alu core sequences that might have facilitated stem-loop-mediated excision of PROP1. We report here a complete deletion of PROP1 in two siblings with CPHD phenotype.
de Gier, Camilla; Kirkham, Lea-Ann S.
2015-01-01
Nonhemolytic variants of Haemophilus haemolyticus are difficult to differentiate from Haemophilus influenzae despite a wide difference in pathogenic potential. A previous investigation characterized a challenging set of 60 clinical strains using multiple PCRs for marker genes and described strains that could not be unequivocally identified as either species. We have analyzed the same set of strains by multilocus sequence analysis (MLSA) and near-full-length 16S rRNA gene sequencing. MLSA unambiguously allocated all study strains to either of the two species, while identification by 16S rRNA sequence was inconclusive for three strains. Notably, the two methods yielded conflicting identifications for two strains. Most of the “fuzzy species” strains were identified as H. influenzae that had undergone complete deletion of the fucose operon. Such strains, which are untypeable by the H. influenzae multilocus sequence type (MLST) scheme, have sporadically been reported and predominantly belong to a single branch of H. influenzae MLSA phylogenetic group II. We also found evidence of interspecies recombination between H. influenzae and H. haemolyticus within the 16S rRNA genes. Establishing an accurate method for rapid and inexpensive identification of H. influenzae is important for disease surveillance and treatment. PMID:26378279
Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius
Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.
2010-01-01
Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism. PMID:20502665
USDA-ARS's Scientific Manuscript database
The availability of complete or nearly complete genome sequences from several plant species permits detailed discovery and cross-species comparison of transposable elements (TEs) at the whole genome level. We initially investigated 510 LTR-retrotransposon (LTR-RT) families that are comprised of 32,...
Evaluating the protein coding potential of exonized transposable element sequences
Piriyapongsa, Jittima; Rutledge, Mark T; Patel, Sanil; Borodovsky, Mark; Jordan, I King
2007-01-01
Background Transposable element (TE) sequences, once thought to be merely selfish or parasitic members of the genomic community, have been shown to contribute a wide variety of functional sequences to their host genomes. Analysis of complete genome sequences have turned up numerous cases where TE sequences have been incorporated as exons into mRNAs, and it is widely assumed that such 'exonized' TEs encode protein sequences. However, the extent to which TE-derived sequences actually encode proteins is unknown and a matter of some controversy. We have tried to address this outstanding issue from two perspectives: i-by evaluating ascertainment biases related to the search methods used to uncover TE-derived protein coding sequences (CDS) and ii-through a probabilistic codon-frequency based analysis of the protein coding potential of TE-derived exons. Results We compared the ability of three classes of sequence similarity search methods to detect TE-derived sequences among data sets of experimentally characterized proteins: 1-a profile-based hidden Markov model (HMM) approach, 2-BLAST methods and 3-RepeatMasker. Profile based methods are more sensitive and more selective than the other methods evaluated. However, the application of profile-based search methods to the detection of TE-derived sequences among well-curated experimentally characterized protein data sets did not turn up many more cases than had been previously detected and nowhere near as many cases as recent genome-wide searches have. We observed that the different search methods used were complementary in the sense that they yielded largely non-overlapping sets of hits and differed in their ability to recover known cases of TE-derived CDS. The probabilistic analysis of TE-derived exon sequences indicates that these sequences have low protein coding potential on average. 
In particular, non-autonomous TEs that do not encode protein sequences, such as Alu elements, are frequently exonized but unlikely to encode protein sequences. Conclusion The exaptation of the numerous TE sequences found in exons as bona fide protein coding sequences may prove to be far less common than has been suggested by the analysis of complete genomes. We hypothesize that many exonized TE sequences actually function as post-transcriptional regulators of gene expression, rather than coding sequences, which may act through a variety of double stranded RNA related regulatory pathways. Indeed, their relatively high copy numbers and similarity to sequences dispersed throughout the genome suggests that exonized TE sequences could serve as master regulators with a wide scope of regulatory influence. Reviewers: This article was reviewed by Itai Yanai, Kateryna D. Makova, Melissa Wilson (nominated by Kateryna D. Makova) and Cedric Feschotte (nominated by John M. Logsdon Jr.). PMID:18036258
Wang, Aishuai; Sun, Yuena; Wu, Changwen
2016-11-01
The complete mitochondrial genome of Cheilodactylus quadricornis was determined for the first time in the present study. The mitochondrial genome of C. quadricornis is 16,521 nucleotides long and comprises 13 protein-coding genes, 2 ribosomal RNA genes, 22 tRNA genes and 2 main non-coding regions (the control region and the origin of light-strand replication). The overall base composition was T, 26.3%; C, 29.6%; A, 27.8% and G, 16.3%. The gene arrangement, base composition, and tRNA structures of the complete mitochondrial genome of C. quadricornis are similar to those of other teleosts. Only two central conserved sequence blocks (CSB-2 and CSB-3) were identified in the control region. In addition, the conserved motif 5'-GCCGG-3' was identified in the origin of light-strand replication of C. quadricornis. The complete mitochondrial genome of C. quadricornis was used to construct a phylogenetic tree, which shows that C. quadricornis and C. variegatus clustered in one clade as sister taxa. These mitogenome sequence data should play an important role in population genetics and phylogenetic analysis of the Cheilodactylidae.
Redwan, R M; Saidin, A; Kumar, S V
2015-08-12
Pineapple (Ananas comosus var. comosus) is known as the king of fruits for its crown and is the third most important tropical fruit after banana and citrus. The plant, which is indigenous to South America, is the most important species in the Bromeliaceae family and is largely traded for fresh fruit consumption. Here, we report the complete chloroplast sequence of the MD-2 pineapple, which was sequenced using PacBio sequencing technology. In this study, the high error rate of the PacBio long sequence reads of A. comosus total genomic DNA was reduced by leveraging the highly accurate but short Illumina reads for error correction via the latest error correction module from Novocraft. Error-corrected long PacBio reads were assembled using a single tool to produce a contig representing the pineapple chloroplast genome. The genome, 159,636 bp in length, features the conserved quadripartite structure of chloroplasts, containing a large single copy region (LSC) of 87,482 bp, a small single copy region (SSC) of 18,622 bp and two inverted repeat regions (IRA and IRB), each of 26,766 bp. Overall, the genome contained 117 unique coding regions and 30 were repeated in the IR region, with gene content, structure and arrangement similar to those of its sister taxon, Typha latifolia. A total of 35 repeat structures were detected in both the coding and non-coding regions, the majority being tandem repeats. In addition, 205 SSRs were detected in the genome, with six protein-coding genes containing more than two SSRs. Comparison of chloroplast genomes from the subclass Commelinidae revealed a conserved protein-coding gene, albeit located in a highly divergent region. Analysis of selection pressure on protein-coding genes using the Ka/Ks ratio showed significant positive selection exerted on the rps7 gene of the pineapple chloroplast (P < 0.05).
Phylogenetic analysis confirmed the recent taxonomic relationships among the members of the commelinids, supporting the monophyletic relationship between Arecales and Dasypogonaceae and between Zingiberales and Poales, the latter of which includes A. comosus. The complete sequence of the pineapple chloroplast provides insights into the divergence of genic chloroplast sequences among members of the subclass Commelinidae. The complete pineapple chloroplast genome will serve as a reference for in-depth taxonomic studies in the Bromeliaceae family when more species in the family are sequenced in the future. The genetic sequence information will also make other molecular applications of the pineapple chloroplast feasible for plant genetic improvement.
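The region sizes reported for the pineapple chloroplast genome can be cross-checked with simple arithmetic: in a quadripartite plastid genome the total length is the LSC plus the SSC plus two copies of the inverted repeat. The sketch below uses only the figures given in the abstract; the function name `quadripartite_length` is hypothetical.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite plastid genome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


reported_total = 159_636  # bp, as stated in the abstract
computed_total = quadripartite_length(lsc=87_482, ssc=18_622, ir=26_766)

# Prints the computed total and whether it matches the reported genome size.
print(computed_total, computed_total == reported_total)
```

The four regions sum exactly to the reported 159,636 bp, so the stated sizes are internally consistent.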
Yang, Yunlong; Lin, Ershu; Huang, Shaobin
Chelatococcus daeguensis TAD1 is a thermophilic bacterium isolated from a biotrickling filter used to treat NOx at the Ruiming Power Plant in Guangzhou, China, and it shows excellent aerobic denitrification activity at high temperature. The complete genome sequence of this strain is reported in the present study. Genes related to aerobic denitrification were identified through whole-genome analysis. This work will facilitate understanding of the mechanism of aerobic denitrification and provide evidence for its potential application in nitrogen removal. Copyright © 2017 Sociedade Brasileira de Microbiologia. Published by Elsevier Editora Ltda. All rights reserved.
Hsu, Ju-Chun; Chien, Ting-Ying; Hu, Chia-Cheng; Chen, Mei-Ju May; Wu, Wen-Jer; Feng, Hai-Tung; Haymer, David S; Chen, Chien-Yu
2012-01-01
Insecticide resistance has recently become a critical concern for control of many insect pest species. Genome sequencing and global quantization of gene expression through analysis of the transcriptome can provide useful information relevant to this challenging problem. The oriental fruit fly, Bactrocera dorsalis, is one of the world's most destructive agricultural pests, and recently it has been used as a target for studies of genetic mechanisms related to insecticide resistance. However, prior to this study, the molecular data available for this species was largely limited to genes identified through homology. To provide a broader pool of gene sequences of potential interest with regard to insecticide resistance, this study uses whole transcriptome analysis developed through de novo assembly of short reads generated by next-generation sequencing (NGS). The transcriptome of B. dorsalis was initially constructed using Illumina's Solexa sequencing technology. Qualified reads were assembled into contigs and potential splicing variants (isotigs). A total of 29,067 isotigs have putative homologues in the non-redundant (nr) protein database from NCBI, and 11,073 of these correspond to distinct D. melanogaster proteins in the RefSeq database. Approximately 5,546 isotigs contain coding sequences that are at least 80% complete and appear to represent B. dorsalis genes. We observed a strong correlation between the completeness of the assembled sequences and the expression intensity of the transcripts. The assembled sequences were also used to identify large numbers of genes potentially belonging to families related to insecticide resistance. A total of 90 P450-, 42 GST-and 37 COE-related genes, representing three major enzyme families involved in insecticide metabolism and resistance, were identified. In addition, 36 isotigs were discovered to contain target site sequences related to four classes of resistance genes. 
Identified sequence motifs were also analyzed to characterize putative polypeptide translational products and associate them with specific genes and protein functions.
Subbiah, Madhuri; Xiao, Sa; Collins, Peter L.; Samal, Siba K
2009-01-01
The complete RNA genome sequence of avian paramyxovirus (APMV) serotype 2, strain Yucaipa isolated from chicken has been determined. With genome size of 14,904 nucleotides (nt), strain Yucaipa is consistent with the “rule of six” and is the smallest virus reported to date among the members of subfamily Paramyxovirinae. The genome contains six non-overlapping genes in the order 3′-N-P/V-M-F-HN-L-5′. The genes are flanked on either side by highly-conserved transcription start and stop signals and have intergenic sequences varying in length from 3 to 23 nt. The genome contains a 55 nt leader sequence at 3′ end and a 154 nt trailer sequence at 5′ end. Alignment and phylogenetic analysis of the predicted amino acid sequences of strain Yucaipa proteins with the cognate proteins of viruses of all of the five genera of family Paramyxoviridae showed that APMV-2 strain Yucaipa is more closely related to APMV-6 than APMV-1. PMID:18603323
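The "rule of six" mentioned in this abstract refers to paramyxovirus genome lengths being an exact multiple of six nucleotides. A minimal check using only the lengths quoted above is sketched here; the function name `obeys_rule_of_six` is hypothetical.

```python
def obeys_rule_of_six(genome_length_nt: int) -> bool:
    """True when the genome length is an exact multiple of six nucleotides."""
    return genome_length_nt % 6 == 0


genome_nt = 14_904  # APMV-2 strain Yucaipa genome length from the abstract
leader_nt = 55      # 3' leader
trailer_nt = 154    # 5' trailer

print(obeys_rule_of_six(genome_nt))          # multiple of six?
print(genome_nt // 6)                        # number of hexamers
print(genome_nt - leader_nt - trailer_nt)    # nt between leader and trailer
```

The length divides evenly into 2,484 hexamers, in line with the statement in the abstract, and 14,695 nt remain between the leader and trailer sequences.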
Khan, Abdul Latif; Khan, Muhammad Aaqil; Shahzad, Raheem; Lubna; Kang, Sang Mo; Al-Harrasi, Ahmed; Al-Rawahi, Ahmed; Lee, In-Jung
2018-01-01
Pinaceae, the largest family of conifers, has a diversified organization of chloroplast (cp) genomes with two typical highly reduced inverted repeats (IRs). In the current study, we determined the complete sequence of the cp genome of an economically and ecologically important conifer tree, the loblolly pine (Pinus taeda L.), using Illumina paired-end sequencing and compared the sequence with those of other pine species. The results revealed a genome size of 121,531 base pairs (bp) containing a pair of 830-bp IR regions, distinguished by a small single copy (42,258 bp) and large single copy (77,614 bp) region. The chloroplast genome of P. taeda encodes 120 genes, comprising 81 protein-coding genes, four ribosomal RNA genes, and 35 tRNA genes, with 151 randomly distributed microsatellites. Approximately 6 palindromic, 34 forward, and 22 tandem repeats were found in the P. taeda cp genome. Whole cp genome comparison with those of other Pinus species exhibited an overall high degree of sequence similarity, with some divergence in intergenic spacers. Higher and lower numbers of indels and single-nucleotide polymorphism substitutions were observed relative to P. contorta and P. monophylla, respectively. Phylogenomic analyses based on the complete genome sequence revealed that 60 shared genes generated trees with the same topologies, and P. taeda was closely related to P. contorta in the subgenus Pinus. Thus, the complete P. taeda genome provided valuable resources for population and evolutionary studies of gymnosperms and can be used to identify related species. PMID:29596414
Liu, Yue; Huo, Naxin; Dong, Lingli; Wang, Yi; Zhang, Shuixian; Young, Hugh A.; Feng, Xiaoxiao; Gu, Yong Qiang
2013-01-01
Background Artemisia frigida Willd. is an important Mongolian traditional medicinal plant used to stanch bleeding and reduce swelling. However, there is little sequence and genomic information available for Artemisia frigida, which makes phylogenetic identification, evolutionary studies, and genetic improvement of its value very difficult. We report the complete chloroplast genome sequence of Artemisia frigida based on 454 pyrosequencing. Methodology/Principal Findings The complete chloroplast genome of Artemisia frigida is 151,076 bp including a large single copy (LSC) region of 82,740 bp, a small single copy (SSC) region of 18,394 bp and a pair of inverted repeats (IRs) of 24,971 bp each. The genome contains 114 unique genes and 18 duplicated genes. The chloroplast genome of Artemisia frigida contains a small 3.4 kb inversion within a large 23 kb inversion in the LSC region, a unique feature in Asteraceae. The gene order in the SSC region of Artemisia frigida is inverted compared with the other 6 Asteraceae species with sequenced chloroplast genomes. This inversion is likely caused by an intramolecular recombination event that occurred only in Artemisia frigida. The existence of rich SSR loci in the Artemisia frigida chloroplast genome provides a rare opportunity to study population genetics of this Mongolian medicinal plant. Phylogenetic analysis demonstrates a sister relationship between Artemisia frigida and four other species in Asteraceae, including Ageratina adenophora, Helianthus annuus, Guizotia abyssinica and Lactuca sativa, based on 61 protein-coding sequences. Furthermore, Artemisia frigida was placed in the tribe Anthemideae in the subfamily Asteroideae (Asteraceae) based on ndhF and trnL-F sequence comparisons. Conclusion The chloroplast genome sequence of Artemisia frigida was assembled and analyzed in this study, representing the first plastid genome sequenced in the Anthemideae tribe. This complete chloroplast genome sequence will be useful for molecular ecology and molecular phylogeny studies within Artemisia species and also within the Asteraceae family. PMID:23460871
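The region sizes reported for the Artemisia frigida plastome can be cross-checked with simple arithmetic, assuming the usual quadripartite layout in which the inverted repeat occurs twice. The function name below is illustrative and is not taken from the paper.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total plastome length: LSC + SSC + two copies of the inverted repeat."""
    return lsc + ssc + 2 * ir

# Region sizes (bp) as reported in the abstract above.
total = quadripartite_length(lsc=82_740, ssc=18_394, ir=24_971)
print(total)  # 151076
```

The result matches the reported genome length of 151,076 bp, which confirms that 24,971 bp is the size of each IR copy rather than of the pair.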
NASA Astrophysics Data System (ADS)
Shao, Xupeng
2017-04-01
Glutenite bodies are widely developed in the northern Minfeng zone of the Dongying Sag, but their litho-electric relationship is not clear. In addition, the conventional sequence stratigraphic research method involves too many subjective human factors, which has limited further progress in regional sequence stratigraphic research. Compared with the conventional methods, the wavelet transform technique based on logging data and the time-frequency analysis technique based on seismic data have the advantage of dividing sequence stratigraphy quantitatively. On the basis of the conventional sequence research method, this paper used the above techniques to divide the fourth-order sequences of the upper Es4 in the northern Minfeng zone of the Dongying Sag. The research shows that the wavelet transform technique based on logging data and the time-frequency analysis technique based on seismic data are essentially consistent: both divide sequence stratigraphy quantitatively in the frequency domain. The wavelet transform technique has a high resolution and is suitable for areas with wells. The seismic time-frequency analysis technique has wide applicability but a low resolution. The two techniques should therefore be combined. The upper Es4 in the northern Minfeng zone of the Dongying Sag is a complete third-order sequence, which can be further subdivided into 5 fourth-order sequences that have the depositional characteristics of fining-upward sequences. Key words: Dongying sag, northern Minfeng zone, wavelet transform technique, time-frequency analysis technique, the upper Es4, sequence stratigraphy
The Use of Weighted Graphs for Large-Scale Genome Analysis
Zhou, Fang; Toivonen, Hannu; King, Ross D.
2014-01-01
There is an acute need for better tools to extract knowledge from the growing flood of sequence data. For example, thousands of complete genomes have been sequenced, and their metabolic networks inferred. Such data should enable a better understanding of evolution. However, most existing network analysis methods are based on pair-wise comparisons, and these do not scale to thousands of genomes. Here we propose the use of weighted graphs as a data structure to enable large-scale phylogenetic analysis of networks. We have developed three types of weighted graph for enzymes: taxonomic (these summarize phylogenetic importance), isoenzymatic (these summarize enzymatic variety/redundancy), and sequence-similarity (these summarize sequence conservation); and we applied these types of weighted graph to survey prokaryotic metabolism. To demonstrate the utility of this approach we have compared and contrasted the large-scale evolution of metabolism in Archaea and Eubacteria. Our results provide evidence for limits to the contingency of evolution. PMID:24619061
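As a rough illustration of why a weighted graph scales better than pair-wise comparison, the sketch below folds many per-genome networks into a single graph in one pass. The weighting used here (number of genomes that contain an edge) and all names and toy data are assumptions made for illustration; they are not the taxonomic, isoenzymatic, or sequence-similarity weightings defined in the paper.

```python
from collections import Counter

def merge_networks(networks: dict[str, set[tuple[str, str]]]) -> Counter:
    """Fold per-genome edge sets into one graph; weight = number of genomes with the edge."""
    weights: Counter = Counter()
    for edges in networks.values():
        # Store undirected edges in a canonical order so (a, b) == (b, a).
        weights.update({tuple(sorted(edge)) for edge in edges})
    return weights

# Hypothetical toy input: three genomes, enzymes labelled E1..E4.
toy = {
    "genome_A": {("E1", "E2"), ("E2", "E3")},
    "genome_B": {("E2", "E1"), ("E3", "E4")},
    "genome_C": {("E1", "E2"), ("E2", "E3"), ("E3", "E4")},
}
graph = merge_networks(toy)
print(graph[("E1", "E2")])  # 3
```

Each genome is visited once, so the cost grows linearly with the number of genomes instead of quadratically as in all-against-all comparison.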
USDA-ARS's Scientific Manuscript database
A baculovirus isolate from a USDA Forest Service collection was examined by electron microscopy and analysis of its genome sequence. The isolate, formerly referred to as Pseudoletia (Mythimna) sp. nucleopolyhedrovirus #7 (MyspNPV#7), was determined by barcoding PCR to derive from the host species My...
The complete chloroplast genome sequence of Actinidia arguta using the PacBio RS II platform
Lin, Miaomiao; Qi, Xiujuan; Chen, Jinyong; Sun, Leiming; Zhong, Yunpeng; Fang, Jinbao; Hu, Chungen
2018-01-01
Actinidia arguta is the most basal species in a phylogenetically and economically important genus in the family Actinidiaceae. To better understand the molecular basis of the Actinidia arguta chloroplast (cp), we sequenced the complete cp genome from A. arguta using Illumina and PacBio RS II sequencing technologies. The cp genome from A. arguta was 157,611 bp in length and composed of a pair of 24,232 bp inverted repeats (IRs) separated by a 20,463 bp small single copy region (SSC) and an 88,684 bp large single copy region (LSC). Overall, the cp genome contained 113 unique genes. The cp genomes from A. arguta and three other Actinidia species from GenBank were subjected to a comparative analysis. Indel mutation events and high frequencies of base substitution were identified, and the accD and ycf2 genes showed a high degree of variation within Actinidia. Forty-seven simple sequence repeats (SSRs) and 155 repetitive structures were identified, further demonstrating the rapid evolution in Actinidia. The cp genome analysis and the identification of variable loci provide vital information for understanding the evolution and function of the chloroplast and for characterizing Actinidia population genetics. PMID:29795601
Complete Sequence and Analysis of Coconut Palm (Cocos nucifera) Mitochondrial Genome.
Aljohi, Hasan Awad; Liu, Wanfei; Lin, Qiang; Zhao, Yuhui; Zeng, Jingyao; Alamer, Ali; Alanazi, Ibrahim O; Alawad, Abdullah O; Al-Sadi, Abdullah M; Hu, Songnian; Yu, Jun
2016-01-01
Coconut (Cocos nucifera L.), a member of the palm family (Arecaceae), is one of the most economically important crops in the tropics, serving as an important source of food, drink, fuel, medicine, and construction material. Here we report an assembly of the coconut (C. nucifera, Oman local Tall cultivar) mitochondrial (mt) genome based on next-generation sequencing data. This genome, 678,653 bp in length and 45.5% in GC content, encodes 72 proteins, 9 pseudogenes, 23 tRNAs, and 3 ribosomal RNAs. Within the assembly, we find that the chloroplast (cp) derived regions account for 5.07% of the total assembly length, including 13 proteins, 2 pseudogenes, and 11 tRNAs. The mt genome has a relatively large fraction of repeat content (17.26%), including both forward (tandem) and inverted (palindromic) repeats. Sequence variation analysis shows that the Ti/Tv ratio of the mt genome is lower than that of the nuclear genome and the neutral expectation. By combining public RNA-Seq data for coconut, we identify 734 RNA editing sites supported by at least two datasets. In summary, our data provide the second complete mt genome sequence in the family Arecaceae, essential for further investigations on mitochondrial biology of seed plants.
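The Ti/Tv ratio mentioned above compares transitions (purine to purine or pyrimidine to pyrimidine) with transversions (purine to pyrimidine or the reverse). A minimal sketch of the counting step follows, assuming two pre-aligned sequences of equal length; the function name and the toy sequences are hypothetical and do not come from the coconut data.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def ti_tv_counts(ref: str, alt: str) -> tuple[int, int]:
    """Count transitions and transversions between two aligned sequences."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to equal length")
    ti = tv = 0
    for a, b in zip(ref.upper(), alt.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # skip matches, gaps and ambiguous bases
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ti += 1  # substitution within the same chemical class
        else:
            tv += 1  # substitution across classes
    return ti, tv

# Toy alignment: A>G and G>A are transitions, A>T is a transversion.
ti, tv = ti_tv_counts("ACGTACGT", "GCGTTCAT")
print(ti, tv, ti / tv)  # 2 1 2.0
```

If all substitutions were equally likely, each base would have one possible transition and two possible transversions, giving a ratio of 0.5; that is the usual baseline for reading an observed ratio.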
The complete chloroplast genome sequence of Euonymus japonicus (Celastraceae).
Choi, Kyoung Su; Park, SeonJoo
2016-09-01
The complete chloroplast (cp) genome sequence of Euonymus japonicus, the first sequenced in the genus Euonymus, is reported in this study. The total length was 157 637 bp, containing a pair of 26 678 bp inverted repeat (IR) regions, which were separated by a small single copy (SSC) region and a large single copy (LSC) region of 18 340 bp and 85 941 bp, respectively. This genome contains 107 unique genes, including 74 coding genes, four rRNA genes, and 29 tRNA genes. Seventeen genes of E. japonicus contain introns, of which three genes (clpP, ycf3, and rps12) include two introns. The maximum likelihood (ML) phylogenetic analysis revealed that E. japonicus was closely related to Manihot and Populus.
Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud
Griffith, Malachi; Walker, Jason R.; Spies, Nicholas C.; Ainscough, Benjamin J.; Griffith, Obi L.
2015-01-01
Massively parallel RNA sequencing (RNA-seq) has rapidly become the assay of choice for interrogating RNA transcript abundance and diversity. This article provides a detailed introduction to fundamental RNA-seq molecular biology and informatics concepts. We make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods. These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at www.rnaseq.wiki. PMID:26248053
Chang, Suhua; Zhang, Jiajie; Liao, Xiaoyun; Zhu, Xinxing; Wang, Dahai; Zhu, Jiang; Feng, Tao; Zhu, Baoli; Gao, George F; Wang, Jian; Yang, Huanming; Yu, Jun; Wang, Jing
2007-01-01
Frequent outbreaks of highly pathogenic avian influenza and the increasing data available for comparative analysis require a central database specialized in influenza viruses (IVs). We have established the Influenza Virus Database (IVDB) to integrate information and create an analysis platform for genetic, genomic, and phylogenetic studies of the virus. IVDB hosts complete genome sequences of influenza A virus generated by Beijing Institute of Genomics (BIG) and curates all other published IV sequences after expert annotation. Our Q-Filter system classifies and ranks all nucleotide sequences into seven categories according to sequence content and integrity. IVDB provides a series of tools and viewers for comparative analysis of the viral genomes, genes, genetic polymorphisms and phylogenetic relationships. A search system has been developed for users to retrieve a combination of different data types by setting search options. To facilitate analysis of global viral transmission and evolution, the IV Sequence Distribution Tool (IVDT) has been developed to display the worldwide geographic distribution of chosen viral genotypes and to couple genomic data with epidemiological data. The BLAST, multiple sequence alignment and phylogenetic analysis tools were integrated for online data analysis. Furthermore, IVDB offers instant access to pre-computed alignments and polymorphisms of IV genes and proteins, and presents the results as SNP distribution plots and minor allele distributions. IVDB is publicly available at http://influenza.genomics.org.cn.
Rider, Stanley Dean
2016-07-01
The complete mitochondrial genome of the desert darkling beetle Asbolus verrucosus (LeConte, 1851) was sequenced using paired-end technology to an average depth of 42,111× and assembled using De Bruijn graph-based methods. The genome is 15,828 bp in length and conforms to the basal arthropod mitochondrial gene composition with the same gene orders and orientations as other darkling beetle mitochondria. This arrangement includes a control region, 22 tRNA genes, 2 rRNA genes and 13 protein-coding genes. The main coding strand is probably replicated as the lagging strand (GC skew of -0.36 and AT skew of +0.19). Phylogenomics analyses are consistent with taxonomic classifications and indicate that Tenebrio molitor is the closest relative that has a completely sequenced mitochondrial genome available for analysis. This is the first fully assembled mitogenome sequence for a darkling beetle in the subfamily Pimeliinae and will be useful for population studies on members of this ecologically important group of beetles.
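The skew values quoted above follow the common definitions GC skew = (G - C)/(G + C) and AT skew = (A - T)/(A + T), computed on one strand. A minimal sketch under that assumption is shown below; the function name is illustrative and the input is a toy sequence, not beetle data.

```python
from collections import Counter

def strand_skews(seq: str) -> tuple[float, float]:
    """Return (GC skew, AT skew) for one strand of a nucleotide sequence."""
    counts = Counter(seq.upper())
    a, c, g, t = (counts[base] for base in "ACGT")
    if g + c == 0 or a + t == 0:
        raise ValueError("sequence needs at least one G/C and one A/T base")
    gc_skew = (g - c) / (g + c)
    at_skew = (a - t) / (a + t)
    return gc_skew, at_skew

# Toy sequence: A=4, T=2, G=1, C=3.
gc, at = strand_skews("AAAATTGCCC")
print(round(gc, 2), round(at, 2))  # -0.5 0.33
```

A negative GC skew together with a positive AT skew, as reported for the beetle mitogenome, means the analysed strand carries more C than G and more A than T.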
Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu
2015-01-01
The microbial genome database for comparative analysis (MBGD) (available at http://mbgd.genome.ad.jp/) is a comprehensive ortholog database for flexible comparative analysis of microbial genomes, where the users are allowed to create an ortholog table among any specified set of organisms. Because of the rapid increase in microbial genome data owing to the next-generation sequencing technology, it becomes increasingly challenging to maintain high-quality orthology relationships while allowing the users to incorporate the latest genomic data available into an analysis. Because many of the recently accumulating genomic data are draft genome sequences for which some complete genome sequences of the same or closely related species are available, MBGD now stores draft genome data and allows the users to incorporate them into a user-specific ortholog database using the MyMBGD functionality. In this function, draft genome data are incorporated into an existing ortholog table created only from the complete genome data in an incremental manner to prevent low-quality draft data from affecting clustering results. In addition, to provide high-quality orthology relationships, the standard ortholog table containing all the representative genomes, which is first created by the rapid classification program DomClust, is now refined using DomRefine, a recently developed program for improving domain-level clustering using multiple sequence alignment information. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Zheng, Renhua; Xu, Haibin; Zhou, Yanwei; Li, Meiping; Lu, Fengjuan; Dong, Yini; Liu, Xin; Chen, Jinhui; Shi, Jisen
2016-01-01
Glyptostrobus pensilis, belonging to the monotypic genus Glyptostrobus (Family: Cupressaceae), is an ancient conifer that is naturally distributed in low-lying wet areas. Here, we report the complete chloroplast (cp) genome sequence (132,239 bp) of G. pensilis. The G. pensilis cp genome is similar in gene content, organization and genome structure to the sequenced cp genomes from other cupressophytes, especially with respect to the loss of the inverted repeat region A (IRA). Through phylogenetic analysis, we demonstrated that the genus Glyptostrobus is closely related to the genus Cryptomeria, supporting previous findings based on physiological characteristics. Since IRs play an important role in stabilizing the cp genome, and conifer cp genomes lost different IR regions after splitting into two clades (cupressophytes and Pinaceae), we performed cp genome rearrangement analysis and found more extensive cp genome rearrangements among the species of cupressophytes relative to Pinaceae. Additional repeat analysis indicated that cupressophyte cp genomes contained fewer potentially functional repeats, especially in Cupressaceae, compared with Pinaceae. These results suggest that the dynamics of cp genome rearrangement in conifers differed since the two clades, Pinaceae and cupressophytes, lost IR copies independently and developed different repeats to complement the residual IRs. In addition, we identified 170 perfect simple sequence repeats that will be useful in future research focusing on the evolution of genetic diversity and conservation of genetic variation for this endangered species in the wild. PMID:27560965
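A perfect simple sequence repeat (SSR) is an uninterrupted tandem run of a short motif. The sketch below shows one way such loci could be located with regular expressions. The minimum repeat counts are assumed thresholds chosen for illustration, since the abstract does not state the settings used, and the function names are hypothetical.

```python
import re

# Assumed minimum repeat counts per motif length (not taken from the paper).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}

def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))

def find_perfect_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for each perfect SSR found."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # One motif of length k followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # e.g. report AAAA... as mono-, not di-nucleotide
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

# Toy sequence with an (AT)6, an (A)12 and a (CAG)4 run.
toy = "GG" + "AT" * 6 + "CC" + "A" * 12 + "T" + "CAG" * 4 + "T"
for start, motif, count in find_perfect_ssrs(toy):
    print(start, motif, count)
```

Filtering out non-primitive motifs keeps a homopolymer run from being reported a second time as a di- or tri-nucleotide repeat.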
2014-01-01
Background Limited available sequence information has greatly impeded population genetics, phylogenetics and systematics studies in the subclass Acari (mites and ticks). Mitochondrial (mt) DNA is well known to provide genetic markers for investigations in these areas, but complete mt genomic data have been lacking for many Acari species. Herein, we present the complete mt genome of the scab mite Psoroptes cuniculi. Methods P. cuniculi was collected from a naturally infected New Zealand white rabbit from China and identified by morphological criteria. The complete mt genome of P. cuniculi was amplified by PCR and then sequenced. The relationships of this scab mite with selected members of the Acari were assessed by phylogenetic analysis of concatenated amino acid sequence datasets by Bayesian inference (BI), maximum likelihood (ML) and maximum parsimony (MP). Results This mt genome (14,247 bp) is circular and consists of 37 genes, including 13 genes for proteins, 22 genes for tRNA and 2 genes for rRNA. The gene arrangement in the mt genome of P. cuniculi is the same as those of Dermatophagoides farinae (Pyroglyphidae) and Aleuroglyphus ovatus (Acaridae), but distinct from those of Steganacarus magnus (Steganacaridae) and Panonychus citri (Tetranychidae). Phylogenetic analyses using concatenated amino acid sequences of 12 protein-coding genes, with three different computational algorithms (BI, ML and MP), showed the division of subclass Acari into two superorders and supported the monophyly of both superorders (Parasitiformes and Acariformes) and of the three orders Ixodida, Mesostigmata and Astigmata, but rejected the monophyly of the order Prostigmata. Conclusions The mt genome of P. cuniculi represents the first mt genome of any member of the family Psoroptidae. Analysis of mt genome sequences in the present study has provided new insights into the phylogenetic relationships among several major lineages of Acari species. PMID:25052180
Zhang, Ying; Li, Lei; Yan, Ting Liang; Liu, Qiang
2014-10-01
Praxelis (Eupatorium catarium Veldkamp) is a new hazardous invasive plant species that has caused serious economic losses and environmental damage in the tropical and subtropical regions of the Northern Hemisphere. Although previous studies focused on detecting the biological characteristics of this plant to prevent its expansion, little effort has been made to understand the impact of Praxelis on the ecosystem from an evolutionary perspective. The genetic information of Praxelis is required for further phylogenetic identification and evolutionary studies. Here, we report the complete Praxelis chloroplast (cp) genome sequence. The Praxelis chloroplast genome is 151,410 bp in length including a small single-copy region (18,547 bp) and a large single-copy region (85,311 bp) separated by a pair of inverted repeats (IRs; 23,776 bp each). The genome contains 85 unique genes and 18 genes duplicated in the IR region. The gene content and organization are similar to those of cp genomes from other Asteraceae tribes. We also analyzed the whole cp genome sequence, repeat structure, codon usage, contraction of the IR and gene structure/organization features between native and invasive Asteraceae plants, in order to understand the evolution of organelle genomes between native and invasive Asteraceae. Comparative analysis identified 14 markers containing greater than 2% parsimony-informative characters, indicating that they are potentially informative markers for barcoding and phylogenetic analysis. Moreover, a sister relationship between Praxelis and seven other species in Asteraceae was found based on phylogenetic analysis of 28 protein-coding sequences. Complete cp genome information is useful for plant phylogenetic and evolutionary studies within this invasive species and also within the Asteraceae family. Copyright © 2014 Elsevier B.V. All rights reserved.
Chen, Jinhui; Hao, Zhaodong; Xu, Haibin; Yang, Liming; Liu, Guangxin; Sheng, Yu; Zheng, Chen; Zheng, Weiwei; Cheng, Tielong; Shi, Jisen
2015-01-01
Metasequoia glyptostroboides Hu et Cheng is the only species in the genus Metasequoia Miki ex Hu et Cheng, which belongs to the Cupressaceae family. There were around 10 species in the Metasequoia genus, which were widely spread across the Northern Hemisphere during the Cretaceous of the Mesozoic and in the Cenozoic. M. glyptostroboides is the only remaining representative of this genus. Here, we report the complete chloroplast (cp) genome sequence and the cp genomic features of M. glyptostroboides. The M. glyptostroboides cp genome is 131,887 bp in length, with a total of 117 genes comprising 82 protein-coding genes, 31 tRNA genes and four rRNA genes. In this genome, 11 forward repeats, nine palindromic repeats, and 15 tandem repeats were detected. A total of 188 perfect microsatellites were detected through simple sequence repeat (SSR) analysis and these were distributed unevenly within the cp genome. Comparison of the cp genome structure and gene order to those of several other land plants indicated that a copy of the inverted repeat (IR) region, which was found to be IR region A (IRA), was lost in the M. glyptostroboides cp genome. The five most divergent and five most conserved genes were determined and further phylogenetic analysis was performed among plant species, especially for related species in conifers. Finally, phylogenetic analysis demonstrated that M. glyptostroboides is a sister species to Cryptomeria japonica (L. F.) D. Don and to Taiwania cryptomerioides Hayata. The complete cp genome sequence information of M. glyptostroboides will be greatly helpful for further investigations of this endemic relict woody plant and for in-depth understanding of the evolutionary history of the coniferous cp genomes, especially for the position of M. glyptostroboides in plant systematics and evolution. PMID:26136762
Lim, Yan Wei; Cuevas, Daniel A.; Silva, Genivaldo Gueiros Z.; Aguinaldo, Kristen; Dinsdale, Elizabeth A.; Haas, Andreas F.; Hatay, Mark; Sanchez, Savannah E.; Wegley-Kelly, Linda; Dutilh, Bas E.; Harkins, Timothy T.; Lee, Clarence C.; Tom, Warren; Sandin, Stuart A.; Smith, Jennifer E.; Zgliczynski, Brian; Vermeij, Mark J.A.; Rohwer, Forest; Edwards, Robert A.
2014-01-01
Genomics and metagenomics have revolutionized our understanding of marine microbial ecology and the importance of microbes in global geochemical cycles. However, the process of DNA sequencing has always been an abstract extension of the research expedition, completed once the samples were returned to the laboratory. During the 2013 Southern Line Islands Research Expedition, we started the first effort to bring next generation sequencing to some of the most remote locations on our planet. We successfully sequenced twenty six marine microbial genomes, and two marine microbial metagenomes using the Ion Torrent PGM platform on the Merchant Yacht Hanse Explorer. Onboard sequence assembly, annotation, and analysis enabled us to investigate the role of the microbes in the coral reef ecology of these islands and atolls. This analysis identified phosphonate as an important phosphorous source for microbes growing in the Line Islands and reinforced the importance of L-serine in marine microbial ecosystems. Sequencing in the field allowed us to propose hypotheses and conduct experiments and further sampling based on the sequences generated. By eliminating the delay between sampling and sequencing, we enhanced the productivity of the research expedition. By overcoming the hurdles associated with sequencing on a boat in the middle of the Pacific Ocean we proved the flexibility of the sequencing, annotation, and analysis pipelines. PMID:25177534
NASA Astrophysics Data System (ADS)
Humpula, James F.; Ostrom, Peggy H.; Gandhi, Hasand; Strahler, John R.; Walker, Angela K.; Stafford, Thomas W.; Smith, James J.; Voorhies, Michael R.; George Corner, R.; Andrews, Phillip C.
2007-12-01
Ancient DNA sequences offer an extraordinary opportunity to unravel the evolutionary history of ancient organisms. Protein sequences offer another reservoir of genetic information that has recently become tractable through the application of mass spectrometric techniques. The extent to which ancient protein sequences resolve phylogenetic relationships, however, has not been explored. We determined the osteocalcin amino acid sequence from the bone of an extinct camelid (21 ka, Camelops hesternus) excavated from Isleta Cave, New Mexico and three bones of extant camelids: bactrian camel (Camelus bactrianus); dromedary camel (Camelus dromedarius) and guanaco (Llama guanacoe) for a diagenetic and phylogenetic assessment. There was no difference in sequence among the four taxa. Structural attributes observed in both modern and ancient osteocalcin include a post-translational modification, Hyp 9, deamidation of Gln 35 and Gln 39, and oxidation of Met 36. Carbamylation of the N-terminus in ancient osteocalcin may result in blockage and explain previous difficulties in sequencing ancient proteins via Edman degradation. A phylogenetic analysis using osteocalcin sequences of 25 vertebrate taxa was conducted to explore osteocalcin protein evolution and the utility of osteocalcin sequences for delineating phylogenetic relationships. The maximum likelihood tree closely reflected generally recognized taxonomic relationships. For example, maximum likelihood analysis recovered rodents, birds and, within hominins, the Homo-Pan-Gorilla trichotomy. Within Artiodactyla, character state analysis showed that a substitution of Pro 4 for His 4 defines the Capra-Ovis clade. Homoplasy in our analysis indicated that osteocalcin evolution is not a perfect indicator of species evolution. Limited sequence availability prevented assigning functional significance to sequence changes. 
Our preliminary analysis of osteocalcin evolution represents an initial step towards a complete character analysis aimed at determining the evolutionary history of this functionally significant protein. We emphasize that ancient protein sequencing and phylogenetic analyses using amino acid sequences must pay close attention to post-translational modifications, amino acid substitutions due to diagenetic alteration and the impacts of isobaric amino acids on mass shifts and sequence alignments.
Laser mass spectrometry for DNA sequencing, disease diagnosis, and fingerprinting
NASA Astrophysics Data System (ADS)
Chen, C. H. Winston; Taranenko, N. I.; Zhu, Y. F.; Chung, C. N.; Allman, S. L.
1997-05-01
Since laser mass spectrometry has the potential for achieving very fast DNA analysis, we recently applied it to DNA sequencing, DNA typing for fingerprinting, and DNA screening for disease diagnosis. Two different approaches for sequencing DNA have been successfully demonstrated. One is to sequence DNA with DNA ladders produced from Sanger's enzymatic method. The other is to do direct sequencing without DNA ladders. The need for quick DNA typing for identification purposes is critical for forensic application. Our preliminary results indicate laser mass spectrometry can possibly be used for rapid DNA fingerprinting applications at a much lower cost than gel electrophoresis. Population screening for certain genetic diseases can be a very efficient step toward reducing medical costs through prevention. Since laser mass spectrometry can provide very fast DNA analysis, we applied laser mass spectrometry to disease diagnosis. Clinical samples with both base deletion and point mutation have been tested with complete success.
GWFASTA: server for FASTA search in eukaryotic and microbial genomes.
Issac, Biju; Raghava, G P S
2002-09-01
Similarity searches are a powerful method for solving important biological problems such as database scanning, evolutionary studies, gene prediction, and protein structure prediction. FASTA is a widely used sequence comparison tool for rapid database scanning. Here we describe the GWFASTA server that was developed to assist the FASTA user in similarity searches against partially and/or completely sequenced genomes. GWFASTA consists of more than 60 microbial genomes, eight eukaryote genomes, and proteomes of annotated genomes. In fact, it provides the maximum number of databases for similarity searching from a single platform. GWFASTA allows the submission of more than one sequence as a single query for a FASTA search. It also provides integrated post-processing of FASTA output, including compositional analysis of proteins, multiple sequence alignment, and phylogenetic analysis. Furthermore, it summarizes the search results organism-wise for prokaryotes and chromosome-wise for eukaryotes. Thus, the integration of different tools for sequence analyses makes GWFASTA a powerful tool for biologists.
Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana.
Mayer, K; Schüller, C; Wambutt, R; Murphy, G; Volckaert, G; Pohl, T; Düsterhöft, A; Stiekema, W; Entian, K D; Terryn, N; Harris, B; Ansorge, W; Brandt, P; Grivell, L; Rieger, M; Weichselgartner, M; de Simone, V; Obermaier, B; Mache, R; Müller, M; Kreis, M; Delseny, M; Puigdomenech, P; Watson, M; Schmidtheini, T; Reichert, B; Portatelle, D; Perez-Alonso, M; Boutry, M; Bancroft, I; Vos, P; Hoheisel, J; Zimmermann, W; Wedler, H; Ridley, P; Langham, S A; McCullagh, B; Bilham, L; Robben, J; Van der Schueren, J; Grymonprez, B; Chuang, Y J; Vandenbussche, F; Braeken, M; Weltjens, I; Voet, M; Bastiaens, I; Aert, R; Defoor, E; Weitzenegger, T; Bothe, G; Ramsperger, U; Hilbert, H; Braun, M; Holzer, E; Brandt, A; Peters, S; van Staveren, M; Dirske, W; Mooijman, P; Klein Lankhorst, R; Rose, M; Hauf, J; Kötter, P; Berneiser, S; Hempel, S; Feldpausch, M; Lamberth, S; Van den Daele, H; De Keyser, A; Buysshaert, C; Gielen, J; Villarroel, R; De Clercq, R; Van Montagu, M; Rogers, J; Cronin, A; Quail, M; Bray-Allen, S; Clark, L; Doggett, J; Hall, S; Kay, M; Lennard, N; McLay, K; Mayes, R; Pettett, A; Rajandream, M A; Lyne, M; Benes, V; Rechmann, S; Borkova, D; Blöcker, H; Scharfe, M; Grimm, M; Löhnert, T H; Dose, S; de Haan, M; Maarse, A; Schäfer, M; Müller-Auer, S; Gabel, C; Fuchs, M; Fartmann, B; Granderath, K; Dauner, D; Herzl, A; Neumann, S; Argiriou, A; Vitale, D; Liguori, R; Piravandi, E; Massenet, O; Quigley, F; Clabauld, G; Mündlein, A; Felber, R; Schnabl, S; Hiller, R; Schmidt, W; Lecharny, A; Aubourg, S; Chefdor, F; Cooke, R; Berger, C; Montfort, A; Casacuberta, E; Gibbons, T; Weber, N; Vandenbol, M; Bargues, M; Terol, J; Torres, A; Perez-Perez, A; Purnelle, B; Bent, E; Johnson, S; Tacon, D; Jesse, T; Heijnen, L; Schwarz, S; Scholler, P; Heber, S; Francs, P; Bielke, C; Frishman, D; Haase, D; Lemcke, K; Mewes, H W; Stocker, S; Zaccaria, P; Bevan, M; Wilson, R K; de la Bastide, M; Habermann, K; Parnell, L; Dedhia, N; Gnoj, L; Schutz, K; Huang, E; Spiegel, L; Sehkon, M; 
Murray, J; Sheet, P; Cordes, M; Abu-Threideh, J; Stoneking, T; Kalicki, J; Graves, T; Harmon, G; Edwards, J; Latreille, P; Courtney, L; Cloud, J; Abbott, A; Scott, K; Johnson, D; Minx, P; Bentley, D; Fulton, B; Miller, N; Greco, T; Kemp, K; Kramer, J; Fulton, L; Mardis, E; Dante, M; Pepin, K; Hillier, L; Nelson, J; Spieth, J; Ryan, E; Andrews, S; Geisel, C; Layman, D; Du, H; Ali, J; Berghoff, A; Jones, K; Drone, K; Cotton, M; Joshu, C; Antonoiu, B; Zidanic, M; Strong, C; Sun, H; Lamar, B; Yordan, C; Ma, P; Zhong, J; Preston, R; Vil, D; Shekher, M; Matero, A; Shah, R; Swaby, I K; O'Shaughnessy, A; Rodriguez, M; Hoffmann, J; Till, S; Granat, S; Shohdy, N; Hasegawa, A; Hameed, A; Lodhi, M; Johnson, A; Chen, E; Marra, M; Martienssen, R; McCombie, W R
1999-12-16
The higher plant Arabidopsis thaliana (Arabidopsis) is an important model for identifying plant genes and determining their function. To assist biological investigations and to define chromosome structure, a coordinated effort to sequence the Arabidopsis genome was initiated in late 1996. Here we report one of the first milestones of this project, the sequence of chromosome 4. Analysis of 17.38 megabases of unique sequence, representing about 17% of the genome, reveals 3,744 protein coding genes, 81 transfer RNAs and numerous repeat elements. Heterochromatic regions surrounding the putative centromere, which has not yet been completely sequenced, are characterized by an increased frequency of a variety of repeats, new repeats, reduced recombination, lowered gene density and lowered gene expression. Roughly 60% of the predicted protein-coding genes have been functionally characterized on the basis of their homology to known genes. Many genes encode predicted proteins that are homologous to human and Caenorhabditis elegans proteins.
Koundal, Vikas; Haq, Qazi Mohd Rizwanul; Praveen, Shelly
2011-02-01
The genome of Cucumber mosaic virus New Delhi strain (CMV-ND) from India, obtained from tomato, was completely sequenced and compared with full genome sequences of 14 known CMV strains from subgroups I and II, for their genetic diversity. Sequence analysis suggests CMV-ND shares maximum sequence identity at the nucleotide level with a CMV strain from Taiwan. Among all 15 strains of CMV, the encoded protein 2b is least conserved, whereas the coat protein (CP) is most conserved. Sequence identity values and phylogram results indicate that CMV-ND belongs to subgroup I. Based on the recombination detection program result, it appears that CMV is prone to recombination, and different RNA components of CMV-ND have evolved differently. Recombinational analysis of all 15 CMV strains detected maximum recombination breakpoints in RNA2; CP showed the least recombination sites.
Zeng, Y H; Chen, X H; Jiao, N Z
2007-12-01
To assess how completely the diversity of anoxygenic phototrophic bacteria (APB) was sampled in natural environments. All nucleotide sequences of the APB marker gene pufM from cultures and environmental clones were retrieved from the GenBank database. A set of cutoff values (sequence distances 0.06, 0.15 and 0.48 for species, genus, and (sub)phylum levels, respectively) was established using a distance-based grouping program. Analysis of the environmental clones revealed that current efforts on APB isolation and sampling in natural environments are largely inadequate. Analysis of the average distance between each identified genus and an uncultured environmental pufM sequence indicated that the majority of cultured APB genera lack environmental representatives. The distance-based grouping method is fast and efficient for bulk functional gene sequences analysis. The results clearly show that we are at a relatively early stage in sampling the global richness of APB species. Periodical assessment will undoubtedly facilitate in-depth analysis of potential biogeographical distribution pattern of APB. This is the first attempt to assess the present understanding of APB diversity in natural environments. The method used is also useful for assessing the diversity of other functional genes.
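As a rough illustration of distance-based grouping at the cutoffs quoted above, the sketch below groups sequences by single linkage: two sequences share a group if a chain of pairwise distances at or below the cutoff connects them. The linkage rule, the function name `group_by_cutoff`, and the toy distance table are assumptions made for illustration; the abstract does not state which algorithm the grouping program uses.

```python
from typing import Dict, List, Tuple

def group_by_cutoff(names: List[str],
                    dist: Dict[Tuple[str, str], float],
                    cutoff: float) -> List[List[str]]:
    """Single-linkage grouping (an assumed rule) using a small union-find."""
    parent = {n: n for n in names}

    def find(x: str) -> str:
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge every pair whose distance does not exceed the cutoff.
    for (a, b), d in dist.items():
        if d <= cutoff:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra

    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical pufM-like distances between four toy sequences.
names = ["s1", "s2", "s3", "s4"]
toy_dist = {
    ("s1", "s2"): 0.04, ("s1", "s3"): 0.12, ("s2", "s3"): 0.10,
    ("s1", "s4"): 0.50, ("s2", "s4"): 0.55, ("s3", "s4"): 0.52,
}
# Cutoffs taken from the abstract: species, genus, (sub)phylum.
CUTOFFS = {"species": 0.06, "genus": 0.15, "(sub)phylum": 0.48}

for level, cutoff in CUTOFFS.items():
    print(level, group_by_cutoff(names, toy_dist, cutoff))
```

Raising the cutoff can only merge groups, never split them, so the number of groups at each level gives a quick count of distinct units at that taxonomic depth.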
Sun, Chia-Tsen; Chiang, Austin W T; Hwang, Ming-Jing
2017-10-27
Proteome-scale bioinformatics research is increasingly conducted as the number of completely sequenced genomes increases, but analysis of protein domains (PDs) usually relies on similarity in their amino acid sequences and/or three-dimensional structures. Here, we present results from a bi-clustering analysis on presence/absence data for 6,580 unique PDs in 2,134 species with a sequenced genome, thus covering a complete set of proteins, for the three superkingdoms of life, Bacteria, Archaea, and Eukarya. Our analysis revealed eight distinctive PD clusters, which, following an analysis of enrichment of Gene Ontology functions and CATH classification of protein structures, were shown to exhibit structural and functional properties that are taxa-characteristic. For example, the largest cluster is ubiquitous in all three superkingdoms, constituting a set of 1,472 persistent domains created early in evolution and retained in living organisms and characterized by basic cellular functions and ancient structural architectures, while an Archaea and Eukarya bi-superkingdom cluster suggests its PDs may have existed in the ancestor of the two superkingdoms, and others are specific to a single superkingdom or taxon (e.g. Fungi). These results increase our appreciation of PD diversity and our knowledge of how PDs are used in species, with implications for species evolution.
Yoon, Jun-Hee; Kim, Thomas W; Mendez, Pedro; Jablons, David M; Kim, Il-Jin
2017-01-01
The development of next-generation sequencing (NGS) technology allows whole exomes or genomes to be sequenced. However, data analysis is still the biggest bottleneck for its wide implementation. Most laboratories still depend on manual procedures for data handling and analyses, which translates into a delay and decreased efficiency in the delivery of NGS results to doctors and patients. Thus, there is high demand for an automatic and easy-to-use NGS data analysis system. We developed a comprehensive, automatic genetic analysis controller named Mobile Genome Express (MGE) that works on smartphones or other mobile devices. MGE can handle all the steps of genetic analysis, such as sample information submission, sequencing run quality check from the sequencer, secured data transfer and results review. We sequenced an Actrometrix control DNA containing multiple proven human mutations using a targeted sequencing panel, and the whole analysis was managed by MGE and its data reviewing program, called ELECTRO. All steps were processed automatically except for the final sequencing review procedure with ELECTRO to confirm mutations. The data analysis process was completed within several hours. We confirmed that the mutations we identified were consistent with our previous results obtained by using multi-step, manual pipelines.
Mori, Kazuki; Shirasawa, Kenta; Nogata, Hitoshi; Hirata, Chiharu; Tashiro, Kosuke; Habu, Tsuyoshi; Kim, Sangwan; Himeno, Shuichi; Kuhara, Satoru; Ikegami, Hidetoshi
2017-01-25
With the aim of identifying sex determinants of fig, we generated the first draft genome sequence of fig and conducted subsequent analyses. Linkage analysis with a high-density genetic map established by a restriction-site associated sequencing technique, and a genome-wide association study followed by whole-genome resequencing analysis, identified two missense mutations in the RESPONSIVE-TO-ANTAGONIST1 (RAN1) orthologue, which encodes a copper-transporting ATPase, that were completely associated with the sex phenotypes of the investigated figs. This result suggests that RAN1 is a possible sex determinant candidate in the fig genome. The genomic resources and genetic findings obtained in this study can contribute to a general understanding of Ficus species and provide insight into the sex determination system of fig and of plants.
Schmid, Michael; Muri, Jonathan; Melidis, Damianos; Varadarajan, Adithi R.; Somerville, Vincent; Wicki, Adrian; Moser, Aline; Bourqui, Marc; Wenzel, Claudia; Eugster-Meier, Elisabeth; Frey, Juerg E.; Irmler, Stefan; Ahrens, Christian H.
2018-01-01
Although complete genome sequences hold particular value for an accurate description of core genomes, the identification of strain-specific genes, and as the optimal basis for functional genomics studies, they are still largely underrepresented in public repositories. Based on an assessment of the genome assembly complexity for all lactobacilli, we used Pacific Biosciences' long read technology to sequence and de novo assemble the genomes of three Lactobacillus helveticus starter strains, raising the number of completely sequenced strains to 12. The first comparative genomics study for L. helveticus—to our knowledge—identified a core genome of 988 genes and sets of unique, strain-specific genes ranging from about 30 to more than 200 genes. Importantly, the comparison of MiSeq- and PacBio-based assemblies uncovered that not only accessory but also core genes can be missed in incomplete genome assemblies based on short reads. Analysis of the three genomes revealed that a large number of pseudogenes were enriched for functional Gene Ontology categories such as amino acid transmembrane transport and carbohydrate metabolism, which is in line with a reductive genome evolution in the rich natural habitat of L. helveticus. Notably, the functional Clusters of Orthologous Groups of proteins categories “cell wall/membrane biogenesis” and “defense mechanisms” were found to be enriched among the strain-specific genes. A genome mining effort uncovered examples where an experimentally observed phenotype could be linked to the underlying genotype, such as for cell envelope proteinase PrtH3 of strain FAM8627. Another possible link identified for peptidoglycan hydrolases will require further experiments. Of note, strain FAM22155 did not harbor a CRISPR/Cas system; its loss was also observed in other L. helveticus strains and lactobacillus species, thus questioning the value of the CRISPR/Cas system for diagnostic purposes. 
Importantly, the complete genome sequences proved to be very useful for the analysis of natural whey starter cultures with metagenomics, as a larger percentage of the sequenced reads of these complex mixtures could be unambiguously assigned down to the strain level. PMID:29441050
MACSIMS : multiple alignment of complete sequences information management system
Thompson, Julie D; Muller, Arnaud; Waterhouse, Andrew; Procter, Jim; Barton, Geoffrey J; Plewniak, Frédéric; Poch, Olivier
2006-01-01
Background: In the post-genomic era, systems-level studies are being performed that seek to explain complex biological systems by integrating diverse resources from fields such as genomics, proteomics or transcriptomics. New information management systems are now needed for the collection, validation and analysis of the vast amount of heterogeneous data available. Multiple alignments of complete sequences provide an ideal environment for the integration of this information in the context of the protein family. Results: MACSIMS is a multiple alignment-based information management program that combines the advantages of both knowledge-based and ab initio sequence analysis methods. Structural and functional information is retrieved automatically from the public databases. In the multiple alignment, homologous regions are identified and the retrieved data is evaluated and propagated from known to unknown sequences with these reliable regions. In a large-scale evaluation, the specificity of the propagated sequence features is estimated to be >99%, i.e. very few false positive predictions are made. MACSIMS is then used to characterise mutations in a test set of 100 proteins that are known to be involved in human genetic diseases. The number of sequence features associated with these proteins was increased by 60%, compared to the features available in the public databases. An XML format output file allows automatic parsing of the MACSIMS results, while a graphical display using the JalView program allows manual analysis. Conclusion: MACSIMS is a new information management system that incorporates detailed analyses of protein families at the structural, functional and evolutionary levels. MACSIMS thus provides a unique environment that facilitates knowledge extraction and the presentation of the most pertinent information to the biologist. A web server and the source code are available at . PMID:16792820
Genomic characterization and taxonomic position of a rhabdovirus from a hybrid snakehead.
Zeng, Weiwei; Wang, Qing; Wang, Yingying; Liu, Cun; Liang, Hongru; Fang, Xiang; Wu, Shuqin
2014-09-01
A new rhabdovirus, tentatively designated as hybrid snakehead rhabdovirus C1207 (HSHRV-C1207), was first isolated from a moribund hybrid snakehead (Channa maculata×Channa argus) in China. We present the complete genome sequence of HSHRV-C1207 and a comprehensive sequence comparison between HSHRV-C1207 and other rhabdoviruses. Sequence alignment and phylogenetic analysis revealed that HSHRV-C1207 shared the highest degree of homology with Monopterus albus rhabdovirus and Siniperca chuatsi rhabdovirus. All three viruses clustered into a single group that was distinct from the recognized genera in the family Rhabdoviridae. Our analysis suggests that HSHRV-C1207, as well as MARV and SCRV, should be assigned to a new rhabdovirus genus.
Alabi, Olufemi J; Villegas, Cecilia; Gregg, Lori; Murray, K Daniel
2016-06-01
Two isolates of a novel bipartite begomovirus, tentatively named malvastrum bright yellow mosaic virus (MaBYMV), were molecularly characterized from naturally infected plants of the genus Malvastrum showing bright yellow mosaic disease symptoms in South Texas. Six complete DNA-A and five DNA-B genome sequences of MaBYMV obtained from the isolates ranged in length from 2,608 to 2,609 nucleotides (nt) and 2,578 to 2,605 nt, respectively. Both genome segments shared a 178- to 180-nt common region. In pairwise comparisons, the complete DNA-A and DNA-B sequences of MaBYMV were most similar (87-88% and 79-81% identity, respectively) and phylogenetically related to the corresponding sequences of sida mosaic Sinaloa virus-[MX-Gua-06]. Further analysis revealed that MaBYMV is a putative recombinant virus, thus supporting the notion that malvaceous hosts may be influencing the evolution of several begomoviruses. The design of new diagnostic primers enabled the detection of MaBYMV in cohorts of Bemisia tabaci collected from symptomatic Malvastrum sp. plants, thus implicating whiteflies as potential vectors of the virus.
Sorimachi, Kenji; Okayasu, Teiji; Ohhira, Shuji
2015-04-01
Normalized nucleotide and amino acid contents of complete genome sequences can be visualized as radar charts. The shapes of these charts depict the characteristics of an organism's genome. The normalized values calculated from the genome sequence theoretically exclude experimental errors. Further, because normalization is independent of both target size and kind, this procedure is applicable not only to single genes but also to whole genomes, which consist of a huge number of different genes. In this review, we discuss the applications of the normalization of the nucleotide and predicted amino acid contents of complete genomes to the investigation of genome structure and to evolutionary research from primitive organisms to Homo sapiens. Some of the results could never have been obtained from the analysis of individual nucleotide or amino acid sequences but were revealed only after the normalization of nucleotide and amino acid contents was applied to genome research. The discovery that genome structure was homogeneous was obtained only after normalization methods were applied to the nucleotide or predicted amino acid contents of genome sequences. Normalization procedures are also applicable to evolutionary research. Thus, normalization of the contents of whole genomes is a useful procedure that can help to characterize organisms.
Drew, Richard John; Walsh, Anne; Laoi, Bairbre Ni; Crowley, Brendan
2012-07-01
BK polyomavirus (family Polyomaviridae) may cause hemorrhagic cystitis (BKV-HC) in hematopoietic stem cell transplant recipients. Eleven complete BKV genomes (GenBank accession numbers: JN192431-JN192441) were sequenced from urine samples of allogeneic hematopoietic stem cell transplant recipients and compared to complete BKV genomes in the published literature. Of the 11 isolates, seven (64%) were subgroup Ib-1, three (27%) belonged to subgroup Ib-2 and a single isolate belonged to subtype III. The analysis of single-nucleotide polymorphisms in this study showed that isolates could be subclassified into subtypes I-IV and subgroups Ib-1 and Ib-2 on the basis of VP1 or the first part of the Large T-antigen (LTag). The non-coding control region (NCCR) of the 11 isolates was also sequenced. These sequences showed that there was consistent sequence homology within subgroups Ib-1 and Ib-2. Two new mutations were described in the isolates, G→C at O(84) in isolate SJH-LG-310, and a deletion at R(2-7) in isolate SJH-LG-309. No known transcription factor is thought to be present at the site of either of these mutations. There were no rearrangements seen in the isolates, and this may be because the patients were not followed up over time. There were five nucleotide positions at which subgroup Ib-1 isolates differed from subgroup Ib-2 isolates in the NCCR sequence: O(41), P(18), P(31), R(4), and S(18). The mutation O(41) is present in the promoter granulocyte/macrophage stimulating factor gene and the P(31) mutation is present in the NF-1 gene. Copyright © 2012 Wiley Periodicals, Inc.
Silence of the centromeres--not.
Cooke, Howard J
2004-07-01
Centromeres are a conundrum; although many proteins associated with centromeres are conserved from yeast to humans, the underlying DNA sequence is not. A proposed solution to this problem is that an epigenetic, largely heterochromatic, state is imposed by these proteins. Recent analysis of a human neocentromere and the complete sequence of a rice centromere suggest that this epigenetic state can enable transcription of at least some genes within a centromere.
Rodrigues, Thaís C S; Subramaniam, Kuttichantran; Cortés-Hinojosa, Galaxia; Wellehan, James F X; Ng, Terry Fei Fan; Delwart, Eric; McCulloch, Stephen D; Goldstein, Juli D; Schaefer, Adam M; Fair, Patricia A; Reif, John S; Bossart, Gregory D; Waltzek, Thomas B
2018-04-26
The genome sequence of a papillomavirus was determined from fecal samples collected from bottlenose dolphins in the Indian River Lagoon, FL. The genome was 7,772 bp and displayed a typical papillomavirus genome organization. Phylogenetic analysis supported the bottlenose dolphin papillomavirus as being a novel type of Omikronpapillomavirus 1. Copyright © 2018 Rodrigues et al.
Vemulapati, B; Druffel, K L; Eigenbrode, S D; Karasev, A; Pappu, H R
2010-10-01
The family Luteoviridae consists of eight viruses assigned to three different genera, Luteovirus, Polerovirus and Enamovirus. The complete genomic sequences of pea enation mosaic virus (genus Enamovirus) and bean leafroll virus (genus Luteovirus) from the Pacific Northwest, USA, were determined. Annotation, sequence comparisons, and phylogenetic analysis of selected genes together with those of known polero- and enamoviruses were conducted.
Zhang, Honghai; Chen, Lei
2011-03-01
The dhole (Cuon alpinus) is the only extant species in the genus Cuon (Carnivora: Canidae). In the present study, the complete mitochondrial genome of the dhole was sequenced. The total length is 16,672 base pairs, which is the shortest in Canidae. Sequence analysis revealed that most mitochondrial genomic functional regions were highly consistent among canid animals except the CSB domain of the control region. The difference in length among the Canidae mitochondrial genome sequences is mainly due to the number of short tandemly repeated segments in the CSB domain. Phylogenetic analysis was performed on the concatenated data set of 14 mitochondrial genes of 8 canid animals by using maximum parsimony (MP), maximum likelihood (ML) and Bayesian inference (BI) methods. The genera Vulpes and Nyctereutes formed a sister group and split first within Canidae, followed by Cuon. The divergence in the genus Canis was the latest. The divergence of domestic dogs after that of Canis lupus laniger is completely supported by all three topologies. Pairwise sequence divergence data of different mitochondrial genes among canid animals were also determined. Except for the synonymous substitutions in protein-coding genes, the control region exhibits the highest sequence divergences. The synonymous rates are approximately two to six times higher than those of the non-synonymous sites except for a slightly higher rate in the non-synonymous substitution between Cuon alpinus and Vulpes vulpes. 16S rRNA genes have a slightly faster sequence divergence than 12S rRNA and tRNA genes. Based on nucleotide substitutions of tRNA genes and rRNA genes, the times since divergence between the dhole and other canid animals, and between domestic dogs and three subspecies of wolves, were evaluated. The result indicates that Vulpes and Nyctereutes have a close phylogenetic relationship and that the divergence of Nyctereutes is a little earlier. The Tibetan wolf may be an archaic lineage among wolf subspecies. The genetic distance between wolves and domestic dogs is less than that among different subspecies of wolves. The domestication of dogs was estimated at about 1.56-1.92 million years ago or even earlier.
The complete mitochondrial genome of black-footed ferret, Mustela nigripes (Mustela, Mustelinae).
Zhao, Ren-Bin; Zhou, Chao-Yang; Lu, Zhi-Xiang; Hu, Peng; Liu, Jian-Qiong; Tan, Wei-Wei; Yang, Tong-Hua
2016-05-01
In this study, the complete mitochondrial genome sequence of black-footed ferret, Mustela nigripes, is determined for the first time. This mitogenome is 16,556 bp in length and contains 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 control region (D-loop). The overall base composition is A (32.9%), C (26.1%), G (13.8%), and T (27.2%), so the percentage of A and T (60.1%) is higher than that of G and C. Most of the genes are encoded on H-strand, except for the ND6 subunit gene and six tRNA genes. The complete mitochondrial genome sequence reported here would be useful for further phylogenetic analysis and conservation genetic studies in M. nigripes.
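The base-composition figures above follow from simple counting. The sketch below shows one way to compute them; the helper names `base_composition` and `at_content` are hypothetical, and the toy sequence is synthetic, built only to mirror the reported percentages rather than taken from the actual mitogenome.

```python
from collections import Counter

def base_composition(seq: str) -> dict:
    """Return the percentage of A, C, G and T, ignoring any other symbols."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}

def at_content(seq: str) -> float:
    """Combined A + T percentage."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"]

# Synthetic 1,000-base sequence with the proportions quoted in the abstract.
toy = "A" * 329 + "C" * 261 + "G" * 138 + "T" * 272

print(base_composition(toy))
print(round(at_content(toy), 1))
```

For a real analysis the sequence would be read from a FASTA or GenBank record; ambiguous bases such as N are skipped here, which is one possible convention among several.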
Choe, Se-Eun; Nguyen, Thuy Thi-Dieu; Kang, Tae-Gyu; Kweon, Chang-Hee; Kang, Seung-Won
2011-09-01
Nuclear ribosomal DNA sequence of the second internal transcribed spacer (ITS-2) has been used efficiently to identify the liver fluke species collected from different hosts and various geographic regions. ITS-2 sequences of 19 Fasciola samples collected from Korean native cattle were determined and compared. Sequence comparison including ITS-2 sequences of isolates from this study and reference sequences from Fasciola hepatica and Fasciola gigantica and intermediate Fasciola in Genbank revealed seven identical variable sites of investigated isolates. Among 19 samples, 12 individuals had ITS-2 sequences completely identical to that of pure F. hepatica, five possessed the sequences identical to F. gigantica type, whereas two shared the sequence of both F. hepatica and F. gigantica. No variations in length and nucleotide composition of ITS-2 sequence were observed within isolates that belonged to F. hepatica or F. gigantica. At the position of 218, five Fasciola containing a single-base substitution (C>T) formed a distinct branch inside the F. gigantica-type group which was similar to those of Asian-origin isolates. The phylogenetic tree of the Fasciola spp. based on complete ITS-2 sequences from this study and other representative isolates in different locations clearly showed that pure F. hepatica, F. gigantica type and intermediate Fasciola were observed. The result also provided additional genetic evidence for the existence of three forms of Fasciola isolated from native cattle in Korea by genetic approach using ITS-2 sequence.
2016-10-27
Institute of Infectious Diseases, Fort Detrick, Frederick, Maryland, USA. Running head: Complete Genome Sequence of Y. pestis strain Cadman... Complete Genome Sequence of Pigmentation Negative Yersinia pestis strain Cadman. Sean Lovett, Kitty Chase, Galina Koroleva, Gustavo...we report the genome sequence of Yersinia pestis strain Cadman, an attenuated strain lacking the pgm locus. Y. pestis is the causative agent of
Recurrence time statistics: versatile tools for genomic DNA sequence analysis.
Cao, Yinhe; Tung, Wen-Wen; Gao, J B
2004-01-01
With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale on a PC.
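To make the idea of recurrence times concrete, here is a minimal sketch under a simplified reading: the recurrence time of a word of length k is taken to be the distance between two successive occurrences of that word, and the histogram of those distances is collected in one pass. This definition, the function name `recurrence_times`, and the toy input are assumptions for illustration; the published method may define and use the statistic differently.

```python
from collections import Counter

def recurrence_times(seq: str, k: int) -> Counter:
    """Histogram of distances between successive occurrences of each k-mer."""
    last_seen = {}          # k-mer -> position of its most recent occurrence
    hist = Counter()        # recurrence time -> number of times observed
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            hist[i - last_seen[word]] += 1
        last_seen[word] = i
    return hist

# A perfect period-3 repeat: every recurrence happens at distance 3.
toy = "ACGACGACGACG"
print(recurrence_times(toy, 3))
```

In a histogram of this kind, tandem repeats appear as an excess of counts at the repeat period, and a single pass over the sequence with a dictionary lookup keeps the cost linear in sequence length.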
Alvarado-Mora, Mónica Viviana; Santana, Rúbia Anita Ferraz; Sitnik, Roberta; Ferreira, Paulo Roberto Abrão; Mangueira, Cristovão Luís Pitangueira; Carrilho, Flair José; Pinho, João Renato Rebello
2011-06-01
The hepatitis B virus (HBV) is among the leading causes of chronic hepatitis, cirrhosis and hepatocellular carcinoma. In Brazil, genotype A is the most frequent, followed by genotypes D and F. Genotypes B and C are found in Brazil exclusively among Asian patients and their descendants. The aim of this study was to sequence the entire HBV genome of a Caucasian patient infected with HBV/C2 and to infer the origin of the virus based on sequencing analysis. The sequence of this Brazilian isolate was grouped with four other sequences described in China. The sequence of this patient is the first complete genome of HBV/C2 reported in Brazil.
Song, Wen Jun; Qin, Qi Wei; Qiu, Jin; Huang, Can Hua; Wang, Fan; Hew, Choy Leong
2004-01-01
Here we report the complete genome sequence of Singapore grouper iridovirus (SGIV). Sequencing of the random shotgun and restriction endonuclease genomic libraries showed that the entire SGIV genome consists of 140,131 bp. One hundred sixty-two open reading frames (ORFs) from the sense and antisense DNA strands, coding for lengths varying from 41 to 1,268 amino acids, were identified. Computer-assisted analyses of the deduced amino acid sequences revealed that 77 of the ORFs exhibited homologies to known virus genes, 23 of which matched functional iridovirus proteins. Forty-two putative conserved domains or signatures were detected in the National Center for Biotechnology Information CD-Search database and PROSITE database. An assortment of enzyme activities involved in DNA replication, transcription, nucleotide metabolism, cell signaling, etc., were identified. Viruses were cultured on a cell line derived from the embryonated egg of the grouper Epinephelus tauvina, isolated, and purified by sucrose gradient ultracentrifugation. The protein extract from the purified virions was analyzed by polyacrylamide gel electrophoresis followed by in-gel digestion of protein bands. Matrix-assisted laser desorption ionization-time of flight mass spectrometry and database searching led to identification of 26 proteins. Twenty of these represented novel or previously unidentified genes, which were further confirmed by reverse transcription-PCR (RT-PCR) and DNA sequencing of their respective RT-PCR products. PMID:15507645
The Mouse Genomes Project: a repository of inbred laboratory mouse strain genomes.
Adams, David J; Doran, Anthony G; Lilue, Jingtao; Keane, Thomas M
2015-10-01
The Mouse Genomes Project was initiated in 2009 with the goal of using next-generation sequencing technologies to catalogue molecular variation in the common laboratory mouse strains, and a selected set of wild-derived inbred strains. The initial sequencing and survey of sequence variation in 17 inbred strains was completed in 2011 and included a comprehensive catalogue of single nucleotide polymorphisms, short insertion/deletions, larger structural variants including their fine scale architecture and landscape of transposable element variation, and genomic sites subject to post-transcriptional alteration of RNA. From this beginning, the resource has expanded significantly to include 36 fully sequenced inbred laboratory mouse strains, a refined and updated data processing pipeline, and new variation querying and data visualisation tools which are available on the project's website ( http://www.sanger.ac.uk/resources/mouse/genomes/ ). The focus of the project is now the completion of de novo assembled chromosome sequences and strain-specific gene structures for the core strains. We discuss how the assembled chromosomes will power comparative analysis, data access tools and future directions of mouse genetics.
Weiner, Ronald M.; Taylor, Larry E.; Henrissat, Bernard; Hauser, Loren; Land, Miriam; Coutinho, Pedro M.; Rancurel, Corinne; Saunders, Elizabeth H.; Longmire, Atkinson G.; Zhang, Haitao; Bayer, Edward A.; Gilbert, Harry J.; Larimer, Frank; Zhulin, Igor B.; Ekborg, Nathan A.; Lamed, Raphael; Richardson, Paul M.; Borovok, Ilya; Hutcheson, Steven
2008-01-01
The marine bacterium Saccharophagus degradans strain 2-40 (Sde 2-40) is emerging as a vanguard of a recently discovered group of marine and estuarine bacteria that recycles complex polysaccharides. We report its complete genome sequence, analysis of which identifies an unusually large number of enzymes that degrade >10 complex polysaccharides. Not only is this an extraordinary range of catabolic capability, many of the enzymes exhibit unusual architecture including novel combinations of catalytic and substrate-binding modules. We hypothesize that many of these features are adaptations that facilitate depolymerization of complex polysaccharides in the marine environment. This is the first sequenced genome of a marine bacterium that can degrade plant cell walls, an important component of the carbon cycle that is not well-characterized in the marine environment. PMID:18516288
Complete Genome Analysis of an Enterovirus EV-B83 Isolated in China.
Tang, Jingjing; Li, Qiongfen; Tian, Bingjun; Zhang, Jie; Li, Kai; Ding, Zhengrong; Lu, Lin
2016-07-12
Enterovirus B83 (EV-B83) is a recently identified member of enterovirus species B. It is a rarely reported serotype and, to date, only the complete genome sequence of the prototype strain from the United States is available. In this study, we describe the complete genomic characterization of an EV-B83 strain, 246/YN/CHN/08HC, isolated from a healthy child living in a border region of Yunnan Province, China, in 2008. Compared with the prototype strain, it had 79.6% similarity in the complete genome and 78.9% similarity in the VP1 coding region, reflecting the great genetic divergence between them. VP1 coding region alignment revealed that it had 77.2-91.3% similarity with other EV-B83 sequences available in GenBank. Similarity plot analysis revealed that it had higher identity with several other EV-B serotypes than with the EV-B83 prototype strain in the P2 and P3 coding regions, suggesting that multiple recombination events might have occurred. The great genetic divergence from previously isolated strains and the extremely rare isolation suggest that this serotype has circulated at a low epidemic strength for many years. This is the first report of a complete genome of EV-B83 in China.
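The similarity percentages quoted above come from pairwise comparison of aligned sequences. As a minimal sketch of that kind of calculation, the Python function below computes percent identity over two pre-aligned sequences, skipping gap columns. The function name, the gap convention and the toy sequences are illustrative assumptions; the abstract does not state which software or similarity definition the authors used.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are ignored.
    This is an illustrative definition, not the authors' stated method.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real EV-B83 data)
toy_identity = percent_identity("ACGT-A", "ACCTTA")
```

Other tools count gap columns as mismatches or normalise by the shorter sequence, so values from different pipelines are not always directly comparable.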
Montoya-Ruiz, Carolina; Cajimat, Maria N B; Milazzo, Mary Louise; Diaz, Francisco J; Rodas, Juan David; Valbuena, Gustavo; Fulhorst, Charles F
2015-07-01
The results of a previous study suggested that Cherrie's cane rat (Zygodontomys cherriei) is the principal host of Necoclí virus (family Bunyaviridae, genus Hantavirus) in Colombia. Bayesian analyses of complete nucleocapsid protein gene sequences and complete glycoprotein precursor gene sequences in this study confirmed that Necoclí virus is phylogenetically closely related to Maporal virus, which is principally associated with the delicate pygmy rice rat (Oligoryzomys delicatus) in western Venezuela. In pairwise comparisons, nonidentities between the complete amino acid sequence of the nucleocapsid protein of Necoclí virus and the complete amino acid sequences of the nucleocapsid proteins of other hantaviruses were ≥8.7%. Likewise, nonidentities between the complete amino acid sequence of the glycoprotein precursor of Necoclí virus and the complete amino acid sequences of the glycoprotein precursors of other hantaviruses were ≥11.7%. Collectively, the unique association of Necoclí virus with Z. cherriei in Colombia, results of the Bayesian analyses of complete nucleocapsid protein gene sequences and complete glycoprotein precursor gene sequences, and results of the pairwise comparisons of amino acid sequences strongly support the notion that Necoclí virus represents a novel species in the genus Hantavirus. Further work is needed to determine whether Calabazo virus (a hantavirus associated with Z. brevicauda cherriei in Panama) and Necoclí virus are conspecific.
Zhang, Yulei; Zhao, Lijuan; Chen, Wenjie; Huang, Yunmao; Yang, Ling; Sarathbabu, V; Wu, Zaohe; Li, Jun; Nie, Pin; Lin, Li
2017-10-01
Here we analyzed the complete genome sequence of a highly virulent Flavobacterium columnare strain, Pf1, isolated in our laboratory. The complete genome consists of a 3,171,081 bp circular DNA with 2784 predicted protein-coding genes. Among these, 286 genes were predicted to be antibiotic resistance genes, including 32 RND-type efflux pump-related genes associated with the export of aminoglycosides, indicating inducible aminoglycoside resistance in F. columnare. In addition, 328 genes were predicted to be pathogenicity-related genes, which could be classified as virulence factors, gliding motility proteins, adhesins, and many putative secreted proteases. These genes were probably involved in the colonization, invasion and destruction of fish tissues during F. columnare infection. The complete genome sequence provides a basis for explaining the interactions between F. columnare and the infected fish. The predicted antibiotic resistance and pathogenicity-related genes will shed new light on the development of more efficient preventive strategies against infection by F. columnare, which is a major worldwide fish pathogen. Copyright © 2017 Elsevier Ltd. All rights reserved.
Sedlar, Karel; Kolek, Jan; Provaznik, Ivo; Patakova, Petra
2017-02-20
The complete genome sequence of non-type strain Clostridium pasteurianum NRRL B-598 was introduced last year; it is an oxygen tolerant, spore-forming, mesophilic heterofermentative bacterium with high hydrogen production and acetone-butanol fermentation ability. The basic genome statistics have shown its similarity to C. beijerinckii rather than the C. pasteurianum species. Here, we present a comparative analysis of the strain with several other complete clostridial genome sequences. Besides a 16S rRNA gene sequence comparison, digital DNA-DNA hybridization (dDDH) and phylogenomic analysis confirmed an inaccuracy of the taxonomic status of strain Clostridium pasteurianum NRRL B-598. Therefore, we suggest its reclassification to be Clostridium beijerinckii NRRL B-598. This is a specific strain and is not identical to other C. beijerinckii strains. This misclassification explains its unexpected behavior, different from other C. pasteurianum strains; it also permits better understanding of the bacterium for a future genetic manipulation that might increase its biofuel production potential. Copyright © 2017 Elsevier B.V. All rights reserved.
Meczker, Katalin; Dömötör, Dóra; Vass, János; Rákhely, Gábor; Schneider, György; Kovács, Tamás
2014-01-01
The enterobacterium Erwinia amylovora is the causal agent of fire blight. This study presents the analysis of the complete genome of phage PhiEaH1, isolated from the soil surrounding an E. amylovora-infected apple tree in Hungary. Its genome is 218 kb in size, containing 244 ORFs. PhiEaH1 is the second E. amylovora infecting phage from the Siphoviridae family whose complete genome sequence was determined. Beside PhiEaH2, PhiEaH1 is the other active component of Erwiphage, the first bacteriophage-based pesticide on the market against E. amylovora. Comparative genome analysis in this study has revealed that PhiEaH1 not only differs from the 10 formerly sequenced E. amylovora bacteriophages belonging to other phage families, but also from PhiEaH2. Sequencing of more Siphoviridae phage genomes might reveal further diversity, providing opportunities for the development of even more effective biological control agents, phage cocktails against Erwinia fire blight disease of commercial fruit crops.
MetaDP: a comprehensive web server for disease prediction of 16S rRNA metagenomic datasets.
Xu, Xilin; Wu, Aiping; Zhang, Xinlei; Su, Mingming; Jiang, Taijiao; Yuan, Zhe-Ming
2016-01-01
High-throughput sequencing-based metagenomics has garnered considerable interest in recent years. Numerous methods and tools have been developed for the analysis of metagenomic data. However, it is still a daunting task to install a large number of tools and complete a complicated analysis, especially for researchers with minimal bioinformatics backgrounds. To address this problem, we constructed an automated software package named MetaDP for 16S rRNA sequencing data analysis, including data quality control, operational taxonomic unit clustering, diversity analysis, and disease risk prediction modeling. Furthermore, a support vector machine-based prediction model for irritable bowel syndrome (IBS) was built by applying MetaDP to microbial 16S sequencing data from 108 children. The success of the IBS prediction model suggests that the platform may also be applied to other diseases related to gut microbes, such as obesity, metabolic syndrome, or intestinal cancer, among others (http://metadp.cn:7001/).
Villacreses, Javier; Rojas-Herrera, Marcelo; Sánchez, Carolina; Hewstone, Nicole; Undurraga, Soledad F.; Alzate, Juan F.; Manque, Patricio; Maracaja-Coutinho, Vinicius; Polanco, Victor
2015-01-01
Here, we report the genome sequence and evidence for transcriptional activity of a virus-like element in the native Chilean berry tree Aristotelia chilensis. We propose to name the endogenous sequence Aristotelia chilensis Virus 1 (AcV1). High-throughput sequencing of the genome of this tree uncovered an endogenous viral element, with a size of 7122 bp, corresponding to the complete genome of AcV1. Its sequence contains three open reading frames (ORFs): ORFs 1 and 2 share 66%–73% amino acid similarity with members of the Caulimoviridae virus family, especially the Petunia vein clearing virus (PVCV), Petuvirus genus. ORF1 encodes a movement protein (MP); ORF2 encodes a Reverse Transcriptase (RT) and a Ribonuclease H (RNase H) domain; and ORF3 showed no amino acid sequence similarity with any other known virus proteins. Analogous to other known endogenous pararetrovirus sequences (EPRVs), AcV1 is integrated in the genome of Maqui Berry and showed low viral transcriptional activity, which was detected by deep sequencing technology (DNA and RNA-seq). Phylogenetic analysis of AcV1 and other pararetroviruses revealed a closer resemblance with Petuvirus. Overall, our data suggest that AcV1 could be a new member of the Caulimoviridae family, genus Petuvirus, and the first evidence of this kind of virus in a fruit plant. PMID:25855242
The COG database: a tool for genome-scale analysis of protein functions and evolution
Tatusov, Roman L.; Galperin, Michael Y.; Natale, Darren A.; Koonin, Eugene V.
2000-01-01
Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56–83% of the gene products from each of the complete bacterial and archaeal genomes and ~35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program that is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes. PMID:10592175
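The "consistency of genome-specific best hits" criterion can be illustrated with a small Python sketch. It assumes a simplified reading in which three proteins from three different genomes form a consistent triangle when every pair is a mutual best hit; the actual COG construction procedure has further steps (such as merging triangles and handling paralogs) that are not shown. All protein names, genome labels and the `best_hit` table are hypothetical toy data.

```python
from itertools import combinations

# Hypothetical toy data: best_hit[(protein, target_genome)] is the
# best-scoring hit of `protein` among the proteins of `target_genome`.
best_hit = {
    ("a1", "B"): "b1", ("a1", "C"): "c1",
    ("b1", "A"): "a1", ("b1", "C"): "c1",
    ("c1", "A"): "a1", ("c1", "B"): "b1",
    ("a2", "B"): "b1", ("a2", "C"): "c2",
    ("c2", "A"): "a2", ("c2", "B"): "b1",
}
genome_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C", "c2": "C"}


def is_mutual(p: str, q: str) -> bool:
    """True if p and q are each other's best hit in the other's genome."""
    return (best_hit.get((p, genome_of[q])) == q
            and best_hit.get((q, genome_of[p])) == p)


def consistent_triangles(proteins):
    """Return triples from three distinct genomes whose best hits all agree."""
    triangles = []
    for p, q, r in combinations(sorted(proteins), 3):
        if len({genome_of[p], genome_of[q], genome_of[r]}) < 3:
            continue  # a triangle must span three different genomes
        if is_mutual(p, q) and is_mutual(q, r) and is_mutual(p, r):
            triangles.append((p, q, r))
    return triangles


triangles = consistent_triangles(genome_of)
```

In this toy set only `a1`, `b1` and `c1` agree on one another; `a2` and `c2` point to `b1`, but `b1` does not point back, so they form no triangle.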
Kim, Kyunghee; Lee, Sang-Choon; Lee, Junki; Yu, Yeisoo; Yang, Kiwoung; Choi, Beom-Soon; Koh, Hee-Jong; Waminal, Nomar Espinosa; Choi, Hong-Il; Kim, Nam-Hoon; Jang, Woojong; Park, Hyun-Seung; Lee, Jonghoon; Lee, Hyun Oh; Joh, Ho Jun; Lee, Hyeon Ju; Park, Jee Young; Perumal, Sampath; Jayakodi, Murukarthick; Lee, Yun Sun; Kim, Backki; Copetti, Dario; Kim, Soonok; Kim, Sunggil; Lim, Ki-Byung; Kim, Young-Dong; Lee, Jungho; Cho, Kwang-Su; Park, Beom-Seok; Wing, Rod A.; Yang, Tae-Jin
2015-01-01
Cytoplasmic chloroplast (cp) genomes and nuclear ribosomal DNA (nR) are the primary sequences used to understand plant diversity and evolution. We introduce a high-throughput method to simultaneously obtain complete cp and nR sequences using Illumina platform whole-genome sequence. We applied the method to 30 rice specimens belonging to nine Oryza species. Concurrent phylogenomic analysis using cp and nR of several specimens of the same Oryza AA genome species provides insight into the evolution and domestication of cultivated rice, clarifying three ambiguous but important issues in the evolution of wild Oryza species. First, cp-based trees clearly classify each lineage but can be biased by inter-subspecies cross-hybridization events during speciation. Second, O. glumaepatula, a South American wild rice, includes two cytoplasm types, one of which is derived from a recent interspecies hybridization with O. longistaminata. Third, the Australian O. rufipogon-type rice is a perennial form of O. meridionalis. PMID:26506948
NASA Astrophysics Data System (ADS)
Gong, Liang; Wu, Yu; Jian, Qijie; Yin, Chunxiao; Li, Taotao; Gupta, Vijai Kumar; Duan, Xuewu; Jiang, Yueming
2018-01-01
Vibrio qinghaiensis sp.-Q67 (Vqin-Q67) is a freshwater luminescent bacterium that continuously emits blue-green light (485 nm). The bacterium has been widely used for detecting toxic contaminants. Here, we report the complete genome sequence of Vqin-Q67, obtained using third-generation PacBio sequencing technology. Continuous long reads were attained from three PacBio sequencing runs and reads >500 bp with a quality value of >0.75 were merged together into a single dataset. This resultant highly-contiguous de novo assembly has no genome gaps, and comprises two chromosomes with substantial genetic information, including protein-coding genes, non-coding RNA, transposon and gene islands. Our dataset can be useful as a comparative genome for evolution and speciation studies, as well as for the analysis of protein-coding gene families, the pathogenicity of different Vibrio species in fish, the evolution of non-coding RNA and transposon, and the regulation of gene expression in relation to the bioluminescence of Vqin-Q67.
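The read-selection step described above (keeping reads longer than 500 bp with a quality value above 0.75 and merging the three runs into one dataset) can be sketched as follows in Python. The `Read` record, its field names and the toy data are assumptions for illustration only; the abstract does not say which tool performed the filtering.

```python
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal read record: identifier, bases, read quality (0-1)."""
    name: str
    sequence: str
    quality: float


def filter_and_merge(runs: Iterable[Iterable[Read]],
                     min_length: int = 500,
                     min_quality: float = 0.75) -> List[Read]:
    """Merge reads from several runs, keeping those strictly above both thresholds."""
    merged = []
    for run in runs:
        for read in run:
            if len(read.sequence) > min_length and read.quality > min_quality:
                merged.append(read)
    return merged


# Toy input: three runs with made-up reads
run1 = [Read("r1", "A" * 800, 0.85), Read("r2", "A" * 300, 0.90)]
run2 = [Read("r3", "C" * 1200, 0.70)]
run3 = [Read("r4", "G" * 501, 0.76)]
dataset = filter_and_merge([run1, run2, run3])
```

The thresholds are kept as parameters so the same sketch can be reused with other cut-offs.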
[Big Data Revolution or Data Hubris? : On the Data Positivism of Molecular Biology].
Gramelsberger, Gabriele
2017-12-01
Genome data, the core of the 2008 proclaimed big data revolution in biology, are automatically generated and analyzed. The transition from the manual laboratory practice of electrophoresis sequencing to automated DNA-sequencing machines and software-based analysis programs was completed between 1982 and 1992. This transition facilitated the first data deluge, which was considerably increased by the second and third generation of DNA-sequencers during the 2000s. However, the strategies for evaluating sequence data were also transformed along with this transition. The paper explores both the computational strategies of automation, as well as the data evaluation culture connected with it, in order to provide a complete picture of the complexity of today's data generation and its intrinsic data positivism. This paper is thereby guided by the question, whether this data positivism is the basis of the big data revolution of molecular biology announced today, or it marks the beginning of its data hubris.
A third genotype of the human parvovirus PARV4 in sub-Saharan Africa.
Simmonds, Peter; Douglas, Jill; Bestetti, Giovanna; Longhi, Erika; Antinori, Spinello; Parravicini, Carlo; Corbellino, Mario
2008-09-01
PARV4 is a recently discovered human parvovirus widely distributed in injecting drug users in the USA and Europe, particularly in those co-infected with human immunodeficiency virus (HIV). Like parvovirus B19, PARV4 persists in previously exposed individuals. In bone marrow and lymphoid tissue, PARV4 sequences were detected in two sub-Saharan African study subjects with AIDS but without a reported history of parenteral exposure and who were uninfected with hepatitis C virus. PARV4 variants infecting these subjects were phylogenetically distinct from genotypes 1 and 2 (formerly PARV5) that were reported previously. Analysis of near-complete genome sequences demonstrated that they should be classified as a third (equidistant) PARV4 genotype. The availability of a further near-complete genome sequence of this novel genotype facilitated identification of conserved novel open reading frames embedded in the ORF2 coding sequence; one encoded a putative protein with identifiable homology to SAT proteins of members of the genus Parvovirus.
Mukherjee, Koel; Pandey, Dev Mani; Vidyarthi, Ambarish Saran
2015-02-06
Gaining access to sequence and structure information of telomere binding proteins helps in understanding the essential biological processes involve in conserved sequence specific interaction between DNA and the proteins. Rice telomere binding protein (RTBP1) and Nicotiana glutinosa telomere repeat binding factor (NgTRF1) are helix turn helix motif type of proteins that plays role in telomeric DNA protection and length regulation. Both the proteins share same type of domain but till now there is very less communication on the in silico studies of these complete proteins. Here we intend to do a comparative study between two proteins through modeling of the complete proteins, physiochemical characterization, MD simulation and DNA-protein docking. I-TASSER and CLC protein work bench was performed to find out the protein 3D structure as well as the different parameters to characterize the proteins. MD simulation was completed by GROMOS forcefield of GROMACS for 10 ns of time stretch. The simulated 3D structures were docked with template DNA (3D DNA modeled through 3D-DART) of TTTAGGG conserved sequence motif using HADDOCK web server. Digging up all the facts about the proteins it was revealed that around 120 amino acids in the tail part was showing a good sequence similarity between the proteins. Molecular modeling, sequence characterization and secondary structure prediction also indicates the similarity between the protein's structure and sequence. The result of MD simulation highlights on the RMSD, RMSF, Rg, PCA and Energy plots which also conveys the similar type of motional behavior between them. The best complex formation for both the proteins in docking result also indicates for the first interaction site which is mainly the helix3 region of the DNA binding domain. The overall computational analysis reveals that RTBP1 and NgTRF1 proteins display good amount of similarity in their physicochemical properties, structure, dynamics and binding mode.
Mukherjee, Koel; Pandey, Dev Mani; Vidyarthi, Ambarish Saran
2015-09-01
Gaining access to sequence and structure information of telomere-binding proteins helps in understanding the essential biological processes involve in conserved sequence-specific interaction between DNA and the proteins. Rice telomere-binding protein (RTBP1) and Nicotiana glutinosa telomere repeat binding factor (NgTRF1) are helix-turn-helix motif type of proteins that plays role in telomeric DNA protection and length regulation. Both the proteins share same type of domain, but till now there is very less communication on the in silico studies of these complete proteins. Here we intend to do a comparative study between two proteins through modeling of the complete proteins, physiochemical characterization, MD simulation and DNA-protein docking. I-TASSER and CLC protein work bench was performed to find out the protein 3D structure as well as the different parameters to characterize the proteins. MD simulation was completed by GROMOS forcefield of GROMACS for 10 ns of time stretch. The simulated 3D structures were docked with template DNA (3D DNA modeled through 3D-DART) of TTTAGGG conserved sequence motif using HADDOCK Web server. By digging up all the facts about the proteins, it was revealed that around 120 amino acids in the tail part were showing a good sequence similarity between the proteins. Molecular modeling, sequence characterization and secondary structure prediction also indicate the similarity between the protein's structure and sequence. The result of MD simulation highlights on the RMSD, RMSF, Rg, PCA and energy plots which also conveys the similar type of motional behavior between them. The best complex formation for both the proteins in docking result also indicates for the first interaction site which is mainly the helix3 region of the DNA-binding domain. The overall computational analysis reveals that RTBP1 and NgTRF1 proteins display good amount of similarity in their physicochemical properties, structure, dynamics and binding mode.
Comparative genome analysis in the integrated microbial genomes (IMG) system.
Markowitz, Victor M; Kyrpides, Nikos C
2007-01-01
Comparative genome analysis is critical for the effective exploration of a rapidly growing number of complete and draft sequences for microbial genomes. The Integrated Microbial Genomes (IMG) system (img.jgi.doe.gov) has been developed as a community resource that provides support for comparative analysis of microbial genomes in an integrated context. IMG allows users to navigate the multidimensional microbial genome data space and focus their analysis on a subset of genes, genomes, and functions of interest. IMG provides graphical viewers, summaries, and occurrence profile tools for comparing genes, pathways, and functions (terms) across specific genomes. Genes can be further examined using gene neighborhoods and compared with sequence alignment tools.
Benevenuto, Juliana; Peters, Leila P.; Carvalho, Giselle; Palhares, Alessandra; Quecine, Maria C.; Nunes, Filipe R. S.; Kmit, Maria C. P.; Wai, Alvan; Hausner, Georg; Aitken, Karen S.; Berkman, Paul J.; Fraser, James A.; Moolhuijzen, Paula M.; Coutinho, Luiz L.; Creste, Silvana; Vieira, Maria L. C.; Kitajima, João P.; Monteiro-Vitorello, Claudia B.
2015-01-01
Sporisorium scitamineum is a biotrophic fungus responsible for the sugarcane smut, a worldwide spread disease. This study provides the complete sequence of individual chromosomes of S. scitamineum from telomere to telomere achieved by a combination of PacBio long reads and Illumina short reads sequence data, as well as a draft sequence of a second fungal strain. Comparative analysis to previous available sequences of another strain detected few polymorphisms among the three genomes. The novel complete sequence described herein allowed us to identify and annotate extended subtelomeric regions, repetitive elements and the mitochondrial DNA sequence. The genome comprises 19,979,571 bases, 6,677 genes encoding proteins, 111 tRNAs and 3 assembled copies of rDNA, out of our estimated number of copies as 130. Chromosomal reorganizations were detected when comparing to sequences of S. reilianum, the closest smut relative, potentially influenced by repeats of transposable elements. Repetitive elements may have also directed the linkage of the two mating-type loci. The fungal transcriptome profiling from in vitro and from interaction with sugarcane at two time points (early infection and whip emergence) revealed that 13.5% of the genes were differentially expressed in planta and particular to each developmental stage. Among them are plant cell wall degrading enzymes, proteases, lipases, chitin modification and lignin degradation enzymes, sugar transporters and transcriptional factors. The fungus also modulates transcription of genes related to surviving against reactive oxygen species and other toxic metabolites produced by the plant. Previously described effectors in smut/plant interactions were detected but some new candidates are proposed. Ten genomic islands harboring some of the candidate genes unique to S. scitamineum were expressed only in planta. RNAseq data was also used to reassure gene predictions. PMID:26065709
MIPS: a database for genomes and protein sequences
Mewes, H. W.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Mayer, K.; Mokrejs, M.; Morgenstern, B.; Münsterkötter, M.; Rudd, S.; Weil, B.
2002-01-01
The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz–Netzwerk Bioinformatik) networks. The Arabidopsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91–93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155–158; Barker et al. (2001) Nucleic Acids Res., 29, 29–32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de). PMID:11752246
MIPS: a database for genomes and protein sequences.
Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B
2002-01-01
The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidopsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).
USDA-ARS's Scientific Manuscript database
We report the complete genome sequence of Clavibacter michiganensis subsp. insidiosus R1-1 isolated in Minnesota, USA. The R1-1 genome, generated by de novo assembly of PacBio sequencing data, is the first complete genome sequence available for this subspecies....
Kizaki, Seiichiro; Chandran, Anandhakumar; Sugiyama, Hiroshi
2016-03-02
Tet (ten-eleven translocation) family proteins have the ability to oxidize 5-methylcytosine (mC) to 5-hydroxymethylcytosine (hmC), 5-formylcytosine (fC), and 5-carboxycytosine (caC). However, the oxidation reaction of Tet is not understood completely. Evaluation of genomic-level epigenetic changes by Tet protein requires unbiased identification of the highly selective oxidation sites. In this study, we used high-throughput sequencing to investigate the sequence specificity of mC oxidation by Tet1. A 6.6×10^4-member mC-containing random DNA-sequence library was constructed. The library was subjected to Tet-reactive pulldown followed by high-throughput sequencing. Analysis of the obtained sequence data identified the Tet1-reactive sequences. We identified mCpG as a highly reactive sequence of Tet1 protein. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Association mining of dependency between time series
NASA Astrophysics Data System (ADS)
Hafez, Alaaeldin
2001-03-01
Time series analysis is considered a crucial component of strategic control over a broad variety of disciplines in business, science and engineering. Time series data is a sequence of observations collected over intervals of time. Each time series describes a phenomenon as a function of time. Analysis of time series data includes discovering trends (or patterns) in a time series sequence. In the last few years, data mining has emerged and been recognized as a new technology for data analysis. Data mining is the process of discovering potentially valuable patterns, associations, trends, sequences and dependencies in data. Data mining techniques can discover information that many traditional business analysis and statistical techniques fail to deliver. In this paper, we adapt and innovate data mining techniques to analyze time series data. By using data mining techniques, maximal frequent patterns are discovered and used in predicting future sequences or trends, where trends describe the behavior of a sequence. In order to include different types of time series (e.g. irregular and non-systematic), we consider past frequent patterns of the same time sequences (local patterns) and of other dependent time sequences (global patterns). We use the word 'dependent' instead of the word 'similar' to emphasize real-life time series where two time series sequences could be completely different (in values, shapes, etc.), but still react to the same conditions in a dependent way. In this paper, we propose the Dependence Mining Technique, which could be used in predicting time series sequences. The proposed technique consists of three phases: (a) for all time series sequences, generate their trend sequences; (b) discover maximal frequent trend patterns and generate pattern vectors (to keep information on frequent trend patterns); and (c) use the trend pattern vectors to predict future time series sequences.
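The first two phases can be sketched in Python under stated assumptions. The sketch encodes each step of a series as a trend symbol (U for up, D for down, S for steady) and counts contiguous trend patterns of a fixed length. The symbol alphabet, the tolerance parameter and the fixed-length counting are illustrative simplifications; the abstract does not specify the encoding, and finding maximal frequent patterns or building pattern vectors for prediction is not implemented here.

```python
from collections import Counter
from typing import Dict, Sequence


def trend_sequence(values: Sequence[float], tol: float = 0.0) -> str:
    """Encode consecutive differences as U (up), D (down) or S (steady)."""
    symbols = []
    for prev, curr in zip(values, values[1:]):
        diff = curr - prev
        if diff > tol:
            symbols.append("U")
        elif diff < -tol:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


def frequent_patterns(trends: str, length: int, min_count: int) -> Dict[str, int]:
    """Count contiguous trend patterns of one length; keep those seen often enough."""
    counts = Counter(trends[i:i + length]
                     for i in range(len(trends) - length + 1))
    return {p: c for p, c in counts.items() if c >= min_count}


# Toy series (made-up values)
series = [1, 2, 3, 2, 3, 4, 3, 4, 5]
trends = trend_sequence(series)
patterns = frequent_patterns(trends, length=2, min_count=2)
```

For global patterns, the same encoding could be applied to each dependent series and the counts compared across series.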
De Bruyn, Alexandre; Harimalala, Mireille; Hoareau, Murielle; Ranomenjanahary, Sahondramalala; Reynaud, Bernard; Lefeuvre, Pierre; Lett, Jean-Michel
2015-06-01
Here, we describe for the first time the complete genome sequence of a new bipartite begomovirus in Madagascar isolated from the weed Asystasia gangetica (Acanthaceae), for which we propose the tentative name asystasia mosaic Madagascar virus (AMMGV). DNA-A and -B nucleotide sequences of AMMGV were only distantly related to known begomovirus sequence and shared highest nucleotide sequence identity of 72.9 % (DNA-A) and 66.9 % (DNA-B) with a recently described bipartite begomovirus infecting Asystasia sp. in West Africa. Phylogenetic analysis demonstrated that this novel virus from Madagascar belongs to a new lineage of Old World bipartite begomoviruses.
The complete sequence of the mitochondrial genome of the African Penguin (Spheniscus demersus).
Labuschagne, Christiaan; Kotzé, Antoinette; Grobler, J Paul; Dalton, Desiré L
2014-01-15
The complete mitochondrial genome of the African Penguin (Spheniscus demersus) was sequenced. The molecule was sequenced via next generation sequencing and primer walking. The genome is 17,346 bp in length. A comparison was conducted with the two other penguin mitochondrial genomes reported so far, namely those of the Little blue penguin (Eudyptula minor) and the Rockhopper penguin (Eudyptes chrysocome). This analysis made it possible to identify common penguin mitochondrial DNA characteristics. The S. demersus mtDNA genome is very similar, in both composition and length, to the E. chrysocome and E. minor genomes. The gene content of the African penguin mitochondrial genome is typical of vertebrates and all three penguin species have the standard gene order originally identified in the chicken. The control region for S. demersus is located between tRNA-Glu and tRNA-Phe and all three species of penguins contain two sets of similar repeats with varying copy numbers towards the 3' end of the control region, accounting for the size variance. This is the first report of the complete nucleotide sequence for the mitochondrial genome of the African penguin, S. demersus. These results can subsequently be used to provide information for penguin phylogenetic studies and insights into the evolution of genomes. © 2013 Elsevier B.V. All rights reserved.
The SUPERFAMILY database in 2004: additions and improvements.
Madera, Martin; Vogel, Christine; Kummerfeld, Sarah K; Chothia, Cyrus; Gough, Julian
2004-01-01
The SUPERFAMILY database provides structural assignments to protein sequences and a framework for analysis of the results. At the core of the database is a library of profile Hidden Markov Models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent an entire superfamily. We have applied the library to predicted proteins from all completely sequenced genomes (currently 154), the Swiss-Prot and TrEMBL databases and other sequence collections. Close to 60% of all proteins have at least one match, and one half of all residues are covered by assignments. All models and full results are available for download and online browsing at http://supfam.org. Users can study the distribution of their superfamily of interest across all completely sequenced genomes, investigate with which other superfamilies it combines and retrieve proteins in which it occurs. Alternatively, concentrating on a particular genome as a whole, it is possible first, to find out its superfamily composition, and secondly, to compare it with that of other genomes to detect superfamilies that are over- or under-represented. In addition, the webserver provides the following standard services: sequence search; keyword search for genomes, superfamilies and sequence identifiers; and multiple alignment of genomic, PDB and custom sequences.
NASA Astrophysics Data System (ADS)
Purba, H.; Musu, J. T.; Diria, S. A.; Permono, W.; Sadjati, O.; Sopandi, I.; Ruzi, F.
2018-03-01
Well logging data provide a great deal of geological information, and their trends resemble nonlinear or non-stationary signals. While well log data are being recorded, external factors can interfere with or reduce the signal resolution. A sensitive signal analysis is therefore required to improve the accuracy of log interpretation, which is important for determining sequence stratigraphy. Complete Ensemble Empirical Mode Decomposition (CEEMD) is a nonlinear and non-stationary signal analysis method that decomposes a complex signal into a series of intrinsic mode functions (IMFs). Gamma Ray and Spontaneous Potential well log parameters were decomposed into IMF-1 to IMF-10, and their combinations and correlations were used to identify physical meaning. The method identifies the stratigraphy and cycle sequence and provides an effective signal treatment method for sequence interfaces. It was applied to well logging data from BRK-30 and BRK-13. The results show that the combined pattern of IMF-5, IMF-6, and IMF-7 represents short-term and middle-term sedimentation, while IMF-9 and IMF-10 represent long-term sedimentation; these describe distal front and delta front facies, and inter-distributary mouth bar facies, respectively. Thus, CEEMD can clearly determine the interfaces between different sedimentary layers and better identify the cycles of stratigraphic base level.
Breaking the 1000-gene barrier for Mimivirus using ultra-deep genome and transcriptome sequencing.
Legendre, Matthieu; Santini, Sébastien; Rico, Alain; Abergel, Chantal; Claverie, Jean-Michel
2011-03-04
Mimivirus, a giant dsDNA virus infecting Acanthamoeba, is the prototype of the Mimiviridae family, the latest addition to the family of the nucleocytoplasmic large DNA viruses (NCLDVs). Its 1.2 Mb genome was initially predicted to encode 917 genes. A subsequent RNA-Seq analysis precisely mapped many transcript boundaries and identified 75 new genes. We now report a much deeper analysis using the SOLiD™ technology combining RNA-Seq of the Mimivirus transcriptome during the infectious cycle (202.4 million reads), and a complete genome re-sequencing (45.3 million reads). This study corrected the genome sequence and identified several single nucleotide polymorphisms. Our results also provided clear evidence of previously overlooked transcription units, including an important RNA polymerase subunit distantly related to Euryarchaea homologues. The total Mimivirus gene count is now 1018, 11% greater than the original annotation. This study highlights the huge progress brought about by ultra-deep sequencing for the comprehensive annotation of virus genomes, opening the door to a complete one-nucleotide resolution level description of their transcriptional activity, and to the realistic modeling of the viral genome expression at the ultimate molecular level. This work also illustrates the need to go beyond bioinformatics-only approaches for the annotation of short protein and non-coding genes in viral genomes.
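The 11% figure quoted above follows from the two gene counts given in the abstract. A minimal Python check is sketched below; the function name `percent_increase` is purely illustrative and is not taken from the study.

```python
def percent_increase(old: int, new: int) -> float:
    """Return the relative increase from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


original_genes = 917   # initial gene prediction quoted in the abstract
updated_genes = 1018   # gene count reported after ultra-deep sequencing

increase = percent_increase(original_genes, updated_genes)
print(f"{increase:.1f}% more genes than the original annotation")
```

Rounded to the nearest whole percent, this matches the value stated in the abstract.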
Complete Sequence and Analysis of Coconut Palm (Cocos nucifera) Mitochondrial Genome
Zhao, Yuhui; Zeng, Jingyao; Alamer, Ali; Alanazi, Ibrahim O.; Alawad, Abdullah O.; Al-Sadi, Abdullah M.; Hu, Songnian; Yu, Jun
2016-01-01
Coconut (Cocos nucifera L.), a member of the palm family (Arecaceae), is one of the most economically important crops in the tropics, serving as an important source of food, drink, fuel, medicine, and construction material. Here we report an assembly of the coconut (C. nucifera, Oman local Tall cultivar) mitochondrial (mt) genome based on next-generation sequencing data. This genome, 678,653 bp in length and 45.5% in GC content, encodes 72 proteins, 9 pseudogenes, 23 tRNAs, and 3 ribosomal RNAs. Within the assembly, we find that the chloroplast (cp) derived regions account for 5.07% of the total assembly length, including 13 proteins, 2 pseudogenes, and 11 tRNAs. The mt genome has a relatively large fraction of repeat content (17.26%), including both forward (tandem) and inverted (palindromic) repeats. Sequence variation analysis shows that the Ti/Tv ratio of the mt genome is lower than that of the nuclear genome and the neutral expectation. By combining public RNA-Seq data for coconut, we identify 734 RNA editing sites supported by at least two datasets. In summary, our data provide the second complete mt genome sequence in the family Arecaceae, essential for further investigations on mitochondrial biology of seed plants. PMID:27736909
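For readers unfamiliar with the Ti/Tv ratio mentioned above, the sketch below shows how a transition/transversion ratio can be computed from two aligned sequences. It is a generic illustration, not the authors' pipeline; the function name `ti_tv_ratio` and the toy sequences are assumptions made for the example.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ti_tv_ratio(ref: str, alt: str) -> float:
    """Ratio of transitions to transversions between two aligned sequences.

    Transitions are purine<->purine or pyrimidine<->pyrimidine changes;
    all other substitutions are transversions. Positions containing gaps
    or ambiguous bases are skipped.
    """
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for r, a in zip(ref.upper(), alt.upper()):
        if r == a or r not in "ACGT" or a not in "ACGT":
            continue  # identical site, gap, or ambiguous base
        if {r, a} <= PURINES or {r, a} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return transitions / transversions


# Toy alignment (hypothetical data): 2 transitions, 2 transversions.
print(ti_tv_ratio("AACCGGTT", "GATCGTTA"))
```

With only four bases there are twice as many possible transversions as transitions, so a substitution process that treats every change equally would give a ratio of about 0.5.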
Postel, Alexander; Schmeiser, Stefanie; Zimmermann, Bernd; Becher, Paul
2016-01-01
Molecular epidemiology has become an indispensable tool in the diagnosis of diseases and in tracing the infection routes of pathogens. Due to advances in conventional sequencing and the development of high throughput technologies, the field of sequence determination is in the process of being revolutionized. Platforms for sharing sequence information and providing standardized tools for phylogenetic analyses are becoming increasingly important. The database (DB) of the European Union (EU) and World Organisation for Animal Health (OIE) Reference Laboratory for classical swine fever offers one of the world’s largest semi-public virus-specific sequence collections combined with a module for phylogenetic analysis. The classical swine fever (CSF) DB (CSF-DB) became a valuable tool for supporting diagnosis and epidemiological investigations of this highly contagious disease in pigs with high socio-economic impacts worldwide. The DB has been re-designed and now allows for the storage and analysis of traditionally used, well established genomic regions and of larger genomic regions including complete viral genomes. We present an application example for the analysis of highly similar viral sequences obtained in an endemic disease situation and introduce the new geographic “CSF Maps” tool. The concept of this standardized and easy-to-use DB with an integrated genetic typing module is suited to serve as a blueprint for similar platforms for other human or animal viruses. PMID:27827988
A survey of tools for variant analysis of next-generation genome sequencing data
Pabinger, Stephan; Dander, Andreas; Fischer, Maria; Snajder, Rene; Sperk, Michael; Efremova, Mirjana; Krabichler, Birgit; Speicher, Michael R.; Zschocke, Johannes
2014-01-01
Recent advances in genome sequencing technologies provide unprecedented opportunities to characterize individual genomic landscapes and identify mutations relevant for diagnosis and therapy. Specifically, whole-exome sequencing using next-generation sequencing (NGS) technologies is gaining popularity in the human genetics community due to the moderate costs, manageable data amounts and straightforward interpretation of analysis results. While whole-exome and, in the near future, whole-genome sequencing are becoming commodities, data analysis still poses significant challenges and led to the development of a plethora of tools supporting specific parts of the analysis workflow or providing a complete solution. Here, we surveyed 205 tools for whole-genome/whole-exome sequencing data analysis supporting five distinct analytical steps: quality assessment, alignment, variant identification, variant annotation and visualization. We report an overview of the functionality, features and specific requirements of the individual tools. We then selected 32 programs for variant identification, variant annotation and visualization, which were subjected to hands-on evaluation using four data sets: one set of exome data from two patients with a rare disease for testing identification of germline mutations, two cancer data sets for testing variant callers for somatic mutations, copy number variations and structural variations, and one semi-synthetic data set for testing identification of copy number variations. Our comprehensive survey and evaluation of NGS tools provides a valuable guideline for human geneticists working on Mendelian disorders, complex diseases and cancers. PMID:23341494
Comprehensive analysis of orthologous protein domains using the HOPS database.
Storm, Christian E V; Sonnhammer, Erik L L
2003-10-01
One of the most reliable methods for protein function annotation is to transfer experimentally known functions from orthologous proteins in other organisms. Most methods for identifying orthologs operate on a subset of organisms with a completely sequenced genome, and treat proteins as single-domain units. However, it is well known that proteins are often made up of several independent domains, and there is a wealth of protein sequences from genomes that are not completely sequenced. A comprehensive set of protein domain families is found in the Pfam database. We wanted to apply orthology detection to Pfam families, but first some issues needed to be addressed. First, orthology detection becomes impractical and unreliable when too many species are included. Second, shorter domains contain less information. It is therefore important to assess the quality of the orthology assignment and avoid very short domains altogether. We present a database of orthologous protein domains in Pfam called HOPS: Hierarchical grouping of Orthologous and Paralogous Sequences. Orthology is inferred in a hierarchic system of phylogenetic subgroups using ortholog bootstrapping. To avoid the frequent errors stemming from horizontally transferred genes in bacteria, the analysis is presently limited to eukaryotic genes. The results are accessible in the graphical browser NIFAS, a Java tool originally developed for analyzing phylogenetic relations within Pfam families. The method was tested on a set of curated orthologs with experimentally verified function. In comparison to tree reconciliation with a complete species tree, our approach finds significantly more orthologs in the test set. Examples for investigating gene fusions and domain recombination using HOPS are given.
Application of resequencing to rice genomics, functional genomics and evolutionary analysis
2014-01-01
Rice is a model system used for crop genomics studies. The completion of the rice genome draft sequences in 2002 not only accelerated functional genome studies, but also initiated a new era of resequencing rice genomes. Based on the reference genome in rice, next-generation sequencing (NGS) using the high-throughput sequencing system can efficiently accomplish whole genome resequencing of various genetic populations and diverse germplasm resources. Resequencing technology has been effectively utilized in evolutionary analysis, rice genomics and functional genomics studies. This technique is beneficial for both bridging the knowledge gap between genotype and phenotype and facilitating molecular breeding via gene design in rice. Here, we also discuss the limitations, applications and future prospects of rice resequencing. PMID:25006357
Novel primers for complete mitochondrial cytochrome b gene sequencing in mammals
Naidu, Ashwin; Fitak, Robert R.; Munguia-Vega, Adrian; Culver, Melanie
2011-01-01
Sequence-based species identification relies on the extent and integrity of sequence data available in online databases such as GenBank. When identifying species from a sample of unknown origin, partial DNA sequences obtained from the sample are aligned against existing sequences in databases. When the sequence from the matching species is not present in the database, high-scoring alignments with closely related sequences might produce unreliable results on species identity. For species identification in mammals, the cytochrome b (cyt b) gene has been identified to be highly informative; thus, large amounts of reference sequence data from the cyt b gene are much needed. To enhance availability of cyt b gene sequence data on a large number of mammalian species in GenBank and other such publicly accessible online databases, we identified a primer pair for complete cyt b gene sequencing in mammals. Using this primer pair, we successfully PCR amplified and sequenced the complete cyt b gene from 40 of 44 mammalian species representing 10 orders of mammals. We submitted 40 complete, correctly annotated, cyt b protein coding sequences to GenBank. To our knowledge, this is the first single primer pair to amplify the complete cyt b gene in a broad range of mammalian species. This primer pair can be used for the addition of new cyt b gene sequences and to enhance data available on species represented in GenBank. The availability of novel and complete gene sequences as high-quality reference data can improve the reliability of sequence-based species identification.
The complete chloroplast genome of North American ginseng, Panax quinquefolius.
Han, Zeng-Jie; Li, Wei; Liu, Yuan; Gao, Li-Zhi
2016-09-01
We report the complete nucleotide sequence of the Panax quinquefolius chloroplast genome using next-generation sequencing technology. The genome size is 156 359 bp, including two inverted repeats (IRs) of 52 153 bp, separated by the large single-copy (LSC 86 184 bp) and small single-copy (SSC 18 081 bp) regions. This cp genome encodes 114 unigenes (80 protein-coding genes, four rRNA genes, and 30 tRNA genes), of which 18 are duplicated in the IR regions. Overall GC content of the genome is 38.08%. A phylogenomic analysis of the 10 complete chloroplast genomes from Araliaceae using Daucus carota from Apiaceae as outgroup showed that P. quinquefolius is closely related to the other two members of the genus Panax, P. ginseng and P. notoginseng.
Lu, You; Samac, Deborah A.; Glazebrook, Jane
2015-01-01
We report here the complete genome sequence of Clavibacter michiganensis subsp. insidiosus R1-1, isolated in Minnesota, USA. The R1-1 genome, generated by a de novo assembly of PacBio sequencing data, is the first complete genome sequence available for this subspecies. PMID:25953184
What can we learn about lyssavirus genomes using 454 sequencing?
Höper, Dirk; Finke, Stefan; Freuling, Conrad M; Hoffmann, Bernd; Beer, Martin
2012-01-01
The main task of individual project number four, "Whole genome sequencing, virus-host adaptation, and molecular epidemiological analyses of lyssaviruses", within the network "Lyssaviruses--a potential re-emerging public health threat" is to provide high quality complete genome sequences from lyssaviruses. These sequences are analysed in-depth with regard to the diversity of the viral populations as to both quasi-species and so-called defective interfering RNAs. Moreover, the sequence data will facilitate further epidemiological analyses, will provide insight into the evolution of lyssaviruses and will be the basis for the design of novel nucleic acid based diagnostics. The first results presented here indicate not only that high quality full-length lyssavirus genome sequences can be generated, but also that efficient analysis of the viral population becomes feasible.
Yip, Cyril C Y; Lo, Janice Y C; Sridhar, Siddharth; Lung, David C; Luk, Shik; Chan, Kwok-Hung; Chan, Jasper F W; Cheng, Vincent C C; Woo, Patrick C Y; Yuen, Kwok-Yung; Lau, Susanna K P
2017-05-16
A fatal case associated with enterovirus D68 (EV-D68) infection affecting a 10-year-old boy was reported in Hong Kong in 2014. To examine if a new strain has emerged in Hong Kong, we sequenced the partial genome of the EV-D68 strain identified from the fatal case and the complete VP1, and partial 5'UTR and 2C sequences of nine additional EV-D68 strains isolated from patients in Hong Kong. Sequence analysis indicated that a cluster of strains including the previously recognized A2 strains should belong to a separate clade, clade D, which is further divided into subclades D1 and D2. Among the 10 EV-D68 strains, 7 (including the fatal case) belonged to the previously described, newly emerged subclade B3, 2 belonged to subclade B1, and 1 belonged to subclade D1. Three EV-D68 strains, each from subclades B1, B3, and D1, were selected for complete genome sequencing and recombination analysis. While no evidence of recombination was noted among local strains, interclade recombination was identified in subclade D2 strains detected in mainland China in 2008 with VP2 acquired from clade A. This study supports the reclassification of subclade A2 into clade D1, and demonstrates interclade recombination between clades A and D2 in EV-D68 strains from China.
Phylogeographic Analysis of Mitochondrial DNA in Northern Asian Populations
Derenko, Miroslava; Malyarchuk, Boris; Grzybowski, Tomasz; Denisova, Galina; Dambueva, Irina; Perkova, Maria; Dorzhu, Choduraa; Luzina, Faina; Lee, Hong Kyu; Vanecek, Tomas; Villems, Richard; Zakharov, Ilia
2007-01-01
To elucidate the human colonization process of northern Asia and human dispersals to the Americas, a diverse subset of 71 mitochondrial DNA (mtDNA) lineages was chosen for complete genome sequencing from the collection of 1,432 control-region sequences sampled from 18 autochthonous populations of northern, central, eastern, and southwestern Asia. On the basis of complete mtDNA sequencing, we have revised the classification of haplogroups A, D2, G1, M7, and I; identified six new subhaplogroups (I4, N1e, G1c, M7d, M7e, and J1b2a); and fully characterized haplogroups N1a and G1b, which were previously described only by the first hypervariable segment (HVS1) sequencing and coding-region restriction-fragment–length polymorphism analysis. Our findings indicate that the southern Siberian mtDNA pool harbors several lineages associated with the Late Upper Paleolithic and/or early Neolithic dispersals from both eastern Asia and southwestern Asia/southern Caucasus. Moreover, the phylogeography of the D2 lineages suggests that southern Siberia is likely to be a geographical source for the last postglacial maximum spread of this subhaplogroup to northern Siberia and that the expansion of the D2b branch occurred in Beringia ∼7,000 years ago. In general, a detailed analysis of mtDNA gene pools of northern Asians provides the additional evidence to rule out the existence of a northern Asian route for the initial human colonization of Asia. PMID:17924343
Phylogeographic analysis of mitochondrial DNA in northern Asian populations.
Derenko, Miroslava; Malyarchuk, Boris; Grzybowski, Tomasz; Denisova, Galina; Dambueva, Irina; Perkova, Maria; Dorzhu, Choduraa; Luzina, Faina; Lee, Hong Kyu; Vanecek, Tomas; Villems, Richard; Zakharov, Ilia
2007-11-01
To elucidate the human colonization process of northern Asia and human dispersals to the Americas, a diverse subset of 71 mitochondrial DNA (mtDNA) lineages was chosen for complete genome sequencing from the collection of 1,432 control-region sequences sampled from 18 autochthonous populations of northern, central, eastern, and southwestern Asia. On the basis of complete mtDNA sequencing, we have revised the classification of haplogroups A, D2, G1, M7, and I; identified six new subhaplogroups (I4, N1e, G1c, M7d, M7e, and J1b2a); and fully characterized haplogroups N1a and G1b, which were previously described only by the first hypervariable segment (HVS1) sequencing and coding-region restriction-fragment-length polymorphism analysis. Our findings indicate that the southern Siberian mtDNA pool harbors several lineages associated with the Late Upper Paleolithic and/or early Neolithic dispersals from both eastern Asia and southwestern Asia/southern Caucasus. Moreover, the phylogeography of the D2 lineages suggests that southern Siberia is likely to be a geographical source for the last postglacial maximum spread of this subhaplogroup to northern Siberia and that the expansion of the D2b branch occurred in Beringia ~7,000 years ago. In general, a detailed analysis of mtDNA gene pools of northern Asians provides the additional evidence to rule out the existence of a northern Asian route for the initial human colonization of Asia.
STINGRAY: system for integrated genomic resources and analysis.
Wagner, Glauber; Jardim, Rodrigo; Tschoeke, Diogo A; Loureiro, Daniel R; Ocaña, Kary A C S; Ribeiro, Antonio C B; Emmel, Vanessa E; Probst, Christian M; Pitaluga, André N; Grisard, Edmundo C; Cavalcanti, Maria C; Campos, Maria L M; Mattoso, Marta; Dávila, Alberto M R
2014-03-07
The STINGRAY system has been conceived to ease the tasks of integrating, analyzing, annotating and presenting genomic and expression data from Sanger and Next Generation Sequencing (NGS) platforms. STINGRAY includes: (a) a complete and integrated workflow (more than 20 bioinformatics tools) ranging from functional annotation to phylogeny; (b) a MySQL database schema, suitable for data integration and user access control; and (c) a user-friendly graphical web-based interface that makes the system intuitive, facilitating the tasks of data analysis and annotation. STINGRAY proved to be an easy-to-use and complete system for analyzing sequencing data. While both Sanger and NGS platforms are supported, the system could be faster using Sanger data, since the large NGS datasets could potentially slow down the MySQL database usage. STINGRAY is available at http://stingray.biowebdb.org and the open source code at http://sourceforge.net/projects/stingray-biowebdb/.
Chen, Liang
2017-06-10
Bacillus velezensis LM2303 is a biocontrol strain with a broad inhibitory spectrum against plant pathogens, isolated from the dung of wild yak inhabiting the Qinghai-Tibet Plateau, China. Here we present its complete genome sequence, which consists of a single, circular chromosome of 3,989,393 bp with a 46.68% G+C content. Genome analysis revealed genes encoding specialized functions for the biosynthesis of antifungal metabolites and antibacterial metabolites, the promotion of plant growth, the alleviation of oxidative stress and nutrient utilization. The biosynthesis of antimicrobial metabolites in strain LM2303 was confirmed by biochemical analysis, while its plant growth promoting traits were confirmed by inoculation tests. Our results will establish a better foundation for further studies and biocontrol application of B. velezensis LM2303. Copyright © 2017 Elsevier B.V. All rights reserved.
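G+C content, as reported for the chromosome above, is simply the percentage of G and C bases among the unambiguous bases of a sequence. The sketch below illustrates the calculation on a toy string; the function name `gc_content` and the example sequences are hypothetical and unrelated to the LM2303 genome itself.

```python
def gc_content(seq: str) -> float:
    """Return the G+C content of a nucleotide sequence, in percent.

    Only unambiguous bases (A, C, G, T) are counted, so N and other
    ambiguity codes do not affect the result.
    """
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(b in "GC" for b in bases)
    return 100.0 * gc / len(bases)


# Toy sequence (hypothetical): 4 of 6 bases are G or C.
print(f"{gc_content('ATGCGC'):.2f}%")
```

For a real genome, the same function could be applied to the concatenated sequence read from a FASTA file.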
STINGRAY: system for integrated genomic resources and analysis
2014-01-01
Background: The STINGRAY system has been conceived to ease the tasks of integrating, analyzing, annotating and presenting genomic and expression data from Sanger and Next Generation Sequencing (NGS) platforms. Findings: STINGRAY includes: (a) a complete and integrated workflow (more than 20 bioinformatics tools) ranging from functional annotation to phylogeny; (b) a MySQL database schema, suitable for data integration and user access control; and (c) a user-friendly graphical web-based interface that makes the system intuitive, facilitating the tasks of data analysis and annotation. Conclusion: STINGRAY proved to be an easy-to-use and complete system for analyzing sequencing data. While both Sanger and NGS platforms are supported, the system could be faster using Sanger data, since the large NGS datasets could potentially slow down the MySQL database usage. STINGRAY is available at http://stingray.biowebdb.org and the open source code at http://sourceforge.net/projects/stingray-biowebdb/. PMID:24606808
Estrada-Gómez, Sebastian; Vargas-Muñoz, Leidy Johana; Saldarriaga-Córdoba, Mónica; Cifuentes, Yeimy; Perafan, Carlos
2017-04-01
Theraphosidae spider venoms are well known to contain a complex mixture of protein and non-protein compounds. The objective of this study was to report and identify different proteins translated from the venom gland DNA information of the recently described Theraphosidae spider Pamphobeteus verdolaga. Using a venom gland transcriptomic analysis, we report the first complete sequences of seven different proteins of P. verdolaga. Protein analysis indicates the presence of different proteins in the venom of this new spider, some of them uncommon in the Theraphosidae family. MS/MS analysis of P. verdolaga showed different fragments matching sphingomyelinases (sicaritoxin), barytoxins, hexatoxins, latroinsectotoxins, and linear (zadotoxins) peptides. Only four of the MS/MS fragments showed 100% sequence similarity with one of the transcribed proteins. Transcriptomic analysis showed the presence of different groups of proteins such as phospholipases, hyaluronidases, and inhibitory cysteine knot (ICK) peptides, among others. The three databases of protein domains used in this study (Pfam, SMART and CDD) agreed in identifying a unique conserved protein domain for only four of the translated proteins. Those proteins matched EF-hand proteins, cysteine rich secretory proteins, jingzhaotoxins, theraphotoxins and hexatoxins from different Mygalomorphae spiders belonging to the families Theraphosidae, Barychelidae and Hexathelidae. None of the analyzed sequences showed complete (100%) similarity. Copyright © 2017 Elsevier Ltd. All rights reserved.
Variation of 45S rDNA intergenic spacers in Arabidopsis thaliana.
Havlová, Kateřina; Dvořáčková, Martina; Peiro, Ramon; Abia, David; Mozgová, Iva; Vansáčová, Lenka; Gutierrez, Crisanto; Fajkus, Jiří
2016-11-01
Approximately seven hundred 45S rRNA genes (rDNA) in the Arabidopsis thaliana genome are organised in two 4 Mbp-long arrays of tandem repeats arranged in head-to-tail fashion separated by an intergenic spacer (IGS). These arrays make up 5 % of the A. thaliana genome. IGS are rapidly evolving sequences, and frequent rearrangements inside the rDNA loci have generated considerable interspecific and even intra-individual variability, which makes it possible to distinguish among otherwise highly conserved rRNA genes. The IGS has not been comprehensively described despite its potential importance in regulation of rDNA transcription and replication. Here we describe the detailed sequence variation in the complete IGS of A. thaliana WT plants and provide the reference/consensus IGS sequence, as well as genomic DNA analysis. We further investigate mutants dysfunctional in chromatin assembly factor-1 (CAF-1) (fas1 and fas2 mutants), which are known to have a reduced number of rDNA copies, and plant lines with restored CAF-1 function (segregated from a fas1xfas2 genetic background) showing major rDNA rearrangements. The systematic rDNA loss in CAF-1 mutants leads to decreased variability of the IGS and to the occurrence of distinct IGS variants. We present for the first time a comprehensive and representative set of complete IGS sequences, obtained by conventional cloning and by Pacific Biosciences sequencing. Our data expand the knowledge of the A. thaliana IGS sequence arrangement and variability, which has not been available in full and in detail until now. This is also the first study combining IGS sequencing data with RFLP analysis of genomic DNA.
Comprehensive comparative analysis of kinesins in photosynthetic eukaryotes
Richardson, Dale N; Simmons, Mark P; Reddy, Anireddy SN
2006-01-01
Background: Kinesins, a superfamily of molecular motors, use microtubules as tracks and transport diverse cellular cargoes. All kinesins contain a highly conserved ~350 amino acid motor domain. Previous analysis of the completed genome sequence of one flowering plant (Arabidopsis) has resulted in identification of 61 kinesins. The recent completion of genome sequencing of several photosynthetic and non-photosynthetic eukaryotes that belong to divergent lineages offers a unique opportunity to conduct a comprehensive comparative analysis of kinesins in plant and non-plant systems and infer their evolutionary relationships. Results: We used the kinesin motor domain to identify kinesins in the completed genome sequences of 19 species, including 13 newly sequenced genomes. Among the newly analyzed genomes, six represent photosynthetic eukaryotes. A total of 529 kinesins was used to perform comprehensive analysis of kinesins and to construct gene trees using the Bayesian and parsimony approaches. The previously recognized 14 families of kinesins are resolved as distinct lineages in our inferred gene tree. At least three of the 14 kinesin families are not represented in flowering plants. Chlamydomonas, a green alga that is part of the lineage that includes land plants, has at least nine of the 14 known kinesin families. Seven of ten families present in flowering plants are represented in Chlamydomonas, indicating that these families were retained in both the flowering-plant and green algae lineages. Conclusion: The increase in the number of kinesins in flowering plants is due to vast expansion of the Kinesin-14 and Kinesin-7 families. The Kinesin-14 family, which typically contains a C-terminal motor, has many plant kinesins that have the motor domain at the N terminus, in the middle, or the C terminus. Several domains in kinesins are present exclusively either in plant or animal lineages. Addition of novel domains to kinesins in lineage-specific groups contributed to the functional diversification of kinesins. Results from our gene-tree analyses indicate that there was tremendous lineage-specific duplication and diversification of kinesins in eukaryotes. Since the functions of only a few plant kinesins are reported in the literature, this comprehensive comparative analysis will be useful in designing functional studies with photosynthetic eukaryotes. PMID:16448571
Salem, Nidá M; Golino, Deborah A; Falk, Bryce W; Rowhani, Adib
2008-01-01
Three double-stranded (ds) RNAs were detected in Rosa multiflora plants showing rose spring dwarf (RSD) symptoms. Northern blot analysis revealed three dsRNAs in preparations of both dsRNA and total RNA from R. multiflora plants. The complete sequences of the dsRNAs (referred to as dsRNA 1, dsRNA 2 and dsRNA 3) were determined based on a combination of shotgun cloning of dsRNA cDNAs and reverse transcription-polymerase chain reaction (RT-PCR). The largest dsRNA (dsRNA 1) was 1,762 bp long with a single open reading frame (ORF) that encoded a putative polypeptide containing 479 amino acid residues with a molecular mass of 55.9 kDa. This polypeptide contains amino acid sequence motifs conserved in the RNA-dependent RNA polymerases (RdRp) of members of the family Partitiviridae. Both dsRNA 2 (1,475 bp) and dsRNA 3 (1,384 bp) contained single ORFs, encoding putative proteins of unknown function. The 5' untranslated regions (UTR) of all three segments shared regions of high sequence homology. Phylogenetic analysis using the RdRp sequences of the various partitiviruses revealed that the new sequences would constitute the genome of a virus in the family Partitiviridae. This virus would cluster with Fragaria chiloensis cryptic virus and Raphanus sativus cryptic virus 2. We suggest that the three dsRNA segments constitute the genome of a novel cryptic virus infecting roses; we propose the name Rosa multiflora cryptic virus (RMCV). Detection primers were developed and used for RT-PCR detection of RMCV in rose plants.
Complete Amino Acid Sequence of a Copper/Zinc-Superoxide Dismutase from Ginger Rhizome.
Nishiyama, Yuki; Fukamizo, Tamo; Yoneda, Kazunari; Araki, Tomohiro
2017-04-01
Superoxide dismutase (SOD) is an antioxidant enzyme protecting cells from oxidative stress. Ginger (Zingiber officinale) is known for its antioxidant properties, however, there are no data on SODs from ginger rhizomes. In this study, we purified SOD from the rhizome of Z. officinale (Zo-SOD) and determined its complete amino acid sequence using N terminal sequencing, amino acid analysis, and de novo sequencing by tandem mass spectrometry. Zo-SOD consists of 151 amino acids with two signature Cu/Zn-SOD motifs and has high similarity to other plant Cu/Zn-SODs. Multiple sequence alignment showed that Cu/Zn-binding residues and cysteines forming a disulfide bond, which are highly conserved in Cu/Zn-SODs, are also present in Zo-SOD. Phylogenetic analysis revealed that plant Cu/Zn-SODs clustered into distinct chloroplastic, cytoplasmic, and intermediate groups. Among them, only chloroplastic enzymes carried amino acid substitutions in the region functionally important for enzymatic activity, suggesting that chloroplastic SODs may have a function distinct from those of SODs localized in other subcellular compartments. The nucleotide sequence of the Zo-SOD coding region was obtained by reverse-translation, and the gene was synthesized, cloned, and expressed. The recombinant Zo-SOD demonstrated pH stability in the range of 5-10, which is similar to other reported Cu/Zn-SODs, and thermal stability in the range of 10-60 °C, which is higher than that for most plant Cu/Zn-SODs but lower compared to the enzyme from a Z. officinale relative Curcuma aromatica.
Kemper, Jenny M; Naylor, Gavin J P
2016-11-01
We present the complete mitochondrial genome sequence (16 555 bp) of the Philippines spurdog, Squalus montalbani, currently listed as Vulnerable due to population declines and fishing pressures. A phylogenetic analysis was carried out on S. montalbani and representative shark mitogenomes. Squalus montalbani was placed within the Squaliformes as a sister taxon to Squalus acanthias and Cirrhigaleus australis.
USDA-ARS's Scientific Manuscript database
Salmonella enterica subsp. enterica are a versatile group of bacteria with a wide range of variation in virulence potential. Complete S. enterica genome sequences available to date are primarily of strains isolated from humans or of serotypes that commonly cause human disease. To facilitate genomic ...
Complete genome sequence of a novel avian paramyxovirus isolated from wild birds in South Korea.
Jeong, Jipseol; Kim, Youngsik; An, Injung; Wang, Seung-Jun; Kim, Yongkwan; Lee, Hyun-Jeong; Choi, Kang-Seuk; Im, Se-Pyeong; Min, Wongi; Oem, Jae-Ku; Jheong, Weonhwa
2018-01-01
A novel avian paramyxovirus (APMV), Cheonsu1510, was isolated from wild bird feces in South Korea and serologically and genetically characterized. In hemagglutination inhibition tests, antiserum against Cheonsu1510 showed low reactivity with other APMVs and vice versa. The complete genome of Cheonsu1510 comprised 15,408 nucleotides, contained six open reading frames (3'-N-P-M-F-HN-L-5'), and showed low sequence identity to other APMVs (< 63%) and a unique genomic composition. Phylogenetic analysis revealed that Cheonsu1510 was related to but distinct from APMV-1, -9, and -15. These results suggest that Cheonsu1510 represents a new APMV serotype, APMV-17.
Fumoto, Masaki; Miyazaki, Satoru; Sugawara, Hideaki
2002-01-01
Genome Information Broker (GIB) is a powerful tool for the study of comparative genomics. GIB allows users to retrieve and display partial and/or whole genome sequences together with the relevant biological annotation. GIB has accumulated all the completed microbial genomes and has recently been expanded to include Arabidopsis thaliana genome data from DDBJ/EMBL/GenBank. In the near future, hundreds of genome sequences will be determined. To handle such large volumes of data, we have enhanced the GIB architecture by using XML, CORBA and distributed RDBs. We introduce the new GIB here. GIB is freely accessible at http://gib.genes.nig.ac.jp/. PMID:11752256
The nucleotide sequence and genome organization of Plasmopara halstedii virus.
Heller-Dohmen, Marion; Göpfert, Jens C; Pfannstiel, Jens; Spring, Otmar
2011-03-17
Only very few viruses of Oomycetes have been studied in detail. Isometric virions were found in different isolates of the oomycete Plasmopara halstedii, the downy mildew pathogen of sunflower. However, complete nucleotide sequences and data on the genome organization were lacking. Viral RNA of different P. halstedii isolates was subjected to nucleotide sequencing and analysis of the viral genome. The N-terminal sequence of the viral coat protein was determined using Top-Down MALDI-TOF analysis. The complete nucleotide sequences of both single-stranded RNA segments (RNA1 and RNA2) were established. RNA1 consisted of 2793 nucleotides (nt) exclusive of its 3' poly(A) tract and a single open-reading frame (ORF1) of 2745 nt. ORF1 was framed by a 5' untranslated region (5' UTR) of 18 nt and a 3' untranslated region (3' UTR) of 30 nt. ORF1 contained motifs of RNA-dependent RNA polymerases (RdRp) and showed similarities to RdRp of Sclerophthora macrospora virus A (SmV A) and viruses within the Nodaviridae family. RNA2 consisted of 1526 nt exclusive of its 3' poly(A) tract and a second ORF (ORF2) of 1128 nt. ORF2 coded for the single viral coat protein (CP) and was framed by a 5' UTR of 164 nt and a 3' UTR of 234 nt. The deduced amino acid sequence of ORF2 was verified by nano-LC-ESI-MS/MS experiments. Top-Down MALDI-TOF analysis revealed the N-terminal sequence of the CP. The N-terminal sequence represented a region within ORF2, suggesting proteolytic processing of the CP in vivo. The CP showed similarities to CP of SmV A and viruses within the Tombusviridae family. Fragments of RNA1 (ca. 1.9 kb) and RNA2 (ca. 1.4 kb) were used to analyze the nucleotide sequence variation of virions in different P. halstedii isolates. Viral sequence variation was 0.3% or less regardless of the host's pathotype, geographical origin and sensitivity towards the fungicide metalaxyl. The results showed the presence of a single and new virus type in different P. halstedii isolates. Insignificant viral sequence variation indicated that the virus did not account for differences in pathogenicity of the oomycete P. halstedii.
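The segment lengths reported in this abstract can be cross-checked with simple arithmetic: for each RNA, the 5' UTR, ORF and 3' UTR should add up to the stated total (poly(A) tract excluded). The sketch below is only an illustrative consistency check; the dictionary layout and function names are assumptions and are not part of the original study.

```python
# Lengths in nucleotides as stated in the abstract (3' poly(A) tract excluded).
# The data structure and helper names are illustrative, not from the paper.
segments = {
    "RNA1": {"utr5": 18, "orf": 2745, "utr3": 30, "total": 2793},
    "RNA2": {"utr5": 164, "orf": 1128, "utr3": 234, "total": 1526},
}

def is_consistent(seg):
    """True if 5' UTR + ORF + 3' UTR equals the reported segment length."""
    return seg["utr5"] + seg["orf"] + seg["utr3"] == seg["total"]

def codon_count(seg):
    """Number of codons in the ORF (stop codon included)."""
    if seg["orf"] % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    return seg["orf"] // 3

for name, seg in segments.items():
    print(name, is_consistent(seg), codon_count(seg))
```

Both segments pass the check, and both ORF lengths are multiples of three, as expected for intact reading frames.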
Reads2Type: a web application for rapid microbial taxonomy identification.
Saputra, Dhany; Rasmussen, Simon; Larsen, Mette V; Haddad, Nizar; Sperotto, Maria Maddalena; Aarestrup, Frank M; Lund, Ole; Sicheritz-Pontén, Thomas
2015-11-25
Identification of bacteria may be based on sequencing and molecular analysis of a specific locus such as 16S rRNA, or a set of loci such as in multilocus sequence typing. In the near future, healthcare institutions and routine diagnostic microbiology laboratories may need to sequence the entire genome of microbial isolates. Therefore we have developed Reads2Type, a web-based tool for taxonomy identification based on whole bacterial genome sequence data. Raw sequencing data provided by the user are mapped against a set of marker probes derived from currently available complete bacterial genomes. Using a dataset of 1003 whole-genome-sequenced bacteria from various sequencing platforms, Reads2Type was able to identify the species with 99.5 % accuracy within minutes. In comparison with other tools, Reads2Type offers the advantage of not needing to transfer sequencing files, as the entire computational analysis is done on the computer of whoever uses the web application. This also prevents data privacy issues from arising. The Reads2Type tool is available at http://www.cbs.dtu.dk/~dhany/reads2type.html.
Microsatellite analysis in the genome of Acanthaceae: An in silico approach.
Kaliswamy, Priyadharsini; Vellingiri, Srividhya; Nathan, Bharathi; Selvaraj, Saravanakumar
2015-01-01
Acanthaceae is one of the advanced and specialized families with conventionally used medicinal plants. Simple sequence repeats (SSRs) play a major role as molecular markers for genome analysis and plant breeding. The microsatellites existing in the complete genome sequences would help to attain a direct role in the genome organization, recombination, gene regulation, quantitative genetic variation, and evolution of genes. The current study reports the frequency of microsatellites and appropriate markers for the Acanthaceae family genome sequences. The whole nucleotide sequences of Acanthaceae species were obtained from National Center for Biotechnology Information database and screened for the presence of SSRs. SSR Locator tool was used to predict the microsatellites and inbuilt Primer3 module was used for primer designing. Totally 110 repeats from 108 sequences of Acanthaceae family plant genomes were identified, and the occurrence of dinucleotide repeats was found to be abundant in the genome sequences. The essential amino acid isoleucine was found rich in all the sequences. We also designed the SSR-based primers/markers for 59 sequences of this family that contains microsatellite repeats in their genome. The identified microsatellites and primers might be useful for breeding and genetic studies of plants that belong to Acanthaceae family in the future.
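The study above used the SSR Locator tool; the sketch below is not that tool but a minimal regular-expression scan showing the general idea of detecting perfect microsatellites. The minimum repeat counts per motif length are hypothetical thresholds chosen for illustration, and the function name is an assumption.

```python
import re

def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) tuples for perfect SSRs in seq.

    min_repeats maps motif length -> minimum number of tandem copies.
    The default thresholds are illustrative assumptions only.
    """
    if min_repeats is None:
        min_repeats = {1: 10, 2: 6, 3: 5, 4: 4}
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are just a shorter unit repeated (e.g. "AA", "ATAT").
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits

print(find_ssrs("GGC" + "AT" * 7 + "CCG"))
```

A production tool would also handle imperfect and compound repeats and design flanking primers, which this sketch does not attempt.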
Gautum, K K; Raj, R; Kumar, S; Raj, S K; Roy, R K; Katiyar, R
2014-01-01
The complete RNA3 genome of Cucumber mosaic virus (CMV) was amplified by RT-PCR from three infected gerbera (Gerbera jamesonii) leaf samples exhibiting severe chlorotic mosaic and flower deformation symptoms. The amplicons obtained were cloned, sequenced, and deposited in GenBank under the accessions JN692495, JX913531 (from cv. Zingaro) and JX888093 (from cv. Silvester). These sequences shared 98-99 % identity with each other and with a strain of CMV-Banana reported from India, and 90-95 % identity with various strains of CMV reported worldwide. Phylogenetic analysis revealed their closest affinity with the CMV-Banana strain, and close relationships with several other strains of CMV of subgroup IB. This study provides evidence of subgroup IB CMV causing severe chlorosis and flower deformation in two cultivars (Zingaro and Silvester) of G. jamesonii in India.
Fu, Xiao-Zhe; Shi, Cun-Bin; Li, Ning-Qiu; Pan, Hou-Jun; Chang, Ou-Qin; Wu, Shu-Qin
2007-09-01
The major capsid protein (MCP) gene of lymphocystis disease virus isolated from Rachycentron canadum (LCDV-rc) was amplified and analysed. A 457 bp DNA core fragment was amplified with degenerate primers designed according to the conserved sequences of the MCP gene of iridoviruses, then the flanking sequences adjacent to the core region were amplified by inverse PCR, and the complete sequence was obtained by combining all of them. The open reading frame of the gene is 1380 bp in length, encoding a putative protein of 459 aa with a molecular weight of 51.12 kD and a pI of 6.87. A phylogenetic tree comparing the MCP amino acid sequences of iridoviruses indicated that LCDV-rc is most closely related to the other lymphocystis viruses, which together constitute a single branch. Accordingly, LCDV-rc is identified as a Lymphocystivirus.
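The reported ORF length (1380 bp) and protein length (459 aa) are mutually consistent if the ORF length includes the stop codon. The helper below is a small illustrative check; the function name and the assumption that the stop codon is counted in the ORF length are ours, not the authors'.

```python
def protein_length(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an ORF of orf_bp nucleotides.

    Assumes (for illustration) that the stated ORF length includes
    the terminal stop codon unless includes_stop is False.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons

# 1380 bp -> 460 codons -> 459 residues plus one stop codon
print(protein_length(1380))
```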
Bhore, Subhash J; Kassim, Amelia; Loh, Chye Ying; Shah, Farida H
2010-01-01
It is well known that the nutritional quality of the American oil-palm (Elaeis oleifera) mesocarp oil is superior to that of African oil-palm (Elaeis guineensis Jacq. Tenera) mesocarp oil. Therefore, it is important to identify the genetic features underlying its superior value. This could be achieved through genome sequencing of the oil-palm. However, the genome sequence is not available in the public domain due to commercial secrecy. Hence, we constructed a cDNA library and generated expressed sequence tags (3,205) from the mesocarp tissue of the American oil-palm. We continued to annotate each of these cDNAs after submitting them to GenBank/DDBJ/EMBL. A rough analysis turned our attention to the cDNA encoding the beta-carotene hydroxylase (Chyb) enzyme. We then completed the full sequencing of both strands of the cDNA clone using M13 forward and reverse primers. The full nucleotide and protein sequences were further analyzed and annotated using various bioinformatics tools. The analysis showed the presence of a fatty acid hydroxylase superfamily domain in the protein sequence. Multiple sequence alignment of selected Chyb amino acid sequences from other plant species and algal members with E. oleifera Chyb using ClustalW, together with phylogenetic analysis, suggests that Chyb from the monocotyledonous plant species Lilium hybrid, Crocus sativus and Zea mays are the most closely related evolutionarily to E. oleifera Chyb. This study reports the annotation of E. oleifera Chyb. Abbreviations ESTs - expressed sequence tags, EoChyb - Elaeis oleifera beta-carotene hydroxylase, MC - main cluster PMID:21364789
Takeda, Jun-ichi; Suzuki, Yutaka; Nakao, Mitsuteru; Barrero, Roberto A.; Koyanagi, Kanako O.; Jin, Lihua; Motono, Chie; Hata, Hiroko; Isogai, Takao; Nagai, Keiichi; Otsuki, Tetsuji; Kuryshev, Vladimir; Shionyu, Masafumi; Yura, Kei; Go, Mitiko; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Wiemann, Stefan; Nomura, Nobuo; Sugano, Sumio; Gojobori, Takashi; Imanishi, Tadashi
2006-01-01
We report the first genome-wide identification and characterization of alternative splicing in human gene transcripts based on analysis of the full-length cDNAs. Applying both manual and computational analyses for 56 419 completely sequenced and precisely annotated full-length cDNAs selected for the H-Invitational human transcriptome annotation meetings, we identified 6877 alternative splicing genes with 18 297 different alternative splicing variants. A total of 37 670 exons were involved in these alternative splicing events. The encoded protein sequences were affected in 6005 of the 6877 genes. Notably, alternative splicing affected protein motifs in 3015 genes, subcellular localizations in 2982 genes and transmembrane domains in 1348 genes. We also identified interesting patterns of alternative splicing, in which two distinct genes seemed to be bridged, nested or having overlapping protein coding sequences (CDSs) of different reading frames (multiple CDS). In these cases, completely unrelated proteins are encoded by a single locus. Genome-wide annotations of alternative splicing, relying on full-length cDNAs, should lay firm groundwork for exploring in detail the diversification of protein function, which is mediated by the fast expanding universe of alternative splicing variants. PMID:16914452
Novosel, D; Tuboly, T; Csagola, A; Lorincz, M; Cubric-Curik, V; Jungic, A; Curik, I; Segalés, J; Cortey, M; Lipej, Z
2014-04-26
Porcine circovirus type 2 (PCV2) causes some of the most significant economic losses in pig production. Several multisystemic syndromes have been attributed to PCV2 infection, which are known as PCV2-associated diseases (PCVDs). This study investigated the origin and evolution of PCV2 sequences in domestic pigs and wild boars affected by PCVDs in Croatia. Viral sequences were recovered from three wild boars diagnosed with PCV2-systemic disease (PCV2-SD), 63 fetuses positive for PCV2 DNA as determined by PCR, 14 domestic pigs affected with PCV2-SD (displaying severe interstitial nephritis) and five domestic pigs with proliferative and necrotising pneumonia. Seventeen complete PCV2 genomes were recovered. Phylogenetic and evolutionary analyses based on median-joining phylogenetic networks, amino acid alignments and principal coordinate analysis were performed using complete genomes, as well as complete and partial ORF sequences for ORF1 and ORF2. Two of the 17 PCV2 sequences belonged to PCV2a, 14 to PCV2b and one was unclustered. PCV2b was the predominant genotype in Croatia and has been linked to international trade as a route of introduction. Correlation between particular viral strains with PCVDs is lacking.
Comparative Genomics of the Balsaminaceae Sister Genera Hydrocera triflora and Impatiens pinfanensis
Li, Zhi-Zhong; Saina, Josphat K.; Gichira, Andrew W.; Kyalo, Cornelius M.; Wang, Qing-Feng
2018-01-01
The family Balsaminaceae, which consists of the economically important genus Impatiens and the monotypic genus Hydrocera, lacks a reported or published complete chloroplast genome sequence. Therefore, chloroplast genome sequences of the two sister genera are significant to give insight into the phylogenetic position and understanding the evolution of the Balsaminaceae family among the Ericales. In this study, complete chloroplast (cp) genomes of Impatiens pinfanensis and Hydrocera triflora were characterized and assembled using a high-throughput sequencing method. The complete cp genomes were found to possess the typical quadripartite structure of land plants chloroplast genomes with double-stranded molecules of 154,189 bp (Impatiens pinfanensis) and 152,238 bp (Hydrocera triflora) in length. A total of 115 unique genes were identified in both genomes, of which 80 are protein-coding genes, 31 are distinct transfer RNA (tRNA) and four distinct ribosomal RNA (rRNA). Thirty codons, of which 29 had A/T ending codons, revealed relative synonymous codon usage values of >1, whereas those with G/C ending codons displayed values of <1. The simple sequence repeats comprise mostly the mononucleotide repeats A/T in all examined cp genomes. Phylogenetic analysis based on 51 common protein-coding genes indicated that the Balsaminaceae family formed a lineage with Ebenaceae together with all the other Ericales. PMID:29360746
Lu, You; Samac, Deborah A; Glazebrook, Jane; Ishimaru, Carol A
2015-05-07
We report here the complete genome sequence of Clavibacter michiganensis subsp. insidiosus R1-1, isolated in Minnesota, USA. The R1-1 genome, generated by a de novo assembly of PacBio sequencing data, is the first complete genome sequence available for this subspecies. Copyright © 2015 Lu et al.
First Complete Squash leaf curl China virus Genomic Segment DNA-A Sequence from East Timor
Maina, Solomon; Edwards, Owain R.; de Almeida, Luis; Ximenes, Abel
2017-01-01
ABSTRACT We present here the first complete Squash leaf curl China virus (SLCCV) genomic segment DNA-A sequence from East Timor. It was isolated from a pumpkin plant. When compared with 15 complete SLCCV DNA-A genome sequences from other world regions, it most resembled the Malaysian isolate MC1 sequence. PMID:28619789
Sequence Search and Comparative Genomic Analysis of SUMO-Activating Enzymes Using CoGe.
Carretero-Paulet, Lorenzo; Albert, Victor A
2016-01-01
The growing number of genome sequences completed during the last few years has made necessary the development of bioinformatics tools for the easy access and retrieval of sequence data, as well as for downstream comparative genomic analyses. Some of these are implemented as online platforms that integrate genomic data produced by different genome sequencing initiatives with data mining tools as well as various comparative genomic and evolutionary analysis possibilities. Here, we use the online comparative genomics platform CoGe (http://www.genomevolution.org/coge/) (Lyons and Freeling. Plant J 53:661-673, 2008; Tang and Lyons. Front Plant Sci 3:172, 2012) (1) to retrieve the entire complement of orthologous and paralogous genes belonging to the SUMO-Activating Enzymes 1 (SAE1) gene family from a set of species representative of the Brassicaceae plant eudicot family with genomes fully sequenced, and (2) to investigate the history, timing, and molecular mechanisms of the gene duplications driving the evolutionary expansion and functional diversification of the SAE1 family in Brassicaceae.
Genome features of moderately halophilic polyhydroxyalkanoate-producing Yangia sp. CCB-MM3.
Lau, Nyok-Sean; Sam, Ka-Kei; Amirul, Abdullah Al-Ashraf
2017-01-01
Yangia sp. CCB-MM3 was one of several halophilic bacteria isolated from soil sediment in the estuarine Matang Mangrove, Malaysia. So far, no member of the genus Yangia, a member of the Rhodobacteraceae family, has been reported sequenced. In the current study, we present the first complete genome sequence of Yangia sp. strain CCB-MM3. The genome includes two chromosomes and five plasmids with a total length of 5,522,061 bp and an average GC content of 65%. Since a different strain of Yangia sp. (ND199) was reported to produce a polyhydroxyalkanoate copolymer, the ability for this production was tested in vitro and confirmed for strain CCB-MM3. Analysis of its genome sequence confirmed the presence of a pathway for production of propionyl-CoA and a gene cluster for PHA production in the sequenced strain. The genome sequence described will be a useful resource for understanding the physiology and metabolic potential of Yangia as well as for comparative genomic analysis with other Rhodobacteraceae.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peters, J.; Peters, M.; Lottspeich, F.
1987-11-01
The complete nucleotide sequence of the gene encoding the surface (hexagonally packed intermediate (HPI))-layer polypeptide of Deinococcus radiodurans Sark was determined and found to encode a polypeptide of 1036 amino acids. Amino acid sequence analysis of about 30% of the residues revealed that the mature polypeptide consists of at least 978 amino acids. The N terminus was blocked to Edman degradation. The results of proteolytic modification of the HPI layer in situ and M_r estimations of the HPI polypeptide expressed in Escherichia coli indicated that there is a leader sequence. The N-terminal region contained a very high percentage (29%) of threonine and serine, including a cluster of nine consecutive serine or threonine residues, whereas a stretch near the C terminus was extremely rich in aromatic amino acids (29%). The protein contained at least two disulfide bridges, as well as tightly bound reducing sugars and fatty acids.
Analysis of the Genome and Chromium Metabolism-Related Genes of Serratia sp. S2.
Dong, Lanlan; Zhou, Simin; He, Yuan; Jia, Yan; Bai, Qunhua; Deng, Peng; Gao, Jieying; Li, Yingli; Xiao, Hong
2018-05-01
This study investigated the genome sequence of Serratia sp. S2. The genomic DNA of Serratia sp. S2 was extracted and the sequencing library was constructed. The sequencing was carried out by Illumina 2000 and complete genomic sequences were obtained. Gene function annotation and bioinformatics analysis were performed by comparison with known databases. The genome size of Serratia sp. S2 was 5,604,115 bp and the G+C content was 57.61%. There were 5373 protein-coding genes, of which 3732, 3614, and 3942 were annotated in the GO, KEGG, and COG databases, respectively. There were 12 genes related to chromium metabolism in the Serratia sp. S2 genome. The whole genome sequence of Serratia sp. S2 has been submitted to the GenBank database under accession number LNRP00000000. Our findings may provide a theoretical basis for the subsequent development of new biotechnology to remediate environmental chromium pollution.
Beet western yellows virus infects the carnivorous plant Nepenthes mirabilis.
Miguel, Sissi; Biteau, Flore; Mignard, Benoit; Marais, Armelle; Candresse, Thierry; Theil, Sébastien; Bourgaud, Frédéric; Hehn, Alain
2016-08-01
Although poleroviruses are known to infect a broad range of higher plants, carnivorous plants have not yet been reported as hosts. Here, we describe the first polerovirus naturally infecting the pitcher plant Nepenthes mirabilis. The virus was identified through bioinformatic analysis of NGS transcriptome data. The complete viral genome sequence was assembled from overlapping PCR fragments and shown to share 91.1 % nucleotide sequence identity with the US isolate of beet western yellows virus (BWYV). Further analysis of other N. mirabilis plants revealed the presence of additional BWYV isolates differing by several insertion/deletion mutations in ORF5.
Vaidya, Sunil R; Chowdhury, Deepika T; Jadhav, Santoshkumar M; Hamde, Venkat S
2016-04-01
Limited information is available regarding epidemiology of mumps in India. Mumps vaccine is not included in the Universal Immunization Program of India. The complete genome sequences of Indian mumps virus (MuV) isolates are not available, hence this study was performed. Five isolates from bilateral parotitis and pancreatitis patients from Maharashtra, a MuV isolate from unilateral parotitis patient from Tamil Nadu, and a MuV isolate from encephalitis patient from Uttar Pradesh were genotyped by the standard protocol of the World Health Organization and subsequently complete genomes were sequenced. Indian MuV genomes were compared with published MuV genomes, including reference genotypes and eight vaccine strains for the genetic differences. The SH gene analysis revealed that five MuV isolates belonged to genotype C and two belonged to genotype G strains. The percent nucleotide divergence (PND) was 1.1% amongst five MuV genotype C strains and 2.2% amongst two MuV genotype G strains. A comparison with widely used mumps Jeryl Lynn vaccine strain revealed that Indian mumps isolates had 54, 54, 53, 49, 49, 38, and 49 amino acid substitutions in Chennai-2012, Kushinagar-2013, Pune-2008, Osmanabad-2012a, Osmanabad-2012b, Pune-1986 and Pune-2012, respectively. This study reports the complete genome sequences of Indian MuV strains obtained in years 1986, 2008, 2012 and 2013 that may be useful for further studies in India and globally. Copyright © 2016 Elsevier B.V. All rights reserved.
Congruence analysis of point clouds from unstable stereo image sequences
NASA Astrophysics Data System (ADS)
Jepping, C.; Bethmann, F.; Luhmann, T.
2014-06-01
This paper deals with the correction of exterior orientation parameters of stereo image sequences over deformed free-form surfaces without control points. Such imaging situation can occur, for example, during photogrammetric car crash test recordings where onboard high-speed stereo cameras are used to measure 3D surfaces. As a result of such measurements 3D point clouds of deformed surfaces are generated for a complete stereo sequence. The first objective of this research focusses on the development and investigation of methods for the detection of corresponding spatial and temporal tie points within the stereo image sequences (by stereo image matching and 3D point tracking) that are robust enough for a reliable handling of occlusions and other disturbances that may occur. The second objective of this research is the analysis of object deformations in order to detect stable areas (congruence analysis). For this purpose a RANSAC-based method for congruence analysis has been developed. This process is based on the sequential transformation of randomly selected point groups from one epoch to another by using a 3D similarity transformation. The paper gives a detailed description of the congruence analysis. The approach has been tested successfully on synthetic and real image data.
Han, Limin; Chen, Chen; Wang, Zhezhi
2018-01-01
Epipremnum aureum is an important foliage plant in the Araceae family. In this study, we have sequenced the complete chloroplast genome of E. aureum by using Illumina Hiseq sequencing platforms. This genome is a double-stranded circular DNA sequence of 164,831 bp that contains 35.8% GC. The two inverted repeats (IRa and IRb; 26,606 bp) are spaced by a small single-copy region (22,868 bp) and a large single-copy region (88,751 bp). The chloroplast genome has 131 (113 unique) functional genes, including 86 (79 unique) protein-coding genes, 37 (30 unique) tRNA genes, and eight (four unique) rRNA genes. Tandem repeats comprise the majority of the 43 long repetitive sequences. In addition, 111 simple sequence repeats are present, with mononucleotides being the most common type and di- and tetranucleotides being infrequent events. Positive selection pressure on rps12 in the E. aureum chloroplast has been demonstrated via synonymous and nonsynonymous substitution rates and selection pressure sites analyses. Ycf15 and infA are pseudogenes in this species. We constructed a Maximum Likelihood phylogenetic tree based on the complete chloroplast genomes of 38 species from 13 families. Those results strongly indicated that E. aureum is positioned as the sister of Colocasia esculenta within the Araceae family. This work may provide information for further study of the molecular phylogenetic relationships within Araceae, as well as molecular markers and breeding novel varieties by chloroplast genetic-transformation of E. aureum in particular. PMID:29529038
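The quadripartite structure described above implies that the total genome length equals the large single-copy region plus the small single-copy region plus two copies of the inverted repeat. The short check below uses only the figures quoted in the abstract; the variable names are illustrative assumptions.

```python
# Region lengths (bp) as stated in the abstract; names are illustrative.
lsc = 88_751   # large single-copy region
ssc = 22_868   # small single-copy region
ir = 26_606    # one inverted repeat (IRa or IRb)

total = lsc + ssc + 2 * ir
print(total)

# Gene tallies as (all copies, unique genes), also taken from the abstract.
genes = {"protein-coding": (86, 79), "tRNA": (37, 30), "rRNA": (8, 4)}
all_copies = sum(n for n, _ in genes.values())
unique = sum(u for _, u in genes.values())
print(all_copies, unique)
```

The computed total matches the reported 164,831 bp, and the per-class gene tallies sum to the reported 131 genes (113 unique).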
TOPICAL REVIEW: Integrated genetic analysis microsystems
NASA Astrophysics Data System (ADS)
Lagally, Eric T.; Mathies, Richard A.
2004-12-01
With the completion of the Human Genome Project and the ongoing DNA sequencing of the genomes of other animals, bacteria, plants and others, a wealth of new information about the genetic composition of organisms has become available. However, as the demand for sequence information grows, so does the workload required both to generate this sequence and to use it for targeted genetic analysis. Microfabricated genetic analysis systems are well poised to assist in the collection and use of these data through increased analysis speed, lower analysis cost and higher parallelism leading to increased assay throughput. In addition, such integrated microsystems may point the way to targeted genetic experiments on single cells and in other areas that are otherwise very difficult. Concomitant with these advantages, such systems, when fully integrated, should be capable of forming portable systems for high-speed in situ analyses, enabling a new standard in disciplines such as clinical chemistry, forensics, biowarfare detection and epidemiology. This review will discuss the various technologies available for genetic analysis on the microscale, and efforts to integrate them to form fully functional robust analysis devices.
Yu, Ziniu; Wei, Zhengpeng; Kong, Xiaoyu; Shi, Wei
2008-01-01
Background Mitochondrial DNA sequences are extensively used as genetic markers not only for studies of population or ecological genetics, but also for phylogenetic and evolutionary analyses. Complete mt-sequences can reveal information about gene order and its variation, as well as gene and genome evolution when sequences from multiple phyla are compared. Mitochondrial gene order is highly variable among mollusks, with bivalves exhibiting the most variability. Of the 41 complete mt genomes sequenced so far, 12 are from bivalves. We determined, in the current study, the complete mitochondrial DNA sequence of Crassostrea hongkongensis. We present here an analysis of features of its gene content and genome organization in comparison with two other Crassostrea species to assess the variation within bivalves and among main groups of mollusks. Results The complete mitochondrial genome of C. hongkongensis was determined using long PCR and a primer walking sequencing strategy with genus-specific primers. The genome is 16,475 bp in length and contains 12 protein-coding genes (the atp8 gene is missing, as in most bivalves), 22 tRNA genes (including a suppressor tRNA gene), and 2 ribosomal RNA genes, all of which appear to be transcribed from the same strand. A striking finding of this study is that a DNA segment containing four tRNA genes (trnK1, trnC, trnQ1 and trnN) and two duplicated or split rRNA genes (rrnL5' and rrnS) is absent from the genome when compared with those of two other extant Crassostrea species, very likely as a consequence of the loss of a single genomic region present in the ancestor of C. hongkongensis. This indicates that the region seems to be a "hot spot" of genomic rearrangements in Crassostrea mt-genomes. The arrangement of protein-coding genes in C. hongkongensis is identical to that of Crassostrea gigas and Crassostrea virginica, but higher amino acid sequence identities are shared between C. hongkongensis and C. gigas than between other pairs. There exists significant codon bias, favoring codons ending in A or T and against those ending with C. Pair analysis of genome rearrangements showed that the rearrangement distance is great between C. gigas-C. hongkongensis and C. virginica, indicating a high degree of rearrangements within Crassostrea. The determination of the complete mt-genome of C. hongkongensis has yielded useful insight into features of gene order, variation, and evolution of Crassostrea and bivalve mt-genomes. Conclusion The mt-genome of C. hongkongensis shares some similarity with, and interesting differences to, other Crassostrea species and bivalves. The absence of trnC and trnN genes and duplicated or split rRNA genes from the C. hongkongensis genome is a completely novel feature not previously reported in Crassostrea species. The phenomenon is likely due to the loss of a segment that is present in other Crassostrea species and was present in the ancestor of C. hongkongensis, thus a case of "tandem duplication-random loss (TDRL)". The mt-genome and new feature presented here reveal and underline the high level of variation of gene order and gene content in Crassostrea and bivalves, inspiring more research to gain an understanding of the mechanisms underlying gene and genome evolution in bivalves and mollusks. PMID:18847502
Comparing sequencing assays and human-machine analyses in actionable genomics for glioblastoma.
Wrzeszczynski, Kazimierz O; Frank, Mayu O; Koyama, Takahiko; Rhrissorrakrai, Kahn; Robine, Nicolas; Utro, Filippo; Emde, Anne-Katrin; Chen, Bo-Juen; Arora, Kanika; Shah, Minita; Vacic, Vladimir; Norel, Raquel; Bilal, Erhan; Bergmann, Ewa A; Moore Vogel, Julia L; Bruce, Jeffrey N; Lassman, Andrew B; Canoll, Peter; Grommes, Christian; Harvey, Steve; Parida, Laxmi; Michelini, Vanessa V; Zody, Michael C; Jobanputra, Vaidehi; Royyuru, Ajay K; Darnell, Robert B
2017-08-01
To analyze a glioblastoma tumor specimen with 3 different platforms and compare potentially actionable calls from each. Tumor DNA was analyzed by a commercial targeted panel. In addition, tumor-normal DNA was analyzed by whole-genome sequencing (WGS) and tumor RNA was analyzed by RNA sequencing (RNA-seq). The WGS and RNA-seq data were analyzed by a team of bioinformaticians and cancer oncologists, and separately by IBM Watson Genomic Analytics (WGA), an automated system for prioritizing somatic variants and identifying drugs. More variants were identified by WGS/RNA analysis than by targeted panels. WGA completed a comparable analysis in a fraction of the time required by the human analysts. The development of an effective human-machine interface in the analysis of deep cancer genomic datasets may provide potentially clinically actionable calls for individual patients in a more timely and efficient manner than currently possible. NCT02725684.
Gupta, R C; Randerath, E; Randerath, K
1976-01-01
A double-labeling procedure for sequence analysis of nonradioactive polyribonucleotides is detailed, which is based on controlled endonucleolytic degradation of 3'-terminally (3H)-labeled oligonucleotide-(3') dialcohols and 5'-terminal analysis of the partial (3H)-labeled fragments following their separation according to chain length by polyethyleneimine- (PEI-)cellulose TLC and detection by fluorography. Undesired nonradioactive partial digestion products are eliminated by periodate oxidation. The 5'-termini are assayed by enzymic incorporation of (32P)-label into the isolated fragments, enzymic release of (32P)-labeled nucleoside-(5') monophosphates, two-dimensional PEI-cellulose chromatography, and autoradiography. Using this procedure, as little as 0.1 - 0.3 A260 unit of tRNA is needed to sequence all fragments in complete ribonuclease T1 and A digests, whereas radioactive derivative methods previously described by us (refs. 1-4) required 4 - 6 A260 units. PMID:826884
Zarkasi, Kamarul Zaman; Taylor, Richard S; Glencross, Brett D; Abell, Guy C J; Tamplin, Mark L; Bowman, John P
2017-10-01
In this study, microbial community dynamics were assessed within a simple in vitro model system in order to understand those changes influenced by diet. The abundance and diversity of bacteria were monitored within different treatment slurries inoculated with salmon faecal samples in order to mimic the effects of dietary variables. A total of five complete diets and two ingredients (plant meal) were tested. The total viable counts (TVCs) and sequencing data revealed that there was very clear separation between the complete diets and the plant meal treatments, suggesting a dynamic response by the allochthonous bacteria to the treatments. Automated ribosomal intergenic spacer analysis (ARISA) results showed that different diet formulations produced different patterns of fragments, with no separation between the complete diets. However, plant-based protein ingredients were clearly separated from the other treatments. 16S rRNA Illumina-based sequencing analysis showed that members of the genera Aliivibrio, Vibrio and Photobacterium became predominant for all complete diets treatments. The plant-based protein ingredient treatments only sustained weak growth of the genus Sphingomonas. In vitro based testing of diets could be a useful strategy to determine the potential impact of either complete feeds or ingredients on major fish gastrointestinal tract microbiome members. Copyright © 2017 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.
Lee, Seung-Bum; Kaittanis, Charalambos; Jansen, Robert K; Hostetler, Jessica B; Tallon, Luke J; Town, Christopher D; Daniell, Henry
2006-01-01
Background Cotton (Gossypium hirsutum) is the most important fiber crop grown in 90 countries. In 2004–2005, US farmers planted 79% of the 5.7-million hectares of nuclear transgenic cotton. Unfortunately, genetically modified cotton has the potential to hybridize with other cultivated and wild relatives, resulting in geographical restrictions to cultivation. However, chloroplast genetic engineering offers the possibility of containment because of maternal inheritance of transgenes. The complete chloroplast genome of cotton provides essential information required for genetic engineering. In addition, the sequence data were used to assess phylogenetic relationships among the major clades of rosids using cotton and 25 other completely sequenced angiosperm chloroplast genomes. Results The complete cotton chloroplast genome is 160,301 bp in length, with 112 unique genes and 19 duplicated genes within the IR, containing a total of 131 genes. There are four ribosomal RNAs, 30 distinct tRNA genes and 17 intron-containing genes. The gene order in cotton is identical to that of tobacco but lacks rpl22 and infA. There are 30 direct and 24 inverted repeats 30 bp or longer with a sequence identity ≥ 90%. Most of the direct repeats are within intergenic spacer regions, introns and a 72 bp-long direct repeat is within the psaA and psaB genes. Comparison of protein coding sequences with expressed sequence tags (ESTs) revealed nucleotide substitutions resulting in amino acid changes in ndhC, rpl23, rpl20, rps3 and clpP. Phylogenetic analysis of a data set including 61 protein-coding genes using both maximum likelihood and maximum parsimony were performed for 28 taxa, including cotton and five other angiosperm chloroplast genomes that were not included in any previous phylogenies. Conclusion Cotton chloroplast genome lacks rpl22 and infA and contains a number of dispersed direct and inverted repeats. 
RNA editing resulted in amino acid changes with significant impact on their hydropathy. Phylogenetic analysis provides strong support for the position of cotton in the Malvales in the eurosids II clade sister to Arabidopsis in the Brassicales. Furthermore, there is strong support for the placement of the Myrtales sister to the eurosid I clade, although expanded taxon sampling is needed to further test this relationship. PMID:16553962
The complete genome sequence of the acarbose producer Actinoplanes sp. SE50/110
2012-01-01
Background Actinoplanes sp. SE50/110 is known as the wild type producer of the alpha-glucosidase inhibitor acarbose, a potent drug used worldwide in the treatment of type-2 diabetes mellitus. As the incidence of diabetes is rapidly rising worldwide, an ever increasing demand for diabetes drugs, such as acarbose, needs to be anticipated. Consequently, derived Actinoplanes strains with increased acarbose yields are being used in large scale industrial batch fermentation since 1990 and were continuously optimized by conventional mutagenesis and screening experiments. This strategy reached its limits and is generally superseded by modern genetic engineering approaches. As a prerequisite for targeted genetic modifications, the complete genome sequence of the organism has to be known. Results Here, we present the complete genome sequence of Actinoplanes sp. SE50/110 [GenBank:CP003170], the first publicly available genome of the genus Actinoplanes, comprising various producers of pharmaceutically and economically important secondary metabolites. The genome features a high mean G + C content of 71.32% and consists of one circular chromosome with a size of 9,239,851 bp hosting 8,270 predicted protein coding sequences. Phylogenetic analysis of the core genome revealed a rather distant relation to other sequenced species of the family Micromonosporaceae whereas Actinoplanes utahensis was found to be the closest species based on 16S rRNA gene sequence comparison. Besides the already published acarbose biosynthetic gene cluster sequence, several new non-ribosomal peptide synthetase-, polyketide synthase- and hybrid-clusters were identified on the Actinoplanes genome. Another key feature of the genome represents the discovery of a functional actinomycete integrative and conjugative element. Conclusions The complete genome sequence of Actinoplanes sp. SE50/110 marks an important step towards the rational genetic optimization of the acarbose production. 
In this regard, the identified actinomycete integrative and conjugative element could play a central role by providing the basis for the development of a genetic transformation system for Actinoplanes sp. SE50/110 and other Actinoplanes spp. Furthermore, the identified non-ribosomal peptide synthetase- and polyketide synthase-clusters potentially encode new antibiotics and/or other bioactive compounds, which might be of pharmacologic interest. PMID:22443545
The complete genome sequence of the acarbose producer Actinoplanes sp. SE50/110.
Schwientek, Patrick; Szczepanowski, Rafael; Rückert, Christian; Kalinowski, Jörn; Klein, Andreas; Selber, Klaus; Wehmeier, Udo F; Stoye, Jens; Pühler, Alfred
2012-03-23
Actinoplanes sp. SE50/110 is known as the wild type producer of the alpha-glucosidase inhibitor acarbose, a potent drug used worldwide in the treatment of type-2 diabetes mellitus. As the incidence of diabetes is rapidly rising worldwide, an ever increasing demand for diabetes drugs, such as acarbose, needs to be anticipated. Consequently, derived Actinoplanes strains with increased acarbose yields are being used in large scale industrial batch fermentation since 1990 and were continuously optimized by conventional mutagenesis and screening experiments. This strategy reached its limits and is generally superseded by modern genetic engineering approaches. As a prerequisite for targeted genetic modifications, the complete genome sequence of the organism has to be known. Here, we present the complete genome sequence of Actinoplanes sp. SE50/110 [GenBank:CP003170], the first publicly available genome of the genus Actinoplanes, comprising various producers of pharmaceutically and economically important secondary metabolites. The genome features a high mean G + C content of 71.32% and consists of one circular chromosome with a size of 9,239,851 bp hosting 8,270 predicted protein coding sequences. Phylogenetic analysis of the core genome revealed a rather distant relation to other sequenced species of the family Micromonosporaceae whereas Actinoplanes utahensis was found to be the closest species based on 16S rRNA gene sequence comparison. Besides the already published acarbose biosynthetic gene cluster sequence, several new non-ribosomal peptide synthetase-, polyketide synthase- and hybrid-clusters were identified on the Actinoplanes genome. Another key feature of the genome represents the discovery of a functional actinomycete integrative and conjugative element. The complete genome sequence of Actinoplanes sp. SE50/110 marks an important step towards the rational genetic optimization of the acarbose production. 
In this regard, the identified actinomycete integrative and conjugative element could play a central role by providing the basis for the development of a genetic transformation system for Actinoplanes sp. SE50/110 and other Actinoplanes spp. Furthermore, the identified non-ribosomal peptide synthetase- and polyketide synthase-clusters potentially encode new antibiotics and/or other bioactive compounds, which might be of pharmacologic interest.
Bejerman, Nicolás; Giolitti, Fabián; Trucco, Verónica; de Breuil, Soledad; Dietzgen, Ralf G; Lenardon, Sergio
2016-07-01
Alfalfa dwarf disease, probably caused by synergistic interactions of mixed virus infections, is a major and emergent disease that threatens alfalfa production in Argentina. Deep sequencing of diseased alfalfa plant samples from the central region of Argentina resulted in the identification of a new virus genome resembling enamoviruses in sequence and genome structure. Phylogenetic analysis suggests that it is a new member of the genus Enamovirus, family Luteoviridae. The virus is tentatively named "alfalfa enamovirus 1" (AEV-1). The availability of the AEV-1 genome sequence will make it possible to assess the genetic variability of this virus and to construct an infectious clone to investigate its role in alfalfa dwarfism disease.
Reddy, M Sreekanth; Kanakala, S; Srinivas, K P; Hema, M; Malathi, V G; Sreenivasulu, P
2014-05-01
The complete DNA A genome of a virus isolate associated with yellow mosaic disease of a medicinal plant, Hemidesmus indicus, from India was cloned and sequenced. The length of DNA A was 2825 nucleotides, 35 nucleotides longer than the unit genome of monopartite begomoviruses. Comparison of the nucleotide sequence of DNA A of the virus isolate with those of other begomoviruses showed maximum sequence identity of 69 % to DNA A of ageratum yellow vein China virus (AYVCNV; AJ558120) and 68 % with tomato yellow leaf curl virus- LBa4 (TYLCV; EF185318), and it formed a distinct clade in phylogenetic analysis. The genome organization of the present virus isolate was found to be similar to that of Old World monopartite begomoviruses. The genome was considered to be monopartite, because association of DNA B and β satellite DNA components was not detected. Based on its sequence identity (<70 %) to all other begomoviruses known to date and ICTV (International Committee on Taxonomy of Viruses) species demarcating criteria (<89 % identity), it is considered a member of a novel begomovirus species, and the tentative name "Hemidesmus yellow mosaic virus" (HeYMV) is proposed.
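The species call in the abstract above rests on comparing a best pairwise identity (69 %) with a demarcation threshold (<89 %). A minimal Python sketch of that comparison follows, assuming the two sequences are already aligned to equal length and that gapped columns are skipped; the function names and the gap handling are illustrative assumptions, not the method used by the authors or by ICTV.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over ungapped columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: columns containing a gap are ignored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def below_species_threshold(best_identity, threshold=89.0):
    """True when the best identity falls under the demarcation threshold."""
    return best_identity < threshold


# Toy alignment: 4 matches over 5 ungapped columns.
print(percent_identity("ACGT-A", "ACGATA"))
# The 69 % maximum identity reported above is well under the 89 % criterion.
print(below_species_threshold(69.0))
```

How gaps and alignment ends are counted changes the identity value, so results from different tools are comparable only when they share the same convention.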
Saha, Surya; Hunter, Wayne B; Reese, Justin; Morgan, J Kent; Marutani-Hert, Mizuri; Huang, Hong; Lindeberg, Magdalen
2012-01-01
Diaphorina citri (Hemiptera: Psyllidae), the Asian citrus psyllid, is the insect vector of Ca. Liberibacter asiaticus, the causal agent of citrus greening disease. Sequencing of the D. citri metagenome has been initiated to gain better understanding of the biology of this organism and the potential roles of its bacterial endosymbionts. To corroborate candidate endosymbionts previously identified by rDNA amplification, raw reads from the D. citri metagenome sequence were mapped to reference genome sequences. Results of the read mapping provided the most support for Wolbachia and an enteric bacterium most similar to Salmonella. Wolbachia-derived reads were extracted using the complete genome sequences for four Wolbachia strains. Reads were assembled into a draft genome sequence, and the annotation assessed for the presence of features potentially involved in host interaction. Genome alignment with the complete sequences reveals membership of Wolbachia wDi in supergroup B, further supported by phylogenetic analysis of FtsZ. FtsZ and Wsp phylogenies additionally indicate that the Wolbachia strain in the Florida D. citri isolate falls into a sub-clade of supergroup B, distinct from Wolbachia present in Chinese D. citri isolates, supporting the hypothesis that the D. citri introduced into Florida did not originate from China.
Saha, Surya; Hunter, Wayne B.; Reese, Justin; Morgan, J. Kent; Marutani-Hert, Mizuri; Huang, Hong; Lindeberg, Magdalen
2012-01-01
Diaphorina citri (Hemiptera: Psyllidae), the Asian citrus psyllid, is the insect vector of Ca. Liberibacter asiaticus, the causal agent of citrus greening disease. Sequencing of the D. citri metagenome has been initiated to gain better understanding of the biology of this organism and the potential roles of its bacterial endosymbionts. To corroborate candidate endosymbionts previously identified by rDNA amplification, raw reads from the D. citri metagenome sequence were mapped to reference genome sequences. Results of the read mapping provided the most support for Wolbachia and an enteric bacterium most similar to Salmonella. Wolbachia-derived reads were extracted using the complete genome sequences for four Wolbachia strains. Reads were assembled into a draft genome sequence, and the annotation assessed for the presence of features potentially involved in host interaction. Genome alignment with the complete sequences reveals membership of Wolbachia wDi in supergroup B, further supported by phylogenetic analysis of FtsZ. FtsZ and Wsp phylogenies additionally indicate that the Wolbachia strain in the Florida D. citri isolate falls into a sub-clade of supergroup B, distinct from Wolbachia present in Chinese D. citri isolates, supporting the hypothesis that the D. citri introduced into Florida did not originate from China. PMID:23166822
Gruenstaeudl, Michael; Gerschler, Nico; Borsch, Thomas
2018-06-21
The sequencing and comparison of plastid genomes are becoming a standard method in plant genomics, and many researchers are using this approach to infer plant phylogenetic relationships. Due to the widespread availability of next-generation sequencing, plastid genome sequences are being generated at breakneck pace. This trend towards massive sequencing of plastid genomes highlights the need for standardized bioinformatic workflows. In particular, documentation and dissemination of the details of genome assembly, annotation, alignment and phylogenetic tree inference are needed, as these processes are highly sensitive to the choice of software and the precise settings used. Here, we present the procedure and results of sequencing, assembling, annotating and quality-checking of three complete plastid genomes of the aquatic plant genus Cabomba as well as subsequent gene alignment and phylogenetic tree inference. We accompany our findings by a detailed description of the bioinformatic workflow employed. Importantly, we share a total of eleven software scripts for each of these bioinformatic processes, enabling other researchers to evaluate and replicate our analyses step by step. The results of our analyses illustrate that the plastid genomes of Cabomba are highly conserved in both structure and gene content.
Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi
2015-11-20
The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.
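The abstract above cites contig N50 and roughly 60× coverage from 181 Gbp of sequence. The Python sketch below shows how both figures are conventionally computed. The genome size of 3.0 Gbp is an assumption chosen to be consistent with the stated 60×, not a value given in the abstract, and the contig lengths are made-up toy data.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def fold_coverage(sequenced_bases, genome_size):
    """Average depth of coverage: sequenced bases divided by genome size."""
    return sequenced_bases / genome_size


# Toy contig set (hypothetical lengths): total is 20, and the two
# longest contigs (6 + 5) already cover more than half of it.
print(n50([2, 3, 4, 5, 6]))

# 181 Gbp over an assumed 3.0 Gbp genome gives about 60x.
print(round(fold_coverage(181e9, 3.0e9)))
```

Because N50 depends only on the length distribution, a doubled N50 signals longer contigs but says nothing by itself about their correctness.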
NASA Astrophysics Data System (ADS)
De Marchi, G.; Paresce, F.; Straniero, O.; Prada Moroni, P. G.
2004-03-01
Very deep images of the Galactic globular cluster M 4 (NGC 6121) through the F606W and F814W filters were taken in 2001 with the WFPC2 on board the HST. A first published analysis of this data set (Richer et al. 2002) produced the result that the age of M 4 is 12.7 ± 0.7 Gyr (Hansen et al. 2002), thus setting a robust lower limit to the age of the universe. In view of the great astronomical importance of getting this number right, we have subjected the same data set to the simplest possible photometric analysis that completely avoids uncertain assumptions about the origin of the detected sources. This analysis clearly reveals both a thin main sequence, from which can be deduced the deepest statistically complete mass function yet determined for a globular cluster, and a white dwarf (WD) sequence extending all the way down to the 5σ detection limit at I ≃ 27. The WD sequence is abruptly terminated at exactly this limit as expected by detection statistics. Using our most recent theoretical WD models (Prada Moroni & Straniero 2002) to obtain the expected WD sequence for different ages in the observed bandpasses, we find that the data so far obtained do not reach the peak of the WD luminosity function, thus only allowing one to set a lower limit to the age of M 4 of ~9 Gyr. Thus, the problem of determining the absolute age of a globular cluster and, therefore, the onset of GC formation with cosmologically significant accuracy remains completely open. Only observations several magnitudes deeper than the limit obtained so far would allow one to approach this objective. Based on observations with the NASA/ESA Hubble Space Telescope, obtained at the Space Telescope Science Institute, which is operated by AURA for NASA under contract NAS5-26555.
USDA-ARS's Scientific Manuscript database
To evaluate genetic diversity of Lymantria dispar nucleopolyhedrovirus (LdMNPV) at the genomic level, five isolates of LdMNPV from North America, Europe, and Asia were selected for complete genome sequence determination and analysis. These isolates consist of LdMNPV-2161 from Korea; LdMNPV-3029, a ...
Meiler, Arno; Klinger, Claudia; Kaufmann, Michael
2012-09-08
The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC's NUCOCOG dataset as the largest one available for that purpose thus far. Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills.
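The codon-frequency analysis described above can be illustrated with a short Python sketch. It counts in-frame codons of a single coding sequence and reports relative frequencies. This is a generic illustration, not ANCAC's implementation or interface, and the function name and the toy sequence are hypothetical.

```python
from collections import Counter


def codon_frequencies(cds):
    """Relative frequency of each in-frame codon in a coding sequence.

    Trailing bases that do not fill a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence is shorter than one codon")
    return {codon: n / total for codon, n in counts.items()}


# Toy coding sequence: ATG GCT GCT TAA
freqs = codon_frequencies("ATGGCTGCTTAA")
for codon in sorted(freqs):
    print(codon, freqs[codon])
```

Summing such per-gene tables over all members of an orthologous group would give a group-level codon usage profile, which is the kind of aggregate the abstract describes.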
2012-01-01
Background The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Results Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC’s NUCOCOG dataset as the largest one available for that purpose thus far. Conclusions Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills. PMID:22958836
Ma, Ji; Yang, Bingxian; Zhu, Wei; Sun, Lianli; Tian, Jingkui; Wang, Xumin
2013-10-10
Mahonia bealei (Berberidaceae) is a frequently-used traditional Chinese medicinal plant with efficient anti-inflammatory ability. This plant is one of the sources of berberine, a new cholesterol-lowering drug with anti-diabetic activity. We have determined the complete nucleotide sequence of the chloroplast (cp) genome of M. bealei. The complete cp genome of M. bealei is 164,792 bp in length, and has a typical structure with large (LSC 73,052 bp) and small (SSC 18,591 bp) single-copy regions separated by a pair of inverted repeats (IRs 36,501 bp) of large size. The Mahonia cp genome contains 111 unique genes and 39 genes are duplicated in the IR regions. The gene order and content of M. bealei are almost unrearranged, which is consistent with the hypothesis that large IRs stabilize the cp genome and reduce gene loss-and-gain probabilities during the evolutionary process. A large IR expansion of over 12 kb has occurred in M. bealei, and 15 genes (rps19, rpl22, rps3, rpl16, rpl14, rps8, infA, rpl36, rps11, petD, petB, psbH, psbN, psbT and psbB) have gained an additional copy in the IRs. The IR expansion rearrangement occurred via a double-strand DNA break and subsequent repair, which is different from the ordinary gene conversion mechanism. Repeat analysis identified 39 direct/inverted repeats 30 bp or longer with a sequence identity ≥ 90%. Analysis also revealed 75 simple sequence repeat (SSR) loci, almost all composed of A or T, contributing to a distinct bias in base composition. Comparison of protein-coding sequences with ESTs reveals 9 putative RNA edits, 5 of which resulted in non-synonymous modifications in rpoC1, rps2, rps19 and ycf1. Phylogenetic analysis using maximum parsimony (MP) and maximum likelihood (ML) was performed on a dataset composed of 65 protein-coding genes from 25 taxa, which yields a tree topology identical to that of previous plastid-based trees and provides strong support for the sister relationship between Ranunculaceae and Berberidaceae. 
Molecular dating analyses suggest that Ranunculaceae and Berberidaceae diverged between 90 and 84 mya, which is congruent with the fossil records and with recent estimates of the divergence time of these two taxa. © 2013.
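The simple sequence repeat (SSR) screen mentioned in the abstract above can be sketched with a regular expression that finds perfect tandem repeats. The minimum repeat counts used here (10 for mononucleotides, 5 for dinucleotides) are illustrative assumptions. The abstract does not state which thresholds or software were used.

```python
import re


def find_ssrs(seq, unit_len, min_repeats):
    """Return (start, unit, repeat_count) for perfect tandem repeats of a
    motif of unit_len bases occurring at least min_repeats times."""
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        # Skip motifs that are repeats of a shorter motif (e.g. "AA"),
        # so homopolymer runs are not reported as dinucleotide SSRs.
        if unit in (unit + unit)[1:-1]:
            continue
        hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits


# Toy sequences (hypothetical): one A-run of 12 and one (AT)5 repeat.
print(find_ssrs("CC" + "A" * 12 + "GG", unit_len=1, min_repeats=10))
print(find_ssrs("GC" + "AT" * 5 + "GC", unit_len=2, min_repeats=5))
```

An A/T bias like the one reported would show up as most hits having motifs made only of A and T.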
USDA-ARS's Scientific Manuscript database
The recent completion of the complete genome sequence of the guinea pig (Cavia porcellus) provides innovative opportunities to apply proteomic technologies to an important animal model of disease. In this study, a 2-D guinea pig proteome lung map was used to investigate the pathogenic mechanisms of ...
Characterization of the complete mitochondrial genome sequence of wild yak (Bos mutus).
Chunnian, Liang; Wu, Xiaoyun; Ding, Xuezhi; Wang, Hongbo; Guo, Xian; Chu, Min; Bao, Pengjia; Yan, Ping
2016-11-01
Wild yak is a special breed in China and is regarded as an important genetic resource for sustainably developing animal husbandry in the Tibetan area and enriching the region's biodiversity. The complete mitochondrial genome of wild yak (16,322 bp in length) contains the 37 typical animal mitochondrial genes and is A + T-rich (61.01%), with an overall G + C content of only 38.99%. It contains a non-coding control region (D-loop), 13 protein-coding genes, two rRNA genes, and 22 tRNA genes. Most of the genes have ATG initiation codons, whereas the ND2, ND3, and ND5 genes start with ATA and are encoded on the H-strand. The gene order of the wild yak mitogenome is identical to that observed in most other vertebrates. The complete mitochondrial genome sequence of wild yak reported here could provide valuable information for developing genetic markers and phylogenetic analysis in yak.
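The A + T and G + C percentages quoted above are complementary shares of the same base count. A minimal Python sketch of the calculation, assuming only unambiguous A, C, G and T bases are counted, is given below; the function name and the toy sequence are hypothetical and not taken from the study.

```python
def base_composition(seq):
    """Return (at_percent, gc_percent) over unambiguous A/C/G/T bases."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * at / total, 100.0 * gc / total


# Toy sequence with 8 A/T bases out of 10.
at_pct, gc_pct = base_composition("ATGCATTAAT")
print(at_pct, gc_pct)

# The two reported figures should sum to 100 %.
print(abs(61.01 + 38.99 - 100.0) < 1e-9)
```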
Wang, Cheng-Long; Ding, Meng-Qi; Zou, Chen-Yan; Zhu, Xue-Mei; Tang, Yu; Zhou, Mei-Liang; Shao, Ji-Rong
2017-07-26
Buckwheat is a nutritionally and economically important crop belonging to the genus Fagopyrum (Polygonaceae). To better understand the mutation patterns and evolutionary trends in the chloroplast (cp) genome of buckwheat, and to find a sufficient number of variable regions for exploring the phylogenetic relationships of this genus, two complete buckwheat cp genomes, those of Fagopyrum dibotrys (F. dibotrys) and Fagopyrum luojishanense (F. luojishanense), were sequenced, and two other Fagopyrum cp genomes were used for comparative analysis. Morphological analysis showed that the main differences among these buckwheats were height, leaf shape, seeds and flower type. F. luojishanense was easily distinguishable from the cultivated species. Although F. dibotrys and the two cultivated species have some similarities, they differ in habit and component contents. The cp genome of F. dibotrys was 159,320 bp, while that of F. luojishanense was 159,265 bp. In total, 48 and 61 SSRs were found in F. dibotrys and F. luojishanense, respectively. Meanwhile, 10 highly variable regions among these buckwheat species were located precisely. The phylogenetic relationships among the four Fagopyrum species based on complete cp genomes were shown. The results suggested that F. dibotrys is more closely related to Fagopyrum tataricum. These data provide valuable genetic information for Fagopyrum species identification, taxonomy, phylogenetic study and molecular breeding.
Bào, Yīmíng; Kuhn, Jens H
2018-01-01
During the last decade, genome sequence-based classification of viruses has become increasingly prominent. Viruses can be even classified based on coding-complete genome sequence data alone. Nevertheless, classification remains arduous as experts are required to establish phylogenetic trees to depict the evolutionary relationships of such sequences for preliminary taxonomic placement. Pairwise sequence comparison (PASC) of genomes is one of several novel methods for establishing relationships among viruses. This method, provided by the US National Center for Biotechnology Information as an open-access tool, circumvents phylogenetics, and yet PASC results are often in agreement with those of phylogenetic analyses. Computationally inexpensive, PASC can be easily performed by non-taxonomists. Here we describe how to use the PASC tool for the preliminary classification of novel viral hemorrhagic fever-causing viruses.
Insights from Human/Mouse genome comparisons
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pennacchio, Len A.
2003-03-30
Large-scale public genomic sequencing efforts have provided a wealth of vertebrate sequence data poised to provide insights into mammalian biology. These include deep genomic sequence coverage of human, mouse, rat, zebrafish, and two pufferfish (Fugu rubripes and Tetraodon nigroviridis) (Aparicio et al. 2002; Lander et al. 2001; Venter et al. 2001; Waterston et al. 2002). In addition, a high priority has been placed on determining the genomic sequence of chimpanzee, dog, cow, frog, and chicken (Boguski 2002). While only recently available, whole genome sequence data have provided the unique opportunity to globally compare complete genome contents. Furthermore, the shared evolutionary ancestry of vertebrate species has allowed the development of comparative genomic approaches to identify ancient conserved sequences with functionality. Accordingly, this review focuses on the initial comparison of available mammalian genomes and describes various insights derived from such analysis.
NASA Astrophysics Data System (ADS)
Gao, Jie; Jiang, Li-Li; Xu, Zhen-Yuan
2009-10-01
A new chaos game representation of protein sequences based on the detailed hydrophobic-hydrophilic (HP) model has been proposed by Yu et al (Physica A 337 (2004) 171). A CGR-walk model is proposed based on the new CGR coordinates for the protein sequences from complete genomes in the present paper. The new CGR coordinates based on the detailed HP model are converted into a time series, and a long-memory ARFIMA(p, d, q) model is introduced into the protein sequence analysis. This model is applied to simulating real CGR-walk sequence data of twelve protein sequences. Remarkably long-range correlations are uncovered in the data and the results obtained from these models are reasonably consistent with those available from the ARFIMA(p, d, q) model.
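A chaos game representation of a protein sequence can be sketched as follows. The grouping of the 20 amino acids into four classes follows the detailed HP model as commonly stated (nonpolar, negative polar, uncharged polar, positive polar); the assignment of classes to corners of the unit square and the function name `cgr_points` are assumptions made for illustration and may differ from the cited papers.

```python
# Detailed HP model classes (assumed grouping of the 20 amino acids).
CLASSES = {
    "nonpolar": "AILMFPWV",
    "negative": "DE",
    "uncharged": "NCQGSTY",
    "positive": "RHK",
}

# Hypothetical corner assignment on the unit square.
CORNERS = {
    "nonpolar": (0.0, 0.0),
    "negative": (0.0, 1.0),
    "uncharged": (1.0, 1.0),
    "positive": (1.0, 0.0),
}

RESIDUE_TO_CORNER = {
    aa: CORNERS[name] for name, residues in CLASSES.items() for aa in residues
}


def cgr_points(protein):
    """Return the CGR trajectory: each point is halfway to the residue's corner."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for aa in protein.upper():
        corner = RESIDUE_TO_CORNER.get(aa)
        if corner is None:
            continue  # skip ambiguous or non-standard residues
        x = (x + corner[0]) / 2
        y = (y + corner[1]) / 2
        points.append((x, y))
    return points


print(cgr_points("AK"))
```

A series derived from these coordinates (for example, the successive x or y values) is the kind of input to which a time-series model could then be fitted; the exact CGR-walk construction is not specified in the abstract.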
Ni, Lianghong; Zhao, Zhili; Xu, Hongxi; Chen, Shilin; Dorje, Gaawe
2016-02-15
Endemic to the Sino-Himalayan subregion, the medicinal alpine plant Gentiana straminea is a threatened species for which genetic and molecular data are scarce. Here we report the complete chloroplast (cp) genome sequence of G. straminea, as the first sequenced member of the family Gentianaceae. The cp genome is 148,991 bp in length, including a large single copy (LSC) region of 81,240 bp, a small single copy (SSC) region of 17,085 bp and a pair of inverted repeats (IRs) of 25,333 bp each. It contains 112 unique genes, including 78 protein-coding genes, 30 tRNAs and 4 rRNAs. The rps16 gene lacks exon2 between trnK-UUU and trnQ-UUG, which is the first rps16 pseudogene found in the nonparasitic plants of the Asterids clade. Sequence analysis revealed the presence of 13 forward repeats, 13 palindromic repeats and 39 simple sequence repeats (SSRs). An entire cp genome comparison study of G. straminea and four other species in Gentianales was carried out. Phylogenetic analyses using maximum likelihood (ML) and maximum parsimony (MP) were performed based on 69 protein-coding genes from 36 species of Asterids. The results strongly supported the position of Gentianaceae as one member of the order Gentianales. The complete chloroplast genome sequence will provide intragenic information for its conservation and contribute to research on the genetic and phylogenetic analyses of Gentianales and Asterids. Copyright © 2015 Elsevier B.V. All rights reserved.
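The region lengths reported above can be checked against the total genome length, since a quadripartite chloroplast genome consists of one LSC, one SSC and two IR copies. The short sketch below uses only the figures from the abstract; the function name `cp_genome_length` is hypothetical.

```python
def cp_genome_length(lsc, ssc, ir):
    """Total length of a quadripartite cp genome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


total = cp_genome_length(lsc=81_240, ssc=17_085, ir=25_333)
print(total)  # 148991, matching the reported 148,991 bp
```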
Illeghems, Koen; De Vuyst, Luc; Weckx, Stefan
2013-08-01
Acetobacter pasteurianus 386B, an acetic acid bacterium originating from a spontaneous cocoa bean heap fermentation, proved to be an ideal functional starter culture for cocoa bean fermentations. It is able to dominate the fermentation process, thereby resisting high acetic acid concentrations and temperatures. However, the molecular mechanisms underlying its metabolic capabilities and niche adaptations are unknown. In this study, whole-genome sequencing and comparative genome analysis were used to investigate this strain's mechanisms to dominate the cocoa bean fermentation process. The genome sequence of A. pasteurianus 386B is composed of a 2.8-Mb chromosome and seven plasmids. The annotation of 2875 protein-coding sequences revealed important characteristics, including several metabolic pathways, the occurrence of strain-specific genes such as an endopolygalacturonase, and the presence of mechanisms involved in tolerance towards various stress conditions. Furthermore, the low number of transposases in the genome and the absence of complete phage genomes indicate that this strain might be more genetically stable compared with other A. pasteurianus strains, which is an important advantage for the use of this strain as a functional starter culture. Comparative genome analysis with other members of the Acetobacteraceae confirmed the functional properties of A. pasteurianus 386B, such as its thermotolerant nature and unique genetic composition. Genome analysis of A. pasteurianus 386B provided detailed insights into the underlying mechanisms of its metabolic features, niche adaptations, and tolerance towards stress conditions. Combination of these data with previous experimental knowledge enabled an integrated, global overview of the functional characteristics of this strain. This knowledge will enable improved fermentation strategies and selection of appropriate acetic acid bacteria strains as functional starter culture for cocoa bean fermentation processes.
Pallavi, Tokala; Chandra, Rampalli Viswa; Reddy, Aileni Amarender; Reddy, Bavigadda Harish; Naveen, Anumala
2016-01-01
Context: The inflammatory processes involved in chronic periodontitis and coronary artery diseases (CADs) are similar and produce reactive oxygen species that may result in similar somatic mutations in mitochondrial deoxyribonucleic acid (mtDNA). Aims: The aims of the present study were to identify somatic mtDNA mutations in periodontal and cardiac tissues from subjects undergoing coronary artery bypass surgery and determine what fraction was identical and unique to these tissues. Settings and Design: The study population consisted of 30 chronic periodontitis subjects who underwent coronary artery surgery after an angiogram had indicated CAD. Materials and Methods: Gingival tissue samples were taken from the site with deepest probing depth; coronary artery tissue samples were taken during the coronary artery bypass grafting procedures, and blood samples were drawn during this surgical procedure. These samples were stored under aseptic conditions and later transported for mtDNA analysis. Statistical Analysis Used: Complete mtDNA sequences were obtained and aligned with the revised Cambridge reference sequence (NC_012920) using sequence analysis and auto assembler tools. Results: Among the complete mtDNA sequences, a total of 162 variations were spread across the whole mitochondrial genome and present only in the coronary artery and the gingival tissue samples but not in the blood samples. Among the 162 variations, 12 were novel and four of the 12 novel variations were found in mitochondrial NADH dehydrogenase subunit 5 complex I gene (33.3%). Conclusions: Analysis of mtDNA mutations indicated 162 variants unique to periodontitis and CAD. Of these, 12 were novel and may have resulted from destructive oxidative forces common to these two diseases. PMID:27041832
HBV Genotypic Variability in Cuba
Loureiro, Carmen L.; Aguilar, Julio C.; Aguiar, Jorge; Muzio, Verena; Pentón, Eduardo; Garcia, Daymir; Guillen, Gerardo; Pujol, Flor H.
2015-01-01
The genetic diversity of HBV in human population is often a reflection of its genetic admixture. The aim of this study was to explore the genotypic diversity of HBV in Cuba. The S genomic region of Cuban HBV isolates was sequenced and for selected isolates the complete genome or precore-core sequence was analyzed. The most frequent genotype was A (167/250, 67%), mainly A2 (149, 60%) but also A1 and one A4. A total of 77 isolates were classified as genotype D (31%), with co-circulation of several subgenotypes (56 D4, 2 D1, 5 D2, 7 D3/6 and 7 D7). Three isolates belonged to genotype E, two to H and one to B3. Complete genome sequence analysis of selected isolates confirmed the phylogenetic analysis performed with the S region. Mutations or polymorphisms in precore region were more common among genotype D compared to genotype A isolates. The HBV genotypic distribution in this Caribbean island correlates with the Y lineage genetic background of the population, where a European and African origin prevails. HBV genotypes E, B3 and H isolates might represent more recent introductions. PMID:25742179
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roger, T.; Morisset, J.; Seman, M.
1996-12-31
The mouse Tcrg locus comprises seven Tcrg-V, four Tcrg-J, and four Tcrg-C segments which generate only six major types of functional g chains, Vg7-, Vg4-, Vg6-, or Vg5-Jg1-Cg1, Vg2-Jg2-Cg2, and Vg1-Jg4-Cg4. A complete analysis of restriction fragment length polymorphism (RFLP) of the Tcrg locus in wild and inbred mice suggested its relative conservation compared to other loci of the immunoglobulin (Ig) gene family. Three haplotypes have been characterized in laboratory mice: gA, gB, and gC, represented by BALB/c, DBA/2, and AKR prototypes. Tcr-gA and -gC haplotypes are highly related. By contrast, Tcr-gB, likely inherited from Asian mouse subspecies, appeared very different by RFLP analysis. Yet only partial sequence data have been reported on gA and gB Tcrg-V genes. Here, the complete sequence of all Tcrg-V genes of the two haplotypes is described. 16 refs., 1 fig.
Complete Genome Sequence and Comparative Analysis of the Fish Pathogen Lactococcus garvieae
Oshima, Kenshiro; Yoshizaki, Mariko; Kawanishi, Michiko; Nakaya, Kohei; Suzuki, Takehito; Miyauchi, Eiji; Ishii, Yasuo; Tanabe, Soichi; Murakami, Masaru; Hattori, Masahira
2011-01-01
Lactococcus garvieae causes fatal haemorrhagic septicaemia in fish such as yellowtail. The comparative analysis of genomes of a virulent strain Lg2 and a non-virulent strain ATCC 49156 of L. garvieae revealed that the two strains shared a high degree of sequence identity, but Lg2 had a 16.5-kb capsule gene cluster that is absent in ATCC 49156. The capsule gene cluster was composed of 15 genes, of which eight genes are highly conserved with those in exopolysaccharide biosynthesis gene cluster often found in Lactococcus lactis strains. Sequence analysis of the capsule gene cluster in the less virulent strain L. garvieae Lg2-S, Lg2-derived strain, showed that two conserved genes were disrupted by a single base pair deletion, respectively. These results strongly suggest that the capsule is crucial for virulence of Lg2. The capsule gene cluster of Lg2 may be a genomic island from several features such as the presence of insertion sequences flanked on both ends, different GC content from the chromosomal average, integration into the locus syntenic to other lactococcal genome sequences, and distribution in human gut microbiomes. The analysis also predicted other potential virulence factors such as haemolysin. The present study provides new insights into understanding of the virulence mechanisms of L. garvieae in fish. PMID:21829716
Shirts, Brian H; Salipante, Stephen J; Casadei, Silvia; Ryan, Shawnia; Martin, Judith; Jacobson, Angela; Vlaskin, Tatyana; Koehler, Karen; Livingston, Robert J; King, Mary-Claire; Walsh, Tom; Pritchard, Colin C
2014-10-01
Single-exon inversions have rarely been described in clinical syndromes and are challenging to detect using Sanger sequencing. We report the case of a 40-year-old woman with adenomatous colon polyps too numerous to count and who had a complex inversion spanning the entire exon 10 in APC (the gene encoding for adenomatous polyposis coli), causing exon skipping and resulting in a frameshift and premature protein truncation. In this study, we employed complete APC gene sequencing using high-coverage next-generation sequencing by ColoSeq, analysis with BreakDancer and SLOPE software, and confirmatory transcript analysis. ColoSeq identified a complex small genomic rearrangement consisting of an inversion that results in translational skipping of exon 10 in the APC gene. This mutation would not have been detected by traditional sequencing or gene-dosage methods. We report a case of adenomatous polyposis resulting from a complex single-exon inversion. Our report highlights the benefits of large-scale sequencing methods that capture intronic sequences with high enough depth of coverage-as well as the use of informatics tools-to enable detection of small pathogenic structural rearrangements.
T4-Like Genome Organization of the Escherichia coli O157:H7 Lytic Phage AR1
Liao, Wei-Chao; Ng, Wailap Victor; Lin, I-Hsuan; Syu, Wan-Jr; Liu, Tze-Tze; Chang, Chuan-Hsiung
2011-01-01
We report the genome organization and analysis of the first completely sequenced T4-like phage, AR1, of Escherichia coli O157:H7. Unlike most of the other sequenced phages of O157:H7, which belong to the temperate Podoviridae and Siphoviridae families, AR1 is a T4-like phage known to efficiently infect this pathogenic bacterial strain. The 167,435-bp AR1 genome is currently the largest among all the sequenced E. coli O157:H7 phages. It carries a total of 281 potential open reading frames (ORFs) and 10 putative tRNA genes. Of these, 126 predicted proteins could be classified into six viral orthologous group categories, with at least 18 proteins of the structural protein category having been detected by tandem mass spectrometry. Comparative genomic analysis of AR1 and four other completely sequenced T4-like genomes (RB32, RB69, T4, and JS98) indicated that they share a well-organized and highly conserved core genome, particularly in the regions encoding DNA replication and virion structural proteins. The major diverse features between these phages include the modules of distal tail fibers and the types and numbers of internal proteins, tRNA genes, and mobile elements. Codon usage analysis suggested that the presence of AR1-encoded tRNAs may be relevant to the codon usage of structural proteins. Furthermore, protein sequence analysis of AR1 gp37, a potential receptor binding protein, indicated that eight residues in the C terminus are unique to O157:H7 T4-like phages AR1 and PP01. These residues are known to be located in the T4 receptor recognition domain, and they may contribute to specificity for adsorption to the O157:H7 strain. PMID:21507986
Rodgers, Mary A; Wilkinson, Eduan; Vallari, Ana; McArthur, Carole; Sthreshley, Larry; Brennan, Catherine A; Cloherty, Gavin; de Oliveira, Tulio
2017-03-15
As the epidemiological epicenter of the human immunodeficiency virus (HIV) pandemic, the Democratic Republic of the Congo (DRC) is a reservoir of circulating HIV strains exhibiting high levels of diversity and recombination. In this study, we characterized HIV specimens collected in two rural areas of the DRC between 2001 and 2003 to identify rare strains of HIV. The env gp41 region was sequenced and characterized for 172 HIV-positive specimens. The env sequences were predominantly subtype A (43.02%), but 7 other subtypes (33.14%), 20 circulating recombinant forms (CRFs; 11.63%), and 20 unclassified (11.63%) sequences were also found. Of the rare and unclassified subtypes, 18 specimens were selected for next-generation sequencing (NGS) by a modified HIV-switching mechanism at the 5' end of the RNA template (SMART) method to obtain full-genome sequences. NGS produced 14 new complete genomes, which included pure subtype C (n = 2), D (n = 1), F1 (n = 1), H (n = 3), and J (n = 1) genomes. The two subtype C genomes and one of the subtype H genomes branched basal to their respective subtype branches but had no evidence of recombination. The remaining 6 genomes were complex recombinants of 2 or more subtypes, including subtypes A1, F, G, H, J, and K and unclassified fragments, including one subtype CRF25 isolate, which branched basal to all CRF25 references. Notably, all recombinant subtype H fragments branched basal to the H clade. Spatial-geographical analysis indicated that the diverse sequences identified here did not expand globally. The full-genome and subgenomic sequences identified in our study population significantly increase the documented diversity of the strains involved in the continually evolving HIV-1 pandemic. IMPORTANCE Very little is known about the ancestral HIV-1 strains that founded the global pandemic, and very few complete genome sequences are available from patients in the Congo Basin, where HIV-1 expanded early in the global pandemic. By sequencing a subgenomic fragment of the HIV-1 envelope from study participants in the DRC, we identified rare variants for complete genome sequencing. The basal branching of some of the complete genome sequences that we recovered suggests that these strains are more closely related to ancestral HIV-1 strains than to previously reported strains and is evidence that the local diversification of HIV in the DRC continues to outpace the diversity of global strains decades after the emergence of the pandemic. Copyright © 2017 Rodgers et al.
2014-01-01
Background Fascioliasis is an important and neglected disease of humans and other mammals, caused by trematodes of the genus Fasciola. Fasciola hepatica and F. gigantica are valid species that infect humans and animals, but the specific status of Fasciola sp. (‘intermediate form’) is unclear. Methods Single specimens inferred to represent Fasciola sp. (‘intermediate form’; Heilongjiang) and F. gigantica (Guangxi) from China were genetically identified and characterized using PCR-based sequencing of the first and second internal transcribed spacer regions of nuclear ribosomal DNA. The complete mitochondrial (mt) genomes of these representative specimens were then sequenced. The relationships of these specimens with selected members of the Trematoda were assessed by phylogenetic analysis of concatenated amino acid sequence datasets by Bayesian inference (BI). Results The complete mt genomes of representatives of Fasciola sp. and F. gigantica were 14,453 bp and 14,478 bp in size, respectively. Both mt genomes contain 12 protein-coding genes, 22 transfer RNA genes and two ribosomal RNA genes, but lack an atp8 gene. All protein-coding genes are transcribed in the same direction, and the gene order in both mt genomes is the same as that published for F. hepatica. Phylogenetic analysis of the concatenated amino acid sequence data for all 12 protein-coding genes showed that the specimen of Fasciola sp. was more closely related to F. gigantica than to F. hepatica. Conclusions The mt genomes characterized here provide a rich source of markers, which can be used in combination with nuclear markers and imaging techniques, for future comparative studies of the biology of Fasciola sp. from China and other countries. PMID:24685294
Liu, Guo-Hua; Gasser, Robin B; Young, Neil D; Song, Hui-Qun; Ai, Lin; Zhu, Xing-Quan
2014-03-31
Fascioliasis is an important and neglected disease of humans and other mammals, caused by trematodes of the genus Fasciola. Fasciola hepatica and F. gigantica are valid species that infect humans and animals, but the specific status of Fasciola sp. ('intermediate form') is unclear. Single specimens inferred to represent Fasciola sp. ('intermediate form'; Heilongjiang) and F. gigantica (Guangxi) from China were genetically identified and characterized using PCR-based sequencing of the first and second internal transcribed spacer regions of nuclear ribosomal DNA. The complete mitochondrial (mt) genomes of these representative specimens were then sequenced. The relationships of these specimens with selected members of the Trematoda were assessed by phylogenetic analysis of concatenated amino acid sequence datasets by Bayesian inference (BI). The complete mt genomes of representatives of Fasciola sp. and F. gigantica were 14,453 bp and 14,478 bp in size, respectively. Both mt genomes contain 12 protein-coding genes, 22 transfer RNA genes and two ribosomal RNA genes, but lack an atp8 gene. All protein-coding genes are transcribed in the same direction, and the gene order in both mt genomes is the same as that published for F. hepatica. Phylogenetic analysis of the concatenated amino acid sequence data for all 12 protein-coding genes showed that the specimen of Fasciola sp. was more closely related to F. gigantica than to F. hepatica. The mt genomes characterized here provide a rich source of markers, which can be used in combination with nuclear markers and imaging techniques, for future comparative studies of the biology of Fasciola sp. from China and other countries.
Yu, Zhongtang; Yu, Marie; Morrison, Mark
2006-04-01
Serial analysis of ribosomal sequence tags (SARST) is a recently developed technology that can generate large 16S rRNA gene (rrs) sequence data sets from microbiomes, but there are numerous enzymatic and purification steps required to construct the ribosomal sequence tag (RST) clone libraries. We report here an improved SARST method, which still targets the V1 hypervariable region of rrs genes, but reduces the number of enzymes, oligonucleotides, reagents, and technical steps needed to produce the RST clone libraries. The new method, hereafter referred to as SARST-V1, was used to examine the eubacterial diversity present in community DNA recovered from the microbiome resident in the ovine rumen. The 190 sequenced clones contained 1055 RSTs and no less than 236 unique phylotypes (based on ≥95% sequence identity) that were assigned to eight different eubacterial phyla. Rarefaction and monomolecular curve analyses predicted that the complete RST clone library contains 99% of the 353 unique phylotypes predicted to exist in this microbiome. When compared with ribosomal intergenic spacer analysis (RISA) of the same community DNA sample, as well as a compilation of nine previously published conventional rrs clone libraries prepared from the same type of samples, the RST clone library provided a more comprehensive characterization of the eubacterial diversity present in rumen microbiomes. As such, SARST-V1 should be a useful tool applicable to comprehensive examination of diversity and composition in microbiomes and offers an affordable, sequence-based method for diversity analysis.
Primary and secondary structural analyses of glutathione S-transferase pi from human placenta.
Ahmad, H; Wilson, D E; Fritz, R R; Singh, S V; Medh, R D; Nagle, G T; Awasthi, Y C; Kurosky, A
1990-05-01
The primary structure of glutathione S-transferase (GST) pi from a single human placenta was determined. The structure was established by chemical characterization of tryptic and cyanogen bromide peptides as well as automated sequence analysis of the intact enzyme. The structural analysis indicated that the protein is comprised of 209 amino acid residues and gave no evidence of post-translational modifications. The amino acid sequence differed from that of the deduced amino acid sequence determined by nucleotide sequence analysis of a cDNA clone (Kano, T., Sakai, M., and Muramatsu, M., 1987, Cancer Res. 47, 5626-5630) at position 104 which contained both valine and isoleucine whereas the deduced sequence from nucleotide sequence analysis identified only isoleucine at this position. These results demonstrated that in the one individual placenta studied at least two GST pi genes are coexpressed, probably as a result of allelomorphism. Computer assisted consensus sequence evaluation identified a hydrophobic region in GST pi (residues 155-181) that was predicted to be either a buried transmembrane helical region or a signal sequence region. The significance of this hydrophobic region was interpreted in relation to the mode of action of the enzyme especially in regard to the potential involvement of a histidine in the active site mechanism. A comparison of the chemical similarity of five known human GST complete enzyme structures, one of pi, one of mu, two of alpha, and one microsomal, gave evidence that all five enzymes have evolved by a divergent evolutionary process after gene duplication, with the microsomal enzyme representing the most divergent form.
Bicycle: a bioinformatics pipeline to analyze bisulfite sequencing data.
Graña, Osvaldo; López-Fernández, Hugo; Fdez-Riverola, Florentino; González Pisano, David; Glez-Peña, Daniel
2018-04-15
High-throughput sequencing of bisulfite-converted DNA is a technique used to measure DNA methylation levels. Although a considerable number of computational pipelines have been developed to analyze such data, none of them tackles all the peculiarities of the analysis together, revealing limitations that can force the user to manually perform additional steps needed for a complete processing of the data. This article presents bicycle, an integrated, flexible analysis pipeline for bisulfite sequencing data. Bicycle analyzes whole genome bisulfite sequencing data, targeted bisulfite sequencing data and hydroxymethylation data. To show how bicycle overtakes other available pipelines, we compared them on a defined number of features that are summarized in a table. We also tested bicycle with both simulated and real datasets, to show its level of performance, and compared it to different state-of-the-art methylation analysis pipelines. Bicycle is publicly available under GNU LGPL v3.0 license at http://www.sing-group.org/bicycle. Users can also download a customized Ubuntu LiveCD including bicycle and other bisulfite sequencing data pipelines compared here. In addition, a docker image with bicycle and its dependencies, which allows a straightforward use of bicycle in any platform (e.g. Linux, OS X or Windows), is also available. Contact: ograna@cnio.es or dgpena@uvigo.es. Supplementary data are available at Bioinformatics online.
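The quantity that bisulfite sequencing measures can be shown with a small calculation. Bisulfite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines are still read as C, so the methylation level at a reference cytosine is commonly estimated as C reads divided by C plus T reads. The sketch below is not part of bicycle; the function name `methylation_level` and the idea of passing the observed bases as a string are assumptions for illustration.

```python
from typing import Optional


def methylation_level(bases: str) -> Optional[float]:
    """Estimate methylation at one reference cytosine.

    `bases` holds the read bases observed at that position on the
    relevant strand. Bases other than C or T are ignored. Returns None
    when no informative read covers the position.
    """
    bases = bases.upper()
    methylated = bases.count("C")    # C retained: methylated
    unmethylated = bases.count("T")  # C converted and read as T: unmethylated
    covered = methylated + unmethylated
    if covered == 0:
        return None
    return methylated / covered


print(methylation_level("CCCT"))  # 0.75
```

Real pipelines additionally handle strand assignment, sequencing errors and incomplete conversion, none of which this sketch attempts.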
Long-read sequencing data analysis for yeasts.
Yue, Jia-Xing; Liti, Gianni
2018-06-01
Long-read sequencing technologies have become increasingly popular due to their strengths in resolving complex genomic regions. As a leading model organism with small genome size and great biotechnological importance, the budding yeast Saccharomyces cerevisiae has many isolates currently being sequenced with long reads. However, analyzing long-read sequencing data to produce high-quality genome assembly and annotation remains challenging. Here, we present a modular computational framework named long-read sequencing data analysis for yeasts (LRSDAY), the first one-stop solution that streamlines this process. Starting from the raw sequencing reads, LRSDAY can produce chromosome-level genome assembly and comprehensive genome annotation in a highly automated manner with minimal manual intervention, which is not possible using any alternative tool available to date. The annotated genomic features include centromeres, protein-coding genes, tRNAs, transposable elements (TEs), and telomere-associated elements. Although tailored for S. cerevisiae, we designed LRSDAY to be highly modular and customizable, making it adaptable to virtually any eukaryotic organism. When applying LRSDAY to an S. cerevisiae strain, it takes ∼41 h to generate a complete and well-annotated genome from ∼100× Pacific Biosciences (PacBio) running the basic workflow with four threads. Basic experience working within the Linux command-line environment is recommended for carrying out the analysis using LRSDAY.
Upadhyay, Atul Kumar; Sowdhamini, Ramanathan
2016-01-01
3D-domain swapping is one of the mechanisms of protein oligomerization and the proteins exhibiting this phenomenon have many biological functions. These proteins, which undergo domain swapping, have attracted much attention owing to their involvement in human diseases, such as conformational diseases, amyloidosis, serpinopathies, proteinopathies etc. Early identification of proteins in the whole human genome that retain a tendency to domain swap will enable many aspects of disease control management. Predictive models were developed by using machine learning approaches with an average accuracy of 78% (85.6% sensitivity, 87.5% specificity and an MCC value of 0.72) to predict putative domain swapping in protein sequences. These models were applied to many complete genomes with special emphasis on the human genome. Nearly 44% of the protein sequences in the human genome were predicted positive for domain swapping. Enrichment analysis was performed on the positively predicted sequences from human genome for their domain distribution, disease association and functional importance based on Gene Ontology (GO). Enrichment analysis was also performed to infer a better understanding of the functional importance of these sequences. Finally, we developed hinge region prediction, in the given putative domain swapped sequence, by using important physicochemical properties of amino acids.
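The performance measures quoted above (accuracy, sensitivity, specificity and the Matthews correlation coefficient, MCC) are all derived from a binary confusion matrix. The sketch below shows the standard formulas; the function name `classification_metrics` and the example counts are hypothetical and are not data from the study.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity and MCC for binary counts."""
    def ratio(num, den):
        return num / den if den else 0.0

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "mcc": ratio(tp * tn - fp * fn, denom),
    }


# Illustrative counts only.
metrics = classification_metrics(tp=90, fp=20, tn=80, fn=10)
print(metrics)
```

MCC ranges from -1 to 1 and, unlike accuracy, stays informative when the positive and negative classes are of very different sizes.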
The complete genome sequence of the Atlantic salmon paramyxovirus (ASPV)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nylund, Stian; Karlsen, Marius; Nylund, Are
2008-03-30
The complete RNA genome of the Atlantic salmon paramyxovirus (ASPV), isolated from Atlantic salmon suffering from proliferative gill inflammation (PGI), has been determined. The genome is 16,965 nucleotides in length and consists of six nonoverlapping genes in the order 3'- N - P/C/V - M - F - HN - L -5', coding for the nucleocapsid, phospho-, matrix, fusion, hemagglutinin-neuraminidase and large polymerase proteins, respectively. The gene junctions contain highly conserved transcription start and stop signal sequences and trinucleotide intergenic regions similar to those of other Paramyxoviridae. The ASPV P-gene expression strategy is like that of the respiro- and morbilliviruses, which express the phosphoprotein from the primary transcript, and edit a portion of the mRNA to encode the accessory proteins V and W. It also encodes the C-protein by ribosomal choice of translation initiation. Pairwise comparisons of amino acid identities, and phylogenetic analysis of deduced ASPV protein sequences with homologous sequences from other Paramyxoviridae, show that ASPV has an affinity for the genus Respirovirus, but may represent a new genus within the subfamily Paramyxovirinae.
Huang, Wei-Yi; Zhao, Guang-Hui; Wei, Shu-Jun; Song, Hui-Qun; Xu, Min-Jun; Lin, Rui-Qing; Zhou, Dong-Hui; Zhu, Xing-Quan
2012-01-01
Complete mitochondrial (mt) genomes and the gene rearrangements are increasingly used as molecular markers for investigating phylogenetic relationships. Contributing to the complete mt genomes of Gastropoda, especially Pulmonata, we determined the mt genome of the freshwater snail Galba pervia, which is an important intermediate host for Fasciola spp. in China. The complete mt genome of G. pervia is 13,768 bp in length. Its genome is circular, and consists of 37 genes, including 13 genes for proteins, 2 genes for rRNA and 22 genes for tRNA. The mt gene order of G. pervia showed a novel arrangement (tRNA-His, tRNA-Gly and tRNA-Tyr change positions and directions) when compared with mt genomes of Pulmonata species sequenced to date, indicating divergence among different species within the Pulmonata. The 13 protein-coding genes were deduced to encode a total of 3655 amino acids. The most frequently used amino acid is Leu (15.05%), followed by Phe (11.24%), Ser (10.76%) and Ile (8.346%). Phylogenetic analyses using the concatenated amino acid sequences of the 13 protein-coding genes, with three different computational algorithms (maximum parsimony, maximum likelihood and Bayesian analysis), all revealed that the families Lymnaeidae and Planorbidae are two closely related snail families, consistent with previous classifications based on morphological and molecular studies. The complete mt genome sequence of G. pervia showed a novel gene arrangement and it represents the first sequenced high quality mt genome of the family Lymnaeidae. These novel mtDNA data provide additional genetic markers for studying the epidemiology, population genetics and phylogeographics of freshwater snails, as well as for understanding interplay between the intermediate snail hosts and the intra-mollusca stages of Fasciola spp. PMID:22844544
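Amino acid usage figures such as those quoted above are simple percentage frequencies over the concatenated protein sequences. The following minimal sketch shows the calculation; the function name `aa_composition` and the toy input are hypothetical, and the real G. pervia sequences are not reproduced here.

```python
from collections import Counter
from typing import List, Tuple


def aa_composition(protein: str) -> List[Tuple[str, float]]:
    """Return (residue, percent) pairs sorted from most to least frequent.

    Non-letter characters such as the stop symbol '*' are ignored.
    """
    counts = Counter(aa for aa in protein.upper() if aa.isalpha())
    total = sum(counts.values())
    if total == 0:
        return []
    return [(aa, 100.0 * n / total) for aa, n in counts.most_common()]


# Toy example: two Leu, one Phe, one Ser, plus a stop symbol.
print(aa_composition("LLFS*"))
```

Applied to the concatenation of all 13 deduced proteins, the first entries of the returned list would correspond to the most frequently used residues reported in the abstract.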
MIPS: analysis and annotation of proteins from whole genomes
Mewes, H. W.; Amid, C.; Arnold, R.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Münsterkötter, M.; Pagel, P.; Strack, N.; Stümpflen, V.; Warfsmann, J.; Ruepp, A.
2004-01-01
The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein–protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de). PMID:14681354
CloVR-Comparative: automated, cloud-enabled comparative microbial genome sequence analysis pipeline.
Agrawal, Sonia; Arze, Cesar; Adkins, Ricky S; Crabtree, Jonathan; Riley, David; Vangala, Mahesh; Galens, Kevin; Fraser, Claire M; Tettelin, Hervé; White, Owen; Angiuoli, Samuel V; Mahurkar, Anup; Fricke, W Florian
2017-04-27
The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated, and cloud-enabled analysis pipeline for comparative microbial genomics. CloVR-Comparative runs on annotated complete or draft genome sequences that are uploaded by the user or selected via a taxonomic tree-based user interface and downloaded from NCBI. CloVR-Comparative runs reference-free multiple whole-genome alignments to determine unique, shared and core coding sequences (CDSs) and single nucleotide polymorphisms (SNPs). Output includes short summary reports and detailed text-based results files, graphical visualizations (phylogenetic trees, circular figures), and a database file linked to the Sybil comparative genome browser. Data upload and download, pipeline configuration and monitoring, and access to Sybil are managed through the CloVR-Comparative web interface. CloVR-Comparative and Sybil are distributed as part of the CloVR virtual appliance, which runs on local computers or the Amazon EC2 cloud. Representative datasets (e.g. 40 draft and complete Escherichia coli genomes) are processed in <36 h on a local desktop or at a cost of <$20 on EC2. CloVR-Comparative allows anybody with Internet access to run comparative genomics projects, while eliminating the need for on-site computational resources and expertise.
Verhoeven, Joost Theo Petra; Canuti, Marta; Munro, Hannah J; Dufour, Suzanne C; Lang, Andrew S
2018-04-19
High-throughput sequencing (HTS) technologies are becoming increasingly important within microbiology research, but aspects of library preparation, such as high cost per sample or strict input requirements, make HTS difficult to implement in some niche applications and for research groups on a budget. To address these needs, we developed ViDiT, a customizable, PCR-based, extremely low-cost (<5 US dollars per sample) and versatile library preparation method, and CACTUS, an analysis pipeline designed to rely on cloud computing power to generate high-quality data from ViDiT-based experiments without the need for expensive servers. We demonstrate here the versatility and utility of these methods within three fields of microbiology: virus discovery, amplicon-based viral genome sequencing and microbiome profiling. ViDiT-CACTUS allowed the identification of viral fragments from 25 different viral families from 36 oropharyngeal-cloacal swabs collected from wild birds, the sequencing of three almost complete genomes of avian influenza A viruses (>90% coverage), and the characterization and functional profiling of the complete microbial diversity (bacteria, archaea, viruses) within a deep-sea carnivorous sponge. ViDiT-CACTUS demonstrated its validity in a wide range of microbiology applications, and its simplicity and modularity make it easily implementable in any molecular biology laboratory, towards various research goals.
Caboche, Ségolène; Audebert, Christophe; Hot, David
2014-01-01
Recent progress in high-throughput sequencing (HTS) technologies enables easy and cost-reduced access to whole genome sequencing (WGS) or re-sequencing. HTS associated with adapted, automatic and fast bioinformatics solutions for sequencing applications promises an accurate and timely identification and characterization of pathogenic agents. Many studies have demonstrated that data obtained from HTS analysis have allowed genome-based diagnosis, which has been consistent with phenotypic observations. These proofs of concept are probably the first steps toward the future of clinical microbiology. From concept to routine use, many parameters need to be considered to promote HTS as a powerful tool to help physicians and clinicians in microbiological investigations. This review highlights the milestones to be completed toward this purpose. PMID:25437800
Genetic characterization of K13965, a strain of Oak Vale virus from Western Australia
Quan, Phenix-Lan; Williams, David T.; Johansen, Cheryl A.; Jain, Komal; Petrosov, Alexandra; Diviney, Sinead M.; Tashmukhamedova, Alla; Hutchison, Stephen K.; Tesh, Robert B.; Mackenzie, John S.; Briese, Thomas; Lipkin, W. Ian
2011-01-01
K13965, an uncharacterized virus, was isolated in 1993 from Anopheles annulipes mosquitoes collected in the Kimberley region of northern Western Australia. Here, we report its genomic sequence, identify it as a rhabdovirus, and characterize its phylogenetic relationships. The genome comprises a P′ (C) and SH protein similar to the recently characterized Tupaia and Durham viruses, and shows overlap between G and L genes. Comparison of K13965 genome sequence to other rhabdoviruses identified K13965 as a strain of the unclassified Australian Oak Vale rhabdovirus, whose complete genome sequence we also determined. Phylogenetic analysis of N and L sequences indicated genetic relationship to a recently proposed Sandjima virus clade, although the Oak Vale virus sequences form a branch separate from the African members of that group. PMID:21740935
Dombrovsky, Aviv; Glanz, Eyal; Lachman, Oded; Sela, Noa; Doron-Faigenboim, Adi; Antignus, Yehezkel
2013-01-01
We determined the complete sequence and organization of the genome of a putative member of the genus Polerovirus tentatively named Pepper yellow leaf curl virus (PYLCV). PYLCV has a wider host range than Tobacco vein-distorting virus (TVDV) and has a close serological relationship with Cucurbit aphid-borne yellows virus (CABYV) (both poleroviruses). The extracted viral RNA was subjected to SOLiD next-generation sequence analysis and used as a template for reverse transcription synthesis, which was followed by PCR amplification. The ssRNA genome of PYLCV includes 6,028 nucleotides encoding six open reading frames (ORFs), which is typical of the genus Polerovirus. Comparisons of the deduced amino acid sequences of PYLCV ORFs 2-4 and ORF5 indicate high levels of similarity to ORFs 2-4 of TVDV (84-93%) and to ORF5 of CABYV (87%), respectively. Both PYLCV and Pepper vein yellowing virus (PeVYV) contain sequences that point to a common ancestral polerovirus. The recombination breakpoint, located at CABYV ORF3, which encodes the viral coat protein (CP), may explain the CABYV-like sequences found in the genomes of the pepper-infecting viruses PYLCV and PeVYV. Two additional regions unique to PYLCV (PY1 and PY2) were identified between nucleotides 4,962 and 5,061 (ORF 5) and between positions 5,866 and 6,028 in the 3' NCR. Sequence analysis of the pepper-infecting PeVYV revealed three unique regions (Pe1-Pe3) with no similarity to other members of the genus Polerovirus. Genomic analyses of PYLCV and PeVYV suggest that the speciation of these viruses occurred through putative recombination event(s) between poleroviruses co-infecting a common host(s), resulting in the emergence of PYLCV, a novel pathogen with a wider host range. PMID:23936244
Statistical Features of the 2010 Beni-Ilmane, Algeria, Aftershock Sequence
NASA Astrophysics Data System (ADS)
Hamdache, M.; Peláez, J. A.; Gospodinov, D.; Henares, J.
2018-03-01
The aftershock sequence of the 2010 Beni-Ilmane (Mw 5.5) earthquake is studied in depth to analyze the spatial and temporal variability of seismicity parameters of the relationships modeling the sequence. The b value of the frequency-magnitude distribution is examined rigorously. A threshold magnitude of completeness equal to 2.1, using the maximum curvature procedure or the changing point algorithm, and a b value equal to 0.96 ± 0.03 have been obtained for the entire sequence. Two clusters have been identified and characterized by their faulting type, exhibiting b values equal to 0.99 ± 0.05 and 1.04 ± 0.05. Additionally, the temporal decay of the aftershock sequence was examined using a stochastic point process. The analysis was done through the restricted epidemic-type aftershock sequence (RETAS) stochastic model, which allows the possibility to recognize the prevailing clustering pattern of the relaxation process in the examined area. The analysis selected the epidemic-type aftershock sequence (ETAS) model to offer the most appropriate description of the temporal distribution, which presumes that all events in the sequence can cause secondary aftershocks. Finally, the fractal dimensions are estimated using the integral correlation. The obtained D2 values are 2.15 ± 0.01, 2.23 ± 0.01 and 2.17 ± 0.02 for the entire sequence, and for the first and second cluster, respectively. An analysis of the temporal evolution of the fractal dimensions D-2, D0, D2 and the spectral slope has been also performed to derive and characterize the different clusters included in the sequence.
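As an illustration of how a b value like the one quoted above is commonly estimated, the following is a minimal sketch of the Aki-Utsu maximum-likelihood estimator. The abstract does not state which estimator the authors used, so the estimator is an assumption here, and the function name `b_value_mle`, the `bin_width` parameter and the synthetic catalogue are hypothetical.

```python
import math
import random


def b_value_mle(magnitudes, mc, bin_width=0.1):
    """Aki-Utsu maximum-likelihood b value for events with M >= mc.

    bin_width is the magnitude rounding interval of the catalogue
    (use 0.0 for unrounded, continuous magnitudes).
    Returns (b, sigma, n), where sigma = b / sqrt(n) is the
    first-order uncertainty proposed by Aki (1965).
    """
    selected = [m for m in magnitudes if m >= mc]
    if len(selected) < 2:
        raise ValueError("need at least two events at or above mc")
    mean_mag = sum(selected) / len(selected)
    b = math.log10(math.e) / (mean_mag - (mc - bin_width / 2.0))
    sigma = b / math.sqrt(len(selected))
    return b, sigma, len(selected)


def synthetic_catalogue(n, mc, b_true, seed=0):
    """Hypothetical test data: Gutenberg-Richter magnitudes above mc."""
    rng = random.Random(seed)
    rate = b_true * math.log(10.0)  # exponential rate equivalent to b_true
    return [mc + rng.expovariate(rate) for _ in range(n)]


if __name__ == "__main__":
    mags = synthetic_catalogue(n=20000, mc=2.1, b_true=1.0)
    b, sigma, n = b_value_mle(mags, mc=2.1, bin_width=0.0)
    print(f"b = {b:.2f} +/- {sigma:.2f} from {n} events")
```

The completeness magnitude of 2.1 is taken from the abstract. Everything else is synthetic and serves only to show that the estimator recovers a known b value.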
The complete chloroplast genome of a medicinal plant Epimedium koreanum Nakai (Berberidaceae).
Lee, Jung-Hoon; Kim, Kyunghee; Kim, Na-Rae; Lee, Sang-Choon; Yang, Tae-Jin; Kim, Young-Dong
2016-11-01
Epimedium koreanum is a perennial medicinal plant distributed in Eastern Asia. The complete chloroplast genome sequence of E. koreanum was obtained by de novo assembly using whole genome next-generation sequences. The chloroplast genome of E. koreanum was 157 218 bp in length and separated into four distinct regions: a large single copy region (89 600 bp), a small single copy region (17 222 bp) and a pair of inverted repeat regions (25 198 bp each). The genome contained a total of 112 genes including 78 protein-coding genes, 30 tRNA genes, and 4 rRNA genes. Phylogenetic analysis with the reported chloroplast genomes revealed that E. koreanum is most closely related to Berberis bealei, a traditional medicinal plant in the Berberidaceae family.
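The region lengths reported above can be checked against the total genome length, since a quadripartite plastome consists of one large single copy region, one small single copy region and two copies of the inverted repeat. The short sketch below does that arithmetic; the function name `plastome_length` is hypothetical.

```python
def plastome_length(lsc, ssc, ir):
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


# Values from the E. koreanum abstract (bp).
total = plastome_length(lsc=89_600, ssc=17_222, ir=25_198)
print(total)  # 157218
```

The result matches the reported 157 218 bp, which confirms that the 25 198 bp figure refers to each inverted repeat copy rather than to the pair combined.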
Subramanian, Sankar; Lingala, Syamala Gowri; Swaminathan, Siva; Huynen, Leon; Lambert, David
2014-08-01
The complete mitochondrial genome of the Chinstrap penguin (Pygoscelis antarcticus) was sequenced and compared with other penguin mitogenomes. The genome is 15,972 bp in length with the number and order of protein coding genes and RNAs being very similar to that of other known penguin mitogenomes. Comparative nucleotide analysis showed the Chinstrap mitogenome shares 94% homology with the mitogenome of its sister species, Pygoscelis adelie (Adélie penguin). Divergence at nonsynonymous nucleotide positions was found to be up to 23 times less than that observed in synonymous positions of protein coding genes, suggesting high selection constraints. The complete mitogenome data will be useful for genetic and evolutionary studies of penguins.
Complete genome sequence of Anabaena variabilis ATCC 29413
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thiel, Teresa; Pratte, Brenda S.; Zhong, Jinshun
2013-01-01
Anabaena variabilis ATCC 29413 is a filamentous, heterocyst-forming cyanobacterium that has served as a model organism, with an extensive literature extending over 40 years. The strain has three distinct nitrogenases that function under different environmental conditions and is capable of photoautotrophic growth in the light and true heterotrophic growth in the dark using fructose as both carbon and energy source. While this strain was first isolated in 1964 in Mississippi and named Anabaena flos-aquae MSU A-37, it clusters phylogenetically with cyanobacteria of the genus Nostoc. The strain is a moderate thermophile, growing well at approximately 40 °C. Here we provide some additional characteristics of the strain, and an analysis of the complete genome sequence.
Økland, Arnfinn Lodden; Skoge, Renate Hvidsten; Nylund, Are
2018-06-01
We have determined the complete genome sequence of a new rhabdovirus, tentatively named Caligus rogercresseyi rhabdovirus Ch01 (CrRV-Ch01), which was found in the parasite Caligus rogercresseyi, present on farmed Atlantic salmon (Salmo salar) in Chile. The genome encodes the five canonical rhabdovirus proteins in addition to an unknown protein, in the order N-P-M-U (unknown)-G-L. Phylogenetic analysis showed that the virus clusters with two rhabdoviruses (Lepeophtheirus salmonis rhabdovirus No9 and Lepeophtheirus salmonis rhabdovirus No127) obtained from another parasitic caligid, Lepeophtheirus salmonis, present on farmed Atlantic salmon on the west coast of Norway.
Walker, Joseph F; Zanis, Michael J; Emery, Nancy C
2014-04-01
Complete chloroplast genome studies can help resolve relationships among large, complex plant lineages such as Asteraceae. We present the first whole plastome from the Madieae tribe and compare its sequence variation to other chloroplast genomes in Asteraceae. We used high throughput sequencing to obtain the Lasthenia burkei chloroplast genome. We compared sequence structure and rates of molecular evolution in the small single copy (SSC), large single copy (LSC), and inverted repeat (IR) regions to those for eight Asteraceae accessions and one Solanaceae accession. The chloroplast sequence of L. burkei is 150 746 bp and contains 81 unique protein coding genes and 4 coding ribosomal RNA sequences. We identified three major inversions in the L. burkei chloroplast, all of which have been found in other Asteraceae lineages, and a previously unreported inversion in Lactuca sativa. Regions flanking inversions contained tRNA sequences, but did not have particularly high G + C content. Substitution rates varied among the SSC, LSC, and IR regions, and rates of evolution within each region varied among species. Some observed differences in rates of molecular evolution may be explained by the relative proportion of coding to noncoding sequence within regions. Rates of molecular evolution vary substantially within and among chloroplast genomes, and major inversion events may be promoted by the presence of tRNAs. Collectively, these results provide insight into different mechanisms that may promote intramolecular recombination and the inversion of large genomic regions in the plastome.
Kibenge, Molly J T; Iwamoto, Tokinori; Wang, Yingwei; Morton, Alexandra; Godoy, Marcos G; Kibenge, Frederick S B
2013-07-11
Piscine reovirus (PRV) is a newly discovered fish reovirus of anadromous and marine fish ubiquitous among fish in Norwegian salmon farms, and likely the causative agent of heart and skeletal muscle inflammation (HSMI). HSMI is an increasingly economically significant disease in Atlantic salmon (Salmo salar) farms. The nucleotide sequence data available for PRV are limited, and there is no genetic information on this virus outside of Norway and none from wild fish. RT-PCR amplification and sequencing were used to obtain the complete viral genome of PRV (10 segments) from western Canada and Chile. The genetic diversity among the PRV strains and their relationship to Norwegian PRV isolates were determined by phylogenetic analyses and sequence identity comparisons. PRV is distantly related to members of the genera Orthoreovirus and Aquareovirus and an unambiguous new genus within the family Reoviridae. The Canadian and Norwegian PRV strains are most divergent in the segment S1 and S4 encoded proteins. Phylogenetic analysis of PRV S1 sequences, for which the largest number of complete sequences from different "isolates" is available, grouped Norwegian PRV strains into a single genotype, Genotype I, with sub-genotypes, Ia and Ib. The Canadian PRV strains matched sub-genotype Ia and Chilean PRV strains matched sub-genotype Ib. PRV should be considered as a member of a new genus within the family Reoviridae with two major Norwegian sub-genotypes. The Canadian PRV diverged from Norwegian sub-genotype Ia around 2007 ± 1, whereas the Chilean PRV diverged from Norwegian sub-genotype Ib around 2008 ± 1.
2013-01-01
Background: Piscine reovirus (PRV) is a newly discovered fish reovirus of anadromous and marine fish ubiquitous among fish in Norwegian salmon farms, and likely the causative agent of heart and skeletal muscle inflammation (HSMI). HSMI is an increasingly economically significant disease in Atlantic salmon (Salmo salar) farms. The nucleotide sequence data available for PRV are limited, and there is no genetic information on this virus outside of Norway and none from wild fish. Methods: RT-PCR amplification and sequencing were used to obtain the complete viral genome of PRV (10 segments) from western Canada and Chile. The genetic diversity among the PRV strains and their relationship to Norwegian PRV isolates were determined by phylogenetic analyses and sequence identity comparisons. Results: PRV is distantly related to members of the genera Orthoreovirus and Aquareovirus and an unambiguous new genus within the family Reoviridae. The Canadian and Norwegian PRV strains are most divergent in the segment S1 and S4 encoded proteins. Phylogenetic analysis of PRV S1 sequences, for which the largest number of complete sequences from different “isolates” is available, grouped Norwegian PRV strains into a single genotype, Genotype I, with sub-genotypes, Ia and Ib. The Canadian PRV strains matched sub-genotype Ia and Chilean PRV strains matched sub-genotype Ib. Conclusions: PRV should be considered as a member of a new genus within the family Reoviridae with two major Norwegian sub-genotypes. The Canadian PRV diverged from Norwegian sub-genotype Ia around 2007 ± 1, whereas the Chilean PRV diverged from Norwegian sub-genotype Ib around 2008 ± 1. PMID:23844948
Complete genome sequence of ‘Candidatus Liberibacter africanus’
USDA-ARS's Scientific Manuscript database
The complete genome sequence of ‘Candidatus Liberibacter africanus’ (Laf), strain ptsapsy, was obtained by an Illumina HiSeq 2000. The Laf genome comprises 1,192,232 nucleotides, 34.5% GC content, 1,141 predicted coding sequences, 44 tRNAs, 3 complete copies of ribosomal RNA genes (16S, 23S and 5S) ...
Complete genome sequence of Salmonella enterica subsp. enterica serovar Thompson strain RM6836
USDA-ARS's Scientific Manuscript database
Salmonella enterica subsp. enterica serovar Thompson (S. Thompson) strain RM6836 was isolated from lettuce in 2002. We report the complete sequence and annotation of the genome of S. Thompson strain RM6836. This is the first reported complete genome sequence for S. Thompson and will provide a point ...
Complete genome sequence of the clinical Campylobacter coli isolate 15-537360
USDA-ARS's Scientific Manuscript database
Campylobacter coli strain 15-537360 was originally isolated from a 42 year-old patient with gastroenteritis. Here we report its complete genome sequence, which comprises a 1.7 Mbp chromosome and a 29 kbp conjugative cryptic plasmid. This is the first complete genome sequence of a clinical isolate of...
The first genome sequences of human bocaviruses from Vietnam
Thanh, Tran Tan; Van, Hoang Minh Tu; Hong, Nguyen Thi Thu; Nhu, Le Nguyen Truc; Anh, Nguyen To; Tuan, Ha Manh; Hien, Ho Van; Tuong, Nguyen Manh; Kien, Trinh Trung; Khanh, Truong Huu; Nhan, Le Nguyen Thanh; Hung, Nguyen Thanh; Chau, Nguyen Van Vinh; Thwaites, Guy; van Doorn, H. Rogier; Tan, Le Van
2017-01-01
As part of an ongoing effort to generate complete genome sequences of hand, foot and mouth disease-causing enteroviruses directly from clinical specimens, two complete coding sequences and two partial genomic sequences of human bocavirus 1 (n=3) and 2 (n=1) were co-amplified and sequenced, representing the first genome sequences of human bocaviruses from Vietnam. The sequences may aid future study aiming at understanding the evolution of the virus. PMID:28090592
NASA Astrophysics Data System (ADS)
Kang, Yi-Hao; Chen, Ye-Hong; Shi, Zhi-Cheng; Huang, Bi-Hua; Song, Jie; Xia, Yan
2017-08-01
We propose a protocol for complete Bell-state analysis for two superconducting-quantum-interference-device qubits. The Bell-state analysis could be completed by using a sequence of microwave pulses designed by the transitionless tracking algorithm, which is a useful method in the technique of shortcut to adiabaticity. After the whole process, the information for distinguishing four Bell states will be encoded on two auxiliary qubits, while the Bell states remain unchanged. One can read out the information by detecting the auxiliary qubits. Thus the Bell-state analysis is nondestructive. The numerical simulations show that the protocol possesses a high success probability of distinguishing each Bell state with current experimental technology even when decoherence is taken into account. Thus, the protocol may have potential applications for the information readout in quantum communications and quantum computations in superconducting quantum networks.
Vina-Rodriguez, Ariel; Schlosser, Josephine; Becher, Dietmar; Kaden, Volker; Groschup, Martin H; Eiden, Martin
2015-05-22
An increasing number of indigenous cases of hepatitis E caused by genotype 3 viruses (HEV-3) have been diagnosed all around the world, particularly in industrialized countries. Hepatitis E is a zoonotic disease and accumulating evidence indicates that domestic pigs and wild boars are the main reservoirs of HEV-3. A detailed analysis of HEV-3 subtypes could help to determine the interplay of human activity, the role of animals as reservoirs and cross species transmission. Although complete genome sequences are most appropriate for HEV subtype determination, in most cases only partial genomic sequences are available. We therefore carried out a subtype classification analysis, which uses regions from all three open reading frames of the genome. Using this approach, more than 1000 published HEV-3 isolates were subtyped. Newly recovered HEV partial sequences from hunted German wild boars were also included in this study. These sequences were assigned to genotype 3 and clustered within subtypes 3a, 3i and, unexpectedly, one of them within subtype 3b, the first non-human report of this subtype in Europe.
da Fonseca, Néli José; Lima Afonso, Marcelo Querino; Pedersolli, Natan Gonçalves; de Oliveira, Lucas Carrijo; Andrade, Dhiego Souto; Bleicher, Lucas
2017-10-28
Flaviviruses are responsible for serious diseases such as dengue, yellow fever, and zika fever. Their genomes encode a polyprotein which, after cleavage, results in three structural and seven non-structural proteins. Homologous proteins can be studied by conservation and coevolution analysis as detected in multiple sequence alignments, usually reporting positions which are strictly necessary for the structure and/or function of all members in a protein family or which are involved in a specific sub-class feature requiring the coevolution of residue sets. This study provides a complete conservation and coevolution analysis on all flavivirus non-structural proteins, with results mapped on all well-annotated available sequences. A literature review on the residues found in the analysis enabled us to compile available information on their roles and distribution among different flaviviruses. Also, we provide the mapping of conserved and coevolved residues for all sequences currently in SwissProt as supplementary material, so that particularities in different viruses can be easily analyzed. Copyright © 2017 Elsevier Inc. All rights reserved.
Nguyen, Thong T; Suryamohan, Kushal; Kuriakose, Boney; Janakiraman, Vasantharajan; Reichelt, Mike; Chaudhuri, Subhra; Guillory, Joseph; Divakaran, Neethu; Rabins, P E; Goel, Ridhi; Deka, Bhabesh; Sarkar, Suman; Ekka, Preety; Tsai, Yu-Chih; Vargas, Derek; Santhosh, Sam; Mohan, Sangeetha; Chin, Chen-Shan; Korlach, Jonas; Thomas, George; Babu, Azariah; Seshagiri, Somasekar
2018-06-12
We sequenced the Hyposidra talaca NPV (HytaNPV) double stranded circular DNA genome using PacBio single molecule sequencing technology. We found that the HytaNPV genome is 139,089 bp long with a GC content of 39.6%. It encodes 141 open reading frames (ORFs) including the 37 baculovirus core genes, 25 genes conserved among lepidopteran baculoviruses, 72 genes known in baculovirus, and 7 genes unique to the HytaNPV genome. It is a group II alphabaculovirus that codes for the F protein and lacks the gp64 gene found in group I alphabaculoviruses. Using RNA-seq, we confirmed the expression of the ORFs identified in the HytaNPV genome. Phylogenetic analysis showed HytaNPV to be closest to BusuNPV, SujuNPV and EcobNPV that infect other tea pests, Buzura suppressaria, Sucra jujuba, and Ectropis obliqua, respectively. We identified repeat elements and a conserved non-coding baculovirus element in the genome. Analysis of the putative promoter sequences identified motifs consistent with the temporal expression of the genes observed in the RNA-seq data.
USDA-ARS's Scientific Manuscript database
The complete nucleotide sequence of a recently discovered Florida (FL) isolate of Hibiscus infecting Cilevirus (HiCV) was determined by Sanger sequencing. The movement- and coat- protein gene sequences of the HiCV-FL isolate are more divergent than other genes of the previously sequenced HiCV-HA (Ha...
USDA-ARS's Scientific Manuscript database
The complete genome sequence of a Southern tomato virus (STV) isolate on tomato plants in a seed production field in Bangladesh was obtained for the first time using next generation sequencing. The identified isolate STV_BD-13 shares a high degree of sequence identity (99%) with several known STV isol...
USDA-ARS's Scientific Manuscript database
The complete genome sequence of a double-stranded RNA (dsRNA) virus, southern tomato virus (STV), on tomatoes in China was elucidated using small RNA deep sequencing. The identified STV_CN12 shares 99% sequence identity with other isolates from Mexico, France, Spain, and the U.S. This is the first report ...
Massively parallel sequencing-enabled mixture analysis of mitochondrial DNA samples.
Churchill, Jennifer D; Stoljarova, Monika; King, Jonathan L; Budowle, Bruce
2018-02-22
The mitochondrial genome has a number of characteristics that provide useful information to forensic investigations. Massively parallel sequencing (MPS) technologies offer improvements to the quantitative analysis of the mitochondrial genome, specifically the interpretation of mixed mitochondrial samples. Two-person mixtures with nuclear DNA ratios of 1:1, 5:1, 10:1, and 20:1 of individuals from different and similar phylogenetic backgrounds and three-person mixtures with nuclear DNA ratios of 1:1:1 and 5:1:1 were prepared using the Precision ID mtDNA Whole Genome Panel and Ion Chef, and sequenced on the Ion PGM or Ion S5 sequencer (Thermo Fisher Scientific, Waltham, MA, USA). These data were used to evaluate whether and to what degree MPS mixtures could be deconvolved. Analysis was effective in identifying the major contributor in each instance, while SNPs from the minor contributor's haplotype were identified only in the 1:1, 5:1, and 10:1 two-person mixtures. While the major contributor was identified from the 5:1:1 mixture, analysis of the three-person mixtures was more complex, and the mixed haplotypes could not be completely parsed. These results indicate that mixed mitochondrial DNA samples may be interpreted with the use of MPS technologies.
Microsatellite analysis in the genome of Acanthaceae: An in silico approach
Kaliswamy, Priyadharsini; Vellingiri, Srividhya; Nathan, Bharathi; Selvaraj, Saravanakumar
2015-01-01
Background: Acanthaceae is one of the advanced and specialized families with conventionally used medicinal plants. Simple sequence repeats (SSRs) play a major role as molecular markers for genome analysis and plant breeding. The microsatellites existing in the complete genome sequences would help to attain a direct role in the genome organization, recombination, gene regulation, quantitative genetic variation, and evolution of genes. Objective: The current study reports the frequency of microsatellites and appropriate markers for the Acanthaceae family genome sequences. Materials and Methods: The whole nucleotide sequences of Acanthaceae species were obtained from the National Center for Biotechnology Information database and screened for the presence of SSRs. The SSR Locator tool was used to predict the microsatellites, and its built-in Primer3 module was used for primer design. Results: In total, 110 repeats from 108 sequences of Acanthaceae family plant genomes were identified, and dinucleotide repeats were found to be abundant in the genome sequences. The essential amino acid isoleucine was found to be rich in all the sequences. We also designed SSR-based primers/markers for 59 sequences of this family that contain microsatellite repeats in their genome. Conclusion: The identified microsatellites and primers might be useful for breeding and genetic studies of plants that belong to the Acanthaceae family in the future. PMID:25709226
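The SSR screening step described in the abstract above can be illustrated with a minimal sketch. It does not reproduce SSR Locator; the function names and the minimum repeat counts per motif length are hypothetical choices made only for illustration, and only perfect (uninterrupted) repeats are reported.

```python
import re


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. ATAT = AT x 2)."""
    n = len(motif)
    for k in range(1, n):
        if n % k == 0 and motif[:k] * (n // k) == motif:
            return False
    return True


def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, copies) tuples for perfect microsatellites in seq.

    min_repeats maps motif length to the minimum number of tandem copies.
    The default thresholds are assumptions, not values taken from the study.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 4, 4: 3}
    seq = seq.upper()
    hits = []
    for motif_len, min_copies in sorted(min_repeats.items()):
        # One captured motif followed by at least (min_copies - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # e.g. skip ATAT, which is already counted as AT
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return hits


if __name__ == "__main__":
    example = "GG" + "AT" * 7 + "CCG" + "GCA" * 4 + "TT"
    for start, motif, copies in find_ssrs(example):
        print(start, motif, copies)
```

Counting the motif lengths of the returned hits gives the kind of dinucleotide versus trinucleotide frequency summary the abstract reports.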
Junttila, N; Lévêque, N; Magnius, L O; Kabue, J P; Muyembe-Tamfum, J J; Maslin, J; Lina, B; Norder, H
2015-03-01
Complete coding regions were sequenced for two new enterovirus genomes: EV-B93 previously identified by VP1 sequencing, derived from a child with acute flaccid paralysis in the Democratic Republic of Congo; and EV-C95 from a French soldier with acute gastroenteritis in Djibouti. The EV-B93 P1 had more than 30% nucleotide divergence from other EV-B types, with highest similarity to E-15 and EV-B80. The P1 nucleotide sequence of EV-C95 was most similar, 71%, to CV-A21. Complete coding regions for the new enteroviruses were compared with those of 135 EV-B and 176 EV-C strains representing all types available in GenBank. When strains from the same outbreak or strains isolated during the same year in the same geographical region were excluded, 27 of the 58 EV-B, and 16 of the 23 EV-C types were represented by more than one sequence. However, for EV-B the P3 sequences formed three clades mainly according to origin or time of isolation, irrespective of type, while for EV-C the P3 sequences segregated mainly according to disease manifestation, with most strains causing paralysis, including polioviruses, forming one clade, and strains causing respiratory illness forming another. There was no intermixing of types between these two clades, apart from two EV-C96 strains. The EV-B P3 sequences had lower inter-clade and higher intra-clade variability as compared to the EV-C sequences, which may explain why inter-clade recombinations are more frequent in EV-B. Further analysis of more isolates may shed light on the role of recombinations in the evolution of EV-B in geographical context. © 2014 Wiley Periodicals, Inc.
2012-01-01
Background Although it has proven to be an important foundation for investigations of carnivoran ecology, biology and evolution, the complete species-level supertree for Carnivora of Bininda-Emonds et al. is showing its age. Additional, largely molecular sequence data are now available for many species and the advancement of computer technology means that many of the limitations of the original analysis can now be avoided. We therefore sought to provide an updated estimate of the phylogenetic relationships within all extant Carnivora, again using supertree analysis to be able to analyze as much of the global phylogenetic database for the group as possible. Results In total, 188 source trees were combined, representing 114 trees from the literature together with 74 newly constructed gene trees derived from nearly 45,000 bp of sequence data from GenBank. The greater availability of sequence data means that the new supertree is almost completely resolved and also better reflects current phylogenetic opinion (for example, supporting a monophyletic Mephitidae, Eupleridae and Prionodontidae; placing Nandinia binotata as sister to the remaining Feliformia). Following an initial rapid radiation, diversification rate analyses indicate a downturn in the net speciation rate within the past three million years as well as a possible increase some 18.0 million years ago; numerous diversification rate shifts within the order were also identified. Conclusions Together, the two carnivore supertrees remain the only complete phylogenetic estimates for all extant species and the new supertree, like the old one, will form a key tool in helping us to further understand the biology of this charismatic group of carnivores. PMID:22369503
Liu, Bin; Wu, Hao; Zhang, Deyuan; Wang, Xiaolong; Chou, Kuo-Chen
2017-02-21
To expedite genome/proteome analysis, we have developed a Python package called Pse-Analysis. The package can automatically complete the following five procedures: (1) sample feature extraction, (2) optimal parameter selection, (3) model training, (4) cross validation, and (5) evaluating prediction quality. All a user needs to do is input a benchmark dataset along with the query biological sequences concerned. Based on the benchmark dataset, Pse-Analysis will automatically construct an ideal predictor and then yield the predicted results for the submitted query samples. All the aforementioned tedious jobs can be done automatically by the computer. Moreover, the multiprocessing technique was adopted to enhance computational speed about 6-fold. The Pse-Analysis Python package is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/Pse-Analysis/, and can be directly run on Windows, Linux, and Unix.
The Complete Sequence of a Human Parainfluenzavirus 4 Genome
Yea, Carmen; Cheung, Rose; Collins, Carol; Adachi, Dena; Nishikawa, John; Tellier, Raymond
2009-01-01
Although the human parainfluenza virus 4 (HPIV4) has been known for a long time, its genome, alone among the human paramyxoviruses, has not been completely sequenced to date. In this study we obtained the first complete genomic sequence of HPIV4 from a clinical isolate named SKPIV4 obtained at the Hospital for Sick Children in Toronto (Ontario, Canada). The coding regions for the N, P/V, M, F and HN proteins show very high identities (95% to 97%) with previously available partial sequences for HPIV4B. The sequence for the L protein and the non-coding regions represent new information. A surprising feature of the genome is its length, more than 17 kb, making it the longest genome within the genus Rubulavirus, although the length is well within the known range of 15 kb to 19 kb for the subfamily Paramyxovirinae. The availability of a complete genomic sequence will facilitate investigations on a respiratory virus that is still not completely characterized. PMID:21994536
NASA Astrophysics Data System (ADS)
Gao, Fengtao; Wei, Min; Zhu, Ying; Guo, Hua; Chen, Songlin; Yang, Guanpin
2017-06-01
This study presents the complete mitochondrial genome of the hybrid Epinephelus moara♀× Epinephelus lanceolatus♂. The genome is 16886 bp in length, and contains 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes, a light-strand replication origin and a control region. Additionally, phylogenetic analysis based on the nucleotide sequences of 13 conserved protein-coding genes using the maximum likelihood method indicated that the mitochondrial genome is maternally inherited. This study presents genomic data for studying phylogenetic relationships and breeding of hybrid Epinephelinae.
Guo, Zhong-Long; Wang, Juan; Shen, Yu-Ying
2015-01-01
Insect mitochondrial genomes (mitogenomes) are the most extensively used source of genetic information for molecular evolution, phylogenetics and population genetics. Pentatomomorpha (>14,000 species) is the second largest infraorder of Heteroptera and of great economic importance. To better understand the diversity and phylogeny within Pentatomomorpha, we sequenced and annotated the complete mitogenome of Corizus tetraspilus (Hemiptera: Rhopalidae), an important pest of alfalfa in China. We analyzed the main features of the C. tetraspilus mitogenome, and provided a comparative analysis with four other Coreoidea species. Our results reveal that gene content, gene arrangement, nucleotide composition, codon usage, rRNA structures and sequences of mitochondrial transcription termination factor are conserved in Coreoidea. Comparative analysis shows that different protein-coding genes have been subject to different evolutionary rates correlated with the G+C content. All the transfer RNA genes found in Coreoidea have the typical clover leaf secondary structure, except for trnS1 (AGN), which lacks the dihydrouridine (DHU) arm and possesses an unusual anticodon stem (9 bp vs. the normal 5 bp). The control regions (CRs) among Coreoidea are highly variable in size, of which the CR of C. tetraspilus is the smallest (440 bp), making the C. tetraspilus mitogenome the smallest (14,989 bp) within all completely sequenced Coreoidea mitogenomes. No conserved motifs are found in the CRs of Coreoidea. In addition, the A+T content (60.68%) of the CR of C. tetraspilus is much lower than that of the entire mitogenome (74.88%), and is lowest among Coreoidea. Phylogenetic analyses based on mitogenomic data support the monophyly of each superfamily within Pentatomomorpha, and recognize a phylogenetic relationship of (Aradoidea + (Pentatomoidea + (Lygaeoidea + (Pyrrhocoroidea + Coreoidea)))). PMID:26042898
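The A+T content comparison reported above (control region versus whole mitogenome) is a simple base-composition calculation. The following is a minimal sketch; the function name is hypothetical, and ignoring ambiguous bases such as N is an assumption of this sketch rather than a detail taken from the study.

```python
def at_content(seq):
    """Return the A+T percentage of a nucleotide sequence.

    Only unambiguous bases (A, C, G, T) are counted; anything else,
    such as N, is ignored. That choice is an assumption of this sketch.
    """
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    at_bases = sum(1 for base in counted if base in "AT")
    return 100.0 * at_bases / len(counted)


if __name__ == "__main__":
    # Toy sequences only; these are not data from the mitogenome.
    print(round(at_content("AATTGC"), 2))
    print(round(at_content("ATNNGC"), 2))
```

Applying the same function to a control-region slice and to the full sequence would give the two percentages being compared.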
Complete Chloroplast Genome Sequences of Four Meliaceae Species and Comparative Analyses
Mader, Malte; Pakull, Birte; Blanc-Jolivet, Céline; Paulini-Drewes, Maike; Bouda, Zoéwindé Henri-Noël; Degen, Bernd; Small, Ian
2018-01-01
The Meliaceae family mainly consists of trees and shrubs with a pantropical distribution. In this study, the complete chloroplast genomes of four Meliaceae species were sequenced and compared with each other and with the previously published Azadirachta indica plastome. The five plastomes are circular and exhibit a quadripartite structure with high conservation of gene content and order. They include 130 genes encoding 85 proteins, 37 tRNAs and 8 rRNAs. Inverted repeat expansion resulted in a duplication of rps19 in the five Meliaceae species, which is consistent with that in many other Sapindales, but different from many other rosids. Compared to Azadirachta indica, the four newly sequenced Meliaceae individuals share several large deletions, which mainly contribute to the decreased genome sizes. A whole-plastome phylogeny supports previous findings that the four species form a monophyletic sister clade to Azadirachta indica within the Meliaceae. SNPs and indels identified in all complete Meliaceae plastomes might be suitable targets for the future development of genetic markers at different taxonomic levels. The extended analysis of SNPs in the matK gene led to the identification of four potential Meliaceae-specific SNPs as a basis for future validation and marker development. PMID:29494509
Chen, Nian; Lai, Xiao-Ping
2010-07-01
We obtained the complete mitochondrial genome of the King Cobra (GenBank accession number: EU_921899) by Ex Taq-PCR, TA-cloning and primer-walking methods. The genome, which is very similar to those of other vertebrates, is 17 267 bp in length and encodes 38 genes (including 13 protein-coding, 2 ribosomal RNA and 23 transfer RNA genes) and two long non-coding regions. The duplication of the tRNA-Ile gene forms a new mitochondrial gene rearrangement model. Eight tRNA genes and one protein-coding gene were transcribed from the L strand, and the other genes were transcribed from the H strand. Genes on the H strand show fairly similar adenosine and thymine contents, whereas those on the L strand have a higher proportion of A than T. Combined rDNA sequence data (12S+16S rRNA) were used to reconstruct the phylogeny of 21 snake species for which complete mitochondrial genome sequences were available in the public databases. This large data set and an appropriate range of outgroup taxa demonstrated that Elapidae is more closely related to Colubridae than to Viperidae, which supports the traditional viewpoint.
Comparative Analysis of the First Complete Enterococcus faecium Genome
Lam, Margaret M. C.; Seemann, Torsten; Bulach, Dieter M.; Gladman, Simon L.; Chen, Honglei; Haring, Volker; Moore, Robert J.; Ballard, Susan; Grayson, M. Lindsay; Johnson, Paul D. R.; Howden, Benjamin P.
2012-01-01
Vancomycin-resistant enterococci (VRE) are one of the leading causes of nosocomial infections in health care facilities around the globe. In particular, infections caused by vancomycin-resistant Enterococcus faecium are becoming increasingly common. Comparative and functional genomic studies of E. faecium isolates have so far been limited owing to the lack of a fully assembled E. faecium genome sequence. Here we address this issue and report the complete 3.0-Mb genome sequence of the multilocus sequence type 17 vancomycin-resistant Enterococcus faecium strain Aus0004, isolated from the bloodstream of a patient in Melbourne, Australia, in 1998. The genome comprises a 2.9-Mb circular chromosome and three circular plasmids. The chromosome harbors putative E. faecium virulence factors such as enterococcal surface protein, hemolysin, and collagen-binding adhesin. Aus0004 has a very large accessory genome (38%) that includes three prophage and two genomic islands absent among 22 other E. faecium genomes. One of the prophage was present as inverted 50-kb repeats that appear to have facilitated a 683-kb chromosomal inversion across the replication terminus, resulting in a striking replichore imbalance. Other distinctive features include 76 insertion sequence elements and a single chromosomal copy of Tn1549 containing the vanB vancomycin resistance element. A complete E. faecium genome will be a useful resource to assist our understanding of this emerging nosocomial pathogen. PMID:22366422
Hayashi, T; Makino, K; Ohnishi, M; Kurokawa, K; Ishii, K; Yokoyama, K; Han, C G; Ohtsubo, E; Nakayama, K; Murata, T; Tanaka, M; Tobe, T; Iida, T; Takami, H; Honda, T; Sasakawa, C; Ogasawara, N; Yasunaga, T; Kuhara, S; Shiba, T; Hattori, M; Shinagawa, H
2001-02-28
Escherichia coli O157:H7 is a major food-borne infectious pathogen that causes diarrhea, hemorrhagic colitis, and hemolytic uremic syndrome. Here we report the complete chromosome sequence of an O157:H7 strain isolated from the Sakai outbreak, and the results of genomic comparison with a benign laboratory strain, K-12 MG1655. The chromosome is 5.5 Mb in size, 859 Kb larger than that of K-12. We identified a 4.1-Mb sequence highly conserved between the two strains, which may represent the fundamental backbone of the E. coli chromosome. The remaining 1.4-Mb sequence comprises O157:H7-specific sequences, most of which are horizontally transferred foreign DNAs. The predominant role of bacteriophages in the emergence of O157:H7 is evident from the presence of 24 prophages and prophage-like elements that occupy more than half of the O157:H7-specific sequences. The O157:H7 chromosome encodes 1632 proteins and 20 tRNAs that are not present in K-12. Among these, at least 131 proteins are assumed to have virulence-related functions. Genome-wide codon usage analysis suggested that the O157:H7-specific tRNAs are involved in the efficient expression of the strain-specific genes. The complete set of genes specific to O157:H7 presented here provides new insight into the pathogenicity and the physiology of O157:H7, and will open a way to fully understand the molecular mechanisms underlying O157:H7 infection.
Ding, Yanqiang; Fang, Yang; Guo, Ling; Li, Zhidan; He, Kaize; Zhao, Yun; Zhao, Hai
2017-01-01
Phylogenetic relationships among the genera of Lemnoideae, a group of small aquatic monocotyledonous plants, have not been well resolved using either morphological characters or traditional markers. Given that the rich genetic information in chloroplast genomes makes them particularly useful for phylogenetic studies, we used chloroplast genomes to clarify the phylogeny within Lemnoideae. DNA was sequenced with next-generation sequencing. The duckweed chloroplast genomes were indirectly filtered from the total DNA data, or directly obtained from chloroplast DNA data. To test the reliability of assembling the chloroplast genome based on the filtration of the total DNA, two methods were used to assemble the chloroplast genome of Landoltia punctata strain ZH0202. A phylogenetic tree was built on the basis of the whole chloroplast genome sequences using MrBayes v.3.2.6 and PhyML 3.0. Eight complete duckweed chloroplast genomes were assembled, with lengths ranging from 165,775 bp to 171,152 bp, and each contains 80 protein-coding sequences, four rRNAs, 30 tRNAs and two pseudogenes. The identity of the L. punctata strain ZH0202 chloroplast genomes assembled through the two methods was 100%, and their sequences and lengths were completely identical. The chloroplast genome comparison demonstrated that the differences in chloroplast genome sizes among the Lemnoideae primarily resulted from variation in non-coding regions, especially from repeat sequence variation. The phylogenetic analysis demonstrated that the different genera of Lemnoideae are derived from each other in the following order: Spirodela, Landoltia, Lemna, Wolffiella, and Wolffia. This study demonstrates the potential of whole chloroplast genome DNA as an effective option for phylogenetic studies of Lemnoideae. It also showed the possibility of using chloroplast DNA data to elucidate phylogenies that have not yet been well resolved by traditional methods, even in plants other than duckweeds.
Phylogenic study of Lemnoideae (duckweeds) through complete chloroplast genomes for eight accessions
Ding, Yanqiang; Fang, Yang; Guo, Ling; Li, Zhidan; He, Kaize
2017-01-01
Background Phylogenetic relationships among the genera of Lemnoideae, a group of small aquatic monocotyledonous plants, have not been well resolved using either morphological characters or traditional markers. Given that the rich genetic information in chloroplast genomes makes them particularly useful for phylogenetic studies, we used chloroplast genomes to clarify the phylogeny within Lemnoideae. Methods DNA was sequenced with next-generation sequencing. The duckweed chloroplast genomes were indirectly filtered from the total DNA data, or directly obtained from chloroplast DNA data. To test the reliability of assembling the chloroplast genome based on the filtration of the total DNA, two methods were used to assemble the chloroplast genome of Landoltia punctata strain ZH0202. A phylogenetic tree was built on the basis of the whole chloroplast genome sequences using MrBayes v.3.2.6 and PhyML 3.0. Results Eight complete duckweed chloroplast genomes were assembled, with lengths ranging from 165,775 bp to 171,152 bp, and each contains 80 protein-coding sequences, four rRNAs, 30 tRNAs and two pseudogenes. The identity of the L. punctata strain ZH0202 chloroplast genomes assembled through the two methods was 100%, and their sequences and lengths were completely identical. The chloroplast genome comparison demonstrated that the differences in chloroplast genome sizes among the Lemnoideae primarily resulted from variation in non-coding regions, especially from repeat sequence variation. The phylogenetic analysis demonstrated that the different genera of Lemnoideae are derived from each other in the following order: Spirodela, Landoltia, Lemna, Wolffiella, and Wolffia. Discussion This study demonstrates the potential of whole chloroplast genome DNA as an effective option for phylogenetic studies of Lemnoideae. It also showed the possibility of using chloroplast DNA data to elucidate phylogenies that have not yet been well resolved by traditional methods, even in plants other than duckweeds. PMID:29302399
Yoshida, Tetsuya; Kitazawa, Yugo; Komatsu, Ken; Neriya, Yutaro; Ishikawa, Kazuya; Fujita, Naoko; Hashimoto, Masayoshi; Maejima, Kensaku; Yamaji, Yasuyuki; Namba, Shigetou
2014-11-01
In this study, we detected a Japanese isolate of hibiscus latent Fort Pierce virus (HLFPV-J), a member of the genus Tobamovirus, in a hibiscus plant in Japan and determined the complete sequence and organization of its genome. HLFPV-J has four open reading frames (ORFs), each of which shares more than 98 % nucleotide sequence identity with those of other HLFPV isolates. Moreover, HLFPV-J contains a unique internal poly(A) region of variable length, ranging from 44 to 78 nucleotides, in its 3'-untranslated region (UTR), as is the case with hibiscus latent Singapore virus (HLSV), another hibiscus-infecting tobamovirus. The length of the HLFPV-J genome was 6431 nucleotides, including the shortest internal poly(A) region. The sequence identities of ORFs 1, 2, 3 and 4 of HLFPV-J to other tobamoviruses were 46.6-68.7, 49.9-70.8, 31.0-70.8 and 39.4-70.1 %, respectively, at the nucleotide level and 39.8-75.0, 43.6-77.8, 19.2-70.4 and 31.2-74.2 %, respectively, at the amino acid level. The 5'- and 3'-UTRs of HLFPV-J showed 24.3-58.6 and 13.0-79.8 % identity, respectively, to other tobamoviruses. In particular, when compared to other tobamoviruses, each ORF and UTR of HLFPV-J showed the highest sequence identity to those of HLSV. Phylogenetic analysis showed that HLFPV-J, other HLFPV isolates and HLSV constitute a malvaceous-plant-infecting tobamovirus cluster. These results indicate that the genomic structure of HLFPV-J has unique features similar to those of HLSV. To our knowledge, this is the first report of the complete genome sequence of HLFPV.
The complete CDS of the prion protein (PRNP) gene of African lion (Panthera leo).
Maj, Andrzej; Spellman, Garth M; Sarver, Shane K
2008-04-01
We provide the complete PRNP CDS sequence for the African lion, which is different from the previously published sequence and more similar to other carnivore sequences. The newly obtained prion protein sequence differs from the domestic cat sequence at three amino acid positions and contains only four octapeptide repeats. We recommend that this sequence be used as the reference sequence for future studies of the PRNP gene for this species.
Complete genome sequence of the first human parechovirus type 3 isolated in Taiwan.
Chang, Jenn-Tzong; Yang, Chih-Shiang; Chen, Bao-Chen; Chen, Yao-Shen; Chang, Tsung-Hsien
2017-11-01
The first human parechovirus 3 (HPeV3 VGHKS-2007) in Taiwan was identified from a clinical specimen from a male infant. The entire genome of the HPeV3 isolate was sequenced and compared to known HPeV3 sequences. Genome alignment data showed that HPeV3 VGHKS-2007 shares the highest nucleotide identity, 99%, with the Japanese strain of HPeV3 1361K-162589-Yamagata-2008. All HPeV3 isolates possess at least 97% amino acid identity. The analysis of the genome sequence of HPeV3 VGHKS-2007 will facilitate future investigations of the epidemiology and pathogenicity of HPeV3 infection. Copyright © 2017. Published by Elsevier Taiwan LLC.
Complete genome sequence of a Chinese strain of ‘Candidatus Liberibacter asiaticus’
USDA-ARS's Scientific Manuscript database
The complete genome sequence of ‘Candidatus Liberibacter asiaticus’ strain (Las) Guangxi-1 (GX-1) was obtained using an Illumina HiSeq 2000. The GX-1 genome comprises 1,268,237 nucleotides, 36.5 % GC content, 1,141 predicted coding sequences, 44 tRNAs, 3 complete copies of ribosomal RNA genes (16S, 23S ...
Analysis on the use of Multi-Sequence MRI Series for Segmentation of Abdominal Organs
NASA Astrophysics Data System (ADS)
Selver, M. A.; Selvi, E.; Kavur, E.; Dicle, O.
2015-01-01
Segmentation of abdominal organs from MRI data sets is a challenging task due to various limitations and artefacts. During routine clinical practice, radiologists use multiple MR sequences in order to analyze different anatomical properties. These sequences have different characteristics in terms of acquisition parameters (such as contrast mechanisms and pulse sequence designs) and image properties (such as pixel spacing, slice thicknesses and dynamic range). For a complete understanding of the data, computational techniques should combine the information coming from these various MRI sequences. These sequences are not acquired in parallel but sequentially (one after another). Therefore, patient movements and respiratory motions change the position and shape of the abdominal organs. In this study, the magnitude of these effects is measured using three different symmetric surface distance metrics applied to three-dimensional data acquired from various MRI sequences. The results are compared to intra- and inter-observer differences, and discussions on using multiple MRI sequences for segmentation and the need for registration are presented.
Munger, Steven C.; Raghupathy, Narayanan; Choi, Kwangbom; Simons, Allen K.; Gatti, Daniel M.; Hinerfeld, Douglas A.; Svenson, Karen L.; Keller, Mark P.; Attie, Alan D.; Hibbs, Matthew A.; Graber, Joel H.; Chesler, Elissa J.; Churchill, Gary A.
2014-01-01
Massively parallel RNA sequencing (RNA-seq) has yielded a wealth of new insights into transcriptional regulation. A first step in the analysis of RNA-seq data is the alignment of short sequence reads to a common reference genome or transcriptome. Genetic variants that distinguish individual genomes from the reference sequence can cause reads to be misaligned, resulting in biased estimates of transcript abundance. Fine-tuning of read alignment algorithms does not correct this problem. We have developed Seqnature software to construct individualized diploid genomes and transcriptomes for multiparent populations and have implemented a complete analysis pipeline that incorporates other existing software tools. We demonstrate in simulated and real data sets that alignment to individualized transcriptomes increases read mapping accuracy, improves estimation of transcript abundance, and enables the direct estimation of allele-specific expression. Moreover, when applied to expression QTL mapping we find that our individualized alignment strategy corrects false-positive linkage signals and unmasks hidden associations. We recommend the use of individualized diploid genomes over reference sequence alignment for all applications of high-throughput sequencing technology in genetically diverse populations. PMID:25236449
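The idea of an individualized diploid genome described above can be sketched in a few lines. This is not the Seqnature implementation; the function names and the SNP dictionary format are hypothetical, and the sketch handles single-nucleotide substitutions only, so reference coordinates never shift.

```python
def apply_snps(reference, snps):
    """Return a copy of reference with SNPs substituted in.

    snps maps a 0-based position to a (ref_base, alt_base) pair.
    This input format is an assumption made for this sketch.
    """
    bases = list(reference)
    for pos, (ref_base, alt_base) in snps.items():
        if bases[pos] != ref_base:
            raise ValueError(
                "reference mismatch at %d: expected %s, found %s"
                % (pos, ref_base, bases[pos])
            )
        bases[pos] = alt_base
    return "".join(bases)


def build_diploid(reference, maternal_snps, paternal_snps):
    """Build one sequence per haplotype from the same reference."""
    return apply_snps(reference, maternal_snps), apply_snps(reference, paternal_snps)


if __name__ == "__main__":
    ref = "ACGTACGT"
    maternal, paternal = build_diploid(ref, {1: ("C", "T")}, {6: ("G", "A")})
    print(maternal)
    print(paternal)
```

Reads aligned against both haplotype sequences, rather than the single reference, can then be assigned to the allele they match, which is the basis for allele-specific expression estimates.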
SIMPLEX: Cloud-Enabled Pipeline for the Comprehensive Analysis of Exome Sequencing Data
Fischer, Maria; Snajder, Rene; Pabinger, Stephan; Dander, Andreas; Schossig, Anna; Zschocke, Johannes; Trajanoski, Zlatko; Stocker, Gernot
2012-01-01
In recent studies, exome sequencing has proven to be a successful screening tool for the identification of candidate genes causing rare genetic diseases. Although underlying targeted sequencing methods are well established, necessary data handling and focused, structured analysis still remain demanding tasks. Here, we present a cloud-enabled autonomous analysis pipeline, which comprises the complete exome analysis workflow. The pipeline combines several in-house developed and published applications to perform the following steps: (a) initial quality control, (b) intelligent data filtering and pre-processing, (c) sequence alignment to a reference genome, (d) SNP and DIP detection, (e) functional annotation of variants using different approaches, and (f) detailed report generation during various stages of the workflow. The pipeline connects the selected analysis steps, exposes all available parameters for customized usage, performs required data handling, and distributes computationally expensive tasks either on a dedicated high-performance computing infrastructure or on the Amazon cloud environment (EC2). The presented application has already been used in several research projects including studies to elucidate the role of rare genetic diseases. The pipeline is continuously tested and is publicly available under the GPL as a VirtualBox or Cloud image at http://simplex.i-med.ac.at; additional supplementary data is provided at http://www.icbi.at/exome. PMID:22870267
Lyssavirus in Indian Flying Foxes, Sri Lanka.
Gunawardena, Panduka S; Marston, Denise A; Ellis, Richard J; Wise, Emma L; Karawita, Anjana C; Breed, Andrew C; McElhinney, Lorraine M; Johnson, Nicholas; Banyard, Ashley C; Fooks, Anthony R
2016-08-01
A novel lyssavirus was isolated from brains of Indian flying foxes (Pteropus medius) in Sri Lanka. Phylogenetic analysis of complete virus genome sequences, and geographic location and host species, provides strong evidence that this virus is a putative new lyssavirus species, designated as Gannoruwa bat lyssavirus.
Analysis of whole genome sequences of 16 strains of rubella virus from the United States, 1961-2009.
Abernathy, Emily; Chen, Min-hsin; Bera, Jayati; Shrivastava, Susmita; Kirkness, Ewen; Zheng, Qi; Bellini, William; Icenogle, Joseph
2013-01-25
Rubella virus is the causative agent of rubella, a mild rash illness, and a potent teratogenic agent when contracted by a pregnant woman. Global rubella control programs target the reduction and elimination of congenital rubella syndrome. Phylogenetic analysis of partial sequences of rubella viruses has contributed to virus surveillance efforts and played an important role in demonstrating that indigenous rubella viruses have been eliminated in the United States. Sixteen wild-type rubella viruses were chosen for whole genome sequencing. All 16 viruses were collected in the United States from 1961 to 2009 and are from 8 of the 13 known rubella genotypes. Phylogenetic analysis of 30 whole genome sequences produced a maximum likelihood tree giving high bootstrap values for all genotypes except provisional genotype 1a. Comparison of the 16 new complete sequences and 14 previously sequenced wild-type viruses found regions with clusters of variable amino acids. The 5' 250 nucleotides of the genome are more conserved than any other part of the genome. Genotype specific deletions in the untranslated region between the non-structural and structural open reading frames were observed for genotypes 2B and genotype 1G. No evidence was seen for recombination events among the 30 viruses. The analysis presented here is consistent with previous reports on the genetic characterization of rubella virus genomes. Conserved and variable regions were identified and additional evidence for genotype specific nucleotide deletions in the intergenic region was found. Phylogenetic analysis confirmed genotype groupings originally based on structural protein coding region sequences, which provides support for the WHO nomenclature for genetic characterization of wild-type rubella viruses.
Huala, Eva; Dickerman, Allan W.; Garcia-Hernandez, Margarita; Weems, Danforth; Reiser, Leonore; LaFond, Frank; Hanley, David; Kiphart, Donald; Zhuang, Mingzhe; Huang, Wen; Mueller, Lukas A.; Bhattacharyya, Debika; Bhaya, Devaki; Sobral, Bruno W.; Beavis, William; Meinke, David W.; Town, Christopher D.; Somerville, Chris; Rhee, Seung Yon
2001-01-01
Arabidopsis thaliana, a small annual plant belonging to the mustard family, is the subject of study by an estimated 7000 researchers around the world. In addition to the large body of genetic, physiological and biochemical data gathered for this plant, it will be the first higher plant genome to be completely sequenced, with completion expected at the end of the year 2000. The sequencing effort has been coordinated by an international collaboration, the Arabidopsis Genome Initiative (AGI). The rationale for intensive investigation of Arabidopsis is that it is an excellent model for higher plants. In order to maximize use of the knowledge gained about this plant, there is a need for a comprehensive database and information retrieval and analysis system that will provide user-friendly access to Arabidopsis information. This paper describes the initial steps we have taken toward realizing these goals in a project called The Arabidopsis Information Resource (TAIR) (www.arabidopsis.org). PMID:11125061
Evaluation of microbial community in hydrothermal field by direct DNA sequencing
NASA Astrophysics Data System (ADS)
Kawarabayasi, Y.; Maruyama, A.
2002-12-01
Many extremophiles have been discovered from terrestrial and marine hydrothermal fields. Some thermophiles can grow beyond 90°C in culture, while direct microscopic analysis occasionally indicates that microbes may survive in much hotter hydrothermal fluids. However, it is very difficult to isolate and cultivate such microbes from the environments, i.e., over 99% of total microbes remain undiscovered. Based on experience in entire microbial genome analysis (Y.K.) and microbial community analysis (A.M.), we set out to find unique microbes/genes in hydrothermal fields through direct sequencing of environmental DNA fragments. First, shotgun plasmid libraries were constructed directly from the DNA molecules prepared from mixed microbes collected by an in situ filtration system from low-temperature fluids at RM24 in the Southern East Pacific Rise (S-EPR). Gene amplification (PCR) was not used, to prevent mutation in the process. The nucleotide sequences of 285 clones indicated that no sequence had identical data in public databases. Among 27 clones whose entire sequences were determined, no ORF was identified on 14 clones, resembling introns in eukaryotes. On four clones, tetra-nucleotide-long multiple tandem repetitive sequences were identified. This type of sequence has been identified in some familiar diseases in humans. The result indicates that living/dead materials with eukaryotic features may exist in this low-temperature field. Second, shotgun plasmid libraries were constructed from the environmental DNA prepared from Beppu hot springs. Among 143 randomly selected clones used for sequencing, no known sequence was identified. Unlike the clones in the S-EPR library, clear ORFs were identified on all nine clones whose entire sequence was determined. It was found that one clone, H4052, contained the complete aspartyl-tRNA synthetase. Phylogenetic analysis using amino acid sequences of this gene indicated that this gene was separated from other Euryarchaea before the differentiation of species. Thus, some novel archaeal species are expected to be in this field. The present direct cloning and sequencing technique is now opening a window to the new world in hydrothermal microbial community analysis.
Complete genome analysis of porcine kobuviruses from the feces of pigs in Japan.
Akagami, Masataka; Ito, Mika; Niira, Kazutaka; Kuroda, Moegi; Masuda, Tsuneyuki; Haga, Kei; Tsuchiaka, Shinobu; Naoi, Yuki; Kishimoto, Mai; Sano, Kaori; Omatsu, Tsutomu; Aoki, Hiroshi; Katayama, Yukie; Oba, Mami; Oka, Tomoichiro; Ichimaru, Toru; Yamasato, Hiroshi; Ouchi, Yoshinao; Shirai, Junsuke; Katayama, Kazuhiko; Mizutani, Tetsuya; Nagai, Makoto
2017-08-01
Porcine kobuviruses (PoKoVs) are ubiquitously distributed in pig populations worldwide and are thought to be enteric viruses in swine. Although PoKoVs have been detected in pigs in Japan, no complete genome data for Japanese PoKoVs are available. In the present study, 24 nearly complete or complete sequences of the PoKoV genome obtained from 10 diarrheic feces and 14 non-diarrheic feces of Japanese pigs were analyzed using a metagenomics approach. Japanese PoKoVs shared 85.2-100% identity with the complete coding nucleotide (nt) sequences and the closest relationship of 85.1-98.3% with PoKoVs from other countries. Twenty of 24 Japanese PoKoVs carried a deletion of 90 nt in the 2B coding region. Phylogenetic tree analyses revealed that PoKoVs were not grouped according to their geographical region of origin and the phylogenetic trees of the L, P1, P2, and P3 genetic regions showed topologies different from each other. Similarity plot analysis using strains from a single farm revealed partially different similarity patterns among strains from identical farm origins, suggesting that recombination events had occurred. These results indicate that various PoKoV strains are prevalent and not restricted geographically on pig farms worldwide and the coexistence of multiple strains leads to recombination events of PoKoVs and contributes to the genetic diversity and evolution of PoKoVs.
NASA Technical Reports Server (NTRS)
Diak, George R.; Smith, William L.
1993-01-01
The goals of this research endeavor have been to develop a flexible and relatively complete framework for the investigation of current and future satellite data sources in numerical meteorology. In order to realistically model how satellite information might be used for these purposes, it is necessary that Observing System Simulation Experiments (OSSEs) be as complete as possible. It is therefore desirable that these experiments simulate in entirety the sequence of steps involved in bringing satellite information from the radiance level through product retrieval to a realistic analysis and forecast sequence. In this project we have worked to make this sequence realistic by synthesizing raw satellite data from surrogate atmospheres, deriving satellite products from these data and subsequently producing analyses and forecasts using the retrieved products. The accomplishments made in 1991 are presented. The emphasis was on examining atmospheric soundings and microphysical products which we expect to produce with the launch of the Advanced Microwave Sounding Unit (AMSU), slated for flight in mid 1994.
The Genome 10K Project: a way forward.
Koepfli, Klaus-Peter; Paten, Benedict; O'Brien, Stephen J
2015-01-01
The Genome 10K Project was established in 2009 by a consortium of biologists and genome scientists determined to facilitate the sequencing and analysis of the complete genomes of 10,000 vertebrate species. Since then the number of selected and initiated species has risen from ∼26 to 277 sequenced or ongoing with funding, an approximately tenfold increase in five years. Here we summarize the advances and commitments that have occurred by mid-2014 and outline the achievements and present challenges of reaching the 10,000-species goal. We summarize the status of known vertebrate genome projects, recommend standards for pronouncing a genome as sequenced or completed, and provide our present and future vision of the landscape of Genome 10K. The endeavor is ambitious, bold, expensive, and uncertain, but together the Genome 10K Consortium of Scientists and the worldwide genomics community are moving toward their goal of delivering to the coming generation the gift of genome empowerment for many vertebrate species.
Liu, Chen-Jian; Wang, Rui; Gong, Fu-Ming; Liu, Xiao-Feng; Zheng, Hua-Jun; Luo, Yi-Yong; Li, Xiao-Ran
2015-12-01
Lactobacillus plantarum is an important probiotic and is mostly isolated from fermented foods. We sequenced the genome of L. plantarum strain 5-2, which was derived from fermented soybean isolated from Yunnan province, China. The strain was determined to contain 3114 genes. Fourteen complete insertion sequence (IS) elements were found in 5-2 chromosome. There were 24 DNA replication proteins and 76 DNA repair proteins in the 5-2 genome. Consistent with the classification of L. plantarum as a facultative heterofermentative lactobacillus, the 5-2 genome encodes key enzymes required for the EMP (Embden-Meyerhof-Parnas) and phosphoketolase (PK) pathways. Several components of the secretion machinery are found in the 5-2 genome, which was compared with L. plantarum ST-III, JDM1 and WCFS1. Most of the specific proteins in the four genomes appeared to be related to their prophage elements. Copyright © 2015 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garcia-Zepeda, E.A.; Sarafi, M.N.; Luster, A.D.
1997-05-01
Eotaxin is a CC chemokine that is a specific chemoattractant for eosinophils and is implicated in the pathogenesis of eosinophilic inflammatory diseases, such as asthma. We describe the genomic organization, complete sequence, including 1354 bp 5′ of the RNA initiation site, and chromosomal localization of the human eotaxin gene. Fluorescence in situ hybridization analysis localized eotaxin to human chromosome 17, in the region q21.1-q21.2, and the human gene name SCYA11 was assigned. We also present the 5′ flanking sequence of the mouse eotaxin gene and have identified several regulatory elements that are conserved between the murine and the human promoters. In particular, the presence of elements such as NF-κB, interferon-γ response element, and glucocorticoid response element may explain the observed regulation of the eotaxin gene by cytokines and glucocorticoids. 17 refs., 4 figs., 1 tab.
The Genome 10K Project: A Way Forward
Koepfli, Klaus-Peter; Paten, Benedict; O’Brien, Stephen J.
2017-01-01
The Genome 10K Project was established in 2009 by a consortium of biologists and genome scientists determined to facilitate the sequencing and analysis of the complete genomes of 10,000 vertebrate species. Since then the number of selected and initiated species has risen from ~26 to 277 sequenced or ongoing with funding, an approximately tenfold increase in five years. Here we summarize the advances and commitments that have occurred by mid-2014 and outline the achievements and present challenges of reaching the 10,000-species goal. We summarize the status of known vertebrate genome projects, recommend standards for pronouncing a genome as sequenced or completed, and provide our present and future vision of the landscape of Genome 10K. The endeavor is ambitious, bold, expensive, and uncertain, but together the Genome 10K Consortium of Scientists and the worldwide genomics community are moving toward their goal of delivering to the coming generation the gift of genome empowerment for many vertebrate species. PMID:25689317
Sequence Analysis of Mitochondrial Genome of Toxascaris leonina from a South China Tiger.
Li, Kangxin; Yang, Fang; Abdullahi, A Y; Song, Meiran; Shi, Xianli; Wang, Minwei; Fu, Yeqi; Pan, Weida; Shan, Fang; Chen, Wu; Li, Guoqing
2016-12-01
Toxascaris leonina is a common parasitic nematode of wild mammals and has significant impacts on the protection of rare wild animals. To analyze population genetic characteristics of T. leonina from the South China tiger, its mitochondrial (mt) genome was sequenced. Its complete circular mt genome was 14,277 bp in length, including 12 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and 2 non-coding regions. The nucleotide composition was biased toward A and T. The most common start codon and stop codon were TTG and TAG, and 4 genes ended with an incomplete stop codon. There were 13 intergenic regions ranging from 1 to 10 bp in size. Phylogenetically, T. leonina from a South China tiger was close to canine T. leonina. This study reports for the first time a complete mt genome sequence of T. leonina from the South China tiger, and provides a scientific basis for studying the genetic diversity of nematodes between different hosts.
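The A+T bias mentioned in the abstract above can be checked for any sequence with a few lines of code. The following is only a minimal sketch: the function name `at_content` and the choice to ignore ambiguous bases are assumptions made for illustration, not the authors' method.

```python
def at_content(seq: str) -> float:
    """Return the fraction of A and T among unambiguous bases (A, C, G, T).

    Hypothetical helper: ambiguous symbols such as N are skipped,
    which is an assumption of this sketch.
    """
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    at_bases = sum(1 for base in counted if base in "AT")
    return at_bases / len(counted)
```

A value clearly above 0.5 would indicate a composition biased toward A and T, as reported for the mt genome.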
1994-01-01
The apparatus that permits protein translocation across the internal thylakoid membranes of chloroplasts is completely unknown, even though these membranes have been the subject of extensive biochemical analysis. We have used a genetic approach to characterize the translocation of Chlamydomonas cytochrome f, a chloroplast-encoded protein that spans the thylakoid once. Mutations in the hydrophobic core of the cytochrome f signal sequence inhibit the accumulation of cytochrome f, lead to an accumulation of precursor, and impair the ability of Chlamydomonas cells to grow photosynthetically. One hydrophobic core mutant also reduces the accumulation of other thylakoid membrane proteins, but not those that translocate completely across the membrane. These results suggest that the signal sequence of cytochrome f is required and is involved in one of multiple insertion pathways. Suppressors of two signal peptide mutations describe at least two nuclear genes whose products likely describe the translocation apparatus, and selected second-site chloroplast suppressors further define regions of the cytochrome f signal peptide. PMID:8034740
Pandey, Ram Vinay; Pabinger, Stephan; Kriegner, Albert; Weinhäusel, Andreas
2016-01-01
Traditional Sanger sequencing as well as Next-Generation Sequencing have been used for the identification of disease causing mutations in human molecular research. The majority of currently available tools are developed for research and explorative purposes and often do not provide a complete, efficient, one-stop solution. As the focus of currently developed tools is mainly on NGS data analysis, no integrative solution for the analysis of Sanger data is provided and consequently a one-stop solution to analyze reads from both sequencing platforms is not available. We have therefore developed a new pipeline called MutAid to analyze and interpret raw sequencing data produced by Sanger or several NGS sequencing platforms. It performs format conversion, base calling, quality trimming, filtering, read mapping, variant calling, variant annotation and analysis of Sanger and NGS data under a single platform. It is capable of analyzing reads from multiple patients in a single run to create a list of potential disease causing base substitutions as well as insertions and deletions. MutAid has been developed for expert and non-expert users and supports four sequencing platforms including Sanger, Illumina, 454 and Ion Torrent. Furthermore, for NGS data analysis, five read mappers including BWA, TMAP, Bowtie, Bowtie2 and GSNAP and four variant callers including GATK-HaplotypeCaller, SAMTOOLS, Freebayes and VarScan2 pipelines are supported. MutAid is freely available at https://sourceforge.net/projects/mutaid.
Pandey, Ram Vinay; Pabinger, Stephan; Kriegner, Albert; Weinhäusel, Andreas
2016-01-01
Traditional Sanger sequencing as well as Next-Generation Sequencing have been used for the identification of disease causing mutations in human molecular research. The majority of currently available tools are developed for research and explorative purposes and often do not provide a complete, efficient, one-stop solution. As the focus of currently developed tools is mainly on NGS data analysis, no integrative solution for the analysis of Sanger data is provided and consequently a one-stop solution to analyze reads from both sequencing platforms is not available. We have therefore developed a new pipeline called MutAid to analyze and interpret raw sequencing data produced by Sanger or several NGS sequencing platforms. It performs format conversion, base calling, quality trimming, filtering, read mapping, variant calling, variant annotation and analysis of Sanger and NGS data under a single platform. It is capable of analyzing reads from multiple patients in a single run to create a list of potential disease causing base substitutions as well as insertions and deletions. MutAid has been developed for expert and non-expert users and supports four sequencing platforms including Sanger, Illumina, 454 and Ion Torrent. Furthermore, for NGS data analysis, five read mappers including BWA, TMAP, Bowtie, Bowtie2 and GSNAP and four variant callers including GATK-HaplotypeCaller, SAMTOOLS, Freebayes and VarScan2 pipelines are supported. MutAid is freely available at https://sourceforge.net/projects/mutaid. PMID:26840129
Complete Coding Genome Sequence for Mogiana Tick Virus, a Jingmenvirus Isolated from Ticks in Brazil
2017-05-04
and capable of infecting a wide range of animal hosts (1–5). Here, we report the complete coding genome sequence (i.e., only missing portions of...segmented nature of the genome was not understood. Therefore, only the two genome segments with detectable sequence homologies to flaviviruses were...originally reported (2). We revisited the data set of Maruyama et al. (2) and assembled the complete coding sequences for all four genome segments. We
Cloning and sequence analysis of a cDNA clone coding for the mouse GM2 activator protein.
Bellachioma, G; Stirling, J L; Orlacchio, A; Beccari, T
1993-01-01
A cDNA (1.1 kb) containing the complete coding sequence for the mouse GM2 activator protein was isolated from a mouse macrophage library using a cDNA for the human protein as a probe. There was a single ATG located 12 bp from the 5' end of the cDNA clone followed by an open reading frame of 579 bp. Northern blot analysis of mouse macrophage RNA showed that there was a single band with a mobility corresponding to a size of 2.3 kb. We deduce from this that the mouse mRNA, in common with the mRNA for the human GM2 activator protein, has a long 3' untranslated sequence of approx. 1.7 kb. Alignment of the mouse and human deduced amino acid sequences showed 68% identity overall and 75% identity for the sequence on the C-terminal side of the first 31 residues, which in the human GM2 activator protein contains the signal peptide. Hydropathicity plots showed great similarity between the mouse and human sequences even in regions of low sequence similarity. There is a single N-glycosylation site in the mouse GM2 activator protein sequence (Asn151-Phe-Thr) which differs in its location from the single site reported in the human GM2 activator protein sequence (Asn63-Val-Thr). PMID:7689829
Characterization and mapping of cDNA encoding aspartate aminotransferase in rice, Oryza sativa L.
Song, J; Yamamoto, K; Shomura, A; Yano, M; Minobe, Y; Sasaki, T
1996-10-31
Fifteen cDNA clones, putatively identified as encoding aspartate aminotransferase (AST, EC 2.6.1.1), were isolated and partially sequenced. Together with six previously isolated clones putatively identified as encoding ASTs (Sasaki, et al. 1994, Plant Journal 6, 615-624), their sequences were characterized and classified into 4 cDNA species. Two of the isolated clones, C60213 and C2079, were full-length cDNAs, and their complete nucleotide sequences were determined. C60213 was 1612 bp long and its deduced amino acid sequence showed 88% homology with that of Panicum miliaceum L. mitochondrial AST. The C60213-encoded protein had an N-terminal amino acid sequence that was characteristic of a mitochondrial transit peptide. On the other hand, C2079 was 1546 bp long and had 91% amino acid sequence homology with P. miliaceum L. cytosolic AST but lacked the transit peptide sequence. The homologies of nucleotide sequences and deduced amino acid sequences of C2079 and C60213 were 54% and 52%, respectively. C2079 and C60213 were mapped on chromosomes 1 and 6, respectively, by restriction fragment length polymorphism linkage analysis. Northern blot analysis using C2079 as a probe revealed much higher transcript levels in callus and root than in green and etiolated shoots, suggesting tissue-specific variations of AST gene expression.
First report of the complete sequence of Sida golden yellow vein virus from Jamaica.
Stewart, Cheryl S; Kon, Tatsuya; Gilbertson, Robert L; Roye, Marcia E
2011-08-01
Begomoviruses are phytopathogens that threaten food security [18]. Sida spp. are ubiquitous weed species found in Jamaica. Sida samples were collected island-wide, DNA was extracted via a modified Dellaporta method, and the viral genome was amplified using degenerate and sequence-specific primers [2, 11]. The amplicons were cloned and sequenced. Sequence analysis revealed that a DNA-A molecule isolated from a plant in Liguanea, St. Andrew, was 90.9% similar to Sida golden yellow vein virus-[United States of America:Homestead:A11], making it a strain of SiGYVV. It was named Sida golden yellow vein virus-[Jamaica:Liguanea 2:2008] (SiGYVV-[JM:Lig2:08]). The cognate DNA-B, previously unreported, was successfully cloned and was most similar to that of Malvastrum yellow mosaic Jamaica virus (MaYMJV). Phylogenetic analysis suggested that this virus was most closely related to begomoviruses that infect malvaceous hosts in Jamaica, Cuba and Florida in the United States.
Rodríguez, Javier M; Moreno, Leticia Tais; Alejo, Alí; Lacasta, Anna; Rodríguez, Fernando; Salas, María L
2015-01-01
The strain BA71V has played a key role in African swine fever virus (ASFV) research. It was the first genome sequenced, and remains the only genome completely determined. A large part of the studies on the function of ASFV genes, viral transcription, replication, DNA repair and morphogenesis, has been performed using this model. This avirulent strain was obtained by adaptation to grow in Vero cells of the highly virulent BA71 strain. We report here the analysis of the genome sequence of BA71 in comparison with that of BA71V. They possess the smallest genomes for a virulent or an attenuated ASFV, and are essentially identical except for a relatively small number of changes. We discuss the possible contribution of these changes to virulence. Analysis of the BA71 sequence allowed us to identify new similarities among ASFV proteins, and with database proteins including two ASFV proteins that could function as a two-component signaling network.
Isolation and cloning of a metalloproteinase from king cobra snake venom.
Guo, Xiao-Xi; Zeng, Lin; Lee, Wen-Hui; Zhang, Yun; Jin, Yang
2007-06-01
A 50 kDa fibrinogenolytic protease, ohagin, from the venom of Ophiophagus hannah was isolated by a combination of gel filtration, ion-exchange and heparin affinity chromatography. Ohagin specifically degraded the alpha-chain of human fibrinogen and the proteolytic activity was completely abolished by EDTA, but not by PMSF, suggesting it is a metalloproteinase. It dose-dependently inhibited platelet aggregation induced by ADP, TMVA and stejnulxin. The full sequence of ohagin was deduced by cDNA cloning and confirmed by protein sequencing and peptide mass fingerprinting. The full-length cDNA sequence of ohagin encodes an open reading frame of 611 amino acids that includes signal peptide, proprotein and mature protein comprising metalloproteinase, disintegrin-like and cysteine-rich domains, suggesting it belongs to P-III class metalloproteinase. In addition, P-III class metalloproteinases from the venom glands of Naja atra, Bungarus multicinctus and Bungarus fasciatus were also cloned in this study. Sequence analysis and phylogenetic analysis indicated that metalloproteinases from elapid snake venoms form a new subgroup of P-III SVMPs.
A study of entropy/clarity of genetic sequences using metric spaces and fuzzy sets.
Georgiou, D N; Karakasidis, T E; Nieto, Juan J; Torres, A
2010-11-07
The study of genetic sequences is of great importance in biology and medicine. Sequence analysis and taxonomy are two major fields of application of bioinformatics. In the present paper we extend the notion of entropy and clarity to the use of different metrics and apply them in the case of the Fuzzy Polynucleotide Space (FPS). Applications of these notions on selected polynucleotides and complete genomes both in the I(12×k) space, but also using their representation in FPS are presented. Our results show that the values of fuzzy entropy/clarity are indicative of the degree of complexity necessary for the description of the polynucleotides in the FPS, although in the latter case the interpretation is slightly different than in the case of the I(12×k) hypercube. Fuzzy entropy/clarity along with the use of appropriate metrics can contribute to sequence analysis and taxonomy. Copyright © 2010 Elsevier Ltd. All rights reserved.
Complete Nucleotide Sequence of Watermelon Chlorotic Stunt Virus Originating from Oman
Khan, Akhtar J.; Akhtar, Sohail; Briddon, Rob W.; Ammara, Um; Al-Matrooshi, Abdulrahman M.; Mansoor, Shahid
2012-01-01
Watermelon chlorotic stunt virus (WmCSV) is a bipartite begomovirus (genus Begomovirus, family Geminiviridae) that causes economic losses to cucurbits, particularly watermelon, across the Middle East and North Africa. Recently squash (Cucurbita moschata) grown in an experimental field in Oman was found to display symptoms such as leaf curling, yellowing and stunting, typical of a begomovirus infection. Sequence analysis of the virus isolated from squash showed 97.6–99.9% nucleotide sequence identity to previously described WmCSV isolates for the DNA A component and 93–98% identity for the DNA B component. Agrobacterium-mediated inoculation to Nicotiana benthamiana resulted in the development of symptoms fifteen days post inoculation. This is the first bipartite begomovirus identified in Oman. Overall the Oman isolate showed the highest levels of sequence identity to a WmCSV isolate originating from Iran, which was confirmed by phylogenetic analysis. This suggests that WmCSV present in Oman has been introduced from Iran. The significance of this finding is discussed. PMID:22852046
Complete nucleotide sequence of watermelon chlorotic stunt virus originating from Oman.
Khan, Akhtar J; Akhtar, Sohail; Briddon, Rob W; Ammara, Um; Al-Matrooshi, Abdulrahman M; Mansoor, Shahid
2012-07-01
Watermelon chlorotic stunt virus (WmCSV) is a bipartite begomovirus (genus Begomovirus, family Geminiviridae) that causes economic losses to cucurbits, particularly watermelon, across the Middle East and North Africa. Recently squash (Cucurbita moschata) grown in an experimental field in Oman was found to display symptoms such as leaf curling, yellowing and stunting, typical of a begomovirus infection. Sequence analysis of the virus isolated from squash showed 97.6-99.9% nucleotide sequence identity to previously described WmCSV isolates for the DNA A component and 93-98% identity for the DNA B component. Agrobacterium-mediated inoculation to Nicotiana benthamiana resulted in the development of symptoms fifteen days post inoculation. This is the first bipartite begomovirus identified in Oman. Overall the Oman isolate showed the highest levels of sequence identity to a WmCSV isolate originating from Iran, which was confirmed by phylogenetic analysis. This suggests that WmCSV present in Oman has been introduced from Iran. The significance of this finding is discussed.
Novel, non-symbiotic isolates of Neorhizobium from a dryland agricultural soil.
Soenens, Amalia; Imperial, Juan
2018-01-01
Semi-selective enrichment, followed by PCR screening, resulted in the successful direct isolation of fast-growing Rhizobia from a dryland agricultural soil. Over 50% of these isolates belong to the genus Neorhizobium, as concluded from partial rpoB and near-complete 16S rDNA sequence analysis. Further genotypic and genomic analysis of five representative isolates confirmed that they form a coherent group within Neorhizobium, closer to N. galegae than to the remaining Neorhizobium species, but clearly differentiated from the former, and constituting at least one new genomospecies within Neorhizobium. All the isolates lacked nod and nif symbiotic genes but contained a repABC replication/maintenance region, characteristic of rhizobial plasmids, within large contigs from their draft genome sequences. These repABC sequences were related, but not identical, to repABC sequences found in symbiotic plasmids from N. galegae, suggesting that the non-symbiotic isolates have the potential to harbor symbiotic plasmids. This is the first report of non-symbiotic members of Neorhizobium from soil.
Genetic characterization of K13965, a strain of Oak Vale virus from Western Australia.
Quan, Phenix-Lan; Williams, David T; Johansen, Cheryl A; Jain, Komal; Petrosov, Alexandra; Diviney, Sinead M; Tashmukhamedova, Alla; Hutchison, Stephen K; Tesh, Robert B; Mackenzie, John S; Briese, Thomas; Lipkin, W Ian
2011-09-01
K13965, an uncharacterized virus, was isolated in 1993 from Anopheles annulipes mosquitoes collected in the Kimberley region of northern Western Australia. Here, we report its genomic sequence, identify it as a rhabdovirus, and characterize its phylogenetic relationships. The genome comprises a P' (C) and SH protein similar to the recently characterized Tupaia and Durham viruses, and shows overlap between G and L genes. Comparison of K13965 genome sequence to other rhabdoviruses identified K13965 as a strain of the unclassified Australian Oak Vale rhabdovirus, whose complete genome sequence we also determined. Phylogenetic analysis of N and L sequences indicated genetic relationship to a recently proposed Sandjima virus clade, although the Oak Vale virus sequences form a branch separate from the African members of that group. Copyright © 2011 Elsevier B.V. All rights reserved.
Linear and Nonlinear Statistical Characterization of DNA
NASA Astrophysics Data System (ADS)
Norio Oiwa, Nestor; Goldman, Carla; Glazier, James
2002-03-01
We find spatial order in the distribution of protein-coding (including RNAs) and control segments of GenBank genomic sequences, irrespective of ATCG content. This is achieved by correlations, histograms, fractal dimensions and singularity spectra. Estimates of these quantities in complete nuclear genomes indicate that coding sequences are long-range correlated and their disposition is self-similar (multifractal) for eukaryotes. These characteristics are absent in prokaryotes, where there are few noncoding sequences, suggesting that 'junk' DNA plays a relevant role in genome structure and function. Concerning the genetic message of ATCG sequences, we build a random walk (Levy flight), using DNA symmetry arguments, where we associate A, T, C and G with left, right, down and up steps, respectively. Nonlinear analysis of mitochondrial DNA walks reveals a multifractal pattern based on palindromic sequences, which fold in hairpins and loops.
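The base-to-step mapping described in the abstract above (A, T, C and G as left, right, down and up) can be illustrated with a short sketch. This is an assumption-laden simplification: it uses unit steps on a 2D lattice and skips ambiguous symbols, and it does not reproduce the authors' Levy-flight construction or their multifractal analysis. The names `STEPS` and `dna_walk` are hypothetical.

```python
from typing import List, Tuple

# Step directions taken from the abstract: A = left, T = right, C = down, G = up.
# Unit step length is an assumption of this sketch.
STEPS = {"A": (-1, 0), "T": (1, 0), "C": (0, -1), "G": (0, 1)}


def dna_walk(seq: str) -> List[Tuple[int, int]]:
    """Return the lattice positions visited by a 2D walk driven by a DNA string.

    The walk starts at the origin; symbols other than A, T, C, G are skipped.
    """
    x, y = 0, 0
    path = [(x, y)]
    for base in seq.upper():
        step = STEPS.get(base)
        if step is None:
            continue  # ignore ambiguous bases such as N
        x += step[0]
        y += step[1]
        path.append((x, y))
    return path
```

Under this mapping a base followed by its complement (A then T, or C then G) returns the walker to its previous position, so complementary pairs leave no net displacement in the path.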
Molecular characterization of two prunus necrotic ringspot virus isolates from Canada.
Cui, Hongguang; Hong, Ni; Wang, Guoping; Wang, Aiming
2012-05-01
We determined the entire RNA1, 2 and 3 sequences of two prunus necrotic ringspot virus (PNRSV) isolates, Chr3 from cherry and Pch12 from peach, obtained from an orchard in the Niagara Fruit Belt, Canada. The RNA1, 2 and 3 of the two isolates share nucleotide sequence identities of 98.6%, 98.4% and 94.5%, respectively. Their RNA1- and 2-encoded amino acid sequences are about 98% identical to the corresponding sequences of a cherry isolate, CH57, the only other PNRSV isolate with complete RNA1 and 2 sequences available. Phylogenetic analysis of the coat protein and movement protein encoded by RNA3 of Pch12 and Chr3 and published PNRSV isolates indicated that Chr3 belongs to the PV96 group and Pch12 belongs to the PV32 group.
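Sequence identity figures such as those quoted in the abstract above are typically derived from pairwise alignments. The abstract does not say which tool or gap treatment was used, so the following is only a sketch under stated assumptions: the two inputs are already aligned to equal length, `-` marks a gap, and any column containing a gap is excluded. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are excluded; this gap
    treatment is an assumption, and other conventions give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

Because the result depends on the alignment and on how gaps are counted, values from this sketch are not expected to match published figures exactly.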
Identification and Resolution of Microdiversity through Metagenomic Sequencing of Parallel Consortia
Maezato, Yukari; Wu, Yu-Wei; Romine, Margaret F.; Lindemann, Stephen R.
2015-01-01
To gain a predictive understanding of the interspecies interactions within microbial communities that govern community function, the genomic complement of every member population must be determined. Although metagenomic sequencing has enabled the de novo reconstruction of some microbial genomes from environmental communities, microdiversity confounds current genome reconstruction techniques. To overcome this issue, we performed short-read metagenomic sequencing on parallel consortia, defined as consortia cultivated under the same conditions from the same natural community with overlapping species composition. The differences in species abundance between the two consortia allowed reconstruction of near-complete (at an estimated >85% of gene complement) genome sequences for 17 of the 20 detected member species. Two Halomonas spp. indistinguishable by amplicon analysis were found to be present within the community. In addition, comparison of metagenomic reads against the consensus scaffolds revealed within-species variation for one of the Halomonas populations, one of the Rhodobacteraceae populations, and the Rhizobiales population. Genomic comparison of these representative instances of inter- and intraspecies microdiversity suggests differences in functional potential that may result in the expression of distinct roles in the community. In addition, isolation and complete genome sequence determination of six member species allowed an investigation into the sensitivity and specificity of genome reconstruction processes, demonstrating robustness across a wide range of sequence coverage (9× to 2,700×) within the metagenomic data set. PMID:26497460
Identification and Resolution of Microdiversity through Metagenomic Sequencing of Parallel Consortia
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nelson, William C.; Maezato, Yukari; Wu, Yu-Wei
2015-10-23
To gain a predictive understanding of the interspecies interactions within microbial communities that govern community function, the genomic complement of every member population must be determined. Although metagenomic sequencing has enabled the de novo reconstruction of some microbial genomes from environmental communities, microdiversity confounds current genome reconstruction techniques. To overcome this issue, we performed short-read metagenomic sequencing on parallel consortia, defined as consortia cultivated under the same conditions from the same natural community with overlapping species composition. The differences in species abundance between the two consortia allowed reconstruction of near-complete (at an estimated >85% of gene complement) genome sequences for 17 of the 20 detected member species. Two Halomonas spp. indistinguishable by amplicon analysis were found to be present within the community. In addition, comparison of metagenomic reads against the consensus scaffolds revealed within-species variation for one of the Halomonas populations, one of the Rhodobacteraceae populations, and the Rhizobiales population. Genomic comparison of these representative instances of inter- and intraspecies microdiversity suggests differences in functional potential that may result in the expression of distinct roles in the community. In addition, isolation and complete genome sequence determination of six member species allowed an investigation into the sensitivity and specificity of genome reconstruction processes, demonstrating robustness across a wide range of sequence coverage (9× to 2,700×) within the metagenomic data set.
Manchado, Manuel; Infante, Carlos; Asensio, Esther; Cañavate, Jose Pedro; Douglas, Susan E
2007-07-03
Ribosomal proteins (RPs) are key components of ribosomes, the cellular organelle responsible for protein biosynthesis in cells. Their levels can vary as a function of organism growth and development; however, some RPs have been associated with other cellular processes or extraribosomal functions. Their high representation in cDNA libraries has resulted in the increase of RP sequences available from different organisms and their proposal as appropriate molecular markers for phylogenetic analysis. The development of large-scale genomics of Senegalese sole (Solea senegalensis) and Atlantic halibut (Hippoglossus hippoglossus), two commercially important flatfish species, has made possible the identification and systematic analysis of the complete set of RP sequences for the small (40S) ribosome subunit. Amino acid sequence comparisons showed a high similarity both between these two flatfish species and with respect to other fish and human. EST analysis revealed the existence of two and four RPS27 genes in Senegalese sole and Atlantic halibut, respectively. Phylogenetic analysis clustered RPS27 in two separate clades with their fish and mammalian counterparts. Steady-state transcript levels for eight RPs (RPS2, RPS3a, RPS15, RPS27-1, RPS27-2, RPS27a, RPS28, and RPS29) in sole were quantitated during larval development and in tissues, using a real-time PCR approach. All eight RPs exhibited different expression patterns in tissues with the lowest levels in brain. On the contrary, RP transcripts increased co-ordinately after first larval feeding reducing progressively during the metamorphic process. The genomic resources and knowledge developed in this survey will provide new insights into the evolution of Pleuronectiformes. Expression data will contribute to a better understanding of RP functions in fish, especially the mechanisms that govern growth and development in larvae, with implications in aquaculture.
Norman, Paul J.; Norberg, Steven J.; Guethlein, Lisbeth A.; Nemat-Gorgani, Neda; Royce, Thomas; Wroblewski, Emily E.; Dunn, Tamsen; Mann, Tobias; Alicata, Claudia; Hollenbach, Jill A.; Chang, Weihua; Shults Won, Melissa; Gunderson, Kevin L.; Abi-Rached, Laurent; Ronaghi, Mostafa; Parham, Peter
2017-01-01
The most polymorphic part of the human genome, the MHC, encodes over 160 proteins of diverse function. Half of them, including the HLA class I and II genes, are directly involved in immune responses. Consequently, the MHC region strongly associates with numerous diseases and clinical therapies. Notoriously, the MHC region has been intractable to high-throughput analysis at complete sequence resolution, and current reference haplotypes are inadequate for large-scale studies. To address these challenges, we developed a method that specifically captures and sequences the 4.8-Mbp MHC region from genomic DNA. For 95 MHC homozygous cell lines we assembled, de novo, a set of high-fidelity contigs and a sequence scaffold, representing a mean 98% of the target region. Included are six alternative MHC reference sequences of the human genome that we completed and refined. Characterization of the sequence and structural diversity of the MHC region shows the approach accurately determines the sequences of the highly polymorphic HLA class I and HLA class II genes and the complex structural diversity of complement factor C4A/C4B. It has also uncovered extensive and unexpected diversity in other MHC genes; an example is MUC22, which encodes a lung mucin and exhibits more coding sequence alleles than any HLA class I or II gene studied here. More than 60% of the coding sequence alleles analyzed were previously uncharacterized. We have created a substantial database of robust reference MHC haplotype sequences that will enable future population scale studies of this complicated and clinically important region of the human genome. PMID:28360230
Beckenbach, Andrew T.
2012-01-01
The complete mitochondrial DNA sequences of eight representatives of lower Diptera, suborder Nematocera, along with nearly complete sequences from two other species, are presented. These taxa represent eight families not previously represented by complete mitochondrial DNA sequences. Most of the sequences retain the ancestral dipteran mitochondrial gene arrangement, while one sequence, that of the midge Arachnocampa flava (family Keroplatidae), has an inversion of the trnE gene. The most unusual result is the extensive rearrangement of the mitochondrial genome of a winter crane fly, Paracladura trichoptera (family Trichocera). The pattern of rearrangement indicates that the mechanism of rearrangement involved a tandem duplication of the entire mitochondrial genome, followed by random and nonrandom loss of one copy of each gene. Another winter crane fly retains the ancestral dipteran gene arrangement. A preliminary mitochondrial phylogeny of the Diptera is also presented. PMID:22155689
Kumar, Rakesh; Mandal, B; Geetanjali, A S; Jain, R K; Jaiwal, P K
2010-08-01
Watermelon bud necrosis virus (WBNV), a member of the genus Tospovirus, family Bunyaviridae is an important viral pathogen in watermelon cultivation in India. The complete genome sequence properties of WBNV are not available. In the present study, the complete M RNA sequence and the genome organisation of a WBNV isolate infecting watermelon in Delhi (WBNV-wDel) were determined. The M RNA was 4,794 nucleotides (nt) long and potentially coded for a movement protein (NSm) of 34.22 kDa (307 amino acids) on the viral sense strand and a Gn/Gc glycoprotein precursor of 127.15 kDa (1,121 amino acids) on the complementary strand. The two open reading frames were separated by an intergenic region of 402 nt. The 5' and 3' untranslated regions were 55 and 47 nt long, respectively, containing complementary termini typical of tospoviruses. WBNV-wDel was most closely related (79.1% identity) to Groundnut bud necrosis virus, an important tospovirus that occurs in several crops in India, and was different (63.3-75.2% identity) from the other cucurbit-infecting tospoviruses known to occur in Taiwan and Japan. Sequence analysis of NSm and Gn/Gc revealed phylogenetic incongruence between WBNV-wDel and another isolate originating from central India (WBNV-Wm-Som isolate). The Wm-Som isolate showed evolutionary divergence from the wDel isolate in the Gn/Gc protein (74.6% identity) potentially due to recombination with the other tospoviruses that are known to occur in India. This is the first report of a comparison of complete sequences of M RNA of WBNV.
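The segment lengths reported in this abstract can be cross-checked against the stated M RNA length. The following is a minimal sketch, not part of the original study: the constant and function names are hypothetical, and it assumes that each open reading frame is counted with one stop codon and that the untranslated and intergenic regions exclude those stop codons.

```python
# Figures taken from the abstract above (WBNV-wDel M RNA).
UTR_5_NT = 55           # 5' untranslated region
UTR_3_NT = 47           # 3' untranslated region
INTERGENIC_NT = 402     # region separating the two ORFs
NSM_AA = 307            # movement protein (NSm)
GN_GC_AA = 1121         # Gn/Gc glycoprotein precursor
REPORTED_LENGTH_NT = 4794


def orf_length_nt(n_amino_acids: int) -> int:
    """ORF length in nucleotides, assuming one stop codon is counted with the ORF."""
    return (n_amino_acids + 1) * 3


def m_rna_length_nt() -> int:
    """Sum the segments in 5'-to-3' order along the M RNA."""
    return (
        UTR_5_NT
        + orf_length_nt(NSM_AA)
        + INTERGENIC_NT
        + orf_length_nt(GN_GC_AA)
        + UTR_3_NT
    )


total = m_rna_length_nt()
print(total, total == REPORTED_LENGTH_NT)
```

Under these assumptions the parts sum to the reported 4,794 nt; a different stop-codon convention would shift the total by 3 nt per ORF.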
Yao, Xiaohong; Tang, Ping; Li, Zuozhou; Li, Dawei; Liu, Yifei; Huang, Hongwen
2015-01-01
Actinidia chinensis is an important economic plant belonging to the basal lineage of the asterids. Availability of a complete Actinidia chloroplast genome sequence is crucial to understanding phylogenetic relationships among major lineages of angiosperms and facilitates kiwifruit genetic improvement. We report here the complete nucleotide sequences of the chloroplast genomes for Actinidia chinensis and A. chinensis var. deliciosa obtained through de novo assembly of Illumina paired-end reads produced by total DNA sequencing. The total genome size ranges from 155,446 to 157,557 bp, with an inverted repeat (IR) of 24,013 to 24,391 bp, a large single copy region (LSC) of 87,984 to 88,337 bp and a small single copy region (SSC) of 20,332 to 20,336 bp. The genome encodes 113 different genes, including 79 unique protein-coding genes, 30 tRNA genes and 4 ribosomal RNA genes, with 16 duplicated in the inverted repeats, and a tRNA gene (trnfM-CAU) duplicated once in the LSC region. Comparisons of IR boundaries among four asterid species showed that IR/LSC borders were extended into the 5' portion of the psbA gene and IR contraction occurred in Actinidia. The clpP gene has been lost from the chloroplast genome in Actinidia, and may have been transferred to the nucleus during chloroplast evolution. Twenty-seven polymorphic simple sequence repeat (SSR) loci were identified in the Actinidia chloroplast genome. Maximum parsimony analyses of a 72-gene, 16-taxon angiosperm dataset strongly support the placement of Actinidiaceae in Ericales within the basal asterids.
Yamamoto, Eiji; Ito, Toshihiro; Ito, Hiroshi
2016-11-01
The nucleotide sequences of nucleocapsid protein (N); phosphoprotein (P); matrix protein (M); hemagglutinin-neuraminidase (HN); and large polymerase protein (L) genes, 3'-end leader, 5'-end trailer and intergenic regions of the avian paramyxovirus (APMV) strain goose/Shimane/67/2000 (APMV/Shimane67) were determined. Together with previously reported data on fusion protein (F) gene sequence [46], the determination of the genome sequence of APMV/Shimane67 has been completed in this study. The genome of APMV/Shimane67 comprised 16,146 nucleotides in length and contains six genes in the order of 3'-N-P-M-F-HN-L-5'. The features of the APMV/Shimane67 genome (e.g., nucleotide length of whole genome and each of the six genes, and predicted amino acid length of each of the six genes) were distinct from those of other APMV serotypes. Phylogenetic analysis indicated that although APMV/Shimane67 was grouped with APMV-1, -9 and -12, the evolutionary distance between APMV/Shimane67 and these viruses was longer than that observed between intra-serotype viruses. These results show that the genome sequence of APMV/Shimane67 contains specific characteristics and is distinguishable from other types of APMV.
Li, Juan; Chen, Fen; Sugiyama, Hiromu; Blair, David; Lin, Rui-Qing; Zhu, Xing-Quan
2015-07-01
In the present study, near-complete mitochondrial (mt) genome sequences for Schistosoma japonicum from different regions in the Philippines and Japan were amplified and sequenced. Comparisons among S. japonicum from the Philippines, Japan, and China revealed a geographically based length difference in mt genomes, but the mt genomic organization and gene arrangement were the same. Sequence differences among samples from the Philippines and all samples from the three endemic areas were 0.57-2.12 and 0.76-3.85 %, respectively. The most variable part of the mt genome was the non-coding region. In the coding portion of the genome, protein-coding genes varied more than rRNA genes and tRNAs. The near-complete mt genome sequences for Philippine specimens were identical in length (14,091 bp) which was 4 bp longer than those of S. japonicum samples from Japan and China. This indel provides a unique genetic marker for S. japonicum samples from the Philippines. Phylogenetic analyses based on the concatenated amino acids of 12 protein-coding genes showed that samples of S. japonicum clustered according to their geographical origins. The identified mitochondrial indel marker will be useful for tracing the source of S. japonicum infection in humans and animals in Southeast Asia.
Samuel, Arthur S.; Kumar, Sachin; Madhuri, Subbiah; Collins, Peter L.; Samal, Siba K.
2009-01-01
The complete genome consensus sequence was determined for avian paramyxovirus (APMV) serotype 9 prototype strain PMV-9/domestic Duck/New York/22/78. The genome is 15,438 nucleotides (nt) long and encodes six non-overlapping genes in the order of 3′-N-P/V/W-M-F-HN-L-5′ with intergenic regions of 0–30 nt. The genome length follows the “rule of six” and contains a 55-nt leader sequence at the 3′ end and a 47-nt trailer sequence at the 5′ end. The cleavage site of the F protein is I-R-E-G-R-I↓F, which does not conform to the conventional cleavage site of the ubiquitous cellular protease furin. The virus required exogenous protease for in vitro replication and grew only in a few established cell lines, indicating a restricted host range. Alignment and phylogenetic analysis of the predicted amino acid sequences of APMV-9 proteins with the cognate proteins of viruses of all five genera of family Paramyxoviridae showed that APMV-9 is more closely related to APMV-1 than to other APMVs. The mean death time in embryonated chicken eggs was found to be more than 120 h, indicating APMV-9 to be avirulent for chickens. PMID:19185593
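The "rule of six" mentioned above states that the genome length is an exact multiple of six nucleotides. As a small illustrative sketch (the function name is hypothetical and not drawn from the study), the check is a single modulus operation:

```python
def follows_rule_of_six(genome_length_nt: int) -> bool:
    """Return True when the genome length is an exact multiple of six."""
    return genome_length_nt % 6 == 0


# APMV-9 prototype strain length reported in the abstract above.
APMV9_LENGTH_NT = 15_438

print(follows_rule_of_six(APMV9_LENGTH_NT))   # multiple of 6 -> True
print(APMV9_LENGTH_NT // 6)                   # number of six-nucleotide units
```

The 15,438-nt genome corresponds to 2,573 six-nucleotide units, consistent with the statement in the abstract.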
Birth and death of genes linked to chromosomal inversion
Furuta, Yoshikazu; Kawai, Mikihiko; Yahara, Koji; Takahashi, Noriko; Handa, Naofumi; Tsuru, Takeshi; Oshima, Kenshiro; Yoshida, Masaru; Azuma, Takeshi; Hattori, Masahira; Uchiyama, Ikuo; Kobayashi, Ichizo
2011-01-01
The birth and death of genes is central to adaptive evolution, yet the underlying genome dynamics remain elusive. The availability of closely related complete genome sequences helps to follow changes in gene contents and clarify their relationship to overall genome organization. Helicobacter pylori, bacteria in our stomach, are known for their extreme genome plasticity through mutation and recombination and will make a good target for such an analysis. In comparing their complete genome sequences, we found that gain and loss of genes (loci) for outer membrane proteins, which mediate host interaction, occurred at breakpoints of chromosomal inversions. Sequence comparison there revealed a unique mechanism of DNA duplication: DNA duplication associated with inversion. In this process, a DNA segment at one chromosomal locus is copied and inserted, in an inverted orientation, into a distant locus on the same chromosome, while the entire region between these two loci is also inverted. Recognition of this and three more inversion modes, which occur through reciprocal recombination between long or short sequence similarity or adjacent to a mobile element, allowed reconstruction of synteny evolution through inversion events in this species. These results will guide the interpretation of extensive DNA sequencing results for understanding long- and short-term genome evolution in various organisms and in cancer cells. PMID:21212362
Gene structure and evolution of transthyretin in the order Chiroptera.
Khwanmunee, Jiraporn; Leelawatwattana, Ladda; Prapunpoj, Porntip
2016-02-01
Bats are mammals in the order Chiroptera. Although many extensive morphologic and molecular genetic analyses have been attempted, the phylogenetic relationships of bats have not been completely resolved. The paraphyly of microbats is particularly controversial and needs to be confirmed. In this study, we attempted to use the nucleotide sequence of transthyretin (TTR) intron 1 to resolve the relationship among bats. To explore its utility, the complete sequences of the TTR gene and intron 1 region of bats in Vespertilionidae: genus Eptesicus (Eptesicus fuscus) and genus Myotis (Myotis brandtii, Myotis davidii, and Myotis lucifugus), and Pteropodidae (Pteropus alecto and Pteropus vampyrus) were extracted from the retrieved sequences, whereas those of Rhinolophus affinis and Scotophilus kuhlii were amplified and sequenced. The derived overall amino acid sequences of bat TTRs were found to be very similar to those in other eutherians but differed from those in other classes of vertebrates. However, amino acids were found to be missing from the N-terminal or C-terminal region. The phylogenetic analysis of amino acid sequences suggested that bat and other eutherian TTRs descend from a single most recent common ancestor, which differed from those of non-placental mammals and the other classes of vertebrates. The splicing of bat TTR precursor mRNAs was similar to that of other eutherians but different from that of marsupials, birds, reptiles and amphibians. Based on the TTR intron 1 sequence, the inferred evolutionary relationship within Chiroptera revealed that R. affinis is more closely related to megabats than to microbats. Accordingly, the paraphyly of microbats was suggested.
Complete genome sequence of a tomato infecting tomato mottle mosaic virus in New York
USDA-ARS's Scientific Manuscript database
Complete genome sequence of an emerging isolate of tomato mottle mosaic virus (ToMMV) infecting experimental Nicotiana benthamiana plants in upstate New York was obtained using small RNA deep sequencing. ToMMV_NY-13 shared 99% sequence identity to ToMMV isolates from Mexico and Florida. Broader d...
Near complete genome sequence of Clostridium paradoxum strain JW-YL-7
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lancaster, Andrew; Utturkar, Sagar M.; Poole, Farris
2016-05-05
Clostridium paradoxum strain JW-YL-7 is a moderately thermophilic anaerobic alkaliphile isolated from the municipal sewage treatment plant in Athens, GA. We report the near-complete genome sequence of C. paradoxum strain JW-YL-7 obtained by using PacBio DNA sequencing and Pilon for sequence assembly refinement with Illumina data.
Dash, Paban Kumar; Sharma, Shashi; Soni, Manisha; Agarwal, Ankita; Sahni, Ajay Kumar; Parida, Manmohan
2015-01-02
Dengue is now hyper-endemic in most parts of south and southeast Asia including India. Northern India, particularly the national capital New Delhi, has witnessed major dengue outbreaks with Dengue virus type 1 (DENV-1) as the dominant serotype over the last five years. This study was initiated to decipher the complete genome information of recently circulating DENV-1 (2009-2011) along with the prototype Indian DENV-1, isolated in 1956. Further extensive ML phylogenetic and Bayesian phylogeography analysis was carried out to investigate the evolution of this virus and understand its spatiotemporal diffusion across the globe. The complete genome analysis revealed deletion of a unique 21-nucleotide stretch in the 3' untranslated region of recent Indian DENV-1. The north Indian DENV-1 revealed up to 5.2% nucleotide sequence difference compared to recent isolates from southern India. Selection pressure analysis revealed positive selection in a few amino acid sites of both structural and non-structural proteins. The molecular phylogeny classified the Indian DENV-1 into genotype III, which is also known as the cosmopolitan genotype. The northern and southern Indian DENV-1 were grouped into distinct clades. The molecular clock analysis estimated a mean evolutionary rate of 7.08 × 10⁻⁴ substitutions/site/year for the cosmopolitan genotype. The phylogeography analysis revealed that the cosmopolitan genotype DENV-1 originated ∼1938 in India and subsequently spread globally. The diffusion of the virus from India to the Caribbean and South America was confirmed through SPREAD analysis. This study also confirmed the temporal displacement of different clades of DENV-1 in India over the last five decades. Copyright © 2014 Elsevier B.V. All rights reserved.
Primer development to obtain complete coding sequence of HA and NA genes of influenza A/H3N2 virus.
Agustiningsih, Agustiningsih; Trimarsanto, Hidayat; Setiawaty, Vivi; Artika, I Made; Muljono, David Handojo
2016-08-30
Influenza is an acute respiratory illness and has become a serious public health problem worldwide. The need to study the HA and NA genes in influenza A virus is essential since these genes frequently undergo mutations. This study describes the development of primer sets for RT-PCR to obtain complete coding sequence of Hemagglutinin (HA) and Neuraminidase (NA) genes of influenza A/H3N2 virus from Indonesia. The primers were developed based on influenza A/H3N2 sequence worldwide from Global Initiative on Sharing All Influenza Data (GISAID) and further tested using Indonesian influenza A/H3N2 archived samples of influenza-like illness (ILI) surveillance from 2008 to 2009. An optimum RT-PCR condition was acquired for all HA and NA fragments designed to cover complete coding sequence of HA and NA genes. A total of 71 samples were successfully sequenced for complete coding sequence both of HA and NA genes out of 145 samples of influenza A/H3N2 tested. The developed primer sets were suitable for obtaining complete coding sequences of HA and NA genes of Indonesian samples from 2008 to 2009.
Generation and analysis of expressed sequence tags from the bone marrow of Chinese Sika deer.
Yao, Baojin; Zhao, Yu; Zhang, Mei; Li, Juan
2012-03-01
Sika deer is one of the best-known and highly valued animals of China. Despite its economic, cultural, and biological importance, there has not been a large-scale sequencing project for Sika deer to date. With the ultimate goal of sequencing the complete genome of this organism, we first established a bone marrow cDNA library for Sika deer and generated a total of 2,025 reads. After processing the sequences, 2,017 high-quality expressed sequence tags (ESTs) were obtained. These ESTs were assembled into 1,157 unigenes, including 238 contigs and 919 singletons. Comparative analyses indicated that 888 (76.75%) of the unigenes had significant matches to sequences in the non-redundant protein database. In addition to highly expressed genes, such as stearoyl-CoA desaturase, cytochrome c oxidase, adipocyte-type fatty acid-binding protein, adiponectin and thymosin beta-4, we also obtained vascular endothelial growth factor-A and heparin-binding growth-associated molecule, both of which are of great importance for angiogenesis research. There were 244 (21.09%) unigenes with no significant match to any sequence in current protein or nucleotide databases, and these sequences may represent genes with unknown function in Sika deer. Open reading frame analysis of the sequences was performed using the getorf program. In addition, the sequences were functionally classified using the gene ontology hierarchy, clusters of orthologous groups of proteins and Kyoto encyclopedia of genes and genomes databases. Analysis of ESTs described in this paper provides an important resource for the transcriptome exploration of Sika deer, and will also facilitate further studies on functional genomics, gene discovery and genome annotation of Sika deer.
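The unigene counts and percentages in this abstract can be re-derived directly from the reported numbers. The snippet below is only an illustrative consistency check; the variable and function names are hypothetical and nothing here reproduces the authors' pipeline.

```python
# Counts reported in the abstract above.
CONTIGS = 238
SINGLETONS = 919
UNIGENES = 1_157
PROTEIN_DB_HITS = 888    # unigenes matching the non-redundant protein database
NO_MATCH = 244           # unigenes with no match in protein or nucleotide databases


def percent_of_unigenes(count: int) -> float:
    """Share of all unigenes, as a percentage rounded to two decimals."""
    return round(100 * count / UNIGENES, 2)


print(CONTIGS + SINGLETONS == UNIGENES)       # contigs + singletons = unigenes
print(percent_of_unigenes(PROTEIN_DB_HITS))   # reported as 76.75%
print(percent_of_unigenes(NO_MATCH))          # reported as 21.09%
```

Both percentages agree with the abstract when taken relative to the 1,157 unigenes.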
Holm, Kåre Olav; Nilsson, Kristina; Hjerde, Erik; Willassen, Nils-Peder; Milton, Debra L
2015-01-01
Vibrio anguillarum causes a fatal hemorrhagic septicemia in marine fish that leads to great economical losses in aquaculture world-wide. Vibrio anguillarum strain NB10 serotype O1 is a Gram-negative, motile, curved rod-shaped bacterium, isolated from a diseased fish on the Swedish coast of the Gulf of Bothnia, and is slightly halophilic. Strain NB10 is a virulent isolate that readily colonizes fish skin and intestinal tissues. Here, the features of this bacterium are described and the annotation and analysis of its complete genome sequence is presented. The genome is 4,373,835 bp in size, consists of two circular chromosomes and one plasmid, and contains 3,783 protein-coding genes and 129 RNA genes.
See-Too, Wah Seng; Lim, Yan-Lue; Ee, Robson; Convey, Peter; Pearce, David A; Yin, Wai-Fong; Chan, Kok Gan
2016-03-20
Pseudomonas sp. strain L10.10 (=DSM 101070) is a psychrotolerant bacterium which was isolated from Lagoon Island, Antarctica. Analysis of its complete genome sequence indicates its possible role as a plant-growth promoting bacterium, including nitrogen-fixing ability and indole acetic acid (IAA)-producing trait, with additional suggestion of plant disease prevention attributes via hydrogen cyanide production. Copyright © 2016 Elsevier B.V. All rights reserved.
The Complete Chloroplast Genome of Wild Rice (Oryza minuta) and Its Comparison to Related Species.
Asaf, Sajjad; Waqas, Muhammad; Khan, Abdul L; Khan, Muhammad A; Kang, Sang-Mo; Imran, Qari M; Shahzad, Raheem; Bilal, Saqib; Yun, Byung-Wook; Lee, In-Jung
2017-01-01
Oryza minuta, a tetraploid wild relative of cultivated rice (family Poaceae), possesses a BBCC genome and contains genes that confer resistance to bacterial blight (BB) and white-backed (WBPH) and brown (BPH) plant hoppers. Based on the importance of this wild species, this study aimed to understand the phylogenetic relationships of O. minuta with other Oryza species through an in-depth analysis of the composition and diversity of the chloroplast (cp) genome. The analysis revealed a cp genome size of 135,094 bp with a typical quadripartite structure consisting of a pair of inverted repeats separated by small and large single copies, 139 representative genes, and 419 randomly distributed microsatellites. The genomic organization, gene order, GC content and codon usage are similar to those of typical angiosperm cp genomes. Approximately 30 forward, 28 tandem and 20 palindromic repeats were detected in the O. minuta cp genome. Comparison of the complete O. minuta cp genome with eleven other Oryza species showed a high degree of sequence similarity and relatively high divergence of intergenic spacers. Phylogenetic analyses based on the complete genome sequence, 65 shared genes and the matK gene showed the same topologies, and O. minuta forms a single clade with parental O. punctata. Thus, the complete O. minuta cp genome provides interesting insights and valuable information that can be used to identify related species and reconstruct its phylogeny.
Lari, Martina; Rizzi, Ermanno; Mona, Stefano; Corti, Giorgio; Catalano, Giulio; Chen, Kefei; Vernesi, Cristiano; Larson, Greger; Boscato, Paolo; De Bellis, Gianluca; Cooper, Alan; Caramelli, David; Bertorelle, Giorgio
2011-01-31
Bos primigenius, the aurochs, is the wild ancestor of modern cattle breeds and was formerly widespread across Eurasia and northern Africa. After a progressive decline, the species became extinct in 1627. The origin of modern taurine breeds in Europe is debated. Archaeological and early genetic evidence point to a single Near Eastern origin and a subsequent spread during the diffusion of herding and farming. More recent genetic data are instead compatible with local domestication events or at least some level of local introgression from the aurochs. Here we present the analysis of the complete mitochondrial genome of a pre-Neolithic Italian aurochs. In this study, we applied a combined strategy employing both multiplex PCR amplifications and 454 pyrosequencing technology to sequence the complete mitochondrial genome of an 11,450-year-old aurochs specimen from Central Italy. Phylogenetic analysis of the aurochs mtDNA genome supports the conclusions from previous studies of short mtDNA fragments--namely that Italian aurochsen were genetically very similar to modern cattle breeds, but highly divergent from the North-Central European aurochsen. Complete mitochondrial genome sequences are now available for several modern cattle and two pre-Neolithic mtDNA genomes from very different geographic areas. These data suggest that previously identified sub-groups within the widespread modern cattle mitochondrial T clade are polyphyletic, and they support the hypothesis that modern European breeds have multiple geographic origins.
Xie, Qing; Shen, Kang-Ning; Hao, Xiuying; Nam, Phan Nhut; Ngoc Hieu, Bui Thi; Chen, Ching-Hung; Zhu, Changqing; Lin, Yen-Chang; Hsiao, Chung-Der
2017-03-01
We decoded the complete chloroplast DNA (cpDNA) sequence of the Tianshan Snow Lotus (Saussurea involucrata), a famous traditional Chinese medicinal plant of the family Asteraceae, by using next-generation sequencing technology. The genome consists of 152 490 bp containing a pair of inverted repeats (IRs) of 25 202 bp, which are separated by a large single-copy region and a small single-copy region of 83 446 bp and 18 639 bp, respectively. The genic regions account for 57.7% of the whole cpDNA, and the GC content of the cpDNA was 37.7%. The S. involucrata cpDNA encodes 114 unigenes (82 protein-coding genes, 4 rRNA genes, and 28 tRNA genes). There are eight protein-coding genes (atpF, ndhA, ndhB, rpl2, rpoC1, rps16, clpP, and ycf3) and five tRNA genes (trnA-UGC, trnI-GAU, trnK-UUU, trnL-UAA, and trnV-UAC) containing introns. A phylogenetic analysis of the 11 complete cpDNAs from Asteraceae showed that S. involucrata is closely related to Centaurea diffusa (Diffuse Knapweed). The complete cpDNA of S. involucrata provides essential and important DNA molecular data for further phylogenetic and evolutionary analysis for Asteraceae.
Sequence Analysis of the Genome of Carnation (Dianthus caryophyllus L.)
Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi
2014-01-01
The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. ‘Francesco’ was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568 887 315 bp, consisting of 45 088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16 644 bp and 60 737 bp, respectively, and the longest scaffold was 1 287 144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. PMID:24344172
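The stated 91% coverage follows from the assembled length and the k-mer-based genome size estimate. A minimal sketch of that arithmetic is given below; the names are hypothetical, and it assumes "622 Mb" means 622 × 10^6 bp.

```python
# Figures reported in the abstract above.
ASSEMBLED_BP = 568_887_315          # total length of non-redundant sequences
ESTIMATED_GENOME_BP = 622_000_000   # assumption: 622 Mb taken as 622 x 10^6 bp


def assembly_coverage_percent(assembled_bp: int, genome_bp: int) -> float:
    """Fraction of the estimated genome represented in the assembly, in percent."""
    return 100 * assembled_bp / genome_bp


coverage = assembly_coverage_percent(ASSEMBLED_BP, ESTIMATED_GENOME_BP)
print(round(coverage))   # matches the reported 91%
```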
A novel tandem repeat sequence located on human chromosome 4p: isolation and characterization.
Kogi, M; Fukushige, S; Lefevre, C; Hadano, S; Ikeda, J E
1997-06-01
In an effort to analyze the genomic region of the distal half of human chromosome 4p, to where Huntington disease and other diseases have been mapped, we have isolated the cosmid clone (CRS447) that was likely to contain a region with specific repeat sequences. Clone CRS447 was subjected to detailed analysis, including chromosome mapping, restriction mapping, and DNA sequencing. Chromosome mapping by both a human-CHO hybrid cell panel and FISH revealed that CRS447 was predominantly located in the 4p15.1-15.3 region. CRS447 was shown to consist of tandem repeats of 4.7-kb units present on chromosome 4p. A single EcoRI unit was subcloned (pRS447), and the complete sequence was determined as 4752 nucleotides. When pRS447 was used as a probe, the number of copies of this repeat per haploid genome was estimated to be 50-70. Sequence analysis revealed that it contained two internal CA repeats and one putative ORF. Database search established that this sequence was unreported. However, two homologous STS markers were found in the database. We concluded that CRS447/pRS447 is a novel tandem repeat sequence that is mainly specific to human chromosome 4p.
Zheng, Yang; Cai, Jing; Li, JianWen; Li, Bo; Lin, Runmao; Tian, Feng; Wang, XiaoLing; Wang, Jun
2010-01-01
A 10-fold BAC library for giant panda was constructed, and nine BACs were selected to generate finished sequences. These BACs could be used as a validation resource for the accuracy of the de novo assembly of giant panda whole-genome shotgun reads newly generated by Illumina GA sequencing technology. Complete Sanger sequencing, assembly, annotation and comparative analysis were carried out on the selected BACs, which have a combined length of 878 kb. Homology search and de novo prediction methods were used to annotate genes and repeats. Twelve protein-coding genes were predicted, seven of which could be functionally annotated. The seven genes have an average gene size of about 41 kb, an average coding size of about 1.2 kb and an average of 6 exons per gene. In addition, seven tRNA genes were found. About 27 percent of the BAC sequence is composed of repeats. A phylogenetic tree was constructed using the neighbor-joining algorithm across five species (giant panda, human, dog, cat and mouse), which reconfirms dog as the species most closely related to giant panda. Our results provide detailed sequence and structure information for new genes and repeats of giant panda, which will be helpful for further studies on the giant panda.
2011-01-01
Background: Human Rhinoviruses (HRVs) are well recognized viral pathogens associated with acute respiratory tract illnesses (RTIs) abundant worldwide. Although recent studies have phylogenetically identified the new HRV species (HRV-C), data on molecular epidemiology, genetic diversity, and clinical manifestation have been limited. Results: To gain new insight into HRV genetic diversity, we determined the complete coding sequences of putative new members of HRV species C (HRV-CU072 with 1% prevalence) and HRV-B (HRV-CU211) identified from clinical specimens collected from pediatric patients diagnosed with a symptom of acute lower RTI. Complete coding sequence and phylogenetic analysis revealed that the HRV-CU072 strain shared a recent common ancestor with the most closely related Chinese strain (N4). Comparative analysis at the protein level showed that HRV-CU072 might accumulate substitutional mutations in structural proteins, as well as nonstructural proteins 3C and 3D. Comparative analysis of all available HRVs and HEVs indicated that HRV-C contains a relatively high G+C content and is more closely related to HEV-D. This might be correlated to their replication and capability to adapt to the high temperature environment of the human lower respiratory tract. We herein report an infrequently occurring intra-species recombination event in HRV-B species (HRV-CU211) with a crossing over having taken place at the boundary of VP2 and VP3 genes. Moreover, we observed phylogenetic compatibility in all HRV species and suggest that dynamic mechanisms for HRV evolution seem to be related to recombination events. These findings indicated that the elementary units shaping the genetic diversity of HRV-C could be found in the nonstructural 2A and 3D genes. Conclusion: This study provides information for understanding HRV genetic diversity and insight into the role of selection pressure and recombination mechanisms influencing HRV evolution. PMID:21214911
Linsuwanon, Piyada; Payungporn, Sunchai; Suwannakarn, Kamol; Chieochansin, Thaweesak; Theamboonlers, Apiradee; Poovorawan, Yong
2011-01-07
Human Rhinoviruses (HRVs) are well recognized viral pathogens associated with acute respiratory tract illnesses (RTIs) abundant worldwide. Although recent studies have phylogenetically identified the new HRV species (HRV-C), data on molecular epidemiology, genetic diversity, and clinical manifestation have been limited. To gain new insight into HRV genetic diversity, we determined the complete coding sequences of putative new members of HRV species C (HRV-CU072 with 1% prevalence) and HRV-B (HRV-CU211) identified from clinical specimens collected from pediatric patients diagnosed with a symptom of acute lower RTI. Complete coding sequence and phylogenetic analysis revealed that the HRV-CU072 strain shared a recent common ancestor with the most closely related Chinese strain (N4). Comparative analysis at the protein level showed that HRV-CU072 might accumulate substitutional mutations in structural proteins, as well as nonstructural proteins 3C and 3D. Comparative analysis of all available HRVs and HEVs indicated that HRV-C contains a relatively high G+C content and is more closely related to HEV-D. This might be correlated to their replication and capability to adapt to the high temperature environment of the human lower respiratory tract. We herein report an infrequently occurring intra-species recombination event in HRV-B species (HRV-CU211) with a crossing over having taken place at the boundary of VP2 and VP3 genes. Moreover, we observed phylogenetic compatibility in all HRV species and suggest that dynamic mechanisms for HRV evolution seem to be related to recombination events. These findings indicated that the elementary units shaping the genetic diversity of HRV-C could be found in the nonstructural 2A and 3D genes. This study provides information for understanding HRV genetic diversity and insight into the role of selection pressure and recombination mechanisms influencing HRV evolution.
Lyssavirus in Indian Flying Foxes, Sri Lanka
Gunawardena, Panduka S.; Marston, Denise A.; Ellis, Richard J.; Wise, Emma L.; Karawita, Anjana C.; Breed, Andrew C.; McElhinney, Lorraine M.; Johnson, Nicholas; Banyard, Ashley C.
2016-01-01
A novel lyssavirus was isolated from brains of Indian flying foxes (Pteropus medius) in Sri Lanka. Phylogenetic analysis of complete virus genome sequences, and geographic location and host species, provides strong evidence that this virus is a putative new lyssavirus species, designated as Gannoruwa bat lyssavirus. PMID:27434858