partial genomic sequence: Topics by Science.gov

Sample records for partial genomic sequence

Genome-wide comparisons of phylogenetic similarities between partial genomic regions and the full-length genome in Hepatitis E virus genotyping.

PubMed

Wang, Shuai; Wei, Wei; Luo, Xuenong; Cai, Xuepeng

2014-01-01

Besides the complete genome, various partial genomic sequences of Hepatitis E virus (HEV) have been used in genotyping studies, which makes their results difficult to compare. No commonly agreed partial region for HEV genotyping has been established. In this study, we used a statistical method to evaluate the phylogenetic performance of each partial genomic region across the whole genome. We compared evolutionary distances computed from genomic regions with those computed from the full-length genomes of 101 HEV isolates, in order to identify short genomic regions that can reproduce HEV genotype assignments based on full-length genomes. Several genomic regions, especially one at the 3' terminus of the papain-like cysteine protease domain, showed relatively high phylogenetic correlation with the full-length genome. Phylogenetic analyses confirmed that these regions performed equivalently to the full-length genome in genotyping, dividing the HEV isolates involved into reasonable genotypes. This analysis may be of value in developing a consensus, partial sequence-based classification of HEV species.
Evaluation of anonymous and expressed sequence tag derived polymorphic microsatellite markers in the tobacco budworm Heliothis virescens (Lepidoptera: Noctuidae)

USDA-ARS's Scientific Manuscript database

Polymorphic genetic markers were identified and characterized using a partial genomic library of Heliothis virescens enriched for simple sequence repeats (SSR) and nucleotide sequences of expressed sequence tags (EST). Nucleotide sequences of 192 clones from the partial genomic library yielded 147 u...
Using Partial Genomic Fosmid Libraries for Sequencing Complete Organellar Genomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

McNeal, Joel R.; Leebens-Mack, James H.; Arumuganathan, K.

2005-08-26

Organellar genome sequences provide numerous phylogenetic markers and yield insight into organellar function and molecular evolution. These genomes are much smaller in size than their nuclear counterparts; thus, their complete sequencing is much less expensive than total nuclear genome sequencing, making broader phylogenetic sampling feasible. However, for some organisms it is challenging to isolate plastid DNA for sequencing using standard methods. To overcome these difficulties, we constructed partial genomic libraries from total DNA preparations of two heterotrophic and two autotrophic angiosperm species using fosmid vectors. We then used macroarray screening to isolate clones containing large fragments of plastid DNA. A minimum tiling path of clones comprising the entire genome sequence of each plastid was selected, and these clones were shotgun-sequenced and assembled into complete genomes. Although this method worked well for both heterotrophic and autotrophic plants, nuclear genome size had a dramatic effect on the proportion of screened clones containing plastid DNA and, consequently, the overall number of clones that must be screened to ensure full plastid genome coverage. This technique makes it possible to determine complete plastid genome sequences for organisms that defy other available organellar genome sequencing methods, especially those for which limited amounts of tissue are available.
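
Selecting a minimum tiling path is an instance of the interval-cover problem. The sketch below is a hypothetical illustration, not the authors' procedure. It assumes each fosmid clone has already been placed on linear coordinates of the plastid genome, with the circular genome opened at an arbitrary origin. It uses the standard greedy rule of always taking the clone that extends coverage farthest. The clone names and coordinates are invented.

```python
def minimum_tiling_path(clones: list[tuple[str, int, int]], genome_length: int) -> list[str]:
    """Greedy minimum set of clones covering [0, genome_length).

    Each clone is (name, start, end) on a linear coordinate system, end exclusive.
    Raises ValueError if the clones leave a gap.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])
    path: list[str] = []
    covered = 0  # everything before this position is already tiled
    i = 0
    while covered < genome_length:
        best = None
        # Among clones starting inside the covered region, keep the one reaching farthest.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"no clone extends coverage past position {covered}")
        path.append(best[0])
        covered = best[2]
    return path


# Hypothetical clone coordinates (kb) on a 120-kb plastid genome.
clones = [("f1", 0, 40), ("f2", 10, 35), ("f3", 30, 70),
          ("f4", 38, 90), ("f5", 85, 120), ("f6", 60, 100)]
print(minimum_tiling_path(clones, 120))  # ['f1', 'f4', 'f5']
```

The greedy choice is optimal for covering a line with intervals. Clones skipped at each step start inside the covered region but end before the chosen clone does, so they can never extend coverage later.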
Complete Genomic Sequence and Comparative Analysis of the Genome Segments of Sweet Potato Chlorotic Stunt Virus in China

PubMed Central

Qin, Yanhong; Wang, Li; Zhang, Zhenchen; Qiao, Qi; Zhang, Desheng; Tian, Yuting; Wang, Shuang; Wang, Yongjiang; Yan, Zhaoling

2014-01-01

Background: Sweet potato chlorotic stunt virus (SPCSV; family Closteroviridae, genus Crinivirus) has a large bipartite, single-stranded, positive-sense RNA genome. To date, only three complete genomic sequences of SPCSV are available in GenBank. SPCSV was first detected in China in 2011, and only partial genomic sequences have been determined in the country. No report is available on the complete genomic sequence and genome structure of Chinese SPCSV isolates, or on the genetic relationship between isolates from China and other countries. Methodology/Principal Findings: The complete genomic sequences of five isolates from different areas of China were characterized. This study is the first to report complete genome sequences of SPCSV from whitefly vectors. Genome structure analysis showed that isolates of the WA and EA strains from China encode the same proteins as isolates Can181-9 and m2-47, respectively. Twenty cp genes and four partial RNA1 segments were sequenced and analyzed, and the nucleotide identities of the complete genomic, cp, and partial RNA1 sequences were determined. The results indicated high conservation within strains and significant differences between the WA and EA strains. Genetic analysis demonstrated that, except for isolates from Guangdong Province, SPCSV isolates from other areas belong to the WA strain. Genome organization analysis showed that the isolates in this study lack the p22 gene. Conclusions/Significance: We present the complete genome sequences of SPCSV in China. Comparison of nucleotide identities and genome structures between these isolates and previously reported isolates showed slight differences. The nucleotide identities of different SPCSV isolates showed high conservation within strains and significant differences between strains. All nine isolates in this study lacked the p22 gene. WA strains were more widely distributed than EA strains in China. 
These data provide important insights into the molecular variation and genomic structure of SPCSV in China as well as genetic relationships among isolates from China and other countries. PMID:25170926
The first genome sequences of human bocaviruses from Vietnam

PubMed Central

Thanh, Tran Tan; Van, Hoang Minh Tu; Hong, Nguyen Thi Thu; Nhu, Le Nguyen Truc; Anh, Nguyen To; Tuan, Ha Manh; Hien, Ho Van; Tuong, Nguyen Manh; Kien, Trinh Trung; Khanh, Truong Huu; Nhan, Le Nguyen Thanh; Hung, Nguyen Thanh; Chau, Nguyen Van Vinh; Thwaites, Guy; van Doorn, H. Rogier; Tan, Le Van

2017-01-01

As part of an ongoing effort to generate complete genome sequences of hand, foot and mouth disease-causing enteroviruses directly from clinical specimens, two complete coding sequences and two partial genomic sequences of human bocavirus 1 (n=3) and 2 (n=1) were co-amplified and sequenced, representing the first genome sequences of human bocaviruses from Vietnam. The sequences may aid future studies aimed at understanding the evolution of the virus. PMID:28090592
A New Zamilon-like Virophage Partial Genome Assembled from a Bioreactor Metagenome

PubMed Central

Bekliz, Meriem; Verneau, Jonathan; Benamar, Samia; Raoult, Didier; La Scola, Bernard; Colson, Philippe

2015-01-01

Virophages replicate within viral factories in the Acanthamoeba cytoplasm and decrease the infectivity and replication of their associated giant viruses. Culture isolation and metagenome analyses have suggested that they are common in our environment. By screening metagenomic databases for amoebal viruses, we detected virophage-related sequences among sequences generated from the same non-aerated bioreactor metagenome that was recently screened by another team for virophage capsid-encoding genes. We describe here the assembled partial genome of a virophage closely related to Zamilon, which replicates in Acanthamoeba co-infected with mimiviruses of lineages B and C but not A. Searches for sequences related to amoebal giant viruses, other Megavirales representatives and virophages were conducted using BLAST against this bioreactor metagenome (PRJNA73603). Comparative genomic and phylogenetic analyses were performed using sequences from previously identified virophages. A total of 72 metagenome contigs generated from the bioreactor were identified as best matching sequences from Megavirales representatives, mostly Pithovirus sibericum, pandoraviruses and amoebal mimiviruses from the three lineages A–C, as well as from virophages. In addition, a partial genome from a Zamilon-like virophage, which we named Zamilon 2, was assembled. This genome is 6716 base pairs long, corresponding to 39% of the Zamilon genome, and comprises partial or full-length homologs of 15 Zamilon predicted open reading frames (ORFs). Mean nucleotide and amino acid identities of these 15 Zamilon 2 ORFs with their Zamilon counterparts were 89% (range, 81–96%) and 91% (range, 78–99%), respectively. Notably, these ORFs included two encoding a capsid protein and a packaging ATPase. Comparative genomic and phylogenetic analyses indicated that the partial genome was that of a new Zamilon-like virophage. 
Further studies are needed to gain better knowledge of the tropism and prevalence of virophages in our biosphere and in humans. PMID:26640459
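
Per-ORF identities summarized as a mean with a range are usually computed from pairwise alignments. The sketch below shows one common definition. It assumes the alignments already exist and ignores gap-containing columns. Other definitions, such as counting gaps as mismatches or dividing by the shorter sequence length, give different numbers, and the abstract does not say which was used. The sequences are hypothetical, not Zamilon or Zamilon 2 data.

```python
from statistics import mean


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("inputs must be aligned to equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def summarize_identities(pairs: list[tuple[str, str]]) -> tuple[float, float, float]:
    """Mean, minimum and maximum percent identity over a set of aligned ORF pairs."""
    values = [percent_identity(a, b) for a, b in pairs]
    return mean(values), min(values), max(values)


# Hypothetical aligned protein fragments for two ORF pairs.
pairs = [("MKV-LSTAE", "MKVQLSTAE"), ("MARNDCQ", "MARNECQ")]
avg, low, high = summarize_identities(pairs)
print(f"mean {avg:.0f}% (range, {low:.0f}-{high:.0f}%)")
```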
Detection of somatic, subclonal and mosaic CNVs from sequencing | Division of Cancer Prevention

Cancer.gov

Progress in technology has made individual genome sequencing a clinical reality, with partial genome sequencing already in use in clinical care. In fact, it is expected that within a few years whole genome sequencing will be a standard procedure that allows the discovery of personal genomic variants of all types and thus greatly facilitates individualized medicine. However, fast
Partial DNA-guided Cas9 enables genome editing with reduced off-target activity

PubMed Central

Yin, Hao; Song, Chun-Qing; Suresh, Sneha; Kwan, Suet-Yan; Wu, Qiongqiong; Walsh, Stephen; Ding, Junmei; Bogorad, Roman L; Zhu, Lihua Julie; Wolfe, Scot A; Koteliansky, Victor; Xue, Wen; Langer, Robert; Anderson, Daniel G

2018-01-01

CRISPR–Cas9 is a versatile RNA-guided genome editing tool. Here we demonstrate that partial replacement of RNA nucleotides with DNA nucleotides in CRISPR RNA (crRNA) enables efficient gene editing in human cells. This partial DNA replacement strategy retains on-target activity when used with either crRNA or sgRNA, and with multiple guide sequences. Partial DNA replacement also works for the crRNA of Cpf1, another CRISPR system. Through focused analysis of off-target cleavage, measurement of mismatch tolerance and genome-wide profiling of off-target sites, we find that partial DNA replacement in the guide sequence significantly reduces off-target genome editing. Using the structure of the Cas9–sgRNA complex as a guide, we show that the majority of the 3′ end of crRNA can be replaced with DNA nucleotides, and that 5′- and 3′-DNA-replaced crRNA enables efficient genome editing. Cas9 guided by a DNA–RNA chimera may provide a generalized strategy to reduce both cost and off-target genome editing in human cells. PMID:29377001
Genome sequences of Phytophthora enable translational plant disease management and accelerate research

Treesearch

Niklaus J. Grünwald

2012-01-01

Whole and partial genome sequences are becoming available at an ever-increasing pace. For many plant pathogen systems, we are moving into the era of genome resequencing. The first Phytophthora genomes, P. ramorum and P. sojae, became available in 2004, followed shortly by P. infestans...
Genome sequencing of the redbanded stink bug (Piezodorus guildinii)

USDA-ARS's Scientific Manuscript database

We assembled a partial genome sequence from the redbanded stink bug, Piezodorus guildinii from Illumina MiSeq sequencing runs. The sequence has been submitted and published under NCBI GenBank Accession Number JTEQ01000000. The BioProject and BioSample Accession numbers are PRJNA263369 and SAMN030997...
Two Complete Genome Sequences of Phasey Bean Mild Yellows Virus, a Novel Member of the Luteoviridae from Australia

PubMed Central

Kehoe, Monica; Coutts, Brenda; van Leur, Joop; Filardo, Fiona; Thomas, John

2016-01-01

We present here the complete genome sequences of a novel polerovirus from Trifolium subterraneum (subterranean clover) and Cicer arietinum (chickpea) and compare these to a partial viral genome sequence obtained from Macroptilium lathyroides (phasey bean). We propose the name phasey bean mild yellows virus for this novel polerovirus. PMID:26847905
Two Complete Genome Sequences of Phasey Bean Mild Yellows Virus, a Novel Member of the Luteoviridae from Australia.

PubMed

Sharman, Murray; Kehoe, Monica; Coutts, Brenda; van Leur, Joop; Filardo, Fiona; Thomas, John

2016-02-04

We present here the complete genome sequences of a novel polerovirus from Trifolium subterraneum (subterranean clover) and Cicer arietinum (chickpea) and compare these to a partial viral genome sequence obtained from Macroptilium lathyroides (phasey bean). We propose the name phasey bean mild yellows virus for this novel polerovirus. Copyright © 2016 Sharman et al.
First full-length genome sequence of the polerovirus luffa aphid-borne yellows virus (LABYV) reveals the presence of at least two consensus sequences in an isolate from Thailand.

PubMed

Knierim, Dennis; Maiss, Edgar; Kenyon, Lawrence; Winter, Stephan; Menzel, Wulf

2015-10-01

Luffa aphid-borne yellows virus (LABYV) was proposed as the name for a previously undescribed polerovirus based on partial genome sequences obtained from samples of cucurbit plants collected in Thailand between 2008 and 2013. In this study, we determined the first full-length genome sequence of LABYV. Based on phylogenetic analysis and genome properties, it is clear that this virus represents a distinct species in the genus Polerovirus. Analysis of sequences from sample TH24, which was collected in 2010 from a luffa plant in Thailand, reveals the presence of two different full-length genome consensus sequences.
Evaluation of next generation mtGenome sequencing using the Ion Torrent Personal Genome Machine (PGM)

PubMed Central

Parson, Walther; Strobl, Christina; Huber, Gabriela; Zimmermann, Bettina; Gomes, Sibylle M.; Souto, Luis; Fendt, Liane; Delport, Rhena; Langit, Reina; Wootton, Sharon; Lagacé, Robert; Irwin, Jodi

2013-01-01

Insights into the human mitochondrial phylogeny have been primarily achieved by sequencing full mitochondrial genomes (mtGenomes). In forensic genetics (partial) mtGenome information can be used to assign haplotypes to their phylogenetic backgrounds, which may, in turn, have characteristic geographic distributions that would offer useful information in a forensic case. In addition and perhaps even more relevant in the forensic context, haplogroup-specific patterns of mutations form the basis for quality control of mtDNA sequences. The current method for establishing (partial) mtDNA haplotypes is Sanger-type sequencing (STS), which is laborious, time-consuming, and expensive. With the emergence of Next Generation Sequencing (NGS) technologies, the body of available mtDNA data can potentially be extended much more quickly and cost-efficiently. Customized chemistries, laboratory workflows and data analysis packages could support the community and increase the utility of mtDNA analysis in forensics. We have evaluated the performance of mtGenome sequencing using the Personal Genome Machine (PGM) and compared the resulting haplotypes directly with conventional Sanger-type sequencing. A total of 64 mtGenomes (>1 million bases) were established that yielded high concordance with the corresponding STS haplotypes (<0.02% differences). About two-thirds of the differences were observed in or around homopolymeric sequence stretches. In addition, the sequence alignment algorithm employed to align NGS reads played a significant role in the analysis of the data and the resulting mtDNA haplotypes. Further development of alignment software would be desirable to facilitate the application of NGS in mtDNA forensic genetics. PMID:23948325
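
The observation that most NGS/Sanger differences fell in or around homopolymeric stretches can be checked with a simple scan. The sketch below is hypothetical. The minimum run length (4 bases) and the flank (1 base) are assumed thresholds, because the abstract does not define "in or around". Positions are 0-based on the reference sequence.

```python
import re


def homopolymer_runs(seq: str, min_len: int = 4) -> list[tuple[int, int]]:
    """Half-open (start, end) intervals of single-base runs of at least min_len bases."""
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # ([ACGT]) captures a base; \1{k,} requires k or more further copies of it.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


def fraction_near_homopolymers(diff_positions: list[int], reference: str,
                               min_len: int = 4, flank: int = 1) -> float:
    """Fraction of difference positions inside a homopolymer run or within
    `flank` bases of one."""
    if not diff_positions:
        return 0.0
    runs = homopolymer_runs(reference, min_len)
    near = sum(
        any(start - flank <= pos < end + flank for start, end in runs)
        for pos in diff_positions
    )
    return near / len(diff_positions)


# Hypothetical reference fragment and positions where two haplotype calls disagree.
reference = "ACCCCGTTTTTAGA"
print(fraction_near_homopolymers([2, 5, 13], reference))  # 2 of 3 differences lie near a run
```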
Genetic Diversity of Crimean Congo Hemorrhagic Fever Virus Strains from Iran

PubMed Central

Chinikar, Sadegh; Bouzari, Saeid; Shokrgozar, Mohammad Ali; Mostafavi, Ehsan; Jalali, Tahmineh; Khakifirouz, Sahar; Nowotny, Norbert; Fooks, Anthony R.; Shah-Hosseini, Nariman

2016-01-01

Background: Crimean-Congo hemorrhagic fever virus (CCHFV) is a member of the family Bunyaviridae, genus Nairovirus. It has a negative-sense, single-stranded RNA genome of approximately 19.2 kb comprising the Small, Medium, and Large segments. CCHFV strains are relatively divergent in genome sequence and are grouped into seven distinct clades based on S-segment sequence analysis and six clades based on M-segment sequences. Our aim was to obtain new insights into the molecular epidemiology of CCHFV in Iran. Methods: We analyzed partial and complete nucleotide sequences of the S and M segments derived from 50 Iranian patients. The extracted RNA was amplified using one-step RT-PCR and then sequenced. The sequences were analyzed using MEGA5 software. Results: Phylogenetic analysis of partial S-segment sequences showed that clade IV (Asia 1), clade IV (Asia 2) and clade V (Europe) accounted for 80%, 4% and 14%, respectively, of the circulating genomic variants of CCHFV in Iran. However, one Iranian strain (Iran-Kerman/22) did not group with any other sequence and formed a new clade (VII). Phylogenetic analysis of complete S-segment nucleotide sequences from selected Iranian CCHFV strains, together with representative strains from GenBank, revealed a topology similar to that of the partial sequences, with eight major clusters. A partial M-segment phylogeny placed the Iranian strains with either clade III (Asia-Africa) or clade V (Europe). Conclusion: The phylogenetic analysis revealed subtle links between distant geographic locations, which we propose might originate either from international livestock trade or from long-distance carriage of CCHFV by infected ticks via bird migration. PMID:27308271
Molecular characterization of human T-cell lymphotropic virus type 1 full and partial genomes by Illumina massively parallel sequencing technology.

PubMed

Pessôa, Rodrigo; Watanabe, Jaqueline Tomoko; Nukui, Youko; Pereira, Juliana; Casseb, Jorge; Kasseb, Jorge; de Oliveira, Augusto César Penalva; Segurado, Aluisio Cotrim; Sanabani, Sabri Saeed

2014-01-01

Here, we report on the partial and full-length genomic (FLG) variability of HTLV-1 sequences from 90 well-characterized subjects, including 48 HTLV-1 asymptomatic carriers (ACs), 35 patients with HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM/TSP) and 7 patients with adult T-cell leukemia/lymphoma (ATLL), using an Illumina paired-end protocol. Blood samples were collected from the 90 individuals, and DNA was extracted from PBMCs to measure the proviral load and to amplify the HTLV-1 FLG as two overlapping fragments. The amplified PCR products were subjected to deep sequencing. The sequencing data were assembled, aligned, mapped against an HTLV-1 reference genome of sufficient genetic similarity, and used for further phylogenetic analysis. A high-throughput sequencing-by-synthesis instrument was used to obtain an average of 3210- and 5200-fold coverage of the partial (n = 14) and FLG (n = 76) data from the HTLV-1 strains, respectively. Phylogenetic trees of consensus sequences from partial genomes and FLGs revealed that 86 (95.5%) individuals were infected with the transcontinental sub-subtypes of the cosmopolitan subtype (aA) and that 4 individuals (4.5%) were infected with the Japanese sub-subtypes (aB). A comparison of the FLG nucleotide and amino acid sequences across the three clinical settings showed no correlation between the sequenced genotype and clinical outcome. The evolutionary relationships among the HTLV sequences were inferred from nucleotide sequences, and the results are consistent with the hypothesis that the transcontinental subtype was introduced into Brazil multiple times. This study has increased the number of subtype aA full-length genomes from 8 to 81 and of HTLV-1 aB full-length genomes from 2 to 5. The overall data confirmed that the cosmopolitan transcontinental sub-subtypes were the most prevalent in the Brazilian population. 
It is hoped that these genomic data will add to our current understanding of the evolutionary history of this medically important virus.
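
A figure such as "3210-fold coverage" is the mean per-base read depth across the reference. The sketch below computes it from read placements with a difference array. It assumes reads are already mapped as half-open [start, end) intervals on the reference. The record does not describe how coverage was calculated, so the read placements and the 10-bp reference are purely hypothetical.

```python
def depth_profile(alignments: list[tuple[int, int]], ref_length: int) -> list[int]:
    """Per-base read depth from half-open [start, end) alignment intervals,
    computed with a difference array in O(reads + ref_length)."""
    diff = [0] * (ref_length + 1)
    for start, end in alignments:
        start, end = max(0, start), min(ref_length, end)  # clip to the reference
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for pos in range(ref_length):
        running += diff[pos]
        depth.append(running)
    return depth


def mean_fold_coverage(alignments: list[tuple[int, int]], ref_length: int) -> float:
    """Average depth across the reference (the 'N-fold coverage' figure)."""
    return sum(depth_profile(alignments, ref_length)) / ref_length


reads = [(0, 5), (3, 8), (6, 10)]  # hypothetical read placements on a 10-bp reference
print(depth_profile(reads, 10))       # [1, 1, 1, 2, 2, 1, 2, 2, 1, 1]
print(mean_fold_coverage(reads, 10))  # 1.4
```

Keeping the per-base profile, rather than only the mean, also shows uneven regions such as fragment-overlap zones or poorly amplified ends.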
The mitochondrial genome of the Arizona Snowfly Mesocapnia arizonensis (Plecoptera, Capniidae).

PubMed

Elbrecht, Vasco; Leese, Florian

2016-09-01

We assembled the mitochondrial genome of the capniid stonefly Mesocapnia arizonensis (Baumann & Gaufin, 1969) using Illumina HiSeq sequence data. The recovered mitogenome is 14,921 bp in length and includes 13 protein-coding genes, 2 ribosomal RNA genes and 22 transfer RNA genes. The control region could only be assembled partially. Gene order resembles that of basal arthropods. This is the first partial mitogenome sequence for the stonefly superfamily group Euholognatha and will be useful in future phylogenetic analyses.
Partial structure of the phylloxin gene from the giant monkey frog, Phyllomedusa bicolor: parallel cloning of precursor cDNA and genomic DNA from lyophilized skin secretion.

PubMed

Chen, Tianbao; Gagliardo, Ron; Walker, Brian; Zhou, Mei; Shaw, Chris

2005-12-01

Phylloxin is a novel prototype antimicrobial peptide from the skin of Phyllomedusa bicolor. Here, we describe parallel identification and sequencing of phylloxin precursor transcript (mRNA) and partial gene structure (genomic DNA) from the same sample of lyophilized skin secretion using our recently-described cloning technique. The open-reading frame of the phylloxin precursor was identical in nucleotide sequence to that previously reported and alignment with the nucleotide sequence derived from genomic DNA indicated the presence of a 175 bp intron located in a near identical position to that found in the dermaseptins. The highly-conserved structural organization of skin secretion peptide genes in P. bicolor can thus be extended to include that encoding phylloxin (plx). These data further reinforce our assertion that application of the described methodology can provide robust genomic/transcriptomic/peptidomic data without the need for specimen sacrifice.
Genomic Diversity and Evolution of the Lyssaviruses

PubMed Central

Delmas, Olivier; Holmes, Edward C.; Talbi, Chiraz; Larrous, Florence; Dacheux, Laurent; Bouchier, Christiane; Bourhy, Hervé

2008-01-01

Lyssaviruses are RNA viruses with single-strand, negative-sense genomes responsible for rabies-like diseases in mammals. To date, genomic and evolutionary studies have most often utilized partial genome sequences, particularly of the nucleoprotein and glycoprotein genes, with little consideration of genome-scale evolution. Herein, we report the first genomic and evolutionary analysis using complete genome sequences of all recognised lyssavirus genotypes, including 14 new complete genomes of field isolates from 6 genotypes and one genotype that is completely sequenced for the first time. In doing so we significantly increase the extent of genome sequence data available for these important viruses. Our analysis of these genome sequence data reveals that all lyssaviruses have the same genomic organization. A phylogenetic analysis reveals strong geographical structuring, with the greatest genetic diversity in Africa, and an independent origin for the two known genotypes that infect European bats. We also suggest that multiple genotypes may exist within the diversity of viruses currently classified as ‘Lagos Bat’. In sum, we show that rigorous phylogenetic techniques based on full length genome sequence provide the best discriminatory power for genotype classification within the lyssaviruses. PMID:18446239
Rhabdovirus-like endogenous viral elements in the genome of Spodoptera frugiperda insect cells are actively transcribed: Implications for adventitious virus detection.

PubMed

Geisler, Christoph; Jarvis, Donald L

2016-07-01

Spodoptera frugiperda (Sf) cell lines are used to produce several biologicals for human and veterinary use. Recently, it was discovered that all tested Sf cell lines are persistently infected with Sf-rhabdovirus, a novel rhabdovirus. As part of an effort to search for other adventitious viruses, we searched the Sf cell genome and transcriptome for sequences related to Sf-rhabdovirus. To our surprise, we found intact Sf-rhabdovirus N- and P-like ORFs, and partial Sf-rhabdovirus G- and L-like ORFs. The transcribed and genomic sequences matched, indicating the transcripts were derived from the genomic sequences. These appear to be endogenous viral elements (EVEs), which result from the integration of partial viral genetic material into the host cell genome. It is theoretically impossible for the Sf-rhabdovirus-like EVEs to produce infectious virus particles as 1) they are disseminated across 4 genomic loci, 2) the G and L ORFs are incomplete, and 3) the M ORF is missing. Our finding of transcribed virus-like sequences in Sf cells underscores that massively parallel sequencing (MPS)-based searches for adventitious viruses in cell substrates used to manufacture biologics should take into account both genomic and transcribed sequences to facilitate the identification of transcribed EVEs, and to avoid false positive detection of replication-competent adventitious viruses. Copyright © 2016 International Alliance for Biological Standardization. Published by Elsevier Ltd. All rights reserved.

Rhabdovirus-like endogenous viral elements in the genome of Spodoptera frugiperda insect cells are actively transcribed: implications for adventitious virus detection

PubMed Central

Geisler, Christoph; Jarvis, Donald L.

2016-01-01

Spodoptera frugiperda (Sf) cell lines are used to produce several biologicals for human and veterinary use. Recently, it was discovered that all tested Sf cell lines are persistently infected with Sf-rhabdovirus, a novel rhabdovirus. As part of an effort to search for other adventitious viruses, we searched the Sf cell genome and transcriptome for sequences related to Sf-rhabdovirus. To our surprise, we found intact Sf-rhabdovirus N- and P-like ORFs, and partial Sf-rhabdovirus G- and L-like ORFs. The transcribed and genomic sequences matched, indicating the transcripts were derived from the genomic sequences. These appear to be endogenous viral elements (EVEs), which result from the integration of partial viral genetic material into the host cell genome. It is theoretically impossible for the Sf-rhabdovirus-like EVEs to produce infectious virus particles as 1) they are disseminated across 4 genomic loci, 2) the G and L ORFs are incomplete, and 3) the M ORF is missing. Our finding of transcribed virus-like sequences in Sf cells underscores that massively parallel sequencing (MPS)-based searches for adventitious viruses in cell substrates used to manufacture biologics should take into account both genomic and transcribed sequences to facilitate the identification of transcribed EVEs, and to avoid false positive detection of replication-competent adventitious viruses. PMID:27236849
Analysis for complete genomic sequence of HLA-B and HLA-C alleles in the Chinese Han population.

PubMed

Zhu, F; He, Y; Zhang, W; He, J; He, J; Xu, X; Lv, H; Yan, L

2011-08-01

In the present study, we determined the complete genomic sequences and analysed the intron polymorphisms of a subset of HLA-B and HLA-C alleles in the Chinese Han population. DNA fragments of more than 3.0 kb spanning the HLA-B and HLA-C loci, from part of the 5' untranslated region to the 3' noncoding region, were amplified by polymerase chain reaction, and the amplified products were sequenced. Full-length nucleotide sequences of 14 HLA-B alleles and 10 HLA-C alleles were obtained and submitted to GenBank and the IMGT/HLA database. Two novel alleles, HLA-B*52:01:01:02 and HLA-B*59:01:01:02, were identified, and the complete genomic sequence of HLA-B*52:01:01:01 was reported for the first time. In total, 157 and 167 polymorphic positions were found in the full-length genomic sequences of the HLA-B and HLA-C loci, respectively. Our results suggest that many single nucleotide polymorphisms exist in both exon and intron regions, and the data can provide useful information for understanding the evolution of HLA-B and HLA-C alleles. © 2011 Blackwell Publishing Ltd.
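
Counting polymorphic positions, as in the 157 and 167 positions reported for the two loci, amounts to counting alignment columns that contain more than one base. The sketch below is a minimal illustration. It assumes the allele sequences are already aligned and skips gaps and ambiguous Ns. Whether the study counted indels as polymorphisms is not stated. The allele fragments are hypothetical.

```python
def polymorphic_positions(aligned: list[str], ignore: str = "-N") -> list[int]:
    """0-based alignment columns showing more than one base, skipping gaps and Ns."""
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need one or more sequences of equal aligned length")
    positions = []
    for index, column in enumerate(zip(*aligned)):
        bases = {b.upper() for b in column} - set(ignore)
        if len(bases) > 1:
            positions.append(index)
    return positions


alleles = ["ACGT-A", "ACTTGA", "ACGTNA"]  # hypothetical aligned allele fragments
print(polymorphic_positions(alleles))  # [2]
```

Splitting the returned positions by exon and intron coordinates would give the kind of exon/intron breakdown the abstract discusses.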
A Partial Least Squares Based Procedure for Upstream Sequence Classification in Prokaryotes.

PubMed

Mehmood, Tahir; Bohlin, Jon; Snipen, Lars

2015-01-01

The upstream region of coding genes is important for several reasons, for instance for locating transcription factor binding sites and initiation start sites in genomic DNA. Motivated by a recent study in which a multivariate approach was successfully applied to coding sequence modeling, we introduce a partial least squares (PLS) based procedure for distinguishing true prokaryotic upstream sequences from background upstream sequences. The analysis considered the upstream sequences of coding genes conserved across genomes, where conserved coding genes were identified for each prokaryotic species using the pan-genome concept. The PLS procedure uses a position-specific scoring matrix (PSSM) to characterize the upstream region. Results of the PLS-based method were compared with random forest (RF), using Gini importance, and with support vector machines (SVM), a widely used method for sequence classification. Classification performance was evaluated by cross-validation, and the proposed approach identified prokaryotic upstream regions significantly better than RF (p-value < 0.01) and SVM (p-value < 0.01). Furthermore, the proposed method produced results that agreed with known biological characteristics of the upstream region.
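
A PSSM-plus-PLS classifier can be sketched in a few dozen lines. The example below is a simplified, hypothetical version, not the authors' exact procedure. It scores each position of an aligned upstream window with a log2-odds PSSM and fits a single-component PLS1 regression to labels of +1 (true upstream) and -1 (background). A positive prediction is read as "upstream". The PSSM here is built from the same toy positives it classifies. In a real evaluation it should be rebuilt inside each cross-validation fold to avoid leakage. The RF/SVM comparison is not reproduced.

```python
import math

BASES = "ACGT"


def build_pssm(seqs: list[str], pseudocount: float = 1.0,
               background: float = 0.25) -> list[dict[str, float]]:
    """Log2-odds position-specific scoring matrix from equal-length sequences."""
    pssm = []
    for pos in range(len(seqs[0])):
        counts = {b: pseudocount for b in BASES}
        for s in seqs:
            if s[pos] in counts:
                counts[s[pos]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pssm


def pssm_features(seq: str, pssm: list[dict[str, float]]) -> list[float]:
    """Per-position PSSM score of the observed base (0 for non-ACGT symbols)."""
    return [column.get(base, 0.0) for base, column in zip(seq, pssm)]


def fit_pls1(X: list[list[float]], y: list[float]):
    """One-component PLS1: weights w proportional to Xc^T yc, scores t = Xc w, y ~ y_mean + q t."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0:
        raise ValueError("features carry no covariance with the labels")
    w = [v / norm for v in w]
    t = [sum(r[j] * w[j] for j in range(p)) for r in Xc]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return x_mean, y_mean, w, q


def predict_pls1(model, x: list[float]) -> float:
    x_mean, y_mean, w, q = model
    t = sum((xj - mj) * wj for xj, mj, wj in zip(x, x_mean, w))
    return y_mean + q * t


# Hypothetical aligned 6-nt upstream windows: +1 = true upstream, -1 = background.
true_up = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
background = ["GCGCGC", "CGCGCG", "GGCCGG", "CCGGCC"]
pssm = build_pssm(true_up)
X = [pssm_features(s, pssm) for s in true_up + background]
y = [1.0] * len(true_up) + [-1.0] * len(background)
model = fit_pls1(X, y)
for query in ["TATAAT", "GCGCGC"]:
    score = predict_pls1(model, pssm_features(query, pssm))
    print(query, "upstream" if score > 0 else "background")
```

Adding further PLS components would require deflating Xc and yc after each component. One component keeps the sketch readable.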
Draft Genome Sequence of Streptomyces clavuligerus NRRL 3585, a Producer of Diverse Secondary Metabolites

PubMed Central

Song, Ju Yeon; Jeong, Haeyoung; Yu, Dong Su; Fischbach, Michael A.; Park, Hong-Seog; Kim, Jae Jong; Seo, Jeong-Sun; Jensen, Susan E.; Oh, Tae Kwang; Lee, Kye Joon; Kim, Jihyun F.

2010-01-01

Streptomyces clavuligerus is an important industrial strain that produces a number of antibiotics, including clavulanic acid and cephamycin C. A high-quality draft genome sequence of the S. clavuligerus NRRL 3585 strain was produced by employing a hybrid approach that involved Sanger sequencing, Roche/454 pyrosequencing, optical mapping, and partial finishing. Its genome, comprising four linear replicons, one chromosome, and four plasmids, carries numerous sets of genes involved in the biosynthesis of secondary metabolites, including a variety of antibiotics. PMID:20889745
Component identification of electron transport chains in curdlan-producing Agrobacterium sp. ATCC 31749 and its genome-specific prediction using comparative genome and phylogenetic trees analysis.

PubMed

Zhang, Hongtao; Setubal, Joao Carlos; Zhan, Xiaobei; Zheng, Zhiyong; Yu, Lijun; Wu, Jianrong; Chen, Dingqiang

2011-06-01

Agrobacterium sp. ATCC 31749 (formerly named Alcaligenes faecalis var. myxogenes) is a non-pathogenic aerobic soil bacterium used in the large-scale biotechnological production of curdlan. However, little is known about its genome. Partial DNA sequences of electron transport chain (ETC) protein genes were obtained in order to understand the components of the ETC and the genomic specificity of Agrobacterium sp. ATCC 31749. Degenerate primers were designed from conserved ETC sequences of other reported species, and partial DNA sequences of ETC genes in Agrobacterium sp. ATCC 31749 were cloned by PCR using these primers. Based on comparative genomic analysis, nine electron transport elements were identified: NADH ubiquinone oxidoreductase, succinate dehydrogenase (complex II), complex III, cytochrome c, ubiquinone biosynthesis protein UbiB, cytochrome d terminal oxidase, cytochrome bo terminal oxidase, cytochrome cbb3-type terminal oxidase and cytochrome caa3-type terminal oxidase. Similarity and phylogenetic analyses of these genes revealed that, among fully sequenced Agrobacterium species, Agrobacterium sp. ATCC 31749 is closest to Agrobacterium tumefaciens C58. Based on these results, a comprehensive ETC model for Agrobacterium sp. ATCC 31749 is proposed.
Draft genome sequences of 1 MSSA and 7 MRSA ST5 isolates obtained from California

USDA-ARS's Scientific Manuscript database

Staphylococcus aureus is a commensal of humans that can cause a spectrum of diseases. An isolate’s capacity to cause disease is partially attributed to the acquisition of novel mobile genetic elements. This report provides the draft genome sequence of one methicillin susceptible and seven methicilli...
Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes).

PubMed

Dessimoz, Christophe; Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro

2011-09-01

Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, these projects will yield a growing number of genome sequences that are not well assembled, which will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as a test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references.
Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes)

PubMed Central

Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro

2011-01-01

Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, these projects will yield a growing number of genome sequences that are not well assembled, which will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as a test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references. PMID:21712341
The Complete Sequence of a Human Parainfluenzavirus 4 Genome

PubMed Central

Yea, Carmen; Cheung, Rose; Collins, Carol; Adachi, Dena; Nishikawa, John; Tellier, Raymond

2009-01-01

Although the human parainfluenza virus 4 (HPIV4) has been known for a long time, its genome, alone among the human paramyxoviruses, has not been completely sequenced to date. In this study we obtained the first complete genomic sequence of HPIV4 from a clinical isolate named SKPIV4 obtained at the Hospital for Sick Children in Toronto (Ontario, Canada). The coding regions for the N, P/V, M, F and HN proteins show very high identities (95% to 97%) with previously available partial sequences for HPIV4B. The sequence for the L protein and the non-coding regions represent new information. A surprising feature of the genome is its length, more than 17 kb, making it the longest genome within the genus Rubulavirus, although the length is well within the known range of 15 kb to 19 kb for the subfamily Paramyxovirinae. The availability of a complete genomic sequence will facilitate investigations on a respiratory virus that is still not completely characterized. PMID:21994536
When Genomics Is Not Enough: Experimental Evidence for a Decrease in LINE-1 Activity During the Evolution of Australian Marsupials

PubMed Central

Gallus, Susanne; Lammers, Fritjof

2016-01-01

The autonomous transposable element LINE-1 is a highly abundant element that makes up between 15% and 20% of therian mammal genomes. Since their origin before the divergence of marsupials and placental mammals, LINE-1 elements have contributed actively to the genome landscape. A previous in silico screen of the Tasmanian devil genome revealed a lack of functional coding LINE-1 sequences. In this study we present the results of an in vitro analysis of a partial LINE-1 reverse transcriptase coding sequence in five marsupial species. Our experimental screen supports the in silico findings of the genome-wide degradation of LINE-1 sequences in the Tasmanian devil, and identifies a high frequency of degraded LINE-1 sequences in other Australian marsupials. The comparison between the experimentally obtained LINE-1 sequences and reference genome assemblies suggests that conclusions from in silico analyses of retrotransposition activity can be influenced by incomplete genome assemblies from short reads. PMID:27389686
Gramene 2013: Comparative plant genomics resources

USDA-ARS's Scientific Manuscript database

Gramene (http://www.gramene.org) is a curated online resource for comparative functional genomics in crops and model plant species, currently hosting 27 fully and 10 partially sequenced reference genomes in its build number 38. Its strength derives from the application of a phylogenetic framework fo...
Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing

PubMed Central

2011-01-01

Background: Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and complement these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. Results: A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. Conclusions: The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species and its relatives.
This study represents a first step in the development of a community resource for further study of plant-insect co-evolution, anti-herbivore defense, floral developmental genetics, reproductive biology, chemical evolution, population genetics, and comparative genomics using milkweeds, and A. syriaca in particular, as ecological and evolutionary models. PMID:21542930
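The study reports polymorphism as a percentage of polymorphic sites but does not describe how sites were called. The following is a minimal sketch. It assumes a set of equal-length aligned sequences, for example copies of a repeat reconstructed from reads. It treats any column with more than one unambiguous nucleotide as polymorphic. The alignment and the function name are hypothetical.

```python
def polymorphic_site_fraction(alignment: list[str], ignore: str = "-N") -> float:
    """Fraction of alignment columns that carry more than one nucleotide.

    Characters listed in `ignore` (gaps and ambiguous N by default) are
    skipped when deciding whether a column is polymorphic, but every
    column still counts toward the denominator.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if length == 0 or any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be non-empty and of equal length")
    polymorphic = 0
    for column in zip(*alignment):
        states = {base.upper() for base in column if base.upper() not in ignore}
        if len(states) > 1:
            polymorphic += 1
    return polymorphic / length


if __name__ == "__main__":
    # Hypothetical toy alignment: columns 3 and 5 are polymorphic.
    toy = ["ACGTAC", "ACGTTC", "ACCTAC"]
    print(f"{100 * polymorphic_site_fraction(toy):.1f}% polymorphic sites")  # 33.3% polymorphic sites
```

Whether gapped columns belong in the denominator is a design choice. Here they are kept, so the result is a fraction of the full alignment length.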
Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing.

PubMed

Straub, Shannon C K; Fishbein, Mark; Livshultz, Tatyana; Foster, Zachary; Parks, Matthew; Weitemier, Kevin; Cronn, Richard C; Liston, Aaron

2011-05-04

Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and complement these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species and its relatives.
This study represents a first step in the development of a community resource for further study of plant-insect co-evolution, anti-herbivore defense, floral developmental genetics, reproductive biology, chemical evolution, population genetics, and comparative genomics using milkweeds, and A. syriaca in particular, as ecological and evolutionary models.
Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial “pan-genome”

PubMed Central

Tettelin, Hervé; Masignani, Vega; Cieslewicz, Michael J.; Donati, Claudio; Medini, Duccio; Ward, Naomi L.; Angiuoli, Samuel V.; Crabtree, Jonathan; Jones, Amanda L.; Durkin, A. Scott; DeBoy, Robert T.; Davidsen, Tanja M.; Mora, Marirosa; Scarselli, Maria; Margarit y Ros, Immaculada; Peterson, Jeremy D.; Hauser, Christopher R.; Sundaram, Jaideep P.; Nelson, William C.; Madupu, Ramana; Brinkac, Lauren M.; Dodson, Robert J.; Rosovitz, Mary J.; Sullivan, Steven A.; Daugherty, Sean C.; Haft, Daniel H.; Selengut, Jeremy; Gwinn, Michelle L.; Zhou, Liwei; Zafar, Nikhat; Khouri, Hoda; Radune, Diana; Dimitrov, George; Watkins, Kisha; O'Connor, Kevin J. B.; Smith, Shannon; Utterback, Teresa R.; White, Owen; Rubens, Craig E.; Grandi, Guido; Madoff, Lawrence C.; Kasper, Dennis L.; Telford, John L.; Wessels, Michael R.; Rappuoli, Rino; Fraser, Claire M.

2005-01-01

The development of efficient and inexpensive genome sequencing methods has revolutionized the study of human bacterial pathogens and improved vaccine design. Unfortunately, the sequence of a single genome does not reflect how genetic variability drives pathogenesis within a bacterial species and also limits genome-wide screens for vaccine candidates or for antimicrobial targets. We have generated the genomic sequence of six strains representing the five major disease-causing serotypes of Streptococcus agalactiae, the main cause of neonatal infection in humans. Analysis of these genomes and those available in databases showed that the S. agalactiae species can be described by a pan-genome consisting of a core genome shared by all isolates, accounting for ≈80% of any single genome, plus a dispensable genome consisting of partially shared and strain-specific genes. Mathematical extrapolation of the data suggests that the gene reservoir available for inclusion in the S. agalactiae pan-genome is vast and that unique genes will continue to be identified even after sequencing hundreds of genomes. PMID:16172379
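The pan-genome terms used above can be expressed as set operations over per-strain gene inventories. These are the core genome shared by all isolates, and the dispensable genome of partially shared and strain-specific genes. The sketch below assumes each strain has already been reduced to a set of gene-family identifiers; the clustering step that produces those identifiers is not shown. Strain and gene names are hypothetical.

```python
def partition_pan_genome(genomes: dict[str, set[str]]) -> dict[str, set[str]]:
    """Partition the genes of several strains into pan-genome categories.

    `genomes` maps each (hypothetical) strain name to the set of
    gene-family identifiers it carries.
    """
    if not genomes:
        raise ValueError("need at least one genome")
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)        # every gene seen in any strain
    core = set.intersection(*gene_sets)  # genes shared by all strains
    counts = {gene: sum(gene in s for s in gene_sets) for gene in pan}
    specific = {g for g, n in counts.items() if n == 1} - core
    partial = pan - core - specific      # shared by some strains, not all
    return {
        "pan": pan,
        "core": core,
        "partially_shared": partial,
        "strain_specific": specific,
        "dispensable": partial | specific,
    }


def core_fraction(genomes: dict[str, set[str]], strain: str) -> float:
    """Share of one strain's genes that belong to the core genome."""
    core = partition_pan_genome(genomes)["core"]
    return len(core & genomes[strain]) / len(genomes[strain])


if __name__ == "__main__":
    toy = {  # hypothetical strains and gene families
        "strain_A": {"g1", "g2", "g3", "g4"},
        "strain_B": {"g1", "g2", "g3", "g5"},
        "strain_C": {"g1", "g2", "g4", "g6"},
    }
    parts = partition_pan_genome(toy)
    print(sorted(parts["core"]), sorted(parts["strain_specific"]))  # ['g1', 'g2'] ['g5', 'g6']
    print(core_fraction(toy, "strain_A"))  # 0.5
```

`core_fraction` corresponds to the kind of figure quoted above (core genome ≈80% of any single genome). In this toy example it is 0.5.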
Genome Information Broker (GIB): data retrieval and comparative analysis system for completed microbial genomes and more

PubMed Central

Fumoto, Masaki; Miyazaki, Satoru; Sugawara, Hideaki

2002-01-01

Genome Information Broker (GIB) is a powerful tool for the study of comparative genomics. GIB allows users to retrieve and display partial and/or whole genome sequences together with the relevant biological annotation. GIB has accumulated all the completed microbial genomes and has recently been expanded to include Arabidopsis thaliana genome data from DDBJ/EMBL/GenBank. In the near future, hundreds of genome sequences will be determined. To handle such large volumes of data, we have enhanced the GIB architecture by using XML, CORBA and distributed RDBs. We introduce the new GIB here. GIB is freely accessible at http://gib.genes.nig.ac.jp/. PMID:11752256
Piscine reovirus: Genomic and molecular phylogenetic analysis from farmed and wild salmonids collected on the Canada/US Pacific Coast

USGS Publications Warehouse

Siah, Ahmed; Morrison, Diane B.; Fringuelli, Elena; Savage, Paul S.; Richmond, Zina; Purcell, Maureen K.; Johns, Robert; Johnson, Stewart C.; Saksida, Sonja M.

2015-01-01

Piscine reovirus (PRV) is a double-stranded non-enveloped RNA virus detected in farmed and wild salmonids. This study examined the phylogenetic relationships among different PRV sequence types present in samples from salmonids in Western Canada and the US, including Alaska (US), British Columbia (Canada) and Washington State (US). Tissues testing positive for PRV were partially sequenced for segment S1, producing 71 sequences that grouped into 10 unique sequence types. Sequence analysis revealed no identifiable geographical or temporal variation among the sequence types. Identical sequence types were found in fish sampled in 2001, 2005 and 2014. In addition, PRV-positive samples from fish derived from Alaska, British Columbia and Washington State share identical sequence types. Comparative analysis of the phylogenetic tree indicated that Canada/US Pacific Northwest sequences formed a subgroup with some Norwegian sequence types (group II), distinct from other Norwegian and Chilean sequences (groups I, III and IV). Representative PRV-positive samples from farmed and wild fish in British Columbia and Washington State were subjected to genome sequencing using next-generation sequencing methods. Individual analysis of each of the 10 partial segments indicated that the Canadian and US PRV sequence types clustered separately from available whole genome sequences of some Norwegian and Chilean sequences for all segments except the segment S4. In summary, PRV was genetically homogeneous over a large geographic distance (Alaska to Washington State), and the sequence types were relatively stable over a 13-year period.
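The abstract does not state the criterion used to define a sequence type. The sketch below assumes that a sequence type is simply a set of identical partial S1 sequences and groups samples on that basis. Sample names and sequences are hypothetical. A real analysis would first trim all sequences to a common aligned region.

```python
from collections import defaultdict


def group_sequence_types(samples: dict[str, str]) -> list[list[str]]:
    """Group samples whose sequences are identical (case-insensitive).

    Assumes all sequences were already trimmed to the same aligned region.
    Returns groups ordered from most to least frequent sequence type.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for sample, seq in samples.items():
        groups[seq.upper()].append(sample)
    return sorted(groups.values(), key=len, reverse=True)


if __name__ == "__main__":
    toy = {"fish1": "ACGTTA", "fish2": "acgtta", "fish3": "ACGATA"}  # hypothetical
    types = group_sequence_types(toy)
    print(len(types), types)  # 2 [['fish1', 'fish2'], ['fish3']]
```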
Piscine Reovirus: Genomic and Molecular Phylogenetic Analysis from Farmed and Wild Salmonids Collected on the Canada/US Pacific Coast

PubMed Central

Siah, Ahmed; Morrison, Diane B.; Fringuelli, Elena; Savage, Paul; Richmond, Zina; Johns, Robert; Purcell, Maureen K.; Johnson, Stewart C.; Saksida, Sonja M.

2015-01-01

Piscine reovirus (PRV) is a double-stranded non-enveloped RNA virus detected in farmed and wild salmonids. This study examined the phylogenetic relationships among different PRV sequence types present in samples from salmonids in Western Canada and the US, including Alaska (US), British Columbia (Canada) and Washington State (US). Tissues testing positive for PRV were partially sequenced for segment S1, producing 71 sequences that grouped into 10 unique sequence types. Sequence analysis revealed no identifiable geographical or temporal variation among the sequence types. Identical sequence types were found in fish sampled in 2001, 2005 and 2014. In addition, PRV-positive samples from fish derived from Alaska, British Columbia and Washington State share identical sequence types. Comparative analysis of the phylogenetic tree indicated that Canada/US Pacific Northwest sequences formed a subgroup with some Norwegian sequence types (group II), distinct from other Norwegian and Chilean sequences (groups I, III and IV). Representative PRV-positive samples from farmed and wild fish in British Columbia and Washington State were subjected to genome sequencing using next-generation sequencing methods. Individual analysis of each of the 10 partial segments indicated that the Canadian and US PRV sequence types clustered separately from available whole genome sequences of some Norwegian and Chilean sequences for all segments except the segment S4. In summary, PRV was genetically homogeneous over a large geographic distance (Alaska to Washington State), and the sequence types were relatively stable over a 13-year period. PMID:26536673
Using nearly full-genome HIV sequence data improves phylogeny reconstruction in a simulated epidemic

PubMed Central

Yebra, Gonzalo; Hodcroft, Emma B.; Ragonnet-Cronin, Manon L.; Pillay, Deenan; Brown, Andrew J. Leigh; Fraser, Christophe; Kellam, Paul; de Oliveira, Tulio; Dennis, Ann; Hoppe, Anne; Kityo, Cissy; Frampton, Dan; Ssemwanga, Deogratius; Tanser, Frank; Keshani, Jagoda; Lingappa, Jairam; Herbeck, Joshua; Wawer, Maria; Essex, Max; Cohen, Myron S.; Paton, Nicholas; Ratmann, Oliver; Kaleebu, Pontiano; Hayes, Richard; Fidler, Sarah; Quinn, Thomas; Novitsky, Vladimir; Haywards, Andrew; Nastouli, Eleni; Morris, Steven; Clark, Duncan; Kozlakidis, Zisis

2016-01-01

HIV molecular epidemiology studies analyse viral pol gene sequences due to their availability, but whole genome sequencing makes it possible to use other genes. We aimed to determine what gene(s) provide(s) the best approximation to the real phylogeny by analysing a simulated epidemic (created as part of the PANGEA_HIV project) with a known transmission tree. We sub-sampled a simulated dataset of 4662 sequences into different combinations of genes (gag-pol-env, gag-pol, gag, pol, env and partial pol) and sampling depths (100%, 60%, 20% and 5%), generating 100 replicates for each case. We built maximum-likelihood trees for each combination using RAxML (GTR + Γ), and compared their topologies to that of the corresponding true tree using CompareTree. The accuracy of the trees was significantly proportional to the length of the sequences used, with the gag-pol-env datasets showing the best performance and gag and partial pol sequences showing the worst. The lowest sampling depths (20% and 5%) greatly reduced the accuracy of tree reconstruction and showed high variability among replicates, especially when using the shortest gene datasets. In conclusion, using longer sequences derived from nearly whole genomes will improve the reliability of phylogenetic reconstruction. With low sample coverage, results can be highly variable, particularly when based on short sequences. PMID:28008945
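The abstract does not describe CompareTree's exact scoring. As a hedged sketch of the general idea, the Python below compares two unrooted topologies through their non-trivial splits, that is, bipartitions of the leaves. This split comparison is the basis of the Robinson-Foulds distance. The nested-tuple tree format and the function names are assumptions made for illustration.

```python
from typing import Union

# A tree is either a leaf name or a tuple of child subtrees (hypothetical format).
Tree = Union[str, tuple]


def splits(tree: Tree) -> set[frozenset[str]]:
    """Non-trivial bipartitions (splits) of a tree, treated as unrooted.

    Each split is stored as the side that does not contain a fixed
    reference leaf, so the same split is always represented identically.
    """
    clades: list[frozenset[str]] = []

    def walk(node: Tree) -> frozenset[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        clades.append(below)
        return below

    all_leaves = walk(tree)
    anchor = min(all_leaves)
    result = set()
    for clade in clades:
        # Skip trivial splits (one leaf, or all leaves but one, on a side).
        if 2 <= len(clade) <= len(all_leaves) - 2:
            result.add(all_leaves - clade if anchor in clade else clade)
    return result


def robinson_foulds(a: Tree, b: Tree) -> int:
    """Number of splits present in exactly one of the two trees."""
    return len(splits(a) ^ splits(b))


def split_recovery(true_tree: Tree, inferred: Tree) -> float:
    """Fraction of the true tree's splits that the inferred tree recovers."""
    true_splits = splits(true_tree)
    if not true_splits:
        return 1.0
    return len(true_splits & splits(inferred)) / len(true_splits)


if __name__ == "__main__":
    true_tree = ((("A", "B"), "C"), ("D", "E"))
    inferred = ((("A", "C"), "B"), ("D", "E"))
    print(split_recovery(true_tree, inferred))   # 0.5
    print(robinson_foulds(true_tree, inferred))  # 2
```

Because splits are oriented against a fixed leaf, differently rooted drawings of the same unrooted tree give a distance of 0.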
Using nearly full-genome HIV sequence data improves phylogeny reconstruction in a simulated epidemic.

PubMed

Yebra, Gonzalo; Hodcroft, Emma B; Ragonnet-Cronin, Manon L; Pillay, Deenan; Brown, Andrew J Leigh

2016-12-23

HIV molecular epidemiology studies analyse viral pol gene sequences due to their availability, but whole genome sequencing makes it possible to use other genes. We aimed to determine what gene(s) provide(s) the best approximation to the real phylogeny by analysing a simulated epidemic (created as part of the PANGEA_HIV project) with a known transmission tree. We sub-sampled a simulated dataset of 4662 sequences into different combinations of genes (gag-pol-env, gag-pol, gag, pol, env and partial pol) and sampling depths (100%, 60%, 20% and 5%), generating 100 replicates for each case. We built maximum-likelihood trees for each combination using RAxML (GTR + Γ), and compared their topologies to that of the corresponding true tree using CompareTree. The accuracy of the trees was significantly proportional to the length of the sequences used, with the gag-pol-env datasets showing the best performance and gag and partial pol sequences showing the worst. The lowest sampling depths (20% and 5%) greatly reduced the accuracy of tree reconstruction and showed high variability among replicates, especially when using the shortest gene datasets. In conclusion, using longer sequences derived from nearly whole genomes will improve the reliability of phylogenetic reconstruction. With low sample coverage, results can be highly variable, particularly when based on short sequences.
First complete genome sequence of infectious laryngotracheitis virus

PubMed Central

2011-01-01

Background: Infectious laryngotracheitis virus (ILTV) is an alphaherpesvirus that causes acute respiratory disease in chickens worldwide. To date, only one complete genomic sequence of ILTV has been reported. This sequence was generated by concatenating partial sequences from six different ILTV strains. Thus, the full genomic sequence of a single (individual) strain of ILTV has not been determined previously. This study aimed to use high throughput sequencing technology to determine the complete genomic sequence of a live attenuated vaccine strain of ILTV. Results: The complete genomic sequence of the Serva vaccine strain of ILTV was determined, annotated and compared to the concatenated ILTV reference sequence. The genome size of the Serva strain was 152,628 bp, with a G + C content of 48%. A total of 80 predicted open reading frames were identified. The Serva strain had 96.5% DNA sequence identity with the concatenated ILTV sequence. Notably, the concatenated ILTV sequence was found to lack four large regions of sequence, including 528 bp and 594 bp of sequence in the UL29 and UL36 genes, respectively, and two copies of a 1,563 bp sequence in the repeat regions. Considerable differences in the size of the predicted translation products of 4 other genes (UL54, UL30, UL37 and UL38) were also identified. More than 530 single-nucleotide polymorphisms (SNPs) were identified. Most SNPs were located within three genomic regions, corresponding to sequence from the SA-2 ILTV vaccine strain in the concatenated ILTV sequence. Conclusions: This is the first complete genomic sequence of an individual ILTV strain. This sequence will facilitate future comparative genomic studies of ILTV by providing an appropriate reference sequence for the sequence analysis of other ILTV strains. PMID:21501528

Serendipitous discovery of Wolbachia genomes in multiple Drosophila species.

PubMed

Salzberg, Steven L; Dunning Hotopp, Julie C; Delcher, Arthur L; Pop, Mihai; Smith, Douglas R; Eisen, Michael B; Nelson, William C

2005-01-01

The Trace Archive is a repository for the raw, unanalyzed data generated by large-scale genome sequencing projects. The existence of this data offers scientists the possibility of discovering additional genomic sequences beyond those originally sequenced. In particular, if the source DNA for a sequencing project came from a species that was colonized by another organism, then the project may yield substantial amounts of genomic DNA, including near-complete genomes, from the symbiotic or parasitic organism. By searching the publicly available repository of DNA sequencing trace data, we discovered three new species of the bacterial endosymbiont Wolbachia pipientis in three different species of fruit fly: Drosophila ananassae, D. simulans, and D. mojavensis. We extracted all sequences with partial matches to a previously sequenced Wolbachia strain and assembled those sequences using customized software. For one of the three new species, the data recovered were sufficient to produce an assembly that covers more than 95% of the genome; for a second species the data produce the equivalent of a 'light shotgun' sampling of the genome, covering an estimated 75-80% of the genome; and for the third species the data cover approximately 6-7% of the genome. The results of this study reveal an unexpected benefit of depositing raw data in a central genome sequence repository: new species can be discovered within this data. The differences between these three new Wolbachia genomes and the previously sequenced strain revealed numerous rearrangements and insertions within each lineage and hundreds of novel genes. The three new genomes, with annotation, have been deposited in GenBank.
The genome sequence of sweet cherry (Prunus avium) for use in genomics-assisted breeding.

PubMed

Shirasawa, Kenta; Isuzugawa, Kanji; Ikenaga, Mitsunobu; Saito, Yutaro; Yamamoto, Toshiya; Hirakawa, Hideki; Isobe, Sachiko

2017-10-01

We determined the genome sequence of sweet cherry (Prunus avium) using next-generation sequencing technology. The total length of the assembled sequences was 272.4 Mb, consisting of 10,148 scaffold sequences with an N50 length of 219.6 kb. The sequences covered 77.8% of the 352.9 Mb sweet cherry genome, as estimated by k-mer analysis, and included >96.0% of the core eukaryotic genes. We predicted 43,349 complete and partial protein-encoding genes. A high-density consensus map with 2,382 loci was constructed using double-digest restriction site-associated DNA sequencing. Comparing the genetic maps of sweet cherry and peach revealed high synteny between the two genomes; thus the scaffolds were integrated into pseudomolecules using map- and synteny-based strategies. Whole-genome resequencing of six modern cultivars found 1,016,866 SNPs and 162,402 insertions/deletions, out of which 0.7% were deleterious. The sequence variants, as well as simple sequence repeats, can be used as DNA markers. The genomic information helps us to identify agronomically important genes and will accelerate genetic studies and breeding programs for sweet cherries. Further information on the genomic sequences and DNA markers is available in DBcherry (http://cherry.kazusa.or.jp; last accessed 8 May 2017). © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
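N50 is used above to summarize scaffold contiguity. It is the length L such that sequences of length L or longer together hold at least half of the assembled bases. A minimal sketch follows; the input lengths in the usage line are toy values, not the sweet cherry scaffolds.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating assembled bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer test avoids rounding half of an odd total
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")


if __name__ == "__main__":
    toy_scaffolds = [100, 80, 60, 40, 20]  # toy lengths, not real data
    print(n50(toy_scaffolds))  # 80
```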
A Novel Partial Sequence Alignment Tool for Finding Large Deletions

PubMed Central

Aruk, Taner; Ustek, Duran; Kursun, Olcay

2012-01-01

Finding large deletions in genome sequences has become increasingly useful in bioinformatics applications such as clinical research and diagnosis. Although a number of next-generation sequencing mapping and sequence alignment programs are publicly available, these software packages do not correctly align fragments containing deletions larger than 1 kb. We present a fast alignment software package, BinaryPartialAlign, that wet-lab scientists can use to find long structural variations in their experiments. BinaryPartialAlign uses the Smith-Waterman (SW) algorithm with a binary-search-based approach for alignment with large gaps, which we call partial alignment. The BinaryPartialAlign implementation is compared with other straightforward applications of SW. Simulation results on mtDNA fragments demonstrate the effectiveness (runtime and accuracy) of the proposed method. PMID:22566777
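BinaryPartialAlign builds on Smith-Waterman local alignment. The sketch below is only the standard Smith-Waterman scoring recurrence with a linear gap penalty. It is not the authors' binary-search extension. The scoring parameters are illustrative assumptions.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, int, int]:
    """Best local alignment score of a and b with a linear gap penalty.

    Returns (score, end_i, end_j): the optimal score and the 1-based
    positions in a and b where the best local alignment ends.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # DP matrix; row/column 0 stay 0
    best = (0, 0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best[0]:
                best = (H[i][j], i, j)
    return best


if __name__ == "__main__":
    print(smith_waterman("TTACGTT", "ACG"))  # (6, 5, 3)
```

With a linear gap penalty, a deletion of k bases costs k × |gap|. Under these toy parameters, bridging a 1 kb deletion therefore costs 2,000 points. The best local alignment then covers only one flank unless the shorter flank by itself scores more than 2,000.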
Complete genome sequences of two divergent isolates of strawberry crinkle virus coinfecting a single strawberry plant.

PubMed

Koloniuk, Igor; Fránová, Jana; Sarkisova, Tatiana; Přibylová, Jaroslava

2018-05-04

Strawberry crinkle disease is one of the major diseases that threatens strawberry production. Although the biological properties of the agent, strawberry crinkle virus (SCV), have been thoroughly investigated, its complete genome sequence has never been published. Existing RT-PCR-based detection relies on a partial sequence of the L protein gene, presumably the least expressed viral gene. Here, we present complete sequences of two divergent SCV isolates co-infecting a single plant, Fragaria × ananassa cv. Čačanská raná.
IMGD: an integrated platform supporting comparative genomics and phylogenetics of insect mitochondrial genomes

PubMed Central

Lee, Wonhoon; Park, Jongsun; Choi, Jaeyoung; Jung, Kyongyong; Park, Bongsoo; Kim, Donghan; Lee, Jaeyoung; Ahn, Kyohun; Song, Wonho; Kang, Seogchan; Lee, Yong-Hwan; Lee, Seunghwan

2009-01-01

Background: Sequences and organization of the mitochondrial genome have been used as markers to investigate evolutionary history and relationships in many taxonomic groups. The rapidly increasing mitochondrial genome sequences from diverse insects provide ample opportunities to explore various global evolutionary questions in the superclass Hexapoda. To adequately support such questions, it is imperative to establish an informatics platform that facilitates the retrieval and utilization of available mitochondrial genome sequence data. Results: The Insect Mitochondrial Genome Database (IMGD) is a new integrated platform that archives the mitochondrial genome sequences from 25,747 hexapod species, including 112 completely sequenced and 20 nearly completed genomes and 113,985 partially sequenced mitochondrial genomes. The Species-driven User Interface (SUI) of IMGD supports data retrieval and diverse analyses at multi-taxon levels. The Phyloviewer implemented in IMGD provides three methods for drawing phylogenetic trees and displays the resulting trees on the web. The SNP database incorporated into IMGD presents the distribution of SNPs and INDELs in the mitochondrial genomes of multiple isolates within eight species. A newly developed comparative SNU Genome Browser supports the graphical presentation and interactive interface for the identified SNPs/INDELs. Conclusion: The IMGD provides a solid foundation for the comparative mitochondrial genomics and phylogenetics of insects. All data and functions described here are available at the web site . PMID:19351385
Hepatitis E virus genotype 3 diversity: phylogenetic analysis and presence of subtype 3b in wild boar in Europe.

PubMed

Vina-Rodriguez, Ariel; Schlosser, Josephine; Becher, Dietmar; Kaden, Volker; Groschup, Martin H; Eiden, Martin

2015-05-22

An increasing number of indigenous cases of hepatitis E caused by genotype 3 viruses (HEV-3) have been diagnosed all around the world, particularly in industrialized countries. Hepatitis E is a zoonotic disease and accumulating evidence indicates that domestic pigs and wild boars are the main reservoirs of HEV-3. A detailed analysis of HEV-3 subtypes could help to determine the interplay of human activity, the role of animals as reservoirs and cross-species transmission. Although complete genome sequences are most appropriate for HEV subtype determination, in most cases only partial genomic sequences are available. We therefore carried out a subtype classification analysis, which uses regions from all three open reading frames of the genome. Using this approach, more than 1000 published HEV-3 isolates were subtyped. Newly recovered HEV partial sequences from hunted German wild boars were also included in this study. These sequences were assigned to genotype 3 and clustered within subtypes 3a and 3i and, unexpectedly, one of them within subtype 3b, the first non-human report of this subtype in Europe.
Southern Tomato Virus: The Link between the Families Totiviridae and Partitiviridae

USDA-ARS's Scientific Manuscript database

A dsRNA virus with a genome of 3.5 kb was isolated from field and greenhouse-grown tomato plants of different cultivars and geographic locations in North America. Cloning and sequencing of the viral genome showed the presence of two partially overlapping open reading frames (ORFs) and a genomic orga...
GWFASTA: server for FASTA search in eukaryotic and microbial genomes.

PubMed

Issac, Biju; Raghava, G P S

2002-09-01

Similarity searches are a powerful method for solving important biological problems such as database scanning, evolutionary studies, gene prediction, and protein structure prediction. FASTA is a widely used sequence comparison tool for rapid database scanning. Here we describe the GWFASTA server, which was developed to assist FASTA users in similarity searches against partially and/or completely sequenced genomes. GWFASTA consists of more than 60 microbial genomes, eight eukaryote genomes, and proteomes of annotated genomes. In fact, it provides the largest number of databases for similarity searching from a single platform. GWFASTA allows the submission of more than one sequence as a single query for a FASTA search. It also provides integrated post-processing of FASTA output, including compositional analysis of proteins, multiple sequence alignment, and phylogenetic analysis. Furthermore, it summarizes the search results organism-wise for prokaryotes and chromosome-wise for eukaryotes. Thus, the integration of different tools for sequence analyses makes GWFASTA a powerful tool for biologists.
Partial Shotgun Sequencing of the Boechera stricta Genome Reveals Extensive Microsynteny and Promoter Conservation with Arabidopsis

PubMed Central

Windsor, Aaron J.; Schranz, M. Eric; Formanová, Nataša; Gebauer-Jung, Steffi; Bishop, John G.; Schnabelrauch, Domenica; Kroymann, Juergen; Mitchell-Olds, Thomas

2006-01-01

Comparative genomics provides insight into the evolutionary dynamics that shape discrete sequences as well as whole genomes. To advance comparative genomics within the Brassicaceae, we have end-sequenced 23,136 medium-sized insert clones from Boechera stricta, a wild relative of Arabidopsis (Arabidopsis thaliana). A significant proportion of these sequences, 18,797, are nonredundant and display highly significant similarity (BLASTn e-value ≤ 10⁻³⁰) to low copy number Arabidopsis genomic regions, including more than 9,000 annotated coding sequences. We have used this dataset to identify orthologous gene pairs in the two species and to perform a global comparison of DNA regions 5′ to annotated coding regions. On average, the 500 nucleotides upstream of coding sequences display 71.4% identity between the two species. In a similar analysis, 61.4% identity was observed between 5′ noncoding sequences of Brassica oleracea and Arabidopsis, indicating that regulatory regions are not as diverged among these lineages as previously anticipated. By mapping the B. stricta end sequences onto the Arabidopsis genome, we have identified nearly 2,000 conserved blocks of microsynteny (bracketing 26% of the Arabidopsis genome). A comparison of fully sequenced B. stricta inserts to their homologous Arabidopsis genomic regions indicates that indel polymorphisms >5 kb contribute substantially to the genome size difference observed between the two species. Further, we demonstrate that microsynteny inferred from end-sequence data can be applied to the rapid identification and cloning of genomic regions of interest from nonmodel species. These results suggest that among diploid relatives of Arabidopsis, small- to medium-scale shotgun sequencing approaches can provide rapid and cost-effective benefits to evolutionary and/or functional comparative genomic frameworks. PMID:16607030
Partial genome sequence of Thioalkalivibrio thiocyanodenitrificans ARhD 1T, a chemolithoautotrophic haloalkaliphilic sulfur-oxidizing bacterium capable of complete denitrification

DOE PAGES

Berben, Tom; Sorokin, Dimitry Y.; Ivanova, Natalia; ...

2015-10-26

Thioalkalivibrio thiocyanodenitrificans strain ARhD 1T is a motile, Gram-negative bacterium isolated from soda lakes that belongs to the Gammaproteobacteria. It derives energy for growth and carbon fixation from the oxidation of sulfur compounds, most notably thiocyanate, and is therefore a chemolithoautotroph. It is capable of complete denitrification under anaerobic conditions. Its draft genome sequence consists of 3,746,647 bp in 3 scaffolds and contains 3,558 protein-coding and 121 RNA genes. T. thiocyanodenitrificans ARhD 1T was sequenced as part of the DOE Joint Genome Institute Community Science Program.
Partial genome sequence of the haloalkaliphilic soda lake bacterium Thioalkalivibrio thiocyanoxidans ARh 2T

DOE PAGES

Berben, Tom; Sorokin, Dimitry Y.; Ivanova, Natalia; ...

2015-10-26

Thioalkalivibrio thiocyanoxidans strain ARh 2T is a sulfur-oxidizing bacterium isolated from haloalkaline soda lakes. It is a motile, Gram-negative member of the Gammaproteobacteria. Remarkable properties include the ability to grow on thiocyanate as the sole energy, sulfur and nitrogen source, and the capability of growth at salinities of up to 4.3 M total Na⁺. This draft genome sequence consists of 61 scaffolds comprising 2,765,337 bp and contains 2,616 protein-coding and 61 RNA-coding genes. This organism was sequenced as part of the Community Science Program of the DOE Joint Genome Institute.
Shifts in the evolutionary rate and intensity of purifying selection between two Brassica genomes revealed by analyses of orthologous transposons and relics of a whole genome triplication.

PubMed

Zhao, Meixia; Du, Jianchang; Lin, Feng; Tong, Chaobo; Yu, Jingyin; Huang, Shunmou; Wang, Xiaowu; Liu, Shengyi; Ma, Jianxin

2013-10-01

Recent sequencing of the Brassica rapa and Brassica oleracea genomes revealed extremely contrasting genomic features such as the abundance and distribution of transposable elements between the two genomes. However, whether and how these structural differentiations may have influenced the evolutionary rates of the two genomes since their split from a common ancestor are unknown. Here, we investigated and compared the rates of nucleotide substitution between two long terminal repeats (LTRs) of individual orthologous LTR-retrotransposons, the rates of synonymous and non-synonymous substitution among triplicated genes retained in both genomes from a shared whole genome triplication event, and the rates of genetic recombination estimated/deduced by the comparison of physical and genetic distances along chromosomes and ratios of solo LTRs to intact elements. Overall, LTR sequences and genic sequences showed more rapid nucleotide substitution in B. rapa than in B. oleracea. Synonymous substitution of triplicated genes retained from a shared whole genome triplication was detected at higher rates in B. rapa than in B. oleracea. Interestingly, non-synonymous substitution was observed at lower rates in the former than in the latter, indicating shifted densities of purifying selection between the two genomes. In addition to evolutionary asymmetry, orthologous genes differentially regulated and/or disrupted by transposable elements between the two genomes were also characterized. Our analyses suggest that local genomic and epigenomic features, such as recombination rates and chromatin dynamics reshaped by independent proliferation of transposable elements and elimination between the two genomes, are perhaps partially the causes and partially the outcomes of the observed inter-specific asymmetric evolution. © 2013 Purdue University. The Plant Journal © 2013 John Wiley & Sons Ltd.
Complete genome analysis of jasmine virus T from Jasminum sambac in China.

PubMed

Tang, Yajun; Gao, Fangluan; Yang, Zhen; Wu, Zujian; Yang, Liang

2016-07-01

The genome of a potyvirus (isolate JaVT_FZ) recovered from jasmine (Jasminum sambac L.) showing yellow ringspot symptoms in Fuzhou, China, was sequenced. JaVT_FZ is closely related to seven other potyviruses with completely sequenced genomes, with which it shares 66-70 % nucleotide and 52-56 % amino acid sequence identity. However, the coat protein (CP) gene shares 82-92 % nucleotide and 90-97 % amino acid sequence identity with those of two partially sequenced potyviruses, named jasmine potyvirus T (JaVT-jasmine) and jasmine yellow mosaic potyvirus (JaYMV-India), respectively. This suggests that JaVT_FZ, JaVT-jasmine and JaYMV-India should be regarded as members of a single potyvirus species, for which the name "Jasmine virus T" has priority.
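The species argument above rests on pairwise percent identity. As a rough illustration only, the sketch below computes percent identity from two sequences that are already aligned, with "-" as the gap character. The helper name is hypothetical, and conventions differ between tools (for example, in how gap columns are counted); here every column except a double gap counts toward the denominator, which may not match the method used in this study.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences (hypothetical helper).

    Columns where both sequences carry a gap are ignored; every other column,
    including columns with a gap in only one sequence, counts in the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == gap and y == gap)
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # A column is an identity only if both residues are the same non-gap symbol.
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)
```

For example, `percent_identity("AAAA", "AAAT")` returns 75.0. The same function works for nucleotide or amino acid alignments, since it only compares symbols column by column.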
Cloning of a CACTA transposon-like insertion in intron I of tomato invertase Lin5 gene and identification of transposase-like sequences of Solanaceae species.

PubMed

Proels, Reinhard K; Roitsch, Thomas

2006-03-01

Very few CACTA transposon-like sequences have been described in Solanaceae species. Sequence information has been restricted to partial transposase (TPase)-like fragments, and no target gene of CACTA-like transposon insertion has been described in tomato to date. In this manuscript, we report on a CACTA transposon-like insertion in intron I of the tomato (Lycopersicon esculentum) invertase gene Lin5 and TPase-like sequences of several Solanaceae species. Consensus primers deduced from the TPase region of the tomato CACTA transposon-like element allowed the amplification of similar sequences from various Solanaceae species of different subfamilies including Solaneae (Solanum tuberosum), Cestreae (Nicotiana tabacum) and Datureae (Datura stramonium). This demonstrates the ubiquitous presence of CACTA-like elements in Solanaceae genomes. The obtained partial sequences are highly conserved and allow further detection and detailed analysis of CACTA-like transposons throughout Solanaceae species. These CACTA-like transposon sequences make it possible to evaluate their use for genome analysis, for functional studies of genes and for studying the evolutionary relationships between plant species.
PepLine: a software pipeline for high-throughput direct mapping of tandem mass spectrometry data on genomic sequences.

PubMed

Ferro, Myriam; Tardif, Marianne; Reguer, Erwan; Cahuzac, Romain; Bruley, Christophe; Vermat, Thierry; Nugues, Estelle; Vigouroux, Marielle; Vandenbrouck, Yves; Garin, Jérôme; Viari, Alain

2008-05-01

PepLine is fully automated software that maps MS/MS fragmentation spectra of tryptic peptides to genomic DNA sequences. The approach is based on Peptide Sequence Tags (PSTs) obtained from partial interpretation of QTOF MS/MS spectra (first module). PSTs are then mapped onto the six-frame translations of genomic sequences (second module), giving hits. Hits are then clustered to detect potential coding regions (third module). Our work aimed at optimizing the algorithms of each component so that the whole pipeline can run in a fully automated manner on raw nucleic acid sequences (i.e., genomes that have not been "reduced" to a database of ORFs or putative exon sequences). The whole pipeline was tested on controlled MS/MS spectra sets from standard proteins and from Arabidopsis thaliana chloroplast envelope samples. Our results demonstrate that PepLine was competitive with protein database search software and was fast enough to potentially tackle large data sets and/or large genomes. We also illustrate the potential of this approach for detecting the intron/exon structure of genes.
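To make the second module concrete, the sketch below translates a DNA sequence in all six reading frames with the standard genetic code and reports where a peptide tag occurs as an exact substring. It is a simplified illustration, not PepLine's implementation: the function names are hypothetical, and real tag-based matching typically also checks the masses flanking each tag, which is omitted here.

```python
# Standard genetic code: codons enumerated with bases in T, C, A, G order,
# so AMINO[16*i + 4*j + k] is the amino acid for BASES[i] + BASES[j] + BASES[k].
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna: str) -> str:
    """Reverse complement; ambiguous bases such as N are left unchanged."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> dict:
    """Return translations keyed by frame: +1..+3 forward, -1..-3 reverse."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def map_tag(tag: str, dna: str) -> list:
    """Hypothetical exact-match lookup: (frame, amino acid offset) per hit."""
    hits = []
    for frame, protein in six_frame_translations(dna).items():
        start = protein.find(tag)
        while start != -1:
            hits.append((frame, start))
            start = protein.find(tag, start + 1)
    return hits
```

For example, `map_tag("MAK", "ATGGCCAAGTAA")` returns `[("+1", 0)]`, because frame +1 translates to `MAK*`. Translating the raw genome in every frame, rather than a predicted ORF set, is what lets hits fall in unannotated coding regions.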
Whole genome sequencing identifies influenza A H3N2 transmission and offers superior resolution to classical typing methods.

PubMed

Meinel, Dominik M; Heinzinger, Susanne; Eberle, Ute; Ackermann, Nikolaus; Schönberger, Katharina; Sing, Andreas

2018-02-01

Influenza, with its annual epidemic waves, is a major cause of morbidity and mortality worldwide. However, little whole-genome data are available on the molecular epidemiology that would improve our understanding of viral spread in human populations. We implemented an RT-PCR strategy starting from patient material to generate influenza A whole-genome sequences for molecular epidemiological surveillance. Samples were obtained within the Bavarian Influenza Sentinel. The complete influenza virus genome was amplified by a one-tube multiplex RT-PCR and sequenced on an Illumina MiSeq. We report whole-genome sequences, obtained directly from patient specimens, for 50 influenza A H3N2 viruses, the predominant virus in the 2014/15 season. The dataset included random samples from Bavaria (Germany) throughout the influenza season and samples from three suspected transmission clusters. We identified the outbreak samples based on sequence identity. Whole-genome sequencing (WGS) offered higher resolution than analysis of single segments or partial segments. Additionally, we detected substantial amounts of viral quasispecies in several patients, carrying mutations that differed from the dominant virus in each patient. Our rapid whole-genome sequencing approach for influenza A virus shows that WGS can effectively be used to detect and understand outbreaks in large communities. Additionally, the genomic data provide in-depth details about the circulating virus within one season.
Random Amplification and Pyrosequencing for Identification of Novel Viral Genome Sequences

PubMed Central

Hang, Jun; Forshey, Brett M.; Kochel, Tadeusz J.; Li, Tao; Solórzano, Víctor Fiestas; Halsey, Eric S.; Kuschner, Robert A.

2012-01-01

ssRNA viruses have high levels of genomic divergence, which can lead to difficulty in genomic characterization of new viruses using traditional PCR amplification and sequencing methods. In this study, random reverse transcription, anchored random PCR amplification, and high-throughput pyrosequencing were used to identify orthobunyavirus sequences from total RNA extracted from viral cultures of acute febrile illness specimens. Draft genome sequence for the orthobunyavirus L segment was assembled and sequentially extended using de novo assembly contigs from pyrosequencing reads and orthobunyavirus sequences in GenBank as guidance. Accuracy and continuous coverage were achieved by mapping all reads to the L segment draft sequence. Subsequently, RT-PCR and Sanger sequencing were used to complete the genome sequence. The complete L segment was found to be 6936 bases in length, encoding a 2248-aa putative RNA polymerase. The identified L segment was distinct from previously published South American orthobunyaviruses, sharing 63% and 54% identity at the nucleotide and amino acid level, respectively, with the complete Oropouche virus L segment and 73% and 81% identity at the nucleotide and amino acid level, respectively, with a partial Caraparu virus L segment. The result demonstrated the effectiveness of a sequence-independent amplification and next-generation sequencing approach for obtaining complete viral genomes from total nucleic acid extracts and its use in pathogen discovery. PMID:22468136
The complete chloroplast genome sequences of Lychnis wilfordii and Silene capitata and comparative analyses with other Caryophyllaceae genomes.

PubMed

Kang, Jong-Soo; Lee, Byoung Yoon; Kwak, Myounghai

2017-01-01

The complete chloroplast genomes of Lychnis wilfordii and Silene capitata were determined and compared with ten previously reported Caryophyllaceae chloroplast genomes. The chloroplast genome sequences of L. wilfordii and S. capitata contain 152,320 bp and 150,224 bp, respectively. The gene contents and orders among 12 Caryophyllaceae species are consistent, but several microstructural changes have occurred. Expansion of the inverted repeat (IR) regions at the large single copy (LSC)/IRb and small single copy (SSC)/IR boundaries led to partial or entire gene duplications. Additionally, rearrangements of the LSC region were caused by gene inversions and/or transpositions. The 18 kb inversions, which occurred three times in different lineages of tribe Sileneae, were thought to be facilitated by the intermolecular duplicated sequences. Sequence analyses of the L. wilfordii and S. capitata genomes revealed 39 and 43 repeats, respectively, including forward, palindromic, and reverse repeats. In addition, a total of 67 and 56 simple sequence repeats were discovered in the L. wilfordii and S. capitata chloroplast genomes, respectively. Finally, we constructed phylogenetic trees of the 12 Caryophyllaceae species and two Amaranthaceae species based on 73 protein-coding genes using both maximum parsimony and likelihood methods.
First description of Grapevine leafroll-associated virus 5 in Argentina and partial genome sequence.

PubMed

Gómez Talquenca, Sebastián; Muñoz, Claudio; Grau, Oscar; Gracia, Olga

2009-02-01

An accession of Vitis vinifera cv. Red Globe from Argentina was found by ELISA to be infected with Grapevine leafroll-associated virus 5. It was partially sequenced, and three ORFs, corresponding to HSP70h, HSP90h and CP, were found. This isolate shares high amino acid identity with the previously reported sequence of the virus, and 80% to 90% identity with previously reported GLRaV-9 and GLRaV-4 isolates. Sequence analysis supports its clustering with GLRaV-4 and GLRaV-9 within the genus Ampelovirus.
Complete Genome Sequence of Sulfuriferula sp. Strain AH1, a Sulfur-Oxidizing Autotroph Isolated from Weathered Mine Tailings from the Duluth Complex in Minnesota

PubMed Central

Roepke, Elizabeth W.; Hua, An An; Flood, Beverly E.; Bailey, Jake V.

2017-01-01

ABSTRACT We report the closed and annotated genome sequence of Sulfuriferula sp. strain AH1. Strain AH1 has a 2,877,007-bp chromosome that includes a partial Sox system for inorganic sulfur oxidation and a complete nitrogen fixation pathway. It also has a single 39,138-bp plasmid with genes for arsenic and mercury resistance. PMID:28798167

Draft Genome Sequence, and a Sequence-Defined Genetic Linkage Map of the Legume Crop Species Lupinus angustifolius L

PubMed Central

Zheng, Zequn; Zhang, Qisen; Zhou, Gaofeng; Sweetingham, Mark W.; Howieson, John G.; Li, Chengdao

2013-01-01

Lupin (Lupinus angustifolius L.) is the most recently domesticated crop in major agricultural cultivation. Its seeds are high in protein and dietary fibre, but low in oil and starch. Medical and dietetic studies have shown that consuming lupin-enriched food has significant health benefits. We report the draft assembly from a whole genome shotgun sequencing dataset for this legume species with 26.9x coverage of the genome, which is predicted to contain 57,807 genes. Analysis of the annotated genes with metabolic pathways provided a partial understanding of some key features of lupin, such as the amino acid profile of storage proteins in seeds. Furthermore, we applied the NGS-based RAD-sequencing technology to obtain 8,244 sequence-defined markers for anchoring the genomic sequences. A total of 4,214 scaffolds from the genome sequence assembly were aligned into the genetic map. The combination of the draft assembly and a sequence-defined genetic map made it possible to locate and study functional genes of agronomic interest. The identification of co-segregating SNP markers, scaffold sequences and gene annotation facilitated the identification of a candidate R gene associated with resistance to the major lupin disease anthracnose. We demonstrated that the combination of medium-depth genome sequencing and a high-density genetic linkage map by application of NGS technology is a cost-effective approach to generating genome sequence data and a large number of molecular markers to study the genomics, genetics and functional genes of lupin, and to apply them to molecular plant breeding. This strategy does not require prior genome knowledge, which potentiates its application to a wide range of non-model species. PMID:23734219
Draft genome sequence, and a sequence-defined genetic linkage map of the legume crop species Lupinus angustifolius L.

PubMed

Yang, Huaan; Tao, Ye; Zheng, Zequn; Zhang, Qisen; Zhou, Gaofeng; Sweetingham, Mark W; Howieson, John G; Li, Chengdao

2013-01-01

Lupin (Lupinus angustifolius L.) is the most recently domesticated crop in major agricultural cultivation. Its seeds are high in protein and dietary fibre, but low in oil and starch. Medical and dietetic studies have shown that consuming lupin-enriched food has significant health benefits. We report the draft assembly from a whole genome shotgun sequencing dataset for this legume species with 26.9x coverage of the genome, which is predicted to contain 57,807 genes. Analysis of the annotated genes with metabolic pathways provided a partial understanding of some key features of lupin, such as the amino acid profile of storage proteins in seeds. Furthermore, we applied the NGS-based RAD-sequencing technology to obtain 8,244 sequence-defined markers for anchoring the genomic sequences. A total of 4,214 scaffolds from the genome sequence assembly were aligned into the genetic map. The combination of the draft assembly and a sequence-defined genetic map made it possible to locate and study functional genes of agronomic interest. The identification of co-segregating SNP markers, scaffold sequences and gene annotation facilitated the identification of a candidate R gene associated with resistance to the major lupin disease anthracnose. We demonstrated that the combination of medium-depth genome sequencing and a high-density genetic linkage map by application of NGS technology is a cost-effective approach to generating genome sequence data and a large number of molecular markers to study the genomics, genetics and functional genes of lupin, and to apply them to molecular plant breeding. This strategy does not require prior genome knowledge, which potentiates its application to a wide range of non-model species.
FrameD: A flexible program for quality check and gene prediction in prokaryotic genomes and noisy matured eukaryotic sequences.

PubMed

Schiex, Thomas; Gouzy, Jérôme; Moisan, Annick; de Oliveira, Yannick

2003-07-01

We describe FrameD, a program that predicts coding regions in prokaryotic and matured eukaryotic sequences. Initially targeted at gene prediction in bacterial GC-rich genomes, the gene model used in FrameD also allows genes to be predicted in the presence of frameshifts and partially undetermined sequences, which makes it well suited for gene prediction and frameshift correction in unfinished sequences such as EST and EST cluster sequences. Like recent eukaryotic gene prediction programs, FrameD can also take protein similarity information into account, both in its prediction and in its graphical output. Its performance is evaluated on different bacterial genomes. The web site (http://genopole.toulouse.inra.fr/bioinfo/FrameD/FD) allows direct prediction, sequence correction and translation, and the training of new models for new organisms.
Sequence Analysis of the Genome of Carnation (Dianthus caryophyllus L.)

PubMed Central

Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

2014-01-01

The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. ‘Francesco’ was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568 887 315 bp, consisting of 45 088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16 644 bp and 60 737 bp, respectively, and the longest scaffold was 1 287 144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. PMID:24344172
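The contig and scaffold N50 values quoted above follow a standard definition: sort the sequences by length, longest first, and report the length at which the running total first reaches half of the total assembly length. A minimal sketch, assuming a plain list of contig or scaffold lengths as input; the function name and the toy lengths in the example are hypothetical, not data from this study.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths (hypothetical helper).

    N50 is the length L such that sequences of length >= L together cover
    at least half of the total length of all sequences.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid rounding issues with total / 2.
        if 2 * running >= total:
            return length
```

For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8, since 10 + 9 + 8 = 27 is half of the 54 bp total. Applied to contig lengths this gives the contig N50; applied to scaffold lengths, the scaffold N50.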
Hepatitis E Virus Genotype 3 Diversity: Phylogenetic Analysis and Presence of Subtype 3b in Wild Boar in Europe

PubMed Central

Vina-Rodriguez, Ariel; Schlosser, Josephine; Becher, Dietmar; Kaden, Volker; Groschup, Martin H.; Eiden, Martin

2015-01-01

An increasing number of indigenous cases of hepatitis E caused by genotype 3 viruses (HEV-3) have been diagnosed all around the world, particularly in industrialized countries. Hepatitis E is a zoonotic disease and accumulating evidence indicates that domestic pigs and wild boars are the main reservoirs of HEV-3. A detailed analysis of HEV-3 subtypes could help to determine the interplay of human activity, the role of animals as reservoirs and cross-species transmission. Although complete genome sequences are most appropriate for HEV subtype determination, in most cases only partial genomic sequences are available. We therefore carried out a subtype classification analysis that uses regions from all three open reading frames of the genome. Using this approach, more than 1000 published HEV-3 isolates were subtyped. Newly recovered HEV partial sequences from hunted German wild boars were also included in this study. These sequences were assigned to genotype 3 and clustered within subtypes 3a and 3i, and, unexpectedly, one of them within subtype 3b, the first non-human report of this subtype in Europe. PMID:26008708
Genomics in Cardiovascular Disease

PubMed Central

Roberts, Robert; Marian, A.J.; Dandona, Sonny; Stewart, Alexandre F.R.

2013-01-01

A paradigm shift towards biology occurred in the 1990s, subsequently catalyzed by the sequencing of the human genome in 2000. The cost of DNA sequencing has fallen from millions to thousands of dollars, with sequencing of one's entire genome costing only $1,000. Rapid DNA sequencing is being embraced for single-gene disorders, particularly for sporadic cases and those from small families. Through in vitro fertilization, carriers of lethal genes, such as those associated with Huntington's disease, can avoid passing them on to their offspring. DNA sequencing will meet the challenge of elucidating the genetic predisposition for common polygenic diseases, especially in determining the function of the novel common genetic risk variants and identifying the rare variants, which may also partially account for the missing heritability. The challenge for DNA sequencing remains great: although human genome sequences are 99.5% identical, the 3 million single nucleotide polymorphisms (SNPs) are responsible for most individual features, and each person carries about 60 new mutations, which for 7 billion people amounts to 420 billion mutations. DNA sequencing capacity is claimed to have increased 10,000-fold, while information storage and retrieval has increased only 16-fold. The physician and health user will be challenged by the convergence of two major trends: whole genome sequencing and the storage, retrieval and integration of the data. PMID:23524054
Confirmation of a novel siadenovirus species detected in raptors: partial sequence and phylogenetic analysis.

PubMed

Kovács, Endre R; Benko, Mária

2009-03-01

Partial genome characterisation of a novel adenovirus, found recently in organ samples from multiple species of dead birds of prey, was carried out by sequence analysis of PCR-amplified DNA fragments. The virus, named raptor adenovirus 1 (RAdV-1), was originally detected by a nested PCR method with consensus primers targeting the adenoviral DNA polymerase gene. Phylogenetic analysis of the deduced amino acid sequence of the small PCR product indicated that a new siadenovirus type was present in the samples. Since virus isolation attempts were unsuccessful, further characterisation of this putative novel siadenovirus was carried out by PCR on the infected organ samples. The DNA sequence of the central genome part of RAdV-1, encompassing nine full (pTP, 52K, pIIIa, III, pVII, pX, pVI, hexon, protease) and two partial (DNA polymerase and DBP) genes and exceeding 12 kb in size, was determined. Phylogenetic tree reconstructions based on several genes unambiguously confirmed the preliminary classification of RAdV-1 as a new species within the genus Siadenovirus. Further study of RAdV-1 is of interest since it represents a rare adenovirus genus of as yet undetermined host origin.
Why barcode? High-throughput multiplex sequencing of mitochondrial genomes for molecular systematics.

PubMed

Timmermans, M J T N; Dodsworth, S; Culverwell, C L; Bocak, L; Ahrens, D; Littlewood, D T J; Pons, J; Vogler, A P

2010-11-01

Mitochondrial genome sequences are important markers for phylogenetics but taxon sampling remains sporadic because of the great effort and cost required to acquire full-length sequences. Here, we demonstrate a simple, cost-effective way to sequence the full complement of protein coding mitochondrial genes from pooled samples using the 454/Roche platform. Multiplexing was achieved without the need for expensive indexing tags ('barcodes'). The method was trialled with a set of long-range polymerase chain reaction (PCR) fragments from 30 species of Coleoptera (beetles) sequenced in a 1/16th sector of a sequencing plate. Long contigs were produced from the pooled sequences with sequencing depths ranging from ∼10 to 100× per contig. Species identity of individual contigs was established via three 'bait' sequences matching disparate parts of the mitochondrial genome obtained by conventional PCR and Sanger sequencing. This proved that assembly of contigs from the sequencing pool was correct. Our study produced sequences for 21 nearly complete and seven partial sets of protein coding mitochondrial genes. Combined with existing sequences for 25 taxa, an improved estimate of basal relationships in Coleoptera was obtained. The procedure could be employed routinely for mitochondrial genome sequencing at the species level, to provide improved species 'barcodes' that currently use the cox1 gene only.
The Chthonomonas calidirosea Genome Is Highly Conserved across Geographic Locations and Distinct Chemical and Microbial Environments in New Zealand's Taupō Volcanic Zone.

PubMed

Lee, Kevin C; Stott, Matthew B; Dunfield, Peter F; Huttenhower, Curtis; McDonald, Ian R; Morgan, Xochitl C

2016-06-15

Chthonomonas calidirosea T49(T) is a low-abundance, carbohydrate-scavenging, and thermophilic soil bacterium with a seemingly disorganized genome. We hypothesized that the C. calidirosea genome would be highly responsive to local selection pressure, resulting in the divergence of its genomic content, genome organization, and carbohydrate utilization phenotype across environments. We tested this hypothesis by sequencing the genomes of four C. calidirosea isolates obtained from four separate geothermal fields in the Taupō Volcanic Zone, New Zealand. For each isolation site, we measured physicochemical attributes and defined the associated microbial community by 16S rRNA gene sequencing. Despite their ecological and geographical isolation, the genome sequences showed low divergence (maximum, 1.17%). Isolate-specific variations included single-nucleotide polymorphisms (SNPs), restriction-modification systems, and mobile elements but few major deletions and no major rearrangements. The 50-fold variation in C. calidirosea relative abundance among the four sites correlated with site environmental characteristics but not with differences in genomic content. Conversely, the carbohydrate utilization profiles of the C. calidirosea isolates corresponded to the inferred isolate phylogenies, which only partially paralleled the geographical relationships among the sample sites. Genomic sequence conservation does not entirely parallel geographic distance, suggesting that stochastic dispersal and localized extinction, which allow for rapid population homogenization with little restriction by geographical barriers, are possible mechanisms of C. calidirosea distribution. This dispersal and extinction mechanism is likely not limited to C. calidirosea but may shape the populations and genomes of many other low-abundance free-living taxa. 
This study compares the genomic sequence variations and metabolisms of four strains of Chthonomonas calidirosea, a rare thermophilic bacterium from the phylum Armatimonadetes. It additionally compares the microbial communities and chemistry of each of the geographically distinct sites from which the four C. calidirosea strains were isolated. C. calidirosea was previously reported to possess a highly disorganized genome, but it was unclear whether this reflected rapid evolution. Here, we show that each isolation site has a distinct chemistry and microbial community, but despite this, the C. calidirosea genome is highly conserved across all isolation sites. Furthermore, genomic sequence differences only partially paralleled geographic distance, suggesting that C. calidirosea genotypes are not primarily determined by adaptive evolution. Instead, the presence of C. calidirosea may be driven by stochastic dispersal and localized extinction. This ecological mechanism may apply to many other low-abundance taxa. Copyright © 2016 Lee et al.
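The abstract summarizes isolate differences as a maximum genome divergence (1.17%) but does not state which divergence measure was used. A minimal sketch, assuming the genomes are already aligned and using a simple uncorrected p-distance with gapped columns skipped, shows how a "maximum pairwise divergence" figure of that kind can be computed. The isolate names and sequences below are hypothetical toy data, not C. calidirosea genomes.

```python
def percent_divergence(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance (as a percentage) between two pre-aligned sequences.

    Alignment gaps ('-') in either sequence are skipped, so only columns where
    both sequences carry a base are compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * differing / compared


def max_pairwise_divergence(aligned: dict[str, str]) -> tuple[str, str, float]:
    """Return (isolate_a, isolate_b, percent) for the most divergent pair of isolates."""
    names = sorted(aligned)
    if len(names) < 2:
        raise ValueError("need at least two isolates")
    best = (names[0], names[1], -1.0)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = percent_divergence(aligned[a], aligned[b])
            if d > best[2]:
                best = (a, b, d)
    return best


if __name__ == "__main__":
    # Hypothetical toy alignment (not real isolate sequences).
    isolates = {
        "isolate_1": "ACGTACGTACGTACGTACGT",
        "isolate_2": "ACGTACGTACGTACGTACGA",
        "isolate_3": "ACGTACGAACGTACGTAC-A",
    }
    print(max_pairwise_divergence(isolates))
```

For whole bacterial genomes, the same comparison would normally be run on a genome alignment, and a substitution-model correction may be preferred over the raw p-distance.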
The Chthonomonas calidirosea Genome Is Highly Conserved across Geographic Locations and Distinct Chemical and Microbial Environments in New Zealand's Taupō Volcanic Zone

PubMed Central

Lee, Kevin C.; Stott, Matthew B.; Dunfield, Peter F.; Huttenhower, Curtis; McDonald, Ian R.

2016-01-01

ABSTRACT Chthonomonas calidirosea T49T is a low-abundance, carbohydrate-scavenging, and thermophilic soil bacterium with a seemingly disorganized genome. We hypothesized that the C. calidirosea genome would be highly responsive to local selection pressure, resulting in the divergence of its genomic content, genome organization, and carbohydrate utilization phenotype across environments. We tested this hypothesis by sequencing the genomes of four C. calidirosea isolates obtained from four separate geothermal fields in the Taupō Volcanic Zone, New Zealand. For each isolation site, we measured physicochemical attributes and defined the associated microbial community by 16S rRNA gene sequencing. Despite their ecological and geographical isolation, the genome sequences showed low divergence (maximum, 1.17%). Isolate-specific variations included single-nucleotide polymorphisms (SNPs), restriction-modification systems, and mobile elements but few major deletions and no major rearrangements. The 50-fold variation in C. calidirosea relative abundance among the four sites correlated with site environmental characteristics but not with differences in genomic content. Conversely, the carbohydrate utilization profiles of the C. calidirosea isolates corresponded to the inferred isolate phylogenies, which only partially paralleled the geographical relationships among the sample sites. Genomic sequence conservation does not entirely parallel geographic distance, suggesting that stochastic dispersal and localized extinction, which allow for rapid population homogenization with little restriction by geographical barriers, are possible mechanisms of C. calidirosea distribution. This dispersal and extinction mechanism is likely not limited to C. calidirosea but may shape the populations and genomes of many other low-abundance free-living taxa. 
IMPORTANCE This study compares the genomic sequence variations and metabolisms of four strains of Chthonomonas calidirosea, a rare thermophilic bacterium from the phylum Armatimonadetes. It additionally compares the microbial communities and chemistry of each of the geographically distinct sites from which the four C. calidirosea strains were isolated. C. calidirosea was previously reported to possess a highly disorganized genome, but it was unclear whether this reflected rapid evolution. Here, we show that each isolation site has a distinct chemistry and microbial community, but despite this, the C. calidirosea genome is highly conserved across all isolation sites. Furthermore, genomic sequence differences only partially paralleled geographic distance, suggesting that C. calidirosea genotypes are not primarily determined by adaptive evolution. Instead, the presence of C. calidirosea may be driven by stochastic dispersal and localized extinction. This ecological mechanism may apply to many other low-abundance taxa. PMID:27060125
Genomic sequences of Piezodorus guildinii from the southern United States

USDA-ARS?s Scientific Manuscript database

The Redbanded Stink Bug, Piezodorus guildinii, is native to Central and South America and a well-studied pest of soybeans in Brazil. Recently, it has become economically important in the southern U.S. states, damaging soybeans from South Carolina to Texas. We cloned the partial genomic DNA from...
Single-Molecule Denaturation Mapping of Genomic DNA in Nanofluidic Channels

NASA Astrophysics Data System (ADS)

Reisner, Walter; Larsen, Niels; Kristensen, Anders; Tegenfeldt, Jonas O.; Flyvbjerg, Henrik

2009-03-01

We have developed a new DNA barcoding technique based on the partial denaturation of extended fluorescently labeled DNA molecules. We partially melt DNA extended in nanofluidic channels via a combination of local heating and added chemical denaturants. The melted molecules, imaged via a standard fluorescence videomicroscopy setup, exhibit a nonuniform fluorescence profile corresponding to a series of local dips and peaks in the intensity trace along the stretched molecule. We show that this barcode is consistent with the presence of locally melted regions and can be explained by calculations of sequence-dependent melting probability. We believe this melting mapping technology is the first optically based single-molecule technique sensitive to genome-wide sequence variation that does not require an additional enzymatic labeling or restriction scheme.
The Microbial Genomes Atlas (MiGA) webserver: taxonomic and gene diversity analysis of Archaea and Bacteria at the whole genome level.

PubMed

Rodriguez-R, Luis M; Gunturu, Santosh; Harvey, William T; Rosselló-Mora, Ramon; Tiedje, James M; Cole, James R; Konstantinidis, Konstantinos T

2018-06-14

The small subunit ribosomal RNA gene (16S rRNA) has been successfully used to catalogue and study the diversity of prokaryotic species and communities but it offers limited resolution at the species and finer levels, and cannot represent the whole-genome diversity and fluidity. To overcome these limitations, we introduced the Microbial Genomes Atlas (MiGA), a webserver that allows the classification of an unknown query genomic sequence, complete or partial, against all taxonomically classified taxa with available genome sequences, as well as comparisons to other related genomes including uncultivated ones, based on the genome-aggregate Average Nucleotide and Amino Acid Identity (ANI/AAI) concepts. MiGA integrates best practices in sequence quality trimming and assembly and allows input to be raw reads or assemblies from isolate genomes, single-cell sequences, and metagenome-assembled genomes (MAGs). Further, MiGA can take as input hundreds of closely related genomes of the same or closely related species (a so-called 'Clade Project') to assess their gene content diversity and evolutionary relationships, and calculate important clade properties such as the pangenome and core gene sets. Therefore, MiGA is expected to facilitate a range of genome-based taxonomic and diversity studies, and quality assessment across environmental and clinical settings. MiGA is available at http://microbial-genomes.org/.
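MiGA classifies query genomes by genome-aggregate ANI/AAI, but the abstract does not describe how those values are computed. As a swapped-in, alignment-free stand-in (not MiGA's method), the sketch below approximates ANI from k-mer Jaccard similarity using the Mash distance formula, D = -(1/k) ln(2J / (1 + J)), with ANI ≈ 100(1 - D). Assumptions: exact k-mer sets instead of MinHash sketches, canonical k-mers so that strand does not matter, k = 21 as a default, and a distance capped at 1 when no k-mers are shared.

```python
import math

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_ACGT = frozenset("ACGT")


def canonical_kmers(seq: str, k: int) -> set[str]:
    """Canonical k-mers: the lexicographically smaller of each k-mer and its reverse complement.

    k-mers containing characters other than A/C/G/T (e.g. N) are skipped.
    """
    seq = seq.upper()
    out: set[str] = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= _ACGT:
            rc = kmer.translate(_COMPLEMENT)[::-1]
            out.add(min(kmer, rc))
    return out


def mash_distance(seq_a: str, seq_b: str, k: int = 21) -> float:
    """Mash-style distance computed from the exact Jaccard index of canonical k-mer sets."""
    a, b = canonical_kmers(seq_a, k), canonical_kmers(seq_b, k)
    union = a | b
    if not union:
        raise ValueError("sequences are shorter than k")
    jaccard = len(a & b) / len(union)
    if jaccard == 0:
        return 1.0  # no shared k-mers: distance capped at 1 (an assumption of this sketch)
    return -math.log(2 * jaccard / (1 + jaccard)) / k


def approx_ani(seq_a: str, seq_b: str, k: int = 21) -> float:
    """Approximate average nucleotide identity (%) as 100 * (1 - Mash distance)."""
    return 100.0 * (1.0 - mash_distance(seq_a, seq_b, k))
```

This estimate is only informative for closely related genomes (roughly the species level and above about 90% identity); alignment-based ANI remains the reference approach for classification.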
Complete Taiwanese Macaque (Macaca cyclopis) Mitochondrial Genome: Reference-Assisted de novo Assembly with Multiple k-mer Strategy.

PubMed

Huang, Yu-Feng; Midha, Mohit; Chen, Tzu-Han; Wang, Yu-Tai; Smith, David Glenn; Pei, Kurtis Jai-Chyi; Chiu, Kuo Ping

2015-01-01

The Taiwanese (Formosan) macaque (Macaca cyclopis) is the only nonhuman primate endemic to Taiwan. This primate species is valuable for evolutionary studies and as a subject of medical research. However, only partial fragments of its mitochondrial genome (mitogenome) have been sequenced, and its nuclear genome has not been sequenced. We employed next-generation sequencing to generate 2 x 90 bp paired-end reads, followed by reference-assisted de novo assembly with a multiple k-mer strategy to characterize the M. cyclopis mitogenome. We compared the assembled mitogenome with those of other macaque species for phylogenetic analysis. Our results show that the M. cyclopis mitogenome consists of 16,563 nucleotides encoding 13 protein-coding genes, 2 ribosomal RNAs and 22 transfer RNAs. Phylogenetic analysis indicates that M. cyclopis is most closely related to M. mulatta lasiota (Chinese rhesus macaque), supporting the Asia-continental origin of M. cyclopis proposed in previous studies based on partial mitochondrial sequences. Our work presents a novel approach for assembling a mitogenome that combines de novo genome assembly with assistance from a reference genome. The availability of the complete Taiwanese macaque mitogenome will facilitate the study of primate evolution and the characterization of genetic variations for the potential use of this species as a nonhuman primate model for medical research.
Genomic comparison of the endophyte Herbaspirillum seropedicae SmR1 and the phytopathogen Herbaspirillum rubrisubalbicans M1 by suppressive subtractive hybridization and partial genome sequencing.

PubMed

Monteiro, Rose A; Balsanelli, Eduardo; Tuleski, Thalita; Faoro, Helison; Cruz, Leonardo M; Wassem, Roseli; de Baura, Valter A; Tadra-Sfeir, Michelle Z; Weiss, Vinícius; DaRocha, Wanderson D; Muller-Santos, Marcelo; Chubatsu, Leda S; Huergo, Luciano F; Pedrosa, Fábio O; de Souza, Emanuel M

2012-05-01

Herbaspirillum rubrisubalbicans M1 causes mottled stripe disease in sugarcane cv. B-4362. Inoculation of this cultivar with Herbaspirillum seropedicae SmR1 does not produce disease symptoms. A comparison of the genomic sequences of these closely related species may permit a better understanding of contrasting phenotypes such as endophytic association and a pathogenic lifestyle. To achieve this goal, we constructed suppressive subtractive hybridization (SSH) libraries to identify DNA fragments present in one species and absent in the other. In a parallel approach, partial genomic sequence from H. rubrisubalbicans M1 was directly compared in silico with the H. seropedicae SmR1 genome. The genomic differences between the two organisms revealed by SSH suggested that lipopolysaccharide and adhesins are potential molecular factors involved in the different phenotypic behavior. The wss cluster, probably involved in cellulose biosynthesis, was found in H. rubrisubalbicans M1. Expression of this gene cluster was increased in H. rubrisubalbicans M1 cells attached to the surface of maize roots, and knockout of the wssD gene decreased maize root surface attachment and endophytic colonization. Cellulose production could be responsible for the maize attachment pattern of H. rubrisubalbicans M1, which is capable of outcompeting H. seropedicae SmR1. © 2012 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.
Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

PubMed

Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

2014-06-01

The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNAs and miRNAs, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43,266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. © The Author 2013. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
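The carnation record reports contig and scaffold N50 values and the fraction of the estimated genome covered by the assembly. As a generic illustration (not the authors' pipeline), the sketch below computes N50 with the standard definition: sort lengths longest first and return the length at which the running total first reaches half of all assembled bases. The example lengths are hypothetical.

```python
def n50(lengths: list[int]) -> int:
    """N50: with lengths sorted longest first, the length at which the
    cumulative sum first reaches at least half of the total length."""
    if not lengths:
        raise ValueError("no sequences")
    if any(x < 0 for x in lengths):
        raise ValueError("lengths must be non-negative")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty, non-negative input")


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # hypothetical contig lengths
    # Assembly size over estimated genome size, using the figures in the abstract:
    print(round(100 * 568_887_315 / 622_000_000))  # ≈ 91 (%)
```

The same function applies to contigs or scaffolds; only the input list changes.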
Unusual RNA plant virus integration in the soybean genome leads to the production of small RNAs.

PubMed

da Fonseca, Guilherme Cordenonsi; de Oliveira, Luiz Felipe Valter; de Morais, Guilherme Loss; Abdelnor, Ricardo Vilela; Nepomuceno, Alexandre Lima; Waterhouse, Peter M; Farinelli, Laurent; Margis, Rogerio

2016-05-01

Horizontal gene transfer (HGT) is known to be a major force in genome evolution. The acquisition of genes from viruses by eukaryotic genomes is a well-studied example of HGT, including rare cases of non-retroviral RNA virus integration. The present study describes the integration of cucumber mosaic virus RNA-1 into the soybean genome. After an initial metatranscriptomic analysis of small RNAs derived from soybean, de novo assembly yielded a 3029-nt contig homologous to RNA-1. The integration of this sequence in the soybean genome was confirmed by DNA deep sequencing. The integration locus harbors the full RNA-1 sequence, followed by a partial sequence of an endogenous mRNA and another RNA-1 sequence as an inverted repeat, allowing the formation of a hairpin structure. This region recombined into a retrotransposon located inside an exon of a soybean gene. The nucleotide similarity of the integrated sequence to other cucumber mosaic virus sequences indicates that the integration event occurred recently. We describe a rare event of non-retroviral RNA virus integration in soybean that leads to the production of a double-stranded RNA in a similar fashion to virus-resistant RNAi plants. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Genome sequence of the olive tree, Olea europaea.

PubMed

Cruz, Fernando; Julca, Irene; Gómez-Garrido, Jèssica; Loska, Damian; Marcet-Houben, Marina; Cano, Emilio; Galán, Beatriz; Frias, Leonor; Ribeca, Paolo; Derdak, Sophia; Gut, Marta; Sánchez-Fernández, Manuel; García, Jose Luis; Gut, Ivo G; Vargas, Pablo; Alioto, Tyler S; Gabaldón, Toni

2016-06-27

The Mediterranean olive tree (Olea europaea subsp. europaea) was one of the first trees to be domesticated and is currently of major agricultural importance in the Mediterranean region as the source of olive oil. The molecular bases underlying the phenotypic differences among domesticated cultivars, or between domesticated olive trees and their wild relatives, remain poorly understood. Both wild and cultivated olive trees have 46 chromosomes (2n). A total of 543 Gb of raw DNA sequence from whole genome shotgun sequencing, and a fosmid library containing 155,000 clones from a 1,000+ year-old olive tree (cv. Farga), were generated by Illumina sequencing using different combinations of mate-pair and paired-end libraries. Assembly gave a final genome with a scaffold N50 of 443 kb and a total length of 1.31 Gb, which represents 95% of the estimated genome length (1.38 Gb). In addition, the associated fungus Aureobasidium pullulans was partially sequenced. Genome annotation, assisted by RNA sequencing from leaf, root, and fruit tissues at various stages, resulted in 56,349 unique protein-coding genes, suggesting recent genomic expansion. Genome completeness, as estimated using the CEGMA pipeline, reached 98.79%. The assembled draft genome of O. europaea will provide a valuable resource for the study of the evolution and domestication processes of this important tree, and allow determination of the genetic bases of key phenotypic traits. Moreover, it will enhance breeding programs and the development of new varieties.
Complete mitochondrial genome of the versicoloured emerald hummingbird Amazilia versicolor, a polymorphic species.

PubMed

Prosdocimi, Francisco; Souto, Helena Magarinos; Ruschi, Piero Angeli; Furtado, Carolina; Jennings, W Bryan

2016-09-01

The genome of the versicoloured emerald hummingbird (Amazilia versicolor) was partially sequenced in one-sixth of an Illumina HiSeq lane. The mitochondrial genome was assembled using MIRA and MITObim software, yielding a circular molecule 16,861 bp in length, which was deposited in GenBank under accession number KF624601. The mitogenome contained 13 protein-coding genes, 22 transfer RNAs (tRNAs), 2 ribosomal RNAs and 1 non-coding control region. The molecule was assembled from 21,927 sequencing reads of 100 bp each, giving ∼130× coverage with reads uniformly distributed along the genome. This is the fourth mitochondrial genome described for this highly diverse family of birds and may benefit further phylogenetic, phylogeographic, population genetic and species delimitation studies of hummingbirds.
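The reported ∼130× coverage follows from simple arithmetic: mean depth is total sequenced bases divided by target length, C = N × L / G. The sketch below reproduces that figure from the numbers in the abstract and, as an added illustration under the Lander-Waterman assumption of uniformly random read placement, estimates the expected fraction of bases left uncovered as e^(-C). The Poisson model is an assumption here, not something the authors state.

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_length: int) -> float:
    """Mean depth C = N * L / G (total sequenced bases divided by target length)."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return n_reads * read_length / genome_length


def expected_uncovered_fraction(coverage: float) -> float:
    """Lander-Waterman (Poisson) estimate of the fraction of bases with zero reads: e^(-C)."""
    return math.exp(-coverage)


if __name__ == "__main__":
    c = mean_coverage(21_927, 100, 16_861)  # read count, read length, mitogenome length from the abstract
    print(f"mean coverage: {c:.1f}x")
    print(f"expected uncovered fraction: {expected_uncovered_fraction(c):.3g}")
```

Real read depth is never perfectly uniform, so the Poisson estimate is a lower bound on gaps rather than a guarantee of complete coverage.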
Complete genome sequence analysis of novel human bocavirus reveals genetic recombination between human bocavirus 2 and human bocavirus 4.

PubMed

Khamrin, Pattara; Okitsu, Shoko; Ushijima, Hiroshi; Maneekarn, Niwat

2013-07-01

Epidemiological surveillance of human bocavirus (HBoV) was conducted on fecal specimens collected from hospitalized children with diarrhea in Chiang Mai, Thailand, in 2011. By partial sequence analysis of the VP1 gene, an unusual HBoV strain (CMH-S011-11) was initially identified as HBoV4. The complete genome of CMH-S011-11 was then sequenced and analyzed to clarify whether it was a recombinant strain or a new HBoV variant. Analysis of the complete genome sequence revealed that the coding sequence spanning NS1, NP1 and VP1/VP2 was 4795 nucleotides long. Interestingly, the nucleotide sequence of the NS1 gene of CMH-S011-11 was most closely related to the HBoV2 reference strains detected in Pakistan, which contradicted the initial genotyping result based on the partial VP1 region in the previous study. In addition, comparison of the NP1 nucleotide sequence of CMH-S011-11 with those of other HBoV1-4 reference strains also revealed a high level of sequence identity with HBoV2. On the other hand, the nucleotide sequence of the VP1/VP2 gene of CMH-S011-11 was most closely related to those of HBoV4 reference strains detected in Nigeria. Overall, full-length sequence analysis grouped CMH-S011-11 within the HBoV4 species, but on a separate branch from other HBoV4 prototype strains. Recombination analysis revealed that CMH-S011-11 resulted from recombination between HBoV2 and HBoV4 strains, with the breakpoint located near the start codon of VP2. Copyright © 2013 Elsevier B.V. All rights reserved.
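The HBoV record infers recombination because different genome regions of one strain match different parental genotypes. The abstract does not name the recombination method; a minimal sketch of the underlying idea is a sliding-window identity scan of an aligned query against candidate parents, reporting where the closest parent switches. This is a simplified stand-in: genuine bootscanning (as in Simplot) builds bootstrapped phylogenetic trees per window rather than using raw identity. Window size, step, and parent labels are hypothetical.

```python
def window_identity(query: str, ref: str, start: int, end: int) -> float:
    """Fraction of ungapped aligned columns in [start, end) where query matches ref."""
    compared = matches = 0
    for q, r in zip(query[start:end], ref[start:end]):
        if q == "-" or r == "-":
            continue  # skip gapped columns
        compared += 1
        matches += q == r
    return matches / compared if compared else 0.0  # empty window scored 0 (assumption)


def similarity_scan(query: str, parents: dict[str, str], window: int = 200,
                    step: int = 50) -> list[tuple[int, int, str, float]]:
    """For each sliding window, report the candidate parent most similar to the query.

    Returns (start, end, best_parent, identity); ties go to the parent listed first.
    """
    if not parents:
        raise ValueError("need at least one candidate parent")
    n = len(query)
    if any(len(seq) != n for seq in parents.values()):
        raise ValueError("all sequences must be aligned to the same length")
    results = []
    for start in range(0, max(n - window, 0) + 1, step):
        end = min(start + window, n)
        scores = {name: window_identity(query, seq, start, end) for name, seq in parents.items()}
        best = max(scores, key=scores.get)
        results.append((start, end, best, scores[best]))
    return results


def switch_points(results: list[tuple[int, int, str, float]]) -> list[int]:
    """Window starts at which the closest parent changes (candidate breakpoint regions)."""
    return [cur[0] for prev, cur in zip(results, results[1:]) if cur[2] != prev[2]]


if __name__ == "__main__":
    # Hypothetical toy alignment: the query matches parent "HBoV2_like" first, then "HBoV4_like".
    parents = {"HBoV2_like": "A" * 400, "HBoV4_like": "C" * 400}
    query = "A" * 200 + "C" * 200
    scan = similarity_scan(query, parents, window=100, step=50)
    print(switch_points(scan))
```

A switch point only localizes a breakpoint to within roughly one window; a statistical test is still needed before calling recombination.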

Navigating the tip of the genomic iceberg: Next-generation sequencing for plant systematics.

PubMed

Straub, Shannon C K; Parks, Matthew; Weitemier, Kevin; Fishbein, Mark; Cronn, Richard C; Liston, Aaron

2012-02-01

Just as Sanger sequencing did more than 20 years ago, next-generation sequencing (NGS) is poised to revolutionize plant systematics. By combining multiplexing approaches with NGS throughput, systematists may no longer need to choose between more taxa or more characters. Here we describe a genome skimming (shallow sequencing) approach for plant systematics. Through simulations, we evaluated optimal sequencing depth and performance of single-end and paired-end short read sequences for assembly of nuclear ribosomal DNA (rDNA) and plastomes and addressed the effect of divergence on reference-guided plastome assembly. We also used simulations to identify potential phylogenetic markers from low-copy nuclear loci at different sequencing depths. We demonstrated the utility of genome skimming through phylogenetic analysis of the Sonoran Desert clade (SDC) of Asclepias (Apocynaceae). Paired-end reads performed better than single-end reads. Minimum sequencing depths for high-quality rDNA and plastome assemblies were 40× and 30×, respectively. Divergence from the reference significantly affected plastome assembly, but relatively similar references are available for most seed plants. Deeper rDNA sequencing is necessary to characterize intragenomic polymorphism. The low-copy fraction of the nuclear genome was readily surveyed, even at low sequencing depths. Nearly 160,000 bp of sequence from three organelles provided evidence of phylogenetic incongruence in the SDC. Adoption of NGS will facilitate progress in plant systematics, as whole plastome and rDNA cistrons, partial mitochondrial genomes, and low-copy nuclear markers can now be efficiently obtained for molecular phylogenetics studies.
Novel molecular approach to define pest species status and tritrophic interactions from historical Bemisia specimens.

PubMed

Tay, W T; Elfekih, S; Polaszek, A; Court, L N; Evans, G A; Gordon, K H J; De Barro, P J

2017-03-27

Museum specimens represent valuable genomic resources for understanding host-endosymbiont/parasitoid evolutionary relationships and for resolving species complexes and nomenclatural problems. However, museum specimens suffer from DNA degradation, making them challenging for molecular-based studies. Here, the mitogenomes of a single 1912 Sri Lankan Bemisia emiliae cotype puparium and of a 1942 Japanese Bemisia puparium are characterised using a Next-Generation Sequencing approach. Whiteflies are small sap-sucking insects that include the B. tabaci pest species complex. Bemisia emiliae's draft mitogenome showed a high degree of homology with published B. tabaci mitogenomes, and exhibited 98-100% partial mitochondrial DNA Cytochrome Oxidase I (mtCOI) gene identity with the B. tabaci species known as Asia II-7. The partial mtCOI gene of the Japanese specimen shared 99% sequence identity with the Bemisia 'JpL' genetic group. Metagenomic analysis identified bacterial sequences in both Bemisia specimens, while hymenopteran sequences were also identified in the Japanese Bemisia puparium, including complete mtCOI and rRNA genes, and various partial mtDNA genes. At 88-90% mtCOI sequence identity to Aphelinidae wasps, we concluded that the 1942 Bemisia nymph was parasitized by an Eretmocerus parasitoid wasp. Our approach enables the characterisation of genomes and associated metagenomic communities of museum specimens using 1.5 ng gDNA, and the inference of historical tritrophic relationships in Bemisia whiteflies.
A Primer on Metagenomics

PubMed Central

Wooley, John C.; Godzik, Adam; Friedberg, Iddo

2010-01-01

Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Faster, cheaper sequencing technologies and the ability to sequence uncultured microbes sampled directly from their habitats are expanding and transforming our view of the microbial world. Distilling meaningful information from the millions of new genomic sequences presents a serious challenge to bioinformaticians. In cultured microbes, the genomic data come from a single clone, making sequence assembly and annotation tractable. In metagenomics, the data come from heterogeneous microbial communities, sometimes containing more than 10,000 species, with the sequence data being noisy and partial. From sampling, to assembly, to gene calling and function prediction, bioinformatics faces new demands in interpreting voluminous, noisy, and often partial sequence data. Although metagenomics is a relative newcomer to science, the past few years have seen an explosion in computational methods applied to metagenomic-based research. It is therefore not within the scope of this article to provide an exhaustive review. Rather, we provide here a concise yet comprehensive introduction to the current computational requirements presented by metagenomics, and review the recent progress made. We also note whether there is software that implements any of the methods presented here, and briefly review its utility. Nevertheless, it would be useful if readers of this article would avail themselves of the comment section provided by this journal, and relate their own experiences. Finally, the last section of this article provides a few representative studies illustrating different facets of recent scientific discoveries made using metagenomics. PMID:20195499
Outbreak of poliomyelitis in Finland in 1984-85 - Re-analysis of viral sequences using the current standard approach.

PubMed

Simonen, Marja-Leena; Roivainen, Merja; Iber, Jane; Burns, Cara; Hovi, Tapani

2010-01-01

In 1984, a wild type 3 poliovirus (PV3/FIN84) spread throughout Finland, causing nine cases of paralytic poliomyelitis and one case of aseptic meningitis. The outbreak was ended in 1985 by an intensive vaccination campaign. Limited sequence comparison with previously isolated PV3 strains had placed the closest relatives of PV3/FIN84 among strains circulating in the Mediterranean region. Here we reanalysed these relationships using approaches currently used in poliovirus surveillance. Cell lysates of 22 strains isolated during the outbreak and stored frozen were subjected to RT-PCR amplification in three genomic regions without prior subculture. Sequences of the entire VP1 coding region, 150 nucleotides in the VP1-2A junction, most of the 5' non-coding region, partial sequences of the 3D RNA polymerase coding region and a partial 3' non-coding region were compared within the outbreak and with sequences available in data banks. In addition, complete nucleotide sequences were obtained for 2 strains isolated from two different cases during the outbreak. The results confirmed the previously described wide intraepidemic variation of the strains, including amino acid substitutions in antigenic sites, as well as the likely Mediterranean origin of the strains. Simplot and bootscanning analyses of the complete genomes indicated a complicated evolutionary history of the non-capsid coding regions of the genome, suggesting several past recombinations with different HEV-C viruses.
The Fusarium Graminearum Genome Reveals a Link Between Localized Polymorphism and Pathogen Specialization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cuomo, Christina A.; Guldener, Ulrich; Xu, Jin Rong

2007-09-07

We sequenced and annotated the genome of the filamentous fungus Fusarium graminearum, a major pathogen of cultivated cereals. Very few repetitive sequences were detected, and the process of repeat-induced point mutation, in which duplicated sequences are subject to extensive mutation, may partially account for the reduced repeat content and apparent low number of paralogous (ancestrally duplicated) genes. A second strain of F. graminearum contained more than 10,000 single-nucleotide polymorphisms, which were frequently located near telomeres and within other discrete chromosomal segments. Many highly polymorphic regions contained sets of genes implicated in plant-fungus interactions and were unusually divergent, with higher rates of recombination. These regions of genome innovation may result from selection due to interactions of F. graminearum with its plant hosts.
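The Fusarium record describes SNPs between two strains clustering near telomeres and in discrete chromosomal segments. A minimal sketch of how such clusters can be located: count SNPs in fixed, non-overlapping windows along a chromosome and flag windows whose count is well above the chromosome mean. The window size, fold threshold, and positions below are hypothetical choices for illustration, not the authors' criteria.

```python
from collections import Counter


def snp_density(snp_positions: list[int], chrom_length: int, window: int = 10_000) -> list[int]:
    """Count SNPs in consecutive non-overlapping windows along one chromosome (0-based positions)."""
    if chrom_length <= 0 or window <= 0:
        raise ValueError("chrom_length and window must be positive")
    n_windows = (chrom_length + window - 1) // window  # last window may be shorter
    counts = Counter(pos // window for pos in snp_positions if 0 <= pos < chrom_length)
    return [counts.get(i, 0) for i in range(n_windows)]


def high_density_windows(counts: list[int], fold: float = 3.0) -> list[int]:
    """Indices of windows whose SNP count is at least `fold` times the chromosome mean."""
    if not counts:
        return []
    mean = sum(counts) / len(counts)
    if mean == 0:
        return []
    return [i for i, c in enumerate(counts) if c >= fold * mean]


if __name__ == "__main__":
    # Hypothetical SNP positions on a 30 kb toy chromosome.
    positions = [1, 2, 3, 4, 5, 6, 15_000, 25_000]
    counts = snp_density(positions, 30_000)
    print(counts, high_density_windows(counts, fold=2.0))
```

Windows at chromosome ends flagged this way would correspond to the subtelomeric clustering the abstract describes; a fixed fold threshold is a crude heuristic compared with model-based segmentation.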
A ddRAD-based genetic map and its integration with the genome assembly of Japanese eel (Anguilla japonica) provides insights into genome evolution after the teleost-specific genome duplication

PubMed Central

2014-01-01

Background Recent advancements in next-generation sequencing technology have enabled cost-effective sequencing of whole or partial genomes, permitting the discovery and characterization of molecular polymorphisms. Double-digest restriction-site associated DNA sequencing (ddRAD-seq) is a powerful and inexpensive approach to developing numerous single nucleotide polymorphism (SNP) markers and constructing a high-density genetic map. To enrich genomic resources for Japanese eel (Anguilla japonica), we constructed a ddRAD-based genetic map using an Ion Torrent Personal Genome Machine and anchored scaffolds of the current genome assembly to 19 linkage groups of the Japanese eel. Furthermore, we compared the Japanese eel genome with genomes of model fishes to infer the history of genome evolution after the teleost-specific genome duplication. Results We generated the ddRAD-based linkage map of the Japanese eel, where the maps for female and male spanned 1748.8 cM and 1294.5 cM, respectively, and were arranged into 19 linkage groups. A total of 2,672 SNP markers and 115 Simple Sequence Repeat markers provide anchor points to 1,252 scaffolds covering 151 Mb (13%) of the current genome assembly of the Japanese eel. Comparisons among the Japanese eel, medaka, zebrafish and spotted gar genomes showed highly conserved synteny among teleosts and revealed part of the eight major chromosomal rearrangement events that occurred soon after the teleost-specific genome duplication. Conclusions The ddRAD-seq approach combined with the Ion Torrent Personal Genome Machine sequencing allowed us to conduct efficient and flexible SNP genotyping. The integration of the genetic map and the assembled sequence provides a valuable resource for fine mapping and positional cloning of quantitative trait loci associated with economically important traits and for investigating comparative genomics of the Japanese eel. PMID:24669946
A ddRAD-based genetic map and its integration with the genome assembly of Japanese eel (Anguilla japonica) provides insights into genome evolution after the teleost-specific genome duplication.

PubMed

Kai, Wataru; Nomura, Kazuharu; Fujiwara, Atushi; Nakamura, Yoji; Yasuike, Motoshige; Ojima, Nobuhiko; Masaoka, Tetsuji; Ozaki, Akiyuki; Kazeto, Yukinori; Gen, Koichiro; Nagao, Jiro; Tanaka, Hideki; Kobayashi, Takanori; Ototake, Mitsuru

2014-03-26

Recent advancements in next-generation sequencing technology have enabled cost-effective sequencing of whole or partial genomes, permitting the discovery and characterization of molecular polymorphisms. Double-digest restriction-site associated DNA sequencing (ddRAD-seq) is a powerful and inexpensive approach to developing numerous single nucleotide polymorphism (SNP) markers and constructing a high-density genetic map. To enrich genomic resources for Japanese eel (Anguilla japonica), we constructed a ddRAD-based genetic map using an Ion Torrent Personal Genome Machine and anchored scaffolds of the current genome assembly to 19 linkage groups of the Japanese eel. Furthermore, we compared the Japanese eel genome with genomes of model fishes to infer the history of genome evolution after the teleost-specific genome duplication. We generated the ddRAD-based linkage map of the Japanese eel, where the maps for female and male spanned 1748.8 cM and 1294.5 cM, respectively, and were arranged into 19 linkage groups. A total of 2,672 SNP markers and 115 Simple Sequence Repeat markers provide anchor points to 1,252 scaffolds covering 151 Mb (13%) of the current genome assembly of the Japanese eel. Comparisons among the Japanese eel, medaka, zebrafish and spotted gar genomes showed highly conserved synteny among teleosts and revealed part of the eight major chromosomal rearrangement events that occurred soon after the teleost-specific genome duplication. The ddRAD-seq approach combined with the Ion Torrent Personal Genome Machine sequencing allowed us to conduct efficient and flexible SNP genotyping. The integration of the genetic map and the assembled sequence provides a valuable resource for fine mapping and positional cloning of quantitative trait loci associated with economically important traits and for investigating comparative genomics of the Japanese eel.
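ddRAD-seq reduces the genome to fragments cut by two restriction enzymes and then size-selected. The abstract does not name the enzymes or size window used for the Japanese eel, so the sketch below assumes the EcoRI (GAATTC) and MspI (CCGG) recognition sites and a 200 to 400 bp window purely for illustration. It performs an in-silico double digest and keeps only fragments flanked by one site of each enzyme, because ddRAD libraries are built from fragments carrying a different adapter at each end.

```python
import re


def site_positions(seq: str, site: str) -> list[int]:
    """Start positions of every (possibly overlapping) occurrence of a recognition site."""
    return [m.start() for m in re.finditer(f"(?={re.escape(site)})", seq)]


def ddrad_fragments(seq: str, site_a: str = "GAATTC", site_b: str = "CCGG",
                    min_len: int = 200, max_len: int = 400) -> list[tuple[int, int, str]]:
    """In-silico double digest with size selection.

    Keeps fragments between adjacent sites of different enzymes (A-B or B-A) whose
    length lies in [min_len, max_len]. Simplification: length is measured between
    recognition-site start positions, ignoring exact cut offsets and overhangs.
    Both default sites are palindromic, so scanning one strand finds all sites.
    """
    seq = seq.upper()
    cuts = sorted(
        [(p, "A") for p in site_positions(seq, site_a)]
        + [(p, "B") for p in site_positions(seq, site_b)]
    )
    fragments = []
    for (p1, e1), (p2, e2) in zip(cuts, cuts[1:]):
        if e1 != e2 and min_len <= p2 - p1 <= max_len:
            fragments.append((p1, p2, e1 + e2))
    return fragments


if __name__ == "__main__":
    # Hypothetical toy sequence with one EcoRI site and one MspI site 300 bp apart.
    toy = "GAATTC" + "T" * 294 + "CCGG" + "T" * 50
    print(ddrad_fragments(toy))
```

Counting the retained fragments across an assembly gives a rough expectation of how many loci a given enzyme pair and size window would yield.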
Using the Developmental Gene Bicoid to Identify Species of Forensically Important Blowflies (Diptera: Calliphoridae)

PubMed Central

Park, Seong Hwan; Park, Chung Hyun; Zhang, Yong; Piao, Huguo; Chung, Ukhee; Kim, Seong Yoon; Ko, Kwang Soo; Yi, Cheong-Ho; Jo, Tae-Ho; Hwang, Juck-Joon

2013-01-01

Identifying the insect species used to estimate the postmortem interval (PMI) is a major subject in forensic entomology. Because forensic insect specimens are morphologically uniform and are obtained at various developmental stages, DNA markers are greatly needed. To develop new autosomal DNA markers for species identification, partial genomic sequences of the bicoid (bcd) gene, containing the homeobox and its flanking sequences, were determined and analyzed for 12 blowfly species (Aldrichina grahami, Calliphora vicina, Calliphora lata, Triceratopyga calliphoroides, Chrysomya megacephala, Chrysomya pinguis, Phormia regina, Lucilia ampullacea, Lucilia caesar, Lucilia illustris, Hemipyrellia ligurriens and Lucilia sericata; Calliphoridae: Diptera). This study is the first to sequence bcd from the ten blowfly species other than C. vicina and L. sericata. Based on the bcd sequences of these 12 blowfly species, a phylogenetic tree was constructed that discriminates the subfamilies of Calliphoridae (Luciliinae, Chrysomyinae, and Calliphorinae) and most blowfly species. Even partial genomic sequences of about 500 bp can distinguish most blowfly species. The short intron 2 and the coding sequences downstream of the bcd homeobox in exon 3 could be used to develop DNA markers for forensic applications. These gene sequences are important for studying the evolution of insect development and are potentially useful for identifying insect species in forensic science. PMID:23586044
Direct identification of non-polio enteroviruses in residual paralysis cases by analysis of VP1 sequences.

PubMed

Rahimi, Pooneh; Tabatabaie, H; Gouya, Mohammad M; Mahmudi, M; Musavi, T; Rad, K Samimi; Azad, T Mokhtari; Nategh, R

2009-06-01

The 66 serotypes of human enteroviruses (EVs) are classified into four species, A-D, based on phylogenetic relationships in multiple genome regions. Partial VP1 amplification and sequence analysis are reliable methods for identifying non-polio enterovirus serotypes, especially in cell culture-negative specimens from patients with residual paralysis. In Iran during 2000-2002, there were 29 residual paralysis cases with negative cell culture (RD, HEp-2 and L20B) results. Genomic RNA was extracted from stool specimens from these cases, and enteroviruses were detected by amplifying the 5'-nontranslated region using RT-PCR with Pan-EV primers. Partial VP1 amplification by semi-nested RT-PCR (snRT-PCR) and sequence analysis were then performed. Specimens from the 29 culture-negative cases contained echoviruses of six different serotypes. As the global eradication of wild polioviruses nears, the study of non-polio enteroviruses, which can also cause poliomyelitis, is increasingly important for understanding their pathogenesis. The VP1 sequences derived from the snRT-PCR products allowed rapid molecular analysis of these non-polio strains.
Poliovirus replication proteins: RNA sequence encoding P3-1b and the sites of proteolytic processing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Semler, B.L.; Anderson, C.W.; Kitamura, N.

1981-06-01

A partial amino-terminal amino acid sequence of each of the major proteins encoded by the replicase region of the poliovirus genome has been determined. A comparison of this sequence information with the amino acid sequence predicted from the RNA sequence that has been determined for the 3' region of the poliovirus genome has allowed us to locate precisely the proteolytic cleavage sites at which the initial polyprotein is processed to create the poliovirus products P3-1b (NCVP1b), P3-2 (NCVP2), P3-4b (NCVP4b), and P3-7c (NCVP7c). For each of these products, as well as for the small genome-linked protein VPg, proteolytic cleavage occurs between a glutamine and a glycine residue to create the amino terminus of each protein. This result suggests that a single proteinase may be responsible for all of these cleavages. The sequence data also allow the precise positioning of the genome-linked protein VPg within the precursor P3-1b just proximal to the amino terminus of polypeptide P3-2.
Isolation and characterization of 5S rDNA sequences in catfishes genome (Heptapteridae and Pseudopimelodidae): perspectives for rDNA studies in fish by C0t method.

PubMed

Gouveia, Juceli Gonzalez; Wolf, Ivan Rodrigo; de Moraes-Manécolo, Vivian Patrícia Oliveira; Bardella, Vanessa Belline; Ferracin, Lara Munique; Giuliano-Caetano, Lucia; da Rosa, Renata; Dias, Ana Lúcia

2016-12-01

Sequences of 5S ribosomal RNA (rRNA) are extensively used in fish cytogenomic studies, since they have a flexible organization at the chromosomal level, showing inter- and intra-specific variation in number and position in karyotypes. Sequences from the genome of Imparfinis schubarti (Heptapteridae) were isolated in order to understand the organization of 5S rDNA families in the fish genome. 5S rDNA was isolated from the I. schubarti genome by reassociation kinetics (C0t) and by PCR amplification. The sequences obtained were cloned to construct a micro-library. The resulting clones were sequenced and hybridized to chromosomes of I. schubarti and Microglanis cottoides (Pseudopimelodidae) for chromosome mapping. The sequences were also aligned with those from other fish groups. Both methods yielded 5S rDNA that hybridized effectively to the I. schubarti genome. However, the C0t method recovered a complete 5S rRNA gene, which also hybridized successfully in M. cottoides, whereas PCR recovered this gene only partially. The hybridization results and sequence analyses showed that intact 5S regions are more suitable as probes, owing to their conserved structure and motifs. This study contributes to a better understanding of the organization of multigene families in catfish genomes.
Genome-based microbial ecology of anammox granules in a full-scale wastewater treatment system.

PubMed

Speth, Daan R; In 't Zandt, Michiel H; Guerrero-Cruz, Simon; Dutilh, Bas E; Jetten, Mike S M

2016-03-31

Partial-nitritation anammox (PNA) is a novel wastewater treatment procedure for energy-efficient ammonium removal. Here we use genome-resolved metagenomics to build a genome-based ecological model of the microbial community in a full-scale PNA reactor. Sludge from the bioreactor examined here is used to seed reactors in wastewater treatment plants around the world; however, the role of most of its microbial community in ammonium removal remains unknown. Our analysis yielded 23 near-complete draft genomes that together represent the majority of the microbial community. We assign these genomes to distinct anaerobic and aerobic microbial communities. In the aerobic community, nitrifying organisms and heterotrophs predominate. In the anaerobic community, widespread potential for partial denitrification suggests a nitrite loop increases treatment efficiency. Of our genomes, 19 have no previously cultivated or sequenced close relatives and six belong to bacterial phyla without any cultivated members, including the most complete Omnitrophica (formerly OP3) genome to date.
Genome-based microbial ecology of anammox granules in a full-scale wastewater treatment system

PubMed Central

Speth, Daan R.; in 't Zandt, Michiel H.; Guerrero-Cruz, Simon; Dutilh, Bas E.; Jetten, Mike S. M.

2016-01-01

Partial-nitritation anammox (PNA) is a novel wastewater treatment procedure for energy-efficient ammonium removal. Here we use genome-resolved metagenomics to build a genome-based ecological model of the microbial community in a full-scale PNA reactor. Sludge from the bioreactor examined here is used to seed reactors in wastewater treatment plants around the world; however, the role of most of its microbial community in ammonium removal remains unknown. Our analysis yielded 23 near-complete draft genomes that together represent the majority of the microbial community. We assign these genomes to distinct anaerobic and aerobic microbial communities. In the aerobic community, nitrifying organisms and heterotrophs predominate. In the anaerobic community, widespread potential for partial denitrification suggests a nitrite loop increases treatment efficiency. Of our genomes, 19 have no previously cultivated or sequenced close relatives and six belong to bacterial phyla without any cultivated members, including the most complete Omnitrophica (formerly OP3) genome to date. PMID:27029554
Phylogenetic analysis of the true water bugs (Insecta: Hemiptera: Heteroptera: Nepomorpha): evidence from mitochondrial genomes

PubMed Central

Hua, Jimeng; Li, Ming; Dong, Pengzhi; Cui, Ying; Xie, Qiang; Bu, Wenjun

2009-01-01

Background The true water bugs are grouped in the infraorder Nepomorpha (Insecta: Hemiptera: Heteroptera) and are of great economic importance. The phylogenetic relationships within Nepomorpha and the taxonomic hierarchies of Pleoidea and Aphelocheiroidea are uncertain. Most previous studies were based on morphological characters without algorithmic assessment. In the most recent study, the molecular markers employed in phylogenetic analyses were partial sequences of 16S rDNA and 18S rDNA with a total length of about 1 kb. Mitochondrial genomes provide one of the largest data sets that can be compared across animal taxa, yet no mitochondrial genome of the true water bugs had been sequenced. In this study we analyzed the unresolved problems in Nepomorpha using evidence from mitochondrial genomes. Results Nine mitochondrial genomes of Nepomorpha and five of other hemipterans were sequenced. These mitochondrial genomes contain the commonly found 37 genes without gene rearrangements. Based on the nucleotide sequences of the mt-genomes, Pleoidea is not a member of Nepomorpha, and Aphelocheiroidea should be grouped back into Naucoroidea. Phylogenetic relationships among the superfamilies of Nepomorpha were resolved robustly. Conclusion The mt-genome is an effective data source for resolving intraordinal phylogenetic problems at the superfamily level within Heteroptera. The mitochondrial genomes of the true water bugs are typical insect mt-genomes. Based on the nucleotide sequences of the mt-genomes, we propose that Pleoidea be treated as a separate heteropteran infraorder. The infraorder Nepomorpha consists of five superfamilies with the relationships (Corixoidea + ((Naucoroidea + Notonectoidea) + (Ochteroidea + Nepoidea))). PMID:19523246
The Early ANTP Gene Repertoire: Insights from the Placozoan Genome

PubMed Central

Schierwater, Bernd; Kamm, Kai; Srivastava, Mansi; Rokhsar, Daniel; Rosengarten, Rafael D.; Dellaporta, Stephen L.

2008-01-01

The evolution of ANTP genes in the Metazoa has been the subject of conflicting hypotheses derived from full or partial gene sequences and genomic organization in higher animals. Whole genome sequences have recently filled in some crucial gaps for the basal metazoan phyla Cnidaria and Porifera. Here we analyze the complete genome of Trichoplax adhaerens, representing the basal metazoan phylum Placozoa, for its set of ANTP class genes. The Trichoplax genome encodes representatives of Hox/ParaHox-like, NKL, and extended Hox genes. This repertoire possibly mirrors the condition of a hypothetical cnidarian-bilaterian ancestor. The evolution of the cnidarian and bilaterian ANTP gene repertoires can be explained by a limited number of cis-duplications of NKL and “extended Hox” genes and the presence of a single ancestral “ProtoHox” gene. PMID:18716659
Repetitive part of the banana (Musa acuminata) genome investigated by low-depth 454 sequencing.

PubMed

Hribová, Eva; Neumann, Pavel; Matsumoto, Takashi; Roux, Nicolas; Macas, Jirí; Dolezel, Jaroslav

2010-09-16

Bananas and plantains (Musa spp.) are grown in more than a hundred tropical and subtropical countries and provide staple food for hundreds of millions of people. They are seed-sterile crops propagated clonally, which makes them vulnerable to the rapid spread of devastating diseases and at the same time hampers the breeding of improved cultivars. Although the socio-economic importance of bananas and plantains cannot be overestimated, they remain outside the focus of major research programs. This slows down the study of the nuclear genome and the development of molecular tools to facilitate banana improvement. In this work, we report the first thorough characterization of the repeat component of the banana (M. acuminata cv. 'Calcutta 4') genome. Analysis of almost 100 Mb of sequence data (0.15× genome coverage) permitted partial sequence reconstruction and characterization of repetitive DNA, which makes up about 30% of the genome. The results showed that the banana repeats are predominantly various types of Ty1/copia and Ty3/gypsy retroelements, representing 16% and 7% of the genome, respectively. DNA transposons, on the other hand, were found to be rare. In addition to new families of transposable elements, two new satellite repeats were discovered and found useful as cytogenetic markers. To help in banana sequence annotation, a specific Musa repeat database was created, and its utility was demonstrated by analyzing the repeat composition of 62 genomic BAC clones. Low-depth 454 sequencing of the banana nuclear genome provided the largest amount of DNA sequence data available to date for Musa and permitted reconstruction of most of the major types of DNA repeats. The information obtained in this study improves knowledge of the long-range organization of banana chromosomes and provides sequence resources needed for repeat masking and annotation during the Musa genome sequencing project. It also provides sequence data for isolating DNA markers to be used in genetic diversity studies and in marker-assisted selection.
Repetitive part of the banana (Musa acuminata) genome investigated by low-depth 454 sequencing

PubMed Central

2010-01-01

Background Bananas and plantains (Musa spp.) are grown in more than a hundred tropical and subtropical countries and provide staple food for hundreds of millions of people. They are seed-sterile crops propagated clonally, which makes them vulnerable to the rapid spread of devastating diseases and at the same time hampers the breeding of improved cultivars. Although the socio-economic importance of bananas and plantains cannot be overestimated, they remain outside the focus of major research programs. This slows down the study of the nuclear genome and the development of molecular tools to facilitate banana improvement. Results In this work, we report the first thorough characterization of the repeat component of the banana (M. acuminata cv. 'Calcutta 4') genome. Analysis of almost 100 Mb of sequence data (0.15× genome coverage) permitted partial sequence reconstruction and characterization of repetitive DNA, which makes up about 30% of the genome. The results showed that the banana repeats are predominantly various types of Ty1/copia and Ty3/gypsy retroelements, representing 16% and 7% of the genome, respectively. DNA transposons, on the other hand, were found to be rare. In addition to new families of transposable elements, two new satellite repeats were discovered and found useful as cytogenetic markers. To help in banana sequence annotation, a specific Musa repeat database was created, and its utility was demonstrated by analyzing the repeat composition of 62 genomic BAC clones. Conclusion Low-depth 454 sequencing of the banana nuclear genome provided the largest amount of DNA sequence data available to date for Musa and permitted reconstruction of most of the major types of DNA repeats. The information obtained in this study improves knowledge of the long-range organization of banana chromosomes and provides sequence resources needed for repeat masking and annotation during the Musa genome sequencing project. It also provides sequence data for isolating DNA markers to be used in genetic diversity studies and in marker-assisted selection. PMID:20846365
Genome analysis and identification of gelatinase encoded gene in Enterobacter aerogenes

NASA Astrophysics Data System (ADS)

Shahimi, Safiyyah; Mutalib, Sahilah Abdul; Khalid, Rozida Abdul; Repin, Rul Aisyah Mat; Lamri, Mohd Fadly; Bakar, Mohd Faizal Abu; Isa, Mohd Noor Mat

2016-11-01

In this study, the genome sequence of E. aerogenes was analyzed bioinformatically to identify genes encoding gelatinase. Enterobacter aerogenes was isolated from hot spring water and is a gelatinase-producing bacterium that is species-specific to porcine and fish gelatin. This bacterium offers the possibility of producing enzymes specific to each of these two gelatin types. The E. aerogenes genome was partially sequenced, yielding 5.0 megabase pairs (Mbp) of sequence in total. The preprocessing pipeline reported 87.6 Mbp of total reads and 68.8 Mbp of high-quality reads, a high-quality percentage of 78.58%. Genome assembly produced 120 contigs, 67.5% of which were longer than 1 kilobase pair (kbp), with an N50 contig length of 124,856 bp and a GC content of 55.17%. Protein prediction identified about 4,705 protein-coding genes. The two candidate genes selected showed the highest percent identity to gelatinase enzymes in the Swiss-Prot and NCBI online databases: NODE_9_length_26866_cov_148.013245_12, a 1,029 bp sequence encoding 342 amino acids, and NODE_24_length_155103_cov_177.082458_62, a 717 bp sequence encoding 238 amino acids. Two primer pairs (forward and reverse) were then designed based on the open reading frames (ORFs) of the selected genes. This genome analysis of E. aerogenes thus identified genes encoding gelatinase.
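The assembly metrics quoted above (N50 contig length, share of contigs over 1 kbp, GC content) and the ORF-to-protein lengths follow from simple definitions. This is a minimal Python sketch, assuming contigs are supplied as plain sequence strings and that each reported ORF length includes its stop codon. That second point is an assumption, though it does reproduce the abstract's 1,029 bp to 342 aa and 717 bp to 238 aa. It is not the pipeline used in the study.

```python
def n50(lengths):
    """Smallest contig length L such that contigs of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")

def assembly_stats(contigs):
    """Contig count, N50, percentage of contigs > 1 kbp, and GC percentage."""
    lengths = [len(c) for c in contigs]
    gc = sum(c.upper().count("G") + c.upper().count("C") for c in contigs)
    return {
        "n_contigs": len(contigs),
        "n50": n50(lengths),
        "pct_over_1kbp": 100 * sum(n > 1000 for n in lengths) / len(lengths),
        "gc_pct": 100 * gc / sum(lengths),
    }

def protein_length(orf_bp):
    """Amino acids encoded by an ORF, assuming the length includes one stop codon."""
    return orf_bp // 3 - 1
```

For example, `protein_length(1029)` gives 342 and `protein_length(717)` gives 238, matching the two candidate genes.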
Hyb-Seq: Combining target enrichment and genome skimming for plant phylogenomics

PubMed Central

Weitemier, Kevin; Straub, Shannon C. K.; Cronn, Richard C.; Fishbein, Mark; Schmickl, Roswitha; McDonnell, Angela; Liston, Aaron

2014-01-01

• Premise of the study: Hyb-Seq, the combination of target enrichment and genome skimming, allows simultaneous data collection for low-copy nuclear genes and high-copy genomic targets for plant systematics and evolution studies. • Methods and Results: Genome and transcriptome assemblies for milkweed (Asclepias syriaca) were used to design enrichment probes for 3385 exons from 768 genes (>1.6 Mbp) followed by Illumina sequencing of enriched libraries. Hyb-Seq of 12 individuals (10 Asclepias species and two related genera) resulted in at least partial assembly of 92.6% of exons and 99.7% of genes and an average assembly length >2 Mbp. Importantly, complete plastomes and nuclear ribosomal DNA cistrons were assembled using off-target reads. Phylogenomic analyses demonstrated signal conflict between genomes. • Conclusions: The Hyb-Seq approach enables targeted sequencing of thousands of low-copy nuclear exons and flanking regions, as well as genome skimming of high-copy repeats and organellar genomes, to efficiently produce genome-scale data sets for phylogenomics. PMID:25225629
The complete mitochondrial genome of Hydra vulgaris (Hydroida: Hydridae).

PubMed

Pan, Hong-Chun; Fang, Hong-Yan; Li, Shi-Wei; Liu, Jun-Hong; Wang, Ying; Wang, An-Tai

2014-12-01

The complete mitochondrial genome of Hydra vulgaris (Hydroida: Hydridae) is composed of two linear DNA molecules. Mitochondrial DNA (mtDNA) molecule 1 is 8010 bp long and contains six protein-coding genes, the large subunit rRNA, methionine and tryptophan tRNAs, two pseudogenes each consisting of a partial copy of COI, and terminal sequences at the two ends of the linear mtDNA. mtDNA molecule 2 is 7576 bp long and contains seven protein-coding genes, the small subunit rRNA, a methionine tRNA, a pseudogene consisting of a partial copy of COI, and terminal sequences at the two ends of the linear mtDNA. The COI gene uses GTG as its start codon, whereas the other 12 protein-coding genes start with the typical ATG initiation codon. All protein-coding genes terminate with the TAA stop codon.

Intervening sequences in a plant gene-comparison of the partial sequence of cDNA and genomic DNA of French bean phaseolin

NASA Astrophysics Data System (ADS)

Sun, S. M.; Slightom, J. L.; Hall, T. C.

1981-01-01

A plant gene coding for the major storage protein (phaseolin, G1-globulin) of the French bean was isolated from a genomic library constructed in the phage vector Charon 24A. Comparison of the nucleotide sequence of part of the gene with that of the cloned messenger RNA (cDNA) revealed the presence of three intervening sequences, all beginning with GT and ending with AG. The 5' and 3' boundaries of intervening sequences IVS-A (88 base pairs) and IVS-B (124 base pairs) are similar to those described for animal and viral genes, but the 3' boundary of IVS-C (129 base pairs) shows some differences. A sequence of 185 amino acids deduced from the cloned DNAs represents about 40% of a phaseolin polypeptide.
Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies.

PubMed

Zeng, Lu; Kortschak, R Daniel; Raison, Joy M; Bertozzi, Terry; Adelson, David L

2018-01-01

Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing a Comprehensive ab initio Repeat Pipeline (CARP) to identify and cluster TEs and other repetitive sequences in genome assemblies. The pipeline begins with a pairwise alignment using krishna, a custom aligner. Single linkage clustering is then carried out to produce families of repetitive elements. Consensus sequences are then filtered for protein coding genes and then annotated using Repbase and a custom library of retrovirus and reverse transcriptase sequences. This process yields three types of family: fully annotated, partially annotated and unannotated. Fully annotated families reflect recently diverged/young known TEs present in Repbase. The remaining two types of families contain a mixture of novel TEs and segmental duplications. These can be resolved by aligning these consensus sequences back to the genome to assess copy number vs. length distribution. Our pipeline has three significant advantages compared to other methods for ab initio repeat identification: 1) we generate not only consensus sequences, but keep the genomic intervals for the original aligned sequences, allowing straightforward analysis of evolutionary dynamics, 2) consensus sequences represent low-divergence, recently/currently active TE families, 3) segmental duplications are annotated as a useful by-product. We have compared our ab initio repeat annotations for 7 genome assemblies to other methods and demonstrate that CARP compares favourably with RepeatModeler, the most widely used repeat annotation package.
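Single-linkage clustering, as used in CARP's clustering step, places two sequences in the same family whenever any chain of pairwise hits connects them. This is a minimal union-find sketch, assuming the pairwise alignments (produced by krishna in CARP) have already been reduced to a list of (id, id) hit pairs. The identifiers and hits are hypothetical, and this is not CARP's own code.

```python
def single_linkage(ids, hits):
    """Group ids into clusters; any chain of pairwise hits links two ids into one cluster."""
    parent = {i: i for i in ids}

    def find(x):
        # Walk to the root, halving the path as we go to keep trees shallow
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    # Sorted output makes the result independent of merge order
    return sorted(sorted(c) for c in clusters.values())
```

Single linkage is permissive by design: one spurious hit can merge two otherwise distinct families, which is one reason the resulting consensus sequences still need filtering and annotation afterwards.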
Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies

PubMed Central

Zeng, Lu; Kortschak, R. Daniel; Raison, Joy M.

2018-01-01

Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing a Comprehensive ab initio Repeat Pipeline (CARP) to identify and cluster TEs and other repetitive sequences in genome assemblies. The pipeline begins with a pairwise alignment using krishna, a custom aligner. Single linkage clustering is then carried out to produce families of repetitive elements. Consensus sequences are then filtered for protein coding genes and then annotated using Repbase and a custom library of retrovirus and reverse transcriptase sequences. This process yields three types of family: fully annotated, partially annotated and unannotated. Fully annotated families reflect recently diverged/young known TEs present in Repbase. The remaining two types of families contain a mixture of novel TEs and segmental duplications. These can be resolved by aligning these consensus sequences back to the genome to assess copy number vs. length distribution. Our pipeline has three significant advantages compared to other methods for ab initio repeat identification: 1) we generate not only consensus sequences, but keep the genomic intervals for the original aligned sequences, allowing straightforward analysis of evolutionary dynamics, 2) consensus sequences represent low-divergence, recently/currently active TE families, 3) segmental duplications are annotated as a useful by-product. We have compared our ab initio repeat annotations for 7 genome assemblies to other methods and demonstrate that CARP compares favourably with RepeatModeler, the most widely used repeat annotation package. PMID:29538441
Diversity Analysis of Dairy and Nondairy Lactococcus lactis Isolates, Using a Novel Multilocus Sequence Analysis Scheme and (GTG)5-PCR Fingerprinting

PubMed Central

Rademaker, Jan L. W.; Herbet, Hélène; Starrenburg, Marjo J. C.; Naser, Sabri M.; Gevers, Dirk; Kelly, William J.; Hugenholtz, Jeroen; Swings, Jean; van Hylckama Vlieg, Johan E. T.

2007-01-01

The diversity of a collection of 102 lactococcus isolates including 91 Lactococcus lactis isolates of dairy and nondairy origin was explored using partial small subunit rRNA gene sequence analysis and limited phenotypic analyses. A subset of 89 strains of L. lactis subsp. cremoris and L. lactis subsp. lactis isolates was further analyzed by (GTG)5-PCR fingerprinting and a novel multilocus sequence analysis (MLSA) scheme. Two major genomic lineages within L. lactis were found. The L. lactis subsp. cremoris type-strain-like genotype lineage included both L. lactis subsp. cremoris and L. lactis subsp. lactis isolates. The other major lineage, with a L. lactis subsp. lactis type-strain-like genotype, comprised L. lactis subsp. lactis isolates only. A novel third genomic lineage represented two L. lactis subsp. lactis isolates of nondairy origin. The genomic lineages deviate from the subspecific classification of L. lactis that is based on a few phenotypic traits only. MLSA of six partial genes (atpA, encoding ATP synthase alpha subunit; pheS, encoding phenylalanine tRNA synthetase; rpoA, encoding RNA polymerase alpha chain; bcaT, encoding branched chain amino acid aminotransferase; pepN, encoding aminopeptidase N; and pepX, encoding X-prolyl dipeptidyl peptidase) revealed 363 polymorphic sites (total length, 1,970 bases) among 89 L. lactis subsp. cremoris and L. lactis subsp. lactis isolates with unique sequence types for most isolates. This allowed high-resolution cluster analysis in which dairy isolates form subclusters of limited diversity within the genomic lineages. The pheS DNA sequence analysis yielded two genetic groups dissimilar to the other genotyping analysis-based lineages, indicating a disparate acquisition route for this gene. PMID:17890345
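Counting polymorphic sites across aligned loci and assigning sequence types are the core bookkeeping steps of an MLSA scheme. This minimal sketch assumes each locus is already aligned to equal length and treats '-' as a gap that does not count toward polymorphism. It also numbers alleles and sequence types in order of first appearance. These are illustrative choices, not necessarily those made in the study, and the function names are hypothetical.

```python
def polymorphic_sites(aligned):
    """Count alignment columns with more than one distinct nucleotide (gaps '-' ignored)."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    return sum(1 for col in zip(*aligned) if len({c for c in col if c != "-"}) > 1)

def sequence_types(isolates, loci):
    """Map {isolate: {locus: aligned_seq}} to {isolate: ST number}.

    Each distinct sequence at a locus gets an allele number; each distinct
    allele profile (tuple of allele numbers across loci) gets an ST number.
    """
    allele_ids = {locus: {} for locus in loci}
    profiles = {}
    for name, seqs in isolates.items():
        profiles[name] = tuple(
            allele_ids[locus].setdefault(seqs[locus], len(allele_ids[locus]) + 1)
            for locus in loci
        )
    st_ids = {}
    return {name: st_ids.setdefault(p, len(st_ids) + 1) for name, p in profiles.items()}
```

Concatenating the six partial gene alignments before calling `polymorphic_sites` would give a whole-scheme count comparable in kind, though not necessarily in value, to the 363 sites reported.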
Diversity analysis of dairy and nondairy Lactococcus lactis isolates, using a novel multilocus sequence analysis scheme and (GTG)5-PCR fingerprinting.

PubMed

Rademaker, Jan L W; Herbet, Hélène; Starrenburg, Marjo J C; Naser, Sabri M; Gevers, Dirk; Kelly, William J; Hugenholtz, Jeroen; Swings, Jean; van Hylckama Vlieg, Johan E T

2007-11-01

The diversity of a collection of 102 lactococcus isolates including 91 Lactococcus lactis isolates of dairy and nondairy origin was explored using partial small subunit rRNA gene sequence analysis and limited phenotypic analyses. A subset of 89 strains of L. lactis subsp. cremoris and L. lactis subsp. lactis isolates was further analyzed by (GTG)5-PCR fingerprinting and a novel multilocus sequence analysis (MLSA) scheme. Two major genomic lineages within L. lactis were found. The L. lactis subsp. cremoris type-strain-like genotype lineage included both L. lactis subsp. cremoris and L. lactis subsp. lactis isolates. The other major lineage, with a L. lactis subsp. lactis type-strain-like genotype, comprised L. lactis subsp. lactis isolates only. A novel third genomic lineage represented two L. lactis subsp. lactis isolates of nondairy origin. The genomic lineages deviate from the subspecific classification of L. lactis that is based on a few phenotypic traits only. MLSA of six partial genes (atpA, encoding ATP synthase alpha subunit; pheS, encoding phenylalanine tRNA synthetase; rpoA, encoding RNA polymerase alpha chain; bcaT, encoding branched chain amino acid aminotransferase; pepN, encoding aminopeptidase N; and pepX, encoding X-prolyl dipeptidyl peptidase) revealed 363 polymorphic sites (total length, 1,970 bases) among 89 L. lactis subsp. cremoris and L. lactis subsp. lactis isolates with unique sequence types for most isolates. This allowed high-resolution cluster analysis in which dairy isolates form subclusters of limited diversity within the genomic lineages. The pheS DNA sequence analysis yielded two genetic groups dissimilar to the other genotyping analysis-based lineages, indicating a disparate acquisition route for this gene.
Xenopus laevis ribosomal protein genes: isolation of recombinant cDNA clones and study of the genomic organization.

PubMed Central

Bozzoni, I; Beccari, E; Luo, Z X; Amaldi, F

1981-01-01

Poly(A)+ mRNA from Xenopus laevis oocytes, partially enriched for r-protein coding capacity, has been used as starting material for preparing a cDNA bank in plasmid pBR322. The clones containing sequences specific for r-proteins have been selected by translation of the complementary mRNAs. Clones for six different r-proteins have been identified and used as probes for studying their genomic organization. Two gene copies per haploid genome were found for r-proteins L1, L14 and S19, and four to five copies for r-proteins S1, S8 and L32. Moreover, a population polymorphism has been observed for the genomic regions containing sequences for r-proteins S1, S8 and L14. PMID:6112733
Complete genomic sequences for hepatitis C virus subtypes 4b, 4c, 4d, 4g, 4k, 4l, 4m, 4n, 4o, 4p, 4q, 4r and 4t.

PubMed

Li, Chunhua; Lu, Ling; Wu, Xianghong; Wang, Chuanxi; Bennett, Phil; Lu, Teng; Murphy, Donald

2009-08-01

In this study, we characterized the full-length genomic sequences of 13 distinct hepatitis C virus (HCV) genotype 4 isolates/subtypes: QC264/4b, QC381/4c, QC382/4d, QC193/4g, QC383/4k, QC274/4l, QC249/4m, QC97/4n, QC93/4o, QC139/4p, QC262/4q, QC384/4r and QC155/4t. These were amplified, using RT-PCR, from the sera of patients now residing in Canada, 11 of whom were African immigrants. The resulting genomes varied between 9421 and 9475 nt in length, and each contained a single ORF of 9018-9069 nt. The sequences showed nucleotide similarities of 77.3-84.3 % in comparison with subtypes 4a (GenBank accession no. Y11604) and 4f (EF589160) and 70.6-72.8 % in comparison with genotype 1 (M62321/1a, M58335/1b, D14853/1c, and 1?/AJ851228) reference sequences. These similarities were often higher than those currently defined by HCV classification criteria for subtype (75.0-80.0 %) and genotype (67.0-70.0 %) division, respectively. Further analyses of the complete and partial E1 and partial NS5B sequences confirmed these 13 'provisionally assigned subtypes'.
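Nucleotide similarity values such as those quoted above are usually computed from a pairwise alignment. This minimal sketch of percent identity assumes the two sequences are already aligned and excludes any column with a gap from the comparison. Other gap conventions exist, so values can differ slightly between tools. It is not the study's own procedure.

```python
def percent_identity(a, b):
    """Percent identical bases over aligned, gap-free columns of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared
```

A result such as 77.3% could then be compared against the subtype (75.0-80.0 %) and genotype (67.0-70.0 %) ranges cited in the abstract. As the abstract notes, those thresholds are not always sufficient on their own.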
Life in the dark: metagenomic evidence that a microbial slime community is driven by inorganic nitrogen metabolism.

PubMed

Tetu, Sasha G; Breakwell, Katy; Elbourne, Liam D H; Holmes, Andrew J; Gillings, Michael R; Paulsen, Ian T

2013-06-01

Beneath Australia's large, dry Nullarbor Plain lies an extensive underwater cave system, where dense microbial communities known as 'slime curtains' are found. These communities exist in isolation from photosynthetically derived carbon and are presumed to be chemoautotrophic. Earlier work found high levels of nitrite and nitrate in the cave waters and a high relative abundance of Nitrospirae in bacterial 16S rRNA clone libraries. This suggested that these communities may be supported by nitrite oxidation; however, details of the inorganic nitrogen cycling in these communities remained unclear. Here we report analysis of 16S rRNA amplicon and metagenomic sequence data from the Weebubbie cave slime curtain community. The microbial community comprises a diverse assortment of bacterial and archaeal genera, including an abundant population of Thaumarchaeota. Sufficient thaumarchaeotal sequence was recovered to assemble a partial genome sequence, which showed considerable synteny with the corresponding regions in the genome of the autotrophic ammonia oxidiser Nitrosopumilus maritimus SCM1. This partial genome sequence contained regions with high sequence identity to the ammonia mono-oxygenase operon and the carbon-fixing 3-hydroxypropionate/4-hydroxybutyrate cycle genes of N. maritimus SCM1. Additionally, the community as a whole included genes encoding key enzymes for inorganic nitrogen transformations, including nitrification and denitrification. We propose that the Weebubbie slime curtain community represents a distinctive microbial ecosystem in which primary productivity is due to the combined activity of archaeal ammonia oxidisers and bacterial nitrite oxidisers.
Aligning the unalignable: bacteriophage whole genome alignments.

PubMed

Bérard, Sèverine; Chateau, Annie; Pompidor, Nicolas; Guertin, Paul; Bergeron, Anne; Swenson, Krister M

2016-01-13

In recent years, many studies have focused on the description and comparison of large sets of related bacteriophage genomes. Due to the peculiar mosaic structure of these genomes, few informative approaches for comparing whole genomes exist: dot plot diagrams give a mostly qualitative assessment of the similarity/dissimilarity between two or more genomes, and clustering techniques are used to classify genomes. Multiple alignments are conspicuously absent from this scene. Indeed, whole genome aligners interpret lack of similarity between sequences as an indication of rearrangements, insertions, or losses. This behavior makes them ill-prepared to align bacteriophage genomes, where even closely related strains can accomplish the same biological function with highly dissimilar sequences. In this paper, we propose a multiple alignment strategy that exploits functional collinearity shared by related strains of bacteriophages, and uses partial orders to capture mosaicism of sets of genomes. As classical alignments do, the computed alignments can be used to predict that genes have the same biological function, even in the absence of detectable similarity. The Alpha aligner implements these ideas in visual interactive displays, and is used to compute several examples of alignments of Staphylococcus aureus and Mycobacterium bacteriophages, involving up to 29 genomes. Using these datasets, we prove that Alpha alignments are at least as good as those computed by standard aligners. Comparison with the progressive Mauve aligner, which implements a partial order strategy but whose alignments are linearized, shows a greatly improved interactive graphic display, while avoiding misalignments. Multiple alignments of whole bacteriophage genomes work, and will become an important conceptual and visual tool in comparative genomics of sets of related strains. 
A python implementation of Alpha, along with installation instructions for Ubuntu and OSX, is available on bitbucket (https://bitbucket.org/thekswenson/alpha).
Analysis of sequences from field samples reveals the presence of the recently described pepper vein yellows virus (genus Polerovirus) in six additional countries.

PubMed

Knierim, Dennis; Tsai, Wen-Shi; Kenyon, Lawrence

2013-06-01

Polerovirus infection was detected by reverse transcription polymerase chain reaction (RT-PCR) in samples from 29 pepper plants (Capsicum spp.) and one black nightshade plant (Solanum nigrum) collected from fields in India, Indonesia, Mali, the Philippines, Thailand and Taiwan. At least two representative samples for each country were selected to generate a general polerovirus RT-PCR product of 1.4 kb for sequencing. Sequence analysis of the partial genome sequences revealed the presence of pepper vein yellows virus (PeVYV) in all 13 samples. A 1990 Australian herbarium sample of pepper described by serological means as infected with capsicum yellows virus (CYV) was identified by sequence analysis of a partial CP sequence as probably infected with a potato leafroll virus (PLRV) isolate.
Repetitive DNA in the pea (Pisum sativum L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula

PubMed Central

Macas, Jiří; Neumann, Pavel; Navrátilová, Alice

2007-01-01

Background Extraordinary size variation of higher plant nuclear genomes is in large part caused by differences in accumulation of repetitive DNA. This makes repetitive DNA of great interest for studying the molecular mechanisms shaping architecture and function of complex plant genomes. However, due to methodological constraints of conventional cloning and sequencing, a global description of repeat composition is available for only a very limited number of higher plants. In order to provide further data required for investigating evolutionary patterns of repeated DNA within and between species, we used a novel approach based on massively parallel sequencing which allowed a comprehensive repeat characterization in our model species, garden pea (Pisum sativum). Results Analysis of 33.3 Mb of sequence data resulted in quantification and partial sequence reconstruction of major repeat families occurring in the pea genome with at least thousands of copies. Our results showed that the pea genome is dominated by LTR-retrotransposons, estimated at 140,000 copies/1C. Ty3/gypsy elements are less diverse and accumulated to higher copy numbers than Ty1/copia. This is in part due to a large population of Ogre-like retrotransposons which alone make up over 20% of the genome. In addition to numerous types of mobile elements, we have discovered a set of novel satellite repeats and two additional variants of telomeric sequences. Comparative genome analysis revealed that there are only a few repeat sequences conserved between pea and soybean genomes. On the other hand, all major families of pea mobile elements are well represented in M. truncatula. Conclusion We have demonstrated that even in a species with a relatively large genome like pea, where a single 454-sequencing run provided only 0.77% coverage, the generated sequences were sufficient to reconstruct and analyze major repeat families corresponding to a total of 35–48% of the genome. 
These data provide a starting point for further investigations of legume plant genomes based on their global comparative analysis and for the development of more sophisticated approaches for data mining. PMID:18031571
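The coverage figure can be checked with simple arithmetic. Assuming coverage means total sequenced bases divided by genome size, 33.3 Mb at 0.77% coverage implies a genome of roughly 4.3 Gb. The second helper sketches how a repeat family's share of the reads could be turned into a copy-number estimate if reads sample the genome uniformly; that assumption, the function names and any inputs are illustrative, not the study's exact procedure.

```python
def implied_genome_size_mb(sequenced_mb: float, coverage: float) -> float:
    """coverage = sequenced / genome_size, so genome_size = sequenced / coverage."""
    return sequenced_mb / coverage


def estimated_copies(family_mb: float, sequenced_mb: float,
                     genome_size_mb: float, element_kb: float) -> float:
    """Copies per haploid genome, assuming uniform random sampling of reads."""
    genome_fraction = family_mb / sequenced_mb  # family's share of all reads
    family_kb_in_genome = genome_fraction * genome_size_mb * 1000
    return family_kb_in_genome / element_kb


size_mb = implied_genome_size_mb(33.3, 0.0077)
print(round(size_mb))  # 4325 (Mb), i.e. about 4.3 Gb
```

The uniform-sampling assumption is what lets a read proportion stand in for a genome proportion; any sequencing bias against particular repeats would distort the copy-number estimate accordingly.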
Agaricus bisporus genome sequence: a commentary.

PubMed

Kerrigan, Richard W; Challen, Michael P; Burton, Kerry S

2013-06-01

The genomes of two isolates of Agaricus bisporus have been sequenced recently. This soil-inhabiting fungus has a wide geographical distribution in nature and it is also cultivated in an industrialized indoor process ($4.7bn annual worldwide value) to produce edible mushrooms. Previously this lignocellulosic fungus had resisted precise econutritional classification, i.e. into white- or brown-rot decomposers. The generation of the genome sequence and transcriptomic analyses have revealed a new classification, 'humicolous', for species adapted to grow in humic-rich, partially decomposed leaf material. The Agaricus bisporus genomes contain a collection of polysaccharide- and lignin-degrading genes and, more interestingly, an expanded number of genes (relative to other lignocellulosic fungi) that enhance degradation of lignin derivatives, i.e. heme-thiolate peroxidases and β-etherases. A motif that is hypothesized to be a promoter element in the humicolous adaptation suite is present in a large number of genes specifically up-regulated when the mycelium is grown on humic-rich substrate. The genome sequence of A. bisporus offers a platform to explore fungal biology in carbon-rich soil environments and terrestrial cycling of carbon, nitrogen, phosphorus and potassium. Copyright © 2013 Elsevier Inc. All rights reserved.
The Fecal Viral Flora of Wild Rodents

PubMed Central

Phan, Tung G.; Kapusinszky, Beatrix; Wang, Chunlin; Rose, Robert K.; Lipton, Howard L.; Delwart, Eric L.

2011-01-01

The frequent interactions of rodents with humans make them a common source of zoonotic infections. To obtain an initial unbiased measure of the viral diversity in the enteric tract of wild rodents, we sequenced partially purified, randomly amplified viral RNA and DNA in the feces of 105 wild rodents (mouse, vole, and rat) collected in California and Virginia. We identified, in decreasing frequency, sequences related to the mammalian virus families Circoviridae, Picobirnaviridae, Picornaviridae, Astroviridae, Parvoviridae, Papillomaviridae, Adenoviridae, and Coronaviridae. Seventeen small circular DNA genomes containing one or two replicase genes distantly related to the Circoviridae, representing several potentially new viral families, were characterized. In the Picornaviridae family, two new candidate genera as well as a close genetic relative of the human pathogen Aichi virus were characterized. Fragments of the first mouse sapelovirus and picobirnaviruses were identified and the first murine astrovirus genome was characterized. A mouse papillomavirus genome and fragments of a novel adenovirus and adenovirus-associated virus were also sequenced. The next largest fraction of the rodent fecal virome was related to insect viruses of the Densoviridae, Iridoviridae, Polydnaviridae, Dicistroviridae, Bromoviridae, and Virgaviridae families, followed by plant virus-related sequences in the Nanoviridae, Geminiviridae, Phycodnaviridae, Secoviridae, Partitiviridae, Tymoviridae, Alphaflexiviridae, and Tombusviridae families, reflecting the largely insect- and plant-based rodent diet. Phylogenetic analyses of full and partial viral genomes therefore revealed many previously unreported viral species, genera, and families. The close genetic similarities noted between some rodent and human viruses might reflect past zoonoses. This study increases our understanding of the viral diversity in wild rodents and highlights the large number of still uncharacterized viruses in mammals. PMID:21909269
One Bacterial Cell, One Complete Genome

DOE Office of Scientific and Technical Information (OSTI.GOV)

Woyke, Tanja; Tighe, Damon; Mavrommatis, Konstantinos

2010-04-26

While the bulk of the finished microbial genomes sequenced to date are derived from cultured bacterial and archaeal representatives, the vast majority of microorganisms elude current culturing attempts, severely limiting the ability to recover complete or even partial genomes from these environmental species. Single cell genomics is a novel culture-independent approach, which enables access to the genetic material of an individual cell. No single cell genome has to our knowledge been closed and finished to date. Here we report the completed genome from an uncultured single cell of Candidatus Sulcia muelleri DMIN. Digital PCR on single symbiont cells isolated from the bacteriome of the green sharpshooter Draeculacephala minerva allowed us to determine that this bacterium is polyploid, with genome copies ranging from approximately 200 to 900 per cell, making it a most suitable target for single cell finishing efforts. For single cell shotgun sequencing, an individual Sulcia cell was isolated and whole genome amplified by multiple displacement amplification (MDA). Sanger-based finishing methods allowed us to close the genome. To verify the correctness of our single cell genome and exclude MDA-derived artifacts, we independently shotgun sequenced and assembled the Sulcia genome from pooled bacteriomes using a metagenomic approach, yielding a nearly identical genome. The four variations we detected appear to be genuine biological differences between the two samples. Comparison of the single cell genome with bacteriome metagenomic sequence data detected two single nucleotide polymorphisms (SNPs), indicating extremely low genetic diversity within a Sulcia population. This study demonstrates the power of single cell genomics to generate a complete, high quality, non-composite reference genome within an environmental sample, which can be used for population genetic analyses.
Mapping copy number variation by population-scale genome sequencing.

PubMed

Mills, Ryan E; Walter, Klaudia; Stewart, Chip; Handsaker, Robert E; Chen, Ken; Alkan, Can; Abyzov, Alexej; Yoon, Seungtai Chris; Ye, Kai; Cheetham, R Keira; Chinwalla, Asif; Conrad, Donald F; Fu, Yutao; Grubert, Fabian; Hajirasouliha, Iman; Hormozdiari, Fereydoun; Iakoucheva, Lilia M; Iqbal, Zamin; Kang, Shuli; Kidd, Jeffrey M; Konkel, Miriam K; Korn, Joshua; Khurana, Ekta; Kural, Deniz; Lam, Hugo Y K; Leng, Jing; Li, Ruiqiang; Li, Yingrui; Lin, Chang-Yun; Luo, Ruibang; Mu, Xinmeng Jasmine; Nemesh, James; Peckham, Heather E; Rausch, Tobias; Scally, Aylwyn; Shi, Xinghua; Stromberg, Michael P; Stütz, Adrian M; Urban, Alexander Eckehart; Walker, Jerilyn A; Wu, Jiantao; Zhang, Yujun; Zhang, Zhengdong D; Batzer, Mark A; Ding, Li; Marth, Gabor T; McVean, Gil; Sebat, Jonathan; Snyder, Michael; Wang, Jun; Ye, Kenny; Eichler, Evan E; Gerstein, Mark B; Hurles, Matthew E; Lee, Charles; McCarroll, Steven A; Korbel, Jan O

2011-02-03

Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high-frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.
Nucleotide sequences of Japanese isolates of citrus vein enation virus.

PubMed

Nakazono-Nagaoka, Eiko; Fujikawa, Takashi; Iwanami, Toru

2017-03-01

The genomic sequences of five Japanese isolates of citrus vein enation virus (CVEV) that induce vein enation were determined and compared with that of the Spanish isolate VE-1. The nucleotide sequences of all Japanese isolates were 5,983 nt in length. The genomic RNA of the Japanese isolates had five potential open reading frames (ORF 0, ORF 1, ORF 2, ORF 3, and ORF 5) in the positive-sense strand. The nucleotide sequence identity among the Japanese isolates and the Spanish isolate VE-1 ranged from 98.0% to 99.8%. Comparison of the partial amino acid sequences of ten Japanese isolates and three Spanish isolates suggested that four amino acid residues, at positions 83, 104, and 113 in ORF 2 and position 41 in ORF 5, might be unique to some Japanese isolates.
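A comparison like the one described, finding residues seen in some isolates of one group but never in the other, can be sketched as a column scan over an amino acid alignment. This is an illustrative sketch, not the authors' procedure: it assumes pre-aligned sequences of equal length, reports alignment columns (which match residue positions only when the reference row has no gaps), and uses made-up toy sequences.

```python
def group_specific_positions(group_a: list[str], group_b: list[str]) -> list[tuple[int, str]]:
    """1-based alignment columns where a residue seen in group_a never occurs in group_b."""
    length = len(group_a[0])
    if any(len(s) != length for s in [*group_a, *group_b]):
        raise ValueError("all sequences must be aligned to the same length")
    hits = []
    for i in range(length):
        residues_b = {s[i] for s in group_b}
        # Residues present in at least one group_a sequence but absent from all of group_b.
        specific = sorted({s[i] for s in group_a} - residues_b - {"-"})
        if specific:
            hits.append((i + 1, "".join(specific)))
    return hits


# Hypothetical toy alignment: "Japanese-like" vs "Spanish-like" sequences.
print(group_specific_positions(["MKTAY", "MKSAY"], ["MKTAF", "MKTAF"]))  # [(3, 'S'), (5, 'Y')]
```

Because a residue counts as group-specific if any member of the first group carries it, the scan matches the abstract's wording "unique to some Japanese isolates" rather than requiring the residue in every isolate.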
Full-length genome sequences of five hepatitis C virus isolates representing subtypes 3g, 3h, 3i and 3k, and a unique genotype 3 variant.

PubMed

Lu, Ling; Li, Chunhua; Yuan, Jie; Lu, Teng; Okamoto, Hiroaki; Murphy, Donald G

2013-03-01

We characterized the full-length genomes of five distinct hepatitis C virus (HCV)-3 isolates. These represent the first complete genomes for subtypes 3g and 3h, the second such genomes for 3k and 3i, and the genome of one novel variant presently not assigned to a subtype. Each genome was determined from 18-25 overlapping fragments. They had lengths of 9579-9660 nt and each contained a single ORF encoding 3020-3025 aa. They were isolated from five patients residing in Canada; four were of Asian origin and one was of Somali origin. Phylogenetic analysis using 64 partial NS5B sequences differentiated 10 assigned subtypes, 3a-3i and 3k, and two additional lineages within genotype 3. From the data of this study, HCV-3 full-length sequences are now available for six of the assigned subtypes and one unassigned variant. Our findings should add insights to HCV evolutionary studies and clinical applications.
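Statements such as "a single ORF encoding 3020-3025 aa" rest on locating the open reading frame in each genome. A minimal forward-strand sketch follows, assuming an ORF runs from ATG to the first in-frame stop codon (TAA, TAG, TGA) and that the reported nucleotide length includes the stop codon; the helper names are hypothetical and this is not the pipeline used in the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> tuple[int, int]:
    """(start, end), 0-based half-open, of the longest ATG..stop ORF on the forward strand."""
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon in the ORF
                if end - start > best[1] - best[0]:
                    best = (start, end)
                start = None
    return best


def encoded_length_aa(orf_nt: int) -> int:
    """Amino acids encoded by an ORF whose length includes the stop codon."""
    return orf_nt // 3 - 1


start, end = longest_orf("CCATGAAATTTTAGGG")
print((start, end), encoded_length_aa(end - start))  # (2, 14) 3
```

Under this convention a 3020-aa polyprotein corresponds to a 9063-nt ORF; if a study reports ORF length without the stop codon, the conversion is simply length // 3.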
Complete nucleotide sequence of the Cryptomeria japonica D. Don. chloroplast genome and comparative chloroplast genomics: diversified genomic structure of coniferous species.

PubMed

Hirao, Tomonori; Watanabe, Atsushi; Kurita, Manabu; Kondo, Teiji; Takata, Katsuhiko

2008-06-23

The recent determination of complete chloroplast (cp) genomic sequences of various plant species has enabled numerous comparative analyses as well as advances in plant and genome evolutionary studies. In angiosperms, the complete cp genome sequences of about 70 species have been determined, whereas those of only three gymnosperm species, Cycas taitungensis, Pinus thunbergii, and Pinus koraiensis, have been established. The lack of information regarding the gene content and genomic structure of gymnosperm cp genomes may severely hamper further progress of plant and cp genome evolutionary studies. To address this need, we report here the complete nucleotide sequence of the cp genome of Cryptomeria japonica, the first in the Cupressaceae sensu lato of gymnosperms, and provide a comparative analysis of their gene content and genomic structure that illustrates the unique genomic features of gymnosperms. The C. japonica cp genome is 131,810 bp in length, with 112 single copy genes and two duplicated (trnI-CAU, trnQ-UUG) genes that give a total of 116 genes. Compared to other land plant cp genomes, the C. japonica cp has lost one of the relevant large inverted repeats (IRs) found in angiosperms, ferns, liverworts, and gymnosperms such as Cycas and Ginkgo, and additionally has completely lost its trnR-CCG, partially lost its trnT-GGU, and shows diversification of accD. The genomic structure of the C. japonica cp genome also differs significantly from those of other plant species. For example, we estimate that a minimum of 15 inversions would be required to transform the gene organization of the Pinus thunbergii cp genome into that of C. japonica. In the C. japonica cp genome, direct repeat and inverted repeat sequences are observed at the inversion and translocation endpoints, and these sequences may be associated with the genomic rearrangements. The observed differences in genomic structure between C. japonica and other land plants, including pines, strongly support the theory that the large IRs stabilize the cp genome. Furthermore, the deleted large IR and the numerous genomic rearrangements that have occurred in the C. japonica cp genome provide new insights into both the evolutionary lineage of coniferous species in gymnosperms and the evolution of the cp genome.
Partial mapping and sequencing of a fish iridovirus genome reveals genes homologous to the frog virus 3 p31, p40 and human eIF2alpha.

PubMed

Yu, Y X; Béarzotti, M; Vende, P; Ahne, W; Brémont, M

1999-09-01

Iridovirus-like pathogens have been recognized in recent years as a cause of serious systemic diseases among feral, cultured and ornamental fish. Fish mortalities of 30-100% due to systemic iridovirus infection have been observed in Europe, Australia, Japan and Thailand. Up to now, the molecular biology of these important pathogens has been poorly documented. To gain better insight into the genomic organization of these piscine iridoviruses, we constructed a cosmid viral DNA library from epizootic hematopoietic necrosis virus (EHNV). Two recombinant cosmids (Cos7 and Cos12) were selected for systematic sequencing. Cos7 and Cos12 lie side by side along the genome and cover two-thirds of the total EHNV genome, which has been estimated to be approximately 101.47 kb in length. Thirty-five kilobase pairs (kbp) from Cos7 and 10 kbp from Cos12 have been sequenced. Sequence analysis revealed open reading frames (ORFs) sharing homologies with Frog virus 3 sequences, such as those encoding the p31 and p40 proteins. Among the other identified ORFs, some showed homologies with known protein sequences, such as the human eIF2alpha protein, and some did not show any significant homologies with sequences available in the databases. However, none were related to Lymphocystis virus, a member of the Iridoviridae family for which the full genome nucleotide sequence has been determined.
Chromosomal localization and partial genomic structure of the human peroxisome proliferator activated receptor-gamma (hPPAR gamma) gene.

PubMed

Beamer, B A; Negri, C; Yen, C J; Gavrilova, O; Rumberger, J M; Durcan, M J; Yarnall, D P; Hawkins, A L; Griffin, C A; Burns, D K; Roth, J; Reitman, M; Shuldiner, A R

1997-04-28

We determined the chromosomal localization and partial genomic structure of the coding region of the human PPAR gamma gene (hPPAR gamma), a nuclear receptor important for adipocyte differentiation and function. Sequence analysis and long PCR of human genomic DNA with primers that span putative introns revealed that intron positions and sizes of hPPAR gamma are similar to those previously determined for the mouse PPAR gamma gene [13]. Fluorescent in situ hybridization localized hPPAR gamma to chromosome 3, band 3p25. Radiation hybrid mapping with two independent primer pairs was consistent with hPPAR gamma being within 1.5 Mb of marker D3S1263 on 3p25-p24.2. The sequences of the intron/exon junctions of the six coding exons shared by hPPAR gamma 1 and hPPAR gamma 2 will facilitate screening for possible mutations. Furthermore, D3S1263 is a suitable polymorphic marker for linkage analysis to evaluate PPAR gamma's potential contribution to genetic susceptibility to obesity, lipoatrophy, insulin resistance, and diabetes.

Development and validation of microsatellite markers for Brachiaria ruziziensis obtained by partial genome assembly of Illumina single-end reads

PubMed Central

2013-01-01

Background Brachiaria ruziziensis is one of the most important forage species planted in the tropics. The application of genomic tools to aid the selection of superior genotypes can provide support to B. ruziziensis breeding programs. However, there is a complete lack of information about the B. ruziziensis genome. Also, the availability of genomic tools, such as molecular markers, to support B. ruziziensis breeding programs is rather limited. Recently, next-generation sequencing technologies have been applied to generate sequence data for the identification of microsatellite regions and primer design. In this study, we present a first validated set of SSR markers for Brachiaria ruziziensis, selected from a de novo partial genome assembly of single-end Illumina reads. Results A total of 85,567 perfect microsatellite loci were detected in contigs with a minimum 10X coverage. We selected a set of 500 microsatellite loci identified in contigs with minimum 100X coverage for primer design and synthesis, and tested a subset of 269 primer pairs, 198 of which were polymorphic on 11 representative B. ruziziensis accessions. Descriptive statistics for these primer pairs are presented, as well as estimates of marker transferability to other relevant Brachiaria species. Finally, a set of 11 multiplex panels containing the 30 most informative markers was validated and proposed for B. ruziziensis genetic analysis. Conclusions We show that the detection and development of microsatellite markers from de novo assembled Illumina single-end DNA sequences is highly efficient. The developed markers are readily suitable for genetic analysis and marker assisted selection of Brachiaria ruziziensis. The use of this approach for microsatellite marker development is promising for species with limited genomic information, whose breeding programs would benefit from the use of genomic tools. To our knowledge, this is the first set of microsatellite markers developed for this important species. 
PMID:23324172
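Detecting "perfect" microsatellites, uninterrupted tandem copies of a short motif, in assembled contigs can be sketched with a regular-expression scan using a backreference. The minimum copy numbers per motif length below are an assumption in the style of common SSR-mining tools; the abstract does not state the thresholds used, and the contig-coverage filters (10X, 100X) are not modelled. Function and constant names are hypothetical.

```python
import re

# Minimum tandem copies for motif lengths 1-6 bp (assumed thresholds, not from the study).
MIN_COPIES = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'ATAT' or 'AA')."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_perfect_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return sorted (start, motif, copies) for perfect tandem repeats in seq."""
    seq = seq.upper()
    hits = []
    for k, min_copies in MIN_COPIES.items():
        # A k-bp motif followed by at least (min_copies - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # e.g. skip 'AA' runs, reported as mononucleotide 'A'
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


print(find_perfect_ssrs("GC" + "AT" * 7 + "GC"))  # [(2, 'AT', 7)]
```

Filtering out non-primitive motifs keeps a single (AT)n locus from also being reported as an (ATAT)n or (AA)n repeat.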
The complete mitogenome of the whale shark parasitic copepod Pandarus rhincodonicus Norman, Newbound & Knott (Crustacea; Siphonostomatoida; Pandaridae)--a new gene order for the Copepoda.

PubMed

Austin, Christopher M; Tan, Mun Hua; Lee, Yin Peng; Croft, Laurence J; Meekan, Mark G; Pierce, Simon J; Gan, Han Ming

2016-01-01

The complete mitochondrial genome of the parasitic copepod Pandarus rhincodonicus was obtained from a partial genome scan using the HiSeq sequencing system. The Pandarus rhincodonicus mitogenome has 14,480 base pairs (62% A+T content) made up of 12 protein-coding genes, 2 ribosomal subunit genes, 22 transfer RNAs, and a putative 384 bp non-coding AT-rich region. This Pandarus mitogenome sequence is the first for the family Pandaridae, the second for the order Siphonostomatoida and the sixth for the Copepoda.
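Base-composition figures such as the 62% A+T content, and the search for an AT-rich non-coding region, reduce to counting bases, optionally in sliding windows. The sketch below ignores ambiguous bases (N and other IUPAC codes); the window size, step and 75% threshold in `at_rich_windows` are arbitrary illustrative assumptions, not values from the study.

```python
def at_content(seq: str) -> float:
    """Percentage of A+T among unambiguous bases (N and other IUPAC codes ignored)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * at / (at + gc)


def at_rich_windows(seq: str, window: int = 100, step: int = 50,
                    threshold: float = 75.0) -> list[tuple[int, int, float]]:
    """(start, end, A+T %) for windows whose A+T content meets the threshold."""
    hits = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        try:
            pct = at_content(chunk)
        except ValueError:
            continue  # window made entirely of ambiguous bases
        if pct >= threshold:
            hits.append((start, start + len(chunk), pct))
    return hits


print(at_rich_windows("GC" * 50 + "AT" * 50))  # [(100, 200, 100.0)]
```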
Development and cross-species/genera transferability of microsatellite markers discovered using 454 genome sequencing in chokecherry (Prunus virginiana L.).

PubMed

Wang, Hongxia; Walla, James A; Zhong, Shaobin; Huang, Danqiong; Dai, Wenhao

2012-11-01

Chokecherry (Prunus virginiana L.) (2n = 4x = 32) is a unique Prunus species for both genetics and disease-resistance research due to its tetraploid nature and X-disease resistance. However, no genetic or genomic information on chokecherry is available. A partial chokecherry genome was sequenced using Roche 454 sequencing technology. A total of 145,094 reads covering 4.8 Mbp of the chokecherry genome were generated and 15,113 contigs were assembled, of which 11,675 contigs were larger than 100 bp in size. A total of 481 SSR loci were identified from 234 (out of 11,675) contigs and 246 polymerase chain reaction (PCR) primer pairs were designed. Of the 246 primer pairs, 212 (86.2 %) effectively produced amplification from the genomic DNA of chokecherry. All 212 amplifiable chokecherry primers were then used to amplify genomic DNA from 11 other rosaceous species (sour cherry, sweet cherry, black cherry, peach, apricot, plum, apple, crabapple, pear, juneberry, and raspberry). Chokecherry SSR primers proved transferable across Prunus and other rosaceous species: an average of 63.2 and 58.7 % of amplifiable chokecherry primers amplified DNA from cherry and other Prunus species, respectively, while 47.2 % of amplifiable chokecherry primers amplified DNA from other rosaceous species. Using random genome sequence data generated from next-generation sequencing technology to identify microsatellite loci appears to be rapid and cost-efficient, particularly for species with no sequence information available. Sequence information and confirmed transferability of the identified chokecherry SSRs among species will be valuable for genetic research in Prunus and other rosaceous species. Key message: A total of 246 SSR primer pairs were designed from chokecherry genome sequences, of which 212 were confirmed to be amplifiable in both chokecherry and 11 other rosaceous species.
Molecular characterization and phylogenetic analysis of Sugarcane yellow leaf virus isolates from China.

PubMed

Gao, San-Ji; Lin, Yi-Hua; Pan, Yong-Bao; Damaj, Mona B; Wang, Qin-Nan; Mirkov, T Erik; Chen, Ru-Kai

2012-10-01

Sugarcane yellow leaf virus (SCYLV) (genus Polerovirus, family Luteoviridae), the causal agent of sugarcane yellow leaf disease (YLD), was first detected in China in 2006. To assess the distribution of SCYLV in the major sugarcane-growing Chinese provinces, leaf samples from 22 sugarcane clones (Saccharum spp. hybrid) showing YLD symptoms were collected and analyzed for infection by the virus using reverse transcription PCR (RT-PCR), quantitative RT-PCR, and immunological assays. A complete genomic sequence (5,879 nt) of the Chinese SCYLV isolate CHN-FJ1 and partial genomic sequences (2,915 nt) of 13 other Chinese SCYLV isolates from this study were amplified, cloned, and sequenced. The genomic sequence of the CHN-FJ1 isolate was found to share a high identity (98.4-99.1 %) with those of the Brazilian (BRA) genotype isolates and a low identity (86.5-86.9 %) with those of the CHN1 and Cuban (CUB) genotype isolates. The genetic diversity of these 14 Chinese SCYLV isolates was assessed along with that of 29 SCYLV isolates of worldwide origin reported in the GenBank database, based on the full or partial genomic sequences. Phylogenetic analysis demonstrated that all the 14 Chinese SCYLV isolates clustered into one large group with the BRA genotype and 12 other reported SCYLV isolates. In addition, five reported Chinese SCYLV isolates were grouped with the Peruvian (PER), CHN1 and CUB genotypes. We therefore speculated that at least four SCYLV genotypes, BRA, PER, CHN1, and CUB, are associated with YLD in China. Interestingly, a 39-nt deletion was detected in the sequence of the CHN-GD3 isolate, in the middle of the ORF1 region adjacent to the overlap between ORF1 and ORF2. This location is known to be one of the recombination breakpoints in the Luteoviridae family.
Whole-genome comparative analysis of three phytopathogenic Xylella fastidiosa strains.

PubMed

Bhattacharyya, Anamitra; Stilwagen, Stephanie; Ivanova, Natalia; D'Souza, Mark; Bernal, Axel; Lykidis, Athanasios; Kapatral, Vinayak; Anderson, Iain; Larsen, Niels; Los, Tamara; Reznik, Gary; Selkov, Eugene; Walunas, Theresa L; Feil, Helene; Feil, William S; Purcell, Alexander; Lassez, Jean-Louis; Hawkins, Trevor L; Haselkorn, Robert; Overbeek, Ross; Predki, Paul F; Kyrpides, Nikos C

2002-09-17

Xylella fastidiosa (Xf) causes wilt disease in plants and is responsible for major economic and crop losses globally. Owing to the public importance of this phytopathogen, we embarked on a comparative analysis of the complete genome of Xf pv citrus and the partial genomes of two recently sequenced strains of this species: Xf pv almond and Xf pv oleander, which cause leaf scorch in almond and oleander plants, respectively. First, we report a reanalysis of the previously sequenced Xf 9a5c (CVC, citrus) strain and the two "gapped" Xf genomes, revealing ORFs encoding critical functions in pathogenicity and conjugative transfer. Second, we performed a detailed whole-genome functional comparison of the three sequenced Xf strains, identifying the unique genes present in each strain, in addition to those shared between strains. Third, an "in silico" cellular reconstruction of these organisms was made, based on a comparison of their core functional subsystems, which led to a characterization of their conjugative transfer machinery, identification of potential differences in their adhesion mechanisms, and highlighting of the absence of a classical quorum-sensing mechanism. This study demonstrates the effectiveness of comparative analysis strategies in the interpretation of genomes that are closely related.
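Once the genes of each strain have been grouped into shared families (an assumption here: the clustering step itself, for example by sequence similarity, is not shown), separating genes shared by all strains from strain-specific ones is a set computation. The strain labels and gene-family identifiers below are hypothetical.

```python
def partition_genes(strains: dict[str, set[str]]) -> tuple[set[str], dict[str, set[str]]]:
    """Return (genes shared by all strains, genes found in only one strain per strain)."""
    core = set.intersection(*strains.values())
    unique = {}
    for name, genes in strains.items():
        others = set().union(*(g for other, g in strains.items() if other != name))
        unique[name] = genes - others
    return core, unique


# Hypothetical gene-family IDs per strain.
strains = {
    "citrus": {"g1", "g2", "g3"},
    "almond": {"g1", "g2", "g4"},
    "oleander": {"g1", "g5"},
}
core, unique = partition_genes(strains)
print(core)              # {'g1'}
print(unique["citrus"])  # {'g3'}
```

Genes in neither output set (here g2, present in two of three strains) are shared by a subset of strains and can be recovered with further intersections if needed.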
Genome Structure of the Legume, Lotus japonicus

PubMed Central

Sato, Shusei; Nakamura, Yasukazu; Kaneko, Takakazu; Asamizu, Erika; Kato, Tomohiko; Nakao, Mitsuteru; Sasamoto, Shigemi; Watanabe, Akiko; Ono, Akiko; Kawashima, Kumiko; Fujishiro, Tsunakazu; Katoh, Midori; Kohara, Mitsuyo; Kishida, Yoshie; Minami, Chiharu; Nakayama, Shinobu; Nakazaki, Naomi; Shimizu, Yoshimi; Shinpo, Sayaka; Takahashi, Chika; Wada, Tsuyuko; Yamada, Manabu; Ohmido, Nobuko; Hayashi, Makoto; Fukui, Kiichi; Baba, Tomoya; Nakamichi, Tomoko; Mori, Hirotada; Tabata, Satoshi

2008-01-01

The legume Lotus japonicus has been widely used as a model system to investigate the genetic background of legume-specific phenomena such as symbiotic nitrogen fixation. Here, we report structural features of the L. japonicus genome. The 315.1-Mb sequences determined in this and previous studies correspond to 67% of the genome (472 Mb), and are likely to cover 91.3% of the gene space. Linkage mapping anchored 130-Mb sequences onto the six linkage groups. A total of 10 951 complete and 19 848 partial structures of protein-encoding genes were assigned to the genome. Comparative analysis of these genes revealed the expansion of several functional domains and gene families that are characteristic of L. japonicus. Synteny analysis detected traces of whole-genome duplication and the presence of synteny blocks with other plant genomes to various degrees. This study provides the first opportunity to look into the complex and unique genetic system of legumes. PMID:18511435
Analysis of the mitochondrial genome of cheetahs (Acinonyx jubatus) with neurodegenerative disease.

PubMed

Burger, Pamela A; Steinborn, Ralf; Walzer, Christian; Petit, Thierry; Mueller, Mathias; Schwarzenberger, Franz

2004-08-18

The complete mitochondrial genome of Acinonyx jubatus was sequenced and mitochondrial DNA (mtDNA) regions were screened for polymorphisms as candidates for the cause of a neurodegenerative demyelinating disease affecting captive cheetahs. The mtDNA reference sequences were established on the basis of the complete sequences of two diseased and two nondiseased animals as well as partial sequences of 26 further individuals. The A. jubatus mitochondrial genome is 17,047 bp long and shows a high sequence similarity (91%) to that of the domestic cat. Based on single nucleotide polymorphisms (SNPs) in the control region (CR) and pedigree information, the 18 myelopathic and 12 non-myelopathic cheetahs included in this study were classified into haplotypes I, II and III. In view of the phenotypic comparability of the neurodegenerative disease observed in cheetahs and human mtDNA-associated diseases, specific coding regions including the tRNAs leucine UUR, lysine, serine UCN, and partial complex I and V sequences were screened. We identified a heteroplasmic and a homoplasmic SNP at codon 507 in subunit 5 (MTND5) of complex I. The heteroplasmic haplotype I-specific valine to methionine substitution represents a nonconservative amino acid change and was found in 11 myelopathic and eight non-myelopathic cheetahs, with heteroplasmy levels ranging from 29% to 79%. The homoplasmic conservative amino acid substitution valine to alanine was identified in two myelopathic animals of haplotype II. In addition, a synonymous SNP at codon 76 of the MTND4L gene was found in the single haplotype III animal. The amino acid exchanges in the MTND5 gene were not associated with the occurrence of neurodegenerative disease in captive cheetahs.
First Report of a Fatal Case Associated with EV-D68 Infection in Hong Kong and Emergence of an Interclade Recombinant in China Revealed by Genome Analysis.

PubMed

Yip, Cyril C Y; Lo, Janice Y C; Sridhar, Siddharth; Lung, David C; Luk, Shik; Chan, Kwok-Hung; Chan, Jasper F W; Cheng, Vincent C C; Woo, Patrick C Y; Yuen, Kwok-Yung; Lau, Susanna K P

2017-05-16

A fatal case associated with enterovirus D68 (EV-D68) infection in a 10-year-old boy was reported in Hong Kong in 2014. To examine whether a new strain had emerged in Hong Kong, we sequenced the partial genome of the EV-D68 strain identified from the fatal case, as well as the complete VP1 and partial 5'UTR and 2C sequences of nine additional EV-D68 strains isolated from patients in Hong Kong. Sequence analysis indicated that a cluster of strains including the previously recognized A2 strains should belong to a separate clade, clade D, which is further divided into subclades D1 and D2. Among the 10 EV-D68 strains, 7 (including the fatal case) belonged to the previously described, newly emerged subclade B3, 2 belonged to subclade B1, and 1 belonged to subclade D1. Three EV-D68 strains, one each from subclades B1, B3, and D1, were selected for complete genome sequencing and recombination analysis. While no evidence of recombination was noted among local strains, interclade recombination was identified in subclade D2 strains detected in mainland China in 2008, with VP2 acquired from clade A. This study supports the reclassification of subclade A2 into clade D1, and demonstrates interclade recombination between clades A and D2 in EV-D68 strains from China.
Delineating slowly and rapidly evolving fractions of the Drosophila genome.

PubMed

Keith, Jonathan M; Adams, Peter; Stephen, Stuart; Mattick, John S

2008-05-01

Evolutionary conservation is an important indicator of function and a major component of bioinformatic methods for identifying non-protein-coding genes. We present a new Bayesian method for segmenting pairwise alignments of eukaryotic genomes while simultaneously classifying segments into slowly and rapidly evolving fractions. We also describe an information criterion, similar to the Akaike Information Criterion (AIC), for determining the number of classes. Working with pairwise alignments enables detection of differences in conservation patterns among closely related species. We analyzed three whole-genome and three partial-genome pairwise alignments among eight Drosophila species. Three distinct classes of conservation level were detected. Sequences comprising the most slowly evolving component were consistent across a range of species pairs and constituted approximately 62-66% of the D. melanogaster genome. Almost all (>90%) of the aligned protein-coding sequence is in this fraction, suggesting that much of the fraction (which comprises the majority of the Drosophila genome, including approximately 56% of non-protein-coding sequences) is functional. The size and content of the most rapidly evolving component were species dependent, varying from 1.6% to 4.8%. This fraction is also enriched for protein-coding sequence (while containing significant amounts of non-protein-coding sequence), suggesting that it is under positive selection. We also classified segments according to conservation and GC content simultaneously. This analysis identified numerous sub-classes of the classes identified on the basis of conservation alone, but was nevertheless consistent with that classification. Software, data, and results are available at www.maths.qut.edu.au/-keithj/. Genomic segments comprising the conservation classes are available in BED format.
The genome sequence of Agrotis segetum granulovirus, isolate AgseGV-DA, reveals a new Betabaculovirus species of a slow killing granulovirus.

PubMed

Gueli Alletti, Gianpiero; Eigenbrod, Marina; Carstens, Eric B; Kleespies, Regina G; Jehle, Johannes A

2017-06-01

The European isolate Agrotis segetum granulovirus DA (AgseGV-DA) is a slow-killing, type I granulovirus, as indicated by low dose-mortality responses within seven days post infection and a tissue tropism of infection restricted solely to the fat body of infected Agrotis segetum host larvae. The genome of AgseGV-DA was completely sequenced and compared with the whole genome sequences of the Chinese isolates AgseGV-XJ and AgseGV-L1. All three isolates share highly conserved genomes. The AgseGV-DA genome is 131,557 bp in length and encodes 149 putative open reading frames, including 37 baculovirus core genes and the per os infectivity factor ac110. Comprehensive investigation of repeat regions identified one putative non-hr-like origin of replication in AgseGV-DA. Phylogenetic analysis based on concatenated amino acid alignments of 37 baculovirus core genes, as well as pairwise distances based on nucleotide alignments of partial granulin, lef-8 and lef-9 sequences with deposited betabaculoviruses, confirmed AgseGV-DA, AgseGV-XJ and AgseGV-L1 as representative isolates of the same Betabaculovirus species. AgseGV encodes a distinct putative enhancin, distantly related to enhancins from other granuloviruses. Copyright © 2017. Published by Elsevier Inc.
Genomic analysis of filoviruses associated with four viral hemorrhagic fever outbreaks in Uganda and the Democratic Republic of the Congo in 2012.

PubMed

Albariño, C G; Shoemaker, T; Khristova, M L; Wamala, J F; Muyembe, J J; Balinandi, S; Tumusiime, A; Campbell, S; Cannon, D; Gibbons, A; Bergeron, E; Bird, B; Dodd, K; Spiropoulou, C; Erickson, B R; Guerrero, L; Knust, B; Nichol, S T; Rollin, P E; Ströher, U

2013-08-01

In 2012, an unprecedented four distinct, partially overlapping filovirus-associated viral hemorrhagic fever outbreaks were detected in equatorial Africa. Analysis of complete virus genome sequences confirmed the reemergence of Sudan virus and Marburg virus in Uganda, and the first emergence of Bundibugyo virus in the Democratic Republic of the Congo. Published by Elsevier Inc.
Analysis of whole genome sequences of 16 strains of rubella virus from the United States, 1961-2009.

PubMed

Abernathy, Emily; Chen, Min-hsin; Bera, Jayati; Shrivastava, Susmita; Kirkness, Ewen; Zheng, Qi; Bellini, William; Icenogle, Joseph

2013-01-25

Rubella virus is the causative agent of rubella, a mild rash illness, and a potent teratogenic agent when contracted by a pregnant woman. Global rubella control programs target the reduction and elimination of congenital rubella syndrome. Phylogenetic analysis of partial sequences of rubella viruses has contributed to virus surveillance efforts and played an important role in demonstrating that indigenous rubella viruses have been eliminated in the United States. Sixteen wild-type rubella viruses were chosen for whole genome sequencing. All 16 viruses were collected in the United States from 1961 to 2009 and represent 8 of the 13 known rubella genotypes. Phylogenetic analysis of 30 whole genome sequences produced a maximum likelihood tree with high bootstrap values for all genotypes except provisional genotype 1a. Comparison of the 16 new complete sequences and 14 previously sequenced wild-type viruses found regions with clusters of variable amino acids. The 5'-terminal 250 nucleotides of the genome are more conserved than any other part of the genome. Genotype-specific deletions in the untranslated region between the non-structural and structural open reading frames were observed for genotypes 2B and 1G. No evidence was seen for recombination events among the 30 viruses. The analysis presented here is consistent with previous reports on the genetic characterization of rubella virus genomes. Conserved and variable regions were identified, and additional evidence for genotype-specific nucleotide deletions in the intergenic region was found. Phylogenetic analysis confirmed genotype groupings originally based on structural protein coding region sequences, which supports the WHO nomenclature for genetic characterization of wild-type rubella viruses.
EXONSAMPLER: a computer program for genome-wide and candidate gene exon sampling for targeted next-generation sequencing.

PubMed

Cosart, Ted; Beja-Pereira, Albano; Luikart, Gordon

2014-11-01

The computer program EXONSAMPLER automates the sampling of thousands of exon sequences from publicly available reference genome sequences and gene annotation databases. It was designed to provide exon sequences for exon capture, an efficient next-generation gene sequencing method. Exon sequences can be sampled from a list of gene name abbreviations (e.g. IFNG, TLR1) or by sampling exons from genes spaced evenly across chromosomes. The program outputs a list of genomic coordinates (a BED file) and a set of sequences in FASTA format. User-adjustable parameters for collecting exon sequences include minimum and maximum acceptable exon length, maximum number of exonic base pairs (bp) to sample per gene, and maximum total bp for the entire collection. It allows partial sampling of very large exons. It can preferentially sample upstream (5 prime) exons, downstream (3 prime) exons, both external exons, or all internal exons. It is written in Python using freely available libraries. We describe the use of EXONSAMPLER to collect exon sequences from the domestic cow (Bos taurus) genome for the design of an exon-capture microarray to sequence exons from related species, including the zebu cow and wild bison. We collected ~10% of the exome (~3 million bp), including 155 candidate genes and ~16,000 exons evenly spaced genome-wide. We prioritized the collection of 5 prime exons to facilitate discovery and genotyping of SNPs near upstream gene regulatory DNA sequences, which control gene expression and are often under natural selection. © 2014 John Wiley & Sons Ltd.
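The sampling rules listed for EXONSAMPLER (minimum and maximum exon length, a per-gene base-pair budget, a collection-wide budget, partial sampling of very large exons, and a 5 prime preference) can be illustrated with a minimal Python sketch. This is not EXONSAMPLER's code or interface: the `Exon` type, the `sample_exons` function and its parameter names and defaults are hypothetical, exons are assumed to be listed 5'->3' on the plus strand, and output is BED-style tuples with 0-based, half-open coordinates.

```python
from dataclasses import dataclass


@dataclass
class Exon:
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive


def sample_exons(genes, min_len=100, max_len=1000, max_bp_per_gene=2000,
                 max_total_bp=10_000, prefer="5prime"):
    """Return BED-style (chrom, start, end, gene) tuples.

    Hypothetical helper: `genes` maps a gene name to its exons listed 5'->3'
    on the plus strand. prefer="5prime" walks exons from the 5' end,
    any other value walks them from the 3' end.
    """
    selected = []
    total_bp = 0
    for gene, exons in genes.items():
        ordered = exons if prefer == "5prime" else list(reversed(exons))
        gene_bp = 0
        for exon in ordered:
            length = exon.end - exon.start
            if length < min_len:
                continue  # too short to be an acceptable exon
            take = min(length, max_len)                  # partial sampling of very large exons
            take = min(take, max_bp_per_gene - gene_bp)  # per-gene budget
            take = min(take, max_total_bp - total_bp)    # collection-wide budget
            if take < min_len:
                break  # remaining budget cannot hold another acceptable exon
            # Truncated exons keep their leftmost `take` bases (the 5' end on the plus strand).
            selected.append((exon.chrom, exon.start, exon.start + take, gene))
            gene_bp += take
            total_bp += take
        if total_bp >= max_total_bp:
            break  # whole collection is full
    return selected


if __name__ == "__main__":
    # Hypothetical toy annotation: geneA has a short middle exon and a large 3' exon.
    genes = {
        "geneA": [Exon("chrA", 0, 150), Exon("chrA", 500, 550), Exon("chrA", 1000, 3000)],
        "geneB": [Exon("chrB", 100, 400)],
    }
    for row in sample_exons(genes):
        print(*row, sep="\t")
    # chrA  0     150   geneA
    # chrA  1000  2000  geneA   (2,000-bp exon truncated to max_len)
    # chrB  100   400   geneB
```

Only the gene-list mode is sketched; the evenly spaced genome-wide mode and the choice of external or internal exons described for the real program are omitted, and writing the BED and FASTA files is left out.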
Genomic organization of the neurofibromatosis 1 gene (NF1)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Y.; O'Connell, P.; Huntsman Breidenbach, H.

Neurofibromatosis 1 maps to chromosome band 17q11.2, and the NF1 locus has been partially characterized. Even though the full-length NF1 cDNA has been sequenced, the complete genomic structure of the NF1 gene has not been elucidated. The 5′ end of NF1 is embedded in a CpG island containing a NotI restriction site, and the remainder of the gene lies in the adjacent 350-kb NotI fragment. In our efforts to develop a comprehensive screen for NF1 mutations, we have isolated genomic DNA clones that together harbor the entire NF1 cDNA sequence. We have identified all intron-exon boundaries of the coding region and established that it is composed of 59 exons. Furthermore, we have defined the 3′-untranslated region (3′-UTR) of the NF1 gene; it spans approximately 3.5 kb of genomic DNA sequence and is continuous with the stop codon. Oligonucleotide primer pairs synthesized from exon-flanking DNA sequences were used in the polymerase chain reaction with cloned, chromosome 17-specific genomic DNA as template to amplify NF1 exons 1 through 27b and the exon containing the 3′-UTR separately. This information should be useful for implementing a comprehensive NF1 mutation screen using genomic DNA as template. 41 refs., 3 figs., 2 tabs.
Annotation of a hybrid partial genome of the coffee rust (Hemileia vastatrix) contributes to the gene repertoire catalog of the Pucciniales

PubMed Central

Cristancho, Marco A.; Botero-Rozo, David Octavio; Giraldo, William; Tabima, Javier; Riaño-Pachón, Diego Mauricio; Escobar, Carolina; Rozo, Yomara; Rivera, Luis F.; Durán, Andrés; Restrepo, Silvia; Eilam, Tamar; Anikster, Yehoshua; Gaitán, Alvaro L.

2014-01-01

Coffee leaf rust, caused by the fungus Hemileia vastatrix, is the most damaging disease of coffee worldwide. The pathogen has recently appeared in multiple outbreaks in coffee-producing countries, resulting in significant yield losses and increased control costs. New races/isolates are constantly emerging, as evidenced by the presence of the fungus in plants that were previously resistant. Genomic studies are opening new avenues for studying the evolution of pathogens, describing plant-pathogen interactions in detail and developing molecular techniques for identifying individual isolates. For this purpose we sequenced 8 different H. vastatrix isolates using NGS technologies and obtained partial genome assemblies, owing to the large repetitive content of the coffee rust hybrid genome; 74.4% of the assembled contigs harbor repetitive sequences. A hybrid assembly of 333 Mb was built from the 8 isolates and used for subsequent analyses. Analysis of the conserved gene space showed that the hybrid H. vastatrix genome, though highly fragmented, had a satisfactory level of completeness, with 91.94% of core protein-coding orthologous genes present. RNA-Seq from urediniospores was used to guide the de novo annotation of the H. vastatrix gene complement. In total, 14,445 genes organized in 3921 families were uncovered; a considerable proportion of the predicted proteins (73.8%) were homologous to proteins in other Pucciniales genomes. Several gene families related to the fungal lifestyle were identified, particularly 483 predicted secreted proteins that represent candidate effector genes and will provide clues for deciphering virulence in the coffee rust fungus. The genome sequence of H. vastatrix will serve as a template for understanding the molecular mechanisms this fungus uses to attack the coffee plant, for studying the diversity of this species and for developing molecular markers to distinguish races/isolates.
PMID:25400655
Genomic and molecular analysis of phage CMP1 from Clavibacter michiganensis subspecies michiganensis

PubMed Central

Wittmann, Johannes; Gartemann, Karl-Heinz; Eichenlaub, Rudolf; Dreiseikelmann, Brigitte

2011-01-01

Bacteriophage CMP1 is a member of the Siphoviridae family that specifically infects the plant pathogen Clavibacter michiganensis subsp. michiganensis. Its linear double-stranded DNA is terminally redundant and not circularly permuted. The complete nucleotide sequence of the bacteriophage CMP1 genome consists of 58,652 bp, including terminally redundant ends of 791 bp. The G+C content of the phage (57%) is significantly lower than that of its host (72.66%). Seventy-four potential open reading frames were identified and annotated with different bioinformatic tools. Two large, divergently transcribed clusters encoding the early and the late functions could be identified. Only a few hypothetical gene products have conserved domains and significant similarity to database sequences. Functional analyses confirmed the activity of four gene products: an endonuclease, an exonuclease, a single-stranded DNA binding protein and a thymidylate synthase. Partial genomic sequences of CN77, a phage of Clavibacter michiganensis subsp. nebraskensis, revealed a similar genome structure and significant similarities at the level of deduced amino acid sequences. An endolysin with peptidase activity has been identified for each phage; these endolysins may be good tools for controlling Clavibacter infections of tomato plants. PMID:21687530
Characterization of Urtica dioica agglutinin isolectins and the encoding gene family.

PubMed

Does, M P; Ng, D K; Dekker, H L; Peumans, W J; Houterman, P M; Van Damme, E J; Cornelissen, B J

1999-01-01

Urtica dioica agglutinin (UDA) has previously been found in roots and rhizomes of stinging nettles as a mixture of UDA-isolectins. Protein and cDNA sequencing have shown that mature UDA is composed of two hevein domains and is processed from a precursor protein. The precursor contains a signal peptide, two in-tandem hevein domains, a hinge region and a carboxyl-terminal chitinase domain. Genomic fragments encoding precursors for UDA-isolectins were amplified by five independent polymerase chain reactions on genomic DNA from stinging nettle ecotype Weerselo. One amplified gene was completely sequenced. Compared with the published cDNA sequence, the genomic sequence contains two base-pair substitutions and two introns, the introns being located at the same positions as in other plant chitinases. Partial sequence analysis of 40 amplified genes identified 16 different genes, which encode seven putative UDA-isolectins. The deduced amino acid sequences share 78.9-98.9% identity. In extracts of roots and rhizomes of stinging nettle ecotype Weerselo, six of these seven isolectins were detected by mass spectrometry. One of them is an acidic form that had not been identified before. Our results demonstrate that UDA is encoded by a large gene family.
Sequence and Characterization of the Ig Heavy Chain Constant and Partial Variable Region of the Mouse Strain 129S1

PubMed Central

Retter, Ida; Chevillard, Christophe; Scharfe, Maren; Conrad, Ansgar; Hafner, Martin; Im, Tschong-Hun; Ludewig, Monika; Nordsiek, Gabriele; Severitt, Simone; Thies, Stephanie; Mauhar, America; Blöcker, Helmut; Müller, Werner; Riblet, Roy

2009-01-01

Although the entire mouse genome has been sequenced, challenges remain in elucidating particularly complex and polymorphic genomic loci. In the murine Igh locus, different haplotypes exist in different inbred mouse strains. For example, the Ighb haplotype sequence of the Mouse Genome Project strain C57BL/6 differs considerably from the Igha haplotype of BALB/c, which has been widely used in analyses of Ab responses. We have sequenced and annotated the 3′ half of the Igha locus of 129S1/SvImJ, covering the CH region and approximately half of the VH region. This sequence comprises 128 VH genes, of which 49 are judged to be functional. Comparison of the Igha sequence with the homologous Ighb region from C57BL/6 revealed two major expansions in the germline repertoire of Igha. In addition, we found smaller haplotype-specific differences, such as the duplication of five VH genes in the Igha locus. We generated a VH allele table by comparing the individual VH genes of both haplotypes. Surprisingly, the number and position of DH genes in the 129S1 strain differ not only from the sequence of C57BL/6 but also from the map published for BALB/c. Taken together, the contiguous genomic sequence of the 3′ part of the Igha locus allows a detailed view of the recent evolution of this highly dynamic locus in the mouse. PMID:17675503
Genome sequencing identifies Listeria fleischmannii subsp. coloradonensis subsp. nov., isolated from a ranch.

PubMed

den Bakker, Henk C; Manuel, Clyde S; Fortes, Esther D; Wiedmann, Martin; Nightingale, Kendra K

2013-09-01

Twenty Listeria-like isolates were obtained from environmental samples collected on a cattle ranch in northern Colorado; all of these isolates shared an identical partial sigB sequence, suggesting close relatedness. The isolates resembled members of the genus Listeria in being Gram-stain-positive, short rods, oxidase-negative and catalase-positive, and resembled Listeria fleischmannii in being non-motile at 25 °C. 16S rRNA gene sequencing of representative isolates and whole genome sequencing of one isolate were performed. The genome of the type strain of Listeria fleischmannii (strain LU2006-1(T)) was also sequenced. The draft genomes were very similar in size, and the average MUMmer nucleotide identity across 91% of the genomes was 95.16%. Genome sequence data were used to design primers for a six-gene multi-locus sequence analysis (MLSA) scheme. Phylogenies based on (i) the near-complete 16S rRNA gene, (ii) 31 core genes and (iii) six housekeeping genes illustrated the close relationship of these Listeria-like isolates to Listeria fleischmannii LU2006-1(T). Sufficient genetic divergence of the Listeria-like isolates from the type strain of Listeria fleischmannii, together with differing phenotypic characteristics, warrants classifying these isolates as members of a distinct infraspecific taxon, for which the name Listeria fleischmannii subsp. coloradonensis subsp. nov. is proposed. The type strain is TTU M1-001(T) (=BAA-2414(T) =DSM 25391(T)). The isolates of Listeria fleischmannii subsp. coloradonensis subsp. nov. differ from the nominate subspecies in their inability to utilize melezitose, turanose and sucrose, and their ability to utilize inositol. The results also demonstrate the utility of whole genome sequencing for identifying novel taxa within a well-described genus. The genomes of both subspecies of Listeria fleischmannii contained putative enhancin genes; the Listeria fleischmannii subsp. coloradonensis subsp. nov. genome also encoded a putative mosquitocidal toxin. The presence of these genes suggests possible adaptation to an insect host, and further studies are needed to probe niche adaptation of Listeria fleischmannii.

Phylogenetic utility, and variability in structure and content, of complete mitochondrial genomes among genetic lineages of the Hawaiian anchialine shrimp Halocaridina rubra Holthuis 1963 (Atyidae:Decapoda).

PubMed

Justice, Joshua L; Weese, David A; Santos, Scott Ross

2016-07-01

The Atyidae are caridean shrimp possessing hair-like setae on their claws and are important contributors to ecological services in tropical and temperate fresh and brackish water ecosystems. Complete mitochondrial genomes have only been reported from five of the 449 species in the family, thus limiting understanding of mitochondrial genome evolution and the phylogenetic utility of complete mitochondrial sequences in the Atyidae. Here, comparative analyses of complete mitochondrial genomes from eight genetic lineages of Halocaridina rubra, an atyid endemic to the anchialine ecosystem of the Hawaiian Archipelago, are presented. Although gene number, order, and orientation were syntenic among genomes, three regions were identified and further quantified where conservation was substantially lower: (1) high length and sequence variability in the tRNA-Lys and tRNA-Asp intergenic region; (2) a 317-bp insertion between the NAD6 and CytB genes confined to a single lineage and representing a partial duplication of CytB; and (3) the putative control region. Phylogenetic analyses utilizing complete mitochondrial sequences provided new insights into relationships among the H. rubra genetic lineages, with the topology of one clade correlating to the geologic sequence of the islands. However, deeper nodes in the phylogeny lacked bootstrap support. Overall, our results from H. rubra suggest intra-specific mitochondrial genomic diversity could be underestimated across the Metazoa since the vast majority of complete genomes are from just a single individual of a species.
Second-generation sequencing of entire mitochondrial coding-regions (∼15.4 kb) holds promise for study of the phylogeny and taxonomy of human body lice and head lice.

PubMed

Xiong, H; Campelo, D; Pollack, R J; Raoult, D; Shao, R; Alem, M; Ali, J; Bilcha, K; Barker, S C

2014-08-01

The Illumina HiSeq platform was used to sequence the entire mitochondrial coding-regions of 20 body lice, Pediculus humanus Linnaeus, and head lice, P. capitis De Geer (Phthiraptera: Pediculidae), from eight towns and cities in five countries: Ethiopia, France, China, Australia and the U.S.A. These data (∼310 kb) were used to assess how much more informative entire mitochondrial coding-region sequences are than partial mitochondrial coding-region sequences, and thus to guide the design of future studies of the phylogeny, origin, evolution and taxonomy of body lice and head lice. Phylogenies were compared from entire coding-region sequences (∼15.4 kb), entire cox1 (∼1.5 kb), partial cox1 (∼700 bp) and partial cytb (∼600 bp) sequences. On the one hand, phylogenies from entire mitochondrial coding-region sequences (∼15.4 kb) were much more informative than phylogenies from entire cox1 sequences (∼1.5 kb) and partial gene sequences (∼600 to ∼700 bp). For example, 19 branches had > 95% bootstrap support in our maximum likelihood tree from the entire mitochondrial coding-regions (∼15.4 kb), whereas the tree from 700 bp of cox1 had only two branches with bootstrap support > 95%. On the other hand, partial cytb (∼600 bp) and partial cox1 (∼486 bp) sequences were sufficient to genotype lice to Clade A, B or C. The sequences of the mitochondrial genomes of the P. humanus, P. capitis and P. schaeffi Fahrenholz studied are in NCBI GenBank under the accession numbers KC660761-800, KC685631-6330, KC241882-97, EU219988-95, HM241895-8 and JX080388-407. © 2014 The Royal Entomological Society.
Genomic structure and paralogous regions of the inversion breakpoint occurring between human chromosome 3p12.3 and orangutan chromosome 2.

PubMed

Yue, Y; Grossmann, B; Tsend-Ayush, E; Grützner, F; Ferguson-Smith, M A; Yang, F; Haaf, T

2005-01-01

Intrachromosomal duplications play a significant role in human genome pathology and evolution. To better understand the molecular basis of evolutionary chromosome rearrangements, we performed molecular cytogenetic and sequence analyses of the breakpoint region that distinguishes human chromosome 3p12.3 and orangutan chromosome 2. FISH with region-specific BAC clones demonstrated that the breakpoint-flanking sequences are duplicated intrachromosomally on orangutan 2 and human 3q21 as well as at many pericentromeric and subtelomeric sites throughout the genomes. Breakage and rearrangement of the human 3p12.3-homologous region in the orangutan lineage were associated with a partial loss of duplicated sequences in the breakpoint region. Consistent with our FISH mapping results, computational analysis of the human chromosome 3 genomic sequence revealed three 3p12.3-paralogous sequence blocks on human chromosome 3q21 and smaller blocks on the short arm end 3p26-->p25. This is consistent with the view that sequences from an ancestral site at 3q21 were duplicated at 3p12.3 in a common ancestor of orangutan and humans. Our results show that evolutionary chromosome rearrangements are associated with microduplications and microdeletions, contributing to the DNA differences between closely related species. Copyright (c) 2005 S. Karger AG, Basel.
Dynamics of actin evolution in dinoflagellates.

PubMed

Kim, Sunju; Bachvaroff, Tsvetan R; Handy, Sara M; Delwiche, Charles F

2011-04-01

Dinoflagellates have unique nuclei and intriguing genome characteristics, including very high DNA content, that make complete genome sequencing difficult. In dinoflagellates, many genes are found in multicopy gene families, but the processes involved in the establishment and maintenance of these gene families are poorly understood. Understanding the dynamics of gene family evolution in dinoflagellates requires comparisons at different evolutionary scales. Studies of closely related species provide fine-scale information relative to species divergence, whereas comparisons of more distantly related species provide broad context. We selected the actin gene family because actin is a highly expressed, conserved gene previously studied in dinoflagellates. Of the 142 sequences determined in this study, 103 were from the two closely related species Dinophysis acuminata and D. caudata, including full-length and partial cDNA sequences as well as partial genomic amplicons. For these two Dinophysis species, at least three types of sequences could be identified. Most copies (79%) were relatively similar, and in nucleotide trees the sequences formed two bushy clades corresponding to the two species. In comparisons within species, only eight to ten nucleotide differences were found between these copies. The two remaining types formed clades containing sequences from both species. One type included the most similar sequences in between-species comparisons, with as few as 12 nucleotide differences between species. The second type included the most divergent sequences in comparisons both between and within species, with up to 93 nucleotide differences between sequences. In all the sequences, most variation occurred in synonymous sites or the 5' untranslated region (UTR), although there was still limited amino acid variation between most sequences.
Several potential pseudogenes were found (approximately 10% of all sequences depending on species) with incomplete open reading frames due to frameshifts or early stop codons. Overall, variation in the actin gene family fits best with the "birth and death" model of evolution based on recent duplications, pseudogenes, and incomplete lineage sorting. Divergence between species was similar to variation within species, so that actin may be too conserved to be useful for phylogenetic estimation of closely related species.
Optofluidic Single-Cell Genome Amplification of Sub-micron Bacteria in the Ocean Subsurface

PubMed Central

Landry, Zachary C.; Vergin, Kevin; Mannenbach, Christopher; Block, Stephen; Yang, Qiao; Blainey, Paul; Carlson, Craig; Giovannoni, Stephen

2018-01-01

Optofluidic single-cell genome amplification was used to obtain genome sequences from sub-micron cells collected from the euphotic and mesopelagic zones of the northwestern Sargasso Sea. Plankton cells were visually selected and manually sorted with an optical trap, yielding 20 partial genome sequences representing seven bacterial phyla. Two organisms, E01-9C-26 (Gammaproteobacteria), represented by four single-cell genomes, and Opi.OSU.00C, an uncharacterized Verrucomicrobia, were the first of their types retrieved by single-cell genome sequencing and were studied in detail. Metagenomic data showed that E01-9C-26 is found throughout the dark ocean, while Opi.OSU.00C was observed to bloom transiently in the nutrient-depleted euphotic zone in late spring and early summer. The E01-9C-26 genomes had an estimated size of 4.76–5.05 Mbp and identified E01-9C-26 as a source of “O”- and “W”-type monooxygenase genes of unknown function, related to methane and ammonium monooxygenases, that were previously reported from ocean metagenomes. Metabolic reconstruction indicated that E01-9C-26 is likely a versatile methylotroph capable of scavenging C1 compounds, methylated compounds, reduced sulfur compounds, and a wide range of amines, including D-amino acids. In contrast, Opi.OSU.00C genomes encode genes for catabolizing carbohydrate compounds normally associated with eukaryotic phytoplankton. This exploration of optofluidics showed that it was effective for retrieving diverse single-cell bacterioplankton genomes and has potential advantages in microbiology applications that require working with small sample volumes or targeting cells by their morphology.
A partial nuclear genome of the Jomons who lived 3000 years ago in Fukushima, Japan

PubMed Central

Kanzawa-Kiriyama, Hideaki; Kryukov, Kirill; Jinam, Timothy A; Hosomichi, Kazuyoshi; Saso, Aiko; Suwa, Gen; Ueda, Shintaroh; Yoneda, Minoru; Tajima, Atsushi; Shinoda, Ken-ichi; Inoue, Ituro; Saitou, Naruya

2017-01-01

The Jomon period of the Japanese Archipelago, characterized by cord-marked ‘jomon’ pottery, has yielded abundant human skeletal remains. However, the genetic origins of the Jomon people and their relationships with modern populations have not been clarified. We determined a total of 115 million base pairs of nuclear genome sequence from two Jomon individuals (one male and one female) from the Sanganji Shell Mound (dated 3000 years before present) with the Jomon-characteristic mitochondrial DNA haplogroup N9b, and compared these nuclear genome sequences with those of worldwide populations. We found that the Jomon population lineage is best considered to have diverged before diversification of present-day East Eurasian populations, with no evidence of gene flow events between the Jomon and other continental populations. This suggests that the Sanganji Jomon people descended from an early phase of population dispersals in East Asia. We also estimated that the modern mainland Japanese inherited <20% of Jomon peoples' genomes. Our findings, based on the first analysis of Jomon nuclear genome sequence data, firmly demonstrate that the modern mainland Japanese resulted from genetic admixture of the indigenous Jomon people and later migrants. PMID:27581845
Unprecedented genomic diversity of AhR1 and AhR2 genes in Atlantic salmon (Salmo salar L.).

PubMed

Hansson, Maria C; Wittzell, Håkan; Persson, Kerstin; von Schantz, Torbjörn

2004-06-24

Aryl hydrocarbon receptor (AhR) genes encode proteins involved in mediating the toxic responses induced by several environmental pollutants. Here, we describe the identification of the first two AhR1 (alpha and beta) genes and two additional AhR2 (alpha and beta) genes in the tetraploid species Atlantic salmon (Salmo salar L.) from a cosmid library screening. Cosmid clones containing genomic salmon AhR sequences were isolated using a cDNA clone containing the coding region of the Atlantic salmon AhR2gamma as a probe. Screening revealed 14 positive clones, from which four were chosen for further analyses. One of the cosmids contained genomic AhR sequences that were highly similar to the rainbow trout (Oncorhynchus mykiss) AhR2alpha and beta genes. SMART RACE amplified two complete, highly similar but not identical AhR type 2 sequences from salmon cDNA, which from phylogenetic analyses were determined as the rainbow trout AhR2alpha and beta orthologs. The salmon AhR2alpha and beta encode proteins of 1071 and 1058 residues, respectively, and encompass characteristic AhR sequence elements like a basic-helix-loop-helix (bHLH) and two PER-ARNT-SIM (PAS) domains. Both genes are transcribed in liver, spleen and muscle tissues of adult salmon. A second cosmid contained partial sequences, which were identical to the previously characterized AhR2gamma gene. The last two cosmids contained partial genomic AhR sequences, which were more similar to other AhR type 1 fish genes than the four characterized salmon AhR2 genes. However, attempts to amplify the corresponding complete cDNA sequences of the inserts proved very difficult, suggesting that these genes are non-functional or very weakly transcribed in the examined tissues. Phylogenetic analyses of the conserved regions did, however, clearly indicate that these two AhRs belong to the AhR type 1 clade and have been assigned as the Atlantic salmon AhR1alpha and AhR1beta genes. 
Taken together, these findings demonstrate that multiple AhR genes are present in the Atlantic salmon genome, which is likely a consequence of previous genome duplications in the evolutionary past of salmonids. Plausible explanations for the high incidence of AhR genes in fish and more specifically in salmonids, like rapid divergences in specialized functions, are discussed.
A second gene for acyl-(acyl-carrier-protein):glycerol-3-phosphate acyltransferase in squash, Cucurbita moschata cv. Shirogikuza, codes for an oleate-selective isozyme: molecular cloning and protein purification studies.

PubMed

Nishida, I; Sugiura, M; Enju, A; Nakamura, M

2000-12-01

A new isogene for acyl-(acyl-carrier-protein):glycerol-3-phosphate acyltransferase (GPAT; EC 2.3.1.15) in squash has been cloned and the gene product was identified as oleate-selective GPAT. Using PCR primers that could hybridise with exons for a previously cloned squash GPAT, we obtained two PCR products of different size: one coded for a previously cloned squash GPAT corresponding to non-selective isoforms AT2 and AT3, and the other for a new isozyme, probably the oleate-selective isoform AT1. Full-length amino acid sequences of respective isozymes were deduced from the nucleotide sequences of genomic genes and cDNAs, which were cloned by a series of PCR-based methods. Thus, we designated the new gene CmATS1;1 and the other one CmATS1;2. Genome blot analysis revealed that the squash genome contained the two isogenes at non-allelic loci. AT1-active fractions were partially purified, and three polypeptide bands were identified as being AT1 polypeptides, which exhibited relative molecular masses of 39.5-40.5 kDa, pI values of 6.75-7.15, and oleate selectivity over palmitate. Partial amino-terminal sequences obtained from two of these bands verified that the new isogene codes for AT1 polypeptides.
Modularity of nitrogen-oxide reducing soil bacteria: linking phenotype to genotype.

PubMed

Roco, Constance A; Bergaust, Linda L; Bakken, Lars R; Yavitt, Joseph B; Shapleigh, James P

2017-06-01

Model denitrifiers convert NO3- to N2, but it appears that a significant fraction of natural populations are truncated, conducting only one or two steps of the pathway. To better understand the diversity of partial denitrifiers in soil and whether discrepancies arise between the presence of known N-oxide reductase genes and phenotypic features, bacteria able to reduce NO3- to NO2- were isolated from soil, N-oxide gas products were measured for eight isolates, and six were genome sequenced. Gas phase analyses revealed that two were complete denitrifiers, which genome sequencing corroborated. The remaining six accumulated NO and N2O to varying degrees and genome sequencing of four indicated that two isolates held genes encoding nitrate reductase as the only dissimilatory N-oxide reductase, one contained genes for both nitrate and nitric oxide reductase, and one had nitrate and nitrite reductase. The results demonstrated that N-oxide production was not always predicted by the genetic potential and suggested that partial denitrifiers could be readily isolated among soil bacteria. This supported the hypothesis that each N-oxide reductase could provide a selectable benefit on its own, and therefore, reduction of nitrate to dinitrogen may not be obligatorily linked to complete denitrifiers but instead a consequence of a functionally diverse community. © 2016 Society for Applied Microbiology and John Wiley & Sons Ltd.
Complete genomic sequence of virulent pigeon paramyxovirus in Laughing Doves (Streptopelia senegalensis) in Kenya.

PubMed

Obanda, Vincent; Michuki, George; Jowers, Michael J; Rumberia, Cecilia; Mutinda, Mathew; Lwande, Olivia Wesula; Wangoru, Kihara; Kasiiti-Orengo, Jacquiline; Yongo, Moses; Angelone-Alasaad, Samer

2016-07-01

Following mass deaths of Laughing Doves (Streptopelia senegalensis) in different localities throughout Kenya, internal organs obtained during necropsy of two moribund birds were sampled and analyzed by next-generation sequencing. We isolated the virulent strain of pigeon paramyxovirus type 1 (PPMV-1), PPMV1/Laughing Dove/Kenya/Isiolo/B2/2012, which had the characteristic fusion gene motif GGRRQKRF (residues 110 to 117). We obtained a nearly full-length genome sequence of 15,114 nucleotides. Phylogenetic analysis based on the fusion gene and the genomic sequence grouped our isolate as class II genotype VI, a group of viruses commonly isolated from wild birds but potentially lethal to chickens (Gallus gallus domesticus). Based on the fusion gene, the isolate clustered with PPMV-1 strains from pigeons (Columbidae) in Nigeria. The complete genome placed the isolate in a basal and highly divergent lineage relative to American, European, and Asian strains, indicating a divergent evolutionary pathway. The isolated strain is highly virulent and apparently species-specific to Laughing Doves in Kenya. Risk of transmission of such a strain to poultry is potentially high, whereas the cyclic epizootic in doves is a threat to conservation of wild Columbidae in Kenya.
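The virulence call rests on the fusion cleavage-site motif. As a minimal sketch, the check below applies the OIE (now WOAH) criterion for virulent avian paramyxovirus type 1: at least three arginine or lysine residues between positions 113 and 116, plus phenylalanine at position 117. The criterion and the function name `is_virulent_cleavage_site` are assumptions for illustration, not the authors' stated method, and the second motif is a made-up low-virulence-type example.

```python
# Sketch (assumption): OIE/WOAH cleavage-site criterion for virulent APMV-1,
# i.e. >= 3 basic residues (R or K) at positions 113-116 and F at position 117.
BASIC_RESIDUES = {"R", "K"}


def is_virulent_cleavage_site(motif: str, start: int) -> bool:
    """Return True if `motif` (first residue at position `start`) meets the criterion."""
    end = start + len(motif) - 1
    if start > 113 or end < 117:
        raise ValueError("motif must span residues 113 to 117")
    # Convert absolute residue positions into string indices.
    basic_count = sum(motif[pos - start] in BASIC_RESIDUES for pos in range(113, 117))
    return basic_count >= 3 and motif[117 - start] == "F"


print(is_virulent_cleavage_site("GGRRQKRF", start=110))  # True: R113, K115, R116 plus F117
print(is_virulent_cleavage_site("GGGRQGRL", start=110))  # False: two basic residues, L117
```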
Microsatellite marker development by partial sequencing of the sour passion fruit genome (Passiflora edulis Sims).

PubMed

Araya, Susan; Martins, Alexandre M; Junqueira, Nilton T V; Costa, Ana Maria; Faleiro, Fábio G; Ferreira, Márcio E

2017-07-21

The Passiflora genus comprises hundreds of wild and cultivated species of passion fruit used for food, industrial, ornamental and medicinal purposes. Efforts to develop genomic tools for genetic analysis of P. edulis, the most important commercial Passiflora species, are still incipient. In spite of many recognized applications of microsatellite markers in genetics and breeding, their availability for passion fruit research remains restricted. Microsatellite markers in P. edulis are usually limited in number, show reduced polymorphism, and are mostly based on compound or imperfect repeats. Furthermore, they are confined to only a few Passiflora species. We describe the use of NGS technology to partially assemble the P. edulis genome in order to develop hundreds of new microsatellite markers. A total of 14.11 Gbp of Illumina paired-end sequence reads were analyzed to detect simple sequence repeat sites in the sour passion fruit genome. A sample of 1300 contigs containing perfect repeat microsatellite sequences was selected for PCR primer development. Panels of di- and tri-nucleotide repeat markers were then tested in P. edulis germplasm accessions for validation. DNA polymorphism was detected in 74% of the markers (PIC = 0.16 to 0.77; number of alleles/locus = 2 to 7). A core panel of highly polymorphic markers (PIC = 0.46 to 0.77) was used to cross-amplify PCR products in 79 species of Passiflora (including P. edulis), belonging to four subgenera (Astrophea, Decaloba, Distephana and Passiflora). Approximately 71% of the marker/species combinations resulted in positive amplicons in all species tested. DNA polymorphism was detected in germplasm accessions of six closely related Passiflora species (P. edulis, P. alata, P. maliformis, P. nitida, P. quadrangularis and P. setacea) and the data were used for accession discrimination and species assignment. A database of P. edulis DNA sequences obtained by NGS technology was examined to identify microsatellite repeats in the sour passion fruit genome. Markers were evaluated using accessions of cultivated and wild Passiflora species. The new microsatellite markers detected high levels of DNA polymorphism in sour passion fruit and can potentially be used in genetic analysis of P. edulis and other Passiflora species.
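The PIC ranges quoted above summarize how informative each marker is. The abstract does not give the formula; assuming the widely used Botstein et al. (1980) definition, PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2, a minimal sketch follows. The allele frequencies in the example are hypothetical, not data from the study.

```python
from itertools import combinations


def polymorphism_information_content(allele_freqs):
    """PIC for one marker (Botstein et al. 1980 formula, assumed here).

    `allele_freqs` may be raw allele counts; they are normalised to sum to 1.
    """
    total = sum(allele_freqs)
    p = [f / total for f in allele_freqs]
    homozygosity = sum(pi ** 2 for pi in p)
    # Correction for heterozygous x same-heterozygous matings, which are uninformative.
    correction = sum(2 * pi ** 2 * pj ** 2 for pi, pj in combinations(p, 2))
    return 1 - homozygosity - correction


print(round(polymorphism_information_content([0.5, 0.5]), 4))  # 0.375 (biallelic maximum)
print(round(polymorphism_information_content([1, 1, 1, 1]), 4))  # 0.7031
```

A monomorphic marker (a single allele) gives PIC = 0, so values near the upper end of the reported range require several alleles at comparable frequencies.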
Genome structure of Rosa multiflora, a wild ancestor of cultivated roses

PubMed Central

Nakamura, Noriko; Hirakawa, Hideki; Sato, Shusei; Otagaki, Shungo; Matsumoto, Shogo; Tabata, Satoshi; Tanaka, Yoshikazu

2018-01-01

The draft genome sequence of a wild rose (Rosa multiflora Thunb.) was determined using Illumina MiSeq and HiSeq platforms. The total length of the scaffolds was 739,637,845 bp across 83,189 scaffolds, close to the 711 Mbp genome size estimated by k-mer analysis. The N50 length of the scaffolds was 90,830 bp, and the longest scaffold was 1,133,259 bp. The average GC content of the scaffolds was 38.9%. After gene prediction, 67,380 candidates exhibiting sequence homology to known genes and domains were extracted, which included complete and partial gene structures. This large number of genes for a diploid plant may reflect heterogeneity of the genome originating from self-incompatibility in R. multiflora. According to CEGMA analysis, 91.9% and 98.0% of the core eukaryotic genes were completely and partially conserved in the scaffolds, respectively. Genes presumably involved in flower color, scent and flowering were assigned. The results of this study will serve as a valuable resource for fundamental and applied research in the rose, including breeding and phylogenetic study of cultivated roses. PMID:29045613
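N50 is a standard assembly-contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal sketch of the computation; the lengths in the example are made up, not taken from the rose assembly.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the assembly."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from longest to shortest, accumulating length until half is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


print(n50([100, 80, 60, 40, 20]))  # 80
```

For comparison, the abstract's own figures give a mean scaffold length of about 8.9 kb (739,637,845 bp / 83,189 scaffolds), roughly a tenth of the reported N50 of 90,830 bp.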
VirSorter: mining viral signal from microbial genomic data.

PubMed

Roux, Simon; Enault, Francois; Hurwitz, Bonnie L; Sullivan, Matthew B

2015-01-01

Viruses of microbes impact all ecosystems where microbes drive key energy and substrate transformations including the oceans, humans and industrial fermenters. However, despite this recognized importance, our understanding of viral diversity and impacts remains limited by too few model systems and reference genomes. One way to fill these gaps in our knowledge of viral diversity is through the detection of viral signal in microbial genomic data. While multiple approaches have been developed and applied for the detection of prophages (viral genomes integrated in a microbial genome), new types of microbial genomic data are emerging that are more fragmented and larger scale, such as Single-cell Amplified Genomes (SAGs) of uncultivated organisms or genomic fragments assembled from metagenomic sequencing. Here, we present VirSorter, a tool designed to detect viral signal in these different types of microbial sequence data in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses. Performance testing shows that VirSorter's prophage prediction capability is comparable to that of available prophage predictors for complete genomes, but is superior in predicting viral sequences outside of a host genome (i.e., from extrachromosomal prophages, lytic infections, or partially assembled prophages). Furthermore, VirSorter outperforms existing tools for fragmented genomic and metagenomic datasets, and can identify viral signal in assembled sequence (contigs) as short as 3 kb, while providing near-perfect identification (>95% Recall and 100% Precision) on contigs of at least 10 kb. Because VirSorter scales to large datasets, it can also be used in "reverse" to more confidently identify viral sequence in viral metagenomes by sorting away cellular DNA whether derived from gene transfer agents, generalized transduction or contamination. 
Finally, VirSorter is made available through the iPlant Cyberinfrastructure that provides a web-based user interface interconnected with the required computing resources. VirSorter thus complements existing prophage prediction software to better leverage fragmented, SAG and metagenomic datasets in a way that will scale to modern sequencing. Given these features, VirSorter should enable the discovery of new viruses in microbial datasets, and further our understanding of uncultivated viral communities across diverse ecosystems.
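The recall and precision figures quoted above follow the usual definitions. Precision is the fraction of sequences predicted as viral that truly are viral, and recall is the fraction of truly viral sequences that are detected. A minimal sketch with hypothetical contig identifiers; this is not VirSorter code.

```python
def precision_recall(predicted_viral, true_viral):
    """Precision and recall of a set of predicted viral contig IDs."""
    predicted, truth = set(predicted_viral), set(true_viral)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical benchmark: 20 viral contigs, 19 detected, no false positives.
truth = {f"contig_{i}" for i in range(20)}
predicted = {f"contig_{i}" for i in range(19)}
print(precision_recall(predicted, truth))  # (1.0, 0.95)
```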
VirSorter: mining viral signal from microbial genomic data

PubMed Central

Roux, Simon; Enault, Francois; Hurwitz, Bonnie L.

2015-01-01

Viruses of microbes impact all ecosystems where microbes drive key energy and substrate transformations including the oceans, humans and industrial fermenters. However, despite this recognized importance, our understanding of viral diversity and impacts remains limited by too few model systems and reference genomes. One way to fill these gaps in our knowledge of viral diversity is through the detection of viral signal in microbial genomic data. While multiple approaches have been developed and applied for the detection of prophages (viral genomes integrated in a microbial genome), new types of microbial genomic data are emerging that are more fragmented and larger scale, such as Single-cell Amplified Genomes (SAGs) of uncultivated organisms or genomic fragments assembled from metagenomic sequencing. Here, we present VirSorter, a tool designed to detect viral signal in these different types of microbial sequence data in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses. Performance testing shows that VirSorter’s prophage prediction capability is comparable to that of available prophage predictors for complete genomes, but is superior in predicting viral sequences outside of a host genome (i.e., from extrachromosomal prophages, lytic infections, or partially assembled prophages). Furthermore, VirSorter outperforms existing tools for fragmented genomic and metagenomic datasets, and can identify viral signal in assembled sequence (contigs) as short as 3 kb, while providing near-perfect identification (>95% Recall and 100% Precision) on contigs of at least 10 kb. Because VirSorter scales to large datasets, it can also be used in “reverse” to more confidently identify viral sequence in viral metagenomes by sorting away cellular DNA whether derived from gene transfer agents, generalized transduction or contamination. 
Finally, VirSorter is made available through the iPlant Cyberinfrastructure that provides a web-based user interface interconnected with the required computing resources. VirSorter thus complements existing prophage prediction software to better leverage fragmented, SAG and metagenomic datasets in a way that will scale to modern sequencing. Given these features, VirSorter should enable the discovery of new viruses in microbial datasets, and further our understanding of uncultivated viral communities across diverse ecosystems. PMID:26038737
The human homolog of S. cerevisiae CDC27, CDC27 Hs, is encoded by a highly conserved intronless gene present in multiple copies in the human genome

DOE Office of Scientific and Technical Information (OSTI.GOV)

Devor, E.J.; Dill-Devor, R.M.

1994-09-01

We have obtained a number of unique sequences via PCR amplification of human genomic DNA using degenerate primers under low stringency (42 °C). One of these, an 853 bp product, has been identified as a partial genomic sequence of the human homolog of the S. cerevisiae CDC27 gene, CDC27Hs (GenBank No. U00001). This gene, reported by Turgendreich et al., is also designated EST00556 by Adams et al. We have undertaken a more detailed examination of our sequence, MCP34N, and have found that: 1. the genomic sequence is nearly identical to CDC27Hs over its entire 853 bp length; 2. an MCP34N-specific PCR assay of several non-human primate species reveals amplification products in chimpanzee and gorilla genomes having greater than 90% sequence identity with CDC27Hs; and 3. an MCP34N-specific PCR assay of the BIOS hybrid cell line panel gives a discordancy pattern suggesting multiple loci. Based upon these data, we present the following initial characterization: 1. the complete sequence identity of MCP34N with CDC27Hs indicates that the latter is encoded by an intronless gene; 2. CDC27Hs is highly conserved among higher primates; and 3. CDC27Hs is present in multiple copies in the human genome. These characteristics, taken together with those initially reported for CDC27Hs, suggest that this is an old gene that carries out an important but, as yet, unknown function in the human brain.
Genome-Scale Phylogeny of the Alphavirus Genus Suggests a Marine Origin

PubMed Central

Palacios, G.; Tesh, R. B.; Savji, N.; Guzman, H.; Sherman, M.; Weaver, S. C.; Lipkin, W. I.

2012-01-01

The genus Alphavirus comprises a diverse group of viruses, including some that cause severe disease. Using full-length sequences of all known alphaviruses, we produced a robust and comprehensive phylogeny of the Alphavirus genus, presenting a more complete evolutionary history of these viruses compared to previous studies based on partial sequences. Our phylogeny suggests the origin of the alphaviruses occurred in the southern oceans and spread equally through the Old and New Worlds. Since lice appear to be involved in aquatic alphavirus transmission, it is possible that we are missing a louse-borne branch of the alphaviruses. Complete genome sequencing of all members of the genus also revealed conserved residues forming the structural basis of the E1 and E2 protein dimers. PMID:22190718
A Novel Locomotion-based Validation Assay for Candidate Drugs Using Drosophila DYT1 Disease Model

DTIC Science & Technology

2013-11-01

the genome using the same parental fly line, minimizing the effect of surrounding sequences and genetic variations on the ...locomotion and GTP cyclohydrolase protein levels; (3) supplementation of dopamine can partially rescue the locomotion defects of Drosophila larvae...’-GCGAACAACCAAAAAATCATTGAGATAATAAACTCCTCCATTAG-3’) to make dtorsin cDNA that lacks GAC (D307) (Fig. 1) respectively. After confirming mutated sequences, the insert was again
Dobrava Virus Carried by the Yellow-Necked Field Mouse Apodemus flavicollis, Causing Hemorrhagic Fever with Renal Syndrome in Romania

PubMed Central

Panculescu-Gatej, Raluca Ioana; Sirbu, Anca; Dinu, Sorin; Waldstrom, Maria; Heyman, Paul; Murariu, Dimitru; Petrescu, Angela; Szmal, Camelia; Oprisan, Gabriela; Lundkvist, Åke

2014-01-01

Hemorrhagic fever with renal syndrome (HFRS) has been confirmed by serological methods during recent years in Romania. In the present study, focus-reduction neutralization tests (FRNT) confirmed Dobrava hantavirus (DOBV) as the causative agent in some HFRS cases, but could not distinguish between DOBV and Saaremaa virus (SAAV) infections in other cases. DOBV was detected by a DOBV-specific TaqMan assay in sera of nine patients out of 22 tested. Partial sequences of the M genomic segment of DOBV were obtained from sera of three patients and revealed the circulation of two DOBV lineages in Romania. Investigation of rodents trapped in Romania found three DOBV-positive Apodemus flavicollis out of 83 rodents tested. Two different DOBV lineages were also detected in A. flavicollis as determined from partial sequences of the M and S genomic segments. Sequences of DOBV in A. flavicollis were either identical or closely related to the sequences obtained from the HFRS patients. The DOBV strains circulating in Romania clustered in two monophyletic groups, together with strains from Slovenia and the north of Greece. This is the first evidence for the circulation of DOBV in wild rodents and for a DOBV etiology of HFRS in Romania. PMID:24746107
Dobrava virus carried by the yellow-necked field mouse Apodemus flavicollis, causing hemorrhagic fever with renal syndrome in Romania.

PubMed

Panculescu-Gatej, Raluca Ioana; Sirbu, Anca; Dinu, Sorin; Waldstrom, Maria; Heyman, Paul; Murariu, Dimitru; Petrescu, Angela; Szmal, Camelia; Oprisan, Gabriela; Lundkvist, Ake; Ceianu, Cornelia S

2014-05-01

Hemorrhagic fever with renal syndrome (HFRS) has been confirmed by serological methods during recent years in Romania. In the present study, focus-reduction neutralization tests (FRNT) confirmed Dobrava hantavirus (DOBV) as the causative agent in some HFRS cases, but could not distinguish between DOBV and Saaremaa virus (SAAV) infections in other cases. DOBV was detected by a DOBV-specific TaqMan assay in sera of nine patients out of 22 tested. Partial sequences of the M genomic segment of DOBV were obtained from sera of three patients and revealed the circulation of two DOBV lineages in Romania. Investigation of rodents trapped in Romania found three DOBV-positive Apodemus flavicollis out of 83 rodents tested. Two different DOBV lineages were also detected in A. flavicollis as determined from partial sequences of the M and S genomic segments. Sequences of DOBV in A. flavicollis were either identical or closely related to the sequences obtained from the HFRS patients. The DOBV strains circulating in Romania clustered in two monophyletic groups, together with strains from Slovenia and the north of Greece. This is the first evidence for the circulation of DOBV in wild rodents and for a DOBV etiology of HFRS in Romania.
'Cold shock' increases the frequency of homology directed repair gene editing in induced pluripotent stem cells.

PubMed

Guo, Q; Mintier, G; Ma-Edmonds, M; Storton, D; Wang, X; Xiao, X; Kienzle, B; Zhao, D; Feder, John N

2018-02-01

Using CRISPR/Cas9 delivered as an RNA modality in conjunction with a lipid specifically formulated for large RNA molecules, we demonstrate that homology directed repair (HDR) rates of 20-40% can be achieved in induced pluripotent stem cells (iPSC). Furthermore, low HDR rates (1-20%) can be enhanced two- to ten-fold in both iPSCs and HEK293 cells by 'cold shocking' cells at 32 °C for 24-48 hours following transfection. This method can also increase the proportion of loci that have undergone complete sequence conversion across the donor sequence, or 'perfect HDR', as opposed to partial sequence conversion where nucleotides more distal to the CRISPR cut site are less efficiently incorporated ('partial HDR'). We demonstrate that the structure of the single-stranded DNA oligo donor can influence the fidelity of HDR, with oligos symmetric with respect to the CRISPR cleavage site and complementary to the target strand being more efficient at directing 'perfect HDR' compared to asymmetric non-target strand complementary oligos. Our protocol represents an efficient method for making CRISPR-mediated, specific DNA sequence changes within the genome that will facilitate the rapid generation of genetic models of human disease in iPSCs as well as other genome engineered cell lines.

Draft genome sequence of Micrococcus luteus strain O'Kane implicates metabolic versatility and the potential to degrade polyhydroxybutyrates.

PubMed

Hanafy, Radwa A; Couger, M B; Baker, Kristina; Murphy, Chelsea; O'Kane, Shannon D; Budd, Connie; French, Donald P; Hoff, Wouter D; Youssef, Noha

2016-09-01

Micrococcus luteus is a predominant member of the skin microbiome. Here we report the genomic analysis of Micrococcus luteus strain O'Kane, which was isolated from an elevator. The partial genome assembly of Micrococcus luteus strain O'Kane is 2.5 Mb with 2256 protein-coding genes and 62 RNA genes. Genomic analysis revealed metabolic versatility with genes involved in the metabolism and transport of glucose, galactose, fructose, mannose, alanine, aspartate, asparagine, glutamate, glutamine, glycine, serine, cysteine, methionine, arginine, proline, histidine, phenylalanine, and fatty acids. Genomic comparison to other M. luteus representatives identified the potential to degrade polyhydroxybutyrates, as well as several antibiotic resistance genes absent from other genomes.
Genome-wide patterns of copy number variation in the diversified chicken genomes using next-generation sequencing.

PubMed

Yi, Guoqiang; Qu, Lujiang; Liu, Jianfeng; Yan, Yiyuan; Xu, Guiyun; Yang, Ning

2014-11-07

Copy number variation (CNV) is important and widespread in the genome, and is a major cause of disease and phenotypic diversity. Herein, we performed a genome-wide CNV analysis in 12 diversified chicken genomes based on whole genome sequencing. A total of 8,840 CNV regions (CNVRs) covering 98.2 Mb and representing 9.4% of the chicken genome were identified, ranging in size from 1.1 to 268.8 kb with an average of 11.1 kb. Sequencing-based predictions were confirmed at a high validation rate by two independent approaches, array comparative genomic hybridization (aCGH) and quantitative PCR (qPCR). The Pearson's correlation coefficients between sequencing and aCGH results ranged from 0.435 to 0.755, and qPCR experiments revealed a positive validation rate of 91.71% and a false negative rate of 22.43%. In total, 2,214 (25.0%) predicted CNVRs span 2,216 (36.4%) RefSeq genes associated with specific biological functions. In addition to two previously reported copy number variable genes, EDN3 and PRLR, we also found several promising candidate genes that may contribute to phenotypic variation. Two genes, FZD6 and LIMS1, related to disease susceptibility/resistance are covered by CNVRs. The highly duplicated SOCS2 may lead to higher bone mineral density. Entire or partial duplication of some genes like POPDC3 may have great economic importance in poultry breeding. Our results based on extensive genetic diversity provide a more refined chicken CNV map and genome-wide gene copy number estimates, and warrant future CNV association studies for important traits in chickens.
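The summary statistics above can be cross-checked with simple arithmetic. This sketch uses only numbers quoted in the abstract; the implied genome size is a back-calculation for illustration, not a figure the authors report.

```python
n_cnvr = 8_840           # CNV regions identified
mean_cnvr_kb = 11.1      # average CNVR size (kb)
covered_mb = 98.2        # total CNVR length (Mb)
genome_fraction = 0.094  # reported as 9.4% of the chicken genome

# Number of regions x mean size should roughly reproduce the total covered length.
total_from_mean_mb = n_cnvr * mean_cnvr_kb / 1000
# Back-calculated genome size implied by 98.2 Mb being 9.4% of it.
implied_genome_gb = covered_mb / genome_fraction / 1000
# Share of CNVRs spanning RefSeq genes (reported as 25.0%).
gene_overlap_pct = 2_214 / n_cnvr * 100

print(f"{total_from_mean_mb:.1f} Mb")  # 98.1 Mb
print(f"{implied_genome_gb:.2f} Gb")   # 1.04 Gb
print(f"{gene_overlap_pct:.1f}%")      # 25.0%
```

The small gap between 98.1 Mb and 98.2 Mb is consistent with the 11.1 kb mean having been rounded.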
Combining partially ranked data in plant breeding and biology: II. Analysis with Rasch model.

USDA-ARS's Scientific Manuscript database

Many years of breeding experiments, germplasm screening, and molecular biologic experimentation have generated volumes of sequence, genotype, and phenotype information that have been stored in public data repositories. These resources afford genetic and genomic researchers the opportunity to handle ...
A Mitochondrial Genome of Rhyparochromidae (Hemiptera: Heteroptera) and a Comparative Analysis of Related Mitochondrial Genomes.

PubMed

Li, Teng; Yang, Jie; Li, Yinwan; Cui, Ying; Xie, Qiang; Bu, Wenjun; Hillis, David M

2016-10-19

The Rhyparochromidae, the largest family of Lygaeoidea, encompasses more than 1,850 described species, but no mitochondrial genome has been sequenced to date. Here we describe the first mitochondrial genome for Rhyparochromidae: a complete mitochondrial genome of Panaorus albomaculatus (Scott, 1874). This mitochondrial genome comprises 16,345 bp and contains the expected 37 genes and control region. The majority of the control region is made up of a large tandem-repeat region, which has a novel pattern not previously observed in other insects. The tandem-repeat region of P. albomaculatus consists of 53 tandem duplications (including one partial repeat), which is the largest number of tandem repeats among all the known insect mitochondrial genomes. Slipped-strand mispairing during replication is likely to have generated this novel pattern of tandem repeats. Comparative analysis of tRNA gene families in sequenced Pentatomomorpha and Lygaeoidea species shows that the pattern of nucleotide conservation is markedly higher on the J-strand. Phylogenetic reconstruction based on mitochondrial genomes suggests that Rhyparochromidae is not the sister group to all the remaining Lygaeoidea, and supports the monophyly of Lygaeoidea.
New genomic resources for switchgrass: a BAC library and comparative analysis of homoeologous genomic regions harboring bioenergy traits

PubMed Central

2011-01-01

Background: Switchgrass, a C4 species and a warm-season grass native to the prairies of North America, has been targeted for development into an herbaceous biomass fuel crop. Genetic improvement of switchgrass feedstock traits through marker-assisted breeding and biotechnology approaches calls for the development of genomic tools. Establishment of integrated physical and genetic maps for switchgrass will accelerate mapping of value-added traits useful to breeding programs and the isolation of important target genes by map-based cloning. The reported polyploidy series in switchgrass ranges from diploid (2X = 18) to duodecaploid (12X = 108). As in other large, repeat-rich plant genomes, this genomic complexity will hinder whole genome sequencing efforts. An extensive physical map providing enough information to resolve the homoeologous genomes would provide the necessary framework for accurate assembly of the switchgrass genome. Results: A switchgrass BAC library constructed by partial digestion of nuclear DNA with EcoRI contains 147,456 clones covering the effective genome approximately 10 times based on a genome size of 3.2 Gigabases (~1.6 Gb effective). Restriction digestion and PFGE analysis of 234 randomly chosen BACs indicated that 95% of the clones contained inserts, ranging from 60 to 180 kb with an average of 120 kb. Comparative sequence analysis of two homoeologous genomic regions harboring orthologs of the rice OsBRI1 locus, a low-copy gene encoding a putative protein kinase and associated with biomass, revealed that orthologous clones from homoeologous chromosomes can be unambiguously distinguished from each other and correctly assembled to respective fingerprint contigs. Thus, the data obtained not only provide genomic resources for further analysis of the switchgrass genome, but also improve efforts for an accurate genome sequencing strategy. 
Conclusions The construction of the first switchgrass BAC library and comparative analysis of homoeologous harboring OsBRI1 orthologs present a glimpse into the switchgrass genome structure and complexity. Data obtained demonstrate the feasibility of using HICF fingerprinting to resolve the homoeologous chromosomes of the two distinct genomes in switchgrass, providing a robust and accurate BAC-based physical platform for this species. The genomic resources and sequence data generated will lay the foundation for deciphering the switchgrass genome and lead the way for an accurate genome sequencing strategy. PMID:21767393
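The coverage figure quoted above comes from simple arithmetic. Total cloned DNA (clone count × mean insert size) is divided by the effective genome size. The Python sketch below reproduces that calculation from the numbers in the abstract. It also adds the classical Clarke-Carbon (Poisson) estimate of the probability that a given locus is represented in the library. Two choices here are assumptions made for illustration, because the abstract does not state exactly how its coverage was calculated: applying the 95% insert rate, and using the approximation 1 - e^(-c). The function names are hypothetical.

```python
import math


def library_coverage(n_clones: int, mean_insert_kb: float, genome_kb: float,
                     insert_fraction: float = 1.0) -> float:
    """Genome equivalents: total cloned DNA divided by (effective) genome size."""
    return n_clones * insert_fraction * mean_insert_kb / genome_kb


def locus_representation(coverage: float) -> float:
    """Clarke-Carbon (Poisson approximation) probability that a given locus
    lies in at least one clone of a library with this coverage."""
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    # Figures from the abstract; applying the 95% insert rate is an assumption.
    cov = library_coverage(147_456, mean_insert_kb=120.0, genome_kb=1.6e6,
                           insert_fraction=0.95)
    print(f"coverage ~ {cov:.1f}x")  # prints "coverage ~ 10.5x"
    print(f"P(locus present) ~ {locus_representation(cov):.5f}")
```

Without the 95% correction, the same arithmetic gives about 11x. Either way, the result is consistent with the "approximately 10 times" stated in the abstract.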
New genomic resources for switchgrass: a BAC library and comparative analysis of homoeologous genomic regions harboring bioenergy traits.

PubMed

Saski, Christopher A; Li, Zhigang; Feltus, Frank A; Luo, Hong

2011-07-18

Switchgrass, a C4 species and a warm-season grass native to the prairies of North America, has been targeted for development into an herbaceous biomass fuel crop. Genetic improvement of switchgrass feedstock traits through marker-assisted breeding and biotechnology approaches calls for the development of genomic tools. Establishment of integrated physical and genetic maps for switchgrass will accelerate the mapping of value-added traits useful to breeding programs and the isolation of important target genes by map-based cloning. The reported polyploid series in switchgrass ranges from diploid (2X = 18) to duodecaploid (12X = 108). As in other large, repeat-rich plant genomes, this genomic complexity will hinder whole-genome sequencing efforts. An extensive physical map providing enough information to resolve the homoeologous genomes would provide the necessary framework for accurate assembly of the switchgrass genome. A switchgrass BAC library constructed by partial digestion of nuclear DNA with EcoRI contains 147,456 clones covering the effective genome approximately 10 times, based on a genome size of 3.2 gigabases (~1.6 Gb effective). Restriction digestion and PFGE analysis of 234 randomly chosen BACs indicated that 95% of the clones contained inserts, ranging from 60 to 180 kb with an average of 120 kb. Comparative sequence analysis of two homoeologous genomic regions harboring orthologs of the rice OsBRI1 locus, a low-copy gene encoding a putative protein kinase and associated with biomass, revealed that orthologous clones from homoeologous chromosomes can be unambiguously distinguished from each other and correctly assembled into their respective fingerprint contigs. Thus, the data obtained not only provide genomic resources for further analysis of the switchgrass genome but also support efforts toward an accurate genome sequencing strategy. 
The construction of the first switchgrass BAC library and comparative analysis of homoeologous regions harboring OsBRI1 orthologs offer a glimpse into the structure and complexity of the switchgrass genome. The data obtained demonstrate the feasibility of using HICF fingerprinting to resolve the homoeologous chromosomes of the two distinct genomes in switchgrass, providing a robust and accurate BAC-based physical platform for this species. The genomic resources and sequence data generated will lay the foundation for deciphering the switchgrass genome and lead the way to an accurate genome sequencing strategy.
Identification of copy number variants in whole-genome data using Reference Coverage Profiles

PubMed Central

Glusman, Gustavo; Severson, Alissa; Dhankani, Varsha; Robinson, Max; Farrah, Terry; Mauldin, Denise E.; Stittrich, Anna B.; Ament, Seth A.; Roach, Jared C.; Brunkow, Mary E.; Bodian, Dale L.; Vockley, Joseph G.; Shmulevich, Ilya; Niederhuber, John E.; Hood, Leroy

2015-01-01

The identification of DNA copy numbers from short-read sequencing data remains a challenge for both technical and algorithmic reasons. The raw data for these analyses are measured in tens to hundreds of gigabytes per genome; transmitting, storing, and analyzing such large files is cumbersome, particularly for methods that analyze several samples simultaneously. We developed a very efficient representation of depth of coverage (150–1000× compression) that enables such analyses. Current methods for analyzing variants in whole-genome sequencing (WGS) data frequently miss copy number variants (CNVs), particularly hemizygous deletions in the 1–100 kb range. To fill this gap, we developed a method to identify CNVs in individual genomes, based on comparison to joint profiles pre-computed from a large set of genomes. We analyzed depth of coverage in over 6000 high-quality (>40×) genomes. The depth of coverage has strong sequence-specific fluctuations only partially explained by global parameters like %GC. To account for these fluctuations, we constructed multi-genome profiles representing the observed or inferred diploid depth of coverage at each position along the genome. These Reference Coverage Profiles (RCPs) take into account the diverse technologies and pipeline versions used. Normalization of the scaled coverage to the RCP, followed by hidden Markov model (HMM) segmentation, enables efficient detection of CNVs and large deletions in individual genomes. Use of pre-computed multi-genome coverage profiles improves our ability to analyze each individual genome. We make available RCPs and tools for performing these analyses on personal genomes. We expect the increased sensitivity and specificity for individual genome analysis to be critical for achieving clinical-grade genome interpretation. PMID:25741365
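The normalization-then-segmentation step can be illustrated with a toy model. In the sketch below, each bin's depth is divided by a reference profile and scaled to a diploid copy estimate. A three-state hidden Markov model (copy number 1, 2 or 3) is then decoded with the Viterbi algorithm. The Gaussian emissions, the standard deviation and the transition probability are illustrative assumptions, not the parameters of the published RCP tools. The helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def copy_ratio(sample: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Scale each bin to a diploid copy estimate: 2 * sample depth / reference depth."""
    return [2.0 * s / r for s, r in zip(sample, reference)]


def viterbi_segment(obs: Sequence[float], states: Tuple[int, ...] = (1, 2, 3),
                    sd: float = 0.3, stay: float = 0.99) -> List[int]:
    """Most likely copy-number path under a simple Gaussian-emission HMM."""
    n = len(states)
    log_stay = math.log(stay)
    log_move = math.log((1.0 - stay) / (n - 1))

    def log_emit(x: float, cn: int) -> float:
        # Gaussian log-density, dropping the constant term shared by all states
        return -((x - cn) ** 2) / (2.0 * sd * sd)

    def log_trans(i: int, j: int) -> float:
        return log_stay if i == j else log_move

    # uniform start probabilities
    score = [log_emit(obs[0], s) - math.log(n) for s in states]
    back: List[List[int]] = []
    for x in obs[1:]:
        new_score: List[float] = []
        pointers: List[int] = []
        for j, s in enumerate(states):
            best_i = max(range(n), key=lambda i: score[i] + log_trans(i, j))
            new_score.append(score[best_i] + log_trans(best_i, j) + log_emit(x, s))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # trace back from the best final state
    j = max(range(n), key=lambda i: score[i])
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    return [states[i] for i in reversed(path)]


if __name__ == "__main__":
    reference = [30.0] * 10  # hypothetical reference depth per bin
    sample = [30, 31, 29, 15, 14, 16, 15, 30, 29, 31]
    print(viterbi_segment(copy_ratio(sample, reference)))  # [2, 2, 2, 1, 1, 1, 1, 2, 2, 2]
```

In this example the reference depth is a constant 30x and four bins drop to about half depth. The decoded path marks those bins as copy number 1, which is the pattern expected for a hemizygous deletion. The high self-transition probability keeps isolated noisy bins from being called as separate events.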
Efficient isolation method for high-quality genomic DNA from cicada exuviae.

PubMed

Nguyen, Hoa Quynh; Kim, Ye Inn; Borzée, Amaël; Jang, Yikweon

2017-10-01

In recent years, animal ethics issues have led researchers to explore nondestructive methods of obtaining material for genetic studies. Cicada exuviae are one such material: they are the cast skins that individuals leave behind after molting, and they are easily collected. In this study, we aimed to identify the most efficient extraction method for obtaining DNA of high quantity and quality from cicada exuviae. We compared the relative DNA yield and purity of six extraction protocols, including both manual protocols and commercial kits, applied to four different exoskeleton parts. Furthermore, amplification and sequencing of the genomic DNA were evaluated in terms of whether products of the expected size were obtained and successfully sequenced. Both the choice of protocol and the exuvia part significantly affected DNA yield and purity. Only samples extracted using the PowerSoil DNA Isolation kit generated gel bands of the expected size as well as successful sequencing results. The failure of the other protocols could be explained partly by the low DNA yield from cicada exuviae and partly by contamination with humic acids from the soil in which cicada nymphs live before emergence, as shown by spectroscopic measurements. Genomic DNA extracted from cicada exuviae could provide valuable information for species identification, allowing the investigation of genetic diversity across consecutive broods or of spatiotemporal variation among populations. We therefore hope to provide a simple method for acquiring pure genomic DNA applicable to multiple research purposes.
Comparing de novo genome assembly: the long and short of it.

PubMed

Narzisi, Giuseppe; Mishra, Bud

2011-04-29

Recent advances in DNA sequencing technology and their focal role in Genome Wide Association Studies (GWAS) have rekindled interest in the whole-genome sequence assembly (WGSA) problem, thereby inundating the field with a plethora of new formalizations, algorithms, heuristics and implementations. And yet, scant attention has been paid to comparative assessments of these assemblers' quality and accuracy. No commonly accepted, standardized method for comparison exists yet. Even worse, widely used metrics for comparing assembled sequences emphasize only size, poorly capturing contig quality and accuracy. This paper addresses these concerns. It highlights common anomalies in assembly accuracy through a rigorous study of several assemblers, compared under both standard metrics (N50, coverage, contig sizes, etc.) and a more comprehensive metric introduced here, Feature-Response Curves (FRC). FRC transparently captures the trade-off between contig quality and contig size. For this purpose, most of the major publicly available sequence assemblers are compared, covering both low-coverage long-read (Sanger) and high-coverage short-read (Illumina) technologies. These assemblers are applied to microbial (Escherichia coli, Brucella, Wolbachia, Staphylococcus, Helicobacter) and partial human genome sequences (Chr. Y), using sequence reads of various lengths, coverages and accuracies, with and without mate-pairs. It is hoped that, based on these evaluations, computational biologists will identify innovative sequence assembly paradigms, bioinformaticists will determine promising approaches for developing "next-generation" assemblers, and biotechnologists will formulate more meaningful design desiderata for sequencing technology platforms. A new software tool for computing the FRC metric has been developed and is available through the AMOS open-source consortium.
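N50, the first of the standard metrics listed above, is the largest contig length L such that contigs of length L or longer together contain at least half of the assembled bases. A minimal Python sketch (function name hypothetical) shows why it is purely a size statistic. The calculation never checks whether a contig is correctly assembled, which is the gap the abstract's FRC metric is meant to address.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50: the largest length L such that contigs of length >= L
    together hold at least half of all assembled bases."""
    total = sum(contig_lengths)
    running = 0
    # walk from the longest contig down until half of the bases are accounted for
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("cannot compute N50 of an empty assembly")


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```

The same function would report an identical N50 for an assembly whose longest contigs were chimeric. This is exactly the size-only blind spot the abstract criticizes.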
Analysis of the complete genome of the first Irkut virus isolate from China: comparison across the Lyssavirus genus.

PubMed

Liu, Ye; Li, Nan; Zhang, Shoufeng; Zhang, Fei; Lian, Hai; Wang, Ying; Zhang, Jinxia; Hu, Rongliang

2013-12-01

The genome of Irkut virus isolate IRKV-THChina12, the first non-rabies lyssavirus from China (of bat origin), has been completely sequenced. In general, the coding and non-coding regions of this viral genome are similar to those of other lyssaviruses. However, alignment of the deduced amino acid sequences of the structural proteins of IRKV-THChina12 with those of other lyssavirus representatives revealed significant variability between viral species. The nucleoprotein and matrix protein were the most conserved, followed by the large protein, glycoprotein and phosphoprotein. Differences in the antigenic sites of the glycoprotein may mean that the available rabies biologics provide only partial protection against Irkut virus, which is of particular concern for pre- and post-exposure rabies prophylaxis. Copyright © 2013 Elsevier Inc. All rights reserved.
A comprehensive resource of genomic, epigenomic and transcriptomic sequencing data for the black truffle Tuber melanosporum

PubMed Central

2014-01-01

Background Tuber melanosporum, also known in the gastronomic community as “truffle”, features one of the largest fungal genomes (125 Mb) with an exceptionally high transposable element (TE) and repetitive DNA content (>58%). The main purpose of DNA methylation in fungi is TE silencing. As obligate outcrossing organisms, truffles are bound to a sexual mode of propagation, which together with TEs is thought to represent a major force driving the evolution of DNA methylation. Thus, it was of interest to examine whether and how T. melanosporum exploits DNA methylation to maintain genome integrity. Findings We performed whole-genome DNA bisulfite sequencing and mRNA sequencing on different developmental stages of T. melanosporum; namely, fruitbody (“truffle”), free-living mycelium and ectomycorrhiza. The data revealed a high rate of cytosine methylation (>44%), selectively targeting TEs rather than genes, with a strong preference for CpG sites. Whole-genome DNA sequencing uncovered multiple TE-enriched, copy number variant regions bearing a significant fraction of hypomethylated and expressed TEs, almost exclusively in free-living mycelium propagated in vitro. Treatment of mycelia with 5-azacytidine partially reduced DNA methylation and increased TE transcription. Our transcriptome assembly also resulted in the identification of a set of novel transcripts from 614 genes. Conclusions The datasets presented here provide valuable and comprehensive (epi)genomic information that can be of interest for evolutionary genomics studies of multicellular (filamentous) fungi, in particular Ascomycetes belonging to the subphylum Pezizomycotina. Evidence derived from comparative methylome and transcriptome analyses indicates that a non-exhaustive and partly reversible methylation process operates in truffles. PMID:25392735
A comprehensive resource of genomic, epigenomic and transcriptomic sequencing data for the black truffle Tuber melanosporum.

PubMed

Chen, Pao-Yang; Montanini, Barbara; Liao, Wen-Wei; Morselli, Marco; Jaroszewicz, Artur; Lopez, David; Ottonello, Simone; Pellegrini, Matteo

2014-01-01

Tuber melanosporum, also known in the gastronomic community as "truffle", features one of the largest fungal genomes (125 Mb) with an exceptionally high transposable element (TE) and repetitive DNA content (>58%). The main purpose of DNA methylation in fungi is TE silencing. As obligate outcrossing organisms, truffles are bound to a sexual mode of propagation, which together with TEs is thought to represent a major force driving the evolution of DNA methylation. Thus, it was of interest to examine whether and how T. melanosporum exploits DNA methylation to maintain genome integrity. We performed whole-genome DNA bisulfite sequencing and mRNA sequencing on different developmental stages of T. melanosporum; namely, fruitbody ("truffle"), free-living mycelium and ectomycorrhiza. The data revealed a high rate of cytosine methylation (>44%), selectively targeting TEs rather than genes, with a strong preference for CpG sites. Whole-genome DNA sequencing uncovered multiple TE-enriched, copy number variant regions bearing a significant fraction of hypomethylated and expressed TEs, almost exclusively in free-living mycelium propagated in vitro. Treatment of mycelia with 5-azacytidine partially reduced DNA methylation and increased TE transcription. Our transcriptome assembly also resulted in the identification of a set of novel transcripts from 614 genes. The datasets presented here provide valuable and comprehensive (epi)genomic information that can be of interest for evolutionary genomics studies of multicellular (filamentous) fungi, in particular Ascomycetes belonging to the subphylum Pezizomycotina. Evidence derived from comparative methylome and transcriptome analyses indicates that a non-exhaustive and partly reversible methylation process operates in truffles.
Long-read sequence assembly of the firefly Pyrocoelia pectoralis genome

PubMed Central

Fu, Xinhua; Li, Jingjing; Tian, Yu; Quan, Weipeng; Zhang, Shu; Liu, Qian; Liang, Fan; Zhu, Xinlei; Zhang, Liangsheng

2017-01-01

Background Fireflies are a family of insects within the beetle order Coleoptera, or winged beetles, and they are among the best-known and best-loved insects because of their bioluminescence. However, fireflies are in danger of extinction because of the massive destruction of their habitat. In order to improve the understanding of fireflies and protect them effectively, we sequenced the whole genome of the terrestrial firefly Pyrocoelia pectoralis. Findings Here, we developed a highly reliable genome resource for the terrestrial firefly Pyrocoelia pectoralis (E. Oliv., 1883; Coleoptera: Lampyridae) using single molecule real time (SMRT) sequencing on the PacBio Sequel platform. In total, 57.8 Gb of long reads were generated and assembled into a 760.4-Mb genome, which is close to the estimated genome size and captured 98.7% complete and 0.7% partial insect Benchmarking Universal Single-Copy Orthologs. The k-mer analysis showed that this genome is highly heterozygous. However, our long-read assembly is highly contiguous, with a contig N50 length of 3.04 Mb and a longest contig of 13.69 Mb. Furthermore, 135 589 SSRs and 341 Mb of repeat sequences were detected. A total of 23 092 genes were predicted; 88.44% of genes were annotated with one or more related functions. Conclusions We assembled a high-quality firefly genome, which will not only provide insights into the conservation and biodiversity of fireflies, but also provide a wealth of information for studying the mechanisms of their sexual communication, bioluminescence, and evolution. PMID:29186486
DNA capture and next-generation sequencing can recover whole mitochondrial genomes from highly degraded samples for human identification

PubMed Central

2013-01-01

Background Mitochondrial DNA (mtDNA) typing can be a useful aid for identifying people from compromised samples when nuclear DNA is too damaged, degraded or below detection thresholds for routine short tandem repeat (STR)-based analysis. Standard mtDNA typing, focused on PCR amplicon sequencing of the control region (HVS I and HVS II), is limited by the resolving power of this short sequence, which misses up to 70% of the variation present in the mtDNA genome. Methods We used in-solution hybridisation-based DNA capture (using DNA capture probes prepared from modern human mtDNA) to recover mtDNA from post-mortem human remains in which the majority of DNA is both highly fragmented (<100 base pairs in length) and chemically damaged. The method ‘immortalises’ the finite quantities of DNA in valuable extracts as DNA libraries, which is followed by the targeted enrichment of endogenous mtDNA sequences and characterisation by next-generation sequencing (NGS). Results We sequenced whole mitochondrial genomes for human identification from samples where standard nuclear STR typing produced only partial profiles or demonstrably failed and/or where standard mtDNA hypervariable region sequences lacked resolving power. Multiple rounds of enrichment can substantially improve coverage and sequencing depth of mtDNA genomes from highly degraded samples. The application of this method has led to the reliable mitochondrial sequencing of human skeletal remains from unidentified World War Two (WWII) casualties approximately 70 years old and from archaeological remains (up to 2,500 years old). Conclusions This approach has potential applications in forensic science, historical human identification cases, archived medical samples, kinship analysis and population studies. In particular the methodology can be applied to any case, involving human or non-human species, where whole mitochondrial genome sequences are required to provide the highest level of maternal lineage discrimination. 
Multiple rounds of in-solution hybridisation-based DNA capture can retrieve whole mitochondrial genome sequences from even the most challenging samples. PMID:24289217
Megabase sequencing of human genome by ordered-shotgun-sequencing (OSS) strategy

NASA Astrophysics Data System (ADS)

Chen, Ellson Y.

1997-05-01

So far we have used the OSS strategy to sequence over 2 megabases of DNA in large-insert clones from regions of the human X chromosome with different characteristic levels of GC content. The method starts by randomly fragmenting a BAC, YAC or PAC into 8-12 kb pieces and subcloning these into lambda phage. The insert ends of these clones are sequenced and overlapped to create a partial map. Complete sequencing is then done on a minimal tiling path of selected subclones, recursively focusing on those at the edges of contigs to facilitate the merging of clones across the entire target. To reduce manual labor, PCR processes have been adapted to prepare sequencing templates throughout the entire operation. The streamlined process can thus lend itself to further automation. The OSS approach is suitable for large-scale genomic sequencing, providing considerable flexibility in the choice of subclones or regions for more or less intensive sequencing. For example, subclones containing contaminating host cell DNA or cloning vector can be recognized and ignored with minimal sequencing effort; regions overlapping a neighboring clone already sequenced need not be redone; and segments containing tandem repeats or long repetitive sequences can be spotted early on and targeted for additional attention.
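At its core, choosing a minimal tiling path from a partial end-sequence map is an interval-cover problem. One standard solution is the greedy rule sketched below. Starting from the left end of the target, it repeatedly picks the subclone that overlaps the path built so far and extends furthest to the right. The sketch rests on simplifying assumptions: subclone positions are already known as (start, end) coordinates on the target, and a fixed minimum overlap is required. It is not presented as the OSS pipeline's actual selection procedure, and the function name and example coordinates are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (start, end) of a subclone on the target, in bp


def minimal_tiling_path(subclones: List[Interval], target_len: int,
                        min_overlap: int = 0) -> List[Interval]:
    """Greedy interval cover: fewest subclones spanning [0, target_len],
    each new subclone overlapping the path so far by >= min_overlap bp."""
    clones = sorted(subclones)
    path: List[Interval] = []
    covered = 0  # right-most position covered by the path so far
    i = 0
    while covered < target_len:
        limit = covered - min_overlap if path else 0
        best: Optional[Interval] = None
        # scan every clone that starts early enough to join the path,
        # keeping the one that reaches furthest to the right
        while i < len(clones) and clones[i][0] <= limit:
            if best is None or clones[i][1] > best[1]:
                best = clones[i]
            i += 1
        if best is None or best[1] <= covered:
            raise ValueError(f"no subclone extends the path beyond position {covered}")
        path.append(best)
        covered = best[1]
    return path


if __name__ == "__main__":
    subclones = [(0, 9000), (0, 6000), (5000, 17000), (8000, 15000),
                 (16000, 26000), (14000, 24000), (22000, 30000)]
    print(minimal_tiling_path(subclones, 30_000, min_overlap=500))
    # [(0, 9000), (5000, 17000), (16000, 26000), (22000, 30000)]
```

A raised error marks a gap that no subclone bridges. In a pipeline like the one described in the abstract, such gaps would be the contig edges that call for further attention.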
Genomic clones for human cholinesterase

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kott, M.; Venta, P.J.; Larsen, J.

1987-05-01

A human genomic library was prepared from the peripheral white blood cells of a single donor by inserting an MboI partial digest into the BamHI polylinker sites of EMBL3. This library was screened using an oligolabeled human cholinesterase cDNA probe over 700 bp long, obtained from a human basal ganglia cDNA library. Of approximately 2 million clones screened under high-stringency conditions, several positive clones were identified; two have been plaque purified. One of these clones has been partially mapped using restriction enzymes known to cut within the coding region of the cDNA for human serum cholinesterase. The hybridization pattern and sizes of the fragments are as expected if the genomic clone is cholinesterase. Sequencing of the DNA fragments in M13 is in progress to verify the identity of the clone and the location of introns.
Efficient engineering of chromosomal ribosome binding site libraries in mismatch repair proficient Escherichia coli.

PubMed

Oesterle, Sabine; Gerngross, Daniel; Schmitt, Steven; Roberts, Tania Michelle; Panke, Sven

2017-09-26

Multiplexed gene expression optimization via modulation of gene translation efficiency through ribosome binding site (RBS) engineering is a valuable approach for optimizing artificial properties in bacteria, ranging from genetic circuits to production pathways. Established algorithms design smart RBS-libraries based on a single partially-degenerate sequence that efficiently samples the entire space of translation initiation rates. However, the sequence space that is accessible when integrating the library by CRISPR/Cas9-based genome editing is severely restricted by DNA mismatch repair (MMR) systems. MMR efficiency depends on the type and length of the mismatch and thus effectively removes potential library members from the pool. Rather than working in MMR-deficient strains, which accumulate off-target mutations, or depending on temporary MMR inactivation, which requires additional steps, we eliminate this limitation by developing a pre-selection rule of genome-library-optimized-sequences (GLOS) that enables introducing large functional diversity into MMR-proficient strains with sequences that are no longer subject to MMR-processing. We implement several GLOS-libraries in Escherichia coli and show that GLOS-libraries indeed retain diversity during genome editing and that such libraries can be used in complex genome editing operations such as concomitant deletions. We argue that this approach allows for stable and efficient fine tuning of chromosomal functions with minimal effort.
The genomic and biological characterization of Citrullus lanatus cryptic virus infecting watermelon in China.

PubMed

Xin, Min; Cao, Mengji; Liu, Wenwen; Ren, Yingdang; Lu, Chuantao; Wang, Xifeng

2017-03-15

A dsRNA virus was detected in watermelon (Citrullus lanatus) samples collected from Kaifeng, Henan province, China, using next-generation sequencing of small RNAs. The complete genome of this virus consists of dsRNA-1 (1603 nt) and dsRNA-2 (1466 nt), each containing a single open reading frame, potentially encoding a 54.2 kDa RNA-dependent RNA polymerase (RdRp) and a 45.9 kDa coat protein (CP), respectively. The RdRp and CP share their highest amino acid identities (85.3% and 75.4%, respectively) with a previously reported Israeli strain of Citrullus lanatus cryptic virus (CiLCV). Genome comparisons indicate that this virus belongs to the same species as CiLCV. Because the reported sequences of the Israeli strain of CiLCV are partial, our newly identified sequences can represent the complete genome of CiLCV. Furthermore, phylogenetic analyses based on the RdRp sequences suggest that CiLCV is a member of the genus Deltapartitivirus, family Partitiviridae. In addition, field investigation and seed-borne bioassays show that CiLCV commonly occurs in many varieties and is transmitted through seeds at a very high rate. Copyright © 2017 Elsevier B.V. All rights reserved.
Genomic insights into the taxonomic status of the Bacillus cereus group

PubMed Central

Liu, Yang; Lai, Qiliang; Göker, Markus; Meier-Kolthoff, Jan P.; Wang, Meng; Sun, Yamin; Wang, Lei; Shao, Zongze

2015-01-01

The identification and phylogenetic relationships of bacteria within the Bacillus cereus group are controversial. This study aimed at determining the taxonomic affiliations of these strains using the whole-genome sequence-based Genome BLAST Distance Phylogeny (GBDP) approach. The GBDP analysis clearly separated 224 strains into 30 clusters, representing eleven known, partially merged species and accordingly 19–20 putative novel species. Additionally, 16S rRNA gene analysis, a novel variant of multi-locus sequence analysis (nMLSA) and screening of virulence genes were performed. The 16S rRNA gene sequence was not sufficient to differentiate the bacteria within this group due to its high conservation. The nMLSA results were consistent with GBDP. Moreover, a fast typing method was proposed using the pycA gene, and where necessary, the ccpA gene. The pXO plasmids and cry genes were widely distributed, suggesting little correlation with the phylogenetic positions of the host bacteria. This might explain why classifications based on virulence characteristics proved unsatisfactory in the past. In summary, this is the first large-scale and systematic study of the taxonomic status of the bacteria within the B. cereus group using whole-genome sequences, and is likely to contribute to further insights into their pathogenicity, phylogeny and adaptation to diverse environments. PMID:26373441
Transmissible Gastroenteritis Coronavirus Genome Packaging Signal Is Located at the 5′ End of the Genome and Promotes Viral RNA Incorporation into Virions in a Replication-Independent Process

PubMed Central

Morales, Lucia; Mateos-Gomez, Pedro A.; Capiscol, Carmen; del Palacio, Lorena; Sola, Isabel

2013-01-01

Preferential RNA packaging in coronaviruses involves the recognition of viral genomic RNA, a crucial process for viral particle morphogenesis mediated by RNA-specific sequences, known as packaging signals. An essential packaging signal component of transmissible gastroenteritis coronavirus (TGEV) has been further delimited to the first 598 nucleotides (nt) from the 5′ end of its RNA genome, by using recombinant viruses transcribing subgenomic mRNA that included potential packaging signals. The integrity of the entire sequence domain was necessary because deletion of any of the five structural motifs defined within this region abrogated specific packaging of this viral RNA. One of these RNA motifs was the stem-loop SL5, a highly conserved motif in coronaviruses located at nucleotide positions 106 to 136. Partial deletion or point mutations within this motif also abrogated packaging. Using TGEV-derived defective minigenomes replicated in trans by a helper virus, we have shown that TGEV RNA packaging is a replication-independent process. Furthermore, the last 494 nt of the genomic 3′ end were not essential for packaging, although this region increased packaging efficiency. TGEV RNA sequences identified as necessary for viral genome packaging were not sufficient to direct packaging of a heterologous sequence derived from the green fluorescent protein gene. These results indicated that TGEV genome packaging is a complex process involving many factors in addition to the identified RNA packaging signal. The identification of well-defined RNA motifs within the TGEV RNA genome that are essential for packaging will be useful for designing packaging-deficient biosafe coronavirus-derived vectors and providing new targets for antiviral therapies. PMID:23966403

Gramene 2013: comparative plant genomics resources.

PubMed

Monaco, Marcela K; Stein, Joshua; Naithani, Sushma; Wei, Sharon; Dharmawardhana, Palitha; Kumari, Sunita; Amarasinghe, Vindhya; Youens-Clark, Ken; Thomason, James; Preece, Justin; Pasternak, Shiran; Olson, Andrew; Jiao, Yinping; Lu, Zhenyuan; Bolser, Dan; Kerhornou, Arnaud; Staines, Dan; Walts, Brandon; Wu, Guanming; D'Eustachio, Peter; Haw, Robin; Croft, David; Kersey, Paul J; Stein, Lincoln; Jaiswal, Pankaj; Ware, Doreen

2014-01-01

Gramene (http://www.gramene.org) is a curated online resource for comparative functional genomics in crops and model plant species, currently hosting 27 fully and 10 partially sequenced reference genomes in its build number 38. Its strength derives from the application of a phylogenetic framework for genome comparison and the use of ontologies to integrate structural and functional annotation data. Whole-genome alignments complemented by phylogenetic gene family trees help infer syntenic and orthologous relationships. Genetic variation data, sequences and genome mappings available for 10 species, including Arabidopsis, rice and maize, help infer putative variant effects on genes and transcripts. The pathways section also hosts 10 species-specific metabolic pathways databases developed in-house or by our collaborators using Pathway Tools software, which facilitates searches for pathway, reaction and metabolite annotations, and allows analyses of user-defined expression datasets. Recently, we released a Plant Reactome portal featuring 133 curated rice pathways. This portal will be expanded for Arabidopsis, maize and other plant species. We continue to provide genetic and QTL maps and marker datasets developed by crop researchers. The project provides a unique community platform to support scientific research in plant genomics including studies in evolution, genetics, plant breeding, molecular biology, biochemistry and systems biology.
A Platform for Designing Genome-Based Personalized Immunotherapy or Vaccine against Cancer

PubMed Central

Gupta, Sudheer; Chaudhary, Kumardeep; Dhanda, Sandeep Kumar; Kumar, Rahul; Kumar, Shailesh; Sehgal, Manika; Nagpal, Gandharva

2016-01-01

Owing to advances in sequencing technology, the genomes of thousands of cancer tissues and cell lines have been sequenced. Identifying cancer-specific epitopes, or neoepitopes, from cancer genomes is one of the major challenges in immunotherapy and vaccine development. This paper describes Cancertope, a platform developed for designing genome-based immunotherapies or vaccines against cancer cells. The integrated resources on this platform are divided into three sections. The first section is a cancer-specific database of neoepitopes generated from the genomes of 905 cancer cell lines. This database holds a wide range of epitopes (e.g., B-cell, CD8+ T-cell, HLA class I, HLA class II) against 60 cancer-specific vaccine antigens. The second section describes a partially personalized module for predicting potential neoepitopes from a user-supplied cancer genome. Finally, we describe a fully personalized module for identifying neoepitopes from the genomes of a cancer patient's cancerous and healthy cells. To assist the scientific community, a wide range of tools is incorporated into this platform, including screening of epitopes against the human reference proteome (http://www.imtech.res.in/raghava/cancertope/). PMID:27832200
Ribosomal DNA sequence heterogeneity reflects intraspecies phylogenies and predicts genome structure in two contrasting yeast species.

PubMed

West, Claire; James, Stephen A; Davey, Robert P; Dicks, Jo; Roberts, Ian N

2014-07-01

The ribosomal RNA encapsulates a wealth of evolutionary information, including genetic variation that can be used to discriminate between organisms at a wide range of taxonomic levels. For example, the prokaryotic 16S rDNA sequence is very widely used both in phylogenetic studies and as a marker in metagenomic surveys, and the internal transcribed spacer region, frequently used in plant phylogenetics, is now recognized as a fungal DNA barcode. However, this widespread use does not escape criticism, principally due to issues such as difficulties in classification of paralogous versus orthologous rDNA units and intragenomic variation, both of which may be significant barriers to accurate phylogenetic inference. We recently analyzed data sets from the Saccharomyces Genome Resequencing Project, characterizing rDNA sequence variation within multiple strains of the baker's yeast Saccharomyces cerevisiae and its nearest wild relative Saccharomyces paradoxus in unprecedented detail. Notably, both species possess single-locus rDNA systems. Here, we use these new variation datasets to assess whether a more detailed characterization of the rDNA locus can alleviate the second of these phylogenetic issues, sequence heterogeneity, while controlling for the first. We demonstrate that a strong phylogenetic signal exists within both datasets and illustrate how they can be used, with existing methodology, to estimate intraspecies phylogenies of yeast strains consistent with those derived from whole-genome approaches. We also describe the use of partial Single Nucleotide Polymorphisms, a type of sequence variation found only in repetitive genomic regions, in identifying key evolutionary features such as genome hybridization events, and show their consistency with whole-genome Structure analyses. We conclude that our approach can transform rDNA sequence heterogeneity from a problem into a useful source of evolutionary information, enabling the estimation of highly accurate phylogenies of closely related organisms, and discuss how it could be extended to future studies of multilocus rDNA systems. [concerted evolution; genome hybridisation; phylogenetic analysis; ribosomal DNA; whole genome sequencing; yeast]. © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
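The abstract defines partial SNPs only as variation found in repetitive regions. A minimal sketch, assuming read counts pooled over all rDNA copies of a single strain are available for each site: a site where only some copies carry a non-reference base is classed as a partial SNP. The function name, input format and the 5%/95% cut-offs are hypothetical choices for illustration, not the authors' calling criteria.

```python
from typing import Dict


def classify_site(base_counts: Dict[str, int], ref: str,
                  min_frac: float = 0.05, max_frac: float = 0.95) -> str:
    """Classify one site of a multicopy (repetitive) locus in one strain.

    base_counts: reads supporting each base at this position, pooled over
    all copies of the repeat (hypothetical input format).
    min_frac / max_frac: hypothetical cut-offs, not taken from the study.
    """
    total = sum(base_counts.values())
    if total == 0:
        return "no_data"
    # Share of reads (a proxy for the share of repeat copies) carrying a
    # non-reference base at this position.
    alt_frac = (total - base_counts.get(ref, 0)) / total
    if alt_frac < min_frac:
        return "reference"
    if alt_frac > max_frac:
        return "full_snp"      # essentially all copies differ from the reference
    return "partial_snp"       # only some copies carry the variant
```

For example, a site with 60 reads showing `A` and 40 showing `G` against reference `A` is classed as a partial SNP.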
Genomic regions with a history of divergent selection affect fitness of hybrids between two butterfly species.

PubMed

Gompert, Zachariah; Lucas, Lauren K; Nice, Chris C; Fordyce, James A; Forister, Matthew L; Buerkle, C Alex

2012-07-01

Speciation is the process by which reproductively isolated lineages arise, and is one of the fundamental means by which the diversity of life increases. Whereas numerous studies have documented an association between ecological divergence and reproductive isolation, relatively little is known about the role of natural selection in genome divergence during the process of speciation. Here, we use genome-wide DNA sequences and Bayesian models to test the hypothesis that loci under divergent selection between two butterfly species (Lycaeides idas and L. melissa) also affect fitness in an admixed population. Locus-specific measures of genetic differentiation between L. idas and L. melissa and genomic introgression in hybrids varied across the genome. The most differentiated genetic regions were characterized by elevated L. idas ancestry in the admixed population, which occurs in L. idas-like habitat, consistent with the hypothesis that local adaptation contributes to speciation. Moreover, locus-specific measures of genetic differentiation (a metric of divergent selection) were positively associated with extreme genomic introgression (a metric of hybrid fitness). Interestingly, concordance of differentiation and introgression was only partial. We discuss multiple, complementary explanations for this partial concordance. © 2012 The Author(s).
New bioinformatic tool for quick identification of functionally relevant endogenous retroviral inserts in human genome.

PubMed

Garazha, Andrew; Ivanova, Alena; Suntsova, Maria; Malakhova, Galina; Roumiantsev, Sergey; Zhavoronkov, Alex; Buzdin, Anton

2015-01-01

Endogenous retroviruses (ERVs) and LTR retrotransposons (LRs) occupy ∼8% of the human genome. Deep sequencing technologies provide clues to the functional relevance of individual ERVs/LRs by enabling direct identification of transcription factor binding sites (TFBS) and other landmarks of functional genomic elements. Here, we performed a genome-wide identification of human ERVs/LRs containing TFBS according to ENCODE project data. We created the first interactive ERV/LR database, which groups individual inserts according to their family nomenclature, number of mapped TFBS and divergence from their consensus sequence. Information on any particular element can be easily extracted by the user. We also created a genome browser tool that enables quick mapping of any ERV/LR insert relative to genomic coordinates, known human genes and TFBS. These tools can be used to explore functionally relevant individual ERVs/LRs and to study their impact on the regulation of human genes. Overall, we identified ∼110,000 ERV/LR genomic elements containing TFBS. We propose a hypothesis of "domestication" of ERV/LR TFBS by the genome milieu, comprising successive stages of initial epigenetic repression, partial functional release, and further mutation-driven reshaping of TFBS in tight coevolution with the enclosing genomic loci.
MEGAnnotator: a user-friendly pipeline for microbial genomes assembly and annotation.

PubMed

Lugli, Gabriele Andrea; Milani, Christian; Mancabelli, Leonardo; van Sinderen, Douwe; Ventura, Marco

2016-04-01

Genome annotation is one of the key actions that must be undertaken to decipher the genetic blueprint of organisms. Thus, a correct and reliable annotation is essential in rendering genomic data valuable. Here, we describe a bioinformatics pipeline based on freely available software programs coordinated by a multithreaded script named MEGAnnotator (Multithreaded Enhanced prokaryotic Genome Annotator). This pipeline allows the generation of multiple annotated formats fulfilling the NCBI guidelines for assembled microbial genome submission, based on DNA shotgun sequencing reads; it minimizes manual intervention, while also reducing waiting times between software program executions and improving the final quality of both assembly and annotation outputs. MEGAnnotator provides an efficient way to pre-arrange the assembly and annotation work required to process NGS genome sequence data. The script improves the final quality of microbial genome annotation by reducing ambiguous annotations. Moreover, the MEGAnnotator platform allows the user to perform a partial annotation of pre-assembled genomes and includes an option to perform metagenomic data set assemblies. The MEGAnnotator platform will be useful for microbiologists interested in genome analyses of bacteria, as well as for those investigating the complexity of microbial communities, who do not possess the necessary skills to prepare their own bioinformatics pipeline. © FEMS 2016. All rights reserved.
The complete mitochondrial genome of the Japanese ornamental koi carp (Cyprinus carpio) and its implication for the history of koi.

PubMed

Mabuchi, Kohji; Song, Hayeun

2014-02-01

Complete mitochondrial genome (mitogenome) sequences were determined for two individuals of Japanese ornamental koi carp. Interestingly, the obtained mitogenomes (16,581 bp) were both completely identical to the recently reported mitogenome of Oujiang color carp from China. Control region (CR) sequences in DNA databases showed that more than half (65%) of the koi carp individuals reported so far had partial or complete CR sequences identical to those of Oujiang color carp. These results suggest that Japanese koi carp may have originated from Chinese Oujiang color carp, contrary to the belief in Japan that koi carp were developed directly from carp stocks in Japan. In any case, the present results emphasize the importance of analyzing Oujiang color carp when studying the origin of koi carp.
Development of phylogenetic markers for Sebacina (Sebacinaceae) mycorrhizal fungi associated with Australian orchids.

PubMed

Ruibal, Monica P; Peakall, Rod; Foret, Sylvain; Linde, Celeste C

2014-06-01

To investigate fungal species identity and diversity in mycorrhizal fungi of the order Sebacinales, we developed phylogenetic markers. These new markers will enable future studies investigating species delineation and phylogenetic relationships of the fungal symbionts and facilitate investigations into evolutionary interactions among Sebacina species and their orchid hosts. We generated partial genome sequences for a Sebacina symbiont originating from Caladenia huegelii with 454 genome sequencing and for three symbionts from Eriochilus dilatatus and one from E. pulchellus using Illumina sequencing. Six nuclear and two mitochondrial loci showed high variability (10-31% parsimony-informative sites) for Sebacinales mycorrhizal fungi across four genera of Australian orchids (Caladenia, Eriochilus, Elythranthera, and Glossodia). We obtained highly informative DNA markers that will allow investigation of the mycorrhizal diversity of Sebacinaceae fungi associated with terrestrial orchids in Australia and worldwide.
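The variability figures above are given as the proportion of parsimony-informative sites. Under the usual definition (a column in which at least two different character states each occur in at least two sequences), the proportion can be computed from an alignment as sketched below. Ignoring gaps and ambiguity codes is an assumption, since programs differ in how they treat them, and the function name is hypothetical.

```python
from collections import Counter
from typing import List


def parsimony_informative_fraction(alignment: List[str]) -> float:
    """Fraction of alignment columns that are parsimony-informative.

    A column counts if at least two different nucleotides each occur in at
    least two sequences. Gaps ('-') and ambiguity codes such as 'N' are
    ignored (an assumption; treatments differ between programs).
    """
    if not alignment:
        raise ValueError("empty alignment")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    informative = 0
    for i in range(length):
        counts = Counter(seq[i].upper() for seq in alignment
                         if seq[i].upper() in "ACGT")
        # Number of states that are shared by at least two sequences.
        shared_states = sum(1 for c in counts.values() if c >= 2)
        if shared_states >= 2:
            informative += 1
    return informative / length
```

Multiplying the result by 100 gives a percentage comparable to the 10-31% range reported for the new loci.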
Re-sequencing transgenic plants revealed rearrangements at T-DNA inserts, and integration of a short T-DNA fragment, but no increase of small mutations elsewhere.

PubMed

Schouten, Henk J; Vande Geest, Henri; Papadimitriou, Sofia; Bemer, Marian; Schaart, Jan G; Smulders, Marinus J M; Perez, Gabino Sanchez; Schijlen, Elio

2017-03-01

Transformation resulted in deletions and translocations at T-DNA inserts, but not in genome-wide small mutations. A tiny T-DNA splinter was detected that would probably have remained undetected by conventional techniques. We investigated the extent to which Agrobacterium tumefaciens-mediated transformation is mutagenic beyond the insertion of T-DNA. To avoid mutations due to in vitro propagation, we applied floral dip transformation of Arabidopsis thaliana. We re-sequenced the genomes of five primary transformants and compared them to genomic sequences derived from a pool of four wild-type plants. By genome-wide comparisons, we identified ten small mutations in the genomes of the five transgenic plants, not correlated with the positions or number of T-DNA inserts. This mutation frequency is within the range of spontaneous mutations occurring during seed propagation in A. thaliana, as determined earlier. In addition, we detected small as well as large deletions specifically at the T-DNA insert sites. Furthermore, we detected partial T-DNA inserts, one of them a tiny 50-bp fragment originating from a central part of the T-DNA construct used, inserted into the plant genome without any other flanking T-DNA. Because of its small size, we named this fragment a T-DNA splinter. As far as we know, this is the first report of such a small T-DNA fragment insert in the absence of any T-DNA border sequence. Finally, we found evidence for translocations from other chromosomes flanking T-DNA inserts. In this study, we showed that next-generation sequencing (NGS) is a highly sensitive approach for detecting T-DNA inserts in transgenic plants.
Whole genome sequence phylogenetic analysis of four Mexican rabies viruses isolated from cattle.

PubMed

Bárcenas-Reyes, I; Loza-Rubio, E; Cantó-Alarcón, G J; Luna-Cozar, J; Enríquez-Vázquez, A; Barrón-Rodríguez, R J; Milián-Suazo, F

2017-08-01

Phylogenetic analysis of the rabies virus in molecular epidemiology has traditionally been performed on partial sequences of the genome, such as the N, G, and P genes; however, that approach raises concerns about its discriminatory power compared with whole genome sequencing. In this study we characterized four strains of the rabies virus isolated from cattle in Querétaro, Mexico by comparing their whole genome sequences to those of strains from the American, European and Asian continents. Four cattle brain samples positive for rabies and characterized as AgV11, genotype 1, were used in the study. cDNA was generated by reverse transcription PCR (RT-PCR) using oligo dT. cDNA samples were sequenced on an Illumina NextSeq 500 platform. The phylogenetic analysis was performed with MEGA 6.0. Minimum evolution phylogenetic trees were constructed with the Neighbor-Joining method and bootstrapped with 1000 replicates. Three large and seven small clusters were formed with the 26 sequences used. The largest cluster grouped strains from different species in South America: Brazil and French Guiana. The second cluster grouped five strains from Mexico. A Mexican strain reported in a different study was highly related to our four strains, suggesting a common source of infection. The phylogenetic analysis shows that the host type differs among regions of the American continent; rabies is more closely associated with bats. It was concluded that the rabies virus in central Mexico is genetically stable and that it is transmitted by the vampire bat Desmodus rotundus. Copyright © 2017 Elsevier Ltd. All rights reserved.
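The study ran its bootstrap in MEGA 6.0. The column-resampling idea behind the 1000 replicates can be sketched independently of MEGA, assuming an alignment held as a name-to-sequence dict; the function name and seed handling are hypothetical, and building a Neighbor-Joining tree from each replicate is not shown.

```python
import random
from typing import Dict, List


def bootstrap_replicates(alignment: Dict[str, str], n_replicates: int = 1000,
                         seed: int = 0) -> List[Dict[str, str]]:
    """Nonparametric bootstrap: resample alignment columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(s) != length for s in alignment.values()):
        raise ValueError("sequences must be aligned to equal length")
    rng = random.Random(seed)  # fixed seed keeps the replicates reproducible
    replicates = []
    for _ in range(n_replicates):
        # Draw `length` column indices; some columns repeat, others drop out.
        cols = [rng.randrange(length) for _ in range(length)]
        # The same column indices are applied to every sequence, so each
        # resampled column is an intact column of the original alignment.
        replicates.append({name: "".join(alignment[name][c] for c in cols)
                           for name in names})
    return replicates
```

Support for a cluster is then usually reported as the percentage of replicate trees that recover it.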
Molecular detection and characterization of noroviruses in river water in Thailand.

PubMed

Inoue, K; Motomura, K; Boonchan, M; Takeda, N; Ruchusatsawa, K; Guntapong, R; Tacharoenmuang, R; Sangkitporn, S; Chantaroj, S

2016-03-01

Norovirus (NoV) generally exists as a mixture of multiple genotype variants in nature. However, no published report has monitored NoV in natural settings in Thailand. To obtain information on the mixed presence of NoV RNA genomes, we conducted viral genome analysis of 15 water specimens collected from five sites in a river near Bangkok between August 2013 and August 2014. The number of viral RNA copies per specimen declined progressively from the most upstream to the most downstream site. Following direct nucleotide sequencing of the PCR products, we obtained three partial genome sequences of NoV GI and 13 partial genome sequences of NoV GII strains. Phylogenetic analysis indicated the presence of four GII.4 variant groups circulating after the Den Haag_2006b, New Orleans_2009 and Sydney_2012 variants that caused outbreaks worldwide. In contrast, only GI.4 was observed in the specimens collected in April 2014. These results indicate that multiple genogroups and genotypes of noroviruses are present and circulating in the natural environment in Thailand, as in other countries. Our study provides comprehensive information on the occurrence of new variants and is the first to report that multiple genogroups and genotypes of norovirus exist and are circulating in river water near Bangkok, Thailand. Continued research will be essential for understanding the natural history of NoV and the control of future outbreaks. © 2015 The Society for Applied Microbiology.
Hyperexpansion of RNA Bacteriophage Diversity

PubMed Central

Krishnamurthy, Siddharth R.; Janowski, Andrew B.; Zhao, Guoyan; Barouch, Dan; Wang, David

2016-01-01

Bacteriophage modulation of microbial populations impacts critical processes in ocean, soil, and animal ecosystems. However, the role of bacteriophages with RNA genomes (RNA bacteriophages) in these processes is poorly understood, in part because of the limited number of known RNA bacteriophage species. Here, we identify partial genome sequences of 122 RNA bacteriophage phylotypes that are highly divergent from each other and from previously described RNA bacteriophages. These novel RNA bacteriophage sequences were present in samples collected from a range of ecological niches worldwide, including invertebrates and extreme microbial sediment, demonstrating that they are more widely distributed than previously recognized. Genomic analyses of these novel bacteriophages yielded multiple novel genome organizations. Furthermore, one RNA bacteriophage was detected in the transcriptome of a pure culture of Streptomyces avermitilis, suggesting for the first time that the tropism of RNA bacteriophages may extend to Gram-positive bacteria. Finally, reverse transcription PCR (RT-PCR)-based screening for two specific RNA bacteriophages in stool samples from a longitudinal cohort of macaques suggested that they are generally present acutely rather than persistently. PMID:27010970
Complete mitochondrial genomes of eleven extinct or possibly extinct bird species.

PubMed

Anmarkrud, Jarl A; Lifjeld, Jan T

2017-03-01

Natural history museum collections represent a vast source of ancient and historical DNA samples from extinct taxa that can be analyzed with high-throughput sequencing tools to reveal novel genetic and phylogenetic information about them. Here, we report the successful sequencing of complete mitochondrial genomes (mitogenomes) from eleven extinct bird species, using de novo assembly of short sequences derived from toepad samples of degraded DNA from museum specimens. For two species (the Passenger Pigeon Ectopistes migratorius and the South Island Piopio Turnagra capensis), whole mitogenomes were already available from recent studies, whereas for five others (the Great Auk Pinguinus impennis, the Imperial Woodpecker Campephilus imperialis, the Huia Heteralocha acutirostris, the Kauai Oo Moho braccatus and the South Island Kokako Callaeas cinereus), partial mitochondrial sequences were available for comparison. For all seven species, we found sequence similarities of >98%. For the remaining four species (the Kamao Myadestes myadestinus, the Paradise Parrot Psephotellus pulcherrimus, the Ou Psittirostra psittacea and the Lesser Akialoa Akialoa obscura), no sequence information was available for comparison, so we conducted BLAST searches and phylogenetic analyses to determine their phylogenetic positions and identify their closest extant relatives. These mitogenomes will be valuable for future analyses of avian phylogenetics and illustrate the importance of museum collections as repositories for genomics resources. © 2016 John Wiley & Sons Ltd.
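The >98% figure compares each new mitogenome with previously published sequences. One simple way to express such similarity is percent identity over aligned positions, sketched below for two sequences that are already aligned. Skipping gaps and ambiguous bases is an assumption, the function name is hypothetical, and the abstract does not state how the authors computed the value.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences.

    Only positions where both sequences carry an unambiguous A/C/G/T are
    compared; gaps and ambiguity codes are skipped (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x == y:
                matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared
```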
Persea americana (avocado): bringing ancient flowers to fruit in the genomics era.

PubMed

Chanderbali, André S; Albert, Victor A; Ashworth, Vanessa E T M; Clegg, Michael T; Litz, Richard E; Soltis, Douglas E; Soltis, Pamela S

2008-04-01

The avocado (Persea americana) is a major crop commodity worldwide. Moreover, avocado, a paleopolyploid, is an evolutionary "outpost" among flowering plants, representing a basal lineage (the magnoliid clade) near the origin of the flowering plants themselves. Following centuries of selective breeding, avocado germplasm has been characterized at the level of microsatellite and RFLP markers. Nonetheless, little is known beyond these general diversity estimates, and much work remains to be done to develop avocado as a major subtropical-zone crop. Among the goals of avocado improvement are to develop varieties with fruit that will "store" better on the tree, show uniform ripening and have better post-harvest storage. Avocado transcriptome sequencing, genome mapping and partial genomic sequencing will represent a major step toward the goal of sequencing the entire avocado genome, which is expected to aid in improving avocado varieties and production, as well as understanding the evolution of flowers from non-flowering seed plants (gymnosperms). Additionally, continued evolutionary and other comparative studies of flower and fruit development in different avocado strains can be accomplished at the gene expression level, including in comparison with avocado relatives, and these should provide important insights into the genetic regulation of fruit development in basal angiosperms.
Phylogenetic analysis of nitrite, nitric oxide, and nitrous oxide respiratory enzymes reveal a complex evolutionary history for denitrification.

PubMed

Jones, Christopher M; Stres, Blaz; Rosenquist, Magnus; Hallin, Sara

2008-09-01

Denitrification is a facultative respiratory pathway in which nitrite (NO₂⁻), nitric oxide (NO), and nitrous oxide (N₂O) are successively reduced to nitrogen gas (N₂), effectively closing the nitrogen cycle. The ability to denitrify is widely dispersed among prokaryotes, and this polyphyletic distribution has raised the possibility of horizontal gene transfer (HGT) having a substantial role in the evolution of denitrification. Comparisons of 16S rRNA and denitrification gene phylogenies in recent studies support this possibility; however, these results remain speculative as they are based on visual comparisons of phylogenies from partial sequences. We reanalyzed publicly available nirS, nirK, norB, and nosZ partial sequences using Bayesian and maximum likelihood phylogenetic inference. Concomitant analysis of denitrification genes with 16S rRNA sequences from the same organisms showed substantial differences between the trees, which were supported by examining the posterior probability of monophyletic constraints at different taxonomic levels. Although these differences suggest HGT of denitrification genes, the presence of structural variants for nirK, norB, and nosZ makes it difficult to distinguish HGT from other evolutionary events. Additional analysis using phylogenetic networks and likelihood ratio tests of phylogenies based on full-length sequences retrieved from genomes also revealed significant differences in tree topologies among denitrification and 16S rRNA gene phylogenies, with the exception of the nosZ gene phylogeny within the data set of the nirK-harboring genomes. However, inspection of codon usage and G + C content plots from complete genomes gave no evidence for recent HGT. Instead, the close proximity of denitrification gene copies in the genomes of several denitrifying bacteria suggests duplication. Although HGT cannot be ruled out as a factor in the evolution of denitrification genes, our analysis suggests that other phenomena, such as gene duplication/divergence and lineage sorting, may have differently influenced the evolution of each denitrification gene.
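The abstract mentions inspecting G + C content plots for signs of recent HGT. A crude numerical version of that check, under assumptions of our own: compute G+C in sliding windows along a genome and flag a gene whose G+C lies more than `z_cutoff` standard deviations from the window mean. Window size, step and cut-off are hypothetical defaults, and the genome must yield at least two windows, each containing some A/C/G/T bases.

```python
from statistics import mean, stdev
from typing import List


def gc_content(seq: str) -> float:
    """Fraction of G or C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / acgt


def window_gc(genome: str, window: int, step: int) -> List[float]:
    """G+C fraction of each full-length sliding window along the genome."""
    return [gc_content(genome[i:i + window])
            for i in range(0, len(genome) - window + 1, step)]


def is_gc_outlier(gene: str, genome: str, window: int = 1000,
                  step: int = 500, z_cutoff: float = 2.0) -> bool:
    """True if the gene's G+C deviates strongly from the genome background."""
    values = window_gc(genome, window, step)
    mu, sd = mean(values), stdev(values)  # stdev needs at least two windows
    return abs(gc_content(gene) - mu) > z_cutoff * sd
```

A compositional screen like this only detects relatively recent transfers; ameliorated genes converge on the host composition, which is consistent with the authors treating the absence of such signals as evidence against recent, not ancient, HGT.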
A new polymorphic and multicopy MHC gene family related to nonmammalian class I

DOE Office of Scientific and Technical Information (OSTI.GOV)

Leelayuwat, C.; Degli-Esposti, M.A.; Abraham, L.J.

1994-12-31

The authors have used genomic analysis to characterize a region of the central major histocompatibility complex (MHC) spanning ∼300 kilobases (kb) between TNF and HLA-B. This region has been suggested to carry genetic factors relevant to the development of autoimmune diseases such as myasthenia gravis (MG) and insulin dependent diabetes mellitus (IDDM). Genomic sequence was analyzed for coding potential using two neural network programs, GRAIL and GeneParser. A genomic probe, JAB, containing putative coding sequences (PERB11) located 60 kb centromeric of HLA-B, was used for northern analysis of human tissues. Multiple transcripts were detected. Southern analysis of genomic DNA and overlapping YAC clones, covering the region from BAT1 to HLA-F, indicated that there are at least five copies of PERB11, four of which are located within this region of the MHC. The partial cDNA sequence of PERB11 was obtained from poly-A RNA derived from skeletal muscle. The putative amino acid sequence of PERB11 shares ∼30% identity with MHC class I molecules from various species, including reptiles, chickens, and frogs, as well as with other MHC class I-like molecules, such as the IgG FcR of the mouse and rat and the human Zn-α2-glycoprotein. From direct comparison of amino acid sequences, it is concluded that PERB11 is a distinct molecule more closely related to nonmammalian than to known mammalian MHC class I molecules. Genomic sequence analysis of PERB11 from five MHC ancestral haplotypes (AH) indicated that the gene is polymorphic at both the DNA and protein level. The results suggest that the authors have identified a novel polymorphic gene family with multiple copies within the MHC. 48 refs., 10 figs., 2 tabs.
A little bit of sex matters for genome evolution in asexual plants.

PubMed

Hojsgaard, Diego; Hörandl, Elvira

2015-01-01

Genome evolution in asexual organisms is theoretically expected to be shaped by various factors: first, hybrid origin and polyploidy confer a genomic constitution of highly heterozygous genotypes with multiple copies of genes; second, asexuality confers a lack of recombination and variation in populations, which reduces the efficiency of selection against deleterious mutations; hence, the accumulation of mutations and a gradual increase in mutational load (Muller's ratchet) would lead to rapid extinction of asexual lineages; third, allelic sequence divergence is expected to result in rapid divergence of lineages (Meselson effect). Recent transcriptome studies on the asexual polyploid complex Ranunculus auricomus using single-nucleotide polymorphisms confirmed neutral allelic sequence divergence within a short time frame, but rejected the hypothesis of a genome-wide accumulation of mutations in asexuals compared to sexuals, except for a few genes related to reproductive development. We discuss a general model in which the observed incidence of facultative sexuality in plants may unmask deleterious mutations with partial dominance and expose them efficiently to purging selection. A little bit of sex may help to avoid genomic decay and extinction.
HIV-1 Full-Genome Phylogenetics of Generalized Epidemics in Sub-Saharan Africa: Impact of Missing Nucleotide Characters in Next-Generation Sequences

PubMed Central

Wymant, Chris; Colijn, Caroline; Danaviah, Siva; Essex, Max; Frost, Simon; Gall, Astrid; Gaseitsiwe, Simani; Grabowski, Mary K.; Gray, Ronald; Guindon, Stephane; von Haeseler, Arndt; Kaleebu, Pontiano; Kendall, Michelle; Kozlov, Alexey; Manasa, Justen; Minh, Bui Quang; Moyo, Sikhulile; Novitsky, Vlad; Nsubuga, Rebecca; Pillay, Sureshnee; Quinn, Thomas C.; Serwadda, David; Ssemwanga, Deogratius; Stamatakis, Alexandros; Trifinopoulos, Jana; Wawer, Maria; Brown, Andy Leigh; de Oliveira, Tulio; Kellam, Paul; Pillay, Deenan; Fraser, Christophe

2017-01-01

To characterize HIV-1 transmission dynamics in regions where the burden of HIV-1 is greatest, the “Phylogenetics and Networks for Generalised HIV Epidemics in Africa” consortium (PANGEA-HIV) is sequencing full-genome viral isolates from across sub-Saharan Africa. We report the first 3,985 PANGEA-HIV consensus sequences from four cohort sites (Rakai Community Cohort Study, n = 2,833; MRC/UVRI Uganda, n = 701; Mochudi Prevention Project, n = 359; Africa Health Research Institute Resistance Cohort, n = 92). Next-generation sequencing success rates varied: more than 80% of the viral genome from the gag to the nef genes could be determined for all sequences from South Africa, 75% of sequences from Mochudi, 60% of sequences from MRC/UVRI Uganda, and 22% of sequences from Rakai. Partial sequencing failure was primarily associated with low viral load, increased for amplicons closer to the 3′ end of the genome, was not associated with subtype diversity except HIV-1 subtype D, and remained significantly associated with sampling location after controlling for other factors. We assessed the impact of the missing data patterns in PANGEA-HIV sequences on phylogeny reconstruction in simulations. We found a threshold in terms of taxon sampling below which the patchy distribution of missing characters in next-generation sequences (NGS) has an excess negative impact on the accuracy of HIV-1 phylogeny reconstruction, which is attributable to tree reconstruction artifacts that accumulate when branches in viral trees are long. The large number of PANGEA-HIV sequences provides unprecedented opportunities for evaluating HIV-1 transmission dynamics across sub-Saharan Africa and identifying prevention opportunities. Molecular epidemiological analyses of these data must proceed cautiously because sequence sampling remains below the identified threshold and a considerable negative impact of missing characters on phylogeny reconstruction is expected. PMID:28540766
HIV-1 full-genome phylogenetics of generalized epidemics in sub-Saharan Africa: impact of missing nucleotide characters in next-generation sequences.

PubMed

Ratmann, Oliver; Wymant, Chris; Colijn, Caroline; Danaviah, Siva; Essex, M; Frost, Simon D W; Gall, Astrid; Gaiseitsiwe, Simani; Grabowski, Mary; Gray, Ronald; Guindon, Stephane; von Haeseler, Arndt; Kaleebu, Pontiano; Kendall, Michelle; Kozlov, Alexey; Manasa, Justen; Minh, Bui Quang; Moyo, Sikhulile; Novitsky, Vladimir; Nsubuga, Rebecca; Pillay, Sureshnee; Quinn, Thomas C; Serwadda, David; Ssemwanga, Deogratius; Stamatakis, Alexandros; Trifinopoulos, Jana; Wawer, Maria; Leigh Brown, Andrew; de Oliveira, Tulio; Kellam, Paul; Pillay, Deenan; Fraser, Christophe

2017-05-25

To characterize HIV-1 transmission dynamics in regions where the burden of HIV-1 is greatest, the 'Phylogenetics and Networks for Generalised HIV Epidemics in Africa' consortium (PANGEA-HIV) is sequencing full-genome viral isolates from across sub-Saharan Africa. We report the first 3,985 PANGEA-HIV consensus sequences from four cohort sites (Rakai Community Cohort Study, n=2,833; MRC/UVRI Uganda, n=701; Mochudi Prevention Project, n=359; Africa Health Research Institute Resistance Cohort, n=92). Next-generation sequencing success rates varied: more than 80% of the viral genome from the gag to the nef genes could be determined for all sequences from South Africa, 75% of sequences from Mochudi, 60% of sequences from MRC/UVRI Uganda, and 22% of sequences from Rakai. Partial sequencing failure was primarily associated with low viral load, increased for amplicons closer to the 3' end of the genome, was not associated with subtype diversity except HIV-1 subtype D, and remained significantly associated with sampling location after controlling for other factors. We assessed the impact of the missing data patterns in PANGEA-HIV sequences on phylogeny reconstruction in simulations. We found a threshold in terms of taxon sampling below which the patchy distribution of missing characters in next-generation sequences has an excess negative impact on the accuracy of HIV-1 phylogeny reconstruction, which is attributable to tree reconstruction artifacts that accumulate when branches in viral trees are long. The large number of PANGEA-HIV sequences provides unprecedented opportunities for evaluating HIV-1 transmission dynamics across sub-Saharan Africa and identifying prevention opportunities. Molecular epidemiological analyses of these data must proceed cautiously because sequence sampling remains below the identified threshold and a considerable negative impact of missing characters on phylogeny reconstruction is expected.
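The success rates above refer to the share of consensus sequences for which more than 80% of the gag-to-nef region was determined. A minimal sketch, assuming each consensus has already been trimmed to that region and that undetermined positions are written as `N`, `-` or `?`; the function names, the set of missing-data symbols and the input layout are hypothetical.

```python
from typing import Dict

# Characters treated as undetermined positions (an assumption).
MISSING = set("N-?")


def determined_fraction(seq: str) -> float:
    """Fraction of positions in `seq` that carry a called base."""
    if not seq:
        return 0.0
    called = sum(1 for ch in seq.upper() if ch not in MISSING)
    return called / len(seq)


def success_rate(consensi: Dict[str, str], threshold: float = 0.8) -> float:
    """Share of sequences with more than `threshold` of the region determined."""
    if not consensi:
        return 0.0
    passing = sum(1 for s in consensi.values()
                  if determined_fraction(s) > threshold)
    return passing / len(consensi)
```

Computing `success_rate` separately for each cohort's sequences would give per-site figures of the kind quoted for South Africa, Mochudi, MRC/UVRI Uganda and Rakai.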
Differential gene expression in the siphonophore Nanomia bijuga (Cnidaria) assessed with multiple next-generation sequencing workflows.

PubMed

Siebert, Stefan; Robinson, Mark D; Tintori, Sophia C; Goetz, Freya; Helm, Rebecca R; Smith, Stephen A; Shaner, Nathan; Haddock, Steven H D; Dunn, Casey W

2011-01-01

We investigated differential gene expression between functionally specialized feeding polyps and swimming medusae in the siphonophore Nanomia bijuga (Cnidaria) with a hybrid long-read/short-read sequencing strategy. We assembled a set of partial gene reference sequences from long-read data (Roche 454), and generated short-read sequences from replicated tissue samples that were mapped to the references to quantify expression. We collected and compared expression data with three short-read expression workflows that differ in sample preparation, sequencing technology, and mapping tools. These workflows were Illumina mRNA-Seq, which generates sequence reads from random locations along each transcript, and two tag-based approaches, SOLiD SAGE and Helicos DGE, which generate reads from particular tag sites. Differences in expression results across workflows were mostly due to the differential impact of missing data in the partial reference sequences. When all 454-derived gene reference sequences were considered, Illumina mRNA-Seq detected more than twice as many differentially expressed (DE) reference sequences as the tag-based workflows. This discrepancy was largely due to missing tag sites in the partial reference that led to false negatives in the tag-based workflows. When only the subset of reference sequences that unambiguously have tag sites was considered, we found broad congruence across workflows, and they all identified a similar set of DE sequences. Our results are promising in several regards for gene expression studies in non-model organisms. First, we demonstrate that a hybrid long-read/short-read sequencing strategy is an effective way to collect gene expression data when an annotated genome sequence is not available. Second, our replicated sampling indicates that expression profiles are highly consistent across field-collected animals in this case. Third, the impacts of partial reference sequences on the ability to detect DE can be mitigated through workflow choice and deeper reference sequencing.
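Tag-based workflows can only score a reference that contains a tag site. As an illustration, assuming the tag site is the NlaIII recognition sequence CATG used in classic SAGE (the abstract does not name the sites used by SOLiD SAGE or Helicos DGE) and that references are given in sense orientation, the sketch below locates the 3'-most site and splits references into those that could and could not yield tags. All names are hypothetical.

```python
from typing import Dict, List, Optional

# NlaIII recognition site; an assumption, not stated in the abstract.
ANCHOR = "CATG"


def three_prime_tag_site(ref: str, anchor: str = ANCHOR) -> Optional[int]:
    """0-based start of the 3'-most anchor site in a sense-strand reference."""
    pos = ref.upper().rfind(anchor)
    return None if pos == -1 else pos


def split_by_tag_site(refs: Dict[str, str]) -> Dict[str, List[str]]:
    """Separate references a tag-based workflow could score from those it could not."""
    result: Dict[str, List[str]] = {"has_tag_site": [], "no_tag_site": []}
    for name, seq in refs.items():
        key = "has_tag_site" if three_prime_tag_site(seq) is not None else "no_tag_site"
        result[key].append(name)
    return result
```

In a partial reference the true 3'-most site may lie in the missing portion, so finding a site does not guarantee it is the one the tags derive from, which is consistent with restricting the comparison to references that unambiguously have tag sites.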

Differential Gene Expression in the Siphonophore Nanomia bijuga (Cnidaria) Assessed with Multiple Next-Generation Sequencing Workflows

PubMed Central

Siebert, Stefan; Robinson, Mark D.; Tintori, Sophia C.; Goetz, Freya; Helm, Rebecca R.; Smith, Stephen A.; Shaner, Nathan; Haddock, Steven H. D.; Dunn, Casey W.

2011-01-01

We investigated differential gene expression between functionally specialized feeding polyps and swimming medusae in the siphonophore Nanomia bijuga (Cnidaria) with a hybrid long-read/short-read sequencing strategy. We assembled a set of partial gene reference sequences from long-read data (Roche 454), and generated short-read sequences from replicated tissue samples that were mapped to the references to quantify expression. We collected and compared expression data with three short-read expression workflows that differ in sample preparation, sequencing technology, and mapping tools. These workflows were Illumina mRNA-Seq, which generates sequence reads from random locations along each transcript, and two tag-based approaches, SOLiD SAGE and Helicos DGE, which generate reads from particular tag sites. Differences in expression results across workflows were mostly due to the differential impact of missing data in the partial reference sequences. When all 454-derived gene reference sequences were considered, Illumina mRNA-Seq detected more than twice as many differentially expressed (DE) reference sequences as the tag-based workflows. This discrepancy was largely due to missing tag sites in the partial reference that led to false negatives in the tag-based workflows. When only the subset of reference sequences that unambiguously have tag sites was considered, we found broad congruence across workflows, and they all identified a similar set of DE sequences. Our results are promising in several regards for gene expression studies in non-model organisms. First, we demonstrate that a hybrid long-read/short-read sequencing strategy is an effective way to collect gene expression data when an annotated genome sequence is not available. Second, our replicated sampling indicates that expression profiles are highly consistent across field-collected animals in this case. Third, the impacts of partial reference sequences on the ability to detect DE can be mitigated through workflow choice and deeper reference sequencing. PMID:21829563
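The false-negative mechanism described in this abstract can be illustrated in a few lines. This is a minimal sketch, not the authors' pipeline: it assumes a tag-based workflow assigns reads only at a transcript's 3'-most anchoring site, and it uses the 4-bp NlaIII recognition sequence CATG (the anchoring site of classic SAGE) as a stand-in, since the exact tag-site rules of SOLiD SAGE and Helicos DGE are not given here. The function names are hypothetical.

```python
from typing import Dict, Optional, Set


def last_tag_site(reference: str, anchor: str = "CATG") -> Optional[int]:
    """Return the 0-based start of the 3'-most anchor site, or None if absent."""
    pos = reference.upper().rfind(anchor.upper())
    return None if pos == -1 else pos


def tag_detectable(reference: str, anchor: str = "CATG") -> bool:
    """A reference with no anchor site cannot receive tag reads (a false negative)."""
    return last_tag_site(reference, anchor) is not None


def detectable_subset(references: Dict[str, str], anchor: str = "CATG") -> Set[str]:
    """Names of references that contain at least one anchor site."""
    return {name for name, seq in references.items() if tag_detectable(seq, anchor)}


if __name__ == "__main__":
    refs = {"geneA": "ATGGCCATGTTAGC", "geneB": "ATGGTTTAAGGC"}
    print(detectable_subset(refs))  # {'geneA'}
```

Restricting comparisons to references that pass a check like this loosely mirrors the subset analysis described above. Note that a partial reference may contain an internal site while still missing the true 3'-most site, which this simple check cannot detect.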
A prospective pilot study of genome-wide exome and transcriptome profiling in patients with small cell lung cancer progressing after first-line therapy.

PubMed

Weiss, Glen J; Byron, Sara A; Aldrich, Jessica; Sangal, Ashish; Barilla, Heather; Kiefer, Jeffrey A; Carpten, John D; Craig, David W; Whitsett, Timothy G

2017-01-01

Small cell lung cancer (SCLC) that has progressed after first-line therapy is an aggressive disease with few effective therapeutic strategies. In this prospective study, we employed next-generation sequencing (NGS) to identify therapeutically actionable alterations to guide treatment for advanced SCLC patients. Twelve patients with SCLC were enrolled after failing platinum-based chemotherapy. Following informed consent, genome-wide exome and RNA sequencing were performed in a CLIA-certified, CAP-accredited environment. Actionable targets were identified and therapeutic recommendations made from a pharmacopeia of FDA-approved drugs. Clinical response to genomically guided treatment was evaluated by Response Evaluation Criteria in Solid Tumors (RECIST) 1.1. The study completed its accrual goal of 12 evaluable patients. The minimum tumor content for successful NGS was 20%, and the median turnaround time from sample collection to genomics-based treatment recommendation was 27 days. At least two clinically actionable targets were identified in each patient, and six patients (50%) received treatment identified by NGS. Two had partial responses by RECIST 1.1 on a clinical trial of a PD-1 inhibitor + irinotecan (indicated by an MLH1 alteration). The remaining patients had clinical deterioration before NGS-recommended therapy could be initiated. Comprehensive genomic profiling using NGS identified clinically actionable alterations in SCLC patients who progressed on initial therapy. Recommended PD-1 inhibitor therapy produced partial responses in two patients. Earlier access to NGS-guided therapy, along with improved understanding of which SCLC patients are likely to respond to immune-based therapies, should help to extend survival in these cases with poor outcomes.
Molecular characterization of a new species in the genus Alphacoronavirus associated with mink epizootic catarrhal gastroenteritis

PubMed Central

Vlasova, Anastasia N.; Halpin, Rebecca; Wang, Shiliang; Ghedin, Elodie; Spiro, David J.

2011-01-01

A coronavirus (CoV) previously shown to be associated with catarrhal gastroenteritis in mink (Mustela vison) was identified by electron microscopy in mink faeces from two fur farms in Wisconsin and Minnesota in 1998. A pan-coronavirus and a genus-specific RT-PCR assay were used initially to demonstrate that the newly discovered mink CoVs (MCoVs) were members of the genus Alphacoronavirus. Subsequently, using a random RT-PCR approach, full-genomic sequences were generated that further confirmed that, phylogenetically, the MCoVs belonged to the genus Alphacoronavirus, with closest relatedness to the recently identified but only partially sequenced (fragments of the polymerase, and full-length spike, 3c, envelope, nucleoprotein, membrane, 3x and 7b genes) ferret enteric coronavirus (FRECV) and ferret systemic coronavirus (FRSCV). The molecular data presented in this study provide the first genetic evidence for a new coronavirus associated with epizootic catarrhal gastroenteritis outbreaks in mink and demonstrate that MCoVs possess high genomic variability and relatively low overall nucleotide sequence identities (91.7 %) between contemporary strains. Additionally, the new MCoVs appeared to be phylogenetically distant from human (229E and NL63) and other alphacoronaviruses and did not belong to the species Alphacoronavirus 1. It is proposed that, together with the partially sequenced FRECV and FRSCV, they comprise a new species within the genus Alphacoronavirus. PMID:21346029
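A figure such as the 91.7% overall nucleotide identity between contemporary MCoV strains is a pairwise identity computed over an alignment. The sketch below shows one common convention, assuming the two sequences are already aligned and that columns with a gap in either sequence are skipped. The abstract does not state which convention the authors used, and other definitions (for example, counting gaps as mismatches) give different values.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # ignore columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    print(percent_identity("ACGTACGT", "ACGTACGA"))  # 87.5
```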
SCHEMA computational design of virus capsid chimeras: calibrating how genome packaging, protection, and transduction correlate with calculated structural disruption.

PubMed

Ho, Michelle L; Adler, Benjamin A; Torre, Michael L; Silberg, Jonathan J; Suh, Junghae

2013-12-20

Adeno-associated virus (AAV) recombination can result in chimeric capsid protein subunits whose ability to assemble into an oligomeric capsid, package a genome, and transduce cells depends on the inheritance of sequence from different AAV parents. To develop quantitative design principles for guiding site-directed recombination of AAV capsids, we have examined how capsid structural perturbations predicted by the SCHEMA algorithm correlate with experimental measurements of disruption in seventeen chimeric capsid proteins. In our small chimera population, created by recombining AAV serotypes 2 and 4, we found that protection of viral genomes and cellular transduction were inversely related to calculated disruption of the capsid structure. Interestingly, however, we did not observe a correlation between genome packaging and calculated structural disruption; a majority of the chimeric capsid proteins formed at least partially assembled capsids and more than half packaged genomes, including those with the highest SCHEMA disruption. These results suggest that the sequence space accessed by recombination of divergent AAV serotypes is rich in capsid chimeras that assemble into 60-mer capsids and package viral genomes. Overall, the SCHEMA algorithm may be useful for delineating quantitative design principles to guide the creation of libraries enriched in genome-protecting virus nanoparticles that can effectively transduce cells. Such improvements to the virus design process may help advance not only gene therapy applications but also other bionanotechnologies dependent upon the development of viruses with new sequences and functions.
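The "calculated disruption" in this study is the SCHEMA score. As usually defined, it counts residue-residue contacts in the parent structure whose amino acid pair in the chimera does not occur at the same two positions in any parent. The sketch below implements that count under stated assumptions: the parents and the chimera are pre-aligned strings of equal length, and the contact list (pairs of residue indices within some distance cutoff in the parent structure) is supplied by the caller. It is an illustration, not the authors' code.

```python
from typing import Iterable, Sequence, Tuple


def schema_disruption(chimera: str, parents: Sequence[str],
                      contacts: Iterable[Tuple[int, int]]) -> int:
    """Count contacts broken by recombination (SCHEMA-style disruption E)."""
    n = len(chimera)
    if any(len(p) != n for p in parents):
        raise ValueError("chimera and parents must be aligned to the same length")
    broken = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        # A contact is intact if any single parent has the same residue pair there.
        if not any((p[i], p[j]) == pair for p in parents):
            broken += 1
    return broken


if __name__ == "__main__":
    parents = ["AAAA", "BBBB"]  # toy sequences for two parents
    contacts = [(0, 1), (1, 2), (2, 3), (0, 3)]
    print(schema_disruption("AABB", parents, contacts))  # 2
```

A parent sequence always scores 0, and the score rises as crossovers separate residues that interact in the folded structure.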
SCHEMA computational design of virus capsid chimeras: calibrating how genome packaging, protection, and transduction correlate with calculated structural disruption

PubMed Central

Ho, Michelle L.; Adler, Benjamin A.; Torre, Michael L.; Silberg, Jonathan J.; Suh, Junghae

2013-01-01

Adeno-associated virus (AAV) recombination can result in chimeric capsid protein subunits whose ability to assemble into an oligomeric capsid, package a genome, and transduce cells depends on the inheritance of sequence from different AAV parents. To develop quantitative design principles for guiding site-directed recombination of AAV capsids, we have examined how capsid structural perturbations predicted by the SCHEMA algorithm correlate with experimental measurements of disruption in seventeen chimeric capsid proteins. In our small chimera population, created by recombining AAV serotypes 2 and 4, we found that protection of viral genomes and cellular transduction were inversely related to calculated disruption of the capsid structure. Interestingly, however, we did not observe a correlation between genome packaging and calculated structural disruption; a majority of the chimeric capsid proteins formed at least partially assembled capsids and more than half packaged genomes, including those with the highest SCHEMA disruption. These results suggest that the sequence space accessed by recombination of divergent AAV serotypes is rich in capsid chimeras that assemble into 60-mer capsids and package viral genomes. Overall, the SCHEMA algorithm may be useful for delineating quantitative design principles to guide the creation of libraries enriched in genome-protecting virus nanoparticles that can effectively transduce cells. Such improvements to the virus design process may help advance not only gene therapy applications, but also other bionanotechnologies dependent upon the development of viruses with new sequences and functions. PMID:23899192
Genomic sequencing and analyses of HearMNPV—a new Multinucleocapsid nucleopolyhedrovirus isolated from Helicoverpa armigera

PubMed Central

2012-01-01

Background HearMNPV, a nucleopolyhedrovirus (NPV) that infects the cotton bollworm, Helicoverpa armigera, contains multiple rod-shaped nucleocapsids per virion (as detected by electron microscopy). HearMNPV has a different host range from that of H. armigera single-nucleocapsid NPV (HearSNPV). To better understand HearMNPV, its genome was sequenced and analyzed. Methods The morphology of HearMNPV was observed by electron microscopy. qPCR was used to determine the replication kinetics of HearMNPV in H. armigera in vivo. A random genomic library of HearMNPV was constructed using the “partial filling-in” method, and the sequence and organization of the HearMNPV genome were analyzed and compared with sequence data from other baculoviruses. Results Real-time qPCR showed that HearMNPV DNA replication included a decreasing phase, a latent phase, an exponential phase, and a stationary phase during infection of H. armigera. The HearMNPV genome consists of 154,196 base pairs, with a G + C content of 40.07%. A total of 162 putative ORFs were detected, covering 90.16% of the genome. The remaining 9.84% consists of four homologous regions (hrs) and other non-coding regions. The gene content and gene arrangement of HearMNPV were most similar to those of Mamestra configurata NPV-B (MacoNPV-B) but differed from those of HearSNPV. Comparison of the HearMNPV and MacoNPV-B genomes suggested that HearMNPV has a deletion of a 5.4-kb fragment containing five ORFs. In addition, HearMNPV orf66, the bro genes, and the hrs differ from the corresponding parts of the MacoNPV-B genome. Conclusions HearMNPV can replicate both in vivo in H. armigera and in vitro, and is a new NPV isolate distinct from HearSNPV. HearMNPV is most closely related to MacoNPV-B but has a distinct genomic structure, content, and organization. PMID:22913743
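The genome statistics reported above, G + C content and the share of the genome covered by predicted ORFs, are straightforward to compute once a sequence and ORF coordinates are available. This is a minimal sketch, assuming ORFs are given as 0-based half-open intervals on a single linear coordinate system. The abstract does not say how overlapping ORFs were counted, so merging overlaps so that shared bases count once is an assumption here.

```python
from typing import Iterable, Optional, Tuple


def gc_content(seq: str) -> float:
    """Percent G + C among unambiguous bases (A, C, G, T)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * (s.count("G") + s.count("C")) / acgt


def coding_fraction(genome_length: int, orfs: Iterable[Tuple[int, int]]) -> float:
    """Percent of the genome covered by ORFs given as half-open (start, end) intervals.

    Overlapping ORFs are merged so shared bases are counted once.
    ORFs that wrap around the origin of a circular genome are not handled.
    """
    covered = 0
    cur_start: Optional[int] = None
    cur_end: Optional[int] = None
    for start, end in sorted(orfs):
        if cur_end is None or start > cur_end:
            if cur_start is not None and cur_end is not None:
                covered += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # extend the current merged block
    if cur_start is not None and cur_end is not None:
        covered += cur_end - cur_start
    return 100.0 * covered / genome_length


if __name__ == "__main__":
    print(gc_content("GGCCAATT"))  # 50.0
    print(coding_fraction(100, [(0, 30), (20, 50), (60, 70)]))  # 60.0
```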
The mitochondrial genome of Polistes jokahamae and a phylogenetic analysis of the Vespoidea (Insecta: Hymenoptera).

PubMed

Song, Sheng-Nan; Chen, Peng-Yan; Wei, Shu-Jun; Chen, Xue-Xin

2016-07-01

The mitochondrial genome of Polistes jokahamae (Radoszkowski, 1887) (Hymenoptera: Vespidae) (GenBank accession no. KR052468) was sequenced. The sequenced length of this mitochondrial genome, including a partial A + T-rich region, is 16,616 bp. All the typical mitochondrial genes were sequenced except for three tRNAs (trnI, trnQ, and trnY) located between the A + T-rich region and nad2. At least three rearrangement events occurred in the sequenced region compared with the putative ancestral arrangement of insects, corresponding to the shuffling of trnK and trnD, translocation or remote inversion of trnY, and translocation of trnL1. All protein-coding genes start with ATN codons. Eleven protein-coding genes stop with the termination codon TAA, one with TA, and one with T. Phylogenetic analysis using the Bayesian method based on all codon positions of the 13 protein-coding genes supports the monophyly of Vespidae and Formicidae. Within the Formicidae, the Myrmicinae and Formicinae are sister groups and together are sister to the Dolichoderinae, while within the Vespidae, the Eumeninae is sister to the lineage of Vespinae + Polistinae.
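The start- and stop-codon summary above can be reproduced from protein-coding sequences with a short check. This sketch assumes each CDS string runs from the first base of the start codon to the last base before the next gene, and that a CDS whose length is not a multiple of three ends in an incomplete stop codon (TA or T), which in insect mitochondria is generally thought to be completed to TAA by polyadenylation of the transcript. TAA and TAG are treated as the complete stop codons, as in the invertebrate mitochondrial genetic code. The function names are hypothetical.

```python
from typing import Optional


def starts_with_atn(cds: str) -> bool:
    """True if the CDS begins with an ATN codon (N = A, C, G, or T)."""
    s = cds.upper()
    return len(s) >= 3 and s[:2] == "AT" and s[2] in "ACGT"


def stop_type(cds: str) -> Optional[str]:
    """Return 'TAA'/'TAG', the incomplete stop 'TA'/'T', or None if no stop is found."""
    s = cds.upper()
    remainder = len(s) % 3
    if remainder == 0:
        last = s[-3:]
        return last if last in ("TAA", "TAG") else None
    if remainder == 2 and s.endswith("TA"):
        return "TA"  # incomplete stop, presumably completed by polyadenylation
    if remainder == 1 and s.endswith("T"):
        return "T"
    return None


if __name__ == "__main__":
    for cds in ["ATGAAATAA", "ATTAAATA", "ATAAAAT"]:
        print(cds, starts_with_atn(cds), stop_type(cds))
```

Tallying `stop_type` over all 13 protein-coding genes would give counts comparable to the 11/1/1 split reported above.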
A global assembly of cotton ESTs

PubMed Central

Udall, Joshua A.; Swanson, Jordan M.; Haller, Karl; Rapp, Ryan A.; Sparks, Michael E.; Hatfield, Jamie; Yu, Yeisoo; Wu, Yingru; Dowd, Caitriona; Arpat, Aladdin B.; Sickler, Brad A.; Wilkins, Thea A.; Guo, Jin Ying; Chen, Xiao Ya; Scheffler, Jodi; Taliercio, Earl; Turley, Ricky; McFadden, Helen; Payton, Paxton; Klueva, Natalya; Allen, Randell; Zhang, Deshui; Haigler, Candace; Wilkerson, Curtis; Suo, Jinfeng; Schulze, Stefan R.; Pierce, Margaret L.; Essenberg, Margaret; Kim, HyeRan; Llewellyn, Danny J.; Dennis, Elizabeth S.; Kudrna, David; Wing, Rod; Paterson, Andrew H.; Soderlund, Cari; Wendel, Jonathan F.

2006-01-01

Approximately 185,000 Gossypium EST sequences comprising >94,800,000 nucleotides were amassed from 30 cDNA libraries constructed from a variety of tissues and organs under a range of conditions, including drought stress and pathogen challenges. These libraries were derived from allopolyploid cotton (Gossypium hirsutum; AT and DT genomes) as well as its two diploid progenitors, Gossypium arboreum (A genome) and Gossypium raimondii (D genome). ESTs were assembled using the Program for Assembling and Viewing ESTs (PAVE), resulting in 22,030 contigs and 29,077 singletons (51,107 unigenes). Further comparisons among the singletons and contigs led to recognition of 33,665 exemplar sequences that represent a nonredundant set of putative Gossypium genes containing partial or full-length coding regions and usually one or two UTRs. The assembly, along with its UniProt BLASTX hits, GO annotation, and Pfam analysis results, is freely accessible as a public resource for cotton genomics. Because ESTs from diploid and allotetraploid Gossypium were combined in a single assembly, we were in many cases able to bioinformatically distinguish duplicated genes in allotetraploid cotton and assign them to either the A or D genome. The assembly and associated information provide a framework for future investigation of cotton functional and evolutionary genomics. PMID:16478941
Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

PubMed

Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

2015-05-01

To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.
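The two alignment properties tested above have standard definitions: a column is variable if it contains more than one nucleotide state, and parsimony-informative if at least two states each occur in at least two sequences. The sketch below counts both within sliding windows. It assumes "informative" means parsimony-informative, takes a list of equal-length aligned sequences, and treats gaps and ambiguous bases as missing data. The window step used in the study is not given here, so `step` is a free parameter.

```python
from collections import Counter
from typing import Iterator, Sequence, Tuple

MISSING = set("-N?")  # characters treated as missing data (an assumption)


def site_counts(alignment: Sequence[str], start: int, end: int) -> Tuple[int, int]:
    """Return (variable, parsimony-informative) site counts for columns [start, end)."""
    variable = informative = 0
    for col in range(start, end):
        states = Counter(
            seq[col].upper() for seq in alignment if seq[col].upper() not in MISSING
        )
        if len(states) > 1:
            variable += 1
        if sum(1 for n in states.values() if n >= 2) >= 2:
            informative += 1
    return variable, informative


def sliding_windows(length: int, width: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) windows of fixed width across an alignment."""
    for start in range(0, length - width + 1, step):
        yield start, start + width


if __name__ == "__main__":
    aln = ["AAGTAC", "AAGCAC", "ACTTAC", "ACTCAT"]
    for s, e in sliding_windows(len(aln[0]), width=3, step=3):
        print((s, e), site_counts(aln, s, e))
    # (0, 3) (2, 2)
    # (3, 6) (2, 1)
```

The last column of the toy alignment is variable but not informative, because its minority state (T) appears in only one sequence.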
Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering

PubMed Central

Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor

2015-01-01

To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice. PMID:25560745
Distinct Zika Virus Lineage in Salvador, Bahia, Brazil

PubMed Central

Naccache, Samia N.; Thézé, Julien; Sardi, Silvia I.; Somasekar, Sneha; Greninger, Alexander L.; Bandeira, Antonio C.; Campos, Gubio S.; Tauro, Laura B.; Faria, Nuno R.; Pybus, Oliver G.

2016-01-01

Sequencing of isolates from patients in Bahia, Brazil, where most Zika virus cases in Brazil have been reported, resulted in 11 whole and partial Zika virus genomes. Phylogenetic analyses revealed a well-supported Bahia-specific Zika virus lineage, which indicates sustained Zika virus circulation in Salvador, Bahia’s capital city, since mid-2014. PMID:27448188
Identification of Bacterial Species in Kuwaiti Waters Through DNA Sequencing

NASA Astrophysics Data System (ADS)

Chen, K.

2017-01-01

With the objective of identifying the bacterial diversity associated with the ecosystems of various Kuwaiti seas, bacteria were cultured and isolated from three water samples. Because fecal coliforms were difficult to culture and isolate on the selective agar plates, bacterial isolates from marine agar plates were selected for molecular identification. 16S rRNA genes were successfully amplified from the genomes of the selected isolates using universal eubacterial 16S rRNA primers. The resulting amplification products were subjected to automated DNA sequencing. The partial 16S rDNA sequences obtained were compared directly with sequences in the NCBI database using BLAST, as well as with the sequences available in the Ribosomal Database Project (RDP).
Software for optimization of SNP and PCR-RFLP genotyping to discriminate many genomes with the fewest assays

PubMed Central

Gardner, Shea N; Wagner, Mark C

2005-01-01

Background Microbial forensics is important in tracking the source of a pathogen, whether the disease is a naturally occurring outbreak or part of a criminal investigation. Results A method and software, SPR Opt (SNP and PCR-RFLP Optimization), for performing a comprehensive, whole-genome analysis to forensically discriminate multiple sequences are presented. Tools for optimizing forensic typing using Single Nucleotide Polymorphism (SNP) and PCR-Restriction Fragment Length Polymorphism (PCR-RFLP) analyses across multiple isolate sequences of a species are described. The PCR-RFLP analysis includes prediction and selection of optimal primers and restriction enzymes to enable maximum isolate discrimination based on sequence information. SPR Opt calculates all SNP or PCR-RFLP variations present in the sequences, groups them into haplotypes according to their co-segregation across those sequences, and performs combinatoric analyses to determine which sets of haplotypes provide maximal discrimination among all the input sequences. It then identifies the set combinations that require querying membership in the fewest haplotypes (i.e., performing the fewest assays). These analyses highlight variable regions based on existing sequence data. These markers may also be heterogeneous among unsequenced isolates, and thus may be useful for characterizing the relationships among unsequenced as well as sequenced isolates. The predictions are multi-locus. Analyses of mumps and SARS viruses are summarized. Phylogenetic trees created from SNPs, PCR-RFLPs, and full genomes are compared for SARS virus, showing that purported phylogenies based only on SNP or PCR-RFLP variations do not match those based on multiple sequence alignment of the full genomes. Conclusion This is the first software to optimize the selection of forensic markers to maximize the information gained from the fewest assays, accepting whole or partial genome sequence data as input. As more sequence data become available for multiple strains and isolates of a species, automated computational approaches such as those described here will be essential for making sense of large amounts of information and for guiding and optimizing laboratory efforts. The software and source code for SPR Opt are publicly available and free for non-profit use at . PMID:15904493
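The SNP half of the procedure described above can be illustrated with an exhaustive search: group co-segregating variant sites into haplotypes, then find the smallest set of haplotypes that tells all input sequences apart. This is a simplified sketch, not SPR Opt itself. It ignores PCR-RFLP markers, assumes pre-aligned sequences of equal length, defines a "haplotype" as the group of SNP columns that split the isolates into the same partition, and tries combinations in order of increasing size, which is exponential and only practical for small inputs. All names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

Pattern = Tuple[int, ...]


def partition_pattern(column: Sequence[str]) -> Pattern:
    """Relabel alleles by order of first appearance, e.g. 'AAG' and 'CCT' -> (0, 0, 1)."""
    labels: Dict[str, int] = {}
    return tuple(labels.setdefault(allele, len(labels)) for allele in column)


def haplotype_groups(seqs: Sequence[str]) -> Dict[Pattern, List[int]]:
    """Map each partition pattern to the SNP column indices that share it."""
    groups: Dict[Pattern, List[int]] = {}
    for col in range(len(seqs[0])):
        column = [s[col] for s in seqs]
        if len(set(column)) > 1:  # only polymorphic columns are SNPs
            groups.setdefault(partition_pattern(column), []).append(col)
    return groups


def min_discriminating_set(seqs: Sequence[str]) -> Optional[List[List[int]]]:
    """Smallest list of haplotype groups whose combined alleles distinguish every sequence."""
    if len(seqs) < 2:
        return []  # nothing to discriminate
    groups = haplotype_groups(seqs)
    patterns = list(groups)
    for k in range(1, len(patterns) + 1):
        for combo in combinations(patterns, k):
            signatures = {tuple(p[i] for p in combo) for i in range(len(seqs))}
            if len(signatures) == len(seqs):
                return [groups[p] for p in combo]
    return None  # some sequences are identical at every SNP


if __name__ == "__main__":
    print(min_discriminating_set(["AAAA", "AAAT", "ACAA", "ACAT"]))  # [[1], [3]]
```

Any one column from each returned group would serve as the assay for that haplotype, since the columns in a group carry the same partition information.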
Characterization of Apricot pseudo-chlorotic leaf spot virus, A Novel Trichovirus Isolated from Stone Fruit Trees.

PubMed

Liberti, D; Marais, A; Svanella-Dumas, L; Dulucq, M J; Alioto, D; Ragozzino, A; Rodoni, B; Candresse, T

2005-04-01

A trichovirus closely related to Apple chlorotic leaf spot virus (ACLSV) was detected in symptomatic apricot and Japanese plum from Italy. The Sus2 isolate of this agent cross-reacted with anti-ACLSV polyclonal reagents but was not detected by broad-specificity anti-ACLSV monoclonal antibodies. It had particles with typical trichovirus morphology but, unlike ACLSV, was unable to infect Chenopodium quinoa and C. amaranticolor. The sequence of its genome (7,494 nucleotides [nt], missing only approximately 30 to 40 nt of the 5'-terminal sequence) and the partial sequence of another isolate were determined. The new virus has a genomic organization similar to that of ACLSV, with three open reading frames coding for a replication-associated protein (RNA-dependent RNA polymerase), a movement protein, and a capsid protein, respectively. However, it shared only approximately 65 to 67% nucleotide identity with sequenced isolates of ACLSV. The differences in serology, host range, genome sequence, and phylogenetic reconstructions for all viral proteins support the conclusion that this agent should be considered a new virus, for which the name Apricot pseudo-chlorotic leaf spot virus (APCLSV) is proposed. APCLSV shows substantial sequence variability and has been recovered from various Prunus sources from seven countries, an indication that it is likely to have a wide geographical distribution.
Novel, non-symbiotic isolates of Neorhizobium from a dryland agricultural soil.

PubMed

Soenens, Amalia; Imperial, Juan

2018-01-01

Semi-selective enrichment, followed by PCR screening, resulted in the successful direct isolation of fast-growing Rhizobia from a dryland agricultural soil. Over 50% of these isolates belong to the genus Neorhizobium, as concluded from partial rpoB and near-complete 16S rDNA sequence analysis. Further genotypic and genomic analysis of five representative isolates confirmed that they form a coherent group within Neorhizobium, closer to N. galegae than to the remaining Neorhizobium species, but clearly differentiated from the former, and constituting at least one new genomospecies within Neorhizobium. All the isolates lacked nod and nif symbiotic genes but contained a repABC replication/maintenance region, characteristic of rhizobial plasmids, within large contigs from their draft genome sequences. These repABC sequences were related, but not identical, to repABC sequences found in symbiotic plasmids from N. galegae, suggesting that the non-symbiotic isolates have the potential to harbor symbiotic plasmids. This is the first report of non-symbiotic members of Neorhizobium from soil.
CRISPR Detection From Short Reads Using Partial Overlap Graphs.

PubMed

Ben-Bassat, Ilan; Chor, Benny

2016-06-01

Clustered regularly interspaced short palindromic repeats (CRISPR) are structured regions in bacterial and archaeal genomes, which are part of an adaptive immune system against phages. CRISPRs are important for many microbial studies and are playing an essential role in current gene editing techniques. As such, they attract substantial research interest. The exponential growth in the amount of bacterial sequence data in recent years enables the exploration of CRISPR loci in more and more species. Most of the automated tools that detect CRISPR loci rely on fully assembled genomes. However, many assemblers do not handle repetitive regions successfully. The first tool to work directly on raw sequence data is Crass, which requires reads that are long enough to contain two copies of the same repeat. We present a method to identify CRISPR repeats from raw sequence data of short reads. The algorithm is based on an observation differentiating CRISPR repeats from other types of repeats, and it involves a series of partial constructions of the overlap graph. This enables us to avoid many of the difficulties that assemblers face, as we merely aim to identify the repeats that belong to CRISPR loci. A preliminary implementation of the algorithm shows good results and detects CRISPR repeats in cases where other existing tools fail to do so.
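The abstract does not spell out the observation the algorithm relies on, so the following is only a toy illustration of one well-known property of CRISPR arrays, not the authors' method: a repeat recurs many times, and each copy is followed by a different spacer. The sketch counts k-mers across raw reads and flags those that are both frequent and followed by diverse downstream sequence. `k`, `ext`, and both thresholds are hypothetical parameters, reverse complements are ignored, and real repeats (a few tens of base pairs long) would need much larger values than the toy ones used here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def candidate_repeat_kmers(reads: Iterable[str], k: int, ext: int,
                           min_count: int, min_distinct: int) -> List[str]:
    """Return k-mers seen at least min_count times that are followed by
    at least min_distinct different continuations of length ext."""
    counts: Dict[str, int] = defaultdict(int)
    followers: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            counts[kmer] += 1
            tail = read[i + k:i + k + ext]
            if len(tail) == ext:  # skip k-mers too close to the read end
                followers[kmer].add(tail)
    return sorted(
        kmer for kmer, c in counts.items()
        if c >= min_count and len(followers[kmer]) >= min_distinct
    )


if __name__ == "__main__":
    reads = ["GTTTCAAAAC", "GTTTCACCCG", "GTTTCATTTA", "GGGGGGGGGG"]
    print(candidate_repeat_kmers(reads, k=6, ext=3, min_count=3, min_distinct=3))
    # ['GTTTCA']: frequent and followed by three different "spacers";
    # 'GGGGGG' is also frequent but always followed by the same sequence.
```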
MICA: desktop software for comprehensive searching of DNA databases

PubMed Central

Stokes, William A; Glick, Benjamin S

2006-01-01

Background Molecular biologists work with DNA databases that often include entire genomes. A common requirement is to search a DNA database to find exact matches for a nondegenerate or partially degenerate query. The software programs available for such purposes are normally designed to run on remote servers, but an appealing alternative is to work with DNA databases stored on local computers. We describe a desktop software program termed MICA (K-Mer Indexing with Compact Arrays) that allows large DNA databases to be searched efficiently using very little memory. Results MICA rapidly indexes a DNA database. On a Macintosh G5 computer, the complete human genome could be indexed in about 5 minutes. The indexing algorithm recognizes all 15 characters of the DNA alphabet and fully captures the information in any DNA sequence, yet for a typical sequence of length L, the index occupies only about 2L bytes. The index can be searched to return a complete list of exact matches for a nondegenerate or partially degenerate query of any length. A typical search of a long DNA sequence involves reading only a small fraction of the index into memory. As a result, searches are fast even when the available RAM is limited. Conclusion MICA is suitable as a search engine for desktop DNA analysis software. PMID:17018144
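The core idea of a k-mer index searched with a partially degenerate query can be sketched with an ordinary dictionary. This is not MICA's compact-array format and makes no attempt to match its memory footprint. It assumes the database contains only A, C, G, and T, while the query may use the 15 IUPAC nucleotide codes. Candidate positions come from the k-mers that the query's first k characters can expand to, and each candidate is then verified character by character. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def build_index(db: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer in the database to its 0-based start positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(db) - k + 1):
        index[db[i:i + k]].append(i)
    return index


def search(db: str, index: Dict[str, List[int]], k: int, query: str) -> List[int]:
    """Return all start positions where the (possibly degenerate) query matches exactly."""
    query = query.upper()
    if len(query) < k:
        raise ValueError("query must be at least k characters long")
    hits = set()
    # Expand only the first k query characters to seed candidate positions.
    for concrete in product(*(IUPAC[c] for c in query[:k])):
        for pos in index.get("".join(concrete), []):
            window = db[pos:pos + len(query)]
            if len(window) == len(query) and all(
                base in IUPAC[q] for base, q in zip(window, query)
            ):
                hits.add(pos)
    return sorted(hits)


if __name__ == "__main__":
    db = "ACGTACGTTACG"
    idx = build_index(db, 3)
    print(search(db, idx, 3, "ACGY"))  # [0, 4]
```

Seeding from a single k-mer keeps the expansion small; a highly degenerate prefix (for example, several N characters) would still multiply the number of lookups.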
Evolution and Diversity of Listeria monocytogenes from Clinical and Food Samples in Shanghai, China

PubMed Central

Zhang, Jianmin; Cao, Guojie; Xu, Xuebin; Allard, Marc; Li, Peng; Brown, Eric; Yang, Xiaowei; Pan, Haijian; Meng, Jianghong

2016-01-01

Listeria monocytogenes is a significant foodborne pathogen causing severe systemic infections in humans with high mortality rates. The objectives of this work were to establish a phylogenetic framework for L. monocytogenes from China and to investigate sequence diversity among different serotypes. We selected 17 L. monocytogenes strains recovered from patients and foods in China, representing serotypes 1/2a, 1/2b, and 1/2c. Draft genome sequences were determined using Illumina MiSeq sequencing and associated protocols. Open reading frames were assigned using the NCBI Prokaryotic Genome Annotation Pipeline. Twenty-four published genomes were included for comparative genomic and phylogenetic analysis. More than 154,000 single nucleotide polymorphisms (SNPs) were identified from a multiple genome alignment and used to reconstruct a maximum likelihood phylogenetic tree. The 41 genomes were differentiated into lineages I and II, which consisted of 4 and 11 subgroups, respectively. A clinical strain from China (SHL009) showed significant SNP differences from the remaining genomes, whereas clinical strain SHL001 shared a most recent common ancestor with strain SHL017 from food. Moreover, clinical strains SHL004 and SHL015 clustered together with two strains (08-5578 and 08-5923) recovered from an outbreak in Canada. Partial sequences of a plasmid found in the Canadian strain were also present in SHL004. We investigated the presence of various genes and gene clusters associated with virulence, as well as subgroup-specific genes, including internalins, L. monocytogenes pathogenicity islands (LIPIs), L. monocytogenes genomic islands (LGIs), stress survival islet 1 (SSI-1), and the clustered regularly interspaced short palindromic repeats (CRISPR)/cas system. A novel genomic island, designated LGI-2, was identified. Comparative sequence analysis revealed differences among the L. monocytogenes strains related to virulence, survival abilities, and defense against foreign genetic elements. L. monocytogenes strains from China were genetically diverse. Strains from clinical specimens and food were closely related, suggesting foodborne transmission of human listeriosis. PMID:27499751
New Approaches to Attenuated Hepatitis A Vaccine Development: Cloning and Sequencing of Cell-Culture Adapted Viral cDNA

DTIC Science & Technology

1989-04-01

strain-specific identification of HAV in human fecal samples was a major aim of the original contract application, as clinical trials of live and...derived materials and human and primate fecal specimens. 4. We molecularly cloned and partially sequenced the genome of PA21 strain HAV, a virus...antibody. This approach revealed that 99% of the infectious virus particles present in disrupted cell lysates from the 23rd passage of persistently
Genome resolved analysis of a premature infant gut microbial community reveals a Varibaculum cambriense genome and a shift towards fermentation-based metabolism during the third week of life.

PubMed

Brown, Christopher T; Sharon, Itai; Thomas, Brian C; Castelle, Cindy J; Morowitz, Michael J; Banfield, Jillian F

2013-12-17

The premature infant gut has low individual but high inter-individual microbial diversity compared with adults. Based on prior 16S rRNA gene surveys, many species from this environment are expected to be similar to those previously detected in the human microbiota. However, the level of genomic novelty and metabolic variation of strains found in the infant gut remains relatively unexplored. To study the stability and function of early microbial colonizers of the premature infant gut, nine stool samples were taken during the third week of life of a premature male infant delivered via Caesarean section. Metagenomic sequences were assembled and binned into near-complete and partial genomes, enabling strain-level genomic analysis of the microbial community. We reconstructed eleven near-complete and six partial bacterial genomes representative of the key members of the microbial community. Twelve of these genomes share >90% putative ortholog amino acid identity with reference genomes. Manual curation of the assembly of one particularly novel genome resulted in the first essentially complete genome sequence (in three pieces, the order of which could not be determined due to a repeat) for Varibaculum cambriense (strain Dora), a medically relevant species that has been implicated in abscess formation. During the period studied, the microbial community undergoes a compositional shift, in which obligate anaerobes (fermenters) overtake Escherichia coli as the most abundant species. Other species remain stable, probably due to their ability to either respire anaerobically or grow by fermentation, and their capacity to tolerate fluctuating levels of oxygen. Metabolic predictions for V. cambriense suggest that, like other members of the microbial community, this organism is able to process various sugar substrates and make use of multiple different electron acceptors during anaerobic respiration. 
Genome comparisons within the family Actinomycetaceae reveal important differences related to respiratory metabolism and motility. Genome-based analysis provided direct insight into strain-specific potential for anaerobic respiration and yielded the first genome for the genus Varibaculum. Importantly, comparison of these de novo assembled genomes with closely related isolate genomes supported the accuracy of the metagenomic methodology. Over a one-week period, the early gut microbial community transitioned to a community with a higher representation of obligate anaerobes, emphasizing both taxonomic and metabolic instability during colonization.
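The ">90% putative ortholog amino acid identity" criterion above is an average over orthologous protein pairs. The abstract does not describe how orthologs were found or aligned, so the Python sketch below assumes that step is already done and that each pair is given as two pre-aligned protein strings (the sequences are hypothetical). It then averages per-pair identity computed over gap-free columns, one common way of reporting average amino acid identity (AAI).

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # ignore gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


def average_amino_acid_identity(ortholog_pairs: list[tuple[str, str]]) -> float:
    """Mean percent identity over pre-aligned putative ortholog protein pairs."""
    if not ortholog_pairs:
        raise ValueError("need at least one ortholog pair")
    identities = [percent_identity(a, b) for a, b in ortholog_pairs]
    return sum(identities) / len(identities)


# Hypothetical aligned ortholog pairs between a genome bin and a reference genome.
pairs = [("MKTAYIAK", "MKTAYIAK"), ("MKT-YLAK", "MKTAYIAK")]
aai = average_amino_acid_identity(pairs)
print(f"AAI = {aai:.1f}%, above 90%: {aai > 90}")
```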

Genome resolved analysis of a premature infant gut microbial community reveals a Varibaculum cambriense genome and a shift towards fermentation-based metabolism during the third week of life

PubMed Central

2013-01-01

Background The premature infant gut has low individual but high inter-individual microbial diversity compared with adults. Based on prior 16S rRNA gene surveys, many species from this environment are expected to be similar to those previously detected in the human microbiota. However, the level of genomic novelty and metabolic variation of strains found in the infant gut remains relatively unexplored. Results To study the stability and function of early microbial colonizers of the premature infant gut, nine stool samples were taken during the third week of life of a premature male infant delivered via Caesarean section. Metagenomic sequences were assembled and binned into near-complete and partial genomes, enabling strain-level genomic analysis of the microbial community. We reconstructed eleven near-complete and six partial bacterial genomes representative of the key members of the microbial community. Twelve of these genomes share >90% putative ortholog amino acid identity with reference genomes. Manual curation of the assembly of one particularly novel genome resulted in the first essentially complete genome sequence (in three pieces, the order of which could not be determined due to a repeat) for Varibaculum cambriense (strain Dora), a medically relevant species that has been implicated in abscess formation. During the period studied, the microbial community undergoes a compositional shift, in which obligate anaerobes (fermenters) overtake Escherichia coli as the most abundant species. Other species remain stable, probably due to their ability to either respire anaerobically or grow by fermentation, and their capacity to tolerate fluctuating levels of oxygen. Metabolic predictions for V. cambriense suggest that, like other members of the microbial community, this organism is able to process various sugar substrates and make use of multiple different electron acceptors during anaerobic respiration. 
Genome comparisons within the family Actinomycetaceae reveal important differences related to respiratory metabolism and motility. Conclusions Genome-based analysis provided direct insight into strain-specific potential for anaerobic respiration and yielded the first genome for the genus Varibaculum. Importantly, comparison of these de novo assembled genomes with closely related isolate genomes supported the accuracy of the metagenomic methodology. Over a one-week period, the early gut microbial community transitioned to a community with a higher representation of obligate anaerobes, emphasizing both taxonomic and metabolic instability during colonization. PMID:24451181
Thermodynamically optimal whole-genome tiling microarray design and validation.

PubMed

Cho, Hyejin; Chou, Hui-Hsien

2016-06-13

Microarrays are an efficient means of interrogating the whole transcriptome of a species. A microarray can be designed according to annotated gene sets, but the resulting microarrays cannot be used to identify novel transcripts, and this design method is not applicable to unannotated species. Alternatively, a whole-genome tiling microarray can be designed using only genomic sequences without gene annotations, and it can be used to detect novel RNA transcripts as well as known genes. The difficulty with tiling microarray design lies in the tradeoff between probe specificity and coverage of the genome. Sequence comparison methods based on BLAST or similar software are commonly employed in microarray design, but they cannot precisely determine the subtle thermodynamic competition between probe targets and partially matched probe nontargets during hybridization. Using the whole-genome thermodynamic analysis software PICKY to design tiling microarrays, we can achieve the maximum whole-genome coverage allowable under the thermodynamic constraints of each target genome. The resulting tiling microarrays are thermodynamically optimal in the sense that all selected probes share the same melting temperature separation range between their targets and closest nontargets, and no additional probes can be added without violating the specificity of the microarray to the target genome. This new design method was used to create two whole-genome tiling microarrays, for Escherichia coli MG1655 and Agrobacterium tumefaciens C58, and the experimental results validated the design.
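The coverage-versus-specificity tradeoff described above can be illustrated with a deliberately simplified Python sketch. It is not PICKY's method: PICKY models thermodynamic competition between targets and partially matched nontargets, whereas this sketch substitutes two crude stand-ins, the Wallace rule for melting temperature and exact-match uniqueness on both strands as the specificity test. It walks along a hypothetical genome string and greedily keeps non-overlapping probes that pass both filters. Every name and threshold here is an illustrative assumption, and the uniqueness scan is quadratic, so it suits only toy inputs.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(probe: str) -> int:
    """Rough Tm in deg C by the Wallace rule, 2*(A+T) + 4*(G+C); short oligos only."""
    p = probe.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def count_occurrences(genome: str, probe: str) -> int:
    """Exact (possibly overlapping) occurrences of probe on both genome strands."""
    total = 0
    for strand in (genome.upper(), reverse_complement(genome)):
        hit = strand.find(probe)
        while hit != -1:
            total += 1
            hit = strand.find(probe, hit + 1)
    return total


def greedy_tiling(genome: str, probe_len: int, tm_min: int, tm_max: int) -> list[tuple[int, str]]:
    """Walk along the genome and keep non-overlapping probes that pass both filters."""
    genome = genome.upper()
    probes = []
    pos = 0
    while pos + probe_len <= len(genome):
        probe = genome[pos:pos + probe_len]
        if tm_min <= wallace_tm(probe) <= tm_max and count_occurrences(genome, probe) == 1:
            probes.append((pos, probe))
            pos += probe_len  # jump past the accepted probe: no overlap
        else:
            pos += 1  # slide by one base and try again
    return probes


# Hypothetical toy genome; real designs use far longer probes and genomes.
print(greedy_tiling("AAAAAAAACCCCCCCC", probe_len=8, tm_min=20, tm_max=40))
```

Tightening the Tm window or the uniqueness rule drops probes and lowers coverage, which is the same tension the abstract describes, here in a much cruder form.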
Sequence diversity of wheat mosaic virus isolates.

PubMed

Stewart, Lucy R

2016-02-02

Wheat mosaic virus (WMoV), transmitted by eriophyid wheat curl mites (Aceria tosichella), is the causal agent of High Plains disease in wheat and maize. WMoV and other members of the genus Emaravirus evaded thorough molecular characterization for many years due to the experimental challenges of mite transmission and of manipulating multisegmented negative-sense RNA genomes. Recently, the complete genome sequence of a Nebraska (NE) isolate of WMoV revealed eight segments, plus a variant sequence of the nucleocapsid protein-encoding segment. Here, near-complete and partial consensus sequences of five more WMoV isolates are reported and compared with the Nebraska isolate: an Ohio maize isolate (GG1), a Kansas barley isolate (KS7), and three Ohio wheat isolates (H1, K1, W1). The results show two distinct groups of WMoV isolates: Ohio wheat isolate RNA segments had 84% or lower nucleotide sequence identity to the NE isolate, whereas GG1 and KS7 had 98% or higher nucleotide sequence identity to the NE isolate. Knowledge of the sequence variability of WMoV isolates is a step toward understanding virus biology, and potentially explaining observed biological variation. Published by Elsevier B.V.
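The grouping above rests on per-segment nucleotide identity against a reference isolate. The Python sketch below shows that comparison for a multisegmented genome. It assumes each segment has already been aligned to the reference and that partial genomes may lack some segments, which are then skipped. The segment names, the toy sequences, and the 90% cut-off are hypothetical; the cut-off simply sits between the reported 84% and 98% values.

```python
def nucleotide_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over gap-free columns of two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper()) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no gap-free columns")
    return 100.0 * sum(a == b for a, b in columns) / len(columns)


def segment_identities(reference: dict[str, str], isolate: dict[str, str]) -> dict[str, float]:
    """Identity per RNA segment, using only segments present in both genomes."""
    return {
        seg: nucleotide_identity(reference[seg], isolate[seg])
        for seg in reference
        if seg in isolate  # partial isolates may lack some segments
    }


def is_divergent(identities: dict[str, float], threshold: float = 90.0) -> bool:
    """Flag an isolate if any shared segment falls below the identity threshold."""
    return any(value < threshold for value in identities.values())


# Hypothetical toy segments (real WMoV segments are kilobases long).
reference = {"RNA1": "ACGTACGTAC", "RNA2": "GGGGCCCCAA"}
isolate = {"RNA1": "ACGTACGTAC", "RNA2": "GGGGCCCCTT"}
ids = segment_identities(reference, isolate)
print(ids, "divergent" if is_divergent(ids) else "close")
```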
Recognizing short coding sequences of prokaryotic genome using a novel iteratively adaptive sparse partial least squares algorithm

PubMed Central

2013-01-01

Background Significant efforts have been made to address the problem of identifying short genes in prokaryotic genomes. However, most known methods are not effective in detecting short genes. Because of the limited information contained in short DNA sequences, it is very difficult to accurately distinguish between protein-coding and non-coding sequences in prokaryotic genomes. We have developed a new Iteratively Adaptive Sparse Partial Least Squares (IASPLS) algorithm as the classifier to improve the accuracy of the identification process. Results For testing, we chose the short coding and non-coding sequences from seven prokaryotic organisms. We used seven feature sets (including GC content, Z-curve, etc.) of short genes. In comparison with the GeneMarkS, Metagene, Orphelia, and Heuristic Approach methods, our model achieved the best prediction performance in the identification of short prokaryotic genes. Even when we focused on the very short length group, [60–100 nt), our model provided sensitivity as high as 83.44% and specificity as high as 92.8%. These values are two or three times higher than those of three of the other methods, while Metagene fails to recognize genes in this length range. The experiments also showed that IASPLS can improve identification accuracy in comparison with other widely used classifiers, i.e. Logistic, Random Forest (RF) and K nearest neighbors (KNN). The accuracy of IASPLS was improved by 5.90% or more in comparison with the other methods. In addition to the improvements in accuracy, IASPLS required ten times less computer time than KNN or RF. Conclusions We conclude that our method is preferable for application as an automated method of short gene classification. Its linearity and easily optimized parameters make it practicable for predicting short genes of newly sequenced or under-studied species. 
Reviewers This article was reviewed by Alexey Kondrashov, Rajeev Azad (nominated by Dr J. Peter Gogarten) and Yuriy Fofanov (nominated by Dr Janet Siefert). PMID:24067167
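The abstract names GC content and the Z-curve among its seven feature sets but does not define them. The Python sketch below computes one commonly used version: overall GC content plus the three length-normalized Z-curve components (purine/pyrimidine, amino/keto, weak/strong hydrogen bonding), evaluated separately at each of the three codon positions. This yields a 10-value vector for a candidate open reading frame. The exact definitions and normalization used by the authors may differ, and the IASPLS classifier itself is not reproduced here.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    return (s.count("G") + s.count("C")) / len(s)


def z_curve(seq: str) -> tuple[float, float, float]:
    """Length-normalized Z-curve components of a DNA sequence.

    x: purine vs pyrimidine   (A+G) - (C+T)
    y: amino vs keto          (A+C) - (G+T)
    z: weak vs strong H-bond  (A+T) - (G+C)
    """
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    a, c, g, t = (s.count(base) for base in "ACGT")
    n = len(s)
    return ((a + g) - (c + t)) / n, ((a + c) - (g + t)) / n, ((a + t) - (g + c)) / n


def orf_features(seq: str) -> list[float]:
    """GC content plus phase-specific Z-curve components (1 + 3 x 3 = 10 values)."""
    features = [gc_content(seq)]
    for phase in range(3):
        features.extend(z_curve(seq[phase::3]))  # bases at codon position phase + 1
    return features


print(orf_features("ATGAAATAA"))  # hypothetical minimal ORF
```

Feature vectors like these would then be passed to whatever classifier is being evaluated.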
The reduced genomes of Parcubacteria (OD1) contain signatures of a symbiotic lifestyle

PubMed Central

Nelson, William C.; Stegen, James C.

2015-01-01

Candidate phylum OD1 bacteria (also referred to as Parcubacteria) have been identified in a broad range of anoxic environments through community survey analysis. Although none of these species have been isolated in the laboratory, several genome sequences have been reconstructed from metagenomic sequence data and single-cell sequencing. The organisms have small (generally <1 Mb) genomes with severely reduced metabolic capabilities. We have reconstructed 8 partial to near-complete OD1 genomes from oxic groundwater samples, and compared them against existing genomic data. The conserved core gene set comprises 202 genes, or ~28% of the genomic complement. “Housekeeping” genes and genes for biosynthesis of peptidoglycan and Type IV pilus production are conserved. Gene sets for biosynthesis of cofactors, amino acids, nucleotides, and fatty acids are absent entirely or greatly reduced. The only aspects of energy metabolism conserved are the non-oxidative branch of the pentose-phosphate shunt and central glycolysis. These organisms also lack some activities conserved in almost all other known bacterial genomes, including signal recognition particle, pseudouridine synthase A, and FAD synthase. Pan-genome analysis indicates broad genotypic diversity and perhaps a highly fluid gene complement, suggesting historical adaptation to a wide range of growth environments and a high degree of specialization. The genomes were examined for signatures suggesting either a free-living, streamlined lifestyle, or a symbiotic lifestyle. The lack of biosynthetic capabilities and DNA repair, along with the presence of potential attachment and adhesion proteins, suggests that the Parcubacteria are ectosymbionts or parasites of other organisms. The wide diversity of genes that potentially mediate cell-cell contact suggests a broad range of partner/prey organisms across the phylum. PMID:26257709
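A core gene set and a pan-genome are set intersection and set union over gene families. The Python sketch below assumes genes have already been clustered into families upstream, by a method the abstract does not specify, and represents each genome as a set of hypothetical family labels. It reports the core set, the pan-genome, and the fraction of each genome's families that belong to the core.

```python
def core_and_pan(genomes: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Core = families present in every genome; pan = families present in any genome."""
    if not genomes:
        raise ValueError("need at least one genome")
    families = list(genomes.values())
    core = set.intersection(*families)
    pan = set.union(*families)
    return core, pan


def core_fraction(genomes: dict[str, set[str]]) -> dict[str, float]:
    """Share of each genome's gene families that belong to the core set."""
    core, _ = core_and_pan(genomes)
    return {name: (len(core) / len(fams) if fams else 0.0) for name, fams in genomes.items()}


# Hypothetical gene-family content of three genome bins.
bins = {
    "bin1": {"famA", "famB", "famC"},
    "bin2": {"famA", "famB", "famD"},
    "bin3": {"famA", "famB", "famC", "famE"},
}
core, pan = core_and_pan(bins)
print(sorted(core), sorted(pan), core_fraction(bins))
```

When some inputs are partial genomes, as in this study, a family missing from an incomplete bin drops out of the core, so intersection-based core sizes are conservative estimates.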
The reduced genomes of Parcubacteria (OD1) contain signatures of a symbiotic lifestyle

DOE PAGES

Nelson, William C.; Stegen, James C.

2015-07-21

Candidate phylum OD1 bacteria (also referred to as Parcubacteria) have been identified in a broad range of anoxic environments through community survey analysis. Although none of these species have been isolated in the laboratory, several genome sequences have been reconstructed from metagenomic sequence data and single-cell sequencing. The organisms have small (generally <1 Mb) genomes with severely reduced metabolic capabilities. We have reconstructed 8 partial to near-complete OD1 genomes from oxic groundwater samples, and compared them against existing genomic data. The conserved core gene set comprises 202 genes, or ~28% of the genomic complement. “Housekeeping” genes and genes for biosynthesis of peptidoglycan and Type IV pilus production are conserved. Gene sets for biosynthesis of cofactors, amino acids, nucleotides, and fatty acids are absent entirely or greatly reduced. The only aspects of energy metabolism conserved are the non-oxidative branch of the pentose-phosphate shunt and central glycolysis. These organisms also lack some activities conserved in almost all other known bacterial genomes, including signal recognition particle, pseudouridine synthase A, and FAD synthase. Pan-genome analysis indicates broad genotypic diversity and perhaps a highly fluid gene complement, suggesting historical adaptation to a wide range of growth environments and a high degree of specialization. The genomes were examined for signatures suggesting either a free-living, streamlined lifestyle, or a symbiotic lifestyle. The lack of biosynthetic capabilities and DNA repair, along with the presence of potential attachment and adhesion proteins, suggests that the Parcubacteria are ectosymbionts or parasites of other organisms. The wide diversity of genes that potentially mediate cell-cell contact suggests a broad range of partner/prey organisms across the phylum.
The reduced genomes of Parcubacteria (OD1) contain signatures of a symbiotic lifestyle

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nelson, William C.; Stegen, James C.

2015-07-21

Candidate phylum OD1 bacteria (also referred to as Parcubacteria) have been identified in a broad range of anoxic environments through community survey analysis. Although none of these species have been isolated in the laboratory, several genome sequences have been reconstructed from metagenomic sequence data and single-cell sequencing. The organisms have small (generally <1 Mb) genomes with severely reduced metabolic capabilities. We have reconstructed 8 partial to near-complete OD1 genomes from oxic groundwater samples, and compared them against existing genomic data. The conserved core gene set comprises 202 genes, or ~28% of the genomic complement. ‘Housekeeping’ genes and genes for biosynthesis of peptidoglycan and Type IV pilus production are conserved. Gene sets for biosynthesis of cofactors, amino acids, nucleotides and fatty acids are absent entirely or greatly reduced. The only aspects of energy metabolism conserved are the non-oxidative branch of the pentose-phosphate shunt and central glycolysis. These organisms also lack some activities conserved in almost all other known bacterial genomes, including signal recognition particle, pseudouridine synthase A, and FAD synthase. Pan-genome analysis indicates broad genotypic diversity and perhaps a highly fluid gene complement, suggesting historical adaptation to a wide range of growth environments and a high degree of specialization. The genomes were examined for signatures suggesting either a free-living, streamlined lifestyle or a symbiotic lifestyle. The lack of biosynthetic capabilities and DNA repair, along with the presence of potential attachment and adhesion proteins, suggests the Parcubacteria are ectosymbionts or parasites of other organisms. The wide diversity of genes that potentially mediate cell-cell contact suggests a broad range of partner/prey organisms across the phylum.
Phylogenomics of "Candidatus Hepatoplasma crinochetorum," a lineage of mollicutes associated with noninsect arthropods.

PubMed

Leclercq, Sébastien; Dittmer, Jessica; Bouchon, Didier; Cordaux, Richard

2014-02-01

Bacterial gut communities of arthropods are highly diverse and tightly related to host feeding habits. However, our understanding of the origin and role of the symbionts is often hindered by the lack of genetic information. "Candidatus Hepatoplasma crinochetorum" is a Mollicutes symbiont found in the midgut glands of terrestrial isopods. The only available nucleotide sequence for this symbiont is a partial 16S rRNA gene sequence. Here, we present the 657,101 bp assembled genome of Candidatus Hepatoplasma crinochetorum isolated from the terrestrial isopod Armadillidium vulgare. While previous 16S rRNA gene-based analyses have provided inconclusive results regarding the phylogenetic position of Candidatus Hepatoplasma crinochetorum within Mollicutes, we performed a phylogenomic analysis of 127 Mollicutes orthologous genes, which confidently places the species as a sister group to the Hominis group of Mycoplasma. Several genome properties of Candidatus Hepatoplasma crinochetorum are also highlighted compared with other Mollicutes genomes, including adjacent tryptophan tRNA genes, which further our understanding of the evolutionary dynamics of these genes in Mollicutes, and the presence of a probably inactivated CRISPR/Cas system, which testifies to past interactions between Candidatus Hepatoplasma crinochetorum and mobile genetic elements, despite their current absence from this streamlined genome. Overall, the availability of the complete genome sequence of Candidatus Hepatoplasma crinochetorum paves the way for further investigation of its ecology and evolution.
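Multi-gene phylogenomic analyses such as the 127-gene analysis above often start by concatenating per-gene alignments into a single supermatrix. The abstract does not say whether the authors concatenated genes, so this is one common approach rather than their method. The Python sketch below assumes each gene is already aligned and stored as a taxon-to-sequence mapping (names and sequences are hypothetical). Taxa missing from a gene are padded with gaps of that gene's alignment width, so every row in the supermatrix has the same length. Tree inference itself would be done with dedicated software.

```python
def concatenate_alignments(gene_alignments: dict[str, dict[str, str]]) -> dict[str, str]:
    """Build a supermatrix by concatenating per-gene alignments taxon by taxon.

    Genes are joined in sorted name order; a taxon absent from a gene gets
    a run of '-' as long as that gene's alignment.
    """
    taxa = sorted({taxon for aln in gene_alignments.values() for taxon in aln})
    parts: dict[str, list[str]] = {taxon: [] for taxon in taxa}
    for gene in sorted(gene_alignments):
        aln = gene_alignments[gene]
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"alignment for {gene} is empty or has unequal lengths")
        width = widths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Hypothetical aligned protein fragments for two genes.
genes = {
    "geneA": {"taxon1": "MK", "taxon2": "MR"},
    "geneB": {"taxon1": "LLV"},
}
print(concatenate_alignments(genes))
```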
Phylogenetic relationships among superfamilies of Neritimorpha (Mollusca: Gastropoda).

PubMed

Uribe, Juan E; Colgan, Don; Castro, Lyda R; Kano, Yasunori; Zardoya, Rafael

2016-11-01

Despite the extraordinary morphological and ecological diversity of Neritimorpha, few studies have focused on the phylogenetic relationships of this lineage of gastropods, which includes four extant superfamilies: Neritopsoidea, Hydrocenoidea, Helicinoidea, and Neritoidea. Here, the nucleotide sequences of the complete mitochondrial genomes of Georissa bangueyensis (Hydrocenoidea), Neritina usnea (Neritoidea), and Pleuropoma jana (Helicinoidea) and the nearly complete mt genomes of Titiscania sp. (Neritopsoidea) and Theodoxus fluviatilis (Neritoidea) were determined. Phylogenetic reconstructions using probabilistic methods were based on mitochondrial (13 protein-coding genes and two rRNA genes), nuclear (partial 28S rRNA, 18S rRNA, actin, and histone H3 genes) and combined sequence data sets. All phylogenetic analyses except one converged on a single, highly supported tree in which Neritopsoidea was recovered as the sister group of a clade including Helicinoidea as the sister group of Hydrocenoidea and Neritoidea. This topology agrees with the fossil record and supports at least three independent invasions of land by neritimorph snails. The mitochondrial genomes of Titiscania sp., G. bangueyensis, N. usnea, and T. fluviatilis share the same gene organization previously described for Nerita mt genomes, whereas that of P. jana has undergone major rearrangements. We sequenced about half of the mitochondrial genome of another species of Helicinoidea, Viana regina, and confirmed that this species shares the highly derived gene order of P. jana. Copyright © 2016 Elsevier Inc. All rights reserved.
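One simple way to quantify how far a rearranged mitochondrial gene order, such as that of P. jana, departs from a reference order is the breakpoint count. This is the number of gene adjacencies in one order that are missing from the other. The abstract does not say the authors used this measure. The Python sketch below treats both genomes as circular, ignores gene strand, and assumes gene names are unique (for example, duplicated tRNAs distinguished as trnL1 and trnL2). The gene orders shown are hypothetical placeholders.

```python
def adjacencies(order: list[str]) -> set[frozenset[str]]:
    """Unordered neighbor pairs in a circular gene order (strand ignored)."""
    n = len(order)
    return {frozenset((order[i], order[(i + 1) % n])) for i in range(n)}


def breakpoint_distance(order_a: list[str], order_b: list[str]) -> int:
    """Number of adjacencies of order_a that are absent from order_b."""
    if len(set(order_a)) != len(order_a) or set(order_a) != set(order_b):
        raise ValueError("orders must contain the same, uniquely named genes")
    return len(adjacencies(order_a) - adjacencies(order_b))


# Hypothetical gene orders (placeholders, not real mitogenome layouts).
reference = ["cox1", "cox2", "atp8", "atp6", "cox3"]
rearranged = ["cox1", "atp8", "cox2", "atp6", "cox3"]
print(breakpoint_distance(reference, rearranged))
```

Because adjacencies are unordered and wrap around, rotating or reading a circular order in reverse yields a distance of zero, as expected for identical gene organizations.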
Flavivirus and Filovirus EvoPrinters: New alignment tools for the comparative analysis of viral evolution.

PubMed

Brody, Thomas; Yavatkar, Amarendra S; Park, Dong Sun; Kuzin, Alexander; Ross, Jermaine; Odenwald, Ward F

2017-06-01

Flavivirus and Filovirus infections are serious epidemic threats to human populations. Multi-genome comparative analysis of these evolving pathogens affords a view of their essential, conserved sequence elements as well as progressive evolutionary changes. While phylogenetic analysis has yielded important insights, the growing number of available genomic sequences makes comparisons between hundreds of viral strains challenging. We report here a new approach for the comparative analysis of these hemorrhagic fever viruses that can superimpose an unlimited number of one-on-one alignments to identify important features within genomes of interest. We have adapted EvoPrinter alignment algorithms for the rapid comparative analysis of Flavivirus or Filovirus sequences including Zika and Ebola strains. The user can input a full genome or partial viral sequence and then view either individual comparisons or generate color-coded readouts that superimpose hundreds of one-on-one alignments to identify unique or shared identity SNPs that reveal ancestral relationships between strains. The user can also opt to select a database genome in order to access a library of pre-aligned genomes of either 1,094 Flaviviruses or 460 Filoviruses for rapid comparative analysis with all database entries or a select subset. Using EvoPrinter search and alignment programs, we show the following: 1) superimposing alignment data from many related strains identifies lineage identity SNPs, which enable the assessment of sublineage complexity within viral outbreaks; 2) whole-genome SNP profile screens uncover novel Dengue2 and Zika recombinant strains and their parental lineages; 3) differential SNP profiling identifies host cell A-to-I hyper-editing within Ebola and Marburg viruses, and 4) hundreds of superimposed one-on-one Ebola genome alignments highlight ultra-conserved regulatory sequences, invariant amino acid codons and evolutionarily variable protein-encoding domains within a single genome. 
EvoPrinter allows assessment of lineage complexity within Flavivirus or Filovirus outbreaks, identifies recombinant strains, highlights sequences that have undergone host cell A-to-I editing, and identifies unique input and database SNPs within highly conserved sequences. EvoPrinter's ability to superimpose alignment data from hundreds of strains onto a single genome has allowed us to identify unique Zika virus sublineages that are currently spreading in South, Central and North America, the Caribbean, and China. This new set of integrated alignment programs should serve as a useful addition to existing tools for the comparative analysis of these viruses.
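The core idea of superimposing many one-on-one alignments onto a single input genome can be sketched as a per-position count: for each base of the input, how many other strains carry the same base. This is not EvoPrinter's implementation. The Python sketch below assumes every other strain has already been aligned to the input's coordinates (equal length, with '-' where unaligned) and uses hypothetical sequences. Positions matched by every strain are invariant, and positions matched by none are SNPs unique to the input.

```python
def identity_profile(input_seq: str, aligned_others: list[str]) -> list[int]:
    """Per input position, count how many other sequences carry the same base."""
    n = len(input_seq)
    if any(len(other) != n for other in aligned_others):
        raise ValueError("all sequences must be aligned to the input's coordinates")
    query = input_seq.upper()
    others = [other.upper() for other in aligned_others]
    return [sum(1 for other in others if other[i] == base) for i, base in enumerate(query)]


def classify_positions(profile: list[int], n_others: int) -> tuple[list[int], list[int]]:
    """Split positions into invariant ones and ones where the input base is unique."""
    invariant = [i for i, count in enumerate(profile) if count == n_others]
    unique_to_input = [i for i, count in enumerate(profile) if count == 0]
    return invariant, unique_to_input


# Hypothetical input genome fragment and three strains pre-aligned to it.
query = "ACGTA"
strains = ["ACGTT", "ACCTT", "ACGTT"]
profile = identity_profile(query, strains)
print(profile, classify_positions(profile, len(strains)))
```

Restricting the comparison to a chosen subset of strains, for example one lineage, would in the same way reveal positions shared by that lineage but not by others.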
The nearly complete mitochondrial genome of a stonefly species, Styloperla sp. (Plecoptera: Styloperlidae).

PubMed

Chen, Zhi-Teng; Wu, Hai-Yan; Du, Yu-Zhou

2016-07-01

We report the nearly complete mitochondrial genome of a stonefly species, Styloperla sp. (Plecoptera: Styloperlidae), which is a circular molecule of 15,416 bp in length and consists of 13 protein-coding genes, 2 ribosomal RNAs, 20 transfer RNAs and a partial control region (645 bp). Using the 13 protein-coding genes of 8 stoneflies and 3 other related species, we constructed a phylogenetic tree to verify the accuracy of the newly determined mitogenome sequence. Our results provide basic data for further study of phylogeny in Plecoptera.
Avian sarcoma virus 17 carries the jun oncogene.

PubMed Central

Maki, Y; Bos, T J; Davis, C; Starbuck, M; Vogt, P K

1987-01-01

Biologically active molecular clones of avian sarcoma virus 17 (ASV 17) contain a replication-defective proviral genome of 3.5 kilobases (kb). The genome retains partial gag and env sequences, which flank a cell-derived putative oncogene of 0.93 kb, termed jun. The jun gene lacks preserved coding domains of tyrosine-specific protein kinases. It also shows no significant nucleic acid homology with other known oncogenes. The probable transformation-specific protein in ASV 17-transformed cells is a 55-kDa gag-jun fusion product. PMID:3033666
[Molecular identification and detection of moon jellyfish (Aurelia sp.) based on partial sequencing of mitochondrial 16S rDNA and COI].

PubMed

Wang, Jian-Yan; Zhen, Yu; Wang, Guo-shan; Mi, Tie-Zhu; Yu, Zhi-gang

2013-03-01

Using the moon jellyfish Aurelia sp., which is common in our coastal waters, as the test organism, we extracted its genomic DNA and PCR-amplified partial sequences of mt-16S rDNA (650 bp) and mt-COI (709 bp); after purification, cloning, and sequencing, the sequences obtained were analyzed with BLASTn. Regions differing most from those of other jellyfish were chosen, and eight specific primers were designed for the mt-16S rDNA and for the mt-COI of Aurelia sp., respectively. Specificity tests showed that primer AS3 (mt-16S rDNA) and primer AC3 (mt-COI) performed excellently in rapidly detecting the target jellyfish among Rhopilema esculentum, Nemopilema nomurai, Cyanea nozakii, Acromitus sp., and Aurelia sp. Techniques for the molecular identification and detection of moon jellyfish were thus preliminarily established. They overcome the limitations of classical morphological identification of Aurelia sp. and allow Aurelia sp. to be found in samples more quickly and accurately.
A prospective pilot study of genome-wide exome and transcriptome profiling in patients with small cell lung cancer progressing after first-line therapy

PubMed Central

Byron, Sara A.; Aldrich, Jessica; Sangal, Ashish; Barilla, Heather; Kiefer, Jeffrey A.; Carpten, John D.; Craig, David W.; Whitsett, Timothy G.

2017-01-01

Background Small cell lung cancer (SCLC) that has progressed after first-line therapy is an aggressive disease with few effective therapeutic strategies. In this prospective study, we employed next-generation sequencing (NGS) to identify therapeutically actionable alterations to guide treatment for advanced SCLC patients. Methods Twelve patients with SCLC were enrolled after failing platinum-based chemotherapy. Following informed consent, genome-wide exome and RNA sequencing were performed in a CLIA-certified, CAP-accredited environment. Actionable targets were identified and therapeutic recommendations made from a pharmacopeia of FDA-approved drugs. Clinical response to genomically guided treatment was evaluated by Response Evaluation Criteria in Solid Tumors (RECIST) 1.1. Results The study completed its accrual goal of 12 evaluable patients. The minimum tumor content for successful NGS was 20%, with a median turnaround time from sample collection to genomics-based treatment recommendation of 27 days. At least two clinically actionable targets were identified in each patient, and six patients (50%) received treatment identified by NGS. Two had partial responses by RECIST 1.1 on a clinical trial involving a PD-1 inhibitor + irinotecan (indicated by MLH1 alteration). The remaining patients had clinical deterioration before NGS-recommended therapy could be initiated. Conclusions Comprehensive genomic profiling using NGS identified clinically actionable alterations in SCLC patients who progressed on initial therapy. Recommended PD-1 therapy generated partial responses in two patients. Earlier access to NGS-guided therapy, along with improved understanding of which SCLC patients are likely to respond to immune-based therapies, should help to extend survival in these cases with poor outcomes. PMID:28586388
An umbra-like virus of papaya discovered in Ecuador: detection, occurrence and phylogenetic relatedness

USDA-ARS?s Scientific Manuscript database

Double-stranded RNA (dsRNA) extractions from papaya leaves infected with Papaya ringspot virus (PRSV) revealed the presence of an unusual 4 kb band, in addition to the presumed PRSV-associated 10 kb band. Partial sequence of RT-PCR products from the 4 kb dsRNA revealed homology to genomes of several me...
Characterization and use of new monoclonal antibodies to CD11c, CD14, and CD163 to analyze the phenotypic complexity of ruminant monocyte subsets

USDA-ARS?s Scientific Manuscript database

The sequencing of the bovine genome and development of mass spectrometry, in conjunction with flow cytometry (FC), have afforded an opportunity to complete the characterization of the specificity of monoclonal antibodies (mAbs), only partially characterized during previous international workshops fo...
DNA assembly technique simplifies the construction of infectious clone of fowl adenovirus.

PubMed

Zou, Xiao-Hui; Bi, Zhi-Xiang; Guo, Xiao-Juan; Zhang, Zun; Zhao, Yang; Wang, Min; Zhu, Ya-Lu; Jie, Hong-Ying; Yu, Yang; Hung, Tao; Lu, Zhuo-Zhuang

2018-07-01

Plasmids bearing an adenovirus genome are generally constructed by homologous recombination in E. coli strain BJ5183. Here, we used the Gibson assembly technique to generate an infectious clone of fowl adenovirus 4 (FAdV-4). Primers flanked with partial inverted terminal repeat (ITR) sequences of FAdV-4 were synthesized to amplify a plasmid backbone containing a kanamycin-resistance gene and the pBR322 origin (KAN-ORI). DNA assembly was carried out by combining the KAN-ORI fragment, viral genomic DNA and DNA assembly master mix. E. coli competent cells were transformed with the assembled product, and the resulting plasmids (pKFAV4) were extracted and confirmed to contain the viral genome by restriction analysis and sequencing. Virus was successfully rescued from chicken LMH cells transfected with linearized pKFAV4. This approach was further verified by cloning the human adenovirus 5 genome. Our results indicate that the DNA assembly technique simplifies the construction of infectious adenovirus clones, suggesting possible applications in classical and reverse genetics of viruses. Copyright © 2018 Elsevier B.V. All rights reserved.
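The primer design step above, in which backbone primers carry tails matching the viral genome ends, can be sketched in Python as follows. In the circular assembly product the backbone sits between the genome's right end and its left end, so the forward primer is the last `overlap` bases of the genome followed by the first `anneal` bases of the backbone. The reverse primer is the reverse complement of the last `anneal` backbone bases followed by the first `overlap` genome bases. Because adenovirus genome ends are inverted terminal repeats, these tails are partial ITR sequences. The default overlap and annealing lengths and the toy sequences are assumptions, not values from the study.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def backbone_primers(backbone: str, genome: str, overlap: int = 25, anneal: int = 20) -> tuple[str, str]:
    """Primers that amplify a backbone with tails homologous to the genome ends.

    Intended product (top strand): genome[-overlap:] + backbone + genome[:overlap]
    """
    if len(backbone) < anneal or len(genome) < overlap:
        raise ValueError("sequences too short for the requested overlap/anneal lengths")
    forward = genome[-overlap:] + backbone[:anneal]  # 5' tail = right end of genome
    reverse = reverse_complement(backbone[-anneal:] + genome[:overlap])  # tail = left end
    return forward, reverse


# Hypothetical toy sequences with short overlaps for readability.
backbone = "GATTACAGG"
genome = "CCCATGTTTAAA"
fwd, rev = backbone_primers(backbone, genome, overlap=3, anneal=4)
print(fwd, rev)
```

PCR with these two primers would yield the backbone flanked by the two genome-end overlaps, which is what an assembly reaction needs to join the backbone to the viral DNA.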
Genome projects and the functional-genomic era.

PubMed

Sauer, Sascha; Konthur, Zoltán; Lehrach, Hans

2005-12-01

The problems we face today in public health as a result of the (fortunately) increasing age of the population and the requirements of developing countries create an urgent need for new and innovative approaches in medicine and in agronomics. Genomic and functional genomic approaches have great potential to at least partially solve these problems in the future. Important progress has been made by procedures to decode the genomic information of humans and of other key organisms. The basic comprehension of genomic information (and its transfer) should now allow us to pursue the next important step in life science, eventually leading to a basic understanding of biological information flow: the elucidation of the function of all genes and correlative products encoded in the genome, as well as the discovery of their interactions in a molecular context and their responses to environmental factors. As a result of the sequencing projects, we are now able to ask important questions about sequence variation and can start to comprehensively study the function of expressed genes at different levels, such as RNA, protein or the cell, in a systematic context including underlying networks. In this article we review and comment on current trends in large-scale systematic biological research. Particular emphasis is placed on technology developments that can provide the means to accomplish the tasks of future lines of functional genomics.
Bigfoot, a new family of MITE elements characterized from the Medicago genus.

PubMed

Charrier, B; Foucher, F; Kondorosi, E; d'Aubenton-Carafa, Y; Thermes, C; Kondorosi, A; Ratet, P

1999-05-01

We have characterized from the legume plant Medicago a new family of miniature inverted-repeat transposable elements (MITE), called the Bigfoot transposable elements. Two of these insertion elements are present only in a single allele of two different M. sativa genes. Using a PCR strategy we have isolated 19 other Bigfoot elements from the M. sativa and M. truncatula genomes. They differ from the previously characterized MITEs by their sequence, a target site of 9 bp and a partially clustered genomic distribution. In addition, we show that they exhibit a significantly stable secondary structure. These elements may represent up to 0.1% of the genome of the outcrossing Medicago sativa but are present at a reduced copy number in the genome of the autogamous M. truncatula plant, revealing major differences in the genome organization of these two plants.
Molecular and physiological properties of bacteriophages from North America and Germany affecting the fire blight pathogen Erwinia amylovora

PubMed Central

Müller, Ina; Lurz, Rudi; Kube, Michael; Quedenau, Claudia; Jelkmann, Wilhelm; Geider, Klaus

2011-01-01

For possible control of fire blight affecting apple and pear trees, we characterized Erwinia amylovora phages from North America and Germany. The genome size determined by electron microscopy (EM) was confirmed by sequence data, and major coat proteins were identified from gel bands by mass spectrometry. Based on their morphology in EM data, φEa1h and φEa100 were assigned to the Podoviridae and φEa104 and φEa116 to the Myoviridae. Host ranges were essentially confined to E. amylovora, although strains of the species Erwinia pyrifoliae, E. billingiae and even Pantoea stewartii were partially sensitive. The phages φEa1h and φEa100 were dependent on the amylovoran capsule of E. amylovora; φEa104 and φEa116 were not. The Myoviridae efficiently lysed their hosts and protected apple flowers against E. amylovora significantly better than the Podoviridae, and should be preferred in biocontrol experiments. We have also isolated and partially characterized E. amylovora phages from apple orchards in Germany. They belong to the Podoviridae or Myoviridae, with a host range similar to that of the phages isolated in North America. In EM measurements, the genomes of the Podoviridae were smaller than those of the Myoviridae from North America and Germany, which differed from each other in corresponding nucleotide sequences. PMID:21791029

Easy and accurate reconstruction of whole HIV genomes from short-read sequence data with shiver.

PubMed

Wymant, Chris; Blanquart, François; Golubchik, Tanya; Gall, Astrid; Bakker, Margreet; Bezemer, Daniela; Croucher, Nicholas J; Hall, Matthew; Hillebregt, Mariska; Ong, Swee Hoe; Ratmann, Oliver; Albert, Jan; Bannert, Norbert; Fellay, Jacques; Fransen, Katrien; Gourlay, Annabelle; Grabowski, M Kate; Gunsenheimer-Bartmeyer, Barbara; Günthard, Huldrych F; Kivelä, Pia; Kouyos, Roger; Laeyendecker, Oliver; Liitsola, Kirsi; Meyer, Laurence; Porter, Kholoud; Ristola, Matti; van Sighem, Ard; Berkhout, Ben; Cornelissen, Marion; Kellam, Paul; Reiss, Peter; Fraser, Christophe

2018-01-01

Studying the evolution of viruses and their molecular epidemiology relies on accurate viral sequence data, so that small differences between similar viruses can be meaningfully interpreted. Despite its higher throughput and more detailed minority variant data, next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of large between- and within-host diversity, including frequent indels, may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by aligning the reads to themselves, producing a set of sequences called contigs. However, contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems we developed the tool shiver to pre-process reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with the user's choice of existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We used shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read whole-genome data produced with the Illumina platform, for sixty-five existing publicly available samples and fifty new samples. We show the systematic superiority of mapping to shiver's constructed reference compared with mapping the same reads to the closest of 3,249 real references: median values of 13 bases called differently and more accurately, 0 bases called differently and less accurately, and 205 bases of missing sequence recovered. We also successfully applied shiver to whole-genome samples of Hepatitis C Virus and Respiratory Syncytial Virus. shiver is publicly available from https://github.com/ChrisHIV/shiver.
Viruses of invasive Argentine ants from the European Main supercolony: characterization, interactions and evolution.

PubMed

Viljakainen, Lumi; Holmberg, Ida; Abril, Sílvia; Jurvansuu, Jaana

2018-06-25

The Argentine ant (Linepithema humile) is a highly invasive pest, yet very little is known about its viruses. We analysed individual RNA-sequencing data from 48 Argentine ant queens to identify and characterize their viruses. We discovered eight complete RNA virus genomes, all from different virus families, and one putative partial entomopoxvirus genome. Seven of the nine virus sequences were found in ant samples spanning 7 years, suggesting that these viruses may cause long-term infections within the supercolony. Although all nine viruses successfully infect Argentine ants, they have very different characteristics, such as genome organization, prevalence, loads, activation frequencies and rates of evolution. The eight RNA viruses formed a total of 23 different virus combinations which, based on statistical analysis, were non-random, suggesting that virus compatibility is a factor in infections. We also searched for virus sequences in New Zealand and Californian Argentine ant RNA-sequencing data and discovered that many of the viruses are found on different continents, yet some viruses are prevalent only in certain colonies. The viral loads described here most probably represent a normal asymptomatic level of infection; nevertheless, detailed knowledge of Argentine ant viruses may enable the design of viral biocontrol methods against this pest.
Construction and sequencing of an infectious clone of the goose embryo-adapted Muscovy duck parvovirus vaccine strain FZ91-30.

PubMed

Wang, Jianye; Huang, Yu; Zhou, Mingxu; Hardwidge, Philip R; Zhu, Guoqiang

2016-06-21

Muscovy duck parvovirus (MDPV) is the etiological agent of Muscovy duckling parvoviral disease, which is characterized by diarrhea, locomotive dysfunction, stunting, and death in young ducklings, and causes substantial economic losses in the Muscovy duck industry worldwide. FZ91-30 is an attenuated vaccine strain that is safe and immunogenic in ducklings, but its genomic information and the molecular mechanism underlying its attenuation are not understood. The FZ91-30 strain was propagated in 11-day-old embryonated goose eggs, and viral particles were purified from the pooled allantoic fluid by differential centrifugation and ultracentrifugation. Single-stranded genomic DNA was extracted and annealed to form double-stranded DNA. Digestion of the dsDNA with NcoI yielded two sub-genomic fragments, which were cloned separately into the modified plasmid pBluescript II SK, generating plasmids pBSKNL and pBSKNR. The sub-genomic plasmid clones were sequenced and then combined to construct the plasmid pFZ, which contained the entire genome of strain FZ91-30. The complete genome sequences of strains FM and YY and partial genome sequences of other strains were retrieved from GenBank for sequence comparison. The plasmid pFZ containing the entire genome of FZ91-30 was transfected into 11-day-old embryonated goose eggs via the chorioallantoic membrane route to rescue infectious virus. A genetic marker was introduced into the rescued virus to distinguish it from its parental virus. The genome of FZ91-30 consists of 5,131 nucleotides and has 98.9 % similarity to the FM strain. The inverted terminal repeats (ITR) are 456 nucleotides in length, 14 nucleotides longer than that of Goose parvovirus (GPV). The exterior 415 nucleotides of the ITR form a hairpin structure, and the interior 41 nucleotides constitute the D sequence, a reverse complement of the D' sequence at the 3' ITR. Amino acid sequence alignment of the VP1 proteins between FZ91-30 and five pathogenic MDPV strains revealed that FZ91-30 had five mutations: two in the unique region of the VP1 protein (VP1u) and three in VP3. Sequence alignment of the Rep1 proteins revealed two amino acid alterations in FZ91-30, both of which were conserved in the two pathogenic strains YY and P. Transfection of the plasmid pFZ into 11-day-old embryonated goose eggs generated infectious virus with biological properties similar to those of the parental strain. The amino acid mutations identified in the VP1 and Rep1 proteins may contribute to the attenuation of FZ91-30 in Muscovy ducklings. Plasmid transfection in embryonated goose eggs was suitable for rescue of infectious MDPV.
Amalga-like virus infecting Antonospora locustae, a microsporidian pathogen of grasshoppers, plus related viruses associated with other arthropods.

PubMed

Pyle, Jesse D; Keeling, Patrick J; Nibert, Max L

2017-04-02

A previously reported Expressed Sequence Tag (EST) library from spores of the microsporidian Antonospora locustae includes a number of clones with sequence similarities to plant amalgaviruses. Reexamining the sequence accessions from that library, we found additional such clones, contributing to a 3247-nt contig that approximates the length of an amalga-like virus genome. Using A. locustae spores stored from that previous study, and new ones obtained from the same source, we newly visualized the putative dsRNA genome of this virus and obtained amplicons yielding a 3387-nt complete genome sequence. Phylogenetic analyses suggested it as the prototype strain of a new genus in the family Amalgaviridae. The genome contains two partially overlapping long ORFs, with downstream ORF2 in the +1 frame relative to ORF1 and a proposed motif for +1 ribosomal frameshifting in the region of overlap. Subsequent database searches using the predicted fusion protein sequence of this new amalga-like virus identified related sequences in the transcriptome of a basal hexapod, the springtail species Tetrodontophora bielanensis. We speculate that this second new amalga-like virus (contig length, 3475 nt) likely also derived from a microsporidian, or a related organism, that was associated with the springtail specimens at the time of sampling for transcriptome analysis. Other findings of interest include evidence that the ORF1 translation products of these two new amalga-like viruses contain a central region of predicted α-helical coiled coil, as recently reported for plant amalgaviruses, and transcriptome-based evidence for another new amalga-like virus in the transcriptome of another basal hexapod, the two-pronged bristletail species Campodea augens. Copyright © 2017 Elsevier B.V. All rights reserved.
Prevalence and genome characteristics of canine astrovirus in southwest China.

PubMed

Li, Mingxiang; Yan, Nan; Ji, Conghui; Wang, Min; Zhang, Bin; Yue, Hua; Tang, Cheng

2018-05-30

The aim of this study was to investigate canine astrovirus (CaAstV) infection in southwest China. We collected 107 faecal samples from domestic dogs with obvious diarrhoea. Forty-two diarrhoeic samples (39.3 %) were positive for CaAstV by RT-PCR, and 41/42 samples showed co-infection with canine coronavirus (CCoV), canine parvovirus-2 (CPV-2) and canine distemper virus (CDV). Phylogenetic analysis based on 26 CaAstV partial ORF1a and ORF1b sequences revealed that most CaAstV strains showed unique evolutionary features. Interestingly, putative recombination events were observed among four of the five complete ORF2 sequences cloned in this study, and three of the five complete ORF2 sequences formed a single unique group, suggesting that these strains could be a novel genotype. We successfully sequenced the complete genome of one CaAstV strain (designated 2017/44/CHN), which was 6628 nt in length. The features of this genome include putative recombination events in the ORF1a, ORF1b and ORF2 genes, while the ORF2 gene had a continuous insertion of 7 aa in region II compared with the other complete ORF2 sequences available in GenBank. Phylogenetic analysis showed that 2017/44/CHN formed a single group based on genome sequences, suggesting that this strain might be a novel genotype. The results of this study revealed that CaAstV circulates widely in diarrhoeic dogs in southwest China and exhibits unique evolutionary events. To the best of our knowledge, this is the first report of recombination events in CaAstV, and it contributes to further understanding of the genetic evolution of CaAstV.
The genetic map of finger millet, Eleusine coracana.

PubMed

Dida, Mathews M; Srinivasachary; Ramakrishnan, Sujatha; Bennetzen, Jeffrey L; Gale, Mike D; Devos, Katrien M

2007-01-01

Restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP), expressed sequence tag (EST), and simple sequence repeat (SSR) markers were used to generate a genetic map of the tetraploid finger millet (Eleusine coracana subsp. coracana) genome (2n = 4x = 36). Because levels of variation in finger millet are low, the map was generated in an inter-subspecific F(2) population from a cross between E. coracana subsp. coracana cv. Okhale-1 and its wild progenitor E. coracana subsp. africana acc. MD-20. Duplicated loci were used to identify homoeologous groups. Assignment of linkage groups to the A and B genomes was done by comparing the hybridization patterns of probes in Okhale-1, MD-20, and Eleusine indica acc. MD-36. E. indica is the A genome donor to E. coracana. The maps span 721 cM on the A genome and 787 cM on the B genome and cover all 18 finger millet chromosomes, at least partially. To facilitate the use of marker-assisted selection in finger millet, a first set of 82 SSR markers was developed. The SSRs were identified in small-insert genomic libraries generated using methylation-sensitive restriction enzymes. Thirty-one of the SSRs were mapped. Application of the maps and markers in hybridization-based breeding programs will expedite the improvement of finger millet.
Metaxa: a software tool for automated detection and discrimination among ribosomal small subunit (12S/16S/18S) sequences of archaea, bacteria, eukaryotes, mitochondria, and chloroplasts in metagenomes and environmental sequencing datasets.

PubMed

Bengtsson, Johan; Eriksson, K Martin; Hartmann, Martin; Wang, Zheng; Shenoy, Belle Damodara; Grelet, Gwen-Aëlle; Abarenkov, Kessy; Petri, Anna; Rosenblad, Magnus Alm; Nilsson, R Henrik

2011-10-01

The ribosomal small subunit (SSU) rRNA gene has emerged as an important genetic marker for taxonomic identification in environmental sequencing datasets. In addition to being present in the nucleus of eukaryotes and the core genome of prokaryotes, the gene is also found in the mitochondria of eukaryotes and in the chloroplasts of photosynthetic eukaryotes. These three sets of genes are conceptually paralogous and should in most situations not be aligned and analyzed jointly. Identifying the origin of SSU sequences in complex sequence datasets has hitherto been a time-consuming and largely manual undertaking. The present study introduces Metaxa (http://microbiology.se/software/metaxa/), an automated software tool to extract full-length and partial SSU sequences from larger sequence datasets and assign them to an archaeal, bacterial, nuclear eukaryote, mitochondrial, or chloroplast origin. Using data from reference databases and from full-length organelle and organism genomes, we show that Metaxa detects and scores SSU sequences for origin with very low proportions of false positives and negatives. We believe that this tool will be useful in microbial and evolutionary ecology as well as in metagenomics.
Molecular phylogeny of some avian species using Cytochrome b gene sequence analysis

PubMed Central

Awad, A; Khalil, S. R; Abd-Elhakim, Y. M

2015-01-01

Accurate identification and differentiation of avian species is a vital step in conservation, taxonomic, forensic, legal and other ornithological interventions. This study therefore applied a molecular approach to identify several avian species, namely chicken (Gallus gallus), Muscovy duck (Cairina moschata), Japanese quail (Coturnix japonica), laughing dove (Streptopelia senegalensis), and rock pigeon (Columba livia). Genomic DNA was extracted from blood samples, and a partial sequence of the mitochondrial cytochrome b gene (358 bp) was amplified and sequenced using universal primers. Sequence alignment and phylogenetic analyses were performed with the CLC Main Workbench program. The five sequences obtained were deposited in GenBank and compared with those previously registered in GenBank. The similarity percentage was 88.60% between Gallus gallus and Coturnix japonica and 80.46% between Gallus gallus and Columba livia. The percentage of identity between the studied species and GenBank species ranged from 77.20% (Columba oenas and Anas platyrhynchos) to 100% (Gallus gallus and Gallus sonneratii, Coturnix coturnix and Coturnix japonica, Meleagris gallopavo and Columba livia). Amplification of the partial sequence of the mitochondrial cytochrome b gene proved practical for the unambiguous identification of avian species. PMID:27175180
Genome of turbot rhabdovirus exhibits unusual non-coding regions and an additional ORF that could be expressed in fish cell.

PubMed

Zhu, Ruo-Lin; Lei, Xiao-Ying; Ke, Fei; Yuan, Xiu-Ping; Zhang, Qi-Ya

2011-02-01

The genomic sequence of Scophthalmus maximus rhabdovirus (SMRV) isolated from diseased turbot has been characterized. The complete genome of SMRV comprises 11,492 nucleotides and encodes the five typical rhabdovirus genes N, P, M, G and L. In addition, two open reading frames (ORFs) are predicted to overlap with the P gene: one upstream of P and smaller than P (temporarily called Ps), and another within the P gene that may encode a protein similar to the vesicular stomatitis virus C protein. The C ORF is contained within the P ORF. The five typical proteins share the highest sequence identities (48.9%) with the corresponding proteins of rhabdoviruses in the genus Vesiculovirus. Phylogenetic analysis of partial L protein sequences indicates that SMRV is close to the genus Vesiculovirus. The first 13 nucleotides at the two ends of the SMRV genome are perfectly inverse-complementary. The gene junctions between the five genes show a conserved polyadenylation signal (CATGA(7)) and intergenic dinucleotide (CT) followed by a putative transcription initiation sequence A(A/G)(C/G)A(A/G/T), which differ from those of known rhabdoviruses. The entire Ps ORF was cloned and expressed, and used to generate polyclonal antibody in mice. A distinct band could be detected in SMRV-infected carp leucocyte cells (CLCs) by anti-Ps/C serum via Western blot, and the Ps-GFP fusion protein showed a cytoplasmic distribution as multiple punctate or doughnut-shaped foci of uneven size. Copyright © 2010 Elsevier B.V. All rights reserved.
Novel rod-shaped viruses isolated from garlic, Allium sativum, possessing a unique genome organization.

PubMed

Sumi, S; Tsuneyoshi, T; Furutani, H

1993-09-01

Rod-shaped flexuous viruses were partially purified from garlic plants (Allium sativum) showing typical mosaic symptoms. Denaturing agarose gel electrophoresis showed the genome to be composed of RNA with a poly(A) tail, with an estimated size of 10 kb. We constructed cDNA libraries and screened four independent clones, designated GV-A, GV-B, GV-C and GV-D, using Northern and Southern blot hybridization. Nucleotide sequence determination of the cDNAs, two of which correspond to nearly one-third of the viral genomic RNA, shows that all of these viruses possess an identical genomic structure and that at least four proteins are encoded in the viral cDNA, with M(r)s estimated to be 15K, 27K, 40K and 11K. The 15K open reading frame (ORF) encodes the core-like sequence of a zinc finger protein preceded by a cluster of basic amino acid residues. The 27K ORF probably encodes the viral coat protein (CP), based both on the existence of conserved sequences observed in many other rod-shaped or flexuous virus CPs and on an overall amino acid sequence similarity to potexvirus and carlavirus CPs. The 11K ORF shows significant amino acid sequence similarity to the corresponding 12K proteins of the potexviruses and carlaviruses. On the other hand, the 40K ORF product does not resemble any plant virus gene product reported so far. The genomic organization in the 3' region of the garlic viruses resembles, but clearly differs from, that of carlaviruses. Phylogenetic analysis based on the amino acid sequence of the viral capsid protein also indicates that the garlic viruses have a unique and distinct domain different from those of the potexvirus and carlavirus groups. The results suggest that the garlic viruses described here belong to an unclassified and new virus group closely related to the carlaviruses.
The Complete Genome Phylogeny of Geographically Distinct Dengue Virus Serotype 2 Isolates (1944-2013) Supports Further Groupings within the Cosmopolitan Genotype

PubMed Central

Ali, Akhtar; Ali, Ijaz

2015-01-01

Dengue virus serotype 2 (DENV-2) isolates have been implicated in deadly outbreaks of dengue fever (DF) and dengue hemorrhagic fever (DHF) in several regions of the world. Phylogenetic analysis of DENV-2 isolates collected from particular countries has been performed using partial or individual genes, but only a few studies have examined complete whole-genome sequences collected worldwide. Herein, 50 complete genome sequences of DENV-2 isolates, reported over the past 70 years from 19 different countries, were downloaded from GenBank. Phylogenetic analysis was conducted and evolutionary distances of the 50 DENV-2 isolates were determined using maximum likelihood (ML) trees or Bayesian phylogenetic analysis created from complete genome nucleotide (nt) and amino acid (aa) sequences or individual gene sequences. The results showed that all DENV-2 isolates fell into seven main groups containing five previously defined genotypes. The Cosmopolitan genotype showed further division into three groups (C-I, C-II, and C-III), with the C-I group containing two subgroups (C-IA and C-IB). Comparison of the aa sequences showed specific mutations among the various groups of DENV-2 isolates. The largest number of aa mutations was observed in the NS5 gene, followed by the NS2A, NS3 and NS1 genes, while the smallest number of aa substitutions was recorded in the capsid gene, followed by the PrM/M, NS4A, and NS4B genes. The greatest evolutionary distances were found in the NS2A gene, followed by the NS4A and NS4B genes. Based on these results, we propose that genotyping of DENV-2 isolates in future studies should be performed on entire genome sequences in order to gain a complete understanding of the evolution of isolates reported from different geographical locations around the world. PMID:26414178
Evaluating whole genome sequence data from the Genetic Absence Epilepsy Rat from Strasbourg and its related non-epileptic strain

PubMed Central

Powell, Kim L.; Zhu, Mingfu; Campbell, C. Ryan; Maia, Jessica M.; Ren, Zhong; Jones, Nigel C.; O’Brien, Terence J.; Petrovski, Slavé

2017-01-01

Objective: The Genetic Absence Epilepsy Rats from Strasbourg (GAERS) are an inbred Wistar rat strain widely used as a model of genetic generalised epilepsy with absence seizures. As in humans, the genetic architecture that results in genetic generalised epilepsy in GAERS is poorly understood. Here we present the strain-specific variants found in the epileptic GAERS and their related Non-Epileptic Control (NEC) strain. The GAERS and NEC strains represent a powerful opportunity to identify neurobiological factors associated with the genetic generalised epilepsy phenotype. Methods: We performed whole genome sequencing on adult epileptic GAERS and adult NEC rats, a strain derived from the same original Wistar colony. We also generated whole genome sequence data for four double-crossed (GAERS with NEC) F2 rats selected for high-seizing (n = 2) and non-seizing (n = 2) phenotypes. Results: Specific to the GAERS genome, we identified 1.12 million single nucleotide variants, 296.5K short insertion-deletions, and 354 putative copy number variants that result in complete or partial loss/duplication of 41 genes. Of the GAERS-specific variants that met high quality criteria, 25 are annotated as stop codon gain/loss, 56 as putative essential splice sites, and 56 indels are predicted to result in a frameshift. Subsequent screening against the two F2 progeny with the highest and the two F2 progeny with the lowest seizure burden identified only the selected Cacna1h GAERS-private protein-coding variant as exclusively co-segregating with the two high-seizing F2 rats. Significance: This study highlights an approach for using whole genome sequencing to narrow down to a manageable candidate list of genetic variants in a complex genetic epilepsy animal model, and suggests the utility of this sequencing design for investigating other spontaneously occurring animal models of human disease. PMID:28708842
Complete genome sequencing and phylogenetic analysis of dengue type 1 virus isolated from Jeddah, Saudi Arabia.

PubMed

Azhar, Esam I; Hashem, Anwar M; El-Kafrawy, Sherif A; Abol-Ela, Said; Abd-Alla, Adly M M; Sohrab, Sayed Sartaj; Farraj, Suha A; Othman, Norah A; Ben-Helaby, Huda G; Ashshi, Ahmed; Madani, Tariq A; Jamjoom, Ghazi

2015-01-16

Dengue viruses (DENVs) are mosquito-borne viruses that can cause disease ranging from mild fever to severe dengue infection. These viruses are endemic in several tropical and subtropical regions. Multiple outbreaks of DENV serotypes 1, 2 and 3 (DENV-1, DENV-2 and DENV-3) have been reported from the western region of Saudi Arabia since 1994. Strains from at least two genotypes of DENV-1 (Asia and America/Africa genotypes) circulated in western Saudi Arabia until 2006. However, all previous studies reported from Saudi Arabia were based on partial sequencing data of the envelope (E) gene, without any reports of full genome sequences for any DENV serotype circulating in Saudi Arabia. Here, we report the isolation and the first complete genome sequence of a DENV-1 strain (DENV-1-Jeddah-1-2011) isolated from a patient in Jeddah, Saudi Arabia in 2011. Whole genome sequence alignment and phylogenetic analysis showed high similarity between the DENV-1-Jeddah-1-2011 strain and the D1/H/IMTSSA/98/606 isolate (Asian genotype) reported from Djibouti in 1998. Further analysis of the full envelope gene revealed a close relationship between the DENV-1-Jeddah-1-2011 strain and isolates reported between 2004 and 2006 from Jeddah, as well as recent isolates from Somalia, suggesting widespread circulation of the Asian genotype in this region. These data suggest that strains belonging to the Asian genotype might have been introduced into Saudi Arabia long before 2004, most probably by African pilgrims, and continued to circulate in western Saudi Arabia at least until 2011. Most importantly, these results indicate that pilgrims from dengue-endemic regions can play an important role in the spread of new DENVs in Saudi Arabia and the rest of the world. The availability of complete genome sequences will therefore provide a reference for future epidemiological studies of DENV-1 viruses.
Improved purification, crystallization and primary structure of pyruvate:ferredoxin oxidoreductase from Halobacterium halobium.

PubMed

Plaga, W; Lottspeich, F; Oesterhelt, D

1992-04-01

An improved purification procedure, including nickel chelate affinity chromatography, is reported which resulted in a crystallizable pyruvate:ferredoxin oxidoreductase preparation from Halobacterium halobium. Crystals of the enzyme were obtained using potassium citrate as the precipitant. The genes coding for pyruvate:ferredoxin oxidoreductase were cloned and their nucleotide sequences determined. The genes of both subunits were adjacent to one another on the halobacterial genome. The derived amino acid sequences were confirmed by partial primary structure analysis of the purified protein. The structural motif of thiamin-diphosphate-binding enzymes was unequivocally located in the deduced amino acid sequence of the small subunit.
"Maxillary lateral incisor partial anodontia sequence": a clinical entity with epigenetic origin.

PubMed

Consolaro, Alberto; Cardoso, Maurício Almeida; Consolaro, Renata Bianco

2017-01-01

The relationship between maxillary lateral incisor anodontia and the palatal displacement of unerupted maxillary canines cannot be considered a multiple tooth abnormality with a defined genetic etiology, which would be required for it to be regarded as a "syndrome". The genes involved have been neither identified nor located in the human genome, and it has not even been presumed on which chromosome the responsible gene would be located. The palatal displacement of the maxillary canine in cases of partial anodontia of the maxillary lateral incisor is potentially associated with environmental changes caused by the incisor's absence from its place of formation and eruption, which would characterize an epigenetic etiology. The lack of the maxillary lateral incisor in the canine region removes one of the reference guides for the eruptive trajectory of the maxillary canine, which therefore would not erupt and/or would become impacted in the palate. Consequently, and in sequence, this would lead to malocclusion, maxillary atresia, transposition, prolonged retention of the deciduous canine and resorption of the neighboring teeth. Thus, we are dealing with a set of anomalies and multiple sequential changes known as sequential development anomalies or, simply, a sequence. Once the epigenetic and sequential nature of this clinical picture is accepted, it could be called "Maxillary Lateral Incisor Partial Anodontia Sequence."
DNA Remodeling by Strict Partial Endoreplication in Orchids, an Original Process in the Plant Kingdom

PubMed Central

Brown, Spencer C.; Bourge, Mickaël; Maunoury, Nicolas; Wong, Maurice; Wolfe Bianchi, Michele; Lepers-Andrzejewski, Sandra; Besse, Pascale; Siljak-Yakovlev, Sonja

2017-01-01

DNA remodeling during endoreplication appears to be a strong developmental characteristic in orchids. In this study, we analyzed DNA content and nuclei in 41 species of orchids to further map genome evolution in this plant family. We demonstrate that the DNA remodeling observed in 36 of the 41 orchids studied corresponds to strict partial endoreplication. This process is developmentally regulated in each wild species studied. Cytometry data analyses allowed us to propose a model in which nuclear states 2C, 4E, 8E, etc. form a series comprising a fixed proportion, the euploid genome 2C, plus 2–32 additional copies of a complementary part of the genome. The fixed proportion ranged from 89% of the genome in Vanilla mexicana down to 19% in V. pompona, the lowest value of all 148 orchids reported. Interspecific hybridization did not suppress this phenomenon. Interestingly, this process was not observed in mass-produced epiphytes. Nucleolar volumes grow with the number of endocopies present, consistent with high transcription activity in endoreplicated nuclei. Our analyses suggest species-specific chromatin rearrangement. Towards understanding endoreplication, V. planifolia constitutes a tractable system for isolating the genomic sequences that confer an advantage via endoreplication from those that apparently suffice at the diploid level. PMID:28419219
Genetic Characterization of a Panel of Diverse HIV-1 Isolates at Seven International Sites

PubMed Central

Chen, Yue; Sanchez, Ana M.; Sabino, Ester; Hunt, Gillian; Ledwaba, Johanna; Hackett, John; Swanson, Priscilla; Hewlett, Indira; Ragupathy, Viswanath; Vikram Vemula, Sai; Zeng, Peibin; Tee, Kok-Keng; Chow, Wei Zhen; Ji, Hezhao; Sandstrom, Paul; Denny, Thomas N.; Busch, Michael P.; Gao, Feng

2016-01-01

HIV-1 subtypes and drug resistance are routinely tested by many international surveillance groups. However, results from different sites often vary. A systematic comparison of results from multiple sites is needed to determine whether a standardized protocol is required for consistent and accurate data analysis. A panel of well-characterized HIV-1 isolates (N = 50) from the External Quality Assurance Program Oversight Laboratory (EQAPOL) was assembled for evaluation at seven international sites. This virus panel included seven subtypes, six circulating recombinant forms (CRFs), nine unique recombinant forms (URFs) and three group O viruses. Seven viruses contained 10 major drug resistance mutations (DRMs). HIV-1 isolates were prepared at a concentration of 10^7 copies/ml and compiled into blinded panels. Subtypes and DRMs were determined from partial or full pol gene sequences by conventional Sanger sequencing and/or Next Generation Sequencing (NGS). Subtype and DRM results were reported and decoded for comparison with full-length genome sequences generated by EQAPOL. The partial pol gene was amplified by RT-PCR and sequenced for 89.4%–100% of group M viruses at six sites. The subtypes of the majority of the viruses (83%–97.9%) were correctly determined from the partial pol sequences. All 10 major DRMs in seven isolates were detected at these six sites. The complete pol gene sequence was also obtained by NGS at one site; however, this method missed six group M viruses, and the sequences contained host chromosome fragments. Three group O viruses were characterized only with additional group O-specific RT-PCR primers employed by one site. These results indicate that PCR protocols and subtyping tools should be standardized to efficiently amplify diverse viruses and more consistently assign virus genotypes, which is critical for accurate global subtype and drug resistance surveillance. Targeted NGS analysis of partial pol sequences can serve as an alternative approach, especially for detection of low-abundance DRMs. PMID:27314585
Genetic Characterization of a Panel of Diverse HIV-1 Isolates at Seven International Sites.

PubMed

Hora, Bhavna; Keating, Sheila M; Chen, Yue; Sanchez, Ana M; Sabino, Ester; Hunt, Gillian; Ledwaba, Johanna; Hackett, John; Swanson, Priscilla; Hewlett, Indira; Ragupathy, Viswanath; Vikram Vemula, Sai; Zeng, Peibin; Tee, Kok-Keng; Chow, Wei Zhen; Ji, Hezhao; Sandstrom, Paul; Denny, Thomas N; Busch, Michael P; Gao, Feng

2016-01-01

HIV-1 subtypes and drug resistance are routinely tested by many international surveillance groups. However, results from different sites often vary. A systematic comparison of results from multiple sites is needed to determine whether a standardized protocol is required for consistent and accurate data analysis. A panel of well-characterized HIV-1 isolates (N = 50) from the External Quality Assurance Program Oversight Laboratory (EQAPOL) was assembled for evaluation at seven international sites. This virus panel included seven subtypes, six circulating recombinant forms (CRFs), nine unique recombinant forms (URFs) and three group O viruses. Seven viruses contained 10 major drug resistance mutations (DRMs). HIV-1 isolates were prepared at a concentration of 10^7 copies/ml and compiled into blinded panels. Subtypes and DRMs were determined from partial or full pol gene sequences by conventional Sanger sequencing and/or Next Generation Sequencing (NGS). Subtype and DRM results were reported and decoded for comparison with full-length genome sequences generated by EQAPOL. The partial pol gene was amplified by RT-PCR and sequenced for 89.4%–100% of group M viruses at six sites. The subtypes of the majority of the viruses (83%–97.9%) were correctly determined from the partial pol sequences. All 10 major DRMs in seven isolates were detected at these six sites. The complete pol gene sequence was also obtained by NGS at one site; however, this method missed six group M viruses, and the sequences contained host chromosome fragments. Three group O viruses were characterized only with additional group O-specific RT-PCR primers employed by one site. These results indicate that PCR protocols and subtyping tools should be standardized to efficiently amplify diverse viruses and more consistently assign virus genotypes, which is critical for accurate global subtype and drug resistance surveillance. Targeted NGS analysis of partial pol sequences can serve as an alternative approach, especially for detection of low-abundance DRMs.
CID-miRNA: A web server for prediction of novel miRNA precursors in human genome

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tyagi, Sonika; Vaz, Candida; Gupta, Vipin

2008-08-08

microRNAs (miRNA) are a class of non-protein coding functional RNAs that are thought to regulate expression of target genes by direct interaction with mRNAs. miRNAs have been identified through both experimental and computational methods in a variety of eukaryotic organisms. Though these approaches have been partially successful, there is a need to develop more tools for detection of these RNAs, as they are also thought to be present in abundance in many genomes. In this report we describe a tool and a web server, named CID-miRNA, for identification of miRNA precursors in a given DNA sequence, utilising secondary structure-based filtering systems and an algorithm based on a stochastic context-free grammar trained on human miRNAs. CID-miRNA analyses a given sequence, via a web interface, for the presence of putative miRNA precursors, and the generated output lists all the potential regions that can form miRNA-like structures. In its stand-alone form it can also scan large genomic sequences for the presence of potential miRNA precursors. The web server can be accessed at http://mirna.jnu.ac.in/cidmirna/.
Near Full-Length Identification of a Novel HIV-1 CRF01_AE/B/C Recombinant in Northern Myanmar.

PubMed

Zhou, Yan-Heng; Chen, Xin; Liang, Yue-Bo; Pang, Wei; Qin, Wei-Hong; Zhang, Chiyu; Zheng, Yong-Tang

2015-08-01

The Myanmar-China border appears to be a "hot spot" region for the occurrence of HIV-1 recombination. Most previous analyses of HIV-1 recombination were based on partial genomic sequences, which cannot adequately reflect the genetic diversity of HIV-1 in this area. Here, we present a near full-length characterization of a novel HIV-1 CRF01_AE/B/C recombinant isolated from a long-distance truck driver in Northern Myanmar. It is the first near full-length genomic sequence described in Myanmar since 2003, and might be one of the most complicated HIV-1 chimeras ever detected in Myanmar, containing four CRF01_AE, six B, and five C segments separated by 14 breakpoints throughout its genome. The discovery and characterization of this new CRF01_AE/B/C recombinant indicate that intersubtype recombination is ongoing in Myanmar, continuously generating new forms of HIV-1. More work based on near full-length sequence analyses is urgently needed to better understand the genetic diversity of HIV-1 in these regions.

Cloning, genomic organization, and chromosomal localization of human citrate transport protein to the DiGeorge/velocardiofacial syndrome minimal critical region

DOE Office of Scientific and Technical Information (OSTI.GOV)

Goldmuntz, E.; Budarf, M.L.; Wang, Zhili

1996-04-15

DiGeorge syndrome (DGS) and velocardiofacial syndrome have been shown to be associated with microdeletions of chromosomal region 22q11. More recently, patients with conotruncal anomaly face syndrome and some nonsyndromic patients with isolated forms of conotruncal cardiac defects have been found to have 22q11 microdeletions as well. The commonly deleted region, called the DiGeorge chromosomal region (DGCR), spans approximately 1.2 Mb and is estimated to contain at least 30 genes. We report a computational approach for gene identification that makes use of large-scale sequencing of cosmids from a contig spanning the DGCR. Using this methodology, we have mapped the human homolog of a rodent citrate transport protein to the DGCR. We have isolated a partial cDNA containing the complete open reading frame and have determined the genomic structure by comparing the genomic sequence from the cosmid to the sequence of the cDNA clone. Whether the citrate transport protein can be implicated in the biological etiology of DGS or other 22q11 microdeletion syndromes remains to be defined. 36 refs., 3 figs., 1 tab.
Using microarrays to identify positional candidate genes for QTL: the case study of ACTH response in pigs.

PubMed

Jouffe, Vincent; Rowe, Suzanne; Liaubet, Laurence; Buitenhuis, Bart; Hornshøj, Henrik; SanCristobal, Magali; Mormède, Pierre; de Koning, D J

2009-07-16

Microarray studies can supplement QTL studies by suggesting potential candidate genes in QTL regions, which by themselves are too large to yield a short list of candidate genes. Here we provide a case study exploring ways to integrate QTL data and microarray data for the pig, which has only a partial genome sequence. We outline various procedures to localize differentially expressed genes on the pig genome and link them with information on published QTL. The starting point is a set of 237 differentially expressed cDNA clones in adrenal tissue from two pig breeds, before and after treatment with adrenocorticotropic hormone (ACTH). Different approaches to localizing the differentially expressed (DE) genes on the pig genome showed different levels of success and a clear lack of concordance between approaches for some genes. For a focused analysis of 12 genes, overlapping QTL from the public domain were presented. Differentially expressed genes underlying QTL for ACTH response were also described. Using the latest version of the draft sequence, the differentially expressed genes were mapped to the pig genome. This enabled co-location of DE genes and previously studied QTL regions, but the draft genome sequence is still incomplete and will contain many errors. A further step to explore links between DE genes and QTL at the pathway level was largely unsuccessful because of the lack of annotation of the pig genome. This could be improved by further comparative mapping analyses, but these would be time-consuming. This paper provides a case study for the integration of QTL data and microarray data for a species with limited genome sequence information and annotation. The results illustrate the challenges that must be addressed but also provide a roadmap for future work that is applicable to other non-model species.
Further insight into genetic variation and haplotype diversity of Cherry virus A from China

PubMed Central

Candresse, Thierry; He, Zhen; Li, Shifang; Ma, Yuxin

2017-01-01

Cherry virus A (CVA) infection appears to be prevalent in cherry plantations worldwide. In this study, the diversity of CVA isolates from 31 cherry samples collected from different orchards around Bohai Bay in northeastern China was analyzed. The complete genome of one of these isolates, ChYT52, was found to be 7,434 nt in length excluding the poly(A) tail. It shares 79.9–98.7% identity with CVA genome sequences in GenBank, while its RdRp core is more divergent (79.1–90.7% nt identity), likely as a consequence of a recombination event. Phylogenetic analysis of the ChYT52 genome together with CVA genomes in GenBank resolved at least 7 major clusters, plus 5 additional isolates at the ends of long, isolated branches, suggesting further phylogroup diversity in CVA. The genetic diversity of Chinese CVA isolates from the 31 samples and of GenBank sequences was analyzed in three genomic regions corresponding to the coat protein, the RNA-dependent RNA polymerase core region, and the movement protein genes. With few exceptions, likely reflecting further recombination, the various trees are largely congruent, indicating that each region provides valuable phylogenetic information. In all cases, the majority of the Chinese CVA isolates clustered in phylogroup I, together with the X82547 reference sequence from Germany. Statistically significant negative values of Tajima's D were obtained in the three genes for phylogroup I, suggesting that it may be undergoing a period of expansion. There was considerable haplotype diversity within individual samples, and more than half of the samples contained genetically diverse haplotypes belonging to different phylogroups. In addition, a number of statistically significant recombination events were detected in CVA genomes or in the partial genomic sequences, indicating an important contribution of recombination to CVA evolution. This work provides a foundation for elucidating the epidemiological characteristics and evolutionary history of CVA populations. PMID:29020049
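The CVA record reports significantly negative Tajima's D values. For readers unfamiliar with the statistic, here is a minimal Python sketch of Tajima's (1989) D. It compares the mean number of pairwise differences (π) with Watterson's estimate derived from the number of segregating sites (S). A negative D reflects an excess of rare variants, which can indicate population expansion (or purifying selection). The sketch assumes gap-free, equal-length aligned sequences. The function names `summary_stats` and `tajimas_d` are hypothetical, and this is not the authors' pipeline, which the abstract does not describe.

```python
import math
from itertools import combinations


def summary_stats(seqs):
    """Return (n, S, pi) for gap-free aligned sequences (assumption: no gaps/ambiguity codes)."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    n = len(seqs)
    # S: number of alignment columns with more than one distinct base
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    # pi: mean number of differences over all sequence pairs
    diffs = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)]
    pi = sum(diffs) / len(diffs)
    return n, s, pi


def tajimas_d(n, s, pi):
    """Tajima's D from sample size n, segregating sites s, and mean pairwise differences pi."""
    if n < 4 or s == 0:
        # the variance terms vanish for n < 4, and D is undefined without polymorphism
        raise ValueError("need n >= 4 sequences and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    theta_w = s / a1  # Watterson's estimator
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

For example, `tajimas_d(*summary_stats(["ACGT", "ACGA", "ACTA", "TCTA"]))` computes D for a toy alignment. Real analyses typically rely on dedicated population-genetics software that handles gaps and missing data.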
Genome sequence analysis of predicted polyprenol reductase gene from mangrove plant kandelia obovata

NASA Astrophysics Data System (ADS)

Basyuni, M.; Sagami, H.; Baba, S.; Oku, H.

2018-03-01

It has been previously reported that dolichols, but not polyprenols, predominate in mangrove leaves and roots. The occurrence of larger amounts of dolichol in leaves of mangrove plants therefore implies that polyprenol reductase, the enzyme responsible for converting polyprenol to dolichol, may be active in mangrove leaves. Here we report a preliminary assessment of putative polyprenol reductase genes from the genome sequence of the mangrove plant Kandelia obovata. The functional assignment of the genes was based on a homology search of the sequences against the non-redundant (nr) peptide database of NCBI using BLASTX. The degree of sequence identity between each DNA sequence and known polyprenol reductases was assessed using the BLASTX E-value, total score, and identity. The genome sequence data yielded three partial sequences, termed c23157 (700 bp), c23901 (960 bp), and c24171 (531 bp). c23157 showed the highest similarity (61%) to predicted polyprenol reductase 2-like from Gossypium raimondii, with an E-value of 2e-100. c23901 showed high similarity (78%) to the steroid 5-alpha-reductase Det2 from Jatropha curcas, with an E-value of 2e-140. c24171 showed the highest similarity (79%) to polyprenol reductase 2 isoform X1 from J. curcas, with an E-value of 7e-21. The present study suggests that c23157, c23901, and c24171 may encode polyprenol reductases, and they therefore represent new predicted polyprenol reductase genes from K. obovata.
Analysis of cellulose synthase genes from domesticated apple identifies collinear genes WDR53 and CesA8A: partial co-expression, bicistronic mRNA, and alternative splicing of CESA8A

PubMed Central

Guerriero, Gea; Spadiut, Oliver; Kerschbamer, Christine; Giorno, Filomena; Baric, Sanja; Ezcurra, Inés

2016-01-01

Cellulose synthase (CesA) genes constitute a complex multigene family with six major phylogenetic clades in angiosperms. The recently sequenced genome of domestic apple, Malus×domestica, was mined for CesA genes, by blasting full-length cellulose synthase protein (CESA) sequences annotated in the apple genome against protein databases from the plant models Arabidopsis thaliana and Populus trichocarpa. Thirteen genes belonging to the six angiosperm CesA clades and coding for proteins with conserved residues typical of processive glycosyltransferases from family 2 were detected. Based on their phylogenetic relationship to Arabidopsis CESAs, as well as expression patterns, a nomenclature is proposed to facilitate further studies. Examination of their genomic organization revealed that MdCesA8-A is closely linked and co-oriented with WDR53, a gene coding for a WD40 repeat protein. The WDR53 and CesA8 genes display conserved collinearity in dicots and are partially co-expressed in the apple xylem. Interestingly, the presence of a bicistronic WDR53–CesA8A transcript was detected in phytoplasma-infected phloem tissues of apple. The bicistronic transcript contains a spliced intergenic sequence that is predicted to fold into hairpin structures typical of internal ribosome entry sites, suggesting its potential cap-independent translation. Surprisingly, the CesA8A cistron is alternatively spliced and lacks the zinc-binding domain. The possible roles of WDR53 and the alternatively spliced CESA8 variant during cellulose biosynthesis in M.×domestica are discussed. PMID:23048131
Characterization of sour cherry isolates of plum pox virus from the Volga Basin in Russia reveals a new cherry strain of the virus.

PubMed

Glasa, Miroslav; Prikhodko, Yuri; Predajňa, Lukáš; Nagyová, Alžbeta; Shneyder, Yuri; Zhivaeva, Tatiana; Subr, Zdeno; Cambra, Mariano; Candresse, Thierry

2013-09-01

Plum pox virus (PPV) is the causal agent of sharka, the most detrimental virus disease of stone fruit trees worldwide. PPV isolates have been assigned to seven distinct strains, of which PPV-C groups the genetically distinct isolates detected on cherry hosts in several European countries. Here, three complete and several partial genomic sequences of PPV isolates from sour cherry trees in the Volga River basin of Russia have been determined. Comparison of complete genome sequences showed that nucleotide identity with other PPV isolates reached only 77.5 to 83.5%. Phylogenetic analyses clearly assigned the RU-17sc, RU-18sc, and RU-30sc isolates from cherry to a distinct cluster, most closely related to PPV-C and, to a lesser extent, PPV-W. Based on their natural infection of sour cherry trees and their genomic characterization, the PPV isolates reported here represent a new strain of PPV, for which the name PPV-CR (Cherry Russia) is proposed. The unique amino acids conserved among PPV-CR and PPV-C cherry-infecting isolates (75 in total) are mostly distributed within the central part of P1, NIa, and the N terminus of the coat protein (CP), making them potential candidates for genetic determinants of the ability to infect cherry species or of adaptation to these hosts. The variability observed within the 14 PPV-CR isolates analyzed in this study (0 to 2.6% nucleotide divergence in partial CP sequences) and the identification of these isolates in different localities and cultivation conditions suggest the efficient establishment and competitiveness of PPV-CR in the environment. A specific primer pair has been developed, allowing specific detection of PPV-CR isolates by reverse-transcription polymerase chain reaction (RT-PCR).
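Identity and divergence figures such as those in the PPV record (77.5 to 83.5% genome identity; 0 to 2.6% divergence in partial CP sequences) are computed from pairwise alignments. Here is a minimal Python sketch of uncorrected percent identity and p-distance. It assumes the two sequences are already aligned and that columns containing a gap (`-`) are skipped. Tools differ in how they treat gaps and ambiguity codes, so this is illustrative only and will not necessarily reproduce published values. The function names are hypothetical.

```python
def pairwise_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns (one convention among several)
        compared += 1
        matches += a == b  # bool counts as 0 or 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected percent divergence: the complement of percent identity."""
    return 100.0 - pairwise_identity(seq_a, seq_b)
```

`p_distance` is the simplest divergence measure. Model-based corrections such as Jukes-Cantor would give larger values for more divergent sequences.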
Ecology of uncultured Prochlorococcus clades revealed through single-cell genomics and biogeographic analysis

PubMed Central

Malmstrom, Rex R; Rodrigue, Sébastien; Huang, Katherine H; Kelly, Libusha; Kern, Suzanne E; Thompson, Anne; Roggensack, Sara; Berube, Paul M; Henn, Matthew R; Chisholm, Sallie W

2013-01-01

Prochlorococcus is the numerically dominant photosynthetic organism throughout much of the world's oceans, yet little is known about the ecology and genetic diversity of populations inhabiting tropical waters. To help close this gap, we examined natural Prochlorococcus communities in the tropical Pacific Ocean using single-cell whole-genome amplification and sequencing. Analysis of the gene content of just 10 single cells from these waters added 394 new genes to the Prochlorococcus pan-genome, that is, genes never before seen in a Prochlorococcus cell. Analysis of marker genes, including the ribosomal internal transcribed spacer, from dozens of individual cells revealed several representatives from two uncultivated clades of Prochlorococcus previously identified as HNLC1 and HNLC2. While the HNLC clades can dominate Prochlorococcus communities under certain conditions, their overall geographic distribution was highly restricted compared with other clades of Prochlorococcus. In the Atlantic and Pacific oceans, these clades were found only in warm waters with low Fe and high inorganic P levels. Genomic analysis suggests that at least one of these clades thrives in low-Fe environments by scavenging organic-bound Fe, a process previously unknown in Prochlorococcus. Furthermore, the capacity to utilize organic-bound Fe appears to have been acquired horizontally and may be exchanged among other clades of Prochlorococcus. Finally, one of the single Prochlorococcus cells sequenced contained a partial genome of what appears to be a prophage integrated into the genome. PMID:22895163
Inhibition of colorectal cancer genomic copy number alterations and chromosomal fragile site tumor suppressor FHIT and WWOX deletions by DNA mismatch repair

PubMed Central

Gelincik, Ozkan; Blecua, Pedro; Edelmann, Winfried; Kucherlapati, Raju; Zhou, Kathy; Jasin, Maria; Gümüş, Zeynep H.; Lipkin, Steven M.

2017-01-01

Homologous recombination (HR) enables precise DNA repair after DNA double strand breaks (DSBs) using identical sequence templates, whereas homeologous recombination (HeR) uses only partially homologous sequences. Homeologous recombination introduces mutations through gene conversion and genomic deletions through single-strand annealing (SSA). DNA mismatch repair (MMR) inhibits HeR, but the roles of mammalian MMR MutL homologues (MLH1, PMS2 and MLH3) proteins in HeR suppression are poorly characterized. Here, we demonstrate that mouse embryonic fibroblasts (MEFs) carrying Mlh1, Pms2, and Mlh3 mutations have higher HeR rates, by using 7,863 uniquely mapping paired direct repeat sequences (DRs) in the mouse genome as endogenous gene conversion and SSA reporters. Additionally, when DSBs are induced by gamma-radiation, Mlh1, Pms2 and Mlh3 mutant MEFs have higher DR copy number alterations (CNAs), including DR CNA hotspots previously identified in mouse MMR-deficient colorectal cancer (dMMR CRC). Analysis of The Cancer Genome Atlas CRC data revealed that dMMR CRCs have higher genome-wide DR HeR rates than MMR proficient CRCs, and that dMMR CRCs have deletion hotspots in tumor suppressors FHIT/WWOX at chromosomal fragile sites FRA3B and FRA16D (which have elevated DSB rates) flanked by paired homologous DRs and inverted repeats (IR). Overall, these data provide novel insights into the MMR-dependent HeR inhibition mechanism and its role in tumor suppression. PMID:29069730
The nucleotide sequence of Watermelon mosaic virus (WMV, Potyvirus) reveals interspecific recombination between two related potyviruses in the 5' part of the genome.

PubMed

Desbiez, C; Lecoq, H

2004-08-01

Watermelon mosaic virus (WMV, Potyvirus) is a potyvirus with a worldwide distribution, mostly in temperate and Mediterranean regions. According to the partial sequences that were available, WMV appeared to share high sequence similarity with Soybean mosaic virus (SMV), and it was almost considered a strain of SMV in spite of its different and much broader host range. Like SMV, it was also related to legume-infecting potyviruses belonging to the "Bean common mosaic virus (BCMV) subgroup". In this paper we obtained the full-length sequence of WMV and confirmed that this virus is very closely related to SMV over most of its genome; however, there is evidence for interspecific recombination in the P1 protein: the P1 of WMV was 135 amino acids longer than that of SMV, and the N-terminal half of the P1 showed no relation to SMV but was 85% identical to BCMV. This suggests that WMV emerged through an ancestral recombination event, and supports the distinction of WMV and SMV as separate taxonomic units.
Ecdysozoan mitogenomics: evidence for a common origin of the legged invertebrates, the Panarthropoda.

PubMed

Rota-Stabelli, Omar; Kayal, Ehsan; Gleeson, Dianne; Daub, Jennifer; Boore, Jeffrey L; Telford, Maximilian J; Pisani, Davide; Blaxter, Mark; Lavrov, Dennis V

2010-07-12

Ecdysozoa is the recently recognized clade of molting animals that comprises the vast majority of extant animal species and the most important invertebrate model organisms: the fruit fly and the nematode worm. Evolutionary relationships within the ecdysozoans remain, however, unresolved, impairing the correct interpretation of comparative genomic studies. In particular, the affinities of the three Panarthropoda phyla (Arthropoda, Onychophora, and Tardigrada) and the position of Myriapoda within Arthropoda (Mandibulata vs. Myriochelata hypothesis) are among the most contentious issues in animal phylogenetics. To elucidate these relationships, we have determined and analyzed complete or nearly complete mitochondrial genome sequences of two Tardigrada, Hypsibius dujardini and Thulinia sp. (the first genomes to date for this phylum); one Priapulida, Halicryptus spinulosus; and two Onychophora, Peripatoides sp. and Epiperipatus biolleyi; as well as a partial mitochondrial genome sequence of the onychophoran Euperipatoides kanangrensis. Tardigrada mitochondrial genomes resemble those of the arthropods in terms of gene order and strand asymmetry, whereas Onychophora genomes are characterized by numerous gene order rearrangements and strand asymmetry variations. In addition, Onychophora genomes are extremely enriched in A and T nucleotides, whereas Priapulida and Tardigrada are more balanced. Phylogenetic analyses based on concatenated amino acid coding sequences support a monophyletic origin of the Ecdysozoa and the position of Priapulida as the sister group of a monophyletic Panarthropoda (Tardigrada plus Onychophora plus Arthropoda). The position of Tardigrada is more problematic, most likely because of long branch attraction (LBA). However, experiments designed to reduce LBA suggest that the most likely placement of Tardigrada is as the sister group of Onychophora. The same analyses also recover monophyly of traditionally recognized arthropod lineages such as Arachnida and of the highly debated clade Mandibulata.
Ecdysozoan Mitogenomics: Evidence for a Common Origin of the Legged Invertebrates, the Panarthropoda

PubMed Central

Rota-Stabelli, Omar; Kayal, Ehsan; Gleeson, Dianne; Daub, Jennifer; Boore, Jeffrey L.; Telford, Maximilian J.; Pisani, Davide; Blaxter, Mark; Lavrov, Dennis V.

2010-01-01

Ecdysozoa is the recently recognized clade of molting animals that comprises the vast majority of extant animal species and the most important invertebrate model organisms: the fruit fly and the nematode worm. Evolutionary relationships within the ecdysozoans remain, however, unresolved, impairing the correct interpretation of comparative genomic studies. In particular, the affinities of the three Panarthropoda phyla (Arthropoda, Onychophora, and Tardigrada) and the position of Myriapoda within Arthropoda (Mandibulata vs. Myriochelata hypothesis) are among the most contentious issues in animal phylogenetics. To elucidate these relationships, we have determined and analyzed complete or nearly complete mitochondrial genome sequences of two Tardigrada, Hypsibius dujardini and Thulinia sp. (the first genomes to date for this phylum); one Priapulida, Halicryptus spinulosus; and two Onychophora, Peripatoides sp. and Epiperipatus biolleyi; as well as a partial mitochondrial genome sequence of the onychophoran Euperipatoides kanangrensis. Tardigrada mitochondrial genomes resemble those of the arthropods in terms of gene order and strand asymmetry, whereas Onychophora genomes are characterized by numerous gene order rearrangements and strand asymmetry variations. In addition, Onychophora genomes are extremely enriched in A and T nucleotides, whereas Priapulida and Tardigrada are more balanced. Phylogenetic analyses based on concatenated amino acid coding sequences support a monophyletic origin of the Ecdysozoa and the position of Priapulida as the sister group of a monophyletic Panarthropoda (Tardigrada plus Onychophora plus Arthropoda). The position of Tardigrada is more problematic, most likely because of long branch attraction (LBA). However, experiments designed to reduce LBA suggest that the most likely placement of Tardigrada is as the sister group of Onychophora. The same analyses also recover monophyly of traditionally recognized arthropod lineages such as Arachnida and of the highly debated clade Mandibulata. PMID:20624745
A Dynamic Tandem Repeat in Monocotyledons Inferred from a Comparative Analysis of Chloroplast Genomes in Melanthiaceae.

PubMed

Do, Hoang Dang Khoa; Kim, Joo-Hwan

2017-01-01

Chloroplast genomes (cpDNA) are highly valuable resources for evolutionary studies of angiosperms, since they are highly conserved, are small in size, and play critical roles in plants. Slipped-strand mispairing (SSM) has been assumed to be a mechanism for generating repeat units in cpDNA. However, the use of different small repeated sequences in SSM events, which may lead to the accumulation of distinct types of repeats within the same region of cpDNA, has not been documented. Here, we sequenced two chloroplast genomes, from the endemic species Heloniopsis tubiflora (Korea) and Xerophyllum tenax (USA), to fill gaps in the molecular data and to explore "hot spots" for genomic events in Melanthiaceae. Comparative analysis of 23 complete cpDNA sequences revealed different stages of deletion in the rps16 region across the Melanthiaceae. Based on the partial or complete loss of the rps16 gene in cpDNA, we report for the first time potential molecular markers for recognizing two sections (Veratrum and Fuscoveratrum) of Veratrum. Melanthiaceae exhibit a significant change in the junction between the large single copy and inverted repeat regions, ranging from trnH_GUG to a part of rps3. Our results show an accumulation of tandem repeats in the rpl23-ycf2 regions of cpDNAs. Further examination of this region across most of the examined taxa of Liliales showed small conserved sequences flanking the tandem repeats. Therefore, we propose three scenarios in which different small repeated sequences were used during SSM events to generate new, distinct types of repeats. Occasionally, prior to the SSM process, point mutations and double-strand break repair occurred and induced the formation of initial repeat units, which are indispensable to the SSM process. SSM likely occurred more frequently for short repeats than for long repeat sequences in tribe Parideae (Melanthiaceae, Liliales). Collectively, these findings add new evidence of the dynamic effects of SSM in chloroplast genomes, which can be useful for further evolutionary studies in angiosperms. Additionally, genomic events in cpDNA are potential resources for mining molecular markers in Liliales.
Contrasting evolutionary genome dynamics between domesticated and wild yeasts

PubMed Central

Yue, Jia-Xing; Li, Jing; Aigrain, Louise; Hallin, Johan; Persson, Karl; Oliver, Karen; Bergström, Anders; Coupland, Paul; Warringer, Jonas; Lagomarsino, Marco Consentino; Fischer, Gilles; Durbin, Richard; Liti, Gianni

2017-01-01

Structural rearrangements have long been recognized as an important source of genetic variation with implications in phenotypic diversity and disease, yet their detailed evolutionary dynamics remain elusive. Here, we use long-read sequencing to generate end-to-end genome assemblies for 12 strains representing major subpopulations of the partially domesticated yeast Saccharomyces cerevisiae and its wild relative Saccharomyces paradoxus. These population-level high-quality genomes with comprehensive annotation allow for the first time a precise definition of chromosomal boundaries between cores and subtelomeres and a high-resolution view of evolutionary genome dynamics. In chromosomal cores, S. paradoxus exhibits faster accumulation of balanced rearrangements (inversions, reciprocal translocations and transpositions) whereas S. cerevisiae accumulates unbalanced rearrangements (novel insertions, deletions and duplications) more rapidly. In subtelomeres, both species show extensive interchromosomal reshuffling, with a higher tempo in S. cerevisiae. Such striking contrasts between wild and domesticated yeasts likely reflect the influence of human activities on structural genome evolution. PMID:28416820
Mapping the yeast genome by melting in nanofluidic devices

NASA Astrophysics Data System (ADS)

Welch, Robert L.; Czolkos, Ilja; Sladek, Rob; Reisner, Walter

2012-02-01

Optical mapping of DNA provides large-scale genomic information that can be used to assemble contigs from next-generation sequencing and to detect rearrangements between single cells. A recent optical mapping technique called denaturation mapping has the unique advantage of using physical principles, rather than the action of enzymes, to probe genomic structure. The absence of reagents or reaction steps makes denaturation mapping simpler than other protocols. Denaturation mapping uses fluorescence microscopy to image the pattern of partial melting along a DNA molecule extended in a channel with a cross-section of ~100 nm at the heart of a nanofluidic device. We successfully aligned melting maps from single DNA molecules to a theoretical map of the yeast genome (11.6 Mbp) to identify their location. By aligning hundreds of molecules, we assembled a consensus melting map of the yeast genome with 95% coverage.
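The alignment step, which places a single-molecule melting profile on a theoretical genome map, can be illustrated with a minimal sketch. It assumes that both profiles are already sampled at the same resolution and scores every ungapped offset (in both orientations) by Pearson correlation. The function name `best_alignment`, the correlation score, and the synthetic data are illustrative assumptions, not the authors' published procedure; molecule stretching and a noise model are ignored.

```python
import numpy as np

def best_alignment(profile, reference):
    """Find where a short melting profile best matches a longer reference map.

    Hypothetical helper: slides the profile (and its reverse, since a molecule
    may lie in either orientation) along the reference and scores each
    ungapped offset by Pearson correlation.
    Returns (offset, correlation, flipped).
    """
    profile = np.asarray(profile, dtype=float)
    reference = np.asarray(reference, dtype=float)
    n = len(profile)
    best = (-1, -np.inf, False)
    for flipped, query in ((False, profile), (True, profile[::-1])):
        if query.std() == 0:
            continue  # correlation is undefined for a flat profile
        for offset in range(len(reference) - n + 1):
            window = reference[offset:offset + n]
            if window.std() == 0:
                continue  # skip flat reference segments
            r = np.corrcoef(query, window)[0, 1]
            if r > best[1]:
                best = (offset, r, flipped)
    return best

# Synthetic example: a noisy copy of reference[120:170] should map back to 120.
rng = np.random.default_rng(0)
reference = rng.random(500)
profile = reference[120:170] + rng.normal(0, 0.05, 50)
offset, r, flipped = best_alignment(profile, reference)
```

Checking both orientations matters because a molecule extended in a channel carries no intrinsic direction; a production aligner would also need to allow for stretching differences between molecules.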
Identification of a novel aviadenovirus, designated pigeon adenovirus 2 in domestic pigeons (Columba livia).

PubMed

Teske, L; Rubbenstroth, D; Meixner, M; Liere, K; Bartels, H; Rautenschlein, S

2017-01-02

The young pigeon disease syndrome (YPDS) mainly affects young pigeons of less than one year of age and leads to crop stasis, vomiting, diarrhea, anorexia and occasionally death. This disease is a major health problem internationally because of its seasonal appearance during competitions such as homing pigeon races or exhibitions of ornamental birds. While the etiology of YPDS is still unclear, adenoviruses are frequently discussed as potential causative agents. Electron microscopy of feces from a YPDS outbreak revealed massive shedding of adenovirus-like particles. Whole genome sequencing of this sample identified a novel adenovirus tentatively named pigeon adenovirus 2 (PiAdV-2). Phylogenetic and comparative genome analysis suggest that PiAdV-2 belongs to a new species within the genus Aviadenovirus, for which we propose the name Pigeon aviadenovirus B. The PiAdV-2 genome shares 54.9% nucleotide sequence identity with pigeon adenovirus 1 (PiAdV-1). In a screening of further YPDS-affected flocks, two variants of PiAdV-2 (variant A and B) were detected which shared 97.6% nucleotide identity in partial polymerase sequences, but only 79.7% nucleotide identity in partial hexon sequences. The distribution of both PiAdV-2 variants was further investigated in fecal samples collected between 2008 and 2015 from healthy or YPDS-affected racing pigeons of different lofts. Independent of their health status, approximately 20% of young and 13% of adult pigeon flocks harbored PiAdV-2 variants. Birds were free of PiAdV-1 and other aviadenoviruses, as determined by PCRs targeting the PiAdV-1 fiber gene and the aviadenovirus polymerase gene, respectively. In conclusion, there is no indication of a correlation between YPDS outbreaks and the presence of PiAdV-2 or other aviadenoviruses, arguing against a causative role in this disease complex. Copyright © 2016 Elsevier B.V. All rights reserved.
Emergence of Vaccine-derived Polioviruses, Democratic Republic of Congo, 2004–2011

PubMed Central

Lentsoane, Olivia; Burns, Cara C.; Pallansch, Mark; de Gourville, Esther; Yogolelo, Riziki; Muyembe-Tamfum, Jean Jacques; Puren, Adrian; Schoub, Barry D.; Venter, Marietjie

2013-01-01

Polioviruses isolated from 70 acute flaccid paralysis patients from the Democratic Republic of Congo (DRC) during 2004–2011 were characterized and found to be vaccine-derived type 2 polioviruses (VDPV2s). Partial genomic sequencing of the isolates revealed nucleotide sequence divergence of up to 3.5% in the viral protein 1 capsid region of the viral genome relative to the Sabin vaccine strain. Genetic analysis identified at least 7 circulating lineages localized to specific geographic regions. Multiple independent events of VDPV2 emergence occurred throughout DRC during this 7-year period. During 2010–2011, VDPV2 circulation in eastern DRC occurred in an area distinct from that of wild poliovirus circulation, whereas VDPV2 circulation in the southwestern part of DRC (in Kasai Occidental) occurred within the larger region of wild poliovirus circulation. PMID:24047933
Emergence of vaccine-derived polioviruses, Democratic Republic of Congo, 2004-2011.

PubMed

Gumede, Nicksy; Lentsoane, Olivia; Burns, Cara C; Pallansch, Mark; de Gourville, Esther; Yogolelo, Riziki; Muyembe-Tamfum, Jean Jacques; Puren, Adrian; Schoub, Barry D; Venter, Marietjie

2013-10-01

Polioviruses isolated from 70 acute flaccid paralysis patients from the Democratic Republic of Congo (DRC) during 2004-2011 were characterized and found to be vaccine-derived type 2 polioviruses (VDPV2s). Partial genomic sequencing of the isolates revealed nucleotide sequence divergence of up to 3.5% in the viral protein 1 capsid region of the viral genome relative to the Sabin vaccine strain. Genetic analysis identified at least 7 circulating lineages localized to specific geographic regions. Multiple independent events of VDPV2 emergence occurred throughout DRC during this 7-year period. During 2010-2011, VDPV2 circulation in eastern DRC occurred in an area distinct from that of wild poliovirus circulation, whereas VDPV2 circulation in the southwestern part of DRC (in Kasai Occidental) occurred within the larger region of wild poliovirus circulation.
Degenerate RNA packaging signals in the genome of Satellite Tobacco Necrosis Virus: implications for the assembly of a T=1 capsid.

PubMed

Bunka, David H J; Lane, Stephen W; Lane, Claire L; Dykeman, Eric C; Ford, Robert J; Barker, Amy M; Twarock, Reidun; Phillips, Simon E V; Stockley, Peter G

2011-10-14

Using a recombinant, T=1 Satellite Tobacco Necrosis Virus (STNV)-like particle expressed in Escherichia coli, we have established conditions for in vitro disassembly and reassembly of the viral capsid. In vivo assembly is dependent on the presence of the coat protein (CP) N-terminal region, and in vitro assembly requires RNA. Using immobilised CP monomers under reassembly conditions with "free" CP subunits, we have prepared a range of partially assembled CP species for RNA aptamer selection. SELEX directed against the RNA-binding face of the STNV CP resulted in the isolation of several clones, one of which (B3) matches the STNV-1 genome in 16 out of 25 nucleotide positions, including across a statistically significant 10/10 stretch. This 10-base region folds into a stem-loop displaying the motif ACAA and has been shown to bind to STNV CP. Analysis of the other aptamer sequences reveals that the majority can be folded into stem-loops displaying versions of this motif. Using a sequence and secondary structure search motif to analyse the genomic sequence of STNV-1, we identified 30 stem-loops displaying the sequence motif AxxA. The implication is that there are many stem-loops in the genome carrying essential recognition features for binding STNV CP. Secondary structure predictions of the genomic RNA using Mfold showed that only 8 out of 30 of these stem-loops would be formed in the lowest-energy structure. These results are consistent with an assembly mechanism based on kinetically driven folding of the RNA. Copyright © 2011 Elsevier Ltd. All rights reserved.
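The idea of scanning a genome for stem-loops that display an AxxA loop can be sketched with a naive pattern search. This is a simplification, not the authors' search motif: it assumes a fixed 4-nt loop, a perfectly paired stem of a chosen length (Watson-Crick plus G-U wobble pairs), and no free-energy model. The function name `find_axxa_hairpins` and the `stem_len` default are hypothetical.

```python
import re

# Allowed base pairs in the stem: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def find_axxa_hairpins(rna: str, stem_len: int = 4):
    """Yield (start, hairpin) for stem-loops whose 4-nt loop matches A..A.

    Hypothetical helper: the loop length and the perfectly paired stem are
    simplifying assumptions; no folding energy is computed.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA input as well
    # Lookahead so that overlapping loop candidates are all reported.
    for match in re.finditer(r"(?=(A..A))", rna):
        loop_start = match.start()
        loop_end = loop_start + 4            # exclusive end of the loop
        left, right = loop_start - stem_len, loop_end + stem_len
        if left < 0 or right > len(rna):
            continue
        five_prime = rna[left:loop_start]
        three_prime = rna[loop_end:right]
        # Moving outward from the loop, the i-th base on the 5' side must
        # pair with the i-th base on the 3' side.
        if all((five_prime[-1 - i], three_prime[i]) in PAIRS for i in range(stem_len)):
            yield left, rna[left:right]
```

For example, `list(find_axxa_hairpins("UUUUGGCCACAAGGCCUUUU"))` reports the hairpin `GGCCACAAGGCC` starting at position 4, whose loop is the ACAA motif mentioned above. As the abstract notes, such candidates need not all form in the lowest-energy fold, so a scan like this only enumerates potential sites.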
Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL): adapting the Partial Phylogenetic Profiling algorithm to scan sequences for signatures that predict protein function

PubMed Central

2010-01-01

Background Comparative genomics methods such as phylogenetic profiling can mine powerful inferences from inherently noisy biological data sets. We introduce Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL), a method that applies the Partial Phylogenetic Profiling (PPP) approach locally within a protein sequence to discover short sequence signatures associated with functional sites. The approach is based on the basic scoring mechanism employed by PPP, namely the use of binomial distribution statistics to optimize sequence similarity cutoffs during searches of partitioned training sets. Results Here we illustrate and validate the ability of the SIMBAL method to find functionally relevant short sequence signatures by application to two well-characterized protein families. In the first example, we partitioned a family of ABC permeases using a metabolic background property (urea utilization). Thus, the TRUE set for this family comprised members whose genome of origin encoded a urea utilization system. By moving a sliding window across the sequence of a permease, and searching each subsequence in turn against the full set of partitioned proteins, the method found which local sequence signatures best correlated with the urea utilization trait. Mapping of SIMBAL "hot spots" onto crystal structures of homologous permeases reveals that the significant sites are gating determinants on the cytosolic face rather than, say, docking sites for the substrate-binding protein on the extracellular face. In the second example, we partitioned a protein methyltransferase family using gene proximity as a criterion. In this case, the TRUE set comprised those methyltransferases encoded near the gene for the substrate RF-1. SIMBAL identifies sequence regions that map onto the substrate-binding interface while ignoring regions involved in the methyltransferase reaction mechanism in general. 
Neither method for training set construction requires any prior experimental characterization. Conclusions SIMBAL shows that, in functionally divergent protein families, selected short sequences often significantly outperform their full-length parent sequence for making functional predictions by sequence similarity, suggesting avenues for improved functional classifiers. When combined with structural data, SIMBAL affords the ability to localize and model functional sites. PMID:20102603
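The binomial scoring step that SIMBAL inherits from Partial Phylogenetic Profiling can be sketched for a single query window. The sketch assumes the similarity search has already produced a list of hits ranked from most to least similar, each labeled TRUE or FALSE by its partition, and that the background probability of a TRUE hit is the TRUE fraction of the training set. Each candidate cutoff is then scored by the binomial tail probability of seeing at least that many TRUE hits. The names `binomial_tail` and `best_cutoff` are hypothetical, and the exact SIMBAL/PPP implementation details may differ.

```python
from math import comb, log10

def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly (fine for small n)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def best_cutoff(ranked_is_true: list[bool], p_true: float) -> tuple[float, int]:
    """Choose the similarity cutoff that best separates TRUE from FALSE hits.

    ranked_is_true: partition labels of the hits returned for one query
    window, ordered from most to least similar (hypothetical input; the
    similarity search itself is not shown).
    p_true: fraction of the training set that belongs to the TRUE partition.
    Returns (best score as -log10 p-value, number of hits above the cutoff).
    """
    best_score, best_n = 0.0, 0
    true_hits = 0
    for n, is_true in enumerate(ranked_is_true, start=1):
        true_hits += is_true
        # Floor the p-value so log10 never sees an underflowed zero.
        p_value = max(binomial_tail(true_hits, n, p_true), 1e-300)
        score = -log10(p_value)
        if score > best_score:
            best_score, best_n = score, n
    return best_score, best_n
```

With five TRUE hits ranked above five FALSE hits and `p_true=0.5`, the best cutoff keeps the top 5 hits (p = 0.5^5). Repeating this for every window of a sliding scan gives the per-position scores that are mapped as "hot spots"; for large hit lists, `scipy.stats.binom.sf(k - 1, n, p)` computes the same tail more robustly.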
Fast Dissemination of New HIV-1 CRF02/A1 Recombinants in Pakistan

PubMed Central

Chen, Yue; Hora, Bhavna; DeMarco, Todd; Shah, Sharaf Ali; Ahmed, Manzoor; Sanchez, Ana M.; Su, Chang; Carter, Meredith; Stone, Mars; Hasan, Rumina; Hasan, Zahra; Busch, Michael P.; Denny, Thomas N.; Gao, Feng

2016-01-01

A number of HIV-1 subtypes have been identified in Pakistan by characterization of partial viral gene sequences. Little is known about whether new recombinants are generated and how they disseminate, because whole-genome sequences for these viruses have not been characterized. Near full-length genome (NFLG) sequences were obtained from 34 HIV-1-infected individuals in Pakistan, either by amplifying two overlapping half genomes or by next-generation sequencing. Phylogenetic tree analysis showed that the newly characterized sequences comprised 16 subtype A, one subtype C, and 17 A/G recombinant sequences. Further analysis showed that all 16 subtype A1 sequences (47%), together with the vast majority of sequences from Pakistan in other studies, formed a tight subcluster (A1a) within the subtype A1 clade, suggesting that they were derived from a single introduction. More in-depth analysis of the 17 A/G NFLG sequences showed that five (15%) shared recombination breakpoints similar to those of CRF02 but were phylogenetically distinct from the prototype CRF02, forming a tight subcluster (CRF02a), while 12 (38%) were new recombinants between CRF02a and either A1a or divergent A1b viruses. Unique recombination patterns among the majority of the newly characterized recombinants indicated ongoing recombination. Interestingly, recombination breakpoints in these CRF02/A1 recombinants were similar to those in prototype CRF02 viruses, indicating that recombination at these sites is more likely to generate variable recombinant viruses. The dominance and fast dissemination of new CRF02a/A1 recombinants over prototype CRF02 suggest that these recombinants are better adapted and may become major epidemic strains in Pakistan. PMID:27973597

Random chromosome elimination in synthetic Triticum-Aegilops amphiploids leads to development of a stable partial amphiploid with high grain micro- and macronutrient content and powdery mildew resistance.

PubMed

Tiwari, Vijay K; Rawat, Nidhi; Neelam, Kumari; Kumar, Sundip; Randhawa, Gursharn S; Dhaliwal, Harcharan S

2010-12-01

Synthetic amphiploids are immortal sources for studies on crop evolution, genome dissection, and introgression of useful variability from related species. Cytological analysis of synthetic decaploid wheat (Triticum aestivum L.) - Aegilops kotschyi Boiss. amphiploids (AABBDDUkUkSkSk) showed some univalents from the C1 generation onward, followed by chromosome elimination. Most of the univalents came to the metaphase I plate after the reductional division of paired chromosomes and underwent equational division, leading to their elimination through laggards and micronuclei. Substantial variation in the chromosome number of pollen mother cells from different tillers, spikelets, and anthers of some plants also indicated somatic chromosome elimination. Genomic in situ hybridization, fluorescence in situ hybridization, and simple sequence repeat marker analysis of two amphiploids with reduced chromosome numbers indicated random elimination of chromosomes from various genomes, with the D genome most sensitive to elimination, followed by the Sk and Uk genomes, whereas chromosome 1D was preferentially eliminated in both amphiploids investigated. One of the partial amphiploids, C4 T. aestivum 'Chinese Spring' - Ae. kotschyi 396 (2n = 58), with 34 T. aestivum, 14 Uk, and 10 Sk chromosomes, had stable meiosis and high fertility. The partial amphiploids, with white glumes, bold seeds, tough rachis, high grain macro- and micronutrient content, and resistance to powdery mildew, could be used for T. aestivum biofortification and transfer of powdery mildew resistance.
A new technique in reference based DNA sequence compression algorithm: Enabling partial decompression

NASA Astrophysics Data System (ADS)

Banerjee, Kakoli; Prasad, R. A.

2014-10-01

The volume of genetic data is increasing exponentially. The human genome in its base format occupies almost thirty terabytes of data and doubles in size every two and a half years. Computational resources are well known to be limited, and the most important resource that genetic data requires for collection, storage, and retrieval is storage space, which is itself limited. Computational performance also depends on storage and execution time, and transmission capabilities depend directly on the size of the data. Data compression therefore becomes an issue of utmost importance when handling gigantic databases such as GenBank, and decompression is equally an issue when such huge databases are handled. This paper aims to provide not only genetic data compression but also partial decompression of the genetic sequences.
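The abstract does not describe the authors' encoding, so the following is only a hypothetical illustration of why block-wise reference-based compression enables partial decompression: the target is stored as substitutions against a reference in fixed-size blocks, and any range can be rebuilt by touching only the blocks that overlap it. The `Block` record, `compress`, `decompress_range`, and the substitution-only assumption (equal-length sequences, no indels, no entropy coding) are all simplifications introduced here.

```python
from dataclasses import dataclass

@dataclass
class Block:
    start: int   # offset of the block in the target sequence
    length: int  # number of bases in the block
    diffs: list  # (offset within block, target base) substitutions

def compress(target: str, reference: str, block_size: int = 8) -> list:
    """Encode target as per-block substitutions relative to reference."""
    if len(target) != len(reference):
        raise ValueError("this sketch only handles substitutions (equal lengths)")
    blocks = []
    for start in range(0, len(target), block_size):
        t = target[start:start + block_size]
        r = reference[start:start + block_size]
        diffs = [(i, tb) for i, (tb, rb) in enumerate(zip(t, r)) if tb != rb]
        blocks.append(Block(start, len(t), diffs))
    return blocks

def decompress_range(blocks: list, reference: str, begin: int, end: int) -> str:
    """Rebuild target[begin:end] using only the blocks that overlap the range."""
    pieces = []
    for block in blocks:
        if block.start + block.length <= begin or block.start >= end:
            continue  # block lies entirely outside the requested range
        seq = list(reference[block.start:block.start + block.length])
        for offset, base in block.diffs:
            seq[offset] = base
        pieces.append((block.start, "".join(seq)))
    joined_start = pieces[0][0] if pieces else begin
    joined = "".join(s for _, s in pieces)
    return joined[begin - joined_start:end - joined_start]
```

Because each block is self-contained, the cost of retrieving a region grows with the region size rather than with the whole sequence; real compressors would additionally handle insertions and deletions and entropy-code the differences.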
A novel rhabdovirus, related to Merida virus, in field-collected mosquitoes from Anatolia and Thrace.

PubMed

Ergünay, Koray; Brinkmann, Annika; Litzba, Nadine; Günay, Filiz; Kar, Sırrı; Öter, Kerem; Örsten, Serra; Sarıkaya, Yasemen; Alten, Bülent; Nitsche, Andreas; Linton, Yvonne-Marie

2017-07-01

Next-generation sequencing technologies have significantly facilitated the discovery of novel viruses, and metagenomic surveillance of arthropods has enabled exploration of the diversity of novel or known viral agents. We have identified a novel rhabdovirus that is genetically related to the recently described Merida virus via next-generation sequencing in a mosquito pool from Thrace. The complete viral genome contains 11,798 nucleotides with 83% genome-wide nucleotide sequence similarity to Merida virus. Five major putative open reading frames that follow the canonical rhabdovirus genome organization were identified. A total of 1380 mosquitoes comprising 13 species, collected from Thrace and the Mediterranean and Aegean regions of Anatolia were screened for the novel virus using primers based on the N and L genes of the prototype genome. Eight positive pools (6.2%) exclusively comprised Culex pipiens sensu lato specimens originating from all study regions. Infections were observed in pools with female as well as male or mixed-sex individuals. The overall and Cx. pipiens-specific minimal infection rates were calculated to be 5.7 and 14.8, respectively. Sequencing of the PCR products revealed marked diversity within a portion of the N gene, with up to 4% divergence and distinct amino acid substitutions that were unrelated to the collection site. Phylogenetic analysis of the complete and partial viral polymerase (L gene) amino acid sequences placed the novel virus and Merida virus in a distinct group, indicating that these strains are closely related. The strain is tentatively named "Merida-like virus Turkey". Studies are underway to isolate and further explore the host range and distribution of this new strain.
Comprehensive Insights in the Mycobacterium avium subsp. paratuberculosis Genome Using New WGS Data of Sheep Strain JIII-386 from Germany

PubMed Central

Möbius, Petra; Hölzer, Martin; Felder, Marius; Nordsiek, Gabriele; Groth, Marco; Köhler, Heike; Reichwald, Kathrin; Platzer, Matthias; Marz, Manja

2015-01-01

Mycobacterium avium (M. a.) subsp. paratuberculosis (MAP), the etiologic agent of Johne’s disease, affects cattle, sheep, and other ruminants worldwide. To decipher phenotypic differences between sheep and cattle strains (belonging to MAP-S [Type-I/III] and MAP-C [Type-II], respectively), comparative genome analysis requires data from diverse isolates originating from different geographic regions of the world. This study presents the best-assembled MAP-S genome to date: sheep isolate JIII-386 from Germany. One newly sequenced cattle isolate (JII-1961, Germany), four published MAP strains of MAP-C and MAP-S from the United States and Australia, and M. a. subsp. hominissuis (MAH) strain 104 were used for assembly improvement and comparisons. All genomes were annotated by BacProt and the results compared with NCBI (National Center for Biotechnology Information) annotation. Corresponding protein-coding sequences (CDSs) were detected, but also CDSs that were exclusively determined by either NCBI or BacProt. A new Shine–Dalgarno sequence motif (5′-AGCTGG-3′) was extracted. Novel CDSs, including PE-PGRS family protein genes, and about 80 noncoding RNAs exhibiting high sequence conservation are presented. Previously found genetic differences between MAP types are partially revised. Four of ten presumed MAP-S-specific large sequence polymorphism regions (LSPSs) are in fact present in MAP-C strains; new LSPSs were identified. Independently of the regional origin of the strains, the number of individual CDSs and single nucleotide variants confirms the strong similarity of MAP-C strains and shows higher diversity among MAP-S strains. This study gives ambiguous results regarding the hypothesis that MAP-S is the evolutionary intermediate between MAH and MAP-C, but it clearly shows a higher similarity of MAP to MAH than to Mycobacterium intracellulare. PMID:26384038
Cross-species bacterial artificial chromosome (BAC) library screening via overgo-based hybridization and BAC-contig mapping of a yield enhancement quantitative trait locus (QTL) yld1.1 in the Malaysian wild rice Oryza rufipogon.

PubMed

Song, Beng-Kah; Nadarajah, Kalaivani; Romanov, Michael N; Ratnam, Wickneswari

2005-01-01

The construction of BAC-contig physical maps is an important step towards partial or complete genome sequence analysis. Here, we describe our initial efforts to apply an overgo approach to screen a BAC library of the Malaysian wild rice species, Oryza rufipogon. Overgo design is based on repetitive element masking and sequence uniqueness, and uses short probes (approximately 40 bp), making this method highly efficient and specific. Pairs of 24-bp oligos that contain an 8-bp overlap were developed from the publicly available genomic sequences of the cultivated rice, O. sativa, to generate 20 overgo probes for a 1-Mb region that encompasses a yield enhancement QTL yld1.1 in O. rufipogon. The advantages of a high similarity in melting temperature, hybridization kinetics and specific activities of overgos further enabled a pooling strategy for library screening by filter hybridization. Two pools of ten overgos each were hybridized to high-density filters representing the O. rufipogon genomic BAC library. These screening tests yielded 69 PCR-verified positive hits from a total of 23,040 BAC clones of the entire O. rufipogon library. A minimal tiling path of clones was generated to contribute to a fully covered BAC-contig map of the targeted 1-Mb region. The developed protocol for overgo design based on O. sativa sequences as a comparative genomic framework, and the pooled overgo hybridization screening technique, are suitable means for high-resolution physical mapping and the identification of BAC candidates for sequencing.
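The overgo geometry described above (two 24-bp oligos with an 8-bp overlap, giving a roughly 40-bp probe, since 24 + 24 - 8 = 40) can be made concrete with a small sketch. It assumes the 40-bp target region has already been chosen after repeat masking; the function names `reverse_complement` and `design_overgo` are hypothetical, and steps such as uniqueness checks and melting-temperature matching are not shown.

```python
# Translation table for complementing DNA bases (other characters pass through).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def design_overgo(region: str, oligo_len: int = 24, overlap: int = 8) -> tuple[str, str]:
    """Split a target region into the two overlapping overgo oligos.

    The region must be 2 * oligo_len - overlap bases long (40 bp for 24-mers
    with an 8-bp overlap). The forward oligo is the 5' end of the region; the
    reverse oligo is the reverse complement of the 3' end, so the two anneal
    over `overlap` bases and their recessed ends can be filled in.
    """
    region = region.upper()
    expected = 2 * oligo_len - overlap
    if len(region) != expected:
        raise ValueError(f"region must be {expected} bp, got {len(region)}")
    forward = region[:oligo_len]
    reverse = reverse_complement(region[-oligo_len:])
    return forward, reverse
```

The forward oligo covers positions 1 to 24 and the reverse oligo covers positions 17 to 40, so their shared 8 bases form the annealing duplex; filling in the single-stranded overhangs produces the full double-stranded probe.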
Evolutionary Acquisition and Loss of Saxitoxin Biosynthesis in Dinoflagellates: the Second “Core” Gene, sxtG

PubMed Central

Orr, Russell J. S.; Stüken, Anke; Murray, Shauna A.

2013-01-01

Saxitoxin and its derivatives are potent neurotoxins produced by several cyanobacteria and dinoflagellate species. SxtA is the initial enzyme in the biosynthesis of saxitoxin. The dinoflagellate full mRNA and partial genomic sequences have previously been characterized, and it appears that sxtA originated in dinoflagellates through a horizontal gene transfer from a bacterium. So far, little is known about the remaining genes involved in this pathway in dinoflagellates. Here we characterize sxtG, an amidinotransferase enzyme gene that putatively encodes the second step in saxitoxin biosynthesis. In this study, the entire sxtG transcripts from Alexandrium fundyense CCMP1719 and Alexandrium minutum CCMP113 were amplified and sequenced. The transcripts contained typical dinoflagellate spliced leader sequences and eukaryotic poly(A) tails. In addition, partial sxtG transcript fragments were amplified from four additional Alexandrium species and Gymnodinium catenatum. The phylogenetic inference of dinoflagellate sxtG, congruent with sxtA, revealed a bacterial origin. However, it is not known if sxtG was acquired independently of sxtA. Amplification and sequencing of the corresponding genomic sxtG region revealed noncanonical introns. These introns show a high interspecies and low intraspecies variance, suggesting multiple independent acquisitions and losses. Unlike sxtA, sxtG was also amplified from Alexandrium species not known to synthesize saxitoxin. However, amplification was not observed for 22 non-saxitoxin-producing dinoflagellate species other than those of the genus Alexandrium or G. catenatum. This result strengthens our hypothesis that saxitoxin synthesis has been secondarily lost in conjunction with sxtA for some descendant species. PMID:23335767
Evolutionary acquisition and loss of saxitoxin biosynthesis in dinoflagellates: the second "core" gene, sxtG.

PubMed

Orr, Russell J S; Stüken, Anke; Murray, Shauna A; Jakobsen, Kjetill S

2013-04-01

Saxitoxin and its derivatives are potent neurotoxins produced by several cyanobacteria and dinoflagellate species. SxtA is the initial enzyme in the biosynthesis of saxitoxin. The dinoflagellate full mRNA and partial genomic sequences have previously been characterized, and it appears that sxtA originated in dinoflagellates through a horizontal gene transfer from a bacterium. So far, little is known about the remaining genes involved in this pathway in dinoflagellates. Here we characterize sxtG, an amidinotransferase enzyme gene that putatively encodes the second step in saxitoxin biosynthesis. In this study, the entire sxtG transcripts from Alexandrium fundyense CCMP1719 and Alexandrium minutum CCMP113 were amplified and sequenced. The transcripts contained typical dinoflagellate spliced leader sequences and eukaryotic poly(A) tails. In addition, partial sxtG transcript fragments were amplified from four additional Alexandrium species and Gymnodinium catenatum. The phylogenetic inference of dinoflagellate sxtG, congruent with sxtA, revealed a bacterial origin. However, it is not known if sxtG was acquired independently of sxtA. Amplification and sequencing of the corresponding genomic sxtG region revealed noncanonical introns. These introns show a high interspecies and low intraspecies variance, suggesting multiple independent acquisitions and losses. Unlike sxtA, sxtG was also amplified from Alexandrium species not known to synthesize saxitoxin. However, amplification was not observed for 22 non-saxitoxin-producing dinoflagellate species other than those of the genus Alexandrium or G. catenatum. This result strengthens our hypothesis that saxitoxin synthesis has been secondarily lost in conjunction with sxtA for some descendant species.
Comparative sequence analyses of sixteen reptilian paramyxoviruses

USGS Publications Warehouse

Ahne, W.; Batts, W.N.; Kurath, G.; Winton, J.R.

1999-01-01

Viral genomic RNA of Fer-de-Lance virus (FDLV), a paramyxovirus highly pathogenic for reptiles, was reverse transcribed and cloned. Plasmids with significant sequence similarities to the hemagglutinin-neuraminidase (HN) and polymerase (L) genes of mammalian paramyxoviruses were identified by BLAST search. Partial sequences of the FDLV genes were used to design primers for amplification by nested polymerase chain reaction (PCR) and sequencing of 518-bp L gene and 352-bp HN gene fragments from a collection of 15 previously uncharacterized reptilian paramyxoviruses. Phylogenetic analyses of the partial L and HN sequences produced similar trees in which there were two distinct subgroups of isolates that were supported with maximum bootstrap values, and several intermediate isolates. Within each subgroup the nucleotide divergence values were less than 2.5%, while the divergence between the two subgroups was 20-22%. This indicated that the two subgroups represent distinct virus species containing multiple virus strains. The five intermediate isolates had nucleotide divergence values of 11-20% and may represent additional distinct species. In addition to establishing diversity among reptilian paramyxoviruses, the phylogenetic groupings showed some correlation with geographic location, and clearly demonstrated a low level of host species-specificity within these viruses. Copyright (C) 1999 Elsevier Science B.V.
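Nucleotide divergence values such as "less than 2.5%" within a subgroup are commonly reported as uncorrected p-distances between aligned sequences. The sketch below computes that quantity; whether the authors used p-distances or a model-corrected distance is not stated in the abstract, and the gap/ambiguity handling (pairwise deletion of '-' and 'N' columns) and the name `p_distance` are assumptions.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') or an ambiguous 'N'
    are skipped (pairwise deletion), a simplifying choice.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared
```

Multiplying the result by 100 gives percent divergence; for example, one mismatch in a 10-site alignment yields 0.1, i.e. 10%. For more divergent pairs, such as the 20-22% between subgroups, a substitution model (e.g. Jukes-Cantor or Kimura) would usually be applied to correct for multiple hits.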
Assessment of Recombination in the S-segment Genome of Crimean-Congo Hemorrhagic Fever Virus in Iran.

PubMed

Chinikar, Sadegh; Shah-Hosseini, Nariman; Bouzari, Saeid; Shokrgozar, Mohammad Ali; Mostafavi, Ehsan; Jalali, Tahmineh; Khakifirouz, Sahar; Groschup, Martin H; Niedrig, Matthias

2016-03-01

Crimean-Congo Hemorrhagic Fever Virus (CCHFV) belongs to the genus Nairovirus of the family Bunyaviridae. The main aim of this study was to investigate the extent of recombination in the S-segment genome of CCHFV in Iran. Sequences from isolates obtained from Iranian patients, together with those available in GenBank, were analyzed by phylogenetic and bootscan methods. Comparison of phylogenetic trees based on full-length sequences and on partial fragments of the CCHFV S segment revealed a genetic switch attributable to a recombination event. Moreover, bootscan analysis with the SimPlot software detected evidence of multiple recombination events in the query isolates. The exchange of genomic regions between different strains through recombination could contribute to CCHFV diversification and evolution. The occurrence of recombination in CCHFV has a critical impact on epidemiological investigations and vaccine design.
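The sliding-window logic behind detecting a "genetic switch" can be sketched in a simplified form. This is not bootscanning, which builds a phylogenetic tree for every window and reports bootstrap support; it is a plainer similarity scan (in the spirit of a similarity plot) that reports, for each window of an aligned query, which candidate parent sequence is most identical. The function name, window and step sizes, and the assumption of gap-free, equal-length alignments are illustrative.

```python
def closest_parent_by_window(query: str, parents: dict, window: int = 200, step: int = 50):
    """For each window of an aligned query, report the most similar parent.

    parents maps a name to a sequence aligned to the query (same length,
    gap-free in this simplified sketch). Ties go to the first parent listed.
    Returns a list of (window start, parent name, identity).
    """
    calls = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        best_name, best_identity = None, -1.0
        for name, seq in parents.items():
            s = seq[start:start + window]
            identity = sum(a == b for a, b in zip(q, s)) / window
            if identity > best_identity:
                best_name, best_identity = name, identity
        calls.append((start, best_name, best_identity))
    return calls
```

A change in the best-matching parent between consecutive windows marks a candidate breakpoint; in a real analysis, such candidates would be confirmed with bootstrap-supported trees on either side of the breakpoint, as in the bootscan and full-length versus partial-fragment tree comparisons described above.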
Gene Discovery through Genomic Sequencing of Brucella abortus

PubMed Central

Sánchez, Daniel O.; Zandomeni, Ruben O.; Cravero, Silvio; Verdún, Ramiro E.; Pierrou, Ester; Faccio, Paula; Diaz, Gabriela; Lanzavecchia, Silvia; Agüero, Fernán; Frasch, Alberto C. C.; Andersson, Siv G. E.; Rossetti, Osvaldo L.; Grau, Oscar; Ugalde, Rodolfo A.

2001-01-01

Brucella abortus is the etiological agent of brucellosis, a disease that affects cattle and humans. We generated random DNA sequences from the genome of B. abortus strain 2308 in order to characterize molecular targets that might be useful for developing immunological or chemotherapeutic strategies against this pathogen. The partial sequencing of 1,899 clones allowed the identification of 1,199 genomic sequence surveys (GSSs) with high homology (BLAST expect value < 10⁻⁵) to sequences deposited in the GenBank databases. Among them, 925 represent putative novel genes for the Brucella genus. Of these 925 nonredundant GSSs, 470 were classified into 15 categories based on cellular function. Seven hundred GSSs showed no significant database matches and remain available for further studies to identify their function. A high number of GSSs with homology to Agrobacterium tumefaciens and Rhizobium meliloti proteins were observed, confirming their close phylogenetic relationship. Among them, several GSSs showed high similarity to genes related to nodule nitrogen fixation, synthesis of nod factors, nodulation protein symbiotic plasmid, and nodule bacteroid differentiation. We have also identified several B. abortus homologs of virulence and pathogenesis genes from other pathogens, including a homolog of both the Shda gene from Salmonella enterica serovar Typhimurium and the AidA-1 gene from Escherichia coli. Other GSSs displayed significant homology to genes encoding components of the type III and type IV secretion machineries, suggesting that Brucella might also have an active type III secretion machinery. PMID:11159979
Molecular and physiological properties of bacteriophages from North America and Germany affecting the fire blight pathogen Erwinia amylovora.

PubMed

Müller, Ina; Lurz, Rudi; Kube, Michael; Quedenau, Claudia; Jelkmann, Wilhelm; Geider, Klaus

2011-11-01

For possible control of fire blight affecting apple and pear trees, we characterized Erwinia amylovora phages from North America and Germany. The genome size determined by electron microscopy (EM) was confirmed by sequence data, and major coat proteins were identified from gel bands by mass spectrometry. Based on their morphology in EM, φEa1h and φEa100 were assigned to the Podoviridae and φEa104 and φEa116 to the Myoviridae. Host ranges were essentially confined to E. amylovora; strains of the species Erwinia pyrifoliae and E. billingiae, and even Pantoea stewartii, were partially sensitive. The phages φEa1h and φEa100 were dependent on the amylovoran capsule of E. amylovora, whereas φEa104 and φEa116 were not. The Myoviridae efficiently lysed their hosts and protected apple flowers against E. amylovora significantly better than the Podoviridae, and should be preferred in biocontrol experiments. We have also isolated and partially characterized E. amylovora phages from apple orchards in Germany. They belong to the Podoviridae or Myoviridae, with a host range similar to that of the phages isolated in North America. In EM measurements, the genomes of the Podoviridae were smaller than those of the Myoviridae from North America and from Germany, which differed from each other in corresponding nucleotide sequences. © 2011 The Authors. Microbial Biotechnology © 2011 Society for Applied Microbiology and Blackwell Publishing Ltd.
Neo-sex Chromosomes in the Monarch Butterfly, Danaus plexippus

PubMed Central

Mongue, Andrew J.; Nguyen, Petr; Voleníková, Anna; Walters, James R.

2017-01-01

We report the discovery of a neo-sex chromosome in the monarch butterfly, Danaus plexippus, and several of its close relatives. Z-linked scaffolds in the D. plexippus genome assembly were identified via sex-specific differences in Illumina sequencing coverage. Additionally, a majority of the D. plexippus genome assembly was assigned to chromosomes based on counts of one-to-one orthologs relative to the butterfly Melitaea cinxia (with replication using two other lepidopteran species), in which genome scaffolds have been mapped to linkage groups. Sequencing coverage-based assessments of Z linkage combined with homology-based chromosomal assignments provided strong evidence for a Z-autosome fusion in the Danaus lineage, involving the autosome homologous to chromosome 21 in M. cinxia. Coverage analysis also identified three notable assembly errors resulting in chimeric Z-autosome scaffolds. Cytogenetic analysis further revealed a large W chromosome that is partially euchromatic, consistent with being a neo-W chromosome. The discovery of a neo-Z and the provisional assignment of chromosome linkage for >90% of D. plexippus genes lays the foundation for novel insights concerning sex chromosome evolution in this female-heterogametic model species for functional and evolutionary genomics. PMID:28839116
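The coverage-based identification of Z-linked scaffolds rests on simple dosage arithmetic: in a female-heterogametic (ZW) species, females carry one Z and males two, so after normalizing for library size, Z-linked scaffolds should show a female:male coverage ratio near 0.5 (log2 about -1), while autosomal scaffolds sit near 1 (log2 about 0). The sketch below applies that rule; the input layout, the function name `classify_scaffolds`, and the -0.5 threshold (midway between the two expectations) are assumptions, not the authors' exact pipeline.

```python
import math

def classify_scaffolds(female_cov: dict, male_cov: dict,
                       female_total: float, male_total: float,
                       threshold: float = -0.5) -> dict:
    """Label scaffolds as putatively Z-linked from sex-specific read depth.

    female_cov / male_cov: scaffold name -> mean read depth in each sex.
    female_total / male_total: library sizes used to normalize depth.
    Returns scaffold -> (log2 female:male ratio, label).
    """
    result = {}
    for scaffold in sorted(female_cov.keys() & male_cov.keys()):
        f = female_cov[scaffold] / female_total
        m = male_cov[scaffold] / male_total
        if f <= 0 or m <= 0:
            continue  # zero coverage in one sex (e.g. W-linked): left unclassified
        ratio = math.log2(f / m)
        label = "Z" if ratio < threshold else "autosomal"
        result[scaffold] = (ratio, label)
    return result
```

A chimeric Z-autosome scaffold, like the assembly errors mentioned above, would show a ratio that shifts partway along its length, which is why coverage is often also examined in windows along each scaffold rather than only as a scaffold-wide mean.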
Bioinformatic Characterization of Genes and Proteins Involved in Blood Clotting in Lampreys.

PubMed

Doolittle, Russell F

2015-10-01

Lampreys and hagfish are the earliest diverging of extant vertebrates and are obvious targets for investigating the origins of complex biochemical systems found in mammals. Currently, the simplest approach for such inquiries is to search for the presence of relevant genes in whole genome sequence (WGS) assemblies. Unfortunately, until recently no high-quality complete genome sequence had been available for either lampreys or hagfish, precluding the possibility of proving gene absence. Improved but still incomplete genome assemblies for two lamprey species have now been posted, and, together with an extensive collection of short sequences in the NCBI trace archive, they make it possible to count specific gene families reliably. In particular, a multi-source approach has been used to study the lamprey blood clotting system with regard to the presence and absence of genes known to occur in higher vertebrates. As suggested in earlier studies, lampreys lack genes for coagulation factors VIII and IX, both of which are critical for the "intrinsic" clotting system and responsible for hemophilia in humans. On the other hand, they have three genes each for factors VII and X, participants in the "extrinsic" clotting system. The strategy of using raw trace sequence "reads" together with partial WGS assemblies for lampreys can be applied to studies on the early evolution of other biochemical systems in vertebrates.
Evidence for two transferrin loci in the Salmo trutta genome.

PubMed

Rozman, T; Dovc, P; Marić, S; Kokalj-Vokac, N; Erjavec-Skerget, A; Rab, P; Snoj, A

2008-12-01

To determine the organization of the transferrin (TF) locus in the Salmo trutta genome, partial DNA and cDNA sequencing, fluorescent in situ hybridization (FISH) and Salmo salar BAC analysis were performed. TF expression levels and copy number prediction were assessed using real-time PCR. In addition to two previously reported DNA TF variant sequences of S. trutta and Salmo marmoratus (TF1), two novel variant sequences (TF2) were revealed in both species. Variant-specific sequence tags, characterizing two variants for each TF type (TF1 and TF2), were identified in genomic clones from each of the F1 hybrids between S. trutta and S. marmoratus. These clearly documented a double-heterozygote status at the TF loci. The real-time PCR data showed that each of the two TF types (TF1 and TF2) existed in one copy only and that transcription of TF2 was considerably lower than that of TF1. Using FISH, hybridization signals were observed on two medium-sized acrocentric chromosomes of the S. trutta karyotype. A TF type-specific PCR followed by restriction analysis revealed the presence of two TF loci in the majority of analysed BAC clones. It was concluded that the TF gene is duplicated in the genome of S. trutta, and that the two TF loci are located adjacent to one another on the same chromosome. The differing transcription levels of TF1 and TF2 appear to depend on the corresponding promoter activity, which, at least for TF2, seems to vary between different Salmo congeners.
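The abstract does not say how copy number was predicted from the real-time PCR data. One widely used approach is the comparative Ct (2^-ΔΔCt) method, sketched below under the assumption that target and reference amplify with equal, near-100% efficiency. The calibrator of known copy number and all Ct values are hypothetical.

```python
def relative_copy_number(ct_target: float, ct_reference: float,
                         ct_target_cal: float, ct_reference_cal: float,
                         calibrator_copies: float = 1.0) -> float:
    """Estimate target copies with the 2^-ddCt method (assumes ~100% PCR efficiency)."""
    delta_sample = ct_target - ct_reference        # normalize the sample to a reference gene
    delta_cal = ct_target_cal - ct_reference_cal   # same normalization for the calibrator
    ddct = delta_sample - delta_cal
    return calibrator_copies * 2.0 ** (-ddct)      # one cycle earlier ~ twice the template


# Hypothetical Ct values; the calibrator is assumed to carry a single copy.
print(relative_copy_number(25.0, 24.0, 25.0, 24.0))  # same ddCt as calibrator
print(relative_copy_number(24.0, 24.0, 25.0, 24.0))  # one cycle earlier than expected
```

A ΔΔCt near 0 is consistent with the same copy number as the calibrator, and a one-cycle shift with roughly a twofold difference; deviations in amplification efficiency would bias these estimates.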
Revealing Less Derived Nature of Cartilaginous Fish Genomes with Their Evolutionary Time Scale Inferred with Nuclear Genes

PubMed Central

Renz, Adina J.; Meyer, Axel; Kuraku, Shigehiro

2013-01-01

Cartilaginous fishes, divided into Holocephali (chimaeras) and Elasmobranchii (sharks, rays and skates), occupy a key phylogenetic position among extant vertebrates in reconstructing their evolutionary processes. An accurate evolutionary time scale is indispensable for better understanding the relationship between phenotypic and molecular evolution of cartilaginous fishes. However, our current knowledge of the time scale of cartilaginous fish evolution largely relies on estimates using mitochondrial DNA sequences. In this study, making the best use of the still partial but large-scale sequencing data of cartilaginous fish species, we estimate the divergence times between the major cartilaginous fish lineages employing nuclear genes. By rigorous orthology assessment based on available genomic and transcriptomic sequence resources for cartilaginous fishes, we selected 20 protein-coding genes in the nuclear genome, spanning 2973 amino acid residues. Our analysis based on Bayesian inference resulted in a mean divergence time of 421 Ma (late Silurian) for the Holocephali-Elasmobranchii split, and 306 Ma (late Carboniferous) for the split between sharks and rays/skates. By applying these results and other documented divergence times, we measured the relative evolutionary rate of the Hox A cluster sequences in the cartilaginous fish lineages, which showed a substitution rate lower by a factor of at least 2.4 in comparison with tetrapod lineages. The obtained time scale enables mapping phenotypic and molecular changes in a quantitative framework. It is of great interest to corroborate the less derived nature of cartilaginous fish at the molecular level as a genome-wide phenomenon. PMID:23825540
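The idea of comparing substitution rates once divergence times are known can be sketched with the simplest clock-like estimator: a pairwise distance d accumulated along two sister lineages over T million years gives a per-lineage rate of d / (2T). Only the 306 Ma split comes from the abstract; every distance, and the tetrapod divergence time, is a hypothetical placeholder, and the authors' tree-based procedure for the Hox A cluster may well differ.

```python
def per_lineage_rate(distance: float, divergence_ma: float) -> float:
    """Substitutions per site per million years on each of two sister lineages (d / 2T)."""
    if divergence_ma <= 0:
        raise ValueError("divergence time must be positive")
    return distance / (2.0 * divergence_ma)


# 306 Ma is the shark vs. ray/skate split from the abstract; the distance is hypothetical.
cartilaginous = per_lineage_rate(distance=0.306, divergence_ma=306.0)
# Both values for this tetrapod pair are hypothetical placeholders.
tetrapod = per_lineage_rate(distance=0.80, divergence_ma=330.0)

print(f"{cartilaginous:.6f}")               # 0.000500
print(f"{tetrapod / cartilaginous:.2f}")    # 2.42
```

Dividing by 2T rather than T reflects that substitutions accumulate independently on both branches since the split.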
Novel Positive-Sense, Single-Stranded RNA (+ssRNA) Virus with Di-Cistronic Genome from Intestinal Content of Freshwater Carp (Cyprinus carpio)

PubMed Central

Pankovics, Péter; Simmonds, Peter

2011-01-01

A novel positive-sense, single-stranded RNA (+ssRNA) virus (Halastavi árva RNA virus, HalV; JN000306) with a di-cistronic genome organization was serendipitously identified in the intestinal contents of freshwater carp (Cyprinus carpio) caught by line fishing from the fishpond “Lőrinte halastó” located in Veszprém County, Hungary. The complete nucleotide (nt) sequence of the genomic RNA is 9565 nt in length and contains two long, non-in-frame open reading frames (ORFs) separated by an intergenic region. ORF1 (replicase) is preceded by an untranslated sequence of 827 nt, while an untranslated region of 139 nt follows ORF2 (capsid proteins). The deduced amino acid (aa) sequences of the ORFs showed only low (less than 32%) and partial similarity to the non-structural (2C-like helicase, 3C-like cysteine protease and 3D-like RNA-dependent RNA polymerase) and structural proteins (VP2/VP4/VP3) of virus families in the Picornavirales, especially to members with dicistronic genomes. Halastavi árva RNA virus is present in the intestinal contents of omnivorous freshwater carp, but the origin and host species of this virus remain unknown. The unique viral sequence and its position indicate that Halastavi árva RNA virus seems to be the first member of a new group of di-cistronic ssRNA viruses. Further studies are required to investigate the specific host species (and spectrum), ecology and role of Halastavi árva RNA virus in nature. PMID:22195010
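The lengths reported for the HalV genome allow a quick check of how much of it lies between the two untranslated regions. The abstract does not give the individual ORF or intergenic-region lengths, so this sketch derives only their combined span.

```python
GENOME_NT = 9565   # complete genomic RNA length
UTR5_NT = 827      # untranslated sequence preceding ORF1
UTR3_NT = 139      # untranslated region following ORF2

# Everything between the two UTRs: ORF1 + intergenic region + ORF2.
internal_nt = GENOME_NT - UTR5_NT - UTR3_NT
untranslated_share = (UTR5_NT + UTR3_NT) / GENOME_NT

print(internal_nt)                   # 8599
print(f"{untranslated_share:.1%}")   # 10.1%
```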
Chloroplast DNA sequence of the green alga Oedogonium cardiacum (Chlorophyceae): Unique genome architecture, derived characters shared with the Chaetophorales and novel genes acquired through horizontal transfer

PubMed Central

Brouard, Jean-Simon; Otis, Christian; Lemieux, Claude; Turmel, Monique

2008-01-01

Background To gain insight into the branching order of the five main lineages currently recognized in the green algal class Chlorophyceae and to expand our understanding of chloroplast genome evolution, we have undertaken the sequencing of chloroplast DNA (cpDNA) from representative taxa. The complete cpDNA sequences previously reported for Chlamydomonas (Chlamydomonadales), Scenedesmus (Sphaeropleales), and Stigeoclonium (Chaetophorales) revealed tremendous variability in their architecture, the retention of only a few ancestral gene clusters, and derived clusters shared by Chlamydomonas and Scenedesmus. Unexpectedly, our recent phylogenies inferred from these cpDNAs and the partial sequences of three other chlorophycean cpDNAs disclosed two major clades, one uniting the Chlamydomonadales and Sphaeropleales (CS clade) and the other uniting the Oedogoniales, Chaetophorales and Chaetopeltidales (OCC clade). Although molecular signatures provided strong support for this dichotomy and for the branching of the Oedogoniales as the earliest-diverging lineage of the OCC clade, more data are required to validate these phylogenies. We describe here the complete cpDNA sequence of Oedogonium cardiacum (Oedogoniales). Results Like its three chlorophycean homologues, the 196,547-bp Oedogonium chloroplast genome displays a distinctive architecture. This genome is one of the most compact among photosynthetic chlorophytes. It has an atypical quadripartite structure, is intron-rich (17 group I and 4 group II introns), and displays 99 different conserved genes and four long open reading frames (ORFs), three of which are clustered in the spacious inverted repeat of 35,493 bp. Intriguingly, two of these ORFs (int and dpoB) revealed high similarities to genes not usually found in cpDNA. At the gene content and gene order levels, the Oedogonium genome most closely resembles its Stigeoclonium counterpart. 
Characters shared by these chlorophyceans but missing in members of the CS clade include the retention of psaM, rpl32 and trnL(caa), the loss of petA, the disruption of three ancestral clusters and the presence of five derived gene clusters. Conclusion The Oedogonium chloroplast genome disclosed additional characters that bolster the evidence for a close alliance between the Oedogoniales and Chaetophorales. Our unprecedented finding of int and dpoB in this cpDNA provides a clear example that novel genes were acquired by the chloroplast genome through horizontal transfers, possibly from a mitochondrial genome donor. PMID:18558012
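The reported sizes make it possible to estimate how the Oedogonium cpDNA is partitioned, assuming (as is the usual convention) that 35,493 bp is the length of one inverted-repeat copy. Because the genome is described as atypically quadripartite, the split between the two single-copy regions is not inferred here.

```python
GENOME_BP = 196_547          # complete Oedogonium cpDNA
INVERTED_REPEAT_BP = 35_493  # assumed to be one copy of the inverted repeat

# A quadripartite genome holds two IR copies plus two single-copy regions.
single_copy_total = GENOME_BP - 2 * INVERTED_REPEAT_BP
ir_share = 2 * INVERTED_REPEAT_BP / GENOME_BP

print(single_copy_total)   # 125561
print(f"{ir_share:.1%}")   # 36.1%
```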
Analysis of the Genome Structure of the Nonpathogenic Probiotic Escherichia coli Strain Nissle 1917

PubMed Central

Grozdanov, Lubomir; Raasch, Carsten; Schulze, Jürgen; Sonnenborn, Ulrich; Gottschalk, Gerhard; Hacker, Jörg; Dobrindt, Ulrich

2004-01-01

Nonpathogenic Escherichia coli strain Nissle 1917 (O6:K5:H1) is used as a probiotic agent in medicine, mainly for the treatment of various gastroenterological diseases. To gain genetic-level insight into its colonization and commensalism properties, this strain's genome structure has been analyzed by three approaches: (i) sequence context screening of tRNA genes as a potential indication of chromosomal integration of horizontally acquired DNA, (ii) sequence analysis of 280 kb of genomic islands (GEIs) coding for important fitness factors, and (iii) comparison of Nissle 1917 genome content with that of other E. coli strains by DNA-DNA hybridization. PCR-based screening of 324 nonpathogenic and pathogenic E. coli isolates of different origins revealed that some chromosomal regions are frequently detectable in nonpathogenic E. coli and also among extraintestinal and intestinal pathogenic strains. Many known fitness factor determinants of strain Nissle 1917 are localized on four GEIs, which have been partially sequenced and analyzed. Comparison of these data with the available knowledge of the genome structure of E. coli K-12 strain MG1655 and of uropathogenic E. coli O6 strains CFT073 and 536 revealed structural similarities on the genomic level, especially between the E. coli O6 strains. The lack of defined virulence factors (i.e., alpha-hemolysin, P-fimbrial adhesins, and the semirough lipopolysaccharide phenotype) combined with the expression of fitness factors such as microcins, different iron uptake systems, adhesins, and proteases, which may support its survival and successful colonization of the human gut, most likely contributes to the probiotic character of E. coli strain Nissle 1917. PMID:15292145
Comparative Genomics of the Dual-Obligate Symbionts from the Treehopper, Entylia carinata (Hemiptera: Membracidae), Provide Insight into the Origins and Evolution of an Ancient Symbiosis.

PubMed

Mao, Meng; Yang, Xiushuai; Poff, Kirsten; Bennett, Gordon

2017-06-01

Insect species in the Auchenorrhyncha suborder (Hemiptera) maintain ancient obligate symbioses with bacteria that provide essential amino acids (EAAs) deficient in their plant-sap diets. Molecular studies have revealed that two complementary symbiont lineages, "Candidatus Sulcia muelleri" and a betaproteobacterium ("Ca. Zinderia insecticola" in spittlebugs [Cercopoidea] and "Ca. Nasuia deltocephalinicola" in leafhoppers [Cicadellidae]), may have persisted in the suborder since its origin ∼300 Ma. However, investigation of how this pair has co-evolved at the genomic level is limited to only a few host lineages. We sequenced the complete genomes of Sulcia and a betaproteobacterium from the treehopper Entylia carinata (Membracidae: ENCA), the first representative from this species-rich group. This also offers the opportunity to compare symbiont evolution across a major insect group, the Membracoidea (leafhoppers + treehoppers). Genomic analyses show that the betaproteobacterium in ENCA is a member of the Nasuia lineage. Both symbionts have larger genomes (Sulcia = 218 kb and Nasuia = 144 kb) than related lineages in Deltocephalinae leafhoppers, retaining genes involved in basic cellular functions and information processing. Nasuia-ENCA further exhibits few unique gene losses, suggesting that its parent lineage in the common ancestor of the Membracoidea was already highly reduced. Sulcia-ENCA has lost the abilities to synthesize the menaquinone cofactor and to complete the synthesis of the branched-chain EAAs. Both capabilities are conserved in other Sulcia lineages sequenced from across the Auchenorrhyncha. Finally, metagenomic sequencing recovered the partial genome of an Arsenophonus symbiont, although it infects only 20% of individuals, indicating a facultative role. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. PMID:28854637
Widespread signatures of local mRNA folding structure selection in four Dengue virus serotypes

PubMed Central

2015-01-01

Background It is known that mRNA folding can affect and regulate various gene expression steps both in living organisms and in viruses. Previous studies have recognized functional RNA structures in the genome of the Dengue virus. However, these studies usually focused either on the viral untranslated regions or on very specific and limited regions at the beginning of the coding sequences, in a limited number of strains, and without considering evolutionary selection. Results Here we performed the first large-scale comprehensive genomics analysis of selection for local mRNA folding strength in the Dengue virus coding sequences, based on a total of 1,670 genomes and 4 serotypes. Our analysis identified clusters of positions along the coding regions that may undergo conserved evolutionary selection for strong or weak local folding maintained across different viral variants. Specifically, depending on the serotype, 53-66 clusters for strong folding and 49-73 clusters for weak folding were recognized, each aggregating positions with significantly conserved folding energy signals (related to partially overlapping local genomic regions). In addition, up to 7% of these positions were found to be conserved in more than 90% of the viral genomes. Although some of the identified positions undergo frequent synonymous/non-synonymous substitutions, the selection for folding strength therein is preserved and thus cannot be trivially explained by sequence conservation alone. Conclusions The fact that many of the positions with significant folding-related signals are conserved among different Dengue variants suggests that a better understanding of the mRNA structures in the corresponding regions may promote the development of prospective anti-Dengue vaccination strategies. The comparative genomics approach described here can be employed in the future for detecting functional regions in other pathogens with very high mutation rates. PMID:26449467
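The Dengue abstract reports clusters that aggregate positions with significant folding signals but does not state the clustering rule. A minimal sketch, assuming clusters are runs of flagged positions whose neighbours lie at most a hypothetical `max_gap` apart, looks like this; the flagged positions are invented for illustration.

```python
def cluster_positions(positions: list[int], max_gap: int = 1) -> list[list[int]]:
    """Group positions into clusters whose consecutive members are <= max_gap apart."""
    clusters: list[list[int]] = []
    for pos in sorted(set(positions)):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)   # extend the current cluster
        else:
            clusters.append([pos])     # start a new cluster
    return clusters


# Hypothetical positions flagged as having significantly conserved strong folding.
strong = [12, 13, 14, 40, 41, 97]
print(cluster_positions(strong))  # [[12, 13, 14], [40, 41], [97]]
```

Raising `max_gap` merges nearby runs into fewer, larger clusters, so reported cluster counts depend directly on this choice.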
G23D: Online tool for mapping and visualization of genomic variants on 3D protein structures.

PubMed

Solomon, Oz; Kunik, Vered; Simon, Amos; Kol, Nitzan; Barel, Ortal; Lev, Atar; Amariglio, Ninette; Somech, Raz; Rechavi, Gidi; Eyal, Eran

2016-08-26

Evaluation of the possible implications of genomic variants is an increasingly important task in the current high-throughput sequencing era. However, structural information is still not routinely exploited during this evaluation process. This is mainly due to the partial structural coverage of the human proteome and the lack of tools that conveniently convert genomic positions, which are the frequent output of genomic pipelines, to protein and structure coordinates. We present G23D, a tool for converting human genomic coordinates to protein coordinates and protein structures. G23D allows mapping of genomic positions/variants onto evolutionarily related (and not only identical) protein three-dimensional (3D) structures as well as onto theoretical models. By doing so, it significantly extends the space of variants for which structural insight is feasible. To facilitate interpretation of the variant consequence, pathogenic variants, functional sites and polymorphism sites are displayed on protein sequence and structure diagrams alongside the input variants. G23D also provides modeling of the mutant structure, analysis of intra-protein contacts and instant access to functional predictions and predictions of thermostability changes. G23D is available at http://www.sheba-cancer.org.il/G23D . G23D extends the fraction of variants for which structural analysis is applicable and provides better and faster access to structural data for biologists and geneticists who routinely work with genomic information.
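The first step that G23D automates, converting a genomic coordinate to a protein coordinate, can be illustrated with a minimal sketch. It assumes a plus-strand transcript whose CDS segments are given as 1-based inclusive genomic intervals; minus-strand genes, transcript selection and the structure-mapping step are omitted, and this is not G23D's own code. The exon coordinates are hypothetical.

```python
from typing import Optional


def genomic_to_protein(pos: int,
                       cds_exons: list[tuple[int, int]]) -> Optional[tuple[int, int]]:
    """Map a 1-based genomic position to (amino-acid number, position within codon).

    Assumes a plus-strand transcript whose CDS segments are 1-based, inclusive
    (start, end) genomic intervals. Returns None for positions outside the CDS.
    """
    offset = 0  # CDS bases contributed by earlier exons
    for start, end in sorted(cds_exons):
        if start <= pos <= end:
            cds_index = offset + (pos - start)  # 0-based index into the spliced CDS
            return cds_index // 3 + 1, cds_index % 3 + 1
        offset += end - start + 1
    return None


exons = [(100, 108), (200, 211)]  # hypothetical CDS segments (21 nt, 7 codons)
print(genomic_to_protein(200, exons))  # (4, 1)
```

The position within the codon matters for interpretation: a change at codon position 3 is more often synonymous than one at positions 1 or 2.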
Genomic characterization of Indian isolates of egg drop syndrome 1976 virus.

PubMed

Raj, G D; Sivakumar, S; Sudharsan, S; Mohan, A C; Nachimuthu, K

2001-02-01

Five Indian isolates of egg drop syndrome (EDS) 1976 virus and the reference strain 127 were compared by restriction enzyme analysis of the viral DNA and of the hexon gene amplified by polymerase chain reaction. Using these techniques, no differences were seen among these viruses. However, partial sequencing of the hexon gene revealed major differences (4.6%) in one of the isolates sequenced, EDS Kerala. Phylogenetic analysis also placed this isolate in a different lineage from the other isolates. The need for constant monitoring of the genetic nature of field isolates of EDS viruses is emphasized.
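A divergence figure such as the 4.6% reported for the EDS Kerala hexon sequence is typically a proportion of differing sites. The abstract does not state whether it was computed on nucleotides or amino acids, so this sketch simply computes an uncorrected nucleotide p-distance between two already aligned sequences, ignoring gaps and ambiguity codes. The sequences are hypothetical.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two pre-aligned nucleotide sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only sites where both sequences carry an unambiguous base.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


print(p_distance("ACGTACGTAC", "ACGTACGTAA"))  # 0.1
```

On this measure, 4.6% would correspond to about 46 differences per 1,000 compared sites; model-based corrections (e.g. for multiple hits) give slightly larger values.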
Easy and accurate reconstruction of whole HIV genomes from short-read sequence data with shiver

PubMed Central

Blanquart, François; Golubchik, Tanya; Gall, Astrid; Bakker, Margreet; Bezemer, Daniela; Croucher, Nicholas J; Hall, Matthew; Hillebregt, Mariska; Ratmann, Oliver; Albert, Jan; Bannert, Norbert; Fellay, Jacques; Fransen, Katrien; Gourlay, Annabelle; Grabowski, M Kate; Gunsenheimer-Bartmeyer, Barbara; Günthard, Huldrych F; Kivelä, Pia; Kouyos, Roger; Laeyendecker, Oliver; Liitsola, Kirsi; Meyer, Laurence; Porter, Kholoud; Ristola, Matti; van Sighem, Ard; Cornelissen, Marion; Kellam, Paul; Reiss, Peter

2018-01-01

Abstract Studying the evolution of viruses and their molecular epidemiology relies on accurate viral sequence data, so that small differences between similar viruses can be meaningfully interpreted. Despite its higher throughput and more detailed minority variant data, next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of large between- and within-host diversity, including frequent indels, may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to a biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by aligning the reads to themselves, producing a set of sequences called contigs. However, contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems we developed the tool shiver to pre-process reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with the user’s choice of existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We used shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read whole-genome data produced with the Illumina platform, for sixty-five existing publicly available samples and fifty new samples. We show the systematic superiority of mapping to shiver’s constructed reference compared with mapping the same reads to the closest of 3,249 real references: median values of 13 bases called differently and more accurately, 0 bases called differently and less accurately, and 205 bases of missing sequence recovered. 
We also successfully applied shiver to whole-genome samples of Hepatitis C Virus and Respiratory Syncytial Virus. shiver is publicly available from https://github.com/ChrisHIV/shiver. PMID:29876136
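The consensus-calling step that follows mapping can be illustrated with a generic majority-rule sketch over per-position base counts. This is not shiver's implementation: shiver applies its own coverage and frequency rules, and the `min_coverage` and `min_fraction` defaults here are hypothetical.

```python
from collections import Counter


def call_consensus(pileup: list[Counter], min_coverage: int = 3,
                   min_fraction: float = 0.5) -> str:
    """Majority-rule consensus from per-position base counts of mapped reads."""
    calls = []
    for counts in pileup:
        depth = sum(counts.values())
        if depth < min_coverage:
            calls.append("N")  # too few reads to call this position
            continue
        base, n = counts.most_common(1)[0]
        # Require a strict majority; otherwise leave the position ambiguous.
        calls.append(base if n / depth > min_fraction else "N")
    return "".join(calls)


# Hypothetical read bases observed at four consecutive reference positions.
pileup = [Counter("AAAA"), Counter("CCG"), Counter("TT"), Counter("GGAA")]
print(call_consensus(pileup))  # ACNN
```

The third position is left as N for low coverage and the fourth for a tied minority variant, which is exactly the kind of information a consensus alone discards and minority-variant output preserves.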
DNA remodelling by Strict Partial Endoreplication in orchids, an original process in the plant kingdom.

PubMed

Brown, Spencer C; Bourge, Mickaël; Maunoury, Nicolas; Wong, Maurice; Bianchi, Michele Wolfe; Lepers-Andrzejewski, Sandra; Besse, Pascale; Siljak-Yakovlev, Sonja; Dron, Michel; Satiat-Jeunemaître, Béatrice

2017-04-13

DNA remodelling during endoreplication appears to be a strong developmental characteristic in orchids. In this study, we analysed DNA content and nuclei in 41 species of orchids to further map genome evolution in this plant family. We demonstrate that the DNA remodelling observed in 36 out of 41 orchids studied corresponds to strict partial endoreplication. This process is developmentally regulated in each wild species studied. Cytometry data analyses allowed us to propose a model in which nuclear states 2C, 4E, 8E, etc. form a series comprising a fixed proportion, the euploid genome 2C, plus 2 to 32 additional copies of a complementary part of the genome. The fixed proportion ranged from 89% of the genome in Vanilla mexicana down to 19% in V. pompona, the lowest value for all 148 orchids reported. Interspecific hybridisation did not suppress this phenomenon. Interestingly, this process was not observed in mass-produced epiphytes. Nucleolar volumes grow with the number of endocopies present, consistent with high transcriptional activity in endoreplicated nuclei. Our analyses suggest species-specific chromatin rearrangement. Towards understanding endoreplication, V. planifolia constitutes a tractable system for isolating the genomic sequences that confer an advantage via endoreplication from those that apparently suffice at the diploid level. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
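One reading of the strict partial endoreplication model is that a fixed fraction F of the genome stays at the 2C level while the complementary fraction (1 - F) is multiplied by m = 1, 2, 4, 8, … in the 2C, 4E, 8E, … states. Under that interpretation (an assumption, not the authors' stated formula), relative DNA content is F + m(1 - F), and F can be recovered from a cytometry peak ratio. The F values are those reported for two Vanilla species.

```python
def relative_dna_content(fixed_fraction: float, multiplier: int) -> float:
    """Nuclear DNA content relative to 2C if only the non-fixed part is endoreplicated.

    multiplier is 1 for 2C, 2 for 4E, 4 for 8E, and so on.
    """
    return fixed_fraction + multiplier * (1.0 - fixed_fraction)


def fixed_fraction_from_peak(ratio_to_2c: float, multiplier: int) -> float:
    """Invert the model: estimate the fixed fraction from a peak's ratio to the 2C peak."""
    if multiplier < 2:
        raise ValueError("need an endoreplicated state (multiplier >= 2)")
    return (multiplier - ratio_to_2c) / (multiplier - 1)


# Fixed proportions reported in the abstract for two Vanilla species.
for species, f in [("V. mexicana", 0.89), ("V. pompona", 0.19)]:
    print(species, [round(relative_dna_content(f, m), 2) for m in (1, 2, 4, 8)])
```

Under this reading, a species with a high fixed proportion such as V. mexicana shows endoreplicated peaks well below the 4C, 8C, … positions expected from conventional endoreplication.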
Viruses in diarrhoeic dogs include novel kobuviruses and sapoviruses.

PubMed

Li, Linlin; Pesavento, Patricia A; Shan, Tongling; Leutenegger, Christian M; Wang, Chunlin; Delwart, Eric

2011-11-01

The close interactions of dogs with humans and surrounding wildlife provide frequent opportunities for cross-species virus transmissions. In order to initiate an unbiased characterization of the eukaryotic viruses in the gut of dogs, this study used deep sequencing of partially purified viral capsid-protected nucleic acids from the faeces of 18 diarrhoeic dogs. Known canine parvoviruses, coronaviruses and rotaviruses were identified, and the genomes of the first reported canine kobuvirus and sapovirus were characterized. Canine kobuvirus, the first sequenced canine picornavirus and the closest genetic relative of the diarrhoea-causing human Aichi virus, was detected at high frequency in the faeces of both healthy and diarrhoeic dogs. Canine sapovirus constituted a novel genogroup within the genus Sapovirus, a group of viruses also associated with human and animal diarrhoea. These results highlight the high frequency of new virus detection possible even in extensively studied animal species using metagenomics approaches, and provide viral genomes for further disease-association studies.
Phylogenetic and microsatellite markers for Tulasnella (Tulasnellaceae) mycorrhizal fungi associated with Australian orchids.

PubMed

Ruibal, Monica P; Peakall, Rod; Smith, Leon M; Linde, Celeste C

2013-03-01

Phylogenetic and microsatellite markers were developed for Tulasnella mycorrhizal fungi to investigate fungal species identity and diversity. These markers will be useful in future studies investigating the phylogenetic relationship of the fungal symbionts, specificity of orchid-mycorrhizal associations, and the role of mycorrhizae in orchid speciation within several orchid genera. We generated partial genome sequences of two Tulasnella symbionts originating from Chiloglottis and Drakaea orchid species with 454 genome sequencing. Cross-genus transferability across mycorrhizal symbionts associated with multiple genera of Australian orchids (Arthrochilus, Chiloglottis, Drakaea, and Paracaleana) was found for seven phylogenetic loci. Five loci showed cross-transferability to Tulasnella from other orchid genera, and two to Sebacina. Furthermore, 11 polymorphic microsatellite loci were developed for Tulasnella from Chiloglottis. Highly informative markers were obtained, allowing investigation of mycorrhizal diversity of Tulasnellaceae associated with a wide variety of terrestrial orchids in Australia and potentially worldwide.
Distribution and Genetic Diversity of Bacteriocin Gene Clusters in Rumen Microbial Genomes.

PubMed

Azevedo, Analice C; Bento, Cláudia B P; Ruiz, Jeronimo C; Queiroz, Marisa V; Mantovani, Hilário C

2015-10-01

Some species of ruminal bacteria are known to produce antimicrobial peptides, but the screening procedures have mostly been based on in vitro assays using standardized methods. Recent sequencing efforts have made available the genome sequences of hundreds of ruminal microorganisms. In this work, we performed genome mining of the complete and partial genome sequences of 224 ruminal bacteria and 5 ruminal archaea to determine the distribution and diversity of bacteriocin gene clusters. A total of 46 bacteriocin gene clusters were identified in 33 strains of ruminal bacteria. Twenty gene clusters were related to lanthipeptide biosynthesis, while 11 gene clusters were associated with sactipeptide production, 7 gene clusters were associated with class II bacteriocin production, and 8 gene clusters were associated with class III bacteriocin production. The frequency of strains whose genomes encode putative antimicrobial peptide precursors was 14.4%. Clusters related to the production of sactipeptides were identified for the first time among ruminal bacteria. BLAST analysis indicated that the majority of the gene clusters (88%) encoding putative lanthipeptides contained all the essential genes required for lanthipeptide biosynthesis. Most strains of Streptococcus (66.6%) harbored complete lanthipeptide gene clusters, in addition to an open reading frame encoding a putative class II bacteriocin. Albusin B-like proteins were found in 100% of the Ruminococcus albus strains screened in this study. The in silico analysis provided evidence of novel biosynthetic gene clusters in bacterial species not previously related to bacteriocin production, suggesting that the rumen microbiota represents an underexplored source of antimicrobial peptides. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
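The cluster counts in the bacteriocin abstract can be checked with simple arithmetic: the per-class counts sum to the reported 46 clusters, and the 14.4% strain frequency appears consistent with 33 of all 229 screened genomes (224 bacteria plus 5 archaea) rather than 33 of 224. The abstract does not state the denominator explicitly, so this is an inference.

```python
# Cluster counts by class, as reported in the abstract.
clusters = {"lanthipeptide": 20, "sactipeptide": 11, "class II": 7, "class III": 8}
total_clusters = sum(clusters.values())

strains_with_clusters = 33
genomes_screened = 224 + 5  # bacteria + archaea

print(total_clusters)                                      # 46
print(f"{strains_with_clusters / genomes_screened:.1%}")   # 14.4%
print(f"{strains_with_clusters / 224:.1%}")                # 14.7% (bacterial genomes only)
```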
Genome analysis of Mycoplasma synoviae strain MS-H, the most common M. synoviae strain with a worldwide distribution.

PubMed

Zhu, Ling; Shahid, Muhammad A; Markham, John; Browning, Glenn F; Noormohammadi, Amir H; Marenda, Marc S

2018-02-02

The bacterial pathogen Mycoplasma synoviae can cause subclinical respiratory disease, synovitis, airsacculitis and reproductive tract disease in poultry and is a major cause of economic loss worldwide. The M. synoviae strain MS-H was developed by chemical mutagenesis of an Australian isolate and has been used as a live attenuated vaccine in many countries over the past two decades. As a result it may now be the most prevalent strain of M. synoviae globally. Differentiation of the MS-H vaccine from local field strains is important for epidemiological investigations and is often required for registration of the vaccine. The complete genomic sequence of the MS-H strain was determined using a combination of Illumina and Nanopore methods and compared to WVU-1853, the M. synoviae type strain isolated in the USA 30 years before the parent strain of MS-H, and MS53, a more recent isolate from Brazil. The vaccine strain genome had a slightly larger number of pseudogenes than the two other strains and contained a unique 55 kb chromosomal inversion partially affecting a putative genomic island. Variations in gene content were also noted, including a deoxyribose-phosphate aldolase (deoC) fragment and an ATP-dependent DNA helicase gene found only in MS-H. Some of these sequences may have been acquired horizontally from other avian mycoplasma species. MS-H was somewhat more similar to WVU-1853 than to MS53. The genome sequence of MS-H will enable identification of vaccine-specific genetic markers for use as diagnostic and epidemiological tools to better control M. synoviae.
Analysis of complete mitochondrial genome sequences increases phylogenetic resolution of bears (Ursidae), a mammalian family that experienced rapid speciation.

PubMed

Yu, Li; Li, Yi-Wei; Ryder, Oliver A; Zhang, Ya-Ping

2007-10-24

Despite the small number of ursid species, bear phylogeny has long been a focus of study because of their conservation value, as all bear genera have been classified as endangered at either the species or subspecies level. The Ursidae family represents a typical example of rapid evolutionary radiation. Previous analyses with a single mitochondrial (mt) gene or a small number of mt genes either provide weak support or yield a large unresolved polytomy for ursids. We revisit the contentious relationships within Ursidae by analyzing complete mt genome sequences and evaluating the performance of both entire mt genomes and constituent mtDNA genes in recovering a phylogeny of extremely recent speciation events. This mitochondrial genome-based phylogeny provides strong evidence that the spectacled bear diverged first, while within the genus Ursus, the sloth bear is the sister taxon to all of the other five ursines. The latter group is divided into the brown bear/polar bear and the two black bears/sun bear assemblages. These findings resolve the previous conflicts between trees based on partial mt genes. The ability of different categories of mt protein-coding genes to recover the correct phylogeny is concordant with previous analyses of taxa with deep divergence times. This study provides a robust Ursidae phylogenetic framework for future validation by additional independent evidence, and also has significant implications for resolving other similarly difficult phylogenetic investigations. Identification of base composition bias and use of the combined data of whole mitochondrial genome sequences have allowed recovery of a strongly supported phylogeny that is upheld when using multiple alternative outgroups for the Ursidae, a mammalian family that has undergone a rapid radiation since the mid- to late Pliocene. It remains to be seen whether the reliability of mt genome analysis will hold up in studies of other difficult phylogenetic issues. Although the phylogeny based on whole mitochondrial DNA sequences is robust, it remains in conflict with phylogenetic relationships suggested by analysis of limited nuclear-encoded data, a situation that will require gathering more nuclear DNA sequence information.
Analysis of complete mitochondrial genome sequences increases phylogenetic resolution of bears (Ursidae), a mammalian family that experienced rapid speciation

PubMed Central

Yu, Li; Li, Yi-Wei; Ryder, Oliver A; Zhang, Ya-Ping

2007-01-01

Background: Despite the small number of ursid species, bear phylogeny has long been a focus of study because of their conservation value, as all bear genera have been classified as endangered at either the species or subspecies level. The Ursidae family represents a typical example of rapid evolutionary radiation. Previous analyses with a single mitochondrial (mt) gene or a small number of mt genes either provide weak support or yield a large unresolved polytomy for ursids. We revisit the contentious relationships within Ursidae by analyzing complete mt genome sequences and evaluating the performance of both entire mt genomes and constituent mtDNA genes in recovering a phylogeny of extremely recent speciation events. Results: This mitochondrial genome-based phylogeny provides strong evidence that the spectacled bear diverged first, while within the genus Ursus, the sloth bear is the sister taxon to all of the other five ursines. The latter group is divided into the brown bear/polar bear and the two black bears/sun bear assemblages. These findings resolve the previous conflicts between trees based on partial mt genes. The ability of different categories of mt protein-coding genes to recover the correct phylogeny is concordant with previous analyses of taxa with deep divergence times. This study provides a robust Ursidae phylogenetic framework for future validation by additional independent evidence, and also has significant implications for resolving other similarly difficult phylogenetic investigations. Conclusion: Identification of base composition bias and use of the combined data of whole mitochondrial genome sequences have allowed recovery of a strongly supported phylogeny that is upheld when using multiple alternative outgroups for the Ursidae, a mammalian family that has undergone a rapid radiation since the mid- to late Pliocene. It remains to be seen whether the reliability of mt genome analysis will hold up in studies of other difficult phylogenetic issues. Although the phylogeny based on whole mitochondrial DNA sequences is robust, it remains in conflict with phylogenetic relationships suggested by analysis of limited nuclear-encoded data, a situation that will require gathering more nuclear DNA sequence information. PMID:17956639
Influenza C in Lancaster, UK, in the winter of 2014–2015

PubMed Central

Atkinson, Kate V.; Bishop, Lisa A.; Rhodes, Glenn; Salez, Nicolas; McEwan, Neil R.; Hegarty, Matthew J.; Robey, Julie; Harding, Nicola; Wetherell, Simon; Lauder, Robert M.; Pickup, Roger W.; Wilkinson, Mark; Gatherer, Derek

2017-01-01

Influenza C is not included in the annual seasonal influenza vaccine and has historically been regarded as a minor respiratory pathogen. However, recent work has highlighted its potential role as a cause of pneumonia in infants. We performed nasopharyngeal or nasal swabbing and/or serum sampling (n = 148) in Lancaster, UK, over the winter of 2014–2015. Using enzyme-linked immunosorbent assay (ELISA), we obtained a seropositivity of 77%. By contrast, only 2 individuals, both asymptomatic adults, were influenza C-positive by polymerase chain reaction (PCR). Deep sequencing of nasopharyngeal samples produced partial sequences for 4 genome segments in one of these patients. Bayesian phylogenetic analysis demonstrated that the influenza C genome from this individual is evolutionarily distant from those sampled in recent years and represents a novel genome constellation, indicating that it may be a product of a decades-old reassortment event. Although we find no evidence that influenza C was a significant respiratory pathogen during the winter of 2014–2015 in Lancaster, we confirm previous observations of seropositivity in the majority of the population. PMID:28406194
Pressure for Pattern-Specific Intertypic Recombination between Sabin Polioviruses: Evolutionary Implications.

PubMed

Korotkova, Ekaterina; Laassri, Majid; Zagorodnyaya, Tatiana; Petrovskaya, Svetlana; Rodionova, Elvira; Cherkasova, Elena; Gmyl, Anatoly; Ivanova, Olga E; Eremeeva, Tatyana P; Lipskaya, Galina Y; Agol, Vadim I; Chumakov, Konstantin

2017-11-22

Complete genomic sequences of a non-redundant set of 70 recombinants between the three serotypes of attenuated Sabin polioviruses, as well as the locations (based on partial sequencing) of crossover sites in 28 additional recombinants, were determined and compared with previously published data. It is demonstrated that the genomes of Sabin viruses contain distinct strain-specific segments that are eliminated by recombination. The presumed low fitness of these segments could be linked to mutations acquired upon derivation of the vaccine strains and/or may already have been present in the wild-type parents of Sabin viruses. These "weak" segments contribute to the propensity of these viruses to recombine with each other and with other enteroviruses, and they also determine the choice of crossover sites. Knowledge of the location of such segments opens additional possibilities for the design of more genetically stable and/or more attenuated variants, i.e., candidates for new oral polio vaccines. The results also suggest that the genomes of wild polioviruses, and, by generalization, of other RNA viruses, may harbor hidden low-fitness segments that can be readily eliminated only by recombination.
The RNA 5 of Prunus necrotic ringspot virus is a biologically inactive copy of the 3'-UTR of the genomic RNA 3.

PubMed

Di Terlizzi, B; Skrzeczkowski, L J; Mink, G I; Scott, S W; Zimmerman, M T

2001-01-01

In addition to the four RNAs known to be encapsidated by Prunus necrotic ringspot virus (PNRSV) and Apple mosaic virus (ApMV), an additional small RNA (RNA 5) was present in purified preparations of several isolates of both viruses. RNA 5 was always produced following infection of a susceptible host by an artificial mixture of RNAs 1, 2, 3, and 4, indicating that it was a product of viral replication. RNA 5 does not activate the infectivity of mixtures that contain the three genomic RNAs (RNA 1 + RNA 2 + RNA 3), nor does it appear to modify symptom expression. Results from hybridization studies suggested that RNA 5 had partial sequence homology with RNAs 1, 2, 3, and 4. Cloning and sequencing of RNA 5 of PNRSV isolate CH 57/1-M, and of the 3' termini of RNA 1, RNA 2, and RNA 3 of this isolate, indicated that RNA 5 was a copy of the 3' untranslated terminal region (3'-UTR) of genomic RNA 3.
Genome-Based Comparison of Clostridioides difficile: Average Amino Acid Identity Analysis of Core Genomes.

PubMed

Cabal, Adriana; Jun, Se-Ran; Jenjaroenpun, Piroon; Wanchai, Visanu; Nookaew, Intawat; Wongsurawat, Thidathip; Burgess, Mary J; Kothari, Atul; Wassenaar, Trudy M; Ussery, David W

2018-02-14

Infections due to Clostridioides difficile (previously known as Clostridium difficile) are a major problem in hospitals, where cases can be caused by community-acquired strains as well as by nosocomial spread. Whole-genome sequences from clinical samples contain a wealth of information, but it needs to be analyzed and compared in such a way that the outcome is useful for clinicians or epidemiologists. Here, we compare 663 publicly available complete genome sequences of C. difficile using average amino acid identity (AAI) scores. This analysis revealed that most of these genomes (640, 96.5%) clearly belong to the same species, while the remaining 23 genomes form four distinct clusters within the Clostridioides genus. The main C. difficile cluster can be further divided into sub-clusters, depending on the chosen cutoff. We demonstrate that MLST, based on either partial or full-length genes, results in biased estimates of genetic differences and does not capture the true degree of similarity or difference between complete genomes. The presence of genes coding for C. difficile toxins A and B (ToxA/B), as well as the binary C. difficile toxin (CDT), was deduced from their unique PfamA domain architectures. Out of the 663 C. difficile genomes, 535 (80.7%) contained at least one copy of ToxA or ToxB, while these genes were missing from 128 genomes. Although some clusters were enriched for toxin presence, these genes are variably present in a given genetic background. The CDT genes were found in 191 genomes, which were restricted to a few clusters only, and only one cluster consistently lacked the toxin A/B genes. A total of 310 genomes (47%) contained ToxA/B without CDT. Further, published metagenomic data from stools were used to assess the presence of C. difficile sequences in blinded cases of C. difficile infection (CDI) and controls, to test whether metagenomic analysis is sensitive enough to detect the pathogen, and to establish strain relationships between cases from the same hospital. We conclude that metagenomics can contribute to the identification of CDI and can assist in characterizing the most probable causative strain in CDI patients.
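The AAI comparison described in the record above can be illustrated with a minimal Python sketch. It assumes that orthologous protein pairs shared by two genomes have already been identified (AAI pipelines commonly use reciprocal best hits) and globally aligned, with `-` as the gap character. The function names, the toy alignments, and the 95% cutoff are hypothetical illustrations, not the authors' pipeline or thresholds.

```python
from statistics import mean


def pair_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity over alignment columns where neither protein has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def average_amino_acid_identity(aligned_pairs):
    """AAI: mean percent identity over all aligned orthologous protein pairs."""
    return mean(pair_identity(a, b) for a, b in aligned_pairs)


# Hypothetical toy alignments of three orthologous proteins shared by two genomes.
pairs = [
    ("MKTAYIAKQR", "MKTAYIAKQR"),  # identical
    ("MKV-LLAG", "MKVALLSG"),      # 6 of 7 gap-free columns match
    ("ACDEFGHIKL", "ACDEFGHIKV"),  # 9 of 10 columns match
]
aai = average_amino_acid_identity(pairs)

SAME_SPECIES_CUTOFF = 95.0  # hypothetical cutoff; cluster membership depends on this choice
print(f"AAI = {aai:.2f}%, same species: {aai >= SAME_SPECIES_CUTOFF}")
```

As the abstract notes for sub-clusters, the grouping obtained from such scores shifts with the cutoff, which is why the threshold is kept as a named parameter here.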
Fast and accurate de novo genome assembly from long uncorrected reads

PubMed Central

Vaser, Robert; Sović, Ivan; Nagarajan, Niranjan

2017-01-01

The assembly of long reads from Pacific Biosciences and Oxford Nanopore Technologies typically requires resource-intensive error-correction and consensus-generation steps to obtain high-quality assemblies. We show that the error-correction step can be omitted and that high-quality consensus sequences can be generated efficiently with a SIMD-accelerated, partial-order alignment–based, stand-alone consensus module called Racon. Based on tests with PacBio and Oxford Nanopore data sets, we show that Racon coupled with miniasm enables consensus genomes with similar or better quality than state-of-the-art methods while being an order of magnitude faster. PMID:28100585
Enhancer Evolution across 20 Mammalian Species

PubMed Central

Villar, Diego; Berthelot, Camille; Aldridge, Sarah; Rayner, Tim F.; Lukk, Margus; Pignatelli, Miguel; Park, Thomas J.; Deaville, Robert; Erichsen, Jonathan T.; Jasinska, Anna J.; Turner, James M.A.; Bertelsen, Mads F.; Murchison, Elizabeth P.; Flicek, Paul; Odom, Duncan T.

2015-01-01

Summary: The mammalian radiation has corresponded with rapid changes in noncoding regions of the genome, but we lack a comprehensive understanding of regulatory evolution in mammals. Here, we track the evolution of promoters and enhancers active in liver across 20 mammalian species from six diverse orders by profiling genomic enrichment of H3K27 acetylation and H3K4 trimethylation. We report that rapid evolution of enhancers is a universal feature of mammalian genomes. Most of the recently evolved enhancers arise from ancestral DNA exaptation, rather than lineage-specific expansions of repeat elements. In contrast, almost all liver promoters are partially or fully conserved across these species. Our data further reveal that recently evolved enhancers can be associated with genes under positive selection, demonstrating the power of this approach for annotating regulatory adaptations in genomic sequences. These results provide important insight into the functional genetics underpinning mammalian regulatory evolution. PMID:25635462
Assessment of Recombination in the S-segment Genome of Crimean-Congo Hemorrhagic Fever Virus in Iran

PubMed Central

Chinikar, Sadegh; Shah-Hosseini, Nariman; Bouzari, Saeid; Shokrgozar, Mohammad Ali; Mostafavi, Ehsan; Jalali, Tahmineh; Khakifirouz, Sahar; Groschup, Martin H; Niedrig, Matthias

2016-01-01

Background: Crimean-Congo Hemorrhagic Fever Virus (CCHFV) belongs to the genus Nairovirus and the family Bunyaviridae. The main aim of this study was to investigate the extent of recombination in the S-segment genome of CCHFV in Iran. Methods: Samples isolated from Iranian patients, together with sequences available in GenBank, were analyzed by phylogenetic and bootscan methods. Results: Comparison of phylogenetic trees based on full-length sequences and on partial fragments of the S-segment genome of CCHFV revealed a genetic switch attributable to a recombination event. Moreover, bootscan analysis in SimPlot software detected evidence of multiple recombination events in the query isolates. Conclusion: The exchange of genomic regions between different strains by recombination could contribute to CCHFV diversification and evolution. The occurrence of recombination in CCHFV has a critical impact on epidemiological investigations and vaccine design. PMID:27047968
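Bootscan analysis builds a phylogenetic tree in each sliding window. A simpler relative, the sliding-window similarity plot, shows the same underlying idea: a candidate crossover is flagged where the query's closest reference changes along the alignment. The Python sketch below implements that similarity-based approach, not bootscanning itself. It assumes pre-aligned sequences of equal length; the window size, step, reference names (`strain_A`, `strain_B`) and toy sequences are all hypothetical.

```python
def window_identity(query: str, ref: str, start: int, size: int) -> float:
    """Fraction of identical gap-free positions between query and ref in one window."""
    q = query[start:start + size]
    r = ref[start:start + size]
    cols = [(a, b) for a, b in zip(q, r) if a != "-" and b != "-"]
    if not cols:
        return 0.0
    return sum(a == b for a, b in cols) / len(cols)


def closest_reference_by_window(query, refs, size=8, step=4):
    """Return (window_start, name of most similar reference) for each window."""
    length = len(query)
    if any(len(seq) != length for seq in refs.values()):
        raise ValueError("all sequences must be aligned to the same length")
    calls = []
    for start in range(0, length - size + 1, step):
        scores = {name: window_identity(query, seq, start, size)
                  for name, seq in refs.items()}
        calls.append((start, max(scores, key=scores.get)))
    return calls


def switch_points(calls):
    """Window starts at which the closest reference changes (candidate crossovers)."""
    return [calls[i][0] for i in range(1, len(calls))
            if calls[i][1] != calls[i - 1][1]]


# Hypothetical parental references that differ at every aligned position.
refs = {"strain_A": "ACGT" * 8, "strain_B": "TGCA" * 8}
# Hypothetical recombinant: strain_A up to position 18, strain_B afterwards.
query = refs["strain_A"][:18] + refs["strain_B"][18:]

calls = closest_reference_by_window(query, refs)
print(calls)
print("candidate crossover windows start at:", switch_points(calls))
```

The flagged window (starting at position 16) brackets the true breakpoint at position 18. Window resolution is the main trade-off: smaller windows localize crossovers more precisely but give noisier identity estimates.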
The earliest maize from San Marcos Tehuacán is a partial domesticate with genomic evidence of inbreeding

PubMed Central

Vallebueno-Estrada, Miguel; Rodríguez-Arévalo, Isaac; Rougon-Cardoso, Alejandra; Martínez González, Javier; García Cook, Angel; Vielle-Calzada, Jean-Philippe

2016-01-01

Pioneering archaeological expeditions led by Richard MacNeish in the 1960s identified the valley of Tehuacán as an important center of early Mesoamerican agriculture, providing by far the widest collection of ancient crop remains, including maize. In 2012, a new exploration of San Marcos cave (Tehuacán, Mexico) yielded nonmanipulated maize specimens dating to a similar age of 5,300–4,970 calibrated y B.P. On the basis of shotgun sequencing and genomic comparisons to Balsas teosinte and modern maize, we show herein that the earliest maize from San Marcos cave was a partial domesticate diverging from the landraces and containing ancestral allelic variants that are absent from extant maize populations. Whereas some domestication loci, such as teosinte branched1 (tb1) and brittle endosperm2 (bt2), had already lost most of the nucleotide variability present in Balsas teosinte, others, such as teosinte glume architecture1 (tga1) and sugary1 (su1), conserved partial levels of nucleotide variability that are absent from extant maize. Genetic comparisons among three temporally convergent samples revealed that they were homozygous and identical by descent across their genome. Our results indicate that the earliest maize from San Marcos was already inbred, opening the possibility for Tehuacán maize cultivation evolving from reduced founder populations of isolated and perhaps self-pollinated individuals. PMID:27872313

The earliest maize from San Marcos Tehuacán is a partial domesticate with genomic evidence of inbreeding.

PubMed

Vallebueno-Estrada, Miguel; Rodríguez-Arévalo, Isaac; Rougon-Cardoso, Alejandra; Martínez González, Javier; García Cook, Angel; Montiel, Rafael; Vielle-Calzada, Jean-Philippe

2016-12-06

Pioneering archaeological expeditions led by Richard MacNeish in the 1960s identified the valley of Tehuacán as an important center of early Mesoamerican agriculture, providing by far the widest collection of ancient crop remains, including maize. In 2012, a new exploration of San Marcos cave (Tehuacán, Mexico) yielded nonmanipulated maize specimens dating to a similar age of 5,300-4,970 calibrated y B.P. On the basis of shotgun sequencing and genomic comparisons to Balsas teosinte and modern maize, we show herein that the earliest maize from San Marcos cave was a partial domesticate diverging from the landraces and containing ancestral allelic variants that are absent from extant maize populations. Whereas some domestication loci, such as teosinte branched1 (tb1) and brittle endosperm2 (bt2), had already lost most of the nucleotide variability present in Balsas teosinte, others, such as teosinte glume architecture1 (tga1) and sugary1 (su1), conserved partial levels of nucleotide variability that are absent from extant maize. Genetic comparisons among three temporally convergent samples revealed that they were homozygous and identical by descent across their genome. Our results indicate that the earliest maize from San Marcos was already inbred, opening the possibility for Tehuacán maize cultivation evolving from reduced founder populations of isolated and perhaps self-pollinated individuals.
REDIdb: the RNA editing database.

PubMed

Picardi, Ernesto; Regina, Teresa Maria Rosaria; Brennicke, Axel; Quagliariello, Carla

2007-01-01

The RNA Editing Database (REDIdb) is an interactive, web-based database designed to catalogue RNA editing events such as substitutions, insertions and deletions occurring in a wide range of organisms. The database contains both fully and partially sequenced DNA molecules for which editing information is available either from experimental inspection (in vitro) or from computational detection (in silico). Each REDIdb record is organized as a specific flat-file containing a description of the main characteristics of the entry, a feature table with the editing events and related details, and a sequence zone with both the genomic sequence and the corresponding edited transcript. REDIdb is a relational database in which browsing and identification of editing sites have been simplified by means of two facilities that either graphically display genomic or cDNA sequences or show the corresponding alignment. In both cases, all editing sites are highlighted in colour and their relative positions are shown on mouse-over. New editing positions can be submitted directly to REDIdb after user-specific registration to obtain authorized secure access. This first version of the REDIdb database stores 9964 editing events and can be freely queried at http://biologia.unical.it/py_script/search.html.
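A REDIdb record, as described above, pairs a genomic sequence with its edited transcript and a feature table of substitution, insertion and deletion events. The Python sketch below models that relationship by deriving the edited sequence from the genomic sequence plus a list of events. The `EditingEvent` fields, the 0-based coordinates, the DNA-alphabet toy sequence and the right-to-left application order are hypothetical design choices for illustration; they do not reproduce REDIdb's actual flat-file format.

```python
from dataclasses import dataclass
from typing import List, Literal


@dataclass
class EditingEvent:
    kind: Literal["substitution", "insertion", "deletion"]
    position: int   # 0-based coordinate on the genomic sequence (assumed convention)
    ref: str = ""   # genomic bases affected (substitution or deletion)
    alt: str = ""   # bases found in the edited transcript (substitution or insertion)


def apply_edits(genomic: str, events: List[EditingEvent]) -> str:
    """Derive the edited sequence from the genomic sequence and its editing events.

    Events are applied from the highest coordinate to the lowest, so insertions
    and deletions never shift the coordinates of events that are still pending.
    """
    seq = genomic
    for ev in sorted(events, key=lambda e: e.position, reverse=True):
        if ev.kind in ("substitution", "deletion"):
            # Check that the recorded genomic bases really occur at this position.
            if seq[ev.position:ev.position + len(ev.ref)] != ev.ref:
                raise ValueError(f"reference mismatch at position {ev.position}")
        if ev.kind == "substitution":
            seq = seq[:ev.position] + ev.alt + seq[ev.position + len(ev.ref):]
        elif ev.kind == "insertion":
            # Inserted bases are placed before the genomic base at `position`.
            seq = seq[:ev.position] + ev.alt + seq[ev.position:]
        elif ev.kind == "deletion":
            seq = seq[:ev.position] + seq[ev.position + len(ev.ref):]
        else:
            raise ValueError(f"unknown event kind: {ev.kind}")
    return seq


# Hypothetical toy record written in the DNA alphabet.
genomic = "ATGCCATCGA"
events = [
    EditingEvent("substitution", 4, ref="C", alt="T"),  # C-to-U editing, written as C->T
    EditingEvent("insertion", 7, alt="T"),
    EditingEvent("deletion", 9, ref="A"),
]
edited = apply_edits(genomic, events)
print(genomic, "->", edited)
```

Storing events against genomic coordinates, rather than storing only the edited transcript, keeps each edit individually addressable, which is what allows a viewer to highlight every editing site in an alignment.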
Striking similarities in amino acid sequence among nonstructural proteins encoded by RNA viruses that have dissimilar genomic organization.

PubMed Central

Haseloff, J; Goelet, P; Zimmern, D; Ahlquist, P; Dasgupta, R; Kaesberg, P

1984-01-01

The plant viruses alfalfa mosaic virus (AMV) and brome mosaic virus (BMV) each divide their genetic information among three RNAs while tobacco mosaic virus (TMV) contains a single genomic RNA. Amino acid sequence comparisons suggest that the single proteins encoded by AMV RNA 1 and BMV RNA 1 and by AMV RNA 2 and BMV RNA 2 are related to the NH2-terminal two-thirds and the COOH-terminal one-third, respectively, of the largest protein encoded by TMV. Separating these two domains in the TMV RNA sequence is an amber termination codon, whose partial suppression allows translation of the downstream domain. Many of the residues that the TMV read-through domain and the segmented plant viruses have in common are also conserved in a read-through domain found in the nonstructural polyprotein of the animal alphaviruses Sindbis and Middelburg. We suggest that, despite substantial differences in gene organization and expression, all of these viruses use related proteins for common functions in RNA replication. Reassortment of functional modules of coding and regulatory sequence from preexisting viral or cellular sources, perhaps via RNA recombination, may be an important mechanism in RNA virus evolution. PMID:6611550
Nucleotide sequence analysis of the 3' terminal region of a wasabi strain of crucifer tobamovirus genomic RNA: subgrouping of crucifer tobamoviruses.

PubMed

Shimamoto, I; Sonoda, S; Vazquez, P; Minaka, N; Nishiguchi, M

1998-01-01

The sequence of the 3'-terminal 2378 nucleotides of a wasabi strain of crucifer tobamovirus (CTMV-W) infectious to crucifer plants was determined. This region includes the 3' non-coding region of 235 nucleotides, the coat protein (CP) gene (468 nucleotides), the movement protein (MP) gene (798 nucleotides) and the C-terminal partial readthrough portion of the 180K protein gene (940 nucleotides). Comparison of the sequence with homologous regions of thirteen other tobamovirus genomes showed that it had much higher identity to those of four other crucifer tobamoviruses (85.2% to cr-TMV and turnip vein-clearing virus (TVCV), 87.4% to oilseed rape mosaic virus (ORMV) and 87.1% to TMV-Cg) than to those of other tobamoviruses. Thus CTMV-W was most similar in sequence to ORMV and TMV-Cg, but only marginally so, whereas the location and size of its MP gene were the same as in cr-TMV and TVCV. These results, together with other analyses, show that CTMV-W is a new crucifer tobamovirus, that the five crucifer tobamoviruses can be classified into two subgroups based on MP gene organization, and that the rate of sequence change is not the same in all lineages.
Full-length genome characterization and genetic relatedness analysis of hepatitis A virus outbreak strains associated with acute liver failure among children.

PubMed

Vaughan, Gilberto; Forbi, Joseph C; Xia, Guo-Liang; Fonseca-Ford, Maureen; Vazquez, Roberto; Khudyakov, Yury E; Montiel, Sonia; Waterman, Steve; Alpuche, Celia; Gonçalves Rossi, Livia Maria; Luna, Norma

2014-02-01

Clinical infection by hepatitis A virus (HAV) is generally self-limited but in some cases can progress to liver failure. Here, an HAV outbreak investigation among children with acute liver failure in a highly endemic country is presented. In addition, a sensitive method for HAV whole-genome amplification and sequencing suitable for analysis of clinical samples is described. In this setting, two fatal cases attributed to acute liver failure and two asymptomatic cases living in the same household were identified. In a second household, one HAV case presented with jaundice that resolved spontaneously. Partial molecular characterization showed that both households were infected by HAV subtype IA; however, the infecting strains in the two households were different. The HAV outbreak strains recovered from all cases grouped together within cluster IA1, which contains closely related HAV strains from the United States commonly associated with international travelers. Full-genome HAV sequences obtained from the household with the acute liver failure cases were closely related (genetic distances ranging from 0.01% to 0.04%), indicating a common-source infection. Interestingly, the strain recovered from the asymptomatic household contact was nearly identical to the strain causing acute liver failure. The whole-genome sequence from the case in the second household was distinctly different from the strains associated with acute liver failure. Thus, infection with almost identical HAV strains resulted in drastically different clinical outcomes. © 2013 Wiley Periodicals, Inc.
Genomics of Sponge-Associated Streptomyces spp. Closely Related to Streptomyces albus J1074: Insights into Marine Adaptation and Secondary Metabolite Biosynthesis Potential

PubMed Central

Ian, Elena; Malko, Dmitry B.; Sekurova, Olga N.; Bredholt, Harald; Rückert, Christian; Borisova, Marina E.; Albersmeier, Andreas; Kalinowski, Jörn; Gelfand, Mikhail S.; Zotchev, Sergey B.

2014-01-01

A total of 74 actinomycete isolates were cultivated from two marine sponges, Geodia barretti and Phakellia ventilabrum, collected at the same site at the bottom of the Trondheim fjord (Norway). Phylogenetic analyses of sponge-associated actinomycetes based on 16S rRNA gene sequences demonstrated the presence of species belonging to the genera Streptomyces, Nocardiopsis, Rhodococcus, Pseudonocardia and Micromonospora. Most isolates required sea water for growth, suggesting that they are adapted to the marine environment. Phylogenetic analysis of Streptomyces spp. revealed two isolates that originated from different sponges and had 99.7% identity in their 16S rRNA gene sequences, indicating that they represent very closely related strains. Sequencing, annotation, and analyses of the genomes of these Streptomyces isolates demonstrated that they are sister organisms closely related to terrestrial Streptomyces albus J1074. Unlike S. albus J1074, the two sponge streptomycetes grew and differentiated faster on medium containing sea water. Comparative genomics revealed several genes presumably responsible for partial marine adaptation of these isolates. Genome mining targeted to secondary metabolite biosynthesis gene clusters identified several such clusters that were not present in S. albus J1074 and were likely retained from a common ancestor or acquired from other actinomycetes. Certain genes and gene clusters were shown to be differentially acquired or lost, supporting the hypothesis of divergent evolution of the two Streptomyces species in different sponge hosts. PMID:24819608
Genomics of sponge-associated Streptomyces spp. closely related to Streptomyces albus J1074: insights into marine adaptation and secondary metabolite biosynthesis potential.

PubMed

Ian, Elena; Malko, Dmitry B; Sekurova, Olga N; Bredholt, Harald; Rückert, Christian; Borisova, Marina E; Albersmeier, Andreas; Kalinowski, Jörn; Gelfand, Mikhail S; Zotchev, Sergey B

2014-01-01

A total of 74 actinomycete isolates were cultivated from two marine sponges, Geodia barretti and Phakellia ventilabrum, collected at the same site at the bottom of the Trondheim fjord (Norway). Phylogenetic analyses of sponge-associated actinomycetes based on 16S rRNA gene sequences demonstrated the presence of species belonging to the genera Streptomyces, Nocardiopsis, Rhodococcus, Pseudonocardia and Micromonospora. Most isolates required sea water for growth, suggesting that they are adapted to the marine environment. Phylogenetic analysis of Streptomyces spp. revealed two isolates that originated from different sponges and had 99.7% identity in their 16S rRNA gene sequences, indicating that they represent very closely related strains. Sequencing, annotation, and analyses of the genomes of these Streptomyces isolates demonstrated that they are sister organisms closely related to terrestrial Streptomyces albus J1074. Unlike S. albus J1074, the two sponge streptomycetes grew and differentiated faster on medium containing sea water. Comparative genomics revealed several genes presumably responsible for partial marine adaptation of these isolates. Genome mining targeted to secondary metabolite biosynthesis gene clusters identified several such clusters that were not present in S. albus J1074 and were likely retained from a common ancestor or acquired from other actinomycetes. Certain genes and gene clusters were shown to be differentially acquired or lost, supporting the hypothesis of divergent evolution of the two Streptomyces species in different sponge hosts.
BAC sequencing using pooled methods.

PubMed

Saski, Christopher A; Feltus, F Alex; Parida, Laxmi; Haiminen, Niina

2015-01-01

Shotgun sequencing and assembly of a large, complex genome can be expensive, and accurately reconstructing the true genome sequence can be challenging. Repetitive DNA arrays, paralogous sequences, polyploidy, and heterozygosity are the main factors that plague de novo genome sequencing projects, which typically result in highly fragmented assemblies from which it is difficult to extract biological meaning. Targeted, sub-genomic sequencing through BAC sequencing offers complexity reduction by removing distal segments of the genome, as well as a systematic mechanism for exploring prioritized genomic content. If one isolates and sequences only the genome fraction that encodes the relevant biological information, it is possible to reduce the overall sequencing cost and effort needed to target a genomic segment. This chapter describes the sub-genome assembly protocol for an organism based upon a BAC tiling path derived from a genome-scale physical map or from fine mapping using BACs to target sub-genomic regions. Methods that are described include BAC isolation and mapping, DNA sequencing, and sequence assembly.
“Maxillary lateral incisor partial anodontia sequence”: a clinical entity with epigenetic origin

PubMed Central

Consolaro, Alberto; Cardoso, Maurício Almeida; Consolaro, Renata Bianco

2017-01-01

ABSTRACT The relationship between maxillary lateral incisor anodontia and the palatal displacement of unerupted maxillary canines cannot be considered a multiple tooth abnormality with a defined genetic etiology that would qualify it as a “syndrome”. The genes involved have been neither identified nor located in the human genome, and no chromosome has been proposed as the location of the responsible gene. Palatal displacement of the maxillary canine in cases of partial anodontia of the maxillary lateral incisor is potentially associated with environmental changes caused by the incisor's absence from its site of formation and eruption, which would characterize an epigenetic etiology. The absence of the maxillary lateral incisor from the canine region removes one of the reference guides for the eruptive trajectory of the maxillary canine, which therefore would not erupt and/or would become impacted in the palate. Consequently, and in sequence, this would lead to malocclusion, maxillary atresia, transposition, prolonged retention of the deciduous canine and resorption of neighboring teeth. Thus, we are dealing with a set of anomalies and multiple sequential changes known as sequential development anomalies or, simply, a sequence. Once the epigenetic and sequential nature of this clinical picture is accepted, it could be called “Maxillary Lateral Incisor Partial Anodontia Sequence.” PMID:29364376
Molecular epidemiology of hepatitis B virus in Misiones, Argentina.

PubMed

Mojsiejczuk, Laura Noelia; Torres, Carolina; Sevic, Ina; Badano, Inés; Malan, Richard; Flichman, Diego Martin; Liotta, Domingo Javier; Campos, Rodolfo Hector

2016-10-01

Hepatitis B virus (HBV) infection is a major public health problem worldwide. The aims of this study were to describe the molecular epidemiology of HBV in the Province of Misiones, Argentina, and to estimate the phylodynamics of the main groups in a Bayesian coalescent framework. To this end, partial or complete genome sequences were obtained from 52 blood donor candidates. Phylogenetic analysis based on partial sequences of the S/P region showed a predominance of genotype D (65.4%), followed by genotype F (30.8%), with genotype A as a minority (3.8%). At the subgenotype level, circulation of subgenotypes D3 (42.3%), D2 (13.5%), F1b (11.5%) and F4 (9.6%) was mainly identified. Bayesian coalescent analysis of 29 complete genome sequences from the main groups revealed that subgenotypes D2 and D3 had several introductions to the region, with ancestors dating from 1921 to 1969 and diversification events until the late '70s. Genotype F has a more recent history in Misiones; subgenotype F4 isolates were intermixed with sequences from Argentina and neighboring countries, and only one significant cluster, dating back to 1994, was observed. Subgenotype F1b isolates exhibited low genetic distance and formed a closely related monophyletic cluster, suggesting a very recent introduction. In conclusion, the phylogenetic and coalescent analyses showed that the European genotype D circulates more widely, has a longer history of diversification, and may be responsible for the largest proportion of chronic HBV infections in the Province of Misiones. Genotype F, especially subgenotype F1b, was introduced more recently, and its diversification in the last 20 years might be related to its involvement in new transmission events. Copyright © 2016 Elsevier B.V. All rights reserved.
Identification of an HIV-1 BG Intersubtype Recombinant Form (CRF73_BG), Partially Related to CRF14_BG, Which Is Circulating in Portugal and Spain

PubMed Central

Fernández-García, Aurora; Delgado, Elena; Cuevas, María Teresa; Vega, Yolanda; Montero, Vanessa; Sánchez, Mónica; Carrera, Cristina; López-Álvarez, María José; Miralles, Celia; Pérez-Castro, Sonia; Cilla, Gustavo; Hinojosa, Carmen; Pérez-Álvarez, Lucía; Thomson, Michael M.

2016-01-01

HIV-1 exhibits a characteristically high genetic diversity, with the M group, responsible for the pandemic, being classified into nine subtypes, 72 circulating recombinant forms (CRFs) and numerous unique recombinant forms (URFs). Here we characterize the near full-length genome sequence of an HIV-1 BG intersubtype recombinant virus (X3208) collected in Galicia (Northwest Spain), which exhibits a mosaic structure coincident with that of a previously characterized BG recombinant virus (9601_01), collected in Germany and epidemiologically linked to Portugal, and different from currently defined CRFs. Similar recombination patterns were found in partial genome sequences from three other BG recombinant viruses: one newly derived from a virus collected in Spain, and two retrieved from databases, collected in France and Portugal, respectively. Breakpoint coincidence and clustering in phylogenetic trees of these epidemiologically unlinked viruses allow the definition of a new HIV-1 CRF (CRF73_BG). CRF73_BG shares one breakpoint in the envelope with CRF14_BG, which circulates in Portugal and Spain, and groups with it in a subtype B envelope fragment, but the greater part of its genome does not appear to derive from CRF14_BG, although both CRFs share as parental strain the subtype G variant circulating in the Iberian Peninsula. Phylogenetic clustering of partial pol and env segments from viruses collected in Portugal and Spain with X3208 and 9691_01 indicates that CRF73_BG is circulating in both countries, with around 2–3% of Portuguese HIV-1 isolates in databases clustering with CRF73_BG. The fact that an HIV-1 recombinant virus characterized ten years ago as a URF has been shown to represent a CRF suggests that the number of HIV-1 CRFs may be much greater than currently known. PMID:26900693
Development of Genic and Genomic SSR Markers of Robusta Coffee (Coffea canephora Pierre Ex A. Froehner)

PubMed Central

Hendre, Prasad S.; Aggarwal, Ramesh K.

2014-01-01

Coffee breeding and improvement efforts can be greatly facilitated by the availability of a large repository of microsatellite markers based on simple sequence repeats (SSRs), which provide efficiency and high resolution in genetic analyses. This study aimed to improve SSR availability in coffee by developing new genic and genomic SSR markers using in silico bioinformatics and a streptavidin-biotin based enrichment approach, respectively. The expressed sequence tag (EST) based genic microsatellite markers (EST-SSRs) were developed using the publicly available dataset of 13,175 unigene ESTs, which showed a frequency of one SSR per 3.4 kb of coffee transcriptome. Genomic SSRs, on the other hand, were developed from an SSR-enriched small-insert partial genomic library of robusta coffee. In total, 69 new SSRs (44 EST-SSRs and 25 genomic SSRs) were developed and validated as suitable genetic markers. Diversity analysis of selected coffee genotypes revealed these markers to be highly informative in terms of allelic diversity and PIC values, and eighteen of them (∼27%) could be mapped on a robusta linkage map. Notably, the markers described here also showed very high cross-species transferability. In addition to the validated markers, we have designed primer pairs for 270 putative EST-SSRs, which are expected to provide another ca. 200 useful genetic markers, given the high success rate (88%) of marker conversion for similar primer pairs tested and validated in this study. PMID:25461752
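The abstract does not name the tool or thresholds used for in silico SSR mining of the ESTs. As a hypothetical illustration, a perfect-repeat scan in the style of MISA can be written with regular expressions; the minimum repeat counts below are assumed defaults, not values taken from the study.

```python
import re
from typing import List, Tuple

# Assumed minimum repeat counts per motif length (MISA-style defaults; adjust as needed).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_compound_unit(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit, e.g. 'ATAT' = 'AT' x 2."""
    n = len(motif)
    return any(n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n))


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return (0-based start, motif, repeat count) for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer captured in group 1, followed by at least (min_rep - 1) further copies of it.
        pattern = re.compile(rf"([ACGT]{{{k}}})\1{{{min_rep - 1},}}")
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_compound_unit(motif):
                continue  # already reported under its shorter unit length
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)
```

Dividing the total length of the scanned ESTs (in kb) by the number of hits gives a frequency in the same "one SSR per N kb" form that the abstract reports.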
Phylogenomics of plant genomes: a methodology for genome-wide searches for orthologs in plants

PubMed Central

Conte, Matthieu G; Gaillard, Sylvain; Droc, Gaetan; Perin, Christophe

2008-01-01

Background Gene ortholog identification is now a major objective for mining the increasing amount of sequence data generated by complete or partial genome sequencing projects. Comparative and functional genomics urgently need a method for ortholog detection to reduce gene function inference and to aid in the identification of conserved or divergent genetic pathways between several species. As gene functions change during evolution, reconstructing the evolutionary history of genes should be a more accurate way to differentiate orthologs from paralogs. Phylogenomics takes into account phylogenetic information from high-throughput genome annotation and is the most straightforward way to infer orthologs. However, procedures for automatic detection of orthologs are still scarce and suffer from several limitations. Results We developed a procedure for ortholog prediction between Oryza sativa and Arabidopsis thaliana. Firstly, we established an efficient method to cluster A. thaliana and O. sativa full proteomes into gene families. Then, we developed an optimized phylogenomics pipeline for ortholog inference. We validated the full procedure using test sets of orthologs and paralogs to demonstrate that our method outperforms pairwise methods for ortholog predictions. Conclusion Our procedure achieved a high level of accuracy in predicting ortholog and paralog relationships. Phylogenomic predictions for all validated gene families in both species were easily achieved and we can conclude that our methodology outperforms similarly based methods. PMID:18426584
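The study compares its phylogenomic pipeline with pairwise methods. For context, a widely used pairwise approach is the reciprocal best hit (RBH): two genes are called orthologs if each is the other's top-scoring match. The sketch below shows only that generic baseline, not the authors' method; the gene identifiers are hypothetical, and the scores are assumed to be BLASTP bit scores already parsed into dictionaries.

```python
from typing import Dict, List, Set, Tuple

# query gene -> list of (subject gene, bit score) from a similarity search
Hits = Dict[str, List[Tuple[str, float]]]


def best_hits(hits: Hits) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (first one wins on ties)."""
    return {q: max(subjects, key=lambda s: s[1])[0] for q, subjects in hits.items() if subjects}


def reciprocal_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in species B and a is b's best hit in species A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}
```

Because RBH reports at most one partner per gene, it cannot represent one-to-many orthology after lineage-specific duplications. That limitation is part of the motivation for tree-based approaches such as the one described in the abstract.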
High Variety of Known and New RNA and DNA Viruses of Diverse Origins in Untreated Sewage

PubMed Central

Ng, Terry Fei Fan; Marine, Rachel; Wang, Chunlin; Simmonds, Peter; Kapusinszky, Beatrix; Bodhidatta, Ladaporn; Oderinde, Bamidele Soji; Wommack, K. Eric

2012-01-01

Deep sequencing of untreated sewage provides an opportunity to monitor enteric infections in large populations and for high-throughput viral discovery. A metagenomics analysis of purified viral particles in untreated sewage from the United States (San Francisco, CA), Nigeria (Maiduguri), Thailand (Bangkok), and Nepal (Kathmandu) revealed sequences related to 29 eukaryotic viral families infecting vertebrates, invertebrates, and plants (BLASTx E score, < 10⁻⁴), including known pathogens (>90% protein identities) in numerous viral families infecting humans (Adenoviridae, Astroviridae, Caliciviridae, Hepeviridae, Parvoviridae, Picornaviridae, Picobirnaviridae, and Reoviridae), plants (Alphaflexiviridae, Betaflexiviridae, Partitiviridae, Sobemovirus, Secoviridae, Tombusviridae, Tymoviridae, Virgaviridae), and insects (Dicistroviridae, Nodaviridae, and Parvoviridae). The full and partial genomes of a novel kobuvirus, salivirus, and sapovirus are described. A novel astrovirus (casa astrovirus) basal to those infecting mammals and birds, potentially representing a third astrovirus genus, was partially characterized. Potential new genera and families of viruses distantly related to members of the single-stranded RNA picorna-like virus superfamily were genetically characterized and named Picalivirus, Secalivirus, Hepelivirus, Nedicistrovirus, Cadicistrovirus, and Niflavirus. Phylogenetic analysis placed these highly divergent genomes near the root of the picorna-like virus superfamily, with possible vertebrate, plant, or arthropod hosts inferred from nucleotide composition analysis. Circular DNA genomes distantly related to the plant-infecting Geminiviridae family were named Baminivirus, Nimivirus, and Niminivirus. These results highlight the utility of analyzing sewage to monitor shedding of viral pathogens and the high viral diversity found in this common pollutant, and provide genetic information to facilitate future studies of these newly characterized viruses.
PMID:22933275
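The two thresholds in the abstract (BLASTx E score < 10⁻⁴ for relatedness to a viral family, >90% protein identity for a known pathogen) can be read as a simple decision rule. The sketch below is an assumption-laden illustration: the `BlastxHit` record and the "divergent" label for related-but-not-known hits are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlastxHit:
    read_id: str
    family: str              # viral family of the best-matching protein
    evalue: float
    percent_identity: float  # protein-level identity of the best match


def classify(hit: BlastxHit, max_evalue: float = 1e-4, known_identity: float = 90.0) -> Optional[str]:
    """Apply the abstract's thresholds to a single best BLASTx hit."""
    if hit.evalue >= max_evalue:
        return None          # not counted as related to any viral family
    if hit.percent_identity > known_identity:
        return "known"       # close match to a previously described virus
    return "divergent"       # hypothetical label: related, but potentially novel
```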
Variation in the fumonisin biosynthetic gene cluster in fumonisin-producing and nonproducing black aspergilli.

PubMed

Susca, Antonia; Proctor, Robert H; Butchko, Robert A E; Haidukowski, Miriam; Stea, Gaetano; Logrieco, Antonio; Moretti, Antonio

2014-12-01

The ability to produce fumonisin mycotoxins varies among members of the black aspergilli. Previously, analyses of selected genes in the fumonisin biosynthetic gene (fum) cluster in black aspergilli from California grapes indicated that fumonisin-nonproducing isolates of Aspergillus welwitschiae lack six fum genes, but nonproducing isolates of Aspergillus niger do not. In the current study, analyses of black aspergilli from grapes from the Mediterranean Basin indicate that the genomic context of the fum cluster is the same in isolates of A. niger and A. welwitschiae regardless of fumonisin-production ability and that full-length clusters occur in producing isolates of both species and nonproducing isolates of A. niger. In contrast, the cluster has undergone an eight-gene deletion in fumonisin-nonproducing isolates of A. welwitschiae. Phylogenetic analyses suggest each species consists of a mixed population of fumonisin-producing and nonproducing individuals, and that existence of both production phenotypes may provide a selective advantage to these species. Differences in gene content of fum cluster homologues and phylogenetic relationships of fum genes suggest that the mutation(s) responsible for the nonproduction phenotype differs, and therefore arose independently, in the two species. Partial fum cluster homologues were also identified in genome sequences of four other black Aspergillus species. Gene content of these partial clusters and phylogenetic relationships of fum sequences indicate that non-random partial deletion of the cluster has occurred multiple times among the species. This in turn suggests that an intact cluster and fumonisin production were once more widespread among black aspergilli. Copyright © 2014 Elsevier Inc. All rights reserved.
Whole-genome sequencing for comparative genomics and de novo genome assembly.

PubMed

Benjak, Andrej; Sala, Claudia; Hartkoorn, Ruben C

2015-01-01

Next-generation sequencing technologies for whole-genome sequencing of mycobacteria are rapidly becoming an attractive alternative to more traditional sequencing methods. In particular, this technology is proving useful for genome-wide identification of mutations in mycobacteria (comparative genomics) as well as for de novo assembly of whole genomes. Next-generation sequencing, however, generates a vast quantity of data that can only be transformed into a usable and comprehensible form using bioinformatics. Here we describe the methodology one would use to prepare libraries for whole-genome sequencing, and the basic bioinformatics to identify mutations in a genome following Illumina HiSeq or MiSeq sequencing, as well as de novo genome assembly following sequencing using Pacific Biosciences (PacBio).
Cloning, Assembly, and Modification of the Primary Human Cytomegalovirus Isolate Toledo by Yeast-Based Transformation-Associated Recombination.

PubMed

Vashee, Sanjay; Stockwell, Timothy B; Alperovich, Nina; Denisova, Evgeniya A; Gibson, Daniel G; Cady, Kyle C; Miller, Kristofer; Kannan, Krishna; Malouli, Daniel; Crawford, Lindsey B; Voorhies, Alexander A; Bruening, Eric; Caposio, Patrizia; Früh, Klaus

2017-01-01

Genetic engineering of cytomegalovirus (CMV) currently relies on generating a bacterial artificial chromosome (BAC) by introducing a bacterial origin of replication into the viral genome using in vivo recombination in virally infected tissue culture cells. However, this process is inefficient, results in adaptive mutations, and involves deletion of viral genes to avoid oversized genomes when inserting the BAC cassette. Moreover, BAC technology does not permit the simultaneous manipulation of multiple genome loci and cannot be used to construct synthetic genomes. To overcome these limitations, we adapted synthetic biology tools to clone CMV genomes in Saccharomyces cerevisiae. Using an early passage of the human CMV isolate Toledo, we first applied transformation-associated recombination (TAR) to clone 16 overlapping fragments covering the entire Toledo genome in Saccharomyces cerevisiae. Then, we assembled these fragments by TAR in a stepwise process until the entire genome was reconstituted in yeast. Since next-generation sequence analysis revealed that the low-passage-number isolate represented a mixture of parental and fibroblast-adapted genomes, we selectively modified individual DNA fragments of fibroblast-adapted Toledo (Toledo-F) and again used TAR assembly to recreate parental Toledo (Toledo-P). Linear, full-length HCMV genomes were transfected into human fibroblasts to recover virus. Unlike Toledo-F, Toledo-P displayed characteristics of primary isolates, including broad cellular tropism in vitro and the ability to establish latency and reactivation in humanized mice. Our novel strategy thus enables de novo cloning of CMV genomes, more-efficient genome-wide engineering, and the generation of viral genomes that are partially or completely derived from synthetic DNA.
IMPORTANCE The genomes of large DNA viruses, such as human cytomegalovirus (HCMV), are difficult to manipulate using current genetic tools, and at this time it is not possible to obtain molecular clones of CMV without extensive tissue culture. To overcome these limitations, we used synthetic biology tools to capture genomic fragments from viral DNA and assemble full-length genomes in yeast. Using an early passage of the HCMV isolate Toledo containing a mixture of wild-type and tissue culture-adapted virus, we directly cloned the majority sequence and recreated the minority sequence by simultaneous modification of multiple genomic regions. Thus, our novel approach not only provides a paradigm for efficiently engineering HCMV and other large DNA viruses on a genome-wide scale but also facilitates the cloning and genetic manipulation of primary isolates and provides a pathway to generating entirely synthetic genomes.
Cloning, Assembly, and Modification of the Primary Human Cytomegalovirus Isolate Toledo by Yeast-Based Transformation-Associated Recombination

PubMed Central

Vashee, Sanjay; Stockwell, Timothy B.; Alperovich, Nina; Denisova, Evgeniya A.; Gibson, Daniel G.; Cady, Kyle C.; Miller, Kristofer; Kannan, Krishna; Malouli, Daniel; Crawford, Lindsey B.; Voorhies, Alexander A.; Bruening, Eric; Caposio, Patrizia

2017-01-01

ABSTRACT Genetic engineering of cytomegalovirus (CMV) currently relies on generating a bacterial artificial chromosome (BAC) by introducing a bacterial origin of replication into the viral genome using in vivo recombination in virally infected tissue culture cells. However, this process is inefficient, results in adaptive mutations, and involves deletion of viral genes to avoid oversized genomes when inserting the BAC cassette. Moreover, BAC technology does not permit the simultaneous manipulation of multiple genome loci and cannot be used to construct synthetic genomes. To overcome these limitations, we adapted synthetic biology tools to clone CMV genomes in Saccharomyces cerevisiae. Using an early passage of the human CMV isolate Toledo, we first applied transformation-associated recombination (TAR) to clone 16 overlapping fragments covering the entire Toledo genome in Saccharomyces cerevisiae. Then, we assembled these fragments by TAR in a stepwise process until the entire genome was reconstituted in yeast. Since next-generation sequence analysis revealed that the low-passage-number isolate represented a mixture of parental and fibroblast-adapted genomes, we selectively modified individual DNA fragments of fibroblast-adapted Toledo (Toledo-F) and again used TAR assembly to recreate parental Toledo (Toledo-P). Linear, full-length HCMV genomes were transfected into human fibroblasts to recover virus. Unlike Toledo-F, Toledo-P displayed characteristics of primary isolates, including broad cellular tropism in vitro and the ability to establish latency and reactivation in humanized mice. Our novel strategy thus enables de novo cloning of CMV genomes, more-efficient genome-wide engineering, and the generation of viral genomes that are partially or completely derived from synthetic DNA. 
IMPORTANCE The genomes of large DNA viruses, such as human cytomegalovirus (HCMV), are difficult to manipulate using current genetic tools, and at this time it is not possible to obtain molecular clones of CMV without extensive tissue culture. To overcome these limitations, we used synthetic biology tools to capture genomic fragments from viral DNA and assemble full-length genomes in yeast. Using an early passage of the HCMV isolate Toledo containing a mixture of wild-type and tissue culture-adapted virus, we directly cloned the majority sequence and recreated the minority sequence by simultaneous modification of multiple genomic regions. Thus, our novel approach not only provides a paradigm for efficiently engineering HCMV and other large DNA viruses on a genome-wide scale but also facilitates the cloning and genetic manipulation of primary isolates and provides a pathway to generating entirely synthetic genomes. PMID:28989973
Information Fusion for Hypothesis Generation under Uncertain and Partial Information Access Situation

DTIC Science & Technology

2006-07-21

This fundamental architecture can be illustrated as a bow-tie architecture, which is claimed to be the fundamental architecture of robust systems (Fig...sequenced genomes [86,87]. The Gene Ontology’s biological process hierarchy [88] was used to annotate functional categories to each gene, and proportions of... proportions in the early Drosophila embryo. Nature 415, 798-802 (2002). 14. Kitano, H. Cancer robustness: tumour tactics. Nature 426, 125 (2003). 15
Information Fusion for Hypothesis Generation under Uncertain and Partial Information Access Situation

DTIC Science & Technology

2008-07-21

size. This fundamental architecture can be illustrated as a bow-tie architecture, which is claimed to be the fundamental architecture of robust systems...sequenced genomes [86,87]. The Gene Ontology’s biological process hierarchy [88] was used to annotate functional categories to each gene, and proportions ...and proportions in the early Drosophila embryo. Nature 415, 798-802 (2002). 14. Kitano, H. Cancer robustness: tumour tactics. Nature 426, 125 (2003

The complete chloroplast genome sequence of the CAM epiphyte Spanish moss (Tillandsia usneoides, Bromeliaceae) and its comparative analysis.

PubMed

Poczai, Péter; Hyvönen, Jaakko

2017-01-01

Spanish moss (Tillandsia usneoides) is an epiphytic bromeliad widely distributed throughout tropical and warm temperate America. This plant is highly adapted to extreme environmental conditions. Striking features of this species include specialized trichomes (scales) covering the surface of its shoots, which aid the absorption of water and nutrients directly from the atmosphere, and a specific photosynthesis using crassulacean acid metabolism (CAM). Here we report the plastid genome of Spanish moss and compare genome organization and sequence evolution within Poales. The plastome of Spanish moss has a quadripartite structure consisting of a large single copy (LSC, 87,439 bp) region, two inverted repeat regions (IRa and IRb, 26,803 bp each), and a small single copy (SSC, 18,612 bp) region. The plastid genome has a GC content of 37.2% and contains 134 genes, of which 88 are unique protein-coding genes, and 20 of these are duplicated in the IR, similar to other reported bromeliads. Our study shows that early-diverging lineages of Poales do not have high substitution rates compared with grasses, and that plastid genomes of bromeliads show structural features considered to be ancestral in graminids. These include the loss of the introns in the clpP and rpoC1 genes and the complete loss or partial degradation of the accD and ycf genes in the Graminid clade. Further structural rearrangements that appeared in the graminids but are absent from Spanish moss include a 28-kb inversion between trnG-UCC and rps14, a 6-kb inversion between trnG-UCC and psbD, and a third inversion of <1 kb in the trnT sequence.
The complete chloroplast genome sequence of the CAM epiphyte Spanish moss (Tillandsia usneoides, Bromeliaceae) and its comparative analysis

PubMed Central

Hyvönen, Jaakko

2017-01-01

Spanish moss (Tillandsia usneoides) is an epiphytic bromeliad widely distributed throughout tropical and warm temperate America. This plant is highly adapted to extreme environmental conditions. Striking features of this species include specialized trichomes (scales) covering the surface of its shoots, which aid the absorption of water and nutrients directly from the atmosphere, and a specific photosynthesis using crassulacean acid metabolism (CAM). Here we report the plastid genome of Spanish moss and compare genome organization and sequence evolution within Poales. The plastome of Spanish moss has a quadripartite structure consisting of a large single copy (LSC, 87,439 bp) region, two inverted repeat regions (IRa and IRb, 26,803 bp each), and a small single copy (SSC, 18,612 bp) region. The plastid genome has a GC content of 37.2% and contains 134 genes, of which 88 are unique protein-coding genes, and 20 of these are duplicated in the IR, similar to other reported bromeliads. Our study shows that early-diverging lineages of Poales do not have high substitution rates compared with grasses, and that plastid genomes of bromeliads show structural features considered to be ancestral in graminids. These include the loss of the introns in the clpP and rpoC1 genes and the complete loss or partial degradation of the accD and ycf genes in the Graminid clade. Further structural rearrangements that appeared in the graminids but are absent from Spanish moss include a 28-kb inversion between trnG-UCC and rps14, a 6-kb inversion between trnG-UCC and psbD, and a third inversion of <1 kb in the trnT sequence. PMID:29095905
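The region lengths reported for the quadripartite plastome imply its total size: the LSC, two copies of the inverted repeat, and the SSC add up to 87,439 + 2 × 26,803 + 18,612 = 159,657 bp. The sketch below makes that arithmetic explicit and adds a generic GC-content helper. The 37.2% figure itself would require the full plastome sequence, which is not reproduced here; the function names are illustrative only.

```python
def quadripartite_length(lsc: int, ir: int, ssc: int) -> int:
    """Total plastome length: LSC + two copies of the inverted repeat + SSC."""
    return lsc + 2 * ir + ssc


def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous bases (A, C, G, T); N and other codes are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


total = quadripartite_length(87_439, 26_803, 18_612)
print(total)  # 159657
```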
Identification and genomic characterization of a novel rat bocavirus from brown rats in China.

PubMed

Lau, Susanna K P; Yeung, Hazel C; Li, Kenneth S M; Lam, Carol S F; Cai, Jian-Piao; Yuen, Ming-Chi; Wang, Ming; Zheng, Bo-Jian; Woo, Patrick C Y; Yuen, Kwok-Yung

2017-01-01

Despite recent discoveries of novel animal bocaparvoviruses, current understanding of the diversity and evolution of bocaparvoviruses is still limited. We report the identification and genome characterization of a novel bocaparvovirus, rat bocaparvovirus (RBoV), in brown rats (Rattus norvegicus) in China. RBoV was detected by PCR in 11.5%, 2.4%, 16.2% and 0.3% of alimentary, respiratory, spleen and kidney samples, respectively, from 636 brown rats, but not in samples from other rodent species, suggesting that brown rats are the primary reservoir of RBoV. Six RBoV genomes sequenced from three brown rats revealed the presence of three ORFs, characteristic of bocaparvoviruses. Phylogenetic analysis showed that RBoV was distantly related to other bocaparvoviruses, forming a distinct cluster within the genus with ≤55.5% nucleotide identity to the genome of ungulate bocaparvovirus 3, supporting its classification as a novel bocaparvovirus species. RBoV possessed a putative second exon encoding the C-terminal region of NS1 and conserved RNA splicing signals, similar to human bocaparvoviruses and canine bocaparvovirus. In contrast to human, feline and canine bocaparvoviruses, which show inter- and intra-host viral diversity, partial VP1/VP2 sequences of 49 RBoV strains showed little inter-host genetic diversity, suggesting a single genetic group. Although the pathogenicity of RBoV remains to be determined, its presence in different host tissues suggests wide tissue tropism. RBoV is the first rodent bocaparvovirus with a sequenced genome, which extends our knowledge of the host range of bocaparvoviruses. Further studies are required to better understand the epidemiology, genetic diversity and pathogenicity of bocaparvoviruses in different rodent populations. Copyright © 2016 Elsevier B.V. All rights reserved.
Using Drosophila melanogaster as a Model for Genotoxic Chemical Mutational Studies with a New Program, SnpSift

PubMed Central

Cingolani, Pablo; Patel, Viral M.; Coon, Melissa; Nguyen, Tung; Land, Susan J.; Ruden, Douglas M.; Lu, Xiangyi

2012-01-01

This paper describes a new program, SnpSift, for filtering differential DNA sequence variants between two or more experimental genomes after genotoxic chemical exposure. Here, we illustrate how SnpSift can be used to identify candidate phenotype-relevant variants, including single nucleotide polymorphisms, multiple nucleotide polymorphisms, insertions, and deletions (InDels), in mutant strains isolated from genome-wide chemical mutagenesis of Drosophila melanogaster. First, the genomes of two independently isolated mutant fly strains that are allelic for a novel recessive male-sterile locus generated by genotoxic chemical exposure were sequenced using the Illumina next-generation DNA sequencer to obtain 20- to 29-fold coverage of the euchromatic sequences. The sequencing reads were processed, and variants were called using standard bioinformatic tools. Next, SnpEff was used to annotate all sequence variants and their potential mutational effects on associated genes. Then, SnpSift was used to filter and select differential variants that potentially disrupt a common gene in the two allelic mutant strains. The potential causative DNA lesions were partially validated by capillary sequencing of polymerase chain reaction-amplified DNA in the genetic interval defined by meiotic mapping and by deletions that remove defined regions of the chromosome. Of the five candidate genes located in the genetic interval, the Pka-like gene CG12069 was found to carry a separate premature stop codon mutation in each of the two allelic mutants, whereas the other four candidate genes within the interval have wild-type sequences. The Pka-like gene is therefore a strong candidate for the male-sterile locus. These results demonstrate that combining SnpEff and SnpSift can expedite the identification of candidate phenotype-causative mutations in chemically mutagenized Drosophila strains. This technique can also be used to characterize the variety of mutations generated by genotoxic chemicals.
PMID:22435069
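The selection step described above, finding a gene disrupted in both allelic strains, can be illustrated without SnpSift. The sketch below does not use SnpSift or its filter syntax; it is a plain-Python rendering of the logic under stated assumptions: variants are already annotated with a gene and a Sequence Ontology effect term (as SnpEff produces), the set of "disruptive" terms is chosen for illustration, and the `Variant` record and function names are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Set, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    gene: str    # gene assigned by an annotator such as SnpEff
    effect: str  # Sequence Ontology effect term, e.g. "stop_gained"


# Assumed set of effect terms treated as potentially gene-disrupting.
DISRUPTIVE = {
    "stop_gained", "frameshift_variant", "start_lost",
    "splice_acceptor_variant", "splice_donor_variant",
}

Interval = Tuple[str, int, int]     # (chromosome, first position, last position), inclusive
Lesion = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def _disruptive_by_gene(variants: Iterable[Variant], interval: Optional[Interval]) -> Dict[str, Set[Lesion]]:
    """Group disruptive variants inside the optional mapping interval by gene."""
    by_gene: Dict[str, Set[Lesion]] = {}
    for v in variants:
        if v.effect not in DISRUPTIVE:
            continue
        if interval is not None:
            chrom, lo, hi = interval
            if v.chrom != chrom or not lo <= v.pos <= hi:
                continue
        by_gene.setdefault(v.gene, set()).add((v.chrom, v.pos, v.ref, v.alt))
    return by_gene


def candidate_genes(strain_a: Iterable[Variant], strain_b: Iterable[Variant],
                    interval: Optional[Interval] = None) -> Set[str]:
    """Genes disrupted in both strains by lesions unique to each strain."""
    a = _disruptive_by_gene(strain_a, interval)
    b = _disruptive_by_gene(strain_b, interval)
    # Independently induced alleles should carry different lesions in the same gene;
    # a lesion present in both strains is more likely a shared background variant.
    return {g for g in a.keys() & b.keys() if a[g] - b[g] and b[g] - a[g]}
```

Restricting the search to the meiotically mapped interval, as in the study, corresponds to passing `interval`.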
First genome sequences of Achromobacter phages reveal new members of the N4 family.

PubMed

Wittmann, Johannes; Dreiseikelmann, Brigitte; Rohde, Manfred; Meier-Kolthoff, Jan P; Bunk, Boyke; Rohde, Christine

2014-01-27

Multidrug-resistant Achromobacter xylosoxidans has been recognized in recent years as an emerging pathogen causing nosocomial infections. Phages, as natural opponents of bacteria, could be an alternative means of fighting such infections. Bacteriophages against this opportunistic pathogen were isolated in a recent study. This study presents a molecular analysis of two podoviruses and provides the first insights into the genomic structure of Achromobacter phages. Growth curve experiments and adsorption kinetics were performed for both phages. Adsorption and propagation in cells were visualized by electron microscopy. Both phage genomes were sequenced with the PacBio RS II system, which is based on single-molecule, real-time (SMRT) technology, and annotated with several bioinformatic tools. To further elucidate the evolutionary relationships between the phage genomes, a phylogenomic analysis was conducted using the Genome BLAST Distance Phylogeny (GBDP) approach. In this study, we present the first detailed analysis of the genome sequences of two Achromobacter phages. Phages JWAlpha and JWDelta were isolated from two different wastewater treatment plants in Germany. Both phages belong to the Podoviridae and contain linear, double-stranded DNA genomes of 72,329 bp and 73,659 bp, respectively. Bioinformatic analysis with several tools identified 92 and 89 putative open reading frames for JWAlpha and JWDelta, respectively. The genomes have nearly the same organization and can be divided into clusters for transcription, replication, host interaction, head and tail structure, and lysis. Detailed annotation via protein comparisons with BLASTP revealed strong similarities to N4-like phages. Analysis of the genomes of Achromobacter phages JWAlpha and JWDelta and comparisons of different gene clusters with other phages revealed that they might be strongly related to other N4-like phages, especially those of the Escherichia group.
Although all these phages show a highly conserved genomic structure and, in part, strong similarities at the amino acid level, some differences could be identified. These differences, e.g. the presence of specific genes for replication or host interaction in some N4-like phages, seem to be interesting targets for further examination of function and specific mechanisms, which might shed light on how the phage establishes itself in the host cell after infection.
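The abstract reports putative ORF counts obtained with several (unnamed) bioinformatic tools. To show what an open reading frame is in the simplest terms, the sketch below performs a naive six-frame scan for ATG-to-stop stretches. It is only an illustration: dedicated gene callers used for phage annotation also model alternative start codons, ribosome binding sites, and overlapping genes, and the `min_codons` threshold here is an arbitrary assumption.

```python
from typing import List, Tuple

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[str, int, int]]:
    """Return (strand, start, end) of ATG...stop ORFs in forward-strand coordinates.

    Coordinates are 0-based and half-open; the stop codon is included and counts
    toward min_codons. Nested in-frame ATGs are not reported separately.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    end = i + 3
                    if (end - start) // 3 >= min_codons:
                        # Map minus-strand coordinates back onto the forward strand.
                        orfs.append((strand, start, end) if strand == "+" else (strand, n - end, n - start))
                    start = None
    return sorted(orfs, key=lambda orf: orf[1])
```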
Sensitive and Specific Target Sequences Selected from Retrotransposons of Schistosoma japonicum for the Diagnosis of Schistosomiasis

PubMed Central

Xu, Jing; Zhu, Xing-Quan; Wang, Sheng-Yue; Xia, Chao-Ming

2012-01-01

Background Schistosomiasis japonica is a serious, debilitating and sometimes fatal disease. Accurate diagnostic tests play a key role in patient management and disease control. However, currently available diagnostic methods are not ideal, and detection of parasite DNA in blood samples has proved to be one of the most promising tools for diagnosing schistosomiasis. In our previous investigations, a 230-bp sequence from the highly repetitive retrotransposon SjR2 was identified, and it showed high sensitivity and specificity for detecting Schistosoma japonicum DNA in the sera of a rabbit model and of patients. Our group recently identified 29 retrotransposons in the S. japonicum genome. The present study highlights the key factors for selecting a promising new sensitive target DNA sequence for the diagnosis of schistosomiasis, which can serve as an example for other parasitic pathogens. Methodology/Principal Findings In this study, we demonstrated that, according to bioinformatic analysis, the key factors for selecting a target sequence are a higher genome proportion, more repetitive complete and partial copies, and more active ESTs than the other candidates in the genome. New primers based on 25 novel retrotransposons and SjR2 were designed, and their sensitivity and specificity for detecting S. japonicum DNA were compared. The results showed that a new 303-bp sequence from the non-long terminal repeat (LTR) retrotransposon SjCHGCS19 had high sensitivity and specificity. The 303-bp target sequence was amplified by nested PCR from the sera of the rabbit model at 3 d post-infection, and it became negative at 17 weeks post-treatment. Furthermore, the sensitivity of the nested PCR was 97.67% in 43 serum samples from S. japonicum-infected patients. Conclusions/Significance Our findings highlight the key factors, based on bioinformatic analysis, for selecting a target sequence from the S. japonicum genome, which provide a basis for establishing powerful molecular diagnostic techniques that can be used to monitor early infection and therapy efficacy in support of schistosomiasis control programs. PMID:22479661
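As a worked check of the sensitivity figure: diagnostic sensitivity is the number of true positives divided by all truly infected samples. The abstract does not give the raw count, so the sketch below assumes 42 positives among the 43 patient sera, which is the only whole-number count that rounds to 97.67%.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Diagnostic sensitivity, TP / (TP + FN), expressed as a percentage."""
    total_infected = true_positives + false_negatives
    if total_infected == 0:
        raise ValueError("no infected samples")
    return 100.0 * true_positives / total_infected

# Assumed split (not reported in the abstract): 42 positive, 1 negative of 43 sera.
print(f"nested-PCR sensitivity: {sensitivity(42, 1):.2f}%")
```

With 41 positives the figure would be 95.35%, so the reported value is consistent only with a single false negative.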
Genomic sequences of murine gamma B- and gamma C-crystallin-encoding genes: promoter analysis and complete evolutionary pattern of mouse, rat and human gamma-crystallins.

PubMed

Graw, J; Liebstein, A; Pietrowski, D; Schmitt-John, T; Werner, T

1993-12-22

The murine genes gamma B-cry and gamma C-cry, encoding the gamma B- and gamma C-crystallins, were isolated from a genomic DNA library. The complete nucleotide (nt) sequences of both genes were determined from 661 and 711 bp, respectively, upstream of the first exon to the corresponding polyadenylation sites, spanning more than 2650 and 2890 bp, respectively. The new sequences were compared with the partial cDNA sequences available for murine gamma B-cry and gamma C-cry, as well as with the corresponding genomic sequences from rat and human, at both the nt and predicted amino acid (aa) sequence levels. In the gamma B-cry promoter region, a canonical CCAAT-box, a TATA-box, and putative NF-I and C/EBP sites were detected. An R-repeat is inserted 366 bp upstream of the transcription start point. In contrast, the gamma C-cry promoter does not contain a CCAAT-box, but computer analysis located several other putative binding sites for transcription factors (AP-2, UBP-1, LBP-1). The promoter regions of all six gamma-cry genes from mouse, rat and human, except human psi gamma F-cry, were analyzed for common sequence elements. A complex sequence element of about 70-80 bp was found in the proximal promoter; it contains a gamma-cry-specific and almost invariant 14-nt sequence (crygpel) and ends with the likewise invariant TATA-box. Within this complex element, a minimum of three further features specific to the gamma A-, gamma B- and gamma D/E/F-cry genes can be defined, at least two of which were recently shown to be functional. In addition to these four sequence elements, a subtype-specific structure of inverted repeats with different-sized spacers can be deduced from the multiple sequence alignment. A phylogenetic analysis based on the promoter region, as well as on the complete exon 3 of all gamma-cry genes from mouse, rat and human, suggests that only five gamma-cry subtypes (gamma A-, gamma B-, gamma C-, gamma D- and gamma E/F-cry) had separated prior to species separation.
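The promoter elements above were located by computer analysis. A minimal sketch of the simplest form of such a search, exact matching of consensus strings, assuming a CCAAT-box consensus of CCAAT and a TATA-box consensus of TATAWAW (W = A or T). These patterns and the toy sequence are illustrative assumptions; real promoter analysis typically uses position weight matrices and scans both strands.

```python
import re

# Simplified, assumed consensus patterns written with IUPAC codes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "R": "[AG]", "N": "[ACGT]"}
MOTIFS = {"CCAAT-box": "CCAAT", "TATA-box": "TATAWAW"}

def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in motif)

def scan_promoter(seq: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each motif on the given strand only.

    The zero-width lookahead lets overlapping matches be reported.
    """
    seq = seq.upper()
    hits = {}
    for name, motif in MOTIFS.items():
        pattern = re.compile(f"(?={iupac_to_regex(motif)})")
        hits[name] = [m.start() for m in pattern.finditer(seq)]
    return hits

toy = "GGCCAATCTGAGCTATAAATAGGCGC"  # made-up sequence, not from the crystallin genes
print(scan_promoter(toy))
```

An empty list for a motif, as for the gamma C-cry CCAAT-box, would simply mean no exact consensus match on that strand.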
An efficient approach to BAC based assembly of complex genomes.

PubMed

Visendi, Paul; Berkman, Paul J; Hayashi, Satomi; Golicz, Agnieszka A; Bayer, Philipp E; Ruperao, Pradeep; Hurgobin, Bhavna; Montenegro, Juan; Chan, Chon-Kit Kenneth; Staňková, Helena; Batley, Jacqueline; Šimková, Hana; Doležel, Jaroslav; Edwards, David

2016-01-01

There has been exponential growth in the number of genome sequencing projects since the introduction of next-generation DNA sequencing technologies. Genome projects have increasingly involved assembly of whole-genome data, which produces inferior assemblies compared with traditional Sanger sequencing of genomic fragments cloned into bacterial artificial chromosomes (BACs). While whole-genome shotgun sequencing using next-generation sequencing (NGS) is relatively fast and inexpensive, it is extremely challenging for highly complex genomes, where polyploidy or high repeat content confounds accurate assembly, or where a highly accurate 'gold' reference is required. Several attempts have been made to improve genome sequencing approaches by incorporating NGS methods, with variable success. We present the application of a novel BAC sequencing approach that combines indexed pools of BACs, Illumina paired-read sequencing, a sequence assembler specifically designed for complex BAC assembly, and a custom bioinformatics pipeline. We demonstrate this method by sequencing and assembling BAC-cloned fragments from the bread wheat and sugarcane genomes, and show that the assembly approach is accurate, robust, cost-effective and scalable, with applications for complete genome sequencing in large and complex genomes.
Library Resources for BAC End Sequencing. Final Technical Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pieter J. de Jong

2000-10-01

Studies directed towards the specific aims outlined for this research award are summarized. The RPCI-11 human BAC library has been expanded by the addition of 6.9-fold genomic coverage. This segment was generated from an MboI partial digest of the same anonymous donor DNA used for the rest of the library. A new cloning vector, pTARBAC1, has been constructed and used in the construction of RPCI-11 segment 5. This new cloning vector provides a new strategy for identifying targeted genomic regions and will greatly facilitate large-scale analysis for positional cloning. A new male C57BL/6J mouse BAC library has been constructed. RPCI-23 contains 576 plates (approximately 210,000 clones) and represents approximately 11-fold coverage of the mouse genome.
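Fold coverage translates into library completeness through the standard Clarke-Carbon estimate, P = 1 - exp(-c), the probability that a given locus is present in at least one clone. A short sketch, assuming clones are distributed randomly and independently across the genome (an idealization that ignores cloning bias). Note that the 6.9-fold figure refers only to the added segment, not the whole library.

```python
import math

def fraction_represented(fold_coverage: float) -> float:
    """Clarke-Carbon estimate: probability that a given locus is present in
    a random-clone library of the given fold coverage, P = 1 - exp(-c)."""
    return 1.0 - math.exp(-fold_coverage)

def coverage_needed(probability: float) -> float:
    """Invert the estimate: fold coverage needed to reach a target probability."""
    return -math.log(1.0 - probability)

for c in (6.9, 11.0):
    print(f"{c:>4}-fold -> {fraction_represented(c):.5f}")
```

Under this model, roughly 7-fold coverage already represents about 99.9% of loci, and each further fold reduces the expected missing fraction by a factor of e.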
DR-78, a novel Drosophila melanogaster genomic DNA fragment highly homologous to the DNA-binding domain of thyroid hormone-retinoic acid-vitamin D receptor subfamily.

PubMed

Martín-Blanco, E; Kornberg, T B

1993-11-16

Degenerate oligodeoxyribonucleotides were designed for both ends of the DNA-binding domain of members of the nuclear receptor superfamily. PCR-amplified Drosophila melanogaster DNA was purified and cloned (DR plasmids). Genomic lambda DASH clones were identified at high stringency with amplified DR-78 plasmid DNA and isolated. The partial sequence reveals a probable open reading frame that would encode a peptide highly homologous to members of the thyroid hormone-retinoic acid-vitamin D receptor subfamily. The fragment corresponds to a single-copy gene and was mapped by in situ hybridization to position 78D of chromosome 3.
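Degenerate oligonucleotides like those described here are written with IUPAC ambiguity codes, and each one is really a pool of concrete sequences. A small sketch that computes a primer's degeneracy and enumerates the pool; the primer string is hypothetical, not one of the study's primers.

```python
from itertools import product
from math import prod

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())

def expand(primer: str) -> list[str]:
    """Enumerate every concrete oligonucleotide in the degenerate pool."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer.upper()))]

primer = "TGYGARGG"  # hypothetical example primer
print(degeneracy(primer), expand(primer))
```

Degeneracy grows multiplicatively (each N quadruples it), which is why such primers are usually designed over the most conserved codons.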
Phylogeny and differentiation of reptilian and amphibian ranaviruses detected in Europe.

PubMed

Stöhr, Anke C; López-Bueno, Alberto; Blahak, Silvia; Caeiro, Maria F; Rosa, Gonçalo M; Alves de Matos, António Pedro; Martel, An; Alejo, Alí; Marschang, Rachel E

2015-01-01

Ranaviruses in amphibians and fish are considered emerging pathogens, and several isolates have been extensively characterized in different studies. Ranaviruses have also been detected in reptiles with increasing frequency, but the role of reptilian hosts is still unclear and only limited sequence data have been provided. In this study, we characterized a number of ranaviruses detected in wild and captive animals in Europe based on sequence data from six genomic regions: the genes for the major capsid protein (MCP), DNA polymerase (DNApol), the ribonucleoside diphosphate reductase alpha and beta subunit-like proteins (RNR-α and -β) and the viral homolog of the alpha subunit of eukaryotic initiation factor 2 (eIF-2α), termed vIF-2α, plus a microsatellite region. A total of ten different isolates from reptiles (tortoises, lizards, and a snake) and four ranaviruses from amphibians (anurans, urodeles) were included in the study. Furthermore, the complete genome sequences of three reptilian isolates were determined, and a new PCR was developed for rapid classification of the different variants of the genomic arrangement. All ranaviruses showed slight variation in the partial nucleotide sequences from the different genomic regions (92.6-100%). Some very similar isolates could be distinguished by the size of the band from the microsatellite region. Three of the lizard isolates had a truncated vIF-2α gene; the other ranaviruses had full-length genes. In the phylogenetic analyses of concatenated sequences from different genes (3223 nt/10287 aa), the reptilian ranaviruses were often more closely related to amphibian ranaviruses than to each other, and most clustered together with previously detected ranaviruses from the same geographic region of origin. Comparative analyses show that among the closely related amphibian-like ranaviruses (ALRVs) described to date, three recently split and independently evolving distinct genetic groups can be distinguished. These findings underline the wide host range of ranaviruses and the emergence of pathogen pollution via the animal trade in ectothermic vertebrates.
Phylogeny and Differentiation of Reptilian and Amphibian Ranaviruses Detected in Europe

PubMed Central

Stöhr, Anke C.; López-Bueno, Alberto; Blahak, Silvia; Caeiro, Maria F.; Rosa, Gonçalo M.; Alves de Matos, António Pedro; Martel, An; Alejo, Alí; Marschang, Rachel E.

2015-01-01

Ranaviruses in amphibians and fish are considered emerging pathogens, and several isolates have been extensively characterized in different studies. Ranaviruses have also been detected in reptiles with increasing frequency, but the role of reptilian hosts is still unclear and only limited sequence data have been provided. In this study, we characterized a number of ranaviruses detected in wild and captive animals in Europe based on sequence data from six genomic regions: the genes for the major capsid protein (MCP), DNA polymerase (DNApol), the ribonucleoside diphosphate reductase alpha and beta subunit-like proteins (RNR-α and -β) and the viral homolog of the alpha subunit of eukaryotic initiation factor 2 (eIF-2α), termed vIF-2α, plus a microsatellite region. A total of ten different isolates from reptiles (tortoises, lizards, and a snake) and four ranaviruses from amphibians (anurans, urodeles) were included in the study. Furthermore, the complete genome sequences of three reptilian isolates were determined, and a new PCR was developed for rapid classification of the different variants of the genomic arrangement. All ranaviruses showed slight variation in the partial nucleotide sequences from the different genomic regions (92.6–100%). Some very similar isolates could be distinguished by the size of the band from the microsatellite region. Three of the lizard isolates had a truncated vIF-2α gene; the other ranaviruses had full-length genes. In the phylogenetic analyses of concatenated sequences from different genes (3223 nt/10287 aa), the reptilian ranaviruses were often more closely related to amphibian ranaviruses than to each other, and most clustered together with previously detected ranaviruses from the same geographic region of origin. Comparative analyses show that among the closely related amphibian-like ranaviruses (ALRVs) described to date, three recently split and independently evolving distinct genetic groups can be distinguished. These findings underline the wide host range of ranaviruses and the emergence of pathogen pollution via the animal trade in ectothermic vertebrates. PMID:25706285
The VirusBanker database uses a Java program to allow flexible searching through Bunyaviridae sequences.

PubMed

Fourment, Mathieu; Gibbs, Mark J

2008-02-05

Viruses of the Bunyaviridae have segmented, negative-stranded RNA genomes, and several of them cause significant disease. Many partial sequences have been obtained from the segments, so GenBank searches return complex results. Sequence databases usually use HTML pages to mediate remote sorting, but this approach can be limiting and may discourage users from exploring a database. The VirusBanker database contains Bunyaviridae sequences and alignments and is presented as two spreadsheets generated by a Java program that interacts with a MySQL database on a server. Sequences are displayed in rows and may be sorted using information displayed in columns, including data on the segment, gene, protein, species, strain, sequence length, terminal sequence, and date and country of isolation. Bunyaviridae sequences and alignments may be downloaded from the second spreadsheet, with titles that the user defines from the columns, or viewed by passing them directly to the sequence editor Jalview. VirusBanker allows large datasets of aligned nucleotide and protein sequences from the Bunyaviridae to be compiled and winnowed rapidly using heuristically formulated criteria.
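The column-driven sorting and winnowing described above can be sketched in a few lines. This is a hypothetical Python illustration of the idea, not VirusBanker's code (which is a Java client backed by MySQL); the field names mirror some of the listed columns and the rows are toy data.

```python
from dataclasses import dataclass

@dataclass
class SeqRecord:
    # Hypothetical columns modelled on those the abstract lists
    name: str
    segment: str   # "S", "M" or "L"
    species: str
    length: int
    year: int
    country: str

def winnow(records, *, segment=None, min_length=0, sort_keys=("species", "length")):
    """Filter rows by segment and minimum length, then sort by the chosen columns."""
    kept = [r for r in records
            if (segment is None or r.segment == segment) and r.length >= min_length]
    return sorted(kept, key=lambda r: tuple(getattr(r, k) for k in sort_keys))

rows = [  # toy rows, not real GenBank entries
    SeqRecord("isolate-1", "S", "species B", 1200, 2001, "country X"),
    SeqRecord("isolate-2", "M", "species A", 3400, 1998, "country Y"),
    SeqRecord("isolate-3", "S", "species A", 450, 2005, "country X"),
    SeqRecord("isolate-4", "S", "species A", 1700, 2003, "country Z"),
]
for r in winnow(rows, segment="S", min_length=500):
    print(r.name, r.species, r.length)
```

Sorting on a tuple of column values gives the same multi-column ordering a spreadsheet provides, with the first key taking priority.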
Mechanistic insights into induction of vitellogenin gene expression by estrogens in Sydney rock oysters, Saccostrea glomerata.

PubMed

Tran, Thi Kim Anh; MacFarlane, Geoff R; Kong, Richard Yuen Chong; O'Connor, Wayne A; Yu, Richard Man Kit

2016-05-01

Marine molluscs, such as oysters, respond to estrogenic compounds with induction of the egg yolk protein precursor vitellogenin (Vtg), which provides a biomarker for estrogenic pollution. Despite this application, the precise molecular mechanism through which estrogens induce molluscan vitellogenesis is unknown. As a first step to address this question, we cloned a gene encoding Vtg from the Sydney rock oyster Saccostrea glomerata (sgVtg). Using primers designed from a partial sgVtg cDNA sequence available in GenBank, a full-length sgVtg cDNA of 8498 bp was obtained by 5'- and 3'-RACE. The open reading frame (ORF) of sgVtg was determined to be 7980 bp, which is substantially longer than those of orthologs from other oyster species. Its deduced protein sequence shares the highest homology with other molluscan Vtgs at the N- and C-terminal regions. The full-length genomic DNA sequence of sgVtg was obtained by genomic PCR and genome walking, targeting the gene body and flanking regions, respectively. The genomic sequence spans 20 kb and consists of 30 exons and 29 introns. Computer analysis identified three closely spaced half-estrogen responsive elements (EREs) in the promoter region and a 210-bp CpG island 62 bp downstream of the transcription start site. Upregulation of sgVtg mRNA expression was observed in the ovaries following in vitro (explant) and in vivo (tank) exposure to 17β-estradiol (E2). Notably, treatment with an estrogen receptor (ER) antagonist in vitro abolished the upregulation, suggesting that transcriptional activation requires an estrogen-dependent receptor. DNA methylation of the 5' CpG island was analysed by bisulfite genomic sequencing of the in vivo exposed ovaries. The CpG island was hypomethylated (0-3% methylcytosines) in both control and E2-exposed oysters, and no significant differential methylation or correlation between methylation and sgVtg expression levels was observed. Overall, the results support the possible involvement of an ERE-containing promoter and an estrogen-activated receptor in estrogen signalling in marine molluscs. Copyright © 2016 Elsevier B.V. All rights reserved.
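Methylation levels such as "0-3% methylcytosines" come from reading bisulfite-converted sequences: unmethylated cytosine is converted and reads as T after PCR, while methylated cytosine stays C. A simplified sketch of that call at CpG sites, assuming a single read already aligned without gaps to the top strand of the reference; real pipelines also handle alignment, both strands and many reads per site.

```python
def cpg_methylation(reference: str, read: str) -> float:
    """Percent methylated CpG cytosines in one bisulfite read (top strand).

    Assumes the read is already aligned to the reference without gaps.
    A C retained at a reference CpG is called methylated; a T is called
    unmethylated; any other base at that position is ignored.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must be aligned to equal length")
    ref, seq = reference.upper(), read.upper()
    methylated = unmethylated = 0
    for i in range(len(ref) - 1):
        if ref[i] == "C" and ref[i + 1] == "G":   # a CpG site in the reference
            if seq[i] == "C":
                methylated += 1
            elif seq[i] == "T":
                unmethylated += 1
    called = methylated + unmethylated
    return 100.0 * methylated / called if called else float("nan")

ref  = "ACGTTCGACGCA"   # toy reference with CpGs at positions 1, 5 and 8
read = "ATGTTCGATGTA"   # toy bisulfite read
print(f"{cpg_methylation(ref, read):.1f}% CpG methylation")
```

The non-CpG cytosine at position 10 reads as T in the toy read, which is what complete bisulfite conversion should produce.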
Duck egg-drop syndrome caused by BYD virus, a new Tembusu-related flavivirus.

PubMed

Su, Jingliang; Li, Shuang; Hu, Xudong; Yu, Xiuling; Wang, Yongyue; Liu, Peipei; Lu, Xishan; Zhang, Guozhong; Hu, Xueying; Liu, Di; Li, Xiaoxia; Su, Wenliang; Lu, Hao; Mok, Ngai Shing; Wang, Peiyi; Wang, Ming; Tian, Kegong; Gao, George F

2011-03-24

Since April 2010, a severe outbreak of viral infection in ducks, with egg drop, declining feed uptake and ovary-oviduct disease, has spread through the major duck-producing regions of China. A new virus, named BYD virus, was isolated in different areas, and a similar disease was reproduced in healthy egg-producing ducks by infection with the isolated virus. The virus was re-isolated from the affected ducks and replicated well in primary duck embryo fibroblasts and Vero cells, causing a cytopathic effect. The virus was identified as an enveloped, positive-stranded RNA virus approximately 55 nm in diameter. Genomic sequencing of the isolate revealed that it is closely related to Tembusu virus (a mosquito-borne Ntaya group flavivirus), with 87-91% nucleotide identity to Tembusu virus in the partial E (envelope) protein sequence, and 72% identity across the entire genome coding sequence with Bagaza virus, the most closely related flavivirus with a completely sequenced genome. Collectively, our systematic studies fulfill Koch's postulates, and the causative agent of the duck egg-drop syndrome occurring in China is therefore a new flavivirus. Flaviviruses are emerging and re-emerging zoonotic pathogens, and BYD virus, which causes severe egg drop, could be disastrous for the duck industry. More importantly, its public health implications should also be evaluated, and its epidemiology closely monitored, given the zoonotic nature of flaviviruses.
Human genetics and genomics a decade after the release of the draft sequence of the human genome.

PubMed

Naidoo, Nasheen; Pawitan, Yudi; Soong, Richie; Cooper, David N; Ku, Chee-Seng

2011-10-01

Substantial progress has been made in human genetics and genomics research over the ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, including the International HapMap Project, and in genotyping technologies. These developments were vital to the emergence of genome-wide association studies for investigating complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and has made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. High-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade.
Human genetics and genomics a decade after the release of the draft sequence of the human genome

PubMed Central

2011-01-01

Substantial progress has been made in human genetics and genomics research over the ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, including the International HapMap Project, and in genotyping technologies. These developments were vital to the emergence of genome-wide association studies for investigating complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and has made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. High-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade. PMID:22155605
Complete genome sequence of Planctomyces brasiliensis type strain (DSM 5305 T), phylogenomic analysis and reclassification of Planctomycetes including the descriptions of Gimesia gen. nov., Planctopirus gen. nov. and Rubinisphaera gen. nov. and emended descriptions of the order Planctomycetales and the family Planctomycetaceae

DOE Office of Scientific and Technical Information (OSTI.GOV)

Scheuner, Carmen; Tindall, Brian J.; Lu, Megan

Planctomyces brasiliensis Schlesner 1990 belongs to the order Planctomycetales, which differs from other bacterial taxa in several distinctive features, such as internal cell compartmentalization, multiplication by budding directly from the spherical, ovoid or pear-shaped mother cell, and a cell wall consisting of a proteinaceous layer rather than a peptidoglycan layer. The first strains of P. brasiliensis, including the type strain IFAM 1448T, were isolated from a water sample from Lagoa Vermelha, a salt pit near Rio de Janeiro, Brazil. This is the second completed genome sequence of a type strain of the genus Planctomyces to be published and the sixth type-strain genome sequence from the family Planctomycetaceae. The 6,006,602-bp genome, with its 4,811 protein-coding and 54 RNA genes, is part of the Genomic Encyclopedia of Bacteria and Archaea project. Our phylogenomic analyses indicate that the classification within the Planctomycetaceae partially conflicts with its evolutionary history, as the positioning of Schlesneria renders the genus Planctomyces paraphyletic. A re-analysis of published fatty-acid measurements also does not support the current arrangement of the two genera. A quantitative comparison of phylogenetic and phenotypic aspects indicates that the three Planctomyces species with type strains available in public culture collections should be placed in separate genera. Thus, the genera Gimesia, Planctopirus and Rubinisphaera are proposed to accommodate P. maris, P. limnophilus and P. brasiliensis, respectively. Pronounced differences between the reported G + C contents of Gemmata obscuriglobus, Singulisphaera acidiphila and Zavarzinella formosa and the G + C contents calculated from their genome sequences call for emendation of their species descriptions. Finally, in addition to other features, the range of G + C values reported for the genera within the Planctomycetaceae indicates that the descriptions of the family and the order should be emended.
Complete genome sequence of Planctomyces brasiliensis type strain (DSM 5305 T), phylogenomic analysis and reclassification of Planctomycetes including the descriptions of Gimesia gen. nov., Planctopirus gen. nov. and Rubinisphaera gen. nov. and emended descriptions of the order Planctomycetales and the family Planctomycetaceae

DOE PAGES

Scheuner, Carmen; Tindall, Brian J.; Lu, Megan; ...

2014-12-08

Planctomyces brasiliensis Schlesner 1990 belongs to the order Planctomycetales, which differs from other bacterial taxa in several distinctive features, such as internal cell compartmentalization, multiplication by budding directly from the spherical, ovoid or pear-shaped mother cell, and a cell wall consisting of a proteinaceous layer rather than a peptidoglycan layer. The first strains of P. brasiliensis, including the type strain IFAM 1448T, were isolated from a water sample from Lagoa Vermelha, a salt pit near Rio de Janeiro, Brazil. This is the second completed genome sequence of a type strain of the genus Planctomyces to be published and the sixth type-strain genome sequence from the family Planctomycetaceae. The 6,006,602-bp genome, with its 4,811 protein-coding and 54 RNA genes, is part of the Genomic Encyclopedia of Bacteria and Archaea project. Our phylogenomic analyses indicate that the classification within the Planctomycetaceae partially conflicts with its evolutionary history, as the positioning of Schlesneria renders the genus Planctomyces paraphyletic. A re-analysis of published fatty-acid measurements also does not support the current arrangement of the two genera. A quantitative comparison of phylogenetic and phenotypic aspects indicates that the three Planctomyces species with type strains available in public culture collections should be placed in separate genera. Thus, the genera Gimesia, Planctopirus and Rubinisphaera are proposed to accommodate P. maris, P. limnophilus and P. brasiliensis, respectively. Pronounced differences between the reported G + C contents of Gemmata obscuriglobus, Singulisphaera acidiphila and Zavarzinella formosa and the G + C contents calculated from their genome sequences call for emendation of their species descriptions. Finally, in addition to other features, the range of G + C values reported for the genera within the Planctomycetaceae indicates that the descriptions of the family and the order should be emended.
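The emendations above rest on comparing reported G + C values with values calculated directly from genome sequences, which amounts to counting bases. A minimal sketch, assuming ambiguous bases such as N are excluded from the denominator (a design choice; including them would lower the value).

```python
def gc_content(seq: str) -> float:
    """G + C content in mol% over unambiguous bases (A, C, G, T) only."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * gc / (gc + at)

print(f"{gc_content('ATGCGCGNNA'):.1f} mol%")  # toy sequence with two ambiguous bases
```

For a complete genome the same count runs over all contigs or replicons; the result is exact for the assembled sequence, unlike older wet-lab estimates.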
Genome sequence of Phytophthora ramorum: implications for management

Treesearch

Brett Tyler; Sucheta Tripathy; Nik Grunwald; Kurt Lamour; Kelly Ivors; Matteo Garbelotto; Daniel Rokhsar; Nik Putnam; Igor Grigoriev; Jeffrey Boore

2006-01-01

A draft genome sequence has been determined for Phytophthora ramorum, together with a draft sequence of the soybean pathogen Phytophthora sojae. The P. ramorum genome was sequenced to a depth of 7-fold coverage, while the P. sojae genome was sequenced to a depth of 9-fold coverage. The genome...

Genome Science: A Video Tour of the Washington University Genome Sequencing Center for High School and Undergraduate Students

ERIC Educational Resources Information Center

Flowers, Susan K.; Easter, Carla; Holmes, Andrea; Cohen, Brian; Bednarski, April E.; Mardis, Elaine R.; Wilson, Richard K.; Elgin, Sarah C. R.

2005-01-01

Sequencing of the human genome has ushered in a new era of biology. The technologies developed to facilitate the sequencing of the human genome are now being applied to the sequencing of other genomes. In 2004, a partnership was formed between Washington University School of Medicine Genome Sequencing Center's Outreach Program and Washington…
Exaptation of Bornavirus-Like Nucleoprotein Elements in Afrotherians

PubMed Central

Kobayashi, Yuki; Horie, Masayuki; Nakano, Ayumi; Murata, Koichi; Itou, Takuya; Suzuki, Yoshiyuki

2016-01-01

Endogenous bornavirus-like nucleoprotein elements (EBLNs), nucleotide sequence elements derived from the nucleoprotein gene of ancient bornavirus-like viruses, have been identified in many animal genomes. Here we present evidence that EBLNs encode functional proteins in their host. Some afrotherian EBLNs were observed to have been maintained under negative selection for more than 83.3 million years. Splice variants were expressed from the genomic loci of EBLNs in elephant, and some were translated into proteins. The EBLN proteins appeared to localize to the rough endoplasmic reticulum in African elephant cells, in contrast to the nuclear localization of bornavirus N. These observations suggest that afrotherian EBLNs have acquired a novel function in their host. Interestingly, genomic sequences of the first exon and its flanking regions in these EBLN loci were homologous to those of transmembrane protein 106B (TMEM106B). The region upstream of the first exon in the EBLN loci exhibited promoter activity, suggesting that these EBLNs gained the ability to be transcribed in the host cell by capturing a partial duplicate of TMEM106B. In conclusion, our results strongly support the exaptation of EBLNs to encode host proteins in afrotherians. PMID:27518265
The kinetoplast DNA of the Australian trypanosome, Trypanosoma copemani, shares features with Trypanosoma cruzi and Trypanosoma lewisi.

PubMed

Botero, Adriana; Kapeller, Irit; Cooper, Crystal; Clode, Peta L; Shlomai, Joseph; Thompson, R C Andrew

2018-05-17

Kinetoplast DNA (kDNA) is the mitochondrial genome of trypanosomatids. It consists of a few dozen maxicircles and several thousand minicircles, all topologically catenated into a two-dimensional DNA network. Minicircles are heterogeneous in size and sequence among species. They contain one or several conserved regions that include three highly conserved sequence blocks (CSBs). CSB-1 (a 10-bp sequence) and CSB-2 (an 8-bp sequence) show lower interspecies homology, while CSB-3 (a 12-bp sequence), also called the Universal Minicircle Sequence (UMS), is conserved in most trypanosomatids. The UMS is located at the replication origin of the minicircles and is the binding site for the UMS-binding protein, which is involved in trypanosomatid survival and virulence. Here, we describe the structure and organisation of the kDNA of Trypanosoma copemani, a parasite that has been shown to infect mammalian cells and has been associated with the drastic decline of an endangered Australian marsupial, the woylie (Bettongia penicillata). Deep genomic sequencing showed that T. copemani has two classes of minicircles, which share sequence identity and conserved-sequence-block organisation with those of Trypanosoma cruzi and Trypanosoma lewisi. A 19,257-bp partial region of the T. copemani maxicircle containing the entire coding region was obtained. Comparative analysis showed that the entire T. copemani maxicircle coding region shares 71.05% and 71.28% identity with the coding regions of T. cruzi and T. lewisi, respectively. The shared features in maxicircle/minicircle organisation and sequence between T. copemani and T. cruzi/T. lewisi suggest similarities in their process of kDNA replication and are significant for understanding the evolution of Australian trypanosomes. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
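Identity figures such as 71.05% depend on both the alignment and the gap convention, neither of which the abstract specifies. A minimal sketch of one common convention, assuming the two sequences are already aligned and excluding gapped columns from the denominator; counting gaps as mismatches would give a lower value.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped, so the
    denominator is the number of ungapped aligned columns.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

print(percent_identity("ACGT-ACGTA", "ACCTTACG-A"))  # toy alignment
```

Here 7 of the 8 ungapped columns match; counting the two gapped columns as mismatches instead would give 70%.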
Accumulation of point mutations and reassortment of genomic RNA segments are involved in the microevolution of Puumala hantavirus in a bank vole (Myodes glareolus) population.

PubMed

Razzauti, Maria; Plyusnina, Angelina; Henttonen, Heikki; Plyusnin, Alexander

2008-07-01

The genetic diversity of Puumala hantavirus (PUUV) was studied in a local population of its natural host, the bank vole (Myodes glareolus). The trapping area (2.5 × 2.5 km) at Konnevesi, Central Finland, included 14 trapping sites at least 500 m apart; altogether, 147 voles were captured in May and October 2005. Partial sequences of the S, M and L viral genome segments were recovered from 40 animals. Seven, 12 and 17 variants were detected for the S, M and L sequences, respectively; these represent new wild-type PUUV strains that belong to the Finnish genetic lineage. The genetic diversity of PUUV strains from Konnevesi was 0.2-4.9% for the S segment, 0.2-4.8% for the M segment and 0.2-9.7% for the L segment. Most nucleotide substitutions were synonymous and most deduced amino acid substitutions were conservative, probably owing to strong stabilizing selection at the protein level. Based on both sequence markers and phylogenetic clustering, the S, M and L sequences could be assigned to two groups, 'A' and 'B'. Notably, not all bank voles carried S, M and L sequences belonging to the same group, i.e. S(A)M(A)L(A) or S(B)M(B)L(B). A substantial proportion (8/40, 20%) of the newly characterized PUUV strains had reassortant genomes such as S(B)M(A)L(A), S(A)M(B)L(B) or S(B)M(A)L(B). These results suggest that at least some PUUV reassortants are viable and can survive in the presence of their parental strains.
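The reassortant criterion above is simple to state: a genome is reassortant when its S, M and L segments do not all belong to the same group. A sketch that applies it to genotype labels; the strain list is toy data, not the study's per-animal results.

```python
def is_reassortant(genotype: dict[str, str]) -> bool:
    """True if the S, M and L segments fall in more than one group."""
    return len({genotype[seg] for seg in ("S", "M", "L")}) > 1

def label(genotype: dict[str, str]) -> str:
    """Format a genotype in the S(x)M(y)L(z) notation used above."""
    return "".join(f"{seg}({genotype[seg]})" for seg in ("S", "M", "L"))

strains = [  # toy data
    {"S": "A", "M": "A", "L": "A"},
    {"S": "B", "M": "B", "L": "B"},
    {"S": "B", "M": "A", "L": "A"},
    {"S": "A", "M": "B", "L": "B"},
    {"S": "B", "M": "A", "L": "B"},
]
reassortants = [label(g) for g in strains if is_reassortant(g)]
print(reassortants, f"{100 * len(reassortants) / len(strains):.0f}%")
```

With two groups and three segments there are 2^3 = 8 possible combinations, of which only S(A)M(A)L(A) and S(B)M(B)L(B) are non-reassortant.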
Occurrence of and Sequence Variation among F-Specific RNA Bacteriophage Subgroups in Feces and Wastewater of Urban and Animal Origins

PubMed Central

Hartard, C.; Rivet, R.; Banas, S.

2015-01-01

F-specific RNA bacteriophages (FRNAPH) have been widely studied as tools for evaluating fecal or viral pollution in water. It has also been proposed that they can be used to differentiate human from animal fecal contamination. While FRNAPH subgroups I (FRNAPH-I) and IV (FRNAPH-IV) are often associated with animal pollution, FRNAPH-II and -III prevail in human wastewater. However, this distribution is not absolute, and differing survival rates among the subgroups can lead to misinterpretation of the original distribution. In this context, we studied FRNAPH distribution in urban wastewater and in animal feces and wastewater. To increase specificity, we partially sequenced the genomes of phages of urban and animal origin. Genome persistence and infectivity were also studied for each subgroup, both over time in wastewater and during treatment. FRNAPH-I genome sequences did not form urban- or animal-specific clusters that would allow the development of molecular tools for differentiation. FRNAPH-I phages were the most resistant and may therefore be used as fecal or viral indicators. The low prevalence and low sequence variability of FRNAPH-II in animal stools, combined with specific clusters formed by urban strains, allowed differentiation between urban and animal pollution using a specific reverse transcription-PCR (RT-PCR) method. This subgroup's resistance over time was comparable to that of FRNAPH-I, but its surface properties led to higher elimination rates during activated-sludge treatment. The low sequence variability of FRNAPH-III in animal wastewater and the formation of specific clusters by urban strains also allowed differentiation using a specific RT-PCR method. However, its low resistance restricts its use to detecting recent urban pollution. FRNAPH-IV was too rare to be used. PMID:26162878
Occurrence of and Sequence Variation among F-Specific RNA Bacteriophage Subgroups in Feces and Wastewater of Urban and Animal Origins.

PubMed

Hartard, C; Rivet, R; Banas, S; Gantzer, C

2015-09-01

F-specific RNA bacteriophages (FRNAPH) have been widely studied as tools for evaluating fecal or viral pollution in water. It has also been proposed that they can be used to differentiate human from animal fecal contamination. While FRNAPH subgroup I (FRNAPH-I) and FRNAPH-IV are often associated with animal pollution, FRNAPH-II and -III prevail in human wastewater. However, this distribution is not absolute, and variable survival rates in these subgroups lead to misinterpretation of the original distribution. In this context, we studied FRNAPH distribution in urban wastewater and animal feces/wastewater. To increase the specificity, we partially sequenced the genomes of phages of urban and animal origins. The persistence of the genomes and infectivity were also studied, over time in wastewater and during treatment, for each subgroup. FRNAPH-I genome sequences did not show any specific urban or animal clusters to allow development of molecular tools for differentiation. They were the most resistant and as such may be used as fecal or viral indicators. FRNAPH-II's low prevalence and low sequence variability in animal stools, combined with specific clusters formed by urban strains, allowed differentiation between urban and animal pollution by using a specific reverse transcription-PCR (RT-PCR) method. The subgroup's resistance over time was comparable to that of FRNAPH-I, but its surface properties allowed higher elimination rates during activated-sludge treatment. FRNAPH-III's low sequence variability in animal wastewater and specific cluster formation by urban strains also allowed differentiation by using a specific RT-PCR method. Nevertheless, its low resistance restricted it to being used only for recent urban pollution detection. FRNAPH-IV was too rare to be used. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Origin of the Y genome in Elymus and its relationship to other genomes in Triticeae based on evidence from elongation factor G (EF-G) gene sequences.

PubMed

Sun, Genlou; Komatsuda, Takao

2010-08-01

It is well known that Elymus arose through hybridization between representatives of different genera. Cytogenetic analyses show that all its members include the St genome in combination with one or more of four other genomes, the H, Y, P, and W genomes. The origins of the H, P, and W genomes are known, but that of the Y genome is not. We analyzed the single-copy nuclear gene coding for elongation factor G (EF-G) from 28 accessions of polyploid Elymus species and 45 accessions of diploid Triticeae species in order to investigate the origin of the Y genome and its relationship to other genomes in the tribe Triticeae. Sequence comparisons among the St, H, Y, P, W, and E genomes detected genome-specific polymorphisms at 66 nucleotide positions. The St and Y genomes are relatively dissimilar. The phylogeny of the Y genome sequences was investigated for the first time. These sequences were most similar to the W genome sequences. The Y genome sequences were placed in two different groups. These two groups were included in an unresolved clade that included the W and E sequences as well as sequences from many annual species. The H genome sequences were in a clade with the F, P, and Ns genome sequences as sister groups. These two clades were more closely related to each other and to the L and Xp genomes than they were to the St genome sequences. These data support the hypothesis that the Y genome evolved in a diploid species and has a different origin from the St genome. Copyright 2010 Elsevier Inc. All rights reserved.
Characterization of a prototype strain of hepatitis E virus.

PubMed

Tsarev, S A; Emerson, S U; Reyes, G R; Tsareva, T S; Legters, L J; Malik, I A; Iqbal, M; Purcell, R H

1992-01-15

A strain of hepatitis E virus (SAR-55) implicated in an epidemic of enterically transmitted non-A, non-B hepatitis, now called hepatitis E, was characterized extensively. Six cynomolgus monkeys (Macaca fascicularis) were infected with a strain of hepatitis E virus from Pakistan. Reverse transcription-polymerase chain reaction was used to determine the pattern of virus shedding in feces, bile, and serum relative to hepatitis and induction of specific antibodies. Virtually the entire genome of SAR-55 (7195 nucleotides) was sequenced. Comparison of the sequence of SAR-55 with that of a Burmese strain revealed a high level of homology except for one region encoding 100 amino acids of a putative nonstructural polyprotein. Partial sequencing of a third hepatitis E virus isolate, from Kirgizia, identified this region as hypervariable.
Company profile: Complete Genomics Inc.

PubMed

Reid, Clifford

2011-02-01

Complete Genomics Inc. is a life sciences company that focuses on complete human genome sequencing. It is taking a completely different approach to DNA sequencing from that of other companies in the industry. Rather than building a general-purpose platform for sequencing all organisms and all applications, it has focused on a single application - complete human genome sequencing. The company's Complete Genomics Analysis Platform (CGA™ Platform) comprises an integrated package of biochemistry, instrumentation and software that sequences human genomes at the highest quality, lowest cost and largest scale available. Complete Genomics offers a turnkey service that enables customers to outsource their human genome sequencing to the company's genome sequencing center in Mountain View, CA, USA. Customers send in their DNA samples, the company does all the library preparation, DNA sequencing, assembly and variant analysis, and customers receive research-ready data that they can use for biological discovery.
Origin of porcine circovirus type 2 (PCV2) from swine affected by PCV2-associated diseases in Croatia.

PubMed

Novosel, D; Tuboly, T; Csagola, A; Lorincz, M; Cubric-Curik, V; Jungic, A; Curik, I; Segalés, J; Cortey, M; Lipej, Z

2014-04-26

Porcine circovirus type 2 (PCV2) causes some of the most significant economic losses in pig production. Several multisystemic syndromes have been attributed to PCV2 infection, which are known as PCV2-associated diseases (PCVDs). This study investigated the origin and evolution of PCV2 sequences in domestic pigs and wild boars affected by PCVDs in Croatia. Viral sequences were recovered from three wild boars diagnosed with PCV2-systemic disease (PCV2-SD), 63 fetuses positive for PCV2 DNA as determined by PCR, 14 domestic pigs affected with PCV2-SD (displaying severe interstitial nephritis) and five domestic pigs with proliferative and necrotising pneumonia. Seventeen complete PCV2 genomes were recovered. Phylogenetic and evolutionary analyses based on median-joining phylogenetic networks, amino acid alignments and principal coordinate analysis were performed using complete genomes, as well as complete and partial ORF sequences for ORF1 and ORF2. Two of the 17 PCV2 sequences belonged to PCV2a, 14 to PCV2b and one was unclustered. PCV2b was the predominant genotype in Croatia and has been linked to international trade as a route of introduction. No correlation between particular viral strains and PCVDs was established.
Phylogenetic and microsatellite markers for Tulasnella (Tulasnellaceae) mycorrhizal fungi associated with Australian orchids

PubMed Central

Ruibal, Monica P.; Peakall, Rod; Smith, Leon M.; Linde, Celeste C.

2013-01-01

• Premise of the study: Phylogenetic and microsatellite markers were developed for Tulasnella mycorrhizal fungi to investigate fungal species identity and diversity. These markers will be useful in future studies investigating the phylogenetic relationship of the fungal symbionts, specificity of orchid–mycorrhizal associations, and the role of mycorrhizae in orchid speciation within several orchid genera. • Methods and Results: We generated partial genome sequences of two Tulasnella symbionts originating from Chiloglottis and Drakaea orchid species with 454 genome sequencing. Cross-genus transferability across mycorrhizal symbionts associated with multiple genera of Australian orchids (Arthrochilus, Chiloglottis, Drakaea, and Paracaleana) was found for seven phylogenetic loci. Five loci showed cross-transferability to Tulasnella from other orchid genera, and two to Sebacina. Furthermore, 11 polymorphic microsatellite loci were developed for Tulasnella from Chiloglottis. • Conclusions: Highly informative markers were obtained, allowing investigation of mycorrhizal diversity of Tulasnellaceae associated with a wide variety of terrestrial orchids in Australia and potentially worldwide. PMID:25202528
DUK - A Fast and Efficient Kmer Based Sequence Matching Tool

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Mingkun; Copeland, Alex; Han, James

2011-03-21

A new tool, DUK, was developed to perform sequence matching. Matching determines whether a query sequence partially or totally matches a given set of reference sequences. Matching is similar to alignment; indeed, many traditional analysis tasks, such as contaminant removal, use alignment tools. For matching, however, there is no need to know which bases of a query sequence match which positions of a reference sequence; it is only necessary to know whether a match exists. This subtle difference can make matching much faster than alignment. DUK is accurate, versatile, fast, and memory-efficient. It uses a k-mer hashing method to index reference sequences and a Poisson model to calculate p-values. DUK is carefully implemented in C++ with an object-oriented design, and the resulting classes can also be used to develop other tools quickly. DUK has been widely used at JGI for a wide range of applications, such as contaminant removal, organelle genome separation, and assembly refinement. Many real applications and simulated datasets demonstrate its power.
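The matching idea can be illustrated with a minimal Python sketch, assuming a plain set of k-mers as the hash index and a simple Poisson null in which each query k-mer hits the index by chance with probability |index|/4^k. The function names (`build_index`, `match`) and this random-hit model are hypothetical; DUK itself is written in C++ and its exact statistics may differ. Overlapping k-mers are not independent, so the p-value here is only a rough guide.

```python
import math


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references, k):
    """Hash the k-mers of all reference sequences into one set (hypothetical index)."""
    index = set()
    for ref in references:
        index.update(kmers(ref, k))
    return index


def poisson_sf(x, lam):
    """Upper tail P(X >= x) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(x))
    return max(0.0, 1.0 - cdf)


def match(query, index, k):
    """Return (hits, p_value): the number of query k-mers found in the index and
    the chance of at least that many hits for an unrelated query (assumed model)."""
    hits = sum(1 for km in kmers(query, k) if km in index)
    n = max(len(query) - k + 1, 0)
    p_random = min(1.0, len(index) / 4 ** k)  # assumed chance hit rate per k-mer
    return hits, poisson_sf(hits, n * p_random)


idx = build_index(["ACGTACGTGGCCTTAAGG"], k=5)
print(match("ACGTACGTGG", idx, k=5))  # 6 hits with a very small p-value
print(match("TTTTTTTTTT", idx, k=5))  # (0, 1.0)
```

No base-to-position mapping is ever computed: a query is judged only by how many of its k-mers are present, which is the shortcut that separates matching from alignment.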
Curated eutherian third party data gene data sets.

PubMed

Premzl, Marko

2016-03-01

Freely available eutherian genomic sequence data sets have advanced the field of genomics. Future revisions of gene data sets are expected, owing to the incompleteness of public eutherian genomic sequence assemblies and potential genomic sequence errors. The eutherian comparative genomic analysis protocol was proposed as a safeguard against potential genomic sequence errors in public eutherian genomic sequences. The protocol was applied to update 7 major eutherian gene data sets, including 812 complete coding sequences deposited in the European Nucleotide Archive as curated third party data gene data sets.
Kinetic Induction of Oat Shoot Pulvinus Invertase mRNA by Gravistimulation and Partial cDNA Cloning by the Polymerase Chain Reaction

NASA Technical Reports Server (NTRS)

Wu, Liu-Lai; Song, Il; Karuppiah, Nadarajah; Kaufman, Peter B.

1993-01-01

An asymmetric (top vs. bottom halves of pulvini) induction of invertase mRNA by gravistimulation was analyzed in oat shoot pulvini. Total RNA and poly(A)(+) RNA isolated from oat pulvini, together with two oligonucleotide primers corresponding to two conserved amino acid sequences (NDPNG and WECPD) found in invertases from other species, were used for the polymerase chain reaction (PCR). A partial-length cDNA (550 bp) was obtained and characterized. It showed 62% nucleotide sequence homology and 58% deduced amino acid sequence homology with carrot cell-wall beta-fructosidase. Northern blot analysis showed a clearly transient induction of invertase mRNA by gravistimulation in the oat pulvinus system. The mRNA was rapidly induced to a maximum level at 1 hour after gravistimulation and gradually decreased afterwards. The mRNA level in the bottom half of the oat pulvinus was significantly higher than that in the top half. The kinetic induction of invertase mRNA was consistent with the transient accumulation of invertase activity during the graviresponse of the pulvinus. This indicates that the expression of the invertase gene(s) could be regulated by gravistimulation at the transcriptional level. Southern blot analysis showed that two to three genomic DNA fragments hybridized with the partial-length invertase cDNA.
Complete genome sequence of a coxsackievirus B3 recombinant isolated from an aseptic meningitis outbreak in eastern China.

PubMed

Zhang, Wenqiang; Lin, Xiaojuan; Jiang, Ping; Tao, Zexin; Liu, Xiaolin; Ji, Feng; Wang, Tongzhan; Wang, Suting; Lv, Hui; Xu, Aiqiang; Wang, Haiyan

2016-08-01

Coxsackievirus B3 (CV-B3) has frequently been associated with aseptic meningitis outbreaks in China. To identify sequence motifs related to aseptic meningitis and to construct an infectious clone, the genome sequence of 08TC170, a representative strain isolated from cerebrospinal fluid (CSF) samples from an outbreak in Shandong in 2008, was determined, and the coding regions for P1-P3 and VP1 were aligned. The first 21 and last 20 nucleotides were "TTAAAACAGCCTGTGGGTTGT" and "ATTCTCCGCATTCGGTGCGG", respectively. The whole genome consisted of 7401 nucleotides, sharing 80.8 % identity with the prototype strain Nancy and low sequence similarity with members of clusters A-C. In contrast, 08TC170 showed high sequence similarity to members of cluster D. An especially high level of sequence identity (≥97.7 %) was found within a branch formed by 08TC170 and four Chinese strains that clustered together in all of the P1-P3 phylogenetic trees. In addition, 08TC170 was closely related to the Hong Kong strain 26362/08 in VP1. Similarity plot analysis showed that 08TC170 was most similar to the Chinese CV-B3 strain SSM in P1 and part of the P2 coding region, but to the CV-B5 or E-6 strain in 2C and the following regions. A T277A mutation was found in 08TC170 and other strains isolated in 2008-2010, but not in strains isolated before 2008; the strains carrying it had high sequence similarity and formed cluster A277. The results suggest that 08TC170 was the product of both intertypic recombination and point mutation, whose effects on viral neurovirulence will be investigated in a further study. The high homology between 08TC170 and other strains reveals their co-circulation in mainland China and Hong Kong and indicates that further surveillance is needed.
Approaches for in silico finishing of microbial genome sequences

PubMed Central

Kremer, Frederico Schmitt; McBride, Alan John Alexander; Pinto, Luciano da Silva

2017-01-01

The introduction of next-generation sequencing (NGS) had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations of short-read sequencing platforms, most of these newly sequenced genomes remained as “drafts”, incomplete representations of the whole genetic content. Previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even a bacterial one, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases) tools that are available to facilitate genome finishing. PMID:28898352
Innovative assembly strategy contributes to understanding the evolution and conservation genetics of the endangered Solenodon paradoxus from the island of Hispaniola.

PubMed

Grigorev, Kirill; Kliver, Sergey; Dobrynin, Pavel; Komissarov, Aleksey; Wolfsberger, Walter; Krasheninnikova, Ksenia; Afanador-Hernández, Yashira M; Brandt, Adam L; Paulino, Liz A; Carreras, Rosanna; Rodríguez, Luis E; Núñez, Adrell; Brandt, Jessica R; Silva, Filipe; Hernández-Martich, J David; Majeske, Audrey J; Antunes, Agostinho; Roca, Alfred L; O'Brien, Stephen J; Martínez-Cruzado, Juan Carlos; Oleksyk, Taras K

2018-03-16

Solenodons are insectivores living in Hispaniola and Cuba that form an isolated branch in the tree of placental mammals, highly divergent from other eulipotyphlan insectivores. The history, unique biology and adaptations of these enigmatic venomous species could be illuminated by the availability of genome data, but a whole genome assembly for solenodons has not been previously performed, partially due to the difficulty in obtaining samples from the field. Island isolation and reduced numbers have likely resulted in high homozygosity within the Hispaniolan solenodon (Solenodon paradoxus); thus, we tested the performance of several assembly strategies on the genome of this genetically impoverished species. The string-graph-based assembly strategy seemed a better choice than the conventional de Bruijn graph approach, owing to the high levels of homozygosity, which are often a hallmark of endemic or endangered species. A consensus reference genome was assembled from sequences of five individuals from the southern subspecies (S. p. woodi). We also obtained sequence from one sample of the northern subspecies (S. p. paradoxus). The resulting genome assemblies were compared to each other and annotated for genes, with a specific emphasis on venom genes, repeats, variable microsatellite loci and other genomic variants. Phylogenetic positioning and selection signatures were inferred based on 4,416 single-copy orthologs from 10 other mammals. We estimated that solenodons diverged from other extant mammals 73.6 Mya. Patterns of SNP variation allowed us to infer population demography, which supported a subspecies split within the Hispaniolan solenodon at least 300 Kya.
Mediator binding to UASs is broadly uncoupled from transcription and cooperative with TFIID recruitment to promoters.

PubMed

Grünberg, Sebastian; Henikoff, Steven; Hahn, Steven; Zentner, Gabriel E

2016-11-15

Mediator is a conserved, essential transcriptional coactivator complex, but its in vivo functions have remained unclear due to conflicting data regarding its genome-wide binding pattern obtained by genome-wide ChIP. Here, we used ChEC-seq, a method orthogonal to ChIP, to generate a high-resolution map of Mediator binding to the yeast genome. We find that Mediator associates with upstream activating sequences (UASs) rather than the core promoter or gene body under all conditions tested. Mediator occupancy is surprisingly correlated with transcription levels at only a small fraction of genes. Using the same approach to map TFIID, we find that TFIID is associated with both TFIID- and SAGA-dependent genes and that TFIID and Mediator occupancy is cooperative. Our results clarify Mediator recruitment and binding to the genome, showing that Mediator binding to UASs is widespread, partially uncoupled from transcription, and mediated in part by TFIID. © 2016 The Authors.
How Much Do rRNA Gene Surveys Underestimate Extant Bacterial Diversity?

PubMed

Rodriguez-R, Luis M; Castro, Juan C; Kyrpides, Nikos C; Cole, James R; Tiedje, James M; Konstantinidis, Konstantinos T

2018-03-15

The most common practice in studying and cataloguing prokaryotic diversity involves the grouping of sequences into operational taxonomic units (OTUs) at the 97% 16S rRNA gene sequence identity level, often using partial gene sequences, such as PCR-generated amplicons. Due to the high sequence conservation of rRNA genes, organisms belonging to closely related yet distinct species may be grouped under the same OTU. However, it remains unclear how much diversity has been underestimated by this practice. To address this question, we compared the OTUs of genomes defined at the 97% or 98.5% 16S rRNA gene identity level against OTUs of the same genomes defined at the 95% whole-genome average nucleotide identity (ANI), which is a much more accurate proxy for species. Our results show that, when compared to 95% ANI, OTUs resulting from a 98.5% 16S rRNA gene identity cutoff are more accurate than those from a 97% cutoff (90.5% versus 89.9% accuracy) but indistinguishable from any other threshold in the 98.29 to 98.78% range. Even with the more stringent thresholds, however, the 16S rRNA gene-based approach commonly underestimates the number of OTUs by ∼12%, on average, compared to the ANI-based approach (∼14% underestimation when using the 97% identity threshold). More importantly, the degree of underestimation can become 50% or more for certain taxa, such as the genera Pseudomonas, Burkholderia, Escherichia, Campylobacter, and Citrobacter. These results provide a quantitative view of the degree of underestimation of extant prokaryotic diversity by 16S rRNA gene-defined OTUs and suggest that genomic resolution is often necessary. IMPORTANCE Species diversity is one of the most fundamental pieces of information for community ecology and conservation biology. Therefore, accurate proxies for what constitutes a species, or the unit of diversity, are cornerstones of a large set of microbial ecology and diversity studies.
The most common proxies currently used rely on the clustering of 16S rRNA gene sequences at some threshold of nucleotide identity, typically 97% or 98.5%. Here, we explore how well this strategy reflects the more accurate whole-genome-based proxies and determine the frequency with which the high conservation of 16S rRNA sequences masks substantial species-level diversity. Copyright © 2018 American Society for Microbiology.
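To make the threshold effect concrete, the following toy Python sketch clusters sequences greedily around centroids at a chosen 16S identity cutoff. It assumes the sequences are already aligned to equal length and scores identity as the fraction of matching positions; the 200-nt fragment and its variants are made up for illustration. This is not the study's method, which compared 16S-based OTUs against 95% ANI groupings of whole genomes.

```python
def identity(a, b):
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold):
    """Greedy centroid clustering: each sequence joins the first centroid it matches
    at >= threshold identity; otherwise it founds a new OTU."""
    centroids, otus = [], []
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                otus[i].append(s)
                break
        else:
            centroids.append(s)
            otus.append([s])
    return otus


_SWAP = {"A": "C", "C": "G", "G": "T", "T": "A"}


def mutate(seq, positions):
    """Return a copy of seq with one substitution at each given position."""
    chars = list(seq)
    for p in positions:
        chars[p] = _SWAP[chars[p]]
    return "".join(chars)


base = "ACGT" * 50                        # hypothetical 200-nt aligned fragment
seq_b = mutate(base, [10, 60, 110, 160])  # 98.0% identical to base
seq_c = mutate(base, [30, 130])           # 99.0% identical to base
for t in (0.97, 0.985):
    print(t, len(greedy_otus([base, seq_b, seq_c], t)))  # 1 OTU at 0.97, 2 at 0.985
```

The variant that is 98.0% identical is lumped with the others at the 97% cutoff but split off at 98.5%, which is the kind of hidden diversity the abstract quantifies. Real OTU pickers usually sort input by abundance first, which changes which sequences become centroids.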

Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence

PubMed Central

2011-01-01

Background Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an abundance of repeated sequences, and polyploidy present challenges to genome-wide SNP discovery using next-generation sequencing (NGS) of total genomic DNA by making alignment and clustering of short reads generated by the NGS platforms difficult, particularly in the absence of a reference genome sequence. Results An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions from repetitive sequences and sequences shared by paralogous genes. Multiple genome equivalents of shotgun reads of another genotype generated with SOLiD or Solexa are then mapped to the annotated Roche 454 reads to identify putative SNPs. A pipeline program package, AGSNP, was developed and used for genome-wide SNP discovery in Aegilops tauschii, the diploid source of the wheat D genome, which has a genome size of 4.02 Gb, of which 90% is repetitive sequence. Genomic DNA of Ae. tauschii accession AL8/78 was sequenced with the Roche 454 NGS platform. Genomic DNA and cDNA of Ae. tauschii accession AS75 were sequenced primarily with SOLiD, although some Solexa and Roche 454 genomic sequences were also generated. A total of 195,631 putative SNPs were discovered in gene sequences, 155,580 putative SNPs were discovered in uncharacterized single-copy regions, and another 145,907 putative SNPs were discovered in repeat junctions. These SNPs were dispersed across the entire Ae. tauschii genome.
To assess the false positive SNP discovery rate, DNA containing putative SNPs was amplified by PCR from AL8/78 and AS75 and resequenced with the ABI 3730 xl. In a sample of 302 randomly selected putative SNPs, 84.0% in gene regions, 88.0% in repeat junctions, and 81.3% in uncharacterized regions were validated. Conclusion An annotation-based genome-wide SNP discovery pipeline for NGS platforms was developed. The pipeline is suitable for SNP discovery in genomic libraries of complex genomes and does not require a reference genome sequence. The pipeline is applicable to all current NGS platforms, provided that at least one such platform generates relatively long reads. The pipeline package, AGSNP, and the discovered 497,118 Ae. tauschii SNPs can be accessed at (http://avena.pw.usda.gov/wheatD/agsnp.shtml). PMID:21266061
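The mapping step can be illustrated with a toy pileup caller in Python. It assumes reads of the second genotype have already been placed, without gaps, on one annotated single-copy 454 read, and it reports a putative SNP where at least `min_depth` reads agree on the same non-reference base. The function name, the depth cutoff, and the unanimous-agreement rule are assumptions for illustration, not AGSNP's actual criteria.

```python
from collections import Counter


def call_snps(reference, placed_reads, min_depth=3):
    """Toy SNP caller.

    placed_reads: (offset, read) pairs giving ungapped placements on reference.
    Reports (position, ref_base, alt_base) where at least min_depth reads cover
    the position and all of them show the same non-reference base.
    """
    pileup = [Counter() for _ in reference]
    for offset, read in placed_reads:
        for i, base in enumerate(read):
            pos = offset + i
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1
    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth >= min_depth and len(counts) == 1:
            (alt,) = counts  # the single base observed at this position
            if alt != reference[pos]:
                snps.append((pos, reference[pos], alt))
    return snps


ref = "ACGTACGTAC"  # hypothetical annotated single-copy 454 read
reads = [(0, "ACGTACT"), (3, "TACTTAC"), (4, "ACTTAC")]  # hypothetical second genotype
print(call_snps(ref, reads))  # [(6, 'G', 'T')]
```

Restricting calls to reads annotated as single-copy is what keeps paralogous and repeated copies from producing false SNPs; the toy caller simply assumes that filtering has already been done.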
The Nerium oleander aphid Aphis nerii is tolerant to a local isolate of Aphid lethal paralysis virus (ALPV).

PubMed

Dombrovsky, Aviv; Luria, Neta

2013-04-01

In a survey conducted during 2011, a local strain of Aphid lethal paralysis virus (ALPV) was identified and isolated from a wild population of Aphis nerii aphids living on Nerium oleander plants in northern Israel. The new strain was tentatively named ALPV-An. RNA extracted from the viral particles allowed the amplification and determination of the complete genome sequence. The virus genome comprises 9835 nucleotides. In a BLAST search analysis, the ALPV-An sequence showed 89 % nucleotide sequence identity with the whole genome of a South African ALPV and 96 and 94 % amino acid sequence identity with the ORF1 and ORF2 of that strain, respectively. In preliminary experiments, spray-applied, purified ALPV virions were highly pathogenic to the green peach aphid Myzus persicae; 95 % mortality was recorded 4 days post-infection. These preliminary results demonstrate the potential of ALPV for use as a biological agent for aphid control. Surprisingly, no visible ALPV pathogenic effects, such as morphological changes or paralysis, were observed in the A. nerii aphids infected with ALPV-An. The absence of clear ALPV symptoms in A. nerii led to the formulation of two hypotheses, which were partially examined in this study. The first hypothesis suggests that A. nerii is resistant or tolerant to ALPV, while the second proposes that ALPV-An may be a mild strain of ALPV. Currently, our results favor the first hypothesis, since ALPV-An is cryptic in A. nerii aphids and can be lethal for M. persicae aphids.
The Incidence and Genetic Diversity of Apple Mosaic Virus (ApMV) and Prune Dwarf Virus (PDV) in Prunus Species in Australia

PubMed Central

Constable, Fiona E.; Nancarrow, Narelle; Rodoni, Brendan

2018-01-01

Apple mosaic virus (ApMV) and prune dwarf virus (PDV) are amongst the most common viruses infecting Prunus species worldwide, but their incidence and genetic diversity in Australia are not known. In a survey of 127 Prunus tree samples collected from five states in Australia, ApMV and PDV occurred in 4 (3%) and 13 (10%) of the trees, respectively. High-throughput sequencing (HTS) of amplicons from partial conserved regions of RNA1, RNA2, and RNA3, encoding the methyltransferase (MT), RNA-dependent RNA polymerase (RdRp), and coat protein (CP) genes, respectively, of ApMV and PDV was used to determine the genetic diversity of the Australian isolates of each virus. Phylogenetic comparison of Australian ApMV and PDV amplicon HTS variants and full-length genomes of both viruses with isolates occurring in other countries identified the genetic strains of each virus occurring in Australia. A single Australian Prunus-infecting ApMV genetic strain was identified, as the sequence variants of all ApMV isolates formed a single phylogenetic group in each of RNA1, RNA2, and RNA3. Two Australian PDV genetic strains were identified based on the combination of observed phylogenetic groups in each of RNA1, RNA2, and RNA3, and one Prunus tree had both strains. The accuracy of the phylogenetic analysis of amplicon sequence variants based on segments of each viral RNA was confirmed by phylogenetic analysis of full-length genome sequences of Australian ApMV and PDV isolates and all published ApMV and PDV genomes from other countries. PMID:29562672
Gene discovery in Boophilus microplus, the cattle tick: the transcriptomes of ovaries, salivary glands, and hemocytes.

PubMed

Santos, Isabel K F de Miranda; Valenzuela, Jesus G; Ribeiro, José Marcos C; de Castro, Marilia; Costa, Juliana Nardelli; Costa, Ana Maria; da Silva, Edson Ramiro; Neto, Olavo Bilac Rego; Rocha, Clarisse; Daffre, Sirlei; Ferreira, Beatriz R; da Silva, João Santana; Szabó, Matias Pablo; Bechara, Gervasio Henrique

2004-10-01

The quest for new control strategies for ticks can profit from high throughput genomics. In order to identify genes that are involved in oogenesis and development, in defense, and in hematophagy, the transcriptomes of ovaries, hemocytes, and salivary glands from rapidly ingurgitating females, and of salivary glands from males of Boophilus microplus were PCR amplified, and the expressed sequence tags (EST) of random clones were mass sequenced. So far, more than 1,344 EST have been generated for these tissues, with approximately 30% novelty, depending on the tissue studied. To date, approximately 760 nucleotide sequences from B. microplus are deposited in the NCBI database. Mass sequencing of partial cDNAs of parasite genes can build up this scant database and rapidly generate a large quantity of useful information about potential targets for immunobiological or chemical control.
Deciphering viral presences: two novel partial giant viruses detected in marine metagenome and in a mine drainage metagenome.

PubMed

Andreani, Julien; Verneau, Jonathan; Raoult, Didier; Levasseur, Anthony; La Scola, Bernard

2018-04-10

Nucleo-cytoplasmic large DNA viruses are double-stranded DNA viruses capable of infecting eukaryotic cells. Since the discovery of Mimivirus and Pandoravirus, there has been no doubt about their extraordinary features compared to "classic" viruses. Recently, we reported the expansion of the proposed family Pithoviridae, with the description of Cedratvirus and Orpheovirus, two new viruses related to Pithoviruses. Studying the major capsid protein of Orpheovirus, we detected a homologous sequence in a mine drainage metagenome. The in-depth exploration of this metagenome, using the MG-Digger program, enabled us to retrieve up to 10 contigs with clear evidence of viral sequences. Moreover, phylogenetic analyses further extended our screening with the discovery, in a marine metagenome, of a second virus closely related to Orpheovirus IHUMI-LCC2. This virus had been misidentified and annotated as a Rickettsiales bacterium. Its partial genome is about 170 kbp in size.
Whole Genome Sequencing of Greater Amberjack (Seriola dumerili) for SNP Identification on Aligned Scaffolds and Genome Structural Variation Analysis Using Parallel Resequencing

PubMed Central

Aoki, Jun-ya; Kawase, Junya; Hamada, Kazuhisa; Fujimoto, Hiroshi; Yamamoto, Ikki; Usuki, Hironori

2018-01-01

Greater amberjack (Seriola dumerili) is distributed in tropical and temperate waters worldwide and is an important aquaculture fish. We carried out de novo sequencing of the greater amberjack genome to construct a reference genome sequence to identify single nucleotide polymorphisms (SNPs) for breeding amberjack by marker-assisted or gene-assisted selection as well as to identify functional genes for biological traits. We obtained 200 times coverage and constructed a high-quality genome assembly using next generation sequencing technology. The assembled sequences were aligned onto a yellowtail (Seriola quinqueradiata) radiation hybrid (RH) physical map by sequence homology. A total of 215 of the longest amberjack sequences, with a total length of 622.8 Mbp (92% of the total length of the genome scaffolds), were lined up on the yellowtail RH map. We resequenced the whole genomes of 20 greater amberjacks and mapped the resulting sequences onto the reference genome sequence. About 186,000 nonredundant SNPs were successfully ordered on the reference genome. Further, we found differences in the genome structural variations between two greater amberjack populations using BreakDancer. We also analyzed the greater amberjack transcriptome and mapped the annotated sequences onto the reference genome sequence. PMID:29785397
Rapid and accurate pyrosequencing of angiosperm plastid genomes

PubMed Central

Moore, Michael J; Dhingra, Amit; Soltis, Pamela S; Shaw, Regina; Farmerie, William G; Folta, Kevin M; Soltis, Douglas E

2006-01-01

Background: Plastid genome sequence information is vital to several disciplines in plant biology, including phylogenetics and molecular biology. The past five years have witnessed a dramatic increase in the number of completely sequenced plastid genomes, fuelled largely by advances in conventional Sanger sequencing technology. Here we report a further significant reduction in time and cost for plastid genome sequencing through the successful use of a newly available pyrosequencing platform, the Genome Sequencer 20 (GS 20) System (454 Life Sciences Corporation), to rapidly and accurately sequence the whole plastid genomes of the basal eudicot angiosperms Nandina domestica (Berberidaceae) and Platanus occidentalis (Platanaceae). Results: More than 99.75% of each plastid genome was simultaneously obtained during two GS 20 sequence runs, to an average depth of coverage of 24.6× in Nandina and 17.3× in Platanus. The Nandina and Platanus plastid genomes shared essentially identical gene complements and possessed the typical angiosperm plastid structure and gene arrangement. To assess the accuracy of the GS 20 sequence, over 45 kilobases of sequence were generated for each genome using conventional sequencing. Overall error rates of 0.043% and 0.031% were observed in GS 20 sequence for Nandina and Platanus, respectively. More than 97% of all observed errors were associated with homopolymer runs, with ~60% of all errors associated with homopolymer runs of 5 or more nucleotides and ~50% of all errors associated with regions of extensive homopolymer runs. No substitution errors were present in either genome. Error rates were generally higher in the single-copy and noncoding regions of both plastid genomes relative to the inverted repeat and coding regions. Conclusion: Highly accurate and essentially complete sequence information was obtained for the Nandina and Platanus plastid genomes using the GS 20 System. More importantly, this high accuracy was achieved at a significant reduction in time and cost compared with traditional shotgun-based genome sequencing techniques, although with approximately half the coverage of previously reported GS 20 de novo genome sequence. The GS 20 should be broadly applicable to angiosperm plastid genome sequencing, and therefore promises to expand the scale of plant genetic and phylogenetic research dramatically. PMID:16934154
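Because most GS 20 errors fell in homopolymer runs, a natural first step in this kind of error audit is to locate runs of a single nucleotide. The sketch below finds runs of at least five identical bases (the threshold quoted in the abstract) and reports what fraction of a set of error positions falls inside them. The function names and example positions are hypothetical, and error positions are assumed to be 0-based coordinates on the same sequence. This is an illustration, not the authors' pipeline.

```python
import re


def homopolymer_runs(seq: str, min_len: int = 5):
    """Return (start, end, base) for every run of one nucleotide of length >= min_len.

    `end` is exclusive, following Python slice convention.
    """
    # One base captured, then at least (min_len - 1) further copies of it.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq.upper())]


def fraction_in_runs(error_positions, runs) -> float:
    """Fraction of error positions that fall inside any of the given runs."""
    if not error_positions:
        return 0.0
    inside = sum(any(start <= pos < end for start, end, _ in runs)
                 for pos in error_positions)
    return inside / len(error_positions)


seq = "ACGTTTTTGCAAAAAAC"
runs = homopolymer_runs(seq)
print(runs)                               # [(3, 8, 'T'), (10, 16, 'A')]
print(fraction_in_runs([4, 12, 1], runs))  # two of three errors lie in runs
```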
Genome Sequencing of Steroid Producing Bacteria Using Ion Torrent Technology and a Reference Genome.

PubMed

Sola-Landa, Alberto; Rodríguez-García, Antonio; Barreiro, Carlos; Pérez-Redondo, Rosario

2017-01-01

Next-generation sequencing technology has greatly eased bacterial genome sequencing, and several tens of thousands of genomes have been sequenced during the last 10 years. Most genome projects are published as draft versions; however, certain applications require the complete genome sequence. In this chapter, we describe the strategy that allowed the complete genome sequencing of Mycobacterium neoaurum NRRL B-3805, an industrial strain exploited for steroid production, using Ion Torrent sequencing reads and the genome of a closely related strain as the reference. This protocol can be applied to analyze the genetic variations between closely related strains; for example, to elucidate the point mutations between a parental strain and a random mutagenesis-derived mutant.
BioNano genome mapping of individual chromosomes supports physical mapping and sequence assembly in complex plant genomes.

PubMed

Staňková, Helena; Hastie, Alex R; Chan, Saki; Vrána, Jan; Tulpová, Zuzana; Kubaláková, Marie; Visendi, Paul; Hayashi, Satomi; Luo, Mingcheng; Batley, Jacqueline; Edwards, David; Doležel, Jaroslav; Šimková, Hana

2016-07-01

The assembly of a reference genome sequence of bread wheat is challenging due to its specific features such as the genome size of 17 Gbp, polyploid nature and prevalence of repetitive sequences. BAC-by-BAC sequencing based on chromosomal physical maps, adopted by the International Wheat Genome Sequencing Consortium as the key strategy, reduces problems caused by the genome complexity and polyploidy, but the repeat content still hampers the sequence assembly. Availability of a high-resolution genomic map to guide sequence scaffolding and validate physical map and sequence assemblies would be highly beneficial to obtaining an accurate and complete genome sequence. Here, we chose the short arm of chromosome 7D (7DS) as a model to demonstrate for the first time that it is possible to couple chromosome flow sorting with genome mapping in nanochannel arrays and create a de novo genome map of a wheat chromosome. We constructed a high-resolution chromosome map composed of 371 contigs with an N50 of 1.3 Mb. Long DNA molecules achieved by our approach facilitated chromosome-scale analysis of repetitive sequences and revealed a ~800-kb array of tandem repeats intractable to current DNA sequencing technologies. Anchoring 7DS sequence assemblies obtained by clone-by-clone sequencing to the 7DS genome map provided a valuable tool to improve the BAC-contig physical map and validate sequence assembly on a chromosome-arm scale. Our results indicate that creating genome maps for the whole wheat genome in a chromosome-by-chromosome manner is feasible and that they will be an affordable tool to support the production of improved pseudomolecules. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
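N50 summarizes contiguity: sort the contigs from longest to shortest and report the length at which the running total first reaches half of the summed length. A minimal sketch follows; the contig lengths in the example are hypothetical, not the 371 contigs of the 7DS map.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together account for at least half of the total length."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length


# Hypothetical contig lengths in kb.
print(n50([2600, 1300, 900, 700, 400, 100]))  # 1300
```

Because N50 depends only on the length distribution, it rises when small contigs are merged into large ones, which is why it is a common headline statistic for map and sequence assemblies alike.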
Bioinformatics analysis and detection of gelatinase encoded gene in Lysinibacillus sphaericus

NASA Astrophysics Data System (ADS)

Repin, Rul Aisyah Mat; Mutalib, Sahilah Abdul; Shahimi, Safiyyah; Khalid, Rozida Mohd.; Ayob, Mohd. Khan; Bakar, Mohd. Faizal Abu; Isa, Mohd Noor Mat

2016-11-01

In this study, we performed a bioinformatics analysis of the genome sequence of Lysinibacillus sphaericus (L. sphaericus) to identify genes encoding gelatinase. L. sphaericus was isolated from soil and is a gelatinase-producing bacterium specific to porcine and bovine gelatin. This bacterium offers the possibility of producing enzymes specific to each of these two meat species. The main focus of this research was to identify the gelatinase-encoding gene of L. sphaericus using bioinformatics analysis of its partially sequenced genome. Three candidate genes were identified: gelatinase candidate gene 1 (P1), NODE_71_length_93919_cov_158.931839_21, 1563 base pairs (bp) in size and encoding 520 amino acids; gelatinase candidate gene 2 (P2), NODE_23_length_52851_cov_190.061386_17, 1776 bp in size and encoding 591 amino acids; and gelatinase candidate gene 3 (P3), NODE_106_length_32943_cov_169.147919_8, 1701 bp in size and encoding 566 amino acids. Three pairs of oligonucleotide primers, named F1/R1, F2/R2 and F3/R3, were designed to target short cDNA sequences by PCR. The amplicons were reliably 1563 bp in size for candidate gene P1 and 1701 bp in size for candidate gene P3. Therefore, genes encoding gelatinase were identified through bioinformatics analysis of L. sphaericus.
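The candidate identifiers follow the NODE_<n>_length_<bp>_cov_<coverage> naming used by de Bruijn assemblers such as Velvet and SPAdes; the trailing number (_21, _17, _8) is assumed here to be a gene index appended by a gene-calling step. The sketch below parses those fields and checks that each reported CDS length matches its protein length, assuming each CDS ends with a stop codon. The function names are hypothetical.

```python
import re

# NODE_<n>_length_<bp>_cov_<coverage>, optionally followed by _<gene index> (assumed).
NODE_ID = re.compile(r"NODE_(\d+)_length_(\d+)_cov_([\d.]+)(?:_(\d+))?$")


def parse_node_id(name: str) -> dict:
    """Split an assembler-style contig/gene ID into its numeric fields."""
    match = NODE_ID.match(name)
    if match is None:
        raise ValueError(f"unrecognised ID: {name}")
    node, length, cov, gene = match.groups()
    return {
        "node": int(node),
        "contig_length_bp": int(length),
        "kmer_coverage": float(cov),
        "gene_index": int(gene) if gene is not None else None,
    }


def protein_length(cds_bp: int) -> int:
    """Amino acids encoded by a complete CDS whose last codon is a stop codon."""
    if cds_bp % 3:
        raise ValueError("CDS length is not a multiple of 3")
    return cds_bp // 3 - 1


for gene_id, cds_bp in [("NODE_71_length_93919_cov_158.931839_21", 1563),
                        ("NODE_23_length_52851_cov_190.061386_17", 1776),
                        ("NODE_106_length_32943_cov_169.147919_8", 1701)]:
    info = parse_node_id(gene_id)
    print(info["contig_length_bp"], info["gene_index"], protein_length(cds_bp))
```

The output reproduces the 520, 591 and 566 amino acid counts given above, which is consistent with each reported bp length including its stop codon.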
Isolation and molecular characterization of partial FSH and LH receptor genes in Arabian camels (Camelus dromedarius)

PubMed Central

Jelokhani-Niaraki, Saber; Tahmoorespur, Mojtaba; Bitaraf-Sani, Morteza

2015-01-01

Very little is known about the LHR and FSHR genes of domestic dromedary camels. The main objective of this study was to determine and analyze partial genomic regions of the FSHR and LHR genes in dromedary camels for the first time. To this end, a total of 50 DNA samples from dromedary camels raised in Iran were sent for sequencing (25 samples for each gene). We compared the nucleotide sequences of Camelus dromedarius with corresponding sequences of previously published FSHR and LHR genes in Bactrian camels and other species. According to the data, the same nucleotide variation was identified in both regions of the two camel species. The alignment of deduced protein sequences of the two species revealed an amino acid variation in the FSHR region. No evidence of amino acid variation was observed, however, in the LHR sequences. Phylogenetic analysis indicated that both camel species had a close relationship and clustered together in a separate branch. This was further confirmed by genetic distance values illustrating significant sequence identity between Camelus dromedarius and Camelus bactrianus. Interestingly, sequence comparisons revealed heterozygous patterns in FSHR sequences isolated from dromedary camels of Iran. Compared with other species, this camel has three amino acid substitutions, at positions 5, 67, and 105 of the FSHR coding region. These substitutions are found exclusively in camels and can be considered species-specific. The results of our study can be used for hormone receptor functionality research (FSHR and LHR) as well as for studies of reproduction-linked polymorphisms and for breeding programs. PMID:27844002
Isolation and molecular characterization of partial FSH and LH receptor genes in Arabian camels (Camelus dromedarius).

PubMed

Jelokhani-Niaraki, Saber; Tahmoorespur, Mojtaba; Bitaraf-Sani, Morteza

2015-06-01

Very little is known about the LHR and FSHR genes of domestic dromedary camels. The main objective of this study was to determine and analyze partial genomic regions of the FSHR and LHR genes in dromedary camels for the first time. To this end, a total of 50 DNA samples from dromedary camels raised in Iran were sent for sequencing (25 samples for each gene). We compared the nucleotide sequences of Camelus dromedarius with corresponding sequences of previously published FSHR and LHR genes in Bactrian camels and other species. According to the data, the same nucleotide variation was identified in both regions of the two camel species. The alignment of deduced protein sequences of the two species revealed an amino acid variation in the FSHR region. No evidence of amino acid variation was observed, however, in the LHR sequences. Phylogenetic analysis indicated that both camel species had a close relationship and clustered together in a separate branch. This was further confirmed by genetic distance values illustrating significant sequence identity between Camelus dromedarius and Camelus bactrianus. Interestingly, sequence comparisons revealed heterozygous patterns in FSHR sequences isolated from dromedary camels of Iran. Compared with other species, this camel has three amino acid substitutions, at positions 5, 67, and 105 of the FSHR coding region. These substitutions are found exclusively in camels and can be considered species-specific. The results of our study can be used for hormone receptor functionality research (FSHR and LHR) as well as for studies of reproduction-linked polymorphisms and for breeding programs.
Whole-genome random sequencing and assembly of Haemophilus influenzae Rd

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fleischmann, R.D.; Adams, M.D.; White, O.

1995-07-28

An approach for genome analysis based on sequencing and assembly of unselected pieces of DNA from the whole chromosome has been applied to obtain the complete nucleotide sequence (1,830,137 base pairs) of the genome from the bacterium Haemophilus influenzae Rd. This approach eliminates the need for initial mapping efforts and is therefore applicable to the vast array of microbial species for which genome maps are unavailable. The H. influenzae Rd genome sequence (Genome Sequence DataBase accession number L42023) represents the only complete genome sequence from a free-living organism. 46 refs., 4 figs., 4 tabs.
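The core idea of whole-genome shotgun assembly is to find overlaps between randomly sampled reads and merge them into longer contigs. The toy greedy assembler below illustrates that idea on error-free reads by repeatedly merging the pair with the longest suffix-prefix overlap. It is a teaching sketch under those assumptions, not the software used for the H. influenzae project; it ignores sequencing errors, repeats, reverse complements and reads contained within other reads.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` matching a prefix of `b`,
    provided it is at least `min_len` long; otherwise 0."""
    if len(b) < min_len:
        return 0
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate overlap start in `a`
        if start == -1:
            return 0
        if b.startswith(a[start:]):          # whole suffix of `a` matches `b`
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the read pair with the longest overlap until
    no overlap of at least `min_len` remains; returns the contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # remaining contigs cannot be joined
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


print(greedy_assemble(["ATTAGACC", "GACCTGCC", "TGCCGGAA"]))  # ['ATTAGACCTGCCGGAA']
```

Greedy merging is simple but can mis-join reads drawn from repeats, which is one reason real assemblers add error models, mate-pair constraints and more careful graph methods.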
The abundance of homoeologue transcripts is disrupted by hybridization and is partially restored by genome doubling in synthetic hexaploid wheat.

PubMed

Hao, Ming; Li, Aili; Shi, Tongwei; Luo, Jiangtao; Zhang, Lianquan; Zhang, Xuechuan; Ning, Shunzong; Yuan, Zhongwei; Zeng, Deying; Kong, Xingchen; Li, Xiaolong; Zheng, Hongkun; Lan, Xiujin; Zhang, Huaigang; Zheng, Youliang; Mao, Long; Liu, Dengcai

2017-02-10

The formation of an allopolyploid is a two-step process, comprising an initial wide hybridization event that is later followed by whole genome doubling. Both processes can affect the transcription of homoeologues. Here, RNA-Seq was used to obtain the genome-wide leaf transcriptome of two independent Triticum turgidum × Aegilops tauschii allotriploids (F1), along with their spontaneous allohexaploids (S1) and their parental lines. The resulting sequence data were then used to characterize variation in homoeologue transcript abundance. The hybridization event strongly down-regulated D-subgenome homoeologues, but this effect was in many cases reversed by whole genome doubling. The suppression of D-subgenome homoeologue transcription resulted in a marked frequency of parental transcription level dominance, especially for genes encoding proteins involved in photosynthesis. Singletons (genes with no homoeologues present) were frequently transcribed in both the allotriploid and allohexaploid plants. The implication is that whole genome doubling helps to overcome the phenotypic weakness of the allotriploid by restoring a more favourable gene dosage in genes experiencing transcription level dominance in hexaploid wheat.
Structural forms of the human amylase locus and their relationships to SNPs, haplotypes, and obesity

PubMed Central

Usher, Christina L; Handsaker, Robert E; Esko, Tõnu; Tuke, Marcus A; Weedon, Michael N; Hastie, Alex R; Cao, Han; Moon, Jennifer E; Kashin, Seva; Fuchsberger, Christian; Metspalu, Andres; Pato, Carlos N; Pato, Michele T; McCarthy, Mark I; Boehnke, Michael; Altshuler, David M; Frayling, Timothy M; Hirschhorn, Joel N; McCarroll, Steven A

2016-01-01

Hundreds of genes reside in structurally complex, poorly understood regions of the human genome [1-3]. One such region contains the three amylase genes (AMY2B, AMY2A, and AMY1) responsible for digesting starch into sugar. The copy number of AMY1 is reported to be the genome's largest influence on obesity [4], though genome-wide association studies for obesity have found this locus unremarkable. Using whole genome sequence analysis [3,5], droplet digital PCR [6], and genome mapping [7], we identified eight common structural haplotypes of the amylase locus that suggest its mutational history. We found that AMY1 copy number in individuals' genomes is generally even (rather than odd) and partially correlates to nearby SNPs, which do not associate with BMI. We measured amylase gene copy number in 1,000 obese or lean Estonians and in two other cohorts totaling ~3,500 individuals. We had 99% power to detect the lower bound of the reported effects on BMI [4], yet found no association. PMID:26098870
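Droplet digital PCR estimates copy number from the fraction of positive droplets. A minimal sketch of the standard Poisson-based calculation follows, assuming a reference locus present at two copies per diploid genome; the droplet counts are hypothetical, and this is not necessarily the exact procedure used in the study.

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Poisson-corrected mean target copies per droplet: -ln(1 - positive/total)."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    return -math.log(1 - positive / total)


def diploid_copy_number(target_pos, target_total, ref_pos, ref_total, ref_copies=2):
    """Target copies per diploid genome, scaled against a reference locus
    assumed to be present at `ref_copies` copies."""
    return (ref_copies * copies_per_droplet(target_pos, target_total)
            / copies_per_droplet(ref_pos, ref_total))


# Hypothetical droplet counts for one sample.
estimate = diploid_copy_number(target_pos=9100, target_total=15000,
                               ref_pos=2700, ref_total=15000)
print(f"estimated AMY1 copies: {estimate:.2f} (nearest integer {round(estimate)})")
```

The Poisson correction matters because a positive droplet may hold more than one template molecule; counting positive droplets directly would underestimate high copy numbers such as those seen at AMY1.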
Expressed sequence tags (ESTs) from immune tissues of turbot (Scophthalmus maximus) challenged with pathogens

PubMed Central

Pardo, Belén G; Fernández, Carlos; Millán, Adrián; Bouza, Carmen; Vázquez-López, Araceli; Vera, Manuel; Alvarez-Dios, José A; Calaza, Manuel; Gómez-Tato, Antonio; Vázquez, María; Cabaleiro, Santiago; Magariños, Beatriz; Lemos, Manuel L; Leiro, José M; Martínez, Paulino

2008-01-01

Background: The turbot (Scophthalmus maximus; Scophthalmidae; Pleuronectiformes) is a flatfish species of great relevance for marine aquaculture in Europe. In contrast to other cultured flatfish, very few genomic resources are available for this species. Aeromonas salmonicida and Philasterides dicentrarchi are two pathogens that affect turbot culture, causing serious economic losses to the turbot industry. Little is known about the molecular mechanisms of disease resistance and host-pathogen interactions in this species. In this work, thousands of ESTs for functional genomic studies and potential EST-linked markers for mapping (microsatellites and single nucleotide polymorphisms (SNPs)) are provided. This information enabled us to obtain a preliminary view of genes regulated in response to these pathogens, and it constitutes the basis for subsequent, more accurate microarray analysis. Results: A total of 12584 partially sequenced cDNAs from three different turbot (Scophthalmus maximus) cDNA libraries, from fish infected with Aeromonas salmonicida, from fish infected with Philasterides dicentrarchi, and from healthy fish, were analyzed. Three immune-relevant tissues (liver, spleen and head kidney) were sampled at several time points in the infection process for library construction. The sequences were processed into 9256 high-quality sequences, which constituted the source for the turbot EST database. Clustering and assembly of these sequences revealed 3482 different putative transcripts: 1073 contigs and 2409 singletons. BLAST searches against public databases detected significant similarity (e-value ≤ 1e-5) for 1766 (50.7%) sequences, and 816 of them (23.4%) could be functionally annotated. Two hundred and three of these genes (24.9%), encoding defence/immune-related proteins, were mostly identified for the first time in turbot. Some ESTs showed significant differences in the number of transcripts when comparing the three libraries, suggesting regulation in response to these pathogens. A total of 191 microsatellites, 104 of them with sufficient flanking sequence for primer design, and 1158 putative SNPs were identified from these EST resources in turbot. Conclusion: A collection of 9256 high-quality ESTs was generated, representing 3482 unique turbot sequences. A large proportion of defence/immune-related genes were identified, many of them regulated in response to specific pathogens. Putative microsatellites and SNPs were identified. These genome resources constitute the basis for developing a microarray for functional genomics studies and for marker validation for genetic linkage and QTL analysis in turbot. PMID:18817567
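The annotation step keeps sequences with a BLAST hit at e-value ≤ 1e-5. Below is a minimal sketch of that filter over BLAST+ tabular output (`-outfmt 6`, whose default 11th column is the e-value), followed by a check of the quoted counts and percentages. The sample hits and their subject IDs are hypothetical.

```python
import csv
import io


def queries_passing(blast_tab: str, max_evalue: float = 1e-5) -> set:
    """IDs of queries with at least one hit at e-value <= max_evalue.

    Expects BLAST+ tabular output (-outfmt 6) with the default 12 columns,
    where column 1 is the query ID and column 11 is the e-value.
    """
    passing = set()
    for row in csv.reader(io.StringIO(blast_tab), delimiter="\t"):
        if row and float(row[10]) <= max_evalue:
            passing.add(row[0])
    return passing


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Hypothetical hits: only the first passes the 1e-5 cut-off.
hits = ("seq1\tP12345\t88.2\t210\t25\t0\t1\t630\t5\t214\t3e-40\t350\n"
        "seq2\tQ67890\t31.0\t90\t60\t2\t10\t280\t40\t128\t0.02\t35\n")
print(queries_passing(hits))  # {'seq1'}

# Arithmetic from the abstract: contigs + singletons = putative transcripts.
print(1073 + 2409 == 3482)
print(percent(1766, 3482), percent(816, 3482), percent(203, 816))  # 50.7 23.4 24.9
```

Note that the 23.4% figure is relative to all 3482 transcripts, while the 24.9% figure is relative to the 816 annotated sequences.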
Towards pathogenomics: a web-based resource for pathogenicity islands

PubMed Central

Yoon, Sung Ho; Park, Young-Kyu; Lee, Soohyun; Choi, Doil; Oh, Tae Kwang; Hur, Cheol-Goo; Kim, Jihyun F.

2007-01-01

Pathogenicity islands (PAIs) are genetic elements whose products are essential to the process of disease development. They have been horizontally (laterally) transferred from other microbes and are important in the evolution of pathogenesis. In this study, a comprehensive database and search engines specialized for PAIs were established. The pathogenicity island database (PAIDB) is a comprehensive relational database of all the reported PAIs and of potential PAI regions predicted by a method that combines feature-based analysis and similarity-based analysis. Also, using the PAI Finder search application, a multi-sequence query can be analyzed onsite for the presence of potential PAIs. As of April 2006, PAIDB contains 112 types of PAIs and 889 GenBank accessions containing either partial or all PAI loci previously reported in the literature, which are present in 497 strains of pathogenic bacteria. The database also offers 310 candidate PAIs predicted from 118 sequenced prokaryotic genomes. With the increasing number of prokaryotic genomes without functional inference and sequenced genetic regions of suspected involvement in diseases, this web-based, user-friendly resource has the potential to be of significant use in pathogenomics. PAIDB is freely accessible at . PMID:17090594
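PAIDB's predictions combine feature-based and similarity-based analysis. One compositional feature often used for island detection is atypical G+C content, since horizontally acquired regions frequently differ from the host genome. The sketch below flags windows whose G+C fraction deviates from the mean over all windows by more than a chosen number of standard deviations. The window size, step and threshold are arbitrary assumptions, and this is not PAIDB's actual method.

```python
import statistics


def gc_windows(seq: str, window: int, step: int):
    """(start, G+C fraction) for each full window along the sequence."""
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        result.append((start, (chunk.count("G") + chunk.count("C")) / window))
    return result


def atypical_gc_windows(seq: str, window: int = 5000, step: int = 1000, z: float = 2.0):
    """Start positions of windows whose G+C fraction lies more than z
    standard deviations from the mean over all windows."""
    windows = gc_windows(seq, window, step)
    if not windows:
        return []
    values = [gc for _, gc in windows]
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # uniform composition: nothing stands out
    return [start for start, gc in windows if abs(gc - mean) > z * sd]


# Toy genome: uniform background with a GC-rich 4-bp "island" at position 40.
toy = "ATGC" * 10 + "GGGG" + "ATGC" * 10
print(atypical_gc_windows(toy, window=4, step=4))  # [40]
```

On real genomes a composition signal like this is only a hint; it is usually combined with other evidence, such as nearby tRNA genes, mobility genes or similarity to known virulence genes, before a region is called a candidate island.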
Mitochondrial genomes reveal the extinct Hippidion as an outgroup to all living equids.

PubMed

Der Sarkissian, Clio; Vilstrup, Julia T; Schubert, Mikkel; Seguin-Orlando, Andaine; Eme, David; Weinstock, Jacobo; Alberdi, Maria Teresa; Martin, Fabiana; Lopez, Patricio M; Prado, Jose L; Prieto, Alfredo; Douady, Christophe J; Stafford, Tom W; Willerslev, Eske; Orlando, Ludovic

2015-03-01

Hippidions were equids with very distinctive anatomical features. They lived in South America from 2.5 million years ago (Ma) until their extinction approximately 10 000 years ago. The evolutionary origin of the three known Hippidion morphospecies is still disputed. Based on palaeontological data, Hippidion could have diverged from the lineage leading to modern equids before 10 Ma. In contrast, a much later divergence date, with Hippidion nesting within modern equids, was indicated by partial ancient mitochondrial DNA sequences. Here, we characterized eight Hippidion complete mitochondrial genomes at 3.4-386.3-fold coverage using target-enrichment capture and next-generation sequencing. Our dataset reveals that the two morphospecies sequenced (H. saldiasi and H. principale) formed a monophyletic clade, basal to extant and extinct Equus lineages. This contrasts with previous genetic analyses and supports Hippidion as a distinct genus, in agreement with palaeontological models. We date the Hippidion split from Equus at 5.6-6.5 Ma, suggesting an early divergence in North America prior to the colonization of South America, after the formation of the Panamanian Isthmus 3.5 Ma and the Great American Biotic Interchange. © 2015 The Author(s) Published by the Royal Society. All rights reserved.
Mitochondrial genomes reveal the extinct Hippidion as an outgroup to all living equids

PubMed Central

Der Sarkissian, Clio; Vilstrup, Julia T.; Schubert, Mikkel; Seguin-Orlando, Andaine; Eme, David; Weinstock, Jacobo; Alberdi, Maria Teresa; Martin, Fabiana; Lopez, Patricio M.; Prado, Jose L.; Prieto, Alfredo; Douady, Christophe J.; Stafford, Tom W.; Willerslev, Eske; Orlando, Ludovic

2015-01-01

Hippidions were equids with very distinctive anatomical features. They lived in South America from 2.5 million years ago (Ma) until their extinction approximately 10 000 years ago. The evolutionary origin of the three known Hippidion morphospecies is still disputed. Based on palaeontological data, Hippidion could have diverged from the lineage leading to modern equids before 10 Ma. In contrast, a much later divergence date, with Hippidion nesting within modern equids, was indicated by partial ancient mitochondrial DNA sequences. Here, we characterized eight Hippidion complete mitochondrial genomes at 3.4–386.3-fold coverage using target-enrichment capture and next-generation sequencing. Our dataset reveals that the two morphospecies sequenced (H. saldiasi and H. principale) formed a monophyletic clade, basal to extant and extinct Equus lineages. This contrasts with previous genetic analyses and supports Hippidion as a distinct genus, in agreement with palaeontological models. We date the Hippidion split from Equus at 5.6–6.5 Ma, suggesting an early divergence in North America prior to the colonization of South America, after the formation of the Panamanian Isthmus 3.5 Ma and the Great American Biotic Interchange. PMID:25762573
Identification of an expressed gene in Dipylidium caninum.

PubMed

Miranda, Rodrigo R C; Costa-Júnior, Livio M; Campos, Artur K; Santos, Hudson A; Rabelo, Elida M L

2004-10-01

Recombinant DNA studies have focused on developing vaccines against various cestodes, but few studies have addressed the molecular biology and genes of Dipylidium caninum. Only partial sequences of its mitochondrial DNA and ribosomal RNA gene are available in databases. Any molecular work with this parasite, including epidemiology, the study of drug-resistant strains, and vaccine development, is hampered by the lack of knowledge of its genome. Knowledge of specific genes expressed at different developmental stages of D. caninum is therefore crucial for locating potential targets for the development of vaccines and/or new drugs against this parasite. Here we report, for the first time, the sequencing of a fragment of a D. caninum expressed gene.

Fungal genome sequencing: basic biology to biotechnology.

PubMed

Sharma, Krishna Kant

2016-08-01

Genome sequences provide a first glimpse into the genomic basis of the biological diversity of filamentous fungi and yeasts. The genome sequence of the budding yeast Saccharomyces cerevisiae, with its small genome size, unicellular growth, and rich history of genetic and molecular analyses, was a milestone of early genomics in the 1990s. The subsequent completion of the genomes of the fission yeast Schizosaccharomyces pombe and the genetic model Neurospora crassa initiated a revolution in the genomics of the fungal kingdom. Since then, a substantial number of fungal genomes have been sequenced and publicly released, representing the widest sampling of genomes from any eukaryotic kingdom. This ambitious genome-sequencing effort provides a wealth of data on metabolic diversity within the fungal kingdom, thereby enhancing research in medical science, agricultural science, ecology, bioremediation, bioenergy, and the biotechnology industry. Fungal genomics has great potential to positively affect human health, environmental health, and the planet's stored energy. With a significant increase in sequenced fungal genomes, the known diversity of genes involved in producing organic acids, antibiotics, and enzymes, and of their pathways, has increased exponentially. Currently, over a hundred fungal genome sequences are publicly available; however, no comprehensive review has been published. This review is an initiative to address the significance of the fungal genome-sequencing program and provides a road map for basic and applied research.
The Spectrum of Replication Errors in the Absence of Error Correction Assayed Across the Whole Genome of Escherichia coli.

PubMed

Niccum, Brittany A; Lee, Heewook; MohammedIsmail, Wazim; Tang, Haixu; Foster, Patricia L

2018-06-15

When DNA Pol III, the DNA polymerase that replicates the Escherichia coli chromosome, makes an error, there are two primary defenses against mutation: proofreading by the epsilon subunit of the holoenzyme and mismatch repair. In proofreading-deficient strains, mismatch repair is partially saturated and the cell's response to DNA damage, the SOS response, may be partially induced. To investigate the nature of replication errors, we used mutation accumulation experiments and whole genome sequencing to determine mutation rates and mutational spectra across the entire chromosome of strains deficient in proofreading, mismatch repair, and the SOS response. We report that a proofreading-deficient strain has a mutation rate 4,000-fold greater than that of wild-type strains. While the SOS response may be induced in these cells, it does not contribute to the mutational load. Inactivating mismatch repair in a proofreading-deficient strain increases the mutation rate by a further 1.5-fold. DNA polymerase has a bias toward converting G:C to A:T base pairs, but proofreading reduces the impact of these mutations, helping to maintain the genomic G:C content. These findings give an unprecedented view of how polymerase and error-correction pathways work together to maintain E. coli's low mutation rate of 1 per thousand generations. Copyright © 2018, Genetics.
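In a mutation accumulation experiment, a simple estimate pools mutations over all lines and divides by the total number of line-generations. A minimal sketch of that pooled estimator follows, applied to the fold changes reported above. The line count, generation count and mutation total are hypothetical; the ~4.64 Mbp genome size is an approximation for E. coli K-12; and the study may have used a different estimator.

```python
def mutation_rate(total_mutations: int, n_lines: int, generations: float) -> float:
    """Pooled rate: mutations per genome per generation across all MA lines."""
    return total_mutations / (n_lines * generations)


def per_base_rate(genome_rate: float, genome_size_bp: float) -> float:
    """Convert a per-genome rate into a per-base-pair rate."""
    return genome_rate / genome_size_bp


GENOME_BP = 4.64e6  # approximate E. coli K-12 chromosome length (assumption)

# Hypothetical: 40 mutations in 10 wild-type lines over 4,000 generations each.
wild_type = mutation_rate(40, 10, 4000)       # about 1 per 1,000 generations
proofreading_deficient = 4000 * wild_type     # 4,000-fold higher (abstract)
double_mutant = 1.5 * proofreading_deficient  # a further 1.5-fold (abstract)

for label, rate in [("wild type", wild_type),
                    ("proofreading-deficient", proofreading_deficient),
                    ("+ mismatch repair loss", double_mutant)]:
    print(f"{label}: {rate:.3g}/genome/gen, {per_base_rate(rate, GENOME_BP):.2e}/bp/gen")
```

Under these assumptions a proofreading-deficient cell gains roughly four mutations per genome per generation, which illustrates how strongly the error-correction pathways suppress polymerase errors.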
Molecular diversity and evolutionary history of rabies virus strains circulating in the Balkans.

PubMed

McElhinney, L M; Marston, D A; Freuling, C M; Cragg, W; Stankov, S; Lalosevic, D; Lalosevic, V; Müller, T; Fooks, A R

2011-09-01

Molecular studies of European classical rabies viruses (RABV) have revealed a number of geographically clustered lineages. To study the diversity of Balkan RABV, partial nucleoprotein (N) gene sequences were analysed from a unique panel of isolates (n = 210) collected from various hosts between 1972 and 2006. All of the Balkan isolates grouped within the European/Middle East Lineage, with the majority most closely related to East European strains. A number of RABV from Bosnia & Herzegovina and Montenegro, collected between 1986 and 2006, grouped with the West European strains, believed to be responsible for the rabies epizootic that spread throughout Europe in the latter half of the 20th Century. In contrast, no Serbian RABV belonged to this sublineage. However, a distinct group of Serbian fox RABV provided further evidence for the southward wildlife-mediated movement of rabies from Hungary, Romania and Serbia into Bulgaria. To determine the optimal region for evolutionary analysis, partial, full and concatenated N-gene and glycoprotein (G) gene sequences were compared. Whilst both the divergence times and evolutionary rates were similar irrespective of genomic region, the 95 % highest probability density (HPD) limits were significantly reduced for full N-gene and concatenated NG-gene sequences compared with partial gene sequences. Bayesian coalescent analysis estimated the date of the most recent common ancestor of the Balkan RABV to be 1885 (95 % HPD, 1852-1913), and skyline plots suggested an expansion of the local viral population in 1980-1990, which coincides with the observed emergence of fox rabies in the region.
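A 95% HPD interval is the narrowest interval containing 95% of the posterior probability. Given MCMC draws (for example, sampled root dates from a Bayesian coalescent run), an empirical HPD can be found by sliding a window of fixed sample count along the sorted draws and keeping the narrowest one, which is appropriate for unimodal posteriors. A minimal sketch; the function name and the simulated draws are hypothetical.

```python
import math
import random


def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing at least `mass` of the samples.

    Assumes a unimodal posterior; multimodal posteriors can need
    several disjoint intervals instead of one.
    """
    if not 0 < mass <= 1:
        raise ValueError("mass must be in (0, 1]")
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("no samples given")
    k = max(1, math.ceil(mass * n))  # number of samples the interval must contain
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


# Hypothetical posterior draws of a root date (calendar years).
random.seed(1)
draws = [random.gauss(1885, 16) for _ in range(10_000)]
low, high = hpd_interval(draws)
print(f"95% HPD: {low:.0f}-{high:.0f}")
```

Unlike an equal-tailed credible interval, the HPD interval is not forced to cut equal probability from each tail, so it is narrower for skewed posteriors such as those often seen for divergence dates.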
Genome Improvement at JGI-HAGSC

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grimwood, Jane; Schmutz, Jeremy J.; Myers, Richard M.

Since the completion of the sequencing of the human genome, the Joint Genome Institute (JGI) has rapidly expanded its scientific goals in several DOE mission-relevant areas. At the JGI-HAGSC, we have kept pace with this rapid expansion of projects with our focus on assessing, assembling, improving and finishing eukaryotic whole genome shotgun (WGS) projects for which the shotgun sequence is generated at the Production Genomic Facility (JGI-PGF). We follow this by combining the draft WGS with genomic resources generated at JGI-HAGSC or in collaborator laboratories (including BAC end sequences, genetic maps and FLcDNA sequences) to produce an improved draft sequence. For eukaryotic genomes important to the DOE mission, we then add further information from directed experiments to produce reference genomic sequences that are publicly available for any scientific researcher. Also, we have continued our program for producing BAC-based finished sequence, both for adding information to JGI genome projects and for small BAC-based sequencing projects proposed through any of the JGI sequencing programs. We have now built our computational expertise in WGS assembly and analysis and have moved eukaryotic genome assembly from the JGI-PGF to JGI-HAGSC. We have concentrated our assembly development work on large plant genomes and complex fungal and algal genomes.
Reference-quality genome sequence of Aegilops tauschii, the source of wheat D genome, shows that recombination shapes genome structure and evolution

USDA-ARS's Scientific Manuscript database

Aegilops tauschii is the diploid progenitor of the D genome of hexaploid wheat and an important genetic resource for wheat. A reference-quality sequence for the Ae. tauschii genome was produced with a combination of ordered-clone sequencing, whole-genome shotgun sequencing, and BioNano optical geno...
Identification of a precursor genomic segment that provided a sequence unique to glycophorin B and E genes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Onda, M.; Kudo, S.; Fukuda, M.

Human glycophorin A, B, and E (GPA, GPB, and GPE) genes belong to a gene family located on the long arm of chromosome 4. These three genes are homologous from the 5'-flanking sequence to the Alu sequence, which lies 1 kb downstream from the exon encoding the transmembrane domain. Analysis of the Alu sequence and flanking direct repeat sequences suggested that the GPA gene most closely resembles the ancestral gene, whereas the GPB and GPE genes arose by homologous recombination within the Alu sequence, acquiring 3' sequences from an unrelated precursor genomic segment. Here the authors describe the identification of this putative precursor genomic segment. A human genomic library was screened by using the sequence of the 3' region of the GPB gene as a probe. The genomic clones isolated were found to contain an Alu sequence that appeared to be involved in the recombination. Downstream from the Alu sequence, the nucleotide sequence of the precursor genomic segment is almost identical to that of the GPB or GPE gene. In contrast, the upstream sequence of the genomic segment differs entirely from that of the GPA, GPB, and GPE genes. Conservation of the direct repeats flanking the Alu sequence of the genomic segment strongly suggests that the sequence of this genomic segment has been maintained during evolution. This genomic segment was found to reside downstream from the GPA gene by both gene mapping and in situ chromosomal localization. The precursor genomic segment was also identified in the orangutan genome, which is known to lack GPB and GPE genes. These results indicate that one of the duplicated ancestral glycophorin genes acquired a unique 3' sequence by unequal crossing-over through its Alu sequence and the further downstream Alu sequence present in the duplicated gene. Further duplication and divergence of this gene yielded the GPB and GPE genes. 37 refs., 5 figs.
Analysis and Functional Annotation of an Expressed Sequence Tag Collection for Tropical Crop Sugarcane

PubMed Central

Vettore, André L.; da Silva, Felipe R.; Kemper, Edson L.; Souza, Glaucia M.; da Silva, Aline M.; Ferro, Maria Inês T.; Henrique-Silva, Flavio; Giglioti, Éder A.; Lemos, Manoel V.F.; Coutinho, Luiz L.; Nobrega, Marina P.; Carrer, Helaine; França, Suzelei C.; Bacci, Maurício; Goldman, Maria Helena S.; Gomes, Suely L.; Nunes, Luiz R.; Camargo, Luis E.A.; Siqueira, Walter J.; Van Sluys, Marie-Anne; Thiemann, Otavio H.; Kuramae, Eiko E.; Santelli, Roberto V.; Marino, Celso L.; Targon, Maria L.P.N.; Ferro, Jesus A.; Silveira, Henrique C.S.; Marini, Danyelle C.; Lemos, Eliana G.M.; Monteiro-Vitorello, Claudia B.; Tambor, José H.M.; Carraro, Dirce M.; Roberto, Patrícia G.; Martins, Vanderlei G.; Goldman, Gustavo H.; de Oliveira, Regina C.; Truffi, Daniela; Colombo, Carlos A.; Rossi, Magdalena; de Araujo, Paula G.; Sculaccio, Susana A.; Angella, Aline; Lima, Marleide M.A.; de Rosa, Vicente E.; Siviero, Fábio; Coscrato, Virginia E.; Machado, Marcos A.; Grivet, Laurent; Di Mauro, Sonia M.Z.; Nobrega, Francisco G.; Menck, Carlos F.M.; Braga, Marilia D.V.; Telles, Guilherme P.; Cara, Frank A.A.; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo

2003-01-01

To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembly. This indicated that possibly 33,620 unique genes had been identified and that >90% of the sugarcane expressed genes were tagged. PMID:14613979
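The redundancy and full-length percentages above are simple ratios of the reported counts. The short Python check below recomputes them. The variable names are ours, and we assume the abstract's percentages are rounded to whole numbers.

```python
# Counts reported in the SUCEST abstract
assembled_transcripts = 43_141
unique_genes_after_reassembly = 33_620
full_length_clusters = 14_409

# Redundancy: fraction of the first assembly collapsed by reassembly
redundancy = 1 - unique_genes_after_reassembly / assembled_transcripts

# Share of assembled sequences containing at least one full-length cDNA clone
full_length_share = full_length_clusters / assembled_transcripts

print(f"redundancy ~ {redundancy:.1%}")
print(f"full-length share ~ {full_length_share:.1%}")
```

Run as written, it prints 22.1% and 33.4%. Both values are consistent with the rounded 22% and 33% given in the abstract.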
in silico Whole Genome Sequencer & Analyzer (iWGS): A Computational Pipeline to Guide the Design and Analysis of de novo Genome Sequencing Studies

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhou, Xiaofan; Peris, David; Kominek, Jacek

The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms has increased the complexity of de novo genome sequencing projects in nonmodel organisms. To reduce the costs and challenges in de novo genome sequencing projects and streamline their experimental design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to give the user great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS.
in silico Whole Genome Sequencer & Analyzer (iWGS): A Computational Pipeline to Guide the Design and Analysis of de novo Genome Sequencing Studies

DOE PAGES

Zhou, Xiaofan; Peris, David; Kominek, Jacek; ...

2016-09-16

The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms has increased the complexity of de novo genome sequencing projects in nonmodel organisms. To reduce the costs and challenges in de novo genome sequencing projects and streamline their experimental design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to give the user great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS.
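The last iWGS step is assembly evaluation. The Python sketch below shows N50, one contiguity statistic commonly used when evaluating assemblies. It is a generic illustration, not code taken from iWGS, and we have not verified which metrics iWGS itself reports. The function name `n50` is our own.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50 of an assembly: the length L such that contigs of
    length >= L together contain at least half of all assembled bases."""
    if not contig_lengths:
        raise ValueError("no contigs supplied")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the bases are covered
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


print(n50([100, 80, 50, 30, 20]))
```

For example, `n50([100, 80, 50, 30, 20])` returns 80. The two longest contigs hold 180 of the 280 bases, which passes the halfway point.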
Use of essential gene, encoding porphobilinogen deaminase from extreme psychrophilic Colwellia sp. C1, to generate temperature-sensitive strain of Francisella novicida.

PubMed

Pankowski, J A

2016-08-01

Previously, several essential genes from psychrophilic bacteria have been substituted for their homologues in mesophilic bacterial pathogens to make the latter temperature sensitive. It has been noted that an essential ligA gene from an extreme psychrophile, Colwellia sp. C1, yielded a gene product that is inactivated at 27°C, the lowest that has been observed for any psychrophilic enzyme, and it was hypothesized that other essential proteins of that strain would also have low inactivation temperatures. This work describes the partial sequencing of the genome of the Colwellia sp. C1 strain and the identification of 24 open reading frames encoding homologues of highly conserved bacterial essential genes. The gene encoding porphobilinogen deaminase (hemC), which is involved in the pathway of haem synthesis, has been tested for its ability to convert Francisella novicida into a temperature-sensitive strain. The hybrid strain carrying the C1-derived hemC gene exhibited a temperature-sensitive phenotype with a restrictive temperature of 36°C. These results support the conclusion that Colwellia sp. C1 is a rich source of heat-labile enzymes. The issue of biosafety is often raised when it comes to work with pathogenic organisms. The main concern is the risk of researchers being exposed to infectious doses of dangerous microbes. This paper analyses essential genes identified in the partial genomic sequence of the psychrophilic bacterium Colwellia sp. C1. These sequences can be used as a means of generating temperature-sensitive strains of pathogenic bacteria. Such strains are incapable of surviving at human body temperature. This means they could be applied as vaccines or for safer work with dangerous organisms. © 2016 The Society for Applied Microbiology.
Ultrasensitive Quantification of Hepatitis B Virus A1762T/G1764A Mutant by a SimpleProbe PCR Using a Wild-Type-Selective PCR Blocker and a Primer-Blocker-Probe Partial-Overlap Approach

PubMed Central

Nie, Hui; Evans, Alison A.; London, W. Thomas; Block, Timothy M.; Ren, Xiangdong David

2011-01-01

Hepatitis B virus (HBV) carrying the A1762T/G1764A double mutation in the basal core promoter (BCP) region is associated with HBe antigen seroconversion and increased risk of liver cirrhosis and hepatocellular carcinoma (HCC). Quantification of the mutant viruses may help in predicting the risk of HCC. However, the viral genome tends to have nucleotide polymorphism, which makes it difficult to design hybridization-based assays including real-time PCR. Ultrasensitive quantification of the mutant viruses at the early developmental stage is even more challenging, as the mutant is masked by excessive amounts of the wild-type (WT) viruses. In this study, we developed a selective inhibitory PCR (siPCR) using a locked nucleic acid-based PCR blocker to selectively inhibit the amplification of the WT viral DNA but not the mutant DNA. At the end of siPCR, the proportion of the mutant could be increased by about 10,000-fold, making the mutant more readily detectable by downstream applications such as real-time PCR and DNA sequencing. We also describe a primer-probe partial overlap approach which significantly simplified the melting curve patterns and minimized the influence of viral genome polymorphism on assay accuracy. Analysis of 62 patient samples showed a complete match of the melting curve patterns with the sequencing results. More than 97% of HBV BCP sequences in the GenBank database can be correctly identified by the melting curve analysis. The combination of siPCR and the SimpleProbe real-time PCR enabled mutant quantification in the presence of a 100,000-fold excess of the WT DNA. PMID:21562108
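The benefit of the roughly 10,000-fold enrichment can be seen with simple arithmetic. The Python sketch below assumes that the blocker acts as a plain multiplicative enrichment of the mutant:WT ratio. This is a simplification we introduce for illustration, not the authors' kinetic model, and `mutant_fraction` is a hypothetical helper.

```python
def mutant_fraction(wt_per_mutant: float, enrichment: float = 1.0) -> float:
    """Fraction of mutant template when there are `wt_per_mutant` wild-type
    copies per mutant copy, after the mutant:WT ratio is multiplied by
    `enrichment` (assumed simple multiplicative model)."""
    ratio = enrichment / wt_per_mutant  # mutant copies per wild-type copy
    return ratio / (1.0 + ratio)


before = mutant_fraction(100_000)          # mutant in a 100,000-fold WT excess
after = mutant_fraction(100_000, 10_000)   # after ~10,000-fold enrichment
print(f"before siPCR: {before:.4%}, after siPCR: {after:.1%}")
```

Under this model, a 1-in-100,000 mutant (about 0.001%) becomes roughly 1 template in 11 (about 9%). A fraction of that size is within reach of downstream real-time PCR or sequencing.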
Visualization of genome signatures of eukaryote genomes by batch-learning self-organizing map with a special emphasis on Drosophila genomes.

PubMed

Abe, Takashi; Hamano, Yuta; Ikemura, Toshimichi

2014-01-01

A strategy of evolutionary studies that can compare vast numbers of genome sequences is becoming increasingly important with the remarkable progress of high-throughput DNA sequencing methods. We previously established a sequence alignment-free clustering method "BLSOM" for di-, tri-, and tetranucleotide compositions in genome sequences, which can characterize sequence characteristics (genome signatures) of a wide range of species. In the present study, we generated BLSOMs for tetra- and pentanucleotide compositions in approximately one million sequence fragments derived from 101 eukaryotes, for which almost complete genome sequences were available. BLSOM recognized phylotype-specific characteristics (e.g., key combinations of oligonucleotide frequencies) in the genome sequences, permitting phylotype-specific clustering of the sequences without any information regarding the species. In our detailed examination of 12 Drosophila species, we observed a correlation between their phylogenetic classification and their classification on the BLSOMs, and the BLSOMs visualized the oligonucleotides diagnostic for species-specific clustering.
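BLSOM takes as input a vector of oligonucleotide frequencies for each sequence fragment. The Python sketch below builds such a vector. It covers only the composition step; the self-organizing map itself is not shown. Skipping windows that contain ambiguous bases such as N is our own assumption, and `oligo_composition` is a hypothetical name.

```python
from collections import Counter
from itertools import product


def oligo_composition(seq: str, k: int = 4) -> dict[str, float]:
    """Relative frequencies of all 4**k oligonucleotides in `seq`.
    Windows containing non-ACGT characters (e.g. N) are skipped."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values()) or 1  # avoid division by zero
    # Fixed key order, so every fragment maps to a vector of the same length
    return {"".join(p): counts["".join(p)] / total
            for p in product("ACGT", repeat=k)}


freqs = oligo_composition("ACGTACGTNACGT", k=4)
print(len(freqs), freqs["ACGT"])
```

Each fragment becomes a 256-dimensional vector for k = 4, or 1024-dimensional for k = 5. In the example, 3 of the 6 valid windows are ACGT, so its frequency is 0.5.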
The diploid genome sequence of an Asian individual

PubMed Central

Wang, Jun; Wang, Wei; Li, Ruiqiang; Li, Yingrui; Tian, Geng; Goodman, Laurie; Fan, Wei; Zhang, Junqing; Li, Jun; Zhang, Juanbin; Guo, Yiran; Feng, Binxiao; Li, Heng; Lu, Yao; Fang, Xiaodong; Liang, Huiqing; Du, Zhenglin; Li, Dong; Zhao, Yiqing; Hu, Yujie; Yang, Zhenzhen; Zheng, Hancheng; Hellmann, Ines; Inouye, Michael; Pool, John; Yi, Xin; Zhao, Jing; Duan, Jinjie; Zhou, Yan; Qin, Junjie; Ma, Lijia; Li, Guoqing; Yang, Zhentao; Zhang, Guojie; Yang, Bin; Yu, Chang; Liang, Fang; Li, Wenjie; Li, Shaochuan; Li, Dawei; Ni, Peixiang; Ruan, Jue; Li, Qibin; Zhu, Hongmei; Liu, Dongyuan; Lu, Zhike; Li, Ning; Guo, Guangwu; Zhang, Jianguo; Ye, Jia; Fang, Lin; Hao, Qin; Chen, Quan; Liang, Yu; Su, Yeyang; san, A.; Ping, Cuo; Yang, Shuang; Chen, Fang; Li, Li; Zhou, Ke; Zheng, Hongkun; Ren, Yuanyuan; Yang, Ling; Gao, Yang; Yang, Guohua; Li, Zhuo; Feng, Xiaoli; Kristiansen, Karsten; Wong, Gane Ka-Shu; Nielsen, Rasmus; Durbin, Richard; Bolund, Lars; Zhang, Xiuqing; Li, Songgang; Yang, Huanming; Wang, Jian

2009-01-01

Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics. PMID:18987735
Snake Genome Sequencing: Results and Future Prospects

PubMed Central

Kerkkamp, Harald M. I.; Kini, R. Manjunatha; Pospelov, Alexey S.; Vonk, Freek J.; Henkel, Christiaan V.; Richardson, Michael K.

2016-01-01

Snake genome sequencing is in its infancy—very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression. PMID:27916957
Snake Genome Sequencing: Results and Future Prospects.

PubMed

Kerkkamp, Harald M I; Kini, R Manjunatha; Pospelov, Alexey S; Vonk, Freek J; Henkel, Christiaan V; Richardson, Michael K

2016-12-01

Snake genome sequencing is in its infancy, very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression.
Sequencing and comparative genomic analysis of 1227 Felis catus cDNA sequences enriched for developmental, clinical and nutritional phenotypes

PubMed Central

2012-01-01

Background: The feline genome is valuable to the veterinary and model organism genomics communities because the cat is an obligate carnivore and a model for endangered felids. The initial public release of the Felis catus genome assembly provided a framework for investigating the genomic basis of feline biology. However, the entire set of protein coding genes has not been elucidated. Results: We identified and characterized 1227 protein coding feline sequences, of which 913 map to public sequences and 314 are novel. These sequences have been deposited into NCBI's GenBank database and complement public genomic resources by providing additional protein coding sequences that fill in some of the gaps in the feline genome assembly. Through functional and comparative genomic analyses, we gained an understanding of the role of these sequences in feline development, nutrition and health. Specifically, we identified 104 orthologs of human genes associated with Mendelian disorders. We detected negative selection within sequences with gene ontology annotations associated with intracellular trafficking, cytoskeleton and muscle functions. We detected relatively less negative selection on protein sequences encoding extracellular networks, apoptotic pathways and mitochondrial gene ontology annotations. Additionally, we characterized feline cDNA sequences that have mouse orthologs associated with clinical, nutritional and developmental phenotypes. Together, this analysis provides an overview of the value of our cDNA sequences and enhances our understanding of how the feline genome is similar to, and different from, other mammalian genomes. Conclusions: The cDNA sequences reported here expand existing feline genomic resources by providing high-quality sequences annotated with comparative genomic information providing functional, clinical, nutritional and orthologous gene information. PMID:22257742
Detection of a novel herpesvirus from bats in the Philippines.

PubMed

Sano, Kaori; Okazaki, Sachiko; Taniguchi, Satoshi; Masangkay, Joseph S; Puentespina, Roberto; Eres, Eduardo; Cosico, Edison; Quibod, Niña; Kondo, Taisuke; Shimoda, Hiroshi; Hatta, Yuuki; Mitomo, Shumpei; Oba, Mami; Katayama, Yukie; Sassa, Yukiko; Furuya, Tetsuya; Nagai, Makoto; Une, Yumi; Maeda, Ken; Kyuwa, Shigeru; Yoshikawa, Yasuhiro; Akashi, Hiroomi; Omatsu, Tsutomu; Mizutani, Tetsuya

2015-08-01

Bats are natural hosts of many zoonotic viruses. Monitoring bat viruses is important to detect novel bat-borne infectious diseases. In this study, next generation sequencing techniques and conventional PCR were used to analyze intestine, lung, and blood clot samples collected from wild bats captured at three locations in the Davao region of the Philippines in 2012. Different viral genes belonging to the Retroviridae and Herpesviridae families were identified using next generation sequencing. The presence of herpesvirus in the samples was confirmed by PCR using herpesvirus consensus primers. The resulting PCR amplicons were 166 bp long. Further phylogenetic analysis indicated that the virus from which this nucleotide sequence was obtained belonged to the Gammaherpesvirinae subfamily. PCR using primers specific to the nucleotide sequence obtained revealed that the infection rate among the captured bats was 30 %. In this study, we present the partial genome of a novel gammaherpesvirus detected in wild bats. Our observations also indicate that this herpesvirus may be widely distributed in bat populations in the Davao region.
Whole Genome Complete Resequencing of Bacillus subtilis Natto by Combining Long Reads with High-Quality Short Reads

PubMed Central

Kamada, Mayumi; Hase, Sumitaka; Sato, Kengo; Toyoda, Atsushi; Fujiyama, Asao; Sakakibara, Yasubumi

2014-01-01

De novo microbial genome sequencing reached a turning point with third-generation sequencing (TGS) platforms, and several microbial genomes have been improved by TGS long reads. Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168, and it is used in the production of the traditional Japanese fermented food “natto.” The B. subtilis natto BEST195 genome was previously sequenced with short reads, but it included some incomplete regions. We resequenced the BEST195 genome using a PacBio RS sequencer and obtained a complete genome sequence as a single gap-free scaffold; we also applied Illumina MiSeq short reads to enhance its quality. Compared with the previous BEST195 draft genome and the Marburg 168 genome, we found that incomplete regions in the previous genome sequence were attributable to GC bias and repetitive sequences, and we also identified some novel genes that are found only in the new genome. PMID:25329997
GeNemo: a search engine for web-based functional genomic data.

PubMed

Zhang, Yongqing; Cao, Xiaoyi; Zhong, Sheng

2016-07-08

A set of new data types has emerged from functional genomic assays, including ChIP-seq, DNase-seq, FAIRE-seq and others. The results are typically stored as genome-wide intensities (WIG/bigWig files) or functional genomic regions (peak/BED files). These data types present new challenges to big data science. Here, we present GeNemo, a web-based search engine for functional genomic data. GeNemo searches user-input data against online functional genomic datasets, including the entire collection of ENCODE and mouse ENCODE datasets. Unlike text-based search engines, GeNemo's searches are based on pattern matching of functional genomic regions. This distinguishes GeNemo from text or DNA sequence searches. The user can input any complete or partial functional genomic dataset, for example, a binding intensity file (bigWig) or a peak file. GeNemo reports any genomic regions, ranging from hundreds to hundreds of thousands of bases, from any of the online ENCODE datasets that share similar functional (binding, modification, accessibility) patterns. This is enabled by a Markov Chain Monte Carlo-based maximization process, executed on up to 24 parallel computing threads. By clicking on a search result, the user can visually compare her/his data with the found datasets and navigate the identified genomic regions. GeNemo is available at www.genemo.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
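GeNemo compares datasets by their genomic regions rather than by text or sequence. The Python sketch below makes that idea concrete with a much simpler, swapped-in measure: the base-pair Jaccard index between two peak sets on one chromosome. It is not GeNemo's MCMC-based search. The function names are hypothetical, and intervals are assumed to be half-open [start, end) coordinates as in BED files.

```python
from typing import Iterable

Interval = tuple[int, int]  # half-open [start, end) on one chromosome


def merge(intervals: Iterable[Interval]) -> list[list[int]]:
    """Merge overlapping or touching intervals into sorted, disjoint ones."""
    merged: list[list[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bp(intervals: list[list[int]]) -> int:
    """Total number of bases covered by disjoint intervals."""
    return sum(end - start for start, end in intervals)


def region_jaccard(a: Iterable[Interval], b: Iterable[Interval]) -> float:
    """Base-pair Jaccard index (intersection / union) of two peak sets."""
    ma, mb = merge(a), merge(b)
    i = j = inter = 0
    # Two-pointer sweep over the sorted, disjoint intervals of each set
    while i < len(ma) and j < len(mb):
        lo = max(ma[i][0], mb[j][0])
        hi = min(ma[i][1], mb[j][1])
        inter += max(0, hi - lo)
        # Advance whichever interval ends first
        if ma[i][1] < mb[j][1]:
            i += 1
        else:
            j += 1
    union = covered_bp(ma) + covered_bp(mb) - inter
    return inter / union if union else 0.0
```

For example, peaks [(0, 100), (200, 300)] and [(50, 150), (250, 260)] share 60 bp out of a 250 bp union, giving 0.24.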
Characterization of Transposable Elements in Laccaria bicolor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Labbe, Jessy L; Murat, Claude; Morin, Emmanuelle

2012-01-01

Background: The publicly available Laccaria bicolor genome sequence has provided a considerable genomic resource allowing systematic identification of transposable elements (TEs) in this symbiotic ectomycorrhizal fungus. Using a TE-specific annotation pipeline we have characterized and analyzed TEs in the L. bicolor S238N-H82 genome. Methodology/Principal Findings: TEs occupy 24% of the 60 Mb L. bicolor genome and represent 25,787 full-length and partial copies of elements distributed within 172 families. The most abundant elements were Copia-like. TEs are not randomly distributed across the genome, but are tightly nested or clustered. Most TEs are ancient, except for some terminal inverted repeats (TIRs), long terminal repeats (LTRs) and a large retrotransposon derivative (LARD) element. There were three main periods of TE expansion in L. bicolor: the first from 57 to 10 Mya, the second from 5 to 1 Mya and the most recent from 500,000 years ago until now. LTR retrotransposons are closely related to retrotransposons found in another basidiomycete, Coprinopsis cinerea. Conclusions: This analysis represents an initial characterization of TEs in the L. bicolor genome, contributes to genome assembly and to a greater understanding of the role TEs played in genome organization and evolution, and provides a valuable resource for the ongoing Laccaria Pan-Genome project supported by the U.S.-DOE Joint Genome Institute.

Sequencing intractable DNA to close microbial genomes.

PubMed

Hurt, Richard A; Brown, Steven D; Podar, Mircea; Palumbo, Anthony V; Elias, Dwayne A

2012-01-01

Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult-to-sequence regions in microbial genomes are ruled "intractable", resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC-rich secondary structures, making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.
Genomics and privacy: implications of the new reality of closed data for the field.

PubMed

Greenbaum, Dov; Sboner, Andrea; Mu, Xinmeng Jasmine; Gerstein, Mark

2011-12-01

Open source and open data have been driving forces in bioinformatics in the past. However, privacy concerns may soon change the landscape, limiting future access to important data sets, including personal genomics data. Here we survey this situation in some detail, describing, in particular, how the large scale of the data from personal genomic sequencing makes it especially hard to share data, exacerbating the privacy problem. We also go over various aspects of genomic privacy: first, there is basic identifiability of subjects having their genome sequenced. However, even for individuals who have consented to be identified, there is the prospect of very detailed future characterization of their genotype, which, unanticipated at the time of their consent, may be more personal and invasive than the release of their medical records. We go over various computational strategies for dealing with the issue of genomic privacy. One can "slice" and reformat datasets to allow them to be partially shared while securing the most private variants. This is particularly applicable to functional genomics information, which can be largely processed without variant information. For handling the most private data there are a number of legal and technological approaches, for example, modifying the informed consent procedure to acknowledge that privacy cannot be guaranteed, and/or employing a secure cloud computing environment. Cloud computing in particular may allow access to the data in a more controlled fashion than the current practice of downloading and computing on large datasets. Furthermore, it may be particularly advantageous for small labs, given that the burden of many privacy issues falls disproportionately on them in comparison to large corporations and genome centers. Finally, we discuss how education of future genetics researchers will be important, with curricula emphasizing privacy and data security.
However, teaching personal genomics with identifiable subjects in the university setting will, in turn, create additional privacy issues and social conundrums. © 2011 Greenbaum et al.
Insights from 20 years of bacterial genome sequencing

DOE PAGES

Land, Miriam L.; Hauser, Loren; Jun, Se-Ran; ...

2015-02-27

Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Bacterial genome sequencing is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90 % of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methylation sites along the entire chromosome. The diversity of bacterial communities is extensive, as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and when multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families.
Why do we care about bacterial genome sequencing? There are many practical applications, such as genome-scale metabolic modeling, biosurveillance, bioforensics, and infectious disease epidemiology. In the near future, high-throughput sequencing of patient metagenomic samples could revolutionize medicine in terms of speed and accuracy of finding pathogens and knowing how to treat them.
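The pan- and core-genome calculation mentioned above reduces to set operations once each genome has been summarized as a set of gene-family identifiers. The Python sketch below assumes that gene clustering into families, the hard part, has already been done; it is not shown. The function name and family IDs are hypothetical.

```python
def pan_and_core(genomes: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (pan genome, core genome) from gene-family IDs per genome.

    pan  = families present in at least one genome (union)
    core = families present in every genome (intersection)
    """
    if not genomes:
        return set(), set()
    family_sets = list(genomes.values())
    pan = set().union(*family_sets)
    core = set.intersection(*family_sets)
    return pan, core


# Hypothetical gene-family IDs for three genomes of one species
genomes = {
    "genome_1": {"fam_A", "fam_B", "fam_C"},
    "genome_2": {"fam_A", "fam_B", "fam_D"},
    "genome_3": {"fam_A", "fam_B", "fam_E"},
}
pan, core = pan_and_core(genomes)
print(len(pan), sorted(core))
```

The toy example has 5 families in the pan genome and 2 (fam_A, fam_B) in the core genome. Applied to real data, the same union and intersection yield figures like the roughly 89,000 and 3100 families reported for E. coli.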
Insights from 20 years of bacterial genome sequencing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Land, Miriam L.; Hauser, Loren; Jun, Se-Ran

Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genome sequences is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date,more » there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90 % of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methlylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families. 
Why do we care about bacterial genome sequencing? There are many practical applications, such as genome-scale metabolic modeling, biosurveillance, bioforensics, and infectious disease epidemiology. In the near future, high-throughput sequencing of patient metagenomic samples could revolutionize medicine in terms of speed and accuracy of finding pathogens and knowing how to treat them.
RefSeq microbial genomes database: new representation and annotation strategy.

PubMed

Tatusova, Tatiana; Ciufo, Stacy; Fedorov, Boris; O'Neill, Kathleen; Tolstoy, Igor

2014-01-01

The source of the microbial genomic sequences in the RefSeq collection is the set of primary sequence records submitted to the International Nucleotide Sequence Database public archives. These can be accessed through the Entrez search and retrieval system at http://www.ncbi.nlm.nih.gov/genome. Next-generation sequencing has enabled researchers to perform genomic sequencing at rates that were unimaginable in the past. Microbial genomes can now be sequenced in a matter of hours, which has led to a significant increase in the number of assembled genomes deposited in the public archives. This huge increase in DNA sequence data presents new challenges for bioinformatics tools for annotation, analysis and visualization. New strategies have been developed for the annotation and representation of reference genomes and sequence variations derived from population studies and clinical outbreaks.
Gene calling and bacterial genome annotation with BG7.

PubMed

Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo

2015-01-01

New high-throughput sequencing technologies are providing many bacterial genome sequences from diverse taxa, but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key point to investigate in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of non-protein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and imperfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences, which are the elements directly related to biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. The tool is tolerant of sequencing errors, maintaining its capabilities for the annotation of highly fragmented genomes or of mixed sequences coming from several genomes (such as those obtained from metagenomic samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure completely based on cloud computing (Amazon Web Services).
Characterization of the glutathione S-transferase gene family through ESTs and expression analyses within common and pigmented cultivars of Citrus sinensis (L.) Osbeck.

PubMed

Licciardello, Concetta; D'Agostino, Nunzio; Traini, Alessandra; Recupero, Giuseppe Reforgiato; Frusciante, Luigi; Chiusano, Maria Luisa

2014-02-03

Glutathione S-transferases (GSTs) represent a ubiquitous gene family encoding detoxification enzymes able to recognize reactive electrophilic xenobiotic molecules as well as compounds of endogenous origin. Anthocyanin pigments require GSTs for their transport into the vacuole since their cytoplasmic retention is toxic to the cell. Anthocyanin accumulation in Citrus sinensis (L.) Osbeck fruit flesh determines different phenotypes affecting the typical pigmentation of Sicilian blood oranges. In this paper we describe: i) the characterization of the GST gene family in C. sinensis through a systematic EST analysis; ii) the validation of the EST assembly by exploiting the genome sequences of C. sinensis and C. clementina and their genome annotations; iii) GST gene expression profiling in six tissues/organs and in two different sweet orange cultivars, Cadenera (common) and Moro (pigmented). We identified 61 GST transcripts, described the full- or partial-length nature of the sequences and assigned each sequence to a GST class, exploiting a comparative approach and the classification scheme proposed for plant species. A total of 23 full-length sequences were defined. Fifty-four of the 61 transcripts were successfully aligned to the C. sinensis and C. clementina genomes. Tissue-specific expression profiling demonstrated that the expression of some GST transcripts was 'tissue-affected' and cultivar specific. A comparative analysis of C. sinensis GSTs with those from other plant species was also considered. Data from the current analysis are accessible at http://biosrv.cab.unina.it/citrusGST/, with the aim of providing a reference resource for C. sinensis GSTs. This study aimed at the characterization of the GST gene family in C. sinensis.
Based on expression patterns from two different cultivars and on sequence-comparative analyses, we also highlighted that two sequences, a Phi class GST and a Mapeg class GST, could be involved in the conjugation of anthocyanin pigments and in their transport into the vacuole, specifically in fruit flesh of the pigmented cultivar.
Characterization of a prototype strain of hepatitis E virus.

PubMed Central

Tsarev, S A; Emerson, S U; Reyes, G R; Tsareva, T S; Legters, L J; Malik, I A; Iqbal, M; Purcell, R H

1992-01-01

A strain of hepatitis E virus (SAR-55) implicated in an epidemic of enterically transmitted non-A, non-B hepatitis, now called hepatitis E, was characterized extensively. Six cynomolgus monkeys (Macaca fascicularis) were infected with a strain of hepatitis E virus from Pakistan. Reverse transcription-polymerase chain reaction was used to determine the pattern of virus shedding in feces, bile, and serum relative to hepatitis and induction of specific antibodies. Virtually the entire genome of SAR-55 (7195 nucleotides) was sequenced. Comparison of the sequence of SAR-55 with that of a Burmese strain revealed a high level of homology except for one region encoding 100 amino acids of a putative nonstructural polyprotein. This region was identified as hypervariable by partial sequencing of a third hepatitis E virus isolate, from Kirgizia. PMID:1731327
MHC class I loci of the Bar-Headed goose (Anser indicus)

PubMed Central

2010-01-01

MHC class I proteins mediate functions in anti-pathogen defense. MHC diversity has already been investigated in many model avian species, but here we chose the bar-headed goose, a worldwide migrant bird, as a non-model avian species. Sequences from exons encoding the peptide-binding region (PBR) of MHC class I molecules were isolated from liver genomic DNA to investigate variation in these genes. These are the first MHC class I partial sequences of the bar-headed goose to be reported. A preliminary analysis suggests the presence of at least four MHC class I genes, which share high similarity with those of the goose and duck. A phylogenetic analysis of bar-headed goose, goose and duck MHC class I sequences using the neighbor-joining (NJ) method supports the idea that they all cluster within the Anseriformes clade. PMID:21637434
Genetic variation and dynamics of infections of equid herpesvirus 5 in individual horses.

PubMed

Back, Helena; Ullman, Karin; Leijon, Mikael; Söderlund, Robert; Penell, Johanna; Ståhl, Karl; Pringle, John; Valarcher, Jean-François

2016-01-01

Equid herpesvirus 5 (EHV-5) is related to the human Epstein-Barr virus (human herpesvirus 4) and has frequently been observed in equine populations worldwide. EHV-5 was previously assumed to be of low to no pathogenicity; however, studies have also related the virus to the severe lung disease equine multinodular pulmonary fibrosis (EMPF). Genetic information on EHV-5 is scarce: the whole genome was only recently described, and few nucleotide sequences are available. In this study, samples were taken twice, 1 year apart, from eight healthy horses at the same professional training yard, and pre- and post-mortem from a ninth horse diagnosed with EMPF; the partial glycoprotein B (gB) gene of EHV-5 in these samples was analysed by next-generation sequencing. The analysis resulted in 27 partial gB gene sequences, 11 unique sequence types and five amino acid sequences. These sequences could be classified within four genotypes (I-IV) of the EHV-5 gB gene based on the degree of similarity of the nucleotide and amino acid sequences, and individual horses were shown to be infected with up to three different genotypes simultaneously. The observations showed a range of interactions between EHV-5 and the host over time: the same virus persists in some horses, whereas others have a more dynamic infection pattern that includes strains from different genotypes. This study provides insight into the genetic variation and dynamics of EHV-5, and highlights that further work is needed to understand the EHV-5 interaction with its host.
Multiple independent origins of mitochondrial control region duplications in the order Psittaciformes

PubMed Central

Schirtzinger, Erin E.; Tavares, Erika S.; Gonzales, Lauren A.; Eberhard, Jessica R.; Miyaki, Cristina Y.; Sanchez, Juan J.; Hernandez, Alexis; Müeller, Heinrich; Graves, Gary R.; Fleischer, Robert C.; Wright, Timothy F.

2012-01-01

Mitochondrial genomes are generally thought to be under selection for compactness, due to their small size, consistent gene content, and a lack of introns or intergenic spacers. As more animal mitochondrial genomes are fully sequenced, rearrangements and partial duplications are being identified with increasing frequency, particularly in birds (Class Aves). In this study, we investigate the evolutionary history of mitochondrial control region states within the avian order Psittaciformes (parrots and cockatoos). To this aim, we reconstructed a comprehensive multi-locus phylogeny of parrots, used PCR of three diagnostic fragments to classify the mitochondrial control region state as single or duplicated, and mapped these states onto the phylogeny. We further sequenced 44 selected species to validate these inferences of control region state. Ancestral state reconstruction using a range of weighting schemes identified six independent origins of mitochondrial control region duplications within Psittaciformes. Analysis of sequence data showed that varying levels of mitochondrial gene and tRNA homology and degradation were present within a given clade exhibiting duplications. Levels of divergence between control regions within an individual varied from 0–10.9% with the differences occurring mainly between 51 and 225 nucleotides 3′ of the goose hairpin in domain I. Further investigations into the fates of duplicated mitochondrial genes, the potential costs and benefits of having a second control region, and the complex relationship between evolutionary rates, selection, and time since duplication are needed to fully explain these patterns in the mitochondrial genome. PMID:22543055
Widespread Site-Dependent Buffering of Human Regulatory Polymorphism

PubMed Central

Kutyavin, Tanya; Stamatoyannopoulos, John A.

2012-01-01

The average individual is expected to harbor thousands of variants within non-coding genomic regions involved in gene regulation. However, it is currently not possible to interpret reliably the functional consequences of genetic variation within any given transcription factor recognition sequence. To address this, we comprehensively analyzed heritable genome-wide binding patterns of a major sequence-specific regulator (CTCF) in relation to genetic variability in binding site sequences across a multi-generational pedigree. We localized and quantified CTCF occupancy by ChIP-seq in 12 related and unrelated individuals spanning three generations, followed by comprehensive targeted resequencing of the entire CTCF–binding landscape across all individuals. We identified hundreds of variants with reproducible quantitative effects on CTCF occupancy (both positive and negative). While these effects paralleled protein–DNA recognition energetics when averaged, they were extensively buffered by striking local context dependencies. In the significant majority of cases buffering was complete, resulting in silent variants spanning every position within the DNA recognition interface irrespective of level of binding energy or evolutionary constraint. The prevalence of complex partial or complete buffering effects severely constrained the ability to predict reliably the impact of variation within any given binding site instance. Surprisingly, 40% of variants that increased CTCF occupancy occurred at positions of human–chimp divergence, challenging the expectation that the vast majority of functional regulatory variants should be deleterious. Our results suggest that, even in the presence of “perfect” genetic information afforded by resequencing and parallel studies in multiple related individuals, genomic site-specific prediction of the consequences of individual variation in regulatory DNA will require systematic coupling with empirical functional genomic measurements. PMID:22457641
Further characterization of a new recombinant group of Plum pox virus isolates, PPV-T, found in orchards in the Ankara province of Turkey.

PubMed

Serçe, Ciğdem Ulubaş; Candresse, Thierry; Svanella-Dumas, Laurence; Krizbai, Laszlo; Gazel, Mona; Cağlayan, Kadriye

2009-06-01

Sixteen Plum pox virus (PPV) isolates collected in the Ankara region of Turkey were analyzed using available serological and molecular typing assays. Surprisingly, although all isolates except one (a mixed infection) were typed as belonging to the PPV-M strain in four independent molecular assays, nine of them (60%) reacted with both PPV-M-specific and PPV-D-specific monoclonal antibodies. Partial 5' and 3' genomic sequence analysis of four isolates demonstrated that, irrespective of their reactivity towards the PPV-D-specific monoclonal antibody, they were all closely related to a recombinant PPV isolate from Turkey, Ab-Tk. All three isolates for which the relevant genomic sequence was obtained showed the same recombination event as Ab-Tk in the HC-Pro gene, around position 1566 of the genome. Complete genomic sequencing of Ab-Tk did not provide evidence for additional recombination events in its evolutionary history. Taken together, these results indicate that a group of closely related PPV isolates characterized by a unique recombination in the HC-Pro gene is prevalent under field conditions in the Ankara region of Turkey. As with the PPV-Rec strain, we suggest that these isolates represent a novel strain of PPV and propose the name PPV-T (Turkey). Given that PPV-T isolates cannot be identified by currently available typing techniques, their presence may have been overlooked in other situations. Further efforts should allow a precise description of their prevalence and of their geographical distribution in Turkey and, possibly, in other countries.
Genetic variation in Pythium myriotylum based on SNP typing and development of a PCR-RFLP detection of isolates recovered from Pythium soft rot ginger.

PubMed

Le, D P; Smith, M K; Aitken, E A B

2017-10-01

Pythium myriotylum is responsible for severe losses in both capsicum and ginger crops in Australia under different regimes. Intraspecific genomic variation within the pathogen might explain the differences in aggressiveness and pathogenicity on diverse hosts. In this study, whole-genome data of four P. myriotylum isolates recovered from three hosts and one Pythium zingiberis isolate were derived and analysed for sequence diversity based on single nucleotide polymorphisms (SNPs). A higher number of true and unique SNPs occurred in P. myriotylum isolates obtained from ginger with symptoms of Pythium soft rot (PSR) in Australia than in other P. myriotylum isolates. Overall, more SNPs were found in the mitochondrial genome than in the nuclear genome. Among the SNPs, a single cytosine (C) to thymine (T) substitution in the partially sequenced CoxII gene of 14 representative PSR P. myriotylum isolates fell within a HinP1I restriction site. This site was used in a PCR-RFLP assay to detect and identify the isolates without sequencing. The PCR-RFLP was also sensitive enough to detect PSR P. myriotylum strains in artificially infected ginger without the need to isolate pure cultures. This is the first study of intraspecific variants of Pythium myriotylum isolates recovered from different hosts and origins based on single nucleotide polymorphism (SNP) genotyping of multiple genes. The SNPs discovered provide valuable markers for detection and identification of P. myriotylum strains initially isolated from Pythium soft rot (PSR) ginger by using PCR-RFLP of the CoxII locus. The PCR-RFLP was also sensitive enough to detect P. myriotylum directly in PSR ginger sampled from pot trials without the need to isolate pure cultures. © 2017 The Society for Applied Microbiology.
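The PCR-RFLP principle described above depends only on whether a restriction site is present in the amplicon. HinP1I recognizes the palindromic sequence GCGC and cuts after the first G (G^CGC), so a C-to-T change inside that site abolishes cutting. The Python sketch below predicts fragment lengths for a complete digest of a linear amplicon. The short sequences are hypothetical toy examples, not the actual CoxII amplicon, which the abstract does not give. The abstract also does not state which allele carries the site.

```python
import re

def hinp1i_fragments(amplicon: str) -> list[int]:
    """Predict fragment lengths from a complete HinP1I digest of a linear amplicon.

    HinP1I recognizes GCGC and cuts G^CGC. GCGC is palindromic, so scanning
    the top strand alone finds every site.
    """
    seq = amplicon.upper()
    # Zero-width lookahead so overlapping sites (e.g. GCGCGC) are all reported;
    # the cut lies 1 nt after the start of each site.
    cuts = [m.start() + 1 for m in re.finditer(r"(?=GCGC)", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical toy alleles differing by one C->T substitution in the site.
print(hinp1i_fragments("AAAAAGCGCAAAAA"))  # prints [6, 8]  (site present: cut)
print(hinp1i_fragments("AAAAAGTGCAAAAA"))  # prints [14]    (site lost: uncut)
```

On a gel, the two alleles would be distinguished by fragment pattern alone. This is why the assay needs no sequencing once the diagnostic SNP is known.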
PCR Amplification Strategies towards full-length HIV-1 Genome sequencing.

PubMed

Liu, Chao Chun; Ji, Hezhao

2018-06-26

The advent of next-generation sequencing has enabled greater resolution of viral diversity and improved the feasibility of full viral genome sequencing, allowing routine HIV-1 full-genome sequencing in both research and diagnostic settings. Regardless of the sequencing platform selected, successful PCR amplification of the HIV-1 genome is essential for sequencing template preparation. As such, full HIV-1 genome amplification is a crucial step in determining whether downstream sequencing succeeds reliably. Here we review existing PCR protocols leading to HIV-1 full-genome sequencing. In addition to discussing basic considerations for relevant PCR design, we review the advantages as well as the pitfalls of published protocols. Copyright © Bentham Science Publishers.
Whole genome sequence analysis of BT-474 using Complete Genomics' standard and long fragment read technologies.

PubMed

Ciotlos, Serban; Mao, Qing; Zhang, Rebecca Yu; Li, Zhenyu; Chin, Robert; Gulbahce, Natali; Liu, Sophie Jia; Drmanac, Radoje; Peters, Brock A

2016-01-01

The cell line BT-474 is a popular cell line for studying the biology of cancer and developing novel drugs. However, there is no complete, published genome sequence for this highly utilized scientific resource. In this study we sought to provide a comprehensive and useful data set for the scientific community by generating a whole genome sequence for BT-474. Five μg of genomic DNA, isolated from an early passage of the BT-474 cell line, was used to generate a whole genome sequence (114X coverage) using Complete Genomics' standard sequencing process. To provide additional variant phasing and structural variation data we also processed and analyzed two separate libraries of 5 and 6 individual cells to depths of 99X and 87X, respectively, using Complete Genomics' Long Fragment Read (LFR) technology. BT-474 is a highly aneuploid cell line with an extremely complex genome sequence. This ~300X total coverage genome sequence provides a more complete understanding of this highly utilized cell line at the genomic level.
The VirusBanker database uses a Java program to allow flexible searching through Bunyaviridae sequences

PubMed Central

Fourment, Mathieu; Gibbs, Mark J

2008-01-01

Background Viruses of the Bunyaviridae have segmented negative-stranded RNA genomes and several of them cause significant disease. Many partial sequences have been obtained from the segments so that GenBank searches give complex results. Sequence databases usually use HTML pages to mediate remote sorting, but this approach can be limiting and may discourage a user from exploring a database. Results The VirusBanker database contains Bunyaviridae sequences and alignments and is presented as two spreadsheets generated by a Java program that interacts with a MySQL database on a server. Sequences are displayed in rows and may be sorted using information that is displayed in columns and includes data relating to the segment, gene, protein, species, strain, sequence length, terminal sequence and date and country of isolation. Bunyaviridae sequences and alignments may be downloaded from the second spreadsheet with titles defined by the user from the columns, or viewed when passed directly to the sequence editor, Jalview. Conclusion VirusBanker allows large datasets of aligned nucleotide and protein sequences from the Bunyaviridae to be compiled and winnowed rapidly using criteria that are formulated heuristically. PMID:18251994
Highly effective sequencing whole chloroplast genomes of angiosperms by nine novel universal primer pairs.

PubMed

Yang, Jun-Bo; Li, De-Zhu; Li, Hong-Tao

2014-09-01

Chloroplast genomes supply indispensable information that helps improve phylogenetic resolution, and they can even serve as organelle-scale barcodes. Next-generation sequencing technologies have helped promote sequencing of complete chloroplast genomes, but compared with the number of angiosperms, relatively few chloroplast genomes have been sequenced. There are two major reasons for the paucity of completely sequenced chloroplast genomes: (i) massive amounts of fresh leaves are needed for chloroplast sequencing and (ii) there are considerable gaps in the sequenced chloroplast genomes of many plants because of the difficulty of isolating high-quality chloroplast DNA, preventing complete chloroplast genomes from being assembled. To overcome these obstacles, we analysed all angiosperm chloroplast genomes available to date and designed nine universal primer pairs corresponding to highly conserved regions. Using these primers, angiosperm whole chloroplast genomes can be amplified by long-range PCR and sequenced using next-generation sequencing methods. The primers showed high universality, which was tested using 24 species representing major clades of angiosperms. To validate the functionality of the primers, eight species representing major groups of angiosperms (early-diverging angiosperms, magnoliids, monocots, Saxifragales, fabids, malvids and asterids) were sequenced, and their complete chloroplast genomes were assembled. In our trials, only 100 mg of fresh leaves was used. The results show that the universal primer set provides an easy, effective and feasible approach for sequencing whole chloroplast genomes in angiosperms. The designed universal primer pairs could accelerate genome-scale data acquisition and thereby improve phylogenetic resolution and species identification in angiosperms. © 2014 John Wiley & Sons Ltd.
Sequencing and assembly of the 22-Gb loblolly pine genome.

PubMed

Zimin, Aleksey; Stevens, Kristian A; Crepeau, Marc W; Holtz-Morris, Ann; Koriabine, Maxim; Marçais, Guillaume; Puiu, Daniela; Roberts, Michael; Wegrzyn, Jill L; de Jong, Pieter J; Neale, David B; Salzberg, Steven L; Yorke, James A; Langley, Charles H

2014-03-01

Conifers are the predominant gymnosperms. The size and complexity of their genomes have presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the largest genome assembled to date. Most of the sequence data were derived from whole-genome shotgun sequencing of a single megagametophyte, the haploid tissue of a single pine seed. Although that constrained the quantity of available DNA, the resulting haploid sequence data were well suited for assembly. The haploid sequence was augmented with multiple linking long-fragment mate pair libraries from the parental diploid DNA. For the longest fragments, we used novel fosmid DiTag libraries. Sequences from the linking libraries that did not match the megagametophyte were identified and removed. Assembly of the sequence data was aided by condensing the enormous number of paired-end reads into a much smaller set of longer "super-reads," rendering subsequent assembly with an overlap-based assembly algorithm computationally feasible. To further improve the contiguity and biological utility of the genome sequence, additional scaffolding methods utilizing independent genome and transcriptome assemblies were implemented. The combination of these strategies resulted in a draft genome sequence of 20.15 billion bases, with an N50 scaffold size of 66.9 kbp.
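The N50 scaffold size quoted above is a standard contiguity statistic. Sort sequences from longest to shortest, then report the length at which the running total first reaches half of the total assembled length. A minimal Python sketch follows. The scaffold lengths in the example are invented for illustration and are not taken from the pine assembly.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running to total to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: running total always reaches total")


# Hypothetical scaffold lengths (bp); total = 300, half = 150.
print(n50([100, 80, 60, 40, 20]))  # prints 80 (100 + 80 = 180 >= 150)
```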
Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum.

PubMed

VanBuren, Robert; Bryant, Doug; Edger, Patrick P; Tang, Haibao; Burgess, Diane; Challabathula, Dinakar; Spittle, Kristi; Hall, Richard; Gu, Jenny; Lyons, Eric; Freeling, Michael; Bartels, Dorothea; Ten Hallers, Boudewijn; Hastie, Alex; Michael, Todd P; Mockler, Todd C

2015-11-26

Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions. Many draft and reference genomes are therefore fragmented, lacking regions of skewed GC content and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16 kilobases) reads with random errors, we assembled 99% (244 megabases) of the Oropetium genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a 'near-complete' draft genome, which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. The Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.

Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.

PubMed

Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi

2017-07-01

PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence single DNA molecules in real time without amplification. PacBio RS II's sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages overcome the obstacles to sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetic characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is an effective tool not only in the basic biological sciences but also in the medical/clinical setting.
The Complete Mitochondrial Genome of Gossypium hirsutum and Evolutionary Analysis of Higher Plant Mitochondrial Genomes

PubMed Central

Su, Aiguo; Geng, Jianing; Grover, Corrinne E.; Hu, Songnian; Hua, Jinping

2013-01-01

Background Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and repeated sequences that frequently undergo intramolecular recombination. Upland cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant in the world. Sequencing of the cotton mitochondrial (mt) genome could be helpful for research on the evolution of plant mt genomes. Methodology/Principal Findings We used 454 technology for sequencing, combined with screening of a Fosmid library of the Gossypium hirsutum mt genome and sequencing of positive clones. We also conducted a series of evolutionary analyses on the mt genomes of Cycas taitungensis and 24 angiosperms. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The complete G. hirsutum mt genome is 621,884 bp in length and contains 68 genes, including 35 protein-coding genes, four rRNA genes and 29 tRNA genes. Five gene clusters are conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes, and closely related species share the most homologous sequences. For species with both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes. Conclusion The G. hirsutum mt genome possesses most of the common characteristics of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes, suggests that evolution of mt genomes is consistent with plant taxonomy but independent among different species. PMID:23940520
The complete mitochondrial genome of Gossypium hirsutum and evolutionary analysis of higher plant mitochondrial genomes.

PubMed

Liu, Guozheng; Cao, Dandan; Li, Shuangshuang; Su, Aiguo; Geng, Jianing; Grover, Corrinne E; Hu, Songnian; Hua, Jinping

2013-01-01

Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and repeated sequences that frequently undergo intramolecular recombination. Upland cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant in the world. Sequencing of the cotton mitochondrial (mt) genome could be helpful for research on the evolution of plant mt genomes. We used 454 technology for sequencing, combined with screening of a Fosmid library of the Gossypium hirsutum mt genome and sequencing of positive clones. We also conducted a series of evolutionary analyses on the mt genomes of Cycas taitungensis and 24 angiosperms. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The complete G. hirsutum mt genome is 621,884 bp in length and contains 68 genes, including 35 protein-coding genes, four rRNA genes and 29 tRNA genes. Five gene clusters are conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes, and closely related species share the most homologous sequences. For species with both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes. The G. hirsutum mt genome possesses most of the common characteristics of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes, suggests that evolution of mt genomes is consistent with plant taxonomy but independent among different species.
Selection of a DNA barcode for Nectriaceae from fungal whole-genomes.

PubMed

Zeng, Zhaoqing; Zhao, Peng; Luo, Jing; Zhuang, Wenying; Yu, Zhihe

2012-01-01

A DNA barcode is a short segment of sequence that is able to distinguish species. A barcode must ideally contain enough variation to distinguish every individual species and be easily obtained. Fungi of Nectriaceae are economically important and show high species diversity. To establish a standard DNA barcode for this group of fungi, the genomes of Neurospora crassa and 30 other filamentous fungi were compared. The expect value was used as the criterion for recognizing homologous sequences. Four candidate markers, Hsp90, AAC, CDC48, and EF3, were tested for their feasibility as barcodes in the identification of 34 well-established species belonging to 13 genera of Nectriaceae. Two hundred and fifteen sequences were analyzed. Intra- and inter-specific variation and the success rate of PCR amplification and sequencing were used as key criteria for evaluating the candidate markers. Ultimately, the partial EF3 gene met the requirements for a good DNA barcode: no overlap was found between the intra- and inter-specific pairwise distances. The smallest inter-specific distance of the EF3 gene was 3.19%, while the largest intra-specific distance was 1.79%. In addition, the PCR and sequencing success rate for this gene was high (96.3%). CDC48 showed sufficiently high sequence variation among species, but its PCR and sequencing success rate was only 84% using a single pair of primers. Although the Hsp90 and AAC genes had higher PCR and sequencing success rates (96.3% and 97.5%, respectively), their intra- and inter-specific variations overlapped, which could lead to misidentification. Therefore, we propose the EF3 gene as a possible DNA barcode for the nectriaceous fungi.
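The EF3 criterion above (no overlap between intra- and inter-specific pairwise distances, often called a barcode gap) can be checked mechanically. The following Python sketch assumes the pairwise distances have already been computed and that each sequence carries a species label; the function name `barcode_gap` and the toy distances are hypothetical, not the authors' code or data.

```python
from itertools import combinations


def barcode_gap(labels, dist):
    """Check for a barcode gap between intra- and inter-specific distances.

    labels: dict mapping sequence id -> species name
    dist:   function (id_a, id_b) -> pairwise distance, e.g. p-distance in %
    Returns (largest intra-specific distance, smallest inter-specific distance,
    True if the two ranges do not overlap).
    """
    intra, inter = [], []
    for a, b in combinations(sorted(labels), 2):
        d = dist(a, b)
        # Same species -> intra-specific; different species -> inter-specific.
        (intra if labels[a] == labels[b] else inter).append(d)
    max_intra = max(intra, default=0.0)
    min_inter = min(inter, default=float("inf"))
    return max_intra, min_inter, max_intra < min_inter


# Toy example with made-up distances (not data from the study).
labels = {"s1": "sp_A", "s2": "sp_A", "s3": "sp_B", "s4": "sp_B"}
table = {("s1", "s2"): 0.5, ("s3", "s4"): 1.0, ("s1", "s3"): 2.5,
         ("s1", "s4"): 3.0, ("s2", "s3"): 2.8, ("s2", "s4"): 3.1}


def toy_dist(a, b):
    return table.get((a, b), table.get((b, a)))


print(barcode_gap(labels, toy_dist))  # (1.0, 2.5, True)
```

A marker passes this check only when every within-species distance is smaller than every between-species distance, which is the condition reported for EF3 and not met by Hsp90 and AAC.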
Isolation and characterization of adrenoleukodystrophy protein (ALDP) related sequences in the human genome

DOE Office of Scientific and Technical Information (OSTI.GOV)

Geraghty, M.T.; Stetten, G.; Kearns, W.

1994-09-01

X-linked adrenoleukodystrophy (ALD) is a disorder of peroxisomal β-oxidation of very long chain fatty acids. It presents either as progressive dementia in childhood or as progressive paraparesis in later years. Adrenal insufficiency occurs in both phenotypes. The gene encoding the ALD protein (ALDP) has been mapped to Xq28 and has recently been cloned and characterized. ALDP has significant homology to the peroxisomal membrane protein PMP70 and belongs to the ATP-binding cassette superfamily of transporters. We screened a human genomic library with an ALDP cDNA and isolated 5 different but highly similar clones containing sequences corresponding to the 3′ end of the ALDP gene. Comparison of the sequences over the region corresponding to exon 9 through the 3′ end of the ALDP gene reveals ~96% nucleotide identity in both exonic and intronic regions. Splice sites and open reading frames are maintained. Using both FISH and human-rodent DNA mapping panels, we positively assign these ALDP-related sequences to chromosomes 2, 16 and 22, and provisionally to 1 and 20. Southern blot of primate DNA probed with a partial ALDP cDNA (exons 2-10) shows that expansion of ALDP-related sequences occurred in higher primates (chimp, gorilla and human). Although Northern blots show multiple ALDP-hybridizing transcripts in certain tissues, we have no evidence to date for expression of these ALDP-related sequences. In conclusion, our data show there has been an unusual and recent dispersal to multiple chromosomes of structural gene sequences related to the ALDP gene. The functional significance of these sequences remains to be determined, but their existence complicates PCR and mutation analysis of the ALDP gene.
The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences

PubMed Central

2010-01-01

Background: In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24). The size and complexity of these genomes have prompted much speculation as to the feasibility of completing a conifer genome sequence. Conifer genomes are reputed to be highly repetitive, but there is little information available on the nature and identity of repetitive units in gymnosperms. The pines have extensive genetic resources, with approximately 329,000 ESTs from eleven species and genetic maps in eight species, including a dense genetic map of the twelve linkage groups in Pinus taeda. Results: We present here the Sanger sequence and annotation of ten P. taeda BAC clones and Genome Analyzer II whole genome shotgun (WGS) sequences representing 7.5% of the genome. Computational annotation of ten BACs predicts three putative protein-coding genes and at least fifteen likely pseudogenes in nearly one megabase of sequence. We found three conifer-specific LTR retroelements in the BACs, and tentatively identified at least 15 others based on evidence from the distantly related angiosperms. Alignment of WGS sequences to the BACs indicates that 80% of BAC sequences have similar copies (≥ 75% nucleotide identity) elsewhere in the genome, but only 23% have identical copies (99% identity). The three most common repetitive elements in the genome were identified and, when combined, represent less than 5% of the genome. Conclusions: This study indicates that the majority of repeats in the P. taeda genome are 'novel' and will therefore require additional BAC or genomic sequencing for accurate characterization. The pine genome contains a very large number of diverged and probably defunct repetitive elements. This study also provides new evidence that sequencing a pine genome using a WGS approach is a feasible goal.
PMID:20609256
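The 80% and 23% figures are fractions of BAC sequence covered by WGS matches at two identity thresholds. The abstract does not say how coverage was tallied; one plausible way, sketched below in Python, is to keep alignments at or above a threshold and merge overlapping intervals so each BAC base is counted once. The `(start, end, identity)` hit format, the function name and all values are assumptions for illustration.

```python
def fraction_covered(seq_length, hits, min_identity):
    """Fraction of a sequence covered by at least one hit at or above min_identity.

    hits: iterable of (start, end, identity) with 0-based, half-open [start, end)
    coordinates on the query sequence and identity given as a percentage.
    Overlapping hits are merged so that each base is counted once.
    """
    intervals = sorted((s, e) for s, e, ident in hits if ident >= min_identity)
    covered = 0
    cur_start = cur_end = None
    for s, e in intervals:
        if cur_end is None or s > cur_end:
            # Gap before this interval: close the current merged block.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / seq_length


# Toy hits on a 100 bp BAC (made-up values).
hits = [(0, 50, 80.0), (40, 60, 99.5), (70, 80, 76.0)]
print(fraction_covered(100, hits, 75))  # 0.7
print(fraction_covered(100, hits, 99))  # 0.2
```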
Use of low-coverage, large-insert, short-read data for rapid and accurate generation of enhanced-quality draft Pseudomonas genome sequences.

PubMed

O'Brien, Heath E; Gong, Yunchen; Fung, Pauline; Wang, Pauline W; Guttman, David S

2011-01-01

Next-generation genomic technology has both greatly accelerated the pace of genome research and increased our reliance on draft genome sequences. While groups such as the Genomics Standards Consortium have made strong efforts to promote genome standards, there is still a general lack of uniformity among published draft genomes, leading to challenges for downstream comparative analyses. This lack of uniformity is a particular problem when using standard draft genomes that frequently have large numbers of low-quality sequencing tracts. Here we present a proposal for an "enhanced-quality draft" genome that identifies at least 95% of the coding sequences, thereby effectively providing a full accounting of the genic component of the genome. Enhanced-quality draft genomes are easily attainable through a combination of small- and large-insert next-generation, paired-end sequencing. We illustrate the generation of an enhanced-quality draft genome by re-sequencing the plant pathogenic bacterium Pseudomonas syringae pv. phaseolicola 1448A (Pph 1448A), which has a published, closed genome sequence of 5.93 Mbp. We use a combination of Illumina paired-end and mate-pair sequencing, and surprisingly find that de novo assemblies with 100x paired-end coverage and mate-pair sequencing with as little as 2-5x coverage are substantially better than assemblies based on higher coverage. The rapid and low-cost generation of large numbers of enhanced-quality draft genome sequences will be of particular value for microbial diagnostics and biosecurity, which rely on precise discrimination of potentially dangerous clones from closely related benign strains.
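Coverage figures such as 100x paired-end and 2-5x mate-pair are simple ratios of sequenced (or spanned) bases to genome size. The Python sketch below shows both sequence coverage and physical (insert-spanning) coverage, since the abstract does not specify which is meant for the mate-pair libraries, plus the 95% coding-sequence criterion for an enhanced-quality draft. Read lengths, insert sizes and CDS counts are hypothetical; only the 5.93 Mbp genome size comes from the abstract.

```python
import math


def sequence_coverage(n_reads, read_length, genome_size):
    """Mean per-base coverage: total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Reads required to reach a target mean coverage, rounded up."""
    return math.ceil(target_coverage * genome_size / read_length)


def physical_coverage(n_pairs, insert_size, genome_size):
    """Mean number of mate-pair inserts spanning each base."""
    return n_pairs * insert_size / genome_size


def is_enhanced_quality(n_cds_found, n_cds_reference, threshold=0.95):
    """True if the draft recovers at least `threshold` of the reference CDSs."""
    return n_cds_found / n_cds_reference >= threshold


GENOME = 5_930_000  # Pph 1448A closed genome, 5.93 Mbp
print(reads_needed(100, 100, GENOME))          # 5930000 reads at a hypothetical 100 bp
print(physical_coverage(2372, 5000, GENOME))  # 2.0 with hypothetical 5 kb inserts
```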
Illuminating the Black Box of Genome Sequence Assembly: A Free Online Tool to Introduce Students to Bioinformatics

ERIC Educational Resources Information Center

Taylor, D. Leland; Campbell, A. Malcolm; Heyer, Laurie J.

2013-01-01

Next-generation sequencing technologies have greatly reduced the cost of sequencing genomes. With the current sequencing technology, a genome is broken into fragments and sequenced, producing millions of "reads." A computer algorithm pieces these reads together in the genome assembly process. PHAST is a set of online modules…
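The abstract does not describe the algorithm PHAST teaches, but the idea of piecing reads together can be illustrated with a toy greedy overlap assembler: repeatedly merge the two reads with the longest suffix-prefix overlap. This Python sketch assumes error-free reads and no repeats longer than the minimum overlap; real assemblers use overlap or de Bruijn graphs and must handle sequencing errors and repeats.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a matching a prefix of b (>= min_len), else 0."""
    start = 0
    while True:
        # Look for b's first min_len bases inside a; only such spots can start an overlap.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read; they add no new sequence.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: return the remaining contigs separately
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Reads tiled across the made-up sequence "TTACGGATCCAGT".
print(greedy_assemble(["TTACGG", "CGGATC", "ATCCAG", "CAGT"]))
# ['TTACGGATCCAGT']
```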
Exome-wide DNA capture and next generation sequencing in domestic and wild species.

PubMed

Cosart, Ted; Beja-Pereira, Albano; Chen, Shanyuan; Ng, Sarah B; Shendure, Jay; Luikart, Gordon

2011-07-05

Gene-targeted and genome-wide markers are crucial to advance evolutionary biology, agriculture, and biodiversity conservation by improving our understanding of genetic processes underlying adaptation and speciation. Unfortunately, for eukaryotic species with large genomes it remains costly to obtain genome sequences and to develop genome resources such as genome-wide SNPs. A method is needed to allow gene-targeted, next-generation sequencing that is flexible enough to include any gene or number of genes, unlike transcriptome sequencing. Such a method would allow sequencing of many individuals, avoiding ascertainment bias in subsequent population genetic analyses. We demonstrate the usefulness of a recent technology, exon capture, for genome-wide, gene-targeted marker discovery in species with no genome resources. We use coding gene sequences from the domestic cow genome sequence (Bos taurus) to capture (enrich for), and subsequently sequence, thousands of exons of B. taurus, B. indicus, and Bison bison (wild bison). Our capture array has probes for 16,131 exons in 2,570 genes, including 203 candidate genes with known function and of interest for their association with disease and other fitness traits. We successfully sequenced and mapped exon sequences from across the 29 autosomes and X chromosome in the B. taurus genome sequence. Exon capture and high-throughput sequencing identified thousands of putative SNPs spread evenly across all reference chromosomes, in all three individuals, including hundreds of SNPs in our targeted candidate genes. This study shows exon capture can be customized for SNP discovery in many individuals and for non-model species without genomic resources. Our captured exome subset was small enough for affordable next-generation sequencing, and successfully captured exons from a divergent wild species using the domestic cow genome as reference.
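The abstract does not state how putative SNPs were called from the captured exons. As a simplified illustration only, the Python sketch below flags positions where enough reads cover a site and a non-reference base reaches a minimum fraction; the pileup format, function name and thresholds are hypothetical, and real variant callers also weigh base and mapping quality.

```python
from collections import Counter


def putative_snps(reference, pileup, min_depth=8, min_alt_fraction=0.2):
    """Flag positions where a non-reference base is well supported.

    reference: reference sequence (e.g. a captured exon from the cow genome)
    pileup:    dict 0-based position -> string of read bases seen at that position
    Returns a list of (position, ref_base, alt_base, alt_fraction, depth).
    Thresholds are hypothetical placeholders.
    """
    calls = []
    for pos in sorted(pileup):
        bases = [b for b in pileup[pos].upper() if b in "ACGT"]
        depth = len(bases)
        if depth < min_depth:
            continue  # too few reads to judge
        ref = reference[pos].upper()
        alt_counts = Counter(b for b in bases if b != ref)
        if not alt_counts:
            continue  # every read agrees with the reference
        alt, n_alt = alt_counts.most_common(1)[0]
        if n_alt / depth >= min_alt_fraction:
            calls.append((pos, ref, alt, n_alt / depth, depth))
    return calls


reference = "ACGTACGT"
pileup = {2: "GGGGAAAA", 5: "CCCCCCCCT", 6: "AAA"}
print(putative_snps(reference, pileup))
# [(2, 'G', 'A', 0.5, 8)]
```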
Mosaic Graphs and Comparative Genomics in Phage Communities

PubMed Central

Belcaid, Mahdi; Bergeron, Anne

2010-01-01

Comparing the genomes of two closely related viruses often produces mosaics in which nearly identical sequences alternate with sequences that are unique to each genome. When several closely related genomes are compared, the unique sequences are likely to be shared with a third genome, leading to virus mosaic communities. Here we present a comparative analysis of sets of Staphylococcus aureus phages that share large identical sequences with up to three other genomes, and with different partners along their genomes. We introduce mosaic graphs to represent these complex recombination events, and use them to illustrate the breadth and depth of sequence sharing: some genomes are almost completely made up of shared sequences, while genomes that share very large identical sequences can adopt alternate functional modules. Mosaic graphs also allow us to identify breakpoints that could eventually be used for the construction of recombination networks. These findings have several implications for phage metagenomics assembly, for the horizontal gene transfer paradigm, and more generally for understanding the composition and evolutionary dynamics of virus communities. PMID:20874413
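The abstract does not give the construction of mosaic graphs, so the following Python sketch only illustrates the underlying idea under stated assumptions: exact shared k-mers stand in for "nearly identical sequences", and each position of a focal genome is labelled by which other genomes contain the k-mer starting there. Runs with the same partner set correspond to shared or unique blocks, and changes in the partner set mark candidate breakpoints. The function names, the tiny k and the toy genomes are hypothetical.

```python
def kmer_set(seq, k):
    """All k-mers (substrings of length k) occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mosaic_segments(genomes, focal, k=4):
    """Partition a focal genome into runs that share k-mers with the same partners.

    genomes: dict name -> sequence; focal: name of the genome to segment.
    Returns a list of (start, end, partners): k-mers starting at positions
    start..end-1 of the focal genome all occur in exactly the genomes named in
    `partners` (a sorted tuple; empty means sequence unique to the focal genome).
    Boundaries between runs are candidate recombination breakpoints.
    """
    others = {n: kmer_set(s, k) for n, s in genomes.items() if n != focal}
    seq = genomes[focal]
    segments = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        partners = tuple(sorted(n for n, kmers in others.items() if kmer in kmers))
        if segments and segments[-1][2] == partners:
            segments[-1] = (segments[-1][0], i + 1, partners)  # extend current run
        else:
            segments.append((i, i + 1, partners))  # partner set changed: new run
    return segments


# Made-up toy genomes: A joins a block shared with B to a block shared with C.
genomes = {
    "A": "ACGTTGCA" + "TTCAGGAC",
    "B": "ACGTTGCA" + "TTTTTTTT",
    "C": "CCCCCCCC" + "TTCAGGAC",
}
for seg in mosaic_segments(genomes, "A"):
    print(seg)
# (0, 7, ('B',))
# (7, 8, ())
# (8, 13, ('C',))
```

The short unshared run at position 7 comes from k-mers that straddle the junction; in a mosaic graph, the runs and their partner sets would become the nodes and edges.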
Single-cell genome sequencing at ultra-high-throughput with microfluidic droplet barcoding.

PubMed

Lan, Freeman; Demaree, Benjamin; Ahmed, Noorsher; Abate, Adam R

2017-07-01

The application of single-cell genome sequencing to large cell populations has been hindered by technical challenges in isolating single cells during genome preparation. Here we present single-cell genomic sequencing (SiC-seq), which uses droplet microfluidics to isolate, fragment, and barcode the genomes of single cells, followed by Illumina sequencing of pooled DNA. We demonstrate ultra-high-throughput sequencing of >50,000 cells per run in a synthetic community of Gram-negative and Gram-positive bacteria and fungi. The sequenced genomes can be sorted in silico based on characteristic sequences. We use this approach to analyze the distributions of antibiotic-resistance genes, virulence factors, and phage sequences in microbial communities from an environmental sample. The ability to routinely sequence large populations of single cells will enable the deconvolution of genetic heterogeneity in diverse cell populations.
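Because each cell's genome fragments carry a droplet barcode, pooled reads can be regrouped by cell before analysis. The abstract does not give the barcode layout, so this Python sketch assumes, hypothetically, a fixed-length barcode at the start of every read and groups reads by it; the barcode length and minimum read count are placeholders, and no barcode error correction is attempted.

```python
from collections import defaultdict


def demultiplex(reads, barcode_length=15, min_reads=2):
    """Group reads by a leading cell barcode.

    Assumes (hypothetically) that each read starts with a fixed-length barcode
    followed by genomic sequence. Barcodes seen in fewer than min_reads reads
    are dropped, since they may reflect sequencing errors or near-empty droplets.
    """
    groups = defaultdict(list)
    for read in reads:
        if len(read) <= barcode_length:
            continue  # too short to hold a barcode plus genomic sequence
        groups[read[:barcode_length]].append(read[barcode_length:])
    return {bc: frags for bc, frags in groups.items() if len(frags) >= min_reads}


reads = ["AAAAGATT", "AAAACCGG", "CCCCTTTT", "GG"]
print(demultiplex(reads, barcode_length=4))
# {'AAAA': ['GATT', 'CCGG']}
```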
Identification of three genotypes of sugarcane yellow leaf virus causing yellow leaf disease from India and their molecular characterization.

PubMed

Viswanathan, R; Balamuralikrishnan, M; Karuppaiah, R

2008-12-01

Sugarcane yellow leaf virus (SCYLV), which causes yellow leaf disease (YLD) in sugarcane (recently reported in India), belongs to the genus Polerovirus. Detailed studies were conducted to characterize the virus based on the sequences of partial open reading frames (ORFs) 1 and 2 and complete ORFs 3 and 4 in its genome. Reverse-transcriptase polymerase chain reaction (RT-PCR) was performed on 48 sugarcane leaf samples to detect the virus using a specific set of primers. Of the 48 samples, 36 (field samples with and without foliar symptoms), including 10 meristem culture-derived plants, were positive for SCYLV infection. Additionally, an aphid colony collected from symptomatic sugarcane in the field was also SCYLV positive. The amplicons from 22 samples were cloned, sequenced and designated SCYLV-CB isolates. Nucleotide (nt) and amino acid (aa) sequence comparison showed significant variation between SCYLV-CB and the database sequences at the nt (3.7-5.1%) and aa (3.2-5.3%) levels in the coat protein (CP) coding region. However, the database sequences, comprising isolates of three reported genotypes, viz., BRA, PER and REU, showed the least nt and aa sequence dissimilarity (0.0-1.6%). Phylogenetic analysis of the overlapping ORFs 3 and 4 of SCYLV, encoding the CP and the movement protein (MP), determined in this study together with sequences of 26 other isolates available from GenBank, including an Indian isolate (SCYLV-IND), resolved four phylogenetic clusters. The SCYLV-CB isolates from this study grouped into two clusters (C1 and C2), and all other isolates from worldwide locations into another two clusters (C3 and C4). The sequence variation between the isolates in this study and the database isolates, even in the least variable region of the SCYLV genome, showed that the population in India is significantly different from that in the rest of the world.
Further, comparison of partial sequences encoding ORFs 1 and 2 revealed that YLD in sugarcane in India is caused by at least three genotypes, viz., CUB, IND and BRA-PER, with most samples infected by the Cuban genotype (CUB) and fewer by the IND and BRA-PER genotypes. The IND genotype was identified as a new genotype in this study and differs significantly from the reported genotypes.
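The nt and aa divergence values above are pairwise percentages. The abstract does not name the distance measure; the simplest choice, assumed in this Python sketch, is the uncorrected p-distance (proportion of differing aligned sites), skipping gaps and ambiguous symbols. The function name and example sequences are hypothetical.

```python
def p_distance(seq_a, seq_b, gap="-", ambiguous="N"):
    """Uncorrected p-distance (%) between two aligned sequences.

    Sites with a gap or an ambiguous symbol in either sequence are skipped.
    Use ambiguous="N" for nucleotides and ambiguous="X" for amino acids
    (N is asparagine in protein sequences).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if gap in (a, b) or a in ambiguous or b in ambiguous:
            continue
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * differing / compared


print(p_distance("ACGTACGTAC", "ACGTACGTAA"))     # 10.0
print(p_distance("MNKL", "MNKV", ambiguous="X"))  # 25.0
```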
De novo Assembly of a 40 Mb Eukaryotic Genome from Short Sequence Reads: Sordaria macrospora, a Model Organism for Fungal Morphogenesis

PubMed Central

Nowrousian, Minou; Stajich, Jason E.; Chu, Meiling; Engh, Ines; Espagne, Eric; Halliday, Karen; Kamerewerd, Jens; Kempken, Frank; Knab, Birgit; Kuo, Hsiao-Che; Osiewacz, Heinz D.; Pöggeler, Stefanie; Read, Nick D.; Seiler, Stephan; Smith, Kristina M.; Zickler, Denise; Kück, Ulrich; Freitag, Michael

2010-01-01

Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30–90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in ∼4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb) with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. 
Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data can be used for comparative studies to address basic questions of fungal biology. PMID:20386741
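N50, quoted above for the Velvet assembly (117 kb) and after comparison with Neurospora (498 kb), is the length of the contig at which a running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly size. A minimal Python sketch with toy lengths:

```python
def n50(contig_lengths):
    """Contig length at which the running total (longest first) reaches half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("empty assembly")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length


print(n50([100, 80, 60, 40, 20]))  # 80 (toy lengths; 100 + 80 >= 300 / 2)
```

As a rough consistency check on the coverage figures, (85 + 10) × 40 Mb = 3.8 Gb, in line with the ~4 Gb of sequence reported.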
De novo assembly of a 40 Mb eukaryotic genome from short sequence reads: Sordaria macrospora, a model organism for fungal morphogenesis.

PubMed

Nowrousian, Minou; Stajich, Jason E; Chu, Meiling; Engh, Ines; Espagne, Eric; Halliday, Karen; Kamerewerd, Jens; Kempken, Frank; Knab, Birgit; Kuo, Hsiao-Che; Osiewacz, Heinz D; Pöggeler, Stefanie; Read, Nick D; Seiler, Stephan; Smith, Kristina M; Zickler, Denise; Kück, Ulrich; Freitag, Michael

2010-04-08

Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30-90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in approximately 4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb) with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. 
Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data can be used for comparative studies to address basic questions of fungal biology.
Newborn Sequencing in Genomic Medicine and Public Health

PubMed Central

Agrawal, Pankaj B.; Bailey, Donald B.; Beggs, Alan H.; Brenner, Steven E.; Brower, Amy M.; Cakici, Julie A.; Ceyhan-Birsoy, Ozge; Chan, Kee; Chen, Flavia; Currier, Robert J.; Dukhovny, Dmitry; Green, Robert C.; Harris-Wai, Julie; Holm, Ingrid A.; Iglesias, Brenda; Joseph, Galen; Kingsmore, Stephen F.; Koenig, Barbara A.; Kwok, Pui-Yan; Lantos, John; Leeder, Steven J.; Lewis, Megan A.; McGuire, Amy L.; Milko, Laura V.; Mooney, Sean D.; Parad, Richard B.; Pereira, Stacey; Petrikin, Joshua; Powell, Bradford C.; Powell, Cynthia M.; Puck, Jennifer M.; Rehm, Heidi L.; Risch, Neil; Roche, Myra; Shieh, Joseph T.; Veeraraghavan, Narayanan; Watson, Michael S.; Willig, Laurel; Yu, Timothy W.; Urv, Tiina; Wise, Anastasia L.

2017-01-01

The rapid development of genomic sequencing technologies has decreased the cost of genetic analysis to the extent that it seems plausible that genome-scale sequencing could have widespread availability in pediatric care. Genomic sequencing provides a powerful diagnostic modality for patients who manifest symptoms of monogenic disease and an opportunity to detect health conditions before their development. However, many technical, clinical, ethical, and societal challenges should be addressed before such technology is widely deployed in pediatric practice. This article provides an overview of the Newborn Sequencing in Genomic Medicine and Public Health Consortium, which is investigating the application of genome-scale sequencing in newborns for both diagnosis and screening. PMID:28096516
A novel bioinformatics method for efficient knowledge discovery by BLSOM from big genomic sequence data.

PubMed

Bai, Yu; Iwasaki, Yuki; Kanaya, Shigehiko; Zhao, Yue; Ikemura, Toshimichi

2014-01-01

With the remarkable increase in genomic sequence data from a wide range of species, novel tools are needed for comprehensive analyses of big sequence data. Self-Organizing Map (SOM) is an effective tool for clustering and visualizing high-dimensional data such as oligonucleotide composition on one map. By modifying the conventional SOM, we have previously developed Batch-Learning SOM (BLSOM), which allows classification of sequence fragments according to species, solely depending on the oligonucleotide composition. In the present study, we introduce the oligonucleotide BLSOM for characterizing vertebrate genome sequences. We first analyzed pentanucleotide compositions in 100 kb sequences derived from a wide range of vertebrate genomes, and then the compositions in the human and mouse genomes, in order to investigate an efficient method for detecting differences between closely related genomes. BLSOM can recognize the species-specific key combination of oligonucleotide frequencies in each genome, called a "genome signature," as well as regions specifically enriched in transcription-factor-binding sequences. Because its classification and visualization power is very high, BLSOM is an efficient and powerful tool for extracting a wide range of information from massive amounts of genomic sequences (i.e., big sequence data).
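BLSOM's inputs are oligonucleotide composition vectors of fixed-length fragments; for pentanucleotides, each 100 kb fragment becomes a vector of 4^5 = 1024 frequencies. This Python sketch builds those vectors only; it does not implement BLSOM training, and the function names and the handling of non-ACGT bases and short trailing fragments are assumptions.

```python
from itertools import product


def fragments(seq, size=100_000):
    """Non-overlapping windows of `size` bases; a shorter tail is dropped here."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def oligo_composition(seq, k=5):
    """Relative frequencies of all 4**k oligonucleotides, in lexicographic order.

    Windows containing anything other than A, C, G or T (e.g. N) are skipped.
    """
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


vec = oligo_composition("ACGTACGTAC", k=2)
print(len(vec), vec[1])  # 16 0.3333333333333333  (AC is 3 of 9 dinucleotides)
```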
Genome Science: A Video Tour of the Washington University Genome Sequencing Center for High School and Undergraduate Students

PubMed Central

2005-01-01

Sequencing of the human genome has ushered in a new era of biology. The technologies developed to facilitate the sequencing of the human genome are now being applied to the sequencing of other genomes. In 2004, a partnership was formed between Washington University School of Medicine Genome Sequencing Center's Outreach Program and Washington University Department of Biology Science Outreach to create a video tour depicting the processes involved in large-scale sequencing. “Sequencing a Genome: Inside the Washington University Genome Sequencing Center” is a tour of the laboratory that follows the steps in the sequencing pipeline, interspersed with animated explanations of the scientific procedures used at the facility. Accompanying interviews with the staff illustrate different entry levels for a career in genome science. This video project serves as an example of how research and academic institutions can provide teachers and students with access and exposure to innovative technologies at the forefront of biomedical research. Initial feedback on the video from undergraduate students, high school teachers, and high school students provides suggestions for use of this video in a classroom setting to supplement present curricula. PMID:16341256
From Conventional to Next Generation Sequencing of Epstein-Barr Virus Genomes.

PubMed

Kwok, Hin; Chiang, Alan Kwok Shing

2016-02-24

Genomic sequences of Epstein-Barr virus (EBV) have been of interest because the virus is associated with cancers, such as nasopharyngeal carcinoma, and conditions such as infectious mononucleosis. The progress of whole-genome EBV sequencing has been limited by the inefficiency and cost of first-generation sequencing technology. With the advancement of next-generation sequencing (NGS) and target enrichment strategies, an increasing number of EBV genomes have been published. These genomes were sequenced using different approaches, either with or without EBV DNA enrichment. This review provides an overview of the EBV genomes published to date, and a description of the sequencing technology and bioinformatic analyses employed in generating these sequences. We further explore ways in which the quality of sequencing data can be improved, such as using DNA oligos for capture hybridization, and longer insert sizes and read lengths in the sequencing runs. These advances will enable large-scale genomic sequencing of EBV, which will facilitate a better understanding of the genetic variations of EBV in different geographic regions and the discovery of potentially pathogenic variants in specific diseases.
Initial sequencing and comparative analysis of the mouse genome

DOE Office of Scientific and Technical Information (OSTI.GOV)

Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan

2002-12-15

The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.
Tapping the promise of genomics in species with complex, nonmodel genomes.

PubMed

Hirsch, Candice N; Buell, C Robin

2013-01-01

Genomics is enabling a renaissance in all disciplines of plant biology. However, many plant genomes are complex and remain recalcitrant to current genomic technologies. The complexities of these nonmodel plant genomes are attributable to gene and genome duplication, heterozygosity, ploidy, and/or repetitive sequences. Methods are available to simplify the genome and reduce these barriers, including inbreeding and genome reduction, making these species amenable to current sequencing and assembly methods. Some, but not all, of the complexities in nonmodel genomes can be bypassed by sequencing the transcriptome rather than the genome. Additionally, comparative genomics approaches, which leverage phylogenetic relatedness, can aid in the interpretation of complex genomes. Although there are limitations in accessing complex nonmodel plant genomes using current sequencing technologies, genome manipulation and resourceful analyses can allow access to even the most recalcitrant plant genomes.

Development of genome- and transcriptome-derived microsatellites in related species of snapping shrimps with highly duplicated genomes.

PubMed

Gaynor, Kaitlyn M; Solomon, Joseph W; Siller, Stefanie; Jessell, Linnet; Duffy, J Emmett; Rubenstein, Dustin R

2017-11-01

Molecular markers are powerful tools for studying patterns of relatedness and parentage within populations and for making inferences about social evolution. However, the development of molecular markers for simultaneous study of multiple species presents challenges, particularly when species exhibit genome duplication or polyploidy. We developed microsatellite markers for Synalpheus shrimp, a genus in which species exhibit not only great variation in social organization, but also interspecific variation in genome size and partial genome duplication. From the four primary clades within Synalpheus, we identified microsatellites in the genomes of four species and in the consensus transcriptome of two species. Ultimately, we designed and tested primers for 143 microsatellite markers across 25 species. Although the majority of markers were disomic, many markers were polysomic for certain species. Surprisingly, we found no relationship between genome size and the number of polysomic markers. As expected, markers developed for a given species amplified better for closely related species than for more distant relatives. Finally, the markers developed from the transcriptome were more likely to work successfully and to be disomic than those developed from the genome, suggesting that consensus transcriptomes are likely to be conserved across species. Our findings suggest that the transcriptome, particularly consensus sequences from multiple species, can be a valuable source of molecular markers for taxa with complex, duplicated genomes. © 2017 John Wiley & Sons Ltd.
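The abstract does not describe how microsatellites were located in the genome and transcriptome assemblies. A common approach, sketched here in Python under that assumption, scans for perfect tandem repeats of 2 to 4 bp motifs with regular expressions; the minimum copy numbers are hypothetical, and imperfect repeats and primer design are not handled.

```python
import re


def is_compound(motif):
    """True if motif is itself a repeat of a shorter unit (e.g. 'AAAA', 'ACAC')."""
    n = len(motif)
    return any(n % d == 0 and motif[:d] * (n // d) == motif for d in range(1, n))


def find_microsatellites(seq, min_copies=None):
    """Find perfect tandem repeats of short motifs.

    min_copies maps motif length -> minimum number of tandem copies; the
    default thresholds below are hypothetical, not those used in the study.
    Returns a sorted list of (start, motif, copies).
    """
    if min_copies is None:
        min_copies = {2: 6, 3: 5, 4: 5}
    seq = seq.upper()
    hits = []
    for unit, n in min_copies.items():
        # One motif of `unit` bases followed by at least n-1 more copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_compound(motif):
                continue  # skip homopolymers and motifs like ACAC that are shorter repeats
            hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)


seq = "GGG" + "AC" * 7 + "GGG" + "AGAT" * 5 + "CC"
print(find_microsatellites(seq))
# [(3, 'AC', 7), (20, 'AGAT', 5)]
```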
Characterization of transposable elements in the ectomycorrhizal fungus Laccaria bicolor.

PubMed

Labbé, Jessy; Murat, Claude; Morin, Emmanuelle; Tuskan, Gerald A; Le Tacon, François; Martin, Francis

2012-01-01

The publicly available Laccaria bicolor genome sequence has provided a considerable genomic resource allowing systematic identification of transposable elements (TEs) in this symbiotic ectomycorrhizal fungus. Using a TE-specific annotation pipeline, we characterized and analyzed TEs in the L. bicolor S238N-H82 genome. TEs occupy 24% of the 60 Mb L. bicolor genome and represent 25,787 full-length and partial copy elements distributed within 171 families. The most abundant elements were Copia-like. TEs are not randomly distributed across the genome, but are tightly nested or clustered. The majority of TEs exhibit signs of ancient transposition, except some intact copies of terminal inverted repeats (TIRs), long terminal repeats (LTRs) and a large retrotransposon derivative (LARD) element. There were three main periods of TE expansion in L. bicolor: the first from 57 to 10 Mya, the second from 5 to 1 Mya and the most recent from 0.5 Mya until the present. LTR retrotransposons are closely related to retrotransposons found in another basidiomycete, Coprinopsis cinerea. This analysis 1) represents an initial characterization of TEs in the L. bicolor genome, 2) contributes to improved genome annotation and a greater understanding of the role TEs have played in genome organization and evolution, and 3) provides a valuable resource for future research on genome evolution within the Laccaria genus.
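The abstract dates three periods of TE expansion without stating the method. One widely used approach for LTR retrotransposons, assumed here rather than confirmed by the abstract, computes the Kimura two-parameter (K2P) distance K between the two LTRs of a single element and converts it to an insertion age T = K / (2r), where r is a substitution rate per site per year. The rate below is a placeholder, not a value from the study, and the method applies only to elements that retain both LTRs.

```python
import math

TRANSITIONS = {frozenset("AG"), frozenset("CT")}  # purine-purine, pyrimidine-pyrimidine


def k2p_distance(ltr5, ltr3):
    """Kimura two-parameter distance between two aligned LTRs (non-ACGT sites skipped)."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    sites = ts = tv = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a != b:
            if frozenset((a, b)) in TRANSITIONS:
                ts += 1
            else:
                tv += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = ts / sites, tv / sites  # transition and transversion proportions
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def insertion_age_mya(ltr5, ltr3, rate):
    """Insertion age T = K / (2 * rate), converted from years to million years."""
    return k2p_distance(ltr5, ltr3) / (2 * rate) / 1e6


# Toy LTR pair with one transition in 10 sites; the rate is a placeholder value.
print(round(insertion_age_mya("ACGTACGTAC", "GCGTACGTAC", rate=1e-9), 1))  # 55.8
```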
Characterization of Transposable Elements in the Ectomycorrhizal Fungus Laccaria bicolor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Labbe, Jessy L; Murat, Claude; Morin, Emmanuelle

2012-01-01

Background: The publicly available Laccaria bicolor genome sequence has provided a considerable genomic resource allowing systematic identification of transposable elements (TEs) in this symbiotic ectomycorrhizal fungus. Using a TE-specific annotation pipeline, we have characterized and analyzed TEs in the L. bicolor S238N-H82 genome. Methodology/Principal Findings: TEs occupy 24% of the 60 Mb L. bicolor genome and represent 25,787 full-length and partial copy elements distributed within 171 families. Copia-like elements were the most abundant. TEs are not randomly distributed across the genome but are tightly nested or clustered. The majority of TEs exhibit signs of ancient transposition, except for some intact copies of terminal inverted repeats (TIRs), long terminal repeats (LTRs) and a large retrotransposon derivative (LARD) element. There were three main periods of TE expansion in L. bicolor: the first from 57 to 10 Mya, the second from 5 to 1 Mya and the most recent from 0.5 Mya ago until now. LTR retrotransposons are closely related to retrotransposons found in another basidiomycete, Coprinopsis cinerea. Conclusions: This analysis 1) represents an initial characterization of TEs in the L. bicolor genome, 2) contributes to improving genome annotation and to a greater understanding of the role TEs played in genome organization and evolution and 3) provides a valuable resource for future research on genome evolution within the Laccaria genus.
Genome-wide characterization of centromeric satellites from multiple mammalian genomes.

PubMed

Alkan, Can; Cardone, Maria Francesca; Catacchio, Claudia Rita; Antonacci, Francesca; O'Brien, Stephen J; Ryder, Oliver A; Purgato, Stefania; Zoli, Monica; Della Valle, Giuliano; Eichler, Evan E; Ventura, Mario

2011-01-01

Despite its importance in cell biology and evolution, the centromere has remained the final frontier in genome assembly and annotation due to its complex repeat structure. However, isolation and characterization of the centromeric repeats from newly sequenced species are necessary for a complete understanding of genome evolution and function. In recent years, various genomes have been sequenced, but the characterization of the corresponding centromeric DNA has lagged behind. Here, we present a computational method (RepeatNet) to systematically identify higher-order repeat structures from unassembled whole-genome shotgun sequence and test whether these sequence elements correspond to functional centromeric sequences. We analyzed genome datasets from six species of mammals representing the diversity of the mammalian lineage, namely, horse, dog, elephant, armadillo, opossum, and platypus. We define candidate monomer satellite repeats and demonstrate centromeric localization for five of the six genomes. Our analysis revealed the greatest diversity of centromeric sequences in horse and dog, in contrast to elephant and armadillo, which showed high centromeric sequence homogeneity. We could not isolate centromeric sequences within the platypus genome, suggesting that centromeres in platypus are not enriched in satellite DNA. Our method can be applied to the characterization of thousands of other vertebrate genomes anticipated for sequencing in the near future, providing an important tool for annotation of centromeres.
Parents' interest in whole-genome sequencing of newborns.

PubMed

Goldenberg, Aaron J; Dodson, Daniel S; Davis, Matthew M; Tarini, Beth A

2014-01-01

The aim of this study was to assess parents' interest in whole-genome sequencing for newborns. We conducted a survey of a nationally representative sample of 1,539 parents about their interest in whole-genome sequencing of newborns. Participants were randomly presented with one of two scenarios that differed in the venue of testing: one offered whole-genome sequencing through a state newborn screening program, whereas the other offered whole-genome sequencing in a pediatrician's office. Overall interest in having future newborns undergo whole-genome sequencing was generally high among parents. If whole-genome sequencing were offered through a state's newborn-screening program, 74% of parents were either definitely or somewhat interested in utilizing this technology. If offered in a pediatrician's office, 70% of parents were either definitely or somewhat interested. Parents in both groups most frequently identified test accuracy and the ability to prevent a child from developing a disease as "very important" in making a decision to have a newborn's whole genome sequenced. These data may help health departments and children's health-care providers anticipate parents' level of interest in genomic screening for newborns. As whole-genome sequencing is integrated into clinical and public health services, these findings may inform the development of educational strategies and outreach messages for parents.
Characterization of a novel ADAM protease expressed by Pneumocystis carinii.

PubMed

Kennedy, Cassie C; Kottom, Theodore J; Limper, Andrew H

2009-08-01

Pneumocystis species are opportunistic fungal pathogens that cause severe pneumonia in immunocompromised hosts. Recent evidence has suggested that unidentified proteases are involved in Pneumocystis life cycle regulation. Proteolytically active ADAM (named for "a disintegrin and metalloprotease") family molecules have been identified in some fungal organisms, such as Aspergillus fumigatus and Schizosaccharomyces pombe, and some have been shown to participate in life cycle regulation. Accordingly, we sought to characterize ADAM-like molecules in the fungal opportunistic pathogen Pneumocystis carinii (PcADAM). After an in silico search of the P. carinii genomic sequencing project identified a 329-bp partial sequence with homology to known ADAM proteins, the full-length PcADAM sequence was obtained by PCR extension cloning, yielding a final coding sequence of 1,650 bp. Sequence analysis detected the presence of a typical ADAM catalytic active site (HEXXHXXGXXHD). Expression of PcADAM over the Pneumocystis life cycle was analyzed by Northern blot. Southern and contour-clamped homogeneous electric field blot analyses demonstrated its presence in the P. carinii genome. Expression of PcADAM was observed to be increased in Pneumocystis cysts compared to trophic forms. The full-length gene was subsequently cloned and heterologously expressed in Saccharomyces cerevisiae. Purified PcADAMp protein was proteolytically active in casein zymography, requiring divalent zinc. Furthermore, native PcADAMp extracted directly from freshly isolated Pneumocystis organisms also exhibited protease activity. This is the first report of protease activity attributable to a specific, characterized protein in the clinically important opportunistic fungal pathogen Pneumocystis.
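In the catalytic-site consensus HEXXHXXGXXHD, X stands for any amino acid. As a minimal sketch of how such a consensus can be checked against a translated protein sequence, the snippet below turns it into a Python regular expression. The function name find_adam_motif and the toy sequence are hypothetical illustrations, not part of the study, and an exact pattern match is only a rough screen compared with the homology searches the authors describe.

```python
import re

# Consensus quoted in the abstract; X denotes any amino acid.
ADAM_CONSENSUS = "HEXXHXXGXXHD"
# Each X becomes "." (any residue); fixed residues stay literal: "HE..H..G..HD".
ADAM_REGEX = ADAM_CONSENSUS.replace("X", ".")


def find_adam_motif(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched peptide) for every occurrence of the motif.

    The pattern is wrapped in a zero-width lookahead so that overlapping
    occurrences are reported as well.
    """
    seq = protein.upper()
    return [(m.start() + 1, m.group(1))
            for m in re.finditer(f"(?=({ADAM_REGEX}))", seq)]


if __name__ == "__main__":
    # Hypothetical toy sequence with one motif instance (not the PcADAM protein).
    print(find_adam_motif("MKLVAHELGHALGMSHDTSW"))  # [(6, 'HELGHALGMSHD')]
```

Translating X to "." keeps the regex in sync with the published consensus string rather than relying on a hand-typed pattern.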
Genome-Wide Methylome Analyses Reveal Novel Epigenetic Regulation Patterns in Schizophrenia and Bipolar Disorder

PubMed Central

Li, Yongsheng; Camarillo, Cynthia; Xu, Juan; Arana, Tania Bedard; Xiao, Yun; Zhao, Zheng; Chen, Hong; Ramirez, Mercedes; Zavala, Juan; Escamilla, Michael A.; Armas, Regina; Mendoza, Ricardo; Ontiveros, Alfonso; Nicolini, Humberto; Jerez Magaña, Alvaro Antonio; Rubin, Lewis P.; Li, Xia; Xu, Chun

2015-01-01

Schizophrenia (SZ) and bipolar disorder (BP) are complex genetic disorders. Their appearance is also likely shaped by epigenetic contributions that are as yet only partially described. Using a sequencing-based method for genome-wide analysis, we quantitatively compared the blood DNA methylation landscapes of SZ and BP subjects with those of controls in an understudied population, Hispanics living along the US-Mexico border. Remarkably, we identified thousands of differentially methylated regions for SZ and BP, preferentially located in promoters, 3′-UTRs and 5′-UTRs of genes. Distinct patterns of aberrant methylation of promoter sequences were located surrounding transcription start sites. In these instances, aberrant methylation occurred in CpG islands (CGIs), in their flanking regions, and in CGI-sparse promoters. Pathway analysis of genes displaying these distinct aberrant promoter methylation patterns showed enhancement of epigenetic changes in numerous genes previously related to psychiatric disorders and neurodevelopment. Integration of gene expression data further suggests that in SZ aberrant promoter methylation is significantly associated with altered gene transcription. In particular, we found significant associations between (1) promoter CGI hypermethylation and gene repression and (2) CGI 3′-shore hypomethylation and increased gene expression. Finally, we constructed a specific methylation analysis platform that facilitates viewing and comparing aberrant genome methylation in human neuropsychiatric disorders. PMID:25734057
Sampling gene diversity across the supergroup Amoebozoa: large EST data sets from Acanthamoeba castellanii, Hartmannella vermiformis, Physarum polycephalum, Hyperamoeba dachnaya and Hyperamoeba sp.

PubMed

Watkins, Russell F; Gray, Michael W

2008-04-01

From comparative analysis of EST data for five taxa within the eukaryotic supergroup Amoebozoa, including two free-living amoebae (Acanthamoeba castellanii, Hartmannella vermiformis) and three slime molds (Physarum polycephalum, Hyperamoeba dachnaya and Hyperamoeba sp.), we obtained new broad-range perspectives on the evolution and biosynthetic capacity of this assemblage. Together with genome sequences for the amoebozoans Dictyostelium discoideum and Entamoeba histolytica, and including partial genome sequence available for A. castellanii, we used the EST data to identify genes that appear to be exclusive to the supergroup, and to specific clades therein. Many of these genes are likely involved in cell-cell communication or differentiation. In examining on a broad scale a number of characters that previously have been considered in simpler cross-species comparisons, typically between Dictyostelium and Entamoeba, we find that Amoebozoa as a whole exhibits striking variation in the number and distribution of biosynthetic pathways, for example, ones for certain critical stress-response molecules, including trehalose and mannitol. Finally, we report additional compelling cases of lateral gene transfer within Amoebozoa, further emphasizing that although this process has influenced genome evolution in all examined amoebozoan taxa, it has done so to a variable extent.
Coding Complete Genome for the Mogiana Tick Virus, a Jingmenvirus Isolated from Ticks in Brazil

DTIC Science & Technology

2017-05-04

sequences for all four genome segments. We downloaded the raw Illumina sequence reads from the NCBI Short Read Archive (GenBank...MGTV genome segments through sequence similarity (BLASTN) to the published genome of Jingmen tick virus (JMTV) isolate SY84 (GenBank: KJ001579-KJ001582...
A one-page summary report of genome sequencing for the healthy adult.

PubMed

Vassy, Jason L; McLaughlin, Heather M; McLaughlin, Heather L; MacRae, Calum A; Seidman, Christine E; Lautenbach, Denise; Krier, Joel B; Lane, William J; Kohane, Isaac S; Murray, Michael F; McGuire, Amy L; Rehm, Heidi L; Green, Robert C

2015-01-01

As genome sequencing technologies increasingly enter medical practice, genetics laboratories must communicate sequencing results effectively to non-geneticist physicians. We describe the design and delivery of a clinical genome sequencing report, including a one-page summary suitable for interpretation by primary care physicians. To illustrate our preliminary experience with this report, we summarize the genomic findings from 10 healthy participants in a study of genome sequencing in primary care. © 2015 S. Karger AG, Basel.
A One-Page Summary Report of Genome Sequencing for the Healthy Adult

PubMed Central

Vassy, Jason L.; McLaughlin, Heather M.; MacRae, Calum A.; Seidman, Christine E.; Lautenbach, Denise; Krier, Joel B.; Lane, William J.; Kohane, Isaac S.; Murray, Michael F.; McGuire, Amy L.; Rehm, Heidi L.; Green, Robert C.

2015-01-01

As genome sequencing technologies increasingly enter medical practice, genetics laboratories must communicate sequencing results effectively to non-geneticist physicians. We describe the design and delivery of a clinical genome sequencing report, including a one-page summary suitable for interpretation by primary care physicians. To illustrate our preliminary experience with this report, we summarize the genomic findings from ten healthy patient participants in a study of genome sequencing in primary care. PMID:25612602
The Global Trade in Fresh Produce and the Vagility of Plant Viruses: A Case Study in Garlic

PubMed Central

Wylie, Stephen J.; Li, Hua; Saqib, Muhammad; Jones, Michael G. K.

2014-01-01

As cuisine becomes globalized, large volumes of fresh produce are traded internationally. The potential exists for pathogens infecting fresh produce to hitchhike to new locations and perhaps to become established there. Pathogens that are novel, scarce, and/or unexpected are difficult to identify using traditional methods. In an attempt to overcome this limitation, we used high-throughput sequencing technology as a means of detecting all RNA viruses infecting garlic (Allium sativum L.) bulbs imported into Australia from China, the USA, Mexico, Argentina and Spain, and those growing in Australia. The bulbs tested had been grown over multiple vegetative generations, and all were stably infected with one or more viruses, including two species not previously recorded in Australia. Across the 10 garlic bulbs, 41 virus isolates were present in various combinations, representing potyviruses (Onion yellow dwarf virus, Leek yellow stripe virus), carlaviruses (Shallot latent virus, Garlic common latent virus) and allexiviruses (Garlic virus A, B, C, D, and X); 19 complete and 22 partial genome sequences were obtained from them, including the first complete genome sequences of two isolates of GarVD. The most genetically distinct isolates of GarVA and GarVX described so far were identified from Mexico and Argentina, and possible scenarios explaining this are presented. The complete genome sequence of an isolate of the potexvirus Asparagus virus 3 (AV3) was obtained in Australia from wild garlic (A. vineale L.), a naturalized weed. This is the first time AV3 has been identified from wild garlic and the first time it has been identified beyond China and Japan. The need for routine generic diagnosis and appropriate legislation to address the risks to primary production and wild plant communities from pathogens spread through the international trade in fresh produce is discussed. PMID:25133543
Optimisation of DNA extraction from the crustacean Daphnia

PubMed Central

Athanasio, Camila Gonçalves; Chipman, James K.; Viant, Mark R.

2016-01-01

Daphnia are key model organisms for mechanistic studies of phenotypic plasticity, adaptation and microevolution, which have led to an increasing demand for genomics resources. A key step in any genomics analysis, such as high-throughput sequencing, is the availability of sufficient, high-quality DNA. Although commercial kits exist to extract genomic DNA from several species, preparing high-quality DNA from Daphnia spp. and other chitinous species can be challenging. Here, we optimise methods for tissue homogenisation, DNA extraction and quantification customised for different downstream analyses (e.g., LC-MS/MS, HiSeq, mate pair sequencing or Nanopore). We demonstrate that if Daphnia magna are homogenised as whole animals (including the carapace), absorbance-based DNA quantification methods significantly overestimate the amount of DNA, resulting in the use of insufficient starting material for experiments such as the preparation of sequencing libraries. This is attributed to the high refractive index of chitin in Daphnia’s carapace at 260 nm. Therefore, unless the carapace is removed by overnight proteinase digestion, the extracted DNA should be quantified with fluorescence-based methods. However, overnight proteinase digestion partially fragments the DNA, so the prepared DNA is not suitable for downstream methods that require high molecular weight DNA, such as PacBio, mate pair sequencing and Nanopore. In conclusion, we found that the MasterPure DNA purification kit, coupled with grinding of frozen tissue, is the best method for extracting high molecular weight DNA, as long as the extracted DNA is quantified with fluorescence-based methods. This method generated high yields of high molecular weight DNA (3.10 ± 0.63 ng/µg dry mass, fragments >60 kb), free of organic contaminants (phenol, chloroform) and suitable for a large number of downstream analyses. PMID:27190714
Genomic structure and promoter functional analysis of GnRH3 gene in large yellow croaker (Larimichthys crocea).

PubMed

Huang, Wei; Zhang, Jianshe; Liao, Zhi; Lv, Zhenming; Wu, Huifei; Zhu, Aiyi; Wu, Changwen

2016-01-15

Gonadotropin-releasing hormone III (GnRH3) is considered to be a key neurohormone in fish reproduction control. In the present study, the cDNA and genomic sequences of GnRH3 were cloned and characterized from the large yellow croaker Larimichthys crocea. The cDNA encoded a protein of 99 amino acids with four functional motifs. The full-length genomic sequence was composed of 3797 nucleotides, including four exons and three introns. High amino acid sequence identities and conserved exon-intron organizations were found between LcGnRH3 and other GnRH3 genes. In addition, some special sequence features were detected in certain species. For example, two specific residues (V and A) were found in the family Sciaenidae, and a unique 75-72 bp type of open reading frames 2 and 3 was found in the family Cyprinidae. Analysis of the 2576 bp promoter fragment of LcGnRH3 revealed a number of transcription factor binding sites, such as AP1, CREB, GATA-1, HSF, FOXA2, and FOXL1. Promoter functional analysis using an EGFP reporter fusion in zebrafish larvae showed positive signals in the brain, including the olfactory region, the terminal nerve ganglion, the telencephalon, and the hypothalamus. The expression pattern was generally consistent with that of the endogenous GnRH3 GFP-expressing transgenic zebrafish lines, although the details differed. These results indicate that the structure and function of LcGnRH3 are generally similar to those of other teleost GnRH3 genes, although some differences exist among them. Copyright © 2015 Elsevier B.V. All rights reserved.
Characterization of AFLAV, a Tf1/Sushi retrotransposon from Aspergillus flavus.

PubMed

Hua, Sui-Sheng T; Tarun, Alice S; Pandey, Sonal N; Chang, Leo; Chang, Perng-Kuang

2007-02-01

The plasmid pAF28, a genomic clone from Aspergillus flavus NRRL 6541, has been used as a hybridization probe to fingerprint A. flavus strains isolated from corn and peanut fields. The insert of pAF28 contains a 4.5 kb region that encodes a truncated retrotransposon (AfRTL-1). In search of a full-length, intact copy of the retrotransposon, we used a novel PCR cloning strategy, amplifying a 3.4 kb region from the genomic DNA of A. flavus NRRL 6541. The fragment was cloned into pCR 4-TOPO. Sequence analysis confirmed that this region encoded putative domains of partial reverse transcriptase, RNase H, and integrase of the predicted retrotransposon. The two flanking long terminal repeats (LTRs) and the sequence between them comprise a putative full-length LTR retrotransposon 7799 bp in length. This intact retrotransposon sequence is named AFLAV (A. flavus Retrotransposon). The order of the predicted catalytic domains in the polyprotein (Pol) placed AFLAV in the Tf1/sushi subgroup of the Ty3/gypsy retrotransposon family. Primers derived from the AFLAV sequence were used to screen for this retrotransposon in other strains of A. flavus. More than fifty strains of A. flavus isolated from different geographical origins were surveyed, and the results showed that many strains have extensive deletions in the regions encoding the capsid (Gag) and Pol.
Consequences of Normalizing Transcriptomic and Genomic Libraries of Plant Genomes Using a Duplex-Specific Nuclease and Tetramethylammonium Chloride

PubMed Central

Froenicke, Lutz; Lavelle, Dean; Martineau, Belinda; Perroud, Bertrand; Michelmore, Richard

2013-01-01

Several applications of high throughput genome and transcriptome sequencing would benefit from a reduction of the high-copy-number sequences in the libraries being sequenced and analyzed, particularly when applied to species with large genomes. We adapted and analyzed the consequences of a method that utilizes a thermostable duplex-specific nuclease for reducing the high-copy components in transcriptomic and genomic libraries prior to sequencing. This reduces the time, cost, and computational effort of obtaining informative transcriptomic and genomic sequence data for both fully sequenced and non-sequenced genomes. It also reduces contamination from organellar DNA in preparations of nuclear DNA. Hybridization in the presence of 3 M tetramethylammonium chloride (TMAC), which equalizes the rates of hybridization of GC and AT nucleotide pairs, reduced the bias against sequences with high GC content. Consequences of this method on the reduction of high-copy and enrichment of low-copy sequences are reported for Arabidopsis and lettuce. PMID:23409088
Consequences of normalizing transcriptomic and genomic libraries of plant genomes using a duplex-specific nuclease and tetramethylammonium chloride.

PubMed

Matvienko, Marta; Kozik, Alexander; Froenicke, Lutz; Lavelle, Dean; Martineau, Belinda; Perroud, Bertrand; Michelmore, Richard

2013-01-01

Several applications of high throughput genome and transcriptome sequencing would benefit from a reduction of the high-copy-number sequences in the libraries being sequenced and analyzed, particularly when applied to species with large genomes. We adapted and analyzed the consequences of a method that utilizes a thermostable duplex-specific nuclease for reducing the high-copy components in transcriptomic and genomic libraries prior to sequencing. This reduces the time, cost, and computational effort of obtaining informative transcriptomic and genomic sequence data for both fully sequenced and non-sequenced genomes. It also reduces contamination from organellar DNA in preparations of nuclear DNA. Hybridization in the presence of 3 M tetramethylammonium chloride (TMAC), which equalizes the rates of hybridization of GC and AT nucleotide pairs, reduced the bias against sequences with high GC content. Consequences of this method on the reduction of high-copy and enrichment of low-copy sequences are reported for Arabidopsis and lettuce.
Personal Genome Sequencing in Ostensibly Healthy Individuals and the PeopleSeq Consortium

PubMed Central

Linderman, Michael D.; Nielsen, Daiva E.; Green, Robert C.

2016-01-01

Thousands of ostensibly healthy individuals have had their exome or genome sequenced, but a much smaller number of these individuals have received any personal genomic results from that sequencing. We term those projects in which ostensibly healthy participants can receive sequencing-derived genetic findings and may also have access to their genomic data as participatory predispositional personal genome sequencing (PPGS). Here we are focused on genome sequencing applied in a pre-symptomatic context and so define PPGS to exclude diagnostic genome sequencing intended to identify the molecular cause of suspected or diagnosed genetic disease. In this report we describe the design of completed and underway PPGS projects, briefly summarize the results reported to date and introduce the PeopleSeq Consortium, a newly formed collaboration of PPGS projects designed to collect much-needed longitudinal outcome data. PMID:27023617
Research progress of plant population genomics based on high-throughput sequencing.

PubMed

Wang, Yun-sheng

2016-08-01

Population genomics, a new paradigm for population genetics, combines the concepts and techniques of genomics with the theoretical framework of population genetics and improves our understanding of microevolution by identifying site-specific and genome-wide effects through genotyping of genome-wide polymorphic sites. With the appearance and improvement of next-generation high-throughput sequencing technology, the number of plant species with complete genome sequences has increased rapidly, and large-scale resequencing has also been carried out in recent years. Parallel sequencing has also been done in some plant species without complete genome sequences. These studies have greatly promoted the development of population genomics and deepened our understanding of the genetic diversity, level of linkage disequilibrium, selection effects, demographic history and molecular mechanisms of complex traits of relevant plant populations at a genomic level. In this review, I briefly introduced the concept and research methods of population genomics and summarized the research progress of plant population genomics based on high-throughput sequencing. I also discussed the prospects as well as existing problems of plant population genomics in order to provide references for related studies.
Fungal Genomics for Energy and Environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grigoriev, Igor V.

2013-03-11

Genomes of fungi relevant to energy and the environment are the focus of the Fungal Genomic Program at the US Department of Energy Joint Genome Institute (JGI). One of its projects, the Genomics Encyclopedia of Fungi, targets fungi related to plant health (symbionts, pathogens, and biocontrol agents) and biorefinery processes (cellulose degradation, sugar fermentation, industrial hosts) by means of genome sequencing and analysis. New chapters of the Encyclopedia can be opened with user proposals to the JGI Community Sequencing Program (CSP). Another JGI project, the 1000 fungal genomes, explores fungal diversity at the genome level and at scale, and is open for users to nominate new species for sequencing. Over 200 fungal genomes have been sequenced by JGI to date and released through MycoCosm (www.jgi.doe.gov/fungi), a fungal web portal that integrates sequence and functional data with genome analysis tools for the user community. Sequence analysis supported by functional genomics leads to the development of parts lists for complex systems ranging from ecosystems of biofuel crops to biorefineries. Recent examples of such parts suggested by comparative genomics and functional analysis in these areas are presented here.

Regulatory Mechanisms That Prevent Re-initiation of DNA Replication Can Be Locally Modulated at Origins by Nearby Sequence Elements

PubMed Central

Richardson, Christopher D.; Li, Joachim J.

2014-01-01

Eukaryotic cells must inhibit re-initiation of DNA replication at each of the thousands of origins in their genome because re-initiation can generate genomic alterations with extraordinary frequency. To minimize the probability of re-initiation from so many origins, cells use a battery of regulatory mechanisms that reduce the activity of replication initiation proteins. Given the global nature of these mechanisms, it has been presumed that all origins are inhibited identically. However, origins re-initiate with diverse efficiencies when these mechanisms are disabled, and this diversity cannot be explained by differences in the efficiency or timing of origin initiation during normal S phase replication. This observation raises the possibility of an additional layer of replication control that can differentially regulate re-initiation at distinct origins. We have identified novel genetic elements that are necessary for preferential re-initiation of two origins and sufficient to confer preferential re-initiation on heterologous origins when the control of re-initiation is partially deregulated. The elements do not enhance the S phase timing or efficiency of adjacent origins and thus are specifically acting as re-initiation promoters (RIPs). We have mapped the two RIPs to ∼60 bp AT-rich sequences that act in a distance- and sequence-dependent manner. During the induction of re-replication, Mcm2-7 reassociates both with origins that preferentially re-initiate and with origins that do not, suggesting that the RIP elements can overcome a block to re-initiation imposed after Mcm2-7 associates with origins. Our findings identify a local level of control in the block to re-initiation. This local control creates a complex genomic landscape of re-replication potential that is revealed when global mechanisms preventing re-replication are compromised. Hence, if re-replication does contribute to genomic alterations, as has been speculated for cancer cells, some regions of the genome may be more susceptible to these alterations than others. PMID:24945837
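The RIP elements were mapped to roughly 60 bp AT-rich sequences. To make "AT-rich" concrete, here is a minimal sketch, assuming a simple sliding-window scan: it reports every 60 bp window whose A+T fraction meets a threshold. The function name at_rich_windows, the 0.8 default cutoff, and the toy sequence are hypothetical choices for illustration; this is not how the study mapped the RIPs.

```python
def at_rich_windows(dna: str, window: int = 60,
                    min_at: float = 0.8) -> list[tuple[int, float]]:
    """Return (0-based start, A+T fraction) for each window with A+T fraction >= min_at."""
    seq = dna.upper()
    if window <= 0 or len(seq) < window:
        return []
    # 1 for A or T, 0 for anything else (G, C, N, ...).
    is_at = [1 if base in "AT" else 0 for base in seq]
    count = sum(is_at[:window])  # A+T count in the first window
    hits = []
    for start in range(len(seq) - window + 1):
        if start > 0:
            # Slide right by one base: add the base entering, drop the base leaving.
            count += is_at[start + window - 1] - is_at[start - 1]
        frac = count / window
        if frac >= min_at:
            hits.append((start, frac))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence: a 60 bp AT block flanked by 40 bp of GC.
    toy = "GC" * 20 + "AT" * 30 + "GC" * 20
    print(at_rich_windows(toy, min_at=1.0))  # [(40, 1.0)]
```

The running count keeps each window update constant-time, so the scan stays linear in sequence length even for long genomic regions.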
Development and validation of an rDNA operon based primer walking strategy applicable to de novo bacterial genome finishing

PubMed Central

Eastman, Alexander W.; Yuan, Ze-Chun

2015-01-01

Advances in sequencing technology have drastically increased the depth and feasibility of bacterial genome sequencing. However, despite the large numbers of published genomes, little information is available that details the specific techniques and procedures employed during genome sequencing. Shotgun approaches employed by second-generation sequencing platforms have necessitated the development of robust bioinformatics tools for in silico assembly, and complete assembly is limited by the presence of repetitive DNA sequences and multi-copy operons. Typically, re-sequencing with multiple platforms and laborious, targeted Sanger sequencing are employed to finish a draft bacterial genome. Here we describe a novel strategy based on the identification and targeted sequencing of repetitive rDNA operons to expedite bacterial genome assembly and finishing. Our strategy was validated by finishing the genome of Paenibacillus polymyxa strain CR1, a bacterium with potential in sustainable agriculture and bio-based processes. An analysis of the 38 contigs contained in the P. polymyxa strain CR1 draft genome revealed 12 repetitive rDNA operons with intragenic and flanking regions of variable length, all located at contig boundaries and within contig gaps. These highly similar but not identical rDNA operons were experimentally verified and sequenced simultaneously with multiple, specially designed primer sets. This approach also identified and corrected significant sequence rearrangement generated during the initial in silico assembly of sequencing reads. Our approach reduces the effort associated with blind primer walking for contig assembly, increasing both the speed and feasibility of genome finishing. Our study further reinforces the notion that repetitive DNA elements are major limiting factors for genome finishing. Moreover, we provide a step-by-step workflow for genome finishing, which may guide future bacterial genome finishing projects. PMID:25653642
Why Assembling Plant Genome Sequences Is So Challenging

PubMed Central

Claros, Manuel Gonzalo; Bautista, Rocío; Guerrero-Fernández, Darío; Benzerki, Hicham; Seoane, Pedro; Fernández-Pozo, Noé

2012-01-01

In spite of the biological and economic importance of plants, relatively few plant species have been sequenced. Only the genome sequences of plants with relatively small genomes, most of them angiosperms and in particular eudicots, have been determined. The arrival of next-generation sequencing technologies has allowed the rapid and efficient development of new genomic resources for non-model or orphan plant species. However, plants are still being sequenced far more slowly than animals and microorganisms. This review focuses on the typical challenges of plant genomes that can explain why plant genomics is less developed than animal genomics. Explanations of the impact of some confounding factors emerging from the nature of plant genomes are given. As a result of these challenges and confounding factors, the correct assembly and annotation of plant genomes is hindered, genome drafts are produced, and advances in plant genomics are delayed. PMID:24832233
Insights from Human/Mouse genome comparisons

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pennacchio, Len A.

2003-03-30

Large-scale public genomic sequencing efforts have provided a wealth of vertebrate sequence data poised to provide insights into mammalian biology. These include deep genomic sequence coverage of human, mouse, rat, zebrafish, and two pufferfish (Fugu rubripes and Tetraodon nigroviridis) (Aparicio et al. 2002; Lander et al. 2001; Venter et al. 2001; Waterston et al. 2002). In addition, high priority has been placed on determining the genomic sequence of chimpanzee, dog, cow, frog, and chicken (Boguski 2002). While only recently available, whole genome sequence data have provided the unique opportunity to globally compare complete genome contents. Furthermore, the shared evolutionary ancestry of vertebrate species has allowed the development of comparative genomic approaches to identify ancient conserved sequences with functionality. Accordingly, this review focuses on the initial comparison of available mammalian genomes and describes various insights derived from such analysis.
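Comparative approaches of this kind often begin by scanning a pairwise human/mouse alignment for windows of elevated identity. The sketch below uses a VISTA-style rule of at least 70% identity over a 100-bp window; both thresholds, the function names, and the toy alignment are assumptions for illustration, not parameters taken from this review.

```python
def window_identity(seq_a: str, seq_b: str, start: int, size: int) -> float:
    """Percent identity over alignment columns [start, start + size).

    Any column with a gap in either sequence counts as a mismatch.
    """
    matches = sum(
        1
        for a, b in zip(seq_a[start:start + size], seq_b[start:start + size])
        if a == b and a != "-"
    )
    return 100.0 * matches / size


def conserved_windows(seq_a, seq_b, size=100, min_identity=70.0, step=1):
    """Yield (start, identity) for every window meeting the threshold."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    for start in range(0, len(seq_a) - size + 1, step):
        identity = window_identity(seq_a, seq_b, start, size)
        if identity >= min_identity:
            yield start, identity


# Toy alignment: the first 100 columns match, the last 100 do not.
human = "A" * 100 + "C" * 100
mouse = "A" * 100 + "G" * 100
print(list(conserved_windows(human, mouse, step=10)))
# [(0, 100.0), (10, 90.0), (20, 80.0), (30, 70.0)]
```

Overlapping windows that pass the threshold would normally be merged into a single conserved element before further analysis.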
First evidence of dengue infection in domestic dogs living in different ecological settings in Thailand.

PubMed

Thongyuan, Suporn; Kittayapong, Pattamaporn

2017-01-01

Dengue is a vector-borne disease transmitted by Aedes mosquitoes. It is considered an important public health problem in many countries worldwide. However, only a few studies have been conducted on primates and domestic animals that could potentially be a reservoir of dengue viruses. Since domestic dogs share both habitats and vectors with humans, this study aimed to investigate whether domestic dogs living in different ecological settings in dengue endemic areas in Thailand could be naturally infected with dengue viruses. Serum samples were collected from domestic dogs in three different ecological settings of Thailand: urban dengue endemic areas of Nakhon Sawan Province; rubber plantation areas of Rayong Province; and Koh Chang, an island tourist spot of Trat Province. These samples were screened for dengue viral genome by using semi-nested RT-PCR. Positive samples were then inoculated in mosquito and dog cell lines for virus isolation. Supernatant collected from cell culture was tested for the presence of dengue viral genome by semi-nested RT-PCR, then double-strand DNA products were double-pass custom-sequenced. Partial nucleotide sequences were aligned with the sequences already recorded in GenBank, and a phylogenetic tree was constructed. In the urban setting, 632 domestic dog serum samples were screened for dengue virus genome by RT-PCR, and six samples (0.95%) tested positive for dengue virus. Four out of six dengue viruses from positive samples were successfully isolated. Dengue virus serotype 2 and serotype 3 were found to have circulated in domestic dog populations. One of 153 samples (0.65%) collected from the rubber plantation area showed a PCR-positive result, and dengue serotype 3 was successfully isolated. Partial gene phylogeny revealed that the isolated dengue viruses were closely related to those strains circulating in human populations. None of the 71 samples collected from the island tourist spot showed a positive result. 
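The reported positivity rates follow directly from the screening counts. The short calculation below reproduces them for each setting; the counts are those stated in the abstract, while the dictionary layout and function name are only illustrative.

```python
# Screening counts reported in the abstract: (PCR-positive, screened).
screening = {
    "urban (Nakhon Sawan)": (6, 632),
    "rubber plantation (Rayong)": (1, 153),
    "island (Koh Chang, Trat)": (0, 71),
}


def positivity_percent(positive: int, screened: int) -> float:
    """Percentage of screened samples that tested PCR-positive."""
    return 100.0 * positive / screened


for setting, (pos, n) in screening.items():
    print(f"{setting}: {pos}/{n} = {positivity_percent(pos, n):.2f}%")
```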
We concluded that domestic dogs can be infected with dengue virus strains circulating in dengue endemic areas. The role of domestic dogs in dengue transmission needs to be further investigated, i.e., whether they are potential reservoirs or incidental hosts of dengue viruses.
Variable Copy Number, Intra-Genomic Heterogeneities and Lateral Transfers of the 16S rRNA Gene in Pseudomonas

PubMed Central

Bodilis, Josselin; Nsigue-Meilo, Sandrine; Besaury, Ludovic; Quillet, Laurent

2012-01-01

Even though the 16S rRNA gene is the most commonly used taxonomic marker in microbial ecology, its poor resolution is still not fully understood at the intra-genus level. In this work, the number of rRNA gene operons, intra-genomic heterogeneities and lateral transfers were investigated at a fine-scale resolution, throughout the Pseudomonas genus. In addition to nineteen sequenced Pseudomonas strains, we determined the 16S rRNA copy number in four other Pseudomonas strains by Southern hybridization and Pulsed-Field Gel Electrophoresis, and studied the intra-genomic heterogeneities by Denaturing Gradient Gel Electrophoresis and sequencing. Although the variable copy number (from four to seven) seems to be correlated with the evolutionary distance, some close strains in the P. fluorescens lineage showed a different number of 16S rRNA genes, whereas all the strains in the P. aeruginosa lineage displayed the same number of genes (four copies). Further study of the intra-genomic heterogeneities revealed that most of the Pseudomonas strains (15 out of 19 strains) had at least two different 16S rRNA alleles. A large difference (5 or 19 nucleotides, essentially grouped near the V1 hypervariable region) was observed in only two sequenced strains. In one of the strains we studied (MFY30 strain), we found a difference of 12 nucleotides (grouped in the V3 hypervariable region) between copies of the 16S rRNA gene. Finally, occurrence of partial lateral transfers of the 16S rRNA gene was further investigated in 1803 full-length sequences of Pseudomonas available in the databases. Remarkably, we found that the two most variable regions (the V1 and V3 hypervariable regions) had probably been laterally transferred from another, evolutionarily distant Pseudomonas strain for at least 48.3 and 41.6% of the 16S rRNA sequences, respectively. In conclusion, we strongly recommend removing these regions of the 16S rRNA gene in intra-genus diversity studies. PMID:22545126
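Removing the V1 and V3 regions before an intra-genus comparison amounts to masking alignment columns before counting differences. The sketch below counts nucleotide differences between two aligned 16S rRNA copies with and without such a mask. The function name, toy sequences, and mask coordinates are hypothetical; real V1/V3 boundaries would have to be taken from an E. coli-numbered reference alignment.

```python
def count_differences(seq_a: str, seq_b: str, masked=()) -> int:
    """Count mismatched columns between two aligned sequences.

    `masked` is an iterable of half-open (start, end) column ranges to
    exclude, such as hypervariable regions. Columns with a gap in either
    sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    skip = set()
    for start, end in masked:
        skip.update(range(start, end))
    return sum(
        1
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if i not in skip and a != b and "-" not in (a, b)
    )


# Hypothetical aligned copies: a 5-column "hypervariable" block plus one SNP.
copy1 = "ACGTACGTAC" + "GGGGG" + "ACGTACGTAC"
copy2 = "ACGTACGTAC" + "TTTTT" + "ACGTACGTAA"
print(count_differences(copy1, copy2))                      # 6
print(count_differences(copy1, copy2, masked=[(10, 15)]))   # 1
```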
Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum

DOE Office of Scientific and Technical Information (OSTI.GOV)

VanBuren, Robert; Bryant, Doug; Edger, Patrick P.

Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly [1]. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking regions of skewed GC content and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16 kilobases) reads with random errors, we assembled 99% (244 megabases) of the Oropetium genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a ‘near-complete’ draft genome which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. The Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.
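N50, the contiguity statistic quoted for this assembly, is the length L such that contigs of length L or longer together contain at least half of the assembled bases. A minimal sketch follows; the function name and the contig lengths in the example are made up and are not the Oropetium contigs.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig")
    half = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable: the running total reaches the full sum")


print(n50([2, 3, 4, 5, 6, 10]))  # 6
```

Here the total is 30 bases; the two longest contigs (10 + 6 = 16) are the first to reach half of that, so the N50 is 6.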
Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum

DOE PAGES

VanBuren, Robert; Bryant, Doug; Edger, Patrick P.; ...

2015-11-11

Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly [1]. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking regions of skewed GC content and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16 kilobases) reads with random errors, we assembled 99% (244 megabases) of the Oropetium genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a ‘near-complete’ draft genome which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. The Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.
Partial nucleotide sequences, and routine typing by polymerase chain reaction-restriction fragment length polymorphism, of the brown trout (Salmo trutta) lactate dehydrogenase, LDH-C1*90 and *100 alleles.

PubMed

McMeel, O M; Hoey, E M; Ferguson, A

2001-01-01

The cDNA nucleotide sequences of the lactate dehydrogenase alleles LDH-C1*90 and *100 of brown trout (Salmo trutta) were found to differ at position 308 where an A is present in the *100 allele but a G is present in the *90 allele. This base substitution results in an amino acid change from aspartic acid at position 82 in the LDH-C1 100 allozyme to a glycine in the 90 allozyme. Since aspartic acid has a net negative charge whilst glycine is uncharged, this is consistent with the electrophoretic observation that the LDH-C1 100 allozyme has a more anodal mobility relative to the LDH-C1 90 allozyme. Based on alignment of the cDNA sequence with the mouse genomic sequence, a local primer set was designed, incorporating the variable position, and was found to give very good amplification with brown trout genomic DNA. Sequencing of this fragment confirmed the difference in both homozygous and heterozygous individuals. Digestion of the polymerase chain reaction products with BslI, a restriction enzyme specific for the site difference, gave one, two and three fragments for the two homozygotes and the heterozygote, respectively, following electrophoretic separation. This provides a DNA-based means of routine screening of the highly informative LDH-C1* polymorphism in brown trout population genetic studies. Primer sets presented could be used to sequence cDNA of other LDH* genes of brown trout and other species.
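The one-, two- and three-fragment patterns follow from simple digestion logic: an allele lacking the site stays uncut, an allele carrying it is cut once, and a heterozygote shows the uncut product plus both cut products. The sketch below assumes BslI recognizes CCNNNNNNNGG and cuts after the seventh base (CCNNNNN^NNGG), as enzyme suppliers commonly list it; verify this against a vendor reference before relying on it. The 31-bp amplicons are hypothetical, not the brown trout sequence, and placing the SNP so that it creates or destroys the site is an illustrative assumption.

```python
import re

# Assumed BslI site CCNNNNNNNGG, cut between N5 and N6 (CCNNNNN^NNGG).
# The zero-width lookahead lets overlapping sites be found.
BSLI_SITE = re.compile(r"(?=CC[ACGT]{7}GG)")
BSLI_CUT_OFFSET = 7


def digest(seq: str) -> list[int]:
    """Return fragment lengths after complete digestion of one amplicon."""
    cuts = [m.start() + BSLI_CUT_OFFSET for m in BSLI_SITE.finditer(seq)]
    bounds = [0, *cuts, len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def band_pattern(allele_1: str, allele_2: str) -> list[int]:
    """Distinct fragment sizes expected on a gel for a diploid genotype."""
    return sorted(set(digest(allele_1)) | set(digest(allele_2)), reverse=True)


# Hypothetical amplicons differing at a single G/A position.
flank = "T" * 10
cut_allele = flank + "CCAAAGAAAGG" + flank    # contains the site
uncut_allele = flank + "CCAAAGAAAGA" + flank  # site destroyed

for label, genotype in [
    ("uncut homozygote", (uncut_allele, uncut_allele)),
    ("cut homozygote", (cut_allele, cut_allele)),
    ("heterozygote", (cut_allele, uncut_allele)),
]:
    print(label, band_pattern(*genotype))
```

With these toy amplicons the three genotypes give [31], [17, 14] and [31, 17, 14], matching the one-, two- and three-band patterns described in the abstract.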
It’s More Than Stamp Collecting: How Genome Sequencing Can Unify Biological Research

PubMed Central

Richards, Stephen

2015-01-01

The availability of reference genome sequences, especially the human reference, has revolutionized the study of biology. However, whilst the genomes of some species have been fully sequenced, a wide range of biological problems still cannot be effectively studied for lack of genome sequence information. Here, I identify neglected areas of biology and describe how both targeted species sequencing and broader taxonomic surveys of the tree of life can address important biological questions. I enumerate the significant benefits that would accrue from sequencing a broader range of taxa, as well as discuss the technical advances in sequencing and assembly methods that would allow for wide-ranging application of whole-genome analysis. Finally, I suggest that in addition to “Big Science” survey initiatives to sequence the tree of life, a modified infrastructure-funding paradigm would better support reference genome sequence generation for research communities most in need. PMID:26003218
It's more than stamp collecting: how genome sequencing can unify biological research.

PubMed

Richards, Stephen

2015-07-01

The availability of reference genome sequences, especially the human reference, has revolutionized the study of biology. However, while the genomes of some species have been fully sequenced, a wide range of biological problems still cannot be effectively studied for lack of genome sequence information. Here, I identify neglected areas of biology and describe how both targeted species sequencing and broader taxonomic surveys of the tree of life can address important biological questions. I enumerate the significant benefits that would accrue from sequencing a broader range of taxa, as well as discuss the technical advances in sequencing and assembly methods that would allow for wide-ranging application of whole-genome analysis. Finally, I suggest that in addition to 'big science' survey initiatives to sequence the tree of life, a modified infrastructure-funding paradigm would better support reference genome sequence generation for research communities most in need. Copyright © 2015 Elsevier Ltd. All rights reserved.
Complete Genome Sequence of Pigmentation Negative Yersinia pestis strain Cadman

DTIC Science & Technology

2016-10-27

Institute of Infectious Diseases, Fort Detrick, Frederick, Maryland, USA ... Complete Genome Sequence of Pigmentation Negative Yersinia pestis strain Cadman. Sean Lovett, Kitty Chase, Galina Koroleva, Gustavo... we report the genome sequence of Yersinia pestis strain Cadman, an attenuated strain lacking the pgm locus. Y. pestis is the causative agent of
Systematics of Hypocrea citrina and related taxa

PubMed Central

Overton, Barrie E.; Stewart, Elwin L.; Geiser, David M.; Jaklitsch, Walter M.

2006-01-01

Morphological studies and phylogenetic analyses of DNA sequences from three genomic regions – the internal transcribed spacer (ITS) regions of the nuclear ribosomal gene repeat, a partial sequence of RNA polymerase II subunit (rpb2), and a partial sequence of translation elongation factor (tef1) – were used to investigate the systematics of Hypocrea citrina and related species. A neotype specimen is designated for H. citrina that conforms to Persoon's description of a yellow effuse fungus occurring on leaf litter. Historical information and results obtained in this study provide the foundation for selection of a lectotype specimen from Fries's herbarium for H. lactea. The results indicate that (1) Hypocrea citrina and H. pulvinata are distinct species; (2) H. lactea sensu Fries is a synonym of the older name H. citrina; (3) H. pulvinata, H. protopulvinata, and H. americana are phylogenetically distinct species that form a well-supported polyporicolous clade; (4) H. citrina is situated in a clade closely related to H. pulvinata; and (5) H. microcitrina and H. pseudostraminea reside in a highly supported clade phylogenetically distinct from H. citrina. Hypocrea protopulvinata, H. microcitrina, H. megalocitrina, H. pseudostraminea, and a new species, H. aurantiistroma, are reported and described from North America. Variation in rpb2 and tef1 gene sequences suggests geographical subgroupings between European and North American isolates of H. pulvinata. The phylogenies inferred from ITS, rpb2, and tef1 gene sequences are concordant. Hypocrea citrina var. americana is elevated to species status, Hypocrea americana. PMID:18490988
Whole genome sequencing of Saccharomyces cerevisiae: from genotype to phenotype for improved metabolic engineering applications.

PubMed

Otero, José Manuel; Vongsangnak, Wanwipa; Asadollahi, Mohammad A; Olivares-Hernandes, Roberto; Maury, Jérôme; Farinelli, Laurent; Barlocher, Loïc; Osterås, Magne; Schalk, Michel; Clark, Anthony; Nielsen, Jens

2010-12-22

The need for rapid and efficient microbial cell factory design and construction is being met through the enabling technology of metabolic engineering, which is now being facilitated by systems biology approaches. Metabolic engineering is often complemented by directed evolution, where selective pressure is applied to a partially genetically engineered strain to confer a desirable phenotype. The exact genetic modification or resulting genotype that leads to the improved phenotype is often not identified or understood well enough to enable further metabolic engineering. In this work we show that whole genome high-throughput sequencing and annotation can be used to identify single nucleotide polymorphisms (SNPs) between Saccharomyces cerevisiae strains S288c and CEN.PK113-7D. The yeast strain S288c was the first eukaryote sequenced, serving as the reference genome for the Saccharomyces Genome Database, while CEN.PK113-7D is a preferred laboratory strain for industrial biotechnology research. A total of 13,787 high-quality SNPs were detected between both strains (reference strain: S288c). Considering only metabolic genes (782 of 5,596 annotated genes), a total of 219 metabolism-specific SNPs are distributed across 158 metabolic genes, with 85 of the SNPs being nonsynonymous (i.e., encoding amino acid changes). Amongst the metabolic SNPs detected, there was pathway enrichment in the galactose uptake pathway (GAL1, GAL10) and the ergosterol biosynthetic pathway (ERG8, ERG9). Physiological characterization confirmed a strong deficiency in galactose uptake and metabolism in S288c compared to CEN.PK113-7D, and similarly, ergosterol content in CEN.PK113-7D was significantly higher in both glucose and galactose supplemented cultivations compared to S288c.
Furthermore, DNA microarray profiling of S288c and CEN.PK113-7D in both glucose and galactose batch cultures did not provide a clear hypothesis for the major phenotypes observed, suggesting that genotype to phenotype correlations are manifested post-transcriptionally or post-translationally, through protein concentration and/or function. With an intensifying need for microbial cell factories that produce a wide array of target compounds, whole genome high-throughput sequencing and annotation for SNP detection can aid in better reducing and defining the metabolic landscape. This work demonstrates direct correlations between genotype and phenotype that provide clear metabolic engineering targets with a high probability of success. The genome sequence, annotation, and a SNP viewer of CEN.PK113-7D are deposited at http://www.sysbio.se/cenpk.
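Splitting coding SNPs into synonymous and nonsynonymous classes comes down to translating the reference and variant codons. A minimal sketch using the standard genetic code (NCBI translation table 1) follows; the function name and example sequence are hypothetical, and the sketch ignores strand, introns, and multiple SNPs falling in the same codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# with the third position varying fastest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(cds: str, pos: int, alt: str) -> str:
    """Classify a substitution at 0-based position `pos` of an in-frame CDS."""
    offset = pos % 3
    codon_start = pos - offset
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "nonsynonymous"


cds = "ATGGACCTGTAA"  # hypothetical CDS: Met-Asp-Leu-stop
print(classify_snp(cds, 4, "G"))  # nonsynonymous (GAC Asp -> GGC Gly)
print(classify_snp(cds, 8, "C"))  # synonymous (CTG Leu -> CTC Leu)
```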
Whole genome sequencing of Saccharomyces cerevisiae: from genotype to phenotype for improved metabolic engineering applications

PubMed Central

2010-01-01

Background The need for rapid and efficient microbial cell factory design and construction is being met through the enabling technology of metabolic engineering, which is now being facilitated by systems biology approaches. Metabolic engineering is often complemented by directed evolution, where selective pressure is applied to a partially genetically engineered strain to confer a desirable phenotype. The exact genetic modification or resulting genotype that leads to the improved phenotype is often not identified or understood well enough to enable further metabolic engineering. Results In this work we show that whole genome high-throughput sequencing and annotation can be used to identify single nucleotide polymorphisms (SNPs) between Saccharomyces cerevisiae strains S288c and CEN.PK113-7D. The yeast strain S288c was the first eukaryote sequenced, serving as the reference genome for the Saccharomyces Genome Database, while CEN.PK113-7D is a preferred laboratory strain for industrial biotechnology research. A total of 13,787 high-quality SNPs were detected between both strains (reference strain: S288c). Considering only metabolic genes (782 of 5,596 annotated genes), a total of 219 metabolism-specific SNPs are distributed across 158 metabolic genes, with 85 of the SNPs being nonsynonymous (i.e., encoding amino acid changes). Amongst the metabolic SNPs detected, there was pathway enrichment in the galactose uptake pathway (GAL1, GAL10) and the ergosterol biosynthetic pathway (ERG8, ERG9). Physiological characterization confirmed a strong deficiency in galactose uptake and metabolism in S288c compared to CEN.PK113-7D, and similarly, ergosterol content in CEN.PK113-7D was significantly higher in both glucose and galactose supplemented cultivations compared to S288c.
Furthermore, DNA microarray profiling of S288c and CEN.PK113-7D in both glucose and galactose batch cultures did not provide a clear hypothesis for the major phenotypes observed, suggesting that genotype to phenotype correlations are manifested post-transcriptionally or post-translationally, through protein concentration and/or function. Conclusions With an intensifying need for microbial cell factories that produce a wide array of target compounds, whole genome high-throughput sequencing and annotation for SNP detection can aid in better reducing and defining the metabolic landscape. This work demonstrates direct correlations between genotype and phenotype that provide clear metabolic engineering targets with a high probability of success. The genome sequence, annotation, and a SNP viewer of CEN.PK113-7D are deposited at http://www.sysbio.se/cenpk. PMID:21176163
MIPS: a database for genomes and protein sequences.

PubMed Central

Mewes, H W; Heumann, K; Kaps, A; Mayer, K; Pfeiffer, F; Stocker, S; Frishman, D

1999-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried near Munich, Germany, develops and maintains genome oriented databases. It is commonplace that the amount of available sequence data is increasing rapidly, whereas the capacity for qualified manual annotation at the sequence databases is not. Therefore, our strategy aims to cope with the data stream by the comprehensive application of analysis tools to sequences of complete genomes, the systematic classification of protein sequences and the active support of sequence analysis and functional genomics projects. This report describes the systematic and up-to-date analysis of genomes (PEDANT), a comprehensive database of the yeast genome (MYGD), a database reflecting the progress in sequencing the Arabidopsis thaliana genome (MATD), the database of assembled, annotated human EST clusters (MEST), and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). MIPS provides access through its WWW server (http://www.mips.biochem.mpg.de) to a spectrum of generic databases, including those mentioned above as well as a database of protein families (PROTFAM), the MITOP database, and the all-against-all FASTA database. PMID:9847138
Partial characterization of the lettuce infectious yellows virus genomic RNAs, identification of the coat protein gene and comparison of its amino acid sequence with those of other filamentous RNA plant viruses.

PubMed

Klaassen, V A; Boeshore, M; Dolja, V V; Falk, B W

1994-07-01

Purified virions of lettuce infectious yellows virus (LIYV), a tentative member of the closterovirus group, contained two RNAs of approximately 8500 and 7300 nucleotides (RNAs 1 and 2 respectively) and a single coat protein species with M(r) of approximately 28,000. LIYV-infected plants contained multiple dsRNAs. The two largest were the correct size for the replicative forms of LIYV virion RNAs 1 and 2. To assess the relationships between LIYV RNAs 1 and 2, cDNAs corresponding to the virion RNAs were cloned. Northern blot hybridization analysis showed no detectable sequence homology between these RNAs. A partial amino acid sequence obtained from purified LIYV coat protein was found to align with the most upstream of four complete open reading frames (ORFs) identified in a LIYV RNA 2 cDNA clone. The identity of this ORF was confirmed as the LIYV coat protein gene by immunological analysis of the gene product expressed in vitro and in Escherichia coli. Computer analysis of the LIYV coat protein amino acid sequence indicated that it belongs to a large family of proteins forming filamentous capsids of RNA plant viruses. The LIYV coat protein appears to be most closely related to the coat proteins of two closteroviruses, beet yellows virus and citrus tristeza virus.
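Locating the most upstream complete ORF in a cDNA clone can be sketched as a forward-strand scan for ATG-initiated reading frames that end in a stop codon, sorted by start position. The function name, minimum-length parameter, and example sequence are hypothetical, and limiting the scan to the forward strand (reasonable for a positive-sense RNA virus) is an assumption of this sketch rather than a detail from the study.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) of complete ATG...stop ORFs on the forward strand.

    Coordinates are 0-based and end-exclusive (the stop codon is included).
    Within each frame only the first ATG after the previous stop opens an
    ORF, so nested in-frame ATGs are not reported separately.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or cDNA
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


clone = "CCATGAAATAGGATGCCCTGATT"  # hypothetical cDNA fragment
orfs = find_orfs(clone)
print(orfs)     # [(2, 11), (12, 21)]
print(orfs[0])  # most upstream ORF: (2, 11)
```

Sorting by start coordinate is what makes "most upstream" well defined across all three frames; a real analysis would also set a realistic minimum ORF length.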
The Genome Sequencer FLX System--longer reads, more applications, straight forward bioinformatics and more complete data sets.

PubMed

Droege, Marcus; Hill, Brendon

2008-08-31

The Genome Sequencer FLX System (GS FLX), powered by 454 Sequencing, is a next-generation DNA sequencing technology featuring a unique mix of long reads, exceptional accuracy, and ultra-high throughput. It has been proven to be the most versatile of all currently available next-generation sequencing technologies, supporting many high-profile studies in over seven application categories. GS FLX users have pursued innovative research in de novo sequencing, re-sequencing of whole genomes and target DNA regions, metagenomics, and RNA analysis. 454 Sequencing is a powerful tool for human genetics research: it has recently been used to re-sequence the genome of an individual human, is currently being used to re-sequence the complete human exome and targeted genomic regions using the NimbleGen sequence capture process, and has detected low-frequency somatic mutations linked to cancer.
Newborn Sequencing in Genomic Medicine and Public Health.

PubMed

Berg, Jonathan S; Agrawal, Pankaj B; Bailey, Donald B; Beggs, Alan H; Brenner, Steven E; Brower, Amy M; Cakici, Julie A; Ceyhan-Birsoy, Ozge; Chan, Kee; Chen, Flavia; Currier, Robert J; Dukhovny, Dmitry; Green, Robert C; Harris-Wai, Julie; Holm, Ingrid A; Iglesias, Brenda; Joseph, Galen; Kingsmore, Stephen F; Koenig, Barbara A; Kwok, Pui-Yan; Lantos, John; Leeder, Steven J; Lewis, Megan A; McGuire, Amy L; Milko, Laura V; Mooney, Sean D; Parad, Richard B; Pereira, Stacey; Petrikin, Joshua; Powell, Bradford C; Powell, Cynthia M; Puck, Jennifer M; Rehm, Heidi L; Risch, Neil; Roche, Myra; Shieh, Joseph T; Veeraraghavan, Narayanan; Watson, Michael S; Willig, Laurel; Yu, Timothy W; Urv, Tiina; Wise, Anastasia L

2017-02-01

The rapid development of genomic sequencing technologies has decreased the cost of genetic analysis to the extent that it seems plausible that genome-scale sequencing could have widespread availability in pediatric care. Genomic sequencing provides a powerful diagnostic modality for patients who manifest symptoms of monogenic disease and an opportunity to detect health conditions before their development. However, many technical, clinical, ethical, and societal challenges should be addressed before such technology is widely deployed in pediatric practice. This article provides an overview of the Newborn Sequencing in Genomic Medicine and Public Health Consortium, which is investigating the application of genome-scale sequencing in newborns for both diagnosis and screening. Copyright © 2017 by the American Academy of Pediatrics.
Comparative sequence analysis of Sordaria macrospora and Neurospora crassa as a means to improve genome annotation.

PubMed

Nowrousian, Minou; Würtz, Christian; Pöggeler, Stefanie; Kück, Ulrich

2004-03-01

One of the most challenging parts of large scale sequencing projects is the identification of functional elements encoded in a genome. Recently, studies of genomes of up to six different Saccharomyces species have demonstrated that a comparative analysis of genome sequences from closely related species is a powerful approach to identify open reading frames and other functional regions within genomes [Science 301 (2003) 71, Nature 423 (2003) 241]. Here, we present a comparison of selected sequences from Sordaria macrospora to their corresponding Neurospora crassa orthologous regions. Our analysis indicates that due to the high degree of sequence similarity and conservation of overall genomic organization, S. macrospora sequence information can be used to simplify the annotation of the N. crassa genome.