consensus genome sequence: Topics by Science.gov

Sample records for consensus genome sequence

Construction of an ultra-high density consensus genetic map, and enhancement of the physical map from genome sequencing in Lupinus angustifolius.

PubMed

Zhou, Gaofeng; Jian, Jianbo; Wang, Penghao; Li, Chengdao; Tao, Ye; Li, Xuan; Renshaw, Daniel; Clements, Jonathan; Sweetingham, Mark; Yang, Huaan

2018-01-01

An ultra-high density genetic map containing 34,574 sequence-defined markers was developed in Lupinus angustifolius. Markers closely linked to nine genes of agronomic traits were identified. A physical map was improved to cover 560.5 Mb genome sequence. Lupin (Lupinus angustifolius L.) is a recently domesticated legume grain crop. In this study, we applied the restriction-site associated DNA sequencing (RADseq) method to genotype an F 9 recombinant inbred line population derived from a wild type × domesticated cultivar (W × D) cross. A high density linkage map was developed based on the W × D population. By integrating sequence-defined DNA markers reported in previous mapping studies, we established an ultra-high density consensus genetic map, which contains 34,574 markers consisting of 3508 loci covering 2399 cM on 20 linkage groups. The largest gap in the entire consensus map was 4.73 cM. The high density W × D map and the consensus map were used to develop an improved physical map, which covered 560.5 Mb of genome sequence data. The ultra-high density consensus linkage map, the improved physical map and the markers linked to genes of breeding interest reported in this study provide a common tool for genome sequence assembly, structural genomics, comparative genomics, functional genomics, QTL mapping, and molecular plant breeding in lupin.
First full-length genome sequence of the polerovirus luffa aphid-borne yellows virus (LABYV) reveals the presence of at least two consensus sequences in an isolate from Thailand.

PubMed

Knierim, Dennis; Maiss, Edgar; Kenyon, Lawrence; Winter, Stephan; Menzel, Wulf

2015-10-01

Luffa aphid-borne yellows virus (LABYV) was proposed as the name for a previously undescribed polerovirus based on partial genome sequences obtained from samples of cucurbit plants collected in Thailand between 2008 and 2013. In this study, we determined the first full-length genome sequence of LABYV. Based on phylogenetic analysis and genome properties, it is clear that this virus represents a distinct species in the genus Polerovirus. Analysis of sequences from sample TH24, which was collected in 2010 from a luffa plant in Thailand, reveals the presence of two different full-length genome consensus sequences.
Consensus generation and variant detection by Celera Assembler.

PubMed

Denisov, Gennady; Walenz, Brian; Halpern, Aaron L; Miller, Jason; Axelrod, Nelson; Levy, Samuel; Sutton, Granger

2008-04-15

We present an algorithm to identify allelic variation given a Whole Genome Shotgun (WGS) assembly of haploid sequences, and to produce a set of haploid consensus sequences rather than a single consensus sequence. Existing WGS assemblers take a column-by-column approach to consensus generation, and produce a single consensus sequence which can be inconsistent with the underlying haploid alleles, and inconsistent with any of the aligned sequence reads. Our new algorithm uses a dynamic windowing approach. It detects alleles by simultaneously processing the portions of aligned reads spanning a region of sequence variation, assigns reads to their respective alleles, phases adjacent variant alleles and generates a consensus sequence corresponding to each confirmed allele. This algorithm was used to produce the first diploid genome sequence of an individual human. It can also be applied to assemblies of multiple diploid individuals and hybrid assemblies of multiple haploid organisms. Being applied to the individual human genome assembly, the new algorithm detects exactly two confirmed alleles and reports two consensus sequences in 98.98% of the total number 2,033311 detected regions of sequence variation. In 33,269 out of 460,373 detected regions of size >1 bp, it fixes the constructed errors of a mosaic haploid representation of a diploid locus as produced by the original Celera Assembler consensus algorithm. Using an optimized procedure calibrated against 1 506 344 known SNPs, it detects 438 814 new heterozygous SNPs with false positive rate 12%. The open source code is available at: http://wgs-assembler.cvs.sourceforge.net/wgs-assembler/
Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies.

PubMed

Zeng, Lu; Kortschak, R Daniel; Raison, Joy M; Bertozzi, Terry; Adelson, David L

2018-01-01

Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing a Comprehensive ab initio Repeat Pipeline (CARP) to identify and cluster TEs and other repetitive sequences in genome assemblies. The pipeline begins with a pairwise alignment using krishna, a custom aligner. Single linkage clustering is then carried out to produce families of repetitive elements. Consensus sequences are then filtered for protein coding genes and then annotated using Repbase and a custom library of retrovirus and reverse transcriptase sequences. This process yields three types of family: fully annotated, partially annotated and unannotated. Fully annotated families reflect recently diverged/young known TEs present in Repbase. The remaining two types of families contain a mixture of novel TEs and segmental duplications. These can be resolved by aligning these consensus sequences back to the genome to assess copy number vs. length distribution. Our pipeline has three significant advantages compared to other methods for ab initio repeat identification: 1) we generate not only consensus sequences, but keep the genomic intervals for the original aligned sequences, allowing straightforward analysis of evolutionary dynamics, 2) consensus sequences represent low-divergence, recently/currently active TE families, 3) segmental duplications are annotated as a useful by-product. We have compared our ab initio repeat annotations for 7 genome assemblies to other methods and demonstrate that CARP compares favourably with RepeatModeler, the most widely used repeat annotation package.
Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies

PubMed Central

Zeng, Lu; Kortschak, R. Daniel; Raison, Joy M.

2018-01-01

Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing a Comprehensive ab initio Repeat Pipeline (CARP) to identify and cluster TEs and other repetitive sequences in genome assemblies. The pipeline begins with a pairwise alignment using krishna, a custom aligner. Single linkage clustering is then carried out to produce families of repetitive elements. Consensus sequences are then filtered for protein coding genes and then annotated using Repbase and a custom library of retrovirus and reverse transcriptase sequences. This process yields three types of family: fully annotated, partially annotated and unannotated. Fully annotated families reflect recently diverged/young known TEs present in Repbase. The remaining two types of families contain a mixture of novel TEs and segmental duplications. These can be resolved by aligning these consensus sequences back to the genome to assess copy number vs. length distribution. Our pipeline has three significant advantages compared to other methods for ab initio repeat identification: 1) we generate not only consensus sequences, but keep the genomic intervals for the original aligned sequences, allowing straightforward analysis of evolutionary dynamics, 2) consensus sequences represent low-divergence, recently/currently active TE families, 3) segmental duplications are annotated as a useful by-product. We have compared our ab initio repeat annotations for 7 genome assemblies to other methods and demonstrate that CARP compares favourably with RepeatModeler, the most widely used repeat annotation package. PMID:29538441
Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.

PubMed

Chin, Chen-Shan; Alexander, David H; Marks, Patrick; Klammer, Aaron A; Drake, James; Heiner, Cheryl; Clum, Alicia; Copeland, Alex; Huddleston, John; Eichler, Evan E; Turner, Stephen W; Korlach, Jonas

2013-06-01

We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.
Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples

PubMed Central

Quick, Josh; Grubaugh, Nathan D; Pullan, Steven T; Claro, Ingra M; Smith, Andrew D; Gangavarapu, Karthik; Oliveira, Glenn; Robles-Sikisaka, Refugio; Rogers, Thomas F; Beutler, Nathan A; Burton, Dennis R; Lewis-Ximenez, Lia Laura; de Jesus, Jaqueline Goes; Giovanetti, Marta; Hill, Sarah; Black, Allison; Bedford, Trevor; Carroll, Miles W; Nunes, Marcio; Alcantara, Luiz Carlos; Sabino, Ester C; Baylis, Sally A; Faria, Nuno; Loose, Matthew; Simpson, Jared T; Pybus, Oliver G; Andersen, Kristian G; Loman, Nicholas J

2018-01-01

Genome sequencing has become a powerful tool for studying emerging infectious diseases; however, genome sequencing directly from clinical samples without isolation remains challenging for viruses such as Zika, where metagenomic sequencing methods may generate insufficient numbers of viral reads. Here we present a protocol for generating coding-sequence complete genomes comprising an online primer design tool, a novel multiplex PCR enrichment protocol, optimised library preparation methods for the portable MinION sequencer (Oxford Nanopore Technologies) and the Illumina range of instruments, and a bioinformatics pipeline for generating consensus sequences. The MinION protocol does not require an internet connection for analysis, making it suitable for field applications with limited connectivity. Our method relies on multiplex PCR for targeted enrichment of viral genomes from samples containing as few as 50 genome copies per reaction. Viral consensus sequences can be achieved starting with clinical samples in 1-2 days following a simple laboratory workflow. This method has been successfully used by several groups studying Zika virus evolution and is facilitating an understanding of the spread of the virus in the Americas. PMID:28538739
Isolation and characterization of target sequences of the chicken CdxA homeobox gene.

PubMed Central

Margalit, Y; Yarus, S; Shapira, E; Gruenbaum, Y; Fainsod, A

1993-01-01

The DNA binding specificity of the chicken homeodomain protein CDXA was studied. Using a CDXA-glutathione-S-transferase fusion protein, DNA fragments containing the binding site for this protein were isolated. The sources of DNA were oligonucleotides with random sequence and chicken genomic DNA. The DNA fragments isolated were sequenced and tested in DNA binding assays. Sequencing revealed that most DNA fragments are AT rich which is a common feature of homeodomain binding sites. By electrophoretic mobility shift assays it was shown that the different target sequences isolated bind to the CDXA protein with different affinities. The specific sequences bound by the CDXA protein in the genomic fragments isolated, were determined by DNase I footprinting. From the footprinted sequences, the CDXA consensus binding site was determined. The CDXA protein binds the consensus sequence A, A/T, T, A/T, A, T, A/G. The CAUDAL binding site in the ftz promoter is also included in this consensus sequence. When tested, some of the genomic target sequences were capable of enhancing the transcriptional activity of reporter plasmids when introduced into CDXA expressing cells. This study determined the DNA sequence specificity of the CDXA protein and it also shows that this protein can further activate transcription in cells in culture. Images PMID:7909943
Nanopore DNA Sequencing and Genome Assembly on the International Space Station.

PubMed

Castro-Wallace, Sarah L; Chiu, Charles Y; John, Kristen K; Stahl, Sarah E; Rubins, Kathleen H; McIntyre, Alexa B R; Dworkin, Jason P; Lupisella, Mark L; Smith, David J; Botkin, Douglas J; Stephenson, Timothy A; Juul, Sissel; Turner, Daniel J; Izquierdo, Fernando; Federman, Scot; Stryke, Doug; Somasekar, Sneha; Alexander, Noah; Yu, Guixia; Mason, Christopher E; Burton, Aaron S

2017-12-21

We evaluated the performance of the MinION DNA sequencer in-flight on the International Space Station (ISS), and benchmarked its performance off-Earth against the MinION, Illumina MiSeq, and PacBio RS II sequencing platforms in terrestrial laboratories. Samples contained equimolar mixtures of genomic DNA from lambda bacteriophage, Escherichia coli (strain K12, MG1655) and Mus musculus (female BALB/c mouse). Nine sequencing runs were performed aboard the ISS over a 6-month period, yielding a total of 276,882 reads with no apparent decrease in performance over time. From sequence data collected aboard the ISS, we constructed directed assemblies of the ~4.6 Mb E. coli genome, ~48.5 kb lambda genome, and a representative M. musculus sequence (the ~16.3 kb mitochondrial genome), at 100%, 100%, and 96.7% consensus pairwise identity, respectively; de novo assembly of the E. coli genome from raw reads yielded a single contig comprising 99.9% of the genome at 98.6% consensus pairwise identity. Simulated real-time analyses of in-flight sequence data using an automated bioinformatic pipeline and laptop-based genomic assembly demonstrated the feasibility of sequencing analysis and microbial identification aboard the ISS. These findings illustrate the potential for sequencing applications including disease diagnosis, environmental monitoring, and elucidating the molecular basis for how organisms respond to spaceflight.
Mapping and Sequencing the Human Genome

DOE R&D Accomplishments Database

1988-01-01

Numerous meetings have been held and a debate has developed in the biological community over the merits of mapping and sequencing the human genome. In response a committee to examine the desirability and feasibility of mapping and sequencing the human genome was formed to suggest options for implementing the project. The committee asked many questions. Should the analysis of the human genome be left entirely to the traditionally uncoordinated, but highly successful, support systems that fund the vast majority of biomedical research. Or should a more focused and coordinated additional support system be developed that is limited to encouraging and facilitating the mapping and eventual sequencing of the human genome. If so, how can this be done without distorting the broader goals of biological research that are crucial for any understanding of the data generated in such a human genome project. As the committee became better informed on the many relevant issues, the opinions of its members coalesced, producing a shared consensus of what should be done. This report reflects that consensus.
Sequences of Zika Virus Genomes from a Pediatric Cohort in Nicaragua.

PubMed

Oldfield, Lauren M; Fedorova, Nadia; Puri, Vinita; Shrivastava, Susmita; Amedeo, Paolo; Durbin, Alan; Rocchi, Iara; Williams, Torrey; Shabman, Reed S; Tan, Gene S; Balmaseda, Angel; Kuan, Guillermina; Saborio, Saira; Gordon, Aubree; Harris, Eva; Pickett, Brett E

2018-06-14

We report here the whole-genome sequence of 11 Zika virus (ZIKV) samples from six pediatric patients in Nicaragua. Serum samples were collected, and ZIKV was isolated in tissue culture. Both serum and virus isolates were sequenced. The consensus ZIKV genomes are greater than 99% identical to each other. Copyright © 2018 Oldfield et al.
High-Throughput SNP Discovery through Deep Resequencing of a Reduced Representation Library to Anchor and Orient Scaffolds in the Soybean Whole Genome Sequence

USDA-ARS?s Scientific Manuscript database

The soybean Consensus Map 4.0 facilitated the anchoring of 95.6% of the soybean whole genome sequence developed by the Joint Genome Institute, Department of Energy but only properly oriented 66% of the sequence scaffolds. To find additional single nucleotide polymorphism (SNP) markers for additiona...
A 1,681-locus consensus genetic map of cultivated cucumber including 67 NB-LRR resistance gene homolog and ten gene loci

PubMed Central

2013-01-01

Background Cucumber is an important vegetable crop that is susceptible to many pathogens, but no disease resistance (R) genes have been cloned. The availability of whole genome sequences provides an excellent opportunity for systematic identification and characterization of the nucleotide binding and leucine-rich repeat (NB-LRR) type R gene homolog (RGH) sequences in the genome. Cucumber has a very narrow genetic base making it difficult to construct high-density genetic maps. Development of a consensus map by synthesizing information from multiple segregating populations is a method of choice to increase marker density. As such, the objectives of the present study were to identify and characterize NB-LRR type RGHs, and to develop a high-density, integrated cucumber genetic-physical map anchored with RGH loci. Results From the Gy14 draft genome, 70 NB-containing RGHs were identified and characterized. Most RGHs were in clusters with uneven distribution across seven chromosomes. In silico analysis indicated that all 70 RGHs had EST support for gene expression. Phylogenetic analysis classified 58 RGHs into two clades: CNL and TNL. Comparative analysis revealed high-degree sequence homology and synteny in chromosomal locations of these RGH members between the cucumber and melon genomes. Fifty-four molecular markers were developed to delimit 67 of the 70 RGHs, which were integrated into a genetic map through linkage analysis. A 1,681-locus cucumber consensus map including 10 gene loci and spanning 730.0 cM in seven linkage groups was developed by integrating three component maps with a bin-mapping strategy. Physically, 308 scaffolds with 193.2 Mbp total DNA sequences were anchored onto this consensus map that covered 52.6% of the 367 Mbp cucumber genome. Conclusions Cucumber contains relatively few NB-LRR RGHs that are clustered and unevenly distributed in the genome. All RGHs seem to be transcribed and shared significant sequence homology and synteny with the melon genome suggesting conservation of these RGHs in the Cucumis lineage. The 1,681-locus consensus genetic-physical map developed and the RGHs identified and characterized herein are valuable genomics resources that may have many applications such as quantitative trait loci identification, map-based gene cloning, association mapping, marker-assisted selection, as well as assembly of a more complete cucumber genome. PMID:23531125
A first report and complete genome sequence of alfalfa enamovirus from Sudan

USDA-ARS?s Scientific Manuscript database

A full genome sequence of a viral pathogen, provisionally named alfalfa enamovirus 2 (AEV-2), was reconstructed from short reads obtained by Illumina RNA sequencing of alfalfa sample originating from Sudan. Ambiguous nucleotides in the resultant consensus assembly and identity of the predicted virus...
Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples.

PubMed

Quick, Joshua; Grubaugh, Nathan D; Pullan, Steven T; Claro, Ingra M; Smith, Andrew D; Gangavarapu, Karthik; Oliveira, Glenn; Robles-Sikisaka, Refugio; Rogers, Thomas F; Beutler, Nathan A; Burton, Dennis R; Lewis-Ximenez, Lia Laura; de Jesus, Jaqueline Goes; Giovanetti, Marta; Hill, Sarah C; Black, Allison; Bedford, Trevor; Carroll, Miles W; Nunes, Marcio; Alcantara, Luiz Carlos; Sabino, Ester C; Baylis, Sally A; Faria, Nuno R; Loose, Matthew; Simpson, Jared T; Pybus, Oliver G; Andersen, Kristian G; Loman, Nicholas J

2017-06-01

Genome sequencing has become a powerful tool for studying emerging infectious diseases; however, genome sequencing directly from clinical samples (i.e., without isolation and culture) remains challenging for viruses such as Zika, for which metagenomic sequencing methods may generate insufficient numbers of viral reads. Here we present a protocol for generating coding-sequence-complete genomes, comprising an online primer design tool, a novel multiplex PCR enrichment protocol, optimized library preparation methods for the portable MinION sequencer (Oxford Nanopore Technologies) and the Illumina range of instruments, and a bioinformatics pipeline for generating consensus sequences. The MinION protocol does not require an Internet connection for analysis, making it suitable for field applications with limited connectivity. Our method relies on multiplex PCR for targeted enrichment of viral genomes from samples containing as few as 50 genome copies per reaction. Viral consensus sequences can be achieved in 1-2 d by starting with clinical samples and following a simple laboratory workflow. This method has been successfully used by several groups studying Zika virus evolution and is facilitating an understanding of the spread of the virus in the Americas. The protocol can be used to sequence other viral genomes using the online Primal Scheme primer designer software. It is suitable for sequencing either RNA or DNA viruses in the field during outbreaks or as an inexpensive, convenient method for use in the lab.
A universal protocol to generate consensus level genome sequences for foot-and-mouth disease virus and other positive-sense polyadenylated RNA viruses using the Illumina MiSeq.

PubMed

Logan, Grace; Freimanis, Graham L; King, David J; Valdazo-González, Begoña; Bachanek-Bankowska, Katarzyna; Sanderson, Nicholas D; Knowles, Nick J; King, Donald P; Cottam, Eleanor M

2014-09-30

Next-Generation Sequencing (NGS) is revolutionizing molecular epidemiology by providing new approaches to undertake whole genome sequencing (WGS) in diagnostic settings for a variety of human and veterinary pathogens. Previous sequencing protocols have been subject to biases such as those encountered during PCR amplification and cell culture, or are restricted by the need for large quantities of starting material. We describe here a simple and robust methodology for the generation of whole genome sequences on the Illumina MiSeq. This protocol is specific for foot-and-mouth disease virus (FMDV) or other polyadenylated RNA viruses and circumvents both the use of PCR and the requirement for large amounts of initial template. The protocol was successfully validated using five FMDV positive clinical samples from the 2001 epidemic in the United Kingdom, as well as a panel of representative viruses from all seven serotypes. In addition, this protocol was successfully used to recover 94% of an FMDV genome that had previously been identified as cell culture negative. Genome sequences from three other non-FMDV polyadenylated RNA viruses (EMCV, ERAV, VESV) were also obtained with minor protocol amendments. We calculated that a minimum coverage depth of 22 reads was required to produce an accurate consensus sequence for FMDV O. This was achieved in 5 FMDV/O/UKG isolates and the type O FMDV from the serotype panel with the exception of the 5' genomic termini and area immediately flanking the poly(C) region. We have developed a universal WGS method for FMDV and other polyadenylated RNA viruses. This method works successfully from a limited quantity of starting material and eliminates the requirement for genome-specific PCR amplification. This protocol has the potential to generate consensus-level sequences within a routine high-throughput diagnostic environment.
Nucleotide sequence of a cluster of early and late genes in a conserved segment of the vaccinia virus genome.

PubMed Central

Plucienniczak, A; Schroeder, E; Zettlmeissl, G; Streeck, R E

1985-01-01

The nucleotide sequence of a 7.6 kb vaccinia DNA segment from a genomic region conserved among different orthopox virus has been determined. This segment contains a tight cluster of 12 partly overlapping open reading frames most of which can be correlated with previously identified early and late proteins and mRNAs. Regulatory signals used by vaccinia virus have been studied. Presumptive promoter regions are rich in A, T and carry the consensus sequences TATA and AATAA spaced at 20-24 base pairs. Tandem repeats of a CTATTC consensus sequence are proposed to be involved in the termination of early transcription. PMID:2987815
Draft Genome Sequences of Two Novel Salmonella enterica subsp. enterica Strains Isolated from Low-Moisture Foods with Applications in Food Safety Research.

PubMed

Radford, Devon R; Leon-Velarde, Carlos G; Chen, Shu; Hamidi Oskouei, Amir M; Balamurugan, Sampathkumar

2018-03-29

The genomes of two strains of Salmonella enterica subsp. enterica serovar Cubana and serovar Muenchen, isolated from dry hazelnuts and chia seeds, respectively, were sequenced using the Illumina MiSeq platform, assembled de novo using the overlap-layout-consensus method, and aligned to their respective most identical sequence genome scaffolds using MUMMER and BLAST searches. Copyright © 2018 Radford et al.
An ethnically relevant consensus Korean reference genome is a step towards personal reference genomes

PubMed Central

Cho, Yun Sung; Kim, Hyunho; Kim, Hak-Min; Jho, Sungwoong; Jun, JeHoon; Lee, Yong Joo; Chae, Kyun Shik; Kim, Chang Geun; Kim, Sangsoo; Eriksson, Anders; Edwards, Jeremy S.; Lee, Semin; Kim, Byung Chul; Manica, Andrea; Oh, Tae-Kwang; Church, George M.; Bhak, Jong

2016-01-01

Human genomes are routinely compared against a universal reference. However, this strategy could miss population-specific and personal genomic variations, which may be detected more efficiently using an ethnically relevant or personal reference. Here we report a hybrid assembly of a Korean reference genome (KOREF) for constructing personal and ethnic references by combining sequencing and mapping methods. We also build its consensus variome reference, providing information on millions of variants from 40 additional ethnically homogeneous genomes from the Korean Personal Genome Project. We find that the ethnically relevant consensus reference can be beneficial for efficient variant detection. Systematic comparison of human assemblies shows the importance of assembly quality, suggesting the necessity of new technologies to comprehensively map ethnic and personal genomic structure variations. In the era of large-scale population genome projects, the leveraging of ethnicity-specific genome assemblies as well as the human reference genome will accelerate mapping all human genome diversity. PMID:27882922
Bioinformatics prediction of siRNAs as potential antiviral agents against dengue viruses

PubMed Central

Villegas-Rosales, Paula M; Méndez-Tenorio, Alfonso; Ortega-Soto, Elizabeth; Barrón, Blanca L

2012-01-01

Dengue virus (DENV 1-4) represents the major emerging arthropod-borne viral infection in the world. Currently, there is neither an available vaccine nor a specific treatment. Hence, there is a need of antiviral drugs for these viral infections; we describe the prediction of short interfering RNA (siRNA) as potential therapeutic agents against the four DENV serotypes. Our strategy was to carry out a series of multiple alignments using ClustalX program to find conserved sequences among the four DENV serotype genomes to obtain a consensus sequence for siRNAs design. A highly conserved sequence among the four DENV serotypes, located in the encoding sequence for NS4B and NS5 proteins was found. A total of 2,893 complete DENV genomes were downloaded from the NCBI, and after a depuration procedure to identify identical sequences, 220 complete DENV genomes were left. They were edited to select the NS4B and NS5 sequences, which were aligned to obtain a consensus sequence. Three different servers were used for siRNA design, and the resulting siRNAs were aligned to identify the most prevalent sequences. Three siRNAs were chosen, one targeted the genome region that codifies for NS4B protein and the other two; the region for NS5 protein. Predicted secondary structure for DENV genomes was used to demonstrate that the siRNAs were able to target the viral genome forming double stranded structures, necessary to activate the RNA silencing machinery. PMID:22829722

Development of the first consensus genetic map of intermediate wheatgrass (Thinopyrum intermedium) using genotyping-by-sequencing.

PubMed

Kantarski, Traci; Larson, Steve; Zhang, Xiaofei; DeHaan, Lee; Borevitz, Justin; Anderson, James; Poland, Jesse

2017-01-01

Development of the first consensus genetic map of intermediate wheatgrass gives insight into the genome and tools for molecular breeding. Intermediate wheatgrass (Thinopyrum intermedium) has been identified as a candidate for domestication and improvement as a perennial grain, forage, and biofuel crop and is actively being improved by several breeding programs. To accelerate this process using genomics-assisted breeding, efficient genotyping methods and genetic marker reference maps are needed. We present here the first consensus genetic map for intermediate wheatgrass (IWG), which confirms the species' allohexaploid nature (2n = 6x = 42) and homology to Triticeae genomes. Genotyping-by-sequencing was used to identify markers that fit expected segregation ratios and construct genetic maps for 13 heterogeneous parents of seven full-sib families. These maps were then integrated using a linear programming method to produce a consensus map with 21 linkage groups containing 10,029 markers, 3601 of which were present in at least two populations. Each of the 21 linkage groups contained between 237 and 683 markers, cumulatively covering 5061 cM (2891 cM--Kosambi) with an average distance of 0.5 cM between each pair of markers. Through mapping the sequence tags to the diploid (2n = 2x = 14) barley reference genome, we observed high colinearity and synteny between these genomes, with three homoeologous IWG chromosomes corresponding to each of the seven barley chromosomes, and mapped translocations that are known in the Triticeae. The consensus map is a valuable tool for wheat breeders to map important disease-resistance genes within intermediate wheatgrass. These genomic tools can help lead to rapid improvement of IWG and development of high-yielding cultivars of this perennial grain that would facilitate the sustainable intensification of agricultural systems.
The diploid genome sequence of an Asian individual

PubMed Central

Wang, Jun; Wang, Wei; Li, Ruiqiang; Li, Yingrui; Tian, Geng; Goodman, Laurie; Fan, Wei; Zhang, Junqing; Li, Jun; Zhang, Juanbin; Guo, Yiran; Feng, Binxiao; Li, Heng; Lu, Yao; Fang, Xiaodong; Liang, Huiqing; Du, Zhenglin; Li, Dong; Zhao, Yiqing; Hu, Yujie; Yang, Zhenzhen; Zheng, Hancheng; Hellmann, Ines; Inouye, Michael; Pool, John; Yi, Xin; Zhao, Jing; Duan, Jinjie; Zhou, Yan; Qin, Junjie; Ma, Lijia; Li, Guoqing; Yang, Zhentao; Zhang, Guojie; Yang, Bin; Yu, Chang; Liang, Fang; Li, Wenjie; Li, Shaochuan; Li, Dawei; Ni, Peixiang; Ruan, Jue; Li, Qibin; Zhu, Hongmei; Liu, Dongyuan; Lu, Zhike; Li, Ning; Guo, Guangwu; Zhang, Jianguo; Ye, Jia; Fang, Lin; Hao, Qin; Chen, Quan; Liang, Yu; Su, Yeyang; san, A.; Ping, Cuo; Yang, Shuang; Chen, Fang; Li, Li; Zhou, Ke; Zheng, Hongkun; Ren, Yuanyuan; Yang, Ling; Gao, Yang; Yang, Guohua; Li, Zhuo; Feng, Xiaoli; Kristiansen, Karsten; Wong, Gane Ka-Shu; Nielsen, Rasmus; Durbin, Richard; Bolund, Lars; Zhang, Xiuqing; Li, Songgang; Yang, Huanming; Wang, Jian

2009-01-01

Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics. PMID:18987735
The role of recombination in the origin and evolution of Alu subfamilies.

PubMed

Teixeira-Silva, Ana; Silva, Raquel M; Carneiro, João; Amorim, António; Azevedo, Luísa

2013-01-01

Alus are the most abundant and successful short interspersed nuclear elements found in primate genomes. In humans, they represent about 10% of the genome, although few are retrotransposition-competent and are clustered into subfamilies according to the source gene from which they evolved. Recombination between them can lead to genomic rearrangements of clinical and evolutionary significance. In this study, we have addressed the role of recombination in the origin of chimeric Alu source genes by the analysis of all known consensus sequences of human Alus. From the allelic diversity of Alu consensus sequences, validated in extant elements resulting from whole genome searches, distinct events of recombination were detected in the origin of particular subfamilies of AluS and AluY source genes. These results demonstrate that at least two subfamilies are likely to have emerged from ectopic Alu-Alu recombination, which stimulates further research regarding the potential of chimeric active Alus to punctuate the genome.
Fast and accurate de novo genome assembly from long uncorrected reads

PubMed Central

Vaser, Robert; Sović, Ivan; Nagarajan, Niranjan

2017-01-01

The assembly of long reads from Pacific Biosciences and Oxford Nanopore Technologies typically requires resource-intensive error-correction and consensus-generation steps to obtain high-quality assemblies. We show that the error-correction step can be omitted and that high-quality consensus sequences can be generated efficiently with a SIMD-accelerated, partial-order alignment–based, stand-alone consensus module called Racon. Based on tests with PacBio and Oxford Nanopore data sets, we show that Racon coupled with miniasm enables consensus genomes with similar or better quality than state-of-the-art methods while being an order of magnitude faster. PMID:28100585
BAUM: improving genome assembly by adaptive unique mapping and local overlap-layout-consensus approach.

PubMed

Wang, Anqi; Wang, Zhanyu; Li, Zheng; Li, Lei M

2018-06-15

It is highly desirable to assemble genomes of high continuity and consistency at low cost. The current bottleneck of draft genome continuity using the second generation sequencing (SGS) reads is primarily caused by uncertainty among repetitive sequences. Even though the single-molecule real-time sequencing technology is very promising to overcome the uncertainty issue, its relatively high cost and error rate add burden on budget or computation. Many long-read assemblers take the overlap-layout-consensus (OLC) paradigm, which is less sensitive to sequencing errors, heterozygosity and variability of coverage. However, current assemblers of SGS data do not sufficiently take advantage of the OLC approach. Aiming at minimizing uncertainty, the proposed method BAUM, breaks the whole genome into regions by adaptive unique mapping; then the local OLC is used to assemble each region in parallel. BAUM can (i) perform reference-assisted assembly based on the genome of a close species (ii) or improve the results of existing assemblies that are obtained based on short or long sequencing reads. The tests on two eukaryote genomes, a wild rice Oryza longistaminata and a parrot Melopsittacus undulatus, show that BAUM achieved substantial improvement on genome size and continuity. Besides, BAUM reconstructed a considerable amount of repetitive regions that failed to be assembled by existing short read assemblers. We also propose statistical approaches to control the uncertainty in different steps of BAUM. http://www.zhanyuwang.xin/wordpress/index.php/2017/07/21/baum. Supplementary data are available at Bioinformatics online.
Single molecule sequencing of the M13 virus genome without amplification

PubMed Central

Zhao, Luyang; Deng, Liwei; Li, Gailing; Jin, Huan; Cai, Jinsen; Shang, Huan; Li, Yan; Wu, Haomin; Xu, Weibin; Zeng, Lidong; Zhang, Renli; Zhao, Huan; Wu, Ping; Zhou, Zhiliang; Zheng, Jiao; Ezanno, Pierre; Yang, Andrew X.; Yan, Qin; Deem, Michael W.; He, Jiankui

2017-01-01

Next generation sequencing (NGS) has revolutionized life sciences research. However, GC bias and costly, time-intensive library preparation make NGS an ill fit for increasing sequencing demands in the clinic. A new class of third-generation sequencing platforms has arrived to meet this need, capable of directly measuring DNA and RNA sequences at the single-molecule level without amplification. Here, we use the new GenoCare single-molecule sequencing platform from Direct Genomics to sequence the genome of the M13 virus. Our platform detects single-molecule fluorescence by total internal reflection microscopy, with sequencing-by-synthesis chemistry. We sequenced the genome of M13 to a depth of 316x, with 100% coverage. We determined a consensus sequence accuracy of 100%. In contrast to GC bias inherent to NGS results, we demonstrated that our single-molecule sequencing method yields minimal GC bias. PMID:29253901
Single molecule sequencing of the M13 virus genome without amplification.

PubMed

Zhao, Luyang; Deng, Liwei; Li, Gailing; Jin, Huan; Cai, Jinsen; Shang, Huan; Li, Yan; Wu, Haomin; Xu, Weibin; Zeng, Lidong; Zhang, Renli; Zhao, Huan; Wu, Ping; Zhou, Zhiliang; Zheng, Jiao; Ezanno, Pierre; Yang, Andrew X; Yan, Qin; Deem, Michael W; He, Jiankui

2017-01-01

Next generation sequencing (NGS) has revolutionized life sciences research. However, GC bias and costly, time-intensive library preparation make NGS an ill fit for increasing sequencing demands in the clinic. A new class of third-generation sequencing platforms has arrived to meet this need, capable of directly measuring DNA and RNA sequences at the single-molecule level without amplification. Here, we use the new GenoCare single-molecule sequencing platform from Direct Genomics to sequence the genome of the M13 virus. Our platform detects single-molecule fluorescence by total internal reflection microscopy, with sequencing-by-synthesis chemistry. We sequenced the genome of M13 to a depth of 316x, with 100% coverage. We determined a consensus sequence accuracy of 100%. In contrast to GC bias inherent to NGS results, we demonstrated that our single-molecule sequencing method yields minimal GC bias.
Virus genome dynamics under different propagation pressures: reconstruction of whole genome haplotypes of West Nile viruses from NGS data.

PubMed

Kortenhoeven, Cornell; Joubert, Fourie; Bastos, Armanda D S; Abolnik, Celia

2015-02-22

Extensive focus is placed on the comparative analyses of consensus genotypes in the study of West Nile virus (WNV) emergence. Few studies account for genetic change in the underlying WNV quasispecies population variants. These variants are not discernable in the consensus genome at the time of emergence, and the maintenance of mutation-selection equilibria of population variants is greatly underestimated. The emergence of lineage 1 WNV strains has been studied extensively, but recent epidemics caused by lineage 2 WNV strains in Hungary, Austria, Greece and Italy emphasizes the increasing importance of this lineage to public health. In this study we explored the quasispecies dynamics of minority variants that contribute to cell-tropism and host determination, i.e. the ability to infect different cell types or cells from different species from Next Generation Sequencing (NGS) data of a historic lineage 2 WNV strain. Minority variants contributing to host cell membrane association persist in the viral population without contributing to the genetic change in the consensus genome. Minority variants are shown to maintain a stable mutation-selection equilibrium under positive selection, particularly in the capsid gene region. This study is the first to infer positive selection and the persistence of WNV haplotype variants that contribute to viral fitness without accompanying genetic change in the consensus genotype, documented solely from NGS sequence data. The approach used in this study streamlines the experimental design seeking viral minority variants accurately from NGS data whilst minimizing the influence of associated sequence error.
Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.

PubMed

Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi

2017-07-01

PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II's sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is not only an effective tool for use in the basic biological sciences but also in the medical/clinical setting.
The Role of Recombination in the Origin and Evolution of Alu Subfamilies

PubMed Central

Teixeira-Silva, Ana; Silva, Raquel M.; Carneiro, João; Amorim, António; Azevedo, Luísa

2013-01-01

Alus are the most abundant and successful short interspersed nuclear elements found in primate genomes. In humans, they represent about 10% of the genome, although few are retrotransposition-competent and are clustered into subfamilies according to the source gene from which they evolved. Recombination between them can lead to genomic rearrangements of clinical and evolutionary significance. In this study, we have addressed the role of recombination in the origin of chimeric Alu source genes by the analysis of all known consensus sequences of human Alus. From the allelic diversity of Alu consensus sequences, validated in extant elements resulting from whole genome searches, distinct events of recombination were detected in the origin of particular subfamilies of AluS and AluY source genes. These results demonstrate that at least two subfamilies are likely to have emerged from ectopic Alu-Alu recombination, which stimulates further research regarding the potential of chimeric active Alus to punctuate the genome. PMID:23750218
A High Quality Draft Consensus Sequence of the Genome of a Heterozygous Grapevine Variety

PubMed Central

Cartwright, Dustin A.; Cestaro, Alessandro; Pruss, Dmitry; Pindo, Massimo; FitzGerald, Lisa M.; Vezzulli, Silvia; Reid, Julia; Malacarne, Giulia; Iliev, Diana; Coppola, Giuseppina; Wardell, Bryan; Micheletti, Diego; Macalma, Teresita; Facci, Marco; Mitchell, Jeff T.; Perazzolli, Michele; Eldredge, Glenn; Gatto, Pamela; Oyzerski, Rozan; Moretto, Marco; Gutin, Natalia; Stefanini, Marco; Chen, Yang; Segala, Cinzia; Davenport, Christine; Demattè, Lorenzo; Mraz, Amy; Battilana, Juri; Stormo, Keith; Costa, Fabrizio; Tao, Quanzhou; Si-Ammour, Azeddine; Harkins, Tim; Lackey, Angie; Perbost, Clotilde; Taillon, Bruce; Stella, Alessandra; Solovyev, Victor; Fawcett, Jeffrey A.; Sterck, Lieven; Vandepoele, Klaas; Grando, Stella M.; Toppo, Stefano; Moser, Claudio; Lanchbury, Jerry; Bogden, Robert; Skolnick, Mark; Sgaramella, Vittorio; Bhatnagar, Satish K.; Fontana, Paolo; Gutin, Alexander; Van de Peer, Yves; Salamini, Francesco; Viola, Roberto

2007-01-01

Background Worldwide, grapes and their derived products have a large market. The cultivated grape species Vitis vinifera has potential to become a model for fruit trees genetics. Like many plant species, it is highly heterozygous, which is an additional challenge to modern whole genome shotgun sequencing. In this paper a high quality draft genome sequence of a cultivated clone of V. vinifera Pinot Noir is presented. Principal Findings We estimate the genome size of V. vinifera to be 504.6 Mb. Genomic sequences corresponding to 477.1 Mb were assembled in 2,093 metacontigs and 435.1 Mb were anchored to the 19 linkage groups (LGs). The number of predicted genes is 29,585, of which 96.1% were assigned to LGs. This assembly of the grape genome provides candidate genes implicated in traits relevant to grapevine cultivation, such as those influencing wine quality, via secondary metabolites, and those connected with the extreme susceptibility of grape to pathogens. Single nucleotide polymorphism (SNP) distribution was consistent with a diffuse haplotype structure across the genome. Of around 2,000,000 SNPs, 1,751,176 were mapped to chromosomes and one or more of them were identified in 86.7% of anchored genes. The relative age of grape duplicated genes was estimated and this made possible to reveal a relatively recent Vitis-specific large scale duplication event concerning at least 10 chromosomes (duplication not reported before). Conclusions Sanger shotgun sequencing and highly efficient sequencing by synthesis (SBS), together with dedicated assembly programs, resolved a complex heterozygous genome. A consensus sequence of the genome and a set of mapped marker loci were generated. Homologous chromosomes of Pinot Noir differ by 11.2% of their DNA (hemizygous DNA plus chromosomal gaps). SNP markers are offered as a tool with the potential of introducing a new era in the molecular breeding of grape. PMID:18094749
Core-SINE blocks comprise a large fraction of monotreme genomes; implications for vertebrate chromosome evolution.

PubMed

Kirby, Patrick J; Greaves, Ian K; Koina, Edda; Waters, Paul D; Marshall Graves, Jennifer A

2007-01-01

The genomes of the egg-laying platypus and echidna are of particular interest because monotremes are the most basal mammal group. The chromosomal distribution of an ancient family of short interspersed repeats (SINEs), the core-SINEs, was investigated to better understand monotreme genome organization and evolution. Previous studies have identified the core-SINE as the predominant SINE in the platypus genome, and in this study we quantified, characterized and localized subfamilies. Dot blot analysis suggested that a very large fraction (32% of the platypus and 16% of the echidna genome) is composed of Mon core-SINEs. Core-SINE-specific primers were used to amplify PCR products from platypus and echidna genomic DNA. Sequence analysis suggests a common consensus sequence Mon 1-B, shared by platypus and echidna, as well as platypus-specific Mon 1-C and echidna specific Mon 1-D consensus sequences. FISH mapping of the Mon core-SINE products to platypus metaphase spreads demonstrates that the Mon-1C subfamily is responsible for the striking Mon core-SINE accumulation in the distal regions of the six large autosomal pairs and the largest X chromosome. This unusual distribution highlights the dichotomy between the seven large chromosome pairs and the 19 smaller pairs in the monotreme karyotype, which has some similarity to the macro- and micro-chromosomes of birds and reptiles, and suggests that accumulation of repetitive sequences may have enlarged small chromosomes in an ancestral vertebrate. In the forthcoming sequence of the platypus genome there are still large gaps, and the extensive Mon core-SINE accumulation on the distal regions of the six large autosomal pairs may provide one explanation for this missing sequence.
From cheek swabs to consensus sequences: an A to Z protocol for high-throughput DNA sequencing of complete human mitochondrial genomes

PubMed Central

2014-01-01

Background Next-generation DNA sequencing (NGS) technologies have made huge impacts in many fields of biological research, but especially in evolutionary biology. One area where NGS has shown potential is for high-throughput sequencing of complete mtDNA genomes (of humans and other animals). Despite the increasing use of NGS technologies and a better appreciation of their importance in answering biological questions, there remain significant obstacles to the successful implementation of NGS-based projects, especially for new users. Results Here we present an ‘A to Z’ protocol for obtaining complete human mitochondrial (mtDNA) genomes – from DNA extraction to consensus sequence. Although designed for use on humans, this protocol could also be used to sequence small, organellar genomes from other species, and also nuclear loci. This protocol includes DNA extraction, PCR amplification, fragmentation of PCR products, barcoding of fragments, sequencing using the 454 GS FLX platform, and a complete bioinformatics pipeline (primer removal, reference-based mapping, output of coverage plots and SNP calling). Conclusions All steps in this protocol are designed to be straightforward to implement, especially for researchers who are undertaking next-generation sequencing for the first time. The molecular steps are scalable to large numbers (hundreds) of individuals and all steps post-DNA extraction can be carried out in 96-well plate format. Also, the protocol has been assembled so that individual ‘modules’ can be swapped out to suit available resources. PMID:24460871
Reliable Detection of Herpes Simplex Virus Sequence Variation by High-Throughput Resequencing.

PubMed

Morse, Alison M; Calabro, Kaitlyn R; Fear, Justin M; Bloom, David C; McIntyre, Lauren M

2017-08-16

High-throughput sequencing (HTS) has resulted in data for a number of herpes simplex virus (HSV) laboratory strains and clinical isolates. The knowledge of these sequences has been critical for investigating viral pathogenicity. However, the assembly of complete herpesviral genomes, including HSV, is complicated due to the existence of large repeat regions and arrays of smaller reiterated sequences that are commonly found in these genomes. In addition, the inherent genetic variation in populations of isolates for viruses and other microorganisms presents an additional challenge to many existing HTS sequence assembly pipelines. Here, we evaluate two approaches for the identification of genetic variants in HSV1 strains using Illumina short read sequencing data. The first, a reference-based approach, identifies variants from reads aligned to a reference sequence and the second, a de novo assembly approach, identifies variants from reads aligned to de novo assembled consensus sequences. Of critical importance for both approaches is the reduction in the number of low complexity regions through the construction of a non-redundant reference genome. We compared variants identified in the two methods. Our results indicate that approximately 85% of variants are identified regardless of the approach. The reference-based approach to variant discovery captures an additional 15% representing variants divergent from the HSV1 reference possibly due to viral passage. Reference-based approaches are significantly less labor-intensive and identify variants across the genome where de novo assembly-based approaches are limited to regions where contigs have been successfully assembled. In addition, regions of poor quality assembly can lead to false variant identification in de novo consensus sequences. For viruses with a well-assembled reference genome, a reference-based approach is recommended.
Cloning, sequencing and characterization of lipase genes from a polyhydroxyalkanoate- (PHA-) synthesizing Pseudomonas resinovorans

USDA-ARS?s Scientific Manuscript database

Lipase (lip) and lipase-specific foldase (lif) genes of a biodegradable polyhydroxyalkanoate- (PHA-) synthesizing Pseudomonas resinovorans NRRL B-2649 were cloned using primers based on consensus sequences, followed by PCR-based genome walking. Sequence analyses showed a putative Lip gene-product (...
A consensus linkage map of lentil based on DArT markers from three RIL mapping populations.

PubMed

Ates, Duygu; Aldemir, Secil; Alsaleh, Ahmad; Erdogmus, Semih; Nemli, Seda; Kahriman, Abdullah; Ozkan, Hakan; Vandenberg, Albert; Tanyolac, Bahattin

2018-01-01

Lentil (Lens culinaris ssp. culinaris Medikus) is a diploid (2n = 2x = 14), self-pollinating grain legume with a haploid genome size of about 4 Gbp and is grown throughout the world with current annual production of 4.9 million tonnes. A consensus map of lentil (Lens culinaris ssp. culinaris Medikus) was constructed using three different lentils recombinant inbred line (RIL) populations, including "CDC Redberry" x "ILL7502" (LR8), "ILL8006" x "CDC Milestone" (LR11) and "PI320937" x "Eston" (LR39). The lentil consensus map was composed of 9,793 DArT markers, covered a total of 977.47 cM with an average distance of 0.10 cM between adjacent markers and constructed 7 linkage groups representing 7 chromosomes of the lentil genome. The consensus map had no gap larger than 12.67 cM and only 5 gaps were found to be between 12.67 cM and 6.0 cM (on LG3 and LG4). The localization of the SNP markers on the lentil consensus map were in general consistent with their localization on the three individual genetic linkage maps and the lentil consensus map has longer map length, higher marker density and shorter average distance between the adjacent markers compared to the component linkage maps. This high-density consensus map could provide insight into the lentil genome. The consensus map could also help to construct a physical map using a Bacterial Artificial Chromosome library and map based cloning studies. Sequence information of DArT may help localization of orientation scaffolds from Next Generation Sequencing data.
A consensus linkage map of lentil based on DArT markers from three RIL mapping populations

PubMed Central

Ates, Duygu; Aldemir, Secil; Alsaleh, Ahmad; Erdogmus, Semih; Nemli, Seda; Kahriman, Abdullah; Ozkan, Hakan; Vandenberg, Albert

2018-01-01

Background Lentil (Lens culinaris ssp. culinaris Medikus) is a diploid (2n = 2x = 14), self-pollinating grain legume with a haploid genome size of about 4 Gbp and is grown throughout the world with current annual production of 4.9 million tonnes. Materials and methods A consensus map of lentil (Lens culinaris ssp. culinaris Medikus) was constructed using three different lentils recombinant inbred line (RIL) populations, including “CDC Redberry” x “ILL7502” (LR8), “ILL8006” x “CDC Milestone” (LR11) and “PI320937” x “Eston” (LR39). Results The lentil consensus map was composed of 9,793 DArT markers, covered a total of 977.47 cM with an average distance of 0.10 cM between adjacent markers and constructed 7 linkage groups representing 7 chromosomes of the lentil genome. The consensus map had no gap larger than 12.67 cM and only 5 gaps were found to be between 12.67 cM and 6.0 cM (on LG3 and LG4). The localization of the SNP markers on the lentil consensus map were in general consistent with their localization on the three individual genetic linkage maps and the lentil consensus map has longer map length, higher marker density and shorter average distance between the adjacent markers compared to the component linkage maps. Conclusion This high-density consensus map could provide insight into the lentil genome. The consensus map could also help to construct a physical map using a Bacterial Artificial Chromosome library and map based cloning studies. Sequence information of DArT may help localization of orientation scaffolds from Next Generation Sequencing data. PMID:29351563
A Consensus Genetic Map for Pinus taeda and Pinus elliottii and Extent of Linkage Disequilibrium in Two Genotype-Phenotype Discovery Populations of Pinus taeda

PubMed Central

Westbrook, Jared W.; Chhatre, Vikram E.; Wu, Le-Shin; Chamala, Srikar; Neves, Leandro Gomide; Muñoz, Patricio; Martínez-García, Pedro J.; Neale, David B.; Kirst, Matias; Mockaitis, Keithanne; Nelson, C. Dana; Peter, Gary F.; Echt, Craig S.

2015-01-01

A consensus genetic map for Pinus taeda (loblolly pine) and Pinus elliottii (slash pine) was constructed by merging three previously published P. taeda maps with a map from a pseudo-backcross between P. elliottii and P. taeda. The consensus map positioned 3856 markers via genotyping of 1251 individuals from four pedigrees. It is the densest linkage map for a conifer to date. Average marker spacing was 0.6 cM and total map length was 2305 cM. Functional predictions of mapped genes were improved by aligning expressed sequence tags used for marker discovery to full-length P. taeda transcripts. Alignments to the P. taeda genome mapped 3305 scaffold sequences onto 12 linkage groups. The consensus genetic map was used to compare the genome-wide linkage disequilibrium in a population of distantly related P. taeda individuals (ADEPT2) used for association genetic studies and a multiple-family pedigree used for genomic selection (CCLONES). The prevalence and extent of LD was greater in CCLONES as compared to ADEPT2; however, extended LD with LGs or between LGs was rare in both populations. The average squared correlations, r2, between SNP alleles less than 1 cM apart were less than 0.05 in both populations and r2 did not decay substantially with genetic distance. The consensus map and analysis of linkage disequilibrium establish a foundation for comparative association mapping and genomic selection in P. taeda and P. elliottii. PMID:26068575
Investigation of DNA sequence recognition by a streptomycete MarR family transcriptional regulator through surface plasmon resonance and X-ray crystallography

PubMed Central

Stevenson, Clare E. M.; Assaad, Aoun; Chandra, Govind; Le, Tung B. K.; Greive, Sandra J.; Bibb, Mervyn J.; Lawson, David M.

2013-01-01

Consistent with their complex lifestyles and rich secondary metabolite profiles, the genomes of streptomycetes encode a plethora of transcription factors, the vast majority of which are uncharacterized. Herein, we use Surface Plasmon Resonance (SPR) to identify and delineate putative operator sites for SCO3205, a MarR family transcriptional regulator from Streptomyces coelicolor that is well represented in sequenced actinomycete genomes. In particular, we use a novel SPR footprinting approach that exploits indirect ligand capture to vastly extend the lifetime of a standard streptavidin SPR chip. We define two operator sites upstream of sco3205 and a pseudopalindromic consensus sequence derived from these enables further potential operator sites to be identified in the S. coelicolor genome. We evaluate each of these through SPR and test the importance of the conserved bases within the consensus sequence. Informed by these results, we determine the crystal structure of a SCO3205-DNA complex at 2.8 Å resolution, enabling molecular level rationalization of the SPR data. Taken together, our observations support a DNA recognition mechanism involving both direct and indirect sequence readout. PMID:23748564
PigGIS: Pig Genomic Informatics System

PubMed Central

Ruan, Jue; Guo, Yiran; Li, Heng; Hu, Yafeng; Song, Fei; Huang, Xin; Kristiensen, Karsten; Bolund, Lars; Wang, Jun

2007-01-01

Pig Genomic Information System (PigGIS) is a web-based depository of pig (Sus scrofa) genomic learning mainly engineered for biomedical research to locate pig genes from their human homologs and position single nucleotide polymorphisms (SNPs) in different pig populations. It utilizes a variety of sequence data, including whole genome shotgun (WGS) reads and expressed sequence tags (ESTs), and achieves a successful mapping solution to the low-coverage genome problem. With the data presently available, we have identified a total of 15 700 pig consensus sequences covering 18.5 Mb of the homologous human exons. We have also recovered 18 700 SNPs and 20 800 unique 60mer oligonucleotide probes for future pig genome analyses. PigGIS can be freely accessed via the web at and . PMID:17090590

Mining a database of single amplified genomes from Red Sea brine pool extremophiles—improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA)

PubMed Central

Grötzinger, Stefan W.; Alam, Intikhab; Ba Alawi, Wail; Bajic, Vladimir B.; Stingl, Ulrich; Eppinger, Jörg

2014-01-01

Reliable functional annotation of genomic data is the key-step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophile's genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO) data warehouse. Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO)-terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 enzyme commission (E.C.) numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO-sequences consisting of 58 SAGs from six different taxons of bacteria and archaea were selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable, if at least two relevant descriptors (GO-terms and/or consensus patterns) are present. Scripts for annotation, as well as for the PPM algorithm, are available through the INDIGO website. PMID:24778629
CoVaCS: a consensus variant calling system.

PubMed

Chiara, Matteo; Gioiosa, Silvia; Chillemi, Giovanni; D'Antonio, Mattia; Flati, Tiziano; Picardi, Ernesto; Zambelli, Federico; Horner, David Stephen; Pesole, Graziano; Castrignanò, Tiziana

2018-02-05

The advent and ongoing development of next generation sequencing technologies (NGS) has led to a rapid increase in the rate of human genome re-sequencing data, paving the way for personalized genomics and precision medicine. The body of genome resequencing data is progressively increasing underlining the need for accurate and time-effective bioinformatics systems for genotyping - a crucial prerequisite for identification of candidate causal mutations in diagnostic screens. Here we present CoVaCS, a fully automated, highly accurate system with a web based graphical interface for genotyping and variant annotation. Extensive tests on a gold standard benchmark data-set -the NA12878 Illumina platinum genome- confirm that call-sets based on our consensus strategy are completely in line with those attained by similar command line based approaches, and far more accurate than call-sets from any individual tool. Importantly our system exhibits better sensitivity and higher specificity than equivalent commercial software. CoVaCS offers optimized pipelines integrating state of the art tools for variant calling and annotation for whole genome sequencing (WGS), whole-exome sequencing (WES) and target-gene sequencing (TGS) data. The system is currently hosted at Cineca, and offers the speed of a HPC computing facility, a crucial consideration when large numbers of samples must be analysed. Importantly, all the analyses are performed automatically allowing high reproducibility of the results. As such, we believe that CoVaCS can be a valuable tool for the analysis of human genome resequencing studies. CoVaCS is available at: https://bioinformatics.cineca.it/covacs .
Dfam: a database of repetitive DNA based on profile hidden Markov models.

PubMed

Wheeler, Travis J; Clements, Jody; Eddy, Sean R; Hubley, Robert; Jones, Thomas A; Jurka, Jerzy; Smit, Arian F A; Finn, Robert D

2013-01-01

We present a database of repetitive DNA elements, called Dfam (http://dfam.janelia.org). Many genomes contain a large fraction of repetitive DNA, much of which is made up of remnants of transposable elements (TEs). Accurate annotation of TEs enables research into their biology and can shed light on the evolutionary processes that shape genomes. Identification and masking of TEs can also greatly simplify many downstream genome annotation and sequence analysis tasks. The commonly used TE annotation tools RepeatMasker and Censor depend on sequence homology search tools such as cross_match and BLAST variants, as well as Repbase, a collection of known TE families each represented by a single consensus sequence. Dfam contains entries corresponding to all Repbase TE entries for which instances have been found in the human genome. Each Dfam entry is represented by a profile hidden Markov model, built from alignments generated using RepeatMasker and Repbase. When used in conjunction with the hidden Markov model search tool nhmmer, Dfam produces a 2.9% increase in coverage over consensus sequence search methods on a large human benchmark, while maintaining low false discovery rates, and coverage of the full human genome is 54.5%. The website provides a collection of tools and data views to support improved TE curation and annotation efforts. Dfam is also available for download in flat file format or in the form of MySQL table dumps.
Integrated consensus genetic and physical maps of flax (Linum usitatissimum L.).

PubMed

Cloutier, Sylvie; Ragupathy, Raja; Miranda, Evelyn; Radovanovic, Natasa; Reimer, Elsa; Walichnowski, Andrzej; Ward, Kerry; Rowland, Gordon; Duguid, Scott; Banik, Mitali

2012-12-01

Three linkage maps of flax (Linum usitatissimum L.) were constructed from populations CDC Bethune/Macbeth, E1747/Viking and SP2047/UGG5-5 containing between 385 and 469 mapped markers each. The first consensus map of flax was constructed incorporating 770 markers based on 371 shared markers including 114 that were shared by all three populations and 257 shared between any two populations. The 15 linkage group map corresponds to the haploid number of chromosomes of this species. The marker order of the consensus map was largely collinear in all three individual maps but a few local inversions and marker rearrangements spanning short intervals were observed. Segregation distortion was present in all linkage groups which contained 1-52 markers displaying non-Mendelian segregation. The total length of the consensus genetic map is 1,551 cM with a mean marker density of 2.0 cM. A total of 670 markers were anchored to 204 of the 416 fingerprinted contigs of the physical map corresponding to ~274 Mb or 74 % of the estimated flax genome size of 370 Mb. This high resolution consensus map will be a resource for comparative genomics, genome organization, evolution studies and anchoring of the whole genome shotgun sequence.
Structure of genes and an insertion element in the methane producing archaebacterium Methanobrevibacter smithii.

PubMed

Hamilton, P T; Reeve, J N

1985-01-01

DNA fragments cloned from the methanogenic archaebacterium Methanobrevibacter smithii which complement mutations in the purE and proC genes of E. coli have been sequenced. Sequence analyses, transposon mutagenesis and expression in E. coli minicells indicate that purE and proC complementations result from the synthesis of M. smithii polypeptides with molecular weights of 36,697 and 27,836 respectively. The encoding genes appear to be located in operons. The M. smithii genome contains 69% A/T basepairs (bp) which is reflected in unusual codon usages and intergenic regions containing approximately 85% A/T bp. An insertion element, designated ISM1, was found within the cloned M. smithii DNA located adjacent to the proC complementing region. ISM1 is 1381 bp in length, has 29 bp terminal inverted repeat sequences and contains one major ORF encoded in 87% of the ISM1 sequence. ISM1 is mobile, present in approximately 10 copies per genome and integration duplicates 8 bp at the site of insertion. The duplicated sequences show homology with sequences within the 29 bp terminal repeat sequence of ISM1. Comparison of our data with sequences from halophilic archaebacteria suggests that 5'GAANTTTCA and 5'TTTTAATATAAA may be consensus promoter sequences for archaebacteria. These sequences closely resemble the consensus sequences which precede Drosophila heat-shock genes (Pelham 1982; Davidson et al. 1983). Methanogens appear to employ the eubacterial system of mRNA: 16SrRNA hybridization to ensure initiation of translation; the consensus ribosome binding sequence is 5'AGGTGA.
DNA breathing dynamics distinguish binding from nonbinding consensus sites for transcription factor YY1 in cells.

PubMed

Alexandrov, Boian S; Fukuyo, Yayoi; Lange, Martin; Horikoshi, Nobuo; Gelev, Vladimir; Rasmussen, Kim Ø; Bishop, Alan R; Usheva, Anny

2012-11-01

The genome-wide mapping of the major gene expression regulators, the transcription factors (TFs) and their DNA binding sites, is of great importance for describing cellular behavior and phenotypic diversity. Presently, the methods for prediction of genomic TF binding produce a large number of false positives, most likely due to insufficient description of the physiochemical mechanisms of protein-DNA binding. Growing evidence suggests that, in the cell, the double-stranded DNA (dsDNA) is subject to local transient strands separations (breathing) that contribute to genomic functions. By using site-specific chromatin immunopecipitations, gel shifts, BIOBASE data, and our model that accurately describes the melting behavior and breathing dynamics of dsDNA we report a specific DNA breathing profile found at YY1 binding sites in cells. We find that the genomic flanking sequence variations and SNPs, may exert long-range effects on DNA dynamics and predetermine YY1 binding. The ubiquitous TF YY1 has a fundamental role in essential biological processes by activating, initiating or repressing transcription depending upon the sequence context it binds. We anticipate that consensus binding sequences together with the related DNA dynamics profile may significantly improve the accuracy of genomic TF binding sites and TF binding-related functional SNPs.
The challenge of informed consent and return of results in translational genomics: empirical analysis and recommendations.

PubMed

Henderson, Gail E; Wolf, Susan M; Kuczynski, Kristine J; Joffe, Steven; Sharp, Richard R; Parsons, D Williams; Knoppers, Bartha M; Yu, Joon-Ho; Appelbaum, Paul S

2014-01-01

As exome and genome sequencing move into clinical application, questions surround how to elicit consent and handle potential return of individual genomic results. This study analyzes nine consent forms used in NIH-funded sequencing studies. Content analysis reveals considerable heterogeneity, including in defining results that may be returned, identifying potential benefits and risks of return, protecting privacy, addressing placement of results in the medical record, and data-sharing. In response to lack of consensus, we offer recommendations. © 2014 American Society of Law, Medicine & Ethics, Inc.
Generation and Characterization of HIV-1 Transmitted and Founder Virus Consensus Sequence from Intravenous Drug Users in Xinjiang, China.

PubMed

Li, Fan; Ma, Liying; Feng, Yi; Hu, Jing; Ni, Na; Ruan, Yuhua; Shao, Yiming

2017-06-01

HIV-1 transmission in intravenous drug users (IDUs) has been characterized by high genetic multiplicity and suggests a greater challenge for HIV-1 infection blocking. We investigated a total of 749 sequences of full-length gp160 gene obtained by single genome sequencing (SGS) from 22 HIV-1 early infected IDUs in Xinjiang province, northwest China, and generated a transmitted and founder virus (T/F virus) consensus sequence (IDU.CON). The T/F virus was classified as subtype CRF07_BC and predicted to be CCR5-tropic virus. The variable region (V1, V2, and V4 loop) of IDU.CON showed length variation compared with the heterosexual T/F virus consensus sequence (HSX.CON) and homosexual T/F virus consensus sequence (MSM.CON). A total of 26 N-linked glycosylation sites were discovered in the IDU.CON sequence, which is less than that of MSM.CON and HSX.CON. Characterization of T/F virus from IDUs highlights the genetic make-up and complexity of virus near the moment of transmission or in early infection preceding systemic dissemination and is important toward the development of an effective HIV-1 preventive methods, including vaccines.
Guinea Pig ID-Like Families of SINEs

PubMed Central

Kass, David H.; Schaetz, Brian A.; Beitler, Lindsey; Bonney, Kevin M.; Jamison, Nicole; Wiesner, Cathy

2009-01-01

Previous studies have indicated a paucity of SINEs within the genomes of the guinea pig and nutria, representatives of the Hystricognathi suborder of rodents. More recent work has shown that the guinea pig genome contains a large number of B1 elements, expanding to various levels among different rodents. In this work we utilized A–B PCR and screened GenBank with sequences from isolated clones to identify potentially uncharacterized SINEs within the guinea pig genome, and identified numerous sequences with a high degree of similarity (>92%) specific to the guinea pig. The presence of A-tails and flanking direct repeats associated with these sequences supported the identification of a full-length SINE, with a consensus sequence notably distinct from other rodent SINEs. Although most similar to the ID SINE, it clearly was not derived from the known ID master gene (BC1), hence we refer to this element as guinea pig ID-like (GPIDL). Using the consensus to screen the guinea pig genomic database (Assembly CavPor2) with Ensembl BlastView, we estimated at least 100,000 copies, which contrasts markedly to just over 100 copies of ID elements. Additionally we provided evidence of recent integrations of GPIDL as two of seven analyzed conserved GPIDL-containing loci demonstrated presence/absence variants in Cavia porcellus and C. aperea. Using intra-IDL PCR and sequence analyses we also provide evidence that GPIDL is derived from a hystricognath-specific SINE family. These results demonstrate that this SINE family continues to contribute to the dynamics of genomes of hystricognath rodents. PMID:19232383
Guinea pig ID-like families of SINEs.

PubMed

Kass, David H; Schaetz, Brian A; Beitler, Lindsey; Bonney, Kevin M; Jamison, Nicole; Wiesner, Cathy

2009-05-01

Previous studies have indicated a paucity of SINEs within the genomes of the guinea pig and nutria, representatives of the Hystricognathi suborder of rodents. More recent work has shown that the guinea pig genome contains a large number of B1 elements, expanding to various levels among different rodents. In this work we utilized A-B PCR and screened GenBank with sequences from isolated clones to identify potentially uncharacterized SINEs within the guinea pig genome, and identified numerous sequences with a high degree of similarity (>92%) specific to the guinea pig. The presence of A-tails and flanking direct repeats associated with these sequences supported the identification of a full-length SINE, with a consensus sequence notably distinct from other rodent SINEs. Although most similar to the ID SINE, it clearly was not derived from the known ID master gene (BC1), hence we refer to this element as guinea pig ID-like (GPIDL). Using the consensus to screen the guinea pig genomic database (Assembly CavPor2) with Ensembl BlastView, we estimated at least 100,000 copies, which contrasts markedly to just over 100 copies of ID elements. Additionally we provided evidence of recent integrations of GPIDL as two of seven analyzed conserved GPIDL-containing loci demonstrated presence/absence variants in Cavia porcellus and C. aperea. Using intra-IDL PCR and sequence analyses we also provide evidence that GPIDL is derived from a hystricognath-specific SINE family. These results demonstrate that this SINE family continues to contribute to the dynamics of genomes of hystricognath rodents.
Complete genome sequences of two strains of Treponema pallidum subsp. pertenue from Ghana, Africa: Identical genome sequences in samples isolated more than 7 years apart.

PubMed

Strouhal, Michal; Mikalová, Lenka; Havlíčková, Pavla; Tenti, Paolo; Čejková, Darina; Rychlík, Ivan; Bruisten, Sylvia; Šmajs, David

2017-09-01

Treponema pallidum subsp. pertenue (TPE) is the causative agent of yaws, a multi-stage disease, endemic in tropical regions of Africa, Asia, Oceania, and South America. To date, four TPE strains have been completely sequenced including three TPE strains of human origin (Samoa D, CDC-2, and Gauthier) and one TPE strain (Fribourg-Blanc) isolated from a baboon. All TPE strains are highly similar to T. pallidum subsp. pallidum (TPA) strains. The mutation rate in syphilis and related treponemes has not been experimentally determined yet. Complete genomes of two TPE strains, CDC 2575 and Ghana-051, that infected patients in Ghana and were isolated in 1980 and 1988, respectively, were sequenced and analyzed. Both strains had identical consensus genome nucleotide sequences raising the question whether TPE CDC 2575 and Ghana-051 represent two different strains. Several lines of evidence support the fact that both strains represent independent samples including regions showing intrastrain heterogeneity (13 and 5 intrastrain heterogeneous sites in TPE Ghana-051 and TPE CDC 2575, respectively). Four of these heterogeneous sites were found in both genomes but the frequency of alternative alleles differed. The identical consensus genome sequences were used to estimate the upper limit of the yaws treponeme evolution rate, which was 4.1 x 10-10 nucleotide changes per site per generation. The estimated upper limit for the mutation rate of TPE was slightly lower than the mutation rate of E. coli, which was determined during a long-term experiment. Given the known diversity between TPA and TPE genomes and the assumption that both TPA and TPE have a similar mutation rate, the most recent common ancestor of syphilis and yaws treponemes appears to be more than ten thousand years old and likely even older.
The genome sequence of sweet cherry (Prunus avium) for use in genomics-assisted breeding.

PubMed

Shirasawa, Kenta; Isuzugawa, Kanji; Ikenaga, Mitsunobu; Saito, Yutaro; Yamamoto, Toshiya; Hirakawa, Hideki; Isobe, Sachiko

2017-10-01

We determined the genome sequence of sweet cherry (Prunus avium) using next-generation sequencing technology. The total length of the assembled sequences was 272.4 Mb, consisting of 10,148 scaffold sequences with an N50 length of 219.6 kb. The sequences covered 77.8% of the 352.9 Mb sweet cherry genome, as estimated by k-mer analysis, and included >96.0% of the core eukaryotic genes. We predicted 43,349 complete and partial protein-encoding genes. A high-density consensus map with 2,382 loci was constructed using double-digest restriction site-associated DNA sequencing. Comparing the genetic maps of sweet cherry and peach revealed high synteny between the two genomes; thus the scaffolds were integrated into pseudomolecules using map- and synteny-based strategies. Whole-genome resequencing of six modern cultivars found 1,016,866 SNPs and 162,402 insertions/deletions, out of which 0.7% were deleterious. The sequence variants, as well as simple sequence repeats, can be used as DNA markers. The genomic information helps us to identify agronomically important genes and will accelerate genetic studies and breeding programs for sweet cherries. Further information on the genomic sequences and DNA markers is available in DBcherry (http://cherry.kazusa.or.jp (8 May 2017, date last accessed)). © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Genome-wide comparisons of phylogenetic similarities between partial genomic regions and the full-length genome in Hepatitis E virus genotyping.

PubMed

Wang, Shuai; Wei, Wei; Luo, Xuenong; Cai, Xuepeng

2014-01-01

Besides the complete genome, different partial genomic sequences of Hepatitis E virus (HEV) have been used in genotyping studies, making it difficult to compare the results based on them. No commonly agreed partial region for HEV genotyping has been determined. In this study, we used a statistical method to evaluate the phylogenetic performance of each partial genomic sequence from a genome wide, by comparisons of evolutionary distances between genomic regions and the full-length genomes of 101 HEV isolates to identify short genomic regions that can reproduce HEV genotype assignments based on full-length genomes. Several genomic regions, especially one genomic region at the 3'-terminal of the papain-like cysteine protease domain, were detected to have relatively high phylogenetic correlations with the full-length genome. Phylogenetic analyses confirmed the identical performances between these regions and the full-length genome in genotyping, in which the HEV isolates involved could be divided into reasonable genotypes. This analysis may be of value in developing a partial sequence-based consensus classification of HEV species.
Human Ro60 (SSA2) genomic organization and sequence alterations, examined in cutaneous lupus erythematosus.

PubMed

Millard, T P; Ashton, G H S; Kondeatis, E; Vaughan, R W; Hughes, G R V; Khamashta, M A; Hawk, J L M; McGregor, J M; McGrath, J A

2002-02-01

The Ro 60 kDa protein (Ro60 or SSA2) is the major component of the Ro ribonucleoprotein (Ro RNP) complex, to which an immune response is a specific feature of several autoimmune diseases. The genomic organization and any sequence variation within the DNA encoding Ro60 are unknown. To characterize the Ro60 gene structure and to assess whether any sequence alterations might be associated with serum anti-Ro antibody in subacute cutaneous lupus erythematosus (SCLE), thus potentially providing new insight into disease pathogenesis. The cDNA sequence for Ro60 was obtained from the NCBI database and used for a BLAST search for a clone containing the entire genomic sequence. The intron-exon borders were confirmed by designing intronic primer pairs to flank each exon, which were then used to amplify genomic DNA for automated sequencing from 36 caucasian patients with SCLE (anti-Ro positive) and 49 with discoid LE (DLE, anti-Ro negative), in addition to 36 healthy caucasian controls. Heteroduplex analysis of polymerase chain reaction (PCR) products from patients and controls spanning all Ro60 exons (1-8) revealed a common bandshift in the PCR products spanning exon 7. Sequencing of the corresponding PCR products demonstrated an A > G substitution at nucleotide position 1318-7, within the consensus acceptor splice site of exon 7 (GenBank XM001901). The allele frequencies were major allele A (0.71) and minor allele G (0.29) in 72 control chromosomes, with no significant differences found between SCLE patients, DLE patients and controls. The genomic organization of the DNA encoding the Ro60 protein is described, including a common polymorphism within the consensus acceptor splice site of exon 7. Our delineation of a strategy for the genomic amplification of Ro60 forms a basis for further examination of the pathological functions of the Ro RNP in autoimmune disease.
Context based computational analysis and characterization of ARS consensus sequences (ACS) of Saccharomyces cerevisiae genome.

PubMed

Singh, Vinod Kumar; Krishnamachari, Annangarachari

2016-09-01

Genome-wide experimental studies in Saccharomyces cerevisiae reveal that autonomous replicating sequence (ARS) requires an essential consensus sequence (ACS) for replication activity. Computational studies identified thousands of ACS like patterns in the genome. However, only a few hundreds of these sites act as replicating sites and the rest are considered as dormant or evolving sites. In a bid to understand the sequence makeup of replication sites, a content and context-based analysis was performed on a set of replicating ACS sequences that binds to origin-recognition complex (ORC) denoted as ORC-ACS and non-replicating ACS sequences (nrACS), that are not bound by ORC. In this study, DNA properties such as base composition, correlation, sequence dependent thermodynamic and DNA structural profiles, and their positions have been considered for characterizing ORC-ACS and nrACS. Analysis reveals that ORC-ACS depict marked differences in nucleotide composition and context features in its vicinity compared to nrACS. Interestingly, an A-rich motif was also discovered in ORC-ACS sequences within its nucleosome-free region. Profound changes in the conformational features, such as DNA helical twist, inclination angle and stacking energy between ORC-ACS and nrACS were observed. Distribution of ACS motifs in the non-coding segments points to the locations of ORC-ACS which are found far away from the adjacent gene start position compared to nrACS thereby enabling an accessible environment for ORC-proteins. Our attempt is novel in considering the contextual view of ACS and its flanking region along with nucleosome positioning in the S. cerevisiae genome and may be useful for any computational prediction scheme.
A core phylogeny of Dictyostelia inferred from genomes representative of the eight major and minor taxonomic divisions of the group.

PubMed

Singh, Reema; Schilde, Christina; Schaap, Pauline

2016-11-17

Dictyostelia are a well-studied group of organisms with colonial multicellularity, which are members of the mostly unicellular Amoebozoa. A phylogeny based on SSU rDNA data subdivided all Dictyostelia into four major groups, but left the position of the root and of six group-intermediate taxa unresolved. Recent phylogenies inferred from 30 or 213 proteins from sequenced genomes, positioned the root between two branches, each containing two major groups, but lacked data to position the group-intermediate taxa. Since the positions of these early diverging taxa are crucial for understanding the evolution of phenotypic complexity in Dictyostelia, we sequenced six representative genomes of early diverging taxa. We retrieved orthologs of 47 housekeeping proteins with an average size of 890 amino acids from six newly sequenced and eight published genomes of Dictyostelia and unicellular Amoebozoa and inferred phylogenies from single and concatenated protein sequence alignments. Concatenated alignments of all 47 proteins, and four out of five subsets of nine concatenated proteins all produced the same consensus phylogeny with 100% statistical support. Trees inferred from just two out of the 47 proteins, individually reproduced the consensus phylogeny, highlighting that single gene phylogenies will rarely reflect correct species relationships. However, sets of two or three concatenated proteins again reproduced the consensus phylogeny, indicating that a small selection of genes suffices for low cost classification of as yet unincorporated or newly discovered dictyostelid and amoebozoan taxa by gene amplification. The multi-locus consensus phylogeny shows that groups 1 and 2 are sister clades in branch I, with the group-intermediate taxon D. polycarpum positioned as outgroup to group 2. Branch II consists of groups 3 and 4, with the group-intermediate taxon Polysphondylium violaceum positioned as sister to group 4, and the group-intermediate taxon Dictyostelium polycephalum branching at the base of that whole clade. Given the data, the approximately unbiased test rejects all alternative topologies favoured by SSU rDNA and individual proteins with high statistical support. The test also rejects monophyletic origins for the genera Acytostelium, Polysphondylium and Dictyostelium. The current position of Acytostelium ellipticum in the consensus phylogeny indicates that somatic cells were lost twice in Dictyostelia.
Cloning, sequencing and characterization of lipase from a polyhydroxyalkanoate- (PHA-) synthesizing Pseudomonas resinovorans

USDA-ARS?s Scientific Manuscript database

Lipase gene (lip) of a biodegradable polyhydroxyalkanoate- (PHA-) synthesizing bacterium P. resinovorans NRRL B-2649 was cloned, sequenced and characterized by using consensus primers and PCR-based genome walking method. The ORF of the putative Lip (314 amino acids) and its active site (Ser111, Asp...
Final Report for LDRD Project 02-ERD-069: Discovering the Unknown Mechanism(s) of Virulence in a BW, Class A Select Agent

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chain, P; Garcia, E

2003-02-06

The goal of this proposed effort was to assess the difficulty in identifying and characterizing virulence candidate genes in an organism for which very limited data exists. This was accomplished by first addressing the finishing phase of draft-sequenced F. tularensis genomes and conducting comparative analyses to determine the coding potential of each genome; to discover the differences in genome structure and content, and to identify potential genes whose products may be involved in the F. tularensis virulence process. The project was divided into three parts: (1) Genome finishing: This part involves determining the order and orientation of the consensus sequencesmore » of contigs obtained from Phrap assemblies of random draft genomic sequences. This tedious process consists of linking contig ends using information embedded in each sequence file that relates the sequence to the original cloned insert. Since inserts are sequenced from both ends, we can establish a link between these paired-ends in different contigs and thus order and orient contigs. Since these genomes carry numerous copies of insertion sequences, these repeated elements ''confuse'' the Phrap assembly program. It is thus necessary to break these contigs apart at the repeated sequences and individually join the proper flanking regions using paired-end information, or using results of comparisons against a similar genome. Larger repeated elements such as the small subunit ribosomal RNA operon require verification with PCR. Tandem repeats require manual intervention and typically rely on single nucleotide polymorphisms to be resolved. Remaining gaps require PCR reactions and sequencing. Once the genomes have been ''closed'', low quality regions are addressed by resequencing reactions. (2) Genome analysis: The final consensus sequences are processed by combining the results of three gene modelers: Glimmer, Critica and Generation. The final gene models are submitted to a battery of homology searches and domain prediction programs in order to annotate them (e.g. BLAST, Pfam, TIGRfam, COG, KEGG, InterPro, TMhmm, SignalP). The genome structure is also assessed in terms of G+C content, GC bias (GC skew), and locations of repeated regions (e.g. IS elements) and phage-like genes. (3) Comparative genomics: The results of the various genome analyses are compared between the finished (or almost finished) genomes. Here, we have compared the F. tularensis genomes from the extremely lethal strain Schu4 (subsp. tularensis), the vaccine strain LVS (subsp. holartica), and strain UT01-4992 of the less virulent, opportunistic subsp. novicida. Regions present in the highly virulent strain that are absent from the other less virulent strains may provide insight into what factors are required for the high level of virulence.« less
Importing statistical measures into Artemis enhances gene identification in the Leishmania genome project.

PubMed

Aggarwal, Gautam; Worthey, E A; McDonagh, Paul D; Myler, Peter J

2003-06-07

Seattle Biomedical Research Institute (SBRI) as part of the Leishmania Genome Network (LGN) is sequencing chromosomes of the trypanosomatid protozoan species Leishmania major. At SBRI, chromosomal sequence is annotated using a combination of trained and untrained non-consensus gene-prediction algorithms with ARTEMIS, an annotation platform with rich and user-friendly interfaces. Here we describe a methodology used to import results from three different protein-coding gene-prediction algorithms (GLIMMER, TESTCODE and GENESCAN) into the ARTEMIS sequence viewer and annotation tool. Comparison of these methods, along with the CODONUSAGE algorithm built into ARTEMIS, shows the importance of combining methods to more accurately annotate the L. major genomic sequence. An improvised and powerful tool for gene prediction has been developed by importing data from widely-used algorithms into an existing annotation platform. This approach is especially fruitful in the Leishmania genome project where there is large proportion of novel genes requiring manual annotation.
Survey of gene splicing algorithms based on reads.

PubMed

Si, Xiuhua; Wang, Qian; Zhang, Lei; Wu, Ruo; Ma, Jiquan

2017-11-02

Gene splicing is the process of assembling a large number of unordered short sequence fragments to the original genome sequence as accurately as possible. Several popular splicing algorithms based on reads are reviewed in this article, including reference genome algorithms and de novo splicing algorithms (Greedy-extension, Overlap-Layout-Consensus graph, De Bruijn graph). We also discuss a new splicing method based on the MapReduce strategy and Hadoop. By comparing these algorithms, some conclusions are drawn and some suggestions on gene splicing research are made.

Consensus-Degenerate Hybrid Oligonucleotide Primers for Amplification of Priming Glycosyltransferase Genes of the Exopolysaccharide Locus in Strains of the Lactobacillus casei Group

PubMed Central

Provencher, Cathy; LaPointe, Gisèle; Sirois, Stéphane; Van Calsteren, Marie-Rose; Roy, Denis

2003-01-01

A primer design strategy named CODEHOP (consensus-degenerate hybrid oligonucleotide primer) for amplification of distantly related sequences was used to detect the priming glycosyltransferase (GT) gene in strains of the Lactobacillus casei group. Each hybrid primer consisted of a short 3′ degenerate core based on four highly conserved amino acids and a longer 5′ consensus clamp region based on six sequences of the priming GT gene products from exopolysaccharide (EPS)-producing bacteria. The hybrid primers were used to detect the priming GT gene of 44 commercial isolates and reference strains of Lactobacillus rhamnosus, L. casei, Lactobacillus zeae, and Streptococcus thermophilus. The priming GT gene was detected in the genome of both non-EPS-producing (EPS−) and EPS-producing (EPS+) strains of L. rhamnosus. The sequences of the cloned PCR products were similar to those of the priming GT gene of various gram-negative and gram-positive EPS+ bacteria. Specific primers designed from the L. rhamnosus RW-9595M GT gene were used to sequence the end of the priming GT gene in selected EPS+ strains of L. rhamnosus. Phylogenetic analysis revealed that Lactobacillus spp. form a distinctive group apart from other lactic acid bacteria for which GT genes have been characterized to date. Moreover, the sequences show a divergence existing among strains of L. rhamnosus with respect to the terminal region of the priming GT gene. Thus, the PCR approach with consensus-degenerate hybrid primers designed with CODEHOP is a practical approach for the detection of similar genes containing conserved motifs in different bacterial genomes. PMID:12788729
A draft physical map of a D-genome cotton species (Gossypium raimondii)

PubMed Central

2010-01-01

Background Genetically anchored physical maps of large eukaryotic genomes have proven useful both for their intrinsic merit and as an adjunct to genome sequencing. Cultivated tetraploid cottons, Gossypium hirsutum and G. barbadense, share a common ancestor formed by a merger of the A and D genomes about 1-2 million years ago. Toward the long-term goal of characterizing the spectrum of diversity among cotton genomes, the worldwide cotton community has prioritized the D genome progenitor Gossypium raimondii for complete sequencing. Results A whole genome physical map of G. raimondii, the putative D genome ancestral species of tetraploid cottons was assembled, integrating genetically-anchored overgo hybridization probes, agarose based fingerprints and 'high information content fingerprinting' (HICF). A total of 13,662 BAC-end sequences and 2,828 DNA probes were used in genetically anchoring 1585 contigs to a cotton consensus genetic map, and 370 and 438 contigs, respectively to Arabidopsis thaliana (AT) and Vitis vinifera (VV) whole genome sequences. Conclusion Several lines of evidence suggest that the G. raimondii genome is comprised of two qualitatively different components. Much of the gene rich component is aligned to the Arabidopsis and Vitis vinifera genomes and shows promise for utilizing translational genomic approaches in understanding this important genome and its resident genes. The integrated genetic-physical map is of value both in assembling and validating a planned reference sequence. PMID:20569427
Quasispecies Analyses of the HIV-1 Near-full-length Genome With Illumina MiSeq

PubMed Central

Ode, Hirotaka; Matsuda, Masakazu; Matsuoka, Kazuhiro; Hachiya, Atsuko; Hattori, Junko; Kito, Yumiko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru

2015-01-01

Human immunodeficiency virus type-1 (HIV-1) exhibits high between-host genetic diversity and within-host heterogeneity, recognized as quasispecies. Because HIV-1 quasispecies fluctuate in terms of multiple factors, such as antiretroviral exposure and host immunity, analyzing the HIV-1 genome is critical for selecting effective antiretroviral therapy and understanding within-host viral coevolution mechanisms. Here, to obtain HIV-1 genome sequence information that includes minority variants, we sought to develop a method for evaluating quasispecies throughout the HIV-1 near-full-length genome using the Illumina MiSeq benchtop deep sequencer. To ensure the reliability of minority mutation detection, we applied an analysis method of sequence read mapping onto a consensus sequence derived from de novo assembly followed by iterative mapping and subsequent unique error correction. Deep sequencing analyses of aHIV-1 clone showed that the analysis method reduced erroneous base prevalence below 1% in each sequence position and discarded only < 1% of all collected nucleotides, maximizing the usage of the collected genome sequences. Further, we designed primer sets to amplify the HIV-1 near-full-length genome from clinical plasma samples. Deep sequencing of 92 samples in combination with the primer sets and our analysis method provided sufficient coverage to identify >1%-frequency sequences throughout the genome. When we evaluated sequences of pol genes from 18 treatment-naïve patients' samples, the deep sequencing results were in agreement with Sanger sequencing and identified numerous additional minority mutations. The results suggest that our deep sequencing method would be suitable for identifying within-host viral population dynamics throughout the genome. PMID:26617593
Common Viral Integration Sites Identified in Avian Leukosis Virus-Induced B-Cell Lymphomas

PubMed Central

Justice, James F.; Morgan, Robin W.

2015-01-01

ABSTRACT Avian leukosis virus (ALV) induces B-cell lymphoma and other neoplasms in chickens by integrating within or near cancer genes and perturbing their expression. Four genes—MYC, MYB, Mir-155, and TERT—have previously been identified as common integration sites in these virus-induced lymphomas and are thought to play a causal role in tumorigenesis. In this study, we employ high-throughput sequencing to identify additional genes driving tumorigenesis in ALV-induced B-cell lymphomas. In addition to the four genes implicated previously, we identify other genes as common integration sites, including TNFRSF1A, MEF2C, CTDSPL, TAB2, RUNX1, MLL5, CXorf57, and BACH2. We also analyze the genome-wide ALV integration landscape in vivo and find increased frequency of ALV integration near transcriptional start sites and within transcripts. Previous work has shown ALV prefers a weak consensus sequence for integration in cultured human cells. We confirm this consensus sequence for ALV integration in vivo in the chicken genome. PMID:26670384
Development of genome- and transcriptome-derived microsatellites in related species of snapping shrimps with highly duplicated genomes.

PubMed

Gaynor, Kaitlyn M; Solomon, Joseph W; Siller, Stefanie; Jessell, Linnet; Duffy, J Emmett; Rubenstein, Dustin R

2017-11-01

Molecular markers are powerful tools for studying patterns of relatedness and parentage within populations and for making inferences about social evolution. However, the development of molecular markers for simultaneous study of multiple species presents challenges, particularly when species exhibit genome duplication or polyploidy. We developed microsatellite markers for Synalpheus shrimp, a genus in which species exhibit not only great variation in social organization, but also interspecific variation in genome size and partial genome duplication. From the four primary clades within Synalpheus, we identified microsatellites in the genomes of four species and in the consensus transcriptome of two species. Ultimately, we designed and tested primers for 143 microsatellite markers across 25 species. Although the majority of markers were disomic, many markers were polysomic for certain species. Surprisingly, we found no relationship between genome size and the number of polysomic markers. As expected, markers developed for a given species amplified better for closely related species than for more distant relatives. Finally, the markers developed from the transcriptome were more likely to work successfully and to be disomic than those developed from the genome, suggesting that consensus transcriptomes are likely to be conserved across species. Our findings suggest that the transcriptome, particularly consensus sequences from multiple species, can be a valuable source of molecular markers for taxa with complex, duplicated genomes. © 2017 John Wiley & Sons Ltd.
RNA-Seq analysis and transcriptome assembly for blackberry (Rubus sp. Var. Lochness) fruit.

PubMed

Garcia-Seco, Daniel; Zhang, Yang; Gutierrez-Mañero, Francisco J; Martin, Cathie; Ramos-Solano, Beatriz

2015-01-22

There is an increasing interest in berries, especially blackberries in the diet, because of recent reports of their health benefits due to their high content of flavonoids. A broad range of genomic tools are available for other Rosaceae species but these tools are still lacking in the Rubus genus, thus limiting gene discovery and the breeding of improved varieties. De novo RNA-seq of ripe blackberries grown under field conditions was performed using Illumina Hiseq 2000. Almost 9 billion nucleotide bases were sequenced in total. Following assembly, 42,062 consensus sequences were detected. For functional annotation, 33,040 (NR), 32,762 (NT), 21,932 (Swiss-Prot), 20,134 (KEGG), 13,676 (COG), 24,168 (GO) consensus sequences were annotated using different databases; in total 34,552 annotated sequences were identified. For protein prediction analysis, the number of coding DNA sequences (CDS) that mapped to the protein database was 32,540. Non redundant (NR), annotation showed that 25,418 genes (73.5%) has the highest similarity with Fragaria vesca subspecies vesca. Reanalysis was undertaken by aligning the reads with this reference genome for a deeper analysis of the transcriptome. We demonstrated that de novo assembly, using Trinity and later annotation with Blast using different databases, were complementary to alignment to the reference sequence using SOAPaligner/SOAP2. The Fragaria reference genome belongs to a species in the same family as blackberry (Rosaceae) but to a different genus. Since blackberries are tetraploids, the possibility of artefactual gene chimeras resulting from mis-assembly was tested with one of the genes sequenced by RNAseq, Chalcone Synthase (CHS). cDNAs encoding this protein were cloned and sequenced. Primers designed to the assembled sequences accurately distinguished different contigs, at least for chalcone synthase genes. We prepared and analysed transcriptome data from ripe blackberries, for which prior genomic information was limited. This new sequence information will improve the knowledge of this important and healthy fruit, providing an invaluable new tool for biological research.
Complete genome sequence of the phenanthrene-degrading soil bacterium Delftia acidovorans Cs1-4

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shetty, Ameesha R.; de Gannes, Vidya; Obi, Chioma C.

Polycyclic aromatic hydrocarbons (PAH) are ubiquitous environmental pollutants and microbial biodegradation is an important means of remediation of PAH-contaminated soil. Delftia acidovorans Cs1-4 (formerly Delftia sp. Cs1-4) was isolated by using phenanthrene as the sole carbon source from PAH contaminated soil in Wisconsin. Its full genome sequence was determined to gain insights into a mechanisms underlying biodegradation of PAH. Three genomic libraries were constructed and sequenced: an Illumina GAii shotgun library (916,416,493 reads), a 454 Titanium standard library (770,171 reads) and one paired-end 454 library (average insert size of 8 kb, 508,092 reads). The initial assembly contained 40 contigs inmore » two scaffolds. The 454 Titanium standard data and the 454 paired end data were assembled together and the consensus sequences were computationally shredded into 2 kb overlapping shreds. Illumina sequencing data was assembled, and the consensus sequence was computationally shredded into 1.5 kb overlapping shreds. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR primer walks. A total of 182 additional reactions were needed to close gaps and to raise the quality of the finished sequence. The final assembly is based on 253.3 Mb of 454 draft data (averaging 38.4 X coverage) and 590.2 Mb of Illumina draft data (averaging 89.4 X coverage). The genome of strain Cs1-4 consists of a single circular chromosome of 6,685,842 bp (66.7 %G+C) containing 6,028 predicted genes; 5,931 of these genes were protein-encoding and 4,425 gene products were assigned to a putative function. Genes encoding phenanthrene degradation were localized to a 232 kb genomic island (termed the phn island), which contained near its 3’ end a bacteriophage P4-like integrase, an enzyme often associated with chromosomal integration of mobile genetic elements. Other biodegradation pathways reconstructed from the genome sequence included: benzoate (by the acetyl-CoA pathway), styrene, nicotinic acid (by the maleamate pathway) and the pesticides Dicamba and Fenitrothion. Lastly, determination of the complete genome sequence of D. acidovorans Cs1-4 has provided new insights the microbial mechanisms of PAH biodegradation that may shape the process in the environment.« less
Complete genome sequence of the phenanthrene-degrading soil bacterium Delftia acidovorans Cs1-4

DOE PAGES

Shetty, Ameesha R.; de Gannes, Vidya; Obi, Chioma C.; ...

2015-08-15

Polycyclic aromatic hydrocarbons (PAH) are ubiquitous environmental pollutants and microbial biodegradation is an important means of remediation of PAH-contaminated soil. Delftia acidovorans Cs1-4 (formerly Delftia sp. Cs1-4) was isolated by using phenanthrene as the sole carbon source from PAH contaminated soil in Wisconsin. Its full genome sequence was determined to gain insights into a mechanisms underlying biodegradation of PAH. Three genomic libraries were constructed and sequenced: an Illumina GAii shotgun library (916,416,493 reads), a 454 Titanium standard library (770,171 reads) and one paired-end 454 library (average insert size of 8 kb, 508,092 reads). The initial assembly contained 40 contigs inmore » two scaffolds. The 454 Titanium standard data and the 454 paired end data were assembled together and the consensus sequences were computationally shredded into 2 kb overlapping shreds. Illumina sequencing data was assembled, and the consensus sequence was computationally shredded into 1.5 kb overlapping shreds. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR primer walks. A total of 182 additional reactions were needed to close gaps and to raise the quality of the finished sequence. The final assembly is based on 253.3 Mb of 454 draft data (averaging 38.4 X coverage) and 590.2 Mb of Illumina draft data (averaging 89.4 X coverage). The genome of strain Cs1-4 consists of a single circular chromosome of 6,685,842 bp (66.7 %G+C) containing 6,028 predicted genes; 5,931 of these genes were protein-encoding and 4,425 gene products were assigned to a putative function. Genes encoding phenanthrene degradation were localized to a 232 kb genomic island (termed the phn island), which contained near its 3’ end a bacteriophage P4-like integrase, an enzyme often associated with chromosomal integration of mobile genetic elements. Other biodegradation pathways reconstructed from the genome sequence included: benzoate (by the acetyl-CoA pathway), styrene, nicotinic acid (by the maleamate pathway) and the pesticides Dicamba and Fenitrothion. Lastly, determination of the complete genome sequence of D. acidovorans Cs1-4 has provided new insights the microbial mechanisms of PAH biodegradation that may shape the process in the environment.« less
VCGDB: a dynamic genome database of the Chinese population

PubMed Central

2014-01-01

Background The data released by the 1000 Genomes Project contain an increasing number of genome sequences from different nations and populations with a large number of genetic variations. As a result, the focus of human genome studies is changing from single and static to complex and dynamic. The currently available human reference genome (GRCh37) is based on sequencing data from 13 anonymous Caucasian volunteers, which might limit the scope of genomics, transcriptomics, epigenetics, and genome wide association studies. Description We used the massive amount of sequencing data published by the 1000 Genomes Project Consortium to construct the Virtual Chinese Genome Database (VCGDB), a dynamic genome database of the Chinese population based on the whole genome sequencing data of 194 individuals. VCGDB provides dynamic genomic information, which contains 35 million single nucleotide variations (SNVs), 0.5 million insertions/deletions (indels), and 29 million rare variations, together with genomic annotation information. VCGDB also provides a highly interactive user-friendly virtual Chinese genome browser (VCGBrowser) with functions like seamless zooming and real-time searching. In addition, we have established three population-specific consensus Chinese reference genomes that are compatible with mainstream alignment software. Conclusions VCGDB offers a feasible strategy for processing big data to keep pace with the biological data explosion by providing a robust resource for genomics studies; in particular, studies aimed at finding regions of the genome associated with diseases. PMID:24708222
The short interspersed repetitive element of Trypanosoma cruzi, SIRE, is part of VIPER, an unusual retroelement related to long terminal repeat retrotransposons

PubMed Central

Vázquez, Martín; Ben-Dov, Claudia; Lorenzi, Hernan; Moore, Troy; Schijman, Alejandro; Levin, Mariano J.

2000-01-01

The short interspersed repetitive element (SIRE) of Trypanosoma cruzi was first detected when comparing the sequences of loci that encode the TcP2β genes. It is present in about 1,500–3,000 copies per genome, depending on the strain, and it is distributed in all chromosomes. An initial analysis of SIRE sequences from 21 genomic fragments allowed us to derive a consensus nucleotide sequence and structure for the element, consisting of three regions (I, II, and III) each harboring distinctive features. Analysis of 158 transcribed SIREs demonstrates that the consensus is highly conserved. The sequences of 51 cDNAs show that SIRE is included in the 3′ end of several mRNAs, always transcribed from the sense strand, contributing the polyadenylation site in 63% of the cases. This study led to the characterization of VIPER (vestigial interposed retroelement), a 2,326-bp-long unusual retroelement. VIPER's 5′ end is formed by the first 182 bp of SIRE, whereas its 3′ end is formed by the last 220 bp of the element. Both SIRE moieties are connected by a 1,924-bp-long fragment that carries a unique ORF encoding a complete reverse transcriptase-RNase H gene whose 15 C-terminal amino acids derive from codons specified by SIRE's region II. The amino acid sequence of VIPER's reverse transcriptase-RNase H shares significant homology to that of long terminal repeat retrotransposons. The fact that SIRE and VIPER sequences are found only in the T. cruzi genome may be of relevance for studies concerning the evolution and the genome flexibility of this protozoan parasite. PMID:10688909
A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs.

PubMed

Swain, Martin T; Tsai, Isheng J; Assefa, Samual A; Newbold, Chris; Berriman, Matthew; Otto, Thomas D

2012-06-07

Genome projects now produce draft assemblies within weeks owing to advanced high-throughput sequencing technologies. For milestone projects such as Escherichia coli or Homo sapiens, teams of scientists were employed to manually curate and finish these genomes to a high standard. Nowadays, this is not feasible for most projects, and the quality of genomes is generally of a much lower standard. This protocol describes software (PAGIT) that is used to improve the quality of draft genomes. It offers flexible functionality to close gaps in scaffolds, correct base errors in the consensus sequence and exploit reference genomes (if available) in order to improve scaffolding and generating annotations. The protocol is most accessible for bacterial and small eukaryotic genomes (up to 300 Mb), such as pathogenic bacteria, malaria and parasitic worms. Applying PAGIT to an E. coli assembly takes ∼24 h: it doubles the average contig size and annotates over 4,300 gene models.
Randomization and In Vivo Selection Reveal a GGRG Motif Essential for Packaging Human Immunodeficiency Virus Type 2 RNA ▿ †

PubMed Central

Baig, Tayyba T.; Lanchy, Jean-Marc; Lodmell, J. Stephen

2009-01-01

The packaging signal (ψ) of human immunodeficiency virus type 2 (HIV-2) is present in the 5′ noncoding region of RNA and contains a 10-nucleotide palindrome (pal; 5′-392-GGAGUGCUCC) located upstream of the dimerization signal stem-loop 1 (SL1). pal has been shown to be functionally important in vitro and in vivo. We previously showed that the 3′ side of pal (GCUCC-3′) is involved in base-pairing interactions with a sequence downstream of SL1 to make an extended SL1, which is important for replication in vivo and the regulation of dimerization in vitro. However, the role of the 5′ side of pal (5′-GGAGU) was less clear. Here, we characterized this role using an in vivo SELEX approach. We produced a population of HIV-2 DNA genomes with random sequences within the 5′ side of pal and transfected these into COS-7 cells. Viruses from COS-7 cells were used to infect C8166 permissive cells. After several weeks of serial passage in C8166 cells, surviving viruses were sequenced. On the 5′ side of pal there was a striking convergence toward a GGRGN consensus sequence. Individual clones with consensus and nonconsensus sequences were tested in infectivity and packaging assays. Analysis of individuals that diverged from the consensus sequence showed normal viral RNA and protein synthesis but had replication defects and impaired RNA packaging. These findings clearly indicate that the GGRG motif is essential for viral replication and genomic RNA packaging. PMID:18971263
Augmenting Chinese hamster genome assembly by identifying regions of high confidence.

PubMed

Vishwanathan, Nandita; Bandyopadhyay, Arpan A; Fu, Hsu-Yuan; Sharma, Mohit; Johnson, Kathryn C; Mudge, Joann; Ramaraj, Thiruvarangan; Onsongo, Getiria; Silverstein, Kevin A T; Jacob, Nitya M; Le, Huong; Karypis, George; Hu, Wei-Shou

2016-09-01

Chinese hamster Ovary (CHO) cell lines are the dominant industrial workhorses for therapeutic recombinant protein production. The availability of genome sequence of Chinese hamster and CHO cells will spur further genome and RNA sequencing of producing cell lines. However, the mammalian genomes assembled using shot-gun sequencing data still contain regions of uncertain quality due to assembly errors. Identifying high confidence regions in the assembled genome will facilitate its use for cell engineering and genome engineering. We assembled two independent drafts of Chinese hamster genome by de novo assembly from shotgun sequencing reads and by re-scaffolding and gap-filling the draft genome from NCBI for improved scaffold lengths and gap fractions. We then used the two independent assemblies to identify high confidence regions using two different approaches. First, the two independent assemblies were compared at the sequence level to identify their consensus regions as "high confidence regions" which accounts for at least 78 % of the assembled genome. Further, a genome wide comparison of the Chinese hamster scaffolds with mouse chromosomes revealed scaffolds with large blocks of collinearity, which were also compiled as high-quality scaffolds. Genome scale collinearity was complemented with EST based synteny which also revealed conserved gene order compared to mouse. As cell line sequencing becomes more commonly practiced, the approaches reported here are useful for assessing the quality of assembly and potentially facilitate the engineering of cell lines. Copyright © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Easy and accurate reconstruction of whole HIV genomes from short-read sequence data with shiver.

PubMed

Wymant, Chris; Blanquart, François; Golubchik, Tanya; Gall, Astrid; Bakker, Margreet; Bezemer, Daniela; Croucher, Nicholas J; Hall, Matthew; Hillebregt, Mariska; Ong, Swee Hoe; Ratmann, Oliver; Albert, Jan; Bannert, Norbert; Fellay, Jacques; Fransen, Katrien; Gourlay, Annabelle; Grabowski, M Kate; Gunsenheimer-Bartmeyer, Barbara; Günthard, Huldrych F; Kivelä, Pia; Kouyos, Roger; Laeyendecker, Oliver; Liitsola, Kirsi; Meyer, Laurence; Porter, Kholoud; Ristola, Matti; van Sighem, Ard; Berkhout, Ben; Cornelissen, Marion; Kellam, Paul; Reiss, Peter; Fraser, Christophe

2018-01-01

Studying the evolution of viruses and their molecular epidemiology relies on accurate viral sequence data, so that small differences between similar viruses can be meaningfully interpreted. Despite its higher throughput and more detailed minority variant data, next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of large between- and within-host diversity, including frequent indels, may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by aligning the reads to themselves, producing a set of sequences called contigs. However contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems we developed the tool shiver to pre-process reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with the user's choice of existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We used shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read whole-genome data produced with the Illumina platform, for sixty-five existing publicly available samples and fifty new samples. We show the systematic superiority of mapping to shiver's constructed reference compared with mapping the same reads to the closest of 3,249 real references: median values of 13 bases called differently and more accurately, 0 bases called differently and less accurately, and 205 bases of missing sequence recovered. We also successfully applied shiver to whole-genome samples of Hepatitis C Virus and Respiratory Syncytial Virus. shiver is publicly available from https://github.com/ChrisHIV/shiver.
de novo assembly and population genomic survey of natural yeast isolates with the Oxford Nanopore MinION sequencer.

PubMed

Istace, Benjamin; Friedrich, Anne; d'Agata, Léo; Faye, Sébastien; Payen, Emilie; Beluche, Odette; Caradec, Claudia; Davidas, Sabrina; Cruaud, Corinne; Liti, Gianni; Lemainque, Arnaud; Engelen, Stefan; Wincker, Patrick; Schacherer, Joseph; Aury, Jean-Marc

2017-02-01

Oxford Nanopore Technologies Ltd (Oxford, UK) have recently commercialized MinION, a small single-molecule nanopore sequencer, that offers the possibility of sequencing long DNA fragments from small genomes in a matter of seconds. The Oxford Nanopore technology is truly disruptive; it has the potential to revolutionize genomic applications due to its portability, low cost, and ease of use compared with existing long reads sequencing technologies. The MinION sequencer enables the rapid sequencing of small eukaryotic genomes, such as the yeast genome. Combined with existing assembler algorithms, near complete genome assemblies can be generated and comprehensive population genomic analyses can be performed. Here, we resequenced the genome of the Saccharomyces cerevisiae S288C strain to evaluate the performance of nanopore-only assemblers. Then we de novo sequenced and assembled the genomes of 21 isolates representative of the S. cerevisiae genetic diversity using the MinION platform. The contiguity of our assemblies was 14 times higher than the Illumina-only assemblies and we obtained one or two long contigs for 65 % of the chromosomes. This high contiguity allowed us to accurately detect large structural variations across the 21 studied genomes. Because of the high completeness of the nanopore assemblies, we were able to produce a complete cartography of transposable elements insertions and inspect structural variants that are generally missed using a short-read sequencing strategy. Our analyses show that the Oxford Nanopore technology is already usable for de novo sequencing and assembly; however, non-random errors in homopolymers require polishing the consensus using an alternate sequencing technology. © The Author 2017. Published by Oxford University Press.
de novo assembly and population genomic survey of natural yeast isolates with the Oxford Nanopore MinION sequencer

PubMed Central

Istace, Benjamin; Friedrich, Anne; d'Agata, Léo; Faye, Sébastien; Payen, Emilie; Beluche, Odette; Caradec, Claudia; Davidas, Sabrina; Cruaud, Corinne; Liti, Gianni; Lemainque, Arnaud; Engelen, Stefan; Wincker, Patrick; Schacherer, Joseph

2017-01-01

Abstract Background: Oxford Nanopore Technologies Ltd (Oxford, UK) have recently commercialized MinION, a small single-molecule nanopore sequencer, that offers the possibility of sequencing long DNA fragments from small genomes in a matter of seconds. The Oxford Nanopore technology is truly disruptive; it has the potential to revolutionize genomic applications due to its portability, low cost, and ease of use compared with existing long reads sequencing technologies. The MinION sequencer enables the rapid sequencing of small eukaryotic genomes, such as the yeast genome. Combined with existing assembler algorithms, near complete genome assemblies can be generated and comprehensive population genomic analyses can be performed. Results: Here, we resequenced the genome of the Saccharomyces cerevisiae S288C strain to evaluate the performance of nanopore-only assemblers. Then we de novo sequenced and assembled the genomes of 21 isolates representative of the S. cerevisiae genetic diversity using the MinION platform. The contiguity of our assemblies was 14 times higher than the Illumina-only assemblies and we obtained one or two long contigs for 65 % of the chromosomes. This high contiguity allowed us to accurately detect large structural variations across the 21 studied genomes. Conclusion: Because of the high completeness of the nanopore assemblies, we were able to produce a complete cartography of transposable elements insertions and inspect structural variants that are generally missed using a short-read sequencing strategy. Our analyses show that the Oxford Nanopore technology is already usable for de novo sequencing and assembly; however, non-random errors in homopolymers require polishing the consensus using an alternate sequencing technology. PMID:28369459
Insights Into Upland Cotton (Gossypium hirsutum L.) Genetic Recombination Based on 3 High-Density Single-Nucleotide Polymorphism and a Consensus Map Developed Independently With Common Parents.

PubMed

Ulloa, Mauricio; Hulse-Kemp, Amanda M; De Santiago, Luis M; Stelly, David M; Burke, John J

2017-01-01

High-density linkage maps are vital to supporting the correct placement of scaffolds and gene sequences on chromosomes and fundamental to contemporary organismal research and scientific approaches to genetic improvement, especially in paleopolyploids with exceptionally complex genomes, eg, upland cotton ( Gossypium hirsutum L., "2n = 52"). Three independently developed intraspecific upland mapping populations were analyzed to generate 3 high-density genetic linkage single-nucleotide polymorphism (SNP) maps and a consensus map using the CottonSNP63K array. The populations consisted of a previously reported F 2 , a recombinant inbred line (RIL), and reciprocal RIL population, from "Phytogen 72" and "Stoneville 474" cultivars. The cluster file provided 7417 genotyped SNP markers, resulting in 26 linkage groups corresponding to the 26 chromosomes (c) of the allotetraploid upland cotton (AD) 1 arisen from the merging of 2 genomes ("A" Old World and "D" New World). Patterns of chromosome-specific recombination were largely consistent across mapping populations. The high-density genetic consensus map included 7244 SNP markers that spanned 3538 cM and comprised 3824 SNP bins, of which 1783 and 2041 were in the A t and D t subgenomes with 1825 and 1713 cM map lengths, respectively. Subgenome average distances were nearly identical, indicating that subgenomic differences in bin number arose due to the high numbers of SNPs on the D t subgenome. Examination of expected recombination frequency or crossovers (COs) on the chromosomes within each population of the 2 subgenomes revealed that COs were also not affected by the SNPs or SNP bin number in these subgenomes. Comparative alignment analyses identified historical ancestral A t -subgenomic translocations of c02 and c03, as well as of c04 and c05. The consensus map SNP sequences aligned with high congruency to the NBI assembly of Gossypium hirsutum . However, the genomic comparisons revealed evidence of additional unconfirmed possible duplications, inversions and translocations, and unbalance SNP sequence homology or SNP sequence/loci genomic dominance, or homeolog loci bias of the upland tetraploid A t and D t subgenomes. The alignments indicated that 364 SNP-associated previously unintegrated scaffolds can be placed in pseudochromosomes of the NBI G hirsutum assembly. This is the first intraspecific SNP genetic linkage consensus map assembled in G hirsutum with a core of reproducible mendelian SNP markers assayed on different populations and it provides further knowledge of chromosome arrangement of genic and nongenic SNPs. Together, the consensus map and RIL populations provide a synergistically useful platform for localizing and identifying agronomically important loci for improvement of the cotton crop.
A genomic landscape of mitochondrial DNA insertions in the pig nuclear genome provides evolutionary signatures of interspecies admixture.

PubMed

Schiavo, Giuseppina; Hoffmann, Orsolya Ivett; Ribani, Anisa; Utzeri, Valerio Joe; Ghionda, Marco Ciro; Bertolini, Francesca; Geraci, Claudia; Bovo, Samuele; Fontanesi, Luca

2017-10-01

Nuclear DNA sequences of mitochondrial origin (numts) are derived by insertion of mitochondrial DNA (mtDNA), into the nuclear genome. In this study, we provide, for the first time, a genome picture of numts inserted in the pig nuclear genome. The Sus scrofa reference nuclear genome (Sscrofa10.2) was aligned with circularized and consensus mtDNA sequences using LAST software. A total of 430 numt sequences that may represent 246 different numt integration events (57 numt regions determined by at least two numt sequences and 189 singletons) were identified, covering about 0.0078% of the nuclear genome. Numt integration events were correlated (0.99) to the chromosome length. The longest numt sequence (about 11 kbp) was located on SSC2. Six numts were sequenced and PCR amplified in pigs of European commercial and local pig breeds, of the Chinese Meishan breed and in European wild boars. Three of them were polymorphic for the presence or absence of the insertion. Surprisingly, the estimated age of insertion of two of the three polymorphic numts was more ancient than that of the speciation time of the Sus scrofa, supporting that these polymorphic sites were originated from interspecies admixture that contributed to shape the pig genome. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Barley whole exome capture: a tool for genomic research in the genus Hordeum and beyond

PubMed Central

Mascher, Martin; Richmond, Todd A; Gerhardt, Daniel J; Himmelbach, Axel; Clissold, Leah; Sampath, Dharanya; Ayling, Sarah; Steuernagel, Burkhard; Pfeifer, Matthias; D'Ascenzo, Mark; Akhunov, Eduard D; Hedley, Pete E; Gonzales, Ana M; Morrell, Peter L; Kilian, Benjamin; Blattner, Frank R; Scholz, Uwe; Mayer, Klaus FX; Flavell, Andrew J; Muehlbauer, Gary J; Waugh, Robbie; Jeddeloh, Jeffrey A; Stein, Nils

2013-01-01

Advanced resources for genome-assisted research in barley (Hordeum vulgare) including a whole-genome shotgun assembly and an integrated physical map have recently become available. These have made possible studies that aim to assess genetic diversity or to isolate single genes by whole-genome resequencing and in silico variant detection. However such an approach remains expensive given the 5 Gb size of the barley genome. Targeted sequencing of the mRNA-coding exome reduces barley genomic complexity more than 50-fold, thus dramatically reducing this heavy sequencing and analysis load. We have developed and employed an in-solution hybridization-based sequence capture platform to selectively enrich for a 61.6 megabase coding sequence target that includes predicted genes from the genome assembly of the cultivar Morex as well as publicly available full-length cDNAs and de novo assembled RNA-Seq consensus sequence contigs. The platform provides a highly specific capture with substantial and reproducible enrichment of targeted exons, both for cultivated barley and related species. We show that this exome capture platform provides a clear path towards a broader and deeper understanding of the natural variation residing in the mRNA-coding part of the barley genome and will thus constitute a valuable resource for applications such as mapping-by-sequencing and genetic diversity analyzes. PMID:23889683
A consensus genetic map of cowpea [Vigna unguiculata (L) Walp.] and synteny based on EST-derived SNPs.

PubMed

Muchero, Wellington; Diop, Ndeye N; Bhat, Prasanna R; Fenton, Raymond D; Wanamaker, Steve; Pottorff, Marti; Hearne, Sarah; Cisse, Ndiaga; Fatokun, Christian; Ehlers, Jeffrey D; Roberts, Philip A; Close, Timothy J

2009-10-27

Consensus genetic linkage maps provide a genomic framework for quantitative trait loci identification, map-based cloning, assessment of genetic diversity, association mapping, and applied breeding in marker-assisted selection schemes. Among "orphan crops" with limited genomic resources such as cowpea [Vigna unguiculata (L.) Walp.] (2n = 2x = 22), the use of transcript-derived SNPs in genetic maps provides opportunities for automated genotyping and estimation of genome structure based on synteny analysis. Here, we report the development and validation of a high-throughput EST-derived SNP assay for cowpea, its application in consensus map building, and determination of synteny to reference genomes. SNP mining from 183,118 ESTs sequenced from 17 cDNA libraries yielded approximately 10,000 high-confidence SNPs from which an Illumina 1,536-SNP GoldenGate genotyping array was developed and applied to 741 recombinant inbred lines from six mapping populations. Approximately 90% of the SNPs were technically successful, providing 1,375 dependable markers. Of these, 928 were incorporated into a consensus genetic map spanning 680 cM with 11 linkage groups and an average marker distance of 0.73 cM. Comparison of this cowpea genetic map to reference legumes, soybean (Glycine max) and Medicago truncatula, revealed extensive macrosynteny encompassing 85 and 82%, respectively, of the cowpea map. Regions of soybean genome duplication were evident relative to the simpler diploid cowpea. Comparison with Arabidopsis revealed extensive genomic rearrangement with some conserved microsynteny. These results support evolutionary closeness between cowpea and soybean and identify regions for synteny-based functional genomics studies in legumes.

Distribution and sequence homogeneity of an abundant satellite DNA in the beetle, Tenebrio molitor.

PubMed Central

Davis, C A; Wyatt, G R

1989-01-01

The mealworm beetle, Tenebrio molitor, contains an unusually abundant and homogeneous satellite DNA which constitutes up to 60% of its genome. The satellite DNA is shown to be present in all of the chromosomes by in situ hybridization. 18 dimers of the repeat unit were cloned and sequenced. The consensus sequence is 142 nt long and lacks any internal repeat structure. Monomers of the sequence are very similar, showing on average a 2% divergence from the calculated consensus. Variant nucleotides are scattered randomly throughout the sequence although some variants are more common than others. Neighboring repeat units are no more alike than randomly chosen ones. The results suggest that some mechanism, perhaps gene conversion, is acting to maintain the homogeneity of the satellite DNA despite its abundance and distribution on all of the chromosomes. Images PMID:2762148
Best practices for mapping replication origins in eukaryotic chromosomes.

PubMed

Besnard, Emilie; Desprat, Romain; Ryan, Michael; Kahli, Malik; Aladjem, Mirit I; Lemaitre, Jean-Marc

2014-09-02

Understanding the regulatory principles ensuring complete DNA replication in each cell division is critical for deciphering the mechanisms that maintain genomic stability. Recent advances in genome sequencing technology facilitated complete mapping of DNA replication sites and helped move the field from observing replication patterns at a handful of single loci to analyzing replication patterns genome-wide. These advances address issues, such as the relationship between replication initiation events, transcription, and chromatin modifications, and identify potential replication origin consensus sequences. This unit summarizes the technological and fundamental aspects of replication profiling and briefly discusses novel insights emerging from mining large datasets, published in the last 3 years, and also describes DNA replication dynamics on a whole-genome scale. Copyright © 2014 John Wiley & Sons, Inc.
Presence of a consensus DNA motif at nearby DNA sequence of the mutation susceptible CG nucleotides.

PubMed

Chowdhury, Kaushik; Kumar, Suresh; Sharma, Tanu; Sharma, Ankit; Bhagat, Meenakshi; Kamai, Asangla; Ford, Bridget M; Asthana, Shailendra; Mandal, Chandi C

2018-01-10

Complexity in tissues affected by cancer arises from somatic mutations and epigenetic modifications in the genome. The mutation susceptible hotspots present within the genome indicate a non-random nature and/or a position specific selection of mutation. An association exists between the occurrence of mutations and epigenetic DNA methylation. This study is primarily aimed at determining mutation status, and identifying a signature for predicting mutation prone zones of tumor suppressor (TS) genes. Nearby sequences from the top five positions having a higher mutation frequency in each gene of 42 TS genes were selected from a cosmic database and were considered as mutation prone zones. The conserved motifs present in the mutation prone DNA fragments were identified. Molecular docking studies were done to determine putative interactions between the identified conserved motifs and enzyme methyltransferase DNMT1. Collective analysis of 42 TS genes found GC as the most commonly replaced and AT as the most commonly formed residues after mutation. Analysis of the top 5 mutated positions of each gene (210 DNA segments for 42 TS genes) identified that CG nucleotides of the amino acid codons (e.g., Arginine) are most susceptible to mutation, and found a consensus DNA "T/AGC/GAGGA/TG" sequence present in these mutation prone DNA segments. Similar to TS genes, analysis of 54 oncogenes not only found CG nucleotides of the amino acid Arg as the most susceptible to mutation, but also identified the presence of similar consensus DNA motifs in the mutation prone DNA fragments (270 DNA segments for 54 oncogenes) of oncogenes. Docking studies depicted that, upon binding of DNMT1 methylates to this consensus DNA motif (C residues of CpG islands), mutation was likely to occur. Thus, this study proposes that DNMT1 mediated methylation in chromosomal DNA may decrease if a foreign DNA segment containing this consensus sequence along with CG nucleotides is exogenously introduced to dividing cancer cells. Copyright © 2017 Elsevier B.V. All rights reserved.
Control of artefactual variation in reported inter-sample relatedness during clinical use of a Mycobacterium tuberculosis sequencing pipeline.

PubMed

Wyllie, David H; Sanderson, Nicholas; Myers, Richard; Peto, Tim; Robinson, Esther; Crook, Derrick W; Smith, E Grace; Walker, A Sarah

2018-06-06

Contact tracing requires reliable identification of closely related bacterial isolates. When we noticed the reporting of artefactual variation between M. tuberculosis isolates during routine next generation sequencing of Mycobacterium spp, we investigated its basis in 2,018 consecutive M. tuberculosis isolates. In the routine process used, clinical samples were decontaminated and inoculated into broth cultures; from positive broth cultures DNA was extracted, sequenced, reads mapped, and consensus sequences determined. We investigated the process of consensus sequence determination, which selects the most common nucleotide at each position. Having determined the high-quality read depth and depth of minor variants across 8,006 M. tuberculosis genomic regions, we quantified the relationship between the minor variant depth and the amount of non-Mycobacterial bacterial DNA, which originates from commensal microbes killed during sample decontamination. In the presence of non-Mycobacterial bacterial DNA, we found significant increases in minor variant frequencies of more than 1.5 fold in 242 regions covering 5.1% of the M. tuberculosis genome. Included within these were four high variation regions strongly influenced by the amount of non-Mycobacterial bacterial DNA. Excluding these four regions from pairwise distance comparisons reduced biologically implausible variation from 5.2% to 0% in an independent validation set derived from 226 individuals. Thus, we have demonstrated an approach identifying critical genomic regions contributing to clinically relevant artefactual variation in bacterial similarity searches. The approach described monitors the outputs of the complex multi-step laboratory and bioinformatics process, allows periodic process adjustments, and will have application to quality control of routine bacterial genomics. Copyright © 2018 Wyllie et al.
Identification and Resolution of Microdiversity through Metagenomic Sequencing of Parallel Consortia

PubMed Central

Maezato, Yukari; Wu, Yu-Wei; Romine, Margaret F.; Lindemann, Stephen R.

2015-01-01

To gain a predictive understanding of the interspecies interactions within microbial communities that govern community function, the genomic complement of every member population must be determined. Although metagenomic sequencing has enabled the de novo reconstruction of some microbial genomes from environmental communities, microdiversity confounds current genome reconstruction techniques. To overcome this issue, we performed short-read metagenomic sequencing on parallel consortia, defined as consortia cultivated under the same conditions from the same natural community with overlapping species composition. The differences in species abundance between the two consortia allowed reconstruction of near-complete (at an estimated >85% of gene complement) genome sequences for 17 of the 20 detected member species. Two Halomonas spp. indistinguishable by amplicon analysis were found to be present within the community. In addition, comparison of metagenomic reads against the consensus scaffolds revealed within-species variation for one of the Halomonas populations, one of the Rhodobacteraceae populations, and the Rhizobiales population. Genomic comparison of these representative instances of inter- and intraspecies microdiversity suggests differences in functional potential that may result in the expression of distinct roles in the community. In addition, isolation and complete genome sequence determination of six member species allowed an investigation into the sensitivity and specificity of genome reconstruction processes, demonstrating robustness across a wide range of sequence coverage (9× to 2,700×) within the metagenomic data set. PMID:26497460
Identification and Resolution of Microdiversity through Metagenomic Sequencing of Parallel Consortia

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nelson, William C.; Maezato, Yukari; Wu, Yu-Wei

2015-10-23

To gain a predictive understanding of the interspecies interactions within microbial communities that govern community function, the genomic complement of every member population must be determined. Although metagenomic sequencing has enabled thede novoreconstruction of some microbial genomes from environmental communities, microdiversity confounds current genome reconstruction techniques. To overcome this issue, we performed short-read metagenomic sequencing on parallel consortia, defined as consortia cultivated under the same conditions from the same natural community with overlapping species composition. The differences in species abundance between the two consortia allowed reconstruction of near-complete (at an estimated >85% of gene complement) genome sequences for 17 ofmore » the 20 detected member species. TwoHalomonasspp. indistinguishable by amplicon analysis were found to be present within the community. In addition, comparison of metagenomic reads against the consensus scaffolds revealed within-species variation for one of theHalomonaspopulations, one of theRhodobacteraceaepopulations, and theRhizobialespopulation. Genomic comparison of these representative instances of inter- and intraspecies microdiversity suggests differences in functional potential that may result in the expression of distinct roles in the community. In addition, isolation and complete genome sequence determination of six member species allowed an investigation into the sensitivity and specificity of genome reconstruction processes, demonstrating robustness across a wide range of sequence coverage (9× to 2,700×) within the metagenomic data set.« less
DArT Markers Effectively Target Gene Space in the Rye Genome

PubMed Central

Gawroński, Piotr; Pawełkowicz, Magdalena; Tofil, Katarzyna; Uszyński, Grzegorz; Sharifova, Saida; Ahluwalia, Shivaksh; Tyrka, Mirosław; Wędzony, Maria; Kilian, Andrzej; Bolibok-Brągoszewska, Hanna

2016-01-01

Large genome size and complexity hamper considerably the genomics research in relevant species. Rye (Secale cereale L.) has one of the largest genomes among cereal crops and repetitive sequences account for over 90% of its length. Diversity Arrays Technology is a high-throughput genotyping method, in which a preferential sampling of gene-rich regions is achieved through the use of methylation sensitive restriction enzymes. We obtained sequences of 6,177 rye DArT markers and following a redundancy analysis assembled them into 3,737 non-redundant sequences, which were then used in homology searches against five Pooideae sequence sets. In total 515 DArT sequences could be incorporated into publicly available rye genome zippers providing a starting point for the integration of DArT- and transcript-based genomics resources in rye. Using Blast2Go pipeline we attributed putative gene functions to 1101 (29.4%) of the non-redundant DArT marker sequences, including 132 sequences with putative disease resistance-related functions, which were found to be preferentially located in the 4RL and 6RL chromosomes. Comparative analysis based on the DArT sequences revealed obvious inconsistencies between two recently published high density consensus maps of rye. Furthermore we demonstrated that DArT marker sequences can be a source of SSR polymorphisms. Obtained data demonstrate that DArT markers effectively target gene space in the large, complex, and repetitive rye genome. Through the annotation of putative gene functions and the alignment of DArT sequences relative to reference genomes we obtained information, that will complement the results of the studies, where DArT genotyping was deployed, by simplifying the gene ontology and microcolinearity based identification of candidate genes. PMID:27833625
DArT Markers Effectively Target Gene Space in the Rye Genome.

PubMed

Gawroński, Piotr; Pawełkowicz, Magdalena; Tofil, Katarzyna; Uszyński, Grzegorz; Sharifova, Saida; Ahluwalia, Shivaksh; Tyrka, Mirosław; Wędzony, Maria; Kilian, Andrzej; Bolibok-Brągoszewska, Hanna

2016-01-01

Large genome size and complexity hamper considerably the genomics research in relevant species. Rye ( Secale cereale L.) has one of the largest genomes among cereal crops and repetitive sequences account for over 90% of its length. Diversity Arrays Technology is a high-throughput genotyping method, in which a preferential sampling of gene-rich regions is achieved through the use of methylation sensitive restriction enzymes. We obtained sequences of 6,177 rye DArT markers and following a redundancy analysis assembled them into 3,737 non-redundant sequences, which were then used in homology searches against five Pooideae sequence sets. In total 515 DArT sequences could be incorporated into publicly available rye genome zippers providing a starting point for the integration of DArT- and transcript-based genomics resources in rye. Using Blast2Go pipeline we attributed putative gene functions to 1101 (29.4%) of the non-redundant DArT marker sequences, including 132 sequences with putative disease resistance-related functions, which were found to be preferentially located in the 4RL and 6RL chromosomes. Comparative analysis based on the DArT sequences revealed obvious inconsistencies between two recently published high density consensus maps of rye. Furthermore we demonstrated that DArT marker sequences can be a source of SSR polymorphisms. Obtained data demonstrate that DArT markers effectively target gene space in the large, complex, and repetitive rye genome. Through the annotation of putative gene functions and the alignment of DArT sequences relative to reference genomes we obtained information, that will complement the results of the studies, where DArT genotyping was deployed, by simplifying the gene ontology and microcolinearity based identification of candidate genes.
Rapid hybrid de novo assembly of a microbial genome using only short reads: Corynebacterium pseudotuberculosis I19 as a case study.

PubMed

Cerdeira, Louise Teixeira; Carneiro, Adriana Ribeiro; Ramos, Rommel Thiago Jucá; de Almeida, Sintia Silva; D'Afonseca, Vivian; Schneider, Maria Paula Cruz; Baumbach, Jan; Tauch, Andreas; McCulloch, John Anthony; Azevedo, Vasco Ariston Carvalho; Silva, Artur

2011-08-01

Due to the advent of the so-called Next-Generation Sequencing (NGS) technologies the amount of monetary and temporal resources for whole-genome sequencing has been reduced by several orders of magnitude. Sequence reads can be assembled either by anchoring them directly onto an available reference genome (classical reference assembly), or can be concatenated by overlap (de novo assembly). The latter strategy is preferable because it tends to maintain the architecture of the genome sequence the however, depending on the NGS platform used, the shortness of read lengths cause tremendous problems the in the subsequent genome assembly phase, impeding closing of the entire genome sequence. To address the problem, we developed a multi-pronged hybrid de novo strategy combining De Bruijn graph and Overlap-Layout-Consensus methods, which was used to assemble from short reads the entire genome of Corynebacterium pseudotuberculosis strain I19, a bacterium with immense importance in veterinary medicine that causes Caseous Lymphadenitis in ruminants, principally ovines and caprines. Briefly, contigs were assembled de novo from the short reads and were only oriented using a reference genome by anchoring. Remaining gaps were closed using iterative anchoring of short reads by craning to gap flanks. Finally, we compare the genome sequence assembled using our hybrid strategy to a classical reference assembly using the same data as input and show that with the availability of a reference genome, it pays off to use the hybrid de novo strategy, rather than a classical reference assembly, because more genome sequences are preserved using the former. Copyright © 2011 Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Lapidus, Alla L.

From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly ofmore » whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality ({approx}1 error/10,000 bp), validated through a number of computer and laboratory experiments.« less
Improvement in Protein Domain Identification Is Reached by Breaking Consensus, with the Agreement of Many Profiles and Domain Co-occurrence

PubMed Central

Bernardes, Juliana; Zaverucha, Gerson; Vaquero, Catherine; Carbone, Alessandra

2016-01-01

Traditional protein annotation methods describe known domains with probabilistic models representing consensus among homologous domain sequences. However, when relevant signals become too weak to be identified by a global consensus, attempts for annotation fail. Here we address the fundamental question of domain identification for highly divergent proteins. By using high performance computing, we demonstrate that the limits of state-of-the-art annotation methods can be bypassed. We design a new strategy based on the observation that many structural and functional protein constraints are not globally conserved through all species but might be locally conserved in separate clades. We propose a novel exploitation of the large amount of data available: 1. for each known protein domain, several probabilistic clade-centered models are constructed from a large and differentiated panel of homologous sequences, 2. a decision-making protocol combines outcomes obtained from multiple models, 3. a multi-criteria optimization algorithm finds the most likely protein architecture. The method is evaluated for domain and architecture prediction over several datasets and statistical testing hypotheses. Its performance is compared against HMMScan and HHblits, two widely used search methods based on sequence-profile and profile-profile comparison. Due to their closeness to actual protein sequences, clade-centered models are shown to be more specific and functionally predictive than the broadly used consensus models. Based on them, we improved annotation of Plasmodium falciparum protein sequences on a scale not previously possible. We successfully predict at least one domain for 72% of P. falciparum proteins against 63% achieved previously, corresponding to 30% of improvement over the total number of Pfam domain predictions on the whole genome. The method is applicable to any genome and opens new avenues to tackle evolutionary questions such as the reconstruction of ancient domain duplications, the reconstruction of the history of protein architectures, and the estimation of protein domain age. Website and software: http://www.lcqb.upmc.fr/CLADE. PMID:27472895
Investigating intra-host and intra-herd sequence diversity of foot-and-mouth disease virus.

PubMed

King, David J; Freimanis, Graham L; Orton, Richard J; Waters, Ryan A; Haydon, Daniel T; King, Donald P

2016-10-01

Due to the poor-fidelity of the enzymes involved in RNA genome replication, foot-and-mouth disease (FMD) virus samples comprise of unique polymorphic populations. In this study, deep sequencing was utilised to characterise the diversity of FMD virus (FMDV) populations in 6 infected cattle present on a single farm during the series of outbreaks in the UK in 2007. A novel RT-PCR method was developed to amplify a 7.6kb nucleotide fragment encompassing the polyprotein coding region of the FMDV genome. Illumina sequencing of each sample identified the fine polymorphic structures at each nucleotide position, from consensus level changes to variants present at a 0.24% frequency. These data were used to investigate population dynamics of FMDV at both herd and host levels, evaluate the impact of host on the viral swarm structure and to identify transmission links with viruses recovered from other farms in the same series of outbreaks. In 7 samples, from 6 different animals, a total of 5 consensus level variants were identified, in addition to 104 sub-consensus variants of which 22 were shared between 2 or more animals. Further analysis revealed differences in swarm structures from samples derived from the same animal suggesting the presence of distinct viral populations evolving independently at different lesion sites within the same infected animal. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Genomic newborn screening: public health policy considerations and recommendations.

PubMed

Friedman, Jan M; Cornel, Martina C; Goldenberg, Aaron J; Lister, Karla J; Sénécal, Karine; Vears, Danya F

2017-02-21

The use of genome-wide (whole genome or exome) sequencing for population-based newborn screening presents an opportunity to detect and treat or prevent many more serious early-onset health conditions than is possible today. The Paediatric Task Team of the Global Alliance for Genomics and Health's Regulatory and Ethics Working Group reviewed current understanding and concerns regarding the use of genomic technologies for population-based newborn screening and developed, by consensus, eight recommendations for clinicians, clinical laboratory scientists, and policy makers. Before genome-wide sequencing can be implemented in newborn screening programs, its clinical utility and cost-effectiveness must be demonstrated, and the ability to distinguish disease-causing and benign variants of all genes screened must be established. In addition, each jurisdiction needs to resolve ethical and policy issues regarding the disclosure of incidental or secondary findings to families and ownership, appropriate storage and sharing of genomic data. The best interests of children should be the basis for all decisions regarding the implementation of genomic newborn screening.
Comprehensive Survey of Genetic Diversity in Chloroplast Genomes and 45S nrDNAs within Panax ginseng Species

PubMed Central

Kim, Kyunghee; Lee, Sang-Choon; Lee, Junki; Lee, Hyun Oh; Joh, Ho Jun; Kim, Nam-Hoon; Park, Hyun-Seung; Yang, Tae-Jin

2015-01-01

We report complete sequences of chloroplast (cp) genome and 45S nuclear ribosomal DNA (45S nrDNA) for 11 Panax ginseng cultivars. We have obtained complete sequences of cp and 45S nrDNA, the representative barcoding target sequences for cytoplasm and nuclear genome, respectively, based on low coverage NGS sequence of each cultivar. The cp genomes sizes ranged from 156,241 to 156,425 bp and the major size variation was derived from differences in copy number of tandem repeats in the ycf1 gene and in the intergenic regions of rps16-trnUUG and rpl32-trnUAG. The complete 45S nrDNA unit sequences were 11,091 bp, representing a consensus single transcriptional unit with an intergenic spacer region. Comparative analysis of these sequences as well as those previously reported for three Chinese accessions identified very rare but unique polymorphism in the cp genome within P. ginseng cultivars. There were 12 intra-species polymorphisms (six SNPs and six InDels) among 14 cultivars. We also identified five SNPs from 45S nrDNA of 11 Korean ginseng cultivars. From the 17 unique informative polymorphic sites, we developed six reliable markers for analysis of ginseng diversity and cultivar authentication. PMID:26061692
Unveiling the Hybrid Genome Structure of Escherichia coli RR1 (HB101 RecA+)

PubMed Central

Jeong, Haeyoung; Sim, Young Mi; Kim, Hyun Ju; Lee, Sang Jun

2017-01-01

There have been extensive genome sequencing studies for Escherichia coli strains, particularly for pathogenic isolates, because fast determination of pathogenic potential and/or drug resistance and their propagation routes is crucial. For laboratory E. coli strains, however, genome sequence information is limited except for several well-known strains. We determined the complete genome sequence of laboratory E. coli strain RR1 (HB101 RecA+), which has long been used as a general cloning host. A hybrid genome sequence of K-12 MG1655 and B BL21(DE3) was constructed based on the initial mapping of Illumina HiSeq reads to each reference, and iterative rounds of read mapping, variant detection, and consensus extraction were carried out. Finally, PCR and Sanger sequencing-based finishing were applied to resolve non-single nucleotide variant regions with aberrant read depths and breakpoints, most of them resulting from prophages and insertion sequence transpositions that are not present in the reference genome sequence. We found that 96.9% of the RR1 genome is derived from K-12, and identified exact crossover junctions between K-12 and B genomic fragments. However, because RR1 has experienced a series of genetic manipulations since branching from the common ancestor, it has a set of mutations different from those found in K-12 MG1655. As well as identifying all known genotypes of RR1 on the basis of genomic context, we found novel mutations. Our results extend current knowledge of the genotype of RR1 and its relatives, and provide insights into the pedigree, genomic background, and physiology of common laboratory strains. PMID:28421066
Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation

PubMed Central

Pujar, Shashikant; O’Leary, Nuala A; Farrell, Catherine M; Mudge, Jonathan M; Wallin, Craig; Diekhans, Mark; Barnes, If; Bennett, Ruth; Berry, Andrew E; Cox, Eric; Davidson, Claire; Goldfarb, Tamara; Gonzalez, Jose M; Hunt, Toby; Jackson, John; Joardar, Vinita; Kay, Mike P; Kodali, Vamsi K; McAndrews, Monica; McGarvey, Kelly M; Murphy, Michael; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Seal, Ruth L; Webb, David; Zhu, Sophia; Aken, Bronwen L; Bult, Carol J; Frankish, Adam; Pruitt, Kim D

2018-01-01

Abstract The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community. PMID:29126148
Reference-guided assembly of four diverse Arabidopsis thaliana genomes

PubMed Central

Schneeberger, Korbinian; Ossowski, Stephan; Ott, Felix; Klein, Juliane D.; Wang, Xi; Lanz, Christa; Smith, Lisa M.; Cao, Jun; Fitz, Joffrey; Warthmann, Norman; Henz, Stefan R.; Huson, Daniel H.; Weigel, Detlef

2011-01-01

We present whole-genome assemblies of four divergent Arabidopsis thaliana strains that complement the 125-Mb reference genome sequence released a decade ago. Using a newly developed reference-guided approach, we assembled large contigs from 9 to 42 Gb of Illumina short-read data from the Landsberg erecta (Ler-1), C24, Bur-0, and Kro-0 strains, which have been sequenced as part of the 1,001 Genomes Project for this species. Using alignments against the reference sequence, we first reduced the complexity of the de novo assembly and later integrated reads without similarity to the reference sequence. As an example, half of the noncentromeric C24 genome was covered by scaffolds that are longer than 260 kb, with a maximum of 2.2 Mb. Moreover, over 96% of the reference genome was covered by the reference-guided assembly, compared with only 87% with a complete de novo assembly. Comparisons with 2 Mb of dideoxy sequence reveal that the per-base error rate of the reference-guided assemblies was below 1 in 10,000. Our assemblies provide a detailed, genomewide picture of large-scale differences between A. thaliana individuals, most of which are difficult to access with alignment-consensus methods only. We demonstrate their practical relevance in studying the expression differences of polymorphic genes and show how the analysis of sRNA sequencing data can lead to erroneous conclusions if aligned against the reference genome alone. Genome assemblies, raw reads, and further information are accessible through http://1001genomes.org/projects/assemblies.html. PMID:21646520
A High Density Consensus Genetic Map of Tetraploid Cotton That Integrates Multiple Component Maps through Molecular Marker Redundancy Check

PubMed Central

Blenda, Anna; Fang, David D.; Rami, Jean-François; Garsmeur, Olivier; Luo, Feng; Lacape, Jean-Marc

2012-01-01

A consensus genetic map of tetraploid cotton was constructed using six high-density maps and after the integration of a sequence-based marker redundancy check. Public cotton SSR libraries (17,343 markers) were curated for sequence redundancy using 90% as a similarity cutoff. As a result, 20% of the markers (3,410) could be considered as redundant with some other markers. The marker redundancy information had been a crucial part of the map integration process, in which the six most informative interspecific Gossypium hirsutum×G. barbadense genetic maps were used for assembling a high density consensus (HDC) map for tetraploid cotton. With redundant markers being removed, the HDC map could be constructed thanks to the sufficient number of collinear non-redundant markers in common between the component maps. The HDC map consists of 8,254 loci, originating from 6,669 markers, and spans 4,070 cM, with an average of 2 loci per cM. The HDC map presents a high rate of locus duplications, as 1,292 markers among the 6,669 were mapped in more than one locus. Two thirds of the duplications are bridging homoeologous AT and DT chromosomes constitutive of allopolyploid cotton genome, with an average of 64 duplications per AT/DT chromosome pair. Sequences of 4,744 mapped markers were used for a mutual blast alignment (BBMH) with the 13 major scaffolds of the recently released Gossypium raimondii genome indicating high level of homology between the diploid D genome and the tetraploid cotton genetic map, with only a few minor possible structural rearrangements. Overall, the HDC map will serve as a valuable resource for trait QTL comparative mapping, map-based cloning of important genes, and better understanding of the genome structure and evolution of tetraploid cotton. PMID:23029214
A filtering method to generate high quality short reads using illumina paired-end technology.

PubMed

Eren, A Murat; Vineis, Joseph H; Morrison, Hilary G; Sogin, Mitchell L

2013-01-01

Consensus between independent reads improves the accuracy of genome and transcriptome analyses, however lack of consensus between very similar sequences in metagenomic studies can and often does represent natural variation of biological significance. The common use of machine-assigned quality scores on next generation platforms does not necessarily correlate with accuracy. Here, we describe using the overlap of paired-end, short sequence reads to identify error-prone reads in marker gene analyses and their contribution to spurious OTUs following clustering analysis using QIIME. Our approach can also reduce error in shotgun sequencing data generated from libraries with small, tightly constrained insert sizes. The open-source implementation of this algorithm in Python programming language with user instructions can be obtained from https://github.com/meren/illumina-utils.
Chimaeric Virus-Like Particles Derived from Consensus Genome Sequences of Human Rotavirus Strains Co-Circulating in Africa

PubMed Central

Jere, Khuzwayo C.; O'Neill, Hester G.; Potgieter, A. Christiaan; van Dijk, Alberdina A.

2014-01-01

Rotavirus virus-like particles (RV-VLPs) are potential alternative non-live vaccine candidates due to their high immunogenicity. They mimic the natural conformation of native viral proteins but cannot replicate because they do not contain genomic material which makes them safe. To date, most RV-VLPs have been derived from cell culture adapted strains or common G1 and G3 rotaviruses that have been circulating in communities for some time. In this study, chimaeric RV-VLPs were generated from the consensus sequences of African rotaviruses (G2, G8, G9 or G12 strains associated with either P[4], P[6] or P[8] genotypes) characterised directly from human stool samples without prior adaptation of the wild type strains to cell culture. Codon-optimised sequences for insect cell expression of genome segments 2 (VP2), 4 (VP4), 6 (VP6) and 9 (VP7) were cloned into a modified pFASTBAC vector, which allowed simultaneous expression of up to four genes using the Bac-to-Bac Baculovirus Expression System (BEVS; Invitrogen). Several combinations of the genome segments originating from different field strains were cloned to produce double-layered RV-VLPs (dRV-VLP; VP2/6), triple-layered RV-VLPs (tRV-VLP; VP2/6/7 or VP2/6/7/4) and chimaeric tRV-VLPs. The RV-VLPs were produced by infecting Spodoptera frugiperda 9 and Trichoplusia ni cells with recombinant baculoviruses using multi-cistronic, dual co-infection and stepwise-infection expression strategies. The size and morphology of the RV-VLPs, as determined by transmission electron microscopy, revealed successful production of RV-VLPs. The novel approach of producing tRV-VLPs, by using the consensus insect cell codon-optimised nucleotide sequence derived from dsRNA extracted directly from clinical specimens, should speed-up vaccine research and development by by-passing the need to adapt rotaviruses to cell culture. Other problems associated with cell culture adaptation, such as possible changes in epitopes, can also be circumvented. Thus, it is now possible to generate tRV-VLPs for evaluation as non-live vaccine candidates for any human or animal field rotavirus strain. PMID:25268783

Easy and accurate reconstruction of whole HIV genomes from short-read sequence data with shiver

PubMed Central

Blanquart, François; Golubchik, Tanya; Gall, Astrid; Bakker, Margreet; Bezemer, Daniela; Croucher, Nicholas J; Hall, Matthew; Hillebregt, Mariska; Ratmann, Oliver; Albert, Jan; Bannert, Norbert; Fellay, Jacques; Fransen, Katrien; Gourlay, Annabelle; Grabowski, M Kate; Gunsenheimer-Bartmeyer, Barbara; Günthard, Huldrych F; Kivelä, Pia; Kouyos, Roger; Laeyendecker, Oliver; Liitsola, Kirsi; Meyer, Laurence; Porter, Kholoud; Ristola, Matti; van Sighem, Ard; Cornelissen, Marion; Kellam, Paul; Reiss, Peter

2018-01-01

Abstract Studying the evolution of viruses and their molecular epidemiology relies on accurate viral sequence data, so that small differences between similar viruses can be meaningfully interpreted. Despite its higher throughput and more detailed minority variant data, next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of large between- and within-host diversity, including frequent indels, may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by aligning the reads to themselves, producing a set of sequences called contigs. However contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems we developed the tool shiver to pre-process reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with the user’s choice of existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We used shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read whole-genome data produced with the Illumina platform, for sixty-five existing publicly available samples and fifty new samples. We show the systematic superiority of mapping to shiver’s constructed reference compared with mapping the same reads to the closest of 3,249 real references: median values of 13 bases called differently and more accurately, 0 bases called differently and less accurately, and 205 bases of missing sequence recovered. We also successfully applied shiver to whole-genome samples of Hepatitis C Virus and Respiratory Syncytial Virus. shiver is publicly available from https://github.com/ChrisHIV/shiver. PMID:29876136
Effective de novo assembly of fish genome using haploid larvae.

PubMed

Iwasaki, Yuki; Nishiki, Issei; Nakamura, Yoji; Yasuike, Motoshige; Kai, Wataru; Nomura, Kazuharu; Yoshida, Kazunori; Nomura, Yousuke; Fujiwara, Atushi; Kobayashi, Takanori; Ototake, Mitsuru

2016-02-01

Recent improvements in next-generation sequencing technology have made it possible to do whole genome sequencing, on even non-model eukaryote species with no available reference genomes. However, de novo assembly of diploid genomes is still a big challenge because of allelic variation. The aim of this study was to determine the feasibility of utilizing the genome of haploid fish larvae for de novo assembly of whole-genome sequences. We compared the efficiency of assembly using the haploid genome of yellowtail (Seriola quinqueradiata) with that using the diploid genome obtained from the dam. De novo assembly from the haploid and the diploid sequence reads (100 million reads per each datasets) generated by the Ion Proton sequencer (200 bp) was done under two different assembly algorithms, namely overlap-layout-consensus (OLC) and de Bruijn graph (DBG). This revealed that the assembly of the haploid genome significantly reduced (approximately 22% for OLC, 9% for DBG) the total number of contigs (with longer average and N50 contig lengths) when compared to the diploid genome assembly. The haploid assembly also improved the quality of the scaffolds by reducing the number of regions with unassigned nucleotides (Ns) (total length of Ns; 45,331,916 bp for haploids and 67,724,360 bp for diploids) in OLC-based assemblies. It appears clear that the haploid genome assembly is better because the allelic variation in the diploid genome disrupts the extension of contigs during the assembly process. Our results indicate that utilizing the genome of haploid larvae leads to a significant improvement in the de novo assembly process, thus providing a novel strategy for the construction of reference genomes from non-model diploid organisms such as fish. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
The First Isolation and Whole Genome Sequencing of Murray Valley Encephalitis Virus from Cerebrospinal Fluid of a Patient with Encephalitis.

PubMed

Russell, Jessica S; Caly, Leon; Kostecki, Renata; McGuinness, Sarah L; Carter, Glen; Bulach, Dieter; Seemann, Torsten; Stinear, Tim P; Baird, Rob; Catton, Mike; Druce, Julian

2018-06-11

Murray Valley Encephalitis virus (MVEV) is a mosquito-borne Flavivirus. Clinical presentation is rare but severe, with a case fatality rate of 15⁻30%. Here we report a case of MVEV from the cerebrospinal fluid (CSF) of a patient in the Northern Territory in Australia. Initial diagnosis was performed using both MVEV-specific real-time, and Pan- Flavivirus conventional, Polymerase Chain Reaction (PCR), with confirmation by Sanger sequencing. Subsequent isolation, the first from CSF, was conducted in Vero cells and the observed cytopathic effect was confirmed by increasing viral titre in the real-time PCR. Isolation allowed for full genome sequencing using the Scriptseq V2 RNASeq library preparation kit. A consensus genome for VIDRL-MVE was generated and phylogenetic analysis identified it as Genotype 2. This is the first reported isolation, and full genome sequencing of MVEV from CSF. It is also the first time Genotype 2 has been identified in humans. As such, this case has significant implications for public health surveillance, epidemiology, and the understanding of MVEV evolution.
Transcriptome characterization and polymorphism detection between subspecies of big sagebrush (Artemisia tridentata)

PubMed Central

2011-01-01

Background Big sagebrush (Artemisia tridentata) is one of the most widely distributed and ecologically important shrub species in western North America. This species serves as a critical habitat and food resource for many animals and invertebrates. Habitat loss due to a combination of disturbances followed by establishment of invasive plant species is a serious threat to big sagebrush ecosystem sustainability. Lack of genomic data has limited our understanding of the evolutionary history and ecological adaptation in this species. Here, we report on the sequencing of expressed sequence tags (ESTs) and detection of single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers in subspecies of big sagebrush. Results cDNA of A. tridentata sspp. tridentata and vaseyana were normalized and sequenced using the 454 GS FLX Titanium pyrosequencing technology. Assembly of the reads resulted in 20,357 contig consensus sequences in ssp. tridentata and 20,250 contigs in ssp. vaseyana. A BLASTx search against the non-redundant (NR) protein database using 29,541 consensus sequences obtained from a combined assembly resulted in 21,436 sequences with significant blast alignments (≤ 1e-15). A total of 20,952 SNPs and 119 polymorphic SSRs were detected between the two subspecies. SNPs were validated through various methods including sequence capture. Validation of SNPs in different individuals uncovered a high level of nucleotide variation in EST sequences. EST sequences of a third, tetraploid subspecies (ssp. wyomingensis) obtained by Illumina sequencing were mapped to the consensus sequences of the combined 454 EST assembly. Approximately one-third of the SNPs between sspp. tridentata and vaseyana identified in the combined assembly were also polymorphic within the two geographically distant ssp. wyomingensis samples. Conclusion We have produced a large EST dataset for Artemisia tridentata, which contains a large sample of the big sagebrush leaf transcriptome. SNP mapping among the three subspecies suggest the origin of ssp. wyomingensis via mixed ancestry. A large number of SNP and SSR markers provide the foundation for future research to address questions in big sagebrush evolution, ecological genetics, and conservation using genomic approaches. PMID:21767398
Molecular phylogeny and SNP variation of polar bears (Ursus maritimus), brown bears (U. arctos), and black bears (U. americanus) derived from genome sequences.

PubMed

Cronin, Matthew A; Rincon, Gonzalo; Meredith, Robert W; MacNeil, Michael D; Islas-Trejo, Alma; Cánovas, Angela; Medrano, Juan F

2014-01-01

We assessed the relationships of polar bears (Ursus maritimus), brown bears (U. arctos), and black bears (U. americanus) with high throughput genomic sequencing data with an average coverage of 25× for each species. A total of 1.4 billion 100-bp paired-end reads were assembled using the polar bear and annotated giant panda (Ailuropoda melanoleuca) genome sequences as references. We identified 13.8 million single nucleotide polymorphisms (SNP) in the 3 species aligned to the polar bear genome. These data indicate that polar bears and brown bears share more SNP with each other than either does with black bears. Concatenation and coalescence-based analysis of consensus sequences of approximately 1 million base pairs of ultraconserved elements in the nuclear genome resulted in a phylogeny with black bears as the sister group to brown and polar bears, and all brown bears are in a separate clade from polar bears. Genotypes for 162 SNP loci of 336 bears from Alaska and Montana showed that the species are genetically differentiated and there is geographic population structure of brown and black bears but not polar bears.
Improvement of the banana "Musa acuminata" reference sequence using NGS data and semi-automated bioinformatics methods.

PubMed

Martin, Guillaume; Baurens, Franc-Christophe; Droc, Gaëtan; Rouard, Mathieu; Cenci, Alberto; Kilian, Andrzej; Hastie, Alex; Doležel, Jaroslav; Aury, Jean-Marc; Alberti, Adriana; Carreel, Françoise; D'Hont, Angélique

2016-03-16

Recent advances in genomics indicate functional significance of a majority of genome sequences and their long range interactions. As a detailed examination of genome organization and function requires very high quality genome sequence, the objective of this study was to improve reference genome assembly of banana (Musa acuminata). We have developed a modular bioinformatics pipeline to improve genome sequence assemblies, which can handle various types of data. The pipeline comprises several semi-automated tools. However, unlike classical automated tools that are based on global parameters, the semi-automated tools proposed an expert mode for a user who can decide on suggested improvements through local compromises. The pipeline was used to improve the draft genome sequence of Musa acuminata. Genotyping by sequencing (GBS) of a segregating population and paired-end sequencing were used to detect and correct scaffold misassemblies. Long insert size paired-end reads identified scaffold junctions and fusions missed by automated assembly methods. GBS markers were used to anchor scaffolds to pseudo-molecules with a new bioinformatics approach that avoids the tedious step of marker ordering during genetic map construction. Furthermore, a genome map was constructed and used to assemble scaffolds into super scaffolds. Finally, a consensus gene annotation was projected on the new assembly from two pre-existing annotations. This approach reduced the total Musa scaffold number from 7513 to 1532 (i.e. by 80%), with an N50 that increased from 1.3 Mb (65 scaffolds) to 3.0 Mb (26 scaffolds). 89.5% of the assembly was anchored to the 11 Musa chromosomes compared to the previous 70%. Unknown sites (N) were reduced from 17.3 to 10.0%. The release of the Musa acuminata reference genome version 2 provides a platform for detailed analysis of banana genome variation, function and evolution. Bioinformatics tools developed in this work can be used to improve genome sequence assemblies in other species.
GFam: a platform for automatic annotation of gene families.

PubMed

Sasidharan, Rajkumar; Nepusz, Tamás; Swarbreck, David; Huala, Eva; Paccanaro, Alberto

2012-10-01

We have developed GFam, a platform for automatic annotation of gene/protein families. GFam provides a framework for genome initiatives and model organism resources to build domain-based families, derive meaningful functional labels and offers a seamless approach to propagate functional annotation across periodic genome updates. GFam is a hybrid approach that uses a greedy algorithm to chain component domains from InterPro annotation provided by its 12 member resources followed by a sequence-based connected component analysis of un-annotated sequence regions to derive consensus domain architecture for each sequence and subsequently generate families based on common architectures. Our integrated approach increases sequence coverage by 7.2 percentage points and residue coverage by 14.6 percentage points higher than the coverage relative to the best single-constituent database within InterPro for the proteome of Arabidopsis. The true power of GFam lies in maximizing annotation provided by the different InterPro data sources that offer resource-specific coverage for different regions of a sequence. GFam's capability to capture higher sequence and residue coverage can be useful for genome annotation, comparative genomics and functional studies. GFam is a general-purpose software and can be used for any collection of protein sequences. The software is open source and can be obtained from http://www.paccanarolab.org/software/gfam/.
A Pan-HIV Strategy for Complete Genome Sequencing

PubMed Central

Yamaguchi, Julie; Alessandri-Gradt, Elodie; Tell, Robert W.; Brennan, Catherine A.

2015-01-01

Molecular surveillance is essential to monitor HIV diversity and track emerging strains. We have developed a universal library preparation method (HIV-SMART [i.e., switching mechanism at 5′ end of RNA transcript]) for next-generation sequencing that harnesses the specificity of HIV-directed priming to enable full genome characterization of all HIV-1 groups (M, N, O, and P) and HIV-2. Broad application of the HIV-SMART approach was demonstrated using a panel of diverse cell-cultured virus isolates. HIV-1 non-subtype B-infected clinical specimens from Cameroon were then used to optimize the protocol to sequence directly from plasma. When multiplexing 8 or more libraries per MiSeq run, full genome coverage at a median ∼2,000× depth was routinely obtained for either sample type. The method reproducibly generated the same consensus sequence, consistently identified viral sequence heterogeneity present in specimens, and at viral loads of ≤4.5 log copies/ml yielded sufficient coverage to permit strain classification. HIV-SMART provides an unparalleled opportunity to identify diverse HIV strains in patient specimens and to determine phylogenetic classification based on the entire viral genome. Easily adapted to sequence any RNA virus, this technology illustrates the utility of next-generation sequencing (NGS) for viral characterization and surveillance. PMID:26699702
SeqMule: automated pipeline for analysis of human exome/genome sequencing data.

PubMed

Guo, Yunfei; Ding, Xiaolei; Shen, Yufeng; Lyon, Gholson J; Wang, Kai

2015-09-18

Next-generation sequencing (NGS) technology has greatly helped us identify disease-contributory variants for Mendelian diseases. However, users are often faced with issues such as software compatibility, complicated configuration, and no access to high-performance computing facility. Discrepancies exist among aligners and variant callers. We developed a computational pipeline, SeqMule, to perform automated variant calling from NGS data on human genomes and exomes. SeqMule integrates computational-cluster-free parallelization capability built on top of the variant callers, and facilitates normalization/intersection of variant calls to generate consensus set with high confidence. SeqMule integrates 5 alignment tools, 5 variant calling algorithms and accepts various combinations all by one-line command, therefore allowing highly flexible yet fully automated variant calling. In a modern machine (2 Intel Xeon X5650 CPUs, 48 GB memory), when fast turn-around is needed, SeqMule generates annotated VCF files in a day from a 30X whole-genome sequencing data set; when more accurate calling is needed, SeqMule generates consensus call set that improves over single callers, as measured by both Mendelian error rate and consistency. SeqMule supports Sun Grid Engine for parallel processing, offers turn-key solution for deployment on Amazon Web Services, allows quality check, Mendelian error check, consistency evaluation, HTML-based reports. SeqMule is available at http://seqmule.openbioinformatics.org.
Using a color-coded ambigraphic nucleic acid notation to visualize conserved palindromic motifs within and across genomes

PubMed Central

2014-01-01

Background Ambiscript is a graphically-designed nucleic acid notation that uses symbol symmetries to support sequence complementation, highlight biologically-relevant palindromes, and facilitate the analysis of consensus sequences. Although the original Ambiscript notation was designed to easily represent consensus sequences for multiple sequence alignments, the notation’s black-on-white ambiguity characters are unable to reflect the statistical distribution of nucleotides found at each position. We now propose a color-augmented ambigraphic notation to encode the frequency of positional polymorphisms in these consensus sequences. Results We have implemented this color-coding approach by creating an Adobe Flash® application ( http://www.ambiscript.org) that shades and colors modified Ambiscript characters according to the prevalence of the encoded nucleotide at each position in the alignment. The resulting graphic helps viewers perceive biologically-relevant patterns in multiple sequence alignments by uniquely combining color, shading, and character symmetries to highlight palindromes and inverted repeats in conserved DNA motifs. Conclusion Juxtaposing an intuitive color scheme over the deliberate character symmetries of an ambigraphic nucleic acid notation yields a highly-functional nucleic acid notation that maximizes information content and successfully embodies key principles of graphic excellence put forth by the statistician and graphic design theorist, Edward Tufte. PMID:24447494
Sequencing ebola and marburg viruses genomes using microarrays.

PubMed

Hardick, Justin; Woelfel, Roman; Gardner, Warren; Ibrahim, Sofi

2016-08-01

Periodic outbreaks of Ebola and Marburg hemorrhagic fevers have occurred in Africa over the past four decades with case fatality rates reaching as high as 90%. The latest Ebola outbreak in West Africa in 2014 raised concerns that these infections can spread across continents and pose serious health risks. Early and accurate identification of the causative agents is necessary to contain outbreaks. In this report, we describe sequencing-by-hybridization (SBH) technique using high density microarrays to identify Ebola and Marburg viruses. The microarrays were designed to interrogate the sequences of entire viral genomes, and were evaluated with three species of Ebolavirus (Reston, Sudan, and Zaire), and three strains of Marburgvirus (Angola, Musoke, and Ravn). The results showed that the consensus sequences generated with four or more hybridizations had 92.1-98.9% accuracy over 95-99% of the genomes. Additionally, with SBH microarrays it was possible to distinguish between different strains of the Lake Victoria Marburgvirus. J. Med. Virol. 88:1303-1308, 2016. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Nuclear Mitochondrial DNA Activates Replication in Saccharomyces cerevisiae

PubMed Central

Chatre, Laurent; Ricchetti, Miria

2011-01-01

The nuclear genome of eukaryotes is colonized by DNA fragments of mitochondrial origin, called NUMTs. These insertions have been associated with a variety of germ-line diseases in humans. The significance of this uptake of potentially dangerous sequences into the nuclear genome is unclear. Here we provide functional evidence that sequences of mitochondrial origin promote nuclear DNA replication in Saccharomyces cerevisiae. We show that NUMTs are rich in key autonomously replicating sequence (ARS) consensus motifs, whose mutation results in the reduction or loss of DNA replication activity. Furthermore, 2D-gel analysis of the mrc1 mutant exposed to hydroxyurea shows that several NUMTs function as late chromosomal origins. We also show that NUMTs located close to or within ARS provide key sequence elements for replication. Thus NUMTs can act as independent origins, when inserted in an appropriate genomic context or affect the efficiency of pre-existing origins. These findings show that migratory mitochondrial DNAs can impact on the replication of the nuclear region they are inserted in. PMID:21408151
Nuclear mitochondrial DNA activates replication in Saccharomyces cerevisiae.

PubMed

Chatre, Laurent; Ricchetti, Miria

2011-03-08

The nuclear genome of eukaryotes is colonized by DNA fragments of mitochondrial origin, called NUMTs. These insertions have been associated with a variety of germ-line diseases in humans. The significance of this uptake of potentially dangerous sequences into the nuclear genome is unclear. Here we provide functional evidence that sequences of mitochondrial origin promote nuclear DNA replication in Saccharomyces cerevisiae. We show that NUMTs are rich in key autonomously replicating sequence (ARS) consensus motifs, whose mutation results in the reduction or loss of DNA replication activity. Furthermore, 2D-gel analysis of the mrc1 mutant exposed to hydroxyurea shows that several NUMTs function as late chromosomal origins. We also show that NUMTs located close to or within ARS provide key sequence elements for replication. Thus NUMTs can act as independent origins, when inserted in an appropriate genomic context or affect the efficiency of pre-existing origins. These findings show that migratory mitochondrial DNAs can impact on the replication of the nuclear region they are inserted in.
Saturation of an Intra-Gene Pool Linkage Map: Towards a Unified Consensus Linkage Map for Fine Mapping and Synteny Analysis in Common Bean

PubMed Central

Galeano, Carlos H.; Fernandez, Andrea C.; Franco-Herrera, Natalia; Cichy, Karen A.; McClean, Phillip E.; Vanderleyden, Jos; Blair, Matthew W.

2011-01-01

Map-based cloning and fine mapping to find genes of interest and marker assisted selection (MAS) requires good genetic maps with reproducible markers. In this study, we saturated the linkage map of the intra-gene pool population of common bean DOR364×BAT477 (DB) by evaluating 2,706 molecular markers including SSR, SNP, and gene-based markers. On average the polymorphism rate was 7.7% due to the narrow genetic base between the parents. The DB linkage map consisted of 291 markers with a total map length of 1,788 cM. A consensus map was built using the core mapping populations derived from inter-gene pool crosses: DOR364×G19833 (DG) and BAT93×JALO EEP558 (BJ). The consensus map consisted of a total of 1,010 markers mapped, with a total map length of 2,041 cM across 11 linkage groups. On average, each linkage group on the consensus map contained 91 markers of which 83% were single copy markers. Finally, a synteny analysis was carried out using our highly saturated consensus maps compared with the soybean pseudo-chromosome assembly. A total of 772 marker sequences were compared with the soybean genome. A total of 44 syntenic blocks were identified. The linkage group Pv6 presented the most diverse pattern of synteny with seven syntenic blocks, and Pv9 showed the most consistent relations with soybean with just two syntenic blocks. Additionally, a co-linear analysis using common bean transcript map information against soybean coding sequences (CDS) revealed the relationship with 787 soybean genes. The common bean consensus map has allowed us to map a larger number of markers, to obtain a more complete coverage of the common bean genome. Our results, combined with synteny relationships provide tools to increase marker density in selected genomic regions to identify closely linked polymorphic markers for indirect selection, fine mapping or for positional cloning. PMID:22174773
Genomic Characterization of DArT Markers Based on High-Density Linkage Analysis and Physical Mapping to the Eucalyptus Genome

PubMed Central

Petroli, César D.; Sansaloni, Carolina P.; Carling, Jason; Steane, Dorothy A.; Vaillancourt, René E.; Myburg, Alexander A.; da Silva, Orzenil Bonfim; Pappas, Georgios Joannis; Kilian, Andrzej; Grattapaglia, Dario

2012-01-01

Diversity Arrays Technology (DArT) provides a robust, high throughput, cost-effective method to query thousands of sequence polymorphisms in a single assay. Despite the extensive use of this genotyping platform for numerous plant species, little is known regarding the sequence attributes and genome-wide distribution of DArT markers. We investigated the genomic properties of the 7,680 DArT marker probes of a Eucalyptus array, by sequencing them, constructing a high density linkage map and carrying out detailed physical mapping analyses to the Eucalyptus grandis reference genome. A consensus linkage map with 2,274 DArT markers anchored to 210 microsatellites and a framework map, with improved support for ordering, displayed extensive collinearity with the genome sequence. Only 1.4 Mbp of the 75 Mbp of still unplaced scaffold sequence was captured by 45 linkage mapped but physically unaligned markers to the 11 main Eucalyptus pseudochromosomes, providing compelling evidence for the quality and completeness of the current Eucalyptus genome assembly. A highly significant correspondence was found between the locations of DArT markers and predicted gene models, while most of the 89 DArT probes unaligned to the genome correspond to sequences likely absent in E. grandis, consistent with the pan-genomic feature of this multi-Eucalyptus species DArT array. These comprehensive linkage-to-physical mapping analyses provide novel data regarding the genomic attributes of DArT markers in plant genomes in general and for Eucalyptus in particular. DArT markers preferentially target the gene space and display a largely homogeneous distribution across the genome, thereby providing superb coverage for mapping and genome-wide applications in breeding and diversity studies. Data reported on these ubiquitous properties of DArT markers will be particularly valuable to researchers working on less-studied crop species who already count on DArT genotyping arrays but for which no reference genome is yet available to allow such detailed characterization. PMID:22984541
MEGANTE: A Web-Based System for Integrated Plant Genome Annotation

PubMed Central

Numa, Hisataka; Itoh, Takeshi

2014-01-01

The recent advancement of high-throughput genome sequencing technologies has resulted in a considerable increase in demands for large-scale genome annotation. While annotation is a crucial step for downstream data analyses and experimental studies, this process requires substantial expertise and knowledge of bioinformatics. Here we present MEGANTE, a web-based annotation system that makes plant genome annotation easy for researchers unfamiliar with bioinformatics. Without any complicated configuration, users can perform genomic sequence annotations simply by uploading a sequence and selecting the species to query. MEGANTE automatically runs several analysis programs and integrates the results to select the appropriate consensus exon–intron structures and to predict open reading frames (ORFs) at each locus. Functional annotation, including a similarity search against known proteins and a functional domain search, are also performed for the predicted ORFs. The resultant annotation information is visualized with a widely used genome browser, GBrowse. For ease of analysis, the results can be downloaded in Microsoft Excel format. All of the query sequences and annotation results are stored on the server side so that users can access their own data from virtually anywhere on the web. The current release of MEGANTE targets 24 plant species from the Brassicaceae, Fabaceae, Musaceae, Poaceae, Salicaceae, Solanaceae, Rosaceae and Vitaceae families, and it allows users to submit a sequence up to 10 Mb in length and to save up to 100 sequences with the annotation information on the server. The MEGANTE web service is available at https://megante.dna.affrc.go.jp/. PMID:24253915
A High Density Consensus Map of Rye (Secale cereale L.) Based on DArT Markers

PubMed Central

Myśków, Beata; Stojałowski, Stefan; Heller-Uszyńska, Katarzyna; Góralska, Magdalena; Brągoszewski, Piotr; Uszyński, Grzegorz; Kilian, Andrzej; Rakoczy-Trojanowska, Monika

2011-01-01

Background Rye (Secale cereale L.) is an economically important crop, exhibiting unique features such as outstanding resistance to biotic and abiotic stresses and high nutrient use efficiency. This species presents a challenge to geneticists and breeders due to its large genome containing a high proportion of repetitive sequences, self incompatibility, severe inbreeding depression and tissue culture recalcitrance. The genomic resources currently available for rye are underdeveloped in comparison with other crops of similar economic importance. The aim of this study was to create a highly saturated, multilocus linkage map of rye via consensus mapping, based on Diversity Arrays Technology (DArT) markers. Methodology/Principal Findings Recombinant inbred lines (RILs) from 5 populations (564 in total) were genotyped using DArT markers and subjected to linkage analysis using Join Map 4.0 and Multipoint Consensus 2.2 software. A consensus map was constructed using a total of 9703 segregating markers. The average chromosome map length ranged from 199.9 cM (2R) to 251.4 cM (4R) and the average map density was 1.1 cM. The integrated map comprised 4048 loci with the number of markers per chromosome ranging from 454 for 7R to 805 for 4R. In comparison with previously published studies on rye, this represents an eight-fold increase in the number of loci placed on a consensus map and a more than two-fold increase in the number of genetically mapped DArT markers. Conclusions/Significance Through the careful choice of marker type, mapping populations and the use of software packages implementing powerful algorithms for map order optimization, we produced a valuable resource for rye and triticale genomics and breeding, which provides an excellent starting point for more in-depth studies on rye genome organization. PMID:22163026
Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains.

PubMed

Lewis, Tony E; Sillitoe, Ian; Andreeva, Antonina; Blundell, Tom L; Buchan, Daniel W A; Chothia, Cyrus; Cuff, Alison; Dana, Jose M; Filippis, Ioannis; Gough, Julian; Hunter, Sarah; Jones, David T; Kelley, Lawrence A; Kleywegt, Gerard J; Minneci, Federico; Mitchell, Alex; Murzin, Alexey G; Ochoa-Montaño, Bernardo; Rackham, Owen J L; Smith, James; Sternberg, Michael J E; Velankar, Sameer; Yeats, Corin; Orengo, Christine

2013-01-01

Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence-structure-function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker's yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).
Analysis of sequence variability in the macronuclear DNA of Paramecium tetraurelia: A somatic view of the germline

PubMed Central

Duret, Laurent; Cohen, Jean; Jubin, Claire; Dessen, Philippe; Goût, Jean-François; Mousset, Sylvain; Aury, Jean-Marc; Jaillon, Olivier; Noël, Benjamin; Arnaiz, Olivier; Bétermier, Mireille; Wincker, Patrick; Meyer, Eric; Sperling, Linda

2008-01-01

Ciliates are the only unicellular eukaryotes known to separate germinal and somatic functions. Diploid but silent micronuclei transmit the genetic information to the next sexual generation. Polyploid macronuclei express the genetic information from a streamlined version of the genome but are replaced at each sexual generation. The macronuclear genome of Paramecium tetraurelia was recently sequenced by a shotgun approach, providing access to the gene repertoire. The 72-Mb assembly represents a consensus sequence for the somatic DNA, which is produced after sexual events by reproducible rearrangements of the zygotic genome involving elimination of repeated sequences, precise excision of unique-copy internal eliminated sequences (IES), and amplification of the cellular genes to high copy number. We report use of the shotgun sequencing data (>106 reads representing 13× coverage of a completely homozygous clone) to evaluate variability in the somatic DNA produced by these developmental genome rearrangements. Although DNA amplification appears uniform, both of the DNA elimination processes produce sequence heterogeneity. The variability that arises from IES excision allowed identification of hundreds of putative new IESs, compared to 42 that were previously known, and revealed cases of erroneous excision of segments of coding sequences. We demonstrate that IESs in coding regions are under selective pressure to introduce premature termination of translation in case of excision failure. PMID:18256234
Development of the first consensus genetic map of intermediate wheatgrass (Thinopyrum intermedium) using genotyping-by-sequencing

USDA-ARS?s Scientific Manuscript database

Intermediate wheatgrass (Thinopyrum intermedium) has been identified as a candidate for domestication and improvement as a perennial grain, forage, and biofuel crop by several active breeding programs. To accelerate this process using genomics-assisted breeding, efficient genotyping methods and gen...

Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation.

PubMed

Pujar, Shashikant; O'Leary, Nuala A; Farrell, Catherine M; Loveland, Jane E; Mudge, Jonathan M; Wallin, Craig; Girón, Carlos G; Diekhans, Mark; Barnes, If; Bennett, Ruth; Berry, Andrew E; Cox, Eric; Davidson, Claire; Goldfarb, Tamara; Gonzalez, Jose M; Hunt, Toby; Jackson, John; Joardar, Vinita; Kay, Mike P; Kodali, Vamsi K; Martin, Fergal J; McAndrews, Monica; McGarvey, Kelly M; Murphy, Michael; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Seal, Ruth L; Suner, Marie-Marthe; Webb, David; Zhu, Sophia; Aken, Bronwen L; Bruford, Elspeth A; Bult, Carol J; Frankish, Adam; Murphy, Terence; Pruitt, Kim D

2018-01-04

The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community. Published by Oxford University Press on behalf of Nucleic Acids Research 2017.
Fine definition of the pedigree haplotypes of closely related rice cultivars by means of genome-wide discovery of single-nucleotide polymorphisms.

PubMed

Yamamoto, Toshio; Nagasaki, Hideki; Yonemaru, Jun-ichi; Ebana, Kaworu; Nakajima, Maiko; Shibaya, Taeko; Yano, Masahiro

2010-04-27

To create useful gene combinations in crop breeding, it is necessary to clarify the dynamics of the genome composition created by breeding practices. A large quantity of single-nucleotide polymorphism (SNP) data is required to permit discrimination of chromosome segments among modern cultivars, which are genetically related. Here, we used a high-throughput sequencer to conduct whole-genome sequencing of an elite Japanese rice cultivar, Koshihikari, which is closely related to Nipponbare, whose genome sequencing has been completed. Then we designed a high-throughput typing array based on the SNP information by comparison of the two sequences. Finally, we applied this array to analyze historical representative rice cultivars to understand the dynamics of their genome composition. The total 5.89-Gb sequence for Koshihikari, equivalent to 15.7 x the entire rice genome, was mapped using the Pseudomolecules 4.0 database for Nipponbare. The resultant Koshihikari genome sequence corresponded to 80.1% of the Nipponbare sequence and led to the identification of 67,051 SNPs. A high-throughput typing array consisting of 1917 SNP sites distributed throughout the genome was designed to genotype 151 representative Japanese cultivars that have been grown during the past 150 years. We could identify the ancestral origin of the pedigree haplotypes in 60.9% of the Koshihikari genome and 18 consensus haplotype blocks which are inherited from traditional landraces to current improved varieties. Moreover, it was predicted that modern breeding practices have generally decreased genetic diversity Detection of genome-wide SNPs by both high-throughput sequencer and typing array made it possible to evaluate genomic composition of genetically related rice varieties. With the aid of their pedigree information, we clarified the dynamics of chromosome recombination during the historical rice breeding process. We also found several genomic regions decreasing genetic diversity which might be caused by a recent human selection in rice breeding. The definition of pedigree haplotypes by means of genome-wide SNPs will facilitate next-generation breeding of rice and other crops.
Modeling leaderless transcription and atypical genes results in more accurate gene prediction in prokaryotes.

PubMed

Lomsadze, Alexandre; Gemayel, Karl; Tang, Shiyuyun; Borodovsky, Mark

2018-05-17

In a conventional view of the prokaryotic genome organization, promoters precede operons and ribosome binding sites (RBSs) with Shine-Dalgarno consensus precede genes. However, recent experimental research suggesting a more diverse view motivated us to develop an algorithm with improved gene-finding accuracy. We describe GeneMarkS-2, an ab initio algorithm that uses a model derived by self-training for finding species-specific (native) genes, along with an array of precomputed "heuristic" models designed to identify harder-to-detect genes (likely horizontally transferred). Importantly, we designed GeneMarkS-2 to identify several types of distinct sequence patterns (signals) involved in gene expression control, among them the patterns characteristic for leaderless transcription as well as noncanonical RBS patterns. To assess the accuracy of GeneMarkS-2, we used genes validated by COG (Clusters of Orthologous Groups) annotation, proteomics experiments, and N-terminal protein sequencing. We observed that GeneMarkS-2 performed better on average in all accuracy measures when compared with the current state-of-the-art gene prediction tools. Furthermore, the screening of ∼5000 representative prokaryotic genomes made by GeneMarkS-2 predicted frequent leaderless transcription in both archaea and bacteria. We also observed that the RBS sites in some species with leadered transcription did not necessarily exhibit the Shine-Dalgarno consensus. The modeling of different types of sequence motifs regulating gene expression prompted a division of prokaryotic genomes into five categories with distinct sequence patterns around the gene starts. © 2018 Lomsadze et al.; Published by Cold Spring Harbor Laboratory Press.
“A draft Musa balbisiana genome sequence for molecular genetics in polyploid, inter- and intra-specific Musa hybrids”

PubMed Central

2013-01-01

Background Modern banana cultivars are primarily interspecific triploid hybrids of two species, Musa acuminata and Musa balbisiana, which respectively contribute the A- and B-genomes. The M. balbisiana genome has been associated with improved vigour and tolerance to biotic and abiotic stresses and is thus a target for Musa breeding programs. However, while a reference M. acuminata genome has recently been released (Nature 488:213–217, 2012), little sequence data is available for the corresponding B-genome. To address these problems we carried out Next Generation gDNA sequencing of the wild diploid M. balbisiana variety ‘Pisang Klutuk Wulung’ (PKW). Our strategy was to align PKW gDNA reads against the published A-genome and to extract the mapped consensus sequences for subsequent rounds of evaluation and gene annotation. Results The resulting B-genome is 79% the size of the A-genome, and contains 36,638 predicted functional gene sequences which is nearly identical to the 36,542 of the A-genome. There is substantial sequence divergence from the A-genome at a frequency of 1 homozygous SNP per 23.1 bp, and a high degree of heterozygosity corresponding to one heterozygous SNP per 55.9 bp. Using expressed small RNA data, a similar number of microRNA sequences were predicted in both A- and B-genomes, but additional novel miRNAs were detected, including some that are unique to each genome. The usefulness of this B-genome sequence was evaluated by mapping RNA-seq data from a set of triploid AAA and AAB hybrids simultaneously to both genomes. Results for the plantains demonstrated the expected 2:1 distribution of reads across the A- and B-genomes, but for the AAA genomes, results show they contain regions of significant homology to the B-genome supporting proposals that there has been a history of interspecific recombination between homeologous A and B chromosomes in Musa hybrids. Conclusions We have generated and annotated a draft reference Musa B-genome and demonstrate that this can be used for molecular genetic mapping of gene transcripts and small RNA expression data from several allopolyploid banana cultivars. This draft therefore represents a valuable resource to support the study of metabolism in inter- and intraspecific triploid Musa hybrids and to help direct breeding programs. PMID:24094114
LPmerge: an R package for merging genetic maps by linear programming.

PubMed

Endelman, Jeffrey B; Plomion, Christophe

2014-06-01

Consensus genetic maps constructed from multiple populations are an important resource for both basic and applied research, including genome-wide association analysis, genome sequence assembly and studies of evolution. The LPmerge software uses linear programming to efficiently minimize the mean absolute error between the consensus map and the linkage maps from each population. This minimization is performed subject to linear inequality constraints that ensure the ordering of the markers in the linkage maps is preserved. When marker order is inconsistent between linkage maps, a minimum set of ordinal constraints is deleted to resolve the conflicts. LPmerge is on CRAN at http://cran.r-project.org/web/packages/LPmerge. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Comparative analysis of seven viral nuclear export signals (NESs) reveals the crucial role of nuclear export mediated by the third NES consensus sequence of nucleoprotein (NP) in influenza A virus replication.

PubMed

Chutiwitoonchai, Nopporn; Kakisaka, Michinori; Yamada, Kazunori; Aida, Yoko

2014-01-01

The assembly of influenza virus progeny virions requires machinery that exports viral genomic ribonucleoproteins from the cell nucleus. Currently, seven nuclear export signal (NES) consensus sequences have been identified in different viral proteins, including NS1, NS2, M1, and NP. The present study examined the roles of viral NES consensus sequences and their significance in terms of viral replication and nuclear export. Mutation of the NP-NES3 consensus sequence resulted in a failure to rescue viruses using a reverse genetics approach, whereas mutation of the NS2-NES1 and NS2-NES2 sequences led to a strong reduction in viral replication kinetics compared with the wild-type sequence. While the viral replication kinetics for other NES mutant viruses were also lower than those of the wild-type, the difference was not so marked. Immunofluorescence analysis after transient expression of NP-NES3, NS2-NES1, or NS2-NES2 proteins in host cells showed that they accumulated in the cell nucleus. These results suggest that the NP-NES3 consensus sequence is mostly required for viral replication. Therefore, each of the hydrophobic (Φ) residues within this NES consensus sequence (Φ1, Φ2, Φ3, or Φ4) was mutated, and its viral replication and nuclear export function were analyzed. No viruses harboring NP-NES3 Φ2 or Φ3 mutants could be rescued. Consistent with this, the NP-NES3 Φ2 and Φ3 mutants showed reduced binding affinity with CRM1 in a pull-down assay, and both accumulated in the cell nucleus. Indeed, a nuclear export assay revealed that these mutant proteins showed lower nuclear export activity than the wild-type protein. Moreover, the Φ2 and Φ3 residues (along with other Φ residues) within the NP-NES3 consensus were highly conserved among different influenza A viruses, including human, avian, and swine. Taken together, these results suggest that the Φ2 and Φ3 residues within the NP-NES3 protein are important for its nuclear export function during viral replication.
Evaluation of a decision aid for incidental genomic results, the Genomics ADvISER: protocol for a mixed methods randomised controlled trial.

PubMed

Shickh, Salma; Clausen, Marc; Mighton, Chloe; Casalino, Selina; Joshi, Esha; Glogowski, Emily; Schrader, Kasmintan A; Scheer, Adena; Elser, Christine; Panchal, Seema; Eisen, Andrea; Graham, Tracy; Aronson, Melyssa; Semotiuk, Kara M; Winter-Paquette, Laura; Evans, Michael; Lerner-Ellis, Jordan; Carroll, June C; Hamilton, Jada G; Offit, Kenneth; Robson, Mark; Thorpe, Kevin E; Laupacis, Andreas; Bombard, Yvonne

2018-04-26

Genome sequencing, a novel genetic diagnostic technology that analyses the billions of base pairs of DNA, promises to optimise healthcare through personalised diagnosis and treatment. However, implementation of genome sequencing faces challenges including the lack of consensus on disclosure of incidental results, gene changes unrelated to the disease under investigation, but of potential clinical significance to the patient and their provider. Current recommendations encourage clinicians to return medically actionable incidental results and stress the importance of education and informed consent. Given the shortage of genetics professionals and genomics expertise among healthcare providers, decision aids (DAs) can help fill a critical gap in the clinical delivery of genome sequencing. We aim to assess the effectiveness of an interactive DA developed for selection of incidental results. We will compare the DA in combination with a brief Q&A session with a genetic counsellor to genetic counselling alone in a mixed-methods randomised controlled trial. Patients who received negative standard cancer genetic results for their personal and family history of cancer and are thus eligible for sequencing will be recruited from cancer genetics clinics in Toronto. Our primary outcome is decisional conflict. Secondary outcomes are knowledge, satisfaction, preparation for decision-making, anxiety and length of session with the genetic counsellor. A subset of participants will complete a qualitative interview about preferences for incidental results. This study has been approved by research ethics boards of St. Michael's Hospital, Mount Sinai Hospital and Sunnybrook Health Sciences Centre. This research poses no significant risk to participants. This study evaluates the effectiveness of a novel patient-centred tool to support clinical delivery of incidental results. Results will be shared through national and international conferences, and at a stakeholder workshop to develop a consensus statement to optimise implementation of the DA in practice. NCT03244202; Pre-results. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Structure and genomic organization of the human B1 receptor gene for kinins (BDKRB1).

PubMed

Bachvarov, D R; Hess, J F; Menke, J G; Larrivée, J F; Marceau, F

1996-05-01

Two subtypes of mammalian bradykinin receptors, B1 and B2 (BDKRB1 and BDKRB2), have been defined based on their pharmacological properties. The B1 type kinin receptors have weak affinity for intact BK or Lys-BK but strong affinity for kinin metabolites without the C-terminal arginine (e.g., des-Arg9-BK and Lys-des-Arg9-BK, also called des-Arg10-kallidin), which are generated by kininase I. The B1 receptor expression is up-regulated following tissue injury and inflammation (hyperemia, exudation, hyperalgesia, etc.). In the present study, we have cloned and sequenced the gene encoding human B1 receptor from a human genomic library. The human B1 receptor gene contains three exons separated by two introns. The first and the second exon are noncoding, while the coding region and the 3'-flanking region are located entirely on the third exon. The exon-intron arrangement of the human B1 receptor gene shows significant similarity with the genes encoding the B2 receptor subtype in human, mouse, and rat. Sequence analysis of the 5'-flanking region revealed the presence of a consensus TATA box and of numerous candidate transcription factor binding sequences. Primer extension experiments have shown the existence of multiple transcription initiation sites situated downstream and upstream from the consensus TATA box. Genomic Southern blot analysis indicated that the human B1 receptor is encoded by a single-copy gene.
Reference quality assembly of the 3.5-Gb genome of Capsicum annuum from a single linked-read library.

PubMed

Hulse-Kemp, Amanda M; Maheshwari, Shamoni; Stoffel, Kevin; Hill, Theresa A; Jaffe, David; Williams, Stephen R; Weisenfeld, Neil; Ramakrishnan, Srividya; Kumar, Vijay; Shah, Preyas; Schatz, Michael C; Church, Deanna M; Van Deynze, Allen

2018-01-01

Linked-Read sequencing technology has recently been employed successfully for de novo assembly of human genomes, however, the utility of this technology for complex plant genomes is unproven. We evaluated the technology for this purpose by sequencing the 3.5-gigabase (Gb) diploid pepper ( Capsicum annuum ) genome with a single Linked-Read library. Plant genomes, including pepper, are characterized by long, highly similar repetitive sequences. Accordingly, significant effort is used to ensure that the sequenced plant is highly homozygous and the resulting assembly is a haploid consensus. With a phased assembly approach, we targeted a heterozygous F 1 derived from a wide cross to assess the ability to derive both haplotypes and characterize a pungency gene with a large insertion/deletion. The Supernova software generated a highly ordered, more contiguous sequence assembly than all currently available C. annuum reference genomes. Over 83% of the final assembly was anchored and oriented using four publicly available de novo linkage maps. A comparison of the annotation of conserved eukaryotic genes indicated the completeness of assembly. The validity of the phased assembly is further demonstrated with the complete recovery of both 2.5-Kb insertion/deletion haplotypes of the PUN1 locus in the F 1 sample that represents pungent and nonpungent peppers, as well as nearly full recovery of the BUSCO2 gene set within each of the two haplotypes. The most contiguous pepper genome assembly to date has been generated which demonstrates that Linked-Read library technology provides a tool to de novo assemble complex highly repetitive heterozygous plant genomes. This technology can provide an opportunity to cost-effectively develop high-quality genome assemblies for other complex plants and compare structural and gene differences through accurate haplotype reconstruction.
Genomic Variability of Haemophilus influenzae Isolated from Mexican Children Determined by Using Enterobacterial Repetitive Intergenic Consensus Sequences and PCR

PubMed Central

Gomez-De-Leon, Patricia; Santos, Jose I.; Caballero, Javier; Gomez, Demostenes; Espinosa, Luz E.; Moreno, Isabel; Piñero, Daniel; Cravioto, Alejandro

2000-01-01

Genomic fingerprints from 92 capsulated and noncapsulated strains of Haemophilus influenzae from Mexican children with different diseases and healthy carriers were generated by PCR using the enterobacterial repetitive intergenic consensus (ERIC) sequences. A cluster analysis by the unweighted pair-group method with arithmetic averages based on the overall similarity as estimated from the characteristics of the genomic fingerprints, was conducted to group the strains. A total of 69 fingerprint patterns were detected in the H. influenzae strains. Isolates from patients with different diseases were represented by a variety of patterns, which clustered into two major groups. Of the 37 strains isolated from cases of meningitis, 24 shared patterns and were clustered into five groups within a similarity level of 1.0. One fragment of 1.25 kb was common to all meningitis strains. H. influenzae strains from healthy carriers presented fingerprint patterns different from those found in strains from sick children. Isolates from healthy individuals were more variable and were distributed differently from those from patients. The results show that ERIC-PCR provides a powerful tool for the determination of the distinctive pathogenicity potentials of H. influenzae strains and encourage its use for molecular epidemiology investigations. PMID:10878033
Enterobacterial Repetitive Intergenic Consensus Sequences as Molecular Targets for Typing of Mycobacterium tuberculosis Strains

PubMed Central

Sechi, Leonardo A.; Zanetti, Stefania; Dupré, Ilaria; Delogu, Giovanni; Fadda, Giovanni

1998-01-01

The presence of enterobacterial repetitive intergenic consensus (ERIC) sequences was demonstrated for the first time in the genome of Mycobacterium tuberculosis; these sequences have been found in transcribed regions of the chromosomes of gram-negative bacteria. In this study genetic diversity among clinical isolates of M. tuberculosis was determined by PCR with ERIC primers (ERIC-PCR). The study isolates comprised 71 clinical isolates collected from Sardinia, Italy. ERIC-PCR was able to identify 59 distinct profiles. The results obtained were compared with IS6110 and PCR-GTG fingerprinting. We found that the level of differentiation obtained by ERIC-PCR is greater than that obtained by IS6110 fingerprinting and comparable to that obtained by PCR-GTG. This method of fingerprinting is rapid and sensitive and can be applied to the study of the epidemiology of M. tuberculosis infections, especially when IS6110 fingerprinting is not of any help. PMID:9431935
Complete sequence of the genome of avian paramyxovirus type 9 and comparison with other paramyxoviruses

PubMed Central

Samuel, Arthur S.; Kumar, Sachin; Madhuri, Subbiah; Collins, Peter L.; Samal, Siba K.

2009-01-01

The complete genome consensus sequence was determined for avian paramyxovirus (APMV) serotype 9 prototype strain PMV-9/domestic Duck/New York/22/78. The genome is 15,438 nucleotides (nt) long and encodes six non-overlapping genes in the order of 3′-N-P/V/W-M-F-HN-L-5′ with intergenic regions of 0–30 nt. The genome length follows the “rule of six” and contains a 55-nt leader sequence at the 3′ end and a 47-nt trailer sequence at the 5′ end. The cleavage site of the F protein is I-R-E-G-R-I↓F, which does not conform to the conventional cleavage site of the ubiquitous cellular protease furin. The virus required exogenous protease for in vitro replication and grew only in a few established cell lines, indicating a restricted host range. Alignment and phylogenetic analysis of the predicted amino acid sequences of APMV-9 proteins with the cognate proteins of viruses of all five genera of family Paramyxoviridae showed that APMV-9 is more closely related to APMV-1 than to other APMVs. The mean death time in embryonated chicken eggs was found to be more than 120 h, indicating APMV-9 to be avirulent for chickens. PMID:19185593
Human adenovirus serotype 12 virion precursors pMu and pVI are cleaved at amino-terminal and carboxy-terminal sites that conform to the adenovirus 2 endoproteinase cleavage consensus sequence.

PubMed

Freimuth, P; Anderson, C W

1993-03-01

The sequence of a 1158-base pair fragment of the human adenovirus serotype 12 (Ad12) genome was determined. This segment encodes the precursors for virion components Mu and VI. Both Ad12 precursors contain two sequences that conform to a consensus sequence motif for cleavage by the endoproteinase of adenovirus 2 (Ad2). Analysis of the amino terminus of VI and of the peptide fragments found in Ad12 virions demonstrated that these sites are cleaved during Ad12 maturation. This observation suggests that the recognition motif for adenovirus endoproteinases is highly conserved among human serotypes. The adenovirus 2 endoproteinase polypeptide requires additional co-factors for activity (C. W. Anderson, Protein Expression Purif., 1993, 4, 8-15). Synthetic Ad12 or Ad2 pVI carboxy-terminal peptides each permitted efficient cleavage of an artificial endoproteinase substrate by recombinant Ad2 endoproteinase polypeptide.
Repetitive Elements May Comprise Over Two-Thirds of the Human Genome

PubMed Central

de Koning, A. P. Jason; Gu, Wanjun; Castoe, Todd A.; Batzer, Mark A.; Pollock, David D.

2011-01-01

Transposable elements (TEs) are conventionally identified in eukaryotic genomes by alignment to consensus element sequences. Using this approach, about half of the human genome has been previously identified as TEs and low-complexity repeats. We recently developed a highly sensitive alternative de novo strategy, P-clouds, that instead searches for clusters of high-abundance oligonucleotides that are related in sequence space (oligo “clouds”). We show here that P-clouds predicts >840 Mbp of additional repetitive sequences in the human genome, thus suggesting that 66%–69% of the human genome is repetitive or repeat-derived. To investigate this remarkable difference, we conducted detailed analyses of the ability of both P-clouds and a commonly used conventional approach, RepeatMasker (RM), to detect different sized fragments of the highly abundant human Alu and MIR SINEs. RM can have surprisingly low sensitivity for even moderately long fragments, in contrast to P-clouds, which has good sensitivity down to small fragment sizes (∼25 bp). Although short fragments have a high intrinsic probability of being false positives, we performed a probabilistic annotation that reflects this fact. We further developed “element-specific” P-clouds (ESPs) to identify novel Alu and MIR SINE elements, and using it we identified ∼100 Mb of previously unannotated human elements. ESP estimates of new MIR sequences are in good agreement with RM-based predictions of the amount that RM missed. These results highlight the need for combined, probabilistic genome annotation approaches and suggest that the human genome consists of substantially more repetitive sequence than previously believed. PMID:22144907
Enhanced Methods for Local Ancestry Assignment in Sequenced Admixed Individuals

PubMed Central

Brown, Robert; Pasaniuc, Bogdan

2014-01-01

Inferring the ancestry at each locus in the genome of recently admixed individuals (e.g., Latino Americans) plays a major role in medical and population genetic inferences, ranging from finding disease-risk loci, to inferring recombination rates, to mapping missing contigs in the human genome. Although many methods for local ancestry inference have been proposed, most are designed for use with genotyping arrays and fail to make use of the full spectrum of data available from sequencing. In addition, current haplotype-based approaches are very computationally demanding, requiring large computational time for moderately large sample sizes. Here we present new methods for local ancestry inference that leverage continent-specific variants (CSVs) to attain increased performance over existing approaches in sequenced admixed genomes. A key feature of our approach is that it incorporates the admixed genomes themselves jointly with public datasets, such as 1000 Genomes, to improve the accuracy of CSV calling. We use simulations to show that our approach attains accuracy similar to widely used computationally intensive haplotype-based approaches with large decreases in runtime. Most importantly, we show that our method recovers comparable local ancestries, as the 1000 Genomes consensus local ancestry calls in the real admixed individuals from the 1000 Genomes Project. We extend our approach to account for low-coverage sequencing and show that accurate local ancestry inference can be attained at low sequencing coverage. Finally, we generalize CSVs to sub-continental population-specific variants (sCSVs) and show that in some cases it is possible to determine the sub-continental ancestry for short chromosomal segments on the basis of sCSVs. PMID:24743331
In vivo binding of PRDM9 reveals interactions with noncanonical genomic sites

PubMed Central

Grey, Corinne; Clément, Julie A.J.; Buard, Jérôme; Leblanc, Benjamin; Gut, Ivo; Gut, Marta; Duret, Laurent

2017-01-01

In mouse and human meiosis, DNA double-strand breaks (DSBs) initiate homologous recombination and occur at specific sites called hotspots. The localization of these sites is determined by the sequence-specific DNA binding domain of the PRDM9 histone methyl transferase. Here, we performed an extensive analysis of PRDM9 binding in mouse spermatocytes. Unexpectedly, we identified a noncanonical recruitment of PRDM9 to sites that lack recombination activity and the PRDM9 binding consensus motif. These sites include gene promoters, where PRDM9 is recruited in a DSB-dependent manner. Another subset reveals DSB-independent interactions between PRDM9 and genomic sites, such as the binding sites for the insulator protein CTCF. We propose that these DSB-independent sites result from interactions between hotspot-bound PRDM9 and genomic sequences located on the chromosome axis. PMID:28336543
Transcription factor profiling reveals molecular choreography and key regulators of human retrotransposon expression

PubMed Central

Sun, Xiaoji; Wang, Xuya; Tang, Zuojian; Grivainis, Mark; Kahler, David; Yun, Chi; Mita, Paolo; Fenyö, David

2018-01-01

Transposable elements (TEs) represent a substantial fraction of many eukaryotic genomes, and transcriptional regulation of these factors is important to determine TE activities in human cells. However, due to the repetitive nature of TEs, identifying transcription factor (TF)-binding sites from ChIP-sequencing (ChIP-seq) datasets is challenging. Current algorithms are focused on subtle differences between TE copies and thus bias the analysis to relatively old and inactive TEs. Here we describe an approach termed “MapRRCon” (mapping repeat reads to a consensus) which allows us to identify proteins binding to TE DNA sequences by mapping ChIP-seq reads to the TE consensus sequence after whole-genome alignment. Although this method does not assign binding sites to individual insertions in the genome, it provides a landscape of interacting TFs by capturing factors that bind to TEs under various conditions. We applied this method to screen TFs’ interaction with L1 in human cells/tissues using ENCODE ChIP-seq datasets and identified 178 of the 512 TFs tested as bound to L1 in at least one biological condition with most of them (138) localized to the promoter. Among these L1-binding factors, we focused on Myc and CTCF, as they play important roles in cancer progression and 3D chromatin structure formation. Furthermore, we explored the transcriptomes of The Cancer Genome Atlas breast and ovarian tumor samples in which a consistent anti-/correlation between L1 and Myc/CTCF expression was observed, suggesting that these two factors may play roles in regulating L1 transcription during the development of such tumors. PMID:29802231
Genome characterization of Sugarcane Yellow Leaf Virus with special reference to RNAi based molecular breeding.

PubMed

Khalil, Farghama; Yueyu, Xu; Naiyan, Xiao; Di, Liu; Tayyab, Muhammad; Hengbo, Wang; Islam, Waqar; Rauf, Saeed; Pinghua, Chen

2018-05-04

Sugarcane is an essential crop for sugar and biofuel. Globally, its production is severely affected by sugarcane yellow leaf disease (SCYLD) caused by Sugarcane Yellow Leaf Virus (SCYLV). Many aphid vectors are involved in the spread of the disease which reduced the effectiveness of cultural and chemical management. Empirical methods of plant breeding such as introgression from wild and cultivated germplasm were not possible or at least challenging due to the absence of resistance in cultivated and wild germplasm of sugarcane. RNA interference (RNAi) transformation is an effective method to create virus-resistant varieties. Nevertheless, limited progress has been made due to lack of comprehensive research program on SCYLV based on RNAi technique. In order to show improvement and to propose future strategies for the feasibility of the RNAi technique to cope SCYLV, genome-wide consensus sequences of SCYLV were analyzed through GenBank. The coverage rates of every consensus sequence in SCYLV isolates were calculated to evaluate their practicability. Our analysis showed that single consensus sequence from SCYLV could not work well for RNAi based sugarcane breeding programs. This may be due to high mutation rate and continuous recombination within and between various viral strains. Alternative multi-target RNAi strategy is suggested to combat several strains of the viruses and to reduce the silencing escape. The multi-target small interfering RNA (siRNA) can be used together to construct RNAi plant expression plasmid, and to transform sugarcane tissues to develop new sugarcane varieties resistant to SCYLV. Copyright © 2018 Elsevier Ltd. All rights reserved.
A new comprehensive method for detection of livestock-related pathogenic viruses using a target enrichment system.

PubMed

Oba, Mami; Tsuchiaka, Shinobu; Omatsu, Tsutomu; Katayama, Yukie; Otomaru, Konosuke; Hirata, Teppei; Aoki, Hiroshi; Murata, Yoshiteru; Makino, Shinji; Nagai, Makoto; Mizutani, Tetsuya

2018-01-08

We tested usefulness of a target enrichment system SureSelect, a comprehensive viral nucleic acid detection method, for rapid identification of viral pathogens in feces samples of cattle, pigs and goats. This system enriches nucleic acids of target viruses in clinical/field samples by using a library of biotinylated RNAs with sequences complementary to the target viruses. The enriched nucleic acids are amplified by PCR and subjected to next generation sequencing to identify the target viruses. In many samples, SureSelect target enrichment method increased efficiencies for detection of the viruses listed in the biotinylated RNA library. Furthermore, this method enabled us to determine nearly full-length genome sequence of porcine parainfluenza virus 1 and greatly increased Breadth, a value indicating the ratio of the mapping consensus length in the reference genome, in pig samples. Our data showed usefulness of SureSelect target enrichment system for comprehensive analysis of genomic information of various viruses in field samples. Copyright © 2017 Elsevier Inc. All rights reserved.
An international effort towards developing standards for best practices in analysis, interpretation and reporting of clinical genome sequencing results in the CLARITY Challenge.

PubMed

Brownstein, Catherine A; Beggs, Alan H; Homer, Nils; Merriman, Barry; Yu, Timothy W; Flannery, Katherine C; DeChene, Elizabeth T; Towne, Meghan C; Savage, Sarah K; Price, Emily N; Holm, Ingrid A; Luquette, Lovelace J; Lyon, Elaine; Majzoub, Joseph; Neupert, Peter; McCallie, David; Szolovits, Peter; Willard, Huntington F; Mendelsohn, Nancy J; Temme, Renee; Finkel, Richard S; Yum, Sabrina W; Medne, Livija; Sunyaev, Shamil R; Adzhubey, Ivan; Cassa, Christopher A; de Bakker, Paul I W; Duzkale, Hatice; Dworzyński, Piotr; Fairbrother, William; Francioli, Laurent; Funke, Birgit H; Giovanni, Monica A; Handsaker, Robert E; Lage, Kasper; Lebo, Matthew S; Lek, Monkol; Leshchiner, Ignaty; MacArthur, Daniel G; McLaughlin, Heather M; Murray, Michael F; Pers, Tune H; Polak, Paz P; Raychaudhuri, Soumya; Rehm, Heidi L; Soemedi, Rachel; Stitziel, Nathan O; Vestecka, Sara; Supper, Jochen; Gugenmus, Claudia; Klocke, Bernward; Hahn, Alexander; Schubach, Max; Menzel, Mortiz; Biskup, Saskia; Freisinger, Peter; Deng, Mario; Braun, Martin; Perner, Sven; Smith, Richard J H; Andorf, Janeen L; Huang, Jian; Ryckman, Kelli; Sheffield, Val C; Stone, Edwin M; Bair, Thomas; Black-Ziegelbein, E Ann; Braun, Terry A; Darbro, Benjamin; DeLuca, Adam P; Kolbe, Diana L; Scheetz, Todd E; Shearer, Aiden E; Sompallae, Rama; Wang, Kai; Bassuk, Alexander G; Edens, Erik; Mathews, Katherine; Moore, Steven A; Shchelochkov, Oleg A; Trapane, Pamela; Bossler, Aaron; Campbell, Colleen A; Heusel, Jonathan W; Kwitek, Anne; Maga, Tara; Panzer, Karin; Wassink, Thomas; Van Daele, Douglas; Azaiez, Hela; Booth, Kevin; Meyer, Nic; Segal, Michael M; Williams, Marc S; Tromp, Gerard; White, Peter; Corsmeier, Donald; Fitzgerald-Butt, Sara; Herman, Gail; Lamb-Thrush, Devon; McBride, Kim L; Newsom, David; Pierson, Christopher R; Rakowsky, Alexander T; Maver, Aleš; Lovrečić, Luca; Palandačić, Anja; Peterlin, Borut; Torkamani, Ali; Wedell, Anna; Huss, Mikael; Alexeyenko, Andrey; Lindvall, Jessica M; Magnusson, Måns; Nilsson, Daniel; Stranneheim, Henrik; Taylan, Fulya; Gilissen, Christian; Hoischen, Alexander; van Bon, Bregje; Yntema, Helger; Nelen, Marcel; Zhang, Weidong; Sager, Jason; Zhang, Lu; Blair, Kathryn; Kural, Deniz; Cariaso, Michael; Lennon, Greg G; Javed, Asif; Agrawal, Saloni; Ng, Pauline C; Sandhu, Komal S; Krishna, Shuba; Veeramachaneni, Vamsi; Isakov, Ofer; Halperin, Eran; Friedman, Eitan; Shomron, Noam; Glusman, Gustavo; Roach, Jared C; Caballero, Juan; Cox, Hannah C; Mauldin, Denise; Ament, Seth A; Rowen, Lee; Richards, Daniel R; San Lucas, F Anthony; Gonzalez-Garay, Manuel L; Caskey, C Thomas; Bai, Yu; Huang, Ying; Fang, Fang; Zhang, Yan; Wang, Zhengyuan; Barrera, Jorge; Garcia-Lobo, Juan M; González-Lamuño, Domingo; Llorca, Javier; Rodriguez, Maria C; Varela, Ignacio; Reese, Martin G; De La Vega, Francisco M; Kiruluta, Edward; Cargill, Michele; Hart, Reece K; Sorenson, Jon M; Lyon, Gholson J; Stevenson, David A; Bray, Bruce E; Moore, Barry M; Eilbeck, Karen; Yandell, Mark; Zhao, Hongyu; Hou, Lin; Chen, Xiaowei; Yan, Xiting; Chen, Mengjie; Li, Cong; Yang, Can; Gunel, Murat; Li, Peining; Kong, Yong; Alexander, Austin C; Albertyn, Zayed I; Boycott, Kym M; Bulman, Dennis E; Gordon, Paul M K; Innes, A Micheil; Knoppers, Bartha M; Majewski, Jacek; Marshall, Christian R; Parboosingh, Jillian S; Sawyer, Sarah L; Samuels, Mark E; Schwartzentruber, Jeremy; Kohane, Isaac S; Margulies, David M

2014-03-25

There is tremendous potential for genome sequencing to improve clinical diagnosis and care once it becomes routinely accessible, but this will require formalizing research methods into clinical best practices in the areas of sequence data generation, analysis, interpretation and reporting. The CLARITY Challenge was designed to spur convergence in methods for diagnosing genetic disease starting from clinical case history and genome sequencing data. DNA samples were obtained from three families with heritable genetic disorders and genomic sequence data were donated by sequencing platform vendors. The challenge was to analyze and interpret these data with the goals of identifying disease-causing variants and reporting the findings in a clinically useful format. Participating contestant groups were solicited broadly, and an independent panel of judges evaluated their performance. A total of 30 international groups were engaged. The entries reveal a general convergence of practices on most elements of the analysis and interpretation process. However, even given this commonality of approach, only two groups identified the consensus candidate variants in all disease cases, demonstrating a need for consistent fine-tuning of the generally accepted methods. There was greater diversity of the final clinical report content and in the patient consenting process, demonstrating that these areas require additional exploration and standardization. The CLARITY Challenge provides a comprehensive assessment of current practices for using genome sequencing to diagnose and report genetic diseases. There is remarkable convergence in bioinformatic techniques, but medical interpretation and reporting are areas that require further development by many groups.

An international effort towards developing standards for best practices in analysis, interpretation and reporting of clinical genome sequencing results in the CLARITY Challenge

PubMed Central

2014-01-01

Background There is tremendous potential for genome sequencing to improve clinical diagnosis and care once it becomes routinely accessible, but this will require formalizing research methods into clinical best practices in the areas of sequence data generation, analysis, interpretation and reporting. The CLARITY Challenge was designed to spur convergence in methods for diagnosing genetic disease starting from clinical case history and genome sequencing data. DNA samples were obtained from three families with heritable genetic disorders and genomic sequence data were donated by sequencing platform vendors. The challenge was to analyze and interpret these data with the goals of identifying disease-causing variants and reporting the findings in a clinically useful format. Participating contestant groups were solicited broadly, and an independent panel of judges evaluated their performance. Results A total of 30 international groups were engaged. The entries reveal a general convergence of practices on most elements of the analysis and interpretation process. However, even given this commonality of approach, only two groups identified the consensus candidate variants in all disease cases, demonstrating a need for consistent fine-tuning of the generally accepted methods. There was greater diversity of the final clinical report content and in the patient consenting process, demonstrating that these areas require additional exploration and standardization. Conclusions The CLARITY Challenge provides a comprehensive assessment of current practices for using genome sequencing to diagnose and report genetic diseases. There is remarkable convergence in bioinformatic techniques, but medical interpretation and reporting are areas that require further development by many groups. PMID:24667040
Rice MEL2, the RNA recognition motif (RRM) protein, binds in vitro to meiosis-expressed genes containing U-rich RNA consensus sequences in the 3'-UTR.

PubMed

Miyazaki, Saori; Sato, Yutaka; Asano, Tomoya; Nagamura, Yoshiaki; Nonomura, Ken-Ichi

2015-10-01

Post-transcriptional gene regulation by RNA recognition motif (RRM) proteins through binding to cis-elements in the 3'-untranslated region (3'-UTR) is widely used in eukaryotes to complete various biological processes. Rice MEIOSIS ARRESTED AT LEPTOTENE2 (MEL2) is the RRM protein that functions in the transition to meiosis in proper timing. The MEL2 RRM preferentially associated with the U-rich RNA consensus, UUAGUU[U/A][U/G][A/U/G]U, dependently on sequences and proportionally to MEL2 protein amounts in vitro. The consensus sequences were located in the putative looped structures of the RNA ligand. A genome-wide survey revealed a tendency of MEL2-binding consensus appearing in 3'-UTR of rice genes. Of 249 genes that conserved the consensus in their 3'-UTR, 13 genes spatiotemporally co-expressed with MEL2 in meiotic flowers, and included several genes whose function was supposed in meiosis; such as Replication protein A and OsMADS3. The proteome analysis revealed that the amounts of small ubiquitin-related modifier-like protein and eukaryotic translation initiation factor3-like protein were dramatically altered in mel2 mutant anthers. Taken together with transcriptome and gene ontology results, we propose that the rice MEL2 is involved in the translational regulation of key meiotic genes on 3'-UTRs to achieve the faithful transition of germ cells to meiosis.
A Distance Measure for Genome Phylogenetic Analysis

NASA Astrophysics Data System (ADS)

Cao, Minh Duc; Allison, Lloyd; Dix, Trevor

Phylogenetic analyses of species based on single genes or parts of the genomes are often inconsistent because of factors such as variable rates of evolution and horizontal gene transfer. The availability of more and more sequenced genomes allows phylogeny construction from complete genomes that is less sensitive to such inconsistency. For such long sequences, construction methods like maximum parsimony and maximum likelihood are often not possible due to their intensive computational requirement. Another class of tree construction methods, namely distance-based methods, require a measure of distances between any two genomes. Some measures such as evolutionary edit distance of gene order and gene content are computational expensive or do not perform well when the gene content of the organisms are similar. This study presents an information theoretic measure of genetic distances between genomes based on the biological compression algorithm expert model. We demonstrate that our distance measure can be applied to reconstruct the consensus phylogenetic tree of a number of Plasmodium parasites from their genomes, the statistical bias of which would mislead conventional analysis methods. Our approach is also used to successfully construct a plausible evolutionary tree for the γ-Proteobacteria group whose genomes are known to contain many horizontally transferred genes.
New consensus nomenclature for mammalian keratins

PubMed Central

Schweizer, Jürgen; Bowden, Paul E.; Coulombe, Pierre A.; Langbein, Lutz; Lane, E. Birgitte; Magin, Thomas M.; Maltais, Lois; Omary, M. Bishr; Parry, David A.D.; Rogers, Michael A.; Wright, Mathew W.

2006-01-01

Keratins are intermediate filament–forming proteins that provide mechanical support and fulfill a variety of additional functions in epithelial cells. In 1982, a nomenclature was devised to name the keratin proteins that were known at that point. The systematic sequencing of the human genome in recent years uncovered the existence of several novel keratin genes and their encoded proteins. Their naming could not be adequately handled in the context of the original system. We propose a new consensus nomenclature for keratin genes and proteins that relies upon and extends the 1982 system and adheres to the guidelines issued by the Human and Mouse Genome Nomenclature Committees. This revised nomenclature accommodates functional genes and pseudogenes, and although designed specifically for the full complement of human keratins, it offers the flexibility needed to incorporate additional keratins from other mammalian species. PMID:16831889
Standard Mutation Nomenclature in Molecular Diagnostics

PubMed Central

Ogino, Shuji; Gulley, Margaret L.; den Dunnen, Johan T.; Wilson, Robert B.

2007-01-01

To translate basic research findings into clinical practice, it is essential that information about mutations and variations in the human genome are communicated easily and unequivocally. Unfortunately, there has been much confusion regarding the description of genetic sequence variants. This is largely because research articles that first report novel sequence variants do not often use standard nomenclature, and the final genomic sequence is compiled over many separate entries. In this article, we discuss issues crucial to clear communication, using examples of genes that are commonly assayed in clinical laboratories. Although molecular diagnostics is a dynamic field, this should not inhibit the need for and movement toward consensus nomenclature for accurate reporting among laboratories. Our aim is to alert laboratory scientists and other health care professionals to the important issues and provide a foundation for further discussions that will ultimately lead to solutions. PMID:17251329
SWPhylo - A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees.

PubMed

Yu, Xiaoyu; Reva, Oleg N

2018-01-01

Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, inappropriateness of standard phylogenetic tools to process genome-wide data, and lack of reliable substitution models which correlates with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment not mentioning the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent to the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, eg, GyrA.
SWPhylo – A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees

PubMed Central

Yu, Xiaoyu; Reva, Oleg N

2018-01-01

Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, inappropriateness of standard phylogenetic tools to process genome-wide data, and lack of reliable substitution models which correlates with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment not mentioning the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent to the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, eg, GyrA. PMID:29511354
Bovine Genome Database: supporting community annotation and analysis of the Bos taurus genome

PubMed Central

2010-01-01

Background A goal of the Bovine Genome Database (BGD; http://BovineGenome.org) has been to support the Bovine Genome Sequencing and Analysis Consortium (BGSAC) in the annotation and analysis of the bovine genome. We were faced with several challenges, including the need to maintain consistent quality despite diversity in annotation expertise in the research community, the need to maintain consistent data formats, and the need to minimize the potential duplication of annotation effort. With new sequencing technologies allowing many more eukaryotic genomes to be sequenced, the demand for collaborative annotation is likely to increase. Here we present our approach, challenges and solutions facilitating a large distributed annotation project. Results and Discussion BGD has provided annotation tools that supported 147 members of the BGSAC in contributing 3,871 gene models over a fifteen-week period, and these annotations have been integrated into the bovine Official Gene Set. Our approach has been to provide an annotation system, which includes a BLAST site, multiple genome browsers, an annotation portal, and the Apollo Annotation Editor configured to connect directly to our Chado database. In addition to implementing and integrating components of the annotation system, we have performed computational analyses to create gene evidence tracks and a consensus gene set, which can be viewed on individual gene pages at BGD. Conclusions We have provided annotation tools that alleviate challenges associated with distributed annotation. Our system provides a consistent set of data to all annotators and eliminates the need for annotators to format data. Involving the bovine research community in genome annotation has allowed us to leverage expertise in various areas of bovine biology to provide biological insight into the genome sequence. PMID:21092105
Chromosome-Encoded Broad-Spectrum Ambler Class A β-Lactamase RUB-1 from Serratia rubidaea

PubMed Central

Didi, Jennifer; Ergani, Ayla; Lima, Sandra

2016-01-01

ABSTRACT Whole-genome sequencing of Serratia rubidaea CIP 103234T revealed a chromosomally located Ambler class A β-lactamase gene. The gene was cloned, and the β-lactamase, RUB-1, was characterized. RUB-1 displayed 74% and 73% amino acid sequence identity with the GIL-1 and TEM-1 penicillinases, respectively, and its substrate profile was similar to that of the latter β-lactamases. Analysis by 5′ rapid amplification of cDNA ends revealed promoter sequences highly divergent from the Escherichia coli σ70 consensus sequence. This work further illustrates the heterogeneity of β-lactamases among Serratia spp. PMID:27956418
Chromosome-Encoded Broad-Spectrum Ambler Class A β-Lactamase RUB-1 from Serratia rubidaea.

PubMed

Bonnin, Rémy A; Didi, Jennifer; Ergani, Ayla; Lima, Sandra; Naas, Thierry

2017-02-01

Whole-genome sequencing of Serratia rubidaea CIP 103234 T revealed a chromosomally located Ambler class A β-lactamase gene. The gene was cloned, and the β-lactamase, RUB-1, was characterized. RUB-1 displayed 74% and 73% amino acid sequence identity with the GIL-1 and TEM-1 penicillinases, respectively, and its substrate profile was similar to that of the latter β-lactamases. Analysis by 5' rapid amplification of cDNA ends revealed promoter sequences highly divergent from the Escherichia coli σ 70 consensus sequence. This work further illustrates the heterogeneity of β-lactamases among Serratia spp. Copyright © 2017 American Society for Microbiology.
Cloning of a CACTA transposon-like insertion in intron I of tomato invertase Lin5 gene and identification of transposase-like sequences of Solanaceae species.

PubMed

Proels, Reinhard K; Roitsch, Thomas

2006-03-01

Very few CACTA transposon-like sequences have been described in Solanaceae species. Sequence information has been restricted to partial transposase (TPase)-like fragments, and no target gene of CACTA-like transposon insertion has been described in tomato to date. In this manuscript, we report on a CACTA transposon-like insertion in intron I of tomato (Lycopersicon esculentum) invertase gene Lin5 and TPase-like sequences of several Solanaceae species. Consensus primers deduced from the TPase region of the tomato CACTA transposon-like element allowed the amplification of similar sequences from various Solanaceae species of different subfamilies including Solaneae (Solanum tuberosum), Cestreae (Nicotiana tabacum) and Datureae (Datura stramonium). This demonstrates the ubiquitous presence of CACTA-like elements in Solanaceae genomes. The obtained partial sequences are highly conserved, and allow further detection and detailed analysis of CACTA-like transposons throughout Solanaceae species. CACTA-like transposon sequences make possible the evaluation of their use for genome analysis, functional studies of genes and the evolutionary relationships between plant species.
Packaging signals in two single-stranded RNA viruses imply a conserved assembly mechanism and geometry of the packaged genome.

PubMed

Dykeman, Eric C; Stockley, Peter G; Twarock, Reidun

2013-09-09

The current paradigm for assembly of single-stranded RNA viruses is based on a mechanism involving non-sequence-specific packaging of genomic RNA driven by electrostatic interactions. Recent experiments, however, provide compelling evidence for sequence specificity in this process both in vitro and in vivo. The existence of multiple RNA packaging signals (PSs) within viral genomes has been proposed, which facilitates assembly by binding coat proteins in such a way that they promote the protein-protein contacts needed to build the capsid. The binding energy from these interactions enables the confinement or compaction of the genomic RNAs. Identifying the nature of such PSs is crucial for a full understanding of assembly, which is an as yet untapped potential drug target for this important class of pathogens. Here, for two related bacterial viruses, we determine the sequences and locations of their PSs using Hamiltonian paths, a concept from graph theory, in combination with bioinformatics and structural studies. Their PSs have a common secondary structure motif but distinct consensus sequences and positions within the respective genomes. Despite these differences, the distributions of PSs in both viruses imply defined conformations for the packaged RNA genomes in contact with the protein shell in the capsid, consistent with a recent asymmetric structure determination of the MS2 virion. The PS distributions identified moreover imply a preferred, evolutionarily conserved assembly pathway with respect to the RNA sequence with potentially profound implications for other single-stranded RNA viruses known to have RNA PSs, including many animal and human pathogens. Copyright © 2013 Elsevier Ltd. All rights reserved.
Accurate multiplex polony sequencing of an evolved bacterial genome.

PubMed

Shendure, Jay; Porreca, Gregory J; Reppas, Nikos B; Lin, Xiaoxia; McCutcheon, John P; Rosenbaum, Abraham M; Wang, Michael D; Zhang, Kun; Mitra, Robi D; Church, George M

2005-09-09

We describe a DNA sequencing technology in which a commonly available, inexpensive epifluorescence microscope is converted to rapid nonelectrophoretic DNA sequencing automation. We apply this technology to resequence an evolved strain of Escherichia coli at less than one error per million consensus bases. A cell-free, mate-paired library provided single DNA molecules that were amplified in parallel to 1-micrometer beads by emulsion polymerase chain reaction. Millions of beads were immobilized in a polyacrylamide gel and subjected to automated cycles of sequencing by ligation and four-color imaging. Cost per base was roughly one-ninth as much as that of conventional sequencing. Our protocols were implemented with off-the-shelf instrumentation and reagents.
The autism sequencing consortium: large-scale, high-throughput sequencing in autism spectrum disorders.

PubMed

Buxbaum, Joseph D; Daly, Mark J; Devlin, Bernie; Lehner, Thomas; Roeder, Kathryn; State, Matthew W

2012-12-20

Research during the past decade has seen significant progress in the understanding of the genetic architecture of autism spectrum disorders (ASDs), with gene discovery accelerating as the characterization of genomic variation has become increasingly comprehensive. At the same time, this research has highlighted ongoing challenges. Here we address the enormous impact of high-throughput sequencing (HTS) on ASD gene discovery, outline a consensus view for leveraging this technology, and describe a large multisite collaboration developed to accomplish these goals. Similar approaches could prove effective for severe neurodevelopmental disorders more broadly. Copyright © 2012 Elsevier Inc. All rights reserved.
Endogenous lentivirus in Malayan colugo (Galeopterus variegatus), a close relative of primates.

PubMed

Hron, Tomáš; Fábryová, Helena; Pačes, Jan; Elleder, Daniel

2014-10-04

A significant fraction of mammalian genomes is composed of endogenous retroviral (ERV) sequences that are formed by germline infiltration of various retroviruses. In contrast to other retroviral genera, lentiviruses only rarely form ERV copies. We performed a computational search aimed at identification of novel endogenous lentiviruses in vertebrate genomes. Using the in silico strategy, we have screened 104 publicly available vertebrate genomes for the presence of endogenous lentivirus sequences. In addition to the previously described cases, the search revealed the presence of endogenous lentivirus in the genome of Malayan colugo (Galeopterus variegatus). At least three complete copies of this virus, denoted ELVgv, were detected in the colugo genome, and approximately one hundred solo LTR sequences. The assembled consensus sequence of ELVgv had typical lentivirus genome organization including three predicted accessory genes. Phylogenetic analysis placed this virus as a distinct subgroup within the lentivirus genus. The time of insertion into the dermopteran lineage was estimated to be more than thirteen million years ago. We report the discovery of the first endogenous lentivirus in the mammalian order Dermoptera, which is a taxon close to the Primates. Lentiviruses have infiltrated the mammalian germline several times across millions of years. The colugo virus described here represents possibly the oldest documented endogenization event and its discovery can lead to new insights into lentivirus evolution. This is also the first report of an endogenous lentivirus in an Asian mammal, indicating a long-term presence of this retrovirus family in Asian continent.
BreaKmer: detection of structural variation in targeted massively parallel sequencing data using kmers.

PubMed

Abo, Ryan P; Ducar, Matthew; Garcia, Elizabeth P; Thorner, Aaron R; Rojas-Rudilla, Vanesa; Lin, Ling; Sholl, Lynette M; Hahn, William C; Meyerson, Matthew; Lindeman, Neal I; Van Hummelen, Paul; MacConaill, Laura E

2015-02-18

Genomic structural variation (SV), a common hallmark of cancer, has important predictive and therapeutic implications. However, accurately detecting SV using high-throughput sequencing data remains challenging, especially for 'targeted' resequencing efforts. This is critically important in the clinical setting where targeted resequencing is frequently being applied to rapidly assess clinically actionable mutations in tumor biopsies in a cost-effective manner. We present BreaKmer, a novel approach that uses a 'kmer' strategy to assemble misaligned sequence reads for predicting insertions, deletions, inversions, tandem duplications and translocations at base-pair resolution in targeted resequencing data. Variants are predicted by realigning an assembled consensus sequence created from sequence reads that were abnormally aligned to the reference genome. Using targeted resequencing data from tumor specimens with orthogonally validated SV, non-tumor samples and whole-genome sequencing data, BreaKmer had a 97.4% overall sensitivity for known events and predicted 17 positively validated, novel variants. Relative to four publically available algorithms, BreaKmer detected SV with increased sensitivity and limited calls in non-tumor samples, key features for variant analysis of tumor specimens in both the clinical and research settings. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Identification of genes in anonymous DNA sequences. Annual performance report, February 1, 1991--January 31, 1992

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fields, C.A.

1996-06-01

The objective of this project is the development of practical software to automate the identification of genes in anonymous DNA sequences from the human, and other higher eukaryotic genomes. A software system for automated sequence analysis, gm (gene modeler) has been designed, implemented, tested, and distributed to several dozen laboratories worldwide. A significantly faster, more robust, and more flexible version of this software, gm 2.0 has now been completed, and is being tested by operational use to analyze human cosmid sequence data. A range of efforts to further understand the features of eukaryoyic gene sequences are also underway. This progressmore » report also contains papers coming out of the project including the following: gm: a Tool for Exploratory Analysis of DNA Sequence Data; The Human THE-LTR(O) and MstII Interspersed Repeats are subfamilies of a single widely distruted highly variable repeat family; Information contents and dinucleotide compostions of plant intron sequences vary with evolutionary origin; Splicing signals in Drosophila: intron size, information content, and consensus sequences; Integration of automated sequence analysis into mapping and sequencing projects; Software for the C. elegans genome project.« less
Different phylogenomic approaches to resolve the evolutionary relationships among model fish species.

PubMed

Negrisolo, Enrico; Kuhl, Heiner; Forcato, Claudio; Vitulo, Nicola; Reinhardt, Richard; Patarnello, Tomaso; Bargelloni, Luca

2010-12-01

Comparative genomics holds the promise to magnify the information obtained from individual genome sequencing projects, revealing common features conserved across genomes and identifying lineage-specific characteristics. To implement such a comparative approach, a robust phylogenetic framework is required to accurately reconstruct evolution at the genome level. Among vertebrate taxa, teleosts represent the second best characterized group, with high-quality draft genome sequences for five model species (Danio rerio, Gasterosteus aculeatus, Oryzias latipes, Takifugu rubripes, and Tetraodon nigroviridis), and several others are in the finishing lane. However, the relationships among the acanthomorph teleost model fishes remain an unresolved taxonomic issue. Here, a genomic region spanning over 1.2 million base pairs was sequenced in the teleost fish Dicentrarchus labrax. Together with genomic data available for the above fish models, the new sequence was used to identify unique orthologous genomic regions shared across all target taxa. Different strategies were applied to produce robust multiple gene and genomic alignments spanning from 11,802 to 186,474 amino acid/nucleotide positions. Ten data sets were analyzed according to Bayesian inference, maximum likelihood, maximum parsimony, and neighbor joining methods. Extensive analyses were performed to explore the influence of several factors (e.g., alignment methodology, substitution model, data set partitions, and long-branch attraction) on the tree topology. Although a general consensus was observed for a closer relationship between G. aculeatus (Gasterosteidae) and Di. labrax (Moronidae) with the atherinomorph O. latipes (Beloniformes) sister taxon of this clade, with the tetraodontiform group Ta. rubripes and Te. nigroviridis (Tetraodontiformes) representing a more distantly related taxon among acanthomorph model fish species, conflicting results were obtained between data sets and methods, especially with respect to the choice of alignment methodology applied to noncoding parts of the genomic region under study. This may limit the use of intergenic/noncoding sequences in phylogenomics until more robust alignment algorithms are developed.
New bioinformatic tool for quick identification of functionally relevant endogenous retroviral inserts in human genome.

PubMed

Garazha, Andrew; Ivanova, Alena; Suntsova, Maria; Malakhova, Galina; Roumiantsev, Sergey; Zhavoronkov, Alex; Buzdin, Anton

2015-01-01

Endogenous retroviruses (ERVs) and LTR retrotransposons (LRs) occupy ∼8% of human genome. Deep sequencing technologies provide clues to understanding of functional relevance of individual ERVs/LRs by enabling direct identification of transcription factor binding sites (TFBS) and other landmarks of functional genomic elements. Here, we performed the genome-wide identification of human ERVs/LRs containing TFBS according to the ENCODE project. We created the first interactive ERV/LRs database that groups the individual inserts according to their familial nomenclature, number of mapped TFBS and divergence from their consensus sequence. Information on any particular element can be easily extracted by the user. We also created a genome browser tool, which enables quick mapping of any ERV/LR insert according to genomic coordinates, known human genes and TFBS. These tools can be used to easily explore functionally relevant individual ERV/LRs, and for studying their impact on the regulation of human genes. Overall, we identified ∼110,000 ERV/LR genomic elements having TFBS. We propose a hypothesis of "domestication" of ERV/LR TFBS by the genome milieu including subsequent stages of initial epigenetic repression, partial functional release, and further mutation-driven reshaping of TFBS in tight coevolution with the enclosing genomic loci.
Identification of multiple binding sites for the THAP domain of the Galileo transposase in the long terminal inverted-repeats☆

PubMed Central

Marzo, Mar; Liu, Danxu; Ruiz, Alfredo; Chalmers, Ronald

2013-01-01

Galileo is a DNA transposon responsible for the generation of several chromosomal inversions in Drosophila. In contrast to other members of the P-element superfamily, it has unusually long terminal inverted-repeats (TIRs) that resemble those of Foldback elements. To investigate the function of the long TIRs we derived consensus and ancestral sequences for the Galileo transposase in three species of Drosophilids. Following gene synthesis, we expressed and purified their constituent THAP domains and tested their binding activity towards the respective Galileo TIRs. DNase I footprinting located the most proximal DNA binding site about 70 bp from the transposon end. Using this sequence we identified further binding sites in the tandem repeats that are found within the long TIRs. This suggests that the synaptic complex between Galileo ends may be a complicated structure containing higher-order multimers of the transposase. We also attempted to reconstitute Galileo transposition in Drosophila embryos but no events were detected. Thus, although the limited numbers of Galileo copies in each genome were sufficient to provide functional consensus sequences for the THAP domains, they do not specify a fully active transposase. Since the THAP recognition sequence is short, and will occur many times in a large genome, it seems likely that the multiple binding sites within the long, internally repetitive, TIRs of Galileo and other Foldback-like elements may provide the transposase with its binding specificity. PMID:23648487

Identification of multiple binding sites for the THAP domain of the Galileo transposase in the long terminal inverted-repeats.

PubMed

Marzo, Mar; Liu, Danxu; Ruiz, Alfredo; Chalmers, Ronald

2013-08-01

Galileo is a DNA transposon responsible for the generation of several chromosomal inversions in Drosophila. In contrast to other members of the P-element superfamily, it has unusually long terminal inverted-repeats (TIRs) that resemble those of Foldback elements. To investigate the function of the long TIRs we derived consensus and ancestral sequences for the Galileo transposase in three species of Drosophilids. Following gene synthesis, we expressed and purified their constituent THAP domains and tested their binding activity towards the respective Galileo TIRs. DNase I footprinting located the most proximal DNA binding site about 70 bp from the transposon end. Using this sequence we identified further binding sites in the tandem repeats that are found within the long TIRs. This suggests that the synaptic complex between Galileo ends may be a complicated structure containing higher-order multimers of the transposase. We also attempted to reconstitute Galileo transposition in Drosophila embryos but no events were detected. Thus, although the limited numbers of Galileo copies in each genome were sufficient to provide functional consensus sequences for the THAP domains, they do not specify a fully active transposase. Since the THAP recognition sequence is short, and will occur many times in a large genome, it seems likely that the multiple binding sites within the long, internally repetitive, TIRs of Galileo and other Foldback-like elements may provide the transposase with its binding specificity. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.
Multiplex Degenerate Primer Design for Targeted Whole Genome Amplification of Many Viral Genomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gardner, Shea N.; Jaing, Crystal J.; Elsheikh, Maher M.

Background . Targeted enrichment improves coverage of highly mutable viruses at low concentration in complex samples. Degenerate primers that anneal to conserved regions can facilitate amplification of divergent, low concentration variants, even when the strain present is unknown. Results . A tool for designing multiplex sets of degenerate sequencing primers to tile overlapping amplicons across multiple whole genomes is described. The new script, run_tiled_primers, is part of the PriMux software. Primers were designed for each segment of South American hemorrhagic fever viruses, tick-borne encephalitis, Henipaviruses, Arenaviruses, Filoviruses, Crimean-Congo hemorrhagic fever virus, Rift Valley fever virus, and Japanese encephalitis virus. Eachmore » group is highly diverse with as little as 5% genome consensus. Primer sets were computationally checked for nontarget cross reactions against the NCBI nucleotide sequence database. Primers for murine hepatitis virus were demonstrated in the lab to specifically amplify selected genes from a laboratory cultured strain that had undergone extensive passage in vitro and in vivo. Conclusions . This software should help researchers design multiplex sets of primers for targeted whole genome enrichment prior to sequencing to obtain better coverage of low titer, divergent viruses. Applications include viral discovery from a complex background and improved sensitivity and coverage of rapidly evolving strains or variants in a gene family.« less
Multiplex Degenerate Primer Design for Targeted Whole Genome Amplification of Many Viral Genomes

DOE PAGES

Gardner, Shea N.; Jaing, Crystal J.; Elsheikh, Maher M.; ...

2014-01-01

Background . Targeted enrichment improves coverage of highly mutable viruses at low concentration in complex samples. Degenerate primers that anneal to conserved regions can facilitate amplification of divergent, low concentration variants, even when the strain present is unknown. Results . A tool for designing multiplex sets of degenerate sequencing primers to tile overlapping amplicons across multiple whole genomes is described. The new script, run_tiled_primers, is part of the PriMux software. Primers were designed for each segment of South American hemorrhagic fever viruses, tick-borne encephalitis, Henipaviruses, Arenaviruses, Filoviruses, Crimean-Congo hemorrhagic fever virus, Rift Valley fever virus, and Japanese encephalitis virus. Eachmore » group is highly diverse with as little as 5% genome consensus. Primer sets were computationally checked for nontarget cross reactions against the NCBI nucleotide sequence database. Primers for murine hepatitis virus were demonstrated in the lab to specifically amplify selected genes from a laboratory cultured strain that had undergone extensive passage in vitro and in vivo. Conclusions . This software should help researchers design multiplex sets of primers for targeted whole genome enrichment prior to sequencing to obtain better coverage of low titer, divergent viruses. Applications include viral discovery from a complex background and improved sensitivity and coverage of rapidly evolving strains or variants in a gene family.« less
High resolution identity testing of inactivated poliovirus vaccines

PubMed Central

Mee, Edward T.; Minor, Philip D.; Martin, Javier

2015-01-01

Background Definitive identification of poliovirus strains in vaccines is essential for quality control, particularly where multiple wild-type and Sabin strains are produced in the same facility. Sequence-based identification provides the ultimate in identity testing and would offer several advantages over serological methods. Methods We employed random RT-PCR and high throughput sequencing to recover full-length genome sequences from monovalent and trivalent poliovirus vaccine products at various stages of the manufacturing process. Results All expected strains were detected in previously characterised products and the method permitted identification of strains comprising as little as 0.1% of sequence reads. Highly similar Mahoney and Sabin 1 strains were readily discriminated on the basis of specific variant positions. Analysis of a product known to contain incorrect strains demonstrated that the method correctly identified the contaminants. Conclusion Random RT-PCR and shotgun sequencing provided high resolution identification of vaccine components. In addition to the recovery of full-length genome sequences, the method could also be easily adapted to the characterisation of minor variant frequencies and distinction of closely related products on the basis of distinguishing consensus and low frequency polymorphisms. PMID:26049003
A comparative analysis of exome capture.

PubMed

Parla, Jennifer S; Iossifov, Ivan; Grabill, Ian; Spector, Mona S; Kramer, Melissa; McCombie, W Richard

2011-09-29

Human exome resequencing using commercial target capture kits has been and is being used for sequencing large numbers of individuals to search for variants associated with various human diseases. We rigorously evaluated the capabilities of two solution exome capture kits. These analyses help clarify the strengths and limitations of those data as well as systematically identify variables that should be considered in the use of those data. Each exome kit performed well at capturing the targets they were designed to capture, which mainly corresponds to the consensus coding sequences (CCDS) annotations of the human genome. In addition, based on their respective targets, each capture kit coupled with high coverage Illumina sequencing produced highly accurate nucleotide calls. However, other databases, such as the Reference Sequence collection (RefSeq), define the exome more broadly, and so not surprisingly, the exome kits did not capture these additional regions. Commercial exome capture kits provide a very efficient way to sequence select areas of the genome at very high accuracy. Here we provide the data to help guide critical analyses of sequencing data derived from these products.
Identification of Sinorhizobium (Ensifer) medicae based on a specific genomic sequence unveiled by M13-PCR fingerprinting.

PubMed

Dourado, Ana Catarina; Alves, Paula I L; Tenreiro, Tania; Ferreira, Eugénio M; Tenreiro, Rogério; Fareleira, Paula; Crespo, M Teresa Barreto

2009-12-01

A collection of nodule isolates from Medicago polymorpha obtained from southern and central Portugal was evaluated by M13-PCR fingerprinting and hierarchical cluster analysis. Several genomic clusters were obtained which, by 16S rRNA gene sequencing of selected representatives, were shown to be associated with particular taxonomic groups of rhizobia and other soil bacteria. The method provided a clear separation between rhizobia and co-isolated non-symbiotic soil contaminants. Ten M13-PCR groups were assigned to Sinorhizobium (Ensifer) medicae and included all isolates responsible for the formation of nitrogen-fixing nodules upon re-inoculation of M. polymorpha test-plants. In addition, enterobacterial repetitive intergenic consensus (ERIC)-PCR fingerprinting indicated a high genomic heterogeneity within the major M13- PCR clusters of S. medicae isolates. Based on nucleotide sequence data of an M13-PCR amplicon of ca. 1500 bp, observed only in S. medicae isolates and spanning locus Smed_3707 to Smed_3709 from the pSMED01 plasmid sequence of S. medicae WSM419 genome's sequence, a pair of PCR primers was designed and used for direct PCR amplification of a 1399-bp sequence within this fragment. Additional in silico and in vitro experiments, as well as phylogenetic analysis, confirmed the specificity of this primer combination and therefore the reliability of this approach in the prompt identification of S. medicae isolates and their distinction from other soil bacteria.
Insights Into Upland Cotton (Gossypium hirsutum L.) Genetic Recombination Based on 3 High-Density Single-Nucleotide Polymorphism and a Consensus Map Developed Independently With Common Parents. Genomics Insights

USDA-ARS?s Scientific Manuscript database

High-density linkage maps are vital to supporting the correct placement of scaffolds and gene sequences on chromosomes and fundamental to contemporary organismal research and scientific approaches to genetic improvement; high-density linkage maps are especially important in paleopolyploids with exce...
Structure, replication efficiency and fragility of yeast ARS elements.

PubMed

Dhar, Manoj K; Sehgal, Shelly; Kaul, Sanjana

2012-05-01

DNA replication in eukaryotes initiates at specific sites known as origins of replication, or replicators. These replication origins occur throughout the genome, though the propensity of their occurrence depends on the type of organism. In eukaryotes, zones of initiation of replication spanning from about 100 to 50,000 base pairs have been reported. The characteristics of eukaryotic replication origins are best understood in the budding yeast Saccharomyces cerevisiae, where some autonomously replicating sequences, or ARS elements, confer origin activity. ARS elements are short DNA sequences of a few hundred base pairs, identified by their efficiency at initiating a replication event when cloned in a plasmid. ARS elements, although structurally diverse, maintain a basic structure composed of three domains, A, B and C. Domain A is comprised of a consensus sequence designated ACS (ARS consensus sequence), while the B domain has the DNA unwinding element and the C domain is important for DNA-protein interactions. Although there are ∼400 ARS elements in the yeast genome, not all of them are active origins of replication. Different groups within the genus Saccharomyces have ARS elements as components of replication origin. The present paper provides a comprehensive review of various aspects of ARSs, starting from their structural conservation to sequence thermodynamics. All significant and conserved functional sequence motifs within different types of ARS elements have been extensively described. Issues like silencing at ARSs, their inherent fragility and factors governing their replication efficiency have also been addressed. Progress in understanding crucial components associated with the replication machinery and timing at these ARS elements is discussed in the section entitled "The replicon revisited". Copyright © 2012 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.
[A family of short retroposons (Squaml) from squamate reptiles (Reptilia: Squamata): structure, evolution and correlation with phylogeny].

PubMed

Kosushkin, S A; Borodulina, O R; Solov'eva, E N; Grechko, V V

2008-01-01

We have isolated and characterised sequences of a SINE family specific for squamate reptiles from a genome of lacertid lizard that we called Squam1. Copies are 360-390 bp in length and share a significant similarity with tRNA gene sequence on its 5'-end. This family was also detected by us in DNA of representatives of varanids, iguanids (anolis), gekkonids, and snakes. No signs of it were found in DNA of mammals, birds, amphibians, and crocodiles. Detailed analysis of primary structure of the retroposons obtained by us from genomic libraries or GenBank sequences was carried out. Most taxa possess 2-3 subfamilies of the SINE in their genomes with specific diagnostic features in their primary structure. Individual variability of copies in different families is about 85% and is just slightly lower on the genera level. Comparison of consensus sequences on family level reveals a high degree of structural similarity with a number of specific apomorphic features which makes it a useful marker of phylogeny for this group of reptiles. Snakes do not show specific affinity to varanids when compared to other lizards, as it was suggested earlier.
Mitochondrial Genomes of Kinorhyncha: trnM Duplication and New Gene Orders within Animals.

PubMed

Popova, Olga V; Mikhailov, Kirill V; Nikitin, Mikhail A; Logacheva, Maria D; Penin, Aleksey A; Muntyan, Maria S; Kedrova, Olga S; Petrov, Nikolai B; Panchin, Yuri V; Aleoshin, Vladimir V

2016-01-01

Many features of mitochondrial genomes of animals, such as patterns of gene arrangement, nucleotide content and substitution rate variation are extensively used in evolutionary and phylogenetic studies. Nearly 6,000 mitochondrial genomes of animals have already been sequenced, covering the majority of animal phyla. One of the groups that escaped mitogenome sequencing is phylum Kinorhyncha-an isolated taxon of microscopic worm-like ecdysozoans. The kinorhynchs are thought to be one of the early-branching lineages of Ecdysozoa, and their mitochondrial genomes may be important for resolving evolutionary relations between major animal taxa. Here we present the results of sequencing and analysis of mitochondrial genomes from two members of Kinorhyncha, Echinoderes svetlanae (Cyclorhagida) and Pycnophyes kielensis (Allomalorhagida). Their mitochondrial genomes are circular molecules approximately 15 Kbp in size. The kinorhynch mitochondrial gene sequences are highly divergent, which precludes accurate phylogenetic inference. The mitogenomes of both species encode a typical metazoan complement of 37 genes, which are all positioned on the major strand, but the gene order is distinct and unique among Ecdysozoa or animals as a whole. We predict four types of start codons for protein-coding genes in E. svetlanae and five in P. kielensis with a consensus DTD in single letter code. The mitochondrial genomes of E. svetlanae and P. kielensis encode duplicated methionine tRNA genes that display compensatory nucleotide substitutions. Two distant species of Kinorhyncha demonstrate similar patterns of gene arrangements in their mitogenomes. Both genomes have duplicated methionine tRNA genes; the duplication predates the divergence of two species. The kinorhynchs share a few features pertaining to gene order that align them with Priapulida. Gene order analysis reveals that gene arrangement specific of Priapulida may be ancestral for Scalidophora, Ecdysozoa, and even Protostomia.
Mitochondrial Genomes of Kinorhyncha: trnM Duplication and New Gene Orders within Animals

PubMed Central

Popova, Olga V.; Mikhailov, Kirill V.; Nikitin, Mikhail A.; Logacheva, Maria D.; Penin, Aleksey A.; Muntyan, Maria S.; Kedrova, Olga S.; Petrov, Nikolai B.; Panchin, Yuri V.

2016-01-01

Many features of mitochondrial genomes of animals, such as patterns of gene arrangement, nucleotide content and substitution rate variation are extensively used in evolutionary and phylogenetic studies. Nearly 6,000 mitochondrial genomes of animals have already been sequenced, covering the majority of animal phyla. One of the groups that escaped mitogenome sequencing is phylum Kinorhyncha—an isolated taxon of microscopic worm-like ecdysozoans. The kinorhynchs are thought to be one of the early-branching lineages of Ecdysozoa, and their mitochondrial genomes may be important for resolving evolutionary relations between major animal taxa. Here we present the results of sequencing and analysis of mitochondrial genomes from two members of Kinorhyncha, Echinoderes svetlanae (Cyclorhagida) and Pycnophyes kielensis (Allomalorhagida). Their mitochondrial genomes are circular molecules approximately 15 Kbp in size. The kinorhynch mitochondrial gene sequences are highly divergent, which precludes accurate phylogenetic inference. The mitogenomes of both species encode a typical metazoan complement of 37 genes, which are all positioned on the major strand, but the gene order is distinct and unique among Ecdysozoa or animals as a whole. We predict four types of start codons for protein-coding genes in E. svetlanae and five in P. kielensis with a consensus DTD in single letter code. The mitochondrial genomes of E. svetlanae and P. kielensis encode duplicated methionine tRNA genes that display compensatory nucleotide substitutions. Two distant species of Kinorhyncha demonstrate similar patterns of gene arrangements in their mitogenomes. Both genomes have duplicated methionine tRNA genes; the duplication predates the divergence of two species. The kinorhynchs share a few features pertaining to gene order that align them with Priapulida. Gene order analysis reveals that gene arrangement specific of Priapulida may be ancestral for Scalidophora, Ecdysozoa, and even Protostomia. PMID:27755612
Common position of indels that cause deviations from canonical genome organization in different measles virus strains.

PubMed

Ivancic-Jelecki, Jelena; Slovic, Anamarija; Šantak, Maja; Tešović, Goran; Forcic, Dubravko

2016-07-29

The canonical genome organization of measles virus (MV) is characterized by total size of 15 894 nucleotides (nts) and defined length of every genomic region, both coding and non-coding. Only rarely have reports of strains possessing non-canonical genomic properties (possessing indels, with or without the change of total genome length) been published. The observed mutations are mutually compensatory in a sense that the total genome length remains polyhexameric. Although programmed and highly precise pseudo-templated nucleotide additions during transcription are inherent to polymerases of all viruses belonging to family Paramyxoviridae, a similar mechanism that would serve to non-randomly correct genome length, if an indel has occurred during replication, has so far not been described in the context of a complete virus genome. We compiled all complete MV genomic sequences (64 in total) available in open access sequence databases. Multiple sequence comparisons and phylogenetic analyses were performed with the aim of exploring whether non-recombinant and non-evolutionary linked measles strains that show deviations from canonical genome organization possess a common genetic characteristic. In 11 MV sequences we detected deviations from canonical genome organization due to short indels located within homopolymeric stretches or next to them. In nine out of 11 identified non-canonical MV sequences, a common feature was observed: one mutation, either an insertion or a deletion, was located in a 28 nts long region in F gene 5' untranslated region (positions 5051-5078 in genomic cDNA of canonical strains). This segment is composed of five tandemly linked homopolymeric stretches, its consensus sequence is G6-7C7-8A6-7G1-3C5-6. Although none of the mononucleotide repeats within this segment has fixed length, the total number of nts in canonical strains is always 28. These nine non-canonical strains, as well as the tenth (not mutated in 5051-5078 segment), can be grouped in three clusters, based on their passage histories/epidemiological data/genetic similarities. There are no indications that the 3 clusters are evolutionary linked, other than the fact that they all belong to clade D. A common narrow genomic region was found to be mutated in different, non-related, wild type strains suggesting that this region might have a function in non-random genome length corrections occurring during MV replication.
Transcriptome Analysis and Comparison of Marmota monax and Marmota himalayana.

PubMed

Liu, Yanan; Wang, Baoju; Wang, Lu; Vikash, Vikash; Wang, Qin; Roggendorf, Michael; Lu, Mengji; Yang, Dongliang; Liu, Jia

2016-01-01

The Eastern woodchuck (Marmota monax) is a classical animal model for studying hepatitis B virus (HBV) infection and hepatocellular carcinoma (HCC) in humans. Recently, we found that Marmota himalayana, an Asian animal species closely related to Marmota monax, is susceptible to woodchuck hepatitis virus (WHV) infection and can be used as a new mammalian model for HBV infection. However, the lack of genomic sequence information of both Marmota models strongly limited their application breadth and depth. To address this major obstacle of the Marmota models, we utilized Illumina RNA-Seq technology to sequence the cDNA libraries of liver and spleen samples of two Marmota monax and four Marmota himalayana. In total, over 13 billion nucleotide bases were sequenced and approximately 1.5 billion clean reads were obtained. Following assembly, 106,496 consensus sequences of Marmota monax and 78,483 consensus sequences of Marmota himalayana were detected. For functional annotation, in total 73,603 Unigenes of Marmota monax and 78,483 Unigenes of Marmota himalayana were identified using different databases (NR, NT, Swiss-Prot, KEGG, COG, GO). The Unigenes were aligned by blastx to protein databases to decide the coding DNA sequences (CDS) and in total 41,247 CDS of Marmota monax and 34,033 CDS of Marmota himalayana were predicted. The single nucleotide polymorphisms (SNPs) and the simple sequence repeats (SSRs) were also analyzed for all Unigenes obtained. Moreover, a large-scale transcriptome comparison was performed and revealed a high similarity in transcriptome sequences between the two marmota species. Our study provides an extensive amount of novel sequence information for Marmota monax and Marmota himalayana. This information may serve as a valuable genomics resource for further molecular, developmental and comparative evolutionary studies, as well as for the identification and characterization of functional genes that are involved in WHV infection and HCC development in the woodchuck model.
Transcriptome Analysis and Comparison of Marmota monax and Marmota himalayana

PubMed Central

Wang, Lu; Vikash, Vikash; Wang, Qin; Roggendorf, Michael; Lu, Mengji; Yang, Dongliang; Liu, Jia

2016-01-01

The Eastern woodchuck (Marmota monax) is a classical animal model for studying hepatitis B virus (HBV) infection and hepatocellular carcinoma (HCC) in humans. Recently, we found that Marmota himalayana, an Asian animal species closely related to Marmota monax, is susceptible to woodchuck hepatitis virus (WHV) infection and can be used as a new mammalian model for HBV infection. However, the lack of genomic sequence information of both Marmota models strongly limited their application breadth and depth. To address this major obstacle of the Marmota models, we utilized Illumina RNA-Seq technology to sequence the cDNA libraries of liver and spleen samples of two Marmota monax and four Marmota himalayana. In total, over 13 billion nucleotide bases were sequenced and approximately 1.5 billion clean reads were obtained. Following assembly, 106,496 consensus sequences of Marmota monax and 78,483 consensus sequences of Marmota himalayana were detected. For functional annotation, in total 73,603 Unigenes of Marmota monax and 78,483 Unigenes of Marmota himalayana were identified using different databases (NR, NT, Swiss-Prot, KEGG, COG, GO). The Unigenes were aligned by blastx to protein databases to decide the coding DNA sequences (CDS) and in total 41,247 CDS of Marmota monax and 34,033 CDS of Marmota himalayana were predicted. The single nucleotide polymorphisms (SNPs) and the simple sequence repeats (SSRs) were also analyzed for all Unigenes obtained. Moreover, a large-scale transcriptome comparison was performed and revealed a high similarity in transcriptome sequences between the two marmota species. Our study provides an extensive amount of novel sequence information for Marmota monax and Marmota himalayana. This information may serve as a valuable genomics resource for further molecular, developmental and comparative evolutionary studies, as well as for the identification and characterization of functional genes that are involved in WHV infection and HCC development in the woodchuck model. PMID:27806133
Complete genome sequences of avian paramyxovirus type 8 strains goose/Delaware/1053/76 and pintail/Wakuya/20/78

PubMed Central

Paldurai, Anandan; Subbiah, Madhuri; Kumar, Sachin; Collins, Peter L.; Samal, Siba K.

2009-01-01

Complete consensus genome sequences were determined for avian paramyxovirus type 8 (APMV-8) strains goose/Delaware/1053/76 (prototype strain) and pintail/Wakuya/20/78. The genome of each strain is 15,342 nucleotides (nt) long, which follows the “rule of six”. The genome consists of six genes in the order of 3′-N-P/V/W-M-F-HN-L-5′. The genes are flanked on either side by conserved transcription start and stop signals, and have intergenic regions ranging from 1 to 30 nt. The genome contains a 55 nt leader region at the 3′-end and a 171 nt trailer region at the 5′-end. Comparison of sequences of strains Delaware and Wakuya showed nucleotide identity of 96.8% at the genome level and amino acid identities of 99.3%, 96.5%, 98.6%, 99.4%, 98.6% and 99.1% for the predicted N, P, M, F, HN and L proteins, respectively. Both strains grew in embryonated chicken eggs and in primary chicken embryo kidney cells, and 293T cells. Both strains contained only a single basic residue at the cleavage activation site of the F protein and their efficiency of replication in vitro depended on and was augmented by, the presence of exogenous protease in most cell lines. Sequence alignment and phylogenic analysis of the predicted amino acid sequence of APMV-8 strain Delaware proteins with the cognate proteins of other available APMV serotypes showed that APMV-8 is more closely related to APMV-2 and -6 than to APMV-1, -3 and -4. PMID:19341613
Return of Genomic Results to Research Participants: The Floor, the Ceiling, and the Choices In Between

PubMed Central

Jarvik, Gail P.; Amendola, Laura M.; Berg, Jonathan S.; Brothers, Kyle; Clayton, Ellen W.; Chung, Wendy; Evans, Barbara J.; Evans, James P.; Fullerton, Stephanie M.; Gallego, Carlos J.; Garrison, Nanibaa’ A.; Gray, Stacy W.; Holm, Ingrid A.; Kullo, Iftikhar J.; Lehmann, Lisa Soleymani; McCarty, Cathy; Prows, Cynthia A.; Rehm, Heidi L.; Sharp, Richard R.; Salama, Joseph; Sanderson, Saskia; Van Driest, Sara L.; Williams, Marc S.; Wolf, Susan M.; Wolf, Wendy A.; Harley, John; Myers, Melanie; Namjou, Bahram; Vinks, Sander; Connolly, John; Keating, Brendan; Gerhard, Glenn; Sundaresan, Agnes; Tromp, Gerard; Crosslin, David; Leppig, Kathy; Wicklund, Cathy; Chute, Christopher; Lynch, John; De Andrade, Mariza; Heit, John; McCormick, Jen; Brilliant, Murray; Kitchner, Terrie; Ritchie, Marylyn; Böttinger, Erwin; Peter, Inga; Persell, Stephen; Rasmussen-Torvik, Laura; McGregor, Tracy; Roden, Dan; Antommaria, Armand; Chiavacci, Rosetta; Faucett, Andy; Ledbetter, David; Williams, Janet; Hartzler, Andrea; Vitek, Carolyn R. Rohrer; Frost, Norm; Ferryman, Kadija; Horowitz, Carol; Rhodes, Rosamond; Zinberg, Randi; Aufox, Sharon; Pan, Vivian; Long, Rochelle; Ramos, Erin; Odgis, Jackie; Wise, Anastasia; Hull, Sara; Gitlin, Jonathan; Green, Robert; Metterville, Danielle; McGuire, Amy; Kong, Sek Won; Trinidad, Sue; Veenstra, David; Roche, Myra; Skinner, Debra; Raspberry, Kelly; O’Daniel, Julianne; Parsons, Will; Eng, Christine; Hilsenbeck, Susan; Karavite, Dean; Conlin, Laura; Spinner, Nancy; Krantz, Ian; Falk, Marni; Santani, Avni; Dechene, Elizabeth; Dulik, Matthew; Bernhardt, Barbara; Schuetze, Scott; Everett, Jessica; Gornick, Michele Caroline; Wilfond, Ben; Tabor, Holly; Lemke, Amy A.; Richards, Sue; Goddard, Katrina; Cooper, Greg; East, Kelly; Barsh, Greg; Koenig, Barbara; Van Allen, Eliezer; Garber, Judy; Garrett, Jeremy; Zawati, Ma’n; Lewis, Michelle; Savage, Sarah; Smith, Maureen; Roychowdhury, Sameek; Bailey, Alice; Berkman, Benjamin; Anan, Charlisse Caga; Hindorff, Lucia; Hutter, Carolyn; King, Rosalind; Li, Rongling; Lockhart, Nicole; McEwen, Jean; Scholes, Derek; Schully, Sheri; Sun, Kathie; Burke, Wylie

2014-01-01

As more research studies incorporate next-generation sequencing (including whole-genome or whole-exome sequencing), investigators and institutional review boards face difficult questions regarding which genomic results to return to research participants and how. An American College of Medical Genetics and Genomics 2013 policy paper suggesting that pathogenic mutations in 56 specified genes should be returned in the clinical setting has raised the question of whether comparable recommendations should be considered in research settings. The Clinical Sequencing Exploratory Research (CSER) Consortium and the Electronic Medical Records and Genomics (eMERGE) Network are multisite research programs that aim to develop practical strategies for addressing questions concerning the return of results in genomic research. CSER and eMERGE committees have identified areas of consensus regarding the return of genomic results to research participants. In most circumstances, if results meet an actionability threshold for return and the research participant has consented to return, genomic results, along with referral for appropriate clinical follow-up, should be offered to participants. However, participants have a right to decline the receipt of genomic results, even when doing so might be viewed as a threat to the participants’ health. Research investigators should be prepared to return research results and incidental findings discovered in the course of their research and meeting an actionability threshold, but they have no ethical obligation to actively search for such results. These positions are consistent with the recognition that clinical research is distinct from medical care in both its aims and its guiding moral principles. PMID:24814192
A Sequence-Independent Strategy for Detection and Cloning of Circular DNA Virus Genomes by Using Multiply Primed Rolling-Circle Amplification

PubMed Central

Rector, Annabel; Tachezy, Ruth; Van Ranst, Marc

2004-01-01

The discovery of novel viruses has often been accomplished by using hybridization-based methods that necessitate the availability of a previously characterized virus genome probe or knowledge of the viral nucleotide sequence to construct consensus or degenerate PCR primers. In their natural replication cycle, certain viruses employ a rolling-circle mechanism to propagate their circular genomes, and multiply primed rolling-circle amplification (RCA) with φ29 DNA polymerase has recently been applied in the amplification of circular plasmid vectors used in cloning. We employed an isothermal RCA protocol that uses random hexamer primers to amplify the complete genomes of papillomaviruses without the need for prior knowledge of their DNA sequences. We optimized this RCA technique with extracted human papillomavirus type 16 (HPV-16) DNA from W12 cells, using a real-time quantitative PCR assay to determine amplification efficiency, and obtained a 2.4 × 104-fold increase in HPV-16 DNA concentration. We were able to clone the complete HPV-16 genome from this multiply primed RCA product. The optimized protocol was subsequently applied to a bovine fibropapillomatous wart tissue sample. Whereas no papillomavirus DNA could be detected by restriction enzyme digestion of the original sample, multiply primed RCA enabled us to obtain a sufficient amount of papillomavirus DNA for restriction enzyme analysis, cloning, and subsequent sequencing of a novel variant of bovine papillomavirus type 1. The multiply primed RCA method allows the discovery of previously unknown papillomaviruses, and possibly also other circular DNA viruses, without a priori sequence information. PMID:15113879
Cloning and sequence analysis of the invertase gene INV 1 from the yeast Pichia anomala.

PubMed

Pérez, J A; Rodríguez, J; Rodríguez, L; Ruiz, T

1996-02-01

A genomic library from the yeast Pichia anomala has been constructed and employed to clone the gene encoding the sucrose-hydrolysing enzyme invertase by complementation of a sucrose non-fermenting mutant of Saccharomyces cerevisiae. The cloned gene, INV1, was sequenced and found to encode a polypeptide of 550 amino acids which contained a 22 amino-acid signal sequence and ten potential glycosylation sites. The amino-acid sequence shows significant identity with other yeast invertases and also with Kluyveromyces marxianus inulinase, a yeast beta-fructofuranosidase which has a different substrate specificity. The nucleotide sequences of the 5' and 3' non-coding regions were found to contain several consensus motifs probably involved in the initiation and termination of gene transcription.
Characterization of NIST human mitochondrial DNA SRM-2392 and SRM-2392-I standard reference materials by next generation sequencing.

PubMed

Riman, Sarah; Kiesler, Kevin M; Borsuk, Lisa A; Vallone, Peter M

2017-07-01

Standard Reference Materials SRM 2392 and 2392-I are intended to provide quality control when amplifying and sequencing human mitochondrial genome sequences. The National Institute of Standards and Technology (NIST) offers these SRMs to laboratories performing DNA-based forensic human identification, molecular diagnosis of mitochondrial diseases, mutation detection, evolutionary anthropology, and genetic genealogy. The entire mtGenome (∼16569bp) of SRM 2392 and 2392-I have previously been characterized at NIST by Sanger sequencing. Herein, we used the sensitivity, specificity, and accuracy offered by next generation sequencing (NGS) to: (1) re-sequence the certified values of the SRM 2392 and 2392-I; (2) confirm Sanger data with a high coverage new sequencing technology; (3) detect lower level heteroplasmies (<20%); and thus (4) support mitochondrial sequencing communities in the adoption of NGS methods. To obtain a consensus sequence for the SRMs as well as identify and control any bias, sequencing was performed using two NGS platforms and data was analyzed using different bioinformatics pipelines. Our results confirm five low level heteroplasmy sites that were not previously observed with Sanger sequencing: three sites in the GM09947A template in SRM 2392 and two sites in the HL-60 template in SRM 2392-I. Copyright © 2017 Elsevier B.V. All rights reserved.
A High-Resolution InDel (Insertion–Deletion) Markers-Anchored Consensus Genetic Map Identifies Major QTLs Governing Pod Number and Seed Yield in Chickpea

PubMed Central

Srivastava, Rishi; Singh, Mohar; Bajaj, Deepak; Parida, Swarup K.

2016-01-01

Development and large-scale genotyping of user-friendly informative genome/gene-derived InDel markers in natural and mapping populations is vital for accelerating genomics-assisted breeding applications of chickpea with minimal resource expenses. The present investigation employed a high-throughput whole genome next-generation resequencing strategy in low and high pod number parental accessions and homozygous individuals constituting the bulks from each of two inter-specific mapping populations [(Pusa 1103 × ILWC 46) and (Pusa 256 × ILWC 46)] to develop non-erroneous InDel markers at a genome-wide scale. Comparing these high-quality genomic sequences, 82,360 InDel markers with reference to kabuli genome and 13,891 InDel markers exhibiting differentiation between low and high pod number parental accessions and bulks of aforementioned mapping populations were developed. These informative markers were structurally and functionally annotated in diverse coding and non-coding sequence components of genome/genes of kabuli chickpea. The functional significance of regulatory and coding (frameshift and large-effect mutations) InDel markers for establishing marker-trait linkages through association/genetic mapping was apparent. The markers detected a greater amplification (97%) and intra-specific polymorphic potential (58–87%) among a diverse panel of cultivated desi, kabuli, and wild accessions even by using a simpler cost-efficient agarose gel-based assay implicating their utility in large-scale genetic analysis especially in domesticated chickpea with narrow genetic base. Two high-density inter-specific genetic linkage maps generated using aforesaid mapping populations were integrated to construct a consensus 1479 InDel markers-anchored high-resolution (inter-marker distance: 0.66 cM) genetic map for efficient molecular mapping of major QTLs governing pod number and seed yield per plant in chickpea. Utilizing these high-density genetic maps as anchors, three major genomic regions harboring each of pod number and seed yield robust QTLs (15–28% phenotypic variation explained) were identified on chromosomes 2, 4, and 6. The integration of genetic and physical maps at these QTLs mapped on chromosomes scaled-down the long major QTL intervals into high-resolution short pod number and seed yield robust QTL physical intervals (0.89–2.94 Mb) which were essentially got validated in multiple genetic backgrounds of two chickpea mapping populations. The genome-wide InDel markers including natural allelic variants and genomic loci/genes delineated at major six especially in one colocalized novel congruent robust pod number and seed yield robust QTLs mapped on a high-density consensus genetic map were found most promising in chickpea. These functionally relevant molecular tags can drive marker-assisted genetic enhancement to develop high-yielding cultivars with increased seed/pod number and yield in chickpea. PMID:27695461

A cricket Gene Index: a genomic resource for studying neurobiology, speciation, and molecular evolution

PubMed Central

Danley, Patrick D; Mullen, Sean P; Liu, Fenglong; Nene, Vishvanath; Quackenbush, John; Shaw, Kerry L

2007-01-01

Background As the developmental costs of genomic tools decline, genomic approaches to non-model systems are becoming more feasible. Many of these systems may lack advanced genetic tools but are extremely valuable models in other biological fields. Here we report the development of expressed sequence tags (EST's) in an orthopteroid insect, a model for the study of neurobiology, speciation, and evolution. Results We report the sequencing of 14,502 EST's from clones derived from a nerve cord cDNA library, and the subsequent construction of a Gene Index from these sequences, from the Hawaiian trigonidiine cricket Laupala kohalensis. The Gene Index contains 8607 unique sequences comprised of 2575 tentative consensus (TC) sequences and 6032 singletons. For each of the unique sequences, an attempt was made to assign a provisional annotation and to categorize its function using a Gene Ontology-based classification through a sequence-based comparison to known proteins. In addition, a set of unique 70 base pair oligomers that can be used for DNA microarrays was developed. All Gene Index information is posted at the DFCI Gene Indices web page Conclusion Orthopterans are models used to understand the neurophysiological basis of complex motor patterns such as flight and stridulation. The sequences presented in the cricket Gene Index will provide neurophysiologists with many genetic tools that have been largely absent in this field. The cricket Gene Index is one of only two gene indices to be developed in an evolutionary model system. Species within the genus Laupala have speciated recently, rapidly, and extensively. Therefore, the genes identified in the cricket Gene Index can be used to study the genomics of speciation. Furthermore, this gene index represents a significant EST resources for basal insects. As such, this resource is a valuable comparative tool for the understanding of invertebrate molecular evolution. The sequences presented here will provide much needed genomic resources for three distinct but overlapping fields of inquiry: neurobiology, speciation, and molecular evolution. PMID:17459168
Molecular cloning and nucleotide sequence of the alpha and beta subunits of allophycocyanin from the cyanelle genome of Cyanophora paradoxa.

PubMed Central

Bryant, D A; de Lorimier, R; Lambert, D H; Dubbs, J M; Stirewalt, V L; Stevens, S E; Porter, R D; Tam, J; Jay, E

1985-01-01

The genes for the alpha- and beta-subunit apoproteins of allophycocyanin (AP) were isolated from the cyanelle genome of Cyanophora paradoxa and subjected to nucleotide sequence analysis. The AP beta-subunit apoprotein gene was localized to a 7.8-kilobase-pair Pst I restriction fragment from cyanelle DNA by hybridization with a tetradecameric oligonucleotide probe. Sequence analysis using that oligonucleotide and its complement as primers for the dideoxy chain-termination sequencing method confirmed the presence of both AP alpha- and beta-subunit genes on this restriction fragment. Additional oligonucleotide primers were synthesized as sequencing progressed and were used to determine rapidly the nucleotide sequence of a 1336-base-pair region of this cloned fragment. This strategy allowed the sequencing to be completed without a detailed restriction map and without extensive and time-consuming subcloning. The sequenced region contains two open reading frames whose deduced amino acid sequences are 81-85% homologous to cyanobacterial and red algal AP subunits whose amino acid sequences have been determined. The two open reading frames are in the same orientation and are separated by 39 base pairs. AP alpha is 5' to AP beta and both coding sequences are preceded by a polypurine, Shine-Dalgarno-type sequence. Sequences upstream from AP alpha closely resemble the Escherichia coli consensus promoter sequences and also show considerable homology to promoter sequences for several chloroplast-encoded psbA genes. A 56-base-pair palindromic sequence downstream from the AP beta gene could play a role in the termination of transcription or translation. The allophycocyanin apoprotein subunit genes are located on the large single-copy region of the cyanelle genome. PMID:2987916
Whole-genome analyses of DS-1-like human G2P[4] and G8P[4] rotavirus strains from Eastern, Western and Southern Africa

PubMed Central

Nyaga, Martin M.; Stucker, Karla M.; Esona, Mathew D.; Jere, Khuzwayo C.; Mwinyi, Bakari; Shonhai, Annie; Tsolenyanu, Enyonam; Mulindwa, Augustine; Chibumbya, Julia N.; Adolfine, Hokororo; Halpin, Rebecca A.; Roy, Sunando; Stockwell, Timothy B.; Berejena, Chipo; Seheri, Mapaseka L.; Mwenda, Jason M.; Steele, A. Duncan; Wentworth, David E.

2018-01-01

Group A rotaviruses (RVAs) with distinct G and P genotype combinations have been reported globally. We report the genome composition and possible origin of seven G8P[4] and five G2P[4] human RVA strains based on the genetic evolution of all 11 genome segments at the nucleotide level. Twelve RVA ELISA positive stool samples collected in the representative countries of Eastern, Southern and West Africa during the 2007–2012 surveillance seasons were subjected to sequencing using the Ion Torrent PGM and Illumina MiSeq platforms. A reference-based assembly was performed using CLC Bio’s clc_ref_assemble_long program, and full-genome consensus sequences were obtained. With the exception of the neutralising antigen, VP7, all study strains exhibited the DS-1-like genome constellation (P[4]-I2-R2-C2-M2-A2-N2-T2-E2-H2) and clustered phylogenetically with reference strains having a DS-1-like genetic backbone. Comparison of the nucleotide and amino acid sequences with selected global cognate genome segments revealed nucleotide and amino acid sequence identities of 81.7–100 % and 90.6–100 %, respectively, with NSP4 gene segment showing the most diversity among the strains. Bayesian analyses of all gene sequences to estimate the time of divergence of the lineage indicated that divergence times ranged from 16 to 44 years, except for the NSP4 gene where the lineage seemed to arise in the more distant past at an estimated 203 years ago. However, the long-term effects of changes found within the NSP4 genome segment should be further explored, and thus we recommend continued whole-genome analyses from larger sample sets to determine the evolutionary mechanisms of the DS-1-like strains collected in Africa. PMID:24952422
Sequence diversity of wheat mosaic virus isolates.

PubMed

Stewart, Lucy R

2016-02-02

Wheat mosaic virus (WMoV), transmitted by eriophyid wheat curl mites (Aceria tosichella) is the causal agent of High Plains disease in wheat and maize. WMoV and other members of the genus Emaravirus evaded thorough molecular characterization for many years due to the experimental challenges of mite transmission and manipulating multisegmented negative sense RNA genomes. Recently, the complete genome sequence of a Nebraska isolate of WMoV revealed eight segments, plus a variant sequence of the nucleocapsid protein-encoding segment. Here, near-complete and partial consensus sequences of five more WMoV isolates are reported and compared to the Nebraska isolate: an Ohio maize isolate (GG1), a Kansas barley isolate (KS7), and three Ohio wheat isolates (H1, K1, W1). Results show two distinct groups of WMoV isolates: Ohio wheat isolate RNA segments had 84% or lower nucleotide sequence identity to the NE isolate, whereas GG1 and KS7 had 98% or higher nucleotide sequence identity to the NE isolate. Knowledge of the sequence variability of WMoV isolates is a step toward understanding virus biology, and potentially explaining observed biological variation. Published by Elsevier B.V.
Transcriptome sequencing for high throughput SNP development and genetic mapping in Pea

PubMed Central

2014-01-01

Background Pea has a complex genome of 4.3 Gb for which only limited genomic resources are available to date. Although SNP markers are now highly valuable for research and modern breeding, only a few are described and used in pea for genetic diversity and linkage analysis. Results We developed a large resource by cDNA sequencing of 8 genotypes representative of modern breeding material using the Roche 454 technology, combining both long reads (400 bp) and high coverage (3.8 million reads, reaching a total of 1,369 megabases). Sequencing data were assembled and generated a 68 K unigene set, from which 41 K were annotated from their best blast hit against the model species Medicago truncatula. Annotated contigs showed an even distribution along M. truncatula pseudochromosomes, suggesting a good representation of the pea genome. 10 K pea contigs were found to be polymorphic among the genetic material surveyed, corresponding to 35 K SNPs. We validated a subset of 1538 SNPs through the GoldenGate assay, proving their ability to structure a diversity panel of breeding germplasm. Among them, 1340 were genetically mapped and used to build a new consensus map comprising a total of 2070 markers. Based on blast analysis, we could establish 1252 bridges between our pea consensus map and the pseudochromosomes of M. truncatula, which provides new insight on synteny between the two species. Conclusions Our approach created significant new resources in pea, i.e. the most comprehensive genetic map to date tightly linked to the model species M. truncatula and a large SNP resource for both academic research and breeding. PMID:24521263
Reviving the Dead: History and Reactivation of an Extinct L1

PubMed Central

Yang, Lei; Brunsfeld, John; Scott, LuAnn; Wichman, Holly

2014-01-01

Although L1 sequences are present in the genomes of all placental mammals and marsupials examined to date, their activity was lost in the megabat family, Pteropodidae, ∼24 million years ago. To examine the characteristics of L1s prior to their extinction, we analyzed the evolutionary history of L1s in the genome of a megabat, Pteropus vampyrus, and found a pattern of periodic L1 expansion and quiescence. In contrast to the well-characterized L1s in human and mouse, megabat genomes have accommodated two or more simultaneously active L1 families throughout their evolutionary history, and major peaks of L1 deposition into the genome always involved multiple families. We compared the consensus sequences of the two major megabat L1 families at the time of their extinction to consensus L1s of a variety of mammalian species. Megabat L1s are comparable to the other mammalian L1s in terms of adenosine content and conserved amino acids in the open reading frames (ORFs). However, the intergenic region (IGR) of the reconstructed element from the more active family is dramatically longer than the IGR of well-characterized human and mouse L1s. We synthesized the reconstructed element from this L1 family and tested the ability of its components to support retrotransposition in a tissue culture assay. Both ORFs are capable of supporting retrotransposition, while the IGR is inhibitory to retrotransposition, especially when combined with either of the reconstructed ORFs. We dissected the inhibitory effect of the IGR by testing truncated and shuffled versions and found that length is a key factor, but not the only one affecting inhibition of retrotransposition. Although the IGR is inhibitory to retrotransposition, this inhibition does not account for the extinction of L1s in megabats. Overall, the evolution of the L1 sequence or the quiescence of L1 is unlikely the reason of L1 extinction. PMID:24968166
Reviving the dead: history and reactivation of an extinct l1.

PubMed

Yang, Lei; Brunsfeld, John; Scott, LuAnn; Wichman, Holly

2014-06-01

Although L1 sequences are present in the genomes of all placental mammals and marsupials examined to date, their activity was lost in the megabat family, Pteropodidae, ∼24 million years ago. To examine the characteristics of L1s prior to their extinction, we analyzed the evolutionary history of L1s in the genome of a megabat, Pteropus vampyrus, and found a pattern of periodic L1 expansion and quiescence. In contrast to the well-characterized L1s in human and mouse, megabat genomes have accommodated two or more simultaneously active L1 families throughout their evolutionary history, and major peaks of L1 deposition into the genome always involved multiple families. We compared the consensus sequences of the two major megabat L1 families at the time of their extinction to consensus L1s of a variety of mammalian species. Megabat L1s are comparable to the other mammalian L1s in terms of adenosine content and conserved amino acids in the open reading frames (ORFs). However, the intergenic region (IGR) of the reconstructed element from the more active family is dramatically longer than the IGR of well-characterized human and mouse L1s. We synthesized the reconstructed element from this L1 family and tested the ability of its components to support retrotransposition in a tissue culture assay. Both ORFs are capable of supporting retrotransposition, while the IGR is inhibitory to retrotransposition, especially when combined with either of the reconstructed ORFs. We dissected the inhibitory effect of the IGR by testing truncated and shuffled versions and found that length is a key factor, but not the only one affecting inhibition of retrotransposition. Although the IGR is inhibitory to retrotransposition, this inhibition does not account for the extinction of L1s in megabats. Overall, the evolution of the L1 sequence or the quiescence of L1 is unlikely the reason of L1 extinction.
On the Concept of Cis-regulatory Information: From Sequence Motifs to Logic Functions

NASA Astrophysics Data System (ADS)

Tarpine, Ryan; Istrail, Sorin

The regulatory genome is about the “system level organization of the core genomic regulatory apparatus, and how this is the locus of causality underlying the twin phenomena of animal development and animal evolution” (E.H. Davidson. The Regulatory Genome: Gene Regulatory Networks in Development and Evolution, Academic Press, 2006). Information processing in the regulatory genome is done through regulatory states, defined as sets of transcription factors (sequence-specific DNA binding proteins which determine gene expression) that are expressed and active at the same time. The core information processing machinery consists of modular DNA sequence elements, called cis-modules, that interact with transcription factors. The cis-modules “read” the information contained in the regulatory state of the cell through transcription factor binding, “process” it, and directly or indirectly communicate with the basal transcription apparatus to determine gene expression. This endowment of each gene with the information-receiving capacity through their cis-regulatory modules is essential for the response to every possible regulatory state to which it might be exposed during all phases of the life cycle and in all cell types. We present here a set of challenges addressed by our CYRENE research project aimed at studying the cis-regulatory code of the regulatory genome. The CYRENE Project is devoted to (1) the construction of a database, the cis-Lexicon, containing comprehensive information across species about experimentally validated cis-regulatory modules; and (2) the software development of a next-generation genome browser, the cis-Browser, specialized for the regulatory genome. The presentation is anchored on three main computational challenges: the Gene Naming Problem, the Consensus Sequence Bottleneck Problem, and the Logic Function Inference Problem.
HIV-1 Full-Genome Phylogenetics of Generalized Epidemics in Sub-Saharan Africa: Impact of Missing Nucleotide Characters in Next-Generation Sequences

PubMed Central

Wymant, Chris; Colijn, Caroline; Danaviah, Siva; Essex, Max; Frost, Simon; Gall, Astrid; Gaseitsiwe, Simani; Grabowski, Mary K.; Gray, Ronald; Guindon, Stephane; von Haeseler, Arndt; Kaleebu, Pontiano; Kendall, Michelle; Kozlov, Alexey; Manasa, Justen; Minh, Bui Quang; Moyo, Sikhulile; Novitsky, Vlad; Nsubuga, Rebecca; Pillay, Sureshnee; Quinn, Thomas C.; Serwadda, David; Ssemwanga, Deogratius; Stamatakis, Alexandros; Trifinopoulos, Jana; Wawer, Maria; Brown, Andy Leigh; de Oliveira, Tulio; Kellam, Paul; Pillay, Deenan; Fraser, Christophe

2017-01-01

Abstract To characterize HIV-1 transmission dynamics in regions where the burden of HIV-1 is greatest, the “Phylogenetics and Networks for Generalised HIV Epidemics in Africa” consortium (PANGEA-HIV) is sequencing full-genome viral isolates from across sub-Saharan Africa. We report the first 3,985 PANGEA-HIV consensus sequences from four cohort sites (Rakai Community Cohort Study, n = 2,833; MRC/UVRI Uganda, n = 701; Mochudi Prevention Project, n = 359; Africa Health Research Institute Resistance Cohort, n = 92). Next-generation sequencing success rates varied: more than 80% of the viral genome from the gag to the nef genes could be determined for all sequences from South Africa, 75% of sequences from Mochudi, 60% of sequences from MRC/UVRI Uganda, and 22% of sequences from Rakai. Partial sequencing failure was primarily associated with low viral load, increased for amplicons closer to the 3′ end of the genome, was not associated with subtype diversity except HIV-1 subtype D, and remained significantly associated with sampling location after controlling for other factors. We assessed the impact of the missing data patterns in PANGEA-HIV sequences on phylogeny reconstruction in simulations. We found a threshold in terms of taxon sampling below which the patchy distribution of missing characters in next-generation sequences (NGS) has an excess negative impact on the accuracy of HIV-1 phylogeny reconstruction, which is attributable to tree reconstruction artifacts that accumulate when branches in viral trees are long. The large number of PANGEA-HIV sequences provides unprecedented opportunities for evaluating HIV-1 transmission dynamics across sub-Saharan Africa and identifying prevention opportunities. Molecular epidemiological analyses of these data must proceed cautiously because sequence sampling remains below the identified threshold and a considerable negative impact of missing characters on phylogeny reconstruction is expected. PMID:28540766
HIV-1 full-genome phylogenetics of generalized epidemics in sub-Saharan Africa: impact of missing nucleotide characters in next-generation sequences.

PubMed

Ratmann, Oliver; Wymant, Chris; Colijn, Caroline; Danaviah, Siva; Essex, M; Frost, Simon D W; Gall, Astrid; Gaiseitsiwe, Simani; Grabowski, Mary; Gray, Ronald; Guindon, Stephane; von Haeseler, Arndt; Kaleebu, Pontiano; Kendall, Michelle; Kozlov, Alexey; Manasa, Justen; Minh, Bui Quang; Moyo, Sikhulile; Novitsky, Vladimir; Nsubuga, Rebecca; Pillay, Sureshnee; Quinn, Thomas C; Serwadda, David; Ssemwanga, Deogratius; Stamatakis, Alexandros; Trifinopoulos, Jana; Wawer, Maria; Leigh Brown, Andrew; de Oliveira, Tulio; Kellam, Paul; Pillay, Deenan; Fraser, Christophe

2017-05-25

To characterize HIV-1 transmission dynamics in regions where the burden of HIV-1 is greatest, the 'Phylogenetics and Networks for Generalised HIV Epidemics in Africa' consortium (PANGEA-HIV) is sequencing full-genome viral isolates from across sub-Saharan Africa. We report the first 3,985 PANGEA-HIV consensus sequences from four cohort sites (Rakai Community Cohort Study, n=2,833; MRC/UVRI Uganda, n=701; Mochudi Prevention Project, n=359; Africa Health Research Institute Resistance Cohort, n=92). Next-generation sequencing success rates varied: more than 80% of the viral genome from the gag to the nef genes could be determined for all sequences from South Africa, 75% of sequences from Mochudi, 60% of sequences from MRC/UVRI Uganda, and 22% of sequences from Rakai. Partial sequencing failure was primarily associated with low viral load, increased for amplicons closer to the 3' end of the genome, was not associated with subtype diversity except HIV-1 subtype D, and remained significantly associated with sampling location after controlling for other factors. We assessed the impact of the missing data patterns in PANGEA-HIV sequences on phylogeny reconstruction in simulations. We found a threshold in terms of taxon sampling below which the patchy distribution of missing characters in next-generation sequences has an excess negative impact on the accuracy of HIV-1 phylogeny reconstruction, which is attributable to tree reconstruction artifacts that accumulate when branches in viral trees are long. The large number of PANGEA-HIV sequences provides unprecedented opportunities for evaluating HIV-1 transmission dynamics across sub-Saharan Africa and identifying prevention opportunities. Molecular epidemiological analyses of these data must proceed cautiously because sequence sampling remains below the identified threshold and a considerable negative impact of missing characters on phylogeny reconstruction is expected.
Horizontal transfers of Mariner transposons between mammals and insects.

PubMed

Oliveira, Sarah G; Bao, Weidong; Martins, Cesar; Jurka, Jerzy

2012-09-26

Active transposable elements (TEs) can be passed between genomes of different species by horizontal transfer (HT). This may help them to avoid vertical extinction due to elimination by natural selection or silencing. HT is relatively frequent within eukaryotic taxa, but rare between distant species. Closely related Mariner-type DNA transposon families, collectively named as Mariner-1_Tbel families, are present in the genomes of two ants and two mammalian genomes. Consensus sequences of the four families show pairwise identities greater than 95%. In addition, mammalian Mariner1_BT family shows a close evolutionary relationship with some insect Mariner families. Mammalian Mariner1_BT type sequences are present only in species from three groups including ruminants, tooth whales (Odontoceti), and New World leaf-nosed bats (Phyllostomidae). Horizontal transfer accounts for the presence of Mariner_Tbel and Mariner1_BT families in mammals. Mariner_Tbel family was introduced into hedgehog and tree shrew genomes approximately 100 to 69 million years ago (MYA). Most likely, these TE families were transferred from insects to mammals, but details of the transfer remain unknown.
Characterization of the Porphyromonas gingivalis conjugative transposon CTnPg1: determination of the integration site and the genes essential for conjugal transfer.

PubMed

Naito, Mariko; Sato, Keiko; Shoji, Mikio; Yukitake, Hideharu; Ogura, Yoshitoshi; Hayashi, Tetsuya; Nakayama, Koji

2011-07-01

In our previous study, extensive genomic rearrangements were found in two strains of the Gram-negative anaerobic bacterium Porphyromonas (Por.) gingivalis, and most of these rearrangements were associated with mobile genetic elements such as insertion sequences and conjugative transposons (CTns). CTnPg1, identified in Por. gingivalis strain ATCC 33277, was the first complete CTn reported for the genus Porphyromonas. In the present study, we found that CTnPg1 can be transferred from strain ATCC 33277 to another Por. gingivalis strain, W83, at a frequency of 10(-7) to 10(-6). The excision of CTnPg1 from the chromosome in a donor cell depends on an integrase (Int; PGN_0094) encoded in CTnPg1, whereas CTnPg1 excision is independent of PGN_0084 (a DNA topoisomerase I homologue; Exc) encoded within CTnPg1 and recA (PGN_1057) on the donor chromosome. Intriguingly, however, the transfer of CTnPg1 between Por. gingivalis strains requires RecA function in the recipient. Sequencing analysis of CTnPg1-integrated sites on the chromosomes of transconjugants revealed that the consensus attachment (att) sequence is a 13 bp sequence, TTTTCNNNNAAAA. We further report that CTnPg1 is able to transfer to two other bacterial species, Bacteroides thetaiotaomicron and Prevotella oralis. In addition, CTnPg1-like CTns are located in the genomes of other oral anaerobic bacteria, Porphyromonas endodontalis, Prevotella buccae and Prevotella intermedia, with the same consensus att sequence. These results suggest that CTns in the CTnPg1 family are widely distributed among oral anaerobic Gram-negative bacteria found in humans and play important roles in horizontal gene transfer among these bacteria.
Genetic map of artichoke × wild cardoon: toward a consensus map for Cynara cardunculus.

PubMed

Sonnante, Gabriella; Gatto, Angela; Morgese, Anita; Montemurro, Francesco; Sarli, Giulio; Blanco, Emanuela; Pignone, Domenico

2011-11-01

An integrated consensus linkage map is proposed for globe artichoke. Maternal and paternal genetic maps were constructed on the basis of an F(1) progeny derived from crossing an artichoke genotype (Mola) with its progenitor, the wild cardoon (Tolfa), using EST-derived SSRs, genomic SSRs, AFLPs, ten genes, and two morphological traits. For most genes, mainly belonging to the chlorogenic acid pathway, new markers were developed. Five of these were SNP markers analyzed through high-resolution melt technology. From the maternal (Mola) and paternal (Tolfa) maps, an integrated map was obtained, containing 337 molecular and one morphological markers ordered in 17 linkage groups (LGs), linked between Mola and Tolfa. The integrated map covers 1,488.8 cM, with an average distance of 4.4 cM between markers. The map was aligned with already existing maps for artichoke, and 12 LGs were linked via 31 bridge markers. LG numbering has been proposed. A total of 124 EST-SSRs and two genes were mapped here for the first time, providing a framework for the construction of a functional map in artichoke. The establishment of a consensus map represents a necessary condition to plan a complete sequencing of the globe artichoke genome.
Testing Convergent Evolution in Auditory Processing Genes between Echolocating Mammals and the Aye-Aye, a Percussive-Foraging Primate

PubMed Central

Jerjos, Michael; Hohman, Baily; Lauterbur, M. Elise; Kistler, Logan

2017-01-01

Abstract Several taxonomically distinct mammalian groups—certain microbats and cetaceans (e.g., dolphins)—share both morphological adaptations related to echolocation behavior and strong signatures of convergent evolution at the amino acid level across seven genes related to auditory processing. Aye-ayes (Daubentonia madagascariensis) are nocturnal lemurs with a specialized auditory processing system. Aye-ayes tap rapidly along the surfaces of trees, listening to reverberations to identify the mines of wood-boring insect larvae; this behavior has been hypothesized to functionally mimic echolocation. Here we investigated whether there are signals of convergence in auditory processing genes between aye-ayes and known mammalian echolocators. We developed a computational pipeline (Basic Exon Assembly Tool) that produces consensus sequences for regions of interest from shotgun genomic sequencing data for nonmodel organisms without requiring de novo genome assembly. We reconstructed complete coding region sequences for the seven convergent echolocating bat–dolphin genes for aye-ayes and another lemur. We compared sequences from these two lemurs in a phylogenetic framework with those of bat and dolphin echolocators and appropriate nonecholocating outgroups. Our analysis reaffirms the existence of amino acid convergence at these loci among echolocating bats and dolphins; some methods also detected signals of convergence between echolocating bats and both mice and elephants. However, we observed no significant signal of amino acid convergence between aye-ayes and echolocating bats and dolphins, suggesting that aye-aye tap-foraging auditory adaptations represent distinct evolutionary innovations. These results are also consistent with a developing consensus that convergent behavioral ecology does not reliably predict convergent molecular evolution. PMID:28810710
Genome-wide analysis of Tol2 transposon reintegration in zebrafish.

PubMed

Kondrychyn, Igor; Garcia-Lecea, Marta; Emelyanov, Alexander; Parinov, Sergey; Korzh, Vladimir

2009-09-08

Tol2, a member of the hAT family of transposons, has become a useful tool for genetic manipulation of model animals, but information about its interactions with vertebrate genomes is still limited. Furthermore, published reports on Tol2 have mainly been based on random integration of the transposon system after co-injection of a plasmid DNA harboring the transposon and a transposase mRNA. It is important to understand how Tol2 would behave upon activation after integration into the genome. We performed a large-scale enhancer trap (ET) screen and generated 338 insertions of the Tol2 transposon-based ET cassette into the zebrafish genome. These insertions were generated by remobilizing the transposon from two different donor sites in two transgenic lines. We found that 39% of Tol2 insertions occurred in transcription units, mostly into introns. Analysis of the transposon target sites revealed no strict specificity at the DNA sequence level. However, Tol2 was prone to target AT-rich regions with weak palindromic consensus sequences centered at the insertion site. Our systematic analysis of sequential remobilizations of the Tol2 transposon from two independent sites within a vertebrate genome has revealed properties such as a tendency to integrate into transcription units and into AT-rich palindrome-like sequences. This information will influence the development of various applications involving DNA transposons and Tol2 in particular.
Characterization of a species-specific repetitive DNA from a highly endangered wild animal, Rhinoceros unicornis, and assessment of genetic polymorphism by microsatellite associated sequence amplification (MASA).

PubMed

Ali, S; Azfer, M A; Bashamboo, A; Mathur, P K; Malik, P K; Mathur, V B; Raha, A K; Ansari, S

1999-03-04

We have cloned and sequenced a 906bp EcoRI repeat DNA fraction from Rhinoceros unicornis genome. The contig pSS(R)2 is AT rich with 340 A (37.53%), 187 C (20.64%), 173 G (19.09%) and 206 T (22.74%). The sequence contains MALT box, NF-E1, Poly-A signal, lariat consensus sequences, TATA box, translational initiation sequences and several stop codons. Translation of the contig showed seven different types of protein motifs, among which, EGF-like domain cysteine pattern signatures and Bowman-Birk serine protease inhibitor family signatures were prominent. The presence of eukaryotic transcriptional elements, protein signatures and analysis of subset sequences in the 5' region from 1 to 165nt indicating coding potential (test code value=0.97) suggest possible regulatory and/or functional role(s) of these sequences in the rhino genome. Translation of the complementary strand from 906 to 706nt and 190 to 2nt showed proteins of more than 7kDa rich in non-polar residues. This suggests that pSS(R)2 is either a part of, or adjacent to, a functional gene. The contig contains mostly non-consecutive simple repeat units from 2 to 17nt with varying frequencies, of which four base motifs were found to be predominant. Zoo-blot hybridization revealed that pSS(R)2 sequences are unique to R. unicornis genome because they do not cross-hybridize, even with the genomic DNA of South African black rhino Diceros bicornis. Southern blot analysis of R. unicornis genomic DNA with pSS(R)2 and other synthetic oligo probes revealed a high level of genetic homogeneity, which was also substantiated by microsatellite associated sequence amplification (MASA). Owing to its uniqueness, the pSS(R)2 probe has a potential application in the area of conservation biology for unequivocal identification of horn or other body tissues of R. unicornis. The evolutionary aspect of this repeat fraction in the context of comparative genome analysis is discussed.
Identification and characterization of ARS-like sequences as putative origin(s) of replication in human malaria parasite Plasmodium falciparum.

PubMed

Agarwal, Meetu; Bhowmick, Krishanu; Shah, Kushal; Krishnamachari, Annangarachari; Dhar, Suman Kumar

2017-08-01

DNA replication is a fundamental process in genome maintenance, and initiates from several genomic sites (origins) in eukaryotes. In Saccharomyces cerevisiae, conserved sequences known as autonomously replicating sequences (ARSs) provide a landing pad for the origin recognition complex (ORC), leading to replication initiation. Although origins from higher eukaryotes share some common sequence features, the definitive genomic organization of these sites remains elusive. The human malaria parasite Plasmodium falciparum undergoes multiple rounds of DNA replication; therefore, control of initiation events is crucial to ensure proper replication. However, the sites of DNA replication initiation and the mechanism by which replication is initiated are poorly understood. Here, we have identified and characterized putative origins in P. falciparum by bioinformatics analyses and experimental approaches. An autocorrelation measure method was initially used to search for regions with marked fluctuation (dips) in the chromosome, which we hypothesized might contain potential origins. Indeed, S. cerevisiae ARS consensus sequences were found in dip regions. Several of these P. falciparum sequences were validated with chromatin immunoprecipitation-quantitative PCR, nascent strand abundance and a plasmid stability assay. Subsequently, the same sequences were used in yeast to confirm their potential as origins in vivo. Our results identify the presence of functional ARSs in P. falciparum and provide meaningful insights into replication origins in these deadly parasites. These data could be useful in designing transgenic vectors with improved stability for transfection in P. falciparum. © 2017 Federation of European Biochemical Societies.
Comparison of phasing strategies for whole human genomes

PubMed Central

Kirkness, Ewen; Schork, Nicholas J.

2018-01-01

Humans are a diploid species that inherit one set of chromosomes paternally and one homologous set of chromosomes maternally. Unfortunately, most human sequencing initiatives ignore this fact in that they do not directly delineate the nucleotide content of the maternal and paternal copies of the 23 chromosomes individuals possess (i.e., they do not ‘phase’ the genome) often because of the costs and complexities of doing so. We compared 11 different widely-used approaches to phasing human genomes using the publicly available ‘Genome-In-A-Bottle’ (GIAB) phased version of the NA12878 genome as a gold standard. The phasing strategies we compared included laboratory-based assays that prepare DNA in unique ways to facilitate phasing as well as purely computational approaches that seek to reconstruct phase information from general sequencing reads and constructs or population-level haplotype frequency information obtained through a reference panel of haplotypes. To assess the performance of the 11 approaches, we used metrics that included, among others, switch error rates, haplotype block lengths, the proportion of fully phase-resolved genes, phasing accuracy and yield between pairs of SNVs. Our comparisons suggest that a hybrid or combined approach that leverages: 1. population-based phasing using the SHAPEIT software suite, 2. either genome-wide sequencing read data or parental genotypes, and 3. a large reference panel of variant and haplotype frequencies, provides a fast and efficient way to produce highly accurate phase-resolved individual human genomes. We found that for population-based approaches, phasing performance is enhanced with the addition of genome-wide read data; e.g., whole genome shotgun and/or RNA sequencing reads. Further, we found that the inclusion of parental genotype data within a population-based phasing strategy can provide as much as a ten-fold reduction in phasing errors. We also considered a majority voting scheme for the construction of a consensus haplotype combining multiple predictions for enhanced performance and site coverage. Finally, we also identified DNA sequence signatures associated with the genomic regions harboring phasing switch errors, which included regions of low polymorphism or SNV density. PMID:29621242
A reference bacterial genome dataset generated on the MinION™ portable single-molecule nanopore sequencer.

PubMed

Quick, Joshua; Quinlan, Aaron R; Loman, Nicholas J

2014-01-01

The MinION™ is a new, portable single-molecule sequencer developed by Oxford Nanopore Technologies. It measures four inches in length and is powered from the USB 3.0 port of a laptop computer. The MinION™ measures the change in current resulting from DNA strands interacting with a charged protein nanopore. These measurements can then be used to deduce the underlying nucleotide sequence. We present a read dataset from whole-genome shotgun sequencing of the model organism Escherichia coli K-12 substr. MG1655 generated on a MinION™ device during the early-access MinION™ Access Program (MAP). Sequencing runs of the MinION™ are presented, one generated using R7 chemistry (released in July 2014) and one using R7.3 (released in September 2014). Base-called sequence data are provided to demonstrate the nature of data produced by the MinION™ platform and to encourage the development of customised methods for alignment, consensus and variant calling, de novo assembly and scaffolding. FAST5 files containing event data within the HDF5 container format are provided to assist with the development of improved base-calling methods.
High resolution identity testing of inactivated poliovirus vaccines.

PubMed

Mee, Edward T; Minor, Philip D; Martin, Javier

2015-07-09

Definitive identification of poliovirus strains in vaccines is essential for quality control, particularly where multiple wild-type and Sabin strains are produced in the same facility. Sequence-based identification provides the ultimate in identity testing and would offer several advantages over serological methods. We employed random RT-PCR and high throughput sequencing to recover full-length genome sequences from monovalent and trivalent poliovirus vaccine products at various stages of the manufacturing process. All expected strains were detected in previously characterised products and the method permitted identification of strains comprising as little as 0.1% of sequence reads. Highly similar Mahoney and Sabin 1 strains were readily discriminated on the basis of specific variant positions. Analysis of a product known to contain incorrect strains demonstrated that the method correctly identified the contaminants. Random RT-PCR and shotgun sequencing provided high resolution identification of vaccine components. In addition to the recovery of full-length genome sequences, the method could also be easily adapted to the characterisation of minor variant frequencies and distinction of closely related products on the basis of distinguishing consensus and low frequency polymorphisms. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.

Variation of 45S rDNA intergenic spacers in Arabidopsis thaliana.

PubMed

Havlová, Kateřina; Dvořáčková, Martina; Peiro, Ramon; Abia, David; Mozgová, Iva; Vansáčová, Lenka; Gutierrez, Crisanto; Fajkus, Jiří

2016-11-01

Approximately seven hundred 45S rRNA genes (rDNA) in the Arabidopsis thaliana genome are organised in two 4 Mbp-long arrays of tandem repeats arranged in head-to-tail fashion separated by an intergenic spacer (IGS). These arrays make up 5 % of the A. thaliana genome. IGS are rapidly evolving sequences and frequent rearrangements inside the rDNA loci have generated considerable interspecific and even intra-individual variability which allows to distinguish among otherwise highly conserved rRNA genes. The IGS has not been comprehensively described despite its potential importance in regulation of rDNA transcription and replication. Here we describe the detailed sequence variation in the complete IGS of A. thaliana WT plants and provide the reference/consensus IGS sequence, as well as genomic DNA analysis. We further investigate mutants dysfunctional in chromatin assembly factor-1 (CAF-1) (fas1 and fas2 mutants), which are known to have a reduced number of rDNA copies, and plant lines with restored CAF-1 function (segregated from a fas1xfas2 genetic background) showing major rDNA rearrangements. The systematic rDNA loss in CAF-1 mutants leads to the decreased variability of the IGS and to the occurrence of distinct IGS variants. We present for the first time a comprehensive and representative set of complete IGS sequences, obtained by conventional cloning and by Pacific Biosciences sequencing. Our data expands the knowledge of the A. thaliana IGS sequence arrangement and variability, which has not been available in full and in detail until now. This is also the first study combining IGS sequencing data with RFLP analysis of genomic DNA.
[An intriguing model for 5S rDNA sequences dispersion in the genome of freshwater stingray Potamotrygon motoro (Chondrichthyes: Potamotrygonidae)].

PubMed

Cruz, V P; Oliveira, C; Foresti, F

2015-01-01

5S rDNA genes of the stingray Potamotrygon motoro were PCR replicated, purified, cloned and sequenced. Two distinct classes of segments of different sizes were obtained. The smallest, with 342 bp units, was classified as class I, and the largest, with 1900 bp units, was designated as class II. Alignment with the consensus sequences for both classes showed changes in a few bases in the 5S rDNA genes. TATA-like sequences were detected in the nontranscribed spacer (NTS) regions of class I and a microsatellite (GCT) 10 sequence was detected in the NTS region of class II. The results obtained can help to understand the molecular organization of ribosomal genes and the mechanism of gene dispersion.
Genetic Diversity and Selective Pressure in Hepatitis C Virus Genotypes 1-6: Significance for Direct-Acting Antiviral Treatment and Drug Resistance.

PubMed

Cuypers, Lize; Li, Guangdi; Libin, Pieter; Piampongsant, Supinya; Vandamme, Anne-Mieke; Theys, Kristof

2015-09-16

Treatment with pan-genotypic direct-acting antivirals, targeting different viral proteins, is the best option for clearing hepatitis C virus (HCV) infection in chronically infected patients. However, the diversity of the HCV genome is a major obstacle for the development of antiviral drugs, vaccines, and genotyping assays. In this large-scale analysis, genome-wide diversity and selective pressure was mapped, focusing on positions important for treatment, drug resistance, and resistance testing. A dataset of 1415 full-genome sequences, including genotypes 1-6 from the Los Alamos database, was analyzed. In 44% of all full-genome positions, the consensus amino acid was different for at least one genotype. Focusing on positions sharing the same consensus amino acid in all genotypes revealed that only 15% was defined as pan-genotypic highly conserved (≥99% amino acid identity) and an additional 24% as pan-genotypic conserved (≥95%). Despite its large genetic diversity, across all genotypes, codon positions were rarely identified to be positively selected (0.23%-0.46%) and predominantly found to be under negative selective pressure, suggesting mainly neutral evolution. For NS3, NS5A, and NS5B, respectively, 40% (6/15), 33% (3/9), and 14% (2/14) of the resistance-related positions harbored as consensus the amino acid variant related to resistance, potentially impeding treatment. For example, the NS3 variant 80K, conferring resistance to simeprevir used for treatment of HCV1 infected patients, was present in 39.3% of the HCV1a strains and 0.25% of HCV1b strains. Both NS5A variants 28M and 30S, known to be associated with resistance to the pan-genotypic drug daclatasvir, were found in a significant proportion of HCV4 strains (10.7%). NS5B variant 556G, known to confer resistance to non-nucleoside inhibitor dasabuvir, was observed in 8.4% of the HCV1b strains. Given the large HCV genetic diversity, sequencing efforts for resistance testing purposes may need to be genotype-specific or geographically tailored.
Return of genomic results to research participants: the floor, the ceiling, and the choices in between.

PubMed

Jarvik, Gail P; Amendola, Laura M; Berg, Jonathan S; Brothers, Kyle; Clayton, Ellen W; Chung, Wendy; Evans, Barbara J; Evans, James P; Fullerton, Stephanie M; Gallego, Carlos J; Garrison, Nanibaa' A; Gray, Stacy W; Holm, Ingrid A; Kullo, Iftikhar J; Lehmann, Lisa Soleymani; McCarty, Cathy; Prows, Cynthia A; Rehm, Heidi L; Sharp, Richard R; Salama, Joseph; Sanderson, Saskia; Van Driest, Sara L; Williams, Marc S; Wolf, Susan M; Wolf, Wendy A; Burke, Wylie

2014-06-05

As more research studies incorporate next-generation sequencing (including whole-genome or whole-exome sequencing), investigators and institutional review boards face difficult questions regarding which genomic results to return to research participants and how. An American College of Medical Genetics and Genomics 2013 policy paper suggesting that pathogenic mutations in 56 specified genes should be returned in the clinical setting has raised the question of whether comparable recommendations should be considered in research settings. The Clinical Sequencing Exploratory Research (CSER) Consortium and the Electronic Medical Records and Genomics (eMERGE) Network are multisite research programs that aim to develop practical strategies for addressing questions concerning the return of results in genomic research. CSER and eMERGE committees have identified areas of consensus regarding the return of genomic results to research participants. In most circumstances, if results meet an actionability threshold for return and the research participant has consented to return, genomic results, along with referral for appropriate clinical follow-up, should be offered to participants. However, participants have a right to decline the receipt of genomic results, even when doing so might be viewed as a threat to the participants' health. Research investigators should be prepared to return research results and incidental findings discovered in the course of their research and meeting an actionability threshold, but they have no ethical obligation to actively search for such results. These positions are consistent with the recognition that clinical research is distinct from medical care in both its aims and its guiding moral principles. Copyright © 2014 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Mapping the yeast genome by melting in nanofluidic devices

NASA Astrophysics Data System (ADS)

Welch, Robert L.; Czolkos, Ilja; Sladek, Rob; Reisner, Walter

2012-02-01

Optical mapping of DNA provides large-scale genomic information that can be used to assemble contigs from next-generation sequencing, and to detect re-arrangements between single cells. A recent optical mapping technique called denaturation mapping has the unique advantage of using physical principles rather than the action of enzymes to probe genomic structure. The absence of reagents or reaction steps makes denaturation mapping simpler than other protocols. Denaturation mapping uses fluorescence microscopy to image the pattern of partial melting along a DNA molecule extended in a channel of cross-section ˜100nm at the heart of a nanofluidic device. We successfully aligned melting maps from single DNA molecules to a theoretical map of the yeast genome (11.6Mbp) to identify their location. By aligning hundreds of molecules we assembled a consensus melting map of the yeast genome with 95% coverage.
Genetic Variation and Its Reflection on Posttranslational Modifications in Frequency Clock and Mating Type a-1 Proteins in Sordaria fimicola

PubMed Central

Arif, Rabia; Akram, Faiza; Jamil, Tazeen; Lee, Siu Fai

2017-01-01

Posttranslational modifications (PTMs) occur in all essential proteins taking command of their functions. There are many domains inside proteins where modifications take place on side-chains of amino acids through various enzymes to generate different species of proteins. In this manuscript we have, for the first time, predicted posttranslational modifications of frequency clock and mating type a-1 proteins in Sordaria fimicola collected from different sites to see the effect of environment on proteins or various amino acids pickings and their ultimate impact on consensus sequences present in mating type proteins using bioinformatics tools. Furthermore, we have also measured and walked through genomic DNA of various Sordaria strains to determine genetic diversity by genotyping the short sequence repeats (SSRs) of wild strains of S. fimicola collected from contrasting environments of two opposing slopes (harsh and xeric south facing slope and mild north facing slope) of Evolution Canyon (EC), Israel. Based on the whole genome sequence of S. macrospora, we targeted 20 genomic regions in S. fimicola which contain short sequence repeats (SSRs). Our data revealed genetic variations in strains from south facing slope and these findings assist in the hypothesis that genetic variations caused by stressful environments lead to evolution. PMID:28717646
Genetic Variation and Its Reflection on Posttranslational Modifications in Frequency Clock and Mating Type a-1 Proteins in Sordaria fimicola.

PubMed

Arif, Rabia; Akram, Faiza; Jamil, Tazeen; Mukhtar, Hamid; Lee, Siu Fai; Saleem, Muhammad

2017-01-01

Posttranslational modifications (PTMs) occur in all essential proteins taking command of their functions. There are many domains inside proteins where modifications take place on side-chains of amino acids through various enzymes to generate different species of proteins. In this manuscript we have, for the first time, predicted posttranslational modifications of frequency clock and mating type a-1 proteins in Sordaria fimicola collected from different sites to see the effect of environment on proteins or various amino acids pickings and their ultimate impact on consensus sequences present in mating type proteins using bioinformatics tools. Furthermore, we have also measured and walked through genomic DNA of various Sordaria strains to determine genetic diversity by genotyping the short sequence repeats (SSRs) of wild strains of S. fimicola collected from contrasting environments of two opposing slopes (harsh and xeric south facing slope and mild north facing slope) of Evolution Canyon (EC), Israel. Based on the whole genome sequence of S. macrospora , we targeted 20 genomic regions in S. fimicola which contain short sequence repeats (SSRs). Our data revealed genetic variations in strains from south facing slope and these findings assist in the hypothesis that genetic variations caused by stressful environments lead to evolution.
ChIP-seq analysis of the σ E regulon of Salmonella enterica serovar typhimurium reveals new genes implicated in heat shock and oxidative stress response

DOE PAGES

Li, Jie; Overall, Christopher C.; Johnson, Rudd C.; ...

2015-09-21

The alternative sigma factor σ E functions to maintain bacterial homeostasis and membrane integrity in response to extracytoplasmic stress by regulating thousands of genes both directly and indirectly. The transcriptional regulatory network governed by σ E in Salmonella and E. coli has been examined using microarray, however a genome-wide analysis of σ E–binding sites inSalmonella has not yet been reported. We infected macrophages with Salmonella Typhimurium over a select time course. Using chromatin immunoprecipitation followed by high-throughput DNA sequencing (ChIP-seq), 31 σ E–binding sites were identified. Seventeen sites were new, which included outer membrane proteins, a quorum-sensing protein, a cellmore » division factor, and a signal transduction modulator. The consensus sequence identified for σ E in vivo binding was similar to the one previously reported, except for a conserved G and A between the -35 and -10 regions. One third of the σ E–binding sites did not contain the consensus sequence, suggesting there may be alternative mechanisms by which σ E modulates transcription. By dissecting direct and indirect modes of σ E-mediated regulation, we found that σ E activates gene expression through recognition of both canonical and reversed consensus sequence. Lastly, new σ E regulated genes ( greA, luxS, ompA and ompX) are shown to be involved in heat shock and oxidative stress responses.« less
ChIP-seq analysis of the σ E regulon of Salmonella enterica serovar typhimurium reveals new genes implicated in heat shock and oxidative stress response

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Jie; Overall, Christopher C.; Johnson, Rudd C.

The alternative sigma factor σ E functions to maintain bacterial homeostasis and membrane integrity in response to extracytoplasmic stress by regulating thousands of genes both directly and indirectly. The transcriptional regulatory network governed by σ E in Salmonella and E. coli has been examined using microarray, however a genome-wide analysis of σ E–binding sites inSalmonella has not yet been reported. We infected macrophages with Salmonella Typhimurium over a select time course. Using chromatin immunoprecipitation followed by high-throughput DNA sequencing (ChIP-seq), 31 σ E–binding sites were identified. Seventeen sites were new, which included outer membrane proteins, a quorum-sensing protein, a cellmore » division factor, and a signal transduction modulator. The consensus sequence identified for σ E in vivo binding was similar to the one previously reported, except for a conserved G and A between the -35 and -10 regions. One third of the σ E–binding sites did not contain the consensus sequence, suggesting there may be alternative mechanisms by which σ E modulates transcription. By dissecting direct and indirect modes of σ E-mediated regulation, we found that σ E activates gene expression through recognition of both canonical and reversed consensus sequence. Lastly, new σ E regulated genes ( greA, luxS, ompA and ompX) are shown to be involved in heat shock and oxidative stress responses.« less
Identification and cloning of a gamma 3 subunit splice variant of the human GABA(A) receptor.

PubMed

Poulsen, C F; Christjansen, K N; Hastrup, S; Hartvig, L

2000-05-31

cDNA sequences encoding two forms of the GABA(A) gamma 3 receptor subunit were cloned from human hippocampus. The nucleotide sequences differ by the absence (gamma 3S) or presence (gamma 3L) of 18 bp located in the presumed intracellular loop between transmembrane region (TM) III and IV. The extra 18 bp in the gamma 3L subunit generates a consensus site for phosphorylation by protein kinase C (PKC). Analysis of human genomic DNA encoding the gamma 3 subunit reveals that the 18 bp insert is contiguous with the upstream proximal exon.
Prevention of influenza virus shedding and protection from lethal H1N1 challenge using a consensus 2009 H1N1 HA and NA adenovirus vector vaccine

PubMed Central

Jones, Frank R.; Gabitzsch, Elizabeth S.; Xu, Younong; Balint, Joseph P.; Borisevich, Viktoriya; Smith, Jennifer; Smith, Jeanon; Peng, Bi-Hung; Walker, Aida; Salazar, Magda; Paessler, Slobodan

2013-01-01

Vaccines against emerging pathogens such as the 2009 H1N1 pandemic virus can benefit from current technologies such as rapid genomic sequencing to construct the most biologically relevant vaccine. A novel platform (Ad5 [E1-, E2b-]) has been utilized to induce immune responses to various antigenic targets. We employed this vector platform to express hemagglutinin (HA) and neuraminidase (NA) genes from 2009 H1N1 pandemic viruses. Inserts were consensuses sequences designed from viral isolate sequences and the vaccine was rapidly constructed and produced. Vaccination induced H1N1 immune responses in mice, which afforded protection from lethal virus challenge. In ferrets, vaccination protected from disease development and significantly reduced viral titers in nasal washes. H1N1 cell mediated immunity as well as antibody induction correlated with the prevention of disease symptoms and reduction of virus replication. The Ad5 [E1-, E2b-] should be evaluated for the rapid development of effective vaccines against infectious diseases. PMID:21821082
Discovery and mapping of a new expressed sequence tag-single nucleotide polymorphism and simple sequence repeat panel for large-scale genetic studies and breeding of Theobroma cacao L.

PubMed Central

Allegre, Mathilde; Argout, Xavier; Boccara, Michel; Fouet, Olivier; Roguet, Yolande; Bérard, Aurélie; Thévenin, Jean Marc; Chauveau, Aurélie; Rivallan, Ronan; Clement, Didier; Courtois, Brigitte; Gramacho, Karina; Boland-Augé, Anne; Tahi, Mathias; Umaharan, Pathmanathan; Brunel, Dominique; Lanaud, Claire

2012-01-01

Theobroma cacao is an economically important tree of several tropical countries. Its genetic improvement is essential to provide protection against major diseases and improve chocolate quality. We discovered and mapped new expressed sequence tag-single nucleotide polymorphism (EST-SNP) and simple sequence repeat (SSR) markers and constructed a high-density genetic map. By screening 149 650 ESTs, 5246 SNPs were detected in silico, of which 1536 corresponded to genes with a putative function, while 851 had a clear polymorphic pattern across a collection of genetic resources. In addition, 409 new SSR markers were detected on the Criollo genome. Lastly, 681 new EST-SNPs and 163 new SSRs were added to the pre-existing 418 co-dominant markers to construct a large consensus genetic map. This high-density map and the set of new genetic markers identified in this study are a milestone in cocoa genomics and for marker-assisted breeding. The data are available at http://tropgenedb.cirad.fr. PMID:22210604
Whole-Genome Sequencing for Detecting Antimicrobial Resistance in Nontyphoidal Salmonella

PubMed Central

Tyson, Gregory H.; Kabera, Claudine; Chen, Yuansha; Li, Cong; Folster, Jason P.; Ayers, Sherry L.; Lam, Claudia; Tate, Heather P.; Zhao, Shaohua

2016-01-01

Laboratory-based in vitro antimicrobial susceptibility testing is the foundation for guiding anti-infective therapy and monitoring antimicrobial resistance trends. We used whole-genome sequencing (WGS) technology to identify known antimicrobial resistance determinants among strains of nontyphoidal Salmonella and correlated these with susceptibility phenotypes to evaluate the utility of WGS for antimicrobial resistance surveillance. Six hundred forty Salmonella of 43 different serotypes were selected from among retail meat and human clinical isolates that were tested for susceptibility to 14 antimicrobials using broth microdilution. The MIC for each drug was used to categorize isolates as susceptible or resistant based on Clinical and Laboratory Standards Institute clinical breakpoints or National Antimicrobial Resistance Monitoring System (NARMS) consensus interpretive criteria. Each isolate was subjected to whole-genome shotgun sequencing, and resistance genes were identified from assembled sequences. A total of 65 unique resistance genes, plus mutations in two structural resistance loci, were identified. There were more unique resistance genes (n = 59) in the 104 human isolates than in the 536 retail meat isolates (n = 36). Overall, resistance genotypes and phenotypes correlated in 99.0% of cases. Correlations approached 100% for most classes of antibiotics but were lower for aminoglycosides and beta-lactams. We report the first finding of extended-spectrum β-lactamases (ESBLs) (blaCTX-M1 and blaSHV2a) in retail meat isolates of Salmonella in the United States. Whole-genome sequencing is an effective tool for predicting antibiotic resistance in nontyphoidal Salmonella, although the use of more appropriate surveillance breakpoints and increased knowledge of new resistance alleles will further improve correlations. PMID:27381390
Molecular characterization of human T-cell lymphotropic virus type 1 full and partial genomes by Illumina massively parallel sequencing technology.

PubMed

Pessôa, Rodrigo; Watanabe, Jaqueline Tomoko; Nukui, Youko; Pereira, Juliana; Casseb, Jorge; Kasseb, Jorge; de Oliveira, Augusto César Penalva; Segurado, Aluisio Cotrim; Sanabani, Sabri Saeed

2014-01-01

Here, we report on the partial and full-length genomic (FLG) variability of HTLV-1 sequences from 90 well-characterized subjects, including 48 HTLV-1 asymptomatic carriers (ACs), 35 HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM/TSP) and 7 adult T-cell leukemia/lymphoma (ATLL) patients, using an Illumina paired-end protocol. Blood samples were collected from 90 individuals, and DNA was extracted from the PBMCs to measure the proviral load and to amplify the HTLV-1 FLG from two overlapping fragments. The amplified PCR products were subjected to deep sequencing. The sequencing data were assembled, aligned, and mapped against the HTLV-1 genome with sufficient genetic resemblance and utilized for further phylogenetic analysis. A high-throughput sequencing-by-synthesis instrument was used to obtain an average of 3210- and 5200-fold coverage of the partial (n = 14) and FLG (n = 76) data from the HTLV-1 strains, respectively. The results based on the phylogenetic trees of consensus sequences from partial and FLGs revealed that 86 (95.5%) individuals were infected with the transcontinental sub-subtypes of the cosmopolitan subtype (aA) and that 4 individuals (4.5%) were infected with the Japanese sub-subtypes (aB). A comparison of the nucleotide and amino acids of the FLG between the three clinical settings yielded no correlation between the sequenced genotype and clinical outcomes. The evolutionary relationships among the HTLV sequences were inferred from nucleotide sequence, and the results are consistent with the hypothesis that there were multiple introductions of the transcontinental subtype in Brazil. This study has increased the number of subtype aA full-length genomes from 8 to 81 and HTLV-1 aB from 2 to 5 sequences. The overall data confirmed that the cosmopolitan transcontinental sub-subtypes were the most prevalent in the Brazilian population. It is hoped that this valuable genomic data will add to our current understanding of the evolutionary history of this medically important virus.
Radiation hybrid maps of the D-genome of Aegilops tauschii and their application in sequence assembly of large and complex plant genomes.

PubMed

Kumar, Ajay; Seetan, Raed; Mergoum, Mohamed; Tiwari, Vijay K; Iqbal, Muhammad J; Wang, Yi; Al-Azzam, Omar; Šimková, Hana; Luo, Ming-Cheng; Dvorak, Jan; Gu, Yong Q; Denton, Anne; Kilian, Andrzej; Lazo, Gerard R; Kianian, Shahryar F

2015-10-16

The large and complex genome of bread wheat (Triticum aestivum L., ~17 Gb) requires high resolution genome maps with saturated marker scaffolds to anchor and orient BAC contigs/ sequence scaffolds for whole genome assembly. Radiation hybrid (RH) mapping has proven to be an excellent tool for the development of such maps for it offers much higher and more uniform marker resolution across the length of the chromosome compared to genetic mapping and does not require marker polymorphism per se, as it is based on presence (retention) vs. absence (deletion) marker assay. In this study, a 178 line RH panel was genotyped with SSRs and DArT markers to develop the first high resolution RH maps of the entire D-genome of Ae. tauschii accession AL8/78. To confirm map order accuracy, the AL8/78-RH maps were compared with:1) a DArT consensus genetic map constructed using more than 100 bi-parental populations, 2) a RH map of the D-genome of reference hexaploid wheat 'Chinese Spring', and 3) two SNP-based genetic maps, one with anchored D-genome BAC contigs and another with anchored D-genome sequence scaffolds. Using marker sequences, the RH maps were also anchored with a BAC contig based physical map and draft sequence of the D-genome of Ae. tauschii. A total of 609 markers were mapped to 503 unique positions on the seven D-genome chromosomes, with a total map length of 14,706.7 cR. The average distance between any two marker loci was 29.2 cR which corresponds to 2.1 cM or 9.8 Mb. The average mapping resolution across the D-genome was estimated to be 0.34 Mb (Mb/cR) or 0.07 cM (cM/cR). The RH maps showed almost perfect agreement with several published maps with regard to chromosome assignments of markers. The mean rank correlations between the position of markers on AL8/78 maps and the four published maps, ranged from 0.75 to 0.92, suggesting a good agreement in marker order. With 609 mapped markers, a total of 2481 deletions for the whole D-genome were detected with an average deletion size of 42.0 Mb. A total of 520 markers were anchored to 216 Ae. tauschii sequence scaffolds, 116 of which were not anchored earlier to the D-genome. This study reports the development of first high resolution RH maps for the D-genome of Ae. tauschii accession AL8/78, which were then used for the anchoring of unassigned sequence scaffolds. This study demonstrates how RH mapping, which offered high and uniform resolution across the length of the chromosome, can facilitate the complete sequence assembly of the large and complex plant genomes.
Process of labeling specific chromosomes using recombinant repetitive DNA

DOEpatents

Moyzis, R.K.; Meyne, J.

1988-02-12

Chromosome preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified with a low homology with a sequence of repetitive DNA families to which the first clones respectively belong and variant sequences are then identified by selecting clones having a pattern of hybridization with genomic DNA dissimilar to the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences which are classified as chromosome preferential are further sequenced and regions are identified having a low homology with other regions of the chromosome preferential sequence or with known sequences of other family members and consensus sequences of the repetitive DNA families for the chromosome preferential sequences. The selected low homology regions are then hybridized with chromosomes to determine those low homology regions hybridized with a specific chromosome under normal stringency conditions.
Full-genome dengue virus sequencing in mosquito saliva shows lack of convergent positive selection during transmission by Aedes aegypti

PubMed Central

Cao-Lormeau, Van-Mai; Lambrechts, Louis

2017-01-01

Abstract Like other pathogens with high mutation and replication rates, within-host dengue virus (DENV) populations evolve during infection of their main mosquito vector, Aedes aegypti. Within-host DENV evolution during transmission provides opportunities for adaptation and emergence of novel virus variants. Recent studies of DENV genetic diversity failed to detect convergent evolution of adaptive mutations in mosquito tissues such as midgut and salivary glands, suggesting that convergent positive selection is not a major driver of within-host DENV evolution in the vector. However, it is unknown whether this conclusion extends to the transmitted viral subpopulation because it is technically difficult to sequence DENV genomes in mosquito saliva. Here, we achieved DENV full-genome sequencing by pooling saliva samples collected non-sacrificially from 49 to 163 individual Ae. aegypti mosquitoes previously infected with one of two DENV-1 genotypes. We compared the transmitted viral subpopulations found in the pooled saliva samples collected in time series with the input viral population present in the infectious blood meal. In all pooled saliva samples examined, the full-genome consensus sequence of the input viral population was unchanged. Although the pooling strategy prevents analysis of individual saliva samples, our results demonstrate the lack of strong convergent positive selection during a single round of DENV transmission by Ae. aegypti. This finding reinforces the idea that genetic drift and purifying selection are the dominant evolutionary forces shaping within-host DENV genetic diversity during transmission by mosquitoes. PMID:29497564
Characterization of the genome of the dairy Lactobacillus helveticus bacteriophage {Phi}AQ113.

PubMed

Zago, Miriam; Scaltriti, Erika; Rossetti, Lia; Guffanti, Alessandro; Armiento, Angelarita; Fornasari, Maria Emanuela; Grolli, Stefano; Carminati, Domenico; Brini, Elena; Pavan, Paolo; Felsani, Armando; D'Urzo, Annalisa; Moles, Anna; Claude, Jean-Baptiste; Grandori, Rita; Ramoni, Roberto; Giraffa, Giorgio

2013-08-01

The complete genomic sequence of the dairy Lactobacillus helveticus bacteriophage ΦAQ113 was determined. Phage ΦAQ113 is a Myoviridae bacteriophage with an isometric capsid and a contractile tail. The final assembled consensus sequence revealed a linear, circularly permuted, double-stranded DNA genome with a size of 36,566 bp and a G+C content of 37%. Fifty-six open reading frames (ORFs) were predicted, and a putative function was assigned to approximately 90% of them. The ΦAQ113 genome shows functionally related genes clustered together in a genome structure composed of modules for DNA replication/regulation, DNA packaging, head and tail morphogenesis, cell lysis, and lysogeny. The identification of genes involved in the establishment of lysogeny indicates that it may have originated as a temperate phage, even if it was isolated from natural cheese whey starters as a virulent phage, because it is able to propagate in a sensitive host strain. Additionally, we discovered that the ΦAQ113 phage genome is closely related to Lactobacillus gasseri phage KC5a and Lactobacillus johnsonii phage Lj771 genomes. The phylogenetic similarities between L. helveticus phage ΦAQ113 and two phages that belong to gut species confirm a possible common ancestral origin and support the increasing consideration of L. helveticus as a health-promoting organism.
Characterization of the Genome of the Dairy Lactobacillus helveticus Bacteriophage ΦAQ113

PubMed Central

Scaltriti, Erika; Rossetti, Lia; Guffanti, Alessandro; Armiento, Angelarita; Fornasari, Maria Emanuela; Grolli, Stefano; Carminati, Domenico; Brini, Elena; Pavan, Paolo; Felsani, Armando; D'Urzo, Annalisa; Moles, Anna; Claude, Jean-Baptiste; Grandori, Rita; Ramoni, Roberto; Giraffa, Giorgio

2013-01-01

The complete genomic sequence of the dairy Lactobacillus helveticus bacteriophage ΦAQ113 was determined. Phage ΦAQ113 is a Myoviridae bacteriophage with an isometric capsid and a contractile tail. The final assembled consensus sequence revealed a linear, circularly permuted, double-stranded DNA genome with a size of 36,566 bp and a G+C content of 37%. Fifty-six open reading frames (ORFs) were predicted, and a putative function was assigned to approximately 90% of them. The ΦAQ113 genome shows functionally related genes clustered together in a genome structure composed of modules for DNA replication/regulation, DNA packaging, head and tail morphogenesis, cell lysis, and lysogeny. The identification of genes involved in the establishment of lysogeny indicates that it may have originated as a temperate phage, even if it was isolated from natural cheese whey starters as a virulent phage, because it is able to propagate in a sensitive host strain. Additionally, we discovered that the ΦAQ113 phage genome is closely related to Lactobacillus gasseri phage KC5a and Lactobacillus johnsonii phage Lj771 genomes. The phylogenetic similarities between L. helveticus phage ΦAQ113 and two phages that belong to gut species confirm a possible common ancestral origin and support the increasing consideration of L. helveticus as a health-promoting organism. PMID:23728811
The low information content of Neurospora splicing signals: implications for RNA splicing and intron origin.

PubMed

Collins, Richard A; Stajich, Jason E; Field, Deborah J; Olive, Joan E; DeAbreu, Diane M

2015-05-01

When we expressed a small (0.9 kb) nonprotein-coding transcript derived from the mitochondrial VS plasmid in the nucleus of Neurospora we found that it was efficiently spliced at one or more of eight 5' splice sites and ten 3' splice sites, which are present apparently by chance in the sequence. Further experimental and bioinformatic analyses of other mitochondrial plasmids, random sequences, and natural nuclear genes in Neurospora and other fungi indicate that fungal spliceosomes recognize a wide range of 5' splice site and branchpoint sequences and predict introns to be present at high frequency in random sequence. In contrast, analysis of intronless fungal nuclear genes indicates that branchpoint, 5' splice site and 3' splice site consensus sequences are underrepresented compared with random sequences. This underrepresentation of splicing signals is sufficient to deplete the nuclear genome of splice sites at locations that do not comprise biologically relevant introns. Thus, the splicing machinery can recognize a wide range of splicing signal sequences, but splicing still occurs with great accuracy, not because the splicing machinery distinguishes correct from incorrect introns, but because incorrect introns are substantially depleted from the genome. © 2015 Collins et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

Filovirus RefSeq Entries: Evaluation and Selection of Filovirus Type Variants, Type Sequences, and Names

PubMed Central

Kuhn, Jens H.; Andersen, Kristian G.; Bào, Yīmíng; Bavari, Sina; Becker, Stephan; Bennett, Richard S.; Bergman, Nicholas H.; Blinkova, Olga; Bradfute, Steven; Brister, J. Rodney; Bukreyev, Alexander; Chandran, Kartik; Chepurnov, Alexander A.; Davey, Robert A.; Dietzgen, Ralf G.; Doggett, Norman A.; Dolnik, Olga; Dye, John M.; Enterlein, Sven; Fenimore, Paul W.; Formenty, Pierre; Freiberg, Alexander N.; Garry, Robert F.; Garza, Nicole L.; Gire, Stephen K.; Gonzalez, Jean-Paul; Griffiths, Anthony; Happi, Christian T.; Hensley, Lisa E.; Herbert, Andrew S.; Hevey, Michael C.; Hoenen, Thomas; Honko, Anna N.; Ignatyev, Georgy M.; Jahrling, Peter B.; Johnson, Joshua C.; Johnson, Karl M.; Kindrachuk, Jason; Klenk, Hans-Dieter; Kobinger, Gary; Kochel, Tadeusz J.; Lackemeyer, Matthew G.; Lackner, Daniel F.; Leroy, Eric M.; Lever, Mark S.; Mühlberger, Elke; Netesov, Sergey V.; Olinger, Gene G.; Omilabu, Sunday A.; Palacios, Gustavo; Panchal, Rekha G.; Park, Daniel J.; Patterson, Jean L.; Paweska, Janusz T.; Peters, Clarence J.; Pettitt, James; Pitt, Louise; Radoshitzky, Sheli R.; Ryabchikova, Elena I.; Saphire, Erica Ollmann; Sabeti, Pardis C.; Sealfon, Rachel; Shestopalov, Aleksandr M.; Smither, Sophie J.; Sullivan, Nancy J.; Swanepoel, Robert; Takada, Ayato; Towner, Jonathan S.; van der Groen, Guido; Volchkov, Viktor E.; Volchkova, Valentina A.; Wahl-Jensen, Victoria; Warren, Travis K.; Warfield, Kelly L.; Weidmann, Manfred; Nichol, Stuart T.

2014-01-01

Sequence determination of complete or coding-complete genomes of viruses is becoming common practice for supporting the work of epidemiologists, ecologists, virologists, and taxonomists. Sequencing duration and costs are rapidly decreasing, sequencing hardware is under modification for use by non-experts, and software is constantly being improved to simplify sequence data management and analysis. Thus, analysis of virus disease outbreaks on the molecular level is now feasible, including characterization of the evolution of individual virus populations in single patients over time. The increasing accumulation of sequencing data creates a management problem for the curators of commonly used sequence databases and an entry retrieval problem for end users. Therefore, utilizing the data to their fullest potential will require setting nomenclature and annotation standards for virus isolates and associated genomic sequences. The National Center for Biotechnology Information’s (NCBI’s) RefSeq is a non-redundant, curated database for reference (or type) nucleotide sequence records that supplies source data to numerous other databases. Building on recently proposed templates for filovirus variant naming [ ()////-], we report consensus decisions from a majority of past and currently active filovirus experts on the eight filovirus type variants and isolates to be represented in RefSeq, their final designations, and their associated sequences. PMID:25256396
Convergence of DNA methylation and phosphorothioation epigenetics in bacterial genomes.

PubMed

Chen, Chao; Wang, Lianrong; Chen, Si; Wu, Xiaolin; Gu, Meijia; Chen, Xi; Jiang, Susu; Wang, Yunfu; Deng, Zixin; Dedon, Peter C; Chen, Shi

2017-04-25

Explosive growth in the study of microbial epigenetics has revealed a diversity of chemical structures and biological functions of DNA modifications in restriction-modification (R-M) and basic genetic processes. Here, we describe the discovery of shared consensus sequences for two seemingly unrelated DNA modification systems, 6m A methylation and phosphorothioation (PT), in which sulfur replaces a nonbridging oxygen in the DNA backbone. Mass spectrometric analysis of DNA from Escherichia coli B7A and Salmonella enterica serovar Cerro 87, strains possessing PT-based R-M genes, revealed d(G PS 6m A) dinucleotides in the G PS 6m AAC consensus representing ∼5% of the 1,100 to 1,300 PT-modified d(G PS A) motifs per genome, with 6m A arising from a yet-to-be-identified methyltransferase. To further explore PT and 6m A in another consensus sequence, G PS 6m ATC, we engineered a strain of E. coli HST04 to express Dnd genes from Hahella chejuensis KCTC2396 (PT in G PS ATC) and Dam methyltransferase from E. coli DH10B ( 6m A in G 6m ATC). Based on this model, in vitro studies revealed reduced Dam activity in G PS ATC-containing oligonucleotides whereas single-molecule real-time sequencing of HST04 DNA revealed 6m A in all 2,058 G PS ATC sites (5% of 37,698 total GATC sites). This model system also revealed temperature-sensitive restriction by DndFGH in KCTC2396 and B7A, which was exploited to discover that 6m A can substitute for PT to confer resistance to restriction by the DndFGH system. These results point to complex but unappreciated interactions between DNA modification systems and raise the possibility of coevolution of interacting systems to facilitate the function of each.
Genome sequencing of bacteria: sequencing, de novo assembly and rapid analysis using open source tools.

PubMed

Kisand, Veljo; Lettieri, Teresa

2013-04-01

De novo genome sequencing of previously uncharacterized microorganisms has the potential to open up new frontiers in microbial genomics by providing insight into both functional capabilities and biodiversity. Until recently, Roche 454 pyrosequencing was the NGS method of choice for de novo assembly because it generates hundreds of thousands of long reads (<450 bps), which are presumed to aid in the analysis of uncharacterized genomes. The array of tools for processing NGS data are increasingly free and open source and are often adopted for both their high quality and role in promoting academic freedom. The error rate of pyrosequencing the Alcanivorax borkumensis genome was such that thousands of insertions and deletions were artificially introduced into the finished genome. Despite a high coverage (~30 fold), it did not allow the reference genome to be fully mapped. Reads from regions with errors had low quality, low coverage, or were missing. The main defect of the reference mapping was the introduction of artificial indels into contigs through lower than 100% consensus and distracting gene calling due to artificial stop codons. No assembler was able to perform de novo assembly comparable to reference mapping. Automated annotation tools performed similarly on reference mapped and de novo draft genomes, and annotated most CDSs in the de novo assembled draft genomes. Free and open source software (FOSS) tools for assembly and annotation of NGS data are being developed rapidly to provide accurate results with less computational effort. Usability is not high priority and these tools currently do not allow the data to be processed without manual intervention. Despite this, genome assemblers now readily assemble medium short reads into long contigs (>97-98% genome coverage). A notable gap in pyrosequencing technology is the quality of base pair calling and conflicting base pairs between single reads at the same nucleotide position. Regardless, using draft whole genomes that are not finished and remain fragmented into tens of contigs allows one to characterize unknown bacteria with modest effort.
Mutational analyses of molecularly cloned satellite tobacco mosaic virus during serial passage in plants: Evidence for hotspots of genetic change

USGS Publications Warehouse

Kurath, G.; Dodds, J.A.

1995-01-01

The high level of genetic diversity and rapid evolution of viral RNA genomes are well documented, but few studies have characterized the rate and nature of ongoing genetic change over time under controlled experimental conditions, especially in plant hosts. The RNA genome of satellite tobacco mosaic virus (STMV) was used as an effective model for such studies because of advantageous features of its genome structure and because the extant genetic heterogeneity of STMV has been characterized previously. In the present study, the process of genetic change over time was studied by monitoring multiple serial passage lines of STMV populations for changes in their consensus sequences. A total of 42 passage lines were initiated by inoculation of tobacco plants with a helper tobamovirus and one of four STMV RNA inocula that were transcribed from full-length infectious STMV clones or extracted from purified STMV type strain virions. Ten serial passages were carried out for each line and the consensus genotypes of progeny STMV populations were assessed for genetic change by RNase protection analyses of the entire 1,059-nt STMV genome. Three different types of genetic change were observed, including the fixation of novel mutations in 9 of 42 lines, mutation at the major heterogeneity site near nt 751 in 5 of the 19 lines inoculated with a single genotype, and selection of a single major genotype in 6 of the 23 lines inoculated with mixed genotypes. Sequence analyses showed that the majority of mutations were single base substitutions. The distribution of mutation sites included three clusters in which mutations occurred at or very near the same site, suggesting hot spots of genetic change in the STMV genome. The diversity of genetic changes in sibling lines is clear evidence for the important role of chance and random sampling events in the process of genetic diversification of STMV virus populations.
Testing Convergent Evolution in Auditory Processing Genes between Echolocating Mammals and the Aye-Aye, a Percussive-Foraging Primate.

PubMed

Bankoff, Richard J; Jerjos, Michael; Hohman, Baily; Lauterbur, M Elise; Kistler, Logan; Perry, George H

2017-07-01

Several taxonomically distinct mammalian groups-certain microbats and cetaceans (e.g., dolphins)-share both morphological adaptations related to echolocation behavior and strong signatures of convergent evolution at the amino acid level across seven genes related to auditory processing. Aye-ayes (Daubentonia madagascariensis) are nocturnal lemurs with a specialized auditory processing system. Aye-ayes tap rapidly along the surfaces of trees, listening to reverberations to identify the mines of wood-boring insect larvae; this behavior has been hypothesized to functionally mimic echolocation. Here we investigated whether there are signals of convergence in auditory processing genes between aye-ayes and known mammalian echolocators. We developed a computational pipeline (Basic Exon Assembly Tool) that produces consensus sequences for regions of interest from shotgun genomic sequencing data for nonmodel organisms without requiring de novo genome assembly. We reconstructed complete coding region sequences for the seven convergent echolocating bat-dolphin genes for aye-ayes and another lemur. We compared sequences from these two lemurs in a phylogenetic framework with those of bat and dolphin echolocators and appropriate nonecholocating outgroups. Our analysis reaffirms the existence of amino acid convergence at these loci among echolocating bats and dolphins; some methods also detected signals of convergence between echolocating bats and both mice and elephants. However, we observed no significant signal of amino acid convergence between aye-ayes and echolocating bats and dolphins, suggesting that aye-aye tap-foraging auditory adaptations represent distinct evolutionary innovations. These results are also consistent with a developing consensus that convergent behavioral ecology does not reliably predict convergent molecular evolution. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Strong spurious transcription likely contributes to DNA insert bias in typical metagenomic clone libraries.

PubMed

Lam, Kathy N; Charles, Trevor C

2015-01-01

Clone libraries provide researchers with a powerful resource to study nucleic acid from diverse sources. Metagenomic clone libraries in particular have aided in studies of microbial biodiversity and function, and allowed the mining of novel enzymes. Libraries are often constructed by cloning large inserts into cosmid or fosmid vectors. Recently, there have been reports of GC bias in fosmid metagenomic libraries, and it was speculated to be a result of fragmentation and loss of AT-rich sequences during cloning. However, evidence in the literature suggests that transcriptional activity or gene product toxicity may play a role. To explore possible mechanisms responsible for sequence bias in clone libraries, we constructed a cosmid library from a human microbiome sample and sequenced DNA from different steps during library construction: crude extract DNA, size-selected DNA, and cosmid library DNA. We confirmed a GC bias in the final cosmid library, and we provide evidence that the bias is not due to fragmentation and loss of AT-rich sequences but is likely occurring after DNA is introduced into Escherichia coli. To investigate the influence of strong constitutive transcription, we searched the sequence data for promoters and found that rpoD/σ(70) promoter sequences were underrepresented in the cosmid library. Furthermore, when we examined the genomes of taxa that were differentially abundant in the cosmid library relative to the original sample, we found the bias to be more correlated with the number of rpoD/σ(70) consensus sequences in the genome than with simple GC content. The GC bias of metagenomic libraries does not appear to be due to DNA fragmentation. Rather, analysis of promoter sequences provides support for the hypothesis that strong constitutive transcription from sequences recognized as rpoD/σ(70) consensus-like in E. coli may lead to instability, causing loss of the plasmid or loss of the insert DNA that gives rise to the transcription. Despite widespread use of E. coli to propagate foreign DNA in metagenomic libraries, the effects of in vivo transcriptional activity on clone stability are not well understood. Further work is required to tease apart the effects of transcription from those of gene product toxicity.
Maintaining replication origins in the face of genomic change.

PubMed

Di Rienzi, Sara C; Lindstrom, Kimberly C; Mann, Tobias; Noble, William S; Raghuraman, M K; Brewer, Bonita J

2012-10-01

Origins of replication present a paradox to evolutionary biologists. As a collection, they are absolutely essential genomic features, but individually are highly redundant and nonessential. It is therefore difficult to predict to what extent and in what regard origins are conserved over evolutionary time. Here, through a comparative genomic analysis of replication origins and chromosomal replication patterns in the budding yeasts Saccharomyces cerevisiae and Lachancea waltii, we assess to what extent replication origins survived genomic change produced from 150 million years of evolution. We find that L. waltii origins exhibit a core consensus sequence and nucleosome occupancy pattern highly similar to those of S. cerevisiae origins. We further observe that the overall progression of chromosomal replication is similar between L. waltii and S. cerevisiae. Nevertheless, few origins show evidence of being conserved in location between the two species. Among the conserved origins are those surrounding centromeres and adjacent to histone genes, suggesting that proximity to an origin may be important for their regulation. We conclude that, over evolutionary time, origins maintain sequence, structure, and regulation, but are continually being created and destroyed, with the result that their locations are generally not conserved.
Maintaining replication origins in the face of genomic change

PubMed Central

Di Rienzi, Sara C.; Lindstrom, Kimberly C.; Mann, Tobias; Noble, William S.; Raghuraman, M.K.; Brewer, Bonita J.

2012-01-01

Origins of replication present a paradox to evolutionary biologists. As a collection, they are absolutely essential genomic features, but individually are highly redundant and nonessential. It is therefore difficult to predict to what extent and in what regard origins are conserved over evolutionary time. Here, through a comparative genomic analysis of replication origins and chromosomal replication patterns in the budding yeasts Saccharomyces cerevisiae and Lachancea waltii, we assess to what extent replication origins survived genomic change produced from 150 million years of evolution. We find that L. waltii origins exhibit a core consensus sequence and nucleosome occupancy pattern highly similar to those of S. cerevisiae origins. We further observe that the overall progression of chromosomal replication is similar between L. waltii and S. cerevisiae. Nevertheless, few origins show evidence of being conserved in location between the two species. Among the conserved origins are those surrounding centromeres and adjacent to histone genes, suggesting that proximity to an origin may be important for their regulation. We conclude that, over evolutionary time, origins maintain sequence, structure, and regulation, but are continually being created and destroyed, with the result that their locations are generally not conserved. PMID:22665441
Molecular cloning of actin genes in Trichomonas vaginalis and phylogeny inferred from actin sequences.

PubMed

Bricheux, G; Brugerolle, G

1997-08-01

The parasitic protozoan Trichomonas vaginalis is known to contain the ubiquitous and highly conserved protein actin. A genomic library and a cDNA library have been screened to identify and clone the actin gene(s) of T. vaginalis. The nucleotide sequence of one gene and its flanking regions have been determined. The open reading frame encodes a protein of 376 amino acids. The sequence is not interrupted by any introns and the promoter could be represented by a 10 bp motif close to a consensus motif also found upstream of most sequenced T. vaginalis genes. The five different clones isolated from the cDNA library have similar sequences and encode three actin proteins differing only by one or two amino acids. A phylogenetic analysis of 31 actin sequences by distance matrix and parsimony methods, using centractin as outgroup, gives congruent trees with Parabasala branching above Diplomonadida.
Cloning and sequencing of a laccase gene from the lignin-degrading basidiomycete Pleurotus ostreatus.

PubMed Central

Giardina, P; Cannio, R; Martirani, L; Marzullo, L; Palmieri, G; Sannia, G

1995-01-01

The gene (pox1) encoding a phenol oxidase from Pleurotus ostreatus, a lignin-degrading basidiomycete, was cloned and sequenced, and the corresponding pox1 cDNA was also synthesized and sequenced. The isolated gene consists of 2,592 bp, with the coding sequence being interrupted by 19 introns and flanked by an upstream region in which putative CAAT and TATA consensus sequences could be identified at positions -174 and -84, respectively. The isolation of a second cDNA (pox2 cDNA), showing 84% similarity, and of the corresponding truncated genomic clones demonstrated the existence of a multigene family coding for isoforms of laccase in P. ostreatus. PCR amplifications of specific regions on the DNA of isolated monokaryons proved that the two genes are not allelic forms. The POX1 amino acid sequence deduced was compared with those of other known laccases from different fungi. PMID:7793961
Mojo Hand, a TALEN design tool for genome editing applications.

PubMed

Neff, Kevin L; Argue, David P; Ma, Alvin C; Lee, Han B; Clark, Karl J; Ekker, Stephen C

2013-01-16

Recent studies of transcription activator-like (TAL) effector domains fused to nucleases (TALENs) demonstrate enormous potential for genome editing. Effective design of TALENs requires a combination of selecting appropriate genetic features, finding pairs of binding sites based on a consensus sequence, and, in some cases, identifying endogenous restriction sites for downstream molecular genetic applications. We present the web-based program Mojo Hand for designing TAL and TALEN constructs for genome editing applications (http://www.talendesign.org). We describe the algorithm and its implementation. The features of Mojo Hand include (1) automatic download of genomic data from the National Center for Biotechnology Information, (2) analysis of any DNA sequence to reveal pairs of binding sites based on a user-defined template, (3) selection of restriction-enzyme recognition sites in the spacer between the TAL monomer binding sites including options for the selection of restriction enzyme suppliers, and (4) output files designed for subsequent TALEN construction using the Golden Gate assembly method. Mojo Hand enables the rapid identification of TAL binding sites for use in TALEN design. The assembly of TALEN constructs, is also simplified by using the TAL-site prediction program in conjunction with a spreadsheet management aid of reagent concentrations and TALEN formulation. Mojo Hand enables scientists to more rapidly deploy TALENs for genome editing applications.
A Molecular Phylogeny of Living Primates

PubMed Central

Perelman, Polina; Johnson, Warren E.; Roos, Christian; Seuánez, Hector N.; Horvath, Julie E.; Moreira, Miguel A. M.; Kessing, Bailey; Pontius, Joan; Roelke, Melody; Rumpler, Yves; Schneider, Maria Paula C.; Silva, Artur; O'Brien, Stephen J.; Pecon-Slattery, Jill

2011-01-01

Comparative genomic analyses of primates offer considerable potential to define and understand the processes that mold, shape, and transform the human genome. However, primate taxonomy is both complex and controversial, with marginal unifying consensus of the evolutionary hierarchy of extant primate species. Here we provide new genomic sequence (∼8 Mb) from 186 primates representing 61 (∼90%) of the described genera, and we include outgroup species from Dermoptera, Scandentia, and Lagomorpha. The resultant phylogeny is exceptionally robust and illuminates events in primate evolution from ancient to recent, clarifying numerous taxonomic controversies and providing new data on human evolution. Ongoing speciation, reticulate evolution, ancient relic lineages, unequal rates of evolution, and disparate distributions of insertions/deletions among the reconstructed primate lineages are uncovered. Our resolution of the primate phylogeny provides an essential evolutionary framework with far-reaching applications including: human selection and adaptation, global emergence of zoonotic diseases, mammalian comparative genomics, primate taxonomy, and conservation of endangered species. PMID:21436896
Molecular biology of mycoplasmas: from the minimum cell concept to the artificial cell.

PubMed

Cordova, Caio M M; Hoeltgebaum, Daniela L; Machado, Laís D P N; Santos, Larissa Dos

2016-01-01

Mycoplasmas are a large group of bacteria, sorted into different genera in the Mollicutes class, whose main characteristic in common, besides the small genome, is the absence of cell wall. They are considered cellular and molecular biology study models. We present an updated review of the molecular biology of these model microorganisms and the development of replicative vectors for the transformation of mycoplasmas. Synthetic biology studies inspired by these pioneering works became possible and won the attention of the mainstream media. For the first time, an artificial genome was synthesized (a minimal genome produced from consensus sequences obtained from mycoplasmas). For the first time, a functional artificial cell has been constructed by introducing a genome completely synthesized within a cell envelope of a mycoplasma obtained by transformation techniques. Therefore, this article offers an updated insight to the state of the art of these peculiar organisms' molecular biology.
An Evolutionarily Young Polar Bear (Ursus maritimus) Endogenous Retrovirus Identified from Next Generation Sequence Data.

PubMed

Tsangaras, Kyriakos; Mayer, Jens; Alquezar-Planas, David E; Greenwood, Alex D

2015-11-24

Transcriptome analysis of polar bear (Ursus maritimus) tissues identified sequences with similarity to Porcine Endogenous Retroviruses (PERV). Based on these sequences, four proviral copies and 15 solo long terminal repeats (LTRs) of a newly described endogenous retrovirus were characterized from the polar bear draft genome sequence. Closely related sequences were identified by PCR analysis of brown bear (Ursus arctos) and black bear (Ursus americanus) but were absent in non-Ursinae bear species. The virus was therefore designated UrsusERV. Two distinct groups of LTRs were observed including a recombinant ERV that contained one LTR belonging to each group indicating that genomic invasions by at least two UrsusERV variants have recently occurred. Age estimates based on proviral LTR divergence and conservation of integration sites among ursids suggest the viral group is only a few million years old. The youngest provirus was polar bear specific, had intact open reading frames (ORFs) and could potentially encode functional proteins. Phylogenetic analyses of UrsusERV consensus protein sequences suggest that it is part of a pig, gibbon and koala retrovirus clade. The young age estimates and lineage specificity of the virus suggests UrsusERV is a recent cross species transmission from an unknown reservoir and places the viral group among the youngest of ERVs identified in mammals.
An Evolutionarily Young Polar Bear (Ursus maritimus) Endogenous Retrovirus Identified from Next Generation Sequence Data

PubMed Central

Tsangaras, Kyriakos; Mayer, Jens; Alquezar-Planas, David E.; Greenwood, Alex D.

2015-01-01

Transcriptome analysis of polar bear (Ursus maritimus) tissues identified sequences with similarity to Porcine Endogenous Retroviruses (PERV). Based on these sequences, four proviral copies and 15 solo long terminal repeats (LTRs) of a newly described endogenous retrovirus were characterized from the polar bear draft genome sequence. Closely related sequences were identified by PCR analysis of brown bear (Ursus arctos) and black bear (Ursus americanus) but were absent in non-Ursinae bear species. The virus was therefore designated UrsusERV. Two distinct groups of LTRs were observed including a recombinant ERV that contained one LTR belonging to each group indicating that genomic invasions by at least two UrsusERV variants have recently occurred. Age estimates based on proviral LTR divergence and conservation of integration sites among ursids suggest the viral group is only a few million years old. The youngest provirus was polar bear specific, had intact open reading frames (ORFs) and could potentially encode functional proteins. Phylogenetic analyses of UrsusERV consensus protein sequences suggest that it is part of a pig, gibbon and koala retrovirus clade. The young age estimates and lineage specificity of the virus suggests UrsusERV is a recent cross species transmission from an unknown reservoir and places the viral group among the youngest of ERVs identified in mammals. PMID:26610552
Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.

PubMed

Rehm, Charlotte; Wurmthaler, Lena A; Li, Yuanhao; Frickey, Tancred; Hartig, Jörg S

2015-01-01

In prokaryotes simple sequence repeats (SSRs) with unit sizes of 1-5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6-9 nt are rare. In particular G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes show putative G4 forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed with repeat numbers ranging from two up to 26 units. However a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and wide-spread quadruplex-forming repeat sequences in bacteria.
Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.

PubMed Central

Rehm, Charlotte; Wurmthaler, Lena A.; Li, Yuanhao; Frickey, Tancred; Hartig, Jörg S.

2015-01-01

In prokaryotes simple sequence repeats (SSRs) with unit sizes of 1–5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6–9 nt are rare. In particular G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes show putative G4 forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed with repeat numbers ranging from two up to 26 units. However a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and wide-spread quadruplex-forming repeat sequences in bacteria. PMID:26695179
Structural and sequence diversity of the transposon Galileo in the Drosophila willistoni genome.

PubMed

Gonçalves, Juliana W; Valiati, Victor Hugo; Delprat, Alejandra; Valente, Vera L S; Ruiz, Alfredo

2014-09-13

Galileo is one of three members of the P superfamily of DNA transposons. It was originally discovered in Drosophila buzzatii, in which three segregating chromosomal inversions were shown to have been generated by ectopic recombination between Galileo copies. Subsequently, Galileo was identified in six of 12 sequenced Drosophila genomes, indicating its widespread distribution within this genus. Galileo is strikingly abundant in Drosophila willistoni, a neotropical species that is highly polymorphic for chromosomal inversions, suggesting a role for this transposon in the evolution of its genome. We carried out a detailed characterization of all Galileo copies present in the D. willistoni genome. A total of 191 copies, including 133 with two terminal inverted repeats (TIRs), were classified according to structure in six groups. The TIRs exhibited remarkable variation in their length and structure compared to the most complete copy. Three copies showed extended TIRs due to internal tandem repeats, the insertion of other transposable elements (TEs), or the incorporation of non-TIR sequences into the TIRs. Phylogenetic analyses of the transposase (TPase)-encoding and TIR segments yielded two divergent clades, which we termed Galileo subfamilies V and W. Target-site duplications (TSDs) in D. willistoni Galileo copies were 7- or 8-bp in length, with the consensus sequence GTATTAC. Analysis of the region around the TSDs revealed a target site motif (TSM) with a 15-bp palindrome that may give rise to a stem-loop secondary structure. There is a remarkable abundance and diversity of Galileo copies in the D. willistoni genome, although no functional copies were found. The TIRs in particular have a dynamic structure and extend in different ways, but their ends (required for transposition) are more conserved than the rest of the element. The D. willistoni genome harbors two Galileo subfamilies (V and W) that diverged ~9 million years ago and may have descended from an ancestral element in the genome. Galileo shows a significant insertion preference for a 15-bp palindromic TSM.
A public resource facilitating clinical use of genomes

PubMed Central

Ball, Madeleine P.; Thakuria, Joseph V.; Zaranek, Alexander Wait; Clegg, Tom; Rosenbaum, Abraham M.; Wu, Xiaodi; Angrist, Misha; Bhak, Jong; Bobe, Jason; Callow, Matthew J.; Cano, Carlos; Chou, Michael F.; Chung, Wendy K.; Douglas, Shawn M.; Estep, Preston W.; Gore, Athurva; Hulick, Peter; Labarga, Alberto; Lee, Je-Hyuk; Lunshof, Jeantine E.; Kim, Byung Chul; Kim, Jong-Il; Li, Zhe; Murray, Michael F.; Nilsen, Geoffrey B.; Peters, Brock A.; Raman, Anugraha M.; Rienhoff, Hugh Y.; Robasky, Kimberly; Wheeler, Matthew T.; Vandewege, Ward; Vorhaus, Daniel B.; Yang, Joyce L.; Yang, Luhan; Aach, John; Ashley, Euan A.; Drmanac, Radoje; Kim, Seong-Jin; Li, Jin Billy; Peshkin, Leonid; Seidman, Christine E.; Seo, Jeong-Sun; Zhang, Kun; Rehm, Heidi L.; Church, George M.

2012-01-01

Rapid advances in DNA sequencing promise to enable new diagnostics and individualized therapies. Achieving personalized medicine, however, will require extensive research on highly reidentifiable, integrated datasets of genomic and health information. To assist with this, participants in the Personal Genome Project choose to forgo privacy via our institutional review board- approved “open consent” process. The contribution of public data and samples facilitates both scientific discovery and standardization of methods. We present our findings after enrollment of more than 1,800 participants, including whole-genome sequencing of 10 pilot participant genomes (the PGP-10). We introduce the Genome-Environment-Trait Evidence (GET-Evidence) system. This tool automatically processes genomes and prioritizes both published and novel variants for interpretation. In the process of reviewing the presumed healthy PGP-10 genomes, we find numerous literature references implying serious disease. Although it is sometimes impossible to rule out a late-onset effect, stringent evidence requirements can address the high rate of incidental findings. To that end we develop a peer production system for recording and organizing variant evaluations according to standard evidence guidelines, creating a public forum for reaching consensus on interpretation of clinically relevant variants. Genome analysis becomes a two-step process: using a prioritized list to record variant evaluations, then automatically sorting reviewed variants using these annotations. Genome data, health and trait information, participant samples, and variant interpretations are all shared in the public domain—we invite others to review our results using our participant samples and contribute to our interpretations. We offer our public resource and methods to further personalized medical research. PMID:22797899
Systematic sequencing of mRNA from the Antarctic krill (Euphausia superba) and first tissue specific transcriptional signature

PubMed Central

De Pittà, Cristiano; Bertolucci, Cristiano; Mazzotta, Gabriella M; Bernante, Filippo; Rizzo, Giorgia; De Nardi, Barbara; Pallavicini, Alberto; Lanfranchi, Gerolamo; Costa, Rodolfo

2008-01-01

Background Little is known about the genome sequences of Euphausiacea (krill) although these crustaceans are abundant components of the pelagic ecosystems in all oceans and used for aquaculture and pharmaceutical industry. This study reports the results of an expressed sequence tag (EST) sequencing project from different tissues of Euphausia superba (the Antarctic krill). Results We have constructed and sequenced five cDNA libraries from different Antarctic krill tissues: head, abdomen, thoracopods and photophores. We have identified 1.770 high-quality ESTs which were assembled into 216 overlapping clusters and 801 singletons resulting in a total of 1.017 non-redundant sequences. Quantitative RT-PCR analysis was performed to quantify and validate the expression levels of ten genes presenting different EST countings in krill tissues. In addition, bioinformatic screening of the non-redundant E. superba sequences identified 69 microsatellite containing ESTs. Clusters, consensuses and related similarity and gene ontology searches were organized in a dedicated E. superba database . Conclusion We defined the first tissue transcriptional signatures of E. superba based on functional categorization among the examined tissues. The analyses of annotated transcripts showed a higher similarity with genes from insects with respect to Malacostraca possibly as an effect of the limited number of Malacostraca sequences in the public databases. Our catalogue provides for the first time a genomic tool to investigate the biology of the Antarctic krill. PMID:18226200

INSaFLU: an automated open web-based bioinformatics suite "from-reads" for influenza whole-genome-sequencing-based surveillance.

PubMed

Borges, Vítor; Pinheiro, Miguel; Pechirra, Pedro; Guiomar, Raquel; Gomes, João Paulo

2018-06-29

A new era of flu surveillance has already started based on the genetic characterization and exploration of influenza virus evolution at whole-genome scale. Although this has been prioritized by national and international health authorities, the demanded technological transition to whole-genome sequencing (WGS)-based flu surveillance has been particularly delayed by the lack of bioinformatics infrastructures and/or expertise to deal with primary next-generation sequencing (NGS) data. We developed and implemented INSaFLU ("INSide the FLU"), which is the first influenza-oriented bioinformatics free web-based suite that deals with primary NGS data (reads) towards the automatic generation of the output data that are actually the core first-line "genetic requests" for effective and timely influenza laboratory surveillance (e.g., type and sub-type, gene and whole-genome consensus sequences, variants' annotation, alignments and phylogenetic trees). By handling NGS data collected from any amplicon-based schema, the implemented pipeline enables any laboratory to perform multi-step software intensive analyses in a user-friendly manner without previous advanced training in bioinformatics. INSaFLU gives access to user-restricted sample databases and projects management, being a transparent and flexible tool specifically designed to automatically update project outputs as more samples are uploaded. Data integration is thus cumulative and scalable, fitting the need for a continuous epidemiological surveillance during the flu epidemics. Multiple outputs are provided in nomenclature-stable and standardized formats that can be explored in situ or through multiple compatible downstream applications for fine-tuned data analysis. This platform additionally flags samples as "putative mixed infections" if the population admixture enrolls influenza viruses with clearly distinct genetic backgrounds, and enriches the traditional "consensus-based" influenza genetic characterization with relevant data on influenza sub-population diversification through a depth analysis of intra-patient minor variants. This dual approach is expected to strengthen our ability not only to detect the emergence of antigenic and drug resistance variants but also to decode alternative pathways of influenza evolution and to unveil intricate routes of transmission. In summary, INSaFLU supplies public health laboratories and influenza researchers with an open "one size fits all" framework, potentiating the operationalization of a harmonized multi-country WGS-based surveillance for influenza virus. INSaFLU can be accessed through https://insaflu.insa.pt .
Abr1, a Transposon-Like Element in the Genome of the Cultivated Mushroom Agaricus bisporus (Lange) Imbach

PubMed Central

Sonnenberg, Anton S. M.; Baars, Johan J. P.; Mikosch, Thomas S. P.; Schaap, Peter J.; Van Griensven, Leo J. L. D.

1999-01-01

A 300-bp repetitive element was found in the genome of the white button mushroom, Agaricus bisporus, and designated Abr1. It is present in ∼15 copies per haploid genome in the commercial strain Horst U1. Analysis of seven copies showed 89 to 97% sequence identity. The repeat has features typical of class II transposons (i.e., terminal inverted repeats, subterminal repeats, and a target site duplication of 7 bp). The latter shows a consensus sequence. When used as probe on Southern blots, Abr1 identifies relatively little variation within traditional and present-day commercial strains, indicating that most strains are identical or have a common origin. In contrast to these cultivars, high variation is found among field-collected strains. Furthermore, a remarkable difference in copy numbers of Abr1 was found between A. bisporus isolates with a secondarily homothallic life cycle and those with a heterothallic life cycle. Abr1 is a type II transposon not previously reported in basidiomycetes and appears to be useful for the identification of strains within the species A. bisporus. PMID:10427018
Identification of a novel plant amalgavirus (Amalgavirus, Amalgaviridae) genome sequence in Cistus incanus.

PubMed

Goh, C J; Park, D; Lee, J S; Sebastiani, F; Hahn, Y

2018-01-01

Amalgaviridae is a family of double-stranded, monosegmented RNA viruses that are associated with plants, fungi, microsporidians, and animals. A sequence contig derived from the transcriptome of a eudicot, Cistus incanus (the family Cistaceae; commonly known as hoary rockrose), was identified as the genome sequence of a novel plant RNA virus and named Cistus incanus RNA virus 1 (CiRV1). Sequence comparison and phylogenetic analysis indicated that CiRV1 is a novel species of the genus Amalgavirus in the family Amalgaviridae. The CiRV1 genome contig has two overlapping open reading frames (ORFs). ORF1 encodes a putative replication factory matrix-like protein, while ORF2 encodes a RNA-dependent RNA polymerase (RdRp) domain. An ORF1+2 fusion protein, which functions in viral RNA replication, is produced by a +1 programmed ribosomal frameshifting (PRF) mechanism. A +1 PRF motif UUU_CGU, which matches the conserved amalgavirus +1 PRF consensus sequence UUU_CGN, was found at the boundary of CiRV1 ORF1 and ORF2. Comparison of 25 amalgavirus ORF1+2 fusion proteins revealed that only three different positions within a 13-amino acid segment were recurrently used at the boundary, possibly being selected so as not to interfere with correct folding and function of the fusion protein. CiRV1 is the first virus found to be associated with the Cistus species and may be useful for studying amalgaviruses.
Enrichment dynamics of Listeria monocytogenes and the associated microbiome from naturally contaminated ice cream linked to a listeriosis outbreak.

PubMed

Ottesen, Andrea; Ramachandran, Padmini; Reed, Elizabeth; White, James R; Hasan, Nur; Subramanian, Poorani; Ryan, Gina; Jarvis, Karen; Grim, Christopher; Daquiqan, Ninalynn; Hanes, Darcy; Allard, Marc; Colwell, Rita; Brown, Eric; Chen, Yi

2016-11-16

Microbiota that co-enrich during efforts to recover pathogens from foodborne outbreaks interfere with efficient detection and recovery. Here, dynamics of co-enriching microbiota during recovery of Listeria monocytogenes from naturally contaminated ice cream samples linked to an outbreak are described for three different initial enrichment formulations used by the Food and Drug Administration (FDA), the International Organization of Standardization (ISO), and the United States Department of Agriculture (USDA). Enrichment cultures were analyzed using DNA extraction and sequencing from samples taken every 4 h throughout 48 h of enrichment. Resphera Insight and CosmosID analysis tools were employed for high-resolution profiling of 16S rRNA amplicons and whole genome shotgun data, respectively. During enrichment, other bacterial taxa were identified, including Anoxybacillus, Geobacillus, Serratia, Pseudomonas, Erwinia, and Streptococcus spp. Surprisingly, incidence of L. monocytogenes was proportionally greater at hour 0 than when tested 4, 8, and 12 h later with all three enrichment schemes. The corresponding increase in Anoxybacillus and Geobacillus spp.indicated these taxa co-enriched in competition with L. monocytogenes during early enrichment hours. L. monocytogenes became dominant after 24 h in all three enrichments. DNA sequences obtained from shotgun metagenomic data of Listeria monocytogenes at 48 h were assembled to produce a consensus draft genome which appeared to have a similar tracking utility to pure culture isolates of L. monocytogenes. All three methods performed equally well for enrichment of Listeria monocytogenes. The observation of potential competitive exclusion of L. mono by Anoxybacillus and Geobacillus in early enrichment hours provided novel information that may be used to further optimize enrichment formulations. Application of Resphera Insight for high-resolution analysis of 16S amplicon sequences accurately identified L. monocytogenes. Both shotgun and 16S rRNA data supported the presence of three slightly variable genomes of L. monocytogenes. Moreover, the draft assembly of a consensus genome of L. monocytogenes from shotgun metagenomic data demonstrated the potential utility of this approach to expedite trace-back of outbreak-associated strains, although further validation will be needed to confirm this utility.
Quantitative mutant analysis of viral quasispecies by chip-based matrix-assisted laser desorption/ ionization time-of-flight mass spectrometry

PubMed Central

Amexis, Georgios; Oeth, Paul; Abel, Kenneth; Ivshina, Anna; Pelloquin, Francois; Cantor, Charles R.; Braun, Andreas; Chumakov, Konstantin

2001-01-01

RNA viruses exist as quasispecies, heterogeneous and dynamic mixtures of mutants having one or more consensus sequences. An adequate description of the genomic structure of such viral populations must include the consensus sequence(s) plus a quantitative assessment of sequence heterogeneities. For example, in quality control of live attenuated viral vaccines, the presence of even small quantities of mutants or revertants may indicate incomplete or unstable attenuation that may influence vaccine safety. Previously, we demonstrated the monitoring of oral poliovirus vaccine with the use of mutant analysis by PCR and restriction enzyme cleavage (MAPREC). In this report, we investigate genetic variation in live attenuated mumps virus vaccine by using both MAPREC and a platform (DNA MassArray) based on matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry. Mumps vaccines prepared from the Jeryl Lynn strain typically contain at least two distinct viral substrains, JL1 and JL2, which have been characterized by full length sequencing. We report the development of assays for characterizing sequence variants in these substrains and demonstrate their use in quantitative analysis of substrains and sequence variations in mixed virus cultures and mumps vaccines. The results obtained from both the MAPREC and MALDI-TOF methods showed excellent correlation. This suggests the potential utility of MALDI-TOF for routine quality control of live viral vaccines and for assessment of genetic stability and quantitative monitoring of genetic changes in other RNA viruses of clinical interest. PMID:11593021
A high-density, SNP-based consensus map of tetraploid wheat as a bridge to integrate durum and bread wheat genomics and breeding

USDA-ARS?s Scientific Manuscript database

Consensus linkage maps are important tools in crop genomics. We have assembled a high-density tetraploid wheat consensus map by integrating 13 datasets from independent biparental populations involving durum wheat cultivars (Triticum turgidum ssp. durum), cultivated emmer (T. turgidum ssp. dicoccum...
Whole-Genome Sequencing for Detecting Antimicrobial Resistance in Nontyphoidal Salmonella.

PubMed

McDermott, Patrick F; Tyson, Gregory H; Kabera, Claudine; Chen, Yuansha; Li, Cong; Folster, Jason P; Ayers, Sherry L; Lam, Claudia; Tate, Heather P; Zhao, Shaohua

2016-09-01

Laboratory-based in vitro antimicrobial susceptibility testing is the foundation for guiding anti-infective therapy and monitoring antimicrobial resistance trends. We used whole-genome sequencing (WGS) technology to identify known antimicrobial resistance determinants among strains of nontyphoidal Salmonella and correlated these with susceptibility phenotypes to evaluate the utility of WGS for antimicrobial resistance surveillance. Six hundred forty Salmonella of 43 different serotypes were selected from among retail meat and human clinical isolates that were tested for susceptibility to 14 antimicrobials using broth microdilution. The MIC for each drug was used to categorize isolates as susceptible or resistant based on Clinical and Laboratory Standards Institute clinical breakpoints or National Antimicrobial Resistance Monitoring System (NARMS) consensus interpretive criteria. Each isolate was subjected to whole-genome shotgun sequencing, and resistance genes were identified from assembled sequences. A total of 65 unique resistance genes, plus mutations in two structural resistance loci, were identified. There were more unique resistance genes (n = 59) in the 104 human isolates than in the 536 retail meat isolates (n = 36). Overall, resistance genotypes and phenotypes correlated in 99.0% of cases. Correlations approached 100% for most classes of antibiotics but were lower for aminoglycosides and beta-lactams. We report the first finding of extended-spectrum β-lactamases (ESBLs) (blaCTX-M1 and blaSHV2a) in retail meat isolates of Salmonella in the United States. Whole-genome sequencing is an effective tool for predicting antibiotic resistance in nontyphoidal Salmonella, although the use of more appropriate surveillance breakpoints and increased knowledge of new resistance alleles will further improve correlations. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
Mammalian genome projects reveal new growth hormone (GH) sequences. Characterization of the GH-encoding genes of armadillo (Dasypus novemcinctus), hedgehog (Erinaceus europaeus), bat (Myotis lucifugus), hyrax (Procavia capensis), shrew (Sorex araneus), ground squirrel (Spermophilus tridecemlineatus), elephant (Loxodonta africana), cat (Felis catus) and opossum (Monodelphis domestica).

PubMed

Wallis, Michael

2008-01-15

Mammalian growth hormone (GH) sequences have been shown previously to display episodic evolution: the sequence is generally strongly conserved but on at least two occasions during mammalian evolution (on lineages leading to higher primates and ruminants) bursts of rapid evolution occurred. However, the number of mammalian orders studied previously has been relatively limited, and the availability of sequence data via mammalian genome projects provides the potential for extending the range of GH gene sequences examined. Complete or nearly complete GH gene sequences for six mammalian species for which no data were previously available have been extracted from the genome databases-Dasypus novemcinctus (nine-banded armadillo), Erinaceus europaeus (western European hedgehog), Myotis lucifugus (little brown bat), Procavia capensis (cape rock hyrax), Sorex araneus (European shrew), Spermophilus tridecemlineatus (13-lined ground squirrel). In addition incomplete data for several other species have been extended. Examination of the data in detail and comparison with previously available sequences has allowed assessment of the reliability of deduced sequences. Several of the new sequences differ substantially from the consensus sequence previously determined for eutherian GHs, indicating greater variability than previously recognised, and confirming the episodic pattern of evolution. The episodic pattern is not seen for signal sequences, 5' upstream sequence or synonymous substitutions-it is specific to the mature protein sequence, suggesting that it relates to the hormonal function. The substitutions accumulated during the course of GH evolution have occurred mainly on the side of the hormone facing away from the receptor, in a non-random fashion, and it is suggested that this may reflect interaction of the receptor-bound hormone with other proteins or small ligands.
Viral pathogen discovery

PubMed Central

Chiu, Charles Y

2015-01-01

Viral pathogen discovery is of critical importance to clinical microbiology, infectious diseases, and public health. Genomic approaches for pathogen discovery, including consensus polymerase chain reaction (PCR), microarrays, and unbiased next-generation sequencing (NGS), have the capacity to comprehensively identify novel microbes present in clinical samples. Although numerous challenges remain to be addressed, including the bioinformatics analysis and interpretation of large datasets, these technologies have been successful in rapidly identifying emerging outbreak threats, screening vaccines and other biological products for microbial contamination, and discovering novel viruses associated with both acute and chronic illnesses. Downstream studies such as genome assembly, epidemiologic screening, and a culture system or animal model of infection are necessary to establish an association of a candidate pathogen with disease. PMID:23725672
A new open reading frame in the genome of the cyanobacterium Synechocystis sp. PCC 6803

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lysenko, E.S.; Ogarkova, O.A.; Tarasov, V.A.

1995-02-01

A new open reading frame ORF242, coding for a 26.47-kDa polypeptide, was found in a DNA fragment of the cyanobacterium Synechocystis 6803, transforming a photosynthetic mutant to photoautotrophy and having homology with plant chloroplast DNA. In the 5{prime} flanking region of ORF242, consensus sequences characteristic of a functioning gene were found. One copy of ORF242 is present in the Synechocystis 6803 genome. Insertion inactivation of ORF242 does not lead to a decrease in photosynthetic activity in cells of cyanobacteria but may influence the ratio between active complexes of photosystems I and II. 22 refs., 6 figs., 2 tabs.
Genome-wide comparison and taxonomic relatedness of multiple Xylella fastidiosa strains reveal the occurrence of three subspecies and a new Xylella species.

PubMed

Marcelletti, Simone; Scortichini, Marco

2016-10-01

A total of 21 Xylella fastidiosa strains were assessed by comparing their genomes to infer their taxonomic relationships. The whole-genome-based average nucleotide identity and tetranucleotide frequency correlation coefficient analyses were performed. In addition, a consensus tree based on comparisons of 956 core gene families, and a genome-wide phylogenetic tree and a Neighbor-net network were constructed with 820,088 nucleotides (i.e., approximately 30-33 % of the entire X. fastidiosa genome). All approaches revealed the occurrence of three well-demarcated genetic clusters that represent X. fastidiosa subspecies fastidiosa, multiplex and pauca, with the latter appeared to diverge. We suggest that the proposed but never formally described subspecies 'sandyi' and 'morus' are instead members of the subspecies fastidiosa. These analyses support the view that the Xylella strain isolated from Pyrus pyrifolia in Taiwan is likely to be a new species. A widely used multilocus sequence typing analysis yielded conflicting results.
Divergence and evolution of homologous regions of Bombyx mori nuclear polyhedrosis virus.

PubMed Central

Majima, K; Kobara, R; Maeda, S

1993-01-01

Homologous regions (hrs) (hr1,hr2-left,hr2-right,hr3,hr4-left,hr 4-right, and hr5) similar to those found in the Autographa californica nuclear polyhedrosis virus (AcNPV) genome were found in the Bombyx mori NPV (BmNPV) genome. The BmNPV hrs contained two to eight repeats of a homologous nucleotide sequence which were on average about 75 bp long. All of these homologous sequence repeats contained a 26-bp-long palindrome motif with an EcoRI or EcoRI-like site at its core. The consensus sequence of the BmNPV hrs showed 95% conservation with respect to those found in AcNPV. Nucleotide sequence analysis indicated that hr2-left and hr2-right of BmNPV evolved from an ancestor similar to hr2 of AcNPV by inversion, cleavage, and ligation. The polarities of the BmNPV and AcNPV hrs were conserved except for that of hr4-left. Within hr4-right of BmNPV, four repeats of a previously underscribed palindrome motif were found. Bmhr5D, a BmNPV mutant which lacked hr5, replicated at a rate similar to that of wild-type BmNPV in BmN cells and silkworm larvae, indicating that hr5 was not essential for viral replication. After ten passages of Bmhr5D in BmN cells, no detectable changes in its genome were observed by restriction endonuclease analysis. The evolution and divergence of the BmNPV genome are also discussed. Images PMID:8230471
A complete high-quality MinION nanopore assembly of an extensively drug-resistant Mycobacterium tuberculosis Beijing lineage strain identifies novel variation in repetitive PE/PPE gene regions.

PubMed

Bainomugisa, Arnold; Duarte, Tania; Lavu, Evelyn; Pandey, Sushil; Coulter, Chris; Marais, Ben J; Coin, Lachlan M

2018-06-15

A better understanding of the genomic changes that facilitate the emergence and spread of drug-resistant Mycobacterium tuberculosis strains is currently required. Here, we report the use of the MinION nanopore sequencer (Oxford Nanopore Technologies) to sequence and assemble an extensively drug-resistant (XDR) isolate, which is part of a modern Beijing sub-lineage strain, prevalent in Western Province, Papua New Guinea. Using 238-fold coverage obtained from a single flow-cell, de novo assembly of nanopore reads resulted into one contiguous assembly with 99.92 % assembly accuracy. Incorporation of complementary short read sequences (Illumina) as part of consensus error correction resulted in a 4 404 064 bp genome with 99.98 % assembly accuracy. This assembly had an average nucleotide identity of 99.7 % relative to the reference genome, H37Rv. We assembled nearly all GC-rich repetitive PE/PPE family genes (166/168) and identified variants within these genes. With an estimated genotypic error rate of 5.3 % from MinION data, we demonstrated identification of variants to include the conventional drug resistance mutations, and those that contribute to the resistance phenotype (efflux pumps/transporter) and virulence. Reference-based alignment of the assembly allowed detection of deletions and insertions. MinION sequencing provided a fully annotated assembly of a transmissible XDR strain from an endemic setting and showed its utility to provide further understanding of genomic processes within Mycobacterium tuberculosis.
PipeOnline 2.0: automated EST processing and functional data sorting.

PubMed

Ayoubi, Patricia; Jin, Xiaojing; Leite, Saul; Liu, Xianghui; Martajaja, Jeson; Abduraham, Abdurashid; Wan, Qiaolan; Yan, Wei; Misawa, Eduardo; Prade, Rolf A

2002-11-01

Expressed sequence tags (ESTs) are generated and deposited in the public domain, as redundant, unannotated, single-pass reactions, with virtually no biological content. PipeOnline automatically analyses and transforms large collections of raw DNA-sequence data from chromatograms or FASTA files by calling the quality of bases, screening and removing vector sequences, assembling and rewriting consensus sequences of redundant input files into a unigene EST data set and finally through translation, amino acid sequence similarity searches, annotation of public databases and functional data. PipeOnline generates an annotated database, retaining the processed unigene sequence, clone/file history, alignments with similar sequences, and proposed functional classification, if available. Functional annotation is automatic and based on a novel method that relies on homology of amino acid sequence multiplicity within GenBank records. Records are examined through a function ordered browser or keyword queries with automated export of results. PipeOnline offers customization for individual projects (MyPipeOnline), automated updating and alert service. PipeOnline is available at http://stress-genomics.org.
Nucleotide sequences of two genomic DNAs encoding peroxidase of Arabidopsis thaliana.

PubMed

Intapruk, C; Higashimura, N; Yamamoto, K; Okada, N; Shinmyo, A; Takano, M

1991-02-15

The peroxidase (EC 1.11.1.7)-encoding gene of Arabidopsis thaliana was screened from a genomic library using a cDNA encoding a neutral isozyme of horseradish, Armoracia rusticana, peroxidase (HRP) as a probe, and two positive clones were isolated. From the comparison with the sequences of the HRP-encoding genes, we concluded that two clones contained peroxidase-encoding genes, and they were named prxCa and prxEa. Both genes consisted of four exons and three introns; the introns had consensus nucleotides, GT and AG, at the 5' and 3' ends, respectively. The lengths of each putative exon of the prxEa gene were the same as those of the HRP-basic-isozyme-encoding gene, prxC3, and coded for 349 amino acids (aa) with a sequence homology of 89% to that encoded by prxC3. The prxCa gene was very close to the HRP-neutral-isozyme-encoding gene, prxC1b, and coded for 354 aa with 91% homology to that encoded by prxC1b. The aa sequence homology was 64% between the two peroxidases encoded by prxCa and prxEa.
Confirmation of a novel siadenovirus species detected in raptors: partial sequence and phylogenetic analysis.

PubMed

Kovács, Endre R; Benko, Mária

2009-03-01

Partial genome characterisation of a novel adenovirus, found recently in organ samples of multiple species of dead birds of prey, was carried out by sequence analysis of PCR-amplified DNA fragments. The virus, named as raptor adenovirus 1 (RAdV-1), has originally been detected by a nested PCR method with consensus primers targeting the adenoviral DNA polymerase gene. Phylogenetic analysis with the deduced amino acid sequence of the small PCR product has implied a new siadenovirus type present in the samples. Since virus isolation attempts remained unsuccessful, further characterisation of this putative novel siadenovirus was carried out with the use of PCR on the infected organ samples. The DNA sequence of the central genome part of RAdV-1, encompassing nine full (pTP, 52K, pIIIa, III, pVII, pX, pVI, hexon, protease) and two partial (DNA polymerase and DBP) genes and exceeding 12 kb pairs in size, was determined. Phylogenetic tree reconstructions, based on several genes, unambiguously confirmed the preliminary classification of RAdV-1 as a new species within the genus Siadenovirus. Further study of RAdV-1 is of interest since it represents a rare adenovirus genus of yet undetermined host origin.
Genomic Structure of the Luciferase Gene from the Bioluminescent Beetle, Nyctophila cf. Caucasica

PubMed Central

Day, John C.; Chaichi, Mohammad J.; Najafil, Iraj; Whiteley, Andrew S.

2006-01-01

The gene coding for beetle luciferase, the enzyme responsible for bioluminescence in over two thousand coleopteran species has, to date, only been characterized from one Palearctic species of Lampyridae. Here we report the characterization of the luciferase gene from a female beetle of an Iranian lampyrid species, Nyctophila cf. caucasica (Coleoptera:Lampyridae). The luciferase gene was composed of seven exons, coding for 547 amino acids, separated by six introns spanning 1976 bp of genomic DNA. The deduced amino acid sequences of the luciferase gene of N. caucasica showed 98.9% homology to that of the Palearctic species Lampyris noctiluca. Analysis of the 810 bp upstream region of the luciferase gene revealed three TATA boxes and several other consensus transcriptional factor recognition sequences presenting evidence for a putative core promoter region conserved in Lampyrinae from -190 through to -155 upstream of the luciferase start codon. Along with the core promoter region the luciferase gene was compared with orthologous sequences from other lampyrid species and found to have greatest identity to Lampyris turkistanicus and Lampyris noctiluca. The significant sequence identity to the former is discussed in relation to taxonomic issues of Iranian lampyrids. PMID:20298115
Detection of the CLOCK/BMAL1 heterodimer using a nucleic acid probe with cycling probe technology.

PubMed

Nakagawa, Kazuhiro; Yamamoto, Takuro; Yasuda, Akio

2010-09-15

An isothermal signal amplification technique for specific DNA sequences, known as cycling probe technology (CPT), has enabled rapid acquisition of genomic information. Here we report an analogous technique for the detection of an activated transcription factor, a transcription element-binding assay with fluorescent amplification by apurinic/apyrimidinic (AP) site lysis cycle (TEFAL). This simple amplification assay can detect activated transcription factors by using a unique nucleic acid probe containing a consensus binding sequence and an AP site, which enables the CPT reaction with AP endonuclease. In this article, we demonstrate that this method detects the functional CLOCK/BMAL1 heterodimer via the TEFAL probe containing the E-box consensus sequence to which the CLOCK/BMAL1 heterodimer binds. Using TEFAL combined with immunoassays, we measured oscillations in the amount of CLOCK/BMAL1 heterodimer in serum-stimulated HeLa cells. Furthermore, we succeeded in measuring the circadian accumulation of the functional CLOCK/BMAL1 heterodimer in human buccal mucosa cells. TEFAL contributes greatly to the study of transcription factor activation in mammalian tissues and cell extracts and is a powerful tool for less invasive investigation of human circadian rhythms. 2010 Elsevier Inc. All rights reserved.
Transcriptome sequencing of lentil based on second-generation technology permits large-scale unigene assembly and SSR marker discovery.

PubMed

Kaur, Sukhjiwan; Cogan, Noel O I; Pembleton, Luke W; Shinozuka, Maiko; Savin, Keith W; Materne, Michael; Forster, John W

2011-05-25

Lentil (Lens culinaris Medik.) is a cool-season grain legume which provides a rich source of protein for human consumption. In terms of genomic resources, lentil is relatively underdeveloped, in comparison to other Fabaceae species, with limited available data. There is hence a significant need to enhance such resources in order to identify novel genes and alleles for molecular breeding to increase crop productivity and quality. Tissue-specific cDNA samples from six distinct lentil genotypes were sequenced using Roche 454 GS-FLX Titanium technology, generating c. 1.38 × 106 expressed sequence tags (ESTs). De novo assembly generated a total of 15,354 contigs and 68,715 singletons. The complete unigene set was sequence-analysed against genome drafts of the model legume species Medicago truncatula and Arabidopsis thaliana to identify 12,639, and 7,476 unique matches, respectively. When compared to the genome of Glycine max, a total of 20,419 unique hits were observed corresponding to c. 31% of the known gene space. A total of 25,592 lentil unigenes were subsequently annoated from GenBank. Simple sequence repeat (SSR)-containing ESTs were identified from consensus sequences and a total of 2,393 primer pairs were designed. A subset of 192 EST-SSR markers was screened for validation across a panel 12 cultivated lentil genotypes and one wild relative species. A total of 166 primer pairs obtained successful amplification, of which 47.5% detected genetic polymorphism. A substantial collection of ESTs has been developed from sequence analysis of lentil genotypes using second-generation technology, permitting unigene definition across a broad range of functional categories. As well as providing resources for functional genomics studies, the unigene set has permitted significant enhancement of the number of publicly-available molecular genetic markers as tools for improvement of this species.
A new fast method for inferring multiple consensus trees using k-medoids.

PubMed

Tahiri, Nadia; Willems, Matthieu; Makarenkov, Vladimir

2018-04-05

Gene trees carry important information about specific evolutionary patterns which characterize the evolution of the corresponding gene families. However, a reliable species consensus tree cannot be inferred from a multiple sequence alignment of a single gene family or from the concatenation of alignments corresponding to gene families having different evolutionary histories. These evolutionary histories can be quite different due to horizontal transfer events or to ancient gene duplications which cause the emergence of paralogs within a genome. Many methods have been proposed to infer a single consensus tree from a collection of gene trees. Still, the application of these tree merging methods can lead to the loss of specific evolutionary patterns which characterize some gene families or some groups of gene families. Thus, the problem of inferring multiple consensus trees from a given set of gene trees becomes relevant. We describe a new fast method for inferring multiple consensus trees from a given set of phylogenetic trees (i.e. additive trees or X-trees) defined on the same set of species (i.e. objects or taxa). The traditional consensus approach yields a single consensus tree. We use the popular k-medoids partitioning algorithm to divide a given set of trees into several clusters of trees. We propose novel versions of the well-known Silhouette and Caliński-Harabasz cluster validity indices that are adapted for tree clustering with k-medoids. The efficiency of the new method was assessed using both synthetic and real data, such as a well-known phylogenetic dataset consisting of 47 gene trees inferred for 14 archaeal organisms. The method described here allows inference of multiple consensus trees from a given set of gene trees. It can be used to identify groups of gene trees having similar intragroup and different intergroup evolutionary histories. The main advantage of our method is that it is much faster than the existing tree clustering approaches, while providing similar or better clustering results in most cases. This makes it particularly well suited for the analysis of large genomic and phylogenetic datasets.

IMGMD: A platform for the integration and standardisation of In silico Microbial Genome-scale Metabolic Models.

PubMed

Ye, Chao; Xu, Nan; Dong, Chuan; Ye, Yuannong; Zou, Xuan; Chen, Xiulai; Guo, Fengbiao; Liu, Liming

2017-04-07

Genome-scale metabolic models (GSMMs) constitute a platform that combines genome sequences and detailed biochemical information to quantify microbial physiology at the system level. To improve the unity, integrity, correctness, and format of data in published GSMMs, a consensus IMGMD database was built in the LAMP (Linux + Apache + MySQL + PHP) system by integrating and standardizing 328 GSMMs constructed for 139 microorganisms. The IMGMD database can help microbial researchers download manually curated GSMMs, rapidly reconstruct standard GSMMs, design pathways, and identify metabolic targets for strategies on strain improvement. Moreover, the IMGMD database facilitates the integration of wet-lab and in silico data to gain an additional insight into microbial physiology. The IMGMD database is freely available, without any registration requirements, at http://imgmd.jiangnan.edu.cn/database.
Bacterial whole genome-based phylogeny: construction of a new benchmarking dataset and assessment of some existing methods.

PubMed

Ahrenfeldt, Johanne; Skaarup, Carina; Hasman, Henrik; Pedersen, Anders Gorm; Aarestrup, Frank Møller; Lund, Ole

2017-01-05

Whole genome sequencing (WGS) is increasingly used in diagnostics and surveillance of infectious diseases. A major application for WGS is to use the data for identifying outbreak clusters, and there is therefore a need for methods that can accurately and efficiently infer phylogenies from sequencing reads. In the present study we describe a new dataset that we have created for the purpose of benchmarking such WGS-based methods for epidemiological data, and also present an analysis where we use the data to compare the performance of some current methods. Our aim was to create a benchmark data set that mimics sequencing data of the sort that might be collected during an outbreak of an infectious disease. This was achieved by letting an E. coli hypermutator strain grow in the lab for 8 consecutive days, each day splitting the culture in two while also collecting samples for sequencing. The result is a data set consisting of 101 whole genome sequences with known phylogenetic relationship. Among the sequenced samples 51 correspond to internal nodes in the phylogeny because they are ancestral, while the remaining 50 correspond to leaves. We also used the newly created data set to compare three different online available methods that infer phylogenies from whole-genome sequencing reads: NDtree, CSI Phylogeny and REALPHY. One complication when comparing the output of these methods with the known phylogeny is that phylogenetic methods typically build trees where all observed sequences are placed as leafs, even though some of them are in fact ancestral. We therefore devised a method for post processing the inferred trees by collapsing short branches (thus relocating some leafs to internal nodes), and also present two new measures of tree similarity that takes into account the identity of both internal and leaf nodes. Based on this analysis we find that, among the investigated methods, CSI Phylogeny had the best performance, correctly identifying 73% of all branches in the tree and 71% of all clades. We have made all data from this experiment (raw sequencing reads, consensus whole-genome sequences, as well as descriptions of the known phylogeny in a variety of formats) publicly available, with the hope that other groups may find this data useful for benchmarking and exploring the performance of epidemiological methods. All data is freely available at: https://cge.cbs.dtu.dk/services/evolution_data.php .
Deletion endpoint allele-specificity in the developmentally regulated elimination of an internal sequence (IES) in Paramecium.

PubMed Central

Dubrana, K; Le Mouël, A; Amar, L

1997-01-01

Ciliated protozoa undergo thousands of site-specific DNA deletion events during the programmed development of micronuclear genomes to macronuclear genomes. Two deletion elements, W1 and W2, were identified in the Paramecium primaurelia wild-type 156 strain. Here, we report the characterization of both elements in wild-type strain 168 and show that they display variant deletion patterns when compared with those of strain 156. The W1 ( 168 ) element is defective for deletion. The W2 ( 168 ) element is excised utilizing two alternative boundaries on one side, both are different from the boundary utilized to excise the W2156 element. By crossing the 156 and 168 strains, we demonstrate that the definition of all deletion endpoints are each controlled by cis -acting determinant(s) rather than by strain-specific trans-acting factor(s). Sequence comparison of all deleted DNA segments indicates that the 5'-TA-3'terminal sequence is strictly required at their ends. Furthermore the identity of the first eight base pairs of these ends to a previously established consensus sequence correlates with the frequency of the corresponding deletion events. Our data implies the existence of an adaptive convergent evolution of these Paramecium deleted DNA segment end sequences. PMID:9171098
Sequence- and Structure-Based Functional Annotation and Assessment of Metabolic Transporters in Aspergillus oryzae: A Representative Case Study

PubMed Central

Raethong, Nachon; Wong-ekkabut, Jirasak; Laoteng, Kobkul; Vongsangnak, Wanwipa

2016-01-01

Aspergillus oryzae is widely used for the industrial production of enzymes. In A. oryzae metabolism, transporters appear to play crucial roles in controlling the flux of molecules for energy generation, nutrients delivery, and waste elimination in the cell. While the A. oryzae genome sequence is available, transporter annotation remains limited and thus the connectivity of metabolic networks is incomplete. In this study, we developed a metabolic annotation strategy to understand the relationship between the sequence, structure, and function for annotation of A. oryzae metabolic transporters. Sequence-based analysis with manual curation showed that 58 genes of 12,096 total genes in the A. oryzae genome encoded metabolic transporters. Under consensus integrative databases, 55 unambiguous metabolic transporter genes were distributed into channels and pores (7 genes), electrochemical potential-driven transporters (33 genes), and primary active transporters (15 genes). To reveal the transporter functional role, a combination of homology modeling and molecular dynamics simulation was implemented to assess the relationship between sequence to structure and structure to function. As in the energy metabolism of A. oryzae, the H+-ATPase encoded by the AO090005000842 gene was selected as a representative case study of multilevel linkage annotation. Our developed strategy can be used for enhancing metabolic network reconstruction. PMID:27274991
Sequence- and Structure-Based Functional Annotation and Assessment of Metabolic Transporters in Aspergillus oryzae: A Representative Case Study.

PubMed

Raethong, Nachon; Wong-Ekkabut, Jirasak; Laoteng, Kobkul; Vongsangnak, Wanwipa

2016-01-01

Aspergillus oryzae is widely used for the industrial production of enzymes. In A. oryzae metabolism, transporters appear to play crucial roles in controlling the flux of molecules for energy generation, nutrients delivery, and waste elimination in the cell. While the A. oryzae genome sequence is available, transporter annotation remains limited and thus the connectivity of metabolic networks is incomplete. In this study, we developed a metabolic annotation strategy to understand the relationship between the sequence, structure, and function for annotation of A. oryzae metabolic transporters. Sequence-based analysis with manual curation showed that 58 genes of 12,096 total genes in the A. oryzae genome encoded metabolic transporters. Under consensus integrative databases, 55 unambiguous metabolic transporter genes were distributed into channels and pores (7 genes), electrochemical potential-driven transporters (33 genes), and primary active transporters (15 genes). To reveal the transporter functional role, a combination of homology modeling and molecular dynamics simulation was implemented to assess the relationship between sequence to structure and structure to function. As in the energy metabolism of A. oryzae, the H(+)-ATPase encoded by the AO090005000842 gene was selected as a representative case study of multilevel linkage annotation. Our developed strategy can be used for enhancing metabolic network reconstruction.
Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments

PubMed Central

Haas, Brian J; Salzberg, Steven L; Zhu, Wei; Pertea, Mihaela; Allen, Jonathan E; Orvis, Joshua; White, Owen; Buell, C Robin; Wortman, Jennifer R

2008-01-01

EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation. PMID:18190707
A microsatellite-based consensus linkage map for species of Eucalyptus and a novel set of 230 microsatellite markers for the genus

PubMed Central

Brondani, Rosana PV; Williams, Emlyn R; Brondani, Claudio; Grattapaglia, Dario

2006-01-01

Background Eucalypts are the most widely planted hardwood trees in the world occupying globally more than 18 million hectares as an important source of carbon neutral renewable energy and raw material for pulp, paper and solid wood. Quantitative Trait Loci (QTLs) in Eucalyptus have been localized on pedigree-specific RAPD or AFLP maps seriously limiting the value of such QTL mapping efforts for molecular breeding. The availability of a genus-wide genetic map with transferable microsatellite markers has become a must for the effective advancement of genomic undertakings. This report describes the development of a novel set of 230 EMBRA microsatellites, the construction of the first comprehensive microsatellite-based consensus linkage map for Eucalyptus and the consolidation of existing linkage information for other microsatellites and candidate genes mapped in other species of the genus. Results The consensus map covers ~90% of the recombining genome of Eucalyptus, involves 234 mapped EMBRA loci on 11 linkage groups, an observed length of 1,568 cM and a mean distance between markers of 8.4 cM. A compilation of all microsatellite linkage information published in Eucalyptus allowed us to establish the homology among linkage groups between this consensus map and other maps published for E. globulus. Comparative mapping analyses also resulted in the linkage group assignment of other 41 microsatellites derived from other Eucalyptus species as well as candidate genes and QTLs for wood and flowering traits published in the literature. This report significantly increases the availability of microsatellite markers and mapping information for species of Eucalyptus and corroborates the high conservation of microsatellite flanking sequences and locus ordering between species of the genus. Conclusion This work represents an important step forward for Eucalyptus comparative genomics, opening stimulating perspectives for evolutionary studies and molecular breeding applications. The generalized use of an increasingly larger set of interspecific transferable markers and consensus mapping information, will allow faster and more detailed investigations of QTL synteny among species, validation of expression-QTL across variable genetic backgrounds and positioning of a growing number of candidate genes co-localized with QTLs, to be tested in association mapping experiments. PMID:16995939
Metagenome assembly through clustering of next-generation sequencing data using protein sequences.

PubMed

Sim, Mikang; Kim, Jaebum

2015-02-01

The study of environmental microbial communities, called metagenomics, has gained a lot of attention because of the recent advances in next-generation sequencing (NGS) technologies. Microbes play a critical role in changing their environments, and the mode of their effect can be solved by investigating metagenomes. However, the difficulty of metagenomes, such as the combination of multiple microbes and different species abundance, makes metagenome assembly tasks more challenging. In this paper, we developed a new metagenome assembly method by utilizing protein sequences, in addition to the NGS read sequences. Our method (i) builds read clusters by using mapping information against available protein sequences, and (ii) creates contig sequences by finding consensus sequences through probabilistic choices from the read clusters. By using simulated NGS read sequences from real microbial genome sequences, we evaluated our method in comparison with four existing assembly programs. We found that our method could generate relatively long and accurate metagenome assemblies, indicating that the idea of using protein sequences, as a guide for the assembly, is promising. Copyright © 2015 Elsevier B.V. All rights reserved.
Evaluation of artificial selection in Standard Poodles using whole-genome sequencing.

PubMed

Friedenberg, Steven G; Meurs, Kathryn M; Mackay, Trudy F C

2016-12-01

Identifying regions of artificial selection within dog breeds may provide insights into genetic variation that underlies breed-specific traits or diseases-particularly if these traits or disease predispositions are fixed within a breed. In this study, we searched for runs of homozygosity (ROH) and calculated the d i statistic (which is based upon F ST ) to identify regions of artificial selection in Standard Poodles using high-coverage, whole-genome sequencing data of 15 Standard Poodles and 49 dogs across seven other breeds. We identified consensus ROH regions ≥1 Mb in length and common to at least ten Standard Poodles covering 0.6 % of the genome, and d i regions that most distinguish Standard Poodles from other breeds covering 3.7 % of the genome. Within these regions, we identified enriched gene pathways related to olfaction, digestion, and taste, as well as pathways related to adrenal hormone biosynthesis, T cell function, and protein ubiquitination that could contribute to the pathogenesis of some Poodle-prevalent autoimmune diseases. We also validated variants related to hair coat and skull morphology that have previously been identified as being under selective pressure in Poodles, and flagged additional polymorphisms in genes such as ITGA2B, CBX4, and TNXB that may represent strong candidates for other common Poodle disorders.
A high HIV-1 strain variability in London, UK, revealed by full-genome analysis: Results from the ICONIC project.

PubMed

Yebra, Gonzalo; Frampton, Dan; Gallo Cassarino, Tiziano; Raffle, Jade; Hubb, Jonathan; Ferns, R Bridget; Waters, Laura; Tong, C Y William; Kozlakidis, Zisis; Hayward, Andrew; Kellam, Paul; Pillay, Deenan; Clark, Duncan; Nastouli, Eleni; Leigh Brown, Andrew J

2018-01-01

The ICONIC project has developed an automated high-throughput pipeline to generate HIV nearly full-length genomes (NFLG, i.e. from gag to nef) from next-generation sequencing (NGS) data. The pipeline was applied to 420 HIV samples collected at University College London Hospitals NHS Trust and Barts Health NHS Trust (London) and sequenced using an Illumina MiSeq at the Wellcome Trust Sanger Institute (Cambridge). Consensus genomes were generated and subtyped using COMET, and unique recombinants were studied with jpHMM and SimPlot. Maximum-likelihood phylogenetic trees were constructed using RAxML to identify transmission networks using the Cluster Picker. The pipeline generated sequences of at least 1Kb of length (median = 7.46Kb, IQR = 4.01Kb) for 375 out of the 420 samples (89%), with 174 (46.4%) being NFLG. A total of 365 sequences (169 of them NFLG) corresponded to unique subjects and were included in the down-stream analyses. The most frequent HIV subtypes were B (n = 149, 40.8%) and C (n = 77, 21.1%) and the circulating recombinant form CRF02_AG (n = 32, 8.8%). We found 14 different CRFs (n = 66, 18.1%) and multiple URFs (n = 32, 8.8%) that involved recombination between 12 different subtypes/CRFs. The most frequent URFs were B/CRF01_AE (4 cases) and A1/D, B/C, and B/CRF02_AG (3 cases each). Most URFs (19/26, 73%) lacked breakpoints in the PR+RT pol region, rendering them undetectable if only that was sequenced. Twelve (37.5%) of the URFs could have emerged within the UK, whereas the rest were probably imported from sub-Saharan Africa, South East Asia and South America. For 2 URFs we found highly similar pol sequences circulating in the UK. We detected 31 phylogenetic clusters using the full dataset: 25 pairs (mostly subtypes B and C), 4 triplets and 2 quadruplets. Some of these were not consistent across different genes due to inter- and intra-subtype recombination. Clusters involved 70 sequences, 19.2% of the dataset. The initial analysis of genome sequences detected substantial hidden variability in the London HIV epidemic. Analysing full genome sequences, as opposed to only PR+RT, identified previously undetected recombinants. It provided a more reliable description of CRFs (that would be otherwise misclassified) and transmission clusters.
ESTree db: a Tool for Peach Functional Genomics

PubMed Central

Lazzari, Barbara; Caprera, Andrea; Vecchietti, Alberto; Stella, Alessandra; Milanesi, Luciano; Pozzi, Carlo

2005-01-01

Background The ESTree db represents a collection of Prunus persica expressed sequenced tags (ESTs) and is intended as a resource for peach functional genomics. A total of 6,155 successful EST sequences were obtained from four in-house prepared cDNA libraries from Prunus persica mesocarps at different developmental stages. Another 12,475 peach EST sequences were downloaded from public databases and added to the ESTree db. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts and data were collected in a MySQL database. A php-based web interface was developed to query the database. Results The ESTree db version as of April 2005 encompasses 18,630 sequences representing eight libraries. Contig assembly was performed with CAP3. Putative single nucleotide polymorphism (SNP) detection was performed with the AutoSNP program and a search engine was implemented to retrieve results. All the sequences and all the contig consensus sequences were annotated both with blastx against the GenBank nr db and with GOblet against the viridiplantae section of the Gene Ontology db. Links to NiceZyme (Expasy) and to the KEGG metabolic pathways were provided. A local BLAST utility is available. A text search utility allows querying and browsing the database. Statistics were provided on Gene Ontology occurrences to assign sequences to Gene Ontology categories. Conclusion The resulting database is a comprehensive resource of data and links related to peach EST sequences. The Sequence Report and Contig Report pages work as the web interface core structures, giving quick access to data related to each sequence/contig. PMID:16351742
ESTree db: a tool for peach functional genomics.

PubMed

Lazzari, Barbara; Caprera, Andrea; Vecchietti, Alberto; Stella, Alessandra; Milanesi, Luciano; Pozzi, Carlo

2005-12-01

The ESTree db http://www.itb.cnr.it/estree/ represents a collection of Prunus persica expressed sequenced tags (ESTs) and is intended as a resource for peach functional genomics. A total of 6,155 successful EST sequences were obtained from four in-house prepared cDNA libraries from Prunus persica mesocarps at different developmental stages. Another 12,475 peach EST sequences were downloaded from public databases and added to the ESTree db. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts and data were collected in a MySQL database. A php-based web interface was developed to query the database. The ESTree db version as of April 2005 encompasses 18,630 sequences representing eight libraries. Contig assembly was performed with CAP3. Putative single nucleotide polymorphism (SNP) detection was performed with the AutoSNP program and a search engine was implemented to retrieve results. All the sequences and all the contig consensus sequences were annotated both with blastx against the GenBank nr db and with GOblet against the viridiplantae section of the Gene Ontology db. Links to NiceZyme (Expasy) and to the KEGG metabolic pathways were provided. A local BLAST utility is available. A text search utility allows querying and browsing the database. Statistics were provided on Gene Ontology occurrences to assign sequences to Gene Ontology categories. The resulting database is a comprehensive resource of data and links related to peach EST sequences. The Sequence Report and Contig Report pages work as the web interface core structures, giving quick access to data related to each sequence/contig.
An Autotetraploid Linkage Map of Rose (Rosa hybrida) Validated Using the Strawberry (Fragaria vesca) Genome Sequence

PubMed Central

Gar, Oron; Sargent, Daniel J.; Tsai, Ching-Jung; Pleban, Tzili; Shalev, Gil; Byrne, David H.; Zamir, Dani

2011-01-01

Polyploidy is a pivotal process in plant evolution as it increase gene redundancy and morphological intricacy but due to the complexity of polysomic inheritance we have only few genetic maps of autopolyploid organisms. A robust mapping framework is particularly important in polyploid crop species, rose included (2n = 4x = 28), where the objective is to study multiallelic interactions that control traits of value for plant breeding. From a cross between the garden, peach red and fragrant cultivar Fragrant Cloud (FC) and a cut-rose yellow cultivar Golden Gate (GG), we generated an autotetraploid GGFC mapping population consisting of 132 individuals. For the map we used 128 sequence-based markers, 141 AFLP, 86 SSR and three morphological markers. Seven linkage groups were resolved for FC (Total 632 cM) and GG (616 cM) which were validated by markers that segregated in both parents as well as the diploid integrated consensus map. The release of the Fragaria vesca genome, which also belongs to the Rosoideae, allowed us to place 70 rose sequenced markers on the seven strawberry pseudo-chromosomes. Synteny between Rosa and Fragaria was high with an estimated four major translocations and six inversions required to place the 17 non-collinear markers in the same order. Based on a verified linear order of the rose markers, we could further partition each of the parents into its four homologous groups, thus providing an essential framework to aid the sequencing of an autotetraploid genome. PMID:21647382
An autotetraploid linkage map of rose (Rosa hybrida) validated using the strawberry (Fragaria vesca) genome sequence.

PubMed

Gar, Oron; Sargent, Daniel J; Tsai, Ching-Jung; Pleban, Tzili; Shalev, Gil; Byrne, David H; Zamir, Dani

2011-01-01

Polyploidy is a pivotal process in plant evolution as it increase gene redundancy and morphological intricacy but due to the complexity of polysomic inheritance we have only few genetic maps of autopolyploid organisms. A robust mapping framework is particularly important in polyploid crop species, rose included (2n = 4x = 28), where the objective is to study multiallelic interactions that control traits of value for plant breeding. From a cross between the garden, peach red and fragrant cultivar Fragrant Cloud (FC) and a cut-rose yellow cultivar Golden Gate (GG), we generated an autotetraploid GGFC mapping population consisting of 132 individuals. For the map we used 128 sequence-based markers, 141 AFLP, 86 SSR and three morphological markers. Seven linkage groups were resolved for FC (Total 632 cM) and GG (616 cM) which were validated by markers that segregated in both parents as well as the diploid integrated consensus map.The release of the Fragaria vesca genome, which also belongs to the Rosoideae, allowed us to place 70 rose sequenced markers on the seven strawberry pseudo-chromosomes. Synteny between Rosa and Fragaria was high with an estimated four major translocations and six inversions required to place the 17 non-collinear markers in the same order. Based on a verified linear order of the rose markers, we could further partition each of the parents into its four homologous groups, thus providing an essential framework to aid the sequencing of an autotetraploid genome.
Adeno-Associated Virus Type 2 Wild-Type and Vector-Mediated Genomic Integration Profiles of Human Diploid Fibroblasts Analyzed by Third-Generation PacBio DNA Sequencing

PubMed Central

Hüser, Daniela; Gogol-Döring, Andreas; Chen, Wei

2014-01-01

ABSTRACT Genome-wide analysis of adeno-associated virus (AAV) type 2 integration in HeLa cells has shown that wild-type AAV integrates at numerous genomic sites, including AAVS1 on chromosome 19q13.42. Multiple GAGY/C repeats, resembling consensus AAV Rep-binding sites are preferred, whereas rep-deficient AAV vectors (rAAV) regularly show a random integration profile. This study is the first study to analyze wild-type AAV integration in diploid human fibroblasts. Applying high-throughput third-generation PacBio-based DNA sequencing, integration profiles of wild-type AAV and rAAV are compared side by side. Bioinformatic analysis reveals that both wild-type AAV and rAAV prefer open chromatin regions. Although genomic features of AAV integration largely reproduce previous findings, the pattern of integration hot spots differs from that described in HeLa cells before. DNase-Seq data for human fibroblasts and for HeLa cells reveal variant chromatin accessibility at preferred AAV integration hot spots that correlates with variant hot spot preferences. DNase-Seq patterns of these sites in human tissues, including liver, muscle, heart, brain, skin, and embryonic stem cells further underline variant chromatin accessibility. In summary, AAV integration is dependent on cell-type-specific, variant chromatin accessibility leading to random integration profiles for rAAV, whereas wild-type AAV integration sites cluster near GAGY/C repeats. IMPORTANCE Adeno-associated virus type 2 (AAV) is assumed to establish latency by chromosomal integration of its DNA. This is the first genome-wide analysis of wild-type AAV2 integration in diploid human cells and the first to compare wild-type to recombinant AAV vector integration side by side under identical experimental conditions. Major determinants of wild-type AAV integration represent open chromatin regions with accessible consensus AAV Rep-binding sites. The variant chromatin accessibility of different human tissues or cell types will have impact on vector targeting to be considered during gene therapy. PMID:25031342
Comprehensive Genomic Characterization of Upper Tract Urothelial Carcinoma.

PubMed

Moss, Tyler J; Qi, Yuan; Xi, Liu; Peng, Bo; Kim, Tae-Beom; Ezzedine, Nader E; Mosqueda, Maribel E; Guo, Charles C; Czerniak, Bogdan A; Ittmann, Michael; Wheeler, David A; Lerner, Seth P; Matin, Surena F

2017-10-01

Upper urinary tract urothelial cancer (UTUC) may have unique etiologic and genomic factors compared to bladder cancer. To characterize the genomic landscape of UTUC and provide insights into its biology using comprehensive integrated genomic analyses. We collected 31 untreated snap-frozen UTUC samples from two institutions and carried out whole-exome sequencing (WES) of DNA, RNA sequencing (RNAseq), and protein analysis. Adjusting for batch effects, consensus mutation calls from independent pipelines identified DNA mutations, gene expression clusters using unsupervised consensus hierarchical clustering (UCHC), and protein expression levels that were correlated with relevant clinical variables, The Cancer Genome Atlas, and other published data. WES identified mutations in FGFR3 (74.1%; 92% low-grade, 60% high-grade), KMT2D (44.4%), PIK3CA (25.9%), and TP53 (22.2%). APOBEC and CpG were the most common mutational signatures. UCHC of RNAseq data segregated samples into four molecular subtypes with the following characteristics. Cluster 1: no PIK3CA mutations, nonsmokers, high-grade
Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome.

PubMed

Bickhart, Derek M; Rosen, Benjamin D; Koren, Sergey; Sayre, Brian L; Hastie, Alex R; Chan, Saki; Lee, Joyce; Lam, Ernest T; Liachko, Ivan; Sullivan, Shawn T; Burton, Joshua N; Huson, Heather J; Nystrom, John C; Kelley, Christy M; Hutchison, Jana L; Zhou, Yang; Sun, Jiajie; Crisà, Alessandra; Ponce de León, F Abel; Schwartz, John C; Hammond, John A; Waldbieser, Geoffrey C; Schroeder, Steven G; Liu, George E; Dunham, Maitreya J; Shendure, Jay; Sonstegard, Tad S; Phillippy, Adam M; Van Tassell, Curtis P; Smith, Timothy P L

2017-04-01

The decrease in sequencing cost and increased sophistication of assembly algorithms for short-read platforms has resulted in a sharp increase in the number of species with genome assemblies. However, these assemblies are highly fragmented, with many gaps, ambiguities, and errors, impeding downstream applications. We demonstrate current state of the art for de novo assembly using the domestic goat (Capra hircus) based on long reads for contig formation, short reads for consensus validation, and scaffolding by optical and chromatin interaction mapping. These combined technologies produced what is, to our knowledge, the most continuous de novo mammalian assembly to date, with chromosome-length scaffolds and only 649 gaps. Our assembly represents a ∼400-fold improvement in continuity due to properly assembled gaps, compared to the previously published C. hircus assembly, and better resolves repetitive structures longer than 1 kb, representing the largest repeat family and immune gene complex yet produced for an individual of a ruminant species.
Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome

PubMed Central

Bickhart, Derek M.; Rosen, Benjamin D.; Koren, Sergey; Sayre, Brian L.; Hastie, Alex R.; Chan, Saki; Lee, Joyce; Lam, Ernest T.; Liachko, Ivan; Sullivan, Shawn T.; Burton, Joshua N.; Huson, Heather J.; Nystrom, John C.; Kelley, Christy M.; Hutchison, Jana L.; Zhou, Yang; Sun, Jiajie; Crisà, Alessandra; de León, F. Abel Ponce; Schwartz, John C.; Hammond, John A.; Waldbieser, Geoffrey C.; Schroeder, Steven G.; Liu, George E.; Dunham, Maitreya J.; Shendure, Jay; Sonstegard, Tad S.; Phillippy, Adam M.; Van Tassell, Curtis P.; Smith, Timothy P.L.

2018-01-01

The decrease in sequencing cost and increased sophistication of assembly algorithms for short-read platforms has resulted in a sharp increase in the number of species with genome assemblies. However, these assemblies are highly fragmented, with many gaps, ambiguities, and errors, impeding downstream applications. We demonstrate current state of the art for de novo assembly using the domestic goat (Capra hircus), based on long reads for contig formation, short reads for consensus validation, and scaffolding by optical and chromatin interaction mapping. These combined technologies produced the most continuous de novo mammalian assembly to date, with chromosome-length scaffolds and only 649 gaps. Our assembly represents a ~400-fold improvement in continuity due to properly assembled gaps compared to the previously published C. hircus assembly, and better resolves repetitive structures longer than 1 kb, representing the largest repeat family and immune gene complex ever produced for an individual of a ruminant species. PMID:28263316
tRNomics: analysis of tRNA genes from 50 genomes of Eukarya, Archaea, and Bacteria reveals anticodon-sparing strategies and domain-specific features.

PubMed Central

Marck, Christian; Grosjean, Henri

2002-01-01

From 50 genomes of the three domains of life (7 eukarya, 13 archaea, and 30 bacteria), we extracted, analyzed, and compared over 4,000 sequences corresponding to cytoplasmic, nonorganellar tRNAs. For each genome, the complete set of tRNAs required to read the 61 sense codons was identified, which permitted revelation of three major anticodon-sparing strategies. Other features and sequence peculiarities analyzed are the following: (1) fit to the standard cloverleaf structure, (2) characteristic consensus sequences for elongator and initiator tDNAs, (3) frequencies of bases at each sequence position, (4) type and frequencies of conserved 2D and 3D base pairs, (5) anticodon/tDNA usages and anticodon-sparing strategies, (6) identification of the tRNA-Ile with anticodon CAU reading AUA, (7) size of variable arm, (8) occurrence and location of introns, (9) occurrence of 3'-CCA and 5'-extra G encoded at the tDNA level, and (10) distribution of the tRNA genes in genomes and their mode of transcription. Among all tRNA isoacceptors, we found that initiator tDNA-iMet is the most conserved across the three domains, yet domain-specific signatures exist. Also, according to which tRNA feature is considered (5'-extra G encoded in tDNAs-His, AUA codon read by tRNA-Ile with anticodon CAU, presence of intron, absence of "two-out-of-three" reading mode and short V-arm in tDNA-Tyr) Archaea sequester either with Bacteria or Eukarya. No common features between Eukarya and Bacteria not shared with Archaea could be unveiled. Thus, from the tRNomic point of view, Archaea appears as an "intermediate domain" between Eukarya and Bacteria. PMID:12403461
Genome resources for climate-resilient cowpea, an essential crop for food security.

PubMed

Muñoz-Amatriaín, María; Mirebrahim, Hamid; Xu, Pei; Wanamaker, Steve I; Luo, MingCheng; Alhakami, Hind; Alpert, Matthew; Atokple, Ibrahim; Batieno, Benoit J; Boukar, Ousmane; Bozdag, Serdar; Cisse, Ndiaga; Drabo, Issa; Ehlers, Jeffrey D; Farmer, Andrew; Fatokun, Christian; Gu, Yong Q; Guo, Yi-Ning; Huynh, Bao-Lam; Jackson, Scott A; Kusi, Francis; Lawley, Cynthia T; Lucas, Mitchell R; Ma, Yaqin; Timko, Michael P; Wu, Jiajie; You, Frank; Barkley, Noelle A; Roberts, Philip A; Lonardi, Stefano; Close, Timothy J

2017-03-01

Cowpea (Vigna unguiculata L. Walp.) is a legume crop that is resilient to hot and drought-prone climates, and a primary source of protein in sub-Saharan Africa and other parts of the developing world. However, genome resources for cowpea have lagged behind most other major crops. Here we describe foundational genome resources and their application to the analysis of germplasm currently in use in West African breeding programs. Resources developed from the African cultivar IT97K-499-35 include a whole-genome shotgun (WGS) assembly, a bacterial artificial chromosome (BAC) physical map, and assembled sequences from 4355 BACs. These resources and WGS sequences of an additional 36 diverse cowpea accessions supported the development of a genotyping assay for 51 128 SNPs, which was then applied to five bi-parental RIL populations to produce a consensus genetic map containing 37 372 SNPs. This genetic map enabled the anchoring of 100 Mb of WGS and 420 Mb of BAC sequences, an exploration of genetic diversity along each linkage group, and clarification of macrosynteny between cowpea and common bean. The SNP assay enabled a diversity analysis of materials from West African breeding programs. Two major subpopulations exist within those materials, one of which has significant parentage from South and East Africa and more diversity. There are genomic regions of high differentiation between subpopulations, one of which coincides with a cluster of nodulin genes. The new resources and knowledge help to define goals and accelerate the breeding of improved varieties to address food security issues related to limited-input small-holder farming and climate stress. © 2016 The Authors. The Plant Journal published by John Wiley & Sons Ltd and Society for Experimental Biology.

Comparison of Two Capillary Gel Electrophoresis Systems for Clostridium difficile Ribotyping, Using a Panel of Ribotype 027 Isolates and Whole-Genome Sequences as a Reference Standard

PubMed Central

Xiao, Meng; Kong, Fanrong; Jin, Ping; Wang, Qinning; Xiao, Kelin; Jeoffreys, Neisha; James, Gregory

2012-01-01

PCR ribotyping is the most commonly used Clostridium difficile genotyping method, but its utility is limited by lack of standardization. In this study, we analyzed four published whole genomes and tested an international collection of 21 well-characterized C. difficile ribotype 027 isolates as the basis for comparison of two capillary gel electrophoresis (CGE)-based ribotyping methods. There were unexpected differences between the 16S-23S rRNA intergenic spacer region (ISR) allelic profiles of the four ribotype 027 genomes, but six bands were identified in all four and a seventh in three genomes. All seven bands and another, not identified in any of the whole genomes, were found in all 21 isolates. We compared sequencer-based CGE (SCGE) with three different primer pairs to the Qiagen QIAxcel CGE (QCGE) platform. Deviations from individual reference/consensus band sizes were smaller for SCGE (0 to 0.2 bp) than for QCGE (4.2 to 9.5 bp). Compared with QCGE, SCGE more readily distinguished bands of similar length (more discriminatory), detected bands of larger size and lower intensity (more sensitive), and assigned band sizes more accurately and reproducibly, making it more suitable for standardization. Specifically, QCGE failed to identify the largest ISR amplicon. Based on several criteria, we recommend the primer set 16S-USA/23S-USA for use in a proposed standard SCGE method. Similar differences between SCGE and QCGE were found on testing of 14 isolates of four other C. difficile ribotypes. Based on our results, ISR profiles based on accurate sequencer-based band lengths would be preferable to agarose gel-based banding patterns for the assignment of ribotypes. PMID:22692737
An Expressed Sequence Tag (EST)-enriched genetic map of turbot (Scophthalmus maximus): a useful framework for comparative genomics across model and farmed teleosts

PubMed Central

2012-01-01

Background The turbot (Scophthalmus maximus) is a relevant species in European aquaculture. The small turbot genome provides a source for genomics strategies to use in order to understand the genetic basis of productive traits, particularly those related to sex, growth and pathogen resistance. Genetic maps represent essential genomic screening tools allowing to localize quantitative trait loci (QTL) and to identify candidate genes through comparative mapping. This information is the backbone to develop marker-assisted selection (MAS) programs in aquaculture. Expressed sequenced tag (EST) resources have largely increased in turbot, thus supplying numerous type I markers suitable for extending the previous linkage map, which was mostly based on anonymous loci. The aim of this study was to construct a higher-resolution turbot genetic map using EST-linked markers, which will turn out to be useful for comparative mapping studies. Results A consensus gene-enriched genetic map of the turbot was constructed using 463 SNP and microsatellite markers in nine reference families. This map contains 438 markers, 180 EST-linked, clustered at 24 linkage groups. Linkage and comparative genomics evidences suggested additional linkage group fusions toward the consolidation of turbot map according to karyotype information. The linkage map showed a total length of 1402.7 cM with low average intermarker distance (3.7 cM; ~2 Mb). A global 1.6:1 female-to-male recombination frequency (RF) ratio was observed, although largely variable among linkage groups and chromosome regions. Comparative sequence analysis revealed large macrosyntenic patterns against model teleost genomes, significant hits decreasing from stickleback (54%) to zebrafish (20%). Comparative mapping supported particular chromosome rearrangements within Acanthopterygii and aided to assign unallocated markers to specific turbot linkage groups. Conclusions The new gene-enriched high-resolution turbot map represents a useful genomic tool for QTL identification, positional cloning strategies, and future genome assembling. This map showed large synteny conservation against model teleost genomes. Comparative genomics and data mining from landmarks will provide straightforward access to candidate genes, which will be the basis for genetic breeding programs and evolutionary studies in this species. PMID:22747677
The yeast genome may harbor hypoxia response elements (HRE).

PubMed

Ferreira, Túlio César; Hertzberg, Libi; Gassmann, Max; Campos, Elida Geralda

2007-01-01

The hypoxia-inducible factor-1 (HIF-1) is a heterodimeric transcription factor activated when cells are submitted to hypoxia. The heterodimer is composed of two subunits, HIF-1alpha and the constitutively expressed HIF-1beta. During normoxia, HIF-1alpha is degraded by the 26S proteasome, but hypoxia causes HIF-1alpha to be stabilized, enter the nucleus and bind to HIF-1beta, thus forming the active complex. The complex then binds to the regulatory sequences of various genes involved in physiological and pathological processes. The specific regulatory sequence recognized by HIF-1 is the hypoxia response element (HRE) that has the consensus sequence 5'BRCGTGVBBB3'. Although the basic transcriptional regulation machinery is conserved between yeast and mammals, Saccharomyces cerevisiae does not express HIF-1 subunits. However, we hypothesized that baker's yeast has a protein analogous to HIF-1 which participates in the response to changes in oxygen levels by binding to HRE sequences. In this study we screened the yeast genome for HREs using probabilistic motif search tools. We described 24 yeast genes containing motifs with high probability of being HREs (p-value<0.1) and classified them according to biological function. Our results show that S. cerevisiae may harbor HREs and indicate that a transcription factor analogous to HIF-1 may exist in this organism.
The complete sequence and promoter activity of the human A-raf-1 gene (ARAF1)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lee, J.E.; Beck, T.W.; Brennscheidt, U.

1994-03-01

The raf proto-oncogenes encode cytoplasmic protein serine/threonine kinases, which play a critical role in cell growth and development. One of these, A-raf-1 (human gene symbol, ARAF1), which is predominantly expressed in mouse urogenital tissues, has been mapped to an evolutionarily conserved linkage group composed of ARAF1, SYN1, TIMP, and properdin located at human chromosome Xp11.2. The authors have isolated human genomic DNA clones containing the expressed gene (ARAF1) on the X chromosome and a pseudogene (ARAF2) on chromosome 7p12-q11.21. Analysis of the nucleotide sequence from the ARAF1 genomic clones demonstrated that it consists of 16 exons encoded by minimally 10,776more » nucleotides. The major transcriptional start site (+1) was determined by RNase protection and primer extension assays. Promoter activity was confirmed by functional assays using DNA fragments fused to a CAT reporter gene. The ARAF1 minimal promoter, located between nucleotides -59 and +93, has a low G + C content and lacks consensus TATA and Inr sequences but shows sequence similarity at position -1 to the E box that is known to interact with USF and TFII-I transcription factors. 65 refs., 7 figs., 1 tab.« less
Environmental surveillance of viruses by tangential flow filtration and metagenomic reconstruction.

PubMed

Furtak, Vyacheslav; Roivainen, Merja; Mirochnichenko, Olga; Zagorodnyaya, Tatiana; Laassri, Majid; Zaidi, Sohail Z; Rehman, Lubna; Alam, Muhammad M; Chizhikov, Vladimir; Chumakov, Konstantin

2016-04-14

An approach is proposed for environmental surveillance of poliovirus by concentrating sewage samples with tangential flow filtration (TFF) followed by deep sequencing of viral RNA. Subsequent to testing the method with samples from Finland, samples from Pakistan, a country endemic for poliovirus, were investigated. Genomic sequencing was either performed directly, for unbiased identification of viruses regardless of their ability to grow in cell cultures, or after virus enrichment by cell culture or immunoprecipitation. Bioinformatics enabled separation and determination of individual consensus sequences. Overall, deep sequencing of the entire viral population identified polioviruses, non-polio enteroviruses, and other viruses. In Pakistani sewage samples, adeno-associated virus, unable to replicate autonomously in cell cultures, was the most abundant human virus. The presence of recombinants of wild polioviruses of serotype 1 (WPV1) was also inferred, whereby currently circulating WPV1 of south-Asian (SOAS) lineage comprised two sub-lineages depending on their non-capsid region origin. Complete genome analyses additionally identified point mutants and intertypic recombinants between attenuated Sabin strains in the Pakistani samples, and in one Finnish sample. The approach could allow rapid environmental surveillance of viruses causing human infections. It creates a permanent digital repository of the entire virome potentially useful for retrospective screening of future discovered viruses.
Brain cDNA clone for human cholinesterase

DOE Office of Scientific and Technical Information (OSTI.GOV)

McTiernan, C.; Adkins, S.; Chatonnet, A.

1987-10-01

A cDNA library from human basal ganglia was screened with oligonucleotide probes corresponding to portions of the amino acid sequence of human serum cholinesterase. Five overlapping clones, representing 2.4 kilobases, were isolated. The sequenced cDNA contained 207 base pairs of coding sequence 5' to the amino terminus of the mature protein in which there were four ATG translation start sites in the same reading frame as the protein. Only the ATG coding for Met-(-28) lay within a favorable consensus sequence for functional initiators. There were 1722 base pairs of coding sequence corresponding to the protein found circulating in human serum.more » The amino acid sequence deduced from the cDNA exactly matched the 574 amino acid sequence of human serum cholinesterase, as previously determined by Edman degradation. Therefore, our clones represented cholinesterase rather than acetylcholinesterase. It was concluded that the amino acid sequences of cholinesterase from two different tissues, human brain and human serum, were identical. Hybridization of genomic DNA blots suggested that a single gene, or very few genes coded for cholinesterase.« less
ApiEST-DB: analyzing clustered EST data of the apicomplexan parasites.

PubMed

Li, Li; Crabtree, Jonathan; Fischer, Steve; Pinney, Deborah; Stoeckert, Christian J; Sibley, L David; Roos, David S

2004-01-01

ApiEST-DB (http://www.cbil.upenn.edu/paradbs-servlet/) provides integrated access to publicly available EST data from protozoan parasites in the phylum Apicomplexa. The database currently incorporates a total of nearly 100,000 ESTs from several parasite species of clinical and/or veterinary interest, including Eimeria tenella, Neospora caninum, Plasmodium falciparum, Sarcocystis neurona and Toxoplasma gondii. To facilitate analysis of these data, EST sequences were clustered and assembled to form consensus sequences for each organism, and these assemblies were then subjected to automated annotation via similarity searches against protein and domain databases. The underlying relational database infrastructure, Genomics Unified Schema (GUS), enables complex biologically based queries, facilitating validation of gene models, identification of alternative splicing, detection of single nucleotide polymorphisms, identification of stage-specific genes and recognition of phylogenetically conserved and phylogenetically restricted sequences.
ConsPred: a rule-based (re-)annotation framework for prokaryotic genomes.

PubMed

Weinmaier, Thomas; Platzer, Alexander; Frank, Jeroen; Hellinger, Hans-Jörg; Tischler, Patrick; Rattei, Thomas

2016-11-01

The rapidly growing number of available prokaryotic genome sequences requires fully automated and high-quality software solutions for their initial and re-annotation. Here we present ConsPred, a prokaryotic genome annotation framework that performs intrinsic gene predictions, homology searches, predictions of non-coding genes as well as CRISPR repeats and integrates all evidence into a consensus annotation. ConsPred achieves comprehensive, high-quality annotations based on rules and priorities, similar to decision-making in manual curation and avoids conflicting predictions. Parameters controlling the annotation process are configurable by the user. ConsPred has been used in the institutions of the authors for longer than 5 years and can easily be extended and adapted to specific needs. The ConsPred algorithm for producing a consensus from the varying scores of multiple gene prediction programs approaches manual curation in accuracy. Its rule-based approach for choosing final predictions avoids overriding previous manual curations. ConsPred is implemented in Java, Perl and Shell and is freely available under the Creative Commons license as a stand-alone in-house pipeline or as an Amazon Machine Image for cloud computing, see https://sourceforge.net/projects/conspred/. thomas.rattei@univie.ac.atSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
A novel genome-wide microsatellite resource for species of Eucalyptus with linkage-to-physical correspondence on the reference genome sequence.

PubMed

Grattapaglia, Dario; Mamani, Eva M C; Silva-Junior, Orzenil B; Faria, Danielle A

2015-03-01

Keystone species in their native ranges, eucalypts, are ecologically and genetically very diverse, growing naturally along extensive latitudinal and altitudinal ranges and variable environments. Besides their ecological importance, eucalypts are also the most widely planted trees for sustainable forestry in the world. We report the development of a novel collection of 535 microsatellites for species of Eucalyptus, 494 designed from ESTs and 41 from genomic libraries. A selected subset of 223 was evaluated for individual identification, parentage testing, and ancestral information content in the two most extensively studied species, Eucalyptus grandis and Eucalyptus globulus. Microsatellites showed high transferability and overlapping allele size range, suggesting they have arisen still in their common ancestor and confirming the extensive genome conservation between these two species. A consensus linkage map with 437 microsatellites, the most comprehensive microsatellite-only genetic map for Eucalyptus, was built by assembling segregation data from three mapping populations and anchored to the Eucalyptus genome. An overall colinearity between recombination-based and physical positioning of 84% of the mapped microsatellites was observed, with some ordering discrepancies and sporadic locus duplications, consistent with the recently described whole genome duplication events in Eucalyptus. The linkage map covered 95.2% of the 605.8-Mbp assembled genome sequence, placing one microsatellite every 1.55 Mbp on average, and an overall estimate of physical to recombination distance of 618 kbp/cM. The genetic parameters estimates together with linkage and physical position data for this large set of microsatellites should assist marker choice for genome-wide population genetics and comparative mapping in Eucalyptus. © 2014 John Wiley & Sons Ltd.
Characterization of Hepatitis C Virus (HCV) Envelope Diversification from Acute to Chronic Infection within a Sexually Transmitted HCV Cluster by Using Single-Molecule, Real-Time Sequencing

PubMed Central

Ho, Cynthia K. Y.; Raghwani, Jayna; Koekkoek, Sylvie; Liang, Richard H.; Van der Meer, Jan T. M.; Van Der Valk, Marc; De Jong, Menno; Pybus, Oliver G.

2016-01-01

ABSTRACT In contrast to other available next-generation sequencing platforms, PacBio single-molecule, real-time (SMRT) sequencing has the advantage of generating long reads albeit with a relatively higher error rate in unprocessed data. Using this platform, we longitudinally sampled and sequenced the hepatitis C virus (HCV) envelope genome region (1,680 nucleotides [nt]) from individuals belonging to a cluster of sexually transmitted cases. All five subjects were coinfected with HIV-1 and a closely related strain of HCV genotype 4d. In total, 50 samples were analyzed by using SMRT sequencing. By using 7 passes of circular consensus sequencing, the error rate was reduced to 0.37%, and the median number of sequences was 612 per sample. A further reduction of insertions was achieved by alignment against a sample-specific reference sequence. However, in vitro recombination during PCR amplification could not be excluded. Phylogenetic analysis supported close relationships among HCV sequences from the four male subjects and subsequent transmission from one subject to his female partner. Transmission was characterized by a strong genetic bottleneck. Viral genetic diversity was low during acute infection and increased upon progression to chronicity but subsequently fluctuated during chronic infection, caused by the alternate detection of distinct coexisting lineages. SMRT sequencing combines long reads with sufficient depth for many phylogenetic analyses and can therefore provide insights into within-host HCV evolutionary dynamics without the need for haplotype reconstruction using statistical algorithms. IMPORTANCE Next-generation sequencing has revolutionized the study of genetically variable RNA virus populations, but for phylogenetic and evolutionary analyses, longer sequences than those generated by most available platforms, while minimizing the intrinsic error rate, are desired. Here, we demonstrate for the first time that PacBio SMRT sequencing technology can be used to generate full-length HCV envelope sequences at the single-molecule level, providing a data set with large sequencing depth for the characterization of intrahost viral dynamics. The selection of consensus reads derived from at least 7 full circular consensus sequencing rounds significantly reduced the intrinsic high error rate of this method. We used this method to genetically characterize a unique transmission cluster of sexually transmitted HCV infections, providing insight into the distinct evolutionary pathways in each patient over time and identifying the transmission-associated genetic bottleneck as well as fluctuations in viral genetic diversity over time, accompanied by dynamic shifts in viral subpopulations. PMID:28077634
An artificial intelligence approach fit for tRNA gene studies in the era of big sequence data.

PubMed

Iwasaki, Yuki; Abe, Takashi; Wada, Kennosuke; Wada, Yoshiko; Ikemura, Toshimichi

2017-09-12

Unsupervised data mining capable of extracting a wide range of knowledge from big data without prior knowledge or particular models is a timely application in the era of big sequence data accumulation in genome research. By handling oligonucleotide compositions as high-dimensional data, we have previously modified the conventional self-organizing map (SOM) for genome informatics and established BLSOM, which can analyze more than ten million sequences simultaneously. Here, we develop BLSOM specialized for tRNA genes (tDNAs) that can cluster (self-organize) more than one million microbial tDNAs according to their cognate amino acid solely depending on tetra- and pentanucleotide compositions. This unsupervised clustering can reveal combinatorial oligonucleotide motifs that are responsible for the amino acid-dependent clustering, as well as other functionally and structurally important consensus motifs, which have been evolutionarily conserved. BLSOM is also useful for identifying tDNAs as phylogenetic markers for special phylotypes. When we constructed BLSOM with 'species-unknown' tDNAs from metagenomic sequences plus 'species-known' microbial tDNAs, a large portion of metagenomic tDNAs self-organized with species-known tDNAs, yielding information on microbial communities in environmental samples. BLSOM can also enhance accuracy in the tDNA database obtained from big sequence data. This unsupervised data mining should become important for studying numerous functionally unclear RNAs obtained from a wide range of organisms.
Low dose rectal inoculation of rhesus macaques by SIV SME660 or SIV MAC251 recapitulates human mucosal infection by HIV-1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Koraber, Bette; Perelson, Alan; Hraber, Peter

2009-01-01

Recently, we developed a novel approach to the identification of transmitted or early founder HIV -1 genomes in acutely infected humans based on single genome amplification and sequencing. Here we tested this approach in 18 acutely infected Indian rhesus macaques to determine the molecular features of SIV transmission. Animals were inoculated intrarectally (IR) or intravenously (IV) with stocks of SIVmac251 or SIVsmE660 that exhibited sequence diversity typical of early-chronic HIV -1 infection. 987 full-length SIV env sequences (median of 48 per animal) were determined from plasma virion RNA one to five weeks after infection. IR inoculation was followed by productivemore » infection by one or few viruses (median 1; range 1-5) that diversified randomly with near star-like phylogeny and a Poisson distribution of mutations. Consensus viral sequences from ramp-up and peak viremia were identical to viruses found in the inocula or differed from them by only one or few nuc1eotides, providing direct evidence that early plasma viral sequences coalesce to transmitted/founder virus( es). IV infection was approximately 10,000-fold more efficient than IR infection, and viruses transmitted by either route represented the full genetic spectra of the inocula. These findings identify key similarities in mucosal transmission and early diversification between SIV and HIV -1.« less
Genomic structure and chromosomal localization of GML (GPI-anchored molecule-like protein), a gene induced by p53

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kimura, Yasutoshi; Furuhata, Tomohisa; Nakamura, Yusuke

1997-05-01

Among its known functions, tumor suppressor gene p53 serves as a transcriptional regulator and mediates various signals through activation of downstream genes. We recently identified a novel gene, GML (glycosylphosphatidylinositol (GPI)-anchored molecule-like protein), whose expression is specifically induced by wildtype p53. To characterize the GML gene further, we determined 35.8 kb of DNA sequence that included a consensus binding sequence for p53 and the entire GML gene. The GML gene consists of four exons, and the p53-binding sequence is present in the 5{prime}-flanking region. In genomic organization this gene resembles genes encoding murine Ly-6 glycoproteins, a human homologue of themore » Ly-6 family called RIG-E, and CD59; products of these genes, known as GPI-anchored proteins, are variously involved in signal transduction, cell-cell adhesion, and cell-matrix attachment. FISH analysis revealed that the GML gene is located on human chromosome 8q24.3. Genes encoding at least two other GPI-anchored molecules, E48 and RIG-E, are also located in this region. 20 refs., 2 figs., 1 tab.« less
Developmentally programmed DNA splicing in Paramecium reveals short-distance crosstalk between DNA cleavage sites

PubMed Central

Gratias, Ariane; Lepère, Gersende; Garnier, Olivier; Rosa, Sarah; Duharcourt, Sandra; Malinsky, Sophie; Meyer, Eric; Bétermier, Mireille

2008-01-01

Somatic genome assembly in the ciliate Paramecium involves the precise excision of thousands of short internal eliminated sequences (IESs) that are scattered throughout the germline genome and often interrupt open reading frames. Excision is initiated by double-strand breaks centered on the TA dinucleotides that are conserved at each IES boundary, but the factors that drive cleavage site recognition remain unknown. A degenerate consensus was identified previously at IES ends and genetic analyses confirmed the participation of their nucleotide sequence in efficient excision. Even for wild-type IESs, however, variant excision patterns (excised or nonexcised) may be inherited maternally through sexual events, in a homology-dependent manner. We show here that this maternal epigenetic control interferes with the targeting of DNA breaks at IES ends. Furthermore, we demonstrate that a mutation in the TA at one end of an IES impairs DNA cleavage not only at the mutant end but also at the wild-type end. We conclude that crosstalk between both ends takes place prior to their cleavage and propose that the ability of an IES to adopt an excision-prone conformation depends on the combination of its nucleotide sequence and of additional determinants. PMID:18420657
rVISTA 2.0: Evolutionary Analysis of Transcription Factor Binding Sites

DOE Office of Scientific and Technical Information (OSTI.GOV)

Loots, G G; Ovcharenko, I

2004-01-28

Identifying and characterizing the patterns of DNA cis-regulatory modules represents a challenge that has the potential to reveal the regulatory language the genome uses to dictate transcriptional dynamics. Several studies have demonstrated that regulatory modules are under positive selection and therefore are often conserved between related species. Using this evolutionary principle we have created a comparative tool, rVISTA, for analyzing the regulatory potential of noncoding sequences. The rVISTA tool combines transcription factor binding site (TFBS) predictions, sequence comparisons and cluster analysis to identify noncoding DNA regions that are highly conserved and present in a specific configuration within an alignment. Heremore » we present the newly developed version 2.0 of the rVISTA tool that can process alignments generated by both zPicture and PipMaker alignment programs or use pre-computed pairwise alignments of seven vertebrate genomes available from the ECR Browser. The rVISTA web server is closely interconnected with the TRANSFAC database, allowing users to either search for matrices present in the TRANSFAC library collection or search for user-defined consensus sequences. rVISTA tool is publicly available at http://rvista.dcode.org/.« less
[Polymorphism of KPI-A genes from plants of the subgenus Potatoe (sect. Petota, Estolonifera and Lycopersicum) and subgenus Solanum].

PubMed

Krinitsyna, A A; Mel'nikova, N V; Belenikin, M S; Poltronieri, P; Santino, A; Kudriavtseva, A V; Savilova, A M; Speranskaia, A S

2013-01-01

Kunitz-type proteinase inhibitor proteins of group A (KPI-A) are involved in the protection of potato plants from pathogens and pests. Although sequences of large number of the KPI-A genes from different species of cultivated potato (Solanum tuberosum subsp. tuberosum) and a few genes from tomato (Solanum lycopersicum) are known to date, information about the allelic diversity of these genes in other species of the genus Solanum is lacking. In our work, the consensus sequences of the KPI-A genes were established in two species of subgenus Potatoe sect. Petota (Solanum tuberosum subsp. andigenum--5 genes and Solanum stoloniferum--2 genes) and in the subgenus Solanum (Solanum nigrum--5 genes) by amplification, cloning, sequencing and subsequent analysis. The determined sequences of KPI-A genes were 97-100% identical to known sequences of the cultivated potato of sect. Petota (cultivated potato Solanum tuberosum subsp. tuberosum) and sect. Etuberosum (S. palustre). The interspecific variability of these genes did not exceed the intraspecific variability for all studied species except Solanum lycopersicum. The distribution of highly variable and conserved sequences in the mature protein-encoding regions was uniform for all investigated KPI-A genes. However, our attempts to amplify the homologous genes using the same primers and the genomes of Solanum dulcamarum, Solanum lycopersicum and Mandragora officinarum resulted in no product formation. Phylogenetic analysis of KPI-A diversity showed that the sequences of the S. lycopersicum form independent cluster, whereas KPI-A of S. nigrum and species of sect. Etuberosum and sect. Petota are closely related and do not form species-specific subclasters. Although Solanum nigrum is resistant to all known races of economically one of the most important diseases of solanaceous plants oomycete Phytophthora infestans aminoacid sequences encoding by KPI-A genes from its genome have nearly or absolutely no differences to the same from genomes of cultivated potatoes involved by P. infestans.
Pestoides F, an atypical Yersinia pestis strain from the former Soviet Union.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Garcia, Emilio; Worsham, Patricia; Bearden, S.

2007-01-01

Unlike the classical Yersinia pestis strains, members of an atypical group of Y. pestis from Central Asia, denominated Y. pestis subspecies caucasica (also known as one of several pestoides types), are distinguished by a number of characteristics including their ability to ferment rhamnose and melibiose, their lack of the small plasmid encoding the plasminogen activator (pla) and pesticin, and their exceptionally large variants of the virulence plasmid pMT (encoding murine toxin and capsular antigen). We have obtained the entire genome sequence of Y. pestis Pestoides F, an isolate from the former Soviet Union that has enabled us to carryout amore » comprehensive genome-wide comparison of this organism's genomic content against the six published sequences of Y. pestis and their Y. pseudotuberculosis ancestor. Based on classical glycerol fermentation (+ve) and nitrate reduction (+ve) Y. pestis Pestoides F is an isolate that belongs to the biovar antiqua. This strain is unusual in other characteristics such as the fact that it carries a non-consensus V antigen (lcrV) sequence, and that unlike other Pla(-) strains, Pestoides F retains virulence by the parenteral and aerosol routes. The chromosome of Pestoides F is 4,517,345 bp in size comprising some 3,936 predicted coding sequences, while its pCD and pMT plasmids are 71,507 bp and 137,010 bp in size respectively. Comparison of chromosome-associated genes in Pestoides F with those in the other sequenced Y. pestis strains reveals differences ranging from strain-specific rearrangements, insertions, deletions, single nucleotide polymorphisms, and a unique distribution of insertion sequences. There is a single approximately 7 kb unique region in the chromosome not found in any of the completed Y. pestis strains sequenced to date, but which is present in the Y. pseudotuberculosis ancestor. Taken together, these findings are consistent with Pestoides F being derived from the most ancient lineage of Y. pestis yet sequenced.« less
Pestoides F, and Atypical Yersinia pestis Strain from the Former Soviet Union

DOE Office of Scientific and Technical Information (OSTI.GOV)

Garcia, E; Worsham, P; Bearden, S

2007-01-05

Unlike the classical Yersinia pestis strains, members of an atypical group of Y. pestis from Central Asia, denominated Y. pestis subspecies caucasica (also known as one of several pestoides types), are distinguished by a number of characteristics including their ability to ferment rhamnose and melibiose, their lacking the small plasmid encoding the plasminogen activator (pla) and pesticin, and their exceptionally large variants of the virulence plasmid pMT (encoding murine toxin and capsular antigen). We have obtained the entire genome sequence of Y. pestis Pestoides F, an isolate from the former Soviet Union that has enabled us to carryout a comprehensivemore » genome-wide comparison of this organism's genomic content against the six published sequences of Y. pestis and their Y. pseudotuberculosis ancestor. Based on classical glycerol fermentation (+ve) and nitrate reduction (+ve) Y. pestis Pestoides F is an isolate that belongs to the biovar antiqua. This strain is unusual in other characteristics such as the fact that it carries a non-consensus V antigen (lcrV) sequence, and that unlike other Pla{sup -} strains, Pestoides F retains virulence by the parenteral and aerosol routes. The chromosome of Pestoides F is 4,517,345 bp in size comprising some 3,936 predicted coding sequences, while its pCD and pMT plasmids are 71,507 bp and 137,010 bp in size respectively. Comparison of chromosome-associated genes in Pestoides F with those in the other sequenced Y. pestis strains, reveals a series of differences ranging from strain-specific rearrangements, insertions, deletions, single nucleotide polymorphisms, and a unique distribution of insertion sequences. There is a single {approx}7 kb unique region in the chromosome not found in any of the completed Y. pestis strains sequenced to date, but which is present in the Y. pseudotuberculosis ancestor. Taken together, these findings are consistent with Pestoides F being derived from the most ancient lineage of Y. pestis yet sequenced.« less
Detection of a novel herpesvirus from bats in the Philippines.

PubMed

Sano, Kaori; Okazaki, Sachiko; Taniguchi, Satoshi; Masangkay, Joseph S; Puentespina, Roberto; Eres, Eduardo; Cosico, Edison; Quibod, Niña; Kondo, Taisuke; Shimoda, Hiroshi; Hatta, Yuuki; Mitomo, Shumpei; Oba, Mami; Katayama, Yukie; Sassa, Yukiko; Furuya, Tetsuya; Nagai, Makoto; Une, Yumi; Maeda, Ken; Kyuwa, Shigeru; Yoshikawa, Yasuhiro; Akashi, Hiroomi; Omatsu, Tsutomu; Mizutani, Tetsuya

2015-08-01

Bats are natural hosts of many zoonotic viruses. Monitoring bat viruses is important to detect novel bat-borne infectious diseases. In this study, next generation sequencing techniques and conventional PCR were used to analyze intestine, lung, and blood clot samples collected from wild bats captured at three locations in Davao region, in the Philippines in 2012. Different viral genes belonging to the Retroviridae and Herpesviridae families were identified using next generation sequencing. The existence of herpesvirus in the samples was confirmed by PCR using herpesvirus consensus primers. The nucleotide sequences of the resulting PCR amplicons were 166-bp. Further phylogenetic analysis identified that the virus from which this nucleotide sequence was obtained belonged to the Gammaherpesvirinae subfamily. PCR using primers specific to the nucleotide sequence obtained revealed that the infection rate among the captured bats was 30 %. In this study, we present the partial genome of a novel gammaherpesvirus detected from wild bats. Our observations also indicate that this herpesvirus may be widely distributed in bat populations in Davao region.
The GENCODE exome: sequencing the complete human exome

PubMed Central

Coffey, Alison J; Kokocinski, Felix; Calafato, Maria S; Scott, Carol E; Palta, Priit; Drury, Eleanor; Joyce, Christopher J; LeProust, Emily M; Harrow, Jen; Hunt, Sarah; Lehesjoki, Anna-Elina; Turner, Daniel J; Hubbard, Tim J; Palotie, Aarno

2011-01-01

Sequencing the coding regions, the exome, of the human genome is one of the major current strategies to identify low frequency and rare variants associated with human disease traits. So far, the most widely used commercial exome capture reagents have mainly targeted the consensus coding sequence (CCDS) database. We report the design of an extended set of targets for capturing the complete human exome, based on annotation from the GENCODE consortium. The extended set covers an additional 5594 genes and 10.3 Mb compared with the current CCDS-based sets. The additional regions include potential disease genes previously inaccessible to exome resequencing studies, such as 43 genes linked to ion channel activity and 70 genes linked to protein kinase activity. In total, the new GENCODE exome set developed here covers 47.9 Mb and performed well in sequence capture experiments. In the sample set used in this study, we identified over 5000 SNP variants more in the GENCODE exome target (24%) than in the CCDS-based exome sequencing. PMID:21364695

EST-derived SSR markers used as anchor loci for the construction of a consensus linkage map in ryegrass (Lolium spp.)

PubMed Central

2010-01-01

Background Genetic markers and linkage mapping are basic prerequisites for marker-assisted selection and map-based cloning. In the case of the key grassland species Lolium spp., numerous mapping populations have been developed and characterised for various traits. Although some genetic linkage maps of these populations have been aligned with each other using publicly available DNA markers, the number of common markers among genetic maps is still low, limiting the ability to compare candidate gene and QTL locations across germplasm. Results A set of 204 expressed sequence tag (EST)-derived simple sequence repeat (SSR) markers has been assigned to map positions using eight different ryegrass mapping populations. Marker properties of a subset of 64 EST-SSRs were assessed in six to eight individuals of each mapping population and revealed 83% of the markers to be polymorphic in at least one population and an average number of alleles of 4.88. EST-SSR markers polymorphic in multiple populations served as anchor markers and allowed the construction of the first comprehensive consensus map for ryegrass. The integrated map was complemented with 97 SSRs from previously published linkage maps and finally contained 284 EST-derived and genomic SSR markers. The total map length was 742 centiMorgan (cM), ranging for individual chromosomes from 70 cM of linkage group (LG) 6 to 171 cM of LG 2. Conclusions The consensus linkage map for ryegrass based on eight mapping populations and constructed using a large set of publicly available Lolium EST-SSRs mapped for the first time together with previously mapped SSR markers will allow for consolidating existing mapping and QTL information in ryegrass. Map and markers presented here will prove to be an asset in the development for both molecular breeding of ryegrass as well as comparative genetics and genomics within grass species. PMID:20712870
Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea.

PubMed

Makarova, Kira S; Sorokin, Alexander V; Novichkov, Pavel S; Wolf, Yuri I; Koonin, Eugene V

2007-11-27

An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt on such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs but also represents a challenge because of error amplification. One of the practical strategies involves construction of refined COGs for phylogenetically compact subsets of genomes. New Archaeal Clusters of Orthologous Genes (arCOGs) were constructed for 41 archaeal genomes (13 Crenarchaeota, 27 Euryarchaeota and one Nanoarchaeon) using an improved procedure that employs a similarity tree between smaller, group-specific clusters, semi-automatically partitions orthology domains in multidomain proteins, and uses profile searches for identification of remote orthologs. The annotation of arCOGs is a consensus between three assignments based on the COGs, the CDD database, and the annotations of homologs in the NR database. The 7538 arCOGs, on average, cover approximately 88% of the genes in a genome compared to a approximately 76% coverage in COGs. The finer granularity of ortholog identification in the arCOGs is apparent from the fact that 4538 arCOGs correspond to 2362 COGs; approximately 40% of the arCOGs are new. The archaeal gene core (protein-coding genes found in all 41 genome) consists of 166 arCOGs. The arCOGs were used to reconstruct gene loss and gene gain events during archaeal evolution and gene sets of ancestral forms. The Last Archaeal Common Ancestor (LACA) is conservatively estimated to possess 996 genes compared to 1245 and 1335 genes for the last common ancestors of Crenarchaeota and Euryarchaeota, respectively. It is inferred that LACA was a chemoautotrophic hyperthermophile that, in addition to the core archaeal functions, encoded more idiosyncratic systems, e.g., the CASS systems of antivirus defense and some toxin-antitoxin systems. The arCOGs provide a convenient, flexible framework for functional annotation of archaeal genomes, comparative genomics and evolutionary reconstructions. Genomic reconstructions suggest that the last common ancestor of archaea might have been (nearly) as advanced as the modern archaeal hyperthermophiles. ArCOGs and related information are available at: ftp://ftp.ncbi.nih.gov/pub/koonin/arCOGs/.
Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions.

PubMed

Senol Cali, Damla; Kim, Jeremie S; Ghose, Saugata; Alkan, Can; Mutlu, Onur

2018-04-02

Nanopore sequencing technology has the potential to render other sequencing technologies obsolete with its ability to generate long reads and provide portability. However, high error rates of the technology pose a challenge while generating accurate genome assemblies. The tools used for nanopore sequence analysis are of critical importance, as they should overcome the high error rates of the technology. Our goal in this work is to comprehensively analyze current publicly available tools for nanopore sequence analysis to understand their advantages, disadvantages and performance bottlenecks. It is important to understand where the current tools do not perform well to develop better tools. To this end, we (1) analyze the multiple steps and the associated tools in the genome assembly pipeline using nanopore sequence data, and (2) provide guidelines for determining the appropriate tools for each step. Based on our analyses, we make four key observations: (1) the choice of the tool for basecalling plays a critical role in overcoming the high error rates of nanopore sequencing technology. (2) Read-to-read overlap finding tools, GraphMap and Minimap, perform similarly in terms of accuracy. However, Minimap has a lower memory usage, and it is faster than GraphMap. (3) There is a trade-off between accuracy and performance when deciding on the appropriate tool for the assembly step. The fast but less accurate assembler Miniasm can be used for quick initial assembly, and further polishing can be applied on top of it to increase the accuracy, which leads to faster overall assembly. (4) The state-of-the-art polishing tool, Racon, generates high-quality consensus sequences while providing a significant speedup over another polishing tool, Nanopolish. We analyze various combinations of different tools and expose the trade-offs between accuracy, performance, memory usage and scalability. We conclude that our observations can guide researchers and practitioners in making conscious and effective choices for each step of the genome assembly pipeline using nanopore sequence data. Also, with the help of bottlenecks we have found, developers can improve the current tools or build new ones that are both accurate and fast, to overcome the high error rates of the nanopore sequencing technology.
Analysis and Functional Annotation of an Expressed Sequence Tag Collection for Tropical Crop Sugarcane

PubMed Central

Vettore, André L.; da Silva, Felipe R.; Kemper, Edson L.; Souza, Glaucia M.; da Silva, Aline M.; Ferro, Maria Inês T.; Henrique-Silva, Flavio; Giglioti, Éder A.; Lemos, Manoel V.F.; Coutinho, Luiz L.; Nobrega, Marina P.; Carrer, Helaine; França, Suzelei C.; Bacci, Maurício; Goldman, Maria Helena S.; Gomes, Suely L.; Nunes, Luiz R.; Camargo, Luis E.A.; Siqueira, Walter J.; Van Sluys, Marie-Anne; Thiemann, Otavio H.; Kuramae, Eiko E.; Santelli, Roberto V.; Marino, Celso L.; Targon, Maria L.P.N.; Ferro, Jesus A.; Silveira, Henrique C.S.; Marini, Danyelle C.; Lemos, Eliana G.M.; Monteiro-Vitorello, Claudia B.; Tambor, José H.M.; Carraro, Dirce M.; Roberto, Patrícia G.; Martins, Vanderlei G.; Goldman, Gustavo H.; de Oliveira, Regina C.; Truffi, Daniela; Colombo, Carlos A.; Rossi, Magdalena; de Araujo, Paula G.; Sculaccio, Susana A.; Angella, Aline; Lima, Marleide M.A.; de Rosa, Vicente E.; Siviero, Fábio; Coscrato, Virginia E.; Machado, Marcos A.; Grivet, Laurent; Di Mauro, Sonia M.Z.; Nobrega, Francisco G.; Menck, Carlos F.M.; Braga, Marilia D.V.; Telles, Guilherme P.; Cara, Frank A.A.; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo

2003-01-01

To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged. PMID:14613979
Towards a Consensus Annotation System (GSC8 Meeting)

ScienceCinema

White, Owen

2018-02-01

The Genomic Standards Consortium was formed in September 2005. It is an international, open-membership working body which promotes standardization in the description of genomes and the exchange and integration of genomic data. The 2009 meeting was an activity of a five-year funding from the National Science Foundation and was organized held at the DOE Joint Genome Institute with organizational support provided by the JGI and by the University of California - San Diego. Towards Consensus Annotation at the Genomic Standards Consortium's 8th meeting at the DOE JGI in Walnut Creek, CA on Sept. 10, 2009.
The mariner transposons belonging to the irritans subfamily were maintained in chordate genomes by vertical transmission.

PubMed

Sinzelle, Ludivine; Chesneau, Albert; Bigot, Yves; Mazabraud, André; Pollet, Nicolas

2006-01-01

Mariner-like elements (MLEs) belong to the Tc1-mariner superfamily of DNA transposons, which is very widespread in animal genomes. We report here the first complete description of a MLE, Xtmar1, within the genome of a poikilotherm vertebrate, the amphibian Xenopus tropicalis. A close relative, XlMLE, is also characterized within the genome of a sibling species, Xenopus laevis. The phylogenetic analysis of the relationships between MLE transposases reveals that Xtmar1 is closely related to Hsmar2 and Bytmar1 and that together they form a second distinct lineage of the irritans subfamily. All members of this lineage are also characterized by the 36- to 43-bp size of their imperfectly conserved inverted terminal repeats and by the -8-bp motif located at their outer extremity. Since XlMLE, Xlmar1, and Hsmar2 are present in species located at both extremities of the vertebrate evolutionary tree, we looked for MLE relatives belonging to the same subfamily in the available sequencing projects using the amino acid consensus sequence of the Hsmar2 transposase as an in silico probe. We found that irritans MLEs are present in chordate genomes including most craniates. This therefore suggests that these elements have been present within chordate genomes for 750 Myr and that the main way they have been maintained in these species has been via vertical transmission. The very small number of stochastic losses observed in the data available suggests that their inactivation during evolution has been very slow.
Tree decomposition based fast search of RNA structures including pseudoknots in genomes.

PubMed

Song, Yinglei; Liu, Chunmei; Malmberg, Russell; Pan, Fangfang; Cai, Liming

2005-01-01

Searching genomes for RNA secondary structure with computational methods has become an important approach to the annotation of non-coding RNAs. However, due to the lack of efficient algorithms for accurate RNA structure-sequence alignment, computer programs capable of fast and effectively searching genomes for RNA secondary structures have not been available. In this paper, a novel RNA structure profiling model is introduced based on the notion of a conformational graph to specify the consensus structure of an RNA family. Tree decomposition yields a small tree width t for such conformation graphs (e.g., t = 2 for stem loops and only a slight increase for pseudo-knots). Within this modelling framework, the optimal alignment of a sequence to the structure model corresponds to finding a maximum valued isomorphic subgraph and consequently can be accomplished through dynamic programming on the tree decomposition of the conformational graph in time O(k(t)N(2)), where k is a small parameter; and N is the size of the projiled RNA structure. Experiments show that the application of the alignment algorithm to search in genomes yields the same search accuracy as methods based on a Covariance model with a significant reduction in computation time. In particular; very accurate searches of tmRNAs in bacteria genomes and of telomerase RNAs in yeast genomes can be accomplished in days, as opposed to months required by other methods. The tree decomposition based searching tool is free upon request and can be downloaded at our site h t t p ://w.uga.edu/RNA-informatics/software/index.php.
Innovative assembly strategy contributes to understanding the evolution and conservation genetics of the endangered Solenodon paradoxus from the island of Hispaniola.

PubMed

Grigorev, Kirill; Kliver, Sergey; Dobrynin, Pavel; Komissarov, Aleksey; Wolfsberger, Walter; Krasheninnikova, Ksenia; Afanador-Herna Ndez, Yashira M; Brandt, Adam L; Paulino, Liz A; Carreras, Rosanna; Rodríguez, Luis E; Nu N Ez, Adrell; Brandt, Jessica R; Silva, Filipe; Herna Ndez-Martich, J David; Majeske, Audrey J; Antunes, Agostinho; Roca, Alfred L; O'Brien, Stephen J; Martínez-Cruzado, Juan Carlos; Oleksyk, Taras K

2018-03-16

Solenodons are insectivores living in Hispaniola and Cuba that form an isolated branch in the tree of placental mammals highly divergent from other eulipothyplan insectivores The history, unique biology and adaptations of these enigmatic venomous species could be illuminated by the availability of genome data, but a whole genome assembly for solenodons has not been previously performed, partially due to the difficulty in obtaining samples from the field. Island isolation and reduced numbers have likely resulted in high homozygosity within the Hispaniolan solenodon (Solenodon paradoxus), thus we tested the performance of several assembly strategies on the genome of this genetically impoverished species. The string-graph based assembly strategy seemed a better choice compared to the conventional de Bruijn graph approach, due to the high levels of homozygosity, which is often a hallmark of endemic or endangered species. A consensus reference genome was assembled from sequences of five individuals from the southern subspecies (S. p. woodi). In addition, we obtained additional sequence from one sample of the northern subspecies (S. p. paradoxus). The resulting genome assemblies were compared to each other, and annotated for genes, with a specific emphasis on venom genes, repeats, variable microsatellite loci and other genomic variants. Phylogenetic positioning and selection signatures were inferred based on 4,416 single copy orthologs from 10 other mammals. We estimated that solenodons diverged from other extant mammals 73.6 Mya. Patterns of SNP variation allowed us to infer population demography, which supported a subspecies split within the Hispaniolan solenodon at least 300 Kya.
Development of versatile non-homologous end joining-based knock-in module for genome editing.

PubMed

Sawatsubashi, Shun; Joko, Yudai; Fukumoto, Seiji; Matsumoto, Toshio; Sugano, Shigeo S

2018-01-12

CRISPR/Cas9-based genome editing has dramatically accelerated genome engineering. An important aspect of genome engineering is efficient knock-in technology. For improved knock-in efficiency, the non-homologous end joining (NHEJ) repair pathway has been used over the homology-dependent repair pathway, but there remains a need to reduce the complexity of the preparation of donor vectors. We developed the versatile NHEJ-based knock-in module for genome editing (VIKING). Using the consensus sequence of the time-honored pUC vector to cut donor vectors, any vector with a pUC backbone could be used as the donor vector without customization. Conditions required to minimize random integration rates of the donor vector were also investigated. We attempted to isolate null lines of the VDR gene in human HaCaT keratinocytes using knock-in/knock-out with a selection marker cassette, and found 75% of clones isolated were successfully knocked-in. Although HaCaT cells have hypotetraploid genome composition, the results suggest multiple clones have VDR null phenotypes. VIKING modules enabled highly efficient knock-in of any vectors harboring pUC vectors. Users now can insert various existing vectors into an arbitrary locus in the genome. VIKING will contribute to low-cost genome engineering.
A high-density intraspecific SNP linkage map of pigeonpea (Cajanas cajan L. Millsp.)

PubMed Central

Mandal, Paritra; Bhutani, Shefali; Dutta, Sutapa; Kumawat, Giriraj; Singh, Bikram Pratap; Chaudhary, A. K.; Yadav, Rekha; Gaikwad, K.; Sevanthi, Amitha Mithra; Datta, Subhojit; Raje, Ranjeet S.; Sharma, Tilak R.; Singh, Nagendra Kumar

2017-01-01

Pigeonpea (Cajanus cajan (L.) Millsp.) is a major food legume cultivated in semi-arid tropical regions including the Indian subcontinent, Africa, and Southeast Asia. It is an important source of protein, minerals, and vitamins for nearly 20% of the world population. Due to high carbon sequestration and drought tolerance, pigeonpea is an important crop for the development of climate resilient agriculture and nutritional security. However, pigeonpea productivity has remained low for decades because of limited genetic and genomic resources, and sparse utilization of landraces and wild pigeonpea germplasm. Here, we present a dense intraspecific linkage map of pigeonpea comprising 932 markers that span a total adjusted map length of 1,411.83 cM. The consensus map is based on three different linkage maps that incorporate a large number of single nucleotide polymorphism (SNP) markers derived from next generation sequencing data, using Illumina GoldenGate bead arrays, and genotyping with restriction site associated DNA (RAD) sequencing. The genotyping-by-sequencing enhanced the marker density but was met with limited success due to lack of common markers across the genotypes of mapping population. The integrated map has 547 bead-array SNP, 319 RAD-SNP, and 65 simple sequence repeat (SSR) marker loci. We also show here correspondence between our linkage map and published genome pseudomolecules of pigeonpea. The availability of a high-density linkage map will help improve the anchoring of the pigeonpea genome to its chromosomes and the mapping of genes and quantitative trait loci associated with useful agronomic traits. PMID:28654689
Comparison of the Equine Reference Sequence with Its Sanger Source Data and New Illumina Reads

PubMed Central

Rebolledo-Mendez, Jovan; Hestand, Matthew S.; Coleman, Stephen J.; Zeng, Zheng; Orlando, Ludovic; MacLeod, James N.; Kalbfleisch, Ted

2015-01-01

The reference assembly for the domestic horse, EquCab2, published in 2009, was built using approximately 30 million Sanger reads from a Thoroughbred mare named Twilight. Contiguity in the assembly was facilitated using nearly 315 thousand BAC end sequences from Twilight’s half brother Bravo. Since then, it has served as the foundation for many genome-wide analyses that include not only the modern horse, but ancient horses and other equid species as well. As data mapped to this reference has accumulated, consistent variation between mapped datasets and the reference, in terms of regions with no read coverage, single nucleotide variants, and small insertions/deletions have become apparent. In many cases, it is not clear whether these differences are the result of true sequence variation between the research subjects’ and Twilight’s genome or due to errors in the reference. EquCab2 is regarded as “The Twilight Assembly.” The objective of this study was to identify inconsistencies between the EquCab2 assembly and the source Twilight Sanger data used to build it. To that end, the original Sanger and BAC end reads have been mapped back to this equine reference and assessed with the addition of approximately 40X coverage of new Illumina Paired-End sequence data. The resulting mapped datasets identify those regions with low Sanger read coverage, as well as variation in genomic content that is not consistent with either the original Twilight Sanger data or the new genomic sequence data generated from Twilight on the Illumina platform. As the haploid EquCab2 reference assembly was created using Sanger reads derived largely from a single individual, the vast majority of variation detected in a mapped dataset comprised of those same Sanger reads should be heterozygous. In contrast, homozygous variations would represent either errors in the reference or contributions from Bravo's BAC end sequences. Our analysis identifies 720,843 homozygous discrepancies between new, high throughput genomic sequence data generated for Twilight and the EquCab2 reference assembly. Most of these represent errors in the assembly, while approximately 10,000 are demonstrated to be contributions from another horse. Other results are presented that include the binary alignment map file of the mapped Sanger reads, a list of variants identified as discrepancies between the source data and resulting reference, and a BED annotation file that lists the regions of the genome whose consensus was likely derived from low coverage alignments. PMID:26107638
Pearl millet [Pennisetum glaucum (L.) R. Br.] consensus linkage map constructed using four RIL mapping populations and newly developed EST-SSRs.

PubMed

Rajaram, Vengaldas; Nepolean, Thirunavukkarasu; Senthilvel, Senapathy; Varshney, Rajeev K; Vadez, Vincent; Srivastava, Rakesh K; Shah, Trushar M; Supriya, Ambawat; Kumar, Sushil; Ramana Kumari, Basava; Bhanuprakash, Amindala; Narasu, Mangamoori Lakshmi; Riera-Lizarazu, Oscar; Hash, Charles Thomas

2013-03-09

Pearl millet [Pennisetum glaucum (L.) R. Br.] is a widely cultivated drought- and high-temperature tolerant C4 cereal grown under dryland, rainfed and irrigated conditions in drought-prone regions of the tropics and sub-tropics of Africa, South Asia and the Americas. It is considered an orphan crop with relatively few genomic and genetic resources. This study was undertaken to increase the EST-based microsatellite marker and genetic resources for this crop to facilitate marker-assisted breeding. Newly developed EST-SSR markers (99), along with previously mapped EST-SSR (17), genomic SSR (53) and STS (2) markers, were used to construct linkage maps of four F7 recombinant inbred populations (RIP) based on crosses ICMB 841-P3 × 863B-P2 (RIP A), H 77/833-2 × PRLT 2/89-33 (RIP B), 81B-P6 × ICMP 451-P8 (RIP C) and PT 732B-P2 × P1449-2-P1 (RIP D). Mapped loci numbers were greatest for RIP A (104), followed by RIP B (78), RIP C (64) and RIP D (59). Total map lengths (Haldane) were 615 cM, 690 cM, 428 cM and 276 cM, respectively. A total of 176 loci detected by 171 primer pairs were mapped among the four crosses. A consensus map of 174 loci (899 cM) detected by 169 primer pairs was constructed using MergeMap to integrate the individual linkage maps. Locus order in the consensus map was well conserved for nearly all linkage groups. Eighty-nine EST-SSR marker loci from this consensus map had significant BLAST hits (top hits with e-value ≤ 1E-10) on the genome sequences of rice, foxtail millet, sorghum, maize and Brachypodium with 35, 88, 58, 48 and 38 loci, respectively. The consensus map developed in the present study contains the largest set of mapped SSRs reported to date for pearl millet, and represents a major consolidation of existing pearl millet genetic mapping information. This study increased numbers of mapped pearl millet SSR markers by >50%, filling important gaps in previously published SSR-based linkage maps for this species and will greatly facilitate SSR-based QTL mapping and applied marker-assisted selection programs.
A high HIV-1 strain variability in London, UK, revealed by full-genome analysis: Results from the ICONIC project

PubMed Central

Frampton, Dan; Gallo Cassarino, Tiziano; Raffle, Jade; Hubb, Jonathan; Ferns, R. Bridget; Waters, Laura; Tong, C. Y. William; Kozlakidis, Zisis; Hayward, Andrew; Kellam, Paul; Pillay, Deenan; Clark, Duncan; Nastouli, Eleni; Leigh Brown, Andrew J.

2018-01-01

Background & methods The ICONIC project has developed an automated high-throughput pipeline to generate HIV nearly full-length genomes (NFLG, i.e. from gag to nef) from next-generation sequencing (NGS) data. The pipeline was applied to 420 HIV samples collected at University College London Hospitals NHS Trust and Barts Health NHS Trust (London) and sequenced using an Illumina MiSeq at the Wellcome Trust Sanger Institute (Cambridge). Consensus genomes were generated and subtyped using COMET, and unique recombinants were studied with jpHMM and SimPlot. Maximum-likelihood phylogenetic trees were constructed using RAxML to identify transmission networks using the Cluster Picker. Results The pipeline generated sequences of at least 1Kb of length (median = 7.46Kb, IQR = 4.01Kb) for 375 out of the 420 samples (89%), with 174 (46.4%) being NFLG. A total of 365 sequences (169 of them NFLG) corresponded to unique subjects and were included in the down-stream analyses. The most frequent HIV subtypes were B (n = 149, 40.8%) and C (n = 77, 21.1%) and the circulating recombinant form CRF02_AG (n = 32, 8.8%). We found 14 different CRFs (n = 66, 18.1%) and multiple URFs (n = 32, 8.8%) that involved recombination between 12 different subtypes/CRFs. The most frequent URFs were B/CRF01_AE (4 cases) and A1/D, B/C, and B/CRF02_AG (3 cases each). Most URFs (19/26, 73%) lacked breakpoints in the PR+RT pol region, rendering them undetectable if only that was sequenced. Twelve (37.5%) of the URFs could have emerged within the UK, whereas the rest were probably imported from sub-Saharan Africa, South East Asia and South America. For 2 URFs we found highly similar pol sequences circulating in the UK. We detected 31 phylogenetic clusters using the full dataset: 25 pairs (mostly subtypes B and C), 4 triplets and 2 quadruplets. Some of these were not consistent across different genes due to inter- and intra-subtype recombination. Clusters involved 70 sequences, 19.2% of the dataset. Conclusions The initial analysis of genome sequences detected substantial hidden variability in the London HIV epidemic. Analysing full genome sequences, as opposed to only PR+RT, identified previously undetected recombinants. It provided a more reliable description of CRFs (that would be otherwise misclassified) and transmission clusters. PMID:29389981
An RNAi in silico approach to find an optimal shRNA cocktail against HIV-1

PubMed Central

2010-01-01

Background HIV-1 can be inhibited by RNA interference in vitro through the expression of short hairpin RNAs (shRNAs) that target conserved genome sequences. In silico shRNA design for HIV has lacked a detailed study of virus variability constituting a possible breaking point in a clinical setting. We designed shRNAs against HIV-1 considering the variability observed in naïve and drug-resistant isolates available at public databases. Methods A Bioperl-based algorithm was developed to automatically scan multiple sequence alignments of HIV, while evaluating the possibility of identifying dominant and subdominant viral variants that could be used as efficient silencing molecules. Student t-test and Bonferroni Dunn correction test were used to assess statistical significance of our findings. Results Our in silico approach identified the most common viral variants within highly conserved genome regions, with a calculated free energy of ≥ -6.6 kcal/mol. This is crucial for strand loading to RISC complex and for a predicted silencing efficiency score, which could be used in combination for achieving over 90% silencing. Resistant and naïve isolate variability revealed that the most frequent shRNA per region targets a maximum of 85% of viral sequences. Adding more divergent sequences maintained this percentage. Specific sequence features that have been found to be related with higher silencing efficiency were hardly accomplished in conserved regions, even when lower entropy values correlated with better scores. We identified a conserved region among most HIV-1 genomes, which meets as many sequence features for efficient silencing. Conclusions HIV-1 variability is an obstacle to achieving absolute silencing using shRNAs designed against a consensus sequence, mainly because there are many functional viral variants. Our shRNA cocktail could be truly effective at silencing dominant and subdominant naïve viral variants. Additionally, resistant isolates might be targeted under specific antiretroviral selective pressure, but in both cases these should be tested exhaustively prior to clinical use. PMID:21172023
Genome analysis of canine astroviruses reveals genetic heterogeneity and suggests possible inter-species transmission.

PubMed

Mihalov-Kovács, Eszter; Martella, Vito; Lanave, Gianvito; Bodnar, Livia; Fehér, Enikő; Marton, Szilvia; Kemenesi, Gábor; Jakab, Ferenc; Bányai, Krisztián

2017-03-15

Canine astrovirus RNA was detected in the stools of 17/63 (26.9%) samples, using either a broadly reactive consensus RT-PCR for astroviruses or random RT-PCR coupled with massive deep sequencing. The complete or nearly complete genome sequence of five canine astroviruses was reconstructed that allowed mapping the genome organization and to investigate the genetic diversity of these viruses. The genome was about 6.6kb in length and contained three open reading frames (ORFs) flanked by a 5' UTR, and a 3' UTR plus a poly-A tail. ORF1a and ORF1b overlapped by 43 nucleotides while the ORF2 overlapped by 8 nucleotides with the 3' end of ORF1b. Upon genome comparison, four strains (HUN/2012/2, HUN/2012/6, HUN/2012/115, and HUN/2012/135) were more related genetically to each other and to UK canine astroviruses (88-96% nt identity), whilst strain HUN/2012/126 was more divergent (75-76% nt identity). In the ORF1b and ORF2, strains HUN/2012/2, HUN/2012/6, and HUN/2012/135 were related genetically to other canine astroviruses identified formerly in Europe and China, whereas strain HUN/2012/126 was related genetically to a divergent canine astrovirus strain, ITA/2010/Zoid. For one canine astrovirus, HUN/2012/8, only a 3.2kb portion of the genome, at the 3' end, could be determined. Interestingly, this strain possessed unique genetic signatures (including a longer ORF1b/ORF2 overlap and a longer 3'UTR) and it was divergent in both ORF1b and ORF2 from all other canine astroviruses, with the highest nucleotide sequence identity (68% and 63%, respectively) to a mink astrovirus, thus suggesting a possible event of interspecies transmission. The genetic heterogeneity of canine astroviruses may pose a challenge for the diagnostics and for future prophylaxis strategies. Copyright © 2016 Elsevier B.V. All rights reserved.
Diversity and Evolution of Bacterial Twin Arginine Translocase Protein, TatC, Reveals a Protein Secretion System That Is Evolving to Fit Its Environmental Niche

PubMed Central

Simone, Domenico; Bay, Denice C.; Leach, Thorin; Turner, Raymond J.

2013-01-01

Background The twin-arginine translocation (Tat) protein export system enables the transport of fully folded proteins across a membrane. This system is composed of two integral membrane proteins belonging to TatA and TatC protein families and in some systems a third component, TatB, a homolog of TatA. TatC participates in substrate protein recognition through its interaction with a twin arginine leader peptide sequence. Methodology/Principal Findings The aim of this study was to explore TatC diversity, evolution and sequence conservation in bacteria to identify how TatC is evolving and diversifying in various bacterial phyla. Surveying bacterial genomes revealed that 77% of all species possess one or more tatC loci and half of these classes possessed only tatC and tatA genes. Phylogenetic analysis of diverse TatC homologues showed that they were primarily inherited but identified a small subset of taxonomically unrelated bacteria that exhibited evidence supporting lateral gene transfer within an ecological niche. Examination of bacilli tatCd/tatCy isoform operons identified a number of known and potentially new Tat substrate genes based on their frequent association to tatC loci. Evolutionary analysis of these Bacilli isoforms determined that TatCy was the progenitor of TatCd. A bacterial TatC consensus sequence was determined and highlighted conserved and variable regions within a three dimensional model of the Escherichia coli TatC protein. Comparative analysis between the TatC consensus sequence and Bacilli TatCd/y isoform consensus sequences revealed unique sites that may contribute to isoform substrate specificity or make TatA specific contacts. Synonymous to non-synonymous nucleotide substitution analyses of bacterial tatC homologues determined that tatC sequence variation differs dramatically between various classes and suggests TatC specialization in these species. Conclusions/Significance TatC proteins appear to be diversifying within particular bacterial classes and its specialization may be driven by the substrates it transports and the environment of its host. PMID:24236045
Site directed recombination

DOEpatents

Jurka, Jerzy W.

1997-01-01

Enhanced homologous recombination is obtained by employing a consensus sequence which has been found to be associated with integration of repeat sequences, such as Alu and ID. The consensus sequence or sequence having a single transition mutation determines one site of a double break which allows for high efficiency of integration at the site. By introducing single or double stranded DNA having the consensus sequence flanking region joined to a sequence of interest, one can reproducibly direct integration of the sequence of interest at one or a limited number of sites. In this way, specific sites can be identified and homologous recombination achieved at the site by employing a second flanking sequence associated with a sequence proximal to the 3'-nick.
Common and distinct DNA-binding and regulatory activities of the BEN-solo transcription factor family.

PubMed

Dai, Qi; Ren, Aiming; Westholm, Jakub O; Duan, Hong; Patel, Dinshaw J; Lai, Eric C

2015-01-01

Recently, the BEN (BANP, E5R, and NAC1) domain was recognized as a new class of conserved DNA-binding domain. The fly genome encodes three proteins that bear only a single BEN domain ("BEN-solo" factors); namely, Insensitive (Insv), Bsg25A (Elba1), and CG9883 (Elba2). Insv homodimers preferentially bind CCAATTGG palindromes throughout the genome to mediate transcriptional repression, whereas Bsg25A and Elba2 heterotrimerize with their obligate adaptor, Elba3 (i.e., the ELBA complex), to recognize a CCAATAAG motif in the Fab-7 insulator. While these data suggest distinct DNA-binding properties of BEN-solo proteins, we performed reporter assays that indicate that both Bsg25A and Elba2 can individually recognize Insv consensus sites efficiently. We confirmed this by solving the structure of Bsg25A complexed to the Insv site, which showed that key aspects of the BEN:DNA recognition strategy are similar between these proteins. We next show that both Insv and ELBA proteins are competent to mediate transcriptional repression via Insv consensus sequences but that the ELBA complex appears to be selective for the ELBA site. Reciprocally, genome-wide analysis reveals that Insv exhibits significant cobinding to class I insulator elements, indicating that it may also contribute to insulator function. Indeed, we observed abundant Insv binding within the Hox complexes with substantial overlaps with class I insulators, many of which bear Insv consensus sites. Moreover, Insv coimmunoprecipitates with the class I insulator factor CP190. Finally, we observed that Insv harbors exclusive activity among fly BEN-solo factors with respect to regulation of Notch-mediated cell fate choices in the peripheral nervous system. This in vivo activity is recapitulated by BEND6, a mammalian BEN-solo factor that conserves the Notch corepressor function of Insv but not its capacity to bind Insv consensus sites. Altogether, our data define an array of common and distinct biochemical and functional properties of this new family of transcription factors. © 2015 Dai et al.; Published by Cold Spring Harbor Laboratory Press.
Common and distinct DNA-binding and regulatory activities of the BEN-solo transcription factor family

PubMed Central

Dai, Qi; Ren, Aiming; Westholm, Jakub O.; Duan, Hong; Patel, Dinshaw J.

2015-01-01

Recently, the BEN (BANP, E5R, and NAC1) domain was recognized as a new class of conserved DNA-binding domain. The fly genome encodes three proteins that bear only a single BEN domain (“BEN-solo” factors); namely, Insensitive (Insv), Bsg25A (Elba1), and CG9883 (Elba2). Insv homodimers preferentially bind CCAATTGG palindromes throughout the genome to mediate transcriptional repression, whereas Bsg25A and Elba2 heterotrimerize with their obligate adaptor, Elba3 (i.e., the ELBA complex), to recognize a CCAATAAG motif in the Fab-7 insulator. While these data suggest distinct DNA-binding properties of BEN-solo proteins, we performed reporter assays that indicate that both Bsg25A and Elba2 can individually recognize Insv consensus sites efficiently. We confirmed this by solving the structure of Bsg25A complexed to the Insv site, which showed that key aspects of the BEN:DNA recognition strategy are similar between these proteins. We next show that both Insv and ELBA proteins are competent to mediate transcriptional repression via Insv consensus sequences but that the ELBA complex appears to be selective for the ELBA site. Reciprocally, genome-wide analysis reveals that Insv exhibits significant cobinding to class I insulator elements, indicating that it may also contribute to insulator function. Indeed, we observed abundant Insv binding within the Hox complexes with substantial overlaps with class I insulators, many of which bear Insv consensus sites. Moreover, Insv coimmunoprecipitates with the class I insulator factor CP190. Finally, we observed that Insv harbors exclusive activity among fly BEN-solo factors with respect to regulation of Notch-mediated cell fate choices in the peripheral nervous system. This in vivo activity is recapitulated by BEND6, a mammalian BEN-solo factor that conserves the Notch corepressor function of Insv but not its capacity to bind Insv consensus sites. Altogether, our data define an array of common and distinct biochemical and functional properties of this new family of transcription factors. PMID:25561495
Veillonella, Firmicutes: Microbes disguised as Gram negatives.

PubMed

Vesth, Tammi; Ozen, Aslı; Andersen, Sandra C; Kaas, Rolf Sommer; Lukjancenko, Oksana; Bohlin, Jon; Nookaew, Intawat; Wassenaar, Trudy M; Ussery, David W

2013-12-20

The Firmicutes represent a major component of the intestinal microflora. The intestinal Firmicutes are a large, diverse group of organisms, many of which are poorly characterized due to their anaerobic growth requirements. Although most Firmicutes are Gram positive, members of the class Negativicutes, including the genus Veillonella, stain Gram negative. Veillonella are among the most abundant organisms of the oral and intestinal microflora of animals and humans, in spite of being strict anaerobes. In this work, the genomes of 24 Negativicutes, including eight Veillonella spp., are compared to 20 other Firmicutes genomes; a further 101 prokaryotic genomes were included, covering 26 phyla. Thus a total of 145 prokaryotic genomes were analyzed by various methods to investigate the apparent conflict of the Veillonella Gram stain and their taxonomic position within the Firmicutes. Comparison of the genome sequences confirms that the Negativicutes are distantly related to Clostridium spp., based on 16S rRNA, complete genomic DNA sequences, and a consensus tree based on conserved proteins. The genus Veillonella is relatively homogeneous: inter-genus pair-wise comparison identifies at least 1,350 shared proteins, although less than half of these are found in any given Clostridium genome. Only 27 proteins are found conserved in all analyzed prokaryote genomes. Veillonella has distinct metabolic properties, and significant similarities to genomes of Proteobacteria are not detected, with the exception of a shared LPS biosynthesis pathway. The clade within the class Negativicutes to which the genus Veillonella belongs exhibits unique properties, most of which are in common with Gram-positives and some with Gram negatives. They are only distantly related to Clostridia, but are even less closely related to Gram-negative species. Though the Negativicutes stain Gram-negative and possess two membranes, the genome and proteome analysis presented here confirm their place within the (mainly) Gram positive phylum of the Firmicutes. Further studies are required to unveil the evolutionary history of the Veillonella and other Negativicutes.

Veillonella, Firmicutes: Microbes disguised as Gram negatives

PubMed Central

Vesth, Tammi; Ozen, Aslı; Andersen, Sandra C.; Kaas, Rolf Sommer; Lukjancenko, Oksana; Bohlin, Jon; Nookaew, Intawat; Wassenaar, Trudy M.; Ussery, David W.

2013-01-01

The Firmicutes represent a major component of the intestinal microflora. The intestinal Firmicutes are a large, diverse group of organisms, many of which are poorly characterized due to their anaerobic growth requirements. Although most Firmicutes are Gram positive, members of the class Negativicutes, including the genus Veillonella, stain Gram negative. Veillonella are among the most abundant organisms of the oral and intestinal microflora of animals and humans, in spite of being strict anaerobes. In this work, the genomes of 24 Negativicutes, including eight Veillonella spp., are compared to 20 other Firmicutes genomes; a further 101 prokaryotic genomes were included, covering 26 phyla. Thus a total of 145 prokaryotic genomes were analyzed by various methods to investigate the apparent conflict of the Veillonella Gram stain and their taxonomic position within the Firmicutes. Comparison of the genome sequences confirms that the Negativicutes are distantly related to Clostridium spp., based on 16S rRNA, complete genomic DNA sequences, and a consensus tree based on conserved proteins. The genus Veillonella is relatively homogeneous: inter-genus pair-wise comparison identifies at least 1,350 shared proteins, although less than half of these are found in any given Clostridium genome. Only 27 proteins are found conserved in all analyzed prokaryote genomes. Veillonella has distinct metabolic properties, and significant similarities to genomes of Proteobacteria are not detected, with the exception of a shared LPS biosynthesis pathway. The clade within the class Negativicutes to which the genus Veillonella belongs exhibits unique properties, most of which are in common with Gram-positives and some with Gram negatives. They are only distantly related to Clostridia, but are even less closely related to Gram-negative species. Though the Negativicutes stain Gram-negative and possess two membranes, the genome and proteome analysis presented here confirm their place within the (mainly) Gram positive phylum of the Firmicutes. Further studies are required to unveil the evolutionary history of the Veillonella and other Negativicutes. PMID:24976898
In silico prediction of splice-altering single nucleotide variants in the human genome.

PubMed

Jian, Xueqiu; Boerwinkle, Eric; Liu, Xiaoming

2014-12-16

In silico tools have been developed to predict variants that may have an impact on pre-mRNA splicing. The major limitation of the application of these tools to basic research and clinical practice is the difficulty in interpreting the output. Most tools only predict potential splice sites given a DNA sequence without measuring splicing signal changes caused by a variant. Another limitation is the lack of large-scale evaluation studies of these tools. We compared eight in silico tools on 2959 single nucleotide variants within splicing consensus regions (scSNVs) using receiver operating characteristic analysis. The Position Weight Matrix model and MaxEntScan outperformed other methods. Two ensemble learning methods, adaptive boosting and random forests, were used to construct models that take advantage of individual methods. Both models further improved prediction, with outputs of directly interpretable prediction scores. We applied our ensemble scores to scSNVs from the Catalogue of Somatic Mutations in Cancer database. Analysis showed that predicted splice-altering scSNVs are enriched in recurrent scSNVs and known cancer genes. We pre-computed our ensemble scores for all potential scSNVs across the human genome, providing a whole genome level resource for identifying splice-altering scSNVs discovered from large-scale sequencing studies.
A large outbreak of acute gastroenteritis in Shippensburg, Pennsylvania, 1972 revisited: evidence for common source exposure to a recombinant GII.Pg/GII.3 norovirus.

PubMed

Johnson, J A; Parra, G I; Levenson, E A; Green, K Y

2017-06-01

Historical outbreaks can be an important source of information in the understanding of norovirus evolution and epidemiology. Here, we revisit an outbreak of undiagnosed gastroenteritis that occurred in Shippensburg, Pennsylvania in 1972. Nearly 5000 people fell ill over the course of 10 days. Symptoms included diarrhea, vomiting, stomach cramps, and fever, lasting for a median of 24 h. Using current techniques, including next-generation sequencing of full-length viral genomic amplicons, we identified an unusual norovirus recombinant (GII.Pg/GII.3) in nine of 15 available stool samples from the outbreak. This particular recombinant virus has not been reported in recent decades, although GII.3 and GII.Pg genotypes have been detected individually in current epidemic strains. The consensus nucleotide sequences were nearly identical among the four viral genomes analysed, although each strain had three to seven positions in the genome with heterogenous non-synonymous nucleotide subpopulations. Two of these resulting amino acid polymorphisms were conserved in frequency among all four cases, consistent with common source exposure and successful transmission of a mixed viral population. Continued investigation of variant nucleotide populations and recombination events among ancestral norovirus strains such as the Shippensburg virus may provide unique insight into the origin of contemporary strains.
PredictSNP2: A Unified Platform for Accurately Evaluating SNP Effects by Exploiting the Different Characteristics of Variants in Distinct Genomic Regions

PubMed Central

Brezovský, Jan

2016-01-01

An important message taken from human genome sequencing projects is that the human population exhibits approximately 99.9% genetic similarity. Variations in the remaining parts of the genome determine our identity, trace our history and reveal our heritage. The precise delineation of phenotypically causal variants plays a key role in providing accurate personalized diagnosis, prognosis, and treatment of inherited diseases. Several computational methods for achieving such delineation have been reported recently. However, their ability to pinpoint potentially deleterious variants is limited by the fact that their mechanisms of prediction do not account for the existence of different categories of variants. Consequently, their output is biased towards the variant categories that are most strongly represented in the variant databases. Moreover, most such methods provide numeric scores but not binary predictions of the deleteriousness of variants or confidence scores that would be more easily understood by users. We have constructed three datasets covering different types of disease-related variants, which were divided across five categories: (i) regulatory, (ii) splicing, (iii) missense, (iv) synonymous, and (v) nonsense variants. These datasets were used to develop category-optimal decision thresholds and to evaluate six tools for variant prioritization: CADD, DANN, FATHMM, FitCons, FunSeq2 and GWAVA. This evaluation revealed some important advantages of the category-based approach. The results obtained with the five best-performing tools were then combined into a consensus score. Additional comparative analyses showed that in the case of missense variations, protein-based predictors perform better than DNA sequence-based predictors. A user-friendly web interface was developed that provides easy access to the five tools’ predictions, and their consensus scores, in a user-understandable format tailored to the specific features of different categories of variations. To enable comprehensive evaluation of variants, the predictions are complemented with annotations from eight databases. The web server is freely available to the community at http://loschmidt.chemi.muni.cz/predictsnp2. PMID:27224906
PredictSNP2: A Unified Platform for Accurately Evaluating SNP Effects by Exploiting the Different Characteristics of Variants in Distinct Genomic Regions.

PubMed

Bendl, Jaroslav; Musil, Miloš; Štourač, Jan; Zendulka, Jaroslav; Damborský, Jiří; Brezovský, Jan

2016-05-01

An important message taken from human genome sequencing projects is that the human population exhibits approximately 99.9% genetic similarity. Variations in the remaining parts of the genome determine our identity, trace our history and reveal our heritage. The precise delineation of phenotypically causal variants plays a key role in providing accurate personalized diagnosis, prognosis, and treatment of inherited diseases. Several computational methods for achieving such delineation have been reported recently. However, their ability to pinpoint potentially deleterious variants is limited by the fact that their mechanisms of prediction do not account for the existence of different categories of variants. Consequently, their output is biased towards the variant categories that are most strongly represented in the variant databases. Moreover, most such methods provide numeric scores but not binary predictions of the deleteriousness of variants or confidence scores that would be more easily understood by users. We have constructed three datasets covering different types of disease-related variants, which were divided across five categories: (i) regulatory, (ii) splicing, (iii) missense, (iv) synonymous, and (v) nonsense variants. These datasets were used to develop category-optimal decision thresholds and to evaluate six tools for variant prioritization: CADD, DANN, FATHMM, FitCons, FunSeq2 and GWAVA. This evaluation revealed some important advantages of the category-based approach. The results obtained with the five best-performing tools were then combined into a consensus score. Additional comparative analyses showed that in the case of missense variations, protein-based predictors perform better than DNA sequence-based predictors. A user-friendly web interface was developed that provides easy access to the five tools' predictions, and their consensus scores, in a user-understandable format tailored to the specific features of different categories of variations. To enable comprehensive evaluation of variants, the predictions are complemented with annotations from eight databases. The web server is freely available to the community at http://loschmidt.chemi.muni.cz/predictsnp2.
A synthetic promoter library for constitutive gene expression in Lactobacillus plantarum.

PubMed

Rud, Ida; Jensen, Peter Ruhdal; Naterstad, Kristine; Axelsson, Lars

2006-04-01

A synthetic promoter library (SPL) for Lactobacillus plantarum has been developed, which generalizes the approach for obtaining synthetic promoters. The consensus sequence, derived from rRNA promoters extracted from the L. plantarum WCFS1 genome, was kept constant, and the non-consensus sequences were randomized. Construction of the SPL was performed in a vector (pSIP409) previously developed for high-level, inducible gene expression in L. plantarum and Lactobacillus sakei. A wide range of promoter strengths was obtained with the approach, covering 3-4 logs of expression levels in small increments of activity. The SPL was evaluated for the ability to drive beta-glucuronidase (GusA) and aminopeptidase N (PepN) expression. Protein production from the synthetic promoters was constitutive, and the most potent promoters gave high protein production with levels comparable to those of native rRNA promoters, and production of PepN protein corresponding to approximately 10-15 % of the total cellular protein. High correlation was obtained between the activities of promoters when tested in L. sakei and L. plantarum, which indicates the potential of the SPL for other Lactobacillus species. The SPL enables fine-tuning of stable gene expression for various applications in L. plantarum.
“One code to find them all”: a perl tool to conveniently parse RepeatMasker output files

PubMed Central

2014-01-01

Background Of the different bioinformatic methods used to recover transposable elements (TEs) in genome sequences, one of the most commonly used procedures is the homology-based method proposed by the RepeatMasker program. RepeatMasker generates several output files, including the .out file, which provides annotations for all detected repeats in a query sequence. However, a remaining challenge consists of identifying the different copies of TEs that correspond to the identified hits. This step is essential for any evolutionary/comparative analysis of the different copies within a family. Different possibilities can lead to multiple hits corresponding to a unique copy of an element, such as the presence of large deletions/insertions or undetermined bases, and distinct consensus corresponding to a single full-length sequence (like for long terminal repeat (LTR)-retrotransposons). These possibilities must be taken into account to determine the exact number of TE copies. Results We have developed a perl tool that parses the RepeatMasker .out file to better determine the number and positions of TE copies in the query sequence, in addition to computing quantitative information for the different families. To determine the accuracy of the program, we tested it on several RepeatMasker .out files corresponding to two organisms (Drosophila melanogaster and Homo sapiens) for which the TE content has already been largely described and which present great differences in genome size, TE content, and TE families. Conclusions Our tool provides access to detailed information concerning the TE content in a genome at the family level from the .out file of RepeatMasker. This information includes the exact position and orientation of each copy, its proportion in the query sequence, and its quality compared to the reference element. In addition, our tool allows a user to directly retrieve the sequence of each copy and obtain the same detailed information at the family level when a local library with incomplete TE class/subclass information was used with RepeatMasker. We hope that this tool will be helpful for people working on the distribution and evolution of TEs within genomes.
Definition of a consensus DNA-binding site for PecS, a global regulator of virulence gene expression in Erwinia chrysanthemi and identification of new members of the PecS regulon.

PubMed

Rouanet, Carine; Reverchon, Sylvie; Rodionov, Dmitry A; Nasser, William

2004-07-16

In Erwinia chrysanthemi, production of pectic enzymes is modulated by a complex network involving several regulators. One of them, PecS, which belongs to the MarR family, also controls the synthesis of various other virulence factors, such as cellulases and indigoidine. Here, the PecS consensus-binding site is defined by combining a systematic evolution of ligands by an exponential enrichment approach and mutational analyses. The consensus consists of a 23-base pair palindromic-like sequence (C(-11)G(-10)A(-9)N(-8)W(-7)T(-6)C(-5)G(-4)T(-3)A(-2))T(-1)A(0)T(1)(T(2)A(3)C(4)G(5)A(6)N(7)N(8)N(9)C(10)G(11)). Mutational experiments revealed that (i) the palindromic organization is required for the binding of PecS, (ii) the very conserved part of the consensus (-6 to 6) allows for a specific interaction with PecS, but the presence of the relatively degenerated bases located apart significantly increases PecS affinity, (iii) the four bases G, A, T, and C are required for efficient binding of PecS, and (iv) the presence of several binding sites on the same promoter increases the affinity of PecS. This consensus is detected in the regions involved in PecS binding on the previously characterized target genes. This variable consensus is in agreement with the observation that the members of the MarR family are able to bind various DNA targets as dimers by means of a winged helix DNA-binding motif. Binding of PecS on a promoter region containing the defined consensus results in a repression of gene transcription in vitro. Preliminary scanning of the E. chrysanthemi genome sequence with the consensus revealed the presence of strong PecS-binding sites in the intergenic region between fliE and fliFGHIJKLMNOPQR which encode proteins involved in the biogenesis of flagellum. Accordingly, PecS directly represses fliE expression. Thus, PecS seems to control the synthesis of virulence factors required for the key steps of plant infection.
Consensus pan-genome assembly of the specialised wine bacterium Oenococcus oeni.

PubMed

Sternes, Peter R; Borneman, Anthony R

2016-04-27

Oenococcus oeni is a lactic acid bacterium that is specialised for growth in the ecological niche of wine, where it is noted for its ability to perform the secondary, malolactic fermentation that is often required for many types of wine. Expanding the understanding of strain-dependent genetic variations in its small and streamlined genome is important for realising its full potential in industrial fermentation processes. Whole genome comparison was performed on 191 strains of O. oeni; from this rich source of genomic information consensus pan-genome assemblies of the invariant (core) and variable (flexible) regions of this organism were established. Genetic variation in amino acid biosynthesis and sugar transport and utilisation was found to be common between strains. Furthermore, we characterised previously-unreported intra-specific genetic variations in the natural competence of this microbe. By assembling a consensus pan-genome from a large number of strains, this study provides a tool for researchers to readily compare protein-coding genes across strains and infer functional relationships between genes in conserved syntenic regions. This establishes a foundation for further genetic, and thus phenotypic, research of this industrially-important species.
Genomic organization, expression, and chromosome localization of a third aurora-related kinase gene, Aie1.

PubMed

Hu, H M; Chuang, C K; Lee, M J; Tseng, T C; Tang, T K

2000-11-01

We previously reported two novel testis-specific serine/threonine kinases, Aie1 (mouse) and AIE2 (human), that share high amino acid identities with the kinase domains of fly aurora and yeast Ipl1. Here, we report the entire intron-exon organization of the Aie1 gene and analyze the expression patterns of Aie1 mRNA during testis development. The mouse Aie1 gene spans approximately 14 kb and contains seven exons. The sequences of the exon-intron boundaries of the Aie1 gene conform to the consensus sequences (GT/AG) of the splicing donor and acceptor sites of most eukaryotic genes. Comparative genomic sequencing revealed that the gene structure is highly conserved between mouse Aie1 and human AIE2. However, much less homology was found in the sequence outside the kinase-coding domains. The Aie1 locus was mapped to mouse chromosome 7A2-A3 by fluorescent in situ hybridization. Northern blot analysis indicates that Aie1 mRNA likely is expressed at a low level on day 14 and reaches its plateau on day 21 in the developing postnatal testis. RNA in situ hybridization indicated that the expression of the Aie1 transcript was restricted to meiotically active germ cells, with the highest levels detected in spermatocytes at the late pachytene stage. These findings suggest that Aie1 plays a role in spermatogenesis.
Transcriptome analysis of the honey bee fungal pathogen, Ascosphaera apis: implications for host pathogenesis

PubMed Central

2012-01-01

Background We present a comprehensive transcriptome analysis of the fungus Ascosphaera apis, an economically important pathogen of the Western honey bee (Apis mellifera) that causes chalkbrood disease. Our goals were to further annotate the A. apis reference genome and to identify genes that are candidates for being differentially expressed during host infection versus axenic culture. Results We compared A. apis transcriptome sequence from mycelia grown on liquid or solid media with that dissected from host-infected tissue. 454 pyrosequencing provided 252 Mb of filtered sequence reads from both culture types that were assembled into 10,087 contigs. Transcript contigs, protein sequences from multiple fungal species, and ab initio gene predictions were included as evidence sources in the Maker gene prediction pipeline, resulting in 6,992 consensus gene models. A phylogeny based on 12 of these protein-coding loci further supported the taxonomic placement of Ascosphaera as sister to the core Onygenales. Several common protein domains were less abundant in A. apis compared with related ascomycete genomes, particularly cytochrome p450 and protein kinase domains. A novel gene family was identified that has expanded in some ascomycete lineages, but not others. We manually annotated genes with homologs in other fungal genomes that have known relevance to fungal virulence and life history. Functional categories of interest included genes involved in mating-type specification, intracellular signal transduction, and stress response. Computational and manual annotations have been made publicly available on the Bee Pests and Pathogens website. Conclusions This comprehensive transcriptome analysis substantially enhances our understanding of the A. apis genome and its expression during infection of honey bee larvae. It also provides resources for future molecular studies of chalkbrood disease and ultimately improved disease management. PMID:22747707
Customisation of the exome data analysis pipeline using a combinatorial approach.

PubMed

Pattnaik, Swetansu; Vaidyanathan, Srividya; Pooja, Durgad G; Deepak, Sa; Panda, Binay

2012-01-01

The advent of next generation sequencing (NGS) technologies have revolutionised the way biologists produce, analyse and interpret data. Although NGS platforms provide a cost-effective way to discover genome-wide variants from a single experiment, variants discovered by NGS need follow up validation due to the high error rates associated with various sequencing chemistries. Recently, whole exome sequencing has been proposed as an affordable option compared to whole genome runs but it still requires follow up validation of all the novel exomic variants. Customarily, a consensus approach is used to overcome the systematic errors inherent to the sequencing technology, alignment and post alignment variant detection algorithms. However, the aforementioned approach warrants the use of multiple sequencing chemistry, multiple alignment tools, multiple variant callers which may not be viable in terms of time and money for individual investigators with limited informatics know-how. Biologists often lack the requisite training to deal with the huge amount of data produced by NGS runs and face difficulty in choosing from the list of freely available analytical tools for NGS data analysis. Hence, there is a need to customise the NGS data analysis pipeline to preferentially retain true variants by minimising the incidence of false positives and make the choice of right analytical tools easier. To this end, we have sampled different freely available tools used at the alignment and post alignment stage suggesting the use of the most suitable combination determined by a simple framework of pre-existing metrics to create significant datasets.
SSMART: Sequence-structure motif identification for RNA-binding proteins.

PubMed

Munteanu, Alina; Mukherjee, Neelanjan; Ohler, Uwe

2018-06-11

RNA-binding proteins (RBPs) regulate every aspect of RNA metabolism and function. There are hundreds of RBPs encoded in the eukaryotic genomes, and each recognize its RNA targets through a specific mixture of RNA sequence and structure properties. For most RBPs, however, only a primary sequence motif has been determined, while the structure of the binding sites is uncharacterized. We developed SSMART, an RNA motif finder that simultaneously models the primary sequence and the structural properties of the RNA targets sites. The sequence-structure motifs are represented as consensus strings over a degenerate alphabet, extending the IUPAC codes for nucleotides to account for secondary structure preferences. Evaluation on synthetic data showed that SSMART is able to recover both sequence and structure motifs implanted into 3'UTR-like sequences, for various degrees of structured/unstructured binding sites. In addition, we successfully used SSMART on high-throughput in vivo and in vitro data, showing that we not only recover the known sequence motif, but also gain insight into the structural preferences of the RBP. Availability: SSMART is freely available at https://ohlerlab.mdc-berlin.de/software/SSMART_137/. Supplementary data are available at Bioinformatics online.
Cassava Brown Streak Virus (Potyviridae) Encodes a Putative Maf/HAM1 Pyrophosphatase Implicated in Reduction of Mutations and a P1 Proteinase That Suppresses RNA Silencing but Contains No HC-Pro ▿

PubMed Central

Mbanzibwa, Deusdedith R.; Tian, Yanping; Mukasa, Settumba B.; Valkonen, Jari P. T.

2009-01-01

The complete positive-sense single-stranded RNA genome of Cassava brown streak virus (CBSV; genus Ipomovirus; Potyviridae) was found to consist of 9,069 nucleotides and predicted to produce a polyprotein of 2,902 amino acids. It was lacking helper-component proteinase but contained a single P1 serine proteinase that strongly suppressed RNA silencing. Besides the exceptional structure of the 5′-proximal part of the genome, CBSV also contained a Maf/HAM1-like sequence (678 nucleotides, 226 amino acids) recombined between the replicase and coat protein domains in the 3′-proximal part of the genome, which is highly conserved in Potyviridae. HAM1 was flanked by consensus proteolytic cleavage sites for ipomovirus NIaPro cysteine proteinase. Homology of CBSV HAM1 with cellular Maf/HAM1 pyrophosphatases suggests that it may intercept noncanonical nucleoside triphosphates to reduce mutagenesis of viral RNA. PMID:19386713
A consensus framework map of durum wheat (Triticum durum Desf.) suitable for linkage disequilibrium analysis and genome-wide association mapping

USDA-ARS?s Scientific Manuscript database

Genomics applications in durum (Triticum durum Desf.) wheat have the potential to boost exploitation of genetic resources and to advance understanding of the genetics of important complex traits (e.g. resilience to environmental and biotic stresses). A dense and accurate consensus map specific for ...
To Clone or Not To Clone: Method Analysis for Retrieving Consensus Sequences In Ancient DNA Samples

PubMed Central

Winters, Misa; Barta, Jodi Lynn; Monroe, Cara; Kemp, Brian M.

2011-01-01

The challenges associated with the retrieval and authentication of ancient DNA (aDNA) evidence are principally due to post-mortem damage which makes ancient samples particularly prone to contamination from “modern” DNA sources. The necessity for authentication of results has led many aDNA researchers to adopt methods considered to be “gold standards” in the field, including cloning aDNA amplicons as opposed to directly sequencing them. However, no standardized protocol has emerged regarding the necessary number of clones to sequence, how a consensus sequence is most appropriately derived, or how results should be reported in the literature. In addition, there has been no systematic demonstration of the degree to which direct sequences are affected by damage or whether direct sequencing would provide disparate results from a consensus of clones. To address this issue, a comparative study was designed to examine both cloned and direct sequences amplified from ∼3,500 year-old ancient northern fur seal DNA extracts. Majority rules and the Consensus Confidence Program were used to generate consensus sequences for each individual from the cloned sequences, which exhibited damage at 31 of 139 base pairs across all clones. In no instance did the consensus of clones differ from the direct sequence. This study demonstrates that, when appropriate, cloning need not be the default method, but instead, should be used as a measure of authentication on a case-by-case basis, especially when this practice adds time and cost to studies where it may be superfluous. PMID:21738625
A combined strategy involving Sanger and 454 pyrosequencing increases genomic resources to aid in the management of reproduction, disease control and genetic selection in the turbot (Scophthalmus maximus).

PubMed

Ribas, Laia; Pardo, Belén G; Fernández, Carlos; Alvarez-Diós, José Antonio; Gómez-Tato, Antonio; Quiroga, María Isabel; Planas, Josep V; Sitjà-Bobadilla, Ariadna; Martínez, Paulino; Piferrer, Francesc

2013-03-15

Genomic resources for plant and animal species that are under exploitation primarily for human consumption are increasingly important, among other things, for understanding physiological processes and for establishing adequate genetic selection programs. Current available techniques for high-throughput sequencing have been implemented in a number of species, including fish, to obtain a proper description of the transcriptome. The objective of this study was to generate a comprehensive transcriptomic database in turbot, a highly priced farmed fish species in Europe, with potential expansion to other areas of the world, for which there are unsolved production bottlenecks, to understand better reproductive- and immune-related functions. This information is essential to implement marker assisted selection programs useful for the turbot industry. Expressed sequence tags were generated by Sanger sequencing of cDNA libraries from different immune-related tissues after several parasitic challenges. The resulting database ("Turbot 2 database") was enlarged with sequences generated from a 454 sequencing run of brain-hypophysis-gonadal axis-derived RNA obtained from turbot at different development stages. The assembly of Sanger and 454 sequences generated 52,427 consensus sequences ("Turbot 3 database"), of which 23,661 were successfully annotated. A total of 1,410 sequences were confirmed to be related to reproduction and key genes involved in sex differentiation and maturation were identified for the first time in turbot (AR, AMH, SRY-related genes, CYP19A, ZPGs, STAR FSHR, etc.). Similarly, 2,241 sequences were related to the immune system and several novel key immune genes were identified (BCL, TRAF, NCK, CD28 and TOLLIP, among others). The number of genes of many relevant reproduction- and immune-related pathways present in the database was 50-90% of the total gene count of each pathway. In addition, 1,237 microsatellites and 7,362 single nucleotide polymorphisms (SNPs) were also compiled. Further, 2,976 putative natural antisense transcripts (NATs) including microRNAs were also identified. The combined sequencing strategies employed here significantly increased the turbot genomic resources available, including 34,400 novel sequences. The generated database contains a larger number of genes relevant for reproduction- and immune-associated studies, with an excellent coverage of most genes present in many relevant physiological pathways. This database also allowed the identification of many microsatellites and SNP markers that will be very useful for population and genome screening and a valuable aid in marker assisted selection programs.
A combined strategy involving Sanger and 454 pyrosequencing increases genomic resources to aid in the management of reproduction, disease control and genetic selection in the turbot (Scophthalmus maximus)

PubMed Central

2013-01-01

Background Genomic resources for plant and animal species that are under exploitation primarily for human consumption are increasingly important, among other things, for understanding physiological processes and for establishing adequate genetic selection programs. Current available techniques for high-throughput sequencing have been implemented in a number of species, including fish, to obtain a proper description of the transcriptome. The objective of this study was to generate a comprehensive transcriptomic database in turbot, a highly priced farmed fish species in Europe, with potential expansion to other areas of the world, for which there are unsolved production bottlenecks, to understand better reproductive- and immune-related functions. This information is essential to implement marker assisted selection programs useful for the turbot industry. Results Expressed sequence tags were generated by Sanger sequencing of cDNA libraries from different immune-related tissues after several parasitic challenges. The resulting database (“Turbot 2 database”) was enlarged with sequences generated from a 454 sequencing run of brain-hypophysis-gonadal axis-derived RNA obtained from turbot at different development stages. The assembly of Sanger and 454 sequences generated 52,427 consensus sequences (“Turbot 3 database”), of which 23,661 were successfully annotated. A total of 1,410 sequences were confirmed to be related to reproduction and key genes involved in sex differentiation and maturation were identified for the first time in turbot (AR, AMH, SRY-related genes, CYP19A, ZPGs, STAR FSHR, etc.). Similarly, 2,241 sequences were related to the immune system and several novel key immune genes were identified (BCL, TRAF, NCK, CD28 and TOLLIP, among others). The number of genes of many relevant reproduction- and immune-related pathways present in the database was 50–90% of the total gene count of each pathway. In addition, 1,237 microsatellites and 7,362 single nucleotide polymorphisms (SNPs) were also compiled. Further, 2,976 putative natural antisense transcripts (NATs) including microRNAs were also identified. Conclusions The combined sequencing strategies employed here significantly increased the turbot genomic resources available, including 34,400 novel sequences. The generated database contains a larger number of genes relevant for reproduction- and immune-associated studies, with an excellent coverage of most genes present in many relevant physiological pathways. This database also allowed the identification of many microsatellites and SNP markers that will be very useful for population and genome screening and a valuable aid in marker assisted selection programs. PMID:23497389
Genetic heterogeneity of L-Zagreb mumps virus vaccine strain.

PubMed

Kosutic-Gulija, Tanja; Forcic, Dubravko; Santak, Maja; Ramljak, Ana; Mateljak-Lukacevic, Sanja; Mazuran, Renata

2008-07-10

The most often used mumps vaccine strains Jeryl Lynn (JL), RIT4385, Urabe-AM9, L-Zagreb and L-3 differ in immunogenicity and reactogenicity. Previous analyses showed that JL, Urabe-AM9 and L-3 are genetically heterogeneous. We identified the heterogeneity of L-Zagreb throughout the entire genome. Two major variants were defined: variant A being identical to the consensus sequence of viral seeds and vaccine(s) and variant B which differs from variant A in three nucleotide positions. The difference between viral variants in L-Zagreb strain is insufficient for distinct viral strains to be defined. We demonstrated that proportion of variants in L-Zagreb viral population depends on cell substrate used for viral replication in vitro and in vivo. L-Zagreb strain should be considered as a single strain composed of at least two variant viral genomes.
REC-1 and HIM-5 distribute meiotic crossovers and function redundantly in meiotic double-strand break formation in Caenorhabditis elegans

PubMed Central

Chung, George; Rose, Ann M.; Petalcorin, Mark I.R.; Martin, Julie S.; Kessler, Zebulin; Sanchez-Pulido, Luis; Ponting, Chris P.; Yanowitz, Judith L.; Boulton, Simon J.

2015-01-01

The Caenorhabditis elegans gene rec-1 was the first genetic locus identified in metazoa to affect the distribution of meiotic crossovers along the chromosome. We report that rec-1 encodes a distant paralog of HIM-5, which was discovered by whole-genome sequencing and confirmed by multiple genome-edited alleles. REC-1 is phosphorylated by cyclin-dependent kinase (CDK) in vitro, and mutation of the CDK consensus sites in REC-1 compromises meiotic crossover distribution in vivo. Unexpectedly, rec-1; him-5 double mutants are synthetic-lethal due to a defect in meiotic double-strand break formation. Thus, we uncovered an unexpected robustness to meiotic DSB formation and crossover positioning that is executed by HIM-5 and REC-1 and regulated by phosphorylation. PMID:26385965

Structure of genes for Hsp30 from the white-rot fungus Coriolus versicolor and the increase of their expression by heat shock and exposure to a hazardous chemical.

PubMed

Iimura, Yosuke; Tatsumi, Kenji

2002-07-01

We isolated and analysed two genomic DNAs that encode the heat-shock protein Hsp30 from Coriolus versicolor. The amino acid sequences substitute only three amino acid substitutions. The promoter regions contain the consensus heat-shock element, a xenobiotic-response element, a stress-response element, and a metal-response element. The levels of mRNAs for Hsp30 increased markedly after exposure of C. versicolor to pentachlorophenol and levels were higher than those after heat shock.
miRanalyzer: an update on the detection and analysis of microRNAs in high-throughput sequencing experiments

PubMed Central

Hackenberg, Michael; Rodríguez-Ezpeleta, Naiara; Aransay, Ana M.

2011-01-01

We present a new version of miRanalyzer, a web server and stand-alone tool for the detection of known and prediction of new microRNAs in high-throughput sequencing experiments. The new version has been notably improved regarding speed, scope and available features. Alignments are now based on the ultrafast short-read aligner Bowtie (granting also colour space support, allowing mismatches and improving speed) and 31 genomes, including 6 plant genomes, can now be analysed (previous version contained only 7). Differences between plant and animal microRNAs have been taken into account for the prediction models and differential expression of both, known and predicted microRNAs, between two conditions can be calculated. Additionally, consensus sequences of predicted mature and precursor microRNAs can be obtained from multiple samples, which increases the reliability of the predicted microRNAs. Finally, a stand-alone version of the miRanalyzer that is based on a local and easily customized database is also available; this allows the user to have more control on certain parameters as well as to use specific data such as unpublished assemblies or other libraries that are not available in the web server. miRanalyzer is available at http://bioinfo2.ugr.es/miRanalyzer/miRanalyzer.php. PMID:21515631
Turning gold into ‘junk’: transposable elements utilize central proteins of cellular networks

PubMed Central

Abrusán, György; Szilágyi, András; Zhang, Yang; Papp, Balázs

2013-01-01

The numerous discovered cases of domesticated transposable element (TE) proteins led to the recognition that TEs are a significant source of evolutionary innovation. However, much less is known about the reverse process, whether and to what degree the evolution of TEs is influenced by the genome of their hosts. We addressed this issue by searching for cases of incorporation of host genes into the sequence of TEs and examined the systems-level properties of these genes using the Saccharomyces cerevisiae and Drosophila melanogaster genomes. We identified 51 cases where the evolutionary scenario was the incorporation of a host gene fragment into a TE consensus sequence, and we show that both the yeast and fly homologues of the incorporated protein sequences have central positions in the cellular networks. An analysis of selective pressure (Ka/Ks ratio) detected significant selection in 37% of the cases. Recent research on retrovirus-host interactions shows that virus proteins preferentially target hubs of the host interaction networks enabling them to take over the host cell using only a few proteins. We propose that TEs face a similar evolutionary pressure to evolve proteins with high interacting capacities and take some of the necessary protein domains directly from their hosts. PMID:23341038
Evolutionary diversity and potential recombinogenic role of integration targets of non-LTR retrotransposons

PubMed Central

Gentles, Andrew J.; Kohany, Oleksiy; Jurka, Jerzy

2005-01-01

Short interspersed elements (SINEs) make up a significant fraction of total DNA in mammalian genomes, providing a rich substrate for chromosomal rearrangements by SINE-SINE recombinations. Proliferation of mammalian SINEs is mediated primarily by LINE1 (L1) non-LTR retrotransposons that preferentially integrate at DNA sequence targets with average length ~15 bp and containing conserved endonucleolytic nicking signals at both ends. We report that sequence variations in the first of the two nicking signals, represented by a 5′TT-AAAA consensus sequence, affect the position of the second signal thus leading to target site duplications (TSDs) of different lengths. The length distribution of TSDs appears to be affected also by L1-encoded enzyme variants, since targets with the same 5′ nicking site can be of different average length in different mammalian species. Taking this into account, we re-analyzed the second nicking site and found that it is larger and includes more conserved sites than previously appreciated, with a consensus of 5′ANTNTN-AA. We also studied potential involvement of the nicking sites in stimulating recombinations between SINE elements. We determined that SINE elements retaining TSDs with perfect 5′TT-AAAA nicking sites appear to be lost relatively rapidly from the human and rat genomes, and less rapidly from dog. We speculate that the introduction of single-strand DNA breaks induced by recurring endonucleolytic attacks at these sites, combined with the ubiquitousness of SINEs, may significantly promote recombination between repetitive elements, leading to the observed losses. At the same time new L1 subfamilies may be selected for “incompatibility” with pre-existing targets. This provides a possible driving force for the continual emergence of new L1 subfamilies which, in turn, may affect selection of L1-dependent SINE subfamilies. PMID:15944437
Substrate Specifity Profiling of the Aspergillus fumigatus Proteolytic Secretome Reveals Consensus Motifs with Predominance of Ile/Leu and Phe/Tyr

PubMed Central

Watson, Douglas S.; Feng, Xizhi; Askew, David S.; Jambunathan, Kalyani; Kodukula, Krishna; Galande, Amit K.

2011-01-01

Background The filamentous fungus Aspergillus fumigatus (AF) can cause devastating infections in immunocompromised individuals. Early diagnosis improves patient outcomes but remains challenging because of the limitations of current methods. To augment the clinician's toolkit for rapid diagnosis of AF infections, we are investigating AF secreted proteases as novel diagnostic targets. The AF genome encodes up to 100 secreted proteases, but fewer than 15 of these enzymes have been characterized thus far. Given the large number of proteases in the genome, studies focused on individual enzymes may overlook potential diagnostic biomarkers. Methodology and Principal Findings As an alternative, we employed a combinatorial library of internally quenched fluorogenic probes (IQFPs) to profile the global proteolytic secretome of an AF clinical isolate in vitro. Comparative protease activity profiling revealed 212 substrate sequences that were cleaved by AF secreted proteases but not by normal human serum. A central finding was that isoleucine, leucine, phenylalanine, and tyrosine predominated at each of the three variable positions of the library (44.1%, 59.1%, and 57.0%, respectively) among substrate sequences cleaved by AF secreted proteases. In contrast, fewer than 10% of the residues at each position of cleaved sequences were cationic or anionic. Consensus substrate motifs were cleaved by thermostable serine proteases that retained activity up to 50°C. Precise proteolytic cleavage sites were reliably determined by a simple, rapid mass spectrometry-based method, revealing predominantly non-prime side specificity. A comparison of the secreted protease activities of three AF clinical isolates revealed consistent protease substrate specificity fingerprints. However, secreted proteases of A. flavus, A. nidulans, and A. terreus strains exhibited striking differences in their proteolytic signatures. Conclusions This report provides proof-of-principle for the use of protease substrate specificity profiling to define the proteolytic secretome of Aspergillus fumigatus. Expansion of this technique to protease secretion during infection could lead to development of novel approaches to fungal diagnosis. PMID:21695046
Evaluation of the genetic diversity of Plum pox virus in a single plum tree.

PubMed

Predajňa, Lukáš; Šubr, Zdeno; Candresse, Thierry; Glasa, Miroslav

2012-07-01

Genetic diversity of Plum pox virus (PPV) and its distribution within a single perennial woody host (plum, Prunus domestica) has been evaluated. A plum tree was triply infected by chip-budding with PPV-M, PPV-D and PPV-Rec isolates in 2003 and left to develop untreated under open field conditions. In September 2010 leaf and fruit samples were collected from different parts of the tree canopy. A 745-bp NIb-CP fragment of PPV genome, containing the hypervariable region encoding the CP N-terminal end was amplified by RT-PCR from each sample and directly sequenced to determine the dominant sequence. In parallel, the PCR products were cloned and a total of 105 individual clones were sequenced. Sequence analysis revealed that after 7 years of infection, only PPV-M was still detectable in the tree and that the two other isolates (PPV-Rec and PPV-D) had been displaced. Despite the fact that the analysis targeted a relatively short portion of the genome, a substantial amount of intra-isolate variability was observed for PPV-M. A total of 51 different haplotypes could be identified from the 105 individual sequences, two of which were largely dominant. However, no clear-cut structuration of the viral population by the tree architecture could be highlighted although the results obtained suggest the possibility of intra-leaf/fruit differentiation of the viral population. Comparison of the consensus sequence with the original source isolate showed no difference, suggesting within-plant stability of this original isolate under open field conditions. Copyright © 2012 Elsevier B.V. All rights reserved.
A novel atypical hemolytic uremic syndrome-associated hybrid CFHR1/CFH gene encoding a fusion protein that antagonizes factor H-dependent complement regulation.

PubMed

Valoti, Elisabetta; Alberti, Marta; Tortajada, Agustin; Garcia-Fernandez, Jesus; Gastoldi, Sara; Besso, Luca; Bresin, Elena; Remuzzi, Giuseppe; Rodriguez de Cordoba, Santiago; Noris, Marina

2015-01-01

Genomic aberrations affecting the genes encoding factor H (FH) and the five FH-related proteins (FHRs) have been described in patients with atypical hemolytic uremic syndrome (aHUS), a rare condition characterized by microangiopathic hemolytic anemia, thrombocytopenia, and ARF. These genomic rearrangements occur through nonallelic homologous recombinations caused by the presence of repeated homologous sequences in CFH and CFHR1-R5 genes. In this study, we found heterozygous genomic rearrangements among CFH and CFHR genes in 4.5% of patients with aHUS. CFH/CFHR rearrangements were associated with poor clinical prognosis and high risk of post-transplant recurrence. Five patients carried known CFH/CFHR1 genes, but we found a duplication leading to a novel CFHR1/CFH hybrid gene in a family with two affected subjects. The resulting fusion protein contains the first four short consensus repeats of FHR1 and the terminal short consensus repeat 20 of FH. In an FH-dependent hemolysis assay, we showed that the hybrid protein causes sheep erythrocyte lysis. Functional analysis of the FHR1 fraction purified from serum of heterozygous carriers of the CFHR1/CFH hybrid gene indicated that the FHR1/FH hybrid protein acts as a competitive antagonist of FH. Furthermore, sera from carriers of the hybrid CFHR1/CFH gene induced more C5b-9 deposition on endothelial cells than control serum. These results suggest that this novel genomic hybrid mediates disease pathogenesis through dysregulation of complement at the endothelial cell surface. We recommend that genetic screening of aHUS includes analysis of CFH and CFHR rearrangements, particularly before a kidney transplant. Copyright © 2015 by the American Society of Nephrology.
A cell-free stock of simian-human immunodeficiency virus that causes AIDS in pig-tailed macaques has a limited number of amino acid substitutions in both SIVmac and HIV-1 regions of the genome and has offered cytotropism.

PubMed

Stephens, E B; Mukherjee, S; Sahni, M; Zhuge, W; Raghavan, R; Singh, D K; Leung, K; Atkinson, B; Li, Z; Joag, S V; Liu, Z Q; Narayan, O

1997-05-12

We have examined both the sequence changes in the LTR, gag, vif, vpr, vpx, tat, rev, vpu, env, and nef genes and the cell tropism of a cell-free stock of chimeric simian-human immunodeficiency virus (SHIV) isolated from the cerebrospinal fluid of a pig-tailed macaque (PNb) that developed AIDS. This virus (SHIVKU-1) is highly pathogenic when inoculated into other macaques. DNA sequence analysis of PCR-amplified products revealed a total of 5 nucleotide changes in the LTR while vif had 2 consensus amino acid changes. The gag, vif, and vpx had no consensus amino acid substitutions, whereas vpr had 1 consensus substitution. The tat and rev genes of the HXB2 region of SHIVKU-1 had 2 and 1 consensus amino acid changes, respectively. The vpu gene of the HXB2 region of SHIV, which originally had an ACG at the beginning of the gene, reverted to an initiation ATG codon and in addition contained a consensus amino acid substitution at position 69 of this protein. As expected, the majority of the nucleotide substitutions were found in the env and nef genes. Thirteen and 5 amino acid changes were predicted for the corresponding Env and Nef proteins, respectively. In addition, one-third of the env gene clones isolated from the SHIVKU-1 stock had a 5-amino-acid deletion in the V4 region. Using three independent assays, we determined that the changes in the SHIVKU-1 were associated with an increase in the efficiency of replication in macrophages. The strikingly few consensus changes in the virus suggest that conversion of this virus to one capable of causing AIDS in pig-tailed macaques was associated with relatively few changes in the viral envelope and/or accessory genes. These results will provide the basis for the development of a pathogenic, molecular clone of SHIV capable of causing AIDS in pig-tailed macaques.
Describing sequencing results of structural chromosome rearrangements with a suggested next-generation cytogenetic nomenclature.

PubMed

Ordulu, Zehra; Wong, Kristen E; Currall, Benjamin B; Ivanov, Andrew R; Pereira, Shahrin; Althari, Sara; Gusella, James F; Talkowski, Michael E; Morton, Cynthia C

2014-05-01

With recent rapid advances in genomic technologies, precise delineation of structural chromosome rearrangements at the nucleotide level is becoming increasingly feasible. In this era of "next-generation cytogenetics" (i.e., an integration of traditional cytogenetic techniques and next-generation sequencing), a consensus nomenclature is essential for accurate communication and data sharing. Currently, nomenclature for describing the sequencing data of these aberrations is lacking. Herein, we present a system called Next-Gen Cytogenetic Nomenclature, which is concordant with the International System for Human Cytogenetic Nomenclature (2013). This system starts with the alignment of rearrangement sequences by BLAT or BLAST (alignment tools) and arrives at a concise and detailed description of chromosomal changes. To facilitate usage and implementation of this nomenclature, we are developing a program designated BLA(S)T Output Sequence Tool of Nomenclature (BOSToN), a demonstrative version of which is accessible online. A standardized characterization of structural chromosomal rearrangements is essential both for research analyses and for application in the clinical setting. Copyright © 2014 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
De novo assembly of mitochondrial genomes provides insights into genetic diversity and molecular evolution in wild boars and domestic pigs.

PubMed

Ni, Pan; Bhuiyan, Ali Akbar; Chen, Jian-Hai; Li, Jingjin; Zhang, Cheng; Zhao, Shuhong; Du, Xiaoyong; Li, Hua; Yu, Hui; Liu, Xiangdong; Li, Kui

2018-06-01

Up to date, the scarcity of publicly available complete mitochondrial sequences for European wild pigs hampers deeper understanding about the genetic changes following domestication. Here, we have assembled 26 de novo mtDNA sequences of European wild boars from next generation sequencing (NGS) data and downloaded 174 complete mtDNA sequences to assess the genetic relationship, nucleotide diversity, and selection. The Bayesian consensus tree reveals the clear divergence between the European and Asian clade and a very small portion (10 out of 200 samples) of maternal introgression. The overall nucleotides diversities of the mtDNA sequences have been reduced following domestication. Interestingly, the selection efficiencies in both European and Asian domestic pigs are reduced, probably caused by changes in both selection constraints and maternal population size following domestication. This study suggests that de novo assembled mitogenomes can be a great boon to uncover the genetic turnover following domestication. Further investigation is warranted to include more samples from the ever-increasing amounts of NGS data to help us to better understand the process of domestication.
Targeted Identification of Short Interspersed Nuclear Element Families Shows Their Widespread Existence and Extreme Heterogeneity in Plant Genomes[W

PubMed Central

Wenke, Torsten; Döbel, Thomas; Sörensen, Thomas Rosleff; Junghans, Holger; Weisshaar, Bernd; Schmidt, Thomas

2011-01-01

Short interspersed nuclear elements (SINEs) are non-long terminal repeat retrotransposons that are highly abundant, heterogeneous, and mostly not annotated in eukaryotic genomes. We developed a tool designated SINE-Finder for the targeted discovery of tRNA-derived SINEs. We analyzed sequence data of 16 plant genomes, including 13 angiosperms and three gymnosperms and identified 17,829 full-length and truncated SINEs falling into 31 families showing the widespread occurrence of SINEs in higher plants. The investigation focused on potato (Solanum tuberosum), resulting in the detection of seven different SolS SINE families consisting of 1489 full-length and 870 5′ truncated copies. Consensus sequences of full-length members range in size from 106 to 244 bp depending on the SINE family. SolS SINEs populated related species and evolved separately, which led to some distinct subfamilies. Solanaceae SINEs are dispersed along chromosomes and distributed without clustering but with preferred integration into short A-rich motifs. They emerged more than 23 million years ago and were species specifically amplified during the radiation of potato, tomato (Solanum lycopersicum), and tobacco (Nicotiana tabacum). We show that tobacco TS retrotransposons are composite SINEs consisting of the 3′ end of a long interspersed nuclear element integrated downstream of a nonhomologous SINE family followed by successfully colonization of the genome. We propose an evolutionary scenario for the formation of TS as a spontaneous event, which could be typical for the emergence of SINE families. PMID:21908723
Targeted identification of short interspersed nuclear element families shows their widespread existence and extreme heterogeneity in plant genomes.

PubMed

Wenke, Torsten; Döbel, Thomas; Sörensen, Thomas Rosleff; Junghans, Holger; Weisshaar, Bernd; Schmidt, Thomas

2011-09-01

Short interspersed nuclear elements (SINEs) are non-long terminal repeat retrotransposons that are highly abundant, heterogeneous, and mostly not annotated in eukaryotic genomes. We developed a tool designated SINE-Finder for the targeted discovery of tRNA-derived SINEs. We analyzed sequence data of 16 plant genomes, including 13 angiosperms and three gymnosperms and identified 17,829 full-length and truncated SINEs falling into 31 families showing the widespread occurrence of SINEs in higher plants. The investigation focused on potato (Solanum tuberosum), resulting in the detection of seven different SolS SINE families consisting of 1489 full-length and 870 5' truncated copies. Consensus sequences of full-length members range in size from 106 to 244 bp depending on the SINE family. SolS SINEs populated related species and evolved separately, which led to some distinct subfamilies. Solanaceae SINEs are dispersed along chromosomes and distributed without clustering but with preferred integration into short A-rich motifs. They emerged more than 23 million years ago and were species specifically amplified during the radiation of potato, tomato (Solanum lycopersicum), and tobacco (Nicotiana tabacum). We show that tobacco TS retrotransposons are composite SINEs consisting of the 3' end of a long interspersed nuclear element integrated downstream of a nonhomologous SINE family followed by successfully colonization of the genome. We propose an evolutionary scenario for the formation of TS as a spontaneous event, which could be typical for the emergence of SINE families.
Genome-wide identification and evolutionary analysis of algal LPAT genes involved in TAG biosynthesis using bioinformatic approaches.

PubMed

Misra, Namrata; Panda, Prasanna Kumar; Parida, Bikram Kumar

2014-12-01

Lysophosphatidyl acyltransferase (LPAT) is one of the major triacylglycerol synthesis enzymes, controlling the metabolic flow of lysophosphatidic acid to phosphatidic acid. Experimental studies in Arabidopsis have shown that LPAT activity is exhibited primarily by three distinct isoforms, namely the plastid-located LPAT1, the endoplasmic reticulum-located LPAT2, and the soluble isoform of LPAT (solLPAT). In this study, 24 putative genes representing all LPAT isoforms were identified from the analysis of 11 complete genomes including green algae, red algae, diatoms and higher plants. We observed LPAT1 and solLPAT genes to be ubiquitously present in nearly all genomes examined, whereas LPAT2 genes to have evolved more recently in the plant lineage. Phylogenetic analysis indicated that LPAT1, LPAT2 and solLPAT have convergently evolved through separate evolutionary paths and belong to three different gene families, which was further evidenced by their wide divergence at gene structure and sequence level. The genome distribution supports the hypothesis that each gene encoding a LPAT is not duplicated. Mapping of exon-intron structure of LPAT genes to the domain structure of proteins across different algal and plant species indicates that exon shuffling plays no role in the evolution of LPAT genes. Besides the previously defined motifs, several conserved consensus sequences were discovered which could be useful to distinguish different LPAT isoforms. Taken together, this study will enable the generation of experimental approximations to better understand the functional role of algal LPAT in lipid accumulation.
Human papillomavirus type 2 associated with pyogenic granuloma in patients without clinical evidence of warts.

PubMed

Vázquez-Martínez, Osvaldo T; González-Betancourt, Anajulia; Barboza-Cerda, María Carmen; González-González, Sergio E; Lugo-Trampe, Ángel; Welsh, Oliverio; Rojas-Martínez, Augusto; Martínez-Rodríguez, Herminia G; Ocampo-Candiani, Jorge; Ortiz-López, Rocío

2016-07-01

Pyogenic granuloma is a non-neoplastic lesion that frequently occurs in the skin and mucous membranes of children and pregnant women. The anatomical sites of pyogenic granulomas overlap with those of wart infections caused by the human papillomavirus (HPV). This study assessed the presence of HPV DNA in pyogenic granuloma samples by polymerase chain reaction. Eighteen pyogenic granuloma biopsies from patients without a clinical history or evidence of verruca in the studied area were tested for the presence of the HPV genome. The presence of HPV DNA was screened by three independent polymerase chain reaction reactions using standard consensus primer sets targeted to the L1 or E1 consensus regions of HPV genome. The HPV DNA-positive samples were genotyped using methodologies enabling the identification of up to 30 HPVs, including oncogenic, nononcogenic, and cutaneous viral types. The HPV DNA was detected in 44.4% (eight of 18) of the samples, with HPV-2 being the only type in the eight HPV DNA-positive samples. Contamination with HPV-2 sequences throughout the entire process was reliably eliminated. This report is the first to suggest an association between HPV-2 and pyogenic granuloma. This relationship is similar to that observed between HPV-2 and nongenital warts. © 2015 The International Society of Dermatology.
Heterosexual Transmission of Subtype C HIV-1 Selects Consensus-Like Variants without Increased Replicative Capacity or Interferon-α Resistance

PubMed Central

Fenton-May, Angharad E.; Dilernia, Dario A.; Kilembe, William; Allen, Susan A.; Borrow, Persephone; Hunter, Eric

2015-01-01

Heterosexual transmission of HIV-1 is characterized by a genetic bottleneck that selects a single viral variant, the transmitted/founder (TF), during most transmission events. To assess viral characteristics influencing HIV-1 transmission, we sequenced 167 near full-length viral genomes and generated 40 infectious molecular clones (IMC) including TF variants and multiple non-transmitted (NT) HIV-1 subtype C variants from six linked heterosexual transmission pairs near the time of transmission. Consensus-like genomes sensitive to donor antibodies were selected for during transmission in these six transmission pairs. However, TF variants did not demonstrate increased viral fitness in terms of particle infectivity or viral replicative capacity in activated peripheral blood mononuclear cells (PBMC) and monocyte-derived dendritic cells (MDDC). In addition, resistance of the TF variant to the antiviral effects of interferon-α (IFN-α) was not significantly different from that of non-transmitted variants from the same transmission pair. Thus neither in vitro viral replicative capacity nor IFN-α resistance discriminated the transmission potential of viruses in the quasispecies of these chronically infected individuals. However, our findings support the hypothesis that within-host evolution of HIV-1 in response to adaptive immune responses reduces viral transmission potential. PMID:26378795
Transitioning from genotypes to epigenotypes: why the time has come for medulloblastoma epigenomics.

PubMed

Batora, N V; Sturm, D; Jones, D T W; Kool, M; Pfister, S M; Northcott, P A

2014-04-04

Recent advances in genomic technologies have allowed for tremendous progress in our understanding of the biology underlying medulloblastoma, a malignant childhood brain tumor. Consensus molecular subgroups have been put forth by the pediatric neuro-oncology community and next-generation genomic studies have led to an improved description of driver genes and pathways somatically altered in these subgroups. In contrast to the impressive pace at which advances have been made at the level of the medulloblastoma genome, comparable studies of the epigenome have lagged behind. Complementary data yielded from genomic sequencing and copy number profiling have verified frequent targeting of chromatin modifiers in medulloblastoma, highly suggestive of prominent epigenetic deregulation in the disease. Past studies of DNA methylation-dependent gene silencing and microRNA expression analyses further support the concept of medulloblastoma as an epigenetic disease. In this Review, we aim to summarize the key findings of past reports pertaining to medulloblastoma epigenetics as well as recent and ongoing genomic efforts linking somatic alterations of the genome with inferred deregulation of the epigenome. In addition, we predict what is on the horizon for medulloblastoma epigenetics and how aberrant changes in the medulloblastoma epigenome might serve as an attractive target for future therapies. Copyright © 2013 IBRO. Published by Elsevier Ltd. All rights reserved.
Polyomavirus BK non-coding control region rearrangements in health and disease.

PubMed

Sharma, Preety M; Gupta, Gaurav; Vats, Abhay; Shapiro, Ron; Randhawa, Parmjeet S

2007-08-01

BK virus is an increasingly recognized pathogen in transplanted patients. DNA sequencing of this virus shows considerable genomic variability. To understand the clinical significance of rearrangements in the non-coding control region (NCCR) of BK virus (BKV), we report a meta-analysis of 507 sequences, including 40 sequences generated in our own laboratory, for associations between rearrangements and disease, tissue tropism, geographic origin, and viral genotype. NCCR rearrangements were less frequent in (a) asymptomatic BKV viruria compared to patients viral nephropathy (1.7% vs. 22.5%), and (b) viral genotype 1 compared to other genotypes (2.4% vs. 11.2%). Rearrangements were commoner in malignancy (78.6%), and Norwegians (45.7%), and less common in East Indians (0%), and Japanese (4.3%). A surprising number of rearranged sequences were reported from mononuclear cells of healthy subjects, whereas most plasma sequences were archetypal. This difference could not be related to potential recombinase activity in lymphocytes, as consensus recombination signal sequences could not be found in the NCCR region. NCCR rearrangements are neither required nor a sufficient condition to produce clinical disease. BKV nephropathy and hemorrhagic cystitis are not associated with any unique NCCR configuration or nucleotide sequence.
Development of a EST dataset and characterization of EST-SSRs in a traditional Chinese medicinal plant, Epimedium sagittatum (Sieb. Et Zucc.) Maxim

PubMed Central

2010-01-01

Background Epimedium sagittatum (Sieb. Et Zucc.) Maxim, a traditional Chinese medicinal plant species, has been used extensively as genuine medicinal materials. Certain Epimedium species are endangered due to commercial overexploition, while sustainable application studies, conservation genetics, systematics, and marker-assisted selection (MAS) of Epimedium is less-studied due to the lack of molecular markers. Here, we report a set of expressed sequence tags (ESTs) and simple sequence repeats (SSRs) identified in these ESTs for E. sagittatum. Results cDNAs of E. sagittatum are sequenced using 454 GS-FLX pyrosequencing technology. The raw reads are cleaned and assembled into a total of 76,459 consensus sequences comprising of 17,231 contigs and 59,228 singlets. About 38.5% (29,466) of the consensus sequences significantly match to the non-redundant protein database (E-value < 1e-10), 22,295 of which are further annotated using Gene Ontology (GO) terms. A total of 2,810 EST-SSRs is identified from the Epimedium EST dataset. Trinucleotide SSR is the dominant repeat type (55.2%) followed by dinucleotide (30.4%), tetranuleotide (7.3%), hexanucleotide (4.9%), and pentanucleotide (2.2%) SSR. The dominant repeat motif is AAG/CTT (23.6%) followed by AG/CT (19.3%), ACC/GGT (11.1%), AT/AT (7.5%), and AAC/GTT (5.9%). Thirty-two SSR-ESTs are randomly selected and primer pairs are synthesized for testing the transferability across 52 Epimedium species. Eighteen primer pairs (85.7%) could be successfully transferred to Epimedium species and sixteen of those show high genetic diversity with 0.35 of observed heterozygosity (Ho) and 0.65 of expected heterozygosity (He) and high number of alleles per locus (11.9). Conclusion A large EST dataset with a total of 76,459 consensus sequences is generated, aiming to provide sequence information for deciphering secondary metabolism, especially for flavonoid pathway in Epimedium. A total of 2,810 EST-SSRs is identified from EST dataset and ~1580 EST-SSR markers are transferable. E. sagittatum EST-SSR transferability to the major Epimedium germplasm is up to 85.7%. Therefore, this EST dataset and EST-SSRs will be a powerful resource for further studies such as taxonomy, molecular breeding, genetics, genomics, and secondary metabolism in Epimedium species. PMID:20141623
A safe an easy method for building consensus HIV sequences from 454 massively parallel sequencing data.

PubMed

Fernández-Caballero Rico, Jose Ángel; Chueca Porcuna, Natalia; Álvarez Estévez, Marta; Mosquera Gutiérrez, María Del Mar; Marcos Maeso, María Ángeles; García, Federico

2018-02-01

To show how to generate a consensus sequence from the information of massive parallel sequences data obtained from routine HIV anti-retroviral resistance studies, and that may be suitable for molecular epidemiology studies. Paired Sanger (Trugene-Siemens) and next-generation sequencing (NGS) (454 GSJunior-Roche) HIV RT and protease sequences from 62 patients were studied. NGS consensus sequences were generated using Mesquite, using 10%, 15%, and 20% thresholds. Molecular evolutionary genetics analysis (MEGA) was used for phylogenetic studies. At a 10% threshold, NGS-Sanger sequences from 17/62 patients were phylogenetically related, with a median bootstrap-value of 88% (IQR83.5-95.5). Association increased to 36/62 sequences, median bootstrap 94% (IQR85.5-98)], using a 15% threshold. Maximum association was at the 20% threshold, with 61/62 sequences associated, and a median bootstrap value of 99% (IQR98-100). A safe method is presented to generate consensus sequences from HIV-NGS data at 20% threshold, which will prove useful for molecular epidemiological studies. Copyright © 2016 Elsevier España, S.L.U. and Sociedad Española de Enfermedades Infecciosas y Microbiología Clínica. All rights reserved.
The ethics of creating genetically modified children using genome editing.

PubMed

Ishii, Tetsuya

2017-12-01

To review the recent ethical, legal, and social issues surrounding human reproduction involving germline genome editing. Genome editing techniques, such as CRISPR/Cas9, have facilitated genetic modification in human embryos. The most likely purpose of germline genome editing is the prevention of serious genetic disease in offspring. However, complex issues still remain, including irremediable risks to fetuses and future generations, the role of women, the availability of alternatives, long-term follow-up, health insurance coverage, misuse for human enhancement, and the potential effects on adoption. Further discussions, a broad consensus, and appropriate regulations are required before human germline genome editing is introduced into the global society. Before germline genome editing is used for disease prevention, a broad consensus must be formed by carefully discussing its ethical, legal, and social issues.

Expressed sequence tags from the oomycete fish pathogen Saprolegnia parasitica reveal putative virulence factors

PubMed Central

Torto-Alalibo, Trudy; Tian, Miaoying; Gajendran, Kamal; Waugh, Mark E; van West, Pieter; Kamoun, Sophien

2005-01-01

Background The oomycete Saprolegnia parasitica is one of the most economically important fish pathogens. There is a dramatic recrudescence of Saprolegnia infections in aquaculture since the use of the toxic organic dye malachite green was banned in 2002. Little is known about the molecular mechanisms underlying pathogenicity in S. parasitica and other animal pathogenic oomycetes. In this study we used a genomics approach to gain a first insight into the transcriptome of S. parasitica. Results We generated 1510 expressed sequence tags (ESTs) from a mycelial cDNA library of S. parasitica. A total of 1279 consensus sequences corresponding to 525944 base pairs were assembled. About half of the unigenes showed similarities to known protein sequences or motifs. The S. parasitica sequences tended to be relatively divergent from Phytophthora sequences. Based on the sequence alignments of 18 conserved proteins, the average amino acid identity between S. parasitica and three Phytophthora species was 77% compared to 93% within Phytophthora. Several S. parasitica cDNAs, such as those with similarity to fungal type I cellulose binding domain proteins, PAN/Apple module proteins, glycosyl hydrolases, proteases, as well as serine and cysteine protease inhibitors, were predicted to encode secreted proteins that could function in virulence. Some of these cDNAs were more similar to fungal proteins than to other eukaryotic proteins confirming that oomycetes and fungi share some virulence components despite their evolutionary distance Conclusion We provide a first glimpse into the gene content of S. parasitica, a reemerging oomycete fish pathogen. These resources will greatly accelerate research on this important pathogen. The data is available online through the Oomycete Genomics Database [1]. PMID:16076392
Accuracy of different bioinformatics methods in detecting antibiotic resistance and virulence factors from Staphylococcus aureus whole genome sequences.

PubMed

Mason, Amy; Foster, Dona; Bradley, Phelim; Golubchik, Tanya; Doumith, Michel; Gordon, N Claire; Pichon, Bruno; Iqbal, Zamin; Staves, Peter; Crook, Derrick; Walker, A Sarah; Kearns, Angela; Peto, Tim

2018-06-20

Background : In principle, whole genome sequencing (WGS) can predict phenotypic resistance directly from genotype, replacing laboratory-based tests. However, the contribution of different bioinformatics methods to genotype-phenotype discrepancies has not been systematically explored to date. Methods : We compared three WGS-based bioinformatics methods (Genefinder (read-based), Mykrobe (de Bruijn graph-based) and Typewriter (BLAST-based)) for predicting presence/absence of 83 different resistance determinants and virulence genes, and overall antimicrobial susceptibility, in 1379 Staphylococcus aureus isolates previously characterised by standard laboratory methods (disc diffusion, broth and/or agar dilution and PCR). Results : 99.5% (113830/114457) of individual resistance-determinant/virulence gene predictions were identical between all three methods, with only 627 (0.5%) discordant predictions, demonstrating high overall agreement (Fliess-Kappa=0.98, p<0.0001). Discrepancies when identified were in only one of the three methods for all genes except the cassette recombinase, ccrC(b ). Genotypic antimicrobial susceptibility prediction matched laboratory phenotype in 98.3% (14224/14464) cases (2720 (18.8%) resistant, 11504 (79.5%) susceptible). There was greater disagreement between the laboratory phenotypes and the combined genotypic predictions (97 (0.7%) phenotypically-susceptible but all bioinformatic methods reported resistance; 89 (0.6%) phenotypically-resistant, but all bioinformatics methods reported susceptible) than within the three bioinformatics methods (54 (0.4%) cases, 16 phenotypically-resistant, 38 phenotypically-susceptible). However, in 36/54 (67%), the consensus genotype matched the laboratory phenotype. Conclusions : In this study, the choice between these three specific bioinformatic methods to identify resistance-determinants or other genes in S. aureus did not prove critical, with all demonstrating high concordance with each other and phenotypic/molecular methods. However, each has some limitations and therefore consensus methods provide some assurance. Copyright © 2018 American Society for Microbiology.
Identification of natural and artificial DNA substrates for the light-activated LOV-HTH transcription factor EL222

PubMed Central

Rivera-Cancel, Giomar; Motta-Mena, Laura B.; Gardner, Kevin H.

2012-01-01

Light-oxygen-voltage (LOV) domains serve as the photosensory modules for a wide range of plant and bacterial proteins, conferring blue light dependent regulation to effector activities as diverse as enzymes and DNA binding. LOV domains can also be engineered into a variety of exogenous targets, enabling similar regulation for new protein-based reagents. Common to these proteins is the ability for LOV domains to reversibly form a photochemical adduct between an internal flavin chromophore and the surrounding protein, using this to trigger conformational changes that affect output activity. Using the Erythrobacter litoralis protein EL222 model system which links LOV regulation to a helix-turn-helix (HTH) DNA binding domain, we demonstrated that the LOV domain binds and inhibits the HTH domain in the dark, releasing these interactions upon illumination [Nash et al. (2011) Proc. Natl. Acad. Sci. USA 108, 9449–9454]. Here we combine genomic and in vitro selection approaches to identify optimal DNA binding sites for EL222. Within the bacterial host, we observe binding several genomic sites using a 12 bp sequence consensus that is also found by in vitro selection methods. Sequence-specific alterations in the DNA consensus reduce EL222-binding affinity in a manner consistent with the expected binding mode: a protein dimer binding to two repeats. Finally, we demonstrate the light-dependent activation of transcription of two genes adjacent to an EL222 binding site. Taken together, these results shed light on the native function of EL222 and provide useful reagents for further basic and applications research of this versatile protein. PMID:23205774
Structural characterization of copia-type retrotransposons leads to insights into the marker development in a biofuel crop, Jatropha curcas L.

PubMed Central

2013-01-01

Background Recently, Jatropha curcas L. has attracted worldwide attention for its potential as a source of biodiesel. However, most DNA markers have demonstrated high levels of genetic similarity among and within jatropha populations around the globe. Despite promising features of copia-type retrotransposons as ideal genetic tools for gene tagging, mutagenesis, and marker-assisted selection, they have not been characterized in the jatropha genome yet. Here, we examined the diversity, evolution, and genome-wide organization of copia-type retrotransposons in the Asian, African, and Mesoamerican accessions of jatropha, then introduced a retrotransposon-based marker for this biofuel crop. Results In total, 157 PCR fragments that were amplified using the degenerate primers for the reverse transcriptase (RT) domain of copia-type retroelements were sequenced and aligned to construct the neighbor-joining tree. Phylogenetic analysis demonstrated that isolated copia RT sequences were classified into ten families, which were then grouped into three lineages. An in-depth study of the jatropha genome for the RT sequences of each family led to the characterization of full consensus sequences of the jatropha copia-type families. Estimated copy numbers of target sequences were largely different among families, as was presence of genes within 5 kb flanking regions for each family. Five copia-type families were as appealing candidates for the development of DNA marker systems. A candidate marker from family Jc7 was particularly capable of detecting genetic variation among different jatropha accessions. Fluorescence in situ hybridization (FISH) to metaphase chromosomes reveals that copia-type retrotransposons are scattered across chromosomes mainly located in the distal part regions. Conclusion This is the first report on genome-wide analysis and the cytogenetic mapping of copia-type retrotransposons of jatropha, leading to the discovery of families bearing high potential as DNA markers. Distinct dynamics of individual copia-type families, feasibility of a retrotransposon-based insertion polymorphism marker system in examining genetic variability, and approaches for the development of breeding strategies in jatropha using copia-type retrotransposons are discussed. PMID:24020916
Characterization of the Structural Gene Promoter of Aedes aegypti Densovirus

PubMed Central

Ward, Todd W.; Kimmick, Michael W.; Afanasiev, Boris N.; Carlson, Jonathan O.

2001-01-01

Aedes aegypti densonucleosis virus (AeDNV) has two promoters that have been shown to be active by reporter gene expression analysis (B. N. Afanasiev, Y. V. Koslov, J. O. Carlson, and B. J. Beaty, Exp. Parasitol. 79:322–339, 1994). Northern blot analysis of cells infected with AeDNV revealed two transcripts 1,200 and 3,500 nucleotides in length that are assumed to express the structural protein (VP) gene and nonstructural protein genes, respectively. Primer extension was used to map the transcriptional start site of the structural protein gene. Surprisingly, the structural protein gene transcript began at an initiator consensus sequence, CAGT, 60 nucleotides upstream from the map unit 61 TATAA sequence previously thought to define the promoter. Constructs with the β-galactosidase gene fused to the structural protein gene were used to determine elements necessary for promoter function. Deletion or mutation of the initiator sequence, CAGT, reduced protein expression by 93%, whereas mutation of the TATAA sequence at map unit 61 had little effect. An additional open reading frame was observed upstream of the structural protein gene that can express β-galactosidase at a low level (20% of that of VP fusions). Expression of the AeDNV structural protein gene was shown to be stimulated by the major nonstructural protein NS1 (Afanasiev et al., Exp. parasitol., 1994). To determine the sequences required for transactivation, expression of structural protein gene–β-galactosidase gene fusion constructs differing in AeDNV genome content was measured with and without NS1. The presence of NS1 led to an 8- to 10-fold increase in expression when either genomic end was present, compared to a 2-fold increase with a construct lacking the genomic ends. An even higher (37-fold) increase in expression occurred with both genomic ends present; however, this was in part due to template replication as shown by Southern blot analysis. These data indicate the location and importance of various elements necessary for efficient protein expression and transactivation from the structural protein gene promoter of AeDNV. PMID:11152505
High-density marker profiling confirms ancestral genomes of Avena species and identifies D-genome chromosomes of hexaploid oat.

PubMed

Yan, Honghai; Bekele, Wubishet A; Wight, Charlene P; Peng, Yuanying; Langdon, Tim; Latta, Robert G; Fu, Yong-Bi; Diederichsen, Axel; Howarth, Catherine J; Jellen, Eric N; Boyle, Brian; Wei, Yuming; Tinker, Nicholas A

2016-11-01

Genome analysis of 27 oat species identifies ancestral groups, delineates the D genome, and identifies ancestral origin of 21 mapped chromosomes in hexaploid oat. We investigated genomic relationships among 27 species of the genus Avena using high-density genetic markers revealed by genotyping-by-sequencing (GBS). Two methods of GBS analysis were used: one based on tag-level haplotypes that were previously mapped in cultivated hexaploid oat (A. sativa), and one intended to sample and enumerate tag-level haplotypes originating from all species under investigation. Qualitatively, both methods gave similar predictions regarding the clustering of species and shared ancestral genomes. Furthermore, results were consistent with previous phylogenies of the genus obtained with conventional approaches, supporting the robustness of whole genome GBS analysis. Evidence is presented to justify the final and definitive classification of the tetraploids A. insularis, A. maroccana (=A. magna), and A. murphyi as containing D-plus-C genomes, and not A-plus-C genomes, as is most often specified in past literature. Through electronic painting of the 21 chromosome representations in the hexaploid oat consensus map, we show how the relative frequency of matches between mapped hexaploid-derived haplotypes and AC (DC)-genome tetraploids vs. A- and C-genome diploids can accurately reveal the genome origin of all hexaploid chromosomes, including the approximate positions of inter-genome translocations. Evidence is provided that supports the continued classification of a diverged B genome in AB tetraploids, and it is confirmed that no extant A-genome diploids, including A. canariensis, are similar enough to the D genome of tetraploid and hexaploid oat to warrant consideration as a D-genome diploid.
Base-Calling Algorithm with Vocabulary (BCV) Method for Analyzing Population Sequencing Chromatograms

PubMed Central

Fantin, Yuri S.; Neverov, Alexey D.; Favorov, Alexander V.; Alvarez-Figueroa, Maria V.; Braslavskaya, Svetlana I.; Gordukova, Maria A.; Karandashova, Inga V.; Kuleshov, Konstantin V.; Myznikova, Anna I.; Polishchuk, Maya S.; Reshetov, Denis A.; Voiciehovskaya, Yana A.; Mironov, Andrei A.; Chulanov, Vladimir P.

2013-01-01

Sanger sequencing is a common method of reading DNA sequences. It is less expensive than high-throughput methods, and it is appropriate for numerous applications including molecular diagnostics. However, sequencing mixtures of similar DNA of pathogens with this method is challenging. This is important because most clinical samples contain such mixtures, rather than pure single strains. The traditional solution is to sequence selected clones of PCR products, a complicated, time-consuming, and expensive procedure. Here, we propose the base-calling with vocabulary (BCV) method that computationally deciphers Sanger chromatograms obtained from mixed DNA samples. The inputs to the BCV algorithm are a chromatogram and a dictionary of sequences that are similar to those we expect to obtain. We apply the base-calling function on a test dataset of chromatograms without ambiguous positions, as well as one with 3–14% sequence degeneracy. Furthermore, we use BCV to assemble a consensus sequence for an HIV genome fragment in a sample containing a mixture of viral DNA variants and to determine the positions of the indels. Finally, we detect drug-resistant Mycobacterium tuberculosis strains carrying frameshift mutations mixed with wild-type bacteria in the pncA gene, and roughly characterize bacterial communities in clinical samples by direct 16S rRNA sequencing. PMID:23382983
Postmortem disclosure of genetic information to family members: active or passive?

PubMed

Boers, Sarah N; van Delden, Johannes J M; Knoers, Nine V; Bredenoord, Annelien L

2015-03-01

Advances in next-generation DNA sequencing (NGS) now make it possible, and affordable, to sequence the entire genome of an individual. Routine clinical application is on the horizon. There is a consensus that some subsets of genetic information should be disclosed to patients, but disclosure to their relatives is less consensual. This issue becomes especially salient after a patient's death, when permission can no longer be sought. There has however been little debate on postmortem disclosure. We identify and explain the arguments in favor of and against disclosure of genetic information to the relatives of a deceased patient. We conclude that there are valid reasons to communicate some subsets of genetic information to family members after death, and we propose a passive postmortem disclosure policy. Copyright © 2015 Elsevier Ltd. All rights reserved.
Embedding strategies for effective use of information from multiple sequence alignments.

PubMed Central

Henikoff, S.; Henikoff, J. G.

1997-01-01

We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain. PMID:9070452
Genetic heterogeneity of L-Zagreb mumps virus vaccine strain

PubMed Central

Kosutic-Gulija, Tanja; Forcic, Dubravko; Šantak, Maja; Ramljak, Ana; Mateljak-Lukacevic, Sanja; Mazuran, Renata

2008-01-01

Background The most often used mumps vaccine strains Jeryl Lynn (JL), RIT4385, Urabe-AM9, L-Zagreb and L-3 differ in immunogenicity and reactogenicity. Previous analyses showed that JL, Urabe-AM9 and L-3 are genetically heterogeneous. Results We identified the heterogeneity of L-Zagreb throughout the entire genome. Two major variants were defined: variant A being identical to the consensus sequence of viral seeds and vaccine(s) and variant B which differs from variant A in three nucleotide positions. The difference between viral variants in L-Zagreb strain is insufficient for distinct viral strains to be defined. We demonstrated that proportion of variants in L-Zagreb viral population depends on cell substrate used for viral replication in vitro and in vivo. Conclusion L-Zagreb strain should be considered as a single strain composed of at least two variant viral genomes. PMID:18616793
REC-1 and HIM-5 distribute meiotic crossovers and function redundantly in meiotic double-strand break formation in Caenorhabditis elegans.

PubMed

Chung, George; Rose, Ann M; Petalcorin, Mark I R; Martin, Julie S; Kessler, Zebulin; Sanchez-Pulido, Luis; Ponting, Chris P; Yanowitz, Judith L; Boulton, Simon J

2015-09-15

The Caenorhabditis elegans gene rec-1 was the first genetic locus identified in metazoa to affect the distribution of meiotic crossovers along the chromosome. We report that rec-1 encodes a distant paralog of HIM-5, which was discovered by whole-genome sequencing and confirmed by multiple genome-edited alleles. REC-1 is phosphorylated by cyclin-dependent kinase (CDK) in vitro, and mutation of the CDK consensus sites in REC-1 compromises meiotic crossover distribution in vivo. Unexpectedly, rec-1; him-5 double mutants are synthetic-lethal due to a defect in meiotic double-strand break formation. Thus, we uncovered an unexpected robustness to meiotic DSB formation and crossover positioning that is executed by HIM-5 and REC-1 and regulated by phosphorylation. © 2015 Chung et al.; Published by Cold Spring Harbor Laboratory Press.
Exon–intron organization of genes in the slime mold Physarum polycephalum

PubMed Central

Trzcinska-Danielewicz, Joanna; Fronk, Jan

2000-01-01

The slime mold Physarum polycephalum is a morphologically simple organism with a large and complex genome. The exon–intron organization of its genes exhibits features typical for protists and fungi as well as those characteristic for the evolutionarily more advanced species. This indicates that both the taxonomic position as well as the size of the genome shape the exon–intron organization of an organism. The average gene has 3.7 introns which are on average 138 bp, with a rather narrow size distribution. Introns are enriched in AT base pairs by 13% relative to exons. The consensus sequences at exon–intron boundaries resemble those found for other species, with minor differences between short and long introns. A unique feature of P.polycephalum introns is the strong preference for pyrimidines in the coding strand throughout their length, without a particular enrichment at the 3′-ends. PMID:10982858
SURVEY AND SUMMARY: exon-intron organization of genes in the slime mold Physarum polycephalum.

PubMed

Trzcinska-Danielewicz, J; Fronk, J

2000-09-15

The slime mold Physarum polycephalum is a morphologically simple organism with a large and complex genome. The exon-intron organization of its genes exhibits features typical for protists and fungi as well as those characteristic for the evolutionarily more advanced species. This indicates that both the taxonomic position as well as the size of the genome shape the exon-intron organization of an organism. The average gene has 3.7 introns which are on average 138 bp, with a rather narrow size distribution. Introns are enriched in AT base pairs by 13% relative to exons. The consensus sequences at exon-intron boundaries resemble those found for other species, with minor differences between short and long introns. A unique feature of P.polycephalum introns is the strong preference for pyrimidines in the coding strand throughout their length, without a particular enrichment at the 3'-ends.
Virulence-Associated Genome Mutations of Murine Rotavirus Identified by Alternating Serial Passages in Mice and Cell Cultures

PubMed Central

Tatsumi, Masatoshi; Tsutsumi, Hiroyuki

2014-01-01

ABSTRACT Although significant clinical efficacy and safety of rotavirus vaccines were recently revealed in many countries, the mechanism of their attenuation is not well understood. We passaged serially a cell culture-adapted murine rotavirus EB strain in mouse pups or in cell cultures alternately and repeatedly and fully sequenced all 11 genes of 21 virus samples passaged in mice or in cell cultures. Sequence analysis revealed that mouse-passaged viruses that regained virulence almost consistently acquired four kinds of amino acid (aa) substitutions in VP4 and substitution in aa 37 (Val to Ala) in NSP4. In addition, they gained and invariably conserved the 3′ consensus sequence in NSP1. The molecular changes occurred along with the acquisition of virulence during passages in mice and then disappeared following passages in cell cultures. Intraperitoneal injection of recombinant NSP4 proteins confirmed the aa 37 site as important for its diarrheagenic activity in mice. These genome changes are likely to be correlated with rotavirus virulence. IMPORTANCE Serial passage of a virulent wild-type virus in vitro often results in loss of virulence of the virus in an original animal host, while serial passage of a cell culture-adapted avirulent virus in vivo often gains virulence in an animal host. Actually, live attenuated virus vaccines were originally produced by serial passage in cell cultures. Although clinical efficacy and safety of rotavirus vaccines were recently revealed, the mechanism of their attenuation is not well understood. We passaged serially a murine rotavirus by alternating switch of host (mice or cell cultures) repeatedly and sequenced the eleven genes of the passaged viruses to identify mutations associated with the emergence or disappearance of virulence. Sequence analysis revealed that changes in three genes (VP4, NSP1, and NSP4) were associated with virulence in mice. Intraperitoneal injection of recombinant NSP4 proteins confirmed its diarrheagenic activity in mice. These genome changes are likely to be correlated with rotavirus virulence. PMID:24599996
Virulence-associated genome mutations of murine rotavirus identified by alternating serial passages in mice and cell cultures.

PubMed

Tsugawa, Takeshi; Tatsumi, Masatoshi; Tsutsumi, Hiroyuki

2014-05-01

Although significant clinical efficacy and safety of rotavirus vaccines were recently revealed in many countries, the mechanism of their attenuation is not well understood. We passaged serially a cell culture-adapted murine rotavirus EB strain in mouse pups or in cell cultures alternately and repeatedly and fully sequenced all 11 genes of 21 virus samples passaged in mice or in cell cultures. Sequence analysis revealed that mouse-passaged viruses that regained virulence almost consistently acquired four kinds of amino acid (aa) substitutions in VP4 and substitution in aa 37 (Val to Ala) in NSP4. In addition, they gained and invariably conserved the 3' consensus sequence in NSP1. The molecular changes occurred along with the acquisition of virulence during passages in mice and then disappeared following passages in cell cultures. Intraperitoneal injection of recombinant NSP4 proteins confirmed the aa 37 site as important for its diarrheagenic activity in mice. These genome changes are likely to be correlated with rotavirus virulence. Serial passage of a virulent wild-type virus in vitro often results in loss of virulence of the virus in an original animal host, while serial passage of a cell culture-adapted avirulent virus in vivo often gains virulence in an animal host. Actually, live attenuated virus vaccines were originally produced by serial passage in cell cultures. Although clinical efficacy and safety of rotavirus vaccines were recently revealed, the mechanism of their attenuation is not well understood. We passaged serially a murine rotavirus by alternating switch of host (mice or cell cultures) repeatedly and sequenced the eleven genes of the passaged viruses to identify mutations associated with the emergence or disappearance of virulence. Sequence analysis revealed that changes in three genes (VP4, NSP1, and NSP4) were associated with virulence in mice. Intraperitoneal injection of recombinant NSP4 proteins confirmed its diarrheagenic activity in mice. These genome changes are likely to be correlated with rotavirus virulence.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Yanfeng; Zheng, Yi; Qin, Ling

Beta-hydroxyacid dehydrogenase (β-HAD) genes have been identified in all sequenced genomes of eukaryotes and prokaryotes. Their gene products catalyze the NAD+- or NADP+-dependent oxidation of various β-hydroxy acid substrates into their corresponding semialdehyde. In many fungal and bacterial genomes, multiple β-HAD genes are observed leading to the hypothesis that these gene products may have unique, uncharacterized metabolic roles specific to their species. The genomes of Geobacter sulfurreducens and Geobacter metallireducens each contain two potential β-HAD genes. The protein sequences of one pair of these genes, Gs-βHAD (Q74DE4) and Gm-βHAD (Q39R98), have 65% sequence identity and 77% sequence similarity with eachmore » other. Both proteins reduce succinic semialdehyde, a metabolite of the GABA shunt. To further explore the structural and functional characteristics of these two β-HADs with a potentially unique substrate specificity, crystal structures for Gs-βHAD and Gm-βHAD in complex with NADP+ were determined to a resolution of 1.89 Å and 2.07 Å, respectively. The structure of both proteins are similar, composed of 14 α-helices and nine β-strands organized into two domains. Domain One (1-165) adopts a typical Rossmann fold composed of two α/β units: a six-strand parallel β-sheet surrounded by six α-helices (α1 – α6) followed by a mixed three-strand β-sheet surrounded by two α-helices (α7 and α8). Domain Two (166-287) is composed of a bundle of seven α-helices (α9 – α14). Four functional regions conserved in all β-HADs are spatially located near each other at the interdomain cleft in both Gs-βHAD and Gm-βHAD with a buried molecule of NADP+. The structural features of Gs-βHAD and Gm-βHAD are described in relation to the four conserved consensus sequences characteristic of β-HADs and the potential biochemical importance of these enzymes as an alternative pathway for the degradation of succinic semialdehyde.« less
Genomic Heat Shock Element Sequences Drive Cooperative Human Heat Shock Factor 1 DNA Binding and Selectivity*

PubMed Central

Jaeger, Alex M.; Makley, Leah N.; Gestwicki, Jason E.; Thiele, Dennis J.

2014-01-01

The heat shock transcription factor 1 (HSF1) activates expression of a variety of genes involved in cell survival, including protein chaperones, the protein degradation machinery, anti-apoptotic proteins, and transcription factors. Although HSF1 activation has been linked to amelioration of neurodegenerative disease, cancer cells exhibit a dependence on HSF1 for survival. Indeed, HSF1 drives a program of gene expression in cancer cells that is distinct from that activated in response to proteotoxic stress, and HSF1 DNA binding activity is elevated in cycling cells as compared with arrested cells. Active HSF1 homotrimerizes and binds to a DNA sequence consisting of inverted repeats of the pentameric sequence nGAAn, known as heat shock elements (HSEs). Recent comprehensive ChIP-seq experiments demonstrated that the architecture of HSEs is very diverse in the human genome, with deviations from the consensus sequence in the spacing, orientation, and extent of HSE repeats that could influence HSF1 DNA binding efficacy and the kinetics and magnitude of target gene expression. To understand the mechanisms that dictate binding specificity, HSF1 was purified as either a monomer or trimer and used to evaluate DNA-binding site preferences in vitro using fluorescence polarization and thermal denaturation profiling. These results were compared with quantitative chromatin immunoprecipitation assays in vivo. We demonstrate a role for specific orientations of extended HSE sequences in driving preferential HSF1 DNA binding to target loci in vivo. These studies provide a biochemical basis for understanding differential HSF1 target gene recognition and transcription in neurodegenerative disease and in cancer. PMID:25204655
Low dose rectal inoculation of rhesus macaques by SIV smE660 or SIVmac251 recapitulates

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hraber, Peter; Giorgi, Elena E; Keele, Brandon

2008-01-01

We recently developed a novel strategy to identify transmitted HIV-1 genomes in acutely infected humans using single-genome amplification and a model of random virus evolution. Here, we used this approach to determine the molecular features of simian immunodeficiency virus (SIV) transmission in 18 experimentally infected Indian rhesus macaques. Animals were inoculated intrarectally (i.r.) or intravenously (i.v.) with stocks of SIVmac251 or SIVsmE660 that exhibited sequence diversity typical of early-chronic HIV-1 infection. 987 full-length SIV env sequences (median of 48 per animal) were determined from plasma virion RNA 1--5 wk after infection. i.r. inoculation was followed by productive infection by onemore » or a few viruses (median 1; range 1--5) that diversified randomly with near starlike phylogeny and a Poisson distribution of mutations. Consensus viral sequences from ramp-up and peak viremia were identical to viruses found in the inocula or differed from them by only one or a few nucleotides, providing direct evidence that early plasma viral sequences coalesce to transmitted/founder viruses. i.v. infection was >2,000-fold more efficient than i.r. infection, and viruses transmitted by either route represented the full genetic spectra of the inocula. These findings identify key similarities in mucosal transmission and early diversification between SIV and HIV-1, and thus validate the SIV-macaque mucosal infection model for HIV-1 vaccine and microbicide research.« less
Molecular cloning and characterization of RGA1 encoding a G protein alpha subunit from rice (Oryza sativa L. IR-36).

PubMed

Seo, H S; Kim, H Y; Jeong, J Y; Lee, S Y; Cho, M J; Bahk, J D

1995-03-01

A cDNA clone, RGA1, was isolated by using a GPA1 cDNA clone of Arabidopsis thaliana G protein alpha subunit as a probe from a rice (Oryza sativa L. IR-36) seedling cDNA library from roots and leaves. Sequence analysis of genomic clone reveals that the RGA1 gene has 14 exons and 13 introns, and encodes a polypeptide of 380 amino acid residues with a calculated molecular weight of 44.5 kDa. The encoded protein exhibits a considerable degree of amino acid sequence similarity to all the other known G protein alpha subunits. A putative TATA sequence (ATATGA), a potential CAAT box sequence (AGCAATAC), and a cis-acting element, CCACGTGG (ABRE), known to be involved in ABA induction are found in the promoter region. The RGA1 protein contains all the consensus regions of G protein alpha subunits except the cysteine residue near the C-terminus for ADP-ribosylation by pertussis toxin. The RGA1 polypeptide expressed in Escherichia coli was, however, ADP-ribosylated by 10 microM [adenylate-32P] NAD and activated cholera toxin. Southern analysis indicates that there are no other genes similar to the RGA1 gene in the rice genome. Northern analysis reveals that the RGA1 mRNA is 1.85 kb long and expressed in vegetative tissues, including leaves and roots, and that its expression is regulated by light.
Unit-length line-1 transcripts in human teratocarcinoma cells.

PubMed Central

Skowronski, J; Fanning, T G; Singer, M F

1988-01-01

We have characterized the approximately 6.5-kilobase cytoplasmic poly(A)+ Line-1 (L1) RNA present in a human teratocarcinoma cell line, NTera2D1, by primer extension and by analysis of cloned cDNAs. The bulk of the RNA begins (5' end) at the residue previously identified as the 5' terminus of the longest known primate genomic L1 elements, presumed to represent "unit" length. Several of the cDNA clones are close to 6 kilobase pairs, that is, close to full length. The partial sequences of 18 cDNA clones and full sequence of one (5,975 base pairs) indicate that many different genomic L1 elements contribute transcripts to the 6.5-kilobase cytoplasmic poly(A)+ RNA in NTera2D1 cells because no 2 of the 19 cDNAs analyzed had identical sequences. The transcribed elements appear to represent a subset of the total genomic L1s, a subset that has a characteristic consensus sequence in the 3' noncoding region and a high degree of sequence conservation throughout. Two open reading frames (ORFs) of 1,122 (ORF1) and 3,852 (ORF2) bases, flanked by about 800 and 200 bases of sequence at the 5' and 3' ends, respectively, can be identified in the cDNAs. Both ORFs are in the same frame, and they are separated by 33 bases bracketed by two conserved in-frame stop codons. ORF 2 is interrupted by at least one randomly positioned stop codon in the majority of the cDNAs. The data support proposals suggesting that the human L1 family includes one or more functional genes as well as an extraordinarily large number of pseudogenes whose ORFs are broken by stop codons. The cDNA structures suggest that both genes and pseudogenes are transcribed. At least one of the cDNAs (cD11), which was sequenced in its entirety, could, in principle, represent an mRNA for production of the ORF1 polypeptide. The similarity of mammalian L1s to several recently described invertebrate movable elements defines a new widely distributed class of elements which we term class II retrotransposons. Images PMID:2454389

Genetic and Molecular Epidemiological Characterization of a Novel Adenovirus in Antarctic Penguins Collected between 2008 and 2013

PubMed Central

Lee, Sook-Young; Kim, Jeong-Hoon; Seo, Tae-Kun; No, Jin Sun; Kim, Hankyeom; Kim, Won-keun; Choi, Han-Gu; Kang, Sung-Ho; Song, Jin-Won

2016-01-01

Antarctica is considered a relatively uncontaminated region with regard to the infectious diseases because of its extreme environment, and isolated geography. For the genetic characterization and molecular epidemiology of the newly found penguin adenovirus in Antarctica, entire genome sequencing and annual survey of penguin adenovirus were conducted. The entire genome sequences of penguin adenoviruses were completed for two Chinstrap penguins (Pygoscelis antarctica) and two Gentoo penguins (Pygoscelis papua). The whole genome lengths and G+C content of penguin adenoviruses were found to be 24,630–24,662 bp and 35.5–35.6%, respectively. Notably, the presence of putative sialidase gene was not identified in penguin adenoviruses by Rapid Amplification of cDNA Ends (RACE-PCR) as well as consensus specific PCR. The penguin adenoviruses were demonstrated to be a new species within the genus Siadenovirus, with a distance of 29.9–39.3% (amino acid, 32.1–47.9%) in DNA polymerase gene, and showed the closest relationship with turkey adenovirus 3 (TAdV-3) in phylogenetic analysis. During the 2008–2013 study period, the penguin adenoviruses were annually detected in 22 of 78 penguins (28.2%), and the molecular epidemiological study of the penguin adenovirus indicates a predominant infection in Chinstrap penguin population (12/30, 40%). Interestingly, the genome of penguin adenovirus could be detected in several internal samples, except the lymph node and brain. In conclusion, an analysis of the entire adenoviral genomes from Antarctic penguins was conducted, and the penguin adenoviruses, containing unique genetic character, were identified as a new species within the genus Siadenovirus. Moreover, it was annually detected in Antarctic penguins, suggesting its circulation within the penguin population. PMID:27309961
ClubSub-P: Cluster-Based Subcellular Localization Prediction for Gram-Negative Bacteria and Archaea

PubMed Central

Paramasivam, Nagarajan; Linke, Dirk

2011-01-01

The subcellular localization (SCL) of proteins provides important clues to their function in a cell. In our efforts to predict useful vaccine targets against Gram-negative bacteria, we noticed that misannotated start codons frequently lead to wrongly assigned SCLs. This and other problems in SCL prediction, such as the relatively high false-positive and false-negative rates of some tools, can be avoided by applying multiple prediction tools to groups of homologous proteins. Here we present ClubSub-P, an online database that combines existing SCL prediction tools into a consensus pipeline from more than 600 proteomes of fully sequenced microorganisms. On top of the consensus prediction at the level of single sequences, the tool uses clusters of homologous proteins from Gram-negative bacteria and from Archaea to eliminate false-positive and false-negative predictions. ClubSub-P can assign the SCL of proteins from Gram-negative bacteria and Archaea with high precision. The database is searchable, and can easily be expanded using either new bacterial genomes or new prediction tools as they become available. This will further improve the performance of the SCL prediction, as well as the detection of misannotated start codons and other annotation errors. ClubSub-P is available online at http://toolkit.tuebingen.mpg.de/clubsubp/ PMID:22073040
Microfluidic affinity and ChIP-seq analyses converge on a conserved FOXP2-binding motif in chimp and human, which enables the detection of evolutionarily novel targets.

PubMed

Nelson, Christopher S; Fuller, Chris K; Fordyce, Polly M; Greninger, Alexander L; Li, Hao; DeRisi, Joseph L

2013-07-01

The transcription factor forkhead box P2 (FOXP2) is believed to be important in the evolution of human speech. A mutation in its DNA-binding domain causes severe speech impairment. Humans have acquired two coding changes relative to the conserved mammalian sequence. Despite intense interest in FOXP2, it has remained an open question whether the human protein's DNA-binding specificity and chromatin localization are conserved. Previous in vitro and ChIP-chip studies have provided conflicting consensus sequences for the FOXP2-binding site. Using MITOMI 2.0 microfluidic affinity assays, we describe the binding site of FOXP2 and its affinity profile in base-specific detail for all substitutions of the strongest binding site. We find that human and chimp FOXP2 have similar binding sites that are distinct from previously suggested consensus binding sites. Additionally, through analysis of FOXP2 ChIP-seq data from cultured neurons, we find strong overrepresentation of a motif that matches our in vitro results and identifies a set of genes with FOXP2 binding sites. The FOXP2-binding sites tend to be conserved, yet we identified 38 instances of evolutionarily novel sites in humans. Combined, these data present a comprehensive portrait of FOXP2's-binding properties and imply that although its sequence specificity has been conserved, some of its genomic binding sites are newly evolved.
Microfluidic affinity and ChIP-seq analyses converge on a conserved FOXP2-binding motif in chimp and human, which enables the detection of evolutionarily novel targets

PubMed Central

Nelson, Christopher S.; Fuller, Chris K.; Fordyce, Polly M.; Greninger, Alexander L.; Li, Hao; DeRisi, Joseph L.

2013-01-01

The transcription factor forkhead box P2 (FOXP2) is believed to be important in the evolution of human speech. A mutation in its DNA-binding domain causes severe speech impairment. Humans have acquired two coding changes relative to the conserved mammalian sequence. Despite intense interest in FOXP2, it has remained an open question whether the human protein’s DNA-binding specificity and chromatin localization are conserved. Previous in vitro and ChIP-chip studies have provided conflicting consensus sequences for the FOXP2-binding site. Using MITOMI 2.0 microfluidic affinity assays, we describe the binding site of FOXP2 and its affinity profile in base-specific detail for all substitutions of the strongest binding site. We find that human and chimp FOXP2 have similar binding sites that are distinct from previously suggested consensus binding sites. Additionally, through analysis of FOXP2 ChIP-seq data from cultured neurons, we find strong overrepresentation of a motif that matches our in vitro results and identifies a set of genes with FOXP2 binding sites. The FOXP2-binding sites tend to be conserved, yet we identified 38 instances of evolutionarily novel sites in humans. Combined, these data present a comprehensive portrait of FOXP2’s-binding properties and imply that although its sequence specificity has been conserved, some of its genomic binding sites are newly evolved. PMID:23625967
Structural organization of the porcine and human genes coding for a leydig cell-specific insulin-like peptide (LEY I-L) and chromosomal localization of the human gene (INSL3)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Burkhardt E.; Adham, I.M.; Brosig, B.

1994-03-01

Leydig insulin-like protein (LEY I-L) is a member of the insulin-like hormone superfamily. The LEY I-L gene (designated INSL3) is expressed exclusively in prenatal and postnatal Leydig cells. The authors report here the cloning and nucleotide sequence of porcine and human LEY I-L genes including the 5[prime] regions. Both genes consist of two exons and one intron. The organization of the LEY I-L gene is similar to that of insulin and relaxin. The transcription start site in the porcine and human LEY I-L gene is localized 13 and 14 bp upstream of the translation start site, respectively. Alignment of themore » 5[prime] flanking regions of both genes reveals that the first 107 nucleotides upstream of the transcription start site exhibit an overall sequence similarity of 80%. This conserved region contains a consensus TATAA box, a CAAT-like element (GAAT), and a consensus SP1 sequence (GGGCGG) at equivalent positions in both genes and therefore may play a role in regulation of expression of the LEY I-L gene. The porcine and human genome contains a single copy of the LEY I-L gene. By in situ hybridization, the human gene was assigned to bands p13.2-p12 of the short arm of chromosome 19. 25 refs., 6 figs.« less
Experimental Identification of Actinobacillus pleuropneumoniae Strains L20 and JL03 Heptosyltransferases, Evidence for a New Heptosyltransferase Signature Sequence

PubMed Central

Merino, Susana; Knirel, Yuriy A.; Regué, Miguel; Tomás, Juan M.

2013-01-01

We experimentally identified the activities of six predicted heptosyltransferases in Actinobacillus pleuropneumoniae genome serotype 5b strain L20 and serotype 3 strain JL03. The initial identification was based on a bioinformatic analysis of the amino acid similarity between these putative heptosyltrasferases with others of known function from enteric bacteria and Aeromonas. The putative functions of all the Actinobacillus pleuropneumoniae heptosyltrasferases were determined by using surrogate LPS acceptor molecules from well-defined A. hydrophyla AH-3 and A. salmonicida A450 mutants. Our results show that heptosyltransferases APL_0981 and APJL_1001 are responsible for the transfer of the terminal outer core D-glycero-D-manno-heptose (D,D-Hep) residue although they are not currently included in the CAZY glycosyltransferase 9 family. The WahF heptosyltransferase group signature sequence [S(T/S)(GA)XXH] differs from the heptosyltransferases consensus signature sequence [D(TS)(GA)XXH], because of the substitution of D261 for S261, being unique. PMID:23383222
Non-Coding RNA Analysis Using the Rfam Database.

PubMed

Kalvari, Ioanna; Nawrocki, Eric P; Argasinska, Joanna; Quinones-Olvera, Natalia; Finn, Robert D; Bateman, Alex; Petrov, Anton I

2018-06-01

Rfam is a database of non-coding RNA families in which each family is represented by a multiple sequence alignment, a consensus secondary structure, and a covariance model. Using a combination of manual and literature-based curation and a custom software pipeline, Rfam converts descriptions of RNA families found in the scientific literature into computational models that can be used to annotate RNAs belonging to those families in any DNA or RNA sequence. Valuable research outputs that are often locked up in figures and supplementary information files are encapsulated in Rfam entries and made accessible through the Rfam Web site. The data produced by Rfam have a broad application, from genome annotation to providing training sets for algorithm development. This article gives an overview of how to search and navigate the Rfam Web site, and how to annotate sequences with RNA families. The Rfam database is freely available at http://rfam.org. © 2018 by John Wiley & Sons, Inc. Copyright © 2018 John Wiley & Sons, Inc.
Functional Genomics Analysis of Big Data Identifies Novel Peroxisome Proliferator-Activated Receptor γ Target Single Nucleotide Polymorphisms Showing Association With Cardiometabolic Outcomes.

PubMed

Richardson, Kris; Schnitzler, Gavin R; Lai, Chao-Qiang; Ordovas, Jose M

2015-12-01

Cardiovascular disease and type 2 diabetes mellitus represent overlapping diseases where a large portion of the variation attributable to genetics remains unexplained. An important player in their pathogenesis is peroxisome proliferator-activated receptor γ (PPARγ) that is involved in lipid and glucose metabolism and maintenance of metabolic homeostasis. We used a functional genomics methodology to interrogate human chromatin immunoprecipitation-sequencing, genome-wide association studies, and expression quantitative trait locus data to inform selection of candidate functional single nucleotide polymorphisms (SNPs) falling in PPARγ motifs. We derived 27 328 chromatin immunoprecipitation-sequencing peaks for PPARγ in human adipocytes through meta-analysis of 3 data sets. The PPARγ consensus motif showed greatest enrichment and mapped to 8637 peaks. We identified 146 SNPs in these motifs. This number was significantly less than would be expected by chance, and Inference of Natural Selection from Interspersed Genomically coHerent elemenTs analysis indicated that these motifs are under weak negative selection. A screen of these SNPs against genome-wide association studies for cardiometabolic traits revealed significant enrichment with 16 SNPs. A screen against the MuTHER expression quantitative trait locus data revealed 8 of these were significantly associated with altered gene expression in human adipose, more than would be expected by chance. Several SNPs fall close, or are linked by expression quantitative trait locus to lipid-metabolism loci including CYP26A1. We demonstrated the use of functional genomics to identify SNPs of potential function. Specifically, that SNPs within PPARγ motifs that bind PPARγ in adipocytes are significantly associated with cardiometabolic disease and with the regulation of transcription in adipose. This method may be used to uncover functional SNPs that do not reach significance thresholds in the agnostic approach of genome-wide association studies. © 2015 American Heart Association, Inc.
Predictive genetic testing for adult-onset disorders in minors: a critical analysis of the arguments for and against the 2013 ACMG guidelines.

PubMed

Anderson, J A; Hayeems, R Z; Shuman, C; Szego, M J; Monfared, N; Bowdin, S; Zlotnik Shaul, R; Meyn, M S

2015-04-01

The publication of the ACMG recommendations has reignited the debate over predictive testing for adult-onset disorders in minors. Response has been polarized. With this in mind, we review and critically analyze this debate. First, we identify long-standing inconsistencies between consensus guidelines and clinical practice regarding risk assessment for adult-onset genetic disorders in children using family history and molecular analysis. Second, we discuss the disparate assumptions regarding the nature of whole genome and exome sequencing underlying arguments of both supporters and critics, and the role these assumptions play in the arguments for and against reporting. Third, we suggest that implicit differences regarding the definition of best interests of the child underlie disparate conclusions as to the best interests of children in this context. We conclude by calling for clarity and consensus concerning the central foci of this debate. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Multiple splicing defects in an intronic false exon.

PubMed

Sun, H; Chasin, L A

2000-09-01

Splice site consensus sequences alone are insufficient to dictate the recognition of real constitutive splice sites within the typically large transcripts of higher eukaryotes, and large numbers of pseudoexons flanked by pseudosplice sites with good matches to the consensus sequences can be easily designated. In an attempt to identify elements that prevent pseudoexon splicing, we have systematically altered known splicing signals, as well as immediately adjacent flanking sequences, of an arbitrarily chosen pseudoexon from intron 1 of the human hprt gene. The substitution of a 5' splice site that perfectly matches the 5' consensus combined with mutation to match the CAG/G sequence of the 3' consensus failed to get this model pseudoexon included as the central exon in a dhfr minigene context. Provision of a real 3' splice site and a consensus 5' splice site and removal of an upstream inhibitory sequence were necessary and sufficient to confer splicing on the pseudoexon. This activated context also supported the splicing of a second pseudoexon sequence containing no apparent enhancer. Thus, both the 5' splice site sequence and the polypyrimidine tract of the pseudoexon are defective despite their good agreement with the consensus. On the other hand, the pseudoexon body did not exert a negative influence on splicing. The introduction into the pseudoexon of a sequence selected for binding to ASF/SF2 or its replacement with beta-globin exon 2 only partially reversed the effect of the upstream negative element and the defective polypyrimidine tract. These results support the idea that exon-bridging enhancers are not a prerequisite for constitutive exon definition and suggest that intrinsically defective splice sites and negative elements play important roles in distinguishing the real splicing signal from the vast number of false splicing signals.
Cloning and expression of a nuclear encoded plastid specific 33 kDa ribonucleoprotein gene (33RNP) from pea that is light stimulated.

PubMed

Reddy, M K; Nair, S; Singh, B N; Mudgil, Y; Tewari, K K; Sopory, S K

2001-01-24

We report the cloning and sequencing of both cDNA and genomic DNA of a 33 kDa chloroplast ribonucleoprotein (33RNP) from pea. The analysis of the predicted amino acid sequence of the cDNA clone revealed that the encoded protein contains two RNA binding domains, including the conserved consensus ribonucleoprotein sequences CS-RNP1 and CS-RNP2, on the C-terminus half and the presence of a putative transit peptide sequence in the N-terminus region. The phylogenetic and multiple sequence alignment analysis of pea chloroplast RNP along with RNPs reported from the other plant sources revealed that the pea 33RNP is very closely related to Nicotiana sylvestris 31RNP and 28RNP and also to 31RNP and 28RNP of Arabidopsis and spinach, respectively. The pea 33RNP was expressed in Escherichia coli and purified to homogeneity. The in vitro import of precursor protein into chloroplasts confirmed that the N-terminus putative transit peptide is a bona fide transit peptide and 33RNP is localized in the chloroplast. The nucleic acid-binding properties of the recombinant protein, as revealed by South-Western analysis, showed that 33RNP has higher binding affinity for poly (U) and oligo dT than for ssDNA and dsDNA. The steady state transcript level was higher in leaves than in roots and the expression of this gene is light stimulated. Sequence analysis of the genomic clone revealed that the gene contains four exons and three introns. We have also isolated and analyzed the 5' flanking region of the pea 33RNP gene.
Whole genome sequencing of 35 individuals provides insights into the genetic architecture of Korean population.

PubMed

Zhang, Wenqian; Meehan, Joe; Su, Zhenqiang; Ng, Hui Wen; Shu, Mao; Luo, Heng; Ge, Weigong; Perkins, Roger; Tong, Weida; Hong, Huixiao

2014-01-01

Due to a significant decline in the costs associated with next-generation sequencing, it has become possible to decipher the genetic architecture of a population by sequencing a large number of individuals to a deep coverage. The Korean Personal Genomes Project (KPGP) recently sequenced 35 Korean genomes at high coverage using the Illumina Hiseq platform and made the deep sequencing data publicly available, providing the scientific community opportunities to decipher the genetic architecture of the Korean population. In this study, we used two single nucleotide variant (SNV) calling pipelines: mapping the raw reads obtained from whole genome sequencing of 35 Korean individuals in KPGP using BWA and SOAP2 followed by SNV calling using SAMtools and SOAPsnp, respectively. The consensus SNVs obtained from the two SNV pipelines were used to represent the SNVs of the Korean population. We compared these SNVs to those from 17 other populations provided by the HapMap consortium and the 1000 Genomes Project (1KGP) and identified SNVs that were only present in the Korean population. We studied the mutation spectrum and analyzed the genes of non-synonymous SNVs only detected in the Korean population. We detected a total of 8,555,726 SNVs in the 35 Korean individuals and identified 1,213,613 SNVs detected in at least one Korean individual (SNV-1) and 12,640 in all of 35 Korean individuals (SNV-35) but not in 17 other populations. In contrast with the SNVs common to other populations in HapMap and 1KGP, the Korean only SNVs had high percentages of non-silent variants, emphasizing the unique roles of these Korean only SNVs in the Korean population. Specifically, we identified 8,361 non-synonymous Korean only SNVs, of which 58 SNVs existed in all 35 Korean individuals. The 5,754 genes of non-synonymous Korean only SNVs were highly enriched in some metabolic pathways. We found adhesion is the top disease term associated with SNV-1 and Nelson syndrome is the only disease term associated with SNV-35. We found that a significant number of Korean only SNVs are in genes that are associated with the drug term of adenosine. We identified the SNVs that were found in the Korean population but not seen in other populations, and explored the corresponding genes and pathways as well as the associated disease terms and drug terms. The results expand our knowledge of the genetic architecture of the Korean population, which will benefit the implementation of personalized medicine for the Korean population.
A retrotransposable element from the mosquito Anopheles gambiae .

PubMed Central

Besansky, N J

1990-01-01

A family of middle repetitive elements from the African malaria vector Anopheles gambiae is described. Approximately 100 copies of the element, designated T1Ag, are dispersed in the genome. Full-length elements are 4.6 kilobase pairs in length, but truncation of the 5' end is common. Nucleotide sequences of one full-length, two 5'-truncated, and two 5' ends of T1Ag elements were determined and aligned to define a consensus sequence. Sequence analysis revealed two long, overlapping open reading frames followed by a polyadenylation signal, AATAAA, and a tail consisting of tandem repetitions of the motif TGAAA. No direct or inverted long terminal repeats (LTRs) were detected. The first open reading frame, 442 amino acids in length, includes a domain resembling that of nucleic acid-binding proteins. The second open reading frame, 975 amino acids long, resembles the reverse transcriptases of a category of retrotransposable elements without LTRs, variously termed class II retrotransposons, class III elements or non-LTR retrotransposons. Similarity at the sequence and structural levels places T1Ag in this category. Images PMID:1689457
Pilot survey of expressed sequence tags (ESTs) from the asexual blood stages of Plasmodium vivax in human patients.

PubMed

Merino, Emilio F; Fernandez-Becerra, Carmen; Madeira, Alda M B N; Machado, Ariane L; Durham, Alan; Gruber, Arthur; Hall, Neil; del Portillo, Hernando A

2003-07-21

Plasmodium vivax is the most widely distributed human malaria, responsible for 70-80 million clinical cases each year and large socio-economical burdens for countries such as Brazil where it is the most prevalent species. Unfortunately, due to the impossibility of growing this parasite in continuous in vitro culture, research on P. vivax remains largely neglected. A pilot survey of expressed sequence tags (ESTs) from the asexual blood stages of P. vivax was performed. To do so, 1,184 clones from a cDNA library constructed with parasites obtained from 10 different human patients in the Brazilian Amazon were sequenced. Sequences were automatedly processed to remove contaminants and low quality reads. A total of 806 sequences with an average length of 586 bp met such criteria and their clustering revealed 666 distinct events. The consensus sequence of each cluster and the unique sequences of the singlets were used in similarity searches against different databases that included P. vivax, Plasmodium falciparum, Plasmodium yoelii, Plasmodium knowlesi, Apicomplexa and the GenBank non-redundant database. An E-value of <10(-30) was used to define a significant database match. ESTs were manually assigned a gene ontology (GO) terminology A total of 769 ESTs could be assigned a putative identity based upon sequence similarity to known proteins in GenBank. Moreover, 292 ESTs were annotated and a GO terminology was assigned to 164 of them. These are the first ESTs reported for P. vivax and, as such, they represent a valuable resource to assist in the annotation of the P. vivax genome currently being sequenced. Moreover, since the GC-content of the P. vivax genome is strikingly different from that of P. falciparum, these ESTs will help in the validation of gene predictions for P. vivax and to create a gene index of this malaria parasite.
Resolving Heart Regeneration by Replacement Histone Profiling.

PubMed

Goldman, Joseph Aaron; Kuzu, Guray; Lee, Nutishia; Karasik, Jaclyn; Gemberling, Matthew; Foglia, Matthew J; Karra, Ravi; Dickson, Amy L; Sun, Fei; Tolstorukov, Michael Y; Poss, Kenneth D

2017-02-27

Chromatin regulation is a principal mechanism governing animal development, yet it is unclear to what extent structural changes in chromatin underlie tissue regeneration. Non-mammalian vertebrates such as zebrafish activate cardiomyocyte (CM) division after tissue damage to regenerate lost heart muscle. Here, we generated transgenic zebrafish expressing a biotinylatable H3.3 histone variant in CMs and derived cell-type-specific profiles of histone replacement. We identified an emerging program of putative enhancers that revise H3.3 occupancy during regeneration, overlaid upon a genome-wide reduction of H3.3 from promoters. In transgenic reporter lines, H3.3-enriched elements directed gene expression in subpopulations of CMs. Other elements increased H3.3 enrichment and displayed enhancer activity in settings of injury- and/or Neuregulin1-elicited CM proliferation. Dozens of consensus sequence motifs containing predicted transcription factor binding sites were enriched in genomic regions with regeneration-responsive H3.3 occupancy. Thus, cell-type-specific regulatory programs of tissue regeneration can be revealed by genome-wide H3.3 profiling. Copyright © 2017 Elsevier Inc. All rights reserved.
Determinants for DNA target structure selectivity of the human LINE-1 retrotransposon endonuclease.

PubMed

Repanas, Kostas; Zingler, Nora; Layer, Liliana E; Schumann, Gerald G; Perrakis, Anastassis; Weichenrieder, Oliver

2007-01-01

The human LINE-1 endonuclease (L1-EN) is the targeting endonuclease encoded by the human LINE-1 (L1) retrotransposon. L1-EN guides the genomic integration of new L1 and Alu elements that presently account for approximately 28% of the human genome. L1-EN bears considerable technological interest, because its target selectivity may ultimately be engineered to allow the site-specific integration of DNA into defined genomic locations. Based on the crystal structure, we generated L1-EN mutants to analyze and manipulate DNA target site recognition. Crystal structures and their dynamic and functional analysis show entire loop grafts to be feasible, resulting in altered specificity, while individual point mutations do not change the nicking pattern of L1-EN. Structural parameters of the DNA target seem more important for recognition than the nucleotide sequence, and nicking profiles on DNA oligonucleotides in vitro are less well defined than the respective integration site consensus in vivo. This suggests that additional factors other than the DNA nicking specificity of L1-EN contribute to the targeted integration of non-LTR retrotransposons.
Critical Assessment of Metagenome Interpretation – a benchmark of computational metagenomics software

PubMed Central

Sczyrba, Alexander; Hofmann, Peter; Belmann, Peter; Koslicki, David; Janssen, Stefan; Dröge, Johannes; Gregor, Ivan; Majda, Stephan; Fiedler, Jessika; Dahms, Eik; Bremges, Andreas; Fritz, Adrian; Garrido-Oter, Ruben; Jørgensen, Tue Sparholt; Shapiro, Nicole; Blood, Philip D.; Gurevich, Alexey; Bai, Yang; Turaev, Dmitrij; DeMaere, Matthew Z.; Chikhi, Rayan; Nagarajan, Niranjan; Quince, Christopher; Meyer, Fernando; Balvočiūtė, Monika; Hansen, Lars Hestbjerg; Sørensen, Søren J.; Chia, Burton K. H.; Denis, Bertrand; Froula, Jeff L.; Wang, Zhong; Egan, Robert; Kang, Dongwan Don; Cook, Jeffrey J.; Deltel, Charles; Beckstette, Michael; Lemaitre, Claire; Peterlongo, Pierre; Rizk, Guillaume; Lavenier, Dominique; Wu, Yu-Wei; Singer, Steven W.; Jain, Chirag; Strous, Marc; Klingenberg, Heiner; Meinicke, Peter; Barton, Michael; Lingner, Thomas; Lin, Hsin-Hung; Liao, Yu-Chieh; Silva, Genivaldo Gueiros Z.; Cuevas, Daniel A.; Edwards, Robert A.; Saha, Surya; Piro, Vitor C.; Renard, Bernhard Y.; Pop, Mihai; Klenk, Hans-Peter; Göker, Markus; Kyrpides, Nikos C.; Woyke, Tanja; Vorholt, Julia A.; Schulze-Lefert, Paul; Rubin, Edward M.; Darling, Aaron E.; Rattei, Thomas; McHardy, Alice C.

2018-01-01

In metagenome analysis, computational methods for assembly, taxonomic profiling and binning are key components facilitating downstream biological data interpretation. However, a lack of consensus about benchmarking datasets and evaluation metrics complicates proper performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on datasets of unprecedented complexity and realism. Benchmark metagenomes were generated from ~700 newly sequenced microorganisms and ~600 novel viruses and plasmids, including genomes with varying degrees of relatedness to each other and to publicly available ones and representing common experimental setups. Across all datasets, assembly and genome binning programs performed well for species represented by individual genomes, while performance was substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below the family level. Parameter settings substantially impacted performances, underscoring the importance of program reproducibility. While highlighting current challenges in computational metagenomics, the CAMI results provide a roadmap for software selection to answer specific research questions. PMID:28967888
Generation and Analysis of Expressed Sequence Tags from Olea europaea L.

PubMed Central

Ozdemir Ozgenturk, Nehir; Oruç, Fatma; Sezerman, Ugur; Kuçukural, Alper; Vural Korkut, Senay; Toksoz, Feriha; Un, Cemal

2010-01-01

Olive (Olea europaea L.) is an important source of edible oil which was originated in Near-East region. In this study, two cDNA libraries were constructed from young olive leaves and immature olive fruits for generation of ESTs to discover the novel genes and search the function of unknown genes of olive. The randomly selected 3840 colonies were sequenced for EST collection from both libraries. Readable 2228 sequences for olive leaf and 1506 sequences for olive fruit were assembled into 205 and 69 contigs, respectively, whereas 2478 were singletons. Putative functions of all 2752 differentially expressed unique sequences were designated by gene homology based on BLAST and annotated using BLAST2GO. While 1339 ESTs show no homology to the database, 2024 ESTs have homology (under 80%) with hypothetical proteins, putative proteins, expressed proteins, and unknown proteins in NCBI-GenBank. 635 EST's unique genes sequence have been identified by over 80% homology to known function in other species which were not previously described in Olea family. Only 3.1% of total EST's was shown similarity with olive database existing in NCBI. This generated EST's data and consensus sequences were submitted to NCBI as valuable source for functional genome studies of olive. PMID:21197085
Shotgun Protein Sequencing with Meta-contig Assembly*

PubMed Central

Guthals, Adrian; Clauser, Karl R.; Bandeira, Nuno

2012-01-01

Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal to noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings. PMID:22798278
Shotgun protein sequencing with meta-contig assembly.

PubMed

Guthals, Adrian; Clauser, Karl R; Bandeira, Nuno

2012-10-01

Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal to noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings.

A fungal phylogeny based on 42 complete genomes derived from supertree and combined gene analysis

PubMed Central

Fitzpatrick, David A; Logue, Mary E; Stajich, Jason E; Butler, Geraldine

2006-01-01

Background To date, most fungal phylogenies have been derived from single gene comparisons, or from concatenated alignments of a small number of genes. The increase in fungal genome sequencing presents an opportunity to reconstruct evolutionary events using entire genomes. As a tool for future comparative, phylogenomic and phylogenetic studies, we used both supertrees and concatenated alignments to infer relationships between 42 species of fungi for which complete genome sequences are available. Results A dataset of 345,829 genes was extracted from 42 publicly available fungal genomes. Supertree methods were employed to derive phylogenies from 4,805 single gene families. We found that the average consensus supertree method may suffer from long-branch attraction artifacts, while matrix representation with parsimony (MRP) appears to be immune from these. A genome phylogeny was also reconstructed from a concatenated alignment of 153 universally distributed orthologs. Our MRP supertree and concatenated phylogeny are highly congruent. Within the Ascomycota, the sub-phyla Pezizomycotina and Saccharomycotina were resolved. Both phylogenies infer that the Leotiomycetes are the closest sister group to the Sordariomycetes. There is some ambiguity regarding the placement of Stagonospora nodurum, the sole member of the class Dothideomycetes present in the dataset. Within the Saccharomycotina, a monophyletic clade containing organisms that translate CTG as serine instead of leucine is evident. There is also strong support for two groups within the CTG clade, one containing the fully sexual species Candida lusitaniae, Candida guilliermondii and Debaryomyces hansenii, and the second group containing Candida albicans, Candida dubliniensis, Candida tropicalis, Candida parapsilosis and Lodderomyces elongisporus. The second major clade within the Saccharomycotina contains species whose genomes have undergone a whole genome duplication (WGD), and their close relatives. We could not confidently resolve whether Candida glabrata or Saccharomyces castellii lies at the base of the WGD clade. Conclusion We have constructed robust phylogenies for fungi based on whole genome analysis. Overall, our phylogenies provide strong support for the classification of phyla, sub-phyla, classes and orders. We have resolved the relationship of the classes Leotiomyctes and Sordariomycetes, and have identified two classes within the CTG clade of the Saccharomycotina that may correlate with sexual status. PMID:17121679
p53 Specifically Binds Triplex DNA In Vitro and in Cells

PubMed Central

Brázdová, Marie; Tichý, Vlastimil; Helma, Robert; Bažantová, Pavla; Polášková, Alena; Krejčí, Aneta; Petr, Marek; Navrátilová, Lucie; Tichá, Olga; Nejedlý, Karel; Bennink, Martin L.; Subramaniam, Vinod; Bábková, Zuzana; Martínek, Tomáš; Lexa, Matej; Adámik, Matej

2016-01-01

Triplex DNA is implicated in a wide range of biological activities, including regulation of gene expression and genomic instability leading to cancer. The tumor suppressor p53 is a central regulator of cell fate in response to different type of insults. Sequence and structure specific modes of DNA recognition are core attributes of the p53 protein. The focus of this work is the structure-specific binding of p53 to DNA containing triplex-forming sequences in vitro and in cells and the effect on p53-driven transcription. This is the first DNA binding study of full-length p53 and its deletion variants to both intermolecular and intramolecular T.A.T triplexes. We demonstrate that the interaction of p53 with intermolecular T.A.T triplex is comparable to the recognition of CTG-hairpin non-B DNA structure. Using deletion mutants we determined the C-terminal DNA binding domain of p53 to be crucial for triplex recognition. Furthermore, strong p53 recognition of intramolecular T.A.T triplexes (H-DNA), stabilized by negative superhelicity in plasmid DNA, was detected by competition and immunoprecipitation experiments, and visualized by AFM. Moreover, chromatin immunoprecipitation revealed p53 binding T.A.T forming sequence in vivo. Enhanced reporter transactivation by p53 on insertion of triplex forming sequence into plasmid with p53 consensus sequence was observed by luciferase reporter assays. In-silico scan of human regulatory regions for the simultaneous presence of both consensus sequence and T.A.T motifs identified a set of candidate p53 target genes and p53-dependent activation of several of them (ABCG5, ENOX1, INSR, MCC, NFAT5) was confirmed by RT-qPCR. Our results show that T.A.T triplex comprises a new class of p53 binding sites targeted by p53 in a DNA structure-dependent mode in vitro and in cells. The contribution of p53 DNA structure-dependent binding to the regulation of transcription is discussed. PMID:27907175
BAC sequencing using pooled methods.

PubMed

Saski, Christopher A; Feltus, F Alex; Parida, Laxmi; Haiminen, Niina

2015-01-01

Shotgun sequencing and assembly of a large, complex genome can be both expensive and challenging to accurately reconstruct the true genome sequence. Repetitive DNA arrays, paralogous sequences, polyploidy, and heterozygosity are main factors that plague de novo genome sequencing projects that typically result in highly fragmented assemblies and are difficult to extract biological meaning. Targeted, sub-genomic sequencing offers complexity reduction by removing distal segments of the genome and a systematic mechanism for exploring prioritized genomic content through BAC sequencing. If one isolates and sequences the genome fraction that encodes the relevant biological information, then it is possible to reduce overall sequencing costs and efforts that target a genomic segment. This chapter describes the sub-genome assembly protocol for an organism based upon a BAC tiling path derived from a genome-scale physical map or from fine mapping using BACs to target sub-genomic regions. Methods that are described include BAC isolation and mapping, DNA sequencing, and sequence assembly.
Pro-Inflammatory Flagellin Proteins of Prevalent Motile Commensal Bacteria Are Variably Abundant in the Intestinal Microbiome of Elderly Humans

PubMed Central

Neville, B. Anne; Sheridan, Paul O.; Harris, Hugh M. B.; Coughlan, Simone; Flint, Harry J.; Duncan, Sylvia H.; Jeffery, Ian B.; Claesson, Marcus J.; Ross, R. Paul; Scott, Karen P.; O'Toole, Paul W.

2013-01-01

Some Eubacterium and Roseburia species are among the most prevalent motile bacteria present in the intestinal microbiota of healthy adults. These flagellate species contribute “cell motility” category genes to the intestinal microbiome and flagellin proteins to the intestinal proteome. We reviewed and revised the annotation of motility genes in the genomes of six Eubacterium and Roseburia species that occur in the human intestinal microbiota and examined their respective locus organization by comparative genomics. Motility gene order was generally conserved across these loci. Five of these species harbored multiple genes for predicted flagellins. Flagellin proteins were isolated from R. inulinivorans strain A2-194 and from E. rectale strains A1-86 and M104/1. The amino-termini sequences of the R. inulinivorans and E. rectale A1-86 proteins were almost identical. These protein preparations stimulated secretion of interleukin-8 (IL-8) from human intestinal epithelial cell lines, suggesting that these flagellins were pro-inflammatory. Flagellins from the other four species were predicted to be pro-inflammatory on the basis of alignment to the consensus sequence of pro-inflammatory flagellins from the β- and γ- proteobacteria. Many fliC genes were deduced to be under the control of σ28. The relative abundance of the target Eubacterium and Roseburia species varied across shotgun metagenomes from 27 elderly individuals. Genes involved in the flagellum biogenesis pathways of these species were variably abundant in these metagenomes, suggesting that the current depth of coverage used for metagenomic sequencing (3.13–4.79 Gb total sequence in our study) insufficiently captures the functional diversity of genomes present at low (≤1%) relative abundance. E. rectale and R. inulinivorans thus appear to synthesize complex flagella composed of flagellin proteins that stimulate IL-8 production. A greater depth of sequencing, improved evenness of sequencing and improved metagenome assembly from short reads will be required to facilitate in silico analyses of complete complex biochemical pathways for low-abundance target species from shotgun metagenomes. PMID:23935906
A Phylogenomic Approach Based on PCR Target Enrichment and High Throughput Sequencing: Resolving the Diversity within the South American Species of Bartsia L. (Orobanchaceae)

PubMed Central

Tank, David C.

2016-01-01

Advances in high-throughput sequencing (HTS) have allowed researchers to obtain large amounts of biological sequence information at speeds and costs unimaginable only a decade ago. Phylogenetics, and the study of evolution in general, is quickly migrating towards using HTS to generate larger and more complex molecular datasets. In this paper, we present a method that utilizes microfluidic PCR and HTS to generate large amounts of sequence data suitable for phylogenetic analyses. The approach uses the Fluidigm Access Array System (Fluidigm, San Francisco, CA, USA) and two sets of PCR primers to simultaneously amplify 48 target regions across 48 samples, incorporating sample-specific barcodes and HTS adapters (2,304 unique amplicons per Access Array). The final product is a pooled set of amplicons ready to be sequenced, and thus, there is no need to construct separate, costly genomic libraries for each sample. Further, we present a bioinformatics pipeline to process the raw HTS reads to either generate consensus sequences (with or without ambiguities) for every locus in every sample or—more importantly—recover the separate alleles from heterozygous target regions in each sample. This is important because it adds allelic information that is well suited for coalescent-based phylogenetic analyses that are becoming very common in conservation and evolutionary biology. To test our approach and bioinformatics pipeline, we sequenced 576 samples across 96 target regions belonging to the South American clade of the genus Bartsia L. in the plant family Orobanchaceae. After sequencing cleanup and alignment, the experiment resulted in ~25,300bp across 486 samples for a set of 48 primer pairs targeting the plastome, and ~13,500bp for 363 samples for a set of primers targeting regions in the nuclear genome. Finally, we constructed a combined concatenated matrix from all 96 primer combinations, resulting in a combined aligned length of ~40,500bp for 349 samples. PMID:26828929
MytiBase: a knowledgebase of mussel (M. galloprovincialis) transcribed sequences

PubMed Central

Venier, Paola; De Pittà, Cristiano; Bernante, Filippo; Varotto, Laura; De Nardi, Barbara; Bovo, Giuseppe; Roch, Philippe; Novoa, Beatriz; Figueras, Antonio; Pallavicini, Alberto; Lanfranchi, Gerolamo

2009-01-01

Background Although Bivalves are among the most studied marine organisms due to their ecological role, economic importance and use in pollution biomonitoring, very little information is available on the genome sequences of mussels. This study reports the functional analysis of a large-scale Expressed Sequence Tag (EST) sequencing from different tissues of Mytilus galloprovincialis (the Mediterranean mussel) challenged with toxic pollutants, temperature and potentially pathogenic bacteria. Results We have constructed and sequenced seventeen cDNA libraries from different Mediterranean mussel tissues: gills, digestive gland, foot, anterior and posterior adductor muscle, mantle and haemocytes. A total of 24,939 clones were sequenced from these libraries generating 18,788 high-quality ESTs which were assembled into 2,446 overlapping clusters and 4,666 singletons resulting in a total of 7,112 non-redundant sequences. In particular, a high-quality normalized cDNA library (Nor01) was constructed as determined by the high rate of gene discovery (65.6%). Bioinformatic screening of the non-redundant M. galloprovincialis sequences identified 159 microsatellite-containing ESTs. Clusters, consensuses, related similarities and gene ontology searches have been organized in a dedicated, searchable database . Conclusion We defined the first species-specific catalogue of M. galloprovincialis ESTs including 7,112 unique transcribed sequences. Putative microsatellite markers were identified. This annotated catalogue represents a valuable platform for expression studies, marker validation and genetic linkage analysis for investigations in the biology of Mediterranean mussels. PMID:19203376
Genome-wide SNP identification for the construction of a high-resolution genetic map of Japanese flounder (Paralichthys olivaceus): applications to QTL mapping of Vibrio anguillarum disease resistance and comparative genomic analysis

PubMed Central

Shao, Changwei; Niu, Yongchao; Rastas, Pasi; Liu, Yang; Xie, Zhiyuan; Li, Hengde; Wang, Lei; Jiang, Yong; Tai, Shuaishuai; Tian, Yongsheng; Sakamoto, Takashi; Chen, Songlin

2015-01-01

High-resolution genetic maps are essential for fine mapping of complex traits, genome assembly, and comparative genomic analysis. Single-nucleotide polymorphisms (SNPs) are the primary molecular markers used for genetic map construction. In this study, we identified 13,362 SNPs evenly distributed across the Japanese flounder (Paralichthys olivaceus) genome. Of these SNPs, 12,712 high-confidence SNPs were subjected to high-throughput genotyping and assigned to 24 consensus linkage groups (LGs). The total length of the genetic linkage map was 3,497.29 cM with an average distance of 0.47 cM between loci, thereby representing the densest genetic map currently reported for Japanese flounder. Nine positive quantitative trait loci (QTLs) forming two main clusters for Vibrio anguillarum disease resistance were detected. All QTLs could explain 5.1–8.38% of the total phenotypic variation. Synteny analysis of the QTL regions on the genome assembly revealed 12 immune-related genes, among them 4 genes strongly associated with V. anguillarum disease resistance. In addition, 246 genome assembly scaffolds with an average size of 21.79 Mb were anchored onto the LGs; these scaffolds, comprising 522.99 Mb, represented 95.78% of assembled genomic sequences. The mapped assembly scaffolds in Japanese flounder were used for genome synteny analyses against zebrafish (Danio rerio) and medaka (Oryzias latipes). Flounder and medaka were found to possess almost one-to-one synteny, whereas flounder and zebrafish exhibited a multi-syntenic correspondence. The newly developed high-resolution genetic map, which will facilitate QTL mapping, scaffold assembly, and genome synteny analysis of Japanese flounder, marks a milestone in the ongoing genome project for this species. PMID:25762582
The mitochondrial genome of the gymnosperm Cycas taitungensis contains a novel family of short interspersed elements, Bpu sequences, and abundant RNA editing sites.

PubMed

Chaw, Shu-Miaw; Shih, Arthur Chun-Chieh; Wang, Daryi; Wu, Yu-Wei; Liu, Shu-Mei; Chou, The-Yuan

2008-03-01

The mtDNA of Cycas taitungensis is a circular molecule of 414,903 bp, making it 2- to 6-fold larger than the known mtDNAs of charophytes and bryophytes, but similar to the average of 7 elucidated angiosperm mtDNAs. It is characterized by abundant RNA editing sites (1,084), more than twice the number found in the angiosperm mtDNAs. The A + T content of Cycas mtDNA is 53.1%, the lowest among known land plants. About 5% of the Cycas mtDNA is composed of a novel family of mobile elements, which we designated as "Bpu sequences." They share a consensus sequence of 36 bp with 2 terminal direct repeats (AAGG) and a recognition site for the Bpu 10I restriction endonuclease (CCTGAAGC). Comparison of the Cycas mtDNA with other plant mtDNAs revealed many new insights into the biology and evolution of land plant mtDNAs. For example, the noncoding sequences in mtDNAs have drastically expanded as land plants have evolved, with abrupt increases appearing in the bryophytes, and then in the seed plants. As a result, the genomic organizations of seed plant mtDNAs are much less compact than in other plants. Also, the Cycas mtDNA appears to have been exempted from the frequent gene loss observed in angiosperm mtDNAs. Similar to the angiosperms, the 3 Cycas genes nad1, nad2, and nad5 are disrupted by 5 group II intron squences, which have brought the genes into trans-splicing arrangements. The evolutionary origin and invasion/duplication mechanism of the Bpu sequences in Cycas mtDNA are hypothesized and discussed.
Transcriptome de novo assembly from next-generation sequencing and comparative analyses in the hexaploid salt marsh species Spartina maritima and Spartina alterniflora (Poaceae)

PubMed Central

Ferreira de Carvalho, J; Poulain, J; Da Silva, C; Wincker, P; Michon-Coudouel, S; Dheilly, A; Naquin, D; Boutte, J; Salmon, A; Ainouche, M

2013-01-01

Spartina species have a critical ecological role in salt marshes and represent an excellent system to investigate recurrent polyploid speciation. Using the 454 GS-FLX pyrosequencer, we assembled and annotated the first reference transcriptome (from roots and leaves) for two related hexaploid Spartina species that hybridize in Western Europe, the East American invasive Spartina alterniflora and the Euro-African S. maritima. The de novo read assembly generated 38 478 consensus sequences and 99% found an annotation using Poaceae databases, representing a total of 16 753 non-redundant genes. Spartina expressed sequence tags were mapped onto the Sorghum bicolor genome, where they were distributed among the subtelomeric arms of the 10 S. bicolor chromosomes, with high gene density correlation. Normalization of the complementary DNA library improved the number of annotated genes. Ecologically relevant genes were identified among GO biological function categories in salt and heavy metal stress response, C4 photosynthesis and in lignin and cellulose metabolism. Expression of some of these genes had been found to be altered by hybridization and genome duplication in a previous microarray-based study in Spartina. As these species are hexaploid, up to three duplicated homoeologs may be expected per locus. When analyzing sequence polymorphism at four different loci in S. maritima and S. alterniflora, we found up to four haplotypes per locus, suggesting the presence of two expressed homoeologous sequences with one or two allelic variants each. This reference transcriptome will allow analysis of specific Spartina genes of ecological or evolutionary interest, estimation of homoeologous gene expression variation using RNA-seq and further gene expression evolution analyses in natural populations. PMID:23149455
Whole-genome sequencing for comparative genomics and de novo genome assembly.

PubMed

Benjak, Andrej; Sala, Claudia; Hartkoorn, Ruben C

2015-01-01

Next-generation sequencing technologies for whole-genome sequencing of mycobacteria are rapidly becoming an attractive alternative to more traditional sequencing methods. In particular this technology is proving useful for genome-wide identification of mutations in mycobacteria (comparative genomics) as well as for de novo assembly of whole genomes. Next-generation sequencing however generates a vast quantity of data that can only be transformed into a usable and comprehensible form using bioinformatics. Here we describe the methodology one would use to prepare libraries for whole-genome sequencing, and the basic bioinformatics to identify mutations in a genome following Illumina HiSeq or MiSeq sequencing, as well as de novo genome assembly following sequencing using Pacific Biosciences (PacBio).
c-Myb Binds to a Sequence in the Proximal Region of the RAG-2 Promoter and Is Essential for Promoter Activity in T-Lineage Cells

PubMed Central

Wang, Qian-Fei; Lauring, Josh; Schlissel, Mark S.

2000-01-01

The RAG-2 gene encodes a component of the V(D)J recombinase which is essential for the assembly of antigen receptor genes in B and T lymphocytes. Previously, we reported that the transcription factor BSAP (PAX-5) regulates the murine RAG-2 promoter in B-cell lines. A partially overlapping but distinct region of the proximal RAG-2 promoter was also identified as an important element for promoter activity in T cells; however, the responsible factor was unknown. In this report, we present data demonstrating that c-Myb binds to a Myb consensus site within the proximal promoter and is critical for its activity in T-lineage cells. We show that c-Myb can transactivate a RAG-2 promoter-reporter construct in cotransfection assays and that this transactivation depends on the proximal promoter Myb consensus site. By using a chromatin immunoprecipitation (ChIP) strategy, fractionation of chromatin with anti-c-Myb antibody specifically enriched endogenous RAG-2 promoter DNA sequences. DNase I genomic footprinting revealed that the c-Myb site is occupied in a tissue-specific fashion in vivo. Furthermore, an integrated RAG-2 promoter construct with mutations at the c-Myb site was not enriched in the ChIP assay, while a wild-type integrated promoter construct was enriched. Finally, this lack of binding of c-Myb to a chromosomally integrated mutant RAG-2 promoter construct in vivo was associated with a striking decrease in promoter activity. We conclude that c-Myb regulates the RAG-2 promoter in T cells by binding to this consensus c-Myb binding site. PMID:11094072
Osmolality/salinity-responsive enhancers (OSREs) control induction of osmoprotective genes in euryhaline fish

PubMed Central

Wang, Xiaodan; Kültz, Dietmar

2017-01-01

Fish respond to salinity stress by transcriptional induction of many genes, but the mechanism of their osmotic regulation is unknown. We developed a reporter assay using cells derived from the brain of the tilapia Oreochromis mossambicus (OmB cells) to identify osmolality/salinity-responsive enhancers (OSREs) in the genes of O. mossambicus. Genomic DNA comprising the regulatory regions of two strongly salinity-induced genes, inositol monophosphatase 1 (IMPA1.1) and myo-inositol phosphate synthase (MIPS), was isolated and analyzed with dual luciferase enhancer trap reporter assays. We identified five sequences (two in IMPA1.1 and three in MIPS) that share a common consensus element (DDKGGAAWWDWWYDNRB), which we named “OSRE1.” Additional OSREs that were less effective in conferring salinity-induced trans-activation and do not match the OSRE1 consensus also were identified in both MIPS and IMPA1.1. Although OSRE1 shares homology with the mammalian osmotic-response element/tonicity-responsive enhancer (ORE/TonE) enhancer, the latter is insufficient to confer osmotic induction in fish. Like other enhancers, OSRE1 trans-activates genes independent of orientation. We conclude that OSRE1 is a cis-regulatory element (CRE) that enhances the hyperosmotic induction of osmoregulated genes in fish. Our study also shows that tailored reporter assays developed for OmB cells facilitate the identification of CREs in fish genomes. Knowledge of the OSRE1 motif allows affinity-purification of the corresponding transcription factor and computational approaches for enhancer screening of fish genomes. Moreover, our study enables targeted inactivation of OSRE1 enhancers, a method superior to gene knockout for functional characterization because it confines impairment of gene function to a specific context (salinity stress) and eliminates pitfalls of constitutive gene knockouts (embryonic lethality, developmental compensation). PMID:28289196
Molecular characterization of a nuclear topoisomerase II from Nicotiana tabacum that functionally complements a temperature-sensitive topoisomerase II yeast mutant.

PubMed

Singh, B N; Mudgil, Yashwanti; Sopory, S K; Reddy, M K

2003-07-01

We have successfully expressed enzymatically active plant topoisomerase II in Escherichia coli for the first time, which has enabled its biochemical characterization. Using a PCR-based strategy, we obtained a full-length cDNA and the corresponding genomic clone of tobacco topoisomerase II. The genomic clone has 18 exons interrupted by 17 introns. Most of the 5' and 3' splice junctions follow the typical canonical consensus dinucleotide sequence GU-AG present in other plant introns. The position of introns and phasing with respect to primary amino acid sequence in tobacco TopII and Arabidopsis TopII are highly conserved, suggesting that the two genes are evolved from the common ancestral type II topoisomerase gene. The cDNA encodes a polypeptide of 1482 amino acids. The primary amino acid sequence shows a striking sequence similarity, preserving all the structural domains that are conserved among eukaryotic type II topoisomerases in an identical spatial order. We have expressed the full-length polypeptide in E. coli and purified the recombinant protein to homogeneity. The full-length polypeptide relaxed supercoiled DNA and decatenated the catenated DNA in a Mg(2+)- and ATP-dependent manner, and this activity was inhibited by 4'-(9-acridinylamino)-3'-methoxymethanesulfonanilide (m-AMSA). The immunofluorescence and confocal microscopic studies, with antibodies developed against the N-terminal region of tobacco recombinant topoisomerase II, established the nuclear localization of topoisomerase II in tobacco BY2 cells. The regulated expression of tobacco topoisomerase II gene under the GAL1 promoter functionally complemented a temperature-sensitive TopII(ts) yeast mutant.
Cloning and pharmacological characterization of the rabbit bradykinin B2 receptor.

PubMed

Bachvarov, D R; Saint-Jacques, E; Larrivée, J F; Levesque, L; Rioux, F; Drapeau, G; Marceau, F

1995-12-01

Degenerate primers, corresponding to consensus sequences of third and sixth transmembrane domains of G protein-coupled receptor superfamily, were used for the polymerase chain reaction amplification and consecutive characterization of G protein-coupled receptors present in cultured rabbit aortic smooth muscle cells. One of the isolated resulting fragments was highly homologous to the corresponding region of the bradykinin (BK) B2 receptor cloned in other species. The polymerase chain reaction fragment was used to screen a rabbit genomic library, which allowed the identification of an intronless 1101-nucleotide open reading frame which codes for a 367-amino acid receptor protein. The rabbit B2 receptor sequence is more than 80% identical to the ones determined in three other species and retain putative glycosylation, palmitoylation and phosphorylation sites. In the rabbit genomic sequence, an acceptor splice sequence was found 8 base pairs upstream of the start codon. Northern blot analysis showed a high expression of a major transcript (4.2 kilobases) in the rabbit kidney and duodenum, and a less abundant expression in other tissues. Southern blot experiments suggest that a single copy of this gene exists in the rabbit genome. The cloned rabbit B2 receptor expressed in COS-1 cells binds [3H]BK in a saturable manner (KD 2.1 nM) and this ligand competes with a series of kinin agonists and antagonist with a rank order consistent with the B2 receptor identity. The insurmountable character of the antagonism exerted by Hoe 140 against BK on the rabbit B2 receptor, previously shown in pharmacological experiments, was confirmed in binding experiments with the cloned receptor expressed in a controlled manner. By contrast, Hoe 140 competed with [3H]BK in a surmountable manner for the human B2 receptor expressed in COS-1 cells. The cloning of the rabbit B2 receptor will be useful notably for the study of the structural basis of antagonist binding and for studies on receptor regulation in a relatively large animal.
Changes in mumps virus neurovirulence phenotype associated with quasispecies heterogeneity

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sauder, Christian J.; Vandenburgh, Kari M.; Iskow, Rebecca C.

2006-06-20

Mumps virus is a highly neurotropic virus with evidence of central nervous system invasion (CNS) in approximately half of all cases of infection. In countries where live attenuated mumps virus vaccines were introduced, the number of mumps cases declined dramatically; however, recently, the safety of some vaccine strains has been questioned. For example, one of the most widely used vaccines, the Urabe AM9 strain, was causally associated with meningitis, leading to the withdrawal of this product from the market in several countries. This highlights the need for a better understanding of the attenuation process and the identification of markers ofmore » attenuation. To this end, we further attenuated the Urabe AM9 strain by serial passage in cell culture and compared the complete nucleotide sequences of the parental and passaged viruses. Interestingly, despite a dramatic decrease in virus virulence (as assayed in rats), the only genomic changes were in the form of changes in the level of genetic heterogeneity at specific genome sites, i.e., either selection of one nucleotide variant at positions where the starting material exhibited nucleotide heterogeneity or the evolution of an additional nucleotide to create a heterogenic site. This finding suggests that changes in the level of genetic heterogeneity at specific genome sites can have profound neurovirulence phenotypic consequences and, therefore, caution should be exercised when evaluating genetic markers of virulence or attenuation based only on a consensus sequence.« less
Population-genomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing

PubMed Central

2013-01-01

Background Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RNA viruses of the Western honey bee (Apis mellifera), deformed wing virus (DWV) and Israel acute paralysis virus (IAPV). All viral RNA was extracted from North American samples of honey bees or, in one case, the ectoparasitic mite Varroa destructor. Results Coverage depth was generally lower for IAPV than DWV, and marked gaps in coverage occurred in several narrow regions (< 50 bp) of IAPV. These coverage gaps occurred across sequencing runs and were virtually unchanged when reads were re-mapped with greater permissiveness (up to 8% divergence), suggesting a recurrent sequencing artifact rather than strain divergence. Consensus sequences of DWV for each sample showed little phylogenetic divergence, low nucleotide diversity, and strongly negative values of Fu and Li’s D statistic, suggesting a recent population bottleneck and/or purifying selection. The Kakugo strain of DWV fell outside of all other DWV sequences at 100% bootstrap support. IAPV consensus sequences supported the existence of multiple clades as had been previously reported, and Fu and Li’s D was closer to neutral expectation overall, although a sliding-window analysis identified a significantly positive D within the protease region, suggesting selection maintains diversity in that region. Within-sample mean diversity was comparable between the two viruses on average, although for both viruses there was substantial variation among samples in mean diversity at third codon positions and in the number of high-diversity sites. FST values were bimodal for DWV, likely reflecting neutral divergence in two low-diversity populations, whereas IAPV had several sites that were strong outliers with very low FST. Conclusions This initial survey of genetic variation within honey bee RNA viruses suggests future directions for studies examining the underlying causes of population-genetic structure in these economically important pathogens. PMID:23497218
Population-genomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing.

PubMed

Cornman, Robert Scott; Boncristiani, Humberto; Dainat, Benjamin; Chen, Yanping; vanEngelsdorp, Dennis; Weaver, Daniel; Evans, Jay D

2013-03-07

Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RNA viruses of the Western honey bee (Apis mellifera), deformed wing virus (DWV) and Israel acute paralysis virus (IAPV). All viral RNA was extracted from North American samples of honey bees or, in one case, the ectoparasitic mite Varroa destructor. Coverage depth was generally lower for IAPV than DWV, and marked gaps in coverage occurred in several narrow regions (< 50 bp) of IAPV. These coverage gaps occurred across sequencing runs and were virtually unchanged when reads were re-mapped with greater permissiveness (up to 8% divergence), suggesting a recurrent sequencing artifact rather than strain divergence. Consensus sequences of DWV for each sample showed little phylogenetic divergence, low nucleotide diversity, and strongly negative values of Fu and Li's D statistic, suggesting a recent population bottleneck and/or purifying selection. The Kakugo strain of DWV fell outside of all other DWV sequences at 100% bootstrap support. IAPV consensus sequences supported the existence of multiple clades as had been previously reported, and Fu and Li's D was closer to neutral expectation overall, although a sliding-window analysis identified a significantly positive D within the protease region, suggesting selection maintains diversity in that region. Within-sample mean diversity was comparable between the two viruses on average, although for both viruses there was substantial variation among samples in mean diversity at third codon positions and in the number of high-diversity sites. FST values were bimodal for DWV, likely reflecting neutral divergence in two low-diversity populations, whereas IAPV had several sites that were strong outliers with very low FST. This initial survey of genetic variation within honey bee RNA viruses suggests future directions for studies examining the underlying causes of population-genetic structure in these economically important pathogens.
Development and implementation of a highly-multiplexed SNP array for genetic mapping in maritime pine and comparative mapping with loblolly pine

PubMed Central

2011-01-01

Background Single nucleotide polymorphisms (SNPs) are the most abundant source of genetic variation among individuals of a species. New genotyping technologies allow examining hundreds to thousands of SNPs in a single reaction for a wide range of applications such as genetic diversity analysis, linkage mapping, fine QTL mapping, association studies, marker-assisted or genome-wide selection. In this paper, we evaluated the potential of highly-multiplexed SNP genotyping for genetic mapping in maritime pine (Pinus pinaster Ait.), the main conifer used for commercial plantation in southwestern Europe. Results We designed a custom GoldenGate assay for 1,536 SNPs detected through the resequencing of gene fragments (707 in vitro SNPs/Indels) and from Sanger-derived Expressed Sequenced Tags assembled into a unigene set (829 in silico SNPs/Indels). Offspring from three-generation outbred (G2) and inbred (F2) pedigrees were genotyped. The success rate of the assay was 63.6% and 74.8% for in silico and in vitro SNPs, respectively. A genotyping error rate of 0.4% was further estimated from segregating data of SNPs belonging to the same gene. Overall, 394 SNPs were available for mapping. A total of 287 SNPs were integrated with previously mapped markers in the G2 parental maps, while 179 SNPs were localized on the map generated from the analysis of the F2 progeny. Based on 98 markers segregating in both pedigrees, we were able to generate a consensus map comprising 357 SNPs from 292 different loci. Finally, the analysis of sequence homology between mapped markers and their orthologs in a Pinus taeda linkage map, made it possible to align the 12 linkage groups of both species. Conclusions Our results show that the GoldenGate assay can be used successfully for high-throughput SNP genotyping in maritime pine, a conifer species that has a genome seven times the size of the human genome. This SNP-array will be extended thanks to recent sequencing effort using new generation sequencing technologies and will include SNPs from comparative orthologous sequences that were identified in the present study, providing a wider collection of anchor points for comparative genomics among the conifers. PMID:21767361
Gap Resolution

DOE Office of Scientific and Technical Information (OSTI.GOV)

Labutti, Kurt; Foster, Brian; Lapidus, Alla

Gap Resolution is a software package that was developed to improve Newbler genome assemblies by automating the closure of sequence gaps caused by repetitive regions in the DNA. This is done by performing the follow steps:1) Identify and distribute the data for each gap in sub-projects. 2) Assemble the data associated with each sub-project using a secondary assembler, such as Newbler or PGA. 3) Determine if any gaps are closed after reassembly, and either design fakes (consensus of closed gap) for those that closed or lab experiments for those that require additional data. The software requires as input a genomemore » assembly produce by the Newbler assembler provided by Roche and 454 data containing paired-end reads.« less
An efficient approach to BAC based assembly of complex genomes.

PubMed

Visendi, Paul; Berkman, Paul J; Hayashi, Satomi; Golicz, Agnieszka A; Bayer, Philipp E; Ruperao, Pradeep; Hurgobin, Bhavna; Montenegro, Juan; Chan, Chon-Kit Kenneth; Staňková, Helena; Batley, Jacqueline; Šimková, Hana; Doležel, Jaroslav; Edwards, David

2016-01-01

There has been an exponential growth in the number of genome sequencing projects since the introduction of next generation DNA sequencing technologies. Genome projects have increasingly involved assembly of whole genome data which produces inferior assemblies compared to traditional Sanger sequencing of genomic fragments cloned into bacterial artificial chromosomes (BACs). While whole genome shotgun sequencing using next generation sequencing (NGS) is relatively fast and inexpensive, this method is extremely challenging for highly complex genomes, where polyploidy or high repeat content confounds accurate assembly, or where a highly accurate 'gold' reference is required. Several attempts have been made to improve genome sequencing approaches by incorporating NGS methods, to variable success. We present the application of a novel BAC sequencing approach which combines indexed pools of BACs, Illumina paired read sequencing, a sequence assembler specifically designed for complex BAC assembly, and a custom bioinformatics pipeline. We demonstrate this method by sequencing and assembling BAC cloned fragments from bread wheat and sugarcane genomes. We demonstrate that our assembly approach is accurate, robust, cost effective and scalable, with applications for complete genome sequencing in large and complex genomes.

Characterization of Two Novel Polyomaviruses of Birds by Using Multiply Primed Rolling-Circle Amplification of Their Genomes

PubMed Central

Johne, Reimar; Wittig, Walter; Fernández-de-Luco, Daniel; Höfle, Ursula; Müller, Hermann

2006-01-01

Polyomaviruses are small nonenveloped particles with a circular double-stranded genome, approximately 5 kbp in size. The mammalian polyomaviruses mainly cause persistent subclinical infections in their natural nonimmunocompromised hosts. In contrast, the polyomaviruses of birds—avian polyomavirus (APV) and goose hemorrhagic polyomavirus (GHPV)—are the primary agents of acute and chronic disease with high mortality rates in young birds. Screening of field samples of diseased birds by consensus PCR revealed the presence of two novel polyomaviruses in the liver of an Eurasian bullfinch (Pyrrhula pyrrhula griseiventris) and in the spleen of a Eurasian jackdaw (Corvus monedula), tentatively designated as finch polyomavirus (FPyV) and crow polyomavirus (CPyV), respectively. The genomes of the viruses were amplified by using multiply primed rolling-circle amplification and cloned. Analysis of the FPyV and CPyV genome sequences revealed a close relationship to APV and GHPV, indicating the existence of a distinct avian group among the polyomaviruses. The main characteristics of this group are (i) involvement in fatal disease, (ii) the existence of an additional open reading frame in the 5′ region of the late mRNAs, and (iii) a different manner of DNA binding of the large tumor antigen compared to that of the mammalian polyomaviruses. PMID:16537620
MELOGEN: an EST database for melon functional genomics

PubMed Central

Gonzalez-Ibeas, Daniel; Blanca, José; Roig, Cristina; González-To, Mireia; Picó, Belén; Truniger, Verónica; Gómez, Pedro; Deleu, Wim; Caño-Delgado, Ana; Arús, Pere; Nuez, Fernando; Garcia-Mas, Jordi; Puigdomènech, Pere; Aranda, Miguel A

2007-01-01

Background Melon (Cucumis melo L.) is one of the most important fleshy fruits for fresh consumption. Despite this, few genomic resources exist for this species. To facilitate the discovery of genes involved in essential traits, such as fruit development, fruit maturation and disease resistance, and to speed up the process of breeding new and better adapted melon varieties, we have produced a large collection of expressed sequence tags (ESTs) from eight normalized cDNA libraries from different tissues in different physiological conditions. Results We determined over 30,000 ESTs that were clustered into 16,637 non-redundant sequences or unigenes, comprising 6,023 tentative consensus sequences (contigs) and 10,614 unclustered sequences (singletons). Many potential molecular markers were identified in the melon dataset: 1,052 potential simple sequence repeats (SSRs) and 356 single nucleotide polymorphisms (SNPs) were found. Sixty-nine percent of the melon unigenes showed a significant similarity with proteins in databases. Functional classification of the unigenes was carried out following the Gene Ontology scheme. In total, 9,402 unigenes were mapped to one or more ontology. Remarkably, the distributions of melon and Arabidopsis unigenes followed similar tendencies, suggesting that the melon dataset is representative of the whole melon transcriptome. Bioinformatic analyses primarily focused on potential precursors of melon micro RNAs (miRNAs) in the melon dataset, but many other genes potentially controlling disease resistance and fruit quality traits were also identified. Patterns of transcript accumulation were characterised by Real-Time-qPCR for 20 of these genes. Conclusion The collection of ESTs characterised here represents a substantial increase on the genetic information available for melon. A database (MELOGEN) which contains all EST sequences, contig images and several tools for analysis and data mining has been created. This set of sequences constitutes also the basis for an oligo-based microarray for melon that is being used in experiments to further analyse the melon transcriptome. PMID:17767721
Massive amplification of rolling-circle transposons in the lineage of the bat Myotis lucifugus

PubMed Central

Pritham, Ellen J.; Feschotte, Cédric

2007-01-01

Rolling-circle (RC) transposons, or Helitrons, are a newly recognized group of eukaryotic transposable elements abundant in the genomes of plants, invertebrates, and zebrafish. We provide evidence for the colonization of a mammalian genome by Helitrons, which has not been reported previously. We identified and characterized two families of Helitrons in the little brown bat Myotis lucifugus. The consensus sequence for the first family, HeliBat1, displays the hallmarks of an autonomous Helitron, including coding capacity for an ≈1,500-aa protein with an RC replication motif and a region related to the SF1 superfamily of DNA helicases. The HeliBatN1 family is a nonautonomous Helitron family that is only distantly related to HeliBat1. The two HeliBat families have attained high copy numbers (≈15,000 and > 100,000 copies, respectively) and make up at least ≈3% of the M. lucifugus genome. Sequence divergence and cross-species analyses indicate that both HeliBat families have amplified within the last ≈30–36 million years and are restricted to the lineage of vesper bats. We could not detect the presence of Helitrons in any other order of placental mammals, despite the broad representation of these taxa in the databases. We describe an instance of HeliBat-mediated transduction of a host gene fragment that was subsequently dispersed in ≈1,000 copies throughout the M. lucifugus genome. Given the demonstrated propensity of RC transposons to mediate the duplication and shuffling of host genes in bacteria and maize, it is tempting to speculate that the massive amplification of Helitrons in vesper bats has influenced the evolutionary trajectory of these mammals. PMID:17261799
The specificity and flexibility of l1 reverse transcription priming at imperfect T-tracts.

PubMed

Monot, Clément; Kuciak, Monika; Viollet, Sébastien; Mir, Ashfaq Ali; Gabus, Caroline; Darlix, Jean-Luc; Cristofari, Gaël

2013-05-01

L1 retrotransposons have a prominent role in reshaping mammalian genomes. To replicate, the L1 ribonucleoprotein particle (RNP) first uses its endonuclease (EN) to nick the genomic DNA. The newly generated DNA end is subsequently used as a primer to initiate reverse transcription within the L1 RNA poly(A) tail, a process known as target-primed reverse transcription (TPRT). Prior studies demonstrated that most L1 insertions occur into sequences related to the L1 EN consensus sequence (degenerate 5'-TTTT/A-3' sites) and frequently preceded by imperfect T-tracts. However, it is currently unclear whether--and to which degree--the liberated 3'-hydroxyl extremity on the genomic DNA needs to be accessible and complementary to the poly(A) tail of the L1 RNA for efficient priming of reverse transcription. Here, we employed a direct assay for the initiation of L1 reverse transcription to define the molecular rules that guide this process. First, efficient priming is detected with as few as 4 matching nucleotides at the primer 3' end. Second, L1 RNP can tolerate terminal mismatches if they are compensated within the 10 last bases of the primer by an increased number of matching nucleotides. All terminal mismatches are not equally detrimental to DNA extension, a C being extended at higher levels than an A or a G. Third, efficient priming in the context of duplex DNA requires a 3' overhang. This suggests the possible existence of additional DNA processing steps, which generate a single-stranded 3' end to allow L1 reverse transcription. Based on these data we propose that the specificity of L1 reverse transcription initiation contributes, together with the specificity of the initial EN cleavage, to the distribution of new L1 insertions within the human genome.
Influence of Translation Initiation on Organellar Protein Targeting in Arabidopsis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sally A. Mackenzie

2011-04-18

A primary focus of the Mackenzie laboratory is the elucidation of processes and machinery for mitochondrial genome maintenance and transmission in higher plants. We have found that numerous organellar DNA maintenance components in plants appear to be dual targeted to mitochondria and plastids. Of particular interest was the observation that some twin (tandemly arrayed) dual targeting presequences appeared to utilize non-AUG alternative translation initiation, allowing for multiple translation starts at a single gene. Two aspects of this phenomenon were of particular interest: (1) Alternative translation initiation might provide a mechanism to regulate protein targeting temporally and spatially, a possibility thatmore » had not been demonstrated previously, and (2) alternative translation initiation might occur in genes involved in nuclear-controlled mitochondrial genome recombination, thought to be exclusively mitochondrial in their function. During the course of this research, we pursued three aims, with an emphasis on two specific genes of interest: POLgamma2, an organellar DNA polymerase, and MSH1, a MutS homolog thought to participate in mitochondrial, but not plastid, genome recombination surveillance. Our aims were to (1) Identify additional genes within Arabidopsis and other genomes that employ non-AUG alternative translation initiation, (2) Locate sequences upstream to the annotated AUG that confer alternative non-AUG translation initiation activity, and (3) Identify cis and trans factors that influence start site selection in genes with non-AUG starts. Toward these ends, we have shown that non-AUG initiation occurs in a number of genes, likely influencing targeting behavior of the protein. We have also shown that start site selection is strongly influenced by Kozak consensus sequence environment, indicating that alternative translation initiation in plants occurs by relaxation of ribosome scanning.« less
PROSPECT improves cis-acting regulatory element prediction by integrating expression profile data with consensus pattern searches

PubMed Central

Fujibuchi, Wataru; Anderson, John S. J.; Landsman, David

2001-01-01

Consensus pattern and matrix-based searches designed to predict cis-acting transcriptional regulatory sequences have historically been subject to large numbers of false positives. We sought to decrease false positives by incorporating expression profile data into a consensus pattern-based search method. We have systematically analyzed the expression phenotypes of over 6000 yeast genes, across 121 expression profile experiments, and correlated them with the distribution of 14 known regulatory elements over sequences upstream of the genes. Our method is based on a metric we term probabilistic element assessment (PEA), which is a ranking of potential sites based on sequence similarity in the upstream regions of genes with similar expression phenotypes. For eight of the 14 known elements that we examined, our method had a much higher selectivity than a naïve consensus pattern search. Based on our analysis, we have developed a web-based tool called PROSPECT, which allows consensus pattern-based searching of gene clusters obtained from microarray data. PMID:11574681
Origin Replication Complex Binding, Nucleosome Depletion Patterns, and a Primary Sequence Motif Can Predict Origins of Replication in a Genome with Epigenetic Centromeres

PubMed Central

Tsai, Hung-Ji; Baller, Joshua A.; Liachko, Ivan; Koren, Amnon; Burrack, Laura S.; Hickman, Meleah A.; Thevandavakkam, Mathuravani A.; Rusche, Laura N.

2014-01-01

ABSTRACT Origins of DNA replication are key genetic elements, yet their identification remains elusive in most organisms. In previous work, we found that centromeres contain origins of replication (ORIs) that are determined epigenetically in the pathogenic yeast Candida albicans. In this study, we used origin recognition complex (ORC) binding and nucleosome occupancy patterns in Saccharomyces cerevisiae and Kluyveromyces lactis to train a machine learning algorithm to predict the position of active arm (noncentromeric) origins in the C. albicans genome. The model identified bona fide active origins as determined by the presence of replication intermediates on nondenaturing two-dimensional (2D) gels. Importantly, these origins function at their native chromosomal loci and also as autonomously replicating sequences (ARSs) on a linear plasmid. A “mini-ARS screen” identified at least one and often two ARS regions of ≥100 bp within each bona fide origin. Furthermore, a 15-bp AC-rich consensus motif was associated with the predicted origins and conferred autonomous replicating activity to the mini-ARSs. Thus, while centromeres and the origins associated with them are epigenetic, arm origins are dependent upon critical DNA features, such as a binding site for ORC and a propensity for nucleosome exclusion. PMID:25182328
The episode of genetic drift defining the migration of humans out of Africa is derived from a large east African population size.

PubMed

Elhassan, Nuha; Gebremeskel, Eyoab Iyasu; Elnour, Mohamed Ali; Isabirye, Dan; Okello, John; Hussien, Ayman; Kwiatksowski, Dominic; Hirbo, Jibril; Tishkoff, Sara; Ibrahim, Muntaser E

2014-01-01

Human genetic variation particularly in Africa is still poorly understood. This is despite a consensus on the large African effective population size compared to populations from other continents. Based on sequencing of the mitochondrial Cytochrome C Oxidase subunit II (MT-CO2), and genome wide microsatellite data we observe evidence suggesting the effective size (Ne) of humans to be larger than the current estimates, with a foci of increased genetic diversity in east Africa, and a population size of east Africans being at least 2-6 fold larger than other populations. Both phylogenetic and network analysis indicate that east Africans possess more ancestral lineages in comparison to various continental populations placing them at the root of the human evolutionary tree. Our results also affirm east Africa as the likely spot from which migration towards Asia has taken place. The study reflects the spectacular level of sequence variation within east Africans in comparison to the global sample, and appeals for further studies that may contribute towards filling the existing gaps in the database. The implication of these data to current genomic research, as well as the need to carry out defined studies of human genetic variation that includes more African populations; particularly east Africans is paramount.
Extensive frameshift at all AGG and CCC codons in the mitochondrial cytochrome c oxidase subunit 1 gene of Perkinsus marinus (Alveolata; Dinoflagellata).

PubMed

Masuda, Isao; Matsuzaki, Motomichi; Kita, Kiyoshi

2010-10-01

Diverse mitochondrial (mt) genetic systems have evolved independently of the more uniform nuclear system and often employ modified genetic codes. The organization and genetic system of dinoflagellate mt genomes are particularly unusual and remain an evolutionary enigma. We determined the sequence of full-length cytochrome c oxidase subunit 1 (cox1) mRNA of the earliest diverging dinoflagellate Perkinsus and show that this gene resides in the mt genome. Apparently, this mRNA is not translated in a single reading frame with standard codon usage. Our examination of the nucleotide sequence and three-frame translation of the mRNA suggest that the reading frame must be shifted 10 times, at every AGG and CCC codon, to yield a consensus COX1 protein. We suggest two possible mechanisms for these translational frameshifts: a ribosomal frameshift in which stalled ribosomes skip the first bases of these codons or specialized tRNAs recognizing non-triplet codons, AGGY and CCCCU. Regardless of the mechanism, active and efficient machinery would be required to tolerate the frameshifts predicted in Perkinsus mitochondria. To our knowledge, this is the first evidence of translational frameshifts in protist mitochondria and, by far, is the most extensive case in mitochondria.
Human genetics and genomics a decade after the release of the draft sequence of the human genome.

PubMed

Naidoo, Nasheen; Pawitan, Yudi; Soong, Richie; Cooper, David N; Ku, Chee-Seng

2011-10-01

Substantial progress has been made in human genetics and genomics research over the past ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, the International HapMap Project, and with respect to advances in genotyping technologies. These developments are vital for the emergence of genome-wide association studies in the investigation of complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. The high-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade.
Human genetics and genomics a decade after the release of the draft sequence of the human genome

PubMed Central

2011-01-01

Substantial progress has been made in human genetics and genomics research over the past ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, the International HapMap Project, and with respect to advances in genotyping technologies. These developments are vital for the emergence of genome-wide association studies in the investigation of complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. The high-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade. PMID:22155605
An ultra-high density linkage map and QTL mapping for sex and growth-related traits of common carp (Cyprinus carpio)

PubMed Central

Peng, Wenzhu; Xu, Jian; Zhang, Yan; Feng, Jianxin; Dong, Chuanju; Jiang, Likun; Feng, Jingyan; Chen, Baohua; Gong, Yiwen; Chen, Lin; Xu, Peng

2016-01-01

High density genetic linkage maps are essential for QTL fine mapping, comparative genomics and high quality genome sequence assembly. In this study, we constructed a high-density and high-resolution genetic linkage map with 28,194 SNP markers on 14,146 distinct loci for common carp based on high-throughput genotyping with the carp 250 K single nucleotide polymorphism (SNP) array in a mapping family. The genetic length of the consensus map was 10,595.94 cM with an average locus interval of 0.75 cM and an average marker interval of 0.38 cM. Comparative genomic analysis revealed high level of conserved syntenies between common carp and the closely related model species zebrafish and medaka. The genome scaffolds were anchored to the high-density linkage map, spanning 1,357 Mb of common carp reference genome. QTL mapping and association analysis identified 22 QTLs for growth-related traits and 7 QTLs for sex dimorphism. Candidate genes underlying growth-related traits were identified, including important regulators such as KISS2, IGF1, SMTLB, NPFFR1 and CPE. Candidate genes associated with sex dimorphism were also identified including 3KSR and DMRT2b. The high-density and high-resolution genetic linkage map provides an important tool for QTL fine mapping and positional cloning of economically important traits, and improving common carp genome assembly. PMID:27225429
Sequence specificity for uridylylation of the viral peptide linked to the genome (VPg) of enteroviruses.

PubMed

Schein, Catherine H; Ye, Mengyi; Paul, Aniko V; Oberste, M Steven; Chapman, Nora; van der Heden van Noort, Gerbrand J; Filippov, Dmitri V; Choi, Kyung H

2015-10-01

Enteroviruses (EV) uridylylate a peptide, VPg, as the first step in their replication. VPgpUpU, found free in infected cells, serves as the primer for RNA elongation. The abilities of four polymerases (3D(pol)), from EV-species A-C, to uridylylate VPgs that varied by up to 60% of their residues were compared. Each 3D(pol) was able to uridylylate all five VPgs using polyA RNA as template, while showing specificity for its own genome encoded peptide. All 3D(pol) uridylylated a consensus VPg representing the physical chemical properties of 31 different VPgs. Thus the residues required for uridylylation and the enzymatic mechanism must be similar in diverse EV. As VPg-binding sites differ in co-crystal structures, the reaction is probably done by a second 3D(pol) molecule. The conservation of polymerase residues whose mutation reduces uridylylation but not RNA elongation is compared. Copyright © 2015 Elsevier Inc. All rights reserved.
Duplication of the genome in normal and cancer cell cycles.

PubMed

Bandura, Jennifer L; Calvi, Brian R

2002-01-01

It is critical to discover the mechanisms of normal cell cycle regulation if we are to fully understand what goes awry in cancer cells. The normal eukaryotic cell tightly regulates the activity of origins of DNA replication so that the genome is duplicated exactly once per cell cycle. Over the last ten years much has been learned concerning the cell cycle regulation of origin activity. It is now clear that the proteins and cell cycle mechanisms that control origin activity are largely conserved from yeast to humans. Despite this conservation, the composition of origins of DNA replication in higher eukaryotes remains ill defined. A DNA consensus for predicting origins has yet to emerge, and it is of some debate whether primary DNA sequence determines where replication initiates. In this review we outline what is known about origin structure and the mechanism of once per cell cycle DNA replication with an emphasis on recent advances in mammalian cells. We discuss the possible relevance of these regulatory pathways for cancer biology and therapy.
Polypeptide p41 of a Norwalk-Like Virus Is a Nucleic Acid-Independent Nucleoside Triphosphatase

PubMed Central

Pfister, Thomas; Wimmer, Eckard

2001-01-01

Southampton virus (SHV) is a member of the Norwalk-like viruses (NLVs), one of four genera of the family Caliciviridae. The genome of SHV contains three open reading frames (ORFs). ORF 1 encodes a polyprotein that is autocatalytically processed into six proteins, one of which is p41. p41 shares sequence motifs with protein 2C of picornaviruses and superfamily 3 helicases. We have expressed p41 of SHV in bacteria. Purified p41 exhibited nucleoside triphosphate (NTP)-binding and NTP hydrolysis activities. The NTPase activity was not stimulated by single-stranded nucleic acids. SHV p41 had no detectable helicase activity. Protein sequence comparison between the consensus sequences of NLV p41 and enterovirus protein 2C revealed regions of high similarity. According to secondary structure prediction, the conserved regions were located within a putative central domain of alpha helices and beta strands. This study reveals for the first time an NTPase activity associated with a calicivirus-encoded protein. Based on enzymatic properties and sequence information, a functional relationship between NLV p41 and enterovirus 2C is discussed in regard to the role of 2C-like proteins in virus replication. PMID:11160659
Diverse Dengue Type 2 Virus Populations Contain Recombinant and Both Parental Viruses in a Single Mosquito Host

PubMed Central

Craig, Scott; Thu, Hlaing Myat; Lowry, Kym; Wang, Xiao-fang; Holmes, Edward C.; Aaskov, John

2003-01-01

Envelope (E) protein genes sampled from populations of dengue 2 (DEN-2) virus in individual Aedes aegypti mosquitoes and in serum from dengue patients were copied to cDNA, cloned, and sequenced. The nucleotide sequences of the E genes in more than 70% of the clones differed from the consensus sequence for the corresponding virus population at up to 11 sites, and 24 of the 94 clones contained at least one stop codon. Virus populations recovered up to 2 years apart yielded clones with similar polymorphisms in the E gene. For one mosquito, the clones obtained fell into two genotypes. One group of sequences was closely related to those of viruses recovered from dengue patients in the same locality (Yangon, Myanmar) since 1995 and were classified as Asian 1 genotype. The second group were Cosmopolitan genotype viruses which were also circulating in Yangon in 2000 and which were related to DEN-2 viruses sampled from southern China in 1999. Finally, one clone was identified as a recombinant genome composed of portions of these two “parental” genotypes. This is the first report of recombinant and parental dengue viruses in a single host. PMID:12634407
Applying the Concept of Peptide Uniqueness to Anti-Polio Vaccination.

PubMed

Kanduc, Darja; Fasano, Candida; Capone, Giovanni; Pesce Delfino, Antonella; Calabrò, Michele; Polimeno, Lorenzo

2015-01-01

Although rare, adverse events may associate with anti-poliovirus vaccination thus possibly hampering global polio eradication worldwide. To design peptide-based anti-polio vaccines exempt from potential cross-reactivity risks and possibly able to reduce rare potential adverse events such as the postvaccine paralytic poliomyelitis due to the tendency of the poliovirus genome to mutate. Proteins from poliovirus type 1, strain Mahoney, were analyzed for amino acid sequence identity to the human proteome at the pentapeptide level, searching for sequences that (1) have zero percent of identity to human proteins, (2) are potentially endowed with an immunologic potential, and (3) are highly conserved among poliovirus strains. Sequence analyses produced a set of consensus epitopic peptides potentially able to generate specific anti-polio immune responses exempt from cross-reactivity with the human host. Peptide sequences unique to poliovirus proteins and conserved among polio strains might help formulate a specific and universal anti-polio vaccine able to react with multiple viral strains and exempt from the burden of possible cross-reactions with human proteins. As an additional advantage, using a peptide-based vaccine instead of current anti-polio DNA vaccines would eliminate the rare post-polio poliomyelitis cases and other disabling symptoms that may appear following vaccination.
Global Metabolic Reconstruction and Metabolic Gene Evolution in the Cattle Genome

PubMed Central

Kim, Woonsu; Park, Hyesun; Seo, Seongwon

2016-01-01

The sequence of cattle genome provided a valuable opportunity to systematically link genetic and metabolic traits of cattle. The objectives of this study were 1) to reconstruct genome-scale cattle-specific metabolic pathways based on the most recent and updated cattle genome build and 2) to identify duplicated metabolic genes in the cattle genome for better understanding of metabolic adaptations in cattle. A bioinformatic pipeline of an organism for amalgamating genomic annotations from multiple sources was updated. Using this, an amalgamated cattle genome database based on UMD_3.1, was created. The amalgamated cattle genome database is composed of a total of 33,292 genes: 19,123 consensus genes between NCBI and Ensembl databases, 8,410 and 5,493 genes only found in NCBI or Ensembl, respectively, and 266 genes from NCBI scaffolds. A metabolic reconstruction of the cattle genome and cattle pathway genome database (PGDB) was also developed using Pathway Tools, followed by an intensive manual curation. The manual curation filled or revised 68 pathway holes, deleted 36 metabolic pathways, and added 23 metabolic pathways. Consequently, the curated cattle PGDB contains 304 metabolic pathways, 2,460 reactions including 2,371 enzymatic reactions, and 4,012 enzymes. Furthermore, this study identified eight duplicated genes in 12 metabolic pathways in the cattle genome compared to human and mouse. Some of these duplicated genes are related with specific hormone biosynthesis and detoxifications. The updated genome-scale metabolic reconstruction is a useful tool for understanding biology and metabolic characteristics in cattle. There has been significant improvements in the quality of cattle genome annotations and the MetaCyc database. The duplicated metabolic genes in the cattle genome compared to human and mouse implies evolutionary changes in the cattle genome and provides a useful information for further research on understanding metabolic adaptations of cattle. PMID:26992093
Genome sequence of Phytophthora ramorum: implications for management

Treesearch

Brett Tyler; Sucheta Tripathy; Nik Grunwald; Kurt Lamour; Kelly Ivors; Matteo Garbelotto; Daniel Rokhsar; Nik Putnam; Igor Grigoriev; Jeffrey Boore

2006-01-01

A draft genome sequence has been determined for Phytophthora ramorum, together with a draft sequence of the soybean pathogen Phytophthora sojae. The P. ramorum genome was sequenced to a depth of 7-fold coverage, while the P. sojae genome was sequenced to a depth of 9-fold coverage. The genome...
Genome Science: A Video Tour of the Washington University Genome Sequencing Center for High School and Undergraduate Students

ERIC Educational Resources Information Center

Flowers, Susan K.; Easter, Carla; Holmes, Andrea; Cohen, Brian; Bednarski, April E.; Mardis, Elaine R.; Wilson, Richard K.; Elgin, Sarah C. R.

2005-01-01

Sequencing of the human genome has ushered in a new era of biology. The technologies developed to facilitate the sequencing of the human genome are now being applied to the sequencing of other genomes. In 2004, a partnership was formed between Washington University School of Medicine Genome Sequencing Center's Outreach Program and Washington…

An evaluation of copy number variation detection tools for cancer using whole exome sequencing data.

PubMed

Zare, Fatima; Dow, Michelle; Monteleone, Nicholas; Hosny, Abdelrahman; Nabavi, Sheida

2017-05-31

Recently copy number variation (CNV) has gained considerable interest as a type of genomic/genetic variation that plays an important role in disease susceptibility. Advances in sequencing technology have created an opportunity for detecting CNVs more accurately. Recently whole exome sequencing (WES) has become primary strategy for sequencing patient samples and study their genomics aberrations. However, compared to whole genome sequencing, WES introduces more biases and noise that make CNV detection very challenging. Additionally, tumors' complexity makes the detection of cancer specific CNVs even more difficult. Although many CNV detection tools have been developed since introducing NGS data, there are few tools for somatic CNV detection for WES data in cancer. In this study, we evaluated the performance of the most recent and commonly used CNV detection tools for WES data in cancer to address their limitations and provide guidelines for developing new ones. We focused on the tools that have been designed or have the ability to detect cancer somatic aberrations. We compared the performance of the tools in terms of sensitivity and false discovery rate (FDR) using real data and simulated data. Comparative analysis of the results of the tools showed that there is a low consensus among the tools in calling CNVs. Using real data, tools show moderate sensitivity (~50% - ~80%), fair specificity (~70% - ~94%) and poor FDRs (~27% - ~60%). Also, using simulated data we observed that increasing the coverage more than 10× in exonic regions does not improve the detection power of the tools significantly. The limited performance of the current CNV detection tools for WES data in cancer indicates the need for developing more efficient and precise CNV detection methods. Due to the complexity of tumors and high level of noise and biases in WES data, employing advanced novel segmentation, normalization and de-noising techniques that are designed specifically for cancer data is necessary. Also, CNV detection development suffers from the lack of a gold standard for performance evaluation. Finally, developing tools with user-friendly user interfaces and visualization features can enhance CNV studies for a broader range of users.
Miniature Transposable Sequences Are Frequently Mobilized in the Bacterial Plant Pathogen Pseudomonas syringae pv. phaseolicola

PubMed Central

Bardaji, Leire; Añorga, Maite; Jackson, Robert W.; Martínez-Bilbao, Alejandro; Yanguas-Casás, Natalia; Murillo, Jesús

2011-01-01

Mobile genetic elements are widespread in Pseudomonas syringae, and often associate with virulence genes. Genome reannotation of the model bean pathogen P. syringae pv. phaseolicola 1448A identified seventeen types of insertion sequences and two miniature inverted-repeat transposable elements (MITEs) with a biased distribution, representing 2.8% of the chromosome, 25.8% of the 132-kb virulence plasmid and 2.7% of the 52-kb plasmid. Employing an entrapment vector containing sacB, we estimated that transposition frequency oscillated between 2.6×10−5 and 1.1×10−6, depending on the clone, although it was stable for each clone after consecutive transfers in culture media. Transposition frequency was similar for bacteria grown in rich or minimal media, and from cells recovered from compatible and incompatible plant hosts, indicating that growth conditions do not influence transposition in strain 1448A. Most of the entrapped insertions contained a full-length IS801 element, with the remaining insertions corresponding to sequences smaller than any transposable element identified in strain 1448A, and collectively identified as miniature sequences. From these, fragments of 229, 360 and 679-nt of the right end of IS801 ended in a consensus tetranucleotide and likely resulted from one-ended transposition of IS801. An average 0.7% of the insertions analyzed consisted of IS801 carrying a fragment of variable size from gene PSPPH_0008/PSPPH_0017, showing that IS801 can mobilize DNA in vivo. Retrospective analysis of complete plasmids and genomes of P. syringae suggests, however, that most fragments of IS801 are likely the result of reorganizations rather than one-ended transpositions, and that this element might preferentially contribute to genome flexibility by generating homologous regions of recombination. A further miniature sequence previously found to affect host range specificity and virulence, designated MITEPsy1 (100-nt), represented an average 2.4% of the total number of insertions entrapped in sacB, demonstrating for the first time the mobilization of a MITE in bacteria. PMID:22016774
Miniature transposable sequences are frequently mobilized in the bacterial plant pathogen Pseudomonas syringae pv. phaseolicola.

PubMed

Bardaji, Leire; Añorga, Maite; Jackson, Robert W; Martínez-Bilbao, Alejandro; Yanguas-Casás, Natalia; Murillo, Jesús

2011-01-01

Mobile genetic elements are widespread in Pseudomonas syringae, and often associate with virulence genes. Genome reannotation of the model bean pathogen P. syringae pv. phaseolicola 1448A identified seventeen types of insertion sequences and two miniature inverted-repeat transposable elements (MITEs) with a biased distribution, representing 2.8% of the chromosome, 25.8% of the 132-kb virulence plasmid and 2.7% of the 52-kb plasmid. Employing an entrapment vector containing sacB, we estimated that transposition frequency oscillated between 2.6×10(-5) and 1.1×10(-6), depending on the clone, although it was stable for each clone after consecutive transfers in culture media. Transposition frequency was similar for bacteria grown in rich or minimal media, and from cells recovered from compatible and incompatible plant hosts, indicating that growth conditions do not influence transposition in strain 1448A. Most of the entrapped insertions contained a full-length IS801 element, with the remaining insertions corresponding to sequences smaller than any transposable element identified in strain 1448A, and collectively identified as miniature sequences. From these, fragments of 229, 360 and 679-nt of the right end of IS801 ended in a consensus tetranucleotide and likely resulted from one-ended transposition of IS801. An average 0.7% of the insertions analyzed consisted of IS801 carrying a fragment of variable size from gene PSPPH_0008/PSPPH_0017, showing that IS801 can mobilize DNA in vivo. Retrospective analysis of complete plasmids and genomes of P. syringae suggests, however, that most fragments of IS801 are likely the result of reorganizations rather than one-ended transpositions, and that this element might preferentially contribute to genome flexibility by generating homologous regions of recombination. A further miniature sequence previously found to affect host range specificity and virulence, designated MITEPsy1 (100-nt), represented an average 2.4% of the total number of insertions entrapped in sacB, demonstrating for the first time the mobilization of a MITE in bacteria.
Origin of the Y genome in Elymus and its relationship to other genomes in Triticeae based on evidence from elongation factor G (EF-G) gene sequences.

PubMed

Sun, Genlou; Komatsuda, Takao

2010-08-01

It is well known that Elymus arose through hybridization between representatives of different genera. Cytogenetic analyses show that all its members include the St genome in combination with one or more of four other genomes, the H, Y, P, and W genomes. The origins of the H, P, and W genomes are known, but not for the Y genome. We analyzed the single copy nuclear gene coding for elongation factor G (EF-G) from 28 accessions of polyploid Elymus species and 45 accessions of diploid Triticeae species in order to investigate origin of the Y genome and its relationship to other genomes in the tribe Triticeae. Sequence comparisons among the St, H, Y, P, W, and E genomes detected genome-specific polymorphisms at 66 nucleotide positions. The St and Y genomes are relatively dissimilar. The phylogeny of the Y genome sequences was investigated for the first time. They were most similar to the W genome sequences. The Y genome sequences were placed in two different groups. These two groups were included in an unresolved clade that included the W and E sequences as well as sequences from many annual species. The H genomes sequences were in a clade with the F, P, and Ns genome sequences as sister groups. These two clades were more closely related to each other and to the L and Xp genomes than they were to the St genome sequences. These data support the hypothesis that the Y genome evolved in a diploid species and has a different origin from the St genome. Copyright 2010 Elsevier Inc. All rights reserved.
Reduce Manual Curation by Combining Gene Predictions from Multiple Annotation Engines, a Case Study of Start Codon Prediction

PubMed Central

Ederveen, Thomas H. A.; Overmars, Lex; van Hijum, Sacha A. F. T.

2013-01-01

Nowadays, prokaryotic genomes are sequenced faster than the capacity to manually curate gene annotations. Automated genome annotation engines provide users a straight-forward and complete solution for predicting ORF coordinates and function. For many labs, the use of AGEs is therefore essential to decrease the time necessary for annotating a given prokaryotic genome. However, it is not uncommon for AGEs to provide different and sometimes conflicting predictions. Combining multiple AGEs might allow for more accurate predictions. Here we analyzed the ab initio open reading frame (ORF) calling performance of different AGEs based on curated genome annotations of eight strains from different bacterial species with GC% ranging from 35–52%. We present a case study which demonstrates a novel way of comparative genome annotation, using combinations of AGEs in a pre-defined order (or path) to predict ORF start codons. The order of AGE combinations is from high to low specificity, where the specificity is based on the eight genome annotations. For each AGE combination we are able to derive a so-called projected confidence value, which is the average specificity of ORF start codon prediction based on the eight genomes. The projected confidence enables estimating likeliness of a correct prediction for a particular ORF start codon by a particular AGE combination, pinpointing ORFs notoriously difficult to predict start codons. We correctly predict start codons for 90.5±4.8% of the genes in a genome (based on the eight genomes) with an accuracy of 81.1±7.6%. Our consensus-path methodology allows a marked improvement over majority voting (9.7±4.4%) and with an optimal path ORF start prediction sensitivity is gained while maintaining a high specificity. PMID:23675487
Rapid construction of genome map for large yellow croaker (Larimichthys crocea) by the whole-genome mapping in BioNano Genomics Irys system.

PubMed

Xiao, Shijun; Li, Jiongtang; Ma, Fengshou; Fang, Lujing; Xu, Shuangbin; Chen, Wei; Wang, Zhi Yong

2015-09-03

Large yellow croaker (Larimichthys crocea) is an important commercial fish in China and East-Asia. The annual product of the species from the aqua-farming industry is about 90 thousand tons. In spite of its economic importance, genetic studies of economic traits and genomic selections of the species are hindered by the lack of genomic resources. Specifically, a whole-genome physical map of large yellow croaker is still missing. The traditional BAC-based fingerprint method is extremely time- and labour-consuming. Here we report the first genome map construction using the high-throughput whole-genome mapping technique by nanochannel arrays in BioNano Genomics Irys system. For an optimal marker density of ~10 per 100 kb, the nicking endonuclease Nt.BspQ1 was chosen for the genome map generation. 645,305 DNA molecules with a total length of ~112 Gb were labelled and detected, covering more than 160X of the large yellow croaker genome. Employing IrysView package and signature patterns in raw DNA molecules, a whole-genome map of large yellow croaker was assembled into 686 maps with a total length of 727 Mb, which was consistent with the estimated genome size. The N50 length of the whole-genome map, including 126 maps, was up to 1.7 Mb. The excellent hybrid alignment with large yellow croaker draft genome validated the consensus genome map assembly and highlighted a promising application of whole-genome mapping on draft genome sequence super-scaffolding. The genome map data of large yellow croaker are accessible on lycgenomics.jmu.edu.cn/pm. Using the state-of-the-art whole-genome mapping technique in Irys system, the first whole-genome map for large yellow croaker has been constructed and thus highly facilitates the ongoing genomic and evolutionary studies for the species. To our knowledge, this is the first public report on genome map construction by the whole-genome mapping for aquatic-organisms. Our study demonstrates a promising application of the whole-genome mapping on genome maps construction for other non-model organisms in a fast and reliable manner.
Company profile: Complete Genomics Inc.

PubMed

Reid, Clifford

2011-02-01

Complete Genomics Inc. is a life sciences company that focuses on complete human genome sequencing. It is taking a completely different approach to DNA sequencing than other companies in the industry. Rather than building a general-purpose platform for sequencing all organisms and all applications, it has focused on a single application - complete human genome sequencing. The company's Complete Genomics Analysis Platform (CGA™ Platform) comprises an integrated package of biochemistry, instrumentation and software that sequences human genomes at the highest quality, lowest cost and largest scale available. Complete Genomics offers a turnkey service that enables customers to outsource their human genome sequencing to the company's genome sequencing center in Mountain View, CA, USA. Customers send in their DNA samples, the company does all the library preparation, DNA sequencing, assembly and variant analysis, and customers receive research-ready data that they can use for biological discovery.
Identification of estrogen-responsive genes using a genome-wide analysis of promoter elements for transcription factor binding sites.

PubMed

Kamalakaran, Sitharthan; Radhakrishnan, Senthil K; Beck, William T

2005-06-03

We developed a pipeline to identify novel genes regulated by the steroid hormone-dependent transcription factor, estrogen receptor, through a systematic analysis of upstream regions of all human and mouse genes. We built a data base of putative promoter regions for 23,077 human and 19,984 mouse transcripts from National Center for Biotechnology Information annotation and 8793 human and 6785 mouse promoters from the Data Base of Transcriptional Start Sites. We used this data base of putative promoters to identify potential targets of estrogen receptor by identifying estrogen response elements (EREs) in their promoters. Our program correctly identified EREs in genes known to be regulated by estrogen in addition to several new genes whose putative promoters contained EREs. We validated six genes (KIAA1243, NRIP1, MADH9, NME3, TPD52L, and ABCG2) to be estrogen-responsive in MCF7 cells using reverse transcription PCR. To allow for extensibility of our program in identifying targets of other transcription factors, we have built a Web interface to access our data base and programs. Our Web-based program for Promoter Analysis of Genome, PAGen@UIC, allows a user to identify putative target genes for vertebrate transcription factors through the analysis of their upstream sequences. The interface allows the user to search the human and mouse promoter data bases for potential target genes containing one or more listed transcription factor binding sites (TFBSs) in their upstream elements, using either regular expression-based consensus or position weight matrices. The data base can also be searched for promoters harboring user-defined TFBSs given as a consensus or a position weight matrix. Furthermore, the user can retrieve putative promoter sequences for any given gene together with identified TFBSs located on its promoter. Orthologous promoters are also analyzed to determine conserved elements.
A High-Density Consensus Map of Common Wheat Integrating Four Mapping Populations Scanned by the 90K SNP Array

PubMed Central

Wen, Weie; He, Zhonghu; Gao, Fengmei; Liu, Jindong; Jin, Hui; Zhai, Shengnan; Qu, Yanying; Xia, Xianchun

2017-01-01

A high-density consensus map is a powerful tool for gene mapping, cloning and molecular marker-assisted selection in wheat breeding. The objective of this study was to construct a high-density, single nucleotide polymorphism (SNP)-based consensus map of common wheat (Triticum aestivum L.) by integrating genetic maps from four recombinant inbred line populations. The populations were each genotyped using the wheat 90K Infinium iSelect SNP assay. A total of 29,692 SNP markers were mapped on 21 linkage groups corresponding to 21 hexaploid wheat chromosomes, covering 2,906.86 cM, with an overall marker density of 10.21 markers/cM. Compared with the previous maps based on the wheat 90K SNP chip detected 22,736 (76.6%) of the SNPs with consistent chromosomal locations, whereas 1,974 (6.7%) showed different chromosomal locations, and 4,982 (16.8%) were newly mapped. Alignment of the present consensus map and the wheat expressed sequence tags (ESTs) Chromosome Bin Map enabled assignment of 1,221 SNP markers to specific chromosome bins and 819 ESTs were integrated into the consensus map. The marker orders of the consensus map were validated based on physical positions on the wheat genome with Spearman rank correlation coefficients ranging from 0.69 (4D) to 0.97 (1A, 4B, 5B, and 6A), and were also confirmed by comparison with genetic position on the previously 40K SNP consensus map with Spearman rank correlation coefficients ranging from 0.84 (6D) to 0.99 (6A). Chromosomal rearrangements reported previously were confirmed in the present consensus map and new putative rearrangements were identified. In addition, an integrated consensus map was developed through the combination of five published maps with ours, containing 52,607 molecular markers. The consensus map described here provided a high-density SNP marker map and a reliable order of SNPs, representing a step forward in mapping and validation of chromosomal locations of SNPs on the wheat 90K array. Moreover, it can be used as a reference for quantitative trait loci (QTL) mapping to facilitate exploitation of genes and QTL in wheat breeding. PMID:28848588
Curated eutherian third party data gene data sets.

PubMed

Premzl, Marko

2016-03-01

The free available eutherian genomic sequence data sets advanced scientific field of genomics. Of note, future revisions of gene data sets were expected, due to incompleteness of public eutherian genomic sequence assemblies and potential genomic sequence errors. The eutherian comparative genomic analysis protocol was proposed as guidance in protection against potential genomic sequence errors in public eutherian genomic sequences. The protocol was applicable in updates of 7 major eutherian gene data sets, including 812 complete coding sequences deposited in European Nucleotide Archive as curated third party data gene data sets.
International consensus for neuroblastoma molecular diagnostics: report from the International Neuroblastoma Risk Group (INRG) Biology Committee

PubMed Central

Ambros, P F; Ambros, I M; Brodeur, G M; Haber, M; Khan, J; Nakagawara, A; Schleiermacher, G; Speleman, F; Spitz, R; London, W B; Cohn, S L; Pearson, A D J; Maris, J M

2009-01-01

Neuroblastoma serves as a paradigm for utilising tumour genomic data for determining patient prognosis and treatment allocation. However, before the establishment of the International Neuroblastoma Risk Group (INRG) Task Force in 2004, international consensus on markers, methodology, and data interpretation did not exist, compromising the reliability of decisive genetic markers and inhibiting translational research efforts. The objectives of the INRG Biology Committee were to identify highly prognostic genetic aberrations to be included in the new INRG risk classification schema and to develop precise definitions, decisive biomarkers, and technique standardisation. The review of the INRG database (n=8800 patients) by the INRG Task Force finally enabled the identification of the most significant neuroblastoma biomarkers. In addition, the Biology Committee compared the standard operating procedures of different cooperative groups to arrive at international consensus for methodology, nomenclature, and future directions. Consensus was reached to include MYCN status, 11q23 allelic status, and ploidy in the INRG classification system on the basis of an evidence-based review of the INRG database. Standardised operating procedures for analysing these genetic factors were adopted, and criteria for proper nomenclature were developed. Neuroblastoma treatment planning is highly dependant on tumour cell genomic features, and it is likely that a comprehensive panel of DNA-based biomarkers will be used in future risk assignment algorithms applying genome-wide techniques. Consensus on methodology and interpretation is essential for uniform INRG classification and will greatly facilitate international and cooperative clinical and translational research studies. PMID:19401703
Intragenomic polymorphisms among high-copy loci: a genus-wide study of nuclear ribosomal DNA in Asclepias (Apocynaceae).

PubMed

Weitemier, Kevin; Straub, Shannon C K; Fishbein, Mark; Liston, Aaron

2015-01-01

Despite knowledge that concerted evolution of high-copy loci is often imperfect, studies that investigate the extent of intragenomic polymorphisms and comparisons across a large number of species are rarely made. We present a bioinformatic pipeline for characterizing polymorphisms within an individual among copies of a high-copy locus. Results are presented for nuclear ribosomal DNA (nrDNA) across the milkweed genus, Asclepias. The 18S-26S portion of the nrDNA cistron of Asclepias syriaca served as a reference for assembly of the region from 124 samples representing 90 species of Asclepias. Reads were mapped back to each individual's consensus and at each position reads differing from the consensus were tallied using a custom perl script. Low frequency polymorphisms existed in all individuals (mean = 5.8%). Most nrDNA positions (91%) were polymorphic in at least one individual, with polymorphic sites being less frequent in subunit regions and loops. Highly polymorphic sites existed in each individual, with highest abundance in the "noncoding" ITS regions. Phylogenetic signal was present in the distribution of intragenomic polymorphisms across the genus. Intragenomic polymorphisms in nrDNA are common in Asclepias, being found at higher frequency than any other study to date. The high and variable frequency of polymorphisms across species highlights concerns that phylogenetic applications of nrDNA may be error-prone. The new analytical approach provided here is applicable to other taxa and other high-copy regions characterized by low coverage genome sequencing (genome skimming).
Development and validation of a 20K single nucleotide polymorphism (SNP) whole genome genotyping array for apple (Malus × domestica Borkh).

PubMed

Bianco, Luca; Cestaro, Alessandro; Sargent, Daniel James; Banchi, Elisa; Derdak, Sophia; Di Guardo, Mario; Salvi, Silvio; Jansen, Johannes; Viola, Roberto; Gut, Ivo; Laurens, Francois; Chagné, David; Velasco, Riccardo; van de Weg, Eric; Troggio, Michela

2014-01-01

High-density SNP arrays for genome-wide assessment of allelic variation have made high resolution genetic characterization of crop germplasm feasible. A medium density array for apple, the IRSC 8K SNP array, has been successfully developed and used for screens of bi-parental populations. However, the number of robust and well-distributed markers contained on this array was not sufficient to perform genome-wide association analyses in wider germplasm sets, or Pedigree-Based Analysis at high precision, because of rapid decay of linkage disequilibrium. We describe the development of an Illumina Infinium array targeting 20K SNPs. The SNPs were predicted from re-sequencing data derived from the genomes of 13 Malus × domestica apple cultivars and one accession belonging to a crab apple species (M. micromalus). A pipeline for SNP selection was devised that avoided the pitfalls associated with the inclusion of paralogous sequence variants, supported the construction of robust multi-allelic SNP haploblocks and selected up to 11 entries within narrow genomic regions of ±5 kb, termed focal points (FPs). Broad genome coverage was attained by placing FPs at 1 cM intervals on a consensus genetic map, complementing them with FPs to enrich the ends of each of the chromosomes, and by bridging physical intervals greater than 400 Kbps. The selection also included ∼3.7K validated SNPs from the IRSC 8K array. The array has already been used in other studies where ∼15.8K SNP markers were mapped with an average of ∼6.8K SNPs per full-sib family. The newly developed array with its high density of polymorphic validated SNPs is expected to be of great utility for Pedigree-Based Analysis and Genomic Selection. It will also be a valuable tool to help dissect the genetic mechanisms controlling important fruit quality traits, and to aid the identification of marker-trait associations suitable for the application of Marker Assisted Selection in apple breeding programs.
Development and Validation of a 20K Single Nucleotide Polymorphism (SNP) Whole Genome Genotyping Array for Apple (Malus × domestica Borkh)

PubMed Central

Bianco, Luca; Cestaro, Alessandro; Sargent, Daniel James; Banchi, Elisa; Derdak, Sophia; Di Guardo, Mario; Salvi, Silvio; Jansen, Johannes; Viola, Roberto; Gut, Ivo; Laurens, Francois; Chagné, David; Velasco, Riccardo; van de Weg, Eric; Troggio, Michela

2014-01-01

High-density SNP arrays for genome-wide assessment of allelic variation have made high resolution genetic characterization of crop germplasm feasible. A medium density array for apple, the IRSC 8K SNP array, has been successfully developed and used for screens of bi-parental populations. However, the number of robust and well-distributed markers contained on this array was not sufficient to perform genome-wide association analyses in wider germplasm sets, or Pedigree-Based Analysis at high precision, because of rapid decay of linkage disequilibrium. We describe the development of an Illumina Infinium array targeting 20K SNPs. The SNPs were predicted from re-sequencing data derived from the genomes of 13 Malus × domestica apple cultivars and one accession belonging to a crab apple species (M. micromalus). A pipeline for SNP selection was devised that avoided the pitfalls associated with the inclusion of paralogous sequence variants, supported the construction of robust multi-allelic SNP haploblocks and selected up to 11 entries within narrow genomic regions of ±5 kb, termed focal points (FPs). Broad genome coverage was attained by placing FPs at 1 cM intervals on a consensus genetic map, complementing them with FPs to enrich the ends of each of the chromosomes, and by bridging physical intervals greater than 400 Kbps. The selection also included ∼3.7K validated SNPs from the IRSC 8K array. The array has already been used in other studies where ∼15.8K SNP markers were mapped with an average of ∼6.8K SNPs per full-sib family. The newly developed array with its high density of polymorphic validated SNPs is expected to be of great utility for Pedigree-Based Analysis and Genomic Selection. It will also be a valuable tool to help dissect the genetic mechanisms controlling important fruit quality traits, and to aid the identification of marker-trait associations suitable for the application of Marker Assisted Selection in apple breeding programs. PMID:25303088
Identification of two novel functional p53 responsive elements in the Herpes Simplex Virus-1 genome

PubMed Central

Hsieh, Jui-Cheng; Kuta, Ryan; Armour, Courtney R.; Boehmer, Paul E.

2014-01-01

Analysis of the herpes simplex virus-1 (HSV-1) genome reveals two candidate p53 responsive elements (p53RE), located in proximity to the replication origins oriL and oriS, referred to as p53RE-L and p53RE-S, respectively. The sequences of p53RE-L and p53RE-S conform to the p53 consensus site and are present in HSV-1 strains KOS, 17, and F. p53 binds to both elements in vitro and in virus-infected cells. Both p53RE-L and p53RE-S are capable of conferring p53-dependent transcriptional activation onto a heterologous reporter gene. Importantly, expression of the essential immediate early viral transactivator ICP4 and the essential DNA replication protein ICP8, that are adjacent to p53RE-S and p53RE-L, are repressed in a p53-dependent manner. Taken together, this study identifies two novel functional p53RE in the HSV-1 genome and suggests a complex mechanism of viral gene regulation by p53 which may determine progression of the lytic viral replication cycle or the establishment of latency. PMID:25010269
Approaches for in silico finishing of microbial genome sequences

PubMed Central

Kremer, Frederico Schmitt; McBride, Alan John Alexander; Pinto, Luciano da Silva

2017-01-01

Abstract The introduction of next-generation sequencing (NGS) had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations implied by the short-read sequencing platforms, most of these newly sequenced genomes remained as “drafts”, incomplete representations of the whole genetic content. The previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even bacteria, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize the genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases) tools that are available to facilitate genome finishing. PMID:28898352
Approaches for in silico finishing of microbial genome sequences.

PubMed

Kremer, Frederico Schmitt; McBride, Alan John Alexander; Pinto, Luciano da Silva

The introduction of next-generation sequencing (NGS) had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations implied by the short-read sequencing platforms, most of these newly sequenced genomes remained as "drafts", incomplete representations of the whole genetic content. The previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even bacteria, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize the genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases) tools that are available to facilitate genome finishing.
DNA sequence analysis of ARS elements from chromosome III of Saccharomyces cerevisiae: identification of a new conserved sequence.

PubMed Central

Palzkill, T G; Oliver, S G; Newlon, C S

1986-01-01

Four fragments of Saccharomyces cerevisiae chromosome III DNA which carry ARS elements have been sequenced. Each fragment contains multiple copies of sequences that have at least 10 out of 11 bases of homology to a previously reported 11 bp core consensus sequence. A survey of these new ARS sequences and previously reported sequences revealed the presence of an additional 11 bp conserved element located on the 3' side of the T-rich strand of the core consensus. Subcloning analysis as well as deletion and transposon insertion mutagenesis of ARS fragments support a role for 3' conserved sequence in promoting ARS activity. PMID:3529036
Enterocin TW21, a novel bacteriocin from dochi-isolated Enterococcus faecium D081821.

PubMed

Chang, S-Y; Chen, Y-S; Pan, S-F; Lee, Y-S; Chang, C-H; Chang, C-H; Yu, B; Wu, H-C

2013-09-01

Purification and characterization of a novel bacteriocin produced by strain Enterococcus faecium D081821. Enterococcus faecium D081821, isolated from the traditional Taiwanese fermented food dochi (fermented black beans), was previously found to produce a bacteriocin against Listeria monocytogenes and some Gram-positive bacteria. This bacteriocin, termed enterocin TW21, was purified from culture supernatant by ammonium sulfate precipitation, Sep-Pak C18 cartridge, ion-exchange and gel filtration chromatography. Mass spectrometry analysis showed the mass of the peptide to be approximately 5300·6 Da. The N-terminal amino acid sequencing yielded a partial sequence NH2 -ATYYGNGVYxNTQK by Edman degradation, and it contains the consensus class IIa bacteriocin motif YGNGV in the N-terminal region. The open reading frame (ORF) encoding the bacteriocin was identified from the draft genome sequence of Enterococcus faecium D081821, and sequence analysis of this peptide indicated that enterocin TW21 is a novel bacteriocin. Enterococcus faecium D081821 produced a bacteriocin named enterocin TW21, the molecular weight and amino acid sequence both revealed it to be a novel bacteriocin. A new member of class IIa bacteriocin was identified. This bacteriocin shows great inhibitory ability against L. monocytogenes and could be applied as a natural food preservative. © 2013 The Society for Applied Microbiology.
Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence

PubMed Central

2011-01-01

Background Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an abundance of repeated sequences, and polyploidy present challenges to genome-wide SNP discovery using next-generation sequencing (NGS) of total genomic DNA by making alignment and clustering of short reads generated by the NGS platforms difficult, particularly in the absence of a reference genome sequence. Results An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions from repetitive sequences and sequences shared by paralogous genes. Multiple genome equivalents of shotgun reads of another genotype generated with SOLiD or Solexa are then mapped to the annotated Roche 454 reads to identify putative SNPs. A pipeline program package, AGSNP, was developed and used for genome-wide SNP discovery in Aegilops tauschii-the diploid source of the wheat D genome, and with a genome size of 4.02 Gb, of which 90% is repetitive sequences. Genomic DNA of Ae. tauschii accession AL8/78 was sequenced with the Roche 454 NGS platform. Genomic DNA and cDNA of Ae. tauschii accession AS75 was sequenced primarily with SOLiD, although some Solexa and Roche 454 genomic sequences were also generated. A total of 195,631 putative SNPs were discovered in gene sequences, 155,580 putative SNPs were discovered in uncharacterized single-copy regions, and another 145,907 putative SNPs were discovered in repeat junctions. These SNPs were dispersed across the entire Ae. tauschii genome. To assess the false positive SNP discovery rate, DNA containing putative SNPs was amplified by PCR from AL8/78 and AS75 and resequenced with the ABI 3730 xl. In a sample of 302 randomly selected putative SNPs, 84.0% in gene regions, 88.0% in repeat junctions, and 81.3% in uncharacterized regions were validated. Conclusion An annotation-based genome-wide SNP discovery pipeline for NGS platforms was developed. The pipeline is suitable for SNP discovery in genomic libraries of complex genomes and does not require a reference genome sequence. The pipeline is applicable to all current NGS platforms, provided that at least one such platform generates relatively long reads. The pipeline package, AGSNP, and the discovered 497,118 Ae. tauschii SNPs can be accessed at (http://avena.pw.usda.gov/wheatD/agsnp.shtml). PMID:21266061

The Functional Human C-Terminome

PubMed Central

Hedden, Michael; Lyon, Kenneth F.; Brooks, Steven B.; David, Roxanne P.; Limtong, Justin; Newsome, Jacklyn M.; Novakovic, Nemanja; Rajasekaran, Sanguthevar; Thapar, Vishal; Williams, Sean R.; Schiller, Martin R.

2016-01-01

All translated proteins end with a carboxylic acid commonly called the C-terminus. Many short functional sequences (minimotifs) are located on or immediately proximal to the C-terminus. However, information about the function of protein C-termini has not been consolidated into a single source. Here, we built a new “C-terminome” database and web system focused on human proteins. Approximately 3,600 C-termini in the human proteome have a minimotif with an established molecular function. To help evaluate the function of the remaining C-termini in the human proteome, we inferred minimotifs identified by experimentation in rodent cells, predicted minimotifs based upon consensus sequence matches, and predicted novel highly repetitive sequences in C-termini. Predictions can be ranked by enrichment scores or Gene Evolutionary Rate Profiling (GERP) scores, a measurement of evolutionary constraint. By searching for new anchored sequences on the last 10 amino acids of proteins in the human proteome with lengths between 3–10 residues and up to 5 degenerate positions in the consensus sequences, we have identified new consensus sequences that predict instances in the majority of human genes. All of this information is consolidated into a database that can be accessed through a C-terminome web system with search and browse functions for minimotifs and human proteins. A known consensus sequence-based predicted function is assigned to nearly half the proteins in the human proteome. Weblink: http://cterminome.bio-toolkit.com. PMID:27050421
Whole Genome Sequencing of Greater Amberjack (Seriola dumerili) for SNP Identification on Aligned Scaffolds and Genome Structural Variation Analysis Using Parallel Resequencing

PubMed Central

Aokic, Jun-ya; Kawase, Junya; Hamada, Kazuhisa; Fujimoto, Hiroshi; Yamamoto, Ikki; Usuki, Hironori

2018-01-01

Greater amberjack (Seriola dumerili) is distributed in tropical and temperate waters worldwide and is an important aquaculture fish. We carried out de novo sequencing of the greater amberjack genome to construct a reference genome sequence to identify single nucleotide polymorphisms (SNPs) for breeding amberjack by marker-assisted or gene-assisted selection as well as to identify functional genes for biological traits. We obtained 200 times coverage and constructed a high-quality genome assembly using next generation sequencing technology. The assembled sequences were aligned onto a yellowtail (Seriola quinqueradiata) radiation hybrid (RH) physical map by sequence homology. A total of 215 of the longest amberjack sequences, with a total length of 622.8 Mbp (92% of the total length of the genome scaffolds), were lined up on the yellowtail RH map. We resequenced the whole genomes of 20 greater amberjacks and mapped the resulting sequences onto the reference genome sequence. About 186,000 nonredundant SNPs were successfully ordered on the reference genome. Further, we found differences in the genome structural variations between two greater amberjack populations using BreakDancer. We also analyzed the greater amberjack transcriptome and mapped the annotated sequences onto the reference genome sequence. PMID:29785397
Rapid and accurate pyrosequencing of angiosperm plastid genomes

PubMed Central

Moore, Michael J; Dhingra, Amit; Soltis, Pamela S; Shaw, Regina; Farmerie, William G; Folta, Kevin M; Soltis, Douglas E

2006-01-01

Background Plastid genome sequence information is vital to several disciplines in plant biology, including phylogenetics and molecular biology. The past five years have witnessed a dramatic increase in the number of completely sequenced plastid genomes, fuelled largely by advances in conventional Sanger sequencing technology. Here we report a further significant reduction in time and cost for plastid genome sequencing through the successful use of a newly available pyrosequencing platform, the Genome Sequencer 20 (GS 20) System (454 Life Sciences Corporation), to rapidly and accurately sequence the whole plastid genomes of the basal eudicot angiosperms Nandina domestica (Berberidaceae) and Platanus occidentalis (Platanaceae). Results More than 99.75% of each plastid genome was simultaneously obtained during two GS 20 sequence runs, to an average depth of coverage of 24.6× in Nandina and 17.3× in Platanus. The Nandina and Platanus plastid genomes shared essentially identical gene complements and possessed the typical angiosperm plastid structure and gene arrangement. To assess the accuracy of the GS 20 sequence, over 45 kilobases of sequence were generated for each genome using conventional sequencing. Overall error rates of 0.043% and 0.031% were observed in GS 20 sequence for Nandina and Platanus, respectively. More than 97% of all observed errors were associated with homopolymer runs, with ~60% of all errors associated with homopolymer runs of 5 or more nucleotides and ~50% of all errors associated with regions of extensive homopolymer runs. No substitution errors were present in either genome. Error rates were generally higher in the single-copy and noncoding regions of both plastid genomes relative to the inverted repeat and coding regions. Conclusion Highly accurate and essentially complete sequence information was obtained for the Nandina and Platanus plastid genomes using the GS 20 System. More importantly, the high accuracy observed in the GS 20 plastid genome sequence was generated for a significant reduction in time and cost over traditional shotgun-based genome sequencing techniques, although with approximately half the coverage of previously reported GS 20 de novo genome sequence. The GS 20 should be broadly applicable to angiosperm plastid genome sequencing, and therefore promises to expand the scale of plant genetic and phylogenetic research dramatically. PMID:16934154
Genome Sequencing of Steroid Producing Bacteria Using Ion Torrent Technology and a Reference Genome.

PubMed

Sola-Landa, Alberto; Rodríguez-García, Antonio; Barreiro, Carlos; Pérez-Redondo, Rosario

2017-01-01

The Next-Generation Sequencing technology has enormously eased the bacterial genome sequencing and several tens of thousands of genomes have been sequenced during the last 10 years. Most of the genome projects are published as draft version, however, for certain applications the complete genome sequence is required.In this chapter, we describe the strategy that allowed the complete genome sequencing of Mycobacterium neoaurum NRRL B-3805, an industrial strain exploited for steroid production, using Ion Torrent sequencing reads and the genome of a close strain as the reference. This protocol can be applied to analyze the genetic variations between closely related strains; for example, to elucidate the point mutations between a parental strain and a random mutagenesis-derived mutant.
BioNano genome mapping of individual chromosomes supports physical mapping and sequence assembly in complex plant genomes.

PubMed

Staňková, Helena; Hastie, Alex R; Chan, Saki; Vrána, Jan; Tulpová, Zuzana; Kubaláková, Marie; Visendi, Paul; Hayashi, Satomi; Luo, Mingcheng; Batley, Jacqueline; Edwards, David; Doležel, Jaroslav; Šimková, Hana

2016-07-01

The assembly of a reference genome sequence of bread wheat is challenging due to its specific features such as the genome size of 17 Gbp, polyploid nature and prevalence of repetitive sequences. BAC-by-BAC sequencing based on chromosomal physical maps, adopted by the International Wheat Genome Sequencing Consortium as the key strategy, reduces problems caused by the genome complexity and polyploidy, but the repeat content still hampers the sequence assembly. Availability of a high-resolution genomic map to guide sequence scaffolding and validate physical map and sequence assemblies would be highly beneficial to obtaining an accurate and complete genome sequence. Here, we chose the short arm of chromosome 7D (7DS) as a model to demonstrate for the first time that it is possible to couple chromosome flow sorting with genome mapping in nanochannel arrays and create a de novo genome map of a wheat chromosome. We constructed a high-resolution chromosome map composed of 371 contigs with an N50 of 1.3 Mb. Long DNA molecules achieved by our approach facilitated chromosome-scale analysis of repetitive sequences and revealed a ~800-kb array of tandem repeats intractable to current DNA sequencing technologies. Anchoring 7DS sequence assemblies obtained by clone-by-clone sequencing to the 7DS genome map provided a valuable tool to improve the BAC-contig physical map and validate sequence assembly on a chromosome-arm scale. Our results indicate that creating genome maps for the whole wheat genome in a chromosome-by-chromosome manner is feasible and that they will be an affordable tool to support the production of improved pseudomolecules. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
A first generation BAC-based physical map of the rainbow trout genome

PubMed Central

Palti, Yniv; Luo, Ming-Cheng; Hu, Yuqin; Genet, Carine; You, Frank M; Vallejo, Roger L; Thorgaard, Gary H; Wheeler, Paul A; Rexroad, Caird E

2009-01-01

Background Rainbow trout (Oncorhynchus mykiss) are the most-widely cultivated cold freshwater fish in the world and an important model species for many research areas. Coupling great interest in this species as a research model with the need for genetic improvement of aquaculture production efficiency traits justifies the continued development of genomics research resources. Many quantitative trait loci (QTL) have been identified for production and life-history traits in rainbow trout. A bacterial artificial chromosome (BAC) physical map is needed to facilitate fine mapping of QTL and the selection of positional candidate genes for incorporation in marker-assisted selection (MAS) for improving rainbow trout aquaculture production. This resource will also facilitate efforts to obtain and assemble a whole-genome reference sequence for this species. Results The physical map was constructed from DNA fingerprinting of 192,096 BAC clones using the 4-color high-information content fingerprinting (HICF) method. The clones were assembled into physical map contigs using the finger-printing contig (FPC) program. The map is composed of 4,173 contigs and 9,379 singletons. The total number of unique fingerprinting fragments (consensus bands) in contigs is 1,185,157, which corresponds to an estimated physical length of 2.0 Gb. The map assembly was validated by 1) comparison with probe hybridization results and agarose gel fingerprinting contigs; and 2) anchoring large contigs to the microsatellite-based genetic linkage map. Conclusion The production and validation of the first BAC physical map of the rainbow trout genome is described in this paper. We are currently integrating this map with the NCCCWA genetic map using more than 200 microsatellites isolated from BAC end sequences and by identifying BACs that harbor more than 300 previously mapped markers. The availability of an integrated physical and genetic map will enable detailed comparative genome analyses, fine mapping of QTL, positional cloning, selection of positional candidate genes for economically important traits and the incorporation of MAS into rainbow trout breeding programs. PMID:19814815
Human Adenovirus Core Protein V Is Targeted by the Host SUMOylation Machinery To Limit Essential Viral Functions.

PubMed

Freudenberger, Nora; Meyer, Tina; Groitl, Peter; Dobner, Thomas; Schreiner, Sabrina

2018-02-15

Human adenoviruses (HAdV) are nonenveloped viruses containing a linear, double-stranded DNA genome surrounded by an icosahedral capsid. To allow proper viral replication, the genome is imported through the nuclear pore complex associated with viral core proteins. Until now, the role of these incoming virion proteins during the early phase of infection was poorly understood. The core protein V is speculated to bridge the core and the surrounding capsid. It binds the genome in a sequence-independent manner and localizes in the nucleus of infected cells, accumulating at nucleoli. Here, we show that protein V contains conserved SUMO conjugation motifs (SCMs). Mutation of these consensus motifs resulted in reduced SUMOylation of the protein; thus, protein V represents a novel target of the host SUMOylation machinery. To understand the role of protein V SUMO posttranslational modification during productive HAdV infection, we generated a replication-competent HAdV with SCM mutations within the protein V coding sequence. Phenotypic analyses revealed that these SCM mutations are beneficial for adenoviral replication. Blocking protein V SUMOylation at specific sites shifts the onset of viral DNA replication to earlier time points during infection and promotes viral gene expression. Simultaneously, the altered kinetics within the viral life cycle are accompanied by more efficient proteasomal degradation of host determinants and increased virus progeny production than that observed during wild-type infection. Taken together, our studies show that protein V SUMOylation reduces virus growth; hence, protein V SUMOylation represents an important novel aspect of the host antiviral strategy to limit virus replication and thereby points to potential intervention strategies. IMPORTANCE Many decades of research have revealed that HAdV structural proteins promote viral entry and mainly physical stability of the viral genome in the capsid. Our work over the last years showed that this concept needs expansion as the functions are more diverse. We showed that capsid protein VI regulates the antiviral response by modulation of the transcription factor Daxx during infection. Moreover, core protein VII interacts with SPOC1 restriction factor, which is beneficial for efficient viral gene expression. Here, we were able to show that core protein V also represents a novel substrate of the host SUMOylation machinery and contains several conserved SCMs; mutation of these consensus motifs reduced SUMOylation of the protein. Unexpectedly, we observed that introducing these mutations into HAdV promotes adenoviral replication. In conclusion, we offer novel insights into adenovirus core proteins and provide evidence that SUMOylation of HAdV factors regulates replication efficiency. Copyright © 2018 American Society for Microbiology.
Whole-genome random sequencing and assembly of Haemophilus influenzae Rd

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fleischmann, R.D.; Adams, M.D.; White, O.

1995-07-28

An approach for genome analysis based on sequencing and assembly of unselected pieces of DNA from the whole chromosome has been applied to obtain the complete nucleotide sequence (1,830,137 base pairs) of the genome from the bacterium Haemophilus influenzae Rd. This approach eliminates the need for initial mapping efforts and is therefore applicable to the vast array of microbial species for which genome maps are unavailable. The H. influenzae Rd genome sequence (Genome Sequence DataBase accession number L42023) represents the only complete genome sequence from a free-living organism. 46 refs., 4 figs., 4 tabs.
Novel poly-uridine insertion in the 3'UTR and E2 amino acid substitutions in a low virulent classical swine fever virus.

PubMed

Coronado, Liani; Liniger, Matthias; Muñoz-González, Sara; Postel, Alexander; Pérez, Lester Josue; Pérez-Simó, Marta; Perera, Carmen Laura; Frías-Lepoureau, Maria Teresa; Rosell, Rosa; Grundhoff, Adam; Indenbirken, Daniela; Alawi, Malik; Fischer, Nicole; Becher, Paul; Ruggli, Nicolas; Ganges, Llilianne

2017-03-01

In this study, we compared the virulence in weaner pigs of the Pinar del Rio isolate and the virulent Margarita strain. The latter caused the Cuban classical swine fever (CSF) outbreak of 1993. Our results showed that the Pinar del Rio virus isolated during an endemic phase is clearly of low virulence. We analysed the complete nucleotide sequence of the Pinar del Rio virus isolated after persistence in newborn piglets, as well as the genome sequence of the inoculum. The consensus genome sequence of the Pinar del Rio virus remained completely unchanged after 28days of persistent infection in swine. More importantly, a unique poly-uridine tract was discovered in the 3'UTR of the Pinar del Rio virus, which was not found in the Margarita virus or any other known CSFV sequences. Based on RNA secondary structure prediction, the poly-uridine tract results in a long single-stranded intervening sequence (SS) between the stem-loops I and II of the 3'UTR, without major changes in the stem- loop structures when compared to the Margarita virus. The possible implications of this novel insertion on persistence and attenuation remain to be investigated. In addition, comparison of the amino acid sequence of the viral proteins E rns , E1, E2 and p7 of the Margarita and Pinar del Rio viruses showed that all non-conservative amino acid substitutions acquired by the Pinar del Rio isolate clustered in E2, with two of them being located within the B/C domain. Immunisation and cross-neutralisation experiments in pigs and rabbits suggest differences between these two viruses, which may be attributable to the amino acid differences observed in E2. Altogether, these data provide fresh insights into viral molecular features which might be associated with the attenuation and adaptation of CSFV for persistence in the field. Copyright © 2017 Elsevier B.V. All rights reserved.
Improving transmembrane protein consensus topology prediction using inter-helical interaction.

PubMed

Wang, Han; Zhang, Chao; Shi, Xiaohu; Zhang, Li; Zhou, You

2012-11-01

Alpha helix transmembrane proteins (αTMPs) represent roughly 30% of all open reading frames (ORFs) in a typical genome and are involved in many critical biological processes. Due to the special physicochemical properties, it is hard to crystallize and obtain high resolution structures experimentally, thus, sequence-based topology prediction is highly desirable for the study of transmembrane proteins (TMPs), both in structure prediction and function prediction. Various model-based topology prediction methods have been developed, but the accuracy of those individual predictors remain poor due to the limitation of the methods or the features they used. Thus, the consensus topology prediction method becomes practical for high accuracy applications by combining the advances of the individual predictors. Here, based on the observation that inter-helical interactions are commonly found within the transmembrane helixes (TMHs) and strongly indicate the existence of them, we present a novel consensus topology prediction method for αTMPs, CNTOP, which incorporates four top leading individual topology predictors, and further improves the prediction accuracy by using the predicted inter-helical interactions. The method achieved 87% prediction accuracy based on a benchmark dataset and 78% accuracy based on a non-redundant dataset which is composed of polytopic αTMPs. Our method derives the highest topology accuracy than any other individual predictors and consensus predictors, at the same time, the TMHs are more accurately predicted in their length and locations, where both the false positives (FPs) and the false negatives (FNs) decreased dramatically. The CNTOP is available at: http://ccst.jlu.edu.cn/JCSB/cntop/CNTOP.html. Copyright © 2012 Elsevier B.V. All rights reserved.
Fungal genome sequencing: basic biology to biotechnology.

PubMed

Sharma, Krishna Kant

2016-08-01

The genome sequences provide a first glimpse into the genomic basis of the biological diversity of filamentous fungi and yeast. The genome sequence of the budding yeast, Saccharomyces cerevisiae, with a small genome size, unicellular growth, and rich history of genetic and molecular analyses was a milestone of early genomics in the 1990s. The subsequent completion of fission yeast, Schizosaccharomyces pombe and genetic model, Neurospora crassa initiated a revolution in the genomics of the fungal kingdom. In due course of time, a substantial number of fungal genomes have been sequenced and publicly released, representing the widest sampling of genomes from any eukaryotic kingdom. An ambitious genome-sequencing program provides a wealth of data on metabolic diversity within the fungal kingdom, thereby enhancing research into medical science, agriculture science, ecology, bioremediation, bioenergy, and the biotechnology industry. Fungal genomics have higher potential to positively affect human health, environmental health, and the planet's stored energy. With a significant increase in sequenced fungal genomes, the known diversity of genes encoding organic acids, antibiotics, enzymes, and their pathways has increased exponentially. Currently, over a hundred fungal genome sequences are publicly available; however, no inclusive review has been published. This review is an initiative to address the significance of the fungal genome-sequencing program and provides the road map for basic and applied research.
Genome Improvement at JGI-HAGSC

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grimwood, Jane; Schmutz, Jeremy J.; Myers, Richard M.

Since the completion of the sequencing of the human genome, the Joint Genome Institute (JGI) has rapidly expanded its scientific goals in several DOE mission-relevant areas. At the JGI-HAGSC, we have kept pace with this rapid expansion of projects with our focus on assessing, assembling, improving and finishing eukaryotic whole genome shotgun (WGS) projects for which the shotgun sequence is generated at the Production Genomic Facility (JGI-PGF). We follow this by combining the draft WGS with genomic resources generated at JGI-HAGSC or in collaborator laboratories (including BAC end sequences, genetic maps and FLcDNA sequences) to produce an improved draft sequence.more » For eukaryotic genomes important to the DOE mission, we then add further information from directed experiments to produce reference genomic sequences that are publicly available for any scientific researcher. Also, we have continued our program for producing BAC-based finished sequence, both for adding information to JGI genome projects and for small BAC-based sequencing projects proposed through any of the JGI sequencing programs. We have now built our computational expertise in WGS assembly and analysis and have moved eukaryotic genome assembly from the JGI-PGF to JGI-HAGSC. We have concentrated our assembly development work on large plant genomes and complex fungal and algal genomes.« less
Using Partial Genomic Fosmid Libraries for Sequencing CompleteOrganellar Genomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

McNeal, Joel R.; Leebens-Mack, James H.; Arumuganathan, K.

2005-08-26

Organellar genome sequences provide numerous phylogenetic markers and yield insight into organellar function and molecular evolution. These genomes are much smaller in size than their nuclear counterparts; thus, their complete sequencing is much less expensive than total nuclear genome sequencing, making broader phylogenetic sampling feasible. However, for some organisms it is challenging to isolate plastid DNA for sequencing using standard methods. To overcome these difficulties, we constructed partial genomic libraries from total DNA preparations of two heterotrophic and two autotrophic angiosperm species using fosmid vectors. We then used macroarray screening to isolate clones containing large fragments of plastid DNA. Amore » minimum tiling path of clones comprising the entire genome sequence of each plastid was selected, and these clones were shotgun-sequenced and assembled into complete genomes. Although this method worked well for both heterotrophic and autotrophic plants, nuclear genome size had a dramatic effect on the proportion of screened clones containing plastid DNA and, consequently, the overall number of clones that must be screened to ensure full plastid genome coverage. This technique makes it possible to determine complete plastid genome sequences for organisms that defy other available organellar genome sequencing methods, especially those for which limited amounts of tissue are available.« less
Reference-quality genome sequence of Aegilops tauschii, the source of wheat D genome, shows that recombination shapes genome structure and evolution

USDA-ARS?s Scientific Manuscript database

Aegilops tauschii is the diploid progenitor of the D genome of hexaploid wheat and an important genetic resource for wheat. A reference-quality sequence for the Ae. tauschii genome was produced with a combination of ordered-clone sequencing, whole-genome shotgun sequencing, and BioNano optical geno...
A reference consensus genetic map for molecular markers and economically important traits in faba bean (Vicia faba L.)

PubMed Central

2013-01-01

Background Faba bean (Vicia faba L.) is among the earliest domesticated crops from the Near East. Today this legume is a key protein feed and food worldwide and continues to serve an important role in culinary traditions throughout Middle East, Mediterranean region, China and Ethiopia. Adapted to a wide range of soil types, the main faba bean breeding objectives are to improve yield, resistance to biotic and abiotic stresses, seed quality and other agronomic traits. Genomic approaches aimed at enhancing faba bean breeding programs require high-quality genetic linkage maps to facilitate quantitative trait locus analysis and gene tagging for use in a marker-assisted selection. The objective of this study was to construct a reference consensus map in faba bean by joining the information from the most relevant maps reported so far in this crop. Results A combination of two approaches, increasing the number of anchor loci in diverse mapping populations and joining the corresponding genetic maps, was used to develop a reference consensus map in faba bean. The map was constructed from three main recombinant inbreed populations derived from four parental lines, incorporates 729 markers and is based on 69 common loci. It spans 4,602 cM with a range from 323 to 1041 loci in six main linkage groups or chromosomes, and an average marker density of one locus every 6 cM. Locus order is generally well maintained between the consensus map and the individual maps. Conclusion We have constructed a reliable and fairly dense consensus genetic linkage map that will serve as a basis for genomic approaches in faba bean research and breeding. The core map contains a larger number of markers than any previous individual map, covers existing gaps and achieves a wider coverage of the large faba bean genome as a whole. This tool can be used as a reference resource for studies in different genetic backgrounds, and provides a framework for transferring genetic information when using different marker technologies. Combined with syntenic approaches, the consensus map will increase marker density in selected genomic regions and will be useful for future faba bean molecular breeding applications. PMID:24377374
Identification of a precursor genomic segment that provided a sequence unique to glycophorin B and E genes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Onda, M.; Kudo, S.; Fukuda, M.

Human glycophorin A, B, and E (GPA, GPB, and GPE) genes belong to a gene family located at the long arm of chromosome 4. These three genes are homologous from the 5'-flanking sequence to the Alu sequence, which is 1 kb downstream from the exon encoding the transmembrane domain. Analysis of the Alu sequence and flanking direct repeat sequences suggested that the GPA gene most closely resembles the ancestral gene, whereas the GPB and GPE gene arose by homologous recombination within the Alu sequence, acquiring 3' sequences from an unrelated precursor genomic segment. Here the authors describe the identification ofmore » this putative precursor genomic segment. A human genomic library was screened by using the sequence of the 3' region of the GPB gene as a probe. The genomic clones isolated were found to contain an Alu sequence that appeared to be involved in the recombination. Downstream from the Alu sequence, the nucleotide sequence of the precursor genomic segment is almost identical to that of the GPB or GPE gene. In contrast, the upstream sequence of the genomic segment differs entirely from that of the GPA, GPB, and GPE genes. Conservation of the direct repeats flanking the Alu sequence of the genomic segment strongly suggests that the sequence of this genomic segment has been maintained during evolution. This identified genomic segment was found to reside downstream from the GPA gene by both gene mapping and in situ chromosomal localization. The precursor genomic segment was also identified in the orangutan genome, which is known to lack GPB and GPE genes. These results indicate that one of the duplicated ancestral glycophorin genes acquired a unique 3' sequence by unequal crossing-over through its Alu sequence and the further downstream Alu sequence present in the duplicated gene. Further duplication and divergence of this gene yielded the GPB and GPE genes. 37 refs., 5 figs.« less
SubCellProt: predicting protein subcellular localization using machine learning approaches.

PubMed

Garg, Prabha; Sharma, Virag; Chaudhari, Pradeep; Roy, Nilanjan

2009-01-01

High-throughput genome sequencing projects continue to churn out enormous amounts of raw sequence data. However, most of this raw sequence data is unannotated and, hence, not very useful. Among the various approaches to decipher the function of a protein, one is to determine its localization. Experimental approaches for proteome annotation including determination of a protein's subcellular localizations are very costly and labor intensive. Besides the available experimental methods, in silico methods present alternative approaches to accomplish this task. Here, we present two machine learning approaches for prediction of the subcellular localization of a protein from the primary sequence information. Two machine learning algorithms, k Nearest Neighbor (k-NN) and Probabilistic Neural Network (PNN) were used to classify an unknown protein into one of the 11 subcellular localizations. The final prediction is made on the basis of a consensus of the predictions made by two algorithms and a probability is assigned to it. The results indicate that the primary sequence derived features like amino acid composition, sequence order and physicochemical properties can be used to assign subcellular localization with a fair degree of accuracy. Moreover, with the enhanced accuracy of our approach and the definition of a prediction domain, this method can be used for proteome annotation in a high throughput manner. SubCellProt is available at www.databases.niper.ac.in/SubCellProt.
The case for the continuing use of the revised Cambridge Reference Sequence (rCRS) and the standardization of notation in human mitochondrial DNA studies.

PubMed

Bandelt, Hans-Jürgen; Kloss-Brandstätter, Anita; Richards, Martin B; Yao, Yong-Gang; Logan, Ian

2014-02-01

Since the determination in 1981 of the sequence of the human mitochondrial DNA (mtDNA) genome, the Cambridge Reference Sequence (CRS), has been used as the reference sequence to annotate mtDNA in molecular anthropology, forensic science and medical genetics. The CRS was eventually upgraded to the revised version (rCRS) in 1999. This reference sequence is a convenient device for recording mtDNA variation, although it has often been misunderstood as a wild-type (WT) or consensus sequence by medical geneticists. Recently, there has been a proposal to replace the rCRS with the so-called Reconstructed Sapiens Reference Sequence (RSRS). Even if it had been estimated accurately, the RSRS would be a cumbersome substitute for the rCRS, as the new proposal fuses--and thus confuses--the two distinct concepts of ancestral lineage and reference point for human mtDNA. Instead, we prefer to maintain the rCRS and to report mtDNA profiles by employing the hitherto predominant circumfix style. Tree diagrams could display mutations by using either the profile notation (in conventional short forms where appropriate) or in a root-upwards way with two suffixes indicating ancestral and derived nucleotides. This would guard against misunderstandings about reporting mtDNA variation. It is therefore neither necessary nor sensible to change the present reference sequence, the rCRS, in any way. The proposed switch to RSRS would inevitably lead to notational chaos, mistakes and misinterpretations.
Location analysis for the estrogen receptor-α reveals binding to diverse ERE sequences and widespread binding within repetitive DNA elements

PubMed Central

Mason, Christopher E.; Shu, Feng-Jue; Wang, Cheng; Session, Ryan M.; Kallen, Roland G.; Sidell, Neil; Yu, Tianwei; Liu, Mei Hui; Cheung, Edwin; Kallen, Caleb B.

2010-01-01

Location analysis for estrogen receptor-α (ERα)-bound cis-regulatory elements was determined in MCF7 cells using chromatin immunoprecipitation (ChIP)-on-chip. Here, we present the estrogen response element (ERE) sequences that were identified at ERα-bound loci and quantify the incidence of ERE sequences under two stringencies of detection: <10% and 10–20% nucleotide deviation from the canonical ERE sequence. We demonstrate that ∼50% of all ERα-bound loci do not have a discernable ERE and show that most ERα-bound EREs are not perfect consensus EREs. Approximately one-third of all ERα-bound ERE sequences reside within repetitive DNA sequences, most commonly of the AluS family. In addition, the 3-bp spacer between the inverted ERE half-sites, rather than being random nucleotides, is C(A/T)G-enriched at bona fide receptor targets. Diverse ERα-bound loci were validated using electrophoretic mobility shift assay and ChIP-polymerase chain reaction (PCR). The functional significance of receptor-bound loci was demonstrated using luciferase reporter assays which proved that repetitive element ERE sequences contribute to enhancer function. ChIP-PCR demonstrated estrogen-dependent recruitment of the coactivator SRC3 to these loci in vivo. Our data demonstrate that ERα binds to widely variant EREs with less sequence specificity than had previously been suspected and that binding at repetitive and nonrepetitive genomic targets is favored by specific trinucleotide spacers. PMID:20047966
Location analysis for the estrogen receptor-alpha reveals binding to diverse ERE sequences and widespread binding within repetitive DNA elements.

PubMed

Mason, Christopher E; Shu, Feng-Jue; Wang, Cheng; Session, Ryan M; Kallen, Roland G; Sidell, Neil; Yu, Tianwei; Liu, Mei Hui; Cheung, Edwin; Kallen, Caleb B

2010-04-01

Location analysis for estrogen receptor-alpha (ERalpha)-bound cis-regulatory elements was determined in MCF7 cells using chromatin immunoprecipitation (ChIP)-on-chip. Here, we present the estrogen response element (ERE) sequences that were identified at ERalpha-bound loci and quantify the incidence of ERE sequences under two stringencies of detection: <10% and 10-20% nucleotide deviation from the canonical ERE sequence. We demonstrate that approximately 50% of all ERalpha-bound loci do not have a discernable ERE and show that most ERalpha-bound EREs are not perfect consensus EREs. Approximately one-third of all ERalpha-bound ERE sequences reside within repetitive DNA sequences, most commonly of the AluS family. In addition, the 3-bp spacer between the inverted ERE half-sites, rather than being random nucleotides, is C(A/T)G-enriched at bona fide receptor targets. Diverse ERalpha-bound loci were validated using electrophoretic mobility shift assay and ChIP-polymerase chain reaction (PCR). The functional significance of receptor-bound loci was demonstrated using luciferase reporter assays which proved that repetitive element ERE sequences contribute to enhancer function. ChIP-PCR demonstrated estrogen-dependent recruitment of the coactivator SRC3 to these loci in vivo. Our data demonstrate that ERalpha binds to widely variant EREs with less sequence specificity than had previously been suspected and that binding at repetitive and nonrepetitive genomic targets is favored by specific trinucleotide spacers.

in silico Whole Genome Sequencer & Analyzer (iWGS): A Computational Pipeline to Guide the Design and Analysis of de novo Genome Sequencing Studies

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhou, Xiaofan; Peris, David; Kominek, Jacek

The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms have augmented the complexity of de novo genome sequencing projects in nonmodel organisms. To reduce the costs and challenges in de novo genome sequencing projects and streamline their experimentalmore » design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to enable the user to have great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects, and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with a detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS.« less
in silico Whole Genome Sequencer & Analyzer (iWGS): A Computational Pipeline to Guide the Design and Analysis of de novo Genome Sequencing Studies

DOE PAGES

Zhou, Xiaofan; Peris, David; Kominek, Jacek; ...

2016-09-16

The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms have augmented the complexity of de novo genome sequencing projects in nonmodel organisms. To reduce the costs and challenges in de novo genome sequencing projects and streamline their experimentalmore » design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to enable the user to have great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects, and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with a detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS.« less
Middle East Respiratory Syndrome Coronavirus Intra-Host Populations Are Characterized by Numerous High Frequency Variants

DOE Office of Scientific and Technical Information (OSTI.GOV)

Borucki, Monica K.; Lao, Victoria; Hwang, Mona

Middle East respiratory syndrome coronavirus (MERS-CoV) is an emerging human pathogen related to SARS virus. In vitro studies indicate this virus may have a broad host range suggesting an increased pandemic potential. Genetic and epidemiological evidence indicate camels serve as a reservoir for MERS virus but the mechanism of cross species transmission is unclear and many questions remain regarding the susceptibility of humans to infection. Deep sequencing data was obtained from the nasal samples of three camels that had been experimentally infected with a human MERS-CoV isolate. A majority of the genome was covered and average coverage was greater thanmore » 12,000x depth. Although only 5 mutations were detected in the consensus sequences, 473 intrahost single nucleotide variants were identified. Lastly, many of these variants were present at high frequencies and could potentially influence viral phenotype and the sensitivity of detection assays that target these regions for primer or probe binding.« less
Active role of a human genomic insert in replication of a yeast artificial chromosome.

PubMed

van Brabant, A J; Fangman, W L; Brewer, B J

1999-06-01

Yeast artificial chromosomes (YACs) are a common tool for cloning eukaryotic DNA. The manner by which large pieces of foreign DNA are assimilated by yeast cells into a functional chromosome is poorly understood, as is the reason why some of them are stably maintained and some are not. We examined the replication of a stable YAC containing a 240-kb insert of DNA from the human T-cell receptor beta locus. The human insert contains multiple sites that serve as origins of replication. The activity of these origins appears to require the yeast ARS consensus sequence and, as with yeast origins, additional flanking sequences. In addition, the origins in the human insert exhibit a spacing, a range of activation efficiencies, and a variation in times of activation during S phase similar to those found for normal yeast chromosomes. We propose that an appropriate combination of replication origin density, activation times, and initiation efficiencies is necessary for the successful maintenance of YAC inserts.
Middle East Respiratory Syndrome Coronavirus Intra-Host Populations Are Characterized by Numerous High Frequency Variants

DOE PAGES

Borucki, Monica K.; Lao, Victoria; Hwang, Mona; ...

2016-01-20

Middle East respiratory syndrome coronavirus (MERS-CoV) is an emerging human pathogen related to SARS virus. In vitro studies indicate this virus may have a broad host range suggesting an increased pandemic potential. Genetic and epidemiological evidence indicate camels serve as a reservoir for MERS virus but the mechanism of cross species transmission is unclear and many questions remain regarding the susceptibility of humans to infection. Deep sequencing data was obtained from the nasal samples of three camels that had been experimentally infected with a human MERS-CoV isolate. A majority of the genome was covered and average coverage was greater thanmore » 12,000x depth. Although only 5 mutations were detected in the consensus sequences, 473 intrahost single nucleotide variants were identified. Lastly, many of these variants were present at high frequencies and could potentially influence viral phenotype and the sensitivity of detection assays that target these regions for primer or probe binding.« less
Visualization of genome signatures of eukaryote genomes by batch-learning self-organizing map with a special emphasis on Drosophila genomes.

PubMed

Abe, Takashi; Hamano, Yuta; Ikemura, Toshimichi

2014-01-01

A strategy of evolutionary studies that can compare vast numbers of genome sequences is becoming increasingly important with the remarkable progress of high-throughput DNA sequencing methods. We previously established a sequence alignment-free clustering method "BLSOM" for di-, tri-, and tetranucleotide compositions in genome sequences, which can characterize sequence characteristics (genome signatures) of a wide range of species. In the present study, we generated BLSOMs for tetra- and pentanucleotide compositions in approximately one million sequence fragments derived from 101 eukaryotes, for which almost complete genome sequences were available. BLSOM recognized phylotype-specific characteristics (e.g., key combinations of oligonucleotide frequencies) in the genome sequences, permitting phylotype-specific clustering of the sequences without any information regarding the species. In our detailed examination of 12 Drosophila species, the correlation between their phylogenetic classification and the classification on the BLSOMs was observed to visualize oligonucleotides diagnostic for species-specific clustering.
Snake Genome Sequencing: Results and Future Prospects

PubMed Central

Kerkkamp, Harald M. I.; Kini, R. Manjunatha; Pospelov, Alexey S.; Vonk, Freek J.; Henkel, Christiaan V.; Richardson, Michael K.

2016-01-01

Snake genome sequencing is in its infancy—very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression. PMID:27916957
Snake Genome Sequencing: Results and Future Prospects.

PubMed

Kerkkamp, Harald M I; Kini, R Manjunatha; Pospelov, Alexey S; Vonk, Freek J; Henkel, Christiaan V; Richardson, Michael K

2016-12-01

Snake genome sequencing is in its infancy-very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression.
Sequencing and comparative genomic analysis of 1227 Felis catus cDNA sequences enriched for developmental, clinical and nutritional phenotypes

PubMed Central

2012-01-01

Background The feline genome is valuable to the veterinary and model organism genomics communities because the cat is an obligate carnivore and a model for endangered felids. The initial public release of the Felis catus genome assembly provided a framework for investigating the genomic basis of feline biology. However, the entire set of protein coding genes has not been elucidated. Results We identified and characterized 1227 protein coding feline sequences, of which 913 map to public sequences and 314 are novel. These sequences have been deposited into NCBI's genbank database and complement public genomic resources by providing additional protein coding sequences that fill in some of the gaps in the feline genome assembly. Through functional and comparative genomic analyses, we gained an understanding of the role of these sequences in feline development, nutrition and health. Specifically, we identified 104 orthologs of human genes associated with Mendelian disorders. We detected negative selection within sequences with gene ontology annotations associated with intracellular trafficking, cytoskeleton and muscle functions. We detected relatively less negative selection on protein sequences encoding extracellular networks, apoptotic pathways and mitochondrial gene ontology annotations. Additionally, we characterized feline cDNA sequences that have mouse orthologs associated with clinical, nutritional and developmental phenotypes. Together, this analysis provides an overview of the value of our cDNA sequences and enhances our understanding of how the feline genome is similar to, and different from other mammalian genomes. Conclusions The cDNA sequences reported here expand existing feline genomic resources by providing high-quality sequences annotated with comparative genomic information providing functional, clinical, nutritional and orthologous gene information. PMID:22257742
Whole Genome Complete Resequencing of Bacillus subtilis Natto by Combining Long Reads with High-Quality Short Reads

PubMed Central

Kamada, Mayumi; Hase, Sumitaka; Sato, Kengo; Toyoda, Atsushi; Fujiyama, Asao; Sakakibara, Yasubumi

2014-01-01

De novo microbial genome sequencing reached a turning point with third-generation sequencing (TGS) platforms, and several microbial genomes have been improved by TGS long reads. Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168, and it has a function in the production of the traditional Japanese fermented food “natto.” The B. subtilis natto BEST195 genome was previously sequenced with short reads, but it included some incomplete regions. We resequenced the BEST195 genome using a PacBio RS sequencer, and we successfully obtained a complete genome sequence from one scaffold without any gaps, and we also applied Illumina MiSeq short reads to enhance quality. Compared with the previous BEST195 draft genome and Marburg 168 genome, we found that incomplete regions in the previous genome sequence were attributed to GC-bias and repetitive sequences, and we also identified some novel genes that are found only in the new genome. PMID:25329997
Automated Sanger Analysis Pipeline (ASAP): A Tool for Rapidly Analyzing Sanger Sequencing Data with Minimum User Interference.

PubMed

Singh, Aditya; Bhatia, Prateek

2016-12-01

Sanger sequencing platforms, such as applied biosystems instruments, generate chromatogram files. Generally, for 1 region of a sequence, we use both forward and reverse primers to sequence that area, in that way, we have 2 sequences that need to be aligned and a consensus generated before mutation detection studies. This work is cumbersome and takes time, especially if the gene is large with many exons. Hence, we devised a rapid automated command system to filter, build, and align consensus sequences and also optionally extract exonic regions, translate them in all frames, and perform an amino acid alignment starting from raw sequence data within a very short time. In full capabilities of Automated Mutation Analysis Pipeline (ASAP), it is able to read "*.ab1" chromatogram files through command line interface, convert it to the FASTQ format, trim the low-quality regions, reverse-complement the reverse sequence, create a consensus sequence, extract the exonic regions using a reference exonic sequence, translate the sequence in all frames, and align the nucleic acid and amino acid sequences to reference nucleic acid and amino acid sequences, respectively. All files are created and can be used for further analysis. ASAP is available as Python 3.x executable at https://github.com/aditya-88/ASAP. The version described in this paper is 0.28.
Sequencing intractable DNA to close microbial genomes.

PubMed

Hurt, Richard A; Brown, Steven D; Podar, Mircea; Palumbo, Anthony V; Elias, Dwayne A

2012-01-01

Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled "intractable" resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.
First complete genome sequence of infectious laryngotracheitis virus

PubMed Central

2011-01-01

Background Infectious laryngotracheitis virus (ILTV) is an alphaherpesvirus that causes acute respiratory disease in chickens worldwide. To date, only one complete genomic sequence of ILTV has been reported. This sequence was generated by concatenating partial sequences from six different ILTV strains. Thus, the full genomic sequence of a single (individual) strain of ILTV has not been determined previously. This study aimed to use high throughput sequencing technology to determine the complete genomic sequence of a live attenuated vaccine strain of ILTV. Results The complete genomic sequence of the Serva vaccine strain of ILTV was determined, annotated and compared to the concatenated ILTV reference sequence. The genome size of the Serva strain was 152,628 bp, with a G + C content of 48%. A total of 80 predicted open reading frames were identified. The Serva strain had 96.5% DNA sequence identity with the concatenated ILTV sequence. Notably, the concatenated ILTV sequence was found to lack four large regions of sequence, including 528 bp and 594 bp of sequence in the UL29 and UL36 genes, respectively, and two copies of a 1,563 bp sequence in the repeat regions. Considerable differences in the size of the predicted translation products of 4 other genes (UL54, UL30, UL37 and UL38) were also identified. More than 530 single-nucleotide polymorphisms (SNPs) were identified. Most SNPs were located within three genomic regions, corresponding to sequence from the SA-2 ILTV vaccine strain in the concatenated ILTV sequence. Conclusions This is the first complete genomic sequence of an individual ILTV strain. This sequence will facilitate future comparative genomic studies of ILTV by providing an appropriate reference sequence for the sequence analysis of other ILTV strains. PMID:21501528
Insights from 20 years of bacterial genome sequencing

DOE PAGES

Land, Miriam L.; Hauser, Loren; Jun, Se-Ran; ...

2015-02-27

Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genome sequences is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date,more » there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90 % of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methlylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families. Why do we care about bacterial genome sequencing? There are many practical applications, such as genome-scale metabolic modeling, biosurveillance, bioforensics, and infectious disease epidemiology. In the near future, high-throughput sequencing of patient metagenomic samples could revolutionize medicine in terms of speed and accuracy of finding pathogens and knowing how to treat them.« less
Insights from 20 years of bacterial genome sequencing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Land, Miriam L.; Hauser, Loren; Jun, Se-Ran

Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genome sequences is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date,more » there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90 % of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methlylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families. Why do we care about bacterial genome sequencing? There are many practical applications, such as genome-scale metabolic modeling, biosurveillance, bioforensics, and infectious disease epidemiology. In the near future, high-throughput sequencing of patient metagenomic samples could revolutionize medicine in terms of speed and accuracy of finding pathogens and knowing how to treat them.« less
RefSeq microbial genomes database: new representation and annotation strategy.

PubMed

Tatusova, Tatiana; Ciufo, Stacy; Fedorov, Boris; O'Neill, Kathleen; Tolstoy, Igor

2014-01-01

The source of the microbial genomic sequences in the RefSeq collection is the set of primary sequence records submitted to the International Nucleotide Sequence Database public archives. These can be accessed through the Entrez search and retrieval system at http://www.ncbi.nlm.nih.gov/genome. Next-generation sequencing has enabled researchers to perform genomic sequencing at rates that were unimaginable in the past. Microbial genomes can now be sequenced in a matter of hours, which has led to a significant increase in the number of assembled genomes deposited in the public archives. This huge increase in DNA sequence data presents new challenges for the annotation, analysis and visualization bioinformatics tools. New strategies have been developed for the annotation and representation of reference genomes and sequence variations derived from population studies and clinical outbreaks.
Gene calling and bacterial genome annotation with BG7.

PubMed

Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo

2015-01-01

New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key point to investigate in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of nonprotein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and not perfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences that are the elements directly related with biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. This tool is sequence error tolerant maintaining their capabilities for the annotation of highly fragmented genomes or for annotating mixed sequences coming from several genomes (as those obtained through metagenomics samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure completely based on cloud computing (Amazon Web Services).
An improved consensus linkage map of barley based on flow-sorted chromosomes and SNP markers

USDA-ARS?s Scientific Manuscript database

Recent advances in high-throughput genotyping have made it easier to combine information from different mapping populations into consensus genetic maps, which provide increased marker density and genome coverage compared to individual maps. Previously, a SNP-based genotyping platform was developed a...
PCR Amplification Strategies towards full-length HIV-1 Genome sequencing.

PubMed

Liu, Chao Chun; Ji, Hezhao

2018-06-26

The advent of next generation sequencing has enabled greater resolution of viral diversity and improved feasibility of full viral genome sequencing allowing routine HIV-1 full genome sequencing in both research and diagnostic settings. Regardless of the sequencing platform selected, successful PCR amplification of the HIV-1 genome is essential for sequencing template preparation. As such, full HIV-1 genome amplification is a crucial step in dictating the successful and reliable sequencing downstream. Here we reviewed existing PCR protocols leading to HIV-1 full genome sequencing. In addition to the discussion on basic considerations on relevant PCR design, the advantages as well as the pitfalls of published protocols were reviewed. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Whole genome sequence analysis of BT-474 using complete Genomics' standard and long fragment read technologies.

PubMed

Ciotlos, Serban; Mao, Qing; Zhang, Rebecca Yu; Li, Zhenyu; Chin, Robert; Gulbahce, Natali; Liu, Sophie Jia; Drmanac, Radoje; Peters, Brock A

2016-01-01

The cell line BT-474 is a popular cell line for studying the biology of cancer and developing novel drugs. However, there is no complete, published genome sequence for this highly utilized scientific resource. In this study we sought to provide a comprehensive and useful data set for the scientific community by generating a whole genome sequence for BT-474. Five μg of genomic DNA, isolated from an early passage of the BT-474 cell line, was used to generate a whole genome sequence (114X coverage) using Complete Genomics' standard sequencing process. To provide additional variant phasing and structural variation data we also processed and analyzed two separate libraries of 5 and 6 individual cells to depths of 99X and 87X, respectively, using Complete Genomics' Long Fragment Read (LFR) technology. BT-474 is a highly aneuploid cell line with an extremely complex genome sequence. This ~300X total coverage genome sequence provides a more complete understanding of this highly utilized cell line at the genomic level.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.