Sample records for sequence analysis assignment

  1. Streaming fragment assignment for real-time analysis of sequencing experiments

    PubMed Central

    Roberts, Adam; Pachter, Lior

    2013-01-01

    We present eXpress, a software package for highly efficient probabilistic assignment of ambiguously mapping sequenced fragments. eXpress uses a streaming algorithm with linear run time and constant memory use. It can determine abundances of sequenced molecules in real time, and can be applied to ChIP-seq, metagenomics and other large-scale sequencing data. We demonstrate its use on RNA-seq data, showing greater efficiency than other quantification methods. PMID:23160280
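
    The streaming idea in this abstract can be illustrated with a toy one-pass routine. This is only a sketch under simplifying assumptions, not the eXpress algorithm or its API: the function name `stream_assign`, the uniform `prior`, and the input layout (each fragment given as the list of targets it maps to) are all hypothetical. Each fragment is seen once and split among its compatible targets in proportion to the abundances estimated so far, so run time is linear in the number of fragments and memory depends only on the number of targets.

```python
def stream_assign(fragments, targets, prior=1.0):
    """One-pass soft assignment of ambiguously mapping fragments.

    `fragments` may be any iterable (for example a generator), where each
    item is the list of target ids a fragment is compatible with.
    `prior` is a hypothetical pseudo-count given to every target.
    """
    weights = {t: float(prior) for t in targets}
    for compatible in fragments:
        total = sum(weights[t] for t in compatible)
        # Share of this fragment for each compatible target, based on the
        # abundances accumulated from the fragments seen so far.
        shares = {t: weights[t] / total for t in compatible}
        for t, share in shares.items():
            weights[t] += share
    norm = sum(weights.values())
    return {t: w / norm for t, w in weights.items()}


if __name__ == "__main__":
    stream = iter([["A"], ["A", "B"], ["B"], ["A"]])
    print(stream_assign(stream, ["A", "B"]))
```

    Because only the per-target weights are stored, the fragments themselves never need to be held in memory; the price of this simple version is that early fragments are assigned using less accurate abundance estimates than later ones.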

  2. MinION™ nanopore sequencing of environmental metagenomes: a synthetic approach

    PubMed Central

    Brown, Bonnie L.; Watson, Mick; Minot, Samuel S.; Rivera, Maria C.; Franklin, Rima B.

    2017-01-01

    Background: Environmental metagenomic analysis is typically accomplished by assigning taxonomy and/or function from whole genome sequencing or 16S amplicon sequences. Both of these approaches are limited, however, by read length, among other technical and biological factors. A nanopore-based sequencing platform, MinION™, produces reads that are ≥1 × 10^4 bp in length, potentially providing for more precise assignment, thereby alleviating some of the limitations inherent in determining metagenome composition from short reads. We tested the ability of sequence data produced by MinION (R7.3 flow cells) to correctly assign taxonomy in single bacterial species runs and in three types of low-complexity synthetic communities: a mixture of DNA using equal mass from four species, a community with one relatively rare (1%) and three abundant (33% each) components, and a mixture of genomic DNA from 20 bacterial strains of staggered representation. Taxonomic composition of the low-complexity communities was assessed by analyzing the MinION sequence data with three different bioinformatic approaches: Kraken, MG-RAST, and One Codex. Results: Long read sequences generated from libraries prepared from single strains using the version 5 kit and chemistry, run on the original MinION device, yielded as few as 224 to as many as 3497 bidirectional high-quality (2D) reads with an average overall study length of 6000 bp. For the single-strain analyses, assignment of reads to the correct genus by different methods ranged from 53.1% to 99.5%, assignment to the correct species ranged from 23.9% to 99.5%, and the majority of misassigned reads were to closely related organisms. A synthetic metagenome sequenced with the same setup yielded 714 high-quality 2D reads of approximately 5500 bp that were up to 98% correctly assigned to the species level.
Synthetic metagenome MinION libraries generated using version 6 kit and chemistry yielded from 899 to 3497 2D reads with lengths averaging 5700 bp with up to 98% assignment accuracy at the species level. The observed community proportions for “equal” and “rare” synthetic libraries were close to the known proportions, deviating from 0.1% to 10% across all tests. For a 20-species mock community with staggered contributions, a sequencing run detected all but 3 species (each included at <0.05% of DNA in the total mixture), 91% of reads were assigned to the correct species, 93% of reads were assigned to the correct genus, and >99% of reads were assigned to the correct family. Conclusions: At the current level of output and sequence quality (just under 4 × 10^3 2D reads for a synthetic metagenome), MinION sequencing followed by Kraken or One Codex analysis has the potential to provide rapid and accurate metagenomic analysis where the consortium is comprised of a limited number of taxa. Important considerations noted in this study included: high sensitivity of the MinION platform to the quality of input DNA, high variability of sequencing results across libraries and flow cells, and relatively small numbers of 2D reads per analysis. Together, these limited detection of very rare components of the microbial consortia, and would likely limit the utility of MinION for the sequencing of high-complexity metagenomic communities where thousands of taxa are expected. Furthermore, the limitations of the currently available data analysis tools suggest there is considerable room for improvement in the analytical approaches for the characterization of microbial communities using long reads.
Nevertheless, the fact that the accurate taxonomic assignment of high-quality reads generated by MinION is approaching 99.5% and, in most cases, the inferred community structure mirrors the known proportions of a synthetic mixture warrants further exploration of practical application to environmental metagenomics as the platform continues to develop and improve. With further improvement in sequence throughput and error rate reduction, this platform shows great promise for precise real-time analysis of the composition and structure of more complex microbial communities. PMID:28327976

  3. MinION™ nanopore sequencing of environmental metagenomes: a synthetic approach.

    PubMed

    Brown, Bonnie L; Watson, Mick; Minot, Samuel S; Rivera, Maria C; Franklin, Rima B

    2017-03-01

    Environmental metagenomic analysis is typically accomplished by assigning taxonomy and/or function from whole genome sequencing or 16S amplicon sequences. Both of these approaches are limited, however, by read length, among other technical and biological factors. A nanopore-based sequencing platform, MinION™, produces reads that are ≥1 × 10^4 bp in length, potentially providing for more precise assignment, thereby alleviating some of the limitations inherent in determining metagenome composition from short reads. We tested the ability of sequence data produced by MinION (R7.3 flow cells) to correctly assign taxonomy in single bacterial species runs and in three types of low-complexity synthetic communities: a mixture of DNA using equal mass from four species, a community with one relatively rare (1%) and three abundant (33% each) components, and a mixture of genomic DNA from 20 bacterial strains of staggered representation. Taxonomic composition of the low-complexity communities was assessed by analyzing the MinION sequence data with three different bioinformatic approaches: Kraken, MG-RAST, and One Codex. Results: Long read sequences generated from libraries prepared from single strains using the version 5 kit and chemistry, run on the original MinION device, yielded as few as 224 to as many as 3497 bidirectional high-quality (2D) reads with an average overall study length of 6000 bp. For the single-strain analyses, assignment of reads to the correct genus by different methods ranged from 53.1% to 99.5%, assignment to the correct species ranged from 23.9% to 99.5%, and the majority of misassigned reads were to closely related organisms. A synthetic metagenome sequenced with the same setup yielded 714 high-quality 2D reads of approximately 5500 bp that were up to 98% correctly assigned to the species level.
Synthetic metagenome MinION libraries generated using version 6 kit and chemistry yielded from 899 to 3497 2D reads with lengths averaging 5700 bp with up to 98% assignment accuracy at the species level. The observed community proportions for “equal” and “rare” synthetic libraries were close to the known proportions, deviating from 0.1% to 10% across all tests. For a 20-species mock community with staggered contributions, a sequencing run detected all but 3 species (each included at <0.05% of DNA in the total mixture), 91% of reads were assigned to the correct species, 93% of reads were assigned to the correct genus, and >99% of reads were assigned to the correct family. Conclusions: At the current level of output and sequence quality (just under 4 × 10^3 2D reads for a synthetic metagenome), MinION sequencing followed by Kraken or One Codex analysis has the potential to provide rapid and accurate metagenomic analysis where the consortium is comprised of a limited number of taxa. Important considerations noted in this study included: high sensitivity of the MinION platform to the quality of input DNA, high variability of sequencing results across libraries and flow cells, and relatively small numbers of 2D reads per analysis. Together, these limited detection of very rare components of the microbial consortia, and would likely limit the utility of MinION for the sequencing of high-complexity metagenomic communities where thousands of taxa are expected. Furthermore, the limitations of the currently available data analysis tools suggest there is considerable room for improvement in the analytical approaches for the characterization of microbial communities using long reads.
Nevertheless, the fact that the accurate taxonomic assignment of high-quality reads generated by MinION is approaching 99.5% and, in most cases, the inferred community structure mirrors the known proportions of a synthetic mixture warrants further exploration of practical application to environmental metagenomics as the platform continues to develop and improve. With further improvement in sequence throughput and error rate reduction, this platform shows great promise for precise real-time analysis of the composition and structure of more complex microbial communities. © The Author 2017. Published by Oxford University Press.

  4. Quality Control Test for Sequence-Phenotype Assignments

    PubMed Central

    Ortiz, Maria Teresa Lara; Rosario, Pablo Benjamín Leon; Luna-Nevarez, Pablo; Gamez, Alba Savin; Martínez-del Campo, Ana; Del Rio, Gabriel

    2015-01-01

    Relating a gene mutation to a phenotype is a common task in disciplines such as protein biochemistry. In this endeavour, it is common to find false relationships arising from mutations introduced by cells; these may be filtered out using a phenotypic assay, yet such assays may introduce additional false relationships arising from experimental errors. Here we introduce the use of high-throughput DNA sequencers and statistical analysis to identify incorrect DNA sequence-phenotype assignments, and observe that 10–20% of these false assignments are expected in large screenings aimed at identifying critical residues for protein function. We further show that this level of incorrect DNA sequence-phenotype assignments may significantly alter our understanding of the structure-function relationship of proteins. We have made available an implementation of our method at http://bis.ifc.unam.mx/en/software/chispas. PMID:25700273

  5. A meta-analysis of bacterial diversity in the feces of cattle

    USDA-ARS's Scientific Manuscript database

    In this study, we conducted a meta-analysis on 16S rRNA gene sequences of bovine fecal origin that are publicly available in the RDP database. A total of 13663 sequences including 603 isolate sequences were identified in the RDP database (Release 11, Update 1), where 13447 sequences were assigned t...

  6. CloVR-ITS: Automated internal transcribed spacer amplicon sequence analysis pipeline for the characterization of fungal microbiota

    PubMed Central

    2013-01-01

    Background: Besides the development of comprehensive tools for high-throughput 16S ribosomal RNA amplicon sequence analysis, there exists a growing need for protocols emphasizing alternative phylogenetic markers such as those representing eukaryotic organisms. Results: Here we introduce CloVR-ITS, an automated pipeline for comparative analysis of internal transcribed spacer (ITS) pyrosequences amplified from metagenomic DNA isolates and representing fungal species. This pipeline performs a variety of steps similar to those commonly used for 16S rRNA amplicon sequence analysis, including preprocessing for quality, chimera detection, clustering of sequences into operational taxonomic units (OTUs), taxonomic assignment (at class, order, family, genus, and species levels) and statistical analysis of sample groups of interest based on user-provided information. Using ITS amplicon pyrosequencing data from a previous human gastric fluid study, we demonstrate the utility of CloVR-ITS for fungal microbiota analysis and provide runtime and cost examples, including analysis of extremely large datasets on the cloud. We show that the largest fractions of reads from the stomach fluid samples were assigned to Dothideomycetes, Saccharomycetes, Agaricomycetes and Sordariomycetes but that all samples were dominated by sequences that could not be taxonomically classified. Representatives of the Candida genus were identified in all samples, most notably C. quercitrusa, while sequence reads assigned to the Aspergillus genus were only identified in a subset of samples. CloVR-ITS is made available as a pre-installed, automated, and portable software pipeline for cloud-friendly execution as part of the CloVR virtual machine package (http://clovr.org).
Conclusion: The CloVR-ITS pipeline provides fungal microbiota analysis that can be complementary to bacterial 16S rRNA and total metagenome sequence analysis, allowing for more comprehensive studies of environmental and host-associated microbial communities. PMID:24451270

  7. A proteomic analysis of leaf sheaths from rice.

    PubMed

    Shen, Shihua; Matsubae, Masami; Takao, Toshifumi; Tanaka, Naoki; Komatsu, Setsuko

    2002-10-01

    The proteins extracted from the leaf sheaths of rice seedlings were separated by 2-D PAGE, and analyzed by Edman sequencing and mass spectrometry, followed by database searching. Image analysis revealed 352 protein spots on 2-D PAGE after staining with Coomassie Brilliant Blue. The amino acid sequences of 44 of 84 proteins were determined; for 31 of these proteins, a clear function could be assigned, whereas for 12 proteins, no function could be assigned. Forty proteins did not yield amino acid sequence information, because they were N-terminally blocked, or the obtained sequences were too short and/or did not give unambiguous results. Fifty-nine proteins were analyzed by mass spectrometry; all of these proteins were identified by matching to the protein database. The amino acid sequences of 19 of 27 proteins analyzed by mass spectrometry were similar to the results of Edman sequencing. These results suggest that 2-D PAGE combined with Edman sequencing and mass spectrometry analysis can be effectively used to identify plant proteins.

  8. RIEMS: a software pipeline for sensitive and comprehensive taxonomic classification of reads from metagenomics datasets.

    PubMed

    Scheuch, Matthias; Höper, Dirk; Beer, Martin

    2015-03-03

    Fuelled by the advent and subsequent development of next generation sequencing technologies, metagenomics became a powerful tool for the analysis of microbial communities both scientifically and diagnostically. The biggest challenge is the extraction of relevant information from the huge sequence datasets generated for metagenomics studies. Although a plethora of tools are available, data analysis is still a bottleneck. To overcome the bottleneck of data analysis, we developed an automated computational workflow called RIEMS - Reliable Information Extraction from Metagenomic Sequence datasets. RIEMS assigns every individual read sequence within a dataset taxonomically by cascading different sequence analyses with decreasing stringency of the assignments using various software applications. After completion of the analyses, the results are summarised in a clearly structured result protocol organised taxonomically. The high accuracy and performance of RIEMS analyses were proven in comparison with other tools for metagenomics data analysis using simulated sequencing read datasets. RIEMS has the potential to fill the gap that still exists with regard to data analysis for metagenomics studies. The usefulness and power of RIEMS for the analysis of genuine sequencing datasets was demonstrated with an early version of RIEMS in 2011 when it was used to detect the orthobunyavirus sequences leading to the discovery of Schmallenberg virus.
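
    The cascading strategy described above (successive analyses with decreasing stringency, each applied only to reads still unassigned) can be sketched as follows. This is a generic illustration, not the RIEMS workflow or its tools: `cascade_classify` and the toy classifier functions are hypothetical names, and each classifier is assumed to return a taxon label or `None`.

```python
def cascade_classify(reads, classifiers):
    """Assign reads with a cascade of classifiers ordered from most to
    least stringent.

    `classifiers` is a list of (name, function) pairs; each function takes
    a read and returns a taxon label or None. A read is passed to the next,
    less stringent step only if no earlier step assigned it.
    """
    assignments = {}
    remaining = list(reads)
    for name, classify in classifiers:
        still_unassigned = []
        for read in remaining:
            taxon = classify(read)
            if taxon is None:
                still_unassigned.append(read)
            else:
                # Record which step made the call, so the stringency of
                # every assignment remains traceable in the report.
                assignments[read] = (taxon, name)
        remaining = still_unassigned
    return assignments, remaining


if __name__ == "__main__":
    # Toy stand-ins for real search tools (assumptions for illustration).
    exact = lambda r: {"ACGT": "taxon_A"}.get(r)
    loose = lambda r: "taxon_B" if r.startswith("AC") else None
    print(cascade_classify(["ACGT", "ACGG", "TTTT"],
                           [("exact", exact), ("loose", loose)]))
```

    Ordering the steps from strict to permissive means that the cheapest, most reliable assignments are made first and the less reliable methods only ever see the reads that remain.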

  9. Taxator-tk: precise taxonomic assignment of metagenomes by fast approximation of evolutionary neighborhoods

    PubMed Central

    Dröge, J.; Gregor, I.; McHardy, A. C.

    2015-01-01

    Motivation: Metagenomics characterizes microbial communities by random shotgun sequencing of DNA isolated directly from an environment of interest. An essential step in computational metagenome analysis is taxonomic sequence assignment, which allows identifying the sequenced community members and reconstructing taxonomic bins with sequence data for the individual taxa. For the massive datasets generated by next-generation sequencing technologies, this cannot be performed with de-novo phylogenetic inference methods. We describe an algorithm and the accompanying software, taxator-tk, which performs taxonomic sequence assignment by fast approximate determination of evolutionary neighbors from sequence similarities. Results: Taxator-tk was precise in its taxonomic assignment across all ranks and taxa for a range of evolutionary distances and for short as well as for long sequences. In addition to the taxonomic binning of metagenomes, it is well suited for profiling microbial communities from metagenome samples because it identifies bacterial, archaeal and eukaryotic community members without being affected by varying primer binding strengths, as in marker gene amplification, or copy number variations of marker genes across different taxa. Taxator-tk has an efficient, parallelized implementation that allows the assignment of 6 Gb of sequence data per day on a standard multiprocessor system with 10 CPU cores and microbial RefSeq as the genomic reference data. Availability and implementation: Taxator-tk source and binary program files are publicly available at http://algbio.cs.uni-duesseldorf.de/software/. Contact: Alice.McHardy@uni-duesseldorf.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25388150

  10. Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo) genome assembly and analysis

    USDA-ARS's Scientific Manuscript database

    Next-generation sequencing technologies were used to rapidly and efficiently sequence the genome of the domestic turkey (Meleagris gallopavo). The current genome assembly (~1.1 Gb) includes 917 Mb of sequence assigned to chromosomes. Innate heterozygosity of the sequenced bird allowed discovery of...

  11. Identification and subspecific differentiation of Mycobacterium scrofulaceum by automated sequencing of a region of the gene (hsp65) encoding a 65-kilodalton heat shock protein.

    PubMed Central

    Swanson, D S; Pan, X; Musser, J M

    1996-01-01

    Mycobacterium scrofulaceum is most commonly recovered from children with cervical lymphadenitis, although it also accounts for approximately 2% of the mycobacterial infections in AIDS patients. Species assignment of M. scrofulaceum isolated by conventional techniques can be difficult and time-consuming. To develop a strategy for rapid species assignment of these organisms, a 360-bp region of the gene (hsp65) encoding a 65-kDa heat shock protein in 37 isolates from diverse sources was sequenced. Eight hsp65 alleles were identified, and these sequences formed phylogenetic clusters and lineages largely distinct from other Mycobacterium species. There was incomplete correlation between serovar designation and hsp65 allele assignment. The hsp65 data correlated strongly with the results of sequence analysis of the gene coding for 16S rRNA. Automated DNA sequencing of a 360-bp region of the hsp65 gene provides a rapid and unambiguous method for species assignment of these acid-fast organisms for diagnostic purposes. PMID:8940463

  12. Fragment assignment in the cloud with eXpress-D

    PubMed Central

    2013-01-01

    Background: Probabilistic assignment of ambiguously mapped fragments produced by high-throughput sequencing experiments has been demonstrated to greatly improve accuracy in the analysis of RNA-Seq and ChIP-Seq, and is an essential step in many other sequence census experiments. A maximum likelihood method using the expectation-maximization (EM) algorithm for optimization is commonly used to solve this problem. However, batch EM-based approaches do not scale well with the size of sequencing datasets, which have been increasing dramatically over the past few years. Thus, current approaches to fragment assignment rely on heuristics or approximations for tractability. Results: We present an implementation of a distributed EM solution to the fragment assignment problem using Spark, a data analytics framework that can scale by leveraging compute clusters within datacenters (“the cloud”). We demonstrate that our implementation easily scales to billions of sequenced fragments, while providing the exact maximum likelihood assignment of ambiguous fragments. The accuracy of the method is shown to be an improvement over the most widely used tools available, and the method can be run in a constant amount of time when cluster resources are scaled linearly with the amount of input data. Conclusions: The cloud offers one solution for the difficulties faced in the analysis of massive high-throughput sequencing data, which continue to grow rapidly. Researchers in bioinformatics must follow developments in distributed systems, such as new frameworks like Spark, for ways to port existing methods to the cloud and help them scale to the datasets of the future. Our software, eXpress-D, is freely available at: http://github.com/adarob/express-d. PMID:24314033
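
    For readers unfamiliar with the batch EM formulation mentioned in this abstract, a minimal single-machine sketch follows. It is not the eXpress-D code and does not use Spark; the function name `em_fragment_assignment`, its parameters, and the input layout are assumptions made for illustration, and the model is deliberately simplified (no target lengths, no alignment likelihoods). Each fragment is described only by the set of targets it maps to.

```python
def em_fragment_assignment(compat, n_iter=100, tol=1e-9):
    """Estimate relative target abundances from ambiguous fragments.

    `compat` is a list in which each item lists the distinct target ids
    one fragment maps to. Returns a dict: target id -> abundance.
    """
    targets = sorted({t for frag in compat for t in frag})
    theta = {t: 1.0 / len(targets) for t in targets}
    for _ in range(n_iter):
        counts = dict.fromkeys(targets, 0.0)
        # E-step: split every fragment among its compatible targets in
        # proportion to the current abundance estimates.
        for frag in compat:
            total = sum(theta[t] for t in frag)
            for t in frag:
                counts[t] += theta[t] / total
        # M-step: abundances are the normalised expected fragment counts.
        new_theta = {t: counts[t] / len(compat) for t in targets}
        delta = max(abs(new_theta[t] - theta[t]) for t in targets)
        theta = new_theta
        if delta < tol:
            break
    return theta


if __name__ == "__main__":
    fragments = [["A"], ["A"], ["A", "B"], ["B"]]
    print(em_fragment_assignment(fragments))
```

    Every iteration touches every fragment, which is why batch EM becomes expensive as datasets grow; the E-step is independent per fragment, so it is the natural part to distribute across a cluster.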

  13. An A Priori Multiobjective Optimization Model of a Search and Rescue Network

    DTIC Science & Technology

    1992-03-01

    sequences. Classical sensitivity analysis and tolerance analysis were used to analyze the frequency assignments generated by the different weight...function for excess coverage of a frequency. Sensitivity analysis is used to investigate the robustness of the frequency assignments produced by the...interest. The linear program solution is used to produce classical sensitivity analysis for the weight ranges. 17 III. Model Formulation This chapter

  14. Phylogenetic characterization of a biogas plant microbial community integrating clone library 16S-rDNA sequences and metagenome sequence data obtained by 454-pyrosequencing.

    PubMed

    Kröber, Magdalena; Bekel, Thomas; Diaz, Naryttza N; Goesmann, Alexander; Jaenicke, Sebastian; Krause, Lutz; Miller, Dimitri; Runte, Kai J; Viehöver, Prisca; Pühler, Alfred; Schlüter, Andreas

    2009-06-01

    The phylogenetic structure of the microbial community residing in a fermentation sample from a production-scale biogas plant fed with maize silage, green rye and liquid manure was analysed by an integrated approach using clone library sequences and metagenome sequence data obtained by 454-pyrosequencing. Sequencing of 109 clones from a bacterial and an archaeal 16S-rDNA amplicon library revealed that the obtained nucleotide sequences are similar but not identical to 16S-rDNA database sequences derived from different anaerobic environments including digestors and bioreactors. Most of the bacterial 16S-rDNA sequences could be assigned to the phylum Firmicutes with the most abundant class Clostridia and to the class Bacteroidetes, whereas most archaeal 16S-rDNA sequences cluster close to the methanogen Methanoculleus bourgensis. Further sequences of the archaeal library most probably represent so far non-characterised species within the genus Methanoculleus. A similar result derived from phylogenetic analysis of mcrA clone sequences. The mcrA gene product encodes the alpha-subunit of methyl-coenzyme-M reductase involved in the final step of methanogenesis. BLASTn analysis applying stringent settings resulted in assignment of 16S-rDNA metagenome sequence reads to 62 16S-rDNA amplicon sequences thus enabling frequency of abundance estimations for 16S-rDNA clone library sequences. Ribosomal Database Project (RDP) Classifier processing of metagenome 16S-rDNA reads revealed abundance of the phyla Firmicutes, Bacteroidetes and Euryarchaeota and the orders Clostridiales, Bacteroidales and Methanomicrobiales. Moreover, a large fraction of 16S-rDNA metagenome reads could not be assigned to lower taxonomic ranks, demonstrating that numerous microorganisms in the analysed fermentation sample of the biogas plant are still unclassified or unknown.

  15. Inferring Higher Functional Information for RIKEN Mouse Full-Length cDNA Clones With FACTS

    PubMed Central

    Nagashima, Takeshi; Silva, Diego G.; Petrovsky, Nikolai; Socha, Luis A.; Suzuki, Harukazu; Saito, Rintaro; Kasukawa, Takeya; Kurochkin, Igor V.; Konagaya, Akihiko; Schönbach, Christian

    2003-01-01

    FACTS (Functional Association/Annotation of cDNA Clones from Text/Sequence Sources) is a semiautomated knowledge discovery and annotation system that integrates molecular function information derived from sequence analysis results (sequence inferred) with functional information extracted from text. Text-inferred information was extracted from keyword-based retrievals of MEDLINE abstracts and by matching of gene or protein names to OMIM, BIND, and DIP database entries. Using FACTS, we found that 47.5% of the 60,770 RIKEN mouse cDNA FANTOM2 clone annotations were informative for text searches. MEDLINE queries yielded molecular interaction-containing sentences for 23.1% of the clones. When disease MeSH and GO terms were matched with retrieved abstracts, 22.7% of clones were associated with potential diseases, and 32.5% with GO identifiers. A significant number (23.5%) of disease MeSH-associated clones were also found to have a hereditary disease association (OMIM Morbidmap). Inferred neoplastic and nervous system disease represented 49.6% and 36.0% of disease MeSH-associated clones, respectively. A comparison of sequence-based GO assignments with informative text-based GO assignments revealed that for 78.2% of clones, identical GO assignments were provided for that clone by either method, whereas for 21.8% of clones, the assignments differed. In contrast, for OMIM assignments, only 28.5% of clones had identical sequence-based and text-based OMIM assignments. Sequence, sentence, and term-based functional associations are included in the FACTS database (http://facts.gsc.riken.go.jp/), which permits results to be annotated and explored through web-accessible keyword and sequence search interfaces. The FACTS database will be a critical tool for investigating the functional complexity of the mouse transcriptome, cDNA-inferred interactome (molecular interactions), and pathome (pathologies). PMID:12819151

  16. [Multiplexing mapping of human cDNAs]. Final report, September 1, 1991--February 28, 1994

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    Using PCR with automated product analysis, 329 human brain cDNA sequences have been assigned to individual human chromosomes. Primers were designed from single-pass cDNA sequences, expressed sequence tags (ESTs). Primers were used in PCR reactions with DNA from somatic cell hybrid mapping panels as templates, often with multiplexing. Many of the mapped ESTs match sequence database records. To evaluate these matches, the position of the primers relative to the matching region (In), the BLAST scores and the Poisson probability values of the EST/sequence record match were determined. In cases where the gene product stringently identified by the sequence match had already been mapped, the gene locus determined by EST was consistent with the previous position, which strongly supports the validity of assigning unknown genes to human chromosomes based on the EST sequence matches. In the present cases, mapping the ESTs to a chromosome can also be considered to have mapped the known gene product: rolipram-sensitive cAMP phosphodiesterase, chromosome 1; protein phosphatase 2Aβ, chromosome 4; alpha-catenin, chromosome 5; the ELE1 oncogene, chromosome 10q11.2 or q2.1-q23; MXI1 protein, chromosome 10q24-qter; ribosomal protein L18a homologue, chromosome 14; ribosomal protein L3, chromosome 17; and moesin, Xp11-cen. There were also ESTs mapped that were closely related to non-human sequence records. These matches therefore can be considered to identify human counterparts of known gene products, or members of known gene families. Examples of these include membrane proteins, translation-associated proteins, structural proteins, and enzymes. These data demonstrate that single-pass sequence information is sufficient to design PCR primers useful for assigning cDNA sequences to human chromosomes. When the EST sequence matches previous sequence database records, the chromosome assignments of the EST can be used to make preliminary assignments of the human gene to a chromosome.

  17. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing.

    PubMed

    Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P; Panitz, Frank; Bendixen, Christian; Nielsen, Rasmus; Willerslev, Eske

    2007-02-14

    The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. We use conventional PCR with 5'-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequence 20 DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5' tag analysis. We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (misassignment rate < 0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in a single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regard to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5' primer tagging is a useful method in which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products.
The new approach will be of value to a broad range of research areas, such as those of comparative genomics, complete mitochondrial analyses, population genetics, and phylogenetics.
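
    The tag-based tracing of reads back to their source specimens can be sketched in a few lines. This is an illustrative toy, not the authors' pipeline: the function name `demultiplex`, the example tags, and the primer sequence are hypothetical, and the sketch assumes exact matches (no sequencing errors) and tags of equal length placed directly before the primer.

```python
def demultiplex(reads, tag_to_sample, primer):
    """Assign reads to samples using the short tag that precedes the primer.

    `tag_to_sample` maps each 5' tag (all tags the same length) to a sample
    name. A read is assigned only if it starts with a known tag that is
    followed immediately by the primer; the tag and primer are trimmed off.
    """
    tag_len = len(next(iter(tag_to_sample)))
    assigned = {sample: [] for sample in tag_to_sample.values()}
    unassigned = []
    for read in reads:
        tag, rest = read[:tag_len], read[tag_len:]
        sample = tag_to_sample.get(tag)
        if sample is not None and rest.startswith(primer):
            assigned[sample].append(rest[len(primer):])
        else:
            # Unknown tag or missing primer: keep the read aside rather
            # than risk assigning it to the wrong source.
            unassigned.append(read)
    return assigned, unassigned


if __name__ == "__main__":
    tags = {"AC": "specimen_1", "GT": "specimen_2"}  # hypothetical tags
    reads = ["ACTTGGAAAA", "GTTTGGCCCC", "AATTGGAAAA", "ACTAGGAAAA"]
    print(demultiplex(reads, tags, primer="TTGG"))
```

    Requiring the primer to follow the tag is one simple way to guard against misassignment from truncated or anomalous reads; tallying reads per tag afterwards would also expose the kind of tag-dependent representation bias the abstract reports.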

  18. Comparative genome and methylome analysis reveals restriction/modification system diversity in the gut commensal Bifidobacterium breve

    PubMed Central

    Bottacini, Francesca; Morrissey, Ruth; Roberts, Richard John; James, Kieran; van Breen, Justin; Egan, Muireann; Lambert, Jolanda; van Limpt, Kees; Knol, Jan; Motherway, Mary O’Connell; van Sinderen, Douwe

    2018-01-01

    Bifidobacterium breve represents one of the most abundant bifidobacterial species in the gastro-intestinal tract of breast-fed infants, where its presence is believed to exert beneficial effects. In the present study, whole genome sequencing employing the PacBio Single Molecule, Real-Time (SMRT) sequencing platform, combined with comparative genome analysis, allowed the most extensive genetic investigation of this taxon. Our findings demonstrate that genes encoding Restriction/Modification (R/M) systems constitute a substantial part of the B. breve variable gene content (or variome). Using the methylome data generated by SMRT sequencing, combined with targeted Illumina bisulfite sequencing (BS-seq) and comparative genome analysis, we were able to detect methylation recognition motifs and assign these to identified B. breve R/M systems, where in several cases such assignments were confirmed by restriction analysis. Furthermore, we show that R/M systems typically impose a very significant barrier to genetic accessibility of B. breve strains, and that cloning of a methyltransferase-encoding gene may overcome such a barrier, thus allowing future functional investigations of members of this species. PMID:29294107

  19. Automated analysis of high-throughput B-cell sequencing data reveals a high frequency of novel immunoglobulin V gene segment alleles.

    PubMed

    Gadala-Maria, Daniel; Yaari, Gur; Uduman, Mohamed; Kleinstein, Steven H

    2015-02-24

    Individual variation in germline and expressed B-cell immunoglobulin (Ig) repertoires has been associated with aging, disease susceptibility, and differential response to infection and vaccination. Repertoire properties can now be studied at large-scale through next-generation sequencing of rearranged Ig genes. Accurate analysis of these repertoire-sequencing (Rep-Seq) data requires identifying the germline variable (V), diversity (D), and joining (J) gene segments used by each Ig sequence. Current V(D)J assignment methods work by aligning sequences to a database of known germline V(D)J segment alleles. However, existing databases are likely to be incomplete and novel polymorphisms are hard to differentiate from the frequent occurrence of somatic hypermutations in Ig sequences. Here we develop a Tool for Ig Genotype Elucidation via Rep-Seq (TIgGER). TIgGER analyzes mutation patterns in Rep-Seq data to identify novel V segment alleles, and also constructs a personalized germline database containing the specific set of alleles carried by a subject. This information is then used to improve the initial V segment assignments from existing tools, like IMGT/HighV-QUEST. The application of TIgGER to Rep-Seq data from seven subjects identified 11 novel V segment alleles, including at least one in every subject examined. These novel alleles constituted 13% of the total number of unique alleles in these subjects, and impacted 3% of V(D)J segment assignments. These results reinforce the highly polymorphic nature of human Ig V genes, and suggest that many novel alleles remain to be discovered. The integration of TIgGER into Rep-Seq processing pipelines will increase the accuracy of V segment assignments, thus improving B-cell repertoire analyses.

  20. Uncommonly isolated clinical Pseudomonas: identification and phylogenetic assignation.

    PubMed

    Mulet, M; Gomila, M; Ramírez, A; Cardew, S; Moore, E R B; Lalucat, J; García-Valdés, E

    2017-02-01

    Fifty-two Pseudomonas strains that were difficult to identify at the species level in the phenotypic routine characterizations employed by clinical microbiology laboratories were selected for genotypic-based analysis. Species level identifications were done initially by partial sequencing of the DNA dependent RNA polymerase sub-unit D gene (rpoD). Two other gene sequences, for the small sub-unit ribosomal RNA (16S rRNA) and for DNA gyrase sub-unit B (gyrB), were added in a multilocus sequence analysis (MLSA) study to confirm the species identifications. These sequences were analyzed with a collection of reference sequences from the type strains of 161 Pseudomonas species within an in-house multi-locus sequence analysis database. Whole-cell matrix-assisted laser-desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) analyses of these strains complemented the DNA sequence-based phylogenetic analyses and were observed to be in accordance with the results of the sequence data. Twenty-three out of 52 strains were assigned to 12 recognized species not commonly detected in clinical specimens and 29 (56 %) were considered representatives of at least ten putative new species. Most strains were distributed within the P. fluorescens and P. aeruginosa lineages. The value of rpoD sequences in species-level identifications for Pseudomonas is emphasized. Correct species identification of clinical strains is essential for establishing intrinsic antibiotic resistance patterns and improving treatment plans.

  1. The Use and Effectiveness of Triple Multiplex System for Coding Region Single Nucleotide Polymorphism in Mitochondrial DNA Typing of Archaeologically Obtained Human Skeletons from Premodern Joseon Tombs of Korea

    PubMed Central

    Oh, Chang Seok; Lee, Soong Deok; Kim, Yi-Suk; Shin, Dong Hoon

    2015-01-01

    A previous study showed that East Asian mtDNA haplogroups, especially those of Koreans, could be successfully assigned by the coupled use of analyses of coding region SNP markers and control region mutation motifs. In this study, we tested whether the same triple multiplex analysis of coding region SNPs is also applicable to ancient samples from East Asia as a complement to sequence analysis of the mtDNA control region. In Joseon skeleton samples, the mtDNA haplogroup determined by coding region SNP markers fell within the same haplogroup assigned by sequence analysis of the control region. Considering that control region mtDNA sequencing of ancient samples in previous studies produced a considerable number of errors, coding region SNP analysis can serve as a good complement to conventional haplogroup determination, especially for archaeological human bone samples buried underground over long periods. PMID:26345190

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wemmer, D.E.; Kumar, N.V.; Metrione, R.M.

    Toxin II from Radianthus paumotensis (RpII) has been investigated by high-resolution NMR and chemical sequencing methods. Resonance assignments have been obtained for this protein by the sequential approach. NMR assignments could not be made consistent with the previously reported primary sequence for this protein, and chemical methods have been used to determine a sequence with which the NMR data are consistent. Analysis of the 2D NOE spectra shows that the protein secondary structure is comprised of two sequences of β-sheet, probably joined into a distorted continuous sheet, connected by turns and extended loops, without any regular α-helical segments. The residues previously implicated in activity in this class of proteins, D8 and R13, occur in a loop region.

  3. The genomic underpinnings of eukaryotic virus taxonomy: creating a sequence-based framework for family-level virus classification.

    PubMed

    Aiewsakun, Pakorn; Simmonds, Peter

    2018-02-20

    The International Committee on Taxonomy of Viruses (ICTV) classifies viruses into families, genera and species and provides a regulated system for their nomenclature that is universally used in virus descriptions. Virus taxonomic assignments have traditionally been based upon virus phenotypic properties such as host range, virion morphology and replication mechanisms, particularly at family level. However, gene sequence comparisons provide a clearer guide to their evolutionary relationships and provide the only information that may guide the incorporation of viruses detected in environmental (metagenomic) studies that lack any phenotypic data. The current study sought to determine whether the existing virus taxonomy could be reproduced by examination of genetic relationships through the extraction of protein-coding gene signatures and genome organisational features. We found large-scale consistency between genetic relationships and taxonomic assignments for viruses of all genome configurations and genome sizes. The analysis pipeline that we have called 'Genome Relationships Applied to Virus Taxonomy' (GRAViTy) was highly effective at reproducing the current assignments of viruses at family level as well as inter-family groupings into orders. Its ability to correctly differentiate assigned viruses from unassigned viruses, and classify them into the correct taxonomic group, was evaluated by a threefold cross-validation technique. This predicted family membership of eukaryotic viruses with close to 100% accuracy and specificity, potentially enabling the algorithm to predict assignments for the vast corpus of metagenomic sequences consistently with ICTV taxonomy rules. In an evaluation run of GRAViTy, over one half (460/921) of (near)-complete genome sequences from several large published metagenomic eukaryotic virus datasets were assigned to 127 novel family-level groupings. If corroborated by other analysis methods, these would potentially more than double the number of eukaryotic virus families in the ICTV taxonomy. A rapid and objective means to explore metagenomic viral diversity and make informed recommendations for their assignments at each taxonomic layer is essential. GRAViTy provides one means to make rule-based assignments at family and order levels in a manner that preserves the integrity and underlying organisational principles of the current ICTV taxonomy framework. Such methods are increasingly required as the vast virosphere is explored.

  4. Multilocus Sequence Typing of Cronobacter Strains Isolated from Retail Foods and Environmental Samples.

    PubMed

    Killer, Jiří; Skřivanová, Eva; Hochel, Igor; Marounek, Milan

    2015-06-01

    Cronobacter spp. are bacterial pathogens that affect children and immunocompromised adults. In this study, we used multilocus sequence typing (MLST) to determine sequence types (STs) in 11 Cronobacter spp. strains isolated from retail foods, 29 strains from dust samples obtained from vacuum cleaners, and 4 clinical isolates. Using biochemical tests, species-specific polymerase chain reaction, and MLST analysis, 36 strains were identified as Cronobacter sakazakii, and 6 were identified as Cronobacter malonaticus. In addition, one strain that originated from retail food and one from a dust sample from a vacuum cleaner were identified on the basis of MLST analysis as Cronobacter dublinensis and Cronobacter turicensis, respectively. Cronobacter spp. strains isolated from the retail foods were assigned to eight different MLST sequence types, seven of which were newly identified. The strains isolated from the dust samples were assigned to 7 known STs and 14 unknown STs. Three clinical isolates and one household dust isolate were assigned to ST4, which is the predominant ST associated with neonatal meningitis. One clinical isolate was classified based on MLST analysis as Cronobacter malonaticus and belonged to an as-yet-unknown ST. Three strains isolated from the household dust samples were assigned to ST1, which is another clinically significant ST. It can be concluded that Cronobacter spp. strains of different origin are genetically quite variable. The recovery of C. sakazakii strains belonging to ST1 and ST4 from the dust samples suggests the possibility that contamination could occur during food preparation. All of the novel STs and alleles for C. sakazakii, C. malonaticus, C. dublinensis, and C. turicensis determined in this study were deposited in the Cronobacter MLST database available online (http://pubmlst.org/cronobacter/).

  5. MGmapper: Reference based mapping and taxonomy annotation of metagenomics sequence reads

    PubMed Central

    Lukjancenko, Oksana; Thomsen, Martin Christen Frølund; Maddalena Sperotto, Maria; Lund, Ole; Møller Aarestrup, Frank; Sicheritz-Pontén, Thomas

    2017-01-01

    An increasing number of species and gene identification studies rely on the use of next generation sequence analysis of either single isolate or metagenomics samples. Several methods are available to perform taxonomic annotations and a previous metagenomics benchmark study has shown that a vast number of false positive species annotations are a problem unless thresholds or post-processing are applied to differentiate between correct and false annotations. MGmapper is a package to process raw next generation sequence data and perform reference based sequence assignment, followed by a post-processing analysis to produce reliable taxonomy annotation at species and strain level resolution. An in-vitro bacterial mock community sample comprised of 8 genera, 11 species and 12 strains was previously used to benchmark metagenomics classification methods. After applying a post-processing filter, we obtained 100% correct taxonomy assignments at species and genus level. Sensitivity and precision of 75% were obtained for strain level annotations. A comparison between MGmapper and Kraken at species level shows that MGmapper assigns taxonomy at species level using 84.8% of the sequence reads, compared to 70.5% for Kraken, and both methods identified all species with no false positives. Extensive read count statistics are provided in plain text and Excel sheets for both rejected and accepted taxonomy annotations. The use of custom databases is possible for the command-line version of MGmapper, and the complete pipeline is freely available as a Bitbucket package (https://bitbucket.org/genomicepidemiology/mgmapper). A web-version (https://cge.cbs.dtu.dk/services/MGmapper) provides the basic functionality for analysis of small fastq datasets. PMID:28467460

  6. MGmapper: Reference based mapping and taxonomy annotation of metagenomics sequence reads.

    PubMed

    Petersen, Thomas Nordahl; Lukjancenko, Oksana; Thomsen, Martin Christen Frølund; Maddalena Sperotto, Maria; Lund, Ole; Møller Aarestrup, Frank; Sicheritz-Pontén, Thomas

    2017-01-01

    An increasing number of species and gene identification studies rely on the use of next generation sequence analysis of either single isolate or metagenomics samples. Several methods are available to perform taxonomic annotations and a previous metagenomics benchmark study has shown that a vast number of false positive species annotations are a problem unless thresholds or post-processing are applied to differentiate between correct and false annotations. MGmapper is a package to process raw next generation sequence data and perform reference based sequence assignment, followed by a post-processing analysis to produce reliable taxonomy annotation at species and strain level resolution. An in-vitro bacterial mock community sample comprised of 8 genera, 11 species and 12 strains was previously used to benchmark metagenomics classification methods. After applying a post-processing filter, we obtained 100% correct taxonomy assignments at species and genus level. Sensitivity and precision of 75% were obtained for strain level annotations. A comparison between MGmapper and Kraken at species level shows that MGmapper assigns taxonomy at species level using 84.8% of the sequence reads, compared to 70.5% for Kraken, and both methods identified all species with no false positives. Extensive read count statistics are provided in plain text and Excel sheets for both rejected and accepted taxonomy annotations. The use of custom databases is possible for the command-line version of MGmapper, and the complete pipeline is freely available as a Bitbucket package (https://bitbucket.org/genomicepidemiology/mgmapper). A web-version (https://cge.cbs.dtu.dk/services/MGmapper) provides the basic functionality for analysis of small fastq datasets.

  7. Assignment of protein sequences to existing domain and family classification systems: Pfam and the PDB.

    PubMed

    Xu, Qifang; Dunbrack, Roland L

    2012-11-01

    Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are not as long as the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often erroneously result in two copies of the domain assigned to the query. Divergent repeat sequences in proteins are often missed. We have developed a multilevel procedure to produce nearly complete assignments of protein families of an existing classification system to a large set of sequences. We apply this to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, thus, leading to erroneous assignment at the Pfam family level. A greedy algorithm allowing for partial overlaps was, thus, applied first to sequence/HMM alignments, then HMM-HMM alignments and then structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignment of repeat Pfams with weaker E-values was allowed after stronger assignments of the repeat HMM. Our database of assignments, presented in a database called PDBfam, contains Pfams for 99.4% of chains >50 residues. The Pfam assignment data in PDBfam are available at http://dunbrack2.fccc.edu/ProtCid/PDBfam, which can be searched by PDB codes and Pfam identifiers. They will be updated regularly.
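
    The greedy step mentioned in this abstract, which accepts hits in order of significance while tolerating partial overlaps, might look roughly like the sketch below. It is an illustration under stated assumptions, not the PDBfam procedure: the `Hit` record, the family names, and the 10-residue overlap tolerance are all hypothetical, and the joining of alignments split by large insertions is not modelled.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    family: str    # hypothetical family label
    start: int     # 1-based, inclusive position on the query
    end: int       # 1-based, inclusive position on the query
    evalue: float  # smaller means more significant


def overlap(a: Hit, b: Hit) -> int:
    """Number of query residues shared by two hits (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def greedy_assign(hits: List[Hit], max_overlap: int = 10) -> List[Hit]:
    """Accept hits from most to least significant.

    A hit is kept only if it overlaps every already accepted hit by at most
    `max_overlap` residues (the tolerance is an assumed value).
    """
    accepted: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if all(overlap(hit, kept) <= max_overlap for kept in accepted):
            accepted.append(hit)
    # Report the assignments in sequence order.
    return sorted(accepted, key=lambda h: h.start)


if __name__ == "__main__":
    candidates = [
        Hit("FAM_A", 1, 100, 1e-30),
        Hit("FAM_B", 95, 200, 1e-20),   # overlaps FAM_A by 6 residues: kept
        Hit("FAM_C", 50, 150, 1e-10),   # overlaps FAM_A by 51 residues: dropped
    ]
    for h in greedy_assign(candidates):
        print(h.family, h.start, h.end)
```

    In the multilevel scheme the abstract describes, a pass like this would be run first on sequence/HMM hits and then on the weaker evidence levels; here only a single pass is shown.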

  8. MassSieve: Panning MS/MS peptide data for proteins

    PubMed Central

    Slotta, Douglas J.; McFarland, Melinda A.; Markey, Sanford P.

    2010-01-01

    We present MassSieve, a Java-based platform for visualization and parsimony analysis of single and comparative LC-MS/MS database search engine results. The success of mass spectrometric peptide sequence assignment algorithms has led to the need for a tool to merge and evaluate the increasing data set sizes that result from LC-MS/MS-based shotgun proteomic experiments. MassSieve supports reports from multiple search engines with differing search characteristics, which can increase peptide sequence coverage and/or identify conflicting or ambiguous spectral assignments. PMID:20564260

  9. Comparative phylobiomic analysis of the bacterial community of water kefir by 16S rRNA gene amplicon sequencing and ARDRA analysis.

    PubMed

    Gulitz, A; Stadie, J; Ehrmann, M A; Ludwig, W; Vogel, R F

    2013-04-01

    The aim of this study was to analyse the bacterial microbiota of water kefir using culture-independent methods. We compared four water kefirs of different origins using 16S rDNA amplicon sequencing and ARDRA. The microbiota consisted of different proportions of the genera Lactobacillus (Lact.), Leuconostoc (Leuc.), Acetobacter (Acet.) and Gluconobacter. Surprisingly, varying but consistently high numbers of sequences representing members of the genus Bifidobacterium (Bif.) were found in all kefirs. Whereas part of the bifidobacterial sequences could be assigned to Bifidobacterium psychraerophilum, a majority of sequences identical to each other could not be assigned to any known species. A nearly full-length sequence of the latter exhibited a beyond-species similarity (96.4%) with the sequence from the closest relative species Bif. psychraerophilum. A Bifidobacterium-specific ARDRA analysis reflected the abundance of the novel Bifidobacterium species by revealing its unique MboI restriction profile. Attempts to isolate the bifidobacteria were successful for Bif. psychraerophilum only. The complexity of the water kefir microbiota has been underestimated in previous studies. The occurrence of bifidobacteria as part of the consortium is novel. These data give new insights into the understanding of the complexity of food fermentations and underline the need for approaches detecting noncultivable organisms. © 2013 The Society for Applied Microbiology.

  10. Genomic Analysis of Complex Microbial Communities in Wounds

    DTIC Science & Technology

    2009-07-01

    Actinobacteria — were the most commonly misclassified [25]. The 16S sequences used in the current study were all greater than or equal to 200 bases...with most (89.1%) of the sequences falling into Firmicutes, Proteobacteria, and Actinobacteria phyla. High percentages of the Firmicutes and... Actinobacteria sequences were successfully assigned to the genus level, 88.0% and 82.3%, respectively; however, only 53.0% of the Proteobacteria sequences

  11. A Self-Instructional Approach To the Teaching of Enzymology Involving Computer-Based Sequence Analysis and Molecular Modelling.

    ERIC Educational Resources Information Center

    Attwood, Paul V.

    1997-01-01

    Describes a self-instructional assignment approach to the teaching of advanced enzymology. Presents an assignment that offers a means of teaching enzymology to students that exposes them to modern computer-based techniques of analyzing protein structure and relates structure to enzyme function. (JRH)

  12. Assignment of protein sequences to existing domain and family classification systems: Pfam and the PDB

    PubMed Central

    Dunbrack, Roland L.

    2012-01-01

    Motivation: Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are not as long as the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often erroneously result in two copies of the domain assigned to the query. Divergent repeat sequences in proteins are often missed. Results: We have developed a multilevel procedure to produce nearly complete assignments of protein families of an existing classification system to a large set of sequences. We apply this to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, thus, leading to erroneous assignment at the Pfam family level. A greedy algorithm allowing for partial overlaps was, thus, applied first to sequence/HMM alignments, then HMM–HMM alignments and then structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignment of repeat Pfams with weaker E-values was allowed after stronger assignments of the repeat HMM. Our database of assignments, presented in a database called PDBfam, contains Pfams for 99.4% of chains >50 residues. Availability: The Pfam assignment data in PDBfam are available at http://dunbrack2.fccc.edu/ProtCid/PDBfam, which can be searched by PDB codes and Pfam identifiers. They will be updated regularly. Contact: Roland.Dunbracks@fccc.edu PMID:22942020

  13. First observation of rotational structures in Re 168

    DOE PAGES

    Hartley, D. J.; Janssens, R. V. F.; Riedinger, L. L.; ...

    2016-11-30

    We assigned first rotational sequences to the odd-odd nucleus 168Re. Coincidence relationships of these structures with rhenium x rays confirm the isotopic assignment, while arguments based on the γ-ray multiplicity (K-fold) distributions observed with the new bands lead to the mass assignment. Configurations for the two bands were determined through analysis of the rotational alignments of the structures and a comparison of the experimental B(M1)/B(E2) ratios with theory. Tentative spin assignments are proposed for the πh11/2νi13/2 band, based on energy level systematics for other known sequences in neighboring odd-odd rhenium nuclei, as well as on systematics seen for the signature inversion feature that is well known in this region. Furthermore, the spin assignment for the πh11/2ν(h9/2/f7/2) structure provides additional validation of the proposed spins and configurations for isomers in the 176Au → 172Ir → 168Re α-decay chain.

  14. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hartley, D. J.; Janssens, R. V. F.; Riedinger, L. L.

    We assigned first rotational sequences to the odd-odd nucleus 168Re. Coincidence relationships of these structures with rhenium x rays confirm the isotopic assignment, while arguments based on the γ-ray multiplicity (K-fold) distributions observed with the new bands lead to the mass assignment. Configurations for the two bands were determined through analysis of the rotational alignments of the structures and a comparison of the experimental B(M1)/B(E2) ratios with theory. Tentative spin assignments are proposed for the πh11/2νi13/2 band, based on energy level systematics for other known sequences in neighboring odd-odd rhenium nuclei, as well as on systematics seen for the signature inversion feature that is well known in this region. Furthermore, the spin assignment for the πh11/2ν(h9/2/f7/2) structure provides additional validation of the proposed spins and configurations for isomers in the 176Au → 172Ir → 168Re α-decay chain.

  15. Community and gene composition of a human dental plaque microbiota obtained by metagenomic sequencing

    PubMed Central

    Xie, G.; Chain, P.S.G.; Lo, C.; Liu, K-L.; Gans, J.; Merritt, J.; Qi, F.

    2010-01-01

    Summary: Human dental plaque is a complex microbial community containing an estimated 700 to 19,000 species/phylotypes. Despite numerous studies analysing species richness in healthy and diseased human subjects, the true genomic composition of the human dental plaque microbiota remains unknown. Here we report a metagenomic analysis of a healthy human plaque sample using a combination of second-generation sequencing platforms. A total of 860 million base pairs of non-human sequences were generated. Various analysis tools revealed the presence of 12 well-characterized phyla, members of the TM-7 and BRC1 clade, and sequences that could not be classified. Both pathogens and opportunistic pathogens were identified, supporting the ecological plaque hypothesis for oral diseases. Mapping the metagenomic reads to sequenced reference genomes demonstrated that 4% of the reads could be assigned to the sequenced species. Preliminary annotation identified genes belonging to all known functional categories. Interestingly, although 73% of the total assembled contig sequences were predicted to code for proteins, only 51% of them could be assigned a functional role. Furthermore, ~2.8% of the total predicted genes coded for proteins involved in resistance to antibiotics and toxic compounds, suggesting that the oral cavity is an important reservoir for antimicrobial resistance. PMID:21040513

  16. Community and gene composition of a human dental plaque microbiota obtained by metagenomic sequencing.

    PubMed

    Xie, G; Chain, P S G; Lo, C-C; Liu, K-L; Gans, J; Merritt, J; Qi, F

    2010-12-01

    Human dental plaque is a complex microbial community containing an estimated 700 to 19,000 species/phylotypes. Despite numerous studies analysing species richness in healthy and diseased human subjects, the true genomic composition of the human dental plaque microbiota remains unknown. Here we report a metagenomic analysis of a healthy human plaque sample using a combination of second-generation sequencing platforms. A total of 860 million base pairs of non-human sequences were generated. Various analysis tools revealed the presence of 12 well-characterized phyla, members of the TM-7 and BRC1 clade, and sequences that could not be classified. Both pathogens and opportunistic pathogens were identified, supporting the ecological plaque hypothesis for oral diseases. Mapping the metagenomic reads to sequenced reference genomes demonstrated that 4% of the reads could be assigned to the sequenced species. Preliminary annotation identified genes belonging to all known functional categories. Interestingly, although 73% of the total assembled contig sequences were predicted to code for proteins, only 51% of them could be assigned a functional role. Furthermore, ~2.8% of the total predicted genes coded for proteins involved in resistance to antibiotics and toxic compounds, suggesting that the oral cavity is an important reservoir for antimicrobial resistance. © 2010 John Wiley & Sons A/S.

  17. An automated genotyping tool for enteroviruses and noroviruses.

    PubMed

    Kroneman, A; Vennema, H; Deforche, K; v d Avoort, H; Peñaranda, S; Oberste, M S; Vinjé, J; Koopmans, M

    2011-06-01

    Molecular techniques are established as routine in virological laboratories and virus typing through (partial) sequence analysis is increasingly common. Quality assurance for the use of typing data requires harmonization of genotype nomenclature, and agreement on target genes, depending on the level of resolution required, and robustness of methods. The objective was to develop and validate web-based open-access typing tools for enteroviruses and noroviruses. An automated web-based typing algorithm was developed, starting with BLAST analysis of the query sequence against a reference set of sequences from viruses in the family Picornaviridae or Caliciviridae. The second step is phylogenetic analysis of the query sequence and a sub-set of the reference sequences, to assign the enterovirus type or norovirus genotype and/or variant, with profile alignment, construction of phylogenetic trees and bootstrap validation. Typing is performed on VP1 sequences of Human enterovirus A to D, and ORF1 and ORF2 sequences of genogroup I and II noroviruses. For validation, we used the tools to automatically type sequences in the RIVM and CDC enterovirus databases and the FBVE norovirus database. Using the typing tools, 785 (99%) of 795 Enterovirus VP1 sequences, and 8154 (98.5%) of 8342 norovirus sequences were typed in accordance with previously used methods. Subtyping into variants was achieved for 4439 (78.4%) of 5838 NoV GII.4 sequences. The online typing tools reliably assign genotypes for enteroviruses and noroviruses. The use of phylogenetic methods makes these tools robust to ongoing evolution. This should facilitate standardized genotyping and nomenclature in clinical and public health laboratories, thus supporting inter-laboratory comparisons. Copyright © 2011 Elsevier B.V. All rights reserved.

  18. Determination of disease phenotypes and pathogenic variants from exome sequence data in the CAGI 4 gene panel challenge.

    PubMed

    Kundu, Kunal; Pal, Lipika R; Yin, Yizhou; Moult, John

    2017-09-01

    The use of gene panel sequence for diagnostic and prognostic testing is now widespread, but there are so far few objective tests of methods to interpret these data. We describe the design and implementation of a gene panel sequencing data analysis pipeline (VarP) and its assessment in a CAGI4 community experiment. The method was applied to clinical gene panel sequencing data of 106 patients, with the goal of determining which of 14 disease classes each patient has and the corresponding causative variant(s). The disease class was correctly identified for 36 cases, including 10 where the original clinical pipeline did not find causative variants. For a further seven cases, we found strong evidence of an alternative disease to that tested. Many of the potentially causative variants are missense, with no previous association with disease, and these proved the hardest to correctly assign pathogenicity or otherwise. Post analysis showed that three-dimensional structure data could have helped for up to half of these cases. Over-reliance on HGMD annotation led to a number of incorrect disease assignments. We used a largely ad hoc method to assign probabilities of pathogenicity for each variant, and there is much work still to be done in this area. © 2017 The Authors. Human Mutation published by Wiley Periodicals, Inc.

  19. Evaluation of next generation sequencing for the analysis of Eimeria communities in wildlife.

    PubMed

    Vermeulen, Elke T; Lott, Matthew J; Eldridge, Mark D B; Power, Michelle L

    2016-05-01

    Next-generation sequencing (NGS) techniques are well-established for studying bacterial communities but not yet for microbial eukaryotes. Parasite communities remain poorly studied, due in part to the lack of reliable and accessible molecular methods to analyse eukaryotic communities. We aimed to develop and evaluate a methodology to analyse communities of the protozoan parasite Eimeria from populations of the Australian marsupial Petrogale penicillata (brush-tailed rock-wallaby) using NGS. An oocyst purification method for small sample sizes and polymerase chain reaction (PCR) protocol for the 18S rRNA locus targeting Eimeria was developed and optimised prior to sequencing on the Illumina MiSeq platform. A data analysis approach was developed by modifying methods from bacterial metagenomics and utilising existing Eimeria sequences in GenBank. Operational taxonomic unit (OTU) assignment at a high similarity threshold (97%) was more accurate at assigning Eimeria contigs into Eimeria OTUs but at a lower threshold (95%) there was greater resolution between OTU consensus sequences. The assessment of two amplification PCR methods prior to Illumina MiSeq, single and nested PCR, determined that single PCR was more sensitive to Eimeria as more Eimeria OTUs were detected in single amplicons. We have developed a simple and cost-effective approach to a data analysis pipeline for community analysis of eukaryotic organisms using Eimeria communities as a model. The pipeline provides a basis for evaluation using other eukaryotic organisms and potential for diverse community analysis studies. Copyright © 2016 Elsevier B.V. All rights reserved.
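
    The effect of the similarity threshold on OTU assignment can be illustrated with a small sketch. It assumes pre-aligned sequences of equal length and uses a simple greedy centroid scheme; the abstract does not state which clustering method the study used, so the functions `identity` and `cluster_otus` and the toy sequences are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_otus(seqs, threshold=0.97):
    """Greedy centroid clustering (an assumed scheme, for illustration only).

    Each sequence joins the first existing centroid it matches at or above
    `threshold`; otherwise it founds a new OTU. Returns one OTU index per
    input sequence, in input order.
    """
    centroids = []
    assignment = []
    for seq in seqs:
        for index, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                assignment.append(index)
                break
        else:
            # No centroid was similar enough: start a new OTU.
            centroids.append(seq)
            assignment.append(len(centroids) - 1)
    return assignment


if __name__ == "__main__":
    reference = "A" * 100
    two_mismatches = "A" * 98 + "C" * 2    # 98% identical to reference
    four_mismatches = "A" * 96 + "C" * 4   # 96% identical to reference
    toy = [reference, two_mismatches, four_mismatches]
    print(cluster_otus(toy, threshold=0.97))
    print(cluster_otus(toy, threshold=0.95))
```

    With these toy sequences the 97% threshold splits off the most divergent sequence into its own OTU, while the 95% threshold merges all three, which is the kind of trade-off the threshold choice controls.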

  20. Phylogenetic diversity of bacterial communities in bovine rumen as affected by diets and microenvironments.

    PubMed

    Kim, Minseok; Morrison, Mark; Yu, Zhongtang

    2011-09-01

    Phylogenetic analysis was conducted to examine ruminal bacteria in two ruminal fractions (adherent fraction vs. liquid fraction) collected from cattle fed with two different diets: forage alone vs. forage plus concentrate. One hundred forty-four 16S rRNA gene (rrs) sequences were obtained from clone libraries constructed from the four samples. These rrs sequences were assigned to 116 different operational taxonomic units (OTUs) defined at 0.03 phylogenetic distance. Most of these OTUs could not be assigned to any known genus. The phylum Firmicutes was represented by approximately 70% of all the sequences. By comparing to the OTUs already documented in the rumen, 52 new OTUs were identified. UniFrac, SONS, and denaturing gradient gel electrophoresis analyses revealed differences in diversity between the two fractions and between the two diets. This study showed that rrs sequences recovered from small clone libraries can still help identify novel species-level OTUs.

  1. Phylogenetic analysis of Austrian canine distemper virus strains from clinical samples from dogs and wild carnivores.

    PubMed

    Benetka, V; Leschnik, M; Affenzeller, N; Möstl, K

    2011-04-09

    Austrian field cases of canine distemper (14 dogs, one badger [Meles meles] and one stone marten [Martes foina]) from 2002 to 2007 were investigated and the case histories were summarised briefly. Phylogenetic analysis of fusion (F) and haemagglutinin (H) gene sequences revealed different canine distemper virus (CDV) lineages circulating in Austria. The majority of CDV strains detected from 2002 to 2004 were well embedded in the European lineage. One Austrian canine sample detected in 2003, with a high similarity to Hungarian sequences from 2005 to 2006, could be assigned to the Arctic group (phocine distemper virus type 2-like). The two canine sequences from 2007 formed a clearly distinct group flanked by sequences detected previously in China and the USA on an intermediate position between the European wildlife and the Asia-1 cluster. The Austrian wildlife strains (2006 and 2007) could be assigned to the European wildlife group and were most closely related to, yet clearly different from, the 2007 canine samples. To elucidate the epidemiological role of Austrian wildlife in the transmission of the disease to dogs and vice versa, H protein residues related to receptor and host specificity (residues 530 and 549) were analysed. All samples showed the amino acids expected for their host of origin, with the exception of a canine sequence from 2007, which had an intermediate position between wildlife and canine viral strains. In the period investigated, canine strains circulating in Austria could be assigned to four different lineages reflecting both a high diversity and probably different origins of virus introduction to Austria in different years.

  2. A De-Novo Genome Analysis Pipeline (DeNoGAP) for large-scale comparative prokaryotic genomics studies.

    PubMed

    Thakur, Shalabh; Guttman, David S

    2016-06-30

    Comparative analysis of whole genome sequence data from closely related prokaryotic species or strains is becoming an increasingly important and accessible approach for addressing both fundamental and applied biological questions. While there are a number of excellent tools developed for performing this task, most scale poorly when faced with hundreds of genome sequences, and many require extensive manual curation. We have developed a de-novo genome analysis pipeline (DeNoGAP) for the automated, iterative and high-throughput analysis of data from comparative genomics projects involving hundreds of whole genome sequences. The pipeline is designed to perform reference-assisted and de novo gene prediction, homolog protein family assignment, ortholog prediction, functional annotation, and pan-genome analysis using a range of proven tools and databases. While most existing methods scale quadratically with the number of genomes since they rely on pairwise comparisons among predicted protein sequences, DeNoGAP scales linearly since the homology assignment is based on iteratively refined hidden Markov models. This iterative clustering strategy enables DeNoGAP to handle a very large number of genomes using minimal computational resources. Moreover, the modular structure of the pipeline permits easy updates as new analysis programs become available. DeNoGAP integrates bioinformatics tools and databases for comparative analysis of a large number of genomes. The pipeline offers tools and algorithms for annotation and analysis of completed and draft genome sequences. The pipeline is developed using Perl, BioPerl and SQLite on Ubuntu Linux version 12.04 LTS. Currently, the software package is accompanied by a script for automated installation of necessary external programs on Ubuntu Linux; however, the pipeline should also be compatible with other Linux and Unix systems after necessary external programs are installed. DeNoGAP is freely available at https://sourceforge.net/projects/denogap/.

  3. Rescuing discarded spectra: Full comprehensive analysis of a minimal proteome.

    PubMed

    Lluch-Senar, Maria; Mancuso, Francesco M; Climente-González, Héctor; Peña-Paz, Marcia I; Sabido, Eduard; Serrano, Luis

    2016-02-01

    A common problem encountered when performing large-scale MS proteome analysis is the loss of information due to the high percentage of unassigned spectra. To determine the causes behind this loss we have analyzed the proteome of one of the smallest living bacteria that can be grown axenically, Mycoplasma pneumoniae (729 ORFs). The proteome of M. pneumoniae cells, grown in defined media, was analyzed by MS. An initial search with both Mascot and a species-specific NCBInr database with common contaminants (NCBImpn), resulted in around 79% of the acquired spectra not having an assignment. The percentage of non-assigned spectra was reduced to 27% after re-analysis of the data with the PEAKS software, thereby increasing the proteome coverage of M. pneumoniae from the initial 60% to over 76%. Nonetheless, 33,413 spectra with assigned amino acid sequences could not be mapped to any NCBInr database protein sequence. Approximately 1% of these unassigned peptides corresponded to PTMs and 4% to M. pneumoniae protein variants (deamidation and translation inaccuracies). The most abundant peptide sequence variants (Phe-Tyr and Ala-Ser) could be explained by alterations in the editing capacity of the corresponding tRNA synthases. About another 1% of the peptides not associated with any protein had repetitions of the same aromatic/hydrophobic amino acid at the N-terminus, or had Arg/Lys at the C-terminus. Thus, in a model system, we have maximized the number of assigned spectra to 73% (51,453 out of the 70,040 initial acquired spectra). All MS data have been deposited in the ProteomeXchange with identifier PXD002779 (http://proteomecentral.proteomexchange.org/dataset/PXD002779). © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  4. Sequence-specific ¹H NMR resonance assignments of Bacillus subtilis HPr: Use of spectra obtained from mutants to resolve spectral overlap

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wittekind, M.; Klevit, R.E.; Reizer, J.

    1990-08-07

    On the basis of an analysis of two-dimensional ¹H NMR spectra, the complete sequence-specific ¹H NMR assignments are presented for the phosphocarrier protein HPr from the Gram-positive bacterium Bacillus subtilis. During the assignment procedure, extensive use was made of spectra obtained from point mutants of HPr in order to resolve spectral overlap and to provide verification of assignments. Regions of regular secondary structure were identified by characteristic patterns of sequential backbone proton NOEs and slowly exchanging amide protons. B. subtilis HPr contains four β-strands that form a single antiparallel β-sheet and two well-defined α-helices. There are two stretches of extended backbone structure, one of which contains the active site His15. The overall fold of the protein is very similar to that of Escherichia coli HPr determined by NMR studies.

  5. HLA genotyping by next-generation sequencing of complementary DNA.

    PubMed

    Segawa, Hidenobu; Kukita, Yoji; Kato, Kikuya

    2017-11-28

    Genotyping of the human leucocyte antigen (HLA) is indispensable for various medical treatments. However, unambiguous genotyping is technically challenging due to high polymorphism of the corresponding genomic region. Next-generation sequencing is changing the landscape of genotyping. In addition to high throughput of data, its additional advantage is that DNA templates are derived from single molecules, which is a strong merit for the phasing problem. Although most currently developed technologies use genomic DNA, use of cDNA could enable genotyping with reduced costs in data production and analysis. We thus developed an HLA genotyping system based on next-generation sequencing of cDNA. Each HLA gene was divided into 3 or 4 target regions subjected to PCR amplification and subsequent sequencing with Ion Torrent PGM. The sequence data were then subjected to an automated analysis. The principle of the analysis was to construct candidate sequences generated from all possible combinations of variable bases and arrange them in decreasing order of the number of reads. Upon collecting candidate sequences from all target regions, 2 haplotypes were usually assigned. Cases not assigned 2 haplotypes were forwarded to 4 additional processes: selection of candidate sequences applying more stringent criteria, removal of artificial haplotypes, selection of candidate sequences with a relaxed threshold for sequence matching, and countermeasure for incomplete sequences in the HLA database. The genotyping system was evaluated using 30 samples; the overall accuracy was 97.0% at the field 3 level and 98.3% at the G group level. With one sample, genotyping of DPB1 was not completed due to short read size. We then developed a method for complete sequencing of individual molecules of the DPB1 gene, using the molecular barcode technology. The performance of the automatic genotyping system was comparable to that of systems developed in previous studies. Thus, next-generation sequencing of cDNA is a viable option for HLA genotyping.
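
    The stated principle of the analysis (enumerate candidate sequences from all combinations of variable bases, then order them by read count) can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the published system: reads are taken to be error-free, aligned and of equal length, and `rank_candidates` is a hypothetical name.

```python
from collections import Counter
from itertools import product


def rank_candidates(reads):
    """Rank candidate sequences for one target region by supporting read count.

    Returns the variable positions and a list of (bases_at_variable_positions,
    read_count) pairs, highest count first.
    """
    length = len(reads[0])
    # Bases observed at each alignment column.
    columns = [sorted({r[i] for r in reads}) for i in range(length)]
    # A position is "variable" if more than one base is observed there.
    variable = [i for i, bases in enumerate(columns) if len(bases) > 1]
    # Count how many reads carry each combination of variable bases.
    observed = Counter(tuple(r[i] for i in variable) for r in reads)
    # Enumerate every possible combination, including ones no read supports.
    candidates = product(*(columns[i] for i in variable))
    ranked = sorted(
        ((combo, observed[combo]) for combo in candidates),
        key=lambda item: (-item[1], item[0]),
    )
    return variable, ranked


# Toy region: two true haplotypes plus one chimeric (artificial) read.
reads = ["ACGTA"] * 5 + ["ATGTC"] * 4 + ["ACGTC"] * 1
variable, ranked = rank_candidates(reads)
for combo, count in ranked:
    print("".join(combo), count)
```

    In the toy data the two best-supported combinations would be taken as the two haplotypes; the low-count combination stands in for the kind of artificial haplotype that the additional processes described above are meant to remove.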

  6. Expanded microbial genome coverage and improved protein family annotation in the COG database

    PubMed Central

    Galperin, Michael Y.; Makarova, Kira S.; Wolf, Yuri I.; Koonin, Eugene V.

    2015-01-01

    Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. The new version of the COGs is expected to become an important tool for microbial genomics. PMID:25428365

  7. Full-length genome sequences of five hepatitis C virus isolates representing subtypes 3g, 3h, 3i and 3k, and a unique genotype 3 variant.

    PubMed

    Lu, Ling; Li, Chunhua; Yuan, Jie; Lu, Teng; Okamoto, Hiroaki; Murphy, Donald G

    2013-03-01

    We characterized the full-length genomes of five distinct hepatitis C virus (HCV)-3 isolates. These represent the first complete genomes for subtypes 3g and 3h, the second such genomes for 3k and 3i, and of one novel variant presently not assigned to a subtype. Each genome was determined from 18-25 overlapping fragments. They had lengths of 9579-9660 nt and each contained a single ORF encoding 3020-3025 aa. They were isolated from five patients residing in Canada; four were of Asian origin and one was of Somali origin. Phylogenetic analysis using 64 partial NS5B sequences differentiated 10 assigned subtypes, 3a-3i and 3k, and two additional lineages within genotype 3. From the data of this study, HCV-3 full-length sequences are now available for six of the assigned subtypes and one unassigned. Our findings should add insights to HCV evolutionary studies and clinical applications.

  8. Lessons Learned from Dependency Usage in HERA: Implications for THERP-Related HRA Methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    April M. Whaley; Ronald L. Boring; Harold S. Blackman

    Dependency occurs when the probability of success or failure on one action changes the probability of success or failure on a subsequent action. Dependency may serve as a modifier on the human error probabilities (HEPs) for successive actions in human reliability analysis (HRA) models. Discretion should be employed when determining whether or not a dependency calculation is warranted: dependency should not be assigned without strongly grounded reasons. Human reliability analysts may sometimes assign dependency in cases where it is unwarranted. This inappropriate assignment is attributed to a lack of clear guidance to encompass the range of scenarios human reliability analysts are addressing. Inappropriate assignment of dependency produces inappropriately elevated HEP values. Lessons learned about dependency usage in the Human Event Repository and Analysis (HERA) system may provide clarification and guidance for analysts using first-generation HRA methods. This paper presents the HERA approach to dependency assessment and discusses considerations for dependency usage in HRA, including the cognitive basis for dependency, direction for determining when dependency should be assessed, considerations for determining the dependency level, temporal issues to consider when assessing dependency (e.g., considering task sequence versus overall event sequence, and dependency over long periods of time), and diagnosis and action influences on dependency.

  9. Analysis of sequence repeats of proteins in the PDB.

    PubMed

    Mary Rajathei, David; Selvaraj, Samuel

    2013-12-01

    Internal repeats in protein sequences play a significant role in the evolution of protein structure and function. Applications of different bioinformatics tools help in the identification and characterization of these repeats. In the present study, we analyzed sequence repeats in a non-redundant set of proteins available in the Protein Data Bank (PDB). We used RADAR for detecting internal repeats in a protein, PDBeFOLD for assessing structural similarity, PDBsum for finding functional involvement and Pfam for domain assignment of the repeats in a protein. Through the analysis of sequence repeats, we found that the identity of the sequence repeats falls in the range of 20-40%, and the superimposed structures of most of the sequence repeats maintain similar overall folding. Analysis of sequence repeats at the functional level reveals that most of the sequence repeats are involved in the function of the protein through functionally involved residues in the repeat regions. We also found that sequence repeats in single and two domain proteins often contained conserved sequence motifs for the function of the domain. Copyright © 2013 Elsevier Ltd. All rights reserved.

  10. Rapid and accurate taxonomic classification of insect (class Insecta) cytochrome c oxidase subunit 1 (COI) DNA barcode sequences using a naïve Bayesian classifier

    PubMed Central

    Porter, Teresita M; Gibson, Joel F; Shokralla, Shadi; Baird, Donald J; Golding, G Brian; Hajibabaei, Mehrdad

    2014-01-01

    Current methods to identify unknown insect (class Insecta) cytochrome c oxidase (COI barcode) sequences often rely on thresholds of distances that can be difficult to define, sequence similarity cut-offs, or monophyly. Some of the most commonly used metagenomic classification methods do not provide a measure of confidence for the taxonomic assignments they provide. The aim of this study was to use a naïve Bayesian classifier (Wang et al. Applied and Environmental Microbiology, 2007; 73: 5261) to automate taxonomic assignments for large batches of insect COI sequences such as data obtained from high-throughput environmental sequencing. This method provides rank-flexible taxonomic assignments with an associated bootstrap support value, and it is faster than the blast-based methods commonly used in environmental sequence surveys. We have developed and rigorously tested the performance of three different training sets using leave-one-out cross-validation, two field data sets, and targeted testing of Lepidoptera, Diptera and Mantodea sequences obtained from the Barcode of Life Data system. We found that type I error rates, incorrect taxonomic assignments with a high bootstrap support, were already relatively low but could be lowered further by ensuring that all query taxa are actually present in the reference database. Choosing bootstrap support cut-offs according to query length and summarizing taxonomic assignments to more inclusive ranks can also help to reduce error while retaining the maximum number of assignments. Additionally, we highlight gaps in the taxonomic and geographic representation of insects in public sequence databases that will require further work by taxonomists to improve the quality of assignments generated using any method.
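
    The idea of a naïve Bayesian assignment with bootstrap support can be sketched in a few lines. This is a simplified illustration loosely modelled on word-based classifiers, not the classifier or training sets used in the study: the short word length, the 0.5 pseudocount, the one-eighth subsampling and all function names are assumptions made for the example.

```python
import math
import random
from collections import Counter, defaultdict


def kmers(seq, k):
    """Set of overlapping words of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(references, k):
    """references: list of (taxon, sequence) pairs.

    Returns per-taxon counts of how many reference sequences contain each word,
    and the number of reference sequences per taxon.
    """
    word_counts = defaultdict(Counter)
    sizes = Counter()
    for taxon, seq in references:
        sizes[taxon] += 1
        word_counts[taxon].update(kmers(seq, k))
    return word_counts, sizes


def log_score(words, taxon, word_counts, sizes):
    """Naive Bayes log-likelihood of the words under one taxon (smoothed)."""
    m = sizes[taxon]
    return sum(math.log((word_counts[taxon][w] + 0.5) / (m + 1)) for w in words)


def classify(words, word_counts, sizes):
    """Taxon with the highest log-likelihood for the given words."""
    return max(sorted(sizes), key=lambda t: log_score(words, t, word_counts, sizes))


def classify_with_support(query, word_counts, sizes, k, trials=100, seed=0):
    """Assign a taxon and report the fraction of bootstrap trials that agree."""
    rng = random.Random(seed)
    words = sorted(kmers(query, k))
    best = classify(words, word_counts, sizes)
    subset_size = max(1, len(words) // 8)
    hits = sum(
        classify(rng.choices(words, k=subset_size), word_counts, sizes) == best
        for _ in range(trials)
    )
    return best, hits / trials


# Toy reference set with two non-overlapping word profiles.
references = [
    ("Diptera", "ACGTACGTACGTACGTACGT"),
    ("Lepidoptera", "TTGGCCAATTGGCCAATTGG"),
]
word_counts, sizes = train(references, k=4)
result = classify_with_support("ACGTACGTACGT", word_counts, sizes, k=4)
print(result)
```

    A support cut-off would then be applied to the returned fraction; as the abstract notes, a sensible cut-off depends on query length and on the rank at which the assignment is reported.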

  11. Draft genome sequence of marine-derived Streptomyces sp. TP-A0598, a producer of anti-MRSA antibiotic lydicamycins.

    PubMed

    Komaki, Hisayuki; Ichikawa, Natsuko; Hosoyama, Akira; Fujita, Nobuyuki; Igarashi, Yasuhiro

    2015-01-01

    Streptomyces sp. TP-A0598, isolated from seawater, produces lydicamycin, a structurally unique type I polyketide bearing two nitrogen-containing five-membered rings, and four congeners TPU-0037-A, -B, -C, and -D. We herein report the 8 Mb draft genome sequence of this strain, together with classification and features of the organism and generation, annotation and analysis of the genome sequence. The genome encodes 7,240 putative ORFs, of which 4,450 ORFs were assigned with COG categories. Also, 66 tRNA genes and one rRNA operon were identified. The genome contains eight gene clusters involved in the production of polyketides and nonribosomal peptides. Among them, a PKS/NRPS gene cluster was assigned to be responsible for lydicamycin biosynthesis and a plausible biosynthetic pathway was proposed on the basis of gene function prediction. This genome sequence data will facilitate probing the potential of secondary metabolism in marine-derived Streptomyces.

  12. Web-Based Phylogenetic Assignment Tool for Analysis of Terminal Restriction Fragment Length Polymorphism Profiles of Microbial Communities

    PubMed Central

    Kent, Angela D.; Smith, Dan J.; Benson, Barbara J.; Triplett, Eric W.

    2003-01-01

    Culture-independent DNA fingerprints are commonly used to assess the diversity of a microbial community. However, relating species composition to community profiles produced by community fingerprint methods is not straightforward. Terminal restriction fragment length polymorphism (T-RFLP) is a community fingerprint method in which phylogenetic assignments may be inferred from the terminal restriction fragment (T-RF) sizes through the use of web-based resources that predict T-RF sizes for known bacteria. The process quickly becomes computationally intensive due to the need to analyze profiles produced by multiple restriction digests and the complexity of profiles generated by natural microbial communities. A web-based tool is described here that rapidly generates phylogenetic assignments from submitted community T-RFLP profiles based on a database of fragments produced by known 16S rRNA gene sequences. Users have the option of submitting a customized database generated from unpublished sequences or from a gene other than the 16S rRNA gene. This phylogenetic assignment tool allows users to employ T-RFLP to simultaneously analyze microbial community diversity and species composition. An analysis of the variability of bacterial species composition throughout the water column in a humic lake was carried out to demonstrate the functionality of the phylogenetic assignment tool. This method was validated by comparing the results generated by this program with results from a 16S rRNA gene clone library. PMID:14602639
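
    How T-RF sizes can be predicted from known sequences and matched to observed peaks is sketched below. This is not the web tool's code: it assumes a 5'-labelled amplicon, a single enzyme with the recognition site CCGG cut one base into the site, a fixed size tolerance, and a made-up three-entry database; `predict_trf` and `match_peaks` are hypothetical names.

```python
def predict_trf(seq, site, cut_offset):
    """Length of the 5' terminal fragment after digestion at the first site.

    If the site is absent, the amplicon is uncut and its full length is returned.
    """
    pos = seq.find(site)
    if pos == -1:
        return len(seq)
    return pos + cut_offset


def match_peaks(observed_sizes, database, site, cut_offset, tolerance=1):
    """Map each observed T-RF size to the database entries predicted within tolerance."""
    predicted = {
        name: predict_trf(seq, site, cut_offset) for name, seq in database.items()
    }
    return {
        size: sorted(n for n, p in predicted.items() if abs(p - size) <= tolerance)
        for size in observed_sizes
    }


# Made-up sequences standing in for a 16S rRNA gene database.
database = {
    "taxon_A": "AAAAACCGGTTTT",
    "taxon_B": "AAAAAAAAAACCGGTT",
    "taxon_C": "AAAATTTT",
}
matches = match_peaks([6, 11, 20], database, site="CCGG", cut_offset=1)
print(matches)
```

    Repeating the lookup for several enzymes and intersecting the candidate lists is what makes the assignment more specific, and also what makes the process computationally heavy for complex community profiles.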

  13. Characteristics of HIV-infected U.S. Army soldiers linked in molecular transmission clusters, 2001-2012

    PubMed Central

    Jagodzinski, Linda L.; Liu, Ying; Pham, Peter T.; Kijak, Gustavo H.; Tovanabutra, Sodsai; McCutchan, Francine E.; Scoville, Stephanie L.; Cersovsky, Steven B.; Michael, Nelson L.; Scott, Paul T.; Peel, Sheila A.

    2017-01-01

    Objective: Recent surveillance data suggests the United States (U.S.) Army HIV epidemic is concentrated among men who have sex with men. To identify potential targets for HIV prevention strategies, the relationship between demographic and clinical factors and membership within transmission clusters based on baseline pol sequences of HIV-infected Soldiers from 2001 through 2012 was analyzed. Methods: We conducted a retrospective analysis of baseline partial pol sequences, demographic and clinical characteristics available for all Soldiers in active service and newly-diagnosed with HIV-1 infection from January 1, 2001 through December 31, 2012. HIV-1 subtype designations and transmission clusters were identified from phylogenetic analysis of sequences. Univariate and multivariate logistic regression models were used to evaluate and adjust for the association between characteristics and cluster membership. Results: Among 518 of 995 HIV-infected Soldiers with available partial pol sequences, 29% were members of a transmission cluster. Assignment to a southern U.S. region at diagnosis and year of diagnosis were independently associated with cluster membership after adjustment for other significant characteristics (p<0.10) of age, race, year of diagnosis, region of duty assignment, sexually transmitted infections, last negative HIV test, antiretroviral therapy, and transmitted drug resistance. Subtyping of the pol fragment indicated HIV-1 subtype B infection predominated (94%) among HIV-infected Soldiers. Conclusion: These findings identify areas to explore as HIV prevention targets in the U.S. Army. An increased frequency of current force testing may be justified, especially among Soldiers assigned to duty in installations with high local HIV prevalence such as southern U.S. states. PMID:28759645

  14. Automated sequence-specific protein NMR assignment using the memetic algorithm MATCH.

    PubMed

    Volk, Jochen; Herrmann, Torsten; Wüthrich, Kurt

    2008-07-01

    MATCH (Memetic Algorithm and Combinatorial Optimization Heuristics) is a new memetic algorithm for automated sequence-specific polypeptide backbone NMR assignment of proteins. MATCH employs local optimization for tracing partial sequence-specific assignments within a global, population-based search environment, where the simultaneous application of local and global optimization heuristics guarantees high efficiency and robustness. MATCH thus makes combined use of the two predominant concepts in use for automated NMR assignment of proteins. Dynamic transition and inherent mutation are new techniques that enable automatic adaptation to variable quality of the experimental input data. The concept of dynamic transition is incorporated in all major building blocks of the algorithm, where it enables switching between local and global optimization heuristics at any time during the assignment process. Inherent mutation restricts the intrinsically required randomness of the evolutionary algorithm to those regions of the conformation space that are compatible with the experimental input data. Using intact and artificially deteriorated APSY-NMR input data of proteins, MATCH performed sequence-specific resonance assignment with high efficiency and robustness.

  15. Characterization of a Methanogenic Community within an Algal Fed Anaerobic Digester

    PubMed Central

    Ellis, Joshua T.; Tramp, Cody; Sims, Ronald C.; Miller, Charles D.

    2012-01-01

    The microbial diversity and metabolic potential of a methanogenic consortium residing in a 3785-liter anaerobic digester, fed with wastewater algae, was analyzed using 454 pyrosequencing technology. DNA was extracted from anaerobic sludge material and used in metagenomic analysis through PCR amplification of the methyl-coenzyme M reductase α subunit (mcrA) gene using primer sets ML, MCR, and ME. The majority of annotated mcrA sequences were assigned taxonomically to the genus Methanosaeta in the order Methanosarcinales. Methanogens from the genus Methanosaeta are obligate acetotrophs, suggesting this genus plays a dominant role in methane production from the analyzed fermentation sample. Numerous analyzed sequences within the algae fed anaerobic digester were unclassified and could not be assigned taxonomically. Relative amplicon frequencies were determined for each primer set to determine the utility of each in pyrosequencing. Primer sets ML and MCR performed better quantitatively (representing the large majority of analyzed sequences) than primer set ME. However, each of these primer sets was shown to provide a quantitatively unique community structure, and thus they are of equal importance in mcrA metagenomic analysis. PMID:23724331

  16. Molecular characterization of pea enation mosaic virus and bean leafroll virus from the Pacific Northwest, USA.

    PubMed

    Vemulapati, B; Druffel, K L; Eigenbrode, S D; Karasev, A; Pappu, H R

    2010-10-01

    The family Luteoviridae consists of eight viruses assigned to three different genera, Luteovirus, Polerovirus and Enamovirus. The complete genomic sequences of pea enation mosaic virus (genus Enamovirus) and bean leafroll virus (genus Luteovirus) from the Pacific Northwest, USA, were determined. Annotation, sequence comparisons, and phylogenetic analysis of selected genes together with those of known polero- and enamoviruses were conducted.

  17. No evidence for the use of DIR, D–D fusions, chromosome 15 open reading frames or VH replacement in the peripheral repertoire was found on application of an improved algorithm, JointML, to 6329 human immunoglobulin H rearrangements

    PubMed Central

    Ohm-Laursen, Line; Nielsen, Morten; Larsen, Stine R; Barington, Torben

    2006-01-01

    Antibody diversity is created by imprecise joining of the variability (V), diversity (D) and joining (J) gene segments of the heavy and light chain loci. Analysis of rearrangements is complicated by somatic hypermutations and uncertainty concerning the sources of gene segments and the precise way in which they recombine. It has been suggested that D genes with irregular recombination signal sequences (DIR) and chromosome 15 open reading frames (OR15) can replace conventional D genes, that two D genes or inverted D genes may be used and that the repertoire can be further diversified by heavy chain V gene (VH) replacement. Safe conclusions require large, well-defined sequence samples and algorithms minimizing stochastic assignment of segments. Two computer programs were developed for analysis of heavy chain joints. JointHMM is a profile hidden Markov model, while JointML is a maximum-likelihood-based method taking the lengths of the joint and the mutational status of the VH gene into account. The programs were applied to a set of 6329 clonally unrelated rearrangements. A conventional D gene was found in 80% of unmutated sequences and 64% of mutated sequences, while D-gene assignment was kept below 5% in artificial (randomly permutated) rearrangements. No evidence for the use of DIR, OR15, multiple D genes or VH replacements was found, while inverted D genes were used in less than 1‰ of the sequences. JointML was shown to have a higher predictive performance for D-gene assignment in mutated and unmutated sequences than four other publicly available programs. An online version 1.0 of JointML is available at http://www.cbs.dtu.dk/services/VDJsolver. PMID:17005006

  18. First DNA Barcode Reference Library for the Identification of South American Freshwater Fish from the Lower Paraná River

    PubMed Central

    Brancolini, Florencia; del Pazo, Felipe; Posner, Victoria Maria; Grimberg, Alexis; Arranz, Silvia Eda

    2016-01-01

    Valid fish species identification is essential for biodiversity conservation and fisheries management. Here, we provide a sequence reference library based on mitochondrial cytochrome c oxidase subunit I for a valid identification of 79 freshwater fish species from the Lower Paraná River. Neighbour-joining analysis based on K2P genetic distances formed non-overlapping clusters for almost all species with a ≥99% bootstrap support each. Identification was successful for 97.8% of species as the minimum genetic distance to the nearest neighbour exceeded the maximum intraspecific distance in all these cases. A barcoding gap of 2.5% was apparent for the whole data set with the exception of four cases. Within-species distances ranged from 0.00% to 7.59%, while interspecific distances varied between 4.06% and 19.98%, without considering Odontesthes species with a minimum genetic distance of 0%. Sequence library validation was performed by applying BOLD's BIN analysis tool, Poisson Tree Processes model and Automatic Barcode Gap Discovery, along with a reliable taxonomic assignment by experts. Exhaustive revision of vouchers was performed when a conflicting assignment was detected after sequence analysis and BIN discordance evaluation. Thus, the sequence library presented here can be confidently used as a benchmark for identification of half of the fish species recorded for the Lower Paraná River. PMID:27442116
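
    The identification criterion above (minimum distance to another species exceeding the maximum distance within a species) can be sketched with the standard Kimura two-parameter (K2P) formula, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions. The sketch assumes aligned, gap-free sequences of equal length and checks the gap over the whole toy data set rather than per species; the sequences and the names `k2p` and `barcode_gap` are made up for illustration.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}


def k2p(a, b):
    """Kimura two-parameter distance between two aligned, equal-length sequences."""
    n = len(a)
    transitions = transversions = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if (x in PURINES) == (y in PURINES):
            transitions += 1
        else:
            transversions += 1
    p, q = transitions / n, transversions / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def barcode_gap(records):
    """records: list of (species, sequence) pairs.

    Returns (maximum intraspecific distance, minimum interspecific distance).
    """
    intra, inter = [], []
    for (s1, a), (s2, b) in combinations(records, 2):
        (intra if s1 == s2 else inter).append(k2p(a, b))
    return max(intra), min(inter)


records = [
    ("species_1", "ACGTACGTACGTACGTACGT"),
    ("species_1", "GCGTACGTACGTACGTACGT"),  # one transition from the first
    ("species_2", "CATGACGTACGTACGTACGT"),  # four transversions from the first
]
max_intra, min_inter = barcode_gap(records)
print(round(max_intra, 4), round(min_inter, 4), max_intra < min_inter)
```

    When the minimum interspecific distance is larger than the maximum intraspecific distance, as in this toy case, a query can be assigned to its nearest neighbour's species without ambiguity; the Odontesthes case in the abstract is an example where that condition fails.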

  19. Impact of sequencing depth on the characterization of the microbiome and resistome.

    PubMed

    Zaheer, Rahat; Noyes, Noelle; Ortega Polo, Rodrigo; Cook, Shaun R; Marinier, Eric; Van Domselaar, Gary; Belk, Keith E; Morley, Paul S; McAllister, Tim A

    2018-04-12

    Developments in high-throughput next generation sequencing (NGS) technology have rapidly advanced the understanding of overall microbial ecology as well as occurrence and diversity of specific genes within diverse environments. In the present study, we compared the ability of varying sequencing depths to generate meaningful information about the taxonomic structure and prevalence of antimicrobial resistance genes (ARGs) in the bovine fecal microbial community. Metagenomic sequencing was conducted on eight composite fecal samples originating from four beef cattle feedlots. Metagenomic DNA was sequenced to various depths, D1, D0.5 and D0.25, with average sample read counts of 117, 59 and 26 million, respectively. A comparative analysis of the relative abundance of reads aligning to different phyla and antimicrobial classes indicated that the relative proportions of read assignments remained fairly constant regardless of depth. However, the number of reads being assigned to ARGs as well as to microbial taxa increased significantly with increasing depth. We found a depth of D0.5 was suitable to describe the microbiome and resistome of cattle fecal samples. This study helps define a balance between cost and required sequencing depth to acquire meaningful results.

  20. Molecular markers for identifying a new selected variety of Pacific white shrimp Litopenaeus vannamei

    NASA Astrophysics Data System (ADS)

    Yu, Yang; Zhang, Xiaojun; Liu, Jingwen; Li, Fuhua; Huang, Hao; Li, Yijun; Liu, Xiaolin; Xiang, Jianhai

    2015-01-01

    Selective breeding of the Pacific white shrimp Litopenaeus vannamei during the last decade has produced new varieties exhibiting high growth rates and disease resistance. However, the identification of new varieties of shrimps from their phenotypic characters is difficult. This study introduces a new approach for identifying varieties of shrimps using molecular markers of microsatellites and mitochondrial control region sequences. The method was employed to identify a new selected variety, Kehai No. 1 (KH-1), from three representative stocks (control group): Zhengda; Tongwei; and a stock collected from Fujian Province, which is now cultured in mainland China. By pooled genotyping of KH-1 and the control group, five microsatellites showing differences between KH-1 and the control group were screened out. Individual genotyping data confirmed the results from pooled genotyping. The genotyping data for the five microsatellites were applied to the assignment analysis of the KH-1 group and the control group using the partial Bayesian assignment method in GENECLASS2. By sequencing the mitochondrial control regions of individuals from the KH-1 and control group, four haplotypes were observed in the KH-1 group, whereas 14 haplotypes were obtained in the control group. By combining the microsatellite assignment analysis with mitochondrial control region analysis, the average accuracy of identification of individuals in the KH-1 group and control group reached 89%. The five selected microsatellite loci and mitochondrial control region sequences were highly polymorphic and could be used to distinguish new selected varieties of L. vannamei from other populations cultured in China.

  1. Clonal architecture of secondary acute myeloid leukemia defined by single-cell sequencing.

    PubMed

    Hughes, Andrew E O; Magrini, Vincent; Demeter, Ryan; Miller, Christopher A; Fulton, Robert; Fulton, Lucinda L; Eades, William C; Elliott, Kevin; Heath, Sharon; Westervelt, Peter; Ding, Li; Conrad, Donald F; White, Brian S; Shao, Jin; Link, Daniel C; DiPersio, John F; Mardis, Elaine R; Wilson, Richard K; Ley, Timothy J; Walter, Matthew J; Graubert, Timothy A

    2014-07-01

    Next-generation sequencing has been used to infer the clonality of heterogeneous tumor samples. These analyses yield specific predictions (the population frequency of individual clones, their genetic composition, and their evolutionary relationships), which we set out to test by sequencing individual cells from three subjects diagnosed with secondary acute myeloid leukemia, each of whom had been previously characterized by whole genome sequencing of unfractionated tumor samples. Single-cell mutation profiling strongly supported the clonal architecture implied by the analysis of bulk material. In addition, it resolved the clonal assignment of single nucleotide variants that had been initially ambiguous and identified areas of previously unappreciated complexity. Accordingly, we find that many of the key assumptions underlying the analysis of tumor clonality by deep sequencing of unfractionated material are valid. Furthermore, we illustrate a single-cell sequencing strategy for interrogating the clonal relationships among known variants that is cost-effective, scalable, and adaptable to the analysis of both hematopoietic and solid tumors, or any heterogeneous population of cells.

  2. Pyrosequencing the Canine Faecal Microbiota: Breadth and Depth of Biodiversity

    PubMed Central

    Hand, Daniel; Wallis, Corrin; Colyer, Alison; Penn, Charles W.

    2013-01-01

    Mammalian intestinal microbiota remain poorly understood despite decades of interest and investigation by culture-based and other long-established methodologies. Using high-throughput sequencing technology we now report a detailed analysis of canine faecal microbiota. The study group of animals comprised eleven healthy adult miniature Schnauzer dogs of mixed sex and age, some closely related and all housed in kennel and pen accommodation on the same premises with similar feeding and exercise regimes. DNA was extracted from faecal specimens and subjected to PCR amplification of 16S rDNA, followed by sequencing of the 5′ region that included variable regions V1 and V2. Barcoded amplicons were sequenced by Roche-454 FLX high-throughput pyrosequencing. Sequences were assigned to taxa using the Ribosomal Database Project Bayesian classifier and revealed dominance of Fusobacterium and Bacteroidetes phyla. Differences between animals in the proportions of different taxa, among 10,000 reads per animal, were clear and not supportive of the concept of a “core microbiota”. Despite this variability in prominent genera, littermates were shown to have a more similar faecal microbial composition than unrelated dogs. Diversity of the microbiota was also assessed by assignment of sequence reads into operational taxonomic units (OTUs) at the level of 97% sequence identity. The OTU data were then subjected to rarefaction analysis and determination of Chao1 richness estimates. The data indicated that faecal microbiota comprised possibly as many as 500 to 1500 OTUs. PMID:23382835
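
    The rarefaction and Chao1 steps mentioned above can be illustrated with a short sketch. It assumes the bias-corrected form of the Chao1 estimator, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of singleton and doubleton OTUs; the abstract does not state which form the authors used, and the function names and read counts below are hypothetical.

```python
import random


def chao1(otu_counts):
    """Bias-corrected Chao1 richness estimate from per-OTU read counts."""
    counts = [c for c in otu_counts if c > 0]
    s_obs = len(counts)                      # OTUs actually observed
    f1 = sum(1 for c in counts if c == 1)    # singletons
    f2 = sum(1 for c in counts if c == 2)    # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def rarefy(otu_counts, depth, seed=0):
    """OTUs observed in a random subsample of `depth` reads, drawn without replacement."""
    reads = [otu for otu, c in enumerate(otu_counts) for _ in range(c)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads")
    rng = random.Random(seed)
    return len(set(rng.sample(reads, depth)))


# Hypothetical read counts for six OTUs in one sample.
counts = [10, 5, 1, 1, 1, 2]
print(chao1(counts))  # 6 observed + 3*2 / (2*2) = 7.5

# One point per depth of a rarefaction curve (a single random draw each).
curve = [(depth, rarefy(counts, depth)) for depth in (5, 10, 20)]
print(curve)
```

    In practice each rarefaction depth would be averaged over many random draws; a single seeded draw is used here only to keep the example short.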

  3. ARResT/AssignSubsets: a novel application for robust subclassification of chronic lymphocytic leukemia based on B cell receptor IG stereotypy.

    PubMed

    Bystry, Vojtech; Agathangelidis, Andreas; Bikos, Vasilis; Sutton, Lesley Ann; Baliakas, Panagiotis; Hadzidimitriou, Anastasia; Stamatopoulos, Kostas; Darzentas, Nikos

    2015-12-01

    An ever-increasing body of evidence supports the importance of B cell receptor immunoglobulin (BcR IG) sequence restriction, alias stereotypy, in chronic lymphocytic leukemia (CLL). This phenomenon accounts for ∼30% of studied cases, one in eight of which belong to major subsets, and extends beyond restricted sequence patterns to shared biologic and clinical characteristics and, generally, outcome. Thus, the robust assignment of new cases to major CLL subsets is a critical, and yet unmet, requirement. We introduce a novel application, ARResT/AssignSubsets, which enables the robust assignment of BcR IG sequences from CLL patients to major stereotyped subsets. ARResT/AssignSubsets uniquely combines expert immunogenetic sequence annotation from IMGT/V-QUEST with curation to safeguard quality, statistical modeling of sequence features from more than 7500 CLL patients, and results from multiple perspectives to allow for both objective and subjective assessment. We validated our approach on the learning set, and evaluated its real-world applicability on a new representative dataset comprising 459 sequences from a single institution. ARResT/AssignSubsets is freely available on the web at http://bat.infspire.org/arrest/assignsubsets/. Contact: nikos.darzentas@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  4. Fold independent structural comparisons of protein-ligand binding sites for exploring functional relationships.

    PubMed

    Gold, Nicola D; Jackson, Richard M

    2006-02-03

    The rapid growth in protein structural data and the emergence of structural genomics projects have increased the need for automatic structure analysis and tools for function prediction. Small molecule recognition is critical to the function of many proteins; therefore, determination of ligand binding site similarity is important for understanding ligand interactions and may allow their functional classification. Here, we present a binding sites database (SitesBase) that, given a known protein-ligand binding site, allows rapid retrieval of other binding sites with similar structure independent of overall sequence or fold similarity. However, each match is also annotated with sequence similarity and fold information to aid interpretation of structure and functional similarity. Similarity in ligand binding sites can indicate common binding modes and recognition of similar molecules, allowing potential inference of function for an uncharacterised protein or providing additional evidence of common function where sequence or fold similarity is already known. Alternatively, the resource can provide valuable information for detailed studies of molecular recognition including structure-based ligand design and in understanding ligand cross-reactivity. Here, we show examples of atomic similarity between superfamily or more distant fold relatives as well as between seemingly unrelated proteins. Assignment of unclassified proteins to structural superfamilies is also undertaken and in most cases substantiates assignments made using sequence similarity. Correct assignment is also possible where sequence similarity fails to find significant matches, illustrating the potential use of binding site comparisons for newly determined proteins.

  5. Neo-sex Chromosomes in the Monarch Butterfly, Danaus plexippus

    PubMed Central

    Mongue, Andrew J.; Nguyen, Petr; Voleníková, Anna; Walters, James R.

    2017-01-01

    We report the discovery of a neo-sex chromosome in the monarch butterfly, Danaus plexippus, and several of its close relatives. Z-linked scaffolds in the D. plexippus genome assembly were identified via sex-specific differences in Illumina sequencing coverage. Additionally, a majority of the D. plexippus genome assembly was assigned to chromosomes based on counts of one-to-one orthologs relative to the butterfly Melitaea cinxia (with replication using two other lepidopteran species), in which genome scaffolds have been mapped to linkage groups. Sequencing coverage-based assessments of Z linkage combined with homology-based chromosomal assignments provided strong evidence for a Z-autosome fusion in the Danaus lineage, involving the autosome homologous to chromosome 21 in M. cinxia. Coverage analysis also identified three notable assembly errors resulting in chimeric Z-autosome scaffolds. Cytogenetic analysis further revealed a large W chromosome that is partially euchromatic, consistent with being a neo-W chromosome. The discovery of a neo-Z and the provisional assignment of chromosome linkage for >90% of D. plexippus genes lays the foundation for novel insights concerning sex chromosome evolution in this female-heterogametic model species for functional and evolutionary genomics. PMID:28839116

  6. Expanded microbial genome coverage and improved protein family annotation in the COG database.

    PubMed

    Galperin, Michael Y; Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V

    2015-01-01

    Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. 
    The new version of the COGs is expected to become an important tool for microbial genomics. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by US Government employees and is in the public domain in the US.

  7. The SUPERFAMILY database in 2004: additions and improvements.

    PubMed

    Madera, Martin; Vogel, Christine; Kummerfeld, Sarah K; Chothia, Cyrus; Gough, Julian

    2004-01-01

    The SUPERFAMILY database provides structural assignments to protein sequences and a framework for analysis of the results. At the core of the database is a library of profile Hidden Markov Models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent an entire superfamily. We have applied the library to predicted proteins from all completely sequenced genomes (currently 154), the Swiss-Prot and TrEMBL databases and other sequence collections. Close to 60% of all proteins have at least one match, and one half of all residues are covered by assignments. All models and full results are available for download and online browsing at http://supfam.org. Users can study the distribution of their superfamily of interest across all completely sequenced genomes, investigate with which other superfamilies it combines and retrieve proteins in which it occurs. Alternatively, concentrating on a particular genome as a whole, it is possible first, to find out its superfamily composition, and secondly, to compare it with that of other genomes to detect superfamilies that are over- or under-represented. In addition, the webserver provides the following standard services: sequence search; keyword search for genomes, superfamilies and sequence identifiers; and multiple alignment of genomic, PDB and custom sequences.

  8. Multilocus sequence typing (MLST) for lineage assignment and high resolution diversity studies in Trypanosoma cruzi.

    PubMed

    Yeo, Matthew; Mauricio, Isabel L; Messenger, Louisa A; Lewis, Michael D; Llewellyn, Martin S; Acosta, Nidia; Bhattacharyya, Tapan; Diosque, Patricio; Carrasco, Hernan J; Miles, Michael A

    2011-06-01

    Multilocus sequence typing (MLST) is a powerful and highly discriminatory method for analysing pathogen population structure and epidemiology. Trypanosoma cruzi, the protozoan agent of American trypanosomiasis (Chagas disease), has remarkable genetic and ecological diversity. A standardised MLST protocol that is suitable for assignment of T. cruzi isolates to genetic lineage and for higher resolution diversity studies has not been developed. We have sequenced and diplotyped nine single copy housekeeping genes and assessed their value as part of a systematic MLST scheme for T. cruzi. A minimum panel of four MLST targets (Met-III, RB19, TcGPXII, and DHFR-TS) was shown to provide unambiguous assignment of isolates to the six known T. cruzi lineages (Discrete Typing Units, DTUs TcI-TcVI). In addition, we recommend six MLST targets (Met-II, Met-III, RB19, TcMPX, DHFR-TS, and TR) for more in depth diversity studies on the basis that diploid sequence typing (DST) with this expanded panel distinguished 38 out of 39 reference isolates. Phylogenetic analysis implies a subdivision between North and South American TcIV isolates. Single Nucleotide Polymorphism (SNP) data revealed high levels of heterozygosity among DTUs TcI, TcIII, TcIV and, for three targets, putative corresponding homozygous and heterozygous loci within DTUs TcI and TcIII. Furthermore, individual gene trees gave incongruent topologies at inter- and intra-DTU levels, inconsistent with a model of strict clonality. We demonstrate the value of systematic MLST diplotyping for describing inter-DTU relationships and for higher resolution diversity studies of T. cruzi, including presence of recombination events. The high levels of heterozygosity will facilitate future population genetics analysis based on MLST haplotypes.

  9. Morphological identification and COI barcodes of adult flies help determine species identities of chironomid larvae (Diptera, Chironomidae).

    PubMed

    Failla, A J; Vasquez, A A; Hudson, P; Fujimoto, M; Ram, J L

    2016-02-01

    Establishing reliable methods for the identification of benthic chironomid communities is important due to their significant contribution to biomass, ecology and the aquatic food web. Immature larval specimens are more difficult to identify to species level by traditional morphological methods than their fully developed adult counterparts, and few keys are available to identify the larval species. In order to develop molecular criteria to identify species of chironomid larvae, larval and adult chironomids from Western Lake Erie were subjected to both molecular and morphological taxonomic analysis. Mitochondrial cytochrome c oxidase I (COI) barcode sequences of 33 adults that were identified to species level by morphological methods were grouped with COI sequences of 189 larvae in a neighbor-joining taxon-ID tree. Most of these larvae could be identified only to genus level by morphological taxonomy (only 22 of the 189 sequenced larvae could be identified to species level). The taxon-ID tree of larval sequences had 45 operational taxonomic units (OTUs, defined as clusters with >97% identity or individual sequences differing from nearest neighbors by >3%; supported by analysis of all larval pairwise differences), of which seven could be identified to species or 'species group' level by larval morphology. Reference sequences from the GenBank and BOLD databases assigned six larval OTUs with presumptive species level identifications and confirmed one previously assigned species level identification. Sequences from morphologically identified adults in the present study grouped with and further classified the identity of 13 larval OTUs. The use of morphological identification and subsequent DNA barcoding of adult chironomids proved to be beneficial in revealing possible species level identifications of larval specimens. 
    Sequence data from this study also contribute to currently inadequate public databases relevant to the Great Lakes region, while the neighbor-joining analysis reported here describes the application and confirmation of a useful tool that can accelerate identification and bioassessment of chironomid communities.

  10. Morphological identification and COI barcodes of adult flies help determine species identities of chironomid larvae (Diptera, Chironomidae)

    USGS Publications Warehouse

    Failla, Andrew Joseph; Vasquez, Adrian Amelio; Hudson, Patrick L.; Fujimoto, Masanori; Ram, Jeffrey L.

    2016-01-01

    Establishing reliable methods for the identification of benthic chironomid communities is important due to their significant contribution to biomass, ecology and the aquatic food web. Immature larval specimens are more difficult to identify to species level by traditional morphological methods than their fully developed adult counterparts, and few keys are available to identify the larval species. In order to develop molecular criteria to identify species of chironomid larvae, larval and adult chironomids from Western Lake Erie were subjected to both molecular and morphological taxonomic analysis. Mitochondrial cytochrome c oxidase I (COI) barcode sequences of 33 adults that were identified to species level by morphological methods were grouped with COI sequences of 189 larvae in a neighbor-joining taxon-ID tree. Most of these larvae could be identified only to genus level by morphological taxonomy (only 22 of the 189 sequenced larvae could be identified to species level). The taxon-ID tree of larval sequences had 45 operational taxonomic units (OTUs, defined as clusters with >97% identity or individual sequences differing from nearest neighbors by >3%; supported by analysis of all larval pairwise differences), of which seven could be identified to species or ‘species group’ level by larval morphology. Reference sequences from the GenBank and BOLD databases assigned six larval OTUs with presumptive species level identifications and confirmed one previously assigned species level identification. Sequences from morphologically identified adults in the present study grouped with and further classified the identity of 13 larval OTUs. The use of morphological identification and subsequent DNA barcoding of adult chironomids proved to be beneficial in revealing possible species level identifications of larval specimens. 
    Sequence data from this study also contribute to currently inadequate public databases relevant to the Great Lakes region, while the neighbor-joining analysis reported here describes the application and confirmation of a useful tool that can accelerate identification and bioassessment of chironomid communities.

  11. Genomic characterization and taxonomic position of a rhabdovirus from a hybrid snakehead.

    PubMed

    Zeng, Weiwei; Wang, Qing; Wang, Yingying; Liu, Cun; Liang, Hongru; Fang, Xiang; Wu, Shuqin

    2014-09-01

    A new rhabdovirus, tentatively designated as hybrid snakehead rhabdovirus C1207 (HSHRV-C1207), was first isolated from a moribund hybrid snakehead (Channa maculata × Channa argus) in China. We present the complete genome sequence of HSHRV-C1207 and a comprehensive sequence comparison between HSHRV-C1207 and other rhabdoviruses. Sequence alignment and phylogenetic analysis revealed that HSHRV-C1207 shared the highest degree of homology with Monopterus albus rhabdovirus (MARV) and Siniperca chuatsi rhabdovirus (SCRV). All three viruses clustered into a single group that was distinct from the recognized genera in the family Rhabdoviridae. Our analysis suggests that HSHRV-C1207, as well as MARV and SCRV, should be assigned to a new rhabdovirus genus.

  12. A combination of PhP typing and β-d-glucuronidase gene sequence variation analysis for differentiation of Escherichia coli from humans and animals.

    PubMed

    Masters, N; Christie, M; Katouli, M; Stratton, H

    2015-06-01

    We investigated the usefulness of the β-d-glucuronidase gene variance in Escherichia coli as a microbial source tracking tool using a novel algorithm for comparison of sequences from a prescreened set of host-specific isolates using a high-resolution PhP typing method. A total of 65 common biochemical phenotypes belonging to 318 E. coli strains isolated from humans and domestic and wild animals were analysed for nucleotide variations at 10 loci along a 518 bp fragment of the 1812 bp β-d-glucuronidase gene. Neighbour-joining analysis of loci variations revealed that 86 (76.8%) human isolates and 91.2% of animal isolates were correctly identified. Pairwise hierarchical clustering improved assignment, with 92 (82.1%) human and 204 (99%) animal strains assigned to their respective clusters. Our data show that initial typing of isolates and selection of common types from different hosts prior to analysis of the β-d-glucuronidase gene sequence improves source identification. We also concluded that numerical profiling of the nucleotide variations can be used as a valuable approach to differentiate human from animal E. coli. This study signifies the usefulness of the β-d-glucuronidase gene as a marker for differentiating human faecal pollution from animal sources.

  13. Comparative Analysis of Transcriptomes of Macrophage Revealing the Mechanism of the Immunoregulatory Activities of a Novel Polysaccharide Isolated from Boletus speciosus Frost.

    PubMed

    Ding, Xiang; Zhu, Hongqing; Hou, Yiling; Hou, Wanru; Zhang, Nan; Fu, Lei

    2017-01-01

    The mechanism of the immunoregulatory activities of polysaccharide is still not clear. Here, we assayed B-cell, T-cell, and macrophage proliferation, analyzed the macrophage cell cycle, and sequenced the transcriptomes of control-group and Boletus speciosus Frost polysaccharide (BSF-1)-group macrophages using Illumina sequencing technology to identify differentially expressed genes (DEGs) and determine the molecular mechanisms of the immunomodulatory activity of BSF-1 in macrophages. These results suggested that BSF-1 could promote the proliferation of B cells, T cells, and macrophages, promote the proliferation of macrophage cells by abolishing cell cycle arrests in the G0/G1 phases, and promote cell cycle progression in S-phase and G2/M phase, which might induce cell division. A total of 12,498,414 and 11,840,624 bp paired-end reads were obtained for the control group and BSF-1 group, respectively, and they corresponded to a total size of 12.5 Gbp and 11.8 Gbp, respectively, after the low-quality reads and adapter sequences were removed. Approximately 81.83% of the total number of genes (8,257) were expressed (reads per kilobase per million mapped reads [RPKM] ≥1) and more than 1366 genes were highly expressed (RPKM >60) in the BSF-1 group. A gene ontology-enrichment analysis generated 13,042 assignments to cellular components, 13,094 assignments to biological processes, and 13,135 assignments to molecular functions. A Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis showed that the mitogen-activated protein kinase (MAPK) signaling pathways are significantly enriched for DEGs between the two cell groups. An analysis of transcriptome resources enabled us to examine gene expression profiles, verify differential gene expression, and select candidate signaling pathways as the mechanisms of the immunomodulatory activity of BSF-1. 
    Based on the experimental data, we believe that the significant antitumor activities of BSF-1 in vivo mainly involve the MAPK signaling pathways.
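
    The RPKM thresholds quoted in this abstract (expressed at RPKM ≥1, highly expressed at RPKM >60) rest on the standard normalization RPKM = reads × 10^9 / (gene length in bp × total mapped reads). A minimal sketch follows; the function names and the example numbers are hypothetical and are not taken from the study.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def expression_class(value, expressed_at=1.0, high_above=60.0):
    """Label a gene using the thresholds quoted in the abstract."""
    if value > high_above:
        return "highly expressed"
    if value >= expressed_at:
        return "expressed"
    return "not expressed"


# Hypothetical gene: 500 reads on a 2,000 bp transcript in a library
# of 10 million mapped reads.
value = rpkm(500, 2_000, 10_000_000)
print(value, expression_class(value))  # 25.0 expressed
```

    Dividing by both gene length and library size is what makes values comparable between genes of different lengths and between the control and BSF-1 libraries.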

  14. Laughter and the Management of Divergent Positions in Peer Review Interactions

    PubMed Central

    Raclaw, Joshua; Ford, Cecilia E.

    2017-01-01

    In this paper we focus on how participants in peer review interactions use laughter as a resource as they publicly report divergence of evaluative positions, divergence that is typical in the give and take of joint grant evaluation. Using the framework of conversation analysis, we examine the infusion of laughter and multimodal laugh-relevant practices into sequences of talk in meetings of grant reviewers deliberating on the evaluation and scoring of high-level scientific grant applications. We focus on a recurrent sequence in these meetings, what we call the score-reporting sequence, in which the assigned reviewers first announce the preliminary scores they have assigned to the grant. We demonstrate that such sequences are routine sites for the use of laugh practices to navigate the initial moments in which divergence of opinion is made explicit. In the context of meetings convened for the purposes of peer review, laughter thus serves as a valuable resource for managing the socially delicate but institutionally required reporting of divergence and disagreement that is endemic to meetings where these types of evaluative tasks are a focal activity. PMID:29170594

  15. Identification of functional candidates amongst hypothetical proteins of Mycobacterium leprae Br4923, a causative agent of leprosy.

    PubMed

    Naqvi, Ahmad Abu Turab; Ahmad, Faizan; Hassan, Md Imtaiyaz

    2015-01-01

    Mycobacterium leprae is an intracellular obligate parasite that causes leprosy in humans, and it leads to the destruction of peripheral nerves and skin deformation. Here, we report an extensive analysis of the hypothetical proteins (HPs) from M. leprae strain Br4923, assigning their functions to better understand the mechanism of pathogenesis and to search for potential therapeutic interventions. The genome of M. leprae encodes 1604 proteins, of which the functions of 632 are not known (HPs). In this paper, we predicted the probable functions of 312 HPs. First, we classified all HPs into families and subfamilies on the basis of sequence similarity, followed by domain assignment, which provides many clues for their possible function. However, the functions of 320 proteins were not predicted because of low sequence similarity with proteins of known function. Annotated HPs were categorized into enzymes, binding proteins, transporters, and proteins involved in cellular processes. We found several novel proteins whose functions were unknown for M. leprae. These proteins have a requisite association with bacterial virulence and pathogenicity. Finally, our sequence-based analysis will be helpful for further validation and the search for potential drug targets while developing effective drugs to cure leprosy.

  16. Evolution, substrate specificity and subfamily classification of glycoside hydrolase family 5 (GH5).

    PubMed

    Aspeborg, Henrik; Coutinho, Pedro M; Wang, Yang; Brumer, Harry; Henrissat, Bernard

    2012-09-20

    The large Glycoside Hydrolase family 5 (GH5) groups together a wide range of enzymes acting on β-linked oligo- and polysaccharides, and glycoconjugates from a large spectrum of organisms. The long and complex evolution of this family of enzymes and its broad sequence diversity limits functional prediction. With the objective of improving the differentiation of enzyme specificities in a knowledge-based context, and to obtain new evolutionary insights, we present here a new, robust subfamily classification of family GH5. About 80% of the current sequences were assigned into 51 subfamilies in a global analysis of all publicly available GH5 sequences and associated biochemical data. Examination of subfamilies with catalytically-active members revealed that one third are monospecific (containing a single enzyme activity), although new functions may be discovered with biochemical characterization in the future. Furthermore, twenty subfamilies presently have no characterization whatsoever and many others have only limited structural and biochemical data. Mapping of functional knowledge onto the GH5 phylogenetic tree revealed that the sequence space of this historical and industrially important family is far from well dispersed, highlighting targets in need of further study. The analysis also uncovered a number of GH5 proteins which have lost their catalytic machinery, indicating evolution towards novel functions. Overall, the subfamily division of GH5 provides an actively curated resource for large-scale protein sequence annotation for glycogenomics; the subfamily assignments are openly accessible via the Carbohydrate-Active Enzyme database at http://www.cazy.org/GH5.html.

  17. Priceomyces M. Suzuki & Kurtzman (2010)

    USDA-ARS's Scientific Manuscript database

    This chapter describes the ascomycete yeast genus Priceomyces and is to be published in "The Yeasts, A Taxonomic Study, 5th edition." The genus Priceomyces has five described species that were earlier assigned to the genus Pichia, but gene sequence analysis showed that the species, now reclassified...

  18. Molecular Diagnostic Analysis of Outbreak Scenarios

    ERIC Educational Resources Information Center

    Morsink, M. C.; Dekter, H. E.; Dirks-Mulder, A.; van Leeuwen, W. B.

    2012-01-01

    In the current laboratory assignment, technical aspects of the polymerase chain reaction (PCR) are integrated in the context of six different bacterial outbreak scenarios. The "Enterobacterial Repetitive Intergenic Consensus Sequence" (ERIC) PCR was used to analyze different outbreak scenarios. First, groups of 2-4 students determined optimal…

  19. A bioinformatic pipeline for identifying informative SNP panels for parentage assignment from RADseq data.

    PubMed

    Andrews, Kimberly R; Adams, Jennifer R; Cassirer, E Frances; Plowright, Raina K; Gardner, Colby; Dwire, Maggie; Hohenlohe, Paul A; Waits, Lisette P

    2018-06-05

    The development of high-throughput sequencing technologies is dramatically increasing the use of single nucleotide polymorphisms (SNPs) across the field of genetics, but most parentage studies of wild populations still rely on microsatellites. We developed a bioinformatic pipeline for identifying SNP panels that are informative for parentage analysis from restriction site-associated DNA sequencing (RADseq) data. This pipeline includes options for analysis with or without a reference genome, and provides methods to maximize genotyping accuracy and select sets of unlinked loci that have high statistical power. We test this pipeline on small populations of Mexican gray wolf and bighorn sheep, for which parentage analyses are expected to be challenging due to low genetic diversity and the presence of many closely related individuals. We compare the results of parentage analysis across SNP panels generated with or without the use of a reference genome, and between SNPs and microsatellites. For Mexican gray wolf, we conducted parentage analyses for 30 pups from a single cohort where samples were available from 64% of possible mothers and 53% of possible fathers, and the accuracy of parentage assignments could be estimated because true identities of parents were known a priori based on field data. For bighorn sheep, we conducted maternity analyses for 39 lambs from five cohorts where 77% of possible mothers were sampled, but true identities of parents were unknown. Analyses with and without a reference genome produced SNP panels with >95% parentage assignment accuracy for Mexican gray wolf, outperforming microsatellites at 78% accuracy. Maternity assignments were completely consistent across all SNP panels for the bighorn sheep, and were 74.4% consistent with assignments from microsatellites. 
Accuracy and consistency of parentage analysis were not reduced when using as few as 284 SNPs for Mexican gray wolf and 142 SNPs for bighorn sheep, indicating our pipeline can be used to develop SNP genotyping assays for parentage analysis with relatively small numbers of loci. This article is protected by copyright. All rights reserved.
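    The abstract does not say how parentage is scored, so the following is only a minimal sketch of one common idea, Mendelian exclusion, and not the authors' pipeline. It assumes biallelic SNP genotypes coded as 0, 1 or 2 copies of the alternate allele (None for a missing call); the function names, sample identifiers and the mismatch tolerance are hypothetical.

```python
from typing import Dict, Optional, Sequence

Genotype = Optional[int]  # 0, 1, 2 = copies of the alternate allele; None = missing call


def mendelian_mismatches(offspring: Sequence[Genotype],
                         parent: Sequence[Genotype]) -> int:
    """Count loci where offspring and candidate parent are opposite homozygotes."""
    mismatches = 0
    for o, p in zip(offspring, parent):
        if o is None or p is None:
            continue  # skip loci with a missing call
        if {o, p} == {0, 2}:
            mismatches += 1  # the parent could not have transmitted the offspring's allele
    return mismatches


def assign_parent(offspring: Sequence[Genotype],
                  candidates: Dict[str, Sequence[Genotype]],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the candidate with the fewest mismatches.

    Returns None when no candidate is within the tolerance or the best score is tied.
    """
    scored = sorted((mendelian_mismatches(offspring, g), name)
                    for name, g in candidates.items())
    if not scored or scored[0][0] > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous: two candidates fit equally well
    return scored[0][1]


# Hypothetical toy data: six SNP loci, two candidate mothers.
pup = [0, 2, 1, 0, 2, None]
candidates = {
    "F01": [0, 1, 1, 1, 2, 0],  # compatible at every called locus
    "F02": [2, 0, 1, 2, 0, 1],  # opposite homozygote at four loci
}
print(assign_parent(pup, candidates))
```

    The small tolerance for mismatches stands in for genotyping error; with few loci or closely related candidates, ties become likely, which is why the sketch returns None rather than guessing.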

  20. Genomics of Three New Bacteriophages Useful in the Biocontrol of Salmonella

    PubMed Central

    Bardina, Carlota; Colom, Joan; Spricigo, Denis A.; Otero, Jennifer; Sánchez-Osuna, Miquel; Cortés, Pilar; Llagostera, Montserrat

    2016-01-01

    Non-typhoid Salmonella is the principal pathogen related to food-borne diseases throughout the world. Widespread antibiotic resistance has adversely affected human health and has encouraged the search for alternative antimicrobial agents. The advances in bacteriophage therapy highlight their use in controlling a broad spectrum of food-borne pathogens. One requirement for the use of bacteriophages as antibacterials is the characterization of their genomes. In this work, complete genome sequencing and molecular analyses were carried out for three new virulent Salmonella-specific bacteriophages (UAB_Phi20, UAB_Phi78, and UAB_Phi87) able to infect a broad range of Salmonella strains. Sequence analysis of the genomes of UAB_Phi20, UAB_Phi78, and UAB_Phi87 bacteriophages did not evidence the presence of known virulence-associated and antibiotic resistance genes, and potential immunoreactive food allergens. The UAB_Phi20 genome comprised 41,809 base pairs with 80 open reading frames (ORFs); 24 of them with assigned function. Genome sequence showed a high homology of UAB_Phi20 with Salmonella bacteriophage P22 and other P22likeviruses genus of the Podoviridae family, including ST64T and ST104. The DNA of UAB_Phi78 contained 44,110 bp including direct terminal repeats (DTR) of 179 bp and 58 putative ORFs were predicted and 20 were assigned function. This bacteriophage was assigned to the SP6likeviruses genus of the Podoviridae family based on its high similarity not only with SP6 but also with the K1-5, K1E, and K1F bacteriophages, all of which infect Escherichia coli. The UAB_Phi87 genome sequence consisted of 87,669 bp with terminal direct repeats of 608 bp; although 148 ORFs were identified, putative functions could be assigned to only 29 of them. Sequence comparisons revealed the mosaic structure of UAB_Phi87 and its high similarity with bacteriophages Felix O1 and wV8 of E. coli with respect to genetic content and functional organization. 
Phylogenetic analysis of large terminase subunits confirms their packaging strategies and grouping to the different phage genus type. All these studies are necessary for the development and the use of an efficient cocktail with commercial applications in bacteriophage therapy against Salmonella. PMID:27148229

  1. Systematic internal transcribed spacer sequence analysis for identification of clinical mold isolates in diagnostic mycology: a 5-year study.

    PubMed

    Ciardo, Diana E; Lucke, Katja; Imhof, Alex; Bloemberg, Guido V; Böttger, Erik C

    2010-08-01

    The implementation of internal transcribed spacer (ITS) sequencing for routine identification of molds in the diagnostic mycology laboratory was analyzed in a 5-year study. All mold isolates (n = 6,900) recovered in our laboratory from 2005 to 2009 were included in this study. According to a defined work flow, which in addition to troublesome phenotypic identification takes clinical relevance into account, 233 isolates were subjected to ITS sequence analysis. Sequencing resulted in successful identification for 78.6% of the analyzed isolates (57.1% at species level, 21.5% at genus level). In comparison, extended in-depth phenotypic characterization of the isolates subjected to sequencing achieved taxonomic assignment for 47.6% of these, with a mere 13.3% at species level. Optimization of DNA extraction further improved the efficacy of molecular identification. This study is the first of its kind to testify to the systematic implementation of sequence-based identification procedures in the routine workup of mold isolates in the diagnostic mycology laboratory.

  2. Single Machine Scheduling and Due Date Assignment with Past-Sequence-Dependent Setup Time and Position-Dependent Processing Time

    PubMed Central

    Zhao, Chuan-Li; Hsu, Hua-Feng

    2014-01-01

    This paper considers single machine scheduling and due date assignment with setup time. The setup time is proportional to the length of the already processed jobs; that is, the setup time is past-sequence-dependent (p-s-d). It is assumed that a job's processing time depends on its position in a sequence. The objective functions include total earliness, the weighted number of tardy jobs, and the cost of due date assignment. We analyze these problems with two different due date assignment methods. We first consider the model with job-dependent position effects. For each case, by converting the problem to a series of assignment problems, we proved that the problems can be solved in O(n^4) time. For the model with job-independent position effects, we proved that the problems can be solved in O(n^3) time by providing a dynamic programming algorithm. PMID:25258727

  3. Single machine scheduling and due date assignment with past-sequence-dependent setup time and position-dependent processing time.

    PubMed

    Zhao, Chuan-Li; Hsu, Chou-Jung; Hsu, Hua-Feng

    2014-01-01

    This paper considers single machine scheduling and due date assignment with setup time. The setup time is proportional to the length of the already processed jobs; that is, the setup time is past-sequence-dependent (p-s-d). It is assumed that a job's processing time depends on its position in a sequence. The objective functions include total earliness, the weighted number of tardy jobs, and the cost of due date assignment. We analyze these problems with two different due date assignment methods. We first consider the model with job-dependent position effects. For each case, by converting the problem to a series of assignment problems, we proved that the problems can be solved in O(n^4) time. For the model with job-independent position effects, we proved that the problems can be solved in O(n^3) time by providing a dynamic programming algorithm.
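    The reduction to assignment problems mentioned in the abstract can be illustrated with a single such problem: place each job in exactly one sequence position so that the total cost is minimal. The sketch below is an assumption-laden illustration, not the paper's algorithm. The cost matrix is hypothetical (cost[j][r] stands for whatever the contribution of job j in position r would be under a given objective), and the solver is a standard O(n^3) Hungarian-style method written in plain Python.

```python
def solve_assignment(cost):
    """Minimum-cost assignment of n jobs (rows) to n positions (columns).

    Returns (total_cost, assignment) with assignment[job] = position.
    Standard O(n^3) Hungarian method using row/column potentials.
    """
    n = len(cost)
    INF = float("inf")
    u = [0] * (n + 1)    # potentials for jobs (1-indexed)
    v = [0] * (n + 1)    # potentials for positions (1-indexed)
    p = [0] * (n + 1)    # p[j] = job currently matched to position j (0 = none)
    way = [0] * (n + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = INF
            j1 = 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    total = sum(cost[job][pos] for job, pos in enumerate(assignment))
    return total, assignment


# Hypothetical costs: cost[j][r] = contribution of job j when placed in position r.
cost = [
    [9, 2, 7],
    [6, 4, 3],
    [5, 8, 1],
]
total, assignment = solve_assignment(cost)
print(total, assignment)
```

    Solving one such problem costs O(n^3); if a method has to solve on the order of n of them (for example, one per candidate value of some parameter), the overall bound becomes O(n^4), which is consistent with the complexity quoted in the abstract, although the abstract itself does not spell out that count.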

  4. Sequence Analysis of Raspberry latent virus Suggests a New Genus of Dicot Infecting Reoviruses

    USDA-ARS's Scientific Manuscript database

    Currently, there are three assigned genera of plant reoviruses: Phytoreovirus, Fijivirus and Oryzavirus. With only two exceptions, all plant reoviruses infect monocotyledonous plants. The recent characterization of Raspberry latent virus (RpLV) isolated from red raspberry plants in northern Washingt...

  5. Molecular and morphologic data reveal multiple species in Peromyscus pectoralis

    PubMed Central

    Bradley, Robert D.; Schmidly, David J.; Amman, Brian R.; Platt, Roy N.; Neumann, Kathy M.; Huynh, Howard M.; Muñiz-Martínez, Raúl; López-González, Celia; Ordóñez-Garza, Nicté

    2015-01-01

    DNA sequence and morphometric data were used to re-evaluate the taxonomy and systematics of Peromyscus pectoralis. Phylogenetic analyses (maximum likelihood and Bayesian inference) of DNA sequences from the mitochondrial cytochrome-b gene in 44 samples of P. pectoralis indicated 2 well-supported monophyletic clades. The 1st clade contained specimens from Texas historically assigned to P. p. laceianus; the 2nd was comprised of specimens previously referable to P. p. collinus, P. p. laceianus, and P. p. pectoralis obtained from northern and eastern Mexico. Levels of genetic variation (~7%) between these 2 clades indicated that the genetic divergence typically exceeded that reported for other species of Peromyscus. Samples of P. p. laceianus north and south of the Río Grande were not monophyletic. In addition, samples representing P. p. collinus and P. p. pectoralis formed 2 clades that differed genetically by 7.14%. Multivariate analyses of external and cranial measurements from 63 populations of P. pectoralis revealed 4 morpho-groups consistent with clades in the DNA sequence analysis: 1 from Texas and New Mexico assignable to P. p. laceianus; a 2nd from western and southern Mexico assignable to P. p. pectoralis; a 3rd from northern and central Mexico previously assigned to P. p. pectoralis but herein shown to represent an undescribed taxon; and a 4th from southeastern Mexico assignable to P. p. collinus. Based on the concordance of these results, populations from the United States are referred to as P. laceianus, whereas populations from Mexico are referred to as P. pectoralis (including some samples historically assigned to P. p. collinus, P. p. laceianus, and P. p. pectoralis). A new subspecies is described to represent populations south of the Río Grande in northern and central Mexico. Additional research is needed to discern if P. p. collinus warrants species recognition. PMID:26937045

  6. Genome organization of epidemic Acinetobacter baumannii strains.

    PubMed

    Di Nocera, Pier Paolo; Rocco, Francesco; Giannouli, Maria; Triassi, Maria; Zarrilli, Raffaele

    2011-10-10

    Acinetobacter baumannii is an opportunistic pathogen responsible for hospital-acquired infections. A. baumannii epidemics described world-wide were caused by few genotypic clusters of strains. The occurrence of epidemics caused by multi-drug resistant strains assigned to novel genotypes has been reported over the last few years. In the present study, we compared whole genome sequences of three A. baumannii strains assigned to genotypes ST2, ST25 and ST78, representative of the most frequent genotypes responsible for epidemics in several Mediterranean hospitals, and four complete genome sequences of A. baumannii strains assigned to genotypes ST1, ST2 and ST77. Comparative genome analysis showed extensive synteny and identified 3068 coding regions which are conserved, at the same chromosomal position, in all A. baumannii genomes. Genome alignments also identified 63 DNA regions, ranging in size from 4 to 126 kb, all defined as genomic islands, which were present in some genomes, but were either missing or replaced by non-homologous DNA sequences in others. Some islands are involved in resistance to drugs and metals, others carry genes encoding surface proteins or enzymes involved in specific metabolic pathways, and others correspond to prophage-like elements. Accessory DNA regions encode 12 to 19% of the potential gene products of the analyzed strains. The analysis of a collection of epidemic A. baumannii strains showed that some islands were restricted to specific genotypes. The definition of the genome components of A. baumannii provides a scaffold to rapidly evaluate the genomic organization of novel clinical A. baumannii isolates. Changes in island profiling will be useful in genomic epidemiology of A. baumannii population.

  7. Assigning protein functions by comparative genome analysis protein phylogenetic profiles

    DOEpatents

    Pellegrini, Matteo; Marcotte, Edward M.; Thompson, Michael J.; Eisenberg, David; Grothe, Robert; Yeates, Todd O.

    2003-05-13

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequence of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.
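    The phylogenetic-profile idea described above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the patented method: homolog detection is assumed to have been done already (the hits dictionary is hypothetical), each profile is a tuple of presence/absence flags, and two proteins are called linked when their profiles differ in at most max_distance genomes.

```python
def phylogenetic_profile(genomes_with_homolog, genomes):
    """Return a tuple of 1/0 flags: is a homolog of the protein present in each genome?"""
    return tuple(1 if g in genomes_with_homolog else 0 for g in genomes)


def hamming(a, b):
    """Number of genomes in which two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def linked_pairs(profiles, max_distance=0):
    """List protein pairs whose profiles differ in at most max_distance genomes."""
    names = sorted(profiles)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if hamming(profiles[a], profiles[b]) <= max_distance:
                pairs.append((a, b))
    return pairs


# Hypothetical input: for each protein, the genomes in which a homolog was found.
genomes = ["G1", "G2", "G3", "G4", "G5"]
hits = {
    "protA": {"G1", "G3", "G4"},
    "protB": {"G1", "G3", "G4"},
    "protC": {"G2", "G5"},
    "protD": {"G1", "G2", "G3", "G4", "G5"},
}
profiles = {name: phylogenetic_profile(found, genomes) for name, found in hits.items()}
print(linked_pairs(profiles))
```

    Here protA and protB share the profile (1, 0, 1, 1, 0) and are the only pair reported at distance 0. A protein such as protD, present in every genome, carries little information, so a practical implementation would probably filter or down-weight such profiles.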

  8. Stability of operational taxonomic units: an important but neglected property for analyzing microbial diversity.

    PubMed

    He, Yan; Caporaso, J Gregory; Jiang, Xiao-Tao; Sheng, Hua-Fang; Huse, Susan M; Rideout, Jai Ram; Edgar, Robert C; Kopylova, Evguenia; Walters, William A; Knight, Rob; Zhou, Hong-Wei

    2015-01-01

    The operational taxonomic unit (OTU) is widely used in microbial ecology. Reproducibility in microbial ecology research depends on the reliability of OTU-based 16S ribosomal subunit RNA (rRNA) analyses. Here, we report that many hierarchical and greedy clustering methods produce unstable OTUs, with membership that depends on the number of sequences clustered. If OTUs are regenerated with additional sequences or samples, sequences originally assigned to a given OTU can be split into different OTUs. Alternatively, sequences assigned to different OTUs can be merged into a single OTU. This OTU instability affects alpha-diversity analyses such as rarefaction curves, beta-diversity analyses such as distance-based ordination (for example, Principal Coordinate Analysis (PCoA)), and the identification of differentially represented OTUs. Our results show that the proportion of unstable OTUs varies for different clustering methods. We found that the closed-reference method is the only one that produces completely stable OTUs, with the caveat that sequences that do not match a pre-existing reference sequence collection are discarded. As a compromise to the factors listed above, we propose using an open-reference method to enhance OTU stability. This type of method clusters sequences against a database and includes unmatched sequences by clustering them via a relatively stable de novo clustering method. OTU stability is an important consideration when analyzing microbial diversity and is a feature that should be taken into account during the development of novel OTU clustering methods.
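    The instability described above can be reproduced with a toy example. The sketch below assumes pre-aligned sequences of equal length, a simple per-position identity, and a 90% threshold; it is not any of the clustering tools evaluated in the study, and all names are hypothetical. It contrasts a greedy de novo clustering, whose result depends on which sequences are present, with a closed-reference assignment, where each sequence is judged only against fixed references.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length, pre-aligned sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_de_novo(seqs, threshold=0.9):
    """Greedy centroid clustering: each sequence joins the first centroid it matches."""
    clusters = {}  # centroid sequence -> list of member sequences
    for s in seqs:
        for centroid in clusters:
            if identity(s, centroid) >= threshold:
                clusters[centroid].append(s)
                break
        else:
            clusters[s] = [s]  # no centroid matched: s founds a new OTU
    return clusters


def closed_reference(seqs, references, threshold=0.9):
    """Assign each sequence to its best-matching reference; unmatched sequences are dropped."""
    assignment = {}
    for s in seqs:
        best = max(references, key=lambda name: identity(s, references[name]))
        if identity(s, references[best]) >= threshold:
            assignment[s] = best
    return assignment


s1 = "AAAAAAAAAA"
s2 = "AAAAAAAAAC"  # one mismatch to s1
s3 = "AAAAAAAACC"  # two mismatches to s1, one to s2

before = greedy_de_novo([s1, s3])      # s1 and s3 fall into separate OTUs
after = greedy_de_novo([s2, s1, s3])   # adding s2 merges s1 and s3 into one OTU
print(len(before), len(after))

references = {"ref1": "AAAAAAAAAA"}
print(closed_reference([s1, s3], references))
print(closed_reference([s2, s1, s3], references))
```

    In the de novo runs, s1 and s3 sit in two OTUs until s2 is added, after which all three share one OTU. In the closed-reference runs the assignment of s1 never changes, but s3 is discarded both times, which is the caveat the abstract attaches to that method.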

  9. Discrimination of germline V genes at different sequencing lengths and mutational burdens: A new tool for identifying and evaluating the reliability of V gene assignment.

    PubMed

    Zhang, Bochao; Meng, Wenzhao; Prak, Eline T Luning; Hershberg, Uri

    2015-12-01

    Immune repertoires are collections of lymphocytes that express diverse antigen receptor gene rearrangements consisting of Variable (V), (Diversity (D) in the case of heavy chains) and Joining (J) gene segments. Clonally related cells typically share the same germline gene segments and have highly similar junctional sequences within their third complementarity determining regions. Identifying clonal relatedness of sequences is a key step in the analysis of immune repertoires. The V gene is the most important for clone identification because it has the longest sequence and the greatest number of sequence variants. However, accurate identification of a clone's germline V gene source is challenging because there is a high degree of similarity between different germline V genes. This difficulty is compounded in antibodies, which can undergo somatic hypermutation. Furthermore, high-throughput sequencing experiments often generate partial sequences and have significant error rates. To address these issues, we describe a novel method to estimate which germline V genes (or alleles) cannot be discriminated under different conditions (read lengths, sequencing errors or somatic hypermutation frequencies). Starting with any set of germline V genes, this method measures their similarity using different sequencing lengths and calculates their likelihood of unambiguous assignment under different levels of mutation. Hence, one can identify, under different experimental and biological conditions, the germline V genes (or alleles) that cannot be uniquely identified and bundle them together into groups of specific V genes with highly similar sequences. Copyright © 2015 Elsevier B.V. All rights reserved.
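    The effect of read length on V gene discrimination can be shown with a deliberately simplified sketch. It assumes that reads cover only the last read_length bases of the V gene and ignores mutation and sequencing error entirely, so it is far cruder than the method in the abstract; the gene names and sequences are invented for illustration.

```python
from collections import defaultdict


def indistinguishable_groups(germlines, read_length):
    """Group germline genes whose last `read_length` bases are identical.

    Genes in the same group cannot be told apart by an error-free read
    that covers only that 3' portion of the V gene.
    """
    groups = defaultdict(list)
    for name, seq in germlines.items():
        groups[seq[-read_length:]].append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical germline sequences, 14 bases each.
germlines = {
    "V1": "ACGTTGCAAGGCTA",
    "V2": "ACGATGCAAGGCTA",  # differs from V1 only at position 4 (1-based)
    "V3": "ACGTTGCATGGCTA",  # differs from V1 only at position 9 (1-based)
}

for length in (4, 8, 14):
    print(length, indistinguishable_groups(germlines, length))
```

    With 4 bases all three genes collapse into one group, with 8 bases V3 separates while V1 and V2 remain bundled, and only the full 14 bases resolve every gene. Accounting for somatic hypermutation would require replacing exact matching with a distance or likelihood threshold.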

  10. Divergence of Structure and Function in the Haloacid Dehalogenase Enzyme Superfamily: Bacteroides thetaiotaomicron BT2127 is an Inorganic Pyrophosphatase

    PubMed Central

    Huang, Hua; Yury, Patskovsky; Toro, Rafael; Farelli, Jeremiah D.; Pandya, Chetanya; Almo, Steven C.; Allen, Karen N.; Dunaway-Mariano, Debra

    2012-01-01

    The explosion of protein sequence information requires that current strategies for function assignment must evolve to complement experimental approaches with computationally-based function prediction. This necessitates the development of strategies based on the identification of sequence markers in the form of specificity determinants and a more informed definition of orthologues. Herein, we have undertaken the function assignment of the unknown Haloalkanoate Dehalogenase superfamily member BT2127 (Uniprot accession # Q8A5V9) from Bacteroides thetaiotaomicron using an integrated bioinformatics/structure/mechanism approach. The substrate specificity profile and steady-state rate constants of BT2127 (with kcat/Km value for pyrophosphate of ∼1 × 10^5 M^-1 s^-1), together with the gene context, support the assigned in vivo function as an inorganic pyrophosphatase. The X-ray structural analysis of the wild-type BT2127 and several variants generated by site-directed mutagenesis shows that substrate discrimination is based, in part, on active site space restrictions imposed by the cap domain (specifically by residues Tyr76 and Glu47). Structure guided site directed mutagenesis coupled with kinetic analysis of the mutant enzymes identified the residues required for catalysis, substrate binding, and domain-domain association. Based on this structure-function analysis, the catalytic residues Asp11, Asp13, Thr113, and Lys147 as well as the metal binding residues Asp171, Asn172 and Glu47 were used as markers to confirm BT2127 orthologues identified via sequence searches. This bioinformatic analysis demonstrated that the biological range of BT2127 orthologue is restricted to the phylum Bacteroidetes/Chlorobi. The key structural determinants in the divergence of BT2127 and its closest homologue β-phosphoglucomutase control the leaving group size (phosphate vs. glucose-phosphate) and the position of the Asp acid/base in the open vs. closed conformations. 
HADSF pyrophosphatases represent a third mechanistic and fold type for bacterial pyrophosphatases. PMID:21894910

  11. Sequence-structure mapping errors in the PDB: OB-fold domains

    PubMed Central

    Venclovas, Česlovas; Ginalski, Krzysztof; Kang, Chulhee

    2004-01-01

    The Protein Data Bank (PDB) is the single most important repository of structural data for proteins and other biologically relevant molecules. Therefore, it is critically important to keep the PDB data, as much as possible, error-free. In this study, we have analyzed PDB crystal structures possessing oligonucleotide/oligosaccharide binding (OB)-fold, one of the highly populated folds, for the presence of sequence-structure mapping errors. Using energy-based structure quality assessment coupled with sequence analyses, we have found that there are at least five OB-structures in the PDB that have regions where sequences have been incorrectly mapped onto the structure. We have demonstrated that the combination of these computation techniques is effective not only in detecting sequence-structure mapping errors, but also in providing guidance to correct them. Namely, we have used results of computational analysis to direct a revision of X-ray data for one of the PDB entries containing a fairly inconspicuous sequence-structure mapping error. The revised structure has been deposited with the PDB. We suggest use of computational energy assessment and sequence analysis techniques to facilitate structure determination when homologs having known structure are available to use as a reference. Such computational analysis may be useful in either guiding the sequence-structure assignment process or verifying the sequence mapping within poorly defined regions. PMID:15133161

  12. Phylogenetic position of parabasalid symbionts from the termite Calotermes flavicollis based on small subunit rRNA sequences.

    PubMed

    Gerbod, D; Edgcomb, V P; Noël, C; Delgado-Viscogliosi, P; Viscogliosi, E

    2000-09-01

    Small subunit rDNA genes were amplified by polymerase chain reaction using specific primers from mixed-population DNA obtained from the whole hindgut of the termite Calotermes flavicollis. Comparative sequence analysis of the clones revealed two kinds of sequences that were both from parabasalid symbionts. In a molecular tree inferred by distance, parsimony and likelihood methods, and including 27 parabasalid sequences retrieved from the data bases, the sequences of the group II (clones Cf5 and Cf6) were closely related to the Devescovinidae/Calonymphidae species and thus were assigned to the Devescovinidae Foaina. The sequence of the group I (clone Cf1) emerged within the Trichomonadinae and strongly clustered with Tetratrichomonas gallinarum. On the basis of morphological data, the Monocercomonadidae Hexamastix termitis might be the most likely origin of this sequence.

  13. MALDI Top-Down sequencing: calling N- and C-terminal protein sequences with high confidence and speed.

    PubMed

    Suckau, Detlev; Resemann, Anja

    2009-12-01

    The ability to match Top-Down protein sequencing (TDS) results by MALDI-TOF to protein sequences by classical protein database searching was evaluated in this work. Resulting from these analyses were the protein identity, the simultaneous assignment of the N- and C-termini and protein sequences of up to 70 residues from either terminus. In combination with de novo sequencing using the MALDI-TDS data, even fusion proteins were assigned and the detailed sequence around the fusion site was elucidated. MALDI-TDS made it possible to match protein sequences quickly and efficiently and to validate recombinant protein structures, in particular protein termini, at the level of undigested proteins.

  14. Accurate Sample Assignment in a Multiplexed, Ultrasensitive, High-Throughput Sequencing Assay for Minimal Residual Disease.

    PubMed

    Bartram, Jack; Mountjoy, Edward; Brooks, Tony; Hancock, Jeremy; Williamson, Helen; Wright, Gary; Moppett, John; Goulden, Nick; Hubank, Mike

    2016-07-01

    High-throughput sequencing (HTS) (next-generation sequencing) of the rearranged Ig and T-cell receptor genes promises to be less expensive and more sensitive than current methods of monitoring minimal residual disease (MRD) in patients with acute lymphoblastic leukemia. However, the adoption of new approaches by clinical laboratories requires careful evaluation of all potential sources of error and the development of strategies to ensure the highest accuracy. Timely and efficient clinical use of HTS platforms will depend on combining multiple samples (multiplexing) in each sequencing run. Here we examine the Ig heavy-chain gene HTS on the Illumina MiSeq platform for MRD. We identify errors associated with multiplexing that could potentially impact the accuracy of MRD analysis. We optimize a strategy that combines high-purity, sequence-optimized oligonucleotides, dual indexing, and an error-aware demultiplexing approach to minimize errors and maximize sensitivity. We present a probability-based, demultiplexing pipeline Error-Aware Demultiplexer that is suitable for all MiSeq strategies and accurately assigns samples to the correct identifier without excessive loss of data. Finally, using controls quantified by digital PCR, we show that HTS-MRD can accurately detect as few as 1 in 10^6 copies of specific leukemic MRD. Crown Copyright © 2016. Published by Elsevier Inc. All rights reserved.
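    To make the role of dual indexing concrete, here is a minimal sketch of conservative demultiplexing. It is explicitly not the probability-based Error-Aware Demultiplexer named in the abstract: it substitutes a plain Hamming-distance threshold, and the index sequences, sample names and function names are all hypothetical. A read is assigned only when both index reads resolve uniquely and the resulting pair is one that was actually loaded.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length index reads."""
    return sum(x != y for x, y in zip(a, b))


def closest_index(observed, known, max_mismatches=1):
    """Return the unique known index within max_mismatches of observed, else None."""
    matches = [k for k in known if hamming(observed, k) <= max_mismatches]
    return matches[0] if len(matches) == 1 else None


def demultiplex(i7, i5, sample_sheet, max_mismatches=1):
    """Assign a read to a sample only if BOTH indices resolve and the pair is expected."""
    i7_known = {pair[0] for pair in sample_sheet}
    i5_known = {pair[1] for pair in sample_sheet}
    a = closest_index(i7, i7_known, max_mismatches)
    b = closest_index(i5, i5_known, max_mismatches)
    return sample_sheet.get((a, b))  # None for unresolved or unexpected pairs


# Hypothetical sample sheet: (i7 index, i5 index) -> sample name.
sample_sheet = {
    ("ACGTACGT", "TTGCAAGC"): "patient_1",
    ("GGATCCTA", "CATGGTCA"): "patient_2",
}

print(demultiplex("ACGTACGT", "TTGCAAGC", sample_sheet))  # exact match
print(demultiplex("ACGTACGA", "TTGCAAGC", sample_sheet))  # one sequencing error in i7
print(demultiplex("ACGTACGT", "CATGGTCA", sample_sheet))  # valid indices, unexpected pair
```

    The third call shows why two indices help: each index is individually valid, but the combination belongs to no sample, so the read is rejected instead of being credited to the wrong patient. At a sensitivity target of 1 in 10^6, even rare misassignments of that kind would matter.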

  15. Comparative Analysis of Transcriptomes of Macrophage Revealing the Mechanism of the Immunoregulatory Activities of a Novel Polysaccharide Isolated from Boletus speciosus Frost

    PubMed Central

    Ding, Xiang; Zhu, Hongqing; Hou, Yiling; Hou, Wanru; Zhang, Nan; Fu, Lei

    2017-01-01

    Background: The mechanism of the immunoregulatory activities of polysaccharide is still not clear. Materials and Methods: Here, we performed the B-cell, T-cell, and macrophage cell proliferation, the cell cycle analysis of macrophage cells, sequenced the transcriptomes of control group macrophages, and Boletus speciosus Frost polysaccharide (BSF-1) group macrophages using Illumina sequencing technology to identify differentially expressed genes (DEGs) to determine the molecular mechanisms of immunomodulatory activity of BSF-1 in macrophages. Results: These results suggested that BSF-1 could promote the proliferation of B-cell, T-cell, and macrophages, promote the proliferation of macrophage cells by abolishing cell cycle arrests in the G0/G1 phases, and promote cell cycle progression in S-phase and G2/M phase, which might induce cell division. A total of 12,498,414 and 11,840,624 bp paired-end reads were obtained for the control group and BSF-1 group, respectively, and they corresponded to a total size of 12.5 G bp and 11.8 G bp, respectively, after the low-quality reads and adapter sequences were removed. Approximately 81.83% of the total number of genes (8,257) were expressed reads per kilobase per million mapped reads (RPKM ≥1) and more than 1366 genes were highly expressed (RPKM >60) in the BSF-1 group. A gene ontology-enrichment analysis generated 13,042 assignments to cellular components, 13,094 assignments to biological processes, and 13,135 assignments to molecular functions. A Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis showed that the mitogen-activated protein kinase (MAPK) signaling pathways are significantly enriched for DEGs between the two cell groups. Conclusion: An analysis of transcriptome resources enabled us to examine gene expression profiles, verify differential gene expression, and select candidate signaling pathways as the mechanisms of the immunomodulatory activity of BSF-1. 
Based on the experimental data, we believe that the significant antitumor activities of BSF-1 in vivo mainly involve the MAPK signaling pathways. SUMMARY: Boletus speciosus Frost-1 (BSF-1) could promote the proliferation of B-cell, T-cell, and macrophages, promote the proliferation of macrophage cells by abolishing cell cycle arrests in the G0/G1 phases, and promote cell cycle progression in S-phase and G2/M phase, which might induce cell division. Approximately 81.83% of the total number of genes (8257) were expressed (reads per kilobase per million mapped reads [RPKM] ≥1) and more than 1366 genes were highly expressed (RPKM >60) in the BSF-1 group. A gene ontology-enrichment analysis generated 13,042 assignments to cellular components, 13,094 assignments to biological processes, and 13,135 assignments to molecular functions. A Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis showed that the mitogen-activated protein kinase signaling pathways are significantly enriched for DEGs between the two cell groups. Abbreviations used: BSF-1: Boletus speciosus Frost polysaccharide. PMID:28839373
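    The RPKM thresholds quoted in this abstract (expressed at RPKM of at least 1, highly expressed above 60) rest on the usual definition of RPKM: reads mapped to a gene, scaled by gene length in kilobases and by total mapped reads in millions. The short sketch below applies that definition; the read counts and gene length in the example are invented, and the function names are hypothetical.

```python
def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


def expression_class(value):
    """Classify a gene using the thresholds quoted in the abstract."""
    if value > 60:
        return "highly expressed"
    if value >= 1:
        return "expressed"
    return "not expressed"


# Hypothetical gene: 1,500 reads on a 2,000 bp transcript, 12 million mapped reads.
value = rpkm(1500, 2000, 12_000_000)
print(value, expression_class(value))
```

    For this hypothetical gene the value is 1500 × 10^9 / (2000 × 12,000,000) = 62.5, just above the threshold of 60 for high expression.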

  16. Metabarcoding of marine nematodes – evaluation of reference datasets used in tree-based taxonomy assignment approach

    PubMed Central

    2016-01-01

    Abstract Background: Metabarcoding is becoming a common tool used to assess and compare diversity of organisms in environmental samples. Identification of OTUs is one of the critical steps in the process and several taxonomy assignment methods were proposed to accomplish this task. This publication evaluates the quality of reference datasets, alongside with several alignment and phylogeny inference methods used in one of the taxonomy assignment methods, called tree-based approach. This approach assigns anonymous OTUs to taxonomic categories based on relative placements of OTUs and reference sequences on the cladogram and support that these placements receive. New information: In tree-based taxonomy assignment approach, reliable identification of anonymous OTUs is based on their placement in monophyletic and highly supported clades together with identified reference taxa. Therefore, it requires high quality reference dataset to be used. Resolution of phylogenetic trees is strongly affected by the presence of erroneous sequences as well as alignment and phylogeny inference methods used in the process. Two preparation steps are essential for the successful application of tree-based taxonomy assignment approach. Curated collections of genetic information do include erroneous sequences. These sequences have detrimental effect on the resolution of cladograms used in tree-based approach. They must be identified and excluded from the reference dataset beforehand. Various combinations of multiple sequence alignment and phylogeny inference methods provide cladograms with different topology and bootstrap support. These combinations of methods need to be tested in order to determine the one that gives highest resolution for the particular reference dataset. Completing the above mentioned preparation steps is expected to decrease the number of unassigned OTUs and thus improve the results of the tree-based taxonomy assignment approach. PMID:27932919

  17. Metabarcoding of marine nematodes - evaluation of reference datasets used in tree-based taxonomy assignment approach.

    PubMed

    Holovachov, Oleksandr

    2016-01-01

    Metabarcoding is becoming a common tool used to assess and compare diversity of organisms in environmental samples. Identification of OTUs is one of the critical steps in the process and several taxonomy assignment methods were proposed to accomplish this task. This publication evaluates the quality of reference datasets, alongside with several alignment and phylogeny inference methods used in one of the taxonomy assignment methods, called tree-based approach. This approach assigns anonymous OTUs to taxonomic categories based on relative placements of OTUs and reference sequences on the cladogram and support that these placements receive. In tree-based taxonomy assignment approach, reliable identification of anonymous OTUs is based on their placement in monophyletic and highly supported clades together with identified reference taxa. Therefore, it requires high quality reference dataset to be used. Resolution of phylogenetic trees is strongly affected by the presence of erroneous sequences as well as alignment and phylogeny inference methods used in the process. Two preparation steps are essential for the successful application of tree-based taxonomy assignment approach. Curated collections of genetic information do include erroneous sequences. These sequences have detrimental effect on the resolution of cladograms used in tree-based approach. They must be identified and excluded from the reference dataset beforehand. Various combinations of multiple sequence alignment and phylogeny inference methods provide cladograms with different topology and bootstrap support. These combinations of methods need to be tested in order to determine the one that gives highest resolution for the particular reference dataset. Completing the above mentioned preparation steps is expected to decrease the number of unassigned OTUs and thus improve the results of the tree-based taxonomy assignment approach.

  18. DNA Barcodes for Species Identification in the Hyperdiverse Ant Genus Pheidole (Formicidae: Myrmicinae)

    PubMed Central

    Ng'endo, R.N.; Osiemo, Z.B.; Brandl, R.

    2013-01-01

    DNA sequencing is increasingly being used to assist in species identification in order to overcome taxonomic impediment. However, few studies attempt to compare the results of these molecular studies with a more traditional species delineation approach based on morphological characters. Mitochondrial DNA Cytochrome oxidase subunit 1 (CO1) gene was sequenced, measuring 636 base pairs, from 47 ants of the genus Pheidole (Formicidae: Myrmicinae) collected in the Brazilian Atlantic Forest to test whether the morphology-based assignment of individuals into species is supported by DNA-based species delimitation. Twenty morphospecies were identified, whereas the barcoding analysis identified 19 Molecular Operational Taxonomic Units (MOTUs). Fifteen out of the 19 DNA-based clusters allocated, using sequence divergence thresholds of 2% and 3%, matched with morphospecies. Both thresholds yielded the same number of MOTUs. Only one MOTU was successfully identified to species level using the CO1 sequences of Pheidole species already in the Genbank. The average pairwise sequence divergence for all 47 sequences was 19%, ranging between 0–25%. In some cases, however, morphology and molecular based methods differed in their assignment of individuals to morphospecies or MOTUs. The occurrence of distinct mitochondrial lineages within morphological species highlights groups for further detailed genetic and morphological studies, and therefore a pluralistic approach using several methods to understand the taxonomy of difficult lineages is advocated. PMID:23902257
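
The threshold-based grouping described in this record can be illustrated with a minimal sketch. It assumes pre-aligned sequences, an uncorrected p-distance, and single-linkage clustering; the abstract does not state which distance measure or linkage the authors used, and the names `p_distance`, `cluster_motus`, `mutate`, and the toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore alignment columns where either sequence has a gap.
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 1.0
    return sum(x != y for x, y in compared) / len(compared)


def cluster_motus(seqs: dict[str, str], threshold: float) -> list[set[str]]:
    """Single-linkage clustering: merge sequences whose distance is <= threshold."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


def mutate(seq: str, positions) -> str:
    """Toy helper: substitute a different base at each given position."""
    chars = list(seq)
    for i in positions:
        chars[i] = "C" if chars[i] == "A" else "A"
    return "".join(chars)


# Hypothetical 100 bp toy alignment (not data from the study).
base = "ACGT" * 25
toy = {
    "ant_1": base,
    "ant_2": mutate(base, [10]),             # 1 substitution: 1% from ant_1
    "ant_3": mutate(base, range(0, 60, 2)),  # 30 substitutions: 30% from ant_1
}

for threshold in (0.02, 0.03):
    print(threshold, cluster_motus(toy, threshold))
```

In this toy data both thresholds give the same two clusters, which mirrors (but does not reproduce) the observation that the 2% and 3% thresholds yielded the same number of MOTUs.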

  19. Collaborative Learning through Formative Peer Review with Technology

    ERIC Educational Resources Information Center

    Eaton, Carrie Diaz; Wade, Stephanie

    2014-01-01

    This paper describes a collaboration between a mathematician and a compositionist who developed a sequence of collaborative writing assignments for calculus. This sequence of developmentally appropriate assignments presents peer review as a collaborative process that promotes reflection, deepens understanding, and improves exposition. First, we…

  20. Burkholderia cordobensis sp. nov., from agricultural soils.

    PubMed

    Draghi, Walter O; Peeters, Charlotte; Cnockaert, Margo; Snauwaert, Cindy; Wall, Luis G; Zorreguieta, Angeles; Vandamme, Peter

    2014-06-01

    Two Gram-negative, rod-shaped bacteria were isolated from agricultural soils in Córdoba province in central Argentina. Their 16S rRNA gene sequences demonstrated that they belong to the genus Burkholderia, with Burkholderia zhejiangensis as most closely related formally named species; this relationship was confirmed through comparative gyrB sequence analysis. Whole-cell fatty acid analysis supported their assignment to the genus Burkholderia. Burkholderia sp. strain YI23, for which a whole-genome sequence is available, represents the same taxon, as demonstrated by its highly similar 16S rRNA (100% similarity) and gyrB (99.1-99.7%) gene sequences. The results of DNA-DNA hybridization experiments and physiological and biochemical characterization further substantiated the genotypic and phenotypic distinctiveness of the Argentinian soil isolates, for which the name Burkholderia cordobensis sp. nov. is proposed, with strain MMP81(T) ( = LMG 27620(T) = CCUG 64368(T)) as the type strain. © 2014 IUMS.

  1. Bifidobacterium aquikefiri sp. nov., isolated from water kefir.

    PubMed

    Laureys, David; Cnockaert, Margo; De Vuyst, Luc; Vandamme, Peter

    2016-03-01

    A novel Bifidobacterium , strain LMG 28769 T , was isolated from a household water kefir fermentation process. Cells were Gram-stain-positive, non-motile, non-spore-forming, catalase-negative, oxidase-negative and facultatively anaerobic short rods. Analysis of its 16S rRNA gene sequence revealed Bifidobacterium crudilactis and Bifidobacterium psychraerophilum (97.4 and 97.1 % similarity towards the respective type strain sequences) as nearest phylogenetic neighbours. Its assignment to the genus Bifidobacterium was confirmed by the presence of fructose 6-phosphate phosphoketolase activity. Analysis of the hsp60 gene sequence revealed very low similarity with nucleotide sequences in the NCBI nucleotide database. The genotypic and phenotypic analyses allowed the differentiation of strain LMG 28769 T from all recognized Bifidobacterium species. Strain LMG 28769 T ( = CCUG 67145 T  = R 54638 T ) therefore represents a novel species, for which the name Bifidobacterium aquikefiri sp. nov. is proposed.

  2. Fuzzy cluster analysis of simple physicochemical properties of amino acids for recognizing secondary structure in proteins.

    PubMed Central

    Mocz, G.

    1995-01-01

    Fuzzy cluster analysis has been applied to the 20 amino acids by using 65 physicochemical properties as a basis for classification. The clustering products, the fuzzy sets (i.e., classical sets with associated membership functions), have provided a new measure of amino acid similarities for use in protein folding studies. This work demonstrates that fuzzy sets of simple molecular attributes, when assigned to amino acid residues in a protein's sequence, can predict the secondary structure of the sequence with reasonable accuracy. An approach is presented for discriminating standard folding states, using near-optimum information splitting in half-overlapping segments of the sequence of assigned membership functions. The method is applied to a nonredundant set of 252 proteins and yields approximately 73% matching for correctly predicted and correctly rejected residues with approximately 60% overall success rate for the correctly recognized ones in three folding states: alpha-helix, beta-strand, and coil. The most useful attributes for discriminating these states appear to be related to size, polarity, and thermodynamic factors. Van der Waals volume, apparent average thickness of surrounding molecular free volume, and a measure of dimensionless surface electron density can explain approximately 95% of prediction results. Hydrogen bonding and hydrophobicity indices do not yet enable clear clustering and prediction. PMID:7549882

  3. The standard operating procedure of the DOE-JGI Metagenome Annotation Pipeline (MAP v.4)

    DOE PAGES

    Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos; ...

    2016-02-24

    The DOE-JGI Metagenome Annotation Pipeline (MAP v.4) performs structural and functional annotation for metagenomic sequences that are submitted to the Integrated Microbial Genomes with Microbiomes (IMG/M) system for comparative analysis. The pipeline runs on nucleotide sequences provided via the IMG submission site. Users must first define their analysis projects in GOLD and then submit the associated sequence datasets consisting of scaffolds/contigs with optional coverage information and/or unassembled reads in fasta and fastq file formats. The MAP processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNAs, as well as CRISPR elements. Structural annotation is followed by functional annotation including assignment of protein product names and connection to various protein family databases.

  4. The standard operating procedure of the DOE-JGI Metagenome Annotation Pipeline (MAP v.4)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos

    The DOE-JGI Metagenome Annotation Pipeline (MAP v.4) performs structural and functional annotation for metagenomic sequences that are submitted to the Integrated Microbial Genomes with Microbiomes (IMG/M) system for comparative analysis. The pipeline runs on nucleotide sequences provided via the IMG submission site. Users must first define their analysis projects in GOLD and then submit the associated sequence datasets consisting of scaffolds/contigs with optional coverage information and/or unassembled reads in fasta and fastq file formats. The MAP processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNAs, as well as CRISPR elements. Structural annotation is followed by functional annotation including assignment of protein product names and connection to various protein family databases.

  5. The first initiative of DNA barcoding of ornamental plants from Egypt and potential applications in horticulture industry

    PubMed Central

    Ashfaq, Muhammad; Ali, Hayssam M.; Yessoufou, Kowiyou

    2017-01-01

    DNA barcoding relies on short and standardized gene regions to identify species. The agricultural and horticultural applications of barcoding such as for marketplace regulation and copyright protection remain poorly explored. This study examines the effectiveness of the standard plant barcode markers (matK and rbcL) for the identification of plant species in private and public nurseries in northern Egypt. These two markers were sequenced from 225 specimens of 161 species and 62 plant families of horticultural importance. The sequence recovery was similar for rbcL (96.4%) and matK (84%), but the number of specimens assigned correctly to the respective genera and species was lower for rbcL (75% and 29%) than matK (85% and 40%). The combination of rbcL and matK brought the number of correct generic and species assignments to 83.4% and 40%, respectively. Individually, the efficiency of both markers varied among different plant families; for example, all palm specimens (Arecaceae) were correctly assigned to species while only one individual of Asteraceae was correctly assigned to species. Further, barcodes reliably assigned ornamental horticultural and medicinal plants correctly to genus while they showed a lower or no success in assigning these plants to species and cultivars. For future, we recommend the combination of a complementary barcode (e.g. ITS or trnH-psbA) with rbcL + matK to increase the performance of taxa identification. By aiding species identification of horticultural crops and ornamental palms, the analysis of the barcode regions will have large impact on horticultural industry. PMID:28199378

  6. The first initiative of DNA barcoding of ornamental plants from Egypt and potential applications in horticulture industry.

    PubMed

    O Elansary, Hosam; Ashfaq, Muhammad; Ali, Hayssam M; Yessoufou, Kowiyou

    2017-01-01

    DNA barcoding relies on short and standardized gene regions to identify species. The agricultural and horticultural applications of barcoding such as for marketplace regulation and copyright protection remain poorly explored. This study examines the effectiveness of the standard plant barcode markers (matK and rbcL) for the identification of plant species in private and public nurseries in northern Egypt. These two markers were sequenced from 225 specimens of 161 species and 62 plant families of horticultural importance. The sequence recovery was similar for rbcL (96.4%) and matK (84%), but the number of specimens assigned correctly to the respective genera and species was lower for rbcL (75% and 29%) than matK (85% and 40%). The combination of rbcL and matK brought the number of correct generic and species assignments to 83.4% and 40%, respectively. Individually, the efficiency of both markers varied among different plant families; for example, all palm specimens (Arecaceae) were correctly assigned to species while only one individual of Asteraceae was correctly assigned to species. Further, barcodes reliably assigned ornamental horticultural and medicinal plants correctly to genus while they showed a lower or no success in assigning these plants to species and cultivars. For future, we recommend the combination of a complementary barcode (e.g. ITS or trnH-psbA) with rbcL + matK to increase the performance of taxa identification. By aiding species identification of horticultural crops and ornamental palms, the analysis of the barcode regions will have large impact on horticultural industry.

  7. Protein evolution analysis of S-hydroxynitrile lyase by complete sequence design utilizing the INTMSAlign software.

    PubMed

    Nakano, Shogo; Asano, Yasuhisa

    2015-02-03

    Development of software and methods for design of complete sequences of functional proteins could contribute to studies of protein engineering and protein evolution. To this end, we developed the INTMSAlign software, and used it to design functional proteins and evaluate their usefulness. The software could assign both consensus and correlation residues of target proteins. We generated three protein sequences with S-selective hydroxynitrile lyase (S-HNL) activity, which we call designed S-HNLs; these proteins folded as efficiently as the native S-HNL. Sequence and biochemical analysis of the designed S-HNLs suggested that accumulation of neutral mutations occurs during the process of S-HNLs evolution from a low-activity form to a high-activity (native) form. Taken together, our results demonstrate that our software and the associated methods could be applied not only to design of complete sequences, but also to predictions of protein evolution, especially within families such as esterases and S-HNLs.

  8. Protein evolution analysis of S-hydroxynitrile lyase by complete sequence design utilizing the INTMSAlign software

    NASA Astrophysics Data System (ADS)

    Nakano, Shogo; Asano, Yasuhisa

    2015-02-01

    Development of software and methods for design of complete sequences of functional proteins could contribute to studies of protein engineering and protein evolution. To this end, we developed the INTMSAlign software, and used it to design functional proteins and evaluate their usefulness. The software could assign both consensus and correlation residues of target proteins. We generated three protein sequences with S-selective hydroxynitrile lyase (S-HNL) activity, which we call designed S-HNLs; these proteins folded as efficiently as the native S-HNL. Sequence and biochemical analysis of the designed S-HNLs suggested that accumulation of neutral mutations occurs during the process of S-HNLs evolution from a low-activity form to a high-activity (native) form. Taken together, our results demonstrate that our software and the associated methods could be applied not only to design of complete sequences, but also to predictions of protein evolution, especially within families such as esterases and S-HNLs.

  9. Pfarao: a web application for protein family analysis customized for cytoskeletal and motor proteins (CyMoBase).

    PubMed

    Odronitz, Florian; Kollmar, Martin

    2006-11-29

    Annotation of protein sequences of eukaryotic organisms is crucial for the understanding of their function in the cell. Manual annotation is still by far the most accurate way to correctly predict genes. The classification of protein sequences, their phylogenetic relation and the assignment of function involves information from various sources. This often leads to a collection of heterogeneous data, which is hard to track. Cytoskeletal and motor proteins consist of large and diverse superfamilies comprising up to several dozen members per organism. Up to date there is no integrated tool available to assist in the manual large-scale comparative genomic analysis of protein families. Pfarao (Protein Family Application for Retrieval, Analysis and Organisation) is a database driven online working environment for the analysis of manually annotated protein sequences and their relationship. Currently, the system can store and interrelate a wide range of information about protein sequences, species, phylogenetic relations and sequencing projects as well as links to literature and domain predictions. Sequences can be imported from multiple sequence alignments that are generated during the annotation process. A web interface allows to conveniently browse the database and to compile tabular and graphical summaries of its content. We implemented a protein sequence-centric web application to store, organize, interrelate, and present heterogeneous data that is generated in manual genome annotation and comparative genomics. The application has been developed for the analysis of cytoskeletal and motor proteins (CyMoBase) but can easily be adapted for any protein.

  10. Pair-barcode high-throughput sequencing for large-scale multiplexed sample analysis

    PubMed Central

    2012-01-01

    Background: The multiplexing becomes the major limitation of the next-generation sequencing (NGS) in application to low complexity samples. Physical space segregation allows limited multiplexing, while the existing barcode approach only permits simultaneous analysis of up to several dozen samples. Results: Here we introduce pair-barcode sequencing (PBS), an economic and flexible barcoding technique that permits parallel analysis of large-scale multiplexed samples. In two pilot runs using SOLiD sequencer (Applied Biosystems Inc.), 32 independent pair-barcoded miRNA libraries were simultaneously discovered by the combination of 4 unique forward barcodes and 8 unique reverse barcodes. Over 174,000,000 reads were generated and about 64% of them are assigned to both of the barcodes. After mapping all reads to pre-miRNAs in miRBase, different miRNA expression patterns are captured from the two clinical groups. The strong correlation using different barcode pairs and the high consistency of miRNA expression in two independent runs demonstrates that PBS approach is valid. Conclusions: By employing PBS approach in NGS, large-scale multiplexed pooled samples could be practically analyzed in parallel so that high-throughput sequencing economically meets the requirements of samples which are low sequencing throughput demand. PMID:22276739
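
The combinatorial idea in this record (4 forward barcodes × 8 reverse barcodes = 32 libraries) can be sketched as follows. This is an illustrative sketch only: the barcode sequences, their length, their position at the two ends of a single read string, and exact-match assignment are all assumptions, not details of the PBS protocol or of SOLiD data.

```python
from itertools import product

# Hypothetical barcode sets (4 forward, 8 reverse); not the published barcodes.
FORWARD = ["ACGT", "CATG", "GTCA", "TGAC"]
REVERSE = ["AACC", "AAGG", "CCAA", "CCTT", "GGAA", "GGTT", "TTCC", "TTGG"]

# Every (forward, reverse) combination identifies one library: 4 * 8 = 32.
PAIR_TO_SAMPLE = {
    pair: sample_id
    for sample_id, pair in enumerate(product(FORWARD, REVERSE), start=1)
}


def assign_read(read: str, barcode_len: int = 4):
    """Return the sample id if both end barcodes match exactly, else None.

    Assumes the forward barcode is the read's prefix and the reverse
    barcode is its suffix (a simplification for illustration).
    """
    fwd, rev = read[:barcode_len], read[-barcode_len:]
    return PAIR_TO_SAMPLE.get((fwd, rev))


print(len(PAIR_TO_SAMPLE))                   # number of distinguishable libraries
print(assign_read("ACGT" + "TTAGGC" + "AACC"))  # both barcodes recognised
print(assign_read("ACGT" + "TTAGGC" + "AAAA"))  # reverse barcode unknown
```

A read is assigned only when both barcodes are recognised, which is why a fraction of reads (about 36% in the reported runs) remains unassigned.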

  11. Pair-barcode high-throughput sequencing for large-scale multiplexed sample analysis.

    PubMed

    Tu, Jing; Ge, Qinyu; Wang, Shengqin; Wang, Lei; Sun, Beili; Yang, Qi; Bai, Yunfei; Lu, Zuhong

    2012-01-25

    The multiplexing becomes the major limitation of the next-generation sequencing (NGS) in application to low complexity samples. Physical space segregation allows limited multiplexing, while the existing barcode approach only permits simultaneous analysis of up to several dozen samples. Here we introduce pair-barcode sequencing (PBS), an economic and flexible barcoding technique that permits parallel analysis of large-scale multiplexed samples. In two pilot runs using SOLiD sequencer (Applied Biosystems Inc.), 32 independent pair-barcoded miRNA libraries were simultaneously discovered by the combination of 4 unique forward barcodes and 8 unique reverse barcodes. Over 174,000,000 reads were generated and about 64% of them are assigned to both of the barcodes. After mapping all reads to pre-miRNAs in miRBase, different miRNA expression patterns are captured from the two clinical groups. The strong correlation using different barcode pairs and the high consistency of miRNA expression in two independent runs demonstrates that PBS approach is valid. By employing PBS approach in NGS, large-scale multiplexed pooled samples could be practically analyzed in parallel so that high-throughput sequencing economically meets the requirements of samples which are low sequencing throughput demand.

  12. Yleaf: Software for Human Y-Chromosomal Haplogroup Inference from Next-Generation Sequencing Data.

    PubMed

    Ralf, Arwin; Montiel González, Diego; Zhong, Kaiyin; Kayser, Manfred

    2018-05-01

    Next-generation sequencing (NGS) technologies offer immense possibilities given the large genomic data they simultaneously deliver. The human Y-chromosome serves as a good example of how NGS benefits various applications in evolution, anthropology, genealogy, and forensics. Prior to NGS, the Y-chromosome phylogenetic tree consisted of a few hundred branches; based on NGS data, it now contains many thousands. The complexity of both the Y tree and NGS data provides challenges for haplogroup assignment. For effective analysis and interpretation of Y-chromosome NGS data, we present Yleaf, a publicly available, automated, user-friendly software for high-resolution Y-chromosome haplogroup inference independently of library and sequencing methods.

  13. Library Design-Facilitated High-Throughput Sequencing of Synthetic Peptide Libraries.

    PubMed

    Vinogradov, Alexander A; Gates, Zachary P; Zhang, Chi; Quartararo, Anthony J; Halloran, Kathryn H; Pentelute, Bradley L

    2017-11-13

    A methodology to achieve high-throughput de novo sequencing of synthetic peptide mixtures is reported. The approach leverages shotgun nanoliquid chromatography coupled with tandem mass spectrometry-based de novo sequencing of library mixtures (up to 2000 peptides) as well as automated data analysis protocols to filter away incorrect assignments, noise, and synthetic side-products. For increasing the confidence in the sequencing results, mass spectrometry-friendly library designs were developed that enabled unambiguous decoding of up to 600 peptide sequences per hour while maintaining greater than 85% sequence identification rates in most cases. The reliability of the reported decoding strategy was additionally confirmed by matching fragmentation spectra for select authentic peptides identified from library sequencing samples. The methods reported here are directly applicable to screening techniques that yield mixtures of active compounds, including particle sorting of one-bead one-compound libraries and affinity enrichment of synthetic library mixtures performed in solution.

  14. Derivational Suffixes as Cues to Stress Position in Reading Greek

    ERIC Educational Resources Information Center

    Grimani, Aikaterini; Protopapas, Athanassios

    2017-01-01

    Background: In languages with lexical stress, reading aloud must include stress assignment. Stress information sources across languages include word-final letter sequences. Here, we examine whether such sequences account for stress assignment in Greek and whether this is attributable to absolute rules involving accenting morphemes or to…

  15. A System to Automatically Classify and Name Any Individual Genome-Sequenced Organism Independently of Current Biological Classification and Nomenclature

    PubMed Central

    Song, Yuhyun; Leman, Scotland; Monteil, Caroline L.; Heath, Lenwood S.; Vinatzer, Boris A.

    2014-01-01

    A broadly accepted and stable biological classification system is a prerequisite for biological sciences. It provides the means to describe and communicate about life without ambiguity. Current biological classification and nomenclature use the species as the basic unit and require lengthy and laborious species descriptions before newly discovered organisms can be assigned to a species and be named. The current system is thus inadequate to classify and name the immense genetic diversity within species that is now being revealed by genome sequencing on a daily basis. To address this lack of a general intra-species classification and naming system adequate for today’s speed of discovery of new diversity, we propose a classification and naming system that is exclusively based on genome similarity and that is suitable for automatic assignment of codes to any genome-sequenced organism without requiring any phenotypic or phylogenetic analysis. We provide examples demonstrating that genome similarity-based codes largely align with current taxonomic groups at many different levels in bacteria, animals, humans, plants, and viruses. Importantly, the proposed approach is only slightly affected by the order of code assignment and can thus provide codes that reflect similarity between organisms and that do not need to be revised upon discovery of new diversity. We envision genome similarity-based codes to complement current biological nomenclature and to provide a universal means to communicate unambiguously about any genome-sequenced organism in fields as diverse as biodiversity research, infectious disease control, human and microbial forensics, animal breed and plant cultivar certification, and human ancestry research. PMID:24586551

  16. Peer Assessment of Student-Produced Mechanics Lab Report Videos

    ERIC Educational Resources Information Center

    Douglas, Scott S.; Aiken, John M.; Lin, Shih-Yin; Greco, Edwin F.; Alicea-Muñoz, Emily; Schatz, Michael F.

    2017-01-01

    We examine changes in students' rating behavior during a semester-long sequence of peer evaluation laboratory exercises in an introductory mechanics course. We perform a quantitative analysis of the ratings given by students to peers' physics lab reports, and conduct interviews with students. We find that peers persistently assign higher ratings…

  17. A comparative study of ancient environmental DNA to pollen and macrofossils from lake sediments reveals taxonomic overlap and additional plant taxa

    NASA Astrophysics Data System (ADS)

    Pedersen, Mikkel Winther; Ginolhac, Aurélien; Orlando, Ludovic; Olsen, Jesper; Andersen, Kenneth; Holm, Jakob; Funder, Svend; Willerslev, Eske; Kjær, Kurt H.

    2013-09-01

    We use 2nd generation sequencing technology on sedimentary ancient DNA (sedaDNA) from a lake in South Greenland to reconstruct the local floristic history around a low-arctic lake and compare the results with those previously obtained from pollen and macrofossils in the same lake. Thirty-eight of thirty-nine samples from the core yielded putative DNA sequences. Using a multiple assignment strategy on the trnL g-h DNA barcode, consisting of two different phylogenetic and one sequence similarity assignment approaches, thirteen families of plants were identified, of which two (Scrophulariaceae and Asparagaceae) are absent from the pollen and macrofossil records. An age model for the sediment based on twelve radiocarbon dates establishes a chronology and shows that the lake record dates back to 10,650 cal yr BP. Our results suggest that sedaDNA analysis from lake sediments, although taxonomically less detailed than pollen and macrofossil analyses can be a complementary tool for establishing the composition of both terrestrial and aquatic local plant communities and a method for identifying additional taxa.

  18. Indigenous species barcode database improves the identification of zooplankton

    PubMed Central

    Yang, Jianghua; Zhang, Wanwan; Sun, Jingying; Xie, Yuwei; Zhang, Yimin; Burton, G. Allen; Yu, Hongxia

    2017-01-01

    Incompleteness and inaccuracy of DNA barcode databases is considered an important hindrance to the use of metabarcoding in biodiversity analysis of zooplankton at the species-level. Species barcoding by Sanger sequencing is inefficient for organisms with small body sizes, such as zooplankton. Here mitochondrial cytochrome c oxidase I (COI) fragment barcodes from 910 freshwater zooplankton specimens (87 morphospecies) were recovered by a high-throughput sequencing platform, Ion Torrent PGM. Intraspecific divergence of most zooplanktons was < 5%, except Branchionus leydign (Rotifer, 14.3%), Trichocerca elongate (Rotifer, 11.5%), Lecane bulla (Rotifer, 15.9%), Synchaeta oblonga (Rotifer, 5.95%) and Schmackeria forbesi (Copepod, 6.5%). Metabarcoding data of 28 environmental samples from Lake Tai were annotated by both an indigenous database and NCBI Genbank database. The indigenous database improved the taxonomic assignment of metabarcoding of zooplankton. Most zooplankton (81%) with barcode sequences in the indigenous database were identified by metabarcoding monitoring. Furthermore, the frequency and distribution of zooplankton were also consistent between metabarcoding and morphology identification. Overall, the indigenous database improved the taxonomic assignment of zooplankton. PMID:28977035

  19. Comparative analysis of bacteria associated with different mosses by 16S rRNA and 16S rDNA sequencing.

    PubMed

    Tian, Yang; Li, Yan Hong

    2017-01-01

    To understand the differences of the bacteria associated with different mosses, a phylogenetic study of bacterial communities in three mosses was carried out based on 16S rDNA and 16S rRNA sequencing. The mosses used were Hygroamblystegium noterophilum, Entodon compressus and Grimmia montana, representing hygrophyte, shady plant and xerophyte, respectively. In total, the operational taxonomic units (OTUs), richness and diversity were different regardless of the moss species and the library level. All the examined 1183 clones were assigned to 248 OTUs, 56 genera were assigned in rDNA libraries and 23 genera were determined at the rRNA level. Proteobacteria and Bacteroidetes were considered as the most dominant phyla in all the libraries, whereas abundant Actinobacteria and Acidobacteria were detected in the rDNA library of Entodon compressus and approximately 24.7% clones were assigned to Candidate division TM7 in Grimmia montana at rRNA level. The heatmap showed the bacterial profiles derived from rRNA and rDNA were partly overlapping. However, the principal component analysis of all the profiles derived from rDNA showed sharper differences between the different mosses than that of rRNA-based profiles. This suggests that the metabolically active bacterial compositions in different mosses were more phylogenetically similar and the differences of the bacteria associated with different mosses were mainly detected at the rDNA level. Obtained results clearly demonstrate that combination of 16S rDNA and 16S rRNA sequencing is preferred approach to have a good understanding on the constitution of the microbial communities in mosses. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  20. Flexible taxonomic assignment of ambiguous sequencing reads

    PubMed Central

    2011-01-01

    Background To characterize the diversity of bacterial populations in metagenomic studies, sequencing reads need to be accurately assigned to taxonomic units in a given reference taxonomy. Reads that cannot be reliably assigned to a unique leaf in the taxonomy (ambiguous reads) are typically assigned to the lowest common ancestor of the set of species that match it. This introduces a potentially severe error in the estimation of bacteria present in the sample due to false positives, since all species in the subtree rooted at the ancestor are implicitly assigned to the read even though many of them may not match it. Results We present a method that maps each read to a node in the taxonomy that minimizes a penalty score while balancing the relevance of precision and recall in the assignment through a parameter q. This mapping can be obtained in time linear in the number of matching sequences, because LCA queries to the reference taxonomy take constant time. When applied to six different metagenomic datasets, our algorithm produces different taxonomic distributions depending on whether coverage or precision is maximized. Including information on the quality of the reads reduces the number of unassigned reads but increases the number of ambiguous reads, stressing the relevance of our method. Finally, two measures of performance are described and results with a set of artificially generated datasets are discussed. Conclusions The assignment strategy of sequencing reads introduced in this paper is a versatile and quick method to study bacterial communities. The bacterial composition of the analyzed samples can vary significantly depending on how ambiguous reads are assigned, which is controlled by the value of the q parameter. Validation of our results in an artificial dataset confirms that a combination of values of q produces the most accurate results. PMID:21211059
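
    The trade-off described in this abstract can be illustrated with a small sketch. The abstract does not state the penalty formula, so the linear penalty below (q times the number of missed matching species plus (1 - q) times the number of non-matching species swept into the subtree) is an assumption, as are the toy taxonomy and the function names. It is not the authors' implementation and does not reproduce their linear-time algorithm.

```python
from collections import defaultdict

def ancestors(node, parent):
    """Return the path from node up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def leaves_under(parent):
    """Map every node to the set of leaves contained in its subtree."""
    internal = set(parent.values())
    under = defaultdict(set)
    for leaf in (n for n in parent if n not in internal):
        for node in ancestors(leaf, parent):
            under[node].add(leaf)
    return under

def assign_read(hits, parent, q):
    """Pick the taxonomy node with the smallest penalty for one read.

    hits   : leaf taxa (species) that the read matches
    parent : child -> parent mapping describing the taxonomy
    q      : weight in [0, 1]; high q punishes missed hits (favours recall),
             low q punishes false positives (favours precision)
    The penalty is an assumed stand-in, not the published score.
    """
    hits = set(hits)
    under = leaves_under(parent)
    candidates = {n for h in hits for n in ancestors(h, parent)}

    def penalty(node):
        false_pos = len(under[node] - hits)   # species implied but not matched
        false_neg = len(hits - under[node])   # matched species left out
        return q * false_neg + (1 - q) * false_pos

    # Ties go to the smaller subtree, then to the name, for determinism.
    return min(candidates, key=lambda n: (penalty(n), len(under[n]), n))

# Hypothetical toy taxonomy: one family, two genera, four species.
toy_parent = {
    "E. coli": "Escherichia", "E. fergusonii": "Escherichia",
    "S. enterica": "Salmonella", "S. bongori": "Salmonella",
    "Escherichia": "Enterobacteriaceae", "Salmonella": "Enterobacteriaceae",
}
toy_hits = ["E. coli", "E. fergusonii", "S. enterica"]

print(assign_read(toy_hits, toy_parent, q=0.8))
print(assign_read(toy_hits, toy_parent, q=0.2))
```

    With a recall-oriented q the read is placed at the lowest common ancestor of all hits, while a precision-oriented q places it at the genus that covers most hits without adding unmatched species.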

  1. Routine HLA-B genotyping with PCR-sequence-specific oligonucleotides detects a B*52 variant (B*5206).

    PubMed

    Hoelsch, K; Lenggeler, I; Pfannes, W; Knabe, H; Klein, H-G; Woelpl, A

    2005-05-01

    A new human leukocyte antigen (HLA)-B allele was found during routine typing of samples for a German unrelated bone marrow donor registry, the "Aktion Knochenmarkspende Bayern". After first interpretation of data from two independent low-resolution sequence-specific oligonucleotide typing tests, a B*51 variant was suggested. Further analysis via sequence-based typing identified the sequence as a new B*52 allele. This new allele, officially assigned as B*5206, differs from HLA-B*520102 by one nucleotide exchange in exon 2. The mutation is located at nucleotide position 274, at which a cytosine is substituted by a thymine, leading to an amino acid change at protein position 67 from serine (TCC) to phenylalanine (TTC).

  2. Canis mtDNA HV1 database: a web-based tool for collecting and surveying Canis mtDNA HV1 haplotype in public database.

    PubMed

    Thai, Quan Ke; Chung, Dung Anh; Tran, Hoang-Dung

    2017-06-26

    Canine and wolf mitochondrial DNA haplotypes, which can be used for forensic or phylogenetic analyses, have been defined in various schemes depending on the region analyzed. In recent studies, the 582 bp fragment of the HV1 region is most commonly used. 317 different canine HV1 haplotypes have been reported in the rapidly growing public database GenBank. These reported haplotypes contain several inconsistencies in their haplotype information. To overcome this issue, we have developed a Canis mtDNA HV1 database. This database collects data on the HV1 582 bp region in dog mitochondrial DNA from GenBank to screen and correct the inconsistencies. It also supports users in the detection of novel mutation profiles and the assignment of new haplotypes. The Canis mtDNA HV1 database (CHD) contains 5567 nucleotide entries originating from 15 subspecies in the species Canis lupus. Of these entries, 3646 were haplotypes and grouped into 804 distinct sequences. 319 sequences were recognized as previously assigned haplotypes, while the remaining 485 sequences had new mutation profiles and were marked as new haplotype candidates awaiting further analysis for haplotype assignment. Of the 3646 nucleotide entries, only 414 were annotated with correct haplotype information, while 3232 had insufficient or lacked haplotype information and were corrected or modified before storing in the CHD. The CHD can be accessed at http://chd.vnbiology.com. It provides sequences, haplotype information, and a web-based tool for mtDNA HV1 haplotyping. The CHD is updated monthly and supplies all data for download. The Canis mtDNA HV1 database contains information about canine mitochondrial DNA HV1 sequences with reconciled annotation. It serves as a tool for detection of inconsistencies in GenBank and helps identify new HV1 haplotypes. Thus, it supports the scientific community in naming new HV1 haplotypes and reconciling existing annotation of HV1 582 bp sequences.

  3. Illumina MiSeq Sequencing for Preliminary Analysis of Microbiome Causing Primary Endodontic Infections in Egypt

    PubMed Central

    Azab, Marwa Mohamed; Fayyad, Dalia Mukhtar

    2018-01-01

    The use of high-throughput next-generation technologies has allowed more comprehensive analysis than traditional Sanger sequencing. The specific aim of this study was to investigate the microbial diversity of primary endodontic infections in Egyptian patients using the Illumina MiSeq sequencing platform. Samples were collected from 19 patients in Suez Canal University Hospital (Endodontic Department) using sterile #15 K-files and paper points. DNA was extracted using the MO BIO PowerSoil DNA Isolation Kit followed by PCR amplification and agarose gel electrophoresis. The microbiome was characterized on the basis of the V3 and V4 hypervariable regions of the 16S rRNA gene by using paired-end sequencing on an Illumina MiSeq device. MOTHUR software was used for sequence filtration and analysis of sequenced data. A total of 1858 operational taxonomic units at 97% similarity were assigned to 26 phyla, 245 families, and 705 genera. Four main phyla, Firmicutes, Bacteroidetes, Proteobacteria, and Synergistetes, were predominant in all samples. At genus level, Prevotella, Bacillus, Porphyromonas, Streptococcus, and Bacteroides were the most abundant. Illumina MiSeq platform sequencing can be used to investigate oral microbiome composition of endodontic infections. Elucidating the ecology of endodontic infections is a necessary step in developing effective intracanal antimicrobials. PMID:29849646

  4. VDJServer: A Cloud-Based Analysis Portal and Data Commons for Immune Repertoire Sequences and Rearrangements.

    PubMed

    Christley, Scott; Scarborough, Walter; Salinas, Eddie; Rounds, William H; Toby, Inimary T; Fonner, John M; Levin, Mikhail K; Kim, Min; Mock, Stephen A; Jordan, Christopher; Ostmeyer, Jared; Buntzman, Adam; Rubelt, Florian; Davila, Marco L; Monson, Nancy L; Scheuermann, Richard H; Cowell, Lindsay G

    2018-01-01

    Recent technological advances in immune repertoire sequencing have created tremendous potential for advancing our understanding of adaptive immune response dynamics in various states of health and disease. Immune repertoire sequencing produces large, highly complex data sets, however, which require specialized methods and software tools for their effective analysis and interpretation. VDJServer is a cloud-based analysis portal for immune repertoire sequence data that provides access to a suite of tools for a complete analysis workflow, including modules for preprocessing and quality control of sequence reads, V(D)J gene segment assignment, repertoire characterization, and repertoire comparison. VDJServer also provides sophisticated visualizations for exploratory analysis. It is accessible through a standard web browser via a graphical user interface designed for use by immunologists, clinicians, and bioinformatics researchers. VDJServer provides a data commons for public sharing of repertoire sequencing data, as well as private sharing of data between users. We describe the main functionality and architecture of VDJServer and demonstrate its capabilities with use cases from cancer immunology and autoimmunity. VDJServer provides a complete analysis suite for human and mouse T-cell and B-cell receptor repertoire sequencing data. The combination of its user-friendly interface and high-performance computing allows large immune repertoire sequencing projects to be analyzed with no programming or software installation required. VDJServer is a web-accessible cloud platform that provides access through a graphical user interface to a data management infrastructure, a collection of analysis tools covering all steps in an analysis, and an infrastructure for sharing data along with workflows, results, and computational provenance. VDJServer is a free, publicly available, and open-source licensed resource.

  5. Systematic Internal Transcribed Spacer Sequence Analysis for Identification of Clinical Mold Isolates in Diagnostic Mycology: a 5-Year Study

    PubMed Central

    Ciardo, Diana E.; Lucke, Katja; Imhof, Alex; Bloemberg, Guido V.; Böttger, Erik C.

    2010-01-01

    The implementation of internal transcribed spacer (ITS) sequencing for routine identification of molds in the diagnostic mycology laboratory was analyzed in a 5-year study. All mold isolates (n = 6,900) recovered in our laboratory from 2005 to 2009 were included in this study. According to a defined work flow, which in addition to troublesome phenotypic identification takes clinical relevance into account, 233 isolates were subjected to ITS sequence analysis. Sequencing resulted in successful identification for 78.6% of the analyzed isolates (57.1% at species level, 21.5% at genus level). In comparison, extended in-depth phenotypic characterization of the isolates subjected to sequencing achieved taxonomic assignment for 47.6% of these, with a mere 13.3% at species level. Optimization of DNA extraction further improved the efficacy of molecular identification. This study is the first of its kind to testify to the systematic implementation of sequence-based identification procedures in the routine workup of mold isolates in the diagnostic mycology laboratory. PMID:20573873

  6. Investigation of the protein osteocalcin of Camelops hesternus: Sequence, structure and phylogenetic implications

    NASA Astrophysics Data System (ADS)

    Humpula, James F.; Ostrom, Peggy H.; Gandhi, Hasand; Strahler, John R.; Walker, Angela K.; Stafford, Thomas W.; Smith, James J.; Voorhies, Michael R.; George Corner, R.; Andrews, Phillip C.

    2007-12-01

    Ancient DNA sequences offer an extraordinary opportunity to unravel the evolutionary history of ancient organisms. Protein sequences offer another reservoir of genetic information that has recently become tractable through the application of mass spectrometric techniques. The extent to which ancient protein sequences resolve phylogenetic relationships, however, has not been explored. We determined the osteocalcin amino acid sequence from the bone of an extinct Camelid (21 ka, Camelops hesternus) excavated from Isleta Cave, New Mexico and three bones of extant camelids: bactrian camel (Camelus bactrianus); dromedary camel (Camelus dromedarius) and guanaco (Llama guanacoe) for a diagenetic and phylogenetic assessment. There was no difference in sequence among the four taxa. Structural attributes observed in both modern and ancient osteocalcin include a post-translation modification, Hyp 9, deamidation of Gln 35 and Gln 39, and oxidation of Met 36. Carbamylation of the N-terminus in ancient osteocalcin may result in blockage and explain previous difficulties in sequencing ancient proteins via Edman degradation. A phylogenetic analysis using osteocalcin sequences of 25 vertebrate taxa was conducted to explore osteocalcin protein evolution and the utility of osteocalcin sequences for delineating phylogenetic relationships. The maximum likelihood tree closely reflected generally recognized taxonomic relationships. For example, maximum likelihood analysis recovered rodents, birds and, within hominins, the Homo-Pan-Gorilla trichotomy. Within Artiodactyla, character state analysis showed that a substitution of Pro 4 for His 4 defines the Capra-Ovis clade within Artiodactyla. Homoplasy in our analysis indicated that osteocalcin evolution is not a perfect indicator of species evolution. Limited sequence availability prevented assigning functional significance to sequence changes. Our preliminary analysis of osteocalcin evolution represents an initial step towards a complete character analysis aimed at determining the evolutionary history of this functionally significant protein. We emphasize that ancient protein sequencing and phylogenetic analyses using amino acid sequences must pay close attention to post-translational modifications, amino acid substitutions due to diagenetic alteration and the impacts of isobaric amino acids on mass shifts and sequence alignments.

  7. Proteomics analysis of "Rovabio Excel", a secreted protein cocktail from the filamentous fungus Penicillium funiculosum grown under industrial process fermentation.

    PubMed

    Guais, Olivier; Borderies, Gisèle; Pichereaux, Carole; Maestracci, Marc; Neugnot, Virginie; Rossignol, Michel; François, Jean Marie

    2008-12-01

    MS/MS techniques are now well suited to proteomic analysis, even for non-sequenced organisms, since peptide sequences obtained by these methods can be matched with those found in databases from closely related sequenced organisms. We used this approach to characterize the protein content of "Rovabio Excel", an enzymatic cocktail produced by Penicillium funiculosum that is used as a feed additive in animal nutrition. Protein separation by bi-dimensional electrophoresis yielded more than 100 spots, from which 37 proteins were unambiguously assigned from peptide sequences. By one-dimensional SDS-gel electrophoresis, 34 proteins were identified, among which 8 were not found in the 2-DE analysis. A third method, termed 'peptidic shotgun', which consists of direct treatment of the cocktail with trypsin followed by separation of the peptides by two-dimensional liquid chromatography, resulted in the identification of two additional proteins not found by the two other methods. Altogether, more than 50 proteins, among which several glycosylhydrolytic, hemicellulolytic and proteolytic enzymes, were identified in this enzymatic cocktail by combining three separation methods. This work confirmed the power of proteome analysis to explore the genome expression of a non-sequenced fungus by taking advantage of sequences from phylogenetically related filamentous fungi, and paves the way for further functional analysis of P. funiculosum.

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fox, J.W.; Elzinga, M.; Tu, A.T.

    The primary structure of myotoxin a, a myotoxin protein from the venom of the North American rattlesnake Crotalus viridis viridis, was determined and the position of the disulfide bonds assigned. The toxin was isolated, carboxymethylated, and cleaved by cyanogen bromide, and the resultant peptides were isolated. The cyanogen bromide peptides were subjected to amino acid sequence analysis. In order to assign the positions of the three disulfide bonds, the native toxin was cleaved sequentially with cyanogen bromide and trypsin. A two-peptide unit connected by one disulfide bond was isolated and characterized, and a three-peptide unit connected by two disulfide bonds was isolated. One peptide in the three-peptide unit was identified as Cys-Cys-Lys. In order to establish the linkages between the peptides and Cys-Cys-Lys, one cycle of Edman degradation was carried out such that the Cys-Cys bond was cleaved. Upon isolation and analysis of the cleavage products, the disulfide bonds connecting the three peptides were determined. The positions of the disulfide bridges of myotoxin a were determined to be totally different from those of neurotoxins isolated from snake venoms. The sequence of myotoxin a was compared with the sequences of other snake venom toxins using the computer program RELATE to determine whether myotoxin a is similar to any other types of toxins. From the computer analysis, myotoxin a did not show any close relationship to other toxins except crotamine from the South American rattlesnake Crotalus durissus terrificus.

  9. Transcriptome Analysis and Discovery of Genes Involved in Immune Pathways from Hepatopancreas of Microbial Challenged Mitten Crab Eriocheir sinensis

    PubMed Central

    Li, Xihong; Cui, Zhaoxia; Liu, Yuan; Song, Chengwen; Shi, Guohui

    2013-01-01

    Background The Chinese mitten crab Eriocheir sinensis is an economically important crustacean that is seriously affected by various diseases, creating a need for more information on immune-relevant genes at the genome level. Recently, high-throughput RNA sequencing (RNA-seq) technology has provided a powerful and efficient method for transcript analysis and immune gene discovery. Methods/Principal Findings A cDNA library from hepatopancreas of E. sinensis challenged by a mixture of three pathogen strains (Gram-positive bacteria Micrococcus luteus, Gram-negative bacteria Vibrio alginolyticus and fungi Pichia pastoris; 10⁸ cfu·mL⁻¹) was constructed and randomly sequenced using the Illumina technique. In total, 39.76 million clean reads were assembled into 70,300 unigenes. After ruling out short-length and low-quality sequences, 52,074 non-redundant unigenes were compared to public databases for homology searching and 17,617 of them showed high similarity to sequences in the NCBI non-redundant protein (Nr) database. For function classification and pathway assignment, 18,734 (36.00%) unigenes were categorized to three Gene Ontology (GO) categories, 12,243 (23.51%) were classified to 25 Clusters of Orthologous Groups (COG), and 8,983 (17.25%) were assigned to six Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Potentially, 24, 14, 47 and 132 unigenes were characterized to be involved in Toll, IMD, JAK-STAT and MAPK pathways, respectively. Conclusions/Significance This is the first systematic transcriptome analysis of components relating to innate immune pathways in E. sinensis. Functional genes and putative pathways identified here will contribute to a better understanding of the immune system and to the prevention of various diseases in crab. PMID:23874555

  10. Profiling Nematode Communities in Unmanaged Flowerbed and Agricultural Field Soils in Japan by DNA Barcode Sequencing

    PubMed Central

    Morise, Hisashi; Miyazaki, Erika; Yoshimitsu, Shoko; Eki, Toshihiko

    2012-01-01

    Soil nematodes play crucial roles in the soil food web and are a suitable indicator for assessing soil environments and ecosystems. Previous nematode community analyses based on nematode morphology classification have been shown to be useful for assessing various soil environments. Here we have conducted DNA barcode analysis for soil nematode community analyses in Japanese soils. We isolated nematodes from two different environmental soils of an unmanaged flowerbed and an agricultural field using the improved flotation-sieving method. Small subunit (SSU) rDNA fragments were directly amplified from each of 68 (flowerbed samples) and 48 (field samples) isolated nematodes to determine the nucleotide sequence. Sixteen and thirteen operational taxonomic units (OTUs) were obtained by multiple sequence alignment from the flowerbed and agricultural field nematodes, respectively. All 29 SSU rDNA-derived OTUs (rOTUs) were further mapped onto a phylogenetic tree with 107 known nematode species. Interestingly, the two nematode communities examined were clearly distinct from each other in terms of trophic groups: animal predators and plant feeders were markedly abundant in the flowerbed soils, whereas bacterial feeders were dominant in the agricultural field soils. The data from the flowerbed nematodes suggest a possible food web among two different trophic nematode groups and plants (weeds) in the closed soil environment. Finally, DNA sequences derived from the mitochondrial cytochrome c oxidase subunit 1 (COI) gene were determined as a DNA barcode from 43 agricultural field soil nematodes. These nematodes were assigned to 13 rDNA-derived OTUs, but in the COI gene analysis were assigned to 23 COI gene-derived OTUs (cOTUs), indicating that COI gene-based barcoding may provide higher taxonomic resolution than conventional SSU rDNA-barcoding in soil nematode community analysis. PMID:23284767

  11. A Teaching-Learning Sequence about Weather Map Reading

    ERIC Educational Resources Information Center

    Mandrikas, Achilleas; Stavrou, Dimitrios; Skordoulis, Constantine

    2017-01-01

    In this paper a teaching-learning sequence (TLS) introducing pre-service elementary teachers (PET) to weather map reading, with emphasis on wind assignment, is presented. The TLS includes activities about recognition of wind symbols, assignment of wind direction and wind speed on a weather map and identification of wind characteristics in a…

  12. Diversity and community composition of methanogenic archaea in the rumen of Scottish upland sheep assessed by different methods.

    PubMed

    Snelling, Timothy J; Genç, Buğra; McKain, Nest; Watson, Mick; Waters, Sinéad M; Creevey, Christopher J; Wallace, R John

    2014-01-01

    Ruminal archaeomes of two mature sheep grazing in the Scottish uplands were analysed by different sequencing and analysis methods in order to compare the apparent archaeal communities. All methods revealed that the majority of methanogens belonged to the Methanobacteriales order containing the Methanobrevibacter, Methanosphaera and Methanobacteria genera. Sanger sequenced 1.3 kb 16S rRNA gene amplicons identified the main species of Methanobrevibacter present to be a SGMT Clade member Mbb. millerae (≥ 91% of OTUs); Methanosphaera comprised the remainder of the OTUs. The primers did not amplify ruminal Thermoplasmatales-related 16S rRNA genes. Illumina sequenced V6-V8 16S rRNA gene amplicons identified similar Methanobrevibacter spp. and Methanosphaera clades and also identified the Thermoplasmatales-related order as 13% of total archaea. Unusually, both methods concluded that Mbb. ruminantium and relatives from the same clade (RO) were almost absent. Sequences mapping to rumen 16S rRNA and mcrA gene references were extracted from Illumina metagenome data. Mapping of the metagenome data to 16S rRNA gene references produced taxonomic identification to Order level including 2-3% Thermoplasmatales, but was unable to discriminate to species level. Mapping of the metagenome data to mcrA gene references resolved 69% to unclassified Methanobacteriales. Only 30% of sequences were assigned to species level clades: of the sequences assigned to Methanobrevibacter, most mapped to SGMT (16%) and RO (10%) clades. The Sanger 16S amplicon and Illumina metagenome mcrA analyses showed similar species richness (Chao1 Index 19-35), while Illumina metagenome and amplicon 16S rRNA analysis gave lower richness estimates (10-18). The values of the Shannon Index were low in all methods, indicating low richness and uneven species distribution. Thus, although much information may be extracted from the other methods, Illumina amplicon sequencing of the V6-V8 16S rRNA gene would be the method of choice for studying rumen archaeal communities.
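
    The two diversity measures named in this abstract can be computed from a vector of per-OTU read counts. The sketch below uses the natural-log Shannon index and the bias-corrected form of Chao1. The abstract does not say which variants or software the authors used, so both choices, and the toy counts, are assumptions.

```python
import math

def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one OTU must have a non-zero count")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of OTUs seen exactly once and exactly twice.
    This form stays defined when there are no doubletons (F2 == 0).
    """
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))

# Hypothetical OTU table: two dominant OTUs and a tail of rare ones.
toy_counts = [50, 30, 10, 5, 2, 2, 1, 1, 1]

print("Shannon:", round(shannon_index(toy_counts), 3))
print("Chao1:", chao1(toy_counts))
```

    A community dominated by a few OTUs gives a Shannon value well below the maximum ln(S) reached when all S OTUs are equally abundant, which is the sense in which low values indicate low richness and uneven distribution.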

  13. EFICAz2: enzyme function inference by a combined approach enhanced by machine learning.

    PubMed

    Arakaki, Adrian K; Huang, Ying; Skolnick, Jeffrey

    2009-04-13

    We previously developed EFICAz, an enzyme function inference approach that combines predictions from non-completely overlapping component methods. Two of the four components in the original EFICAz are based on the detection of functionally discriminating residues (FDRs). FDRs distinguish between members of an enzyme family that are homofunctional (classified under the EC number of interest) or heterofunctional (annotated with another EC number or lacking enzymatic activity). Each of the two FDR-based components is associated with one of two specific kinds of enzyme families. EFICAz exhibits high precision performance, except when the maximal test to training sequence identity (MTTSI) is lower than 30%. To improve EFICAz's performance in this regime, we: i) increased the number of predictive components and ii) took advantage of consensual information from the different components to make the final EC number assignment. We have developed two new EFICAz components, analogous to the two FDR-based components, where the discrimination between homo and heterofunctional members is based on the evaluation, via Support Vector Machine models, of all the aligned positions between the query sequence and the multiple sequence alignments associated with the enzyme families. Benchmark results indicate that: i) the new SVM-based components outperform their FDR-based counterparts, and ii) both SVM-based and FDR-based components generate unique predictions. We developed classification tree models to optimally combine the results from the six EFICAz components into a final EC number prediction. The new implementation of our approach, EFICAz2, exhibits a highly improved prediction precision at MTTSI < 30% compared to the original EFICAz, with only a slight decrease in prediction recall. A comparative analysis of enzyme function annotation of the human proteome by EFICAz2 and KEGG shows that: i) when both sources make EC number assignments for the same protein sequence, the assignments tend to be consistent and ii) EFICAz2 generates considerably more unique assignments than KEGG. Performance benchmarks and the comparison with KEGG demonstrate that EFICAz2 is a powerful and precise tool for enzyme function annotation, with multiple applications in genome analysis and metabolic pathway reconstruction. The EFICAz2 web service is available at: http://cssb.biology.gatech.edu/skolnick/webservice/EFICAz2/index.html.

  14. Problems of classification in the family Paramyxoviridae.

    PubMed

    Rima, Bert; Collins, Peter; Easton, Andrew; Fouchier, Ron; Kurath, Gael; Lamb, Robert A; Lee, Benhur; Maisner, Andrea; Rota, Paul; Wang, Lin-Fa

    2018-05-01

    A number of unassigned viruses in the family Paramyxoviridae need to be classified either as a new genus or placed into one of the seven genera currently recognized in this family. Furthermore, numerous new paramyxoviruses continue to be discovered. However, attempts at classification have highlighted the difficulties that arise from applying historic criteria or criteria based on sequence alone to the classification of the viruses in this family. While the recent taxonomic change that elevated the previous subfamily Pneumovirinae into a separate family Pneumoviridae is readily justified on the basis of RNA-dependent RNA polymerase (RdRp or L protein) sequence motifs, using RdRp sequence comparisons for assignment to lower level taxa raises problems that would require an overhaul of the current criteria for assignment into genera in the family Paramyxoviridae. Arbitrary cut-off points to delineate genera and species would have to be set if classification was based on the amino acid sequence of the RdRp alone or on pairwise analysis of sequence complementarity (PASC) of all open reading frames (ORFs). While these cut-offs cannot be made consistent with the current classification in this family, resorting to genus-level demarcation criteria with additional input from the biological context may afford a way forward. Such criteria would reflect the increasingly dynamic nature of virus taxonomy even if it would require a complete revision of the current classification.

  15. Proposals for the classification of human rhinovirus species A, B and C into genotypically assigned types

    PubMed Central

    McIntyre, Chloe L.; Knowles, Nick J.

    2013-01-01

    Human rhinoviruses (HRVs) frequently cause mild upper respiratory tract infections and more severe disease manifestations such as bronchiolitis and asthma exacerbations. HRV is classified into three species within the genus Enterovirus of the family Picornaviridae. HRV species A and B contain 75 and 25 serotypes identified by cross-neutralization assays, although the use of such assays for routine HRV typing is hampered by the large number of serotypes, replacement of virus isolation by molecular methods in HRV diagnosis and the poor or absent replication of HRV species C in cell culture. To address these problems, we propose an alternative, genotypic classification of HRV based on genetic relatedness, analogous to that used for enteroviruses. Nucleotide distances between 384 complete VP1 sequences of currently assigned HRV (sero)types identified divergence thresholds of 13, 12 and 13% for species A, B and C, respectively, that divided inter- and intra-type comparisons. These were paralleled by 10, 9.5 and 10% thresholds in the larger dataset of >3800 VP4 region sequences. Assignments based on VP1 sequences led to minor revisions of existing type designations (such as the reclassification of serotype pairs, e.g. A8/A95 and A29/A44, as single serotypes) and the designation of new HRV types A101–106, B101–103 and C34–C51. A protocol for assignment and numbering of new HRV types using VP1 sequences and the restriction of VP4 sequence comparisons to type identification and provisional type assignments is proposed. Genotypic assignment and identification of HRV types will be of considerable value in the future investigation of type-associated differences in disease outcomes, transmission and epidemiology. PMID:23677786
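
    The threshold rule in this abstract can be sketched as follows: a query VP1 sequence is given the type of its nearest reference if the nucleotide distance falls below the species threshold, and is otherwise flagged as a candidate new type. Only the thresholds (13, 12 and 13%) come from the abstract. The use of uncorrected p-distance on pre-aligned sequences, the strict "less than" comparison, the function names and the toy sequences are assumptions for illustration.

```python
# VP1 divergence thresholds from the abstract, as fractions.
VP1_THRESHOLDS = {"A": 0.13, "B": 0.12, "C": 0.13}

def p_distance(seq_a, seq_b):
    """Uncorrected p-distance between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no comparable columns")
    return sum(x != y for x, y in columns) / len(columns)

def assign_type(query, references, species):
    """Return (type_name or None, distance to the nearest reference)."""
    nearest, distance = min(
        ((name, p_distance(query, seq)) for name, seq in references.items()),
        key=lambda item: item[1],
    )
    if distance < VP1_THRESHOLDS[species]:
        return nearest, distance
    return None, distance  # candidate new type under this rule

# Hypothetical 20-nt "references"; real VP1 sequences are far longer.
toy_refs = {
    "ref1": "ATGGCTAGCTAGGATCCATG",
    "ref2": "ATGGCAAGTTAGCATTCATC",
}

print(assign_type("ATGGCTAGCTAGGATCCATA", toy_refs, "A"))
print(assign_type("TTGGCAAGTTCGCATTGATC", toy_refs, "A"))
```

    The first query differs from ref1 at one of 20 positions and is assigned to it. The second is at least 15% divergent from every reference and is therefore left unassigned.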

  16. TaxI: a software tool for DNA barcoding using distance methods

    PubMed Central

    Steinke, Dirk; Vences, Miguel; Salzburger, Walter; Meyer, Axel

    2005-01-01

    DNA barcoding is a promising approach to the diagnosis of biological diversity in which DNA sequences serve as the primary key for information retrieval. Most existing software for evolutionary analysis of DNA sequences was designed for phylogenetic analyses and, hence, those algorithms do not offer appropriate solutions for the rapid, but precise analyses needed for DNA barcoding, and are also unable to process the often large comparative datasets. We developed a flexible software tool for DNA taxonomy, named TaxI. This program calculates sequence divergences between a query sequence (taxon to be barcoded) and each sequence of a dataset of reference sequences defined by the user. Because the analysis is based on separate pairwise alignments, this software is also able to work with sequences characterized by multiple insertions and deletions that are difficult to align in large sequence sets (i.e. thousands of sequences) by multiple alignment algorithms because of computational restrictions. Here, we demonstrate the utility of this approach with two datasets of fish larvae and juveniles from Lake Constance and juvenile land snails under different models of sequence evolution. Sets of ribosomal 16S rRNA sequences, characterized by multiple indels, performed as well as or better than cox1 sequence sets in assigning sequences to species, demonstrating the suitability of rRNA genes for DNA barcoding. PMID:16214755
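
    The workflow described here (align the query to each reference separately, then compute a divergence) can be sketched with a plain Needleman-Wunsch global alignment followed by an uncorrected p-distance over aligned, ungapped columns. The abstract does not specify TaxI's alignment algorithm, scoring scheme or distance models, so everything below (scores, function names, toy sequences) is an assumption and not TaxI's implementation.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

def divergence(query, reference):
    """Fraction of mismatches among aligned columns that contain no gap."""
    aligned_q, aligned_r = global_align(query, reference)
    columns = [(x, y) for x, y in zip(aligned_q, aligned_r)
               if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no aligned columns to compare")
    return sum(x != y for x, y in columns) / len(columns)

def rank_references(query, references):
    """Sort references by increasing divergence from the query."""
    scored = [(name, divergence(query, seq)) for name, seq in references.items()]
    return sorted(scored, key=lambda item: item[1])

# Hypothetical reference set; names and sequences are illustrative only.
toy_references = {"ref_identical": "ACGTACGT", "ref_one_mismatch": "ACGTTCGT"}
print(rank_references("ACGTACGT", toy_references))
print(divergence("ACGTACGT", "ACGACGT"))  # one indel, no substitutions
```

    Because each query-reference pair is aligned on its own, an indel in one reference does not disturb the comparison with the others, which is the property the abstract highlights for indel-rich 16S rRNA sequences.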

  17. Accounting for Protein Subcellular Localization: A Compartmental Map of the Rat Liver Proteome*

    PubMed Central

    Jadot, Michel; Boonen, Marielle; Thirion, Jaqueline; Wang, Nan; Xing, Jinchuan; Zhao, Caifeng; Tannous, Abla; Qian, Meiqian; Zheng, Haiyan; Everett, John K.; Moore, Dirk F.; Sleat, David E.; Lobel, Peter

    2017-01-01

    Accurate knowledge of the intracellular location of proteins is important for numerous areas of biomedical research including assessing fidelity of putative protein-protein interactions, modeling cellular processes at a system-wide level and investigating metabolic and disease pathways. Many proteins have not been localized, or have been incompletely localized, partly because most studies do not account for entire subcellular distribution. Thus, proteins are frequently assigned to one organelle whereas a significant fraction may reside elsewhere. As a step toward a comprehensive cellular map, we used subcellular fractionation with classic balance sheet analysis and isobaric labeling/quantitative mass spectrometry to assign locations to >6000 rat liver proteins. We provide quantitative data and error estimates describing the distribution of each protein among the eight major cellular compartments: nucleus, mitochondria, lysosomes, peroxisomes, endoplasmic reticulum, Golgi, plasma membrane and cytosol. Accounting for total intracellular distribution improves quality of organelle assignments and assigns proteins with multiple locations. Protein assignments and supporting data are available online through the Prolocate website (http://prolocate.cabm.rutgers.edu). As an example of the utility of this data set, we have used organelle assignments to help analyze whole exome sequencing data from an infant dying at 6 months of age from a suspected neurodegenerative lysosomal storage disorder of unknown etiology. Sequencing data was prioritized using lists of lysosomal proteins comprising well-established residents of this organelle as well as novel candidates identified in this study. The latter included copper transporter 1, encoded by SLC31A1, which we localized to both the plasma membrane and lysosome. 
The patient harbors two predicted loss of function mutations in SLC31A1, suggesting that this may represent a heretofore undescribed recessive lysosomal storage disease gene. PMID:27923875

  18. Accounting for Protein Subcellular Localization: A Compartmental Map of the Rat Liver Proteome.

    PubMed

    Jadot, Michel; Boonen, Marielle; Thirion, Jaqueline; Wang, Nan; Xing, Jinchuan; Zhao, Caifeng; Tannous, Abla; Qian, Meiqian; Zheng, Haiyan; Everett, John K; Moore, Dirk F; Sleat, David E; Lobel, Peter

    2017-02-01

    Accurate knowledge of the intracellular location of proteins is important for numerous areas of biomedical research including assessing fidelity of putative protein-protein interactions, modeling cellular processes at a system-wide level and investigating metabolic and disease pathways. Many proteins have not been localized, or have been incompletely localized, partly because most studies do not account for entire subcellular distribution. Thus, proteins are frequently assigned to one organelle whereas a significant fraction may reside elsewhere. As a step toward a comprehensive cellular map, we used subcellular fractionation with classic balance sheet analysis and isobaric labeling/quantitative mass spectrometry to assign locations to >6000 rat liver proteins. We provide quantitative data and error estimates describing the distribution of each protein among the eight major cellular compartments: nucleus, mitochondria, lysosomes, peroxisomes, endoplasmic reticulum, Golgi, plasma membrane and cytosol. Accounting for total intracellular distribution improves quality of organelle assignments and assigns proteins with multiple locations. Protein assignments and supporting data are available online through the Prolocate website (http://prolocate.cabm.rutgers.edu). As an example of the utility of this data set, we have used organelle assignments to help analyze whole exome sequencing data from an infant dying at 6 months of age from a suspected neurodegenerative lysosomal storage disorder of unknown etiology. Sequencing data was prioritized using lists of lysosomal proteins comprising well-established residents of this organelle as well as novel candidates identified in this study. The latter included copper transporter 1, encoded by SLC31A1, which we localized to both the plasma membrane and lysosome. 
The patient harbors two predicted loss of function mutations in SLC31A1, suggesting that this may represent a heretofore undescribed recessive lysosomal storage disease gene. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.

  19. MetAMOS: a modular and open source metagenomic assembly and analysis pipeline

    PubMed Central

    2013-01-01

    We describe MetAMOS, an open source and modular metagenomic assembly and analysis pipeline. MetAMOS represents an important step towards fully automated metagenomic analysis, starting with next-generation sequencing reads and producing genomic scaffolds, open-reading frames and taxonomic or functional annotations. MetAMOS can aid in reducing assembly errors, commonly encountered when assembling metagenomic samples, and improves taxonomic assignment accuracy while also reducing computational cost. MetAMOS can be downloaded from: https://github.com/treangen/MetAMOS. PMID:23320958

  20. REPPER—repeats and their periodicities in fibrous proteins

    PubMed Central

    Gruber, Markus; Söding, Johannes; Lupas, Andrei N.

    2005-01-01

    REPPER (REPeats and their PERiodicities) is an integrated server that detects and analyzes regions with short gapless repeats in protein sequences or alignments. It finds periodicities by Fourier Transform (FTwin) and internal similarity analysis (REPwin). FTwin assigns numerical values to amino acids that reflect certain properties, for instance hydrophobicity, and gives information on corresponding periodicities. REPwin uses self-alignments and displays repeats that reveal significant internal similarities. Both programs use a sliding window to ensure that different periodic regions within the same protein are detected independently. FTwin and REPwin are complemented by secondary structure prediction (PSIPRED) and coiled coil prediction (COILS), making the server a versatile analysis tool for sequences of fibrous proteins. REPPER is available at . PMID:15980460
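
    The Fourier step can be sketched as follows. This is an assumption-laden illustration rather than FTwin itself: it uses the Kyte-Doolittle hydropathy scale as the numerical property, scans a fixed grid of candidate periods over the whole sequence, and omits the sliding window mentioned in the abstract.

```python
import cmath
import math

# Kyte-Doolittle hydropathy values; the choice of this scale is an assumption.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def power_at_period(values, period):
    """Squared magnitude of the Fourier sum of mean-centred values at one period."""
    mean = sum(values) / len(values)
    total = sum((v - mean) * cmath.exp(-2j * math.pi * k / period)
                for k, v in enumerate(values))
    return abs(total) ** 2


def dominant_period(sequence, min_period=2.0, max_period=10.0, step=0.1):
    """Return the candidate period (in residues) with the highest power."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    n_steps = int(round((max_period - min_period) / step))
    periods = [min_period + i * step for i in range(n_steps + 1)]
    return max(periods, key=lambda p: power_at_period(values, p))


if __name__ == "__main__":
    # Hypothetical sequence with a hydrophobic residue at every third position.
    print(round(dominant_period("LKK" * 10), 1))
```

    Restricting the same calculation to a window that slides along the sequence would let different periodic regions of one protein be detected independently, as the abstract describes.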

  1. The proteome: structure, function and evolution

    PubMed Central

    Fleming, Keiran; Kelley, Lawrence A; Islam, Suhail A; MacCallum, Robert M; Muller, Arne; Pazos, Florencio; Sternberg, Michael J.E

    2006-01-01

    This paper reports two studies to model the inter-relationships between protein sequence, structure and function. First, an automated pipeline to provide a structural annotation of proteomes in the major genomes is described. The results are stored in a database at Imperial College, London (3D-GENOMICS) that can be accessed at www.sbg.bio.ic.ac.uk. Analysis of the assignments to structural superfamilies provides evolutionary insights. 3D-GENOMICS is being integrated with related proteome annotation data at University College London and the European Bioinformatics Institute in a project known as e-protein (http://www.e-protein.org/). The second topic is motivated by the developments in structural genomics projects in which the structure of a protein is determined prior to knowledge of its function. We have developed a new approach PHUNCTIONER that uses the gene ontology (GO) classification to supervise the extraction of the sequence signal responsible for protein function from a structure-based sequence alignment. Using GO we can obtain profiles for a range of specificities described in the ontology. In the region of low sequence similarity (around 15%), our method is more accurate than assignment from the closest structural homologue. The method is also able to identify the specific residues associated with the function of the protein family. PMID:16524832

  2. Pfarao: a web application for protein family analysis customized for cytoskeletal and motor proteins (CyMoBase)

    PubMed Central

    Odronitz, Florian; Kollmar, Martin

    2006-01-01

    Background: Annotation of protein sequences of eukaryotic organisms is crucial for the understanding of their function in the cell. Manual annotation is still by far the most accurate way to correctly predict genes. The classification of protein sequences, their phylogenetic relation and the assignment of function involves information from various sources. This often leads to a collection of heterogeneous data, which is hard to track. Cytoskeletal and motor proteins consist of large and diverse superfamilies comprising up to several dozen members per organism. To date there is no integrated tool available to assist in the manual large-scale comparative genomic analysis of protein families. Description: Pfarao (Protein Family Application for Retrieval, Analysis and Organisation) is a database-driven online working environment for the analysis of manually annotated protein sequences and their relationship. Currently, the system can store and interrelate a wide range of information about protein sequences, species, phylogenetic relations and sequencing projects as well as links to literature and domain predictions. Sequences can be imported from multiple sequence alignments that are generated during the annotation process. A web interface allows users to conveniently browse the database and to compile tabular and graphical summaries of its content. Conclusion: We implemented a protein sequence-centric web application to store, organize, interrelate, and present heterogeneous data that is generated in manual genome annotation and comparative genomics. The application has been developed for the analysis of cytoskeletal and motor proteins (CyMoBase) but can easily be adapted for any protein. PMID:17134497

  3. Bacterial community comparisons by taxonomy-supervised analysis independent of sequence alignment and clustering

    PubMed Central

    Sul, Woo Jun; Cole, James R.; Jesus, Ederson da C.; Wang, Qiong; Farris, Ryan J.; Fish, Jordan A.; Tiedje, James M.

    2011-01-01

    High-throughput sequencing of 16S rRNA genes has increased our understanding of microbial community structure, but now even higher-throughput methods to the Illumina scale allow the creation of much larger datasets with more samples and orders-of-magnitude more sequences that swamp current analytic methods. We developed a method capable of handling these larger datasets on the basis of assignment of sequences into an existing taxonomy using a supervised learning approach (taxonomy-supervised analysis). We compared this method with a commonly used clustering approach based on sequence similarity (taxonomy-unsupervised analysis). We sampled 211 different bacterial communities from various habitats and obtained ∼1.3 million 16S rRNA sequences spanning the V4 hypervariable region by pyrosequencing. Both methodologies gave similar ecological conclusions in that β-diversity measures calculated by using these two types of matrices were significantly correlated to each other, as were the ordination configurations and hierarchical clustering dendrograms. In addition, our taxonomy-supervised analyses were also highly correlated with phylogenetic methods, such as UniFrac. The taxonomy-supervised analysis has the advantages that it is not limited by the exhaustive computation required for the alignment and clustering necessary for the taxonomy-unsupervised analysis, is more tolerant of sequencing errors, and allows comparisons when sequences are from different regions of the 16S rRNA gene. With the tremendous expansion in 16S rRNA data acquisition underway, the taxonomy-supervised approach offers the potential to provide more rapid and extensive community comparisons across habitats and samples. PMID:21873204

  4. DNA Barcode Analysis of Thrips (Thysanoptera) Diversity in Pakistan Reveals Cryptic Species Complexes.

    PubMed

    Iftikhar, Romana; Ashfaq, Muhammad; Rasool, Akhtar; Hebert, Paul D N

    2016-01-01

    Although thrips are globally important crop pests and vectors of viral disease, species identifications are difficult because of their small size and inconspicuous morphological differences. Sequence variation in the mitochondrial COI-5' (DNA barcode) region has proven effective for the identification of species in many groups of insect pests. We analyzed barcode sequence variation among 471 thrips from various plant hosts in north-central Pakistan. The Barcode Index Number (BIN) system assigned these sequences to 55 BINs, while the Automatic Barcode Gap Discovery detected 56 partitions, a count that coincided with the number of monophyletic lineages recognized by Neighbor-Joining analysis and Bayesian inference. Congeneric species showed an average of 19% sequence divergence (range = 5.6% - 27%) at COI, while intraspecific distances averaged 0.6% (range = 0.0% - 7.6%). BIN analysis suggested that all intraspecific divergence >3.0% actually involved a species complex. In fact, sequences for three major pest species (Haplothrips reuteri, Thrips palmi, Thrips tabaci), and one predatory thrips (Aeolothrips intermedius) showed deep intraspecific divergences, providing evidence that each is a cryptic species complex. The study compiles the first barcode reference library for the thrips of Pakistan, and examines global haplotype diversity in four important pest thrips.

  5. Sequencing RNA by a combination of exonuclease digestion and uridine specific chemical cleavage using MALDI-TOF.

    PubMed Central

    Tolson, D A; Nicholson, N H

    1998-01-01

    The determination of DNA sequences by partial exonuclease digestion followed by Matrix-Assisted Laser Desorption Time of Flight Mass Spectrometry (MALDI-TOF) is a well established method. When the same procedure is applied to RNA, difficulties arise due to the small (1 Da) mass difference between the nucleotides U and C, which makes unambiguous assignment difficult using a MALDI-TOF instrument. Here we report our experiences with sequence specific endonucleases and chemical methods followed by MALDI-TOF to resolve these sequence ambiguities. We have found chemical methods superior to endonucleases both in terms of correct specificity and extent of sequence coverage. This methodology can be used in combination with exonuclease digestion to rapidly assign RNA sequences. PMID:9421498

  6. Hepatitis E virus genotype 3 diversity: phylogenetic analysis and presence of subtype 3b in wild boar in Europe.

    PubMed

    Vina-Rodriguez, Ariel; Schlosser, Josephine; Becher, Dietmar; Kaden, Volker; Groschup, Martin H; Eiden, Martin

    2015-05-22

    An increasing number of indigenous cases of hepatitis E caused by genotype 3 viruses (HEV-3) have been diagnosed all around the world, particularly in industrialized countries. Hepatitis E is a zoonotic disease and accumulating evidence indicates that domestic pigs and wild boars are the main reservoirs of HEV-3. A detailed analysis of HEV-3 subtypes could help to determine the interplay of human activity, the role of animals as reservoirs and cross species transmission. Although complete genome sequences are most appropriate for HEV subtype determination, in most cases only partial genomic sequences are available. We therefore carried out a subtype classification analysis, which uses regions from all three open reading frames of the genome. Using this approach, more than 1000 published HEV-3 isolates were subtyped. Newly recovered HEV partial sequences from hunted German wild boars were also included in this study. These sequences were assigned to genotype 3 and clustered within subtype 3a, 3i and, unexpectedly, one of them within the subtype 3b, a first non-human report of this subtype in Europe.

  7. Comparison of dkgB-linked intergenic sequence ribotyping to DNA microarray hybridization for assigning serotype to Salmonella enterica

    PubMed Central

    Guard, Jean; Sanchez-Ingunza, Roxana; Morales, Cesar; Stewart, Tod; Liljebjelke, Karen; Kessel, JoAnn; Ingram, Kim; Jones, Deana; Jackson, Charlene; Fedorka-Cray, Paula; Frye, Jonathan; Gast, Richard; Hinton, Arthur

    2012-01-01

    Two DNA-based methods were compared for the ability to assign serotype to 139 isolates of Salmonella enterica ssp. I. Intergenic sequence ribotyping (ISR) evaluated single nucleotide polymorphisms occurring in a 5S ribosomal gene region and flanking sequences bordering the gene dkgB. A DNA microarray hybridization method that assessed the presence and the absence of sets of genes was the second method. Serotype was assigned for 128 (92.1%) of submissions by the two DNA methods. ISR detected mixtures of serotypes within single colonies and it cost substantially less than Kauffmann–White serotyping and DNA microarray hybridization. Decreasing the cost of serotyping S. enterica while maintaining reliability may encourage routine testing and research. PMID:22998607

  8. Characterisation of the genomes of four putative vesiculoviruses: tench rhabdovirus, grass carp rhabdovirus, perch rhabdovirus and eel rhabdovirus European X.

    PubMed

    Stone, David M; Kerr, Rose C; Hughes, Margaret; Radford, Alan D; Darby, Alistair C

    2013-11-01

    The complete coding sequences were determined for four putative vesiculoviruses isolated from fish. Sequence alignment and phylogenetic analysis based on the predicted amino acid sequences of the five main proteins assigned tench rhabdovirus and grass carp rhabdovirus together with spring viraemia of carp and pike fry rhabdovirus to a lineage that was distinct from the mammalian vesiculoviruses. Perch rhabdovirus, eel virus European X, lake trout rhabdovirus 903/87 and sea trout virus were placed in a second lineage that was also distinct from the recognised genera in the family Rhabdoviridae. Establishment of two new rhabdovirus genera, "Perhabdovirus" and "Sprivivirus", is discussed.

  9. Coordinating Environmental Genomics and Geochemistry Reveals Metabolic Transitions in a Hot Spring Ecosystem

    PubMed Central

    Swingley, Wesley D.; Meyer-Dombard, D’Arcy R.; Shock, Everett L.; Alsop, Eric B.; Falenski, Heinz D.; Havig, Jeff R.; Raymond, Jason

    2012-01-01

    We have constructed a conceptual model of biogeochemical cycles and metabolic and microbial community shifts within a hot spring ecosystem via coordinated analysis of the “Bison Pool” (BP) Environmental Genome and a complementary contextual geochemical dataset of ∼75 geochemical parameters. 2,321 16S rRNA clones and 470 megabases of environmental sequence data were produced from biofilms at five sites along the outflow of BP, an alkaline hot spring in Sentinel Meadow (Lower Geyser Basin) of Yellowstone National Park. This channel acts as a >22 m gradient of decreasing temperature, increasing dissolved oxygen, and changing availability of biologically important chemical species, such as those containing nitrogen and sulfur. Microbial life at BP transitions from a 92°C chemotrophic streamer biofilm community in the BP source pool to a 56°C phototrophic mat community. We improved automated annotation of the BP environmental genomes using BLAST-based Markov clustering. We have also assigned environmental genome sequences to individual microbial community members by complementing traditional homology-based assignment with nucleotide word-usage algorithms, allowing more than 70% of all reads to be assigned to source organisms. This assignment yields high genome coverage in dominant community members, facilitating reconstruction of nearly complete metabolic profiles and in-depth analysis of the relation between geochemical and metabolic changes along the outflow. We show that changes in environmental conditions and energy availability are associated with dramatic shifts in microbial communities and metabolic function. We have also identified an organism constituting a novel phylum in a metabolic “transition” community, located physically between the chemotroph- and phototroph-dominated sites. 
The complementary analysis of biogeochemical and environmental genomic data from BP has allowed us to build ecosystem-based conceptual models for this hot spring, reconstructing whole metabolic networks in order to illuminate community roles in shaping and responding to geochemical variability. PMID:22675512

  10. A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy.

    PubMed

    Gao, Xiang; Lin, Huaiying; Revanna, Kashi; Dong, Qunfeng

    2017-05-10

    Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unreliable. The unreliable results are due to the limitations in the existing methods which either lack solid probabilistic-based criteria to evaluate the confidence of their taxonomic assignments, or use nucleotide k-mer frequency as the proxy for sequence similarity measurement. We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species to the phylum levels based on the lowest common ancestors of multiple database hits for each query sequence, and further classification reliabilities are evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based upon the degree of sequence similarity of the database hit to the query sequence. Our method does not need any training datasets specific for different taxonomic groups. Instead only a reference database is required for aligning to the query sequences, making our method easily applicable for different regions of the 16S rRNA gene or other phylogenetic marker genes. Reliable species-level classification for 16S rRNA or other phylogenetic marker genes is critical for microbiome research. Our software shows significantly higher classification accuracy than the existing tools and we provide probabilistic-based confidence scores to evaluate the reliability of our taxonomic classification assignments based on multiple database matches to query sequences. 
Despite its higher computational costs, our method is still suitable for analyzing large-scale microbiome datasets for practical purposes. Furthermore, our method can be applied for taxonomic classification of any phylogenetic marker gene sequences. Our software, called BLCA, is freely available at https://github.com/qunfengdong/BLCA .
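
    The idea of letting each database hit contribute to the assignment in proportion to its similarity can be illustrated with a deliberately simplified sketch. It is not the BLCA algorithm: the Bayesian posterior weighting and bootstrap confidence scores are replaced here by plain similarity-proportional voting at each rank, and the rank list, support cut-off, function name and example hits are all hypothetical.

```python
from collections import defaultdict

# Simplified stand-in for a full lineage; a real run would use every rank
# from phylum down to species.
RANKS = ["phylum", "genus", "species"]


def weighted_consensus(hits, min_support=0.8):
    """Assign taxonomy rank by rank using similarity-weighted votes.

    hits: list of (similarity, lineage) where lineage maps rank -> taxon name.
    Returns [(rank, taxon, support), ...] down to the lowest rank whose
    best-supported taxon reaches min_support.
    """
    total = sum(similarity for similarity, _ in hits)
    assignment = []
    for rank in RANKS:
        votes = defaultdict(float)
        for similarity, lineage in hits:
            votes[lineage[rank]] += similarity / total
        taxon, support = max(votes.items(), key=lambda item: item[1])
        if support < min_support:
            break  # stop at the last well-supported rank
        assignment.append((rank, taxon, support))
    return assignment


if __name__ == "__main__":
    # Hypothetical hits: percent identity plus the lineage of each database hit.
    hits = [
        (99.0, {"phylum": "Firmicutes", "genus": "Bacillus", "species": "B. subtilis"}),
        (98.0, {"phylum": "Firmicutes", "genus": "Bacillus", "species": "B. subtilis"}),
        (97.0, {"phylum": "Firmicutes", "genus": "Bacillus", "species": "B. licheniformis"}),
    ]
    for rank, taxon, support in weighted_consensus(hits):
        print(rank, taxon, round(support, 3))
```

    In this toy case the hits agree at phylum and genus but split at species, so the assignment stops at genus, which mirrors the lowest-common-ancestor behaviour described in the abstract.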

  11. [Multilocus Sequence Typing analysis of human Campylobacter coli in Granada (Spain)].

    PubMed

    Carrillo-Ávila, J A; Sorlózano-Puerto, A; Pérez-Ruiz, M; Gutiérrez-Fernández, J

    2016-12-01

    Different subtypes of Campylobacter spp. have been associated with diarrhoea and a Multilocus Sequence Typing (MLST) method has been performed for subtyping. In the present work, MLST was used to analyse the genetic diversity of eight strains of Campylobacter coli. Nineteen genetic markers were amplified for MLST analysis: AnsB, DmsA, ggt, Cj1585c, CJJ81176-1367/1371, Tlp7, cj1321-cj1326, fucP, cj0178, cj0755/cfrA, ceuE, pldA, cstII, cstIII. After comparing the obtained sequences with the Campylobacter MLST database, the allele numbers, sequence types (STs) and clonal complexes (CCs) were assigned. The 8 C. coli isolates yielded 4 different STs belonging to 2 CCs. Seven isolates belong to the ST-828 clonal complex and only one isolate belongs to ST-21. Two samples came from the same patient, but were isolated in two different periods of time. MLST can be useful for taxonomic characterization of C. coli isolates.

  12. Cleavage sites within the poliovirus capsid protein precursors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Larsen, G.R.; Anderson, C.W.; Dorner, A.J.

    1982-01-01

    Partial amino-terminal sequence analysis was performed on radiolabeled poliovirus capsid proteins VP1, VP2, and VP3. A computer-assisted comparison of the amino acid sequences obtained with that predicted by the nucleotide sequence of the poliovirus genome allows assignment of the amino terminus of each capsid protein to a unique position within the virus polyprotein. Sequence analysis of trypsin-digested VP4, which has a blocked amino terminus, demonstrates that VP4 is encoded at or very near to the amino terminus of the polyprotein. The gene order of the capsid proteins is VP4-VP2-VP3-VP1. Cleavage of VP0 to VP4 and VP2 is shown to occur between asparagine and serine, whereas the cleavages that separate VP2/VP3 and VP3/VP1 occur between glutamine and glycine residues. This finding supports the hypothesis that the cleavage of VP0, which occurs during virion morphogenesis, is distinct from the cleavages that separate functional regions of the polyprotein.

  13. MToolBox: a highly automated pipeline for heteroplasmy annotation and prioritization analysis of human mitochondrial variants in high-throughput sequencing

    PubMed Central

    Diroma, Maria Angela; Santorsola, Mariangela; Guttà, Cristiano; Gasparre, Giuseppe; Picardi, Ernesto; Pesole, Graziano; Attimonelli, Marcella

    2014-01-01

    Motivation: The increasing availability of mitochondria-targeted and off-target sequencing data in whole-exome and whole-genome sequencing studies (WXS and WGS) has raised the demand for effective pipelines to accurately measure heteroplasmy and to easily recognize the most functionally important mitochondrial variants among a huge number of candidates. For this purpose, we developed MToolBox, a highly automated pipeline to reconstruct and analyze human mitochondrial DNA from high-throughput sequencing data. Results: MToolBox implements an effective computational strategy for mitochondrial genome assembly and haplogroup assignment, also including a prioritization analysis of detected variants. MToolBox provides a Variant Call Format file featuring, for the first time, allele-specific heteroplasmy and annotation files with prioritized variants. MToolBox was tested on simulated samples and applied on 1000 Genomes WXS datasets. Availability and implementation: MToolBox package is available at https://sourceforge.net/projects/mtoolbox/. Contact: marcella.attimonelli@uniba.it Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25028726

  14. Bridge over troubled proline: assignment of intrinsically disordered proteins using (HCA)CON(CAN)H and (HCA)N(CA)CO(N)H experiments concomitantly with HNCO and i(HCA)CO(CA)NH.

    PubMed

    Hellman, Maarit; Piirainen, Henni; Jaakola, Veli-Pekka; Permi, Perttu

    2014-01-01

    NMR spectroscopy is by far the most versatile and information-rich technique to study intrinsically disordered proteins (IDPs). While NMR is able to offer residue-level information on structure and dynamics, assignment of chemical shift resonances in IDPs is not a straightforward process. Consequently, numerous pulse sequences and assignment protocols have been developed during the past several years, targeted especially for the assignment of IDPs, including experiments that employ H(N), H(α) or (13)C detection combined with two to six indirectly detected dimensions. Here we propose two new HN-detection based pulse sequences, (HCA)CON(CAN)H and (HCA)N(CA)CO(N)H, that provide correlations with (1)H(N)(i - 1), (13)C'(i - 1) and (15)N(i), and (1)H(N)(i + 1), (13)C'(i) and (15)N(i) frequencies, respectively. Most importantly, they offer sequential links across the proline bridges and enable filling the single proline gaps during the assignment. We show that the novel experiments can efficiently complement the information available from existing HNCO and intraresidual i(HCA)CO(CA)NH pulse sequences and their concomitant usage enabled >95 % assignment of backbone resonances in the cytoplasmic tail of adenosine receptor A2A in comparison to 73 % complete assignment using the HNCO/i(HCA)CO(CA)NH data alone.

  15. Automated sequence analysis and editing software for HIV drug resistance testing.

    PubMed

    Struck, Daniel; Wallis, Carole L; Denisov, Gennady; Lambert, Christine; Servais, Jean-Yves; Viana, Raquel V; Letsoalo, Esrom; Bronze, Michelle; Aitken, Sue C; Schuurman, Rob; Stevens, Wendy; Schmit, Jean Claude; Rinke de Wit, Tobias; Perez Bercoff, Danielle

    2012-05-01

    Access to antiretroviral treatment in resource-limited settings is inevitably paralleled by the emergence of HIV drug resistance. Monitoring treatment efficacy and HIV drug resistance testing are therefore of increasing importance in resource-limited settings. Yet low-cost technologies and procedures suited to the particular context and constraints of such settings are still lacking. The ART-A (Affordable Resistance Testing for Africa) consortium brought together public and private partners to address this issue. The objective was to develop automated sequence analysis and editing software to support high-throughput automated sequencing. The ART-A Software was designed to automatically process and edit ABI chromatograms or FASTA files from HIV-1 isolates. The ART-A Software performs the basecalling, assigns quality values, aligns query sequences against a set reference, infers a consensus sequence, identifies the HIV type and subtype, translates the nucleotide sequence to amino acids and reports insertions/deletions, premature stop codons, ambiguities and mixed calls. The results can be automatically exported to Excel to identify mutations. Automated analysis was compared to manual analysis using a panel of 1624 PR-RT sequences generated in 3 different laboratories. Discrepancies between manual and automated sequence analysis were 0.69% at the nucleotide level and 0.57% at the amino acid level (668,047 AA analyzed), and discordances at major resistance mutations were recorded in 62 cases (4.83% of differences, 0.04% of all AA) for PR and 171 (6.18% of differences, 0.03% of all AA) cases for RT. The ART-A Software is a time-sparing tool for pre-analyzing HIV and viral quasispecies sequences in high throughput laboratories and highlighting positions requiring attention. Copyright © 2012 Elsevier B.V. All rights reserved.
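
    The translation and flagging step listed above can be sketched as follows. This is not the ART-A Software: it is a minimal illustration that assumes an in-frame input, the standard genetic code, and a simple convention in which any codon containing a non-ACGT character is reported as ambiguous and translated as 'X'. The function name and example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_and_flag(nucleotides):
    """Translate an in-frame sequence and report positions needing attention.

    Returns (protein, premature_stops, ambiguous), where the two lists hold
    1-based codon positions. A stop codon counts as premature unless it is
    the final codon. Trailing bases that do not fill a codon are ignored.
    """
    seq = nucleotides.upper()
    n_codons = len(seq) // 3
    protein = []
    premature_stops = []
    ambiguous = []
    for index in range(n_codons):
        codon = seq[3 * index:3 * index + 3]
        amino_acid = CODON_TABLE.get(codon, "X")  # 'X' marks an unresolved codon
        if amino_acid == "X":
            ambiguous.append(index + 1)
        elif amino_acid == "*" and index < n_codons - 1:
            premature_stops.append(index + 1)
        protein.append(amino_acid)
    return "".join(protein), premature_stops, ambiguous


if __name__ == "__main__":
    # Hypothetical input: 'R' is the IUPAC code for A or G (a mixed call).
    print(translate_and_flag("ATGCCTCARTAAGGG"))
```

    A production tool would typically expand mixed bases into the set of possible amino acids instead of collapsing them to 'X'; the sketch only shows where such positions would be surfaced for review.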

  16. Application of the MIDAS approach for analysis of lysine acetylation sites.

    PubMed

    Evans, Caroline A; Griffiths, John R; Unwin, Richard D; Whetton, Anthony D; Corfe, Bernard M

    2013-01-01

    Multiple Reaction Monitoring Initiated Detection and Sequencing (MIDAS™) is a mass spectrometry-based technique for the detection and characterization of specific post-translational modifications (Unwin et al. 4:1134-1144, 2005), for example acetylated lysine residues (Griffiths et al. 18:1423-1428, 2007). The MIDAS™ technique has application for discovery and analysis of acetylation sites. It is a hypothesis-driven approach that requires a priori knowledge of the primary sequence of the target protein and a proteolytic digest of this protein. MIDAS essentially performs a targeted search for the presence of modified, for example acetylated, peptides. The detection is based on the combination of the predicted molecular weight (measured as mass-charge ratio) of the acetylated proteolytic peptide and a diagnostic fragment (product ion of m/z 126.1), which is generated by specific fragmentation of acetylated peptides during collision induced dissociation performed in tandem mass spectrometry (MS) analysis. Sequence information is subsequently obtained which enables acetylation site assignment. The technique of MIDAS was later trademarked by ABSciex for targeted protein analysis where an MRM scan is combined with full MS/MS product ion scan to enable sequence confirmation.
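
    The targeting logic described above, a predicted precursor m/z paired with the m/z 126.1 diagnostic ion, amounts to simple arithmetic that can be sketched as follows. The product ion value is taken from the abstract; the residue masses are standard monoisotopic values for a small subset of amino acids, and the function names, charge states and example peptide are assumptions made for illustration.

```python
# Monoisotopic residue masses (Da) for a few amino acids only.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}
WATER = 18.01056    # added once per peptide
PROTON = 1.00728    # added once per charge
ACETYL = 42.01057   # mass shift of one acetyl group
ACETYL_LYSINE_PRODUCT_MZ = 126.1  # diagnostic product ion cited in the abstract


def precursor_mz(peptide, charge, n_acetyl=0):
    """Predicted m/z of a peptide carrying n_acetyl acetylated lysines."""
    if n_acetyl > peptide.count("K"):
        raise ValueError("more acetyl groups than lysine residues")
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + n_acetyl * ACETYL
    return (mass + charge * PROTON) / charge


def midas_transitions(peptide, charges=(2, 3)):
    """(precursor m/z, product m/z) pairs for the singly acetylated peptide."""
    return [(round(precursor_mz(peptide, z, n_acetyl=1), 4), ACETYL_LYSINE_PRODUCT_MZ)
            for z in charges]


if __name__ == "__main__":
    for transition in midas_transitions("GAK"):  # hypothetical toy peptide
        print(transition)
```

    In practice the peptide list would come from an in-silico proteolytic digest of the target protein, which is why the approach needs the primary sequence in advance.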

  17. Tissue-Specific Transcriptome Profiling of Plutella Xylostella Third Instar Larval Midgut

    PubMed Central

    Xie, Wen; Lei, Yanyuan; Fu, Wei; Yang, Zhongxia; Zhu, Xun; Guo, Zhaojiang; Wu, Qingjun; Wang, Shaoli; Xu, Baoyun; Zhou, Xuguo; Zhang, Youjun

    2012-01-01

    The larval midgut of diamondback moth, Plutella xylostella, is a dynamic tissue that interfaces with a diverse array of physiological and toxicological processes, including nutrient digestion and allocation, xenobiotic detoxification, innate and adaptive immune response, and pathogen defense. Despite its enormous agricultural importance, the genomic resources for P. xylostella are surprisingly scarce. In this study, a Bt resistant P. xylostella strain was subjected to the in-depth transcriptome analysis to identify genes and gene networks putatively involved in various physiological and toxicological processes in the P. xylostella larval midgut. Using Illumina deep sequencing, we obtained roughly 40 million reads containing approximately 3.6 gigabases of sequence data. De novo assembly generated 63,312 ESTs with an average read length of 416 bp, and approximately half of the P. xylostella sequences (45.4%, 28,768) showed similarity to the non-redundant database in GenBank with a cut-off E-value below 10^-5. Among them, 11,092 unigenes were assigned to one or multiple GO terms and 16,732 unigenes were assigned to 226 specific pathways. In-depth analysis identified genes putatively involved in insecticide resistance, nutrient digestion, and innate immune defense. Besides conventional detoxification enzymes and insecticide targets, novel genes, including 28 chymotrypsins and 53 ABC transporters, have been uncovered in the P. xylostella larval midgut transcriptome; which are potentially linked to the Bt toxicity and resistance. Furthermore, an unexpectedly high number of ESTs, including 46 serpins and 7 lysozymes, were predicted to be involved in the immune defense. As the first tissue-specific transcriptome analysis of P. xylostella, this study sheds light on the molecular understanding of insecticide resistance, especially Bt resistance in an agriculturally important insect pest, and lays the foundation for future functional genomics research. 
In addition, current sequencing effort greatly enriched the existing P. xylostella EST database, and makes RNAseq a viable option in the future genomic analysis. PMID:23091412

  18. Tissue-specific transcriptome profiling of Plutella xylostella third instar larval midgut.

    PubMed

    Xie, Wen; Lei, Yanyuan; Fu, Wei; Yang, Zhongxia; Zhu, Xun; Guo, Zhaojiang; Wu, Qingjun; Wang, Shaoli; Xu, Baoyun; Zhou, Xuguo; Zhang, Youjun

    2012-01-01

    The larval midgut of diamondback moth, Plutella xylostella, is a dynamic tissue that interfaces with a diverse array of physiological and toxicological processes, including nutrient digestion and allocation, xenobiotic detoxification, innate and adaptive immune response, and pathogen defense. Despite its enormous agricultural importance, the genomic resources for P. xylostella are surprisingly scarce. In this study, a Bt resistant P. xylostella strain was subjected to the in-depth transcriptome analysis to identify genes and gene networks putatively involved in various physiological and toxicological processes in the P. xylostella larval midgut. Using Illumina deep sequencing, we obtained roughly 40 million reads containing approximately 3.6 gigabases of sequence data. De novo assembly generated 63,312 ESTs with an average read length of 416 bp, and approximately half of the P. xylostella sequences (45.4%, 28,768) showed similarity to the non-redundant database in GenBank with a cut-off E-value below 10^-5. Among them, 11,092 unigenes were assigned to one or multiple GO terms and 16,732 unigenes were assigned to 226 specific pathways. In-depth analysis identified genes putatively involved in insecticide resistance, nutrient digestion, and innate immune defense. Besides conventional detoxification enzymes and insecticide targets, novel genes, including 28 chymotrypsins and 53 ABC transporters, have been uncovered in the P. xylostella larval midgut transcriptome; which are potentially linked to the Bt toxicity and resistance. Furthermore, an unexpectedly high number of ESTs, including 46 serpins and 7 lysozymes, were predicted to be involved in the immune defense. As the first tissue-specific transcriptome analysis of P. xylostella, this study sheds light on the molecular understanding of insecticide resistance, especially Bt resistance in an agriculturally important insect pest, and lays the foundation for future functional genomics research. 
In addition, current sequencing effort greatly enriched the existing P. xylostella EST database, and makes RNAseq a viable option in the future genomic analysis.

  19. Using phylogenetically-informed annotation (PIA) to search for light-interacting genes in transcriptomes from non-model organisms.

    PubMed

    Speiser, Daniel I; Pankey, M Sabrina; Zaharoff, Alexander K; Battelle, Barbara A; Bracken-Grissom, Heather D; Breinholt, Jesse W; Bybee, Seth M; Cronin, Thomas W; Garm, Anders; Lindgren, Annie R; Patel, Nipam H; Porter, Megan L; Protas, Meredith E; Rivera, Ajna S; Serb, Jeanne M; Zigler, Kirk S; Crandall, Keith A; Oakley, Todd H

    2014-11-19

    Tools for high throughput sequencing and de novo assembly make the analysis of transcriptomes (i.e. the suite of genes expressed in a tissue) feasible for almost any organism. Yet a challenge for biologists is that it can be difficult to assign identities to gene sequences, especially from non-model organisms. Phylogenetic analyses are one useful method for assigning identities to these sequences, but such methods tend to be time-consuming because of the need to re-calculate trees for every gene of interest and each time a new data set is analyzed. In response, we employed existing tools for phylogenetic analysis to produce a computationally efficient, tree-based approach for annotating transcriptomes or new genomes that we term Phylogenetically-Informed Annotation (PIA), which places uncharacterized genes into pre-calculated phylogenies of gene families. We generated maximum likelihood trees for 109 genes from a Light Interaction Toolkit (LIT), a collection of genes that underlie the function or development of light-interacting structures in metazoans. To do so, we searched protein sequences predicted from 29 fully-sequenced genomes and built trees using tools for phylogenetic analysis in the Osiris package of Galaxy (an open-source workflow management system). Next, to rapidly annotate transcriptomes from organisms that lack sequenced genomes, we repurposed a maximum likelihood-based Evolutionary Placement Algorithm (implemented in RAxML) to place sequences of potential LIT genes on to our pre-calculated gene trees. Finally, we implemented PIA in Galaxy and used it to search for LIT genes in 28 newly-sequenced transcriptomes from the light-interacting tissues of a range of cephalopod mollusks, arthropods, and cubozoan cnidarians. Our new trees for LIT genes are available on the Bitbucket public repository ( http://bitbucket.org/osiris_phylogenetics/pia/ ) and we demonstrate PIA on a publicly-accessible web server ( http://galaxy-dev.cnsi.ucsb.edu/pia/ ). 
Our new trees for LIT genes will be a valuable resource for researchers studying the evolution of eyes or other light-interacting structures. We also introduce PIA, a high throughput method for using phylogenetic relationships to identify LIT genes in transcriptomes from non-model organisms. With simple modifications, our methods may be used to search for different sets of genes or to annotate data sets from taxa outside of Metazoa.

  20. ANCAC: amino acid, nucleotide, and codon analysis of COGs--a tool for sequence bias analysis in microbial orthologs.

    PubMed

    Meiler, Arno; Klinger, Claudia; Kaufmann, Michael

    2012-09-08

    The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC's NUCOCOG dataset as the largest one available for that purpose thus far. Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills.
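
    The quantities ANCAC reports (codon frequencies and GC-content) are simple to compute for a single coding sequence. The following is a minimal Python sketch, not ANCAC's own code; the function names and the example sequence are hypothetical and chosen only for illustration.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons, ignoring any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


# Hypothetical coding sequence: Met-Ala-Ala-Ala-stop.
example_cds = "ATGGCCGCAGCCTAA"
counts = codon_counts(example_cds)
gc = gc_content(example_cds)
```

    Dividing each codon count by the total for its amino acid would give relative codon usage; that step is left out because it needs a genetic-code table.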

  1. ANCAC: amino acid, nucleotide, and codon analysis of COGs – a tool for sequence bias analysis in microbial orthologs

    PubMed Central

    2012-01-01

    Background: The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Results: Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC’s NUCOCOG dataset as the largest one available for that purpose thus far. Conclusions: Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills. PMID:22958836

  2. The Transcriptome Analysis and Comparison Explorer--T-ACE: a platform-independent, graphical tool to process large RNAseq datasets of non-model organisms.

    PubMed

    Philipp, E E R; Kraemer, L; Mountfort, D; Schilhabel, M; Schreiber, S; Rosenstiel, P

    2012-03-15

    Next generation sequencing (NGS) technologies allow a rapid and cost-effective compilation of large RNA sequence datasets in model and non-model organisms. However, the storage and analysis of transcriptome information from different NGS platforms is still a significant bottleneck, leading to a delay in data dissemination and subsequent biological understanding. Especially database interfaces with transcriptome analysis modules going beyond mere read counts are missing. Here, we present the Transcriptome Analysis and Comparison Explorer (T-ACE), a tool designed for the organization and analysis of large sequence datasets, and especially suited for transcriptome projects of non-model organisms with little or no a priori sequence information. T-ACE offers a TCL-based interface, which accesses a PostgreSQL database via a php-script. Within T-ACE, information belonging to single sequences or contigs, such as annotation or read coverage, is linked to the respective sequence and immediately accessible. Sequences and assigned information can be searched via keyword- or BLAST-search. Additionally, T-ACE provides within and between transcriptome analysis modules on the level of expression, GO terms, KEGG pathways and protein domains. Results are visualized and can be easily exported for external analysis. We developed T-ACE for laboratory environments, which have only a limited amount of bioinformatics support, and for collaborative projects in which different partners work on the same dataset from different locations or platforms (Windows/Linux/MacOS). For laboratories with some experience in bioinformatics and programming, the low complexity of the database structure and open-source code provides a framework that can be customized according to the different needs of the user and transcriptome project.

  3. A novel algorithm for validating peptide identification from a shotgun proteomics search engine.

    PubMed

    Jian, Ling; Niu, Xinnan; Xia, Zhonghang; Samir, Parimal; Sumanasekera, Chiranthani; Mu, Zheng; Jennings, Jennifer L; Hoek, Kristen L; Allos, Tara; Howard, Leigh M; Edwards, Kathryn M; Weil, P Anthony; Link, Andrew J

    2013-03-01

    Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) has revolutionized the proteomics analysis of complexes, cells, and tissues. In a typical proteomic analysis, the tandem mass spectra from a LC-MS/MS experiment are assigned to a peptide by a search engine that compares the experimental MS/MS peptide data to theoretical peptide sequences in a protein database. The peptide spectra matches are then used to infer a list of identified proteins in the original sample. However, the search engines often fail to distinguish between correct and incorrect peptides assignments. In this study, we designed and implemented a novel algorithm called De-Noise to reduce the number of incorrect peptide matches and maximize the number of correct peptides at a fixed false discovery rate using a minimal number of scoring outputs from the SEQUEST search engine. The novel algorithm uses a three-step process: data cleaning, data refining through a SVM-based decision function, and a final data refining step based on proteolytic peptide patterns. Using proteomics data generated on different types of mass spectrometers, we optimized the De-Noise algorithm on the basis of the resolution and mass accuracy of the mass spectrometer employed in the LC-MS/MS experiment. Our results demonstrate De-Noise improves peptide identification compared to other methods used to process the peptide sequence matches assigned by SEQUEST. Because De-Noise uses a limited number of scoring attributes, it can be easily implemented with other search engines.

  4. Correcting names of bacteria deposited in National Microbial Repositories: an analysed sequence data necessary for taxonomic re-categorization of misclassified bacteria-ONE example, genus Lysinibacillus.

    PubMed

    Rekadwad, Bhagwan N; Gonzalez, Juan M

    2017-08-01

    A report on 16S rRNA gene sequence re-analysis and digitalization is presented using Lysinibacillus species (one example) deposited in National Microbial Repositories in India. Lysinibacillus species 16S rRNA gene sequences were digitalized to provide quick response (QR) codes, Chaos Game Representation (CGR) and Frequency of Chaos Game Representation (FCGR). GC percentage, phylogenetic analysis, and principal component analysis (PCA) are tools used for the differentiation and reclassification of the strains under investigation. Seven reasons supporting our statement that Lysinibacillus species deposited in National Microbial Repositories are misclassified are given in this paper. Based on these seven reasons, bacteria deposited in National Microbial Repositories, such as Lysinibacillus and many others, need reanalysis to establish their exact identity. Levels of identity with type strains of related species show differences of 2 to 8%, suggesting that reclassification is needed to correctly assign species names to the analyzed Lysinibacillus strains available in National Microbial Repositories.

  5. Waves and Particles, The Orbital Atom, Parts One and Two of an Integrated Science Sequence, Teacher's Guide, 1973 Edition.

    ERIC Educational Resources Information Center

    Portland Project Committee, OR.

    This teacher's guide includes parts one and two of the four-part third year Portland Project, a three-year integrated secondary science curriculum sequence. The Harvard Project Physics textbook is used for reading assignments for part one. Assignments relate to waves, light, electricity, magnetic fields, Faraday and the electrical age,…

  6. Isolation, sequence identification and tissue expression profiles of 3 novel porcine genes: ASPA, NAGA, and HEXA.

    PubMed

    Shu, Xianghua; Liu, Yonggang; Yang, Liangyu; Song, Chunlian; Hou, Jiafa

    2008-01-01

    The complete coding sequences of 3 porcine genes - ASPA, NAGA, and HEXA - were amplified by the reverse transcriptase polymerase chain reaction (RT-PCR) based on the conserved sequence information of the mouse or other mammals and referenced pig ESTs. These 3 novel porcine genes were then deposited in the NCBI database and assigned GeneIDs: 100142661, 100142664 and 100142667. The phylogenetic tree analysis revealed that the porcine ASPA, NAGA, and HEXA all have closer genetic relationships with the ASPA, NAGA, and HEXA of cattle. Tissue expression profile analysis was also carried out and results revealed that swine ASPA, NAGA, and HEXA genes were differentially expressed in various organs, including skeletal muscle, the heart, liver, fat, kidney, lung, and small and large intestines. Our experiment is the first one to establish the foundation for further research on these 3 swine genes.

  7. Detecting non-orthology in the COGs database and other approaches grouping orthologs using genome-specific best hits.

    PubMed

    Dessimoz, Christophe; Boeckmann, Brigitte; Roth, Alexander C J; Gonnet, Gaston H

    2006-01-01

    Correct orthology assignment is a critical prerequisite of numerous comparative genomics procedures, such as function prediction, construction of phylogenetic species trees and genome rearrangement analysis. We present an algorithm for the detection of non-orthologs that arise by mistake in current orthology classification methods based on genome-specific best hits, such as the COGs database. The algorithm works with pairwise distance estimates, rather than computationally expensive and error-prone tree-building methods. The accuracy of the algorithm is evaluated through verification of the distribution of predicted cases, case-by-case phylogenetic analysis and comparisons with predictions from other projects using independent methods. Our results show that a very significant fraction of the COG groups include non-orthologs: using conservative parameters, the algorithm detects non-orthology in a third of all COG groups. Consequently, sequence analysis sensitive to correct orthology assignments will greatly benefit from these findings.

  8. New species and phylogenetic relationships of the spider genus Coptoprepes using morphological and sequence data (Araneae: Anyphaenidae).

    PubMed

    Barone, Mariana L; Werenkraut, Victoria; Ramírez, Martín J

    2016-10-17

    We present evidence from the standard cytochrome c oxidase subunit I (COI) barcoding marker and from new collections, showing that the males and females of C. ecotono Werenkraut & Ramírez were mismatched, and describe the female of that species for the first time. An undescribed male from Chile is assigned to the new species Coptoprepes laudani, together with the female that was previously thought to be C. ecotono. The matching of sexes is justified by a cladistic analysis of morphological and sequence data in combination. New locality data and barcoding sequences are provided for other species of Coptoprepes, all endemic to the temperate forests of Chile and adjacent Argentina. Although morphology and sequences are not conclusive on the relationships of Coptoprepes species, the sequence data suggest that the species without a retrolateral tibial apophysis may belong to an independent lineage.

  9. GenBank

    PubMed Central

    Benson, Dennis A.; Karsch-Mizrachi, Ilene; Lipman, David J.; Ostell, James; Wheeler, David L.

    2007-01-01

    GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 240 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage. PMID:17202161

  10. Chromosomal Organization and Sequence Diversity of Genes Encoding Lachrymatory Factor Synthase in Allium cepa L.

    PubMed Central

    Masamura, Noriya; McCallum, John; Khrustaleva, Ludmila; Kenel, Fernand; Pither-Joyce, Meegham; Shono, Jinji; Suzuki, Go; Mukai, Yasuhiko; Yamauchi, Naoki; Shigyo, Masayoshi

    2012-01-01

    Lachrymatory factor synthase (LFS) catalyzes the formation of lachrymatory factor, one of the most distinctive traits of bulb onion (Allium cepa L.). Therefore, we used LFS as a model for a functional gene in a huge genome, and we examined the chromosomal organization of LFS in A. cepa by multiple approaches. The first-level analysis completed the chromosomal assignment of LFS gene to chromosome 5 of A. cepa via the use of a complete set of A. fistulosum–shallot (A. cepa L. Aggregatum group) monosomic addition lines. Subsequent use of an F2 mapping population from the interspecific cross A. cepa × A. roylei confirmed the assignment of an LFS locus to this chromosome. Sequence comparison of two BAC clones bearing LFS genes, LFS amplicons from diverse germplasm, and expressed sequences from a doubled haploid line revealed variation consistent with duplicated LFS genes. Furthermore, the BAC-FISH study using the two BAC clones as a probe showed that LFS genes are localized in the proximal region of the long arm of the chromosome. These results suggested that LFS in A. cepa is transcribed from at least two loci and that they are localized on chromosome 5. PMID:22690373

  11. Classification of DNA nucleotides with transverse tunneling currents

    NASA Astrophysics Data System (ADS)

    Nyvold Pedersen, Jonas; Boynton, Paul; Di Ventra, Massimiliano; Jauho, Antti-Pekka; Flyvbjerg, Henrik

    2017-01-01

    It has been theoretically suggested and experimentally demonstrated that fast and low-cost sequencing of DNA, RNA, and peptide molecules might be achieved by passing such molecules between electrodes embedded in a nanochannel. The experimental realization of this scheme faces major challenges, however. In realistic liquid environments, typical currents in tunneling devices are of the order of picoamps. This corresponds to only six electrons per microsecond, and this number affects the integration time required to do current measurements in real experiments. This limits the speed of sequencing, though current fluctuations due to Brownian motion of the molecule average out during the required integration time. Moreover, data acquisition equipment introduces noise, and electronic filters create correlations in time-series data. We discuss how these effects must be included in the analysis of, e.g., the assignment of specific nucleobases to current signals. As the signals from different molecules overlap, unambiguous classification is impossible with a single measurement. We argue that the assignment of molecules to a signal is a standard pattern classification problem and calculation of the error rates is straightforward. The ideas presented here can be extended to other sequencing approaches of current interest.
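
    The figure of about six electrons per microsecond follows directly from dividing the charge transported by a picoamp current by the elementary charge. The Python sketch below reproduces that arithmetic. The second function goes beyond the abstract: it assumes simple Poisson (shot-noise) counting statistics, under which the relative fluctuation of N counted electrons is 1/sqrt(N), to estimate an integration time. Function names are hypothetical.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per electron (exact SI value)


def electrons_per_interval(current_amps: float, interval_seconds: float) -> float:
    """Mean number of electrons transported by a steady current in one interval."""
    return current_amps * interval_seconds / ELEMENTARY_CHARGE_C


def integration_time_for_precision(current_amps: float, relative_error: float) -> float:
    """Time needed to count N = 1 / relative_error**2 electrons.

    Assumption (not from the abstract): pure Poisson counting noise,
    so the relative fluctuation of N counts is 1 / sqrt(N).
    """
    electrons_needed = 1.0 / relative_error ** 2
    return electrons_needed * ELEMENTARY_CHARGE_C / current_amps


# 1 pA flowing for 1 microsecond.
n_electrons = electrons_per_interval(1e-12, 1e-6)

# Time to reach 1% relative precision at 1 pA under the Poisson assumption.
t_one_percent = integration_time_for_precision(1e-12, 0.01)
```

    Under these assumptions a 1 pA current delivers roughly 6.2 electrons per microsecond, and a 1% measurement would need on the order of a millisecond, which illustrates why the small electron count constrains sequencing speed.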

  12. Molecular Barcoding of Aquatic Oligochaetes: Implications for Biomonitoring

    PubMed Central

    Vivien, Régis; Wyler, Sofia; Lafont, Michel; Pawlowski, Jan

    2015-01-01

    Aquatic oligochaetes are well recognized bioindicators of quality of sediments and water in watercourses and lakes. However, the difficult taxonomic determination based on morphological features compromises their more common use in eco-diagnostic analyses. To overcome this limitation, we investigated molecular barcodes as an identification tool for a broad range of taxa of aquatic oligochaetes. We report 185 COI and 52 ITS2 rDNA sequences for specimens collected in Switzerland and belonging to the families Naididae, Lumbriculidae, Enchytraeidae and Lumbricidae. Phylogenetic analyses allowed distinguishing 41 lineages separated by more than 10% divergence in COI sequences. The lineage distinction was confirmed by the Automatic Barcode Gap Discovery (ABGD) method and by ITS2 data. Our results showed that morphological identification underestimates the oligochaete diversity. Only 26 of the lineages could be assigned to morphospecies, of which seven were sequenced for the first time. Several cryptic species were detected within common morphospecies. Many juvenile specimens that could not be assigned morphologically could be assigned after genetic analysis. Our study showed that COI barcodes performed very well as species identifiers in aquatic oligochaetes. Their easy amplification and good taxonomic resolution might help promote aquatic oligochaetes as bioindicators for next generation environmental DNA biomonitoring of aquatic ecosystems. PMID:25856230
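
    The idea of separating lineages by a divergence threshold can be illustrated with a small Python sketch. This is not the authors' phylogenetic or ABGD procedure: it swaps in an uncorrected pairwise p-distance on pre-aligned sequences and single-linkage grouping at a 10% cut-off. The specimen names and 20 bp sequences are made up for illustration.

```python
def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or ambiguity code are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def group_by_threshold(seqs: dict, threshold: float = 0.10) -> list:
    """Single-linkage grouping: join sequences whose p-distance is <= threshold."""
    names = list(seqs)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if p_distance(seqs[first], seqs[second]) <= threshold:
                parent[find(first)] = find(second)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical aligned barcode fragments (20 bp each).
toy_barcodes = {
    "spec_A": "ACGTACGTACGTACGTACGT",
    "spec_B": "ACGTACGTACGTACGTACGA",  # 1 difference from spec_A (5%)
    "spec_C": "TTGTACCTACGAACGTTCGA",  # 6 differences from spec_A (30%)
}
lineages = group_by_threshold(toy_barcodes)
```

    With these toy sequences, spec_A and spec_B fall into one group and spec_C forms its own. Real barcode studies typically use model-corrected distances and tree-based methods rather than this simplified grouping.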

  13. Genotyping of the fish rhabdovirus, viral haemorrhagic septicaemia virus, by restriction fragment length polymorphisms

    USGS Publications Warehouse

    Einer-Jensen, Katja; Winton, James R.; Lorenzen, Niels

    2005-01-01

    The aim of this study was to develop a standardized molecular assay that used limited resources and equipment for routine genotyping of isolates of the fish rhabdovirus, viral haemorrhagic septicaemia virus (VHSV). Computer generated restriction maps, based on 62 unique full-length (1524 nt) sequences of the VHSV glycoprotein (G) gene, were used to predict restriction fragment length polymorphism (RFLP) patterns that were subsequently grouped and compared with a phylogenetic analysis of the G-gene sequences of the same set of isolates. Digestion of PCR amplicons from the full-length G-gene by a set of three restriction enzymes was predicted to accurately enable the assignment of the VHSV isolates into the four major genotypes discovered to date. Further sub-typing of the isolates into the recently described sub-lineages of genotype I was possible by applying three additional enzymes. Experimental evaluation of the method consisted of three steps: (i) RT-PCR amplification of the G-gene of VHSV isolates using purified viral RNA as template, (ii) digestion of the PCR products with a panel of restriction endonucleases and (iii) interpretation of the resulting RFLP profiles. The RFLP analysis was shown to approximate the level of genetic discrimination obtained by other, more labour-intensive, molecular techniques such as the ribonuclease protection assay or sequence analysis. In addition, 37 previously uncharacterised isolates from diverse sources were assigned to specific genotypes. While the assay was able to distinguish between marine and continental isolates of VHSV, the differences did not correlate with the pathogenicity of the isolates.
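
    Predicting an RFLP pattern from a sequence amounts to an in-silico digest: find every recognition site, cut, and list the fragment lengths. The Python sketch below shows the principle on a linear amplicon. The abstract does not name the enzymes used, so EcoRI (recognition site GAATTC, cutting after the first G) is used purely as a familiar stand-in, and the two toy amplicons are invented.

```python
def digest(seq: str, site: str, cut_offset: int) -> list:
    """Fragment lengths after cutting a linear sequence at every occurrence of site.

    cut_offset is the number of bases from the start of the site to the cut.
    Only the given strand is searched, which suffices for palindromic sites.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


def rflp_pattern(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> tuple:
    """Fragment lengths sorted from longest to shortest, as read off a gel."""
    return tuple(sorted(digest(seq, site, cut_offset), reverse=True))


# Invented 26 bp amplicons; isolate_b has lost the second site (GAATTC -> GAGTTC).
isolate_a = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
isolate_b = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAGTTC" + "TT"

pattern_a = rflp_pattern(isolate_a)
pattern_b = rflp_pattern(isolate_b)
```

    Isolates with identical patterns across a panel of enzymes would be grouped together; here a single point mutation removes one site and changes the pattern from three fragments to two.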

  14. Multi-Platform Next-Generation Sequencing of the Domestic Turkey (Meleagris gallopavo): Genome Assembly and Analysis

    PubMed Central

    Aslam, Luqman; Beal, Kathryn; Ann Blomberg, Le; Bouffard, Pascal; Burt, David W.; Crasta, Oswald; Crooijmans, Richard P. M. A.; Cooper, Kristal; Coulombe, Roger A.; De, Supriyo; Delany, Mary E.; Dodgson, Jerry B.; Dong, Jennifer J.; Evans, Clive; Frederickson, Karin M.; Flicek, Paul; Florea, Liliana; Folkerts, Otto; Groenen, Martien A. M.; Harkins, Tim T.; Herrero, Javier; Hoffmann, Steve; Megens, Hendrik-Jan; Jiang, Andrew; de Jong, Pieter; Kaiser, Pete; Kim, Heebal; Kim, Kyu-Won; Kim, Sungwon; Langenberger, David; Lee, Mi-Kyung; Lee, Taeheon; Mane, Shrinivasrao; Marcais, Guillaume; Marz, Manja; McElroy, Audrey P.; Modise, Thero; Nefedov, Mikhail; Notredame, Cédric; Paton, Ian R.; Payne, William S.; Pertea, Geo; Prickett, Dennis; Puiu, Daniela; Qioa, Dan; Raineri, Emanuele; Ruffier, Magali; Salzberg, Steven L.; Schatz, Michael C.; Scheuring, Chantel; Schmidt, Carl J.; Schroeder, Steven; Searle, Stephen M. J.; Smith, Edward J.; Smith, Jacqueline; Sonstegard, Tad S.; Stadler, Peter F.; Tafer, Hakim; Tu, Zhijian (Jake); Van Tassell, Curtis P.; Vilella, Albert J.; Williams, Kelly P.; Yorke, James A.; Zhang, Liqing; Zhang, Hong-Bin; Zhang, Xiaojun; Zhang, Yang; Reed, Kent M.

    2010-01-01

    A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest. PMID:20838655

  15. Quantifying humpback whale song sequences to understand the dynamics of song exchange at the ocean basin scale.

    PubMed

    Garland, Ellen C; Noad, Michael J; Goldizen, Anne W; Lilley, Matthew S; Rekdahl, Melinda L; Garrigue, Claire; Constantine, Rochelle; Daeschler Hauser, Nan; Poole, M Michael; Robbins, Jooke

    2013-01-01

    Humpback whales have a continually evolving vocal sexual display, or "song," that appears to undergo both evolutionary and "revolutionary" change. All males within a population adhere to the current content and arrangement of the song. Populations within an ocean basin share similarities in their songs; this sharing is complex as multiple variations of the song (song types) may be present within a region at any one time. To quantitatively investigate the similarity of song types, songs were compared at both the individual singer and population level using the Levenshtein distance technique and cluster analysis. The highly stereotyped sequences of themes from the songs of 211 individuals from populations within the western and central South Pacific region from 1998 through 2008 were grouped together based on the percentage of song similarity, and compared to qualitatively assigned song types. The analysis produced clusters of highly similar songs that agreed with previous qualitative assignments. Each cluster contained songs from multiple populations and years, confirming the eastward spread of song types and their progressive evolution through the study region. Quantifying song similarity and exchange will assist in understanding broader song dynamics and contribute to the use of vocal displays as population identifiers.
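
    The Levenshtein distance counts the minimum number of insertions, deletions and substitutions needed to turn one sequence into another, and it applies to sequences of song themes as readily as to strings. The Python sketch below is illustrative only. The abstract does not say how the percentage of song similarity was normalised, so dividing by the longer sequence length is an assumption, and the theme labels are hypothetical.

```python
def levenshtein(a, b) -> int:
    """Minimum number of single-item edits turning sequence a into sequence b."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def similarity_percent(a, b) -> float:
    """Similarity as 100 * (1 - distance / longer length); normalisation assumed."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


# Hypothetical theme sequences from two singers.
song_1 = ["t1", "t2", "t3", "t4"]
song_2 = ["t1", "t3", "t4", "t5"]

distance = levenshtein(song_1, song_2)
similarity = similarity_percent(song_1, song_2)
```

    A matrix of such pairwise similarities between all songs is the kind of input a cluster analysis would then group into song types.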

  16. Improved serial analysis of V1 ribosomal sequence tags (SARST-V1) provides a rapid, comprehensive, sequence-based characterization of bacterial diversity and community composition.

    PubMed

    Yu, Zhongtang; Yu, Marie; Morrison, Mark

    2006-04-01

    Serial analysis of ribosomal sequence tags (SARST) is a recently developed technology that can generate large 16S rRNA gene (rrs) sequence data sets from microbiomes, but there are numerous enzymatic and purification steps required to construct the ribosomal sequence tag (RST) clone libraries. We report here an improved SARST method, which still targets the V1 hypervariable region of rrs genes, but reduces the number of enzymes, oligonucleotides, reagents, and technical steps needed to produce the RST clone libraries. The new method, hereafter referred to as SARST-V1, was used to examine the eubacterial diversity present in community DNA recovered from the microbiome resident in the ovine rumen. The 190 sequenced clones contained 1055 RSTs and no fewer than 236 unique phylotypes (based on ≥ 95% sequence identity) that were assigned to eight different eubacterial phyla. Rarefaction and monomolecular curve analyses predicted that the complete RST clone library contains 99% of the 353 unique phylotypes predicted to exist in this microbiome. When compared with ribosomal intergenic spacer analysis (RISA) of the same community DNA sample, as well as a compilation of nine previously published conventional rrs clone libraries prepared from the same type of samples, the RST clone library provided a more comprehensive characterization of the eubacterial diversity present in rumen microbiomes. As such, SARST-V1 should be a useful tool applicable to comprehensive examination of diversity and composition in microbiomes and offers an affordable, sequence-based method for diversity analysis.

  17. Rapid Identification of Sequences for Orphan Enzymes to Power Accurate Protein Annotation

    PubMed Central

    Ojha, Sunil; Watson, Douglas S.; Bomar, Martha G.; Galande, Amit K.; Shearer, Alexander G.

    2013-01-01

    The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the “back catalog” of enzymology – “orphan enzymes,” those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme “back catalog” is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology’s “back catalog” another powerful tool to drive accurate genome annotation. PMID:24386392

  18. Rapid identification of sequences for orphan enzymes to power accurate protein annotation.

    PubMed

    Ramkissoon, Kevin R; Miller, Jennifer K; Ojha, Sunil; Watson, Douglas S; Bomar, Martha G; Galande, Amit K; Shearer, Alexander G

    2013-01-01

    The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the "back catalog" of enzymology: "orphan enzymes," those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme "back catalog" is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology's "back catalog" another powerful tool to drive accurate genome annotation.

  19. Assignment of the human PAX4 gene to chromosome band 7q32 by fluorescence in situ hybridization.

    PubMed

    Tamura, T; Izumikawa, Y; Kishino, T; Soejima, H; Jinno, Y; Niikawa, N

    1994-01-01

    Of the nine known members of a human paired box-containing gene family (Pax), only PAX4 has not been precisely localized. We screened a cosmid library of human genomic DNA using polymerase chain reaction products for PAX4 as a probe and isolated three positive cosmid clones. Sequence analysis revealed that at least two of them had exon-like sequences and showed extensive homology to Pax-4 in the mouse. These two cosmid clones were mapped to human chromosome band 7q32 by fluorescence in situ hybridization.

  20. The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4).

    PubMed

    Huntemann, Marcel; Ivanova, Natalia N; Mavromatis, Konstantinos; Tripp, H James; Paez-Espino, David; Palaniappan, Krishnaveni; Szeto, Ernest; Pillay, Manoj; Chen, I-Min A; Pati, Amrita; Nielsen, Torben; Markowitz, Victor M; Kyrpides, Nikos C

    2015-01-01

    The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. Structural annotation is followed by assignment of protein product names and functions.

  1. A genomewide survey of basic helix–loop–helix factors in Drosophila

    PubMed Central

    Moore, Adrian W.; Barbel, Sandra; Jan, Lily Yeh; Jan, Yuh Nung

    2000-01-01

    The basic helix–loop–helix (bHLH) transcription factors play important roles in the specification of tissue type during the development of animals. We have used the information contained in the recently published genomic sequence of Drosophila melanogaster to identify 12 additional bHLH proteins. By sequence analysis we have assigned these proteins to families defined by Atonal, Hairy-Enhancer of Split, Hand, p48, Mesp, MYC/USF, and the bHLH-Per, Arnt, Sim (PAS) domain. In addition, one single protein represents a unique family of bHLH proteins. mRNA in situ analysis demonstrates that the genes encoding these proteins are expressed in several tissue types but are particularly concentrated in the developing nervous system and mesoderm. PMID:10973473

  2. Application of a fast sorting algorithm to the assignment of mass spectrometric cross-linking data.

    PubMed

    Petrotchenko, Evgeniy V; Borchers, Christoph H

    2014-09-01

    Cross-linking combined with MS involves enzymatic digestion of cross-linked proteins and identifying cross-linked peptides. Assignment of cross-linked peptide masses requires a search of all possible binary combinations of peptides from the cross-linked proteins' sequences, which becomes impractical with increasing complexity of the protein system and/or if digestion enzyme specificity is relaxed. Here, we describe the application of a fast sorting algorithm to search large sequence databases for cross-linked peptide assignments based on mass. This same algorithm has been used previously for assigning disulfide-bridged peptides (Choi et al., ), but has not previously been applied to cross-linking studies. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
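
    The idea of replacing an all-pairs search with a sorted lookup can be sketched as below. This is a generic sort-and-binary-search illustration under stated assumptions, not the algorithm from the paper, whose details the abstract does not give. The function name, the masses, the linker mass and the tolerance are all hypothetical.

```python
from bisect import bisect_left, bisect_right


def find_crosslink_pairs(peptide_masses, observed_mass, linker_mass, tol):
    """Return peptide mass pairs (m1, m2), m1 <= m2, such that
    m1 + m2 + linker_mass lies within tol of observed_mass.

    Sorting once and binary-searching for each partner costs O(n log n),
    compared with O(n^2) for testing every binary combination.
    """
    masses = sorted(peptide_masses)
    pairs = []
    for i, m1 in enumerate(masses):
        target = observed_mass - linker_mass - m1
        # Search only from index i onward so each pair is reported once;
        # a peptide may pair with itself.
        lo = bisect_left(masses, target - tol, i)
        hi = bisect_right(masses, target + tol, i)
        pairs.extend((m1, m2) for m2 in masses[lo:hi])
    return pairs


# Made-up numbers purely for illustration.
candidates = [1500.0, 500.0, 1200.0, 800.0]
print(find_crosslink_pairs(candidates, observed_mass=2100.0,
                           linker_mass=100.0, tol=0.5))
```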

  3. Time-series oligonucleotide count to assign antiviral siRNAs with long utility fit in the big data era.

    PubMed

    Wada, K; Wada, Y; Iwasaki, Y; Ikemura, T

    2017-10-01

    Oligonucleotides are key elements of nucleic acid therapeutics such as small interfering RNAs (siRNAs). Influenza and Ebolaviruses are zoonotic RNA viruses mutating very rapidly, and their sequence changes must be characterized intensively to design therapeutic oligonucleotides with long utility. Focusing on a total of 182 experimentally validated siRNAs for influenza A, B and Ebolaviruses compiled by the siRNA database, we conducted time-series analyses of occurrences of siRNA targets in these viral genomes. Reflecting their high mutation rates, occurrences of target oligonucleotides evidently fluctuate in viral populations and often disappear. Time-series analysis of the one-base changed sequences derived from each original target identified the oligonucleotide that shows a compensatory increase and will potentially become the 'awaiting-type oligonucleotide'; the combined use of this oligonucleotide with the original can provide therapeutics with long utility. This strategy is also useful for assigning diagnostic reverse transcription-PCR primers with long utility.

  4. Time-series oligonucleotide count to assign antiviral siRNAs with long utility fit in the big data era

    PubMed Central

    Wada, K; Wada, Y; Iwasaki, Y; Ikemura, T

    2017-01-01

    Oligonucleotides are key elements of nucleic acid therapeutics such as small interfering RNAs (siRNAs). Influenza and Ebolaviruses are zoonotic RNA viruses mutating very rapidly, and their sequence changes must be characterized intensively to design therapeutic oligonucleotides with long utility. Focusing on a total of 182 experimentally validated siRNAs for influenza A, B and Ebolaviruses compiled by the siRNA database, we conducted time-series analyses of occurrences of siRNA targets in these viral genomes. Reflecting their high mutation rates, occurrences of target oligonucleotides evidently fluctuate in viral populations and often disappear. Time-series analysis of the one-base changed sequences derived from each original target identified the oligonucleotide that shows a compensatory increase and will potentially become the ‘awaiting-type oligonucleotide’; the combined use of this oligonucleotide with the original can provide therapeutics with long utility. This strategy is also useful for assigning diagnostic reverse transcription-PCR primers with long utility. PMID:28905886

  5. The practical evaluation of DNA barcode efficacy.

    PubMed

    Spouge, John L; Mariño-Ramírez, Leonardo

    2012-01-01

    This chapter describes a workflow for measuring the efficacy of a barcode in identifying species. First, assemble individual sequence databases corresponding to each barcode marker. A controlled collection of taxonomic data is preferable to GenBank data, because GenBank data can be problematic, particularly when comparing barcodes based on more than one marker. To ensure proper controls when evaluating species identification, specimens not having a sequence in every marker database should be discarded. Second, select a computer algorithm for assigning species to barcode sequences. No algorithm has yet improved notably on assigning a specimen to the species of its nearest neighbor within a barcode database. Because global sequence alignments (e.g., with the Needleman-Wunsch algorithm, or some related algorithm) examine entire barcode sequences, they generally produce better species assignments than local sequence alignments (e.g., with BLAST). No neighboring method (e.g., global sequence similarity, global sequence distance, or evolutionary distance based on a global alignment) has yet shown a notable superiority in identifying species. Finally, "the probability of correct identification" (PCI) provides an appropriate measurement of barcode efficacy. The overall PCI for a data set is the average of the species PCIs, taken over all species in the data set. This chapter states explicitly how to calculate PCI, how to estimate its statistical sampling error, and how to use data on PCR failure to set limits on how much improvements in PCR technology can improve species identification.
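
    The averaging step stated above (overall PCI as the mean of the species PCIs) can be sketched as follows. The sketch assumes that a species PCI is simply the fraction of that species' specimens assigned to the correct species; the chapter's exact definition may differ, and the species names and results are invented.

```python
from collections import defaultdict


def overall_pci(results):
    """results: iterable of (true_species, assigned_species) pairs.

    Returns (overall PCI, per-species PCI dict). Averaging over species
    rather than over specimens keeps heavily sampled species from
    dominating the overall figure.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for true_species, assigned in results:
        total[true_species] += 1
        if assigned == true_species:
            correct[true_species] += 1
    if not total:
        raise ValueError("no specimens supplied")
    species_pci = {s: correct[s] / total[s] for s in total}
    return sum(species_pci.values()) / len(species_pci), species_pci


# Hypothetical nearest-neighbor assignments for two species.
results = [("sp_A", "sp_A"), ("sp_A", "sp_A"), ("sp_A", "sp_B"), ("sp_A", "sp_A"),
           ("sp_B", "sp_B"), ("sp_B", "sp_A")]
print(overall_pci(results))
```

    In this toy data set 4 of 6 specimens are correct (about 0.67 per specimen), while the species-averaged PCI is 0.625, which shows why the two averages should not be confused.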

  6. Comparative transcriptome analysis of microsclerotia development in Nomuraea rileyi.

    PubMed

    Song, Zhangyong; Yin, Youping; Jiang, Shasha; Liu, Juanjuan; Chen, Huan; Wang, Zhongkang

    2013-06-19

    Nomuraea rileyi is used as an environmental-friendly biopesticide. However, mass production and commercialization of this organism are limited due to its fastidious growth and sporulation requirements. When cultured in amended medium, we found that N. rileyi could produce microsclerotia bodies, replacing conidiophores as the infectious agent. However, little is known about the genes involved in microsclerotia development. In the present study, the transcriptomes were analyzed using next-generation sequencing technology to find the genes involved in microsclerotia development. A total of 4.69 Gb of clean nucleotides comprising 32,061 sequences was obtained, and 20,919 sequences were annotated (about 65%). Among the annotated sequences, only 5928 were annotated with 34 gene ontology (GO) functional categories, and 12,778 sequences were mapped to 165 pathways by searching against the Kyoto Encyclopedia of Genes and Genomes pathway (KEGG) database. Furthermore, we assessed the transcriptomic differences between cultures grown in minimal and amended medium. In total, 4808 sequences were found to be differentially expressed; 719 differentially expressed unigenes were assigned to 25 GO classes and 1888 differentially expressed unigenes were assigned to 161 KEGG pathways, including 25 enrichment pathways. Subsequently, we examined the up-regulation or uniquely expressed genes following amended medium treatment, which were also expressed on the enrichment pathway, and found that most of them participated in mediating oxidative stress homeostasis. To elucidate the role of oxidative stress in microsclerotia development, we analyzed the diversification of unigenes using quantitative reverse transcription-PCR (RT-qPCR). Our findings suggest that oxidative stress occurs during microsclerotia development, along with a broad metabolic activity change. Our data provide the most comprehensive sequence resource available for the study of N. rileyi. 
    We believe that the transcriptome datasets will serve as an important public information platform to accelerate studies on N. rileyi microsclerotia.

  7. A taxonomic framework for cable bacteria and proposal of the candidate genera Electrothrix and Electronema.

    PubMed

    Trojan, Daniela; Schreiber, Lars; Bjerg, Jesper T; Bøggild, Andreas; Yang, Tingting; Kjeldsen, Kasper U; Schramm, Andreas

    2016-07-01

    Cable bacteria are long, multicellular filaments that can conduct electric currents over centimeter-scale distances. All cable bacteria identified to date belong to the deltaproteobacterial family Desulfobulbaceae and have not been isolated in pure culture yet. Their taxonomic delineation and exact phylogeny is uncertain, as most studies so far have reported only short partial 16S rRNA sequences or have relied on identification by a combination of filament morphology and 16S rRNA-targeted fluorescence in situ hybridization with a Desulfobulbaceae-specific probe. In this study, nearly full-length 16S rRNA gene sequences of 16 individual cable bacteria filaments from freshwater, salt marsh, and marine sites of four geographic locations are presented. These sequences formed a distinct, monophyletic sister clade to the genus Desulfobulbus and could be divided into six coherent, species-level clusters, arranged as two genus-level groups. The same grouping was retrieved by phylogenetic analysis of full or partial dsrAB genes encoding the dissimilatory sulfite reductase. Based on these results, it is proposed to accommodate cable bacteria within two novel candidate genera: the mostly marine "Candidatus Electrothrix", with four candidate species, and the mostly freshwater "Candidatus Electronema", with two candidate species. This taxonomic framework can be used to assign environmental sequences confidently to the cable bacteria clade, even without morphological information. Database searches revealed 185 16S rRNA gene sequences that affiliated within the clade formed by the proposed cable bacteria genera, of which 120 sequences could be assigned to one of the six candidate species, while the remaining 65 sequences indicated the existence of up to five additional species. Copyright © 2016 The Author(s). Published by Elsevier GmbH. All rights reserved.

  8. Identification of Mycobacterium spp. of veterinary importance using rpoB gene sequencing

    PubMed Central

    2011-01-01

    Background: Studies conducted on Mycobacterium spp. isolated from human patients indicate that sequencing of a 711 bp portion of the rpoB gene can be useful in assigning a species identity, particularly for members of the Mycobacterium avium complex (MAC). Given that MAC are important pathogens in livestock, companion animals, and zoo/exotic animals, we were interested in evaluating the use of rpoB sequencing for identification of Mycobacterium isolates of veterinary origin. Results: A total of 386 isolates, collected from 2008 to June 2011 from 378 animals (amphibians, reptiles, birds, and mammals) underwent PCR and sequencing of a ~711 bp portion of the rpoB gene; 310 isolates (80%) were identified to the species level based on similarity at ≥ 98% with a reference sequence. The remaining 76 isolates (20%) displayed < 98% similarity with reference sequences and were assigned to a clade based on their location in a neighbor-joining tree containing reference sequences. For a subset of 236 isolates that received both 16S rRNA and rpoB sequencing, 167 (70%) displayed a similar species/clade assignation for both sequencing methods. For the remaining 69 isolates, species/clade identities were different with each sequencing method. Mycobacterium avium subsp. hominissuis was the species most frequently isolated from specimens from pigs, cervids, companion animals, cattle, and exotic/zoo animals. Conclusions: rpoB sequencing proved useful in identifying Mycobacterium isolates of veterinary origin to clade, species, or subspecies levels, particularly for assemblages (such as the MAC) where 16S rRNA sequencing alone is not adequate to demarcate these taxa. rpoB sequencing can represent a cost-effective identification tool suitable for routine use in the veterinary diagnostic laboratory. PMID:22118247
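
    The 98% decision rule reported above can be sketched as a small function. This is an illustration only: it assumes the query and reference are already aligned to equal length and counts every mismatching column against identity, which may not match how the study computed similarity. The sequences and the reference name are made up.

```python
def percent_identity(query, reference):
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if not query or len(query) != len(reference):
        raise ValueError("sequences must be non-empty and of equal aligned length")
    matches = sum(1 for q, r in zip(query, reference) if q == r)
    return 100.0 * matches / len(query)


def assign(query, references, threshold=98.0):
    """Species call at or above the threshold; otherwise defer to tree placement."""
    best_name, best_identity = max(
        ((name, percent_identity(query, seq)) for name, seq in references.items()),
        key=lambda item: item[1],
    )
    if best_identity >= threshold:
        return ("species", best_name, best_identity)
    return ("clade (needs tree placement)", None, best_identity)


# Toy 100 bp reference and two queries with 1 and 3 mismatches.
reference = "ACGT" * 25
references = {"ref_species_1": reference}
query_close = "T" + reference[1:]
query_far = "TTT" + reference[3:]
print(assign(query_close, references))
print(assign(query_far, references))
```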

  9. Homonuclear Hartmann-Hahn transfer with reduced relaxation losses by use of the MOCCA-XY16 multiple pulse sequence

    NASA Astrophysics Data System (ADS)

    Furrer, Julien; Kramer, Frank; Marino, John P.; Glaser, Steffen J.; Luy, Burkhard

    2004-01-01

    Homonuclear Hartmann-Hahn transfer is one of the most important building blocks in modern high-resolution NMR. It constitutes a very efficient transfer element for the assignment of proteins, nucleic acids, and oligosaccharides. Nevertheless, in macromolecules exceeding ~10 kDa TOCSY-experiments can show decreasing sensitivity due to fast transverse relaxation processes that are active during the mixing periods. In this article we propose the MOCCA-XY16 multiple pulse sequence, originally developed for efficient TOCSY transfer through residual dipolar couplings, as a homonuclear Hartmann-Hahn sequence with improved relaxation properties. A theoretical analysis of the coherence transfer via scalar couplings and its relaxation behavior as well as experimental transfer curves for MOCCA-XY16 relative to the well-characterized DIPSI-2 multiple pulse sequence are given.

  10. Homonuclear Hartmann-Hahn transfer with reduced relaxation losses by use of the MOCCA-XY16 multiple pulse sequence.

    PubMed

    Furrer, Julien; Kramer, Frank; Marino, John P; Glaser, Steffen J; Luy, Burkhard

    2004-01-01

    Homonuclear Hartmann-Hahn transfer is one of the most important building blocks in modern high-resolution NMR. It constitutes a very efficient transfer element for the assignment of proteins, nucleic acids, and oligosaccharides. Nevertheless, in macromolecules exceeding approximately 10 kDa TOCSY-experiments can show decreasing sensitivity due to fast transverse relaxation processes that are active during the mixing periods. In this article we propose the MOCCA-XY16 multiple pulse sequence, originally developed for efficient TOCSY transfer through residual dipolar couplings, as a homonuclear Hartmann-Hahn sequence with improved relaxation properties. A theoretical analysis of the coherence transfer via scalar couplings and its relaxation behavior as well as experimental transfer curves for MOCCA-XY16 relative to the well-characterized DIPSI-2 multiple pulse sequence are given.

  11. From Sequences to Insights in Microbial Ecology

    PubMed Central

    Knight, R.

    2010-01-01

    Rapid declines in the cost of sequencing have made large volumes of DNA sequence data available to individual investigators. Now, data analysis is the rate-limiting step: providing a user with sequences alone typically leads to bewilderment, frustration, and skepticism about the technology. In this talk, I focus on how to extract insights from 16S rRNA data, including key lab steps (barcoding and normalization) and on which tools are available to perform routine but essential processing steps such as denoising, chimera detection, taxonomy assignment, and diversity analyses (including detection of biological clusters and gradients in the samples). Providing users with advice on these points and with a standard pipeline they can exploit (but modify if circumstances require) can greatly accelerate the rate of understanding, publication, and acquisition of funding for further studies.

  12. HIV infection among U.S. Army and Air Force military personnel: sociodemographic and genotyping analysis.

    PubMed

    Singer, Darrell E; Bautista, Christian T; O'Connell, Robert J; Sanders-Buell, Eric; Agan, Brian K; Kijak, Gustavo H; Hakre, Shilpa; Sanchez, Jose L; Sateren, Warren B; McCutchan, Francine E; Michael, Nelson L; Scott, Paul T

    2010-08-01

    Since 1985, the U.S. Department of Defense has periodically screened all military personnel for HIV allowing for the monitoring of the infection in this dynamic cohort population. A nested case-control study was performed to study sociodemographics, overseas assignment, and molecular analysis of HIV. Cases were newly identified HIV infections among U.S. Army and Air Force military personnel from 2000 to 2004. Controls were frequency matched to cases by gender and date of case first positive HIV screening test. Genotyping analysis was performed using high-throughput screening assays and partial genome sequencing. HIV was significantly associated with black race [odds ratio (OR) = 6.65], single marital status (OR = 4.45), and age (OR per year = 1.07). Ninety-seven percent were subtype B and 3% were non-B subtypes (A3, CRF01_AE, A/C recombinant, G, CRF02_AG). Among cases, overseas assignment in the period at risk prior to their first HIV-positive test was associated with non-B HIV subtype infection (OR = 8.44). Black and single military personnel remain disproportionately affected by HIV infection. Most non-B HIV subtypes were associated with overseas assignment. Given the increased frequency and length of assignments, and the expanding HIV genetic diversity observed in this population, there is a need for active HIV genotyping surveillance and a need to reinforce primary HIV prevention efforts.

  13. Classifying short genomic fragments from novel lineages using composition and homology

    PubMed Central

    2011-01-01

    Background: The assignment of taxonomic attributions to DNA fragments recovered directly from the environment is a vital step in metagenomic data analysis. Assignments can be made using rank-specific classifiers, which assign reads to taxonomic labels from a predetermined level such as named species or strain, or rank-flexible classifiers, which choose an appropriate taxonomic rank for each sequence in a data set. The choice of rank typically depends on the optimal model for a given sequence and on the breadth of taxonomic groups seen in a set of close-to-optimal models. Homology-based (e.g., LCA) and composition-based (e.g., PhyloPythia, TACOA) rank-flexible classifiers have been proposed, but there is at present no hybrid approach that utilizes both homology and composition. Results: We first develop a hybrid, rank-specific classifier based on BLAST and Naïve Bayes (NB) that has comparable accuracy and a faster running time than the current best approach, PhymmBL. By substituting LCA for BLAST or allowing the inclusion of suboptimal NB models, we obtain a rank-flexible classifier. This hybrid classifier outperforms established rank-flexible approaches on simulated metagenomic fragments of length 200 bp to 1000 bp and is able to assign taxonomic attributions to a subset of sequences with few misclassifications. We then demonstrate the performance of different classifiers on an enhanced biological phosphorus removal metagenome, illustrating the advantages of rank-flexible classifiers when representative genomes are absent from the set of reference genomes. Application to a glacier ice metagenome demonstrates that similar taxonomic profiles are obtained across a set of classifiers which are increasingly conservative in their classification. Conclusions: Our NB-based classification scheme is faster than the current best composition-based algorithm, Phymm, while providing equally accurate predictions. The rank-flexible variant of NB, which we term ε-NB, is complementary to LCA and can be combined with it to yield conservative prediction sets of very high confidence. The simple parameterization of LCA and ε-NB allows for tuning of the balance between more predictions and increased precision, allowing the user to account for the sensitivity of downstream analyses to misclassified or unclassified sequences. PMID:21827705
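
    The lowest common ancestor (LCA) idea behind rank-flexible classification can be sketched as a longest-common-prefix computation over lineages. This is a generic illustration, not the ε-NB or hybrid classifier from the paper. It assumes each close-to-optimal hit is given as a root-to-leaf list of taxon names, and the example lineages are illustrative.

```python
def lowest_common_ancestor(lineages):
    """Return the shared root-to-node prefix of several taxonomic lineages.

    The more the hits disagree, the shorter the prefix, so the assignment
    automatically falls back to a higher (more conservative) rank.
    """
    if not lineages:
        return []
    prefix = []
    for taxa_at_rank in zip(*lineages):
        if all(t == taxa_at_rank[0] for t in taxa_at_rank):
            prefix.append(taxa_at_rank[0])
        else:
            break
    return prefix


# Illustrative lineages for two near-equal hits to one read.
hits = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
     "Enterobacterales", "Enterobacteriaceae", "Escherichia"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
     "Enterobacterales", "Enterobacteriaceae", "Salmonella"],
]
print(lowest_common_ancestor(hits)[-1])
```

    With these two hits the read would be reported at family level rather than forced to a genus.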

  14. Sequence Analysis of the Genome of Carnation (Dianthus caryophyllus L.)

    PubMed Central

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-01-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. ‘Francesco’ was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568 887 315 bp, consisting of 45 088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16 644 bp and 60 737 bp, respectively, and the longest scaffold was 1 287 144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. PMID:24344172
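
    For readers unfamiliar with the N50 statistic quoted above, the following sketch shows the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly. The toy lengths are invented; the last lines only re-derive the 91% coverage figure from the two numbers given in the abstract.

```python
def n50(lengths):
    """N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no lengths supplied")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Toy assembly: total 32, so the two 8s already reach half of it.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))

# Coverage check using the abstract's figures (non-redundant bp / estimated genome size).
coverage_percent = 568_887_315 / 622_000_000 * 100
print(round(coverage_percent))
```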

  15. A DNA Barcode Library for North American Pyraustinae (Lepidoptera: Pyraloidea: Crambidae).

    PubMed

    Yang, Zhaofu; Landry, Jean-François; Hebert, Paul D N

    2016-01-01

    Although members of the crambid subfamily Pyraustinae are frequently important crop pests, their identification is often difficult because many species lack conspicuous diagnostic morphological characters. DNA barcoding employs sequence diversity in a short standardized gene region to facilitate specimen identifications and species discovery. This study provides a DNA barcode reference library for North American pyraustines based upon the analysis of 1589 sequences recovered from 137 nominal species, 87% of the fauna. Data from 125 species were barcode compliant (>500bp, <1% n), and 99 of these taxa formed a distinct cluster that was assigned to a single BIN. The other 26 species were assigned to 56 BINs, reflecting frequent cases of deep intraspecific sequence divergence and a few instances of barcode sharing, creating a total of 155 BINs. Two systems for OTU designation, ABGD and BIN, were examined to check the correspondence between current taxonomy and sequence clusters. The BIN system performed better than ABGD in delimiting closely related species, while OTU counts with ABGD were influenced by the value employed for relative gap width. Different species with low or no interspecific divergence may represent cases of unrecognized synonymy, whereas those with high intraspecific divergence require further taxonomic scrutiny as they may involve cryptic diversity. The barcode library developed in this study will also help to advance understanding of relationships among species of Pyraustinae.
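
    The barcode-compliance criterion quoted in parentheses above (>500 bp, <1% n) can be expressed as a simple filter. This sketch assumes that "n" refers to ambiguous base calls written as N and applies only those two thresholds. The function name and test sequences are hypothetical.

```python
def is_barcode_compliant(sequence, min_length=500, max_n_fraction=0.01):
    """True if the sequence is longer than min_length and has under 1% N calls."""
    seq = sequence.upper()
    if len(seq) <= min_length:
        return False
    return seq.count("N") / len(seq) < max_n_fraction


clean = "ACGT" * 150                 # 600 bp, no ambiguous bases
noisy = "N" * 10 + "ACGT" * 150      # 610 bp, about 1.6% N
short = "ACGT" * 100                 # 400 bp
print([is_barcode_compliant(s) for s in (clean, noisy, short)])
```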

  16. Streptococcus pharyngis sp. nov., a novel streptococcal species isolated from the respiratory tract of wild rabbits.

    PubMed

    Vela, Ana I; Casas-Díaz, Encarna; Lavín, Santiago; Domínguez, Lucas; Fernández-Garayzábal, Jose F

    2015-09-01

    Four isolates of an unknown Gram-stain-positive, catalase-negative coccus-shaped organism, isolated from the pharynx of four wild rabbits, were characterized by phenotypic and molecular genetic methods. The micro-organisms were tentatively assigned to the genus Streptococcus based on cellular morphological and biochemical criteria, although the organisms did not appear to correspond to any species with a validly published name. Comparative 16S rRNA gene sequencing confirmed their identification as members of the genus Streptococcus, being most closely related phylogenetically to Streptococcus porcorum 682-03(T) (96.9% 16S rRNA gene sequence similarity). Analysis of rpoB and sodA gene sequences showed divergence values between the novel species and S. porcorum 682-03(T) (the closest phylogenetic relative determined from 16S rRNA gene sequences) of 18.1 and 23.9%, respectively. The novel bacterial isolate could be distinguished from the type strain of S. porcorum by several biochemical characteristics, such as the production of glycyl-tryptophan arylamidase and α-chymotrypsin, and the non-acidification of different sugars. Based on both phenotypic and phylogenetic findings, it is proposed that the unknown bacterium be assigned to a novel species of the genus Streptococcus, and named Streptococcus pharyngis sp. nov. The type strain is DICM10-00796B(T) ( = CECT 8754(T) = CCUG 66496(T)).

  17. Business Planning in the Light of Neuro-fuzzy and Predictive Forecasting

    NASA Astrophysics Data System (ADS)

    Chakrabarti, Prasun; Basu, Jayanta Kumar; Kim, Tai-Hoon

    In this paper we have pointed out gain sensing on forecast-based techniques. We have cited an idea of neural-based gain forecasting. Testing of the sequence of gain patterns is also verified using statistical analysis of fuzzy value assignment. The paper also suggests realization of a stable gain condition using K-Means clustering of data mining. A new concept of 3D-based gain sensing has been pointed out. The paper also reveals what type of trend analysis can be observed for probabilistic gain prediction.

  18. Draft genome sequence of bitter gourd (Momordica charantia), a vegetable and medicinal plant in tropical and subtropical regions.

    PubMed

    Urasaki, Naoya; Takagi, Hiroki; Natsume, Satoshi; Uemura, Aiko; Taniai, Naoki; Miyagi, Norimichi; Fukushima, Mai; Suzuki, Shouta; Tarora, Kazuhiko; Tamaki, Moritoshi; Sakamoto, Moriaki; Terauchi, Ryohei; Matsumura, Hideo

    2017-02-01

    Bitter gourd (Momordica charantia) is an important vegetable and medicinal plant in tropical and subtropical regions globally. In this study, the draft genome sequence of a monoecious bitter gourd inbred line, OHB3-1, was analyzed. Through Illumina sequencing and de novo assembly, scaffolds of 285.5 Mb in length were generated, corresponding to ∼84% of the estimated genome size of bitter gourd (339 Mb). In this draft genome sequence, 45,859 protein-coding gene loci were identified, and transposable elements accounted for 15.3% of the whole genome. According to synteny mapping and phylogenetic analysis of conserved genes, bitter gourd was more related to watermelon (Citrullus lanatus) than to cucumber (Cucumis sativus) or melon (C. melo). Using RAD-seq analysis, 1507 marker loci were genotyped in an F2 progeny of two bitter gourd lines, resulting in an improved linkage map, comprising 11 linkage groups. By anchoring RAD tag markers, 255 scaffolds were assigned to the linkage map. Comparative analysis of genome sequences and predicted genes determined that putative trypsin-inhibitor and ribosome-inactivating genes were distinctive in the bitter gourd genome. These genes could characterize the bitter gourd as a medicinal plant. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  19. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution.

    PubMed

    2004-12-09

    We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.

  20. GenBank.

    PubMed

    Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Wheeler, David L

    2008-01-01

    GenBank (R) is a comprehensive database that contains publicly available nucleotide sequences for more than 260 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: www.ncbi.nlm.nih.gov.

  1. GenBank

    PubMed Central

    Benson, Dennis A.; Karsch-Mizrachi, Ilene; Lipman, David J.; Ostell, James; Wheeler, David L.

    2008-01-01

    GenBank (R) is a comprehensive database that contains publicly available nucleotide sequences for more than 260 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: www.ncbi.nlm.nih.gov PMID:18073190

  2. Flexible, fast and accurate sequence alignment profiling on GPGPU with PaSWAS.

    PubMed

    Warris, Sven; Yalcin, Feyruz; Jackson, Katherine J L; Nap, Jan Peter

    2015-01-01

    To obtain large-scale sequence alignments in a fast and flexible way is an important step in the analyses of next generation sequencing data. Applications based on the Smith-Waterman (SW) algorithm are often either not fast enough, limited to dedicated tasks or not sufficiently accurate due to statistical issues. Current SW implementations that run on graphics hardware do not report the alignment details necessary for further analysis. With the Parallel SW Alignment Software (PaSWAS) it is possible (a) to have easy access to the computational power of NVIDIA-based general purpose graphics processing units (GPGPUs) to perform high-speed sequence alignments, and (b) retrieve relevant information such as score, number of gaps and mismatches. The software reports multiple hits per alignment. The added value of the new SW implementation is demonstrated with two test cases: (1) tag recovery in next generation sequence data and (2) isotype assignment within an immunoglobulin 454 sequence data set. Both cases show the usability and versatility of the new parallel Smith-Waterman implementation.
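
    The kind of alignment detail described above (score, number of gaps and mismatches) can be illustrated with a small CPU-only sketch of Smith-Waterman local alignment. This is not the PaSWAS GPU implementation; the function name, the linear gap penalty, and the scoring values (match=2, mismatch=-1, gap=-2) are assumptions chosen only for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty (illustrative scoring values).

    Returns the best local score plus counts of matches, mismatches and gaps
    along one optimal traceback path.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # dynamic-programming score matrix
    best, best_pos = 0, (0, 0)

    # Fill the matrix; scores are floored at zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    matches = mismatches = gaps = 0
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            if a[i - 1] == b[j - 1]:
                matches += 1
            else:
                mismatches += 1
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            gaps += 1
            i -= 1
        else:
            gaps += 1
            j -= 1

    return {"score": best, "matches": matches, "mismatches": mismatches, "gaps": gaps}
```

    For example, smith_waterman("ACGT", "ACTT") aligns the full strings with one mismatch under these assumed scores.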

  3. Structure-Specific Ribonucleases for MS-Based Elucidation of Higher-Order RNA Structure

    NASA Astrophysics Data System (ADS)

    Scalabrin, Matteo; Siu, Yik; Asare-Okai, Papa Nii; Fabris, Daniele

    2014-07-01

    Supported by high-throughput sequencing technologies, structure-specific nucleases are experiencing a renaissance as biochemical probes for genome-wide mapping of nucleic acid structure. This report explores the benefits and pitfalls of the application of Mung bean (Mb) and V1 nuclease, which attack specifically single- and double-stranded regions of nucleic acids, as possible structural probes to be employed in combination with MS detection. Both enzymes were found capable of operating in ammonium-based solutions that are preferred for high-resolution analysis by direct infusion electrospray ionization (ESI). Sequence analysis by tandem mass spectrometry (MS/MS) was performed to confirm mapping assignments and to resolve possible ambiguities arising from the concomitant formation of isobaric products with identical base composition and different sequences. The observed products grouped together into ladder-type series that facilitated their assignment to unique regions of the substrate, but revealed also a certain level of uncertainty in identifying the boundaries between paired and unpaired regions. Various experimental factors that are known to stabilize nucleic acid structure, such as higher ionic strength, presence of Mg(II), etc., increased the accuracy of cleavage information, but did not completely eliminate deviations from expected results. These observations suggest extreme caution in interpreting the results afforded by these types of reagents. Regardless of the analytical platform of choice, the results highlighted the need to repeat probing experiments under the most diverse possible conditions to recognize potential artifacts and to increase the level of confidence in the observed structural information.

  4. Development of a Multiplex Single Base Extension Assay for Mitochondrial DNA Haplogroup Typing

    PubMed Central

    Nelson, Tahnee M.; Just, Rebecca S.; Loreille, Odile; Schanfield, Moses S.; Podini, Daniele

    2007-01-01

    Aim To provide a screening tool to reduce time and sample consumption when attempting mtDNA haplogroup typing. Methods A single base primer extension assay was developed to enable typing, in a single reaction, of twelve mtDNA haplogroup specific polymorphisms. For validation purposes a total of 147 samples were tested including 73 samples successfully haplogroup typed using mtDNA control region (CR) sequence data, 21 samples inconclusively haplogroup typed by CR data, 20 samples previously haplogroup typed using restriction fragment length polymorphism (RFLP) analysis, and 31 samples of known ancestral origin without previous haplogroup typing. Additionally, two highly degraded human bones embalmed and buried in the early 1950s were analyzed using the single nucleotide polymorphisms (SNP) multiplex. Results When the SNP multiplex was used to type the 96 previously CR sequenced specimens, an increase in haplogroup or macrohaplogroup assignment relative to conventional CR sequence analysis was observed. The single base extension assay was also successfully used to assign a haplogroup to decades-old, embalmed skeletal remains dating to World War II. Conclusion The SNP multiplex was successfully used to obtain haplogroup status of highly degraded human bones, and demonstrated the ability to eliminate possible contributors. The SNP multiplex provides a low-cost, high throughput method for typing of mtDNA haplogroups A, B, C, D, E, F, G, H, L1/L2, L3, M, and N that could be useful for screening purposes for human identification efforts and anthropological studies. PMID:17696300

  5. Evaluation of next generation mtGenome sequencing using the Ion Torrent Personal Genome Machine (PGM)

    PubMed Central

    Parson, Walther; Strobl, Christina; Huber, Gabriela; Zimmermann, Bettina; Gomes, Sibylle M.; Souto, Luis; Fendt, Liane; Delport, Rhena; Langit, Reina; Wootton, Sharon; Lagacé, Robert; Irwin, Jodi

    2013-01-01

    Insights into the human mitochondrial phylogeny have been primarily achieved by sequencing full mitochondrial genomes (mtGenomes). In forensic genetics (partial) mtGenome information can be used to assign haplotypes to their phylogenetic backgrounds, which may, in turn, have characteristic geographic distributions that would offer useful information in a forensic case. In addition and perhaps even more relevant in the forensic context, haplogroup-specific patterns of mutations form the basis for quality control of mtDNA sequences. The current method for establishing (partial) mtDNA haplotypes is Sanger-type sequencing (STS), which is laborious, time-consuming, and expensive. With the emergence of Next Generation Sequencing (NGS) technologies, the body of available mtDNA data can potentially be extended much more quickly and cost-efficiently. Customized chemistries, laboratory workflows and data analysis packages could support the community and increase the utility of mtDNA analysis in forensics. We have evaluated the performance of mtGenome sequencing using the Personal Genome Machine (PGM) and compared the resulting haplotypes directly with conventional Sanger-type sequencing. A total of 64 mtGenomes (>1 million bases) were established that yielded high concordance with the corresponding STS haplotypes (<0.02% differences). About two-thirds of the differences were observed in or around homopolymeric sequence stretches. In addition, the sequence alignment algorithm employed to align NGS reads played a significant role in the analysis of the data and the resulting mtDNA haplotypes. Further development of alignment software would be desirable to facilitate the application of NGS in mtDNA forensic genetics. PMID:23948325

  6. Cloning, analysis and functional annotation of expressed sequence tags from the Earthworm Eisenia fetida

    PubMed Central

    Pirooznia, Mehdi; Gong, Ping; Guan, Xin; Inouye, Laura S; Yang, Kuan; Perkins, Edward J; Deng, Youping

    2007-01-01

    Background Eisenia fetida, commonly known as red wiggler or compost worm, belongs to the Lumbricidae family of the Annelida phylum. Little is known about its genome sequence although it has been extensively used as a test organism in terrestrial ecotoxicology. In order to understand its gene expression response to environmental contaminants, we cloned 4032 cDNAs or expressed sequence tags (ESTs) from two E. fetida libraries enriched with genes responsive to ten ordnance related compounds using suppressive subtractive hybridization-PCR. Results A total of 3144 good quality ESTs (GenBank dbEST accession number EH669363–EH672369 and EL515444–EL515580) were obtained from the raw clone sequences after cleaning. Clustering analysis yielded 2231 unique sequences including 448 contigs (from 1361 ESTs) and 1783 singletons. Comparative genomic analysis showed that 743 or 33% of the unique sequences shared high similarity with existing genes in the GenBank nr database. Provisional function annotation assigned 830 Gene Ontology terms to 517 unique sequences based on their homology with the annotated genomes of four model organisms Drosophila melanogaster, Mus musculus, Saccharomyces cerevisiae, and Caenorhabditis elegans. Seven percent of the unique sequences were further mapped to 99 Kyoto Encyclopedia of Genes and Genomes pathways based on their matching Enzyme Commission numbers. All the information is stored and retrievable at a highly performed, web-based and user-friendly relational database called EST model database or ESTMD version 2. Conclusion The ESTMD containing the sequence and annotation information of 4032 E. fetida ESTs is publicly accessible at . PMID:18047730

  7. DNA barcode analysis: a comparison of phylogenetic and statistical classification methods.

    PubMed

    Austerlitz, Frederic; David, Olivier; Schaeffer, Brigitte; Bleakley, Kevin; Olteanu, Madalina; Leblois, Raphael; Veuille, Michel; Laredo, Catherine

    2009-11-10

    DNA barcoding aims to assign individuals to given species according to their sequence at a small locus, generally part of the CO1 mitochondrial gene. Amongst other issues, this raises the question of how to deal with within-species genetic variability and potential transpecific polymorphism. In this context, we examine several assignation methods belonging to two main categories: (i) phylogenetic methods (neighbour-joining and PhyML) that attempt to account for the genealogical framework of DNA evolution and (ii) supervised classification methods (k-nearest neighbour, CART, random forest and kernel methods). These methods range from basic to elaborate. We investigated the ability of each method to correctly classify query sequences drawn from samples of related species using both simulated and real data. Simulated data sets were generated using coalescent simulations in which we varied the genealogical history, mutation parameter, sample size and number of species. No method was found to be the best in all cases. The simplest method of all, "one nearest neighbour", was found to be the most reliable with respect to changes in the parameters of the data sets. The parameter most influencing the performance of the various methods was molecular diversity of the data. Addition of genetically independent loci--nuclear genes--improved the predictive performance of most methods. The study implies that taxonomists can influence the quality of their analyses either by choosing a method best-adapted to the configuration of their sample, or, given a certain method, increasing the sample size or altering the amount of molecular diversity. This can be achieved either by sequencing more mtDNA or by sequencing additional nuclear genes. In the latter case, they may also have to modify their data analysis method.
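
    The "one nearest neighbour" rule mentioned above can be sketched in a few lines: a query barcode receives the species label of the closest reference sequence. The abstract does not state which distance was used, so the uncorrected p-distance below, the function names, and the toy sequences are all assumptions for illustration only.

```python
def p_distance(s, t):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(s, t)) / len(s)


def nearest_neighbour_species(query, references):
    """Return the species label of the reference closest to the query.

    references: list of (species_label, aligned_sequence) pairs.
    Ties are resolved in favour of the earliest reference in the list.
    """
    species, _ = min(references, key=lambda ref: p_distance(query, ref[1]))
    return species


# Toy reference library (hypothetical labels and sequences).
refs = [("species_A", "ACGTACGT"), ("species_B", "ACGTTTTT")]
assigned = nearest_neighbour_species("ACGTACGA", refs)
```

    Here the query differs from the species_A reference at one site and from species_B at four, so it is assigned to species_A.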

  8. Delineation of the Species Haemophilus influenzae by Phenotype, Multilocus Sequence Phylogeny, and Detection of Marker Genes

    PubMed Central

    Nørskov-Lauritsen, Niels; Overballe, Merete D.; Kilian, Mogens

    2009-01-01

    To obtain more information on the much-debated definition of prokaryotic species, we investigated the borders of Haemophilus influenzae by comparative analysis of H. influenzae reference strains with closely related bacteria including strains assigned to Haemophilus haemolyticus, cryptic genospecies biotype IV, and the never formally validated species “Haemophilus intermedius”. Multilocus sequence phylogeny based on six housekeeping genes separated a cluster encompassing the type and the reference strains of H. influenzae from 31 more distantly related strains. Comparison of 16S rRNA gene sequences supported this delineation but was obscured by a conspicuously high number of polymorphic sites in many of the strains that did not belong to the core group of H. influenzae strains. The division was corroborated by the differential presence of genes encoding H. influenzae adhesion and penetration protein, fuculokinase, and Cu,Zn-superoxide dismutase, whereas immunoglobulin A1 protease activity or the presence of the iga gene was of limited discriminatory value. The existence of porphyrin-synthesizing strains (“H. intermedius”) closely related to H. influenzae was confirmed. Several chromosomally encoded hemin biosynthesis genes were identified, and sequence analysis showed these genes to represent an ancestral genotype rather than recent transfers from, e.g., Haemophilus parainfluenzae. Strains previously assigned to H. haemolyticus formed several separate lineages within a distinct but deeply branching cluster, intermingled with strains of “H. intermedius” and cryptic genospecies biotype IV. Although H. influenzae is phenotypically more homogenous than some other Haemophilus species, the genetic diversity and multicluster structure of strains traditionally associated with H. influenzae make it difficult to define the natural borders of that species. PMID:19060144

  9. Identification of differentially expressed genes through RNA sequencing in goats (Capra hircus) at different postnatal stages

    PubMed Central

    Li, Qian; Lin, Sen

    2017-01-01

    Intramuscular fat (IMF) content and fatty acid composition of longissimus dorsi muscle (LM) change with growth, which partially determines the flavor and nutritional value of goat (Capra hircus) meat. However, unlike cattle, little information is available on the transcriptome-wide changes during different postnatal stages in small ruminants, especially goats. In this study, the sequencing reads of goat LM tissues collected from kid, youth, and adult period were mapped to the goat genome. Results showed that out of total 24 689 Unigenes, 20 435 Unigenes were annotated. Based on expected number of fragments per kilobase of transcript sequence per million base pairs sequenced (FPKM), 111 annotated differentially expressed genes (DEGs) were identified among different postnatal stages, which were subsequently assigned to 16 possible expression patterns by series-cluster analysis. Functional classification by Gene Ontology (GO) analysis was used for selecting the genes showing highest expression related to lipid metabolism. Finally, we identified the node genes for lipid metabolism regulation using co-expression analysis. In conclusion, these data may uncover candidate genes having functional roles in regulation of goat muscle development and lipid metabolism during the various growth stages in goats. PMID:28800357
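
    As a reminder of what the FPKM unit expresses, the sketch below uses the commonly cited definition: fragments mapped to a transcript, scaled per kilobase of transcript length and per million mapped fragments in the library. The function name and the example numbers are assumptions, and the study's own pipeline may compute the value differently.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    fragments: fragments mapped to this transcript
    transcript_length_bp: transcript length in base pairs
    total_mapped_fragments: all mapped fragments in the library
    """
    # 1e9 = 1e3 (bp -> kb) * 1e6 (fragments -> millions of fragments)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2 kb transcript in a 10 M fragment library.
example_fpkm = fpkm(500, 2000, 10_000_000)
```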

  10. Identification of differentially expressed genes through RNA sequencing in goats (Capra hircus) at different postnatal stages.

    PubMed

    Lin, Yaqiu; Zhu, Jiangjiang; Wang, Yong; Li, Qian; Lin, Sen

    2017-01-01

    Intramuscular fat (IMF) content and fatty acid composition of longissimus dorsi muscle (LM) change with growth, which partially determines the flavor and nutritional value of goat (Capra hircus) meat. However, unlike cattle, little information is available on the transcriptome-wide changes during different postnatal stages in small ruminants, especially goats. In this study, the sequencing reads of goat LM tissues collected from kid, youth, and adult period were mapped to the goat genome. Results showed that out of total 24 689 Unigenes, 20 435 Unigenes were annotated. Based on expected number of fragments per kilobase of transcript sequence per million base pairs sequenced (FPKM), 111 annotated differentially expressed genes (DEGs) were identified among different postnatal stages, which were subsequently assigned to 16 possible expression patterns by series-cluster analysis. Functional classification by Gene Ontology (GO) analysis was used for selecting the genes showing highest expression related to lipid metabolism. Finally, we identified the node genes for lipid metabolism regulation using co-expression analysis. In conclusion, these data may uncover candidate genes having functional roles in regulation of goat muscle development and lipid metabolism during the various growth stages in goats.

  11. Phylogenetic analysis of rubella virus strains during an outbreak in São Paulo, 2007-2008.

    PubMed

    Figueiredo, C A; Oliveira, M I; Curti, S P; Afonso, A M S; Frugi Yu, A L; Gualberto, F; Durigon, E L

    2012-10-01

    Rubella virus (RV) is an important human pathogen that causes rubella, an acute contagious disease. It also causes severe birth defects collectively known as congenital rubella syndrome when infection occurs during the first trimester of pregnancy. Here, we present the phylogenetic analysis of RV that circulated in São Paulo during the 2007-2008 outbreak. Samples collected from patients diagnosed with rubella were isolated in cell culture and sequenced. RV RNA was obtained from samples or RV-infected cell cultures and amplified by reverse transcriptase-polymerase chain reaction. Sequences were assigned to genotypes by phylogenetic analysis using RV reference sequences. Seventeen sequences were analyzed, and three genotypes were identified: 1a, 1G, and 2B. Genotypes 1a and 1G, which were isolated in 2007, were responsible for sporadic rubella cases in São Paulo. Thereafter, in late 2007, the epidemiological conditions changed, resulting in a large RV outbreak with the clear dominance of genotype 2B. The results of this study provide new approaches for monitoring the progress of elimination of rubella from São Paulo, Brazil. Copyright © 2012 Wiley Periodicals, Inc.

  12. Hepatitis E Virus Genotype 3 Diversity: Phylogenetic Analysis and Presence of Subtype 3b in Wild Boar in Europe

    PubMed Central

    Vina-Rodriguez, Ariel; Schlosser, Josephine; Becher, Dietmar; Kaden, Volker; Groschup, Martin H.; Eiden, Martin

    2015-01-01

    An increasing number of indigenous cases of hepatitis E caused by genotype 3 viruses (HEV-3) have been diagnosed all around the world, particularly in industrialized countries. Hepatitis E is a zoonotic disease and accumulating evidence indicates that domestic pigs and wild boars are the main reservoirs of HEV-3. A detailed analysis of HEV-3 subtypes could help to determine the interplay of human activity, the role of animals as reservoirs and cross species transmission. Although complete genome sequences are most appropriate for HEV subtype determination, in most cases only partial genomic sequences are available. We therefore carried out a subtype classification analysis, which uses regions from all three open reading frames of the genome. Using this approach, more than 1000 published HEV-3 isolates were subtyped. Newly recovered HEV partial sequences from hunted German wild boars were also included in this study. These sequences were assigned to genotype 3 and clustered within subtype 3a, 3i and, unexpectedly, one of them within the subtype 3b, a first non-human report of this subtype in Europe. PMID:26008708

  13. Comparative NMR analysis of the decadeoxynucleotide d-(GCATTAATGC)2 and an analogue containing 2-aminoadenine.

    PubMed Central

    Chazin, W J; Rance, M; Chollet, A; Leupin, W

    1991-01-01

    The decadeoxynucleotide duplex d-(GCATTAATGC)2 has been prepared with all adenine bases replaced by 2-NH2-adenine. This modified duplex has been characterized by nuclear magnetic resonance (NMR) spectroscopy. Complete sequence-specific 1H resonance assignments have been obtained by using a variety of 2D NMR methods. Multiple quantum-filtered and multiple quantum experiments have been used to completely assign all sugar ring protons, including 5'H and 5''H resonances. The assignments form the basis for a detailed comparative analysis of the 1H NMR parameters of the modified and parent duplex. The structural features of both decamer duplexes in solution are characteristic of the B-DNA family. The spin-spin coupling constants in the sugar rings and the relative spatial proximities of protons in the bases and sugars (as determined from the comparison of corresponding nuclear Overhauser effects) are virtually identical in the parent and modified duplexes. Thus, substitution by this adenine analogue in oligonucleotides appears not to disturb the global or local conformation of the DNA duplex. PMID:1945828

  14. Orthology prediction methods: A quality assessment using curated protein families

    PubMed Central

    Trachana, Kalliopi; Larsson, Tomas A; Powell, Sean; Chen, Wei-Hua; Doerks, Tobias; Muller, Jean; Bork, Peer

    2011-01-01

    The increasing number of sequenced genomes has prompted the development of several automated orthology prediction methods. Tests to evaluate the accuracy of predictions and to explore biases caused by biological and technical factors are therefore required. We used 70 manually curated families to analyze the performance of five public methods in Metazoa. We analyzed the strengths and weaknesses of the methods and quantified the impact of biological and technical challenges. From the latter part of the analysis, genome annotation emerged as the largest single influencer, affecting up to 30% of the performance. Generally, most methods did well in assigning orthologous group but they failed to assign the exact number of genes for half of the groups. The publicly available benchmark set (http://eggnog.embl.de/orthobench/) should facilitate the improvement of current orthology assignment protocols, which is of utmost importance for many fields of biology and should be tackled by a broad scientific community. PMID:21853451

  15. DNA barcode data accurately assign higher spider taxa

    PubMed Central

    Coddington, Jonathan A.; Agnarsson, Ingi; Cheng, Ren-Chung; Čandek, Klemen; Driskell, Amy; Frick, Holger; Gregorič, Matjaž; Kostanjšek, Rok; Kropf, Christian; Kweskin, Matthew; Lokovšek, Tjaša; Pipan, Miha; Vidergar, Nina

    2016-01-01

    The use of unique DNA sequences as a method for taxonomic identification is no longer fundamentally controversial, even though debate continues on the best markers, methods, and technology to use. Although both existing databanks such as GenBank and BOLD, as well as reference taxonomies, are imperfect, in best case scenarios “barcodes” (whether single or multiple, organelle or nuclear, loci) clearly are an increasingly fast and inexpensive method of identification, especially as compared to manual identification of unknowns by increasingly rare expert taxonomists. Because most species on Earth are undescribed, a complete reference database at the species level is impractical in the near term. The question therefore arises whether unidentified species can, using DNA barcodes, be accurately assigned to more inclusive groups such as genera and families—taxonomic ranks of putatively monophyletic groups for which the global inventory is more complete and stable. We used a carefully chosen test library of CO1 sequences from 49 families, 313 genera, and 816 species of spiders to assess the accuracy of genus and family-level assignment. We used BLAST queries of each sequence against the entire library and got the top ten hits. The percent sequence identity was reported from these hits (PIdent, range 75–100%). Accurate assignment of higher taxa (PIdent above which errors totaled less than 5%) occurred for genera at PIdent values >95 and families at PIdent values ≥ 91, suggesting these as heuristic thresholds for accurate generic and familial identifications in spiders. Accuracy of identification increases with numbers of species/genus and genera/family in the library; above five genera per family and fifteen species per genus all higher taxon assignments were correct. We propose that using percent sequence identity between conventional barcode sequences may be a feasible and reasonably accurate method to identify animals to family/genus. However, the quality of the underlying database impacts accuracy of results; many outliers in our dataset could be attributed to taxonomic and/or sequencing errors in BOLD and GenBank. It seems that an accurate and complete reference library of families and genera of life could provide accurate higher level taxonomic identifications cheaply and accessibly, within years rather than decades. PMID:27547527
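
    The heuristic thresholds reported above (genus at PIdent >95, family at PIdent ≥ 91) could be applied to the percent identity of a best BLAST hit roughly as follows. This is a minimal sketch: the function and constant names are hypothetical, and the thresholds were derived for spiders only, so they should not be assumed to transfer to other groups.

```python
GENUS_PIDENT = 95.0   # genus-level assignment reported as accurate at PIdent > 95
FAMILY_PIDENT = 91.0  # family-level assignment reported as accurate at PIdent >= 91


def assignable_rank(pident):
    """Return the lowest rank the heuristic thresholds support for a hit.

    pident: percent sequence identity of the hit (0-100).
    Returns "genus", "family", or None when neither threshold is met.
    """
    if pident > GENUS_PIDENT:
        return "genus"
    if pident >= FAMILY_PIDENT:
        return "family"
    return None


# Hypothetical percent identities for four best hits.
ranks = [assignable_rank(p) for p in (97.2, 95.0, 91.0, 90.9)]
```

    A hit at exactly 95% identity falls back to family-level assignment because the genus threshold is strict.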

  16. DWARF – a data warehouse system for analyzing protein families

    PubMed Central

    Fischer, Markus; Thai, Quan K; Grieb, Melanie; Pleiss, Jürgen

    2006-01-01

    Background The emerging field of integrative bioinformatics provides the tools to organize and systematically analyze vast amounts of highly diverse biological data and thus enables a novel understanding of complex biological systems. The data warehouse DWARF applies integrative bioinformatics approaches to the analysis of large protein families. Description The data warehouse system DWARF integrates data on sequence, structure, and functional annotation for protein fold families. The underlying relational data model consists of three major sections representing entities related to the protein (biochemical function, source organism, classification to homologous families and superfamilies), the protein sequence (position-specific annotation, mutant information), and the protein structure (secondary structure information, superimposed tertiary structure). Tools for extracting, transforming and loading data from publicly available resources (ExPDB, GenBank, DSSP) are provided to populate the database. The data can be accessed by an interface for searching and browsing, and by analysis tools that operate on annotation, sequence, or structure. We applied DWARF to the family of α/β-hydrolases to host the Lipase Engineering database. Release 2.3 contains 6138 sequences and 167 experimentally determined protein structures, which are assigned to 37 superfamilies and 103 homologous families. Conclusion DWARF has been designed for constructing databases of large structurally related protein families and for evaluating their sequence-structure-function relationships by a systematic analysis of sequence, structure and functional annotation. It has been applied to predict biochemical properties from sequence, and serves as a valuable tool for protein engineering. PMID:17094801

  17. Genomic and Genetic Diversity within the Pseudomonas fluorescens Complex

    PubMed Central

    Garrido-Sanz, Daniel; Meier-Kolthoff, Jan P.; Göker, Markus; Martín, Marta; Rivilla, Rafael; Redondo-Nieto, Miguel

    2016-01-01

    The Pseudomonas fluorescens complex includes Pseudomonas strains that have been taxonomically assigned to more than fifty different species, many of which have been described as plant growth-promoting rhizobacteria (PGPR) with potential applications in biocontrol and biofertilization. So far the phylogeny of this complex has been analyzed according to phenotypic traits, 16S rDNA, MLSA and inferred by whole-genome analysis. However, since most of the type strains have not been fully sequenced and new species are frequently described, correlation between taxonomy and phylogenomic analysis is missing. In recent years, the genomes of a large number of strains have been sequenced, showing important genomic heterogeneity and providing information suitable for genomic studies that are important to understand the genomic and genetic diversity shown by strains of this complex. Based on MLSA and several whole-genome sequence-based analyses of 93 sequenced strains, we have divided the P. fluorescens complex into eight phylogenomic groups that agree with previous works based on type strains. Digital DDH (dDDH) identified 69 species and 75 subspecies within the 93 genomes. The eight groups corresponded to clustering with a threshold of 31.8% dDDH, in full agreement with our MLSA. The Average Nucleotide Identity (ANI) approach showed inconsistencies regarding the assignment to species and to the eight groups. The small core genome of 1,334 CDSs and the large pan-genome of 30,848 CDSs show the large diversity and genetic heterogeneity of the P. fluorescens complex. However, a low number of strains were enough to explain most of the CDSs diversity at core and strain-specific genomic fractions. Finally, the identification and analysis of group-specific genome and the screening for distinctive characters revealed a phylogenomic distribution of traits among the groups that provided insights into biocontrol and bioremediation applications as well as their role as PGPR. PMID:26915094

  18. AutoFACT: An Automatic Functional Annotation and Classification Tool

    PubMed Central

    Koski, Liisa B; Gray, Michael W; Lang, B Franz; Burger, Gertraud

    2005-01-01

    Background Assignment of function to new molecular sequence data is an essential step in genomics projects. The usual process involves similarity searches of a given sequence against one or more databases, an arduous process for large datasets. Results We present AutoFACT, a fully automated and customizable annotation tool that assigns biologically informative functions to a sequence. Key features of this tool are that it (1) analyzes nucleotide and protein sequence data; (2) determines the most informative functional description by combining multiple BLAST reports from several user-selected databases; (3) assigns putative metabolic pathways, functional classes, enzyme classes, GeneOntology terms and locus names; and (4) generates output in HTML, text and GFF formats for the user's convenience. We have compared AutoFACT to four well-established annotation pipelines. The error rate of functional annotation is estimated to be only between 1–2%. Comparison of AutoFACT to the traditional top-BLAST-hit annotation method shows that our procedure increases the number of functionally informative annotations by approximately 50%. Conclusion AutoFACT will serve as a useful annotation tool for smaller sequencing groups lacking dedicated bioinformatics staff. It is implemented in PERL and runs on LINUX/UNIX platforms. AutoFACT is available at . PMID:15960857
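
    Feature (2) of the abstract above, combining several BLAST reports into a single description, can be pictured with a minimal Python sketch. This is not AutoFACT's code (the tool itself is written in Perl). The function names, the database priority order, the e-value cutoff and the list of "uninformative" terms are all illustrative assumptions.

```python
# Hypothetical sketch: choose the most informative description across BLAST reports.
# The term list and the e-value cutoff below are assumptions, not AutoFACT's settings.
UNINFORMATIVE = ("hypothetical", "unknown", "unnamed", "uncharacterized")


def is_informative(description):
    """Return True if the description contains none of the uninformative terms."""
    text = description.lower()
    return not any(term in text for term in UNINFORMATIVE)


def best_description(hits_by_db, db_priority, max_evalue=1e-5):
    """Pick a description from several BLAST reports.

    hits_by_db maps a database name to a list of (description, evalue)
    tuples, best hit first. Databases are consulted in db_priority order.
    The first informative hit wins; otherwise the first significant but
    uninformative hit is returned as a fallback.
    """
    fallback = None
    for db in db_priority:
        for description, evalue in hits_by_db.get(db, []):
            if evalue > max_evalue:
                continue  # not significant enough to use
            if is_informative(description):
                return db, description
            if fallback is None:
                fallback = (db, description)
    return fallback if fallback is not None else (None, "unassigned protein")


if __name__ == "__main__":
    hits = {
        "db_a": [("unknown", 1e-60)],
        "db_b": [("hypothetical protein XYZ", 1e-50),
                 ("ATP synthase subunit alpha", 1e-40)],
    }
    print(best_description(hits, ["db_a", "db_b"]))
```

    The point of the sketch is the contrast with plain top-hit annotation: the top hit in both example databases is uninformative, so the search continues down the reports until an informative description is found.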

  19. GASP: Gapped Ancestral Sequence Prediction for proteins

    PubMed Central

    Edwards, Richard J; Shields, Denis C

    2004-01-01

    Background The prediction of ancestral protein sequences from multiple sequence alignments is useful for many bioinformatics analyses. Predicting ancestral sequences is not a simple procedure and relies on accurate alignments and phylogenies. Several algorithms exist based on Maximum Parsimony or Maximum Likelihood methods but many current implementations are unable to process residues with gaps, which may represent insertion/deletion (indel) events or sequence fragments. Results Here we present a new algorithm, GASP (Gapped Ancestral Sequence Prediction), for predicting ancestral sequences from phylogenetic trees and the corresponding multiple sequence alignments. Alignments may be of any size and contain gaps. GASP first assigns the positions of gaps in the phylogeny before using a likelihood-based approach centred on amino acid substitution matrices to assign ancestral amino acids. Important outgroup information is used by first working down from the tips of the tree to the root, using descendant data only to assign probabilities, and then working back up from the root to the tips using descendant and outgroup data to make predictions. GASP was tested on a number of simulated datasets based on real phylogenies. Prediction accuracy for ungapped data was similar to three alternative algorithms tested, with GASP performing better in some cases and worse in others. Adding simple insertions and deletions to the simulated data did not have a detrimental effect on GASP accuracy. Conclusions GASP (Gapped Ancestral Sequence Prediction) will predict ancestral sequences from multiple protein alignments of any size. Although not as accurate in all cases as some of the more sophisticated maximum likelihood approaches, it can process a wide range of input phylogenies and will predict ancestral sequences for gapped and ungapped residues alike. PMID:15350199

  20. Mango (Mangifera indica L.) cv. Kent fruit mesocarp de novo transcriptome assembly identifies gene families important for ripening

    PubMed Central

    Dautt-Castro, Mitzuko; Ochoa-Leyva, Adrian; Contreras-Vergara, Carmen A.; Pacheco-Sanchez, Magda A.; Casas-Flores, Sergio; Sanchez-Flores, Alejandro; Kuhn, David N.; Islas-Osuna, Maria A.

    2015-01-01

    Fruit ripening is a physiological and biochemical process genetically programmed to regulate fruit quality parameters like firmness, flavor, odor and color, as well as production of ethylene in climacteric fruit. In this study, a transcriptomic analysis of mango (Mangifera indica L.) mesocarp cv. “Kent” was done to identify key genes associated with fruit ripening. Using the Illumina sequencing platform, 67,682,269 clean reads and a transcriptome of 4.8 Gb were obtained. A total of 33,142 coding sequences were predicted and, after functional annotation, 25,154 protein sequences were assigned a product according to the Swiss-Prot database and 32,560 according to the non-redundant database. Differential expression analysis identified 2,306 genes with significant differences in expression between mature-green and ripe mango [1,178 up-regulated and 1,128 down-regulated (FDR ≤ 0.05)]. The expression of 10 genes evaluated by both qRT-PCR and RNA-seq data was highly correlated (R = 0.97), validating the differential expression data from RNA-seq alone. Gene Ontology enrichment analysis showed significantly represented terms associated with fruit ripening, such as “cell wall,” “carbohydrate catabolic process” and “starch and sucrose metabolic process” among others. Mango genes were assigned to 327 metabolic pathways according to the Kyoto Encyclopedia of Genes and Genomes database, among them those involved in fruit ripening such as plant hormone signal transduction, starch and sucrose metabolism, galactose metabolism, terpenoid backbone, and carotenoid biosynthesis. This study provides a mango transcriptome that will be very helpful to identify genes for expression studies in early and late flowering mangos during fruit ripening. PMID:25741352

  1. Mango (Mangifera indica L.) cv. Kent fruit mesocarp de novo transcriptome assembly identifies gene families important for ripening.

    PubMed

    Dautt-Castro, Mitzuko; Ochoa-Leyva, Adrian; Contreras-Vergara, Carmen A; Pacheco-Sanchez, Magda A; Casas-Flores, Sergio; Sanchez-Flores, Alejandro; Kuhn, David N; Islas-Osuna, Maria A

    2015-01-01

    Fruit ripening is a physiological and biochemical process genetically programmed to regulate fruit quality parameters like firmness, flavor, odor and color, as well as production of ethylene in climacteric fruit. In this study, a transcriptomic analysis of mango (Mangifera indica L.) mesocarp cv. "Kent" was done to identify key genes associated with fruit ripening. Using the Illumina sequencing platform, 67,682,269 clean reads and a transcriptome of 4.8 Gb were obtained. A total of 33,142 coding sequences were predicted and, after functional annotation, 25,154 protein sequences were assigned a product according to the Swiss-Prot database and 32,560 according to the non-redundant database. Differential expression analysis identified 2,306 genes with significant differences in expression between mature-green and ripe mango [1,178 up-regulated and 1,128 down-regulated (FDR ≤ 0.05)]. The expression of 10 genes evaluated by both qRT-PCR and RNA-seq data was highly correlated (R = 0.97), validating the differential expression data from RNA-seq alone. Gene Ontology enrichment analysis showed significantly represented terms associated with fruit ripening, such as "cell wall," "carbohydrate catabolic process" and "starch and sucrose metabolic process" among others. Mango genes were assigned to 327 metabolic pathways according to the Kyoto Encyclopedia of Genes and Genomes database, among them those involved in fruit ripening such as plant hormone signal transduction, starch and sucrose metabolism, galactose metabolism, terpenoid backbone, and carotenoid biosynthesis. This study provides a mango transcriptome that will be very helpful to identify genes for expression studies in early and late flowering mangos during fruit ripening.

  2. Dissemination and genetic diversity of chlamydial agents in Polish wildfowl: Isolation and molecular characterisation of avian Chlamydia abortus strains.

    PubMed

    Szymańska-Czerwińska, Monika; Mitura, Agata; Niemczuk, Krzysztof; Zaręba, Kinga; Jodełko, Agnieszka; Pluta, Aneta; Scharf, Sabine; Vitek, Bailey; Aaziz, Rachid; Vorimore, Fabien; Laroucau, Karine; Schnee, Christiane

    2017-01-01

    Wild birds are considered a reservoir for avian chlamydiosis, posing a potential infectious threat to domestic poultry and humans. Analysis of 894 cloacal or fecal swabs from free-living birds in Poland revealed an overall Chlamydiaceae prevalence of 14.8% (n = 132) with the highest prevalence noted in Anatidae (19.7%) and Corvidae (13.4%). Further testing conducted with species-specific real-time PCR showed that 65 samples (49.2%) were positive for C. psittaci whereas only one was positive for C. avium. To classify the non-identified chlamydial agents and to genotype the C. psittaci and C. avium-positive samples, specimens were subjected to ompA-PCR and sequencing (n = 83). The ompA-based NJ dendrogram revealed that only 23 out of 83 sequences were assigned to C. psittaci, in particular to four clades representing the previously described C. psittaci genotypes B, C, Mat116 and 1V, whereas the 59 remaining sequences were assigned to two new clades named G1 and G2, each one including sequences recently obtained from chlamydiae detected in Swedish wetland birds. G1 (18 samples from Anatidae and Rallidae) grouped closely together with genotype 1V and in relative proximity to several C. abortus isolates, and G2 (41 samples from Anatidae and Corvidae) grouped closely to C. psittaci strains of the classical ABE cluster, Mat116 and M56. Finally, deep molecular analysis of four representative isolates of genotypes 1V, G1 and G2 based on 16S rRNA, IGS and partial 23S rRNA sequences as well as MLST clearly classified these isolates within the C. abortus species. Consequently, we propose an expansion of the C. abortus species to include not only the classical isolates of mammalian origin, but also avian isolates so far referred to as atypical C. psittaci or C. psittaci/C. abortus intermediates.

  3. Dissemination and genetic diversity of chlamydial agents in Polish wildfowl: Isolation and molecular characterisation of avian Chlamydia abortus strains

    PubMed Central

    Szymańska-Czerwińska, Monika; Mitura, Agata; Niemczuk, Krzysztof; Zaręba, Kinga; Jodełko, Agnieszka; Pluta, Aneta; Scharf, Sabine; Vitek, Bailey; Aaziz, Rachid; Vorimore, Fabien; Laroucau, Karine; Schnee, Christiane

    2017-01-01

    Wild birds are considered a reservoir for avian chlamydiosis, posing a potential infectious threat to domestic poultry and humans. Analysis of 894 cloacal or fecal swabs from free-living birds in Poland revealed an overall Chlamydiaceae prevalence of 14.8% (n = 132) with the highest prevalence noted in Anatidae (19.7%) and Corvidae (13.4%). Further testing conducted with species-specific real-time PCR showed that 65 samples (49.2%) were positive for C. psittaci whereas only one was positive for C. avium. To classify the non-identified chlamydial agents and to genotype the C. psittaci and C. avium-positive samples, specimens were subjected to ompA-PCR and sequencing (n = 83). The ompA-based NJ dendrogram revealed that only 23 out of 83 sequences were assigned to C. psittaci, in particular to four clades representing the previously described C. psittaci genotypes B, C, Mat116 and 1V, whereas the 59 remaining sequences were assigned to two new clades named G1 and G2, each one including sequences recently obtained from chlamydiae detected in Swedish wetland birds. G1 (18 samples from Anatidae and Rallidae) grouped closely together with genotype 1V and in relative proximity to several C. abortus isolates, and G2 (41 samples from Anatidae and Corvidae) grouped closely to C. psittaci strains of the classical ABE cluster, Mat116 and M56. Finally, deep molecular analysis of four representative isolates of genotypes 1V, G1 and G2 based on 16S rRNA, IGS and partial 23S rRNA sequences as well as MLST clearly classified these isolates within the C. abortus species. Consequently, we propose an expansion of the C. abortus species to include not only the classical isolates of mammalian origin, but also avian isolates so far referred to as atypical C. psittaci or C. psittaci/C. abortus intermediates. PMID:28350846

  4. Taxonomic and functional assignment of cloned sequences from high Andean forest soil metagenome.

    PubMed

    Montaña, José Salvador; Jiménez, Diego Javier; Hernández, Mónica; Angel, Tatiana; Baena, Sandra

    2012-02-01

    Total metagenomic DNA was isolated from high Andean forest soil and subjected to taxonomical and functional composition analyses by means of clone library generation and sequencing. The obtained yield of 1.7 μg of DNA/g of soil was used to construct a metagenomic library of approximately 20,000 clones (in the plasmid p-Bluescript II SK+) with an average insert size of 4 Kb, covering 80 Mb of the total metagenomic DNA. Metagenomic sequences near the plasmid cloning site were sequenced and then trimmed and assembled, obtaining 299 reads and 31 contigs (0.3 Mb). Taxonomic assignment of total sequences was performed by BLASTX, resulting in 68.8, 44.8 and 24.5% classification into taxonomic groups using the metagenomic RAST server v2.0, WebCARMA v1.0 online system and MetaGenome Analyzer v3.8 software, respectively. Most clone sequences were classified as Bacteria belonging to the phyla Actinobacteria, Proteobacteria and Acidobacteria. Among the most represented orders were Actinomycetales (34% average), Rhizobiales, Burkholderiales and Myxococcales, and the genera with the most sequences were Mycobacterium (7% average), Frankia, Streptomyces and Bradyrhizobium. The vast majority of sequences were associated with the metabolism of carbohydrates, proteins, lipids and catalytic functions, such as phosphatases, glycosyltransferases, dehydrogenases, methyltransferases, dehydratases and epoxide hydrolases. In this study we compared different methods of taxonomic and functional assignment of metagenomic clone sequences to evaluate microbial diversity in an unexplored soil ecosystem, searching for putative enzymes of biotechnological interest and generating important information for further functional screening of clone libraries.

  5. Nucleotide Sequence Diversity and Linkage Disequilibrium of Four Nuclear Loci in Foxtail Millet (Setaria italica).

    PubMed

    He, Shui-Lian; Yang, Yang; Morrell, Peter L; Yi, Ting-Shuang

    2015-01-01

    Foxtail millet (Setaria italica (L.) Beauv) is one of the earliest domesticated grains, which has been cultivated in northern China by 8,700 years before present (YBP) and across Eurasia by 4,000 YBP. Owing to a small genome and diploid nature, foxtail millet is a tractable model crop for studying functional genomics of millets and bioenergy grasses. In this study, we examined nucleotide sequence diversity, geographic structure, and levels of linkage disequilibrium at four nuclear loci (ADH1, G3PDH, IGS1 and TPI1) in representative samples of 311 landrace accessions across its cultivated range. Higher levels of nucleotide sequence and haplotype diversity were observed in samples from China relative to other sampled regions. Genetic assignment analysis classified the accessions into seven clusters based on nucleotide sequence polymorphisms. Intralocus LD decayed rapidly to half the initial value within ~1.2 kb or less.

  6. The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4)

    DOE PAGES

    Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos; ...

    2015-10-26

    The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. In conclusion, structural annotation is followed by assignment of protein product names and functions.

  7. The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos

    The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. In conclusion, structural annotation is followed by assignment of protein product names and functions.

  8. A filtering method to generate high quality short reads using illumina paired-end technology.

    PubMed

    Eren, A Murat; Vineis, Joseph H; Morrison, Hilary G; Sogin, Mitchell L

    2013-01-01

    Consensus between independent reads improves the accuracy of genome and transcriptome analyses; however, lack of consensus between very similar sequences in metagenomic studies can and often does represent natural variation of biological significance. Machine-assigned quality scores, commonly used on next-generation platforms, do not necessarily correlate with accuracy. Here, we describe using the overlap of paired-end, short sequence reads to identify error-prone reads in marker gene analyses and their contribution to spurious OTUs following clustering analysis using QIIME. Our approach can also reduce error in shotgun sequencing data generated from libraries with small, tightly constrained insert sizes. The open-source implementation of this algorithm in the Python programming language, with user instructions, can be obtained from https://github.com/meren/illumina-utils.
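
    The paired-end overlap idea in the abstract above can be illustrated with a short Python sketch. It is not the illumina-utils implementation: the function names, the assumption that the overlap length is already known, and the zero-mismatch default are illustrative choices only.

```python
# Hypothetical sketch: flag error-prone read pairs by comparing the region
# where read 1 and the reverse complement of read 2 cover the same bases.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_mismatches(read1, read2, overlap_len):
    """Count disagreements inside the overlapping region of a read pair.

    Assumes the last overlap_len bases of read1 and the first overlap_len
    bases of the reverse-complemented read2 cover the same template positions.
    """
    if not 0 < overlap_len <= min(len(read1), len(read2)):
        raise ValueError("overlap_len must be between 1 and the shorter read length")
    tail = read1[-overlap_len:]
    head = reverse_complement(read2)[:overlap_len]
    return sum(1 for a, b in zip(tail, head) if a != b)


def keep_pair(read1, read2, overlap_len, max_mismatches=0):
    """Keep a pair only if its two reads agree closely enough in the overlap."""
    return overlap_mismatches(read1, read2, overlap_len) <= max_mismatches


if __name__ == "__main__":
    fragment = "ACGTACGGTTCA"
    read1 = fragment[:8]                       # forward read
    read2 = reverse_complement(fragment[4:])   # reverse read, 4-base overlap
    print(keep_pair(read1, read2, overlap_len=4))        # consistent pair
    print(keep_pair("ACGTACTG", read2, overlap_len=4))   # one base disagrees
```

    Because the two reads are independent observations of the same bases, a disagreement inside the overlap marks the pair as suspect without relying on the machine-assigned quality scores.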

  9. [EST-SSR identification, markers development of Ligusticum chuanxiong based on Ligusticum chuanxiong transcriptome sequences].

    PubMed

    Yuan, Can; Peng, Fang; Yang, Ze-Mao; Zhong, Wen-Juan; Mou, Fang-Sheng; Gong, Yi-Yun; Ji, Pei-Cheng; Pu, De-Qiang; Huang, Hai-Yan; Yang, Xiao; Zhang, Chao

    2017-09-01

    Ligusticum chuanxiong is a well-known traditional Chinese medicine plant. Developing molecular markers for it and studying its germplasm resources are very important. In this study, we obtained 24,422 unigenes by assembling transcriptome sequencing reads of L. chuanxiong root. EST-SSRs were detected and 4,073 SSR loci were identified. EST-SSR distribution and characteristic analysis showed that mono-nucleotide repeats were the main repeat type, accounting for 41.0%. In addition, the sequences containing SSRs were functionally annotated in Gene Ontology (GO) and KEGG pathway and were assigned to 49 GO categories and 242 KEGG pathways; among them, 2,201 sequences were annotated against the Nr database. Of 235 EST-SSRs validated, 74 primer pairs ultimately proved to give high-quality amplification. Subsequently, genetic diversity analysis, UPGMA cluster analysis, PCoA analysis and population structure analysis of 34 L. chuanxiong germplasm resources were carried out with the 74 primer pairs. In both the UPGMA tree and the PCoA results, L. chuanxiong resources were clustered into two groups, which are believed to be partially related to their geographical distribution. In this study, EST-SSRs in L. chuanxiong were identified for the first time, and the newly developed molecular markers should contribute significantly to further genetic diversity studies, purity detection, gene mapping, and molecular breeding.
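
    As a rough illustration of how SSR loci such as those counted above can be located in assembled sequences, the following Python sketch scans a sequence for perfect tandem repeats with regular expressions. The abstract does not name the detection tool or its thresholds, so the minimum repeat counts and the function name here are assumptions chosen only for the example.

```python
import re

# Assumed minimum number of repeats for each motif length (1 to 6 bases).
# These thresholds are illustrative, not the ones used in the study.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(sequence):
    """Return (motif, repeat_count, start) for each perfect SSR found."""
    sequence = sequence.upper()
    found = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # A motif followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "AA" or
            # "AGAG", so each locus is reported under its shortest motif.
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            found.append((motif, len(match.group(0)) // motif_len, match.start()))
    return found


if __name__ == "__main__":
    example = "GGC" + "A" * 12 + "CCT" + "AG" * 7 + "TTC"
    for motif, count, start in find_ssrs(example):
        print(motif, count, start)
```

    Counting the results by motif length would give the kind of repeat-type breakdown reported in the abstract, although the proportions depend strongly on the thresholds chosen.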

  10. Structural Analysis of Biodiversity

    PubMed Central

    Sirovich, Lawrence; Stoeckle, Mark Y.; Zhang, Yu

    2010-01-01

    Large, recently-available genomic databases cover a wide range of life forms, suggesting opportunity for insights into genetic structure of biodiversity. In this study we refine our recently-described technique using indicator vectors to analyze and visualize nucleotide sequences. The indicator vector approach generates correlation matrices, dubbed Klee diagrams, which represent a novel way of assembling and viewing large genomic datasets. To explore its potential utility, here we apply the improved algorithm to a collection of almost 17000 DNA barcode sequences covering 12 widely-separated animal taxa, demonstrating that indicator vectors for classification gave correct assignment in all 11000 test cases. Indicator vector analysis revealed discontinuities corresponding to species- and higher-level taxonomic divisions, suggesting an efficient approach to classification of organisms from poorly-studied groups. As compared to standard distance metrics, indicator vectors preserve diagnostic character probabilities, enable automated classification of test sequences, and generate high-information density single-page displays. These results support application of indicator vectors for comparative analysis of large nucleotide data sets and raise prospect of gaining insight into broad-scale patterns in the genetic structure of biodiversity. PMID:20195371

  11. Unassigned MS/MS Spectra: Who Am I?

    PubMed

    Pathan, Mohashin; Samuel, Monisha; Keerthikumar, Shivakumar; Mathivanan, Suresh

    2017-01-01

    Recent advances in high-resolution tandem mass spectrometry (MS) have resulted in the accumulation of high-quality data. In parallel with these advances in instrumentation, bioinformatics software has been developed to analyze such datasets. In spite of these advances, data analysis in mass spectrometry remains critical for protein identification. In addition, the complexity of the generated MS/MS spectra, the unpredictable nature of peptide fragmentation, sequence annotation errors, and posttranslational modifications have impeded the protein identification process. In a typical MS data analysis, about 60% of the MS/MS spectra remain unassigned. While some of these could be attributed to the low quality of the MS/MS spectra, a proportion can be classified as high quality. Further analysis may reveal how much of the unassigned MS spectra can be attributed to search space, sequence annotation errors, mutations, and/or posttranslational modifications. In this chapter, the tools used to identify proteins and ways to assign unassigned tandem MS spectra are discussed.

  12. AMPLISAS: a web server for multilocus genotyping using next-generation amplicon sequencing data.

    PubMed

    Sebastian, Alvaro; Herdegen, Magdalena; Migalska, Magdalena; Radwan, Jacek

    2016-03-01

    Next-generation sequencing (NGS) technologies are revolutionizing the fields of biology and medicine as powerful tools for amplicon sequencing (AS). Using combinations of primers and barcodes, it is possible to sequence targeted genomic regions with deep coverage for hundreds, even thousands, of individuals in a single experiment. This is extremely valuable for the genotyping of gene families in which locus-specific primers are often difficult to design, such as the major histocompatibility complex (MHC). The utility of AS is, however, limited by the high intrinsic sequencing error rates of NGS technologies and other sources of error such as polymerase amplification or chimera formation. Correcting these errors requires extensive bioinformatic post-processing of NGS data. Amplicon Sequence Assignment (AMPLISAS) is a tool that performs analysis of AS results in a simple and efficient way, while offering customization options for advanced users. AMPLISAS is designed as a three-step pipeline consisting of (i) read demultiplexing, (ii) unique sequence clustering and (iii) erroneous sequence filtering. Allele sequences and frequencies are retrieved in Excel spreadsheet format, making them easy to interpret. AMPLISAS performance has been successfully benchmarked against previously published genotyped MHC data sets obtained with various NGS technologies.
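
    The three-step structure named in the abstract above (demultiplexing, clustering of unique sequences, filtering of likely errors) can be sketched in a few lines of Python. This is a deliberately simplified illustration, not AMPLISAS itself: it groups only identical sequences and filters on a single per-sample frequency threshold, and all function names and the threshold value are assumptions.

```python
from collections import Counter, defaultdict


def demultiplex(reads, barcodes):
    """Step (i): assign reads to samples by their leading barcode.

    barcodes maps a barcode prefix to a sample name. The barcode is trimmed
    off, and reads with no known barcode are dropped.
    """
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
    return by_sample


def cluster_unique(sequences):
    """Step (ii): collapse identical sequences and count their depth."""
    return Counter(sequences)


def filter_low_frequency(counts, min_fraction):
    """Step (iii): keep variants at or above a per-sample frequency cutoff."""
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items() if n / total >= min_fraction}


def genotype(reads, barcodes, min_fraction=0.2):
    """Run the three steps and return {sample: {sequence: frequency}}."""
    return {
        sample: filter_low_frequency(cluster_unique(seqs), min_fraction)
        for sample, seqs in demultiplex(reads, barcodes).items()
    }


if __name__ == "__main__":
    barcodes = {"AAC": "sample1", "GGT": "sample2"}
    reads = ["AACACGT"] * 6 + ["AACACGA"] * 3 + ["AACTCGT"] + ["GGTTTTT"] * 2
    print(genotype(reads, barcodes))
```

    In the example, the variant seen once in ten reads falls below the cutoff and is treated as a probable artefact, while the two better-supported variants are kept as candidate alleles with their frequencies.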

  13. Phylogeography and systematics of the westernmost Italian Dolichopoda species (Orthoptera, Rhaphidophoridae)

    PubMed Central

    Allegrucci, Giuliana; Rampini, Mauro; Di Russo, Claudio; Lana, Enrico; Cocchi, Sara; Sbordoni, Valerio

    2014-01-01

    The genus Dolichopoda (Orthoptera; Rhaphidophoridae) is present in Italy with 9 species distributed from northwestern Italy (Piedmont and Liguria) to the southernmost Apennines (Calabria), occurring also in the Tyrrhenian coastal areas and in Sardinia. Three morphologically very close taxa have been described in Piedmont and Liguria, i.e., D. ligustica ligustica, D. ligustica septentrionalis and D. azami azami. To investigate the delimitation of the northwestern species of Dolichopoda, we performed both morphological and molecular analyses. Morphological analysis was carried out by considering diagnostic characters generally used to distinguish different taxa, such as the shape of the epiphallus in males and the subgenital plate in females. Molecular analysis was performed by sequencing three mitochondrial genes: 12S rRNA and 16S rRNA (both partially sequenced) and the entire COI gene. Results from both morphological and molecular analyses highlighted a very homogeneous group of populations, although genetically structured. Three geographically distributed haplogroups could be distinguished, and based on these results we suggest a new taxonomic arrangement. All populations, due to the priority of description, should be assigned to D. azami azami Saulcy, 1893 and, to preserve the names ligustica and septentrionalis, corresponding to different genetic haplogroups, we assign them to D. azami ligustica stat. n. Baccetti & Capra, 1959 and to D. azami septentrionalis stat. n. Baccetti & Capra, 1959. PMID:25197209

  14. [Phylogenetic relationships of the species of Oxytropis DC. subg. Oxytropis and Phacoxytropis (Fabaceae) from Asian Russia inferred from the nucleotide sequence analysis of the intergenic spacers of the chloroplast genome].

    PubMed

    Kholina, A B; Kozyrenko, M M; Artyukova, E V; Sandanov, D V; Andrianova, E A

    2016-08-01

    The nucleotide sequence analysis of trnH–psbA, trnL–trnF, and trnS–trnG intergenic spacer regions of chloroplast DNA performed in the representatives of the genus Oxytropis from Asian Russia provided clarification of the phylogenetic relationships of some species and sections in the subgenera Oxytropis and Phacoxytropis and in the genus Oxytropis as a whole. Only the section Mesogaea corresponds to the subgenus Phacoxytropis, while the section Janthina of the same subgenus groups together with the sections of the subgenus Oxytropis. The sections Chrysantha and Ortholoma of the subgenus Oxytropis are not only closely related to each other, but together with the section Mesogaea, they are grouped into the subgenus Phacoxytropis. It seems likely that the sections Chrysantha and Ortholoma should be assigned to the subgenus Phacoxytropis, and the section Janthina should be assigned to the subgenus Oxytropis. The molecular differences were identified between O. coerulea and O. mandshurica from the section Janthina that were indicative of considerable divergence of their chloroplast genomes and the species independence of the taxa. The species independence of O. czukotica belonging to the section Arctobia was also confirmed.

  15. Amino acid selective unlabeling for sequence specific resonance assignments in proteins

    PubMed Central

    Krishnarjuna, B.; Jaipuria, Garima; Thakur, Anushikha

    2010-01-01

    Sequence specific resonance assignment constitutes an important step towards high-resolution structure determination of proteins by NMR and is aided by selective identification and assignment of amino acid types. The traditional approach to selective labeling yields only the chemical shifts of the particular amino acid being selected and does not help in establishing a link between adjacent residues along the polypeptide chain, which is important for sequential assignments. An alternative approach is the method of amino acid selective ‘unlabeling’ or reverse labeling, which involves selective unlabeling of specific amino acid types against a uniformly 13C/15N labeled background. Based on this method, we present a novel approach for sequential assignments in proteins. The method involves a new NMR experiment named, {12COi–15Ni+1}-filtered HSQC, which aids in linking the 1HN/15N resonances of the selectively unlabeled residue, i, and its C-terminal neighbor, i + 1, in HN-detected double and triple resonance spectra. This leads to the assignment of a tri-peptide segment from the knowledge of the amino acid types of residues: i − 1, i and i + 1, thereby speeding up the sequential assignment process. The method has the advantage of being relatively inexpensive, applicable to 2H labeled protein and can be coupled with cell-free synthesis and/or automated assignment approaches. A detailed survey involving unlabeling of different amino acid types individually or in pairs reveals that the proposed approach is also robust to misincorporation of 14N at undesired sites. Taken together, this study represents the first application of selective unlabeling for sequence specific resonance assignments and opens up new avenues to using this methodology in protein structural studies. Electronic supplementary material The online version of this article (doi:10.1007/s10858-010-9459-z) contains supplementary material, which is available to authorized users. PMID:21153044

  16. Metagenomic analysis of Sichuan takin fecal sample viromes reveals novel enterovirus and astrovirus.

    PubMed

    Guan, Tian-Pei; Teng, Jade L L; Yeong, Kai-Yan; You, Zhang-Qiang; Liu, Hao; Wong, Samson S Y; Lau, Susanna K P; Woo, Patrick C Y

    2018-06-07

    The Sichuan takin inhabits the bamboo forests of the Eastern Himalayas and is considered a national treasure of China, with the highest legal protection and a conservation status of vulnerable according to the IUCN Red List of Threatened Species. In this study, fecal samples of 71 Sichuan takins were pooled and deep sequenced. Among the 103,553 viral sequences, 21,961 were assigned to mammalian viruses. De novo assembly revealed genomes of an enterovirus and an astrovirus and contigs of circoviruses and genogroup I picobirnaviruses. Complete genome sequencing and phylogenetic analysis showed that Sichuan takin enterovirus is a novel serotype/genotype of the species Enterovirus G, with evidence of recombination. Sichuan takin astrovirus is a new subtype of bovine astrovirus, probably belonging to a new genogroup in the genus Mamastrovirus. Further studies will reveal whether these viruses can also be found in Mishmi takin and Shaanxi takin and their pathogenic potentials.

  17. Escherichia coli K-12: a cooperatively developed annotation snapshot—2005

    PubMed Central

    Riley, Monica; Abe, Takashi; Arnaud, Martha B.; Berlyn, Mary K.B.; Blattner, Frederick R.; Chaudhuri, Roy R.; Glasner, Jeremy D.; Horiuchi, Takashi; Keseler, Ingrid M.; Kosuge, Takehide; Mori, Hirotada; Perna, Nicole T.; Plunkett, Guy; Rudd, Kenneth E.; Serres, Margrethe H.; Thomas, Gavin H.; Thomson, Nicholas R.; Wishart, David; Wanner, Barry L.

    2006-01-01

    The goal of this group project has been to coordinate and bring up-to-date information on all genes of Escherichia coli K-12. Annotation of the genome of an organism entails identification of genes, the boundaries of genes in terms of precise start and end sites, and description of the gene products. Known and predicted functions were assigned to each gene product on the basis of experimental evidence or sequence analysis. Since both kinds of evidence are constantly expanding, no annotation is complete at any moment in time. This is a snapshot analysis based on the most recent genome sequences of two E. coli K-12 bacteria. An accurate and up-to-date description of E. coli K-12 genes is of particular importance to the scientific community because experimentally determined properties of its gene products provide fundamental information for annotation of innumerable genes of other organisms. Availability of the complete genome sequence of two K-12 strains allows comparison of their genotypes and mutant status of alleles. PMID:16397293

  18. Investigation of the Evolutionary Development of the Genus Bifidobacterium by Comparative Genomics

    PubMed Central

    Lugli, Gabriele Andrea; Milani, Christian; Turroni, Francesca; Duranti, Sabrina; Ferrario, Chiara; Viappiani, Alice; Mancabelli, Leonardo; Mangifesta, Marta; Taminiau, Bernard; Delcenserie, Véronique; van Sinderen, Douwe

    2014-01-01

    The Bifidobacterium genus currently encompasses 48 recognized taxa, which have been isolated from different ecosystems. However, the current phylogeny of bifidobacteria is hampered by the relative paucity of genotypic data. Here, we reassessed the taxonomy of this bacterial genus using genome-based approaches, which demonstrated that the previous taxonomic view of bifidobacteria contained several inconsistencies. In particular, high levels of genetic relatedness were shown to exist between particular Bifidobacterium taxa which would not justify their status as separate species. The results presented here are based on average nucleotide identity analysis involving the genome sequences for each type strain of the 48 bifidobacterial taxa, as well as phylogenetic comparative analysis of the predicted core genome of the Bifidobacterium genus. The results of this study demonstrate that the availability of complete genome sequences allows the reconstruction of a more robust bifidobacterial phylogeny than that obtained from a single gene-based sequence comparison, thus discouraging the assignment of a new or separate bifidobacterial taxon without such a genome-based validation. PMID:25107967

  19. Reverse Genetics and High Throughput Sequencing Methodologies for Plant Functional Genomics

    PubMed Central

    Ben-Amar, Anis; Daldoul, Samia; Reustle, Götz M.; Krczal, Gabriele; Mliki, Ahmed

    2016-01-01

    In the post-genomic era, increasingly sophisticated genetic tools are being developed with the long-term goal of understanding how the coordinated activity of genes gives rise to a complex organism. With the advent of next generation sequencing associated with effective computational approaches, a wide variety of plant species have been fully sequenced, giving a wealth of sequence data on the structure and organization of plant genomes. Since thousands of gene sequences are already known, recently developed functional genomics approaches provide powerful tools to analyze plant gene functions through various gene manipulation technologies. Integration of different omics platforms along with gene annotation and computational analysis may elucidate a complete view at the systems biology level. Extensive investigations on reverse genetics methodologies were deployed for assigning biological function to a specific gene or gene product. We provide here an updated overview of these high-throughput strategies, highlighting recent advances in the knowledge of functional genomics in plants. PMID:28217003

  20. Ancestry estimation and control of population stratification for sequence-based association studies.

    PubMed

    Wang, Chaolong; Zhan, Xiaowei; Bragg-Gresham, Jennifer; Kang, Hyun Min; Stambolian, Dwight; Chew, Emily Y; Branham, Kari E; Heckenlively, John; Fulton, Robert; Wilson, Richard K; Mardis, Elaine R; Lin, Xihong; Swaroop, Anand; Zöllner, Sebastian; Abecasis, Gonçalo R

    2014-04-01

    Estimating individual ancestry is important in genetic association studies where population structure leads to false positive signals, although assigning ancestry remains challenging with targeted sequence data. We propose a new method for the accurate estimation of individual genetic ancestry, based on direct analysis of off-target sequence reads, and implement our method in the publicly available LASER software. We validate the method using simulated and empirical data and show that the method can accurately infer worldwide continental ancestry when used with sequencing data sets with whole-genome shotgun coverage as low as 0.001×. For estimates of fine-scale ancestry within Europe, the method performs well with coverage of 0.1×. On an even finer scale, the method improves discrimination between exome-sequenced study participants originating from different provinces within Finland. Finally, we show that our method can be used to improve case-control matching in genetic association studies and to reduce the risk of spurious findings due to population structure.

  1. GenBank.

    PubMed

    Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Wheeler, David L

    2007-01-01

    GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 240,000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage (www.ncbi.nlm.nih.gov).

  2. GenBank.

    PubMed

    Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Wheeler, David L

    2005-01-01

    GenBank is a comprehensive database that contains publicly available DNA sequences for more than 165,000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in the UK and the DNA Data Bank of Japan helps to ensure worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, go to the NCBI Homepage at http://www.ncbi.nlm.nih.gov.

  3. GenBank.

    PubMed

    Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Wheeler, David L

    2006-01-01

    GenBank® is a comprehensive database that contains publicly available DNA sequences for more than 205,000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the Web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, go to the NCBI Homepage at www.ncbi.nlm.nih.gov.

  4. Taxonomic Characterization of Honey Bee (Apis mellifera) Pollen Foraging Based on Non-Overlapping Paired-End Sequencing of Nuclear Ribosomal Loci.

    PubMed

    Cornman, R Scott; Otto, Clint R V; Iwanowicz, Deborah; Pettis, Jeffery S

    2015-01-01

    Identifying plant taxa that honey bees (Apis mellifera) forage upon is of great apicultural interest, but traditional methods are labor intensive and may lack resolution. Here we evaluate a high-throughput genetic barcoding approach to characterize trap-collected pollen from multiple North Dakota apiaries across multiple years. We used the Illumina MiSeq platform to generate sequence scaffolds from non-overlapping 300-bp paired-end sequencing reads of the ribosomal internal transcribed spacers (ITS). Full-length sequence scaffolds represented ~530 bp of ITS sequence after adapter trimming, drawn from the 5' of ITS1 and the 3' of ITS2, while skipping the uninformative 5.8S region. Operational taxonomic units (OTUs) were picked from scaffolds clustered at 97% identity, searched by BLAST against the nt database, and given taxonomic assignments using the paired-read lowest common ancestor approach. Taxonomic assignments and quantitative patterns were consistent with known plant distributions, phenology, and observational reports of pollen foraging, but revealed an unexpected contribution from non-crop graminoids and wetland plants. The mean number of plant species assignments per sample was 23.0 (+/- 5.5) and the mean species diversity (effective number of equally abundant species) was 3.3 (+/- 1.2). Bray-Curtis similarities showed good agreement among samples from the same apiary and sampling date. Rarefaction plots indicated that fewer than 50,000 reads are typically needed to characterize pollen samples of this complexity. Our results show that a pre-compiled, curated reference database is not essential for genus-level assignments, but species-level assignments are hindered by database gaps, reference length variation, and probable errors in the taxonomic assignment, requiring post-hoc evaluation. 
Although the effective per-sample yield achieved using custom MiSeq amplicon primers was less than the machine maximum, primarily due to lower "read2" quality, further protocol optimization and/or a modest reduction in multiplex scale should offset this difficulty. As small quantities of pollen are sufficient for amplification, our approach might be extendable to other questions or species for which large pollen samples are not available.
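
    The two summary statistics quoted in this abstract can be illustrated with a short Python sketch. It assumes that the effective number of species is the exponential of the Shannon index (Hill number of order 1) and that Bray-Curtis similarity is one minus the Bray-Curtis dissimilarity on raw read counts; the abstract states neither choice, and the function names and toy counts below are hypothetical.

```python
import math


def effective_species(counts):
    """Effective number of equally abundant species, assumed here to be
    exp(Shannon entropy) of the per-taxon read counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    shannon = -sum(p * math.log(p) for p in props)
    return math.exp(shannon)


def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity between two samples whose counts are
    listed in the same taxon order: 2 * shared / (total_a + total_b)."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 2 * shared / (sum(a) + sum(b))


# Hypothetical read counts per plant taxon for two pollen samples.
sample_a = [500, 300, 150, 50]
sample_b = [450, 100, 400, 50]

print(effective_species(sample_a))
print(bray_curtis_similarity(sample_a, sample_b))
```

    A sample dominated by a few taxa can have many species assignments yet a low effective number, which is consistent with the gap between the two means reported above.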

  5. Taxonomic Characterization of Honey Bee (Apis mellifera) Pollen Foraging Based on Non-Overlapping Paired-End Sequencing of Nuclear Ribosomal Loci

    PubMed Central

    Cornman, R. Scott; Otto, Clint R. V.; Iwanowicz, Deborah; Pettis, Jeffery S.

    2015-01-01

    Identifying plant taxa that honey bees (Apis mellifera) forage upon is of great apicultural interest, but traditional methods are labor intensive and may lack resolution. Here we evaluate a high-throughput genetic barcoding approach to characterize trap-collected pollen from multiple North Dakota apiaries across multiple years. We used the Illumina MiSeq platform to generate sequence scaffolds from non-overlapping 300-bp paired-end sequencing reads of the ribosomal internal transcribed spacers (ITS). Full-length sequence scaffolds represented ~530 bp of ITS sequence after adapter trimming, drawn from the 5’ of ITS1 and the 3’ of ITS2, while skipping the uninformative 5.8S region. Operational taxonomic units (OTUs) were picked from scaffolds clustered at 97% identity, searched by BLAST against the nt database, and given taxonomic assignments using the paired-read lowest common ancestor approach. Taxonomic assignments and quantitative patterns were consistent with known plant distributions, phenology, and observational reports of pollen foraging, but revealed an unexpected contribution from non-crop graminoids and wetland plants. The mean number of plant species assignments per sample was 23.0 (+/- 5.5) and the mean species diversity (effective number of equally abundant species) was 3.3 (+/- 1.2). Bray-Curtis similarities showed good agreement among samples from the same apiary and sampling date. Rarefaction plots indicated that fewer than 50,000 reads are typically needed to characterize pollen samples of this complexity. Our results show that a pre-compiled, curated reference database is not essential for genus-level assignments, but species-level assignments are hindered by database gaps, reference length variation, and probable errors in the taxonomic assignment, requiring post-hoc evaluation. 
Although the effective per-sample yield achieved using custom MiSeq amplicon primers was less than the machine maximum, primarily due to lower “read2” quality, further protocol optimization and/or a modest reduction in multiplex scale should offset this difficulty. As small quantities of pollen are sufficient for amplification, our approach might be extendable to other questions or species for which large pollen samples are not available. PMID:26700168

  6. Taxonomic characterization of honey bee (Apis mellifera) pollen foraging based on non-overlapping paired-end sequencing of nuclear ribosomal loci

    USGS Publications Warehouse

    Cornman, Robert S.; Otto, Clint R.; Iwanowicz, Deborah; Pettis, Jeffery S

    2015-01-01

    Identifying plant taxa that honey bees (Apis mellifera) forage upon is of great apicultural interest, but traditional methods are labor intensive and may lack resolution. Here we evaluate a high-throughput genetic barcoding approach to characterize trap-collected pollen from multiple North Dakota apiaries across multiple years. We used the Illumina MiSeq platform to generate sequence scaffolds from non-overlapping 300-bp paired-end sequencing reads of the ribosomal internal transcribed spacers (ITS). Full-length sequence scaffolds represented ~530 bp of ITS sequence after adapter trimming, drawn from the 5’ of ITS1 and the 3’ of ITS2, while skipping the uninformative 5.8S region. Operational taxonomic units (OTUs) were picked from scaffolds clustered at 97% identity, searched by BLAST against the nt database, and given taxonomic assignments using the paired-read lowest common ancestor approach. Taxonomic assignments and quantitative patterns were consistent with known plant distributions, phenology, and observational reports of pollen foraging, but revealed an unexpected contribution from non-crop graminoids and wetland plants. The mean number of plant species assignments per sample was 23.0 (+/- 5.5) and the mean species diversity (effective number of equally abundant species) was 3.3 (+/- 1.2). Bray-Curtis similarities showed good agreement among samples from the same apiary and sampling date. Rarefaction plots indicated that fewer than 50,000 reads are typically needed to characterize pollen samples of this complexity. Our results show that a pre-compiled, curated reference database is not essential for genus-level assignments, but species-level assignments are hindered by database gaps, reference length variation, and probable errors in the taxonomic assignment, requiring post-hoc evaluation. 
Although the effective per-sample yield achieved using custom MiSeq amplicon primers was less than the machine maximum, primarily due to lower “read2” quality, further protocol optimization and/or a modest reduction in multiplex scale should offset this difficulty. As small quantities of pollen are sufficient for amplification, our approach might be extendable to other questions or species for which large pollen samples are not available.

  7. Supervised DNA Barcodes species classification: analysis, comparisons and results

    PubMed Central

    2014-01-01

    Background Specific fragments, coming from short portions of DNA (e.g., mitochondrial, nuclear, and plastid sequences), have been defined as DNA Barcode and can be used as markers for organisms of the main life kingdoms. Species classification with DNA Barcode sequences has been proven effective on different organisms. Indeed, specific gene regions have been identified as Barcode: COI in animals, rbcL and matK in plants, and ITS in fungi. The classification problem assigns an unknown specimen to a known species by analyzing its Barcode. This task has to be supported with reliable methods and algorithms. Methods In this work the efficacy of supervised machine learning methods to classify species with DNA Barcode sequences is shown. The Weka software suite, which includes a collection of supervised classification methods, is adopted to address the task of DNA Barcode analysis. Classifier families are tested on synthetic and empirical datasets belonging to the animal, fungus, and plant kingdoms. In particular, the function-based method Support Vector Machines (SVM), the rule-based RIPPER, the decision tree C4.5, and the Naïve Bayes method are considered. Additionally, the classification results are compared with respect to ad-hoc and well-established DNA Barcode classification methods. Results Software that converts the DNA Barcode FASTA sequences to the Weka format is released, to adapt different input formats and to allow the execution of the classification procedure. The analysis of results on synthetic and real datasets shows that SVM and Naïve Bayes outperform on average the other considered classifiers, although they do not provide a human interpretable classification model. Rule-based methods have slightly inferior classification performances, but deliver the species specific positions and nucleotide assignments. 
On synthetic data the supervised machine learning methods obtain superior classification performances with respect to the traditional DNA Barcode classification methods. On empirical data their classification performances are at a comparable level to the other methods. Conclusions The classification analysis shows that supervised machine learning methods are promising candidates for successfully handling the DNA Barcoding species classification problem, obtaining excellent performances. To conclude, a powerful tool to perform species identification is now available to the DNA Barcoding community. PMID:24721333

  8. Sequence-based analysis of pQBR103; a representative of a unique, transfer-proficient mega plasmid resident in the microbial community of sugar beet

    PubMed Central

    Tett, Adrian; Spiers, Andrew J; Crossman, Lisa C; Ager, Duane; Ciric, Lena; Dow, J Maxwell; Fry, John C; Harris, David; Lilley, Andrew; Oliver, Anna; Parkhill, Julian; Quail, Michael A; Rainey, Paul B; Saunders, Nigel J; Seeger, Kathy; Snyder, Lori AS; Squares, Rob; Thomas, Christopher M; Turner, Sarah L; Zhang, Xue-Xian; Field, Dawn; Bailey, Mark J

    2009-01-01

    The plasmid pQBR103 was found within Pseudomonas populations colonizing the leaf and root surfaces of sugar beet plants growing at Wytham, Oxfordshire, UK. At 425 kb it is the largest self-transmissible plasmid yet sequenced from the phytosphere. It is known to enhance the competitive fitness of its host, and parts of the plasmid are known to be actively transcribed in the plant environment. Analysis of the complete sequence of this plasmid predicts a coding sequence (CDS)-rich genome containing 478 CDSs and an exceptional degree of genetic novelty; 80% of predicted coding sequences cannot be ascribed a function and 60% are orphans. Of those to which function could be assigned, 40% bore greatest similarity to sequences from Pseudomonas spp., and the majority of the remainder showed similarity to other γ-proteobacterial genera and plasmids. pQBR103 has identifiable regions presumed responsible for replication and partitioning, but despite being tra+ lacks the full complement of any previously described conjugal transfer functions. The DNA sequence provided few insights into the functional significance of plant-induced transcriptional regions, but suggests that 14% of CDSs may be expressed (11 CDSs with functional annotation and 54 without), further highlighting the ecological importance of these novel CDSs. Comparative analysis indicates that pQBR103 shares significant regions of sequence with other plasmids isolated from sugar beet plants grown at the same geographic location. These plasmid sequences indicate there is more novelty in the mobile DNA pool accessible to phytosphere pseudomonas than is currently appreciated or understood. PMID:18043644

  9. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

    PubMed

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-06-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNAs and miRNAs, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43,266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp.
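
    For readers unfamiliar with the N50 statistic reported in this abstract, the following Python sketch shows one common way to compute it: sort the sequence lengths from longest to shortest and return the length at which the running sum first reaches half of the total. The function name and the toy lengths are hypothetical and are not taken from the carnation assembly.

```python
def n50(lengths):
    """Return the N50 of a set of contig or scaffold lengths: the length L
    such that sequences of length >= L contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy assembly: total 290 bp, so half is 145 bp.
toy_lengths = [80, 70, 50, 40, 30, 20]
print(n50(toy_lengths))  # 80 + 70 = 150 >= 145, so this prints 70
```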

  10. A teaching-learning sequence about weather map reading

    NASA Astrophysics Data System (ADS)

    Mandrikas, Achilleas; Stavrou, Dimitrios; Skordoulis, Constantine

    2017-07-01

    In this paper a teaching-learning sequence (TLS) introducing pre-service elementary teachers (PET) to weather map reading, with emphasis on wind assignment, is presented. The TLS includes activities about recognition of wind symbols, assignment of wind direction and wind speed on a weather map and identification of wind characteristics in a weather forecast. Sixty PET capabilities and difficulties in understanding weather maps were investigated, using inquiry-based learning activities. The results show that most PET became more capable of reading weather maps and assigning wind direction and speed on them. Our results also show that PET could be guided to understand meteorology concepts useful in everyday life and in teaching their future students.

  11. Metagenomic Analysis of Subtidal Sediments from Polar and Subpolar Coastal Environments Highlights the Relevance of Anaerobic Hydrocarbon Degradation Processes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Espínola, Fernando; Dionisi, Hebe M.; Borglin, Sharon

    In this work, we analyzed the community structure and metabolic potential of sediment microbial communities in high-latitude coastal environments subjected to low to moderate levels of chronic pollution. Subtidal sediments from four low-energy inlets located in polar and subpolar regions from both Hemispheres were analyzed using large-scale 16S rRNA gene and metagenomic sequencing. Communities showed high diversity (Shannon’s index 6.8 to 10.2), with distinct phylogenetic structures (<40% shared taxa at the Phylum level among regions) but similar metabolic potential in terms of sequences assigned to KOs. Environmental factors (mainly salinity, temperature, and to a lesser extent organic pollution) were drivers of both phylogenetic and functional traits. Bacterial taxa correlating with hydrocarbon pollution included families of anaerobic or facultative anaerobic lifestyle, such as Desulfuromonadaceae, Geobacteraceae, and Rhodocyclaceae. In accordance, biomarker genes for anaerobic hydrocarbon degradation (bamA, ebdA, bcrA, and bssA) were prevalent, only outnumbered by alkB, and their sequences were taxonomically binned to the same bacterial groups. BssA-assigned metagenomic sequences showed an extremely wide diversity distributed all along the phylogeny known for this gene, including bssA sensu stricto, nmsA, assA, and other clusters from poorly or not yet described variants. This work increases our understanding of microbial community patterns in cold coastal sediments, and highlights the relevance of anaerobic hydrocarbon degradation processes in subtidal environments.

  12. HMMER Cut-off Threshold Tool (HMMERCTTER): Supervised classification of superfamily protein sequences with a reliable cut-off threshold.

    PubMed

    Pagnuco, Inti Anabela; Revuelta, María Victoria; Bondino, Hernán Gabriel; Brun, Marcel; Ten Have, Arjen

    2018-01-01

    Protein superfamilies can be divided into subfamilies of proteins with different functional characteristics. Their sequences can be classified hierarchically, which is part of sequence function assignation. Typically, there are no clear subfamily hallmarks that would allow pattern-based function assignation by which this task is mostly achieved based on the similarity principle. This is hampered by the lack of a score cut-off that is both sensitive and specific. HMMER Cut-off Threshold Tool (HMMERCTTER) adds a reliable cut-off threshold to the popular HMMER. Using a high quality superfamily phylogeny, it clusters a set of training sequences such that the cluster-specific HMMER profiles show cluster or subfamily member detection with 100% precision and recall (P&R), thereby generating a specific threshold as inclusion cut-off. Profiles and thresholds are then used as classifiers to screen a target dataset. Iterative inclusion of novel sequences to groups and the corresponding HMMER profiles results in high sensitivity while specificity is maintained by imposing 100% P&R self detection. In three presented case studies of protein superfamilies, classification of large datasets with 100% precision was achieved with over 95% recall. Limits and caveats are presented and explained. HMMERCTTER is a promising protein superfamily sequence classifier provided high quality training datasets are used. It provides a decision support system that aids in the difficult task of sequence function assignation in the twilight zone of sequence similarity. All relevant data and source codes are available from the Github repository at the following URL: https://github.com/BBCMdP/HMMERCTTER.
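
    The idea of a cut-off that gives 100% precision and recall can be sketched in Python under a simplifying assumption: a cluster profile separates its training sequences perfectly only if every member scores higher than every non-member, in which case the lowest member score can serve as the inclusion cut-off. This is an illustration of that condition only, not the HMMERCTTER implementation; the function name and the scores are hypothetical.

```python
def inclusion_cutoff(member_scores, nonmember_scores):
    """Return a score cut-off that accepts all members and rejects all
    non-members, or None if the two score sets overlap."""
    lowest_member = min(member_scores)
    highest_nonmember = max(nonmember_scores, default=float("-inf"))
    if lowest_member > highest_nonmember:
        # Accepting scores >= lowest_member gives 100% precision and recall
        # on the training set.
        return lowest_member
    return None


# Hypothetical profile scores for one cluster of training sequences.
members = [310.2, 295.0, 402.7]
nonmembers = [120.4, 88.1]
print(inclusion_cutoff(members, nonmembers))       # prints 295.0
print(inclusion_cutoff([310.2, 100.0], [120.4]))   # prints None
```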

  13. HMMER Cut-off Threshold Tool (HMMERCTTER): Supervised classification of superfamily protein sequences with a reliable cut-off threshold

    PubMed Central

    Pagnuco, Inti Anabela; Revuelta, María Victoria; Bondino, Hernán Gabriel; Brun, Marcel

    2018-01-01

    Background Protein superfamilies can be divided into subfamilies of proteins with different functional characteristics. Their sequences can be classified hierarchically, which is part of sequence function assignation. Typically, there are no clear subfamily hallmarks that would allow pattern-based function assignation by which this task is mostly achieved based on the similarity principle. This is hampered by the lack of a score cut-off that is both sensitive and specific. Results HMMER Cut-off Threshold Tool (HMMERCTTER) adds a reliable cut-off threshold to the popular HMMER. Using a high quality superfamily phylogeny, it clusters a set of training sequences such that the cluster-specific HMMER profiles show cluster or subfamily member detection with 100% precision and recall (P&R), thereby generating a specific threshold as inclusion cut-off. Profiles and thresholds are then used as classifiers to screen a target dataset. Iterative inclusion of novel sequences to groups and the corresponding HMMER profiles results in high sensitivity while specificity is maintained by imposing 100% P&R self detection. In three presented case studies of protein superfamilies, classification of large datasets with 100% precision was achieved with over 95% recall. Limits and caveats are presented and explained. Conclusions HMMERCTTER is a promising protein superfamily sequence classifier provided high quality training datasets are used. It provides a decision support system that aids in the difficult task of sequence function assignation in the twilight zone of sequence similarity. All relevant data and source codes are available from the Github repository at the following URL: https://github.com/BBCMdP/HMMERCTTER. PMID:29579071

  14. Phylogenetic screening of a bacterial, metagenomic library using homing endonuclease restriction and marker insertion

    PubMed Central

    Yung, Pui Yi; Burke, Catherine; Lewis, Matt; Egan, Suhelen; Kjelleberg, Staffan; Thomas, Torsten

    2009-01-01

    Metagenomics provides access to the uncultured majority of the microbial world. The approaches employed in this field have, however, had limited success in linking functional genes to the taxonomic or phylogenetic origin of the organism they belong to. Here we present an efficient strategy to recover environmental DNA fragments that contain phylogenetic marker genes from metagenomic libraries. Our method involves the cleavage of 23S ribosomal RNA (rRNA) genes within pooled library clones by the homing endonuclease I-CeuI followed by the insertion and selection of an antibiotic resistance cassette. This approach was applied to screen a library of 6500 fosmid clones derived from the microbial community associated with the sponge Cymbastela concentrica. Several fosmid clones were recovered after the screen and detailed phylogenetic and taxonomic assignment based on the rRNA gene showed that they belong to previously unknown organisms. In addition, compositional features of these fosmid clones were used to classify and taxonomically assign a dataset of environmental shotgun sequences. Our approach represents a valuable tool for the analysis of rapidly increasing, environmental DNA sequencing information. PMID:19767618

  15. HoloVir: A Workflow for Investigating the Diversity and Function of Viruses in Invertebrate Holobionts

    PubMed Central

    Laffy, Patrick W.; Wood-Charlson, Elisha M.; Turaev, Dmitrij; Weynberg, Karen D.; Botté, Emmanuelle S.; van Oppen, Madeleine J. H.; Webster, Nicole S.; Rattei, Thomas

    2016-01-01

    Abundant bioinformatics resources are available for the study of complex microbial metagenomes; however, their utility in viral metagenomics is limited. HoloVir is a robust and flexible data analysis pipeline that provides an optimized and validated workflow for taxonomic and functional characterization of viral metagenomes derived from invertebrate holobionts. Simulated viral metagenomes comprising varying levels of viral diversity and abundance were used to determine the optimal assembly and gene prediction strategy, and multiple sequence assembly methods and gene prediction tools were tested in order to optimize our analysis workflow. HoloVir performs pairwise comparisons of single read and predicted gene datasets against the viral RefSeq database to assign taxonomy, and additional comparison to phage-specific and cellular markers is undertaken to support the taxonomic assignments and identify potential cellular contamination. Broad functional classification of the predicted genes is provided by assignment of COG microbial functional category classifications using EggNOG, and higher resolution functional analysis is achieved by searching for enrichment of specific Swiss-Prot keywords within the viral metagenome. Application of HoloVir to viral metagenomes from the coral Pocillopora damicornis and the sponge Rhopaloeides odorabile demonstrated that HoloVir provides a valuable tool to characterize holobiont viral communities across species, environments, or experiments. PMID:27375564

  16. DNA Microarray Profiling of a Diverse Collection of Nosocomial Methicillin-Resistant Staphylococcus aureus Isolates Assigns the Majority to the Correct Sequence Type and Staphylococcal Cassette Chromosome mec (SCCmec) Type and Results in the Subsequent Identification and Characterization of Novel SCCmec-SCCM1 Composite Islands

    PubMed Central

    Brennan, Orla M.; Deasy, Emily C.; Rossney, Angela S.; Kinnevey, Peter M.; Ehricht, Ralf; Monecke, Stefan; Coleman, David C.

    2012-01-01

    One hundred seventy-five isolates representative of methicillin-resistant Staphylococcus aureus (MRSA) clones that predominated in Irish hospitals between 1971 and 2004 and that previously underwent multilocus sequence typing (MLST) and staphylococcal cassette chromosome mec (SCCmec) typing were characterized by spa typing (175 isolates) and DNA microarray profiling (107 isolates). The isolates belonged to 26 sequence type (ST)-SCCmec types and subtypes and 35 spa types. The array assigned all isolates to the correct MLST clonal complex (CC), and 94% (100/107) were assigned an ST, with 98% (98/100) correlating with MLST. The array assigned all isolates to the correct SCCmec type, but subtyping of only some SCCmec elements was possible. Additional SCCmec/SCC genes or DNA sequence variation not detected by SCCmec typing was detected by array profiling, including the SCC-fusidic acid resistance determinant Q6GD50/fusC. Novel SCCmec/SCC composite islands (CIs) were detected among CC8 isolates and comprised SCCmec IIA-IIE, IVE, IVF, or IVg and a ccrAB4-SCC element with 99% DNA sequence identity to SCCM1 from ST8/t024-MRSA, SCCmec VIII, and SCC-CI in Staphylococcus epidermidis. The array showed that the majority of isolates harbored one or more superantigen (94%; 100/107) and immune evasion cluster (91%; 97/107) genes. Apart from fusidic acid and trimethoprim resistance, the correlation between isolate antimicrobial resistance phenotype and the presence of specific resistance genes was ≥97%. Array profiling allowed high-throughput, accurate assignment of MRSA to CCs/STs and SCCmec types and provided further evidence of the diversity of SCCmec/SCC. In most cases, array profiling can accurately predict the resistance phenotype of an isolate. PMID:22869569

  17. Sequence-specific 1H-NMR assignments for the aromatic region of several biologically active, monomeric insulins including native human insulin.

    PubMed

    Roy, M; Lee, R W; Kaarsholm, N C; Thøgersen, H; Brange, J; Dunn, M F

    1990-06-12

    The aromatic region of the 1H-FT-NMR spectrum of the biologically fully-potent, monomeric human insulin mutant, B9 Ser→Asp, B27 Thr→Glu has been investigated in D2O. At 1 to 5 mM concentrations, this mutant insulin is monomeric above pH 7.5. Coupling and amino acid classification of all aromatic signals is established via a combination of homonuclear one- and two-dimensional methods, including COSY, multiple quantum filters, selective spin decoupling and pH titrations. By comparisons with other insulin mutants and with chemically modified native insulins, all resonances in the aromatic region are given sequence-specific assignments without any reliance on the various crystal structures reported for insulin. These comparisons also give the sequence-specific assignments of most of the aromatic resonances of the mutant insulins B16 Tyr→Glu, B27 Thr→Glu and B25 Phe→Asp and the chemically modified species des-(B23-B30) insulin and monoiodo-Tyr A14 insulin. Chemical dispersion of the assigned resonances, ring current perturbations and comparisons at high pH have made possible the assignment of the aromatic resonances of human insulin, and these studies indicate that the major structural features of the human insulin monomer (including those critical to biological function) are also present in the monomeric mutant.

  18. Mycofier: a new machine learning-based classifier for fungal ITS sequences.

    PubMed

    Delgado-Serrano, Luisa; Restrepo, Silvia; Bustos, Jose Ricardo; Zambrano, Maria Mercedes; Anzola, Juan Manuel

    2016-08-11

    The taxonomic and phylogenetic classification based on sequence analysis of the ITS1 genomic region has become a crucial component of fungal ecology and diversity studies. Nowadays, there is no accurate alignment-free classification tool for fungal ITS1 sequences for large environmental surveys. This study describes the development of a machine learning-based classifier for the taxonomical assignment of fungal ITS1 sequences at the genus level. A fungal ITS1 sequence database was built using curated data. Training and test sets were generated from it. A Naïve Bayesian classifier was built using features from the primary sequence with an accuracy of 87 % in the classification at the genus level. The final model was based on a Naïve Bayes algorithm using ITS1 sequences from 510 fungal genera. This classifier, denoted as Mycofier, provides similar classification accuracy compared to BLASTN, but the database used for the classification contains curated data and the tool, independent of alignment, is more efficient and contributes to the field, given the lack of an accurate classification tool for large data from fungal ITS1 sequences. The software and source code for Mycofier are freely available at https://github.com/ldelgado-serrano/mycofier.git .
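
    An alignment-free Naïve Bayes classifier of the general kind described above can be sketched in a few lines. This is not the Mycofier implementation: the abstract says only that features come from the primary sequence, so the use of overlapping k-mers, the value k = 3, the Laplace smoothing, the class name KmerNaiveBayes and the toy sequences and genus labels are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k=3):
    """Split a sequence into overlapping k-mers (assumed feature type)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerNaiveBayes:
    """Multinomial Naive Bayes over k-mer counts with Laplace smoothing."""

    def __init__(self, k=3):
        self.k = k
        self.vocab_size = 4 ** k  # number of possible DNA k-mers
        self.class_counts = Counter()  # training sequences per label
        self.kmer_counts = defaultdict(Counter)  # k-mer counts per label

    def fit(self, sequences, labels):
        for seq, label in zip(sequences, labels):
            self.class_counts[label] += 1
            self.kmer_counts[label].update(kmers(seq, self.k))
        return self

    def predict(self, seq):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_seqs in self.class_counts.items():
            counts = self.kmer_counts[label]
            denom = sum(counts.values()) + self.vocab_size
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(n_seqs / total)
            for kmer in kmers(seq, self.k):
                score += math.log((counts[kmer] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data; sequences and genus names are invented.
model = KmerNaiveBayes(k=3).fit(
    ["ACGTACGTACGT", "ACGTACGAACGT", "GGGCCCGGGCCC", "GGGCCCGGCCCC"],
    ["GenusA", "GenusA", "GenusB", "GenusB"],
)
prediction = model.predict("ACGTACGT")
```

    Because no alignment is computed, prediction cost grows with the query length and the number of classes rather than with the size of a reference database, which is the efficiency argument the abstract makes for this family of methods.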

  19. GenBank.

    PubMed

    Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Sayers, Eric W

    2010-01-01

    GenBank is a comprehensive database that contains publicly available nucleotide sequences for more than 300,000 organisms named at the genus level or lower, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bi-monthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI homepage: www.ncbi.nlm.nih.gov.

  20. GenBank.

    PubMed

    Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Sayers, Eric W

    2009-01-01

    GenBank is a comprehensive database that contains publicly available nucleotide sequences for more than 300,000 organisms named at the genus level or lower, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank(R) staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the National Center for Biotechnology Information (NCBI) Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: www.ncbi.nlm.nih.gov.

  1. Multi-loci diagnosis of acute lymphoblastic leukaemia with high-throughput sequencing and bioinformatics analysis.

    PubMed

    Ferret, Yann; Caillault, Aurélie; Sebda, Shéhérazade; Duez, Marc; Grardel, Nathalie; Duployez, Nicolas; Villenet, Céline; Figeac, Martin; Preudhomme, Claude; Salson, Mikaël; Giraud, Mathieu

    2016-05-01

    High-throughput sequencing (HTS) is considered a technical revolution that has improved our knowledge of lymphoid and autoimmune diseases, changing our approach to leukaemia both at diagnosis and during follow-up. As part of an immunoglobulin/T cell receptor-based minimal residual disease (MRD) assessment of acute lymphoblastic leukaemia patients, we assessed the performance and feasibility of the replacement of the first steps of the approach based on DNA isolation and Sanger sequencing, using a HTS protocol combined with bioinformatics analysis and visualization using the Vidjil software. We prospectively analysed the diagnostic and relapse samples of 34 paediatric patients, thus identifying 125 leukaemic clones with recombinations on multiple loci (TRG, TRD, IGH and IGK), including Dd2/Dd3 and Intron/KDE rearrangements. Sequencing failures were halved (14% vs. 34%, P = 0.0007), enabling more patients to be monitored. Furthermore, more markers per patient could be monitored, reducing the probability of false negative MRD results. The whole analysis, from sample receipt to clinical validation, was shorter than our current diagnostic protocol, with equal resources. V(D)J recombination was successfully assigned by the software, even for unusual recombinations. This study emphasizes the progress that HTS with adapted bioinformatics tools can bring to the diagnosis of leukaemia patients. © 2016 John Wiley & Sons Ltd.

  2. Pantoea allii sp. nov., isolated from onion plants and seed.

    PubMed

    Brady, Carrie L; Goszczynska, Teresa; Venter, Stephanus N; Cleenwerck, Ilse; De Vos, Paul; Gitaitis, Ronald D; Coutinho, Teresa A

    2011-04-01

    Eight yellow-pigmented, Gram-negative, rod-shaped, oxidase-negative, motile, facultatively anaerobic bacteria were isolated from onion seed in South Africa and from an onion plant exhibiting centre rot symptoms in the USA. The isolates were assigned to the genus Pantoea on the basis of phenotypic and biochemical tests. 16S rRNA gene sequence analysis and multilocus sequence analysis (MLSA), based on gyrB, rpoB, infB and atpD sequences, confirmed the allocation of the isolates to the genus Pantoea. MLSA further indicated that the isolates represented a novel species, which was phylogenetically most closely related to Pantoea ananatis and Pantoea stewartii. Amplified fragment length polymorphism analysis also placed the isolates into a cluster separate from P. ananatis and P. stewartii. Compared with type strains of species of the genus Pantoea that showed >97 % 16S rRNA gene sequence similarity with strain BD 390(T), the isolates exhibited 11-55 % whole-genome DNA-DNA relatedness, which confirmed the classification of the isolates in a novel species. The most useful phenotypic characteristics for the differentiation of the isolates from their closest phylogenetic neighbours are production of acid from amygdalin and utilization of adonitol and sorbitol. A novel species, Pantoea allii sp. nov., is proposed, with type strain BD 390(T) ( = LMG 24248(T)).

  3. VaDiR: an integrated approach to Variant Detection in RNA.

    PubMed

    Neums, Lisa; Suenaga, Seiji; Beyerlein, Peter; Anders, Sara; Koestler, Devin; Mariani, Andrea; Chien, Jeremy

    2018-02-01

    Advances in next-generation DNA sequencing technologies are now enabling detailed characterization of sequence variations in cancer genomes. With whole-genome sequencing, variations in coding and non-coding sequences can be discovered. But the cost associated with it is currently limiting its general use in research. Whole-exome sequencing is used to characterize sequence variations in coding regions, but the cost associated with capture reagents and biases in capture rate limit its full use in research. Additional limitations include uncertainty in assigning the functional significance of the mutations when these mutations are observed in the non-coding region or in genes that are not expressed in cancer tissue. We investigated the feasibility of uncovering mutations from expressed genes using RNA sequencing datasets with a method called Variant Detection in RNA (VaDiR) that integrates 3 variant callers, namely: SNPiR, RVBoost, and MuTect2. The combination of all 3 methods, which we called Tier 1 variants, produced the highest precision with true positive mutations from RNA-seq that could be validated at the DNA level. We also found that the integration of Tier 1 variants with those called by MuTect2 and SNPiR produced the highest recall with acceptable precision. Finally, we observed a higher rate of mutation discovery in genes that are expressed at higher levels. Our method, VaDiR, provides a possibility of uncovering mutations from RNA sequencing datasets that could be useful in further functional analysis. In addition, our approach allows orthogonal validation of DNA-based mutation discovery by providing complementary sequence variation analysis from paired RNA/DNA sequencing datasets.

  4. Analysis of MHC class I genes across horse MHC haplotypes

    PubMed Central

    Tallmadge, Rebecca L.; Campbell, Julie A.; Miller, Donald C.; Antczak, Douglas F.

    2010-01-01

    The genomic sequences of 15 horse Major Histocompatibility Complex (MHC) class I genes and a collection of MHC class I homozygous horses of five different haplotypes were used to investigate the genomic structure and polymorphism of the equine MHC. A combination of conserved and locus-specific primers was used to amplify horse MHC class I genes with classical and non-classical characteristics. Multiple clones from each haplotype identified three to five classical sequences per homozygous animal, and two to three non-classical sequences. Phylogenetic analysis was applied to these sequences and groups were identified which appear to be allelic series, but some sequences were left ungrouped. Sequences determined from MHC class I heterozygous horses and previously described MHC class I sequences were then added, representing a total of ten horse MHC haplotypes. These results were consistent with those obtained from the MHC homozygous horses alone, and 30 classical sequences were assigned to four previously confirmed loci and three new provisional loci. The non-classical genes had few alleles and the classical genes had higher levels of allelic polymorphism. Alleles for two classical loci with the expected pattern of polymorphism were found in the majority of haplotypes tested, but alleles at two other commonly detected loci had more variation outside of the hypervariable region than within. Our data indicate that the equine Major Histocompatibility Complex is characterized by variation in the complement of class I genes expressed in different haplotypes in addition to the expected allelic polymorphism within loci. PMID:20099063

  5. Towards proteomic analysis of milk proteins in historical building materials

    NASA Astrophysics Data System (ADS)

    Kuckova, S.; Crhova, M.; Vankova, L.; Hnizda, A.; Hynek, R.; Kodicek, M.

    2009-07-01

    The addition of proteinaceous binders to mortars and plasters has a long tradition. The protein additions were identified in many sacral and secular historical buildings. For this method of peptide mass mapping, three model mortar samples with protein additives were prepared. These samples were analysed fresh (1-2 weeks old) and after 9 months of natural ageing. The optimal duration of tryptic cleavage (2 h) and the lowest amount of material needed for relevant analysis of fresh and weathered samples were found; the sufficient amounts of weathered and fresh mortars were set to 0.05 and 0.005 g. The list of main tryptic peptides coming from milk additives (bovine milk, curd, and whey), their relative intensities and theoretical amino acid sequences assignment is presented. Several sequences have been "de novo" confirmed by mass spectrometry.

  6. Characterization and Expression of Drug Resistance Genes in MDROs Originating from Combat Wound Infections

    DTIC Science & Technology

    2016-09-01

    assigned a classification. MLST analysis MLST was determined using an in-house automated pipeline that first searches for homologs of each gene of...and virulence mechanism contributing to their success as pathogens in the wound environment. A novel bioinformatics pipeline was used to incorporate...monitored in two ways: read-based genome QC and assembly based metrics. The JCVI Genome QC pipeline samples sequence reads and performs BLAST

  7. Tidying Up International Nucleotide Sequence Databases: Ecological, Geographical and Sequence Quality Annotation of ITS Sequences of Mycorrhizal Fungi

    PubMed Central

    Tedersoo, Leho; Abarenkov, Kessy; Nilsson, R. Henrik; Schüssler, Arthur; Grelet, Gwen-Aëlle; Kohout, Petr; Oja, Jane; Bonito, Gregory M.; Veldre, Vilmar; Jairus, Teele; Ryberg, Martin; Larsson, Karl-Henrik; Kõljalg, Urmas

    2011-01-01

    Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source were supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi. PMID:21949797

  8. Comparative transcriptome analysis of microsclerotia development in Nomuraea rileyi

    PubMed Central

    2013-01-01

    Background: Nomuraea rileyi is used as an environmentally friendly biopesticide. However, mass production and commercialization of this organism are limited due to its fastidious growth and sporulation requirements. When cultured in amended medium, we found that N. rileyi could produce microsclerotia bodies, replacing conidiophores as the infectious agent. However, little is known about the genes involved in microsclerotia development. In the present study, the transcriptomes were analyzed using next-generation sequencing technology to find the genes involved in microsclerotia development. Results: A total of 4.69 Gb of clean nucleotides comprising 32,061 sequences was obtained, and 20,919 sequences were annotated (about 65%). Among the annotated sequences, only 5928 were annotated with 34 gene ontology (GO) functional categories, and 12,778 sequences were mapped to 165 pathways by searching against the Kyoto Encyclopedia of Genes and Genomes pathway (KEGG) database. Furthermore, we assessed the transcriptomic differences between cultures grown in minimal and amended medium. In total, 4808 sequences were found to be differentially expressed; 719 differentially expressed unigenes were assigned to 25 GO classes and 1888 differentially expressed unigenes were assigned to 161 KEGG pathways, including 25 enrichment pathways. Subsequently, we examined the up-regulated or uniquely expressed genes following amended medium treatment, which were also expressed on the enrichment pathway, and found that most of them participated in mediating oxidative stress homeostasis. To elucidate the role of oxidative stress in microsclerotia development, we analyzed the diversification of unigenes using quantitative reverse transcription-PCR (RT-qPCR). Conclusion: Our findings suggest that oxidative stress occurs during microsclerotia development, along with a broad metabolic activity change. Our data provide the most comprehensive sequence resource available for the study of N. rileyi. We believe that the transcriptome datasets will serve as an important public information platform to accelerate studies on N. rileyi microsclerotia. PMID:23777366

  9. Quality scores for 32,000 genomes

    DOE PAGES

    Land, Miriam L.; Hyatt, Doug; Jun, Se-Ran; ...

    2014-12-08

    More than 80% of the microbial genomes in GenBank are of ‘draft’ quality (12,553 draft vs. 2,679 finished, as of October, 2013). In this study, we have examined all the microbial DNA sequences available for complete, draft, and Sequence Read Archive genomes in GenBank as well as three other major public databases, and assigned quality scores for more than 30,000 prokaryotic genome sequences. Scores were assigned using four categories: the completeness of the assembly, the presence of full-length rRNA genes, tRNA composition and the presence of a set of 102 conserved genes in prokaryotes. Most (~88%) of the genomes had quality scores of 0.8 or better and can be safely used for standard comparative genomics analysis. We compared genomes across factors that may influence the score. We found that although sequencing depth coverage of over 100x did not ensure a better score, sequencing read length was a better indicator of sequencing quality. With few exceptions, most of the 30,000 genomes have nearly all the 102 essential genes. The score can be used to set thresholds for screening data when analyzing “all published genomes” and reference data is either not available or not applicable. The scores highlighted organisms for which commonly used tools do not perform well. This information can be used to improve tools and to serve a broad group of users as more diverse organisms are sequenced. Finally and unexpectedly, the comparison of predicted tRNAs across 15,000 high quality genomes showed that anticodons beginning with an ‘A’ (codons ending with a ‘U’) are almost non-existent, with the exception of one arginine codon (CGU); this has been noted previously in the literature for a few genomes, but not with the depth found here.
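
    The correspondence between anticodons beginning with ‘A’ and codons ending with ‘U’ follows from antiparallel base pairing, and a short sketch makes it concrete. The function name is hypothetical, and the sketch assumes strict Watson-Crick pairing only, ignoring wobble pairing and modified bases.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codon_for_anticodon(anticodon):
    """Return the codon (5'->3') that pairs with an anticodon (5'->3').

    Pairing is antiparallel, so the codon is the reverse complement of
    the anticodon; the first anticodon base pairs with the third codon base.
    """
    return "".join(COMPLEMENT[base] for base in reversed(anticodon.upper()))


# Anticodon ACG pairs with the arginine codon CGU named in the abstract.
arginine_codon = codon_for_anticodon("ACG")  # "CGU"
```

    Under this simplification every anticodon starting with ‘A’ maps to a codon ending in ‘U’, which is why the two descriptions in the abstract are interchangeable.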

  10. Quality scores for 32,000 genomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Land, Miriam L.; Hyatt, Doug; Jun, Se-Ran

    More than 80% of the microbial genomes in GenBank are of ‘draft’ quality (12,553 draft vs. 2,679 finished, as of October, 2013). In this study, we have examined all the microbial DNA sequences available for complete, draft, and Sequence Read Archive genomes in GenBank as well as three other major public databases, and assigned quality scores for more than 30,000 prokaryotic genome sequences. Scores were assigned using four categories: the completeness of the assembly, the presence of full-length rRNA genes, tRNA composition and the presence of a set of 102 conserved genes in prokaryotes. Most (~88%) of the genomes had quality scores of 0.8 or better and can be safely used for standard comparative genomics analysis. We compared genomes across factors that may influence the score. We found that although sequencing depth coverage of over 100x did not ensure a better score, sequencing read length was a better indicator of sequencing quality. With few exceptions, most of the 30,000 genomes have nearly all the 102 essential genes. The score can be used to set thresholds for screening data when analyzing “all published genomes” and reference data is either not available or not applicable. The scores highlighted organisms for which commonly used tools do not perform well. This information can be used to improve tools and to serve a broad group of users as more diverse organisms are sequenced. Finally and unexpectedly, the comparison of predicted tRNAs across 15,000 high quality genomes showed that anticodons beginning with an ‘A’ (codons ending with a ‘U’) are almost non-existent, with the exception of one arginine codon (CGU); this has been noted previously in the literature for a few genomes, but not with the depth found here.

  11. Draft Sequences of the Radish (Raphanus sativus L.) Genome

    PubMed Central

    Kitashiba, Hiroyasu; Li, Feng; Hirakawa, Hideki; Kawanabe, Takahiro; Zou, Zhongwei; Hasegawa, Yoichi; Tonosaki, Kaoru; Shirasawa, Sachiko; Fukushima, Aki; Yokoi, Shuji; Takahata, Yoshihito; Kakizaki, Tomohiro; Ishida, Masahiko; Okamoto, Shunsuke; Sakamoto, Koji; Shirasawa, Kenta; Tabata, Satoshi; Nishio, Takeshi

    2014-01-01

    Radish (Raphanus sativus L., n = 9) is one of the major vegetables in Asia. Since the genomes of Brassica and related species including radish underwent genome rearrangement, it is quite difficult to perform functional analysis based on the reported genomic sequence of Brassica rapa. Therefore, we performed genome sequencing of radish. Short reads of genomic sequences of 191.1 Gb were obtained by next-generation sequencing (NGS) for a radish inbred line, and 76,592 scaffolds of ≥300 bp were constructed along with the bacterial artificial chromosome-end sequences. Finally, the whole draft genomic sequence of 402 Mb spanning 75.9% of the estimated genomic size and containing 61,572 predicted genes was obtained. Subsequently, 221 single nucleotide polymorphism markers and 768 PCR-RFLP markers were used together with the 746 markers produced in our previous study for the construction of a linkage map. The map was combined further with another radish linkage map constructed mainly with expressed sequence tag-simple sequence repeat markers into a high-density integrated map of 1,166 cM with 2,553 DNA markers. A total of 1,345 scaffolds were assigned to the linkage map, spanning 116.0 Mb. Bulked PCR products amplified by 2,880 primer pairs were sequenced by NGS, and SNPs in eight inbred lines were identified. PMID:24848699

  12. Partial DNA sequencing of Douglas-fir cDNAs used in RFLP mapping

    Treesearch

    K.D. Jermstad; D.L. Bassoni; C.S. Kinlaw; D.B. Neale

    1998-01-01

    DNA sequences from 87 Douglas-fir (Pseudotsuga menziesii [Mirb.] Franco) cDNA RFLP probes were determined. Sequences were submitted to the GenBank dbEST database and searched for similarity against nucleotide and protein databases using the BLASTn and BLASTx programs. Twenty-one sequences (24%) were assigned putative functions; 18 of which...

  13. Preparation of fosmid libraries and functional metagenomic analysis of microbial community DNA.

    PubMed

    Martínez, Asunción; Osburne, Marcia S

    2013-01-01

    One of the most important challenges in contemporary microbial ecology is to assign a functional role to the large number of novel genes discovered through large-scale sequencing of natural microbial communities that lack similarity to genes of known function. Functional screening of metagenomic libraries, that is, screening environmental DNA clones for the ability to confer an activity of interest to a heterologous bacterial host, is a promising approach for bridging the gap between metagenomic DNA sequencing and functional characterization. Here, we describe methods for isolating environmental DNA and constructing metagenomic fosmid libraries, as well as methods for designing and implementing successful functional screens of such libraries. © 2013 Elsevier Inc. All rights reserved.

  14. An Approach to Function Annotation for Proteins of Unknown Function (PUFs) in the Transcriptome of Indian Mulberry.

    PubMed

    Dhanyalakshmi, K H; Naika, Mahantesha B N; Sajeevan, R S; Mathew, Oommen K; Shafi, K Mohamed; Sowdhamini, Ramanathan; N Nataraja, Karaba

    2016-01-01

    Modern sequencing technologies are generating large volumes of information at the transcriptome and genome level. Translation of this information into biological meaning lags far behind, so a significant portion of the proteins discovered remain proteins of unknown function (PUFs). Attempts to uncover the functional significance of PUFs are limited due to lack of easy and high throughput functional annotation tools. Here, we report an approach to assign putative functions to PUFs, identified in the transcriptome of mulberry, a perennial tree commonly cultivated as host of silkworm. We utilized the mulberry PUFs generated from leaf tissues exposed to drought stress at whole plant level. A sequence and structure based computational analysis predicted the probable function of the PUFs. For rapid and easy annotation of PUFs, we developed an automated pipeline by integrating diverse bioinformatics tools, designated as PUFs Annotation Server (PUFAS), which also provides a web service API (Application Programming Interface) for a large-scale analysis up to a genome. The expression analysis of three selected PUFs annotated by the pipeline revealed abiotic stress responsiveness of the genes, and hence their potential role in stress acclimation pathways. The automated pipeline developed here could be extended to assign functions to PUFs from any organism in general. PUFAS web server is available at http://caps.ncbs.res.in/pufas/ and the web service is accessible at http://capservices.ncbs.res.in/help/pufas.

  15. The Genome Sequence of Mannheimia haemolytica A1: Insights into Virulence, Natural Competence, and Pasteurellaceae Phylogeny†

    PubMed Central

    Gioia, Jason; Qin, Xiang; Jiang, Huaiyang; Clinkenbeard, Kenneth; Lo, Reggie; Liu, Yamei; Fox, George E.; Yerrapragada, Shailaja; McLeod, Michael P.; McNeill, Thomas Z.; Hemphill, Lisa; Sodergren, Erica; Wang, Qiaoyan; Muzny, Donna M.; Homsi, Farah J.; Weinstock, George M.; Highlander, Sarah K.

    2006-01-01

    The draft genome sequence of Mannheimia haemolytica A1, the causative agent of bovine respiratory disease complex (BRDC), is presented. Strain ATCC BAA-410, isolated from the lung of a calf with BRDC, was the DNA source. The annotated genome includes 2,839 coding sequences, 1,966 of which were assigned a function and 436 of which are unique to M. haemolytica. Through genome annotation many features of interest were identified, including bacteriophages and genes related to virulence, natural competence, and transcriptional regulation. In addition to previously described virulence factors, M. haemolytica encodes adhesins, including the filamentous hemagglutinin FhaB and two trimeric autotransporter adhesins. Two dual-function immunoglobulin-protease/adhesins are also present, as is a third immunoglobulin protease. Genes related to iron acquisition and drug resistance were identified and are likely important for survival in the host and virulence. Analysis of the genome indicates that M. haemolytica is naturally competent, as genes for natural competence and DNA uptake signal sequences (USS) are present. Comparison of competence loci and USS in other species in the family Pasteurellaceae indicates that M. haemolytica, Actinobacillus pleuropneumoniae, and Haemophilus ducreyi form a lineage distinct from other Pasteurellaceae. This observation was supported by a phylogenetic analysis using sequences of predicted housekeeping genes. PMID:17015664

  16. De Novo RNA Sequencing and Transcriptome Analysis of Colletotrichum gloeosporioides ES026 Reveal Genes Related to Biosynthesis of Huperzine A

    PubMed Central

    Zhang, Xiangmei; Xia, Qianqian; Zhao, Xinmei; Ahn, Youngjoon; Ahmed, Nevin; Cosoveanu, Andreea; Wang, Mo; Wang, Jialu; Shu, Shaohua

    2015-01-01

    Huperzine A is important in the treatment of Alzheimer’s disease. There are major challenges for the mass production of huperzine A from plants due to the limited number of huperzine-A-producing plants, as well as the low content of huperzine A in these plants. Various endophytic fungi produce huperzine A. Colletotrichum gloeosporioides ES026 was previously isolated from a huperzine-A-producing plant Huperzia serrata, and this fungus also produces huperzine A. In this study, de novo RNA sequencing of C. gloeosporioides ES026 was carried out with an Illumina HiSeq2000. A total of 4,324,299,051 bp from 50,442,617 high-quality sequence reads of ES026 were obtained. These raw data were assembled into 24,998 unigenes, 40,536,684 residues and 19,790 genes. The majority of the unique sequences were assigned to corresponding putative functions based on BLAST searches of public databases. The molecular functions, biological processes and biochemical pathways of these unique sequences were determined using gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) assignments. A gene encoding copper amine oxidase (CAO) (unigene 9322) was annotated for the conversion of cadaverine to 5-aminopentanal in the biosynthesis of huperzine A. This gene was also detected in the root, stem and leaf of H. serrata. Furthermore, a close relationship was observed between expression of the CAO gene (unigene 9322) and quantity of crude huperzine A extracted from ES026. Therefore, CAO might be involved in the biosynthesis of huperzine A and it most likely plays a key role in regulating the content of huperzine A in ES026. PMID:25799531

  17. Microbial Contaminants of Cord Blood Units Identified by 16S rRNA Sequencing and by API Test System, and Antibiotic Sensitivity Profiling

    PubMed Central

    França, Luís; Simões, Catarina; Taborda, Marco; Diogo, Catarina; da Costa, Milton S.

    2015-01-01

    Over a period of ten months a total of 5618 cord blood units (CBU) were screened for microbial contamination under routine conditions. The antibiotic resistance profile for all isolates was also examined using ATB strips. The detection rate for culture positive units was 7.5%, corresponding to 422 samples. 16S rRNA sequence analysis and identification with API test system were used to identify the culturable aerobic, microaerophilic and anaerobic bacteria from CBUs. From these samples we recovered 485 isolates (84 operational taxonomic units, OTUs) assigned to the classes Bacteroidia, Actinobacteria, Clostridia, Bacilli, Betaproteobacteria and primarily to the Gammaproteobacteria. Sixty-nine OTUs, corresponding to 447 isolates, showed 16S rRNA sequence similarities above 99.0% with known cultured bacteria. However, 14 OTUs had 16S rRNA sequence similarities between 95 and 99% in support of genus level identification and one OTU with 16S rRNA sequence similarity of 90.3% supporting a family level identification only. The phenotypic identification formed 29 OTUs that could be identified to the species level and 9 OTUs that could be identified to the genus level by API test system. We failed to obtain identification for 14 OTUs, while 32 OTUs comprised organisms producing mixed identifications. Forty-two OTUs covered species not included in the API system databases. The API test system Rapid ID 32 Strep and Rapid ID 32 E showed the highest proportion of identifications to the species level, the lowest ratio of unidentified results and the highest agreement to the results of 16S rRNA assignments. Isolates affiliated to the Bacilli and Bacteroidia showed the highest antibiotic multi-resistance indices and microorganisms of the Clostridia displayed the most antibiotic sensitive phenotypes. PMID:26512991
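
    The similarity ranges quoted in this abstract can be read as a simple decision rule. The sketch below is illustrative only: the function name, the return labels and the handling of the exact boundary values are assumptions, and because the abstract gives no lower bound for family-level calls, everything below 95% is reported as "family or higher".

```python
def rank_supported(similarity_pct: float) -> str:
    """Map a 16S rRNA sequence similarity (percent) to the level of
    identification it supports, using the ranges quoted in the abstract.

    Boundary handling (>99.0, >=95.0) is an assumption of this sketch.
    """
    if not 0.0 <= similarity_pct <= 100.0:
        raise ValueError("similarity must be a percentage between 0 and 100")
    if similarity_pct > 99.0:
        # "above 99.0% with known cultured bacteria"
        return "close match to a known cultured bacterium"
    if similarity_pct >= 95.0:
        # "between 95 and 99% in support of genus level identification"
        return "genus"
    # e.g. the single OTU at 90.3%, "family level identification only"
    return "family or higher"
```

    For example, rank_supported(97.2) returns "genus" and rank_supported(90.3) returns "family or higher".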

  18. Microbial Contaminants of Cord Blood Units Identified by 16S rRNA Sequencing and by API Test System, and Antibiotic Sensitivity Profiling.

    PubMed

    França, Luís; Simões, Catarina; Taborda, Marco; Diogo, Catarina; da Costa, Milton S

    2015-01-01

    Over a period of ten months a total of 5618 cord blood units (CBU) were screened for microbial contamination under routine conditions. The antibiotic resistance profile for all isolates was also examined using ATB strips. The detection rate for culture positive units was 7.5%, corresponding to 422 samples. 16S rRNA sequence analysis and identification with API test system were used to identify the culturable aerobic, microaerophilic and anaerobic bacteria from CBUs. From these samples we recovered 485 isolates (84 operational taxonomic units, OTUs) assigned to the classes Bacteroidia, Actinobacteria, Clostridia, Bacilli, Betaproteobacteria and primarily to the Gammaproteobacteria. Sixty-nine OTUs, corresponding to 447 isolates, showed 16S rRNA sequence similarities above 99.0% with known cultured bacteria. However, 14 OTUs had 16S rRNA sequence similarities between 95 and 99% in support of genus level identification and one OTU with 16S rRNA sequence similarity of 90.3% supporting a family level identification only. The phenotypic identification formed 29 OTUs that could be identified to the species level and 9 OTUs that could be identified to the genus level by API test system. We failed to obtain identification for 14 OTUs, while 32 OTUs comprised organisms producing mixed identifications. Forty-two OTUs covered species not included in the API system databases. The API test system Rapid ID 32 Strep and Rapid ID 32 E showed the highest proportion of identifications to the species level, the lowest ratio of unidentified results and the highest agreement to the results of 16S rRNA assignments. Isolates affiliated to the Bacilli and Bacteroidia showed the highest antibiotic multi-resistance indices and microorganisms of the Clostridia displayed the most antibiotic sensitive phenotypes.

  19. Self-organizing approach for meta-genomes.

    PubMed

    Zhu, Jianfeng; Zheng, Wei-Mou

    2014-12-01

    We extend the self-organizing approach for annotation of a bacterial genome to analyze the raw sequencing data of the human gut metagenome without sequence assembling. The original approach divides the genomic sequence of a bacterium into non-overlapping segments of equal length and assigns to each segment one of seven 'phases', among which one is for the noncoding regions, three for the direct coding regions to indicate the three possible codon positions of the segment starting site, and three for the reverse coding regions. The noncoding phase and the six coding phases are described by two frequency tables of the 64 triplet types or 'codon usages'. A set of codon usages can be used to update the phase assignment and vice versa. An iteration after an initialization leads to a convergent phase assignment to give an annotation of the genome. In the extension of the approach to a metagenome, we consider a mixture model of a number of categories described by different codon usages. The Illumina Genome Analyzer sequencing data of the total DNA from faecal samples are then examined to understand the diversity of the human gut microbiome. Copyright © 2014 Elsevier Ltd. All rights reserved.
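
    The alternation described here (codon usages update the phase assignment, and the assignment updates the codon usages) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the phase labels ("N", "+0".."+2", "-0".."-2"), the mean log-likelihood score, the pseudocount and all function names are assumptions made for the example.

```python
import math
from collections import Counter
from itertools import product

TRIPLETS = ["".join(t) for t in product("ACGT", repeat=3)]  # the 64 triplet types
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement; characters other than A/C/G/T are left unchanged."""
    return seq.translate(COMPLEMENT)[::-1]


def triplets(seq, offset):
    """Non-overlapping triplets read from `offset`, skipping ambiguous ones."""
    trips = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return [t for t in trips if set(t) <= set("ACGT")]


def usage(counts, pseudo=1.0):
    """Turn triplet counts into a frequency table (pseudocount is an assumption)."""
    total = sum(counts[t] for t in TRIPLETS) + pseudo * len(TRIPLETS)
    return {t: (counts[t] + pseudo) / total for t in TRIPLETS}


def mean_loglik(trips, table):
    """Average log-probability per triplet, so different frames are comparable."""
    if not trips:
        return float("-inf")
    return sum(math.log(table[t]) for t in trips) / len(trips)


def assign_phase(segment, coding, noncoding):
    """Pick the best of seven phases: noncoding, three direct, three reverse."""
    scores = {"N": mean_loglik(triplets(segment, 0), noncoding)}
    rc = revcomp(segment)
    for k in range(3):
        scores[f"+{k}"] = mean_loglik(triplets(segment, k), coding)
        scores[f"-{k}"] = mean_loglik(triplets(rc, k), coding)
    return max(scores, key=scores.get)


def update_tables(segments, phases):
    """Re-estimate the coding and noncoding tables from the current phases."""
    coding, noncoding = Counter(), Counter()
    for seg, phase in zip(segments, phases):
        if phase == "N":
            noncoding.update(triplets(seg, 0))
        else:
            strand = seg if phase[0] == "+" else revcomp(seg)
            coding.update(triplets(strand, int(phase[1])))
    return usage(coding), usage(noncoding)
```

    Starting from some initial tables, one would call assign_phase on every segment, then update_tables, and repeat until the phases stop changing. The mixture-model extension to metagenomes mentioned in the abstract is not covered by this sketch.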

  20. Using comparative genome analysis to identify problems in annotated microbial genomes.

    PubMed

    Poptsova, Maria S; Gogarten, J Peter

    2010-07-01

    Genome annotation is a tedious task that is mostly done by automated methods; however, the accuracy of these approaches has been questioned since the beginning of the sequencing era. Genome annotation is a multilevel process, and errors can emerge at different stages: during sequencing, as a result of gene-calling procedures, and in the process of assigning gene functions. Missed or wrongly annotated genes differentially impact different types of analyses. Here we discuss and demonstrate how the methods of comparative genome analysis can refine annotations by locating missing orthologues. We also discuss possible reasons for errors and show that the second-generation annotation systems, which combine multiple gene-calling programs with similarity-based methods, perform much better than the first annotation tools. Since old errors may propagate to the newly sequenced genomes, we emphasize that the problem of continuously updating popular public databases is an urgent and unresolved one. Due to the progress in genome-sequencing technologies, automated annotation techniques will remain the main approach in the future. Researchers need to be aware of the existing errors in the annotation of even well-studied genomes, such as Escherichia coli, and consider additional quality control for their results.

  1. Rapid phylogenetic dissection of prokaryotic community structure in tidal flat using pyrosequencing.

    PubMed

    Kim, Bong-Soo; Kim, Byung Kwon; Lee, Jae-Hak; Kim, Myungjin; Lim, Young Woon; Chun, Jongsik

    2008-08-01

    Dissection of prokaryotic community structure is a prerequisite to understanding their ecological roles. Various methods are available for this purpose, among which amplification and sequencing of 16S rRNA genes has gained popularity. However, conventional methods based on the Sanger sequencing technique require a cloning step prior to sequencing, and are expensive and labor-intensive. We investigated prokaryotic community structure in tidal flat sediments, Korea, using pyrosequencing and a subsequent automated bioinformatic pipeline for the rapid and accurate taxonomic assignment of each amplicon. The combination of pyrosequencing and bioinformatic analysis showed that bacterial and archaeal communities were more diverse than previously reported in clone library studies. Pyrosequencing analysis revealed 21 bacterial divisions and 37 candidate divisions. Proteobacteria was the most abundant division in the bacterial community, of which Gamma- and Delta-Proteobacteria were the most abundant. Similarly, 4 archaeal divisions were found in tidal flat sediments. Euryarchaeota was the most abundant division in the archaeal sequences, which were further divided into 8 classes and 11 unclassified euryarchaeota groups. The system developed here provides a simple, in-depth and automated way of dissecting a prokaryotic community structure without extensive pretreatment such as cloning.

  2. Identification of novel biomass-degrading enzymes from genomic dark matter: Populating genomic sequence space with functional annotation.

    PubMed

    Piao, Hailan; Froula, Jeff; Du, Changbin; Kim, Tae-Wan; Hawley, Erik R; Bauer, Stefan; Wang, Zhong; Ivanova, Nathalia; Clark, Douglas S; Klenk, Hans-Peter; Hess, Matthias

    2014-08-01

    Although recent nucleotide sequencing technologies have significantly enhanced our understanding of microbial genomes, the function of ∼35% of genes identified in a genome currently remains unknown. To improve the understanding of microbial genomes and consequently of microbial processes it will be crucial to assign a function to this "genomic dark matter." Due to the urgent need for additional carbohydrate-active enzymes for improved production of transportation fuels from lignocellulosic biomass, we screened the genomes of more than 5,500 microorganisms for hypothetical proteins that are located in the proximity of already known cellulases. We identified, synthesized and expressed a total of 17 putative cellulase genes with insufficient sequence similarity to currently known cellulases to be identified as such using traditional sequence annotation techniques that rely on significant sequence similarity. The recombinant proteins of the newly identified putative cellulases were subjected to enzymatic activity assays to verify their hydrolytic activity towards cellulose and lignocellulosic biomass. Eleven (65%) of the tested enzymes had significant activity towards at least one of the substrates. This high success rate highlights that a gene context-based approach can be used to assign function to genes that are otherwise categorized as "genomic dark matter" and to identify biomass-degrading enzymes that have little sequence similarity to already known cellulases. The ability to assign function to genes that have no related sequence representatives with functional annotation will be important to enhance our understanding of microbial processes and to identify microbial proteins for a wide range of applications. © 2014 Wiley Periodicals, Inc.
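
    The gene context-based screen described here amounts to a neighbourhood search along a genome. The sketch below is a toy version under stated assumptions: the Gene record, the annotation strings and the window size (counted in genes) are hypothetical, since the abstract says only that candidates were "located in the proximity of already known cellulases".

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    gene_id: str
    annotation: str  # free-text product annotation (hypothetical format)


def candidates_near_cellulases(genes: List[Gene], window: int = 2) -> List[str]:
    """Return IDs of hypothetical proteins within `window` genes of a cellulase.

    `genes` must be ordered by position along a single contig or replicon.
    """
    cellulase_idx = [
        i for i, g in enumerate(genes) if "cellulase" in g.annotation.lower()
    ]
    hits = []
    for i, g in enumerate(genes):
        if g.annotation.lower() != "hypothetical protein":
            continue
        # Keep the gene if any annotated cellulase lies within the window.
        if any(abs(i - j) <= window for j in cellulase_idx):
            hits.append(g.gene_id)
    return hits
```

    A real screen would also need to handle strand, operon boundaries and distances in base pairs, none of which are specified in the abstract.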

  3. A manually annotated Actinidia chinensis var. chinensis (kiwifruit) genome highlights the challenges associated with draft genomes and gene prediction in plants.

    PubMed

    Pilkington, Sarah M; Crowhurst, Ross; Hilario, Elena; Nardozza, Simona; Fraser, Lena; Peng, Yongyan; Gunaseelan, Kularajathevan; Simpson, Robert; Tahir, Jibran; Deroles, Simon C; Templeton, Kerry; Luo, Zhiwei; Davy, Marcus; Cheng, Canhong; McNeilage, Mark; Scaglione, Davide; Liu, Yifei; Zhang, Qiong; Datson, Paul; De Silva, Nihal; Gardiner, Susan E; Bassett, Heather; Chagné, David; McCallum, John; Dzierzon, Helge; Deng, Cecilia; Wang, Yen-Yi; Barron, Lorna; Manako, Kelvina; Bowen, Judith; Foster, Toshi M; Erridge, Zoe A; Tiffin, Heather; Waite, Chethi N; Davies, Kevin M; Grierson, Ella P; Laing, William A; Kirk, Rebecca; Chen, Xiuyin; Wood, Marion; Montefiori, Mirco; Brummell, David A; Schwinn, Kathy E; Catanach, Andrew; Fullerton, Christina; Li, Dawei; Meiyalaghan, Sathiyamoorthy; Nieuwenhuizen, Niels; Read, Nicola; Prakash, Roneel; Hunter, Don; Zhang, Huaibi; McKenzie, Marian; Knäbel, Mareike; Harris, Alastair; Allan, Andrew C; Gleave, Andrew; Chen, Angela; Janssen, Bart J; Plunkett, Blue; Ampomah-Dwamena, Charles; Voogd, Charlotte; Leif, Davin; Lafferty, Declan; Souleyre, Edwige J F; Varkonyi-Gasic, Erika; Gambi, Francesco; Hanley, Jenny; Yao, Jia-Long; Cheung, Joey; David, Karine M; Warren, Ben; Marsh, Ken; Snowden, Kimberley C; Lin-Wang, Kui; Brian, Lara; Martinez-Sanchez, Marcela; Wang, Mindy; Ileperuma, Nadeesha; Macnee, Nikolai; Campin, Robert; McAtee, Peter; Drummond, Revel S M; Espley, Richard V; Ireland, Hilary S; Wu, Rongmei; Atkinson, Ross G; Karunairetnam, Sakuntala; Bulley, Sean; Chunkath, Shayhan; Hanley, Zac; Storey, Roy; Thrimawithana, Amali H; Thomson, Susan; David, Charles; Testolin, Raffaele; Huang, Hongwen; Hellens, Roger P; Schaffer, Robert J

    2018-04-16

    Most published genome sequences are drafts, and most are dominated by computational gene prediction. Draft genomes typically incorporate considerable sequence data that are not assigned to chromosomes, and predicted genes without quality confidence measures. The current Actinidia chinensis (kiwifruit) 'Hongyang' draft genome has 164 Mb of sequences unassigned to pseudo-chromosomes, and omissions have been identified in the gene models. A second genome of an A. chinensis (genotype Red5) was fully sequenced. This new sequence resulted in a 554.0 Mb assembly with all but 6 Mb assigned to pseudo-chromosomes. Pseudo-chromosomal comparisons showed a considerable number of translocation events have occurred following a whole genome duplication (WGD) event some consistent with centromeric Robertsonian-like translocations. RNA sequencing data from 12 tissues and ab initio analysis informed a genome-wide manual annotation, using the WebApollo tool. In total, 33,044 gene loci represented by 33,123 isoforms were identified, named and tagged for quality of evidential support. Of these 3114 (9.4%) were identical to a protein within 'Hongyang' The Kiwifruit Information Resource (KIR v2). Some proportion of the differences will be varietal polymorphisms. However, as most computationally predicted Red5 models required manual re-annotation this proportion is expected to be small. The quality of the new gene models was tested by fully sequencing 550 cloned 'Hort16A' cDNAs and comparing with the predicted protein models for Red5 and both the original 'Hongyang' assembly and the revised annotation from KIR v2. Only 48.9% and 63.5% of the cDNAs had a match with 90% identity or better to the original and revised 'Hongyang' annotation, respectively, compared with 90.9% to the Red5 models. Our study highlights the need to take a cautious approach to draft genomes and computationally predicted genes. 
    Our use of the manual annotation tool WebApollo facilitated manual checking and correction of gene models enabling improvement of computational prediction. This utility was especially relevant for certain types of gene families such as the EXPANSIN like genes. Finally, this high quality gene set will supply the kiwifruit and general plant community with a new tool for genomics and other comparative analysis.

  4. Metagenomic Analysis of a Biphenyl-Degrading Soil Bacterial Consortium Reveals the Metabolic Roles of Specific Populations

    PubMed Central

    Garrido-Sanz, Daniel; Manzano, Javier; Martín, Marta; Redondo-Nieto, Miguel; Rivilla, Rafael

    2018-01-01

    Polychlorinated biphenyls (PCBs) are widespread persistent pollutants that cause several adverse health effects. Aerobic bioremediation of PCBs involves the activity of either one bacterial species or a microbial consortium. Using multiple species will enhance the range of PCB congeners co-metabolized since different PCB-degrading microorganisms exhibit different substrate specificity. We have isolated a bacterial consortium by successive enrichment culture using biphenyl (analog of PCBs) as the sole carbon and energy source. This consortium is able to grow on biphenyl, benzoate, and protocatechuate. Whole-community DNA extracted from the consortium was used to analyze biodiversity by Illumina sequencing of a 16S rRNA gene amplicon library and to determine the metagenome by whole-genome shotgun Illumina sequencing. Biodiversity analysis shows that the consortium consists of 24 operational taxonomic units (≥97% identity). The consortium is dominated by strains belonging to the genus Pseudomonas, but also contains betaproteobacteria and Rhodococcus strains. Whole-genome shotgun (WGS) analysis resulted in contigs containing 78.3 Mbp of sequenced DNA, representing around 65% of the expected DNA in the consortium. Bioinformatic analysis of this metagenome has identified the genes encoding the enzymes implicated in three pathways for the conversion of biphenyl to benzoate and five pathways from benzoate to tricarboxylic acid (TCA) cycle intermediates, allowing us to model the whole biodegradation network. By genus assignment of coding sequences, we have also been able to determine that the three biphenyl to benzoate pathways are carried out by Rhodococcus strains.
    In turn, strains belonging to Pseudomonas and Bordetella are mainly responsible for three of the benzoate to TCA pathways, while the benzoate conversion into TCA cycle intermediates via benzoyl-CoA and the catechol meta-cleavage pathways are carried out by betaproteobacteria belonging to genera such as Achromobacter and Variovorax. We have isolated a Rhodococcus strain, WAY2, from the consortium, which contains the genes encoding the three biphenyl to benzoate pathways, indicating that this strain is responsible for all the biphenyl to benzoate transformations. The presented results show that metagenomic analysis of consortia allows the identification of bacteria active in biodegradation processes and the assignment of specific reactions and pathways to specific bacterial groups. PMID:29497412

  5. Stratification of co-evolving genomic groups using ranked phylogenetic profiles

    PubMed Central

    Freilich, Shiri; Goldovsky, Leon; Gottlieb, Assaf; Blanc, Eric; Tsoka, Sophia; Ouzounis, Christos A

    2009-01-01

    Background Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with a significant impact to genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database. Results The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach - a combination of sequence searches, statistical estimation and clustering - analyses the degree of sequence divergence between sets of protein sequences and allows the classification of protein sequences according to the species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples. Conclusion Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and the detection of species groups in metagenomics samples. PMID:19860884

  6. Orthology prediction methods: a quality assessment using curated protein families.

    PubMed

    Trachana, Kalliopi; Larsson, Tomas A; Powell, Sean; Chen, Wei-Hua; Doerks, Tobias; Muller, Jean; Bork, Peer

    2011-10-01

    The increasing number of sequenced genomes has prompted the development of several automated orthology prediction methods. Tests to evaluate the accuracy of predictions and to explore biases caused by biological and technical factors are therefore required. We used 70 manually curated families to analyze the performance of five public methods in Metazoa. We analyzed the strengths and weaknesses of the methods and quantified the impact of biological and technical challenges. From the latter part of the analysis, genome annotation emerged as the largest single influencer, affecting up to 30% of the performance. Generally, most methods did well in assigning orthologous group but they failed to assign the exact number of genes for half of the groups. The publicly available benchmark set (http://eggnog.embl.de/orthobench/) should facilitate the improvement of current orthology assignment protocols, which is of utmost importance for many fields of biology and should be tackled by a broad scientific community. Copyright © 2011 WILEY Periodicals, Inc.

  7. The first Taxus rhizosphere microbiome revealed by shotgun metagenomic sequencing.

    PubMed

    Hao, Da-Cheng; Zhang, Cai-Rong; Xiao, Pei-Gen

    2018-06-01

    In the present study, the shotgun high throughput metagenomic sequencing was implemented to globally capture the features of Taxus rhizosphere microbiome. Total reads could be assigned to 6925 species belonging to 113 bacteria phyla and 301 species of nine fungi phyla. For archaea and virus, 263 and 134 species were for the first time identified, respectively. More than 720,000 Unigenes were identified by clean reads assembly. The top five assigned phyla were Actinobacteria (363,941 Unigenes), Proteobacteria (182,053), Acidobacteria (44,527), Ascomycota (fungi; 18,267), and Chloroflexi (15,539). KEGG analysis predicted numerous functional genes; 7101 Unigenes belong to "Xenobiotics biodegradation and metabolism." A total of 12,040 Unigenes involved in defense mechanisms (e.g., xenobiotic metabolism) were annotated by eggNOG. Talaromyces addition could influence not only the diversity and structure of microbial communities of Taxus rhizosphere, but also the relative abundance of functional genes, including metabolic genes, antibiotic resistant genes, and genes involved in pathogen-host interaction, bacterial virulence, and bacterial secretion system. The structure and function of rhizosphere microbiome could be sensitive to non-native microbe addition, which could impact on the pollutant degradation. This study, complementary to the amplicon sequencing, more objectively reflects the native microbiome of Taxus rhizosphere and its response to environmental pressure, and lays a foundation for potential combination of phytoremediation and bioaugmentation. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  8. Defining objective clusters for rabies virus sequences using affinity propagation clustering

    PubMed Central

    Fischer, Susanne; Freuling, Conrad M.; Pfaff, Florian; Bodenhofer, Ulrich; Höper, Dirk; Fischer, Mareike; Marston, Denise A.; Fooks, Anthony R.; Mettenleiter, Thomas C.; Conraths, Franz J.; Homeier-Bachmann, Timo

    2018-01-01

    Rabies is caused by lyssaviruses, and is one of the oldest known zoonoses. In recent years, more than 21,000 nucleotide sequences of rabies viruses (RABV), from the prototype species rabies lyssavirus, have been deposited in public databases. Subsequent phylogenetic analyses in combination with metadata suggest geographic distributions of RABV. However, these analyses face technical difficulties in defining verifiable criteria for cluster allocations in phylogenetic trees, inviting a more rational approach. Therefore, we applied a relatively new mathematical clustering algorithm named ‘affinity propagation clustering’ (AP) to propose a standardized sub-species classification utilizing full-genome RABV sequences. Because AP has the advantage that it is computationally fast and works for any meaningful measure of similarity between data samples, it has previously been applied successfully in bioinformatics, for analysis of microarray and gene expression data; however, cluster analysis of sequences is still in its infancy. Existing (516) and original (46) full genome RABV sequences were used to demonstrate the application of AP for RABV clustering. On a global scale, AP proposed four clusters, i.e. New World cluster, Arctic/Arctic-like, Cosmopolitan, and Asian as previously assigned by phylogenetic studies. By combining AP with established phylogenetic analyses, it is possible to resolve phylogenetic relationships between verifiably determined clusters and sequences. This workflow will be useful in confirming cluster distributions in a uniform transparent manner, not only for RABV, but also for other comparative sequence analyses. PMID:29357361

  9. HUNT: launch of a full-length cDNA database from the Helix Research Institute.

    PubMed

    Yudate, H T; Suwa, M; Irie, R; Matsui, H; Nishikawa, T; Nakamura, Y; Yamaguchi, D; Peng, Z Z; Yamamoto, T; Nagai, K; Hayashi, K; Otsuki, T; Sugiyama, T; Ota, T; Suzuki, Y; Sugano, S; Isogai, T; Masuho, Y

    2001-01-01

    The Helix Research Institute (HRI) in Japan is releasing 4356 HUman Novel Transcripts and related information in the newly established HUNT database. The institute is a joint research project principally funded by the Japanese Ministry of International Trade and Industry, and the clones were sequenced in the governmental New Energy and Industrial Technology Development Organization (NEDO) Human cDNA Sequencing Project. The HUNT database contains an extensive amount of annotation from advanced analysis and represents an essential bioinformatics contribution towards understanding of the gene function. The HRI human cDNA clones were obtained from full-length enriched cDNA libraries constructed with the oligo-capping method and have resulted in novel full-length cDNA sequences. A large fraction has little similarity to any proteins of known function and to obtain clues about possible function we have developed original analysis procedures. Any putative function deduced here can be validated or refuted by complementary analysis results. The user can also extract information from specific categories like PROSITE patterns, PFAM domains, PSORT localization, transmembrane helices and clones with GENIUS structure assignments. The HUNT database can be accessed at http://www.hri.co.jp/HUNT.

  10. Divergent Cryptosporidium parvum subtype and Enterocytozoon bieneusi genotypes in dromedary camels in Algeria.

    PubMed

    Baroudi, Djamel; Zhang, Hongwei; Amer, Said; Khelef, Djamel; Roellig, Dawn M; Wang, Yuanfei; Feng, Yaoyu; Xiao, Lihua

    2018-03-01

    Little information is available on the occurrence of the zoonotic protists Cryptosporidium spp. and none on Enterocytozoon bieneusi in camels. This preliminary study was conducted to examine the identity of Cryptosporidium subtypes and E. bieneusi genotypes in dromedary camels in Algeria. A total of 39 fecal specimens were collected from young camels. PCR-sequence analysis of the small subunit rRNA was used to detect and genotype Cryptosporidium spp. Cryptosporidium parvum present was further subtyped by sequence analysis of the 60 kDa glycoprotein gene. PCR-sequence analysis of the ribosomal internal transcribed spacer gene was used to detect and genotype E. bieneusi. Altogether, two and eight of the specimens analyzed were positive for C. parvum and E. bieneusi, respectively. The former was identified as a new subtype that is genetically related to the C. hominis If subtype family, whereas the latter was identified as two related genotypes (Macaque1 and a novel genotype) in the newly assigned E. bieneusi genotype group 8. Although they are not known hosts for C. parvum and E. bieneusi, camels are apparently infected with genetically distinct variants of these pathogens.

  11. Classifying proteins into functional groups based on all-versus-all BLAST of 10 million proteins.

    PubMed

    Kolker, Natali; Higdon, Roger; Broomall, William; Stanberry, Larissa; Welch, Dean; Lu, Wei; Haynes, Winston; Barga, Roger; Kolker, Eugene

    2011-01-01

    To address the monumental challenge of assigning function to millions of sequenced proteins, we completed the first-of-a-kind all-versus-all sequence alignments using BLAST for 9.9 million proteins in the UniRef100 database. Microsoft Windows Azure produced over 3 billion filtered records in 6 days using 475 eight-core virtual machines. Protein classification into functional groups was then performed using Hive and custom jars implemented on top of Apache Hadoop utilizing the MapReduce paradigm. First, using the Clusters of Orthologous Genes (COG) database, a length-normalized bit score (LNBS) was determined to be the best similarity measure for classification of proteins. LNBS achieved sensitivity and specificity of 98% each. Second, out of 5.1 million bacterial proteins, about two-thirds were assigned to significantly extended COG groups, encompassing 30 times more assigned proteins. Third, the remaining proteins were classified into protein functional groups using an innovative implementation of a single-linkage algorithm on an in-house Hadoop compute cluster. This implementation significantly reduces the run time for nonindexed queries and optimizes efficient clustering on a large scale. The performance was also verified on Amazon Elastic MapReduce. This clustering assigned nearly 2 million proteins to approximately half a million different functional groups. A similar approach was applied to classify 2.8 million eukaryotic sequences, resulting in over 1 million proteins being assigned to existing KOG groups and the remainder clustered into 100,000 functional groups.
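
    The single-linkage step described in this abstract can be illustrated with a small in-memory sketch. It is not the authors' Hadoop implementation. The abstract does not define LNBS, so the normalization used here (bit score divided by the longer of the two sequence lengths), the threshold, and all names and data are assumptions made for illustration only.

```python
from collections import defaultdict


def lnbs(bit_score, query_len, subject_len):
    # ASSUMPTION: the abstract names a "length-normalized bit score" but gives
    # no formula; dividing by the longer sequence length is one plausible choice.
    return bit_score / max(query_len, subject_len)


def single_linkage(hits, lengths, threshold):
    """Group proteins so that any hit scoring >= threshold links two clusters."""
    parent = {name: name for name in lengths}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, bit_score in hits:
        if lnbs(bit_score, lengths[query], lengths[subject]) >= threshold:
            root_q, root_s = find(query), find(subject)
            if root_q != root_s:
                parent[root_s] = root_q

    clusters = defaultdict(set)
    for name in lengths:
        clusters[find(name)].add(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy data: sequence lengths and (query, subject, bit score) hits.
lengths = {"A": 100, "B": 120, "C": 110, "D": 300}
hits = [("A", "B", 180.0), ("B", "C", 170.0), ("C", "D", 60.0)]
clusters = single_linkage(hits, lengths, threshold=1.0)
print(clusters)
```

    In single linkage, A and C end up in the same group even though they share no direct hit, because both are linked to B; D stays alone because its only hit falls below the assumed threshold.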

  12. Isoform-level gene expression patterns in single-cell RNA-sequencing data.

    PubMed

    Vu, Trung Nghia; Wills, Quin F; Kalari, Krishna R; Niu, Nifang; Wang, Liewei; Pawitan, Yudi; Rantalainen, Mattias

    2018-02-27

    RNA sequencing of single cells enables characterization of transcriptional heterogeneity in seemingly homogeneous cell populations. Single-cell sequencing has been applied in a wide range of research fields. However, few studies have focused on characterization of isoform-level expression patterns at the single-cell level. In this study we propose and apply a novel method, ISOform-Patterns (ISOP), based on mixture modeling, to characterize the expression patterns of isoform pairs from the same gene in single-cell isoform-level expression data. We define six principal patterns of isoform expression relationships and describe a method for differential-pattern analysis. We demonstrate ISOP through analysis of single-cell RNA-sequencing data from a breast cancer cell line, with replication in three independent datasets. We assigned the pattern types to each of 16,562 isoform-pairs from 4,929 genes. Among those, 26% of the discovered patterns were significant (p<0.05), while the remaining patterns are possibly effects of transcriptional bursting, drop-out and stochastic biological heterogeneity. Furthermore, 32% of genes discovered through differential-pattern analysis were not detected by differential-expression analysis. The effect of drop-out events, mean expression level, and properties of the expression distribution on the performance of ISOP were also investigated through simulated datasets. To conclude, ISOP provides a novel approach for characterization of isoform-level preference, commitment and heterogeneity in single-cell RNA-sequencing data. The ISOP method has been implemented as an R package and is available at https://github.com/nghiavtr/ISOP under a GPL-3 license. Contact: mattias.rantalainen@ki.se. Supplementary data are available at Bioinformatics online.

  13. [Sequence-based typing of enviromental Legionella pneumophila isolates in Guangzhou].

    PubMed

    Zhang, Ying; Qu, Pinghua; Zhang, Jian; Chen, Shouyi

    2011-03-01

    The aim was to characterize the genes of Legionella pneumophila isolated from different water sources in Guangzhou from 2006 to 2009 and to genotype the strains using the sequence-based typing (SBT) scheme. In total, 44 L. pneumophila strains were typed by SBT with 7 diversifying genes (flaA, asd, mip, pilE, mompS, proA and neuA). The amplicon sequences were analyzed against the European Working Group for Legionella Infections (EWGLI) international SBT database to obtain the allelic profiles and sequence types (STs). Serogroups were typed by latex agglutination test. Data from SBT revealed a high diversity among the strains, and ST01 accounted for 30% (13/44). Fifteen new STs were discovered from 20 STs and 2 of them were newly assigned (ST887 and ST888) by EWGLI. An SBT phylogenetic tree was generated by the SplitsTree and BURST programs. High diversity and specificity were observed among the L. pneumophila strains in Guangzhou. SBT is useful for L. pneumophila genomic study and epidemiological surveillance.

  14. Characterization of HIV Type 1 Envelope Sequence Among Viral Isolates Circulating in the Northern Region of Colombia, South America

    PubMed Central

    Villarreal, José-Luis; Gutiérrez, Jaime; Palacio, Lucy; Peñuela, Martha; Hernández, Robin; Lemay, Guy

    2012-01-01

    To characterize human immunodeficiency virus (HIV-1) strains circulating in the Northern region of Colombia in South America, sequences of the viral envelope C2V3C3 region were obtained from patients with different high-risk practices. Close to 60% of the sequences were predicted to belong to macrophage-tropic viruses, according to the positions of acidic amino acids and putative N-linked glycosylation sites. This is in agreement with the fact that most of the patients were recently diagnosed individuals. Phylogenetic analysis then allowed assignment of all 35 samples to subtype B viruses. This same subtype was found in previous studies carried out in other Colombian regions. This study thus expands previous analyses with previously missing data from the Northern region of the country. The number and the length of the sequences examined also help to provide a clearer picture of the prevailing situation of the present HIV epidemic in this country. PMID:22482735

  15. Precise assignment of the heavy-strand promoter of mouse mitochondrial DNA: cognate start sites are not required for transcriptional initiation.

    PubMed Central

    Chang, D D; Clayton, D A

    1986-01-01

    Transcription of the heavy strand of mouse mitochondrial DNA starts from two closely spaced, distinct sites located in the displacement loop region of the genome. We report here an analysis of regulatory sequences required for faithful transcription from these two sites. Data obtained from in vitro assays demonstrated that a 51-base-pair region, encompassing nucleotides -40 to +11 of the downstream start site, contains sufficient information for accurate transcription from both start sites. Deletion of the 3' flanking sequences, including one or both start sites to -17, resulted in the initiation of transcription by the mitochondrial RNA polymerase from alternative sites within vector DNA sequences. This feature places the mouse heavy-strand promoter uniquely among other known mitochondrial promoters, all of which absolutely require cognate start sites for transcription. Comparison of the heavy-strand promoter with those of other vertebrate mitochondrial DNAs revealed a remarkably high rate of sequence divergence among species. PMID:3785226

  16. Ribosomal subunit protein typing using matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) for the identification and discrimination of Aspergillus species.

    PubMed

    Nakamura, Sayaka; Sato, Hiroaki; Tanaka, Reiko; Kusuya, Yoko; Takahashi, Hiroki; Yaguchi, Takashi

    2017-04-26

    Accurate identification of Aspergillus species is a very important subject. Mass spectral fingerprinting using matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) is generally employed for the rapid identification of fungal isolates. However, the results are based on simple mass spectral pattern-matching, with no peak assignment and no taxonomic input. We propose here a ribosomal subunit protein (RSP) typing technique using MALDI-TOF MS for the identification and discrimination of Aspergillus species. The results are concluded to be phylogenetic in that they reflect the molecular evolution of housekeeping RSPs. The amino acid sequences of RSPs of genome-sequenced strains of Aspergillus species were first verified and compared to compile a reliable biomarker list for the identification of Aspergillus species. In this process, we revealed that many amino acid sequences of RSPs (about 10-60%, depending on strain) registered in the public protein databases needed to be corrected or newly added. The verified RSPs were allocated to RSP types based on their mass. Peak assignments of RSPs of each sample strain as observed by MALDI-TOF MS were then performed to set RSP type profiles, which were then further processed by means of cluster analysis. The resulting dendrogram based on RSP types showed a relatively good concordance with the tree based on β-tubulin gene sequences. RSP typing was able to further discriminate the strains belonging to Aspergillus section Fumigati. The RSP typing method could be applied to identify Aspergillus species, even for species within section Fumigati. The discrimination power of RSP typing appears to be comparable to conventional β-tubulin gene analysis. This method would therefore be suitable for species identification and discrimination at the strain to species level. Because RSP typing can characterize the strains within section Fumigati, this method has potential as a powerful and reliable tool in the field of clinical microbiology.

  17. Metagenomic Analysis of the Rumen Microbiome of Steers with Wheat-Induced Frothy Bloat.

    PubMed

    Pitta, D W; Pinchak, W E; Indugu, N; Vecchiarelli, B; Sinha, R; Fulford, J D

    2016-01-01

    Frothy bloat is a serious metabolic disorder that affects stocker cattle grazing hard red winter wheat forage in the Southern Great Plains causing reduced performance, morbidity, and mortality. We hypothesize that a microbial dysbiosis develops in the rumen microbiome of stocker cattle when grazing on high quality winter wheat pasture that predisposes them to frothy bloat risk. In this study, rumen contents were harvested from six cannulated steers grazing hard red winter wheat (three with bloat score "2" and three with bloat score "0"), extracted for genomic DNA and subjected to 16S rDNA and shotgun sequencing on the 454/Roche platform. Approximately 1.5 million reads were sequenced, assembled and assigned for phylogenetic and functional annotations. Bacteria accounted for up to 84% of the sequences while archaea contributed nearly 5% of the sequences. The abundance of archaea was higher in bloated animals (P < 0.05) and dominated by Methanobrevibacter. Predominant bacterial phyla were Firmicutes (65%), Actinobacteria (13%), Bacteroidetes (10%), and Proteobacteria (6%) across all samples. Genera from Firmicutes such as Clostridium, Eubacterium, and Butyrivibrio increased (P < 0.05) while Prevotella from Bacteroidetes decreased in bloated samples. Co-occurrence analysis revealed syntrophic associations between bacteria and archaea in non-bloated samples; however, such interactions faded in bloated samples. Functional annotations of assembled reads to the Subsystems database revealed the abundance of several metabolic pathways, with carbohydrate and protein metabolism well represented. Assignment of contigs to the CaZy database revealed a greater diversity of Glycosyl Hydrolases dominated by oligosaccharide breaking enzymes (>70%) in non-bloated samples. However, the abundance and diversity of CaZymes were greatly reduced in bloated samples, indicating the disruption of carbohydrate metabolism. We conclude that mild to moderate frothy bloat results from tradeoffs both within and between microbial domains due to greater competition for substrates that are of limited availability as a result of biofilm formation.

  18. Metagenomic Analysis of the Rumen Microbiome of Steers with Wheat-Induced Frothy Bloat

    PubMed Central

    Pitta, D. W.; Pinchak, W. E.; Indugu, N.; Vecchiarelli, B.; Sinha, R.; Fulford, J. D.

    2016-01-01

    Frothy bloat is a serious metabolic disorder that affects stocker cattle grazing hard red winter wheat forage in the Southern Great Plains causing reduced performance, morbidity, and mortality. We hypothesize that a microbial dysbiosis develops in the rumen microbiome of stocker cattle when grazing on high quality winter wheat pasture that predisposes them to frothy bloat risk. In this study, rumen contents were harvested from six cannulated steers grazing hard red winter wheat (three with bloat score “2” and three with bloat score “0”), extracted for genomic DNA and subjected to 16S rDNA and shotgun sequencing on the 454/Roche platform. Approximately 1.5 million reads were sequenced, assembled and assigned for phylogenetic and functional annotations. Bacteria accounted for up to 84% of the sequences while archaea contributed nearly 5% of the sequences. The abundance of archaea was higher in bloated animals (P < 0.05) and dominated by Methanobrevibacter. Predominant bacterial phyla were Firmicutes (65%), Actinobacteria (13%), Bacteroidetes (10%), and Proteobacteria (6%) across all samples. Genera from Firmicutes such as Clostridium, Eubacterium, and Butyrivibrio increased (P < 0.05) while Prevotella from Bacteroidetes decreased in bloated samples. Co-occurrence analysis revealed syntrophic associations between bacteria and archaea in non-bloated samples; however, such interactions faded in bloated samples. Functional annotations of assembled reads to the Subsystems database revealed the abundance of several metabolic pathways, with carbohydrate and protein metabolism well represented. Assignment of contigs to the CaZy database revealed a greater diversity of Glycosyl Hydrolases dominated by oligosaccharide breaking enzymes (>70%) in non-bloated samples. However, the abundance and diversity of CaZymes were greatly reduced in bloated samples, indicating the disruption of carbohydrate metabolism. We conclude that mild to moderate frothy bloat results from tradeoffs both within and between microbial domains due to greater competition for substrates that are of limited availability as a result of biofilm formation. PMID:27242715

  19. Sequencing, de novo assembly and characterization of the spotted scat Scatophagus argus (Linnaeus 1766) transcriptome for discovery of reproduction related genes and SSRs

    NASA Astrophysics Data System (ADS)

    Yang, Wei; Chen, Huapu; Cui, Xuefan; Zhang, Kewei; Jiang, Dongneng; Deng, Siping; Zhu, Chunhua; Li, Guangli

    2017-09-01

    Spotted scat (Scatophagus argus) is an economically important farmed fish, particularly in East and Southeast Asia. Because there has been little research on reproductive development and regulation in this species, the lack of a mature artificial reproduction technology remains a barrier for the sustainable development of the aquaculture industry. More genetic and genomic background knowledge is urgently needed for an in-depth understanding of the molecular mechanism of reproductive process and identification of functional genes related to sexual differentiation, gonad maturation and gametogenesis. For these reasons, we performed transcriptomic analysis on spotted scat using a multiple tissue sample mixing strategy. The Illumina RNA sequencing generated 118 510 486 raw reads. After trimming, de novo assembly was performed and yielded 99 888 unigenes with an average length of 905.75 bp. A total of 45 015 unigenes were successfully annotated to the Nr, Swiss-Prot, KOG and KEGG databases. Additionally, 23 783 and 27 183 annotated unigenes were assigned to 56 Gene Ontology (GO) functional groups and 228 KEGG pathways, respectively. Subsequently, 2 474 transcripts associated with reproduction were selected using GO term and KEGG pathway assignments, and a number of reproduction-related genes involved in sex differentiation, gonad development and gametogenesis were identified. Furthermore, 22 279 simple sequence repeat (SSR) loci were discovered and characterized. The comprehensive transcript dataset described here greatly increases the genetic information available for spotted scat and contributes valuable sequence resources for functional gene mining and analysis. Candidate transcripts involved in reproduction would make good starting points for future studies on reproductive mechanisms, and the putative sex differentiation-related genes will be helpful for sex-determining gene identification and sex-specific marker isolation. Lastly, the SSRs can serve as marker resources for future research into genetics, marker-assisted selection (MAS) and conservation biology.

  20. Monitoring of microbial communities in anaerobic digestion sludge for biogas optimisation.

    PubMed

    Lim, Jun Wei; Ge, Tianshu; Tong, Yen Wah

    2018-01-01

    This study characterised and compared the microbial communities of anaerobic digestion (AD) sludge using three different methods - (1) Clone library; (2) Pyrosequencing; and (3) Terminal restriction fragment length polymorphism (T-RFLP). Although high-throughput sequencing techniques are becoming increasingly popular and affordable, reliance on such techniques for frequent monitoring of microbial communities may be a financial burden for some. Furthermore, the depth of microbial analysis revealed by high-throughput sequencing may not be required for monitoring purposes. This study aims to develop a rapid, reliable and economical approach for the monitoring of microbial communities in AD sludge. A combined approach where genetic information of sequences from clone library was used to assign phylogeny to T-RFs determined experimentally was developed in this study. In order to assess the effectiveness of the combined approach, microbial communities determined by the combined approach were compared to those characterised by pyrosequencing. Results showed that both pyrosequencing and clone library methods determined the dominant bacterial phyla to be Proteobacteria, Firmicutes, Bacteroidetes, and Thermotogae. Both methods also found that sludge A and B were dominated by acetogenic methanogens followed by hydrogenotrophic methanogens. The number of OTUs detected by T-RFLP was significantly lower than that detected by the clone library. In this study, T-RFLP analysis identified the majority of the dominant species of the archaeal consortia. However, many of the more highly diverse bacterial consortia were missed. Nevertheless, the combined approach developed in this study where clone sequences from the clone library were used to assign phylogeny to T-RFs determined experimentally managed to accurately predict the same dominant microbial groups for both sludge A and sludge B, as compared to the pyrosequencing results. Results showed that the combined approach of clone library and T-RFLP accurately predicted the dominant microbial groups and thus is a reliable and more economical way to monitor the evolution of microbial systems in AD sludge. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. Trial to assess the utility of genetic sequencing to improve patient outcomes

    Cancer.gov

    A pilot trial to assess whether assigning treatment based on specific gene mutations can provide benefit to patients with metastatic solid tumors is being launched this month by the NCI. The Molecular Profiling based Assignment of Cancer Therapeutics, or

  2. The Gap Procedure: for the identification of phylogenetic clusters in HIV-1 sequence data.

    PubMed

    Vrbik, Irene; Stephens, David A; Roger, Michel; Brenner, Bluma G

    2015-11-04

    In the context of infectious disease, sequence clustering can be used to provide important insights into the dynamics of transmission. Cluster analysis is usually performed using a phylogenetic approach whereby clusters are assigned on the basis of sufficiently small genetic distances and high bootstrap support (or posterior probabilities). The computational burden involved in this phylogenetic threshold approach is a major drawback, especially when a large number of sequences are being considered. In addition, this method requires a skilled user to specify the appropriate threshold values, which may vary widely depending on the application. This paper presents the Gap Procedure, a distance-based clustering algorithm for the classification of DNA sequences sampled from individuals infected with the human immunodeficiency virus type 1 (HIV-1). Our heuristic algorithm bypasses the need for phylogenetic reconstruction, thereby supporting the quick analysis of large genetic data sets. Moreover, this fully automated procedure relies on data-driven gaps in sorted pairwise distances to infer clusters, so no user-specified threshold values are required. The clustering results obtained by the Gap Procedure on both real and simulated data closely agree with those found using the threshold approach, while only requiring a fraction of the time to complete the analysis. Apart from the dramatic gains in computational time, the Gap Procedure is highly effective in finding distinct groups of genetically similar sequences and obviates the need for subjective user-specified values. The clusters of genetically similar sequences returned by this procedure can be used to detect patterns in HIV-1 transmission and thereby aid in the prevention, treatment and containment of the disease.
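
    The idea of using "data-driven gaps in sorted pairwise distances" instead of a user-chosen threshold can be sketched as follows. This is one simplified reading of that sentence, not the published Gap Procedure: it assumes equal-length sequences, uses Hamming distance, takes the single widest gap among all sorted pairwise distances as the cutoff, and then links every pair at or below that cutoff. All function names and sequences are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    # Number of mismatched positions; assumes equal-length aligned sequences.
    return sum(x != y for x, y in zip(a, b))


def gap_clusters(seqs):
    """Cluster sequences using the widest gap in sorted pairwise distances."""
    names = sorted(seqs)
    pairs = [(hamming(seqs[a], seqs[b]), a, b) for a, b in combinations(names, 2)]
    dists = sorted(d for d, _, _ in pairs)
    if len(dists) < 2:
        return [[n] for n in names]  # too few distances to define a gap

    # Index of the widest jump between consecutive sorted distances
    # (the first such jump is used when there are ties).
    idx = max(range(len(dists) - 1), key=lambda i: dists[i + 1] - dists[i])
    cutoff = dists[idx]  # largest distance still below the gap

    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair whose distance does not exceed the data-driven cutoff.
    for d, a, b in pairs:
        if d <= cutoff:
            parent[find(b)] = find(a)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


# Hypothetical toy sequences forming two tight groups.
seqs = {"s1": "AAAAAAAA", "s2": "AAAAAAAT", "s3": "CCCCGGGG", "s4": "CCCCGGGT"}
result = gap_clusters(seqs)
print(result)
```

    Here the sorted distances are 1, 1, 7, 8, 8, 8, so the widest gap lies between 1 and 7 and only pairs at distance 1 are linked. No threshold is supplied by the user, which is the property the abstract emphasizes.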

  3. Comparative Genome Analysis of “Candidatus Phytoplasma australiense” (Subgroup tuf-Australia I; rp-A) and “Ca. Phytoplasma asteris” Strains OY-M and AY-WB

    PubMed Central

    Tran-Nguyen, L. T. T.; Kube, M.; Schneider, B.; Reinhardt, R.; Gibb, K. S.

    2008-01-01

    The chromosome sequence of “Candidatus Phytoplasma australiense” (subgroup tuf-Australia I; rp-A), associated with dieback in papaya, Australian grapevine yellows in grapevine, and several other important plant diseases, was determined. The circular chromosome is represented by 879,324 nucleotides, a GC content of 27%, and 839 protein-coding genes. Five hundred two of these protein-coding genes were functionally assigned, while 337 genes were hypothetical proteins with unknown function. Potential mobile units (PMUs) containing clusters of DNA repeats comprised 12.1% of the genome. These PMUs encoded genes involved in DNA replication, repair, and recombination; nucleotide transport and metabolism; translation; and ribosomal structure. Elements with similarities to phage integrases found in these mobile units were difficult to classify, as they were similar to both insertion sequences and bacteriophages. Comparative analysis of “Ca. Phytoplasma australiense” with “Ca. Phytoplasma asteris” strains OY-M and AY-WB showed that the gene order was more conserved between the closely related “Ca. Phytoplasma asteris” strains than to “Ca. Phytoplasma australiense.” Differences observed between “Ca. Phytoplasma australiense” and “Ca. Phytoplasma asteris” strains included the chromosome size (18,693 bp larger than OY-M), a larger number of genes with assigned function, and hypothetical proteins with unknown function. PMID:18359806

  4. Defining precision: The precision medicine initiative trials NCI-MPACT and NCI-MATCH.

    PubMed

    Coyne, Geraldine O'Sullivan; Takebe, Naoko; Chen, Alice P

    "Precision" trials, using rationally incorporated biomarker targets and molecularly selective anticancer agents, have become of great interest to both patients and their physicians. In the endeavor to test the cornerstone premise of precision oncotherapy, that is, determining if modulating a specific molecular aberration in a patient's tumor with a correspondingly specific therapeutic agent improves clinical outcomes, the design of clinical trials with embedded genomic characterization platforms that guide therapy is an increasing challenge. The National Cancer Institute Precision Medicine Initiative is an unprecedented large interdisciplinary collaborative effort to conceptualize and test the feasibility of trials incorporating sequencing platforms and large-scale bioinformatics processing that are not currently uniformly available to patients. National Cancer Institute-Molecular Profiling-based Assignment of Cancer Therapy and National Cancer Institute-Molecular Analysis for Therapy Choice are 2 genomic to phenotypic trials under this National Cancer Institute initiative, where treatment is selected according to predetermined genetic alterations detected using next-generation sequencing technology across a broad range of tumor types. In this article, we discuss the objectives and trial designs that have enabled the public-private partnerships required to complete the scale of both trials, as well as interim trial updates and strategic considerations that have driven data analysis and targeted therapy assignment, with the intent of elucidating further the benefits of this treatment approach for patients. Copyright © 2017. Published by Elsevier Inc.

  5. Due-Window Assignment Scheduling with Variable Job Processing Times

    PubMed Central

    Wu, Yu-Bin

    2015-01-01

    We consider a common due-window assignment scheduling problem with variable job processing times on a single machine, where the processing time of a job is a function of its position in a sequence (i.e., learning effect) or its starting time (i.e., deteriorating effect). The problem is to determine the optimal due-windows and the processing sequence simultaneously to minimize a cost function that includes earliness, tardiness, the window location, window size, and weighted number of tardy jobs. We prove that the problem can be solved in polynomial time. PMID:25918745
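
    To make the cost function in this abstract concrete, the sketch below evaluates one fixed job sequence against one common due window. It does not reproduce the paper's polynomial-time algorithm. The position-dependent model (actual time = base time × position^a with a ≤ 0, a common way to express a learning effect), the cost weights, and all names and numbers are assumptions made for illustration.

```python
def schedule_cost(base_times, order, a, d1, d2,
                  alpha, beta, gamma, delta, tardy_weights):
    """Cost of processing `order` on one machine with common due window [d1, d2].

    ASSUMED model: the job in position r takes base_time * r**a (a <= 0).
    ASSUMED cost:  alpha * earliness + beta * tardiness
                   + gamma * window start + delta * window size
                   + a fixed weight for every tardy job.
    """
    clock = 0.0
    cost = gamma * d1 + delta * (d2 - d1)
    for position, job in enumerate(order, start=1):
        clock += base_times[job] * position ** a  # completion time of this job
        if clock < d1:
            cost += alpha * (d1 - clock)           # finished before the window
        elif clock > d2:
            cost += beta * (clock - d2) + tardy_weights[job]  # finished late
    return cost


# Hypothetical instance: three jobs, learning exponent a = -1.
base_times = {"J1": 4.0, "J2": 6.0, "J3": 6.0}
tardy_weights = {"J1": 3.0, "J2": 3.0, "J3": 3.0}
cost = schedule_cost(base_times, ["J1", "J2", "J3"], a=-1, d1=5.0, d2=8.0,
                     alpha=1.0, beta=2.0, gamma=0.5, delta=0.5,
                     tardy_weights=tardy_weights)
print(cost)
```

    With these numbers the jobs complete at times 4, 7 and 9: the first is one unit early, the second falls inside the window, and the third is one unit late and counted as tardy. The optimization problem in the abstract is to choose the sequence and the window bounds jointly so that this kind of total is minimized.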

  6. Functional assignment of gene AAC16202.1 from Rhodobacter capsulatus SB1003: new insights into the bacterial SDR sorbitol dehydrogenases family.

    PubMed

    Sola-Carvajal, Agustín; García-García, María Inmaculada; Sánchez-Carrón, Guiomar; García-Carmona, Francisco; Sánchez-Ferrer, Alvaro

    2012-11-01

    Short-chain dehydrogenases/reductases (SDR) constitute one of the largest enzyme superfamilies with over 60,000 non-redundant sequences in the database, many of which need a correct functional assignment. Among them, the gene AAC16202.1 (NCBI) from Rhodobacter capsulatus SB1003 has been assigned in Uniprot both as a sorbitol dehydrogenase (#D5AUY1) and as an N-acetyl-d-mannosamine dehydrogenase (#O66112), both enzymes being of biotechnological interest. When the gene was overexpressed in Escherichia coli Rosetta (DE3)pLys, the purified enzyme was not active toward N-acetyl-d-mannosamine, whereas it was active toward d-sorbitol and d-fructose. However, the relative activities toward xylitol and l-iditol (0.45 and 6.9%, respectively) were low compared with that toward d-sorbitol. Thus, the enzyme could be considered a sorbitol dehydrogenase (SDH) with very low activity toward xylitol, which could increase its biotechnological interest for determining sorbitol without the unspecific cross-determination of added xylitol in food and pharma compositions. The tetrameric enzyme (120 kDa) showed similar catalytic efficiency (2.2 × 10³ M⁻¹ s⁻¹) to other sorbitol dehydrogenases for d-sorbitol, with an optimum pH of 9.0 and an optimum temperature of 37 °C. The enzyme was also more thermostable than other reported SDH, ammonium sulfate being the best stabilizer in this respect, increasing the melting temperature (Tm) up to 52.9 °C. The enzyme can also be considered as a new member of the Zn²⁺-independent SDH family since no effect on activity was detected in the presence of divalent cations or chelating agents. Finally, its in silico analysis enabled the specific conserved sequence blocks that are the fingerprints of bacterial sorbitol dehydrogenases, mainly located at the C-terminal of the protein, to be determined for the first time. This knowledge will facilitate future data curation of present databases and a better functional assignment of newly described sequences. Copyright © 2012 Elsevier Masson SAS. All rights reserved.

  7. Methanococcus jannaschii genome: revisited

    NASA Technical Reports Server (NTRS)

    Kyrpides, N. C.; Olsen, G. J.; Klenk, H. P.; White, O.; Woese, C. R.

    1996-01-01

    Analysis of genomic sequences is necessarily an ongoing process. Initial gene assignments tend (wisely) to be on the conservative side (Venter, 1996). The analysis of the genome then grows in an iterative fashion as additional data and more sophisticated algorithms are brought to bear on the data. The present report is an emendation of the original gene list of Methanococcus jannaschii (Bult et al., 1996). By using a somewhat more updated database and more relaxed (and operator-intensive) pattern matching methods, we were able to add significantly to, and in a few cases amend, the gene identification table originally published by Bult et al. (1996).

  8. Storage and utilization of HLA genomic data--new approaches to HLA typing.

    PubMed

    Helmberg, W

    2000-01-01

    Currently available DNA-based HLA typing assays can provide detailed information about sequence motifs of a tested sample. It is still a common practice, however, for information acquired by high-resolution sequence specific oligonucleotide probe (SSOP) typing or sequence specific priming (SSP) to be presented in a low-resolution serological format. Unfortunately, this representation can lead to significant loss of useful data in many cases. An alternative to assigning allele equivalents to such DNA typing results is simply to store the observed typing pattern and utilize the information with the help of Virtual DNA Analysis (VDA). Interpretation of the stored typing patterns can then be updated based on newly defined alleles, assuming the sequence motifs detected by the typing reagents are known. Rather than updating reagent specificities in individual laboratories, such updates should be performed in a central, publicly available sequence database. By referring to this database, HLA genomic data can then be stored and transferred between laboratories without loss of information. The 13th International Histocompatibility Workshop offers an ideal opportunity to begin building this common database for the entire human MHC.

  9. Leptospira species molecular epidemiology in the genomic era.

    PubMed

    Caimi, K; Repetto, S A; Varni, V; Ruybal, P

    2017-10-01

    Leptospirosis is a zoonotic disease whose global burden is increasing, often in relation to climatic change. Hundreds of whole genome sequences from worldwide isolates of Leptospira spp. are now available, together with online tools that allow MLST sequence types (STs) to be assigned directly from raw sequence data. In this work we have applied R7L-MLST to nearly 500 genomes and a globally distributed strain collection. All 10 pathogenic species as well as intermediate species were typed using this MLST scheme. The correlation observed between STs and serogroups in our previous work still holds with this larger dataset, supporting the implementation of MLST as a complementary approach to assist serological classification. Bayesian phylogenetic analysis of concatenated sequences from R7-MLST loci allowed us to resolve taxonomic inconsistencies but also showed that events such as recombination, gene conversion or lateral gene transfer played an important role in the evolution of the Leptospira genus. Whole genome sequencing allows us to contribute suitable epidemiologic information that is useful in the design of control strategies and in diagnostic methods for this illness. Copyright © 2017 Elsevier B.V. All rights reserved.

  10. GenBank.

    PubMed

    Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Sayers, Eric W

    2011-01-01

    GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 380,000 organisms named at the genus level or lower, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system that integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: www.ncbi.nlm.nih.gov.

  11. Identification of Microbial Profile of Koji Using Single Molecule, Real-Time Sequencing Technology.

    PubMed

    Hui, Wenyan; Hou, Qiangchuan; Cao, Chenxia; Xu, Haiyan; Zhen, Yi; Kwok, Lai-Yu; Sun, Tiansong; Zhang, Heping; Zhang, Wenyi

    2017-05-01

    Koji is a kind of Japanese traditional fermented starter that has been used for centuries. Many fermented foods are made from koji, such as sake, miso, and soy sauce. This study used the single molecule real-time sequencing technology (SMRT) to investigate the bacterial and fungal microbiota of 3 Japanese koji samples. After SMRT analysis, a total of 39121 high-quality sequences were generated, including 14354 bacterial and 24767 fungal sequence reads. The high-quality gene sequences were assigned to 5 bacterial and 2 fungal phyla, dominated by Proteobacteria and Ascomycota, respectively. At the genus level, Ochrobactrum and Wickerhamomyces were the most abundant bacterial and fungal genera, respectively. The predominant bacterial and fungal species were Ochrobactrum lupini and Wickerhamomyces anomalus, respectively. Our study profiled the microbiota composition of 3 Japanese koji samples with species-level precision. The results may be useful for further development of traditional fermented products, especially optimization of koji preparation. Meanwhile, this study has demonstrated that SMRT is a robust tool for analyzing the microbial composition in food samples. © 2017 Institute of Food Technologists®.

  12. Lactobacillus apodemi sp. nov., a tannase-producing species isolated from wild mouse faeces.

    PubMed

    Osawa, Ro; Fujisawa, Tomohiko; Pukall, Rüdiger

    2006-07-01

    A Gram-positive, rod-shaped, non-endospore-forming bacterium, strain ASB1(T), able to degrade tannin, was isolated from faeces of the Japanese large wood mouse, Apodemus speciosus. Comparative analysis of the 16S rRNA gene sequence revealed that the strain could be assigned as a member of the genus Lactobacillus. The nearest phylogenetic neighbours were determined as Lactobacillus animalis DSM 20602(T) (98.9 % 16S rRNA gene sequence similarity) and Lactobacillus murinus ASF 361 (98.9 %). Subsequent polyphasic analysis, including automated ribotyping and DNA-DNA hybridization experiments, confirmed that the isolate represents a novel species, for which the name Lactobacillus apodemi sp. nov. is proposed. The DNA G+C content of the novel strain is 38.5 mol%. The cell-wall peptidoglycan is of type A4alpha L-lys-D-asp. The type strain is ASB1(T) (=DSM 16634(T)=CIP 108913(T)).

  13. Multiple approaches to characterize the microbial community in a thermophilic anaerobic digester running on swine manure: a case study.

    PubMed

    Tuan, Nguyen Ngoc; Chang, Yi-Chia; Yu, Chang-Ping; Huang, Shir-Ly

    2014-01-01

    In this study, the first survey of the microbial community in a thermophilic anaerobic digester using swine manure as sole feedstock was performed by multiple approaches including denaturing gradient gel electrophoresis (DGGE), clone library and pyrosequencing techniques. The integrated analysis of 21 DGGE bands, 126 clones and 8506 pyrosequencing read sequences revealed that Clostridia from the phylum Firmicutes were the most dominant Bacteria. In addition, our analysis identified taxa that were missed by previous research, including members of the bacterial phyla Synergistetes, Planctomycetes, Armatimonadetes, Chloroflexi and Nitrospira, which might also play a role in the thermophilic anaerobic digester. In contrast to previous studies, most archaeal 16S rRNA sequences could be assigned to the order Methanobacteriales instead of Methanomicrobiales. In addition, this study reports a member of the genus Methanothermobacter in a thermophilic anaerobic digester for the first time. Copyright © 2014 Elsevier GmbH. All rights reserved.

  14. Molecular Characterization and Phylogenetic Analysis of Pseudomonas aeruginosa Isolates Recovered from Greek Aquatic Habitats Implementing the Double-Locus Sequence Typing Scheme.

    PubMed

    Pappa, Olga; Beloukas, Apostolos; Vantarakis, Apostolos; Mavridou, Athena; Kefala, Anastasia-Maria; Galanis, Alex

    2017-07-01

    The recently described double-locus sequence typing (DLST) scheme was implemented to characterize in depth the genetic profiles of 52 resistant environmental Pseudomonas aeruginosa isolates derived from aquatic habitats of Greece. The DLST scheme was able not only to assign an already known allelic profile to the majority of the isolates but also to recognize two new ones (ms217-190, ms217-191) with high discriminatory power. A third locus (oprD) was also used for the molecular typing, which has been found to be fundamental for the phylogenetic analysis of environmental isolates given the resulting increased discrimination between the isolates. Additionally, the circulation of acquired resistance mechanisms in the aquatic habitats according to their genetic profiles was shown to be more extensive. We therefore suggest that the combination of DLST with oprD typing can discriminate phenotypically and genetically related environmental P. aeruginosa isolates, providing reliable phylogenetic analysis at a local level.

  15. RNA-seq Data: Challenges in and Recommendations for Experimental Design and Analysis.

    PubMed

    Williams, Alexander G; Thomas, Sean; Wyman, Stacia K; Holloway, Alisha K

    2014-10-01

    RNA-seq is widely used to determine differential expression of genes or transcripts as well as identify novel transcripts, identify allele-specific expression, and precisely measure translation of transcripts. Thoughtful experimental design and choice of analysis tools are critical to ensure high-quality data and interpretable results. Important considerations for experimental design include number of replicates, whether to collect paired-end or single-end reads, sequence length, and sequencing depth. Common analysis steps in all RNA-seq experiments include quality control, read alignment, assigning reads to genes or transcripts, and estimating gene or transcript abundance. Our aims are two-fold: to make recommendations for common components of experimental design and assess tool capabilities for each of these steps. We also test tools designed to detect differential expression, since this is the most widespread application of RNA-seq. We hope that these analyses will help guide those who are new to RNA-seq and will generate discussion about remaining needs for tool improvement and development. Copyright © 2014 John Wiley & Sons, Inc.
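
    The step of assigning reads to genes can be sketched as an interval-overlap count. This is a simplified illustration, not the behaviour of any particular tool: it assumes a single chromosome, unstranded reads, half-open coordinates, and hypothetical gene and read positions, and it counts a read for a gene only when the read overlaps exactly one gene.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def assign_reads(reads: Iterable[Interval],
                 genes: Dict[str, Interval]) -> Counter:
    """Count reads per gene; reads hitting several genes or none are tallied separately."""
    counts: Counter = Counter()
    for r_start, r_end in reads:
        # Two half-open intervals overlap when each starts before the other ends.
        hits = [name for name, (g_start, g_end) in genes.items()
                if r_start < g_end and g_start < r_end]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif hits:
            counts["__ambiguous"] += 1
        else:
            counts["__no_feature"] += 1
    return counts


# Hypothetical annotation with two overlapping genes, and five aligned reads.
genes = {"geneA": (100, 500), "geneB": (450, 900)}
reads = [(120, 170), (460, 510), (880, 930), (1000, 1050), (300, 350)]

counts = assign_reads(reads, genes)
print(dict(counts))
```

    How ambiguous reads are handled (discarded, split, or assigned probabilistically) is one of the choices that differs between tools and affects the downstream abundance estimates.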

  16. Phylogenetic analysis of VP2 gene of canine parvovirus and comparison with Indian and world isolates.

    PubMed

    Kaur, G; Chandra, M; Dwivedi, P N

    2016-03-01

    Canine parvovirus (CPV) causes hemorrhagic enteritis, especially in young dogs, leading to high morbidity and mortality. It has four main antigenic types: CPV-2, CPV-2a, CPV-2b and CPV-2c. Virus protein 2 (VP2) is the main capsid protein and mutations affecting VP2 gene are responsible for the evolution of various antigenic types of CPV. Full length VP2 gene from field isolates was amplified and cloned for sequence analysis. The sequences were submitted to the GenBank and were assigned Acc. Nos., viz. KP406928.1 for P12, KP406927.1 for P15, KP406930.1 for P32, KP406926.1 for Megavac-6 and KP406929.1 for NobivacDHPPi. Phylogenetic analysis indicated that the samples were forming a separate clade with vaccine strains. When the samples were compared with the world and Indian isolates, it was observed that samples formed a separate node indicating regional genetic variation in CPV.

  17. ACLAME: a CLAssification of Mobile genetic Elements, update 2010.

    PubMed

    Leplae, Raphaël; Lima-Mendez, Gipsi; Toussaint, Ariane

    2010-01-01

    The ACLAME database is dedicated to the collection, analysis and classification of sequenced mobile genetic elements (MGEs, in particular phages and plasmids). In addition to providing information on the MGEs content, classifications are available at various levels of organization. At the gene/protein level, families group similar sequences that are expected to share the same function. Families of four or more proteins are manually assigned with a functional annotation using the GeneOntology and the locally developed ontology MeGO dedicated to MGEs. At the genome level, evolutionary cohesive modules group sets of protein families shared among MGEs. At the population level, networks display the reticulate evolutionary relationships among MGEs. To increase the coverage of the phage sequence space, ACLAME version 0.4 incorporates 760 high-quality predicted prophages selected from the Prophinder database. Most of the data can be downloaded from the freely accessible ACLAME web site (http://aclame.ulb.ac.be). The BLAST interface for querying the database has been extended and numerous tools for in-depth analysis of the results have been added.

  18. Characterization of a novel variant of Mycobacterium chimaera.

    PubMed

    van Ingen, J; Hoefsloot, W; Buijtels, P C A M; Tortoli, E; Supply, P; Dekhuijzen, P N R; Boeree, M J; van Soolingen, D

    2012-09-01

    In this study, nonchromogenic mycobacteria were isolated from pulmonary samples of three patients in the Netherlands. All isolates had identical, unique 16S rRNA gene and 16S-23S ITS sequences, which were closely related to those of Mycobacterium chimaera and Mycobacterium marseillense. The biochemical features of the isolates differed slightly from those of M. chimaera, suggesting that the isolates may represent a possible separate species within the Mycobacterium avium complex (MAC). However, the cell-wall mycolic acid pattern, analysed by HPLC, and the partial sequences of the hsp65 and rpoB genes were identical to those of M. chimaera. We concluded that the isolates represent a novel variant of M. chimaera. The results of this analysis have led us to question the currently used methods of species definition for members of the genus Mycobacterium, which are based largely on 16S rRNA or rpoB gene sequencing. Definitions based on a single genetic target are likely to be insufficient. Genetic divergence, especially in the MAC, yields strains that cannot be confidently assigned to a specific species based on the analysis of a single genetic target.

  19. ScanRanker: Quality Assessment of Tandem Mass Spectra via Sequence Tagging

    PubMed Central

    Ma, Ze-Qiang; Chambers, Matthew C.; Ham, Amy-Joan L.; Cheek, Kristin L.; Whitwell, Corbin W.; Aerni, Hans-Rudolf; Schilling, Birgit; Miller, Aaron W.; Caprioli, Richard M.; Tabb, David L.

    2011-01-01

    In shotgun proteomics, protein identification by tandem mass spectrometry relies on bioinformatics tools. Despite recent improvements in identification algorithms, a significant number of high quality spectra remain unidentified for various reasons. Here we present ScanRanker, an open-source tool that evaluates the quality of tandem mass spectra via sequence tagging with reliable performance in data from different instruments. The superior performance of ScanRanker enables it not only to find unassigned high quality spectra that evade identification through database search, but also to select spectra for de novo sequencing and cross-linking analysis. In addition, we demonstrate that the distribution of ScanRanker scores predicts the richness of identifiable spectra among multiple LC-MS/MS runs in an experiment, and ScanRanker scores assist the process of peptide assignment validation to increase confident spectrum identifications. The source code and executable versions of ScanRanker are available from http://fenchurch.mc.vanderbilt.edu. PMID:21520941

  20. Analysis of delay reducing and fuel saving sequencing and spacing algorithms for arrival traffic

    NASA Technical Reports Server (NTRS)

    Neuman, Frank; Erzberger, Heinz

    1991-01-01

    The air traffic control subsystem that performs sequencing and spacing is discussed. The function of the sequencing and spacing algorithms is to automatically plan the most efficient landing order and to assign optimally spaced landing times to all arrivals. Several algorithms are described and their statistical performance is examined. Sequencing brings order to an arrival sequence for aircraft. First-come-first-served sequencing (FCFS) establishes a fair order, based on estimated times of arrival, and determines proper separations. Because of the randomness of the arriving traffic, gaps will remain in the sequence of aircraft. Delays are reduced by time-advancing the leading aircraft of each group while still preserving the FCFS order. Tightly spaced groups of aircraft remain with a mix of heavy and large aircraft. Spacing requirements differ for different types of aircraft trailing each other. Traffic is reordered slightly to take advantage of this spacing criterion, thus shortening the groups and reducing average delays. For heavy traffic, delays for different traffic samples vary widely, even when the same set of statistical parameters is used to produce each sample. This report supersedes NASA TM-102795 on the same subject. It includes a new method of time-advance as well as an efficient method of sequencing and spacing for two dependent runways.
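
    The first-come-first-served step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithms of the report: the separation values (in seconds) and the arrival list are hypothetical, and the time-advance and reordering refinements are not implemented.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical minimum time separations in seconds, keyed by (leader, trailer).
SEPARATION: Dict[Tuple[str, str], int] = {
    ("heavy", "heavy"): 90, ("heavy", "large"): 120,
    ("large", "heavy"): 60, ("large", "large"): 60,
}


def fcfs_schedule(arrivals: List[Tuple[str, str, int]]) -> List[Tuple[str, int]]:
    """Assign landing times in order of estimated time of arrival (ETA).

    arrivals: (callsign, weight class, ETA in seconds).
    Each aircraft lands at its ETA, or later if the required separation
    behind the preceding aircraft has not yet elapsed.
    """
    schedule: List[Tuple[str, int]] = []
    prev_class: Optional[str] = None
    prev_time: Optional[int] = None
    for callsign, wclass, eta in sorted(arrivals, key=lambda a: a[2]):
        if prev_time is None:
            landing = eta
        else:
            landing = max(eta, prev_time + SEPARATION[(prev_class, wclass)])
        schedule.append((callsign, landing))
        prev_class, prev_time = wclass, landing
    return schedule


# Hypothetical traffic sample.
arrivals = [("AC1", "heavy", 0), ("AC2", "large", 30),
            ("AC3", "large", 400), ("AC4", "heavy", 410)]

schedule = fcfs_schedule(arrivals)
eta_by_callsign = {callsign: eta for callsign, _, eta in arrivals}
total_delay = sum(t - eta_by_callsign[c] for c, t in schedule)
print(schedule)
print(total_delay)
```

    In this sample a gap remains between AC2 and AC3, which is the kind of gap the time-advance method in the report is meant to exploit.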

  1. Analysis of 10,000 ESTs from lymphocytes of the cynomolgus monkey to improve our understanding of its immune system

    PubMed Central

    Chen, Wei-Hua; Wang, Xue-Xia; Lin, Wei; He, Xiao-Wei; Wu, Zhen-Qiang; Lin, Ying; Hu, Song-Nian; Wang, Xiao-Ning

    2006-01-01

    Background The cynomolgus monkey (Macaca fascicularis) is one of the most widely used surrogate animal models for an increasing number of human diseases and vaccines, especially immune-system-related ones. Towards a better understanding of the gene expression background upon its immunogenetics, we constructed a cDNA library from Epstein-Barr virus (EBV)-transformed B lymphocytes of a cynomolgus monkey and sequenced 10,000 randomly picked clones. Results After processing, 8,312 high-quality expressed sequence tags (ESTs) were generated and assembled into 3,728 unigenes. Annotations of these uniquely expressed transcripts demonstrated that out of the 2,524 open reading frame (ORF) positive unigenes (mitochondrial and ribosomal sequences were not included), 98.8% shared significant similarities (E-value less than 1e-10) with the NCBI nucleotide (nt) database, while only 67.7% (E-value less than 1e-5) did so with the NCBI non-redundant protein (nr) database. Further analysis revealed that 90.0% of the unigenes that shared no similarities to the nr database could be assigned to human chromosomes, in which 75 did not match significantly to any cynomolgus monkey and human ESTs. The mapping regions to known human genes on the human genome were described in detail. The protein family and domain analysis revealed that the first, second and fourth of the most abundantly expressed protein families were all assigned to immunoglobulin and major histocompatibility complex (MHC)-related proteins. The expression profiles of these genes were compared with that of homologous genes in human blood, lymph nodes and a RAMOS cell line, which demonstrated expression changes after transformation with EBV. The degree of sequence similarity of the MHC class I and II genes to the human reference sequences was evaluated. The results indicated that class I molecules showed weak amino acid identities (<90%), while class II showed slightly higher ones. 
Conclusion These results indicated that the genes expressed in the cynomolgus monkey could be used to identify novel protein-coding genes and revise those incomplete or incorrect annotations in the human genome by comparative methods, since the old world monkeys and humans share high similarities at the molecular level, especially within coding regions. The identification of multiple genes involved in the immune response, their sequence variations to the human homologues, and their responses to EBV infection could provide useful information to improve our understanding of the cynomolgus monkey immune system. PMID:16618371

  2. Comparison of traditional phenotypic identification methods with partial 5' 16S rRNA gene sequencing for species-level identification of nonfermenting Gram-negative bacilli.

    PubMed

    Cloud, Joann L; Harmsen, Dag; Iwen, Peter C; Dunn, James J; Hall, Gerri; Lasala, Paul Rocco; Hoggan, Karen; Wilson, Deborah; Woods, Gail L; Mellmann, Alexander

    2010-04-01

    Correct identification of nonfermenting Gram-negative bacilli (NFB) is crucial for patient management. We compared phenotypic identifications of 96 clinical NFB isolates with identifications obtained by 5' 16S rRNA gene sequencing. Sequencing identified 88 isolates (91.7%) with >99% similarity to a sequence from the assigned species; 61.5% of sequencing results were concordant with phenotypic results, indicating the usability of sequencing to identify NFB.

  3. Pilot survey of expressed sequence tags (ESTs) from the asexual blood stages of Plasmodium vivax in human patients.

    PubMed

    Merino, Emilio F; Fernandez-Becerra, Carmen; Madeira, Alda M B N; Machado, Ariane L; Durham, Alan; Gruber, Arthur; Hall, Neil; del Portillo, Hernando A

    2003-07-21

    Plasmodium vivax is the most widely distributed human malaria, responsible for 70-80 million clinical cases each year and large socio-economical burdens for countries such as Brazil where it is the most prevalent species. Unfortunately, due to the impossibility of growing this parasite in continuous in vitro culture, research on P. vivax remains largely neglected. A pilot survey of expressed sequence tags (ESTs) from the asexual blood stages of P. vivax was performed. To do so, 1,184 clones from a cDNA library constructed with parasites obtained from 10 different human patients in the Brazilian Amazon were sequenced. Sequences were automatically processed to remove contaminants and low quality reads. A total of 806 sequences with an average length of 586 bp met such criteria and their clustering revealed 666 distinct events. The consensus sequence of each cluster and the unique sequences of the singlets were used in similarity searches against different databases that included P. vivax, Plasmodium falciparum, Plasmodium yoelii, Plasmodium knowlesi, Apicomplexa and the GenBank non-redundant database. An E-value of <10^-30 was used to define a significant database match. ESTs were manually assigned a gene ontology (GO) terminology. A total of 769 ESTs could be assigned a putative identity based upon sequence similarity to known proteins in GenBank. Moreover, 292 ESTs were annotated and a GO terminology was assigned to 164 of them. These are the first ESTs reported for P. vivax and, as such, they represent a valuable resource to assist in the annotation of the P. vivax genome currently being sequenced. Moreover, since the GC-content of the P. vivax genome is strikingly different from that of P. falciparum, these ESTs will help in the validation of gene predictions for P. vivax and to create a gene index of this malaria parasite.

  4. ArrayInitiative - a tool that simplifies creating custom Affymetrix CDFs

    PubMed Central

    2011-01-01

    Background Probes on a microarray represent a frozen view of a genome and are quickly outdated when new sequencing studies extend our knowledge, resulting in significant measurement error when analyzing any microarray experiment. There are several bioinformatics approaches to improve probe assignments, but without in-house programming expertise, standardizing these custom array specifications as a usable file (e.g. as Affymetrix CDFs) is difficult, owing mostly to the complexity of the specification file format. However, without correctly standardized files there is a significant barrier for testing competing analysis approaches since this file is one of the required inputs for many commonly used algorithms. The need to test combinations of probe assignments and analysis algorithms led us to develop ArrayInitiative, a tool for creating and managing custom array specifications. Results ArrayInitiative is a standalone, cross-platform, rich client desktop application for creating correctly formatted, custom versions of manufacturer-provided (default) array specifications, requiring only minimal knowledge of the array specification rules and file formats. Users can import default array specifications, import probe sequences for a default array specification, design and import a custom array specification, export any array specification to multiple output formats, export the probe sequences for any array specification and browse high-level information about the microarray, such as version and number of probes. The initial release of ArrayInitiative supports the Affymetrix 3' IVT expression arrays we currently analyze, but as an open source application, we hope that others will contribute modules for other platforms. Conclusions ArrayInitiative allows researchers to create new array specifications, in a standard format, based upon their own requirements. This makes it easier to test competing design and analysis strategies that depend on probe definitions. 
Since the custom array specifications are easily exported to the manufacturer's standard format, researchers can analyze these customized microarray experiments using established software tools, such as those available in Bioconductor. PMID:21548938

  5. openSputnik--a database to ESTablish comparative plant genomics using unsaturated sequence collections.

    PubMed

    Rudd, Stephen

    2005-01-01

    The public expressed sequence tag collections are continually being enriched with high-quality sequences that represent an ever-expanding range of taxonomically diverse plant species. While these sequence collections provide biased insight into the populations of expressed genes available within individual species and their associated tissues, the information is conceivably of wider relevance in a comparative context. When we consider the available expressed sequence tag (EST) collections of summer 2004, most of the major plant taxonomic clades are at least superficially represented. Investigation of the five million available plant ESTs provides a wealth of information that has applications in modelling the routes of plant genome evolution and the identification of lineage-specific genes and gene families. Over four million ESTs from over 50 distinct plant species have been collated within an EST analysis pipeline called openSputnik. The ESTs were resolved down into approximately one million unigene sequences. These have been annotated using orthology-based annotation transfer from reference plant genomes and using a variety of contemporary bioinformatics methods to assign peptide, structural and functional attributes. The openSputnik database is available at http://sputnik.btk.fi.

  6. Preparation of Term Papers Based upon a Research-Process Model.

    ERIC Educational Resources Information Center

    Feldmann, Rodney Mansfield; Schloman, Barbara Frick

    1990-01-01

    Described is an alternative method of term paper preparation which provides a step-by-step sequence of assignments and provides feedback to the students at all stages in the preparation of the report. An example of this model is provided including 13 sequential assignments. (CW)

  7. Ab Initio structure prediction for Escherichia coli: towards genome-wide protein structure modeling and fold assignment

    PubMed Central

    Xu, Dong; Zhang, Yang

    2013-01-01

    Genome-wide protein structure prediction and structure-based function annotation have been a long-term goal in molecular biology but have not yet become possible due to difficulties in modeling distant-homology targets. We developed a hybrid pipeline combining ab initio folding and template-based modeling for genome-wide structure prediction applied to the Escherichia coli genome. The pipeline was tested on 43 known sequences, where QUARK-based ab initio folding simulation generated models with TM-score 17% higher than that by traditional comparative modeling methods. For 495 unknown hard sequences, 72 are predicted to have a correct fold (TM-score > 0.5) and 321 have a substantial portion of structure correctly modeled (TM-score > 0.35). 317 sequences can be reliably assigned to a SCOP fold family based on structural analogy to existing proteins in PDB. The presented results, as a case study of E. coli, represent promising progress towards genome-wide structure modeling and fold family assignment using state-of-the-art ab initio folding algorithms. PMID:23719418

  8. Integrated optimization of location assignment and sequencing in multi-shuttle automated storage and retrieval systems under modified 2n-command cycle pattern

    NASA Astrophysics Data System (ADS)

    Yang, Peng; Peng, Yongfei; Ye, Bin; Miao, Lixin

    2017-09-01

    This article explores the integrated optimization problem of location assignment and sequencing in multi-shuttle automated storage/retrieval systems under the modified 2n-command cycle pattern. The decision of storage and retrieval (S/R) location assignment and S/R request sequencing are jointly considered. An integer quadratic programming model is formulated to describe this integrated optimization problem. The optimal travel cycles for multi-shuttle S/R machines can be obtained to process S/R requests in the storage and retrieval request order lists by solving the model. The small-sized instances are optimally solved using CPLEX. For large-sized problems, two tabu search algorithms are proposed, in which the first come, first served and nearest neighbour are used to generate initial solutions. Various numerical experiments are conducted to examine the heuristics' performance and the sensitivity of algorithm parameters. Furthermore, the experimental results are analysed from the viewpoint of practical application, and a parameter list for applying the proposed heuristics is recommended under different real-life scenarios.

  9. Integrated palynology and sequence stratigraphy of the upper Cretaceous (Maastrichtian) strata, Rio Grande Embayment, Texas, using well, outcrop, and seismic data

    NASA Astrophysics Data System (ADS)

    Mahmoud, Salah El-Din Ragab

    Numerous nomenclature problems surround the Campanian and Maastrichtian strata of the Rio Grande Embayment. Sellards et al. (1932) and Stephenson et al. (1942) placed the Upson Clay and San Miguel Formation in the Taylor Group (Campanian). These workers assigned the overlying Olmos Coal and the Escondido Formation to the Navarro Group (Maastrichtian). Pessagno (1969, p. 90--91) tentatively included the Upson Clay, the San Miguel Formation, the Olmos Coal, and the Escondido Formation in the Navarro Group, but noted that these strata are lithologically dissimilar to those of the type Navarro Group in Navarro County (northeast Texas). He (ibid, p. 91) suggested that "---Future workers should consider the possibility of excluding the entire sequence from the Navarro Group. It is perhaps more closely related to the Difunta Group of Mexico or deserves a group name of its own." Pessagno (1967; 1969, p. 91--92) utilized planktonic foraminiferal biostratigraphic data to determine (1) that the Upson Clay and San Miguel Formation are assignable to the lower Maastrichtian and (2) that the Escondido Formation is assignable to the upper Maastrichtian. The present investigation attempts to build on the chronostratigraphic framework established by Pessagno (1967, 1969). Palynology is used for the first time in this report to generate biostratigraphic, chronostratigraphic, and paleoecological data for the Maastrichtian strata in the study area. New palynological data, integrated with existing planktonic foraminifera and megafossils, indicate that San Miguel Formation and Olmos Coal are of Maastrichtian age. Tying results with seismic data results in dating interpreted sequence boundaries to be within the time interval of 83 and 63 MA. Five seismic facies are delineated and a rate of sediment supply higher than the rate of subsidence during a prolonged progradational episode is suggested in the study area. 
Three-dimensional seismic data are interpreted in terms of the structure and hydrocarbon potential. Analysis of structural elements indicates northwest-southeast compressional forces resulting from sediment loading. Parasequence-level mapping was carried out and paleogeographic and depositional history was inferred and used in interpreting systems tracts. The study on San Miguel Formation by Weise (1980) was revisited using sequence stratigraphic techniques.

  10. Analysis of Aspergillus nidulans metabolism at the genome-scale

    PubMed Central

    David, Helga; Özçelik, İlknur Ş; Hofmann, Gerald; Nielsen, Jens

    2008-01-01

    Background Aspergillus nidulans is a member of a diverse group of filamentous fungi, sharing many of the properties of its close relatives with significance in the fields of medicine, agriculture and industry. Furthermore, A. nidulans has been a classical model organism for studies of development biology and gene regulation, and thus it has become one of the best-characterized filamentous fungi. It was the first Aspergillus species to have its genome sequenced, and automated gene prediction tools predicted 9,451 open reading frames (ORFs) in the genome, of which less than 10% were assigned a function. Results In this work, we have manually assigned functions to 472 orphan genes in the metabolism of A. nidulans, by using a pathway-driven approach and by employing comparative genomics tools based on sequence similarity. The central metabolism of A. nidulans, as well as biosynthetic pathways of relevant secondary metabolites, was reconstructed based on detailed metabolic reconstructions available for A. niger and Saccharomyces cerevisiae, and information on the genetics, biochemistry and physiology of A. nidulans. Thereby, it was possible to identify metabolic functions without a gene associated, and to look for candidate ORFs in the genome of A. nidulans by comparing its sequence to sequences of well-characterized genes in other species encoding the function of interest. A classification system, based on defined criteria, was developed for evaluating and selecting the ORFs among the candidates, in an objective and systematic manner. The functional assignments served as a basis to develop a mathematical model, linking 666 genes (both previously and newly annotated) to metabolic roles. 
The model was used to simulate metabolic behavior and additionally to integrate, analyze and interpret large-scale gene expression data concerning a study on glucose repression, thereby providing a means of upgrading the information content of experimental data and getting further insight into this phenomenon in A. nidulans. Conclusion We demonstrate how pathway modeling of A. nidulans can be used as an approach to improve the functional annotation of the genome of this organism. Furthermore we show how the metabolic model establishes functional links between genes, enabling the upgrade of the information content of transcriptome data. PMID:18405346

  11. Discovery of Genome-Wide Microsatellite Markers in Scombridae: A Pilot Study on Albacore Tuna

    PubMed Central

    Nikolic, Natacha; Duthoy, Stéphanie; Destombes, Antoine; Bodin, Nathalie; West, Wendy; Puech, Alexis; Bourjea, Jérôme

    2015-01-01

    Recent developments in sequencing technologies and bioinformatics analysis provide a greater amount of DNA sequencing reads at a low cost. Microsatellites are the markers of choice for a variety of population genetic studies, and high quality markers can be discovered in non-model organisms, such as tuna, with these recent developments. Here, we use a high-throughput method to isolate microsatellite markers in albacore tuna, Thunnus alalunga, based on coupling multiplex enrichment and next-generation sequencing on 454 GS-FLX Titanium pyrosequencing. The crucial minimum number of polymorphic markers to infer evolutionary and ecological processes for this species has been described for the first time. We provide 1670 microsatellite design primer pairs, and technical and molecular genetics selection resulting in 43 polymorphic microsatellite markers. On this panel, we characterized 34 random and selectively neutral markers («neutral») and 9 «non-neutral» markers. The variability of «neutral» markers was screened with 136 individuals of albacore tuna from southwest Indian Ocean (42), northwest Indian Ocean (31), South Africa (31), and southeast Atlantic Ocean (32). Power analysis demonstrated that the panel of genetic markers can be applied in diversity and population genetics studies. Global genetic diversity for albacore was high with a mean number of alleles at 16.94; observed heterozygosity 66% and expected heterozygosity 77%. The number of individuals was insufficient to provide accurate results on differentiation. Of the 9 «non-neutral» markers, 3 were linked to a sequence of known function. One is located in a sequence with an immunity function (ThuAla-Tcell-01) and another in a sequence with an energy allocation function (ThuAla-Hki-01). These two markers were genotyped on the 136 individuals and presented different diversity levels. ThuAla-Tcell-01 has a high number of alleles (20), heterozygosity (87–90%), and assignment index. 
ThuAla-Hki-01 has a lower number of alleles (9), low heterozygosity (24–27%), low assignment index and significant inbreeding. Finally, the 34 «neutral» and 3 «non-neutral» microsatellite markers were tested on four economically important Scombridae species: Thunnus albacares, Thunnus thynnus, Thunnus obesus, and Acanthocybium solandri. PMID:26544051
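
The observed and expected heterozygosity figures quoted in this record can be illustrated with a minimal sketch. It assumes diploid genotypes at a single locus given as allele pairs, and it uses the plain (uncorrected) estimator He = 1 - sum(p_i^2). The function name and the toy genotypes are hypothetical and are not taken from the study.

```python
from collections import Counter

def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one diploid locus.

    genotypes: non-empty list of (allele_a, allele_b) tuples, one per individual.
    """
    n = len(genotypes)
    # Observed: fraction of individuals carrying two different alleles.
    observed = sum(a != b for a, b in genotypes) / n
    # Expected: 1 minus the sum of squared allele frequencies (uncorrected).
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n
    expected = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return observed, expected

# Toy data (hypothetical): allele sizes in bp for four individuals.
ho, he = heterozygosity([(120, 120), (120, 124), (124, 124), (120, 124)])
```

An observed value well below the expected one, as reported for ThuAla-Hki-01, is the pattern usually read as a heterozygote deficit.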

  12. Comprehensive genomic analysis of a plant growth-promoting rhizobacterium Pantoea agglomerans strain P5.

    PubMed

    Shariati J, Vahid; Malboobi, Mohammad Ali; Tabrizi, Zeinab; Tavakol, Elahe; Owilia, Parviz; Safari, Maryam

    2017-11-15

    In this study, we provide a comparative genomic analysis of Pantoea agglomerans strain P5 and 10 closely related strains based on phylogenetic analyses. A next-generation shotgun strategy was implemented using the Illumina HiSeq 2500 technology followed by core- and pan-genome analysis. The genome of P. agglomerans strain P5 contains an assembly size of 5082485 bp with 55.4% G + C content. P. agglomerans consists of 2981 core and 3159 accessory genes for Coding DNA Sequences (CDSs) based on the pan-genome analysis. Strain P5 can be grouped closely with strains PG734 and 299 R using pan and core genes, respectively. All the predicted and annotated gene sequences were allocated to KEGG pathways. Accordingly, genes involved in plant growth-promoting (PGP) ability, including phosphate solubilization, IAA and siderophore production, acetoin and 2,3-butanediol synthesis and bacterial secretion, were assigned. This study provides an in-depth view of the PGP characteristics of strain P5, highlighting its potential use in agriculture as a biofertilizer.
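
The core/accessory split mentioned in this record can be sketched with plain set operations. This is only an illustration of the usual definitions (core = gene families present in every strain, accessory = present in some but not all). It assumes gene families have already been clustered into shared identifiers; the function name, strain labels and identifiers are hypothetical, and the study's actual pipeline is not described here.

```python
def partition_pan_genome(strain_genes):
    """Split a pan-genome into core and accessory gene families.

    strain_genes: non-empty dict mapping strain name -> set of gene-family IDs.
    """
    gene_sets = list(strain_genes.values())
    pan = set().union(*gene_sets)            # every family seen in any strain
    core = set.intersection(*gene_sets)      # families shared by all strains
    accessory = pan - core                   # families missing from at least one strain
    return core, accessory

# Toy input (hypothetical family IDs).
core, accessory = partition_pan_genome({
    "strain_1": {"fam1", "fam2", "fam3"},
    "strain_2": {"fam1", "fam2", "fam4"},
})
```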

  13. In silico search, characterization and validation of new EST-SSR markers in the genus Prunus.

    PubMed

    Sorkheh, Karim; Prudencio, Angela S; Ghebinejad, Azim; Dehkordi, Mehrana Kohei; Erogul, Deniz; Rubio, Manuel; Martínez-Gómez, Pedro

    2016-07-07

    Simple sequence repeats (SSRs) are defined as sequence repeat units between 1 and 6 bp that occur in both coding and non-coding regions abundant in eukaryotic genomes, which may affect the expression of genes. In this study, expressed sequence tags (ESTs) of eight Prunus species were analyzed for in silico mining of EST-SSRs, protein annotation, and open reading frames (ORFs), and the identification of codon repetitions. A total of 316 SSRs were identified using MISA software. Dinucleotide SSR motifs (26.31 %) were found to be the most abundant type of repeats, followed by tri- (14.58 %), tetra- (0.53 %), and penta- (0.27 %) nucleotide motifs. An attempt was made to design primer pairs for 316 identified SSRs but these were successful for only 175 SSR sequences. The positions of SSRs with respect to ORFs were detected, and annotation of sequences containing SSRs was performed to assign function to each sequence. SSRs were also characterized (in terms of position in the reference genome and associated gene) using the two available Prunus reference genomes (mei and peach). Finally, 38 SSR markers were validated across peach, almond, plum, and apricot genotypes. This validation showed a higher transferability level of EST-SSR developed in P. mume (mei) in comparison with the rest of species analyzed. Findings will aid analysis of functionally important molecular markers and facilitate the analysis of genetic diversity.
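
The SSR definition given in this record (repeat units of 1 to 6 bp) can be turned into a small search sketch. This is not the MISA algorithm; it is a regular-expression illustration that finds perfect tandem repeats only. The minimum repeat counts per motif length are assumptions chosen for the example, and the function name is hypothetical.

```python
import re

# Assumed minimum number of repeats for each motif length (1-6 bp).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for size, n in sorted(min_repeats.items()):
        # A motif of `size` bases followed by at least n-1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT").
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)

# Toy sequence: seven copies of the dinucleotide "AT" flanked by other bases.
ssrs = find_ssrs("GG" + "AT" * 7 + "CC")
```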

  14. Identification of Sinorhizobium (Ensifer) medicae based on a specific genomic sequence unveiled by M13-PCR fingerprinting.

    PubMed

    Dourado, Ana Catarina; Alves, Paula I L; Tenreiro, Tania; Ferreira, Eugénio M; Tenreiro, Rogério; Fareleira, Paula; Crespo, M Teresa Barreto

    2009-12-01

    A collection of nodule isolates from Medicago polymorpha obtained from southern and central Portugal was evaluated by M13-PCR fingerprinting and hierarchical cluster analysis. Several genomic clusters were obtained which, by 16S rRNA gene sequencing of selected representatives, were shown to be associated with particular taxonomic groups of rhizobia and other soil bacteria. The method provided a clear separation between rhizobia and co-isolated non-symbiotic soil contaminants. Ten M13-PCR groups were assigned to Sinorhizobium (Ensifer) medicae and included all isolates responsible for the formation of nitrogen-fixing nodules upon re-inoculation of M. polymorpha test-plants. In addition, enterobacterial repetitive intergenic consensus (ERIC)-PCR fingerprinting indicated a high genomic heterogeneity within the major M13-PCR clusters of S. medicae isolates. Based on nucleotide sequence data of an M13-PCR amplicon of ca. 1500 bp, observed only in S. medicae isolates and spanning locus Smed_3707 to Smed_3709 of the pSMED01 plasmid in the S. medicae WSM419 genome sequence, a pair of PCR primers was designed and used for direct PCR amplification of a 1399-bp sequence within this fragment. Additional in silico and in vitro experiments, as well as phylogenetic analysis, confirmed the specificity of this primer combination and therefore the reliability of this approach in the prompt identification of S. medicae isolates and their distinction from other soil bacteria.

  15. The Use of a Sequenced Questioning Paradigm to Facilitate Associative Fluency in Preschoolers.

    ERIC Educational Resources Information Center

    Pellegrini, A. D.; Greene, Helen

    The extent to which free play versus sequenced questioning conditions facilitates preschoolers' associative fluency was investigated in this study. Twenty-four children (12 boys and 12 girls, with a mean age of 50.7 months) were randomly assigned to one of three conditions: free play, sequenced questioning, and control. In the sequenced…

  16. Putative and unique gene sequence utilization for the design of species specific probes as modeled by Lactobacillus plantarum

    USDA-ARS's Scientific Manuscript database

    The concept of utilizing putative and unique gene sequences for the design of species specific probes was tested. The abundance profile of assigned functions within the Lactobacillus plantarum genome was used for the identification of the putative and unique gene sequence, csh. The targeted gene (cs...

  17. Evaluation of a dkgB linked intergenic sequence ribotyping (ISR) method for assigning serotype to Salmonella enterica isolated from poultry environmental samples.

    USDA-ARS's Scientific Manuscript database

    The Kauffman White (KW) serotyping method requires more than 250 antisera to characterize more than 2,500 Salmonella serovars. The complexity of serotyping could be overcome using molecular methods. In this study, a dkgB-linked intergenic sequence ribotyping (ISR) method that generates sequence occu...

  18. Assessing the Impact of Sequencing Practicums for Welding in Agricultural Mechanics

    ERIC Educational Resources Information Center

    Rose, Malcolm; Pate, Michael L.; Lawver, Rebecca G.; Warnick, Brian K.; Dai, Xin

    2015-01-01

    This study examined the impact of sequencing practicums for welding on students' ability to perform a 1F (flat position-fillet lap joint) weld on low-carbon steel. Participants were randomly assigned a specific practice sequence of welding for using gas metal arc welding (GMAW) and shielded metal arc welding (SMAW). A total of 71 participants…

  19. Genomic Sequence of the WHO International Standard for Hepatitis A Virus RNA.

    PubMed

    Jenkins, Adrian; Minhas, Rehan; Morris, Clare; Berry, Neil

    2018-05-10

    The World Health Organization (WHO) international standard for hepatitis A virus (HAV) RNA nucleic acid assays was characterized by complete genome sequencing. The entire coding sequence and noncoding regions were assigned HAV genotype IB. This information will aid the design, development, and evaluation of HAV RNA amplification assays. Copyright © 2018 Jenkins et al.

  20. Automatic phylogenetic classification of bacterial beta-lactamase sequences including structural and antibiotic substrate preference information.

    PubMed

    Ma, Jianmin; Eisenhaber, Frank; Maurer-Stroh, Sebastian

    2013-12-01

    Beta lactams comprise the largest and still most effective group of antibiotics, but bacteria can gain resistance through different beta lactamases that can degrade these antibiotics. We developed a user friendly tree building web server that allows users to assign beta lactamase sequences to their respective molecular classes and subclasses. Further clinically relevant information includes if the gene is typically chromosomal or transferable through plasmids as well as listing the antibiotics which the most closely related reference sequences are known to target and cause resistance against. This web server can automatically build three phylogenetic trees: the first tree with closely related sequences from a Tachyon search against the NCBI nr database, the second tree with curated reference beta lactamase sequences, and the third tree built specifically from substrate binding pocket residues of the curated reference beta lactamase sequences. We show that the latter is better suited to recover antibiotic substrate assignments through nearest neighbor annotation transfer. The users can also choose to build a structural model for the query sequence and view the binding pocket residues of their query relative to other beta lactamases in the sequence alignment as well as in the 3D structure relative to bound antibiotics. This web server is freely available at http://blac.bii.a-star.edu.sg/.

  1. Logic system aids in evaluation of project readiness

    NASA Technical Reports Server (NTRS)

    Maris, S. J.; Obrien, T. J.

    1966-01-01

    Measurement Operational Readiness Requirements (MORR) assignment logic is used to determine whether a complex project is ready to go forward as planned. The system uses a logic network that assigns qualities to all important criteria in a project and establishes a logical sequence of measurements to determine what the conditions are.

  2. PONDEROSA, an automated 3D-NOESY peak picking program, enables automated protein structure determination.

    PubMed

    Lee, Woonghee; Kim, Jin Hae; Westler, William M; Markley, John L

    2011-06-15

    PONDEROSA (Peak-picking Of Noe Data Enabled by Restriction of Shift Assignments) accepts input information consisting of a protein sequence, backbone and sidechain NMR resonance assignments, and 3D-NOESY ((13)C-edited and/or (15)N-edited) spectra, and returns assignments of NOESY crosspeaks, distance and angle constraints, and a reliable NMR structure represented by a family of conformers. PONDEROSA incorporates and integrates external software packages (TALOS+, STRIDE and CYANA) to carry out different steps in the structure determination. PONDEROSA implements internal functions that identify and validate NOESY peak assignments and assess the quality of the calculated three-dimensional structure of the protein. The robustness of the analysis results from PONDEROSA's hierarchical processing steps that involve iterative interaction among the internal and external modules. PONDEROSA supports a variety of input formats: SPARKY assignment table (.shifts) and spectrum file formats (.ucsf), XEASY proton file format (.prot), and NMR-STAR format (.star). To demonstrate the utility of PONDEROSA, we used the package to determine 3D structures of two proteins: human ubiquitin and Escherichia coli iron-sulfur scaffold protein variant IscU(D39A). The automatically generated structural constraints and ensembles of conformers were as good as or better than those determined previously by much less automated means. The program, in the form of binary code along with tutorials and reference manuals, is available at http://ponderosa.nmrfam.wisc.edu/.

  3. DNA Music.

    ERIC Educational Resources Information Center

    Miner, Carol; della Villa, Paula

    1997-01-01

    Describes an activity in which students reverse-translate proteins from their amino acid sequences back to their DNA sequences then assign musical notes to represent the adenine, guanine, cytosine, and thymine bases. Data is obtained from the National Institutes of Health (NIH) on the Internet. (DDR)
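
The activity described in this record can be sketched in a few lines. Reverse translation is ambiguous because most amino acids have several codons, so this sketch picks one valid codon per amino acid arbitrarily. The base-to-note mapping is also hypothetical; the record does not say which notes the activity assigns.

```python
# One valid standard-code codon per amino acid, chosen arbitrarily.
CODON_FOR = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTT", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCT",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAT", "V": "GTT",
}

# Hypothetical mapping from bases to musical notes.
BASE_TO_NOTE = {"A": "A4", "C": "C4", "G": "G4", "T": "D4"}

def reverse_translate(protein):
    """Return one of the many DNA sequences that encode the protein."""
    return "".join(CODON_FOR[aa] for aa in protein.upper())

def to_notes(dna):
    """Map each base of a DNA string to a note name."""
    return [BASE_TO_NOTE[base] for base in dna.upper()]

melody = to_notes(reverse_translate("MK"))
```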

  4. PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification.

    PubMed

    Thomas, Paul D; Kejariwal, Anish; Campbell, Michael J; Mi, Huaiyu; Diemer, Karen; Guo, Nan; Ladunga, Istvan; Ulitsky-Lazareva, Betty; Muruganujan, Anushya; Rabkin, Steven; Vandergriff, Jody A; Doremieux, Olivier

    2003-01-01

    The PANTHER database was designed for high-throughput analysis of protein sequences. One of the key features is a simplified ontology of protein function, which allows browsing of the database by biological functions. Biologist curators have associated the ontology terms with groups of protein sequences rather than individual sequences. Statistical models (Hidden Markov Models, or HMMs) are built from each of these groups. The advantage of this approach is that new sequences can be automatically classified as they become available. To ensure accurate functional classification, HMMs are constructed not only for families, but also for functionally distinct subfamilies. Multiple sequence alignments and phylogenetic trees, including curator-assigned information, are available for each family. The current version of the PANTHER database includes training sequences from all organisms in the GenBank non-redundant protein database, and the HMMs have been used to classify gene products across the entire genomes of human and Drosophila melanogaster. The ontology terms and protein families and subfamilies, as well as Drosophila gene classifications, can be browsed and searched for free. Due to outstanding contractual obligations, access to human gene classifications and to protein family trees and multiple sequence alignments will temporarily require a nominal registration fee. PANTHER is publicly available on the web at http://panther.celera.com.

  5. The genome sequence of the outbreeding globe artichoke constructed de novo incorporating a phase-aware low-pass sequencing strategy of F1 progeny

    PubMed Central

    Scaglione, Davide; Reyes-Chin-Wo, Sebastian; Acquadro, Alberto; Froenicke, Lutz; Portis, Ezio; Beitel, Christopher; Tirone, Matteo; Mauro, Rosario; Lo Monaco, Antonino; Mauromicale, Giovanni; Faccioli, Primetta; Cattivelli, Luigi; Rieseberg, Loren; Michelmore, Richard; Lanteri, Sergio

    2016-01-01

    Globe artichoke (Cynara cardunculus var. scolymus) is an out-crossing, perennial, multi-use crop species that is grown worldwide and belongs to the Compositae, one of the most successful Angiosperm families. We describe the first genome sequence of globe artichoke. The assembly, comprising 13,588 scaffolds covering 725 of the 1,084 Mb genome, was generated using ~133-fold Illumina sequencing data and encodes 26,889 predicted genes. Re-sequencing (30×) of globe artichoke and cultivated cardoon (C. cardunculus var. altilis) parental genotypes and low-coverage (0.5 to 1×) genotyping-by-sequencing of 163 F1 individuals resulted in 73% of the assembled genome being anchored in 2,178 genetic bins ordered along 17 chromosomal pseudomolecules. This was achieved using a novel pipeline, SOILoCo (Scaffold Ordering by Imputation with Low Coverage), to detect heterozygous regions and assign parental haplotypes with low sequencing read depth and of unknown phase. SOILoCo provides a powerful tool for de novo genome analysis of outcrossing species. Our data will enable genome-scale analyses of evolutionary processes among crops, weeds, and wild species within and beyond the Compositae, and will facilitate the identification of economically important genes from related species. PMID:26786968

  6. Total Extracellular Small RNA Profiles from Plasma, Saliva, and Urine of Healthy Subjects

    PubMed Central

    Yeri, Ashish; Courtright, Amanda; Reiman, Rebecca; Carlson, Elizabeth; Beecroft, Taylor; Janss, Alex; Siniard, Ashley; Richholt, Ryan; Balak, Chris; Rozowsky, Joel; Kitchen, Robert; Hutchins, Elizabeth; Winarta, Joseph; McCoy, Roger; Anastasi, Matthew; Kim, Seungchan; Huentelman, Matthew; Van Keuren-Jensen, Kendall

    2017-01-01

    Interest in circulating RNAs for monitoring and diagnosing human health has grown significantly. There are few datasets describing baseline expression levels for total cell-free circulating RNA from healthy control subjects. In this study, total extracellular RNA (exRNA) was isolated and sequenced from 183 plasma samples, 204 urine samples and 46 saliva samples from 55 male college athletes ages 18–25 years. Many participants provided more than one sample, allowing us to investigate variability in an individual’s exRNA expression levels over time. Here we provide a systematic analysis of small exRNAs present in each biofluid, as well as an analysis of exogenous RNAs. The small RNA profile of each biofluid is distinct. We find that a large number of RNA fragments in plasma (63%) and urine (54%) have sequences that are assigned to YRNA and tRNA fragments respectively. Surprisingly, while many miRNAs can be detected, there are few miRNAs that are consistently detected in all samples from a single biofluid, and profiles of miRNA are different for each biofluid. Not unexpectedly, saliva samples have high levels of exogenous sequence that can be traced to bacteria. These data significantly contribute to the current number of sequenced exRNA samples from normal healthy individuals. PMID:28303895

  7. LESSONS IN DE NOVO PEPTIDE SEQUENCING BY TANDEM MASS SPECTROMETRY

    PubMed Central

    Medzihradszky, Katalin F.; Chalkley, Robert J.

    2015-01-01

    Mass spectrometry has become the method of choice for the qualitative and quantitative characterization of protein mixtures isolated from all kinds of living organisms. The raw data in these studies are MS/MS spectra, usually of peptides produced by proteolytic digestion of a protein. These spectra are “translated” into peptide sequences, normally with the help of various search engines. Data acquisition and interpretation have both been automated, and most researchers look only at the summary of the identifications without ever viewing the underlying raw data used for assignments. Automated analysis of data is essential due to the volume produced. However, being familiar with the finer intricacies of peptide fragmentation processes, and experiencing the difficulties of manual data interpretation allow a researcher to be able to more critically evaluate key results, particularly because there are many known rules of peptide fragmentation that are not incorporated into search engine scoring. Since the most commonly used MS/MS activation method is collision-induced dissociation (CID), in this article we present a brief review of the history of peptide CID analysis. Next, we provide a detailed tutorial on how to determine peptide sequences from CID data. Although the focus of the tutorial is de novo sequencing, the lessons learned and resources supplied are useful for data interpretation in general. PMID:25667941

  8. [Transcriptome analysis of Dunaliella viridis].

    PubMed

    Zhu, Shuai-qi; Gong, Yi-fu; Hang, Yu-qing; Liu, Hao; Wang, He-yu

    2015-08-01

    In order to understand the gene information, function, haloduric pathway (glycerolipid metabolism) and related key genes for Dunaliella viridis, we used Illumina HiSeq™ 2000 high-throughput sequencing technology to sequence its transcriptome. Trinity software was used to assemble the data into transcripts. Based on the Clusters of Orthologous Groups (COG), Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases, we carried out functional annotation and classification, pathway annotation, and open reading frame (ORF) sequence prediction of transcripts. The key genes in the glycerolipid metabolism were analyzed. The results showed that 81,593 transcripts were found, and 77,117 ORF sequences were predicted, accounting for 94.50% of all transcripts. COG classification results showed that 16,569 transcripts were assigned to 24 categories. GO classification annotated 76,436 transcripts. The number of transcripts for biological processes was 30,678, accounting for 40.14% of all transcripts. KEGG pathway analysis showed that 26,428 transcripts were annotated to 317 pathways, and 131 pathways were related to metabolism, accounting for 41.32% of all annotated pathways. Only one transcript was annotated as coding the key enzyme dihydroxyacetone kinase involved in the glycerolipid pathway. This enzyme could be related to glycerol biosynthesis under salt stress. This study further improved the gene information and laid the foundation of metabolic pathway research for Dunaliella viridis.

  9. Streptococcus ovuberis sp. nov., isolated from a subcutaneous abscess in the udder of a sheep.

    PubMed

    Zamora, Leydis; Pérez-Sancho, Marta; Fernández-Garayzábal, Jose Francisco; Orden, Jose Antonio; Domínguez-Bernal, Gustavo; de la Fuente, Ricardo; Domínguez, Lucas; Vela, Ana Isabel

    2017-11-01

    One unidentified, Gram-stain-positive, catalase-negative coccus-shaped organism was recovered from a subcutaneous abscess of the udder of a sheep and subjected to a polyphasic taxonomic analysis. Based on cellular morphology and biochemical criteria, the isolate was tentatively assigned to the genus Streptococcus, although the organism did not appear to match any recognized species. 16S rRNA gene sequence comparison studies confirmed its identification as a member of the genus Streptococcus and showed that the nearest phylogenetic relatives of the unknown coccus corresponded to Streptococcus moroccensis and Streptococcus cameli (95.9 % 16S rRNA gene sequence similarity). The sodA sequence analysis showed less than 89.3 % sequence similarity with the currently recognized species of the genus Streptococcus. The novel bacterial isolate was distinguished from close relatives of the genus Streptococcus by using biochemical tests. A mass spectrometry profile was also obtained for the novel isolate using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). Based on both phenotypic and phylogenetic findings, it is proposed that the unknown bacterium be classified as a representative of a novel species of the genus Streptococcus, Streptococcus ovuberis sp. nov. The type strain of Streptococcus ovuberis sp. nov. is VB15-00779T (=CECT 9179T = CCUG 69612T).

  10. Pseudomonas aestus sp. nov., a plant growth-promoting bacterium isolated from mangrove sediments.

    PubMed

    Vasconcellos, Rafael L F; Santos, Suikinai Nobre; Zucchi, Tiago Domingues; Silva, Fábio Sérgio Paulino; Souza, Danilo Tosta; Melo, Itamar Soares

    2017-10-01

    Strain CMAA 1215T, a Gram-reaction-negative, aerobic, catalase positive, polarly flagellated, motile, rod-shaped (0.5-0.8 × 1.3-1.9 µm) bacterium, was isolated from mangrove sediments, Cananéia Island, Brazil. Analysis of the 16S rRNA gene sequences showed that strain CMAA 1215T forms a distinct phyletic line within the Pseudomonas putida subclade, being closely related to P. plecoglossicida ATCC 700383T, P. monteilii NBRC 103158T, and P. taiwanensis BCRC 17751T, with sequence similarities of 98.86, 98.73, and 98.71%, respectively. Genomic comparisons of strain CMAA 1215T with its closest phylogenetic type strains using average nucleotide identity (ANI) and DNA:DNA relatedness approaches yielded values of 84.3-85.3% and 56.0-63.0%, respectively. In a multilocus sequence analysis (MLSA) of concatenated 16S rRNA, gyrB and rpoB gene sequences, the novel species was related to the Pseudomonas putida subcluster and formed a new phylogenetic lineage. The phenotypic, physiological, biochemical, and genetic characteristics support the assignment of CMAA 1215T to the genus Pseudomonas, representing a novel species. The name Pseudomonas aestus sp. nov. is proposed, with CMAA 1215T (=NRRL B-653100T = CBMAI 1962T) as the type strain.

  11. Genome sequencing of adzuki bean (Vigna angularis) provides insight into high starch and low fat accumulation and domestication.

    PubMed

    Yang, Kai; Tian, Zhixi; Chen, Chunhai; Luo, Longhai; Zhao, Bo; Wang, Zhuo; Yu, Lili; Li, Yisong; Sun, Yudong; Li, Weiyu; Chen, Yan; Li, Yongqiang; Zhang, Yueyang; Ai, Danjiao; Zhao, Jinyang; Shang, Cheng; Ma, Yong; Wu, Bin; Wang, Mingli; Gao, Li; Sun, Dongjing; Zhang, Peng; Guo, Fangfang; Wang, Weiwei; Li, Yuan; Wang, Jinlong; Varshney, Rajeev K; Wang, Jun; Ling, Hong-Qing; Wan, Ping

    2015-10-27

    Adzuki bean (Vigna angularis), an important legume crop, is grown in more than 30 countries of the world. The seed of adzuki bean, as an important source of starch, digestible protein, mineral elements, and vitamins, is widely used in foods for at least a billion people. Here, we generated a high-quality draft genome sequence of adzuki bean by whole-genome shotgun sequencing. The assembled contig sequences reached 450 Mb (83% of the genome) with an N50 of 38 kb, and the total scaffold sequences were 466.7 Mb with an N50 of 1.29 Mb. Of these, 372.9 Mb of scaffold sequences were assigned to the 11 chromosomes of adzuki bean by using a single nucleotide polymorphism genetic map. A total of 34,183 protein-coding genes were predicted. Functional analysis revealed that significant differences in starch and fat content between adzuki bean and soybean were likely due to transcriptional abundance, rather than copy number variations, of the genes related to starch and oil synthesis. We detected strong selection signals in domestication by the population analysis of 50 accessions including 11 wild, 11 semiwild, 17 landraces, and 11 improved varieties. In addition, the semiwild accessions were shown to have a closer relationship to the cultigen accessions than the wild type, suggesting that the semiwild adzuki bean might be a preliminary landrace and play a role in the adzuki bean domestication. The genome sequence of adzuki bean will facilitate the identification of agronomically important genes and accelerate the improvement of adzuki bean.
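
The N50 statistic quoted for the contigs and scaffolds in this record can be computed as sketched here. It follows the common definition: the length L such that sequences of length L or longer together cover at least half of the total assembly length. The function name and the toy lengths are illustrative only, not values from the adzuki bean assembly.

```python
def n50(lengths):
    """Return the N50 of a non-empty collection of sequence lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")

# Toy example: total is 54, and 10 + 9 + 8 = 27 reaches half of it.
value = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```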

  12. Genome sequencing of adzuki bean (Vigna angularis) provides insight into high starch and low fat accumulation and domestication

    PubMed Central

    Yang, Kai; Tian, Zhixi; Chen, Chunhai; Luo, Longhai; Zhao, Bo; Wang, Zhuo; Yu, Lili; Li, Yisong; Sun, Yudong; Li, Weiyu; Chen, Yan; Li, Yongqiang; Zhang, Yueyang; Ai, Danjiao; Zhao, Jinyang; Shang, Cheng; Ma, Yong; Wu, Bin; Wang, Mingli; Gao, Li; Sun, Dongjing; Zhang, Peng; Guo, Fangfang; Wang, Weiwei; Li, Yuan; Wang, Jinlong; Varshney, Rajeev K.; Wang, Jun; Ling, Hong-Qing; Wan, Ping

    2015-01-01

    Adzuki bean (Vigna angularis), an important legume crop, is grown in more than 30 countries of the world. The seed of adzuki bean, as an important source of starch, digestible protein, mineral elements, and vitamins, is widely used in foods for at least a billion people. Here, we generated a high-quality draft genome sequence of adzuki bean by whole-genome shotgun sequencing. The assembled contig sequences reached 450 Mb (83% of the genome) with an N50 of 38 kb, and the total scaffold sequences were 466.7 Mb with an N50 of 1.29 Mb. Of these, 372.9 Mb of scaffold sequences were assigned to the 11 chromosomes of adzuki bean by using a single nucleotide polymorphism genetic map. A total of 34,183 protein-coding genes were predicted. Functional analysis revealed that significant differences in starch and fat content between adzuki bean and soybean were likely due to transcriptional abundance, rather than copy number variations, of the genes related to starch and oil synthesis. We detected strong selection signals in domestication by the population analysis of 50 accessions including 11 wild, 11 semiwild, 17 landraces, and 11 improved varieties. In addition, the semiwild accessions were shown to have a closer relationship to the cultigen accessions than the wild type, suggesting that the semiwild adzuki bean might be a preliminary landrace and play a role in the adzuki bean domestication. The genome sequence of adzuki bean will facilitate the identification of agronomically important genes and accelerate the improvement of adzuki bean. PMID:26460024

  13. Genes encoding calmodulin-binding proteins in the Arabidopsis genome

    NASA Technical Reports Server (NTRS)

    Reddy, Vaka S.; Ali, Gul S.; Reddy, Anireddy S N.

    2002-01-01

    Analysis of the recently completed Arabidopsis genome sequence indicates that approximately 31% of the predicted genes could not be assigned to functional categories, as they do not show any sequence similarity with proteins of known function from other organisms. Calmodulin (CaM), a ubiquitous and multifunctional Ca(2+) sensor, interacts with a wide variety of cellular proteins and modulates their activity/function in regulating diverse cellular processes. However, the primary amino acid sequence of the CaM-binding domain in different CaM-binding proteins (CBPs) is not conserved. One way to identify most of the CBPs in the Arabidopsis genome is by protein-protein interaction-based screening of expression libraries with CaM. Here, using a mixture of radiolabeled CaM isoforms from Arabidopsis, we screened several expression libraries prepared from flower meristem, seedlings, or tissues treated with hormones, an elicitor, or a pathogen. Sequence analysis of 77 positive clones that interact with CaM in a Ca(2+)-dependent manner revealed 20 CBPs, including 14 previously unknown CBPs. In addition, by searching the Arabidopsis genome sequence with the newly identified and known plant or animal CBPs, we identified a total of 27 CBPs. Among these, 16 CBPs are represented by families with 2-20 members in each family. Gene expression analysis revealed that CBPs and CBP paralogs are expressed differentially. Our data suggest that Arabidopsis has a large number of CBPs including several plant-specific ones. Although CaM is highly conserved between plants and animals, only a few CBPs are common to both plants and animals. Analysis of Arabidopsis CBPs revealed the presence of a variety of interesting domains. Our analyses identified several hypothetical proteins in the Arabidopsis genome as CaM targets, suggesting their involvement in Ca(2+)-mediated signaling networks.

  14. Tissue-specific Proteogenomic Analysis of Plutella xylostella Larval Midgut Using a Multialgorithm Pipeline

    PubMed Central

    Zhu, Xun; Xie, Shangbo; Armengaud, Jean; Xie, Wen; Guo, Zhaojiang; Kang, Shi; Wu, Qingjun; Wang, Shaoli; Xia, Jixing; He, Rongjun; Zhang, Youjun

    2016-01-01

    The diamondback moth, Plutella xylostella (L.), is the major cosmopolitan pest of brassica and other cruciferous crops. Its larval midgut is a dynamic tissue that interfaces with a wide variety of toxicological and physiological processes. The draft sequence of the P. xylostella genome was recently released, but its annotation remains challenging because of the low sequence coverage of this branch of life and the poor description of exon/intron splicing rules for these insects. Peptide sequencing by computational assignment of tandem mass spectra to genome sequence information provides an experimentally independent approach for confirming or refuting protein predictions, a concept that has been termed proteogenomics. In this study, we carried out an in-depth proteogenomic analysis to complement genome annotation of P. xylostella larval midgut based on shotgun HPLC-ESI-MS/MS data by means of a multialgorithm pipeline. A total of 876,341 tandem mass spectra were searched against the predicted P. xylostella protein sequences and a whole-genome six-frame translation database. Based on a data set comprising 2694 novel genome search specific peptides, we discovered 439 novel protein-coding genes and corrected 128 existing gene models. To get the most accurate data to seed further insect genome annotation, more than half of the novel protein-coding genes, i.e. 235 of 439, were further validated after RT-PCR amplification and sequencing of the corresponding transcripts. Furthermore, we validated 53 novel alternative splicing events. Finally, a total of 6764 proteins were identified, resulting in one of the most comprehensive proteogenomic studies of a nonmodel animal. As the first tissue-specific proteogenomics analysis of P. xylostella, this study provides the fundamental basis for high-throughput proteomics and functional genomics approaches aimed at deciphering the molecular mechanisms of resistance and controlling this pest. PMID:26902207
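
    The abstract above mentions searching spectra against a whole-genome six-frame translation database. As a rough illustration of what such a database contains, the sketch below translates a DNA string in all six reading frames using the standard genetic code. It is not the authors' pipeline; the function names (`reverse_complement`, `translate`, `six_frame_translation`) and the use of "X" for unrecognized codons are assumptions made for this example.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCTAA").items():
        print(label, protein)
```

    In practice each frame would be split at stop codons ("*") into candidate open reading frames before peptide matching; that step is omitted here for brevity.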

  15. Candida phyllophila sp. nov. and Candida vitiphila sp. nov., two novel yeast species from grape phylloplane in Thailand.

    PubMed

    Limtong, Savitree; Kaewwichian, Rungluk

    2013-01-01

    Three strains (K59(T), K60 and K70(T)) representing two novel yeast species were isolated from the external surface of leaves of different wine grape (Vitis vinifera) plants, which were collected from the Kanchanaburi Research Station (N14°07'15.1″ E099°19'05.6″), Wang Dong Sub-district, Mueang District, Kanchanaburi Province, Thailand, by an enrichment technique. The sequences of the D1/D2 domain of the large subunit (LSU) rRNA gene of two strains (K59(T) and K60) were identical and differed from that of strain K70(T). In terms of pairwise sequence similarity of the D1/D2 domain, the closest species to the three strains was Candida asparagi but with 2.3% nucleotide substitutions for strains K59(T) and K60, and 2.1% nucleotide substitutions for strain K70(T). On the basis of morphological, biochemical, physiological and chemotaxonomic characteristics and the sequence analysis of the D1/D2 domain of the large subunit (LSU) rRNA gene, the three strains were assigned to two novel Candida species. Two strains (K59(T) and K60) were assigned as Candida phyllophila sp. nov. (type strain K59(T)=BCC 42662(T)=NBRC 107776(T)=CBS 12671(T)). Candida vitiphila sp. nov. is proposed for strain K70(T) (=BCC 42663(T)=NBRC 107777(T)=CBS 12672(T)).

  16. Isolation and characterization of a novel Rhabdovirus from a wild boar (Sus scrofa) in Japan.

    PubMed

    Sakai, Kouji; Hagiwara, Katsuro; Omatsu, Tsutomu; Hamasaki, Chinami; Kuwata, Ryusei; Shimoda, Hiroshi; Suzuki, Kazuo; Endoh, Daiji; Nagata, Noriyo; Nagai, Makoto; Katayama, Yukie; Oba, Mami; Kurane, Ichiro; Saijo, Masayuki; Morikawa, Shigeru; Mizutani, Tetsuya; Maeda, Ken

    2015-09-30

    A novel rhabdovirus was isolated from the serum of a healthy Japanese wild boar (Sus scrofa leucomystax) and identified using the rapid determination system for viral nucleic acid sequences (RDV), next-generation sequencing, and electron microscopy. The virus was tentatively named wild boar rhabdovirus 1 (WBRV1). Phylogenetic analysis of the entire genome sequence indicated that WBRV1 is closely related to Tupaia rhabdovirus (TRV), which was isolated from cultured cells of hepatocellular carcinoma tissue of tree shrew. TRV has not been assigned to any genus of Rhabdoviridae to date. Analysis of the L gene indicated that WBRV1 belongs to the genus Vesiculovirus. These observations suggest that both TRV and WBRV1 belong to a new genus of Rhabdoviridae. Next-generation genome sequencing of WBRV1 revealed 5 open reading frames of 1329, 765, 627, 1629, and 6336 bases in length. The WBRV1 gene sequences are similar to those of other rhabdoviruses. Epizootiological analysis of a population of wild boars in Wakayama prefecture in Japan indicated that 6.5% were positive for the WBRV1 gene and 52% were positive for WBRV1-neutralizing antibodies. Furthermore, such viral neutralizing antibodies were found in domestic pigs in another prefecture. WBRV1 was inoculated intranasally and intraperitoneally into SCID and BALB/c mice and viral RNA was detected in SCID mice, suggesting that WBRV1 can replicate in immunocompromised mice. These results indicate this novel virus is endemic in wild animals and livestock in Japan. Copyright © 2015 Elsevier B.V. All rights reserved.

  17. A Single Multilocus Sequence Typing (MLST) Scheme for Seven Pathogenic Leptospira Species

    PubMed Central

    Amornchai, Premjit; Wuthiekanun, Vanaporn; Bailey, Mark S.; Holden, Matthew T. G.; Zhang, Cuicai; Jiang, Xiugao; Koizumi, Nobuo; Taylor, Kyle; Galloway, Renee; Hoffmaster, Alex R.; Craig, Scott; Smythe, Lee D.; Hartskeerl, Rudy A.; Day, Nicholas P.; Chantratita, Narisara; Feil, Edward J.; Aanensen, David M.; Spratt, Brian G.; Peacock, Sharon J.

    2013-01-01

    Background The available Leptospira multilocus sequence typing (MLST) scheme supported by an MLST website is limited to L. interrogans and L. kirschneri. Our aim was to broaden the utility of this scheme to incorporate a total of seven pathogenic species. Methodology and Findings We modified the existing scheme by replacing one of the seven MLST loci (fadD was changed to caiB), as the former gene did not appear to be present in some pathogenic species. Comparison of the original and modified schemes using data for L. interrogans and L. kirschneri demonstrated that the discriminatory power of the two schemes was not significantly different. The modified scheme was used to further characterize 325 isolates (L. alexanderi [n = 5], L. borgpetersenii [n = 34], L. interrogans [n = 222], L. kirschneri [n = 29], L. noguchii [n = 9], L. santarosai [n = 10], and L. weilii [n = 16]). Phylogenetic analysis using concatenated sequences of the 7 loci demonstrated that each species corresponded to a discrete clade, and that no strains were misclassified at the species level. Comparison between genotype and serovar was possible for 254 isolates. Of the 31 sequence types (STs) represented by at least two isolates, 18 STs included isolates assigned to two or three different serovars. Conversely, 14 serovars were identified that contained between 2 and 10 different STs. New observations were made on the global phylogeography of Leptospira spp., and the utility of MLST in making associations between human disease and specific maintenance hosts was demonstrated. Conclusion The new MLST scheme, supported by an updated MLST website, allows the characterization and species assignment of isolates of the seven major pathogenic species associated with leptospirosis. PMID:23359622

  18. Detailed analysis of metagenome datasets obtained from biogas-producing microbial communities residing in biogas reactors does not indicate the presence of putative pathogenic microorganisms

    PubMed Central

    2013-01-01

    Background In recent years biogas plants in Germany have been supposed to be involved in amplification and dissemination of pathogenic bacteria causing severe infections in humans and animals. In particular, biogas plants are discussed to contribute to the spreading of Escherichia coli infections in humans or chronic botulism in cattle caused by Clostridium botulinum. Metagenome datasets of microbial communities from an agricultural biogas plant as well as from anaerobic lab-scale digesters operating at different temperatures and conditions were analyzed for the presence of putative pathogenic bacteria and virulence determinants by various bioinformatic approaches. Results All datasets featured a low abundance of reads that were taxonomically assigned to the genus Escherichia or further selected genera comprising pathogenic species. Higher numbers of reads were taxonomically assigned to the genus Clostridium. However, only very few sequences were predicted to originate from pathogenic clostridial species. Moreover, mapping of metagenome reads to complete genome sequences of selected pathogenic bacteria revealed that not the pathogenic species itself, but only species that are more or less related to pathogenic ones are present in the fermentation samples analyzed. Likewise, known virulence determinants could hardly be detected. Only a marginal number of reads showed similarity to sequences described in the Microbial Virulence Database MvirDB such as those encoding protein toxins, virulence proteins or antibiotic resistance determinants. Conclusions Findings of this first study of metagenomic sequence reads of biogas producing microbial communities suggest that the risk of dissemination of pathogenic bacteria by application of digestates from biogas fermentations as fertilizers is low, because obtained results do not indicate the presence of putative pathogenic microorganisms in the samples analyzed. PMID:23557021

  19. Species Identification of Archaeological Skin Objects from Danish Bogs: Comparison between Mass Spectrometry-Based Peptide Sequencing and Microscopy-Based Methods

    PubMed Central

    Brandt, Luise Ørsted; Schmidt, Anne Lisbeth; Mannering, Ulla; Sarret, Mathilde; Kelstrup, Christian D.; Olsen, Jesper V.; Cappellini, Enrico

    2014-01-01

    Denmark has an extraordinarily large and well-preserved collection of archaeological skin garments found in peat bogs, dated to approximately 920 BC – AD 775. These objects provide not only the possibility to study prehistoric skin costume and technologies, but also to investigate the animal species used for the production of skin garments. Until recently, species identification of archaeological skin was primarily performed by light and scanning electron microscopy or the analysis of ancient DNA. However, the efficacy of these methods can be limited due to the harsh, mostly acidic environment of peat bogs leading to morphological and molecular degradation within the samples. We compared species assignment results of twelve archaeological skin samples from Danish bogs using Mass Spectrometry (MS)-based peptide sequencing, against results obtained using light and scanning electron microscopy. While it was difficult to obtain reliable results using microscopy, MS enabled the identification of several species-diagnostic peptides, mostly from collagen and keratins, allowing confident species discrimination even among taxonomically close organisms, such as sheep and goat. Unlike previous MS-based methods, mostly relying on peptide fingerprinting, the shotgun sequencing approach we describe aims to identify the complete extracted ancient proteome, without preselected specific targets. As an example, we report the identification, in one of the samples, of two peptides uniquely assigned to bovine foetal haemoglobin, indicating the production of skin from a calf slaughtered within the first months of its life. We conclude that MS-based peptide sequencing is a reliable method for species identification of samples from bogs. The mass spectrometry proteomics data were deposited in the ProteomeXchange Consortium with the dataset identifier PXD001029. PMID:25260035

  20. Modestobacter caceresii sp. nov., novel actinobacteria with an insight into their adaptive mechanisms for survival in extreme hyper-arid Atacama Desert soils.

    PubMed

    Busarakam, Kanungnid; Bull, Alan T; Trujillo, Martha E; Riesco, Raul; Sangal, Vartul; van Wezel, Gilles P; Goodfellow, Michael

    2016-06-01

    A polyphasic study was designed to determine the taxonomic provenance of three Modestobacter strains isolated from an extreme hyper-arid Atacama Desert soil. The strains, isolates KNN 45-1a, KNN 45-2b(T) and KNN 45-3b, were shown to have chemotaxonomic and morphological properties in line with their classification in the genus Modestobacter. The isolates had identical 16S rRNA gene sequences and formed a branch in the Modestobacter gene tree that was most closely related to the type strain of Modestobacter marinus (99.6% similarity). All three isolates were distinguished readily from Modestobacter type strains by a broad range of phenotypic properties, by qualitative and quantitative differences in fatty acid profiles and by BOX fingerprint patterns. The whole genome sequence of isolate KNN 45-2b(T) showed 89.3% average nucleotide identity, 90.1% (SD: 10.97%) average amino acid identity and a digital DNA-DNA hybridization value of 42.4±3.1 against the genome sequence of M. marinus DSM 45201(T), values consistent with its assignment to a separate species. On the basis of all of these data, it is proposed that the isolates be assigned to the genus Modestobacter as Modestobacter caceresii sp. nov. with isolate KNN 45-2b(T) (CECT 9023(T)=DSM 101691(T)) as the type strain. Analysis of the whole-genome sequence of M. caceresii KNN 45-2b(T), with 4683 open reading frames and a genome size of ~4.96 Mb, revealed the presence of genes and gene clusters that encode properties relevant to its adaptability to harsh environmental conditions prevalent in extreme hyper-arid Atacama Desert soils. Copyright © 2016. Published by Elsevier GmbH.

  1. Analysis of the distal gut bacterial community by 454-pyrosequencing in captive giraffes (Giraffa camelopardalis).

    PubMed

    AlZahal, Ousama; Valdes, Eduardo V; McBride, Brian W

    2016-01-01

    The objective of this study was to characterize the structure of the fecal bacterial community of five giraffes (Giraffa camelopardalis) at Disney's Animal Kingdom, FL. Fecal genomic DNA was extracted and variable regions 1-3 of the 16S rRNA gene were PCR-amplified and then sequenced. The MOTHUR software program was used for sequence processing, diversity analysis, and classification. A total of 181,689 non-chimeric bacterial sequences were obtained, and the average number of sequences per sample was 36,338 ± 8,818. Sequences were assigned to 8,284 operational taxonomic units (OTUs) at 95% genetic similarity, which included 2,942 singletons (36%). The number of OTUs per sample was 2,554 ± 264. Samples were normalized and alpha (intra-sample) diversity indices (Chao1, Inverse Simpson, Shannon, and coverage) were estimated as 3,712 ± 430, 116 ± 70, 6.1 ± 0.4, and 96 ± 1%, respectively. Thirteen phyla were detected and Firmicutes, Bacteroidetes, and Spirochaetes were the most dominant phyla (more than 2% of total sequences), and constituted 92% of the classified sequences, 66% of total sequences, and 43% of total OTUs. Our computation predicted that three OTUs were likely to be present in at least three of the five samples at greater than 1% dominance rate. These OTUs were Treponema, an unidentified OTU belonging to the order Bacteroidales, and Ruminococcus. This report was the first to characterize the bacterial community of the distal gut in giraffes utilizing fecal samples, and it demonstrated that the distal gut of giraffes is likely a potential reservoir for a number of undocumented species of bacteria. © 2015 Wiley Periodicals, Inc.
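
    The alpha diversity indices named in the abstract above (Chao1, inverse Simpson, Shannon, and coverage) can all be computed from a vector of per-OTU read counts. The sketch below uses common textbook forms: bias-corrected Chao1, inverse Simpson from proportions, Shannon with the natural logarithm, and Good's coverage. These are assumptions for illustration and may differ in detail from the estimators MOTHUR applies; the function name `alpha_diversity` is hypothetical.

```python
import math


def alpha_diversity(otu_counts):
    """Compute simple alpha diversity indices for one sample.

    otu_counts: iterable of non-negative integer read counts, one per OTU.
    """
    counts = [c for c in otu_counts if c > 0]
    n = sum(counts)                      # total reads in the sample
    props = [c / n for c in counts]      # relative abundance of each OTU

    shannon = -sum(p * math.log(p) for p in props)
    inv_simpson = 1.0 / sum(p * p for p in props)

    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    # Bias-corrected Chao1 richness estimate.
    chao1 = len(counts) + singletons * (singletons - 1) / (2 * (doubletons + 1))
    # Good's coverage: share of reads not belonging to singleton OTUs.
    coverage = 1.0 - singletons / n

    return {
        "observed_otus": len(counts),
        "shannon": shannon,
        "inv_simpson": inv_simpson,
        "chao1": chao1,
        "coverage": coverage,
    }


if __name__ == "__main__":
    print(alpha_diversity([5, 1, 1, 2]))
```

    A perfectly even sample of four OTUs gives an inverse Simpson value of 4 and a Chao1 estimate equal to the observed richness, which is a convenient sanity check.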

  2. DNABIT Compress - Genome compression algorithm.

    PubMed

    Rajarajeswari, Pothuraju; Apparao, Allam

    2011-01-22

    Data compression is concerned with how information is organized in data. Efficient storage means removal of redundancy from the data being stored in the DNA molecule. Data compression algorithms remove redundancy and are used to understand biologically important molecules. We present a compression algorithm, "DNABIT Compress", for DNA sequences based on a novel algorithm of assigning binary bits for smaller segments of DNA bases to compress both repetitive and non-repetitive DNA sequences. Our proposed algorithm achieves the best compression ratio for DNA sequences for larger genomes. Significantly better compression results show that the "DNABIT Compress" algorithm is the best among the remaining compression algorithms. While achieving the best compression ratios for DNA sequences (genomes), our new DNABIT Compress algorithm significantly improves the running time of all previous DNA compression programs. Assigning binary bits (Unique BIT CODE) for (Exact Repeats, Reverse Repeats) fragments of DNA sequence is also a unique concept introduced in this algorithm for the first time in DNA compression. This proposed new algorithm could achieve a compression ratio as low as 1.58 bits/base, where the existing best methods could not achieve a ratio less than 1.72 bits/base.
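
    For context on the bits/base figures quoted above, a fixed code that assigns two bits to each of A, C, G and T costs exactly 2 bits per base, which is the baseline that DNA-specific compressors try to beat. The sketch below shows only that baseline packing; it is not the DNABIT Compress algorithm, whose repeat-based bit codes are not described in enough detail here to reproduce. The names `pack` and `unpack` and the left-aligned padding of the final byte are choices made for this example.

```python
# Fixed 2-bit code per base: the naive baseline of 2 bits/base.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"


def pack(seq):
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk so unpacking can read from the top bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data, length):
    """Recover the first `length` bases from bytes produced by pack()."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])


if __name__ == "__main__":
    packed = pack("GATTACA")
    print(len(packed), unpack(packed, 7))
```

    The original sequence length has to be stored alongside the packed bytes, because the padding bits in the last byte are indistinguishable from the code for A.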

  3. Metagenomic analysis of microbial community of an Amazonian geothermal spring in Peru.

    PubMed

    Paul, Sujay; Cortez, Yolanda; Vera, Nadia; Villena, Gretty K; Gutiérrez-Correa, Marcel

    2016-09-01

    Aguas Calientes (AC) is an isolated geothermal spring located deep in the Amazon rainforest (7°21'12″ S, 75°00'54″ W) of Peru. This geothermal spring is slightly acidic (pH 5.0-7.0) in nature, with temperatures varying from 45 to 90 °C and continually fed by plant litter, resulting in a relatively high degree of total organic content (TOC). A pooled water sample was analyzed at the 16S rRNA V3-V4 hypervariable region by amplicon metagenome sequencing on the Illumina HiSeq platform. A total of 2,976,534 paired-end reads were generated, which were assigned to 5434 OTUs. All the resulting 16S rRNA fragments were then classified into 58 bacterial phyla and 2 archaeal phyla. Proteobacteria (88.06%) was the most highly represented phylum, followed by Thermi (6.43%), Firmicutes (3.41%) and Aquificae (1.10%), respectively. Crenarchaeota and Euryarchaeota were the only 2 archaeal phyla detected in this study, with low abundance. Metagenomic sequences were deposited in the SRA database at NCBI under accession number SRX1809286. Functional categorization of the assigned OTUs was performed using the PICRUSt tool. In COG analysis, "Amino acid transport and metabolism" (8.5%) was found to be the most highly represented category, whereas among predicted KEGG pathways "Metabolism" (50.6%) was the most abundant. This is the first report of a high-resolution microbial phylogenetic profile of an Amazonian hot spring.

  4. International Students in the Scientific and Technical Writing Class.

    ERIC Educational Resources Information Center

    Constantinides, Janet C.

    A course sequence for teaching the forms and formats of scientific and technical writing to English as a second language (ESL) learners is described. The first assignment, a letter of application, serves as a diagnostic indication of the student's ability. The second assignment, a narrative, is designed to define the importance of audience and…

  5. Developing the Inferential Reasoning of Basic Writers.

    ERIC Educational Resources Information Center

    Zeller, Robert

    1987-01-01

    Describes an assignment sequence using photographs to introduce developmental students to conventions of academic inquiry, and to give them practice analyzing and synthesizing. Reports that students link details observed in the photos to inferences drawn about them. Concentrates on the assignment linking a photo of E. B. White with an essay by him…

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fogh, R.H.; Mabbutt, B.C.; Kem, W.R.

    Sequence-specific assignments are reported for the 500-MHz 1H nuclear magnetic resonance (NMR) spectrum of the 48-residue polypeptide neurotoxin I from the sea anemone Stichodactyla helianthus (Sh I). Spin systems were first identified by using two-dimensional relayed or multiple quantum filtered correlation spectroscopy, double quantum spectroscopy, and spin lock experiments. Specific resonance assignments were then obtained from nuclear Overhauser enhancement (NOE) connectivities between protons from residues adjacent in the amino acid sequence. Of a total of 265 potentially observable resonances, 248 (i.e., 94%) were assigned, arising from 39 completely and 9 partially assigned amino acid spin systems. The secondary structure of Sh I was defined on the basis of the pattern of sequential NOE connectivities, NOEs between protons on separate strands of the polypeptide backbone, and backbone amide exchange rates. Sh I contains a four-stranded antiparallel β-sheet encompassing residues 1-5, 16-24, 30-33, and 40-46, with a β-bulge at residues 17 and 18 and a reverse turn, probably a type II β-turn, involving residues 27-30. No evidence of α-helical structure was found.

  7. Method for assigning sites to projected generic nuclear power plants

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holter, G.M.; Purcell, W.L.; Shutz, M.E.

    1986-07-01

    Pacific Northwest Laboratory developed a method for forecasting potential locations and startup sequences of nuclear power plants that will be required in the future but have not yet been specifically identified by electric utilities. Use of the method results in numerical ratings for potential nuclear power plant sites located in each of the 10 federal energy regions. The rating for each potential site is obtained from numerical factors assigned to each of 5 primary siting characteristics: (1) cooling water availability, (2) site land area, (3) power transmission land area, (4) proximity to metropolitan areas, and (5) utility plans for the site. The sequence of plant startups in each federal energy region is obtained by use of the numerical ratings and the forecasts of generic nuclear power plant startups obtained from the EIA Middle Case electricity forecast. Sites are assigned to generic plants in chronological order according to startup date.

  8. Whole genome sequencing options for bacterial strain typing and epidemiologic analysis based on single nucleotide polymorphism versus gene-by-gene-based approaches.

    PubMed

    Schürch, A C; Arredondo-Alonso, S; Willems, R J L; Goering, R V

    2018-04-01

    Whole genome sequence (WGS)-based strain typing finds increasing use in the epidemiologic analysis of bacterial pathogens in both public health and more localized infection control settings. This minireview describes methodologic approaches that have been explored for WGS-based epidemiologic analysis and considers the challenges and pitfalls of data interpretation. Personal collection of relevant publications. When applying WGS to study the molecular epidemiology of bacterial pathogens, genomic variability between strains is translated into measures of distance by determining single nucleotide polymorphisms in core genome alignments or by indexing allelic variation in hundreds to thousands of core genes, assigning types to unique allelic profiles. Interpreting isolate relatedness from these distances is highly organism specific, and attempts to establish species-specific cutoffs are unlikely to be generally applicable. In cases where single nucleotide polymorphism or core gene typing does not provide the resolution necessary for accurate assessment of the epidemiology of bacterial pathogens, inclusion of accessory gene or plasmid sequences may provide the additional required discrimination. As with all epidemiologic analysis, realizing the full potential of the revolutionary advances in WGS-based approaches requires understanding and dealing with issues related to the fundamental steps of data generation and interpretation. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
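
    The gene-by-gene approach described above, in which types are assigned to unique allelic profiles and relatedness is read from allele differences, can be sketched in a few lines. This is an illustrative toy, not the scheme of any particular typing database: the type numbers are arbitrary labels issued in order of first appearance, and `assign_types` and `allele_distance` are hypothetical names.

```python
def assign_types(profiles):
    """Give each isolate a type number; identical allelic profiles share a type.

    profiles: dict mapping isolate name -> sequence of allele numbers,
              one per locus, in a fixed locus order.
    """
    type_of_profile = {}
    types = {}
    for isolate, profile in profiles.items():
        key = tuple(profile)
        if key not in type_of_profile:
            # First time this profile is seen: issue the next type number.
            type_of_profile[key] = len(type_of_profile) + 1
        types[isolate] = type_of_profile[key]
    return types


def allele_distance(profile_a, profile_b):
    """Count loci at which two allelic profiles differ."""
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


if __name__ == "__main__":
    demo = {"iso1": [1, 3, 2], "iso2": [1, 3, 2], "iso3": [1, 4, 2]}
    print(assign_types(demo))
    print(allele_distance(demo["iso1"], demo["iso3"]))
```

    As the abstract notes, what distance counts as "related" is organism specific, so the sketch deliberately stops at computing the distance and applies no cutoff.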

  9. Comparative transcriptomic analysis of key genes involved in flavonoid biosynthetic pathway and identification of a flavonol synthase from Artemisia annua L.

    PubMed

    Liu, S; Liu, L; Tang, Y; Xiong, S; Long, J; Liu, Z; Tian, N

    2017-07-01

    The regulatory mechanism of flavonoids, which synergise anti-malarial and anti-cancer compounds in Artemisia annua, is still unclear. In this study, an anthocyanidin-accumulating mutant callus was induced from A. annua and comparative transcriptomic analysis of wild-type and mutant calli performed, based on the next-generation Illumina/Solexa sequencing platform and de novo assembly. A total of 82,393 unigenes were obtained and 34,764 unigenes were annotated in the public database. Among these, 87 unigenes were assigned to 14 structural genes involved in the flavonoid biosynthetic pathway and 37 unigenes were assigned to 17 structural genes related to metabolism of flavonoids. More than 30 unigenes were assigned to regulatory genes, including R2R3-MYB, bHLH and WD40, which might regulate flavonoid biosynthesis. A further 29 unigenes encoding flavonoid biosynthetic enzymes or transcription factors were up-regulated in the mutant, while 19 unigenes were down-regulated, compared with the wild type. Expression levels of nine genes involved in the flavonoid pathway were compared using semi-quantitative RT-PCR, and results were consistent with comparative transcriptomic analysis. Finally, a putative flavonol synthase gene (AaFLS1) was identified from enzyme assay in vitro and in vivo through heterogeneous expression, and confirmed comparative transcriptomic analysis of wild-type and mutant callus. The present work has provided important target genes for the regulation of flavonoid biosynthesis in A. annua. © 2017 German Botanical Society and The Royal Botanical Society of the Netherlands.

  10. [Taxonomic status of the Tyulek virus (TLKV) (Orthomyxoviridae, Quaranjavirus, Quaranfil group) isolated from the ticks Argas vulgaris Filippova, 1961 (Argasidae) from the birds burrow nest biotopes in the Kyrgyzstan].

    PubMed

    L'vov, D K; Al'khovskiĭ, S V; Shchelkanov, M Iu; Shchetinin, A M; Deriabin, P G; Aristova, V A; Gitel'man, A K; Samokhvalov, E I; Botikov, A G

    2014-01-01

    The Tyulek virus (TLKV) was isolated from the ticks Argas vulgaris Filippova, 1961 (Argasidae), collected from the burrow biotopes in a multispecies bird colony in the Aksu river floodplain near Tyulek village (northern part of Chu Valley, Kyrgyzstan). Recently, the TLKV was assigned to the Quaranfil group (including the Quaranfil virus (QRFV), Johnston Atoll virus (JAV), Lake Chad virus), which forms the novel genus Quaranjavirus in the Orthomyxoviridae family. In this work, the complete genome sequence of the TLKV (GenBank ID KJ438647-8) was determined using next-generation sequencing (Illumina platform). Comparison of deduced amino acid sequences shows a close relationship of the TLKV with QRFV and JAV (86% and 84% identity for PB1 and about 70% for PB2 and PA, respectively). The identity level of the TLKV and QRFV in outer glycoprotein GP is 72% and 80% for nucleotide and amino acid sequences, respectively. The phylogenetic analysis showed that the TLKV belongs to the genus Quaranjavirus in the family Orthomyxoviridae.

  11. FOAM (Functional Ontology Assignments for Metagenomes): A Hidden Markov Model (HMM) database with environmental focus

    DOE PAGES

    Prestat, Emmanuel; David, Maude M.; Hultman, Jenni; ...

    2014-09-26

    A new functional gene database, FOAM (Functional Ontology Assignments for Metagenomes), was developed to screen environmental metagenomic sequence datasets. FOAM provides a new functional ontology dedicated to classify gene functions relevant to environmental microorganisms based on Hidden Markov Models (HMMs). Sets of aligned protein sequences (i.e. ‘profiles’) were tailored to a large group of target KEGG Orthologs (KOs) from which HMMs were trained. The alignments were checked and curated to make them specific to the targeted KO. Within this process, sequence profiles were enriched with the most abundant sequences available to maximize the yield of accurate classifier models. An associated functional ontology was built to describe the functional groups and hierarchy. FOAM allows the user to select the target search space before HMM-based comparison steps and to easily organize the results into different functional categories and subcategories. FOAM is publicly available at http://portal.nersc.gov/project/m1317/FOAM/.

  12. M dwarf spectra from 0.6 to 1.5 micron - A spectral sequence, model atmosphere fitting, and the temperature scale

    NASA Technical Reports Server (NTRS)

    Kirkpatrick, J. D.; Kelly, Douglas M.; Rieke, George H.; Liebert, James; Allard, France; Wehrse, Rainer

    1993-01-01

    Red/infrared (0.6-1.5 micron) spectra are presented for a sequence of well-studied M dwarfs ranging from M2 through M9. A variety of temperature-sensitive features useful for spectral classification are identified. Using these features, the spectral data are compared to recent theoretical models, from which a temperature scale is assigned. The red portion of the model spectra provides reasonably good fits for dwarfs earlier than M6. For later types, the infrared region provides a more reliable fit to the observations. In each case, the wavelength region used includes the broad peak of the energy distribution. For a given spectral type, the derived temperature sequence assigns higher temperatures than have earlier studies - the difference becoming more pronounced at lower luminosities. The positions of M dwarfs on the H-R diagram are, as a result, in closer agreement with theoretical tracks of the lower main sequence.

  13. Strategies for Achieving High Sequencing Accuracy for Low Diversity Samples and Avoiding Sample Bleeding Using Illumina Platform

    PubMed Central

    Mitra, Abhishek; Skrzypczak, Magdalena; Ginalski, Krzysztof; Rowicka, Maga

    2015-01-01

    Sequencing microRNA, reduced representation sequencing, Hi-C technology and any method requiring the use of in-house barcodes result in sequencing libraries with low initial sequence diversity. Sequencing such data on the Illumina platform typically produces low quality data due to the limitations of the Illumina cluster calling algorithm. Moreover, even in the case of diverse samples, these limitations are causing substantial inaccuracies in multiplexed sample assignment (sample bleeding). Such inaccuracies are unacceptable in clinical applications, and in some other fields (e.g. detection of rare variants). Here, we discuss how both problems with quality of low-diversity samples and sample bleeding are caused by incorrect detection of clusters on the flowcell during initial sequencing cycles. We propose simple software modifications (Long Template Protocol) that overcome this problem. We present experimental results showing that our Long Template Protocol remarkably increases data quality for low diversity samples, as compared with the standard analysis protocol; it also substantially reduces sample bleeding for all samples. For comprehensiveness, we also discuss and compare experimental results from alternative approaches to sequencing low diversity samples. First, we discuss how the low diversity problem, if caused by barcodes, can be avoided altogether at the barcode design stage. Second and third, we present modified guidelines, which are more stringent than the manufacturer’s, for mixing low diversity samples with diverse samples and lowering cluster density, which in our experience consistently produces high quality data from low diversity samples. Fourth and fifth, we present rescue strategies that can be applied when sequencing results in low quality data and when there is no more biological material available. In such cases, we propose that the flowcell be re-hybridized and sequenced again using our Long Template Protocol. Alternatively, we discuss how analysis can be repeated from saved sequencing images using the Long Template Protocol to increase accuracy. PMID:25860802

  14. Exploring the sequence-structure protein landscape in the glycosyltransferase family

    PubMed Central

    Zhang, Ziding; Kochhar, Sunil; Grigorov, Martin

    2003-01-01

    To understand the molecular basis of glycosyltransferases’ (GTFs) catalytic mechanism, extensive structural information is required. Here, fold recognition methods were employed to assign 3D protein shapes (folds) to the currently known GTF sequences, available in public databases such as GenBank and Swissprot. First, GTF sequences were retrieved and classified into clusters, based on sequence similarity only. Intracluster sequence similarity was chosen sufficiently high to ensure that the same fold is found within a given cluster. Then, a representative sequence from each cluster was selected to compose a subset of GTF sequences. The members of this reduced set were processed by three different fold recognition methods: 3D-PSSM, FUGUE, and GeneFold. Finally, the results from different fold recognition methods were analyzed and compared to sequence-similarity search methods (i.e., BLAST and PSI-BLAST). It was established that the folds of about 70% of all currently known GTF sequences can be confidently assigned by fold recognition methods, a value which is higher than the fold identification rate based on sequence comparison alone (48% for BLAST and 64% for PSI-BLAST). The identified folds were submitted to 3D clustering, and we found that most of the GTF sequences adopt the typical GTF A or GTF B folds. Our results indicate a lack of evidence that new GTF folds (i.e., folds other than GTF A and B) exist. Based on cases where fold identification was not possible, we suggest several sequences as the most promising targets for a structural genomics initiative focused on the GTF protein family. PMID:14500887

  15. MARTA: a suite of Java-based tools for assigning taxonomic status to DNA sequences.

    PubMed

    Horton, Matthew; Bodenhausen, Natacha; Bergelson, Joy

    2010-02-15

    We have created a suite of Java-based software to better provide taxonomic assignments to DNA sequences. We anticipate that the program will be useful for protistologists, virologists, mycologists and other microbial ecologists. The program relies on NCBI utilities including the BLAST software and Taxonomy database and is easily manipulated at the command-line to specify a BLAST candidate's query-coverage or percent identity requirements; other options include the ability to set minimal consensus requirements (%) for each of the eight major taxonomic ranks (Domain, Kingdom, Phylum, ...) and whether to consider lower scoring candidates when the top-hit lacks taxonomic classification.
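
    The per-rank consensus idea in this abstract can be illustrated with a short Python sketch. This is not MARTA's code or interface: the function name, the dictionary-based input, the full list of eight ranks (the abstract names only the first three), and the rule of narrowing the candidate pool at each rank are all assumptions made for illustration.

```python
from collections import Counter

# Assumed rank list; the abstract names only Domain, Kingdom and Phylum.
RANKS = ["domain", "kingdom", "phylum", "class",
         "order", "family", "genus", "species"]


def consensus_lineage(candidates, thresholds):
    """Assign taxa rank by rank while the candidates agree strongly enough.

    candidates: list of dicts mapping rank -> taxon name (one per BLAST hit).
    thresholds: dict mapping rank -> minimal agreeing fraction (0..1).
    Returns a dict with the ranks that met their threshold.
    """
    lineage = {}
    pool = list(candidates)
    for rank in RANKS:
        names = [c[rank] for c in pool if c.get(rank)]
        if not names:
            break  # no candidate is classified at this rank
        taxon, count = Counter(names).most_common(1)[0]
        if count / len(pool) < thresholds[rank]:
            break  # agreement too weak; stop at the previous rank
        lineage[rank] = taxon
        # Only hits that agree so far vote on the next rank (assumption).
        pool = [c for c in pool if c.get(rank) == taxon]
    return lineage


if __name__ == "__main__":
    base = ["Bacteria", "Bacteria", "Proteobacteria", "Gammaproteobacteria",
            "Enterobacterales", "Enterobacteriaceae"]
    hits = [
        dict(zip(RANKS, base + ["Escherichia", "Escherichia coli"])),
        dict(zip(RANKS, base + ["Escherichia", "Escherichia coli"])),
        dict(zip(RANKS, base + ["Shigella", "Shigella flexneri"])),
    ]
    strict = {rank: 0.8 for rank in RANKS}
    print(consensus_lineage(hits, strict))
```

    With an 80% requirement at every rank, the three example hits agree only down to family; lowering the genus threshold lets the assignment continue to species.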

  16. A pipeline of programs for collecting and analyzing group II intron retroelement sequences from GenBank

    PubMed Central

    2013-01-01

    Background: Accurate and complete identification of mobile elements is a challenging task in the current era of sequencing, given their large numbers and frequent truncations. Group II intron retroelements, which consist of a ribozyme and an intron-encoded protein (IEP), are usually identified in bacterial genomes through their IEP; however, the RNA component that defines the intron boundaries is often difficult to identify because of a lack of strong sequence conservation corresponding to the RNA structure. Compounding the problem of boundary definition is the fact that a majority of group II intron copies in bacteria are truncated. Results: Here we present a pipeline of 11 programs that collect and analyze group II intron sequences from GenBank. The pipeline begins with a BLAST search of GenBank using a set of representative group II IEPs as queries. Subsequent steps download the corresponding genomic sequences and flanks, filter out non-group II introns, assign introns to phylogenetic subclasses, filter out incomplete and/or non-functional introns, and assign IEP sequences and RNA boundaries to the full-length introns. In the final step, the redundancy in the data set is reduced by grouping introns into sets of ≥95% identity, with one example sequence chosen to be the representative. Conclusions: These programs should be useful for comprehensive identification of group II introns in sequence databases as data continue to rapidly accumulate. PMID:24359548
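
    The final redundancy-reduction step (grouping sequences into sets of ≥95% identity with one representative each) can be sketched in Python as follows. The abstract does not say how identity is computed or how representatives are chosen, so this sketch assumes pre-aligned, equal-length sequences, a greedy first-match rule, and the first member of each set as its representative; both function names are hypothetical.

```python
def percent_identity(a, b):
    """Fraction of matching positions in two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def group_by_identity(seqs, threshold=0.95):
    """Greedy grouping: a sequence joins the first set whose representative
    it matches at >= threshold; otherwise it founds a new set.

    seqs: dict mapping sequence name -> aligned sequence string.
    Returns a list of (representative_name, [member_names]).
    """
    groups = []
    for name, seq in seqs.items():
        for rep, members in groups:
            if percent_identity(seqs[rep], seq) >= threshold:
                members.append(name)
                break
        else:
            # No existing representative is similar enough.
            groups.append((name, [name]))
    return groups


if __name__ == "__main__":
    ref = "ACGT" * 10
    example = {
        "a": ref,
        "b": "T" + ref[1:],      # 1 mismatch in 40 positions (97.5%)
        "c": "TTTT" + ref[4:],   # 3 mismatches in 40 positions (92.5%)
    }
    for rep, members in group_by_identity(example):
        print(rep, members)
```

    A greedy scheme like this depends on input order; a production pipeline would more likely rely on an alignment-based clustering tool.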

  17. Unraveling the Complexities of Life Sciences Data.

    PubMed

    Higdon, Roger; Haynes, Winston; Stanberry, Larissa; Stewart, Elizabeth; Yandl, Gregory; Howard, Chris; Broomall, William; Kolker, Natali; Kolker, Eugene

    2013-03-01

    The life sciences have entered into the realm of big data and data-enabled science, where data can either empower or overwhelm. These data bring the challenges of the 5 Vs of big data: volume, veracity, velocity, variety, and value. Both independently and through our involvement with DELSA Global (Data-Enabled Life Sciences Alliance, DELSAglobal.org), the Kolker Lab ( kolkerlab.org ) is creating partnerships that identify data challenges and solve community needs. We specialize in solutions to complex biological data challenges, as exemplified by the community resource of MOPED (Model Organism Protein Expression Database, MOPED.proteinspire.org ) and the analysis pipeline of SPIRE (Systematic Protein Investigative Research Environment, PROTEINSPIRE.org ). Our collaborative work extends into the computationally intensive tasks of analysis and visualization of millions of protein sequences through innovative implementations of sequence alignment algorithms and creation of the Protein Sequence Universe tool (PSU). Pushing into the future together with our collaborators, our lab is pursuing integration of multi-omics data and exploration of biological pathways, as well as assigning function to proteins and porting solutions to the cloud. Big data have come to the life sciences; discovering the knowledge in the data will bring breakthroughs and benefits.

  18. Draft genome sequence of a CTX-M-8, CTX-M-55 and FosA3 co-producing Escherichia coli ST117/B2 isolated from an asymptomatic carrier.

    PubMed

    Fernandes, Miriam R; Sellera, Fábio P; Moura, Quézia; Souza, Tiago A; Lincopan, Nilton

    2018-03-01

    Asymptomatic carriers can act as reservoirs of multidrug-resistant (MDR) bacteria. The aim of this study was to describe the draft genome sequence of an MDR Escherichia coli lineage recovered from a faecal sample of a healthy carrier. Genomic DNA was sequenced on an Illumina NextSeq platform. Sequence reads were de novo assembled using CLC Genomics Workbench and the whole genome sequence was evaluated through bioinformatics tools available from the Center of Genomic Epidemiology as well as additional in silico analysis. The genome size was calculated as 5178340 bp, with 5442 protein-coding sequences and 5492 total genes. Presence of the blaCTX-M-8, blaCTX-M-55 and fosA3 genes was detected in addition to other antimicrobial resistance genes. Interestingly, the strain was assigned to serotype O8:H4-fimH97 and was classified within the highly virulent phylogroup B2. This draft genome can provide helpful information to elucidate genetic features that contribute to colonisation and adaptation of MDR and virulent pathogens in asymptomatic carriers. Copyright © 2018 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.

  19. Anchoring genome sequence to chromosomes of the central bearded dragon (Pogona vitticeps) enables reconstruction of ancestral squamate macrochromosomes and identifies sequence content of the Z chromosome.

    PubMed

    Deakin, Janine E; Edwards, Melanie J; Patel, Hardip; O'Meally, Denis; Lian, Jinmin; Stenhouse, Rachael; Ryan, Sam; Livernois, Alexandra M; Azad, Bhumika; Holleley, Clare E; Li, Qiye; Georges, Arthur

    2016-06-10

    Squamates (lizards and snakes) are a speciose lineage of reptiles displaying considerable karyotypic diversity, particularly among lizards. Understanding the evolution of this diversity requires comparison of genome organisation between species. Although the genomes of several squamate species have now been sequenced, only the green anole lizard has any sequence anchored to chromosomes. There is only limited gene mapping data available for five other squamates. This makes it difficult to reconstruct the events that have led to extant squamate karyotypic diversity. The purpose of this study was to anchor the recently sequenced central bearded dragon (Pogona vitticeps) genome to chromosomes to trace the evolution of squamate chromosomes. Assigning sequence to sex chromosomes was of particular interest for identifying candidate sex determining genes. By using two different approaches to map conserved blocks of genes, we were able to anchor approximately 42 % of the dragon genome sequence to chromosomes. We constructed detailed comparative maps between dragon, anole and chicken genomes, and where possible, made broader comparisons across Squamata using cytogenetic mapping information for five other species. We show that squamate macrochromosomes are relatively well conserved between species, supporting findings from previous molecular cytogenetic studies. Macrochromosome diversity between members of the Toxicofera clade has been generated by intrachromosomal, and a small number of interchromosomal, rearrangements. We reconstructed the ancestral squamate macrochromosomes by drawing upon comparative cytogenetic mapping data from seven squamate species and propose the events leading to the arrangements observed in representative species. In addition, we assigned over 8 Mbp of sequence containing 219 genes to the Z chromosome, providing a list of genes to begin testing as candidate sex determining genes. Anchoring of the dragon genome has provided substantial insight into the evolution of squamate genomes, enabling us to reconstruct ancestral macrochromosome arrangements at key positions in the squamate phylogeny, demonstrating that fusions between macrochromosomes or fusions of macrochromosomes and microchromosomes have played an important role during the evolution of squamate genomes. Assigning sequence to the sex chromosomes has identified NR5A1 as a promising candidate sex determining gene in the dragon.

  20. Successful enrichment and recovery of whole mitochondrial genomes from ancient human dental calculus.

    PubMed

    Ozga, Andrew T; Nieves-Colón, Maria A; Honap, Tanvi P; Sankaranarayanan, Krithivasan; Hofman, Courtney A; Milner, George R; Lewis, Cecil M; Stone, Anne C; Warinner, Christina

    2016-06-01

    Archaeological dental calculus is a rich source of host-associated biomolecules. Importantly, however, dental calculus is more accurately described as a calcified microbial biofilm than a host tissue. As such, concerns regarding destructive analysis of human remains may not apply as strongly to dental calculus, opening the possibility of obtaining human health and ancestry information from dental calculus in cases where destructive analysis of conventional skeletal remains is not permitted. Here we investigate the preservation of human mitochondrial DNA (mtDNA) in archaeological dental calculus and its potential for full mitochondrial genome (mitogenome) reconstruction in maternal lineage ancestry analysis. Extracted DNA from six individuals at the 700-year-old Norris Farms #36 cemetery in Illinois was enriched for mtDNA using in-solution capture techniques, followed by Illumina high-throughput sequencing. Full mitogenomes (7-34×) were successfully reconstructed from dental calculus for all six individuals, including three individuals who had previously tested negative for DNA preservation in bone using conventional PCR techniques. Mitochondrial haplogroup assignments were consistent with previously published findings, and additional comparative analysis of paired dental calculus and dentine from two individuals yielded equivalent haplotype results. All dental calculus samples exhibited damage patterns consistent with ancient DNA, and mitochondrial sequences were estimated to be 92-100% endogenous. DNA polymerase choice was found to impact error rates in downstream sequence analysis, but these effects can be mitigated by greater sequencing depth. Dental calculus is a viable alternative source of human DNA that can be used to reconstruct full mitogenomes from archaeological remains. Am J Phys Anthropol 160:220-228, 2016. © 2016 The Authors American Journal of Physical Anthropology Published by Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  1. Successful enrichment and recovery of whole mitochondrial genomes from ancient human dental calculus

    PubMed Central

    Ozga, Andrew T.; Nieves‐Colón, Maria A.; Honap, Tanvi P.; Sankaranarayanan, Krithivasan; Hofman, Courtney A.; Milner, George R.; Lewis, Cecil M.; Stone, Anne C.

    2016-01-01

    ABSTRACT Objectives: Archaeological dental calculus is a rich source of host‐associated biomolecules. Importantly, however, dental calculus is more accurately described as a calcified microbial biofilm than a host tissue. As such, concerns regarding destructive analysis of human remains may not apply as strongly to dental calculus, opening the possibility of obtaining human health and ancestry information from dental calculus in cases where destructive analysis of conventional skeletal remains is not permitted. Here we investigate the preservation of human mitochondrial DNA (mtDNA) in archaeological dental calculus and its potential for full mitochondrial genome (mitogenome) reconstruction in maternal lineage ancestry analysis. Materials and Methods: Extracted DNA from six individuals at the 700‐year‐old Norris Farms #36 cemetery in Illinois was enriched for mtDNA using in‐solution capture techniques, followed by Illumina high‐throughput sequencing. Results: Full mitogenomes (7–34×) were successfully reconstructed from dental calculus for all six individuals, including three individuals who had previously tested negative for DNA preservation in bone using conventional PCR techniques. Mitochondrial haplogroup assignments were consistent with previously published findings, and additional comparative analysis of paired dental calculus and dentine from two individuals yielded equivalent haplotype results. All dental calculus samples exhibited damage patterns consistent with ancient DNA, and mitochondrial sequences were estimated to be 92–100% endogenous. DNA polymerase choice was found to impact error rates in downstream sequence analysis, but these effects can be mitigated by greater sequencing depth. Discussion: Dental calculus is a viable alternative source of human DNA that can be used to reconstruct full mitogenomes from archaeological remains. Am J Phys Anthropol 160:220–228, 2016. © 2016 The Authors American Journal of Physical Anthropology Published by Wiley Periodicals, Inc. PMID:26989998

  2. Analysis of the endogenous peptide profile of milk: identification of 248 mainly casein-derived peptides.

    PubMed

    Baum, Florian; Fedorova, Maria; Ebner, Jennifer; Hoffmann, Ralf; Pischetsrieder, Monika

    2013-12-06

    Milk is an excellent source of bioactive peptides. However, the composition of the native milk peptidome has only been partially elucidated. The present study applied matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) directly or after prefractionation of the milk peptides by reverse-phase high-performance liquid chromatography (RP-HPLC) or OFFGEL fractionation for the comprehensive analysis of the peptide profile of raw milk. The peptide sequences were determined by MALDI-TOF/TOF or nano-ultra-performance liquid chromatography-nanoelectrospray ionization-LTQ-Orbitrap-MS. Direct MALDI-TOF-MS analysis led to the assignment of 57 peptides. Prefractionation by both complementary methods led to the assignment of another 191 peptides. Most peptides originate from α(S1)-casein, followed by β-casein, and α(S2)-casein. κ-Casein and whey proteins seem to play only a minor role as peptide precursors. The formation of many, but not all, peptides could be explained by the activity of the endogenous peptidases, plasmin or cathepsin D, B, and G. Database searches revealed the presence of 22 peptides with established physiological function, including those with angiotensin-converting-enzyme (ACE) inhibitory, immunomodulating, or antimicrobial activity.

  3. An efficient randomized algorithm for contact-based NMR backbone resonance assignment.

    PubMed

    Kamisetty, Hetunandan; Bailey-Kellogg, Chris; Pandurangan, Gopal

    2006-01-15

    Backbone resonance assignment is a critical bottleneck in studies of protein structure, dynamics and interactions by nuclear magnetic resonance (NMR) spectroscopy. A minimalist approach to assignment, which we call 'contact-based', seeks to dramatically reduce experimental time and expense by replacing the standard suite of through-bond experiments with the through-space (nuclear Overhauser enhancement spectroscopy, NOESY) experiment. In the contact-based approach, spectral data are represented in a graph with vertices for putative residues (of unknown relation to the primary sequence) and edges for hypothesized NOESY interactions, such that observed spectral peaks could be explained if the residues were 'close enough'. Due to experimental ambiguity, several incorrect edges can be hypothesized for each spectral peak. An assignment is derived by identifying consistent patterns of edges (e.g. for alpha-helices and beta-sheets) within a graph and by mapping the vertices to the primary sequence. The key algorithmic challenge is to be able to uncover these patterns even when they are obscured by significant noise. This paper develops, analyzes and applies a novel algorithm for the identification of polytopes representing consistent patterns of edges in a corrupted NOESY graph. Our randomized algorithm aggregates simplices into polytopes and fixes inconsistencies with simple local modifications, called rotations, that maintain most of the structure already uncovered. In characterizing the effects of experimental noise, we employ an NMR-specific random graph model in proving that our algorithm gives optimal performance in expected polynomial time, even when the input graph is significantly corrupted. We confirm this analysis in simulation studies with graphs corrupted by up to 500% noise. Finally, we demonstrate the practical application of the algorithm on several experimental beta-sheet datasets. Our approach is able to eliminate a large majority of noise edges and to uncover large consistent sets of interactions. Our algorithm has been implemented in platform-independent Python code. The software can be freely obtained for academic use by request from the authors.

  4. Whole-genome sequence analysis of the Mycobacterium avium complex and proposal of the transfer of Mycobacterium yongonense to Mycobacterium intracellulare subsp. yongonense subsp. nov.

    PubMed

    Castejon, Maria; Menéndez, Maria Carmen; Comas, Iñaki; Vicente, Ana; Garcia, Maria J

    2018-06-01

    Bacterial whole-genome sequences contain informative features of their evolutionary pathways. Comparison of whole-genome sequences has become the method of choice for classification of prokaryotes, thus allowing the identification of bacteria from an evolutionary perspective, and providing data to resolve some current controversies. Currently, controversy exists about the assignment of members of the Mycobacterium avium complex, as is the case for Mycobacterium yongonense and 'Mycobacterium indicus pranii'. These two mycobacteria, closely related to Mycobacterium intracellulare on the basis of standard phenotypic and single-gene sequence comparisons, were not considered members of that species on the basis of some particular differences displayed by a single strain. Whole-genome sequence comparison procedures, namely the average nucleotide identity and the genome distance, showed that those two mycobacteria should be considered members of the species M. intracellulare. The results were confirmed with other supplementary whole-genome comparison methods. According to the data provided, Mycobacterium yongonense and 'Mycobacterium indicus pranii' should be renamed and included as members of M. intracellulare. This study highlights the problems caused when a novel species is accepted on the basis of a single strain, as was the case for M. yongonense. Based mainly on whole-genome sequence analysis, we conclude that M. yongonense should be reclassified as a subspecies of Mycobacterium intracellulare, as Mycobacterium intracellulare subsp. yongonense, and that 'Mycobacterium indicus pranii' should be classified in the same subspecies as the type strain of Mycobacterium intracellulare, as Mycobacterium intracellulare subsp. intracellulare.

  5. Massively parallel sequencing and the emergence of forensic genomics: Defining the policy and legal issues for law enforcement.

    PubMed

    Scudder, Nathan; McNevin, Dennis; Kelty, Sally F; Walsh, Simon J; Robertson, James

    2018-03-01

    Use of DNA in forensic science will be significantly influenced by new technology in coming years. Massively parallel sequencing and forensic genomics will hasten the broadening of forensic DNA analysis beyond short tandem repeats for identity towards a wider array of genetic markers, in applications as diverse as predictive phenotyping, ancestry assignment, and full mitochondrial genome analysis. With these new applications come a range of legal and policy implications, as forensic science touches on areas as diverse as 'big data', privacy and protected health information. Although these applications have the potential to make a more immediate and decisive forensic intelligence contribution to criminal investigations, they raise policy issues that will require detailed consideration if this potential is to be realised. The purpose of this paper is to identify the scope of the issues that will confront forensic and user communities. Copyright © 2017 The Chartered Society of Forensic Sciences. All rights reserved.

  6. Listeria costaricensis sp. nov.

    PubMed

    Núñez-Montero, Kattia; Leclercq, Alexandre; Moura, Alexandra; Vales, Guillaume; Peraza, Johnny; Pizarro-Cerdá, Javier; Lecuit, Marc

    2018-03-01

    A bacterial strain isolated from a food processing drainage system in Costa Rica fulfilled the criteria as belonging to the genus Listeria, but could not be assigned to any of the known species. Phylogenetic analysis based on the 16S rRNA gene revealed highest sequence similarity with the type strain of Listeria floridensis (98.7 %). Phylogenetic analysis based on Listeria core genomes placed the novel taxon within the Listeria fleischmannii, L. floridensis and Listeria aquatica clade (Listeria sensu lato). Whole-genome sequence analyses based on the average nucleotide blast identity (ANI<80 %) indicated that this isolate belonged to a novel species. Results of pairwise amino acid identity (AAI>70 %) and percentage of conserved proteins (POCP>68 %) with currently known Listeria species, as well as of biochemical characterization, confirmed that the strain constituted a novel species within the genus Listeria. The name Listeria costaricensis sp. nov. is proposed for the novel species, and is represented by the type strain CLIP 2016/00682(T) (=CIP 111400(T)=DSM 105474(T)).

  7. Staphylococcus petrasii subsp. pragensis subsp. nov., occurring in human clinical material.

    PubMed

    Švec, Pavel; De Bel, Annelies; Sedláček, Ivo; Petráš, Petr; Gelbíčová, Tereza; Černohlávková, Jitka; Mašlaňová, Ivana; Cnockaert, Margo; Varbanovová, Ivana; Echahidi, Fedoua; Vandamme, Peter; Pantuček, Roman

    2015-07-01

    Seven coagulase-negative, oxidase-negative and novobiocin-susceptible staphylococci assigned tentatively as Staphylococcus petrasii were investigated in this study in order to elucidate their taxonomic position. All strains were initially shown to form a genetically homogeneous group separated from remaining species of the genus Staphylococcus by using a repetitive sequence-based PCR fingerprinting with the (GTG)5 primer. Phylogenetic analysis based on 16S rRNA gene, hsp60, rpoB, dnaJ, gap and tuf sequences showed that the group is closely related to Staphylococcus petrasii but separated from the three hitherto known subspecies, S. petrasii subsp. petrasii, S. petrasii subsp. croceilyticus and S. petrasii subsp. jettensis. Further investigation using automated ribotyping, MALDI-TOF mass spectrometry, fatty acid methyl ester analysis, DNA-DNA hybridization and extensive biotyping confirmed that the analysed group represents a novel subspecies within S. petrasii, for which the name Staphylococcus petrasii subsp. pragensis subsp. nov. is proposed. The type strain is NRL/St 12/356(T) ( = CCM 8529(T) = LMG 28327(T)).

  8. Bioconductor Workflow for Microbiome Data Analysis: from raw reads to community analyses

    PubMed Central

    Callahan, Ben J.; Sankaran, Kris; Fukuyama, Julia A.; McMurdie, Paul J.; Holmes, Susan P.

    2016-01-01

    High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or OTU composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches use a notion of 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, whether parametric or nonparametric. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests and nonparametric testing using community networks and the ggnetwork package. PMID:27508062
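
    The "common approach" that this abstract argues against, subsampling every sample to the same library size, can be made concrete with a short Python sketch. It is not taken from the paper's R workflow; the function name, the dictionary input, and the fixed seed are assumptions made for illustration.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample one sample's taxon counts, without replacement, to `depth` reads.

    counts: dict mapping taxon name -> read count for a single sample.
    Returns a new dict of counts that sum to `depth`.
    """
    # Expand the counts into one entry per read, then draw `depth` of them.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("depth exceeds the sample's library size")
    rng = random.Random(seed)
    subsampled = {}
    for taxon in rng.sample(reads, depth):
        subsampled[taxon] = subsampled.get(taxon, 0) + 1
    return subsampled


if __name__ == "__main__":
    sample = {"Bacteroides": 500, "Prevotella": 300, "Lactobacillus": 5}
    print(rarefy(sample, depth=100))
```

    Low-count taxa such as the third one in the example can vanish entirely after subsampling, which is one reason discarding reads in this way loses information.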

  9. Genome Structure of the Legume, Lotus japonicus

    PubMed Central

    Sato, Shusei; Nakamura, Yasukazu; Kaneko, Takakazu; Asamizu, Erika; Kato, Tomohiko; Nakao, Mitsuteru; Sasamoto, Shigemi; Watanabe, Akiko; Ono, Akiko; Kawashima, Kumiko; Fujishiro, Tsunakazu; Katoh, Midori; Kohara, Mitsuyo; Kishida, Yoshie; Minami, Chiharu; Nakayama, Shinobu; Nakazaki, Naomi; Shimizu, Yoshimi; Shinpo, Sayaka; Takahashi, Chika; Wada, Tsuyuko; Yamada, Manabu; Ohmido, Nobuko; Hayashi, Makoto; Fukui, Kiichi; Baba, Tomoya; Nakamichi, Tomoko; Mori, Hirotada; Tabata, Satoshi

    2008-01-01

    The legume Lotus japonicus has been widely used as a model system to investigate the genetic background of legume-specific phenomena such as symbiotic nitrogen fixation. Here, we report structural features of the L. japonicus genome. The 315.1-Mb sequences determined in this and previous studies correspond to 67% of the genome (472 Mb), and are likely to cover 91.3% of the gene space. Linkage mapping anchored 130-Mb sequences onto the six linkage groups. A total of 10 951 complete and 19 848 partial structures of protein-encoding genes were assigned to the genome. Comparative analysis of these genes revealed the expansion of several functional domains and gene families that are characteristic of L. japonicus. Synteny analysis detected traces of whole-genome duplication and the presence of synteny blocks with other plant genomes to various degrees. This study provides the first opportunity to look into the complex and unique genetic system of legumes. PMID:18511435

  10. Open Source Tools for Seismicity Analysis

    NASA Astrophysics Data System (ADS)

    Powers, P.

    2010-12-01

    The spatio-temporal analysis of seismicity plays an important role in earthquake forecasting and is integral to research on earthquake interactions and triggering. For instance, the third version of the Uniform California Earthquake Rupture Forecast (UCERF), currently under development, will use Epidemic Type Aftershock Sequences (ETAS) as a model for earthquake triggering. UCERF will be a "living" model and therefore requires robust, tested, and well-documented ETAS algorithms to ensure transparency and reproducibility. Likewise, as earthquake aftershock sequences unfold, real-time access to high quality hypocenter data makes it possible to monitor the temporal variability of statistical properties such as the parameters of the Omori Law and the Gutenberg-Richter b-value. Such statistical properties are valuable as they provide a measure of how much a particular sequence deviates from expected behavior and can be used when assigning probabilities of aftershock occurrence. To address these demands and provide public access to standard methods employed in statistical seismology, we present well-documented, open-source JavaScript and Java software libraries for the on- and off-line analysis of seismicity. The JavaScript classes facilitate web-based asynchronous access to earthquake catalog data and provide a framework for in-browser display, analysis, and manipulation of catalog statistics; implementations of this framework will be made available on the USGS Earthquake Hazards website. The Java classes, in addition to providing tools for seismicity analysis, provide tools for modeling seismicity and generating synthetic catalogs. These tools are extensible and will be released as part of the open-source OpenSHA Commons library.
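
    The two statistical properties named in this abstract have standard textbook forms, sketched here in Python. This is not the API of the libraries described (those are JavaScript and Java); the function names and arguments are hypothetical. The b-value uses the Aki maximum-likelihood estimate, b = log10(e) / (mean(M) - Mc), with the usual half-bin correction for binned magnitudes, and the aftershock rate uses the modified Omori law, n(t) = K / (t + c)^p.

```python
import math


def b_value_mle(magnitudes, completeness_mag, bin_width=0.0):
    """Maximum-likelihood Gutenberg-Richter b-value (Aki estimator).

    Only events at or above the completeness magnitude Mc are used.
    bin_width applies the half-bin correction for binned magnitudes.
    """
    mags = [m for m in magnitudes if m >= completeness_mag]
    if not mags:
        raise ValueError("no events at or above the completeness magnitude")
    mean_mag = sum(mags) / len(mags)
    denom = mean_mag - (completeness_mag - bin_width / 2.0)
    if denom <= 0:
        raise ValueError("mean magnitude must exceed the corrected Mc")
    return math.log10(math.e) / denom


def omori_rate(t, K, c, p):
    """Modified Omori law: aftershock rate at time t after the mainshock."""
    return K / (t + c) ** p


if __name__ == "__main__":
    catalog = [2.0, 2.5, 3.0, 3.5]  # illustrative magnitudes, not real data
    print(b_value_mle(catalog, completeness_mag=2.0))
    print(omori_rate(9.0, K=100.0, c=1.0, p=1.0))
```

    Tracking how these estimates change as a sequence unfolds is the kind of monitoring the abstract describes.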

  11. Transcriptome profiling to discover putative genes associated with paraquat resistance in goosegrass (Eleusine indica L.).

    PubMed

    An, Jing; Shen, Xuefeng; Ma, Qibin; Yang, Cunyi; Liu, Simin; Chen, Yong

    2014-01-01

    Goosegrass (Eleusine indica L.), a serious annual weed in the world, has evolved resistance to several herbicides including paraquat, a non-selective herbicide. The mechanism of paraquat resistance in weeds is only partially understood. To further study the molecular mechanism underlying paraquat resistance in goosegrass, we performed transcriptome analysis of susceptible and resistant biotypes of goosegrass with or without paraquat treatment. The RNA-seq libraries generated 194,716,560 valid reads with an average length of 91.29 bp. De novo assembly analysis produced 158,461 transcripts with an average length of 1153.74 bp and 100,742 unigenes with an average length of 712.79 bp. Among these, 25,926 unigenes were assigned to 65 GO terms that contained three main categories. A total of 13,809 unigenes with 1,208 enzyme commission numbers were assigned to 314 predicted KEGG metabolic pathways, and 12,719 unigenes were categorized into 25 KOG classifications. Furthermore, our results revealed that 53 genes related to reactive oxygen species scavenging, 10 genes related to polyamines and 18 genes related to transport were differentially expressed in paraquat treatment experiments. The genes related to polyamines and transport are likely potential candidate genes that could be further investigated to confirm their roles in paraquat resistance of goosegrass. This is the first large-scale transcriptome sequencing of E. indica using the Illumina platform. Potential genes involved in paraquat resistance were identified from the assembled sequences. The transcriptome data may serve as a reference for further analysis of gene expression and functional genomics studies, and will facilitate the study of paraquat resistance at the molecular level in goosegrass.

  12. Transcriptome Profiling to Discover Putative Genes Associated with Paraquat Resistance in Goosegrass (Eleusine indica L.)

    PubMed Central

    An, Jing; Shen, Xuefeng; Ma, Qibin; Yang, Cunyi; Liu, Simin; Chen, Yong

    2014-01-01

    Background: Goosegrass (Eleusine indica L.), a serious annual weed in the world, has evolved resistance to several herbicides including paraquat, a non-selective herbicide. The mechanism of paraquat resistance in weeds is only partially understood. To further study the molecular mechanism underlying paraquat resistance in goosegrass, we performed transcriptome analysis of susceptible and resistant biotypes of goosegrass with or without paraquat treatment. Results: The RNA-seq libraries generated 194,716,560 valid reads with an average length of 91.29 bp. De novo assembly analysis produced 158,461 transcripts with an average length of 1153.74 bp and 100,742 unigenes with an average length of 712.79 bp. Among these, 25,926 unigenes were assigned to 65 GO terms that contained three main categories. A total of 13,809 unigenes with 1,208 enzyme commission numbers were assigned to 314 predicted KEGG metabolic pathways, and 12,719 unigenes were categorized into 25 KOG classifications. Furthermore, our results revealed that 53 genes related to reactive oxygen species scavenging, 10 genes related to polyamines and 18 genes related to transport were differentially expressed in paraquat treatment experiments. The genes related to polyamines and transport are likely potential candidate genes that could be further investigated to confirm their roles in paraquat resistance of goosegrass. Conclusion: This is the first large-scale transcriptome sequencing of E. indica using the Illumina platform. Potential genes involved in paraquat resistance were identified from the assembled sequences. The transcriptome data may serve as a reference for further analysis of gene expression and functional genomics studies, and will facilitate the study of paraquat resistance at the molecular level in goosegrass. PMID:24927422

  13. Candida sirachaensis sp. nov. and Candida sakaeoensis sp. nov. two anamorphic yeast species from phylloplane in Thailand.

    PubMed

    Limtong, Savitree; Koowadjanakul, Nampueng; Jindamorakot, Sasitorn; Yongmanitchai, Wichien; Nakase, Takashi

    2012-08-01

    Three strains (LM008(T), LM068 and LM078(T)), representing two novel yeast species, were isolated from the phylloplane of three plant species by an enrichment technique. On the basis of morphological, biochemical, physiological and chemotaxonomic characteristics, and the sequence analysis of the D1/D2 domain of the large subunit rRNA gene and the internal spacer region, the three strains were assigned to two novel Candida species. Strain LM008(T) was assigned to Candida sirachaensis sp. nov. (type strain LM008(T) = BCC 47628(T) = NBRC 108605(T) = CBS 12094(T)) in the Starmerella clade. Two strains (LM068 and LM078(T)) represent a single species in the Lodderomyces-Spathaspora clade for which the name Candida sakaeoensis sp. nov. is proposed with the type strain LM078(T) = BCC 47632(T) = NBRC 108895(T) = CBS 12318(T).

  14. A Simultaneous Approach to Optimizing Treatment Assignments with Mastery Scores. Research Report 89-5.

    ERIC Educational Resources Information Center

    Vos, Hans J.

    An approach to simultaneous optimization of assignments of subjects to treatments followed by an end-of-mastery test is presented using the framework of Bayesian decision theory. Focus is on demonstrating how rules for the simultaneous optimization of sequences of decisions can be found. The main advantages of the simultaneous approach, compared…

  15. Near-Infrared Diode Laser Spectroscopy of FeC in the 0.8-μm Region: A Simultaneous Analysis of the X3Δ i and [3.8] 1Δ States

    NASA Astrophysics Data System (ADS)

    Fujitake, Masaharu; Toba, Aki; Mori, Masaya; Miyazawa, Fuyuki; Ohashi, Nobukimi; Aiuchi, Kosuke; Shibuya, Kazuhiko

    2001-08-01

    The near-infrared absorption spectra of the FeC radical have been measured by diode laser spectroscopy in the region from 11 810 to 13 200 cm-1. Twenty-eight new rotationally resolved vibronic bands of the 56Fe12C major isotopic species were observed as well as two known bands. Five of them were assigned as the v′=v″=0-4 sequence bands in the [13.1]3Φ4←X3Δ3 system, six as the v′=v″=0-5 sequences in the [13.18]Ω=3←X3Δ2 system, and five as the v′=v″=0-4 sequences in the [13.5]Ω=2←X3Δ1 system. The X3Δ1 state of FeC is reported for the first time, but we found that the rotational constant for this state of 56FeC is very close to that for the X3Δ3 state of 54FeC. Transitions of the 54FeC minor isotopic species for the two known bands were thus reinvestigated. For the unclassified 11 new bands, the lower states were assigned as the v=0-3 states in X3Δi. The energy difference between the Ω=2 and 3 sublevels in the X3Δi state could be determined to be 329.8059±0.0005 cm-1 as a result of the analysis of two of the new bands, [12.32]←X3Δ3 v=0 and [12.32]←X3Δ2 v=0, observed near 12 317 cm-1 and 11 987 cm-1, respectively. Stimulated emission pumping spectra were also recorded near 17 600 cm-1 to examine the low-lying electronic state which had been observed 3460 cm-1 above the X3Δ2 state (K. Aiuchi, K. Tsuji and K. Shibuya, Chem. Phys. Lett. 309, 229-233 (1999)), and it was assigned as the [3.8]1Δ state arising from the same configuration as that of X3Δi (…3π41δ39σ1). In addition, the v=0-0 and 1-1 bands of the [16.2]Ω=3←[3.8]1Δ system were observed by near-infrared diode laser spectroscopy. The lower state combination differences were calculated for the possible sets of the energy levels in the X3Δi and [3.8]1Δ states with v=0-5 by using the present result and the other spectroscopic data reported. A simultaneous analysis was carried out by using the combination differences to determine the molecular constants for the X3Δi and [3.8]1Δ states as well as the off-diagonal spin-orbit interaction terms. The [13.18]Ω=3 state and the [13.5]Ω=2 state were concluded to be the other two spin sublevels for the [13.1]3Φ4 state, and the [16.2]Ω=3 state was assigned as an isoconfigurational 1Φ state.

  16. Characterization by Suppression Subtractive Hybridization of Transcripts That Are Differentially Expressed in Leaves of Anthracnose-Resistant Ramie Cultivar.

    PubMed

    Xuxia, Wang; Jie, Chen; Bo, Wang; Lijun, Liu; Hui, Jiang; Diluo, Tang; Dingxiang, Peng

    2012-01-01

    For the purpose of screening putative anthracnose resistance-related genes of ramie ( Boehmeria nivea L. Gaud), a cDNA library was constructed by suppression subtractive hybridization using anthracnose-resistant cultivar Huazhu no. 4. The cDNAs from Huazhu no. 4, which were infected with Colletotrichum gloeosporioides , were used as the tester and cDNAs from uninfected Huazhu no. 4 as the driver. Sequencing analysis and homology searching showed that these clones represented 132 single genes, which were assigned to functional categories, including 14 putative cellular functions, according to categories established for Arabidopsis . These 132 genes included 35 disease resistance and stress tolerance-related genes including putative heat-shock protein 90, metallothionein, PR-1.2 protein, catalase gene, WRKY family genes, and proteinase inhibitor-like protein. Partial disease-related genes were further analyzed by reverse transcription PCR and RNA gel blot. These expressed sequence tags are the first anthracnose resistance-related expressed sequence tags reported in ramie.

  17. Genotyping of ancient Mycobacterium tuberculosis strains reveals historic genetic diversity.

    PubMed

    Müller, Romy; Roberts, Charlotte A; Brown, Terence A

    2014-04-22

    The evolutionary history of the Mycobacterium tuberculosis complex (MTBC) has previously been studied by analysis of sequence diversity in extant strains, but not addressed by direct examination of strain genotypes in archaeological remains. Here, we use ancient DNA sequencing to type 11 single nucleotide polymorphisms and two large sequence polymorphisms in the MTBC strains present in 10 archaeological samples from skeletons from Britain and Europe dating to the second-nineteenth centuries AD. The results enable us to assign the strains to groupings and lineages recognized in the extant MTBC. We show that at least during the eighteenth-nineteenth centuries AD, strains of M. tuberculosis belonging to different genetic groups were present in Britain at the same time, possibly even at a single location, and we present evidence for a mixed infection in at least one individual. Our study shows that ancient DNA typing applied to multiple samples can provide sufficiently detailed information to contribute to both archaeological and evolutionary knowledge of the history of tuberculosis.

  18. MS/MS Digital Readout: Analysis of Binary Information Encoded in the Monomer Sequences of Poly(triazole amide)s.

    PubMed

    Amalian, Jean-Arthur; Trinh, Thanh Tam; Lutz, Jean-François; Charles, Laurence

    2016-04-05

    Tandem mass spectrometry was evaluated as a reliable sequencing methodology to read codes encrypted in monodisperse sequence-coded oligo(triazole amide)s. The studied oligomers were composed of monomers containing a triazole ring, a short ethylene oxide segment, and an amide group as well as a short alkyl chain (propyl or isobutyl) which defined the 0/1 molecular binary code. Using electrospray ionization, oligo(triazole amide)s were best ionized as protonated molecules and were observed to adopt a single charge state, suggesting that adducted protons were located on every other monomer unit. Upon collisional activation, cleavages of the amide bond and of one ether bond were observed to proceed in each monomer, yielding two sets of complementary product ions. Distribution of protons over the precursor structure was found to remain unchanged upon activation, allowing charge state to be anticipated for product ions in the four series and hence facilitating their assignment for a straightforward characterization of any encoded oligo(triazole amide)s.

  19. Genomic organization, complete sequence, and chromosomal location of the gene for human eotaxin (SCYA11), an eosinophil-specific CC chemokine

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Garcia-Zepeda, E.A.; Sarafi, M.N.; Luster, A.D.

    1997-05-01

    Eotaxin is a CC chemokine that is a specific chemoattractant for eosinophils and is implicated in the pathogenesis of eosinophilic inflammatory diseases, such as asthma. We describe the genomic organization, complete sequence, including 1354 bp 5′ of the RNA initiation site, and chromosomal localization of the human eotaxin gene. Fluorescence in situ hybridization analysis localized eotaxin to human chromosome 17, in the region q21.1-q21.2, and the human gene name SCYA11 was assigned. We also present the 5′ flanking sequence of the mouse eotaxin gene and have identified several regulatory elements that are conserved between the murine and the human promoters. In particular, the presence of elements such as NF-κB, interferon-γ response element, and glucocorticoid response element may explain the observed regulation of the eotaxin gene by cytokines and glucocorticoids. 17 refs., 4 figs., 1 tab.

  20. The genome of woodland strawberry (Fragaria vesca)

    PubMed Central

    Shulaev, Vladimir; Sargent, Daniel J; Crowhurst, Ross N; Mockler, Todd C; Folkerts, Otto; Delcher, Arthur L; Jaiswal, Pankaj; Mockaitis, Keithanne; Liston, Aaron; Mane, Shrinivasrao P; Burns, Paul; Davis, Thomas M; Slovin, Janet P; Bassil, Nahla; Hellens, Roger P; Evans, Clive; Harkins, Tim; Kodira, Chinnappa; Desany, Brian; Crasta, Oswald R; Jensen, Roderick V; Allan, Andrew C; Michael, Todd P; Setubal, Joao Carlos; Celton, Jean-Marc; Rees, D Jasper G; Williams, Kelly P; Holt, Sarah H; Ruiz Rojas, Juan Jairo; Chatterjee, Mithu; Liu, Bo; Silva, Herman; Meisel, Lee; Adato, Avital; Filichkin, Sergei A; Troggio, Michela; Viola, Roberto; Ashman, Tia-Lynn; Wang, Hao; Dharmawardhana, Palitha; Elser, Justin; Raja, Rajani; Priest, Henry D; Bryant, Douglas W; Fox, Samuel E; Givan, Scott A; Wilhelm, Larry J; Naithani, Sushma; Christoffels, Alan; Salama, David Y; Carter, Jade; Girona, Elena Lopez; Zdepski, Anna; Wang, Wenqin; Kerstetter, Randall A; Schwab, Wilfried; Korban, Schuyler S; Davik, Jahn; Monfort, Amparo; Denoyes-Rothan, Beatrice; Arus, Pere; Mittler, Ron; Flinn, Barry; Aharoni, Asaph; Bennetzen, Jeffrey L; Salzberg, Steven L; Dickerman, Allan W; Velasco, Riccardo; Borodovsky, Mark; Veilleux, Richard E; Folta, Kevin M

    2012-01-01

    The woodland strawberry, Fragaria vesca (2n = 2x = 14), is a versatile experimental plant system. This diminutive herbaceous perennial has a small genome (240 Mb), is amenable to genetic transformation and shares substantial sequence identity with the cultivated strawberry (Fragaria × ananassa) and other economically important rosaceous plants. Here we report the draft F. vesca genome, which was sequenced to ×39 coverage using second-generation technology, assembled de novo and then anchored to the genetic linkage map into seven pseudochromosomes. This diploid strawberry sequence lacks the large genome duplications seen in other rosids. Gene prediction modeling identified 34,809 genes, with most being supported by transcriptome mapping. Genes critical to valuable horticultural traits including flavor, nutritional value and flowering time were identified. Macrosyntenic relationships between Fragaria and Prunus predict a hypothetical ancestral Rosaceae genome that had nine chromosomes. New phylogenetic analysis of 154 protein-coding genes suggests that assignment of Populus to Malvidae, rather than Fabidae, is warranted. PMID:21186353

  1. The application of DNA sequence data for the identification of benthic nematodes from the North Sea

    NASA Astrophysics Data System (ADS)

    Vogt, Philipp; Miljutina, Maria; Raupach, Michael J.

    2014-12-01

    Nematodes or roundworms represent one of the most diverse and dominant taxa in marine benthic habitats. Whereas a morphological identification of many species is challenging, the application of molecular markers represents a promising approach for species discrimination and identification. In this study, we used an integrative taxonomic approach, combining both molecular and morphological methods, to characterize nematodes of distinct sex and ontogenetic stages from three sampling sites of the North Sea. Morphospecies were discriminated after first visual determination, followed by a molecular analysis of the nuclear 28S rDNA: D2-D3 marker. By linking each sequence to a morphological voucher, discordant morphological identification was subjected to a so-called reverse taxonomic approach. Molecular operational taxonomic units (MOTUs) and morphospecies were compared for all of the three sampling sites to assess concordance of methodology. In total, 32 MOTUs and 26 morphospecies were assigned, of which 12 taxa were identified as described species. Both approaches showed high concordance in taxon assignment (84.4 %) except for a cluster comprising various Sabatieria species. Our study revealed the high potential of the analyzed fragment as a useful molecular marker for the identification of the North Sea nematodes and highlighted the applicability of this combined taxonomic approach in general.

  2. Consensus proposals for classification of the family Hepeviridae.

    PubMed

    Smith, Donald B; Simmonds, Peter; Jameel, Shahid; Emerson, Suzanne U; Harrison, Tim J; Meng, Xiang-Jin; Okamoto, Hiroaki; Van der Poel, Wim H M; Purdy, Michael A

    2014-10-01

    The family Hepeviridae consists of positive-stranded RNA viruses that infect a wide range of mammalian species, as well as chickens and trout. A subset of these viruses infects humans and can cause a self-limiting acute hepatitis that may become chronic in immunosuppressed individuals. Current published descriptions of the taxonomical divisions within the family Hepeviridae are contradictory in relation to the assignment of species and genotypes. Through analysis of existing sequence information, we propose a taxonomic scheme in which the family is divided into the genera Orthohepevirus (all mammalian and avian hepatitis E virus (HEV) isolates) and Piscihepevirus (cutthroat trout virus). Species within the genus Orthohepevirus are designated Orthohepevirus A (isolates from human, pig, wild boar, deer, mongoose, rabbit and camel), Orthohepevirus B (isolates from chicken), Orthohepevirus C (isolates from rat, greater bandicoot, Asian musk shrew, ferret and mink) and Orthohepevirus D (isolates from bat). Proposals are also made for the designation of genotypes within the human and rat HEVs. This hierarchical system is congruent with hepevirus phylogeny, and the three classification levels (genus, species and genotype) are consistent with, and reflect discontinuities in the ranges of pairwise distances between amino acid sequences. Adoption of this system would include the avoidance of host names in taxonomic identifiers and provide a logical framework for the assignment of novel variants.

  3. Molecular detection and species identification of Alexandrium (Dinophyceae) causing harmful algal blooms along the Chilean coastline

    PubMed Central

    Jedlicki, Ana; Fernández, Gonzalo; Astorga, Marcela; Oyarzún, Pablo; Toro, Jorge E.; Navarro, Jorge M.; Martínez, Víctor

    2012-01-01

    Background and aims: On the basis of morphological evidence, the species involved in South American Pacific coast harmful algal blooms (HABs) has been traditionally recognized as Alexandrium catenella (Dinophyceae). However, these observations have not been confirmed using evidence based on genomic sequence variability. Our principal objective was to accurately determine the species of Alexandrium involved in local HABs in order to implement a real-time polymerase chain reaction (PCR) assay for its rapid and easy detection on filter-feeding shellfish, such as mussels. Methodology: For species-specific determination, the intergenic spacer 1 (ITS1), 5.8S subunit, ITS2 and the hypervariable genomic regions D1–D5 of the large ribosomal subunit of local strains were sequenced and compared with two data sets of other Alexandrium sequences. Species-specific primers were used to amplify signature sequences within the genomic DNA of the studied species by conventional and real-time PCR. Principal results: Phylogenetic analysis determined that the Chilean strain falls into Group I of the tamarensis complex. Our results support the allocation of the Chilean Alexandrium species as a toxic Alexandrium tamarense rather than A. catenella, as currently defined. Once local species were determined to belong to Group I of the tamarensis complex, a highly sensitive and accurate real-time PCR procedure was developed to detect dinoflagellate presence in Mytilus spp. (Bivalvia) samples after being fed (challenged) in vitro with the Chilean Alexandrium strain. The results show that real-time PCR is useful to detect Alexandrium intake in filter-feeding molluscs. Conclusions: It has been shown that the classification of local Alexandrium using morphological evidence is not very accurate. Molecular methods enabled the HAB dinoflagellate species of the Chilean coast to be assigned as A. tamarense rather than A. catenella. Real-time PCR analysis based on A. tamarense primers allowed the detection of dinoflagellate DNA in Mytilus spp. samples exposed to this alga. Through the specific assignment of dinoflagellate species involved in HABs, more reliable preventive policies can be implemented. PMID:23259043

  4. Micromonospora schwarzwaldensis sp. nov., a producer of telomycin, isolated from soil.

    PubMed

    Vela Gurovic, Maria Soledad; Müller, Sebastian; Domin, Nicole; Seccareccia, Ivana; Nietzsche, Sandor; Martin, Karin; Nett, Markus

    2013-10-01

    A Gram-stain-positive, spore-forming actinomycete strain (HKI0641(T)) was isolated from a soil sample collected in the Black Forest, Germany. During screening for antimicrobial natural products this bacterium was identified as a producer of the antibiotic telomycin. Morphological characteristics and chemotaxonomic data indicated that the strain belonged to the genus Micromonospora. The peptidoglycan of strain HKI0641(T) contained meso-diaminopimelic acid, and the fatty acid profile consisted predominantly of anteiso-C15 : 0, iso-C15 : 0, iso-C16 : 0 and C16 : 0. MK-10(H4), MK-10(H2) and MK-10 were identified as the major menaquinones. To determine the taxonomic positioning of strain HKI0641(T), we computed a binary tanglegram of two rooted phylogenetic trees that were based upon 16S rRNA and gyrB gene sequences. The comparative analysis of the two common classification methods strongly supported the phylogenetic affiliation with the genus Micromonospora, but it also revealed discrepancies in the assignment at the level of the genomic species. 16S rRNA gene sequence analysis identified Micromonospora coxensis DSM 45161(T) (99.1 % sequence similarity) and Micromonospora marina DSM 45555(T) (99.0 %) as the nearest taxonomic neighbours, whereas the gyrB sequence of strain HKI0641(T) indicated a closer relationship to Micromonospora aurantiaca DSM 43813(T) (95.1 %). By means of DNA-DNA hybridization experiments, it was possible to resolve this issue and to clearly differentiate strain HKI0641(T) from other species of the genus Micromonospora. The type strains of the aforementioned species of the genus Micromonospora could be further distinguished from strain HKI0641(T) by several phenotypic properties, such as colony colour, NaCl tolerance and the utilization of carbon sources. The isolate was therefore assigned to a novel species of the genus Micromonospora, for which the name Micromonospora schwarzwaldensis sp. nov. is proposed. The type strain is HKI0641(T) (= DSM 45708(T) = CIP 110415(T)).

  5. Characterization of mango (Mangifera indica L.) transcriptome and chloroplast genome.

    PubMed

    Azim, M Kamran; Khan, Ishtaiq A; Zhang, Yong

    2014-05-01

    We characterized the mango leaf transcriptome and chloroplast genome using next generation DNA sequencing. The RNA-seq output of the mango transcriptome generated >12 million reads (total nucleotides sequenced >1 Gb). De novo transcriptome assembly generated 30,509 unigenes with lengths in the range of 300 to ≥3,000 nt and 67× depth of coverage. Blast searching against nonredundant nucleotide databases and several Viridiplantae genomic datasets annotated 24,593 mango unigenes (80% of total) and identified Citrus sinensis as the closest neighbor of mango with 9,141 (37%) matched sequences. The annotation with gene ontology and Clusters of Orthologous Group terms categorized unigene sequences into 57 and 25 classes, respectively. More than 13,500 unigenes were assigned to 293 KEGG pathways. Besides major plant biology related pathways, KEGG based gene annotation pointed out active presence of an array of biochemical pathways involved in (a) biosynthesis of bioactive flavonoids, flavones and flavonols, (b) biosynthesis of terpenoids and lignins and (c) plant hormone signal transduction. The mango transcriptome sequences revealed 235 proteases belonging to five catalytic classes of proteolytic enzymes. The draft genome of mango chloroplast (cp) was obtained by a combination of Sanger and next generation sequencing. The draft mango cp genome size is 151,173 bp with a pair of inverted repeats of 27,093 bp separated by small and large single copy regions, respectively. Out of 139 genes in the mango cp genome, 91 were found to be protein coding. Sequence analysis revealed the cp genome of C. sinensis as the closest neighbor of mango. We found 51 short repeats in the mango cp genome that are supposed to be associated with extensive rearrangements. This is the first report of transcriptome and chloroplast genome analysis of any Anacardiaceae family member.

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Curry, J J; Gallagher, D W; Modarres, M

    Appendices are presented concerning isolation condenser makeup; vapor suppression system; station air system; reactor building closed cooling water system; turbine building secondary closed water system; service water system; emergency service water system; fire protection system; emergency ac power; dc power system; event probability estimation; methodology of accident sequence quantification; and assignment of dominant sequences to release categories.

  7. Screening and Characterization of RAPD Markers in Viscerotropic Leishmania Parasites

    PubMed Central

    Mkada–Driss, Imen; Talbi, Chiraz; Guerbouj, Souheila; Driss, Mehdi; Elamine, Elwaleed M.; Cupolillo, Elisa; Mukhtar, Moawia M.; Guizani, Ikram

    2014-01-01

    Visceral leishmaniasis (VL) is mainly due to the Leishmania donovani complex. VL is endemic in many countries worldwide including East Africa and the Mediterranean region where the epidemiology is complex. The taxonomy of these pathogens is controversial, but there is a correlation between their genetic diversity and geographical origin. Despite the steady increase in genome knowledge, RAPD is still a useful approach to identify and characterize novel DNA markers. Our aim was to identify and characterize polymorphic DNA markers in VL Leishmania parasites in diverse geographic regions using RAPD in order to constitute a pool of PCR targets having the potential to differentiate among the VL parasites. 100 different oligonucleotide decamers having arbitrary DNA sequences were screened for reproducible amplification and a selection of 28 was used to amplify DNA from 12 L. donovani, L. archibaldi and L. infantum strains having diverse origins. A total of 155 bands were amplified of which 60.65% appeared polymorphic. 7 out of 28 primers provided monomorphic patterns. Phenetic analysis allowed clustering the parasites according to their geographical origin. Differentially amplified bands were selected; among them, 22 RAPD products were successfully cloned and sequenced. Bioinformatic analysis allowed mapping of the markers and analysis of sequences and priming sites. This study was complemented with Southern blot to confirm assignment of markers to the kDNA. The bioinformatic analysis identified 16 nuclear and 3 minicircle markers. Analysis of these markers highlighted polymorphisms at RAPD priming sites with mainly 5′ end transversions, and presence of inter- and intra-taxonomic complex sequence and microsatellite variations; a bias in transitions over transversions and indels between the different sequences compared is observed, which is however less marked between L. infantum and L. donovani. The study delivers a pool of well-documented polymorphic DNA markers, to develop molecular diagnostic assays to characterize and differentiate VL-causing agents. PMID:25313833
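    The transition/transversion/indel comparison mentioned in this abstract can be illustrated with a minimal sketch. It assumes two sequences that have already been aligned to equal length with "-" marking gaps; the function names and the example sequences are hypothetical and are not taken from the study.

```python
from collections import Counter

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
VALID = PURINES | PYRIMIDINES


def classify(a: str, b: str) -> str:
    """Classify one aligned column as match, transition, transversion, indel or other."""
    a, b = a.upper(), b.upper()
    if "-" in (a, b):
        # A gap opposite a base is an indel; gap opposite gap carries no information.
        return "other" if a == b else "indel"
    if a not in VALID or b not in VALID:
        return "other"  # ambiguous symbols such as N are not counted
    if a == b:
        return "match"
    # Purine<->purine or pyrimidine<->pyrimidine is a transition.
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_substitutions(seq_a: str, seq_b: str) -> Counter:
    """Tally column classes over two pre-aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return Counter(classify(x, y) for x, y in zip(seq_a, seq_b))


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    print(count_substitutions("ACGT-A", "GCTTAA"))
```

    A ratio of the "transition" count to the "transversion" count then gives a simple measure of the bias described above.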

  8. MILP model for integrated balancing and sequencing mixed-model two-sided assembly line with variable launching interval and assignment restrictions

    NASA Astrophysics Data System (ADS)

    Azmi, N. I. L. Mohd; Ahmad, R.; Zainuddin, Z. M.

    2017-09-01

    This research explores the Mixed-Model Two-Sided Assembly Line (MMTSAL). There are two interrelated problems in MMTSAL: line balancing and model sequencing. In previous studies, many researchers considered these problems separately and only a few studied them simultaneously for a one-sided line. In this study, however, the two problems are solved simultaneously to obtain a more efficient solution. A Mixed Integer Linear Programming (MILP) model with the objectives of minimizing total utility work and idle time is formulated by considering a variable launching interval and assignment restriction constraints. The problem is analysed using small-size test cases to validate the integrated model. Numerical experiments were conducted using the General Algebraic Modelling System (GAMS) with the CPLEX solver. Experimental results indicate that integrating the problems of model sequencing and line balancing helps to minimise the proposed objective functions.

  9. Genome-wide analysis of the WRKY gene family in physic nut (Jatropha curcas L.).

    PubMed

    Xiong, Wangdan; Xu, Xueqin; Zhang, Lin; Wu, Pingzhi; Chen, Yaping; Li, Meiru; Jiang, Huawu; Wu, Guojiang

    2013-07-25

    The WRKY proteins, which contain highly conserved WRKYGQK amino acid sequences and zinc-finger-like motifs, constitute a large family of transcription factors in plants. They participate in diverse physiological and developmental processes. WRKY genes have been identified and characterized in a number of plant species. We identified a total of 58 WRKY genes (JcWRKY) in the genome of the physic nut (Jatropha curcas L.). On the basis of their conserved WRKY domain sequences, all of the JcWRKY proteins could be assigned to one of the previously defined groups, I-III. Phylogenetic analysis of JcWRKY genes with Arabidopsis and rice WRKY genes, and separately with castor bean WRKY genes, revealed no evidence of recent gene duplication in the JcWRKY gene family. Transcript abundance of JcWRKY gene products was tested in different tissues under normal growth conditions. In addition, 47 WRKY genes responded to at least one abiotic stress (drought, salinity, phosphate starvation and nitrogen starvation) in individual tissues (leaf, root and/or shoot cortex). Our study provides a useful reference data set as the basis for cloning and functional analysis of physic nut WRKY genes.

  10. Lactobacillus crustorum sp. nov., isolated from two traditional Belgian wheat sourdoughs.

    PubMed

    Scheirlinck, Ilse; Van der Meulen, Roel; Van Schoor, Ann; Huys, Geert; Vandamme, Peter; De Vuyst, Luc; Vancanneyt, Marc

    2007-07-01

    A polyphasic taxonomic study of the lactic acid bacteria (LAB) population in three traditional Belgian sourdoughs, sampled between 2002 and 2004, revealed a group of isolates that could not be assigned to any recognized LAB species. Initially, sourdough isolates were screened by means of (GTG)(5)-PCR fingerprinting. Four isolates displaying unique (GTG)(5)-PCR patterns were further investigated by means of phenylalanyl-tRNA synthase (pheS) gene sequence analysis and represented a bifurcated branch that could not be allocated to any LAB species present in the in-house pheS database. Their phylogenetic affiliation was determined using 16S rRNA gene sequence analysis and showed that the four sourdough isolates belong to the Lactobacillus plantarum group with Lactobacillus mindensis, Lactobacillus farciminis and Lactobacillus nantensis as closest relatives. Further genotypic and phenotypic studies, including whole-cell protein analysis (SDS-PAGE), amplified fragment length polymorphism (AFLP) fingerprinting, DNA-DNA hybridization, DNA G+C content analysis, growth characteristics and biochemical features, demonstrated that the new sourdough isolates represent a novel Lactobacillus species for which the name Lactobacillus crustorum sp. nov. is proposed. The type strain of the new species is LMG 23699(T) (=CCUG 53174(T)).

  11. Whole genome analysis of halotolerant and alkalotolerant plant growth-promoting rhizobacterium Klebsiella sp. D5A

    NASA Astrophysics Data System (ADS)

    Liu, Wuxing; Wang, Qingling; Hou, Jinyu; Tu, Chen; Luo, Yongming; Christie, Peter

    2016-05-01

    This research undertook the systematic analysis of the Klebsiella sp. D5A genome and identification of genes that contribute to plant growth-promoting (PGP) traits, especially genes related to salt tolerance and wide pH adaptability. The genome sequence of isolate D5A was obtained using an Illumina HiSeq 2000 sequencing system with average coverages of 174.7× and 200.1× using the paired-end and mate-pair sequencing, respectively. Predicted and annotated gene sequences were analyzed for similarity with the Kyoto Encyclopedia of Genes and Genomes (KEGG) enzyme database followed by assignment of each gene into the KEGG pathway charts. The results show that the Klebsiella sp. D5A genome has a total of 5,540,009 bp with 57.15% G + C content. PGP conferring genes such as indole-3-acetic acid (IAA) biosynthesis, phosphate solubilization, siderophore production, acetoin and 2,3-butanediol synthesis, and N2 fixation were determined. Moreover, genes putatively responsible for resistance to high salinity including glycine-betaine synthesis, trehalose synthesis and a number of osmoregulation receptors and transport systems were also observed in the D5A genome together with numerous genes that contribute to pH homeostasis. These genes reveal the genetic adaptation of D5A to versatile environmental conditions and the effectiveness of the isolate to serve as a plant growth stimulator.
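    As an illustration of the G + C content figure quoted in this abstract, a minimal sketch of the usual calculation follows. The function name and the toy sequence are hypothetical; the assumption that ambiguous symbols such as N are excluded from the denominator is ours and is not stated in the abstract.

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at  # ambiguous symbols (e.g. N) are ignored by assumption
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / total


if __name__ == "__main__":
    # Toy sequence for illustration only, not genome data.
    print(f"{gc_content('ATGCGCNNTA'):.2f} % G + C")
```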

  12. Consolidation of glycosyl hydrolase family 30 : a dual domain 4/7 hydrolase family consisting of two structurally distinct groups

    Treesearch

    Franz J. St John; Javier M. Gonzalez; Edwin Pozharski

    2010-01-01

    In this work glycosyl hydrolase (GH) family 30 (GH30) is analyzed and shown to consist of its currently classified member sequences as well as several homologous sequence groups currently assigned within family GH5. A large scale amino acid sequence alignment and a phylogenetic tree were generated and GH30 groups and subgroups were designated. A partial rearrangement...

  13. DNABIT Compress – Genome compression algorithm

    PubMed Central

    Rajarajeswari, Pothuraju; Apparao, Allam

    2011-01-01

    Data compression is concerned with how information is organized in data. Efficient storage means removal of redundancy from the data being stored in the DNA molecule. Data compression algorithms remove redundancy and are used to understand biologically important molecules. We present a compression algorithm, “DNABIT Compress”, for DNA sequences, based on a novel scheme that assigns binary bits to smaller segments of DNA bases in order to compress both repetitive and non-repetitive DNA sequences. Our proposed algorithm achieves the best compression ratio for DNA sequences of larger genomes. Significantly better compression results show that the “DNABIT Compress” algorithm performs best among the compared compression algorithms. While achieving the best compression ratios for DNA sequences (genomes), our new DNABIT Compress algorithm significantly improves on the running time of all previous DNA compression programs. Assigning binary bits (a unique bit code) to exact-repeat and reverse-repeat fragments of a DNA sequence is also a concept introduced in this algorithm for the first time in DNA compression. The proposed algorithm achieves a compression ratio as low as 1.58 bits/base, whereas the best existing methods could not achieve a ratio below 1.72 bits/base. PMID:21383923
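
    For context on the bits-per-base figures, a fixed two-bit code per base gives a baseline of exactly 2 bits/base, which repeat-aware schemes try to beat. The sketch below shows only that baseline packing. It is not the DNABIT Compress bit code, which the abstract does not specify, and the code table and function names are assumptions for illustration.

```python
# Hypothetical fixed 2-bit code table (baseline only, not DNABIT Compress).
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack an ACGT string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, n_bases: int) -> str:
    """Recover the first n_bases bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:n_bases])


seq = "ACGTTGCAAC"
packed = pack(seq)
print(len(packed), unpack(packed, len(seq)) == seq)  # 3 True
```

    The sequence length has to be stored alongside the packed bytes, because padding in the final byte is otherwise indistinguishable from trailing A bases.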

  14. Epidemiological tracking and population assignment of the non-clonal bacterium, Burkholderia pseudomallei.

    PubMed

    Dale, Julia; Price, Erin P; Hornstra, Heidie; Busch, Joseph D; Mayo, Mark; Godoy, Daniel; Wuthiekanun, Vanaporn; Baker, Anthony; Foster, Jeffrey T; Wagner, David M; Tuanyok, Apichai; Warner, Jeffrey; Spratt, Brian G; Peacock, Sharon J; Currie, Bart J; Keim, Paul; Pearson, Talima

    2011-12-01

    Rapid assignment of bacterial pathogens into predefined populations is an important first step for epidemiological tracking. For clonal species, a single allele can theoretically define a population. For non-clonal species such as Burkholderia pseudomallei, however, shared allelic states between distantly related isolates make it more difficult to identify population defining characteristics. Two distinct B. pseudomallei populations have been previously identified using multilocus sequence typing (MLST). These populations correlate with the major foci of endemicity (Australia and Southeast Asia). Here, we use multiple Bayesian approaches to evaluate the compositional robustness of these populations, and provide assignment results for MLST sequence types (STs). Our goal was to provide a reference for assigning STs to an established population without the need for further computational analyses. We also provide allele frequency results for each population to enable estimation of population assignment even when novel STs are discovered. The ability for humans and potentially contaminated goods to move rapidly across the globe complicates the task of identifying the source of an infection or outbreak. Population genetic dynamics of B. pseudomallei are particularly complicated relative to other bacterial pathogens, but the work here provides the ability for broad scale population assignment. As there is currently no independent empirical measure of successful population assignment, we provide comprehensive analytical details of our comparisons to enable the reader to evaluate the robustness of population designations and assignments as they pertain to individual research questions. Finer scale subdivision and verification of current population compositions will likely be possible with genotyping data that more comprehensively samples the genome. 
The approach used here may be valuable for other non-clonal pathogens that lack simple group-defining genetic characteristics and provides a rapid reference for epidemiologists wishing to track the origin of infection without the need to compile population data and learn population assignment algorithms.
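
    One simple way to use per-population allele frequencies for a novel ST is to score its allele profile under each population and pick the higher score. The sketch below does this with a naive product of frequencies that treats loci as independent. This is a simplification and not the Bayesian analysis used in the study. The locus names, allele numbers, frequencies and the floor value for unseen alleles are all hypothetical.

```python
import math


def assign_population(profile, freqs, floor=1e-3):
    """Score an allele profile against each population.

    profile: dict mapping locus -> allele number
    freqs:   dict mapping population -> locus -> allele -> frequency
    floor:   assumed minimum frequency for alleles unseen in a population
    Returns (best_population, log_likelihood_by_population).
    """
    scores = {}
    for pop, loci in freqs.items():
        log_lik = 0.0
        for locus, allele in profile.items():
            f = loci.get(locus, {}).get(allele, 0.0)
            log_lik += math.log(max(f, floor))
        scores[pop] = log_lik
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical frequencies for two loci in the two populations.
freqs = {
    "Australia": {"locus1": {1: 0.7, 2: 0.3}, "locus2": {3: 0.6, 4: 0.4}},
    "Southeast Asia": {"locus1": {1: 0.2, 2: 0.8}, "locus2": {3: 0.1, 4: 0.9}},
}
best, scores = assign_population({"locus1": 1, "locus2": 3}, freqs)
print(best)  # Australia
```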

  15. High Prevalence of ESBL-Producing Klebsiella pneumoniae Causing Community-Onset Infections in China

    PubMed Central

    Zhang, Jing; Zhou, Kai; Zheng, Beiwen; Zhao, Lina; Shen, Ping; Ji, Jinru; Wei, Zeqing; Li, Lanjuan; Zhou, Jianying; Xiao, Yonghong

    2016-01-01

    The aim of this work was to investigate the epidemiological and genetic characteristics of ESBL-producing Klebsiella pneumoniae (ESBL-Kp) causing community-onset infections. K. pneumoniae isolates were collected from 31 Chinese secondary hospitals between August 2010 and 2011. Genes encoding ESBL and AmpC beta-lactamases were detected by PCR. The isolates were assigned to sequence types (STs) using multi-locus sequence typing (MLST). Eleven ESBL-Kp strains were selected for whole-genome sequencing (WGS) to investigate the genetic environment and plasmids encoding ESBL genes. A total of 578 K. pneumoniae isolates were collected, and 184 (31.8%) carried ESBL genes. The prevalence of ESBL-Kp varied across geographical areas of China (10.2–50.3%). The three most prevalent ESBL genes were blaCTX-M-14 (n = 74), blaCTX-M-15 (n = 60), and blaCTX-M-3 (n = 40). MLST assigned 127 CTX-M-14 and CTX-M-15 producers to 54 STs, and CC17 was the most prevalent population (12.6%). STs known to be frequently associated with hypervirulent K. pneumoniae (hvKP), namely ST23, ST37, and ST86, accounted for 14.1% (18/127). Phylogenetic analysis of the concatenated seven MLST loci revealed the existence of ESBL-producing K. quasipneumoniae (two strains) and K. variicola (one strain), which was further confirmed by WGS. This study highlights the challenge of community-onset infections caused by ESBL-Kp in China. The prevalence of STs frequently associated with hvKP should be of concern. Surveillance of ESBL-Kp causing community-onset infections now appears imperative. PMID:27895637

  16. NMR-based automated protein structure determination.

    PubMed

    Würz, Julia M; Kazemi, Sina; Schmidt, Elena; Bagaria, Anurag; Güntert, Peter

    2017-08-15

    NMR spectra analysis for protein structure determination can now in many cases be performed by automated computational methods. This overview of the computational methods for NMR protein structure analysis presents recent automated methods for signal identification in multidimensional NMR spectra, sequence-specific resonance assignment, collection of conformational restraints, and structure calculation, as implemented in the CYANA software package. These algorithms are sufficiently reliable and integrated into one software package to enable the fully automated structure determination of proteins starting from NMR spectra without manual interventions or corrections at intermediate steps, with an accuracy of 1-2 Å backbone RMSD in comparison with manually solved reference structures. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. Localization of the human tripeptidyl peptidase II gene (TPP2) to 13q32-q33 by nonradioactive in situ hybridization and somatic cell hybrids

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Martinsson, T.; Vujic, M.; Tomkinson, B.

    1993-08-01

    The authors have assigned the human tripeptidyl peptidase II (TPP2) gene to chromosome region 13q32-q33 using two different methods. First, a full-length TPP2 cDNA was used as a probe on Southern blots of DNA from a panel of human/rodent somatic cell hybrids. The TPP2 sequences were found to segregate with the human chromosome 13. Second, fluorescence in situ hybridization analysis was performed with the same probe. This analysis supported the chromosome 13 localization and further refined it to region 13q32-q33. 20 refs., 2 figs.

  18. More reliable protein NMR peak assignment via improved 2-interval scheduling.

    PubMed

    Chen, Zhi-Zhong; Lin, Guohui; Rizzi, Romeo; Wen, Jianjun; Xu, Dong; Xu, Ying; Jiang, Tao

    2005-03-01

    Protein NMR peak assignment refers to the process of assigning a group of "spin systems" obtained experimentally to a protein sequence of amino acids. The automation of this process is still an unsolved and challenging problem in NMR protein structure determination. Recently, protein NMR peak assignment has been formulated as an interval scheduling problem (ISP), where a protein sequence P of amino acids is viewed as a discrete time interval I (the amino acids on P correspond one-to-one to the time units of I), each subset S of spin systems that are known to originate from consecutive amino acids of P is viewed as a "job" j(s), the preference of assigning S to a subsequence P' of consecutive amino acids on P is viewed as the profit of executing job j(s) in the subinterval of I corresponding to P', and the goal is to maximize the total profit of executing the jobs (on a single machine) during I. The interval scheduling problem is MAX SNP-hard in general; but in the real practice of protein NMR peak assignment, each job j(s) usually requires at most 10 consecutive time units, and typically the jobs that require one or two consecutive time units are the most difficult to assign/schedule. In order to solve these most difficult assignments, we present an efficient 13/7-approximation algorithm for the special case of the interval scheduling problem where each job takes one or two consecutive time units. Combining this algorithm with a greedy filtering strategy for handling long jobs (i.e., jobs that need more than two consecutive time units), we obtain a new efficient heuristic for protein NMR peak assignment. Our experimental study shows that the new heuristic produces the best peak assignment in most of the cases, compared with the NMR peak assignment algorithms in the recent literature. The above algorithm is also the first approximation algorithm for a nontrivial case of the well-known interval scheduling problem that breaks the ratio 2 barrier.
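
    To make the interval scheduling formulation concrete, the sketch below solves tiny instances exactly by exhaustive search with memoization. It is an exponential-time baseline for illustration only, not the 13/7-approximation algorithm or the greedy filtering heuristic described above. The job encoding (a length plus a profit per admissible start position) and the example numbers are assumptions.

```python
from functools import lru_cache


def best_schedule(n_units, jobs):
    """Maximize total profit of non-overlapping job placements.

    n_units: number of time units (amino acid positions) in the interval
    jobs:    list of (length, profit_by_start), where profit_by_start maps
             a start index to the profit of running the job from there
    Each job is placed at most once. Returns (total_profit, placements),
    with placements given as (job_index, start) pairs in time order.
    """
    @lru_cache(maxsize=None)
    def solve(t, used):
        if t >= n_units:
            return 0.0, ()
        best = solve(t + 1, used)  # option: leave time unit t idle
        for j, (length, profit) in enumerate(jobs):
            if used & (1 << j) or t + length > n_units or t not in profit:
                continue
            sub_value, sub_plan = solve(t + length, used | (1 << j))
            candidate = (profit[t] + sub_value, ((j, t),) + sub_plan)
            if candidate[0] > best[0]:
                best = candidate
        return best

    return solve(0, 0)


# Hypothetical instance: 4 time units, three jobs of length 2, 1 and 2.
jobs = [(2, {0: 5, 2: 1}), (1, {0: 2, 3: 4}), (2, {1: 6, 2: 3})]
print(best_schedule(4, jobs))  # (10.0, ((2, 1), (1, 3)))
```

    The search visits every combination of time unit and set of used jobs, so it is practical only for a handful of jobs; that cost is the reason approximation algorithms are of interest for this problem.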

  19. Genome Wide Re-Annotation of Caldicellulosiruptor saccharolyticus with New Insights into Genes Involved in Biomass Degradation and Hydrogen Production

    PubMed Central

    Chowdhary, Nupoor; Selvaraj, Ashok; KrishnaKumaar, Lakshmi; Kumar, Gopal Ramesh

    2015-01-01

    Caldicellulosiruptor saccharolyticus has proven itself to be an excellent candidate for biological hydrogen (H2) production, but it still has major drawbacks, such as sensitivity to high osmotic pressure and low volumetric H2 productivity, which should be considered before it can be used industrially. A whole genome re-annotation has been carried out as an attempt to update the incomplete genome information that leaves gaps in knowledge, especially in the area of metabolic engineering, and to improve the H2 producing capabilities of C. saccharolyticus. Whole genome re-annotation was performed manually for 2,682 Coding Sequences (CDSs). Bioinformatics tools based on sequence similarity, motif search, phylogenetic analysis and fold recognition were employed for re-annotation. Our methodology could successfully add functions for 409 hypothetical proteins (HPs) and 46 proteins previously annotated as putative, and assign more accurate functions to the known protein sequences. Homology based gene annotation has been used as a standard method for assigning function to novel proteins, but over the past few years many non-homology based methods, such as genomic context approaches for protein function prediction, have been developed. Using non-homology based functional prediction methods, we were able to assign cellular processes or physical complexes for 249 hypothetical sequences. Our re-annotation pipeline highlights the addition of 231 new CDSs generated from the MicroScope Platform to the original genome, with functional prediction for 49 of them. The re-annotation of HPs and new CDSs is stored in the relational database that is available on the MicroScope web-based platform. In parallel, comparative genome analyses were performed among the members of the genus Caldicellulosiruptor to understand the function and evolutionary processes. 
Further, with results from integrated re-annotation studies (homology and genomic context approach), we strongly suggest that Csac_0437 and Csac_0424 encode for glycoside hydrolases (GH) and are proposed to be involved in the decomposition of recalcitrant plant polysaccharides. Similarly, HPs: Csac_0732, Csac_1862, Csac_1294 and Csac_0668 are suggested to play a significant role in biohydrogen production. Function prediction of these HPs by using our integrated approach will considerably enhance the interpretation of large-scale experiments targeting this industrially important organism. PMID:26196387

  20. Genome Wide Re-Annotation of Caldicellulosiruptor saccharolyticus with New Insights into Genes Involved in Biomass Degradation and Hydrogen Production.

    PubMed

    Chowdhary, Nupoor; Selvaraj, Ashok; KrishnaKumaar, Lakshmi; Kumar, Gopal Ramesh

    2015-01-01

    Caldicellulosiruptor saccharolyticus has proven itself to be an excellent candidate for biological hydrogen (H2) production, but it still has major drawbacks, such as sensitivity to high osmotic pressure and low volumetric H2 productivity, which should be considered before it can be used industrially. A whole genome re-annotation has been carried out as an attempt to update the incomplete genome information that leaves gaps in knowledge, especially in the area of metabolic engineering, and to improve the H2 producing capabilities of C. saccharolyticus. Whole genome re-annotation was performed manually for 2,682 Coding Sequences (CDSs). Bioinformatics tools based on sequence similarity, motif search, phylogenetic analysis and fold recognition were employed for re-annotation. Our methodology could successfully add functions for 409 hypothetical proteins (HPs) and 46 proteins previously annotated as putative, and assign more accurate functions to the known protein sequences. Homology based gene annotation has been used as a standard method for assigning function to novel proteins, but over the past few years many non-homology based methods, such as genomic context approaches for protein function prediction, have been developed. Using non-homology based functional prediction methods, we were able to assign cellular processes or physical complexes for 249 hypothetical sequences. Our re-annotation pipeline highlights the addition of 231 new CDSs generated from the MicroScope Platform to the original genome, with functional prediction for 49 of them. The re-annotation of HPs and new CDSs is stored in the relational database that is available on the MicroScope web-based platform. In parallel, comparative genome analyses were performed among the members of the genus Caldicellulosiruptor to understand the function and evolutionary processes. 
Further, with results from integrated re-annotation studies (homology and genomic context approach), we strongly suggest that Csac_0437 and Csac_0424 encode for glycoside hydrolases (GH) and are proposed to be involved in the decomposition of recalcitrant plant polysaccharides. Similarly, HPs: Csac_0732, Csac_1862, Csac_1294 and Csac_0668 are suggested to play a significant role in biohydrogen production. Function prediction of these HPs by using our integrated approach will considerably enhance the interpretation of large-scale experiments targeting this industrially important organism.

  1. Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon

    PubMed Central

    2011-01-01

    Background: Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. Results: We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. 
Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but longer than many other dicot plants. Codon usages of melon full-length transcripts were largely similar to those of Arabidopsis coding sequences. Conclusion: The collection of melon ESTs generated from full-length enriched and standard cDNA libraries is expected to play significant roles in annotating the melon genome. The ESTs and associated analysis results will be useful resources for gene discovery, functional analysis, marker-assisted breeding of melon and closely related species, comparative genomic studies and for gaining insights into gene expression patterns. PMID:21599934

  2. Lentzea soli sp. nov., an actinomycete isolated from soil.

    PubMed

    Li, Dongmei; Zheng, Weiwei; Zhao, Junwei; Han, Liyuan; Zhao, Xueli; Jiang, Hao; Wang, Xiangjing; Xiang, Wensheng

    2018-05-01

    A novel actinobacterium, designated strain NEAU-LZC 7(T), was isolated from soil collected from Mount Song and characterized using a polyphasic approach. Phylogenetic analysis based on the 16S rRNA gene sequence indicated that strain NEAU-LZC 7(T) belonged to the genus Lentzea, with highest sequence similarity to Lentzea violacea JCM 10975(T) (98.1%). Morphological and chemotaxonomic characteristics of the strain also supported its assignment to the genus Lentzea. However, DNA-DNA relatedness, physiological and biochemical data showed that strain NEAU-LZC 7(T) could be distinguished from its closest relative. Therefore, strain NEAU-LZC 7(T) represents a novel species of the genus Lentzea, for which the name Lentzea soli sp. nov. is proposed, with NEAU-LZC 7(T) (=CCTCC AA 2017027(T)=JCM 32384(T)) as the type strain.

  3. The Advanced Glaucoma Intervention Study (AGIS): 4. Comparison of treatment outcomes within race. Seven-year results.

    PubMed

    1998-07-01

    The purpose of this report is to present separately for black and white patients with advanced glaucoma 7-year results of two alternative surgical intervention sequences. A randomized controlled trial. A total of 332 black patients (451 eyes), 249 white patients (325 eyes), and 10 patients of other races (13 eyes) participated. Potential follow-up ranged from 4 to 7 years. Eyes were randomly assigned to either an argon laser trabeculoplasty (ALT)-trabeculectomy-trabeculectomy (ATT) sequence or a trabeculectomy-ALT-trabeculectomy (TAT) sequence. The second and third interventions were offered after failure of the first and second interventions, respectively. Average percent of eyes with decrease of visual field (APDVF), average percent of eyes with decrease of visual acuity (APDVA), and average percent of eyes with decrease of vision (APDV) are the outcome measures. Decrease of visual field (DVF) is an increase from baseline of at least 4 points on a glaucoma visual field defect scale ranging from 0 to 20, decrease of visual acuity (DVA) is a decrease from baseline of at least 15 letters (3 lines), and decrease of vision (DV) is the occurrence of either DVF or DVA. The averages are of percent decreases observed at 6-month intervals from the first 6-month visit to the end of the specified observation period. In both black and white patients throughout 7-year follow-up, the mean decrease in intraocular pressure was greater in eyes assigned to TAT, and the cumulative probability of failure of the first intervention was greater in eyes assigned to ATT. In black patients, APDVF, APDVA, and APDV are less for the ATT sequence than for the TAT sequence throughout the 7 years. 
In white patients, APDVF also favors the ATT sequence but only for the first year, after which it favors the TAT sequence through the seventh year; APDVA also favors the ATT sequence, but the ATT-TAT difference progressively diminishes over 7 years; and APDV favors ATT over TAT initially, but after 4 years, the advantage switches to and remains with TAT. These data support use of the ATT sequence for all black patients. For white patients without life-threatening health problems, the data support use of the TAT sequence.

  4. Genetic Characterization of a Panel of Diverse HIV-1 Isolates at Seven International Sites

    PubMed Central

    Chen, Yue; Sanchez, Ana M.; Sabino, Ester; Hunt, Gillian; Ledwaba, Johanna; Hackett, John; Swanson, Priscilla; Hewlett, Indira; Ragupathy, Viswanath; Vikram Vemula, Sai; Zeng, Peibin; Tee, Kok-Keng; Chow, Wei Zhen; Ji, Hezhao; Sandstrom, Paul; Denny, Thomas N.; Busch, Michael P.; Gao, Feng

    2016-01-01

    HIV-1 subtypes and drug resistance are routinely tested by many international surveillance groups. However, results from different sites often vary. A systematic comparison of results from multiple sites is needed to determine whether a standardized protocol is required for consistent and accurate data analysis. A panel of well-characterized HIV-1 isolates (N = 50) from the External Quality Assurance Program Oversight Laboratory (EQAPOL) was assembled for evaluation at seven international sites. This virus panel included seven subtypes, six circulating recombinant forms (CRFs), nine unique recombinant forms (URFs) and three group O viruses. Seven viruses contained 10 major drug resistance mutations (DRMs). HIV-1 isolates were prepared at a concentration of 10(7) copies/ml and compiled into blinded panels. Subtypes and DRMs were determined with partial or full pol gene sequences by conventional Sanger sequencing and/or Next Generation Sequencing (NGS). Subtype and DRM results were reported and decoded for comparison with full-length genome sequences generated by EQAPOL. The partial pol gene was amplified by RT-PCR and sequenced for 89.4%-100% of group M viruses at six sites. Subtypes were correctly determined from the partial pol sequences for the majority of the viruses (83%-97.9%). All 10 major DRMs in seven isolates were detected at these six sites. The complete pol gene sequence was also obtained by NGS at one site. However, this method missed six group M viruses, and the sequences contained host chromosome fragments. The three group O viruses were characterized only at the one site that employed additional group O-specific RT-PCR primers. These results indicate that PCR protocols and subtyping tools should be standardized to efficiently amplify diverse viruses and more consistently assign virus genotypes, which is critical for accurate global subtype and drug resistance surveillance. 
Targeted NGS analysis of partial pol sequences can serve as an alternative approach, especially for detection of low-abundance DRMs. PMID:27314585

  5. Genetic Characterization of a Panel of Diverse HIV-1 Isolates at Seven International Sites.

    PubMed

    Hora, Bhavna; Keating, Sheila M; Chen, Yue; Sanchez, Ana M; Sabino, Ester; Hunt, Gillian; Ledwaba, Johanna; Hackett, John; Swanson, Priscilla; Hewlett, Indira; Ragupathy, Viswanath; Vikram Vemula, Sai; Zeng, Peibin; Tee, Kok-Keng; Chow, Wei Zhen; Ji, Hezhao; Sandstrom, Paul; Denny, Thomas N; Busch, Michael P; Gao, Feng

    2016-01-01

    HIV-1 subtypes and drug resistance are routinely tested by many international surveillance groups. However, results from different sites often vary. A systematic comparison of results from multiple sites is needed to determine whether a standardized protocol is required for consistent and accurate data analysis. A panel of well-characterized HIV-1 isolates (N = 50) from the External Quality Assurance Program Oversight Laboratory (EQAPOL) was assembled for evaluation at seven international sites. This virus panel included seven subtypes, six circulating recombinant forms (CRFs), nine unique recombinant forms (URFs) and three group O viruses. Seven viruses contained 10 major drug resistance mutations (DRMs). HIV-1 isolates were prepared at a concentration of 10(7) copies/ml and compiled into blinded panels. Subtypes and DRMs were determined with partial or full pol gene sequences by conventional Sanger sequencing and/or Next Generation Sequencing (NGS). Subtype and DRM results were reported and decoded for comparison with full-length genome sequences generated by EQAPOL. The partial pol gene was amplified by RT-PCR and sequenced for 89.4%-100% of group M viruses at six sites. Subtypes were correctly determined from the partial pol sequences for the majority of the viruses (83%-97.9%). All 10 major DRMs in seven isolates were detected at these six sites. The complete pol gene sequence was also obtained by NGS at one site. However, this method missed six group M viruses, and the sequences contained host chromosome fragments. The three group O viruses were characterized only at the one site that employed additional group O-specific RT-PCR primers. These results indicate that PCR protocols and subtyping tools should be standardized to efficiently amplify diverse viruses and more consistently assign virus genotypes, which is critical for accurate global subtype and drug resistance surveillance. 
Targeted NGS analysis of partial pol sequences can serve as an alternative approach, especially for detection of low-abundance DRMs.

  6. De novo Assembly of the Indo-Pacific Humpback Dolphin Leucocyte Transcriptome to Identify Putative Genes Involved in the Aquatic Adaptation and Immune Response

    PubMed Central

    Xia, Jia; Yang, Lili; Chen, Jialin; Wu, Yuping; Yi, Meisheng

    2013-01-01

    Background: The Indo-Pacific humpback dolphin (Sousa chinensis), a marine mammal species inhabiting the waters of Southeast Asia, South Africa and Australia, has attracted much attention because of the dramatic decline in population size in the past decades, which raises the concern of extinction. So far, this species is poorly characterized at the molecular level due to the little sequence information available in public databases. Recent advances in large-scale RNA sequencing provide an efficient approach to generate abundant sequences for functional genomic analyses in species with unsequenced genomes. Principal Findings: We performed a de novo assembly of the Indo-Pacific humpback dolphin leucocyte transcriptome by Illumina sequencing. 108,751 high quality sequences from 47,840,388 paired-end reads were generated, and 48,868 and 46,587 unigenes were functionally annotated by BLAST search against the NCBI non-redundant and Swiss-Prot protein databases (E-value<10(-5)), respectively. In total, 16,467 unigenes were clustered into 25 functional categories by searching against the COG database, and BLAST2GO search assigned 37,976 unigenes to 61 GO terms. In addition, 36,345 unigenes were grouped into 258 KEGG pathways. We also identified 9,906 simple sequence repeats and 3,681 putative single nucleotide polymorphisms as potential molecular markers in our assembled sequences. A large number of unigenes were predicted to be involved in immune response, and many genes were predicted to be relevant to adaptive evolution and cetacean-specific traits. Conclusion: This study represented the first transcriptome analysis of the Indo-Pacific humpback dolphin, an endangered species. The de novo transcriptome analysis of the unique transcripts will provide valuable sequence information for discovery of new genes, characterization of gene expression, investigation of various pathways and adaptive evolution, as well as identification of genetic markers. PMID:24015242

  7. De novo assembly of the Indo-Pacific humpback dolphin leucocyte transcriptome to identify putative genes involved in the aquatic adaptation and immune response.

    PubMed

    Gui, Duan; Jia, Kuntong; Xia, Jia; Yang, Lili; Chen, Jialin; Wu, Yuping; Yi, Meisheng

    2013-01-01

    The Indo-Pacific humpback dolphin (Sousa chinensis), a marine mammal species inhabiting the waters of Southeast Asia, South Africa and Australia, has attracted much attention because of the dramatic decline in population size in the past decades, which raises the concern of extinction. So far, this species is poorly characterized at the molecular level due to the little sequence information available in public databases. Recent advances in large-scale RNA sequencing provide an efficient approach to generate abundant sequences for functional genomic analyses in species with unsequenced genomes. We performed a de novo assembly of the Indo-Pacific humpback dolphin leucocyte transcriptome by Illumina sequencing. 108,751 high quality sequences from 47,840,388 paired-end reads were generated, and 48,868 and 46,587 unigenes were functionally annotated by BLAST search against the NCBI non-redundant and Swiss-Prot protein databases (E-value<10(-5)), respectively. In total, 16,467 unigenes were clustered into 25 functional categories by searching against the COG database, and BLAST2GO search assigned 37,976 unigenes to 61 GO terms. In addition, 36,345 unigenes were grouped into 258 KEGG pathways. We also identified 9,906 simple sequence repeats and 3,681 putative single nucleotide polymorphisms as potential molecular markers in our assembled sequences. A large number of unigenes were predicted to be involved in immune response, and many genes were predicted to be relevant to adaptive evolution and cetacean-specific traits. This study represented the first transcriptome analysis of the Indo-Pacific humpback dolphin, an endangered species. The de novo transcriptome analysis of the unique transcripts will provide valuable sequence information for discovery of new genes, characterization of gene expression, investigation of various pathways and adaptive evolution, as well as identification of genetic markers.
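
    Simple sequence repeats of the kind counted above are tandem runs of a short motif. The sketch below finds such runs with a regular expression. The abstract does not state the detection tool or thresholds, so the minimum repeat count, the maximum motif length and the function name are assumptions chosen for illustration.

```python
import re


def find_ssrs(seq, min_repeats=5, max_unit=6):
    """Return (start, motif, repeat_count) for each tandem repeat run.

    A run is a motif of 1..max_unit bases repeated at least min_repeats
    times in a row. The lazy quantifier makes the shortest motif win, so
    a poly-A run is reported with motif "A" rather than "AA".
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


print(find_ssrs("GGTACACACACACTT"))  # [(3, 'AC', 5)]
```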

  8. Single machine scheduling with slack due dates assignment

    NASA Astrophysics Data System (ADS)

    Liu, Weiguo; Hu, Xiangpei; Wang, Xuyin

    2017-04-01

    This paper considers a single machine scheduling problem in which each job is assigned an individual due date based on a common flow allowance (i.e. all jobs have slack due dates). The goal is to find a sequence for the jobs, together with a due date assignment, that minimizes a non-regular criterion comprising the total weighted absolute lateness and the common flow allowance cost, where the weights are position-dependent. In order to solve this problem, an ? time algorithm is proposed. Some extensions of the problem are also shown.
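
    Under the usual slack (SLK) rule, a job with processing time p receives due date d = p + q, where q is the common flow allowance. The sketch below evaluates the criterion described above for one given job sequence. It assumes that rule and a flow allowance cost that is linear in q with a hypothetical unit cost gamma; it only evaluates a schedule and is not the paper's optimization algorithm.

```python
def slk_objective(proc_times, weights, q, gamma):
    """Evaluate total weighted absolute lateness plus flow allowance cost.

    proc_times: processing times listed in sequence order
    weights:    weights[k] applies to whichever job sits in position k
    q:          common flow allowance (due date of a job is p + q)
    gamma:      assumed cost per unit of flow allowance
    """
    clock = 0
    total = 0
    for p, w in zip(proc_times, weights):
        clock += p                       # completion time of this job
        due = p + q                      # slack due date
        total += w * abs(clock - due)    # weighted absolute lateness
    return total + gamma * q


# Hypothetical instance: three jobs, positional weights 3, 2, 1.
print(slk_objective([2, 3, 1], [3, 2, 1], q=2, gamma=1))  # 11
```

    With this due date rule the lateness of a job equals its waiting time before processing minus q, so the criterion depends on the sequence only through the waiting times.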

  9. A comprehensive DNA barcode database for Central European beetles with a focus on Germany: adding more than 3500 identified species to BOLD.

    PubMed

    Hendrich, Lars; Morinière, Jérôme; Haszprunar, Gerhard; Hebert, Paul D N; Hausmann, Axel; Köhler, Frank; Balke, Michael

    2015-07-01

    Beetles are the most diverse group of animals and are crucial for ecosystem functioning. In many countries, they are well established for environmental impact assessment, but even in the well-studied Central European fauna, species identification can be very difficult. A comprehensive and taxonomically well-curated DNA barcode library could remedy this deficit and could also link hundreds of years of traditional knowledge with next generation sequencing technology. However, such a beetle library is missing to date. This study provides the globally largest DNA barcode reference library for Coleoptera for 15 948 individuals belonging to 3514 well-identified species (53% of the German fauna) with representatives from 97 of 103 families (94%). This study is the first comprehensive regional test of the efficiency of DNA barcoding for beetles with a focus on Germany. Sequences ≥500 bp were recovered from 63% of the specimens analysed (15 948 of 25 294) with short sequences from another 997 specimens. Whereas most specimens (92.2%) could be unambiguously assigned to a single known species by sequence diversity at CO1, 1089 specimens (6.8%) were assigned to more than one Barcode Index Number (BIN), creating 395 BINs which need further study to ascertain if they represent cryptic species, mitochondrial introgression, or simply regional variation in widespread species. We found 409 specimens (2.6%) that shared a BIN assignment with another species, most involving a pair of closely allied species as 43 BINs were involved. Most of these taxa were separated by barcodes although sequence divergences were low. Only 155 specimens (0.97%) show identical or overlapping clusters. © 2014 John Wiley & Sons Ltd.

  10. The application of new software tools to quantitative protein profiling via isotope-coded affinity tag (ICAT) and tandem mass spectrometry: I. Statistically annotated datasets for peptide sequences and proteins identified via the application of ICAT and tandem mass spectrometry to proteins copurifying with T cell lipid rafts.

    PubMed

    von Haller, Priska D; Yi, Eugene; Donohoe, Samuel; Vaughn, Kelly; Keller, Andrew; Nesvizhskii, Alexey I; Eng, Jimmy; Li, Xiao-jun; Goodlett, David R; Aebersold, Ruedi; Watts, Julian D

    2003-07-01

    Lipid rafts were prepared according to standard protocols from Jurkat T cells stimulated via T cell receptor/CD28 cross-linking and from control (unstimulated) cells. Co-isolating proteins from the control and stimulated cell preparations were labeled with isotopically normal (d0) and heavy (d8) versions of the same isotope-coded affinity tag (ICAT) reagent, respectively. Samples were combined, proteolyzed, and resultant peptides fractionated via cation exchange chromatography. Cysteine-containing (ICAT-labeled) peptides were recovered via the biotin tag component of the ICAT reagents by avidin-affinity chromatography. On-line micro-capillary liquid chromatography tandem mass spectrometry was performed on both avidin-affinity (ICAT-labeled) and flow-through (unlabeled) fractions. Initial peptide sequence identification was by searching recorded tandem mass spectrometry spectra against a human sequence data base using SEQUEST software. New statistical data modeling algorithms were then applied to the SEQUEST search results. These allowed for discrimination between likely "correct" and "incorrect" peptide assignments, and from these the inferred proteins that they collectively represented, by calculating estimated probabilities that each peptide assignment and subsequent protein identification was a member of the "correct" population. For convenience, the resultant lists of peptide sequences assigned and the proteins to which they corresponded were filtered at an arbitrarily set cut-off of 0.5 (i.e. 50% likely to be "correct") and above and compiled into two separate datasets. In total, these data sets contained 7667 individual peptide identifications, which represented 2669 unique peptide sequences, corresponding to 685 proteins and related protein groups.
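
    The filtering step described above can be sketched as follows. The record layout (dictionaries with "peptide", "protein" and "probability" keys) and the example values are assumptions for illustration; the 0.5 cut-off with "and above" semantics is taken from the abstract.

```python
def filter_assignments(assignments, cutoff=0.5):
    """Keep assignments with probability >= cutoff and summarise what remains.

    Returns (kept_records, unique_peptide_sequences, proteins).
    """
    kept = [a for a in assignments if a["probability"] >= cutoff]
    unique_peptides = {a["peptide"] for a in kept}
    proteins = {a["protein"] for a in kept}
    return kept, unique_peptides, proteins


if __name__ == "__main__":
    # Made-up records; peptide strings and protein names are placeholders.
    records = [
        {"peptide": "ACDEFGHK", "protein": "protein_A", "probability": 0.97},
        {"peptide": "ACDEFGHK", "protein": "protein_A", "probability": 0.50},
        {"peptide": "LMNPCQR", "protein": "protein_B", "probability": 0.31},
    ]
    kept, peptides, proteins = filter_assignments(records)
    print(len(kept), len(peptides), len(proteins))
```

    Counting identifications, unique peptide sequences and proteins separately mirrors the three totals reported at the end of the abstract.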

  11. UFO: a web server for ultra-fast functional profiling of whole genome protein sequences.

    PubMed

    Meinicke, Peter

    2009-09-02

    Functional profiling is a key technique to characterize and compare the functional potential of entire genomes. The estimation of profiles according to an assignment of sequences to functional categories is a computationally expensive task because it requires the comparison of all protein sequences from a genome with a usually large database of annotated sequences or sequence families. Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections instantaneously. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence-specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. Considering the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. In comparison with current state-of-the-art HMMs, the runtime measurements show a considerable speed-up in the range of four orders of magnitude. For an average-size prokaryotic genome, the computation of a functional profile together with its comparison typically requires about 10 seconds of processing time. For the first time the UFO web server makes it possible to get a quick overview of the functional inventory of newly sequenced organisms. The genome-scale comparison with a large number of precomputed profiles allows a first guess about functionally related organisms. The service is freely available and does not require user registration or specification of a valid email address.
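
    A functional profile of the kind described here can be thought of as a vector of category frequencies. The sketch below builds such a profile from a list of per-domain Pfam assignments and compares two profiles. The abstract does not specify the dissimilarity score used by UFO, so half the L1 distance is used purely as an assumed stand-in; the function names and accession lists are likewise hypothetical.

```python
from collections import Counter


def functional_profile(domain_assignments):
    """Relative frequency of each family among the assigned domains."""
    counts = Counter(domain_assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {family: n / total for family, n in counts.items()}


def dissimilarity(profile_a, profile_b):
    """Half the L1 distance between two profiles: 0 for identical, 1 for disjoint."""
    families = set(profile_a) | set(profile_b)
    return 0.5 * sum(
        abs(profile_a.get(f, 0.0) - profile_b.get(f, 0.0)) for f in families
    )


if __name__ == "__main__":
    # Placeholder accessions standing in for detected Pfam families.
    genome_a = functional_profile(["PF00001", "PF00001", "PF00002", "PF00003"])
    genome_b = functional_profile(["PF00001", "PF00004"])
    print(dissimilarity(genome_a, genome_b))
```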

  12. Distantly related lipocalins share two conserved clusters of hydrophobic residues: use in homology modeling

    PubMed Central

    Adam, Benoit; Charloteaux, Benoit; Beaufays, Jerome; Vanhamme, Luc; Godfroid, Edmond; Brasseur, Robert; Lins, Laurence

    2008-01-01

    Background: Lipocalins are widely distributed in nature and are found in bacteria, plants, arthropods and vertebrates. In hematophagous arthropods, they are implicated in the successful accomplishment of the blood meal, interfering with platelet aggregation, blood coagulation and inflammation and in the transmission of disease parasites such as Trypanosoma cruzi and Borrelia burgdorferi. The pairwise sequence identity is low among this family, often below 30%, despite a well conserved tertiary structure. Under the 30% identity threshold, alignment methods do not correctly assign and align proteins. The only safe way to assign a sequence to that family is by experimental determination. However, these procedures are long and costly and cannot always be applied. A way to circumvent the experimental approach is sequence and structure analysis. To further help in that task, the residues implicated in the stabilisation of the lipocalin fold were determined. This was done by analyzing the conserved interactions for ten lipocalins having a maximum pairwise identity of 28% and various functions. Results: It was determined that two hydrophobic clusters of residues are conserved by analysing the ten lipocalin structures and sequences. One cluster is internal to the barrel, involving all strands and the 3₁₀ helix. The other is external, involving four strands and the helix lying parallel to the barrel surface. These clusters are also present in RaHBP2, an unusual "outlier" lipocalin from the tick Rhipicephalus appendiculatus. This information was used to assess assignment of LIR2, a protein from Ixodes ricinus, and to build a 3D model that helps to predict function. FTIR data support the lipocalin fold for this protein. Conclusion: By sequence and structural analyses, two conserved clusters of hydrophobic residues in interactions have been identified in lipocalins.
    Since the residues implicated are not conserved for function, they should provide the minimal subset necessary to confer the lipocalin fold. This information has been used to assign LIR2 to lipocalins and to investigate its structure/function relationship. This study could be applied to other protein families with low pairwise similarity, such as the structurally related fatty acid binding proteins or avidins. PMID:18190694
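
    The 28% and 30% figures above are pairwise sequence identities. As a minimal sketch of how such a value can be computed from an existing alignment, the code below counts identical residues over columns where neither sequence has a gap. Identity definitions differ between tools (the choice of denominator in particular), so this convention is an assumption, as are the function names and the toy sequences.

```python
from itertools import combinations


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns that are gap-free in both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def max_pairwise_identity(alignment):
    """Largest pairwise identity among all sequences in a {name: aligned_seq} dict."""
    return max(
        percent_identity(a, b) for a, b in combinations(alignment.values(), 2)
    )


if __name__ == "__main__":
    toy = {"seq1": "ACDE-G", "seq2": "ACQEFG", "seq3": "TCQ-FG"}  # made-up alignment
    print(max_pairwise_identity(toy))
```

    A set of structures like the ten lipocalins described above would pass a "maximum pairwise identity" filter when this maximum stays below the chosen threshold.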

  13. VAMPS: a website for visualization and analysis of microbial population structures.

    PubMed

    Huse, Susan M; Mark Welch, David B; Voorhis, Andy; Shipunova, Anna; Morrison, Hilary G; Eren, A Murat; Sogin, Mitchell L

    2014-02-05

    The advent of next-generation DNA sequencing platforms has revolutionized molecular microbial ecology by making the detailed analysis of complex communities over time and space a tractable research pursuit for small research groups. However, the ability to generate 10⁵-10⁸ reads with relative ease brings with it many downstream complications. Beyond the computational resources and skills needed to process and analyze data, it is difficult to compare datasets in an intuitive and interactive manner that leads to hypothesis generation and testing. We developed the free web service VAMPS (Visualization and Analysis of Microbial Population Structures, http://vamps.mbl.edu) to address these challenges and to facilitate research by individuals or collaborating groups working on projects with large-scale sequencing data. Users can upload marker gene sequences and associated metadata; reads are quality filtered and assigned to both taxonomic structures and to taxonomy-independent clusters. A simple point-and-click interface allows users to select for analysis any combination of their own or their collaborators' private data and data from public projects, filter these by their choice of taxonomic and/or abundance criteria, and then explore these data using a wide range of analytic methods and visualizations. Each result is extensively hyperlinked to other analysis and visualization options, promoting data exploration and leading to a greater understanding of data relationships. VAMPS allows researchers using marker gene sequence data to analyze the diversity of microbial communities and the relationships between communities, to explore these analyses in an intuitive visual context, and to download data, results, and images for publication. 
    VAMPS obviates the need for individual research groups to make the considerable investment in computational infrastructure and bioinformatic support otherwise necessary to process, analyze, and interpret massive amounts of next-generation sequence data. Any web-capable device can be used to upload, process, explore, and extract data and results from VAMPS. VAMPS encourages researchers to share sequence and metadata, and fosters collaboration between researchers of disparate biomes who recognize common patterns in shared data.

  14. Single-cell template strand sequencing by Strand-seq enables the characterization of individual homologs.

    PubMed

    Sanders, Ashley D; Falconer, Ester; Hills, Mark; Spierings, Diana C J; Lansdorp, Peter M

    2017-06-01

    The ability to distinguish between genome sequences of homologous chromosomes in single cells is important for studies of copy-neutral genomic rearrangements (such as inversions and translocations), building chromosome-length haplotypes, refining genome assemblies, mapping sister chromatid exchange events and exploring cellular heterogeneity. Strand-seq is a single-cell sequencing technology that resolves the individual homologs within a cell by restricting sequence analysis to the DNA template strands used during DNA replication. This protocol, which takes up to 4 d to complete, relies on the directionality of DNA, in which each single strand of a DNA molecule is distinguished based on its 5'-3' orientation. Culturing cells in a thymidine analog for one round of cell division labels nascent DNA strands, allowing for their selective removal during genomic library construction. To preserve directionality of template strands, genomic preamplification is bypassed and labeled nascent strands are nicked and not amplified during library preparation. Each single-cell library is multiplexed for pooling and sequencing, and the resulting sequence data are aligned, mapping to either the minus or plus strand of the reference genome, to assign template strand states for each chromosome in the cell. The major adaptations to conventional single-cell sequencing protocols include harvesting of daughter cells after a single round of BrdU incorporation, bypassing of whole-genome amplification, and removal of the BrdU+ strand during Strand-seq library preparation. By sequencing just template strands, the structure and identity of each homolog are preserved.
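
    The final assignment step, turning plus- and minus-strand read counts into a template strand state per chromosome, might be sketched as below. The Watson/Crick labels follow the common Strand-seq convention (minus strand = Watson, plus strand = Crick), which the abstract itself does not spell out. The 0.8 purity threshold, the function name and the example counts are assumptions for illustration, not part of the published protocol.

```python
def strand_state(plus_reads, minus_reads, purity=0.8):
    """Classify one chromosome in one cell from its strand-specific read counts.

    Returns "WW" (both templates Watson), "CC" (both Crick), "WC" (one of each),
    or None when there are no reads to decide from.
    """
    total = plus_reads + minus_reads
    if total == 0:
        return None
    watson_fraction = minus_reads / total
    if watson_fraction >= purity:
        return "WW"
    if watson_fraction <= 1 - purity:
        return "CC"
    return "WC"


if __name__ == "__main__":
    # Made-up counts: {chromosome: (plus_reads, minus_reads)}
    counts = {"chr1": (5, 95), "chr2": (95, 5), "chr3": (50, 50)}
    for chromosome, (plus, minus) in counts.items():
        print(chromosome, strand_state(plus, minus))
```

    A real analysis would work on windows along each chromosome rather than whole-chromosome totals, since a change of state within a chromosome is what marks events such as sister chromatid exchanges.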

  15. Genomic organization of the human gene (CA5) and pseudogene for mitochondrial carbonic anhydrase V and their localization to chromosomes 16q and 16p

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nagao, Yoshiro; Sly, W.S.; Batanian, J.R.

    1995-08-10

    Carbonic anhydrase V (CA V) is expressed in mitochondrial matrix in liver and several other tissues. It is of interest for its putative roles in providing bicarbonate to carbamoyl phosphate synthetase for ureagenesis and to pyruvate carboxylase for gluconeogenesis and its possible importance in explaining certain inherited metabolic disorders with hyperammonemia and hypoglycemia. Following the recent characterization of the cDNA for human CA V, we report the isolation of the human gene from two λ genomic libraries and its characterization. The CA V gene (CA5) is approximately 50 kb long and contains 7 exons and 6 introns. The exon-intron boundaries are found in positions identical to those determined for the previously described CA II, CA III, and CA VII genes. Like the CA VII gene, CA5 does not contain typical TATA and CAAT promoter elements in the 5′ flanking region but does contain a TTTAA sequence 147 nucleotides upstream of the initiation codon. CA5 also contains a 12-bp GT-rich segment beginning 13 bp downstream of the polyadenylation signal in the 3′ untranslated region of exon 7. FISH analysis allowed CA5 to be assigned to chromosome 16q24.3. An unprocessed pseudogene containing sequence homologous to exons 3-7 and introns 3-6 was also isolated and was assigned by FISH analysis to chromosome 16p11.2-p12. 22 refs., 4 figs., 1 tab.

  16. Comparative transcriptomics reveals genes involved in metabolic and immune pathways in the digestive gland of scallop Chlamys farreri following cadmium exposure

    NASA Astrophysics Data System (ADS)

    Zhang, Hui; Zhai, Yuxiu; Yao, Lin; Jiang, Yanhua; Li, Fengling

    2017-05-01

    Chlamys farreri is an economically important mollusk that can accumulate excessive amounts of cadmium (Cd). Studying the molecular mechanism of Cd accumulation in bivalves is difficult because of the lack of genome background. Transcriptomic analysis based on high-throughput RNA sequencing has been shown to be an efficient and powerful method for the discovery of relevant genes in non-model and genome reference-free organisms. Here, we constructed two cDNA libraries (control and Cd exposure groups) from the digestive gland of C. farreri and compared the transcriptomic data between them. A total of 227 673 transcripts were assembled into 105 071 unigenes, most of which shared high similarity with sequences in the NCBI non-redundant protein database. For functional classification, 24 493 unigenes were assigned to Gene Ontology terms. Additionally, EuKaryotic Ortholog Groups and Kyoto Encyclopedia of Genes and Genomes analyses assigned 12 028 unigenes to 26 categories and 7 849 unigenes to five pathways, respectively. Comparative transcriptomics analysis identified 3 800 unigenes that were differentially expressed in the Cd-treated group compared with the control group. Among them, genes associated with heavy metal accumulation were screened, including metallothionein, divalent metal transporter, and metal tolerance protein. The functional genes and predicted pathways identified in our study will contribute to a better understanding of the metabolic and immune system in the digestive gland of C. farreri. In addition, the transcriptomic data will provide a comprehensive resource that may contribute to the understanding of molecular mechanisms that respond to marine pollutants in bivalves.

  17. Molecular characterization of canine parvovirus and canine enteric coronavirus in diarrheic dogs on the island of St. Kitts: First report from the Caribbean region.

    PubMed

    Navarro, Ryan; Nair, Rajeev; Peda, Andrea; Aung, Meiji Soe; Ashwinie, G S; Gallagher, Christa A; Malik, Yashpal S; Kobayashi, Nobumichi; Ghosh, Souvik

    2017-08-15

    Although canine parvovirus (CPV) and canine enteric coronavirus (CCoV) are important enteric pathogens of dogs and have been studied extensively in different parts of the world, there are no reports on these viruses from the Caribbean region. During 2015-2016, a total of 104 diarrheic fecal samples were collected from puppies and adult dogs, with or without hemorrhagic gastroenteritis, on the Caribbean island of St. Kitts (KNA). By PCR, 25 (24%, n=104) samples tested positive for CPV. Based on analysis of the complete deduced VP2 amino acid sequences, 20 of the KNA CPV strains were assigned to new CPV-2a (also designated as CPV-2a-297A). On the other hand, the VP2 genes of the remaining 5 strains were partially characterized, or could not be sequenced. New CPV-2a was the predominant CPV variant in St. Kitts, contrasting the molecular epidemiology of CPV variants reported in most studies from nearby North and South American countries. By RT-PCR, CCoVs were detected in 5 samples (4.8%, n=104). Based on analysis of partial M-protein gene, the KNA CCoV strains were assigned to CCoV-I genotype, and were closely related to CCoV-I strains from Brazil. To our knowledge, this is the first report on detection and genetic diversity of CPV and CCoV in dogs from the Caribbean region, and underscores the importance of similar studies in the other Caribbean islands. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Functional metagenomics of oil-impacted mangrove sediments reveals high abundance of hydrolases of biotechnological interest.

    PubMed

    Ottoni, Júlia Ronzella; Cabral, Lucélia; de Sousa, Sanderson Tarciso Pereira; Júnior, Gileno Vieira Lacerda; Domingos, Daniela Ferreira; Soares Junior, Fábio Lino; da Silva, Mylenne Calciolari Pinheiro; Marcon, Joelma; Dias, Armando Cavalcante Franco; de Melo, Itamar Soares; de Souza, Anete Pereira; Andreote, Fernando Dini; de Oliveira, Valéria Maia

    2017-07-01

    Mangroves are located in coastal wetlands and are susceptible to the consequences of oil spills, which may threaten the diversity of microorganisms responsible for the nutrient cycling and the consequent ecosystem functioning. Previous reports show that high concentration of oil favors the incidence of epoxide hydrolases and haloalkane dehalogenases in mangroves. This finding has guided the goals of this study in an attempt to broaden the analysis to other hydrolases and thereby verify whether oil contamination interferes with the prevalence of particular hydrolases and their assigned microorganisms. For this, an in-depth survey of the taxonomic and functional microbial diversity recovered in a fosmid library (Library_Oil Mgv) constructed from oil-impacted Brazilian mangrove sediment was carried out. Fosmid DNA of the whole library was extracted and submitted to Illumina HiSeq sequencing. The resulting Library_Oil Mgv dataset was further compared with those obtained by direct sequencing of environmental DNA from Brazilian mangroves (from distinct regions and affected by distinct sources of contamination), focusing on hydrolases with potential use in biotechnological processes. The most abundant hydrolases found were proteases, esterases and amylases, with similar occurrence profile in all datasets. The main microbial groups harboring such hydrolase-encoding genes were distinct in each mangrove, and in the fosmid library these enzymes were mainly assigned to Chloroflexaceae (for amylases), Planctomycetaceae (for esterases) and Bradyrhizobiaceae (for proteases). Assembly and analysis of Library_Oil Mgv reads revealed three potentially novel enzymes, one epoxide hydrolase, one xylanase and one amylase, to be further investigated via heterologous expression assays.

  19. Classification and phylogeny of the cyanobiont Anabaena azollae Strasburger: an answered question?

    PubMed

    Pereira, Ana L; Vasconcelos, Vitor

    2014-06-01

    The symbiosis Azolla-Anabaena azollae, with a worldwide distribution in pantropical and temperate regions, is one of the most studied, because of its potential application as a biofertilizer, especially in rice fields, but also as an animal food and in phytoremediation. The cyanobiont is a filamentous, heterocystic cyanobacterium that inhabits the foliar cavities of the pteridophyte and the indusium on the megasporocarp (female reproductive structure). The classification and phylogeny of the cyanobiont is very controversial: from its morphology, it has been named Nostoc azollae, Anabaena azollae, Anabaena variabilis status azollae and recently Trichormus azollae, but, from its 16S rRNA gene sequence, it has been assigned to Nostoc and/or Anabaena, and from its phycocyanin gene sequence, it has been assigned as non-Nostoc and non-Anabaena. The literature also points to a possible co-evolution between the cyanobiont and the Azolla host, since dendrograms and phylogenetic trees of fatty acids, short tandemly repeated repetitive (STRR) analysis and restriction fragment length polymorphism (RFLP) analysis of nif genes and the 16S rRNA gene give a two-cluster association that matches the two-section ranking of the host (Azolla). Another controversy surrounds the possible existence of more than one genus or more than one species strain. The use of freshly isolated or cultured cyanobionts is an additional problem, since their morphology and protein profiles are different. This review gives an overview of how morphological, chemical and genetic analyses influence the classification and phylogeny of the cyanobiont and future research. © 2014 IUMS.

  20. Developmental staging of male murine embryonic gonad by SAGE analysis

    PubMed Central

    Lee, Tin-Lap; Li, Yunmin; Alba, Diana; Vong, Queenie P.; Wu, Shao-Ming; Baxendale, Vanessa; Rennert, Owen M.; Lau, Yun-Fai Chris; Chan, Wai-Yee

    2012-01-01

    Despite the identification of key genes such as Sry integral to embryonic gonadal development, the genomic classification and identification of chromosomal activation of this process is still poorly understood. To better understand the genetic regulation of gonadal development, we performed Serial Analysis of Gene Expression (SAGE) to profile the genes and novel transcripts, and an average of 152,000 tags from male embryonic gonads at E10.5 (embryonic day 10.5), E11.5, E12.5, E13.5, E15.5 and E17.5 were analyzed. A total of 275,583 non-singleton tags that do not map to any annotated sequence were identified in the six gonad libraries, and 47,255 tags were mapped to 24,975 annotated sequences, among which 987 sequences were uncharacterized. Utilizing an unsupervised pattern identification technique, we established molecular staging of male gonadal development. Rather than providing a static descriptive analysis, we developed algorithms to cluster the SAGE data and assign SAGE tags to a corresponding chromosomal position; these data are displayed in chromosome graphic format. A prominent increase in global genomic activity from E10.5 to E17.5 was observed. Important chromosomal regions related to the developmental processes were identified and validated based on established mouse models with developmental disorders. These regions may represent markers for early diagnosis for disorders of male gonad development as well as potential treatment targets. PMID:19376482

  1. Phylogenetic analysis of rubella viruses in Vietnam during 2009-2010.

    PubMed

    Tran, Dinh Nguyen; Pham, Ngan Thi Kim; Tran, Thi Thuy Trinh; Khamrin, Pattara; Thongprachum, Aksara; Komase, Katsuhiro; Hayakawa, Satoshi; Mizuguchi, Masashi; Ushijima, Hiroshi

    2012-04-01

    Rubella virus (RV) usually causes a mild disease. However, infection during the first trimester of pregnancy often leads to severe birth defects known as congenital rubella syndrome (CRS). Although wild-type RVs exist and circulate worldwide, their genotypes remain unknown in many countries. The aim of this study was to identify the molecular characteristics of RVs found in Vietnam during the years 2009-2010 and to provide the first data concerning RV genotypes in this country. Throat swab samples were collected between 2009 and 2010 from four CRS cases and nine rubella infection cases visiting one Children's Hospital and one outpatient clinic in Ho Chi Minh City. The 739-nucleotide coding region of the RV E1 gene recommended by the World Health Organization was amplified by reverse transcriptase PCR, and the resulting DNA fragments were then sequenced. Sequences were assigned to genotypes by phylogenetic analysis with RV reference strains. RV RNA was detected in 11 clinical specimens. Phylogenetic analysis of the sequences showed that all 11 strains belonged to 2B genotype. Several variations in amino acids were found, among which five changes were involved in the B and T cell epitopes. These data indicate that viruses of genotype 2B were circulating in Vietnam. The increasing information about RV genotype in Vietnam should aid in the control of rubella infection and CRS in this country. Copyright © 2012 Wiley Periodicals, Inc.

  2. Sequence and expression variation in SUPPRESSOR of OVEREXPRESSION of CONSTANS 1 (SOC1): homeolog evolution in Indian Brassicas.

    PubMed

    Sri, Tanu; Mayee, Pratiksha; Singh, Anandita

    2015-09-01

    Whole genome sequence analyses allow unravelling of the evolutionary consequences of the meso-triplication event in Brassicaceae (∼14-20 million years ago (MYA)), such as differential gene fractionation and diversification in homeologous sub-genomes. This study presents a simple gene-centric approach involving microsynteny and natural genetic variation analysis for understanding SUPPRESSOR of OVEREXPRESSION of CONSTANS 1 (SOC1) homeolog evolution in Brassica. Analysis of microsynteny in Brassica rapa homeologous regions containing SOC1 revealed differential gene fractionation correlating to the reported fractionation status of the sub-genomes of origin, viz. least fractionated (LF), moderately fractionated 1 (MF1) and most fractionated (MF2), respectively. Screening 18 cultivars of 6 Brassica species led to the identification of 8 genomic and 27 transcript variants of SOC1, including splice-forms. Co-occurrence of both interrupted and intronless SOC1 genes was detected in a few Brassica species. In silico analysis characterised Brassica SOC1 as a MADS intervening, K-box, C-terminal (MIKC(C)) transcription factor, with highly conserved MADS and I domains relative to the K-box and C-terminal domains. Phylogenetic analyses and multiple sequence alignments depicting shared patterns of silent/non-silent mutations assigned Brassica SOC1 homologs into groups based on shared diploid base genome. In addition, a sub-genome structure in uncharacterised Brassica genomes was inferred. Expression analysis of putative MF2 and LF (Brassica diploid base genome A (AA)) sub-genome-specific SOC1 homeologs of Brassica juncea revealed near-identical expression patterns. However, the MF2-specific homeolog exhibited significantly higher expression, implying regulatory diversification. In conclusion, evidence for polyploidy-induced sequence and regulatory evolution in Brassica SOC1 is presented, wherein differential homeolog expression is implied in functional diversification.

  3. AGAPE (Automated Genome Analysis PipelinE) for Pan-Genome Analysis of Saccharomyces cerevisiae

    PubMed Central

    Song, Giltae; Dickins, Benjamin J. A.; Demeter, Janos; Engel, Stacia; Dunn, Barbara; Cherry, J. Michael

    2015-01-01

    The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes due to the complexity of the genotype-phenotype map. One approach to this is the intensive study of model systems for which diverse sources of information can be accumulated and integrated. Saccharomyces cerevisiae is an extensively studied model organism, with well-known protein functions and thoroughly curated phenotype data. To develop and expand the available resources linking genomic variation with function in yeast, we aim to model the pan-genome of S. cerevisiae. To initiate the yeast pan-genome, we newly sequenced or re-sequenced the genomes of 25 strains that are commonly used in the yeast research community using advanced sequencing technology at high quality. We also developed a pipeline for automated pan-genome analysis, which integrates the steps of assembly, annotation, and variation calling. To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. We classified these according to their presence or absence across strains and characterized each group of genes with known functional and phenotypic features. The functional roles of novel genes not found in the reference genome and associated with strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. As more S. cerevisiae strain genomes are released, our analysis can be used to collate genome data and relate it to lineage-specific patterns of genome evolution. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae, and will be available to the yeast genetics and molecular biology community. PMID:25781462

  4. Nonencapsulated or nontypeable Haemophilus influenzae are more likely than their encapsulated or serotypeable counterparts to have mutations in their fucose operon.

    PubMed

    Shuel, Michelle L; Karlowsky, Kathleen E; Law, Dennis K S; Tsang, Raymond S W

    2011-12-01

    Population biology of Haemophilus influenzae can be studied by multilocus sequence typing (MLST), and isolates are assigned sequence types (STs) based on nucleotide sequence variations in seven housekeeping genes, including fucK. However, the ST cannot be assigned if one of the housekeeping genes is absent or cannot be detected by the current protocol. Occasionally, strains of H. influenzae have been reported to lack the fucK gene. In this study, we examined the prevalence of this mutation among our collection of H. influenzae isolates. Of the 704 isolates studied, including 282 encapsulated and 422 nonencapsulated isolates, nine were not typeable by MLST owing to failure to detect the fucK gene. All nine fucK-negative isolates were nonencapsulated and belonged to various biotypes. DNA sequencing of the fucose operon region confirmed complete deletion of genes in the operon in seven of the nine isolates, while in the remaining two isolates, some of the genes were found intact or in parts. The significance of these findings is discussed.
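
    The reason a missing fucK gene blocks typing becomes clear once sequence-type assignment is written as a lookup on a complete seven-locus allelic profile. In the sketch below, only fucK is named in the abstract; the other six locus names are included as an assumption about the H. influenzae scheme, and the profile table, allele numbers and ST numbers are invented for illustration.

```python
# Locus order for the profile; only fucK is named in the abstract, the rest are assumed.
LOCI = ("adk", "atpG", "frdB", "fucK", "mdh", "pgi", "recA")

# Hypothetical allelic profiles mapped to sequence types (illustrative numbers only).
PROFILE_TO_ST = {
    (1, 1, 1, 1, 1, 1, 5): 1,
    (10, 14, 4, 5, 4, 7, 8): 2,
}


def assign_st(alleles, profile_to_st=PROFILE_TO_ST):
    """Return the ST for a complete profile, "novel" for an unlisted complete
    profile, or None when any locus is missing (isolate not typeable)."""
    profile = tuple(alleles.get(locus) for locus in LOCI)
    if any(allele is None for allele in profile):
        return None
    return profile_to_st.get(profile, "novel")


if __name__ == "__main__":
    typed = {"adk": 1, "atpG": 1, "frdB": 1, "fucK": 1, "mdh": 1, "pgi": 1, "recA": 5}
    fuck_negative = {k: v for k, v in typed.items() if k != "fucK"}
    print(assign_st(typed), assign_st(fuck_negative))
```

    Because the lookup key is the full tuple, a single undetected locus leaves the isolate without an ST even when the other six alleles are known, which is the situation reported for the nine fucK-negative isolates.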

  5. Molecular genotyping of Trypanosoma cruzi for lineage assignment and population genetics.

    PubMed

    Messenger, Louisa A; Yeo, Matthew; Lewis, Michael D; Llewellyn, Martin S; Miles, Michael A

    2015-01-01

    Trypanosoma cruzi, the etiological agent of Chagas disease, remains a major public health problem in Latin America. Infection with T. cruzi is lifelong and can lead to a spectrum of pathological sequelae ranging from subclinical to lethal cardiac and/or gastrointestinal complications. Isolates of T. cruzi can be assigned to six genetic lineages or discrete typing units (DTUs), which are broadly associated with disparate ecologies, transmission cycles, and geographical distributions. This extensive genetic diversity is also believed to contribute to the clinical variation observed among chagasic patients. Unravelling the population structure of T. cruzi is fundamental to understanding Chagas disease epidemiology, developing control strategies, and resolving the relationship between parasite genotype and clinical prognosis. To date, no single, widely validated, genetic target allows unequivocal resolution to DTU-level. In this chapter we present standardized methods for strain DTU assignment using PCR-restriction fragment length polymorphism analysis (PCR-RFLP) and nuclear multilocus sequence typing (MLST). PCR-RFLPs have the advantages of simplicity and reproducibility, requiring limited expertise and few laboratory consumables. MLST data are more laborious to generate but more informative; DNA sequences are readily transferable between research groups and amenable to recombination detection and intra-lineage analyses. We also recommend a mitochondrial (maxicircle) MLST scheme and a panel of 28 microsatellite loci for higher resolution population genetics studies. Due to the scarcity of T. cruzi in blood and tissue, all of these genotyping techniques have limited sensitivity when applied directly to clinical or biological specimens, particularly when targets are single (MLST) or low copy number (PCR-RFLPs). We therefore describe essential protocols to isolate parasites, derive biological clones, and extract T. cruzi genomic DNA from field and clinical samples.

  6. Accurate continuous geographic assignment from low- to high-density SNP data.

    PubMed

    Guillot, Gilles; Jónsson, Hákon; Hinge, Antoine; Manchih, Nabil; Orlando, Ludovic

    2016-04-01

    Large-scale genotype datasets can help track the dispersal patterns of epidemiological outbreaks and predict the geographic origins of individuals. Such genetically-based geographic assignments also show a range of possible applications in forensics for profiling both victims and criminals, and in wildlife management, where poaching hotspot areas can be located. They, however, require fast and accurate statistical methods to handle the growing amount of genetic information made available from genotype arrays and next-generation sequencing technologies. We introduce a novel statistical method for geopositioning individuals of unknown origin from genotypes. Our method is based on a geostatistical model trained with a dataset of georeferenced genotypes. Statistical inference under this model can be implemented within the theoretical framework of Integrated Nested Laplace Approximation, which represents one of the major recent breakthroughs in statistics, as it does not require Monte Carlo simulations. We compare the performance of our method and an alternative method for geospatial inference, SPA, in a simulation framework. We highlight the accuracy and limits of continuous spatial assignment methods at various scales by analyzing genotype datasets from a diversity of species, including Florida Scrub-jay birds (Aphelocoma coerulescens), Arabidopsis thaliana and humans, representing 41-197,146 SNPs. Our method appears to be best suited for the analysis of medium-sized datasets (a few tens of thousands of loci), such as reduced-representation sequencing data that become increasingly available in ecology. Availability: http://www2.imm.dtu.dk/∼gigu/Spasiba/ Contact: gilles.b.guillot@gmail.com Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved.

  7. Transcriptome-Derived Tetranucleotide Microsatellites and Their Associated Genes from the Giant Panda (Ailuropoda melanoleuca).

    PubMed

    Song, Xuhao; Shen, Fujun; Huang, Jie; Huang, Yan; Du, Lianming; Wang, Chengdong; Fan, Zhenxin; Hou, Rong; Yue, Bisong; Zhang, Xiuyue

    2016-09-01

    Recently, an increasing number of microsatellites or simple sequence repeats (SSRs) have been found and characterized from transcriptomes. Such SSRs can be employed as putative functional markers to easily tag corresponding genes, which play an important role in biomedical studies and genetic analysis. However, transcriptome-derived SSRs for the giant panda (Ailuropoda melanoleuca) are not yet available. In this work, we identified and characterized 20 tetranucleotide microsatellite loci from a transcript database generated from the blood of giant panda. Furthermore, we assigned their predicted transcriptome locations: 16 loci were assigned to untranslated regions (UTRs) and 4 loci were assigned to coding regions (CDSs). Gene identities were determined for 14 transcripts containing corresponding microsatellites, which provides useful information for studying the potential contribution of SSRs to gene regulation in giant panda. The polymorphic information content (PIC) values ranged from 0.293 to 0.789 with an average of 0.603 for the 16 UTR-derived SSRs. Interestingly, the 4 CDS-derived microsatellites developed in our study were also polymorphic, and the instability of these 4 CDS-derived SSRs was further validated by re-genotyping and sequencing. The genes containing these 4 CDS-derived SSRs were embedded with various types of repeat motifs. The interaction of all the length-changing SSRs might provide a way to counteract coding-region frameshifts caused by microsatellite instability. We hope these new gene-associated biomarkers will pave the way for genetic and biomedical studies of the giant panda in the future. In sum, this set of transcriptome-derived markers complements the genetic resources available for giant panda. © The American Genetic Association. 2016. All rights reserved.
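
    The PIC values quoted above are conventionally computed from allele frequencies as PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2. The sketch below assumes that standard definition; the abstract does not state which formula or software the authors used, and the function name is illustrative.

```python
from itertools import combinations
from typing import Sequence


def pic(freqs: Sequence[float]) -> float:
    """Polymorphic information content for one locus from its allele frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Expected homozygosity term: sum of squared allele frequencies.
    homozygosity = sum(p * p for p in freqs)
    # Correction term over all unordered pairs of distinct alleles.
    pair_term = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - pair_term
```

    For two equally frequent alleles this gives 1 - 0.5 - 0.125 = 0.375, and a monomorphic locus gives 0.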

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    Standardization of grant and contract awardee names has been an area of concern since the development of the Department's Procurement and Assistance Data System (PADS). A joint effort was begun in 1983 by the Office of Scientific and Technical Information (OSTI) and the Office of Procurement and Assistance Management/Information Systems and Analysis Division to develop a means for providing uniformity of awardee names. As a result of this effort, a method of assigning vendor identification codes to each unique awardee name, division, city, and state combination was developed and is maintained by OSTI. Changes to vendor identification codes or awardee names contained in PADS can be made only by OSTI. Awardee names in the Directory indicate that the awardee has had a prime contract (excluding purchase orders of $10,000 or less) with, or a financial assistance award from, the Department. Award status (active, inactive, or retired) is not shown. The Directory is in alphabetic sequence based on awardee name and reflects the OSTI-assigned vendor identification code to the right of the name. A vendor identification code is assigned to each unique awardee name, division, city, and state (for place of performance). The same vendor identification code is used for awards throughout the Department.

  9. Reduced dimensionality (3,2)D NMR experiments and their automated analysis: implications to high-throughput structural studies on proteins.

    PubMed

    Reddy, Jithender G; Kumar, Dinesh; Hosur, Ramakrishna V

    2015-02-01

    Protein NMR spectroscopy has expanded dramatically over the last decade into a powerful tool for the study of protein structure, dynamics, and interactions. The primary requirement for all such investigations is sequence-specific resonance assignment. The demand now is to obtain this information as rapidly as possible and in all types of protein systems, stable/unstable, soluble/insoluble, small/big, structured/unstructured, and so on. In this context, we introduce here two reduced dimensionality experiments – (3,2)D-hNCOcanH and (3,2)D-hNcoCAnH – which enhance the previously described 2D NMR-based assignment methods quite significantly. Both experiments can be recorded in just about 2-3 h each and hence would be of immense value for high-throughput structural proteomics and drug discovery research. The applicability of the method has been demonstrated using alpha-helical bovine apo calbindin-D9k P43M mutant (75 aa) protein. Automated assignment of these data using AUTOBA has been presented, which enhances the utility of these experiments. The backbone resonance assignments so derived are utilized to estimate secondary structures and the backbone fold using Web-based algorithms. Taken together, we believe that the method and the protocol proposed here can be used for routine high-throughput structural studies of proteins. Copyright © 2014 John Wiley & Sons, Ltd.

  10. CRISPR adaptive immune systems of Archaea

    PubMed Central

    Vestergaard, Gisle; Garrett, Roger A; Shah, Shiraz A

    2014-01-01

    CRISPR adaptive immune systems were analyzed for all available completed genomes of archaea, which included representatives of each of the main archaeal phyla. Initially, all proteins encoded within, and proximal to, CRISPR-cas loci were clustered and analyzed using a profile–profile approach. Then cas genes were assigned to gene cassettes and to functional modules for adaptation and interference. CRISPR systems were then classified primarily on the basis of their concatenated Cas protein sequences and gene synteny of the interference modules. With few exceptions, they could be assigned to the universal Type I or Type III systems. For Type I, subtypes I-A, I-B, and I-D dominate but the data support the division of subtype I-B into two subtypes, designated I-B and I-G. About 70% of the Type III systems fall into the universal subtypes III-A and III-B but the remainder, some of which are phyla-specific, diverge significantly in Cas protein sequences, and/or gene synteny, and they are classified separately. Furthermore, a few CRISPR systems that could not be assigned to Type I or Type III are categorized as variant systems. Criteria are presented for assigning newly sequenced archaeal CRISPR systems to the different subtypes. Several accessory proteins were identified that show a specific gene linkage, especially to Type III interference modules, and these may be cofunctional with the CRISPR systems. Evidence is presented for extensive exchange having occurred between adaptation and interference modules of different archaeal CRISPR systems, indicating the wide compatibility of the functionally diverse interference complexes with the relatively conserved adaptation modules. PMID:24531374

  11. Genotyping-by-Sequencing Analysis for Determining Population Structure of Finger Millet Germplasm of Diverse Origins.

    PubMed

    Kumar, Anil; Sharma, Divya; Tiwari, Apoorv; Jaiswal, J P; Singh, N K; Sood, Salej

    2016-07-01

    Finger millet [Eleusine coracana (L.) Gaertn.] is grown mainly by subsistence farmers in arid and semiarid regions of the world. To broaden its genetic base and to boost its production, it is of paramount importance to characterize and genotype the diverse gene pool of this important food and nutritional security crop. However, because the genome sequence of finger millet was not available, progress could not be made in elucidating the molecular basis of the unique qualities of the crop. In the present investigation, attempts have been made to characterize the genetically diverse collection of 113 finger millet accessions through whole-genome genotyping-by-sequencing (GBS), which resulted in a genome-wide set of 23,000 single-nucleotide polymorphisms (SNPs) segregating across the entire collection and several thousand SNPs segregating within every accession. A model-based population structure analysis reveals the presence of three subpopulations among the finger millet accessions, which are in parallel with the results of phylogenetic analysis. The observed population structure is consistent with the hypothesis that finger millet was domesticated first in Africa, and from there it was introduced to India some 3000 yr ago. A total of 1128 gene ontology (GO) terms were assigned to SNP-carrying genes for three main categories: biological process, cellular component, and molecular function. Facilitated access to high-throughput genotyping and sequencing technologies is likely to improve the breeding process in developing countries, and as such, these data will be very useful to breeders who are working for the genetic improvement of finger millet. Copyright © 2016 Crop Science Society of America.

  12. Cloning and Expression Analysis of Genes Encoding Lytic Endopeptidases L1 and L5 from Lysobacter sp. Strain XL1

    PubMed Central

    Lapteva, Y. S.; Zolova, O. E.; Shlyapnikov, M. G.; Tsfasman, I. M.; Muranova, T. A.; Stepnaya, O. A.; Kulaev, I. S.

    2012-01-01

    Lytic enzymes are the group of hydrolases that break down structural polymers of the cell walls of various microorganisms. In this work, we determined the nucleotide sequences of the Lysobacter sp. strain XL1 alpA and alpB genes, which code for, respectively, secreted lytic endopeptidases L1 (AlpA) and L5 (AlpB). In silico analysis of their amino acid sequences showed these endopeptidases to be homologous proteins synthesized as precursors similar in structural organization: the mature enzyme sequence is preceded by an N-terminal signal peptide and a pro region. On the basis of phylogenetic analysis, endopeptidases AlpA and AlpB were assigned to the S1E family [clan PA(S)] of serine peptidases. Expression of the alpA and alpB open reading frames (ORFs) in Escherichia coli confirmed that they code for functionally active lytic enzymes. Each ORF was predicted to have the Shine-Dalgarno sequence located at a canonical distance from the start codon and a potential Rho-independent transcription terminator immediately after the stop codon. The alpA and alpB mRNAs were experimentally found to be monocistronic; transcription start points were determined for both mRNAs. The synthesis of the alpA and alpB mRNAs was shown to occur predominantly in the late logarithmic growth phase. The amount of alpA mRNA in cells of Lysobacter sp. strain XL1 was much higher, which correlates with greater production of endopeptidase L1 than of L5. PMID:22865082

  13. Cloning and expression analysis of genes encoding lytic endopeptidases L1 and L5 from Lysobacter sp. strain XL1.

    PubMed

    Lapteva, Y S; Zolova, O E; Shlyapnikov, M G; Tsfasman, I M; Muranova, T A; Stepnaya, O A; Kulaev, I S; Granovsky, I E

    2012-10-01

    Lytic enzymes are the group of hydrolases that break down structural polymers of the cell walls of various microorganisms. In this work, we determined the nucleotide sequences of the Lysobacter sp. strain XL1 alpA and alpB genes, which code for, respectively, secreted lytic endopeptidases L1 (AlpA) and L5 (AlpB). In silico analysis of their amino acid sequences showed these endopeptidases to be homologous proteins synthesized as precursors similar in structural organization: the mature enzyme sequence is preceded by an N-terminal signal peptide and a pro region. On the basis of phylogenetic analysis, endopeptidases AlpA and AlpB were assigned to the S1E family [clan PA(S)] of serine peptidases. Expression of the alpA and alpB open reading frames (ORFs) in Escherichia coli confirmed that they code for functionally active lytic enzymes. Each ORF was predicted to have the Shine-Dalgarno sequence located at a canonical distance from the start codon and a potential Rho-independent transcription terminator immediately after the stop codon. The alpA and alpB mRNAs were experimentally found to be monocistronic; transcription start points were determined for both mRNAs. The synthesis of the alpA and alpB mRNAs was shown to occur predominantly in the late logarithmic growth phase. The amount of alpA mRNA in cells of Lysobacter sp. strain XL1 was much higher, which correlates with greater production of endopeptidase L1 than of L5.

  14. DAMe: a toolkit for the initial processing of datasets with PCR replicates of double-tagged amplicons for DNA metabarcoding analyses.

    PubMed

    Zepeda-Mendoza, Marie Lisandra; Bohmann, Kristine; Carmona Baez, Aldo; Gilbert, M Thomas P

    2016-05-03

    DNA metabarcoding is an approach for identifying multiple taxa in an environmental sample using specific genetic loci and taxa-specific primers. When combined with high-throughput sequencing it enables the taxonomic characterization of large numbers of samples in a relatively time- and cost-efficient manner. One recent laboratory development is the addition of 5'-nucleotide tags to both primers producing double-tagged amplicons and the use of multiple PCR replicates to filter erroneous sequences. However, there is currently no available toolkit for the straightforward analysis of datasets produced in this way. We present DAMe, a toolkit for the processing of datasets generated by double-tagged amplicons from multiple PCR replicates derived from an unlimited number of samples. Specifically, DAMe can be used to (i) sort amplicons by tag combination, (ii) evaluate PCR replicates dissimilarity, and (iii) filter sequences derived from sequencing/PCR errors, chimeras, and contamination. This is attained by calculating the following parameters: (i) sequence content similarity between the PCR replicates from each sample, (ii) reproducibility of each unique sequence across the PCR replicates, and (iii) copy number of the unique sequences in each PCR replicate. We showcase the insights that can be obtained using DAMe prior to taxonomic assignment, by applying it to two real datasets that vary in their complexity regarding number of samples, sequencing libraries, PCR replicates, and used tag combinations. Finally, we use a third mock dataset to demonstrate the impact and importance of filtering the sequences with DAMe. DAMe allows the user-friendly manipulation of amplicons derived from multiple samples with PCR replicates built in a single or multiple sequencing libraries. 
    It allows the user to: (i) collapse amplicons into unique sequences and sort them by tag combination while retaining the sample identifier and copy number information, (ii) identify sequences carrying unused tag combinations, (iii) evaluate the comparability of PCR replicates of the same sample, and (iv) filter tagged amplicons from a number of PCR replicates using parameters of minimum length, copy number, and reproducibility across the PCR replicates. This enables an efficient analysis of complex datasets, and ultimately increases the ease of handling datasets from large-scale studies.
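
    The filtering criteria named above (minimum length, copy number per PCR replicate, and reproducibility across replicates) can be illustrated with a short sketch. This is not DAMe's code or interface; the function name, parameter names, and data layout are assumptions made for illustration only.

```python
from typing import Dict, List


def filter_sequences(
    replicates: List[Dict[str, int]],
    min_copies: int,
    min_replicates: int,
    min_length: int,
) -> Dict[str, int]:
    """Keep sequences seen with enough copies in enough PCR replicates.

    Each replicate is a mapping from unique sequence to its copy number.
    """
    kept: Dict[str, int] = {}
    all_sequences = set().union(*replicates)
    for seq in all_sequences:
        if len(seq) < min_length:
            continue
        # Copy numbers in the replicates where the sequence meets the threshold.
        passing = [rep[seq] for rep in replicates if rep.get(seq, 0) >= min_copies]
        if len(passing) >= min_replicates:
            kept[seq] = sum(passing)
    return kept
```

    With three replicates and thresholds of two copies in at least two replicates, a sequence that reaches two copies in only one replicate is discarded as a likely error or contaminant.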

  15. Metabolic Pathway Assignment of Plant Genes based on Phylogenetic Profiling–A Feasibility Study

    PubMed Central

    Weißenborn, Sandra; Walther, Dirk

    2017-01-01

    Despite many developed experimental and computational approaches, functional gene annotation remains challenging. With the rapidly growing number of sequenced genomes, the concept of phylogenetic profiling, which predicts functional links between genes that share a common co-occurrence pattern across different genomes, has gained renewed attention as it promises to annotate gene functions based on presence/absence calls alone. We applied phylogenetic profiling to the problem of metabolic pathway assignments of plant genes with a particular focus on secondary metabolism pathways. We determined phylogenetic profiles for 40,960 metabolic pathway enzyme genes with assigned EC numbers from 24 plant species based on sequence and pathway annotation data from KEGG and Ensembl Plants. For gene sequence family assignments, needed to determine the presence or absence of particular gene functions in the given plant species, we included data of all 39 species available at the Ensembl Plants database and established gene families based on pairwise sequence identities and annotation information. Aside from performing profiling comparisons, we used machine learning approaches to predict pathway associations from phylogenetic profiles alone. Selected metabolic pathways were indeed found to be composed of gene families of greater than expected phylogenetic profile similarity. This was particularly evident for primary metabolism pathways, whereas for secondary pathways, both the available annotation in different species as well as the abstraction of functional association via distinct pathways proved limiting. While phylogenetic profile similarity was generally not found to correlate with gene co-expression, direct physical interactions of proteins were reflected by a significantly increased profile similarity suggesting an application of phylogenetic profiling methods as a filtering step in the identification of protein-protein interactions. 
    This feasibility study highlights the potential and challenges associated with phylogenetic profiling methods for the detection of functional relationships between genes as well as the need to enlarge the set of plant genes with proven secondary metabolism involvement as well as the limitations of distinct pathways as abstractions of relationships between genes. PMID:29163570
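
    As a minimal illustration of phylogenetic profiling, the sketch below compares two gene families by the presence/absence pattern of each family across a fixed list of species. The Jaccard index is used here as one plausible similarity measure; the abstract does not say which measure the authors applied, so this choice is an assumption.

```python
from typing import Sequence


def profile_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard similarity of two presence/absence profiles (1 = present, 0 = absent)."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same species in the same order")
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two families absent everywhere carry no co-occurrence signal.
    return shared / either if either else 0.0
```

    Families that co-occur in the same species score close to 1, which is the signal used to propose a functional link such as membership in the same pathway.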

  16. A novel method based on new adaptive LVQ neural network for predicting protein-protein interactions from protein sequences.

    PubMed

    Yousef, Abdulaziz; Moghadam Charkari, Nasrollah

    2013-11-07

    Protein-protein interactions (PPIs) are among the most important data for understanding cellular processes. Many interesting methods have been proposed to predict PPIs. However, methods that use only protein sequences as prior knowledge are more universal. In this paper, a sequence-based, fast, and adaptive PPI prediction method is introduced to assign a pair of proteins to an interaction class (yes, no). First, to improve the representation of the sequences, twelve physicochemical properties of amino acids are used by different representation methods to transform the sequences of protein pairs into different feature vectors. Then, to speed up the learning process and reduce the effect of noisy PPI data, principal component analysis (PCA) is carried out as a feature extraction algorithm. Finally, a new and adaptive Learning Vector Quantization (LVQ) predictor is designed to deal with different models of datasets that are classified into balanced and imbalanced datasets. Accuracies of 93.88%, 90.03%, and 89.72% were obtained on S. cerevisiae, H. pylori, and independent datasets, respectively. The results of various experiments indicate the efficiency and validity of the method. © 2013 Published by Elsevier Ltd.
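
    For orientation, the sketch below shows one update step of the classic LVQ1 rule, in which the prototype nearest to a training vector moves toward it when their classes agree and away from it otherwise. It is not the adaptive variant proposed in the paper, whose details the abstract does not give; the function name and learning-rate default are illustrative.

```python
from typing import List


def lvq1_step(
    prototypes: List[List[float]],
    labels: List[int],
    x: List[float],
    y: int,
    lr: float = 0.1,
) -> int:
    """Apply one LVQ1 update in place and return the index of the winning prototype."""
    # Squared Euclidean distance from x to every prototype.
    dists = [sum((p_i - x_i) ** 2 for p_i, x_i in zip(p, x)) for p in prototypes]
    winner = dists.index(min(dists))
    # Attract the winner if its class matches the sample, repel it otherwise.
    sign = 1.0 if labels[winner] == y else -1.0
    prototypes[winner] = [
        p_i + sign * lr * (x_i - p_i) for p_i, x_i in zip(prototypes[winner], x)
    ]
    return winner
```

    In a PPI setting, x would be the PCA-reduced feature vector of a protein pair and y its interaction class (yes or no).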

  17. Binning of shallowly sampled metagenomic sequence fragments reveals that low abundance bacteria play important roles in sulfur cycling and degradation of complex organic polymers in an acid mine drainage community

    NASA Astrophysics Data System (ADS)

    Dick, G. J.; Andersson, A.; Banfield, J. F.

    2007-12-01

    Our understanding of environmental microbiology has been greatly enhanced by community genome sequencing of DNA recovered directly from the environment. Community genomics provides insights into the diversity, community structure, metabolic function, and evolution of natural populations of uncultivated microbes, thereby revealing dynamics of how microorganisms interact with each other and their environment. Recent studies have demonstrated the potential for reconstructing near-complete genomes from natural environments while highlighting the challenges of analyzing community genomic sequence, especially from diverse environments. A major challenge of shotgun community genome sequencing is identification of DNA fragments from minor community members for which only low coverage of genomic sequence is present. We analyzed community genome sequence retrieved from biofilms in an acid mine drainage (AMD) system in the Richmond Mine at Iron Mountain, CA, with an emphasis on identification and assembly of DNA fragments from low-abundance community members. The Richmond mine hosts an extensive, relatively low diversity subterranean chemolithoautotrophic community that is sustained entirely by oxidative dissolution of pyrite. The activity of these microorganisms greatly accelerates the generation of AMD. Previous and ongoing work in our laboratory has focused on reconstructing genomes of dominant community members, including several bacteria and archaea. We binned contigs from several samples (including one new sample and two that had been previously analyzed) by tetranucleotide frequency with clustering by Self-Organizing Maps (SOM). The binning, evaluated by comparison with information from the manually curated assembly of the dominant organisms, was found to be very effective: fragments were correctly assigned with 95% accuracy. Improperly assigned fragments often contained sequences that are either evolutionarily constrained (e.g. 16S rRNA genes) or mobile elements that are not expected to reflect the tetranucleotide frequency signature of the host genome. Four unknown tetranucleotide frequency clusters with significant sequence (6 Mb total) were noted and analyzed further. Based on phylogenetic markers and BLAST results, these clusters represent low-abundance bacteria including Actinobacteria, Firmicutes, and Proteobacteria. Functional analysis of these clusters revealed that the low-abundance bacteria harbor genes that could potentially encode important ecosystem functions such as sulfur utilization (e.g. polysulfide reductase) and polymer degradation (e.g. chitinase and glycoside hydrolase). We conclude that ESOM clustering of tetranucleotide frequency patterns is an effective method for rapidly binning shotgun community genomic sequences and a valuable tool for analyzing minor community members, which despite their low abundance may play crucial ecological roles.
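
    The feature underlying this kind of binning is a tetranucleotide frequency vector per contig, which a clustering method such as a self-organizing map then groups. The sketch below computes only that vector. It is a simplified illustration, not the authors' pipeline: strand merging of reverse complements and any normalization against expected frequencies are omitted, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product
from typing import List

# All 256 tetranucleotides in a fixed order, so every contig maps to the same vector layout.
KMERS: List[str] = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq: str) -> List[float]:
    """Return the relative frequency of each 4-mer in seq, ordered as in KMERS."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Windows containing ambiguous bases (e.g. N) are ignored.
    valid = {kmer: counts[kmer] for kmer in KMERS}
    total = sum(valid.values())
    return [valid[kmer] / total if total else 0.0 for kmer in KMERS]
```

    Contigs from the same genome tend to yield similar vectors, which is why short regions with atypical composition, such as 16S rRNA genes or mobile elements, are the ones most often misassigned.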

  18. In vitro optimization of truncated stem-loop II variants of the hammerhead ribozyme for cleavage in low concentrations of magnesium under non-turnover conditions.

    PubMed Central

    Zillmann, M; Limauro, S E; Goodchild, J

    1997-01-01

    By truncating helix II to two base pairs in a hammerhead ribozyme having long flanking sequences (greater than 30 bases), the rate of cleavage in 1 mM magnesium can be increased roughly 100-fold. Replacing most of the nucleotides in a typical stem-loop II with 1-4 randomized nucleotides gave an RNA library that, even before selection, was more active in 1 mM magnesium than the parent ribozyme, but considerably less active than the truncated stem-loop II ribozyme. A novel, multiround selection for intermolecular cleavage was exploited to optimize this library for cleavage in low concentrations of magnesium. After three rounds of selection at sequentially lower concentrations of magnesium, the library cleaved substrate RNA 20-fold faster than the initial pool and was cloned. This pool was heavily enriched for one particular sequence (5'-CGUG-3') that represented 16 of 52 isolates (the next most common sequence was represented only six times). This sequence also represented the most active sequence, exceeding the activity of the short helix II variant under the conditions of the selection, thereby demonstrating the effectiveness of the selection technique. Analysis of the cleavage rates of RNAs made from eight isolates having different four-base insert sequences allowed assignment of highly preferred bases at each position in the insert. Analysis of pool clones having inserts of differing lengths showed that, in general, activity decreased as the length of the insert decreased from 4 to 1. This supports the suggested role of stem-loop II in stabilizing the non-Watson-Crick interactions between the conserved bases of the catalytic core. PMID:9214657

  19. Large-scale transcriptome analysis in chickpea (Cicer arietinum L.), an orphan legume crop of the semi-arid tropics of Asia and Africa.

    PubMed

    Hiremath, Pavana J; Farmer, Andrew; Cannon, Steven B; Woodward, Jimmy; Kudapa, Himabindu; Tuteja, Reetu; Kumar, Ashish; Bhanuprakash, Amindala; Mulaosmanovic, Benjamin; Gujaria, Neha; Krishnamurthy, Laxmanan; Gaur, Pooran M; Kavikishor, Polavarapu B; Shah, Trushar; Srinivasan, Ramamurthy; Lohse, Marc; Xiao, Yongli; Town, Christopher D; Cook, Douglas R; May, Gregory D; Varshney, Rajeev K

    2011-10-01

    Chickpea (Cicer arietinum L.) is an important legume crop in the semi-arid regions of Asia and Africa. Gains in crop productivity have been low however, particularly because of biotic and abiotic stresses. To help enhance crop productivity using molecular breeding techniques, next generation sequencing technologies such as Roche/454 and Illumina/Solexa were used to determine the sequence of most gene transcripts and to identify drought-responsive genes and gene-based molecular markers. A total of 103,215 tentative unique sequences (TUSs) have been produced from 435,018 Roche/454 reads and 21,491 Sanger expressed sequence tags (ESTs). Putative functions were determined for 49,437 (47.8%) of the TUSs, and gene ontology assignments were determined for 20,634 (41.7%) of the TUSs. Comparison of the chickpea TUSs with the Medicago truncatula genome assembly (Mt 3.5.1 build) resulted in 42,141 aligned TUSs with putative gene structures (including 39,281 predicted intron/splice junctions). Alignment of ∼37 million Illumina/Solexa tags generated from drought-challenged root tissues of two chickpea genotypes against the TUSs identified 44,639 differentially expressed TUSs. The TUSs were also used to identify a diverse set of markers, including 728 simple sequence repeats (SSRs), 495 single nucleotide polymorphisms (SNPs), 387 conserved orthologous sequence (COS) markers, and 2088 intron-spanning region (ISR) markers. This resource will be useful for basic and applied research for genome analysis and crop improvement in chickpea. Plant Biotechnology Journal © 2011 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd. No claim to original US government works.

  20. Comparative Analysis of Expressed Genes from Cacao Meristems Infected by Moniliophthora perniciosa

    PubMed Central

    Gesteira, Abelmon S.; Micheli, Fabienne; Carels, Nicolas; Da Silva, Aline C.; Gramacho, Karina P.; Schuster, Ivan; Macêdo, Joci N.; Pereira, Gonçalo A. G.; Cascardo, Júlio C. M.

    2007-01-01

    Background and Aims: Witches' broom disease is caused by the hemibiotrophic basidiomycete Moniliophthora perniciosa, and is one of the most important diseases of cacao in the western hemisphere. Because very little is known about the global process of such disease development, expressed sequence tags (ESTs) were used to identify genes expressed during the Theobroma cacao–Moniliophthora perniciosa interaction. Methods: Two cDNA libraries corresponding to the resistant (RT) and susceptible (SP) cacao–M. perniciosa interactions were constructed from total RNA, using the DB SMART Creator cDNA library kit (Clontech). Clones were randomly selected, sequenced from the 5′ end and analysed using bioinformatics tools including in silico analysis of the differential gene expression. Key Results: A total of 6884 ESTs were generated from the RT and SP cDNA libraries. These ESTs were composed of 2585 singlets and 341 contigs for a total of 2926 non-redundant sequences. The redundancy of the libraries was low and their specificity high when compared with the few other cacao libraries already published. Sequence analysis allowed the assignment of a putative functional category for 54 % of sequences, whereas approx. 22 % of sequences corresponded to unknown function and approx. 24 % of sequences did not show any significant similarity with other proteins present in the database. Despite the similar overall distribution of the sequences in functional categories between the two libraries, qualitative differences were observed. Genes involved during the defence response to pathogen infection or in programmed cell death were identified, such as pathogenesis related-proteins, trypsin inhibitor or oxalate oxidase, and some of them showed an in silico differential expression between the resistant and the susceptible interactions. Conclusions: As far as is known this is the first EST resource from the cacao–M. perniciosa interaction and it is believed that it will provide a significant contribution to the understanding of the molecular mechanisms of the resistance and susceptibility of cacao to M. perniciosa, to develop strategies to control witches' broom, and as a source of polymorphism for molecular marker development and marker-assisted selection. PMID:17557832

  1. Validation of Minim typing for fast and accurate discrimination of extended-spectrum, beta-lactamase-producing Klebsiella pneumoniae isolates in tertiary care hospital.

    PubMed

    Brhelova, Eva; Kocmanova, Iva; Racil, Zdenek; Hanslianova, Marketa; Antonova, Mariya; Mayer, Jiri; Lengerova, Martina

    2016-09-01

    Minim typing is derived from the multi-locus sequence typing (MLST). It targets the same genes, but sequencing is replaced by high resolution melt analysis. Typing can be performed by analysing six loci (6MelT), four loci (4MelT) or using data from four loci plus sequencing the tonB gene (HybridMelT). The aim of this study was to evaluate Minim typing to discriminate extended-spectrum beta-lactamase producing Klebsiella pneumoniae (ESBL-KLPN) isolates at our hospital. In total, 380 isolates were analyzed. The obtained alleles were assigned according to both the 6MelT and 4MelT typing scheme. In 97 isolates, the tonB gene was sequenced to enable HybridMelT typing. We found that the presented method is suitable to quickly monitor isolates of ESBL-KLPN; results are obtained in less than 2 hours and at a lower cost than MLST. We identified a local ESBL-KLPN outbreak and a comparison of colonizing and invasive isolates revealed a long term colonization of patients with the same strain. Copyright © 2016 Elsevier Inc. All rights reserved.

  2. Sequencing and De Novo Assembly of the Toxicodendron radicans (Poison Ivy) Transcriptome

    PubMed Central

    Kim, Gunjune

    2017-01-01

    Contact with poison ivy plants is widely dreaded because they produce a natural product called urushiol that is responsible for allergenic contact delayed-dermatitis symptoms lasting for weeks. For this reason, the catchphrase most associated with poison ivy is “leaves of three, let it be”, which serves the purpose of both identification and an appeal for avoidance. Ironically, despite this notoriety, there is a dearth of specific knowledge about nearly all other aspects of poison ivy physiology and ecology. As a means of gaining a more molecular-oriented understanding of poison ivy physiology and ecology, Next Generation DNA sequencing technology was used to develop poison ivy root and leaf RNA-seq transcriptome resources. De novo assembled transcriptomes were analyzed to generate a core set of high quality expressed transcripts present in poison ivy tissue. The predicted protein sequences were evaluated for similarity to SwissProt homologs and InterProScan domains, as well as assigned both GO terms and KEGG annotations. Over 23,000 simple sequence repeats were identified in the transcriptome, and corresponding oligonucleotide primer pairs were designed. A pan-transcriptome analysis of existing Anacardiaceae transcriptomes revealed conserved and unique transcripts among these species. PMID:29125533

  3. Sequencing and De Novo Assembly of the Toxicodendron radicans (Poison Ivy) Transcriptome.

    PubMed

    Weisberg, Alexandra J; Kim, Gunjune; Westwood, James H; Jelesko, John G

    2017-11-10

    Contact with poison ivy plants is widely dreaded because they produce a natural product called urushiol that is responsible for allergenic contact delayed-dermatitis symptoms lasting for weeks. For this reason, the catchphrase most associated with poison ivy is "leaves of three, let it be", which serves the purpose of both identification and an appeal for avoidance. Ironically, despite this notoriety, there is a dearth of specific knowledge about nearly all other aspects of poison ivy physiology and ecology. As a means of gaining a more molecular-oriented understanding of poison ivy physiology and ecology, Next Generation DNA sequencing technology was used to develop poison ivy root and leaf RNA-seq transcriptome resources. De novo assembled transcriptomes were analyzed to generate a core set of high quality expressed transcripts present in poison ivy tissue. The predicted protein sequences were evaluated for similarity to SwissProt homologs and InterProScan domains, as well as assigned both GO terms and KEGG annotations. Over 23,000 simple sequence repeats were identified in the transcriptome, and corresponding oligonucleotide primer pairs were designed. A pan-transcriptome analysis of existing Anacardiaceae transcriptomes revealed conserved and unique transcripts among these species.

  4. HGVS Recommendations for the Description of Sequence Variants: 2016 Update.

    PubMed

    den Dunnen, Johan T; Dalgleish, Raymond; Maglott, Donna R; Hart, Reece K; Greenblatt, Marc S; McGowan-Jordan, Jean; Roux, Anne-Francoise; Smith, Timothy; Antonarakis, Stylianos E; Taschner, Peter E M

    2016-06-01

    The consistent and unambiguous description of sequence variants is essential to report and exchange information on the analysis of a genome. In particular, DNA diagnostics critically depends on accurate and standardized description and sharing of the variants detected. The sequence variant nomenclature system proposed in 2000 by the Human Genome Variation Society has been widely adopted and has developed into an internationally accepted standard. The recommendations are currently commissioned through a Sequence Variant Description Working Group (SVD-WG) operating under the auspices of three international organizations: the Human Genome Variation Society (HGVS), the Human Variome Project (HVP), and the Human Genome Organization (HUGO). Requests for modifications and extensions go through the SVD-WG following a standard procedure including a community consultation step. Version numbers are assigned to the nomenclature system to allow users to specify the version used in their variant descriptions. Here, we present the current recommendations, HGVS version 15.11, and briefly summarize the changes that were made since the 2000 publication. Most focus has been on removing inconsistencies and tightening definitions allowing automatic data processing. An extensive version of the recommendations is available online, at http://www.HGVS.org/varnomen. © 2016 WILEY PERIODICALS, INC.

  5. Transcriptome Sequencing, and Rapid Development and Application of SNP Markers for the Legume Pod Borer Maruca vitrata (Lepidoptera: Crambidae)

    PubMed Central

    Margam, Venu M.; Coates, Brad S.; Bayles, Darrell O.; Hellmich, Richard L.; Agunbiade, Tolulope; Seufferheld, Manfredo J.; Sun, Weilin; Kroemer, Jeremy A.; Ba, Malick N.; Binso-Dabire, Clementine L.; Baoua, Ibrahim; Ishiyaku, Mohammad F.; Covas, Fernando G.; Srinivasan, Ramasamy; Armstrong, Joel; Murdock, Larry L.; Pittendrigh, Barry R.

    2011-01-01

    The legume pod borer, Maruca vitrata (Lepidoptera: Crambidae), is an insect pest species of crops grown by subsistence farmers in tropical regions of Africa. We present the de novo assembly of 3729 contigs from 454- and Sanger-derived sequencing reads for midgut, salivary, and whole adult tissues of this non-model species. Functional annotation predicted that 1320 M. vitrata protein coding genes are present, of which 631 have orthologs within the Bombyx mori gene model. A homology-based analysis assigned M. vitrata genes into a group of paralogs, but these were subsequently partitioned into putative orthologs following phylogenetic analyses. Following sequence quality filtering, a total of 1542 putative single nucleotide polymorphisms (SNPs) were predicted within M. vitrata contig assemblies. Seventy-one of 1078 designed molecular genetic markers were used to screen M. vitrata samples from five collection sites in West Africa. Population substructure may be present with significant implications in the insect resistance management recommendations pertaining to the release of biological control agents or transgenic cowpea that express Bacillus thuringiensis crystal toxins. Mutation data derived from transcriptome sequencing are an expeditious and economical source for genetic markers that allow evaluation of ecological differentiation. PMID:21754987

  6. Characterization of the glutathione S-transferase gene family through ESTs and expression analyses within common and pigmented cultivars of Citrus sinensis (L.) Osbeck.

    PubMed

    Licciardello, Concetta; D'Agostino, Nunzio; Traini, Alessandra; Recupero, Giuseppe Reforgiato; Frusciante, Luigi; Chiusano, Maria Luisa

    2014-02-03

    Glutathione S-transferases (GSTs) represent a ubiquitous gene family encoding detoxification enzymes able to recognize reactive electrophilic xenobiotic molecules as well as compounds of endogenous origin. Anthocyanin pigments require GSTs for their transport into the vacuole since their cytoplasmic retention is toxic to the cell. Anthocyanin accumulation in Citrus sinensis (L.) Osbeck fruit flesh determines different phenotypes affecting the typical pigmentation of Sicilian blood oranges. In this paper we describe: i) the characterization of the GST gene family in C. sinensis through a systematic EST analysis; ii) the validation of the EST assembly by exploiting the genome sequences of C. sinensis and C. clementina and their genome annotations; iii) GST gene expression profiling in six tissues/organs and in two different sweet orange cultivars, Cadenera (common) and Moro (pigmented). We identified 61 GST transcripts, described the full- or partial-length nature of the sequences and assigned to each sequence the GST class membership exploiting a comparative approach and the classification scheme proposed for plant species. A total of 23 full-length sequences were defined. Fifty-four of the 61 transcripts were successfully aligned to the C. sinensis and C. clementina genomes. Tissue specific expression profiling demonstrated that the expression of some GST transcripts was 'tissue-affected' and cultivar specific. A comparative analysis of C. sinensis GSTs with those from other plant species was also considered. Data from the current analysis are accessible at http://biosrv.cab.unina.it/citrusGST/, with the aim to provide a reference resource for C. sinensis GSTs. This study aimed at the characterization of the GST gene family in C. sinensis. 
Based on expression patterns from two different cultivars and on sequence-comparative analyses, we also highlighted that two sequences, a Phi class GST and a Mapeg class GST, could be involved in the conjugation of anthocyanin pigments and in their transport into the vacuole, specifically in fruit flesh of the pigmented cultivar.

  7. Tissue-specific Proteogenomic Analysis of Plutella xylostella Larval Midgut Using a Multialgorithm Pipeline.

    PubMed

    Zhu, Xun; Xie, Shangbo; Armengaud, Jean; Xie, Wen; Guo, Zhaojiang; Kang, Shi; Wu, Qingjun; Wang, Shaoli; Xia, Jixing; He, Rongjun; Zhang, Youjun

    2016-06-01

    The diamondback moth, Plutella xylostella (L.), is the major cosmopolitan pest of brassica and other cruciferous crops. Its larval midgut is a dynamic tissue that interfaces with a wide variety of toxicological and physiological processes. The draft sequence of the P. xylostella genome was recently released, but its annotation remains challenging because of the low sequence coverage of this branch of life and the poor description of exon/intron splicing rules for these insects. Peptide sequencing by computational assignment of tandem mass spectra to genome sequence information provides an experimentally independent approach for confirming or refuting protein predictions, a concept that has been termed proteogenomics. In this study, we carried out an in-depth proteogenomic analysis to complement genome annotation of P. xylostella larval midgut based on shotgun HPLC-ESI-MS/MS data by means of a multialgorithm pipeline. A total of 876,341 tandem mass spectra were searched against the predicted P. xylostella protein sequences and a whole-genome six-frame translation database. Based on a data set comprising 2694 novel genome search specific peptides, we discovered 439 novel protein-coding genes and corrected 128 existing gene models. To get the most accurate data to seed further insect genome annotation, more than half of the novel protein-coding genes, i.e. 235 of 439, were further validated after RT-PCR amplification and sequencing of the corresponding transcripts. Furthermore, we validated 53 novel alternative splicings. Finally, a total of 6764 proteins were identified, resulting in one of the most comprehensive proteogenomic studies of a nonmodel animal. As the first tissue-specific proteogenomics analysis of P. xylostella, this study provides the fundamental basis for high-throughput proteomics and functional genomics approaches aimed at deciphering the molecular mechanisms of resistance and controlling this pest. 
© 2016 by The American Society for Biochemistry and Molecular Biology, Inc.
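The "whole-genome six-frame translation database" mentioned in the abstract above can be illustrated with a minimal Python sketch. It assumes the standard genetic code and plain A/C/G/T input; the function names are illustrative and are not taken from the study's pipeline, whose implementation is not described in the record.

```python
from itertools import product

# Standard genetic code with codons enumerated in T, C, A, G order.
# Assumption: the study's actual translation table is not stated in the record.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unrecognized codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return three forward-strand and three reverse-strand translations."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]
```

In a proteogenomic search, each of the six translated frames would typically be split at stop codons ('*') to give candidate peptide-bearing segments; that step is omitted here.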

  8. ReQON: a Bioconductor package for recalibrating quality scores from next-generation sequencing data

    PubMed Central

    2012-01-01

    Background Next-generation sequencing technologies have become important tools for genome-wide studies. However, the quality scores that are assigned to each base have been shown to be inaccurate. If the quality scores are used in downstream analyses, these inaccuracies can have a significant impact on the results. Results Here we present ReQON, a tool that recalibrates the base quality scores from an input BAM file of aligned sequencing data using logistic regression. ReQON also generates diagnostic plots showing the effectiveness of the recalibration. We show that ReQON produces quality scores that are both more accurate, in the sense that they more closely correspond to the probability of a sequencing error, and do a better job of discriminating between sequencing errors and non-errors than the original quality scores. We also compare ReQON to other available recalibration tools and show that ReQON is less biased and performs favorably in terms of quality score accuracy. Conclusion ReQON is an open source software package, written in R and available through Bioconductor, for recalibrating base quality scores for next-generation sequencing data. ReQON produces a new BAM file with more accurate quality scores, which can improve the results of downstream analysis, and produces several diagnostic plots showing the effectiveness of the recalibration. PMID:22946927
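The abstract above calls quality scores accurate when they "closely correspond to the probability of a sequencing error". A small Python sketch of that correspondence follows, using the standard Phred relation Q = -10 log10(p). It does not reproduce ReQON's logistic regression; the function names and the (reported score, is_error) input layout are assumptions made for illustration.

```python
import math
from collections import defaultdict


def phred_to_error_prob(q):
    """Error probability implied by a Phred-scaled quality score."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred-scaled quality for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def empirical_quality(calls):
    """Map each reported score to the quality implied by observed errors.

    `calls` is an iterable of (reported_q, is_error) pairs (hypothetical
    layout). Scores with no observed errors are skipped, because their
    empirical quality is unbounded.
    """
    totals, errors = defaultdict(int), defaultdict(int)
    for q, is_error in calls:
        totals[q] += 1
        errors[q] += int(is_error)
    return {
        q: error_prob_to_phred(errors[q] / totals[q])
        for q in totals
        if errors[q] > 0
    }
```

A reported score of 30 whose bases turn out to be wrong 1 time in 100 has an empirical quality of about 20, which is the kind of gap a recalibration tool aims to close.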

  9. Comprehensive comparison of three commercial human whole-exome capture platforms.

    PubMed

    Asan; Xu, Yu; Jiang, Hui; Tyler-Smith, Chris; Xue, Yali; Jiang, Tao; Wang, Jiawei; Wu, Mingzhi; Liu, Xiao; Tian, Geng; Wang, Jun; Wang, Jian; Yang, Huangming; Zhang, Xiuqing

    2011-09-28

    Exome sequencing, which allows the global analysis of protein coding sequences in the human genome, has become an effective and affordable approach to detecting causative genetic mutations in diseases. Currently, there are several commercial human exome capture platforms; however, the relative performances of these have not been characterized sufficiently to know which is best for a particular study. We comprehensively compared three platforms: NimbleGen's Sequence Capture Array and SeqCap EZ, and Agilent's SureSelect. We assessed their performance in a variety of ways, including number of genes covered and capture efficacy. Differences that may impact on the choice of platform were that Agilent SureSelect covered approximately 1,100 more genes, while NimbleGen provided better flanking sequence capture. Although all three platforms achieved similar capture specificity of targeted regions, the NimbleGen platforms showed better uniformity of coverage and greater genotype sensitivity at 30- to 100-fold sequencing depth. All three platforms showed similar power in exome SNP calling, including medically relevant SNPs. Compared with genotyping and whole-genome sequencing data, the three platforms achieved a similar accuracy of genotype assignment and SNP detection. Importantly, all three platforms showed similar levels of reproducibility, GC bias and reference allele bias. We demonstrate key differences between the three platforms, particularly advantages of solutions over array capture and the importance of a large gene target set.

  10. Protein structure and the sequential structure of mRNA: alpha-helix and beta-sheet signals at the nucleotide level.

    PubMed

    Brunak, S; Engelbrecht, J

    1996-06-01

    A direct comparison of experimentally determined protein structures and their corresponding protein coding mRNA sequences has been performed. We examine whether real world data support the hypothesis that clusters of rare codons correlate with the location of structural units in the resulting protein. The degeneracy of the genetic code allows for a biased selection of codons which may control the translational rate of the ribosome, and may thus in vivo have a catalyzing effect on the folding of the polypeptide chain. A complete search for GenBank nucleotide sequences coding for structural entries in the Brookhaven Protein Data Bank produced 719 protein chains with matching mRNA sequence, amino acid sequence, and secondary structure assignment. By neural network analysis, we found strong signals in mRNA sequence regions surrounding helices and sheets. These signals do not originate from the clustering of rare codons, but from the similarity of codons coding for very abundant amino acid residues at the N- and C-termini of helices and sheets. No correlation between the positioning of rare codons and the location of structural units was found. The mRNA signals were also compared with conserved nucleotide features of 16S-like ribosomal RNA sequences and related to mechanisms for maintaining the correct reading frame by the ribosome.

  11. Diagnostic performance and reproducibility of T2w based and diffusion weighted imaging (DWI) based PI-RADSv2 lexicon descriptors for prostate MRI.

    PubMed

    Benndorf, Matthias; Hahn, Felix; Krönig, Malte; Jilg, Cordula Annette; Krauss, Tobias; Langer, Mathias; Dovi-Akué, Philippe

    2017-08-01

    To examine the diagnostic performance of PI-RADSv2 T2w and diffusion weighted imaging (DWI) based lexicon descriptors, inter-observer agreement for descriptor assignment and diagnostic accuracy of the PI-RADSv2 assessment categories for multiparametric prostate MRI. 176 lesions in 79 consecutive patients are analyzed, lesions are histopathologically verified by MRI-ultrasound fusion biopsy. All lesions are rated according to the PI-RADSv2 lexicon, descriptors for T2w and DWI sequences and resulting assessment categories are assigned by two independent blinded radiologists. We perform receiver-operating-characteristic analysis using the assessment categories. To analyze inter-observer agreement, we calculate weighted kappa values for assessment category assignment and unweighted kappa values for descriptor assignment. PI-RADSv2 assessment categories yield an area under the curve of 0.76/0.74 (radiologist 1/radiologist 2), P >0.05. Weighted kappa for agreement is 0.601 in the peripheral zone and 0.580 in the transition zone. We detect a difference in the cancer rate for PI-RADSv2 category 3 between peripheral zone (32%) and transition zone (12%), P <0.05. We obtain moderate agreement at most for descriptor assignment with kappa values ranging from 0.082 (T2w shape in the transition zone) to 0.407 (T2w signal intensity in the peripheral zone) and 0.493 (ADC pattern in the peripheral zone). Our analysis corroborates typical descriptors for benign/malignant lesions, but also reveals insights into potential pitfalls - T2w wedge shaped lesions in the peripheral zone have a considerable cancer rate, despite being labelled category 2 in the lexicon. Agreement for descriptor assignment in the PI-RADSv2 lexicon is at most moderate in our study. Typical descriptors for benign and malignant lesions are validated, whereas the discriminatory power of some descriptors is challenged. 
The difference in the cancer rate for PI-RADSv2 category 3 between peripheral zone and transition zone should be considered when management recommendations are linked to assessment categories in the future. Copyright © 2017 Elsevier B.V. All rights reserved.
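The weighted and unweighted kappa values reported above measure inter-observer agreement beyond chance. The following Python sketch computes Cohen's kappa for two raters. Linear weights are an assumption, since the abstract does not state which weighting scheme was used, and the function and parameter names are illustrative.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b, categories, weighted=False):
    """Cohen's kappa for two raters over ordered `categories`.

    With weighted=True, disagreements are penalized linearly by category
    distance (an assumed scheme). Kappa is undefined, and this raises
    ZeroDivisionError, when chance disagreement is zero or when a weighted
    kappa is requested for a single category.
    """
    n = len(ratings_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    observed = Counter((index[a], index[b]) for a, b in zip(ratings_a, ratings_b))
    marg_a = Counter(index[a] for a in ratings_a)
    marg_b = Counter(index[b] for b in ratings_b)

    def disagreement(i, j):
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    # Observed and chance-expected disagreement, each in [0, 1].
    obs = sum(disagreement(i, j) * observed[(i, j)] / n for i, j in cells)
    exp = sum(disagreement(i, j) * marg_a[i] * marg_b[j] / n ** 2 for i, j in cells)
    return 1.0 - obs / exp
```

For unweighted kappa this reduces to the familiar (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is agreement expected by chance.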

  12. Unravelling Glucan Recognition Systems by Glycome Microarrays Using the Designer Approach and Mass Spectrometry

    PubMed Central

    Palma, Angelina S.; Liu, Yan; Zhang, Hongtao; Zhang, Yibing; McCleary, Barry V.; Yu, Guangli; Huang, Qilin; Guidolin, Leticia S.; Ciocchini, Andres E.; Torosantucci, Antonella; Wang, Denong; Carvalho, Ana Luísa; Fontes, Carlos M. G. A.; Mulloy, Barbara; Childs, Robert A.; Feizi, Ten; Chai, Wengang

    2015-01-01

    Glucans are polymers of d-glucose with differing linkages in linear or branched sequences. They are constituents of microbial and plant cell-walls and involved in important bio-recognition processes, including immunomodulation, anticancer activities, pathogen virulence, and plant cell-wall biodegradation. Translational possibilities for these activities in medicine and biotechnology are considerable. High-throughput micro-methods are needed to screen proteins for recognition of specific glucan sequences as a lead to structure–function studies and their exploitation. We describe construction of a “glucome” microarray, the first sequence-defined glycome-scale microarray, using a “designer” approach from targeted ligand-bearing glucans in conjunction with a novel high-sensitivity mass spectrometric sequencing method, as a screening tool to assign glucan recognition motifs. The glucome microarray comprises 153 oligosaccharide probes with high purity, representing major sequences in glucans. Negative-ion electrospray tandem mass spectrometry with collision-induced dissociation was used for complete linkage analysis of gluco-oligosaccharides in linear “homo” and “hetero” and branched sequences. The system is validated using antibodies and carbohydrate-binding modules known to target α- or β-glucans in different biological contexts, extending knowledge on their specificities, and applied to reveal new information on glucan recognition by two signaling molecules of the immune system against pathogens: Dectin-1 and DC-SIGN. The sequencing of the glucan oligosaccharides by the MS method and their interrogation on the microarrays provides detailed information on linkage, sequence and chain length requirements of glucan-recognizing proteins, and are a sensitive means of revealing unsuspected sequences in the polysaccharides. PMID:25670804

  13. Molecular analysis in true hermaphrodites with different karyotypes and similar phenotypes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Torres, L.; Cervantes, A.; Kofman-Alfaro, S.

    1996-05-17

    True hermaphroditism is characterized by the development of ovarian and testicular tissue in the same individual. Muellerian and Wolffian structures are usually present, and external genitalia are often ambiguous. The most frequent karyotype in these patients is 46,XX or various forms of mosaicism, whereas 46,XY is very rarely found. The phenotype in all these subjects is similar. We studied 10 true hermaphrodites. Six of them had a 46,XX chromosomal complement: 3 had been reared as males and 3 as females. The other 4 patients were mosaics: 3 were 46,XX/46,XY and one had a 46,XX/47,XXY karyotype. One of the 46,XX/46,XY mosaics was reared as a female, whereas the other 3 mosaics were reared as males. The sex of assignment in the 10 patients depended only on labio-scrotal differentiation. Molecular studies in 46,XX subjects documented the absence of Y centromeric sequences in all cases, arguing against hidden mosaicism. One patient presented Yp sequences (ZFY+, SRY+), which contrast with South African black 46,XX true hermaphrodites in whom no Y sequences were found. Molecular analysis in the subjects with mosaicism demonstrated the presence of Y centromeric and Yp sequences confirming the presence of a Y chromosome. Gonadal development, endocrine function, and phenotype in the 10 patients did not correlate with the presence of a Y chromosome or Y-derived sequences in the genome, confirming that true hermaphroditism is a heterogeneous condition. Both Mexican and non-South African 46,XX true hermaphrodites may be SRY positive. 51 refs., 3 figs., 2 tabs.

  14. Molecular diagnostics and ITS-based phylogenic analysis of Streptococcus suis serotype 2 in central Vietnam.

    PubMed

    Nguyen, Bach Hoang; Phan, Dieu Hong Nu; Nguyen, Hien Xuan; Le, An Van; Alberti, Alberto

    2015-07-04

    Streptococcus suis (S. suis) serotype 2 has recently become the most prevalent cause of meningitis in adults in many areas of Vietnam. This study provides data on S. suis molecular diagnosis in central Vietnam using a real-time polymerase chain reaction (PCR) assay targeting the S. suis serotype 2 cps2J gene. Additionally, 16S-23S rDNA intragenic spacer (ITS)-based phylogenic analysis of strains isolated from cerebrospinal fluid (CSF) in Thua Thien Hue Province, Vietnam, is presented and discussed. Pathogenic bacteria were isolated from 40 CSF samples, and 18 were identified as S. suis by culture-dependent methods. Capsular serotyping was assessed by real-time PCR. ITS sequences were obtained after traditional PCR and were used in phylogenic analyses. Pathogenic bacteria were isolated from 36 out of 40 CSF samples. A total of 18 S. suis strains were isolated and assigned to serotype 2 by real-time PCR. One CSF sample, negative when tested by culture-dependent methods, was positive to S. suis serotype 2 by real-time PCR. Pairwise alignments of the 18 ITS sequences did not reveal any variable nucleotide position, and resulted in a single sequence type. Sequences were similar to S. suis serotype 2 reference ITS sequences (> 98.1%), and there was no lack of an ITS spacer region in the isolates. S. suis serotype 2 is the most prevalent serotype in central Vietnam. Real-time PCR assay proved to be a reliable diagnostic method for early detection of S. suis 2 in CSF samples.

  15. Insights into the genetic structure and diversity of 38 South Asian Indians from deep whole-genome sequencing.

    PubMed

    Wong, Lai-Ping; Lai, Jason Kuan-Han; Saw, Woei-Yuh; Ong, Rick Twee-Hee; Cheng, Anthony Youzhi; Pillai, Nisha Esakimuthu; Liu, Xuanyao; Xu, Wenting; Chen, Peng; Foo, Jia-Nee; Tan, Linda Wei-Lin; Koo, Seok-Hwee; Soong, Richie; Wenk, Markus Rene; Lim, Wei-Yen; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2014-05-01

    South Asia possesses a significant amount of genetic diversity due to considerable intergroup differences in culture and language. There have been numerous reports on the genetic structure of Asian Indians, although these have mostly relied on genotyping microarrays or targeted sequencing of the mitochondria and Y chromosomes. Asian Indians in Singapore are primarily descendants of immigrants from Dravidian-language-speaking states in south India, and 38 individuals from the general population underwent deep whole-genome sequencing with a target coverage of 30X as part of the Singapore Sequencing Indian Project (SSIP). The genetic structure and diversity of these samples were compared against samples from the Singapore Sequencing Malay Project and populations in Phase 1 of the 1,000 Genomes Project (1 KGP). SSIP samples exhibited greater intra-population genetic diversity and possessed higher heterozygous-to-homozygous genotype ratio than other Asian populations. When compared against a panel of well-defined Asian Indians, the genetic makeup of the SSIP samples was closely related to South Indians. However, even though the SSIP samples clustered distinctly from the Europeans in the global population structure analysis with autosomal SNPs, eight samples were assigned to mitochondrial haplogroups that were predominantly present in Europeans and possessed higher European admixture than the remaining samples. An analysis of the relative relatedness between SSIP with two archaic hominins (Denisovan, Neanderthal) identified higher ancient admixture in East Asian populations than in SSIP. The data resource for these samples is publicly available and is expected to serve as a valuable complement to the South Asian samples in Phase 3 of 1 KGP.

  16. A compositional segmentation of the human mitochondrial genome is related to heterogeneities in the guanine mutation rate

    PubMed Central

    Samuels, David C.; Boys, Richard J.; Henderson, Daniel A.; Chinnery, Patrick F.

    2003-01-01

    We applied a hidden Markov model segmentation method to the human mitochondrial genome to identify patterns in the sequence, to compare these patterns to the gene structure of mtDNA and to see whether these patterns reveal additional characteristics important for our understanding of genome evolution, structure and function. Our analysis identified three segmentation categories based upon the sequence transition probabilities. Category 2 segments corresponded to the tRNA and rRNA genes, with a greater strand-symmetry in these segments. Category 1 and 3 segments covered the protein-coding genes and almost all of the non-coding D-loop. Compared to category 1, the mtDNA segments assigned to category 3 had much lower guanine abundance. A comparison to two independent databases of mitochondrial mutations and polymorphisms showed that the high substitution rate of guanine in human mtDNA is largest in the category 3 segments. Analysis of synonymous mutations showed the same pattern. This suggests that this heterogeneity in the mutation rate is partly independent of respiratory chain function and is a direct property of the genome sequence itself. This has important implications for our understanding of mtDNA evolution and its use as a ‘molecular clock’ to determine the rate of population and species divergence. PMID:14530452
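The idea of assigning each position of a sequence to a hidden segmentation category can be sketched with Viterbi decoding in Python. This is a deliberately simplified illustration, not the authors' method: it assumes each hidden state emits single bases with fixed probabilities, whereas the abstract describes categories defined by sequence transition probabilities. All parameter names and values are hypothetical.

```python
import math


def viterbi(sequence, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for a non-empty `sequence`.

    start_p[s], trans_p[r][s] and emit_p[s][symbol] are probabilities
    (all assumed non-zero); computation is done in log space.
    """
    scores = [{
        s: math.log(start_p[s]) + math.log(emit_p[s][sequence[0]])
        for s in states
    }]
    backpointers = []
    for symbol in sequence[1:]:
        prev = scores[-1]
        row, ptr = {}, {}
        for s in states:
            # Best predecessor state for landing in state s at this position.
            best = max(states, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            row[s] = prev[best] + math.log(trans_p[best][s]) + math.log(emit_p[s][symbol])
            ptr[s] = best
        scores.append(row)
        backpointers.append(ptr)
    # Trace back from the best final state.
    state = max(states, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

With an AT-rich state and a GC-rich state that each tend to persist, a sequence such as a run of A followed by a run of G is split into two segments at the composition change.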

  17. Large-scale gene function analysis with the PANTHER classification system.

    PubMed

    Mi, Huaiyu; Muruganujan, Anushya; Casagrande, John T; Thomas, Paul D

    2013-08-01

    The PANTHER (protein annotation through evolutionary relationship) classification system (http://www.pantherdb.org/) is a comprehensive system that combines gene function, ontology, pathways and statistical analysis tools that enable biologists to analyze large-scale, genome-wide data from sequencing, proteomics or gene expression experiments. The system is built with 82 complete genomes organized into gene families and subfamilies, and their evolutionary relationships are captured in phylogenetic trees, multiple sequence alignments and statistical models (hidden Markov models or HMMs). Genes are classified according to their function in several different ways: families and subfamilies are annotated with ontology terms (Gene Ontology (GO) and PANTHER protein class), and sequences are assigned to PANTHER pathways. The PANTHER website includes a suite of tools that enable users to browse and query gene functions, and to analyze large-scale experimental data with a number of statistical tests. It is widely used by bench scientists, bioinformaticians, computer scientists and systems biologists. In the 2013 release of PANTHER (v.8.0), in addition to an update of the data content, we redesigned the website interface to improve both user experience and the system's analytical capability. This protocol provides a detailed description of how to analyze genome-wide experimental data with the PANTHER classification system.

  18. PONDEROSA, an automated 3D-NOESY peak picking program, enables automated protein structure determination

    PubMed Central

    Lee, Woonghee; Kim, Jin Hae; Westler, William M.; Markley, John L.

    2011-01-01

    Summary: PONDEROSA (Peak-picking Of Noe Data Enabled by Restriction of Shift Assignments) accepts input information consisting of a protein sequence, backbone and sidechain NMR resonance assignments, and 3D-NOESY (13C-edited and/or 15N-edited) spectra, and returns assignments of NOESY crosspeaks, distance and angle constraints, and a reliable NMR structure represented by a family of conformers. PONDEROSA incorporates and integrates external software packages (TALOS+, STRIDE and CYANA) to carry out different steps in the structure determination. PONDEROSA implements internal functions that identify and validate NOESY peak assignments and assess the quality of the calculated three-dimensional structure of the protein. The robustness of the analysis results from PONDEROSA's hierarchical processing steps that involve iterative interaction among the internal and external modules. PONDEROSA supports a variety of input formats: SPARKY assignment table (.shifts) and spectrum file formats (.ucsf), XEASY proton file format (.prot), and NMR-STAR format (.star). To demonstrate the utility of PONDEROSA, we used the package to determine 3D structures of two proteins: human ubiquitin and Escherichia coli iron-sulfur scaffold protein variant IscU(D39A). The automatically generated structural constraints and ensembles of conformers were as good as or better than those determined previously by much less automated means. Availability: The program, in the form of binary code along with tutorials and reference manuals, is available at http://ponderosa.nmrfam.wisc.edu/. Contact: whlee@nmrfam.wisc.edu; markley@nmrfam.wisc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21511715

  19. Assessing the Ability of Chloroplast and Nuclear DNA Gene Markers to Verify the Geographic Origin of Jatoba (Hymenaea courbaril L.) Timber.

    PubMed

    Chaves, Camila L; Degen, Bernd; Pakull, Birte; Mader, Malte; Honorio, Euridice; Ruas, Paulo; Tysklind, Niklas; Sebbenn, Alexandre M

    2018-06-27

    Deforestation, reinforced by illegal logging, is a serious problem in many tropical regions and causes pervasive environmental and economic damage. Existing laws that intend to reduce illegal logging need efficient, fraud-resistant control methods. We developed a genetic reference database for Jatoba (Hymenaea courbaril), an important, high-value timber species from the Neotropics. The data set can be used to check declarations of wood origin. Samples from 308 Hymenaea trees from 12 locations in Brazil, Bolivia, Peru, and French Guiana were collected and genotyped on 10 nuclear microsatellites (nSSRs), 13 chloroplast SNPs (cpSNP), and 1 chloroplast indel marker. The chloroplast gene markers were developed using Illumina DNA sequencing. Bayesian cluster analysis divided the individuals based on the nSSRs into 8 genetic groups. Using self-assignment tests, the power of the genetic reference database to verify declared locations was tested for 3 different assignment methods. We observed strong genetic differentiation among locations, leading to high and reliable self-assignment rates for the locations of between 50% and 100% (average of 88%). Although all 3 assignment methods produced similar mean self-assignment rates, there were differences for some locations linked to the level of genetic diversity, differentiation, and heterozygosity. Our results show that the nuclear and chloroplast gene markers are effective for use in a genetic certification system and can provide national and international authorities with a robust tool to confirm the legality of timber.

  20. Microbial community profiling of fresh basil and pitfalls in taxonomic assignment of enterobacterial pathogenic species based upon 16S rRNA amplicon sequencing.

    PubMed

    Ceuppens, Siele; De Coninck, Dieter; Bottledoorn, Nadine; Van Nieuwerburgh, Filip; Uyttendaele, Mieke

    2017-09-18

    16S rRNA (gene) amplicon sequencing is increasingly applied to food samples for assessing microbial diversity and may, as an unintended advantage, also enable simultaneous detection of any human pathogens without a priori definition. In the present study, high-throughput next-generation sequencing (NGS) of the V1-V2-V3 regions of the 16S rRNA gene was applied to identify the bacteria present on fresh basil leaves. However, results were strongly impacted by variations in the bioinformatics analysis pipelines (MEGAN, SILVAngs, QIIME and MG-RAST), including the database choice (Greengenes, RDP and M5RNA) and the annotation algorithm (best hit, representative hit and lowest common ancestor). The use of pipelines with default parameters will lead to discrepancies. The estimate of microbial diversity of fresh basil using 16S rRNA (gene) amplicon sequencing is thus indicative but subject to biases. Salmonella enterica was detected at low frequencies, between 0.1% and 0.4% of bacterial sequences, corresponding to 37 to 166 reads. However, this result was dependent upon the pipeline used: Salmonella was detected by MEGAN, SILVAngs and MG-RAST, but not by QIIME. Confirmation of Salmonella sequences by real-time PCR was unsuccessful. It was shown that the taxonomic resolution obtained from the short (500 bp) sequence reads of the 16S rRNA gene containing the hypervariable regions V1-V3 does not allow Salmonella to be distinguished from closely related enterobacterial species. In conclusion, 16S amplicon sequencing, which is gaining the status of a standard method in microbial ecology studies of foods, requires expertise in both bioinformatics and microbiology for analysis of results. It is a powerful tool to estimate bacterial diversity but is prone to biases. Limitations concerning taxonomic resolution for some bacterial species, or the method's inability to detect sub-dominant (pathogenic) species, should be acknowledged in order to avoid overinterpretation of results.
Copyright © 2017 Elsevier B.V. All rights reserved.
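
    The "lowest common ancestor" annotation rule named in the abstract above can be illustrated with a minimal Python sketch. This is not the implementation used by MEGAN or any other pipeline in the study; the function name and the three example lineages are assumptions chosen for illustration. It shows why a short read that matches several enterobacterial genera equally well ends up assigned only at family level.

```python
from typing import List, Sequence


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest shared prefix of root-to-leaf taxonomic lineages."""
    shared: List[str] = []
    # zip(*lineages) walks the ranks in parallel and stops at the shortest lineage.
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


# Hypothetical database hits for a single short 16S read (illustrative only).
PREFIX = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
          "Enterobacterales", "Enterobacteriaceae"]
hits = [
    PREFIX + ["Salmonella"],
    PREFIX + ["Escherichia"],
    PREFIX + ["Citrobacter"],
]

assignment = lowest_common_ancestor(hits)
print(assignment[-1])  # the deepest rank on which all hits agree
```

    A best-hit rule would instead report whichever genus happened to score highest, which is one way pipelines fed the same reads can disagree about Salmonella.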

  1. Analysis of expressed sequence tags from Maize mosaic rhabdovirus-infected gut tissues of Peregrinus maidis reveals the presence of key components of insect innate immunity.

    PubMed

    Whitfield, A E; Rotenberg, D; Aritua, V; Hogenhout, S A

    2011-04-01

    The corn planthopper, Peregrinus maidis, causes direct feeding damage to plants and transmits Maize mosaic rhabdovirus (MMV) in a persistent-propagative manner. MMV must cross several insect tissue layers for successful transmission to occur, and the gut serves as an important barrier for rhabdovirus transmission. In order to facilitate the identification of proteins that may interact with MMV either by facilitating acquisition or responding to virus infection, we generated and analysed the gut transcriptome of P. maidis. From two normalized cDNA libraries, we generated a P. maidis gut transcriptome composed of 20,771 expressed sequence tags (ESTs). Assembly of the sequences yielded 1860 contigs and 14,032 singletons, and biological roles were assigned to 5793 (36%). Comparison of P. maidis ESTs with other insect amino acid sequences revealed that P. maidis shares greatest sequence similarity with another hemipteran, the brown planthopper Nilaparvata lugens. We identified 202 P. maidis transcripts with putative homology to proteins associated with insect innate immunity, including those implicated in the Toll, Imd, JAK/STAT, Jnk and the small-interfering RNA-mediated pathways. Sequence comparisons between our P. maidis gut EST collection and the currently available National Center for Biotechnology Information EST database collection for Ni. lugens revealed that a pathogen recognition receptor in the Imd pathway, peptidoglycan recognition protein-long class (PGRP-LC), is present in these two members of the family Delphacidae; however, these recognition receptors are lacking in the model hemipteran Acyrthosiphon pisum. In addition, we identified sequences in the P. maidis gut transcriptome that share significant amino acid sequence similarities with the rhabdovirus receptor molecule, acetylcholine receptor (AChR), found in other hosts. 
This EST analysis sheds new light on immune response pathways in hemipteran guts that will be useful for further dissecting innate defence response pathways to rhabdovirus infection. © 2011 The Authors. Insect Molecular Biology © 2011 The Royal Entomological Society.

  2. AQME: A forensic mitochondrial DNA analysis tool for next-generation sequencing data.

    PubMed

    Sturk-Andreaggi, Kimberly; Peck, Michelle A; Boysen, Cecilie; Dekker, Patrick; McMahon, Timothy P; Marshall, Charla K

    2017-11-01

    The feasibility of generating mitochondrial DNA (mtDNA) data has expanded considerably with the advent of next-generation sequencing (NGS), specifically in the generation of entire mtDNA genome (mitogenome) sequences. However, the analysis of these data has emerged as the greatest challenge to implementation in forensics. To address this need, a custom toolkit for use in the CLC Genomics Workbench (QIAGEN, Hilden, Germany) was developed through a collaborative effort between the Armed Forces Medical Examiner System - Armed Forces DNA Identification Laboratory (AFMES-AFDIL) and QIAGEN Bioinformatics. The AFDIL-QIAGEN mtDNA Expert, or AQME, generates an editable mtDNA profile that employs forensic conventions and includes the interpretation range required for mtDNA data reporting. AQME also integrates an mtDNA haplogroup estimate into the analysis workflow, which provides the analyst with phylogenetic nomenclature guidance and a profile quality check without the use of an external tool. Supplemental AQME outputs such as nucleotide-per-position metrics, configurable export files, and an audit trail are produced to assist the analyst during review. AQME is applied to standard CLC outputs and thus can be incorporated into any mtDNA bioinformatics pipeline within CLC regardless of sample type, library preparation or NGS platform. An evaluation of AQME was performed to demonstrate its functionality and reliability for the analysis of mitogenome NGS data. The study analyzed Illumina mitogenome data from 21 samples (including associated controls) of varying quality and sample preparations with the AQME toolkit. A total of 211 tool edits were automatically applied to 130 of the 698 total variants reported in an effort to adhere to forensic nomenclature. Although additional manual edits were required for three samples, supplemental tools such as mtDNA haplogroup estimation assisted in identifying and guiding these necessary modifications to the AQME-generated profile. 
Along with profile generation, AQME reported accurate haplogroups for 18 of the 19 samples analyzed. The single errant haplogroup assignment, although phylogenetically close, identified a bug that only affects partial mitogenome data. Future adjustments to AQME's haplogrouping tool will address this bug as well as enhance the overall scoring strategy to better refine and automate haplogroup assignments. As NGS enables broader use of the mtDNA locus in forensics, the availability of AQME and other forensic-focused mtDNA analysis tools will ease the transition and further support mitogenome analysis within routine casework. Toward this end, the AFMES-AFDIL has utilized the AQME toolbox in conjunction with the CLC Genomics Workbench to successfully validate and implement two NGS mitogenome methods. Copyright © 2017 Elsevier B.V. All rights reserved.

  3. Morphometric and molecular characterization of fungus Pestalotiopsis using nuclear ribosomal DNA analysis.

    PubMed

    Gehlot, Praveen; Singh, S K; Pathak, Rakesh

    2012-09-01

    Taxonomy of the fungus Pestalotiopsis based on morphological characters has been equivocal. Molecular characterization of ten Pestalotiopsis species was done based on nuclear ribosomal DNA internal transcribed spacer (ITS) amplifications. Results of the analyses showed that species of the genus Pestalotiopsis are monophyletic. We report ITS length variations, single nucleotide polymorphisms (SNPs) and insertions/deletions (INDELs) among ten species of Pestalotiopsis that did not cause any phylogenetic error at either genus or species designation levels. New gene sequences have been assigned (GenBank accession numbers from HM 190146 to HM 190155) by the National Centre for Biotechnology Information, USA.

  4. Pilot self-coding applied in optical OFDM systems

    NASA Astrophysics Data System (ADS)

    Li, Changping; Yi, Ying; Lee, Kyesan

    2015-04-01

    This paper studies a frequency offset correction technique that can be applied in optical OFDM systems. Through theoretical analysis and computer simulations, we observe that our proposed scheme, named pilot self-coding (PSC), is effective at correcting the frequency offset and can thereby mitigate the OFDM performance deterioration caused by inter-carrier interference and common phase error. The main approach is to assign a pilot subcarrier before the data subcarriers and copy this subcarrier sequence to the symmetric side. The simulation results verify that our proposed PSC is indeed effective against a high degree of frequency offset.

  5. Teaching Note--Integrating Theory and Research Methods in a First-Year Doctoral Sequence or Program

    ERIC Educational Resources Information Center

    Pollio, David E.; MacNeil, Gordon; Womack, Bethany; Brazeal, Michelle; Church, Wesley T., II

    2016-01-01

    This teaching note describes an innovative process in which faculty members worked collaboratively to create an integrated three-course sequence of requisite course content in a PhD program, developed complementary assignments, and coordinated a classroom experience that led to the creation of an individualized area statement and eventual…

  6. The Relationship between Successful Completion and Sequential Movement in Self-Paced Distance Courses

    ERIC Educational Resources Information Center

    Lim, Janine M.

    2016-01-01

    A course design question for self-paced courses includes whether or not technological measures should be used in course design to force students to follow the sequence intended by the course author. This study examined learner behavior to understand whether the sequence of student assignment submissions in a self-paced distance course is related…

  7. Serogroup-level resolution of the “Super-7” Shiga toxin-producing Escherichia coli using nanopore single-molecule DNA sequencing

    USDA-ARS's Scientific Manuscript database

    DNA sequencing and other DNA-based methods, such as PCR, are now broadly used for detection and identification of bacterial foodborne pathogens. For the identification of foodborne bacterial pathogens, it is important to make taxonomic assignments to the species, or even subspecies level. Long-read ...

  8. Noncontiguous Finished Genome Sequence of Staphylococcus aureus KLT6, a Staphylococcal Enterotoxin B-Positive Strain Involved in a Food Poisoning Outbreak in Switzerland

    PubMed Central

    Tobes, Raquel; Manrique, Marina; Brozynska, Marta; Stephan, Roger; Pareja, Eduardo

    2013-01-01

    We present the first complete genome sequence of a Staphylococcus aureus strain assigned to clonal complex 12. The strain was isolated in a food poisoning outbreak due to contaminated potato salad in Switzerland in 2009, and it produces staphylococcal enterotoxin B. PMID:23704175

  9. SubCellProt: predicting protein subcellular localization using machine learning approaches.

    PubMed

    Garg, Prabha; Sharma, Virag; Chaudhari, Pradeep; Roy, Nilanjan

    2009-01-01

    High-throughput genome sequencing projects continue to churn out enormous amounts of raw sequence data. However, most of this raw sequence data is unannotated and, hence, not very useful. Among the various approaches to decipher the function of a protein, one is to determine its localization. Experimental approaches for proteome annotation including determination of a protein's subcellular localizations are very costly and labor intensive. Besides the available experimental methods, in silico methods present alternative approaches to accomplish this task. Here, we present two machine learning approaches for prediction of the subcellular localization of a protein from the primary sequence information. Two machine learning algorithms, k Nearest Neighbor (k-NN) and Probabilistic Neural Network (PNN) were used to classify an unknown protein into one of the 11 subcellular localizations. The final prediction is made on the basis of a consensus of the predictions made by two algorithms and a probability is assigned to it. The results indicate that the primary sequence derived features like amino acid composition, sequence order and physicochemical properties can be used to assign subcellular localization with a fair degree of accuracy. Moreover, with the enhanced accuracy of our approach and the definition of a prediction domain, this method can be used for proteome annotation in a high throughput manner. SubCellProt is available at www.databases.niper.ac.in/SubCellProt.
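
    As a rough illustration of the k-NN half of the approach described above, the following Python sketch classifies a protein from its amino acid composition alone. It is not the SubCellProt implementation: the feature set, the distance metric, the toy training sequences and the two class labels are all assumptions made for this example, and the PNN and consensus steps are omitted.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each of the 20 standard amino acids in a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def knn_predict(query, training, k=3):
    """Majority vote among the k nearest training sequences (Euclidean distance)."""
    q = composition(query)
    scored = sorted((math.dist(q, composition(s)), label) for s, label in training)
    neighbours = scored[:k]
    votes = Counter(label for _, label in neighbours)
    label, n_votes = votes.most_common(1)[0]
    # The vote share serves as a crude confidence value for the prediction.
    return label, n_votes / len(neighbours)


# Toy, made-up training set: hydrophobic-rich vs. charged-rich sequences.
training = [
    ("LLVVAIILFFLLGAVI", "membrane"),
    ("AILLVVFFLLIAGLVV", "membrane"),
    ("IVLLAAFFVVLLIIGA", "membrane"),
    ("KRKRRKEEDDKRKKRE", "nucleus"),
    ("RRKKEEDKRKRKDEKR", "nucleus"),
    ("KKRREDEKRRKKEDRK", "nucleus"),
]

label, confidence = knn_predict("LLIVAFFLLVVAGIIL", training)
print(label, confidence)
```

    A real predictor would add the sequence-order and physicochemical features mentioned in the abstract and would be trained on annotated proteins for all 11 localizations.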

  10. A maize map standard with sequenced core markers, grass genome reference points and 932 expressed sequence tagged sites (ESTs) in a 1736-locus map.

    PubMed Central

    Davis, G L; McMullen, M D; Baysdorfer, C; Musket, T; Grant, D; Staebell, M; Xu, G; Polacco, M; Koster, L; Melia-Hancock, S; Houchins, K; Chao, S; Coe, E H

    1999-01-01

    We have constructed a 1736-locus maize genome map containing 1156 loci probed by cDNAs, 545 probed by random genomic clones, 16 by simple sequence repeats (SSRs), 14 by isozymes, and 5 by anonymous clones. Sequence information is available for 56% of the loci with 66% of the sequenced loci assigned functions. A total of 596 new ESTs were mapped from a B73 library of 5-wk-old shoots. The map contains 237 loci probed by barley, oat, wheat, rice, or tripsacum clones, which serve as grass genome reference points in comparisons between maize and other grass maps. Ninety core markers selected for low copy number, high polymorphism, and even spacing along the chromosome delineate the 100 bins on the map. The average bin size is 17 cM. Use of bin assignments enables comparison among different maize mapping populations and experiments including those involving cytogenetic stocks, mutants, or quantitative trait loci. Integration of nonmaize markers in the map extends the resources available for gene discovery beyond the boundaries of maize mapping information into the expanse of map, sequence, and phenotype information from other grass species. This map provides a foundation for numerous basic and applied investigations including studies of gene organization, gene and genome evolution, targeted cloning, and dissection of complex traits. PMID:10388831
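
    The counts quoted in this abstract can be cross-checked with a few lines of Python. The dictionary keys are labels chosen for this sketch; the derived figures (number of sequenced loci, number with assigned function, approximate map length) are not stated in the abstract and follow only from its rounded percentages and average bin size, so they should be read as estimates.

```python
# Locus counts by probe type, as quoted in the abstract.
loci_by_probe = {
    "cDNA": 1156,
    "random genomic clone": 545,
    "SSR": 16,
    "isozyme": 14,
    "anonymous clone": 5,
}

total_loci = sum(loci_by_probe.values())
print(total_loci)  # should match the 1736-locus map

# Estimates derived from the rounded percentages in the abstract.
sequenced_loci = 0.56 * total_loci          # loci with sequence information
with_function = 0.66 * sequenced_loci       # sequenced loci assigned a function
print(round(sequenced_loci), round(with_function))

# 100 bins at an average of 17 cM per bin gives an approximate map length.
approx_map_length_cm = 100 * 17
print(approx_map_length_cm)
```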

  11. Structural and theoretical study of 1-[1-oxo-3-phenyl-(2-benzosulfonamide)-propyl amido] - anthracene-9,10-dione to be i-motif inhibitor

    NASA Astrophysics Data System (ADS)

    Vatsal, Manu; Devi, Vandna; Awasthi, Pamita

    2018-04-01

    The 1-[1-oxo-3-phenyl-(2-benzosulfonamide)-propyl amido] - anthracene-9,10-dione (BPAQ), an analogue of the anthracenedione class of antibiotics, has been synthesized. To characterize the molecular functional groups, FT-IR and FT-Raman spectra were recorded and vibrational frequencies were assigned accordingly. The optimized geometrical parameters, vibrational assignments, chemical shifts and thermodynamic properties of the title compound were computed by ab initio calculations using the Density Functional Theory (DFT) method with 6-31G(d,p) as the basis set. The calculated harmonic vibrational frequencies of the molecule were then analysed in comparison with the experimental FT-IR and Raman spectra. The gauge-independent atomic orbital (GIAO) method was used for determining the proton (1H) and carbon (13C) nuclear magnetic resonance (NMR) spectra of the molecule. Molecular parameters were calculated along with a periodic boundary conditions (PBC) calculation, supported by X-ray diffraction studies. The frontier molecular orbital (HOMO, LUMO) analysis describes the charge distribution and stability of the molecule and indicates that nucleophilic substitution is preferred; the Mulliken charge analysis confirmed the same. Further, the title compound showed an inhibitory action at d(TCCCCC), an intermolecular i-motif sequence, and a molecular docking study suggested inhibitory activity of the compound at this junction.

  12. Undergraduates improve upon published crystal structure in class assignment.

    PubMed

    Horowitz, Scott; Koldewey, Philipp; Bardwell, James C

    2014-01-01

    Recently, 57 undergraduate students at the University of Michigan were assigned the task of solving a crystal structure, given only the electron density map of a 1.3 Å crystal structure from the electron density server, and the position of the N-terminal amino acid. To test their knowledge of amino acid chemistry, the students were not given the protein sequence. With minimal direction from the instructor on how the students should complete the assignment, the students fared remarkably well in this task, with over half the class able to reconstruct the original sequence with over 77% sequence identity, and with structures whose median ranked in the 91st percentile of all structures of comparable resolution in terms of structure quality. Fourteen percent of the students' structures produced Molprobity steric clash validation scores even better than that of the original structure, suggesting that multiple students achieved an improvement in the overall structure quality compared to the published structure. Students were able to delineate limiting case chemical environments, such as charged interactions or complete solvent exposure, but were less able to distinguish finer details of hydrogen bonding or hydrophobicity. Our results prompt several questions: why were students able to perform so well in their structural validation scores? How were some students able to outperform the 88% sequence identity mark that would constitute a perfect score, given the level of degenerate density or surface residues with poor density? And how can the methodology used by the best students inform the practices of professional X-ray crystallographers? Copyright © 2014 Wiley Periodicals, Inc.

  13. DNA barcoding of Bemisia tabaci complex (Hemiptera: Aleyrodidae) reveals southerly expansion of the dominant whitefly species on cotton in Pakistan.

    PubMed

    Ashfaq, Muhammad; Hebert, Paul D N; Mirza, M Sajjad; Khan, Arif M; Mansoor, Shahid; Shah, Ghulam S; Zafar, Yusuf

    2014-01-01

    Although whiteflies (Bemisia tabaci complex) are an important pest of cotton in Pakistan, their taxonomic diversity is poorly understood. As DNA barcoding is an effective tool for resolving species complexes and analyzing species distributions, we used this approach to analyze genetic diversity in the B. tabaci complex and map the distribution of B. tabaci lineages in cotton growing areas of Pakistan. Sequence diversity in the DNA barcode region (mtCOI-5') was examined in 593 whiteflies from Pakistan to determine the number of whitefly species and their distributions in the cotton-growing areas of Punjab and Sindh provinces. These new records were integrated with another 173 barcode sequences for B. tabaci, most from India, to better understand regional whitefly diversity. The Barcode Index Number (BIN) System assigned the 766 sequences to 15 BINs, including nine from Pakistan. Representative specimens of each Pakistan BIN were analyzed for mtCOI-3' to allow their assignment to one of the putative species in the B. tabaci complex recognized on the basis of sequence variation in this gene region. This analysis revealed the presence of Asia II 1, Middle East-Asia Minor 1, Asia 1, Asia II 5, Asia II 7, and a new lineage "Pakistan". The first two taxa were found in both Punjab and Sindh, but Asia 1 was only detected in Sindh, while Asia II 5, Asia II 7 and "Pakistan" were only present in Punjab. The haplotype networks showed that most haplotypes of Asia II 1, a species implicated in transmission of the cotton leaf curl virus, occurred in both India and Pakistan. DNA barcodes successfully discriminated cryptic species in the B. tabaci complex. The dominant haplotypes in the B. tabaci complex were shared by India and Pakistan. Asia II 1 was previously restricted to Punjab, but is now the dominant lineage in southern Sindh; its southward spread may have serious implications for cotton plantations in this region.

  14. De Novo Assembly, Annotation, and Characterization of Root Transcriptomes of Three Caladium Cultivars with a Focus on Necrotrophic Pathogen Resistance/Defense-Related Genes

    PubMed Central

    Cao, Zhe; Deng, Zhanao

    2017-01-01

    Roots are vital to plant survival and crop yield, yet few efforts have been made to characterize the expressed genes in the roots of non-model plants (root transcriptomes). This study was conducted to sequence, assemble, annotate, and characterize the root transcriptomes of three caladium cultivars (Caladium × hortulanum) using RNA-Seq. The caladium cultivars used in this study have different levels of resistance to Pythium myriotylum, the most damaging necrotrophic pathogen to caladium roots. Forty-six to 61 million clean reads were obtained for each caladium root transcriptome. De novo assembly of the reads resulted in approximately 130,000 unigenes. Based on bioinformatic analysis, 71,825 (52.3%) caladium unigenes were annotated for putative functions, 48,417 (67.4%) and 31,417 (72.7%) were assigned to Gene Ontology (GO) and Clusters of Orthologous Groups (COG), respectively, and 46,406 (64.6%) unigenes were assigned to 128 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. A total of 4518 distinct unigenes were observed only in Pythium-resistant “Candidum” roots, of which 98 seemed to be involved in disease resistance and defense responses. In addition, 28,837 simple sequence repeat sites and 44,628 single nucleotide polymorphism sites were identified among the three caladium cultivars. These root transcriptome data will be valuable for further genetic improvement of caladium and related aroids. PMID:28346370

  15. ProteinInferencer: Confident protein identification and multiple experiment comparison for large scale proteomics projects.

    PubMed

    Zhang, Yaoyang; Xu, Tao; Shan, Bing; Hart, Jonathan; Aslanian, Aaron; Han, Xuemei; Zong, Nobel; Li, Haomin; Choi, Howard; Wang, Dong; Acharya, Lipi; Du, Lisa; Vogt, Peter K; Ping, Peipei; Yates, John R

    2015-11-03

    Shotgun proteomics generates valuable information from large-scale and target protein characterizations, including protein expression, protein quantification, protein post-translational modifications (PTMs), protein localization, and protein-protein interactions. Typically, peptides derived from proteolytic digestion, rather than intact proteins, are analyzed by mass spectrometers because peptides are more readily separated, ionized and fragmented. The amino acid sequences of peptides can be interpreted by matching the observed tandem mass spectra to theoretical spectra derived from a protein sequence database. Identified peptides serve as surrogates for their proteins and are often used to establish what proteins were present in the original mixture and to quantify protein abundance. Two major issues exist for assigning peptides to their originating protein. The first issue is maintaining a desired false discovery rate (FDR) when comparing or combining multiple large datasets generated by shotgun analysis and the second issue is properly assigning peptides to proteins when homologous proteins are present in the database. Herein we demonstrate a new computational tool, ProteinInferencer, which can be used for protein inference with both small- or large-scale data sets to produce a well-controlled protein FDR. In addition, ProteinInferencer introduces confidence scoring for individual proteins, which makes protein identifications evaluable. This article is part of a Special Issue entitled: Computational Proteomics. Copyright © 2015. Published by Elsevier B.V.

  16. A Glycomics Platform for the Analysis of Permethylated Oligosaccharide Alditols

    PubMed Central

    Costello, Catherine E.; Contado-Miller, Joy May; Cipollo, John F.

    2007-01-01

    This communication reports the development of an LC/MS platform for the analysis of permethylated oligosaccharide alditols that, for the first time, demonstrates routine online oligosaccharide isomer separation of these compounds prior to introduction into the mass spectrometer. The method leverages a high resolution liquid chromatography system with the superior fragmentation pattern characteristics of permethylated oligosaccharide alditols that are dissociated under low-energy collision conditions using quadrupole orthogonal time-of-flight (QoTOF) instrumentation and up to pseudo MS3 mass spectrometry. Glycoforms, including isomers, are readily identified and their structures assigned. The isomer-specific spectra include highly informative cross-ring and elimination fragments, branch position specific signatures and glycosidic bond fragments, thus facilitating linkage, branch and sequence assignment. The method is sensitive and can be applied using as little as 40 fmol of derivatized oligosaccharide. Because permethylation renders oligosaccharides nearly chemically equivalent in the mass spectrometer, the method is semi-quantitative and, in this regard, is comparable to methods reported using high field NMR and capillary electrophoresis. In this post-genomic age, the importance of glycosylation in biological processes has become clear. The nature of many of the important questions in glycomics is such that sample material is often extremely limited, thus necessitating the development of highly sensitive methods for rigorous structural assignment of the oligosaccharides in complex mixtures. The glycomics platform presented here fulfills these criteria and should lead to more facile glycomics analyses. PMID:17719235

  17. Characterizing visible and invisible cell wall mutant phenotypes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Carpita, Nicholas C.; McCann, Maureen C.

    2015-04-06

    About 10% of a plant's genome is devoted to generating the protein machinery to synthesize, remodel, and deconstruct the cell wall. High-throughput genome sequencing technologies have enabled a reasonably complete inventory of wall-related genes that can be assembled into families of common evolutionary origin. Assigning function to each gene family member has been aided immensely by identification of mutants with visible phenotypes or by chemical and spectroscopic analysis of mutants with ‘invisible’ phenotypes of modified cell wall composition and architecture that do not otherwise affect plant growth or development. This review connects the inference of gene function on the basis of deviation from the wild type in genetic functional analyses to insights provided by modern analytical techniques that have brought us ever closer to elucidating the sequence structures of the major polysaccharide components of the plant cell wall.

  18. A “SMART” Design for Building Individualized Treatment Sequences

    PubMed Central

    Lei, H.; Nahum-Shani, I.; Lynch, K.; Oslin, D.; Murphy, S.A.

    2013-01-01

    Interventions often involve a sequence of decisions. For example, clinicians frequently adapt the intervention to an individual’s outcomes. Altering the intensity and type of intervention over time is crucial for many reasons, such as to obtain improvement if the individual is not responding or to reduce costs and burden when intensive treatment is no longer necessary. Adaptive interventions utilize individual variables (severity, preferences) to adapt the intervention and then dynamically utilize individual outcomes (response to treatment, adherence) to readapt the intervention. The Sequential Multiple Assignment Randomized Trial (SMART) provides high-quality data that can be used to construct adaptive interventions. We review the SMART and highlight its advantages in constructing and revising adaptive interventions as compared to alternative experimental designs. Selected examples of SMART studies are described and compared. A data analysis method is provided and illustrated using data from the Extending Treatment Effectiveness of Naltrexone SMART study. PMID:22224838
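
    The sequential randomization at the heart of a SMART can be sketched in Python as below. This is a generic two-stage illustration, not the design of the naltrexone study cited above: the treatment labels, the response rate and the function name are hypothetical, and in a real trial the response status is an observed outcome rather than a simulated coin flip.

```python
import random

# Hypothetical first-stage options and second-stage options by response status.
FIRST_STAGE = ["treatment A", "treatment B"]
SECOND_STAGE = {
    True: ["continue", "step down"],      # responders
    False: ["intensify", "switch"],       # non-responders
}


def run_smart(n_participants, response_rate, seed=0):
    """Simulate assignments in a two-stage SMART with equal randomization."""
    rng = random.Random(seed)
    records = []
    for pid in range(n_participants):
        first = rng.choice(FIRST_STAGE)            # stage 1 randomization
        responded = rng.random() < response_rate   # stand-in for the observed outcome
        second = rng.choice(SECOND_STAGE[responded])  # stage 2 re-randomization
        records.append({"id": pid, "first": first,
                        "responded": responded, "second": second})
    return records


trial = run_smart(n_participants=200, response_rate=0.4, seed=1)
print(trial[0])
```

    Each path through the two stages (for example, "treatment A", then "switch" for non-responders) corresponds to part of one embedded adaptive intervention, which is what makes the resulting data usable for comparing such interventions.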

  19. Determination of the complete genomic sequence and analysis of the gene products of the virus of Spring Viremia of Carp, a fish rhabdovirus.

    PubMed

    Hoffmann, Bernd; Schütze, Heike; Mettenleiter, Thomas C

    2002-03-20

    The complete genome of spring viremia of carp virus (SVCV) was cloned and the sequence of 11019 nucleotides was determined. It contains five open reading frames (ORFs) encoding the nucleoprotein N; phosphoprotein P; matrix protein M; glycoprotein G; and the viral RNA-dependent RNA polymerase L. Genes are organised in the order typical for rhabdoviruses: 3'-N-P-M-G-L-5'. The short leader and trailer regions of SVCV exhibit inverse complementarity and are similar to the respective 3' and 5' ends of the genome of vesicular stomatitis virus. To verify the predicted open reading frames, proteins were expressed in bacteria and analysed with a polyclonal anti-SVCV serum. Furthermore, monospecific antisera against the distinct viral proteins were generated. Comparison of genome and protein confirms the assignment of SVCV to the genus Vesiculovirus.

  20. [What gene and chromosomes say about the origin and evolution of insects and other arthropods].

    PubMed

    Lukhtanov, V A; Kuznetsova, V G

    2010-09-01

    At the turn of the 21st century, the use of molecular and molecular cytogenetic methods led to revolutionary advances in systematics of insects and other arthropods. Analysis of nuclear and mitochondrial genes, as well as investigation of structural rearrangements in the mitochondrial chromosome, convincingly supported the Pancrustacea hypothesis, according to which insects originated directly from crustaceans, whereas myriapods are not closely related to them. The presence of the specific telomeric motif TTAGG confirmed the monophyletic origin of arthropods (Arthropoda) and the assignment of tongue worms (Pentastomida) to this phylum. Several different types of telomeric sequences have been found within the class of insects. Investigation of the molecular organization of these sequences may shed light on the relationships between the orders Diptera, Siphonaptera, and Mecoptera and on the origin of such enigmatic groups as the orders Strepsiptera, Zoraptera and suborder Coleorrhyncha.
