Sample records for sequence analysis demonstrated

  1. Method and apparatus for enhanced sequencing of complex molecules using surface-induced dissociation in conjunction with mass spectrometric analysis

    DOEpatents

    Laskin, Julia [Richland, WA]; Futrell, Jean H [Richland, WA]

    2008-04-29

    The invention relates to a method and apparatus for enhanced sequencing of complex molecules using surface-induced dissociation (SID) in conjunction with mass spectrometric analysis. Results demonstrate formation of a wide distribution of structure-specific fragments having wide sequence coverage useful for sequencing and identifying the complex molecules.

  2. Novel Molecular Method for Identification of Streptococcus pneumoniae Applicable to Clinical Microbiology and 16S rRNA Sequence-Based Microbiome Studies

    PubMed Central

    Scholz, Christian F. P.; Poulsen, Knud

    2012-01-01

    The close phylogenetic relationship of the important pathogen Streptococcus pneumoniae and several species of commensal streptococci, particularly Streptococcus mitis and Streptococcus pseudopneumoniae, and the recently demonstrated sharing of genes and phenotypic traits previously considered specific for S. pneumoniae hamper the exact identification of S. pneumoniae. Based on sequence analysis of 16S rRNA genes of a collection of 634 streptococcal strains, identified by multilocus sequence analysis, we detected a cytosine at position 203 present in all 440 strains of S. pneumoniae but replaced by an adenosine residue in all strains representing other species of mitis group streptococci. The S. pneumoniae-specific sequence signature could be demonstrated by sequence analysis or indirectly by restriction endonuclease digestion of a PCR amplicon covering the site. The S. pneumoniae-specific signature offers an inexpensive means for validation of the identity of clinical isolates and should be used as an integrated marker in the annotation procedure employed in 16S rRNA-based molecular studies of complex human microbiotas. This may avoid frequent misidentifications such as those we demonstrate to have occurred in previous reports and in reference sequence databases. PMID:22442329
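
    The signature check described above can be sketched as a single position lookup. This is a minimal illustration, not the authors' procedure: it assumes the input 16S rRNA sequence is already expressed in the same coordinate system as the abstract's "position 203" (the reference numbering is not given here), and the function name and return labels are hypothetical.

```python
# Hypothetical helper; names and labels are illustrative only.
SIGNATURE_POSITION = 203  # 1-based position quoted in the abstract


def classify_by_signature(aligned_16s: str) -> str:
    """Report which signature base is found at position 203.

    Assumes `aligned_16s` already uses the numbering of the abstract;
    a real pipeline would first align the read to a reference.
    """
    if len(aligned_16s) < SIGNATURE_POSITION:
        return "undetermined"
    base = aligned_16s[SIGNATURE_POSITION - 1].upper()
    if base == "C":
        return "S. pneumoniae signature"
    if base == "A":
        return "other mitis group signature"
    return "undetermined"
```

    Any base other than C or A, or a sequence too short to cover the site, is reported as undetermined rather than guessed.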

  3. Sequence analysis of cultivated strawberry (Fragaria × ananassa Duch.) using microdissected single somatic chromosomes.

    PubMed

    Yanagi, Tomohiro; Shirasawa, Kenta; Terachi, Mayuko; Isobe, Sachiko

    2017-01-01

    Cultivated strawberry (Fragaria × ananassa Duch.) has homoeologous chromosomes because of allo-octoploidy. For example, two homoeologous chromosomes that belong to different sub-genomes of allopolyploids have similar base sequences. Thus, when conducting de novo assembly of DNA sequences, it is difficult to determine whether these sequences are derived from the same chromosome. To avoid the difficulties associated with homoeologous chromosomes and demonstrate the possibility of sequencing allopolyploids using single chromosomes, we conducted sequence analysis using microdissected single somatic chromosomes of cultivated strawberry. Three hundred and ten somatic chromosomes of the Japanese octoploid strawberry 'Reiko' were individually selected under a light microscope using a microdissection system. DNA from 288 of the dissected chromosomes was successfully amplified using a DNA amplification kit. Using next-generation sequencing, we decoded the base sequences of the amplified DNA segments, and on the basis of mapping, we identified DNA sequences from 144 samples that were best matched to the reference genomes of the octoploid strawberry, F. × ananassa, and the diploid strawberry, F. vesca. The 144 samples were classified into seven pseudo-molecules of F. vesca. The coverage rates of the DNA sequences from the single chromosome onto all pseudo-molecular sequences varied from 3 to 29.9%. We demonstrated an efficient method for sequence analysis of allopolyploid plants using microdissected single chromosomes. On the basis of our results, we believe that whole-genome analysis of allopolyploid plants can be enhanced using methodology that employs microdissected single chromosomes.

  4. mPUMA: a computational approach to microbiota analysis by de novo assembly of operational taxonomic units based on protein-coding barcode sequences.

    PubMed

    Links, Matthew G; Chaban, Bonnie; Hemmingsen, Sean M; Muirhead, Kevin; Hill, Janet E

    2013-08-15

    Formation of operational taxonomic units (OTU) is a common approach to data aggregation in microbial ecology studies based on amplification and sequencing of individual gene targets. The de novo assembly of OTU sequences has been recently demonstrated as an alternative to widely used clustering methods, providing robust information from experimental data alone, without any reliance on an external reference database. Here we introduce mPUMA (microbial Profiling Using Metagenomic Assembly, http://mpuma.sourceforge.net), a software package for identification and analysis of protein-coding barcode sequence data. It was developed originally for Cpn60 universal target sequences (also known as GroEL or Hsp60). Using an unattended process that is independent of external reference sequences, mPUMA forms OTUs by DNA sequence assembly and is capable of tracking OTU abundance. mPUMA processes microbial profiles both in terms of the direct DNA sequence as well as in the translated amino acid sequence for protein coding barcodes. By forming OTUs and calculating abundance through an assembly approach, mPUMA is capable of generating inputs for several popular microbiota analysis tools. Using SFF data from sequencing of a synthetic community of Cpn60 sequences derived from the human vaginal microbiome, we demonstrate that mPUMA can faithfully reconstruct all expected OTU sequences and produce compositional profiles consistent with actual community structure. mPUMA enables analysis of microbial communities while empowering the discovery of novel organisms through OTU assembly.

  5. Assessing copy number from exome sequencing and exome array CGH based on CNV spectrum in a large clinical cohort.

    PubMed

    Retterer, Kyle; Scuffins, Julie; Schmidt, Daniel; Lewis, Rachel; Pineda-Alvarez, Daniel; Stafford, Amanda; Schmidt, Lindsay; Warren, Stephanie; Gibellini, Federica; Kondakova, Anastasia; Blair, Amanda; Bale, Sherri; Matyakhina, Ludmila; Meck, Jeanne; Aradhya, Swaroop; Haverfield, Eden

    2015-08-01

    Detection of copy-number variation (CNV) is important for investigating many genetic disorders. Testing a large clinical cohort by array comparative genomic hybridization provides a deep perspective on the spectrum of pathogenic CNV. In this context, we describe a bioinformatics approach to extract CNV information from whole-exome sequencing and demonstrate its utility in clinical testing. Exon-focused arrays and whole-genome chromosomal microarray analysis were used to test 14,228 and 14,000 individuals, respectively. Based on these results, we developed an algorithm to detect deletions/duplications in whole-exome sequencing data and a novel whole-exome array. In the exon array cohort, we observed a positive detection rate of 2.4% (25 duplications, 318 deletions), of which 39% involved one or two exons. Chromosomal microarray analysis identified 3,345 CNVs affecting single genes (18%). We demonstrate that our whole-exome sequencing algorithm resolves CNVs of three or more exons. These results demonstrate the clinical utility of single-exon resolution in CNV assays. Our whole-exome sequencing algorithm approaches this resolution but is complemented by a whole-exome array to unambiguously identify intragenic CNVs and single-exon changes. These data illustrate the next advancements in CNV analysis through whole-exome sequencing and whole-exome array. Genet Med 17(8), 623-629.

  6. Location analysis for the estrogen receptor-α reveals binding to diverse ERE sequences and widespread binding within repetitive DNA elements

    PubMed Central

    Mason, Christopher E.; Shu, Feng-Jue; Wang, Cheng; Session, Ryan M.; Kallen, Roland G.; Sidell, Neil; Yu, Tianwei; Liu, Mei Hui; Cheung, Edwin; Kallen, Caleb B.

    2010-01-01

    Location analysis for estrogen receptor-α (ERα)-bound cis-regulatory elements was determined in MCF7 cells using chromatin immunoprecipitation (ChIP)-on-chip. Here, we present the estrogen response element (ERE) sequences that were identified at ERα-bound loci and quantify the incidence of ERE sequences under two stringencies of detection: <10% and 10–20% nucleotide deviation from the canonical ERE sequence. We demonstrate that ∼50% of all ERα-bound loci do not have a discernable ERE and show that most ERα-bound EREs are not perfect consensus EREs. Approximately one-third of all ERα-bound ERE sequences reside within repetitive DNA sequences, most commonly of the AluS family. In addition, the 3-bp spacer between the inverted ERE half-sites, rather than being random nucleotides, is C(A/T)G-enriched at bona fide receptor targets. Diverse ERα-bound loci were validated using electrophoretic mobility shift assay and ChIP-polymerase chain reaction (PCR). The functional significance of receptor-bound loci was demonstrated using luciferase reporter assays which proved that repetitive element ERE sequences contribute to enhancer function. ChIP-PCR demonstrated estrogen-dependent recruitment of the coactivator SRC3 to these loci in vivo. Our data demonstrate that ERα binds to widely variant EREs with less sequence specificity than had previously been suspected and that binding at repetitive and nonrepetitive genomic targets is favored by specific trinucleotide spacers. PMID:20047966

  7. Location analysis for the estrogen receptor-alpha reveals binding to diverse ERE sequences and widespread binding within repetitive DNA elements.

    PubMed

    Mason, Christopher E; Shu, Feng-Jue; Wang, Cheng; Session, Ryan M; Kallen, Roland G; Sidell, Neil; Yu, Tianwei; Liu, Mei Hui; Cheung, Edwin; Kallen, Caleb B

    2010-04-01

    Location analysis for estrogen receptor-alpha (ERalpha)-bound cis-regulatory elements was determined in MCF7 cells using chromatin immunoprecipitation (ChIP)-on-chip. Here, we present the estrogen response element (ERE) sequences that were identified at ERalpha-bound loci and quantify the incidence of ERE sequences under two stringencies of detection: <10% and 10-20% nucleotide deviation from the canonical ERE sequence. We demonstrate that approximately 50% of all ERalpha-bound loci do not have a discernable ERE and show that most ERalpha-bound EREs are not perfect consensus EREs. Approximately one-third of all ERalpha-bound ERE sequences reside within repetitive DNA sequences, most commonly of the AluS family. In addition, the 3-bp spacer between the inverted ERE half-sites, rather than being random nucleotides, is C(A/T)G-enriched at bona fide receptor targets. Diverse ERalpha-bound loci were validated using electrophoretic mobility shift assay and ChIP-polymerase chain reaction (PCR). The functional significance of receptor-bound loci was demonstrated using luciferase reporter assays which proved that repetitive element ERE sequences contribute to enhancer function. ChIP-PCR demonstrated estrogen-dependent recruitment of the coactivator SRC3 to these loci in vivo. Our data demonstrate that ERalpha binds to widely variant EREs with less sequence specificity than had previously been suspected and that binding at repetitive and nonrepetitive genomic targets is favored by specific trinucleotide spacers.

  8. Sequence analysis of the L protein of the Ebola 2014 outbreak: Insight into conserved regions and mutations.

    PubMed

    Ayub, Gohar; Waheed, Yasir

    2016-06-01

    The 2014 Ebola outbreak was one of the largest that have occurred; it started in Guinea and spread to Nigeria, Liberia and Sierra Leone. Phylogenetic analysis of the current virus species indicated that this outbreak is the result of a divergent lineage of the Zaire ebolavirus. The L protein of Ebola virus (EBOV) is the catalytic subunit of the RNA‑dependent RNA polymerase complex, which, with VP35, is key for the replication and transcription of viral RNA. Earlier sequence analysis demonstrated that the L protein of all non‑segmented negative‑sense (NNS) RNA viruses consists of six domains containing conserved functional motifs. The aim of the present study was to analyze the presence of these motifs in 2014 EBOV isolates, highlight their function and how they may contribute to the overall pathogenicity of the isolates. For this purpose, 81 2014 EBOV L protein sequences were aligned with 475 other NNS RNA viruses, including Paramyxoviridae and Rhabdoviridae viruses. Phylogenetic analysis of all EBOV outbreak L protein sequences was also performed. Analysis of the amino acid substitutions in the 2014 EBOV outbreak was conducted using sequence analysis. The alignment demonstrated the presence of previously conserved motifs in the 2014 EBOV isolates and novel residues. Notably, all the mutations identified in the 2014 EBOV isolates were tolerant; they were pathogenic, with certain examples occurring within previously determined functional conserved motifs, possibly altering viral pathogenicity, replication and virulence. The phylogenetic analysis demonstrated that all sequences with the exception of the 2014 EBOV sequences were clustered together. The 2014 EBOV outbreak has acquired a great number of mutations, which may explain the reasons behind this unprecedented outbreak. Certain residues critical to the function of the polymerase remain conserved and may be targets for the development of antiviral therapeutic agents.

  9. SSAW: A new sequence similarity analysis method based on the stationary discrete wavelet transform.

    PubMed

    Lin, Jie; Wei, Jing; Adjeroh, Donald; Jiang, Bing-Hua; Jiang, Yue

    2018-05-02

    Alignment-free sequence similarity analysis methods often lead to significant savings in computational time over alignment-based counterparts. A new alignment-free sequence similarity analysis method, called SSAW, is proposed. SSAW stands for Sequence Similarity Analysis using the Stationary Discrete Wavelet Transform (SDWT). It extracts k-mers from a sequence, then maps each k-mer to a complex number field. The resulting series of complex numbers is then transformed into feature vectors using the stationary discrete wavelet transform. After these steps, the original sequence is turned into a feature vector with numeric values, which can then be used for clustering and/or classification. Using two different types of applications, namely clustering and classification, we compared SSAW against the state-of-the-art alignment-free sequence analysis methods. SSAW demonstrates competitive or superior performance in terms of standard indicators, such as accuracy, F-score, precision, and recall. The running time was significantly better in most cases. These results make SSAW a suitable method for sequence analysis, especially given the rapidly increasing volumes of sequence data required by most modern applications.
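
    The three stages named in the abstract (k-mer extraction, mapping to complex numbers, stationary wavelet transform) can be sketched as below. This is a rough illustration under stated assumptions, not the published SSAW method: the nucleotide-to-complex mapping, the choice of a single-level Haar filter with circular boundary handling, and the four summary statistics used as the feature vector are all placeholders, since the abstract does not specify them.

```python
import math

# Placeholder mapping; the abstract does not give SSAW's actual mapping.
BASE_VALUE = {"A": 1 + 0j, "T": -1 + 0j, "C": 0 + 1j, "G": 0 - 1j}


def kmer_series(seq, k=3):
    """Map every overlapping k-mer to one complex number (sum of its bases)."""
    seq = seq.upper()
    if len(seq) < k:
        raise ValueError("sequence shorter than k")
    return [sum(BASE_VALUE[b] for b in seq[i:i + k])
            for i in range(len(seq) - k + 1)]


def haar_swt_level1(series):
    """One level of an undecimated (stationary) Haar transform.

    Outputs have the same length as the input because no downsampling
    is applied; the boundary is handled circularly (an assumption).
    """
    n = len(series)
    root2 = math.sqrt(2)
    approx = [(series[i] + series[(i + 1) % n]) / root2 for i in range(n)]
    detail = [(series[i] - series[(i + 1) % n]) / root2 for i in range(n)]
    return approx, detail


def feature_vector(seq, k=3):
    """Fixed-length numeric summary of the wavelet coefficients."""
    series = kmer_series(seq, k)
    approx, detail = haar_swt_level1(series)
    n = len(series)
    return [
        sum(abs(c) for c in approx) / n,       # mean magnitude, approximation
        sum(abs(c) ** 2 for c in approx) / n,  # mean energy, approximation
        sum(abs(c) for c in detail) / n,       # mean magnitude, detail
        sum(abs(c) ** 2 for c in detail) / n,  # mean energy, detail
    ]
```

    Because the vector length does not depend on sequence length, vectors from different sequences can be passed directly to any standard clustering or classification routine.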

  10. Whole genome sequence analysis of unidentified genetically modified papaya for development of a specific detection method.

    PubMed

    Nakamura, Kosuke; Kondo, Kazunari; Akiyama, Hiroshi; Ishigaki, Takumi; Noguchi, Akio; Katsumata, Hiroshi; Takasaki, Kazuto; Futo, Satoshi; Sakata, Kozue; Fukuda, Nozomi; Mano, Junichi; Kitta, Kazumi; Tanaka, Hidenori; Akashi, Ryo; Nishimaki-Mogami, Tomoko

    2016-08-15

    Identification of transgenic sequences in an unknown genetically modified (GM) papaya (Carica papaya L.) by whole genome sequence analysis was demonstrated. Whole genome sequence data were generated for a GM-positive fresh papaya fruit commodity detected in monitoring using real-time polymerase chain reaction (PCR). The sequences obtained were mapped against an open database for papaya genome sequence. Transgenic construct- and event-specific sequences were identified as a GM papaya developed to resist infection from a Papaya ringspot virus. Based on the transgenic sequences, a specific real-time PCR detection method for GM papaya applicable to various food commodities was developed. Whole genome sequence analysis enabled identifying unknown transgenic construct- and event-specific sequences in GM papaya and development of a reliable method for detecting them in papaya food commodities. Copyright © 2016 Elsevier Ltd. All rights reserved.

  11. Sequence investigation of 34 forensic autosomal STRs with massively parallel sequencing.

    PubMed

    Zhang, Suhua; Niu, Yong; Bian, Yingnan; Dong, Rixia; Liu, Xiling; Bao, Yun; Jin, Chao; Zheng, Hancheng; Li, Chengtao

    2018-05-01

    STRs vary not only in the length of the repeat units and the number of repeats but also in the region with which they conform to an incremental repeat pattern. Massively parallel sequencing (MPS) offers new possibilities in the analysis of STRs since it can simultaneously sequence multiple targets in a single reaction and capture potential internal sequence variations. Here, we sequenced 34 STRs applied in the forensic community of China with a custom-designed panel. MPS performance was evaluated from sequencing reads analysis, concordance study and sensitivity testing. High coverage sequencing data were obtained to determine the constitute ratios and heterozygous balance. No actual inconsistent genotypes were observed between capillary electrophoresis (CE) and MPS, demonstrating the reliability of the panel and the MPS technology. With the sequencing data from the 200 investigated individuals, 346 and 418 alleles were obtained via CE and MPS technologies at the 34 STRs, indicating MPS technology provides higher discrimination than CE detection. The whole study demonstrated that STR genotyping with the custom panel and MPS technology has the potential not only to reveal length and sequence variations but also to satisfy the demands of high throughput and high multiplexing with acceptable sensitivity.
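
    The gap between the CE and MPS allele counts follows from how an allele is defined: CE distinguishes alleles by fragment length only, whereas MPS also separates alleles of equal length that differ in internal sequence. The sketch below illustrates that counting difference; the function name, the locus label, and the sequences in the usage example are made up for illustration and are not data from the study.

```python
from collections import defaultdict


def count_alleles(observations):
    """Count distinct alleles per locus by length and by sequence.

    `observations` is an iterable of (locus, repeat_region_sequence).
    Returns (length_based_total, sequence_based_total) summed over loci.
    """
    by_length = defaultdict(set)
    by_sequence = defaultdict(set)
    for locus, seq in observations:
        by_length[locus].add(len(seq))   # what a length-based assay can see
        by_sequence[locus].add(seq)      # what a sequence-based assay can see
    n_length = sum(len(v) for v in by_length.values())
    n_sequence = sum(len(v) for v in by_sequence.values())
    return n_length, n_sequence


# Hypothetical observations: two same-length alleles differ by one base.
example = [
    ("LOCUS_A", "TCTATCTATCTATCTA"),
    ("LOCUS_A", "TCTATCTATCTGTCTA"),
    ("LOCUS_A", "TCTATCTATCTA"),
]
length_total, sequence_total = count_alleles(example)
```

    The sequence-based total can never be smaller than the length-based total, since every distinct length implies at least one distinct sequence.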

  12. Regioselective Hydration of an Alkene and Analysis of the Alcohol Product by Remote Access NMR: A Classroom Demonstration

    ERIC Educational Resources Information Center

    Smith, Maureen E.; Johnson, Sara L.; Masterson, Douglas S.

    2013-01-01

    A two-part demonstration was conducted in our first-semester organic chemistry course designed to introduce students to the formation of alcohols, regioselective reactions, and analysis of organic products by NMR analysis. This demonstration utilized the oxymercuration-demercuration sequence to prepare an alcohol from an alkene in a Markovnikov…

  13. Identifying Novel Helix-Loop-Helix Genes in "Caenorhabditis elegans" through a Classroom Demonstration of Functional Genomics

    ERIC Educational Resources Information Center

    Griffin, Vernetta; McMiller, Tracee; Jones, Erika; Johnson, Casonya M.

    2003-01-01

    A 14-week, undergraduate-level Genetics and Population Biology course at Morgan State University was modified to include a demonstration of functional genomics in the research laboratory. Students performed a rudimentary sequence analysis of the "Caenorhabditis elegans" genome and further characterized three sequences that were predicted to encode…

  14. Application of discrete Fourier inter-coefficient difference for assessing genetic sequence similarity.

    PubMed

    King, Brian R; Aburdene, Maurice; Thompson, Alex; Warres, Zach

    2014-01-01

    Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherent digital nature of these sequences. DSP methods have demonstrated early success for detection of coding regions in a gene. Recently, these methods are being used to establish DNA gene similarity. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transformation, which can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method that generates a genetic signature for any DNA sequence that is used to generate relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which represents a set of highly similar sequences. We compare phylogenetic trees generated using our technique against trees generated using traditional alignment techniques for similarity and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.
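
    The general idea of a Fourier-based, alignment-free signature can be sketched as follows. This is not the published ICD definition, which the abstract does not give; it is one plausible reading of the name, assuming binary indicator sequences per nucleotide, a power spectrum sampled at a fixed number of frequencies, and a signature made of differences between adjacent spectral coefficients. All function names and the `n_freq` parameter are hypothetical.

```python
import cmath
import math


def spectrum(seq, n_freq=16):
    """Summed power of the four nucleotide indicator sequences.

    The transform is sampled at `n_freq` fixed frequencies so that
    sequences of different lengths yield comparable spectra
    (an assumption made for this sketch).
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    power = []
    for k in range(n_freq):
        total = 0.0
        for base in "ACGT":
            coeff = sum(
                cmath.exp(-2j * cmath.pi * k * n / n_freq)
                for n, b in enumerate(seq) if b == base
            )
            total += abs(coeff) ** 2
        power.append(total / len(seq))  # normalise by sequence length
    return power


def signature(seq, n_freq=16):
    """Differences between adjacent spectral coefficients."""
    p = spectrum(seq, n_freq)
    return [p[k + 1] - p[k] for k in range(n_freq - 1)]


def distance(seq_a, seq_b, n_freq=16):
    """Euclidean distance between two signatures; no alignment needed."""
    sig_a, sig_b = signature(seq_a, n_freq), signature(seq_b, n_freq)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sig_a, sig_b)))
```

    A matrix of such pairwise distances could then be fed to any distance-based tree-building method for comparison against alignment-based trees.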

  15. DendroBLAST: approximate phylogenetic trees in the absence of multiple sequence alignments.

    PubMed

    Kelly, Steven; Maini, Philip K

    2013-01-01

    The rapidly growing availability of genome information has created considerable demand for both fast and accurate phylogenetic inference algorithms. We present a novel method called DendroBLAST for reconstructing phylogenetic dendrograms/trees from protein sequences using BLAST. This method differs from other methods by incorporating a simple model of sequence evolution to test the effect of introducing sequence changes on the reliability of the bipartitions in the inferred tree. Using realistic simulated sequence data we demonstrate that this method produces phylogenetic trees that are more accurate than other commonly-used distance based methods though not as accurate as maximum likelihood methods from good quality multiple sequence alignments. In addition to tests on simulated data, we use DendroBLAST to generate input trees for a supertree reconstruction of the phylogeny of the Archaea. This independent analysis produces an approximate phylogeny of the Archaea that has both high precision and recall when compared to previously published analysis of the same dataset using conventional methods. Taken together these results demonstrate that approximate phylogenetic trees can be produced in the absence of multiple sequence alignments, and we propose that these trees will provide a platform for improving and informing downstream bioinformatic analysis. A web implementation of the DendroBLAST method is freely available for use at http://www.dendroblast.com/.

  16. TaxI: a software tool for DNA barcoding using distance methods

    PubMed Central

    Steinke, Dirk; Vences, Miguel; Salzburger, Walter; Meyer, Axel

    2005-01-01

    DNA barcoding is a promising approach to the diagnosis of biological diversity in which DNA sequences serve as the primary key for information retrieval. Most existing software for evolutionary analysis of DNA sequences was designed for phylogenetic analyses and, hence, those algorithms do not offer appropriate solutions for the rapid, but precise analyses needed for DNA barcoding, and are also unable to process the often large comparative datasets. We developed a flexible software tool for DNA taxonomy, named TaxI. This program calculates sequence divergences between a query sequence (taxon to be barcoded) and each sequence of a dataset of reference sequences defined by the user. Because the analysis is based on separate pairwise alignments this software is also able to work with sequences characterized by multiple insertions and deletions that are difficult to align in large sequence sets (i.e. thousands of sequences) by multiple alignment algorithms because of computational restrictions. Here, we demonstrate the utility of this approach with two datasets of fish larvae and juveniles from Lake Constance and juvenile land snails under different models of sequence evolution. Sets of ribosomal 16S rRNA sequences, characterized by multiple indels, performed as good as or better than cox1 sequence sets in assigning sequences to species, demonstrating the suitability of rRNA genes for DNA barcoding. PMID:16214755
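
    The query-versus-reference comparison described above can be sketched with a pairwise global alignment followed by a divergence calculation. This is an illustrative sketch rather than TaxI's implementation: the Needleman-Wunsch scoring values are arbitrary, the uncorrected p-distance stands in for whichever model of sequence evolution a user would select, and all function names are hypothetical.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def p_distance(a, b):
    """Fraction of mismatched sites among aligned, ungapped columns."""
    al_a, al_b = global_align(a.upper(), b.upper())
    pairs = [(x, y) for x, y in zip(al_a, al_b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def rank_references(query, references):
    """Sort reference names by divergence from the query, closest first."""
    scored = ((name, p_distance(query, seq)) for name, seq in references.items())
    return sorted(scored, key=lambda item: item[1])
```

    Because each reference is aligned to the query separately, indel-rich markers never need to be placed into one large multiple alignment, which is the property the abstract highlights.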

  17. Comparative sequence analysis of Sordaria macrospora and Neurospora crassa as a means to improve genome annotation.

    PubMed

    Nowrousian, Minou; Würtz, Christian; Pöggeler, Stefanie; Kück, Ulrich

    2004-03-01

    One of the most challenging parts of large scale sequencing projects is the identification of functional elements encoded in a genome. Recently, studies of genomes of up to six different Saccharomyces species have demonstrated that a comparative analysis of genome sequences from closely related species is a powerful approach to identify open reading frames and other functional regions within genomes [Science 301 (2003) 71, Nature 423 (2003) 241]. Here, we present a comparison of selected sequences from Sordaria macrospora to their corresponding Neurospora crassa orthologous regions. Our analysis indicates that due to the high degree of sequence similarity and conservation of overall genomic organization, S. macrospora sequence information can be used to simplify the annotation of the N. crassa genome.

  18. Insights into transcriptomes of Big and Low sagebrush

    Treesearch

    Mark D. Huynh; Justin T. Page; Bryce A. Richardson; Joshua A. Udall

    2015-01-01

    We report the sequencing and assembly of three transcriptomes from Big (Artemisia tridentata ssp. wyomingensis and A. tridentata ssp. tridentata) and Low (A. arbuscula ssp. arbuscula) sagebrush. The sequence reads are available in the Sequence Read Archive of NCBI. We demonstrate the utilities of these transcriptomes for gene discovery and phylogenomic analysis. An...

  19. Taxonomic evaluation of putative Streptomyces scabiei strains held in the ARS (NRRL) Culture Collection using multi-locus sequence analysis

    USDA-ARS's Scientific Manuscript database

    Multi-locus sequence analysis has been demonstrated to be a useful tool for identification of Streptomyces species and was previously applied to phylogenetically differentiate the type strains of species pathogenic on potatoes (Solanum tuberosum L.). The ARS Culture Collection (NRRL) contains 43 str...

  20. Molecular Analysis of Dehalococcoides 16S Ribosomal DNA from Chloroethene-Contaminated Sites throughout North America and Europe

    PubMed Central

    Hendrickson, Edwin R.; Payne, Jo Ann; Young, Roslyn M.; Starr, Mark G.; Perry, Michael P.; Fahnestock, Stephen; Ellis, David E.; Ebersole, Richard C.

    2002-01-01

    The environmental distribution of Dehalococcoides group organisms and their association with chloroethene-contaminated sites were examined. Samples from 24 chloroethene-dechlorinating sites scattered throughout North America and Europe were tested for the presence of members of the Dehalococcoides group by using a PCR assay developed to detect Dehalococcoides 16S rRNA gene (rDNA) sequences. Sequences identified by sequence analysis as sequences of members of the Dehalococcoides group were detected at 21 sites. Full dechlorination of chloroethenes to ethene occurred at these sites. Dehalococcoides sequences were not detected in samples from three sites at which partial dechlorination of chloroethenes occurred, where dechlorination appeared to stop at 1,2-cis-dichloroethene. Phylogenetic analysis of the 16S rDNA amplicons confirmed that Dehalococcoides sequences formed a unique 16S rDNA group. These 16S rDNA sequences were divided into three subgroups based on specific base substitution patterns in variable regions 2 and 6 of the Dehalococcoides 16S rDNA sequence. Analyses also demonstrated that specific base substitution patterns were signature patterns. The specific base substitutions distinguished the three sequence subgroups phylogenetically. These results demonstrated that members of the Dehalococcoides group are widely distributed in nature and can be found in a variety of geological formations and in different climatic zones. Furthermore, the association of these organisms with full dechlorination of chloroethenes suggests that they are promising candidates for engineered bioremediation and may be important contributors to natural attenuation of chloroethenes. PMID:11823182

  1. Laser Desorption Mass Spectrometry for DNA Sequencing and Analysis

    NASA Astrophysics Data System (ADS)

    Chen, C. H. Winston; Taranenko, N. I.; Golovlev, V. V.; Isola, N. R.; Allman, S. L.

    1998-03-01

    Rapid DNA sequencing and/or analysis is critically important for biomedical research. In the past, gel electrophoresis has been the primary tool to achieve DNA analysis and sequencing. However, gel electrophoresis is a time-consuming and labor-intensive process. Recently, we have developed and used laser desorption mass spectrometry (LDMS) to achieve sequencing of ss-DNA longer than 100 nucleotides. With LDMS, we succeeded in sequencing DNA in seconds instead of the hours or days required by gel electrophoresis. In addition to sequencing, we also applied LDMS to the detection of DNA probes for hybridization. LDMS was also used to detect short tandem repeats for forensic applications. Clinical applications for disease diagnosis such as cystic fibrosis caused by base deletion and point mutation have also been demonstrated. Experimental details will be presented in the meeting.

  2. Burkholderia cordobensis sp. nov., from agricultural soils.

    PubMed

    Draghi, Walter O; Peeters, Charlotte; Cnockaert, Margo; Snauwaert, Cindy; Wall, Luis G; Zorreguieta, Angeles; Vandamme, Peter

    2014-06-01

    Two Gram-negative, rod-shaped bacteria were isolated from agricultural soils in Córdoba province in central Argentina. Their 16S rRNA gene sequences demonstrated that they belong to the genus Burkholderia, with Burkholderia zhejiangensis as most closely related formally named species; this relationship was confirmed through comparative gyrB sequence analysis. Whole-cell fatty acid analysis supported their assignment to the genus Burkholderia. Burkholderia sp. strain YI23, for which a whole-genome sequence is available, represents the same taxon, as demonstrated by its highly similar 16S rRNA (100% similarity) and gyrB (99.1-99.7%) gene sequences. The results of DNA-DNA hybridization experiments and physiological and biochemical characterization further substantiated the genotypic and phenotypic distinctiveness of the Argentinian soil isolates, for which the name Burkholderia cordobensis sp. nov. is proposed, with strain MMP81(T) ( = LMG 27620(T) = CCUG 64368(T)) as the type strain. © 2014 IUMS.

  3. Secure and robust cloud computing for high-throughput forensic microsatellite sequence analysis and databasing.

    PubMed

    Bailey, Sarah F; Scheible, Melissa K; Williams, Christopher; Silva, Deborah S B S; Hoggan, Marina; Eichman, Christopher; Faith, Seth A

    2017-11-01

    Next-generation Sequencing (NGS) is a rapidly evolving technology with demonstrated benefits for forensic genetic applications, and the strategies to analyze and manage the massive NGS datasets are currently in development. Here, the computing, data storage, connectivity, and security resources of the Cloud were evaluated as a model for forensic laboratory systems that produce NGS data. A complete front-to-end Cloud system was developed to upload, process, and interpret raw NGS data using a web browser dashboard. The system was extensible, demonstrating analysis capabilities of autosomal and Y-STRs from a variety of NGS instrumentation (Illumina MiniSeq and MiSeq, and Oxford Nanopore MinION). NGS data for STRs were concordant with standard reference materials previously characterized with capillary electrophoresis and Sanger sequencing. The computing power of the Cloud was implemented with on-demand auto-scaling to allow multiple file analysis in tandem. The system was designed to store resulting data in a relational database, amenable to downstream sample interpretations and databasing applications following the most recent guidelines in nomenclature for sequenced alleles. Lastly, a multi-layered Cloud security architecture was tested and showed that industry standards for securing data and computing resources were readily applied to the NGS system without disadvantageous effects for bioinformatic analysis, connectivity or data storage/retrieval. The results of this study demonstrate the feasibility of using Cloud-based systems for secured NGS data analysis, storage, databasing, and multi-user distributed connectivity. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. A harmonized immunoassay with liquid chromatography-mass spectrometry analysis in egg allergen determination.

    PubMed

    Nimata, Masaomi; Okada, Hideki; Kurihara, Kei; Sugimoto, Tsukasa; Honjoh, Tsutomu; Kuroda, Kazuhiko; Yano, Takeo; Tachibana, Hirofumi; Shoji, Masahiro

    2018-01-01

    Food allergy is a serious health issue worldwide. Implementing allergen labeling regulations is extremely challenging for regulators, food manufacturers, and analytical kit manufacturers. Here we have developed an "amino acid sequence immunoassay" approach to ELISA. The new ELISA comprises a monoclonal antibody generated via an analyte-specific peptide antigen and a sodium lauryl sulfate/sulfite solution. This combination enables the antibody to access the epitope site in the unfolded analyte protein. The newly developed ELISA recovered 87.1%-106.4% ovalbumin from ovalbumin-incurred model processed foods, thereby demonstrating its applicability to practical egg allergen determination. Furthermore, the comparison of LC-MS/MS and the new ELISA, which targets the amino acid sequence conforming to the LC-MS/MS detection peptide, showed good agreement. Consequently, the harmonization of the two methods was demonstrated. The complementary use of the new ELISA and LC-MS analysis can offer a wide range of practical benefits in terms of ease, cost, accuracy, and efficiency in food allergen analysis. In addition, the new assay is attractive with respect to its easy antigen preparation and predetermined specificity. Graphical abstract: An ELISA combining a monoclonal antibody that targets the amino acid sequence conforming to the LC-MS detection peptide with a protein-unfolding reagent was developed. In ovalbumin determination, the developed ELISA showed good agreement with LC-MS analysis. Consequently, the harmonization of immunoassay with LC-MS analysis by using a common target amino acid sequence was demonstrated.

  5. Deep Sequencing to Identify the Causes of Viral Encephalitis

    PubMed Central

    Chan, Benjamin K.; Wilson, Theodore; Fischer, Kael F.; Kriesel, John D.

    2014-01-01

    Deep sequencing allows for a rapid, accurate characterization of microbial DNA and RNA sequences in many types of samples. Deep sequencing (also called next generation sequencing or NGS) is being developed to assist with the diagnosis of a wide variety of infectious diseases. In this study, seven frozen brain samples from deceased subjects with recent encephalitis were investigated. RNA from each sample was extracted, randomly reverse transcribed and sequenced. The sequence analysis was performed in a blinded fashion and confirmed with pathogen-specific PCR. This analysis successfully identified measles virus sequences in two brain samples and herpes simplex virus type-1 sequences in three brain samples. No pathogen was identified in the other two brain specimens. These results were concordant with pathogen-specific PCR and partially concordant with prior neuropathological examinations, demonstrating that deep sequencing can accurately identify viral infections in frozen brain tissue. PMID:24699691

  6. Population-Sequencing as a Biomarker of Burkholderia mallei and Burkholderia pseudomallei Evolution through Microbial Forensic Analysis.

    PubMed

    Jakupciak, John P; Wells, Jeffrey M; Karalus, Richard J; Pawlowski, David R; Lin, Jeffrey S; Feldman, Andrew B

    2013-01-01

    Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations.

  7. Population-Sequencing as a Biomarker of Burkholderia mallei and Burkholderia pseudomallei Evolution through Microbial Forensic Analysis

    PubMed Central

    Jakupciak, John P.; Wells, Jeffrey M.; Karalus, Richard J.; Pawlowski, David R.; Lin, Jeffrey S.; Feldman, Andrew B.

    2013-01-01

    Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations. PMID:24455204

  8. Confirmation of translatability and functionality certifies the dual endothelin1/VEGFsp receptor (DEspR) protein.

    PubMed

    Herrera, Victoria L M; Steffen, Martin; Moran, Ann Marie; Tan, Glaiza A; Pasion, Khristine A; Rivera, Keith; Pappin, Darryl J; Ruiz-Opazo, Nelson

    2016-06-14

    In contrast to rat and mouse databases, the NCBI gene database lists the human dual-endothelin1/VEGFsp receptor (DEspR, formerly Dear) as a unitary transcribed pseudogene due to a stop [TGA]-codon at codon#14 in automated DNA and RNA sequences. However, re-analysis is needed because prior single-gene studies detected a tryptophan [TGG]-codon#14 by manual Sanger sequencing and demonstrated DEspR translatability and functionality, and because the demonstration of actual non-translatability through expression studies, the standard-of-excellence for pseudogene designation, has not been performed. Re-analysis must meet UNIPROT criteria for demonstration of a protein's existence at the highest (protein) level, which a priori, would override DNA- or RNA-based deductions. To dissect the nucleotide sequence discrepancy, we performed Maxam-Gilbert sequencing and reviewed 727 RNA-seq entries. To comply with the highest level multiple UNIPROT criteria for determining DEspR's existence, we performed various experiments using multiple anti-DEspR monoclonal antibodies (mAbs) targeting distinct DEspR epitopes with one spanning the contested tryptophan [TGG]-codon#14, assessing: (a) DEspR protein expression, (b) predicted full-length protein size, (c) sequence-predicted protein-specific properties beyond codon#14: receptor glycosylation and internalization, (d) protein-partner interactions, and (e) DEspR functionality via DEspR-inhibition effects. Maxam-Gilbert sequencing and some RNA-seq entries demonstrate two guanines, hence a tryptophan [TGG]-codon#14 within a compression site spanning an error-prone compression sequence motif. Western blot analysis using anti-DEspR mAbs targeting distinct DEspR epitopes detects the identical glycosylated 17.5 kDa pull-down protein. Decrease in DEspR-protein size after PNGase-F digest demonstrates post-translational glycosylation, concordant with the consensus-glycosylation site beyond codon#14. As with other small single-transmembrane proteins, mass spectrometry analysis of anti-DEspR mAb pull-down proteins does not detect DEspR, but detects DEspR-protein interactions with proteins implicated in intracellular trafficking and cancer. FACS analyses also detect DEspR-protein in different human cancer stem-like cells (CSCs). DEspR-inhibition studies identify DEspR-roles in CSC survival and growth. Live cell imaging detects fluorescently-labeled anti-DEspR mAb targeted-receptor internalization, concordant with the single internalization-recognition sequence also located beyond codon#14. Data confirm translatability of DEspR, the full-length DEspR protein beyond codon#14, and elucidate DEspR-specific functionality. Along with detection of the tryptophan [TGG]-codon#14 within an error-prone compression site, cumulative data demonstrating DEspR protein existence fulfill multiple UNIPROT criteria, thus refuting its pseudogene designation.

  9. Copy number variants calling for single cell sequencing data by multi-constrained optimization.

    PubMed

    Xu, Bo; Cai, Hongmin; Zhang, Changsheng; Yang, Xi; Han, Guoqiang

    2016-08-01

    Variations in DNA copy number carry important information on genome evolution and regulation of DNA replication in cancer cells. The rapid development of single-cell sequencing technology allows one to explore gene expression heterogeneity among single cells, thus providing important cancer cell evolution information. Single-cell DNA/RNA sequencing data usually have low genome coverage, which requires an extra step of amplification to accumulate enough samples. However, such amplification introduces large bias and makes bioinformatics analysis challenging. Accurately modeling the distribution of sequencing data and effectively suppressing the influence of this bias are the keys to successful variation analysis. Recent advances demonstrate that the technical noise introduced by amplification is more likely to follow a negative binomial distribution, which has the Poisson distribution as a limiting case. Thus, we tackle the problem of CNV detection by formulating it as a quadratic optimization problem involving two constraints, in which the underlying signals are corrupted by Poisson-distributed noise. By imposing the constraints of sparsity and smoothness, the reconstructed read depth signals from single-cell sequencing data are anticipated to fit the CNV patterns more accurately. An efficient numerical solution based on the classical alternating direction minimization method (ADMM) is tailored to solve the proposed model. We demonstrate the advantages of the proposed method using both synthetic and empirical single-cell sequencing data. Our experimental results demonstrate that the proposed method achieves excellent performance and shows high promise with single-cell sequencing data. Crown Copyright © 2016. Published by Elsevier Ltd. All rights reserved.
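
    The abstract does not state the objective function. As a rough illustration only, a generic sparse-and-smooth (fused-lasso-style) formulation of read-depth reconstruction could be written as follows. Every symbol here is an assumption introduced for illustration and is not taken from the paper: y is the normalized read-depth signal over n genomic bins, x is the reconstructed signal, and the two weights trade off sparsity against smoothness.

```latex
% Assumed generic formulation for illustration; not the paper's exact model.
% y: observed (normalized) read depth per bin, x: reconstructed signal,
% \lambda_1: sparsity weight, \lambda_2: smoothness weight.
\min_{x \in \mathbb{R}^{n}} \;
  \tfrac{1}{2}\,\lVert y - x \rVert_2^2
  \;+\; \lambda_1 \lVert x \rVert_1
  \;+\; \lambda_2 \sum_{i=1}^{n-1} \lvert x_{i+1} - x_i \rvert
```

    In formulations of this kind, the first penalty pushes most bins toward the neutral copy-number baseline and the second favors piecewise-constant segments. ADMM-type solvers usually handle the two non-smooth terms by introducing auxiliary variables for each.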

  10. Auditory sequence analysis and phonological skill

    PubMed Central

    Grube, Manon; Kumar, Sukhbinder; Cooper, Freya E.; Turton, Stuart; Griffiths, Timothy D.

    2012-01-01

    This work tests the relationship between auditory and phonological skill in a non-selected cohort of 238 school students (age 11) with the specific hypothesis that sound-sequence analysis would be more relevant to phonological skill than the analysis of basic, single sounds. Auditory processing was assessed across the domains of pitch, time and timbre; a combination of six standard tests of literacy and language ability was used to assess phonological skill. A significant correlation between general auditory and phonological skill was demonstrated, plus a significant, specific correlation between measures of phonological skill and the auditory analysis of short sequences in pitch and time. The data support a limited but significant link between auditory and phonological ability with a specific role for sound-sequence analysis, and provide a possible new focus for auditory training strategies to aid language development in early adolescence. PMID:22951739

  11. Short-Read Sequencing for Genomic Analysis of the Brown Rot Fungus Fibroporia radiculosa

    Treesearch

    J. D. Tang; A. D. Perkins; T. S. Sonstegard; S. G. Schroeder; S. C. Burgess; S. V. Diehl

    2012-01-01

    The feasibility of short-read sequencing for genomic analysis was demonstrated for Fibroporia radiculosa, a copper-tolerant fungus that causes brown rot decay of wood. The effect of read quality on genomic assembly was assessed by filtering Illumina GAIIx reads from a single run of a paired-end library (75-nucleotide read length and 300-bp fragment...

  12. Taxonomic evaluation of putative Streptomyces scabiei strains held in the ARS Culture Collection (NRRL) using multi-locus sequence analysis.

    PubMed

    Labeda, David P

    2016-03-01

    Multi-locus sequence analysis has been demonstrated to be a useful tool for identification of Streptomyces species and was previously applied to phylogenetically differentiate the type strains of species pathogenic on potatoes (Solanum tuberosum L.). The ARS Culture Collection (NRRL) contains 43 strains identified as Streptomyces scabiei deposited at various times since the 1950s and these were subjected to multi-locus sequence analysis utilising partial sequences of the house-keeping genes atpD, gyrB, recA, rpoB and trpB. Phylogenetic analyses confirmed the identity of 17 of these strains as Streptomyces scabiei, 9 of the strains as the potato-pathogenic species Streptomyces europaeiscabiei and 6 strains as potentially new phytopathogenic species. Of the 16 other strains, 12 were identified as members of previously described non-pathogenic Streptomyces species while the remaining 4 strains may represent heretofore unrecognised non-pathogenic species. This study demonstrated the value of this technique for the relatively rapid, simple and sensitive molecular identification of Streptomyces strains held in culture collections.

  13. RIEMS: a software pipeline for sensitive and comprehensive taxonomic classification of reads from metagenomics datasets.

    PubMed

    Scheuch, Matthias; Höper, Dirk; Beer, Martin

    2015-03-03

    Fuelled by the advent and subsequent development of next generation sequencing technologies, metagenomics became a powerful tool for the analysis of microbial communities both scientifically and diagnostically. The biggest challenge is the extraction of relevant information from the huge sequence datasets generated for metagenomics studies. Although a plethora of tools are available, data analysis is still a bottleneck. To overcome the bottleneck of data analysis, we developed an automated computational workflow called RIEMS - Reliable Information Extraction from Metagenomic Sequence datasets. RIEMS assigns every individual read sequence within a dataset taxonomically by cascading different sequence analyses with decreasing stringency of the assignments using various software applications. After completion of the analyses, the results are summarised in a clearly structured result protocol organised taxonomically. The high accuracy and performance of RIEMS analyses were proven in comparison with other tools for metagenomics data analysis using simulated sequencing read datasets. RIEMS has the potential to fill the gap that still exists with regard to data analysis for metagenomics studies. The usefulness and power of RIEMS for the analysis of genuine sequencing datasets was demonstrated with an early version of RIEMS in 2011 when it was used to detect the orthobunyavirus sequences leading to the discovery of Schmallenberg virus.

  14. SACCHARIS: an automated pipeline to streamline discovery of carbohydrate active enzyme activities within polyspecific families and de novo sequence datasets.

    PubMed

    Jones, Darryl R; Thomas, Dallas; Alger, Nicholas; Ghavidel, Ata; Inglis, G Douglas; Abbott, D Wade

    2018-01-01

    Deposition of new genetic sequences in online databases is expanding at an unprecedented rate. As a result, sequence identification continues to outpace functional characterization of carbohydrate active enzymes (CAZymes). In this paradigm, the discovery of enzymes with novel functions is often hindered by high volumes of uncharacterized sequences, particularly when the enzyme sequence belongs to a family that exhibits diverse functional specificities (i.e., polyspecificity). Therefore, to direct sequence-based discovery and characterization of new enzyme activities we have developed an automated in silico pipeline entitled: Sequence Analysis and Clustering of CarboHydrate Active enzymes for Rapid Informed prediction of Specificity (SACCHARIS). This pipeline streamlines the selection of uncharacterized sequences for discovery of new CAZyme or CBM specificity from families currently maintained on the CAZy website or within user-defined datasets. SACCHARIS was used to generate a phylogenetic tree of GH43, a CAZyme family with defined subfamily designations. This analysis confirmed that large datasets can be organized into sequence clusters of manageable sizes that possess related functions. Seeding this tree with a GH43 sequence from Bacteroides dorei DSM 17855 (BdGH43b) revealed it partitioned as a single sequence within the tree. This pattern was consistent with it possessing a unique enzyme activity for GH43, as BdGH43b is the first α-glucanase described for this family. The capacity of SACCHARIS to extract and cluster characterized carbohydrate binding module sequences was demonstrated using family 6 CBMs (i.e., CBM6s). This CBM family displays a polyspecific ligand binding profile and contains many structurally determined members. Using SACCHARIS to identify a cluster of divergent sequences, a CBM6 sequence from a unique clade was demonstrated to bind yeast mannan, which represents the first description of an α-mannan binding CBM. Additionally, we have performed a CAZome analysis of an in-house sequenced bacterial genome and a comparative analysis of B. thetaiotaomicron VPI-5482 and B. thetaiotaomicron 7330, to demonstrate that SACCHARIS can generate "CAZome fingerprints", which differentiate between the saccharolytic potential of two related strains in silico. Establishing sequence-function and sequence-structure relationships in polyspecific CAZyme families is a promising approach for streamlining enzyme discovery. SACCHARIS facilitates this process by embedding CAZyme and CBM family trees generated from biochemically to structurally characterized sequences, with protein sequences that have unknown functions. In addition, these trees can be integrated with user-defined datasets (e.g., genomics, metagenomics, and transcriptomics) to inform experimental characterization of new CAZymes or CBMs not currently curated, and for researchers to compare differential sequence patterns between entire CAZomes. In this light, SACCHARIS provides an in silico tool that can be tailored for enzyme bioprospecting in datasets of increasing complexity and for diverse applications in glycobiotechnology.

  15. Streaming fragment assignment for real-time analysis of sequencing experiments

    PubMed Central

    Roberts, Adam; Pachter, Lior

    2013-01-01

    We present eXpress, a software package for highly efficient probabilistic assignment of ambiguously mapping sequenced fragments. eXpress uses a streaming algorithm with linear run time and constant memory use. It can determine abundances of sequenced molecules in real time, and can be applied to ChIP-seq, metagenomics and other large-scale sequencing data. We demonstrate its use on RNA-seq data, showing greater efficiency than other quantification methods. PMID:23160280
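
    The idea of a single-pass, constant-memory assignment can be sketched as below. This is a deliberately simplified online soft-assignment loop written for illustration; it is not the eXpress implementation, and the function name, the `prior` pseudo-count and the input layout (one tuple of compatible target names per fragment) are all assumptions.

```python
def stream_assign(fragments, targets, prior=1.0):
    """Single-pass soft assignment of ambiguously mapping fragments.

    Hypothetical sketch, not the eXpress algorithm itself.
    fragments: iterable of tuples naming the targets a fragment maps to.
    Returns the fraction of fragment mass attributed to each target.
    Memory grows with the number of targets, not the number of fragments.
    """
    weights = {t: prior for t in targets}   # running abundance estimates
    assigned = {t: 0.0 for t in targets}    # fragment mass assigned so far
    n_fragments = 0
    for compatible in fragments:
        # Split the fragment among its targets in proportion to the
        # abundance estimates available at the time it is seen.
        total = sum(weights[t] for t in compatible)
        posterior = {t: weights[t] / total for t in compatible}
        for t, p in posterior.items():
            weights[t] += p
            assigned[t] += p
        n_fragments += 1
    if n_fragments == 0:
        return {t: 0.0 for t in targets}
    return {t: assigned[t] / n_fragments for t in targets}


# Fragments arrive one at a time; the third one is ambiguous.
reads = iter([("A",), ("A",), ("A", "B"), ("B",)])
print(stream_assign(reads, ["A", "B"]))
```

    Because each fragment is visited once and only per-target totals are stored, run time is linear in the number of fragments. The price of this simplification is that early fragments are assigned using cruder estimates than later ones.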

  16. Mosaic organization of DNA nucleotides

    NASA Technical Reports Server (NTRS)

    Peng, C. K.; Buldyrev, S. V.; Havlin, S.; Simons, M.; Stanley, H. E.; Goldberger, A. L.

    1994-01-01

    Long-range power-law correlations have been reported recently for DNA sequences containing noncoding regions. We address the question of whether such correlations may be a trivial consequence of the known mosaic structure ("patchiness") of DNA. We analyze two classes of controls consisting of patchy nucleotide sequences generated by different algorithms--one without and one with long-range power-law correlations. Although both types of sequences are highly heterogenous, they are quantitatively distinguishable by an alternative fluctuation analysis method that differentiates local patchiness from long-range correlations. Application of this analysis to selected DNA sequences demonstrates that patchiness is not sufficient to account for long-range correlation properties.
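
    A fluctuation analysis that removes local trends before measuring fluctuations can be sketched as follows. This is a minimal illustration in the spirit of detrended fluctuation analysis, not the authors' code: the purine/pyrimidine step rule, the function names and the box sizes are assumptions chosen for the example.

```python
import math
import random


def dna_walk(seq):
    """Map a sequence to a 1-D walk: +1 for pyrimidines, -1 for purines (assumed rule)."""
    step = {"C": 1, "T": 1, "A": -1, "G": -1}
    walk, position = [], 0
    for base in seq.upper():
        position += step.get(base, 0)  # unknown symbols contribute no step
        walk.append(position)
    return walk


def detrended_fluctuation(walk, box):
    """Root-mean-square residual after fitting a line inside each box."""
    n_boxes = len(walk) // box
    if box < 3 or n_boxes < 1:
        raise ValueError("box must be >= 3 and no longer than the walk")
    xs = range(box)
    x_mean = (box - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in xs)
    squared = 0.0
    for b in range(n_boxes):
        seg = walk[b * box:(b + 1) * box]
        y_mean = sum(seg) / box
        slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, seg)) / sxx
        squared += sum((y - (y_mean + slope * (x - x_mean))) ** 2
                       for x, y in zip(xs, seg))
    return math.sqrt(squared / (n_boxes * box))


def scaling_exponent(walk, box_sizes):
    """Least-squares slope of log F(n) against log n."""
    pts = [(math.log(n), math.log(detrended_fluctuation(walk, n)))
           for n in box_sizes]
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    return (sum((x - mx) * (y - my) for x, y in pts)
            / sum((x - mx) ** 2 for x, _ in pts))


# Control: an uncorrelated random sequence (seeded for repeatability).
rng = random.Random(0)
control = "".join(rng.choice("ACGT") for _ in range(20000))
alpha = scaling_exponent(dna_walk(control), [8, 16, 32, 64, 128])
print(f"scaling exponent for the uncorrelated control: {alpha:.2f}")
```

    For an uncorrelated sequence the exponent is expected to lie near 0.5, while long-range correlations would show up as a larger value. Detrending within each box is what keeps locally biased patches from being mistaken for such correlations.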

  17. Determination of the sequences of protein-derived peptides and peptide mixtures by mass spectrometry

    PubMed Central

    Morris, Howard R.; Williams, Dudley H.; Ambler, Richard P.

    1971-01-01

    Micro-quantities of protein-derived peptides have been converted into N-acetylated permethyl derivatives, and their sequences determined by low-resolution mass spectrometry without prior knowledge of their amino acid compositions or lengths. A new strategy is suggested for the mass spectrometric sequencing of oligopeptides or proteins, involving gel filtration of protein hydrolysates and subsequent sequence analysis of peptide mixtures. Finally, results are given that demonstrate for the first time the use of mass spectrometry for the analysis of a protein-derived peptide mixture, again without prior knowledge of the protein or components within the mixture. PMID:5158904

  18. Quantitative mutant analysis of viral quasispecies by chip-based matrix-assisted laser desorption/ionization time-of-flight mass spectrometry

    PubMed Central

    Amexis, Georgios; Oeth, Paul; Abel, Kenneth; Ivshina, Anna; Pelloquin, Francois; Cantor, Charles R.; Braun, Andreas; Chumakov, Konstantin

    2001-01-01

    RNA viruses exist as quasispecies, heterogeneous and dynamic mixtures of mutants having one or more consensus sequences. An adequate description of the genomic structure of such viral populations must include the consensus sequence(s) plus a quantitative assessment of sequence heterogeneities. For example, in quality control of live attenuated viral vaccines, the presence of even small quantities of mutants or revertants may indicate incomplete or unstable attenuation that may influence vaccine safety. Previously, we demonstrated the monitoring of oral poliovirus vaccine with the use of mutant analysis by PCR and restriction enzyme cleavage (MAPREC). In this report, we investigate genetic variation in live attenuated mumps virus vaccine by using both MAPREC and a platform (DNA MassArray) based on matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry. Mumps vaccines prepared from the Jeryl Lynn strain typically contain at least two distinct viral substrains, JL1 and JL2, which have been characterized by full length sequencing. We report the development of assays for characterizing sequence variants in these substrains and demonstrate their use in quantitative analysis of substrains and sequence variations in mixed virus cultures and mumps vaccines. The results obtained from both the MAPREC and MALDI-TOF methods showed excellent correlation. This suggests the potential utility of MALDI-TOF for routine quality control of live viral vaccines and for assessment of genetic stability and quantitative monitoring of genetic changes in other RNA viruses of clinical interest. PMID:11593021

  19. EventThread: Visual Summarization and Stage Analysis of Event Sequence Data.

    PubMed

    Guo, Shunan; Xu, Ke; Zhao, Rongwen; Gotz, David; Zha, Hongyuan; Cao, Nan

    2018-01-01

    Event sequence data such as electronic health records, a person's academic records, or car service records, are ordered series of events which have occurred over a period of time. Analyzing collections of event sequences can reveal common or semantically important sequential patterns. For example, event sequence analysis might reveal frequently used care plans for treating a disease, typical publishing patterns of professors, and the patterns of service that result in a well-maintained car. It is challenging, however, to visually explore large numbers of event sequences, or sequences with large numbers of event types. Existing methods focus on extracting explicitly matching patterns of events using statistical analysis to create stages of event progression over time. However, these methods fail to capture latent clusters of similar but not identical evolutions of event sequences. In this paper, we introduce a novel visualization system named EventThread which clusters event sequences into threads based on tensor analysis and visualizes the latent stage categories and evolution patterns by interactively grouping the threads by similarity into time-specific clusters. We demonstrate the effectiveness of EventThread through usage scenarios in three different application domains and via interviews with an expert user.

  20. Secondary structure prediction and structure-specific sequence analysis of single-stranded DNA.

    PubMed

    Dong, F; Allawi, H T; Anderson, T; Neri, B P; Lyamichev, V I

    2001-08-01

    DNA sequence analysis by oligonucleotide binding is often affected by interference with the secondary structure of the target DNA. Here we describe an approach that improves DNA secondary structure prediction by combining enzymatic probing of DNA by structure-specific 5'-nucleases with an energy minimization algorithm that utilizes the 5'-nuclease cleavage sites as constraints. The method can identify structural differences between two DNA molecules caused by minor sequence variations such as a single nucleotide mutation. It also demonstrates the existence of long-range interactions between DNA regions separated by >300 nt and the formation of multiple alternative structures by a 244 nt DNA molecule. The differences in the secondary structure of DNA molecules revealed by 5'-nuclease probing were used to design structure-specific probes for mutation discrimination that target the regions of structural, rather than sequence, differences. We also demonstrate the performance of structure-specific 'bridge' probes complementary to non-contiguous regions of the target molecule. The structure-specific probes do not require the high stringency binding conditions necessary for methods based on mismatch formation and permit mutation detection at temperatures from 4 to 37 degrees C. Structure-specific sequence analysis is applied for mutation detection in the Mycobacterium tuberculosis katG gene and for genotyping of the hepatitis C virus.

  1. Transcriptome analysis of blueberry using 454 EST sequencing

    USDA-ARS's Scientific Manuscript database

    Blueberry (Vaccinium corymbosum) is a major berry crop in the United States, and one that has great nutritional and economical value. Next generation sequencing methodologies, such as 454, have been demonstrated to be successful and efficient in producing a snap-shot of transcriptional activities du...

  2. Using information content and base frequencies to distinguish mutations from genetic polymorphisms in splice junction recognition sites.

    PubMed

    Rogan, P K; Schneider, T D

    1995-01-01

    Predicting the effects of nucleotide substitutions in human splice sites has been based on analysis of consensus sequences. We used a graphic representation of sequence conservation and base frequency, the sequence logo, to demonstrate that a change in a splice acceptor of hMSH2 (a gene associated with familial nonpolyposis colon cancer) probably does not reduce splicing efficiency. This confirms a population genetic study that suggested that this substitution is a genetic polymorphism. The information theory-based sequence logo is quantitative and more sensitive than the corresponding splice acceptor consensus sequence for detection of true mutations. Information analysis may potentially be used to distinguish polymorphisms from mutations in other types of transcriptional, translational, or protein-coding motifs.
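
    The per-position information content that underlies a sequence logo can be computed as in the sketch below. It is a minimal illustration under stated assumptions: the toy alignment and function name are hypothetical, and the small-sample correction normally applied to real splice-site data is omitted.

```python
import math
from collections import Counter


def information_per_position(aligned_sites, alphabet="ACGT"):
    """Information content (bits) of each column of equal-length aligned sites.

    Uses log2(len(alphabet)) minus the Shannon entropy of the column.
    The small-sample correction is left out of this sketch.
    """
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all sites must have the same length")
    max_bits = math.log2(len(alphabet))
    info = []
    for col in range(length):
        counts = Counter(site[col] for site in aligned_sites)
        total = sum(counts[base] for base in alphabet)
        if total == 0:          # column holds no alphabet symbols (e.g. gaps)
            info.append(0.0)
            continue
        entropy = 0.0
        for base in alphabet:
            if counts[base]:
                p = counts[base] / total
                entropy -= p * math.log2(p)
        info.append(max_bits - entropy)
    return info


# Toy alignment (hypothetical data): column 0 is invariant, column 1 is not.
sites = ["AG", "AG", "AG", "AC"]
print(information_per_position(sites))
```

    A fully conserved DNA position carries 2 bits, and a position where all four bases are equally frequent carries 0 bits. A substitution at a low-information position is therefore less likely to disrupt recognition than one at a highly conserved position, which a plain consensus sequence cannot express.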

  3. Sequence heterogeneities of genes encoding 16S rRNAs in Paenibacillus polymyxa detected by temperature gradient gel electrophoresis.

    PubMed Central

    Nübel, U; Engelen, B; Felske, A; Snaidr, J; Wieshuber, A; Amann, R I; Ludwig, W; Backhaus, H

    1996-01-01

    Sequence heterogeneities in 16S rRNA genes from individual strains of Paenibacillus polymyxa were detected by sequence-dependent separation of PCR products by temperature gradient gel electrophoresis (TGGE). A fragment of the 16S rRNA genes, comprising variable regions V6 to V8, was used as a target sequence for amplifications. PCR products from P. polymyxa (type strain) emerged as a well-defined pattern of bands in the gradient gel. Six plasmids with different inserts, individually demonstrating the migration characteristics of single bands of the pattern, were obtained by cloning the PCR products. Their sequences were analyzed as a representative sample of the total heterogeneity. A total of 10 variant nucleotide positions in the fragment of 347 bp was observed, with all substitutions conserving the relevant secondary structures of the V6 and V8 regions in the RNA molecules. Hybridizations with specifically designed probes demonstrated different chromosomal locations of the respective rRNA genes. Amplifications of reverse-transcribed rRNA from ribosome preparations, as well as whole-cell hybridizations, revealed a predominant representation of particular sequences in ribosomes of exponentially growing laboratory cultures. Different strains of P. polymyxa showed not only remarkably differing patterns of PCR products in TGGE analysis but also discriminative whole-cell labeling with the designed oligonucleotide probes, indicating the different representation of individual sequences in active ribosomes. Our results demonstrate the usefulness of TGGE for the structural analysis of heterogeneous rRNA genes together with their expression, highlight the problems of generating meaningful data for 16S rRNA sequences and probe designs, and might have consequences for evolutionary concepts. PMID:8824607

  4. Pulseq-Graphical Programming Interface: Open source visual environment for prototyping pulse sequences and integrated magnetic resonance imaging algorithm development.

    PubMed

    Ravi, Keerthi Sravan; Potdar, Sneha; Poojar, Pavan; Reddy, Ashok Kumar; Kroboth, Stefan; Nielsen, Jon-Fredrik; Zaitsev, Maxim; Venkatesan, Ramesh; Geethanath, Sairam

    2018-03-11

    To provide a single open-source platform for comprehensive MR algorithm development inclusive of simulations, pulse sequence design and deployment, reconstruction, and image analysis. We integrated the "Pulseq" platform for vendor-independent pulse programming with Graphical Programming Interface (GPI), a scientific development environment based on Python. Our integrated platform, Pulseq-GPI, permits sequences to be defined visually and exported to the Pulseq file format for execution on an MR scanner. For comparison, Pulseq files using either MATLAB only ("MATLAB-Pulseq") or Python only ("Python-Pulseq") were generated. We demonstrated three fundamental sequences on a 1.5 T scanner. Execution times of the three variants of implementation were compared on two operating systems. In vitro phantom images indicate equivalence with the vendor supplied implementations and MATLAB-Pulseq. The examples demonstrated in this work illustrate the unifying capability of Pulseq-GPI. The execution times of all three implementations were fast (a few seconds). The software is capable of user-interface based development and/or command line programming. The tool demonstrated here, Pulseq-GPI, integrates the open-source simulation, reconstruction and analysis capabilities of GPI Lab with the pulse sequence design and deployment features of Pulseq. Current and future work includes providing an ISMRMRD interface and incorporating Specific Absorption Rate and Peripheral Nerve Stimulation computations. Copyright © 2018 Elsevier Inc. All rights reserved.

  5. Cytochrome cd1-containing nitrite reductase encoding gene nirS as a new functional biomarker for detection of anaerobic ammonium oxidizing (Anammox) bacteria.

    PubMed

    Li, Meng; Ford, Tim; Li, Xiaoyan; Gu, Ji-Dong

    2011-04-15

    A newly designed primer set (AnnirS), together with a previously published primer set (ScnirS), was used to detect anammox bacterial nirS genes from sediments collected from three marine environments. Phylogenetic analysis demonstrated that all retrieved sequences were clearly different from typical denitrifiers' nirS, but do group together with the known anammox bacterial nirS. Sequences targeted by ScnirS are closely related to Scalindua nirS genes recovered from the Peruvian oxygen minimum zone (OMZ), whereas sequences targeted by AnnirS are more closely affiliated with the nirS of Candidatus 'Kuenenia stuttgartiensis' and even form a new phylogenetic nirS clade, which might be related to other genera of the anammox bacteria. Analysis demonstrated that retrieved sequences had higher sequence identities (>60%) with known anammox bacterial nirS genes than with denitrifiers' nirS, on both nucleotide and amino acid levels. Compared to the 16S rRNA and hydrazine oxidoreductase (hzo) genes, the anammox bacterial nirS not only showed consistent phylogenetic relationships but also demonstrated more reliable quantification of anammox bacteria because of the single copy of the nirS gene in the anammox bacterial genome and the specificity of PCR primers for different genera of anammox bacteria, thus providing a suitable functional biomarker for investigation of anammox bacteria.
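
    The sequence identity comparison mentioned above (>60% with known anammox nirS) can be illustrated with a minimal sketch. It assumes two sequences that are already aligned to the same length, with "-" as the gap character. The function name, the example sequences, and the choice of denominator (all columns where at least one sequence has a residue) are assumptions made here for illustration; the abstract does not say which identity definition or software was used.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Columns where both sequences have a gap are ignored; a residue
    opposite a gap counts as a mismatch (one common convention,
    assumed here, not taken from the abstract).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # column carries no information for this pair
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, not real nirS data.
print(percent_identity("ATGCCGTA--GC", "ATGACGTATTGC"))  # 9 of 12 columns match
```

    The same function works for aligned amino acid sequences, which is how a comparison "on both nucleotide and amino acid levels" could be run from one helper.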

  6. Analysis and Visualization of ChIP-Seq and RNA-Seq Sequence Alignments Using ngs.plot.

    PubMed

    Loh, Yong-Hwee Eddie; Shen, Li

    2016-01-01

    The continual maturation and increasing applications of next-generation sequencing technology in scientific research have yielded ever-increasing amounts of data that need to be effectively and efficiently analyzed and innovatively mined for new biological insights. We have developed ngs.plot, a quick and easy-to-use bioinformatics tool that performs visualizations of the spatial relationships between sequencing alignment enrichment and specific genomic features or regions. More importantly, ngs.plot is customizable beyond the use of standard genomic feature databases to allow the analysis and visualization of user-specified regions of interest generated by the user's own hypotheses. In this protocol, we demonstrate and explain the use of ngs.plot using command line executions, as well as a web-based workflow on the Galaxy framework. We replicate the underlying commands used in the analysis of a true biological dataset that we had reported and published earlier and demonstrate how ngs.plot can easily generate publication-ready figures. With ngs.plot, users would be able to efficiently and innovatively mine their own datasets without having to be involved in the technical aspects of sequence coverage calculations and genomic databases.

  7. Reproducibility of Illumina platform deep sequencing errors allows accurate determination of DNA barcodes in cells.

    PubMed

    Beltman, Joost B; Urbanus, Jos; Velds, Arno; van Rooij, Nienke; Rohr, Jan C; Naik, Shalin H; Schumacher, Ton N

    2016-04-02

    Next generation sequencing (NGS) of amplified DNA is a powerful tool to describe genetic heterogeneity within cell populations that can both be used to investigate the clonal structure of cell populations and to perform genetic lineage tracing. For applications in which both abundant and rare sequences are biologically relevant, the relatively high error rate of NGS techniques complicates data analysis, as it is difficult to distinguish rare true sequences from spurious sequences that are generated by PCR or sequencing errors. This issue, for instance, applies to cellular barcoding strategies that aim to follow the amount and type of offspring of single cells, by supplying these with unique heritable DNA tags. Here, we use genetic barcoding data from the Illumina HiSeq platform to show that straightforward read threshold-based filtering of data is typically insufficient to filter out spurious barcodes. Importantly, we demonstrate that specific sequencing errors occur at an approximately constant rate across different samples that are sequenced in parallel. We exploit this observation by developing a novel approach to filter out spurious sequences. Application of our new method demonstrates its value in the identification of true sequences amongst spurious sequences in biological data sets.
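
    Why a plain read threshold can fail, and how a roughly constant error rate might be exploited, can be sketched as follows. This is not the authors' method, which the abstract does not describe in detail. It is a simplified stand-in that flags a barcode as spurious when it differs by one base from a far more abundant barcode and its read count is within an assumed error fraction of that parent. The function names, the 1% default, and the example counts are all hypothetical.

```python
from typing import Dict


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between equal-length strings."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def threshold_filter(counts: Dict[str, int], min_reads: int) -> Dict[str, int]:
    """Keep every barcode with at least min_reads reads."""
    return {bc: n for bc, n in counts.items() if n >= min_reads}


def error_aware_filter(counts: Dict[str, int],
                       max_error_fraction: float = 0.01) -> Dict[str, int]:
    """Drop barcodes that look like one-base errors of an abundant parent.

    max_error_fraction is an assumed per-parent error fraction; in
    practice it would have to be estimated from the data.
    """
    kept = {}
    for bc, n in counts.items():
        spurious = any(
            len(parent) == len(bc)
            and hamming(bc, parent) == 1
            and n <= max_error_fraction * parent_n
            for parent, parent_n in counts.items()
            if parent != bc
        )
        if not spurious:
            kept[bc] = n
    return kept


# Hypothetical counts: one abundant barcode, one likely error, one rare true barcode.
counts = {"ACGTACGT": 10000, "ACGTACGA": 60, "TTGGCCAA": 45}
print(threshold_filter(counts, min_reads=50))  # keeps the error, drops the rare barcode
print(error_aware_filter(counts))              # drops the error, keeps the rare barcode
```

    In this toy case the fixed threshold makes both mistakes at once, which is the failure mode the abstract attributes to threshold-based filtering.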

  8. Molecular identification and phylogenetic analysis of Wuchereria bancrofti from human blood samples in Egypt.

    PubMed

    Abdel-Shafi, Iman R; Shoieb, Eman Y; Attia, Samar S; Rubio, José M; Ta-Tang, Thuy-Huong; El-Badry, Ayman A

    2017-03-01

    Lymphatic filariasis (LF) is a serious vector-borne health problem, and Wuchereria bancrofti (W.b) is the major cause of LF worldwide and is focally endemic in Egypt. Identification of filarial infection using traditional morphologic and immunological criteria can be difficult and lead to misdiagnosis. The aim of the present study was molecular detection of W.b in residents in endemic areas in Egypt, sequence variance analysis, and phylogenetic analysis of W.b DNA. Collected blood samples from residents in filariasis endemic areas in five governorates were subjected to semi-nested PCR targeting a repeated DNA sequence, for detection of W.b DNA. PCR products were sequenced; subsequently, a phylogenetic analysis of the obtained sequences was performed. Out of 300 blood samples, W.b DNA was identified in 48 (16%). Sequencing analysis confirmed PCR results identifying only W.b species. Sequence alignment and phylogenetic analysis indicated genetically distinct clusters of W.b among the study population. Study results demonstrated that semi-nested PCR is an effective diagnostic tool for accurate and rapid detection of W.b infections in nano-epidemics and is applicable for samples collected in the daytime as well as the night time. Sequencing of PCR products and phylogenetic analysis revealed three different nucleotide sequence variants. Further genetic studies of W.b in Egypt and other endemic areas are needed to distinguish related strains and the various ecological as well as drug effects exerted on them to support W.b elimination.

  9. Partial sequencing of sodA gene and its application to identification of Streptococcus dysgalactiae subsp. dysgalactiae isolated from farmed fish.

    PubMed

    Nomoto, R; Kagawa, H; Yoshida, T

    2008-01-01

    To investigate the difference between Lancefield group C Streptococcus dysgalactiae (GCSD) strains isolated from diseased fish and animals by sequencing and phylogenetic analysis of the sodA gene. The sodA genes of Strep. dysgalactiae strains isolated from fish and animals were amplified and their nucleotide sequences were determined. Although 100% sequence identity was observed among fish GCSD strains, the determined sequences from animal isolates showed variations against fish isolate sequences. Thus, all fish GCSD strains were clearly separated from the GCSD strains of other origin by using phylogenetic tree analysis. In addition, an original primer set was designed based on the determined sequences to specifically amplify the sodA gene of fish GCSD strains. The primer set yielded amplification products from only fish GCSD strains. By sequencing analysis of the sodA gene, the genetic divergence between Strep. dysgalactiae strains isolated from fish and mammals was demonstrated. Moreover, an original oligonucleotide primer set, which could simply detect the genotype of fish GCSD strains, was designed. This study shows that Strep. dysgalactiae isolated from diseased fish could be distinguished from conventional GCSD strains by the difference in the sequence of the sodA gene.

  10. A 12-year molecular survey of clinical herpes simplex virus type 2 isolates demonstrates the circulation of clade A and B strains in Germany.

    PubMed

    Schmidt-Chanasit, Jonas; Bialonski, Alexandra; Heinemann, Patrick; Ulrich, Rainer G; Günther, Stephan; Rabenau, Holger F; Doerr, Hans Wilhelm

    2010-07-01

    Recently two different herpes simplex virus type 2 (HSV-2) clades (A and B) were described based on DNA sequence data of the glycoprotein E (gE), G (gG) and I (gI) genes. To type the circulating HSV-2 wild-type strains in Germany by a novel approach and to monitor potential changes in the molecular epidemiology between 1997 and 2008. A total of 64 clinical HSV-2 isolates were analyzed by a novel approach using the DNA sequences of the complete open reading frames of glycoprotein B (gB) and gG. Recombination analysis of the gB and gG gene sequences was performed to reveal intragenic recombinants. Based on the phylogenetic analysis of the gB coding DNA sequence, 8 of 64 (12%) isolates were classified as clade A strains and 56 of 64 (88%) isolates were classified as clade B strains. Analysis of the gG coding DNA sequence classified 4 (6%) isolates as clade A strains and 60 (94%) isolates as clade B strains. In comparison, the 8 isolates classified as clade A strains using the gB sequence data were classified as clade B strains when using the gG coding DNA sequence, suggesting intergenic recombination events. Intragenic recombination events were not detected. The first molecular survey of clinical HSV-2 isolates from Germany demonstrated the circulation of clade A and B strains and of intergenic recombinants over a period of 12 years. Copyright (c) 2010 Elsevier B.V. All rights reserved.

  11. Implementation of Amplicon Parallel Sequencing Leads to Improvement of Diagnosis and Therapy of Lung Cancer Patients.

    PubMed

    König, Katharina; Peifer, Martin; Fassunke, Jana; Ihle, Michaela A; Künstlinger, Helen; Heydt, Carina; Stamm, Katrin; Ueckeroth, Frank; Vollbrecht, Claudia; Bos, Marc; Gardizi, Masyar; Scheffler, Matthias; Nogova, Lucia; Leenders, Frauke; Albus, Kerstin; Meder, Lydia; Becker, Kerstin; Florin, Alexandra; Rommerscheidt-Fuss, Ursula; Altmüller, Janine; Kloth, Michael; Nürnberg, Peter; Henkel, Thomas; Bikár, Sven-Ernö; Sos, Martin L; Geese, William J; Strauss, Lewis; Ko, Yon-Dschun; Gerigk, Ulrich; Odenthal, Margarete; Zander, Thomas; Wolf, Jürgen; Merkelbach-Bruse, Sabine; Buettner, Reinhard; Heukamp, Lukas C

    2015-07-01

    The Network Genomic Medicine Lung Cancer was set up to rapidly translate scientific advances into early clinical trials of targeted therapies in lung cancer performing molecular analyses of more than 3500 patients annually. Because sequential analysis of the relevant driver mutations on fixated samples is challenging in terms of workload, tissue availability, and cost, we established multiplex parallel sequencing in routine diagnostics. The aim was to analyze all therapeutically relevant mutations in lung cancer samples in a high-throughput fashion while significantly reducing turnaround time and amount of input DNA compared with conventional dideoxy sequencing of single polymerase chain reaction amplicons. In this study, we demonstrate the feasibility of a 102 amplicon multiplex polymerase chain reaction followed by sequencing on an Illumina sequencer on formalin-fixed paraffin-embedded tissue in routine diagnostics. Analysis of a validation cohort of 180 samples showed this approach to require significantly less input material and to be more reliable, robust, and cost-effective than conventional dideoxy sequencing. Subsequently, 2657 lung cancer patients were analyzed. We observed that comprehensive biomarker testing provided novel information in addition to histological diagnosis and clinical staging. In 2657 consecutively analyzed lung cancer samples, we identified driver mutations at the expected prevalence. Furthermore we found potentially targetable DDR2 mutations at a frequency of 3% in both adenocarcinomas and squamous cell carcinomas. Overall, our data demonstrate the utility of systematic sequencing analysis in a clinical routine setting and highlight the dramatic impact of such an approach on the availability of therapeutic strategies for the targeted treatment of individual cancer patients.

  12. Interaction of healthcare worker hands and portable medical equipment: a sequence analysis to show potential transmission opportunities.

    PubMed

    Jinadatha, Chetan; Villamaria, Frank C; Coppin, John D; Dale, Charles R; Williams, Marjory D; Whitworth, Ryan; Stibich, Mark

    2017-12-28

    While research has demonstrated the importance of a clean health care environment, there is a lack of research on the role portable medical equipment (PME) plays in the transmission cycle of healthcare-acquired infections (HAIs). This study investigated the patterns and sequence of contact events among health care workers, patients, surfaces, and medical equipment in a hospital environment. Research staff observed patient care events over six different 24 h periods on six different hospital units. Each encounter was recorded as a sequence of events and analyzed using sequence analysis and visually represented by network plots. In addition, a point prevalence microbial sample was taken from the computer on wheels (COW). The most touched items during patient care were the individual patient (850), bedrail (375), bed surface (302), and bedside table (223). Three of the top ten most common subsequences included touching PME and the patient: computer on wheels ➔ patient (62 of 274 total sequences, 22.6%, contained this sequence), patient ➔ COW (20.4%), and patient ➔ IV pump (16.1%). The network plots revealed large interconnectedness among objects in the room, the patient, PME, and the healthcare worker. Our results demonstrated that PME such as COW and IV pump were two of the most highly-touched items during patient care. Even with proper hand sanitization and personal protective equipment, this sequence analysis reveals the potential for contamination from the patient and environment, to a vector such as portable medical equipment, and ultimately to another patient in the hospital.
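
    The subsequence figures above are per-encounter support values: the share of recorded sequences that contain a given ordered pair at least once. A minimal sketch of that count follows, assuming each encounter is stored as an ordered list of touched items and that a subsequence means two consecutive touches. The function names and the example encounters are hypothetical, and the study's own software and subsequence definition are not given in the abstract.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def adjacent_pair_support(sequences: Iterable[List[str]]) -> Counter:
    """Count, for each ordered pair of consecutive events, how many
    sequences contain it at least once."""
    support: Counter = Counter()
    for seq in sequences:
        pairs = set(zip(seq, seq[1:]))  # a set, so repeats within one sequence count once
        support.update(pairs)
    return support


def support_percent(count: int, total: int) -> float:
    """Support as a percentage of all sequences, to one decimal place."""
    return round(100 * count / total, 1)


# Hypothetical encounters, not the study's data.
encounters = [
    ["COW", "patient", "bedrail"],
    ["patient", "IV pump", "patient"],
    ["COW", "patient", "COW", "patient"],
    ["bedrail", "patient"],
]
support = adjacent_pair_support(encounters)
print(support[("COW", "patient")], "of", len(encounters))

# Arithmetic check of the figure reported in the abstract.
print(support_percent(62, 274))  # 22.6
```

    Counting each pair once per encounter is what makes "62 of 274 total sequences" a proportion of encounters rather than of individual touches.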

  13. An international effort towards developing standards for best practices in analysis, interpretation and reporting of clinical genome sequencing results in the CLARITY Challenge.

    PubMed

    Brownstein, Catherine A; Beggs, Alan H; Homer, Nils; Merriman, Barry; Yu, Timothy W; Flannery, Katherine C; DeChene, Elizabeth T; Towne, Meghan C; Savage, Sarah K; Price, Emily N; Holm, Ingrid A; Luquette, Lovelace J; Lyon, Elaine; Majzoub, Joseph; Neupert, Peter; McCallie, David; Szolovits, Peter; Willard, Huntington F; Mendelsohn, Nancy J; Temme, Renee; Finkel, Richard S; Yum, Sabrina W; Medne, Livija; Sunyaev, Shamil R; Adzhubey, Ivan; Cassa, Christopher A; de Bakker, Paul I W; Duzkale, Hatice; Dworzyński, Piotr; Fairbrother, William; Francioli, Laurent; Funke, Birgit H; Giovanni, Monica A; Handsaker, Robert E; Lage, Kasper; Lebo, Matthew S; Lek, Monkol; Leshchiner, Ignaty; MacArthur, Daniel G; McLaughlin, Heather M; Murray, Michael F; Pers, Tune H; Polak, Paz P; Raychaudhuri, Soumya; Rehm, Heidi L; Soemedi, Rachel; Stitziel, Nathan O; Vestecka, Sara; Supper, Jochen; Gugenmus, Claudia; Klocke, Bernward; Hahn, Alexander; Schubach, Max; Menzel, Mortiz; Biskup, Saskia; Freisinger, Peter; Deng, Mario; Braun, Martin; Perner, Sven; Smith, Richard J H; Andorf, Janeen L; Huang, Jian; Ryckman, Kelli; Sheffield, Val C; Stone, Edwin M; Bair, Thomas; Black-Ziegelbein, E Ann; Braun, Terry A; Darbro, Benjamin; DeLuca, Adam P; Kolbe, Diana L; Scheetz, Todd E; Shearer, Aiden E; Sompallae, Rama; Wang, Kai; Bassuk, Alexander G; Edens, Erik; Mathews, Katherine; Moore, Steven A; Shchelochkov, Oleg A; Trapane, Pamela; Bossler, Aaron; Campbell, Colleen A; Heusel, Jonathan W; Kwitek, Anne; Maga, Tara; Panzer, Karin; Wassink, Thomas; Van Daele, Douglas; Azaiez, Hela; Booth, Kevin; Meyer, Nic; Segal, Michael M; Williams, Marc S; Tromp, Gerard; White, Peter; Corsmeier, Donald; Fitzgerald-Butt, Sara; Herman, Gail; Lamb-Thrush, Devon; McBride, Kim L; Newsom, David; Pierson, Christopher R; Rakowsky, Alexander T; Maver, Aleš; Lovrečić, Luca; Palandačić, Anja; Peterlin, Borut; Torkamani, Ali; Wedell, Anna; Huss, Mikael; Alexeyenko, Andrey; Lindvall, Jessica M; Magnusson, Måns; Nilsson, Daniel; Stranneheim, Henrik; Taylan, Fulya; Gilissen, Christian; Hoischen, Alexander; van Bon, Bregje; Yntema, Helger; Nelen, Marcel; Zhang, Weidong; Sager, Jason; Zhang, Lu; Blair, Kathryn; Kural, Deniz; Cariaso, Michael; Lennon, Greg G; Javed, Asif; Agrawal, Saloni; Ng, Pauline C; Sandhu, Komal S; Krishna, Shuba; Veeramachaneni, Vamsi; Isakov, Ofer; Halperin, Eran; Friedman, Eitan; Shomron, Noam; Glusman, Gustavo; Roach, Jared C; Caballero, Juan; Cox, Hannah C; Mauldin, Denise; Ament, Seth A; Rowen, Lee; Richards, Daniel R; San Lucas, F Anthony; Gonzalez-Garay, Manuel L; Caskey, C Thomas; Bai, Yu; Huang, Ying; Fang, Fang; Zhang, Yan; Wang, Zhengyuan; Barrera, Jorge; Garcia-Lobo, Juan M; González-Lamuño, Domingo; Llorca, Javier; Rodriguez, Maria C; Varela, Ignacio; Reese, Martin G; De La Vega, Francisco M; Kiruluta, Edward; Cargill, Michele; Hart, Reece K; Sorenson, Jon M; Lyon, Gholson J; Stevenson, David A; Bray, Bruce E; Moore, Barry M; Eilbeck, Karen; Yandell, Mark; Zhao, Hongyu; Hou, Lin; Chen, Xiaowei; Yan, Xiting; Chen, Mengjie; Li, Cong; Yang, Can; Gunel, Murat; Li, Peining; Kong, Yong; Alexander, Austin C; Albertyn, Zayed I; Boycott, Kym M; Bulman, Dennis E; Gordon, Paul M K; Innes, A Micheil; Knoppers, Bartha M; Majewski, Jacek; Marshall, Christian R; Parboosingh, Jillian S; Sawyer, Sarah L; Samuels, Mark E; Schwartzentruber, Jeremy; Kohane, Isaac S; Margulies, David M

    2014-03-25

    There is tremendous potential for genome sequencing to improve clinical diagnosis and care once it becomes routinely accessible, but this will require formalizing research methods into clinical best practices in the areas of sequence data generation, analysis, interpretation and reporting. The CLARITY Challenge was designed to spur convergence in methods for diagnosing genetic disease starting from clinical case history and genome sequencing data. DNA samples were obtained from three families with heritable genetic disorders and genomic sequence data were donated by sequencing platform vendors. The challenge was to analyze and interpret these data with the goals of identifying disease-causing variants and reporting the findings in a clinically useful format. Participating contestant groups were solicited broadly, and an independent panel of judges evaluated their performance. A total of 30 international groups were engaged. The entries reveal a general convergence of practices on most elements of the analysis and interpretation process. However, even given this commonality of approach, only two groups identified the consensus candidate variants in all disease cases, demonstrating a need for consistent fine-tuning of the generally accepted methods. There was greater diversity of the final clinical report content and in the patient consenting process, demonstrating that these areas require additional exploration and standardization. The CLARITY Challenge provides a comprehensive assessment of current practices for using genome sequencing to diagnose and report genetic diseases. There is remarkable convergence in bioinformatic techniques, but medical interpretation and reporting are areas that require further development by many groups.

  14. An international effort towards developing standards for best practices in analysis, interpretation and reporting of clinical genome sequencing results in the CLARITY Challenge

    PubMed Central

    2014-01-01

    Background There is tremendous potential for genome sequencing to improve clinical diagnosis and care once it becomes routinely accessible, but this will require formalizing research methods into clinical best practices in the areas of sequence data generation, analysis, interpretation and reporting. The CLARITY Challenge was designed to spur convergence in methods for diagnosing genetic disease starting from clinical case history and genome sequencing data. DNA samples were obtained from three families with heritable genetic disorders and genomic sequence data were donated by sequencing platform vendors. The challenge was to analyze and interpret these data with the goals of identifying disease-causing variants and reporting the findings in a clinically useful format. Participating contestant groups were solicited broadly, and an independent panel of judges evaluated their performance. Results A total of 30 international groups were engaged. The entries reveal a general convergence of practices on most elements of the analysis and interpretation process. However, even given this commonality of approach, only two groups identified the consensus candidate variants in all disease cases, demonstrating a need for consistent fine-tuning of the generally accepted methods. There was greater diversity of the final clinical report content and in the patient consenting process, demonstrating that these areas require additional exploration and standardization. Conclusions The CLARITY Challenge provides a comprehensive assessment of current practices for using genome sequencing to diagnose and report genetic diseases. There is remarkable convergence in bioinformatic techniques, but medical interpretation and reporting are areas that require further development by many groups. PMID:24667040

  15. CloVR: a virtual machine for automated and portable sequence analysis from the desktop using cloud computing.

    PubMed

    Angiuoli, Samuel V; Matalka, Malcolm; Gussman, Aaron; Galens, Kevin; Vangala, Mahesh; Riley, David R; Arze, Cesar; White, James R; White, Owen; Fricke, W Florian

    2011-08-30

    Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom built virtual machines to distribute pre-packaged, pre-configured software. We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. The CloVR VM and associated architecture lower the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high throughput data processing.

  16. Multiplex sequence analysis demonstrates the competitive growth advantage of the A-to-G mutants of clarithromycin-resistant Helicobacter pylori.

    PubMed

    Wang, G; Rahman, M S; Humayun, M Z; Taylor, D E

    1999-03-01

    Clarithromycin resistance in Helicobacter pylori is due to point mutation within the 23S rRNA. We examined the growth rates of different types of site-directed mutants and demonstrated quantitatively the competitive growth advantage of A-to-G mutants over other types of mutants by a multiplex sequencing assay. The results provide a rational explanation of why A-to-G mutants are predominantly observed among clarithromycin-resistant clinical isolates.

  17. Multiplex Sequence Analysis Demonstrates the Competitive Growth Advantage of the A-to-G Mutants of Clarithromycin-Resistant Helicobacter pylori

    PubMed Central

    Wang, Ge; Rahman, M. Sayeedur; Humayun, M. Zafri; Taylor, Diane E.

    1999-01-01

    Clarithromycin resistance in Helicobacter pylori is due to point mutation within the 23S rRNA. We examined the growth rates of different types of site-directed mutants and demonstrated quantitatively the competitive growth advantage of A-to-G mutants over other types of mutants by a multiplex sequencing assay. The results provide a rational explanation of why A-to-G mutants are predominantly observed among clarithromycin-resistant clinical isolates. PMID:10049289

  18. Molecular evolution of miraculin-like proteins in soybean Kunitz super-family.

    PubMed

    Selvakumar, Purushotham; Gahloth, Deepankar; Tomar, Prabhat Pratap Singh; Sharma, Nidhi; Sharma, Ashwani Kumar

    2011-12-01

    Miraculin-like proteins (MLPs) belong to soybean Kunitz super-family and have been characterized from many plant families like Rutaceae, Solanaceae, Rubiaceae, etc. Many of them possess trypsin inhibitory activity and are involved in plant defense. MLPs exhibit significant sequence identity (~30-95%) to native miraculin protein, also belonging to Kunitz super-family compared with a typical Kunitz family member (~30%). The sequence and structure-function comparison of MLPs with that of a classical Kunitz inhibitor have demonstrated that MLPs have evolved to form a distinct group within Kunitz super-family. Sequence analysis of new genes along with available MLP sequences in the literature revealed three major groups for these proteins. A significant feature of Rutaceae MLP type 2 sequences is the presence of phosphorylation motif. Subtle changes are seen in putative reactive loop residues among different MLPs suggesting altered specificities to specific proteases. In phylogenetic analysis, Rutaceae MLP type 1 and type 2 proteins clustered together on separate branches, whereas native miraculin along with other MLPs formed distinct clusters. Site-specific positive Darwinian selection was observed at many sites in both the groups of Rutaceae MLP sequences with most of the residues undergoing positive selection located in loop regions. The results demonstrate the sequence and thereby the structure-function divergence of MLPs as a distinct group within soybean Kunitz super-family due to biotic and abiotic stresses of local environment.

  19. Oligo Design: a computer program for development of probes for oligonucleotide microarrays.

    PubMed

    Herold, Keith E; Rasooly, Avraham

    2003-12-01

    Oligonucleotide microarrays have demonstrated potential for the analysis of gene expression, genotyping, and mutational analysis. Our work focuses primarily on the detection and identification of bacteria based on known short sequences of DNA. Oligo Design, the software described here, automates several design aspects that enable the improved selection of oligonucleotides for use with microarrays for these applications. Two major features of the program are: (i) a tiling algorithm for the design of short overlapping temperature-matched oligonucleotides of variable length, which are useful for the analysis of single nucleotide polymorphisms and (ii) a set of tools for the analysis of multiple alignments of gene families and related short DNA sequences, which allow for the identification of conserved DNA sequences for PCR primer selection and variable DNA sequences for the selection of unique probes for identification. Note that the program does not address the full genome perspective but, instead, is focused on the genetic analysis of short segments of DNA. The program is Internet-enabled and includes a built-in browser and the automated ability to download sequences from GenBank by specifying the GI number. The program also includes several utilities, including audio recital of a DNA sequence (useful for verifying sequences against a written document), a random sequence generator that provides insight into the relationship between melting temperature and GC content, and a PCR calculator.
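
    The idea of tiling a target with overlapping, temperature-matched oligonucleotides of variable length can be sketched as below. This is not Oligo Design's algorithm, which the abstract does not specify. The sketch assumes the simple Wallace rule (2 °C per A/T, 4 °C per G/C) as the melting temperature estimate, and all function names, parameters, and the example target are hypothetical.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule.

    Only a coarse estimate for short oligos; used here as a stand-in
    for whatever Tm model the program actually applies.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def tile_oligos(target: str, target_tm: int, step: int = 1,
                min_len: int = 8, max_len: int = 30) -> List[Tuple[int, str]]:
    """At each start position, extend the oligo until its estimated Tm
    reaches target_tm, so AT-rich tiles come out longer than GC-rich ones."""
    oligos = []
    for start in range(0, len(target) - min_len + 1, step):
        last_end = min(start + max_len, len(target))
        for end in range(start + min_len, last_end + 1):
            candidate = target[start:end]
            if wallace_tm(candidate) >= target_tm:
                oligos.append((start, candidate))
                break  # shortest oligo meeting the Tm target
    return oligos


# Hypothetical target: AT-rich half followed by a GC-rich half.
for start, oligo in tile_oligos("ATATATATGCGCGCGC", target_tm=30, step=4):
    print(start, oligo, len(oligo), wallace_tm(oligo))
```

    With this target the tiles shrink as the window moves into the GC-rich half while the estimated Tm stays the same, which is the relationship between melting temperature and GC content that the abstract alludes to.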

  20. Direct Detection and Sequencing of Damaged DNA Bases

    PubMed Central

    2011-01-01

    Products of various forms of DNA damage have been implicated in a variety of important biological processes, such as aging, neurodegenerative diseases, and cancer. Therefore, there exists great interest to develop methods for interrogating damaged DNA in the context of sequencing. Here, we demonstrate that single-molecule, real-time (SMRT®) DNA sequencing can directly detect damaged DNA bases in the DNA template - as a by-product of the sequencing method - through an analysis of the DNA polymerase kinetics that are altered by the presence of a modified base. We demonstrate the sequencing of several DNA templates containing products of DNA damage, including 8-oxoguanine, 8-oxoadenine, O6-methylguanine, 1-methyladenine, O4-methylthymine, 5-hydroxycytosine, 5-hydroxyuracil, 5-hydroxymethyluracil, or thymine dimers, and show that these base modifications can be readily detected with single-modification resolution and DNA strand specificity. We characterize the distinct kinetic signatures generated by these DNA base modifications. PMID:22185597

  1. Direct detection and sequencing of damaged DNA bases.

    PubMed

    Clark, Tyson A; Spittle, Kristi E; Turner, Stephen W; Korlach, Jonas

    2011-12-20

    Products of various forms of DNA damage have been implicated in a variety of important biological processes, such as aging, neurodegenerative diseases, and cancer. Therefore, there exists great interest to develop methods for interrogating damaged DNA in the context of sequencing. Here, we demonstrate that single-molecule, real-time (SMRT®) DNA sequencing can directly detect damaged DNA bases in the DNA template - as a by-product of the sequencing method - through an analysis of the DNA polymerase kinetics that are altered by the presence of a modified base. We demonstrate the sequencing of several DNA templates containing products of DNA damage, including 8-oxoguanine, 8-oxoadenine, O6-methylguanine, 1-methyladenine, O4-methylthymine, 5-hydroxycytosine, 5-hydroxyuracil, 5-hydroxymethyluracil, or thymine dimers, and show that these base modifications can be readily detected with single-modification resolution and DNA strand specificity. We characterize the distinct kinetic signatures generated by these DNA base modifications.

  2. Analysis of levels of support and resonance demonstrated by an elite singing teacher

    NASA Astrophysics Data System (ADS)

    Scherer, Ronald C.; Radhakrishnan, Nandhakumar; Poulimenos, Andreas

    2003-04-01

    This was a study of levels of singing expertise demonstrated by an elite operatic singer and teacher. This approach may prove advantageous because the teacher demonstrates what he thinks is important, not what the nonsinging scientist thinks should be important. Two pedagogical sequences were studied: (1) the location of support: glottis (poor), chest (better), abdomen (best); (2) locations of resonance: hard palate/straight tone (poor), mouth (better), sinus/head (best). Measures were obtained for a single frequency (196 Hz), the vowel /ae/, and for mezzo-forte loudness using the /pae pae pae/ technique. Sequence differences: The support sequence was characterized by formant frequency lowering suggestive of vocal tract lengthening. The resonance sequence was characterized by flow (AC, mean flow) and abduction increases. Sequence similarities: The best locations had the widest F2 bandwidths. The better and best locations had the largest dB difference between F2 and F3. Although acoustic power increased through the sequences, the acoustic efficiency was not a discriminating factor. Open and speed quotients were not differentiating. The flow resistance was highest and aerodynamic power the lowest for the first of each sequence. Combined data: The maximum flow declination rate correlated highly with the AC flow (r=-0.92) and SPL (r=0.901).

  3. Clonality and serotypes of Streptococcus mutans among children by multilocus sequence typing

    PubMed Central

    Momeni, Stephanie S.; Whiddon, Jennifer; Cheon, Kyounga; Moser, Stephen A.; Childers, Noel K.

    2015-01-01

    Studies using multilocus sequence typing (MLST) have demonstrated that Streptococcus mutans isolates are genetically diverse. Our laboratory previously demonstrated clonality of S. mutans using MLST but could not discount the possibility of sampling bias. In this study, the clonality of randomly selected S. mutans plaque isolates from African American children was examined using MLST. Serotype and presence of collagen-binding proteins (CBP) cnm/cbm were also assessed. One hundred S. mutans isolates were randomly selected for MLST analysis. Sequence analysis was performed and phylogenetic trees were generated using START2 and MEGA. Thirty-four sequence types (ST) were identified of which 27 were unique to this population. Seventy-five percent of the isolates clustered into 16 clonal groups. Serotypes observed were c (n=84), e (n=3), and k (n=11). The prevalence of S. mutans isolates serotype k was notably high at 17.5%. All isolates were cnm/cbm negative. The clonality of S. mutans demonstrated in this study illustrates the importance of localized population studies and is consistent with transmission. The prevalence of serotype k, a recently proposed systemic pathogen, observed in this study is higher than reported in most populations and is the first report of S. mutans serotype k in a US population. PMID:26443288

  4. Developmental validation of a Nextera XT mitogenome Illumina MiSeq sequencing method for high-quality samples.

    PubMed

    Peck, Michelle A; Sturk-Andreaggi, Kimberly; Thomas, Jacqueline T; Oliver, Robert S; Barritt-Ross, Suzanne; Marshall, Charla

    2018-05-01

    Generating mitochondrial genome (mitogenome) data from reference samples in a rapid and efficient manner is critical to harnessing the greater power of discrimination of the entire mitochondrial DNA (mtDNA) marker. The method of long-range target enrichment, Nextera XT library preparation, and Illumina sequencing on the MiSeq is a well-established technique for generating mitogenome data from high-quality samples. To this end, a validation was conducted for this mitogenome method processing up to 24 samples simultaneously along with analysis in the CLC Genomics Workbench and utilizing the AQME (AFDIL-QIAGEN mtDNA Expert) tool to generate forensic profiles. This validation followed the Federal Bureau of Investigation's Quality Assurance Standards (QAS) for forensic DNA testing laboratories and the Scientific Working Group on DNA Analysis Methods (SWGDAM) validation guidelines. The evaluation of control DNA, non-probative samples, blank controls, mixtures, and nonhuman samples demonstrated the validity of this method. Specifically, the sensitivity was established at ≥25 pg of nuclear DNA input for accurate mitogenome profile generation. Unreproducible low-level variants were observed in samples with low amplicon yields. Further, variant quality was shown to be a useful metric for identifying sequencing error and crosstalk. Success of this method was demonstrated with a variety of reference sample substrates and extract types. These studies further demonstrate the advantages of using NGS techniques by highlighting the quantitative nature of heteroplasmy detection. The results presented herein, from more than 175 samples processed in ten sequencing runs, show this mitogenome sequencing method and analysis strategy to be valid for the generation of reference data. Copyright © 2018 Elsevier B.V. All rights reserved.

  5. Markov model plus k-word distributions: a synergy that produces novel statistical measures for sequence comparison.

    PubMed

    Dai, Qi; Yang, Yanchun; Wang, Tianming

    2008-10-15

    Many proposed statistical measures can efficiently compare biological sequences to further infer their structures, functions and evolutionary information. They are related in spirit because all the ideas for sequence comparison try to use the information on the k-word distributions, Markov model or both. Motivated by adding k-word distributions to Markov model directly, we investigated two novel statistical measures for sequence comparison, called wre.k.r and S2.k.r. The proposed measures were tested by similarity search, evaluation on functionally related regulatory sequences and phylogenetic analysis. This offers the systematic and quantitative experimental assessment of our measures. Moreover, we compared our achievements with those based on alignment or alignment-free methods. We grouped our experiments into two sets. The first one, performed via ROC (receiver operating characteristic) analysis, aims at assessing the intrinsic ability of our statistical measures to search for similar sequences from a database and discriminate functionally related regulatory sequences from unrelated sequences. The second one aims at assessing how well our statistical measure is used for phylogenetic analysis. The experimental assessment demonstrates that our similarity measures intending to incorporate k-word distributions into Markov model are more efficient.
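
    As a rough illustration of the k-word distributions this abstract builds on, the following Python sketch counts overlapping k-words in two sequences and compares their relative frequencies. It is not the wre.k.r or S2.k.r measure, whose definitions are not given here and which also involve a Markov model; the function names, the Euclidean distance, and the toy sequences are assumptions chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def kword_frequencies(seq: str, k: int) -> dict[str, float]:
    """Relative frequency of every overlapping k-word in seq."""
    if k <= 0 or len(seq) < k:
        raise ValueError("k must be positive and no longer than the sequence")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def euclidean_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Euclidean distance between two k-word frequency profiles.

    A deliberately simple stand-in for a real alignment-free measure.
    """
    words = set(p) | set(q)
    return sqrt(sum((p.get(w, 0.0) - q.get(w, 0.0)) ** 2 for w in words))


# Toy sequences, made up for the example.
a = kword_frequencies("ACGTACGT", 2)
b = kword_frequencies("ACGTTGCA", 2)
print(euclidean_distance(a, a))  # identical profiles
print(euclidean_distance(a, b))  # different profiles
```

    Identical profiles give a distance of zero; any difference in k-word usage gives a positive value.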

  6. Power law tails in phylogenetic systems.

    PubMed

    Qin, Chongli; Colwell, Lucy J

    2018-01-23

    Covariance analysis of protein sequence alignments uses coevolving pairs of sequence positions to predict features of protein structure and function. However, current methods ignore the phylogenetic relationships between sequences, potentially corrupting the identification of covarying positions. Here, we use random matrix theory to demonstrate the existence of a power law tail that distinguishes the spectrum of covariance caused by phylogeny from that caused by structural interactions. The power law is essentially independent of the phylogenetic tree topology, depending on just two parameters-the sequence length and the average branch length. We demonstrate that these power law tails are ubiquitous in the large protein sequence alignments used to predict contacts in 3D structure, as predicted by our theory. This suggests that to decouple phylogenetic effects from the interactions between sequence distal sites that control biological function, it is necessary to remove or down-weight the eigenvectors of the covariance matrix with largest eigenvalues. We confirm that truncating these eigenvectors improves contact prediction.

  7. Self-Organizing Hidden Markov Model Map (SOHMMM): Biological Sequence Clustering and Cluster Visualization.

    PubMed

    Ferles, Christos; Beaufort, William-Scott; Ferle, Vanessa

    2017-01-01

    The present study devises mapping methodologies and projection techniques that visualize and demonstrate biological sequence data clustering results. The Sequence Data Density Display (SDDD) and Sequence Likelihood Projection (SLP) visualizations represent the input symbolical sequences in a lower-dimensional space in such a way that the clusters and relations of data elements are depicted graphically. Both operate in combination/synergy with the Self-Organizing Hidden Markov Model Map (SOHMMM). The resulting unified framework is in position to analyze automatically and directly raw sequence data. This analysis is carried out with little, or even complete absence of, prior information/domain knowledge.

  8. Analysis of sequence diversity through internal transcribed spacers and simple sequence repeats to identify Dendrobium species.

    PubMed

    Liu, Y T; Chen, R K; Lin, S J; Chen, Y C; Chin, S W; Chen, F C; Lee, C Y

    2014-04-08

    The Orchidaceae is one of the largest and most diverse families of flowering plants. The Dendrobium genus has high economic potential as ornamental plants and for medicinal purposes. In addition, the species of this genus are able to produce large crops. However, many Dendrobium varieties are very similar in outward appearance, making it difficult to distinguish one species from another. This study demonstrated that the 12 Dendrobium species used in this study may be divided into 2 groups by internal transcribed spacer (ITS) sequence analysis. Red and yellow flowers may also be used to separate these species into 2 main groups. In particular, the deciduous characteristic is associated with the ITS genetic diversity of the A group. Of 53 designed simple sequence repeat (SSR) primer pairs, 7 pairs were polymorphic for polymerase chain reaction products that were amplified from a specific band. The results of this study demonstrate that these 7 SSR primer pairs may potentially be used to identify Dendrobium species and their progeny in future studies.

  9. Isolation and sequence analysis of a novel rhesus macaque foamy virus isolate with a serotype-1-like env.

    PubMed

    Ensser, Armin; Großkopf, Anna K; Mätz-Rensing, Kerstin; Roos, Christian; Hahn, Alexander S

    2018-06-02

    SFVmmu-DPZ9524 represents the third completely sequenced rhesus macaque simian foamy virus (SFV) isolate, alongside SFVmmu_K3T with a similar SFV-1-type env, and R289HybAGM with a SFV-2-like env. Sequence analysis demonstrates that, in gag and pol, SFVmmu-DPZ9524 is more closely related to R289HybAGM than to SFVmmu_K3T, which, outside of env, is more similar to a Japanese macaque isolate than to the other two rhesus macaque isolates SFVmmu-DPZ9524 and R289HybAGM. Further, we identify bel as another recombinant locus in R289HybAGM, confirming that recombination contributes to sequence diversity in SFV.

  10. K2 and K2*: efficient alignment-free sequence similarity measurement based on Kendall statistics.

    PubMed

    Lin, Jie; Adjeroh, Donald A; Jiang, Bing-Hua; Jiang, Yue

    2018-05-15

    Alignment-free sequence comparison methods can compute the pairwise similarity between a huge number of sequences much faster than sequence-alignment based methods. We propose a new non-parametric alignment-free sequence comparison method, called K2, based on the Kendall statistics. Compared to the other state-of-the-art alignment-free comparison methods, K2 demonstrates competitive performance in generating the phylogenetic tree, in evaluating functionally related regulatory sequences, and in computing the edit distance (similarity/dissimilarity) between sequences. Furthermore, the K2 approach is much faster than the other methods. An improved method, K2*, is also proposed, which is able to determine the appropriate algorithmic parameter (length) automatically, without first considering different values. Comparative analysis with the state-of-the-art alignment-free sequence similarity methods demonstrates the superiority of the proposed approaches, especially with increasing sequence length, or increasing dataset sizes. The K2 and K2* approaches are implemented in the R language as a package and are freely available for open access (http://community.wvu.edu/daadjeroh/projects/K2/K2_1.0.tar.gz). Contact: yueljiang@163.com. Supplementary data are available at Bioinformatics online.

  11. Application of representational difference analysis to identify genomic differences between Bradyrhizobium elkanii and B. japonicum species.

    PubMed

    Soares, René Arderius; Passaglia, Luciane Maria Pereira

    2010-10-01

    Bradyrhizobium elkanii is successfully used in the formulation of commercial inoculants and, together with B. japonicum, it fully supplies the plant nitrogen demands. Despite the similarity between B. japonicum and B. elkanii species, several works demonstrated genetic and physiological differences between them. In this work Representational Difference Analysis (RDA) was used for genomic comparison between B. elkanii SEMIA 587, a crop inoculant strain, and B. japonicum USDA 110, a reference strain. Two hundred sequences were obtained. From these, 46 sequences belonged exclusively to the genome of B. elkanii strain, and 154 showed similarity to sequences from B. japonicum genome. From the 46 sequences with no similarity to sequences from B. japonicum, 39 showed no similarity to sequences in public databases and seven showed similarity to sequences of genes coding for known proteins. These seven sequences were divided in three groups: similar to sequences from other Bradyrhizobium strains, similar to sequences from other nitrogen-fixing bacteria, and similar to sequences from non nitrogen-fixing bacteria. These new sequences could be used as DNA markers in order to investigate the rates of genetic material gain and loss in natural Bradyrhizobium strains.

  12. Patterns and Sequences: Interactive Exploration of Clickstreams to Understand Common Visitor Paths.

    PubMed

    Liu, Zhicheng; Wang, Yang; Dontcheva, Mira; Hoffman, Matthew; Walker, Seth; Wilson, Alan

    2017-01-01

    Modern web clickstream data consists of long, high-dimensional sequences of multivariate events, making it difficult to analyze. Following the overarching principle that the visual interface should provide information about the dataset at multiple levels of granularity and allow users to easily navigate across these levels, we identify four levels of granularity in clickstream analysis: patterns, segments, sequences and events. We present an analytic pipeline consisting of three stages: pattern mining, pattern pruning and coordinated exploration between patterns and sequences. Based on this approach, we discuss properties of maximal sequential patterns, propose methods to reduce the number of patterns and describe design considerations for visualizing the extracted sequential patterns and the corresponding raw sequences. We demonstrate the viability of our approach through an analysis scenario and discuss the strengths and limitations of the methods based on user feedback.

  13. The complete and fully assembled genome sequence of Aeromonas salmonicida subsp. pectinolytica and its comparative analysis with other Aeromonas species: investigation of the mobilome in environmental and pathogenic strains.

    PubMed

    Pfeiffer, Friedhelm; Zamora-Lagos, Maria-Antonia; Blettinger, Martin; Yeroslaviz, Assa; Dahl, Andreas; Gruber, Stephan; Habermann, Bianca H

    2018-01-05

    Due to the predominant usage of short-read sequencing to date, most bacterial genome sequences reported in the last years remain at the draft level. This precludes certain types of analyses, such as the in-depth analysis of genome plasticity. Here we report the finalized genome sequence of the environmental strain Aeromonas salmonicida subsp. pectinolytica 34mel, for which only a draft genome with 253 contigs is currently available. Successful completion of the transposon-rich genome critically depended on the PacBio long read sequencing technology. Using finalized genome sequences of A. salmonicida subsp. pectinolytica and other Aeromonads, we report the detailed analysis of the transposon composition of these bacterial species. Mobilome evolution is exemplified by a complex transposon, which has shifted from pathogenicity-related to environmental-related gene content in A. salmonicida subsp. pectinolytica 34mel. Obtaining the complete, circular genome of A. salmonicida subsp. pectinolytica allowed us to perform an in-depth analysis of its mobilome. We demonstrate the mobilome-dependent evolution of this strain's genetic profile from pathogenic to environmental.

  14. Genetic analysis of duck circovirus in Pekin ducks from South Korea.

    PubMed

    Cha, S-Y; Kang, M; Cho, J-G; Jang, H-K

    2013-11-01

    The genetic organization of the 24 duck circovirus (DuCV) strains detected in commercial Pekin ducks from South Korea between 2011 and 2012 is described in this study. Multiple sequence alignment and phylogenetic analyses were performed on the 24 viral genome sequences as well as on 45 genome sequences available from the GenBank database. Phylogenetic analyses based on the genomic and open reading frame 2/cap sequences demonstrated that all DuCV strains belonged to genotype 1 and were designated in a subcluster under genotype 1. Analysis of the capsid protein amino acid sequences of the 24 Korean DuCV strains showed 10 substitutions compared with that of other genotype 1 strains. Our analysis showed that genotype 1 is predominant and circulating in South Korea. These present results serve as incentive to add more data to the DuCV database and provide insight to conduct further intensive study on the geographic relationships among these virus strains.

  15. Infrared thermal facial image sequence registration analysis and verification

    NASA Astrophysics Data System (ADS)

    Chen, Chieh-Li; Jian, Bo-Lin

    2015-03-01

    To study the emotional responses of subjects to the International Affective Picture System (IAPS), infrared thermal facial image sequence is preprocessed for registration before further analysis such that the variance caused by minor and irregular subject movements is reduced. Without affecting the comfort level and inducing minimal harm, this study proposes an infrared thermal facial image sequence registration process that will reduce the deviations caused by the unconscious head shaking of the subjects. A fixed image for registration is produced through the localization of the centroid of the eye region as well as image translation and rotation processes. Thermal image sequencing will then be automatically registered using the two-stage genetic algorithm proposed. The deviation before and after image registration will be demonstrated by image quality indices. The results show that the infrared thermal image sequence registration process proposed in this study is effective in localizing facial images accurately, which will be beneficial to the correlation analysis of psychological information related to the facial area.

  16. Algorithm, applications and evaluation for protein comparison by Ramanujan Fourier transform.

    PubMed

    Zhao, Jian; Wang, Jiasong; Hua, Wei; Ouyang, Pingkai

    2015-12-01

    The amino acid sequence of a protein determines its chemical properties, chain conformation and biological functions. Protein sequence comparison is of great importance to identify similarities of protein structures and infer their functions. Many properties of a protein correspond to the low-frequency signals within the sequence. Low frequency modes in protein sequences are linked to the secondary structures, membrane protein types, and sub-cellular localizations of the proteins. In this paper, we present Ramanujan Fourier transform (RFT) with a fast algorithm to analyze the low-frequency signals of protein sequences. The RFT method is applied to similarity analysis of protein sequences with the Resonant Recognition Model (RRM). The results show that the proposed fast RFT method on protein comparison is more efficient than commonly used discrete Fourier transform (DFT). RFT can detect common frequencies as significant feature for specific protein families, and the RFT spectrum heat-map of protein sequences demonstrates the information conservation in the sequence comparison. The proposed method offers a new tool for pattern recognition, feature extraction and structural analysis on protein sequences. Copyright © 2015 Elsevier Ltd. All rights reserved.
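
    For readers unfamiliar with the transform named in this abstract, the Python sketch below computes the Ramanujan sums c_q(n), the integer-valued periodic sequences that a Ramanujan Fourier transform uses in place of complex exponentials. It shows only these building blocks under the standard definition; it is not the fast algorithm of the paper, and it does not include the numerical encoding of amino acids used with the Resonant Recognition Model. The function name is an assumption.

```python
from math import cos, gcd, pi


def ramanujan_sum(q: int, n: int) -> int:
    """c_q(n): sum of cos(2*pi*k*n/q) over 1 <= k <= q with gcd(k, q) == 1."""
    if q < 1:
        raise ValueError("q must be a positive integer")
    total = sum(
        cos(2 * pi * k * n / q) for k in range(1, q + 1) if gcd(k, q) == 1
    )
    # The exact value is always an integer; rounding removes float error.
    return round(total)


# Each row is periodic in n with period q.
for q in (1, 2, 3, 4):
    print(q, [ramanujan_sum(q, n) for n in range(1, 9)])
```

    Because each c_q(n) repeats with period q, projecting a numerical sequence onto these sums highlights periodic components, which is the sense in which the abstract speaks of low-frequency signals.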

  17. Using relational databases for improved sequence similarity searching and large-scale genomic analyses.

    PubMed

    Mackey, Aaron J; Pearson, William R

    2004-10-01

    Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. Relational databases are essential for management and analysis of large-scale sequence analyses, and can also be used to improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes using relational databases to improve the efficiency of sequence similarity searching and to demonstrate various large-scale genomic analyses of homology-related data. This unit describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. These include basic use of the database to generate a novel sequence library subset, how to extend and use seqdb_demo for the storage of sequence similarity search results and making use of various kinds of stored search results to address aspects of comparative genomic analysis.
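
    The idea of generating a sequence library subset from a relational database can be sketched in Python with the standard sqlite3 module. The table layout, column names, accessions, and sequences below are hypothetical and are not the actual seqdb_demo schema described in the unit; the sketch only shows how an SQL query can select a taxon-specific subset and write it out in FASTA format for a later similarity search.

```python
import sqlite3

# In-memory database with a hypothetical, minimal protein table.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE protein (
        protein_id INTEGER PRIMARY KEY,
        accession  TEXT NOT NULL,
        taxon      TEXT NOT NULL,
        sequence   TEXT NOT NULL
    );
    """
)
# Toy rows, made up for the example.
conn.executemany(
    "INSERT INTO protein (accession, taxon, sequence) VALUES (?, ?, ?)",
    [
        ("demo_1", "Escherichia coli", "MKTAYIAKQR"),
        ("demo_2", "Homo sapiens", "MEEPQSDPSV"),
        ("demo_3", "Escherichia coli", "MSDKIIHLTD"),
    ],
)


def fasta_subset(connection: sqlite3.Connection, taxon: str) -> str:
    """Return all sequences for one taxon as FASTA text."""
    rows = connection.execute(
        "SELECT accession, sequence FROM protein "
        "WHERE taxon = ? ORDER BY accession",
        (taxon,),
    ).fetchall()
    return "".join(f">{acc}\n{seq}\n" for acc, seq in rows)


print(fasta_subset(conn, "Escherichia coli"), end="")
```

    Searching against such a subset instead of the full library reduces the number of comparisons, which is how focusing on likely homologs can improve the statistical significance of hits.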

  18. Sequence-structure mapping errors in the PDB: OB-fold domains

    PubMed Central

    Venclovas, Česlovas; Ginalski, Krzysztof; Kang, Chulhee

    2004-01-01

    The Protein Data Bank (PDB) is the single most important repository of structural data for proteins and other biologically relevant molecules. Therefore, it is critically important to keep the PDB data, as much as possible, error-free. In this study, we have analyzed PDB crystal structures possessing oligonucleotide/oligosaccharide binding (OB)-fold, one of the highly populated folds, for the presence of sequence-structure mapping errors. Using energy-based structure quality assessment coupled with sequence analyses, we have found that there are at least five OB-structures in the PDB that have regions where sequences have been incorrectly mapped onto the structure. We have demonstrated that the combination of these computation techniques is effective not only in detecting sequence-structure mapping errors, but also in providing guidance to correct them. Namely, we have used results of computational analysis to direct a revision of X-ray data for one of the PDB entries containing a fairly inconspicuous sequence-structure mapping error. The revised structure has been deposited with the PDB. We suggest use of computational energy assessment and sequence analysis techniques to facilitate structure determination when homologs having known structure are available to use as a reference. Such computational analysis may be useful in either guiding the sequence-structure assignment process or verifying the sequence mapping within poorly defined regions. PMID:15133161

  19. Studies of a biochemical factory: tomato trichome deep expressed sequence tag sequencing and proteomics.

    PubMed

    Schilmiller, Anthony L; Miner, Dennis P; Larson, Matthew; McDowell, Eric; Gang, David R; Wilkerson, Curtis; Last, Robert L

    2010-07-01

    Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces beta-caryophyllene and alpha-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells.

  20. Studies of a Biochemical Factory: Tomato Trichome Deep Expressed Sequence Tag Sequencing and Proteomics

    PubMed Central

    Schilmiller, Anthony L.; Miner, Dennis P.; Larson, Matthew; McDowell, Eric; Gang, David R.; Wilkerson, Curtis; Last, Robert L.

    2010-01-01

    Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces β-caryophyllene and α-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells. PMID:20431087

  1. BlockLogo: visualization of peptide and sequence motif conservation

    PubMed Central

    Olsen, Lars Rønn; Kudahl, Ulrich Johan; Simon, Christian; Sun, Jing; Schönbach, Christian; Reinherz, Ellis L.; Zhang, Guang Lan; Brusic, Vladimir

    2013-01-01

    BlockLogo is a web-server application for visualization of protein and nucleotide fragments, continuous protein sequence motifs, and discontinuous sequence motifs using calculation of block entropy from multiple sequence alignments. The user input consists of a multiple sequence alignment, selection of motif positions, type of sequence, and output format definition. The output has BlockLogo along with the sequence logo, and a table of motif frequencies. We deployed BlockLogo as an online application and have demonstrated its utility through examples that show visualization of T-cell epitopes and B-cell epitopes (both continuous and discontinuous). Our additional example shows a visualization and analysis of structural motifs that determine specificity of peptide binding to HLA-DR molecules. The BlockLogo server also employs selected experimentally validated prediction algorithms to enable on-the-fly prediction of MHC binding affinity to 15 common HLA class I and class II alleles as well as visual analysis of discontinuous epitopes from multiple sequence alignments. It enables the visualization and analysis of structural and functional motifs that are usually described as regular expressions. It provides a compact view of discontinuous motifs composed of distant positions within biological sequences. BlockLogo is available at: http://research4.dfci.harvard.edu/cvc/blocklogo/ and http://methilab.bu.edu/blocklogo/ PMID:24001880
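
    The block entropy calculation mentioned in this abstract can be sketched in Python as follows. The sketch assumes that a block is the string of residues found at the selected alignment positions, which need not be adjacent, and that block entropy is the Shannon entropy of the observed block frequencies; the exact calculation used by the BlockLogo server may differ. The function name and the toy alignment are made up for illustration.

```python
from collections import Counter
from math import log2


def block_entropy(alignment: list[str], positions: list[int]):
    """Shannon entropy (bits) and frequencies of blocks at given positions.

    alignment: equal-length aligned sequences.
    positions: 0-based column indices forming the motif; they may be
               non-contiguous, giving a discontinuous motif.
    """
    blocks = ["".join(seq[i] for i in positions) for seq in alignment]
    counts = Counter(blocks)
    total = len(blocks)
    frequencies = {block: n / total for block, n in counts.items()}
    entropy = sum(p * log2(1 / p) for p in frequencies.values())
    return entropy, frequencies


# Toy alignment, made up for the example.
alignment = ["ACDEF", "ACDEF", "GCDKF", "GCDKF"]
entropy, freqs = block_entropy(alignment, [0, 3])
print(entropy, freqs)
```

    A fully conserved block has zero entropy, and the frequency table corresponds to the table of motif frequencies that the abstract lists as part of the output.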

  2. Analysis of plant microbe interactions in the era of next generation sequencing technologies

    PubMed Central

    Knief, Claudia

    2014-01-01

    Next generation sequencing (NGS) technologies have impressively accelerated research in biological science during the last years by enabling the production of large volumes of sequence data at a drastically lower price per base, compared to traditional sequencing methods. The recent and ongoing developments in the field allow addressing research questions in plant-microbe biology that were not conceivable just a few years ago. The present review provides an overview of NGS technologies and their usefulness for the analysis of microorganisms that live in association with plants. Possible limitations of the different sequencing systems, in particular sources of errors and bias, are critically discussed and methods are disclosed that help to overcome these shortcomings. A focus will be on the application of NGS methods in metagenomic studies, including the analysis of microbial communities by amplicon sequencing, which can be considered as a targeted metagenomic approach. Different applications of NGS technologies are exemplified by selected research articles that address the biology of the plant associated microbiota to demonstrate the worth of the new methods. PMID:24904612

  3. DArT Markers Effectively Target Gene Space in the Rye Genome

    PubMed Central

    Gawroński, Piotr; Pawełkowicz, Magdalena; Tofil, Katarzyna; Uszyński, Grzegorz; Sharifova, Saida; Ahluwalia, Shivaksh; Tyrka, Mirosław; Wędzony, Maria; Kilian, Andrzej; Bolibok-Brągoszewska, Hanna

    2016-01-01

    Large genome size and complexity hamper considerably the genomics research in relevant species. Rye (Secale cereale L.) has one of the largest genomes among cereal crops and repetitive sequences account for over 90% of its length. Diversity Arrays Technology is a high-throughput genotyping method, in which a preferential sampling of gene-rich regions is achieved through the use of methylation sensitive restriction enzymes. We obtained sequences of 6,177 rye DArT markers and following a redundancy analysis assembled them into 3,737 non-redundant sequences, which were then used in homology searches against five Pooideae sequence sets. In total 515 DArT sequences could be incorporated into publicly available rye genome zippers providing a starting point for the integration of DArT- and transcript-based genomics resources in rye. Using Blast2Go pipeline we attributed putative gene functions to 1101 (29.4%) of the non-redundant DArT marker sequences, including 132 sequences with putative disease resistance-related functions, which were found to be preferentially located in the 4RL and 6RL chromosomes. Comparative analysis based on the DArT sequences revealed obvious inconsistencies between two recently published high density consensus maps of rye. Furthermore we demonstrated that DArT marker sequences can be a source of SSR polymorphisms. Obtained data demonstrate that DArT markers effectively target gene space in the large, complex, and repetitive rye genome. Through the annotation of putative gene functions and the alignment of DArT sequences relative to reference genomes we obtained information, that will complement the results of the studies, where DArT genotyping was deployed, by simplifying the gene ontology and microcolinearity based identification of candidate genes. PMID:27833625

  4. DArT Markers Effectively Target Gene Space in the Rye Genome.

    PubMed

    Gawroński, Piotr; Pawełkowicz, Magdalena; Tofil, Katarzyna; Uszyński, Grzegorz; Sharifova, Saida; Ahluwalia, Shivaksh; Tyrka, Mirosław; Wędzony, Maria; Kilian, Andrzej; Bolibok-Brągoszewska, Hanna

    2016-01-01

    Large genome size and complexity hamper considerably the genomics research in relevant species. Rye (Secale cereale L.) has one of the largest genomes among cereal crops and repetitive sequences account for over 90% of its length. Diversity Arrays Technology is a high-throughput genotyping method, in which a preferential sampling of gene-rich regions is achieved through the use of methylation sensitive restriction enzymes. We obtained sequences of 6,177 rye DArT markers and following a redundancy analysis assembled them into 3,737 non-redundant sequences, which were then used in homology searches against five Pooideae sequence sets. In total 515 DArT sequences could be incorporated into publicly available rye genome zippers providing a starting point for the integration of DArT- and transcript-based genomics resources in rye. Using Blast2Go pipeline we attributed putative gene functions to 1101 (29.4%) of the non-redundant DArT marker sequences, including 132 sequences with putative disease resistance-related functions, which were found to be preferentially located in the 4RL and 6RL chromosomes. Comparative analysis based on the DArT sequences revealed obvious inconsistencies between two recently published high density consensus maps of rye. Furthermore we demonstrated that DArT marker sequences can be a source of SSR polymorphisms. Obtained data demonstrate that DArT markers effectively target gene space in the large, complex, and repetitive rye genome. Through the annotation of putative gene functions and the alignment of DArT sequences relative to reference genomes we obtained information, that will complement the results of the studies, where DArT genotyping was deployed, by simplifying the gene ontology and microcolinearity based identification of candidate genes.

  5. CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing

    PubMed Central

    2011-01-01

    Background Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines to distribute pre-packaged, pre-configured software. Results We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. Conclusion The CloVR VM and associated architecture lower the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high-throughput data processing. PMID:21878105

  6. Identification of multiple mRNA and DNA sequences from small tissue samples isolated by laser-assisted microdissection.

    PubMed

    Bernsen, M R; Dijkman, H B; de Vries, E; Figdor, C G; Ruiter, D J; Adema, G J; van Muijen, G N

    1998-10-01

    Molecular analysis of small tissue samples has become increasingly important in biomedical studies. Using a laser dissection microscope and modified nucleic acid isolation protocols, we demonstrate that multiple mRNA as well as DNA sequences can be identified from a single-cell sample. In addition, we show that the specificity of procurement of tissue samples is not compromised by smear contamination resulting from scraping of the microtome knife during sectioning of lesions. The procedures described herein thus allow for efficient RT-PCR or PCR analysis of multiple nucleic acid sequences from small tissue samples obtained by laser-assisted microdissection.

  7. Transcriptome analysis by strand-specific sequencing of complementary DNA

    PubMed Central

    Parkhomchuk, Dmitri; Borodina, Tatiana; Amstislavskiy, Vyacheslav; Banaru, Maria; Hallen, Linda; Krobitsch, Sylvia; Lehrach, Hans; Soldatov, Alexey

    2009-01-01

    High-throughput complementary DNA sequencing (RNA-Seq) is a powerful tool for whole-transcriptome analysis, supplying information about a transcript's expression level and structure. However, it is difficult to determine the polarity of transcripts, and therefore identify which strand is transcribed. Here, we present a simple cDNA sequencing protocol that preserves information about a transcript's direction. Using Saccharomyces cerevisiae and mouse brain transcriptomes as models, we demonstrate that knowing the transcript's orientation allows more accurate determination of the structure and expression of genes. It also helps to identify new genes and enables studying promoter-associated and antisense transcription. The transcriptional landscapes we obtained are available online. PMID:19620212

  8. Transcriptome analysis by strand-specific sequencing of complementary DNA.

    PubMed

    Parkhomchuk, Dmitri; Borodina, Tatiana; Amstislavskiy, Vyacheslav; Banaru, Maria; Hallen, Linda; Krobitsch, Sylvia; Lehrach, Hans; Soldatov, Alexey

    2009-10-01

    High-throughput complementary DNA sequencing (RNA-Seq) is a powerful tool for whole-transcriptome analysis, supplying information about a transcript's expression level and structure. However, it is difficult to determine the polarity of transcripts, and therefore identify which strand is transcribed. Here, we present a simple cDNA sequencing protocol that preserves information about a transcript's direction. Using Saccharomyces cerevisiae and mouse brain transcriptomes as models, we demonstrate that knowing the transcript's orientation allows more accurate determination of the structure and expression of genes. It also helps to identify new genes and enables studying promoter-associated and antisense transcription. The transcriptional landscapes we obtained are available online.

  9. Cloning and sequence analysis demonstrate the chromate reduction ability of a novel chromate reductase gene from Serratia sp.

    PubMed

    Deng, Peng; Tan, Xiaoqing; Wu, Ying; Bai, Qunhua; Jia, Yan; Xiao, Hong

    2015-03-01

    The ChrT gene encodes a chromate reductase enzyme which catalyzes the reduction of Cr(VI). The chromate reductase is also known as flavin mononucleotide (FMN) reductase (FMN_red). The aim of the present study was to clone the full-length ChrT DNA from Serratia sp. CQMUS2 and analyze the deduced amino acid sequence and three-dimensional structure. The putative ChrT gene fragment of Serratia sp. CQMUS2 was isolated by polymerase chain reaction (PCR), according to the known FMN_red gene sequence from Serratia sp. AS13. The flanking sequences of the ChrT gene were obtained by high efficiency TAIL-PCR, while the full-length gene of ChrT was cloned in Escherichia coli for subsequent sequencing. The nucleotide sequence of ChrT was submitted to GenBank under the accession number KF211434. Sequence analysis of the gene and amino acids was conducted using the Basic Local Alignment Search Tool, and open reading frame (ORF) analysis was performed using ORF Finder software. The ChrT gene was found to be an ORF of 567 bp that encodes a 188-amino acid enzyme with a calculated molecular weight of 20.4 kDa. In addition, the ChrT protein was hypothesized to be an NADPH-dependent FMN_red and a member of the flavodoxin-2 superfamily. The amino acid sequence of ChrT showed high sequence similarity to the FMN reductase genes of Klebsiella pneumoniae and Raoultella ornithinolytica, which belong to the flavodoxin-2 superfamily. Furthermore, ChrT was shown to have an 85.6% similarity to the three-dimensional structure of Escherichia coli ChrR, sharing four common enzyme active sites for chromate reduction. Therefore, ChrT gene cloning and protein structure determination demonstrated the ability of the gene for chromate reduction. The results of the present study provide a basis for further studies on ChrT gene expression and protein function.
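    The relationship between the reported ORF length (567 bp) and protein length (188 amino acids) can be checked with simple arithmetic. The sketch below assumes the 567 bp figure includes the stop codon, which the abstract does not state explicitly; the function names are illustrative and not taken from the study. The mass estimate uses the common rule of thumb of roughly 110 Da per residue, so it only approximates the calculated 20.4 kDa reported above.

```python
def protein_length_from_orf(orf_bp: int, includes_stop: bool = True) -> int:
    """Return the number of amino acids encoded by an ORF of orf_bp base pairs.

    Assumption: when includes_stop is True, the last codon is a stop codon
    and does not contribute a residue.
    """
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


def rough_mass_kda(n_residues: int, avg_residue_da: float = 110.0) -> float:
    """Rough protein mass in kDa using an average residue mass (rule of thumb)."""
    return n_residues * avg_residue_da / 1000.0


if __name__ == "__main__":
    n_aa = protein_length_from_orf(567)   # 567 / 3 = 189 codons, minus the stop
    print(n_aa)                           # 188
    print(round(rough_mass_kda(n_aa), 1)) # approximate only, not the exact calculated weight
```

    An exact molecular weight would require summing the masses of the actual residues in the deduced sequence, which is not reproduced in this record.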

  10. Cloning and sequence analysis demonstrate the chromate reduction ability of a novel chromate reductase gene from Serratia sp

    PubMed Central

    DENG, PENG; TAN, XIAOQING; WU, YING; BAI, QUNHUA; JIA, YAN; XIAO, HONG

    2015-01-01

    The ChrT gene encodes a chromate reductase enzyme which catalyzes the reduction of Cr(VI). The chromate reductase is also known as flavin mononucleotide (FMN) reductase (FMN_red). The aim of the present study was to clone the full-length ChrT DNA from Serratia sp. CQMUS2 and analyze the deduced amino acid sequence and three-dimensional structure. The putative ChrT gene fragment of Serratia sp. CQMUS2 was isolated by polymerase chain reaction (PCR), according to the known FMN_red gene sequence from Serratia sp. AS13. The flanking sequences of the ChrT gene were obtained by high efficiency TAIL-PCR, while the full-length gene of ChrT was cloned in Escherichia coli for subsequent sequencing. The nucleotide sequence of ChrT was submitted to GenBank under the accession number KF211434. Sequence analysis of the gene and amino acids was conducted using the Basic Local Alignment Search Tool, and open reading frame (ORF) analysis was performed using ORF Finder software. The ChrT gene was found to be an ORF of 567 bp that encodes a 188-amino acid enzyme with a calculated molecular weight of 20.4 kDa. In addition, the ChrT protein was hypothesized to be an NADPH-dependent FMN_red and a member of the flavodoxin-2 superfamily. The amino acid sequence of ChrT showed high sequence similarity to the FMN reductase genes of Klebsiella pneumoniae and Raoultella ornithinolytica, which belong to the flavodoxin-2 superfamily. Furthermore, ChrT was shown to have an 85.6% similarity to the three-dimensional structure of Escherichia coli ChrR, sharing four common enzyme active sites for chromate reduction. Therefore, ChrT gene cloning and protein structure determination demonstrated the ability of the gene for chromate reduction. The results of the present study provide a basis for further studies on ChrT gene expression and protein function. PMID:25667630

  11. A Comprehensive Approach to Sequence-oriented IsomiR annotation (CASMIR): demonstration with IsomiR profiling in colorectal neoplasia.

    PubMed

    Wu, Chung Wah; Evans, Jared M; Huang, Shengbing; Mahoney, Douglas W; Dukek, Brian A; Taylor, William R; Yab, Tracy C; Smyrk, Thomas C; Jen, Jin; Kisiel, John B; Ahlquist, David A

    2018-05-25

    MicroRNA (miRNA) profiling is an important step in studying biological associations and identifying marker candidates. miRNA exists in isoforms, called isomiRs, which may exhibit distinct properties. With conventional profiling methods, limitations in assay and analysis platforms may compromise isomiR interrogation. We introduce a comprehensive approach to sequence-oriented isomiR annotation (CASMIR) to allow unbiased identification of global isomiRs from small RNA sequencing data. In this approach, small RNA reads are maintained as independent sequences instead of being summarized under miRNA names. IsomiR features are identified through step-wise local alignment against canonical forms and precursor sequences. Through customizing the reference database, CASMIR is applicable to isomiR annotation across species. To demonstrate its application, we investigated isomiR profiles in normal and neoplastic human colorectal epithelia. We also ran miRDeep2, a popular miRNA analysis algorithm to validate isomiRs annotated by CASMIR. With CASMIR, specific and biologically relevant isomiR patterns could be identified. We note that specific isomiRs are often more abundant than their canonical forms. We identify isomiRs that are commonly up-regulated in both colorectal cancer and advanced adenoma, and illustrate advantages in targeting isomiRs as potential biomarkers over canonical forms. Studying miRNAs at the isomiR level could reveal new insight into miRNA biology and inform assay design for specific isomiRs. CASMIR facilitates comprehensive annotation of isomiR features in small RNA sequencing data for isomiR profiling and differential expression analysis.

  12. The Use of Weighted Graphs for Large-Scale Genome Analysis

    PubMed Central

    Zhou, Fang; Toivonen, Hannu; King, Ross D.

    2014-01-01

    There is an acute need for better tools to extract knowledge from the growing flood of sequence data. For example, thousands of complete genomes have been sequenced, and their metabolic networks inferred. Such data should enable a better understanding of evolution. However, most existing network analysis methods are based on pair-wise comparisons, and these do not scale to thousands of genomes. Here we propose the use of weighted graphs as a data structure to enable large-scale phylogenetic analysis of networks. We have developed three types of weighted graph for enzymes: taxonomic (these summarize phylogenetic importance), isoenzymatic (these summarize enzymatic variety/redundancy), and sequence-similarity (these summarize sequence conservation); and we applied these types of weighted graph to survey prokaryotic metabolism. To demonstrate the utility of this approach we have compared and contrasted the large-scale evolution of metabolism in Archaea and Eubacteria. Our results provide evidence for limits to the contingency of evolution. PMID:24619061

  13. VDJServer: A Cloud-Based Analysis Portal and Data Commons for Immune Repertoire Sequences and Rearrangements.

    PubMed

    Christley, Scott; Scarborough, Walter; Salinas, Eddie; Rounds, William H; Toby, Inimary T; Fonner, John M; Levin, Mikhail K; Kim, Min; Mock, Stephen A; Jordan, Christopher; Ostmeyer, Jared; Buntzman, Adam; Rubelt, Florian; Davila, Marco L; Monson, Nancy L; Scheuermann, Richard H; Cowell, Lindsay G

    2018-01-01

    Recent technological advances in immune repertoire sequencing have created tremendous potential for advancing our understanding of adaptive immune response dynamics in various states of health and disease. Immune repertoire sequencing produces large, highly complex data sets, however, which require specialized methods and software tools for their effective analysis and interpretation. VDJServer is a cloud-based analysis portal for immune repertoire sequence data that provides access to a suite of tools for a complete analysis workflow, including modules for preprocessing and quality control of sequence reads, V(D)J gene segment assignment, repertoire characterization, and repertoire comparison. VDJServer also provides sophisticated visualizations for exploratory analysis. It is accessible through a standard web browser via a graphical user interface designed for use by immunologists, clinicians, and bioinformatics researchers. VDJServer provides a data commons for public sharing of repertoire sequencing data, as well as private sharing of data between users. We describe the main functionality and architecture of VDJServer and demonstrate its capabilities with use cases from cancer immunology and autoimmunity. VDJServer provides a complete analysis suite for human and mouse T-cell and B-cell receptor repertoire sequencing data. The combination of its user-friendly interface and high-performance computing allows large immune repertoire sequencing projects to be analyzed with no programming or software installation required. VDJServer is a web-accessible cloud platform that provides access through a graphical user interface to a data management infrastructure, a collection of analysis tools covering all steps in an analysis, and an infrastructure for sharing data along with workflows, results, and computational provenance. VDJServer is a free, publicly available, and open-source licensed resource.

  14. A biological compression model and its applications.

    PubMed

    Cao, Minh Duc; Dix, Trevor I; Allison, Lloyd

    2011-01-01

    A biological compression model, expert model, is presented which is superior to existing compression algorithms in both compression performance and speed. The model is able to compress whole eukaryotic genomes. Most importantly, the model provides a framework for knowledge discovery from biological data. It can be used for repeat element discovery, sequence alignment and phylogenetic analysis. We demonstrate that the model can handle statistically biased sequences and distantly related sequences where conventional knowledge discovery tools often fail.

  15. Polymorphisms and variants in the prion protein sequence of European moose (Alces alces), reindeer (Rangifer tarandus), roe deer (Capreolus capreolus) and fallow deer (Dama dama) in Scandinavia

    PubMed Central

    Wik, Lotta; Mikko, Sofia; Klingeborn, Mikael; Stéen, Margareta; Simonsson, Magnus; Linné, Tommy

    2012-01-01

    The prion protein (PrP) sequence of European moose, reindeer, roe deer and fallow deer in Scandinavia has high homology to the PrP sequence of North American cervids. Variants in the European moose PrP sequence were found at amino acid position 109 as K or Q. The 109Q variant is unique in the PrP sequence of vertebrates. During the 1980s a wasting syndrome in Swedish moose, Moose Wasting Syndrome (MWS), was described. SNP analysis demonstrated a difference in the observed genotype proportions of the heterozygous Q/K and homozygous Q/Q variants in the MWS animals compared with the healthy animals. In MWS moose the allele frequencies for 109K and 109Q were 0.73 and 0.27, respectively, and for healthy animals 0.69 and 0.31. Both alleles were seen as heterozygotes and homozygotes. In reindeer, PrP sequence variation was demonstrated at codon 176 as D or N and codon 225 as S or Y. The PrP sequences in roe deer and fallow deer were identical with published GenBank sequences. PMID:22441661
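    Allele frequencies such as those quoted for 109K and 109Q follow directly from genotype counts, since each animal carries two alleles. The sketch below shows the calculation; the genotype counts in the example are hypothetical and are not the counts from the moose study, which this record does not give.

```python
def allele_frequencies(n_kk: int, n_kq: int, n_qq: int) -> tuple[float, float]:
    """Return (freq_K, freq_Q) from counts of K/K, K/Q and Q/Q genotypes.

    Each homozygote contributes two copies of its allele and each
    heterozygote contributes one copy of each allele.
    """
    total_alleles = 2 * (n_kk + n_kq + n_qq)
    if total_alleles == 0:
        raise ValueError("at least one genotyped animal is required")
    freq_k = (2 * n_kk + n_kq) / total_alleles
    freq_q = (2 * n_qq + n_kq) / total_alleles
    return freq_k, freq_q


if __name__ == "__main__":
    # Hypothetical counts for illustration only: 50 K/K, 40 K/Q, 10 Q/Q.
    freq_k, freq_q = allele_frequencies(50, 40, 10)
    print(f"109K: {freq_k:.2f}, 109Q: {freq_q:.2f}")
```

    The two frequencies always sum to 1 for a biallelic site, which gives a quick consistency check on reported values (0.73 + 0.27 and 0.69 + 0.31 above).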

  16. A novel progesterone receptor membrane component (PGRMC) in the human and swine parasite Taenia solium: implications to the host-parasite relationship.

    PubMed

    Aguilar-Díaz, Hugo; Nava-Castro, Karen E; Escobedo, Galileo; Domínguez-Ramírez, Lenin; García-Varela, Martín; Del Río-Araiza, Víctor H; Palacios-Arreola, Margarita I; Morales-Montor, Jorge

    2018-03-09

    We have previously reported that progesterone (P4) has a direct in vitro effect on the scolex evagination and growth of Taenia solium cysticerci. Here, we explored the hypothesis that the direct effect of P4 on T. solium might be mediated by a novel steroid-binding parasite protein. Using immunofluorescent confocal microscopy, flow cytometry analysis, double-dimension electrophoresis analysis, and sequencing of the corresponding protein spot, we detected a novel PGRMC in T. solium. Molecular modeling studies accompanied by computer docking using the sequenced protein, together with phylogenetic analysis and sequence alignment, clearly demonstrated that T. solium PGRMC is of parasite origin. Our results show that P4 in vitro increases parasite evagination and scolex size. Using immunofluorescent confocal microscopy, we detected that parasite cells showed expression of a P4-binding-like protein exclusively located at the cysticercus subtegumental tissue. Presence of the P4-binding protein in cyst cells was also confirmed by flow cytometry. Double-dimension electrophoresis analysis, followed by sequencing of the corresponding protein spot, revealed a protein that was previously reported in the T. solium genome, belonging to a membrane-associated progesterone receptor component (PGRMC). Molecular modeling studies accompanied by computer docking using the sequenced protein showed that PGRMC is potentially able to bind steroid hormones such as progesterone, estradiol, testosterone and dihydrotestosterone with different affinities. Phylogenetic analysis and sequence alignment clearly demonstrated that T. solium PGRMC is related to a steroid-binding protein of Echinococcus granulosus, both of them being nested within a cluster including similar proteins present in platyhelminths such as Schistocephalus solidus and Schistosoma haematobium. Progesterone may act directly upon T. solium cysticerci, probably by binding to PGRMC. This research has implications in the field of host-parasite co-evolution as well as the sex-associated susceptibility to this infection. On a more practical note, the present results may contribute to the molecular design of new drugs with anti-parasite actions.

  17. Forensic massively parallel sequencing data analysis tool: Implementation of MyFLq as a standalone web- and Illumina BaseSpace(®)-application.

    PubMed

    Van Neste, Christophe; Gansemans, Yannick; De Coninck, Dieter; Van Hoofstat, David; Van Criekinge, Wim; Deforce, Dieter; Van Nieuwerburgh, Filip

    2015-03-01

    Routine use of massively parallel sequencing (MPS) for forensic genomics is on the horizon. In the last few years, several algorithms and workflows have been developed to analyze forensic MPS data. However, none have yet been tailored to the needs of the forensic analyst who does not possess an extensive bioinformatics background. We developed our previously published forensic MPS data analysis framework MyFLq (My-Forensic-Loci-queries) into an open-source, user-friendly, web-based application. It can be installed as a standalone web application, or run directly from the Illumina BaseSpace environment. In the former, laboratories can keep their data on-site, while in the latter, data from forensic samples that are sequenced on an Illumina sequencer can be uploaded to BaseSpace during acquisition, and can subsequently be analyzed using the published MyFLq BaseSpace application. Additional features were implemented, such as an interactive graphical report of the results, an interactive threshold selection bar, and an allele length-based analysis in addition to the sequence-based analysis. Practical use of the application is demonstrated through the analysis of four 16-plex short tandem repeat (STR) samples, showing the complementarity between the sequence- and length-based analysis of the same MPS data. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.

  18. Novel gastric helicobacters and oral campylobacters are present in captive and wild cetaceans

    PubMed Central

    Goldman, Cinthia G.; Matteo, Mario J.; Loureiro, Julio D.; Almuzara, Marisa; Barberis, Claudia; Vay, Carlos; Catalano, Mariana; Heredia, Sergio Rodríguez; Mantero, Paula; Boccio, Jose R.; Zubillaga, Marcela B.; Cremaschi, Graciela A.; Solnick, Jay V.; Perez-Perez, Guillermo I.; Blaser, Martin J.

    2011-01-01

    The mammalian gastric and oral mucosa may be colonized by mixed Helicobacter and Campylobacter species, respectively, in individual animals. To better characterize the presence and distribution of Helicobacter and Campylobacter among marine mammals, we used PCR and 16S rDNA sequence analysis to examine gastric and oral samples from ten dolphins (Tursiops gephyreus), one killer whale (Orcinus orca), one false killer whale (Pseudorca crassidens), and three wild La Plata river dolphins (Pontoporia blainvillei). Helicobacter spp. DNA was widely distributed in gastric and oral samples from both captive and wild cetaceans. Phylogenetic analysis demonstrated two Helicobacter sequence clusters, one closely related to H. cetorum, a species isolated from dolphins and whales in North America. The second cluster was related to sequences obtained from dolphins in Australia and to gastric non-Helicobacter pylori helicobacters, and may represent a novel taxonomic group. Dental plaque sequences from four dolphins formed a third cluster within the Campylobacter genus that likely represents a novel species isolated from marine mammals. Identification of identical Helicobacter spp. DNA sequences from dental plaque, saliva and gastric fluids from the same hosts suggests that the oral cavity may be involved in transmission. These results demonstrate that Helicobacter and Campylobacter species are commonly distributed in marine mammals, and identify taxonomic clusters that may represent novel species. PMID:21592686

  19. Assessment of clonality and serotypes of Streptococcus mutans among children by multilocus sequence typing.

    PubMed

    Momeni, Stephanie S; Whiddon, Jennifer; Cheon, Kyounga; Moser, Stephen A; Childers, Noel K

    2015-12-01

    Studies using multilocus sequence typing (MLST) have demonstrated that Streptococcus mutans isolates are genetically diverse. Our laboratory previously demonstrated clonality of S. mutans using MLST but could not discount the possibility of sampling bias. In this study, the clonality of randomly selected S. mutans plaque isolates from African-American children was examined using MLST. Serotype and the presence of collagen-binding proteins (CBPs) encoded by cnm/cbm were also assessed. One hundred S. mutans isolates were randomly selected for MLST analysis. Sequence analysis was performed and phylogenetic trees were generated using START2 and MEGA. Thirty-four sequence types were identified, of which 27 were unique to this population. Seventy-five per cent of the isolates clustered into 16 clonal groups. The serotypes observed were c (n = 84), e (n = 3), and k (n = 11). The prevalence of S. mutans isolates of serotype k was notably high, at 17.5%. All isolates were cnm/cbm negative. The clonality of S. mutans demonstrated in this study illustrates the importance of localized population studies and is consistent with transmission. The prevalence of serotype k, a recently proposed systemic pathogen, observed in this study is higher than reported in most populations, and this is the first report of S. mutans serotype k in a United States population. © 2015 Eur J Oral Sci.

  20. Laser mass spectrometry for DNA sequencing, disease diagnosis, and fingerprinting

    NASA Astrophysics Data System (ADS)

    Chen, C. H. Winston; Taranenko, N. I.; Zhu, Y. F.; Chung, C. N.; Allman, S. L.

    1997-05-01

    Since laser mass spectrometry has the potential for achieving very fast DNA analysis, we recently applied it to DNA sequencing, DNA typing for fingerprinting, and DNA screening for disease diagnosis. Two different approaches for sequencing DNA have been successfully demonstrated. One is to sequence DNA with DNA ladders produced by Sanger's enzymatic method. The other is to do direct sequencing without DNA ladders. Quick DNA typing for identification purposes is critical for forensic applications. Our preliminary results indicate that laser mass spectrometry can possibly be used for rapid DNA fingerprinting applications at a much lower cost than gel electrophoresis. Population screening for certain genetic diseases can be a very efficient step in reducing medical costs through prevention. Since laser mass spectrometry can provide very fast DNA analysis, we applied laser mass spectrometry to disease diagnosis. Clinical samples with both base deletion and point mutation have been tested with complete success.

  1. Insertion sequence ISRP10 inactivation of the oprD gene in imipenem-resistant Pseudomonas aeruginosa clinical isolates.

    PubMed

    Sun, Qinghui; Ba, Zhaofen; Wu, Guoying; Wang, Wei; Lin, Shuxiang; Yang, Hongjiang

    2016-05-01

    Carbapenem resistance mechanisms were investigated in 32 imipenem-resistant Pseudomonas aeruginosa clinical isolates recovered from hospitalised children. Sequence analysis revealed that 31 of the isolates had an insertion sequence element ISRP10 disrupting the porin gene oprD, demonstrating that ISRP10 inactivation of oprD conferred imipenem resistance in the majority of the isolates. Multilocus sequence typing (MLST) was used to discriminate the isolates. In total, 11 sequence types (STs) were identified, including 3 novel STs, and 68.3% (28/41) of the tested strains were characterised as clone ST253. In combination with random amplified polymorphic DNA (RAPD) analysis, the imipenem-resistant isolates displayed a relatively high degree of genetic variability and were unlikely to be associated with nosocomial infections. Copyright © 2016 Elsevier B.V. and the International Society of Chemotherapy. All rights reserved.

  2. Generation and analysis of expressed sequence tags from a cDNA library of the fruiting body of Ganoderma lucidum

    PubMed Central

    2010-01-01

    Background Little genomic or transcriptomic information on Ganoderma lucidum (Lingzhi) is known. This study aims to discover the transcripts involved in secondary metabolite biosynthesis and developmental regulation of G. lucidum using an expressed sequence tag (EST) library. Methods A cDNA library was constructed from the G. lucidum fruiting body. Its high-quality ESTs were assembled into unique sequences with contigs and singletons. The unique sequences were annotated according to sequence similarities to genes or proteins available in public databases. The detection of simple sequence repeats (SSRs) was performed by online analysis. Results A total of 1,023 clones were randomly selected from the G. lucidum library and sequenced, yielding 879 high-quality ESTs. These ESTs showed similarities to a diverse range of genes. The sequences encoding squalene epoxidase (SE) and farnesyl-diphosphate synthase (FPS) were identified in this EST collection. Several candidate genes, such as hydrophobin, MOB2, profilin and PHO84, were detected for the first time in G. lucidum. Thirteen potential SSR-motif microsatellite loci were also identified. Conclusion The present study demonstrates a successful application of EST analysis in the discovery of transcripts involved in the secondary metabolite biosynthesis and the developmental regulation of G. lucidum. PMID:20230644

  3. CloVR-ITS: Automated internal transcribed spacer amplicon sequence analysis pipeline for the characterization of fungal microbiota

    PubMed Central

    2013-01-01

    Background Besides the development of comprehensive tools for high-throughput 16S ribosomal RNA amplicon sequence analysis, there exists a growing need for protocols emphasizing alternative phylogenetic markers such as those representing eukaryotic organisms. Results Here we introduce CloVR-ITS, an automated pipeline for comparative analysis of internal transcribed spacer (ITS) pyrosequences amplified from metagenomic DNA isolates and representing fungal species. This pipeline performs a variety of steps similar to those commonly used for 16S rRNA amplicon sequence analysis, including preprocessing for quality, chimera detection, clustering of sequences into operational taxonomic units (OTUs), taxonomic assignment (at class, order, family, genus, and species levels) and statistical analysis of sample groups of interest based on user-provided information. Using ITS amplicon pyrosequencing data from a previous human gastric fluid study, we demonstrate the utility of CloVR-ITS for fungal microbiota analysis and provide runtime and cost examples, including analysis of extremely large datasets on the cloud. We show that the largest fractions of reads from the stomach fluid samples were assigned to Dothideomycetes, Saccharomycetes, Agaricomycetes and Sordariomycetes but that all samples were dominated by sequences that could not be taxonomically classified. Representatives of the Candida genus were identified in all samples, most notably C. quercitrusa, while sequence reads assigned to the Aspergillus genus were only identified in a subset of samples. CloVR-ITS is made available as a pre-installed, automated, and portable software pipeline for cloud-friendly execution as part of the CloVR virtual machine package (http://clovr.org). Conclusion The CloVR-ITS pipeline provides fungal microbiota analysis that can be complementary to bacterial 16S rRNA and total metagenome sequence analysis allowing for more comprehensive studies of environmental and host-associated microbial communities. PMID:24451270

  4. Investigation of the Evolutionary Development of the Genus Bifidobacterium by Comparative Genomics

    PubMed Central

    Lugli, Gabriele Andrea; Milani, Christian; Turroni, Francesca; Duranti, Sabrina; Ferrario, Chiara; Viappiani, Alice; Mancabelli, Leonardo; Mangifesta, Marta; Taminiau, Bernard; Delcenserie, Véronique; van Sinderen, Douwe

    2014-01-01

    The Bifidobacterium genus currently encompasses 48 recognized taxa, which have been isolated from different ecosystems. However, the current phylogeny of bifidobacteria is hampered by the relative paucity of genotypic data. Here, we reassessed the taxonomy of this bacterial genus using genome-based approaches, which demonstrated that the previous taxonomic view of bifidobacteria contained several inconsistencies. In particular, high levels of genetic relatedness were shown to exist between particular Bifidobacterium taxa, which would not justify their status as separate species. The results presented here are based on average nucleotide identity analysis involving the genome sequences for each type strain of the 48 bifidobacterial taxa, as well as phylogenetic comparative analysis of the predicted core genome of the Bifidobacterium genus. The results of this study demonstrate that the availability of complete genome sequences allows the reconstruction of a more robust bifidobacterial phylogeny than that obtained from a single gene-based sequence comparison, thus discouraging the assignment of a new or separate bifidobacterial taxon without such a genome-based validation. PMID:25107967

  5. Differences in a ribosomal DNA sequence of Strongylus species allows identification of single eggs.

    PubMed

    Campbell, A J; Gasser, R B; Chilton, N B

    1995-03-01

    In the current study, molecular techniques were evaluated for the species identification of individual strongyle eggs. Adult worms of Strongylus edentatus, S. equinus and S. vulgaris were collected at necropsy from horses from Australia and the U.S.A. Genomic DNA was isolated and a ribosomal transcribed spacer (ITS-2) amplified and sequenced using polymerase chain reaction (PCR) techniques. The length of the ITS-2 sequence of S. edentatus, S. equinus and S. vulgaris ranged between 217 and 235 nucleotides. Extensive sequence analysis demonstrated a low degree (0-0.9%) of intraspecific variation in the ITS-2 for the Strongylus species examined, whereas the levels of interspecific differences (13-29%) were significantly greater. Interspecific differences in the ITS-2 sequences allowed unequivocal species identification of single worms and eggs using PCR-linked restriction fragment length polymorphism. These results demonstrate the potential of the ribosomal spacers as genetic markers for species identification of single strongyle eggs from horse faeces.
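
    The intraspecific (0-0.9%) and interspecific (13-29%) figures above are percentages of differing positions between aligned sequences. A minimal sketch of that calculation follows, assuming the two sequences are already aligned to the same length and that every mismatched column counts equally; the function name and the toy fragments are hypothetical and are not real ITS-2 data.

```python
def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two sequences differ.

    Assumes both sequences are already aligned to the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return 100.0 * mismatches / len(seq_a)


# Toy aligned fragments (hypothetical, for illustration only).
within_species = percent_difference("ACGTACGTAC", "ACGTACGTAC")
between_species = percent_difference("ACGTACGTAC", "ACGAACGTTC")
```

    How gap columns are scored is a design choice the abstract does not specify; this sketch simply treats a gap character as one more mismatching symbol.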

  6. Special Focus

    PubMed Central

    Nawrocki, Eric P.; Burge, Sarah W.

    2013-01-01

    The development of RNA bioinformatic tools began more than 30 y ago with the description of the Nussinov and Zuker dynamic programming algorithms for single sequence RNA secondary structure prediction. Since then, many tools have been developed for various RNA sequence analysis problems such as homology search, multiple sequence alignment, de novo RNA discovery, read-mapping, and many more. In this issue, we have collected a sampling of reviews and original research that demonstrate some of the many ways bioinformatics is integrated with current RNA biology research. PMID:23948768

  7. Genetic Diversity of Crimean Congo Hemorrhagic Fever Virus Strains from Iran

    PubMed Central

    Chinikar, Sadegh; Bouzari, Saeid; Shokrgozar, Mohammad Ali; Mostafavi, Ehsan; Jalali, Tahmineh; Khakifirouz, Sahar; Nowotny, Norbert; Fooks, Anthony R.; Shah-Hosseini, Nariman

    2016-01-01

    Background: Crimean Congo hemorrhagic fever virus (CCHFV) is a member of the Bunyaviridae family and Nairovirus genus. It has a negative-sense, single-stranded RNA genome of approximately 19.2 kb, containing the Small, Medium, and Large segments. CCHFVs are relatively divergent in their genome sequence and are grouped in seven distinct clades based on S-segment sequence analysis and six clades based on M-segment sequences. Our aim was to obtain new insights into the molecular epidemiology of CCHFV in Iran. Methods: We analyzed partial and complete nucleotide sequences of the S and M segments derived from 50 Iranian patients. The extracted RNA was amplified using one-step RT-PCR and then sequenced. The sequences were analyzed using Mega5 software. Results: Phylogenetic analysis of partial S segment sequences demonstrated that clade IV-(Asia 1), clade IV-(Asia 2) and clade V-(Europe) accounted for 80 %, 4 % and 14 % of the circulating genomic variants of CCHFV in Iran respectively. However, one of the Iranian strains (Iran-Kerman/22) was associated with none of the other sequences and formed a new clade (VII). The phylogenetic analysis of complete S-segment nucleotide sequences from selected Iranian CCHFV strains complemented with representative strains from GenBank revealed a topology similar to that of the partial sequences, with eight major clusters. A partial M segment phylogeny positioned the Iranian strains in association with either clade III (Asia-Africa) or clade V (Europe). Conclusion: The phylogenetic analysis revealed subtle links between distant geographic locations, which we propose might originate either from international livestock trade or from long-distance carriage of CCHFV by infected ticks via bird migration. PMID:27308271

  8. Short-read, high-throughput sequencing technology for STR genotyping

    PubMed Central

    Bornman, Daniel M.; Hester, Mark E.; Schuetter, Jared M.; Kasoji, Manjula D.; Minard-Smith, Angela; Barden, Curt A.; Nelson, Scott C.; Godbold, Gene D.; Baker, Christine H.; Yang, Boyu; Walther, Jacquelyn E.; Tornes, Ivan E.; Yan, Pearlly S.; Rodriguez, Benjamin; Bundschuh, Ralf; Dickens, Michael L.; Young, Brian A.; Faith, Seth A.

    2013-01-01

    DNA-based methods for human identification principally rely upon genotyping of short tandem repeat (STR) loci. Electrophoretic-based techniques for variable-length classification of STRs are universally utilized, but are limited in that they have relatively low throughput and do not yield nucleotide sequence information. High-throughput sequencing technology may provide a more powerful instrument for human identification, but is not currently validated for forensic casework. Here, we present a systematic method to perform high-throughput genotyping analysis of the Combined DNA Index System (CODIS) STR loci using short-read (150 bp) massively parallel sequencing technology. Open source reference alignment tools were optimized to evaluate PCR-amplified STR loci using a custom designed STR genome reference. Evaluation of this approach demonstrated that the 13 CODIS STR loci and amelogenin (AMEL) locus could be accurately called from individual and mixture samples. Sensitivity analysis showed that as few as 18,500 reads, aligned to an in silico referenced genome, were required to genotype an individual (>99% confidence) for the CODIS loci. The power of this technology was further demonstrated by identification of variant alleles containing single nucleotide polymorphisms (SNPs) and the development of quantitative measurements (reads) for resolving mixed samples. PMID:25621315

  9. There is Diversity in Disorder-"In all Chaos there is a Cosmos, in all Disorder a Secret Order".

    PubMed

    Nielsen, Jakob T; Mulder, Frans A A

    2016-01-01

    The protein universe consists of a continuum of structures ranging from full order to complete disorder. As the structured part of the proteome has been intensively studied, stably folded proteins are increasingly well documented and understood. However, proteins that are fully, or in large part, disordered are much less well characterized. Here we collected NMR chemical shifts in a small database for 117 protein sequences that are known to contain disorder. We demonstrate that NMR chemical shift data can be brought to bear as an exquisite judge of protein disorder at the residue level, and help in validation. With the help of secondary chemical shift analysis we demonstrate that the proteins in the database span the full spectrum of disorder but still largely segregate into two classes: disordered with small segments of order scattered along the sequence, and structured with small segments of disorder inserted between the different structured regions. A detailed analysis reveals that the distribution of order/disorder along the sequence is complex, asymmetric, and highly protein-dependent. Access to ratified training data further suggests an avenue to improving prediction of disorder from sequence.

  10. Bidirectional Retroviral Integration Site PCR Methodology and Quantitative Data Analysis Workflow.

    PubMed

    Suryawanshi, Gajendra W; Xu, Song; Xie, Yiming; Chou, Tom; Kim, Namshin; Chen, Irvin S Y; Kim, Sanggu

    2017-06-14

    Integration Site (IS) assays are a critical component of the study of retroviral integration sites and their biological significance. In recent retroviral gene therapy studies, IS assays, in combination with next-generation sequencing, have been used as a cell-tracking tool to characterize clonal stem cell populations sharing the same IS. For the accurate comparison of repopulating stem cell clones within and across different samples, the detection sensitivity, data reproducibility, and high-throughput capacity of the assay are among the most important assay qualities. This work provides a detailed protocol and data analysis workflow for bidirectional IS analysis. The bidirectional assay can simultaneously sequence both upstream and downstream vector-host junctions. Compared to conventional unidirectional IS sequencing approaches, the bidirectional approach significantly improves IS detection rates and the characterization of integration events at both ends of the target DNA. The data analysis pipeline described here accurately identifies and enumerates identical IS sequences through multiple steps of comparison that map IS sequences onto the reference genome and determine sequencing errors. Using an optimized assay procedure, we have recently published the detailed repopulation patterns of thousands of Hematopoietic Stem Cell (HSC) clones following transplant in rhesus macaques, demonstrating for the first time the precise time point of HSC repopulation and the functional heterogeneity of HSCs in the primate system. The following protocol describes the step-by-step experimental procedure and data analysis workflow that accurately identifies and quantifies identical IS sequences.

  11. Environmental Barcoding: A Next-Generation Sequencing Approach for Biomonitoring Applications Using River Benthos

    PubMed Central

    Hajibabaei, Mehrdad; Shokralla, Shadi; Zhou, Xin; Singer, Gregory A. C.; Baird, Donald J.

    2011-01-01

    Timely and accurate biodiversity analysis poses an ongoing challenge for the success of biomonitoring programs. Morphology-based identification of bioindicator taxa is time consuming, and rarely supports species-level resolution especially for immature life stages. Much work has been done in the past decade to develop alternative approaches for biodiversity analysis using DNA sequence-based approaches such as molecular phylogenetics and DNA barcoding. On-going assembly of DNA barcode reference libraries will provide the basis for a DNA-based identification system. The use of recently introduced next-generation sequencing (NGS) approaches in biodiversity science has the potential to further extend the application of DNA information for routine biomonitoring applications to an unprecedented scale. Here we demonstrate the feasibility of using 454 massively parallel pyrosequencing for species-level analysis of freshwater benthic macroinvertebrate taxa commonly used for biomonitoring. We designed our experiments in order to directly compare morphology-based, Sanger sequencing DNA barcoding, and next-generation environmental barcoding approaches. Our results show the ability of 454 pyrosequencing of mini-barcodes to accurately identify all species with more than 1% abundance in the pooled mixture. Although the approach failed to identify 6 rare species in the mixture, the presence of sequences from 9 species that were not represented by individuals in the mixture provides evidence that DNA based analysis may yet provide a valuable approach in finding rare species in bulk environmental samples. We further demonstrate the application of the environmental barcoding approach by comparing benthic macroinvertebrates from an urban region to those obtained from a conservation area. 
    Although considerable effort will be required to robustly optimize NGS tools to identify species from bulk environmental samples, our results indicate the potential of an environmental barcoding approach for biomonitoring programs. PMID:21533287

  12. A computational method for estimating the PCR duplication rate in DNA and RNA-seq experiments.

    PubMed

    Bansal, Vikas

    2017-03-14

    PCR amplification is an important step in the preparation of DNA sequencing libraries prior to high-throughput sequencing. PCR amplification introduces redundant reads in the sequence data and estimating the PCR duplication rate is important to assess the frequency of such reads. Existing computational methods do not distinguish PCR duplicates from "natural" read duplicates that represent independent DNA fragments and therefore, over-estimate the PCR duplication rate for DNA-seq and RNA-seq experiments. In this paper, we present a computational method to estimate the average PCR duplication rate of high-throughput sequence datasets that accounts for natural read duplicates by leveraging heterozygous variants in an individual genome. Analysis of simulated data and exome sequence data from the 1000 Genomes project demonstrated that our method can accurately estimate the PCR duplication rate on paired-end as well as single-end read datasets which contain a high proportion of natural read duplicates. Further, analysis of exome datasets prepared using the Nextera library preparation method indicated that 45-50% of read duplicates correspond to natural read duplicates likely due to fragmentation bias. Finally, analysis of RNA-seq datasets from individuals in the 1000 Genomes project demonstrated that 70-95% of read duplicates observed in such datasets correspond to natural duplicates sampled from genes with high expression and identified outlier samples with a 2-fold greater PCR duplication rate than other samples. The method described here is a useful tool for estimating the PCR duplication rate of high-throughput sequence datasets and for assessing the fraction of read duplicates that correspond to natural read duplicates. An implementation of the method is available at https://github.com/vibansal/PCRduplicates .
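
    The idea of using heterozygous variants to separate natural from PCR duplicates can be illustrated with a deliberately simplified sketch. It is not the estimator implemented in the linked repository. It assumes that two PCR copies of one fragment always carry the same allele, and that two independent ("natural") fragments covering a heterozygous site carry different alleles half of the time. Under those assumptions, the natural fraction among duplicate pairs is roughly twice the fraction of pairs with discordant alleles. The function name and input layout are hypothetical.

```python
from typing import Iterable, Tuple


def natural_duplicate_fraction(allele_pairs: Iterable[Tuple[str, str]]) -> float:
    """Estimate the share of duplicate read pairs that are natural duplicates.

    Each item is the pair of alleles observed at a heterozygous site in two
    reads flagged as duplicates of each other. Simplifying assumptions:
    PCR copies always agree, and independent fragments disagree with
    probability 0.5, so natural fraction ~= 2 * discordant fraction.
    """
    pairs = list(allele_pairs)
    if not pairs:
        raise ValueError("need at least one duplicate pair over a heterozygous site")
    discordant = sum(1 for first, second in pairs if first != second)
    # Cap at 1.0 because sampling noise can push the raw estimate above it.
    return min(1.0, 2.0 * discordant / len(pairs))


# Hypothetical observations: 10 duplicate pairs, 2 with discordant alleles.
observed = [("A", "A")] * 6 + [("A", "G")] * 2 + [("G", "G")] * 2
natural = natural_duplicate_fraction(observed)
pcr = 1.0 - natural
```

    A real analysis would also need to account for sequencing errors, allelic imbalance, and duplicate clusters larger than two reads, none of which this sketch handles.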

  13. Image encryption using random sequence generated from generalized information domain

    NASA Astrophysics Data System (ADS)

    Xia-Yan, Zhang; Guo-Ji, Zhang; Xuan, Li; Ya-Zhou, Ren; Jie-Hua, Wu

    2016-05-01

    A novel image encryption method based on the random sequence generated from the generalized information domain and permutation-diffusion architecture is proposed. The random sequence is generated by reconstruction from the generalized information file and discrete trajectory extraction from the data stream. The trajectory address sequence is used to generate a P-box to shuffle the plain image while random sequences are treated as keystreams. A new factor called drift factor is employed to accelerate and enhance the performance of the random sequence generator. An initial value is introduced to make the encryption method an approximately one-time pad. Experimental results show that the random sequences pass the NIST statistical test with a high ratio and extensive analysis demonstrates that the new encryption scheme has superior security.

  14. Pair-barcode high-throughput sequencing for large-scale multiplexed sample analysis

    PubMed Central

    2012-01-01

    Background Multiplexing has become the major limitation of next-generation sequencing (NGS) when applied to low-complexity samples. Physical space segregation allows limited multiplexing, while the existing barcode approach permits simultaneous analysis of only up to several dozen samples. Results Here we introduce pair-barcode sequencing (PBS), an economical and flexible barcoding technique that permits parallel analysis of large-scale multiplexed samples. In two pilot runs using a SOLiD sequencer (Applied Biosystems Inc.), 32 independent pair-barcoded miRNA libraries were simultaneously discovered by the combination of 4 unique forward barcodes and 8 unique reverse barcodes. Over 174,000,000 reads were generated and about 64% of them were assigned to both of the barcodes. After mapping all reads to pre-miRNAs in miRBase, different miRNA expression patterns were captured from the two clinical groups. The strong correlation using different barcode pairs and the high consistency of miRNA expression in two independent runs demonstrate that the PBS approach is valid. Conclusions By employing the PBS approach in NGS, large-scale multiplexed pooled samples could be practically analyzed in parallel so that high-throughput sequencing economically meets the requirements of samples with low sequencing throughput demands. PMID:22276739

  15. Pair-barcode high-throughput sequencing for large-scale multiplexed sample analysis.

    PubMed

    Tu, Jing; Ge, Qinyu; Wang, Shengqin; Wang, Lei; Sun, Beili; Yang, Qi; Bai, Yunfei; Lu, Zuhong

    2012-01-25

    Multiplexing has become the major limitation of next-generation sequencing (NGS) when applied to low-complexity samples. Physical space segregation allows limited multiplexing, while the existing barcode approach permits simultaneous analysis of only up to several dozen samples. Here we introduce pair-barcode sequencing (PBS), an economical and flexible barcoding technique that permits parallel analysis of large-scale multiplexed samples. In two pilot runs using a SOLiD sequencer (Applied Biosystems Inc.), 32 independent pair-barcoded miRNA libraries were simultaneously discovered by the combination of 4 unique forward barcodes and 8 unique reverse barcodes. Over 174,000,000 reads were generated and about 64% of them were assigned to both of the barcodes. After mapping all reads to pre-miRNAs in miRBase, different miRNA expression patterns were captured from the two clinical groups. The strong correlation using different barcode pairs and the high consistency of miRNA expression in two independent runs demonstrate that the PBS approach is valid. By employing the PBS approach in NGS, large-scale multiplexed pooled samples could be practically analyzed in parallel so that high-throughput sequencing economically meets the requirements of samples with low sequencing throughput demands.
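
    The combinatorial point of pair barcoding is that 4 forward and 8 reverse barcodes give 4 x 8 = 32 distinguishable libraries from only 12 barcode sequences. A minimal sketch of that lookup follows, assuming barcodes have already been identified per read; the barcode labels (F1..F4, R1..R8), sample names, and function names are hypothetical placeholders, not the sequences used in the study.

```python
from itertools import product
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical barcode labels; real barcodes would be short nucleotide tags.
FORWARD = ["F1", "F2", "F3", "F4"]
REVERSE = [f"R{i}" for i in range(1, 9)]


def build_sample_map(forward: Sequence[str],
                     reverse: Sequence[str]) -> Dict[Tuple[str, str], str]:
    """Assign one sample name to every (forward, reverse) barcode combination."""
    return {
        pair: f"sample_{index + 1:02d}"
        for index, pair in enumerate(product(forward, reverse))
    }


def assign_read(fwd: str, rev: str,
                sample_map: Dict[Tuple[str, str], str]) -> Optional[str]:
    """Return the sample for a read, or None if either barcode is unrecognized."""
    return sample_map.get((fwd, rev))


sample_map = build_sample_map(FORWARD, REVERSE)
```

    Reads for which only one barcode is recognized fall through to `None`, which corresponds to the fraction of reads that could not be assigned to both barcodes.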

  16. PipeCraft: Flexible open-source toolkit for bioinformatics analysis of custom high-throughput amplicon sequencing data.

    PubMed

    Anslan, Sten; Bahram, Mohammad; Hiiesalu, Indrek; Tedersoo, Leho

    2017-11-01

    High-throughput sequencing methods have become a routine analysis tool in environmental sciences as well as in the public and private sectors. These methods provide vast amounts of data, which need to be analysed in several steps. Although the bioinformatics may be applied using several public tools, many analytical pipelines allow too few options for the optimal analysis of more complicated or customized designs. Here, we introduce PipeCraft, a flexible and handy bioinformatics pipeline with a user-friendly graphical interface that links several public tools for analysing amplicon sequencing data. Users are able to customize the pipeline by selecting the most suitable tools and options to process raw sequences from Illumina, Pacific Biosciences, Ion Torrent and Roche 454 sequencing platforms. We described the design and options of PipeCraft and evaluated its performance by analysing the data sets from three different sequencing platforms. We demonstrated that PipeCraft is able to process large data sets within 24 hr. The graphical user interface and the automated links between various bioinformatics tools enable easy customization of the workflow. All analytical steps and options are recorded in log files and are easily traceable. © 2017 John Wiley & Sons Ltd.

  17. Cloning of a CACTA transposon-like insertion in intron I of tomato invertase Lin5 gene and identification of transposase-like sequences of Solanaceae species.

    PubMed

    Proels, Reinhard K; Roitsch, Thomas

    2006-03-01

    Very few CACTA transposon-like sequences have been described in Solanaceae species. Sequence information has been restricted to partial transposase (TPase)-like fragments, and no target gene of CACTA-like transposon insertion has been described in tomato to date. In this manuscript, we report on a CACTA transposon-like insertion in intron I of tomato (Lycopersicon esculentum) invertase gene Lin5 and TPase-like sequences of several Solanaceae species. Consensus primers deduced from the TPase region of the tomato CACTA transposon-like element allowed the amplification of similar sequences from various Solanaceae species of different subfamilies including Solaneae (Solanum tuberosum), Cestreae (Nicotiana tabacum) and Datureae (Datura stramonium). This demonstrates the ubiquitous presence of CACTA-like elements in Solanaceae genomes. The obtained partial sequences are highly conserved, and allow further detection and detailed analysis of CACTA-like transposons throughout Solanaceae species. CACTA-like transposon sequences make possible the evaluation of their use for genome analysis, functional studies of genes and the evolutionary relationships between plant species.

  18. Whole-exome sequencing identifies USH2A mutations in a pseudo-dominant Usher syndrome family.

    PubMed

    Zheng, Sui-Lian; Zhang, Hong-Liang; Lin, Zhen-Lang; Kang, Qian-Yan

    2015-10-01

    Usher syndrome (USH) is an autosomal recessive (AR) multi-sensory degenerative disorder leading to deaf-blindness. USH is clinically subdivided into three subclasses, and 10 genes have been identified thus far. Clinical and genetic heterogeneities in USH make a precise diagnosis difficult. A dominant-like USH family in successive generations was identified, and the present study aimed to determine the genetic predisposition of this family. Whole-exome sequencing was performed in two affected patients and an unaffected relative. Systematic data were analyzed by bioinformatic analysis to remove the candidate mutations via step-wise filtering. Direct Sanger sequencing and co-segregation analysis were performed in the pedigree. One novel and two known mutations in the USH2A gene were identified, and were further confirmed by direct sequencing and co-segregation analysis. The affected mother carried compound mutations in the USH2A gene, while the unaffected father carried a heterozygous mutation. The present study demonstrates that whole-exome sequencing is a robust approach for the molecular diagnosis of disorders with high levels of genetic heterogeneity.

  19. Using next-generation sequencing for high resolution multiplex analysis of copy number variation from nanogram quantities of DNA from formalin-fixed paraffin-embedded specimens.

    PubMed

    Wood, Henry M; Belvedere, Ornella; Conway, Caroline; Daly, Catherine; Chalkley, Rebecca; Bickerdike, Melissa; McKinley, Claire; Egan, Phil; Ross, Lisa; Hayward, Bruce; Morgan, Joanne; Davidson, Leslie; MacLennan, Ken; Ong, Thian K; Papagiannopoulos, Kostas; Cook, Ian; Adams, David J; Taylor, Graham R; Rabbitts, Pamela

    2010-08-01

    The use of next-generation sequencing technologies to produce genomic copy number data has recently been described. Most approaches, however, rely on optimal starting DNA, and are therefore unsuitable for the analysis of formalin-fixed paraffin-embedded (FFPE) samples, which largely precludes the analysis of many tumour series. We have sought to challenge the limits of this technique with regard to quality and quantity of starting material and the depth of sequencing required. We confirm that the technique can be used to interrogate DNA from cell lines, fresh frozen material and FFPE samples to assess copy number variation. We show that as little as 5 ng of DNA is needed to generate a copy number karyogram, and follow this up with data from a series of FFPE biopsies and surgical samples. We have used various levels of sample multiplexing to demonstrate the adjustable resolution of the methodology, depending on the number of samples and available resources. We also demonstrate reproducibility by use of replicate samples and comparison with microarray-based comparative genomic hybridization (aCGH) and digital PCR. This technique can be valuable in both the analysis of routine diagnostic samples and in examining large repositories of fixed archival material.

  20. Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis

    PubMed Central

    Steele, Joe; Bastola, Dhundy

    2014-01-01

    Modern sequencing and genome assembly technologies have provided a wealth of data, which will soon require an analysis by comparison for discovery. Sequence alignment, a fundamental task in bioinformatics research, may be used but with some caveats. Seminal techniques and methods from dynamic programming are proving ineffective for this work owing to their inherent computational expense when processing large amounts of sequence data. These methods are prone to giving misleading information because of genetic recombination, genetic shuffling and other inherent biological events. New approaches from information theory, frequency analysis and data compression are available and provide powerful alternatives to dynamic programming. These new methods are often preferred, as their algorithms are simpler and are not affected by synteny-related problems. In this review, we provide a detailed discussion of computational tools, which stem from alignment-free methods based on statistical analysis from word frequencies. We provide several clear examples to demonstrate applications and the interpretations over several different areas of alignment-free analysis such as base–base correlations, feature frequency profiles, compositional vectors, an improved string composition and the D2 statistic metric. Additionally, we provide detailed discussion and an example of analysis by Lempel–Ziv techniques from data compression. PMID:23904502
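
    Of the word-based measures listed above, the D2 statistic is the simplest to state: it is the sum, over all words of length k, of the product of that word's counts in the two sequences. A minimal sketch follows, assuming plain overlapping k-mer counts with no normalization (variants of D2 that correct for background composition are not shown); the function names and toy sequences are hypothetical.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping word of length k in the sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def d2_statistic(seq_a: str, seq_b: str, k: int) -> int:
    """Unnormalized D2: inner product of the two k-mer count vectors."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Words absent from seq_b contribute zero (Counter returns 0 for them).
    return sum(count * counts_b[word] for word, count in counts_a.items())


# Toy sequences (hypothetical): shared 2-mers are AC, CG and GT.
score = d2_statistic("ACGTACGT", "ACGTTTTT", 2)
```

    No alignment is computed at any point, so the cost grows linearly with sequence length rather than with the product of the two lengths as in dynamic programming.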

  1. Weissella fabaria sp. nov., from a Ghanaian cocoa fermentation.

    PubMed

    De Bruyne, Katrien; Camu, Nicholas; De Vuyst, Luc; Vandamme, Peter

    2010-09-01

    Two lactic acid bacteria, strains 257(T) and 252, were isolated from traditional heap fermentations of Ghanaian cocoa beans. 16S rRNA gene sequence analysis of these strains allocated them to the genus Weissella, showing 99.5 % 16S rRNA gene sequence similarity towards Weissella ghanensis LMG 24286(T). Whole-cell protein electrophoresis, fluorescent amplified fragment length polymorphism fingerprinting of whole genomes and biochemical tests confirmed their unique taxonomic position. DNA-DNA hybridization experiments towards their nearest phylogenetic neighbour demonstrated that the two strains represent a novel species, for which we propose the name Weissella fabaria sp. nov., with strain 257(T) (=LMG 24289(T) =DSM 21416(T)) as the type strain. Additional sequence analysis using pheS gene sequences proved useful for identification of all Weissella-Leuconostoc-Oenococcus species and for the recognition of the novel species.

  2. Functional organization of a single nif cluster in the mesophilic archaeon Methanosarcina mazei strain Gö1

    PubMed Central

    Ehlers, Claudia; Veit, Katharina; Gottschalk, Gerhard; Schmitz, Ruth A.

    2002-01-01

    The mesophilic methanogenic archaeon Methanosarcina mazei strain Gö1 is able to utilize molecular nitrogen (N2) as its sole nitrogen source. We have identified and characterized a single nitrogen fixation (nif) gene cluster in M. mazei Gö1 with an approximate length of 9 kbp. Sequence analysis revealed seven genes with sequence similarities to nifH, nifI1, nifI2, nifD, nifK, nifE and nifN, similar to other diazotrophic methanogens and certain bacteria such as Clostridium acetobutylicum, with the two glnB-like genes (nifI1 and nifI2) located between nifH and nifD. Phylogenetic analysis of deduced amino acid sequences for the nitrogenase structural genes of M. mazei Gö1 showed that they are most closely related to Methanosarcina barkeri nif2 genes, and also closely resemble those for the corresponding nif products of the gram-positive bacterium C. acetobutylicum. Northern blot analysis and reverse transcription PCR analysis demonstrated that the M. mazei nif genes constitute an operon transcribed only under nitrogen starvation as a single 8 kb transcript. Sequence analysis revealed a palindromic sequence at the transcriptional start site in front of the M. mazei nifH gene, which may have a function in transcriptional regulation of the nif operon. PMID:15803652

  3. Evolutionary Influenced Interaction Pattern as Indicator for the Investigation of Natural Variants Causing Nephrogenic Diabetes Insipidus

    PubMed Central

    Labudde, Dirk

    2015-01-01

    The importance of short membrane sequence motifs has been shown in many works and emphasizes the related sequence motif analysis. Together with specific transmembrane helix-helix interactions, the analysis of interacting sequence parts is helpful for understanding the process during membrane protein folding and in retaining the three-dimensional fold. Here we present a simple high-throughput analysis method for deriving mutational information of interacting sequence parts. Applied on aquaporin water channel proteins, our approach supports the analysis of mutational variants within different interacting subsequences and finally the investigation of natural variants which cause diseases like, for example, nephrogenic diabetes insipidus. In this work we demonstrate a simple method for massive membrane protein data analysis. As shown, the presented in silico analyses provide information about interacting sequence parts which are constrained by protein evolution. We present a simple graphical visualization medium for the representation of evolutionary influenced interaction pattern pairs (EIPPs) adapted to mutagen investigations of aquaporin-2, a protein whose mutants are involved in the rare endocrine disorder known as nephrogenic diabetes insipidus, and membrane proteins in general. Furthermore, we present a new method to derive new evolutionary variations within EIPPs which can be used for further mutagen laboratory investigations. PMID:26180540

  4. Evolutionary Influenced Interaction Pattern as Indicator for the Investigation of Natural Variants Causing Nephrogenic Diabetes Insipidus.

    PubMed

    Grunert, Steffen; Labudde, Dirk

    2015-01-01

    The importance of short membrane sequence motifs has been shown in many works and emphasizes the related sequence motif analysis. Together with specific transmembrane helix-helix interactions, the analysis of interacting sequence parts is helpful for understanding the process during membrane protein folding and in retaining the three-dimensional fold. Here we present a simple high-throughput analysis method for deriving mutational information of interacting sequence parts. Applied on aquaporin water channel proteins, our approach supports the analysis of mutational variants within different interacting subsequences and finally the investigation of natural variants which cause diseases like, for example, nephrogenic diabetes insipidus. In this work we demonstrate a simple method for massive membrane protein data analysis. As shown, the presented in silico analyses provide information about interacting sequence parts which are constrained by protein evolution. We present a simple graphical visualization medium for the representation of evolutionary influenced interaction pattern pairs (EIPPs) adapted to mutagen investigations of aquaporin-2, a protein whose mutants are involved in the rare endocrine disorder known as nephrogenic diabetes insipidus, and membrane proteins in general. Furthermore, we present a new method to derive new evolutionary variations within EIPPs which can be used for further mutagen laboratory investigations.

  5. Assessment of clinical analytical sensitivity and specificity of next-generation sequencing for detection of simple and complex mutations.

    PubMed

    Chin, Ephrem L H; da Silva, Cristina; Hegde, Madhuri

    2013-02-19

    Detecting mutations in disease genes by full gene sequence analysis is common in clinical diagnostic laboratories. Sanger dideoxy terminator sequencing allows for rapid development and implementation of sequencing assays in the clinical laboratory, but it has limited throughput, and due to cost constraints, only allows analysis of one or at most a few genes in a patient. Next-generation sequencing (NGS), on the other hand, has evolved rapidly, although to date it has mainly been used for large-scale genome sequencing projects and is only beginning to be used in clinical diagnostic testing. One advantage of NGS is that many genes can be analyzed easily at the same time, allowing for mutation detection when there are many possible causative genes for a specific phenotype. In addition, regions of a gene typically not tested for mutations, like deep intronic and promoter mutations, can also be detected. Here we use 20 previously characterized Sanger-sequenced positive controls in disease-causing genes to demonstrate the utility of NGS in a clinical setting using standard PCR-based amplification to assess the analytical sensitivity and specificity of the technology for detecting all previously characterized changes (mutations and benign SNPs). The positive controls chosen for validation range from simple substitution mutations to complex deletion and insertion mutations occurring in autosomal dominant and recessive disorders. The NGS data was 100% concordant with the Sanger sequencing data, identifying all 119 previously identified changes in the 20 samples. We have demonstrated that NGS technology is ready to be deployed in clinical laboratories. However, NGS and associated technologies are evolving, and clinical laboratories will need to invest significantly in staff and infrastructure to build the necessary foundation for success.
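
    The concordance figure above (119 of 119 changes recovered) is an analytical sensitivity: the fraction of reference changes that the test method also reports. A minimal sketch of that comparison follows, assuming each change can be represented as a hashable tuple; the tuple layout, gene names, and function name are hypothetical and do not reflect the study's data format.

```python
from typing import Hashable, Iterable


def analytical_sensitivity(reference_calls: Iterable[Hashable],
                           test_calls: Iterable[Hashable]) -> float:
    """Fraction of reference (e.g. Sanger-confirmed) changes found by the test method."""
    reference = set(reference_calls)
    if not reference:
        raise ValueError("reference set must not be empty")
    detected = reference & set(test_calls)
    return len(detected) / len(reference)


# Hypothetical changes encoded as (gene, position, change).
sanger = [("GENE1", 100, "A>G"), ("GENE1", 250, "delCT"),
          ("GENE2", 42, "C>T"), ("GENE2", 77, "insA")]
ngs_all = list(sanger)   # every reference change recovered
ngs_partial = sanger[:3]  # one reference change missed

full = analytical_sensitivity(sanger, ngs_all)
partial = analytical_sensitivity(sanger, ngs_partial)
```

    Specificity would additionally require the set of positions confirmed as unchanged by the reference method, which this sketch does not model.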

  6. Next-Generation Sequencing of Coccidioides immitis Isolated during Cluster Investigation

    PubMed Central

    Engelthaler, David M.; Chiller, Tom; Schupp, James A.; Colvin, Joshua; Beckstrom-Sternberg, Stephen M.; Driebe, Elizabeth M.; Moses, Tracy; Tembe, Waibhav; Sinari, Shripad; Beckstrom-Sternberg, James S.; Christoforides, Alexis; Pearson, John V.; Carpten, John; Keim, Paul; Peterson, Ashley; Terashita, Dawn

    2011-01-01

    Next-generation sequencing enables use of whole-genome sequence typing (WGST) as a viable and discriminatory tool for genotyping and molecular epidemiologic analysis. We used WGST to confirm the linkage of a cluster of Coccidioides immitis isolates from 3 patients who received organ transplants from a single donor who later had positive test results for coccidioidomycosis. Isolates from the 3 patients were nearly genetically identical (a total of 3 single-nucleotide polymorphisms identified among them), thereby demonstrating direct descent of the 3 isolates from an original isolate. We used WGST to demonstrate the genotypic relatedness of C. immitis isolates that were also epidemiologically linked. Thus, WGST offers unique benefits to public health for investigation of clusters considered to be linked to a single source. PMID:21291593

  7. The DNA Methylome of Human Peripheral Blood Mononuclear Cells

    PubMed Central

    Ye, Mingzhi; Zheng, Hancheng; Yu, Jian; Wu, Honglong; Sun, Jihua; Zhang, Hongyu; Chen, Quan; Luo, Ruibang; Chen, Minfeng; He, Yinghua; Jin, Xin; Zhang, Qinghui; Yu, Chang; Zhou, Guangyu; Sun, Jinfeng; Huang, Yebo; Zheng, Huisong; Cao, Hongzhi; Zhou, Xiaoyu; Guo, Shicheng; Hu, Xueda; Li, Xin; Kristiansen, Karsten; Bolund, Lars; Xu, Jiujin; Wang, Wen; Yang, Huanming; Wang, Jian; Li, Ruiqiang; Beck, Stephan; Wang, Jun; Zhang, Xiuqing

    2010-01-01

    DNA methylation plays an important role in biological processes in human health and disease. Recent technological advances allow unbiased whole-genome DNA methylation (methylome) analysis to be carried out on human cells. Using whole-genome bisulfite sequencing at 24.7-fold coverage (12.3-fold per strand), we report a comprehensive (92.62%) methylome and analysis of the unique sequences in human peripheral blood mononuclear cells (PBMC) from the same Asian individual whose genome was deciphered in the YH project. PBMC constitute an important source for clinical blood tests world-wide. We found that 68.4% of CpG sites and <0.2% of non-CpG sites were methylated, demonstrating that non-CpG cytosine methylation is minor in human PBMC. Analysis of the PBMC methylome revealed a rich epigenomic landscape for 20 distinct genomic features, including regulatory, protein-coding, non-coding, RNA-coding, and repeat sequences. Integration of our methylome data with the YH genome sequence enabled a first comprehensive assessment of allele-specific methylation (ASM) between the two haploid methylomes of any individual and allowed the identification of 599 haploid differentially methylated regions (hDMRs) covering 287 genes. Of these, 76 genes had hDMRs within 2 kb of their transcriptional start sites of which >80% displayed allele-specific expression (ASE). These data demonstrate that ASM is a recurrent phenomenon and is highly correlated with ASE in human PBMCs. Together with recently reported similar studies, our study provides a comprehensive resource for future epigenomic research and confirms new sequencing technology as a paradigm for large-scale epigenomics studies. PMID:21085693

  8. A novel endogenous betaretrovirus group characterized from polar bears (Ursus maritimus) and giant pandas (Ailuropoda melanoleuca).

    PubMed

    Mayer, Jens; Tsangaras, Kyriakos; Heeger, Felix; Avila-Arcos, María; Stenglein, Mark D; Chen, Wei; Sun, Wei; Mazzoni, Camila J; Osterrieder, Nikolaus; Greenwood, Alex D

    2013-08-15

    Transcriptome analysis of polar bears (Ursus maritimus) yielded sequences with highest similarity to the human endogenous retrovirus group HERV-K(HML-2). Further analysis of the polar bear draft genome identified an endogenous betaretrovirus group comprising 26 proviral copies and 231 solo LTRs. Molecular dating indicates the group originated before the divergence of bears from a common ancestor but is not present in all carnivores. Closely related sequences were identified in the giant panda (Ailuropoda melanoleuca) and characterized from its genome. We have designated the polar bear and giant panda sequences U. maritimus endogenous retrovirus (UmaERV) and A. melanoleuca endogenous retrovirus (AmeERV), respectively. Phylogenetic analysis demonstrated that the bear virus group is nested within the HERV-K supergroup among bovine and bat endogenous retroviruses suggesting a complex evolutionary history within the HERV-K group. All individual remnants of proviral sequences contain numerous frameshifts and stop codons and thus, the virus is likely non-infectious. Copyright © 2013 Elsevier Inc. All rights reserved.

  9. An analysis of rotor blade twist variables associated with different Euler sequences and pretwist treatments

    NASA Technical Reports Server (NTRS)

    Alkire, K.

    1984-01-01

    A nonlinear analysis, which is necessary to adequately model elastic helicopter rotor blades experiencing moderately large deformations, was examined. The analysis must be based on an appropriate description of the blade's deformation geometry including elastic bending and twist. Built-in pretwist angles complicate the deformation process and its definition. Relationships between the twist variables associated with different rotation sequences and corresponding forms of the transformation matrix are listed. Relationships between the twist variables associated with first, the pretwist combined with the deformation twist are included. Many of the corresponding forms of the transformation matrix for the two cases are listed. It is shown that twist variables connected with the combined twist treatment are related to those where the pretwist is applied initially. A method to determine the relationships and some results are outlined. A procedure to evaluate the transformation matrix that eliminates the Euler-like sequence altogether is demonstrated. The resulting form of the transformation matrix is unaffected by rotation sequence or pretwist treatment.

  10. A novel endogenous betaretrovirus group characterized from polar bears (Ursus maritimus) and giant pandas (Ailuropoda melanoleuca)

    PubMed Central

    Mayer, Jens; Tsangaras, Kyriakos; Heeger, Felix; Ávila-Arcos, Maria; Stenglein, Mark D.; Chen, Wei; Sun, Wei; Mazzoni, Camila; Osterrieder, Nikolaus; Greenwood, Alex D.

    2013-01-01

    Transcriptome analysis of polar bears (Ursus maritimus) yielded sequences with highest similarity to the human endogenous retrovirus group HERV-K(HML-2). Further analysis of the polar bear draft genome identified an endogenous betaretrovirus group comprising 26 proviral copies and 231 solo LTRs. Molecular dating indicates the group originated before the divergence of bears from a common ancestor but is not present in all carnivores. Closely related sequences were identified in the giant panda (Ailuropoda melanoleuca) and characterized from its genome. We have designated the polar bear and giant panda sequences Ursus maritimus endogenous retrovirus (UmaERV) and Ailuropoda melanoleuca endogenous retrovirus (AmeERV), respectively. Phylogenetic analysis demonstrated that the bear virus group is nested within the HERV-K supergroup among bovine and bat endogenous retroviruses suggesting a complex evolutionary history within the HERV-K group. All individual remnants of proviral sequences contain numerous frameshifts and stop codons and thus, the virus is likely non-infectious. PMID:23725819

  11. An integrated approach to demonstrating the ANR pathway of proanthocyanidin biosynthesis in plants.

    PubMed

    Peng, Qing-Zhong; Zhu, Yue; Liu, Zhong; Du, Ci; Li, Ke-Gang; Xie, De-Yu

    2012-09-01

    Proanthocyanidins (PAs) are oligomers or polymers of plant flavan-3-ols and are important to plant adaptation in extreme environmental conditions. The characterization of anthocyanidin reductase (ANR) and leucoanthocyanidin reductase (LAR) has demonstrated the different biogenesis of four stereo-configurations of flavan-3-ols. It is important to understand whether ANR and the ANR pathway occur widely in the plant kingdom. Here, we report an integrated approach to demonstrate the ANR pathway in plants. This includes different methods to extract native ANR from different tissues of eight angiosperm plants (Lotus corniculatus, Desmodium uncinatum, Medicago sativa, Hordeum vulgare, Vitis vinifera, Vitis bellula, Parthenocissus heterophylla, and Cerasus serrulata) and one fern plant (Dryopteris pycnopteroides), a general enzymatic analysis approach to demonstrate the ANR activity, high-performance liquid chromatography-based fingerprinting to demonstrate (-)-epicatechin and other flavan-3-ol molecules, and phytochemical analysis of PAs. Results demonstrate that, in addition to leaves of M. sativa, tissues of the other eight plants contain an active ANR pathway. In particular, the leaves, flowers and pods of D. uncinatum, which is a model plant to study LAR and the LAR pathways, are demonstrated to express an active ANR pathway. This finding suggests that the ANR pathway is involved in PA biosynthesis in D. uncinatum. In addition, a sequence BLAST analysis reveals that ANR homologs have been sequenced in plants from both gymnosperms and angiosperms. These data show that the ANR pathway to PA biosynthesis occurs in both seed and seedless vascular plants.

  12. Impact of cultivation on characterisation of species composition of soil bacterial communities.

    PubMed

    McCaig, A E.; Grayston, S J.; Prosser, J I.; Glover, L A.

    2001-03-01

    The species composition of culturable bacteria in Scottish grassland soils was investigated using a combination of Biolog and 16S rDNA analysis for characterisation of isolates. The inclusion of a molecular approach allowed direct comparison of sequences from culturable bacteria with sequences obtained during analysis of DNA extracted directly from the same soil samples. Bacterial strains were isolated on Pseudomonas isolation agar (PIA), a selective medium, and on tryptone soya agar (TSA), a general laboratory medium. In total, 12 and 21 morphologically different bacterial cultures were isolated on PIA and TSA, respectively. Biolog and sequencing placed PIA isolates in the same taxonomic groups, the majority of cultures belonging to the Pseudomonas (sensu stricto) group. However, analysis of 16S rDNA sequences proved more efficient than Biolog for characterising TSA isolates due to limitations of the Microlog database for identifying environmental bacteria. In general, 16S rDNA sequences from TSA isolates showed high similarities to cultured species represented in sequence databases, although TSA-8 showed only 92.5% similarity to the nearest relative, Bacillus insolitus. In general, there was very little overlap between the culturable and uncultured bacterial communities, although two sequences, PIA-2 and TSA-13, showed >99% similarity to soil clones. A cloning step was included prior to sequence analysis of two isolates, TSA-5 and TSA-14, and analysis of several clones confirmed that these cultures comprised at least four and three sequence types, respectively. All isolate clones were most closely related to uncultured bacteria, with clone TSA-5.1 showing 99.8% similarity to a sequence amplified directly from the same soil sample. Interestingly, one clone, TSA-5.4, clustered within a novel group comprising only uncultured sequences. 
    This group, which is associated with the novel, deep-branching Acidobacterium capsulatum lineage, also included clones isolated during direct analysis of the same soil and from a wide range of other sample types studied elsewhere. The study demonstrates the value of fine-scale molecular analysis for identification of laboratory isolates and indicates the culturability of approximately 1% of the total population but under a restricted range of media and cultivation conditions.

  13. Verification of nonlinear dynamic structural test results by combined image processing and acoustic analysis

    NASA Astrophysics Data System (ADS)

    Tene, Yair; Tene, Noam; Tene, G.

    1993-08-01

    An interactive data fusion methodology of video, audio, and nonlinear structural dynamic analysis for potential application in forensic engineering is presented. The methodology was developed and successfully demonstrated in the analysis of a heavy transportable bridge collapse during preparation for testing. Multiple bridge element failures were identified after the collapse, including fracture, cracks and rupture of high performance structural materials. A videotape recording made with a hand-held camcorder was the only source of information about the collapse sequence. The interactive data fusion methodology extracted relevant information from the videotape and from dynamic nonlinear structural analysis, leading to a full account of the sequence of events during the bridge collapse.

  14. Comparative genome and methylome analysis reveals restriction/modification system diversity in the gut commensal Bifidobacterium breve

    PubMed Central

    Bottacini, Francesca; Morrissey, Ruth; Roberts, Richard John; James, Kieran; van Breen, Justin; Egan, Muireann; Lambert, Jolanda; van Limpt, Kees; Knol, Jan; Motherway, Mary O’Connell; van Sinderen, Douwe

    2018-01-01

    Bifidobacterium breve represents one of the most abundant bifidobacterial species in the gastro-intestinal tract of breast-fed infants, where their presence is believed to exert beneficial effects. In the present study whole genome sequencing, employing the PacBio Single Molecule, Real-Time (SMRT) sequencing platform, combined with comparative genome analysis allowed the most extensive genetic investigation of this taxon. Our findings demonstrate that genes encoding Restriction/Modification (R/M) systems constitute a substantial part of the B. breve variable gene content (or variome). Using the methylome data generated by SMRT sequencing, combined with targeted Illumina bisulfite sequencing (BS-seq) and comparative genome analysis, we were able to detect methylation recognition motifs and assign these to identified B. breve R/M systems, where in several cases such assignments were confirmed by restriction analysis. Furthermore, we show that R/M systems typically impose a very significant barrier to genetic accessibility of B. breve strains, and that cloning of a methyltransferase-encoding gene may overcome such a barrier, thus allowing future functional investigations of members of this species. PMID:29294107

  15. Texture analysis of common renal masses in multiple MR sequences for prediction of pathology

    NASA Astrophysics Data System (ADS)

    Hoang, Uyen N.; Malayeri, Ashkan A.; Lay, Nathan S.; Summers, Ronald M.; Yao, Jianhua

    2017-03-01

    This pilot study performs texture analysis on multiple magnetic resonance (MR) images of common renal masses for differentiation of renal cell carcinoma (RCC). Bounding boxes are drawn around each mass on one axial slice in the T1 delayed sequence to use for feature extraction and classification. All sequences (T1 delayed, venous, arterial, pre-contrast phases, T2, and T2 fat saturated sequences) are co-registered and texture features are extracted from each sequence simultaneously. Random forest is used to construct models to classify lesions on 96 normal regions, 87 clear cell RCCs, 8 papillary RCCs, and 21 renal oncocytomas; ground truths are verified through pathology reports. The highest performance is seen in the random forest model when data from all sequences are used in conjunction, achieving an overall classification accuracy of 83.7%. When using data from a single sequence, the overall accuracies achieved for T1 delayed, venous, arterial, and pre-contrast phase, T2, and T2 fat saturated were 79.1%, 70.5%, 56.2%, 61.0%, 60.0%, and 44.8%, respectively. This demonstrates promising results of utilizing intensity information from multiple MR sequences for accurate classification of renal masses.

  16. Targeted Analysis of Whole Genome Sequence Data to Diagnose Genetic Cardiomyopathy

    DOE PAGES

    Golbus, Jessica R.; Puckelwartz, Megan J.; Dellefave-Castillo, Lisa; ...

    2014-09-01

    Background: Cardiomyopathy is highly heritable but genetically diverse. At present, genetic testing for cardiomyopathy uses targeted sequencing to simultaneously assess the coding regions of more than 50 genes. New genes are routinely added to panels to improve the diagnostic yield. With the anticipated $1000 genome, it is expected that genetic testing will shift towards comprehensive genome sequencing accompanied by targeted gene analysis. Therefore, we assessed the reliability of whole genome sequencing and targeted analysis to identify cardiomyopathy variants in 11 subjects with cardiomyopathy. Methods and Results: Whole genome sequencing with an average of 37× coverage was combined with targeted analysis focused on 204 genes linked to cardiomyopathy. Genetic variants were scored using multiple prediction algorithms combined with frequency data from public databases. This pipeline yielded 1-14 potentially pathogenic variants per individual. Variants were further analyzed using clinical criteria and/or segregation analysis. Three of three previously identified primary mutations were detected by this analysis. In six subjects for whom the primary mutation was previously unknown, we identified mutations that segregated with disease, had clinical correlates, and/or had additional pathological correlation to provide evidence for causality. For two subjects with previously known primary mutations, we identified additional variants that may act as modifiers of disease severity. In total, we identified the likely pathological mutation in 9 of 11 (82%) subjects. We conclude that these pilot data demonstrate that ~30-40× coverage whole genome sequencing combined with targeted analysis is feasible and sensitive to identify rare variants in cardiomyopathy-associated genes.

  17. Comparisons of Highly Virulent H5N1 Influenza A Viruses Isolated from Humans and Chickens from Hong Kong

    PubMed Central

    Suarez, David L.; Perdue, Michael L.; Cox, Nancy; Rowe, Thomas; Bender, Catherine; Huang, Jing; Swayne, David E.

    1998-01-01

    Genes of an influenza A (H5N1) virus from a human in Hong Kong isolated in May 1997 were sequenced and found to be all avian-like (K. Subbarao et al., Science 279:393–395, 1998). Gene sequences of this human isolate were compared to those of a highly pathogenic chicken H5N1 influenza virus isolated from Hong Kong in April 1997. Sequence comparisons of all eight RNA segments from the two viruses show greater than 99% sequence identity between them. However, neither isolate’s gene sequence was closely (>95% sequence identity) related to any other gene sequences found in the GenBank database. Phylogenetic analysis demonstrated that the nucleotide sequences of at least four of the eight RNA segments clustered with Eurasian origin avian influenza viruses. The hemagglutinin gene phylogenetic analysis also included the sequences from an additional three human and two chicken H5N1 virus isolates from Hong Kong, and the isolates separated into two closely related groups. However, no single amino acid change separated the chicken origin and human origin isolates, but they all contained multiple basic amino acids at the hemagglutinin cleavage site, which is associated with a highly pathogenic phenotype in poultry. In experimental intravenous inoculation studies with chickens, all seven viruses were highly pathogenic, killing most birds within 24 h. All infected chickens had virtually identical pathologic lesions, including moderate to severe diffuse edema and interstitial pneumonitis. Viral nucleoprotein was most frequently demonstrated in vascular endothelium, macrophages, heterophils, and cardiac myocytes. Asphyxiation from pulmonary edema and generalized cardiovascular collapse were the most likely pathogenic mechanisms responsible for illness and death. 
    In summary, a small number of changes in hemagglutinin gene sequences defined two closely related subgroups, with both subgroups having human and chicken members, among the seven viruses examined from Hong Kong, and all seven viruses were highly pathogenic in chickens and caused similar lesions in experimental inoculations. PMID:9658115
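
    The percent sequence identity figures quoted in this abstract (greater than 99% between the two isolates, 95% as the threshold for "closely related") are ratios of matching positions over compared positions in a pairwise alignment. The Python sketch below is illustrative only: the abstract does not say how identity was computed, and excluding gap columns from the denominator is one convention among several, assumed here. The function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped. This is an
    assumed convention; other tools count gaps as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gap column: not counted under this convention
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment: 9 of 10 ungapped columns match.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAT")
print(f"{identity:.1f}% identity")
```

    Applied per RNA segment, a value above 99 would correspond to the similarity the abstract reports between the human and chicken isolates.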

  18. Asystasia mosaic Madagascar virus: a novel bipartite begomovirus infecting the weed Asystasia gangetica in Madagascar.

    PubMed

    De Bruyn, Alexandre; Harimalala, Mireille; Hoareau, Murielle; Ranomenjanahary, Sahondramalala; Reynaud, Bernard; Lefeuvre, Pierre; Lett, Jean-Michel

    2015-06-01

    Here, we describe for the first time the complete genome sequence of a new bipartite begomovirus in Madagascar isolated from the weed Asystasia gangetica (Acanthaceae), for which we propose the tentative name asystasia mosaic Madagascar virus (AMMGV). DNA-A and -B nucleotide sequences of AMMGV were only distantly related to known begomovirus sequence and shared highest nucleotide sequence identity of 72.9 % (DNA-A) and 66.9 % (DNA-B) with a recently described bipartite begomovirus infecting Asystasia sp. in West Africa. Phylogenetic analysis demonstrated that this novel virus from Madagascar belongs to a new lineage of Old World bipartite begomoviruses.

  19. Assessing the 5S ribosomal RNA heterogeneity in Arabidopsis thaliana using short RNA next generation sequencing data.

    PubMed

    Szymanski, Maciej; Karlowski, Wojciech M

    2016-01-01

    In eukaryotes, ribosomal 5S rRNAs are products of multigene families organized within clusters of tandemly repeated units. Accumulation of genomic data obtained from a variety of organisms demonstrated that the potential 5S rRNA coding sequences show a large number of variants, often incompatible with folding into a correct secondary structure. Here, we present results of an analysis of a large set of short RNA sequences generated by the next generation sequencing techniques, to address the problem of heterogeneity of the 5S rRNA transcripts in Arabidopsis and identification of potentially functional rRNA-derived fragments.

  20. Sequence data - Magnitude and implications of some ambiguities.

    NASA Technical Reports Server (NTRS)

    Holmquist, R.; Jukes, T. H.

    1972-01-01

    A stochastic model is applied to the divergence of the horse-pig lineage from a common ancestor in terms of the alpha and beta chains of hemoglobin and fibrinopeptides. The results are compared with those based on the minimum mutation distance model of Fitch (1972). Buckwheat and cauliflower cytochrome c sequences are analyzed to demonstrate their ambiguities. A comparative analysis of evolutionary rates for various proteins of horses and pigs shows that errors of considerable magnitude are introduced by Glx and Asx ambiguities into evolutionary conclusions drawn from sequences of incompletely analyzed proteins.

  1. Nucleotide Sequence Analysis of RNA Synthesized from Rabbit Globin Complementary DNA

    PubMed Central

    Poon, Raymond; Paddock, Gary V.; Heindell, Howard; Whitcome, Philip; Salser, Winston; Kacian, Dan; Bank, Arthur; Gambino, Roberto; Ramirez, Francesco

    1974-01-01

    Rabbit globin complementary DNA made with RNA-dependent DNA polymerase (reverse transcriptase) was used as template for in vitro synthesis of 32P-labeled RNA. The sequences of the nucleotides in most of the fragments resulting from combined ribonuclease T1 and alkaline phosphatase digestion have been determined. Several fragments were long enough to fit uniquely with the α or β globin amino-acid sequences. These data demonstrate that the cDNA was copied from globin mRNA and contained no detectable contaminants. PMID:4139714

  2. Demonstration of human T-cell lymphotropic virus type I (HTLV-I) from an HTLV-I seronegative south Indian patient with chronic, progressive spastic paraparesis.

    PubMed

    Nishimura, M; Mingioli, E; McFarlin, D E; Jacobson, S

    1993-12-01

    Here we describe a human T-cell lymphotropic virus type I (HTLV-I) seronegative patient from South India with a chronic, progressive spastic paraparesis from whom HTLV-I has been isolated from peripheral blood lymphocytes. HTLV-I pol and tax viral sequences were detected in DNA from fresh peripheral blood lymphocytes (PBL) by polymerase chain reaction (PCR) and liquid hybridization techniques. Southern blot analysis of the PCR products demonstrated a low copy number of HTLV-I at the level of one viral copy per 10,000 fresh PBL. A long-term CD4+ T-cell line was established from PBL of this patient using recombinant interleukin-2, OKT3, and feeder cells. DNA from these cultured lines was amplified and portions of the HTLV-I long terminal repeat (U3), pol, env, and tax regions were sequenced (a total of 1,115 bp). The sequence data showed that the HTLV-I associated with this patient was 98.8% homologous to prototype HTLV-I. Southern blot analysis also confirmed the presence of full-length HTLV-I. These results indicate that HTLV-I can be demonstrated in an HTLV-I seronegative patient from South India with a chronic progressive neurological disorder.

  3. From sequencer to supercomputer: an automatic pipeline for managing and processing next generation sequencing data.

    PubMed

    Camerlengo, Terry; Ozer, Hatice Gulcin; Onti-Srinivasan, Raghuram; Yan, Pearlly; Huang, Tim; Parvin, Jeffrey; Huang, Kun

    2012-01-01

    Next Generation Sequencing (NGS) is highly resource intensive. NGS tasks related to data processing, management and analysis require high-end computing servers or even clusters. Additionally, processing NGS experiments requires suitable storage space and significant manual interaction. At The Ohio State University's Biomedical Informatics Shared Resource, we designed and implemented a scalable architecture to address the challenges associated with the resource intensive nature of NGS secondary analysis built around Illumina Genome Analyzer II sequencers and Illumina's Gerald data processing pipeline. The software infrastructure includes a distributed computing platform consisting of a LIMS called QUEST (http://bisr.osumc.edu), an Automation Server, a computer cluster for processing NGS pipelines, and a network attached storage device expandable up to 40TB. The system has been architected to scale to multiple sequencers without requiring additional computing or labor resources. This platform demonstrates how to manage and automate NGS experiments in an institutional or core facility setting.

  4. Protein evolution analysis of S-hydroxynitrile lyase by complete sequence design utilizing the INTMSAlign software.

    PubMed

    Nakano, Shogo; Asano, Yasuhisa

    2015-02-03

    Development of software and methods for design of complete sequences of functional proteins could contribute to studies of protein engineering and protein evolution. To this end, we developed the INTMSAlign software, and used it to design functional proteins and evaluate their usefulness. The software could assign both consensus and correlation residues of target proteins. We generated three protein sequences with S-selective hydroxynitrile lyase (S-HNL) activity, which we call designed S-HNLs; these proteins folded as efficiently as the native S-HNL. Sequence and biochemical analysis of the designed S-HNLs suggested that accumulation of neutral mutations occurs during the process of S-HNLs evolution from a low-activity form to a high-activity (native) form. Taken together, our results demonstrate that our software and the associated methods could be applied not only to design of complete sequences, but also to predictions of protein evolution, especially within families such as esterases and S-HNLs.

  5. Protein evolution analysis of S-hydroxynitrile lyase by complete sequence design utilizing the INTMSAlign software

    NASA Astrophysics Data System (ADS)

    Nakano, Shogo; Asano, Yasuhisa

    2015-02-01

    Development of software and methods for design of complete sequences of functional proteins could contribute to studies of protein engineering and protein evolution. To this end, we developed the INTMSAlign software, and used it to design functional proteins and evaluate their usefulness. The software could assign both consensus and correlation residues of target proteins. We generated three protein sequences with S-selective hydroxynitrile lyase (S-HNL) activity, which we call designed S-HNLs; these proteins folded as efficiently as the native S-HNL. Sequence and biochemical analysis of the designed S-HNLs suggested that accumulation of neutral mutations occurs during the process of S-HNLs evolution from a low-activity form to a high-activity (native) form. Taken together, our results demonstrate that our software and the associated methods could be applied not only to design of complete sequences, but also to predictions of protein evolution, especially within families such as esterases and S-HNLs.

  6. BioPig: a Hadoop-based analytic toolkit for large-scale sequence data.

    PubMed

    Nordberg, Henrik; Bhatia, Karan; Wang, Kai; Wang, Zhong

    2013-12-01

    The recent revolution in sequencing technologies has led to an exponential growth of sequence data. As a result, most of the current bioinformatics tools become obsolete as they fail to scale with data. To tackle this 'data deluge', here we introduce the BioPig sequence analysis toolkit as one of the solutions that scale to data and computation. We built BioPig on Apache's Hadoop MapReduce system and the Pig data flow language. Compared with traditional serial and MPI-based algorithms, BioPig has three major advantages: first, BioPig's programmability greatly reduces development time for parallel bioinformatics applications; second, testing BioPig with up to 500 Gb sequences demonstrates that it scales automatically with the size of the data; and finally, BioPig can be ported without modification to many Hadoop infrastructures, as tested with the Magellan system at the National Energy Research Scientific Computing Center and the Amazon Elastic Compute Cloud. In summary, BioPig represents a novel program framework with the potential to greatly accelerate data-intensive bioinformatics analysis.
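
    The MapReduce pattern that BioPig builds on can be illustrated with k-mer counting, a common sequence-analysis task. The sketch below is plain single-process Python, not BioPig, Pig, or Hadoop code, and the function names are hypothetical. It only shows the shape of the computation: a map step that emits per-read k-mer counts and a reduce step that merges them, which is what lets the work be spread over many machines.

```python
from collections import Counter
from functools import reduce
from typing import Iterable


def map_kmers(read: str, k: int) -> Counter:
    """Map step: count the k-mers in a single read."""
    read = read.upper()
    # Reads shorter than k yield an empty range and thus an empty Counter.
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial count tables."""
    return left + right


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Run map over every read, then fold the partial results together."""
    partials = (map_kmers(read, k) for read in reads)
    return reduce(reduce_counts, partials, Counter())


# Hypothetical toy input: two short reads.
reads = ["ACGTAC", "GTACGT"]
kmer_counts = count_kmers(reads, 3)
print(kmer_counts)
```

    Because merging count tables is associative and commutative, the partial results can be combined in any order, which is the property a distributed framework relies on when scaling with data size.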

  7. ANCAC: amino acid, nucleotide, and codon analysis of COGs--a tool for sequence bias analysis in microbial orthologs.

    PubMed

    Meiler, Arno; Klinger, Claudia; Kaufmann, Michael

    2012-09-08

    The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC's NUCOCOG dataset as the largest one available for that purpose thus far. Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills.
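
    The quantities ANCAC reports (codon frequencies and GC-content) are simple to compute for a single coding sequence. The Python sketch below is not ANCAC code and does not query the COG or NUCOCOG databases; the function names and the toy sequence are hypothetical, and the input is assumed to be an in-frame coding sequence.

```python
from collections import Counter
from typing import Dict


def gc_content(sequence: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def codon_frequencies(cds: str) -> Dict[str, float]:
    """Relative frequency of each codon in an in-frame coding sequence.

    Trailing bases that do not form a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    return {codon: n / len(codons) for codon, n in counts.items()}


# Hypothetical toy coding sequence: ATG GCC GCC TAA
cds = "ATGGCCGCCTAA"
print(f"GC-content: {gc_content(cds):.3f}")
print(codon_frequencies(cds))
```

    Aggregating such per-sequence values over all members of a COG, or over the genomes matching a phylogenetic pattern, gives the kind of species- or function-specific comparison the abstract describes.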

  8. ANCAC: amino acid, nucleotide, and codon analysis of COGs – a tool for sequence bias analysis in microbial orthologs

    PubMed Central

    2012-01-01

    Background: The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Results: Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC’s NUCOCOG dataset as the largest one available for that purpose thus far. Conclusions: Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming skills. PMID:22958836

  9. Tandem mass spectrometry for the detection of plant pathogenic fungi and the effects of database composition on protein inferences.

    PubMed

    Padliya, Neerav D; Garrett, Wesley M; Campbell, Kimberly B; Tabb, David L; Cooper, Bret

    2007-11-01

    LC-MS/MS has demonstrated potential for detecting plant pathogens. Unlike PCR or ELISA, LC-MS/MS does not require pathogen-specific reagents for the detection of pathogen-specific proteins and peptides. However, the MS/MS approach we and others have explored does require a protein sequence reference database and database-search software to interpret tandem mass spectra. To evaluate the limitations of database composition on pathogen identification, we analyzed proteins from cultured Ustilago maydis, Phytophthora sojae, Fusarium graminearum, and Rhizoctonia solani by LC-MS/MS. When the search database did not contain sequences for a target pathogen, or contained sequences to related pathogens, target pathogen spectra were reliably matched to protein sequences from nontarget organisms, giving an illusion that proteins from nontarget organisms were identified. Our analysis demonstrates that when database-search software is used as part of the identification process, a paradox exists whereby additional sequences needed to detect a wide variety of possible organisms may lead to more cross-species protein matches and misidentification of pathogens.

  10. Whole Transcriptome Sequencing Enables Discovery and Analysis of Viruses in Archived Primary Central Nervous System Lymphomas

    PubMed Central

    DeBoever, Christopher; Reid, Erin G.; Smith, Erin N.; Wang, Xiaoyun; Dumaop, Wilmar; Harismendy, Olivier; Carson, Dennis; Richman, Douglas; Masliah, Eliezer; Frazer, Kelly A.

    2013-01-01

    Primary central nervous system lymphomas (PCNSL) have a dramatically increased prevalence among persons living with AIDS and are known to be associated with human Epstein-Barr virus (EBV) infection. Previous work suggests that in some cases, co-infection with other viruses may be important for PCNSL pathogenesis. Viral transcription in tumor samples can be measured using next generation transcriptome sequencing. We demonstrate the ability of transcriptome sequencing to identify viruses, characterize viral expression, and identify viral variants by sequencing four archived AIDS-related PCNSL tissue samples and analyzing raw sequencing reads. EBV was detected in all four PCNSL samples and cytomegalovirus (CMV), JC polyomavirus (JCV), and HIV were also discovered, consistent with clinical diagnoses. CMV was found to express three long non-coding RNAs recently reported as expressed during active infection. Single nucleotide variants were observed in each of the viruses detected and three indels were found in CMV. No viruses were found in several control tumor types including 32 diffuse large B-cell lymphoma samples. This study demonstrates the ability of next generation transcriptome sequencing to accurately identify viruses, including DNA viruses, in solid human cancer tissue samples. PMID:24023918

  11. Genetic characterization of the non-structural protein-3 gene of bluetongue virus serotype-2 isolate from India.

    PubMed

    Pudupakam, Raghavendra Sumanth; Raghunath, Shobana; Pudupakam, Meghanath; Daggupati, Sreenivasulu

    2017-03-01

    Sequence analysis and phylogenetic studies based on non-structural protein-3 (NS3) gene are important in understanding the evolution and epidemiology of bluetongue virus (BTV). This study was aimed at characterizing the NS3 gene sequence of Indian BTV serotype-2 (BTV2) to elucidate its genetic relationship to global BTV isolates. The NS3 gene of BTV2 was amplified from infected BHK-21 cell cultures, cloned and subjected to sequence analysis. The generated NS3 gene sequence was compared with the corresponding sequences of different BTV serotypes across the world, and a phylogenetic relationship was established. The NS3 gene of BTV2 showed moderate levels of variability in comparison to different BTV serotypes, with nucleotide sequence identities ranging from 81% to 98%. The region showed high sequence homology of 93-99% at amino acid level with various BTV serotypes. The PPXY/PTAP late domain motifs, glycosylation sites, hydrophobic domains, and the amino acid residues critical for virus-host interactions were conserved in NS3 protein. Phylogenetic analysis revealed that BTV isolates segregate into four topotypes and that the Indian BTV2 in subclade IA is closely related to Asian and Australian origin strains. Analysis of the NS3 gene indicated that Indian BTV2 isolate is closely related to strains from Asia and Australia, suggesting a common origin of infection. Although the pattern of evolution of BTV2 isolate is different from other global isolates, the deduced amino acid sequence of NS3 protein demonstrated high molecular stability.

  12. Genetic characterization of the non-structural protein-3 gene of bluetongue virus serotype-2 isolate from India

    PubMed Central

    Pudupakam, Raghavendra Sumanth; Raghunath, Shobana; Pudupakam, Meghanath; Daggupati, Sreenivasulu

    2017-01-01

    Aim: Sequence analysis and phylogenetic studies based on non-structural protein-3 (NS3) gene are important in understanding the evolution and epidemiology of bluetongue virus (BTV). This study was aimed at characterizing the NS3 gene sequence of Indian BTV serotype-2 (BTV2) to elucidate its genetic relationship to global BTV isolates. Materials and Methods: The NS3 gene of BTV2 was amplified from infected BHK-21 cell cultures, cloned and subjected to sequence analysis. The generated NS3 gene sequence was compared with the corresponding sequences of different BTV serotypes across the world, and a phylogenetic relationship was established. Results: The NS3 gene of BTV2 showed moderate levels of variability in comparison to different BTV serotypes, with nucleotide sequence identities ranging from 81% to 98%. The region showed high sequence homology of 93-99% at amino acid level with various BTV serotypes. The PPXY/PTAP late domain motifs, glycosylation sites, hydrophobic domains, and the amino acid residues critical for virus-host interactions were conserved in NS3 protein. Phylogenetic analysis revealed that BTV isolates segregate into four topotypes and that the Indian BTV2 in subclade IA is closely related to Asian and Australian origin strains. Conclusion: Analysis of the NS3 gene indicated that Indian BTV2 isolate is closely related to strains from Asia and Australia, suggesting a common origin of infection. Although the pattern of evolution of BTV2 isolate is different from other global isolates, the deduced amino acid sequence of NS3 protein demonstrated high molecular stability. PMID:28435199

  13. Primary and secondary structural analyses of glutathione S-transferase pi from human placenta.

    PubMed

    Ahmad, H; Wilson, D E; Fritz, R R; Singh, S V; Medh, R D; Nagle, G T; Awasthi, Y C; Kurosky, A

    1990-05-01

    The primary structure of glutathione S-transferase (GST) pi from a single human placenta was determined. The structure was established by chemical characterization of tryptic and cyanogen bromide peptides as well as automated sequence analysis of the intact enzyme. The structural analysis indicated that the protein is comprised of 209 amino acid residues and gave no evidence of post-translational modifications. The amino acid sequence differed from that of the deduced amino acid sequence determined by nucleotide sequence analysis of a cDNA clone (Kano, T., Sakai, M., and Muramatsu, M., 1987, Cancer Res. 47, 5626-5630) at position 104 which contained both valine and isoleucine whereas the deduced sequence from nucleotide sequence analysis identified only isoleucine at this position. These results demonstrated that in the one individual placenta studied at least two GST pi genes are coexpressed, probably as a result of allelomorphism. Computer assisted consensus sequence evaluation identified a hydrophobic region in GST pi (residues 155-181) that was predicted to be either a buried transmembrane helical region or a signal sequence region. The significance of this hydrophobic region was interpreted in relation to the mode of action of the enzyme especially in regard to the potential involvement of a histidine in the active site mechanism. A comparison of the chemical similarity of five known human GST complete enzyme structures, one of pi, one of mu, two of alpha, and one microsomal, gave evidence that all five enzymes have evolved by a divergent evolutionary process after gene duplication, with the microsomal enzyme representing the most divergent form.

  14. Issues with RNA-seq analysis in non-model organisms: A salmonid example.

    PubMed

    Sundaram, Arvind; Tengs, Torstein; Grimholt, Unni

    2017-10-01

    High throughput sequencing (HTS) is useful for many purposes as exemplified by the other topics included in this special issue. The purpose of this paper is to look into the unique challenges of using this technology in non-model organisms where resources such as genomes, functional genome annotations or genome complexity provide obstacles not met in model organisms. To describe these challenges, we narrow our scope to RNA sequencing used to study differential gene expression in response to pathogen challenge. As a demonstration species we chose Atlantic salmon, which has a sequenced genome with poor annotation and an added complexity due to many duplicated genes. We find that our RNA-seq analysis pipeline discriminates between duplicates despite high sequence identity. However, annotation issues make it difficult to link differentially expressed genes to pathways. Also, comparing results between approaches and species is complicated by the lack of standardized annotation. Copyright © 2017 Elsevier Ltd. All rights reserved.

  15. Using SQL Databases for Sequence Similarity Searching and Analysis.

    PubMed

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.
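
    The idea of loading similarity search results into a relational database and summarizing them by taxon can be sketched with Python's built-in sqlite3 module. The two-table schema (protein, hit), the toy rows, and the function names below are assumptions made for illustration; they are not the seqdb_demo or search_demo schemas described in the unit.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table schema."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE protein (id INTEGER PRIMARY KEY, name TEXT, taxon TEXT);
        CREATE TABLE hit (query_id INTEGER, subject_id INTEGER, evalue REAL);
    """)
    # Toy rows: names and E-values are invented for the example.
    con.executemany("INSERT INTO protein VALUES (?, ?, ?)", [
        (1, "ecoA", "E. coli"), (2, "ecoB", "E. coli"),
        (3, "humA", "human"), (4, "yeaA", "yeast"), (5, "humB", "human"),
    ])
    con.executemany("INSERT INTO hit VALUES (?, ?, ?)", [
        (1, 3, 1e-30), (1, 4, 1e-5), (2, 5, 1e-12), (2, 4, 0.5),
    ])
    return con


def homolog_counts_by_taxon(con, query_taxon: str, max_evalue: float) -> dict:
    """Count query proteins with at least one significant hit in each other taxon."""
    rows = con.execute("""
        SELECT s.taxon, COUNT(DISTINCT h.query_id)
        FROM hit AS h
        JOIN protein AS q ON q.id = h.query_id
        JOIN protein AS s ON s.id = h.subject_id
        WHERE q.taxon = ? AND s.taxon <> q.taxon AND h.evalue <= ?
        GROUP BY s.taxon
        ORDER BY s.taxon
    """, (query_taxon, max_evalue)).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(homolog_counts_by_taxon(build_demo_db(), "E. coli", 1e-3))
```

    Keeping the search results in a table means the significance threshold and the taxonomic grouping become query parameters rather than steps that require re-running the search.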

  16. Quantitative analysis and prediction of G-quadruplex forming sequences in double-stranded DNA

    PubMed Central

    Kim, Minji; Kreig, Alex; Lee, Chun-Ying; Rube, H. Tomas; Calvert, Jacob; Song, Jun S.; Myong, Sua

    2016-01-01

    G-quadruplex (GQ) is a four-stranded DNA structure that can be formed in guanine-rich sequences. GQ structures have been proposed to regulate diverse biological processes including transcription, replication, translation and telomere maintenance. Recent studies have demonstrated the existence of GQ DNA in live mammalian cells and a significant number of potential GQ forming sequences in the human genome. We present a systematic and quantitative analysis of GQ folding propensity on a large set of 438 GQ forming sequences in double-stranded DNA by integrating fluorescence measurement, single-molecule imaging and computational modeling. We find that short minimum loop length and the thymine base are two main factors that lead to high GQ folding propensity. Linear and Gaussian process regression models further validate that the GQ folding potential can be predicted with high accuracy based on the loop length distribution and the nucleotide content of the loop sequences. Our study provides important new parameters that can inform the evaluation and classification of putative GQ sequences in the human genome. PMID:27095201
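
    The two features highlighted in this abstract, minimum loop length and thymine content of the loops, can be extracted from a sequence with a short sketch. The regular expression below encodes a commonly used putative-quadruplex motif (four runs of at least three guanines separated by loops of 1 to 7 bases); that motif, the loop-length bounds, and the function name are assumptions for illustration, not the sequence set or regression models used in the study.

```python
import re

# Assumed motif: G-run, loop, G-run, loop, G-run, loop, G-run.
PQS = re.compile(
    r"(G{3,})([ACGT]{1,7})(G{3,})([ACGT]{1,7})(G{3,})([ACGT]{1,7})(G{3,})"
)


def find_putative_gq(seq: str) -> list:
    """Return loop features for each non-overlapping motif match.

    The regex engine reports one parse per match; other assignments of
    bases to G-runs and loops may exist for the same region.
    """
    hits = []
    for m in PQS.finditer(seq.upper()):
        loops = (m.group(2), m.group(4), m.group(6))
        loop_bases = "".join(loops)
        hits.append({
            "start": m.start(),
            "loop_lengths": tuple(len(loop) for loop in loops),
            "min_loop": min(len(loop) for loop in loops),
            "t_fraction": loop_bases.count("T") / len(loop_bases),
        })
    return hits


if __name__ == "__main__":
    # Toy sequence, not taken from the paper's dataset.
    for hit in find_putative_gq("AAGGGTGGGTTGGGTTAGGGAA"):
        print(hit)
```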

  17. Winnowing DNA for rare sequences: highly specific sequence and methylation based enrichment.

    PubMed

    Thompson, Jason D; Shibahara, Gosuke; Rajan, Sweta; Pel, Joel; Marziali, Andre

    2012-01-01

    Rare mutations in cell populations are known to be hallmarks of many diseases and cancers. Similarly, differential DNA methylation patterns arise in rare cell populations with diagnostic potential such as fetal cells circulating in maternal blood. Unfortunately, the frequency of alleles with diagnostic potential, relative to wild-type background sequence, is often well below the frequency of errors in currently available methods for sequence analysis, including very high throughput DNA sequencing. We demonstrate a DNA preparation and purification method that, through non-linear electrophoretic separation in media containing oligonucleotide probes, achieves 10,000-fold enrichment of target DNA with single nucleotide specificity, and 100-fold enrichment of unmodified methylated DNA differing from the background by the methylation of a single cytosine residue.

  18. A statistical method for the detection of variants from next-generation resequencing of DNA pools.

    PubMed

    Bansal, Vikas

    2010-06-15

    Next-generation sequencing technologies have enabled the sequencing of several human genomes in their entirety. However, the routine resequencing of complete genomes remains infeasible. The massive capacity of next-generation sequencers can be harnessed for sequencing specific genomic regions in hundreds to thousands of individuals. Sequencing-based association studies are currently limited by the low level of multiplexing offered by sequencing platforms. Pooled sequencing represents a cost-effective approach for studying rare variants in large populations. To utilize the power of DNA pooling, it is important to accurately identify sequence variants from pooled sequencing data. Detection of rare variants from pooled sequencing represents a different challenge than detection of variants from individual sequencing. We describe a novel statistical approach, CRISP [Comprehensive Read analysis for Identification of Single Nucleotide Polymorphisms (SNPs) from Pooled sequencing] that is able to identify both rare and common variants by using two approaches: (i) comparing the distribution of allele counts across multiple pools using contingency tables and (ii) evaluating the probability of observing multiple non-reference base calls due to sequencing errors alone. Information about the distribution of reads between the forward and reverse strands and the size of the pools is also incorporated within this framework to filter out false variants. Validation of CRISP on two separate pooled sequencing datasets generated using the Illumina Genome Analyzer demonstrates that it can detect 80-85% of SNPs identified using individual sequencing while achieving a low false discovery rate (3-5%). Comparison with previous methods for pooled SNP detection demonstrates the significantly lower false positive and false negative rates for CRISP. Implementation of this method is available at http://polymorphism.scripps.edu/~vbansal/software/CRISP/.
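
    The two ingredients described in this abstract, a contingency-table comparison of allele counts across pools and the probability of seeing several non-reference calls from sequencing error alone, can be sketched as follows. This is not the CRISP implementation: the Pearson chi-square statistic and the binomial error model are stand-ins chosen for illustration, and the function names and the per-base error rate parameter are assumptions.

```python
from math import comb


def prob_errors_alone(n_reads: int, n_alt: int, error_rate: float) -> float:
    """P(X >= n_alt) for X ~ Binomial(n_reads, error_rate).

    A small value suggests the non-reference calls are unlikely to be
    explained by sequencing errors alone.
    """
    return sum(
        comb(n_reads, k) * error_rate ** k * (1 - error_rate) ** (n_reads - k)
        for k in range(n_alt, n_reads + 1)
    )


def chi_square_statistic(table) -> float:
    """Pearson chi-square statistic; rows are pools, columns are (ref, alt) counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    # Toy counts: three pools, the second carries most alternate reads.
    print(chi_square_statistic([[98, 2], [80, 20], [99, 1]]))
    print(prob_errors_alone(n_reads=100, n_alt=20, error_rate=0.01))
```

    A large statistic indicates that the alternate allele is unevenly distributed across pools, which is the signal a pooled design can use to separate real variants from uniformly distributed errors.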

  19. Computational analysis of stochastic heterogeneity in PCR amplification efficiency revealed by single molecule barcoding

    PubMed Central

    Best, Katharine; Oakes, Theres; Heather, James M.; Shawe-Taylor, John; Chain, Benny

    2015-01-01

    The polymerase chain reaction (PCR) is one of the most widely used techniques in molecular biology. In combination with High Throughput Sequencing (HTS), PCR is widely used to quantify transcript abundance for RNA-seq, and in the context of analysis of T and B cell receptor repertoires. In this study, we combine DNA barcoding with HTS to quantify PCR output from individual target molecules. We develop computational tools that simulate both the PCR branching process itself, and the subsequent subsampling which typically occurs during HTS. We explore the influence of different types of heterogeneity on sequencing output, and compare them to experimental results where the efficiency of amplification is measured by barcodes uniquely identifying each molecule of starting template. Our results demonstrate that the PCR process introduces substantial amplification heterogeneity, independent of primer sequence and bulk experimental conditions. This heterogeneity can be attributed both to inherited differences between different template DNA molecules, and the inherent stochasticity of the PCR process. The results demonstrate that PCR heterogeneity arises even when reaction and substrate conditions are kept as constant as possible, and therefore single molecule barcoding is essential in order to derive reproducible quantitative results from any protocol combining PCR with HTS. PMID:26459131
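
    A PCR branching process followed by sequencing subsampling, as described in this abstract, can be simulated in a few lines. The sketch below is a simplified model and not the authors' tool: it assumes a single per-cycle duplication probability per molecule, and it approximates subsampling by drawing reads with replacement in proportion to copy number. All names and parameters are hypothetical.

```python
import random


def amplify(n_cycles: int, efficiency: float, rng: random.Random) -> int:
    """Simulate one starting molecule through n_cycles of PCR.

    In each cycle every existing copy is duplicated with probability
    `efficiency` (a Galton-Watson branching process).
    """
    copies = 1
    for _ in range(n_cycles):
        copies += sum(1 for _ in range(copies) if rng.random() < efficiency)
    return copies


def subsample_reads(copies_per_barcode: dict, n_reads: int, rng: random.Random) -> dict:
    """Draw n_reads reads, each barcode weighted by its copy number."""
    barcodes = list(copies_per_barcode)
    weights = [copies_per_barcode[b] for b in barcodes]
    reads = rng.choices(barcodes, weights=weights, k=n_reads)
    return {b: reads.count(b) for b in barcodes}


if __name__ == "__main__":
    rng = random.Random(1)
    # Five barcoded template molecules amplified under identical conditions.
    pool = {f"bc{i}": amplify(10, 0.8, rng) for i in range(5)}
    print(pool)
    print(subsample_reads(pool, 100, rng))
```

    Even with identical efficiency for every template, the copy numbers differ between barcodes, which illustrates the stochastic component of heterogeneity that the abstract refers to.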

  20. Oligonucleotide gap-fill ligation for mutation detection and sequencing in situ

    PubMed Central

    Mignardi, Marco; Mezger, Anja; Qian, Xiaoyan; La Fleur, Linnea; Botling, Johan; Larsson, Chatarina; Nilsson, Mats

    2015-01-01

    In clinical diagnostics a great need exists for targeted in situ multiplex nucleic acid analysis as the mutational status can offer guidance for effective treatment. One well-established method uses padlock probes for mutation detection and multiplex expression analysis directly in cells and tissues. Here, we use oligonucleotide gap-fill ligation to further increase specificity and to capture molecular substrates for in situ sequencing. Short oligonucleotides are joined at both ends of a padlock gap probe by two ligation events and are then locally amplified by target-primed rolling circle amplification (RCA) preserving spatial information. We demonstrate the specific detection of the A3243G mutation of mitochondrial DNA and we successfully characterize a single nucleotide variant in the ACTB mRNA in cells by in situ sequencing of RCA products generated by padlock gap-fill ligation. To demonstrate the clinical applicability of our assay, we show specific detection of a point mutation in the EGFR gene in fresh frozen and formalin-fixed, paraffin-embedded (FFPE) lung cancer samples and confirm the detected mutation by in situ sequencing. This approach presents several advantages over conventional padlock probes allowing simpler assay design for multiplexed mutation detection to screen for the presence of mutations in clinically relevant mutational hotspots directly in situ. PMID:26240388

  1. The complete chloroplast genome sequence of the medicinal plant Salvia miltiorrhiza.

    PubMed

    Qian, Jun; Song, Jingyuan; Gao, Huanhuan; Zhu, Yingjie; Xu, Jiang; Pang, Xiaohui; Yao, Hui; Sun, Chao; Li, Xian'en; Li, Chuyuan; Liu, Juyan; Xu, Haibin; Chen, Shilin

    2013-01-01

    Salvia miltiorrhiza is an important medicinal plant with great economic and medicinal value. The complete chloroplast (cp) genome sequence of Salvia miltiorrhiza, the first sequenced member of the Lamiaceae family, is reported here. The genome is 151,328 bp in length and exhibits a typical quadripartite structure of the large (LSC, 82,695 bp) and small (SSC, 17,555 bp) single-copy regions, separated by a pair of inverted repeats (IRs, 25,539 bp). It contains 114 unique genes, including 80 protein-coding genes, 30 tRNAs and four rRNAs. The genome structure, gene order, GC content and codon usage are similar to the typical angiosperm cp genomes. Four forward, three inverted and seven tandem repeats were detected in the Salvia miltiorrhiza cp genome. Simple sequence repeat (SSR) analysis among the 30 asterid cp genomes revealed that most SSRs are AT-rich, which contribute to the overall AT richness of these cp genomes. Additionally, fewer SSRs are distributed in the protein-coding sequences compared to the non-coding regions, indicating an uneven distribution of SSRs within the cp genomes. Entire cp genome comparison of Salvia miltiorrhiza and three other Lamiales cp genomes showed a high degree of sequence similarity and a relatively high divergence of intergenic spacers. Sequence divergence analysis discovered the ten most divergent and ten most conserved genes as well as their length variation, which will be helpful for phylogenetic studies in asterids. Our analysis also supports that both regional and functional constraints affect gene sequence evolution. Further, phylogenetic analysis demonstrated a sister relationship between Salvia miltiorrhiza and Sesamum indicum. The complete cp genome sequence of Salvia miltiorrhiza reported in this paper will facilitate population, phylogenetic and cp genetic engineering studies of this medicinal plant.

  2. Multilocus sequence typing of Pseudomonas syringae sensu lato confirms previously described genomospecies and permits rapid identification of P. syringae pv. coriandricola and P. syringae pv. apii causing bacterial leaf spot on parsley.

    PubMed

    Bull, Carolee T; Clarke, Christopher R; Cai, Rongman; Vinatzer, Boris A; Jardini, Teresa M; Koike, Steven T

    2011-07-01

    Since 2002, severe leaf spotting on parsley (Petroselinum crispum) has occurred in Monterey County, CA. Either of two different pathovars of Pseudomonas syringae sensu lato were isolated from diseased leaves from eight distinct outbreaks and once from the same outbreak. Fragment analysis of DNA amplified between repetitive sequence polymerase chain reaction; 16S rDNA sequence analysis; and biochemical, physiological, and host range tests identified the pathogens as Pseudomonas syringae pv. apii and P. syringae pv. coriandricola. Koch's postulates were completed for the isolates from parsley, and host range tests with parsley isolates and pathotype strains demonstrated that P. syringae pv. apii and P. syringae pv. coriandricola cause leaf spot diseases on parsley, celery, and coriander or cilantro. In a multilocus sequence typing (MLST) approach, four housekeeping gene fragments were sequenced from 10 strains isolated from parsley and 56 pathotype strains of P. syringae. Allele sequences were uploaded to the Plant-Associated Microbes Database and a phylogenetic tree was built based on concatenated sequences. Tree topology directly corresponded to P. syringae genomospecies and P. syringae pv. apii was allocated appropriately to genomospecies 3. This is the first demonstration that MLST can accurately allocate new pathogens directly to P. syringae sensu lato genomospecies. According to MLST, P. syringae pv. coriandricola is a member of genomospecies 9, P. cannabina. In a blind test, both P. syringae pv. coriandricola and P. syringae pv. apii isolates from parsley were correctly identified to pathovar. In both cases, MLST described diversity within each pathovar that was previously unknown.

  3. Interpreting the biological relevance of bioinformatic analyses with T-DNA sequence for protein allergenicity.

    PubMed

    Harper, B; McClain, S; Ganko, E W

    2012-08-01

    Global regulatory agencies require bioinformatic sequence analysis as part of their safety evaluation for transgenic crops. Analysis typically focuses on encoded proteins and adjacent endogenous flanking sequences. Recently, regulatory expectations have expanded to include all reading frames of the inserted DNA. The intent is to provide biologically relevant results that can be used in the overall assessment of safety. This paper evaluates the relevance of assessing the allergenic potential of all DNA reading frames found in common food genes using methods considered for the analysis of T-DNA sequences used in transgenic crops. FASTA and BLASTX algorithms were used to compare genes from maize, rice, soybean, cucumber, melon, watermelon, and tomato using international regulatory guidance. Results show that BLASTX for maize yielded 7254 alignments that exceeded allergen similarity thresholds and 210,772 alignments that matched eight or more consecutive amino acids with an allergen; other crops produced similar results. This analysis suggests that each nontransgenic crop has a much greater potential for allergenic risk than what has been observed clinically. We demonstrate that a meaningful safety assessment is unlikely to be provided by using methods with inherently high frequencies of false positive alignments when broadly applied to all reading frames of DNA sequence. Copyright © 2012 Elsevier Inc. All rights reserved.
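
    One of the criteria mentioned in this abstract, a match of eight or more consecutive amino acids with an allergen, amounts to a shared-substring check. The sketch below shows that check on toy protein strings; it does not perform six-frame translation, FASTA, or BLASTX searches, and the function name and example sequences are invented for illustration.

```python
def shared_kmers(query: str, allergen: str, k: int = 8) -> list:
    """Return (position, peptide) for every k-residue window of `query`
    that occurs exactly in `allergen`."""
    allergen_kmers = {allergen[i:i + k] for i in range(len(allergen) - k + 1)}
    return [
        (i, query[i:i + k])
        for i in range(len(query) - k + 1)
        if query[i:i + k] in allergen_kmers
    ]


if __name__ == "__main__":
    # Toy sequences, not real allergen entries.
    print(shared_kmers("MKTAYIAKQRQISFVK", "GGTAYIAKQRGG"))
```

    Because any eight-residue window is tested, the number of windows grows quickly when every reading frame of a long DNA sequence is translated and scanned, which is consistent with the high alignment counts reported above.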

  4. High throughput sequencing analysis of RNA libraries reveals the influences of initial library and PCR methods on SELEX efficiency.

    PubMed

    Takahashi, Mayumi; Wu, Xiwei; Ho, Michelle; Chomchan, Pritsana; Rossi, John J; Burnett, John C; Zhou, Jiehua

    2016-09-22

    The systematic evolution of ligands by exponential enrichment (SELEX) technique is a powerful and effective aptamer-selection procedure. However, modifications to the process can dramatically improve selection efficiency and aptamer performance. For example, droplet digital PCR (ddPCR) has been recently incorporated into SELEX selection protocols to putatively reduce the propagation of byproducts and avoid selection bias that result from differences in PCR efficiency of sequences within the random library. However, a detailed, parallel comparison of the efficacy of conventional solution PCR versus the ddPCR modification in the RNA aptamer-selection process is needed to understand effects on overall SELEX performance. In the present study, we took advantage of powerful high throughput sequencing technology and bioinformatics analysis coupled with SELEX (HT-SELEX) to thoroughly investigate the effects of initial library and PCR methods in RNA aptamer identification. Our analysis revealed that distinct "biased sequences" and nucleotide composition existed in the initial, unselected libraries purchased from two different manufacturers and that the fate of the "biased sequences" was target-dependent during selection. Our comparison of solution PCR- and ddPCR-driven HT-SELEX demonstrated that PCR method affected not only the nucleotide composition of the enriched sequences, but also the overall SELEX efficiency and aptamer efficacy.

  5. Rapid Identification of Sequences for Orphan Enzymes to Power Accurate Protein Annotation

    PubMed Central

    Ojha, Sunil; Watson, Douglas S.; Bomar, Martha G.; Galande, Amit K.; Shearer, Alexander G.

    2013-01-01

    The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the “back catalog” of enzymology – “orphan enzymes,” those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme “back catalog” is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology’s “back catalog” another powerful tool to drive accurate genome annotation. PMID:24386392

  6. Rapid identification of sequences for orphan enzymes to power accurate protein annotation.

    PubMed

    Ramkissoon, Kevin R; Miller, Jennifer K; Ojha, Sunil; Watson, Douglas S; Bomar, Martha G; Galande, Amit K; Shearer, Alexander G

    2013-01-01

    The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the "back catalog" of enzymology--"orphan enzymes," those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme "back catalog" is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology's "back catalog" another powerful tool to drive accurate genome annotation.

  7. HIV drug resistance testing among patients failing second line antiretroviral therapy. Comparison of in-house and commercial sequencing.

    PubMed

    Chimukangara, Benjamin; Varyani, Bhavini; Shamu, Tinei; Mutsvangwa, Junior; Manasa, Justen; White, Elizabeth; Chimbetete, Cleophas; Luethy, Ruedi; Katzenstein, David

    2017-05-01

    HIV genotyping is often unavailable in low and middle-income countries due to infrastructure requirements and cost. We compared genotype resistance testing in patients with virologic failure, by amplification of HIV pol gene, followed by "in-house" sequencing and commercial sequencing. Remnant plasma samples from adults and children failing second-line ART were amplified and sequenced using in-house and commercial di-deoxysequencing, and analyzed in Harare, Zimbabwe and at Stanford, U.S.A, respectively. HIV drug resistance mutations were determined using the Stanford HIV drug resistance database. Twenty-six of 28 samples were amplified and 25 were successfully genotyped. Comparison of average percent nucleotide and amino acid identities between 23 pairs sequenced in both laboratories were 99.51 (±0.56) and 99.11 (±0.95), respectively. All pairs clustered together in phylogenetic analysis. Sequencing analysis identified 6/23 pairs with mutation discordances resulting in differences in phenotype, but these did not impact future regimens. The results demonstrate our ability to produce good quality drug resistance data in-house. Despite discordant mutations in some sequence pairs, the phenotypic predictions were not clinically significant. Copyright © 2016 Elsevier B.V. All rights reserved.
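
    The percent identity figures reported above can be illustrated with a minimal sketch. The abstract does not say how the identities were computed, so the function below is an assumption: it takes two sequences that are already aligned to equal length, skips any column where either sequence has a gap ('-'), and reports the share of matching positions. The name percent_identity is hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical positions between two pre-aligned sequences.

    Assumption: both inputs are already aligned (equal length) and use '-'
    for gaps; gapped columns are excluded from the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy in-house vs. commercial read of the same region (made-up data).
    print(percent_identity("ACGTACGT", "ACGTACGA"))
```

    The same function works for amino acid strings, since it only compares characters column by column.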

  8. Discrimination of Bacillus anthracis from closely related microorganisms by analysis of 16S and 23S rRNA with oligonucleotide microchips

    DOEpatents

    Bavykin, Sergei G.; Mirzabekov, Andrei D.

    2007-10-30

    The present invention is directed to a novel method of discriminating a highly infectious bacterium Bacillus anthracis from a group of closely related microorganisms. Sequence variations in the 16S and 23S rRNA of the B. cereus subgroup including B. anthracis are utilized to construct an array that can detect these sequence variations through selective hybridizations. The identification and analysis of these sequence variations enables positive discrimination of isolates of the B. cereus group that includes B. anthracis. Discrimination of single base differences in rRNA was achieved with a microchip during analysis of B. cereus group isolates from both single and in mixed probes, as well as identification of polymorphic sites. Successful use of a microchip to determine the appropriate subgroup classification using eight reference microorganisms from the B. cereus group as a study set, was demonstrated.

  9. Molecular and Physiological Analysis of a Heat-Shock Response in Wheat

    PubMed Central

    McElwain, Elizabeth F.; Spiker, Steven

    1992-01-01

    We have isolated two cDNA clones from wheat (Triticum aestivum L. var Stephens), designated WHSP16.8 and WHSP16.9, that are highly similar in sequence to the low molecular weight heat-shock protein genes previously isolated from soybean. RNA blot analysis confirms that these sequences are present in heat-shocked wheat seedlings, but not in control tissues. The WHSP16.8 and WHSP16.9 cDNAs were isolated by screening a lambda gt11 expression library with antibodies to HMGc (a chromosomal protein of wheat). Immunoblot analysis has demonstrated that the antibodies raised against HMGc also recognize a group of proteins that are induced by heat shock and have molecular weights (estimated by sodium dodecyl sulfate electrophoresis) consistent with the molecular weights of the proteins deduced from the sequences of the cDNAs. PMID:16669058

  10. Wasabi: An Integrated Platform for Evolutionary Sequence Analysis and Data Visualization.

    PubMed

    Veidenberg, Andres; Medlar, Alan; Löytynoja, Ari

    2016-04-01

    Wasabi is an open source, web-based environment for evolutionary sequence analysis. Wasabi visualizes sequence data together with a phylogenetic tree within a modern, user-friendly interface: The interface hides extraneous options, supports context sensitive menus, drag-and-drop editing, and displays additional information, such as ancestral sequences, associated with specific tree nodes. The Wasabi environment supports reproducibility by automatically storing intermediate analysis steps and includes built-in functions to share data between users and publish analysis results. For computational analysis, Wasabi supports PRANK and PAGAN for phylogeny-aware alignment and alignment extension, and it can be easily extended with other tools. Along with drag-and-drop import of local files, Wasabi can access remote data through URL and import sequence data, GeneTrees and EPO alignments directly from Ensembl. To demonstrate a typical workflow using Wasabi, we reproduce key findings from recent comparative genomics studies, including a reanalysis of the EGLN1 gene from the tiger genome study: These case studies can be browsed within Wasabi at http://wasabiapp.org:8000?id=usecases. Wasabi runs inside a web browser and does not require any installation. One can start using it at http://wasabiapp.org. All source code is licensed under the AGPLv3. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  11. Two estrogen response element sequences near the PCNA gene are not responsible for its estrogen-enhanced expression in MCF7 cells.

    PubMed

    Wang, Cheng; Yu, Jie; Kallen, Caleb B

    2008-01-01

    The proliferating cell nuclear antigen (PCNA) is an essential component of DNA replication, cell cycle regulation, and epigenetic inheritance. High expression of PCNA is associated with poor prognosis in patients with breast cancer. The 5'-region of the PCNA gene contains two computationally-detected estrogen response element (ERE) sequences, one of which is evolutionarily conserved. Both of these sequences are of undocumented cis-regulatory function. We recently demonstrated that estradiol (E2) enhances PCNA mRNA expression in MCF7 breast cancer cells. MCF7 cells proliferate in response to E2. Here, we demonstrate that E2 rapidly enhanced PCNA mRNA and protein expression in a process that requires ERalpha as well as de novo protein synthesis. One of the two upstream ERE sequences was specifically bound by ERalpha-containing protein complexes, in vitro, in gel shift analysis. Yet, each ERE sequence, when cloned as a single copy, or when engineered as two tandem copies of the ERE-containing sequence, was not capable of activating a luciferase reporter construct in response to E2. In MCF7 cells, neither ERE-containing genomic region demonstrated E2-dependent recruitment of ERalpha by sensitive ChIP-PCR assays. We conclude that E2 enhances PCNA gene expression by an indirect process and that computational detection of EREs, even when evolutionarily conserved and when near E2-responsive genes, requires biochemical validation.

  12. Whole-genome sequence-based analysis of thyroid function.

    PubMed

    Taylor, Peter N; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J; Traglia, Michela; Brown, Suzanne J; Mullin, Benjamin H; Shihab, Hashem A; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R; Beilby, John P; Charoen, Pimphen; Danecek, Petr; Dudbridge, Frank; Forgetta, Vincenzo; Greenwood, Celia; Grundberg, Elin; Johnson, Andrew D; Hui, Jennie; Lim, Ee M; McCarthy, Shane; Muddyman, Dawn; Panicker, Vijay; Perry, John R B; Bell, Jordana T; Yuan, Wei; Relton, Caroline; Gaunt, Tom; Schlessinger, David; Abecasis, Goncalo; Cucca, Francesco; Surdulescu, Gabriela L; Woltersdorf, Wolfram; Zeggini, Eleftheria; Zheng, Hou-Feng; Toniolo, Daniela; Dayan, Colin M; Naitza, Silvia; Walsh, John P; Spector, Tim; Davey Smith, George; Durbin, Richard; Richards, J Brent; Sanna, Serena; Soranzo, Nicole; Timpson, Nicholas J; Wilson, Scott G

    2015-03-06

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10^-9) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10^-14). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10^-9) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10^-11). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function.

  13. Biosynthesis of Lipoic Acid in Arabidopsis: Cloning and Characterization of the cDNA for Lipoic Acid Synthase

    PubMed Central

    Yasuno, Rie; Wada, Hajime

    1998-01-01

    Lipoic acid is a coenzyme that is essential for the activity of enzyme complexes such as those of pyruvate dehydrogenase and glycine decarboxylase. We report here the isolation and characterization of LIP1 cDNA for lipoic acid synthase of Arabidopsis. The Arabidopsis LIP1 cDNA was isolated using an expressed sequence tag homologous to the lipoic acid synthase of Escherichia coli. This cDNA was shown to code for Arabidopsis lipoic acid synthase by its ability to complement a lipA mutant of E. coli defective in lipoic acid synthase. DNA-sequence analysis of the LIP1 cDNA revealed an open reading frame predicting a protein of 374 amino acids. Comparisons of the deduced amino acid sequence with those of E. coli and yeast lipoic acid synthase homologs showed a high degree of sequence similarity and the presence of a leader sequence presumably required for import into the mitochondria. Southern-hybridization analysis suggested that LIP1 is a single-copy gene in Arabidopsis. Western analysis with an antibody against lipoic acid synthase demonstrated that this enzyme is located in the mitochondrial compartment in Arabidopsis cells as a 43-kD polypeptide. PMID:9808738

  14. Longitudinal Metagenomic Analysis of Hospital Air Identifies Clinically Relevant Microbes.

    PubMed

    King, Paula; Pham, Long K; Waltz, Shannon; Sphar, Dan; Yamamoto, Robert T; Conrad, Douglas; Taplitz, Randy; Torriani, Francesca; Forsyth, R Allyn

    2016-01-01

    We describe the sampling of sixty-three uncultured hospital air samples collected over a six-month period and analysis using shotgun metagenomic sequencing. Our primary goals were to determine the longitudinal metagenomic variability of this environment, identify and characterize genomes of potential pathogens and determine whether they are atypical to the hospital airborne metagenome. Air samples were collected from eight locations which included patient wards, the main lobby and outside. The resulting DNA libraries produced 972 million sequences representing 51 gigabases. Hierarchical clustering of samples by the most abundant 50 microbial orders generated three major nodes which primarily clustered by type of location. Because the indoor locations were longitudinally consistent, episodic relative increases in microbial genomic signatures related to the opportunistic pathogens Aspergillus, Penicillium and Stenotrophomonas were identified as outliers at specific locations. Further analysis of microbial reads specific for Stenotrophomonas maltophilia indicated homology to a sequenced multi-drug resistant clinical strain and we observed broad sequence coverage of resistance genes. We demonstrate that a shotgun metagenomic sequencing approach can be used to characterize the resistance determinants of pathogen genomes that are uncharacteristic for an otherwise consistent hospital air microbial metagenomic profile.

  15. Implementation of Cloud based next generation sequencing data analysis in a clinical laboratory.

    PubMed

    Onsongo, Getiria; Erdmann, Jesse; Spears, Michael D; Chilton, John; Beckman, Kenneth B; Hauge, Adam; Yohe, Sophia; Schomaker, Matthew; Bower, Matthew; Silverstein, Kevin A T; Thyagarajan, Bharat

    2014-05-23

    The introduction of next generation sequencing (NGS) has revolutionized molecular diagnostics, though several challenges remain that limit the widespread adoption of NGS testing in clinical practice. One such difficulty is the development of a robust bioinformatics pipeline that can handle the volume of data generated by high-throughput sequencing in a cost-effective manner. Analysis of sequencing data typically requires a substantial level of computing power that is often cost-prohibitive for most clinical diagnostics laboratories. To address this challenge, our institution has developed a Galaxy-based data analysis pipeline which relies on a web-based, cloud-computing infrastructure to process NGS data and identify genetic variants. It provides the additional flexibility needed to control storage costs, resulting in a pipeline that is cost-effective on a per-sample basis. It does not require the use of an EBS disk to run a sample. We demonstrate the validation and feasibility of implementing this bioinformatics pipeline in a molecular diagnostics laboratory. Four samples were analyzed in duplicate pairs and showed 100% concordance in mutations identified. This pipeline is currently being used in the clinic, and all identified pathogenic variants have been confirmed using Sanger sequencing, further validating the software.

  16. Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis.

    PubMed

    Bonham-Carter, Oliver; Steele, Joe; Bastola, Dhundy

    2014-11-01

    Modern sequencing and genome assembly technologies have provided a wealth of data, which will soon require an analysis by comparison for discovery. Sequence alignment, a fundamental task in bioinformatics research, may be used but with some caveats. Seminal techniques and methods from dynamic programming are proving ineffective for this work owing to their inherent computational expense when processing large amounts of sequence data. These methods are prone to giving misleading information because of genetic recombination, genetic shuffling and other inherent biological events. New approaches from information theory, frequency analysis and data compression are available and provide powerful alternatives to dynamic programming. These new methods are often preferred, as their algorithms are simpler and are not affected by synteny-related problems. In this review, we provide a detailed discussion of computational tools, which stem from alignment-free methods based on statistical analysis from word frequencies. We provide several clear examples to demonstrate applications and the interpretations over several different areas of alignment-free analysis such as base-base correlations, feature frequency profiles, compositional vectors, an improved string composition and the D2 statistic metric. Additionally, we provide detailed discussion and an example of analysis by Lempel-Ziv techniques from data compression. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
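
    As an illustration of the word-frequency idea named in this abstract, the sketch below computes the D2 statistic in its basic form: the sum, over all words of length k, of the product of that word's counts in the two sequences. This is a minimal sketch and not code from the review; the function names kmer_counts and d2_statistic and the toy sequences are assumptions made for the example.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word (k-mer) of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_statistic(seq_a: str, seq_b: str, k: int) -> int:
    """Basic D2: sum over words of count_in_a * count_in_b.

    Words missing from either sequence contribute zero, so no alignment
    of the two sequences is ever needed.
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    return sum(n * counts_b[word] for word, n in counts_a.items())


if __name__ == "__main__":
    # Made-up sequences: the second is a rearrangement of the first.
    print(d2_statistic("ACGTACGTTTGA", "TTGAACGTACGT", k=3))
```

    Because only word counts are compared, shuffling blocks of a sequence changes the score far less than it would change an alignment score, which is the property the review highlights for rearranged genomes.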

  17. Phylogenomics from Whole Genome Sequences Using aTRAM.

    PubMed

    Allen, Julie M; Boyd, Bret; Nguyen, Nam-Phuong; Vachaspati, Pranjal; Warnow, Tandy; Huang, Daisie I; Grady, Patrick G S; Bell, Kayce C; Cronk, Quentin C B; Mugisha, Lawrence; Pittendrigh, Barry R; Leonardi, M Soledad; Reed, David L; Johnson, Kevin P

    2017-09-01

    Novel sequencing technologies are rapidly expanding the size of data sets that can be applied to phylogenetic studies. Currently the most commonly used phylogenomic approaches involve some form of genome reduction. While these approaches make assembling phylogenomic data sets more economical for organisms with large genomes, they reduce the genomic coverage and thereby the long-term utility of the data. Currently, for organisms with moderate to small genomes (<1000 Mbp) it is feasible to sequence the entire genome at modest coverage (10-30×). Computational challenges for handling these large data sets can be alleviated by assembling targeted reads, rather than assembling the entire genome, to produce a phylogenomic data matrix. Here we demonstrate the use of automated Target Restricted Assembly Method (aTRAM) to assemble 1107 single-copy ortholog genes from whole genome sequencing of sucking lice (Anoplura) and out-groups. We developed a pipeline to extract exon sequences from the aTRAM assemblies by annotating them with respect to the original target protein. We aligned these protein sequences with the inferred amino acids and then performed phylogenetic analyses on both the concatenated matrix of genes and on each gene separately in a coalescent analysis. Finally, we tested the limits of successful assembly in aTRAM by assembling 100 genes from close- to distantly related taxa at high to low levels of coverage. Both the concatenated analysis and the coalescent-based analysis produced the same tree topology, which was consistent with previously published results and resolved weakly supported nodes. These results demonstrate that this approach is successful at developing phylogenomic data sets from raw genome sequencing reads. Further, we found that with coverages above 5-10×, aTRAM was successful at assembling 80-90% of the contigs for both close and distantly related taxa. As sequencing costs continue to decline, we expect full genome sequencing will become more feasible for a wider array of organisms, and aTRAM will enable mining of these genomic data sets for an extensive variety of applications, including phylogenomics. [aTRAM; gene assembly; genome sequencing; phylogenomics.]. © The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  18. Remote Stratigraphic Analysis: Combined TM and AIS Results in the Wind River/Bighorn Basin Area, Wyoming

    NASA Technical Reports Server (NTRS)

    Lang, H. R.; Paylor, E. D.; Adams, S.

    1985-01-01

    An in-progress study demonstrates the utility of airborne imaging spectrometer (AIS) data for unraveling the stratigraphic evolution of a North American, western interior foreland basin. AIS data are used to determine the stratigraphic distribution of mineralogical facies that are diagnostic of specific depositional environments. After wavelength and amplitude calibration using natural ground targets with known spectral characteristics, AIS data identify calcite, dolomite, gypsum and montmorillonite-bearing strata in the Permian-Cretaceous sequence. Combined AIS and TM results illustrate the feasibility of spectral stratigraphy, remote analysis of stratigraphic sequences.

  19. A long natural-antisense RNA is accumulated in the conidia of Aspergillus oryzae.

    PubMed

    Tsujii, Masaru; Okuda, Satoshi; Ishi, Kazutomo; Madokoro, Kana; Takeuchi, Michio; Yamagata, Youhei

    2016-01-01

    Analysis of expressed sequence tag libraries from various culture conditions revealed the existence of conidia-specific transcripts assembled to a putative conidiation-specific reductase gene (csrA) in Aspergillus oryzae. However, all of the transcripts were transcribed in the opposite direction to the csrA gene. Sequence analysis of the transcript revealed that the RNA overlapped the mRNA of csrA at the 3'-end and did not encode a protein longer than 60 amino acid residues. We designated the transcript Conidia Specific Long Natural-antisense RNA (CSLNR). Real-time PCR analysis demonstrated that CSLNR is a conidia-specific transcript that cannot be transcribed in the absence of brlA, and that the amount of CSLNR in conidia was much greater than that of the csrA transcript. Furthermore, deletion of csrA, which also removes the coding region of CSLNR, reduced the number of conidia in A. oryzae. Overexpression of CsrA inhibited growth and conidiation, whereas CSLNR did not affect conidiation.

  20. [Comparative analysis of variable regions in the genomes of variola virus].

    PubMed

    Babkin, I V; Nepomniashchikh, T S; Maksiutov, R A; Gutorov, V V; Babkina, I N; Shchelkunov, S N

    2008-01-01

    Nucleotide sequences of two extended segments of the terminal variable regions in the variola virus genome were determined. The size of the left segment was 13.5 kbp and that of the right, 10.5 kbp. In total, over 540 kbp were sequenced for 22 variola virus strains. The phylogenetic analysis conducted, together with previously published data, allowed us to determine the interrelations between 70 variola virus isolates, the character of their clustering, and the degree of intergroup and intragroup variation among the clusters of variola virus strains. The most polymorphic loci of the genome segments studied were determined. It was demonstrated that these loci are localized either to noncoding genome regions or to regions of destroyed open reading frames that are characteristic of the ancestor virus. These loci are promising for developing a strategy for genotyping variola virus strains. Analysis of recombination using various methods demonstrated that, with a single exception, no statistically significant recombination events were detectable in the genomes of the variola virus strains studied.

  1. A 29-year-old pregnant woman with worsening left hemiparesis, encephalopathy, and hemodynamic instability: a case report of subacute sclerosing panencephalitis.

    PubMed

    Reis, Gerald F; Ritter, Jana M; Bellini, William J; Rota, Paul A; Bollen, Andrew W

    2015-01-01

    A 29-year-old pregnant woman developed progressively worsening encephalopathy, left hemiparesis, and hemodynamic instability over a 6-week period. Initial brain MRI and work-up for infectious and autoimmune causes were normal, although elevated IgG and oligoclonal bands were seen on analysis of the cerebrospinal fluid (CSF). After uncomplicated spontaneous delivery of a preterm healthy infant, her condition worsened. Repeat brain MRI demonstrated generalized volume loss and evidence of corticospinal tract degeneration. She underwent a brain biopsy, which showed characteristic viral inclusions of the type seen in subacute sclerosing panencephalitis (SSPE). The diagnosis was confirmed by immunohistochemistry and electron microscopy, and additional CSF analysis also showed markedly elevated IgG titer for measles. Sequence analysis of the nucleoprotein gene N-450 demonstrated a close relationship to the sequences of viruses in genotype D7. This case documents an ~ 6-month progression to death of SSPE in a pregnant woman.

  2. ASSET: Analysis of Sequences of Synchronous Events in Massively Parallel Spike Trains

    PubMed Central

    Canova, Carlos; Denker, Michael; Gerstein, George; Helias, Moritz

    2016-01-01

    With the ability to observe the activity from large numbers of neurons simultaneously using modern recording technologies, the chance to identify sub-networks involved in coordinated processing increases. Sequences of synchronous spike events (SSEs) constitute one type of such coordinated spiking that propagates activity in a temporally precise manner. The synfire chain was proposed as one potential model for such network processing. Previous work introduced a method for visualization of SSEs in massively parallel spike trains, based on an intersection matrix that contains in each entry the degree of overlap of active neurons in two corresponding time bins. Repeated SSEs are reflected in the matrix as diagonal structures of high overlap values. The method as such, however, leaves the task of identifying these diagonal structures to visual inspection rather than to a quantitative analysis. Here we present ASSET (Analysis of Sequences of Synchronous EvenTs), an improved, fully automated method which determines diagonal structures in the intersection matrix by a robust mathematical procedure. The method consists of a sequence of steps that i) assess which entries in the matrix potentially belong to a diagonal structure, ii) cluster these entries into individual diagonal structures and iii) determine the neurons composing the associated SSEs. We employ parallel point processes generated by stochastic simulations as test data to demonstrate the performance of the method under a wide range of realistic scenarios, including different types of non-stationarity of the spiking activity and different correlation structures. Finally, the ability of the method to discover SSEs is demonstrated on complex data from large network simulations with embedded synfire chains. Thus, ASSET represents an effective and efficient tool to analyze massively parallel spike data for temporal sequences of synchronous activity. PMID:27420734
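
    The intersection matrix described in this abstract can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the ASSET implementation: spike times are binned, each bin is reduced to the set of neurons that fired in it, and entry (i, j) is taken to be the raw number of neurons active in both bin i and bin j (the abstract says only "degree of overlap", so the raw count is an assumption). The function names and the toy spike times are hypothetical.

```python
def bin_active_neurons(spike_trains, bin_size, n_bins):
    """Return, for each time bin, the set of neuron indices that spiked in it."""
    active = [set() for _ in range(n_bins)]
    for neuron, spikes in enumerate(spike_trains):
        for t in spikes:
            b = int(t // bin_size)
            if 0 <= b < n_bins:
                active[b].add(neuron)
    return active


def intersection_matrix(active):
    """Entry (i, j) is the number of neurons active in both bin i and bin j."""
    return [[len(a & b) for b in active] for a in active]


if __name__ == "__main__":
    # Three neurons; neurons 0 and 2 fire together in bin 0 and again in bin 3,
    # neuron 1 fires in bin 1 and again in bin 4: a repeated two-step sequence.
    trains = [[0.5, 3.5], [1.5, 4.5], [0.7, 3.2]]
    matrix = intersection_matrix(bin_active_neurons(trains, bin_size=1.0, n_bins=6))
    for row in matrix:
        print(row)
```

    In this toy case the repeated sequence shows up as high off-diagonal entries at (0, 3) and (1, 4), which lie along a diagonal. Finding such diagonal structures automatically, rather than by eye, is the step the abstract attributes to ASSET.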

  3. Motion video analysis using planar parallax

    NASA Astrophysics Data System (ADS)

    Sawhney, Harpreet S.

    1994-04-01

    Motion and structure analysis in video sequences can lead to efficient descriptions of objects and their motions. Interesting events in videos can be detected using such an analysis--for instance independent object motion when the camera itself is moving, figure-ground segregation based on the saliency of a structure compared to its surroundings. In this paper we present a method for 3D motion and structure analysis that uses a planar surface in the environment as a reference coordinate system to describe a video sequence. The motion in the video sequence is described as the motion of the reference plane, and the parallax motion of all the non-planar components of the scene. It is shown how this method simplifies the otherwise hard general 3D motion analysis problem. In addition, a natural coordinate system in the environment is used to describe the scene which can simplify motion based segmentation. This work is a part of an ongoing effort in our group towards video annotation and analysis for indexing and retrieval. Results from a demonstration system being developed are presented.

  4. Recent patents of nanopore DNA sequencing technology: progress and challenges.

    PubMed

    Zhou, Jianfeng; Xu, Bingqian

    2010-11-01

    DNA sequencing techniques have developed rapidly in recent decades, driven primarily by the Human Genome Project. Among the newly proposed techniques, the nanopore has been considered a suitable candidate for single-molecule DNA sequencing at ultrahigh speed and very low cost. Several fabrication and modification techniques have been developed to produce robust and well-defined nanopore devices. Considerable effort has also gone into applying nanopores to analyze the properties of DNA molecules. Compared with traditional sequencing techniques, the nanopore has demonstrated distinct advantages in key practical respects, such as sample preparation, sequencing speed, cost-effectiveness and read length. Although challenges remain, recent research on improving the capabilities of nanopores has brought the approach closer to its ultimate goal: sequencing individual DNA strands at single-nucleotide resolution. This patent review briefly highlights recent developments and technological achievements in DNA analysis and sequencing at the single-molecule level, focusing on nanopore-based methods.

  5. AmpliVar: mutation detection in high-throughput sequence from amplicon-based libraries.

    PubMed

    Hsu, Arthur L; Kondrashova, Olga; Lunke, Sebastian; Love, Clare J; Meldrum, Cliff; Marquis-Nicholson, Renate; Corboy, Greg; Pham, Kym; Wakefield, Matthew; Waring, Paul M; Taylor, Graham R

    2015-04-01

    Conventional means of identifying variants in high-throughput sequencing align each read against a reference sequence, and then call variants at each position. Here, we demonstrate an orthogonal means of identifying sequence variation by grouping the reads as amplicons prior to any alignment. We used AmpliVar to make key-value hashes of sequence reads and group reads as individual amplicons using a table of flanking sequences. Low-abundance reads were removed according to a selectable threshold, and reads above this threshold were aligned as groups, rather than as individual reads, permitting the use of sensitive alignment tools. We show that this approach is more sensitive, more specific, and more computationally efficient than comparable methods for the analysis of amplicon-based high-throughput sequencing data. The method can be extended to enable alignment-free confirmation of variants seen in hybridization capture target-enrichment data. © 2015 WILEY PERIODICALS, INC.
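
    The grouping step described above (hashing reads, assigning them to amplicons via a table of flanking sequences, and discarding low-abundance reads) can be sketched as follows. This is not AmpliVar's code. It is a minimal illustration that assumes each flank sits exactly at the start or end of the read and that the abundance threshold applies to each unique read sequence; the function name, the flank table and the reads are hypothetical.

```python
from collections import Counter, defaultdict


def group_reads_by_amplicon(reads, flanks, min_count=2):
    """Collapse identical reads, assign them to amplicons by flanks, drop rare ones.

    reads:  iterable of read sequences
    flanks: dict mapping amplicon name -> (left_flank, right_flank)
    Returns {amplicon name: {unique read sequence: abundance}}.
    """
    read_counts = Counter(reads)  # key-value hash: read sequence -> abundance
    groups = defaultdict(dict)
    for seq, count in read_counts.items():
        if count < min_count:
            continue  # below the selectable threshold: treated as noise
        for name, (left, right) in flanks.items():
            if seq.startswith(left) and seq.endswith(right):
                groups[name][seq] = count
                break  # a read belongs to at most one amplicon
    return dict(groups)


if __name__ == "__main__":
    flanks = {"amp1": ("ACG", "TTA"), "amp2": ("GGC", "CAT")}
    reads = ["ACGAATTA"] * 3 + ["ACGACTTA"] * 2 + ["ACGAGTTA"] + ["GGCTTCAT"] * 2
    for name, variants in group_reads_by_amplicon(reads, flanks).items():
        print(name, variants)
```

    Each surviving group then holds a handful of unique sequences with counts, so only those representatives need to be aligned rather than every individual read, which is what makes slower but more sensitive aligners affordable.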

  6. Identification of Delta5-fatty acid desaturase from the cellular slime mold Dictyostelium discoideum.

    PubMed

    Saito, T; Ochiai, H

    1999-10-01

    cDNA fragments putatively encoding amino acid sequences characteristic of the fatty acid desaturase were obtained using expressed sequence tag (EST) information of the Dictyostelium cDNA project. Using this sequence, we have determined the cDNA sequence and genomic sequence of a desaturase. The cloned cDNA is 1489 nucleotides long and the deduced amino acid sequence comprised 464 amino acid residues containing an N-terminal cytochrome b5 domain. The whole sequence was 38.6% identical to the initially identified Delta5-desaturase of Mortierella alpina. We have confirmed its function as Delta5-desaturase by overexpression mutation in D. discoideum and also by gain-of-function mutation in the yeast Saccharomyces cerevisiae. Analysis of the lipids from transformed D. discoideum and yeast demonstrated the accumulation of Delta5-desaturated products. This is the first report concerning fatty acid desaturase in cellular slime molds.

  7. B-MIC: An Ultrafast Three-Level Parallel Sequence Aligner Using MIC.

    PubMed

    Cui, Yingbo; Liao, Xiangke; Zhu, Xiaoqian; Wang, Bingqiang; Peng, Shaoliang

    2016-03-01

    Sequence alignment, the mapping of raw sequencing data to a reference genome, is the central process in sequence analysis. The large amount of data generated by NGS is far beyond the processing capabilities of existing alignment tools. Consequently, sequence alignment has become the bottleneck of sequence analysis. Intensive computing power is required to address this challenge. Intel recently announced the MIC coprocessor, which can provide massive computing power. The Tianhe-2, currently the world's fastest supercomputer, is equipped with three MIC coprocessors per compute node. A key feature of sequence alignment is that different reads are independent. Considering this property, we proposed a MIC-oriented three-level parallelization strategy to speed up BWA, a widely used sequence alignment tool, and developed our ultrafast parallel sequence aligner: B-MIC. B-MIC contains three levels of parallelization: first, parallelization of data I/O and read alignment by a three-stage parallel pipeline; second, parallelization enabled by MIC coprocessor technology; third, inter-node parallelization implemented by MPI. In this paper, we demonstrate that B-MIC outperforms BWA through a combination of these techniques, using an Inspur NF5280M server and the Tianhe-2 supercomputer. To the best of our knowledge, B-MIC is the first sequence alignment tool to run on Intel MIC, and it can achieve more than fivefold speedup over the original BWA while maintaining alignment precision.

  8. Identification of three duplicated Spin genes in medaka (Oryzias latipes).

    PubMed

    Wang, Xiao-Lei; Mei, Jie; Sun, Min; Hong, Yun-Han; Gui, Jian-Fang

    2005-05-09

    Gene and genomic duplications are very important and frequent events in fish evolution, and the divergence of duplicated genes in sequences and functions is a focus of research on gene evolution. Here, we report the identification and characterization of three duplicated Spindlin (Spin) genes from medaka (Oryzias latipes): OlSpinA, OlSpinB, and OlSpinC. Molecular cloning, genomic DNA BLAST analysis and phylogenetic relationship analysis indicated that the three OlSpin genes arose by gene duplication. Furthermore, Western blot analysis revealed significant expression differences of the three OlSpins among different tissues and during embryogenesis in medaka, and suggested that sequence and functional divergence might have occurred among them during evolution.

  9. Evidence for tyrosine-linked glycosaminoglycan in a bacterial surface protein.

    PubMed

    Peters, J; Rudolf, S; Oschkinat, H; Mengele, R; Sumper, M; Kellermann, J; Lottspeich, F; Baumeister, W

    1992-04-01

    The S-layer protein of Acetogenium kivui was subjected to proteolysis with different proteases and several high molecular mass glycosaminoglycan peptides containing glucose, galactosamine and an unidentified sugar-related component were separated by molecular sieve chromatography and reversed-phase HPLC and subjected to N-terminal sequence analysis. By methylation analysis glucose was found to be uniformly 1,6-linked, whereas galactosamine was exclusively 1,4-linked. Hydrazinolysis and subsequent amino-acid analysis as well as two-dimensional NMR spectroscopy were used to demonstrate that in these peptides carbohydrate was covalently linked to tyrosine. As all of the four Tyr-glycosylation sites were found to be preceded by valine, a new recognition sequence for glycosylation is suggested.

  10. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Golbus, Jessica R.; Puckelwartz, Megan J.; Dellefave-Castillo, Lisa

    Background: Cardiomyopathy is highly heritable but genetically diverse. At present, genetic testing for cardiomyopathy uses targeted sequencing to simultaneously assess the coding regions of more than 50 genes. New genes are routinely added to panels to improve the diagnostic yield. With the anticipated $1000 genome, it is expected that genetic testing will shift towards comprehensive genome sequencing accompanied by targeted gene analysis. Therefore, we assessed the reliability of whole genome sequencing and targeted analysis to identify cardiomyopathy variants in 11 subjects with cardiomyopathy. Methods and Results: Whole genome sequencing with an average of 37× coverage was combined with targeted analysis focused on 204 genes linked to cardiomyopathy. Genetic variants were scored using multiple prediction algorithms combined with frequency data from public databases. This pipeline yielded 1-14 potentially pathogenic variants per individual. Variants were further analyzed using clinical criteria and/or segregation analysis. Three of three previously identified primary mutations were detected by this analysis. In six subjects for whom the primary mutation was previously unknown, we identified mutations that segregated with disease, had clinical correlates, and/or had additional pathological correlation to provide evidence for causality. For two subjects with previously known primary mutations, we identified additional variants that may act as modifiers of disease severity. In total, we identified the likely pathological mutation in 9 of 11 (82%) subjects. We conclude that these pilot data demonstrate that ~30-40× coverage whole genome sequencing combined with targeted analysis is feasible and sensitive to identify rare variants in cardiomyopathy-associated genes.
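
    The targeted-analysis step described above (restrict to a gene panel, then score variants by prediction algorithms and population frequency) can be sketched as a simple filter. This is an illustration under stated assumptions, not the authors' pipeline: the record layout, the three-gene panel (a stand-in for the 204-gene list, which is not given here) and both thresholds are hypothetical.

```python
def candidate_variants(variants, panel, max_af=0.001, min_damaging=2):
    """Keep rare, predicted-damaging variants that fall in panel genes.

    max_af and min_damaging are hypothetical thresholds chosen for the example.
    """
    return [
        v for v in variants
        if v["gene"] in panel
        and v["allele_freq"] <= max_af          # rare in public databases
        and v["damaging_calls"] >= min_damaging  # agreed on by several predictors
    ]

# Illustrative stand-in for the cardiomyopathy gene panel.
panel = {"MYH7", "TTN", "LMNA"}

# Invented variant records for demonstration only.
variants = [
    {"id": "v1", "gene": "MYH7", "allele_freq": 0.0001, "damaging_calls": 3},
    {"id": "v2", "gene": "TTN", "allele_freq": 0.05, "damaging_calls": 3},
    {"id": "v3", "gene": "BRCA2", "allele_freq": 0.0001, "damaging_calls": 3},
    {"id": "v4", "gene": "LMNA", "allele_freq": 0.0005, "damaging_calls": 1},
]

kept = [v["id"] for v in candidate_variants(variants, panel)]
print(kept)
```

    In the study, variants passing this kind of filter were then reviewed with clinical criteria and segregation analysis, which a filter like this cannot replace.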

  11. Ancient DNA sequence revealed by error-correcting codes.

    PubMed

    Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo

    2015-07-10

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code.

  12. Ancient DNA sequence revealed by error-correcting codes

    PubMed Central

    Brandão, Marcelo M.; Spoladore, Larissa; Faria, Luzinete C. B.; Rocha, Andréa S. L.; Silva-Filho, Marcio C.; Palazzo, Reginaldo

    2015-01-01

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228

  13. The diploid genome sequence of an Asian individual

    PubMed Central

    Wang, Jun; Wang, Wei; Li, Ruiqiang; Li, Yingrui; Tian, Geng; Goodman, Laurie; Fan, Wei; Zhang, Junqing; Li, Jun; Zhang, Juanbin; Guo, Yiran; Feng, Binxiao; Li, Heng; Lu, Yao; Fang, Xiaodong; Liang, Huiqing; Du, Zhenglin; Li, Dong; Zhao, Yiqing; Hu, Yujie; Yang, Zhenzhen; Zheng, Hancheng; Hellmann, Ines; Inouye, Michael; Pool, John; Yi, Xin; Zhao, Jing; Duan, Jinjie; Zhou, Yan; Qin, Junjie; Ma, Lijia; Li, Guoqing; Yang, Zhentao; Zhang, Guojie; Yang, Bin; Yu, Chang; Liang, Fang; Li, Wenjie; Li, Shaochuan; Li, Dawei; Ni, Peixiang; Ruan, Jue; Li, Qibin; Zhu, Hongmei; Liu, Dongyuan; Lu, Zhike; Li, Ning; Guo, Guangwu; Zhang, Jianguo; Ye, Jia; Fang, Lin; Hao, Qin; Chen, Quan; Liang, Yu; Su, Yeyang; san, A.; Ping, Cuo; Yang, Shuang; Chen, Fang; Li, Li; Zhou, Ke; Zheng, Hongkun; Ren, Yuanyuan; Yang, Ling; Gao, Yang; Yang, Guohua; Li, Zhuo; Feng, Xiaoli; Kristiansen, Karsten; Wong, Gane Ka-Shu; Nielsen, Rasmus; Durbin, Richard; Bolund, Lars; Zhang, Xiuqing; Li, Songgang; Yang, Huanming; Wang, Jian

    2009-01-01

    Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics. PMID:18987735

  14. Genome-wide identification of conserved intronic non-coding sequences using a Bayesian segmentation approach.

    PubMed

    Algama, Manjula; Tasker, Edward; Williams, Caitlin; Parslow, Adam C; Bryson-Richardson, Robert J; Keith, Jonathan M

    2017-03-27

    Computational identification of non-coding RNAs (ncRNAs) is a challenging problem. We describe a genome-wide analysis using Bayesian segmentation to identify intronic elements highly conserved between three evolutionarily distant vertebrate species: human, mouse and zebrafish. We investigate the extent to which these elements include ncRNAs (or conserved domains of ncRNAs) and regulatory sequences. We identified 655 deeply conserved intronic sequences in a genome-wide analysis. We also performed a pathway-focussed analysis on genes involved in muscle development, detecting 27 intronic elements, of which 22 were not detected in the genome-wide analysis. At least 87% of the genome-wide and 70% of the pathway-focussed elements have existing annotations indicative of conserved RNA secondary structure. The expression of 26 of the pathway-focussed elements was examined using RT-PCR, providing confirmation that they include expressed ncRNAs. Consistent with previous studies, these elements are significantly over-represented in the introns of transcription factors. This study demonstrates a novel, highly effective, Bayesian approach to identifying conserved non-coding sequences. Our results complement previous findings that these sequences are enriched in transcription factors. However, in contrast to previous studies which suggest the majority of conserved sequences are regulatory factor binding sites, the majority of conserved sequences identified using our approach contain evidence of conserved RNA secondary structures, and our laboratory results suggest most are expressed. Functional roles at DNA and RNA levels are not mutually exclusive, and many of our elements possess evidence of both. Moreover, ncRNAs play roles in transcriptional and post-transcriptional regulation, and this may contribute to the over-representation of these elements in introns of transcription factors. We attribute the higher sensitivity of the pathway-focussed analysis compared to the genome-wide analysis to improved alignment quality, suggesting that enhanced genomic alignments may reveal many more conserved intronic sequences.

  15. Analysis of whole genome sequences of 16 strains of rubella virus from the United States, 1961-2009.

    PubMed

    Abernathy, Emily; Chen, Min-hsin; Bera, Jayati; Shrivastava, Susmita; Kirkness, Ewen; Zheng, Qi; Bellini, William; Icenogle, Joseph

    2013-01-25

    Rubella virus is the causative agent of rubella, a mild rash illness, and a potent teratogenic agent when contracted by a pregnant woman. Global rubella control programs target the reduction and elimination of congenital rubella syndrome. Phylogenetic analysis of partial sequences of rubella viruses has contributed to virus surveillance efforts and played an important role in demonstrating that indigenous rubella viruses have been eliminated in the United States. Sixteen wild-type rubella viruses were chosen for whole genome sequencing. All 16 viruses were collected in the United States from 1961 to 2009 and are from 8 of the 13 known rubella genotypes. Phylogenetic analysis of 30 whole genome sequences produced a maximum likelihood tree giving high bootstrap values for all genotypes except provisional genotype 1a. Comparison of the 16 new complete sequences and 14 previously sequenced wild-type viruses found regions with clusters of variable amino acids. The 5' 250 nucleotides of the genome are more conserved than any other part of the genome. Genotype specific deletions in the untranslated region between the non-structural and structural open reading frames were observed for genotypes 2B and genotype 1G. No evidence was seen for recombination events among the 30 viruses. The analysis presented here is consistent with previous reports on the genetic characterization of rubella virus genomes. Conserved and variable regions were identified and additional evidence for genotype specific nucleotide deletions in the intergenic region was found. Phylogenetic analysis confirmed genotype groupings originally based on structural protein coding region sequences, which provides support for the WHO nomenclature for genetic characterization of wild-type rubella viruses.

  16. Plasmid Flux in Escherichia coli ST131 Sublineages, Analyzed by Plasmid Constellation Network (PLACNET), a New Method for Plasmid Reconstruction from Whole Genome Sequences

    PubMed Central

    Garcillán-Barcia, M. Pilar; Mora, Azucena; Blanco, Jorge; Coque, Teresa M.; de la Cruz, Fernando

    2014-01-01

    Bacterial whole genome sequence (WGS) methods are rapidly overtaking classical sequence analysis. Many bacterial sequencing projects focus on mobilome changes, since macroevolutionary events, such as the acquisition or loss of mobile genetic elements, mainly plasmids, play essential roles in adaptive evolution. Existing WGS analysis protocols do not assort contigs between plasmids and the main chromosome, thus hampering full analysis of plasmid sequences. We developed a method (called plasmid constellation networks or PLACNET) that identifies, visualizes and analyzes plasmids in WGS projects by creating a network of contig interactions, thus allowing comprehensive plasmid analysis within WGS datasets. The workflow of the method is based on three types of data: assembly information (including scaffold links and coverage), comparison to reference sequences and plasmid-diagnostic sequence features. The resulting network is pruned by expert analysis, to eliminate confounding data, and implemented in a Cytoscape-based graphic representation. To demonstrate PLACNET sensitivity and efficacy, the plasmidome of the Escherichia coli lineage ST131 was analyzed. ST131 is a globally spread clonal group of extraintestinal pathogenic E. coli (ExPEC), comprising different sublineages with ability to acquire and spread antibiotic resistance and virulence genes via plasmids. Results show that plasmids flux in the evolution of this lineage, which is wide open for plasmid exchange. MOBF12/IncF plasmids were pervasive, adding just by themselves more than 350 protein families to the ST131 pangenome. Nearly 50% of the most frequent γ–proteobacterial plasmid groups were found to be present in our limited sample of ten analyzed ST131 genomes, which represent the main ST131 sublineages. PMID:25522143
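
    The idea of grouping contigs through a network of contig interactions can be sketched as connected components over a contig graph. This is a simplified illustration, not the PLACNET implementation: it uses only scaffold links as edges, omits reference-sequence nodes and plasmid-diagnostic features, and represents the expert pruning step by dropping one hand-picked edge. All contig names and links are invented.

```python
from collections import defaultdict

def connected_components(nodes, edges):
    """Group nodes into connected components of an undirected graph."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal from `start`
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(sorted(component))
    return components

contigs = ["c1", "c2", "c3", "c4", "c5"]
scaffold_links = [("c1", "c2"), ("c2", "c3"), ("c3", "c4"), ("c4", "c5")]

# Hypothetical manual pruning: treat the c3-c4 link as confounding
# (for example, a repeat shared by chromosome and plasmid) and remove it.
pruned_links = [e for e in scaffold_links if e != ("c3", "c4")]

groups = connected_components(contigs, pruned_links)
print(groups)
```

    Each resulting group is a candidate replicon. Before pruning, the single confounding link merges all five contigs into one component, which is why the pruning step matters.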

  17. Plasmid flux in Escherichia coli ST131 sublineages, analyzed by plasmid constellation network (PLACNET), a new method for plasmid reconstruction from whole genome sequences.

    PubMed

    Lanza, Val F; de Toro, María; Garcillán-Barcia, M Pilar; Mora, Azucena; Blanco, Jorge; Coque, Teresa M; de la Cruz, Fernando

    2014-12-01

    Bacterial whole genome sequence (WGS) methods are rapidly overtaking classical sequence analysis. Many bacterial sequencing projects focus on mobilome changes, since macroevolutionary events, such as the acquisition or loss of mobile genetic elements, mainly plasmids, play essential roles in adaptive evolution. Existing WGS analysis protocols do not assort contigs between plasmids and the main chromosome, thus hampering full analysis of plasmid sequences. We developed a method (called plasmid constellation networks or PLACNET) that identifies, visualizes and analyzes plasmids in WGS projects by creating a network of contig interactions, thus allowing comprehensive plasmid analysis within WGS datasets. The workflow of the method is based on three types of data: assembly information (including scaffold links and coverage), comparison to reference sequences and plasmid-diagnostic sequence features. The resulting network is pruned by expert analysis, to eliminate confounding data, and implemented in a Cytoscape-based graphic representation. To demonstrate PLACNET sensitivity and efficacy, the plasmidome of the Escherichia coli lineage ST131 was analyzed. ST131 is a globally spread clonal group of extraintestinal pathogenic E. coli (ExPEC), comprising different sublineages with ability to acquire and spread antibiotic resistance and virulence genes via plasmids. Results show that plasmids flux in the evolution of this lineage, which is wide open for plasmid exchange. MOBF12/IncF plasmids were pervasive, adding just by themselves more than 350 protein families to the ST131 pangenome. Nearly 50% of the most frequent γ-proteobacterial plasmid groups were found to be present in our limited sample of ten analyzed ST131 genomes, which represent the main ST131 sublineages.

  18. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing.

    PubMed

    Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P; Panitz, Frank; Bendixen, Christian; Nielsen, Rasmus; Willerslev, Eske

    2007-02-14

    The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. We use conventional PCR with 5'-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequence 20 DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5' tag analysis. We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (misassignment rate <0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in a single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regard to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5' primer tagging is a useful method in which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products. The new approach will be of value to a broad range of research areas, such as those of comparative genomics, complete mitochondrial analyses, population genetics, and phylogenetics.
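
    Tracing each read back to its source through its 5' tag amounts to a lookup on the first bases of the read. The sketch below assumes dinucleotide tags placed immediately before a shared primer. The tag table, primer sequence and reads are invented for illustration, and the function name `demultiplex` is hypothetical.

```python
def demultiplex(reads, tag_to_source, primer):
    """Assign reads to sources by their 5' tag; return (assigned, unassigned).

    A read is assigned only if it starts with a known tag followed by the
    primer. Tag and primer are trimmed from the stored sequence.
    """
    tag_len = len(next(iter(tag_to_source)))  # all tags share one length
    assigned, unassigned = {}, []
    for read in reads:
        tag, rest = read[:tag_len], read[tag_len:]
        if tag in tag_to_source and rest.startswith(primer):
            insert = rest[len(primer):]
            assigned.setdefault(tag_to_source[tag], []).append(insert)
        else:
            unassigned.append(read)
    return assigned, unassigned

# Hypothetical dinucleotide tags and primer.
tag_to_source = {"CA": "specimen_1", "TG": "specimen_2"}
primer = "GGAT"
reads = ["CAGGATACGT", "TGGGATTTTT", "AAGGATACGT", "CAGGATCCCC"]

assigned, unassigned = demultiplex(reads, tag_to_source, primer)
print(assigned)
print(unassigned)
```

    Reads with an unknown tag are kept aside rather than guessed at, which mirrors the need to account for sequencing anomalies before assignment. Counting reads per tag in `assigned` is also a direct way to inspect the tag-dependent bias reported above.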

  19. Protein Sequence Annotation Tool (PSAT): A centralized web-based meta-server for high-throughput sequence annotations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Leung, Elo; Huang, Amy; Cadag, Eithon

    In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein fasta data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resulting functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequence-based genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.

  20. Protein Sequence Annotation Tool (PSAT): A centralized web-based meta-server for high-throughput sequence annotations

    DOE PAGES

    Leung, Elo; Huang, Amy; Cadag, Eithon; ...

    2016-01-20

    In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein fasta data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resulting functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequence-based genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.

  1. [Sequence analysis of LEAFY homologous gene from Dendrobium moniliforme and application for identification of medicinal Dendrobium].

    PubMed

    Xing, Wen-Rui; Hou, Bei-Wei; Guan, Jing-Jiao; Luo, Jing; Ding, Xiao-Yu

    2013-04-01

    The LEAFY (LFY) homologous gene of Dendrobium moniliforme (L.) Sw. was cloned using new primers designed from the conserved region of known orchid LEAFY gene sequences. A partial LFY homologous gene was cloned by conventional PCR, and the complete LFY homologous gene, DenLFY, was then obtained by TAIL-PCR. The complete sequence of the DenLFY gene was 3,575 bp and contained three exons and two introns. Comparison of the exons of LFY homologous genes using BLAST indicated that the DenLFY gene had high identity with orchid LFY homologs, including the related fragment of PhalLFY (84%) in a Phalaenopsis hybrid cultivar, the LFY homologous gene in Oncidium (90%) and those in other orchids (over 80%). In the MP analysis, Dendrobium was found to be sister to Oncidium and Phalaenopsis. Homology analysis demonstrated that the C-terminal amino acids were highly conserved. When the exons and introns were considered separately, the exons and the amino acid sequence were good markers for functional research on the DenLFY gene. The second intron can be used in authentication research on Dendrobium, based on the length polymorphism between Dendrobium moniliforme and Dendrobium officinale.

  2. oPOSSUM-3: Advanced Analysis of Regulatory Motif Over-Representation Across Genes or ChIP-Seq Datasets

    PubMed Central

    Kwon, Andrew T.; Arenillas, David J.; Hunt, Rebecca Worsley; Wasserman, Wyeth W.

    2012-01-01

    oPOSSUM-3 is a web-accessible software system for identification of over-represented transcription factor binding sites (TFBS) and TFBS families in either DNA sequences of co-expressed genes or sequences generated from high-throughput methods, such as ChIP-Seq. Validation of the system with known sets of co-regulated genes and published ChIP-Seq data demonstrates the capacity for oPOSSUM-3 to identify mediating transcription factors (TF) for co-regulated genes or co-recovered sequences. oPOSSUM-3 is available at http://opossum.cisreg.ca. PMID:22973536
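
    One common way to test whether a binding-site motif is over-represented in a target set relative to a background is a one-sided Fisher exact test, which reduces to the upper tail of a hypergeometric distribution. The sketch below shows that calculation as a general illustration. It is not taken from oPOSSUM-3, whose exact scoring is not described in this record, and the counts are invented.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for a hypergeometric draw.

    N: total sequences (target + background)
    K: sequences in the total that contain the motif
    n: size of the target set
    k: target sequences that contain the motif
    """
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Invented counts: 10 sequences overall, 3 contain the motif,
# and all 3 of them fall in a target set of size 3.
p_value = hypergeom_upper_tail(k=3, K=3, n=3, N=10)
print(p_value)
```

    A small tail probability means the motif occurs in the target set more often than random sampling from the pooled sequences would explain. With many motifs tested at once, a multiple-testing correction would normally follow.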

  3. oPOSSUM-3: advanced analysis of regulatory motif over-representation across genes or ChIP-Seq datasets.

    PubMed

    Kwon, Andrew T; Arenillas, David J; Worsley Hunt, Rebecca; Wasserman, Wyeth W

    2012-09-01

    oPOSSUM-3 is a web-accessible software system for identification of over-represented transcription factor binding sites (TFBS) and TFBS families in either DNA sequences of co-expressed genes or sequences generated from high-throughput methods, such as ChIP-Seq. Validation of the system with known sets of co-regulated genes and published ChIP-Seq data demonstrates the capacity for oPOSSUM-3 to identify mediating transcription factors (TF) for co-regulated genes or co-recovered sequences. oPOSSUM-3 is available at http://opossum.cisreg.ca.

  4. Identification of a novel cyanobacterial group as active diazotrophs in a coastal microbial mat using NanoSIMS analysis

    DOE PAGES

    Woebken, Dagmar; Burow, Luke C.; Prufert-Bebout, Leslie; ...

    2012-01-12

    N2 fixation is a key process in photosynthetic microbial mats to support the nitrogen demands associated with primary production. Despite its importance, groups that actively fix N2 and contribute to the input of organic N in these ecosystems still remain largely unclear. To investigate the active diazotrophic community in microbial mats from the Elkhorn Slough estuary, Monterey Bay, CA, USA, we conducted an extensive combined approach, including biogeochemical, molecular and high-resolution secondary ion mass spectrometry (NanoSIMS) analyses. Detailed analysis of dinitrogenase reductase (nifH) transcript clone libraries from mat samples that fixed N2 at night indicated that cyanobacterial nifH transcripts were abundant and formed a novel monophyletic lineage. Independent NanoSIMS analysis of 15N2-incubated samples revealed significant incorporation of 15N into small, non-heterocystous cyanobacterial filaments. Mat-derived enrichment cultures yielded a unicyanobacterial culture with similar filaments (named Elkhorn Slough Filamentous Cyanobacterium-1 (ESFC-1)) that contained nifH gene sequences grouping with the novel cyanobacterial lineage identified in the transcript clone libraries, displaying up to 100% amino-acid sequence identity. The 16S rRNA gene sequence recovered from this enrichment allowed for the identification of related sequences from Elkhorn Slough mats and revealed great sequence diversity in this cluster. Furthermore, by combining 15N2 tracer experiments, fluorescence in situ hybridization and NanoSIMS, in situ N2 fixation activity by the novel ESFC-1 group was demonstrated, suggesting that this group may be the most active cyanobacterial diazotroph in the Elkhorn Slough mat. Pyrotag sequences affiliated with ESFC-1 were recovered from mat samples throughout 2009, demonstrating the prevalence of this group. Here, this work illustrates that combining standard and single-cell analyses can link phylogeny and function to identify previously unknown key functional groups in complex ecosystems.

  5. Heterogeneous Suppression of Sequential Effects in Random Sequence Generation, but Not in Operant Learning.

    PubMed

    Shteingart, Hanan; Loewenstein, Yonatan

    2016-01-01

    There is a long history of experiments in which participants are instructed to generate a long sequence of binary random numbers. The scope of this line of research has shifted over the years from identifying the basic psychological principles and/or the heuristics that lead to deviations from randomness, to one of predicting future choices. In this paper, we used generalized linear regression and the framework of Reinforcement Learning in order to address both points. In particular, we used logistic regression analysis in order to characterize the temporal sequence of participants' choices. Surprisingly, a population analysis indicated that the most recent trial has only a weak effect on behavior, compared to earlier trials, a result that seems irreconcilable with standard sequential effects that decay monotonically with the delay. However, when considering each participant separately, we found that the magnitudes of the sequential effect are a monotonically decreasing function of the delay, yet these individual sequential effects are largely averaged out in a population analysis because of heterogeneity. The substantial behavioral heterogeneity in this task is further demonstrated quantitatively by considering the predictive power of the model. We show that a heterogeneous model of sequential dependencies captures the structure available in random sequence generation. Finally, we show that the results of the logistic regression analysis can be interpreted in the framework of reinforcement learning, allowing us to compare the sequential effects in the random sequence generation task to those in an operant learning task. We show that in contrast to the random sequence generation task, sequential effects in operant learning are far more homogeneous across the population. These results suggest that in the random sequence generation task, different participants adopt different cognitive strategies to suppress sequential dependencies when generating the "random" sequences.
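
    A logistic regression on lagged choices, of the general kind described above, predicts each binary choice from the choices that preceded it. The sketch below is a minimal illustration under stated assumptions, not the authors' model: the +1/-1 coding, the number of lags, the plain gradient-ascent fit and the strictly alternating toy sequence are all choices made for the example.

```python
import math

def lagged_rows(choices, n_lags):
    """Build (features, label) pairs; features are the previous n_lags
    choices coded as +1/-1, most recent first."""
    rows = []
    for t in range(n_lags, len(choices)):
        features = [1.0 if choices[t - k] else -1.0 for k in range(1, n_lags + 1)]
        rows.append((features, 1.0 if choices[t] else 0.0))
    return rows

def fit_logistic(rows, n_lags, lr=0.1, steps=500):
    """Fit lag weights and a bias by batch gradient ascent on the log-likelihood."""
    weights, bias = [0.0] * n_lags, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * n_lags, 0.0
        for features, label in rows:
            z = bias + sum(w * x for w, x in zip(weights, features))
            error = label - 1.0 / (1.0 + math.exp(-z))  # label minus predicted P(1)
            grad_b += error
            for k, x in enumerate(features):
                grad_w[k] += error * x
        weights = [w + lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias += lr * grad_b / len(rows)
    return weights, bias

# Toy participant who strictly alternates between the two options.
choices = [t % 2 for t in range(40)]
weights, bias = fit_logistic(lagged_rows(choices, 2), n_lags=2)
print(weights, bias)
```

    For this alternating sequence the lag-1 weight comes out negative (avoid repeating the last choice) and the lag-2 weight positive. Fitting the same model per participant rather than to pooled data is what exposes the heterogeneity that a population-level fit averages out.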

  6. High-Throughput Sequencing, a Versatile Weapon to Support Genome-Based Diagnosis in Infectious Diseases: Applications to Clinical Bacteriology

    PubMed Central

    Caboche, Ségolène; Audebert, Christophe; Hot, David

    2014-01-01

    Recent progress in high-throughput sequencing (HTS) technologies enables easy and lower-cost access to whole genome sequencing (WGS) or re-sequencing. HTS associated with adapted, automatic and fast bioinformatics solutions for sequencing applications promises an accurate and timely identification and characterization of pathogenic agents. Many studies have demonstrated that data obtained from HTS analysis allow genome-based diagnosis that is consistent with phenotypic observations. These proofs of concept are probably the first steps toward the future of clinical microbiology. From concept to routine use, many parameters need to be considered to promote HTS as a powerful tool to help physicians and clinicians in microbiological investigations. This review highlights the milestones to be completed toward this purpose. PMID:25437800

  7. Winnowing DNA for Rare Sequences: Highly Specific Sequence and Methylation Based Enrichment

    PubMed Central

    Thompson, Jason D.; Shibahara, Gosuke; Rajan, Sweta; Pel, Joel; Marziali, Andre

    2012-01-01

    Rare mutations in cell populations are known to be hallmarks of many diseases and cancers. Similarly, differential DNA methylation patterns arise in rare cell populations with diagnostic potential such as fetal cells circulating in maternal blood. Unfortunately, the frequency of alleles with diagnostic potential, relative to wild-type background sequence, is often well below the frequency of errors in currently available methods for sequence analysis, including very high throughput DNA sequencing. We demonstrate a DNA preparation and purification method that through non-linear electrophoretic separation in media containing oligonucleotide probes, achieves 10,000 fold enrichment of target DNA with single nucleotide specificity, and 100 fold enrichment of unmodified methylated DNA differing from the background by the methylation of a single cytosine residue. PMID:22355378

  8. Occurrence of a Sequence in Marine Cyanophages Similar to That of T4 g20 and Its Application to PCR-Based Detection and Quantification Techniques

    PubMed Central

    Fuller, Nicholas J.; Wilson, William H.; Joint, Ian R.; Mann, Nicholas H.

    1998-01-01

    Viruses are ubiquitous components of marine ecosystems and are known to infect unicellular phycoerythrin-containing cyanobacteria belonging to the genus Synechococcus. A conserved region from the cyanophage genome was identified in three genetically distinct cyanomyoviruses, and a sequence analysis revealed that this region exhibited significant similarity to a gene encoding a capsid assembly protein (gp20) from the enteric coliphage T4. The results of a comparison of gene 20 sequences from three cyanomyoviruses and T4 allowed us to design two degenerate PCR primers, CPS1 and CPS2, which specifically amplified a 165-bp region from the majority of cyanomyoviruses tested. A competitive PCR (cPCR) analysis revealed that cyanomyovirus strains could be accurately enumerated, and it was demonstrated that quantification was log-linear over ca. 3 orders of magnitude. Different calibration curves were obtained for each of the three cyanomyovirus strains tested; consequently, cPCR performed with primers CPS1 and CPS2 could lead to substantial inaccuracies in estimates of phage abundance in natural assemblages. Further sequence analysis of cyanomyovirus gene 20 homologs would be necessary in order to design primers which do not exhibit phage-to-phage variability in priming efficiency. It was demonstrated that PCR products of the correct size could be amplified from seawater samples following 100× concentration and even directly without any prior concentration. Hence, the use of degenerate primers in PCR analyses of cyanophage populations should provide valuable data on the diversity of cyanophages in natural assemblages. Further optimization of procedures may ultimately lead to a sensitive assay which can be used to analyze natural cyanophage populations both quantitatively (by cPCR) and qualitatively following phylogenetic analysis of amplified products. PMID:9603813

  9. An investigation of fMRI time series stationarity during motor sequence learning foot tapping tasks.

    PubMed

    Muhei-aldin, Othman; VanSwearingen, Jessie; Karim, Helmet; Huppert, Theodore; Sparto, Patrick J; Erickson, Kirk I; Sejdić, Ervin

    2014-04-30

    Understanding complex brain networks using functional magnetic resonance imaging (fMRI) is of great interest to clinical and scientific communities. To utilize advanced analysis methods such as graph theory for these investigations, the stationarity of fMRI time series needs to be understood as it has important implications on the choice of appropriate approaches for the analysis of complex brain networks. In this paper, we investigated the stationarity of fMRI time series acquired from twelve healthy participants while they performed a motor (foot tapping sequence) learning task. Since prior studies have documented that learning is associated with systematic changes in brain activation, a sequence learning task is an optimal paradigm to assess the degree of non-stationarity in fMRI time-series in clinically relevant brain areas. We predicted that brain regions involved in a "learning network" would demonstrate non-stationarity and may violate assumptions associated with some advanced analysis approaches. Six blocks of learning, and six control blocks of a foot tapping sequence were performed in a fixed order. The reverse arrangement test was utilized to investigate the time series stationarity. Our analysis showed some non-stationary signals with a time varying first moment as a major source of non-stationarity. We also demonstrated a decreased number of non-stationarities in the third block as a result of priming and repetition. Most of the current literature does not examine stationarity prior to processing. The implication of our findings is that future investigations analyzing complex brain networks should utilize approaches robust to non-stationarities, as graph-theoretical approaches can be sensitive to non-stationarities present in data. Copyright © 2014 Elsevier B.V. All rights reserved.
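
    The reverse arrangement test named in this abstract can be sketched briefly. The test counts how many pairs of observations are out of time order and compares that count with its expectation under stationarity. The function names are illustrative, and the normal approximation with mean N(N-1)/4 and variance N(2N+5)(N-1)/72 is the standard textbook form, not something stated in the abstract. How the fMRI series were segmented before testing is also not given there.

```python
import math

def reverse_arrangements(x):
    """Count pairs (i, j) with i < j and x[i] > x[j]."""
    n = len(x)
    return sum(1 for i in range(n - 1) for j in range(i + 1, n) if x[i] > x[j])

def reverse_arrangements_z(x):
    """Z-score of the reverse-arrangement count under the stationarity null."""
    n = len(x)
    a = reverse_arrangements(x)
    mean = n * (n - 1) / 4
    var = n * (2 * n + 5) * (n - 1) / 72
    return (a - mean) / math.sqrt(var)

# A steadily rising series has no reverse arrangements, so z is strongly negative.
trend = [0.1 * t for t in range(20)]
z = reverse_arrangements_z(trend)
non_stationary = abs(z) > 1.96  # two-sided 5% threshold, normal approximation
```

    In practice the input is often a sequence of per-segment summaries (for example segment means) rather than raw samples, which is one way a time-varying first moment would be detected.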

  10. The siRNA Non-seed Region and Its Target Sequences Are Auxiliary Determinants of Off-Target Effects.

    PubMed

    Kamola, Piotr J; Nakano, Yuko; Takahashi, Tomoko; Wilson, Paul A; Ui-Tei, Kumiko

    2015-12-01

    RNA interference (RNAi) is a powerful tool for post-transcriptional gene silencing. However, the siRNA guide strand may bind unintended off-target transcripts via partial sequence complementarity by a mechanism closely mirroring micro RNA (miRNA) silencing. To better understand these off-target effects, we investigated the correlation between sequence features within various subsections of siRNA guide strands, and their corresponding target sequences, with off-target activities. Our results confirm previous reports that strength of base-pairing in the siRNA seed region is the primary factor determining the efficiency of off-target silencing. However, the degree of downregulation of off-target transcripts with shared seed sequence is not necessarily similar, suggesting that there are additional auxiliary factors that influence the silencing potential. Here, we demonstrate that both the melting temperature (Tm) in a subsection of the siRNA non-seed region, and the GC content of its corresponding target sequences, are negatively correlated with the efficiency of off-target effects. Analysis of experimentally validated miRNA targets demonstrated a similar trend, indicating a putative conserved mechanistic feature of the seed region-dependent targeting mechanism. These observations may prove useful as parameters for off-target prediction algorithms and improve siRNA 'specificity' design rules.

  11. shRNA target prediction informed by comprehensive enquiry (SPICE): a supporting system for high-throughput screening of shRNA library.

    PubMed

    Kamatuka, Kenta; Hattori, Masahiro; Sugiyama, Tomoyasu

    2016-12-01

    RNA interference (RNAi) screening is extensively used in the field of reverse genetics. RNAi libraries constructed using random oligonucleotides have made this technology affordable. However, the new methodology requires exploration of the RNAi target gene information after screening because the RNAi library includes non-natural sequences that are not found in genes. Here, we developed a web-based tool to support RNAi screening. The system performs short hairpin RNA (shRNA) target prediction that is informed by comprehensive enquiry (SPICE). SPICE automates several tasks that are laborious but indispensable to evaluate the shRNAs obtained by RNAi screening. SPICE has four main functions: (i) sequence identification of shRNA in the input sequence (the sequence might be obtained by sequencing clones in the RNAi library), (ii) searching the target genes in the database, (iii) demonstrating biological information obtained from the database, and (iv) preparation of search result files that can be utilized in a local personal computer (PC). Using this system, we demonstrated that genes targeted by random oligonucleotide-derived shRNAs were not different from those targeted by organism-specific shRNA. The system facilitates RNAi screening, which requires sequence analysis after screening. The SPICE web application is available at http://www.spice.sugysun.org/.

  12. Alterations of microbiota in urine from women with interstitial cystitis

    PubMed Central

    2012-01-01

    Background: Interstitial Cystitis (IC) is a chronic inflammatory condition of the bladder with unknown etiology. The aim of this study was to characterize the microbial community present in the urine from IC female patients by 454 high throughput sequencing of the 16S variable regions V1V2 and V6. The taxonomical composition, richness and diversity of the IC microbiota were determined and compared to the microbial profile of asymptomatic healthy female (HF) urine. Results: The composition and distribution of bacterial sequences differed between the urine microbiota of IC patients and HFs. Reduced sequence richness and diversity were found in IC patient urine, and a significant difference in the community structure of IC urine in relation to HF urine was observed. More than 90% of the IC sequence reads were identified as belonging to the bacterial genus Lactobacillus, a marked increase compared to 60% in HF urine. Conclusion: The 16S rDNA sequence data demonstrate a shift in the composition of the bacterial community in IC urine. The reduced microbial diversity and richness is accompanied by a higher abundance of the bacterial genus Lactobacillus, compared to HF urine. This study demonstrates that high throughput sequencing analysis of urine microbiota in IC patients is a powerful tool towards a better understanding of this enigmatic disease. PMID:22974186
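
    Richness and diversity, as compared in this abstract, can be computed from per-taxon read counts. The sketch below uses observed richness and the Shannon index as one common choice; the abstract does not say which estimators the authors used, and the counts and the "other_1"/"other_2" labels are made-up placeholders, not data from the study.

```python
import math

def richness(counts):
    """Number of taxa observed at least once."""
    return sum(1 for c in counts.values() if c > 0)

def shannon(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)

# Hypothetical read counts: one community dominated by a single genus, one even.
dominated = {"Lactobacillus": 90, "other_1": 5, "other_2": 5}
even = {"Lactobacillus": 34, "other_1": 33, "other_2": 33}

h_dominated = shannon(dominated)  # lower: reads concentrate in one genus
h_even = shannon(even)            # higher: reads spread across genera
```

    Both communities have the same richness here, which illustrates why dominance by one genus lowers diversity even when no taxa are lost.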

  13. To Clone or Not To Clone: Method Analysis for Retrieving Consensus Sequences In Ancient DNA Samples

    PubMed Central

    Winters, Misa; Barta, Jodi Lynn; Monroe, Cara; Kemp, Brian M.

    2011-01-01

    The challenges associated with the retrieval and authentication of ancient DNA (aDNA) evidence are principally due to post-mortem damage which makes ancient samples particularly prone to contamination from “modern” DNA sources. The necessity for authentication of results has led many aDNA researchers to adopt methods considered to be “gold standards” in the field, including cloning aDNA amplicons as opposed to directly sequencing them. However, no standardized protocol has emerged regarding the necessary number of clones to sequence, how a consensus sequence is most appropriately derived, or how results should be reported in the literature. In addition, there has been no systematic demonstration of the degree to which direct sequences are affected by damage or whether direct sequencing would provide disparate results from a consensus of clones. To address this issue, a comparative study was designed to examine both cloned and direct sequences amplified from ∼3,500 year-old ancient northern fur seal DNA extracts. Majority rules and the Consensus Confidence Program were used to generate consensus sequences for each individual from the cloned sequences, which exhibited damage at 31 of 139 base pairs across all clones. In no instance did the consensus of clones differ from the direct sequence. This study demonstrates that, when appropriate, cloning need not be the default method, but instead, should be used as a measure of authentication on a case-by-case basis, especially when this practice adds time and cost to studies where it may be superfluous. PMID:21738625
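
    The "majority rules" consensus mentioned in this abstract can be sketched as a per-column vote over aligned clone sequences. This is an illustration only: the function names, the strict more-than-half threshold, and the use of "N" for unresolved columns are assumptions, and the Consensus Confidence Program referred to in the abstract is not reproduced here.

```python
from collections import Counter

def majority_consensus(clones, ambiguous="N"):
    """Per-column strict-majority base across equal-length aligned clones."""
    if not clones:
        raise ValueError("need at least one clone sequence")
    if len({len(c) for c in clones}) != 1:
        raise ValueError("clones must be aligned to the same length")
    consensus = []
    for column in zip(*clones):
        base, count = Counter(column).most_common(1)[0]
        # Keep the base only if more than half of the clones agree on it.
        consensus.append(base if count > len(column) / 2 else ambiguous)
    return "".join(consensus)

def variable_positions(clones):
    """0-based columns where clones disagree (candidate damage sites)."""
    return [i for i, col in enumerate(zip(*clones)) if len(set(col)) > 1]

# Toy example: one clone carries a C->T change at the second position.
clones = ["ACGT", "ACGT", "ATGT"]
consensus = majority_consensus(clones)
damaged = variable_positions(clones)
```

    Comparing such a consensus with the directly sequenced amplicon, position by position, is the kind of check the study describes.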

  14. Optimizing and benchmarking de novo transcriptome sequencing: from library preparation to assembly evaluation.

    PubMed

    Hara, Yuichiro; Tatsumi, Kaori; Yoshida, Michio; Kajikawa, Eriko; Kiyonari, Hiroshi; Kuraku, Shigehiro

    2015-11-18

    RNA-seq enables gene expression profiling in selected spatiotemporal windows and yields massive sequence information with relatively low cost and time investment, even for non-model species. However, there remains considerable room for optimizing its workflow, in order to take full advantage of continuously developing sequencing capacity. Transcriptome sequencing for three embryonic stages of Madagascar ground gecko (Paroedura picta) was performed with the Illumina platform. The output reads were assembled de novo for reconstructing transcript sequences. In order to evaluate the completeness of transcriptome assemblies, we prepared a reference gene set consisting of vertebrate one-to-one orthologs. To take advantage of increased read length of >150 nt, we demonstrated shortened RNA fragmentation time, which resulted in a dramatic shift of insert size distribution. To evaluate products of multiple de novo assembly runs incorporating reads with different RNA sources, read lengths, and insert sizes, we introduce a new reference gene set, core vertebrate genes (CVG), consisting of 233 genes that are shared as one-to-one orthologs by all vertebrate genomes examined (29 species). The completeness assessment performed by the computational pipelines CEGMA and BUSCO referring to CVG demonstrated higher accuracy and resolution than with the gene set previously established for this purpose. As a result of the assessment with CVG, we have derived the most comprehensive transcript sequence set of the Madagascar ground gecko by means of assembling individual libraries followed by clustering the assembled sequences based on their overall similarities. Our results provide several insights into optimizing de novo RNA-seq workflow, including the coordination between library insert size and read length, which manifested in improved connectivity of assemblies. The approach and assembly assessment with CVG demonstrated here would be applicable to transcriptome analysis of other species as well as whole genome analyses.

  15. Elements in the transcriptional regulatory region flanking herpes simplex virus type 1 oriS stimulate origin function.

    PubMed

    Wong, S W; Schaffer, P A

    1991-05-01

    Like other DNA-containing viruses, the three origins of herpes simplex virus type 1 (HSV-1) DNA replication are flanked by sequences containing transcriptional regulatory elements. In a transient plasmid replication assay, deletion of sequences comprising the transcriptional regulatory elements of ICP4 and ICP22/47, which flank oriS, resulted in a greater than 80-fold decrease in origin function compared with a plasmid, pOS-822, which retains these sequences. In an effort to identify specific cis-acting elements responsible for this effect, we conducted systematic deletion analysis of the flanking region with plasmid pOS-822 and tested the resulting mutant plasmids for origin function. Stimulation by cis-acting elements was shown to be both distance and orientation dependent, as changes in either parameter resulted in a decrease in oriS function. Additional evidence for the stimulatory effect of flanking sequences on origin function was demonstrated by replacement of these sequences with the cytomegalovirus immediate-early promoter, resulting in nearly wild-type levels of oriS function. In competition experiments, cotransfection of cells with the test plasmid, pOS-822, and increasing molar concentrations of a competitor plasmid which contained the ICP4 and ICP22/47 transcriptional regulatory regions but lacked core origin sequences resulted in a significant reduction in the replication efficiency of pOS-822, demonstrating that factors which bind specifically to the oriS-flanking sequences are likely involved as auxiliary proteins in oriS function. Together, these studies demonstrate that trans-acting factors and the sites to which they bind play a critical role in the efficiency of HSV-1 DNA replication from oriS in transient-replication assays.

  16. Genome-wide gene–gene interaction analysis for next-generation sequencing

    PubMed Central

    Zhao, Jinying; Zhu, Yun; Xiong, Momiao

    2016-01-01

    The critical barrier in interaction analysis for next-generation sequencing (NGS) data is that the traditional pairwise interaction analysis that is suitable for common variants is difficult to apply to rare variants because of their prohibitive computational time, large number of tests and low power. The great challenges for successful detection of interactions with NGS data are (1) the demands in the paradigm of changes in interaction analysis; (2) severe multiple testing; and (3) heavy computations. To meet these challenges, we shift the paradigm of interaction analysis between two SNPs to interaction analysis between two genomic regions. In other words, we take a gene as a unit of analysis and use functional data analysis techniques as dimensional reduction tools to develop a novel statistic to collectively test interaction between all possible pairs of SNPs within two genome regions. By intensive simulations, we demonstrate that the functional logistic regression for interaction analysis has the correct type 1 error rates and higher power to detect interaction than the currently used methods. The proposed method was applied to a coronary artery disease dataset from the Wellcome Trust Case Control Consortium (WTCCC) study and the Framingham Heart Study (FHS) dataset, and the early-onset myocardial infarction (EOMI) exome sequence datasets with European origin from the NHLBI's Exome Sequencing Project. We discovered that 6 of 27 pairs of significantly interacted genes in the FHS were replicated in the independent WTCCC study and 24 pairs of significantly interacted genes after applying Bonferroni correction in the EOMI study. PMID:26173972

  17. Enabling large-scale next-generation sequence assembly with Blacklight

    PubMed Central

    Couger, M. Brian; Pipes, Lenore; Squina, Fabio; Prade, Rolf; Siepel, Adam; Palermo, Robert; Katze, Michael G.; Mason, Christopher E.; Blood, Philip D.

    2014-01-01

    Summary: A variety of extremely challenging biological sequence analyses were conducted on the XSEDE large shared memory resource Blacklight, using current bioinformatics tools and encompassing a wide range of scientific applications. These include genomic sequence assembly, very large metagenomic sequence assembly, transcriptome assembly, and sequencing error correction. The data sets used in these analyses included uncategorized fungal species, reference microbial data, very large soil and human gut microbiome sequence data, and primate transcriptomes, composed of both short-read and long-read sequence data. A new parallel command execution program was developed on the Blacklight resource to handle some of these analyses. These results, initially reported at XSEDE13 and expanded here, represent significant advances for their respective scientific communities. The breadth and depth of the results achieved demonstrate the ease of use, versatility, and unique capabilities of the Blacklight XSEDE resource for scientific analysis of genomic and transcriptomic sequence data, and the power of these resources, together with XSEDE support, in meeting the most challenging scientific problems. PMID:25294974

  18. Sequence verification of synthetic DNA by assembly of sequencing reads

    PubMed Central

    Wilson, Mandy L.; Cai, Yizhi; Hanlon, Regina; Taylor, Samantha; Chevreux, Bastien; Setubal, João C.; Tyler, Brett M.; Peccoud, Jean

    2013-01-01

    Gene synthesis attempts to assemble user-defined DNA sequences with base-level precision. Verifying the sequences of construction intermediates and the final product of a gene synthesis project is a critical part of the workflow, yet one that has received the least attention. Sequence validation is equally important for other kinds of curated clone collections. Ensuring that the physical sequence of a clone matches its published sequence is a common quality control step performed at least once over the course of a research project. GenoREAD is a web-based application that breaks the sequence verification process into two steps: the assembly of sequencing reads and the alignment of the resulting contig with a reference sequence. GenoREAD can determine if a clone matches its reference sequence. Its sophisticated reporting features help identify and troubleshoot problems that arise during the sequence verification process. GenoREAD has been experimentally validated on thousands of gene-sized constructs from an ORFeome project, and on longer sequences including whole plasmids and synthetic chromosomes. Comparing GenoREAD results with those from manual analysis of the sequencing data demonstrates that GenoREAD tends to be conservative in its diagnostic. GenoREAD is available at www.genoread.org. PMID:23042248

  19. Repair of DNA double-strand breaks by templated nucleotide sequence insertions derived from distant regions of the genome.

    PubMed

    Onozawa, Masahiro; Zhang, Zhenhua; Kim, Yoo Jung; Goldberg, Liat; Varga, Tamas; Bergsagel, P Leif; Kuehl, W Michael; Aplan, Peter D

    2014-05-27

    We used the I-SceI endonuclease to produce DNA double-strand breaks (DSBs) and observed that a fraction of these DSBs were repaired by insertion of sequences, which we termed "templated sequence insertions" (TSIs), derived from distant regions of the genome. These TSIs were derived from genic, retrotransposon, or telomere sequences and were not deleted from the donor site in the genome, leading to the hypothesis that they were derived from reverse-transcribed RNA. Cotransfection of RNA and an I-SceI expression vector demonstrated insertion of RNA-derived sequences at the DNA-DSB site, and TSIs were suppressed by reverse-transcriptase inhibitors. Both observations support the hypothesis that TSIs were derived from RNA templates. In addition, similar insertions were detected at sites of DNA DSBs induced by transcription activator-like effector nuclease proteins. Whole-genome sequencing of myeloma cell lines revealed additional TSIs, demonstrating that repair of DNA DSBs via insertion was not restricted to experimentally produced DNA DSBs. Analysis of publicly available databases revealed that many of these TSIs are polymorphic in the human genome. Taken together, these results indicate that insertional events should be considered as alternatives to gross chromosomal rearrangements in the interpretation of whole-genome sequence data and that this mutagenic form of DNA repair may play a role in genetic disease, exon shuffling, and mammalian evolution.

  20. Validation of Metagenomic Next-Generation Sequencing Tests for Universal Pathogen Detection.

    PubMed

    Schlaberg, Robert; Chiu, Charles Y; Miller, Steve; Procop, Gary W; Weinstock, George

    2017-06-01

    Context: Metagenomic sequencing can be used for detection of any pathogens using unbiased, shotgun next-generation sequencing (NGS), without the need for sequence-specific amplification. Proof-of-concept has been demonstrated in infectious disease outbreaks of unknown causes and in patients with suspected infections but negative results for conventional tests. Metagenomic NGS tests hold great promise to improve infectious disease diagnostics, especially in immunocompromised and critically ill patients. Objective: To discuss challenges and provide example solutions for validating metagenomic pathogen detection tests in clinical laboratories. A summary of current regulatory requirements, largely based on prior guidance for NGS testing in constitutional genetics and oncology, is provided. Data sources: Examples from 2 separate validation studies are provided for steps from assay design, and validation of wet bench and bioinformatics protocols, to quality control and assurance. Conclusions: Although laboratory and data analysis workflows are still complex, metagenomic NGS tests for infectious diseases are increasingly being validated in clinical laboratories. Many parallels exist to NGS tests in other fields. Nevertheless, specimen preparation, rapidly evolving data analysis algorithms, and incomplete reference sequence databases are idiosyncratic to the field of microbiology and often overlooked.

  1. RNA-Seq Alignment to Individualized Genomes Improves Transcript Abundance Estimates in Multiparent Populations

    PubMed Central

    Munger, Steven C.; Raghupathy, Narayanan; Choi, Kwangbom; Simons, Allen K.; Gatti, Daniel M.; Hinerfeld, Douglas A.; Svenson, Karen L.; Keller, Mark P.; Attie, Alan D.; Hibbs, Matthew A.; Graber, Joel H.; Chesler, Elissa J.; Churchill, Gary A.

    2014-01-01

    Massively parallel RNA sequencing (RNA-seq) has yielded a wealth of new insights into transcriptional regulation. A first step in the analysis of RNA-seq data is the alignment of short sequence reads to a common reference genome or transcriptome. Genetic variants that distinguish individual genomes from the reference sequence can cause reads to be misaligned, resulting in biased estimates of transcript abundance. Fine-tuning of read alignment algorithms does not correct this problem. We have developed Seqnature software to construct individualized diploid genomes and transcriptomes for multiparent populations and have implemented a complete analysis pipeline that incorporates other existing software tools. We demonstrate in simulated and real data sets that alignment to individualized transcriptomes increases read mapping accuracy, improves estimation of transcript abundance, and enables the direct estimation of allele-specific expression. Moreover, when applied to expression QTL mapping we find that our individualized alignment strategy corrects false-positive linkage signals and unmasks hidden associations. We recommend the use of individualized diploid genomes over reference sequence alignment for all applications of high-throughput sequencing technology in genetically diverse populations. PMID:25236449

  2. Molecular Characterization of Transgenic Events Using Next Generation Sequencing Approach.

    PubMed

    Guttikonda, Satish K; Marri, Pradeep; Mammadov, Jafar; Ye, Liang; Soe, Khaing; Richey, Kimberly; Cruse, James; Zhuang, Meibao; Gao, Zhifang; Evans, Clive; Rounsley, Steve; Kumpatla, Siva P

    2016-01-01

    Demand for the commercial use of genetically modified (GM) crops has been increasing in light of the projected growth of world population to nine billion by 2050. A prerequisite of paramount importance for regulatory submissions is the rigorous safety assessment of GM crops. One of the components of safety assessment is molecular characterization at the DNA level, which helps to determine the copy number, integrity and stability of a transgene; characterize the integration site within a host genome; and confirm the absence of vector DNA. Historically, molecular characterization has been carried out using Southern blot analysis coupled with Sanger sequencing. While this is a robust approach to characterizing transgenic crops, it is both time- and resource-consuming. The emergence of next-generation sequencing (NGS) technologies has provided a highly sensitive, cost- and labor-effective alternative to traditional Southern blot analysis for molecular characterization. Herein, we have demonstrated the successful application of both whole genome sequencing and target capture sequencing approaches for the characterization of single and stacked transgenic events and compared the results and inferences with the traditional method with respect to key criteria required for regulatory submissions.

  3. Influenza virus sequence feature variant type analysis: evidence of a role for NS1 in influenza virus host range restriction.

    PubMed

    Noronha, Jyothi M; Liu, Mengya; Squires, R Burke; Pickett, Brett E; Hale, Benjamin G; Air, Gillian M; Galloway, Summer E; Takimoto, Toru; Schmolke, Mirco; Hunt, Victoria; Klem, Edward; García-Sastre, Adolfo; McGee, Monnie; Scheuermann, Richard H

    2012-05-01

    Genetic drift of influenza virus genomic sequences occurs through the combined effects of sequence alterations introduced by a low-fidelity polymerase and the varying selective pressures experienced as the virus migrates through different host environments. While traditional phylogenetic analysis is useful in tracking the evolutionary heritage of these viruses, the specific genetic determinants that dictate important phenotypic characteristics are often difficult to discern within the complex genetic background arising through evolution. Here we describe a novel influenza virus sequence feature variant type (Flu-SFVT) approach, made available through the public Influenza Research Database resource (www.fludb.org), in which variant types (VTs) identified in defined influenza virus protein sequence features (SFs) are used for genotype-phenotype association studies. Since SFs have been defined for all influenza virus proteins based on known structural, functional, and immune epitope recognition properties, the Flu-SFVT approach allows the rapid identification of the molecular genetic determinants of important influenza virus characteristics and their connection to underlying biological functions. We demonstrate the use of the SFVT approach to obtain statistical evidence for effects of NS1 protein sequence variations in dictating influenza virus host range restriction.

  4. Analysis of Sequence Data Under Multivariate Trait-Dependent Sampling.

    PubMed

    Tao, Ran; Zeng, Donglin; Franceschini, Nora; North, Kari E; Boerwinkle, Eric; Lin, Dan-Yu

    2015-06-01

    High-throughput DNA sequencing allows for the genotyping of common and rare variants for genetic association studies. At the present time and for the foreseeable future, it is not economically feasible to sequence all individuals in a large cohort. A cost-effective strategy is to sequence those individuals with extreme values of a quantitative trait. We consider the design under which the sampling depends on multiple quantitative traits. Under such trait-dependent sampling, standard linear regression analysis can result in bias of parameter estimation, inflation of type I error, and loss of power. We construct a likelihood function that properly reflects the sampling mechanism and utilizes all available data. We implement a computationally efficient EM algorithm and establish the theoretical properties of the resulting maximum likelihood estimators. Our methods can be used to perform separate inference on each trait or simultaneous inference on multiple traits. We pay special attention to gene-level association tests for rare variants. We demonstrate the superiority of the proposed methods over standard linear regression through extensive simulation studies. We provide applications to the Cohorts for Heart and Aging Research in Genomic Epidemiology Targeted Sequencing Study and the National Heart, Lung, and Blood Institute Exome Sequencing Project.

  5. Large-scale analysis of full-length cDNAs from the tomato (Solanum lycopersicum) cultivar Micro-Tom, a reference system for the Solanaceae genomics.

    PubMed

    Aoki, Koh; Yano, Kentaro; Suzuki, Ayako; Kawamura, Shingo; Sakurai, Nozomu; Suda, Kunihiro; Kurabayashi, Atsushi; Suzuki, Tatsuya; Tsugane, Taneaki; Watanabe, Manabu; Ooga, Kazuhide; Torii, Maiko; Narita, Takanori; Shin-I, Tadasu; Kohara, Yuji; Yamamoto, Naoki; Takahashi, Hideki; Watanabe, Yuichiro; Egusa, Mayumi; Kodama, Motoichiro; Ichinose, Yuki; Kikuchi, Mari; Fukushima, Sumire; Okabe, Akiko; Arie, Tsutomu; Sato, Yuko; Yazawa, Katsumi; Satoh, Shinobu; Omura, Toshikazu; Ezura, Hiroshi; Shibata, Daisuke

    2010-03-30

    The Solanaceae family includes several economically important vegetable crops. The tomato (Solanum lycopersicum) is regarded as a model plant of the Solanaceae family. Recently, a number of tomato resources have been developed in parallel with the ongoing tomato genome sequencing project. In particular, a miniature cultivar, Micro-Tom, is regarded as a model system in tomato genomics, and a number of genomics resources in the Micro-Tom-background, such as ESTs and mutagenized lines, have been established by an international alliance. To accelerate the progress in tomato genomics, we developed a collection of fully-sequenced 13,227 Micro-Tom full-length cDNAs. By checking redundant sequences, coding sequences, and chimeric sequences, a set of 11,502 non-redundant full-length cDNAs (nrFLcDNAs) was generated. Analysis of untranslated regions demonstrated that tomato has longer 5'- and 3'-untranslated regions than most other plants but rice. Classification of functions of proteins predicted from the coding sequences demonstrated that nrFLcDNAs covered a broad range of functions. A comparison of nrFLcDNAs with genes of sixteen plants facilitated the identification of tomato genes that are not found in other plants, most of which did not have known protein domains. Mapping of the nrFLcDNAs onto currently available tomato genome sequences facilitated prediction of exon-intron structure. Introns of tomato genes were longer than those of Arabidopsis and rice. According to a comparison of exon sequences between the nrFLcDNAs and the tomato genome sequences, the frequency of nucleotide mismatch in exons between Micro-Tom and the genome-sequencing cultivar (Heinz 1706) was estimated to be 0.061%. The collection of Micro-Tom nrFLcDNAs generated in this study will serve as a valuable genomic tool for plant biologists to bridge the gap between basic and applied studies. 
    The nrFLcDNA sequences will help annotation of the tomato whole-genome sequence and aid in tomato functional genomics and molecular breeding. Full-length cDNA sequences and their annotations are provided in the database KaFTom http://www.pgb.kazusa.or.jp/kaftom/ via the website of the National Bioresource Project Tomato http://tomato.nbrp.jp.

  6. Using high throughput sequencing to explore the biodiversity in oral bacterial communities.

    PubMed

    Diaz, P I; Dupuy, A K; Abusleme, L; Reese, B; Obergfell, C; Choquette, L; Dongari-Bagtzoglou, A; Peterson, D E; Terzi, E; Strausbaugh, L D

    2012-06-01

    High throughput sequencing of 16S ribosomal RNA gene amplicons is a cost-effective method for characterization of oral bacterial communities. However, before undertaking large-scale studies, it is necessary to understand the technique-associated limitations and intrinsic variability of the oral ecosystem. In this work we evaluated bias in species representation using an in vitro-assembled mock community of oral bacteria. We then characterized the bacterial communities in saliva and buccal mucosa of five healthy subjects to investigate the power of high throughput sequencing in revealing their diversity and biogeography patterns. Mock community analysis showed primer and DNA isolation biases and an overestimation of diversity that was reduced after eliminating singleton operational taxonomic units (OTUs). Sequencing of salivary and mucosal communities found a total of 455 OTUs (0.3% dissimilarity) with only 78 of these present in all subjects. We demonstrate that this variability was partly the result of incomplete richness coverage even at great sequencing depths, and so comparing communities by their structure was more effective than comparisons based solely on membership. With respect to oral biogeography, we found inter-subject variability in community structure was lower than site differences between salivary and mucosal communities within subjects. These differences were evident at very low sequencing depths and were mostly caused by the abundance of Streptococcus mitis and Gemella haemolysans in mucosa. In summary, we present an experimental and data analysis framework that will facilitate design and interpretation of pyrosequencing-based studies. Despite challenges associated with this technique, we demonstrate its power for evaluation of oral diversity and biogeography patterns. © 2012 John Wiley & Sons A/S.

  7. A novel ultra high-throughput 16S rRNA gene amplicon sequencing library preparation method for the Illumina HiSeq platform.

    PubMed

    de Muinck, Eric J; Trosvik, Pål; Gilfillan, Gregor D; Hov, Johannes R; Sundaram, Arvind Y M

    2017-07-06

    Advances in sequencing technologies and bioinformatics have made the analysis of microbial communities almost routine. Nonetheless, the need remains to improve on the techniques used for gathering such data, including increasing throughput while lowering cost and benchmarking the techniques so that potential sources of bias can be better characterized. We present a triple-index amplicon sequencing strategy to sequence large numbers of samples at significantly lower cost and in a shorter timeframe compared to existing methods. The design employs a two-stage PCR protocol, incorporating three barcodes to each sample, with the possibility to add a fourth index. It also includes heterogeneity spacers to overcome low complexity issues faced when sequencing amplicons on Illumina platforms. The library preparation method was extensively benchmarked through analysis of a mock community in order to assess biases introduced by sample indexing, number of PCR cycles, and template concentration. We further evaluated the method through re-sequencing of a standardized environmental sample. Finally, we evaluated our protocol on a set of fecal samples from a small cohort of healthy adults, demonstrating good performance in a realistic experimental setting. Between-sample variation was mainly related to batch effects, such as DNA extraction, while sample indexing was also a significant source of bias. PCR cycle number strongly influenced chimera formation and affected relative abundance estimates of species with high GC content. Libraries were sequenced using the Illumina HiSeq and MiSeq platforms to demonstrate that this protocol is highly scalable to sequence thousands of samples at a very low cost. Here, we provide the most comprehensive study of performance and bias inherent to a 16S rRNA gene amplicon sequencing method to date.
    Triple-indexing greatly reduces the number of long custom DNA oligos required for library preparation, while the inclusion of variable length heterogeneity spacers minimizes the need for PhiX spike-in. This design results in a significant cost reduction of highly multiplexed amplicon sequencing. The biases we characterize highlight the need for highly standardized protocols. Reassuringly, we find that the biological signal is a far stronger structuring factor than the various sources of bias.

  8. rpoB gene mutations among Mycobacterium tuberculosis isolates from extrapulmonary sites.

    PubMed

    Khosravi, Azar Dokht; Meghdadi, Hossein; Ghadiri, Ata A; Alami, Ameneh; Sina, Amir Hossein; Mirsaeidi, Mehdi

    2018-03-01

    The aim of this study was to analyze mutations occurring in the rpoB gene of Mycobacterium tuberculosis (MTB) isolates from clinical samples of extrapulmonary tuberculosis (EPTB). Seventy formalin-fixed, paraffin-embedded samples and fresh tissue samples from confirmed EPTB cases were analyzed. Nested PCR based on the rpoB gene was performed on the extracted DNAs, combined with cloning and subsequent sequencing. Sixty-seven (95.7%) samples were positive by nested PCR. Sequence analysis of the 81 bp region of the rpoB gene demonstrated mutations in 41 (61.2%) of 67 sequenced samples. Several point mutations were seen, including deletion mutations at codons 510, 512, 513 and 515, with 45% and 51% of the mutations in codons 512 and 513, respectively, along with 26% replacement mutations at codons 509, 513, 514, 518, 520, 524 and 531. The most common alteration was Gln → His at codon 513, present in 30 (75.6%) isolates. This study demonstrated that sequence alterations in codon 513 of the 81 bp region of the rpoB gene were the most common mutation, occurring in 75.6% of molecularly confirmed rifampin-resistant strains. In addition, simultaneous mutation at codons 512 and 513 was demonstrated in 34.3% of the isolates. © 2018 APMIS. Published by John Wiley & Sons Ltd.

  9. Sequence analysis of the pyruvylated galactan sulfate-derived oligosaccharides by negative-ion electrospray tandem mass spectrometry.

    PubMed

    Li, Na; Mao, Wenjun; Liu, Xue; Wang, Shuyao; Xia, Zheng; Cao, Sujian; Li, Lin; Zhang, Qi; Liu, Shan

    2016-10-04

    Five sulfated oligosaccharide fragments, F1-F5, were prepared from a pyruvylated galactan sulfate from the green alga Codium divaricatum, by partial depolymerization using mild acid hydrolysis and purification with gel-permeation chromatography. Negative-ion electrospray tandem mass spectrometry with collision-induced dissociation (ES-CID-MS/MS) is attempted for sequence determination of the sulfated oligosaccharides. The sequence of F1 with homogeneous disaccharide composition was first characterized to be Galp-(4SO4)-(1 → 3)-Galp by detailed nuclear magnetic resonance spectroscopic analyses. The fragmentation pattern of F1 in the product ion spectra was established on the basis of negative-ion ES-CID MS/MS, which was then applied to sequence analysis of other sulfated oligosaccharides. The sequences of F2 and F3 were deduced to be Galp-(4SO4)-(1 → 3)-Galp-(1 → 3)-Galp-(1 → 3)-Galp and 3,4-O-(1-carboxyethylidene)-Galp-(6SO4)-(1 → 3)-Galp, respectively. The sequences of major fragments in F4 and F5 were also deduced. The investigation demonstrated that negative-ion ES-CID-MS/MS was an efficient method for the sequence analysis of the pyruvylated galactan sulfate-derived oligosaccharides which revealed the patterns of substitution and glycosidic linkages. The pyruvylated galactan sulfate-derived oligosaccharides were novel sulfated oligosaccharides different from other algal polysaccharide-derived oligosaccharides. Copyright © 2016 Elsevier Ltd. All rights reserved.

  10. Analysis of the whole mitochondrial genome: translation of the Ion Torrent Personal Genome Machine system to the diagnostic bench?

    PubMed

    Seneca, Sara; Vancampenhout, Kim; Van Coster, Rudy; Smet, Joél; Lissens, Willy; Vanlander, Arnaud; De Paepe, Boel; Jonckheere, An; Stouffs, Katrien; De Meirleir, Linda

    2015-01-01

    Next-generation sequencing (NGS), an innovative sequencing technology that enables the successful analysis of numerous gene sequences in a massive parallel sequencing approach, has revolutionized the field of molecular biology. Although NGS was introduced in a rather recent past, the technology has already demonstrated its potential and effectiveness in many research projects, and is now on the verge of being introduced into the diagnostic setting of routine laboratories to delineate the molecular basis of genetic disease in undiagnosed patient samples. We tested a benchtop device on retrospective genomic DNA (gDNA) samples of controls and patients with a clinical suspicion of a mitochondrial DNA disorder. This Ion Torrent Personal Genome Machine platform is a high-throughput sequencer with a fast turnaround time and reasonable running costs. We challenged the chemistry and technology with the analysis and processing of a mutational spectrum composed of samples with single-nucleotide substitutions, indels (insertions and deletions) and large single or multiple deletions, occasionally in heteroplasmy. The output data were compared with previously obtained conventional dideoxy sequencing results and the mitochondrial revised Cambridge Reference Sequence (rCRS). We were able to identify the majority of all nucleotide alterations, but three false-negative results were also encountered in the data set. At the same time, the poor performance of the PGM instrument in regions associated with homopolymeric stretches generated many false-positive miscalls demanding additional manual curation of the data.

  11. Masking as an effective quality control method for next-generation sequencing data analysis.

    PubMed

    Yun, Sajung; Yun, Sijung

    2014-12-13

    Next generation sequencing produces base calls with low quality scores that can affect the accuracy of identifying simple nucleotide variation calls, including single nucleotide polymorphisms and small insertions and deletions. Here we compare the effectiveness of two data preprocessing methods, masking and trimming, and the accuracy of simple nucleotide variation calls on whole-genome sequence data from Caenorhabditis elegans. Masking substitutes low quality base calls with 'N's (undetermined bases), whereas trimming removes low quality bases, which results in shorter read lengths. We demonstrate that masking is more effective than trimming in reducing the false-positive rate in single nucleotide polymorphism (SNP) calling. However, neither preprocessing method affected the false-negative rate in SNP calling with statistical significance compared to the data analysis without preprocessing. False-positive rate and false-negative rate for small insertions and deletions did not show differences between masking and trimming. We recommend masking over trimming as a more effective preprocessing method for next generation sequencing data analysis, since masking reduces the false-positive rate in SNP calling without sacrificing the false-negative rate, although trimming is currently more commonly used in the field. The perl script for masking is available at http://code.google.com/p/subn/. The sequencing data used in the study were deposited in the Sequence Read Archive (SRX450968 and SRX451773).
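
    The difference between the two preprocessing methods can be illustrated with a minimal Python sketch. This is not the authors' perl script: the function names, the Phred+33 quality encoding, the threshold of 20, and the choice to trim only from the 3' end are all assumptions made for illustration.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def mask_read(seq: str, quality: str, min_q: int = 20, offset: int = 33) -> str:
    """Masking: replace every base below min_q with 'N'; read length is unchanged."""
    scores = phred_scores(quality, offset)
    return "".join(base if q >= min_q else "N" for base, q in zip(seq, scores))


def trim_read(seq: str, quality: str, min_q: int = 20, offset: int = 33) -> tuple[str, str]:
    """Trimming (assumed 3'-end only): drop trailing bases below min_q; the read gets shorter."""
    scores = phred_scores(quality, offset)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


if __name__ == "__main__":
    seq, qual = "ACGTACGT", "IIII#II#"  # 'I' = Q40, '#' = Q2 under Phred+33
    print(mask_read(seq, qual))   # internal and terminal low-quality bases become N
    print(trim_read(seq, qual))   # only the trailing low-quality base is removed
```

    In this toy read, masking flags both low-quality positions while keeping every base in its original coordinate, whereas 3'-end trimming leaves the internal low-quality base in place.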

  12. Microwave-assisted acid and base hydrolysis of intact proteins containing disulfide bonds for protein sequence analysis by mass spectrometry.

    PubMed

    Reiz, Bela; Li, Liang

    2010-09-01

    Controlled hydrolysis of proteins to generate peptide ladders combined with mass spectrometric analysis of the resultant peptides can be used for protein sequencing. In this paper, two methods of improving the microwave-assisted protein hydrolysis process are described to enable rapid sequencing of proteins containing disulfide bonds and increase sequence coverage, respectively. It was demonstrated that proteins containing disulfide bonds could be sequenced by MS analysis by first performing hydrolysis for less than 2 min, followed by 1 h of reduction to release the peptides originally linked by disulfide bonds. It was shown that a strong base could be used as a catalyst for microwave-assisted protein hydrolysis, producing complementary sequence information to that generated by microwave-assisted acid hydrolysis. However, using either acid or base hydrolysis, amide bond breakages in small regions of the polypeptide chains of the model proteins (e.g., cytochrome c and lysozyme) were not detected. Dynamic light scattering measurement of the proteins solubilized in an acid or base indicated that protein-protein interaction or aggregation was not the cause of the failure to hydrolyze certain amide bonds. It was speculated that there were some unknown local structures that might play a role in preventing an acid or base from reacting with the peptide bonds therein. 2010 American Society for Mass Spectrometry. Published by Elsevier Inc. All rights reserved.
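
    The idea of reading a sequence from a peptide ladder can be sketched in Python as follows. This is an illustration of the general principle only, not the authors' procedure: the function name, the 0.02 Da tolerance, and the abbreviated residue table (standard monoisotopic residue masses) are assumptions, and real spectra would need charge-state handling and noise filtering.

```python
# Monoisotopic residue masses in Da for a few amino acids (table deliberately abbreviated).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "F": 147.06841,
}


def read_ladder(masses: list[float], tol: float = 0.02) -> str:
    """Infer residues from mass differences between consecutive ladder peptides.

    Each step in the ladder differs from the previous one by a single residue,
    so the mass difference identifies that residue. Unmatched steps yield '?'.
    Leucine and isoleucine are isobaric and cannot be told apart this way.
    """
    ordered = sorted(masses)
    residues = []
    for lighter, heavier in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        best = min(RESIDUE_MASS, key=lambda aa: abs(RESIDUE_MASS[aa] - delta))
        residues.append(best if abs(RESIDUE_MASS[best] - delta) <= tol else "?")
    return "".join(residues)


if __name__ == "__main__":
    # Hypothetical ladder: an arbitrary 500.0 Da fragment extended by G, A, S, P.
    ladder = [500.0, 557.02146, 628.05857, 715.09060, 812.14336]
    print(read_ladder(ladder))
```

    A step that returns '?' corresponds to a missing ladder member, which is one way the undetected amide bond breakages described above would show up in the data.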

  13. Targeted DNA sequencing and in situ mutation analysis using mobile phone microscopy

    NASA Astrophysics Data System (ADS)

    Kühnemund, Malte; Wei, Qingshan; Darai, Evangelia; Wang, Yingjie; Hernández-Neuta, Iván; Yang, Zhao; Tseng, Derek; Ahlford, Annika; Mathot, Lucy; Sjöblom, Tobias; Ozcan, Aydogan; Nilsson, Mats

    2017-01-01

    Molecular diagnostics is typically outsourced to well-equipped centralized laboratories, often far from the patient. We developed molecular assays and portable optical imaging designs that permit on-site diagnostics with a cost-effective mobile-phone-based multimodal microscope. We demonstrate that targeted next-generation DNA sequencing reactions and in situ point mutation detection assays in preserved tumour samples can be imaged and analysed using mobile phone microscopy, achieving a new milestone for tele-medicine technologies.

  14. Phylogenetic shadowing of primate sequences to find functional regions of the human genome.

    PubMed

    Boffelli, Dario; McAuliffe, Jon; Ovcharenko, Dmitriy; Lewis, Keith D; Ovcharenko, Ivan; Pachter, Lior; Rubin, Edward M

    2003-02-28

    Nonhuman primates represent the most relevant model organisms to understand the biology of Homo sapiens. The recent divergence and associated overall sequence conservation between individual members of this taxon have nonetheless largely precluded the use of primates in comparative sequence studies. We used sequence comparisons of an extensive set of Old World and New World monkeys and hominoids to identify functional regions in the human genome. Analysis of these data enabled the discovery of primate-specific gene regulatory elements and the demarcation of the exons of multiple genes. Much of the information content of the comprehensive primate sequence comparisons could be captured with a small subset of phylogenetically close primates. These results demonstrate the utility of intraprimate sequence comparisons to discover common mammalian as well as primate-specific functional elements in the human genome, which are unattainable through the evaluation of more evolutionarily distant species.

  15. Low Maternal Microbiota Sharing across Gut, Breast Milk and Vagina, as Revealed by 16S rRNA Gene and Reduced Metagenomic Sequencing.

    PubMed

    Avershina, Ekaterina; Angell, Inga Leena; Simpson, Melanie; Storrø, Ola; Øien, Torbjørn; Johnsen, Roar; Rudi, Knut

    2018-05-01

    The maternal microbiota plays an important role in infant gut colonization. In this work we have investigated which bacterial species are shared across the breast milk, vaginal and stool microbiotas of 109 women shortly before and after giving birth using 16S rRNA gene sequencing and a novel reduced metagenomic sequencing (RMS) approach in a subgroup of 16 women. All the species predicted by the 16S rRNA gene sequencing were also detected by RMS analysis and there was good correspondence between their relative abundances estimated by both approaches. Both approaches also demonstrate a low level of maternal microbiota sharing across the population and RMS analysis identified only two species common to most women and in all sample types (Bifidobacterium longum and Enterococcus faecalis). Breast milk was the only sample type that had significantly higher intra- than inter- individual similarity towards both vaginal and stool samples. We also searched our RMS dataset against an in silico generated reference database derived from bacterial isolates in the Human Microbiome Project. The use of this reference-based search enabled further separation of Bifidobacterium longum into Bifidobacterium longum ssp. longum and Bifidobacterium longum ssp. infantis. We also detected the Lactobacillus rhamnosus GG strain, which was used as a probiotic supplement by some women, demonstrating the potential of RMS approach for deeper taxonomic delineation and estimation.

  16. Genetic variability in isolates of Chromobacterium violaceum from pulmonary secretion, water, and soil.

    PubMed

    Santini, A C; Magalhães, J T; Cascardo, J C M; Corrêa, R X

    2016-04-28

    Chromobacterium violaceum is a free-living Gram-negative bacillus usually found in the water and soil in tropical regions, which causes infections in humans. Chromobacteriosis is characterized by rapid dissemination and high mortality. The aim of this study was to detect the genetic variability among C. violaceum type strain ATCC 12472, and seven isolates from the environment and one from a pulmonary secretion from a chromobacteriosis patient from Ilhéus, Bahia. The molecular characterization of all samples was performed by polymerase chain reaction (PCR) sequencing and 16S rDNA analysis. Primers specific for two ATCC 12472 pathogenicity genes, hilA and yscD, as well as random amplified polymorphic DNA (RAPD), were used for PCR amplification and comparative sequencing of the products. For a more specific approach, the PCR products of 16S rDNA were digested with restriction enzymes. Seven of the samples, including type-strain ATCC 12472, were amplified by the hilA primers; these were subsequently sequenced. Gene yscD was amplified only in type-strain ATCC 12472. MspI and AluI digestion revealed 16S rDNA polymorphisms. These data allowed the generation of a dendrogram for each analysis. The isolates of C. violaceum have variability in random genomic regions, as demonstrated by RAPD. These isolates also have variability in pathogenicity genes, as demonstrated by sequencing and restriction enzyme digestion.
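
    How a restriction digest exposes sequence polymorphisms can be sketched in silico. The Python example below is a simplified illustration, not the study's protocol: it treats the amplicon as a linear top strand, the function names and toy sequence are hypothetical, and only the well-known recognition sites of MspI (C^CGG) and AluI (AG^CT) are assumed.

```python
# Recognition site and cut offset within the site (top strand) for each enzyme.
SITES = {
    "MspI": ("CCGG", 1),  # cuts C^CGG
    "AluI": ("AGCT", 2),  # cuts AG^CT (blunt)
}


def digest(seq: str, enzyme: str) -> list[str]:
    """Cut a linear sequence at every occurrence of the enzyme's recognition site."""
    site, cut = SITES[enzyme]
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut])
        start = pos + cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # remainder after the last cut
    return fragments


def fragment_lengths(seq: str, enzyme: str) -> list[int]:
    """Fragment sizes, largest first, as they would be compared on a gel."""
    return sorted((len(f) for f in digest(seq, enzyme)), reverse=True)


if __name__ == "__main__":
    amplicon = "AAACCGGTTTAGCTGG"  # toy sequence, not a real 16S rDNA amplicon
    for enzyme in SITES:
        print(enzyme, digest(amplicon, enzyme), fragment_lengths(amplicon, enzyme))
```

    Two isolates whose amplicons differ at a recognition site give different fragment-length lists, and such differences are the kind of input from which a dendrogram can be built.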

  17. ViDiT-CACTUS: an inexpensive and versatile library preparation and sequence analysis method for virus discovery and other microbiology applications.

    PubMed

    Verhoeven, Joost Theo Petra; Canuti, Marta; Munro, Hannah J; Dufour, Suzanne C; Lang, Andrew S

    2018-04-19

    High-throughput sequencing (HTS) technologies are becoming increasingly important within microbiology research, but aspects of library preparation, such as high cost per sample or strict input requirements, make HTS difficult to implement in some niche applications and for research groups on a budget. To answer these necessities, we developed ViDiT, a customizable, PCR-based, extremely low-cost (<5 US dollars per sample) and versatile library preparation method, and CACTUS, an analysis pipeline designed to rely on cloud computing power to generate high-quality data from ViDiT-based experiments without the need of expensive servers. We demonstrate here the versatility and utility of these methods within three fields of microbiology: virus discovery, amplicon-based viral genome sequencing and microbiome profiling. ViDiT-CACTUS allowed the identification of viral fragments from 25 different viral families from 36 oropharyngeal-cloacal swabs collected from wild birds, the sequencing of three almost complete genomes of avian influenza A viruses (>90% coverage), and the characterization and functional profiling of the complete microbial diversity (bacteria, archaea, viruses) within a deep-sea carnivorous sponge. ViDiT-CACTUS demonstrated its validity in a wide range of microbiology applications and its simplicity and modularity make it easily implementable in any molecular biology laboratory, towards various research goals.

  18. Low Maternal Microbiota Sharing across Gut, Breast Milk and Vagina, as Revealed by 16S rRNA Gene and Reduced Metagenomic Sequencing

    PubMed Central

    Angell, Inga Leena; Storrø, Ola; Øien, Torbjørn; Johnsen, Roar; Rudi, Knut

    2018-01-01

    The maternal microbiota plays an important role in infant gut colonization. In this work we have investigated which bacterial species are shared across the breast milk, vaginal and stool microbiotas of 109 women shortly before and after giving birth using 16S rRNA gene sequencing and a novel reduced metagenomic sequencing (RMS) approach in a subgroup of 16 women. All the species predicted by the 16S rRNA gene sequencing were also detected by RMS analysis and there was good correspondence between their relative abundances estimated by both approaches. Both approaches also demonstrate a low level of maternal microbiota sharing across the population and RMS analysis identified only two species common to most women and in all sample types (Bifidobacterium longum and Enterococcus faecalis). Breast milk was the only sample type that had significantly higher intra- than inter- individual similarity towards both vaginal and stool samples. We also searched our RMS dataset against an in silico generated reference database derived from bacterial isolates in the Human Microbiome Project. The use of this reference-based search enabled further separation of Bifidobacterium longum into Bifidobacterium longum ssp. longum and Bifidobacterium longum ssp. infantis. We also detected the Lactobacillus rhamnosus GG strain, which was used as a probiotic supplement by some women, demonstrating the potential of RMS approach for deeper taxonomic delineation and estimation. PMID:29724017

  19. Building toy models of proteins using coevolutionary information

    NASA Astrophysics Data System (ADS)

    Cheng, Ryan; Raghunathan, Mohit; Onuchic, Jose

    2015-03-01

    Recent developments in global statistical methodologies have advanced the analysis of large collections of protein sequences for coevolutionary information. Coevolution between amino acids in a protein arises from compensatory mutations that are needed to maintain the stability or function of a protein over the course of evolution. This gives rise to quantifiable correlations between amino acid positions within the multiple sequence alignment of a protein family. Here, we use Direct Coupling Analysis (DCA) to infer a Potts model Hamiltonian governing the correlated mutations in a protein family to obtain the sequence-dependent interaction energies of a toy protein model. We demonstrate that this methodology predicts residue-residue interaction energies that are consistent with experimental mutational changes in protein stabilities as well as other computational methodologies. Furthermore, we demonstrate with several examples that DCA could be used to construct a structure-based model that quantitatively agrees with experimental data on folding mechanisms. This work serves as a potential framework for generating models of proteins that are enriched by evolutionary data that can potentially be used to engineer key functional motions and interactions in protein systems. This research has been supported by the NSF INSPIRE award MCB-1241332 and by the CTBP sponsored by the NSF (Grant PHY-1427654).
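
    For orientation, the Potts model that DCA infers is commonly written in the following form. The notation is an assumption (the abstract gives no formulas), and sign conventions differ between papers: here $\sigma_i$ is the amino acid at alignment position $i$, $L$ is the alignment length, $h_i$ are single-site fields, and $J_{ij}$ are pairwise couplings.

```latex
% Probability of a sequence \sigma = (\sigma_1, \dots, \sigma_L) under the Potts model
P(\sigma) = \frac{1}{Z} \exp\bigl(-H(\sigma)\bigr),
\qquad
H(\sigma) = -\sum_{1 \le i < j \le L} J_{ij}(\sigma_i, \sigma_j) - \sum_{i=1}^{L} h_i(\sigma_i)

% Energy change for a mutant sequence relative to the wild type,
% which can be compared with measured changes in stability
\Delta H = H(\sigma^{\mathrm{mut}}) - H(\sigma^{\mathrm{wt}})
```

    Under this convention, lower $H$ corresponds to a more probable sequence in the family, so a positive $\Delta H$ marks a mutation that the model regards as unfavorable.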

  20. Microbial diversity in ikaite tufa columns: an alkaline, cold ecological niche in Greenland.

    PubMed

    Stougaard, Peter; Jørgensen, Flemming; Johnsen, Mads G; Hansen, Ole C

    2002-08-01

    Ikaite tufa columns from the Ikka Fjord in south-western Greenland constitute a natural, stable environment at low temperature and with a pH ranging from neutral at the exterior to very alkaline (pH 10.4) at the interior of the column. Phylogenetic analysis of culturable organisms revealed ten different isolates representing three of the major bacterial divisions. Nine of the isolates showed 94-99% similarity to known sequences, whereas one isolate displayed a low degree of similarity (less than 90%) to a Cyclobacterium species. Seven of the isolates were shown to be cold active alkaliphiles, whereas three isolates showed optimal growth at neutral pH. Phylogenetic analysis of DNA isolated directly from the ikaite material demonstrated the presence of a microbial flora more diverse than the culturable isolates. Whereas approximately half of the phylotypes showed 90-99% similarity to known meso- or thermophilic alkaliphiles, the rest of the sequences displayed less than 90% similarity when compared to known 16S rRNA gene sequences in databases. Thus, in the present paper, we demonstrate that ikaite columns that host a specialized macroscopic flora and fauna also contain a unique, cold active, alkaliphilic microflora.

  1. The fast changing landscape of sequencing technologies and their impact on microbial genome assemblies and annotation.

    PubMed

    Mavromatis, Konstantinos; Land, Miriam L; Brettin, Thomas S; Quest, Daniel J; Copeland, Alex; Clum, Alicia; Goodwin, Lynne; Woyke, Tanja; Lapidus, Alla; Klenk, Hans Peter; Cottingham, Robert W; Kyrpides, Nikos C

    2012-01-01

    The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly and their quality reflects the quality of the sequencing technology used, but also of the analysis software employed for assembly and annotation. In this work, we have explored the quality of the microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases, assembly results, although on average of high quality, need to be viewed critically, and sources of error in them should be considered prior to analysis. These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).

  2. Trans splicing in Leishmania enriettii and identification of ribonucleoprotein complexes containing the spliced leader and U2 equivalent RNAs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miller, S.I.; Wirth, D.F.

    1988-06-01

    The 5' ends of Leishmania mRNAs contain an identical 35-nucleotide sequence termed the spliced leader (SL) or 5' mini-exon. The SL sequence is at the 5' end of an 85-nucleotide primary transcript that contains a consensus eucaryotic 5' intron-exon splice junction immediately 3' to the SL. The SL is added to protein-coding genes immediately 3' to a consensus eucaryotic 3' intron-exon splice junction. The authors' previous work demonstrated possible intermediates in discontinuous mRNA processing that contain the 50 nucleotides of the SL primary transcript 3' to the SL, the SL intron sequence (SLIS). These RNAs have a 5' terminus at the splice junction of the SL and the SLIS. The authors examined a Leishmania nuclear extract for these RNAs in ribonucleoprotein (RNP) particles. Density centrifugation analysis showed that the SL RNA is predominately in RNP complexes at 60S, while the SLIS-containing RNAs are in complexes at 40S. They also demonstrated that the SLIS can be released from polyadenylated RNA by incubation with a HeLa cell extract containing debranching enzymatic activity. These data suggested that Leishmania enriettii mRNAs are assembled by bimolecular or trans splicing as has been recently demonstrated for Trypanosoma brucei. Furthermore, they determined the partial sequence of the Leishmania U2 equivalent RNA and demonstrated that it cosediments with the SL RNA at 60S in a nuclear extract. These RNP particles may be analogous to so-called spliceosomes that have been demonstrated in other systems.

  3. Community and gene composition of a human dental plaque microbiota obtained by metagenomic sequencing

    PubMed Central

    Xie, G.; Chain, P.S.G.; Lo, C.; Liu, K-L.; Gans, J.; Merritt, J.; Qi, F.

    2010-01-01

    Human dental plaque is a complex microbial community containing an estimated 700 to 19,000 species/phylotypes. Despite numerous studies analysing species richness in healthy and diseased human subjects, the true genomic composition of the human dental plaque microbiota remains unknown. Here we report a metagenomic analysis of a healthy human plaque sample using a combination of second-generation sequencing platforms. A total of 860 million base pairs of non-human sequences were generated. Various analysis tools revealed the presence of 12 well-characterized phyla, members of the TM-7 and BRC1 clade, and sequences that could not be classified. Both pathogens and opportunistic pathogens were identified, supporting the ecological plaque hypothesis for oral diseases. Mapping the metagenomic reads to sequenced reference genomes demonstrated that 4% of the reads could be assigned to the sequenced species. Preliminary annotation identified genes belonging to all known functional categories. Interestingly, although 73% of the total assembled contig sequences were predicted to code for proteins, only 51% of them could be assigned a functional role. Furthermore, ~2.8% of the total predicted genes coded for proteins involved in resistance to antibiotics and toxic compounds, suggesting that the oral cavity is an important reservoir for antimicrobial resistance. PMID:21040513

  4. Community and gene composition of a human dental plaque microbiota obtained by metagenomic sequencing.

    PubMed

    Xie, G; Chain, P S G; Lo, C-C; Liu, K-L; Gans, J; Merritt, J; Qi, F

    2010-12-01

    Human dental plaque is a complex microbial community containing an estimated 700 to 19,000 species/phylotypes. Despite numerous studies analysing species richness in healthy and diseased human subjects, the true genomic composition of the human dental plaque microbiota remains unknown. Here we report a metagenomic analysis of a healthy human plaque sample using a combination of second-generation sequencing platforms. A total of 860 million base pairs of non-human sequences were generated. Various analysis tools revealed the presence of 12 well-characterized phyla, members of the TM-7 and BRC1 clade, and sequences that could not be classified. Both pathogens and opportunistic pathogens were identified, supporting the ecological plaque hypothesis for oral diseases. Mapping the metagenomic reads to sequenced reference genomes demonstrated that 4% of the reads could be assigned to the sequenced species. Preliminary annotation identified genes belonging to all known functional categories. Interestingly, although 73% of the total assembled contig sequences were predicted to code for proteins, only 51% of them could be assigned a functional role. Furthermore, ~2.8% of the total predicted genes coded for proteins involved in resistance to antibiotics and toxic compounds, suggesting that the oral cavity is an important reservoir for antimicrobial resistance. © 2010 John Wiley & Sons A/S.

  5. Rescaled earthquake recurrence time statistics: application to microrepeaters

    NASA Astrophysics Data System (ADS)

    Goltz, Christian; Turcotte, Donald L.; Abaimov, Sergey G.; Nadeau, Robert M.; Uchida, Naoki; Matsuzawa, Toru

    2009-01-01

    Slip on major faults primarily occurs during 'characteristic' earthquakes. The recurrence statistics of characteristic earthquakes play an important role in seismic hazard assessment. A major problem in determining applicable statistics is the short sequences of characteristic earthquakes that are available worldwide. In this paper, we introduce a rescaling technique in which sequences can be superimposed to establish larger numbers of data points. We consider the Weibull and log-normal distributions; in both cases we rescale the data using means and standard deviations. We test our approach utilizing sequences of microrepeaters, micro-earthquakes which recur in the same location on a fault. It seems plausible to regard these earthquakes as a miniature version of the classic characteristic earthquakes. Microrepeaters are much more frequent than major earthquakes, leading to longer sequences for analysis. In this paper, we present results for the analysis of recurrence times for several microrepeater sequences from Parkfield, CA as well as NE Japan. We find that, once the respective sequence can be considered to be of sufficient stationarity, the statistics can be well fitted by either a Weibull or a log-normal distribution. We clearly demonstrate this fact by our technique of rescaled combination. We conclude that the recurrence statistics of the microrepeater sequences we consider are similar to the recurrence statistics of characteristic earthquakes on major faults.
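
    The rescaling idea in this abstract can be illustrated with a minimal sketch. The abstract says only that the data are rescaled "using means and standard deviations", so the plain standardization used here, (t - mean) / std per sequence, is an assumption, and the function names and recurrence times are hypothetical.

```python
from statistics import mean, stdev

def rescale(intervals):
    """Standardize one sequence of recurrence times to zero mean, unit std."""
    m = mean(intervals)
    s = stdev(intervals)  # sample standard deviation; needs >= 2 values
    return [(t - m) / s for t in intervals]

def pool_rescaled(sequences):
    """Superimpose several rescaled sequences into one larger sorted sample."""
    pooled = []
    for seq in sequences:
        pooled.extend(rescale(seq))
    return sorted(pooled)

# Hypothetical recurrence times (years) for two repeating-event sequences
# that differ in scale by a factor of ten.
seq_a = [1.0, 2.0, 3.0]
seq_b = [10.0, 20.0, 30.0]
pooled = pool_rescaled([seq_a, seq_b])
```

    After rescaling, the two sequences become directly comparable, so the pooled sample has six points instead of three. A distribution could then be fitted to the pooled values.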

  6. Cluster analysis of S. Cerevisiae nucleosome binding sites

    NASA Astrophysics Data System (ADS)

    Suvorova, Y.; Korotkov, E.

    2017-12-01

    It is well known that a major part of a eukaryotic genome is wrapped around histone proteins, forming nucleosomes. It was also demonstrated that the DNA sequence itself plays an important role in the nucleosome positioning process. In this work, a cluster analysis of 67,517 nucleosome binding sites from the S. cerevisiae genome was carried out. The classification method is based on a self-adjusting dinucleotide position weight matrix. As a result, 135 significant clusters were discovered that contain 43,225 sequences (64% of the initial set). The meaning of the classes found is discussed, as well as the possibility of their further use.

  7. Parallel human genome analysis: microarray-based expression monitoring of 1000 genes.

    PubMed Central

    Schena, M; Shalon, D; Heller, R; Chai, A; Brown, P O; Davis, R W

    1996-01-01

    Microarrays containing 1046 human cDNAs of unknown sequence were printed on glass with high-speed robotics. These 1.0-cm² DNA "chips" were used to quantitatively monitor differential expression of the cognate human genes using a highly sensitive two-color hybridization assay. Array elements that displayed differential expression patterns under given experimental conditions were characterized by sequencing. The identification of known and novel heat shock and phorbol ester-regulated genes in human T cells demonstrates the sensitivity of the assay. Parallel gene analysis with microarrays provides a rapid and efficient method for large-scale human gene discovery. PMID:8855227

  8. Novel splice mutation in microthalmia-associated transcription factor in Waardenburg Syndrome.

    PubMed

    Brenner, Laura; Burke, Kelly; Leduc, Charles A; Guha, Saurav; Guo, Jiancheng; Chung, Wendy K

    2011-01-01

    Waardenburg Syndrome (WS) is a syndromic form of hearing loss associated with mutations in six different genes. We identified a large family with WS that had previously undergone clinical testing, with no reported pathogenic mutation. Using linkage analysis, a region on 3p14.1 with an LOD score of 6.6 was identified. Microthalmia-Associated Transcription Factor, a gene known to cause WS, is located within this region of linkage. Sequencing of Microthalmia-Associated Transcription Factor demonstrated a c.1212 G>A synonymous variant that segregated with the WS in the family and was predicted to cause a novel splicing site that was confirmed with expression analysis of the mRNA. This case illustrates the need to computationally analyze novel synonymous sequence variants for possible effects on splicing to maximize the clinical sensitivity of sequence-based genetic testing.

  9. Cleavage sites within the poliovirus capsid protein precursors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Larsen, G.R.; Anderson, C.W.; Dorner, A.J.

    1982-01-01

    Partial amino-terminal sequence analysis was performed on radiolabeled poliovirus capsid proteins VP1, VP2, and VP3. A computer-assisted comparison of the amino acid sequences obtained with that predicted by the nucleotide sequence of the poliovirus genome allows assignment of the amino terminus of each capsid protein to a unique position within the virus polyprotein. Sequence analysis of trypsin-digested VP4, which has a blocked amino terminus, demonstrates that VP4 is encoded at or very near to the amino terminus of the polyprotein. The gene order of the capsid proteins is VP4-VP2-VP3-VP1. Cleavage of VP0 to VP4 and VP2 is shown to occur between asparagine and serine, whereas the cleavages that separate VP2/VP3 and VP3/VP1 occur between glutamine and glycine residues. This finding supports the hypothesis that the cleavage of VP0, which occurs during virion morphogenesis, is distinct from the cleavages that separate functional regions of the polyprotein.

  10. Added Value of Next-Generation Sequencing for Multilocus Sequence Typing Analysis of a Pneumocystis jirovecii Pneumonia Outbreak.

    PubMed

    Charpentier, Elena; Garnaud, Cécile; Wintenberger, Claire; Bailly, Sébastien; Murat, Jean-Benjamin; Rendu, John; Pavese, Patricia; Drouet, Thibault; Augier, Caroline; Malvezzi, Paolo; Thiébaut-Bertrand, Anne; Mallaret, Marie-Reine; Epaulard, Olivier; Cornet, Muriel; Larrat, Sylvie; Maubon, Danièle

    2017-08-01

    Pneumocystis jirovecii is a major threat for immunocompromised patients, and clusters of pneumocystis pneumonia (PCP) have been increasingly described in transplant units during the past decade. Exploring an outbreak transmission network requires complementary spatiotemporal and strain-typing approaches. We analyzed a PCP outbreak and demonstrated the added value of next-generation sequencing (NGS) for the multilocus sequence typing (MLST) study of P. jirovecii strains. Thirty-two PCP patients were included. Among the 12 solid organ transplant patients, 5 shared a major and unique genotype that was also found as a minor strain in a sixth patient. A transmission map analysis strengthened the suspicion of nosocomial acquisition of this strain for the 6 patients. NGS-MLST enables accurate determination of subpopulation, which allowed excluding other patients from the transmission network. NGS-MLST genotyping approach was essential to deciphering this outbreak. This innovative approach brings new insights for future epidemiologic studies on this uncultivable opportunistic fungus.

  11. Added Value of Next-Generation Sequencing for Multilocus Sequence Typing Analysis of a Pneumocystis jirovecii Pneumonia Outbreak

    PubMed Central

    Charpentier, Elena; Garnaud, Cécile; Wintenberger, Claire; Bailly, Sébastien; Murat, Jean-Benjamin; Rendu, John; Pavese, Patricia; Drouet, Thibault; Augier, Caroline; Malvezzi, Paolo; Thiébaut-Bertrand, Anne; Mallaret, Marie-Reine; Epaulard, Olivier; Cornet, Muriel; Larrat, Sylvie

    2017-01-01

    Pneumocystis jirovecii is a major threat for immunocompromised patients, and clusters of pneumocystis pneumonia (PCP) have been increasingly described in transplant units during the past decade. Exploring an outbreak transmission network requires complementary spatiotemporal and strain-typing approaches. We analyzed a PCP outbreak and demonstrated the added value of next-generation sequencing (NGS) for the multilocus sequence typing (MLST) study of P. jirovecii strains. Thirty-two PCP patients were included. Among the 12 solid organ transplant patients, 5 shared a major and unique genotype that was also found as a minor strain in a sixth patient. A transmission map analysis strengthened the suspicion of nosocomial acquisition of this strain for the 6 patients. NGS-MLST enables accurate determination of subpopulation, which allowed excluding other patients from the transmission network. NGS-MLST genotyping approach was essential to deciphering this outbreak. This innovative approach brings new insights for future epidemiologic studies on this uncultivable opportunistic fungus. PMID:28726611

  12. Lactobacillus fabifermentans sp. nov. and Lactobacillus cacaonum sp. nov., isolated from Ghanaian cocoa fermentations.

    PubMed

    De Bruyne, Katrien; Camu, Nicholas; De Vuyst, Luc; Vandamme, Peter

    2009-01-01

    Two Gram-positive bacterial strains, LMG 24284T and LMG 24285T, were isolated from different spontaneous cocoa bean heap fermentations in Ghana. Analysis of their 16S rRNA gene sequences indicated that they were members of the Lactobacillus plantarum and Lactobacillus salivarius species groups, respectively. DNA-DNA hybridization experiments with their nearest phylogenetic neighbours demonstrated that both strains represented novel species that could be differentiated from their nearest neighbours by pheS sequence analysis, whole-cell protein electrophoresis, fluorescent amplified fragment length polymorphism analysis and biochemical characterization. Therefore, two novel Lactobacillus species are proposed, Lactobacillus fabifermentans sp. nov. (type strain LMG 24284T =DSM 21115T) and Lactobacillus cacaonum sp. nov. (type strain LMG 24285T =DSM 21116T).

  13. 16S rRNA Gene Sequencing, Multilocus Sequence Analysis, and Mass Spectrometry Identification of the Proposed New Species “Clostridium neonatale”

    PubMed Central

    Bouvet, Philippe; Ferraris, Laurent; Dauphin, Brunhilde; Popoff, Michel-Robert; Butel, Marie Jose

    2014-01-01

    In 2002, an outbreak of necrotizing enterocolitis in a Canadian neonatal intensive care unit was associated with a proposed novel species of Clostridium, “Clostridium neonatale.” To date, there are no data about the isolation, identification, or clinical significance of this species. Additionally, C. neonatale has not been formally classified as a new species, rendering its identification challenging. Indeed, the C. neonatale 16S rRNA gene sequence shows high similarity to another Clostridium species involved in neonatal necrotizing enterocolitis, Clostridium butyricum. By performing a polyphasic study combining phylogenetic analysis (16S rRNA gene sequencing and multilocus sequence analysis) and phenotypic characterization with mass spectrometry, we demonstrated that C. neonatale is a new species within the Clostridium genus sensu stricto, for which we propose the name Clostridium neonatale sp. nov. Now that the status of C. neonatale has been clarified, matrix-assisted laser desorption ionization–time of flight mass spectrometry (MALDI-TOF MS) can be used for better differential identification of C. neonatale and C. butyricum clinical isolates. This is necessary to precisely define the role and clinical significance of C. neonatale, a species that may have been misidentified and underrepresented during previous neonatal necrotizing enterocolitis studies. PMID:25232167

  14. Streptococcus pneumoniae PstS production is phosphate responsive and enhanced during growth in the murine peritoneal cavity

    NASA Technical Reports Server (NTRS)

    Orihuela, C. J.; Mills, J.; Robb, C. W.; Wilson, C. J.; Watson, D. A.; Niesel, D. W.

    2001-01-01

    Differential display-PCR (DDPCR) was used to identify a Streptococcus pneumoniae gene with enhanced transcription during growth in the murine peritoneal cavity. Northern dot blot analysis and comparative densitometry confirmed a 1.8-fold increase in expression of the encoded sequence following murine peritoneal culture (MPC) versus laboratory culture or control culture (CC). Sequencing and basic local alignment search tool analysis identified the DDPCR fragment as pstS, the phosphate-binding protein of a high-affinity phosphate uptake system. PCR amplification of the complete pstS gene followed by restriction analysis and sequencing suggests a high level of conservation between strains and serotypes. Quantitative immunodot blotting using antiserum to recombinant PstS (rPstS) demonstrated an approximately twofold increase in PstS production during MPC from that during CCs, a finding consistent with the low levels of phosphate observed in the peritoneum. Moreover, immunodot blot and Northern analysis demonstrated phosphate-dependent production of PstS in six of seven strains examined. These results identify pstS expression as responsive to the MPC environment and extracellular phosphate concentrations. Presently, it remains unclear if phosphate concentrations in vivo contribute to the regulation of pstS. Finally, polyclonal antiserum to rPstS did not inhibit growth of the pneumococcus in vitro, suggesting that antibodies do not block phosphate uptake; moreover, vaccination of mice with rPstS did not protect against intraperitoneal challenge as assessed by the 50% lethal dose.

  15. Stimulus sequence context differentially modulates inhibition-related theta and delta band activity in a go/nogo task

    PubMed Central

    Harper, Jeremy; Malone, Stephen M.; Bachman, Matthew D.; Bernat, Edward M.

    2015-01-01

    Recent work suggests that dissociable activity in theta and delta frequency bands underlies several common event-related potential (ERP) components, including the nogo N2/P3 complex, which can better index separable functional processes than traditional time-domain measures. Reports have also demonstrated that neural activity can be affected by stimulus sequence context information (i.e., the number and type of preceding stimuli). Stemming from prior work demonstrating that theta and delta index separable processes during response inhibition, the current study assessed sequence context in a Go/Nogo paradigm in which the number of go stimuli preceding each nogo was selectively manipulated. Principal component analysis (PCA) of time-frequency representations revealed differential modulation of evoked theta and delta related to sequence context, where delta increased robustly with additional preceding go stimuli, while theta did not. Findings are consistent with the view that theta indexes simpler initial salience-related processes, while delta indexes more varied and complex processes related to a variety of task parameters. PMID:26751830

  16. Automated Antibody De Novo Sequencing and Its Utility in Biopharmaceutical Discovery

    NASA Astrophysics Data System (ADS)

    Sen, K. Ilker; Tang, Wilfred H.; Nayak, Shruti; Kil, Yong J.; Bern, Marshall; Ozoglu, Berk; Ueberheide, Beatrix; Davis, Darryl; Becker, Christopher

    2017-05-01

    Applications of antibody de novo sequencing in the biopharmaceutical industry range from the discovery of new antibody drug candidates to identifying reagents for research and determining the primary structure of innovator products for biosimilar development. When murine, phage display, or patient-derived monoclonal antibodies against a target of interest are available, but the cDNA or the original cell line is not, de novo protein sequencing is required to humanize and recombinantly express these antibodies, followed by in vitro and in vivo testing for functional validation. Availability of fully automated software tools for monoclonal antibody de novo sequencing enables efficient and routine analysis. Here, we present a novel method to automatically de novo sequence antibodies using mass spectrometry and the Supernovo software. The robustness of the algorithm is demonstrated through a series of stress tests.

  17. Computational sequence analysis of predicted long dsRNA transcriptomes of major crops reveals sequence complementarity with human genes.

    PubMed

    Jensen, Peter D; Zhang, Yuanji; Wiggins, B Elizabeth; Petrick, Jay S; Zhu, Jin; Kerstetter, Randall A; Heck, Gregory R; Ivashuta, Sergey I

    2013-01-01

    Long double-stranded RNAs (long dsRNAs) are precursors for the effector molecules of sequence-specific RNA-based gene silencing in eukaryotes. Plant cells can contain numerous endogenous long dsRNAs. This study demonstrates that such endogenous long dsRNAs in plants have sequence complementarity to human genes. Many of these complementary long dsRNAs have perfect sequence complementarity of at least 21 nucleotides to human genes; enough complementarity to potentially trigger gene silencing in targeted human cells if delivered in functional form. However, the number and diversity of long dsRNA molecules in plant tissue from crops such as lettuce, tomato, corn, soy and rice with complementarity to human genes that have a long history of safe consumption supports a conclusion that long dsRNAs do not present a significant dietary risk.

  18. Phylogenetic characterization of a biogas plant microbial community integrating clone library 16S-rDNA sequences and metagenome sequence data obtained by 454-pyrosequencing.

    PubMed

    Kröber, Magdalena; Bekel, Thomas; Diaz, Naryttza N; Goesmann, Alexander; Jaenicke, Sebastian; Krause, Lutz; Miller, Dimitri; Runte, Kai J; Viehöver, Prisca; Pühler, Alfred; Schlüter, Andreas

    2009-06-01

    The phylogenetic structure of the microbial community residing in a fermentation sample from a production-scale biogas plant fed with maize silage, green rye and liquid manure was analysed by an integrated approach using clone library sequences and metagenome sequence data obtained by 454-pyrosequencing. Sequencing of 109 clones from a bacterial and an archaeal 16S-rDNA amplicon library revealed that the obtained nucleotide sequences are similar but not identical to 16S-rDNA database sequences derived from different anaerobic environments including digestors and bioreactors. Most of the bacterial 16S-rDNA sequences could be assigned to the phylum Firmicutes with the most abundant class Clostridia and to the class Bacteroidetes, whereas most archaeal 16S-rDNA sequences cluster close to the methanogen Methanoculleus bourgensis. Further sequences of the archaeal library most probably represent so far non-characterised species within the genus Methanoculleus. A similar result derived from phylogenetic analysis of mcrA clone sequences. The mcrA gene product encodes the alpha-subunit of methyl-coenzyme-M reductase involved in the final step of methanogenesis. BLASTn analysis applying stringent settings resulted in assignment of 16S-rDNA metagenome sequence reads to 62 16S-rDNA amplicon sequences thus enabling frequency of abundance estimations for 16S-rDNA clone library sequences. Ribosomal Database Project (RDP) Classifier processing of metagenome 16S-rDNA reads revealed abundance of the phyla Firmicutes, Bacteroidetes and Euryarchaeota and the orders Clostridiales, Bacteroidales and Methanomicrobiales. Moreover, a large fraction of 16S-rDNA metagenome reads could not be assigned to lower taxonomic ranks, demonstrating that numerous microorganisms in the analysed fermentation sample of the biogas plant are still unclassified or unknown.

  19. Transmission clustering among newly diagnosed HIV patients in Chicago, 2008 to 2011: using phylogenetics to expand knowledge of regional HIV transmission patterns

    PubMed Central

    Lubelchek, Ronald J.; Hoehnen, Sarah C.; Hotton, Anna L.; Kincaid, Stacey L.; Barker, David E.; French, Audrey L.

    2014-01-01

    Introduction HIV transmission cluster analyses can inform HIV prevention efforts. We describe the first such assessment for transmission clustering among HIV patients in Chicago. Methods We performed transmission cluster analyses using HIV pol sequences from newly diagnosed patients presenting to Chicago’s largest HIV clinic between 2008 and 2011. We compared sequences via progressive pairwise alignment, using neighbor joining to construct an un-rooted phylogenetic tree. We defined clusters as >2 sequences among which each sequence had at least one partner within a genetic distance of ≤ 1.5%. We used multivariable regression to examine factors associated with clustering and used geospatial analysis to assess geographic proximity of phylogenetically clustered patients. Results We compared sequences from 920 patients; median age 35 years; 75% male; 67% Black, 23% Hispanic; 8% had a Rapid Plasma Reagin (RPR) titer ≥ 1:16 concurrent with their HIV diagnosis. We had HIV transmission risk data for 54%; 43% identified as men who have sex with men (MSM). Phylogenetic analysis demonstrated 123 patients (13%) grouped into 26 clusters, the largest having 20 members. In multivariable regression, age < 25, Black race, MSM status, male gender, higher HIV viral load, and RPR ≥ 1:16 associated with clustering. We did not observe geographic grouping of genetically clustered patients. Discussion Our results demonstrate high rates of HIV transmission clustering, without local geographic foci, among young Black MSM in Chicago. Applied prospectively, phylogenetic analyses could guide prevention efforts and help break the cycle of transmission. PMID:25321182
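
    The cluster definition in this abstract (more than 2 sequences, each with at least one partner within a genetic distance of ≤ 1.5%) can be sketched as connected components of a thresholded distance graph. Reading the definition as connected components is an assumption, as are the function name, the patient IDs, and the distances; this is not the authors' pipeline.

```python
def find_clusters(distances, threshold=0.015, min_size=3):
    """Group sequences linked by pairwise distances at or below the threshold.

    distances: dict mapping (id_a, id_b) -> pairwise genetic distance.
    Returns sorted member lists for components with at least min_size members.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        ra, rb = find(a), find(b)  # registers both sequences
        if d <= threshold and ra != rb:
            parent[ra] = rb  # merge the two components

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_size)

# Hypothetical pairwise distances (fraction of differing sites).
distances = {
    ("p1", "p2"): 0.010,
    ("p2", "p3"): 0.012,
    ("p1", "p3"): 0.030,
    ("p3", "p4"): 0.050,
    ("p4", "p5"): 0.005,
}
clusters = find_clusters(distances)
```

    Here p1, p2 and p3 form one cluster even though p1 and p3 are more than 1.5% apart, because each has a partner within the threshold. The pair p4 and p5 is linked but dropped, since a cluster needs more than 2 sequences.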

  20. Fragment assignment in the cloud with eXpress-D

    PubMed Central

    2013-01-01

    Background Probabilistic assignment of ambiguously mapped fragments produced by high-throughput sequencing experiments has been demonstrated to greatly improve accuracy in the analysis of RNA-Seq and ChIP-Seq, and is an essential step in many other sequence census experiments. A maximum likelihood method using the expectation-maximization (EM) algorithm for optimization is commonly used to solve this problem. However, batch EM-based approaches do not scale well with the size of sequencing datasets, which have been increasing dramatically over the past few years. Thus, current approaches to fragment assignment rely on heuristics or approximations for tractability. Results We present an implementation of a distributed EM solution to the fragment assignment problem using Spark, a data analytics framework that can scale by leveraging compute clusters within datacenters–“the cloud”. We demonstrate that our implementation easily scales to billions of sequenced fragments, while providing the exact maximum likelihood assignment of ambiguous fragments. The accuracy of the method is shown to be an improvement over the most widely used tools available and can be run in a constant amount of time when cluster resources are scaled linearly with the amount of input data. Conclusions The cloud offers one solution for the difficulties faced in the analysis of massive high-throughput sequencing data, which continue to grow rapidly. Researchers in bioinformatics must follow developments in distributed systems–such as new frameworks like Spark–for ways to port existing methods to the cloud and help them scale to the datasets of the future. Our software, eXpress-D, is freely available at: http://github.com/adarob/express-d. PMID:24314033
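
    The batch EM approach that this abstract builds on can be illustrated with a small single-machine sketch. It is not the eXpress-D implementation and does not use Spark. The simplified model (abundances only, with no target-length or bias terms), the function name, and the toy data are all assumptions made for illustration.

```python
def em_assign(compat, n_targets, iterations=100):
    """Estimate relative target abundances from ambiguously mapped fragments.

    compat: list where compat[f] holds the target indices fragment f maps to.
    Returns a list of abundance estimates that sums to 1.
    """
    theta = [1.0 / n_targets] * n_targets  # uniform starting point
    for _ in range(iterations):
        counts = [0.0] * n_targets
        # E-step: split each fragment across its compatible targets
        # in proportion to the current abundance estimates.
        for targets in compat:
            total = sum(theta[t] for t in targets)
            for t in targets:
                counts[t] += theta[t] / total
        # M-step: renormalize expected counts into abundances.
        theta = [c / len(compat) for c in counts]
    return theta

# Hypothetical data: three fragments map only to target 0, one only to
# target 1, and one maps ambiguously to both.
compat = [[0], [0], [0], [1], [0, 1]]
theta = em_assign(compat, n_targets=2)
```

    In this toy case the ambiguous fragment ends up split 3:1 in favour of target 0, giving abundances close to 0.75 and 0.25. Each iteration touches every fragment, which is the batch cost that the abstract says does not scale to very large datasets.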

  1. Complete Genomic Sequence and Comparative Analysis of the Genome Segments of Sweet Potato Chlorotic Stunt Virus in China

    PubMed Central

    Qin, Yanhong; Wang, Li; Zhang, Zhenchen; Qiao, Qi; Zhang, Desheng; Tian, Yuting; Wang, Shuang; Wang, Yongjiang; Yan, Zhaoling

    2014-01-01

    Background Sweet potato chlorotic stunt virus (family Closteroviridae, genus Crinivirus) features a large bipartite, single-stranded, positive-sense RNA genome. To date, only three complete genomic sequences of SPCSV can be accessed through GenBank. SPCSV was first detected in China in 2011, and only partial genomic sequences have been determined in the country. No report on the complete genomic sequence and genome structure of Chinese SPCSV isolates or the genetic relation between isolates from China and other countries is available. Methodology/Principal Findings The complete genomic sequences of five isolates from different areas in China were characterized. This study is the first to report the complete genome sequences of SPCSV from whitefly vectors. Genome structure analysis showed that isolates of WA and EA strains from China have the same coding protein as isolates Can181-9 and m2-47, respectively. Twenty cp genes and four RNA1 partial segments were sequenced and analyzed, and the nucleotide identities of complete genomic, cp, and RNA1 partial sequences were determined. Results indicated high conservation among strains and significant differences between WA and EA strains. Genetic analysis demonstrated that, except for isolates from Guangdong Province, SPCSVs from other areas belong to the WA strain. Genome organization analysis showed that the isolates in this study lack the p22 gene. Conclusions/Significance We presented the complete genome sequences of SPCSV in China. Comparison of nucleotide identities and genome structures between these isolates and previously reported isolates showed slight differences. The nucleotide identities of different SPCSV isolates showed high conservation among strains and significant differences between strains. All nine isolates in this study lacked the p22 gene. WA strains were more extensively distributed than EA strains in China. These data provide important insights into the molecular variation and genomic structure of SPCSV in China as well as genetic relationships among isolates from China and other countries. PMID:25170926

  2. WISARD: workbench for integrated superfast association studies for related datasets.

    PubMed

    Lee, Sungyoung; Choi, Sungkyoung; Qiao, Dandi; Cho, Michael; Silverman, Edwin K; Park, Taesung; Won, Sungho

    2018-04-20

    Mendelian transmission produces phenotypic and genetic relatedness between family members, giving family-based analytical methods an important role in genetic epidemiological studies, from heritability estimations to genetic association analyses. With the advance in genotyping technologies, whole-genome sequence data can be utilized for genetic epidemiological studies, and family-based samples may become more useful for detecting de novo mutations. However, genetic analyses employing family-based samples usually suffer from the complexity of the computational/statistical algorithms, and certain types of family designs, such as incorporating data from extended families, have rarely been used. We present a Workbench for Integrated Superfast Association studies for Related Data (WISARD) programmed in C/C++. WISARD enables fast and comprehensive analysis of SNP-chip and next-generation sequencing data on extended families, with applications from designing genetic studies to summarizing analysis results. In addition, WISARD can automatically be run in a fully multithreaded manner, and the integration of R software for visualization makes it more accessible to non-experts. Comparison with existing toolsets showed that WISARD is computationally suitable for integrated analysis of related subjects, and demonstrated that WISARD outperforms existing toolsets. WISARD has also been successfully utilized to analyze the large-scale massive sequencing dataset of chronic obstructive pulmonary disease (COPD), and we identified multiple genes associated with COPD, which demonstrates its practical value.

  3. PROFESS: a PROtein Function, Evolution, Structure and Sequence database

    PubMed Central

    Triplet, Thomas; Shortridge, Matthew D.; Griep, Mark A.; Stark, Jaime L.; Powers, Robert; Revesz, Peter

    2010-01-01

    The proliferation of biological databases and the easy access enabled by the Internet is having a beneficial impact on biological sciences and transforming the way research is conducted. There are ~1100 molecular biology databases dispersed throughout the Internet. To assist in the functional, structural and evolutionary analysis of the abundant number of novel proteins continually identified from whole-genome sequencing, we introduce the PROFESS (PROtein Function, Evolution, Structure and Sequence) database. Our database is designed to be versatile and expandable and will not confine analysis to a pre-existing set of data relationships. A fundamental component of this approach is the development of an intuitive query system that incorporates a variety of similarity functions capable of generating data relationships not conceived during the creation of the database. The utility of PROFESS is demonstrated by the analysis of the structural drift of homologous proteins and the identification of potential pancreatic cancer therapeutic targets based on the observation of protein–protein interaction networks. Database URL: http://cse.unl.edu/~profess/ PMID:20624718

  4. Intact and Top-Down Characterization of Biomolecules and Direct Analysis Using Infrared Matrix-Assisted Laser Desorption Electrospray Ionization Coupled to FT-ICR Mass Spectrometry

    PubMed Central

    Sampson, Jason S.; Murray, Kermit K.; Muddiman, David C.

    2013-01-01

    We report the implementation of an infrared laser onto our previously reported matrix-assisted laser desorption electrospray ionization (MALDESI) source with ESI post-ionization yielding multiply charged peptides and proteins. Infrared (IR)-MALDESI is demonstrated for atmospheric pressure desorption and ionization of biological molecules ranging in molecular weight from 1.2 to 17 kDa. High resolving power, high mass accuracy single-acquisition Fourier transform ion cyclotron resonance (FT-ICR) mass spectra were generated from liquid- and solid-state peptide and protein samples by desorption with an infrared laser (2.94 µm) followed by ESI post-ionization. Intact and top-down analysis of equine myoglobin (17 kDa) desorbed from the solid state with ESI post-ionization demonstrates the sequencing capabilities using IR-MALDESI coupled to FT-ICR mass spectrometry. Carbohydrates and lipids were detected through direct analysis of milk and egg yolk using both UV- and IR-MALDESI with minimal sample preparation. Three of the four classes of biological macromolecules (proteins, carbohydrates, and lipids) have been ionized and detected using MALDESI with minimal sample preparation. Sequencing of O-linked glycans, cleaved from mucin using reductive β-elimination chemistry, is also demonstrated. PMID:19185512

  5. Sample sequencing of vascular plants demonstrates widespread conservation and divergence of microRNAs.

    PubMed

    Chávez Montes, Ricardo A; de Fátima Rosas-Cárdenas, Flor; De Paoli, Emanuele; Accerbi, Monica; Rymarquis, Linda A; Mahalingam, Gayathri; Marsch-Martínez, Nayelli; Meyers, Blake C; Green, Pamela J; de Folter, Stefan

    2014-04-23

    Small RNAs are pivotal regulators of gene expression that guide transcriptional and post-transcriptional silencing mechanisms in eukaryotes, including plants. Here we report a comprehensive atlas of sRNA and miRNA from 3 species of algae and 31 representative species across vascular plants, including non-model plants. We sequence and quantify sRNAs from 99 different tissues or treatments across species, resulting in a data set of over 132 million distinct sequences. Using miRBase mature sequences as a reference, we identify the miRNA sequences present in these libraries. We apply diverse profiling methods to examine critical sRNA and miRNA features, such as size distribution, tissue-specific regulation and sequence conservation between species, as well as to predict putative new miRNA sequences. We also develop database resources, computational analysis tools and a dedicated website, http://smallrna.udel.edu/. This study provides new insights on plant sRNAs and miRNAs, and a foundation for future studies.

  6. Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex.

    PubMed

    Pollen, Alex A; Nowakowski, Tomasz J; Shuga, Joe; Wang, Xiaohui; Leyrat, Anne A; Lui, Jan H; Li, Nianzhen; Szpankowski, Lukasz; Fowler, Brian; Chen, Peilin; Ramalingam, Naveen; Sun, Gang; Thu, Myo; Norris, Michael; Lebofsky, Ronald; Toppani, Dominique; Kemp, Darnell W; Wong, Michael; Clerkson, Barry; Jones, Brittnee N; Wu, Shiquan; Knutsson, Lawrence; Alvarado, Beatriz; Wang, Jing; Weaver, Lesley S; May, Andrew P; Jones, Robert C; Unger, Marc A; Kriegstein, Arnold R; West, Jay A A

    2014-10-01

    Large-scale surveys of single-cell gene expression have the potential to reveal rare cell populations and lineage relationships but require efficient methods for cell capture and mRNA sequencing. Although cellular barcoding strategies allow parallel sequencing of single cells at ultra-low depths, the limitations of shallow sequencing have not been investigated directly. By capturing 301 single cells from 11 populations using microfluidics and analyzing single-cell transcriptomes across downsampled sequencing depths, we demonstrate that shallow single-cell mRNA sequencing (~50,000 reads per cell) is sufficient for unbiased cell-type classification and biomarker identification. In the developing cortex, we identify diverse cell types, including multiple progenitor and neuronal subtypes, and we identify EGR1 and FOS as previously unreported candidate targets of Notch signaling in human but not mouse radial glia. Our strategy establishes an efficient method for unbiased analysis and comparison of cell populations from heterogeneous tissue by microfluidic single-cell capture and low-coverage sequencing of many cells.
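
    The downsampling step described in this abstract can be illustrated with a minimal sketch. It assumes each cell is represented as a mapping from gene name to read count and draws reads without replacement; the function name, the example genes, and the counts are hypothetical and are not taken from the study.

```python
import random
from collections import Counter


def downsample_counts(counts, target_depth, seed=0):
    """Randomly keep `target_depth` reads from a gene -> read-count mapping."""
    total = sum(counts.values())
    if target_depth >= total:
        # Nothing to remove: the cell is already at or below the target depth.
        return dict(counts)
    rng = random.Random(seed)
    # Expand to one label per read, then sample without replacement.
    reads = [gene for gene, n in counts.items() for _ in range(n)]
    kept = Counter(rng.sample(reads, target_depth))
    return {gene: kept.get(gene, 0) for gene in counts}


# Made-up counts for one cell, reduced to 40 reads.
cell = {"EGR1": 30, "FOS": 20, "ACTB": 50}
shallow = downsample_counts(cell, 40, seed=1)
```

    Repeating this at several target depths and re-running the classification at each depth is one plausible way to compare shallow and deep sequencing; the authors' actual procedure may differ.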

  7. Detecting cooperative sequences in the binding of RNA Polymerase-II

    NASA Astrophysics Data System (ADS)

    Glass, Kimberly; Rozenberg, Julian; Girvan, Michelle; Losert, Wolfgang; Ott, Ed; Vinson, Charles

    2008-03-01

    Regulation of the expression level of genes is a key biological process controlled largely by the 1000 base pair (bp) sequence preceding each gene (the promoter region). Within that region transcription factor binding sites (TFBS), 5-10 bp long sequences, act individually or cooperate in the recruitment of, and therefore subsequent gene transcription by, RNA Polymerase-II (RNAP). We have measured the binding of RNAP to promoters on a genome-wide basis using Chromatin Immunoprecipitation (ChIP-on-Chip) microarray assays. Using all 8-base pair long sequences as a test set, we have identified the DNA sequences that are enriched in promoters with high RNAP binding values. We are able to demonstrate that virtually all sequences enriched in such promoters contain a CpG dinucleotide, indicating that TFBS that contain the CpG dinucleotide are involved in RNAP binding to promoters. Further analysis shows that pairs of CpG-containing sequences cooperate to enhance the binding of RNAP to the promoter.
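
    The 8-mer enrichment idea in this abstract can be sketched as follows. The sketch assumes promoters have already been split into a high-binding and a low-binding set, and it scores each k-mer by the ratio of the fractions of promoters containing it. The function names, the pseudocount, and the ratio threshold are illustrative assumptions, not the authors' statistics.

```python
from collections import Counter


def kmer_presence(seq, k=8):
    """Set of distinct k-mers occurring in one sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def enriched_kmers(high, low, k=8, min_ratio=2.0, pseudo=1.0):
    """k-mers present in a larger fraction of `high` promoters than `low` ones."""
    hi = Counter(km for s in high for km in kmer_presence(s, k))
    lo = Counter(km for s in low for km in kmer_presence(s, k))
    out = {}
    for km, n in hi.items():
        # Pseudocounts avoid division by zero for k-mers absent from `low`.
        frac_hi = (n + pseudo) / (len(high) + pseudo)
        frac_lo = (lo[km] + pseudo) / (len(low) + pseudo)
        if frac_hi / frac_lo >= min_ratio:
            out[km] = frac_hi / frac_lo
    return out


def has_cpg(kmer):
    """True if the k-mer contains a CpG dinucleotide (C followed by G)."""
    return "CG" in kmer
```

    Applying `has_cpg` to the keys returned by `enriched_kmers` gives the share of enriched k-mers that contain a CpG, which is the kind of summary the abstract reports.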

  8. Targeted DNA sequencing and in situ mutation analysis using mobile phone microscopy

    PubMed Central

    Kühnemund, Malte; Wei, Qingshan; Darai, Evangelia; Wang, Yingjie; Hernández-Neuta, Iván; Yang, Zhao; Tseng, Derek; Ahlford, Annika; Mathot, Lucy; Sjöblom, Tobias; Ozcan, Aydogan; Nilsson, Mats

    2017-01-01

    Molecular diagnostics is typically outsourced to well-equipped centralized laboratories, often far from the patient. We developed molecular assays and portable optical imaging designs that permit on-site diagnostics with a cost-effective mobile-phone-based multimodal microscope. We demonstrate that targeted next-generation DNA sequencing reactions and in situ point mutation detection assays in preserved tumour samples can be imaged and analysed using mobile phone microscopy, achieving a new milestone for tele-medicine technologies. PMID:28094784

  9. An outbreak of dengue virus (DENV) type 2 Cosmopolitan genotype in Israeli travellers returning from the Seychelles, April 2017.

    PubMed

    Lustig, Yaniv; Wolf, Dana; Halutz, Ora; Schwartz, Eli

    2017-06-29

    Dengue virus infection was diagnosed in six Israeli travellers returning from the Seychelles in April 2017. Phylogenetic analysis identified identical sequences belonging to the Cosmopolitan genotype of dengue virus type 2 in all samples sequenced, thus providing evidence for a probable dengue type 2 outbreak in the Seychelles. This report further demonstrates the role of travellers as sentinels for arboviral infections, especially in countries with limited diagnostic capabilities. This article is copyright of The Authors, 2017.

  10. Extensive characterization of Tupaia belangeri neuropeptidome using an integrated mass spectrometric approach.

    PubMed

    Petruzziello, Filomena; Fouillen, Laetitia; Wadensten, Henrik; Kretz, Robert; Andren, Per E; Rainer, Gregor; Zhang, Xiaozhe

    2012-02-03

    Neuropeptidomics is used to characterize endogenous peptides in the brain of tree shrews (Tupaia belangeri). Tree shrews are small animals similar to rodents in size but close relatives of primates, and are excellent models for brain research. Currently, tree shrews have no complete proteome information available on which direct database search can be allowed for neuropeptide identification. To increase the capability in the identification of neuropeptides in tree shrews, we developed an integrated mass spectrometry (MS)-based approach that combines methods including data-dependent, directed, and targeted liquid chromatography (LC)-Fourier transform (FT)-tandem MS (MS/MS) analysis, database construction, de novo sequencing, precursor protein search, and homology analysis. Using this integrated approach, we identified 107 endogenous peptides that have sequences identical or similar to those from other mammalian species. High accuracy MS and tandem MS information, with BLAST analysis and chromatographic characteristics were used to confirm the sequences of all the identified peptides. Interestingly, further sequence homology analysis demonstrated that tree shrew peptides have a significantly higher degree of homology to equivalent sequences in humans than those in mice or rats, consistent with the close phylogenetic relationship between tree shrews and primates. Our results provide the first extensive characterization of the peptidome in tree shrews, which now permits characterization of their function in nervous and endocrine system. As the approach developed fully used the conservative properties of neuropeptides in evolution and the advantage of high accuracy MS, it can be portable for identification of neuropeptides in other species for which the fully sequenced genomes or proteomes are not available.

  11. Genomic analysis of coxsackieviruses A1, A19, A22, enteroviruses 113 and 104: viruses representing two clades with distinct tropism within enterovirus C

    PubMed Central

    Haq, Saddef; Sameroff, Stephen; Howie, Stephen R. C.; Lipkin, W. Ian

    2013-01-01

    Coxsackieviruses (CV) A1, CV-A19 and CV-A22 have historically comprised a distinct phylogenetic clade within Enterovirus (EV) C. Several novel serotypes that are genetically similar to these three viruses have been recently discovered and characterized. Here, we report the coding sequence analysis of two genotypes of a previously uncharacterized serotype EV-C113 from Bangladesh and demonstrate that it is most similar to CV-A22 and EV-C116 within the capsid region. We sequenced novel genotypes of CV-A1, CV-A19 and CV-A22 from Bangladesh and observed a high rate of recombination within this group. We also report genomic analysis of the rarely reported EV-C104 circulating in the Gambia in 2009. All available EV-C104 sequences displayed a high degree of similarity within the structural genes but formed two clusters within the non-structural genes. One cluster included the recently reported EV-C117, suggesting an ancestral recombination between these two serotypes. Phylogenetic analysis of all available complete genome sequences indicated the existence of two subgroups within this distinct Enterovirus C clade: one has been exclusively recovered from gastrointestinal samples, while the other cluster has been implicated in respiratory disease. PMID:23761409

  12. Achieving an empathic stance: dialogical sequence analysis of a change episode.

    PubMed

    Tikkanen, Soile; Stiles, William B; Leiman, Mikael

    2013-01-01

    This study examined a client's therapeutic progress within one session of an 18-session child neurological assessment. The analysis focused on a parent-psychologist dialogue in one session of the assessment process. Dialogical sequence analysis (DSA; Leiman, 2004, 2012) was used as a micro-analytic method to examine the developing discourse. The analysis traced the mother's developing of a reflective stance toward herself and her problematic ways of interacting with her daughter, who was the client. During the dialogue, the mother began to recognize her own contribution in maintaining the problematic pattern. Her gradual acknowledgment of the child's perspective and her growing sense of the child's otherness were mediated by an observer position (third-person view) toward the problematic pattern, which allowed a flexible exchange between the perspectives of self and the other. The results demonstrate the parallel development of intrapersonal and interpersonal empathy shown previously to characterize the transition from stage 3 (problem statement/clarification) to stage 4 (understanding/insight) in the assimilation of problematic experiences sequence (Brinegar, Salvi, Stiles, & Greenberg, 2006).

  13. Repetitive sequence analysis and karyotyping reveals centromere-associated DNA sequences in radish (Raphanus sativus L.).

    PubMed

    He, Qunyan; Cai, Zexi; Hu, Tianhua; Liu, Huijun; Bao, Chonglai; Mao, Weihai; Jin, Weiwei

    2015-04-18

    Radish (Raphanus sativus L., 2n = 2x = 18) is a major root vegetable crop especially in eastern Asia. Radish root contains various nutrients that play an important role in strengthening immunity. Repetitive elements are primary components of the genomic sequence and the most important factors in genome size variations in higher eukaryotes. To date, studies about repetitive elements of radish are still limited. To better understand genome structure of radish, we undertook a study to evaluate the proportion of repetitive elements and their distribution in radish. We conducted genome-wide characterization of repetitive elements in radish with low coverage genome sequencing followed by similarity-based cluster analysis. Results showed that about 31% of the genome was composed of repetitive sequences. Satellite repeats were the most dominating elements of the genome. The distribution pattern of three satellite repeat sequences (CL1, CL25, and CL43) on radish chromosomes was characterized using fluorescence in situ hybridization (FISH). CL1 was predominantly located at the centromeric region of all chromosomes, CL25 located at the subtelomeric region, and CL43 was a telomeric satellite. FISH signals of two satellite repeats, CL1 and CL25, together with 5S rDNA and 45S rDNA, provide useful cytogenetic markers to identify each individual somatic metaphase chromosome. The centromere-specific histone H3 (CENH3) has been used as a marker to identify centromere DNA sequences. One putative CENH3 (RsCENH3) was characterized and cloned from radish. Its deduced amino acid sequence shares high similarities to those of the CENH3s in Brassica species. An antibody against B. rapa CENH3 specifically stained radish centromeres. Immunostaining and chromatin immunoprecipitation (ChIP) tests with anti-BrCENH3 antibody demonstrated that both the centromere-specific retrotransposon (CR-Radish) and satellite repeat (CL1) are directly associated with RsCENH3 in radish.
    Proportions of repetitive elements in radish were estimated and satellite repeats were the most dominating elements. Fine karyotyping analysis was established, which allows us to easily identify each individual somatic metaphase chromosome. Immunofluorescence- and ChIP-based assays demonstrated the functional significance of satellite and centromere-specific retrotransposon at centromeres. Our study provides a valuable basis for future genomic studies in radish.

  14. Comparison of Illumina and 454 deep sequencing in participants failing raltegravir-based antiretroviral therapy.

    PubMed

    Li, Jonathan Z; Chapman, Brad; Charlebois, Patrick; Hofmann, Oliver; Weiner, Brian; Porter, Alyssa J; Samuel, Reshmi; Vardhanabhuti, Saran; Zheng, Lu; Eron, Joseph; Taiwo, Babafemi; Zody, Michael C; Henn, Matthew R; Kuritzkes, Daniel R; Hide, Winston; Wilson, Cara C; Berzins, Baiba I; Acosta, Edward P; Bastow, Barbara; Kim, Peter S; Read, Sarah W; Janik, Jennifer; Meres, Debra S; Lederman, Michael M; Mong-Kryspin, Lori; Shaw, Karl E; Zimmerman, Louis G; Leavitt, Randi; De La Rosa, Guy; Jennings, Amy

    2014-01-01

    The impact of raltegravir-resistant HIV-1 minority variants (MVs) on raltegravir treatment failure is unknown. Illumina sequencing offers greater throughput than 454, but sequence analysis tools for viral sequencing are needed. We evaluated Illumina and 454 for the detection of HIV-1 raltegravir-resistant MVs. A5262 was a single-arm study of raltegravir and darunavir/ritonavir in treatment-naïve patients. Pre-treatment plasma was obtained from 5 participants with raltegravir resistance at the time of virologic failure. A control library was created by pooling integrase clones at predefined proportions. Multiplexed sequencing was performed with Illumina and 454 platforms at comparable costs. Illumina sequence analysis was performed with the novel snp-assess tool and 454 sequencing was analyzed with V-Phaser. Illumina sequencing resulted in significantly higher sequence coverage and a 0.095% limit of detection. Illumina accurately detected all MVs in the control library at ≥0.5% and 7/10 MVs expected at 0.1%. 454 sequencing failed to detect any MVs at 0.1% with 5 false positive calls. For MVs detected in the patient samples by both 454 and Illumina, the correlation in the detected variant frequencies was high (R² = 0.92, P < 0.001). Illumina sequencing detected 2.4-fold greater nucleotide MVs and 2.9-fold greater amino acid MVs compared to 454. The only raltegravir-resistant MV detected was an E138K mutation in one participant by Illumina sequencing, but not by 454. In participants of A5262 with raltegravir resistance at virologic failure, baseline raltegravir-resistant MVs were rarely detected. At comparable costs to 454 sequencing, Illumina demonstrated greater depth of coverage, increased sensitivity for detecting HIV MVs, and fewer false positive variant calls.

  15. Unraveling the Dyad: Using Recurrence Analysis to Explore Patterns of Syntactic Coordination between Children and Caregivers in Conversation

    ERIC Educational Resources Information Center

    Dale, Rick; Spivey, Michael J.

    2006-01-01

    Recurrence analysis is introduced as a means to investigate syntactic coordination between child and caregiver. Three CHILDES (MacWhinney, 2000) corpora are analyzed and demonstrate coordination between children and their caregivers in terms of word-class n-gram sequences. Results further indicate that trade-offs in leading or following this…
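
    A generic categorical cross-recurrence computation over word-class n-grams can be sketched as below. This illustrates the general technique only: the tag names, the n-gram length, and the function names are assumptions, and the record does not specify the authors' exact procedure.

```python
def ngrams(tags, n=2):
    """Consecutive n-grams of a word-class (part-of-speech) tag sequence."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def cross_recurrence(child, caregiver, n=2):
    """Points (i, j) where child n-gram i equals caregiver n-gram j."""
    a, b = ngrams(child, n), ngrams(caregiver, n)
    return [(i, j) for i, x in enumerate(a) for j, y in enumerate(b) if x == y]


def recurrence_rate(child, caregiver, n=2):
    """Fraction of all (i, j) cells of the cross-recurrence plot that recur."""
    a, b = ngrams(child, n), ngrams(caregiver, n)
    if not a or not b:
        return 0.0
    return len(cross_recurrence(child, caregiver, n)) / (len(a) * len(b))


# Hypothetical tag sequences for two speakers.
child = ["DET", "N", "V"]
caregiver = ["DET", "N", "V", "DET", "N"]
points = cross_recurrence(child, caregiver)
```

    For each point, the sign of `j - i` shows which speaker produced the shared n-gram earlier, which is one simple way to look at leading versus following.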

  16. SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data

    USGS Publications Warehouse

    Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).
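
    The third step, finding sequences that carry microsatellites, can be approximated with a regular-expression search. The sketch below is not SSR_pipeline's algorithm or interface: `find_ssrs` and its parameters are hypothetical, and it handles only simple (not compound) repeats.

```python
import re


def find_ssrs(seq, min_motif=2, max_motif=6, min_repeats=4):
    """Return (start, motif, repeat_count) for simple tandem repeats."""
    seq = seq.upper()
    # Lazy motif capture so the shortest repeating unit is tried first;
    # the backreference \1 requires further consecutive copies of the motif.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        motif = m.group(1)
        if len(set(motif)) == 1:
            # Skip homopolymer runs such as AAAAAAAA reported as motif "AA".
            continue
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits


# A made-up read with five copies of the AC motif starting at index 2.
hits = find_ssrs("GGACACACACACTT")
```

    The motif-length and repeat-count arguments play the role of the user-specified parameters mentioned in the abstract; a production tool would also need to handle compound repeats and ambiguous bases.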

  17. Transcriptome-based differentiation of closely-related Miscanthus lines.

    PubMed

    Chouvarine, Philippe; Cooksey, Amanda M; McCarthy, Fiona M; Ray, David A; Baldwin, Brian S; Burgess, Shane C; Peterson, Daniel G

    2012-01-01

    Distinguishing between individuals is critical to those conducting animal/plant breeding, food safety/quality research, diagnostic and clinical testing, and evolutionary biology studies. Classical genetic identification studies are based on marker polymorphisms, but polymorphism-based techniques are time and labor intensive and often cannot distinguish between closely related individuals. Illumina sequencing technologies provide the detailed sequence data required for rapid and efficient differentiation of related species, lines/cultivars, and individuals in a cost-effective manner. Here we describe the use of Illumina high-throughput exome sequencing, coupled with SNP mapping, as a rapid means of distinguishing between related cultivars of the lignocellulosic bioenergy crop giant miscanthus (Miscanthus × giganteus). We provide the first exome sequence database for Miscanthus species complete with Gene Ontology (GO) functional annotations. A SNP comparative analysis of rhizome-derived cDNA sequences was successfully utilized to distinguish three Miscanthus × giganteus cultivars from each other and from other Miscanthus species. Moreover, the resulting phylogenetic tree generated from SNP frequency data parallels the known breeding history of the plants examined. Some of the giant miscanthus plants exhibit considerable sequence divergence. Here we describe an analysis of Miscanthus in which high-throughput exome sequencing was utilized to differentiate between closely related genotypes despite the current lack of a reference genome sequence. We functionally annotated the exome sequences and provide resources to support Miscanthus systems biology. In addition, we demonstrate the use of the commercial high-performance cloud computing to do computational GO annotation.

  18. SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data.

    PubMed

    Miller, Mark P; Knaus, Brian J; Mullins, Thomas D; Haig, Susan M

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).

  19. Implementation of Quality Management in Core Service Laboratories

    PubMed Central

    Creavalle, T.; Haque, K.; Raley, C.; Subleski, M.; Smith, M.W.; Hicks, B.

    2010-01-01

    The Genetics and Genomics group of the Advanced Technology Program of SAIC-Frederick exists to bring innovative genomic expertise, tools and analysis to NCI and the scientific community. The Sequencing Facility (SF) provides next generation short read (Illumina) sequencing capacity to investigators using a streamlined production approach. The Laboratory of Molecular Technology (LMT) offers a wide range of genomics core services including microarray expression analysis, miRNA analysis, array comparative genome hybridization, long read (Roche) next generation sequencing, quantitative real time PCR, transgenic genotyping, Sanger sequencing, and clinical mutation detection services to investigators from across the NIH. As the technology supporting this genomic research becomes more complex, the need for basic quality processes within all aspects of the core service groups becomes critical. The Quality Management group works alongside members of these labs to establish or improve processes supporting operations control (equipment, reagent and materials management), process improvement (reengineering/optimization, automation, acceptance criteria for new technologies and tech transfer), and quality assurance and customer support (controlled documentation/SOPs, training, service deficiencies and continual improvement efforts). Implementation and expansion of quality programs within unregulated environments demonstrates SAIC-Frederick's dedication to providing the highest quality products and services to the NIH community.

  20. Evidence for a Role of rpoE in Stressed and Unstressed Cells of Marine Vibrio angustum Strain S14

    PubMed Central

    Hild, Erika; Takayama, Kathy; Olsson, Rose-Marie; Kjelleberg, Staffan

    2000-01-01

    We report the cloning, sequencing, and characterization of the rpoE homolog in Vibrio angustum S14. The rpoE gene encodes a protein with a predicted molecular mass of 19.4 kDa and has been demonstrated to be present as a single-copy gene by Southern blot analysis. The deduced amino acid sequence of RpoE is most similar to that of the RpoE homolog of Sphingomonas aromaticivorans, σ24, displaying sequence similarity and identity of 63 and 43%, respectively. Northern blot analysis demonstrated the induction of rpoE 6, 12, and 40 min after a temperature shift to 40°C. An rpoE mutant was constructed by gene disruption. There was no difference in viability during logarithmic growth, stationary phase, or carbon starvation between the wild type and the rpoE mutant strain. In contrast, survival of the mutant was impaired following heat shock during exponential growth, as well as after oxidative stress at 24 h of carbon starvation. The mutant exhibited microcolony formation during optimal growth temperatures (22 to 30°C), and cell area measurements revealed an increase in cell volume of the mutant during growth at 30°C, compared to the wild-type strain. Moreover, outer membrane and periplasmic space protein analysis demonstrated many alterations in the protein profiles for the mutant during growth and carbon starvation, as well as following oxidative stress, in comparison with the wild-type strain. It is thereby concluded that RpoE has an extracytoplasmic function and mediates a range of specific responses in stressed as well as unstressed cells of V. angustum S14. PMID:11092857

  1. Regulation of the Human Endogenous Retrovirus K (HML-2) Transcriptome by the HIV-1 Tat Protein

    PubMed Central

    Gonzalez-Hernandez, Marta J.; Cavalcoli, James D.; Sartor, Maureen A.; Contreras-Galindo, Rafael; Meng, Fan; Dai, Manhong; Dube, Derek; Saha, Anjan K.; Gitlin, Scott D.; Omenn, Gilbert S.; Kaplan, Mark H.

    2014-01-01

    Approximately 8% of the human genome is made up of endogenous retroviral sequences. As the HIV-1 Tat protein activates the overall expression of the human endogenous retrovirus type K (HERV-K) (HML-2), we used next-generation sequencing to determine which of the 91 currently annotated HERV-K (HML-2) proviruses are regulated by Tat. Transcriptome sequencing of total RNA isolated from Tat- and vehicle-treated peripheral blood lymphocytes from a healthy donor showed that Tat significantly activates expression of 26 unique HERV-K (HML-2) proviruses, silences 12, and does not significantly alter the expression of the remaining proviruses. Quantitative reverse transcription-PCR validation of the sequencing data was performed on Tat-treated PBLs of seven donors using provirus-specific primers and corroborated the results with a substantial degree of quantitative similarity. IMPORTANCE The expression of HERV-K (HML-2) is tightly regulated but becomes markedly increased following infection with HIV-1, in part due to the HIV-1 Tat protein. The findings reported here demonstrate the complexity of the genome-wide regulation of HERV-K (HML-2) expression by Tat. This work also demonstrates that although HERV-K (HML-2) proviruses in the human genome are highly similar in terms of DNA sequence, modulation of the expression of specific proviruses in a given biological situation can be ascertained using next-generation sequencing and bioinformatics analysis. PMID:24872592

  2. Target Site Recognition by a Diversity-Generating Retroelement

    PubMed Central

    Guo, Huatao; Tse, Longping V.; Nieh, Angela W.; Czornyj, Elizabeth; Williams, Steven; Oukil, Sabrina; Liu, Vincent B.; Miller, Jeff F.

    2011-01-01

    Diversity-generating retroelements (DGRs) are in vivo sequence diversification machines that are widely distributed in bacterial, phage, and plasmid genomes. They function to introduce vast amounts of targeted diversity into protein-encoding DNA sequences via mutagenic homing. Adenine residues are converted to random nucleotides in a retrotransposition process from a donor template repeat (TR) to a recipient variable repeat (VR). Using the Bordetella bacteriophage BPP-1 element as a prototype, we have characterized requirements for DGR target site function. Although sequences upstream of VR are dispensable, a 24 bp sequence immediately downstream of VR, which contains short inverted repeats, is required for efficient retrohoming. The inverted repeats form a hairpin or cruciform structure and mutational analysis demonstrated that, while the structure of the stem is important, its sequence can vary. In contrast, the loop has a sequence-dependent function. Structure-specific nuclease digestion confirmed the existence of a DNA hairpin/cruciform, and marker coconversion assays demonstrated that it influences the efficiency, but not the site of cDNA integration. Comparisons with other phage DGRs suggested that similar structures are a conserved feature of target sequences. Using a kanamycin resistance determinant as a reporter, we found that transplantation of the IMH and hairpin/cruciform-forming region was sufficient to target the DGR diversification machinery to a heterologous gene. In addition to furthering our understanding of DGR retrohoming, our results suggest that DGRs may provide unique tools for directed protein evolution via in vivo DNA diversification. PMID:22194701

  3. Discrimination of Bacillus anthracis from closely related microorganisms by analysis of 16S and 23S rRNA with oligonucleotide microchips

    DOEpatents

    Bavykin, Sergei G.; Mirzabekova, legal representative, Natalia V.; Mirzabekov, deceased, Andrei D.

    2007-12-04

    The present invention relates to methods and compositions for using nucleotide sequence variations of 16S and 23S rRNA within the B. cereus group to discriminate a highly infectious bacterium, B. anthracis, from closely related microorganisms. Sequence variations in the 16S and 23S rRNA of the B. cereus subgroup including B. anthracis are utilized to construct an array that can detect these sequence variations through selective hybridizations and discriminate the B. cereus group that includes B. anthracis. Discrimination of single base differences in rRNA was achieved with a microchip during analysis of B. cereus group isolates from both single and mixed samples, as well as identification of polymorphic sites. Successful use of a microchip to determine the appropriate subgroup classification, using eight reference microorganisms from the B. cereus group as a study set, was demonstrated.

  4. [Cloning, sequencing and prokaryotic expression of cDNAs for the antifreeze protein family from the beetle Tenebrio molitor].

    PubMed

    Liu, Zhong-Yuan; Wang, Yun; Lü, Guo-Dong; Wang, Xian-Lei; Zhang, Fu-Chun; Ma, Ji

    2006-12-01

    The partial cDNA sequence coding for the antifreeze proteins in the Tenebrio molitor was obtained by RT-PCR. Sequence analysis revealed nine putative cDNAs with a high degree of homology to Tenebrio molitor antifreeze proteins. The recombinant pGEX-4T-1-tmafp-XJ430 was introduced into E. coli BL21 to induce a GST fusion protein by IPTG. SDS-PAGE of the fusion protein demonstrated that the antifreeze protein migrated at a size of 38 kDa. The immunization was performed by intra-muscular injection of pCDNA3-tmafp-XJ430, and then antiserum was detected by ELISA. The titer of the antibody was 1:2,000. Western blotting analysis showed the antiserum was specific against the antifreeze protein. This finding could lead to further investigation of the properties and function of antifreeze proteins.

  5. The utility of DNA sequences of an intron from the beta-fibrinogen gene in phylogenetic analysis of woodpeckers (Aves: Picidae).

    PubMed

    Prychitko, T M; Moore, W S

    1997-10-01

    Estimating phylogenies from DNA sequence data has become the major methodology of molecular phylogenetics. To date, molecular phylogenetics of the vertebrates has been very dependent on mtDNA, but studies involving mtDNA are limited because the several genes comprising the mt-genome are inherited as a single linkage group. The only apparent solution to this problem is to sequence additional genes, each representing a distinct linkage group, so that the resultant gene trees provide independent estimates of the species tree. There exists the need to find novel gene sequences which contain enough phylogenetic information to resolve relationships between closely related species. A possible source is the nuclear-encoded introns, because they evolve more rapidly than exons. We designed primers to amplify and sequence intron 7 from the beta-fibrinogen gene for a recently evolved group, the woodpeckers. We sequenced the entire intron for 10 specimens representing five species. Nucleotide substitutions are randomly distributed along the length of the intron, suggesting selective neutrality. A preliminary analysis indicates that the phylogenetic signal in the intron is as strong as that in the mitochondrial encoded cytochrome b (cyt b) gene. The topology of the beta-fibrinogen tree is identical to that of the cyt b tree. This analysis demonstrates the ability of intron 7 of beta-fibrinogen to provide well resolved, independent gene trees for recently evolved groups and establishes it as a source of sequences to be used in other phylogenetic studies. Copyright 1997 Academic Press

  6. Phylogenic study of Lemnoideae (duckweeds) through complete chloroplast genomes for eight accessions.

    PubMed

    Ding, Yanqiang; Fang, Yang; Guo, Ling; Li, Zhidan; He, Kaize; Zhao, Yun; Zhao, Hai

    2017-01-01

    Phylogenetic relationship within different genera of Lemnoideae, a kind of small aquatic monocotyledonous plants, was not well resolved, using either morphological characters or traditional markers. Given that rich genetic information in chloroplast genome makes them particularly useful for phylogenetic studies, we used chloroplast genomes to clarify the phylogeny within Lemnoideae. DNAs were sequenced with next-generation sequencing. The duckweeds chloroplast genomes were indirectly filtered from the total DNA data, or directly obtained from chloroplast DNA data. To test the reliability of assembling the chloroplast genome based on the filtration of the total DNA, two methods were used to assemble the chloroplast genome of Landoltia punctata strain ZH0202. A phylogenetic tree was built on the basis of the whole chloroplast genome sequences using MrBayes v.3.2.6 and PhyML 3.0. Eight complete duckweeds chloroplast genomes were assembled, with lengths ranging from 165,775 bp to 171,152 bp, and each contains 80 protein-coding sequences, four rRNAs, 30 tRNAs and two pseudogenes. The identity of L. punctata strain ZH0202 chloroplast genomes assembled through two methods was 100%, and their sequences and lengths were completely identical. The chloroplast genome comparison demonstrated that the differences in chloroplast genome sizes among the Lemnoideae primarily resulted from variation in non-coding regions, especially from repeat sequence variation. The phylogenetic analysis demonstrated that the different genera of Lemnoideae are derived from each other in the following order: Spirodela, Landoltia, Lemna, Wolffiella, and Wolffia. This study demonstrates potential of whole chloroplast genome DNA as an effective option for phylogenetic studies of Lemnoideae. It also showed the possibility of using chloroplast DNA data to elucidate those phylogenies which were not yet solved well by traditional methods even in plants other than duckweeds.

  7. Phylogenic study of Lemnoideae (duckweeds) through complete chloroplast genomes for eight accessions

    PubMed Central

    Ding, Yanqiang; Fang, Yang; Guo, Ling; Li, Zhidan; He, Kaize

    2017-01-01

    Background: Phylogenetic relationship within different genera of Lemnoideae, a kind of small aquatic monocotyledonous plants, was not well resolved, using either morphological characters or traditional markers. Given that rich genetic information in chloroplast genome makes them particularly useful for phylogenetic studies, we used chloroplast genomes to clarify the phylogeny within Lemnoideae. Methods: DNAs were sequenced with next-generation sequencing. The duckweeds chloroplast genomes were indirectly filtered from the total DNA data, or directly obtained from chloroplast DNA data. To test the reliability of assembling the chloroplast genome based on the filtration of the total DNA, two methods were used to assemble the chloroplast genome of Landoltia punctata strain ZH0202. A phylogenetic tree was built on the basis of the whole chloroplast genome sequences using MrBayes v.3.2.6 and PhyML 3.0. Results: Eight complete duckweeds chloroplast genomes were assembled, with lengths ranging from 165,775 bp to 171,152 bp, and each contains 80 protein-coding sequences, four rRNAs, 30 tRNAs and two pseudogenes. The identity of L. punctata strain ZH0202 chloroplast genomes assembled through two methods was 100%, and their sequences and lengths were completely identical. The chloroplast genome comparison demonstrated that the differences in chloroplast genome sizes among the Lemnoideae primarily resulted from variation in non-coding regions, especially from repeat sequence variation. The phylogenetic analysis demonstrated that the different genera of Lemnoideae are derived from each other in the following order: Spirodela, Landoltia, Lemna, Wolffiella, and Wolffia. Discussion: This study demonstrates potential of whole chloroplast genome DNA as an effective option for phylogenetic studies of Lemnoideae. It also showed the possibility of using chloroplast DNA data to elucidate those phylogenies which were not yet solved well by traditional methods even in plants other than duckweeds. PMID:29302399

  8. Outbreak Investigation Using High-Throughput Genome Sequencing within a Diagnostic Microbiology Laboratory

    PubMed Central

    Sherry, Norelle L.; Porter, Jessica L.; Seemann, Torsten; Watkins, Andrew; Stinear, Timothy P.

    2013-01-01

    Next-generation sequencing (NGS) of bacterial genomes has recently become more accessible and is now available to the routine diagnostic microbiology laboratory. However, questions remain regarding its feasibility, particularly with respect to data analysis in nonspecialist centers. To test the applicability of NGS to outbreak investigations, Ion Torrent sequencing was used to investigate a putative multidrug-resistant Escherichia coli outbreak in the neonatal unit of the Mercy Hospital for Women, Melbourne, Australia. Four suspected outbreak strains and a comparator strain were sequenced. Genome-wide single nucleotide polymorphism (SNP) analysis demonstrated that the four neonatal intensive care unit (NICU) strains were identical and easily differentiated from the comparator strain. Genome sequence data also determined that the NICU strains belonged to multilocus sequence type 131 and carried the blaCTX-M-15 extended-spectrum beta-lactamase. Comparison of the outbreak strains to all publicly available complete E. coli genome sequences showed that they clustered with neonatal meningitis and uropathogenic isolates. The turnaround time from a positive culture to the completion of sequencing (prior to data analysis) was 5 days, and the cost was approximately $300 per strain (for the reagents only). The main obstacles to a mainstream adoption of NGS technologies in diagnostic microbiology laboratories are currently cost (although this is decreasing), a paucity of user-friendly and clinically focused bioinformatics platforms, and a lack of genomics expertise outside the research environment. Despite these hurdles, NGS technologies provide unparalleled high-resolution genotyping in a short time frame and are likely to be widely implemented in the field of diagnostic microbiology in the next few years, particularly for epidemiological investigations (replacing current typing methods) and the characterization of resistance determinants. Clinical microbiologists need to familiarize themselves with these technologies and their applications. PMID:23408689
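
    The genome-wide SNP comparison described in this record can be illustrated with a minimal sketch. It assumes the sequences are already aligned to the same length; the strain labels and sequences are invented toy data, not data from the study, and the function names are hypothetical.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count aligned positions where two sequences differ, skipping gaps and N."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"-", "N"}
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x not in skip and y not in skip
    )


def pairwise_snp_matrix(seqs: dict) -> dict:
    """Return SNP distances for every unordered pair of named sequences."""
    return {
        (p, q): snp_distance(seqs[p], seqs[q])
        for p, q in combinations(sorted(seqs), 2)
    }


# Toy alignment (invented): two identical "outbreak" strains and one comparator.
toy = {
    "nicu_1": "ACGTACGTAC",
    "nicu_2": "ACGTACGTAC",
    "comparator": "ACGAACGTTC",
}
distances = pairwise_snp_matrix(toy)
for pair, d in distances.items():
    print(pair, d)
```

    Strains that differ by zero SNPs would be treated as indistinguishable at this resolution, while the comparator stands apart; real pipelines also handle read mapping, variant calling, and quality filtering, none of which is shown here.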

  9. Phylogenetic and Functional Analysis of Metagenome Sequence from High-Temperature Archaeal Habitats Demonstrate Linkages between Metabolic Potential and Geochemistry

    PubMed Central

    Inskeep, William P.; Jay, Zackary J.; Herrgard, Markus J.; Kozubal, Mark A.; Rusch, Douglas B.; Tringe, Susannah G.; Macur, Richard E.; Jennings, Ryan deM.; Boyd, Eric S.; Spear, John R.; Roberto, Francisco F.

    2013-01-01

    Geothermal habitats in Yellowstone National Park (YNP) provide an unparalleled opportunity to understand the environmental factors that control the distribution of archaea in thermal habitats. Here we describe, analyze, and synthesize metagenomic and geochemical data collected from seven high-temperature sites that contain microbial communities dominated by archaea relative to bacteria. The specific objectives of the study were to use metagenome sequencing to determine the structure and functional capacity of thermophilic archaeal-dominated microbial communities across a pH range from 2.5 to 6.4 and to discuss specific examples where the metabolic potential correlated with measured environmental parameters and geochemical processes occurring in situ. Random shotgun metagenome sequence (∼40–45 Mb Sanger sequencing per site) was obtained from environmental DNA extracted from high-temperature sediments and/or microbial mats and subjected to numerous phylogenetic and functional analyses. Analysis of individual sequences (e.g., MEGAN and G + C content) and assemblies from each habitat type revealed the presence of dominant archaeal populations in all environments, 10 of whose genomes were largely reconstructed from the sequence data. Analysis of protein family occurrence, particularly of those involved in energy conservation, electron transport, and autotrophic metabolism, revealed significant differences in metabolic strategies across sites consistent with differences in major geochemical attributes (e.g., sulfide, oxygen, pH). These observations provide an ecological basis for understanding the distribution of indigenous archaeal lineages across high-temperature systems of YNP. PMID:23720654

  10. pico-PLAZA, a genome database of microbial photosynthetic eukaryotes.

    PubMed

    Vandepoele, Klaas; Van Bel, Michiel; Richard, Guilhem; Van Landeghem, Sofie; Verhelst, Bram; Moreau, Hervé; Van de Peer, Yves; Grimsley, Nigel; Piganeau, Gwenael

    2013-08-01

    With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. To illustrate the versatility of the platform, different case studies are presented demonstrating how pico-PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichments analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains. © 2013 John Wiley & Sons Ltd and Society for Applied Microbiology.

  11. Next-Generation Sequence Analysis of the Genome of RFHVMn, the Macaque Homolog of Kaposi's Sarcoma (KS)-Associated Herpesvirus, from a KS-Like Tumor of a Pig-Tailed Macaque

    PubMed Central

    Bruce, A. Gregory; Ryan, Jonathan T.; Thomas, Mathew J.; Peng, Xinxia; Grundhoff, Adam; Tsai, Che-Chung

    2013-01-01

    The complete sequence of retroperitoneal fibromatosis-associated herpesvirus Macaca nemestrina (RFHVMn), the pig-tailed macaque homolog of Kaposi's sarcoma-associated herpesvirus (KSHV), was determined by next-generation sequence analysis of a Kaposi's sarcoma (KS)-like macaque tumor. Colinearity of genes was observed with the KSHV genome, and the core herpesvirus genes had strong sequence homology to the corresponding KSHV genes. RFHVMn lacked homologs of open reading frame 11 (ORF11) and KSHV ORFs K5 and K6, which appear to have been generated by duplication of ORFs K3 and K4 after the divergence of KSHV and RFHV. RFHVMn contained positional homologs of all other unique KSHV genes, although some showed limited sequence similarity. RFHVMn contained a number of candidate microRNA genes. Although there was little sequence similarity with KSHV microRNAs, one candidate contained the same seed sequence as the positional homolog, kshv-miR-K12-10a, suggesting functional overlap. RNA transcript splicing was highly conserved between RFHVMn and KSHV, and strong sequence conservation was noted in specific promoters and putative origins of replication, predicting important functional similarities. Sequence comparisons indicated that RFHVMn and KSHV developed in long-term synchrony with the evolution of their hosts, and both viruses phylogenetically group within the RV1 lineage of Old World primate rhadinoviruses. RFHVMn is the closest homolog of KSHV to be completely sequenced and the first sequenced RV1 rhadinovirus homolog of KSHV from a nonhuman Old World primate. The strong genetic and sequence similarity between RFHVMn and KSHV, coupled with similarities in biology and pathology, demonstrate that RFHVMn infection in macaques offers an important and relevant model for the study of KSHV in humans. PMID:24109218

  12. Prediction of enhancer-promoter interactions via natural language processing.

    PubMed

    Zeng, Wanwen; Wu, Mengmeng; Jiang, Rui

    2018-05-09

    Precise identification of three-dimensional genome organization, especially enhancer-promoter interactions (EPIs), is important to deciphering gene regulation, cell differentiation and disease mechanisms. Currently, it is a challenging task to distinguish true interactions from other nearby non-interacting ones since the power of traditional experimental methods is limited due to low resolution or low throughput. We propose a novel computational framework, EP2vec, to assay three-dimensional genomic interactions. We first extract sequence embedding features, defined as fixed-length vector representations learned from variable-length sequences using an unsupervised deep learning method in natural language processing. Then, we train a classifier to predict EPIs using the learned representations in a supervised way. Experimental results demonstrate that EP2vec obtains F1 scores ranging from 0.841 to 0.933 on different datasets, which outperforms existing methods. We prove the robustness of sequence embedding features by carrying out sensitivity analysis. Besides, we identify motifs that represent cell line-specific information through analysis of the learned sequence embedding features by adopting an attention mechanism. Last, we show that even superior performance, with F1 scores of 0.889 to 0.940, can be achieved by combining sequence embedding features and experimental features. EP2vec sheds light on feature extraction for DNA sequences of arbitrary lengths and provides a powerful approach for EPI identification.
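
    Since this record reports its results as F1 scores, a minimal sketch of how an F1 score is computed for a binary interaction classifier may be useful. The labels below are invented toy data, and the function is a generic illustration, not code from EP2vec.

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for binary labels (1 = interaction)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (invented): 1 = enhancer-promoter pair interacts, 0 = it does not.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
score = f1_score(y_true, y_pred)
print(round(score, 3))
```

    F1 is often preferred over plain accuracy when interacting pairs are much rarer than non-interacting ones, because it ignores true negatives.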

  13. Sequencing of Dust Filter Production Process Using Design Structure Matrix (DSM)

    NASA Astrophysics Data System (ADS)

    Sari, R. M.; Matondang, A. R.; Syahputri, K.; Anizar; Siregar, I.; Rizkya, I.; Ursula, C.

    2018-01-01

    A metal casting company produces machinery spare parts for manufacturers. One of its products is a dust filter, which is used by most palm oil mills. Because the product is so widely used, the company often has problems handling it. One problem is a disordered production process, which arises from the job sequencing: important jobs that should be completed first are carried out last, while less important jobs that could be completed later are carried out first. A Design Structure Matrix (DSM) was used to analyse and determine priorities in the production process. DSM analysis orders the production process through dependency sequencing. The resulting dependency sequence shows the process order according to the inter-process linkages, considering the activities that come before and after each step. Finally, it identifies the coupled activities among metal smelting, refining, grinding, cutting of container castings, removal of metal from molds, metal casting, coating processes, and manufacture of sand molds.
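
    The dependency sequencing that a DSM performs can be sketched as a topological sort over "must come before" relations. The activity names loosely follow the abstract, but the dependencies between them are assumptions made up for illustration; the abstract does not list the actual matrix.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependencies: each activity maps to the set of activities
# that must be finished before it can start (one row of a DSM).
deps = {
    "mold making": set(),
    "smelting": set(),
    "refining": {"smelting"},
    "casting": {"mold making", "refining"},
    "mold removal": {"casting"},
    "cutting": {"mold removal"},
    "grinding": {"cutting"},
    "coating": {"grinding"},
}


def sequence_activities(dependencies):
    """Return one valid execution order, or None if activities are coupled (cyclic)."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError:
        return None


order = sequence_activities(deps)
print(order)

# Two activities that depend on each other form a coupled block with no simple order.
coupled = {"smelting": {"refining"}, "refining": {"smelting"}}
print(sequence_activities(coupled))
```

    In a full DSM analysis, coupled activities are usually grouped into a block and planned together rather than rejected; this sketch only detects that such a block exists.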

  14. Effects of Working Couple's Retirement Sequence on Satisfaction in Patriarchal Culture Country: Probing on Gender Difference.

    PubMed

    Lee, Ayoung; Cho, Joonmo

    2017-01-01

    We examined the effects of differences in the retirement sequence (i.e., which spouse retires first) on satisfaction in Korea, a country with a patriarchal culture. Our empirical study demonstrates that households where men retired first had much lower satisfaction than households where women retired first. In addition, husbands were found to show lower satisfaction than wives both in households where women retired first and in households where men retired first. The effect of the retirement sequence on satisfaction at the point when only one of the spouses is retired continues after both of them are retired. This means that the difference in the couple's retirement sequence has an ongoing effect on their later happiness. The analysis of the effect of a couple's retirement sequence on satisfaction in later life may be useful for improving individuals' and couples' quality of life in countries with similar cultures.

  15. Recurrent Network models of sequence generation and memory

    PubMed Central

    Rajan, Kanaka; Harvey, Christopher D; Tank, David W

    2016-01-01

    SUMMARY Sequential activation of neurons is a common feature of network activity during a variety of behaviors, including working memory and decision making. Previous network models for sequences and memory emphasized specialized architectures in which a principled mechanism is pre-wired into their connectivity. Here, we demonstrate that starting from random connectivity and modifying a small fraction of connections, a largely disordered recurrent network can produce sequences and implement working memory efficiently. We use this process, called Partial In-Network training (PINning), to model and match cellular-resolution imaging data from the posterior parietal cortex during a virtual memory-guided two-alternative forced choice task [Harvey, Coen and Tank, 2012]. Analysis of the connectivity reveals that sequences propagate by the cooperation between recurrent synaptic interactions and external inputs, rather than through feedforward or asymmetric connections. Together our results suggest that neural sequences may emerge through learning from largely unstructured network architectures. PMID:26971945

  16. Characterization of the in vitro expressed autoimmune rippling muscle disease immunogenic domain of human titin encoded by TTN exons 248-249

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zelinka, L.; McCann, S.; Budde, J.

    2011-08-05

    Highlights: Affinity purification of the autoimmune rippling muscle disease immunogenic domain of titin. Partial sequence analysis confirms that the peptide is in the I band region of titin. This region of human titin shows a high degree of homology to mouse titin N2-A. Abstract: Autoimmune rippling muscle disease (ARMD) is an autoimmune neuromuscular disease associated with myasthenia gravis (MG). Past studies in our laboratory recognized a very high molecular weight skeletal muscle protein antigen identified by ARMD patient antisera as the titin isoform. These past studies used antisera from ARMD and MG patients as probes to screen a human skeletal muscle cDNA library and several pBluescript clones revealed supporting expression of immunoreactive peptides. This study characterizes the products of subcloning the titin immunoreactive domain into pGEX-3X and the subsequent fusion protein. Sequence analysis of the fusion gene indicates the cloned titin domain (GenBank ID: EU428784) is in frame and is derived from a sequence of N2-A spanning the exons 248-250, an area that encodes the fibronectin III domain. PCR and EcoRI restriction mapping studies have demonstrated that the inserted cDNA is of a size that is predicted by bioinformatics analysis of the subclone. Expression of the fusion protein results in the isolation of a polypeptide of 52 kDa consistent with the predicted inferred amino acid sequence. Immunoblot experiments of the fusion protein, using rippling muscle/myasthenia gravis antisera, demonstrate that only the titin domain is immunoreactive.

  17. Microfluidic droplet enrichment for targeted sequencing

    PubMed Central

    Eastburn, Dennis J.; Huang, Yong; Pellegrino, Maurizio; Sciambi, Adam; Ptáček, Louis J.; Abate, Adam R.

    2015-01-01

    Targeted sequence enrichment enables better identification of genetic variation by providing increased sequencing coverage for genomic regions of interest. Here, we report the development of a new target enrichment technology that is highly differentiated from other approaches currently in use. Our method, MESA (Microfluidic droplet Enrichment for Sequence Analysis), isolates genomic DNA fragments in microfluidic droplets and performs TaqMan PCR reactions to identify droplets containing a desired target sequence. The TaqMan positive droplets are subsequently recovered via dielectrophoretic sorting, and the TaqMan amplicons are removed enzymatically prior to sequencing. We demonstrated the utility of this approach by generating an average 31.6-fold sequence enrichment across 250 kb of targeted genomic DNA from five unique genomic loci. Significantly, this enrichment enabled a more comprehensive identification of genetic polymorphisms within the targeted loci. MESA requires low amounts of input DNA, minimal prior locus sequence information and enriches the target region without PCR bias or artifacts. These features make it well suited for the study of genetic variation in a number of research and diagnostic applications. PMID:25873629
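
    A fold-enrichment figure such as the one quoted in this record can be illustrated with a short calculation. The definition used here (on-target read fraction after enrichment divided by the on-target fraction in an unenriched control) is an assumption, since the abstract does not state how the value was computed, and the read counts are invented.

```python
def fold_enrichment(on_enriched, total_enriched, on_control, total_control):
    """Ratio of on-target read fractions: enriched library vs. unenriched control."""
    if total_enriched <= 0 or total_control <= 0 or on_control <= 0:
        raise ValueError("totals and control on-target count must be positive")
    # Cross-multiplied form of (on_enriched / total_enriched) / (on_control / total_control).
    return (on_enriched * total_control) / (total_enriched * on_control)


# Invented read counts: 3% of reads on target after enrichment vs. 0.1% before.
value = fold_enrichment(on_enriched=3000, total_enriched=100000,
                        on_control=100, total_control=100000)
print(value)
```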

  18. Typing Clostridium difficile strains based on tandem repeat sequences

    PubMed Central

    2009-01-01

    Background: Genotyping of epidemic Clostridium difficile strains is necessary to track their emergence and spread. Portability of genotyping data is desirable to facilitate inter-laboratory comparisons and epidemiological studies. Results: This report presents results from a systematic screen for variation in repetitive DNA in the genome of C. difficile. We describe two tandem repeat loci, designated 'TR6' and 'TR10', which display extensive sequence variation that may be useful for sequence-based strain typing. Based on an investigation of 154 C. difficile isolates comprising 75 ribotypes, tandem repeat sequencing demonstrated excellent concordance with widely used PCR ribotyping and equal discriminatory power. Moreover, tandem repeat sequences enabled the reconstruction of the isolates' largely clonal population structure and evolutionary history. Conclusion: We conclude that sequence analysis of the two repetitive loci introduced here may be highly useful for routine typing of C. difficile. Tandem repeat sequence typing resolves phylogenetic diversity to a level equivalent to PCR ribotypes. DNA sequences may be stored in databases accessible over the internet, obviating the need for the exchange of reference strains. PMID:19133124

  19. Annotate-it: a Swiss-knife approach to annotation, analysis and interpretation of single nucleotide variation in human disease

    PubMed Central

    2012-01-01

    The increasing size and complexity of exome/genome sequencing data requires new tools for clinical geneticists to discover disease-causing variants. Bottlenecks in identifying the causative variation include poor cross-sample querying, constantly changing functional annotation and not considering existing knowledge concerning the phenotype. We describe a methodology that facilitates exploration of patient sequencing data towards identification of causal variants under different genetic hypotheses. Annotate-it facilitates handling, analysis and interpretation of high-throughput single nucleotide variant data. We demonstrate our strategy using three case studies. Annotate-it is freely available and test data are accessible to all users at http://www.annotate-it.org. PMID:23013645

  20. Functional analysis of Pacific oyster (Crassostrea gigas) β-thymosin: Focus on antimicrobial activity.

    PubMed

    Nam, Bo-Hye; Seo, Jung-Kil; Lee, Min Jeong; Kim, Young-Ok; Kim, Dong-Gyun; An, Cheul Min; Park, Nam Gyu

    2015-07-01

    An antimicrobial peptide, ∼5 kDa in size, was isolated and purified in its active form from the mantle of the Pacific oyster Crassostrea gigas by C18 reversed-phase high-performance liquid chromatography. Matrix-assisted laser desorption ionisation time-of-flight analysis revealed 4656.4 Da of the purified and unreduced peptide. A comparison of the N-terminal amino acid sequence of oyster antimicrobial peptide with deduced amino acid sequences in our local expressed sequence tag (EST) database of C. gigas (unpublished data) revealed that the oyster antimicrobial peptide sequence entirely matched the deduced amino acid sequence of an EST clone (HM-8_A04), which was highly homologous with the β-thymosin of other species. The cDNA possessed a 126-bp open reading frame that encoded a protein of 41 amino acids. To confirm the antimicrobial activity of C. gigas β-thymosin, we overexpressed a recombinant β-thymosin (rcgTβ) using a pET22 expression plasmid in an Escherichia coli system. The antimicrobial activity of rcgTβ was evaluated and demonstrated using a bacterial growth inhibition test in both liquid and solid cultures. Copyright © 2015 Elsevier Ltd. All rights reserved.
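
    The relation between the 126-bp open reading frame and the 41-amino-acid protein reported in this record can be checked with simple arithmetic: 126 / 3 = 42 codons, one of which is the stop codon. The sketch below assumes, as is conventional but not stated in the abstract, that the quoted ORF length includes the stop codon; the function name is hypothetical.

```python
def orf_protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of the given length in base pairs."""
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    # The stop codon is counted in the ORF length but encodes no amino acid.
    return codons - 1 if includes_stop else codons


residues = orf_protein_length(126)
print(residues)
```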

  1. PlasFlow: predicting plasmid sequences in metagenomic data using genome signatures

    PubMed Central

    Lipinski, Leszek; Dziembowski, Andrzej

    2018-01-01

    Plasmids are mobile genetic elements that play an important role in the environmental adaptation of microorganisms. Although plasmids are usually analyzed in cultured microorganisms, there is a need for methods that allow for the analysis of pools of plasmids (plasmidomes) in environmental samples. To that end, several molecular biology and bioinformatics methods have been developed; however, they are limited to environments with low diversity and cannot recover large plasmids. Here, we present PlasFlow, a novel tool based on genomic signatures that employs a neural network approach for identification of bacterial plasmid sequences in environmental samples. PlasFlow can recover plasmid sequences from assembled metagenomes without any prior knowledge of the taxonomical or functional composition of samples with an accuracy up to 96%. It can also recover sequences of both circular and linear plasmids and can perform initial taxonomical classification of sequences. Compared to other currently available tools, PlasFlow demonstrated significantly better performance on test datasets. Analysis of two samples from heavy metal-contaminated microbial mats revealed that plasmids may constitute an important fraction of their metagenomes and carry genes involved in heavy-metal homeostasis, proving the pivotal role of plasmids in microorganism adaptation to environmental conditions. PMID:29346586
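
    A "genomic signature" of the kind this record refers to is commonly represented as a vector of k-mer frequencies. The sketch below shows that general idea only; the choice of k, the normalization, and the function name are assumptions, and neither PlasFlow's actual feature set nor its neural network is reproduced here.

```python
from collections import Counter
from itertools import product


def kmer_signature(seq: str, k: int = 3) -> list:
    """Return relative frequencies of all 4**k DNA k-mers, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers made of A, C, G, T are counted; windows containing N are ignored.
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


# Toy contig (invented). With k=2 the vector has 16 entries: AA, AC, AG, ...
sig = kmer_signature("ACGTACGT", k=2)
print(len(sig), sig[1])
```

    A fixed-length vector like this lets contigs of very different lengths be fed to the same classifier, which is presumably why signature-based methods suit assembled metagenomes.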

  2. Sequence analysis of the msp4 gene of Anaplasma ovis strains

    USGS Publications Warehouse

    de la Fuente, J.; Atkinson, M.W.; Naranjo, V.; Fernandez de Mera, I. G.; Mangold, A.J.; Keating, K.A.; Kocan, K.M.

    2007-01-01

    Anaplasma ovis (Rickettsiales: Anaplasmataceae) is a tick-borne pathogen of sheep, goats and wild ruminants. The genetic diversity of A. ovis strains has not been well characterized due to the lack of sequence information. In this study, we evaluated bighorn sheep (Ovis canadensis) and mule deer (Odocoileus hemionus) from Montana for infection with A. ovis by serology and sequence analysis of the msp4 gene. Antibodies to Anaplasma spp. were detected in 37% and 39% of bighorn sheep and mule deer analyzed, respectively. Four new msp4 genotypes were identified. The A. ovis msp4 sequences identified herein were analyzed together with sequences reported previously for the characterization of the genetic diversity of A. ovis strains in comparison with other Anaplasma spp. The results of these studies demonstrated that although A. ovis msp4 genotypes may vary among geographic regions and between sheep and deer hosts, the variation observed was less than the variation observed between A. marginale and A. phagocytophilum strains. The results reported herein further confirm that A. ovis infection occurs in natural wild ruminant populations in Western United States and that bighorn sheep and mule deer may serve as wildlife reservoirs of A. ovis. © 2006.

  3. Microbial Characterization of Qatari Barchan Sand Dunes

    PubMed Central

    Chatziefthimiou, Aspassia D.; Nguyen, Hanh; Richer, Renee; Louge, Michel; Sultan, Ali A.; Schloss, Patrick; Hay, Anthony G.

    2016-01-01

    This study represents the first characterization of sand microbiota in migrating barchan sand dunes. Bacterial communities were studied through direct counts and cultivation, as well as 16S rRNA gene and metagenomic sequence analysis to gain an understanding of microbial abundance, diversity, and potential metabolic capabilities. Direct on-grain cell counts gave an average of 5.3 ± 0.4 × 10^5 cells g^-1 of sand. Cultured isolates (N = 64) selected for 16S rRNA gene sequencing belonged to the phyla Actinobacteria (58%), Firmicutes (27%) and Proteobacteria (15%). Deep-sequencing of 16S rRNA gene amplicons from 18 dunes demonstrated a high relative abundance of Proteobacteria, particularly enteric bacteria, and a dune-specific pattern of bacterial community composition that correlated with dune size. Shotgun metagenome sequences of two representative dunes were analyzed and found to have similar relative bacterial abundance, though the relative abundances of eukaryotic, viral and enterobacterial sequences were greater in sand from the dune closer to a camel-pen. Functional analysis revealed patterns similar to those observed in desert soils; however, the increased relative abundance of genes encoding sporulation and dormancy is consistent with the dune microbiome being well-adapted to the exceptionally hyper-arid Qatari desert. PMID:27655399

  4. Epitope mapping of the variable repetitive region with the MB antigen of Ureaplasma urealyticum.

    PubMed Central

    Zheng, X; Lau, K; Frazier, M; Cassell, G H; Watson, H L

    1996-01-01

    One of the major surface structures of Ureaplasma urealyticum recognized by antibodies of patients during infection is the MB antigen. Previously, we showed by Western blot (immunoblot) analysis that any one of the anti-MB monoclonal antibodies (MAbs) 3B1.5, 5B1.1, and 10C6.6 could block the binding of patient antibodies to MB. Subsequent DNA sequencing revealed that a unique six-amino-acid direct tandem repeat region composed the carboxy two-thirds of this antigen. In the present study, using antibody-reactive peptide scanning of this repeat region, we demonstrated that the amino acids defining the epitopes for MAbs 3B1.5, 5B1.1, and 10C6.6 are EQP, GK, and KEQPA, respectively. Peptide scanning analysis of an infected patient's serum antibody response showed that the dominant epitope was defined by the sequence PAGK. Mapping of these continuous epitopes revealed overlap between all MAb and patient polyclonal antibody binding sites, thus explaining the ability of a single MAb to apparently block all polyclonal antibody binding sites. We also show that a single amino acid difference in the sequence of the repeats of serovars 3 and 14 accounts for the lack of reactivity with serovar 14 of two of the serovar 3-specific MAbs. Finally, the data demonstrate the need to obtain the sequences of the mba genes of all serovars before an effective serovar-specific antibody detection method can be developed. PMID:8914774

  5. CAZymes Analysis Toolkit (CAT): Webservice for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Karpinets, Tatiana V; Park, Byung; Syed, Mustafa H

    2010-01-01

    The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite rich and invaluable information stored in the database, software tools utilizing this information for annotation of newly sequenced genomes by CAZy families are limited. We have employed two annotation approaches to fill the gap between manually curated high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome or metagenome sequencing projects. The first approach is based on a similarity search against the entire non-redundant sequences of the CAZy database. The second approach performs annotation using links or correspondences between the CAZy families and protein family domains. The links were discovered using the association rule learning algorithm applied to sequences from the CAZy database. The approaches complement each other and in combination achieved high specificity and sensitivity when cross-evaluated with the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. The capability of the proposed framework to predict the function of unknown protein domains (DUF) and of hypothetical proteins in the genome of Neurospora crassa is demonstrated. The framework is implemented as a Web service, the CAZymes Analysis Toolkit (CAT), and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi.

  6. CAZymes Analysis Toolkit (CAT): web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database.

    PubMed

    Park, Byung H; Karpinets, Tatiana V; Syed, Mustafa H; Leuze, Michael R; Uberbacher, Edward C

    2010-12-01

    The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite rich and invaluable information stored in the database, software tools utilizing this information for annotation of newly sequenced genomes by CAZy families are limited. We have employed two annotation approaches to fill the gap between manually curated high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome or metagenome sequencing projects. The first approach is based on a similarity search against the entire nonredundant sequences of the CAZy database. The second approach performs annotation using links or correspondences between the CAZy families and protein family domains. The links were discovered using the association rule learning algorithm applied to sequences from the CAZy database. The approaches complement each other and in combination achieved high specificity and sensitivity when cross-evaluated with the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. The capability of the proposed framework to predict the function of unknown protein domains and of hypothetical proteins in the genome of Neurospora crassa is demonstrated. The framework is implemented as a Web service, the CAZymes Analysis Toolkit, and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi.
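
    The second annotation approach, linking protein family domains to CAZy families by association rule learning, can be illustrated with a minimal sketch. This is not the CAT implementation: the function name `mine_rules`, the support and confidence thresholds, and the toy labels (`domA`, `famX`, and so on) are all assumptions made for illustration. It only scores single-domain rules of the form "domain implies family" by support and confidence.

```python
from collections import Counter

def mine_rules(proteins, min_support=2, min_confidence=0.8):
    """Score rules 'domain -> family' from (domains, family) records.

    proteins: iterable of (iterable_of_domain_labels, family_label).
    Returns {(domain, family): confidence} for rules passing both thresholds.
    All names and thresholds here are hypothetical.
    """
    domain_count = Counter()   # how many proteins carry each domain
    pair_count = Counter()     # how many carry the domain AND belong to the family
    for domains, family in proteins:
        for d in set(domains):
            domain_count[d] += 1
            pair_count[(d, family)] += 1

    rules = {}
    for (d, fam), n in pair_count.items():
        confidence = n / domain_count[d]
        if n >= min_support and confidence >= min_confidence:
            rules[(d, fam)] = confidence
    return rules

# Toy training set with made-up labels.
training = [
    ({"domA"}, "famX"),
    ({"domA", "domB"}, "famX"),
    ({"domB"}, "famY"),
    ({"domC"}, "famZ"),
]
rules = mine_rules(training)
```

    With these toy records only the rule from `domA` to `famX` survives, because `domB` is split between two families and `domC` is seen once. A new protein carrying `domA` would then be assigned to `famX` by rule lookup.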

  7. Molecular Characterization of Transgene Integration by Next-Generation Sequencing in Transgenic Cattle

    PubMed Central

    Zhang, Ran; Yin, Yinliang; Zhang, Yujun; Li, Kexin; Zhu, Hongxia; Gong, Qin; Wang, Jianwu; Hu, Xiaoxiang; Li, Ning

    2012-01-01

    As the number of transgenic livestock increases, reliable detection and molecular characterization of transgene integration sites and copy number are crucial not only for interpreting the relationship between the integration site and the specific phenotype but also for commercial and economic demands. However, the ability of conventional PCR techniques to detect incomplete and multiple integration events is limited, making it technically challenging to characterize transgenes. Next-generation sequencing has enabled cost-effective, routine and widespread high-throughput genomic analysis. Here, we demonstrate the use of next-generation sequencing to extensively characterize cattle harboring a 150-kb human lactoferrin transgene that was initially analyzed by chromosome walking without success. Using this approach, the sites upstream and downstream of the target gene integration site in the host genome were identified at the single nucleotide level. The sequencing result was verified by event-specific PCR for the integration sites and FISH for the chromosomal location. Sequencing depth analysis revealed that multiple copies of the incomplete target gene and the vector backbone were present in the host genome. Upon integration, complex recombination was also observed between the target gene and the vector backbone. These findings indicate that next-generation sequencing is a reliable and accurate approach for the molecular characterization of the transgene sequence, integration sites and copy number in transgenic species. PMID:23185606
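
    The abstract does not say how the sequencing depth analysis was carried out. One common way to turn read depth into a copy-number estimate is to compare the median depth over the inserted sequence with the median depth over single-copy host regions. The sketch below assumes that approach, a diploid host, and made-up depth values; `estimate_copy_number` is a hypothetical helper, not the authors' pipeline.

```python
from statistics import median

def estimate_copy_number(region_depths, background_depths, background_copies=2):
    """Estimate copies per cell of a region from per-base read depths.

    Assumes the background positions are present at `background_copies`
    copies per cell (2 for autosomal loci in a diploid host) and that
    depth scales linearly with copy number.
    """
    background = median(background_depths)
    if background <= 0:
        raise ValueError("background depth must be positive")
    return background_copies * median(region_depths) / background

# Made-up depths: the transgene is covered about twice as deeply as the host.
copies = estimate_copy_number([60, 62, 58], [30, 31, 29])
```

    Medians are used rather than means so that a few repeat-rich or poorly mapped positions do not dominate the estimate.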

  8. Distribution of a Nocardia brasiliensis catalase gene fragment in members of the genera Nocardia, Gordona, and Rhodococcus.

    PubMed

    Vera-Cabrera, L; Johnson, W M; Welsh, O; Resendiz-Uresti, F L; Salinas-Carmona, M C

    1999-06-01

    An immunodominant protein from Nocardia brasiliensis, P61, was subjected to amino-terminal and internal sequence analysis. Three sequences of 22, 17, and 38 residues, respectively, were obtained and compared with the protein database from GenBank by using the BLAST system. The sequences showed homology to some eukaryotic catalases and to a bromoperoxidase-catalase from Streptomyces violaceus. Its identity as a catalase was confirmed by analysis of its enzymatic activity on H2O2 and by a double-staining method on a nondenaturing polyacrylamide gel with 3,3'-diaminobenzidine and ferricyanide; the result showed only catalase activity, but no peroxidase. By using one of the internal amino acid sequences and a consensus catalase motif (VGNNTP), we were able to design a PCR assay that generated a 500-bp PCR product. The amplicon was analyzed, and the nucleotide sequence was compared to the GenBank database with the observation of high homology to other bacterial and eukaryotic catalases. A PCR assay based on this target sequence was performed with primers NB10 and NB11 to confirm the presence of the NB10-NB11 gene fragment in several N. brasiliensis strains isolated from mycetoma. The same assay was used to determine whether there were homologous sequences in several type strains from the genera Nocardia, Rhodococcus, Gordona, and Streptomyces. All of the N. brasiliensis strains presented a positive result but only some of the actinomycetes species tested were positive in the PCR assay. In order to confirm these findings, genomic DNA was subjected to Southern blot analysis. A 1.7-kbp band was observed in the N. brasiliensis strains, and bands of different molecular weight were observed in cross-reacting actinomycetes. Sequence analysis of the amplicons of selected actinomycetes showed high homology in this catalase fragment, thus demonstrating that this protein is highly conserved in this group of bacteria.

  9. High-throughput sequencing enhanced phage display enables the identification of patient-specific epitope motifs in serum.

    PubMed

    Christiansen, Anders; Kringelum, Jens V; Hansen, Christian S; Bøgh, Katrine L; Sullivan, Eric; Patel, Jigar; Rigby, Neil M; Eiwegger, Thomas; Szépfalusi, Zsolt; de Masi, Federico; Nielsen, Morten; Lund, Ole; Dufva, Martin

    2015-08-06

    Phage display is a prominent screening technique with a multitude of applications including therapeutic antibody development and mapping of antigen epitopes. In this study, phages were selected based on their interaction with patient serum and exhaustively characterised by high-throughput sequencing. A bioinformatics approach was developed in order to identify peptide motifs of interest based on clustering and contrasting to control samples. Comparison of patient and control samples confirmed a major issue in phage display, namely the selection of unspecific peptides. The potential of the bioinformatic approach was demonstrated by identifying epitopes of a prominent peanut allergen, Ara h 1, in sera from patients with severe peanut allergy. The identified epitopes were confirmed by high-density peptide micro-arrays. The present study demonstrates that high-throughput sequencing can empower phage display by (i) enabling the analysis of complex biological samples, (ii) circumventing the traditional laborious picking and functional testing of individual phage clones and (iii) reducing the number of selection rounds.

  10. An integrated native mass spectrometry and top-down proteomics method that connects sequence to structure and function of macromolecular complexes

    NASA Astrophysics Data System (ADS)

    Li, Huilin; Nguyen, Hong Hanh; Ogorzalek Loo, Rachel R.; Campuzano, Iain D. G.; Loo, Joseph A.

    2018-02-01

    Mass spectrometry (MS) has become a crucial technique for the analysis of protein complexes. Native MS has traditionally examined protein subunit arrangements, while proteomics MS has focused on sequence identification. These two techniques are usually performed separately without taking advantage of the synergies between them. Here we describe the development of an integrated native MS and top-down proteomics method using Fourier-transform ion cyclotron resonance (FTICR) to analyse macromolecular protein complexes in a single experiment. We address previous concerns of employing FTICR MS to measure large macromolecular complexes by demonstrating the detection of complexes up to 1.8 MDa, and we demonstrate the efficacy of this technique for direct acquirement of sequence to higher-order structural information with several large complexes. We then summarize the unique functionalities of different activation/dissociation techniques. The platform expands the ability of MS to integrate proteomics and structural biology to provide insights into protein structure, function and regulation.

  11. Molecular cloning, structural analysis, and expression in Escherichia coli of a chitinase gene from Enterobacter agglomerans.

    PubMed Central

    Chernin, L S; De la Fuente, L; Sobolev, V; Haran, S; Vorgias, C E; Oppenheim, A B; Chet, I

    1997-01-01

    The gene chiA, which codes for endochitinase, was cloned from a soilborne Enterobacter agglomerans. Its complete sequence was determined, and the deduced amino acid sequence of the enzyme designated Chia_Entag yielded an open reading frame coding for 562 amino acids of a 61-kDa precursor protein with a putative leader peptide at its N terminus. The nucleotide and polypeptide sequences of Chia_Entag showed 86.8 and 87.7% identity with the corresponding gene and enzyme, Chia_Serma, of Serratia marcescens, respectively. Homology modeling of Chia_Entag's three-dimensional structure demonstrated that most amino acid substitutions are at solvent-accessible sites. Escherichia coli JM109 carrying the E. agglomerans chiA gene produced and secreted Chia_Entag. The antifungal activity of the secreted endochitinase was demonstrated in vitro by inhibition of Fusarium oxysporum spore germination. The transformed strain inhibited Rhizoctonia solani growth on plates and the root rot disease caused by this fungus in cotton seedlings under greenhouse conditions. PMID:9055404

  12. International Interlaboratory Digital PCR Study Demonstrating High Reproducibility for the Measurement of a Rare Sequence Variant.

    PubMed

    Whale, Alexandra S; Devonshire, Alison S; Karlin-Neumann, George; Regan, Jack; Javier, Leanne; Cowen, Simon; Fernandez-Gonzalez, Ana; Jones, Gerwyn M; Redshaw, Nicholas; Beck, Julia; Berger, Andreas W; Combaret, Valérie; Dahl Kjersgaard, Nina; Davis, Lisa; Fina, Frederic; Forshew, Tim; Fredslund Andersen, Rikke; Galbiati, Silvia; González Hernández, Álvaro; Haynes, Charles A; Janku, Filip; Lacave, Roger; Lee, Justin; Mistry, Vilas; Pender, Alexandra; Pradines, Anne; Proudhon, Charlotte; Saal, Lao H; Stieglitz, Elliot; Ulrich, Bryan; Foy, Carole A; Parkes, Helen; Tzonev, Svilen; Huggett, Jim F

    2017-02-07

    This study tested the claim that digital PCR (dPCR) can offer highly reproducible quantitative measurements in disparate laboratories. Twenty-one laboratories measured four blinded samples containing different quantities of a KRAS fragment encoding G12D, an important genetic marker for guiding therapy of certain cancers. This marker is challenging to quantify reproducibly using quantitative PCR (qPCR) or next generation sequencing (NGS) due to the presence of competing wild type sequences and the need for calibration. Using dPCR, 18 laboratories were able to quantify the G12D marker within 12% of each other in all samples. Three laboratories appeared to measure consistently outlying results; however, proper application of a follow-up analysis recommendation rectified their data. Our findings show that dPCR has demonstrable reproducibility across a large number of laboratories without calibration. This could enable the reproducible application of molecular stratification to guide therapy and, potentially, for molecular diagnostics.

  13. Implicit sequence-specific motor learning after sub-cortical stroke is associated with increased prefrontal brain activations: An fMRI study

    PubMed Central

    Meehan, Sean K.; Randhawa, Bubblepreet; Wessel, Brenda; Boyd, Lara A.

    2010-01-01

    Implicit motor learning is preserved after stroke, but how the brain compensates for damage to facilitate learning is unclear. We used a random effects analysis to determine how stroke alters patterns of brain activity during implicit sequence-specific motor learning as compared to general improvements in motor control. Nine healthy participants and 9 individuals with chronic, right focal sub-cortical stroke performed a continuous joystick-based tracking task during an initial fMRI session, over 5 days of practice, and a retention test during a separate fMRI session. Sequence-specific implicit motor learning was differentiated from general improvements in motor control by comparing tracking performance on novel, repeated tracking sequences during early practice and again at the retention test. Both groups demonstrated implicit sequence-specific motor learning at the retention test, yet substantial differences were apparent. At retention, healthy control participants demonstrated increased BOLD response in left dorsal premotor cortex (BA 6) but decreased BOLD response in left dorsolateral prefrontal cortex (DLPFC; BA 9) during repeated sequence tracking. In contrast, at retention individuals with stroke did not show this reduction in DLPFC during repeated tracking. Instead, implicit sequence-specific motor learning and general improvements in motor control were associated with increased BOLD response in the left middle frontal gyrus (BA 8), regardless of sequence type after stroke. These data emphasize the potential importance of a prefrontal-based attentional network for implicit motor learning after stroke. The present study is the first to highlight the importance of the prefrontal cortex for implicit sequence-specific motor learning after stroke. PMID:20725908

  14. RNA-Seq analysis and transcriptome assembly for blackberry (Rubus sp. Var. Lochness) fruit.

    PubMed

    Garcia-Seco, Daniel; Zhang, Yang; Gutierrez-Mañero, Francisco J; Martin, Cathie; Ramos-Solano, Beatriz

    2015-01-22

    There is an increasing interest in berries, especially blackberries in the diet, because of recent reports of their health benefits due to their high content of flavonoids. A broad range of genomic tools are available for other Rosaceae species but these tools are still lacking in the Rubus genus, thus limiting gene discovery and the breeding of improved varieties. De novo RNA-seq of ripe blackberries grown under field conditions was performed using Illumina Hiseq 2000. Almost 9 billion nucleotide bases were sequenced in total. Following assembly, 42,062 consensus sequences were detected. For functional annotation, 33,040 (NR), 32,762 (NT), 21,932 (Swiss-Prot), 20,134 (KEGG), 13,676 (COG), 24,168 (GO) consensus sequences were annotated using different databases; in total 34,552 annotated sequences were identified. For protein prediction analysis, the number of coding DNA sequences (CDS) that mapped to the protein database was 32,540. Non-redundant (NR) annotation showed that 25,418 genes (73.5%) have the highest similarity with Fragaria vesca subspecies vesca. Reanalysis was undertaken by aligning the reads with this reference genome for a deeper analysis of the transcriptome. We demonstrated that de novo assembly, using Trinity and later annotation with Blast using different databases, were complementary to alignment to the reference sequence using SOAPaligner/SOAP2. The Fragaria reference genome belongs to a species in the same family as blackberry (Rosaceae) but to a different genus. Since blackberries are tetraploids, the possibility of artefactual gene chimeras resulting from mis-assembly was tested with one of the genes sequenced by RNAseq, Chalcone Synthase (CHS). cDNAs encoding this protein were cloned and sequenced. Primers designed to the assembled sequences accurately distinguished different contigs, at least for chalcone synthase genes. We prepared and analysed transcriptome data from ripe blackberries, for which prior genomic information was limited. This new sequence information will improve the knowledge of this important and healthy fruit, providing an invaluable new tool for biological research.

  15. Pseudomonas caspiana sp. nov., a citrus pathogen in the Pseudomonas syringae phylogenetic group.

    PubMed

    Busquets, Antonio; Gomila, Margarita; Beiki, Farid; Mulet, Magdalena; Rahimian, Heshmat; García-Valdés, Elena; Lalucat, Jorge

    2017-07-01

    In a screening by multilocus sequence analysis of Pseudomonas strains isolated from diverse origins, 4 phylogenetically closely related strains (FBF58, FBF102T, FBF103, and FBF122) formed a well-defined cluster in the Pseudomonas syringae phylogenetic group. The strains were isolated from citrus orchards in northern Iran with disease symptoms in the leaves and stems, and their pathogenicity against citrus plants was demonstrated. The whole genome of the type strain of the proposed new species (FBF102T = CECT 9164T = CCUG 69273T) was sequenced and characterized. Comparative genomics with the 14 known Pseudomonas species type strains of the P. syringae phylogenetic group demonstrated that this strain belonged to a new genomic species, different from the species described thus far. Genome analysis detected genes predicted to be involved in pathogenesis, such as an atypical type 3 secretion system and two type 6 secretion systems, together with effectors and virulence factors. A polyphasic taxonomic characterization demonstrated that the 4 plant pathogenic strains represented a new species, for which the name Pseudomonas caspiana sp. nov. is proposed.

  16. DUK - A Fast and Efficient Kmer Based Sequence Matching Tool

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Mingkun; Copeland, Alex; Han, James

    2011-03-21

    A new tool, DUK, has been developed to perform the matching task. Matching determines whether a query sequence partially or totally matches given reference sequences. Matching is similar to alignment; indeed, many traditional analysis tasks such as contaminant removal use alignment tools. For matching, however, there is no need to know which bases of a query sequence match which positions of a reference sequence; it is only necessary to know whether a match exists. This subtle difference can make the matching task much faster than alignment. DUK is accurate, versatile, fast, and memory-efficient. It uses a k-mer hashing method to index reference sequences and a Poisson model to calculate a p-value. DUK is carefully implemented in C++ with an object-oriented design. The resulting classes can also be used to develop other tools quickly. DUK has been widely used at JGI for a wide range of applications such as contaminant removal, organelle genome separation, and assembly refinement. Many real applications and simulated datasets demonstrate its power.
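
    The idea of matching without alignment can be sketched in a few lines. The code below is an illustration in Python, not DUK itself (which the abstract describes as a C++ tool): the function names, the choice of k, and in particular the way the Poisson expectation is computed (query k-mer count times the fraction of all 4^k k-mers present in the index) are assumptions, since the abstract does not specify DUK's model.

```python
from math import exp

def kmers(seq, k):
    """Yield every k-length substring of seq, uppercased."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_index(references, k):
    """Hash all reference k-mers into a set; positions are deliberately not kept."""
    index = set()
    for ref in references:
        index.update(kmers(ref, k))
    return index

def count_hits(query, index, k):
    """Return (query k-mers found in the index, total query k-mers)."""
    hits = total = 0
    for km in kmers(query, k):
        total += 1
        hits += km in index
    return hits, total

def poisson_tail(hits, expected):
    """P(X >= hits) for X ~ Poisson(expected)."""
    if hits <= 0:
        return 1.0
    term = exp(-expected)      # P(X = 0)
    cdf = 0.0
    for i in range(hits):      # accumulate P(X <= hits - 1)
        cdf += term
        term *= expected / (i + 1)
    return max(0.0, 1.0 - cdf)

k = 4
index = build_index(["ACGTACGTTAGC"], k)
hits, total = count_hits("TTACGTAC", index, k)
# Assumed null model: each query k-mer hits the index by chance
# with probability len(index) / 4**k.
p_value = poisson_tail(hits, total * len(index) / 4 ** k)
```

    Because the index stores only the set of k-mers, a lookup answers "is there a match" without ever computing where the match lies, which is the distinction from alignment that the abstract draws.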

  17. Comparison of the nucleotide and amino acid sequences of the RsrI and EcoRI restriction endonucleases.

    PubMed

    Stephenson, F H; Ballard, B T; Boyer, H W; Rosenberg, J M; Greene, P J

    1989-12-21

    The RsrI endonuclease, a type-II restriction endonuclease (ENase) found in Rhodobacter sphaeroides, is an isoschizomer of the EcoRI ENase. A clone containing an 11-kb BamHI fragment was isolated from an R. sphaeroides genomic DNA library by hybridization with synthetic oligodeoxyribonucleotide probes based on the N-terminal amino acid (aa) sequence of RsrI. Extracts of E. coli containing a subclone of the 11-kb fragment display RsrI activity. Nucleotide sequence analysis reveals an 831-bp open reading frame encoding a polypeptide of 277 aa. A 50% identity exists within a 266-aa overlap between the deduced aa sequences of RsrI and EcoRI. Regions of 75-100% aa sequence identity correspond to key structural and functional regions of EcoRI. The type-II ENases have many common properties, and a common origin might have been expected. Nevertheless, this is the first demonstration of aa sequence similarity between ENases produced by different organisms.

  18. Flexible, fast and accurate sequence alignment profiling on GPGPU with PaSWAS.

    PubMed

    Warris, Sven; Yalcin, Feyruz; Jackson, Katherine J L; Nap, Jan Peter

    2015-01-01

    To obtain large-scale sequence alignments in a fast and flexible way is an important step in the analyses of next generation sequencing data. Applications based on the Smith-Waterman (SW) algorithm are often either not fast enough, limited to dedicated tasks or not sufficiently accurate due to statistical issues. Current SW implementations that run on graphics hardware do not report the alignment details necessary for further analysis. With the Parallel SW Alignment Software (PaSWAS) it is possible (a) to have easy access to the computational power of NVIDIA-based general purpose graphics processing units (GPGPUs) to perform high-speed sequence alignments, and (b) retrieve relevant information such as score, number of gaps and mismatches. The software reports multiple hits per alignment. The added value of the new SW implementation is demonstrated with two test cases: (1) tag recovery in next generation sequence data and (2) isotype assignment within an immunoglobulin 454 sequence data set. Both cases show the usability and versatility of the new parallel Smith-Waterman implementation.
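
    For readers unfamiliar with what "alignment details" means here, the following is a small CPU-only sketch of Smith-Waterman local alignment that reports the score together with match, mismatch and gap counts from the traceback. It is not PaSWAS code and does not use a GPU; the scoring values (match 2, mismatch -1, linear gap -2) and the function name are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment of a and b with a linear gap penalty.

    Returns the best score plus the numbers of matches, mismatches and
    gaps along one optimal local alignment (found by traceback).
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the scoring matrix; cells never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub(i, j),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    matches = mismatches = gaps = 0
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(i, j):
            if a[i - 1] == b[j - 1]:
                matches += 1
            else:
                mismatches += 1
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            gaps += 1
            i -= 1
        else:
            gaps += 1
            j -= 1
    return {"score": best, "matches": matches,
            "mismatches": mismatches, "gaps": gaps}

result = smith_waterman("ACGTTGCA", "ACGTAGCA")
```

    Each cell of the matrix depends only on its upper, left and upper-left neighbours, so cells on the same anti-diagonal are independent of one another; this is the property that parallel implementations typically exploit.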

  19. DraGnET: Software for storing, managing and analyzing annotated draft genome sequence data

    PubMed Central

    2010-01-01

    Background: New "next generation" DNA sequencing technologies offer individual researchers the ability to rapidly generate large amounts of genome sequence data at dramatically reduced costs. As a result, a need has arisen for new software tools for storage, management and analysis of genome sequence data. Although bioinformatic tools are available for the analysis and management of genome sequences, limitations still remain. For example, restrictions on the submission of data and use of these tools may be imposed, thereby making them unsuitable for sequencing projects that need to remain in-house or proprietary during their initial stages. Furthermore, the availability and use of next generation sequencing in industrial, governmental and academic environments requires biologists to have access to computational support for the curation and analysis of the data generated; however, this type of support is not always immediately available. Results: To address these limitations, we have developed DraGnET (Draft Genome Evaluation Tool). DraGnET is an open source web application which allows researchers, with no experience in programming and database management, to set up their own in-house projects for storing, retrieving, organizing and managing annotated draft and complete genome sequence data. The software provides a web interface for the use of BLAST, allowing users to perform preliminary comparative analysis among multiple genomes. We demonstrate the utility of DraGnET for performing comparative genomics on closely related bacterial strains. Furthermore, DraGnET can be further developed to incorporate additional tools for more sophisticated analyses. Conclusions: DraGnET is designed for use either by individual researchers or as a collaborative tool available through Internet (or Intranet) deployment. For genome projects that require genome sequencing data to initially remain proprietary, DraGnET provides the means for researchers to keep their data in-house for analysis using local programs or until it is made publicly available, at which point it may be uploaded to additional analysis software applications. The DraGnET home page is available at http://www.dragnet.cvm.iastate.edu and includes example files for examining the functionalities, a link for downloading the DraGnET setup package and a link to the DraGnET source code hosted with full documentation on SourceForge. PMID:20175920

  20. Integration of Bioinformatics and Synthetic Promoters Leads to the Discovery of Novel Elicitor-Responsive cis-Regulatory Sequences in Arabidopsis

    PubMed Central

    Koschmann, Jeannette; Machens, Fabian; Becker, Marlies; Niemeyer, Julia; Schulze, Jutta; Bülow, Lorenz; Stahl, Dietmar J.; Hehl, Reinhard

    2012-01-01

    A combination of bioinformatic tools, high-throughput gene expression profiles, and the use of synthetic promoters is a powerful approach to discover and evaluate novel cis-sequences in response to specific stimuli. With Arabidopsis (Arabidopsis thaliana) microarray data annotated to the PathoPlant database, 732 different queries with a focus on fungal and oomycete pathogens were performed, leading to 510 up-regulated gene groups. Using the binding site estimation suite of tools, BEST, 407 conserved sequence motifs were identified in promoter regions of these coregulated gene sets. Motif similarities were determined with STAMP, classifying the 407 sequence motifs into 37 families. A comparative analysis of these 37 families with the AthaMap, PLACE, and AGRIS databases revealed similarities to known cis-elements but also led to the discovery of cis-sequences not yet implicated in pathogen response. Using a parsley (Petroselinum crispum) protoplast system and a modified reporter gene vector with an internal transformation control, 25 elicitor-responsive cis-sequences from 10 different motif families were identified. Many of the elicitor-responsive cis-sequences also drive reporter gene expression in an Agrobacterium tumefaciens infection assay in Nicotiana benthamiana. This work significantly increases the number of known elicitor-responsive cis-sequences and demonstrates the successful integration of a diverse set of bioinformatic resources combined with synthetic promoter analysis for data mining and functional screening in plant-pathogen interaction. PMID:22744985

  1. Investigation of bacterial and archaeal communities: novel protocols using modern sequencing by Illumina MiSeq and traditional DGGE-cloning.

    PubMed

    Kraková, Lucia; Šoltys, Katarína; Budiš, Jaroslav; Grivalský, Tomáš; Ďuriš, František; Pangallo, Domenico; Szemes, Tomáš

    2016-09-01

    Different protocols based on Illumina high-throughput DNA sequencing and denaturing gradient gel electrophoresis (DGGE)-cloning were developed and applied for investigating hot spring related samples. The study was focused on three target genes: archaeal and bacterial 16S rRNA and mcrA of methanogenic microflora. Shorter read lengths of the currently most popular technology of sequencing by Illumina do not allow analysis of the complete 16S rRNA region, or of longer gene fragments, as was the case of Sanger sequencing. Here, we demonstrate that there is no need for special indexed or tailed primer sets dedicated to short variable regions of 16S rRNA since the presented approach allows the analysis of complete bacterial 16S rRNA amplicons (V1-V9) and longer archaeal 16S rRNA and mcrA sequences. Sample augmented with transposon is represented by a set of approximately 300 bp long fragments that can be easily sequenced by Illumina MiSeq. Furthermore, a low proportion of chimeric sequences was observed. DGGE-cloning based strategies were performed combining semi-nested PCR, DGGE and clone library construction. Comparing both investigation methods, a certain degree of complementarity was observed confirming that the DGGE-cloning approach is not obsolete. Novel protocols were created for several types of laboratories, utilizing the traditional DGGE technique or using the most modern Illumina sequencing.

  2. Application of time-resolved shadowgraph imaging and computer analysis to study micrometer-scale response of superfluid helium

    NASA Astrophysics Data System (ADS)

    Sajjadi, Seyed; Buelna, Xavier; Eloranta, Jussi

    2018-01-01

    Application of inexpensive light emitting diodes as backlight sources for time-resolved shadowgraph imaging is demonstrated. The two light sources tested are able to produce light pulse sequences in the nanosecond and microsecond time regimes. After determining their time response characteristics, the diodes were applied to study the gas bubble formation around laser-heated copper nanoparticles in superfluid helium at 1.7 K and to determine the local cavitation bubble dynamics around fast moving metal micro-particles in the liquid. A convolutional neural network algorithm for analyzing the shadowgraph images by a computer is presented and the method is validated against the results from manual image analysis. The second application employed the red-green-blue light emitting diode source that produces light pulse sequences of the individual colors such that three separate shadowgraph frames can be recorded onto the color pixels of a charge-coupled device camera. Such an image sequence can be used to determine the moving object geometry, local velocity, and acceleration/deceleration. These data can be used to calculate, for example, the instantaneous Reynolds number for the liquid flow around the particle. Although specifically demonstrated for superfluid helium, the technique can be used to study the dynamic response of any medium that exhibits spatial variations in the index of refraction.
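
    The step from three color-separated frames to velocity, acceleration and Reynolds number can be sketched with finite differences. This is an assumed reconstruction, not the authors' analysis code: the function names are hypothetical, the object is reduced to one coordinate along its direction of travel, and the fluid properties passed to `reynolds_number` must be supplied by the user for the medium and temperature of interest.

```python
def kinematics(positions_m, times_s):
    """Velocity and acceleration from three (position, time) samples.

    Returns (v01, v12, accel): the mean velocity over each of the two
    intervals and the acceleration between them.
    """
    x0, x1, x2 = positions_m
    t0, t1, t2 = times_s
    v01 = (x1 - x0) / (t1 - t0)
    v12 = (x2 - x1) / (t2 - t1)
    # v01 and v12 are centred on the interval midpoints,
    # which are (t2 - t0) / 2 apart in time.
    accel = (v12 - v01) / ((t2 - t0) / 2)
    return v01, v12, accel

def reynolds_number(density, speed, diameter, viscosity):
    """Re = rho * |v| * d / mu, all quantities in SI units."""
    return density * abs(speed) * diameter / viscosity

# Illustrative numbers only: positions in metres, times in seconds.
v01, v12, accel = kinematics((0.0, 1.0, 3.0), (0.0, 1.0, 2.0))
```

    Three frames are the minimum needed for an acceleration estimate, which is why recording one frame per color channel in a single exposure is enough to recover it.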

  3. The Oxidosqualene Cyclase from the Oomycete Saprolegnia parasitica Synthesizes Lanosterol as a Single Product

    PubMed Central

    Dahlin, Paul; Srivastava, Vaibhav; Bulone, Vincent; McKee, Lauren S.

    2016-01-01

    The first committed step of sterol biosynthesis is the cyclisation of 2,3-oxidosqualene to form either lanosterol (LA) or cycloartenol (CA). This is catalyzed by an oxidosqualene cyclase (OSC). LA and CA are subsequently converted into various sterols by a series of enzyme reactions. The specificity of the OSC therefore determines the final composition of the end sterols of an organism. Despite the functional importance of OSCs, the determinants of their specificity are not well understood. In sterol-synthesizing oomycetes, recent bioinformatics and metabolite analyses suggest that LA is produced. However, this catalytic activity has never been experimentally demonstrated. Here, we show that the OSC of the oomycete Saprolegnia parasitica, a severe pathogen of salmonid fish, has an uncommon sequence in a conserved motif important for specificity. We present phylogenetic analysis revealing that this sequence is common to sterol-synthesizing oomycetes, as well as some plants, and hypothesize as to the evolutionary origin of some microbial sequences. We also demonstrate for the first time that a recombinant form of the OSC from S. parasitica produces LA exclusively. Our data pave the way for a detailed structural characterization of the protein and the possible development of specific inhibitors of oomycete OSCs for disease control in aquaculture. PMID:27881978

  4. BAC-pool sequencing and analysis confirms growth-associated QTLs in the Asian seabass genome.

    PubMed

    Shen, Xueyan; Ngoh, Si Yan; Thevasagayam, Natascha May; Prakki, Sai Rama Sridatta; Bhandare, Pranjali; Tan, Andy Wee Kiat; Tan, Gui Quan; Singh, Siddharth; Phua, Norman Chun Han; Vij, Shubha; Orbán, László

    2016-11-08

    The Asian seabass is an important marine food fish that has been cultured for several decades in Asia Pacific. However, the lack of a high quality reference genome has hampered efforts to improve its selective breeding. A 3D BAC pool set generated in this study was screened using 22 SSR markers located on linkage group 2 which contains a growth-related QTL region. Seventy-two clones corresponding to 22 FPC contigs were sequenced by Illumina MiSeq technology. We co-assembled the MiSeq-derived scaffolds from each FPC contig with error-corrected PacBio reads, resulting in 187 sequences covering 9.7 Mb. Eleven genes annotated within this region were found to be potentially associated with growth and their tissue-specific expression was investigated. Correlation analysis demonstrated that SNPs in ctsb, skp1 and ppp2ca can be potentially used as markers for selecting fast-growing fingerlings. Conserved syntenies between seabass LG2 and five other teleosts were identified. This study i) provided a 10 Mb targeted genome assembly; ii) demonstrated NGS of BAC pools as a potential approach for mining candidates underlying QTLs of this species; iii) detected eleven genes potentially responsible for growth in the QTL region; and iv) identified useful SNP markers for selective breeding programs of Asian seabass.

  5. Development and validation of a whole-exome sequencing test for simultaneous detection of point mutations, indels and copy-number alterations for precision cancer care

    PubMed Central

    Rennert, Hanna; Eng, Kenneth; Zhang, Tuo; Tan, Adrian; Xiang, Jenny; Romanel, Alessandro; Kim, Robert; Tam, Wayne; Liu, Yen-Chun; Bhinder, Bhavneet; Cyrta, Joanna; Beltran, Himisha; Robinson, Brian; Mosquera, Juan Miguel; Fernandes, Helen; Demichelis, Francesca; Sboner, Andrea; Kluk, Michael; Rubin, Mark A; Elemento, Olivier

    2016-01-01

    We describe Exome Cancer Test v1.0 (EXaCT-1), the first New York State-Department of Health-approved whole-exome sequencing (WES)-based test for precision cancer care. EXaCT-1 uses HaloPlex (Agilent) target enrichment followed by next-generation sequencing (Illumina) of tumour and matched constitutional control DNA. We present a detailed clinical development and validation pipeline suitable for simultaneous detection of somatic point/indel mutations and copy-number alterations (CNAs). A computational framework for data analysis, reporting and sign-out is also presented. For the validation, we tested EXaCT-1 on 57 tumours covering five distinct clinically relevant mutations. Results demonstrated elevated and uniform coverage compatible with clinical testing as well as complete concordance in variant quality metrics between formalin-fixed paraffin embedded and fresh-frozen tumours. Extensive sensitivity studies identified limits of detection threshold for point/indel mutations and CNAs. Prospective analysis of 337 cancer cases revealed mutations in clinically relevant genes in 82% of tumours, demonstrating that EXaCT-1 is an accurate and sensitive method for identifying actionable mutations, with reasonable costs and time, greatly expanding its utility for advanced cancer care. PMID:28781886

  6. The Oxidosqualene Cyclase from the Oomycete Saprolegnia parasitica Synthesizes Lanosterol as a Single Product.

    PubMed

    Dahlin, Paul; Srivastava, Vaibhav; Bulone, Vincent; McKee, Lauren S

    2016-01-01

    The first committed step of sterol biosynthesis is the cyclisation of 2,3-oxidosqualene to form either lanosterol (LA) or cycloartenol (CA). This is catalyzed by an oxidosqualene cyclase (OSC). LA and CA are subsequently converted into various sterols by a series of enzyme reactions. The specificity of the OSC therefore determines the final composition of the end sterols of an organism. Despite the functional importance of OSCs, the determinants of their specificity are not well understood. In sterol-synthesizing oomycetes, recent bioinformatics, and metabolite analysis suggest that LA is produced. However, this catalytic activity has never been experimentally demonstrated. Here, we show that the OSC of the oomycete Saprolegnia parasitica, a severe pathogen of salmonid fish, has an uncommon sequence in a conserved motif important for specificity. We present phylogenetic analysis revealing that this sequence is common to sterol-synthesizing oomycetes, as well as some plants, and hypothesize as to the evolutionary origin of some microbial sequences. We also demonstrate for the first time that a recombinant form of the OSC from S. parasitica produces LA exclusively. Our data pave the way for a detailed structural characterization of the protein and the possible development of specific inhibitors of oomycete OSCs for disease control in aquaculture.

  7. Transcriptome Sequence and Plasmid Copy Number Analysis of the Brewery Isolate Pediococcus claussenii ATCC BAA-344T during Growth in Beer

    PubMed Central

    Pittet, Vanessa; Phister, Trevor G.; Ziola, Barry

    2013-01-01

    Growth of specific lactic acid bacteria in beer leads to spoiled product and economic loss for the brewing industry. Microbial growth is typically inhibited by the combined stresses found in beer (e.g., ethanol, hops, low pH, minimal nutrients); however, certain bacteria have adapted to grow in this harsh environment. Considering little is known about the mechanisms used by bacteria to grow in and spoil beer, transcriptome sequencing was performed on a variant of the beer-spoilage organism Pediococcus claussenii ATCC BAA-344T (Pc344-358). Illumina sequencing was used to compare the transcript levels in Pc344-358 growing mid-exponentially in beer to those in nutrient-rich MRS broth. Various operons demonstrated high gene expression in beer, several of which are involved in nutrient acquisition and overcoming the inhibitory effects of hop compounds. As well, genes functioning in cell membrane modification and biosynthesis demonstrated significantly higher transcript levels in Pc344-358 growing in beer. Three plasmids had the majority of their genes showing increased transcript levels in beer, whereas the two cryptic plasmids showed slightly decreased gene expression. Follow-up analysis of plasmid copy number in both growth environments revealed similar trends, where more copies of the three non-cryptic plasmids were found in Pc344-358 growing in beer. Transcriptome sequencing also enabled the addition of several genes to the P. claussenii ATCC BAA-344T genome annotation, some of which are putatively transcribed as non-coding RNAs. The sequencing results not only provide the first transcriptome description of a beer-spoilage organism while growing in beer, but they also highlight several targets for future exploration, including genes that may have a role in the general stress response of lactic acid bacteria. PMID:24040005

  8. PreCisIon: PREdiction of CIS-regulatory elements improved by gene's positION.

    PubMed

    Elati, Mohamed; Nicolle, Rémy; Junier, Ivan; Fernández, David; Fekih, Rim; Font, Julio; Képès, François

    2013-02-01

    Conventional approaches to predict transcriptional regulatory interactions usually rely on the definition of a shared motif sequence on the target genes of a transcription factor (TF). These efforts have been frustrated by the limited availability and accuracy of TF binding site motifs, usually represented as position-specific scoring matrices, which may match large numbers of sites and produce an unreliable list of target genes. To improve the prediction of binding sites, we propose to additionally use the unrelated knowledge of the genome layout. Indeed, it has been shown that co-regulated genes tend to be either neighbors or periodically spaced along the whole chromosome. This study demonstrates that respective gene positioning carries significant information. This novel type of information is combined with traditional sequence information by a machine learning algorithm called PreCisIon. To optimize this combination, PreCisIon builds a strong gene target classifier by adaptively combining weak classifiers based on either local binding sequence or global gene position. This strategy generically paves the way to the optimized incorporation of any future advances in gene target prediction based on local sequence, genome layout or on novel criteria. With the current state of the art, PreCisIon consistently improves methods based on sequence information only. This is shown by implementing a cross-validation analysis of the 20 major TFs from two phylogenetically remote model organisms. For Bacillus subtilis and Escherichia coli, respectively, PreCisIon achieves on average an area under the receiver operating characteristic curve of 70 and 60%, a sensitivity of 80 and 70% and a specificity of 60 and 56%. The newly predicted gene targets are demonstrated to be functionally consistent with previously known targets, as assessed by analysis of Gene Ontology enrichment or of the relevant literature and databases.

  9. Implemented Lomb-Scargle periodogram: a valuable tool for improving cyclostratigraphic research on unevenly sampled deep-sea stratigraphic sequences

    NASA Astrophysics Data System (ADS)

    Pardo-Iguzquiza, Eulogio; Rodríguez-Tovar, Francisco J.

    2011-12-01

    One important handicap when working with stratigraphic sequences is the discontinuous character of the sedimentary record, especially relevant in cyclostratigraphic analysis. Uneven palaeoclimatic/palaeoceanographic time series are common, their cyclostratigraphic analysis being comparatively difficult because most spectral methodologies are appropriate only when working with even sampling. As a means to solve this problem, a program for calculating the smoothed Lomb-Scargle periodogram and cross-periodogram, which additionally evaluates the statistical confidence of the estimated power spectrum through a Monte Carlo procedure (the permutation test), has been developed. The spectral analysis of a short uneven time series calls for assessment of the statistical significance of the spectral peaks, since a periodogram can always be calculated but the main challenge resides in identifying true spectral features. To demonstrate the effectiveness of this program, two case studies are presented: one deals with synthetic data and the other with palaeoceanographic/palaeoclimatic proxies. From a simulated time series of 500 data points, two uneven time series (with 100 and 25 data points) were generated by selecting data at random. Comparative analysis between the power spectra from the simulated series and from the two uneven time series demonstrates the usefulness of the smoothed Lomb-Scargle periodogram for uneven sequences, making it possible to distinguish between statistically significant and spurious spectral peaks. Fragmentary time series of Cd/Ca ratios and δ18O from core AII107-131 of SPECMAP were analysed as a real case study. The efficiency of the direct and cross Lomb-Scargle periodogram in recognizing Milankovitch and sub-Milankovitch signals related to palaeoclimatic/palaeoceanographic changes is demonstrated. As implemented, the Lomb-Scargle periodogram may be applied to any palaeoclimatic/palaeoceanographic proxies, including those usually recovered from contourites, and it holds special interest in the context of centennial- to millennial-scale climatic changes affecting contouritic currents.
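
    The idea of testing periodogram peaks of an unevenly sampled series against a permutation null can be sketched as follows. This is a simplified illustration using the classic, unsmoothed Lomb-Scargle formula with NumPy; it is not the program described in the abstract, and the function names, the number of permutations and the confidence level are assumptions.

```python
import numpy as np


def lomb_scargle(t, y, freqs):
    """Classic normalised Lomb-Scargle power at positive frequencies (cycles per time unit)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    var = y.var(ddof=1)
    power = np.empty(len(freqs))
    for i, f in enumerate(freqs):
        w = 2.0 * np.pi * f
        # The time offset tau makes the sine and cosine terms orthogonal.
        tau = np.arctan2(np.sin(2 * w * t).sum(), np.cos(2 * w * t).sum()) / (2 * w)
        c = np.cos(w * (t - tau))
        s = np.sin(w * (t - tau))
        power[i] = ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s)) / (2.0 * var)
    return power


def permutation_threshold(t, y, freqs, n_perm=200, level=0.95, seed=0):
    """Power level exceeded by the highest peak in only (1 - level) of shuffled series.

    Shuffling y over the fixed sampling times destroys any periodicity while
    keeping the uneven sampling pattern and the data distribution.
    """
    rng = np.random.default_rng(seed)
    maxima = [lomb_scargle(t, rng.permutation(y), freqs).max() for _ in range(n_perm)]
    return float(np.quantile(maxima, level))
```

    A peak whose power exceeds the returned threshold would be treated as significant at the chosen level; peaks below it are candidates for spurious features of the sampling.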

  10. Identification of Genomic Insertion and Flanking Sequence of G2-EPSPS and GAT Transgenes in Soybean Using Whole Genome Sequencing Method.

    PubMed

    Guo, Bingfu; Guo, Yong; Hong, Huilong; Qiu, Li-Juan

    2016-01-01

    Molecular characterization of sequence flanking exogenous fragment insertion is essential for safety assessment and labeling of genetically modified organism (GMO). In this study, the T-DNA insertion sites and flanking sequences were identified in two newly developed transgenic glyphosate-tolerant soybeans GE-J16 and ZH10-6 based on whole genome sequencing (WGS) method. More than 22.4 Gb sequence data (∼21 × coverage) for each line was generated on Illumina HiSeq 2500 platform. The junction reads mapped to boundaries of T-DNA and flanking sequences in these two events were identified by comparing all sequencing reads with soybean reference genome and sequence of transgenic vector. The putative insertion loci and flanking sequences were further confirmed by PCR amplification, Sanger sequencing, and co-segregation analysis. All these analyses supported that exogenous T-DNA fragments were integrated in positions of Chr19: 50543767-50543792 and Chr17: 7980527-7980541 in these two transgenic lines. Identification of genomic insertion sites of G2-EPSPS and GAT transgenes will facilitate the utilization of their glyphosate-tolerant traits in soybean breeding program. These results also demonstrated that WGS was a cost-effective and rapid method for identifying sites of T-DNA insertions and flanking sequences in soybean.

  11. Enhancing Next-Generation Sequencing-Guided Cancer Care Through Cognitive Computing.

    PubMed

    Patel, Nirali M; Michelini, Vanessa V; Snell, Jeff M; Balu, Saianand; Hoyle, Alan P; Parker, Joel S; Hayward, Michele C; Eberhard, David A; Salazar, Ashley H; McNeillie, Patrick; Xu, Jia; Huettner, Claudia S; Koyama, Takahiko; Utro, Filippo; Rhrissorrakrai, Kahn; Norel, Raquel; Bilal, Erhan; Royyuru, Ajay; Parida, Laxmi; Earp, H Shelton; Grilley-Olson, Juneko E; Hayes, D Neil; Harvey, Stephen J; Sharpless, Norman E; Kim, William Y

    2018-02-01

    Using next-generation sequencing (NGS) to guide cancer therapy has created challenges in analyzing and reporting large volumes of genomic data to patients and caregivers. Specifically, providing current, accurate information on newly approved therapies and open clinical trials requires considerable manual curation performed mainly by human "molecular tumor boards" (MTBs). The purpose of this study was to determine the utility of cognitive computing as performed by Watson for Genomics (WfG) compared with a human MTB. One thousand eighteen patient cases that previously underwent targeted exon sequencing at the University of North Carolina (UNC) and subsequent analysis by the UNCseq informatics pipeline and the UNC MTB between November 7, 2011, and May 12, 2015, were analyzed with WfG, a cognitive computing technology for genomic analysis. Using a WfG-curated actionable gene list, we identified additional genomic events of potential significance (not discovered by traditional MTB curation) in 323 (32%) patients. The majority of these additional genomic events were considered actionable based upon their ability to qualify patients for biomarker-selected clinical trials. Indeed, the opening of a relevant clinical trial within 1 month prior to WfG analysis provided the rationale for identification of a new actionable event in nearly a quarter of the 323 patients. This automated analysis took <3 minutes per case. These results demonstrate that the interpretation and actionability of somatic NGS results are evolving too rapidly to rely solely on human curation. Molecular tumor boards empowered by cognitive computing could potentially improve patient care by providing a rapid, comprehensive approach for data analysis and consideration of up-to-date availability of clinical trials. The results of this study demonstrate that the interpretation and actionability of somatic next-generation sequencing results are evolving too rapidly to rely solely on human curation. 
    Molecular tumor boards empowered by cognitive computing can significantly improve patient care by providing a fast, cost-effective, and comprehensive approach for data analysis in the delivery of precision medicine. Patients and physicians who are considering enrollment in clinical trials may benefit from the support of such tools applied to genomic data. © AlphaMed Press 2017.

  12. Sequence Variants and Haplotype Analysis of Cat ERBB2 Gene: A Survey on Spontaneous Cat Mammary Neoplastic and Non-Neoplastic Lesions

    PubMed Central

    Santos, Sara; Bastos, Estela; Baptista, Cláudia S.; Sá, Daniela; Caloustian, Christophe; Guedes-Pinto, Henrique; Gärtner, Fátima; Gut, Ivo G.; Chaves, Raquel

    2012-01-01

    The human ERBB2 proto-oncogene is widely considered a key gene involved in human breast cancer onset and progression. Among spontaneous tumors, mammary tumors are the most frequent cause of cancer death in cats and the second most frequent in humans. In fact, naturally occurring tumors in domestic animals, more particularly cat mammary tumors, have been proposed as a good model for human breast cancer, but critical genetic and molecular information is still scarce. The aims of this study include the analysis of partial sequences of the cat ERBB2 gene (between exons 17 and 20) in order to characterize heterogeneous populations of normal and mammary lesion samples. Cat genomic DNA was extracted from normal frozen samples (n = 16) and from frozen and formalin-fixed paraffin-embedded mammary lesion samples (n = 41). We amplified and sequenced two cat ERBB2 DNA fragments comprising exons 17 to 20. It was possible to identify five sequence variants and six haplotypes in the total population. Two sequence variants and two haplotypes were shown to be specific for cat mammary tumor samples. Bioinformatics analysis predicts that four of the sequence variants can produce alternative transcripts or activate cryptic splicing sites. Also, a possible association was identified between clinicopathological traits and the variant haplotypes. As far as we know, this is the first attempt to examine ERBB2 genetic variations in the cat mammary genome and their possible association with the onset and progression of cat mammary tumors. The demonstration of a possible association between primary tumor size (one of the two most important prognostic factors) and the number of masses with the cat ERBB2 variant haplotypes reveals the importance of the analysis of this gene in veterinary medicine. PMID:22489125

  13. Sirius PSB: a generic system for analysis of biological sequences.

    PubMed

    Koh, Chuan Hock; Lin, Sharene; Jedd, Gregory; Wong, Limsoon

    2009-12-01

    Computational tools are essential components of modern biological research. For example, BLAST searches can be used to identify related proteins based on sequence homology, or when a new genome is sequenced, prediction models can be used to annotate functional sites such as transcription start sites, translation initiation sites and polyadenylation sites and to predict protein localization. Here we present Sirius Prediction Systems Builder (PSB), a new computational tool for sequence analysis, classification and searching. Sirius PSB has four main operations: (1) Building a classifier, (2) Deploying a classifier, (3) Search for proteins similar to query proteins, (4) Preliminary and post-prediction analysis. Sirius PSB supports all these operations via a simple and interactive graphical user interface. Besides being a convenient tool, Sirius PSB has also introduced two novelties in sequence analysis. Firstly, genetic algorithm is used to identify interesting features in the feature space. Secondly, instead of the conventional method of searching for similar proteins via sequence similarity, we introduced searching via features' similarity. To demonstrate the capabilities of Sirius PSB, we have built two prediction models - one for the recognition of Arabidopsis polyadenylation sites and another for the subcellular localization of proteins. Both systems are competitive against current state-of-the-art models based on evaluation of public datasets. More notably, the time and effort required to build each model is greatly reduced with the assistance of Sirius PSB. Furthermore, we show that under certain conditions when BLAST is unable to find related proteins, Sirius PSB can identify functionally related proteins based on their biophysical similarities. Sirius PSB and its related supplements are available at: http://compbio.ddns.comp.nus.edu.sg/~sirius.

  14. Network Analysis of Sequence-Function Relationships and Exploration of Sequence Space of TEM β-Lactamases.

    PubMed

    Zeil, Catharina; Widmann, Michael; Fademrecht, Silvia; Vogel, Constantin; Pleiss, Jürgen

    2016-05-01

    The Lactamase Engineering Database (www.LacED.uni-stuttgart.de) was developed to facilitate the classification and analysis of TEM β-lactamases. The current version contains 474 TEM variants. Two hundred fifty-nine variants form a large scale-free network of highly connected point mutants. The network was divided into three subnetworks which were enriched by single phenotypes: one network with predominantly 2be and two networks with 2br phenotypes. Fifteen positions were found to be highly variable, contributing to the majority of the observed variants. Since it is expected that a considerable fraction of the theoretical sequence space is functional, the currently sequenced 474 variants represent only the tip of the iceberg of functional TEM β-lactamase variants which form a huge natural reservoir of highly interconnected variants. Almost 50% of the variants are part of a quartet. Thus, two single mutations that result in functional enzymes can be combined into a functional protein. Most of these quartets consist of the same phenotype, or the mutations are additive with respect to the phenotype. By predicting quartets from triplets, 3,916 unknown variants were constructed. Eighty-seven variants complement multiple quartets and therefore have a high probability of being functional. The construction of a TEM β-lactamase network and subsequent analyses by clustering and quartet prediction are valuable tools to gain new insights into the viable sequence space of TEM β-lactamases and to predict their phenotype. The highly connected sequence space of TEM β-lactamases is ideally suited to network analysis and demonstrates the strengths of network analysis over tree reconstruction methods. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
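
    Quartet prediction from triplets can be illustrated with a small sketch. It assumes that each variant is represented as a set of mutation labels relative to a reference sequence and that a quartet consists of a variant, its two single mutants and the corresponding double mutant; this representation, the function name and the labels in the usage note are assumptions made for illustration, not the database's own implementation.

```python
from itertools import combinations


def predict_missing_corners(known_variants):
    """Return {predicted variant: number of triplets it would complete into a quartet}.

    known_variants: iterable of frozensets of mutation labels.
    A triplet is a known variant y with two known neighbours x and z,
    each differing from y by exactly one mutation.
    """
    known = set(known_variants)
    counts = {}
    for y in known:
        neighbours = [x for x in known if len(x ^ y) == 1]
        for x, z in combinations(neighbours, 2):
            # Toggling both single mutations on y gives the fourth corner.
            missing = x ^ y ^ z
            if missing not in known:
                counts[missing] = counts.get(missing, 0) + 1
    return counts
```

    With hypothetical labels m1, m2 and m3, the known set {}, {m1}, {m2}, {m3}, {m1, m2} and {m1, m2, m3} yields two predicted variants, {m1, m3} and {m2, m3}, each completing two quartets. Variants that complete several quartets correspond to the candidates the abstract describes as having a high probability of being functional.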

  15. ISOLATION AND IDENTIFICATION OF FRESHWATER BACTERIA ANTAGONISTIC TO GIARDIA INTESTINALIS CYSTS

    EPA Science Inventory

    We have isolated three freshwater bacterial strains that demonstrate the ability to degrade Giardia intestinalis cysts. These strains have been identified by 16S rRNA sequencing and phylogenetic analysis as belonging to the Flavobacterium columnare clade of the ...

  16. Restructuring a General Microbiology Laboratory into an Investigative Experience.

    ERIC Educational Resources Information Center

    Deutch, Charles E.

    1994-01-01

    Describes an investigative laboratory sequence based upon the isolation and characterization of soil bacteria to aid microbiology teachers in providing students with activities that expose them to basic techniques of microbiology as well as demonstrate the scientific process and the experimental analysis of microorganisms. (ZWH)

  17. Informatic and genomic analysis of melanocyte cDNA libraries as a resource for the study of melanocyte development and function.

    PubMed

    Baxter, Laura L; Hsu, Benjamin J; Umayam, Lowell; Wolfsberg, Tyra G; Larson, Denise M; Frith, Martin C; Kawai, Jun; Hayashizaki, Yoshihide; Carninci, Piero; Pavan, William J

    2007-06-01

    As part of the RIKEN mouse encyclopedia project, two cDNA libraries were prepared from melanocyte-derived cell lines, using techniques of full-length clone selection and subtraction/normalization to enrich for rare transcripts. End sequencing showed that these libraries display over 83% complete coding sequence at the 5' end and 96-97% complete coding sequence at the 3' end. Evaluation of the libraries, derived from B16F10Y tumor cells and melan-c cells, revealed that they contain clones for a majority of the genes previously demonstrated to function in melanocyte biology. Analysis of genomic locations for transcripts revealed that the distribution of melanocyte genes is non-random throughout the genome. Three genomic regions identified that showed significant clustering of melanocyte-expressed genes contain one or more genes previously shown to regulate melanocyte development or function. A catalog of genes expressed in these libraries is presented, providing a valuable resource of cDNA clones and sequence information that can be used for identification of new genes important for melanocyte development, function, and disease.

  18. A novel, extremely alkaliphilic and cold-active esterase from Antarctic desert soil.

    PubMed

    Hu, Xiao Ping; Heath, Caroline; Taylor, Mark Paul; Tuffin, Marla; Cowan, Don

    2012-01-01

    A novel, cold-active and highly alkaliphilic esterase was isolated from an Antarctic desert soil metagenomic library by functional screening. The 1,044 bp gene sequence contained several conserved regions common to lipases/esterases, but lacked clear classification based on sequence analysis alone. Moderate (<40%) amino acid sequence similarity to known esterases was apparent (the closest neighbour being a hypothetical protein from Chitinophaga pinensis), despite phylogenetic distance to many of the lipolytic "families". The enzyme functionally demonstrated activity towards shorter chain p-nitrophenyl esters with the optimal activity recorded towards p-nitrophenyl propionate (C3). The enzyme possessed an apparent T(opt) at 20°C and a pH optimum at pH 11. Esterases possessing such extreme alkaliphily are rare and so this enzyme represents an intriguing novel locus in protein sequence space. A metagenomic approach has been shown, in this case, to yield an enzyme with quite different sequential/structural properties to known lipases. It serves as an excellent candidate for analysis of the molecular mechanisms responsible for both cold and alkaline activity and novel structure-function relationships of esterase activity.

  19. Robust analysis of semiparametric renewal process models

    PubMed Central

    Lin, Feng-Chang; Truong, Young K.; Fine, Jason P.

    2013-01-01

    A rate model is proposed for a modulated renewal process comprising a single long sequence, where the covariate process may not capture the dependencies in the sequence as in standard intensity models. We consider partial likelihood-based inferences under a semiparametric multiplicative rate model, which has been widely studied in the context of independent and identical data. Under an intensity model, gap times in a single long sequence may be used naively in the partial likelihood with variance estimation utilizing the observed information matrix. Under a rate model, the gap times cannot be treated as independent and studying the partial likelihood is much more challenging. We employ a mixing condition in the application of limit theory for stationary sequences to obtain consistency and asymptotic normality. The estimator's variance is quite complicated owing to the unknown gap times dependence structure. We adapt block bootstrapping and cluster variance estimators to the partial likelihood. Simulation studies and an analysis of a semiparametric extension of a popular model for neural spike train data demonstrate the practical utility of the rate approach in comparison with the intensity approach. PMID:24550568
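
    Why block resampling matters for a dependent sequence can be shown with a generic moving-block bootstrap. The sketch below estimates the standard error of a simple statistic (the mean by default) rather than the partial likelihood estimator of the paper; the function name, block length and number of replicates are assumptions.

```python
import numpy as np


def moving_block_bootstrap_se(x, block_len, n_boot=500, stat=np.mean, seed=0):
    """Standard error of stat(x) for a dependent sequence via the moving-block bootstrap.

    Whole blocks of consecutive observations are resampled, so dependence
    within a block is preserved in every bootstrap replicate.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    n_blocks = -(-n // block_len)            # ceiling division
    n_starts = n - block_len + 1             # number of admissible block starts
    rng = np.random.default_rng(seed)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n_starts, size=n_blocks)
        idx = (starts[:, None] + np.arange(block_len)).ravel()[:n]
        reps[b] = stat(x[idx])
    return float(reps.std(ddof=1))
```

    For a positively autocorrelated sequence, the block estimate is typically noticeably larger than the naive value `x.std(ddof=1) / sqrt(n)`, which treats the observations as independent.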

  20. Molecular cloning of a putative gene encoding isopentenyltransferase from pingyitiancha (Malus hupehensis) and characterization of its response to nitrate.

    PubMed

    Peng, Jing; Peng, Futian; Zhu, Chunfu; Wei, Shaochong

    2008-06-01

    A putative isopentenyltransferase (IPT) encoding gene was identified from a pingyitiancha (Malus hupehensis Rehd.) expressed sequence tag database, and the full-length gene was cloned by RACE. Based on expression profile and sequence alignment, the nucleotide sequence of the clone, named MhIPT3, was most similar to AtIPT3, an IPT gene in Arabidopsis. The full-length cDNA contained a 963-bp open reading frame encoding a protein of 321 amino acids with a molecular mass of 37.3 kDa. Sequence analysis of genomic DNA revealed the absence of introns in the frame. Quantitative real-time PCR analysis demonstrated that the gene was expressed in roots, stems and leaves. Application of nitrate to roots of nitrogen-deprived seedlings strongly induced expression of MhIPT3 and was accompanied by the accumulation of cytokinins, whereas MhIPT3 expression was little affected by ammonium application to roots of nitrogen-deprived seedlings. Application of nitrate to leaves also up-regulated the expression of MhIPT3 and corresponded closely with the accumulation of isopentyladenine and isopentyladenosine in leaves.

  1. Sedimentology and geochemistry of lacustrine sequences of the upper Pleistocene and Holocene in intertropical area (Lake Magadi and Green Crater Lake): paleoclimatic implications

    NASA Astrophysics Data System (ADS)

    Damnati, B.

    1993-05-01

    Sedimentological and geochemical analyses have been carried out on lacustrine deposits of East Africa, at Lake Magadi (2°S, 36°E, Kenya) and at Green Crater Lake (0°S, 36°E, Kenya), to determine the parameters controlling climatic and environmental dynamics during the late Pleistocene and Holocene. These sedimentary sequences were collected with a stationary piston corer. At Lake Magadi (Fig. 1), sedimentary and geochemical evidence shows three phases of lake level variation, which correspond to climatic changes occurring during the last 40 thousand years. These phases were defined by three lithostratigraphic units. Laminated deposits of Lake Magadi were formed during a wet period. Analysis of these laminae defines two microfacies: a dark lamina, characterised by lacustrine organic matter, and a light lamina enriched in detritus, carbonates (CaCO3) and magadiite (NaSi7O13(OH)3·3H2O). The formation and preservation of each couplet was favoured by climatic contrast, lake stratification and the varied origin of the sediments (autochthonous and allochthonous) in the drainage basin. Therefore a relative chronology can be derived from laminae counting and the duration of deposition of each couplet. Spectral analysis applied to variations in laminae thickness shows the existence of three main periods, 4-7 years, 8-14 years and 18-30 years, respectively (Fig. 2). These cyclicities of the lacustrine environment refine earlier determinations established on more recent lacustrine sequences from East Africa. They are related to global climatic cycles (quasi-biennial oscillations, El Niño Southern Oscillation and the sunspot cycle). At Green Crater Lake, the study of the sedimentary sequence was completed by physico-chemical analysis of the waters and interface sediments, which demonstrates the carbonate, sodium, bicarbonate composition and the thermal and chemical stratification of the modern lake. The sedimentary sequence is characterized by volcanic deposits overlain by silt and clays deposited before 7400 years B.P., followed by lowering of the lake level at 3000 years B.P. Results from Lake Magadi document the occurrence of a wet period starting at about 12,000 years B.P. The methodology applied to modern Green Crater Lake provides a basis for interpretative models of other Holocene lacustrine systems of intertropical zones.

  2. Morphological and molecular characterization of fungal pathogen, Magnaphorthe oryzae

    NASA Astrophysics Data System (ADS)

    Hasan, Nor'Aishah; Rafii, Mohd Y.; Rahim, Harun A.; Ali, Nusaibah Syd; Mazlan, Norida; Abdullah, Shamsiah

    2016-02-01

    Rice is arguably the most crucial food crop, supplying a quarter of calorie intake. The fungal pathogen Magnaphorthe oryzae causes blast disease in gramineous hosts, including rice species. This disease has spurred outbreaks and is a constant threat to cereal production. Global rice yield is declining by almost 10-30%, including in Malaysia. As Magnaphorthe oryzae and its host are a model in plant disease studies, the rice blast pathosystem has been the subject of intense interest because of the importance of the disease to world agriculture. Therefore, in this study, our prime objective was to isolate samples of Magnaphorthe oryzae from diseased leaves obtained from MARDI Seberang Perai, Penang, Malaysia. Molecular identification was performed by sequence analysis of the internal transcribed spacer (ITS) region of nuclear ribosomal RNA genes. The phylogenetic affiliation of the isolated samples was analyzed by comparing the ITS sequences with those deposited in the GenBank database. The sequence of the isolate demonstrated at least 99% nucleotide identity with the corresponding sequence in GenBank for Magnaphorthe oryzae. Morphological observation under the microscope demonstrated that the structure of the conidia had characteristics similar to those of M. oryzae. The findings of this study provide useful information for breeding programs, epidemiology studies and improved disease management.
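
    The "at least 99% nucleotide identity" figure in this abstract is a pairwise identity over an alignment. The sketch below shows one common way to compute it, assuming the two sequences are already aligned to equal length with "-" marking gaps. The function name and the short example sequences are hypothetical, and tools such as BLAST may define the denominator differently.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned columns in which both sequences carry the same base.

    Columns where both sequences have a gap are ignored; a gap opposite a
    base counts as a mismatch (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments, not real ITS sequences.
query = "ACGTTGCA-ACGT"
subject = "ACGTTGCAGACGA"
identity = percent_identity(query, subject)
print(round(identity, 2))
```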

  3. Morphological and molecular characterization of fungal pathogen, Magnaphorthe oryzae

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hasan, Nor’Aishah, E-mail: aishahnh@ns.uitm.edu.my; Rafii, Mohd Y., E-mail: mrafii@upm.edu.my; Department of Crop Science, Universiti Putra Malaysia

    2016-02-01

    Rice is arguably the most crucial food crop, supplying a quarter of calorie intake. The fungal pathogen Magnaphorthe oryzae causes blast disease in gramineous hosts, including rice species. This disease has spurred outbreaks and is a constant threat to cereal production. Global rice yield is declining by almost 10-30%, including in Malaysia. As Magnaphorthe oryzae and its host are a model in plant disease studies, the rice blast pathosystem has been the subject of intense interest because of the importance of the disease to world agriculture. Therefore, in this study, our prime objective was to isolate samples of Magnaphorthe oryzae from diseased leaves obtained from MARDI Seberang Perai, Penang, Malaysia. Molecular identification was performed by sequence analysis of the internal transcribed spacer (ITS) region of nuclear ribosomal RNA genes. The phylogenetic affiliation of the isolated samples was analyzed by comparing the ITS sequences with those deposited in the GenBank database. The sequence of the isolate demonstrated at least 99% nucleotide identity with the corresponding sequence in GenBank for Magnaphorthe oryzae. Morphological observation under the microscope demonstrated that the structure of the conidia had characteristics similar to those of M. oryzae. The findings of this study provide useful information for breeding programs, epidemiology studies and improved disease management.

  4. QuickNGS elevates Next-Generation Sequencing data analysis to a new level of automation.

    PubMed

    Wagle, Prerana; Nikolić, Miloš; Frommolt, Peter

    2015-07-01

    Next-Generation Sequencing (NGS) has emerged as a widely used tool in molecular biology. While time and cost for the sequencing itself are decreasing, the analysis of the massive amounts of data remains challenging. Since multiple algorithmic approaches for the basic data analysis have been developed, there is now an increasing need to efficiently use these tools to obtain results in reasonable time. We have developed QuickNGS, a new workflow system for laboratories with the need to analyze data from multiple NGS projects at a time. QuickNGS takes advantage of parallel computing resources, a comprehensive back-end database, and a careful selection of previously published algorithmic approaches to build fully automated data analysis workflows. We demonstrate the efficiency of our new software by a comprehensive analysis of 10 RNA-Seq samples which we can finish in only a few minutes of hands-on time. The approach we have taken is suitable to process even much larger numbers of samples and multiple projects at a time. Our approach considerably reduces the barriers that still limit the usability of the powerful NGS technology and finally decreases the time to be spent before proceeding to further downstream analysis and interpretation of the data.

  5. Identification of somatic mutations in cancer through Bayesian-based analysis of sequenced genome pairs

    PubMed Central

    2013-01-01

    Background The field of cancer genomics has rapidly adopted next-generation sequencing (NGS) in order to study and characterize malignant tumors with unprecedented resolution. In particular for cancer, one is often trying to identify somatic mutations – changes specific to a tumor and not within an individual’s germline. However, false positive and false negative detections often result from lack of sufficient variant evidence, contamination of the biopsy by stromal tissue, sequencing errors, and the erroneous classification of germline variation as tumor-specific. Results We have developed a generalized Bayesian analysis framework for matched tumor/normal samples with the purpose of identifying tumor-specific alterations such as single nucleotide mutations, small insertions/deletions, and structural variation. We describe our methodology, and discuss its application to other types of paired-tissue analysis such as the detection of loss of heterozygosity as well as allelic imbalance. We also demonstrate the high level of sensitivity and specificity in discovering simulated somatic mutations, for various combinations of a) genomic coverage and b) emulated heterogeneity. Conclusion We present a Java-based implementation of our methods named Seurat, which is made available for free academic use. We have demonstrated and reported on the discovery of different types of somatic change by applying Seurat to an experimentally-derived cancer dataset using our methods; and have discussed considerations and practices regarding the accurate detection of somatic events in cancer genomes. Seurat is available at https://sites.google.com/site/seuratsomatic. PMID:23642077

  6. Identification of somatic mutations in cancer through Bayesian-based analysis of sequenced genome pairs.

    PubMed

    Christoforides, Alexis; Carpten, John D; Weiss, Glen J; Demeure, Michael J; Von Hoff, Daniel D; Craig, David W

    2013-05-04

    The field of cancer genomics has rapidly adopted next-generation sequencing (NGS) in order to study and characterize malignant tumors with unprecedented resolution. In particular for cancer, one is often trying to identify somatic mutations--changes specific to a tumor and not within an individual's germline. However, false positive and false negative detections often result from lack of sufficient variant evidence, contamination of the biopsy by stromal tissue, sequencing errors, and the erroneous classification of germline variation as tumor-specific. We have developed a generalized Bayesian analysis framework for matched tumor/normal samples with the purpose of identifying tumor-specific alterations such as single nucleotide mutations, small insertions/deletions, and structural variation. We describe our methodology, and discuss its application to other types of paired-tissue analysis such as the detection of loss of heterozygosity as well as allelic imbalance. We also demonstrate the high level of sensitivity and specificity in discovering simulated somatic mutations, for various combinations of a) genomic coverage and b) emulated heterogeneity. We present a Java-based implementation of our methods named Seurat, which is made available for free academic use. We have demonstrated and reported on the discovery of different types of somatic change by applying Seurat to an experimentally-derived cancer dataset using our methods; and have discussed considerations and practices regarding the accurate detection of somatic events in cancer genomes. Seurat is available at https://sites.google.com/site/seuratsomatic.

  7. Lessons learned from the initial sequencing of the pig genome: comparative analysis of an 8 Mb region of pig chromosome 17

    PubMed Central

    Hart, Elizabeth A; Caccamo, Mario; Harrow, Jennifer L; Humphray, Sean J; Gilbert, James GR; Trevanion, Steve; Hubbard, Tim; Rogers, Jane; Rothschild, Max F

    2007-01-01

    Background We describe here the sequencing, annotation and comparative analysis of an 8 Mb region of pig chromosome 17, which provides a useful test region to assess coverage and quality for the pig genome sequencing project. We report our findings comparing the annotation of draft sequence assembled at different depths of coverage. Results Within this region we annotated 71 loci, of which 53 are orthologous to human known coding genes. When compared to the syntenic regions in human (20q13.13-q13.33) and mouse (chromosome 2, 167.5 Mb-178.3 Mb), this region was found to be highly conserved with respect to gene order. The most notable difference between the three species is the presence of a large expansion of zinc finger coding genes and pseudogenes on mouse chromosome 2 between Edn3 and Phactr3 that is absent from pig and human. All of our annotation has been made publicly available in the Vertebrate Genome Annotation browser, VEGA. We assessed the impact of coverage on sequence assembly across this region and found, as expected, that increased sequence depth resulted in fewer, longer contigs. One-third of our annotated loci could not be fully re-aligned back to the low coverage version of the sequence, principally because the transcripts are fragmented over several contigs. Conclusion We have demonstrated the considerable advantages of sequencing at increased read depths and discuss the implications that lower coverage sequence may have on subsequent comparative and functional studies, particularly those involving complex loci such as GNAS. PMID:17705864

  8. Metagenomic analysis of the airborne environment in urban spaces.

    PubMed

    Be, Nicholas A; Thissen, James B; Fofanov, Viacheslav Y; Allen, Jonathan E; Rojas, Mark; Golovko, George; Fofanov, Yuriy; Koshinsky, Heather; Jaing, Crystal J

    2015-02-01

    The organisms in aerosol microenvironments, especially densely populated urban areas, are relevant to maintenance of public health and detection of potential epidemic or biothreat agents. To examine aerosolized microorganisms in this environment, we performed sequencing on the material from an urban aerosol surveillance program. Whole metagenome sequencing was applied to DNA extracted from air filters obtained during periods from each of the four seasons. The composition of bacteria, plants, fungi, invertebrates, and viruses demonstrated distinct temporal shifts. Bacillus thuringiensis serovar kurstaki was detected in samples known to be exposed to aerosolized spores, illustrating the potential utility of this approach for identification of intentionally introduced microbial agents. Together, these data demonstrate the temporally dependent metagenomic complexity of urban aerosols and the potential of genomic analytical techniques for biosurveillance and monitoring of threats to public health.

  9. From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline.

    PubMed

    Chen, Yunshun; Lun, Aaron T L; Smyth, Gordon K

    2016-01-01

    In recent years, RNA sequencing (RNA-seq) has become a very widely used technology for profiling gene expression. One of the most common aims of RNA-seq profiling is to identify genes or molecular pathways that are differentially expressed (DE) between two or more biological conditions. This article demonstrates a computational workflow for the detection of DE genes and pathways from RNA-seq data by providing a complete analysis of an RNA-seq experiment profiling epithelial cell subsets in the mouse mammary gland. The workflow uses R software packages from the open-source Bioconductor project and covers all steps of the analysis pipeline, including alignment of read sequences, data exploration, differential expression analysis, visualization and pathway analysis. Read alignment and count quantification is conducted using the Rsubread package and the statistical analyses are performed using the edgeR package. The differential expression analysis uses the quasi-likelihood functionality of edgeR.

  10. Plant centromeres: structure and control.

    PubMed

    Richards, E J; Dawe, R K

    1998-04-01

    Recent work has led to a better understanding of the molecular components of plant centromeres. Conservation of at least some centromere protein constituents between plant and non-plant systems has been demonstrated. The identity and organization of plant centromeric DNA sequences are also beginning to yield to analysis. While there is little primary DNA sequence conservation among the characterized plant centromeres and their non-plant counterparts, some parallels in centromere genomic organisation can be seen across species. Finally, the emerging idea that centromere activity is controlled epigenetically finds support in an examination of the plant centromere literature.

  11. CanvasDB: a local database infrastructure for analysis of targeted- and whole genome re-sequencing projects

    PubMed Central

    Ameur, Adam; Bunikis, Ignas; Enroth, Stefan; Gyllensten, Ulf

    2014-01-01

    CanvasDB is an infrastructure for management and analysis of genetic variants from massively parallel sequencing (MPS) projects. The system stores SNP and indel calls in a local database, designed to handle very large datasets, to allow for rapid analysis using simple commands in R. Functional annotations are included in the system, making it suitable for direct identification of disease-causing mutations in human exome- (WES) or whole-genome sequencing (WGS) projects. The system has a built-in filtering function implemented to simultaneously take into account variant calls from all individual samples. This enables advanced comparative analysis of variant distribution between groups of samples, including detection of candidate causative mutations within family structures and genome-wide association by sequencing. In most cases, these analyses are executed within just a matter of seconds, even when there are several hundreds of samples and millions of variants in the database. We demonstrate the scalability of canvasDB by importing the individual variant calls from all 1092 individuals present in the 1000 Genomes Project into the system, over 4.4 billion SNPs and indels in total. Our results show that canvasDB makes it possible to perform advanced analyses of large-scale WGS projects on a local server. Database URL: https://github.com/UppsalaGenomeCenter/CanvasDB PMID:25281234

  12. CanvasDB: a local database infrastructure for analysis of targeted- and whole genome re-sequencing projects.

    PubMed

    Ameur, Adam; Bunikis, Ignas; Enroth, Stefan; Gyllensten, Ulf

    2014-01-01

    CanvasDB is an infrastructure for management and analysis of genetic variants from massively parallel sequencing (MPS) projects. The system stores SNP and indel calls in a local database, designed to handle very large datasets, to allow for rapid analysis using simple commands in R. Functional annotations are included in the system, making it suitable for direct identification of disease-causing mutations in human exome- (WES) or whole-genome sequencing (WGS) projects. The system has a built-in filtering function implemented to simultaneously take into account variant calls from all individual samples. This enables advanced comparative analysis of variant distribution between groups of samples, including detection of candidate causative mutations within family structures and genome-wide association by sequencing. In most cases, these analyses are executed within just a matter of seconds, even when there are several hundreds of samples and millions of variants in the database. We demonstrate the scalability of canvasDB by importing the individual variant calls from all 1092 individuals present in the 1000 Genomes Project into the system, over 4.4 billion SNPs and indels in total. Our results show that canvasDB makes it possible to perform advanced analyses of large-scale WGS projects on a local server. Database URL: https://github.com/UppsalaGenomeCenter/CanvasDB. © The Author(s) 2014. Published by Oxford University Press.

  13. FAVR (Filtering and Annotation of Variants that are Rare): methods to facilitate the analysis of rare germline genetic variants from massively parallel sequencing datasets

    PubMed Central

    2013-01-01

    Background Characterising genetic diversity through the analysis of massively parallel sequencing (MPS) data offers enormous potential to significantly improve our understanding of the genetic basis for observed phenotypes, including predisposition to and progression of complex human disease. Great challenges remain in resolving genetic variants that are genuine from the millions of artefactual signals. Results FAVR is a suite of new methods designed to work with commonly used MPS analysis pipelines to assist in the resolution of some of the issues related to the analysis of the vast amount of resulting data, with a focus on relatively rare genetic variants. To the best of our knowledge, no equivalent method has previously been described. The most important and novel aspect of FAVR is the use of signatures in comparator sequence alignment files during variant filtering, and annotation of variants potentially shared between individuals. The FAVR methods use these signatures to facilitate filtering of (i) platform and/or mapping-specific artefacts, (ii) common genetic variants, and, where relevant, (iii) artefacts derived from imbalanced paired-end sequencing, as well as annotation of genetic variants based on evidence of co-occurrence in individuals. We applied conventional variant calling to whole-exome sequencing datasets, produced using both SOLiD and TruSeq chemistries, with or without downstream processing by FAVR methods. We demonstrate a 3-fold smaller rare single nucleotide variant shortlist with no detected reduction in sensitivity. This analysis included Sanger sequencing of rare variant signals not evident in dbSNP131, assessment of known variant signal preservation, and comparison of observed and expected rare variant numbers across a range of first cousin pairs.
The principles described herein were applied in our recent publication identifying XRCC2 as a new breast cancer risk gene and have been made publicly available as a suite of software tools. Conclusions FAVR is a platform-agnostic suite of methods that significantly enhances the analysis of large volumes of sequencing data for the study of rare genetic variants and their influence on phenotypes. PMID:23441864

  14. Broadband excitation in nuclear magnetic resonance

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tycko, Robert

    1984-10-01

    Theoretical methods for designing sequences of radio frequency (rf) radiation pulses for broadband excitation of spin systems in nuclear magnetic resonance (NMR) are described. The sequences excite spins uniformly over large ranges of resonant frequencies arising from static magnetic field inhomogeneity, chemical shift differences, or spin couplings, or over large ranges of rf field amplitudes. Specific sequences for creating a population inversion or transverse magnetization are derived and demonstrated experimentally in liquid and solid state NMR. One approach to broadband excitation is based on principles of coherent averaging theory. A general formalism for deriving pulse sequences is given, along with computational methods for specific cases. This approach leads to sequences that produce strictly constant transformations of a spin system. The importance of this feature in NMR applications is discussed. A second approach to broadband excitation makes use of iterative schemes, i.e. sets of operations that are applied repetitively to a given initial pulse sequence, generating a series of increasingly complex sequences with increasingly desirable properties. A general mathematical framework for analyzing iterative schemes is developed. An iterative scheme is treated as a function that acts on a space of operators corresponding to the transformations produced by all possible pulse sequences. The fixed points of the function and the stability of the fixed points are shown to determine the essential behavior of the scheme. Iterative schemes for broadband population inversion are treated in detail. Algebraic and numerical methods for performing the mathematical analysis are presented. Two additional topics are treated. The first is the construction of sequences for uniform excitation of double-quantum coherence and for uniform polarization transfer over a range of spin couplings.
Double-quantum excitation sequences are demonstrated in a liquid crystal system. The second additional topic is the construction of iterative schemes for narrowband population inversion. The use of sequences that invert spin populations only over a narrow range of rf field amplitudes to spatially localize NMR signals in an rf field gradient is discussed.

  15. Complete genome sequence of Menghai rhabdovirus, a novel mosquito-borne rhabdovirus from China.

    PubMed

    Sun, Qiang; Zhao, Qiumin; An, Xiaoping; Guo, Xiaofang; Zuo, Shuqing; Zhang, Xianglilan; Pei, Guangqian; Liu, Wenli; Cheng, Shi; Wang, Yunfei; Shu, Peng; Mi, Zhiqiang; Huang, Yong; Zhang, Zhiyi; Tong, Yigang; Zhou, Hongning; Zhang, Jiusong

    2017-04-01

    Menghai rhabdovirus (MRV) was isolated from Aedes albopictus in Menghai county of Yunnan Province, China, in August 2010. Whole-genome sequencing of MRV was performed using an Ion PGM™ Sequencer. We found that MRV is a single-stranded, negative-sense RNA virus. The complete genome of MRV has 10,744 nt, with short inverted repeat termini, encoding five typical rhabdovirus proteins (N, P, M, G, and L) and an additional small hypothetical protein. Nucleotide BLAST analysis using the BLASTn method showed that the genome sequence most similar to that of MRV is that of Arboretum virus (NC_025393.1), with a Max score of 322, query coverage of 14%, and 66% identity. Genomic and phylogenetic analyses both demonstrated that MRV should be considered a member of a novel species of the family Rhabdoviridae.

  16. copia-like retrotransposons are ubiquitous among plants.

    PubMed Central

    Voytas, D F; Cummings, M P; Koniczny, A; Ausubel, F M; Rodermel, S R

    1992-01-01

    Transposable genetic elements are assumed to be a feature of all eukaryotic genomes. Their identification, however, has largely been haphazard, limited principally to organisms subjected to molecular or genetic scrutiny. We assessed the phylogenetic distribution of copia-like retrotransposons, a class of transposable element that proliferates by reverse transcription, using a polymerase chain reaction assay designed to detect copia-like element reverse transcriptase sequences. copia-like retrotransposons were identified in 64 plant species as well as the photosynthetic protist Volvox carteri. The plant species included representatives from 9 of 10 plant divisions, including bryophytes, lycopods, ferns, gymnosperms, and angiosperms. DNA sequence analysis of 29 cloned PCR products and of a maize retrotransposon cDNA confirmed the identity of these sequences as copia-like reverse transcriptase sequences, thereby demonstrating that this class of retrotransposons is a ubiquitous component of plant genomes. PMID:1379734

  17. Comparative analysis of Edwardsiella isolates from fish in the eastern United States identifies two distinct genetic taxa amongst organisms phenotypically classified as E. tarda

    USGS Publications Warehouse

    Griffin, Matt J.; Quiniou, Sylvie M.; Cody, Theresa; Tabuchi, Maki; Ware, Cynthia; Cipriano, Rocco C.; Mauel, Michael J.; Soto, Esteban

    2013-01-01

    Edwardsiella tarda, a Gram-negative member of the family Enterobacteriaceae, has been implicated in significant losses in aquaculture facilities worldwide. Here, we assessed the intra-specific variability of E. tarda isolates from 4 different fish species in the eastern United States. Repetitive sequence mediated PCR (rep-PCR) using 4 different primer sets (ERIC I & II, ERIC II, BOX, and GTG5) and multi-locus sequence analysis of 16S SSU rDNA, groEl, gyrA, gyrB, pho, pgi, pgm, and rpoA gene fragments identified two distinct genotypes of E. tarda (DNA group I; DNA group II). Isolates that fell into DNA group II demonstrated more similarity to E. ictaluri than DNA group I, which contained the reference E. tarda strain (ATCC #15947). Conventional PCR analysis using published E. tarda-specific primer sets yielded variable results, with several primer sets producing no observable amplification of target DNA from some isolates. Fluorometric determination of G + C content demonstrated 56.4% G + C content for DNA group I, 60.2% for DNA group II, and 58.4% for E. ictaluri. Surprisingly, these isolates were indistinguishable using conventional biochemical techniques, with all isolates demonstrating phenotypic characteristics consistent with E. tarda. Analysis using two commercial test kits identified multiple phenotypes, although no single metabolic characteristic could reliably discriminate between genetic groups. Additionally, anti-microbial susceptibility and fatty acid profiles did not demonstrate remarkable differences between groups. The significant genetic variation (<90% similarity at gyrA, gyrB, pho, pgi and pgm; <40% similarity by rep-PCR) between these groups suggests organisms from DNA group II may represent an unrecognized, genetically distinct taxon of Edwardsiella that is phenotypically indistinguishable from E. tarda.

  18. BioVLAB-mCpG-SNP-EXPRESS: A system for multi-level and multi-perspective analysis and exploration of DNA methylation, sequence variation (SNPs), and gene expression from multi-omics data.

    PubMed

    Chae, Heejoon; Lee, Sangseon; Seo, Seokjun; Jung, Daekyoung; Chang, Hyeonsook; Nephew, Kenneth P; Kim, Sun

    2016-12-01

    Measuring gene expression, DNA sequence variation, and DNA methylation status is routinely done using high throughput sequencing technologies. To analyze such multi-omics data and explore relationships, reliable bioinformatics systems are much needed. Existing systems are either for exploring curated data or for processing omics data in the form of a library such as R. Thus scientists have much difficulty in investigating relationships among gene expression, DNA sequence variation, and DNA methylation using multi-omics data. In this study, we report a system called BioVLAB-mCpG-SNP-EXPRESS for the integrated analysis of DNA methylation, sequence variation (SNPs), and gene expression for distinguishing cellular phenotypes at the pairwise and multiple phenotype levels. The system can be deployed on either the Amazon cloud or a publicly available high-performance computing node, and the data analysis and exploration of the analysis result can be conveniently done using a web-based interface. In order to alleviate analysis complexity, all the processes are fully automated, and a graphical workflow system is integrated to represent real-time analysis progression. The BioVLAB-mCpG-SNP-EXPRESS system works in three stages. First, it processes and analyzes multi-omics data as input in the form of the raw data, i.e., FastQ files. Second, various integrated analyses such as methylation vs. gene expression and mutation vs. methylation are performed. Finally, the analysis result can be explored in a number of ways through a web interface for the multi-level, multi-perspective exploration. Multi-level interpretation can be done at the gene, gene set, pathway or network level, and multi-perspective exploration can be done from the gene expression, DNA methylation, sequence variation, or relationship perspective. The utility of the system is demonstrated by performing analysis of a data set of 30 phenotypically distinct breast cancer cell lines.
BioVLAB-mCpG-SNP-EXPRESS is available at http://biohealth.snu.ac.kr/software/biovlab_mcpg_snp_express/. Copyright © 2016 Elsevier Inc. All rights reserved.

  19. Analysis of correlated mutations in HIV-1 protease using spectral clustering.

    PubMed

    Liu, Ying; Eyal, Eran; Bahar, Ivet

    2008-05-15

    The ability of human immunodeficiency virus-1 (HIV-1) protease to develop mutations that confer multi-drug resistance (MDR) has been a major obstacle in designing rational therapies against HIV. Resistance is usually imparted by a cooperative mechanism that can be elucidated by a covariance analysis of sequence data. Identification of such correlated substitutions of amino acids may be obscured by evolutionary noise. HIV-1 protease sequences from patients subjected to different specific treatments (set 1), and from untreated patients (set 2) were subjected to sequence covariance analysis by evaluating the mutual information (MI) between all residue pairs. Spectral clustering of the resulting covariance matrices disclosed two distinctive clusters of correlated residues: the first, observed in set 1 but absent in set 2, contained residues involved in MDR acquisition; and the second, included those residues differentiated in the various HIV-1 protease subtypes, shortly referred to as the phylogenetic cluster. The MDR cluster occupies sites close to the central symmetry axis of the enzyme, which overlap with the global hinge region identified from coarse-grained normal-mode analysis of the enzyme structure. The phylogenetic cluster, on the other hand, occupies solvent-exposed and highly mobile regions. This study demonstrates (i) the possibility of distinguishing between the correlated substitutions resulting from neutral mutations and those induced by MDR upon appropriate clustering analysis of sequence covariance data and (ii) a connection between global dynamics and functional substitution of amino acids.
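
    The covariance step in this abstract evaluates mutual information (MI) between every pair of alignment columns. A minimal sketch of that calculation follows, assuming a gap-free multiple sequence alignment given as equal-length strings. The function names and the toy three-column alignment are hypothetical, and the spectral clustering of the resulting matrix is not shown.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """MI in bits between two alignment columns of equal length."""
    n = len(col_i)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        # Ratio of the joint frequency to the product of the marginals.
        mi += p_ab * math.log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


def mi_matrix(alignment):
    """Symmetric matrix of MI values for all pairs of alignment columns."""
    length = len(alignment[0])
    columns = [[seq[k] for seq in alignment] for k in range(length)]
    return [
        [mutual_information(columns[i], columns[j]) for j in range(length)]
        for i in range(length)
    ]


# Toy alignment (not real protease data): columns 0 and 1 co-vary perfectly,
# column 2 is invariant.
alignment = ["LKV", "LKV", "IRV", "IRV"]
matrix = mi_matrix(alignment)
print(matrix[0][1], matrix[0][2])
```

    On real data, raw MI is inflated by small sample size and shared ancestry, which is the "evolutionary noise" the abstract refers to, so some correction is usually applied before clustering.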

  20. Exome sequencing for simultaneous mutation screening in children with hemophagocytic lymphohistiocytosis.

    PubMed

    Mukda, Ekchol; Trachoo, Objoon; Pasomsub, Ekawat; Tiyasirichokchai, Rawiphorn; Iemwimangsa, Nareenart; Sosothikul, Darintr; Chantratita, Wasun; Pakakasama, Samart

    2017-08-01

    In the present study, we used exome sequencing to analyze PRF1, UNC13D, STX11, and STXBP2, as well as genes associated with primary immunodeficiency disease (RAB27A, LYST, AP3B1, SH2D1A, ITK, CD27, XIAP, and MAGT1) in Thai children with hemophagocytic lymphohistiocytosis (HLH). We performed mutation analysis of HLH-associated genes in 25 Thai children using an exome sequencing method. Genetic variations found within these target genes were compared to exome sequencing data from 133 healthy individuals. Variants identified with minor allele frequencies <5% and novel mutations were confirmed using Sanger sequencing. Exome sequencing data revealed 101 non-synonymous single nucleotide polymorphisms (SNPs) in all subjects. These SNPs were classified as pathogenic (n = 1), likely pathogenic (n = 16), variant of unknown significance (n = 12), or benign variant (n = 72). Homozygous, compound heterozygous, and double-gene heterozygous variants, involving mutations in PRF1 (n = 3), UNC13D (n = 2), STXBP2 (n = 3), LYST (n = 3), XIAP (n = 2), AP3B1 (n = 1), RAB27A (n = 1), and MAGT1 (n = 1), were demonstrated in 12 patients. Novel mutations were found in most patients in this study. In conclusion, exome sequencing demonstrated the ability to identify rare genetic variants in HLH patients. This method is useful in the detection of mutations in multi-gene associated diseases.
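
    The filtering step in this abstract keeps variants whose minor allele frequency (MAF) is below 5% relative to 133 healthy individuals, along with novel variants. A minimal sketch of such a filter is given below, assuming biallelic autosomal sites, so that each control contributes two alleles (this would not hold for X-linked genes such as XIAP). The function names, variant identifiers and counts are hypothetical illustrations, not the study's data or pipeline.

```python
def minor_allele_frequency(alt_allele_count, total_alleles):
    """Frequency of the less common allele at a biallelic site."""
    freq = alt_allele_count / total_alleles
    return min(freq, 1.0 - freq)


def select_rare(patient_variants, control_alt_counts, n_controls, threshold=0.05):
    """Keep patient variants whose control-cohort MAF is below the threshold.

    Variants absent from the controls get a count of 0 and are kept as
    candidate novel variants.
    """
    total_alleles = 2 * n_controls  # assumes diploid autosomal sites
    rare = []
    for variant in patient_variants:
        alt_count = control_alt_counts.get(variant, 0)
        if minor_allele_frequency(alt_count, total_alleles) < threshold:
            rare.append(variant)
    return rare


# Hypothetical alternate-allele counts observed in 133 controls.
control_alt_counts = {"variant_B": 5, "variant_C": 40}
rare = select_rare(
    ["variant_A", "variant_B", "variant_C"],
    control_alt_counts,
    n_controls=133,
)
print(rare)
```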

  1. Illumina sequencing-based analysis of a microbial community enriched under anaerobic methane oxidation condition coupled to denitrification revealed coexistence of aerobic and anaerobic methanotrophs.

    PubMed

    Siniscalchi, Luciene Alves Batista; Leite, Laura Rabelo; Oliveira, Guilherme; Chernicharo, Carlos Augusto Lemos; de Araújo, Juliana Calabria

    2017-07-01

    Methane is produced in anaerobic environments, such as reactors used to treat wastewaters, and can be consumed by methanotrophs. The composition and structure of a microbial community enriched from anaerobic sewage sludge under methane-oxidation condition coupled to denitrification were investigated. Denaturing gradient gel electrophoresis (DGGE) analysis retrieved sequences of Methylocaldum and Chloroflexi. Deep sequencing analysis revealed a complex community that changed over time and was affected by methane concentration. Methylocaldum (8.2%), Methylosinus (2.3%), Methylomonas (0.02%), Methylacidiphilales (0.45%), Nitrospirales (0.18%), and Methanosarcinales (0.3%) were detected. Despite denitrifying conditions provided, Nitrospirales and Methanosarcinales, known to perform anaerobic methane oxidation coupled to denitrification (DAMO) process, were in very low abundance. Results demonstrated that aerobic and anaerobic methanotrophs coexisted in the reactor together with heterotrophic microorganisms, suggesting that a diverse microbial community was important to sustain methanotrophic activity. The methanogenic sludge was a good inoculum to enrich methanotrophs, and cultivation conditions play a selective role in determining community composition.

  2. Characterisation of ATM mutations in Slavic Ataxia telangiectasia patients.

    PubMed

    Soukupova, Jana; Pohlreich, Petr; Seemanova, Eva

    2011-09-01

    Ataxia telangiectasia (AT) is a genomic instability syndrome characterised, among others, by progressive cerebellar degeneration, oculocutaneous telangiectases, immunodeficiency, elevated serum alpha-fetoprotein level, chromosomal breakage, hypersensitivity to ionising radiation and increased cancer risk. This autosomal recessive disorder is caused by mutations in the ataxia telangiectasia mutated (ATM) gene coding for serine/threonine protein kinase with a crucial role in response to DNA double-strand breaks. We characterised genotype and phenotype of 12 Slavic AT patients from 11 families. Mutation analysis included sequencing of the entire coding sequence, adjacent intron regions, 3'UTR and 5'UTR of the ATM gene and multiplex ligation-dependent probe amplification (MLPA) for the detection of large deletions/duplications at the ATM locus. The high incidence of new and individual mutations demonstrates a marked mutational heterogeneity of AT in the Czech Republic. Our data indicate that sequence analysis of the entire coding region of ATM is sufficient for a high detection rate of mutations in ATM and that MLPA analysis for the detection of deletions/duplications seems to be redundant in the Slavic population.

  3. Structural Analysis of Biodiversity

    PubMed Central

    Sirovich, Lawrence; Stoeckle, Mark Y.; Zhang, Yu

    2010-01-01

    Large, recently-available genomic databases cover a wide range of life forms, suggesting opportunity for insights into genetic structure of biodiversity. In this study we refine our recently-described technique using indicator vectors to analyze and visualize nucleotide sequences. The indicator vector approach generates correlation matrices, dubbed Klee diagrams, which represent a novel way of assembling and viewing large genomic datasets. To explore its potential utility, here we apply the improved algorithm to a collection of almost 17000 DNA barcode sequences covering 12 widely-separated animal taxa, demonstrating that indicator vectors for classification gave correct assignment in all 11000 test cases. Indicator vector analysis revealed discontinuities corresponding to species- and higher-level taxonomic divisions, suggesting an efficient approach to classification of organisms from poorly-studied groups. As compared to standard distance metrics, indicator vectors preserve diagnostic character probabilities, enable automated classification of test sequences, and generate high-information density single-page displays. These results support application of indicator vectors for comparative analysis of large nucleotide data sets and raise prospect of gaining insight into broad-scale patterns in the genetic structure of biodiversity. PMID:20195371

  4. Application of Wikis with Scaffolding Structure in Laboratory Reporting

    ERIC Educational Resources Information Center

    Ge, Changfeng

    2012-01-01

    This work demonstrates how a Wiki can be mapped into different learning stages during group-based lab reporting via an adequate scaffolding structure. The scaffolding structure of the Wiki-based group report is comprised of six constructs in sequence: Appendix, Methods, Results, Analysis, Introduction and Conclusion. The scaffolding structure was…

  5. [Convergent origin of repeats in genes coding for globular proteins. An analysis of the factors determining the presence of inverted and symmetrical repeats].

    PubMed

    Solov'ev, V V; Kel', A E; Kolchanov, N A

    1989-01-01

    The factors determining the presence of inverted and symmetrical repeats in genes coding for globular proteins have been analysed. An interesting property of the genetic code was revealed in the analysis of symmetrical repeats: pairs of symmetrical codons correspond to pairs of amino acids with mostly similar physical-chemical parameters. This property may explain the presence of symmetrical repeats and palindromes only in genes coding for beta-structural proteins-polypeptides, where amino acids with similar physical-chemical properties occupy symmetrical positions. A stochastic model of the evolution of polynucleotide sequences was used for the analysis of inverted repeats. The modelling demonstrated that constraints on the sequences alone (uneven codon usage frequencies) are enough for nonrandom inverted repeats to arise in genes.

  6. A Phylogenetic and Phenotypic Analysis of Salmonella enterica Serovar Weltevreden, an Emerging Agent of Diarrheal Disease in Tropical Regions

    PubMed Central

    Makendi, Carine; Page, Andrew J.; Wren, Brendan W.; Le Thi Phuong, Tu; Clare, Simon; Hale, Christine; Goulding, David; Klemm, Elizabeth J.; Pickard, Derek; Okoro, Chinyere; Hunt, Martin; Thompson, Corinne N.; Phu Huong Lan, Nguyen; Tran Do Hoang, Nhu; Thwaites, Guy E.; Le Hello, Simon; Brisabois, Anne; Weill, François-Xavier; Baker, Stephen; Dougan, Gordon

    2016-01-01

    Salmonella enterica serovar Weltevreden (S. Weltevreden) is an emerging cause of diarrheal and invasive disease in humans residing in tropical regions. Despite the regional and international emergence of this Salmonella serovar, relatively little is known about its genetic diversity, genomics or virulence potential in model systems. Here we used whole genome sequencing and bioinformatics analyses to define the phylogenetic structure of a diverse global selection of S. Weltevreden. Phylogenetic analysis of more than 100 isolates demonstrated that the population of S. Weltevreden can be segregated into two main phylogenetic clusters, one associated predominantly with continental Southeast Asia and the other more internationally dispersed. Subcluster analysis suggested the local evolution of S. Weltevreden within specific geographical regions. Four of the isolates were sequenced using long read sequencing to produce high quality reference genomes. Phenotypic analysis in Hep-2 cells and in a murine infection model indicated that S. Weltevreden were significantly attenuated in these models compared to the classical S. Typhimurium reference strain SL1344. Our work outlines novel insights into this important emerging pathogen and provides a baseline understanding for future research studies. PMID:26867150

  7. Methodology for the analysis of pollutant emissions from a city bus

    NASA Astrophysics Data System (ADS)

    Armas, Octavio; Lapuerta, Magín; Mata, Carmen

    2012-04-01

    In this work a methodology is proposed for measurement and analysis of gaseous emissions and particle size distributions emitted by a diesel city bus during its typical operation under urban driving conditions. As test circuit, a passenger transportation line at a Spanish city was used. Different ways for data processing and representation were studied and, derived from this work, a new approach is proposed. The methodology was useful to detect the most important uncertainties arising during registration and processing of data derived from a measurement campaign devoted to determine the main pollutant emissions. A HORIBA OBS-1300 gas analyzer and a TSI engine exhaust particle spectrometer were used with 1 Hz frequency data recording. The methodology proposed allows for the comparison of results (in mean values) derived from the analysis of either complete cycles or specific categories (or sequences). The analysis by categories is demonstrated to be a robust and helpful tool to isolate the effect of the main vehicle parameters (relative fuel-air ratio and velocity) on pollutant emissions. It was shown that acceleration sequences have the highest contribution to the total emissions, whereas deceleration sequences have the least.

  8. A De Novo-Assembly Based Data Analysis Pipeline for Plant Obligate Parasite Metatranscriptomic Studies.

    PubMed

    Guo, Li; Allen, Kelly S; Deiulio, Greg; Zhang, Yong; Madeiras, Angela M; Wick, Robert L; Ma, Li-Jun

    2016-01-01

    Current and emerging plant diseases caused by obligate parasitic microbes such as rusts, downy mildews, and powdery mildews threaten worldwide crop production and food safety. These obligate parasites are typically unculturable in the laboratory, posing technical challenges to characterize them at the genetic and genomic level. Here we have developed a data analysis pipeline integrating several bioinformatic software programs. This pipeline facilitates rapid gene discovery and expression analysis of a plant host and its obligate parasite simultaneously by next generation sequencing of mixed host and pathogen RNA (i.e., metatranscriptomics). We applied this pipeline to metatranscriptomic sequencing data of sweet basil (Ocimum basilicum) and its obligate downy mildew parasite Peronospora belbahrii, both lacking a sequenced genome. Even with a single data point, we were able to identify both candidate host defense genes and pathogen virulence genes that are highly expressed during infection. This demonstrates the power of this pipeline for identifying genes important in host-pathogen interactions without prior genomic information for either the plant host or the obligate biotrophic pathogen. The simplicity of this pipeline makes it accessible to researchers with limited computational skills and applicable to metatranscriptomic data analysis in a wide range of plant-obligate-parasite systems.

  9. A plasma membrane sucrose-binding protein that mediates sucrose uptake shares structural and sequence similarity with seed storage proteins but remains functionally distinct.

    PubMed

    Overvoorde, P J; Chao, W S; Grimes, H D

    1997-06-20

    Photoaffinity labeling of a soybean cotyledon membrane fraction identified a sucrose-binding protein (SBP). Subsequent studies have shown that the SBP is a unique plasma membrane protein that mediates the linear uptake of sucrose in the presence of up to 30 mM external sucrose when ectopically expressed in yeast. Analysis of the SBP-deduced amino acid sequence indicates it lacks sequence similarity with other known transport proteins. Data presented here, however, indicate that the SBP shares significant sequence and structural homology with the vicilin-like seed storage proteins that organize into homotrimers. These similarities include a repeated sequence that forms the basis of the reiterated domain structure characteristic of the vicilin-like protein family. In addition, analytical ultracentrifugation and nonreducing SDS-polyacrylamide gel electrophoresis demonstrate that the SBP appears to be organized into oligomeric complexes with a Mr indicative of the existence of SBP homotrimers and homodimers. The structural similarity shared by the SBP and vicilin-like proteins provides a novel framework to explore the mechanistic basis of SBP-mediated sucrose uptake. Expression of the maize Glb protein (a vicilin-like protein closely related to the SBP) in yeast demonstrates that a closely related vicilin-like protein is unable to mediate sucrose uptake. Thus, despite sequence and structural similarities shared by the SBP and the vicilin-like protein family, the SBP is functionally divergent from other members of this group.

  10. Mapping-by-sequencing in complex polyploid genomes using genic sequence capture: a case study to map yellow rust resistance in hexaploid wheat.

    PubMed

    Gardiner, Laura-Jayne; Bansept-Basler, Pauline; Olohan, Lisa; Joynson, Ryan; Brenchley, Rachel; Hall, Neil; O'Sullivan, Donal M; Hall, Anthony

    2016-08-01

    Previously we extended the utility of mapping-by-sequencing by combining it with sequence capture and mapping sequence data to pseudo-chromosomes that were organized using wheat-Brachypodium synteny. This, with a bespoke haplotyping algorithm, enabled us to map the flowering time locus in the diploid wheat Triticum monococcum L. identifying a set of deleted genes (Gardiner et al., 2014). Here, we develop this combination of gene enrichment and sliding window mapping-by-synteny analysis to map the Yr6 locus for yellow stripe rust resistance in hexaploid wheat. A 110 MB NimbleGen capture probe set was used to enrich and sequence a doubled haploid mapping population of hexaploid wheat derived from an Avalon and Cadenza cross. The Yr6 locus was identified by mapping to the POPSEQ chromosomal pseudomolecules using a bespoke pipeline and algorithm (Chapman et al., 2015). Furthermore the same locus was identified using newly developed pseudo-chromosome sequences as a mapping reference that are based on the genic sequence used for sequence enrichment. The pseudo-chromosomes allow us to demonstrate the application of mapping-by-sequencing to even poorly defined polyploidy genomes where chromosomes are incomplete and sub-genome assemblies are collapsed. This analysis uniquely enabled us to: compare wheat genome annotations; identify the Yr6 locus - defining a smaller genic region than was previously possible; associate the interval with one wheat sub-genome and increase the density of SNP markers associated. Finally, we built the pipeline in iPlant, making it a user-friendly community resource for phenotype mapping. © 2016 The Authors. The Plant Journal published by Society for Experimental Biology and John Wiley & Sons Ltd.

  11. Sequence analysis of dolphin ferritin H and L subunits and possible iron-dependent translational control of dolphin ferritin gene

    PubMed Central

    Takaesu, Azusa; Watanabe, Kiyotaka; Takai, Shinji; Sasaki, Yukako; Orino, Koichi

    2008-01-01

    Background: The iron-storage protein ferritin plays a central role in iron metabolism. Ferritin has a dual function: to store iron and to segregate iron for protection against iron-catalyzed reactive oxygen species. Tissue ferritin is composed of two kinds of subunits (H: heavy chain or heart-type subunit; L: light chain or liver-type subunit). Ferritin gene expression is controlled at the translational level in an iron-dependent manner or at the transcriptional level in an iron-independent manner. However, sequencing analysis of marine mammalian ferritin subunits has not yet been performed fully. The purpose of this study is to reveal cDNA-derived amino acid sequences of cetacean ferritin H and L subunits, and demonstrate the possibility of expression of these subunits, especially H subunit, by iron. Methods: Sequence analyses of cetacean ferritin H and L subunits were performed by direct sequencing of polymerase chain reaction (PCR) fragments from cDNAs generated via reverse transcription-PCR of leukocyte total RNA prepared from blood samples of six different dolphin species (Pseudorca crassidens, Lagenorhynchus obliquidens, Grampus griseus, Globicephala macrorhynchus, Tursiops truncatus, and Delphinapterus leucas). The putative iron-responsive element sequence in the 5'-untranslated region of the six different dolphin species was revealed by direct sequencing of PCR fragments obtained using leukocyte genomic DNA. Results: Dolphin H and L subunits consist of 182 and 174 amino acids, respectively, and amino acid sequence identities of ferritin subunits among these dolphins are highly conserved (H: 98–100%; L: 98–100%). The conserved 28 bp IRE sequence was located -144 bp upstream from the initiation codon in the six different dolphin species. Conclusion: These results indicate that six different dolphin species have conserved ferritin sequences, and suggest that these genes are iron-dependently expressed. PMID:18954429

  12. Functional sequencing read annotation for high precision microbiome analysis

    PubMed Central

    Zhu, Chengsheng; Miller, Maximilian; Marpaka, Srinayani; Vaysberg, Pavel; Rühlemann, Malte C; Wu, Guojun; Heinsen, Femke-Anouska; Tempel, Marie; Zhao, Liping; Lieb, Wolfgang; Franke, Andre; Bromberg, Yana

    2018-01-01

    The vast majority of microorganisms on Earth reside in often-inseparable environment-specific communities (microbiomes). Meta-genomic/-transcriptomic sequencing could reveal the otherwise inaccessible functionality of microbiomes. However, existing analytical approaches focus on attributing sequencing reads to known genes/genomes, often failing to make maximal use of available data. We created faser (functional annotation of sequencing reads), an algorithm that is optimized to map reads to molecular functions encoded by the read-correspondent genes. The mi-faser microbiome analysis pipeline, combining faser with our manually curated reference database of protein functions, accurately annotates microbiome molecular functionality. mi-faser’s minutes-per-microbiome processing speed is significantly faster than that of other methods, allowing for large scale comparisons. Microbiome function vectors can be compared between different conditions to highlight environment-specific and/or time-dependent changes in functionality. Here, we identified previously unseen oil degradation-specific functions in BP oil-spill data, as well as functional signatures of individual-specific gut microbiome responses to a dietary intervention in children with Prader–Willi syndrome. Our method also revealed variability in Crohn's Disease patient microbiomes and clearly distinguished them from those of related healthy individuals. Our analysis highlighted the microbiome role in CD pathogenicity, demonstrating enrichment of patient microbiomes in functions that promote inflammation and that help bacteria survive it. PMID:29194524

  13. Genetic diversity and molecular evolution of Naga King Chili inferred from internal transcribed spacer sequence of nuclear ribosomal DNA.

    PubMed

    Kehie, Mechuselie; Kumaria, Suman; Devi, Khumuckcham Sangeeta; Tandon, Pramod

    2016-02-01

    Sequences of the Internal Transcribed Spacer (ITS1-5.8S-ITS2) of nuclear ribosomal DNAs were explored to study the genetic diversity and molecular evolution of Naga King Chili. Our study indicated the occurrence of nucleotide polymorphism and haplotypic diversity in the ITS regions. The present study demonstrated that the variability of ITS1 with respect to nucleotide diversity and sequence polymorphism exceeded that of ITS2. Sequence analysis of 5.8S gene revealed a much conserved region in all the accessions of Naga King Chili. However, strong phylogenetic information of this species is the distinct 13 bp deletion in the 5.8S gene which discriminated Naga King Chili from the rest of the Capsicum sp. Neutrality test results implied a neutral variation, and population seems to be evolving at drift-mutation equilibrium and free from directed selection pressure. Furthermore, mismatch analysis showed multimodal curve indicating a demographic equilibrium. Phylogenetic relationships revealed by Median Joining Network (MJN) analysis denoted a clear discrimination of Naga King Chili from its closest sister species (Capsicum chinense and Capsicum frutescens). The absence of star-like network of haplotypes suggested an ancient population expansion of this chili.
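
    The comparison of nucleotide diversity between ITS1 and ITS2 can be illustrated with a short sketch. It assumes the common definition of nucleotide diversity as the mean number of pairwise differences per site; the function name and the toy sequences are hypothetical, and alignment gaps are treated as ordinary characters, which a real analysis would likely handle differently.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Mean pairwise differences per site across aligned sequences."""
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned[0])
    if length == 0 or any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be non-empty and equally long")
    total_differences = 0
    pair_count = 0
    for first, second in combinations(aligned, 2):
        # Count mismatching positions for this pair of sequences.
        total_differences += sum(1 for x, y in zip(first, second) if x != y)
        pair_count += 1
    return total_differences / (pair_count * length)


# Hypothetical toy data standing in for two aligned regions.
region_one = ["ACGT", "ACGA", "ACGT"]
region_two = ["ACGT", "ACGT", "ACGT"]
pi_one = nucleotide_diversity(region_one)
pi_two = nucleotide_diversity(region_two)
```

    With these toy inputs the first region has one variable site and the second has none, so the first region shows the higher diversity.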

  14. High throughput sequencing analysis of RNA libraries reveals the influences of initial library and PCR methods on SELEX efficiency

    PubMed Central

    Takahashi, Mayumi; Wu, Xiwei; Ho, Michelle; Chomchan, Pritsana; Rossi, John J.; Burnett, John C.; Zhou, Jiehua

    2016-01-01

    The systematic evolution of ligands by exponential enrichment (SELEX) technique is a powerful and effective aptamer-selection procedure. However, modifications to the process can dramatically improve selection efficiency and aptamer performance. For example, droplet digital PCR (ddPCR) has been recently incorporated into SELEX selection protocols to putatively reduce the propagation of byproducts and avoid selection bias that result from differences in PCR efficiency of sequences within the random library. However, a detailed, parallel comparison of the efficacy of conventional solution PCR versus the ddPCR modification in the RNA aptamer-selection process is needed to understand effects on overall SELEX performance. In the present study, we took advantage of powerful high throughput sequencing technology and bioinformatics analysis coupled with SELEX (HT-SELEX) to thoroughly investigate the effects of initial library and PCR methods in the RNA aptamer identification. Our analysis revealed that distinct “biased sequences” and nucleotide composition existed in the initial, unselected libraries purchased from two different manufacturers and that the fate of the “biased sequences” was target-dependent during selection. Our comparison of solution PCR- and ddPCR-driven HT-SELEX demonstrated that PCR method affected not only the nucleotide composition of the enriched sequences, but also the overall SELEX efficiency and aptamer efficacy. PMID:27652575

  15. Identification and Characterization of Novel Surface Proteins in Lactobacillus johnsonii and Lactobacillus gasseri

    PubMed Central

    Ventura, Marco; Jankovic, Ivana; Walker, D. Carey; Pridmore, R. David; Zink, Ralf

    2002-01-01

    We have identified and sequenced the genes encoding the aggregation-promoting factor (APF) protein from six different strains of Lactobacillus johnsonii and Lactobacillus gasseri. Both species harbor two apf genes, apf1 and apf2, which are in the same orientation and encode proteins of 257 to 326 amino acids. Multiple alignments of the deduced amino acid sequences of these apf genes demonstrate a very strong sequence conservation of all of the genes with the exception of their central regions. Northern blot analysis showed that both genes are transcribed, reaching their maximum expression during the exponential phase. Primer extension analysis revealed that apf1 and apf2 harbor a putative promoter sequence that is conserved in all of the genes. Western blot analysis of the LiCl cell extracts showed that APF proteins are located on the cell surface. Intact cells of L. johnsonii revealed the typical cell wall architecture of S-layer-carrying gram-positive eubacteria, which could be selectively removed with LiCl treatment. In addition, the amino acid composition, physical properties, and genetic organization were found to be quite similar to those of S-layer proteins. These results suggest that APF is a novel surface protein of the Lactobacillus acidophilus B-homology group which might belong to an S-layer-like family. PMID:12450842

  16. Sequence conservation on the Y chromosome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gibson, L.H.; Yang-Feng, L.; Lau, C.

    The Y chromosome is present in all mammals and is considered to be essential to sex determination. Despite intense genomic research, only a few genes have been identified and mapped to this chromosome in humans. Several of them, such as SRY and ZFY, have been demonstrated to be conserved and Y-located in other mammals. In order to address the issue of sequence conservation on the Y chromosome, we performed fluorescence in situ hybridization (FISH) with DNA from a human Y cosmid library as a probe to study the Y chromosomes from other mammalian species. Total DNA from 3,000-4,500 cosmid pools were labeled with biotinylated-dUTP and hybridized to metaphase chromosomes. For human and primate preparations, human cot1 DNA was included in the hybridization mixture to suppress the hybridization from repeat sequences. FISH signals were detected on the Y chromosomes of human, gorilla, orangutan and baboon (Old World monkey) and were absent on those of squirrel monkey (New World monkey), Indian muntjac, wood lemming, Chinese hamster, rat and mouse. Since sequence analysis suggested that specific genes, e.g. SRY and ZFY, are conserved between these two groups, the lack of detectable hybridization in the latter group implies either that conservation of the human Y sequences is limited to the Y chromosomes of the great apes and Old World monkeys, or that the size of the syntenic segment is too small to be detected under the resolution of FISH, or that homologous sequences have undergone considerable divergence. Further studies with reduced hybridization stringency are currently being conducted. Our results provide some clues as to Y-sequence conservation across species and demonstrate the limitations of FISH across species with total DNA sequences from a particular chromosome.

  17. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets.

    PubMed

    Shrimankar, D D; Sathe, S R

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputer often consists of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, the OpenMP programs cannot be scaled for more than a single SMP node. However, programs written in MPI can have more than single SMP nodes. But such a programming paradigm has an overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that the communication overhead incurs significantly even in OpenMP loop execution and increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and interesting to a large variety of input data files. We have developed our own load balancing and cache optimization technique for message passing model. Our experimental results show that our own developed techniques give optimum performance of our parallel algorithm for various sizes of input parameter, such as sequence size and tile size, on a wide variety of multicore architectures.
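
    As background for the optimal pairwise alignment discussed above, the standard dynamic-programming recurrence for a global alignment score (Needleman-Wunsch) can be sketched as follows. This is a serial illustration only, not the authors' tile-based parallel method; the function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of two sequences (linear gap cost)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and row: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
    return score[-1][-1]


identical_score = global_alignment_score("GATTACA", "GATTACA")
deletion_score = global_alignment_score("ACGT", "AGT")
```

    Each cell depends only on its upper, left, and upper-left neighbours, which is why the table can be split into rectangular blocks that are processed once their neighbouring blocks are complete. Schemes of that kind are the usual basis for parallelising this computation with OpenMP or MPI.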

  18. Genetic Architecture of Vitamin B12 and Folate Levels Uncovered Applying Deeply Sequenced Large Datasets

    PubMed Central

    Thorleifsson, Gudmar; Ahluwalia, Tarunveer S.; Steinthorsdottir, Valgerdur; Bjarnason, Helgi; Gudbjartsson, Daniel F.; Magnusson, Olafur T.; Sparsø, Thomas; Albrechtsen, Anders; Kong, Augustine; Masson, Gisli; Tian, Geng; Cao, Hongzhi; Nie, Chao; Kristiansen, Karsten; Husemoen, Lise Lotte; Thuesen, Betina; Li, Yingrui; Nielsen, Rasmus; Linneberg, Allan; Olafsson, Isleifur; Eyjolfsson, Gudmundur I.; Jørgensen, Torben; Wang, Jun; Hansen, Torben; Thorsteinsdottir, Unnur; Stefánsson, Kari; Pedersen, Oluf

    2013-01-01

    Genome-wide association studies have mainly relied on common HapMap sequence variations. Recently, sequencing approaches have allowed analysis of low frequency and rare variants in conjunction with common variants, thereby improving the search for functional variants and thus the understanding of the underlying biology of human traits and diseases. Here, we used a large Icelandic whole genome sequence dataset combined with Danish exome sequence data to gain insight into the genetic architecture of serum levels of vitamin B12 (B12) and folate. Up to 22.9 million sequence variants were analyzed in combined samples of 45,576 and 37,341 individuals with serum B12 and folate measurements, respectively. We found six novel loci associating with serum B12 (CD320, TCN2, ABCD4, MMAA, MMACHC) or folate levels (FOLR3) and confirmed seven loci for these traits (TCN1, FUT6, FUT2, CUBN, CLYBL, MUT, MTHFR). Conditional analyses established that four loci contain additional independent signals. Interestingly, 13 of the 18 identified variants were coding and 11 of the 13 target genes have known functions related to B12 and folate pathways. Contrary to epidemiological studies we did not find consistent association of the variants with cardiovascular diseases, cancers or Alzheimer's disease although some variants demonstrated pleiotropic effects. Although to some degree impeded by low statistical power for some of these conditions, these data suggest that sequence variants that contribute to the population diversity in serum B12 or folate levels do not modify the risk of developing these conditions. Yet, the study demonstrates the value of combining whole genome and exome sequencing approaches to ascertain the genetic and molecular architectures underlying quantitative trait associations. PMID:23754956

  19. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets

    PubMed Central

    Shrimankar, D. D.; Sathe, S. R.

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today’s supercomputer often consists of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, the OpenMP programs cannot be scaled for more than a single SMP node. However, programs written in MPI can have more than single SMP nodes. But such a programming paradigm has an overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that the communication overhead incurs significantly even in OpenMP loop execution and increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and interesting to a large variety of input data files. We have developed our own load balancing and cache optimization technique for message passing model. Our experimental results show that our own developed techniques give optimum performance of our parallel algorithm for various sizes of input parameter, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868

  20. Hepatitis C infection among intravenous drug users attending therapy programs in Cyprus.

    PubMed

    Demetriou, Victoria L; van de Vijver, David A M C; Hezka, Johana; Kostrikis, Leondios G

    2010-02-01

    The most high-risk population for HCV transmission worldwide today are intravenous drug users. HCV genotypes in the general population in Cyprus demonstrate a polyphyletic infection and include subtypes associated with intravenous drug users. The prevalence of HCV, HBV, and HIV infection, HCV genotypes and risk factors among intravenous drug users in Cyprus were investigated here for the first time. Blood samples and interviews were obtained from 40 consenting users in treatment centers, and were tested for HCV, HBV, and HIV antibodies. On the HCV-positive samples, viral RNA extraction, RT-PCR and sequencing were performed. Phylogenetic analysis determined subtype and any relationships with database sequences and statistical analysis determined any correlation of risk factors with HCV infection. The prevalence of HCV infection was 50%, but no HBV or HIV infections were found. Of the PCR-positive samples, eight (57%) were genotype 3a, and six (43%) were 1b. No other subtypes, recombinant strains or mixed infections were observed. The phylogenetic analysis of the injecting drug users' strains against database sequences observed no clustering, which does not allow determination of transmission route, possibly due to a limitation of sequences in the database. However, three clusters were discovered among the drug users' sequences, revealing small groups who possibly share injecting equipment. Statistical analysis showed the risk factor associated with HCV infection is drug use duration. Overall, the polyphyletic nature of HCV infection in Cyprus is confirmed, but the transmission route remains unknown. These findings highlight the need for harm-reduction strategies to reduce HCV transmission. (c) 2009 Wiley-Liss, Inc.

  1. Diagnosis of autosomal dominant polycystic kidney disease using efficient PKD1 and PKD2 targeted next-generation sequencing.

    PubMed

    Trujillano, Daniel; Bullich, Gemma; Ossowski, Stephan; Ballarín, José; Torra, Roser; Estivill, Xavier; Ars, Elisabet

    2014-09-01

    Molecular diagnostics of autosomal dominant polycystic kidney disease (ADPKD) relies on mutation screening of PKD1 and PKD2, which is complicated by extensive allelic heterogeneity and the presence of six highly homologous sequences of PKD1. To date, specific sequencing of PKD1 requires laborious long-range amplifications. The high cost and long turnaround time of PKD1 and PKD2 mutation analysis using conventional techniques limit its widespread application in clinical settings. We performed targeted next-generation sequencing (NGS) of PKD1 and PKD2. Pooled barcoded DNA patient libraries were enriched by in-solution hybridization with PKD1 and PKD2 capture probes. Bioinformatics analysis was performed using an in-house developed pipeline. We validated the assay in a cohort of 36 patients with previously known PKD1 and PKD2 mutations and five control individuals. Then, we used the same assay and bioinformatics analysis in a discovery cohort of 12 uncharacterized patients. We detected 35 out of 36 known definitely, highly likely, and likely pathogenic mutations in the validation cohort, including two large deletions. In the discovery cohort, we detected 11 different pathogenic mutations in 10 out of 12 patients. This study demonstrates that laborious long-range PCRs of the repeated PKD1 region can be avoided by in-solution enrichment of PKD1 and PKD2 and NGS. This strategy significantly reduces the cost and time for simultaneous PKD1 and PKD2 sequence analysis, facilitating routine genetic diagnostics of ADPKD.

  2. Isoform-level gene expression patterns in single-cell RNA-sequencing data.

    PubMed

    Vu, Trung Nghia; Wills, Quin F; Kalari, Krishna R; Niu, Nifang; Wang, Liewei; Pawitan, Yudi; Rantalainen, Mattias

    2018-02-27

    RNA sequencing of single cells enables characterization of transcriptional heterogeneity in seemingly homogeneous cell populations. Single-cell sequencing has been applied in a wide range of research fields. However, few studies have focused on characterization of isoform-level expression patterns at the single-cell level. In this study we propose and apply a novel method, ISOform-Patterns (ISOP), based on mixture modeling, to characterize the expression patterns of isoform pairs from the same gene in single-cell isoform-level expression data. We define six principal patterns of isoform expression relationships and describe a method for differential-pattern analysis. We demonstrate ISOP through analysis of single-cell RNA-sequencing data from a breast cancer cell line, with replication in three independent datasets. We assigned the pattern types to each of 16,562 isoform-pairs from 4,929 genes. Among those, 26% of the discovered patterns were significant (p<0.05), while remaining patterns are possibly effects of transcriptional bursting, drop-out and stochastic biological heterogeneity. Furthermore, 32% of genes discovered through differential-pattern analysis were not detected by differential-expression analysis. The effects of drop-out events, mean expression level, and properties of the expression distribution on the performance of ISOP were also investigated through simulated datasets. To conclude, ISOP provides a novel approach for characterization of isoform-level preference, commitment and heterogeneity in single-cell RNA-sequencing data. The ISOP method has been implemented as an R package and is available at https://github.com/nghiavtr/ISOP under a GPL-3 license. Contact: mattias.rantalainen@ki.se. Supplementary data are available at Bioinformatics online.

  3. An information-theoretic approach to the modeling and analysis of whole-genome bisulfite sequencing data.

    PubMed

    Jenkinson, Garrett; Abante, Jordi; Feinberg, Andrew P; Goutsias, John

    2018-03-07

    DNA methylation is a stable form of epigenetic memory used by cells to control gene expression. Whole genome bisulfite sequencing (WGBS) has emerged as a gold-standard experimental technique for studying DNA methylation by producing high resolution genome-wide methylation profiles. Statistical modeling and analysis is employed to computationally extract and quantify information from these profiles in an effort to identify regions of the genome that demonstrate crucial or aberrant epigenetic behavior. However, the performance of most currently available methods for methylation analysis is hampered by their inability to directly account for statistical dependencies between neighboring methylation sites, thus ignoring significant information available in WGBS reads. We present a powerful information-theoretic approach for genome-wide modeling and analysis of WGBS data based on the 1D Ising model of statistical physics. This approach takes into account correlations in methylation by utilizing a joint probability model that encapsulates all information available in WGBS methylation reads and produces accurate results even when applied on single WGBS samples with low coverage. Using the Shannon entropy, our approach provides a rigorous quantification of methylation stochasticity in individual WGBS samples genome-wide. Furthermore, it utilizes the Jensen-Shannon distance to evaluate differences in methylation distributions between a test and a reference sample. Differential performance assessment using simulated and real human lung normal/cancer data demonstrate a clear superiority of our approach over DSS, a recently proposed method for WGBS data analysis. Critically, these results demonstrate that marginal methods become statistically invalid when correlations are present in the data. 
    This contribution demonstrates clear benefits and the necessity of modeling joint probability distributions of methylation using the 1D Ising model of statistical physics and of quantifying methylation stochasticity using concepts from information theory. By employing this methodology, substantial improvement of DNA methylation analysis can be achieved by effectively taking into account the massive amount of statistical information available in WGBS data, which is largely ignored by existing methods.
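
    As a rough illustration of the two information-theoretic quantities named in the abstract, the sketch below computes the Shannon entropy of a discrete distribution and the Jensen-Shannon distance between two such distributions. It assumes plain probability vectors over the same set of outcomes and base-2 logarithms; it does not implement the 1D Ising model or any part of the authors' software, and the function names are hypothetical.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in bits of a discrete distribution given as a list."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base 2) between p and q.

    With base-2 logarithms the value lies between 0 (identical
    distributions) and 1 (distributions with disjoint support).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of outcomes")
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    divergence = shannon_entropy(mixture) - (shannon_entropy(p) + shannon_entropy(q)) / 2
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


if __name__ == "__main__":
    test = [0.7, 0.2, 0.1]       # made-up methylation-level distribution
    reference = [0.1, 0.2, 0.7]  # made-up reference distribution
    print(shannon_entropy(test))
    print(jensen_shannon_distance(test, reference))
```

    The distance form is used here because, unlike the divergence itself, it satisfies the triangle inequality and can serve as a metric between a test and a reference sample.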

  4. An interdisciplinary analysis of ERTS data for Colorado mountain environments using ADP Techniques

    NASA Technical Reports Server (NTRS)

    Hoffer, R. M. (Principal Investigator)

    1972-01-01

    Author-identified significant preliminary results from the Ouachita portion of the Texoma frame of data indicate many potentials in the analysis and interpretation of ERTS data. It is believed that one of the more significant aspects of this analysis sequence has been the investigation of a technique to relate ERTS analysis and surface observation analysis. At present a sequence involving (1) preliminary analysis based solely upon the spectral characteristics of the data, followed by (2) a surface observation mission to obtain visual information and oblique photography of particular points of interest in the test site area, appears to provide an extremely efficient technique for obtaining particularly meaningful surface observation data. Following such a procedure permits concentration on particular points of interest in the entire ERTS frame and thereby makes the surface observation data obtained particularly significant and meaningful. The analysis of the Texoma frame has also been significant from the standpoint of demonstrating a fast turnaround analysis capability. Additionally, the analysis has shown the potential accuracy and degree of complexity of features that can be identified and mapped using ERTS data.

  5. Genetic Diagnosis in Consanguineous Families With Kidney Disease by Homozygosity Mapping Coupled With Whole-Exome Sequencing

    PubMed Central

    Al-Romaih, Khaldoun I.; Genovese, Giulio; Al-Mojalli, Hamad; Al-Othman, Saleh; Al-Manea, Hadeel; Al-Suleiman, Mohammed; Al-Jondubi, Mohammed; Atallah, Nourah; Al-Rodhyan, Maha; Weins, Astrid; Pollak, Martin R.; Adra, Chaker N.

    2011-01-01

    Background: Accurate diagnosis of the primary cause of an individual’s kidney disease can be essential for proper management. Some kidney diseases have overlapping histopathological features despite being caused by defects in different genes. In this report we describe two consanguineous Saudi Arabian families in which individuals presented with kidney failure and mixed clinical and histological features initially thought consistent with focal segmental glomerulosclerosis. Study Design: Case series. Setting and participants: We studied members of two apparently unrelated families from Saudi Arabia with kidney disease. Measurements: Whole-genome single-nucleotide polymorphism analysis followed by targeted isolation and sequencing of exons using genomic DNA samples from affected members of these families, followed by additional focused genotyping and sequence analysis. Results: The two apparently unrelated families shared a region of homozygosity on chromosome 2q13. Exome sequence from the affected individuals lacked any sequence reads from the NPHP1 gene, which is located within this homozygous region. Additional PCR based genotyping confirmed that affected individuals had NPHP1 deletions, rather than defects in a known FSGS-associated gene. Limitations: The methods used here may not result in a clear genetic diagnosis in many cases of apparent familial kidney disease. Conclusions: This analysis demonstrates the power of new high-throughput genotyping and sequencing technologies to aid in the rapid genetic diagnosis of individuals with an inherited form of kidney disease. We believe it is likely that such tools may become useful clinical genetic tools and alter the manner in which diagnoses are made in nephrology. PMID:21658830

  6. Unravelling the complexity of microRNA-mediated gene regulation in black pepper (Piper nigrum L.) using high-throughput small RNA profiling.

    PubMed

    Asha, Srinivasan; Sreekumar, Sweda; Soniya, E V

    2016-01-01

    Analysis of high-throughput small RNA deep sequencing data, in combination with black pepper transcriptome sequences, revealed microRNA-mediated gene regulation in black pepper (Piper nigrum L.). Black pepper is an important spice crop and its berries are used worldwide as a natural food additive that contributes unique flavour to foods. In the present study, to characterize microRNAs from black pepper, we generated a small RNA library from black pepper leaf and sequenced it by Illumina high-throughput sequencing technology. MicroRNAs belonging to a total of 303 conserved miRNA families were identified from the sRNAome data. Subsequent analysis of the recently sequenced black pepper transcriptome confirmed precursor sequences of 50 conserved miRNAs and four potential novel miRNA candidates. Stem-loop qRT-PCR experiments demonstrated differential expression of eight conserved miRNAs in black pepper. Computational analysis of targets of the miRNAs showed 223 potential black pepper unigene targets that encode diverse transcription factors and enzymes involved in plant development, disease resistance, metabolic and signalling pathways. RLM-RACE experiments further mapped miRNA-mediated cleavage at five of the mRNA targets. In addition, miRNA isoforms corresponding to 18 miRNA families were also identified from black pepper. This study presents the first large-scale identification of microRNAs from black pepper and provides the foundation for future studies of miRNA-mediated gene regulation of stress responses and diverse metabolic processes in black pepper.

  7. Annotating Protein Functional Residues by Coupling High-Throughput Fitness Profile and Homologous-Structure Analysis

    PubMed Central

    Du, Yushen; Wu, Nicholas C.; Jiang, Lin; Zhang, Tianhao; Gong, Danyang; Shu, Sara; Wu, Ting-Ting

    2016-01-01

    Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins about which homologous-structure information is available. PMID:27803181

  8. RAPD and Internal Transcribed Spacer Sequence Analyses Reveal Zea nicaraguensis as a Section Luxuriantes Species Close to Zea luxurians

    PubMed Central

    Wang, Pei; Lu, Yanli; Zheng, Mingmin; Rong, Tingzhao; Tang, Qilin

    2011-01-01

    The genetic relationship of a newly discovered teosinte from Nicaragua with waterlogging tolerance, Zea nicaraguensis, was determined based on randomly amplified polymorphic DNA (RAPD) markers and the internal transcribed spacer (ITS) sequences of nuclear ribosomal DNA using 14 accessions from Zea species. RAPD analysis showed that a total of 5,303 fragments were produced by 136 random decamer primers, of which 84.86% were polymorphic. RAPD-based UPGMA analysis demonstrated that the genus Zea can be divided into section Luxuriantes including Zea diploperennis, Zea luxurians, Zea perennis and Zea nicaraguensis, and section Zea including Zea mays ssp. mexicana, Zea mays ssp. parviglumis, Zea mays ssp. huehuetenangensis and Zea mays ssp. mays. ITS sequence analysis showed the lengths of the entire ITS region of the 14 taxa in Zea varied from 597 to 605 bp. The average GC content was 67.8%. In addition to the insertion/deletions, 78 variable sites were recorded in the total ITS region with 47 in ITS1, 5 in 5.8S, and 26 in ITS2. Sequences of these taxa were analyzed with neighbor-joining (NJ) and maximum parsimony (MP) methods to construct the phylogenetic trees, selecting Tripsacum dactyloides L. as the outgroup. The phylogenetic relationships of Zea species inferred from the ITS sequences are highly concordant with the RAPD evidence that resolved two major subgenus clades. Both RAPD and ITS sequence analyses indicate that Zea nicaraguensis is more closely related to Zea luxurians than to the other teosintes and cultivated maize, and that it should be regarded as a section Luxuriantes species. PMID:21525982
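
    The GC content reported above is a simple base-composition statistic. A minimal sketch of how such a percentage can be computed for one sequence follows; the function name and the choice to skip gaps and ambiguity codes are assumptions made for the example, and the input string is made up rather than taken from the Zea ITS data.

```python
def gc_content(seq):
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of seq.

    Gap characters, whitespace, and ambiguity codes such as N are skipped,
    which is one common convention; other tools may count them differently.
    """
    bases = [base for base in seq.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in bases if base in "GC")
    return 100.0 * gc / len(bases)


if __name__ == "__main__":
    print(gc_content("GGCCGCTA"))  # made-up fragment, not real ITS sequence
```

    An average over several accessions would then be the mean of the per-sequence values, or a pooled count over all bases, depending on the convention adopted.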

  9. Data-Aware Retrodiction for Asynchronous Harmonic Measurement in a Cyber-Physical Energy System.

    PubMed

    Liu, Youda; Wang, Xue; Liu, Yanchi; Cui, Sujin

    2016-08-18

    Cyber-physical energy systems provide a networked solution for safety, reliability and efficiency problems in smart grids. On the demand side, the secure and trustworthy energy supply requires real-time supervising and online power quality assessing. Harmonics measurement is necessary in power quality evaluation. However, under the large-scale distributed metering architecture, harmonic measurement faces the out-of-sequence measurement (OOSM) problem, which is the result of latencies in sensing or the communication process and brings deviations in data fusion. This paper depicts a distributed measurement network for large-scale asynchronous harmonic analysis and exploits a nonlinear autoregressive model with exogenous inputs (NARX) network to reorder the out-of-sequence measuring data. The NARX network gets the characteristics of the electrical harmonics from practical data rather than the kinematic equations. Thus, the data-aware network approximates the behavior of the practical electrical parameter with real-time data and improves the retrodiction accuracy. Theoretical analysis demonstrates that the data-aware method maintains a reasonable consumption of computing resources. Experiments on a practical testbed of a cyber-physical system are implemented, and harmonic measurement and analysis accuracy are adopted to evaluate the measuring mechanism under a distributed metering network. Results demonstrate an improvement of the harmonics analysis precision and validate the asynchronous measuring method in cyber-physical energy systems.

  10. Isolation of Lagos bat virus from water mongoose.

    PubMed

    Markotter, Wanda; Kuzmin, Ivan; Rupprecht, Charles E; Randles, Jenny; Sabeta, Claude T; Wandeler, Alexander I; Nel, Louis H

    2006-12-01

    A genotype 2 lyssavirus, Lagos bat virus (LBV), was isolated from a terrestrial wildlife species (water mongoose) in August 2004 in the Durban area of the KwaZulu-Natal Province of South Africa. The virus isolate was confirmed as LBV by antigenic and genetic characterization, and the mongoose was identified as Atilax paludinosus by mitochondrial cytochrome b sequence analysis. Phylogenetic analysis demonstrated sequence homology with previous LBV isolates from South African bats. Studies performed in mice indicated that the peripheral pathogenicity of LBV had been underestimated in previous studies. Surveillance strategies for LBV in Africa must be improved to better understand the epidemiology of this virus and to make informed decisions on future vaccine strategies because evidence is insufficient that current rabies vaccines provide protection against LBV.

  11. A genomewide survey of basic helix–loop–helix factors in Drosophila

    PubMed Central

    Moore, Adrian W.; Barbel, Sandra; Jan, Lily Yeh; Jan, Yuh Nung

    2000-01-01

    The basic helix–loop–helix (bHLH) transcription factors play important roles in the specification of tissue type during the development of animals. We have used the information contained in the recently published genomic sequence of Drosophila melanogaster to identify 12 additional bHLH proteins. By sequence analysis we have assigned these proteins to families defined by Atonal, Hairy-Enhancer of Split, Hand, p48, Mesp, MYC/USF, and the bHLH-Per, Arnt, Sim (PAS) domain. In addition, one single protein represents a unique family of bHLH proteins. mRNA in situ analysis demonstrates that the genes encoding these proteins are expressed in several tissue types but are particularly concentrated in the developing nervous system and mesoderm. PMID:10973473

  12. Nodal domains of a non-separable problem—the right-angled isosceles triangle

    NASA Astrophysics Data System (ADS)

    Aronovitch, Amit; Band, Ram; Fajman, David; Gnutzmann, Sven

    2012-03-01

    We study the nodal set of eigenfunctions of the Laplace operator on the right-angled isosceles triangle. A local analysis of the nodal pattern provides an algorithm for computing the number ν_n of nodal domains for any eigenfunction. In addition, an exact recursive formula for the number of nodal domains is found to reproduce all existing data. Eventually, we use the recursion formula to analyse a large sequence of nodal counts statistically. Our analysis shows that the distribution of nodal counts for this triangular shape has a much richer structure than the known cases of regular separable shapes or completely irregular shapes. Furthermore, we demonstrate that the nodal count sequence contains information about the periodic orbits of the corresponding classical ray dynamics.

  13. An improved model for whole genome phylogenetic analysis by Fourier transform.

    PubMed

    Yin, Changchuan; Yau, Stephen S-T

    2015-10-07

    DNA sequence similarity comparison is one of the major steps in computational phylogenetic studies. The sequence comparison of closely related DNA sequences and genomes is usually performed by multiple sequence alignments (MSA). While the MSA method is accurate for some types of sequences, it may produce incorrect results when DNA sequences have undergone rearrangements, as in many bacterial and viral genomes. It is also limited by its computational complexity for comparing large volumes of data. Previously, we proposed an alignment-free method that exploits the full information contents of DNA sequences by Discrete Fourier Transform (DFT), but still with some limitations. Here, we present a significantly improved method for the similarity comparison of DNA sequences by DFT. In this method, we map DNA sequences into 2-dimensional (2D) numerical sequences and then apply DFT to transform the 2D numerical sequences into the frequency domain. In the 2D mapping, the nucleotide composition of a DNA sequence is a determinant factor and the 2D mapping reduces the nucleotide composition bias in the distance measure, thus improving the similarity measure of DNA sequences. To compare the DFT power spectra of DNA sequences with different lengths, we propose an improved even scaling algorithm to extend shorter DFT power spectra to the longest length of the underlying sequences. After the DFT power spectra are evenly scaled, the spectra are in the same dimensionality of the Fourier frequency space, and the Euclidean distances of full Fourier power spectra of the DNA sequences are then used as the dissimilarity metrics. The improved DFT method, with increased computational performance by 2D numerical representation, is applicable to any DNA sequences of different length ranges. We assess the accuracy of the improved DFT similarity measure in hierarchical clustering of different DNA sequences including simulated and real datasets. 
    The method yields accurate and reliable phylogenetic trees and demonstrates that the improved DFT dissimilarity measure is an efficient and effective similarity measure of DNA sequences. Due to its high efficiency and accuracy, the proposed DFT similarity measure is successfully applied to phylogenetic analysis for individual genes and large whole bacterial genomes. Copyright © 2015 Elsevier Ltd. All rights reserved.
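
    The general idea of comparing sequences through their Fourier power spectra can be sketched as follows. This is not the authors' method: the 2D mapping below (each nucleotide assigned a point in the plane, stored as a complex number) is a hypothetical choice made for illustration, the even scaling step for unequal lengths is omitted so both sequences must have the same length, and the DFT is computed directly rather than by a fast transform.

```python
import cmath
import math

# Hypothetical 2D mapping (x + iy); the paper's actual mapping is not
# given in the abstract and may differ.
MAPPING = {
    "A": complex(0, -1),
    "C": complex(-1, 0),
    "G": complex(1, 0),
    "T": complex(0, 1),
}


def power_spectrum(seq):
    """Power spectrum |X[k]|^2 of the 2D-mapped sequence, via a direct DFT."""
    signal = [MAPPING[base] for base in seq.upper()]
    n = len(signal)
    spectrum = []
    for k in range(n):
        coeff = sum(
            signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_distance(seq_a, seq_b):
    """Euclidean distance between power spectra of two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch omits length scaling; lengths must match")
    spec_a, spec_b = power_spectrum(seq_a), power_spectrum(seq_b)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(spec_a, spec_b)))


if __name__ == "__main__":
    print(spectral_distance("ACGTACGT", "ACGTTCGT"))
```

    Because the power spectrum discards phase, a circular shift of a sequence leaves it unchanged, which hints at why spectrum-based measures tolerate some rearrangements that disturb alignments.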

  14. Software for rapid time dependent ChIP-sequencing analysis (TDCA).

    PubMed

    Myschyshyn, Mike; Farren-Dai, Marco; Chuang, Tien-Jui; Vocadlo, David

    2017-11-25

    Chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) and associated methods are widely used to define the genome wide distribution of chromatin associated proteins, post-translational epigenetic marks, and modifications found on DNA bases. An area of emerging interest is to study time dependent changes in the distribution of such proteins and marks by using serial ChIP-seq experiments performed in a time resolved manner. Despite such time resolved studies becoming increasingly common, software to facilitate analysis of such data in a robust automated manner is limited. We have designed software called Time-Dependent ChIP-Sequencing Analyser (TDCA), which is the first program to automate analysis of time-dependent ChIP-seq data by fitting to sigmoidal curves. We provide users with guidance for experimental design of TDCA for modeling of time course (TC) ChIP-seq data using two simulated data sets. Furthermore, we demonstrate that this fitting strategy is widely applicable by showing that automated analysis of three previously published TC data sets accurately recapitulates key findings reported in these studies. Using each of these data sets, we highlight how biologically relevant findings can be readily obtained by exploiting TDCA to yield intuitive parameters that describe behavior at either a single locus or sets of loci. TDCA enables customizable analysis of user input aligned DNA sequencing data, coupled with graphical outputs in the form of publication-ready figures that describe behavior at either individual loci or sets of loci sharing common traits defined by the user. TDCA accepts sequencing data as standard binary alignment map (BAM) files and loci of interest in browser extensible data (BED) file format. TDCA accurately models the number of sequencing reads, or coverage, at loci from TC ChIP-seq studies or conceptually related TC sequencing experiments. 
    TC experiments are reduced to intuitive parametric values that facilitate biologically relevant data analysis, and the uncovering of variations in the time-dependent behavior of chromatin. TDCA automates the analysis of TC ChIP-seq experiments, permitting researchers to easily obtain raw and modeled data for specific loci or groups of loci with similar behavior while also enhancing consistency of data analysis of TC data within the genomics field.

  15. Annotating Protein Functional Residues by Coupling High-Throughput Fitness Profile and Homologous-Structure Analysis.

    PubMed

    Du, Yushen; Wu, Nicholas C; Jiang, Lin; Zhang, Tianhao; Gong, Danyang; Shu, Sara; Wu, Ting-Ting; Sun, Ren

    2016-11-01

    Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins about which homologous-structure information is available. To fully comprehend the diverse functions of a protein, it is essential to understand the functionality of individual residues. 
    Current methods are highly dependent on evolutionary sequence conservation, which is usually limited by sampling size. Sequence conservation-based methods are further confounded by structural constraints and multifunctionality of proteins. Here we present a method that can systematically identify and annotate functional residues of a given protein. We used a high-throughput functional profiling platform to identify essential residues. Coupling it with homologous-structure comparison, we were able to annotate multiple functions of proteins. We demonstrated the method with the PB1 protein of influenza A virus and identified novel functional residues in addition to its canonical function as an RNA-dependent RNA polymerase. Not limited to virology, this method is generally applicable to other proteins that can be functionally selected and about which homologous-structure information is available. Copyright © 2016 Du et al.

  16. Integrating protein structural dynamics and evolutionary analysis with Bio3D.

    PubMed

    Skjærven, Lars; Yao, Xin-Qiu; Scarabelli, Guido; Grant, Barry J

    2014-12-10

    Popular bioinformatics approaches for studying protein functional dynamics include comparisons of crystallographic structures, molecular dynamics simulations and normal mode analysis. However, determining how observed displacements and predicted motions from these traditionally separate analyses relate to each other, as well as to the evolution of sequence, structure and function within large protein families, remains a considerable challenge. This is in part due to the general lack of tools that integrate information of molecular structure, dynamics and evolution. Here, we describe the integration of new methodologies for evolutionary sequence, structure and simulation analysis into the Bio3D package. This major update includes unique high-throughput normal mode analysis for examining and contrasting the dynamics of related proteins with non-identical sequences and structures, as well as new methods for quantifying dynamical couplings and their residue-wise dissection from correlation network analysis. These new methodologies are integrated with major biomolecular databases as well as established methods for evolutionary sequence and comparative structural analysis. New functionality for directly comparing results derived from normal modes, molecular dynamics and principal component analysis of heterogeneous experimental structure distributions is also included. We demonstrate these integrated capabilities with example applications to dihydrofolate reductase and heterotrimeric G-protein families along with a discussion of the mechanistic insight provided in each case. The integration of structural dynamics and evolutionary analysis in Bio3D enables researchers to go beyond a prediction of single protein dynamics to investigate dynamical features across large protein families. The Bio3D package is distributed with full source code and extensive documentation as a platform independent R package under a GPL2 license from http://thegrantlab.org/bio3d/.

  17. Multilocus sequence analysis for assessment of phylogenetic diversity and biogeography in Thalassospira bacteria from diverse marine environments.

    PubMed

    Lai, Qiliang; Liu, Yang; Yuan, Jun; Du, Juan; Wang, Liping; Sun, Fengqin; Shao, Zongze

    2014-01-01

    Thalassospira bacteria are widespread and have been isolated from various marine environments. Less is known about their genetic diversity and biogeography, or about their role in marine environments, and many of them cannot be discriminated using the 16S rRNA gene alone. To address these issues, in this report, the phylogenetic analysis of 58 strains from seawater and deep-sea sediments was carried out using multilocus sequence analysis (MLSA) based on the acsA, aroE, gyrB, mutL, rpoD and trpB genes, together with DNA-DNA hybridization (DDH) and average nucleotide identity (ANI) based on genome sequences. The MLSA demonstrated that the 58 strains were clearly separated into 15 lineages, corresponding to seven validly described species and eight potential novel species. The DDH and ANI values further confirmed the validity of the MLSA and of the eight potential novel species. The MLSA interspecies gap of the genus Thalassospira was determined to be 96.16-97.12% sequence identity on the basis of the combined analyses of the DDH and MLSA, while the ANIm interspecies gap was 95.76-97.20% based on the in silico DDH analysis. Meanwhile, phylogenetic analyses showed that the Thalassospira bacteria exhibited, to a certain degree, a distribution pattern according to geographic region. Moreover, they clustered together according to habitat depth. In short, the phylogeny and biogeography of the Thalassospira bacteria were systematically investigated for the first time. These results will help to further explore their ecological role and adaptive evolution in marine environments.
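
    As a rough illustration of how an interspecies gap such as the 96.16-97.12% range above could be applied, the sketch below computes percent identity between two pre-aligned, concatenated sequences and compares it with the gap. This is one reading of the abstract, not the authors' pipeline: the function names, the choice to ignore gapped columns, and the three-way labels are all assumptions made for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns containing a gap in either sequence are skipped.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def classify_pair(identity: float, lower: float = 96.16, upper: float = 97.12) -> str:
    """Label a strain pair relative to the reported MLSA interspecies gap (hypothetical labels)."""
    if identity < lower:
        return "different species"
    if identity > upper:
        return "same species"
    return "within gap (unresolved)"


print(classify_pair(percent_identity("ACGTACGT", "ACGTACGA")))
```

    Pairs that fall inside the gap are reported as unresolved here rather than forced into either class, since the abstract gives the gap only as a range.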

  18. Multilocus Sequence Analysis for Assessment of Phylogenetic Diversity and Biogeography in Thalassospira Bacteria from Diverse Marine Environments

    PubMed Central

    Yuan, Jun; Du, Juan; Wang, Liping; Sun, Fengqin; Shao, Zongze

    2014-01-01

    Thalassospira bacteria are widespread and have been isolated from various marine environments. Less is known about their genetic diversity and biogeography, or about their role in marine environments, and many of them cannot be discriminated using the 16S rRNA gene alone. To address these issues, in this report, the phylogenetic analysis of 58 strains from seawater and deep-sea sediments was carried out using multilocus sequence analysis (MLSA) based on the acsA, aroE, gyrB, mutL, rpoD and trpB genes, together with DNA-DNA hybridization (DDH) and average nucleotide identity (ANI) based on genome sequences. The MLSA demonstrated that the 58 strains were clearly separated into 15 lineages, corresponding to seven validly described species and eight potential novel species. The DDH and ANI values further confirmed the validity of the MLSA and of the eight potential novel species. The MLSA interspecies gap of the genus Thalassospira was determined to be 96.16–97.12% sequence identity on the basis of the combined analyses of the DDH and MLSA, while the ANIm interspecies gap was 95.76–97.20% based on the in silico DDH analysis. Meanwhile, phylogenetic analyses showed that the Thalassospira bacteria exhibited, to a certain degree, a distribution pattern according to geographic region. Moreover, they clustered together according to habitat depth. In short, the phylogeny and biogeography of the Thalassospira bacteria were systematically investigated for the first time. These results will help to further explore their ecological role and adaptive evolution in marine environments. PMID:25198177

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ruggles, Kelly V.; Tang, Zuojian; Wang, Xuya

    Improvements in mass spectrometry (MS)-based peptide sequencing provide a new opportunity to determine whether polymorphisms, mutations and splice variants identified in cancer cells are translated. Herein we therefore describe a proteogenomic data integration tool (QUILTS) and illustrate its application to whole genome, transcriptome and global MS peptide sequence datasets generated from a pair of luminal and basal-like breast cancer patient derived xenografts (PDX). The sensitivity of proteogenomic analysis for single nucleotide variant (SNV) expression and novel splice junction (NSJ) detection was probed using multiple MS/MS process replicates. Despite over thirty sample replicates, only about 10% of all SNV (somatic and germline) detected by both DNA and RNA sequencing were observed as peptides. An even smaller proportion of peptides corresponding to NSJ observed by RNA sequencing were detected (<0.1%). Peptides mapping to DNA-detected SNV without a detectable mRNA transcript were also observed, demonstrating that the transcriptome coverage was also incomplete (~80%). In contrast to germ-line variants, somatic variants were less likely to be detected at the peptide level in the basal-like tumor than in the luminal tumor, raising the possibility of differential translation or protein degradation effects. In conclusion, the QUILTS program integrates DNA, RNA and peptide sequencing to assess the degree to which somatic mutations are translated and therefore biologically active. By identifying gaps in sequence coverage, QUILTS benchmarks current technology and assesses progress towards whole cancer proteome and transcriptome analysis.

  20. Sequence analysis of the complete genome of Trichoplusia ni single nucleopolyhedrovirus and the identification of a baculoviral photolyase gene

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Willis, Leslie G.; Siepp, Robyn; Stewart, Taryn M.

    2005-08-01

    The genome of the Trichoplusia ni single nucleopolyhedrovirus (TnSNPV), a group II NPV which infects the cabbage looper (T. ni), has been completely sequenced and analyzed. The TnSNPV DNA genome consists of 134,394 bp and has an overall G + C content of 39%. Gene analysis predicted 144 open reading frames (ORFs) of 150 nucleotides or greater that showed minimal overlap. Comparisons with previously sequenced baculoviruses indicate that 119 TnSNPV ORFs were homologues of previously reported viral gene sequences. Ninety-four TnSNPV ORFs returned an Autographa californica multiple NPV (AcMNPV) homologue, while 25 ORFs returned poor or no sequence matches with the current databases. A putative photolyase gene was also identified that had highest amino acid identity to the photolyase genes of Chrysodeixis chalcites NPV (ChchNPV) (47%) and Danio rerio (zebrafish) (40%). In addition, unlike all other baculoviruses, no obvious homologous repeat (hr) sequences were identified. Comparison of the TnSNPV and AcMNPV genomes provides a unique opportunity to examine two baculoviruses that are highly virulent for a common insect host (T. ni) yet belong to diverse baculovirus taxonomic groups and possess distinct biological features. In vitro fusion assays demonstrated that the TnSNPV F protein induces membrane fusion and syncytia formation; the resulting syncytia were compared to those formed by AcMNPV GP64.
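
    The two summary quantities quoted above, overall G + C content and a minimum ORF length of 150 nucleotides, are simple to compute. The sketch below shows one way to do so; the function names and the decision to ignore ambiguous bases are assumptions for illustration and are not taken from the study.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / counted


def meets_orf_length(orf: str, min_nt: int = 150) -> bool:
    """True if a candidate ORF is at least min_nt nucleotides long (150 in the abstract)."""
    return len(orf) >= min_nt


# Toy sequence, not viral data: 2 of 4 bases are G or C.
print(gc_content("ATGC"))
```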

  1. CDR3 analysis of TCR Vβ repertoire of CD8⁺ T cells from chickens infected with Eimeria maxima.

    PubMed

    Ren, Chao; Yin, Guangwen; Qin, Mei; Suo, Jingxia; Lv, Qiyao; Xie, Li; Wang, Yunzhou; Huang, Xiaoxi; Chen, Yuchen; Liu, Xianyong; Suo, Xun

    2014-08-01

    CD8(+) T cells play a major role in the immune protection of the host against reinfection with Eimeria maxima, the most immunogenic species of eimerian parasites in chickens. To explore the dominant complementarity-determining region 3 (CDR3) sequences of the CD8(+) T cell populations induced by infection with this parasite, sequence analysis was performed in this study for CDR3 of CD8(+) T cells from E. maxima-infected chickens. Five days after the third or fourth infection, intraepithelial lymphocytes were isolated from the jejunum of the birds. CD3(+)CD8(+) T cells were sorted and subjected to total RNA isolation and cDNA preparation. PCR amplification and cloning of the loci between Vβ1 and Cβ were conducted for the subsequent sequencing of CDR3 of the T cell receptor (TCR). After the fourth infection, two birds exhibited the same two frequent TCR CDR3 sequences, i.e., AKQDWGTGGYSNMI and AGRVLNIQY, while the third bird showed two different frequent TCR CDR3 sequences, AKQGARGHTPLN and AKQDIEVRGPNTPLN. No frequent CDR3 sequence was detected in uninfected birds, though AGRVLNIQY was also found in two uninfected birds. Our results preliminarily demonstrate that frequent CDR3 sequences may exist in E. maxima-immunized chickens, encouraging the mining of the immunodominant CD8(+) T cells against E. maxima infection. Copyright © 2014 Elsevier Inc. All rights reserved.

  2. When What's Inside Counts: Sequence of Demonstrated Actions Affects Preschooler's Categorization by Nonobvious Properties

    ERIC Educational Resources Information Center

    Yu, Yue; Kushnir, Tamar

    2016-01-01

    This study explores the role of a particular social cue--the "sequence" of demonstrated actions and events--in preschooler's categorization. A demonstrator sorted objects that varied on both a surface feature (color) and a nonobvious property (sound made when shaken). Children saw a sequence of actions in which the nonobvious property…

  3. Defining objective clusters for rabies virus sequences using affinity propagation clustering

    PubMed Central

    Fischer, Susanne; Freuling, Conrad M.; Pfaff, Florian; Bodenhofer, Ulrich; Höper, Dirk; Fischer, Mareike; Marston, Denise A.; Fooks, Anthony R.; Mettenleiter, Thomas C.; Conraths, Franz J.; Homeier-Bachmann, Timo

    2018-01-01

    Rabies is caused by lyssaviruses, and is one of the oldest known zoonoses. In recent years, more than 21,000 nucleotide sequences of rabies viruses (RABV), from the prototype species rabies lyssavirus, have been deposited in public databases. Subsequent phylogenetic analyses in combination with metadata suggest geographic distributions of RABV. However, these analyses face technical difficulties in defining verifiable criteria for cluster allocation in phylogenetic trees, which calls for a more rational approach. Therefore, we applied a relatively new mathematical clustering algorithm named ‘affinity propagation clustering’ (AP) to propose a standardized sub-species classification utilizing full-genome RABV sequences. Because AP is computationally fast and works for any meaningful measure of similarity between data samples, it has previously been applied successfully in bioinformatics for the analysis of microarray and gene expression data. Cluster analysis of sequences, however, is still in its infancy. Existing (516) and original (46) full genome RABV sequences were used to demonstrate the application of AP for RABV clustering. On a global scale, AP proposed four clusters, i.e. New World, Arctic/Arctic-like, Cosmopolitan, and Asian, as previously assigned by phylogenetic studies. By combining AP with established phylogenetic analyses, it is possible to resolve phylogenetic relationships between verifiably determined clusters and sequences. This workflow will be useful in confirming cluster distributions in a uniform, transparent manner, not only for RABV, but also for other comparative sequence analyses. PMID:29357361

  4. Mitochondrial gene sequences alone or combined with ITS region sequences provide firm molecular criteria for the classification of Lecanicillium species.

    PubMed

    Kouvelis, Vassili N; Sialakouma, Aphrodite; Typas, Milton A

    2008-07-01

    The recent revision of Verticillium sect. Prostrata led to the introduction of the genus Lecanicillium, which comprises the majority of the entomopathogenic strains. Sixty-five strains previously classified as Verticillium lecanii or Verticillium sp. from different geographical regions and hosts were examined and their phylogenetic relationships were determined using sequences from three mitochondrial (mt) genes [the small rRNA subunit (rns), the NADH dehydrogenase subunits 1 (nad1) and 3 (nad3)] and the ITS region. In general, single gene phylogenetic trees differentiated and placed the strains examined in well-supported (by BS analysis) groups of L. lecanii, L. longisporum, L. muscarium, and L. nodulosum, although in some cases a few uncertainties still remained. nad1 was the most informative single gene in phylogenetic analyses and was also found to contain group I introns with putative open reading frames (ORFs) encoding GIY-YIG endonucleases. The combined use of mt gene sequences resolved taxonomic uncertainties arising from ITS analysis and, alone or in combination with ITS sequences, helped in placing uncharacterised Verticillium lecanii and Verticillium sp. firmly into Lecanicillium species. The results from the combined data of all the mt genes and from all the mt genes together with the ITS region were very similar. Furthermore, a relaxed correlation with host specificity -- at least for Homoptera -- was indicated for the rns and the combined mt gene sequences. Thus, the usefulness of mt gene sequences as a convenient molecular tool in phylogenetic studies of entomopathogenic fungi was demonstrated.

  5. Identification and characterization of Highlands J virus from a Mississippi sandhill crane using unbiased next-generation sequencing

    USGS Publications Warehouse

    Ip, Hon S.; Wiley, Michael R.; Long, Renee; Gustavo, Palacios; Shearn-Bochsler, Valerie; Whitehouse, Chris A.

    2014-01-01

    Advances in massively parallel DNA sequencing platforms, commonly termed next-generation sequencing (NGS) technologies, have greatly reduced time, labor, and cost associated with DNA sequencing. Thus, NGS has become a routine tool for new viral pathogen discovery and will likely become the standard for routine laboratory diagnostics of infectious diseases in the near future. This study demonstrated the application of NGS for the rapid identification and characterization of a virus isolated from the brain of an endangered Mississippi sandhill crane. This bird was part of a population restoration effort and was found in an emaciated state several days after Hurricane Isaac passed over the refuge in Mississippi in 2012. Post-mortem examination had identified trichostrongyliasis as the possible cause of death, but because a virus with morphology consistent with a togavirus was isolated from the brain of the bird, an arboviral etiology was strongly suspected. Because individual molecular assays for several known arboviruses were negative, unbiased NGS by Illumina MiSeq was used to definitively identify and characterize the causative viral agent. Whole genome sequencing and phylogenetic analysis revealed the viral isolate to be the Highlands J virus, a known avian pathogen. This study demonstrates the use of unbiased NGS for the rapid detection and characterization of an unidentified viral pathogen and the application of this technology to wildlife disease diagnostics and conservation medicine.

  6. Sequence Composition and Gene Content of the Short Arm of Rye (Secale cereale) Chromosome 1

    PubMed Central

    Fluch, Silvia; Kopecky, Dieter; Burg, Kornel; Šimková, Hana; Taudien, Stefan; Petzold, Andreas; Kubaláková, Marie; Platzer, Matthias; Berenyi, Maria; Krainer, Siegfried; Doležel, Jaroslav; Lelley, Tamas

    2012-01-01

    Background The purpose of the study is to elucidate the sequence composition of the short arm of rye chromosome 1 (Secale cereale) with special focus on its gene content, because this portion of the rye genome is an integrated part of several hundreds of bread wheat varieties worldwide. Methodology/Principal Findings Multiple Displacement Amplification of 1RS DNA, obtained from flow sorted 1RS chromosomes, using 1RS ditelosomic wheat-rye addition line, and subsequent Roche 454FLX sequencing of this DNA yielded 195,313,589 bp sequence information. This quantity of sequence information resulted in 0.43× sequence coverage of the 1RS chromosome arm, permitting the identification of genes with estimated probability of 95%. A detailed analysis revealed that more than 5% of the 1RS sequence consisted of gene space, identifying at least 3,121 gene loci representing 1,882 different gene functions. Repetitive elements comprised about 72% of the 1RS sequence, Gypsy/Sabrina (13.3%) being the most abundant. More than four thousand simple sequence repeat (SSR) sites mostly located in gene related sequence reads were identified for possible marker development. The existence of chloroplast insertions in 1RS has been verified by identifying chimeric chloroplast-genomic sequence reads. Synteny analysis of 1RS to the full genomes of Oryza sativa and Brachypodium distachyon revealed that about half of the genes of 1RS correspond to the distal end of the short arm of rice chromosome 5 and the proximal region of the long arm of Brachypodium distachyon chromosome 2. Comparison of the gene content of 1RS to 1HS barley chromosome arm revealed high conservation of genes related to chromosome 5 of rice. Conclusions The present study revealed the gene content and potential gene functions on this chromosome arm and demonstrated numerous sequence elements like SSRs and gene-related sequences, which can be utilised for future research as well as in breeding of wheat and rye. PMID:22328922
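
    The coverage figure above follows from the usual relation coverage = sequenced bases / target length. The abstract does not state the size of the 1RS arm, so the sketch below only back-calculates the size implied by the two reported numbers; because 0.43× is a rounded value, the result is approximate. The function name is hypothetical.

```python
def implied_target_size(sequenced_bp: int, coverage: float) -> float:
    """Target length (bp) implied by a sequencing yield and a fold-coverage value."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return sequenced_bp / coverage


# Values quoted in the abstract: 195,313,589 bp of sequence at 0.43x coverage.
size_bp = implied_target_size(195_313_589, 0.43)
print(f"{size_bp / 1e6:.0f} Mb")  # prints "454 Mb"
```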

  7. Shotgun protein sequencing: assembly of peptide tandem mass spectra from mixtures of modified proteins.

    PubMed

    Bandeira, Nuno; Clauser, Karl R; Pevzner, Pavel A

    2007-07-01

    Despite significant advances in the identification of known proteins, the analysis of unknown proteins by MS/MS still remains a challenging open problem. Although Klaus Biemann recognized the potential of MS/MS for sequencing of unknown proteins in the 1980s, low throughput Edman degradation followed by cloning still remains the main method to sequence unknown proteins. The automated interpretation of MS/MS spectra has been limited by a focus on individual spectra and has not capitalized on the information contained in spectra of overlapping peptides. Indeed the powerful shotgun DNA sequencing strategies have not been extended to automated protein sequencing. We demonstrate, for the first time, the feasibility of automated shotgun protein sequencing of protein mixtures by utilizing MS/MS spectra of overlapping and possibly modified peptides generated via multiple proteases of different specificities. We validate this approach by generating highly accurate de novo reconstructions of multiple regions of various proteins in western diamondback rattlesnake venom. We further argue that shotgun protein sequencing has the potential to overcome the limitations of current protein sequencing approaches and thus catalyze the otherwise impractical applications of proteomics methodologies in studies of unknown proteins.

  8. Image Encryption Algorithm Based on Hyperchaotic Maps and Nucleotide Sequences Database

    PubMed Central

    2017-01-01

    Image encryption technology is one of the main means to ensure the safety of image information. Using the characteristics of chaos, such as randomness, regularity, ergodicity, and initial value sensitiveness, combined with the unique space conformation of DNA molecules and their unique information storage and processing ability, an efficient method for image encryption based on the chaos theory and a DNA sequence database is proposed. In this paper, digital image encryption employs a process of transforming the image pixel gray value by using chaotic sequence scrambling image pixel location and establishing superchaotic mapping, which maps quaternary sequences and DNA sequences, and by combining with the logic of the transformation between DNA sequences. The bases are replaced under the displaced rules by using DNA coding in a certain number of iterations that are based on the enhanced quaternary hyperchaotic sequence; the sequence is generated by Chen chaos. The cipher feedback mode and chaos iteration are employed in the encryption process to enhance the confusion and diffusion properties of the algorithm. Theoretical analysis and experimental results show that the proposed scheme not only demonstrates excellent encryption but also effectively resists chosen-plaintext attack, statistical attack, and differential attack. PMID:28392799

  9. Identification of the regulatory autophosphorylation site of autophosphorylation-dependent protein kinase (auto-kinase). Evidence that auto-kinase belongs to a member of the p21-activated kinase family.

    PubMed

    Yu, J S; Chen, W J; Ni, M H; Chan, W H; Yang, S D

    1998-08-15

    Autophosphorylation-dependent protein kinase (auto-kinase) was identified from pig brain and liver on the basis of its unique autophosphorylation/activation property [Yang, Fong, Yu and Liu (1987) J. Biol. Chem. 262, 7034-7040; Yang, Chang and Soderling (1987) J. Biol. Chem. 262, 9421-9427]. Its substrate consensus sequence motif was determined as being -R-X-(X)-S*/T*-X3-S/T-. To characterize auto-kinase further, we partly sequenced the kinase purified from pig liver. The N-terminal sequence (VDGGAKTSDKQKKKAXMTDE) and two internal peptide sequences (EKLRTIV and LQNPEK/ILTP/FI) of auto-kinase were obtained. These sequences identify auto-kinase as a C-terminal catalytic fragment of p21-activated protein kinase 2 (PAK2 or gamma-PAK) lacking its N-terminal regulatory region. Auto-kinase can be recognized by an antibody raised against the C-terminal peptide of human PAK2 by immunoblotting. Furthermore the autophosphorylation site sequence of auto-kinase was successfully predicted on the basis of its substrate consensus sequence motif and the known PAK2 sequence, and was further demonstrated to be RST(P)MVGTPYWMAPEVVTR by phosphoamino acid analysis, manual Edman degradation and phosphopeptide mapping via the help of phosphorylation site analysis of a synthetic peptide corresponding to the sequence of PAK2 from residues 396 to 418. During the activation process, auto-kinase autophosphorylates mainly on a single threonine residue Thr402 (according to the sequence numbering of human PAK2). In addition, a phospho-specific antibody against a synthetic phosphopeptide containing this identified sequence was generated and shown to be able to differentially recognize the activated auto-kinase autophosphorylated at Thr402 but not the non-phosphorylated/inactive auto-kinase. 
    Immunoblot analysis with this phospho-specific antibody further revealed that the change in phosphorylation level of Thr402 of auto-kinase was well correlated with the activity change of the kinase during both autophosphorylation/activation and protein phosphatase-mediated dephosphorylation/inactivation processes. Taken together, our results identify Thr402 as the regulatory autophosphorylation site of auto-kinase, which is a C-terminal catalytic fragment of PAK2.
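
    The substrate consensus motif -R-X-(X)-S*/T*-X3-S/T- quoted above can be read as: an arginine, one or two arbitrary residues, the phospho-acceptor serine or threonine, three arbitrary residues, then a serine or threonine. The sketch below encodes that reading as a regular expression and scans the autophosphorylation site sequence from the abstract with the "(P)" marker removed. It only illustrates the motif; it is not the prediction procedure used by the authors, and the function name is hypothetical.

```python
import re


def find_motif_sites(peptide: str) -> list[int]:
    """Return 0-based indices of candidate phospho-acceptor S/T residues.

    Both spacer lengths allowed by "X-(X)" are tried explicitly, and a
    lookahead is used so that overlapping motif instances are all reported.
    """
    sites = set()
    for spacer in (1, 2):
        pattern = re.compile(rf"(?=R.{{{spacer}}}([ST]).{{3}}[ST])")
        for match in pattern.finditer(peptide):
            sites.add(match.start(1))
    return sorted(sites)


# Sequence from the abstract, RST(P)MVGTPYWMAPEVVTR, without the (P) marker.
peptide = "RSTMVGTPYWMAPEVVTR"
print(find_motif_sites(peptide))  # prints [2]
```

    Index 2 is the threonine that carries the "(P)" marker in the abstract's notation, that is, the residue reported as Thr402.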

  10. Identification of the regulatory autophosphorylation site of autophosphorylation-dependent protein kinase (auto-kinase). Evidence that auto-kinase belongs to a member of the p21-activated kinase family.

    PubMed Central

    Yu, J S; Chen, W J; Ni, M H; Chan, W H; Yang, S D

    1998-01-01

    Autophosphorylation-dependent protein kinase (auto-kinase) was identified from pig brain and liver on the basis of its unique autophosphorylation/activation property [Yang, Fong, Yu and Liu (1987) J. Biol. Chem. 262, 7034-7040; Yang, Chang and Soderling (1987) J. Biol. Chem. 262, 9421-9427]. Its substrate consensus sequence motif was determined as being -R-X-(X)-S*/T*-X3-S/T-. To characterize auto-kinase further, we partly sequenced the kinase purified from pig liver. The N-terminal sequence (VDGGAKTSDKQKKKAXMTDE) and two internal peptide sequences (EKLRTIV and LQNPEK/ILTP/FI) of auto-kinase were obtained. These sequences identify auto-kinase as a C-terminal catalytic fragment of p21-activated protein kinase 2 (PAK2 or gamma-PAK) lacking its N-terminal regulatory region. Auto-kinase can be recognized by an antibody raised against the C-terminal peptide of human PAK2 by immunoblotting. Furthermore the autophosphorylation site sequence of auto-kinase was successfully predicted on the basis of its substrate consensus sequence motif and the known PAK2 sequence, and was further demonstrated to be RST(P)MVGTPYWMAPEVVTR by phosphoamino acid analysis, manual Edman degradation and phosphopeptide mapping via the help of phosphorylation site analysis of a synthetic peptide corresponding to the sequence of PAK2 from residues 396 to 418. During the activation process, auto-kinase autophosphorylates mainly on a single threonine residue Thr402 (according to the sequence numbering of human PAK2). In addition, a phospho-specific antibody against a synthetic phosphopeptide containing this identified sequence was generated and shown to be able to differentially recognize the activated auto-kinase autophosphorylated at Thr402 but not the non-phosphorylated/inactive auto-kinase. 
    Immunoblot analysis with this phospho-specific antibody further revealed that the change in phosphorylation level of Thr402 of auto-kinase was well correlated with the activity change of the kinase during both autophosphorylation/activation and protein phosphatase-mediated dephosphorylation/inactivation processes. Taken together, our results identify Thr402 as the regulatory autophosphorylation site of auto-kinase, which is a C-terminal catalytic fragment of PAK2. PMID:9693111

  11. Asymmetric histone modifications between the original and derived loci of human segmental duplications

    PubMed Central

    Zheng, Deyou

    2008-01-01

    Background Sequencing and annotation of several mammalian genomes have revealed that segmental duplications are a common architectural feature of primate genomes; in fact, about 5% of the human genome is composed of large blocks of interspersed segmental duplications. These segmental duplications have been implicated in genomic copy-number variation, gene novelty, and various genomic disorders. However, the molecular processes involved in the evolution and regulation of duplicated sequences remain largely unexplored. Results In this study, the profile of about 20 histone modifications within human segmental duplications was characterized using high-resolution, genome-wide data derived from a ChIP-Seq study. The analysis demonstrates that derivative loci of segmental duplications often differ significantly from the original with respect to many histone methylations. Further investigation showed that genes are present three times more frequently in the original than in the derivative, whereas pseudogenes exhibit the opposite trend. These asymmetries tend to increase with the age of segmental duplications. The uneven distribution of genes and pseudogenes does not, however, fully account for the asymmetry in the profile of histone modifications. Conclusion The first systematic analysis of histone modifications between segmental duplications demonstrates that two seemingly 'identical' genomic copies are distinct in their epigenomic properties. Results here suggest that local chromatin environments may be implicated in the discrimination of derived copies of segmental duplications from their originals, leading to a biased pseudogenization of the new duplicates. The data also indicate that further exploration of the interactions between histone modification and sequence degeneration is necessary in order to understand the divergence of duplicated sequences. PMID:18598352

  12. Detecting novel genes with sparse arrays

    PubMed Central

    Haiminen, Niina; Smit, Bart; Rautio, Jari; Vitikainen, Marika; Wiebe, Marilyn; Martinez, Diego; Chee, Christine; Kunkel, Joe; Sanchez, Charles; Nelson, Mary Anne; Pakula, Tiina; Saloheimo, Markku; Penttilä, Merja; Kivioja, Teemu

    2014-01-01

    Species-specific genes play an important role in defining the phenotype of an organism. However, current gene prediction methods can only efficiently find genes that share features such as sequence similarity or general sequence characteristics with previously known genes. Novel sequencing methods and tiling arrays can be used to find genes without prior information and they have demonstrated that novel genes can still be found from extensively studied model organisms. Unfortunately, these methods are expensive and thus are not easily applicable, e.g., to finding genes that are expressed only in very specific conditions. We demonstrate a method for finding novel genes with sparse arrays, applying it on the 33.9 Mb genome of the filamentous fungus Trichoderma reesei. Our computational method does not require normalisations between arrays and it takes into account the multiple-testing problem typical for analysis of microarray data. In contrast to tiling arrays, that use overlapping probes, only one 25mer microarray oligonucleotide probe was used for every 100 b. Thus, only relatively little space on a microarray slide was required to cover the intergenic regions of a genome. The analysis was done as a by-product of a conventional microarray experiment with no additional costs. We found at least 23 good candidates for novel transcripts that could code for proteins and all of which were expressed at high levels. Candidate genes were found to neighbour ire1 and cre1 and many other regulatory genes. Our simple, low-cost method can easily be applied to finding novel species-specific genes without prior knowledge of their sequence properties. PMID:20691772
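
    To make the sparse layout concrete, the sketch below places one 25-mer probe every 100 bases across a region and reports the fraction of bases that such a layout touches. Only the probe length and spacing come from the abstract; the coordinate convention, the rule that a probe must fit entirely inside the region, and the function names are assumptions for illustration.

```python
def sparse_probe_starts(region_start: int, region_end: int,
                        probe_len: int = 25, spacing: int = 100) -> list[int]:
    """Start positions of probes placed every `spacing` bases in [region_start, region_end).

    Coordinates are 0-based and half-open; a probe is kept only if it fits
    entirely inside the region.
    """
    return list(range(region_start, region_end - probe_len + 1, spacing))


def covered_fraction(probe_len: int = 25, spacing: int = 100) -> float:
    """Fraction of bases directly covered by probes in a long region."""
    return probe_len / spacing


starts = sparse_probe_starts(0, 1000)
print(len(starts), covered_fraction())  # prints "10 0.25"
```

    With these numbers a quarter of the bases are probed directly, which is why comparatively little space on the slide is needed.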

  13. A KCNH2 branch point mutation causing aberrant splicing contributes to an explanation of genotype-negative long QT syndrome.

    PubMed

    Crotti, Lia; Lewandowska, Marzena A; Schwartz, Peter J; Insolia, Roberto; Pedrazzini, Matteo; Bussani, Erica; Dagradi, Federica; George, Alfred L; Pagani, Franco

    2009-02-01

    Genetic screening of long QT syndrome (LQTS) fails to identify disease-causing mutations in about 30% of patients. So far, molecular screening has focused mainly on coding sequence mutations or on substitutions at canonical splice sites. The purpose of this study was to explore the possibility that intronic variants not at canonical splice sites might affect splicing regulatory elements, lead to aberrant transcripts, and cause LQTS. Molecular screening was performed through DHPLC and sequence analysis. The role of the intronic mutation identified was assessed with a hybrid minigene splicing assay. A three-generation LQTS family was investigated. Molecular screening failed to identify an obvious disease-causing mutation in the coding sequences of the major LQTS genes but revealed an intronic A-to-G substitution in KCNH2 (IVS9-28A/G) cosegregating with the clinical phenotype in family members. In vitro analysis proved that the mutation disrupts the acceptor splice site definition by affecting the branch point (BP) sequence and promoting intron retention. We further demonstrated a tight functional relationship between the BP and the polypyrimidine tract, whose weakness is responsible for the pathological effect of the IVS9-28A/G mutation. We identified a novel BP mutation in KCNH2 that disrupts the intron 9 acceptor splice site definition and causes LQT2. The present finding demonstrates that intronic mutations affecting pre-mRNA processing may contribute to the failure of traditional molecular screening in identifying disease-causing mutations in LQTS subjects and offers a rational strategy for the reduction of genotype-negative cases.

  14. Draft genome sequence and annotation of Lactobacillus acetotolerans BM-LA14527, a beer-spoilage bacteria.

    PubMed

    Liu, Junyan; Li, Lin; Peters, Brian M; Li, Bing; Deng, Yang; Xu, Zhenbo; Shirtliff, Mark E

    2016-09-01

    Lactobacillus acetotolerans is a hard-to-culture beer-spoilage bacterium capable of entering into the viable putative nonculturable (VPNC) state. As part of an initial strategy to investigate the phenotypic behavior of L. acetotolerans, draft genome sequencing was performed. Results demonstrated a total of 1824 predicted annotated genes, with several potential VPNC- and beer-spoilage-associated genes identified. Importantly, this is the first genome sequence of L. acetotolerans as beer-spoilage bacteria and it may aid in further analysis of L. acetotolerans and other beer-spoilage bacteria, with direct implications for food safety control in the beer brewing industry. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  15. Nucleotide sequence and transcriptional start site of the Methylobacterium organophilum XX methanol dehydrogenase structural gene

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Machlin, S.M.; Hanson, R.S.

    The nucleotide sequence of a cloned 2.5-kilobase-pair SmaI fragment containing the methanol dehydrogenase (MDH) structural gene from Methylobacterium organophilum XX was determined. A single open reading frame with a coding capacity of 626 amino acids (molecular weight, 66,000) was identified on one strand, and N-terminal sequencing of purified MDH revealed that 27 of these residues constituted a putative signal peptide. Primer extension mapping of in vivo transcripts indicated that the start of mRNA synthesis was 160 to 170 base pairs upstream of the ATG codon. Northern (RNA) blot analysis further demonstrated that the transcript was 2.1 kilobase pairs in length and therefore appeared to encode only MDH.

  16. Bioremediation potential of a highly mercury resistant bacterial strain Sphingobium SA2 isolated from contaminated soil.

    PubMed

    Mahbub, Khandaker Rayhan; Krishnan, Kannan; Megharaj, Mallavarapu; Naidu, Ravi

    2016-02-01

    A mercury resistant bacterial strain, SA2, was isolated from soil contaminated with mercury. The 16S rRNA gene sequence of this isolate showed 99% sequence similarity to the genera Sphingobium and Sphingomonas of the α-proteobacteria group. However, the isolate formed a distinct phyletic line with the genus Sphingobium, suggesting that the strain belongs to Sphingobium sp. Toxicity studies indicated resistance to high levels of mercury, with estimated EC50 values of 4.5 mg L(-1) and 44.15 mg L(-1) and MIC values of 5.1 mg L(-1) and 48.48 mg L(-1) in minimal and rich media, respectively. The strain SA2 was able to volatilize mercury by producing the mercuric reductase enzyme, which makes it a potential candidate for remediating mercury. ICP-QQQ-MS analysis of Hg supplemented culture solutions confirmed that almost 79% of the mercury in the culture suspension was volatilized in 6 h. A very small amount of mercury was observed to accumulate in cell pellets, which was also evident from ESEM-EDX analysis. The mercuric reductase gene merA was amplified and sequenced. The deduced amino acid sequence demonstrated sequence homology with the α-proteobacteria and Ascomycota groups. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. Bacterial Pathogens and Community Composition in Advanced Sewage Treatment Systems Revealed by Metagenomics Analysis Based on High-Throughput Sequencing

    PubMed Central

    Lu, Xin; Zhang, Xu-Xiang; Wang, Zhu; Huang, Kailong; Wang, Yuan; Liang, Weigang; Tan, Yunfei; Liu, Bo; Tang, Junying

    2015-01-01

    This study used 454 pyrosequencing, Illumina high-throughput sequencing and metagenomic analysis to investigate bacterial pathogens and their potential virulence in a sewage treatment plant (STP) applying both conventional and advanced treatment processes. Pyrosequencing and Illumina sequencing consistently demonstrated that the genus Arcobacter accounted for over 43.42% of the total abundance of potential pathogens in the STP. At the species level, the potential pathogens Arcobacter butzleri, Aeromonas hydrophila and Klebsiella pneumoniae dominated in raw sewage, which was also confirmed by quantitative real time PCR. Illumina sequencing also revealed the prevalence of various types of pathogenicity islands and virulence proteins in the STP. Most of the potential pathogens and virulence factors were eliminated in the STP, and the removal efficiency mainly depended on the oxidation ditch. Compared with sand filtration, magnetic resin appeared to remove more of most of the potential pathogens and virulence factors. However, the presence of residual A. butzleri in the final effluent still deserves attention. The findings indicate that sewage acts as an important source of environmental pathogens, but STPs can effectively control their spread in the environment. Joint use of the high-throughput sequencing technologies is considered a reliable method for a deep and comprehensive overview of environmental bacterial virulence. PMID:25938416

  18. [Discovery and whole genome sequences analysis of a dengue virus type 1 strain isolated in Henan province].

    PubMed

    Du, Yanhua; Zhang, Baifan; Li, Yi; Ma, Hongxia; Huang, Xueyong; Xu, Bianli

    2015-10-01

    To diagnose an imported dengue fever case in Henan province, to sequence and characterize the whole genome of the virus, and to explore its possible origin. A suspected dengue fever case was reported in Yuzhou city, Henan province. The patient had returned from Foshan, Guangdong province, on September 19, 2014. After epidemiological investigation and serum specimen collection, the case was confirmed as dengue fever in the laboratory, and the serum was inoculated onto Vero cells. The whole genome sequence was amplified with several primer pairs and characterized using biological analysis software. The imported case was diagnosed as a dengue virus serotype 1 infection, and a dengue 1 strain was successfully isolated on Vero cells. The whole genome was 10,670 nt long and belonged to genotype V of dengue virus serotype 1; no recombination event was found. Phylogenetic analysis demonstrated that the strain was close to Indian strains isolated in 2008-2011, with nucleotide sequence homology of 98.2%-99.4%. This was the first imported dengue serotype 1 case discovered in Henan province. However, because the patient had been in Guangdong province before onset, it was inferred that the Indian strain had been imported to Guangdong province before this case appeared in Henan province.

  19. Isolation and Identification of Pathogenicity Mutant of Curvularia lunata via Restriction Enzyme-Mediated Integration.

    PubMed

    Wang, Y J; Liu, T; Hou, J M; Zuo, Y H

    2013-09-01

    In this report, 156 hygromycin-resistant mutants were generated via restriction enzyme-mediated insertional (REMI) mutagenesis. All mutants were subjected to a bioassay on detached leaves. Five mutants (T4, T39, T71, T91, and T135) showed reduced symptom development, whereas one mutant (T120) did not exhibit any symptoms on the leaves compared with the wild type. The pathogenicity of these mutants was further assayed through the spray inoculation of whole seedlings. The results demonstrated that the pathogenicity of the T4, T39, T71, T91, and T135 mutants was reduced, whereas the T120 mutant lost its pathogenicity. Southern blot analysis revealed that the plasmids were inserted at different sites in the genome with different copy numbers. Flanking sequences of approximately 550, 860, and 150 bp were obtained from T7, T91, and T120, respectively, through plasmid rescue. Sequence analysis of the flanking sequences from T7 and T91 showed no homology to any known sequences in GenBank. The flanking sequence from the T120 mutant was highly homologous to MAPKK kinases from Cochliobolus heterostrophus, which regulate sexual/asexual development, melanization, and pathogenicity. These results indicate that REMI and plasmid rescue have great potential for finding pathogenicity genes.

  20. Implications of diadochokinesia in children with speech sound disorder.

    PubMed

    Wertzner, Haydée Fiszbein; Pagan-Neves, Luciana de Oliveira; Alves, Renata Ramos; Barrozo, Tatiane Faria

    2013-01-01

    To verify the performance of children with and without speech sound disorder in oral motor skills measured by oral diadochokinesia according to age and gender and to compare the results by two different methods of analysis. Participants were 72 subjects aged from 5 years to 7 years and 11 months divided into four subgroups according to the presence of speech sound disorder (Study Group and Control Group) and age (<6 years and 5 months and >6 years and 5 months). Diadochokinesia skills were assessed by the repetition of the sequences 'pa', 'ta', 'ka' and 'pataka' measured both manually and by the software Motor Speech Profile®. Gender was statistically different for both groups but it did not influence the number of sequences per second produced. Correlation between the number of sequences per second and age was observed for all sequences (except for 'ka') only for the control group children. Comparison between groups did not indicate differences between the number of sequences per second and age. Results presented strong agreement between the values of oral diadochokinesia measured manually and by MSP. This research demonstrated the importance of using different methods of analysis in the functional evaluation of oro-motor processing aspects of children with speech sound disorder and evidenced the oro-motor difficulties in children under eight years of age.

  1. Comparative sequence analysis of a region on human chromosome 13q14, frequently deleted in B-cell chronic lymphocytic leukemia, and its homologous region on mouse chromosome 14.

    PubMed

    Kapanadze, B; Makeeva, N; Corcoran, M; Jareborg, N; Hammarsund, M; Baranova, A; Zabarovsky, E; Vorontsova, O; Merup, M; Gahrton, G; Jansson, M; Yankovsky, N; Einhorn, S; Oscier, D; Grandér, D; Sangfelt, O

    2000-12-15

    Previous studies have indicated the presence of a putative tumor suppressor gene on human chromosome 13q14, commonly deleted in patients with B-cell chronic lymphocytic leukemia (B-CLL). We have recently identified a minimally deleted region encompassing parts of two adjacent genes, termed LEU1 and LEU2 (leukemia-associated genes 1 and 2), and several additional transcripts. In addition, 50 kb centromeric to this region we have identified another gene, LEU5/RFP2. To elucidate further the complex genomic organization of this region, we have identified, mapped, and sequenced the homologous region in the mouse. Fluorescence in situ hybridization analysis demonstrated that the region maps to mouse chromosome 14. The overall organization and gene order in this region were found to be highly conserved in the mouse. Sequence comparison between the human deletion hotspot region and its homologous mouse region revealed a high degree of sequence conservation with an overall score of 74%. However, our data also show that in terms of transcribed sequences, only two of those, human LEU2 and LEU5/RFP2, are clearly conserved, strengthening the case for these genes as putative candidate B-CLL tumor suppressor genes.

  2. A Simple Method for Visualization of Locus-Specific H4K20me1 Modifications in Living Caenorhabditis elegans Single Cells.

    PubMed

    Shinkai, Yoichi; Kuramochi, Masahiro; Doi, Motomichi

    2018-05-03

    Recently, advances in next-generation sequencing technologies have enabled genome-wide analyses of epigenetic modifications; however, it remains difficult to analyze the states of histone modifications at a single-cell resolution in living multicellular organisms because of the heterogeneity within cellular populations. Here we describe a simple method to visualize histone modifications on the specific sequence of target locus at a single-cell resolution in living Caenorhabditis elegans, by combining the LacO/LacI system and a genetically-encoded H4K20me1-specific probe, "mintbody". We demonstrate that Venus-labeled mintbody and mTurquoise2-labeled LacI can co-localize on an artificial chromosome carrying both the target locus and LacO sequences, where H4K20me1 marks the target locus. We demonstrate that our visualization method can precisely detect H4K20me1 depositions on the her-1 gene sequences on the artificial chromosome, to which the dosage compensation complex binds to regulate sex determination. The degree of H4K20me1 deposition on the her-1 sequences on the artificial chromosome correlated strongly with sex, suggesting that, using the artificial chromosome, this method can reflect context-dependent changes of H4K20me1 on endogenous genomes. Furthermore, we demonstrate live imaging of H4K20me1 depositions on the artificial chromosome. Combined with ChIP assays, this mintbody-LacO/LacI visualization method will enable analysis of developmental and context-dependent alterations of locus-specific histone modifications in specific cells and elucidation of the underlying molecular mechanisms. Copyright © 2018, G3: Genes, Genomes, Genetics.

  3. Phylo-mLogo: an interactive and hierarchical multiple-logo visualization tool for alignment of many sequences

    PubMed Central

    Shih, Arthur Chun-Chieh; Lee, DT; Peng, Chin-Lin; Wu, Yu-Wei

    2007-01-01

    Background When aligning several hundreds or thousands of sequences, such as epidemic virus sequences or homologous/orthologous sequences of some big gene families, to reconstruct the epidemiological history or their phylogenies, how to analyze and visualize the alignment results of many sequences has become a new challenge for computational biologists. Although there are several tools available for visualization of very long sequence alignments, few of them are applicable to the alignments of many sequences. Results A multiple-logo alignment visualization tool, called Phylo-mLogo, is presented in this paper. Phylo-mLogo calculates the variabilities and homogeneities of alignment sequences by base frequencies or entropies. Different from the traditional representations of sequence logos, Phylo-mLogo not only displays the global logo patterns of the whole alignment of multiple sequences, but also demonstrates their local homologous logos for each clade hierarchically. In addition, Phylo-mLogo also allows the user to focus only on the analysis of some important, structurally or functionally constrained sites in the alignment selected by the user or by built-in automatic calculation. Conclusion With Phylo-mLogo, the user can symbolically and hierarchically visualize hundreds of aligned sequences simultaneously and easily check the changes of their amino acid sites when analyzing many homologous/orthologous or influenza virus sequences. More information of Phylo-mLogo can be found at URL . PMID:17319966
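
    The abstract above says Phylo-mLogo scores the variability of alignment columns by base frequencies or entropies. As a rough illustration of that idea only (not Phylo-mLogo's own code), the sketch below computes the Shannon entropy of each column of a toy alignment. The function name `column_entropies` and the four toy sequences are assumptions made for this example, and gap characters, if present, are simply counted as another symbol.

```python
import math
from collections import Counter


def column_entropies(alignment):
    """Shannon entropy (in bits) of every column of an alignment.

    `alignment` is a list of equal-length strings. Hypothetical helper for
    illustration; gaps are treated like any other symbol.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    entropies = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        total = sum(counts.values())
        # H = sum over symbols of -p * log2(p); 0 for a fully conserved column.
        h = sum(-(n / total) * math.log2(n / total) for n in counts.values())
        entropies.append(h)
    return entropies


# Toy alignment (made up): columns 0-1 are conserved, columns 2-3 are split 50/50.
toy_alignment = ["ACGT", "ACGA", "ACTA", "ACTT"]
entropies = column_entropies(toy_alignment)
print(entropies)
```

    Conserved columns score 0 bits and a DNA column with all four bases equally frequent scores 2 bits, so sorting columns by this value is one simple way to pick out variable sites; computing it per clade rather than over the whole alignment would mirror the hierarchical view the abstract describes.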

  4. Identifying Novel Helix–Loop–Helix Genes in Caenorhabditis elegans through a Classroom Demonstration of Functional Genomics

    PubMed Central

    Griffin, Vernetta; McMiller, Tracee; Jones, Erika; Johnson, Casonya M.

    2003-01-01

    A 14-week, undergraduate-level Genetics and Population Biology course at Morgan State University was modified to include a demonstration of functional genomics in the research laboratory. Students performed a rudimentary sequence analysis of the Caenorhabditis elegans genome and further characterized three sequences that were predicted to encode helix–loop–helix proteins. Students then used reverse transcription–polymerase chain reaction to determine which of the three genes is normally expressed in C. elegans. At the end of this laboratory activity, students were 1) to demonstrate a rudimentary knowledge of bioinformatics, including the ability to differentiate between “having” a gene and “expressing” a gene, and 2) to understand basic approaches to functional genomics, including one specific technique for assaying for gene expression. It was also anticipated that students would increase their skills at effectively communicating their research activities through written and/or oral presentation. This article describes the laboratory activity and the assessment of the effectiveness of the activity. PMID:12822036

  5. Direct detection of RNA in vitro and in situ by target-primed RCA: The impact of E. coli RNase III on the detection efficiency of RNA sequences distanced far from the 3'-end.

    PubMed

    Merkiene, Egle; Gaidamaviciute, Edita; Riauba, Laurynas; Janulaitis, Arvydas; Lagunavicius, Arunas

    2010-08-01

    We improved the target RNA-primed RCA technique for direct detection and analysis of RNA in vitro and in situ. Previously we showed that the 3'→5' single-stranded RNA exonucleolytic activity of Phi29 DNA polymerase converts the target RNA into a primer and uses it for RCA initiation. However, in some cases, the single-stranded RNA exoribonucleolytic activity of the polymerase is hindered by strong double-stranded structures at the 3'-end of target RNAs. We demonstrate that in such hampered cases, the double-stranded RNA-specific Escherichia coli RNase III efficiently assists Phi29 DNA polymerase in converting the target RNA into a primer. These observations extend the target RNA-primed RCA possibilities to test RNA sequences distanced far from the 3'-end and customize this technique for the inner RNA sequence analysis.

  6. Analysis of protein-coding genetic variation in 60,706 humans.

    PubMed

    Lek, Monkol; Karczewski, Konrad J; Minikel, Eric V; Samocha, Kaitlin E; Banks, Eric; Fennell, Timothy; O'Donnell-Luria, Anne H; Ware, James S; Hill, Andrew J; Cummings, Beryl B; Tukiainen, Taru; Birnbaum, Daniel P; Kosmicki, Jack A; Duncan, Laramie E; Estrada, Karol; Zhao, Fengmei; Zou, James; Pierce-Hoffman, Emma; Berghout, Joanne; Cooper, David N; Deflaux, Nicole; DePristo, Mark; Do, Ron; Flannick, Jason; Fromer, Menachem; Gauthier, Laura; Goldstein, Jackie; Gupta, Namrata; Howrigan, Daniel; Kiezun, Adam; Kurki, Mitja I; Moonshine, Ami Levy; Natarajan, Pradeep; Orozco, Lorena; Peloso, Gina M; Poplin, Ryan; Rivas, Manuel A; Ruano-Rubio, Valentin; Rose, Samuel A; Ruderfer, Douglas M; Shakir, Khalid; Stenson, Peter D; Stevens, Christine; Thomas, Brett P; Tiao, Grace; Tusie-Luna, Maria T; Weisburd, Ben; Won, Hong-Hee; Yu, Dongmei; Altshuler, David M; Ardissino, Diego; Boehnke, Michael; Danesh, John; Donnelly, Stacey; Elosua, Roberto; Florez, Jose C; Gabriel, Stacey B; Getz, Gad; Glatt, Stephen J; Hultman, Christina M; Kathiresan, Sekar; Laakso, Markku; McCarroll, Steven; McCarthy, Mark I; McGovern, Dermot; McPherson, Ruth; Neale, Benjamin M; Palotie, Aarno; Purcell, Shaun M; Saleheen, Danish; Scharf, Jeremiah M; Sklar, Pamela; Sullivan, Patrick F; Tuomilehto, Jaakko; Tsuang, Ming T; Watkins, Hugh C; Wilson, James G; Daly, Mark J; MacArthur, Daniel G

    2016-08-18

    Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

  7. Analysis of an "off-ladder" allele at the Penta D short tandem repeat locus.

    PubMed

    Yang, Y L; Wang, J G; Wang, D X; Zhang, W Y; Liu, X J; Cao, J; Yang, S L

    2015-11-25

    Kinship testing of a father and his son from Guangxi, China, the location of the Zhuang minority people, was performed using the PowerPlex® 18D System with a short tandem repeat typing kit. The results indicated that both the father and his son had an off-ladder allele at the Penta D locus, with a genetic size larger than that of the maximal standard allelic ladder. To further identify this locus, monogenic amplification, gene cloning, and genetic sequencing were performed. Sequencing analysis demonstrated that the fragment size of the Penta D-OL locus was 469 bp and the core sequence was [AAAGA]21, also called Penta D-21. The rare Penta D-21 allele was found to be distributed among the Zhuang population from the Guangxi Zhuang Autonomous Region of China; therefore, this study improved the range of DNA data available for this locus and enhanced our ability for individual identification of gene loci.
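
    The allele in the abstract above is named after its repeat count: a core of 21 consecutive AAAGA units gives Penta D-21. A minimal sketch of how such a count could be read off a sequenced fragment follows. The helper `longest_repeat_run` and the flanking bases in the toy fragment are invented for illustration; the real 469 bp amplicon sequence is not given in the abstract.

```python
import re


def longest_repeat_run(sequence, motif="AAAGA"):
    """Largest number of back-to-back copies of `motif` found in `sequence`.

    A lookahead is used so that runs starting at every position are checked,
    including runs that overlap an earlier, shorter match.
    """
    pattern = re.compile(f"(?=((?:{re.escape(motif)})+))")
    runs = [len(m.group(1)) // len(motif) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


# Toy fragment: made-up flanks around 21 copies of the Penta D core unit.
toy_fragment = "GGTC" + "AAAGA" * 21 + "TTCA"
repeat_count = longest_repeat_run(toy_fragment)
print(f"Penta D-{repeat_count}")
```

    The lookahead matters because AAAGA can overlap itself by one base, so a plain left-to-right scan could lock onto a short run and miss a longer one that starts inside it.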

  8. Weissella ghanensis sp. nov., isolated from a Ghanaian cocoa fermentation.

    PubMed

    De Bruyne, Katrien; Camu, Nicholas; Lefebvre, Karen; De Vuyst, Luc; Vandamme, Peter

    2008-12-01

    During a study on lactic acid bacteria (and their species diversity) in spontaneous heap fermentations of Ghanaian cocoa beans, two strains, designated 215(T) and 194B, were isolated. A phylogenetic analysis based on 16S rRNA gene sequences demonstrated that these strains represented a distinct lineage close to the genus Weissella and showing only 92.1 % 16S rRNA gene sequence similarity with respect to their closest neighbour, Weissella soli LMG 20113(T). Whole-cell protein electrophoresis, fluorescent amplified fragment length polymorphism fingerprinting of whole genomes and physiological and biochemical tests confirmed the unique taxonomic position of the two novel isolates. On the basis of the results of the morphological and biochemical tests and 16S rRNA gene sequence analysis, strains 215(T) and 194B represent the most peripheral lineage of the genus Weissella, for which we propose the name Weissella ghanensis sp. nov. The type strain is 215(T) (=LMG 24286(T)=DSM 19935(T)).

  9. Calibrating genomic and allelic coverage bias in single-cell sequencing.

    PubMed

    Zhang, Cheng-Zhong; Adalsteinsson, Viktor A; Francis, Joshua; Cornils, Hauke; Jung, Joonil; Maire, Cecile; Ligon, Keith L; Meyerson, Matthew; Love, J Christopher

    2015-04-16

    Artifacts introduced in whole-genome amplification (WGA) make it difficult to derive accurate genomic information from single-cell genomes and require different analytical strategies from bulk genome analysis. Here, we describe statistical methods to quantitatively assess the amplification bias resulting from whole-genome amplification of single-cell genomic DNA. Analysis of single-cell DNA libraries generated by different technologies revealed universal features of the genome coverage bias predominantly generated at the amplicon level (1-10 kb). The magnitude of coverage bias can be accurately calibrated from low-pass sequencing (~0.1×) to predict the depth-of-coverage yield of single-cell DNA libraries sequenced at arbitrary depths. We further provide a benchmark comparison of single-cell libraries generated by multi-strand displacement amplification (MDA) and multiple annealing and looping-based amplification cycles (MALBAC). Finally, we develop statistical models to calibrate allelic bias in single-cell whole-genome amplification and demonstrate a census-based strategy for efficient and accurate variant detection from low-input biopsy samples.

  10. Calibrating genomic and allelic coverage bias in single-cell sequencing

    PubMed Central

    Francis, Joshua; Cornils, Hauke; Jung, Joonil; Maire, Cecile; Ligon, Keith L.; Meyerson, Matthew; Love, J. Christopher

    2016-01-01

    Artifacts introduced in whole-genome amplification (WGA) make it difficult to derive accurate genomic information from single-cell genomes and require different analytical strategies from bulk genome analysis. Here, we describe statistical methods to quantitatively assess the amplification bias resulting from whole-genome amplification of single-cell genomic DNA. Analysis of single-cell DNA libraries generated by different technologies revealed universal features of the genome coverage bias predominantly generated at the amplicon level (1–10 kb). The magnitude of coverage bias can be accurately calibrated from low-pass sequencing (~0.1 ×) to predict the depth-of-coverage yield of single-cell DNA libraries sequenced at arbitrary depths. We further provide a benchmark comparison of single-cell libraries generated by multi-strand displacement amplification (MDA) and multiple annealing and looping-based amplification cycles (MALBAC). Finally, we develop statistical models to calibrate allelic bias in single-cell whole-genome amplification and demonstrate a census-based strategy for efficient and accurate variant detection from low-input biopsy samples. PMID:25879913

  11. Breath-held MR Cholangiopancreatography (MRCP) using a 3D Dixon fat–water separated balanced steady state free precession sequence

    PubMed Central

    Glockner, James F.; Saranathan, Manojkumar; Bayram, Ersin; Lee, Christine U.

    2014-01-01

    A novel 3D breath-held Dixon fat–water separated balanced steady state free precession (b-SSFP) sequence for MR cholangiopancreatography (MRCP) is described and its potential clinical utility assessed in a series of patients. The main motivation is to develop a robust breath-held alternative to the respiratory gated 3D Fast Spin Echo (FSE) sequence, the current clinical sequence of choice for MRCP. Respiratory gated acquisitions are susceptible to motion artifacts and blurring in patients with significant diaphragmatic drift, erratic respiratory rhythms or sleep apnea. A two point Dixon fat–water separation scheme was developed which eliminates signal loss arising from B0 inhomogeneity effects and minimizes artifacts from perturbation of the b-SSFP steady state. Preliminary results from qualitative analysis of 49 patients demonstrate robust performance of the 3D Dixon b-SSFP sequence with diagnostic image quality acquired in a 20–24 s breath-hold. PMID:23876262

  12. Identification of full-length proviral DNA of porcine endogenous retrovirus from Chinese Wuzhishan miniature pigs inbred.

    PubMed

    Ma, Yuyuan; Lv, Maomin; Xu, Shu; Wu, Jianmin; Tian, Kegong; Zhang, Jingang

    2010-07-01

    The existence of porcine endogenous retrovirus (PERV) hinders the use of pigs in clinical xenotransplantation to alleviate the shortage of human transplants. Chinese miniature pigs are potential organ donors for xenotransplantation in China. However, so far, an adequate level of information on the molecular characteristics of PERV from Chinese miniature pigs has not been available. We describe here the cloning and characterization of full-length proviral DNA of PERV from inbred Chinese Wuzhishan miniature pigs (WZSP). Full-length nucleotide sequences of PERV-WZSP and other PERVs were aligned and a phylogenetic tree was constructed from the deduced amino-acid sequences of env. The results demonstrated that the full-length proviral DNA of PERV-WZSP belongs to the gammaretroviruses and shares high similarity with other PERVs. Sequence analysis also suggested that different patterns of LTR existed in the same porcine germ line and that a partial PERV-C sequence may recombine with the PERV-A sequence in the LTR. (c) 2008 Elsevier Ltd. All rights reserved.

  13. Self-Organizing Hidden Markov Model Map (SOHMMM).

    PubMed

    Ferles, Christos; Stafylopatis, Andreas

    2013-12-01

    A hybrid approach combining the Self-Organizing Map (SOM) and the Hidden Markov Model (HMM) is presented. The Self-Organizing Hidden Markov Model Map (SOHMMM) establishes a cross-section between the theoretic foundations and algorithmic realizations of its constituents. The respective architectures and learning methodologies are fused in an attempt to meet the increasing requirements imposed by the properties of deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and protein chain molecules. The fusion and synergy of the SOM unsupervised training and the HMM dynamic programming algorithms bring forth a novel on-line gradient descent unsupervised learning algorithm, which is fully integrated into the SOHMMM. Since the SOHMMM carries out probabilistic sequence analysis with little or no prior knowledge, it can have a variety of applications in clustering, dimensionality reduction and visualization of large-scale sequence spaces, and also, in sequence discrimination, search and classification. Two series of experiments based on artificial sequence data and splice junction gene sequences demonstrate the SOHMMM's characteristics and capabilities. Copyright © 2013 Elsevier Ltd. All rights reserved.
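
    The abstract above refers to the dynamic programming algorithms of hidden Markov models without spelling them out. For orientation, here is a minimal sketch of the standard forward algorithm, which gives the probability of a sequence under a single HMM. It is not the SOHMMM learning algorithm, and the two-state model ("AT_rich", "GC_rich") with its probabilities is a toy assumption chosen only so the example runs.

```python
def forward_likelihood(obs, states, start_p, trans_p, emit_p):
    """Probability of the observation string `obs` under an HMM (forward algorithm)."""
    if not obs:
        raise ValueError("observation sequence must not be empty")
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Toy two-state DNA model; every number below is an illustrative assumption.
states = ("AT_rich", "GC_rich")
start_p = {"AT_rich": 0.5, "GC_rich": 0.5}
trans_p = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.2, "GC_rich": 0.8},
}
emit_p = {
    "AT_rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "GC_rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}

print(forward_likelihood("ATTA", states, start_p, trans_p, emit_p))
print(forward_likelihood("GCGC", states, start_p, trans_p, emit_p))
```

    For long sequences the products underflow, so practical implementations work with scaled values or log-probabilities; that refinement is left out here to keep the recursion visible.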

  14. Emergence of spike correlations in periodically forced excitable systems

    NASA Astrophysics Data System (ADS)

    Reinoso, José A.; Torrent, M. C.; Masoller, Cristina

    2016-09-01

    In sensory neurons the presence of noise can facilitate the detection of weak information-carrying signals, which are encoded and transmitted via correlated sequences of spikes. Here we investigate the relative temporal order in spike sequences induced by a subthreshold periodic input in the presence of white Gaussian noise. To simulate the spikes, we use the FitzHugh-Nagumo model and to investigate the output sequence of interspike intervals (ISIs), we use the symbolic method of ordinal analysis. We find different types of relative temporal order in the form of preferred ordinal patterns that depend on both the strength of the noise and the period of the input signal. We also demonstrate a resonancelike behavior, as certain periods and noise levels enhance temporal ordering in the ISI sequence, maximizing the probability of the preferred patterns. Our findings could be relevant for understanding the mechanisms underlying temporal coding, by which single sensory neurons represent in spike sequences the information about weak periodic stimuli.
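
    The ordinal analysis mentioned in the abstract above replaces each short window of interspike intervals by the rank order of its values and then looks at how often each order occurs. The sketch below shows that counting step for windows of three ISIs. The function name, the tie-breaking rule (earlier position first), and the ISI values are assumptions for illustration, not data or code from the study.

```python
from collections import Counter
from itertools import permutations


def ordinal_pattern_probabilities(values, order=3):
    """Relative frequency of each ordinal pattern of length `order` in `values`."""
    counts = Counter()
    for i in range(len(values) - order + 1):
        window = values[i:i + order]
        # Pattern = window positions listed from smallest to largest value
        # (sorted() is stable, so ties keep their original order).
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        counts[pattern] += 1
    total = sum(counts.values())
    return {
        p: (counts[p] / total if total else 0.0)
        for p in permutations(range(order))
    }


# Made-up interspike intervals (arbitrary time units).
isis = [1.0, 1.4, 0.9, 1.2, 1.6, 1.1, 1.3]
probs = ordinal_pattern_probabilities(isis)
for pattern, p in sorted(probs.items()):
    print(pattern, round(p, 2))
```

    For an uncorrelated sequence all six patterns of length three are equally likely (probability 1/6 each), so patterns that occur clearly more or less often than that are the "preferred" or "infrequent" patterns that signal temporal order.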

  15. MIG-seq: an effective PCR-based method for genome-wide single-nucleotide polymorphism genotyping using the next-generation sequencing platform

    PubMed Central

    Suyama, Yoshihisa; Matsuki, Yu

    2015-01-01

    Restriction-enzyme (RE)-based next-generation sequencing methods have revolutionized marker-assisted genetic studies; however, the use of REs has limited their widespread adoption, especially in field samples with low-quality DNA and/or small quantities of DNA. Here, we developed a PCR-based procedure to construct reduced representation libraries without RE digestion steps, representing de novo single-nucleotide polymorphism discovery, and its genotyping using next-generation sequencing. Using multiplexed inter-simple sequence repeat (ISSR) primers, thousands of genome-wide regions were amplified effectively from a wide variety of genomes, without prior genetic information. We demonstrated: 1) Mendelian gametic segregation of the discovered variants; 2) reproducibility of genotyping by checking its applicability for individual identification; and 3) applicability in a wide variety of species by checking standard population genetic analysis. This approach, called multiplexed ISSR genotyping by sequencing, should be applicable to many marker-assisted genetic studies with a wide range of DNA qualities and quantities. PMID:26593239

  16. Characterization of rabbit limbal epithelial side population cells using RNA sequencing and single-cell qRT-PCR.

    PubMed

    Kameishi, Sumako; Umemoto, Terumasa; Matsuzaki, Yu; Fujita, Masako; Okano, Teruo; Kato, Takashi; Yamato, Masayuki

    2016-05-06

    Corneal epithelial stem cells reside in the limbus, a transitional zone between the cornea and conjunctiva, and are essential for maintaining homeostasis in the corneal epithelium. Although our previous studies demonstrated that rabbit limbal epithelial side population (SP) cells exhibit stem cell-like phenotypes with Hoechst 33342 staining, the different characteristics and/or populations of these cells remain unclear. Therefore, in this study, we determined the gene expression profiles of limbal epithelial SP cells by RNA sequencing, using as mapping references not only existing public databases but also contigs created by de novo transcriptome assembly. Our transcriptome data indicated that limbal epithelial SP cells exhibited a stem cell-like phenotype compared with non-SP cells. Importantly, gene ontology analysis following RNA sequencing demonstrated that limbal epithelial SP cells exhibited significantly enhanced expression of mesenchymal/endothelial cell markers rather than epithelial cell markers. Furthermore, single-cell quantitative real-time reverse transcription polymerase chain reaction (qRT-PCR) demonstrated that the limbal epithelial SP population consisted of at least two immature cell populations with endothelial- or mesenchymal-like phenotypes. Therefore, our present results may suggest the presence of a novel population of corneal epithelial stem cells distinct from conventional epithelial stem cells. Copyright © 2015 Elsevier Inc. All rights reserved.

  17. De novo sequencing and characterization of floral transcriptome in two species of buckwheat (Fagopyrum)

    PubMed Central

    2011-01-01

    Background: Transcriptome sequencing data has become an integral component of modern genetics, genomics and evolutionary biology. However, despite advances in the technologies of DNA sequencing, such data are lacking for many groups of living organisms, in particular, many plant taxa. We present here the results of transcriptome sequencing for two closely related plant species. These species, Fagopyrum esculentum and F. tataricum, belong to the order Caryophyllales, a large group of flowering plants with uncertain evolutionary relationships. F. esculentum (common buckwheat) is also an important food crop. Despite these practical and evolutionary considerations, Fagopyrum species have not been the subject of large-scale sequencing projects. Results: Normalized cDNA corresponding to genes expressed in flowers and inflorescences of F. esculentum and F. tataricum was sequenced using the 454 pyrosequencing technology. This resulted in 267 thousand (F. esculentum) and 229 thousand (F. tataricum) reads with an average length of 341-349 nucleotides. De novo assembly of the reads produced about 25 thousand contigs for each species, with 7.5-8.2× coverage. Comparative analysis of the two transcriptomes demonstrated their overall similarity but also revealed genes that are presumably differentially expressed. Among them are retrotransposon genes and genes involved in sugar biosynthesis and metabolism. Thirteen single-copy genes were used for phylogenetic analysis; the resulting trees are largely consistent with those inferred from multigenic plastid datasets. The sister relationship of the Caryophyllales and asterids has now gained high support from nuclear gene sequences. Conclusions: 454 transcriptome sequencing and de novo assembly were performed for two congeneric flowering plant species, F. esculentum and F. tataricum. As a result, a large set of cDNA sequences that represent orthologs of known plant genes as well as potential new genes was generated. PMID:21232141

  18. A novel peptide from the ACEI/BPP-CNP precursor in the venom of Crotalus durissus collilineatus.

    PubMed

    Higuchi, Shigesada; Murayama, Nobuhiro; Saguchi, Ken-ichi; Ohi, Hiroaki; Fujita, Yoshiaki; da Silva, Nelson Jorge; de Siqueira, Rodrigo José Bezerra; Lahlou, Saad; Aird, Steven D

    2006-10-01

    In crotaline venoms, angiotensin-converting enzyme inhibitors [ACEIs, also known as bradykinin potentiating peptides (BPPs)], are products of a gene coding for an ACEI/BPP-C-type natriuretic peptide (CNP) precursor. In the genes from Bothrops jararaca and Gloydius blomhoffii, ACEI/BPP sequences are repeated. Sequencing of a cDNA clone from venom glands of Crotalus durissus collilineatus showed that two ACEIs/BPPs are located together at the N-terminus, but without repeats. An additional sequence for CNP was unexpectedly found at the C-terminus. Homologous genes for the ACEI/BPP-CNP precursor suggest that most crotaline venoms contain both ACEIs/BPPs and CNP. The sequence of ACEIs/BPPs is separated from the CNP sequence by a long spacer sequence. Previously, there was no evidence that this spacer actually coded any expressed peptides. Aird and Kaiser (1986, unpublished) previously isolated and sequenced a peptide of 11 residues (TPPAGPDVGPR) from Crotalus viridis viridis venom. In the present study, analysis of the cDNA clone from C. d. collilineatus revealed a nearly identical sequence in the ACEI/BPP-CNP spacer. Fractionation of the crude venom by reverse phase HPLC (C(18)), and analysis of the fractions by mass spectrometry (MS) indicated a component of 1020.5 Da. Amino acid sequencing by MS/MS confirmed that C. d. collilineatus venom contains the peptide TPPAGPDGGPR. Its high proline content and paired proline residues are typical of venom hypotensive peptides, although it lacks the usual N-terminal pyroglutamate. It has no demonstrable hypotensive activity when injected intravenously in rats; however, its occurrence in the venoms of dissimilar species suggests that its presence is not accidental. Evidence suggests that these novel toxins probably activate anaphylatoxin C3a receptors.
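
    The mass reported above can be cross-checked against the stated sequence with a short calculation. The sketch below is illustrative only: it assumes an unmodified peptide with free termini and uses standard monoisotopic residue masses, and the abstract does not say whether 1020.5 Da is a neutral or monoisotopic value. The function name peptide_mass is hypothetical.

```python
# Standard monoisotopic residue masses (Da) for the residues that occur
# in the two peptides quoted in the abstract.
RESIDUE_MASS = {
    "A": 71.03711, "D": 115.02694, "G": 57.02146,
    "P": 97.05276, "R": 156.10111, "T": 101.04768, "V": 99.06841,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified linear peptide (assumption)."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


# Peptide found in C. d. collilineatus venom (from the abstract).
mass_cdc = peptide_mass("TPPAGPDGGPR")
# Peptide isolated earlier from C. v. viridis venom (from the abstract).
mass_cvv = peptide_mass("TPPAGPDVGPR")

print(f"TPPAGPDGGPR: {mass_cdc:.2f} Da")
print(f"TPPAGPDVGPR: {mass_cvv:.2f} Da")
```

    Under these assumptions the first value lands close to the 1020.5 Da component described in the abstract, and the two peptides differ only by the mass of a Gly-to-Val substitution.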

  19. Nucleotide sequence analysis of the recA gene and discrimination of the three isolates of urease-positive thermophilic Campylobacter (UPTC) isolated from seagulls (Larus spp.) in Northern Ireland.

    PubMed

    Matsuda, M; Tai, K; Moore, J E; Millar, B C; Murayama, O

    2004-01-01

    Nucleotide sequencing after TA cloning of the amplicon of the almost-full length recA gene from three strains of UPTC (A1, A2, and A3) isolated from seagulls in Northern Ireland, the phenotypical and genotypical characteristics of which have been demonstrated to be indistinguishable, clarified nucleotide differences at three nucleotide positions among the three strains. In conclusion, the nucleotide sequences of the recA gene were found to discriminate among the three strains of UPTC, A1, A2, and A3, which are indistinguishable phenotypically and genotypically. Thus, the present study strongly suggests that nucleotide sequence data of the amplicon of a suitable gene or region could aid in discriminating among isolates of the UPTC group, which are indistinguishable phenotypically and genotypically. Copyright 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

  20. Phase 2 of the array automated assembly task for the low cost solar array project

    NASA Technical Reports Server (NTRS)

    Campbell, R. B.; Davis, J. R.; Ostroski, J. W.; Rai-Choudhury, P.; Rohatgi, A.; Seman, E. J.; Stapleton, R. E.

    1979-01-01

    The process sequence for the fabrication of dendritic web silicon into solar panels was modified to include aluminum back surface field formation. Plasma etching was found to be a feasible technique for pre-diffusion cleaning of the web. Several contacting systems were studied. The total plated Pd-Ni system was not compatible with the process sequence; however, the evaporated TiPd-electroplated Cu system was shown stable under life testing. Ultrasonic bonding parameters were determined for various interconnect and contact metals but the yield of the process was not sufficiently high to use for module fabrication at this time. Over 400 solar cells were fabricated according to the modified sequence. No sub-process incompatibility was seen. These cells were used to fabricate four demonstration modules. A cost analysis of the modified process sequence resulted in a selling price of $0.75/peak watt.

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kurilla, M.G.; Stone, H.O.; Keene, J.D.

    The 3' end of the genomic RNA of Newcastle disease virus (NDV) has been sequenced and the leader RNA defined. Using hybridization to a 3'-end-labeled genome, leader RNA species from in vitro transcription reactions and from infected cell extracts were found to be 47 and 53 nucleotides long. In addition, the start site of the 3'-proximal mRNA was determined by sequence analysis of in vitro (beta-32P)GTP-labeled transcription products. The genomic sequence extending beyond the leader region demonstrated an open reading frame for at least 42 amino acids and probably represents the amino terminus of the nucleocapsid protein (NP). The terminal 8 nucleotides of the NDV genome were identical to those of measles virus and Sendai virus while the sequence of the distal half of the leader region was more similar to that of vesicular stomatitis virus. These data argue for strong evolutionary relatedness between the paramyxovirus and rhabdovirus groups.

  2. Helicos BioSciences.

    PubMed

    Milos, Patrice

    2008-04-01

    Helicos BioSciences Corporation is a life sciences company developing revolutionary new single molecule sequencing technology to provide the path to the US$1000 genome. True Single Molecule Sequencing (tSMS) will drive advancements in pharmacogenomics that can enable a better understanding of an individual's susceptibility to disease, develop more effective disease diagnoses and differentiate response to disease therapies. During 2007, genome-wide disease-association studies, the encyclopedia of DNA elements (ENCODE) and the published genome sequence of two individuals have revealed human genome variation far more extensive than originally believed. These also demonstrated that common variations explain only a fraction of the genetic basis of disease. Therefore, the capability to understand an individual genome is critical in setting the foundation for the next great revolution in healthcare. Helicos is committed to this vision and will provide cost-effective genome sequencing and comprehensive analysis of the transcribed genome that can unlock the era of personalized healthcare.

  3. On Statistical Modeling of Sequencing Noise in High Depth Data to Assess Tumor Evolution

    NASA Astrophysics Data System (ADS)

    Rabadan, Raul; Bhanot, Gyan; Marsilio, Sonia; Chiorazzi, Nicholas; Pasqualucci, Laura; Khiabanian, Hossein

    2018-07-01

    One cause of cancer mortality is tumor evolution to therapy-resistant disease. First line therapy often targets the dominant clone, and drug resistance can emerge from preexisting clones that gain fitness through therapy-induced natural selection. Such mutations may be identified using targeted sequencing assays by analysis of noise in high-depth data. Here, we develop a comprehensive, unbiased model for sequencing error background. We find that noise in sufficiently deep DNA sequencing data can be approximated by aggregating negative binomial distributions. Mutations with frequencies above noise may have prognostic value. We evaluate our model with simulated exponentially expanded populations as well as data from cell line and patient sample dilution experiments, demonstrating its utility in prognosticating tumor progression. Our results may have the potential to identify significant mutations that can cause recurrence. These results are relevant in the pretreatment clinical setting to determine appropriate therapy and prepare for potential recurrence pretreatment.
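
    The idea of calling a variant only when its read count exceeds the sequencing-noise background can be sketched as a tail-probability test. The code below is a simplified illustration, not the authors' model: it uses a single negative binomial distribution rather than an aggregate of several, and the error rate, dispersion parameter r, and threshold alpha are hypothetical inputs that would have to be estimated from control data.

```python
import math


def nb_pmf(k, r, p):
    """Negative binomial pmf: probability of k counts given size r and success prob p."""
    log_coeff = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coeff + r * math.log(p) + k * math.log1p(-p))


def nb_tail(k, mean, r):
    """P(X >= k) for a negative binomial with the given mean and size r."""
    p = r / (r + mean)  # mean-dispersion parameterization
    return 1.0 - sum(nb_pmf(i, r, p) for i in range(k))


def above_noise(alt_reads, depth, error_rate, r, alpha=1e-3):
    """True if the variant read count is unlikely under the noise model.

    error_rate, r and alpha are illustrative assumptions, not values
    taken from the abstract.
    """
    expected_noise = depth * error_rate
    return nb_tail(alt_reads, expected_noise, r) < alpha
```

    For example, above_noise(50, 10000, 0.001, 5) asks whether 50 variant reads at 10,000x depth are plausible when about 10 noise reads are expected; a real analysis would fit the noise parameters per site or per substitution type.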

  4. Sequencing of the large dsDNA genome of Oryctes rhinoceros nudivirus using multiple displacement amplification of nanogram amounts of virus DNA.

    PubMed

    Wang, Yongjie; Kleespies, Regina G; Ramle, Moslim B; Jehle, Johannes A

    2008-09-01

    The genomic sequence analysis of many large dsDNA viruses is hampered by the lack of enough sample materials. Here, we report a whole genome amplification of the Oryctes rhinoceros nudivirus (OrNV) isolate Ma07 starting from as few as about 10 ng of purified viral DNA by application of phi29 DNA polymerase- and exonuclease-resistant random hexamer-based multiple displacement amplification (MDA) method. About 60 microg of high molecular weight DNA with fragment sizes of up to 25 kbp was amplified. A genomic DNA clone library was generated using the product DNA. After 8-fold sequencing coverage, the 127,615 bp of OrNV whole genome was sequenced successfully. The results demonstrate that the MDA-based whole genome amplification enables rapid access to genomic information from exiguous virus samples.

  5. On Statistical Modeling of Sequencing Noise in High Depth Data to Assess Tumor Evolution

    NASA Astrophysics Data System (ADS)

    Rabadan, Raul; Bhanot, Gyan; Marsilio, Sonia; Chiorazzi, Nicholas; Pasqualucci, Laura; Khiabanian, Hossein

    2017-12-01

    One cause of cancer mortality is tumor evolution to therapy-resistant disease. First line therapy often targets the dominant clone, and drug resistance can emerge from preexisting clones that gain fitness through therapy-induced natural selection. Such mutations may be identified using targeted sequencing assays by analysis of noise in high-depth data. Here, we develop a comprehensive, unbiased model for sequencing error background. We find that noise in sufficiently deep DNA sequencing data can be approximated by aggregating negative binomial distributions. Mutations with frequencies above noise may have prognostic value. We evaluate our model with simulated exponentially expanded populations as well as data from cell line and patient sample dilution experiments, demonstrating its utility in prognosticating tumor progression. Our results may have the potential to identify significant mutations that can cause recurrence. These results are relevant in the pretreatment clinical setting to determine appropriate therapy and prepare for potential recurrence pretreatment.

  6. Lactobacillus futsaii sp. nov., isolated from fu-tsai and suan-tsai, traditional Taiwanese fermented mustard products.

    PubMed

    Chao, Shiou-Huei; Kudo, Yuko; Tsai, Ying-Chieh; Watanabe, Koichi

    2012-03-01

    Three Gram-stain-positive strains were isolated from fermented mustard and were rod-shaped, non-motile, asporogenous, facultatively anaerobic, homofermentative and did not exhibit catalase activity. Comparative analyses of 16S rRNA, pheS and rpoA gene sequences demonstrated that the novel strains were members of the genus Lactobacillus. On the basis of 16S rRNA gene sequence analysis, the type strains of Lactobacillus crustorum (98.7% similarity), Lactobacillus farciminis (98.9%) and Lactobacillus mindensis (97.9%) were the closest neighbours. However, DNA-DNA reassociation values with these strains were less than 50%. Phenotypic and genotypic features demonstrated that these isolates represent a novel species of the genus Lactobacillus, for which the name Lactobacillus futsaii sp. nov. is proposed; the type strain is YM 0097(T) (=JCM 17355(T)=BCRC 80278(T)).

  7. Application of Broad-Spectrum Resequencing Microarray for Genotyping Rhabdoviruses

    PubMed Central

    Dacheux, Laurent; Berthet, Nicolas; Dissard, Gabriel; Holmes, Edward C.; Delmas, Olivier; Larrous, Florence; Guigon, Ghislaine; Dickinson, Philip; Faye, Ousmane; Sall, Amadou A.; Old, Iain G.; Kong, Katherine; Kennedy, Giulia C.; Manuguerra, Jean-Claude; Cole, Stewart T.; Caro, Valérie; Gessain, Antoine; Bourhy, Hervé

    2010-01-01

    The rapid and accurate identification of pathogens is critical in the control of infectious disease. To this end, we analyzed the capacity for viral detection and identification of a newly described high-density resequencing microarray (RMA), termed PathogenID, which was designed for multiple pathogen detection using database similarity searching. We focused on one of the largest and most diverse viral families described to date, the family Rhabdoviridae. We demonstrate that this approach has the potential to identify both known and related viruses for which precise sequence information is unavailable. In particular, we demonstrate that a strategy based on consensus sequence determination for analysis of RMA output data enabled successful detection of viruses exhibiting up to 26% nucleotide divergence with the closest sequence tiled on the array. Using clinical specimens obtained from rabid patients and animals, this method also shows a high species level concordance with standard reference assays, indicating that it is amenable for the development of diagnostic assays. Finally, 12 animal rhabdoviruses which were currently unclassified, unassigned, or assigned as tentative species within the family Rhabdoviridae were successfully detected. These new data allowed an unprecedented phylogenetic analysis of 106 rhabdoviruses and further suggest that the principles and methodology developed here may be used for the broad-spectrum surveillance and the broader-scale investigation of biodiversity in the viral world. PMID:20610710

  8. Quantifying low-frequency revertants in oral poliovirus vaccine using next generation sequencing.

    PubMed

    Sarcey, Eric; Serres, Aurélie; Tindy, Fabrice; Chareyre, Audrey; Ng, Siemon; Nicolas, Marine; Vetter, Emmanuelle; Bonnevay, Thierry; Abachin, Eric; Mallet, Laurent

    2017-08-01

    Spontaneous reversion to neurovirulence of live attenuated oral poliovirus vaccine (OPV) serotype 3 (chiefly involving the n.472U>C mutation), must be monitored during production to ensure vaccine safety and consistency. Mutant analysis by polymerase chain reaction and restriction enzyme cleavage (MAPREC) has long been endorsed by the World Health Organization as the preferred in vitro test for this purpose; however, it requires radiolabeling, which is no longer supported by many laboratories. We evaluated the performance and suitability of next generation sequencing (NGS) as an alternative to MAPREC. The linearity of NGS was demonstrated at revertant concentrations equivalent to the study range of 0.25%-1.5%. NGS repeatability and intermediate precision were comparable across all tested samples, and NGS was highly reproducible, irrespective of sequencing platform or analysis software used. NGS was performed on OPV serotype 3 working seed lots and monovalent bulks (n=21) that were previously tested using MAPREC, and which covered the representative range of vaccine production. Percentages of 472-C revertants identified by NGS and MAPREC were comparable and highly correlated (r≥0.80), with a Pearson correlation coefficient of 0.95585 (p<0.0001). NGS demonstrated statistically equivalent performance to that of MAPREC for quantifying low-frequency OPV serotype 3 revertants, and offers a valid alternative to MAPREC. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
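
    The two quantities compared in this abstract, a revertant percentage per sample and a Pearson correlation between methods, are simple to compute. The sketch below is a generic illustration with hypothetical function names; it assumes the revertant percentage is the share of reads carrying C at position 472 and does not reproduce the study's pipeline or data.

```python
import math


def revertant_percent(c_reads, total_reads):
    """Percentage of reads carrying the revertant base (assumed definition)."""
    return 100.0 * c_reads / total_reads


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

    In use, one list would hold the NGS percentages and the other the MAPREC percentages for the same lots, and pearson_r would be applied to the pair.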

  9. Phylogenetic Analysis of Klebsiella pneumoniae from Hospitalized Children, Pakistan.

    PubMed

    Ejaz, Hasan; Wang, Nancy; Wilksch, Jonathan J; Page, Andrew J; Cao, Hanwei; Gujaran, Shruti; Keane, Jacqueline A; Lithgow, Trevor; Ul-Haq, Ikram; Dougan, Gordon; Strugnell, Richard A; Heinz, Eva

    2017-11-01

    Klebsiella pneumoniae shows increasing emergence of multidrug-resistant lineages, including strains resistant to all available antimicrobial drugs. We conducted whole-genome sequencing of 178 highly drug-resistant isolates from a tertiary hospital in Lahore, Pakistan. Phylogenetic analyses to place these isolates into global context demonstrate the expansion of multiple independent lineages, including K. quasipneumoniae.

  10. A novel missense Norrie disease mutation associated with a severe ocular phenotype.

    PubMed

    Khan, Arif O; Shamsi, Farrukh A; Al-Saif, Amr; Kambouris, Marios

    2004-01-01

    Clinical findings and pedigree analysis led to the diagnosis of severe Norrie disease in two brothers. DNA sequencing demonstrated a novel missense mutation (703G>T) that significantly alters predicted protein structure. Less severe retinal developmental disease may be associated with milder mutations in the Norrie disease gene.

  11. You Know Arnold Schwarzenegger? On Doing Questioning in Second Language Dyadic Tutorials

    ERIC Educational Resources Information Center

    Belhiah, Hassan

    2012-01-01

    This study analyses question-answer (QA) sequences in second language tutorial interaction. Using conversation analysis methodology as an analytical tool, the study demonstrates how the act of questioning is a dominant form of interaction in tutoring discourse. The doing of questioning is accomplished through a myriad of forms other than…

  12. Enhancing knowledge discovery from cancer genomics data with Galaxy

    PubMed Central

    Albuquerque, Marco A.; Grande, Bruno M.; Ritch, Elie J.; Pararajalingam, Prasath; Jessa, Selin; Krzywinski, Martin; Grewal, Jasleen K.; Shah, Sohrab P.; Boutros, Paul C.

    2017-01-01

    The field of cancer genomics has demonstrated the power of massively parallel sequencing techniques to inform on the genes and specific alterations that drive tumor onset and progression. Although large comprehensive sequence data sets continue to be made increasingly available, data analysis remains an ongoing challenge, particularly for laboratories lacking dedicated resources and bioinformatics expertise. To address this, we have produced a collection of Galaxy tools that represent many popular algorithms for detecting somatic genetic alterations from cancer genome and exome data. We developed new methods for parallelization of these tools within Galaxy to accelerate runtime and have demonstrated their usability and summarized their runtimes on multiple cloud service providers. Some tools represent extensions or refinement of existing toolkits to yield visualizations suited to cohort-wide cancer genomic analysis. For example, we present Oncocircos and Oncoprintplus, which generate data-rich summaries of exome-derived somatic mutation. Workflows that integrate these to achieve data integration and visualizations are demonstrated on a cohort of 96 diffuse large B-cell lymphomas and enabled the discovery of multiple candidate lymphoma-related genes. Our toolkit is available from our GitHub repository as Galaxy tool and dependency definitions and has been deployed using virtualization on multiple platforms including Docker. PMID:28327945

  13. Enhancing knowledge discovery from cancer genomics data with Galaxy.

    PubMed

    Albuquerque, Marco A; Grande, Bruno M; Ritch, Elie J; Pararajalingam, Prasath; Jessa, Selin; Krzywinski, Martin; Grewal, Jasleen K; Shah, Sohrab P; Boutros, Paul C; Morin, Ryan D

    2017-05-01

    The field of cancer genomics has demonstrated the power of massively parallel sequencing techniques to inform on the genes and specific alterations that drive tumor onset and progression. Although large comprehensive sequence data sets continue to be made increasingly available, data analysis remains an ongoing challenge, particularly for laboratories lacking dedicated resources and bioinformatics expertise. To address this, we have produced a collection of Galaxy tools that represent many popular algorithms for detecting somatic genetic alterations from cancer genome and exome data. We developed new methods for parallelization of these tools within Galaxy to accelerate runtime and have demonstrated their usability and summarized their runtimes on multiple cloud service providers. Some tools represent extensions or refinement of existing toolkits to yield visualizations suited to cohort-wide cancer genomic analysis. For example, we present Oncocircos and Oncoprintplus, which generate data-rich summaries of exome-derived somatic mutation. Workflows that integrate these to achieve data integration and visualizations are demonstrated on a cohort of 96 diffuse large B-cell lymphomas and enabled the discovery of multiple candidate lymphoma-related genes. Our toolkit is available from our GitHub repository as Galaxy tool and dependency definitions and has been deployed using virtualization on multiple platforms including Docker. © The Author 2017. Published by Oxford University Press.

  14. A Multi-Omics Approach to Evaluate the Quality of Milk Whey Used in Ricotta Cheese Production

    PubMed Central

    Sattin, Eleonora; Andreani, Nadia A.; Carraro, Lisa; Lucchini, Rosaria; Fasolato, Luca; Telatin, Andrea; Balzan, Stefania; Novelli, Enrico; Simionati, Barbara; Cardazzo, Barbara

    2016-01-01

    In the past, milk whey was only a by-product of cheese production, but currently, it has a high commercial value for use in the food industries. However, the regulation of whey management (i.e., storage and hygienic properties) has not been updated, and as a consequence, its microbiological quality is very challenging for food safety. The Next Generation Sequencing (NGS) technique was applied to several whey samples used for Ricotta production to evaluate the microbial community composition in depth using both RNA and DNA as templates for NGS library construction. Whey samples demonstrating a high microbial and aerobic spore load contained mostly Firmicutes; although variable, some samples contained a relevant amount of Gammaproteobacteria. Several lots of whey acquired as raw material for Ricotta production presented defective organoleptic properties. To define the volatile compounds in normal and defective whey samples, a headspace gas chromatography/mass spectrometry (GC/MS) analysis was conducted. The statistical analysis demonstrated that different microbial communities resulted from DNA or cDNA library sequencing, and distinguishable microbiota composed the communities contained in the organoleptic-defective whey samples. PMID:27582735

  15. High stability of yellow fever 17D-204 vaccine: a 12-year retrospective analysis of large-scale production.

    PubMed

    Barban, V; Girerd, Y; Aguirre, M; Gulia, S; Pétiard, F; Riou, P; Barrere, B; Lang, J

    2007-04-12

    We have retrospectively analyzed 12 bulk lots of yellow fever vaccine Stamaril, produced between 1990 and 2002 and prepared from the same seed lot that has been in continuous use since 1990. All vaccine batches displayed identical genome sequence. Only four nucleotide substitutions were observed, compared to previously published sequence, with no incidence at amino-acid level. Fine analysis of viral plaque size distribution was used as an additional marker for genetic stability and demonstrated a remarkable homogeneity of the viral population. The total virus load, measured by qRT-PCR, was also homogeneous pointing out reproducibility of the vaccine production process. Mice inoculated intracerebrally with the different bulks exhibited a similar average survival time, and ratio between in vitro potency and mouse LD(50) titers remained constant from batch-to-batch. Taken together, these data demonstrate the genetic stability of the strain at mass production level over a period of 12 years and reinforce the generally admitted idea of the safety of YF17D-based vaccines.

  16. Correlation dynamics and enhanced signals for the identification of serial biomolecules and DNA bases.

    PubMed

    Ahmed, Towfiq; Haraldsen, Jason T; Rehr, John J; Di Ventra, Massimiliano; Schuller, Ivan; Balatsky, Alexander V

    2014-03-28

    Nanopore-based sequencing has demonstrated a significant potential for the development of fast, accurate, and cost-efficient fingerprinting techniques for next generation molecular detection and sequencing. We propose a specific multilayered graphene-based nanopore device architecture for the recognition of single biomolecules. Molecular detection and analysis can be accomplished through the detection of transverse currents as the molecule or DNA base translocates through the nanopore. To increase the overall signal-to-noise ratio and the accuracy, we implement a new 'multi-point cross-correlation' technique for identification of DNA bases or other molecules on the single molecular level. We demonstrate that the cross-correlations between each nanopore will greatly enhance the transverse current signal for each molecule. We implement first-principles transport calculations for DNA bases surveyed across a multilayered graphene nanopore system to illustrate the advantages of the proposed geometry. A time-series analysis of the cross-correlation functions illustrates the potential of this method for enhancing the signal-to-noise ratio. This work constitutes a significant step forward in facilitating fingerprinting of single biomolecules using solid state technology.

  17. Demonstration of retrotransposition of the Tf1 element in fission yeast.

    PubMed

    Levin, H L; Boeke, J D

    1992-03-01

    Tf1, a retrotransposon from fission yeast, has LTRs and coding sequences resembling the protease, reverse transcriptase and integrase domains of retroviral pol genes. A unique aspect of Tf1 is that it contains a single open reading frame whereas other retroviruses and retrotransposons usually possess two or more open reading frames. To determine whether Tf1 can transpose, we overproduced Tf1 transcripts encoded by a plasmid copy of the element marked with a neo gene. Approximately 0.1-4.0% of the cell population acquired chromosomally inherited resistance to G418. DNA blot analysis demonstrated that such strains had acquired both Tf1 and neo specific sequences within a restriction fragment of the same size; the size of this restriction fragment varied between different isolates. Structural analysis of the cloned DNA flanking the Tf1-neo element of two transposition candidates with the same regions in the parent strain showed that the ability to grow on G418 was due to transposition of Tf1-neo and not other types of recombination events.

  18. Combining use of a panel of ssDNA aptamers in the detection of Staphylococcus aureus

    PubMed Central

    Cao, Xiaoxiao; Li, Shaohua; Chen, Liucun; Ding, Hongmei; Xu, Hua; Huang, Yanping; Li, Jie; Liu, Nongle; Cao, Weihong; Zhu, Yanjun; Shen, Beifen; Shao, Ningsheng

    2009-01-01

    In this article, a panel of ssDNA aptamers specific to Staphylococcus aureus was obtained by a whole bacterium-based SELEX procedure and applied to probing S. aureus. After several rounds of selection with S. aureus as the target and Streptococcus and S. epidermidis as counter targets, the highly enriched oligonucleic acid pool was sequenced and then grouped under different families on the basis of the homology of the primary sequence and the similarity of the secondary structure. Eleven sequences from different families were selected for further characterization by confocal imaging and flow cytometry analysis. Results showed that five aptamers demonstrated high specificity and affinity to S. aureus individually. The five aptamers recognize different molecular targets by competitive experiment. Combining these five aptamers had a much better effect than the individual aptamer in the recognition of different S. aureus strains. In addition, the combined aptamers can probe single S. aureus in pyogenic fluids. Our work demonstrates that a set of aptamers specific to one bacterium can be used in combination for the identification of the bacterium instead of a single aptamer. PMID:19498077

  19. Combining use of a panel of ssDNA aptamers in the detection of Staphylococcus aureus.

    PubMed

    Cao, Xiaoxiao; Li, Shaohua; Chen, Liucun; Ding, Hongmei; Xu, Hua; Huang, Yanping; Li, Jie; Liu, Nongle; Cao, Weihong; Zhu, Yanjun; Shen, Beifen; Shao, Ningsheng

    2009-08-01

    In this article, a panel of ssDNA aptamers specific to Staphylococcus aureus was obtained by a whole bacterium-based SELEX procedure and applied to probing S. aureus. After several rounds of selection with S. aureus as the target and Streptococcus and S. epidermidis as counter targets, the highly enriched oligonucleic acid pool was sequenced and then grouped under different families on the basis of the homology of the primary sequence and the similarity of the secondary structure. Eleven sequences from different families were selected for further characterization by confocal imaging and flow cytometry analysis. Results showed that five aptamers demonstrated high specificity and affinity to S. aureus individually. The five aptamers recognize different molecular targets by competitive experiment. Combining these five aptamers had a much better effect than the individual aptamer in the recognition of different S. aureus strains. In addition, the combined aptamers can probe single S. aureus in pyogenic fluids. Our work demonstrates that a set of aptamers specific to one bacterium can be used in combination for the identification of the bacterium instead of a single aptamer.

  20. The construction and partial characterization of plasmids containing complementary DNA sequences to human calcitonin precursor polyprotein.

    PubMed Central

    Allison, J; Hall, L; MacIntyre, I; Craig, R K

    1981-01-01

    (1) Total poly(A)-containing RNA isolated from human thyroid medullary carcinoma tissue was shown to direct the synthesis in the wheat germ cell-free system of a major (Mr 21000) and several minor forms of human calcitonin precursor polyproteins. Evidence for processing of these precursor(s) by the wheat germ cell-free system is also presented. (2) A small complementary DNA (cDNA) plasmid library has been constructed in the PstI site of the plasmid pAT153, using total human thyroid medullary carcinoma poly(A)-containing RNA as the starting material. (3) Plasmids containing abundant cDNA sequences were selected by hybridization in situ, and two of these (phT-B3 and phT-B6) were characterized by hybridization-translation and restriction analysis. Each was shown to contain human calcitonin precursor polyprotein cDNA sequences. (4) RNA blotting techniques demonstrate that the human calcitonin precursor polyprotein is encoded within a mRNA containing 1000 bases. (5) The results demonstrate that human calcitonin is synthesized as a precursor polyprotein. PMID:6896146

  1. Automated one-step DNA sequencing based on nanoliter reaction volumes and capillary electrophoresis.

    PubMed

    Pang, H M; Yeung, E S

    2000-08-01

    An integrated system with a nano-reactor for cycle-sequencing reaction coupled to on-line purification and capillary gel electrophoresis has been demonstrated. Fifty nanoliters of reagent solution, which includes dye-labeled terminators, polymerase, BSA and template, was aspirated and mixed with the template inside the nano-reactor followed by cycle-sequencing reaction. The reaction products were then purified by a size-exclusion chromatographic column operated at 50 degrees C followed by room temperature on-line injection of the DNA fragments into a capillary for gel electrophoresis. Over 450 bases of DNA can be separated and identified. As little as 25 nl of reagent solution can be used for the cycle-sequencing reaction, with a slightly shorter read length. Significant savings in reagent cost are achieved because the remaining stock solution can be reused without contamination. The steps of cycle sequencing, on-line purification, injection, DNA separation, capillary regeneration, gel-filling and fluidic manipulation were performed with complete automation. This system can be readily multiplexed for high-throughput DNA sequencing or PCR analysis directly from templates or even biological materials.

  2. Scop3D: three-dimensional visualization of sequence conservation.

    PubMed

    Vermeire, Tessa; Vermaere, Stijn; Schepens, Bert; Saelens, Xavier; Van Gucht, Steven; Martens, Lennart; Vandermarliere, Elien

    2015-04-01

    The integration of a protein's structure with its known sequence variation provides insight on how that protein evolves, for instance in terms of (changing) function or immunogenicity. Yet, collating the corresponding sequence variants into a multiple sequence alignment, calculating each position's conservation, and mapping this information back onto a relevant structure is not straightforward. We therefore built the Sequence Conservation on Protein 3D structure (scop3D) tool to perform these tasks automatically. The output consists of two modified PDB files in which the B-values for each position are replaced by the percentage sequence conservation, or the information entropy for each position, respectively. Furthermore, text files with absolute and relative amino acid occurrences for each position are also provided, along with snapshots of the protein from six distinct directions in space. The visualization provided by scop3D can for instance be used as an aid in vaccine development or to identify antigenic hotspots, which we here demonstrate based on an analysis of the fusion proteins of human respiratory syncytial virus and mumps virus. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
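
    The two per-position measures named in this abstract (percentage sequence conservation and information entropy) can be illustrated with a minimal Python sketch. This is not the scop3D implementation: the function name `column_stats`, the toy alignment, and the choice to count every character (including gaps) as a symbol are assumptions made only for illustration.

```python
import math
from collections import Counter


def column_stats(alignment):
    """Return a (percent_conservation, entropy_bits) tuple per alignment column.

    Assumption: every character, including '-', is counted as a symbol.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    stats = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        total = sum(counts.values())
        # Conservation: share of sequences carrying the most frequent residue.
        conservation = 100.0 * counts.most_common(1)[0][1] / total
        # Shannon entropy in bits: sum of p * log2(1/p) over observed residues.
        entropy = sum((n / total) * math.log2(total / n) for n in counts.values())
        stats.append((conservation, entropy))
    return stats


if __name__ == "__main__":
    toy_alignment = ["MKTA", "MKSA", "MRTA", "MKTA"]  # hypothetical data
    for position, (cons, ent) in enumerate(column_stats(toy_alignment), start=1):
        print(f"{position}\t{cons:.1f}\t{ent:.3f}")
```

    In a tool of this kind, each per-position value would then be written into the B-factor field of the matching residue in the structure file; that mapping step is left out here because it depends on how alignment columns correspond to residue numbers.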

  3. High resolution identity testing of inactivated poliovirus vaccines

    PubMed Central

    Mee, Edward T.; Minor, Philip D.; Martin, Javier

    2015-01-01

    Background Definitive identification of poliovirus strains in vaccines is essential for quality control, particularly where multiple wild-type and Sabin strains are produced in the same facility. Sequence-based identification provides the ultimate in identity testing and would offer several advantages over serological methods. Methods We employed random RT-PCR and high throughput sequencing to recover full-length genome sequences from monovalent and trivalent poliovirus vaccine products at various stages of the manufacturing process. Results All expected strains were detected in previously characterised products and the method permitted identification of strains comprising as little as 0.1% of sequence reads. Highly similar Mahoney and Sabin 1 strains were readily discriminated on the basis of specific variant positions. Analysis of a product known to contain incorrect strains demonstrated that the method correctly identified the contaminants. Conclusion Random RT-PCR and shotgun sequencing provided high resolution identification of vaccine components. In addition to the recovery of full-length genome sequences, the method could also be easily adapted to the characterisation of minor variant frequencies and distinction of closely related products on the basis of distinguishing consensus and low frequency polymorphisms. PMID:26049003

  4. Evidence for Horizontal Gene Transfer in Evolution of Elongation Factor Tu in Enterococci

    PubMed Central

    Ke, Danbing; Boissinot, Maurice; Huletsky, Ann; Picard, François J.; Frenette, Johanne; Ouellette, Marc; Roy, Paul H.; Bergeron, Michel G.

    2000-01-01

    The elongation factor Tu, encoded by tuf genes, is a GTP binding protein that plays a central role in protein synthesis. One to three tuf genes per genome are present, depending on the bacterial species. Most low-G+C-content gram-positive bacteria carry only one tuf gene. We have designed degenerate PCR primers derived from consensus sequences of the tuf gene to amplify partial tuf sequences from 17 enterococcal species and other phylogenetically related species. The amplified DNA fragments were sequenced either by direct sequencing or by sequencing cloned inserts containing putative amplicons. Two different tuf genes (tufA and tufB) were found in 11 enterococcal species, including Enterococcus avium, Enterococcus casseliflavus, Enterococcus dispar, Enterococcus durans, Enterococcus faecium, Enterococcus gallinarum, Enterococcus hirae, Enterococcus malodoratus, Enterococcus mundtii, Enterococcus pseudoavium, and Enterococcus raffinosus. For the other six enterococcal species (Enterococcus cecorum, Enterococcus columbae, Enterococcus faecalis, Enterococcus sulfureus, Enterococcus saccharolyticus, and Enterococcus solitarius), only the tufA gene was present. Based on 16S rRNA gene sequence analysis, the 11 species having two tuf genes all have a common ancestor, while the six species having only one copy diverged from the enterococcal lineage before that common ancestor. The presence of one or two copies of the tuf gene in enterococci was confirmed by Southern hybridization. Phylogenetic analysis of tuf sequences demonstrated that the enterococcal tufA gene branches with the Bacillus, Listeria, and Staphylococcus genera, while the enterococcal tufB gene clusters with the genera Streptococcus and Lactococcus. Primary structure analysis showed that four amino acid residues encoded within the sequenced regions are conserved and unique to the enterococcal tufB genes and the tuf genes of streptococci and Lactococcus lactis. 
The data suggest that an ancestral streptococcus or a streptococcus-related species may have horizontally transferred a tuf gene to the common ancestor of the 11 enterococcal species which now carry two tuf genes. PMID:11092850

  5. Dual-echo ASL based assessment of motor networks: a feasibility study

    NASA Astrophysics Data System (ADS)

    Storti, Silvia Francesca; Boscolo Galazzo, Ilaria; Pizzini, Francesca B.; Menegaz, Gloria

    2018-04-01

    Objective. The dual-echo arterial spin labeling (DE-ASL) technique has recently been proposed for the simultaneous acquisition of ASL and blood-oxygenation-level-dependent (BOLD) functional magnetic resonance imaging (fMRI) data. The ability of this technique to detect functional connectivity at rest or during motor and motor imagery tasks is still unexplored, both per se and in comparison with conventional methods. The purpose is to quantify the sensitivity of the DE-ASL sequence with respect to the conventional fMRI sequence (cvBOLD) in detecting brain activations, and to assess and compare the relevance of node features in decoding the network structure. Approach. Thirteen volunteers were scanned with a pseudo-continuous DE-ASL sequence, from which the concomitant BOLD (ccBOLD) signal can be extracted simultaneously with the ASL signal. The approach consists of two steps: (i) model-based analyses for assessing brain activations at individual and group levels, followed by statistical analysis for comparing the activation elicited by the three sequences under two conditions (motor and motor imagery); (ii) graph-theoretical analysis of brain connectivity for assessing and comparing the properties of the network models. Main results. Our results suggest that cvBOLD and ccBOLD have comparable sensitivity in detecting the regions involved in the active task, whereas ASL offers a higher degree of co-localization with smaller activation volumes. The connectivity results and the comparative analysis of node features across sequences revealed that there are no strong changes between rest and tasks and that the differences between the sequences are limited to a few connections. Significance. Considering the comparable sensitivity of the ccBOLD and cvBOLD sequences in detecting activated brain regions, the results demonstrate that DE-ASL can be successfully applied in functional studies, allowing both ASL and BOLD information to be obtained within a single sequence. 
Further, DE-ASL is a powerful technique for research and clinical applications, allowing quantitative comparisons as well as characterization of functional connectivity.

  6. Increasing Clinical Severity during a Dengue Virus Type 3 Cuban Epidemic: Deep Sequencing of Evolving Viral Populations

    PubMed Central

    Blanc, Hervé; Bordería, Antonio V.; Díaz, Gisell; Henningsson, Rasmus; Gonzalez, Daniel; Santana, Emidalys; Alvarez, Mayling; Castro, Osvaldo; Fontes, Magnus; Vignuzzi, Marco; Guzman, Maria G.

    2016-01-01

    ABSTRACT During the dengue virus type 3 (DENV-3) epidemic that occurred in Havana in 2001 to 2002, severe disease was associated with the infection sequence DENV-1 followed by DENV-3 (DENV-1/DENV-3), while the sequence DENV-2/DENV-3 was associated with mild/asymptomatic infections. To determine the role of the virus in the increasing severity demonstrated during the epidemic, serum samples collected at different time points were studied. A total of 22 full-length sequences were obtained using a deep-sequencing approach. Bayesian phylogenetic analysis of consensus sequences revealed that two DENV-3 lineages were circulating in Havana at that time, both grouped within genotype III. The predominant lineage is closely related to Peruvian and Ecuadorian strains, while the minor lineage is related to Venezuelan strains. According to consensus sequences, relatively few nonsynonymous mutations were observed; only one was fixed during the epidemic at position 4380 in the NS2B gene. Intrahost genetic analysis indicated that a significant minor population was selected and became predominant toward the end of the epidemic. In conclusion, greater variability was detected during the epidemic's progression in terms of significant minority variants, particularly in the nonstructural genes. An increasing trend of genetic diversity toward the end of the epidemic was observed only for synonymous variant allele rates, with higher variability in secondary cases. Remarkably, significant intrahost genetic variation was demonstrated within the same patient during the course of secondary infection with DENV-1/DENV-3, including changes in the structural proteins premembrane (PrM) and envelope (E). Therefore, the dynamic of evolving viral populations in the context of heterotypic antibodies could be related to the increasing clinical severity observed during the epidemic. 
IMPORTANCE Based on the evidence that DENV fitness is context dependent, our research has focused on the study of viral factors associated with intraepidemic increasing severity in a unique epidemiological setting. Here, we investigated the intrahost genetic diversity in acute human samples collected at different time points during the DENV-3 epidemic that occurred in Cuba in 2001 to 2002 using a deep-sequencing approach. We concluded that greater variability in significant minor populations occurred as the epidemic progressed, particularly in the nonstructural genes, with higher variability observed in secondary infection cases. Remarkably, for the first time significant intrahost genetic variation was demonstrated within the same patient during the course of secondary infection with DENV-1/DENV-3, including changes in structural proteins. These findings indicate that high-resolution approaches are needed to unravel molecular mechanisms involved in dengue pathogenesis. PMID:26889031

  7. Genome-wide characterization of centromeric satellites from multiple mammalian genomes.

    PubMed

    Alkan, Can; Cardone, Maria Francesca; Catacchio, Claudia Rita; Antonacci, Francesca; O'Brien, Stephen J; Ryder, Oliver A; Purgato, Stefania; Zoli, Monica; Della Valle, Giuliano; Eichler, Evan E; Ventura, Mario

    2011-01-01

    Despite its importance in cell biology and evolution, the centromere has remained the final frontier in genome assembly and annotation due to its complex repeat structure. However, isolation and characterization of the centromeric repeats from newly sequenced species are necessary for a complete understanding of genome evolution and function. In recent years, various genomes have been sequenced, but the characterization of the corresponding centromeric DNA has lagged behind. Here, we present a computational method (RepeatNet) to systematically identify higher-order repeat structures from unassembled whole-genome shotgun sequence and test whether these sequence elements correspond to functional centromeric sequences. We analyzed genome datasets from six species of mammals representing the diversity of the mammalian lineage, namely, horse, dog, elephant, armadillo, opossum, and platypus. We define candidate monomer satellite repeats and demonstrate centromeric localization for five of the six genomes. Our analysis revealed the greatest diversity of centromeric sequences in horse and dog in contrast to elephant and armadillo, which showed high-centromeric sequence homogeneity. We could not isolate centromeric sequences within the platypus genome, suggesting that centromeres in platypus are not enriched in satellite DNA. Our method can be applied to the characterization of thousands of other vertebrate genomes anticipated for sequencing in the near future, providing an important tool for annotation of centromeres.

  8. Organization, chromosomal localization and promoter analysis of the gene encoding human acidic fibroblast growth factor intracellular binding protein.

    PubMed Central

    Kolpakova, E; Frengen, E; Stokke, T; Olsnes, S

    2000-01-01

    Acidic fibroblast growth factor (aFGF) intracellular binding protein (FIBP) is a protein found mainly in the nucleus that might be involved in the intracellular function of aFGF. Here we present a comparative analysis of the deduced amino acid sequences of human, murine and Drosophila FIBP analogues and demonstrate that FIBP is an evolutionarily conserved protein. The human gene spans more than 5 kb, comprising ten exons and nine introns, and maps to chromosome 11q13.1. Two slightly different splice variants found in different tissues were isolated and characterized. Sequence analysis of the region surrounding the translation start revealed a CpG island, a classical feature of widely expressed genes. Functional studies of the promoter region with a luciferase reporter system suggested a strong transcriptional activity residing within 600 bp of the 5' flanking region. PMID:11104667

  9. Molecular analysis of the AGXT gene in Italian patients with primary hyperoxaluria type 1 (PH1).

    PubMed

    Ferrettini, C; Pirulli, D; Cosseddu, D; Marangella, M; Petrarulo, M; Mazzola, G; Vatta, S; Amoroso, A

    1998-01-01

    Specimens were collected from 22 Italian patients with primary hyperoxaluria type 1 (PH1). Ten of them had already been analyzed by molecular biology. To clarify the molecular characteristics of the AGXT gene disease responsible for PH1, DNA samples were examined for known mutations by hybridisation of PCR products with Sequence Specific Oligonucleotides (PCR-SSO). We planned to identify new mutations of the AGXT gene by heteroduplex analysis followed by direct sequencing. We had already standardized a) the conditions for the amplification of the 11 exons of AGXT, b) the PCR-SSO technique and c) the heteroduplex analysis of amplified products. Preliminary results demonstrated that the AGXT mutations described in previous studies were found only in 40% of the examined Italian patients with PH1. The remaining 60% of mutations should be characterised in future studies.

  10. Acrania/encephalocele sequence (exencephaly) associated with 92,XXXX karyotype: early prenatal diagnosis at 9(+5) weeks by 3D transvaginal ultrasound and coelocentesis.

    PubMed

    Tonni, Gabriele; Ventura, Alessandro; Bonasoni, Maria Paola

    2009-09-01

    A 27-year-old pregnant woman was diagnosed by 3D transvaginal ultrasound as carrying a fetus of 9(+5) weeks gestation affected by acrania/encephalocele (exencephaly) sequence. A 2D transvaginal ultrasound-guided aspiration of 5 mL of extra-coelomic fluid was performed under cervical block before uterine suction. Conventional cytogenetic analysis demonstrated a 92,XXXX karyotype. Transvaginal 2D ultrasound-guided coelocentesis for rapid karyotyping can be proposed to women who are near to miscarriage or in cases where a prenatal ultrasound diagnosis of congenital anomaly is performed at an early stage of development. Genetic analysis can be performed using traditional cytogenetic analysis or can be aided by fluorescence in situ hybridization (FISH). Coelocentesis may become an integral part of first trimester armamentarium and may be clinically useful in the understanding of the pathogenesis of early prenatally diagnosed congenital anomalies.

  11. 19F DOSY NMR analysis for spin systems with nJFF couplings.

    PubMed

    Dal Poggetto, Guilherme; Favaro, Denize C; Nilsson, Mathias; Morris, Gareth A; Tormena, Cláudio F

    2014-04-01

    NMR is a powerful method for identification and quantification of drug components and contaminations. These problems present themselves as mixtures, and here, one of the most powerful tools is DOSY. DOSY works best when there is no spectral overlap between components, so drugs containing fluorine substituents are well-suited for DOSY analysis as (19)F spectra are typically very sparse. Here, we demonstrate the use of a modified (19)F DOSY experiment (on the basis of the Oneshot sequences) for various fluorinated benzenes. For compounds with significant (n)JFF coupling constants, as is common, the undesirable J-modulation can be efficiently suppressed using the Oneshot45 pulse sequence. This investigation highlights (19)F DOSY as a valuable and robust method for analysis of molecular systems containing fluorine atoms even where there are large fluorine-fluorine couplings. Copyright © 2014 John Wiley & Sons, Ltd.

  12. Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline.

    PubMed

    Reid, Jeffrey G; Carroll, Andrew; Veeraraghavan, Narayanan; Dahdouli, Mahmoud; Sundquist, Andreas; English, Adam; Bainbridge, Matthew; White, Simon; Salerno, William; Buhay, Christian; Yu, Fuli; Muzny, Donna; Daly, Richard; Duyk, Geoff; Gibbs, Richard A; Boerwinkle, Eric

    2014-01-29

    Massively parallel DNA sequencing generates staggering amounts of data. Decreasing cost, increasing throughput, and improved annotation have expanded the diversity of genomics applications in research and clinical practice. This expanding scale creates analytical challenges: accommodating peak compute demand, coordinating secure access for multiple analysts, and sharing validated tools and results. To address these challenges, we have developed the Mercury analysis pipeline and deployed it in local hardware and the Amazon Web Services cloud via the DNAnexus platform. Mercury is an automated, flexible, and extensible analysis workflow that provides accurate and reproducible genomic results at scales ranging from individuals to large cohorts. By taking advantage of cloud computing and with Mercury implemented on the DNAnexus platform, we have demonstrated a powerful combination of a robust and fully validated software pipeline and a scalable computational resource that, to date, we have applied to more than 10,000 whole genome and whole exome samples.

  13. Comparison of post-contrast 3D-T1-MPRAGE, 3D-T1-SPACE and 3D-T2-FLAIR MR images in evaluation of meningeal abnormalities at 3-T MRI.

    PubMed

    Jeevanandham, Balaji; Kalyanpur, Tejas; Gupta, Prashant; Cherian, Mathew

    2017-06-01

    This study assessed the usefulness of the newer three-dimensional (3D)-T1 sampling perfection with application optimized contrast using different flip-angle evolutions (SPACE) and 3D-T2 fluid-attenuated inversion recovery (FLAIR) sequences in evaluation of meningeal abnormalities. Seventy-eight patients who presented with high suspicion of meningeal abnormalities were evaluated using post-contrast 3D-T2-FLAIR, 3D-T1 magnetization-prepared rapid gradient-echo (MPRAGE) and 3D-T1-SPACE sequences. The images were evaluated independently by two radiologists for cortical gyral, sulcal space, basal cisterns and dural enhancement. The diagnoses were confirmed by further investigations including histopathology. Post-contrast 3D-T1-SPACE and 3D-T2-FLAIR images yielded significantly more information than MPRAGE images (p < 0.05 for both SPACE and FLAIR images) in detection of meningeal abnormalities. SPACE images best demonstrated abnormalities in dural and sulcal spaces, whereas FLAIR was useful for basal cisterns enhancement. Both SPACE and FLAIR performed equally well in detection of gyral enhancement. In all 10 patients where both SPACE and T2-FLAIR images failed to demonstrate any abnormality, further analysis was also negative. The 3D-T1-SPACE sequence best demonstrated abnormalities in dural and sulcal spaces, whereas FLAIR was useful for abnormalities in basal cisterns. Both SPACE and FLAIR performed well for detection of gyral enhancement. Post-contrast SPACE and FLAIR sequences are superior to the MPRAGE sequence for evaluation of meningeal abnormalities and when used in combination have the maximum sensitivity for leptomeningeal abnormalities. The negative-predictive value is nearly 100% when no leptomeningeal abnormality is detected on these sequences. Advances in knowledge: Post-contrast 3D-T1-SPACE and 3D-T2-FLAIR images are more useful than 3D-T1-MPRAGE images in evaluation of meningeal abnormalities.

  14. A rare complex DNA rearrangement in the murine Steel gene results in exon duplication and a lethal phenotype.

    PubMed

    Chandra, Saurabh; Kapur, Reuben; Chuzhanova, Nadia; Summey, Victoria; Prentice, David; Barker, Jane; Cooper, David N; Williams, David A

    2003-11-15

    Kit ligand (Kitl), encoded by the Steel (Sl) locus, plays an essential role in hematopoiesis, gametogenesis, and melanogenesis during both embryonic and adult life. We have characterized a new spontaneous mutant of the Sl locus in mice designated KitlSl-20J that arose in the breeding colony at Jackson Laboratories. Heterozygous KitlSl-20J mice display a white belly spot and intercrossing results in an embryonic lethal phenotype in the homozygous state. Analysis of homozygous embryos demonstrated a significant reduction in fetal liver cellularity, colony forming unit-erythroid (CFU-E) progenitors, and a total absence of germ cells. Although expressed in vivo, recombinant mutant protein demonstrated loss of bioactivity that was correlated with lack of receptor binding. Analysis of the Sl gene transcripts in heterozygous KitlSl-20J mice revealed an in-frame tandem duplication of exon 3. A long-range polymerase chain reaction (PCR) strategy using overlapping primers in exon 3 amplified an approximately 7-kilobase (kb) product from DNA isolated from heterozygous KitlSl-20J mice but not from wild-type DNA that contained sequences from both introns 2 and 3 and an inverted intron 2 sequence, suggesting a complex rearrangement as the mechanism of the mutation. "Complexity analysis" of the sequence of the amplified product strongly suggests that local DNA motifs may have contributed to the generation of this spontaneous KitlSl-20J allele, likely mediated by a 2-step process. The KitlSl-20J mutation is a unique KitlSl allele and represents an unusual mechanism of mutation.

  15. Genometa--a fast and accurate classifier for short metagenomic shotgun reads.

    PubMed

    Davenport, Colin F; Neugebauer, Jens; Beckmann, Nils; Friedrich, Benedikt; Kameri, Burim; Kokott, Svea; Paetow, Malte; Siekmann, Björn; Wieding-Drewes, Matthias; Wienhöfer, Markus; Wolf, Stefan; Tümmler, Burkhard; Ahlers, Volker; Sprengel, Frauke

    2012-01-01

    Metagenomic studies use high-throughput sequence data to investigate microbial communities in situ. However, considerable challenges remain in the analysis of these data, particularly with regard to speed and reliable analysis of microbial species as opposed to higher level taxa such as phyla. We here present Genometa, a computationally undemanding graphical user interface program that enables identification of bacterial species and gene content from datasets generated by inexpensive high-throughput short read sequencing technologies. Our approach was first verified on two simulated metagenomic short read datasets, detecting 100% and 94% of the bacterial species included with few false positives or false negatives. Subsequent comparative benchmarking analysis against three popular metagenomic algorithms on an Illumina human gut dataset revealed Genometa to attribute the most reads to bacteria at species level (i.e. including all strains of that species) and demonstrate similar or better accuracy than the other programs. Lastly, speed was demonstrated to be many times that of BLAST due to the use of modern short read aligners. Our method is highly accurate if bacteria in the sample are represented by genomes in the reference sequence but cannot find species absent from the reference. This method is one of the most user-friendly and resource efficient approaches and is thus feasible for rapidly analysing millions of short reads on a personal computer. The Genometa program, a step by step tutorial and Java source code are freely available from http://genomics1.mh-hannover.de/genometa/ and on http://code.google.com/p/genometa/. This program has been tested on Ubuntu Linux and Windows XP/7.

  16. “Epidemic Clones” of Listeria monocytogenes Are Widespread and Ancient Clonal Groups

    PubMed Central

    Cantinelli, Thomas; Chenal-Francisque, Viviane; Diancourt, Laure; Frezal, Lise; Leclercq, Alexandre; Wirth, Thierry

    2013-01-01

    The food-borne pathogen Listeria monocytogenes is genetically heterogeneous. Although some clonal groups have been implicated in multiple outbreaks, there is currently no consensus on how “epidemic clones” should be defined. The objectives of this work were to compare the patterns of sequence diversity on two sets of genes that have been widely used to define L. monocytogenes clonal groups: multilocus sequence typing (MLST) and multi-virulence-locus sequence typing (MvLST). Further, we evaluated the diversity within clonal groups by pulsed-field gel electrophoresis (PFGE). Based on 125 isolates of diverse temporal, geographical, and source origins, MLST and MvLST genes (i) had similar patterns of sequence polymorphisms, recombination, and selection, (ii) provided concordant phylogenetic clustering, and (iii) had similar discriminatory power, which was not improved when we combined both data sets. Inclusion of representative strains of previous outbreaks demonstrated the correspondence of epidemic clones with previously recognized MLST clonal complexes. PFGE analysis demonstrated heterogeneity within major clones, most of which were isolated decades before their involvement in outbreaks. We conclude that the “epidemic clone” denominations represent a redundant but largely incomplete nomenclature system for MLST-defined clones, which must be regarded as successful genetic groups that are widely distributed across time and space. PMID:24006010
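
    The abstract compares the "discriminatory power" of typing schemes without stating which index was used. One common choice in typing studies is Simpson's index of diversity in the Hunter-Gaston form, D = 1 - [1 / (N(N-1))] * sum_j n_j(n_j - 1), where N is the number of isolates and n_j the number of isolates of type j. The Python sketch below assumes that index; the function name and the example type labels are hypothetical.

```python
from collections import Counter


def discriminatory_index(type_labels):
    """Hunter-Gaston discriminatory index for a list of per-isolate type labels.

    Assumption: this is the index of interest; the abstract does not name one.
    """
    n_isolates = len(type_labels)
    if n_isolates < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(type_labels)
    # Ordered pairs of isolates that share a type and so are not discriminated.
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n_isolates * (n_isolates - 1))


if __name__ == "__main__":
    # Hypothetical sequence-type assignments for four isolates.
    print(discriminatory_index(["ST1", "ST1", "ST2", "ST3"]))
```

    A value of 1.0 means every isolate received its own type, and 0.0 means all isolates were assigned the same type, so two schemes with similar values separate the isolate collection about equally well.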

  17. Transcriptome and Small RNA Deep Sequencing Reveals Deregulation of miRNA Biogenesis in Human Glioma

    PubMed Central

    Moore, Lynette M.; Kivinen, Virpi; Liu, Yuexin; Annala, Matti; Cogdell, David; Liu, Xiuping; Liu, Chang-Gong; Sawaya, Raymond; Yli-Harja, Olli; Shmulevich, Ilya; Fuller, Gregory N.; Zhang, Wei; Nykter, Matti

    2013-01-01

    Altered expression of oncogenic and tumor-suppressing microRNAs (miRNAs) is widely associated with tumorigenesis. However, the regulatory mechanisms underlying these alterations are poorly understood. We sought to shed light on the deregulation of miRNA biogenesis promoting the aberrant miRNA expression profiles identified in these tumors. Using sequencing technology to perform both whole-transcriptome and small RNA sequencing of glioma patient samples, we examined precursor and mature miRNAs to directly evaluate the miRNA maturation process, and interrogated expression profiles for genes involved in the major steps of miRNA biogenesis. We found that ratios of mature to precursor forms of a large number of miRNAs increased with the progression from normal brain to low-grade and then to high-grade gliomas. The expression levels of genes involved in each of the three major steps of miRNA biogenesis (nuclear processing, nucleo-cytoplasmic transport, and cytoplasmic processing) were systematically altered in glioma tissues. Survival analysis of an independent data set demonstrated that the alteration of genes involved in miRNA maturation correlates with survival in glioma patients. Direct quantification of miRNA maturation with deep sequencing demonstrated that deregulation of the miRNA biogenesis pathway is a hallmark for glioma genesis and progression. PMID:23007860

  18. Genomic Sequencing Procedure Microcosting Analysis and Health Economic Cost-Impact Analysis: A Report of the Association for Molecular Pathology.

    PubMed

    Sabatini, Linda M; Mathews, Charles; Ptak, Devon; Doshi, Shivang; Tynan, Katherine; Hegde, Madhuri R; Burke, Tara L; Bossler, Aaron D

    2016-05-01

    The increasing use of advanced nucleic acid sequencing technologies for clinical diagnostics and therapeutics has made it vital to understand the costs of performing these procedures and their value to patients, providers, and payers. The Association for Molecular Pathology invested in a cost and value analysis of specific genomic sequencing procedures (GSPs) newly coded by the American Medical Association Current Procedural Terminology Editorial Panel. Cost data and work effort, including the development and use of data analysis pipelines, were gathered from representative laboratories currently performing these GSPs. Results were aggregated to generate representative cost ranges given the complexity and variability of performing the tests. Cost-impact models for three clinical scenarios were generated with assistance from key opinion leaders: impact of using a targeted gene panel in optimizing care for patients with advanced non-small-cell lung cancer, use of a targeted gene panel in the diagnosis and management of patients with sensorineural hearing loss, and exome sequencing in the diagnosis and management of children with neurodevelopmental disorders of unknown genetic etiology. Each model demonstrated value by either reducing health care costs or identifying appropriate care pathways. The templates generated will aid laboratories in assessing their individual costs, considering the value structure in their own patient populations, and contributing their data to the ongoing dialogue regarding the impact of GSPs on improving patient care. Copyright © 2016 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  19. The Complete Chloroplast Genome Sequence of a Relict Conifer Glyptostrobus pensilis: Comparative Analysis and Insights into Dynamics of Chloroplast Genome Rearrangement in Cupressophytes and Pinaceae

    PubMed Central

    Zheng, Renhua; Xu, Haibin; Zhou, Yanwei; Li, Meiping; Lu, Fengjuan; Dong, Yini; Liu, Xin; Chen, Jinhui; Shi, Jisen

    2016-01-01

    Glyptostrobus pensilis, belonging to the monotypic genus Glyptostrobus (Family: Cupressaceae), is an ancient conifer that is naturally distributed in low-lying wet areas. Here, we report the complete chloroplast (cp) genome sequence (132,239 bp) of G. pensilis. The G. pensilis cp genome is similar in gene content, organization and genome structure to the sequenced cp genomes from other cupressophytes, especially with respect to the loss of the inverted repeat region A (IRA). Through phylogenetic analysis, we demonstrated that the genus Glyptostrobus is closely related to the genus Cryptomeria, supporting previous findings based on physiological characteristics. Since IRs play an important role in stabilizing the cp genome and conifer cp genomes lost different IR regions after splitting into two clades (cupressophytes and Pinaceae), we performed cp genome rearrangement analysis and found more extensive cp genome rearrangements among the species of cupressophytes relative to Pinaceae. Additional repeat analysis indicated that cupressophyte cp genomes contained fewer potential functional repeats, especially in Cupressaceae, compared with Pinaceae. These results suggested that the dynamics of cp genome rearrangement in conifers differed since the two clades, Pinaceae and cupressophytes, lost IR copies independently and developed different repeats to complement the residual IRs. In addition, we identified 170 perfect simple sequence repeats that will be useful in future research focusing on the evolution of genetic diversity and conservation of genetic variation for this endangered species in the wild. PMID:27560965
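The abstract above reports perfect simple sequence repeats (SSRs) but does not say which tool or thresholds were used. As a rough illustration only, a perfect-SSR scan can be sketched with regular expressions. The function name `find_perfect_ssrs` and the minimum repeat counts are assumptions made for this sketch and are not taken from the study.

```python
import re

# Hypothetical thresholds (motif length -> minimum number of tandem copies).
# These are illustrative assumptions, not the values used in the study.
DEFAULT_MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_perfect_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in seq."""
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif of fixed length followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are just a shorter unit repeated (e.g. "AA", "ATAT"),
            # so each repeat is reported once under its shortest motif.
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return sorted(hits)


# Toy sequence: an (AT)6 dinucleotide repeat and an A12 mononucleotide repeat.
toy = "GGC" + "AT" * 6 + "CCG" + "A" * 12 + "T"
print(find_perfect_ssrs(toy))
```

Start positions are 0-based. The sketch only finds perfect (uninterrupted) repeats on the given strand; compound or imperfect repeats would need a dedicated tool.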

  20. The cyc1-11 mutation in yeast reverts by recombination with a nonallelic gene: composite genes determining the iso-cytochromes c.

    PubMed Central

    Ernst, J F; Stewart, J W; Sherman, F

    1981-01-01

    DNA sequence analysis of a cloned fragment directly established that the cyc1-11 mutation of iso-1-cytochrome c in the yeast Saccharomyces cerevisiae is a two-base-pair substitution that changes the CCA proline codon at amino acid position 76 to a UAA nonsense codon. Analysis of 11 revertant proteins and one cloned revertant gene showed that reversion of the cyc1-11 mutation can occur in three ways: a single base-pair substitution, which produces a serine replacement at position 76; recombination with the nonallelic CYC7 gene of iso-2-cytochrome c, which causes replacement of a segment in the cyc1-11 gene by the corresponding segment of the CYC7 gene; and either a two-base-pair substitution or recombination with the CYC7 gene, which causes the formation of the normal iso-1-cytochrome c sequence. These results demonstrate the occurrence of low frequencies of recombination between nonallelic genes having extensive but not complete homology. The formation of composite genes that share sequences from nonallelic genes may be an evolutionary mechanism for producing protein diversities and for maintaining identical sequences at different loci. PMID:6273865

  1. The complete chloroplast genome sequence of Actinidia arguta using the PacBio RS II platform

    PubMed Central

    Lin, Miaomiao; Qi, Xiujuan; Chen, Jinyong; Sun, Leiming; Zhong, Yunpeng; Fang, Jinbao; Hu, Chungen

    2018-01-01

    Actinidia arguta is the most basal species in a phylogenetically and economically important genus in the family Actinidiaceae. To better understand the molecular basis of the Actinidia arguta chloroplast (cp), we sequenced the complete cp genome from A. arguta using Illumina and PacBio RS II sequencing technologies. The cp genome from A. arguta was 157,611 bp in length and composed of a pair of 24,232 bp inverted repeats (IRs) separated by a 20,463 bp small single copy region (SSC) and an 88,684 bp large single copy region (LSC). Overall, the cp genome contained 113 unique genes. The cp genomes from A. arguta and three other Actinidia species from GenBank were subjected to a comparative analysis. Indel mutation events and high frequencies of base substitution were identified, and the accD and ycf2 genes showed a high degree of variation within Actinidia. Forty-seven simple sequence repeats (SSRs) and 155 repetitive structures were identified, further demonstrating the rapid evolution in Actinidia. The cp genome analysis and the identification of variable loci provide vital information for understanding the evolution and function of the chloroplast and for characterizing Actinidia population genetics. PMID:29795601
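The region sizes quoted in the abstract above can be cross-checked against the reported total, since a quadripartite chloroplast genome consists of one LSC, one SSC and two IR copies. The helper name `quadripartite_length` is introduced here purely for illustration.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total length of a quadripartite cp genome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


# Region sizes (bp) as stated in the A. arguta abstract.
a_arguta_total = quadripartite_length(lsc=88_684, ssc=20_463, ir=24_232)
print(a_arguta_total)  # 157611, matching the reported 157,611 bp
```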

  2. EEG microstates during resting represent personality differences.

    PubMed

    Schlegel, Felix; Lehmann, Dietrich; Faber, Pascal L; Milz, Patricia; Gianotti, Lorena R R

    2012-01-01

    We investigated the spontaneous brain electric activity of 13 skeptics and 16 believers in paranormal phenomena; they were university students assessed with a self-report scale about paranormal beliefs. 33-channel EEG recordings during no-task resting were processed as sequences of momentary potential distribution maps. Based on the maps at peak times of Global Field Power, the sequences were parsed into segments of quasi-stable potential distribution, the 'microstates'. The microstates were clustered into four classes of map topographies (A-D). Analysis of the microstate parameters time coverage, occurrence frequency and duration as well as the temporal sequence (syntax) of the microstate classes revealed significant differences: Believers had a higher coverage and occurrence of class B, tended toward decreased coverage and occurrence of class C, and showed a predominant sequence of microstate concatenations from A to C to B to A that was reversed in skeptics (A to B to C to A). Microstates of different topographies, putative "atoms of thought", are hypothesized to represent different types of information processing. The study demonstrates that personality differences can be detected in resting EEG microstate parameters and microstate syntax. Microstate analysis yielded no conclusive evidence for the hypothesized relation between paranormal belief and schizophrenia.

  3. Using comparative genome analysis to identify problems in annotated microbial genomes.

    PubMed

    Poptsova, Maria S; Gogarten, J Peter

    2010-07-01

    Genome annotation is a tedious task that is mostly done by automated methods; however, the accuracy of these approaches has been questioned since the beginning of the sequencing era. Genome annotation is a multilevel process, and errors can emerge at different stages: during sequencing, as a result of gene-calling procedures, and in the process of assigning gene functions. Missed or wrongly annotated genes differentially impact different types of analyses. Here we discuss and demonstrate how the methods of comparative genome analysis can refine annotations by locating missing orthologues. We also discuss possible reasons for errors and show that the second-generation annotation systems, which combine multiple gene-calling programs with similarity-based methods, perform much better than the first annotation tools. Since old errors may propagate to the newly sequenced genomes, we emphasize that the problem of continuously updating popular public databases is an urgent and unresolved one. Due to the progress in genome-sequencing technologies, automated annotation techniques will remain the main approach in the future. Researchers need to be aware of the existing errors in the annotation of even well-studied genomes, such as Escherichia coli, and consider additional quality control for their results.

  4. Novel chaperonins are prevalent in the virioplankton and demonstrate links to viral biology and ecology

    PubMed Central

    Marine, Rachel L; Nasko, Daniel J; Wray, Jeffrey; Polson, Shawn W; Wommack, K Eric

    2017-01-01

    Chaperonins are protein-folding machinery found in all cellular life. Chaperonin genes have been documented within a few viruses, yet, surprisingly, analysis of metagenome sequence data indicated that chaperonin-carrying viruses are common and geographically widespread in marine ecosystems. Also unexpected was the discovery of viral chaperonin sequences related to thermosome proteins of archaea, indicating the presence of virioplankton populations infecting marine archaeal hosts. Virioplankton large subunit chaperonin sequences (GroELs) were divergent from bacterial sequences, indicating that viruses have carried this gene over long evolutionary time. Analysis of viral metagenome contigs indicated that: the order of large and small subunit genes was linked to the phylogeny of GroEL; both lytic and temperate phages may carry group I chaperonin genes; and viruses carrying a GroEL gene likely have large double-stranded DNA (dsDNA) genomes (>70 kb). Given these connections, it is likely that chaperonins are critical to the biology and ecology of virioplankton populations that carry these genes. Moreover, these discoveries raise the intriguing possibility that viral chaperonins may more broadly alter the structure and function of viral and cellular proteins in infected host cells. PMID:28731469

  5. Novel chaperonins are prevalent in the virioplankton and demonstrate links to viral biology and ecology.

    PubMed

    Marine, Rachel L; Nasko, Daniel J; Wray, Jeffrey; Polson, Shawn W; Wommack, K Eric

    2017-11-01

    Chaperonins are protein-folding machinery found in all cellular life. Chaperonin genes have been documented within a few viruses, yet, surprisingly, analysis of metagenome sequence data indicated that chaperonin-carrying viruses are common and geographically widespread in marine ecosystems. Also unexpected was the discovery of viral chaperonin sequences related to thermosome proteins of archaea, indicating the presence of virioplankton populations infecting marine archaeal hosts. Virioplankton large subunit chaperonin sequences (GroELs) were divergent from bacterial sequences, indicating that viruses have carried this gene over long evolutionary time. Analysis of viral metagenome contigs indicated that: the order of large and small subunit genes was linked to the phylogeny of GroEL; both lytic and temperate phages may carry group I chaperonin genes; and viruses carrying a GroEL gene likely have large double-stranded DNA (dsDNA) genomes (>70 kb). Given these connections, it is likely that chaperonins are critical to the biology and ecology of virioplankton populations that carry these genes. Moreover, these discoveries raise the intriguing possibility that viral chaperonins may more broadly alter the structure and function of viral and cellular proteins in infected host cells.

  6. Phylogenetic Distribution of the Capsid Assembly Protein Gene (g20) of Cyanophages in Paddy Floodwaters in Northeast China

    PubMed Central

    Jing, Ruiyong; Liu, Junjie; Yu, Zhenhua; Liu, Xiaobing; Wang, Guanghua

    2014-01-01

    Numerous studies have revealed the high diversity of cyanophages in marine and freshwater environments, but little is currently known about the diversity of cyanophages in paddy fields, particularly in Northeast (NE) China. To elucidate the genetic diversity of cyanophages in paddy floodwaters in NE China, viral capsid assembly protein gene (g20) sequences from five floodwater samples were amplified with the primers CPS1 and CPS8. Denaturing gradient gel electrophoresis (DGGE) was applied to distinguish different g20 clones. In total, 54 clones differing in g20 nucleotide sequences were obtained in this study. Phylogenetic analysis showed that the distribution of g20 sequences in this study was different from that in Japanese paddy fields, and all the sequences were grouped into Clusters α, β, γ and ε. Within Clusters α and β, three new small clusters (PFW-VII∼-IX) were identified. UniFrac analysis of g20 clone assemblages demonstrated that the community compositions of cyanophage varied among marine, lake and paddy field environments. In paddy floodwater, community compositions of cyanophage were also different between NE China and Japan. PMID:24533125

  7. Complete sequence and comparative analysis of the chloroplast genome of Plinia trunciflora

    PubMed Central

    Eguiluz, Maria; Yuyama, Priscila Mary; Guzman, Frank; Rodrigues, Nureyev Ferreira; Margis, Rogerio

    2017-01-01

    Plinia trunciflora is a Brazilian native fruit tree from the Myrtaceae family, also known as jaboticaba. This species has great potential for fruit production. Due to the high content of essential oils in its leaves and of anthocyanins in the fruits, there is also increasing interest from the pharmaceutical industry. Nevertheless, there are few studies focusing on its molecular biology and genetic characterization. We herein report the complete chloroplast (cp) genome of P. trunciflora using high-throughput sequencing and compare it to other previously sequenced Myrtaceae genomes. The cp genome of P. trunciflora is 159,512 bp in size, comprising inverted repeats of 26,414 bp and single-copy regions of 88,097 bp (LSC) and 18,587 bp (SSC). The genome contains 111 single-copy genes (77 protein-coding, 30 tRNA and four rRNA genes). Phylogenetic analysis using 57 cp protein-coding genes demonstrated that P. trunciflora, Eugenia uniflora and Acca sellowiana form a cluster with a closer relationship to Syzygium cumini than to Eucalyptus. The complete cp sequence reported here can be used in evolutionary and population genetics studies, helping to resolve the complex taxonomy of this species and fill the gap in genetic characterization. PMID:29111566

  8. Sequence variability of Campylobacter temperate bacteriophages

    PubMed Central

    Clark, Clifford G; Ng, Lai-King

    2008-01-01

    Background Prophages integrated within the chromosomes of Campylobacter jejuni isolates have been demonstrated very recently. Prior work with Campylobacter temperate bacteriophages, as well as evidence from prophages in other enteric bacteria, suggests these prophages might have a role in the biology and virulence of the organism. However, very little is known about the genetic variability of Campylobacter prophages which, if present, could lead to differential phenotypes in isolates carrying the phages versus those that do not. As a first step in the characterization of C. jejuni prophages, we investigated the distribution of prophage DNA within a C. jejuni population and assessed the DNA and protein sequence variability within a subset of the putative prophages found. Results Southern blotting of C. jejuni DNA using probes from genes within the three putative prophages of the C. jejuni sequenced strain RM 1221 demonstrated the presence of at least one prophage gene in a large proportion (27/35) of isolates tested. Of these, 15 were positive for 5 or more of the 7 Campylobacter Mu-like phage 1 (CMLP 1, also designated Campylobacter jejuni integrated element 1, or CJIE 1) genes tested. Twelve of these putative prophages were chosen for further analysis. DNA sequencing of a 9,000 to 11,000 nucleotide region of each prophage demonstrated a close homology with CMLP 1 in both gene order and nucleotide sequence. Structural and sequence variability, including short insertions, deletions, and allele replacements, were found within the prophage genomes, some of which would alter the protein products of the ORFs involved. No insertions of novel genes were detected within the sequenced regions. The 12 prophages and RM 1221 had a % G+C very similar to C. jejuni sequenced strains, as well as promoter regions characteristic of C. jejuni.
None of the putative prophages were successfully induced and propagated, so it is not known if they were functional or if they represented remnant prophage DNA in the bacterial chromosomes. Conclusion These putative prophages form a family of phages with conserved sequences, and appear to be adapted to Campylobacter. There was evidence for recombination among groups of prophages, suggesting that the prophages had a mosaic structure. In many of these properties, the Mu-like CMLP 1 homologs characterized in this study resemble temperate bacteriophages of enteric bacteria that are responsible for contributions to virulence and host adaptation. PMID:18366706

  9. New Insights on Taxonomy, Phylogeny and Population Genetics of Leishmania (Viannia) Parasites Based on Multilocus Sequence Analysis

    PubMed Central

    Boité, Mariana C.; Mauricio, Isabel L.; Miles, Michael A.; Cupolillo, Elisa

    2012-01-01

    The Leishmania genus comprises up to 35 species, some with status still under discussion. The multilocus sequence typing (MLST)—extensively used for bacteria—has been proposed for pathogenic trypanosomatids. For Leishmania, however, a detailed analysis and revision on the taxonomy is still required. We have partially sequenced four housekeeping genes—glucose-6-phosphate dehydrogenase (G6PD), 6-phosphogluconate dehydrogenase (6PGD), mannose phosphate isomerase (MPI) and isocitrate dehydrogenase (ICD)—from 96 Leishmania (Viannia) strains and assessed their discriminatory typing capacity. The fragments had different degrees of diversity, and are thus suitable to be used in combination for intra- and inter-specific inferences. Species-specific single nucleotide polymorphisms were detected, but not for all species; ambiguous sites indicating heterozygosis were observed, as well as the putative homozygous donor. A large number of haplotypes were detected for each marker; for 6PGD a possible ancestral allele for L. (Viannia) was found. Maximum parsimony-based haplotype networks were built. Strains of different species, as identified by multilocus enzyme electrophoresis (MLEE), formed separated clusters in each network, with exceptions. NeighborNet of concatenated sequences confirmed species-specific clusters, suggesting recombination occurring in L. braziliensis and L. guyanensis. Phylogenetic analysis indicates L. lainsoni and L. naiffi as the most divergent species and does not support L. shawi as a distinct species, placing it in the L. guyanensis cluster. BURST analysis resulted in six clonal complexes (CC), corresponding to distinct species. The L. braziliensis strains evaluated correspond to one widely geographically distributed CC and another restricted to one endemic area. 
This study demonstrates the value of systematic multilocus sequence analysis (MLSA) for determining intra- and inter-species relationships and presents an approach to validate the species status of some entities. Furthermore, it contributes to the phylogeny of L. (Viannia) and might be helpful for epidemiological and population genetics analysis based on haplotype/diplotype determinations and inferences. PMID:23133690

  10. Phylogenetic Network Analysis Revealed the Occurrence of Horizontal Gene Transfer of 16S rRNA in the Genus Enterobacter

    PubMed Central

    Sato, Mitsuharu; Miyazaki, Kentaro

    2017-01-01

    Horizontal gene transfer (HGT) is a ubiquitous genetic event in bacterial evolution, but it seldom occurs for genes involved in highly complex supramolecules (or biosystems), which consist of many gene products. The ribosome is one such supramolecule, but several bacteria harbor dissimilar and/or chimeric 16S rRNAs in their genomes, suggesting the occurrence of HGT of this gene. However, we know little about whether the genes actually experience HGT and, if so, the frequency of such a transfer. This is primarily because the methods currently employed for phylogenetic analysis (e.g., neighbor-joining, maximum likelihood, and maximum parsimony) of 16S rRNA genes assume point mutation-driven tree-shape evolution as an evolutionary model, which is intrinsically inappropriate to decipher the evolutionary history for genes driven by recombination. To address this issue, we applied a phylogenetic network analysis, which has been used previously for detection of genetic recombination in homologous alleles, to the 16S rRNA gene. We focused on the genus Enterobacter, whose phylogenetic relationships inferred by multi-locus sequence alignment analysis and 16S rRNA sequences are incompatible. All 10 complete genomic sequences were retrieved from the NCBI database, in which 71 16S rRNA genes were included. Neighbor-joining analysis demonstrated that the genes residing in the same genomes clustered, indicating the occurrence of intragenomic recombination. However, as suggested by the low bootstrap values, evolutionary relationships between the clusters were uncertain. We then applied phylogenetic network analysis to representative sequences from each cluster. We found three ancestral 16S rRNA groups; the others were likely created through recursive recombination between the ancestors and chimeric descendants. Despite the large sequence changes caused by the recombination events, the RNA secondary structures were conserved. 
Successive intergenomic and intragenomic recombination thus shaped the evolution of 16S rRNA genes in the genus Enterobacter. PMID:29180992

  11. De novo transcript sequence reconstruction from RNA-Seq: reference generation and analysis with Trinity

    PubMed Central

    Yassour, Moran; Grabherr, Manfred; Blood, Philip D.; Bowden, Joshua; Couger, Matthew Brian; Eccles, David; Li, Bo; Lieber, Matthias; MacManes, Matthew D.; Ott, Michael; Orvis, Joshua; Pochet, Nathalie; Strozzi, Francesco; Weeks, Nathan; Westerman, Rick; William, Thomas; Dewey, Colin N.; Henschel, Robert; LeDuc, Richard D.; Friedman, Nir; Regev, Aviv

    2013-01-01

    De novo assembly of RNA-Seq data allows us to study transcriptomes without the need for a genome sequence, such as in non-model organisms of ecological and evolutionary importance, cancer samples, or the microbiome. In this protocol, we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-Seq data in non-model organisms. We also present Trinity’s supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples, and approaches to identify protein coding genes. In an included tutorial we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sf.net. PMID:23845962

  12. Isolation of Lagos Bat Virus from Water Mongoose

    PubMed Central

    Markotter, Wanda; Kuzmin, Ivan; Rupprecht, Charles E.; Randles, Jenny; Sabeta, Claude T.; Wandeler, Alexander I.

    2006-01-01

    A genotype 2 lyssavirus, Lagos bat virus (LBV), was isolated from a terrestrial wildlife species (water mongoose) in August 2004 in the Durban area of the KwaZulu-Natal Province of South Africa. The virus isolate was confirmed as LBV by antigenic and genetic characterization, and the mongoose was identified as Atilax paludinosus by mitochondrial cytochrome b sequence analysis. Phylogenetic analysis demonstrated sequence homology with previous LBV isolates from South African bats. Studies performed in mice indicated that the peripheral pathogenicity of LBV had been underestimated in previous studies. Surveillance strategies for LBV in Africa must be improved to better understand the epidemiology of this virus and to make informed decisions on future vaccine strategies because evidence is insufficient that current rabies vaccines provide protection against LBV. PMID:17326944

  13. Some methodical peculiarities of analysis of small-mass samples by SRXFA

    NASA Astrophysics Data System (ADS)

    Kudryashova, A. F.; Tarasov, L. S.; Ulyanov, A. A.; Baryshev, V. B.

    1989-10-01

    The stability of work of the element analysis station on the storage rings VEPP-3 and VEPP-4 in INP (Novosibirsk, USSR) was demonstrated on the example of three sets of rare element analyses carried out by SRXFA in May 1985, January and May-June 1988. These data show that there are some systematic deviations in the results of measurements of Zr and La contents. SRXFA and INAA data have been compared for the latter element. A false linear correlation on the Rb-Sr plot in one set of analyses has been attributed to an overlapping artificial Sr peak on a Rb peak. The authors proposed sequences of registration of spectra and computer treatment for samples and standards. Such sequences result in better final concentration data.

  14. Predictive Rate-Distortion for Infinite-Order Markov Processes

    NASA Astrophysics Data System (ADS)

    Marzen, Sarah E.; Crutchfield, James P.

    2016-06-01

    Predictive rate-distortion analysis suffers from the curse of dimensionality: clustering arbitrarily long pasts to retain information about arbitrarily long futures requires resources that typically grow exponentially with length. The challenge is compounded for infinite-order Markov processes, since conditioning on finite sequences cannot capture all of their past dependencies. Spectral arguments confirm a popular intuition: algorithms that cluster finite-length sequences fail dramatically when the underlying process has long-range temporal correlations and can fail even for processes generated by finite-memory hidden Markov models. We circumvent the curse of dimensionality in rate-distortion analysis of finite- and infinite-order processes by casting predictive rate-distortion objective functions in terms of the forward- and reverse-time causal states of computational mechanics. Examples demonstrate that the resulting algorithms yield substantial improvements.

  15. Cloning of a cDNA encoding 1-aminocyclopropane-1-carboxylate synthase and expression of its mRNA in ripening apple fruit.

    PubMed

    Dong, J G; Kim, W T; Yip, W K; Thompson, G A; Li, L; Bennett, A B; Yang, S F

    1991-08-01

    1-Aminocyclopropane-1-carboxylate (ACC) synthase (EC 4.4.1.14) purified from apple (Malus sylvestris Mill.) fruit was subjected to trypsin digestion. Following separation by reversed-phase high-pressure liquid chromatography, ten tryptic peptides were sequenced. Based on the sequences of three tryptic peptides, three sets of mixed oligonucleotide probes were synthesized and used to screen a plasmid cDNA library prepared from poly(A)(+) RNA of ripe apple fruit. A 1.5-kb (kilobase) cDNA clone which hybridized to all three probes was isolated. The clone contained an open reading frame of 1214 base pairs (bp) encoding a sequence of 404 amino acids. While the polyadenine tail at the 3'-end was intact, it lacked a portion of sequence at the 5'-end. Using the RNA-based polymerase chain reaction, an additional sequence of 148 bp was obtained at the 5'-end. Thus, 1362 bp were sequenced and they encode 454 amino acids. The deduced amino-acid sequence contained peptide sequences corresponding to all ten tryptic fragments, confirming the identity of the cDNA clone. Comparison of the deduced amino-acid sequence between ACC synthase from apple fruit and those from tomato (Lycopersicon esculentum Mill.) and winter squash (Cucurbita maxima Duch.) fruits demonstrated the presence of seven highly conserved regions, including the previously identified region for the active site. The size of the translation product of ACC-synthase mRNA was similar to that of the mature protein on sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), indicating that apple ACC-synthase undergoes only minor, if any, post-translational proteolytic processing. Analysis of ACC-synthase mRNA by in-vitro translation-immunoprecipitation, and by Northern blotting indicates that the ACC-synthase mRNA was undetectable in unripe fruit, but was accumulated massively during the ripening process. These data demonstrate that the expression of the ACC-synthase gene is developmentally regulated.

  16. Development of a Prokaryotic Universal Primer for Simultaneous Analysis of Bacteria and Archaea Using Next-Generation Sequencing

    PubMed Central

    Takahashi, Shunsuke; Tomita, Junko; Nishioka, Kaori; Hisada, Takayoshi; Nishijima, Miyuki

    2014-01-01

    For the analysis of microbial community structure based on 16S rDNA sequence diversity, sensitive and robust PCR amplification of 16S rDNA is a critical step. To obtain accurate microbial composition data, PCR amplification must be free of bias; however, amplifying all 16S rDNA species with equal efficiency from a sample containing a large variety of microorganisms remains challenging. Here, we designed a universal primer based on the V3-V4 hypervariable region of prokaryotic 16S rDNA for the simultaneous detection of Bacteria and Archaea in fecal samples from crossbred pigs (Landrace×Large white×Duroc) using an Illumina MiSeq next-generation sequencer. In-silico analysis showed that the newly designed universal prokaryotic primers matched approximately 98.0% of Bacteria and 94.6% of Archaea rRNA gene sequences in the Ribosomal Database Project database. For each sequencing reaction performed with the prokaryotic universal primer, an average of 69,330 (±20,482) reads were obtained, of which archaeal rRNA genes comprised approximately 1.2% to 3.2% of all prokaryotic reads. In addition, the detection frequency of Bacteria belonging to the phylum Verrucomicrobia, including members of the classes Verrucomicrobiae and Opitutae, was higher in the NGS analysis using the prokaryotic universal primer than that performed with the bacterial universal primer. Importantly, this new prokaryotic universal primer set had markedly lower bias than that of most previously designed universal primers. Our findings demonstrate that the prokaryotic universal primer set designed in the present study will permit the simultaneous detection of Bacteria and Archaea, and will therefore allow for a more comprehensive understanding of microbial community structures in environmental samples. PMID:25144201
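The in-silico primer evaluation described above amounts to asking what fraction of reference sequences contain a site matching a degenerate primer. The abstract gives neither the primer sequences nor the matching criteria, so the following is only a minimal sketch under stated assumptions: exact matching on the forward strand, standard IUPAC degeneracy codes, and a made-up primer and toy sequences. The names `primer_to_regex` and `coverage` are hypothetical.

```python
import re

# Standard IUPAC nucleotide degeneracy codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_to_regex(primer):
    """Compile a degenerate primer into a regex (exact match, forward strand only)."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def coverage(primer, sequences):
    """Fraction of sequences containing at least one exact primer-binding site."""
    pattern = primer_to_regex(primer)
    matched = sum(1 for s in sequences if pattern.search(s.upper()))
    return matched / len(sequences)


# Made-up primer and toy "database"; not the primers from the study.
toy_db = ["TTACGCATT", "TTACGTGTT", "TTACGAATT", "GGGG"]
print(coverage("ACGYN", toy_db))  # 0.5: the first two sequences match
```

A realistic evaluation would also allow a limited number of mismatches and check the reverse primer against the reverse complement; both are omitted here for brevity.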

  17. Census of the bacterial community of the gypsy moth larval midgut by using culturing and culture-independent methods.

    PubMed

    Broderick, Nichole A; Raffa, Kenneth F; Goodman, Robert M; Handelsman, Jo

    2004-01-01

    Little is known about bacteria associated with Lepidoptera, the large group of mostly phytophagous insects comprising the moths and butterflies. We inventoried the larval midgut bacteria of a polyphagous foliivore, the gypsy moth (Lymantria dispar L.), whose gut is highly alkaline, by using traditional culturing and culture-independent methods. We also examined the effects of diet on microbial composition. Analysis of individual third-instar larvae revealed a high degree of similarity of microbial composition among insects fed on the same diet. DNA sequence analysis indicated that most of the PCR-amplified 16S rRNA genes belong to the gamma-Proteobacteria and low G+C gram-positive divisions and that the cultured members represented more than half of the phylotypes identified. Less frequently detected taxa included members of the alpha-Proteobacterium, Actinobacterium, and Cytophaga/Flexibacter/Bacteroides divisions. The 16S rRNA gene sequences from 7 of the 15 cultured organisms and 8 of the 9 sequences identified by PCR amplification diverged from previously reported bacterial sequences. The microbial composition of midguts differed substantially among larvae feeding on a sterilized artificial diet, aspen, larch, white oak, or willow. 16S rRNA analysis of cultured isolates indicated that an Enterococcus species and culture-independent analysis indicated that an Enterobacter sp. were both present in all larvae, regardless of the feeding substrate; the sequences of these two phylotypes varied less than 1% among individual insects. These results provide the first comprehensive description of the microbial diversity of a lepidopteran midgut and demonstrate that the plant species in the diet influences the composition of the gut bacterial community.

  18. A Nonautochthonous U.S. Strain of Vibrio parahaemolyticus Isolated from Chesapeake Bay Oysters Caused the Outbreak in Maryland in 2010

    PubMed Central

    Haendiges, Julie; Jones, Jessica; Myers, Robert A.; Mitchell, Clifford S.; Butler, Erin

    2016-01-01

    ABSTRACT In the summer of 2010, Vibrio parahaemolyticus caused an outbreak in Maryland linked to the consumption of oysters. Strains isolated from both stool and oyster samples were indistinguishable by pulsed-field gel electrophoresis (PFGE). However, the oysters contained other potentially pathogenic V. parahaemolyticus strains exhibiting different PFGE patterns. In order to assess the identity, genetic makeup, relatedness, and potential pathogenicity of the V. parahaemolyticus strains, we sequenced 11 such strains (2 clinical strains and 9 oyster strains). We analyzed these genomes by in silico multilocus sequence typing (MLST) and determined their phylogeny using a whole-genome MLST (wgMLST) analysis. Our in silico MLST analysis identified six different sequence types (STs) (ST8, ST676, ST810, ST811, ST34, and ST768), with both of the clinical and four of the oyster strains being identified as belonging to ST8. Using wgMLST, we showed that the ST8 strains from clinical and oyster samples were nearly indistinguishable and belonged to the same outbreak, confirming that local oysters were the source of the infections. The remaining oyster strains were genetically diverse, differing in >3,000 loci from the Maryland ST8 strains. eBURST analysis comparing these strains with strains of other STs available at the V. parahaemolyticus MLST website showed that the Maryland ST8 strains belonged to a clonal complex endemic to Asia. This indicates that the ST8 isolates from clinical and oyster sources were likely not endemic to Maryland. Finally, this study demonstrates the utility of whole-genome sequencing (WGS) and associated analyses for source-tracking investigations.

    IMPORTANCE Vibrio parahaemolyticus is an important foodborne pathogen and the leading cause of bacterial infections in the United States associated with the consumption of seafood. In the summer of 2010, Vibrio parahaemolyticus caused an outbreak in Maryland linked to oyster consumption. Strains isolated from stool and oyster samples were indistinguishable by pulsed-field gel electrophoresis (PFGE). The oysters also contained other potentially pathogenic V. parahaemolyticus strains with different PFGE patterns. Since their identity, genetic makeup, relatedness, and potential pathogenicity were unknown, their genomes were determined by using next-generation sequencing. Whole-genome sequencing (WGS) analysis by whole-genome multilocus sequence typing (wgMLST) allowed (i) identification of clinical and oyster strains with matching PFGE profiles as belonging to ST8, (ii) determination of oyster strain diversity, and (iii) identification of the clinical strains as belonging to a clonal complex (CC) described only in Asia. Finally, WGS and associated analyses demonstrated their utility for trace-back investigations. PMID:26994080

  19. A Nonautochthonous U.S. Strain of Vibrio parahaemolyticus Isolated from Chesapeake Bay Oysters Caused the Outbreak in Maryland in 2010.

    PubMed

    Haendiges, Julie; Jones, Jessica; Myers, Robert A; Mitchell, Clifford S; Butler, Erin; Toro, Magaly; Gonzalez-Escalona, Narjol

    2016-06-01

    In the summer of 2010, Vibrio parahaemolyticus caused an outbreak in Maryland linked to the consumption of oysters. Strains isolated from both stool and oyster samples were indistinguishable by pulsed-field gel electrophoresis (PFGE). However, the oysters contained other potentially pathogenic V. parahaemolyticus strains exhibiting different PFGE patterns. In order to assess the identity, genetic makeup, relatedness, and potential pathogenicity of the V. parahaemolyticus strains, we sequenced 11 such strains (2 clinical strains and 9 oyster strains). We analyzed these genomes by in silico multilocus sequence typing (MLST) and determined their phylogeny using a whole-genome MLST (wgMLST) analysis. Our in silico MLST analysis identified six different sequence types (STs) (ST8, ST676, ST810, ST811, ST34, and ST768), with both of the clinical and four of the oyster strains being identified as belonging to ST8. Using wgMLST, we showed that the ST8 strains from clinical and oyster samples were nearly indistinguishable and belonged to the same outbreak, confirming that local oysters were the source of the infections. The remaining oyster strains were genetically diverse, differing in >3,000 loci from the Maryland ST8 strains. eBURST analysis comparing these strains with strains of other STs available at the V. parahaemolyticus MLST website showed that the Maryland ST8 strains belonged to a clonal complex endemic to Asia. This indicates that the ST8 isolates from clinical and oyster sources were likely not endemic to Maryland. Finally, this study demonstrates the utility of whole-genome sequencing (WGS) and associated analyses for source-tracking investigations. Vibrio parahaemolyticus is an important foodborne pathogen and the leading cause of bacterial infections in the United States associated with the consumption of seafood. In the summer of 2010, Vibrio parahaemolyticus caused an outbreak in Maryland linked to oyster consumption. Strains isolated from stool and oyster samples were indistinguishable by pulsed-field gel electrophoresis (PFGE). The oysters also contained other potentially pathogenic V. parahaemolyticus strains with different PFGE patterns. Since their identity, genetic makeup, relatedness, and potential pathogenicity were unknown, their genomes were determined by using next-generation sequencing. Whole-genome sequencing (WGS) analysis by whole-genome multilocus sequence typing (wgMLST) allowed (i) identification of clinical and oyster strains with matching PFGE profiles as belonging to ST8, (ii) determination of oyster strain diversity, and (iii) identification of the clinical strains as belonging to a clonal complex (CC) described only in Asia. Finally, WGS and associated analyses demonstrated their utility for trace-back investigations. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  20. Development of phoH as a Novel Signature Gene for Assessing Marine Phage Diversity

    PubMed Central

    Goldsmith, Dawn B.; Crosti, Giuseppe; Dwivedi, Bhakti; McDaniel, Lauren D.; Varsani, Arvind; Suttle, Curtis A.; Weinbauer, Markus G.; Sandaa, Ruth-Anne; Breitbart, Mya

    2011-01-01

    Phages play a key role in the marine environment by regulating the transfer of energy between trophic levels and influencing global carbon and nutrient cycles. The diversity of marine phage communities remains difficult to characterize because of the lack of a signature gene common to all phages. Recent studies have demonstrated the presence of host-derived auxiliary metabolic genes in phage genomes, such as those belonging to the Pho regulon, which regulates phosphate uptake and metabolism under low-phosphate conditions. Among the completely sequenced phage genomes in GenBank, this study identified Pho regulon genes in nearly 40% of the marine phage genomes, while only 4% of nonmarine phage genomes contained these genes. While several Pho regulon genes were identified, phoH was the most prevalent, appearing in 42 out of 602 completely sequenced phage genomes. Phylogenetic analysis demonstrated that phage phoH sequences formed a cluster distinct from those of their bacterial hosts. PCR primers designed to amplify a region of the phoH gene were used to determine the diversity of phage phoH sequences throughout a depth profile in the Sargasso Sea and at six locations worldwide. phoH was present at all sites examined, and a high diversity of phoH sequences was recovered. Most phoH sequences belonged to clusters without any cultured representatives. Each depth and geographic location had a distinct phoH composition, although most phoH clusters were recovered from multiple sites. Overall, phoH is an effective signature gene for examining phage diversity in the marine environment. PMID:21926220
