Sample records for considerable sequence similarity

  1. Sequence Similarity Presenter: a tool for the graphic display of similarities of long sequences for use in presentations.

    PubMed

    Fröhlich, K U

    1994-04-01

    A new method for the presentation of alignments of long sequences is described. The degree of identity for the aligned sequences is averaged for sections of a fixed number of residues. The resulting values are converted to shades of gray, with white corresponding to lack of identity and black corresponding to perfect identity. A sequence alignment is represented as a bar filled with varying shades of gray. The display is compact and allows for a fast and intuitive recognition of the distribution of regions with a high similarity. It is well suited for the presentation of alignments of long sequences, e.g. of protein superfamilies, in plenary lectures. The method is implemented as a HyperCard stack for Apple Macintosh computers. Several options for the modification of the output are available (e.g. background reduction, size of the summation window, consideration of amino acid similarity, inclusion of graphic markers to indicate specific domains). The output is a PostScript file which can be printed, imported as EPS or processed further with Adobe Illustrator.
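
    The windowed averaging and gray-scale conversion described in this abstract can be sketched roughly as follows. This is an illustrative Python sketch, not the original HyperCard implementation; the function names, the window size, the treatment of gaps as mismatches and the 8-bit gray scale are all assumptions made for the example.

```python
def window_identity(seq_a, seq_b, window=10):
    """Average identity of two aligned sequences over fixed-size windows."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    values = []
    for start in range(0, len(seq_a), window):
        pairs = list(zip(seq_a[start:start + window], seq_b[start:start + window]))
        # Assumption: a gap aligned to anything counts as a mismatch.
        matches = sum(1 for a, b in pairs if a == b and a != "-")
        values.append(matches / len(pairs))
    return values


def to_gray(identity):
    """Map identity in [0, 1] to an 8-bit gray level (0.0 -> 255 white, 1.0 -> 0 black)."""
    return round(255 * (1.0 - identity))


# Toy aligned pair (hypothetical sequences, 20 columns, one gap).
aligned_a = "MKTAYIAKQRQISFVKSHFS"
aligned_b = "MKTAYIAKQRHLTW-KAHFS"
identities = window_identity(aligned_a, aligned_b, window=10)
grays = [to_gray(v) for v in identities]
print(identities, grays)
```

    Each gray value would fill one segment of the bar; a larger window gives a smoother but coarser picture, which corresponds to the "size of the summation window" option mentioned in the abstract.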

  2. A new method to improve network topological similarity search: applied to fold recognition

    PubMed Central

    Lhota, John; Hauptman, Ruth; Hart, Thomas; Ng, Clara; Xie, Lei

    2015-01-01

    Motivation: Similarity search is the foundation of bioinformatics. It plays a key role in establishing structural, functional and evolutionary relationships between biological sequences. Although the power of the similarity search has increased steadily in recent years, a high percentage of sequences remain uncharacterized in the protein universe. Thus, new similarity search strategies are needed to efficiently and reliably infer the structure and function of new sequences. The existing paradigm for studying protein sequence, structure, function and evolution has been established based on the assumption that the protein universe is discrete and hierarchical. Cumulative evidence suggests that the protein universe is continuous. As a result, conventional sequence homology search methods may not be able to detect novel structural, functional and evolutionary relationships between proteins from weak and noisy sequence signals. To overcome the limitations in existing similarity search methods, we propose a new algorithmic framework, Enrichment of Network Topological Similarity (ENTS), to improve the performance of large-scale similarity searches in bioinformatics. Results: We apply ENTS to a challenging unsolved problem: protein fold recognition. Our rigorous benchmark studies demonstrate that ENTS considerably outperforms state-of-the-art methods. As the concept of ENTS can be applied to any similarity metric, it may provide a general framework for similarity search on any set of biological entities, given their representation as a network. Availability and implementation: Source code is freely available upon request. Contact: lxie@iscb.org. PMID:25717198

  3. Previously unknown and highly divergent ssDNA viruses populate the oceans.

    PubMed

    Labonté, Jessica M; Suttle, Curtis A

    2013-11-01

    Single-stranded DNA (ssDNA) viruses are economically important pathogens of plants and animals, and are widespread in oceans; yet, the diversity and evolutionary relationships among marine ssDNA viruses remain largely unknown. Here we present the results from a metagenomic study of composite samples from temperate (Saanich Inlet, 11 samples; Strait of Georgia, 85 samples) and subtropical (46 samples, Gulf of Mexico) seawater. Most sequences (84%) had no evident similarity to sequenced viruses. In total, 608 putative complete genomes of ssDNA viruses were assembled, almost doubling the number of ssDNA viral genomes in databases. These comprised 129 genetically distinct groups, each represented by at least one complete genome that had no recognizable similarity to each other or to other virus sequences. Given that the seven recognized families of ssDNA viruses have considerable sequence homology within them, this suggests that many of these genetic groups may represent new viral families. Moreover, nearly 70% of the sequences were similar to one of these genomes, indicating that most of the sequences could be assigned to a genetically distinct group. Most sequences fell within 11 well-defined gene groups, each sharing a common gene. Some of these encoded putative replication and coat proteins that had similarity to sequences from viruses infecting eukaryotes, suggesting that these were likely from viruses infecting eukaryotic phytoplankton and zooplankton.

  4. PstI repeat: a family of short interspersed nucleotide element (SINE)-like sequences in the genomes of cattle, goat, and buffalo.

    PubMed

    Sheikh, Faruk G; Mukhopadhyay, Sudit S; Gupta, Prabhakar

    2002-02-01

    Elements of the PstI family are short, highly repetitive DNA sequences interspersed throughout the genome of the Bovidae. We have cloned and sequenced some members of the PstI family from cattle, goat, and buffalo. These elements are approximately 500 bp long, have a copy number of 2 x 10^5 to 4 x 10^5, and comprise about 4% of the haploid genome. Studies of nucleotide sequence homology indicate that the buffalo and goat PstI repeats (type II) are similar types of short interspersed nucleotide element (SINE) sequences, but the cattle PstI repeat (type I) is considerably more divergent. Additionally, the goat PstI sequence showed significant sequence homology with bovine serine tRNA, and is therefore likely derived from serine tRNA. Interestingly, Southern hybridization suggests that both types of SINEs (I and II) are present in all the species of Bovidae. Dendrogram analysis indicates that cattle PstI SINE is similar to bovine Alu-like SINEs. Goat and buffalo SINEs formed a separate cluster, suggesting that these two types of SINEs evolved separately in the genome of the Bovidae.

  5. Ebolavirus comparative genomics

    DOE PAGES

    Jun, Se-Ran; Leuze, Michael R.; Nookaew, Intawat; ...

    2015-07-14

    The 2014 Ebola outbreak in West Africa is the largest documented for this virus. We examine the dynamics of this genome, comparing more than one hundred currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus, and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP), and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. In conclusion, this information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies.
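
    Oligomer (k-mer) frequency analysis of the kind mentioned in this abstract can be illustrated with a minimal Python sketch. The abstract does not state the oligomer length or the distance measure used, so k = 3, the Manhattan distance and the toy sequences below are assumptions chosen only for illustration.

```python
from collections import Counter


def kmer_frequencies(sequence, k=3):
    """Relative frequency of each overlapping k-mer in a nucleotide sequence."""
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def profile_distance(freq_a, freq_b):
    """Manhattan distance between two k-mer frequency profiles (assumed metric)."""
    kmers = set(freq_a) | set(freq_b)
    return sum(abs(freq_a.get(m, 0.0) - freq_b.get(m, 0.0)) for m in kmers)


# Toy sequences, not real viral genomes.
profile_1 = kmer_frequencies("ACGACG", k=3)
profile_2 = kmer_frequencies("TTTTTT", k=3)
print(profile_1)
print(profile_distance(profile_1, profile_2))
```

    Genomes whose profiles lie close together under such a distance would group together, which is the general idea behind separating one viral family from others by composition rather than by alignment.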

  6. Targeted Re-Sequencing Emulsion PCR Panel for Myopathies: Results in 94 Cases.

    PubMed

    Punetha, Jaya; Kesari, Akanchha; Uapinyoying, Prech; Giri, Mamta; Clarke, Nigel F; Waddell, Leigh B; North, Kathryn N; Ghaoui, Roula; O'Grady, Gina L; Oates, Emily C; Sandaradura, Sarah A; Bönnemann, Carsten G; Donkervoort, Sandra; Plotz, Paul H; Smith, Edward C; Tesi-Rocha, Carolina; Bertorini, Tulio E; Tarnopolsky, Mark A; Reitter, Bernd; Hausmanowa-Petrusewicz, Irena; Hoffman, Eric P

    2016-05-27

    Molecular diagnostics in the genetic myopathies often requires testing of the largest and most complex transcript units in the human genome (DMD, TTN, NEB). Iteratively targeting single genes for sequencing has traditionally entailed high costs and long turnaround times. Exome sequencing has begun to supplant single targeted genes, but there are concerns regarding coverage and needed depth of the very large and complex genes that frequently cause myopathies. We aimed to evaluate the efficiency of next-generation sequencing technologies in providing molecular diagnostics for patients with previously undiagnosed myopathies. We tested a targeted re-sequencing approach, using a 45-gene emulsion PCR myopathy panel, with subsequent sequencing on the Illumina platform in 94 undiagnosed patients. We compared the targeted re-sequencing approach to exome sequencing for 10 of these patients. We detected likely pathogenic mutations in 33 out of 94 patients, a molecular diagnostic rate of approximately 35%. The remaining patients showed variants of unknown significance (35/94 patients) or no mutations detected in the 45 genes tested (26/94 patients). Mutation detection rates were similar for targeted re-sequencing and whole-exome sequencing; however, exome sequencing showed better distribution of reads and fewer exon dropouts. Given that costs of highly parallel re-sequencing and whole exome sequencing are similar, and that exome sequencing now takes considerably less laboratory processing time than targeted re-sequencing, we recommend exome sequencing as the standard approach for molecular diagnostics of myopathies.

  7. Fuzzy measures on the Gene Ontology for gene product similarity.

    PubMed

    Popescu, Mihail; Keller, James M; Mitchell, Joyce A

    2006-01-01

    One of the most important objects in bioinformatics is a gene product (protein or RNA). For many gene products, functional information is summarized in a set of Gene Ontology (GO) annotations. For these genes, it is reasonable to include similarity measures based on the terms found in the GO or another taxonomy. In this paper, we introduce several novel measures for computing the similarity of two gene products annotated with GO terms. The fuzzy measure similarity (FMS) has the advantage that it takes into consideration the context of both complete sets of annotation terms when computing the similarity between two gene products. When the two gene products are not annotated by common taxonomy terms, we propose a method that avoids a zero similarity result. To account for the variations in the annotation reliability, we propose a similarity measure based on the Choquet integral. These similarity measures provide extra tools for the biologist in search of functional information for gene products. Initial testing on a group of 194 sequences representing three protein families shows that the FMS and Choquet similarities correlate more strongly with BLAST sequence similarities than do traditional similarity measures such as pairwise average or pairwise maximum.
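
    The two traditional baselines named in this abstract, pairwise average and pairwise maximum, can be sketched in a few lines of Python. This sketch does not implement the FMS or Choquet measures, whose details are not given here. The term identifiers and the term-to-term similarity values below are hypothetical placeholders, not real GO terms or scores.

```python
# Hypothetical term-to-term similarities for three placeholder annotation terms.
TOY_TERM_SIM = {
    frozenset(("t1", "t2")): 0.5,
    frozenset(("t1", "t3")): 0.2,
    frozenset(("t2", "t3")): 0.4,
}


def toy_term_sim(a, b):
    """Similarity of two terms: 1.0 if identical, else a table lookup (default 0.0)."""
    return 1.0 if a == b else TOY_TERM_SIM.get(frozenset((a, b)), 0.0)


def pairwise_scores(terms_a, terms_b, term_sim):
    """All term-to-term similarities between two annotation sets."""
    return [term_sim(a, b) for a in terms_a for b in terms_b]


def pairwise_average(terms_a, terms_b, term_sim):
    scores = pairwise_scores(terms_a, terms_b, term_sim)
    return sum(scores) / len(scores)


def pairwise_maximum(terms_a, terms_b, term_sim):
    return max(pairwise_scores(terms_a, terms_b, term_sim))


product_1 = ["t1", "t2"]
product_2 = ["t2", "t3"]
avg = pairwise_average(product_1, product_2, toy_term_sim)
best = pairwise_maximum(product_1, product_2, toy_term_sim)
print(avg, best)
```

    In this toy case the maximum is driven entirely by the one shared term, while the average is pulled down by the unrelated pairs; neither looks at the two annotation sets as a whole, which is the limitation the abstract says FMS addresses.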

  8. Wideband Arrhythmia-Insensitive-Rapid (AIR) Pulse Sequence for Cardiac T1 mapping without Image Artifacts induced by ICD

    PubMed Central

    Hong, KyungPyo; Jeong, Eun-Kee; Wall, T. Scott; Drakos, Stavros G.; Kim, Daniel

    2015-01-01

    Purpose: To develop and evaluate a wideband arrhythmia-insensitive-rapid (AIR) pulse sequence for cardiac T1 mapping without image artifacts induced by implantable-cardioverter-defibrillator (ICD). Methods: We developed a wideband AIR pulse sequence by incorporating a saturation pulse with wide frequency bandwidth (8.9 kHz), in order to achieve uniform T1 weighting in the heart with ICD. We tested the performance of original and “wideband” AIR cardiac T1 mapping pulse sequences in phantom and human experiments at 1.5T. Results: In 5 phantoms representing native myocardium and blood and post-contrast blood/tissue T1 values, compared with the control T1 values measured with an inversion-recovery pulse sequence without ICD, T1 values measured with original AIR with ICD were considerably lower (absolute percent error >29%), whereas T1 values measured with wideband AIR with ICD were similar (absolute percent error <5%). Similarly, in 11 human subjects, compared with the control T1 values measured with original AIR without ICD, T1 measured with original AIR with ICD was significantly lower (absolute percent error >10.1%), whereas T1 measured with wideband AIR with ICD was similar (absolute percent error <2.0%). Conclusion: This study demonstrates the feasibility of a wideband pulse sequence for cardiac T1 mapping without significant image artifacts induced by ICD. PMID:25975192

  9. On the necessity of dissecting sequence similarity scores into segment-specific contributions for inferring protein homology, function prediction and annotation

    PubMed Central

    2014-01-01

    Background: Protein sequence similarities to any type of non-globular segment (coiled coils, low complexity regions, transmembrane regions, long loops, etc., where either positional sequence conservation is the result of a very simple, physically induced pattern or rather integral sequence properties are critical) are pertinent sources for mistaken homologies. Regrettably, these considerations regularly escape attention in large-scale annotation studies since, often, there is no substitute for manual handling of these cases. Quantitative criteria are required to suppress events of function annotation transfer as a result of false homology assignments. Results: The sequence homology concept is based on the similarity comparison between the structural elements, the basic building blocks for conferring the overall fold of a protein. We propose to dissect the total similarity score into fold-critical and other, remaining contributions and suggest that, for a valid homology statement, the fold-relevant score contribution should at least be significant on its own. As part of the article, we provide the DissectHMMER software program for dissecting HMMER2/3 scores into segment-specific contributions. We show that DissectHMMER reproduces HMMER2/3 scores with sufficient accuracy and that it is useful in automated decisions about homology for instructive sequence examples. To generalize the dissection concept for cases without 3D structural information, we find that a dissection based on alignment quality is an appropriate surrogate. The approach was applied to a large-scale study of SMART and PFAM domains in the space of seed sequences and in the space of UniProt/SwissProt. Conclusions: Sequence similarity core dissection with regard to fold-critical and other contributions systematically suppresses false hits and, additionally, recovers previously obscured homology relationships such as the one between aquaporins and formate/nitrite transporters that, so far, was only supported by structure comparison. PMID:24890864
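
    The idea of splitting a total similarity score into fold-critical and remaining contributions can be sketched as below. This is only a schematic Python illustration of the bookkeeping, not the DissectHMMER algorithm: the per-position scores, the segment labels ("fold" and "other") and the significance threshold are hypothetical values invented for the example.

```python
def dissect_score(position_scores, segment_labels):
    """Sum per-position score contributions separately for each segment label."""
    if len(position_scores) != len(segment_labels):
        raise ValueError("one label is required per scored position")
    totals = {}
    for score, label in zip(position_scores, segment_labels):
        totals[label] = totals.get(label, 0) + score
    return totals


def fold_supported(totals, threshold):
    """True if the fold-critical part alone reaches the (hypothetical) threshold."""
    return totals.get("fold", 0) >= threshold


# Toy per-position contributions and labels (hypothetical values).
scores = [2, 3, -1, 5, 4, 1]
labels = ["fold", "fold", "other", "other", "fold", "other"]
totals = dissect_score(scores, labels)
print(totals, sum(totals.values()), fold_supported(totals, threshold=8))
```

    The design point is that a hit is judged on the fold-critical subtotal rather than on the overall sum, so a high score carried mostly by non-globular segments would not by itself support a homology call.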

  10. A reduced amino acid alphabet for understanding and designing protein adaptation to mutation.

    PubMed

    Etchebest, C; Benros, C; Bornot, A; Camproux, A-C; de Brevern, A G

    2007-11-01

    The protein sequence world is considerably larger than the structure world. Consequently, numerous unrelated sequences may adopt similar 3D folds, and different kinds of amino acids may thus be found in similar 3D structures. Grouping the 20 amino acids into a smaller number of representative residues with similar features simplifies the sequence world. This clustering defines a reduced amino acid alphabet (reduced AAA). Numerous studies have shown that protein 3D structures are composed of a limited number of building blocks, defining a structural alphabet. We previously identified such an alphabet, composed of 16 representative structural motifs (five residues long) called Protein Blocks (PBs). This alphabet makes it possible to translate a 3D structure into a 1D sequence of PBs. Based on these two concepts, reduced AAA and PBs, we analyzed the distributions of the different kinds of amino acids and their equivalences in the structural context. Different reduced sets were considered. Recurrent amino acid associations were found in all the local structures, while others were specific to some local structures (PBs) (e.g. cysteine, histidine, threonine and serine for the alpha-helix Ncap). Some similar associations are found in other reduced AAAs, e.g. Ile with Val, or the hydrophobic aromatic residues Trp with Phe and Tyr. We identified interesting alternative associations. This highlights the dependence on the information considered (sequence or structure). This approach, equivalent to a substitution matrix, could be useful for designing protein sequences with different features (for instance, adaptation to environment) while largely preserving the 3D fold.
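
    A reduced amino acid alphabet amounts to a many-to-one mapping from the 20 residues to a smaller set of representatives. The Python sketch below shows the mechanics only. The grouping is a hypothetical nine-letter example built around the associations the abstract mentions (Ile with Val; Trp with Phe and Tyr); it is not the alphabet derived in the paper.

```python
# Hypothetical grouping: each key lists the residues merged into one class,
# each value is the representative letter for that class.
GROUPS = {
    "IVLM": "I",
    "FWY": "F",
    "ST": "S",
    "DE": "D",
    "KRH": "K",
    "NQ": "N",
    "AG": "A",
    "C": "C",
    "P": "P",
}

# Flatten the groups into a residue -> representative lookup table.
REDUCED = {res: rep for members, rep in GROUPS.items() for res in members}


def reduce_sequence(sequence):
    """Rewrite a protein sequence in the reduced alphabet ('X' for unknown residues)."""
    return "".join(REDUCED.get(res, "X") for res in sequence.upper())


print(reduce_sequence("MKVLWAYFT"))
```

    Two sequences that differ only by substitutions within a group become identical after reduction, which is why such a mapping behaves like a coarse substitution matrix.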

  11. Unusual Intron Conservation near Tissue-Regulated Exons Found by Splicing Microarrays

    PubMed Central

    Sugnet, Charles W; Srinivasan, Karpagam; Clark, Tyson A; O'Brien, Georgeann; Cline, Melissa S; Wang, Hui; Williams, Alan; Kulp, David; Blume, John E; Haussler, David; Ares, Manuel

    2006-01-01

    Alternative splicing contributes to both gene regulation and protein diversity. To discover broad relationships between regulation of alternative splicing and sequence conservation, we applied a systems approach, using oligonucleotide microarrays designed to capture splicing information across the mouse genome. In a set of 22 adult tissues, we observe differential expression of RNA containing at least two alternative splice junctions for about 40% of the 6,216 alternative events we could detect. Statistical comparisons identify 171 cassette exons whose inclusion or skipping is different in brain relative to other tissues and another 28 exons whose splicing is different in muscle. A subset of these exons is associated with unusual blocks of intron sequence whose conservation in vertebrates rivals that of protein-coding exons. By focusing on sets of exons with similar regulatory patterns, we have identified new sequence motifs implicated in brain and muscle splicing regulation. Of note is a motif that is strikingly similar to the branchpoint consensus but is located downstream of the 5′ splice site of exons included in muscle. Analysis of three paralogous membrane-associated guanylate kinase genes reveals that each contains a paralogous tissue-regulated exon with a similar tissue inclusion pattern. While the intron sequences flanking these exons remain highly conserved among mammalian orthologs, the paralogous flanking intron sequences have diverged considerably, suggesting unusually complex evolution of the regulation of alternative splicing in multigene families. PMID:16424921

  12. Species classifier choice is a key consideration when analysing low-complexity food microbiome data.

    PubMed

    Walsh, Aaron M; Crispie, Fiona; O'Sullivan, Orla; Finnegan, Laura; Claesson, Marcus J; Cotter, Paul D

    2018-03-20

    The use of shotgun metagenomics to analyse low-complexity microbial communities in foods has the potential to be of considerable fundamental and applied value. However, there is currently no consensus with respect to choice of species classification tool, platform, or sequencing depth. Here, we benchmarked the performances of three high-throughput short-read sequencing platforms, the Illumina MiSeq, NextSeq 500, and Ion Proton, for shotgun metagenomics of food microbiota. Briefly, we sequenced six kefir DNA samples and a mock community DNA sample, the latter constructed by evenly mixing genomic DNA from 13 food-related bacterial species. A variety of bioinformatic tools were used to analyse the data generated, and the effects of sequencing depth on these analyses were tested by randomly subsampling reads. Compositional analysis results were consistent between the platforms at divergent sequencing depths. However, we observed pronounced differences in the predictions from species classification tools. Indeed, PERMANOVA indicated that there were no significant differences between the compositional results generated by the different sequencers (p = 0.693, R^2 = 0.011), but there was a significant difference between the results predicted by the species classifiers (p = 0.01, R^2 = 0.127). The relative abundances predicted by the classifiers, apart from MetaPhlAn2, were apparently biased by reference genome sizes. Additionally, we observed varying false-positive rates among the classifiers. MetaPhlAn2 had the lowest false-positive rate, whereas SLIMM had the greatest false-positive rate. Strain-level analysis results were also similar across platforms. Each platform correctly identified the strains present in the mock community, but accuracy was improved slightly with greater sequencing depth. Notably, PanPhlAn detected the dominant strains in each kefir sample above 500,000 reads per sample. Again, the outputs from functional profiling analysis using SUPER-FOCUS were generally concordant between the platforms at different sequencing depths. Finally, and as expected, metagenome assembly completeness was significantly lower on the MiSeq than on either the NextSeq (p = 0.03) or the Proton (p = 0.011), and it improved with increased sequencing depth. Our results demonstrate a remarkable similarity in the results generated by the three sequencing platforms at different sequencing depths, and, in fact, the choice of bioinformatics methodology had a more evident impact on results than the choice of sequencer did.

  13. Phylum-Level Conservation of Regulatory Information in Nematodes despite Extensive Non-coding Sequence Divergence

    PubMed Central

    Gordon, Kacy L.; Arthur, Robert K.; Ruvinsky, Ilya

    2015-01-01

    Gene regulatory information guides development and shapes the course of evolution. To test conservation of gene regulation within the phylum Nematoda, we compared the functions of putative cis-regulatory sequences of four sets of orthologs (unc-47, unc-25, mec-3 and elt-2) from distantly-related nematode species. These species, Caenorhabditis elegans, its congeneric C. briggsae, and three parasitic species Meloidogyne hapla, Brugia malayi, and Trichinella spiralis, represent four of the five major clades in the phylum Nematoda. Despite the great phylogenetic distances sampled and the extensive sequence divergence of nematode genomes, all but one of the regulatory elements we tested are able to drive at least a subset of the expected gene expression patterns. We show that functionally conserved cis-regulatory elements have no more extended sequence similarity to their C. elegans orthologs than would be expected by chance, but they do harbor motifs that are important for proper expression of the C. elegans genes. These motifs are too short to be distinguished from the background level of sequence similarity, and while identical in sequence they are not conserved in orientation or position. Functional tests reveal that some of these motifs contribute to proper expression. Our results suggest that conserved regulatory circuitry can persist despite considerable turnover within cis elements. PMID:26020930

  14. StralSV: assessment of sequence variability within similar 3D structures and application to polio RNA-dependent RNA polymerase.

    PubMed

    Zemla, Adam T; Lang, Dorothy M; Kostova, Tanya; Andino, Raul; Ecale Zhou, Carol L

    2011-06-02

    Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory; still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could help overcome these difficulties by facilitating the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV (structure-alignment sequence variability), a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus, and we demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique, or that share structural similarity with proteins that would be considered distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local structural alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position. StralSV is provided as a web service at http://proteinmodel.org/AS2TS/STRALSV/.
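
    The residue-frequency step described in this abstract can be pictured with a small Python sketch that tallies residues column by column over a set of already-aligned fragments. This is not the StralSV algorithm, which works from local structure alignments; the fragments below are toy strings assumed to be equal-length and pre-aligned, and the function names are invented for the example.

```python
from collections import Counter


def column_frequencies(fragments):
    """Per-position residue frequencies over equal-length aligned fragments."""
    if not fragments:
        return []
    length = len(fragments[0])
    if any(len(f) != length for f in fragments):
        raise ValueError("all fragments must have the same length")
    profile = []
    for column in zip(*fragments):
        counts = Counter(column)
        profile.append({res: n / len(column) for res, n in counts.items()})
    return profile


def consensus(profile):
    """Most frequent residue at each position."""
    return "".join(max(col, key=col.get) for col in profile)


# Toy aligned fragments (hypothetical data).
fragments = ["GDD", "GDD", "GDN", "SDD"]
profile = column_frequencies(fragments)
print(profile)
print(consensus(profile))
```

    Positions where one residue dominates would be flagged as highly conserved, while a residue that appears at low frequency at an otherwise uniform position would stand out as a deviation from the consensus.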

  15. 5S ribosomal ribonucleic acid sequences in Bacteroides and Fusobacterium: evolutionary relationships within these genera and among eubacteria in general

    NASA Technical Reports Server (NTRS)

    Van den Eynde, H.; De Baere, R.; Shah, H. N.; Gharbia, S. E.; Fox, G. E.; Michalik, J.; Van de Peer, Y.; De Wachter, R.

    1989-01-01

    The 5S ribosomal ribonucleic acid (rRNA) sequences were determined for Bacteroides fragilis, Bacteroides thetaiotaomicron, Bacteroides capillosus, Bacteroides veroralis, Porphyromonas gingivalis, Anaerorhabdus furcosus, Fusobacterium nucleatum, Fusobacterium mortiferum, and Fusobacterium varium. A dendrogram constructed by a clustering algorithm from these sequences, which were aligned with all other hitherto known eubacterial 5S rRNA sequences, showed differences as well as similarities with respect to results derived from 16S rRNA analyses. In the 5S rRNA dendrogram, Bacteroides clustered together with Cytophaga and Fusobacterium, as in 16S rRNA analyses. Intraphylum relationships deduced from 5S rRNAs suggested that Bacteroides is specifically related to Cytophaga rather than to Fusobacterium, as was suggested by 16S rRNA analyses. Previous taxonomic considerations concerning the genus Bacteroides, based on biochemical and physiological data, were confirmed by the 5S rRNA sequence analysis.

  16. Archaebacterial rhodopsin sequences: Implications for evolution

    NASA Technical Reports Server (NTRS)

    Lanyi, J. K.

    1991-01-01

    It was proposed over 10 years ago that the archaebacteria represent a separate kingdom which diverged very early from the eubacteria and eukaryotes. It follows that investigations of archaebacterial characteristics might reveal features of early evolution. So far, two genes, one for bacteriorhodopsin and another for halorhodopsin, both from Halobacterium halobium, have been sequenced. We cloned and sequenced the gene coding for the polypeptide of another one of these rhodopsins, a halorhodopsin in Natronobacterium pharaonis. Peptide sequencing of cyanogen bromide fragments, and immuno-reactions of the protein and synthetic peptides derived from the C-terminal gene sequence, confirmed that the open reading frame was the structural gene for the pharaonis halorhodopsin polypeptide. The flanking DNA sequences of this gene, as well as those of other bacterial rhodopsins, were compared to previously proposed archaebacterial consensus sequences. In pairwise comparisons of the open reading frame with DNA sequences for bacterio-opsin and halo-opsin from Halobacterium halobium, silent divergences were calculated. These indicate very considerable evolutionary distance between each pair of genes, even in the same organism. In spite of this, the three protein sequences show extensive similarities, indicating strong selective pressures.

  17. StralSV: assessment of sequence variability within similar 3D structures and application to polio RNA-dependent RNA polymerase

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zemla, A; Lang, D; Kostova, T

    2010-11-29

    Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory; still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could overcome these difficulties and facilitate the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV, a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus and demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique or that share structural similarity with structures that are distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position.

  18. Genetic diversity of Clostridium perfringens type A isolates from animals, food poisoning outbreaks and sludge

    PubMed Central

    Johansson, Anders; Aspan, Anna; Bagge, Elisabeth; Båverud, Viveca; Engström, Björn E; Johansson, Karl-Erik

    2006-01-01

    Background: Clostridium perfringens, a serious pathogen, causes enteric diseases in domestic animals and food poisoning in humans. The epidemiological relationship between C. perfringens isolates from the same source has previously been investigated chiefly by pulsed-field gel electrophoresis (PFGE). In this study the genetic diversity of C. perfringens isolated from various animals, from food poisoning outbreaks and from sludge was investigated. Results: We used PFGE to examine the genetic diversity of 95 C. perfringens type A isolates from eight different sources. The isolates were also examined for the presence of the beta2 toxin gene (cpb2) and the enterotoxin gene (cpe). The cpb2 gene from the 28 cpb2-positive isolates was also partially sequenced (519 bp, corresponding to positions 188 to 706 in the consensus cpb2 sequence). The results of PFGE revealed a wide genetic diversity among the C. perfringens type A isolates. The genetic relatedness of the isolates ranged from 58 to 100% and 56 distinct PFGE types were identified. Almost all clusters with similar patterns comprised isolates with a known epidemiological correlation. Most of the isolates from pig, horse and sheep carried the cpb2 gene. All isolates originating from food poisoning outbreaks carried the cpe gene and three of these also carried cpb2. Two evolutionarily different populations were identified by sequence analysis of the partially sequenced cpb2 genes from our study and cpb2 sequences previously deposited in GenBank. Conclusion: As revealed by PFGE, there was a wide genetic diversity among C. perfringens isolates from different sources. Epidemiologically related isolates showed a high genetic similarity, as expected, while isolates with no obvious epidemiological relationship expressed a lesser degree of genetic similarity. The wide diversity revealed by PFGE was not reflected in the 16S rRNA sequences, which had a considerable degree of sequence similarity. Sequence comparison of the partially sequenced cpb2 gene revealed two genetically different populations. This is to our knowledge the first study in which the genetic diversity of C. perfringens isolates from different animal species, from food poisoning outbreaks and from sludge has been investigated. PMID:16737528

  19. Identification and characterization of the reptilian GnRH-II gene in the leopard gecko, Eublepharis macularius, and its evolutionary considerations.

    PubMed

    Ikemoto, Tadahiro; Park, Min Kyun

    2003-10-16

    To elucidate the molecular phylogeny and evolution of a particular peptide, one must analyze not the limited primary amino acid sequences of the low molecular weight mature polypeptide, but rather the sequences of the corresponding precursors from various species. Of all the structural variants of gonadotropin-releasing hormone (GnRH), GnRH-II (chicken GnRH-II, or cGnRH-II) is remarkably conserved without any sequence substitutions among vertebrates, but its precursor sequences vary considerably. We have identified and characterized the full-length complementary DNA (cDNA) encoding the GnRH-II precursor and determined its genomic structure, consisting of four exons and three introns, in a reptilian species, the leopard gecko Eublepharis macularius. This is the first report about the GnRH-II precursor cDNA/gene from reptiles. The deduced leopard gecko prepro-GnRH-II polypeptide had the highest identities with the corresponding polypeptides of amphibians. The GnRH-II precursor mRNA was detected in more than half of the tissues and organs examined. This widespread expression is consistent with the previous findings in several species, though the roles of GnRH outside the hypothalamus-pituitary-gonadal axis remain largely unknown. Molecular phylogenetic analysis combined with sequence comparison showed that the leopard gecko is more similar to fishes and amphibians than to eutherian mammals with respect to the GnRH-II precursor sequence. These results strongly suggest that the divergence of the GnRH-II precursor sequences seen in eutherian mammals may have occurred along with amniote evolution.

  20. The Liverwort Contains a Lectin That Is Structurally and Evolutionary Related to the Monocot Mannose-Binding Lectins

    PubMed Central

    Peumans, Willy J.; Barre, Annick; Bras, Julien; Rougé, Pierre; Proost, Paul; Van Damme, Els J.M.

    2002-01-01

    A mannose (Man)-binding lectin has been isolated and characterized from the thallus of the liverwort Marchantia polymorpha. N-terminal sequencing indicated that the M. polymorpha agglutinin (Marpola) shares sequence similarity with the superfamily of monocot Man-binding lectins. Searches in the databases yielded expressed sequence tags encoding Marpola. Sequence analysis, molecular modeling, and docking experiments revealed striking structural similarities between Marpola and the monocot Man-binding lectins. Activity and specificity studies further indicated that Marpola is a much stronger agglutinin than the Galanthus nivalis agglutinin and exhibits a preference for methylated Man and glucose, which is unprecedented within the family of monocot Man-binding lectins. The discovery of Marpola allows us, for the first time, to corroborate the evolutionary relationship between a lectin from a lower plant and a well-established lectin family from flowering plants. In addition, the identification of Marpola sheds new light on the molecular evolution of the superfamily of monocot Man-binding lectins. Besides evolutionary considerations, the occurrence of a G. nivalis agglutinin homolog in a lower plant necessitates the rethinking of the physiological role of the whole family of monocot Man-binding lectins. PMID:12114560

  1. OrthoANI: An improved algorithm and software for calculating average nucleotide identity.

    PubMed

    Lee, Imchang; Ouk Kim, Yeong; Park, Sang-Cheol; Chun, Jongsik

    2016-02-01

    Species demarcation in Bacteria and Archaea is mainly based on overall genome relatedness, which serves as a framework for modern microbiology. Current practice for obtaining these measures between two strains is shifting from experimentally determined similarity obtained by DNA-DNA hybridization (DDH) to genome-sequence-based similarity. Average nucleotide identity (ANI) is a simple algorithm that mimics DDH. Like DDH, ANI values between two genome sequences may be different from each other when reciprocal calculations are compared. We compared 63 690 pairs of genome sequences and found that the differences in reciprocal ANI values are significantly high, exceeding 1 % in some cases. To resolve this lack of symmetry, a new algorithm, named OrthoANI, was developed to accommodate the concept of orthology: both genome sequences are fragmented and only orthologous fragment pairs are taken into consideration for calculating nucleotide identities. OrthoANI is highly correlated with ANI (using BLASTn) and the former showed approximately 0.1 % higher values than the latter. In conclusion, OrthoANI provides a more robust and faster means of calculating average nucleotide identity for taxonomic purposes. The standalone software tools are freely available at http://www.ezbiocloud.net/sw/oat.
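
    The asymmetry described in this abstract can be illustrated with a toy sketch. It is not the OrthoANI software: the hit tables, fragment names and identity values below are invented for illustration, and the real tool obtains best hits from BLASTn rather than from a hand-written dictionary. The sketch only shows why averaging over each direction's best hits can differ, while averaging over reciprocal best-hit pairs is symmetric by construction.

```python
from typing import Dict, Tuple

# Hypothetical hit table: query fragment -> (best-hit fragment, percent identity).
Hits = Dict[str, Tuple[str, float]]


def one_way_ani(hits: Hits) -> float:
    """Mean identity over every query fragment's best hit (direction-dependent)."""
    return sum(identity for _, identity in hits.values()) / len(hits)


def reciprocal_ani(a_to_b: Hits, b_to_a: Hits) -> float:
    """Mean identity over fragment pairs that are each other's best hit."""
    identities = []
    for frag_a, (frag_b, ident_ab) in a_to_b.items():
        back = b_to_a.get(frag_b)
        if back is not None and back[0] == frag_a:
            # Average the two directions so the pair contributes one value.
            identities.append((ident_ab + back[1]) / 2)
    if not identities:
        raise ValueError("no reciprocal best-hit pairs")
    return sum(identities) / len(identities)


# Invented example data: a3 and b3 have no reciprocal partner.
a_to_b = {"a1": ("b1", 98.0), "a2": ("b2", 96.0), "a3": ("b1", 80.0)}
b_to_a = {"b1": ("a1", 98.0), "b2": ("a2", 96.0), "b3": ("a2", 70.0)}

forward = one_way_ani(a_to_b)               # depends on direction
reverse = one_way_ani(b_to_a)               # differs from forward
symmetric = reciprocal_ani(a_to_b, b_to_a)  # same value in either order
```

    In this toy data the two one-way averages disagree because each direction includes a fragment with no reciprocal partner, whereas the reciprocal average ignores those fragments and returns the same value whichever genome is treated as the query.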

  2. Diversity and evolution of centromere repeats in the maize genome.

    PubMed

    Bilinski, Paul; Distor, Kevin; Gutierrez-Lopez, Jose; Mendoza, Gabriela Mendoza; Shi, Jinghua; Dawe, R Kelly; Ross-Ibarra, Jeffrey

    2015-03-01

    Centromere repeats are found in most eukaryotes and play a critical role in kinetochore formation. Though centromere repeats exhibit considerable diversity both within and among species, little is understood about the mechanisms that drive centromere repeat evolution. Here, we use maize as a model to investigate how a complex history involving polyploidy, fractionation, and recent domestication has impacted the diversity of the maize centromeric repeat CentC. We first validate the existence of long tandem arrays of repeats in maize and other taxa in the genus Zea. Although we find considerable sequence diversity among CentC copies genome-wide, genetic similarity among repeats is highest within these arrays, suggesting that tandem duplications are the primary mechanism for the generation of new copies. Nonetheless, clustering analyses identify similar sequences among distant repeats, and simulations suggest that this pattern may be due to homoplasious mutation. Although the two ancestral subgenomes of maize have contributed nearly equal numbers of centromeres, our analysis shows that the majority of all CentC repeats derive from one of the parental genomes, with an even stronger bias when examining the largest assembled contiguous clusters. Finally, by comparing maize with its wild progenitor teosinte, we find that the abundance of CentC likely decreased after domestication, while the pericentromeric repeat Cent4 has drastically increased.

  3. Expression of three mammalian cDNAs that interfere with RAS function in Saccharomyces cerevisiae.

    PubMed Central

    Colicelli, J; Nicolette, C; Birchmeier, C; Rodgers, L; Riggs, M; Wigler, M

    1991-01-01

    Saccharomyces cerevisiae strains expressing the activated RAS2Val19 gene or lacking both cAMP phosphodiesterase genes, PDE1 and PDE2, have impaired growth control and display an acute sensitivity to heat shock. We have isolated two classes of mammalian cDNAs from yeast expression libraries that suppress the heat shock-sensitive phenotype of RAS2Val19 strain. Members of the first class of cDNAs also suppress the heat shock-sensitive phenotype of pde1- pde2- strains and encode cAMP phosphodiesterases. Members of the second class fail to suppress the phenotype of pde1- pde2- strains and therefore are candidate cDNAs encoding proteins that interact with RAS proteins. We report the nucleotide sequence of three members of this class. Two of these cDNAs share considerable sequence similarity, but none are clearly similar to previously isolated genes. Images PMID:1849280

  4. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction.

    PubMed

    Cui, Xuefeng; Lu, Zhiwu; Wang, Sheng; Jing-Yan Wang, Jim; Gao, Xin

    2016-06-15

    Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information. We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence-structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM-HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods. Our program is freely available for download from http://sfb.kaust.edu.sa/Pages/Software.aspx. Contact: xin.gao@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  5. Constraints on the depositional age and tectonometamorphic evolution of marbles from the Biharia Nappe System (Apuseni Mountains, Romania)

    NASA Astrophysics Data System (ADS)

    Reiser, Martin Kaspar; Schuster, Ralf; Tropper, Peter; Fügenschuh, Bernhard

    2017-04-01

    Basement rocks from the Biharia Nappe System in the Apuseni Mountains comprise several dolomite and calcite marble sequences or lenses which experienced deformation and metamorphic overprint during the Alpine orogeny. New Sr, O and C-isotope data in combination with considerations from the lithological sequences indicate Middle to Late Triassic deposition of calcite marbles from the Vulturese-Belioara Series (Biharia Nappe s.str.). Ductile deformation and large-scale folding of the siliciclastic and carbonatic lithologies is attributed to NW-directed nappe stacking during late Early Cretaceous times (D2). The studied marble sequences experienced a metamorphic overprint under lower greenschist-facies conditions (316-370 °C based on calcite-dolomite geothermometry) during this tectonic event. Other marble sequences from the Biharia Nappe System (i.e. Vidolm and Baia de Arieș nappes) show similarities in the stratigraphic sequence and their isotope signature, together with a comparable structural position close to nappe contact. However, the dataset is not concise enough to allow for a definitive attribution of a Mesozoic origin to other marble sequences than the Vulturese-Belioara Series.

  6. Memory and learning with rapid audiovisual sequences

    PubMed Central

    Keller, Arielle S.; Sekuler, Robert

    2015-01-01

    We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent, but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed. PMID:26575193

  7. Memory and learning with rapid audiovisual sequences.

    PubMed

    Keller, Arielle S; Sekuler, Robert

    2015-01-01

    We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent, but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed.

  8. SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters.

    PubMed

    Wang, Chunlin; Lefkowitz, Elliot J

    2004-10-28

    Large-scale sequence comparison is a powerful tool for biological inference in modern molecular biology. Comparing new sequences to those in annotated databases is a useful source of functional and structural information about these sequences. Using software such as the basic local alignment search tool (BLAST) or HMMPFAM to identify statistically significant matches between newly sequenced segments of genetic material and those in databases is an important task for most molecular biologists. Searching algorithms are intrinsically slow and data-intensive, especially in light of the rapid growth of biological sequence databases due to the emergence of high throughput DNA sequencing techniques. Thus, traditional bioinformatics tools are impractical on PCs and even on dedicated UNIX servers. To take advantage of larger databases and more reliable methods, high performance computation becomes necessary. We describe the implementation of SS-Wrapper (Similarity Search Wrapper), a package of wrapper applications that can parallelize similarity search applications on a Linux cluster. Our wrapper utilizes a query segmentation-search (QS-search) approach to parallelize sequence database search applications. It takes into consideration load balancing between each node on the cluster to maximize resource usage. QS-search is designed to wrap many different search tools, such as BLAST and HMMPFAM using the same interface. This implementation does not alter the original program, so newly obtained programs and program updates should be accommodated easily. Benchmark experiments using QS-search to optimize BLAST and HMMPFAM showed that QS-search accelerated the performance of these programs almost linearly in proportion to the number of CPUs used. We have also implemented a wrapper that utilizes a database segmentation approach (DS-BLAST) that provides a complementary solution for BLAST searches when the database is too large to fit into the memory of a single node. 
Used together, QS-search and DS-BLAST provide a flexible solution to adapt sequential similarity searching applications in high performance computing environments. Their ease of use and their ability to wrap a variety of database search programs provide an analytical architecture to assist both the seasoned bioinformaticist and the wet-bench biologist.
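
    The query-segmentation idea in this abstract can be sketched as follows. The abstract does not say how SS-Wrapper balances load, so the greedy longest-first assignment below is an assumption chosen for illustration, and the function name, query IDs and lengths are hypothetical. Each bin would then be handed to one node, which runs the unmodified search program on its share of the queries.

```python
import heapq
from typing import Dict, List


def segment_queries(lengths: Dict[str, int], n_nodes: int) -> List[List[str]]:
    """Assign query IDs to n_nodes bins so total sequence length is roughly even."""
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    # Min-heap of (current load, node index): the least-loaded node pops first.
    heap = [(0, i) for i in range(n_nodes)]
    bins: List[List[str]] = [[] for _ in range(n_nodes)]
    # Place the longest queries first so short ones can even out the remainder.
    for qid in sorted(lengths, key=lambda q: (-lengths[q], q)):
        load, node = heapq.heappop(heap)
        bins[node].append(qid)
        heapq.heappush(heap, (load + lengths[qid], node))
    return bins


# Invented query lengths (residues) split across two hypothetical nodes.
queries = {"q1": 900, "q2": 300, "q3": 500, "q4": 400, "q5": 100}
chunks = segment_queries(queries, 2)
loads = [sum(queries[q] for q in chunk) for chunk in chunks]
```

    Sequence length is only a proxy for search time; a real scheduler might instead hand out small batches dynamically as nodes become idle.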

  9. SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters

    PubMed Central

    Wang, Chunlin; Lefkowitz, Elliot J

    2004-01-01

    Background Large-scale sequence comparison is a powerful tool for biological inference in modern molecular biology. Comparing new sequences to those in annotated databases is a useful source of functional and structural information about these sequences. Using software such as the basic local alignment search tool (BLAST) or HMMPFAM to identify statistically significant matches between newly sequenced segments of genetic material and those in databases is an important task for most molecular biologists. Searching algorithms are intrinsically slow and data-intensive, especially in light of the rapid growth of biological sequence databases due to the emergence of high throughput DNA sequencing techniques. Thus, traditional bioinformatics tools are impractical on PCs and even on dedicated UNIX servers. To take advantage of larger databases and more reliable methods, high performance computation becomes necessary. Results We describe the implementation of SS-Wrapper (Similarity Search Wrapper), a package of wrapper applications that can parallelize similarity search applications on a Linux cluster. Our wrapper utilizes a query segmentation-search (QS-search) approach to parallelize sequence database search applications. It takes into consideration load balancing between each node on the cluster to maximize resource usage. QS-search is designed to wrap many different search tools, such as BLAST and HMMPFAM using the same interface. This implementation does not alter the original program, so newly obtained programs and program updates should be accommodated easily. Benchmark experiments using QS-search to optimize BLAST and HMMPFAM showed that QS-search accelerated the performance of these programs almost linearly in proportion to the number of CPUs used. 
We have also implemented a wrapper that utilizes a database segmentation approach (DS-BLAST) that provides a complementary solution for BLAST searches when the database is too large to fit into the memory of a single node. Conclusions Used together, QS-search and DS-BLAST provide a flexible solution to adapt sequential similarity searching applications in high performance computing environments. Their ease of use and their ability to wrap a variety of database search programs provide an analytical architecture to assist both the seasoned bioinformaticist and the wet-bench biologist. PMID:15511296

  10. Genes encoding Xenopus laevis Ig L chains: Implications for the evolution of κ and λ chains

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zezza, D.J.; Stewart, S.E.; Steiner, L.A.

    1992-12-15

    Xenopus laevis Ig contain two distinct types of L chains, designated ρ or L1 and σ or L2. The authors have analyzed Xenopus genomic DNA by Southern blotting with cDNA probes specific for L1 V and C regions. Many fragments hybridized to the V probe, but only one or two fragments hybridized to the C probe. Corresponding C, J, and V gene segments were identified on clones isolated from a genomic library prepared from the same DNA. One clone contains a C gene segment separated from a J gene segment by an intron of 3.4 kb. The J and C gene segments are nearly identical in sequence to cDNA clones analyzed previously. The C segment is somewhat more similar and the J segment considerably more similar in sequence to the corresponding segments of mammalian κ chains than to those of mammalian λ chains. Upstream of the J segment is a typical recombination signal sequence with a spacer of 23 bp, as in Jκ. A second clone from the library contains four V gene segments, separated by 2.1 to 3.6 kb. Two of these, V1 and V3, have the expected structural and regulatory features of V genes, and are very similar in sequence to each other and to mammalian Vκ. A third gene segment, V2, resembles V1 and V3 in its coding region and nearby 5′-flanking region, but diverges in sequence 5′ to position −95 with loss of the octamer promoter element. The fourth V-like segment is similar to the others at the 3′-end, but upstream of codon 64 bears no resemblance in sequence to any Ig V region. All four V segments have typical recombination signal sequences with 12-bp spacers at their 3′-ends, as in Vκ. Taken together, the data suggest that Xenopus L1 L chain genes are members of the κ gene family. 80 refs., 9 figs.

  11. Construction of Pseudomolecule Sequences of the aus Rice Cultivar Kasalath for Comparative Genomics of Asian Cultivated Rice

    PubMed Central

    Sakai, Hiroaki; Kanamori, Hiroyuki; Arai-Kichise, Yuko; Shibata-Hatta, Mari; Ebana, Kaworu; Oono, Youko; Kurita, Kanako; Fujisawa, Hiroko; Katagiri, Satoshi; Mukai, Yoshiyuki; Hamada, Masao; Itoh, Takeshi; Matsumoto, Takashi; Katayose, Yuichi; Wakasa, Kyo; Yano, Masahiro; Wu, Jianzhong

    2014-01-01

    Having a deep genetic structure evolved during its domestication and adaptation, the Asian cultivated rice (Oryza sativa) displays considerable physiological and morphological variations. Here, we describe deep whole-genome sequencing of the aus rice cultivar Kasalath by using the advanced next-generation sequencing (NGS) technologies to gain a better understanding of the sequence and structural changes among highly differentiated cultivars. The de novo assembled Kasalath sequences represented 91.1% (330.55 Mb) of the genome and contained 35 139 expressed loci annotated by RNA-Seq analysis. We detected 2 787 250 single-nucleotide polymorphisms (SNPs) and 7393 large insertion/deletion (indel) sites (>100 bp) between Kasalath and Nipponbare, and 2 216 251 SNPs and 3780 large indels between Kasalath and 93-11. Extensive comparison of the gene contents among these cultivars revealed similar rates of gene gain and loss. We detected at least 7.39 Mb of inserted sequences and 40.75 Mb of unmapped sequences in the Kasalath genome in comparison with the Nipponbare reference genome. Mapping of the publicly available NGS short reads from 50 rice accessions proved the necessity and the value of using the Kasalath whole-genome sequence as an additional reference to capture the sequence polymorphisms that cannot be discovered by using the Nipponbare sequence alone. PMID:24578372

  12. Molecular characterization and phylogenetic relationships among microsporidian isolates infecting silkworm, Bombyx mori using small subunit rRNA (SSU-rRNA) gene sequence analysis.

    PubMed

    Nath, B Surendra; Gupta, S K; Bajpai, A K

    2012-12-01

    The life cycle, spore morphology, pathogenicity, tissue specificity, mode of transmission and small subunit rRNA (SSU-rRNA) gene sequence analysis of the five new microsporidian isolates viz., NIWB-11bp, NIWB-12n, NIWB-13md, NIWB-14b and NIWB-15mb identified from the silkworm, Bombyx mori have been studied along with type species, NIK-1s_mys. The life cycle of the microsporidians identified exhibited the sequential developmental cycles that are similar to the general developmental cycle of the genus, Nosema. The spores showed considerable variations in their shape, length and width. The pathogenicity observed was dose-dependent and differed among the microsporidian isolates; the NIWB-15mb was found to be more virulent than other isolates. All of the microsporidians were found to infect most of the tissues examined and showed gonadal infection and transovarial transmission in the infected silkworms. SSU-rRNA sequence based phylogenetic tree placed NIWB-14b, NIWB-12n and NIWB-11bp in a separate branch along with other Nosema species and Nosema bombycis; while NIWB-15mb and NIWB-13md together formed another cluster along with other Nosema species. NIK-1s_mys revealed a signature sequence similar to standard type species, N. bombycis, indicating that NIK-1s_mys is similar to N. bombycis. Based on phylogenetic relationships, branch length information based on genetic distance and nucleotide differences, we conclude that the microsporidian isolates identified are distinctly different from the other known species and belong to the genus, Nosema. This SSU-rRNA gene sequence analysis method is found to be a more useful approach in detecting different and closely related microsporidians of this economically important domestic insect.

  13. HYBRIDIZATION PROPERTIES OF DNA SEQUENCES DIRECTING THE SYNTHESIS OF MESSENGER RNA AND HETEROGENEOUS NUCLEAR RNA

    PubMed Central

    Greenberg, Jay R.; Perry, Robert P.

    1971-01-01

    The relationship of the DNA sequences from which polyribosomal messenger RNA (mRNA) and heterogeneous nuclear RNA (NRNA) of mouse L cells are transcribed was investigated by means of hybridization kinetics and thermal denaturation of the hybrids. Hybridization was performed in formamide solutions at DNA excess. Under these conditions most of the hybridizing mRNA and NRNA react at values of Dot (DNA concentration multiplied by time) expected for RNA transcribed from the nonrepeated or rarely repeated fraction of the genome. However, a fraction of both mRNA and NRNA hybridize at values of Dot about 10,000 times lower, and therefore must be transcribed from highly redundant DNA sequences. The fraction of NRNA hybridizing to highly repeated sequences is about 1.7 times greater than the corresponding fraction of mRNA. The hybrids formed by the rapidly reacting fractions of both NRNA and mRNA melt over a narrow temperature range with a midpoint about 11°C below that of native L cell DNA. This indicates that these hybrids consist of partially complementary sequences with approximately 11% mismatching of bases. Hybrids formed by the slowly reacting fraction of NRNA melt within 4°–6°C of native DNA, indicating very little, if any, mismatching of bases. Hybrids of the slowly reacting components of mRNA, formed under conditions of sufficiently low RNA input, have a high thermal stability, similar to that observed for hybrids of the slowly reacting NRNA component. However, when higher inputs of mRNA are used, hybrids are formed which have a strikingly lower thermal stability. This observation can be explained by assuming that there is sufficient similarity among the relatively rare DNA sequences coding for mRNA so that under hybridization conditions, in which these DNA sequences are not truly in excess, reversible hybrids exhibiting a considerable amount of mispairing are formed. 
The fact that a comparable phenomenon has not been observed for NRNA may mean that there is less similarity among the relatively rare DNA sequences coding for NRNA than there is among the rare sequences coding for mRNA. PMID:4999767

  14. High Potential Source for Biomass Degradation Enzyme Discovery and Environmental Aspects Revealed through Metagenomics of Indian Buffalo Rumen

    PubMed Central

    Singh, K. M.; Reddy, Bhaskar; Patel, Dishita; Patel, A. K.; Patel, J. B.; Joshi, C. G.

    2014-01-01

    The complex microbiome of the rumen functions as an effective system for plant cell wall degradation and biomass utilization, and provides a genetic resource of microbial degradative enzymes that could be used in the production of biofuel. Therefore the buffalo rumen microbiota was surveyed using shotgun sequencing. This metagenomic sequencing generated 3.9 GB of sequences and data were assembled into 137270 contiguous sequences (contigs). We identified 2614 potential contigs encoding biomass degrading enzymes including glycoside hydrolases (GH: 1943 contigs), carbohydrate binding module (CBM: 23 contigs), glycosyl transferase (GT: 373 contigs), carbohydrate esterases (CE: 259 contigs), and polysaccharide lyases (PE: 16 contigs). The hierarchical clustering of buffalo metagenomes demonstrated the similarities and dissimilarity in microbial community structures and functional capacity. This demonstrates that buffalo rumen microbiome was considerably enriched in functional genes involved in polysaccharide degradation with great prospects to obtain new molecules that may be applied in the biofuel industry. PMID:25136572

  15. Streptococcus moroccensis sp. nov. and Streptococcus rifensis sp. nov., isolated from raw camel milk.

    PubMed

    Kadri, Zaina; Amar, Mohamed; Ouadghiri, Mouna; Cnockaert, Margo; Aerts, Maarten; El Farricha, Omar; Vandamme, Peter

    2014-07-01

    Two catalase- and oxidase-negative Streptococcus-like strains, LMG 27682(T) and LMG 27684(T), were isolated from raw camel milk in Morocco. Comparative 16S rRNA gene sequencing assigned these bacteria to the genus Streptococcus with Streptococcus rupicaprae 2777-2-07(T) as their closest phylogenetic neighbour (95.9% and 95.7% similarity, respectively). 16S rRNA gene sequence similarity between the two strains was 96.7%. Although strains LMG 27682(T) and LMG 27684(T) shared a DNA-DNA hybridization value that corresponded to the threshold level for species delineation (68%), the two strains could be distinguished by multiple biochemical tests, sequence analysis of the phenylalanyl-tRNA synthase (pheS), RNA polymerase (rpoA) and ATP synthase (atpA) genes and by their MALDI-TOF MS profiles. On the basis of these considerable phenotypic and genotypic differences, we propose to classify both strains as novel species of the genus Streptococcus, for which the names Streptococcus moroccensis sp. nov. (type strain, LMG 27682(T)  = CCMM B831(T)) and Streptococcus rifensis sp. nov. (type strain, LMG 27684(T)  = CCMM B833(T)) are proposed. © 2014 IUMS.

  16. Real-time filtering and detection of dynamics for compression of HDTV

    NASA Technical Reports Server (NTRS)

    Sauer, Ken D.; Bauer, Peter

    1991-01-01

    The preprocessing of video sequences for data compression is discussed. The end goal associated with this is a compression system for HDTV capable of transmitting perceptually lossless sequences at under one bit per pixel. Two subtopics were emphasized to prepare the video signal for more efficient coding: (1) nonlinear filtering to remove noise and shape the signal spectrum to take advantage of insensitivities of human viewers; and (2) segmentation of each frame into temporally dynamic/static regions for conditional frame replenishment. The latter technique operates best under the assumption that the sequence can be modelled as a superposition of active foreground and static background. The considerations were restricted to monochrome data, since the standard luminance/chrominance decomposition, which concentrates most of the bandwidth requirements in the luminance, was expected to be used. Similar methods may be applied to the two chrominance signals.
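
    Conditional frame replenishment, as named in this abstract, can be sketched in a few lines. The abstract does not give the segmentation criterion, so the block-wise mean absolute difference and the threshold below are assumptions made for illustration, and the function name and frame values are hypothetical. Blocks judged dynamic are transmitted and overwritten at the receiver; static blocks are simply repeated from the previous frame.

```python
from typing import List, Tuple

Frame = List[List[int]]  # rows of monochrome (luminance) samples


def replenish(prev: Frame, curr: Frame, block: int,
              threshold: float) -> Tuple[Frame, List[Tuple[int, int]]]:
    """Return the receiver's updated frame and the origins of the blocks sent."""
    out = [row[:] for row in prev]
    sent: List[Tuple[int, int]] = []
    height, width = len(curr), len(curr[0])
    for r0 in range(0, height, block):
        for c0 in range(0, width, block):
            cells = [(r, c)
                     for r in range(r0, min(r0 + block, height))
                     for c in range(c0, min(c0 + block, width))]
            # Mean absolute difference between the co-located blocks.
            mad = sum(abs(curr[r][c] - prev[r][c]) for r, c in cells) / len(cells)
            if mad > threshold:
                # Dynamic block: transmit it and overwrite the stored copy.
                for r, c in cells:
                    out[r][c] = curr[r][c]
                sent.append((r0, c0))
    return out, sent


# Invented 4x4 frames: motion in the top-left block, slight noise bottom-right.
prev = [[10] * 4 for _ in range(4)]
curr = [row[:] for row in prev]
curr[0][0] = 90
curr[1][1] = 90
curr[3][3] = 11
updated, sent = replenish(prev, curr, block=2, threshold=5.0)
```

    Only the top-left block exceeds the threshold here, so it is the only one transmitted; the small change in the bottom-right block is treated as static background and left unchanged at the receiver.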

  17. Differences in the emergent coding properties of cortical and striatal ensembles

    PubMed Central

    Ma, L.; Hyman, J.M.; Lindsay, A.J.; Phillips, A.G.; Seamans, J.K.

    2016-01-01

    The function of a given brain region is often defined by the coding properties of its individual neurons, yet how this information is combined at the ensemble level is an equally important consideration. In the present study, multiple neurons from the anterior cingulate cortex (ACC) and the dorsal striatum (DS) were recorded simultaneously as rats performed different sequences of the same three actions. Sequence and lever decoding was remarkably similar on a per-neuron basis in the two regions. At the ensemble level, sequence-specific representations in the DS appeared synchronously but transiently along with the representation of lever location, while these two streams of information appeared independently and asynchronously in the ACC. As a result the ACC achieved superior ensemble decoding accuracy overall. Thus, the manner in which information was combined across neurons in an ensemble determined the functional separation of the ACC and DS on this task. PMID:24974796

  18. Mesorhizobium ciceri biovar biserrulae, a novel biovar nodulating the pasture legume Biserrula pelecinus L.

    PubMed

    Nandasena, Kemanthi G; O'Hara, Graham W; Tiwari, Ravi P; Willems, Anne; Howieson, John G

    2007-05-01

    Biserrula pelecinus L. is a pasture legume species that forms a highly specific nitrogen-fixing symbiotic interaction with a group of bacteria that belong to Mesorhizobium. These mesorhizobia have >98.8 % sequence similarity to Mesorhizobium ciceri and Mesorhizobium loti for the 16S rRNA gene (1440 bp) and >99.3 % sequence similarity to M. ciceri for the dnaK gene (300 bp), and strain WSM1271 has 100 % sequence similarity to M. ciceri for GSII (600 bp). Strain WSM1271 had 85 % relatedness to M. ciceri LMG 14989(T) and 50 % relatedness to M. loti LMG 6125(T) when DNA-DNA hybridization was performed. WSM1271 also had a similar cellular fatty acid profile to M. ciceri. These results are strong evidence that the Biserrula mesorhizobia and M. ciceri belong to the same group of bacteria. Significant differences were revealed between the Biserrula mesorhizobia and M. ciceri in growth conditions, antibiotic resistance and carbon source utilization. The G+C content of the DNA of WSM1271 was 62.7 mol%, compared to 63-64 mol% for M. ciceri. The Biserrula mesorhizobia contained a plasmid (approximately 500 bp), but the symbiotic genes were detected on a mobile symbiosis island and considerable variation was present in the symbiotic genes of Biserrula mesorhizobia and M. ciceri. There was <78.6 % sequence similarity for nodA and <66.9 % for nifH between Biserrula mesorhizobia and M. ciceri. Moreover, the Biserrula mesorhizobia did not nodulate the legume host of M. ciceri, Cicer arietinum, and M. ciceri did not nodulate B. pelecinus. These significant differences observed between Biserrula mesorhizobia and M. ciceri warrant the proposal of a novel biovar for Biserrula mesorhizobia within M. ciceri. The name Mesorhizobium ciceri biovar biserrulae is proposed, with strain WSM1271 (=LMG 23838=HAMBI 2942) as the reference strain.

  19. Tenebrio molitor antifreeze protein gene identification and regulation.

    PubMed

    Qin, Wensheng; Walker, Virginia K

    2006-02-15

    The yellow mealworm, Tenebrio molitor, is a freeze susceptible, stored product pest. Its winter survival is facilitated by the accumulation of antifreeze proteins (AFPs), encoded by a small gene family. We have now isolated 11 different AFP genomic clones from 3 genomic libraries. All the clones had a single coding sequence, with no evidence of intervening sequences. Three genomic clones were further characterized. All have putative TATA box sequences upstream of the coding regions and multiple potential poly(A) signal sequences downstream of the coding regions. A TmAFP regulatory region, B1037, conferred transcriptional activity when ligated to a luciferase reporter sequence and after transfection into an insect cell line. A 143 bp core promoter including a TATA box sequence was identified. Its promoter activity was increased 4.4 times by inserting an exotic 245 bp intron into the construct, similar to the enhancement of transgenic expression seen in several other systems. The addition of a duplication of the first 120 bp sequence from the 143 bp core promoter decreased promoter activity by half. Although putative hormonal response sequences were identified, none of the five hormones tested enhanced reporter activity. These studies on the mechanisms of AFP transcriptional control are important for the consideration of any transfer of freeze-resistance phenotypes to beneficial hosts.

  20. Comparative Genome and Proteome Analysis of Anopheles gambiae and Drosophila melanogaster

    NASA Astrophysics Data System (ADS)

    Zdobnov, Evgeny M.; von Mering, Christian; Letunic, Ivica; Torrents, David; Suyama, Mikita; Copley, Richard R.; Christophides, George K.; Thomasova, Dana; Holt, Robert A.; Subramanian, G. Mani; Mueller, Hans-Michael; Dimopoulos, George; Law, John H.; Wells, Michael A.; Birney, Ewan; Charlab, Rosane; Halpern, Aaron L.; Kokoza, Elena; Kraft, Cheryl L.; Lai, Zhongwu; Lewis, Suzanna; Louis, Christos; Barillas-Mury, Carolina; Nusskern, Deborah; Rubin, Gerald M.; Salzberg, Steven L.; Sutton, Granger G.; Topalis, Pantelis; Wides, Ron; Wincker, Patrick; Yandell, Mark; Collins, Frank H.; Ribeiro, Jose; Gelbart, William M.; Kafatos, Fotis C.; Bork, Peer

    2002-10-01

    Comparison of the genomes and proteomes of the two diptera Anopheles gambiae and Drosophila melanogaster, which diverged about 250 million years ago, reveals considerable similarities. However, numerous differences are also observed; some of these must reflect the selection and subsequent adaptation associated with different ecologies and life strategies. Almost half of the genes in both genomes are interpreted as orthologs and show an average sequence identity of about 56%, which is slightly lower than that observed between the orthologs of the pufferfish and human (diverged about 450 million years ago). This indicates that these two insects diverged considerably faster than vertebrates. Aligned sequences reveal that orthologous genes have retained only half of their intron/exon structure, indicating that intron gains or losses have occurred at a rate of about one per gene per 125 million years. Chromosomal arms exhibit significant remnants of homology between the two species, although only 34% of the genes colocalize in small "microsyntenic" clusters, and major interarm transfers as well as intra-arm shuffling of gene order are detected.

  1. Dual Task Automatic and Controlled Processing in Visual Search: Can It Be Done without Cost?

    DTIC Science & Technology

    1980-02-09

    transmitted. The sequence of learning to read is similar (LaBerge and Samuels, 1974). Motor skill acquisition also shows a complex buildup of skill to...has received considerable interest in recent years (LaBerge, 1973, 1975, 1976; Posner and Snyder, 1975; Norman, 1976; Shiffrin and Schneider, 1977... LaBerge and Samuels (1974) report that for beginning readers to increase chunking, the demand for accuracy may have to be relaxed. In the present

  2. Mitochondrial DNA Sequence Divergence among Meloidogyne incognita, Romanomermis culicivorax, Ascaris suum, and Caenorhabditis elegans

    PubMed Central

    Powers, T. O.; Harris, T. S.; Hyman, B. C.

    1993-01-01

    Mitochondrial DNA sequences were obtained from the NADH dehydrogenase subunit 3 (ND3), large rRNA, and cytochrome b genes from Meloidogyne incognita and Romanomermis culicivorax. Both species show considerable genetic distance within these same genes when compared with Caenorhabditis elegans or Ascaris suum, two species previously analyzed. Caenorhabditis, Ascaris, and Meloidogyne were selected as representatives of three subclasses in the nematode class Secernentea: Rhabditia, Spiruria, and Diplogasteria, respectively. Romanomermis served as a representative out-group of the class Adenophorea. The divergence between the phytoparasitic lineage (represented by Meloidogyne) and the three other species is so great that virtually every variable position in these genes appears to have accumulated multiple mutations, obscuring the phylogenetic information obtainable from these comparisons. The 39 and 42% amino acid similarity between the M. incognita and C. elegans ND3 and cytochrome b coding sequences, respectively, are approximately the same as those of C. elegans-mouse comparisons for the same genes (26 and 44%). This discovery calls into question the feasibility of employing cloned C. elegans probes as reagents to isolate phytoparasitic nematode genes. The genetic distance between the phytoparasitic nematode lineage and C. elegans markedly contrasts with the 79% amino acid similarity between C. elegans and A. suum for the same sequences. The molecular data suggest that Caenorhabditis and Ascaris belong to the same subclass. PMID:19279810

  3. Comparative genomic analysis of bacteriophages specific to the channel catfish pathogen Edwardsiella ictaluri

    PubMed Central

    2011-01-01

    Background The bacterial pathogen Edwardsiella ictaluri is a primary cause of mortality in channel catfish raised commercially in aquaculture farms. Additional treatment and diagnostic regimes are needed for this enteric pathogen, motivating the discovery and characterization of bacteriophages specific to E. ictaluri. Results The genomes of three Edwardsiella ictaluri-specific bacteriophages isolated from geographically distant aquaculture ponds, at different times, were sequenced and analyzed. The genomes for phages eiAU, eiDWF, and eiMSLS are 42.80 kbp, 42.12 kbp, and 42.69 kbp, respectively, and are greater than 95% identical to each other at the nucleotide level. Nucleotide differences were mostly observed in non-coding regions and in structural proteins, with significant variability in the sequences of putative tail fiber proteins. The genome organization of these phages exhibits a pattern shared by other Siphoviridae. Conclusions These E. ictaluri-specific phage genomes reveal considerable conservation of genomic architecture and sequence identity, even with considerable temporal and spatial divergence in their isolation. Their genomic homogeneity is similarly observed among E. ictaluri bacterial isolates. The genomic analysis of these phages supports the conclusion that these are virulent phages, lacking the capacity for lysogeny or expression of virulence genes. This study contributes to our knowledge of phage genomic diversity and facilitates studies on the diagnostic and therapeutic applications of these phages. PMID:21214923

  4. Computational Functional Analysis of Lipid Metabolic Enzymes.

    PubMed

    Bagnato, Carolina; Have, Arjen Ten; Prados, María B; Beligni, María V

    2017-01-01

    The computational analysis of enzymes that participate in lipid metabolism has both common and unique challenges when compared to the whole protein universe. Some of the hurdles that interfere with the functional annotation of lipid metabolic enzymes that are common to other pathways include the definition of proper starting datasets, the construction of reliable multiple sequence alignments, the definition of appropriate evolutionary models, and the reconstruction of phylogenetic trees with high statistical support, particularly for large datasets. Most enzymes that take part in lipid metabolism belong to complex superfamilies with many members that are not involved in lipid metabolism. In addition, some enzymes that do not have sequence similarity catalyze similar or even identical reactions. Some of the challenges that, albeit not unique, are more specific to lipid metabolism refer to the high compartmentalization of the routes, the catalysis in hydrophobic environments and, related to this, the function near or in biological membranes. In this work, we provide guidelines intended to assist in the proper functional annotation of lipid metabolic enzymes, based on previous experiences related to the phospholipase D superfamily and the annotation of the triglyceride synthesis pathway in algae. We describe a pipeline that starts with the definition of an initial set of sequences to be used in similarity-based searches and ends in the reconstruction of phylogenies. We also mention the main issues that have to be taken into consideration when using tools to analyze subcellular localization, hydrophobicity patterns, or presence of transmembrane domains in lipid metabolic enzymes.

  5. Considerations in video playback design: using optic flow analysis to examine motion characteristics of live and computer-generated animation sequences.

    PubMed

    Woo, Kevin L; Rieucau, Guillaume

    2008-07-01

    The increasing use of the video playback technique in behavioural ecology reveals a growing need to ensure better control of the visual stimuli that focal animals experience. Technological advances now allow researchers to develop computer-generated animations instead of using video sequences of live-acting demonstrators. However, care must be taken to match the motion characteristics (speed and velocity) of the animation to the original video source. Here, we presented a tool based on the use of an optic flow analysis program to measure the resemblance of motion characteristics of computer-generated animations compared to videos of live-acting animals. We examined three distinct displays (tail-flick (TF), push-up body rock (PUBR), and slow arm wave (SAW)) exhibited by animations of Jacky dragons (Amphibolurus muricatus) that were compared to the original video sequences of live lizards. We found no significant differences between the motion characteristics of videos and animations across all three displays. Our results showed that our animations are similar to the videos in the speed and velocity features of each display. Researchers need to ensure that similar motion characteristics in animation and video stimuli are represented, and this feature is a critical component in the future success of the video playback technique.

  6. Validation of Splicing Events in Transcriptome Sequencing Data

    PubMed Central

    Kaisers, Wolfgang; Ptok, Johannes; Schwender, Holger; Schaal, Heiner

    2017-01-01

    Genomic alignments of sequenced cellular messenger RNA contain gapped alignments which are interpreted as a consequence of intron removal. The resulting gap-sites, genomic locations of alignment gaps, are landmarks representing potential splice-sites. As alignment algorithms report gap-sites with a considerable false discovery rate, validations are required. We describe two quality scores, gap quality score (gqs) and weighted gap information score (wgis), developed for validation of putative splicing events: while gqs relies solely on alignment data, wgis additionally considers information from the genomic sequence. FASTQ files obtained from 54 human dermal fibroblast samples were aligned against the human genome (GRCh38) using TopHat and STAR aligner. Statistical properties of gap-sites validated by gqs and wgis were evaluated by their sequence similarity to known exon-intron borders. Within the 54 samples, TopHat identifies 1,000,380 and STAR reports 6,487,577 gap-sites. Due to the lack of strand information, however, the percentage of identified GT-AG gap-sites is rather low. While gap-sites from TopHat contain ≈89% GT-AG, gap-sites from STAR only contain ≈42% GT-AG dinucleotide pairs in merged data from 54 fibroblast samples. Validation with gqs yields 156,251 gap-sites from TopHat alignments and 166,294 from STAR alignments. Validation with wgis yields 770,327 gap-sites from TopHat alignments and 1,065,596 from STAR alignments. Both alignment algorithms, TopHat and STAR, report gap-sites with a considerable false discovery rate, which can drastically be reduced by validation with gqs and wgis. PMID:28545234
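
    The GT-AG percentages quoted in this record can be illustrated with a minimal sketch. It is not the gqs or wgis score, whose definitions the abstract does not give; it only tests whether a gap begins with GT and ends with AG. The function names, the toy genome string, and the 0-based, end-exclusive coordinates are assumptions made for illustration, and checking the reverse complement is one possible way to cope with missing strand information.

```python
# Illustrative sketch only: classify gap-sites by their flanking dinucleotides.
# All names and the toy data below are hypothetical, not taken from the paper.

def revcomp(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_canonical(genome: str, start: int, end: int) -> bool:
    """True if the gap genome[start:end] starts with GT and ends with AG
    on either strand (coordinates are 0-based, end-exclusive)."""
    gap = genome[start:end].upper()
    if len(gap) < 4:
        return False
    # Without strand information, test the gap and its reverse complement.
    return any(s.startswith("GT") and s.endswith("AG")
               for s in (gap, revcomp(gap)))


def canonical_fraction(genome: str, gap_sites) -> float:
    """Fraction of (start, end) gap-sites with GT-AG boundaries."""
    if not gap_sites:
        return 0.0
    hits = sum(is_canonical(genome, s, e) for s, e in gap_sites)
    return hits / len(gap_sites)


if __name__ == "__main__":
    toy_genome = "AAAGTCCCCAGTTT"
    # The first gap is GT...AG; the second is not canonical on either strand.
    print(canonical_fraction(toy_genome, [(3, 11), (0, 8)]))
```

    A real pipeline would read gap coordinates from the aligner output and the sequence from a reference FASTA file; both steps are left out here to keep the sketch self-contained.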

  7. A 3D sequence-independent representation of the protein data bank.

    PubMed

    Fischer, D; Tsai, C J; Nussinov, R; Wolfson, H

    1995-10-01

    Here we address the following questions. How many structurally different entries are there in the Protein Data Bank (PDB)? How do the proteins populate the structural universe? To investigate these questions a structurally non-redundant set of representative entries was selected from the PDB. Construction of such a dataset is not trivial: (i) the considerable size of the PDB requires a large number of comparisons (there were more than 3250 structures of protein chains available in May 1994); (ii) the PDB is highly redundant, containing many structurally similar entries, not necessarily with significant sequence homology, and (iii) there is no clear-cut definition of structural similarity. The latter depends on the criteria and methods used. Here, we analyze structural similarity ignoring protein topology. To date, representative sets have been selected either by hand, by sequence comparison techniques which ignore the three-dimensional (3D) structures of the proteins or by using sequence comparisons followed by linear structural comparison (i.e. the topology, or the sequential order of the chains, is enforced in the structural comparison). Here we describe a 3D sequence-independent automated and efficient method to obtain a representative set of protein molecules from the PDB which contains all unique structures and which is structurally non-redundant. The method has two novel features. The first is the use of strictly structural criteria in the selection process without taking into account the sequence information. To this end we employ a fast structural comparison algorithm which requires on average approximately 2 s per pairwise comparison on a workstation. The second novel feature is the iterative application of a heuristic clustering algorithm that greatly reduces the number of comparisons required. We obtain a representative set of 220 chains with resolution better than 3.0 Å, or 268 chains including lower resolution entries, NMR entries and models. The resulting set can serve as a basis for extensive structural classification and studies of 3D recurring motifs and of sequence-structure relationships. The clustering algorithm succeeds in classifying into the same structural family chains with no significant sequence homology, e.g. all the globins in one single group, all the trypsin-like serine proteases in another or all the immunoglobulin-like folds into a third. In addition, unexpected structural similarities of interest have been automatically detected between pairs of chains. A cluster analysis of the representative structures demonstrates the way the "structural universe" is populated.
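
    The abstract does not spell out the heuristic clustering it uses, so the following is only a generic sketch of why clustering can cut the number of pairwise comparisons: each new entry is compared against the current representatives rather than against every other entry. The greedy scheme, the function name pick_representatives, and the numeric toy "structures" are all assumptions for illustration and should not be read as the authors' algorithm.

```python
# Illustrative sketch only: greedy selection of a non-redundant representative set.
# This is a generic technique, not the method described in the record above.
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def pick_representatives(
    items: Sequence[T],
    similar: Callable[[T, T], bool],
) -> Tuple[List[T], List[List[T]]]:
    """Assign each item to the first representative it is similar to;
    items similar to no representative become new representatives."""
    reps: List[T] = []
    clusters: List[List[T]] = []
    for item in items:
        for i, rep in enumerate(reps):
            if similar(item, rep):      # compare against representatives only
                clusters[i].append(item)
                break
        else:                           # no representative matched
            reps.append(item)
            clusters.append([item])
    return reps, clusters


if __name__ == "__main__":
    # Toy stand-in: numbers play the role of structures, and "similar"
    # means closer than a hypothetical cut-off of 0.5.
    toy = [1.0, 1.1, 5.0, 5.2, 9.0]
    reps, clusters = pick_representatives(toy, lambda a, b: abs(a - b) < 0.5)
    print(reps)
    print(clusters)
```

    With n entries and k representatives this needs at most n times k comparisons instead of roughly n squared over two, which matters when a single structural comparison takes seconds. The result depends on input order, a known weakness of greedy schemes.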

  8. In silico segmentations of lentivirus envelope sequences

    PubMed Central

    Boissin-Quillon, Aurélia; Piau, Didier; Leroux, Caroline

    2007-01-01

    Background The gene encoding the envelope of lentiviruses exhibits a considerable plasticity, particularly the region which encodes the surface (SU) glycoprotein. Interestingly, mutations do not appear uniformly along the sequence of SU, but they are clustered in restricted areas, called variable (V) regions, which are interspersed with relatively more stable regions, called constant (C) regions. We look for specific signatures of C/V regions, using hidden Markov models constructed with SU sequences of the equine, human, small ruminant and simian lentiviruses. Results Our models yield clear and accurate delimitations of the C/V regions, when the test set and the training set were made up of sequences of the same lentivirus, but also when they were made up of sequences of different lentiviruses. Interestingly, the models predicted the different regions of lentiviruses such as the bovine and feline lentiviruses, not used in the training set. Models based on composite training sets produce accurate segmentations of sequences of all these lentiviruses. Conclusion Our results suggest that each C/V region has a specific statistical oligonucleotide composition, and that the C (respectively V) regions of one of these lentiviruses are statistically more similar to the C (respectively V) regions of the other lentiviruses, than to the V (respectively C) regions of the same lentivirus. PMID:17376229
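
    As a rough illustration of how a hidden Markov model can segment a sequence into constant (C) and variable (V) regions, the sketch below runs Viterbi decoding on a two-state model. The states, probabilities, and two-letter alphabet are invented toy values; the models in the record are built from oligonucleotide statistics of real SU sequences, which are not reproduced here.

```python
# Illustrative sketch only: Viterbi decoding for a toy two-state HMM.
# All probabilities below are hypothetical and chosen for readability.
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state path for `obs` as a string of
    single-character state labels, computed in log space."""
    if not obs:
        return ""
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
               for s in states}]
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for symbol in obs[1:]:
        current, pointers = {}, {}
        for s in states:
            prev = max(states,
                       key=lambda p: scores[-1][p] + math.log(trans_p[p][s]))
            current[s] = (scores[-1][prev] + math.log(trans_p[prev][s])
                          + math.log(emit_p[s][symbol]))
            pointers[s] = prev
        scores.append(current)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))


if __name__ == "__main__":
    states = ("C", "V")
    start = {"C": 0.5, "V": 0.5}
    trans = {"C": {"C": 0.9, "V": 0.1}, "V": {"C": 0.1, "V": 0.9}}
    emit = {"C": {"A": 0.9, "G": 0.1}, "V": {"A": 0.1, "G": 0.9}}
    print(viterbi("AAAAGGGG", states, start, trans, emit))
```

    The decoded path labels every position with a state, so region boundaries fall where the label changes. Training the parameters from annotated sequences is a separate step not shown here.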

  9. Morphologic and Molecular Characterization of a Demodex (Acari: Demodicidae) Species from White-Tailed Deer (Odocoileus virginianus)

    PubMed Central

    Yabsley, Michael J.; Clay, Sarah E.; Gibbs, Samantha E. J.; Cunningham, Mark W.; Austel, Michaela G.

    2013-01-01

    Demodex mites, although usually nonpathogenic, can cause a wide range of dermatological lesions ranging from mild skin irritation and alopecia to severe furunculosis. Recently, a case of demodicosis from a white-tailed deer (Odocoileus virginianus) revealed a Demodex species morphologically distinct from Demodex odocoilei. All life cycle stages were considerably larger than D. odocoilei and although similar in size to D. kutzeri and D. acutipes from European cervids, numerous morphometrics distinguished the four species. Adult males and females were 209.1 ± 13.1 and 225.5 ± 13.4 μm in length, respectively. Ova, larvae, and nymphs measured 65.1 ± 4.1, 124.9 ± 11.6, and 205.1 ± 19.4 μm in length, respectively. For phylogenetic analyses, a portion of the 18S rRNA gene was amplified and sequenced from samples of the WTD Demodex sp., two Demodex samples from domestic dogs, and Demodex ursi from a black bear. Phylogenetic analyses indicated that the WTD Demodex was most similar to D. musculi from laboratory mice. A partial sequence from D. ursi was identical to the WTD Demodex sequence; however, these two species can be differentiated morphologically. This paper describes a second Demodex species from white-tailed deer and indicates that 18S rRNA is useful for phylogenetic analysis of most Demodex species, but two morphologically distinct species had identical partial sequences. Additional gene targets should be investigated for phylogenetic and parasite-host association studies. PMID:27335854

  10. Molecular cloning and characterization of RGA1 encoding a G protein alpha subunit from rice (Oryza sativa L. IR-36).

    PubMed

    Seo, H S; Kim, H Y; Jeong, J Y; Lee, S Y; Cho, M J; Bahk, J D

    1995-03-01

    A cDNA clone, RGA1, was isolated by using a GPA1 cDNA clone of Arabidopsis thaliana G protein alpha subunit as a probe from a rice (Oryza sativa L. IR-36) seedling cDNA library from roots and leaves. Sequence analysis of the genomic clone reveals that the RGA1 gene has 14 exons and 13 introns, and encodes a polypeptide of 380 amino acid residues with a calculated molecular weight of 44.5 kDa. The encoded protein exhibits a considerable degree of amino acid sequence similarity to all the other known G protein alpha subunits. A putative TATA sequence (ATATGA), a potential CAAT box sequence (AGCAATAC), and a cis-acting element, CCACGTGG (ABRE), known to be involved in ABA induction are found in the promoter region. The RGA1 protein contains all the consensus regions of G protein alpha subunits except the cysteine residue near the C-terminus for ADP-ribosylation by pertussis toxin. The RGA1 polypeptide expressed in Escherichia coli was, however, ADP-ribosylated by 10 microM [adenylate-32P] NAD and activated cholera toxin. Southern analysis indicates that there are no other genes similar to the RGA1 gene in the rice genome. Northern analysis reveals that the RGA1 mRNA is 1.85 kb long and expressed in vegetative tissues, including leaves and roots, and that its expression is regulated by light.

  11. RAG-3D: A search tool for RNA 3D substructures

    DOE PAGES

    Zahran, Mai; Sevim Bayrak, Cigdem; Elmetwaly, Shereef; ...

    2015-08-24

    In this study, to address many challenges in RNA structure/function prediction, the characterization of RNA's modular architectural units is required. Using the RNA-As-Graphs (RAG) database, we have previously explored the existence of secondary structure (2D) submotifs within larger RNA structures. Here we present RAG-3D—a dataset of RNA tertiary (3D) structures and substructures plus a web-based search tool—designed to exploit graph representations of RNAs for the goal of searching for similar 3D structural fragments. The objects in RAG-3D consist of 3D structures translated into 3D graphs, cataloged based on the connectivity between their secondary structure elements. Each graph is additionally described in terms of its subgraph building blocks. The RAG-3D search tool then compares a query RNA 3D structure to those in the database to obtain structurally similar structures and substructures. This comparison reveals conserved 3D RNA features and thus may suggest functional connections. Though RNA search programs based on similarity in sequence, 2D, and/or 3D structural elements are available, our graph-based search tool may be advantageous for illuminating similarities that are not obvious; using motifs rather than sequence space also reduces search times considerably. Ultimately, such substructuring could be useful for RNA 3D structure prediction, structure/function inference and inverse folding.

  12. RAG-3D: a search tool for RNA 3D substructures

    PubMed Central

    Zahran, Mai; Sevim Bayrak, Cigdem; Elmetwaly, Shereef; Schlick, Tamar

    2015-01-01

    To address many challenges in RNA structure/function prediction, the characterization of RNA's modular architectural units is required. Using the RNA-As-Graphs (RAG) database, we have previously explored the existence of secondary structure (2D) submotifs within larger RNA structures. Here we present RAG-3D—a dataset of RNA tertiary (3D) structures and substructures plus a web-based search tool—designed to exploit graph representations of RNAs for the goal of searching for similar 3D structural fragments. The objects in RAG-3D consist of 3D structures translated into 3D graphs, cataloged based on the connectivity between their secondary structure elements. Each graph is additionally described in terms of its subgraph building blocks. The RAG-3D search tool then compares a query RNA 3D structure to those in the database to obtain structurally similar structures and substructures. This comparison reveals conserved 3D RNA features and thus may suggest functional connections. Though RNA search programs based on similarity in sequence, 2D, and/or 3D structural elements are available, our graph-based search tool may be advantageous for illuminating similarities that are not obvious; using motifs rather than sequence space also reduces search times considerably. Ultimately, such substructuring could be useful for RNA 3D structure prediction, structure/function inference and inverse folding. PMID:26304547

  13. RAG-3D: A search tool for RNA 3D substructures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zahran, Mai; Sevim Bayrak, Cigdem; Elmetwaly, Shereef

    In this study, to address many challenges in RNA structure/function prediction, the characterization of RNA's modular architectural units is required. Using the RNA-As-Graphs (RAG) database, we have previously explored the existence of secondary structure (2D) submotifs within larger RNA structures. Here we present RAG-3D—a dataset of RNA tertiary (3D) structures and substructures plus a web-based search tool—designed to exploit graph representations of RNAs for the goal of searching for similar 3D structural fragments. The objects in RAG-3D consist of 3D structures translated into 3D graphs, cataloged based on the connectivity between their secondary structure elements. Each graph is additionally described in terms of its subgraph building blocks. The RAG-3D search tool then compares a query RNA 3D structure to those in the database to obtain structurally similar structures and substructures. This comparison reveals conserved 3D RNA features and thus may suggest functional connections. Though RNA search programs based on similarity in sequence, 2D, and/or 3D structural elements are available, our graph-based search tool may be advantageous for illuminating similarities that are not obvious; using motifs rather than sequence space also reduces search times considerably. Ultimately, such substructuring could be useful for RNA 3D structure prediction, structure/function inference and inverse folding.

  14. Candidate chemosensory ionotropic receptors in a Lepidoptera.

    PubMed

    Olivier, V; Monsempes, C; François, M-C; Poivet, E; Jacquin-Joly, E

    2011-04-01

    A new family of candidate chemosensory ionotropic receptors (IRs) related to ionotropic glutamate receptors (iGluRs) was recently discovered in Drosophila melanogaster. Through Blast analyses of an expressed sequence tag library prepared from male antennae of the noctuid moth Spodoptera littoralis, we identified 12 unigenes encoding proteins related to D. melanogaster and Bombyx mori IRs. Their full length sequences were obtained and the analyses of their expression patterns suggest that they were exclusively expressed or clearly enriched in chemosensory organs. The deduced protein sequences were more similar to B. mori and D. melanogaster IRs than to iGluRs and showed considerable variations in the predicted ligand-binding domains; none have the three glutamate-interacting residues found in iGluRs, suggesting different binding specificities. Our data suggest that we identified members of the insect IR chemosensory receptor family in S. littoralis and we report here the first demonstration of IR expression in Lepidoptera. © 2010 The Authors. Insect Molecular Biology © 2010 The Royal Entomological Society.

  15. Microbial community structure in the gut of the New Zealand insect Auckland tree weta (Hemideina thoracica).

    PubMed

    Waite, David W; Dsouza, Melissa; Biswas, Kristi; Ward, Darren F; Deines, Peter; Taylor, Michael W

    2015-05-01

    The endemic New Zealand weta is an enigmatic insect. Although the insect is well known by its distinctive name, considerable size, and morphology, many basic aspects of weta biology remain unknown. Here, we employed cultivation-independent enumeration techniques and rRNA gene sequencing to investigate the gut microbiota of the Auckland tree weta (Hemideina thoracica). Fluorescence in situ hybridisation performed on different sections of the gut revealed a bacterial community of fluctuating density, while rRNA gene-targeted amplicon pyrosequencing revealed the presence of a microbial community containing high bacterial diversity, but an apparent absence of archaea. Bacteria were further studied using full-length 16S rRNA gene sequences, with statistical testing of bacterial community membership against publicly available termite- and cockroach-derived sequences, revealing that the weta gut microbiota is similar to that of cockroaches. These data represent the first analysis of the weta microbiota and provide initial insights into the potential function of these microorganisms.

  16. Detection and characterization of hepatitis A virus circulating in Egypt.

    PubMed

    Hamza, Hazem; Abd-Elshafy, Dina Nadeem; Fayed, Sayed A; Bahgat, Mahmoud Mohamed; El-Esnawy, Nagwa Abass; Abdel-Mobdy, Emam

    2017-07-01

    Hepatitis A virus (HAV) still poses a considerable problem worldwide. In the current study, hepatitis A virus was recovered from wastewater samples collected from three wastewater treatment plants over one year. Using RT-PCR, HAV was detected in 43 out of 68 samples (63.2%) representing both inlet and outlet. Eleven positive samples were subjected to sequencing targeting the VP1-2A junction region. Phylogenetic analysis revealed that all samples belonged to subgenotype IB with few substitutions at the amino acid level. The complete sequence of one isolate (HAV/Egy/BI-11/2015) showed that the similarity at the amino acid level was not reflected at the nucleotide level. However, the deduced amino acid sequence derived from the complete nucleotide sequence showed distinct substitutions in the 2B, 2C, and 3A regions. Recombination analysis revealed a recombination event between X75215 (subgenotype IA) and AF268396 (subgenotype IB) involving a portion of the 2B nonstructural protein coding region (nucleotides 3757-3868), assuming that the sequence characterized here is an actual recombinant. Despite the role of recombination in picornavirus evolution, its involvement in HAV evolution has rarely been reported, and this may be due to the limited number of available complete HAV sequences. To our knowledge, this represents the first characterized complete sequence of an Egyptian isolate and the described recombination event provides an important update on the circulating HAV strains in Egypt.

  17. Not all transmembrane helices are born equal: Towards the extension of the sequence homology concept to membrane proteins

    PubMed Central

    2011-01-01

    Background: Sequence homology considerations widely used to transfer functional annotation to uncharacterized protein sequences require special precautions in the case of non-globular sequence segments including membrane-spanning stretches composed of non-polar residues. Simple, quantitative criteria are desirable for identifying transmembrane helices (TMs) that must be included in or should be excluded from start sequence segments in similarity searches aimed at finding distant homologues. Results: We found that there are two types of TMs in membrane-associated proteins. On the one hand, there are so-called simple TMs with elevated hydrophobicity, low sequence complexity and extraordinary enrichment in long aliphatic residues. They merely serve as a membrane-anchoring device. In contrast, so-called complex TMs have lower hydrophobicity, higher sequence complexity and some functional residues. These TMs have additional roles besides membrane anchoring such as intra-membrane complex formation, ligand binding or a catalytic role. Simple and complex TMs can occur both in single- and multi-membrane-spanning proteins essentially in any type of topology. Whereas simple TMs have the potential to confuse searches for sequence homologues and to generate unrelated hits with seemingly convincing statistical significance, complex TMs contain essential evolutionary information. Conclusion: For extending the homology concept onto membrane proteins, we provide a necessary quantitative criterion to distinguish simple TMs (and a sufficient criterion for complex TMs) in query sequences prior to their usage in homology searches based on assessment of hydrophobicity and sequence complexity of the TM sequence segments. Reviewers: This article was reviewed by Shamil Sunyaev, L. Aravind and Arcady Mushegian. PMID:22024092

  18. A discriminative structural similarity measure and its application to video-volume registration for endoscope three-dimensional motion tracking.

    PubMed

    Luo, Xiongbiao; Mori, Kensaku

    2014-06-01

    Endoscope 3-D motion tracking, which seeks to synchronize pre- and intra-operative images in endoscopic interventions, is usually performed as video-volume registration that optimizes the similarity between endoscopic video and pre-operative images. The tracking performance, in turn, depends significantly on whether a similarity measure can successfully characterize the difference between video sequences and volume rendering images driven by pre-operative images. The paper proposes a discriminative structural similarity measure, which uses the degradation of structural information and takes image correlation or structure, luminance, and contrast into consideration, to boost video-volume registration. When applied to endoscope tracking, the proposed similarity measure was demonstrated to be more accurate and robust than several available similarity measures, e.g., local normalized cross correlation, normalized mutual information, modified mean square error, or normalized sum squared difference. Based on clinical data evaluation, the tracking error was reduced significantly from at least 14.6 mm to 4.5 mm. The processing rate was accelerated to more than 30 frames per second using a graphics processing unit.

  19. Application of representational difference analysis to identify genomic differences between Bradyrhizobium elkanii and B. japonicum species.

    PubMed

    Soares, René Arderius; Passaglia, Luciane Maria Pereira

    2010-10-01

    Bradyrhizobium elkanii is successfully used in the formulation of commercial inoculants and, together with B. japonicum, it fully supplies the plant nitrogen demands. Despite the similarity between B. japonicum and B. elkanii species, several works demonstrated genetic and physiological differences between them. In this work Representational Difference Analysis (RDA) was used for genomic comparison between B. elkanii SEMIA 587, a crop inoculant strain, and B. japonicum USDA 110, a reference strain. Two hundred sequences were obtained. From these, 46 sequences belonged exclusively to the genome of B. elkanii strain, and 154 showed similarity to sequences from B. japonicum genome. From the 46 sequences with no similarity to sequences from B. japonicum, 39 showed no similarity to sequences in public databases and seven showed similarity to sequences of genes coding for known proteins. These seven sequences were divided in three groups: similar to sequences from other Bradyrhizobium strains, similar to sequences from other nitrogen-fixing bacteria, and similar to sequences from non nitrogen-fixing bacteria. These new sequences could be used as DNA markers in order to investigate the rates of genetic material gain and loss in natural Bradyrhizobium strains.

  20. Precursors of vertebrate peptide antibiotics dermaseptin b and adenoregulin have extensive sequence identities with precursors of opioid peptides dermorphin, dermenkephalin, and deltorphins.

    PubMed

    Amiche, M; Ducancel, F; Mor, A; Boulain, J C; Menez, A; Nicolas, P

    1994-07-08

    The dermaseptins are a family of broad spectrum antimicrobial peptides, 27-34 amino acids long, involved in the defense of the naked skin of frogs against microbial invasion. They are the first vertebrate peptides to show lethal effects against the filamentous fungi responsible for severe opportunistic infections accompanying immunodeficiency syndrome and the use of immunosuppressive agents. A cDNA library was constructed from skin poly(A+) RNA of the arboreal frog Phyllomedusa bicolor and screened with an oligonucleotide probe complementary to the COOH terminus of dermaseptin b. Several clones contained a full-length DNA copy of a 443-nucleotide mRNA that encoded a 78-residue dermaseptin b precursor protein. The deduced precursor contained a putative signal sequence at the NH2 terminus, a 20-residue spacer sequence extremely rich (60%) in glutamic and aspartic acids, and a single copy of a dermaseptin b progenitor sequence at the COOH terminus. One clone contained a complete copy of adenoregulin, a 33-residue peptide reported to enhance the binding of agonists to the A1 adenosine receptor. The mRNAs encoding adenoregulin and dermaseptin b were very similar: 70 and 75% nucleotide identities between the 5'- and 3'-untranslated regions, respectively; 91% amino acid identity between the signal peptides; 82% identity between the acidic spacer sequences; and 38% identity between adenoregulin and dermaseptin b. Because adenoregulin and dermaseptin b have similar precursor designs and antimicrobial spectra, adenoregulin should be considered as a new member of the dermaseptin family and alternatively named dermaseptin b II. Preprodermaseptin b and preproadenoregulin have considerable sequence identities to the precursors encoding the opioid heptapeptides dermorphin, dermenkephalin, and deltorphins. This similarity extended into the 5'-untranslated regions of the mRNAs. These findings suggest that the genes encoding the four preproproteins are all members of the same family despite the fact that they encode end products having very different biological activities. These genes might contain a homologous export exon comprising the 5'-untranslated region, the 22-residue signal peptide, the 20-24-residue acidic spacer, and the basic pair Lys-Arg.
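
    The percent identities quoted in this record follow the usual idea of counting matching positions in an alignment. The Python sketch below shows one common convention, assuming two sequences that are already aligned to the same length and ignoring any column that contains a gap. The function name, the gap symbol and the choice of denominator are assumptions for illustration; the record does not say which convention the authors used.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over gap-free alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns where neither sequence has a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)
```

    Other conventions divide by the full alignment length or by the length of the shorter sequence, so reported values are only comparable when the denominator is stated.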

  1. Bioelectrochemical removal of carbon dioxide (CO2): an innovative method for biogas upgrading.

    PubMed

    Xu, Heng; Wang, Kaijun; Holmes, Dawn E

    2014-12-01

    Innovative methods for biogas upgrading based on biological/in-situ concepts have started to arouse considerable interest. Bioelectrochemical removal of CO2 for biogas upgrading was proposed here and demonstrated in both batch and continuous experiments. The in-situ biogas upgrading system seemed to perform better than the ex-situ one, but CO2 content was kept below 10% in both systems. The in-situ system's performance was further enhanced under continuous operation. Hydrogenotrophic methanogenesis and alkali production with CO2 absorption could be major contributors to biogas upgrading. Molecular studies showed that all the biocathodes associated with biogas upgrading were dominated by sequences most similar to the same hydrogenotrophic methanogen species, Methanobacterium petrolearium (97-99% sequence identity). In conclusion, bioelectrochemical removal of CO2 showed great potential for biogas upgrading. Copyright © 2014 Elsevier Ltd. All rights reserved.

  2. Rebelling for a Reason: Protein Structural “Outliers”

    PubMed Central

    Arumugam, Gandhimathi; Nair, Anu G.; Hariharaputran, Sridhar; Ramanathan, Sowdhamini

    2013-01-01

    Analysis of structural variation in domain superfamilies can reveal constraints in protein evolution which aids protein structure prediction and classification. Structure-based sequence alignment of distantly related proteins, organized in PASS2 database, provides clues about structurally conserved regions among different functional families. Some superfamily members show large structural differences which are functionally relevant. This paper analyses the impact of structural divergence on function for multi-member superfamilies, selected from the PASS2 superfamily alignment database. Functional annotations within superfamilies, with structural outliers or ‘rebels’, are discussed in the context of structural variations. Overall, these data reinforce the idea that functional similarities cannot be extrapolated from mere structural conservation. The implication for fold-function prediction is that the functional annotations can only be inherited with very careful consideration, especially at low sequence identities. PMID:24073209

  3. The future is now: single-cell genomics of bacteria and archaea

    PubMed Central

    Blainey, Paul C.

    2013-01-01

    Interest in the expanding catalog of uncultivated microorganisms, increasing recognition of heterogeneity among seemingly similar cells, and technological advances in whole-genome amplification and single-cell manipulation are driving considerable progress in single-cell genomics. Here, the spectrum of applications for single-cell genomics, key advances in the development of the field, and emerging methodology for single-cell genome sequencing are reviewed by example with attention to the diversity of approaches and their unique characteristics. Experimental strategies transcending specific methodologies are identified and organized as a road map for future studies in single-cell genomics of environmental microorganisms. Over the next decade, increasingly powerful tools for single-cell genome sequencing and analysis will play key roles in accessing the genomes of uncultivated organisms, determining the basis of microbial community functions, and fundamental aspects of microbial population biology. PMID:23298390

  4. ITS all right mama: investigating the formation of chimeric sequences in the ITS2 region by DNA metabarcoding analyses of fungal mock communities of different complexities.

    PubMed

    Bjørnsgaard Aas, Anders; Davey, Marie Louise; Kauserud, Håvard

    2017-07-01

    The formation of chimeric sequences can create significant methodological bias in PCR-based DNA metabarcoding analyses. During mixed-template amplification of barcoding regions, chimera formation is frequent and well documented. However, profiling of fungal communities typically uses the more variable rDNA region ITS. Due to a larger research community, tools for chimera detection have been developed mainly for the 16S/18S markers. However, these tools are widely applied to the ITS region without verification of their performance. We examined the rate of chimera formation during amplification and 454 sequencing of the ITS2 region from fungal mock communities of different complexities. We evaluated the chimera detecting ability of two common chimera-checking algorithms: perseus and uchime. Large proportions of the chimeras reported were false positives. No false negatives were found in the data set. Verified chimeras accounted for only 0.2% of the total ITS2 reads, which is considerably less than what is typically reported in 16S and 18S metabarcoding analyses. Verified chimeric 'parent sequences' had significantly higher per cent identity to one another than to random members of the mock communities. Community complexity increased the rate of chimera formation. GC content was higher around the verified chimeric break points, potentially facilitating chimera formation through base pair mismatching in the neighbouring regions of high similarity in the chimeric region. We conclude that the hypervariable nature of the ITS region seems to buffer the rate of chimera formation in comparison with other, less variable barcoding regions, due to shorter regions of high sequence similarity. © 2016 John Wiley & Sons Ltd.

  5. Whole genome sequence revealed the fine transmission map of carbapenem-resistant Klebsiella pneumonia isolates within a nosocomial outbreak.

    PubMed

    Sui, Wenjun; Zhou, Haijian; Du, Pengcheng; Wang, Lijun; Qin, Tian; Wang, Mei; Ren, Hongyu; Huang, Yanfei; Hou, Jing; Chen, Chen; Lu, Xinxin

    2018-01-01

    Carbapenem-resistant Klebsiella pneumoniae (CRKP) is a major cause of nosocomial infections worldwide. The transmission route of CRKP isolates within an outbreak is rarely described. This study aimed to reveal the molecular characteristics and transmission route of CRKP isolates within an outbreak of nosocomial infection. Case information collection, active screening, and targeted environmental monitoring were carried out. The antibiotic susceptibility, drug-resistance genes, molecular subtype and whole genome sequence of CRKP strains were analyzed. Between October and December 2011, 26 CRKP isolates were collected from eight patients in a surgical intensive care unit and subsequent transfer wards of Beijing Tongren Hospital, China. All 26 isolates harbored blaKPC-2, blaSHV-1, and blaCTX-M-15 genes, had the same or similar pulsed-field gel electrophoresis patterns, and belonged to the sequence type 11 (ST11) clone. By comprehensive consideration of genomic and epidemiological information, a putative transmission map was constructed, including identifying one case as an independent event distinct from the other seven cases, and revealing two transmissions starting from the same case. This study provided the first report confirming an outbreak caused by K. pneumoniae ST11 clone co-harboring the blaKPC-2, blaCTX-M-15, and blaSHV-1 genes, and suggested that comprehensive consideration of genomic and epidemiological data can yield a fine transmission map of an outbreak and facilitate the control of nosocomial transmission.

  6. De Novo Protein Structure Prediction

    NASA Astrophysics Data System (ADS)

    Hung, Ling-Hong; Ngan, Shing-Chung; Samudrala, Ram

    An unparalleled amount of sequence data is being made available from large-scale genome sequencing efforts. The data provide a shortcut to the determination of the function of a gene of interest, as long as there is an existing sequenced gene with similar sequence and of known function. This has spurred structural genomic initiatives with the goal of determining as many protein folds as possible (Brenner and Levitt, 2000; Burley, 2000; Brenner, 2001; Heinemann et al., 2001). The purpose of this is twofold: First, the structure of a gene product can often lead to direct inference of its function. Second, since the function of a protein is dependent on its structure, direct comparison of the structures of gene products can be more sensitive than the comparison of sequences of genes for detecting homology. Presently, structural determination by crystallography and NMR techniques is still slow and expensive in terms of manpower and resources, despite attempts to automate the processes. Computer structure prediction algorithms, while not providing the accuracy of the traditional techniques, are extremely quick and inexpensive and can provide useful low-resolution data for structure comparisons (Bonneau and Baker, 2001). Given the immense number of structures which the structural genomic projects are attempting to solve, there would be a considerable gain even if the computer structure prediction approach were applicable to a subset of proteins.

  7. Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification.

    PubMed

    Borozan, Ivan; Watt, Stuart; Ferretti, Vincent

    2015-05-01

    Alignment-based sequence similarity searches, while accurate for some types of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim to improve the accuracy with which DNA and protein sequences are characterized. Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
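
    The idea of combining several similarity measures into one weighted score can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' model: the k-mer Jaccard index stands in for an alignment-free measure, the weights are supplied by the caller instead of being learned per sequence, and all function names and example values are hypothetical.

```python
def kmer_set(seq: str, k: int = 3) -> set:
    """All overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_jaccard(a: str, b: str, k: int = 3) -> float:
    """Alignment-free similarity: shared k-mers over all k-mers seen."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def combined_score(scores: dict, weights: dict) -> float:
    """Weighted average of per-measure similarity scores."""
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * s for name, s in scores.items()) / total
```

    A caller might pass `{"alignment": 0.8, "kmer": kmer_jaccard(x, y)}` as `scores` together with one weight per measure. In the model described above the weights differ from sequence to sequence, which this fixed-weight sketch does not attempt to reproduce.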

  8. Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification

    PubMed Central

    Borozan, Ivan; Watt, Stuart; Ferretti, Vincent

    2015-01-01

    Motivation: Alignment-based sequence similarity searches, while accurate for some types of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim to improve the accuracy with which DNA and protein sequences are characterized. Results: Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. Availability and implementation: All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25573913

  9. Biodiversity hot spot on a hot spot: novel extremophile diversity in Hawaiian fumaroles.

    PubMed

    Wall, Kate; Cornell, Jennifer; Bizzoco, Richard W; Kelley, Scott T

    2015-01-06

    Fumaroles (steam vents) are the most common, yet least understood, microbial habitat in terrestrial geothermal settings. Long believed too extreme for life, recent advances in sample collection and DNA extraction methods have found that fumarole deposits and subsurface waters harbor a considerable diversity of viable microbes. In this study, we applied culture-independent molecular methods to explore fumarole deposit microbial assemblages in 15 different fumaroles in four geographic locations on the Big Island of Hawai'i. Just over half of the vents yielded sufficient high-quality DNA for the construction of 16S ribosomal RNA gene sequence clone libraries. The bacterial clone libraries contained sequences belonging to 11 recognized bacterial divisions and seven other division-level phylogenetic groups. Archaeal sequences were less numerous, but similarly diverse. The taxonomic composition among fumarole deposits was highly heterogeneous. Phylogenetic analysis found cloned fumarole sequences were related to microbes identified from a broad array of globally distributed ecotypes, including hot springs, terrestrial soils, and industrial waste sites. Our results suggest that fumarole deposits function as an "extremophile collector" and may be a hot spot of novel extremophile biodiversity. © 2015 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.

  10. Biodiversity hot spot on a hot spot: novel extremophile diversity in Hawaiian fumaroles

    PubMed Central

    Wall, Kate; Cornell, Jennifer; Bizzoco, Richard W; Kelley, Scott T

    2015-01-01

    Fumaroles (steam vents) are the most common, yet least understood, microbial habitat in terrestrial geothermal settings. Long believed too extreme for life, recent advances in sample collection and DNA extraction methods have found that fumarole deposits and subsurface waters harbor a considerable diversity of viable microbes. In this study, we applied culture-independent molecular methods to explore fumarole deposit microbial assemblages in 15 different fumaroles in four geographic locations on the Big Island of Hawai'i. Just over half of the vents yielded sufficient high-quality DNA for the construction of 16S ribosomal RNA gene sequence clone libraries. The bacterial clone libraries contained sequences belonging to 11 recognized bacterial divisions and seven other division-level phylogenetic groups. Archaeal sequences were less numerous, but similarly diverse. The taxonomic composition among fumarole deposits was highly heterogeneous. Phylogenetic analysis found cloned fumarole sequences were related to microbes identified from a broad array of globally distributed ecotypes, including hot springs, terrestrial soils, and industrial waste sites. Our results suggest that fumarole deposits function as an “extremophile collector” and may be a hot spot of novel extremophile biodiversity. PMID:25565172

  11. Evaluation of atpB nucleotide sequences for phylogenetic studies of ferns and other pteridophytes.

    PubMed

    Wolf, P

    1997-10-01

    Inferring basal relationships among vascular plants poses a major challenge to plant systematists. The divergence events that describe these relationships occurred long ago and considerable homoplasy has since accrued for both molecular and morphological characters. A potential solution is to examine phylogenetic analyses from multiple data sets. Here I present a new source of phylogenetic data for ferns and other pteridophytes. I sequenced the chloroplast gene atpB from 23 pteridophyte taxa and used maximum parsimony to infer relationships. A 588-bp region of the gene appeared to contain a statistically significant amount of phylogenetic signal and the resulting trees were largely congruent with similar analyses of nucleotide sequences from rbcL. However, a combined analysis of atpB plus rbcL produced a better resolved tree than did either data set alone. In the shortest trees, leptosporangiate ferns formed a monophyletic group. Also, I detected a well-supported clade of Psilotaceae (Psilotum and Tmesipteris) plus Ophioglossaceae (Ophioglossum and Botrychium). The demonstrated utility of atpB suggests that sequences from this gene should play a role in phylogenetic analyses that incorporate data from chloroplast genes, nuclear genes, morphology, and fossil data.

  12. Ebolavirus comparative genomics

    PubMed Central

    Jun, Se-Ran; Leuze, Michael R.; Nookaew, Intawat; Uberbacher, Edward C.; Land, Miriam; Zhang, Qian; Wanchai, Visanu; Chai, Juanjuan; Nielsen, Morten; Trolle, Thomas; Lund, Ole; Buzard, Gregory S.; Pedersen, Thomas D.; Wassenaar, Trudy M.; Ussery, David W.

    2015-01-01

    The 2014 Ebola outbreak in West Africa is the largest documented for this virus. To examine the dynamics of this genome, we compare more than 100 currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP) and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. This information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). PMID:26175035

  13. Information categorization approach to literary authorship disputes

    NASA Astrophysics Data System (ADS)

    Yang, Albert C.-C.; Peng, C.-K.; Yien, H.-W.; Goldberger, Ary L.

    2003-11-01

    Scientific analysis of the linguistic styles of different authors has generated considerable interest. We present a generic approach to measuring the similarity of two symbolic sequences that requires minimal background knowledge about a given human language. Our analysis is based on word rank order-frequency statistics and phylogenetic tree construction. We demonstrate the applicability of this method to historic authorship questions related to the classic Chinese novel “The Dream of the Red Chamber,” to the plays of William Shakespeare, and to the Federalist papers. This method may also provide a simple approach to other large databases based on their information content.
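
    A simplified version of a rank order-frequency comparison can be written as follows. The sketch ranks the words of each text by how often they occur and averages the absolute rank difference over the words the two texts share. It is an assumption-laden illustration only: the record does not give the authors' exact distance, and the whitespace tokenization, the tie-breaking rule and both function names are hypothetical choices.

```python
from collections import Counter


def rank_by_frequency(text: str) -> dict:
    """Map each word to its rank (1 = most frequent) within the text."""
    counts = Counter(text.lower().split())
    # Sort by descending count, then alphabetically so ties are deterministic.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: rank for rank, word in enumerate(ordered, start=1)}


def rank_distance(text_a: str, text_b: str) -> float:
    """Mean absolute rank difference over words present in both texts."""
    ranks_a = rank_by_frequency(text_a)
    ranks_b = rank_by_frequency(text_b)
    shared = ranks_a.keys() & ranks_b.keys()
    if not shared:
        return float("inf")  # no common vocabulary to compare
    return sum(abs(ranks_a[w] - ranks_b[w]) for w in shared) / len(shared)
```

    Pairwise distances of this kind can be collected into a matrix and passed to a tree-building method, which is the role the phylogenetic tree construction plays in the record above.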

  14. A generalized global alignment algorithm.

    PubMed

    Huang, Xiaoqiu; Chao, Kun-Mao

    2003-01-22

    Homologous sequences are sometimes similar over some regions but different over other regions. Homologous sequences have a much lower global similarity if the different regions are much longer than the similar regions. We present a generalized global alignment algorithm for comparing sequences with intermittent similarities, an ordered list of similar regions separated by different regions. A generalized global alignment model is defined to handle sequences with intermittent similarities. A dynamic programming algorithm is designed to compute an optimal general alignment in time proportional to the product of sequence lengths and in space proportional to the sum of sequence lengths. The algorithm is implemented as a computer program named GAP3 (Global Alignment Program Version 3). The generalized global alignment model is validated by experimental results produced with GAP3 on both DNA and protein sequences. The GAP3 program extends the ability of standard global alignment programs to recognize homologous sequences of lower similarity. The GAP3 program is freely available for academic use at http://bioinformatics.iastate.edu/aat/align/align.html.
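
    To make the stated time and space bounds concrete, the Python sketch below computes the score of a standard global alignment (Needleman-Wunsch with a linear gap penalty) while keeping only two rows of the dynamic programming table. It is not the GAP3 algorithm and does not implement the generalized model with unaligned difference regions; the function name and the scoring values are assumptions chosen for illustration.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Score of an optimal global alignment of a and b (linear gap cost)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # First column: a[:i] aligned against an empty prefix of b.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]
```

    The nested loops take time proportional to the product of the two lengths, and the two rows use space proportional to the length of b. Recovering the alignment itself in linear space needs an additional divide-and-conquer step, which this sketch omits.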

  15. Genome sequence analysis of dengue virus 1 isolated in Key West, Florida.

    PubMed

    Shin, Dongyoung; Richards, Stephanie L; Alto, Barry W; Bettinardi, David J; Smartt, Chelsea T

    2013-01-01

    Dengue virus (DENV) is transmitted to humans through the bite of mosquitoes. In November 2010, a dengue outbreak was reported in Monroe County in southern Florida (FL), including greater than 20 confirmed human cases. The virus collected from the human cases was verified as DENV serotype 1 (DENV-1) and one isolate was provided for sequence analysis. RNA was extracted from the DENV-1 isolate and was used in reverse transcription polymerase chain reaction (RT-PCR) to amplify PCR fragments to sequence. Nucleic acid primers were designed to generate overlapping PCR fragments that covered the entire genome. The DENV-1 isolate found in Key West (KW), FL was sequenced for whole genome characterization. Sequence assembly, Genbank searches, and recombination analyses were performed to verify the identity of the genome sequences and to determine percent similarity to known DENV-1 sequences. We show that the KW DENV-1 strain is 99% identical to Nicaraguan and Mexican DENV-1 strains. Phylogenetic and recombination analyses suggest that the DENV-1 isolated in KW originated from Nicaragua (NI) and the KW strain may circulate in KW. Also, recombination analysis results detected recombination events in the KW strain compared to DENV-1 strains from Puerto Rico. We evaluate the relative growth of KW strain of DENV-1 compared to other dengue viruses to determine whether the underlying genetics of the strain is associated with a replicative advantage, an important consideration since local transmission of DENV may result because domestic tourism can spread DENVs.

  16. The eukaryotic genome is structurally and functionally more like a social insect colony than a book.

    PubMed

    Qiu, Guo-Hua; Yang, Xiaoyan; Zheng, Xintian; Huang, Cuiqin

    2017-11-01

    Traditionally, the genome has been described as the 'book of life'. However, the metaphor of a book may not reflect the dynamic nature of the structure and function of the genome. In the eukaryotic genome, the number of centrally located protein-coding sequences is relatively constant across species, but the amount of noncoding DNA increases considerably with increasing evolutionary complexity of the organism. Therefore, it has been hypothesized that the abundant peripheral noncoding DNA protects the genome and the central protein-coding sequences in the eukaryotic genome. Upon comparison with the habitation, sociality and defense mechanisms of a social insect colony, it is found that the genome is similar to a social insect colony in various aspects. A social insect colony may thus be a better metaphor than a book to describe the spatial organization and physical functions of the genome. The potential implications of the metaphor are also discussed.

  17. Genetic analysis of porcine circovirus type 2 from dead minks.

    PubMed

    Wang, Gui-Sheng; Sun, Na; Tian, Fu-Lin; Wen, Yong-Jun; Xu, Cong; Li, Jun; Chen, Qiang; Wang, Jin-Bao

    2016-09-01

    Circovirus infection is a growing problem in the field of veterinary and public health. It is associated with enteric diseases in both mammalian and avian hosts. In this study, we detected and isolated porcine circovirus strains in tissue samples from minks that died from diarrhoea in Shandong Province, China. We sequenced the whole genomes of two porcine circovirus strains, designated MiSD-1 and MiSD-2, which shared 97.34% nucleotide sequence similarity and were closely related to porcine circovirus type 2 (PCV2), but distantly related to mink circoviral species. Phylogenetically, MiSD-1 and MiSD-2 are part of the PCV2b genotype cluster, which is a highly prevalent genotype worldwide. The closer relationship of MiSD-1 and MiSD-2 to PCV2 from pigs than to other mink circoviral species may be evidence of cross-species transmission and considerable zoonotic potential.
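
    As an illustration only, a pairwise figure such as the 97.34% quoted above can be computed from two aligned sequences as the share of matching columns. The Python sketch below is not the study's method: the function name `percent_identity`, the decision to skip columns that contain a gap, and the toy fragments are all assumptions, and published "similarity" values may be defined differently by the alignment software used.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent of aligned, gap-free columns in which both sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: columns with a gap are excluded
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical; not MiSD-1/MiSD-2 data)
print(round(percent_identity("ACGTACGTAC", "ACGTTCGTAC"), 2))
```

    Whether gapped columns count as mismatches changes the result, so the convention should be stated alongside any reported percentage.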

  18. Application of cryopreservation to genetic analyses of a photosynthetic picoeukaryote community.

    PubMed

    Kawachi, Masanobu; Kataoka, Takafumi; Sato, Mayumi; Noël, Mary-Hélène; Kuwata, Akira; Demura, Mikihide; Yamaguchi, Haruyo

    2016-02-01

    Cryopreservation is useful for long-term maintenance of living strains in microbial culture collections. We applied this technique to environmental specimens from two monitoring sites at Sendai Bay, Japan and compared the microbial diversity of photosynthetic picoeukaryotes in samples before and after cryopreservation. Flow cytometry (FCM) showed no considerable differences between specimens. We used 2500 cells sorted with FCM for next-generation sequencing of 18S rRNA gene amplicons and after removing low-quality sequences obtained 10,088-37,454 reads. Cluster analysis and comparative correlation analysis of observed high-level operational taxonomic units indicated similarity between specimens before and after cryopreservation. The effects of cryopreservation on cells were assessed with representative culture strains, including fragile cryptophyte cells. We confirmed the usefulness of cryopreservation for genetic studies on environmental specimens, and found that small changes in FCM cytograms after cryopreservation may affect biodiversity estimation. Copyright © 2015 Elsevier B.V. All rights reserved.

  19. A Method for WD40 Repeat Detection and Secondary Structure Prediction

    PubMed Central

    Wang, Yang; Jiang, Fan; Zhuo, Zhu; Wu, Xian-Hui; Wu, Yun-Dong

    2013-01-01

    WD40-repeat proteins (WD40s), as one of the largest protein families in eukaryotes, play vital roles in assembling protein-protein/DNA/RNA complexes. WD40s fold into similar β-propeller structures despite diversified sequences. A program, WDSP (WD40 repeat protein Structure Predictor), has been developed to accurately identify WD40 repeats and predict their secondary structures. The method is designed specifically for WD40 proteins by incorporating both local residue information and non-local family-specific structural features. It overcomes the problem of highly diversified protein sequences and variable loops. In addition, WDSP achieves a better prediction in identifying multiple WD40-domain proteins by taking the global combination of repeats into consideration. In secondary structure prediction, the average Q3 accuracy of WDSP in a jack-knife test reaches 93.7%. A disease-related protein, LRRK2, was used as a representative example to demonstrate the structure prediction. PMID:23776530

  20. Crystal structure of a triacylglycerol lipase from Penicillium expansum at 1.3 Å determined by sulfur SAD

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bian, Chuanbing; Yuan, Cai; Chen, Liqing

    2010-04-05

    Triacylglycerol lipases (EC 3.1.1.3) are present in many different organisms including animals, plants, and microbes. Lipases catalyze the hydrolysis of long-chain triglycerides into fatty acids and glycerol at the interface between the water-insoluble substrate and the aqueous phase. Lipases can also catalyze the reverse esterification reaction to form glycerides under certain conditions. Lipases of microbial origin are of considerable commercial interest for a wide variety of biotechnological applications in industries including detergent, food, cosmetic, pharmaceutical, fine chemicals, and biodiesel. Microbial lipases have become one of the most important classes of industrial enzymes. PEL (Penicillium expansum lipase) is a fungal lipase from Penicillium expansum strain PF898, isolated from Chinese soil, that has been subjected to several generations of mutagenesis to increase its enzymatic activity. PEL belongs to the triacylglycerol lipase family, and its catalytic characteristics have been studied. The enzyme has been used in the Chinese laundry detergent industry for several years (http://www.leveking.com). However, the poor thermal stability of the enzyme limits its application. To further study and improve this enzyme, PEL was cloned and sequenced. Furthermore, it was overexpressed in Pichia pastoris. PEL contains the GHSLG sequence, which matches the lipase consensus sequence Gly-X1-Ser-X2-Gly, but has low amino acid sequence identity to other lipases. The most similar lipases are those from Rhizomucor miehei (PML) and Rhizopus niveus (PNL), with 21% and 20% sequence identity to PEL, respectively. Interestingly, the similarity of PEL to the known esterases is somewhat higher, with 24% sequence identity to feruloyl esterase A. Here, we report the 1.3 Å resolution crystal structure of PEL determined by sulfur SAD phasing. This structure not only presents a new lipase structure at high resolution, but also provides a structural platform to analyze the published mutagenesis results. The structure may also open up new avenues for future protein engineering studies on PEL.

  1. Nucleotide sequences and regulational analysis of genes involved in conversion of aniline to catechol in Pseudomonas putida UCC22(pTDN1).

    PubMed Central

    Fukumori, F; Saint, C P

    1997-01-01

    A 9,233-bp HindIII fragment of the aromatic amine catabolic plasmid pTDN1, isolated from a derivative of Pseudomonas putida mt-2 (UCC22), confers the ability to degrade aniline on P. putida KT2442. The fragment encodes six open reading frames which are arranged in the same direction. Their 5' upstream region is part of the direct-repeat sequence of pTDN1. Nucleotide sequence of 1.8 kb of the repeat sequence revealed only a single base pair change compared to the known sequence of IS1071 which is involved in the transposition of the chlorobenzoate genes (C. Nakatsu, J. Ng, R. Singh, N. Straus, and C. Wyndham, Proc. Natl. Acad. Sci. USA 88:8312-8316, 1991). Four open reading frames encode proteins with considerable homology to proteins found in other aromatic-compound degradation pathways. On the basis of sequence similarity, these genes are proposed to encode the large and small subunits of aniline oxygenase (tdnA1 and tdnA2, respectively), a reductase (tdnB), and a LysR-type regulatory gene (tdnR). The putative large subunit has a conserved [2Fe-2S]R Rieske-type ligand center. Two genes, tdnQ and tdnT, which may be involved in amino group transfer, are localized upstream of the putative oxygenase genes. The tdnQ gene product shares about 30% similarity with glutamine synthetases; however, a pUC-based plasmid carrying tdnQ did not support the growth of an Escherichia coli glnA strain in the absence of glutamine. TdnT possesses domains that are conserved among amidotransferases. The tdnQ, tdnA1, tdnA2, tdnB, and tdnR genes are essential for the conversion of aniline to catechol. PMID:8990291

  2. Sequence-similar, structure-dissimilar protein pairs in the PDB.

    PubMed

    Kosloff, Mickey; Kolodny, Rachel

    2008-05-01

    It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such protein pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular, sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information and one or the other might be preferable in a given application. Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information. 
We have established a database of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).

  3. Application of phase consistency to improve time efficiency and image quality in dual echo black-blood carotid angiography.

    PubMed

    Kholmovski, Eugene G; Parker, Dennis L

    2005-07-01

    There is a considerable similarity between proton density-weighted (PDw) and T2-weighted (T2w) images acquired by dual echo fast spin-echo (FSE) sequences. The similarity manifests itself not only in image space as correspondence between intensities of PDw and T2w images, but also in phase space as consistency between phases of PDw and T2w images. Methods for improving the imaging efficiency and image quality of dual echo FSE sequences based on this feature have been developed. The total scan time of dual echo FSE acquisition may be reduced by as much as 25% by incorporating an estimate of the image phase from a fully sampled PDw image when reconstructing partially sampled T2w images. The quality of T2w images acquired using phased array coils may be significantly improved by using the developed noise reduction reconstruction scheme, which is based on the correspondence between the PDw and T2w image intensities and the consistency between the PDw and T2w image phases. Studies of phantom and human subject MRI data were performed to evaluate the effectiveness of the techniques.

  4. Atrial natriuretic peptide synthesis in atrial tumors of transgenic mice.

    PubMed

    Gardner, D G; Camargo, M J; Behringer, R R; Brinster, R L; Baxter, J D; Atlas, S A; Laragh, J H; Deschepper, C F

    1992-04-01

    Transgenic mice harboring a chimeric gene linking mouse protamine 1 5'-flanking sequence to the coding sequence of the simian virus 40 T-antigen develop spontaneous rhabdomyosarcomas of the right atria. The presence of the tumors is accompanied by dramatic elevations in plasma atrial natriuretic peptide (ANP) immunoreactivity (1,698 +/- 993 vs. 60 +/- 18 fmol/ml for controls) and hematocrit (56 +/- 8 vs. 51 +/- 2 for controls). The immunoreactive ANP (irANP) present in the tumors is similar in size to irANP found in normal mouse atria. ANP mRNA transcripts present in the tumors also appear to be very similar in overall size and 5'-termini to those produced in normal cardiac tissue. Microscopically, the tumors are composed of a disorganized array of densely packed abnormal-appearing cells. Immunocytochemistry and in situ hybridization analysis reveal considerable heterogeneity in ANP gene expression. ANP peptide and mRNA are detectable throughout the parenchyma of the tumors, but absolute levels of expression vary widely among different cells in the population. These tumors represent a potentially valuable model for the study of inappropriate ANP secretion and may provide a tissue source for the development of an ANP-producing atrial cell line.

  5. Deep sequencing reveals exceptional diversity and modes of transmission for bacterial sponge symbionts.

    PubMed

    Webster, Nicole S; Taylor, Michael W; Behnam, Faris; Lücker, Sebastian; Rattei, Thomas; Whalan, Stephen; Horn, Matthias; Wagner, Michael

    2010-08-01

    Marine sponges contain complex bacterial communities of considerable ecological and biotechnological importance, with many of these organisms postulated to be specific to sponge hosts. Testing this hypothesis in light of the recent discovery of the rare microbial biosphere, we investigated three Australian sponges by massively parallel 16S rRNA gene tag pyrosequencing. Here we show bacterial diversity that is unparalleled in an invertebrate host, with more than 250,000 sponge-derived sequence tags being assigned to 23 bacterial phyla and revealing up to 2996 operational taxonomic units (95% sequence similarity) per sponge species. Of the 33 previously described 'sponge-specific' clusters that were detected in this study, 48% were found exclusively in adults and larvae - implying vertical transmission of these groups. The remaining taxa, including 'Poribacteria', were also found at very low abundance among the 135,000 tags retrieved from surrounding seawater. Thus, members of the rare seawater biosphere may serve as seed organisms for widely occurring symbiont populations in sponges and their host association might have evolved much more recently than previously thought. © 2009 Society for Applied Microbiology and Blackwell Publishing Ltd.

  6. Deep sequencing reveals exceptional diversity and modes of transmission for bacterial sponge symbionts

    PubMed Central

    Webster, Nicole S; Taylor, Michael W; Behnam, Faris; Lücker, Sebastian; Rattei, Thomas; Whalan, Stephen; Horn, Matthias; Wagner, Michael

    2010-01-01

    Marine sponges contain complex bacterial communities of considerable ecological and biotechnological importance, with many of these organisms postulated to be specific to sponge hosts. Testing this hypothesis in light of the recent discovery of the rare microbial biosphere, we investigated three Australian sponges by massively parallel 16S rRNA gene tag pyrosequencing. Here we show bacterial diversity that is unparalleled in an invertebrate host, with more than 250 000 sponge-derived sequence tags being assigned to 23 bacterial phyla and revealing up to 2996 operational taxonomic units (95% sequence similarity) per sponge species. Of the 33 previously described ‘sponge-specific’ clusters that were detected in this study, 48% were found exclusively in adults and larvae – implying vertical transmission of these groups. The remaining taxa, including ‘Poribacteria’, were also found at very low abundance among the 135 000 tags retrieved from surrounding seawater. Thus, members of the rare seawater biosphere may serve as seed organisms for widely occurring symbiont populations in sponges and their host association might have evolved much more recently than previously thought. PMID:21966903

  7. Drosophila Melanogaster Mitochondrial DNA: Gene Organization and Evolutionary Considerations

    PubMed Central

    Garesse, R.

    1988-01-01

    The sequence of an 8351-nucleotide mitochondrial DNA (mtDNA) fragment has been obtained, extending the knowledge of the Drosophila melanogaster mitochondrial genome to 90% of its coding region. The sequence encodes seven polypeptides, 12 tRNAs and the 3' end of the 16S rRNA and CO III genes. The gene organization is strictly conserved with respect to the Drosophila yakuba mitochondrial genome, and different from that found in mammals and Xenopus. The high A + T content of D. melanogaster mitochondrial DNA is reflected in a reiterative codon usage, with more than 90% of the codons ending in T or A, G + C rich codons being practically absent. The average level of homology between the D. melanogaster and D. yakuba sequences is very high (roughly 94%), although insertions and deletions have been detected in protein, tRNA and large ribosomal genes. The analysis of nucleotide changes reveals a similar frequency for transitions and transversions, and reflects a strong bias against G + C on both strands. The predominant type of transition is strand specific. PMID:3130291
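
    The two statistics mentioned above, the share of codons ending in A or T and the split between transitions and transversions, can be tallied with a few lines of code. The following Python sketch is illustrative only: the function names, the handling of gaps and ambiguous bases, and the toy sequences are assumptions and do not reproduce the paper's analysis.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def substitution_counts(seq_a: str, seq_b: str) -> dict:
    """Count transitions and transversions between two aligned DNA sequences."""
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in BASES or b not in BASES:
            continue  # identical columns, gaps and ambiguous bases are skipped
        # purine<->purine or pyrimidine<->pyrimidine is a transition
        same_class = (a in PURINES) == (b in PURINES)
        counts["transition" if same_class else "transversion"] += 1
    return counts


def at_ending_fraction(coding_seq: str) -> float:
    """Fraction of complete codons whose third base is A or T."""
    seq = coding_seq.upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence contains no complete codon")
    return sum(codon[2] in "AT" for codon in codons) / len(codons)


# Toy inputs (hypothetical; not Drosophila mtDNA)
print(substitution_counts("AACC", "GCTA"))
print(at_ending_fraction("ATTGCAGGC"))
```

    Both functions assume the reading frame and the alignment are already known; neither is inferred from the input.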

  8. Mechanistic considerations on the wavelength-dependent variations of UVR genotoxicity and mutagenesis in skin: the discrimination of UVA-signature from UV-signature mutation.

    PubMed

    Ikehata, Hironobu

    2018-05-31

    Ultraviolet radiation (UVR) predominantly induces UV-signature mutations, C → T and CC → TT base substitutions at dipyrimidine sites, in the cellular and skin genome. I observed in our in vivo mutation studies of mouse skin that these UVR-specific mutations show a wavelength-dependent variation in their sequence-context preference. The C → T mutation occurs most frequently in the 5'-TCG-3' sequence regardless of the UVR wavelength, but is recovered more preferentially there as the wavelength increases, resulting in prominent occurrences exclusively in the TCG sequence in the UVA wavelength range, which I will designate as a "UVA signature" in this review. The preference of the UVB-induced C → T mutation for the sequence contexts shows a mixed pattern of UVC- and UVA-induced mutations, and a similar pattern is also observed for natural sunlight, in which UVB is the most genotoxic component. In addition, the CC → TT mutation hardly occurs at UVA1 wavelengths, although it is detected rarely but constantly in the UVC and UVB ranges. This wavelength-dependent variation in the sequence-context preference of the UVR-specific mutations could be explained by two different photochemical mechanisms of cyclobutane pyrimidine dimer (CPD) formation. The UV-signature mutations observed in the UVC and UVB ranges are known to be caused mainly by CPDs produced through the conventional singlet/triplet excitation of pyrimidine bases after the direct absorption of the UVC/UVB photon energy in those bases. On the other hand, a novel photochemical mechanism through the direct absorption of the UVR energy to double-stranded DNA, which is called "collective excitation", has been proposed for the UVA-induced CPD formation. 
The UVA photons directly absorbed by DNA produce CPDs with a sequence context preference different from that observed for CPDs caused by the UVC/UVB-mediated singlet/triplet excitation, causing CPD formation preferentially at thymine-containing dipyrimidine sites and probably also preferably at methyl CpG-associated dipyrimidine sites, which include the TCG sequence. In this review, I present a mechanistic consideration on the wavelength-dependent variation of the sequence context preference of the UVR-specific mutations and rationalize the proposition of the UVA-signature mutation, in addition to the UV-signature mutation.

  9. Subgrouping Automata: automatic sequence subgrouping using phylogenetic tree-based optimum subgrouping algorithm.

    PubMed

    Seo, Joo-Hyun; Park, Jihyang; Kim, Eun-Mi; Kim, Juhan; Joo, Keehyoung; Lee, Jooyoung; Kim, Byung-Gee

    2014-02-01

    Sequence subgrouping for a given sequence set can enable various informative tasks such as the functional discrimination of sequence subsets and the functional inference of unknown sequences. Because an identity threshold for sequence subgrouping may vary according to the given sequence set, it is highly desirable to construct a robust subgrouping algorithm which automatically identifies an optimal identity threshold and generates subgroups for a given sequence set. To this end, an automatic sequence subgrouping method named 'Subgrouping Automata' (SA) was constructed. First, the tree analysis module analyzes the structure of the tree and calculates all possible subgroups at each node. The sequence similarity analysis module calculates the average sequence similarity for all subgroups at each node. The representative sequence generation module finds a representative sequence for each subgroup using profile analysis and self-scoring. For all nodes, average sequence similarities are calculated, and 'Subgrouping Automata' searches for the node showing the statistically maximum increase in sequence similarity using Student's t-value. The node showing the maximum t-value, which gives the most significant difference in average sequence similarity between two adjacent nodes, is determined to be the optimum subgrouping node in the phylogenetic tree. Further analysis showed that the optimum subgrouping node from SA prevents under-subgrouping and over-subgrouping. Copyright © 2013. Published by Elsevier Ltd.
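
    The node-selection step described above can be sketched roughly as follows. This is a reading of the abstract, not the published implementation: the pooled-variance two-sample form of Student's t, the data layout (one list of subgroup average similarities per node, ordered from root to leaves), and the names `student_t` and `optimum_node` are all assumptions, as are the toy values.

```python
from math import sqrt
from statistics import mean, variance


def student_t(before, after):
    """Pooled-variance two-sample t-value for the increase from before to after."""
    n1, n2 = len(before), len(after)
    pooled = ((n1 - 1) * variance(before) + (n2 - 1) * variance(after)) / (n1 + n2 - 2)
    return (mean(after) - mean(before)) / sqrt(pooled * (1 / n1 + 1 / n2))


def optimum_node(similarities_by_node):
    """Return (index, t) of the node with the largest t-value versus its predecessor."""
    best_index, best_t = None, float("-inf")
    for i in range(1, len(similarities_by_node)):
        t = student_t(similarities_by_node[i - 1], similarities_by_node[i])
        if t > best_t:
            best_index, best_t = i, t
    return best_index, best_t


# Hypothetical average similarities of the subgroups formed at four adjacent nodes
nodes = [
    [0.30, 0.32, 0.31],
    [0.33, 0.34, 0.32],
    [0.60, 0.62, 0.61],
    [0.63, 0.62, 0.64],
]
print(optimum_node(nodes))
```

    The sketch assumes every node has at least two subgroup values with non-zero spread; otherwise the variance or the division is undefined.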

  10. Large-Scale Concatenation cDNA Sequencing

    PubMed Central

    Yu, Wei; Andersson, Björn; Worley, Kim C.; Muzny, Donna M.; Ding, Yan; Liu, Wen; Ricafrente, Jennifer Y.; Wentland, Meredith A.; Lennon, Greg; Gibbs, Richard A.

    1997-01-01

    A total of 100 kb of DNA derived from 69 individual human brain cDNA clones of 0.7–2.0 kb were sequenced by concatenated cDNA sequencing (CCS), whereby multiple individual DNA fragments are sequenced simultaneously in a single shotgun library. The method yielded accurate sequences and a similar efficiency compared with other shotgun libraries constructed from single DNA fragments (>20 kb). Computer analyses were carried out on 65 cDNA clone sequences and their corresponding end sequences to examine both nucleic acid and amino acid sequence similarities in the databases. Thirty-seven clones revealed no DNA database matches, 12 clones generated exact matches (≥98% identity), and 16 clones generated nonexact matches (57%–97% identity) to either known human or other species genes. Of those 28 matched clones, 8 had corresponding end sequences that failed to identify similarities. In a protein similarity search, 27 clone sequences displayed significant matches, whereas only 20 of the end sequences had matches to known protein sequences. Our data indicate that full-length cDNA insert sequences provide significantly more nucleic acid and protein sequence similarity matches than expressed sequence tags (ESTs) for database searching. [All 65 cDNA clone sequences described in this paper have been submitted to the GenBank data library under accession nos. U79240–U79304.] PMID:9110174

  11. The complete chloroplast genome sequence of the chlorophycean green alga Scenedesmus obliquus reveals a compact gene organization and a biased distribution of genes on the two DNA strands

    PubMed Central

    de Cambiaire, Jean-Charles; Otis, Christian; Lemieux, Claude; Turmel, Monique

    2006-01-01

    Background: The phylum Chlorophyta contains the majority of the green algae and is divided into four classes. While the basal position of the Prasinophyceae is well established, the divergence order of the Ulvophyceae, Trebouxiophyceae and Chlorophyceae (UTC) remains uncertain. The five complete chloroplast DNA (cpDNA) sequences currently available for representatives of these classes display considerable variability in overall structure, gene content, gene density, intron content and gene order. Among these genomes, that of the chlorophycean green alga Chlamydomonas reinhardtii has retained the least ancestral features. The two single-copy regions, which are separated from one another by the large inverted repeat (IR), have similar sizes, rather than unequal sizes, and differ radically in both gene contents and gene organizations relative to the single-copy regions of prasinophyte and ulvophyte cpDNAs. To gain insights into the various changes that the chloroplast genome underwent during the evolution of chlorophycean green algae, we have sequenced the cpDNA of Scenedesmus obliquus, a member of a distinct chlorophycean lineage. Results: The 161,452 bp IR-containing genome of Scenedesmus features single-copy regions of similar sizes, encodes 96 genes, i.e. only two additional genes (infA and rpl12) relative to its Chlamydomonas homologue, and contains seven group I and two group II introns. It is clearly more compact than the four UTC algal cpDNAs that have been examined so far, displays the lowest proportion of short repeats among these algae and shows a stronger bias in clustering of genes on the same DNA strand compared to Chlamydomonas cpDNA. Like the latter genome, Scenedesmus cpDNA displays only a few ancestral gene clusters. 
The two chlorophycean genomes share 11 gene clusters that are not found in previously sequenced trebouxiophyte and ulvophyte cpDNAs as well as a few genes that have an unusual structure; however, their single-copy regions differ considerably in gene content. Conclusion: Our results underscore the remarkable plasticity of the chlorophycean chloroplast genome. Owing to this plasticity, only a sketchy portrait could be drawn for the chloroplast genome of the last common ancestor of Scenedesmus and Chlamydomonas. PMID:16638149

  12. Application of discrete Fourier inter-coefficient difference for assessing genetic sequence similarity.

    PubMed

    King, Brian R; Aburdene, Maurice; Thompson, Alex; Warres, Zach

    2014-01-01

    Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherent digital nature of these sequences. DSP methods have demonstrated early success for detection of coding regions in a gene. Recently, these methods are being used to establish DNA gene similarity. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transformation, which can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method that generates a genetic signature for any DNA sequence that is used to generate relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which represents a set of highly similar sequences. We compare phylogenetic trees generated using our technique against trees generated using traditional alignment techniques for similarity and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.

  13. Model-free aftershock forecasts constructed from similar sequences in the past

    NASA Astrophysics Data System (ADS)

    van der Elst, N.; Page, M. T.

    2017-12-01

    The basic premise behind aftershock forecasting is that sequences in the future will be similar to those in the past. Forecast models typically use empirically tuned parametric distributions to approximate past sequences, and project those distributions into the future to make a forecast. While parametric models do a good job of describing average outcomes, they are not explicitly designed to capture the full range of variability between sequences, and can suffer from over-tuning of the parameters. In particular, parametric forecasts may produce a high rate of "surprises" - sequences that land outside the forecast range. Here we present a non-parametric forecast method that cuts out the parametric "middleman" between training data and forecast. The method is based on finding past sequences that are similar to the target sequence, and evaluating their outcomes. We quantify similarity as the Poisson probability that the observed event count in a past sequence reflects the same underlying intensity as the observed event count in the target sequence. Event counts are defined in terms of differential magnitude relative to the mainshock. The forecast is then constructed from the distribution of past sequences outcomes, weighted by their similarity. We compare the similarity forecast with the Reasenberg and Jones (RJ95) method, for a set of 2807 global aftershock sequences of M≥6 mainshocks. We implement a sequence-specific RJ95 forecast using a global average prior and Bayesian updating, but do not propagate epistemic uncertainty. The RJ95 forecast is somewhat more precise than the similarity forecast: 90% of observed sequences fall within a factor of two of the median RJ95 forecast value, whereas the fraction is 85% for the similarity forecast. However, the surprise rate is much higher for the RJ95 forecast; 10% of observed sequences fall in the upper 2.5% of the (Poissonian) forecast range. The surprise rate is less than 3% for the similarity forecast. 
The similarity forecast may be useful to emergency managers and non-specialists when confidence or expertise in parametric forecasting may be lacking. The method makes over-tuning impossible, and minimizes the rate of surprises. At the least, this forecast constitutes a useful benchmark for more precisely tuned parametric forecasts.
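
    The abstract does not give the similarity formula, so the Python sketch below shows only one way such a weighting could be realized. It assumes equal exposure for both sequences and a flat prior on the Poisson rate, in which case the probability of the past count given the target count has a closed form; the functions `poisson_similarity` and `weighted_quantile` and the toy catalogue are hypothetical.

```python
from math import comb


def poisson_similarity(n_past: int, n_target: int) -> float:
    """P(n_past | n_target) for a shared Poisson rate with a flat prior.

    Integrating the rate out gives C(n_past + n_target, n_past) / 2**(n_past + n_target + 1).
    """
    total = n_past + n_target
    return comb(total, n_past) / 2 ** (total + 1)


def weighted_quantile(values, weights, q: float):
    """Smallest value whose cumulative weight reaches fraction q of the total."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q * total:
            return value
    return pairs[-1][0]


# Hypothetical past sequences: (early event count, eventual aftershock count)
past = [(2, 5), (3, 9), (4, 11), (12, 40), (0, 1)]
target_early_count = 3

weights = [poisson_similarity(early, target_early_count) for early, _ in past]
outcomes = [final for _, final in past]
print(weighted_quantile(outcomes, weights, 0.5))
```

    Past sequences whose early counts are close to the target's receive the largest weights, so the forecast range comes from observed outcomes rather than from a fitted parametric decay law.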

  14. De novo sequencing and characterization of floral transcriptome in two species of buckwheat (Fagopyrum)

    PubMed Central

    2011-01-01

    Background: Transcriptome sequencing data has become an integral component of modern genetics, genomics and evolutionary biology. However, despite advances in the technologies of DNA sequencing, such data are lacking for many groups of living organisms, in particular, many plant taxa. We present here the results of transcriptome sequencing for two closely related plant species. These species, Fagopyrum esculentum and F. tataricum, belong to the order Caryophyllales - a large group of flowering plants with uncertain evolutionary relationships. F. esculentum (common buckwheat) is also an important food crop. Despite these practical and evolutionary considerations, Fagopyrum species have not been the subject of large-scale sequencing projects. Results: Normalized cDNA corresponding to genes expressed in flowers and inflorescences of F. esculentum and F. tataricum was sequenced using the 454 pyrosequencing technology. This resulted in about 267,000 (F. esculentum) and 229,000 (F. tataricum) reads with an average length of 341-349 nucleotides. De novo assembly of the reads produced about 25,000 contigs for each species, with 7.5-8.2× coverage. Comparative analysis of the two transcriptomes demonstrated their overall similarity but also revealed genes that are presumably differentially expressed. Among them are retrotransposon genes and genes involved in sugar biosynthesis and metabolism. Thirteen single-copy genes were used for phylogenetic analysis; the resulting trees are largely consistent with those inferred from multigenic plastid datasets. The sister relationship of the Caryophyllales and asterids has now gained high support from nuclear gene sequences. Conclusions: 454 transcriptome sequencing and de novo assembly were performed for two congeneric flowering plant species, F. esculentum and F. tataricum. As a result, a large set of cDNA sequences that represent orthologs of known plant genes as well as potential new genes was generated. PMID:21232141

  15. MEGANTE: A Web-Based System for Integrated Plant Genome Annotation

    PubMed Central

    Numa, Hisataka; Itoh, Takeshi

    2014-01-01

    The recent advancement of high-throughput genome sequencing technologies has resulted in a considerable increase in demands for large-scale genome annotation. While annotation is a crucial step for downstream data analyses and experimental studies, this process requires substantial expertise and knowledge of bioinformatics. Here we present MEGANTE, a web-based annotation system that makes plant genome annotation easy for researchers unfamiliar with bioinformatics. Without any complicated configuration, users can perform genomic sequence annotations simply by uploading a sequence and selecting the species to query. MEGANTE automatically runs several analysis programs and integrates the results to select the appropriate consensus exon–intron structures and to predict open reading frames (ORFs) at each locus. Functional annotation, including a similarity search against known proteins and a functional domain search, are also performed for the predicted ORFs. The resultant annotation information is visualized with a widely used genome browser, GBrowse. For ease of analysis, the results can be downloaded in Microsoft Excel format. All of the query sequences and annotation results are stored on the server side so that users can access their own data from virtually anywhere on the web. The current release of MEGANTE targets 24 plant species from the Brassicaceae, Fabaceae, Musaceae, Poaceae, Salicaceae, Solanaceae, Rosaceae and Vitaceae families, and it allows users to submit a sequence up to 10 Mb in length and to save up to 100 sequences with the annotation information on the server. The MEGANTE web service is available at https://megante.dna.affrc.go.jp/. PMID:24253915

  16. Investigating Correlation between Protein Sequence Similarity and Semantic Similarity Using Gene Ontology Annotations.

    PubMed

    Ikram, Najmul; Qadir, Muhammad Abdul; Afzal, Muhammad Tanvir

    2018-01-01

    Sequence similarity is a commonly used measure to compare proteins. With the increasing use of ontologies, semantic (function) similarity is gaining importance. The correlation between these measures has been applied in the evaluation of new semantic similarity methods, and in protein function prediction. In this research, we investigate the relationship between the two similarity methods. The results suggest the absence of a strong correlation between sequence and semantic similarities. There is a large number of proteins with low sequence similarity and high semantic similarity. We observe that Pearson's correlation coefficient is not sufficient to explain the nature of this relationship. Interestingly, the term semantic similarity values above 0 and below 1 do not seem to play a role in improving the correlation. That is, the correlation coefficient depends only on the number of common GO terms in the proteins under comparison, and the semantic similarity measurement method does not influence it. Semantic similarity and sequence similarity behave distinctly. These findings have significant implications for future work on protein comparison, and will help in better understanding the semantic similarity between proteins.

  17. Efficient production of artificially designed gelatins with a Bacillus brevis system.

    PubMed

    Kajino, T; Takahashi, H; Hirai, M; Yamada, Y

    2000-01-01

    Artificially designed gelatins comprising tandemly repeated 30-amino-acid peptide units derived from human alphaI collagen were successfully produced with a Bacillus brevis system. The DNA encoding the peptide unit was synthesized by taking into consideration the codon usage of the host cells, but no clones having a tandemly repeated gene were obtained through the above-mentioned strategy. Minirepeat genes could be selected in vivo from a mixture of every possible sequence encoding an artificial gelatin by randomly ligating the mixed sequence unit and transforming it into Escherichia coli. Larger repeat genes constructed by connecting minirepeat genes obtained by in vivo selection were also stable in the expression host cells. Gelatins derived from the eight-unit and six-unit repeat genes were extracellularly produced at the level of 0.5 g/liter and easily purified by ammonium sulfate fractionation and anion-exchange chromatography. The purified artificial gelatins had the predicted N-terminal sequences and amino acid compositions and a solgel property similar to that of the native gelatin. These results suggest that the selection of a repeat unit sequence stable in an expression host is a shortcut for the efficient production of repetitive proteins and that it can conveniently be achieved by the in vivo selection method. This study revealed the possible industrial application of artificially designed repetitive proteins.

  18. miRNEST database: an integrative approach in microRNA search and annotation

    PubMed Central

    Szcześniak, Michał Wojciech; Deorowicz, Sebastian; Gapski, Jakub; Kaczyński, Łukasz; Makałowska, Izabela

    2012-01-01

    Despite accumulating data on animal and plant microRNAs and their functions, existing public miRNA resources usually collect miRNAs from a very limited number of species. Many microRNAs, including those from model organisms, remain undiscovered. As a result there is a continuous need to search for new microRNAs. We present miRNEST (http://mirnest.amu.edu.pl), a comprehensive database of animal, plant and virus microRNAs. The core part of the database is built from our miRNA predictions conducted on Expressed Sequence Tags of 225 animal and 202 plant species. The miRNA search was performed based on sequence similarity and as many as 10 004 miRNA candidates in 221 animal and 199 plant species were discovered. Of these, only 299 have already been deposited in miRBase. Additionally, miRNEST has been integrated with external miRNA data from the literature and 13 databases, which includes miRNA sequences, small RNA sequencing data, expression, polymorphisms and targets data as well as links to external miRNA resources, whenever applicable. All this makes miRNEST a considerable miRNA resource in terms of the number of species (544), one that integrates scattered miRNA data into a uniform format with a user-friendly web interface. PMID:22135287

  19. Method and apparatus for biological sequence comparison

    DOEpatents

    Marr, T.G.; Chang, W.I.

    1997-12-23

    A method and apparatus are disclosed for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence. 5 figs.

  20. Method and apparatus for biological sequence comparison

    DOEpatents

    Marr, Thomas G.; Chang, William I-Wei

    1997-01-01

    A method and apparatus for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence.
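
    The block-splitting step described in the two patent records above can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the patented method: the function name overlapping_blocks, the choice of a block length of twice the minimum alignment length, and the step of one minimum length are all hypothetical, chosen so that every window of the minimum length falls entirely inside at least one block.

```python
def overlapping_blocks(seq, min_len):
    """Split seq into overlapping blocks (hypothetical scheme, not from the patent).

    Assumption: blocks are 2 * min_len long and start every min_len positions,
    so any window of length min_len lies completely within some block.
    Returns a list of (start_offset, block_string) pairs.
    """
    if min_len < 1:
        raise ValueError("min_len must be positive")
    block_len = 2 * min_len
    blocks = []
    start = 0
    while True:
        blocks.append((start, seq[start:start + block_len]))
        # Stop once the current block reaches the end of the sequence.
        if start + block_len >= len(seq):
            break
        start += min_len
    return blocks


if __name__ == "__main__":
    for offset, block in overlapping_blocks("ABCDEFGHIJ", 2):
        print(offset, block)
```

    Overlap is what lets a filter examine each block independently without missing an alignment that straddles a block boundary; the actual block size used by the invention is not stated in the abstract.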

  1. Sequence information signal processor

    DOEpatents

    Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

    1999-01-01

    An electronic circuit is used to compare two sequences, such as genetic sequences, to determine which alignment of the sequences produces the greatest similarity. The circuit includes a linear array of series-connected processors, each of which stores a single element from one of the sequences and compares that element with each successive element in the other sequence. For each comparison, the processor generates a scoring parameter that indicates which segment ending at those two elements produces the greatest degree of similarity between the sequences. The processor uses the scoring parameter to generate a similar scoring parameter for a comparison between the stored element and the next successive element from the other sequence. The processor also delivers the scoring parameter to the next processor in the array for use in generating a similar scoring parameter for another pair of elements. The electronic circuit determines which processor and alignment of the sequences produce the scoring parameter with the highest value.
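
    The per-cell score described above (the best-scoring segment ending at a given pair of elements) corresponds to the recurrence used in local sequence alignment. The following is a minimal software sketch of that recurrence, assuming a Smith-Waterman style scheme with a linear gap penalty; the abstract does not specify the scoring values, so match, mismatch and gap are illustrative assumptions, and the function name is hypothetical. The circuit evaluates such cells in parallel across an array of processors, whereas this sketch computes them one row at a time.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b.

    Assumed scoring (not from the abstract): +2 match, -1 mismatch,
    -2 per gap position. Each cell holds the best score of any aligned
    segment ending at that pair of elements, floored at zero.
    """
    prev = [0] * (len(b) + 1)  # scores for the previous row
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                 # start a new segment here
                prev[j - 1] + s,   # extend diagonally (match or mismatch)
                prev[j] + gap,     # gap in b
                curr[j - 1] + gap  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(local_alignment_score("TTACGTT", "GGACGGG"))
```

    Only two rows are kept in memory because each cell depends on its left, upper and upper-left neighbours alone, which is also what makes the computation suitable for a linear array of processors.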

  2. Molecular structure and chromosome distribution of three repetitive DNA families in Anemone hortensis L. (Ranunculaceae).

    PubMed

    Mlinarec, Jelena; Chester, Mike; Siljak-Yakovlev, Sonja; Papes, Drazena; Leitch, Andrew R; Besendorfer, Visnja

    2009-01-01

    The structure, abundance and location of repetitive DNA sequences on chromosomes can characterize the nature of higher plant genomes. Here we report on three new repeat DNA families isolated from Anemone hortensis L.; (i) AhTR1, a family of satellite DNA (stDNA) composed of a 554-561 bp long EcoRV monomer; (ii) AhTR2, a stDNA family composed of a 743 bp long HindIII monomer and; (iii) AhDR, a repeat family composed of a 945 bp long HindIII fragment that exhibits some sequence similarity to Ty3/gypsy-like retroelements. Fluorescence in-situ hybridization (FISH) to metaphase chromosomes of A. hortensis (2n = 16) revealed that both AhTR1 and AhTR2 sequences co-localized with DAPI-positive AT-rich heterochromatic regions. AhTR1 sequences occur at intercalary DAPI bands while AhTR2 sequences occur at 8-10 terminally located heterochromatic blocks. In contrast AhDR sequences are dispersed over all chromosomes as expected of a Ty3/gypsy-like element. AhTR2 and AhTR1 repeat families include polyA- and polyT-tracks, AT/TA-motifs and a pentanucleotide sequence (CAAAA) that may have consequences for chromatin packing and sequence homogeneity. AhTR2 repeats also contain TTTAGGG motifs and degenerate variants. We suggest that they arose by interspersion of telomeric repeats with subtelomeric repeats, before hybrid unit(s) amplified through the heterochromatic domain. The three repetitive DNA families together occupy approximately 10% of the A. hortensis genome. Comparative analyses of eight Anemone species revealed that the divergence of the A. hortensis genome was accompanied by considerable modification and/or amplification of repeats.

  3. Estimation of pairwise sequence similarity of mammalian enhancers with word neighbourhood counts.

    PubMed

    Göke, Jonathan; Schulz, Marcel H; Lasserre, Julia; Vingron, Martin

    2012-03-01

    The identity of cells and tissues is to a large degree governed by transcriptional regulation. A major part is accomplished by the combinatorial binding of transcription factors at regulatory sequences, such as enhancers. Even though binding of transcription factors is sequence-specific, estimating the sequence similarity of two functionally similar enhancers is very difficult. However, a similarity measure for regulatory sequences is crucial to detect and understand functional similarities between two enhancers and will facilitate large-scale analyses like clustering, prediction and classification of genome-wide datasets. We present the standardized alignment-free sequence similarity measure N2, a flexible framework that is defined for word neighbourhoods. We explore the usefulness of adding reverse complement words as well as words including mismatches into the neighbourhood. On simulated enhancer sequences as well as functional enhancers in mouse development, N2 is shown to outperform previous alignment-free measures. N2 is flexible, faster than competing methods and less susceptible to single sequence noise and the occurrence of repetitive sequences. Experiments on the mouse enhancers reveal that enhancers active in different tissues can be separated by pairwise comparison using N2. N2 represents an improvement over previous alignment-free similarity measures without compromising speed, which makes it a good candidate for large-scale sequence comparison of regulatory sequences. The software is part of the open-source C++ library SeqAn (www.seqan.de) and a compiled version can be downloaded at http://www.seqan.de/projects/alf.html. Supplementary data are available at Bioinformatics online.
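
    To make the idea of alignment-free comparison concrete, here is a minimal sketch that compares two DNA sequences by their word (k-mer) counts, optionally pooling each sequence with its reverse complement. It is not the N2 measure, which standardizes counts and uses configurable word neighbourhoods; the cosine similarity, the function names and the default word length k=3 are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def word_counts(seq, k=3, with_revcomp=True):
    """Count overlapping words of length k, optionally adding the reverse strand."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    if with_revcomp:
        rc = reverse_complement(seq)
        counts.update(rc[i:i + k] for i in range(len(rc) - k + 1))
    return counts


def cosine_similarity(c1, c2):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(c1[w] * c2[w] for w in c1.keys() & c2.keys())
    norm = sqrt(sum(v * v for v in c1.values())) * sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0


def word_similarity(a, b, k=3, with_revcomp=True):
    """Alignment-free similarity of two sequences based on shared word content."""
    return cosine_similarity(word_counts(a, k, with_revcomp),
                             word_counts(b, k, with_revcomp))


if __name__ == "__main__":
    s = "ACGTTGCAACGGT"
    print(word_similarity(s, reverse_complement(s)))
```

    Pooling reverse-complement words makes the score independent of which strand a sequence was reported on, which is one of the neighbourhood extensions the abstract explores.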

  4. Is the phonological similarity effect in working memory due to proactive interference?

    PubMed

    Baddeley, Alan D; Hitch, Graham J; Quinlan, Philip T

    2018-04-12

    Immediate serial recall of verbal material is highly sensitive to impairment attributable to phonological similarity. Although this has traditionally been interpreted as a within-sequence similarity effect, Engle (2007) proposed an interpretation based on interference from prior sequences, a phenomenon analogous to that found in the Peterson short-term memory (STM) task. We use the method of serial reconstruction to test this in an experiment contrasting the standard paradigm in which successive sequences are drawn from the same set of phonologically similar or dissimilar words and one in which the vowel sound on which similarity is based is switched from trial to trial, a manipulation analogous to that producing release from PI in the Peterson task. A substantial similarity effect occurs under both conditions although there is a small advantage from switching across similar sequences. There is, however, no evidence for the suggestion that the similarity effect will be absent from the very first sequence tested. Our results support the within-sequence similarity rather than a between-list PI interpretation. Reasons for the contrast with the classic Peterson short-term forgetting task are briefly discussed. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  5. Genetic diversity among Brazilian isolates of Beauveria bassiana: comparisons with non-Brazilian isolates and other Beauveria species

    USGS Publications Warehouse

    Fernandes, E.K.K.; Moraes, A.M.L.; Pacheco, R.S.; Rangel, D.E.N.; Miller, M.P.; Bittencourt, V.R.E.P.; Roberts, D.W.

    2009-01-01

    Aims: The genetic diversity of Beauveria bassiana was investigated by comparing isolates of this species to each other (49 from different geographical regions of Brazil and 4 from USA) and to other Beauveria spp. Methods and Results: The isolates were examined by multilocus enzyme electrophoresis (MLEE), amplified fragment length polymorphism (AFLP), and rDNA sequencing. MLEE and AFLP revealed considerable genetic variability among B. bassiana isolates. Several isolates from South and Southeast Brazil had high similarity coefficients, providing evidence of at least one population with clonal structure. There were clear genomic differences between most Brazilian and USA B. bassiana isolates. A Mantel test using data generated by AFLP provided evidence that greater geographical distances were associated with higher genetic distances. AFLP and rDNA sequencing demonstrated notable genotypic variation between B. bassiana and other Beauveria spp. Conclusion: Geographical distance between populations apparently is an important factor influencing genotypic variability among B. bassiana populations in Brazil. Significance and Impact of the Study: This study characterized many B. bassiana isolates. The results indicate that certain Brazilian isolates are considerably different from others and possibly should be regarded as separate species from B. bassiana sensu lato. The information on genetic variation among the Brazilian isolates, therefore, will be important to comprehending the population structure of B. bassiana in Brazil. © 2009 The Society for Applied Microbiology.

  6. Smoking Gun or Circumstantial Evidence? Comparison of Statistical Learning Methods using Functional Annotations for Prioritizing Risk Variants.

    PubMed

    Gagliano, Sarah A; Ravji, Reena; Barnes, Michael R; Weale, Michael E; Knight, Jo

    2015-08-24

    Although technology has triumphed in facilitating routine genome sequencing, new challenges have been created for the data-analyst. Genome-scale surveys of human variation generate volumes of data that far exceed capabilities for laboratory characterization. By incorporating functional annotations as predictors, statistical learning has been widely investigated for prioritizing genetic variants likely to be associated with complex disease. We compared three published prioritization procedures, which use different statistical learning algorithms and different predictors with regard to the quantity, type and coding. We also explored different combinations of algorithm and annotation set. As an application, we tested which methodology performed best for prioritizing variants using data from a large schizophrenia meta-analysis by the Psychiatric Genomics Consortium. Results suggest that all methods have considerable (and similar) predictive accuracies (AUCs 0.64-0.71) in test set data, but there is more variability in the application to the schizophrenia GWAS. In conclusion, a variety of algorithms and annotations seem to have a similar potential to effectively enrich true risk variants in genome-scale datasets, however none offer more than incremental improvement in prediction. We discuss how methods might be evolved for risk variant prediction to address the impending bottleneck of the new generation of genome re-sequencing studies.

  7. Contrast-enhanced 3-dimensional SPACE versus MP-RAGE for the detection of brain metastases: considerations with a 32-channel head coil.

    PubMed

    Reichert, Miriam; Morelli, John N; Runge, Val M; Tao, Ai; von Ritschl, Ruediger; von Ritschl, Andreas; Padua, Abraham; Dix, James E; Marra, Michael J; Schoenberg, Stefan O; Attenberger, Ulrike I

    2013-01-01

    The aim of this study was to compare the detection of brain metastases at 3 T using a 32-channel head coil with 2 different 3-dimensional (3D) contrast-enhanced sequences, a T1-weighted fast spin-echo-based (SPACE; sampling perfection with application-optimized contrasts using different flip angle evolutions) sequence and a conventional magnetization-prepared rapid gradient-echo (MP-RAGE) sequence. Seventeen patients with 161 brain metastases were examined prospectively using both SPACE and MP-RAGE sequences on a 3-T magnetic resonance system. Eight healthy volunteers were similarly examined for determination of signal-to-noise ratio (SNR) values. Parameters were adjusted to equalize acquisition times between the sequences (3 minutes and 30 seconds). The order in which sequences were performed was randomized. Two blinded board-certified neuroradiologists evaluated the number of detectable metastatic lesions with each sequence relative to a criterion standard reading conducted at the Gamma Knife facility by a neuroradiologist with access to all clinical and imaging data. In the volunteer assessment with SPACE and MP-RAGE, SNR (10.3 ± 0.8 vs 7.7 ± 0.7) and contrast-to-noise ratio (0.8 ± 0.2 vs 0.5 ± 0.1) were statistically significantly greater with the SPACE sequence (P < 0.05). Overall, lesion detection was markedly improved with the SPACE sequence (99.1% of lesions for reader 1 and 96.3% of lesions for reader 2) compared with the MP-RAGE sequence (73.6% of lesions for reader 1 and 68.5% of lesions for reader 2; P < 0.01). A 3D T1-weighted fast spin echo sequence (SPACE) improves detection of metastatic lesions relative to 3D T1-weighted gradient-echo-based scan (MP-RAGE) imaging when implemented with a 32-channel head coil at identical scan acquisition times (3 minutes and 30 seconds).

  8. DsaV methyltransferase and its isoschizomers contain a conserved segment that is similar to the segment in HhaI methyltransferase that is in contact with DNA bases.

    PubMed Central

    Gopal, J; Yebra, M J; Bhagwat, A S

    1994-01-01

    The methyltransferase (MTase) in the DsaV restriction-modification system methylates within 5'-CCNGG sequences. We have cloned the gene for this MTase and determined its sequence. The predicted sequence of the MTase protein contains sequence motifs conserved among all cytosine-5 MTases and is most similar to other MTases that methylate CCNGG sequences, namely M.ScrFI and M.SsoII. All three MTases methylate the internal cytosine within their recognition sequence. The 'variable' region within the three enzymes that methylate CCNGG can be aligned with the sequences of two enzymes that methylate CCWGG sequences. Remarkably, two segments within this region contain significant similarity with the region of M.HhaI that is known to contact DNA bases. These alignments suggest that many cytosine-5 MTases are likely to interact with DNA using a similar structural framework. PMID:7971279

  9. BLAST and FASTA similarity searching for multiple sequence alignment.

    PubMed

    Pearson, William R

    2014-01-01

    BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry, that is, homology. The most effective similarity searches compare protein sequences, rather than DNA sequences, for sequences that encode proteins, and use expectation values, rather than percent identity, to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look-back times from 1 to 2 billion years; DNA:DNA searches are 5-10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies targeted for distant evolutionary relationships; for comparisons involving short domains or queries, or searches that seek relatively close homologs (e.g. mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.
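
    The preference for expectation values over percent identity, and the benefit of searching smaller databases, can be illustrated with the standard Karlin-Altschul relation between a normalized bit score S' and the expected number of chance hits, E = m * n * 2^(-S'), where m is the query length and n the total database length. The sketch below is a simplified illustration: the function name is hypothetical, and it ignores the effective-length corrections that real search programs apply.

```python
def expect_value(bit_score, query_len, db_len):
    """Expected number of chance alignments scoring at least bit_score.

    Uses E = m * n * 2**(-S') with raw (uncorrected) lengths, which is an
    assumption of this sketch; production tools adjust m and n for edge effects.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    score = 40.0   # illustrative bit score
    query = 300    # illustrative query length in residues
    for db_len in (1e6, 1e8, 1e10):
        print(f"database {db_len:.0e} residues: E = {expect_value(score, query, db_len):.3g}")
```

    Because E grows in proportion to database size, the same alignment score is more significant in a smaller database, which is consistent with the abstract's advice about searching a single complete proteome.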

  10. Comparative mitogenomic analyses of three North American stygobiont amphipods of the genus Stygobromus (Crustacea: Amphipoda)

    USGS Publications Warehouse

    Aunins, Aaron W.; Nelms, David L.; Hobson, Christopher S.; King, Timothy L.

    2016-01-01

    The mitochondrial genomes of three North American stygobiont amphipods Stygobromus tenuis potomacus, S. foliatus and S. indentatus collected from Caroline County, VA, were sequenced using a shotgun sequencing approach on an Illumina NextSeq500 (Illumina Inc., San Diego, CA). All three mitogenomes displayed 13 protein-coding genes, 22 tRNAs and two rRNAs typical of metazoans. While S. tenuis and S. indentatus displayed identical gene orders similar to the pancrustacean ground pattern, S. foliatus displayed a transposition of the trnL2-cox2 genes to after atp8-atp6. In addition, a short atp8 gene, longer rrnL gene and large inverted repeat within the Control Region distinguished S. foliatus from S. tenuis potomacus and S. indentatus. Overall, it appears that gene order varies considerably among amphipods, and the addition of these Stygobromus mitogenomes to the existing sequenced amphipod mitogenomes will prove useful for characterizing evolutionary relationships among various amphipod taxa, as well as investigations of the evolutionary dynamics of the mitogenome in general.

  11. Sequence stratigraphy on an early wet Mars

    NASA Astrophysics Data System (ADS)

    Barker, Donald C.; Bhattacharya, Janok P.

    2018-02-01

    The evolution of Mars as a water-bearing body is of considerable interest for the understanding of its early history and evolution. The principles of terrestrial sequence stratigraphy provide a useful conceptual framework to hypothesize about the stratigraphic history of the planet's northern plains. We present a model based on the hypothesized presence of an early ocean and the accumulation of lowland sediments eroded from highland terrain during the time of the valley networks and later outflow channels. Ancient, global environmental changes, induced by a progressively cooling climate would have led to a protracted loss of surface and near surface water from low-latitudes and eventual cold-trapping at higher latitudes - resulting in a unique and prolonged, perpetual forced regression within basins and lowland depositional environments. The Messinian Salinity Crisis (MSC) serves as a potential terrestrial analogue of the depositional and environmental consequences relating to the progressive removal of large standing bodies of water. We suggest that the evolution of similar conditions on Mars would have led to the emplacement of diagnostic sequences of deposits and regional scale unconformities, consistent with intermittent resurfacing of the northern plains and the progressive loss of an early ocean by the end of the Hesperian era.

  12. Fungal Genes in Context: Genome Architecture Reflects Regulatory Complexity and Function

    PubMed Central

    Noble, Luke M.; Andrianopoulos, Alex

    2013-01-01

    Gene context determines gene expression, with local chromosomal environment most influential. Comparative genomic analysis is often limited in scope to conserved or divergent gene and protein families, and fungi are well suited to this approach with low functional redundancy and relatively streamlined genomes. We show here that one aspect of gene context, the amount of potential upstream regulatory sequence maintained through evolution, is highly predictive of both molecular function and biological process in diverse fungi. Orthologs with large upstream intergenic regions (UIRs) are strongly enriched in information processing functions, such as signal transduction and sequence-specific DNA binding, and, in the genus Aspergillus, include the majority of experimentally studied, high-level developmental and metabolic transcriptional regulators. Many uncharacterized genes are also present in this class and, by implication, may be of similar importance. Large intergenic regions also share two novel sequence characteristics, currently of unknown significance: they are enriched for plus-strand polypyrimidine tracts and an information-rich, putative regulatory motif that was present in the last common ancestor of the Pezizomycotina. Systematic consideration of gene UIR in comparative genomics, particularly for poorly characterized species, could help reveal organisms’ regulatory priorities. PMID:23699226

  13. Molecular taxonomy of Dunaliella (Chlorophyceae), with a special focus on D. salina: ITS2 sequences revisited with an extensive geographical sampling

    PubMed Central

    2012-01-01

    We used an ITS2 primary and secondary structure and Compensatory Base Changes (CBCs) analyses on new French and Spanish Dunaliella salina strains to investigate their phylogenetic position and taxonomic status within the genus Dunaliella. Our analyses show a great diversity within D. salina (with only some clades not statistically supported) and reveal considerable genetic diversity and structure within Dunaliella, although the CBC analysis did not bolster the existence of different biological groups within this taxon. The ITS2 sequences of the new Spanish and French D. salina strains were very similar except for two of them: ITC5105 "Janubio" from Spain and ITC5119 from France. Although the Spanish one had a unique ITS2 sequence profile and the phylogenetic tree indicates that this strain can represent a new species, this hypothesis was not confirmed by CBCs, and clarification of its taxonomic status requires further investigation with new data. Overall, the use of CBCs to define species boundaries within Dunaliella was not conclusive in some cases, and the ITS2 region does not contain a geographical signal overall. PMID:22520929

  14. On the path to genetic novelties: insights from programmed DNA elimination and RNA splicing.

    PubMed

    Catania, Francesco; Schmitz, Jürgen

    2015-01-01

    Understanding how genetic novelties arise is a central goal of evolutionary biology. To this end, programmed DNA elimination and RNA splicing deserve special consideration. While programmed DNA elimination reshapes genomes by eliminating chromatin during organismal development, RNA splicing rearranges genetic messages by removing intronic regions during transcription. Small RNAs help to mediate this class of sequence reorganization, which is not error-free. It is this imperfection that makes programmed DNA elimination and RNA splicing excellent candidates for generating evolutionary novelties. Leveraging a number of these two processes' mechanistic and evolutionary properties, which have been uncovered over the past years, we present recently proposed models and empirical evidence for how splicing can shape the structure of protein-coding genes in eukaryotes. We also chronicle a number of intriguing similarities between the processes of programmed DNA elimination and RNA splicing, and highlight the role that the variation in the population-genetic environment may play in shaping their target sequences. © 2015 Wiley Periodicals, Inc.

  15. Testing the Use of Implicit Solvent in the Molecular Dynamics Modelling of DNA Flexibility

    NASA Astrophysics Data System (ADS)

    Mitchell, J.; Harris, S.

    DNA flexibility controls packaging, looping and in some cases sequence specific protein binding. Molecular dynamics simulations carried out with a computationally efficient implicit solvent model are potentially a powerful tool for studying larger DNA molecules than can be currently simulated when water and counterions are represented explicitly. In this work we compare DNA flexibility at the base pair step level modelled using an implicit solvent model to that previously determined from explicit solvent simulations and database analysis. Although much of the sequence dependent behaviour is preserved in implicit solvent, the DNA is considerably more flexible when the approximate model is used. In addition we test the ability of the implicit solvent to model stress induced DNA disruptions by simulating a series of DNA minicircle topoisomers which vary in size and superhelical density. When compared with previously run explicit solvent simulations, we find that while the levels of DNA denaturation are similar using both computational methodologies, the specific structural form of the disruptions is different.

  16. Purification and characterization of gamma poly glutamic acid from newly Bacillus licheniformis NRC20.

    PubMed

    Tork, Sanaa E; Aly, Magda M; Alakilli, Saleha Y; Al-Seeni, Madeha N

    2015-03-01

    γ-poly glutamic acid (γ-PGA) has received considerable attention for pharmaceutical and biomedical applications. γ-PGA from the newly isolated Bacillus licheniformis NRC20 was purified and characterized using diffusion distance agar plate, mass spectrometry and thin layer chromatography. All analyses indicated that γ-PGA is a homopolymer composed of glutamic acid. Its molecular weight was determined to be 1266 kDa. It was composed of L- and D-glutamic acid residues. An amplicon of 3050 representing the γ-PGA-coding genes was obtained, sequenced and submitted to the GenBank database. Its amino acid sequence showed high similarity with that obtained from B. licheniformis strains. The bacterium NRC 20 was independent of L-glutamic acid, but polymer production was enhanced when it was cultivated in medium containing L-glutamic acid as the sole nitrogen source. Finally, we can conclude that γ-PGA production from B. licheniformis NRC20 has many promising applications in medicine, industry and nanotechnology. Copyright © 2014 Elsevier B.V. All rights reserved.

  17. Learning Grasp Context Distinctions that Generalize

    NASA Technical Reports Server (NTRS)

    Platt, Robert; Grupen, Roderic A.; Fagg, Andrew H.

    2006-01-01

    Control-based approaches to grasp synthesis create grasping behavior by sequencing and combining control primitives. In the absence of any other structure, these approaches must evaluate a large number of feasible control sequences as a function of object shape, object pose, and task. This work explores a new approach to grasp synthesis that limits consideration to variations on a generalized localize-reach-grasp control policy. A new learning algorithm, known as schema structured learning, is used to learn which instantiations of the generalized policy are most likely to lead to a successful grasp in different problem contexts. Two experiments are described where Dexter, a bimanual upper torso, learns to select an appropriate grasp strategy as a function of object eccentricity and orientation. In addition, it is shown that grasp skills learned in this way can generalize to new objects. Results are presented showing that after learning how to grasp a small, representative set of objects, the robot's performance quantitatively improves for similar objects that it has not experienced before.

  18. Computer ranking of the sequence of appearance of 100 features of the brain and related structures in staged human embryos during the first 5 weeks of development.

    PubMed

    O'Rahilly, R; Müller, F; Hutchins, G M; Moore, G W

    1984-11-01

    The sequence of events in the development of the brain in staged human embryos was investigated in much greater detail than in previous studies by listing 100 features in 165 embryos of the first 5 weeks. Using a computerized bubble-sort algorithm, individual embryos were ranked in ascending order of the features present. This procedure made feasible an appreciation of the slight variation found in the developmental features. The vast majority of features appeared during either one or two stages (about 2 or 3 days). In general, the soundness of the Carnegie system of embryonic staging was amply confirmed. The rhombencephalon was found to show increasing complexity around stage 13, and the postoptic portion of the diencephalon underwent considerable differentiation by stage 15. The need for similar investigations of other systems of the body is emphasized, and the importance of such studies in assessing the timing of congenital malformations and in clarifying syndromic clusters is suggested.
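
    The ranking step described above can be pictured with a minimal sketch. It assumes each embryo is recorded as a set of the features observed in it; the embryo labels and feature names are invented for illustration and are not data from the study, and the function name is hypothetical. A plain bubble sort then orders the embryos by ascending number of features present.

```python
def bubble_sort_by_feature_count(embryos):
    """Return embryo ids sorted ascending by number of features present.

    `embryos` maps a hypothetical embryo id to the set of features seen in it.
    """
    ranked = list(embryos)  # copy of the ids; the input mapping is not modified
    n = len(ranked)
    for end in range(n - 1, 0, -1):
        swapped = False
        for i in range(end):
            # Swap neighbours that are out of order by feature count.
            if len(embryos[ranked[i]]) > len(embryos[ranked[i + 1]]):
                ranked[i], ranked[i + 1] = ranked[i + 1], ranked[i]
                swapped = True
        if not swapped:  # no swaps in this pass: already sorted
            break
    return ranked


# Invented example data (not from the study).
example = {
    "embryo_A": {"neural groove", "optic primordium", "otic disc"},
    "embryo_B": {"neural groove"},
    "embryo_C": {"neural groove", "optic primordium"},
}
ranking = bubble_sort_by_feature_count(example)
print(ranking)
```

    Bubble sort is stable, so embryos with the same feature count keep their input order in this sketch.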

  19. Design and implementation of a hybrid MPI-CUDA model for the Smith-Waterman algorithm.

    PubMed

    Khaled, Heba; Faheem, Hossam El Deen Mostafa; El Gohary, Rania

    2015-01-01

    This paper provides a novel hybrid model for solving the multiple pair-wise sequence alignment problem combining message passing interface and CUDA, the parallel computing platform and programming model invented by NVIDIA. The proposed model targets homogeneous cluster nodes equipped with similar Graphical Processing Unit (GPU) cards. The model consists of the Master Node Dispatcher (MND) and the Worker GPU Nodes (WGN). The MND distributes the workload among the cluster working nodes and then aggregates the results. The WGN performs the multiple pair-wise sequence alignments using the Smith-Waterman algorithm. We also propose a modified implementation of the Smith-Waterman algorithm based on computing the alignment matrices row-wise. The experimental results demonstrate a considerable reduction in the running time as the number of working GPU nodes increases. The proposed model achieved a performance of about 12 Giga cell updates per second when tested against the SWISS-PROT protein knowledge base running on four nodes.
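
    As a rough illustration of why row-wise computation is possible at all, the following sketch computes a Smith-Waterman local alignment score one row at a time, keeping only the previous row in memory. It is a plain CPU version with a linear gap penalty, not the MPI-CUDA implementation from the paper; the function name and the scoring values (match, mismatch, gap) are assumptions chosen for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b, filling the matrix row by row."""
    prev = [0] * (len(b) + 1)  # row i-1 of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # row i; column 0 stays 0
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr[j] = max(0, diag, up, left)  # scores never drop below 0
            best = max(best, curr[j])
        prev = curr  # only the previous row is needed for the next one
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

    Each row depends only on the row above it, so memory use is proportional to the length of one sequence when only the score is needed; recovering the alignment itself would require keeping traceback information.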

  20. Origin of the DA and non-DA white dwarf stars

    NASA Technical Reports Server (NTRS)

    Shipman, Harry L.

    1989-01-01

    Various proposals for the bifurcation of the white dwarf cooling sequence are reviewed. 'Primordial' theories, in which the basic bifurcation of the white dwarf sequence is rooted in events predating the white dwarf stage of stellar evolution, are discussed, along with the competing 'mixing' theories in which processes occurring during the white dwarf stage are responsible for the existence of DA or non-DA stars. A new proposal is suggested, representing a two-channel scenario. In the DA channel, some process reduces the hydrogen layer mass to the value of less than 10 to the -7th. The non-DA channel is similar to that in the primordial scenario. These considerations suggest that some mechanism operates in both channels to reduce the thickness of the outermost layer of the white dwarf. It is also noted that accretion from the interstellar medium has little to do with whether a particular white dwarf becomes a DA or a non-DA star.

  1. CLAST: CUDA implemented large-scale alignment search tool.

    PubMed

    Yano, Masahiro; Mori, Hiroshi; Akiyama, Yutaka; Yamada, Takuji; Kurokawa, Ken

    2014-12-11

    Metagenomics is a powerful methodology to study microbial communities, but it is highly dependent on nucleotide sequence similarity searching against sequence databases. Metagenomic analyses with next-generation sequencing technologies produce enormous numbers of reads from microbial communities, and many reads are derived from microbes whose genomes have not yet been sequenced, limiting the usefulness of existing sequence similarity search tools. Therefore, there is a clear need for a sequence similarity search tool that can rapidly detect weak similarity in large datasets. We developed a tool, which we named CLAST (CUDA implemented large-scale alignment search tool), that enables analyses of millions of reads and thousands of reference genome sequences, and runs on NVIDIA Fermi architecture graphics processing units. CLAST has four main advantages over existing alignment tools. First, CLAST was capable of identifying sequence similarities ~80.8 times faster than BLAST and 9.6 times faster than BLAT. Second, CLAST executes global alignment as the default (local alignment is also an option), enabling CLAST to assign reads to taxonomic and functional groups based on evolutionarily distant nucleotide sequences with high accuracy. Third, CLAST does not need a preprocessed sequence database like Burrows-Wheeler Transform-based tools, and this enables CLAST to incorporate large, frequently updated sequence databases. Fourth, CLAST requires <2 GB of main memory, making it possible to run CLAST on a standard desktop computer or server node. CLAST achieved very high speed (similar to the Burrows-Wheeler Transform-based Bowtie 2 for long reads) and sensitivity (equal to BLAST, BLAT, and FR-HIT) without the need for extensive database preprocessing or a specialized computing platform. 
    Our results demonstrate that CLAST has the potential to be one of the most powerful and realistic approaches to analyze the massive amount of sequence data from next-generation sequencing technologies.

  2. The chloroplast genome sequence of the green alga Leptosira terrestris: multiple losses of the inverted repeat and extensive genome rearrangements within the Trebouxiophyceae

    PubMed Central

    de Cambiaire, Jean-Charles; Otis, Christian; Turmel, Monique; Lemieux, Claude

    2007-01-01

    Background In the Chlorophyta – the green algal phylum comprising the classes Prasinophyceae, Ulvophyceae, Trebouxiophyceae and Chlorophyceae – the chloroplast genome displays a highly variable architecture. While chlorophycean chloroplast DNAs (cpDNAs) deviate considerably from the ancestral pattern described for the prasinophyte Nephroselmis olivacea, the degree of remodelling sustained by the two ulvophyte cpDNAs completely sequenced to date is intermediate relative to those observed for chlorophycean and trebouxiophyte cpDNAs. Chlorella vulgaris (Chlorellales) is currently the only photosynthetic trebouxiophyte whose complete cpDNA sequence has been reported. To gain insights into the evolutionary trends of the chloroplast genome in the Trebouxiophyceae, we sequenced cpDNA from the filamentous alga Leptosira terrestris (Ctenocladales). Results The 195,081-bp Leptosira chloroplast genome resembles the 150,613-bp Chlorella genome in lacking a large inverted repeat (IR) but differs greatly in gene order. Six of the conserved genes present in Chlorella cpDNA are missing from the Leptosira gene repertoire. The 106 conserved genes, four introns and 11 free standing open reading frames (ORFs) account for 48.3% of the genome sequence. This is the lowest gene density yet observed among chlorophyte cpDNAs. Contrary to the situation in Chlorella but similar to that in the chlorophycean Scenedesmus obliquus, the gene distribution is highly biased over the two DNA strands in Leptosira. Nine genes, compared to only three in Chlorella, have significantly expanded coding regions relative to their homologues in ancestral-type green algal cpDNAs. As observed in chlorophycean genomes, the rpoB gene is fragmented into two ORFs. Short repeats account for 5.1% of the Leptosira genome sequence and are present mainly in intergenic regions. 
    Conclusion Our results highlight the great plasticity of the chloroplast genome in the Trebouxiophyceae and indicate that the IR was lost on at least two separate occasions. The intriguing similarities of the derived features exhibited by Leptosira cpDNA and its chlorophycean counterparts suggest that the same evolutionary forces shaped the IR-lacking chloroplast genomes in these two algal lineages. PMID:17610731

  3. Main-Sequence CMEs as Magnetic Explosions: Compatibility with Observed Kinematics

    NASA Technical Reports Server (NTRS)

    Moore, Ron; Falconer, David; Sterling, Alphonse

    2004-01-01

    We examine the kinematics of 26 CMEs of the morphological main sequence of CMEs, those having the classic three-part bubble structure of (1) a bright front enveloping (2) a dark cavity within which rides (3) a bright blob/filamentary feature. Each CME is observed in Yohkoh/SXT images to originate from near the limb (> or equal to 0.7 R(sub Sun) from disk center). The basic data (from the SOHO LASCO CME Catalog) for the kinematics of each CME are the sequence of LASCO images of the CME, the time of each image, the measured radial distance of the front edge of the CME in each image, and the measured angular extent of the CME. About half of our CMEs (12) occur with a flare, and the rest (14) occur without a flare. While the average linear-fit speed of the flare CMEs (1000 km/s) is twice that of the non-flare CMEs (510 km/s), the flare CMEs and the non-flare CMEs are similar in that some have nearly flat velocity-height (radial extent) profiles (little acceleration), some have noticeably falling velocity profiles (noticeable deceleration), and the rest have velocity profiles that rise considerably through the outer corona (blatant acceleration). This suggests that in addition to sharing similar morphology, main-sequence CMEs all have basically the same driving mechanism. The observed radial progression of each of our 26 CMEs is fit by a simple model magnetic plasmoid that is in pressure balance with the radial magnetic field in the outer corona and that propels itself outward by magnetic expansion, doing no net work on its surroundings. On average over the 26 CMEs, this model fits the observations as well as the assumption of constant acceleration. This is compatible with main-sequence CMEs being magnetically driven, basically magnetic explosions, with the velocity profile in the outer corona being largely dictated by the initial Alfvén speed in the CME (when the front is at approx. 3 R(sub Sun)), analogous to the mass of a main-sequence star dictating the luminosity.

  4. Genomewide Analysis of the Antimicrobial Peptides in Python bivittatus and Characterization of Cathelicidins with Potent Antimicrobial Activity and Low Cytotoxicity.

    PubMed

    Kim, Dayeong; Soundrarajan, Nagasundarapandian; Lee, Juyeon; Cho, Hye-Sun; Choi, Minkyeung; Cha, Se-Yeoun; Ahn, Byeongyong; Jeon, Hyoim; Le, Minh Thong; Song, Hyuk; Kim, Jin-Hoi; Park, Chankyu

    2017-09-01

    In this study, we sought to identify novel antimicrobial peptides (AMPs) in Python bivittatus through bioinformatic analyses of publicly available genome information and experimental validation. In our analysis of the python genome, we identified 29 AMP-related candidate sequences. Of these, we selected five cathelicidin-like sequences and subjected them to further in silico analyses. The results showed that these sequences likely have antimicrobial activity. The sequences were named Pb-CATH1 to Pb-CATH5 according to their sequence similarity to previously reported snake cathelicidins. We predicted their molecular structure and then chemically synthesized the mature peptide for three putative cathelicidins and subjected them to biological activity tests. Interestingly, all three peptides showed potent antimicrobial effects against Gram-negative bacteria but very weak activity against Gram-positive bacteria. Remarkably, ΔPb-CATH4 showed potent activity against antibiotic-resistant clinical isolates and also was observed to possess very low hemolytic activity and cytotoxicity. ΔPb-CATH4 also showed considerable serum stability. Electron microscopic analysis indicated that ΔPb-CATH4 exerts its effects via toroidal pore preformation. Structural comparison of the cathelicidins identified in this study to previously reported ones revealed that these Pb-CATHs are representatives of a new group of reptilian cathelicidins lacking the acidic connecting domain. Furthermore, Pb-CATH4 possesses a completely different mature peptide sequence from those of previously described reptilian cathelicidins. These new AMPs may be candidates for the development of alternatives to or complements of antibiotics to control multidrug-resistant pathogens. Copyright © 2017 American Society for Microbiology.

  5. Genomewide Analysis of the Antimicrobial Peptides in Python bivittatus and Characterization of Cathelicidins with Potent Antimicrobial Activity and Low Cytotoxicity

    PubMed Central

    Kim, Dayeong; Soundrarajan, Nagasundarapandian; Lee, Juyeon; Cho, Hye-sun; Choi, Minkyeung; Cha, Se-Yeoun; Ahn, Byeongyong; Jeon, Hyoim; Le, Minh Thong; Song, Hyuk; Kim, Jin-Hoi

    2017-01-01

    ABSTRACT In this study, we sought to identify novel antimicrobial peptides (AMPs) in Python bivittatus through bioinformatic analyses of publicly available genome information and experimental validation. In our analysis of the python genome, we identified 29 AMP-related candidate sequences. Of these, we selected five cathelicidin-like sequences and subjected them to further in silico analyses. The results showed that these sequences likely have antimicrobial activity. The sequences were named Pb-CATH1 to Pb-CATH5 according to their sequence similarity to previously reported snake cathelicidins. We predicted their molecular structure and then chemically synthesized the mature peptide for three putative cathelicidins and subjected them to biological activity tests. Interestingly, all three peptides showed potent antimicrobial effects against Gram-negative bacteria but very weak activity against Gram-positive bacteria. Remarkably, ΔPb-CATH4 showed potent activity against antibiotic-resistant clinical isolates and also was observed to possess very low hemolytic activity and cytotoxicity. ΔPb-CATH4 also showed considerable serum stability. Electron microscopic analysis indicated that ΔPb-CATH4 exerts its effects via toroidal pore preformation. Structural comparison of the cathelicidins identified in this study to previously reported ones revealed that these Pb-CATHs are representatives of a new group of reptilian cathelicidins lacking the acidic connecting domain. Furthermore, Pb-CATH4 possesses a completely different mature peptide sequence from those of previously described reptilian cathelicidins. These new AMPs may be candidates for the development of alternatives to or complements of antibiotics to control multidrug-resistant pathogens. PMID:28630199

  6. Assessment of antibody library diversity through next generation sequencing and technical error compensation

    PubMed Central

    Lisi, Simonetta; Chirichella, Michele; Arisi, Ivan; Goracci, Martina; Cremisi, Federico; Cattaneo, Antonino

    2017-01-01

    Antibody libraries are important resources to derive antibodies to be used for a wide range of applications, from structural and functional studies to intracellular protein interference studies to developing new diagnostics and therapeutics. Whatever the goal, the key parameter for an antibody library is its complexity (also known as diversity), i.e. the number of distinct elements in the collection, which directly reflects the probability of finding in the library an antibody against a given antigen, of sufficiently high affinity. Quantitative evaluation of antibody library complexity and quality has for a long time been inadequately addressed, due to the high similarity and length of the sequences of the library. Complexity was usually inferred from the transformation efficiency and tested by fingerprinting and/or sequencing of a few hundred random library elements. Inferring complexity from such a small sampling is, however, very rudimentary and gives limited information about the real diversity, because complexity does not scale linearly with sample size. Next-generation sequencing (NGS) has opened new ways to tackle the quality assessment of antibody library complexity. However, much remains to be done to fully exploit the potential of NGS for the quantitative analysis of antibody repertoires and to overcome current limitations. To obtain a more reliable estimate of antibody library complexity, here we show a new, PCR-free, NGS approach to sequence antibody libraries on the Illumina platform, coupled to a new bioinformatic analysis and software (Diversity Estimator of Antibody Library, DEAL) that allows reliable estimation of the complexity, taking into consideration the sequencing error. PMID:28505201
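
    The remark that complexity does not scale linearly with sample size can be seen in a toy example. The sketch below counts distinct sequences in growing prefixes of a read list; the reads are invented, the function name is hypothetical, and the sketch ignores sequencing error entirely, which is the part that the DEAL software is described as handling.

```python
def distinct_counts(reads, sample_sizes):
    """Number of distinct sequences among the first n reads, for each n."""
    return [len(set(reads[:n])) for n in sample_sizes]


# Invented toy reads: four distinct sequences, some seen repeatedly.
reads = ["AAA", "AAC", "AAA", "AAG", "AAC", "AAA", "AAT", "AAG"]
counts = distinct_counts(reads, [2, 4, 8])
print(counts)
```

    Here doubling the sample from 4 to 8 reads adds only one new distinct sequence, so extrapolating linearly from a small sample would misestimate the number of distinct elements.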

  7. Assessment of antibody library diversity through next generation sequencing and technical error compensation.

    PubMed

    Fantini, Marco; Pandolfini, Luca; Lisi, Simonetta; Chirichella, Michele; Arisi, Ivan; Terrigno, Marco; Goracci, Martina; Cremisi, Federico; Cattaneo, Antonino

    2017-01-01

    Antibody libraries are important resources to derive antibodies to be used for a wide range of applications, from structural and functional studies to intracellular protein interference studies to developing new diagnostics and therapeutics. Whatever the goal, the key parameter for an antibody library is its complexity (also known as diversity), i.e. the number of distinct elements in the collection, which directly reflects the probability of finding in the library an antibody against a given antigen, of sufficiently high affinity. Quantitative evaluation of antibody library complexity and quality has for a long time been inadequately addressed, due to the high similarity and length of the sequences of the library. Complexity was usually inferred from the transformation efficiency and tested by fingerprinting and/or sequencing of a few hundred random library elements. Inferring complexity from such a small sampling is, however, very rudimentary and gives limited information about the real diversity, because complexity does not scale linearly with sample size. Next-generation sequencing (NGS) has opened new ways to tackle the quality assessment of antibody library complexity. However, much remains to be done to fully exploit the potential of NGS for the quantitative analysis of antibody repertoires and to overcome current limitations. To obtain a more reliable estimate of antibody library complexity, here we show a new, PCR-free, NGS approach to sequence antibody libraries on the Illumina platform, coupled to a new bioinformatic analysis and software (Diversity Estimator of Antibody Library, DEAL) that allows reliable estimation of the complexity, taking into consideration the sequencing error.

  8. Positive selection on MHC class II DRB and DQB genes in the bank vole (Myodes glareolus).

    PubMed

    Scherman, Kristin; Råberg, Lars; Westerdahl, Helena

    2014-05-01

    The major histocompatibility complex (MHC) class IIB genes show considerable sequence similarity between loci. The MHC class II DQB and DRB genes are known to exhibit a high level of polymorphism, most likely maintained by parasite-mediated selection. Studies of the MHC in wild rodents have focused on DRB, whilst DQB has been given much less attention. Here, we characterised DQB genes in Swedish bank voles Myodes glareolus, using full-length transcripts. We then designed primers that specifically amplify exon 2 from DRB (202 bp) and DQB (205 bp) and investigated molecular signatures of natural selection on DRB and DQB alleles. The presence of two separate gene clusters was confirmed using BLASTN and phylogenetic analysis, where our seven transcripts clustered according to either DQB or DRB homologues. These gene clusters were again confirmed on exon 2 data from 454-amplicon sequencing. Our DRB primers amplify a similar number of alleles per individual as previously published DRB primers, though our reads are longer. Traditional dN/dS analyses of DRB sequences in the bank vole have not found a conclusive signal of positive selection. Using a more advanced substitution model (the Kumar method) we found positive selection in the peptide binding region (PBR) of both DRB and DQB genes. Maximum likelihood models of codon substitutions detected positively selected sites located in the PBR of both DQB and DRB. Interestingly, these analyses detected at least twice as many positively selected sites in DQB as in DRB, suggesting that DQB has been under stronger positive selection than DRB over evolutionary time.

  9. The nuclear 18S ribosomal RNA gene as a source of phylogenetic information in the genus Taenia.

    PubMed

    Yan, Hongbin; Lou, Zhongzi; Li, Li; Ni, Xingwei; Guo, Aijiang; Li, Hongmin; Zheng, Yadong; Dyachenko, Viktor; Jia, Wanzhong

    2013-03-01

    Most species of the genus Taenia are of considerable medical and veterinary significance. In this study, complete nuclear 18S rRNA gene sequences were obtained from seven members of genus Taenia [Taenia multiceps, Taenia saginata, Taenia asiatica, Taenia solium, Taenia pisiformis, Taenia hydatigena, and Taenia taeniaeformis] and a phylogeny inferred using these sequences. Most of the variable sites fall within the variable regions, V1-V5. We show that sequences from the nuclear 18S ribosomal RNA gene have considerable promise as sources of phylogenetic information within the genus Taenia. Furthermore, given that almost all the variable sites lie within defined variable portions of that gene, it will be appropriate and economical to sequence only those regions for additional species of Taenia.

  10. An improved model for whole genome phylogenetic analysis by Fourier transform.

    PubMed

    Yin, Changchuan; Yau, Stephen S-T

    2015-10-07

    DNA sequence similarity comparison is one of the major steps in computational phylogenetic studies. The sequence comparison of closely related DNA sequences and genomes is usually performed by multiple sequence alignments (MSA). While the MSA method is accurate for some types of sequences, it may produce incorrect results when DNA sequences have undergone rearrangements, as in many bacterial and viral genomes. It is also limited by its computational complexity for comparing large volumes of data. Previously, we proposed an alignment-free method that exploits the full information contents of DNA sequences by Discrete Fourier Transform (DFT), but still with some limitations. Here, we present a significantly improved method for the similarity comparison of DNA sequences by DFT. In this method, we map DNA sequences into 2-dimensional (2D) numerical sequences and then apply DFT to transform the 2D numerical sequences into the frequency domain. In the 2D mapping, the nucleotide composition of a DNA sequence is a determinant factor and the 2D mapping reduces the nucleotide composition bias in the distance measure, thus improving the similarity measure of DNA sequences. To compare the DFT power spectra of DNA sequences with different lengths, we propose an improved even scaling algorithm to extend shorter DFT power spectra to the longest length of the underlying sequences. After the DFT power spectra are evenly scaled, the spectra are in the same dimensionality of the Fourier frequency space; the Euclidean distances of the full Fourier power spectra of the DNA sequences are then used as the dissimilarity metrics. The improved DFT method, with increased computational performance by 2D numerical representation, can be applied to any DNA sequences of different length ranges. We assess the accuracy of the improved DFT similarity measure in hierarchical clustering of different DNA sequences including simulated and real datasets. The method yields accurate and reliable phylogenetic trees and demonstrates that the improved DFT dissimilarity measure is an efficient and effective similarity measure of DNA sequences. Due to its high efficiency and accuracy, the proposed DFT similarity measure is successfully applied to phylogenetic analysis for individual genes and large whole bacterial genomes. Copyright © 2015 Elsevier Ltd. All rights reserved.
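
    The general idea of comparing sequences through DFT power spectra can be sketched as follows. This is only an illustration under stated assumptions: the nucleotide-to-number mapping below (one complex number per base) is hypothetical, since the abstract does not give the authors' 2D mapping, and the sketch handles equal-length sequences only, so the even scaling step is left out. The function names are likewise invented.

```python
import cmath
import math

# Hypothetical 2D mapping (one complex number per nucleotide); the mapping
# actually used in the paper is not given in the abstract.
MAPPING = {"A": 1 + 0j, "T": -1 + 0j, "C": 0 + 1j, "G": 0 - 1j}


def power_spectrum(seq):
    """Power spectrum |X[k]|^2 of the mapped sequence, via a naive DFT."""
    x = [MAPPING[base] for base in seq]
    n = len(x)
    spectrum = []
    for k in range(n):
        coeff = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_distance(seq_a, seq_b):
    """Euclidean distance between the power spectra of two sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only handles equal-length sequences")
    pa, pb = power_spectrum(seq_a), power_spectrum(seq_b)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(pa, pb)))


print(spectral_distance("ACGT", "AAAA"))
```

    In this sketch a circular shift of a sequence (for example "ACGT" versus "CGTA") leaves the power spectrum unchanged up to rounding, because a shift only changes the phase of each DFT coefficient; this is one reason a spectrum-based distance needs no alignment.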

  11. SIMAP—the database of all-against-all protein sequence similarities and annotations with new interfaces and increased coverage

    PubMed Central

    Arnold, Roland; Goldenberg, Florian; Mewes, Hans-Werner; Rattei, Thomas

    2014-01-01

    The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ∼70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith–Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow connecting SIMAP with any other bioinformatic tool and resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to cover the rapidly growing sequenced protein sequence universe. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal; the portal also contains instructions and links for software access and flat file downloads. PMID:24165881

  12. Fracture propagation through a layered shale and limestone sequence at Nash Point, South Wales: Implications on the development of fracture networks in layered sequences

    NASA Astrophysics Data System (ADS)

    Forbes Inskip, N.; Meredith, P. G.; Gudmundsson, A.

    2017-12-01

    While considerable effort has been expended on the study of fracture propagation in rocks in recent years, our understanding of how fractures propagate through sedimentary rocks composed of layers with different mechanical and elastic properties remains poor. Yet the mechanical layering is a key parameter controlling the propagation of fractures in sedimentary sequences. Here we report measurements of the contrasting properties of the Lower Lias at Nash Point, South Wales, which comprises a sequence of interbedded shale and limestone layers, and how those properties influence fracture propagation. The static Young's modulus (Estat) of both rock types has been measured parallel and normal to bedding. The shale is highly anisotropic, with Estat varying from 2.4 GPa, in the bedding-normal orientation, to 7.9 GPa, in the bedding-parallel orientation, yielding an anisotropy of 107%. By contrast the limestone has a very low anisotropy of 8%, with Estat values varying from 28.5 GPa, in the bedding-normal orientation, to 26.3 GPa in the bedding-parallel orientation. It follows that for a vertical fracture propagating in this sequence the modulus contrast is by a factor of about 12. This is important because the contrast in elastic properties is a key factor in controlling whether fractures arrest, deflect, or propagate across interfaces between layers in a sequence. Preliminary numerical modelling results (using a finite element modelling software) of induced fractures at Nash Point demonstrate a rotation of the maximum principal compressive stress across interfaces but also the concentration of tensile stress within the more competent (high Estat) limestone layers. The tensile strength (σT), using the Brazil-disk technique, and fracture toughness (KIc), using the semi-circular bend methodology, of both rock types have been measured. 
Measurements were made in the three principal orientations relative to bedding, Arrester, Divider, and Short-Transverse, and also at 15° intervals between these planes. Again, values for the shale show a high degree of anisotropy; with similar values in the Arrester and Divider orientations, but much lower values in the Short-Transverse orientation. σT and KIc values for the limestone are considerably higher than those for the shale and exhibit no significant anisotropy.
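
    The anisotropy percentages and the modulus contrast quoted above can be checked with a few lines of arithmetic. The sketch below assumes anisotropy is defined as the difference between the two directional values divided by their mean, and that the contrast is the ratio of the bedding-normal moduli; the abstract states neither definition, but both choices reproduce the quoted figures.

```python
def anisotropy_percent(a: float, b: float) -> float:
    """Difference between two directional values as a percentage of their mean.

    Assumed definition: the abstract does not state the formula it uses.
    """
    return 100.0 * abs(a - b) / ((a + b) / 2.0)


# Static Young's modulus values (GPa) quoted in the abstract.
shale_normal, shale_parallel = 2.4, 7.9
limestone_normal, limestone_parallel = 28.5, 26.3

shale_aniso = anisotropy_percent(shale_normal, shale_parallel)
limestone_aniso = anisotropy_percent(limestone_normal, limestone_parallel)

# Assumed pairing: bedding-normal limestone over bedding-normal shale.
modulus_contrast = limestone_normal / shale_normal

print(f"shale anisotropy:     {shale_aniso:.0f}%")
print(f"limestone anisotropy: {limestone_aniso:.0f}%")
print(f"modulus contrast:     {modulus_contrast:.1f}")
```

    Under these assumptions the values round to 107%, 8% and a contrast of roughly 12, matching the abstract.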

  13. Evolutionary Origin and Conserved Structural Building Blocks of Riboswitches and Ribosomal RNAs: Riboswitches as Probable Target Sites for Aminoglycosides Interaction.

    PubMed

    Mehdizadeh Aghdam, Elnaz; Barzegar, Abolfazl; Hejazi, Mohammad Saeid

    2014-01-01

    Riboswitches, as noncoding RNA sequences, control gene expression through direct ligand binding. Sporadic reports on the structural relation of riboswitches with ribosomal RNAs (rRNAs) raise interest in a possible similarity between the evolutionary origins of riboswitches and rRNAs. Since aminoglycoside antibiotics affect microbial cells by binding to functional sites of the bacterial rRNA, finding any conformational and functional relation between riboswitches and rRNAs is of utmost importance in both medicinal and basic research. Analysis of riboswitch structures was carried out using bioinformatics and computational tools. The possible functional similarity of riboswitches with rRNAs was evaluated based on the affinity of the antibiotic paromomycin (targeting the "A site" of 16S rRNA) to riboswitches via a docking method. There was high structural similarity between riboswitches and rRNAs, but no particular sequence-based similarity between them was found. The building blocks "hairpin loop containing UUU", "peptidyl transferase center conserved hairpin A loop", "helix 45" and "S2 (G8) hairpin" were detected as highly identical rRNA motifs in all kinds of riboswitches. Surprisingly, the binding energies of paromomycin with different riboswitches are considerably better than the binding energy of paromomycin with the "16S rRNA A site". Therefore the high affinity of paromomycin for riboswitches in comparison with the rRNA "A site" suggests a new insight into riboswitches as possible targets for aminoglycoside antibiotics. These findings are considered possible supporting evidence for the evolutionary origin of riboswitches/rRNAs and for their role in the exertion of antibiotic effects, which may help to design new drugs based on concomitant effects via rRNA/riboswitches.

  14. Efficient alignment-free DNA barcode analytics.

    PubMed

    Kuksa, Pavel; Pavlovic, Vladimir

    2009-11-10

    In this work we consider barcode DNA analysis problems and address them using alternative, alignment-free methods and representations which model sequences as collections of short sequence fragments (features). The methods use fixed-length representations (spectrum) for barcode sequences to measure similarities or dissimilarities between sequences coming from the same or different species. The spectrum-based representation not only allows for accurate and computationally efficient species classification, but also opens the possibility of accurate clustering analysis of putative species barcodes and identification of critical within-barcode loci distinguishing barcodes of different sample groups. New alignment-free methods provide highly accurate and fast DNA barcode-based identification and classification of species with substantial improvements in accuracy and speed over state-of-the-art barcode analysis methods. We evaluate our methods on problems of species classification and identification using barcodes, important and relevant analytical tasks in many practical applications (adverse species movement monitoring, sampling surveys for unknown or pathogenic species identification, biodiversity assessment, etc.). On several benchmark barcode datasets, including ACG, Astraptes, Hesperiidae, Fish larvae, and Birds of North America, the proposed alignment-free methods considerably improve prediction accuracy compared to prior results. We also observe significant running time improvements over the state-of-the-art methods. Our results show that newly developed alignment-free methods for DNA barcoding can efficiently and with high accuracy identify specimens by examining only a few barcode features, resulting in increased scalability and interpretability of current computational approaches to barcoding.
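
    The fixed-length spectrum representation described above can be illustrated with a minimal sketch: each sequence becomes a vector of k-mer counts, and two sequences are compared without alignment. The function names, the choice of k, and the use of cosine similarity are assumptions made for illustration; the abstract does not specify the authors' kernel or parameters.

```python
from collections import Counter
from math import sqrt


def spectrum(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in the sequence (its k-spectrum)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two k-mer count vectors (0.0 if either is empty)."""
    dot = sum(count * b[kmer] for kmer, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up toy fragments, not real barcode data.
s1 = spectrum("ACGTACGTAC", k=3)
s2 = spectrum("ACGTACGTTC", k=3)
print(f"similarity: {cosine_similarity(s1, s2):.3f}")
```

    Because each spectrum is built in a single pass over the sequence, the cost of a comparison grows linearly with sequence length rather than with the product of the two lengths, which is the kind of saving alignment-free methods aim for.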

  15. Identification of phylogenetic position in the Chlamydiaceae family for Chlamydia strains released from monkeys and humans with chlamydial pathology.

    PubMed

    Karaulov, Alexander; Aleshkin, Vladimir; Slobodenyuk, Vladimir; Grechishnikova, Olga; Afanasyev, Stanislav; Lapin, Boris; Dzhikidze, Eteri; Nesvizhsky, Yuriy; Evsegneeva, Irina; Voropayeva, Elena; Afanasyev, Maxim; Aleshkin, Andrei; Metelskaya, Valeria; Yegorova, Ekaterina; Bayrakova, Alexandra

    2010-01-01

    Based on the results of the comparative analysis concerning relatedness and evolutionary difference of the 16S-23S nucleotide sequences of the middle ribosomal cluster and 23S rRNA I domain, and based on identification of phylogenetic position for Chlamydophila pneumoniae and Chlamydia trachomatis strains released from monkeys, relatedness of the above stated isolates with similar strains released from humans and with strains having nucleotide sequences presented in the GenBank electronic database has been detected for the first time ever. Position of these isolates in the Chlamydiaceae family phylogenetic tree has been identified. The evolutionary position of the investigated original Chlamydia and Chlamydophila strains close to analogous strains from the GenBank electronic database has been demonstrated. Differences in the 16S-23S nucleotide sequence of the middle ribosomal cluster and 23S rRNA I domain of plasmid and nonplasmid Chlamydia trachomatis strains released from humans and monkeys relative to different genotype groups (group B-B, Ba, D, Da, E, L1, L2, L2a; intermediate group-F, G, Ga) have been revealed for the first time ever. Abnormality in incA chromosomal gene expression resulting in Chlamydia life development cycle disorder, and decrease of Chlamydia virulence can be related to probable changes in the nucleotide sequence of the gene under consideration.

  16. Molecular characterization and phylogenetic analysis of infectious bursal disease viruses isolated from chicken in South China in 2011.

    PubMed

    Liu, Di; Zhang, Xiang-Bin; Yan, Zhuan-Qiang; Chen, Feng; Ji, Jun; Qin, Jian-Ping; Li, Hai-Yan; Lu, Jun-Peng; Xue, Yu; Liu, Jia-Jia; Xie, Qing-Mei; Ma, Jing-Yun; Xue, Chun-Yi; Bee, Ying-Zuo

    2013-06-01

    Infectious bursal disease virus (IBDV) is a double-stranded RNA virus that causes immunosuppressive disease in young chickens. Thousands of cases of IBDV infection are reported each year in South China, and these infections can result in considerable economic losses to the poultry industry. To monitor variations of the virus during the outbreaks, 30 IBDVs were identified from vaccinated chicken flocks from nine provinces in South China in 2011. VP2 fragments from different virus strains were sequenced and analyzed by comparison with the published sequences of IBDV strains from China and around the world. Phylogenetic analysis of hypervariable regions of the VP2 (vVP2) gene showed that 29 of the isolates were very virulent (vv) IBDVs, and were closely related to vvIBDV strains from Europe and Asia. Alignment analysis of the deduced amino acid (aa) sequences of vVP2 showed the 29 vv isolates had high uniformity, indicating low variability and slow evolution of the virus. The non-vvIBDV isolate JX2-11 was associated with higher than expected mortality, and had high deduced aa sequence similarity (99.2 %) with the attenuated vaccine strain B87 (BJ). The present study has demonstrated the continued circulation of IBDV strains in South China, and emphasizes the importance of reinforcing IBDV surveillance.

  17. Primer and platform effects on 16S rRNA tag sequencing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tremblay, Julien; Singh, Kanwar; Fern, Alison

    Sequencing of 16S rRNA gene tags is a popular method for profiling and comparing microbial communities. The protocols and methods used, however, vary considerably with regard to amplification primers, sequencing primers, sequencing technologies; as well as quality filtering and clustering. How results are affected by these choices, and whether data produced with different protocols can be meaningfully compared, is often unknown. Here we compare results obtained using three different amplification primer sets (targeting V4, V6–V8, and V7–V8) and two sequencing technologies (454 pyrosequencing and Illumina MiSeq) using DNA from a mock community containing a known number of species as well as complex environmental samples whose PCR-independent profiles were estimated using shotgun sequencing. We find that paired-end MiSeq reads produce higher quality data and enabled the use of more aggressive quality control parameters over 454, resulting in a higher retention rate of high quality reads for downstream data analysis. While primer choice considerably influences quantitative abundance estimations, sequencing platform has relatively minor effects when matched primers are used. In conclusion, beta diversity metrics are surprisingly robust to both primer and sequencing platform biases.

  18. Primer and platform effects on 16S rRNA tag sequencing

    DOE PAGES

    Tremblay, Julien; Singh, Kanwar; Fern, Alison; ...

    2015-08-04

    Sequencing of 16S rRNA gene tags is a popular method for profiling and comparing microbial communities. The protocols and methods used, however, vary considerably with regard to amplification primers, sequencing primers, sequencing technologies; as well as quality filtering and clustering. How results are affected by these choices, and whether data produced with different protocols can be meaningfully compared, is often unknown. Here we compare results obtained using three different amplification primer sets (targeting V4, V6–V8, and V7–V8) and two sequencing technologies (454 pyrosequencing and Illumina MiSeq) using DNA from a mock community containing a known number of species as well as complex environmental samples whose PCR-independent profiles were estimated using shotgun sequencing. We find that paired-end MiSeq reads produce higher quality data and enabled the use of more aggressive quality control parameters over 454, resulting in a higher retention rate of high quality reads for downstream data analysis. While primer choice considerably influences quantitative abundance estimations, sequencing platform has relatively minor effects when matched primers are used. In conclusion, beta diversity metrics are surprisingly robust to both primer and sequencing platform biases.

  19. A Deep-Coverage Tomato BAC Library and Prospects Toward Development of an STC Framework for Genome Sequencing

    PubMed Central

    Budiman, Muhammad A.; Mao, Long; Wood, Todd C.; Wing, Rod A.

    2000-01-01

    Recently a new strategy using BAC end sequences as sequence-tagged connectors (STCs) was proposed for whole-genome sequencing projects. In this study, we present the construction and detailed characterization of a 15.0 haploid genome equivalent BAC library for the cultivated tomato, Lycopersicon esculentum cv. Heinz 1706. The library contains 129,024 clones with an average insert size of 117.5 kb and a chloroplast content of 1.11%. BAC end sequences from 1490 ends were generated and analyzed as a preliminary evaluation for using this library to develop an STC framework to sequence the tomato genome. A total of 1205 BAC end sequences (80.9%) were obtained, with an average length of 360 high-quality bases, and were searched against the GenBank database. Using a cutoff expectation value of <10^-6, and combining the results from BLASTN, BLASTX, and TBLASTX searches, 24.3% of the BAC end sequences were similar to known sequences, of which almost half (48.7%) share sequence similarities to retrotransposons and 7% to known genes. Some of the transposable element sequences were the first reported in tomato, such as sequences similar to maize transposon Activator (Ac) ORF and tobacco pararetrovirus-like sequences. Interestingly, there were no BAC end sequences similar to the highly repeated TGRI and TGRII elements. However, the majority (70.3%) of STCs did not share significant sequence similarities to any sequences in GenBank at either the DNA or predicted protein levels, indicating that a large portion of the tomato genome is still unknown. Our data demonstrate that this BAC library is suitable for developing an STC database to sequence the tomato genome. The advantages of developing an STC framework for whole-genome sequencing of tomato are discussed. [The BAC end sequences described in this paper have been deposited in the GenBank data library under accession nos. AQ367111–AQ368361.] PMID:10645957

  20. The Effects of Within-Sequence Acoustic Similarity on the Short-Term Retention of Consonants and Words

    ERIC Educational Resources Information Center

    Marcer, D.; And Others

    1977-01-01

    Compares the rates of forgetting of five-item sequences of acoustically similar and dissimilar consonants and words in the absence of proactive and retroactive interference in order to test whether within sequence similarity rather than stimulus length would have a greater influence on retention. (Author/RK)

  1. A space-efficient algorithm for local similarities.

    PubMed

    Huang, X Q; Hardison, R C; Miller, W

    1990-10-01

    Existing dynamic-programming algorithms for identifying similar regions of two sequences require time and space proportional to the product of the sequence lengths. Often this space requirement is more limiting than the time requirement. We describe a dynamic-programming local-similarity algorithm that needs only space proportional to the sum of the sequence lengths. The method can also find repeats within a single long sequence. To illustrate the algorithm's potential, we discuss comparison of a 73,360 nucleotide sequence containing the human beta-like globin gene cluster and a corresponding 44,594 nucleotide sequence for rabbit, a problem well beyond the capabilities of other dynamic-programming software.
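
    The space saving described above can be illustrated in part with a short sketch. The code below computes only the best local-similarity score while keeping two rows of the dynamic-programming matrix, so memory grows with the shorter sequence rather than with the product of the lengths. It is not the published algorithm, which also recovers the alignments themselves in linear space; the scoring values and the linear gap penalty are assumptions chosen for illustration.

```python
def local_similarity_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Best local alignment score, keeping only two rows of the DP matrix."""
    if len(b) > len(a):
        a, b = b, a                      # keep the stored rows as short as possible
    prev = [0] * (len(b) + 1)            # scores for the previous row
    best = 0
    for x in a:
        curr = [0]                       # column 0 is always 0 for local alignment
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            if score > best:
                best = score
        prev = curr
    return best


# Made-up toy sequences sharing the region "ACGT".
print(local_similarity_score("TTACGTTT", "GGACGTGG"))
```

    Running time is still proportional to the product of the two lengths; only the memory footprint changes.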

  2. SSAW: A new sequence similarity analysis method based on the stationary discrete wavelet transform.

    PubMed

    Lin, Jie; Wei, Jing; Adjeroh, Donald; Jiang, Bing-Hua; Jiang, Yue

    2018-05-02

    Alignment-free sequence similarity analysis methods often lead to significant savings in computational time over alignment-based counterparts. A new alignment-free sequence similarity analysis method, called SSAW, is proposed. SSAW stands for Sequence Similarity Analysis using the Stationary Discrete Wavelet Transform (SDWT). It extracts k-mers from a sequence, then maps each k-mer to a complex number field. Then, the series of complex numbers formed are transformed into feature vectors using the stationary discrete wavelet transform. After these steps, the original sequence is turned into a feature vector with numeric values, which can then be used for clustering and/or classification. Using two different types of applications, namely, clustering and classification, we compared SSAW against the state-of-the-art alignment-free sequence analysis methods. SSAW demonstrates competitive or superior performance in terms of standard indicators, such as accuracy, F-score, precision, and recall. The running time was significantly better in most cases. These results make SSAW a suitable method for sequence analysis, especially given the rapidly increasing volumes of sequence data required by most modern applications.

  3. T cell receptor repertoires of mice and humans are clustered in similarity networks around conserved public CDR3 sequences

    PubMed Central

    Madi, Asaf; Poran, Asaf; Shifrut, Eric; Reich-Zeliger, Shlomit; Greenstein, Erez; Zaretsky, Irena; Arnon, Tomer; Laethem, Francois Van; Singer, Alfred; Lu, Jinghua; Sun, Peter D; Cohen, Irun R; Friedman, Nir

    2017-01-01

    Diversity of T cell receptor (TCR) repertoires, generated by somatic DNA rearrangements, is central to immune system function. However, the level of sequence similarity of TCR repertoires within and between species has not been characterized. Using network analysis of high-throughput TCR sequencing data, we found that abundant CDR3-TCRβ sequences were clustered within networks generated by sequence similarity. We discovered a substantial number of public CDR3-TCRβ segments that were identical in mice and humans. These conserved public sequences were central within TCR sequence-similarity networks. Annotated TCR sequences, previously associated with self-specificities such as autoimmunity and cancer, were linked to network clusters. Mechanistically, CDR3 networks were promoted by MHC-mediated selection, and were reduced following immunization, immune checkpoint blockade or aging. Our findings provide a new view of T cell repertoire organization and physiology, and suggest that the immune system distributes its TCR sequences unevenly, attending to specific foci of reactivity. DOI: http://dx.doi.org/10.7554/eLife.22057.001 PMID:28731407
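
    A sequence-similarity network of the kind described above can be sketched as a graph whose nodes are CDR3 sequences and whose edges join sequences that are nearly identical. The edge rule used here (equal length, at most one mismatch) is an assumption, since the abstract does not state the criterion, and the sequences are made-up toy strings rather than data from the study.

```python
from itertools import combinations


def hamming(a: str, b: str):
    """Number of mismatched positions, or None when the lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def build_network(seqs, max_dist: int = 1) -> dict:
    """Adjacency sets linking sequences within max_dist mismatches of each other."""
    graph = {s: set() for s in seqs}
    for a, b in combinations(graph, 2):
        d = hamming(a, b)
        if d is not None and d <= max_dist:
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Hypothetical toy sequences for illustration only.
toy = ["CASSLGQYF", "CASSLGEYF", "CASSLGAYF", "CASSPGQYF", "CAWSVDTQYF"]
graph = build_network(toy)

# The node with the most neighbours is the most central one in this tiny network.
hub = max(graph, key=lambda s: len(graph[s]))
print(hub, len(graph[hub]))
```

    Counting neighbours per node is a simple stand-in for the centrality of public sequences that the abstract reports.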

  4. Two myxozoans from the urinary tract of topsmelt, Atherinops affinis

    USGS Publications Warehouse

    Sanders, Justin L.; Jaramillo, Alejandra G.; Ashford, Jacob E.; Feist, Stephen W.; Lafferty, Kevin D.; Kent, Michael L.

    2015-01-01

    Two myxozoan species were observed in the kidney of topsmelt, Atherinops affinis, during a survey of parasites of estuarine fishes in the Carpinteria Salt Marsh Reserve, California. Fish collected on three dates in 2012 and 2013 were sectioned and examined histologically. Large extrasporogonic stages occurred in the renal interstitium of several fish from the first two collections (5/8, 11/20, respectively), and, in some fish, these replaced over 80% of the kidney. In addition, presporogonic and polysporogonic stages occurred in the lumen of the renal tubules, collecting and mesonephric ducts. The latter contained subspherical spores with up to 4 polar capsules, consistent with the genus Chloromyxum. For the third collection (15 May 2013, n=30), we portioned kidneys for examination by histology, wet mount, and DNA extraction for small subunit ribosomal gene sequencing. Histology showed the large extrasporogonic forms in the kidney interstitium of 3 fish, and 2 other fish with subspherical myxospores in the lumen of the renal tubules with smooth valves and two spherical polar capsules consistent with the genus Sphaerospora. Chloromyxum-type myxospores were observed in the renal tubules of one fish by wet mount. Sequencing of the kidney tissue from this fish yielded a partial SSU rDNA sequence of 1769 bp. Phylogenetic reconstruction suggested this organism to be a novel species of Chloromyxum, most similar to Chloromyxum careni (84% similarity). In addition, subspherical myxospores with smooth valves and two spherical polar capsules consistent with the genus Sphaerospora were observed in wet mounts of 2 fish. Sequencing of the kidney tissue from 1 fish yielded a partial SSU rDNA sequence of 1937 bp. Phylogenetic reconstruction suggests this organism to be a novel species of Sphaerospora most closely related to Sphaerospora epinepheli (93%). 
We conclude that these organisms represent novel species of the genera Chloromyxum and Sphaerospora based on host, location, and SSU rDNA sequence. We further conclude that the large, histozoic extrasporogonic stages in the renal interstitium represent developmental stages of the Chloromyxum species for the following reasons: 1. Large extrasporogonic stages were only observed in fish with Chloromyxum-type spores developing within the renal tubules, 2. DNA sequence consistent with the Chloromyxum sp. was only detected in fish with the large extrasporogonic stages, and 3. Sphaerospora species have extrasporogonic forms, but they are considerably smaller and comprise far fewer cells.

  5. Mouse Vk gene classification by nucleic acid sequence similarity.

    PubMed

    Strohal, R; Helmberg, A; Kroemer, G; Kofler, R

    1989-01-01

    Analyses of immunoglobulin (Ig) variable (V) region gene usage in the immune response, estimates of V gene germline complexity, and other nucleic acid hybridization-based studies depend on the extent to which such genes are related (i.e., sequence similarity) and their organization in gene families. While mouse Igh heavy chain V region (VH) gene families are relatively well-established, a corresponding systematic classification of Igk light chain V region (Vk) genes has not been reported. The present analysis, in the course of which we reviewed the known extent of the Vk germline gene repertoire and Vk gene usage in a variety of responses to foreign and self antigens, provides a classification of mouse Vk genes in gene families composed of members with greater than 80% overall nucleic acid sequence similarity. This classification differed in several aspects from that of VH genes: only some Vk gene families were as clearly separated (by greater than 25% sequence dissimilarity) as typical VH gene families; most Vk gene families were closely related and, in several instances, members from different families were very similar (greater than 80%) over large sequence portions; frequently, classification by nucleic acid sequence similarity diverged from existing classifications based on amino-terminal protein sequence similarity. Our data have implications for Vk gene analyses by nucleic acid hybridization and describe potentially important differences in sequence organization between VH and Vk genes.
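
    Grouping genes into families by a similarity threshold, as described above, can be sketched as threshold clustering. The example below assumes pre-aligned sequences of equal length, percent identity as the similarity measure, and single-linkage grouping (a gene joins a family if it exceeds 80% similarity with any member). The abstract does not state the linkage rule, and the gene names and sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def families(seqs: dict, threshold: float = 80.0) -> list:
    """Single-linkage families: link any pair whose identity exceeds the threshold."""
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    names = list(seqs)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if percent_identity(seqs[a], seqs[b]) > threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical toy sequences, not real Vk genes.
genes = {
    "g1": "ACGTACGTAC",
    "g2": "ACGTACGTAA",   # 90% identical to g1
    "g3": "ACGTACGGAA",   # 90% identical to g2, exactly 80% to g1
    "g4": "TTTTTTTTTT",
}
print(families(genes))
```

    In this toy set g3 joins the family only through g2, which shows how the choice of linkage rule affects borderline members when families are closely related.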

  6. Using relational databases for improved sequence similarity searching and large-scale genomic analyses.

    PubMed

    Mackey, Aaron J; Pearson, William R

    2004-10-01

    Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. Relational databases are essential for management and analysis of large-scale sequence analyses, and can also be used to improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes using relational databases to improve the efficiency of sequence similarity searching and to demonstrate various large-scale genomic analyses of homology-related data. This unit describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. These include basic use of the database to generate a novel sequence library subset, how to extend and use seqdb_demo for the storage of sequence similarity search results and making use of various kinds of stored search results to address aspects of comparative genomic analysis.
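
    The two uses described above, extracting a sequence library subset and storing search results for later queries, can be sketched with SQLite from the Python standard library. The table and column names below are hypothetical and are not the seqdb_demo schema; the rows are made-up toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-table layout: sequences, and similarity-search hits between them.
conn.executescript("""
CREATE TABLE protein (id INTEGER PRIMARY KEY, name TEXT, taxon TEXT, seq TEXT);
CREATE TABLE hit (query_id INTEGER, subject_id INTEGER, evalue REAL);
""")
conn.executemany("INSERT INTO protein VALUES (?, ?, ?, ?)", [
    (1, "protA", "E. coli", "MKT"),
    (2, "protB", "E. coli", "MSV"),
    (3, "protC", "H. sapiens", "MAL"),
])
conn.executemany("INSERT INTO hit VALUES (?, ?, ?)", [
    (1, 2, 1e-20),
    (1, 3, 0.5),
])


def fasta_subset(conn, taxon: str) -> str:
    """Build a FASTA-formatted library containing only one taxon's sequences."""
    rows = conn.execute(
        "SELECT name, seq FROM protein WHERE taxon = ? ORDER BY id", (taxon,))
    return "".join(f">{name}\n{seq}\n" for name, seq in rows)


def significant_hits(conn, max_evalue: float) -> list:
    """Stored hits at or below an E-value cutoff, with names joined in."""
    return conn.execute(
        """SELECT q.name, s.name, h.evalue
           FROM hit h
           JOIN protein q ON q.id = h.query_id
           JOIN protein s ON s.id = h.subject_id
           WHERE h.evalue <= ?
           ORDER BY h.evalue""", (max_evalue,)).fetchall()


print(fasta_subset(conn, "E. coli"))
print(significant_hits(conn, 1e-6))
```

    Searching a smaller subset library reduces the number of comparisons, which is one way a focused subset can improve the statistical significance of true matches.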

  7. Whole-genome sequence of the Tibetan frog Nanorana parkeri and the comparative evolution of tetrapod genomes.

    PubMed

    Sun, Yan-Bo; Xiong, Zi-Jun; Xiang, Xue-Yan; Liu, Shi-Ping; Zhou, Wei-Wei; Tu, Xiao-Long; Zhong, Li; Wang, Lu; Wu, Dong-Dong; Zhang, Bao-Lin; Zhu, Chun-Ling; Yang, Min-Min; Chen, Hong-Man; Li, Fang; Zhou, Long; Feng, Shao-Hong; Huang, Chao; Zhang, Guo-Jie; Irwin, David; Hillis, David M; Murphy, Robert W; Yang, Huan-Ming; Che, Jing; Wang, Jun; Zhang, Ya-Ping

    2015-03-17

    The development of efficient sequencing techniques has resulted in large numbers of genomes being available for evolutionary studies. However, only one genome is available for all amphibians, that of Xenopus tropicalis, which is distantly related to the majority of frogs. More than 96% of frogs belong to the Neobatrachia, and no genome exists for this group. This dearth of amphibian genomes greatly restricts genomic studies of amphibians and, more generally, our understanding of tetrapod genome evolution. To fill this gap, we provide the de novo genome of a Tibetan Plateau frog, Nanorana parkeri, and compare it to that of X. tropicalis and other vertebrates. This genome encodes more than 20,000 protein-coding genes, a number similar to that of Xenopus. Although the genome size of Nanorana is considerably larger than that of Xenopus (2.3 vs. 1.5 Gb), most of the difference is due to the respective number of transposable elements in the two genomes. The two frogs exhibit considerable conserved whole-genome synteny despite having diverged approximately 266 Ma, indicating a slow rate of DNA structural evolution in anurans. Multigenome synteny blocks further show that amphibians have fewer interchromosomal rearrangements than mammals but have a comparable rate of intrachromosomal rearrangements. Our analysis also identifies 11 Mb of anuran-specific highly conserved elements that will be useful for comparative genomic analyses of frogs. The Nanorana genome offers an improved understanding of evolution of tetrapod genomes and also provides a genomic reference for other evolutionary studies.

  8. Effect of DNA extraction and sample preservation method on rumen bacterial population.

    PubMed

    Fliegerova, Katerina; Tapio, Ilma; Bonin, Aurelie; Mrazek, Jakub; Callegari, Maria Luisa; Bani, Paolo; Bayat, Alireza; Vilkki, Johanna; Kopečný, Jan; Shingfield, Kevin J; Boyer, Frederic; Coissac, Eric; Taberlet, Pierre; Wallace, R John

    2014-10-01

    A comparison of the bacterial profiles of intracellular (iDNA) and extracellular DNA (eDNA) isolated from cow rumen content stored under different conditions was conducted. The influence of rumen fluid treatment (cheesecloth squeezed, centrifuged, filtered), storage temperature (RT, -80 °C) and cryoprotectants (PBS-glycerol, ethanol) on quality and quantity parameters of extracted DNA was evaluated by bacterial DGGE analysis, real-time PCR quantification and a metabarcoding approach using high-throughput sequencing. Samples clustered according to the type of extracted DNA due to considerable differences between iDNA and eDNA bacterial profiles, while storage temperature and cryoprotectant additives had little effect on sample clustering. The numbers of Firmicutes and Bacteroidetes were lower (P < 0.01) in eDNA samples. The qPCR indicated a significantly higher amount of Firmicutes in the iDNA sample frozen with glycerol (P < 0.01). Deep sequencing analysis of iDNA samples revealed the prevalence of Bacteroidetes and similarity of samples frozen with and without cryoprotectants, which differed from the sample stored with ethanol at room temperature. Centrifugation and consequent filtration of rumen fluid subjected to the eDNA isolation procedure considerably changed the ratio of molecular operational taxonomic units (MOTUs) of Bacteroidetes and Firmicutes. Intracellular DNA extraction using the bead-beating method from cheesecloth-sieved rumen content mixed with PBS-glycerol and stored at -80 °C was found to be the optimal method to study the ruminal bacterial profile.

  9. Correlation between protein sequence similarity and x-ray diffraction quality in the protein data bank.

    PubMed

    Lu, Hui-Meng; Yin, Da-Chuan; Ye, Ya-Jing; Luo, Hui-Min; Geng, Li-Qiang; Li, Hai-Sheng; Guo, Wei-Hong; Shang, Peng

    2009-01-01

    As the most widely utilized technique to determine the 3-dimensional structure of protein molecules, X-ray crystallography can provide structures of the highest resolution among the developed techniques. The resolution obtained via X-ray crystallography is known to be influenced by many factors, such as the crystal quality, diffraction techniques, and X-ray sources. In this paper, we found that the protein sequence could also be one of the factors. We extracted information on the resolution and the sequence of proteins from the Protein Data Bank (PDB), classified the proteins into different clusters according to the sequence similarity, and statistically analyzed the relationship between the sequence similarity and the best resolution obtained. The results showed that there was a pronounced correlation between the sequence similarity and the obtained resolution. These results indicate that protein structure itself is one variable that may affect resolution when X-ray crystallography is used.

  10. Evaluation of the reproducibility of amplicon sequencing with Illumina MiSeq platform

    PubMed Central

    Van Nostrand, Joy D.; Ning, Daliang; Sun, Bo; Xue, Kai; Liu, Feifei; Deng, Ye; Liang, Yuting; Zhou, Jizhong

    2017-01-01

    Illumina’s MiSeq has become the dominant platform for gene amplicon sequencing in microbial ecology studies; however, various technical concerns, such as reproducibility, still exist. To assess reproducibility, 16S rRNA gene amplicons from 18 soil samples of a reciprocal transplantation experiment were sequenced on an Illumina MiSeq. The V4 region of 16S rRNA gene from each sample was sequenced in triplicate with each replicate having a unique barcode. The average OTU overlap, without considering sequence abundance, at a rarefaction level of 10,323 sequences was 33.4±2.1% and 20.2±1.7% between two and among three technical replicates, respectively. When OTU sequence abundance was considered, the average sequence abundance weighted OTU overlap was 85.6±1.6% and 81.2±2.1% for two and three replicates, respectively. Removing singletons significantly increased the overlap for both (~1–3%, p<0.001). Increasing the sequencing depth to 160,000 reads by deep sequencing increased OTU overlap both when sequence abundance was considered (95%) and when not (44%). However, if singletons were not removed the overlap between two technical replicates (not considering sequence abundance) plateaus at 39% with 30,000 sequences. Diversity measures were not affected by the low overlap as α-diversities were similar among technical replicates while β-diversities (Bray-Curtis) were much smaller among technical replicates than among treatment replicates (e.g., 0.269 vs. 0.374). Higher diversity coverage, but lower OTU overlap, was observed when replicates were sequenced in separate runs. Detrended correspondence analysis indicated that while there was considerable variation among technical replicates, the reproducibility was sufficient for detecting treatment effects for the samples examined. 
These results suggest that although there is variation among technical replicates, amplicon sequencing on MiSeq is useful for analyzing microbial community structure if used appropriately and with caution. For example, including technical replicates, removing spurious sequences and unrepresentative OTUs, using a clustering method with a high stringency for OTU generation, estimating treatment effects at higher taxonomic levels, and adapting the unique molecular identifier (UMI) and other newly developed methods to lower PCR and sequencing error and to identify true low abundance rare species all can increase reproducibility. PMID:28453559
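
    The gap between unweighted and abundance-weighted OTU overlap reported above can be illustrated with a small sketch. It assumes the unweighted overlap is the share of OTUs found in both replicates out of all OTUs found in either, and the weighted overlap is the share of reads that belong to shared OTUs; the abstract does not give the exact formulas, and the counts are made-up toy values.

```python
def otu_overlap(rep_a: dict, rep_b: dict) -> tuple:
    """Return (unweighted, abundance-weighted) OTU overlap between two replicates."""
    otus_a = {otu for otu, n in rep_a.items() if n > 0}
    otus_b = {otu for otu, n in rep_b.items() if n > 0}
    shared = otus_a & otus_b
    union = otus_a | otus_b
    total_reads = sum(rep_a.values()) + sum(rep_b.values())
    if not union or total_reads == 0:
        return 0.0, 0.0
    unweighted = len(shared) / len(union)
    shared_reads = sum(rep_a[otu] + rep_b[otu] for otu in shared)
    weighted = shared_reads / total_reads
    return unweighted, weighted


# Hypothetical read counts per OTU for two technical replicates.
rep1 = {"OTU1": 90, "OTU2": 8, "OTU3": 1, "OTU4": 1}
rep2 = {"OTU1": 85, "OTU2": 12, "OTU5": 2, "OTU6": 1}

unweighted, weighted = otu_overlap(rep1, rep2)
print(f"unweighted overlap: {unweighted:.1%}")
print(f"weighted overlap:   {weighted:.1%}")
```

    In this toy case the rare OTUs unique to each replicate pull the unweighted overlap down to one third, while the abundant shared OTUs keep the weighted overlap near 98%, the same qualitative pattern the abstract describes.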

  11. Evaluation of the reproducibility of amplicon sequencing with Illumina MiSeq platform

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wen, Chongqing; Wu, Liyou; Qin, Yujia

    Illumina's MiSeq has become the dominant platform for gene amplicon sequencing in microbial ecology studies; however, various technical concerns, such as reproducibility, still exist. To assess reproducibility, 16S rRNA gene amplicons from 18 soil samples of a reciprocal transplantation experiment were sequenced on an Illumina MiSeq. The V4 region of 16S rRNA gene from each sample was sequenced in triplicate with each replicate having a unique barcode. The average OTU overlap, without considering sequence abundance, at a rarefaction level of 10,323 sequences was 33.4±2.1% and 20.2±1.7% between two and among three technical replicates, respectively. When OTU sequence abundance was considered, the average sequence abundance weighted OTU overlap was 85.6±1.6% and 81.2±2.1% for two and three replicates, respectively. Removing singletons significantly increased the overlap for both (~1-3%, p<0.001). Increasing the sequencing depth to 160,000 reads by deep sequencing increased OTU overlap both when sequence abundance was considered (95%) and when not (44%). However, if singletons were not removed the overlap between two technical replicates (not considering sequence abundance) plateaus at 39% with 30,000 sequences. Diversity measures were not affected by the low overlap as α-diversities were similar among technical replicates while β-diversities (Bray-Curtis) were much smaller among technical replicates than among treatment replicates (e.g., 0.269 vs. 0.374). Higher diversity coverage, but lower OTU overlap, was observed when replicates were sequenced in separate runs. Detrended correspondence analysis indicated that while there was considerable variation among technical replicates, the reproducibility was sufficient for detecting treatment effects for the samples examined.
These results suggest that although there is variation among technical replicates, amplicon sequencing on MiSeq is useful for analyzing microbial community structure if used appropriately and with caution. For example, including technical replicates, removing spurious sequences and unrepresentative OTUs, using a clustering method with a high stringency for OTU generation, estimating treatment effects at higher taxonomic levels, and adapting the unique molecular identifier (UMI) and other newly developed methods to lower PCR and sequencing error and to identify true low abundance rare species all can increase reproducibility.

  12. Evaluation of the reproducibility of amplicon sequencing with Illumina MiSeq platform

    DOE PAGES

    Wen, Chongqing; Wu, Liyou; Qin, Yujia; ...

    2017-04-28

    Illumina's MiSeq has become the dominant platform for gene amplicon sequencing in microbial ecology studies; however, various technical concerns, such as reproducibility, still exist. To assess reproducibility, 16S rRNA gene amplicons from 18 soil samples of a reciprocal transplantation experiment were sequenced on an Illumina MiSeq. The V4 region of 16S rRNA gene from each sample was sequenced in triplicate with each replicate having a unique barcode. The average OTU overlap, without considering sequence abundance, at a rarefaction level of 10,323 sequences was 33.4±2.1% and 20.2±1.7% between two and among three technical replicates, respectively. When OTU sequence abundance was considered, the average sequence abundance weighted OTU overlap was 85.6±1.6% and 81.2±2.1% for two and three replicates, respectively. Removing singletons significantly increased the overlap for both (~1-3%, p<0.001). Increasing the sequencing depth to 160,000 reads by deep sequencing increased OTU overlap both when sequence abundance was considered (95%) and when not (44%). However, if singletons were not removed the overlap between two technical replicates (not considering sequence abundance) plateaus at 39% with 30,000 sequences. Diversity measures were not affected by the low overlap as α-diversities were similar among technical replicates while β-diversities (Bray-Curtis) were much smaller among technical replicates than among treatment replicates (e.g., 0.269 vs. 0.374). Higher diversity coverage, but lower OTU overlap, was observed when replicates were sequenced in separate runs. Detrended correspondence analysis indicated that while there was considerable variation among technical replicates, the reproducibility was sufficient for detecting treatment effects for the samples examined.
These results suggest that although there is variation among technical replicates, amplicon sequencing on MiSeq is useful for analyzing microbial community structure if used appropriately and with caution. For example, including technical replicates, removing spurious sequences and unrepresentative OTUs, using a clustering method with a high stringency for OTU generation, estimating treatment effects at higher taxonomic levels, and adapting the unique molecular identifier (UMI) and other newly developed methods to lower PCR and sequencing error and to identify true low abundance rare species all can increase reproducibility.

  13. Evaluation of the reproducibility of amplicon sequencing with Illumina MiSeq platform.

    PubMed

    Wen, Chongqing; Wu, Liyou; Qin, Yujia; Van Nostrand, Joy D; Ning, Daliang; Sun, Bo; Xue, Kai; Liu, Feifei; Deng, Ye; Liang, Yuting; Zhou, Jizhong

    2017-01-01

    Illumina's MiSeq has become the dominant platform for gene amplicon sequencing in microbial ecology studies; however, various technical concerns, such as reproducibility, still exist. To assess reproducibility, 16S rRNA gene amplicons from 18 soil samples of a reciprocal transplantation experiment were sequenced on an Illumina MiSeq. The V4 region of 16S rRNA gene from each sample was sequenced in triplicate with each replicate having a unique barcode. The average OTU overlap, without considering sequence abundance, at a rarefaction level of 10,323 sequences was 33.4±2.1% and 20.2±1.7% between two and among three technical replicates, respectively. When OTU sequence abundance was considered, the average sequence abundance weighted OTU overlap was 85.6±1.6% and 81.2±2.1% for two and three replicates, respectively. Removing singletons significantly increased the overlap for both (~1-3%, p<0.001). Increasing the sequencing depth to 160,000 reads by deep sequencing increased OTU overlap both when sequence abundance was considered (95%) and when not (44%). However, if singletons were not removed the overlap between two technical replicates (not considering sequence abundance) plateaus at 39% with 30,000 sequences. Diversity measures were not affected by the low overlap as α-diversities were similar among technical replicates while β-diversities (Bray-Curtis) were much smaller among technical replicates than among treatment replicates (e.g., 0.269 vs. 0.374). Higher diversity coverage, but lower OTU overlap, was observed when replicates were sequenced in separate runs. Detrended correspondence analysis indicated that while there was considerable variation among technical replicates, the reproducibility was sufficient for detecting treatment effects for the samples examined. 
These results suggest that although there is variation among technical replicates, amplicon sequencing on MiSeq is useful for analyzing microbial community structure if used appropriately and with caution. For example, including technical replicates, removing spurious sequences and unrepresentative OTUs, using a clustering method with a high stringency for OTU generation, estimating treatment effects at higher taxonomic levels, and adapting the unique molecular identifier (UMI) and other newly developed methods to lower PCR and sequencing error and to identify true low abundance rare species all can increase reproducibility.
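
    The two overlap figures reported in the abstract above (with and without sequence abundance) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the definitions here are assumptions: unweighted overlap is taken as shared OTUs divided by all OTUs seen in either replicate, and weighted overlap as the fraction of reads that fall in shared OTUs. The function name `otu_overlap` and the toy counts are hypothetical.

```python
def otu_overlap(rep_a, rep_b):
    """Return (unweighted, weighted) OTU overlap for two replicates.

    rep_a, rep_b: dict mapping OTU id -> read count.
    Both definitions are illustrative assumptions, not the paper's formulas.
    """
    present_a = {otu for otu, n in rep_a.items() if n > 0}
    present_b = {otu for otu, n in rep_b.items() if n > 0}
    shared = present_a & present_b
    union = present_a | present_b
    if not union:
        return 0.0, 0.0

    # Unweighted: every OTU counts once, however many reads it has.
    unweighted = len(shared) / len(union)

    # Weighted: fraction of all reads that belong to shared OTUs.
    total_reads = sum(rep_a[o] for o in present_a) + sum(rep_b[o] for o in present_b)
    shared_reads = sum(rep_a[o] + rep_b[o] for o in shared)
    weighted = shared_reads / total_reads
    return unweighted, weighted


# Toy replicates: two abundant shared OTUs plus rare OTUs unique to each.
rep1 = {"OTU1": 90, "OTU2": 8, "OTU3": 1, "OTU4": 1}
rep2 = {"OTU1": 85, "OTU2": 12, "OTU5": 2, "OTU6": 1}
unweighted, weighted = otu_overlap(rep1, rep2)
print(f"unweighted={unweighted:.3f} weighted={weighted:.3f}")
```

    In this toy case rare, replicate-specific OTUs pull the unweighted overlap down while barely affecting the weighted value, which is the qualitative pattern the abstract describes.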

  14. Finding functional features in Saccharomyces genomes by phylogenetic footprinting.

    PubMed

    Cliften, Paul; Sudarsanam, Priya; Desikan, Ashwin; Fulton, Lucinda; Fulton, Bob; Majors, John; Waterston, Robert; Cohen, Barak A; Johnston, Mark

    2003-07-04

    The sifting and winnowing of DNA sequence that occur during evolution cause nonfunctional sequences to diverge, leaving phylogenetic footprints of functional sequence elements in comparisons of genome sequences. We searched for such footprints among the genome sequences of six Saccharomyces species and identified potentially functional sequences. Comparison of these sequences allowed us to revise the catalog of yeast genes and identify sequence motifs that may be targets of transcriptional regulatory proteins. Some of these conserved sequence motifs reside upstream of genes with similar functional annotations or similar expression patterns or those bound by the same transcription factor and are thus good candidates for functional regulatory sequences.

  15. Tumor Heterogeneity, Single-Cell Sequencing, and Drug Resistance.

    PubMed

    Schmidt, Felix; Efferth, Thomas

    2016-06-16

    Tumor heterogeneity has been compared with Darwinian evolution and survival of the fittest. The evolutionary ecosystem of tumors consisting of heterogeneous tumor cell populations represents a considerable challenge to tumor therapy, since all genetically and phenotypically different subpopulations have to be efficiently killed by therapy. Otherwise, even small surviving subpopulations may cause repopulation and refractory tumors. Single-cell sequencing allows for a better understanding of the genomic principles of tumor heterogeneity and represents the basis for more successful tumor treatments. The isolation and sequencing of single tumor cells still represents a considerable technical challenge and consists of three major steps: (1) single cell isolation (e.g., by laser-capture microdissection, fluorescence-activated cell sorting, or micromanipulation), (2) whole genome amplification (e.g., with the help of Phi29 DNA polymerase), and (3) transcriptome-wide next generation sequencing technologies (e.g., 454 pyrosequencing, Illumina sequencing, and other systems). Data demonstrating the feasibility of single-cell sequencing for monitoring the emergence of drug-resistant cell clones in patient samples are discussed herein. It is envisioned that single-cell sequencing will be a valuable asset to assist the design of regimens for personalized tumor therapies based on tumor subpopulation-specific genetic alterations in individual patients.

  16. K2 and K2*: efficient alignment-free sequence similarity measurement based on Kendall statistics.

    PubMed

    Lin, Jie; Adjeroh, Donald A; Jiang, Bing-Hua; Jiang, Yue

    2018-05-15

    Alignment-free sequence comparison methods can compute the pairwise similarity between a huge number of sequences much faster than sequence-alignment based methods. We propose a new non-parametric alignment-free sequence comparison method, called K2, based on the Kendall statistics. Compared with the other state-of-the-art alignment-free comparison methods, K2 demonstrates competitive performance in generating the phylogenetic tree, in evaluating functionally related regulatory sequences, and in computing the edit distance (similarity/dissimilarity) between sequences. Furthermore, the K2 approach is much faster than the other methods. An improved method, K2*, is also proposed, which is able to determine the appropriate algorithmic parameter (length) automatically, without first considering different values. Comparative analysis with the state-of-the-art alignment-free sequence similarity methods demonstrates the superiority of the proposed approaches, especially with increasing sequence length, or increasing dataset sizes. The K2 and K2* approaches are implemented in the R language as a package that is freely available for open access (http://community.wvu.edu/daadjeroh/projects/K2/K2_1.0.tar.gz). Contact: yueljiang@163.com. Supplementary data are available at Bioinformatics online.
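
    As a rough illustration of the general idea behind rank-based alignment-free comparison, the sketch below counts k-mers in two DNA sequences and correlates the two count vectors with Kendall's tau-b. This is not the K2 or K2* algorithm, whose details the abstract does not give; the function names, the choice of tau-b, and the fixed k are all assumptions made for the example.

```python
from itertools import product


def kmer_counts(seq, k):
    """Count every DNA k-mer in seq; return counts in sorted k-mer order."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing non-ACGT symbols
            counts[kmer] += 1
    return [counts[key] for key in sorted(counts)]


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length numeric lists (O(n^2) pairs)."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both lists: excluded from both factors
            if dx == 0:
                ties_x += 1  # tied only in x
            elif dy == 0:
                ties_y += 1  # tied only in y
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = ((concordant + discordant + ties_x)
             * (concordant + discordant + ties_y)) ** 0.5
    return (concordant - discordant) / denom if denom else 0.0


# Hypothetical toy sequences; k=2 keeps the count vectors small (16 entries).
seq_a = "ACGTACGTTTGA"
seq_b = "ACGTACGATTGA"
tau = kendall_tau_b(kmer_counts(seq_a, 2), kmer_counts(seq_b, 2))
print(f"tau-b = {tau:.3f}")
```

    Because only the ranks of the k-mer counts matter, no alignment is computed; the quadratic pair loop is kept for readability and would need a faster implementation for long count vectors.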

  17. A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference.

    PubMed

    Shen, Xing-Xing; Salichos, Leonidas; Rokas, Antonis

    2016-09-02

    Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, Guanine-Cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, alongside with number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. 
These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal and could be useful in guiding the choice of phylogenetic markers. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  18. Whole-genome characterization of a Peruvian alpaca rotavirus isolate expressing a novel VP4 genotype.

    PubMed

    Rojas, Miguel; Gonçalves, Jorge Luiz S; Dias, Helver G; Manchego, Alberto; Pezo, Danilo; Santos, Norma

    2016-11-30

    The SA44 isolate of Rotavirus A (RVA) was identified from a neonatal Peruvian alpaca presenting with diarrhea, and the full-length genome sequence of the isolate (designated RVA/Alpaca-tc/PER/SA44/2014/G3P[40]) was determined. Phylogenetic analyses showed that the isolate possessed the genotype constellation G3-P[40]-I8-R3-C3-M3-A9-N3-T3-E3-H6, which differs considerably from those of RVA strains isolated from other species of the order Artiodactyla. Overall, the genetic constellation of the SA44 strain was quite similar to those of RVA strains isolated from a bat in Asia (MSLH14 and MYAS33). Nonetheless, phylogenetic analyses of each genome segment identified a distinct combination of genes. Several sequences were closely related to corresponding gene sequences in RVA strains from other species, including human (VP1, VP2, NSP1, and NSP2), simian (VP3 and NSP5), bat (VP6 and NSP4), and equine (NSP3). The VP7 gene sequence was closely related to RVA strains from a Peruvian alpaca (K'ayra/3368-10; 99.0% nucleotide and 99.7% amino acid identity) and from humans (RCH272; 95% nucleotide and 99.0% amino acid identity). The nucleotide sequence of the VP4 gene was distantly related to other VP4 sequences and was designated as the reference strain for the new P[40] genotype. This unique genetic makeup suggests that the SA44 strain emerged from multiple reassortment events between bat-, equine-, and human-like RVA strains. Copyright © 2016 Elsevier B.V. All rights reserved.

  19. The Foldback-like element Galileo belongs to the P superfamily of DNA transposons and is widespread within the Drosophila genus.

    PubMed

    Marzo, Mar; Puig, Marta; Ruiz, Alfredo

    2008-02-26

    Galileo is the only transposable element (TE) known to have generated natural chromosomal inversions in the genus Drosophila. It was discovered in Drosophila buzzatii and classified as a Foldback-like element because of its long, internally repetitive, terminal inverted repeats (TIRs) and lack of coding capacity. Here, we characterized a seemingly complete copy of Galileo from the D. buzzatii genome. It is 5,406 bp long, possesses 1,229-bp TIRs, and encodes a 912-aa transposase similar to those of the Drosophila melanogaster 1360 (Hoppel) and P elements. We also searched the recently available genome sequences of 12 Drosophila species for elements similar to Dbuz\Galileo by using bioinformatic tools. Galileo was found in six species (ananassae, willistoni, pseudoobscura, persimilis, virilis, and mojavensis) from the two main lineages within the Drosophila genus. Our observations place Galileo within the P superfamily of cut-and-paste transposons and extend considerably its phylogenetic distribution. The interspecific distribution of Galileo indicates an ancient presence in the genus, but the phylogenetic tree built with the transposase amino acid sequences contrasts significantly with that of the species, indicating lineage sorting and/or horizontal transfer events. Our results also suggest that Foldback-like elements such as Galileo may evolve from DNA-based transposon ancestors by loss of the transposase gene and disproportionate elongation of TIRs.

  20. The Foldback-like element Galileo belongs to the P superfamily of DNA transposons and is widespread within the Drosophila genus

    PubMed Central

    Marzo, Mar; Puig, Marta; Ruiz, Alfredo

    2008-01-01

    Galileo is the only transposable element (TE) known to have generated natural chromosomal inversions in the genus Drosophila. It was discovered in Drosophila buzzatii and classified as a Foldback-like element because of its long, internally repetitive, terminal inverted repeats (TIRs) and lack of coding capacity. Here, we characterized a seemingly complete copy of Galileo from the D. buzzatii genome. It is 5,406 bp long, possesses 1,229-bp TIRs, and encodes a 912-aa transposase similar to those of the Drosophila melanogaster 1360 (Hoppel) and P elements. We also searched the recently available genome sequences of 12 Drosophila species for elements similar to Dbuz\Galileo by using bioinformatic tools. Galileo was found in six species (ananassae, willistoni, pseudoobscura, persimilis, virilis, and mojavensis) from the two main lineages within the Drosophila genus. Our observations place Galileo within the P superfamily of cut-and-paste transposons and extend considerably its phylogenetic distribution. The interspecific distribution of Galileo indicates an ancient presence in the genus, but the phylogenetic tree built with the transposase amino acid sequences contrasts significantly with that of the species, indicating lineage sorting and/or horizontal transfer events. Our results also suggest that Foldback-like elements such as Galileo may evolve from DNA-based transposon ancestors by loss of the transposase gene and disproportionate elongation of TIRs. PMID:18287066

  1. Cloning and sequencing of the allophycocyanin genes from Spirulina maxima (Cyanophyta)

    NASA Astrophysics Data System (ADS)

    Qin, Song; Hiroyuki, Kojima; Yoshikazu, Kawata; Shin-Ichi, Yano; Zeng, Cheng-Kui

    1998-03-01

    The genes coding for the α- and β-subunits of allophycocyanin (apcA and apcB) from the cyanophyte Spirulina maxima were cloned and sequenced. The results revealed 44.4% nucleotide sequence similarity and 30.4% similarity of the deduced amino acid sequences between them. The amino acid sequence identities between S. maxima and S. platensis are 99.4% for the α subunit and 100% for the β subunit.

  2. Next-Generation Sequence Analysis of the Genome of RFHVMn, the Macaque Homolog of Kaposi's Sarcoma (KS)-Associated Herpesvirus, from a KS-Like Tumor of a Pig-Tailed Macaque

    PubMed Central

    Bruce, A. Gregory; Ryan, Jonathan T.; Thomas, Mathew J.; Peng, Xinxia; Grundhoff, Adam; Tsai, Che-Chung

    2013-01-01

    The complete sequence of retroperitoneal fibromatosis-associated herpesvirus Macaca nemestrina (RFHVMn), the pig-tailed macaque homolog of Kaposi's sarcoma-associated herpesvirus (KSHV), was determined by next-generation sequence analysis of a Kaposi's sarcoma (KS)-like macaque tumor. Colinearity of genes was observed with the KSHV genome, and the core herpesvirus genes had strong sequence homology to the corresponding KSHV genes. RFHVMn lacked homologs of open reading frame 11 (ORF11) and KSHV ORFs K5 and K6, which appear to have been generated by duplication of ORFs K3 and K4 after the divergence of KSHV and RFHV. RFHVMn contained positional homologs of all other unique KSHV genes, although some showed limited sequence similarity. RFHVMn contained a number of candidate microRNA genes. Although there was little sequence similarity with KSHV microRNAs, one candidate contained the same seed sequence as the positional homolog, kshv-miR-K12-10a, suggesting functional overlap. RNA transcript splicing was highly conserved between RFHVMn and KSHV, and strong sequence conservation was noted in specific promoters and putative origins of replication, predicting important functional similarities. Sequence comparisons indicated that RFHVMn and KSHV developed in long-term synchrony with the evolution of their hosts, and both viruses phylogenetically group within the RV1 lineage of Old World primate rhadinoviruses. RFHVMn is the closest homolog of KSHV to be completely sequenced and the first sequenced RV1 rhadinovirus homolog of KSHV from a nonhuman Old World primate. The strong genetic and sequence similarity between RFHVMn and KSHV, coupled with similarities in biology and pathology, demonstrate that RFHVMn infection in macaques offers an important and relevant model for the study of KSHV in humans. PMID:24109218

  3. Synthetic oligonucleotide probes deduced from amino acid sequence data. Theoretical and practical considerations.

    PubMed

    Lathe, R

    1985-05-05

    Synthetic probes deduced from amino acid sequence data are widely used to detect cognate coding sequences in libraries of cloned DNA segments. The redundancy of the genetic code dictates that a choice must be made between (1) a mixture of probes reflecting all codon combinations, and (2) a single longer "optimal" probe. The second strategy is examined in detail. The frequency of sequences matching a given probe by chance alone can be determined and also the frequency of sequences closely resembling the probe and contributing to the hybridization background. Gene banks cannot be treated as random associations of the four nucleotides, and probe sequences deduced from amino acid sequence data occur more often than predicted by chance alone. Probe lengths must be increased to confer the necessary specificity. Examination of hybrids formed between unique homologous probes and their cognate targets reveals that short stretches of perfect homology occurring by chance make a significant contribution to the hybridization background. Statistical methods for improving homology are examined, taking human coding sequences as an example, and considerations of codon utilization and dinucleotide frequencies yield an overall homology of greater than 82%. Recommendations for probe design and hybridization are presented, and the choice between using multiple probes reflecting all codon possibilities and a unique optimal probe is discussed.
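
    The first strategy mentioned in the abstract above, a mixture of probes covering all codon combinations, grows quickly with peptide length because of codon degeneracy. The sketch below counts how many distinct oligonucleotides such a mixture would need, using the codon counts of the standard genetic code. The function name `mixture_size` and the example peptides are hypothetical and are not taken from the paper.

```python
# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def mixture_size(peptide):
    """Return how many distinct coding sequences can encode the peptide.

    This equals the number of oligonucleotides in a fully degenerate
    probe mixture spanning the whole peptide (one-letter amino acid codes).
    """
    size = 1
    for residue in peptide.upper():
        size *= CODON_DEGENERACY[residue]
    return size


# Hypothetical peptides: Met/Trp-rich stretches need few probes,
# Leu/Ser/Arg-rich stretches need many.
for pep in ("MWKD", "LSR", "MWKDLSR"):
    print(pep, 3 * len(pep), "nt", mixture_size(pep), "sequences")
```

    The rapid growth of the mixture for longer peptides is one way to see why a single longer "optimal" probe, the second strategy in the abstract, can be attractive.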

  4. CoVaCS: a consensus variant calling system.

    PubMed

    Chiara, Matteo; Gioiosa, Silvia; Chillemi, Giovanni; D'Antonio, Mattia; Flati, Tiziano; Picardi, Ernesto; Zambelli, Federico; Horner, David Stephen; Pesole, Graziano; Castrignanò, Tiziana

    2018-02-05

    The advent and ongoing development of next generation sequencing technologies (NGS) has led to a rapid increase in the rate of human genome re-sequencing data, paving the way for personalized genomics and precision medicine. The body of genome resequencing data is progressively increasing underlining the need for accurate and time-effective bioinformatics systems for genotyping - a crucial prerequisite for identification of candidate causal mutations in diagnostic screens. Here we present CoVaCS, a fully automated, highly accurate system with a web based graphical interface for genotyping and variant annotation. Extensive tests on a gold standard benchmark data-set -the NA12878 Illumina platinum genome- confirm that call-sets based on our consensus strategy are completely in line with those attained by similar command line based approaches, and far more accurate than call-sets from any individual tool. Importantly our system exhibits better sensitivity and higher specificity than equivalent commercial software. CoVaCS offers optimized pipelines integrating state of the art tools for variant calling and annotation for whole genome sequencing (WGS), whole-exome sequencing (WES) and target-gene sequencing (TGS) data. The system is currently hosted at Cineca, and offers the speed of a HPC computing facility, a crucial consideration when large numbers of samples must be analysed. Importantly, all the analyses are performed automatically allowing high reproducibility of the results. As such, we believe that CoVaCS can be a valuable tool for the analysis of human genome resequencing studies. CoVaCS is available at: https://bioinformatics.cineca.it/covacs .

  5. Genetic Characterization of the Fish Piaractus brachypomus by Microsatellites Derived from Transcriptome Sequencing.

    PubMed

    Jorge, Paulo H; Mastrochirico-Filho, Vito A; Hata, Milene E; Mendes, Natália J; Ariede, Raquel B; de Freitas, Milena Vieira; Vera, Manuel; Porto-Foresti, Fábio; Hashimoto, Diogo T

    2018-01-01

    The pirapitinga, Piaractus brachypomus (Characiformes, Serrasalmidae), is a fish from the Amazon basin and is considered to be one of the main native species used in aquaculture production in South America. The objectives of this study were: (1) to perform liver transcriptome sequencing of pirapitinga through NGS and then validate a set of microsatellite markers for this species; and (2) to use polymorphic microsatellites for analysis of genetic variability in farmed stocks. The transcriptome sequencing was carried out through the Roche/454 technology, which resulted in 3,696 non-redundant contigs. Of this total, 2,568 contigs had similarity in the non-redundant (nr) protein database (GenBank) and 2,075 sequences were characterized in the categories of Gene Ontology (GO). After the validation process of 30 microsatellite loci, eight markers showed polymorphism. The analysis of these polymorphic markers in farmed stocks revealed that fish farms from North Brazil had a higher genetic diversity than fish farms from Southeast Brazil. AMOVA demonstrated that the highest proportion of variation was presented within the populations. However, when comparing different groups (1: Wild; 2: North fish farms; 3: Southeast fish farms), a considerable variation between the groups was observed. The FST values showed the occurrence of genetic structure among the broodstocks from different regions of Brazil. The transcriptome sequencing in pirapitinga provided important genetic resources for biological studies in this non-model species, and microsatellite data can be used as the framework for the genetic management of breeding stocks in Brazil, which might provide a basis for a genetic pre-breeding programme.

  6. Using SQL Databases for Sequence Similarity Searching and Analysis.

    PubMed

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.

  7. Binding Site Turnover Produces Pervasive Quantitative Changes in Transcription Factor Binding between Closely Related Drosophila Species

    PubMed Central

    Trapnell, Cole; Davidson, Stuart; Pachter, Lior; Chu, Hou Cheng; Tonkin, Leath A.; Biggin, Mark D.; Eisen, Michael B.

    2010-01-01

    Changes in gene expression play an important role in evolution, yet the molecular mechanisms underlying regulatory evolution are poorly understood. Here we compare genome-wide binding of the six transcription factors that initiate segmentation along the anterior-posterior axis in embryos of two closely related species: Drosophila melanogaster and Drosophila yakuba. Where we observe binding by a factor in one species, we almost always observe binding by that factor to the orthologous sequence in the other species. Levels of binding, however, vary considerably. The magnitude and direction of the interspecies differences in binding levels of all six factors are strongly correlated, suggesting a role for chromatin or other factor-independent forces in mediating the divergence of transcription factor binding. Nonetheless, factor-specific quantitative variation in binding is common, and we show that it is driven to a large extent by the gain and loss of cognate recognition sequences for the given factor. We find only a weak correlation between binding variation and regulatory function. These data provide the first genome-wide picture of how modest levels of sequence divergence between highly morphologically similar species affect a system of coordinately acting transcription factors during animal development, and highlight the dominant role of quantitative variation in transcription factor binding over short evolutionary distances. PMID:20351773

  8. Occurrence and activity of a type II CRISPR-Cas system in Lactobacillus gasseri.

    PubMed

    Sanozky-Dawes, Rosemary; Selle, Kurt; O'Flaherty, Sarah; Klaenhammer, Todd; Barrangou, Rodolphe

    2015-09-01

    Bacteria encode clustered regularly interspaced short palindromic repeats (CRISPRs) and CRISPR-associated genes (cas), which collectively form an RNA-guided adaptive immune system against invasive genetic elements. In silico surveys have revealed that lactic acid bacteria harbour a prolific and diverse set of CRISPR-Cas systems. Thus, the natural evolutionary role of CRISPR-Cas systems may be investigated in these ecologically, industrially, scientifically and medically important microbes. In this study, 17 Lactobacillus gasseri strains were investigated and 6 harboured a type II-A CRISPR-Cas system, with considerable diversity in array size and spacer content. Several of the spacers showed similarity to phage and plasmid sequences, which are typical targets of CRISPR-Cas immune systems. Aligning the protospacers facilitated inference of the protospacer adjacent motif sequence, determined to be 5'-NTAA-3' flanking the 3' end of the protospacer. The system in L. gasseri JV-V03 and NCK 1342 interfered with transforming plasmids containing sequences matching the most recently acquired CRISPR spacers in each strain. We report the distribution and function of a native type II-A CRISPR-Cas system in the commensal species L. gasseri. Collectively, these results open avenues for applications for bacteriophage protection and genome modification in L. gasseri, and contribute to the fundamental understanding of CRISPR-Cas systems in bacteria.

  9. Isolation and Characterization of the PKAr Gene From a Plant Pathogen, Curvularia lunata.

    PubMed

    Liu, T; Ma, B C; Hou, J M; Zuo, Y H

    2014-09-01

    By using an EST database from a full-length cDNA library of Curvularia lunata, we have isolated a 2.9 kb cDNA, termed PKAr. An ORF of 1,383 bp encoding a polypeptide of 460 amino acids with a molecular weight of 50.1 kDa (GenBank Acc. No. KF675744) was cloned. The deduced amino acid sequence of PKAr shows 90% and 88% identity with the cAMP-dependent protein kinase A regulatory subunit from Alternaria alternata and Pyrenophora tritici-repentis Pt-1C-BFP, respectively. Database analysis revealed that the deduced amino acid sequence of PKAr shares considerable similarity with that of PKA regulatory subunits in other organisms, particularly in the conserved regions. No introns were identified within the 1,383 bp ORF when compared with the PKAr genomic DNA sequence. Southern blot indicated that PKAr exists as a single copy per genome. The mRNA expression level of PKAr at different developmental stages was measured using real-time quantitative PCR. The results showed that the level of PKAr expression was highest in vegetative growth mycelium, which indicates that it might play an important role in the vegetative growth of C. lunata. These results provide a foundation for further research on the function of PKAr in the plant pathogen C. lunata.

  10. Bacterial diversity in Adélie penguin, Pygoscelis adeliae, guano: molecular and morpho-physiological approaches.

    PubMed

    Zdanowski, Marek K; Weglenski, Piotr; Golik, Pawel; Sasin, Joanna M; Borsuk, Piotr; Zmuda, Magdalena J; Stankovic, Anna

    2004-11-01

    The total number of bacteria and culturable bacteria in Adélie penguin (Pygoscelis adeliae) guano was determined during 42 days of decomposition in a location adjacent to the rookery in Admiralty Bay, King George Island, Antarctica. Of the culturable bacteria, 72 randomly selected colonies were described using 49 morpho-physiological tests, 27 of which were subsequently considered significant in characterizing and differentiating the isolates. On the basis of the nucleotide sequence of a fragment of the 16S rRNA gene in each of 72 pure isolates, three major phylogenetic groups were identified, namely the Moraxellaceae/Pseudomonadaceae (29 isolates), the Flavobacteriaceae (14), and the Micrococcaceae (29). Grouping of the isolates on the basis of morpho-physiological tests (whether 49 or 27 parameters) showed similar results to those based on 16S rRNA gene sequences. Clusters were characterized by considerable intra-cluster variation in both 16S rRNA gene sequences and morpho-physiological responses. High diversity in abundance and morphometry of total bacterial communities during penguin guano decomposition was supported by image analysis of epifluorescence micrographs. The results indicate that the bacterial community in penguin guano is not only one of the richest in Antarctica, but is extremely diverse, both phylogenetically and morpho-physiologically.

  11. Potential use of bacterial community succession for estimating post-mortem interval as revealed by high-throughput sequencing

    PubMed Central

    Guo, Juanjuan; Fu, Xiaoliang; Liao, Huidan; Hu, Zhenyu; Long, Lingling; Yan, Weitao; Ding, Yanjun; Zha, Lagabaiyila; Guo, Yadong; Yan, Jie; Chang, Yunfeng; Cai, Jifeng

    2016-01-01

    Decomposition is a complex process involving the interaction of both biotic and abiotic factors. Microbes play a critical role in the process of carrion decomposition. In this study, we analysed bacterial communities from live rats and rat remains decomposed under natural conditions, or excluding sarcosaphagous insect interference, in China using Illumina MiSeq sequencing of 16S rRNA gene amplicons. A total of 1,394,842 high-quality sequences and 1,938 singleton operational taxonomic units were obtained. Bacterial communities showed notable variation in relative abundance and became more similar to each other across body sites during the decomposition process. As decomposition progressed, Proteobacteria (mostly Gammaproteobacteria) became the predominant phylum in both the buccal cavity and rectum, while Firmicutes and Bacteroidetes in the mouth and rectum, respectively, gradually decreased. In particular, the arrival and oviposition of sarcosaphagous insects had no obvious influence on bacterial taxa composition, but accelerated the loss of biomass. In contrast to the rectum, the microbial community structure in the buccal cavity of live rats differed considerably from that of rats immediately after death. Although this research indicates that bacterial communities can be used as a “microbial clock” for the estimation of post-mortem interval, further work is required to better understand this concept. PMID:27052375

  12. Myticalins: A Novel Multigenic Family of Linear, Cationic Antimicrobial Peptides from Marine Mussels (Mytilus spp.).

    PubMed

    Leoni, Gabriele; De Poli, Andrea; Mardirossian, Mario; Gambato, Stefano; Florian, Fiorella; Venier, Paola; Wilson, Daniel N; Tossi, Alessandro; Pallavicini, Alberto; Gerdol, Marco

    2017-08-22

    The application of high-throughput sequencing technologies to non-model organisms has brought new opportunities for the identification of bioactive peptides from genomes and transcriptomes. From this point of view, marine invertebrates represent a potentially rich, yet largely unexplored resource for de novo discovery due to their adaptation to diverse challenging habitats. Bioinformatics analyses of available genomic and transcriptomic data allowed us to identify myticalins, a novel family of antimicrobial peptides (AMPs) from the mussel Mytilus galloprovincialis, and a similar family of AMPs from Modiolus spp., named modiocalins. Their coding sequence encompasses two conserved N-terminal (signal peptide) and C-terminal (propeptide) regions and a hypervariable central cationic region corresponding to the mature peptide. Myticalins are taxonomically restricted to Mytiloida and they can be classified into four subfamilies. These AMPs are subject to considerable interindividual sequence variability and possibly to presence/absence variation. Functional assays performed on selected members of this family indicate a remarkable tissue-specific expression (in gills) and broad spectrum of activity against both Gram-positive and Gram-negative bacteria. Overall, we present the first linear AMPs ever described in marine mussels and confirm the great potential of bioinformatics tools for the de novo discovery of bioactive peptides in non-model organisms.

  13. The paradox of HBV evolution as revealed from a 16th century mummy

    PubMed Central

    Duggan, Ana T.; Poinar, Debi; Poinar, Hendrik N.

    2018-01-01

    Hepatitis B virus (HBV) is a ubiquitous viral pathogen associated with large-scale morbidity and mortality in humans. However, there is considerable uncertainty over the time-scale of its origin and evolution. Initial shotgun data from a mid-16th century Italian child mummy, that was previously paleopathologically identified as having been infected with Variola virus (VARV, the agent of smallpox), showed no DNA reads for VARV yet did for hepatitis B virus (HBV). Previously, electron microscopy provided evidence for the presence of VARV in this sample, although similar analyses conducted here did not reveal any VARV particles. We attempted to enrich and sequence for both VARV and HBV DNA. Although we did not recover any reads identified as VARV, we were successful in reconstructing an HBV genome at 163.8X coverage. Strikingly, both the HBV sequence and that of the associated host mitochondrial DNA displayed a nearly identical cytosine deamination pattern near the termini of DNA fragments, characteristic of an ancient origin. In contrast, phylogenetic analyses revealed a close relationship between the putative ancient virus and contemporary HBV strains (of genotype D), at first suggesting contamination. In addressing this paradox we demonstrate that HBV evolution is characterized by a marked lack of temporal structure. This confounds attempts to use molecular clock-based methods to date the origin of this virus over the time-frame sampled so far, and means that phylogenetic measures alone cannot yet be used to determine HBV sequence authenticity. If genuine, this phylogenetic pattern indicates that the genotypes of HBV diversified long before the 16th century, and enables comparison of potential pathogenic similarities between modern and ancient HBV. These results have important implications for our understanding of the emergence and evolution of this common viral pathogen. PMID:29300782

  14. The disorderly conduct of Hsc70 and its interaction with the Alzheimer's related Tau protein.

    PubMed

    Taylor, Isabelle R; Ahmad, Atta; Wu, Taia; Nordhues, Bryce A; Bhullar, Anup; Gestwicki, Jason E; Zuiderweg, Erik R P

    2018-05-15

    Hsp70 chaperones bind to various protein substrates for folding, trafficking, and degradation. Considerable structural information is available about how prokaryotic Hsp70 (DnaK) binds substrates, but less is known about mammalian Hsp70s, of which there are 13 isoforms encoded in the human genome. Here, we report the interaction between the human Hsp70 isoform heat shock cognate 71 KDa protein (Hsc70 or HSPA8) and peptides derived from the microtubule-associated protein tau, which is linked to Alzheimer's disease. For structural studies, we used an Hsc70 construct (called BETA) comprising the substrate-binding domain, but lacking the lid. Importantly, we found that truncating the lid does not significantly impair Hsc70's chaperone activity or allostery in vitro. Using NMR, we show that BETA is partially dynamically disordered in the absence of substrate and that binding of the tau sequence GKVQIINKKG (with a KD = 500 nM) causes dramatic rigidification of BETA. Nuclear Overhauser effect distance measurements revealed that tau binds to the canonical substrate-binding cleft, similar to the binding observed with DnaK. To further develop BETA as a tool for studying Hsc70 interactions, we also measured BETA binding in NMR and fluorescent competition assays to peptides derived from huntingtin, insulin, a second tau-recognition sequence, and a KFERQ-like sequence linked to chaperone-mediated autophagy. We found that the insulin C-peptide binds BETA with high affinity (KD < 100 nM), whereas the others do not (KD > 100 μM). Together, our findings reveal several similarities and differences in how prokaryotic and mammalian Hsp70 isoforms interact with different substrate peptides. Published under license by The American Society for Biochemistry and Molecular Biology, Inc.

  15. Diversity of the P2 protein among nontypeable Haemophilus influenzae isolates.

    PubMed Central

    Bell, J; Grass, S; Jeanteur, D; Munson, R S

    1994-01-01

    The genes for outer membrane protein P2 of four nontypeable Haemophilus influenzae strains were cloned and sequenced. The derived amino acid sequences were compared with the outer membrane protein P2 sequence from H. influenzae type b MinnA and the sequences of P2 from three additional nontypeable H. influenzae strains. The sequences were 76 to 94% identical. The sequences had regions with considerable variability separated by regions which were highly conserved. The variable regions mapped to putative surface-exposed loops of the protein. PMID:8188390

  16. Fast discovery and visualization of conserved regions in DNA sequences using quasi-alignment

    PubMed Central

    2013-01-01

    Background: Next Generation Sequencing techniques are producing enormous amounts of biological sequence data and analysis becomes a major computational problem. Currently, most analysis, especially the identification of conserved regions, relies heavily on Multiple Sequence Alignment and its various heuristics such as progressive alignment, whose run time grows with the square of the number and the length of the aligned sequences and requires significant computational resources. In this work, we present a method to efficiently discover regions of high similarity across multiple sequences without performing expensive sequence alignment. The method is based on approximating edit distance between segments of sequences using p-mer frequency counts. Then, efficient high-throughput data stream clustering is used to group highly similar segments into so called quasi-alignments. Quasi-alignments have numerous applications such as identifying species and their taxonomic class from sequences, comparing sequences for similarities, and, as in this paper, discovering conserved regions across related sequences. Results: In this paper, we show that quasi-alignments can be used to discover highly similar segments across multiple sequences from related or different genomes efficiently and accurately. Experiments on a large number of unaligned 16S rRNA sequences obtained from the Greengenes database show that the method is able to identify conserved regions which agree with known hypervariable regions in 16S rRNA. Furthermore, the experiments show that the proposed method scales well for large data sets with a run time that grows only linearly with the number and length of sequences, whereas for existing multiple sequence alignment heuristics the run time grows super-linearly. Conclusion: Quasi-alignment-based algorithms can detect highly similar regions and conserved areas across multiple sequences. Since the run time is linear and the sequences are converted into a compact clustering model, we are able to identify conserved regions fast or even interactively using a standard PC. Our method has many potential applications such as finding characteristic signature sequences for families of organisms and studying conserved and variable regions in, for example, 16S rRNA. PMID:24564200

  17. Fast discovery and visualization of conserved regions in DNA sequences using quasi-alignment.

    PubMed

    Nagar, Anurag; Hahsler, Michael

    2013-01-01

    Next Generation Sequencing techniques are producing enormous amounts of biological sequence data and analysis becomes a major computational problem. Currently, most analysis, especially the identification of conserved regions, relies heavily on Multiple Sequence Alignment and its various heuristics such as progressive alignment, whose run time grows with the square of the number and the length of the aligned sequences and requires significant computational resources. In this work, we present a method to efficiently discover regions of high similarity across multiple sequences without performing expensive sequence alignment. The method is based on approximating edit distance between segments of sequences using p-mer frequency counts. Then, efficient high-throughput data stream clustering is used to group highly similar segments into so called quasi-alignments. Quasi-alignments have numerous applications such as identifying species and their taxonomic class from sequences, comparing sequences for similarities, and, as in this paper, discovering conserved regions across related sequences. In this paper, we show that quasi-alignments can be used to discover highly similar segments across multiple sequences from related or different genomes efficiently and accurately. Experiments on a large number of unaligned 16S rRNA sequences obtained from the Greengenes database show that the method is able to identify conserved regions which agree with known hypervariable regions in 16S rRNA. Furthermore, the experiments show that the proposed method scales well for large data sets with a run time that grows only linearly with the number and length of sequences, whereas for existing multiple sequence alignment heuristics the run time grows super-linearly. Quasi-alignment-based algorithms can detect highly similar regions and conserved areas across multiple sequences. Since the run time is linear and the sequences are converted into a compact clustering model, we are able to identify conserved regions fast or even interactively using a standard PC. Our method has many potential applications such as finding characteristic signature sequences for families of organisms and studying conserved and variable regions in, for example, 16S rRNA.
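
    The idea of comparing segments by p-mer frequency counts instead of aligning them can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the choice p = 3, and the use of Manhattan distance between count vectors are assumptions, and the stream-clustering step that groups segments into quasi-alignments is omitted.

```python
from collections import Counter


def pmer_counts(segment: str, p: int = 3) -> Counter:
    """Count every overlapping p-mer (substring of length p) in a segment."""
    return Counter(segment[i:i + p] for i in range(len(segment) - p + 1))


def pmer_distance(a: str, b: str, p: int = 3) -> int:
    """Manhattan distance between the p-mer count vectors of two segments.

    Used here as a cheap stand-in for edit distance (an assumed choice).
    """
    counts_a, counts_b = pmer_counts(a, p), pmer_counts(b, p)
    # Counter returns 0 for missing keys, so the union of keys is safe.
    return sum(abs(counts_a[k] - counts_b[k])
               for k in counts_a.keys() | counts_b.keys())


print(pmer_distance("ACGTACGT", "ACGTACGT"))  # prints 0
print(pmer_distance("ACGTACGT", "ACGTACGA"))  # prints 2
```

    Counting p-mers takes time linear in the segment length, which is consistent with the linear scaling reported in the abstract; segments with a small distance would then be candidates for the same cluster.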

  18. Self-similarity in nature

    NASA Astrophysics Data System (ADS)

    Timashev, S. F.

    2000-02-01

    A general phenomenological approach to the analysis of experimental temporal, spatial and energetic series for extracting truly physical non-model parameters ("passport data") is presented, which may be used to characterize and distinguish the evolution as well as the spatial and energetic structure of any open nonlinear dissipative system. This methodology is based on a postulate concerning the crucial information contained in the sequences of non-regularities of the measured dynamic variable (temporal, spatial, energetic). In accordance with this approach, multi-parametric formulas for dynamic variable power spectra as well as for structural functions of different orders are identical for every spatial-temporal-energetic level of the system under consideration. In effect, this entails the introduction of a new kind of self-similarity in Nature. An algorithm has been developed for obtaining as many "passport data" as are necessary for the characterization of a dynamic system. Applications of this approach in the analysis of various experimental series (temporal, spatial, energetic) demonstrate its potential for defining adequate phenomenological parameters of different dynamic processes and structures.

  19. Temporal dynamics of contingency extraction from tonal and verbal auditory sequences.

    PubMed

    Bendixen, Alexandra; Schwartze, Michael; Kotz, Sonja A

    2015-09-01

    Consecutive sound events are often to some degree predictive of each other. Here we investigated the brain's capacity to detect contingencies between consecutive sounds by means of electroencephalography (EEG) during passive listening. Contingencies were embedded either within tonal or verbal stimuli. Contingency extraction was measured indirectly via the elicitation of the mismatch negativity (MMN) component of the event-related potential (ERP) by contingency violations. MMN results indicate that structurally identical forms of predictability can be extracted from both tonal and verbal stimuli. We also found similar generators to underlie the processing of contingency violations across stimulus types, as well as similar performance in an active-listening follow-up test. However, the process of passive contingency extraction was considerably slower (twice as many rule exemplars were needed) for verbal than for tonal stimuli. These results suggest caution in transferring findings on complex predictive regularity processing obtained with tonal stimuli directly to the speech domain. Copyright © 2014 Elsevier Inc. All rights reserved.

  20. SIMAP--a comprehensive database of pre-calculated protein sequence similarities, domains, annotations and clusters.

    PubMed

    Rattei, Thomas; Tischler, Patrick; Götz, Stefan; Jehl, Marc-André; Hoser, Jonathan; Arnold, Roland; Conesa, Ana; Mewes, Hans-Werner

    2010-01-01

    The prediction of protein function as well as the reconstruction of evolutionary genesis employing sequence comparison at large is still the most powerful tool in sequence analysis. Due to the exponential growth of the number of known protein sequences and the subsequent quadratic growth of the similarity matrix, the computation of the Similarity Matrix of Proteins (SIMAP) becomes a computationally intensive task. The SIMAP database provides a comprehensive and up-to-date pre-calculation of the protein sequence similarity matrix, sequence-based features and sequence clusters. As of September 2009, SIMAP covers 48 million proteins and more than 23 million non-redundant sequences. Novel features of SIMAP include the expansion of the sequence space by including databases such as ENSEMBL as well as the integration of metagenomes based on their consistent processing and annotation. Furthermore, protein function predictions by Blast2GO are pre-calculated for all sequences in SIMAP and the data access and query functions have been improved. SIMAP assists biologists to query the up-to-date sequence space systematically and facilitates large-scale downstream projects in computational biology. Access to SIMAP is freely provided through the web portal for individuals (http://mips.gsf.de/simap/) and for programmatic access through DAS (http://webclu.bio.wzw.tum.de/das/) and Web-Service (http://mips.gsf.de/webservices/services/SimapService2.0?wsdl).

  1. SCPRED: accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences.

    PubMed

    Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

    2008-05-01

    Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods.

  2. SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences

    PubMed Central

    Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

    2008-01-01

    Background: Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results: SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. Conclusion: The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods. PMID:18452616

  3. Computational Identification Of CDR3 Sequence Archetypes Among Immunoglobulin Sequences in Chronic Lymphocytic Leukemia

    PubMed Central

    Messmer, Bradley T; Raphael, Benjamin J; Aerni, Sarah J; Widhopf, George F; Rassenti, Laura Z; Gribben, John G; Kay, Neil E; Kipps, Thomas J

    2009-01-01

    The leukemia cells of unrelated patients with chronic lymphocytic leukemia (CLL) display a restricted repertoire of immunoglobulin (Ig) gene rearrangements with preferential usage of certain Ig gene segments. We developed a computational method to rigorously quantify biases in Ig sequence similarity in large patient databases and to identify groups of patients with unusual levels of sequence similarity. We applied our method to sequences from 1577 CLL patients through the CLL Research Consortium (CRC), and identified 67 similarity groups into which roughly 20% of all patients could be assigned. Immunoglobulin light chain class was highly correlated within all groups and light chain gene usage was similar within sets. Surprisingly, over 40% of the identified groups were composed of somatically mutated genes. This study significantly expands the evidence that antigen selection shapes the Ig repertoire in CLL. PMID:18640719

  4. Computational identification of CDR3 sequence archetypes among immunoglobulin sequences in chronic lymphocytic leukemia.

    PubMed

    Messmer, Bradley T; Raphael, Benjamin J; Aerni, Sarah J; Widhopf, George F; Rassenti, Laura Z; Gribben, John G; Kay, Neil E; Kipps, Thomas J

    2009-03-01

    The leukemia cells of unrelated patients with chronic lymphocytic leukemia (CLL) display a restricted repertoire of immunoglobulin (Ig) gene rearrangements with preferential usage of certain Ig gene segments. We developed a computational method to rigorously quantify biases in Ig sequence similarity in large patient databases and to identify groups of patients with unusual levels of sequence similarity. We applied our method to sequences from 1577 CLL patients through the CLL Research Consortium (CRC), and identified 67 similarity groups into which roughly 20% of all patients could be assigned. Immunoglobulin light chain class was highly correlated within all groups and light chain gene usage was similar within sets. Surprisingly, over 40% of the identified groups were composed of somatically mutated genes. This study significantly expands the evidence that antigen selection shapes the Ig repertoire in CLL.

  5. Community detection in sequence similarity networks based on attribute clustering

    DOE PAGES

    Chowdhary, Janamejaya; Loeffler, Frank E.; Smith, Jeremy C.

    2017-07-24

    Networks are powerful tools for the presentation and analysis of interactions in multi-component systems. A commonly studied mesoscopic feature of networks is their community structure, which arises from grouping together similar nodes into one community and dissimilar nodes into separate communities. Here, the community structure of protein sequence similarity networks is determined with a new method: Attribute Clustering Dependent Communities (ACDC). Sequence similarity has hitherto typically been quantified by the alignment score or its expectation value. However, pair alignments with the same score or expectation value cannot thus be differentiated. To overcome this deficiency, the method constructs, for pair alignments, an extended alignment metric, the link attribute vector, which includes the score and other alignment characteristics. Rescaling components of the attribute vectors qualitatively identifies a systematic variation of sequence similarity within protein superfamilies. The problem of community detection is then mapped to clustering the link attribute vectors, selection of an optimal subset of links and community structure refinement based on the partition density of the network. ACDC-predicted communities are found to be in good agreement with gold standard sequence databases for which the "ground truth" community structures (or families) are known. ACDC is therefore a community detection method for sequence similarity networks based entirely on pair similarity information. A serial implementation of ACDC is available from https://cmb.ornl.gov/resources/developments

  6. An approach to large scale identification of non-obvious structural similarities between proteins

    PubMed Central

    Cherkasov, Artem; Jones, Steven JM

    2004-01-01

    Background: A new sequence independent bioinformatics approach allowing genome-wide search for proteins with similar three dimensional structures has been developed. By utilizing the numerical output of the sequence threading it establishes putative non-obvious structural similarities between proteins. When applied to the testing set of proteins with known three dimensional structures the developed approach was able to recognize structurally similar proteins with high accuracy. Results: The method has been developed to identify pathogenic proteins with low sequence identity and high structural similarity to host analogues. Such protein structure relationships would be hypothesized to arise through convergent evolution or through ancient horizontal gene transfer events, now undetectable using current sequence alignment techniques. The pathogen proteins, which could mimic or interfere with host activities, would represent candidate virulence factors. The developed approach utilizes the numerical outputs from the sequence-structure threading. It identifies the potential structural similarity between a pair of proteins by correlating the threading scores of the corresponding two primary sequences against the library of the standard folds. This approach allowed up to 64% sensitivity and 99.9% specificity in distinguishing protein pairs with high structural similarity. Conclusion: Preliminary results obtained by comparison of the genomes of Homo sapiens and several strains of Chlamydia trachomatis have demonstrated the potential usefulness of the method in the identification of bacterial proteins with known or potential roles in virulence. PMID:15147578
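
    The core step described above, correlating the threading scores of two sequences against a common fold library, can be sketched as below. The abstract does not state which correlation measure was used; Pearson correlation is an assumption here, and the function name `pearson` and the score vectors are hypothetical values invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / sqrt(var_x * var_y)


# Hypothetical threading scores of two proteins against the same
# five library folds (toy numbers, not from the paper).
scores_a = [12.1, 3.4, 8.8, 1.0, 5.5]
scores_b = [11.7, 2.9, 9.3, 1.4, 6.0]
print(round(pearson(scores_a, scores_b), 3))
```

    A pair whose score profiles rise and fall together across the fold library would be flagged as potentially structurally similar, even when direct sequence identity between the two proteins is low.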

  7. Community detection in sequence similarity networks based on attribute clustering

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chowdhary, Janamejaya; Loeffler, Frank E.; Smith, Jeremy C.

    Networks are powerful tools for the presentation and analysis of interactions in multi-component systems. A commonly studied mesoscopic feature of networks is their community structure, which arises from grouping together similar nodes into one community and dissimilar nodes into separate communities. Here, the community structure of protein sequence similarity networks is determined with a new method: Attribute Clustering Dependent Communities (ACDC). Sequence similarity has hitherto typically been quantified by the alignment score or its expectation value. However, pair alignments with the same score or expectation value cannot thus be differentiated. To overcome this deficiency, the method constructs, for pair alignments, an extended alignment metric, the link attribute vector, which includes the score and other alignment characteristics. Rescaling components of the attribute vectors qualitatively identifies a systematic variation of sequence similarity within protein superfamilies. The problem of community detection is then mapped to clustering the link attribute vectors, selection of an optimal subset of links and community structure refinement based on the partition density of the network. ACDC-predicted communities are found to be in good agreement with gold standard sequence databases for which the "ground truth" community structures (or families) are known. ACDC is therefore a community detection method for sequence similarity networks based entirely on pair similarity information. A serial implementation of ACDC is available from https://cmb.ornl.gov/resources/developments

  8. Isolation of laccase gene-specific sequences from white rot and brown rot fungi by PCR.

    PubMed Central

    D'Souza, T M; Boominathan, K; Reddy, C A

    1996-01-01

    Degenerate primers corresponding to the consensus sequences of the copper-binding regions in the N-terminal domains of known basidiomycete laccases were used to isolate laccase gene-specific sequences from strains representing nine genera of wood rot fungi. All except three gave the expected PCR product of about 200 bp. Computer searches of the databases identified the sequence of each of the PCR products analyzed as a laccase gene sequence, suggesting the specificity of the primers. PCR products of the white rot fungi Ganoderma lucidum, Phlebia brevispora, and Trametes versicolor showed 65 to 74% nucleotide sequence similarity to each other; the similarity in deduced amino acid sequences was 83 to 91%. The PCR products of Lentinula edodes and Lentinus tigrinus, on the other hand, showed relatively low nucleotide and amino acid similarities (58 to 64 and 62 to 81%, respectively); however, these similarities were still much higher than when compared with the corresponding regions in the laccases of the ascomycete fungi Aspergillus nidulans and Neurospora crassa. A few of the white rot fungi, as well as Gloeophyllum trabeum, a brown rot fungus, gave a 144-bp PCR fragment which had a nucleotide sequence similarity of 60 to 71%. Demonstration of laccase activity in G. trabeum and several other brown rot fungi was of particular interest because these organisms were not previously shown to produce laccases. PMID:8837429

  9. Fold independent structural comparisons of protein-ligand binding sites for exploring functional relationships.

    PubMed

    Gold, Nicola D; Jackson, Richard M

    2006-02-03

    The rapid growth in protein structural data and the emergence of structural genomics projects have increased the need for automatic structure analysis and tools for function prediction. Small molecule recognition is critical to the function of many proteins; therefore, determination of ligand binding site similarity is important for understanding ligand interactions and may allow their functional classification. Here, we present a binding sites database (SitesBase) that, given a known protein-ligand binding site, allows rapid retrieval of other binding sites with similar structure independent of overall sequence or fold similarity. However, each match is also annotated with sequence similarity and fold information to aid interpretation of structure and functional similarity. Similarity in ligand binding sites can indicate common binding modes and recognition of similar molecules, allowing potential inference of function for an uncharacterised protein or providing additional evidence of common function where sequence or fold similarity is already known. Alternatively, the resource can provide valuable information for detailed studies of molecular recognition including structure-based ligand design and in understanding ligand cross-reactivity. Here, we show examples of atomic similarity between superfamily or more distant fold relatives as well as between seemingly unrelated proteins. Assignment of unclassified proteins to structural superfamilies is also undertaken and in most cases substantiates assignments made using sequence similarity. Correct assignment is also possible where sequence similarity fails to find significant matches, illustrating the potential use of binding site comparisons for newly determined proteins.

  10. [WN] central stars of planetary nebulae

    NASA Astrophysics Data System (ADS)

    Todt, H.; Miszalski, B.; Toalá, J. A.; Guerrero, M. A.

    2017-10-01

    While most of the low-mass stars stay hydrogen-rich on their surface throughout their evolution, a considerable fraction of white dwarfs as well as central stars of planetary nebulae have a hydrogen-deficient surface composition. The majority of these H-deficient central stars exhibit spectra very similar to massive Wolf-Rayet stars of the carbon sequence, i.e. with broad emission lines of carbon, helium, and oxygen. In analogy to the massive Wolf-Rayet stars, they are classified as [WC] stars. Their formation, which is relatively well understood, is thought to be the result of a (very) late thermal pulse of the helium burning shell. It is therefore surprising that some H-deficient central stars which have been found recently, e.g. IC 4663 and Abell 48, exhibit spectra that resemble those of the massive Wolf-Rayet stars of the nitrogen sequence, i.e. with strong emission lines of nitrogen instead of carbon. This new type of central stars is therefore labelled [WN]. We present spectral analyses of these objects and discuss the status of further candidates as well as the evolutionary status and origin of the [WN] stars.

  11. Improved localisation for 2-hydroxyglutarate detection at 3T using long-TE semi-LASER.

    PubMed

    Berrington, Adam; Voets, Natalie L; Plaha, Puneet; Larkin, Sarah J; Mccullagh, James; Stacey, Richard; Yildirim, Muhammed; Schofield, Christopher J; Jezzard, Peter; Cadoux-Hudson, Tom; Ansorge, Olaf; Emir, Uzay E

    2016-06-01

    2-hydroxyglutarate (2-HG) has emerged as a biomarker of tumour cell IDH mutations that may enable the differential diagnosis of glioma patients. At 3 Tesla, detection of 2-HG with magnetic resonance spectroscopy is challenging because of metabolite signal overlap and a spectral pattern modulated by slice selection and chemical shift displacement. Using density matrix simulations and phantom experiments, an optimised semi-LASER scheme (TE = 110 ms) improves localisation of the 2-HG spin system considerably compared to an existing PRESS sequence. This results in a visible 2-HG peak in the in vivo spectra at 1.9 ppm in the majority of IDH mutated tumours. Detected concentrations of 2-HG were similar using both sequences, although the use of semi-LASER generated narrower confidence intervals. Signal overlap with glutamate and glutamine, as measured by pairwise fitting correlation, was reduced. Lactate was readily detectable across glioma patients using the method presented here (mean CRLB: (10±2)%). Together with more robust 2-HG detection, long TE semi-LASER offers the potential to investigate tumour metabolism and stratify patients in vivo at 3T.

  12. Molecular cloning of trypsin cDNAs and trypsin gene expression in the salmon louse Lepeophtheirus salmonis (Copepoda: Caligidae).

    PubMed

    Johnson, S C; Ewart, K V; Osborne, J A; Delage, D; Ross, N W; Murray, H M

    2002-09-01

    The salmon louse, Lepeophtheirus salmonis, is a marine ectoparasitic copepod that infects salmonid fishes. We are studying the interactions between this parasite and its salmonid hosts, as it is a common cause of disease in both wild and farmed stocks of salmon. In this paper, we report on the cloning and sequencing of seven trypsin-like enzymes from a cDNA library prepared from whole body preadult female and male L. salmonis. The predicted trypsin activation peptides are 23 or 24 residues in length, considerably longer than previously reported activation peptides of other animals. Differences in the putative signal and activation peptide sequences of the trypsin isoforms suggest that these forms differ in their regulation and function. The calculated molecular weights of the trypsins range from 23.6 to 23.7 kDa. There are eight cysteine residues, which suggest the presence of four disulfide bridges. These trypsins are very similar (≥46% aa identity) to other crustacean trypsins and insect hypodermins. Using in situ hybridization techniques, trypsinogen expression could be identified in all three cell types of the midgut.

  13. Feasibility and effectiveness of a brief, intensive phylogenetics workshop in a middle-income country.

    PubMed

    Pollett, S; Leguia, M; Nelson, M I; Maljkovic Berry, I; Rutherford, G; Bausch, D G; Kasper, M; Jarman, R; Melendrez, M

    2016-01-01

    There is an increasing role for bioinformatic and phylogenetic analysis in tropical medicine research. However, scientists working in low- and middle-income regions may lack access to training opportunities in these methods. To help address this gap, a 5-day intensive bioinformatics workshop was offered in Lima, Peru. The syllabus is presented here for others who want to develop similar programs. To assess knowledge gained, a 20-point knowledge questionnaire was administered to participants (21 participants) before and after the workshop, covering topics on sequence quality control, alignment/formatting, database retrieval, models of evolution, sequence statistics, tree building, and results interpretation. Evolution/tree-building methods represented the lowest scoring domain at baseline and after the workshop. There was a considerable median gain in total knowledge scores (increase of 30%, p<0.001) with gains as high as 55%. A 5-day workshop model was effective in improving the pathogen-applied bioinformatics knowledge of scientists working in a middle-income country setting. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.

  14. Analysis of dispatching rules in a stochastic dynamic job shop manufacturing system with sequence-dependent setup times

    NASA Astrophysics Data System (ADS)

    Sharma, Pankaj; Jain, Ajai

    2014-12-01

    Stochastic dynamic job shop scheduling problems with sequence-dependent setup times are among the most difficult classes of scheduling problems. This paper assesses the performance of nine dispatching rules in such a shop with respect to the following performance measures: makespan, mean flow time, maximum flow time, mean tardiness, maximum tardiness, number of tardy jobs, total setups and mean setup time. A discrete event simulation model of a stochastic dynamic job shop manufacturing system is developed for this investigation. Nine dispatching rules identified from the literature are incorporated in the simulation model. The simulation experiments are conducted under a due date tightness factor of 3, a shop utilization of 90% and setup times less than processing times. Results indicate that the shortest setup time (SIMSET) rule provides the best performance for the mean flow time and number of tardy jobs measures. The job with similar setup and modified earliest due date (JMEDD) rule provides the best performance for the makespan, maximum flow time, mean tardiness, maximum tardiness, total setups and mean setup time measures.

  15. Efficient alignment-free DNA barcode analytics

    PubMed Central

    Kuksa, Pavel; Pavlovic, Vladimir

    2009-01-01

    Background In this work we consider barcode DNA analysis problems and address them using alternative, alignment-free methods and representations which model sequences as collections of short sequence fragments (features). The methods use fixed-length representations (spectrum) for barcode sequences to measure similarities or dissimilarities between sequences coming from the same or different species. The spectrum-based representation not only allows for accurate and computationally efficient species classification, but also opens possibility for accurate clustering analysis of putative species barcodes and identification of critical within-barcode loci distinguishing barcodes of different sample groups. Results New alignment-free methods provide highly accurate and fast DNA barcode-based identification and classification of species with substantial improvements in accuracy and speed over state-of-the-art barcode analysis methods. We evaluate our methods on problems of species classification and identification using barcodes, important and relevant analytical tasks in many practical applications (adverse species movement monitoring, sampling surveys for unknown or pathogenic species identification, biodiversity assessment, etc.) On several benchmark barcode datasets, including ACG, Astraptes, Hesperiidae, Fish larvae, and Birds of North America, proposed alignment-free methods considerably improve prediction accuracy compared to prior results. We also observe significant running time improvements over the state-of-the-art methods. Conclusion Our results show that newly developed alignment-free methods for DNA barcoding can efficiently and with high accuracy identify specimens by examining only few barcode features, resulting in increased scalability and interpretability of current computational approaches to barcoding. PMID:19900305
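
    The fixed-length spectrum representation described in this record can be illustrated with a minimal k-mer count sketch. The choice of k = 3, the cosine similarity measure and the function names `kmer_spectrum` and `cosine_similarity` are assumptions made for illustration; the abstract does not specify the authors' actual features or kernel.

```python
import math
from collections import Counter


def kmer_spectrum(seq, k=3):
    """Count every overlapping length-k fragment of seq (its k-mer spectrum)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(spec_a, spec_b):
    """Cosine similarity between two k-mer spectra; no alignment is needed."""
    dot = sum(count * spec_b[kmer] for kmer, count in spec_a.items())
    norm_a = math.sqrt(sum(c * c for c in spec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a sequence shorter than k has an empty spectrum
    return dot / (norm_a * norm_b)


# Toy sequences, not real barcodes.
barcode_1 = "ACGTACGTTAGC"
barcode_2 = "ACGTACGATAGC"
print(cosine_similarity(kmer_spectrum(barcode_1), kmer_spectrum(barcode_2)))
```

    Because each sequence is reduced to a vector of counts, comparison cost depends on the number of distinct k-mers rather than on an alignment, which is consistent with the speed advantage the abstract reports.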

  16. High levels of variation in Salix lignocellulose genes revealed using poplar genomic resources

    PubMed Central

    2013-01-01

    Background Little is known about the levels of variation in lignin or other wood related genes in Salix, a genus that is being increasingly used for biomass and biofuel production. The lignin biosynthesis pathway is well characterized in a number of species, including the model tree Populus. We aimed to transfer the genomic resources already available in Populus to its sister genus Salix to assess levels of variation within genes involved in wood formation. Results Amplification trials for 27 gene regions were undertaken in 40 Salix taxa. Twelve of these regions were sequenced. Alignment searches of the resulting sequences against reference databases, combined with phylogenetic analyses, showed the close similarity of these Salix sequences to Populus, confirming homology of the primer regions and indicating a high level of conservation within the wood formation genes. However, all sequences were found to vary considerably among Salix species, mainly as SNPs with a smaller number of insertions-deletions. Between 25 and 176 SNPs per kbp per gene region (in predicted exons) were discovered within Salix. Conclusions The variation found is sizeable but not unexpected as it is based on interspecific and not intraspecific comparison; it is comparable to interspecific variation in Populus. The characterisation of genetic variation is a key process in pre-breeding and for the conservation and exploitation of genetic resources in Salix. This study characterises the variation in several lignocellulose gene markers for such purposes. PMID:23924375

  17. Local alignment of two-base encoded DNA sequence

    PubMed Central

    Homer, Nils; Merriman, Barry; Nelson, Stanley F

    2009-01-01

    Background DNA sequence comparison is based on optimal local alignment of two sequences using a similarity score. However, some new DNA sequencing technologies do not directly measure the base sequence, but rather an encoded form, such as the two-base encoding considered here. In order to compare such data to a reference sequence, the data must be decoded into sequence. The decoding is deterministic, but the possibility of measurement errors requires searching among all possible error modes and resulting alignments to achieve an optimal balance of fewer errors versus greater sequence similarity. Results We present an extension of the standard dynamic programming method for local alignment, which simultaneously decodes the data and performs the alignment, maximizing a similarity score based on a weighted combination of errors and edits, and allowing an affine gap penalty. We also present simulations that demonstrate the performance characteristics of our two base encoded alignment method and contrast those with standard DNA sequence alignment under the same conditions. Conclusion The new local alignment algorithm for two-base encoded data has substantial power to properly detect and correct measurement errors while identifying underlying sequence variants, and facilitating genome re-sequencing efforts based on this form of sequence data. PMID:19508732
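
    To see why decoding and alignment are best handled together, it helps to look at how a two-base code behaves under a single measurement error. The sketch below assumes the common XOR-style dinucleotide colour code (A=0, C=1, G=2, T=3, colour = XOR of adjacent bases) and a known first base; the abstract does not spell out the code table, and the names `encode` and `decode` are hypothetical.

```python
BASES = "ACGT"
CODE = {base: index for index, base in enumerate(BASES)}


def encode(seq):
    """Return the first base and one colour (0-3) per adjacent base pair."""
    colors = [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colors


def decode(first_base, colors):
    """Deterministically rebuild the base sequence from the colours."""
    bases = [first_base]
    for color in colors:
        bases.append(BASES[CODE[bases[-1]] ^ color])
    return "".join(bases)


first, colors = encode("ACGTT")
print(first, colors)            # colours for AC, CG, GT, TT
print(decode(first, colors))    # round-trips to the original sequence

# A single colour misread changes every base decoded after it.
corrupted = list(colors)
corrupted[1] = 2
print(decode(first, corrupted))
```

    Since one colour error propagates to all downstream bases when decoding naively, an aligner that considers possible error positions while aligning, as the record describes, avoids discarding otherwise good reads.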

  18. Exploring Dance Movement Data Using Sequence Alignment Methods

    PubMed Central

    Chavoshi, Seyed Hossein; De Baets, Bernard; Neutens, Tijs; De Tré, Guy; Van de Weghe, Nico

    2015-01-01

    Despite the abundance of research on knowledge discovery from moving object databases, only a limited number of studies have examined the interaction between moving point objects in space over time. This paper describes a novel approach for measuring similarity in the interaction between moving objects. The proposed approach consists of three steps. First, we transform movement data into sequences of successive qualitative relations based on the Qualitative Trajectory Calculus (QTC). Second, sequence alignment methods are applied to measure the similarity between movement sequences. Finally, movement sequences are grouped based on similarity by means of an agglomerative hierarchical clustering method. The applicability of this approach is tested using movement data from samba and tango dancers. PMID:26181435

  19. The limits of protein sequence comparison?

    PubMed Central

    Pearson, William R; Sierk, Michael L

    2010-01-01

    Modern sequence alignment algorithms are used routinely to identify homologous proteins, proteins that share a common ancestor. Homologous proteins always share similar structures and often have similar functions. Over the past 20 years, sequence comparison has become both more sensitive, largely because of profile-based methods, and more reliable, because of more accurate statistical estimates. As sequence and structure databases become larger, and comparison methods become more powerful, reliable statistical estimates will become even more important for distinguishing similarities that are due to homology from those that are due to analogy (convergence). The newest sequence alignment methods are more sensitive than older methods, but more accurate statistical estimates are needed for their full power to be realized. PMID:15919194

  20. Complete Genome Sequences of 38 Gordonia sp. Bacteriophages

    PubMed Central

    Montgomery, Matthew T.; Bonilla, J. Alfred; Dejong, Randall; Garlena, Rebecca A.; Guerrero Bustamante, Carlos; Klyczek, Karen K.; Russell, Daniel A.; Wertz, John T.; Jacobs-Sera, Deborah; Hatfull, Graham F.

    2017-01-01

    We report here the genome sequences of 38 newly isolated bacteriophages using Gordonia terrae 3612 (ATCC 25594) and Gordonia neofelifaecis NRRL59395 as bacterial hosts. All of the phages are double-stranded DNA (dsDNA) tail phages with siphoviral morphologies, with genome sizes ranging from 17,118 bp to 93,843 bp and spanning considerable nucleotide sequence diversity. PMID:28057748

  1. An early illness recognition framework using a temporal Smith Waterman algorithm and NLP.

    PubMed

    Hajihashemi, Zahra; Popescu, Mihail

    2013-01-01

    In this paper we propose a framework for detecting health patterns based on non-wearable sensor sequence similarity and natural language processing (NLP). In TigerPlace, an aging-in-place facility in Columbia, MO, we deployed 47 sensor networks together with a nursing electronic health record (EHR) system to provide early illness recognition. The proposed framework utilizes sensor sequence similarity and NLP on EHR nursing comments to automatically notify the physician when health problems are detected. The reported methodology is inspired by genomic sequence annotation using similarity algorithms such as Smith-Waterman (SW). Similarly, for each sensor sequence, we associate health concepts extracted from the nursing notes using Metamap, an NLP tool provided by the Unified Medical Language System (UMLS). Since sensor sequences, unlike genomic ones, have an associated time dimension, we propose a temporal variant of SW (TSW) to account for time. The main challenges presented by our framework are finding the most suitable time sequence similarity and aggregation of the retrieved UMLS concepts. On a pilot dataset from three TigerPlace residents, with a total of 1685 sensor days and 626 nursing records, we obtained an average precision of 0.64 and a recall of 0.37.
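
    For orientation, the standard (non-temporal) Smith-Waterman local alignment score that this record builds on can be sketched as below. The scoring values, the linear gap penalty and the function name `smith_waterman_score` are assumptions for illustration; the temporal variant (TSW) proposed by the authors is not reproduced here because the abstract does not define it.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between two sequences (linear gap penalty)."""
    cols = len(b) + 1
    prev = [0] * cols  # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets a local alignment restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Works on any sequences, e.g. lists of hypothetical sensor event labels.
day_1 = ["bed", "bath", "kitchen", "living", "bed"]
day_2 = ["living", "bed", "bath", "kitchen", "door"]
print(smith_waterman_score(day_1, day_2))
```

    Only two rows of the table are kept, so memory grows with the shorter dimension rather than with the product of both lengths; recovering the aligned region itself would require the full table and a traceback.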

  2. Clustering and visualizing similarity networks of membrane proteins.

    PubMed

    Hu, Geng-Ming; Mai, Te-Lun; Chen, Chi-Ming

    2015-08-01

    We proposed a fast and unsupervised clustering method, minimum span clustering (MSC), for analyzing the sequence-structure-function relationship of biological networks, and demonstrated its validity in clustering the sequence/structure similarity networks (SSN) of 682 membrane protein (MP) chains. The MSC clustering of MPs based on their sequence information was found to be consistent with their tertiary structures and functions. For the largest seven clusters predicted by MSC, the consistency in chain function within the same cluster is found to be 100%. From analyzing the edge distribution of SSN for MPs, we found a characteristic threshold distance for the boundary between clusters, over which SSN of MPs could be properly clustered by an unsupervised sparsification of the network distance matrix. The clustering results of MPs from both MSC and the unsupervised sparsification methods are consistent with each other, and have high intracluster similarity and low intercluster similarity in sequence, structure, and function. Our study showed a strong sequence-structure-function relationship of MPs. We discussed evidence of convergent evolution of MPs and suggested applications in finding structural similarities and predicting biological functions of MP chains based on their sequence information. © 2015 Wiley Periodicals, Inc.

  3. Predicted secondary structure similarity in the absence of primary amino acid sequence homology: hepatitis B virus open reading frames.

    PubMed Central

    Schaeffer, E; Sninsky, J J

    1984-01-01

    Proteins that are related evolutionarily may have diverged at the level of primary amino acid sequence while maintaining similar secondary structures. Computer analysis has been used to compare the open reading frames of the hepatitis B virus to those of the woodchuck hepatitis virus at the level of amino acid sequence, and to predict the relative hydrophilic character and the secondary structure of putative polypeptides. Similarity is seen at the levels of relative hydrophilicity and secondary structure, in the absence of sequence homology. These data reinforce the proposal that these open reading frames encode viral proteins. Computer analysis of this type can be more generally used to establish structural similarities between proteins that do not share obvious sequence homology as well as to assess whether an open reading frame is fortuitous or codes for a protein. PMID:6585835

  4. Economic importance, taxonomic representation and scientific priority as drivers of genome sequencing projects.

    PubMed

    Vallée, Geneviève C; Muñoz, Daniella Santos; Sankoff, David

    2016-11-11

    Of the approximately two hundred sequenced plant genomes, how many and which ones were sequenced motivated by strictly or largely scientific considerations, and how many by chiefly economic, in a wide sense, incentives? And how large a role does publication opportunity play? In an integration of multiple disparate databases and other sources of information, we collect and analyze data on the size (number of species) in the plant orders and families containing sequenced genomes, on the trade value of these species, and of all the same-family or same-order species, and on the publication priority within the family and order. These data are subjected to multiple regression and other statistical analyses. We find that despite the initial importance of model organisms, it is clearly economic considerations that outweigh others in the choice of genome to be sequenced. This has important implications for generalizations about plant genomes, since human choices of plants to harvest (and cultivate) will have incurred many biases with respect to phenotypic characteristics and hence of genomic properties, and recent genomic evolution will also have been affected by human agricultural practices.

  5. Enrichment of target sequences for next-generation sequencing applications in research and diagnostics.

    PubMed

    Altmüller, Janine; Budde, Birgit S; Nürnberg, Peter

    2014-02-01

    Targeted re-sequencing such as gene panel sequencing (GPS) has become very popular in medical genetics, both for research projects and in diagnostic settings. The technical principles of the different enrichment methods have been reviewed several times before; however, new enrichment products are constantly entering the market, and researchers are often puzzled about the requirement to take decisions about long-term commitments, both for the enrichment product and the sequencing technology. This review summarizes important considerations for the experimental design and provides helpful recommendations in choosing the best sequencing strategy for various research projects and diagnostic applications.

  6. Isolation of laccase gene-specific sequences from white rot and brown rot fungi by PCR

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    D'Souza, T.M.; Boominathan, K.; Reddy, C.A.

    1996-10-01

    Degenerate primers corresponding to the consensus sequences of the copper-binding regions in the N-terminal domains of known basidiomycete laccases were used to isolate laccase gene-specific sequences from strains representing nine genera of wood rot fungi. All except three gave the expected PCR product of about 200 bp. Computer searches of the databases identified the sequence of each of the PCR products analyzed as a laccase gene sequence, suggesting the specificity of the primers. PCR products of the white rot fungi Ganoderma lucidum, Phlebia brevispora, and Trametes versicolor showed 65 to 74% nucleotide sequence similarity to each other; the similarity in deduced amino acid sequences was 83 to 91%. The PCR products of Lentinula edodes and Lentinus tigrinus, on the other hand, showed relatively low nucleotide and amino acid similarities (58 to 64 and 62 to 81%, respectively); however, these similarities were still much higher than when compared with the corresponding regions in the laccases of the ascomycete fungi Aspergillus nidulans and Neurospora crassa. A few of the white rot fungi, as well as Gloeophyllum trabeum, a brown rot fungus, gave a 144-bp PCR fragment which had a nucleotide sequence similarity of 60 to 71%. Demonstration of laccase activity in G. trabeum and several other brown rot fungi was of particular interest because these organisms were not previously shown to produce laccases. 36 refs., 6 figs., 2 tabs.

  7. Characterization of the Campylobacter jejuni cryptic plasmid pTIW94 recovered from wild birds in the southeastern United States.

    PubMed

    Hiett, Kelli L; Rothrock, Michael J; Seal, Bruce S

    2013-09-01

    The complete nucleotide sequence was determined for a cryptic plasmid, pTIW94, recovered from several Campylobacter jejuni isolates from wild birds in the southeastern United States. pTIW94 is a circular molecule of 3860 nucleotides, with a G+C content (31.0%) similar to that of many Campylobacter spp. genomes. A typical origin of replication, with iteron sequences, was identified upstream of DNA sequences that demonstrated similarity to replication initiation proteins. A total of five open reading frames (ORFs) were identified; two of the five ORFs demonstrated significant similarity to plasmid pCC2228-2 found within Campylobacter coli. These two ORFs were similar to essential replication proteins RepA (100%; 26/26 aa identity) and RepB (95%; 327/346 aa identity). A third identified ORF demonstrated significant similarity (99%; 421/424 aa identity) to the MOB protein from C. coli 67-8, originally recovered from swine. The other two identified ORFs were either similar to hypothetical proteins from other Campylobacter spp., or exhibited no significant similarity to any DNA or protein sequence in the GenBank database. Promoter regions (-35 and -10 signal sites), ribosomal binding sites upstream of ORFs, and stem-loop structures were also identified within the plasmid. These results demonstrate that pTIW94 represents a previously unreported small cryptic plasmid with unique sequences as well as sequences highly similar to those of other small plasmids found within Campylobacter spp., and that this cryptic plasmid is present among Campylobacter spp. recovered from different genera of wild birds. Copyright © 2013. Published by Elsevier Inc.

  8. Yield and Economic Responses of Peanut to Crop Rotation Sequence

    USDA-ARS's Scientific Manuscript database

    National Peanut Research Laboratory, Dawson, GA 39842. Proper crop rotation is essential to maintaining high peanut yield and quality. However, the economic considerations of maintaining or altering crop rotation sequences must incorporate the commodity prices, production costs, and yield responses...

  9. Molecular analysis of an oyster-related norovirus outbreak.

    PubMed

    Nenonen, Nancy P; Hannoun, Charles; Olsson, Margareta B; Bergström, Tomas

    2009-06-01

    Contaminated raw oysters were implicated in a severe outbreak of norovirus (NoV) gastroenteritis affecting 30 restaurant guests. To define the outbreak source by using molecular methods to characterize NoV strains detected in patient and oyster samples. Molecular epidemiological studies based on nucleotide sequencing and phylogenetic analyses of patient and oyster NoV strains, and comparison to background dataset. NoV genotype (G) I.1 was detected in the one patient stool analyzed by in-house TaqMan real time RT-PCR and classical nested RT-PCR targeting NoV RNA-dependent polymerase (RdRp, 285 nt), and by nested RT-PCR targeting RdRp-capsid-poly(A)-3' (3085 nt). Patient strain showed ≥99% similarity (285 nt) with three NoV strains detected in two of five oysters examined by classical nested RT-PCR (RdRp). A third oyster tested positive for NoV GII.3. Phylogenetic analysis showed clustering of patient and oyster strains related to this outbreak with GI.1 strains from previous local outbreaks, and mussel studies. Sequence data revealed ≥99% similarity (285 nt) between NoV GI.1 strains detected in patient stool and suspect oysters, linking the contaminated oysters to the outbreak. Identification of human NoV GI and GII strains in oysters indicated contamination of human fecal origin, presumably from inappropriate storage in the harbor. Comparative long-fragment analysis of the patient strain revealed 99% similarity (3085 nt) with NoV GI.1 strains detected in previous outbreaks and environmental mussel studies from West Sweden, 87% with M87661 (Norwalk68) and 96% with L23828 (SRSV-KY-89/89/J). These results indicated considerable genomic stability of NoV GI.1 strains over time.

  10. Lactobacillus heilongjiangensis sp. nov., isolated from Chinese pickle.

    PubMed

    Gu, Chun Tao; Li, Chun Yan; Yang, Li Jie; Huo, Gui Cheng

    2013-11-01

    A Gram-stain-positive bacterial strain, S4-3(T), was isolated from traditional pickle in Heilongjiang Province, China. The bacterium was characterized by a polyphasic approach, including 16S rRNA gene sequence analysis, pheS gene sequence analysis, rpoA gene sequence analysis, dnaK gene sequence analysis, fatty acid methyl ester (FAME) analysis, determination of DNA G+C content, DNA-DNA hybridization and an analysis of phenotypic features. Strain S4-3(T) showed 97.9-98.7 % 16S rRNA gene sequence similarities, 84.4-94.1 % pheS gene sequence similarities and 94.4-96.9 % rpoA gene sequence similarities to the type strains of Lactobacillus nantensis, Lactobacillus mindensis, Lactobacillus crustorum, Lactobacillus futsaii, Lactobacillus farciminis and Lactobacillus kimchiensis. dnaK gene sequence similarities between S4-3(T) and Lactobacillus nantensis LMG 23510(T), Lactobacillus mindensis LMG 21932(T), Lactobacillus crustorum LMG 23699(T), Lactobacillus futsaii JCM 17355(T) and Lactobacillus farciminis LMG 9200(T) were 95.4, 91.5, 90.4, 91.7 and 93.1 %, respectively. Based upon the data obtained in the present study, a novel species, Lactobacillus heilongjiangensis sp. nov., is proposed and the type strain is S4-3(T) ( = LMG 26166(T) = NCIMB 14701(T)).

  11. Methodologic European external quality assurance for DNA sequencing: the EQUALseq program.

    PubMed

    Ahmad-Nejad, Parviz; Dorn-Beineke, Alexandra; Pfeiffer, Ulrike; Brade, Joachim; Geilenkeuser, Wolf-Jochen; Ramsden, Simon; Pazzagli, Mario; Neumaier, Michael

    2006-04-01

    DNA sequencing is a key technique in molecular diagnostics, but to date no comprehensive methodologic external quality assessment (EQA) programs have been instituted. Between 2003 and 2005, the European Union funded, as specific support actions, the EQUAL initiative to develop methodologic EQA schemes for genotyping (EQUALqual), quantitative PCR (EQUALquant), and sequencing (EQUALseq). Here we report on the results of the EQUALseq program. The participating laboratories received a 4-sample set comprising 2 DNA plasmids, a PCR product, and a finished sequencing reaction to be analyzed. Data and information from detailed questionnaires were uploaded online and evaluated by use of a scoring system for technical skills and proficiency of data interpretation. Sixty laboratories from 21 European countries registered, and 43 participants (72%) returned data and samples. Capillary electrophoresis was the predominant platform (n = 39; 91%). The median contiguous correct sequence stretch was 527 nucleotides with considerable variation in quality of both primary data and data evaluation. The association between laboratory performance and the number of sequencing assays/year was statistically significant (P <0.05). Interestingly, more than 30% of participants neither added comments to their data nor made efforts to identify the gene sequences or mutational positions. Considerable variations exist even in a highly standardized methodology such as DNA sequencing. Methodologic EQAs are appropriate tools to uncover strengths and weaknesses in both technique and proficiency, and our results emphasize the need for mandatory EQAs. The results of EQUALseq should help improve the overall quality of molecular genetics findings obtained by DNA sequencing.

  12. FRESCO: Referential compression of highly similar sequences.

    PubMed

    Wandelt, Sebastian; Leser, Ulf

    2013-01-01

    In many applications, sets of similar texts or sequences are of high importance. Prominent examples are revision histories of documents or genomic sequences. Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever-increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, storing only the differences between a to-be-compressed input and a known reference sequence, gained a lot of interest in this field. In this paper, we propose a general open-source framework to compress large amounts of biological sequence data called Framework for REferential Sequence COmpression (FRESCO). Our basic compression algorithm is shown to be one to two orders of magnitude faster than comparable related work, while achieving similar compression ratios. We also propose several techniques to further increase compression ratios, while still retaining the advantage in speed: 1) selecting a good reference sequence; and 2) rewriting a reference sequence to allow for better compression. In addition, we propose a new way of further boosting the compression ratios by applying referential compression to already referentially compressed files (second-order compression). This technique allows for compression ratios way beyond state of the art, for instance, 4,000:1 and higher for human genomes. We evaluate our algorithms on a large data set from three different species (more than 1,000 genomes, more than 3 TB) and on a collection of versions of Wikipedia pages. Our results show that real-time compression of highly similar sequences at high compression ratios is possible on modern hardware.
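
    The idea of referential compression described in this abstract (storing only the differences between an input and a known reference) can be illustrated with a toy sketch. This is not FRESCO's algorithm or file format; the function names `ref_compress` and `ref_decompress`, the k-mer index, and the `("copy", start, length)` / `("lit", text)` encoding are all assumptions made for illustration.

```python
def ref_compress(reference: str, target: str, min_match: int = 4):
    """Encode target as copies from the reference plus literal fragments."""
    k = min_match
    # Hypothetical helper structure: index every k-mer of the reference.
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)

    ops, literal, pos = [], [], 0
    while pos < len(target):
        best_start, best_len = -1, 0
        # Try every reference position sharing the next k-mer; keep the longest match.
        for start in index.get(target[pos:pos + k], []):
            length = 0
            while (start + length < len(reference)
                   and pos + length < len(target)
                   and reference[start + length] == target[pos + length]):
                length += 1
            if length > best_len:
                best_start, best_len = start, length
        if best_len >= k:
            if literal:  # flush pending unmatched characters first
                ops.append(("lit", "".join(literal)))
                literal = []
            ops.append(("copy", best_start, best_len))
            pos += best_len
        else:
            literal.append(target[pos])
            pos += 1
    if literal:
        ops.append(("lit", "".join(literal)))
    return ops


def ref_decompress(reference: str, ops):
    """Rebuild the target from the reference and the list of operations."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            parts.append(reference[start:start + length])
        else:
            parts.append(op[1])
    return "".join(parts)


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCTA"
    target = "ACGTACGTTAGCCGTTAGGCTA"  # one substitution relative to the reference
    ops = ref_compress(reference, target)
    print(ops)
    print(ref_decompress(reference, ops) == target)
```

    For highly similar sequences most of the target is covered by a few long copy operations, which is why the choice of reference matters so much for the achievable ratio.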

  13. Yield and Economic Responses of Peanut to Crop Rotation Sequence

    USDA-ARS's Scientific Manuscript database

    Proper crop rotation is essential to maintaining high peanut yield and quality. However, the economic considerations of maintaining or altering crop rotation sequences must incorporate the commodity prices, production costs, and yield responses of all crops in, or potentially in, the crop rotation ...

  14. Association between the genetic similarity of the open reading frame 5 sequence of Porcine reproductive and respiratory syndrome virus and the similarity in clinical signs of Porcine reproductive and respiratory syndrome in Ontario swine herds.

    PubMed

    Rosendal, Thomas; Dewey, Cate; Friendship, Robert; Wootton, Sarah; Young, Beth; Poljak, Zvonimir

    2014-10-01

    A study of Ontario swine farms positive for Porcine reproductive and respiratory syndrome virus (PRRSV) tested the association between genetic similarity of the virus and similarity of clinical signs reported by the herd owner. Herds were included if a positive result of polymerase chain reaction for PRRSV at the Animal Health Laboratory at the University of Guelph, Guelph, Ontario, was found between September 2004 and August 2007. Nucleotide-sequence similarity and clinical similarity, as determined from a telephone survey, were calculated for all pairs of herds. The Mantel test indicated that clinical similarity and sequence similarity were weakly correlated for most clinical signs. The generalized additive model indicated that virus homology with 2 vaccine viruses affected the association between sequence similarity and clinical similarity. When the data for herds with vaccine-like virus were removed from the dataset there was a significant association between virus similarity and similarity of the reported presence of abortion, stillbirth, preweaning mortality, and sow/boar mortality. Ownership similarity was also found to be associated with virus similarity and with similarity of the reported presence of sows being off-feed, nursery respiratory disease, nursery mortality, finisher respiratory disease, and finisher mortality. These results indicate that clinical signs of PRRS are associated with PRRSV genotype and that herd ownership is associated with both of these.
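
    The Mantel test mentioned in this abstract compares two pairwise matrices computed over the same set of units (here, herds). A minimal sketch under stated assumptions follows: it uses Pearson correlation over the upper triangles and a one-sided permutation p-value, and the function names and toy matrix are hypothetical. The study's actual software, similarity measures, and settings are not given in the abstract.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(m1, m2, permutations=999, seed=0):
    """One-sided Mantel test between two symmetric n x n matrices.

    Rows and columns of m2 are permuted together, which keeps the matrix
    structure intact while breaking any link to m1.
    """
    n = len(m1)
    pairs = list(combinations(range(n), 2))
    x = [m1[i][j] for i, j in pairs]
    observed = pearson(x, [m2[i][j] for i, j in pairs])

    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        y = [m2[order[i]][order[j]] for i, j in pairs]
        if pearson(x, y) >= observed:
            hits += 1
    # Add one to numerator and denominator so the observed arrangement counts.
    return observed, (hits + 1) / (permutations + 1)


if __name__ == "__main__":
    # Toy data: five units placed on a line; distance = absolute difference.
    points = [0, 1, 3, 7, 12]
    dist = [[abs(a - b) for b in points] for a in points]
    r, p = mantel(dist, dist)
    print(r, p)
```

    Permuting whole rows and columns, rather than individual entries, is what distinguishes the Mantel test from an ordinary correlation test, because pairwise values that share a unit are not independent.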

  15. Association between the genetic similarity of the open reading frame 5 sequence of Porcine reproductive and respiratory syndrome virus and the similarity in clinical signs of Porcine reproductive and respiratory syndrome in Ontario swine herds

    PubMed Central

    Rosendal, Thomas; Dewey, Cate; Friendship, Robert; Wootton, Sarah; Young, Beth; Poljak, Zvonimir

    2014-01-01

    A study of Ontario swine farms positive for Porcine reproductive and respiratory syndrome virus (PRRSV) tested the association between genetic similarity of the virus and similarity of clinical signs reported by the herd owner. Herds were included if a positive result of polymerase chain reaction for PRRSV at the Animal Health Laboratory at the University of Guelph, Guelph, Ontario, was found between September 2004 and August 2007. Nucleotide-sequence similarity and clinical similarity, as determined from a telephone survey, were calculated for all pairs of herds. The Mantel test indicated that clinical similarity and sequence similarity were weakly correlated for most clinical signs. The generalized additive model indicated that virus homology with 2 vaccine viruses affected the association between sequence similarity and clinical similarity. When the data for herds with vaccine-like virus were removed from the dataset there was a significant association between virus similarity and similarity of the reported presence of abortion, stillbirth, preweaning mortality, and sow/boar mortality. Ownership similarity was also found to be associated with virus similarity and with similarity of the reported presence of sows being off-feed, nursery respiratory disease, nursery mortality, finisher respiratory disease, and finisher mortality. These results indicate that clinical signs of PRRS are associated with PRRSV genotype and that herd ownership is associated with both of these. PMID:25355993

  16. Detection of Plasmodium sp. in capybara.

    PubMed

    dos Santos, Leonilda Correia; Curotto, Sandra Mara Rotter; de Moraes, Wanderlei; Cubas, Zalmir Silvino; Costa-Nascimento, Maria de Jesus; de Barros Filho, Ivan Roque; Biondo, Alexander Welker; Kirchgatter, Karin

    2009-07-07

    In the present study, we have microscopically and molecularly surveyed blood samples from 11 captive capybaras (Hydrochaeris hydrochaeris) from the Sanctuary Zoo for Plasmodium sp. infection. One animal presented positive on blood smear by light microscopy. Polymerase chain reaction was carried out accordingly using a nested genus-specific protocol, which uses oligonucleotides from conserved sequences flanking a variable sequence region in the small subunit ribosomal RNA (ssrRNA) of all Plasmodium organisms. This revealed three positive animals. Products from two samples were purified and sequenced. The results showed less than 1% divergence between the two capybara sequences. When compared with GenBank sequences, a 55% similarity was obtained to Toxoplasma gondii and a higher similarity (73-77.2%) was found to ssrRNAs from Plasmodium species that infect reptile, avian, rodents, and human beings. The most similar Plasmodium sequence was from Plasmodium mexicanum that infects lizards of North America, where around 78% identity was found. This work is the first report of Plasmodium in capybaras, and due to the low similarity with other Plasmodium species, we suggest it is a new species, which, in the future could be denominated "Plasmodium hydrochaeri".

  17. DNA barcoding for the identification of sand fly species (Diptera, Psychodidae, Phlebotominae) in Colombia.

    PubMed

    Contreras Gutiérrez, María Angélica; Vivero, Rafael J; Vélez, Iván D; Porter, Charles H; Uribe, Sandra

    2014-01-01

    Sand flies include a group of insects that are of medical importance and that vary in geographic distribution, ecology, and pathogen transmission. Approximately 163 species of sand flies have been reported in Colombia. Surveillance of the presence of sand fly species and the actualization of species distribution are important for predicting risks for and monitoring the expansion of diseases which sand flies can transmit. Currently, the identification of phlebotomine sand flies is based on morphological characters. However, morphological identification requires considerable skills and taxonomic expertise. In addition, significant morphological similarity between some species, especially among females, may cause difficulties during the identification process. DNA-based approaches have become increasingly useful and promising tools for estimating sand fly diversity and for ensuring the rapid and accurate identification of species. A partial sequence of the mitochondrial cytochrome oxidase gene subunit I (COI) is currently being used to differentiate species in different animal taxa, including insects, and it is referred to as a barcoding sequence. The present study explored the utility of the DNA barcode approach for the identification of phlebotomine sand flies in Colombia. We sequenced 700 bp of the COI gene from 36 species collected from different geographic localities. The COI barcode sequence divergence within a single species was <2% in most cases, whereas this divergence ranged from 9% to 26.6% among different species. These results indicated that the barcoding gene correctly discriminated among the previously morphologically identified species with an efficacy of nearly 100%. Analyses of the generated sequences indicated that the observed species groupings were consistent with the morphological identifications. In conclusion, the barcoding gene was useful for species discrimination in sand flies from Colombia.
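
    The comparison of within-species and between-species divergence reported in this abstract can be sketched with an uncorrected p-distance on aligned sequences. The abstract does not state which distance model the authors used, so the p-distance, the function names, and the 20-base toy sequences below are assumptions for illustration only.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in compared) / len(compared)


def divergence_summary(barcodes):
    """Return (max within-species distance, min between-species distance).

    barcodes maps a species label to a list of aligned barcode sequences.
    """
    labelled = [(sp, seq) for sp, seqs in barcodes.items() for seq in seqs]
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(labelled, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return max(intra), min(inter)


if __name__ == "__main__":
    # Hypothetical toy alignment, far shorter than a real COI barcode.
    toy = {
        "species_a": ["ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGA"],
        "species_b": ["ACGAACCTACTTACGGACGT"],
    }
    max_intra, min_inter = divergence_summary(toy)
    print(max_intra, min_inter, min_inter > max_intra)
```

    A barcode separates species cleanly when the smallest between-species distance exceeds the largest within-species distance, which is the pattern the abstract reports for most of the 36 species.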

  18. Prevalence and Identity of Taenia multiceps cysts "Coenurus cerebralis" in Sheep in Egypt.

    PubMed

    Amer, Said; ElKhatam, Ahmed; Fukuda, Yasuhiro; Bakr, Lamia I; Zidan, Shereif; Elsify, Ahmed; Mohamed, Mostafa A; Tada, Chika; Nakai, Yutaka

    2017-12-01

    Coenurosis is a parasitic disease caused by the larval stage (Coenurus cerebralis) of the canid cestode Taenia multiceps. C. cerebralis particularly infects sheep and goats, and poses a public health concern. The present study aimed to determine the occurrence and molecular identity of C. cerebralis infecting sheep in Egypt. Infection rate was determined by postmortem inspection of heads of the cases that showed neurological manifestations. Species identification and genetic diversity were analyzed based on PCR-sequence analysis of nuclear ITS1 and mitochondrial cytochrome oxidase (COI) and nicotinamide adenine dinucleotide dehydrogenase (ND1) gene markers. Out of 3668 animals distributed in 50 herds at localities of Ashmoun and El Sadat cities, El Menoufia Province, Egypt, 420 (11.45%) sheep showed neurological disorders. Postmortem examination of these animals after slaughter at local abattoirs indicated the occurrence of C. cerebralis cysts in the brain of 111 out of 420 (26.4%), with an overall infection rate of 3.03% of the involved sheep population. Molecular analysis of representative samples of coenuri at ITS1 gene marker showed extensive intra- and inter-sequence diversity due to deletions/insertions in the microsatellite regions. In contrast to the nuclear gene marker, considerably low genetic diversity was seen in the analyzed mitochondrial gene markers. Phylogenetic analysis based on COI and ND1 gene sequences indicated that the generated sequences in the present study and the reference sequences in the database clustered in 4 haplogroups, with more or less similar topologies. Clustering pattern of the phylogenetic tree showed no effect for the geographic location or the host species. Copyright © 2017 Elsevier B.V. All rights reserved.

  19. DNA Barcoding for the Identification of Sand Fly Species (Diptera, Psychodidae, Phlebotominae) in Colombia

    PubMed Central

    Contreras Gutiérrez, María Angélica; Vivero, Rafael J.; Vélez, Iván D.; Porter, Charles H.; Uribe, Sandra

    2014-01-01

    Sand flies include a group of insects that are of medical importance and that vary in geographic distribution, ecology, and pathogen transmission. Approximately 163 species of sand flies have been reported in Colombia. Surveillance of the presence of sand fly species and the actualization of species distribution are important for predicting risks for and monitoring the expansion of diseases which sand flies can transmit. Currently, the identification of phlebotomine sand flies is based on morphological characters. However, morphological identification requires considerable skills and taxonomic expertise. In addition, significant morphological similarity between some species, especially among females, may cause difficulties during the identification process. DNA-based approaches have become increasingly useful and promising tools for estimating sand fly diversity and for ensuring the rapid and accurate identification of species. A partial sequence of the mitochondrial cytochrome oxidase gene subunit I (COI) is currently being used to differentiate species in different animal taxa, including insects, and it is referred to as a barcoding sequence. The present study explored the utility of the DNA barcode approach for the identification of phlebotomine sand flies in Colombia. We sequenced 700 bp of the COI gene from 36 species collected from different geographic localities. The COI barcode sequence divergence within a single species was <2% in most cases, whereas this divergence ranged from 9% to 26.6% among different species. These results indicated that the barcoding gene correctly discriminated among the previously morphologically identified species with an efficacy of nearly 100%. Analyses of the generated sequences indicated that the observed species groupings were consistent with the morphological identifications. In conclusion, the barcoding gene was useful for species discrimination in sand flies from Colombia. PMID:24454877

  20. ³¹P NMR measurements of the ADP concentration in yeast cells genetically modified to express creatine kinase

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brindle, K.; Braddock, P.; Fulton, S.

    1990-04-03

    Rabbit muscle creatine kinase has been introduced into the yeast Saccharomyces cerevisiae by transforming cells with a multicopy plasmid containing the coding sequence for the enzyme under the control of the yeast phosphoglycerate kinase promoter. The transformed cells showed creatine kinase activities similar to those found in mammalian heart muscle. ³¹P NMR measurements of the near-equilibrium concentrations of phosphocreatine and cellular pH together with measurements of the total extractable concentrations of phosphocreatine and creatine allowed calculation of the free ADP/ATP ratio in the cell. The calculated ratio of approximately 2 was considerably higher than the ratio of between 0.06 and 0.1 measured directly in cell extracts.

  1. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification.

    PubMed

    Sinclair, Robert M; Ravantti, Janne J; Bamford, Dennis H

    2017-04-15

    Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. 
    Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. Copyright © 2017 Sinclair et al.

  2. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification

    PubMed Central

    Sinclair, Robert M.; Ravantti, Janne J.

    2017-01-01

    ABSTRACT Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. 
    Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. PMID:28122979

  3. Human Treponema pallidum 11q/j isolate belongs to subsp. endemicum but contains two loci with a sequence in TP0548 and TP0488 similar to subsp. pertenue and subsp. pallidum, respectively

    PubMed Central

    Mikalová, Lenka; Strouhal, Michal; Oppelt, Jan; Grange, Philippe Alain; Janier, Michel; Benhaddou, Nadjet; Dupin, Nicolas; Šmajs, David

    2017-01-01

    Background Treponema pallidum subsp. endemicum (TEN) is the causative agent of endemic syphilis (bejel). An unusual human TEN 11q/j isolate was obtained from a syphilis-like primary genital lesion from a patient that returned to France from Pakistan. Methodology/Principal findings The TEN 11q/j isolate was characterized using nested PCR followed by Sanger sequencing and/or direct Illumina sequencing. Altogether, 44 chromosomal regions were analyzed. Overall, the 11q/j isolate clustered with TEN strains Bosnia A and Iraq B as expected from previous TEN classification of the 11q/j isolate. However, the 11q/j sequence in a 505 bp-long region at the TP0488 locus was similar to Treponema pallidum subsp. pallidum (TPA) strains, but not to TEN Bosnia A and Iraq B sequences, suggesting a recombination event at this locus. Similarly, the 11q/j sequence in a 613 bp-long region at the TP0548 locus was similar to Treponema pallidum subsp. pertenue (TPE) strains, but not to TEN sequences. Conclusions/Significance A detailed analysis of two recombinant loci found in the 11q/j clinical isolate revealed that the recombination event occurred just once, in the TP0488, with the donor sequence originating from a TPA strain. Since TEN Bosnia A and Iraq B were found to contain TPA-like sequences at the TP0548 locus, the recombination at TP0548 took place in a treponeme that was an ancestor to both TEN Bosnia A and Iraq B. The sequence of 11q/j isolate in TP0548 represents an ancestral TEN sequence that is similar to yaws-causing treponemes. In addition to the importance of the 11q/j isolate for reconstruction of the TEN phylogeny, this case emphasizes the possible role of TEN strains in development of syphilis-like lesions. PMID:28263990

  4. Investigating uncultured microbes and their role in a deep subseafloor ammonium sink

    NASA Astrophysics Data System (ADS)

    Kirkpatrick, J. B.; Spivack, A. J.; Smith, D. C.; D'Hondt, S. L.

    2013-12-01

    The marine deep biosphere is thought to hold a large reservoir of both microbial cells and untapped genetic diversity. One potential driving force behind the vast amount of uncultured organisms are unconventional redox pairs which may not be favorable at benchtop conditions, but can support life in other circumstances. One instance of this is the previously documented thermodynamic favorability of ammonium oxidation with sulfate in sediments such as those investigated here from the Indian Ocean. Using 454 tag sequencing of 16S DNA, we identified uncultured archaea and bacteria potentially playing key roles at the sulfate and ammonium interface. First, the phylogenetic identity of organisms potentially involved in this reaction is inferred, as well as thermodynamic considerations of potential pathways. Several novel phyla, as well as Clostridiales, appear over-represented at the reaction zone. Secondly, to understand the metabolic capability of these target organisms, these sequences have been cross-referenced with assemblies from metagenomic data sets, and connections to functional genes are being elucidated. Finally, we discuss parallels with near-shore coastal sediment from Narragansett Bay, Rhode Island, where geochemical similarities have been found. While the thermodynamic regime is similar to the Indian Ocean, suggesting the potential for a broad geographic distribution, accessibility provides the opportunity to construct bioreactors to test rates and pathways of ammonium and sulfate fluxes. Iron content may be a key factor in determining reaction favorability. We present ongoing work in this area and the pros and cons of different bioreactor designs.

  5. A ceramic/slag interface as an analog for accretion of hot refractory objects and rim formation

    NASA Technical Reports Server (NTRS)

    Paque, J. M.; Bunch, T. E.

    1994-01-01

    Refractory inclusions or Ca-Al-rich inclusions (CAI's) from carbonaceous chondrites span a wide range of bulk compositions that cannot be explained either by segregation from a gas of solar composition at different points in the condensation sequence or by fractional crystallization from a parent liquid. CAI's are commonly rimmed by Wark-Lovering (W-L) rims, a series of nearly monomineralic layers that have been a source of controversy since the variety of rim sequences occurring on different types of CAI's from Allende were described. The origin of these distinctive features has not yet been resolved, with proponents of accretion, condensation, flash heating, ablation, evaporation, etc. Rims have generated considerable interest because they potentially contain clues to conditions experienced by CAI's after the formation of the inclusion and prior to incorporation into the parent body. Ceramic bricks in contact with hot steel slag may produce reaction products in rimlike fashion similar to those found in CAI's. The similarity between the mineralogy of blast furnace slags and CAI's has long been recognized, with both containing unusual phases not found in terrestrial materials. We provide here a comparison between a ceramic brick/slag multiple-layered interface and a multiple-layered interface between a melilite-perovskite object and a melilite-spinel object in the Allende inclusion USNM 4691-1. These results have implications in interpreting the origin of rims and the textures and compositions of CAI's.

  6. The HMMER Web Server for Protein Sequence Similarity Search.

    PubMed

    Prakash, Ananth; Jeffryes, Matt; Bateman, Alex; Finn, Robert D

    2017-12-08

    Protein sequence similarity search is one of the most commonly used bioinformatics methods for identifying evolutionarily related proteins. In general, sequences that are evolutionarily related share some degree of similarity, and sequence-search algorithms use this principle to identify homologs. The requirement for a fast and sensitive sequence search method led to the development of the HMMER software, which in the latest version (v3.1) uses a combination of sophisticated acceleration heuristics and mathematical and computational optimizations to enable the use of profile hidden Markov models (HMMs) for sequence analysis. The HMMER Web server provides a common platform by linking the HMMER algorithms to databases, thereby enabling the search for homologs, as well as providing sequence and functional annotation by linking external databases. This unit describes three basic protocols and two alternate protocols that explain how to use the HMMER Web server using various input formats and user defined parameters. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.

  7. Exploiting sequence similarity to validate the sensitivity of SNP arrays in detecting fine-scaled copy number variations.

    PubMed

    Wong, Gerard; Leckie, Christopher; Gorringe, Kylie L; Haviv, Izhak; Campbell, Ian G; Kowalczyk, Adam

    2010-04-15

    High-density single nucleotide polymorphism (SNP) genotyping arrays are efficient and cost effective platforms for the detection of copy number variation (CNV). To ensure accuracy in probe synthesis and to minimize production costs, short oligonucleotide probe sequences are used. The use of short probe sequences limits the specificity of binding targets in the human genome. The specificity of these short probeset sequences has yet to be fully analysed against a normal reference human genome. Sequence similarity can artificially elevate or suppress copy number measurements, and hence reduce the reliability of affected probe readings. For the purpose of detecting narrow CNVs reliably down to the width of a single probeset, sequence similarity is an important issue that needs to be addressed. We surveyed the Affymetrix Human Mapping SNP arrays for probeset sequence similarity against the reference human genome. Utilizing sequence similarity results, we identified a collection of fine-scaled putative CNVs between gender from autosomal probesets whose sequence matches various loci on the sex chromosomes. To detect these variations, we utilized our statistical approach, Detecting REcurrent Copy number change using rank-order Statistics (DRECS), and showed that its performance was superior and more stable than the t-test in detecting CNVs. Through the application of DRECS on the HapMap population datasets with multi-matching probesets filtered, we identified biologically relevant SNPs in aberrant regions across populations with known association to physical traits, such as height, covered by the span of a single probe. This provided empirical confirmation of the existence of naturally occurring narrow CNVs as well as the sensitivity of the Affymetrix SNP array technology in detecting them. The MATLAB implementation of DRECS is available at http://ww2.cs.mu.oz.au/~gwong/DRECS/index.html.

  8. Marker genes that are less conserved in their sequences are useful for predicting genome-wide similarity levels between closely related prokaryotic strains

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lan, Yemin; Rosen, Gail; Hershberg, Ruth

    The 16s rRNA gene is so far the most widely used marker for taxonomical classification and separation of prokaryotes. Since it is universally conserved among prokaryotes, it is possible to use this gene to classify a broad range of prokaryotic organisms. At the same time, it has often been noted that the 16s rRNA gene is too conserved to separate between prokaryotes at finer taxonomic levels. In this paper, we examine how well levels of similarity of 16s rRNA and 73 additional universal or nearly universal marker genes correlate with genome-wide levels of gene sequence similarity. We demonstrate that the percent identity of 16s rRNA predicts genome-wide levels of similarity very well for distantly related prokaryotes, but not for closely related ones. In closely related prokaryotes, we find that there are many other marker genes for which levels of similarity are much more predictive of genome-wide levels of gene sequence similarity. Finally, we show that the identities of the markers that are most useful for predicting genome-wide levels of similarity within closely related prokaryotic lineages vary greatly between lineages. However, the most useful markers are always those that are least conserved in their sequences within each lineage. In conclusion, our results show that by choosing markers that are less conserved in their sequences within a lineage of interest, it is possible to better predict genome-wide gene sequence similarity between closely related prokaryotes than is possible using the 16s rRNA gene. We point readers towards a database we have created (POGO-DB) that can be used to easily establish which markers show lowest levels of sequence conservation within different prokaryotic lineages.

  9. Marker genes that are less conserved in their sequences are useful for predicting genome-wide similarity levels between closely related prokaryotic strains

    DOE PAGES

    Lan, Yemin; Rosen, Gail; Hershberg, Ruth

    2016-05-03

    The 16s rRNA gene is so far the most widely used marker for taxonomical classification and separation of prokaryotes. Since it is universally conserved among prokaryotes, it is possible to use this gene to classify a broad range of prokaryotic organisms. At the same time, it has often been noted that the 16s rRNA gene is too conserved to separate between prokaryotes at finer taxonomic levels. In this paper, we examine how well levels of similarity of 16s rRNA and 73 additional universal or nearly universal marker genes correlate with genome-wide levels of gene sequence similarity. We demonstrate that the percent identity of 16s rRNA predicts genome-wide levels of similarity very well for distantly related prokaryotes, but not for closely related ones. In closely related prokaryotes, we find that there are many other marker genes for which levels of similarity are much more predictive of genome-wide levels of gene sequence similarity. Finally, we show that the identities of the markers that are most useful for predicting genome-wide levels of similarity within closely related prokaryotic lineages vary greatly between lineages. However, the most useful markers are always those that are least conserved in their sequences within each lineage. In conclusion, our results show that by choosing markers that are less conserved in their sequences within a lineage of interest, it is possible to better predict genome-wide gene sequence similarity between closely related prokaryotes than is possible using the 16s rRNA gene. We point readers towards a database we have created (POGO-DB) that can be used to easily establish which markers show lowest levels of sequence conservation within different prokaryotic lineages.

  10. Sequence and Transcriptional Analyses of the Fish Retroviruses Walleye Epidermal Hyperplasia Virus Types 1 and 2: Evidence for a Gene Duplication

    PubMed Central

    LaPierre, Lorie A.; Holzschu, Donald L.; Bowser, Paul R.; Casey, James W.

    1999-01-01

    Walleye epidermal hyperplasia virus types 1 and 2 (WEHV1 and WEHV2, respectively) are associated with a hyperproliferative skin lesion on walleyes that appears and regresses seasonally. We have determined the complete nucleotide sequences and transcriptional profiles of these viruses. WEHV1 and WEHV2 are large, complex retroviruses of 12,999 and 13,125 bp in length, respectively, that are closely related to one another and to walleye dermal sarcoma virus (WDSV). These walleye retroviruses contain three open reading frames, orfA, orfB, and orfC, in addition to gag, pol, and env. orfA and orfB are adjacent to one another and located downstream of env. The OrfA proteins were previously identified as cyclin D homologs that may contribute to the induction of cell proliferation leading to epidermal hyperplasia and dermal sarcoma. The sequence analysis of WEHV1 and WEHV2 revealed that the OrfB proteins are distantly related to the OrfA proteins, suggesting that orfB arose by gene duplication. Presuming that the precursor of orfA and orfB was derived from a cellular cyclin, these genes are the first accessory genes of complex retroviruses that can be traced to a cellular origin. WEHV1, WEHV2, and WDSV are the only retroviruses that have an open reading frame, orfC, of considerable size (ca. 130 amino acids) in the leader region preceding gag. While we were unable to predict a function for the OrfC proteins, they are more conserved than OrfA and OrfB, suggesting that they may be biologically important to the viruses. The transcriptional profiles of WEHV1 and WEHV2 were also similar to that of WDSV; Northern blot analyses detected only low levels of the orfA transcripts in developing lesions, whereas abundant levels of genomic, env, orfA, and orfB transcripts were detected in regressing lesions. The splice donors and acceptors of individual transcripts were identified by reverse transcriptase PCR. The similarities of WEHV1, WEHV2, and WDSV suggest that these viruses use similar strategies of viral replication and induce cell proliferation by a similar mechanism. PMID:10516048

  11. Amino terminal sequence of heavy and light chains from ratfish immunoglobulin.

    PubMed

    De Ioannes, A E; Aguila, H L

    1989-01-01

    The ratfish, Callorhinchus callorhinchus, a representative of the Holocephali, has a natural serum hemagglutinin (Mr 960,000), composed of heavy (Mr 71,000), light (Mr 22,500), and J (Mr 16,000) chains. To approach the mechanisms that generate diversity at this level of evolution, the amino terminal sequence of the heavy and light chains was determined by automated microsequencing. The chains are unblocked and have modest internal sequence heterogeneity. The heavy chains show sequence similarity with the terminal region of the heavy chain from the horned shark, Heterodontus francisci, and other species. In contrast to the heavy chain, the ratfish light chains display low sequence similarity with their shark kappa counterparts. However, their similarity with the variable region of the chicken lambda light chains is about 75%.

  12. Analysis of xylem formation in pine by cDNA sequencing

    NASA Technical Reports Server (NTRS)

    Allona, I.; Quinn, M.; Shoop, E.; Swope, K.; St Cyr, S.; Carlis, J.; Riedl, J.; Retzel, E.; Campbell, M. M.; Sederoff, R.

    1998-01-01

    Secondary xylem (wood) formation is likely to involve some genes expressed rarely or not at all in herbaceous plants. Moreover, environmental and developmental stimuli influence secondary xylem differentiation, producing morphological and chemical changes in wood. To increase our understanding of xylem formation, and to provide material for comparative analysis of gymnosperm and angiosperm sequences, ESTs were obtained from immature xylem of loblolly pine (Pinus taeda L.). A total of 1,097 single-pass sequences were obtained from 5' ends of cDNAs made from gravistimulated tissue from bent trees. Cluster analysis detected 107 groups of similar sequences, ranging in size from 2 to 20 sequences. A total of 361 sequences fell into these groups, whereas 736 sequences were unique. About 55% of the pine EST sequences show similarity to previously described sequences in public databases. About 10% of the recognized genes encode factors involved in cell wall formation. Sequences similar to cell wall proteins, most known lignin biosynthetic enzymes, and several enzymes of carbohydrate metabolism were found. A number of putative regulatory proteins also are represented. Expression patterns of several of these genes were studied in various tissues and organs of pine. Sequencing novel genes expressed during xylem formation will provide a powerful means of identifying mechanisms controlling this important differentiation pathway.

  13. SW#db: GPU-Accelerated Exact Sequence Similarity Database Search.

    PubMed

    Korpar, Matija; Šošić, Martin; Blažeka, Dino; Šikić, Mile

    2015-01-01

    In recent years we have witnessed a growth in sequencing yield, the number of samples sequenced, and, as a result, the growth of publicly maintained sequence databases. The increase of data present all around has put high requirements on protein similarity search algorithms with two ever-opposite goals: how to keep the running times acceptable while maintaining a high-enough level of sensitivity. The most time-consuming step of similarity search is the local alignment between query and database sequences. This step is usually performed using exact local alignment algorithms such as Smith-Waterman. Due to its quadratic time complexity, alignments of a query to the whole database are usually too slow. Therefore, the majority of protein similarity search methods apply heuristics, prior to the exact local alignment, to reduce the number of possible candidate sequences in the database. However, there is still a need for the alignment of a query sequence to a reduced database. In this paper we present the SW#db tool and a library for fast exact similarity search. Although its running times, as a standalone tool, are comparable to the running times of BLAST, it is primarily intended to be used for the exact local alignment phase in which the database of sequences has already been reduced. It uses both GPU and CPU parallelization and was 4-5 times faster than SSEARCH, 6-25 times faster than CUDASW++ and more than 20 times faster than SSW at the time of writing, using multiple queries on Swiss-prot and Uniref90 databases.
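
    The quadratic cost mentioned in this abstract comes from filling a score matrix with one cell per pair of residues. The sketch below is a minimal, CPU-only illustration of Smith-Waterman scoring and is not SW#db's implementation. The function name and the match/mismatch/gap values are assumptions chosen for illustration; real protein searches typically use a substitution matrix and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of a and b.

    Uses a linear gap penalty and keeps only two rows of the
    dynamic-programming matrix, so memory is O(len(b)) while
    time stays O(len(a) * len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

    Because every cell depends only on its upper, left, and upper-left neighbours, cells along an anti-diagonal are independent of each other, which is the property that parallel implementations generally exploit.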

  14. Expressed Sequence Tag Analysis of the Human Pathogen Paracoccidioides brasiliensis Yeast Phase: Identification of Putative Homologues of Candida albicans Virulence and Pathogenicity Genes

    PubMed Central

    Goldman, Gustavo H.; dos Reis Marques, Everaldo; Custódio Duarte Ribeiro, Diógenes; Ângelo de Souza Bernardes, Luciano; Quiapin, Andréa Carla; Vitorelli, Patrícia Marostica; Savoldi, Marcela; Semighini, Camile P.; de Oliveira, Regina C.; Nunes, Luiz R.; Travassos, Luiz R.; Puccia, Rosana; Batista, Wagner L.; Ferreira, Leslie Ecker; Moreira, Júlio C.; Bogossian, Ana Paula; Tekaia, Fredj; Nobrega, Marina Pasetto; Nobrega, Francisco G.; Goldman, Maria Helena S.

    2003-01-01

    Paracoccidioides brasiliensis, a thermodimorphic fungus, is the causative agent of the prevalent systemic mycosis in Latin America, paracoccidioidomycosis. We present here a survey of expressed genes in the yeast pathogenic phase of P. brasiliensis. We obtained 13,490 expressed sequence tags from both 5′ and 3′ ends. Clustering analysis yielded the partial sequences of 4,692 expressed genes that were functionally classified by similarity to known genes. We have identified several Candida albicans virulence and pathogenicity homologues in P. brasiliensis. Furthermore, we have analyzed the expression of some of these genes during the dimorphic yeast-mycelium-yeast transition by real-time quantitative reverse transcription-PCR. Clustering analysis of the mycelium-yeast transition revealed three groups: (i) RBT, hydrophobin, and isocitrate lyase; (ii) malate dehydrogenase, contigs Pb1067 and Pb1145, GPI, and alternative oxidase; and (iii) ubiquitin, delta-9-desaturase, HSP70, HSP82, and HSP104. The first two groups displayed high mRNA expression in the mycelial phase, whereas the third group showed higher mRNA expression in the yeast phase. Our results suggest the possible conservation of pathogenicity and virulence mechanisms among fungi, expand considerably gene identification in P. brasiliensis, and provide a broader basis for further progress in understanding its biological peculiarities. PMID:12582121

  15. Diversity and three-dimensional structures of the alpha Mcr of the methanogenic Archaea from the anoxic region of Tucuruí Lake, in Eastern Brazilian Amazonia

    PubMed Central

    Santana, Priscila Bessa; Junior, Rubens Ghilardi; Alves, Claudio Nahum; Silva, Jeronimo Lameira; McCulloch, John Anthony; Schneider, Maria Paula Cruz; da Costa da Silva, Artur

    2012-01-01

    Methanogenic archaeans are organisms of considerable ecological and biotechnological interest that produce methane through a restricted metabolic pathway, which culminates in the reaction catalyzed by the Methyl-coenzyme M reductase (Mcr) enzyme, and results in the release of methane. Using a metagenomic approach, the gene of the α subunit of mcr (mcrα) was isolated from a sediment sample from an anoxic zone, rich in decomposing organic material, obtained from the Tucuruí hydroelectric dam reservoir in eastern Brazilian Amazonia. The partial nucleotide sequences obtained were 83 to 95% similar to those available in databases, indicating a low diversity of archaeans in the reservoir. Two orders were identified - the Methanomicrobiales, and a unique Operational Taxonomic Unit (OTU) forming a clade with the Methanosarcinales according to low bootstrap values. Homology modeling was used to determine the three-dimensional (3D) structures; for this, the partial nucleotide sequences of mcrα were isolated and translated into their partial amino acid sequences. The 3D structures of the archaean Mcrα observed in the present study varied little, and presented approximately 70% identity in comparison with the Mcrα of Methanopyrus kandleri. The results demonstrated that the community of methanogenic archaeans of the anoxic C1 region of the Tucuruí reservoir is relatively homogeneous. PMID:22481885

  16. The isolation and amino acid sequence of an adrenocorticotrophin from the pars distalis and a corticotrophin-like intermediate-lobe peptide from the neurointermediate lobe of the pituitary of the dogfish Squalus acanthias

    PubMed Central

    Lowry, Philip J.; Bennett, Hugh P. J.; McMartin, Colin; Scott, Alexander P.

    1974-01-01

    An adrenocorticotrophic hormone (ACTH) was isolated from extracts of the pars distalis of the pituitary of the dogfish Squalus acanthias by gel filtration and ion-exchange chromatography. It had 15% of the potency of human ACTH in promoting cortico-steroidogenesis in isolated rat adrenal cells. Sequence analysis revealed it to be a nonatriacontapeptide with the following primary structure: Ser-Tyr-Ser-Met-Glu-His-Phe-Arg-Trp-Gly-Lys-Pro-Met-Gly-Arg-Lys-Arg-Arg-Pro-Ile-Lys-Val-Tyr-Pro-Asn-Ser-Phe-Glu-Asp-Glu-Ser-Val-Glu-Asn-Met-Gly-Pro-Glu-Leu. The N-terminal tridecapeptide sequence was identical with the proposed structure of dogfish α-melanocyte-stimulating hormone (α-MSH). On comparison with human ACTH eleven amino acid differences were seen, nine of which are in the 20–39 region of the molecule which is not essential for the steroidogenic activity of ACTH. A peptide identical with the 18–39 portion of this new ACTH was similarly isolated from the neurointermediate lobe of the pituitary where considerable amounts of dogfish α-MSH were found. This supported our view that ACTH as well as having a distinct biological role of its own is also the precursor of α-MSH. PMID:4375977

  17. A field ornithologist’s guide to genomics: Practical considerations for ecology and conservation

    USGS Publications Warehouse

    Oyler-McCance, Sara J.; Oh, Kevin; Langin, Kathryn; Aldridge, Cameron L.

    2016-01-01

    Vast improvements in sequencing technology have made it practical to simultaneously sequence millions of nucleotides distributed across the genome, opening the door for genomic studies in virtually any species. Ornithological research stands to benefit in three substantial ways. First, genomic methods enhance our ability to parse and simultaneously analyze both neutral and non-neutral genomic regions, thus providing insight into adaptive evolution and divergence. Second, the sheer quantity of sequence data generated by current sequencing platforms allows increased precision and resolution in analyses. Third, high-throughput sequencing can benefit applications that focus on a small number of loci that are otherwise prohibitively expensive, time-consuming, and technically difficult using traditional sequencing methods. These advances have improved our ability to understand evolutionary processes like speciation and local adaptation, but they also offer many practical applications in the fields of population ecology, migration tracking, conservation planning, diet analyses, and disease ecology. This review provides a guide for field ornithologists interested in incorporating genomic approaches into their research program, with an emphasis on techniques related to ecology and conservation. We present a general overview of contemporary genomic approaches and methods, as well as important considerations when selecting a genomic technique. We also discuss research questions that are likely to benefit from utilizing high-throughput sequencing instruments, highlighting select examples from recent avian studies.

  18. PCR Amplification Strategies towards full-length HIV-1 Genome sequencing.

    PubMed

    Liu, Chao Chun; Ji, Hezhao

    2018-06-26

    The advent of next generation sequencing has enabled greater resolution of viral diversity and improved feasibility of full viral genome sequencing allowing routine HIV-1 full genome sequencing in both research and diagnostic settings. Regardless of the sequencing platform selected, successful PCR amplification of the HIV-1 genome is essential for sequencing template preparation. As such, full HIV-1 genome amplification is a crucial step in dictating the successful and reliable sequencing downstream. Here we reviewed existing PCR protocols leading to HIV-1 full genome sequencing. In addition to the discussion on basic considerations on relevant PCR design, the advantages as well as the pitfalls of published protocols were reviewed. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  19. Genome Sequences of Mycobacteriophages Amgine, Amohnition, Bella96, Cain, DarthP, Hammy, Krueger, LastHope, Peanam, PhelpsODU, Phrank, SirPhilip, Slimphazie, and Unicorn

    PubMed Central

    Anders, Kirk R.; Mavrodi, Dmitri V.; Vazquez, Edwin; Amoh, Nana Yaa A.; Baliraine, Frederick N.; Buchser, William J.; Cast, Thomas P.; Chamberlain, Carmen E.; Chung, Hui-Min; D’Angelo, William A.; Farris, Christian T.; Fernandez-Martinez, Mariceli; Fischman, Haley D.; Forsyth, Mark H.; Fortier, Anna G.; Gallo, Kara F.; Held, Greta J.; Lomas, Miguel A.; Maldonado-Vazquez, Natalia Y.; Moonsammy, Claudia H.; Namboote, Peace; Paudel, Sudip; Reyes, Gabriella M.; Rubin, Michael R.; Saha, Margaret S.; Stukey, Joseph; Tobias, Tristan D.; Garlena, Rebecca A.; Stoner, Ty H.; Russell, Daniel A.

    2017-01-01

    We report the genome sequences of 14 cluster K mycobacteriophages isolated using Mycobacterium smegmatis mc²155 as host. Four are closely related to subcluster K1 phages, and 10 are members of subcluster K6. The phage genomes span considerable sequence diversity, including multiple types of integrases and integration sites. PMID:29217790

  20. Discrimination of germline V genes at different sequencing lengths and mutational burdens: A new tool for identifying and evaluating the reliability of V gene assignment.

    PubMed

    Zhang, Bochao; Meng, Wenzhao; Prak, Eline T Luning; Hershberg, Uri

    2015-12-01

    Immune repertoires are collections of lymphocytes that express diverse antigen receptor gene rearrangements consisting of Variable (V), (Diversity (D) in the case of heavy chains) and Joining (J) gene segments. Clonally related cells typically share the same germline gene segments and have highly similar junctional sequences within their third complementarity determining regions. Identifying clonal relatedness of sequences is a key step in the analysis of immune repertoires. The V gene is the most important for clone identification because it has the longest sequence and the greatest number of sequence variants. However, accurate identification of a clone's germline V gene source is challenging because there is a high degree of similarity between different germline V genes. This difficulty is compounded in antibodies, which can undergo somatic hypermutation. Furthermore, high-throughput sequencing experiments often generate partial sequences and have significant error rates. To address these issues, we describe a novel method to estimate which germline V genes (or alleles) cannot be discriminated under different conditions (read lengths, sequencing errors or somatic hypermutation frequencies). Starting with any set of germline V genes, this method measures their similarity using different sequencing lengths and calculates their likelihood of unambiguous assignment under different levels of mutation. Hence, one can identify, under different experimental and biological conditions, the germline V genes (or alleles) that cannot be uniquely identified and bundle them together into groups of specific V genes with highly similar sequences. Copyright © 2015 Elsevier B.V. All rights reserved.

  1. Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM.

    PubMed

    Liang, Yunyun; Liu, Sanyang; Zhang, Shengli

    2015-01-01

    Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM). Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. Then a 700-dimensional (700D) feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on the 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves favorable and competitive performance. This will offer an important complement to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.

  2. Diversity of virus-host systems in hypersaline Lake Retba, Senegal.

    PubMed

    Sime-Ngando, Télesphore; Lucas, Soizick; Robin, Agnès; Tucker, Kimberly Pause; Colombet, Jonathan; Bettarel, Yvan; Desmond, Elie; Gribaldo, Simonetta; Forterre, Patrick; Breitbart, Mya; Prangishvili, David

    2011-08-01

    Remarkable morphological diversity of virus-like particles was observed by transmission electron microscopy in a hypersaline water sample from Lake Retba, Senegal. The majority of particles morphologically resembled hyperthermophilic archaeal DNA viruses isolated from extreme geothermal environments. Some hypersaline viral morphotypes have not been previously observed in nature, and less than 1% of observed particles had a head-and-tail morphology, which is typical for bacterial DNA viruses. Culture-independent analysis of the microbial diversity in the sample suggested the dominance of extremely halophilic archaea. Few of the 16S sequences corresponded to known archaeal genera (Haloquadratum, Halorubrum and Natronomonas), whereas the majority represented novel archaeal clades. Three sequences corresponded to a new basal lineage of the haloarchaea. Bacteria belonged to four major phyla, consistent with the known diversity in saline environments. Metagenomic sequencing of DNA from the purified virus-like particles revealed very few similarities to the NCBI non-redundant database at either the nucleotide or amino acid level. Some of the identifiable virus sequences were most similar to previously described haloarchaeal viruses, but no sequence similarities were found to archaeal viruses from extreme geothermal environments. A large proportion of the sequences had similarity to previously sequenced viral metagenomes from solar salterns. © 2010 Society for Applied Microbiology and Blackwell Publishing Ltd.

  3. The spliced leader trans-splicing mechanism in different organisms: molecular details and possible biological roles

    PubMed Central

    Bitar, Mainá; Boroni, Mariana; Macedo, Andréa M.; Machado, Carlos R.; Franco, Glória R.

    2013-01-01

    The spliced leader (SL) is a gene that generates a functional ncRNA that is composed of two regions: an intronic region of unknown function (SLi) and an exonic region (SLe), which is transferred to the 5′ end of independent transcripts yielding mature mRNAs, in a process known as spliced leader trans-splicing (SLTS). The best described function for SLTS is to solve polycistronic transcripts into monocistronic units, specifically in Trypanosomatids. In other metazoans, it is speculated that the SLe addition could lead to increased mRNA stability, differential recruitment of the translational machinery, modification of the 5′ region or a combination of these effects. Although important aspects of this mechanism have been revealed, several features remain to be elucidated. We have analyzed 157 SLe sequences from 148 species from seven phyla and found a high degree of conservation among the sequences of species from the same phylum, although no considerable similarity seems to exist between sequences of species from different phyla. When analyzing case studies, we found evidence that a given SLe will always be related to a given set of transcripts in different species from the same phylum, and therefore, different SLe sequences from the same species would regulate different sets of transcripts. In addition, we have observed distinct transcript categories to be preferential targets for the SLe addition in different phyla. This work sheds light into crucial and controversial aspects of the SLTS mechanism. It represents a comprehensive study concerning various species and different characteristics of this important post-transcriptional regulatory mechanism. PMID:24130571

  4. De Novo Transcriptomic Analysis of an Oleaginous Microalga: Pathway Description and Gene Discovery for Production of Next-Generation Biofuels

    PubMed Central

    Wan, LingLin; Han, Juan; Sang, Min; Li, AiFen; Wu, Hong; Yin, ShunJi; Zhang, ChengWu

    2012-01-01

    Background Eustigmatos cf. polyphem is a yellow-green unicellular soil microalga belonging to the eustigmatophytes with high biomass and considerable production of triacylglycerols (TAGs) for biofuels, which is thus referred to as an oleaginous microalga. The paucity of microalgae genome sequences, however, limits development of gene-based biofuel feedstock optimization studies. Here we describe the sequencing and de novo transcriptome assembly for a non-model microalgae species, E. cf. polyphem, and identify pathways and genes of importance related to biofuel production. Results We performed the de novo assembly of E. cf. polyphem transcriptome using Illumina paired-end sequencing technology. In a single run, we produced 29,199,432 sequencing reads corresponding to 2.33 Gb total nucleotides. These reads were assembled into 75,632 unigenes with a mean size of 503 bp and an N50 of 663 bp, ranging from 100 bp to >3,000 bp. Assembled unigenes were subjected to BLAST similarity searches and annotated with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology identifiers. These analyses identified the majority of carbohydrate, fatty acids, TAG and carotenoids biosynthesis and catabolism pathways in E. cf. polyphem. Conclusions Our data provide the construction of metabolic pathways involved in the biosynthesis and catabolism of carbohydrate, fatty acids, TAG and carotenoids in E. cf. polyphem and provide a foundation for the molecular genetics and functional genomics required to direct metabolic engineering efforts that seek to enhance the quantity and character of microalgae-based biofuel feedstock. PMID:22536352
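
    The abstract reports an N50 of 663 bp alongside a mean unigene size of 503 bp. For readers unfamiliar with the statistic, the sketch below shows the usual definition: the length L such that sequences of length L or longer hold at least half of all assembled bases. The function name and the toy lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Walk the lengths from longest to shortest and return the
    length at which the running total first reaches half of the
    summed length of all sequences.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("n50() needs at least one length")


if __name__ == "__main__":
    # Toy lengths: total is 20, and 6 + 5 = 11 reaches half of it.
    print(n50([2, 3, 4, 5, 6]))
```

    Because long sequences dominate the running total, N50 is usually larger than the mean length when an assembly contains many short fragments, which is consistent with the two figures quoted above.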

  5. TaqMan Real-Time PCR Assays To Assess Arbuscular Mycorrhizal Responses to Field Manipulation of Grassland Biodiversity: Effects of Soil Characteristics, Plant Species Richness, and Functional Traits

    PubMed Central

    König, Stephan; Wubet, Tesfaye; Dormann, Carsten F.; Hempel, Stefan; Renker, Carsten; Buscot, François

    2010-01-01

    Large-scale (temporal and/or spatial) molecular investigations of the diversity and distribution of arbuscular mycorrhizal fungi (AMF) require considerable sampling efforts and high-throughput analysis. To facilitate such efforts, we have developed a TaqMan real-time PCR assay to detect and identify AMF in environmental samples. First, we screened the diversity in clone libraries, generated by nested PCR, of the nuclear ribosomal DNA internal transcribed spacer (ITS) of AMF in environmental samples. We then generated probes and forward primers based on the detected sequences, enabling AMF sequence type-specific detection in TaqMan multiplex real-time PCR assays. In comparisons to conventional clone library screening and Sanger sequencing, the TaqMan assay approach provided similar accuracy but higher sensitivity with cost and time savings. The TaqMan assays were applied to analyze the AMF community composition within plots of a large-scale plant biodiversity manipulation experiment, the Jena Experiment, primarily designed to investigate the interactive effects of plant biodiversity on element cycling and trophic interactions. The results show that environmental variables hierarchically shape AMF communities and that the sequence type spectrum is strongly affected by previous land use and disturbance, which appears to favor disturbance-tolerant members of the genus Glomus. The AMF species richness of disturbance-associated communities can be largely explained by richness of plant species and plant functional groups, while plant productivity and soil parameters appear to have only weak effects on the AMF community. PMID:20418424

  6. A survey and evaluations of histogram-based statistics in alignment-free sequence comparison.

    PubMed

    Luczak, Brian B; James, Benjamin T; Girgis, Hani Z

    2017-12-06

    Since the dawn of the bioinformatics field, sequence alignment scores have been the main method for comparing sequences. However, alignment algorithms are quadratic, requiring long execution time. As alternatives, scientists have developed tens of alignment-free statistics for measuring the similarity between two sequences. We surveyed tens of alignment-free k-mer statistics. Additionally, we evaluated 33 statistics and multiplicative combinations between the statistics and/or their squares. These statistics are calculated on two k-mer histograms representing two sequences. Our evaluations using global alignment scores revealed that the majority of the statistics are sensitive and capable of finding similar sequences to a query sequence. Therefore, any of these statistics can filter out dissimilar sequences quickly. Further, we observed that multiplicative combinations of the statistics are highly correlated with the identity score. Furthermore, combinations involving sequence length difference or Earth Mover's distance, which takes the length difference into account, are always among the highest correlated paired statistics with identity scores. Similarly, paired statistics including length difference or Earth Mover's distance are among the best performers in finding the K-closest sequences. Interestingly, similar performance can be obtained using histograms of shorter words, resulting in reducing the memory requirement and increasing the speed remarkably. Moreover, we found that simple single statistics are sufficient for processing next-generation sequencing reads and for applications relying on local alignment. Finally, we measured the time requirement of each statistic. The survey and the evaluations will help scientists with identifying efficient alternatives to the costly alignment algorithm, saving thousands of computational hours. The source code of the benchmarking tool is available as Supplementary Materials. © The Author 2017. Published by Oxford University Press.
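
    The statistics described in this abstract are computed on two k-mer histograms, one per sequence. The sketch below shows that general idea with a single, deliberately simple statistic. The function names are assumptions, and Manhattan distance is used only as an illustrative choice; the abstract does not say which of the evaluated statistics performed best individually.

```python
from collections import Counter


def kmer_histogram(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def manhattan_distance(h1, h2):
    """Sum of absolute count differences over all words seen in either histogram."""
    # Counter returns 0 for words it has not seen, so no special-casing is needed.
    return sum(abs(h1[word] - h2[word]) for word in set(h1) | set(h2))


if __name__ == "__main__":
    first = kmer_histogram("ACGTACGT", 3)
    second = kmer_histogram("ACGTTCGT", 3)
    print(manhattan_distance(first, second))
```

    Building each histogram takes time linear in the sequence length, so comparing two sequences this way avoids the quadratic matrix that alignment requires.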

  7. Pattern similarity study of functional sites in protein sequences: lysozymes and cystatins

    PubMed Central

    Nakai, Shuryo; Li-Chan, Eunice CY; Dou, Jinglie

    2005-01-01

    Background Although it is generally agreed that topography is more conserved than sequences, proteins sharing the same fold can have different functions, while there are protein families with low sequence similarity. An alternative method for profile analysis of characteristic conserved positions of the motifs within the 3D structures may be needed for functional annotation of protein sequences. Using the approach of quantitative structure-activity relationships (QSAR), we have proposed a new algorithm for postulating functional mechanisms on the basis of pattern similarity and average of property values of side-chains in segments within sequences. This approach was used to search for functional sites of proteins belonging to the lysozyme and cystatin families. Results Hydrophobicity and β-turn propensity of reference segments with 3–7 residues were used for the homology similarity search (HSS) for active sites. Hydrogen bonding was used as the side-chain property for searching the binding sites of lysozymes. The profiles of similarity constants and average values of these parameters as functions of their positions in the sequences could identify both active and substrate binding sites of the lysozyme of Streptomyces coelicolor, which has been reported as a new fold enzyme (Cellosyl). The same approach was successfully applied to cystatins, especially for postulating the mechanisms of amyloidosis of human cystatin C as well as human lysozyme. Conclusion Pattern similarity and average index values of structure-related properties of side chains in short segments of three residues or longer were, for the first time, successfully applied for predicting functional sites in sequences. This new approach may be applicable to studying functional sites in un-annotated proteins, for which complete 3D structures are not yet available. PMID:15904486

  8. Query-seeded iterative sequence similarity searching improves selectivity 5–20-fold

    PubMed Central

    Li, Weizhong; Lopez, Rodrigo

    2017-01-01

    Iterative similarity search programs, like psiblast, jackhmmer, and psisearch, are much more sensitive than pairwise similarity search methods like blast and ssearch because they build a position specific scoring model (a PSSM or HMM) that captures the pattern of sequence conservation characteristic of a protein family. But models are subject to contamination; once an unrelated sequence has been added to the model, homologs of the unrelated sequence will also produce high scores, and the model can diverge from the original protein family. Examination of alignment errors during psiblast PSSM contamination suggested a simple strategy for dramatically reducing PSSM contamination. psiblast PSSMs are built from the query-based multiple sequence alignment (MSA) implied by the pairwise alignments between the query model (PSSM, HMM) and the subject sequences in the library. When the original query sequence residues are inserted into gapped positions in the aligned subject sequence, the resulting PSSM rarely produces alignment over-extensions or alignments to unrelated sequences. This simple step, which tends to anchor the PSSM to the original query sequence and slightly increase target percent identity, can reduce the frequency of false-positive alignments more than 20-fold compared with psiblast and jackhmmer, with little loss in search sensitivity. PMID:27923999
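
    The query-seeding step described in this abstract can be illustrated with a minimal sketch. It assumes a pairwise alignment given as two equal-length gapped strings; the function name `query_seed` and the `-` gap character are illustrative assumptions, not part of psiblast or psisearch, and the real programs operate on their own alignment structures.

```python
def query_seed(aligned_query: str, aligned_subject: str, gap: str = "-") -> str:
    """Build the subject's row for a query-based MSA (hypothetical helper).

    Wherever the aligned subject has a gap opposite a query residue, the
    query residue is inserted, which anchors that column to the query.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")

    row = []
    for q, s in zip(aligned_query, aligned_subject):
        if q == gap:
            # Subject insertion relative to the query: a query-based MSA
            # has no column for it, so the position is dropped.
            continue
        # Seed gapped subject positions with the query residue.
        row.append(q if s == gap else s)
    return "".join(row)


if __name__ == "__main__":
    print(query_seed("ACD-EFG", "AC-WEYG"))
```

    The resulting row has the same length as the ungapped query, so rows from many subjects can be stacked directly into a query-based alignment.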

  9. mrtailor: a tool for PDB-file preparation for the generation of external restraints.

    PubMed

    Gruene, Tim

    2013-09-01

    Model building starting from, for example, a molecular-replacement solution with low sequence similarity introduces model bias, which can be difficult to detect, especially at low resolution. The program mrtailor removes low-similarity regions from a template PDB file according to sequence similarity between the target sequence and the template sequence and maps the target sequence onto the PDB file. The modified PDB file can be used to generate external restraints for low-resolution refinement with reduced model bias and can be used as a starting point for model building and refinement. The program can call ProSMART [Nicholls et al. (2012), Acta Cryst. D68, 404-417] directly in order to create external restraints suitable for REFMAC5 [Murshudov et al. (2011), Acta Cryst. D67, 355-367]. Both a command-line version and a GUI exist.

  10. Maximizing ecological and evolutionary insight in bisulfite sequencing data sets

    PubMed Central

    Lea, Amanda J.; Vilgalys, Tauras P.; Durst, Paul A.P.; Tung, Jenny

    2017-01-01

    Preface Genome-scale bisulfite sequencing approaches have opened the door to ecological and evolutionary studies of DNA methylation in many organisms. These approaches can be powerful. However, they introduce new methodological and statistical considerations, some of which are particularly relevant to non-model systems. Here, we highlight how these considerations influence a study’s power to link methylation variation with a predictor variable of interest. Relative to current practice, we argue that sample sizes will need to increase to provide robust insights. We also provide recommendations for overcoming common challenges and an R Shiny app to aid in study design. PMID:29046582

  11. The nucleotide sequence of 5S rRNA from a cellular slime mold Dictyostelium discoideum.

    PubMed Central

    Hori, H; Osawa, S; Iwabuchi, M

    1980-01-01

    The nucleotide sequence of ribosomal 5S rRNA from a cellular slime mold Dictyostelium discoideum is GUAUACGGCCAUACUAGGUUGGAAACACAUCAUCCCGUUCGAUCUGAUA AGUAAAUCGACCUCAGGCCUUCCAAGUACUCUGGUUGGAGACAACAGGGGAACAUAGGGUGCUGUAUACU. A model for the secondary structure of this 5S rRNA is proposed. The sequence is more similar to those of animals (62% similarity on average) than to those of yeasts (56%). PMID:7465421

  12. Molecular characterization of a novel rhabdovirus infecting blackcurrant identified by high-throughput sequencing.

    PubMed

    Wu, L-P; Yang, T; Liu, H-W; Postman, J; Li, R

    2018-05-01

    A large contig with sequence similarities to several nucleorhabdoviruses was identified by high-throughput sequencing analysis from a black currant (Ribes nigrum L.) cultivar. The complete genome sequence of this new nucleorhabdovirus is 14,432 nucleotides long. Its genomic organization is very similar to those of unsegmented plant rhabdoviruses, containing six open reading frames in the order 3'-N-P-P3-M-G-L-5'. The virus, which is provisionally named "black currant-associated rhabdovirus", is 41-52% identical in its genome nucleotide sequence to other nucleorhabdoviruses and may represent a new species in the genus Nucleorhabdovirus.

  13. Inter- and intraspecific mitochondrial DNA variation in North American bears (Ursus)

    USGS Publications Warehouse

    Cronin, Matthew A.; Amstrup, Steven C.; Garner, Gerald W.; Vyse, Ernest R.

    1991-01-01

    We assessed mitochondrial DNA variation in North American black bears (Ursus americanus), brown bears (Ursus arctos), and polar bears (Ursus maritimus). Divergent mitochondrial DNA haplotypes (0.05 base substitutions per nucleotide) were identified in populations of black bears from Montana and Oregon. In contrast, very similar haplotypes occur in black bears across North America. This discordance of haplotype phylogeny and geographic distribution indicates that there has been maintenance of polymorphism and considerable gene flow throughout the history of the species. Intraspecific mitochondrial DNA sequence divergence in brown bears and polar bears is lower than in black bears. The two morphological forms of U. arctos, grizzly and coastal brown bears, are not in distinct mtDNA lineages. Interspecific comparisons indicate that brown bears and polar bears share similar mitochondrial DNA (0.023 base substitutions per nucleotide) which is quite divergent (0.078 base substitutions per nucleotide) from that of black bears. High mitochondrial DNA divergence within black bears and paraphyletic relationships of brown and polar bear mitochondrial DNA indicate that intraspecific variation across species' ranges should be considered in phylogenetic analyses of mitochondrial DNA.

  14. Hematopoietic cytokines: similarities and differences in the structures, with implications for receptor binding.

    PubMed Central

    Wlodawer, A.; Pavlovsky, A.; Gustchina, A.

    1993-01-01

    Crystal and NMR structures of helical cytokines--interleukin-4 (IL-4), granulocyte-macrophage colony-stimulating factor (GM-CSF), and interleukin-2 (IL-2)--have been compared. Root mean square deviations in the Cα coordinates for the conserved regions of the helices were 1-2 Å between different cytokines, about twice the differences observed for independently determined crystal and solution structures of IL-4. Considerable similarity in amino acid sequence in the areas expected to interact with the receptors was detected, and the available mutagenesis data for these cytokines were correlated with structure conservation. Models of cytokine-receptor interactions were postulated for IL-4 based on its structure as well as on the published structure of human growth hormone interacting with its receptors (de Vos, A.M., Ultsch, M., & Kossiakoff, A.A., 1992, Science 255, 306-312). Patches of positively charged residues on the surfaces of helices C and D of IL-4 may be responsible for the interactions with the negatively charged residues found in the complementary parts of the IL-4 receptors. PMID:8401223

  15. Lactobacillus rodentium sp. nov., from the digestive tract of wild rodents.

    PubMed

    Killer, J; Havlík, J; Vlková, E; Rada, V; Pechar, R; Benada, O; Kopečný, J; Kofroňová, O; Sechovcová, H

    2014-05-01

    Three strains of regular, long, Gram-stain-positive bacterial rods were isolated using TPY, M.R.S. and Rogosa agar under anaerobic conditions from the digestive tract of wild mice (Mus musculus). All 16S rRNA gene sequences of these isolates were most similar to sequences of Lactobacillus gasseri ATCC 33323T and Lactobacillus johnsonii ATCC 33200T (97.3% and 97.2% sequence similarities, respectively). The novel strains shared 99.2-99.6% 16S rRNA gene sequence similarities. Type strains of L. gasseri and L. johnsonii were also most related to the newly isolated strains according to rpoA (83.9-84.0% similarities), pheS (84.6-87.8%), atpA (86.2-87.7%), hsp60 (89.4-90.4%) and tuf (92.7-93.6%) gene sequence similarities. Phylogenetic studies based on 16S rRNA, hsp60, rpoA, atpA and pheS gene sequences, other genotypic and many phenotypic characteristics (results of API 50 CHL, Rapid ID 32A and API ZYM biochemical tests; cellular fatty acid profiles; cellular polar lipid profiles; end products of glucose fermentation) showed that these bacterial strains represent a novel species within the genus Lactobacillus. The name Lactobacillus rodentium sp. nov. is proposed to accommodate this group of new isolates. The type strain is MYMRS/TLU1T (=DSM 24759T=CCM 7945T).

  16. Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner.

    PubMed

    Lu, David V; Brown, Randall H; Arumugam, Manimozhiyan; Brent, Michael R

    2009-07-01

    The most accurate way to determine the intron-exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines. The scoring system used to choose the best alignment is a primary determinant of alignment accuracy, while heuristics that prevent consideration of certain alignments are a primary determinant of runtime and memory usage. Both accuracy and speed are important considerations in choosing an alignment algorithm, but scoring systems have received much less attention than heuristics. We present Pairagon, a pair hidden Markov model based cDNA-to-genome alignment program, as the most accurate aligner for sequences with high- and low-identity levels. We conducted a series of experiments testing alignment accuracy with varying sequence identity. We first created 'perfect' simulated cDNA sequences by splicing the sequences of exons in the reference genome sequences of fly and human. The complete reference genome sequences were then mutated to various degrees using a realistic mutation simulator and the perfect cDNAs were aligned to them using Pairagon and 12 other aligners. To validate these results with natural sequences, we performed cross-species alignment using orthologous transcripts from human, mouse and rat. We found that aligner accuracy is heavily dependent on sequence identity. For sequences with 100% identity, Pairagon achieved accuracy levels of >99.6%, with one quarter of the errors of any other aligner. Furthermore, for human/mouse alignments, which are only 85% identical, Pairagon achieved 87% accuracy, higher than any other aligner. Pairagon source and executables are freely available at http://mblab.wustl.edu/software/pairagon/

  17. Francisella guangzhouensis sp. nov., isolated from air-conditioning systems.

    PubMed

    Qu, Ping-Hua; Chen, Shou-Yi; Scholz, Holger C; Busse, Hans-Jürgen; Gu, Quan; Kämpfer, Peter; Foster, Jeffrey T; Glaeser, Stefanie P; Chen, Cha; Yang, Zhi-Chong

    2013-10-01

    Four strains (08HL01032(T), 09HG994, 10HP82-6 and 10HL1960) were isolated from water of air-conditioning systems of various cooling towers in Guangzhou city, China. Cells were Gram-stain-negative coccobacilli without flagella, catalase-positive and oxidase-negative, showing no reduction of nitrate, no hydrolysis of urea and no production of H2S. Growth was characteristically enhanced in the presence of l-cysteine, which was consistent with the properties of members of the genus Francisella. The quinone system was composed of ubiquinone Q-8 with minor amounts of Q-9. The polar lipid profile consisted of the predominant lipids phosphatidylethanolamine, diphosphatidylglycerol, phosphatidylglycerol, phosphatidylcholine, two unidentified phospholipids (PL2, PL3), an unidentified aminophospholipid and an unidentified glycolipid (GL2). The polyamine pattern consisted of the major compounds spermidine, cadaverine and spermine. The major cellular fatty acids were C10 : 0, C14 : 0, C16 : 0, C18 : 1ω9c and C18 : 1 3-OH. A draft whole-genome sequence of the proposed type strain 08HL01032(T) was generated. Comparative sequence analysis of the complete 16S and 23S rRNA genes confirmed affiliation to the genus Francisella, with 95 % sequence identity to the closest relatives in the database, the type strains of Francisella philomiragia and Francisella noatunensis subsp. orientalis. Full-length deduced amino acid sequences of various housekeeping genes, recA, gyrB, groEL, dnaK, rpoA, rpoB, rpoD, rpoH, fopA and sdhA, exhibited similarities of 67-92 % to strains of other species of the genus Francisella. Strains 08HL01032(T), 09HG994, 10HP82-6 and 10HL1960 exhibited highly similar pan-genome PCR profiles. Both the phenotypic and molecular data support the conclusion that the four strains belong to the genus Francisella but exhibit considerable divergence from all recognized Francisella species. Therefore, we propose the name Francisella guangzhouensis sp. nov., with the type strain 08HL01032(T) ( = CCUG 60119(T) = NCTC 13503(T)).

  18. Adhesive Proteins of Stalked and Acorn Barnacles Display Homology with Low Sequence Similarities

    PubMed Central

    Jonker, Jaimie-Leigh; Abram, Florence; Pires, Elisabete; Varela Coelho, Ana; Grunwald, Ingo; Power, Anne Marie

    2014-01-01

    Barnacle adhesion underwater is an important phenomenon to understand for the prevention of biofouling and potential biotechnological innovations, yet so far, identifying what makes barnacle glue proteins ‘sticky’ has proved elusive. Examination of a broad range of species within the barnacles may be instructive to identify conserved adhesive domains. We add to extensive information from the acorn barnacles (order Sessilia) by providing the first protein analysis of a stalked barnacle adhesive, Lepas anatifera (order Lepadiformes). It was possible to separate the L. anatifera adhesive into at least 10 protein bands using SDS-PAGE. Intense bands were present at approximately 30, 70, 90 and 110 kilodaltons (kDa). Mass spectrometry for protein identification was followed by de novo sequencing which detected 52 peptides of 7–16 amino acids in length. None of the peptides matched published or unpublished transcriptome sequences, but some amino acid sequence similarity was apparent between L. anatifera and closely-related Dosima fascicularis. Antibodies against two acorn barnacle proteins (ab-cp-52k and ab-cp-68k) showed cross-reactivity in the adhesive glands of L. anatifera. We also analysed the similarity of adhesive proteins across several barnacle taxa, including Pollicipes pollicipes (a stalked barnacle in the order Scalpelliformes). Sequence alignment of published expressed sequence tags clearly indicated that P. pollicipes possesses homologues for the 19 kDa and 100 kDa proteins in acorn barnacles. Homology aside, sequence similarity in amino acid and gene sequences tended to decline as taxonomic distance increased, with minimum similarities of 18–26%, depending on the gene. The results indicate that some adhesive proteins (e.g. 100 kDa) are more conserved within barnacles than others (20 kDa). PMID:25295513

  19. Adhesive proteins of stalked and acorn barnacles display homology with low sequence similarities.

    PubMed

    Jonker, Jaimie-Leigh; Abram, Florence; Pires, Elisabete; Varela Coelho, Ana; Grunwald, Ingo; Power, Anne Marie

    2014-01-01

    Barnacle adhesion underwater is an important phenomenon to understand for the prevention of biofouling and potential biotechnological innovations, yet so far, identifying what makes barnacle glue proteins 'sticky' has proved elusive. Examination of a broad range of species within the barnacles may be instructive to identify conserved adhesive domains. We add to extensive information from the acorn barnacles (order Sessilia) by providing the first protein analysis of a stalked barnacle adhesive, Lepas anatifera (order Lepadiformes). It was possible to separate the L. anatifera adhesive into at least 10 protein bands using SDS-PAGE. Intense bands were present at approximately 30, 70, 90 and 110 kilodaltons (kDa). Mass spectrometry for protein identification was followed by de novo sequencing which detected 52 peptides of 7-16 amino acids in length. None of the peptides matched published or unpublished transcriptome sequences, but some amino acid sequence similarity was apparent between L. anatifera and closely-related Dosima fascicularis. Antibodies against two acorn barnacle proteins (ab-cp-52k and ab-cp-68k) showed cross-reactivity in the adhesive glands of L. anatifera. We also analysed the similarity of adhesive proteins across several barnacle taxa, including Pollicipes pollicipes (a stalked barnacle in the order Scalpelliformes). Sequence alignment of published expressed sequence tags clearly indicated that P. pollicipes possesses homologues for the 19 kDa and 100 kDa proteins in acorn barnacles. Homology aside, sequence similarity in amino acid and gene sequences tended to decline as taxonomic distance increased, with minimum similarities of 18-26%, depending on the gene. The results indicate that some adhesive proteins (e.g. 100 kDa) are more conserved within barnacles than others (20 kDa).

  20. Genome Sequences of Mycobacteriophages Amgine, Amohnition, Bella96, Cain, DarthP, Hammy, Krueger, LastHope, Peanam, PhelpsODU, Phrank, SirPhilip, Slimphazie, and Unicorn.

    PubMed

    Anders, Kirk R; Barekzi, Nazir; Best, Aaron A; Frederick, Gregory D; Mavrodi, Dmitri V; Vazquez, Edwin; Amoh, Nana Yaa A; Baliraine, Frederick N; Buchser, William J; Cast, Thomas P; Chamberlain, Carmen E; Chung, Hui-Min; D'Angelo, William A; Farris, Christian T; Fernandez-Martinez, Mariceli; Fischman, Haley D; Forsyth, Mark H; Fortier, Anna G; Gallo, Kara F; Held, Greta J; Lomas, Miguel A; Maldonado-Vazquez, Natalia Y; Moonsammy, Claudia H; Namboote, Peace; Paudel, Sudip; Polley, Sarah-Elizabeth M; Reyes, Gabriella M; Rubin, Michael R; Saha, Margaret S; Stukey, Joseph; Tobias, Tristan D; Garlena, Rebecca A; Stoner, Ty H; Cresawn, Steven G; Jacobs-Sera, Deborah; Pope, Welkin H; Russell, Daniel A; Hatfull, Graham F

    2017-12-07

    We report the genome sequences of 14 cluster K mycobacteriophages isolated using Mycobacterium smegmatis mc²155 as host. Four are closely related to subcluster K1 phages, and 10 are members of subcluster K6. The phage genomes span considerable sequence diversity, including multiple types of integrases and integration sites. Copyright © 2017 Anders et al.

  1. Tracking the origin of simultaneous endometrial and ovarian cancer by next-generation sequencing - a case report.

    PubMed

    Valtcheva, Nadejda; Lang, Franziska M; Noske, Aurelia; Samartzis, Eleftherios P; Schmidt, Anna-Maria; Bellini, Elisa; Fink, Daniel; Moch, Holger; Rechsteiner, Markus; Dedes, Konstantin J; Wild, Peter J

    2017-01-19

    Endometrioid adenocarcinoma of the uterus and ovarian endometrioid carcinoma share many morphological and molecular features. Differentiation between simultaneous primary carcinomas and ovarian metastases of an endometrial cancer may be very challenging but is essential for prognostic and therapeutic considerations. In the present case study of a 33-year-old patient we used targeted amplicon next-generation re-sequencing for clarifying the origin of synchronous endometrioid cancer of the corpus uteri and the left ovary. The patient developed a metachronous lung metastasis of an endometrioid adenocarcinoma four years after hyster- and adnexectomy, vaginal brachytherapy and treatment with the synthetic steroid tibolone. Removal of the metastasis and megestrol treatment for seven years led to a complete remission. A total of 409 genes from the Ampliseq Comprehensive Cancer Panel (Ion Torrent, Thermo Fisher) were analysed by next generation sequencing and mutations in 10 genes, including ARID1A, CTNNB1, PIK3CA and PTEN were identified and confirmed by Sanger sequencing. Primary endometrial as well as ovarian cancer showed an identical mutational profile, suggesting the presence of an ovarian metastasis of the endometrial cancer, rather than a simultaneous endometrial and ovarian cancer. The metachronous lung metastasis showed a different mutational profile compared to the primary cancer. Immunohistochemical staining of the corresponding proteins suggested that the tumour development was driven by alterations in the protein function rather than by changes of the protein abundance in the cell. Our results have demonstrated next generation sequencing as a valuable tool in the differentiation of synchronous primary tumours and metastases, which has an important impact on the clinical decision making process. Similar to breast cancer, targeted therapies based on mutational tumour profiling will become increasingly important in endometrial and ovarian cancer. In summary, our results support the usage of next generation sequencing as a supplementary diagnostic tool, assisting in personalized precision medicine.

  2. Evolutionary advantage via common action of recombination and neutrality

    NASA Astrophysics Data System (ADS)

    Saakian, David B.; Hu, Chin-Kun

    2013-11-01

    We investigate evolution models with recombination and neutrality. We consider the Crow-Kimura (parallel) mutation-selection model with the neutral fitness landscape, in which there is a central peak with high fitness A, and some of 1-point mutants have the same high fitness A, while the fitness of other sequences is 0. We find that the effect of recombination and neutrality depends on the concrete version of both neutrality and recombination. We consider three versions of neutrality: (a) all the nearest neighbor sequences of the peak sequence have the same high fitness A; (b) all the l-point mutations in a piece of genome of length l≥1 are neutral; (c) the neutral sequences are randomly distributed among the nearest neighbors of the peak sequences. We also consider three versions of recombination: (I) the simple horizontal gene transfer (HGT) of one nucleotide; (II) the exchange of a piece of genome of length l, HGT-l; (III) two-point crossover recombination (2CR). For the case of (a), the 2CR gives a rather strong contribution to the mean fitness, much stronger than that of HGT for a large genome length L. For the random distribution of neutral sequences there is a critical degree of neutrality νc: for ν<νc with (νc-ν) not large, the 2CR suppresses the mean fitness while HGT increases it; for ν much larger than νc, the 2CR and HGT-l increase the mean fitness more than the HGT does. We also consider the recombination in the case of smooth fitness landscapes. The recombination gives some advantage in the evolutionary dynamics, where recombination distinguishes clearly the mean-field-like evolutionary factors from the fluctuation-like ones. By contrast, mutations affect the mean-field-like and fluctuation-like factors similarly. Consequently, recombination can accelerate the non-mean-field (fluctuation) type dynamics without considerably affecting the mean-field-like factors.

  3. Effective Identification of Similar Patients Through Sequential Matching over ICD Code Embedding.

    PubMed

    Nguyen, Dang; Luo, Wei; Venkatesh, Svetha; Phung, Dinh

    2018-04-11

    Evidence-based medicine often involves the identification of patients with similar conditions, which are often captured in ICD (International Classification of Diseases (World Health Organization 2013)) code sequences. With no satisfying prior solutions for matching ICD-10 code sequences, this paper presents a method which effectively captures the clinical similarity among routine patients who have multiple comorbidities and complex care needs. Our method leverages the recent progress in representation learning of individual ICD-10 codes, and it explicitly uses the sequential order of codes for matching. Empirical evaluation on a state-wide cancer data collection shows that our proposed method achieves significantly higher matching performance compared with state-of-the-art methods ignoring the sequential order. Our method better identifies similar patients in a number of clinical outcomes including readmission and mortality outlook. Although this paper focuses on ICD-10 diagnosis code sequences, our method can be adapted to work with other codified sequence data.

  4. A prototype operational earthquake loss model for California based on UCERF3-ETAS – A first look at valuation

    USGS Publications Warehouse

    Field, Edward; Porter, Keith; Milner, Kevin

    2017-01-01

    We present a prototype operational loss model based on UCERF3-ETAS, which is the third Uniform California Earthquake Rupture Forecast with an Epidemic Type Aftershock Sequence (ETAS) component. As such, UCERF3-ETAS represents the first earthquake forecast to relax fault segmentation assumptions and to include multi-fault ruptures, elastic-rebound, and spatiotemporal clustering, all of which seem important for generating realistic and useful aftershock statistics. UCERF3-ETAS is nevertheless an approximation of the system, so usefulness will vary and potential value needs to be ascertained in the context of each application. We examine this question with respect to statewide loss estimates, exemplifying how risk can be elevated by orders of magnitude due to triggered events following various scenario earthquakes. Two important considerations are the probability gains, relative to loss likelihoods in the absence of main shocks, and the rapid decay of gains with time. Significant uncertainties and model limitations remain, so we hope this paper will inspire similar analyses with respect to other risk metrics to help ascertain whether operationalization of UCERF3-ETAS would be worth the considerable resources required.

  5. Unraveling systematic inventory of Echinops (Asteraceae) with special reference to nrDNA ITS sequence-based molecular typing of Echinops abuzinadianus.

    PubMed

    Ali, M A; Al-Hemaid, F M; Lee, J; Hatamleh, A A; Gyulai, G; Rahman, M O

    2015-10-02

    The present study explored the systematic inventory of Echinops L. (Asteraceae) of Saudi Arabia, with special reference to the molecular typing of Echinops abuzinadianus Chaudhary, an endemic species to Saudi Arabia, based on the internal transcribed spacer (ITS) sequences (ITS1-5.8S-ITS2) of nuclear ribosomal DNA. A sequence similarity search using BLAST and a phylogenetic analysis of the ITS sequence of E. abuzinadianus revealed a high level of sequence similarity with E. glaberrimus DC. (section Ritropsis). The novel primary sequence and the secondary structure of ITS2 of E. abuzinadianus could potentially be used for molecular genotyping.

  6. Nucleotide sequence of the Saccharomyces cerevisiae PUT4 proline-permease-encoding gene: similarities between CAN1, HIP1 and PUT4 permeases.

    PubMed

    Vandenbol, M; Jauniaux, J C; Grenson, M

    1989-11-15

    The complete nucleotide (nt) sequence of the PUT4 gene, whose product is required for high-affinity proline active transport in the yeast Saccharomyces cerevisiae, is presented. The sequence contains a single long open reading frame of 1881 nt, encoding a polypeptide with a calculated Mr of 68,795. The predicted protein is strongly hydrophobic and exhibits six potential glycosylation sites. Its hydropathy profile suggests the presence of twelve membrane-spanning regions flanked by hydrophilic N- and C-terminal domains. The N terminus does not resemble signal sequences found in secreted proteins. These features are characteristic of integral membrane proteins catalyzing translocation of ligands across cellular membranes. Protein sequence comparisons indicate strong resemblance to the arginine and histidine permeases of S. cerevisiae, but no marked sequence similarity to the proline permease of Escherichia coli or to other known prokaryotic or eukaryotic transport proteins. The strong similarity between the three yeast amino acid permeases suggests a common ancestor for the three proteins.

  7. String Mining in Bioinformatics

    NASA Astrophysics Data System (ADS)

    Abouelhoda, Mohamed; Ghanem, Moustafa

    Sequence analysis is a major area in bioinformatics encompassing the methods and techniques for studying the biological sequences, DNA, RNA, and proteins, on the linear structure level. The focus of this area is generally on the identification of intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting common segments among two or multiple sequences. From a data mining point of view, sequence analysis is nothing but string- or pattern mining specific to biological strings. For a long time, however, this point of view has been explicitly embraced neither in the data mining nor in the sequence analysis textbooks, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the word "data-mining" is almost missing in the sequence analysis literature, its basic concepts have been implicitly applied. Interestingly, recent research in biological sequence analysis introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages [20], fighting spam mails [50], detecting plagiarism [22], and spotting duplications in software systems [14].
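
    As a rough illustration of what "detecting repeated segments within a given sequence" can mean in practice, the sketch below indexes every fixed-length segment of a string and reports those that occur more than once. The function name `repeated_segments` and the fixed segment length `k` are assumptions made for this example; the chapter itself is not the source of this code, and practical tools typically use suffix trees or suffix arrays rather than a dictionary of substrings.

```python
from collections import defaultdict


def repeated_segments(seq: str, k: int) -> dict[str, list[int]]:
    """Map each length-k segment occurring more than once to its start positions."""
    if k < 1:
        raise ValueError("k must be at least 1")

    positions: dict[str, list[int]] = defaultdict(list)
    # Slide a window of width k over the sequence and record where each
    # segment starts (0-based).
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    # Keep only segments seen at two or more positions.
    return {seg: pos for seg, pos in positions.items() if len(pos) > 1}


if __name__ == "__main__":
    print(repeated_segments("ACGTACGA", 3))
```

    Running the same index over the concatenation of two sequences, while tracking which sequence each position came from, would give a comparable sketch for common segments between sequences.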

  8. String Mining in Bioinformatics

    NASA Astrophysics Data System (ADS)

    Abouelhoda, Mohamed; Ghanem, Moustafa

    Sequence analysis is a major area in bioinformatics encompassing the methods and techniques for studying the biological sequences, DNA, RNA, and proteins, on the linear structure level. The focus of this area is generally on the identification of intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting common segments among two or multiple sequences. From a data mining point of view, sequence analysis is nothing but string- or pattern mining specific to biological strings. For a long time, however, this point of view has been explicitly embraced neither in the data mining nor in the sequence analysis textbooks, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the word “data-mining” is almost missing in the sequence analysis literature, its basic concepts have been implicitly applied. Interestingly, recent research in biological sequence analysis introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages [20], fighting spam mails [50], detecting plagiarism [22], and spotting duplications in software systems [14].

  9. Markov model plus k-word distributions: a synergy that produces novel statistical measures for sequence comparison.

    PubMed

    Dai, Qi; Yang, Yanchun; Wang, Tianming

    2008-10-15

    Many proposed statistical measures can efficiently compare biological sequences to further infer their structures, functions and evolutionary information. They are related in spirit because all the ideas for sequence comparison try to use the information on the k-word distributions, Markov model or both. Motivated by adding k-word distributions to Markov model directly, we investigated two novel statistical measures for sequence comparison, called wre.k.r and S2.k.r. The proposed measures were tested by similarity search, evaluation on functionally related regulatory sequences and phylogenetic analysis. This offers a systematic and quantitative experimental assessment of our measures. Moreover, we compared our results with those of alignment-based and alignment-free methods. We grouped our experiments into two sets. The first one, performed via ROC (receiver operating characteristic) analysis, aims at assessing the intrinsic ability of our statistical measures to search for similar sequences from a database and discriminate functionally related regulatory sequences from unrelated sequences. The second one aims at assessing how well our statistical measures perform in phylogenetic analysis. The experimental assessment demonstrates that our similarity measures, which incorporate k-word distributions into the Markov model, are more efficient.
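
    For readers unfamiliar with k-word distributions, the following minimal sketch computes the relative frequency of every overlapping k-word in a sequence and compares two sequences by the squared Euclidean distance between those frequency vectors. This is a generic alignment-free comparison chosen for illustration only: it is not the wre.k.r or S2.k.r measure (the abstract does not define them), and the function names are hypothetical.

```python
from collections import Counter


def kword_freqs(seq: str, k: int) -> dict[str, float]:
    """Relative frequencies of all overlapping k-words in seq."""
    n = len(seq) - k + 1
    if k < 1 or n < 1:
        raise ValueError("need 1 <= k <= len(seq)")
    counts = Counter(seq[i:i + k] for i in range(n))
    return {word: c / n for word, c in counts.items()}


def kword_distance(a: str, b: str, k: int) -> float:
    """Squared Euclidean distance between the k-word frequency vectors of a and b."""
    fa, fb = kword_freqs(a, k), kword_freqs(b, k)
    # Words absent from one sequence contribute frequency 0 for it.
    return sum((fa.get(w, 0.0) - fb.get(w, 0.0)) ** 2 for w in set(fa) | set(fb))


if __name__ == "__main__":
    print(kword_distance("ACGTACGT", "ACGTTGCA", 2))
```

    A distance of zero means the two sequences have identical k-word frequency profiles, which does not by itself imply that the sequences are identical.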

  10. What controls the local time extent of flux transfer events?

    NASA Astrophysics Data System (ADS)

    Milan, S. E.; Imber, S. M.; Carter, J. A.; Walach, M.-T.; Hubert, B.

    2016-02-01

    Flux transfer events (FTEs) are the manifestation of bursty and/or patchy magnetic reconnection at the magnetopause. We compare two sequences of the ionospheric signatures of flux transfer events observed in global auroral imagery and coherent ionospheric radar measurements. Both sequences were observed during very similar seasonal and interplanetary magnetic field (IMF) conditions, though with differing solar wind speed. A key observation is that the signatures differed considerably in their local time extent. The two periods are 26 August 1998, when the IMF had components BZ≈-10 nT and BY≈9 nT and the solar wind speed was VX≈650 km s^-1, and 31 August 2005, IMF BZ≈-7 nT, BY≈17 nT, and VX≈380 km s^-1. In the first case, the reconnection rate was estimated to be near 160 kV, and the FTE signatures extended across at least 7 h of magnetic local time (MLT) of the dayside polar cap boundary. In the second, a reconnection rate close to 80 kV was estimated, and the FTEs had an MLT extent of roughly 2 h. We discuss the ramifications of these differences for solar wind-magnetosphere coupling.

  11. Improved localisation for 2-hydroxyglutarate detection at 3T using long-TE semi-LASER

    PubMed Central

    Berrington, Adam; Voets, Natalie L.; Plaha, Puneet; Larkin, Sarah J.; Mccullagh, James; Stacey, Richard; Yildirim, Muhammed; Schofield, Christopher J.; Jezzard, Peter; Cadoux-Hudson, Tom; Ansorge, Olaf; Emir, Uzay E.

    2016-01-01

    2-hydroxyglutarate (2-HG) has emerged as a biomarker of tumour cell IDH mutations that may enable the differential diagnosis of glioma patients. At 3 Tesla, detection of 2-HG with magnetic resonance spectroscopy is challenging because of metabolite signal overlap and a spectral pattern modulated by slice selection and chemical shift displacement. Using density matrix simulations and phantom experiments, an optimised semi-LASER scheme (TE = 110 ms) improves localisation of the 2-HG spin system considerably compared to an existing PRESS sequence. This results in a visible 2-HG peak in the in vivo spectra at 1.9 ppm in the majority of IDH mutated tumours. Detected concentrations of 2-HG were similar using both sequences, although the use of semi-LASER generated narrower confidence intervals. Signal overlap with glutamate and glutamine, as measured by pairwise fitting correlation was reduced. Lactate was readily detectable across glioma patients using the method presented here (mean CLRB: (10±2)%). Together with more robust 2-HG detection, long TE semi-LASER offers the potential to investigate tumour metabolism and stratify patients in vivo at 3T. PMID:27547821

  12. Structure of the N-terminal domain of human thioredoxin-interacting protein.

    PubMed

    Polekhina, Galina; Ascher, David Benjamin; Kok, Shie Foong; Beckham, Simone; Wilce, Matthew; Waltham, Mark

    2013-03-01

    Thioredoxin-interacting protein (TXNIP) is one of the six known α-arrestins and has recently received considerable attention owing to its involvement in redox signalling and metabolism. Various stress stimuli such as high glucose, heat shock, UV, H2O2 and mechanical stress among others robustly induce the expression of TXNIP, resulting in the sequestration and inactivation of thioredoxin, which in turn leads to cellular oxidative stress. While TXNIP is the only α-arrestin known to bind thioredoxin, TXNIP and two other α-arrestins, Arrdc4 and Arrdc3, have been implicated in metabolism. Furthermore, owing to its roles in the pathologies of diabetes and cardiovascular disease, TXNIP is considered to be a promising drug target. Based on their amino-acid sequences, TXNIP and the other α-arrestins are remotely related to β-arrestins. Here, the crystal structure of the N-terminal domain of TXNIP is reported. It provides the first structural information on any of the α-arrestins and reveals that although TXNIP adopts a β-arrestin fold as predicted, it is structurally more similar to Vps26 proteins than to β-arrestins, while sharing below 15% pairwise sequence identity with either.

  13. A Phytase Characterized by Relatively High pH Tolerance and Thermostability from the Shiitake Mushroom Lentinus edodes

    PubMed Central

    Zhang, Guo-Qing; Wu, Ying-Ying; Ng, Tzi-Bun; Chen, Qing-Jun; Wang, He-Xiang

    2013-01-01

    A monomeric phytase with a molecular mass of 14 kDa was acquired from fresh fruiting bodies of the shiitake mushroom Lentinus edodes. The isolation procedure involved chromatography on DEAE-cellulose, CM-cellulose, Q-Sepharose, Affi-gel blue gel, and a final fast protein liquid chromatography-gel filtration on Superdex 75. The purified phytase demonstrated the unique N-terminal amino acid sequence DPKRTDQVN, which exhibited no sequence similarity with those of other phytases previously reported. It expressed its maximal activity at pH 5.0 and 37°C. Phytase activity manifested less than 20% change in activity over the pH range of 3.0–9.0, considerable thermostability with more than 60% residual activity at 70°C, and about 40% residual activity at 95°C. It displayed a wide substrate specificity on a variety of phosphorylated compounds with the following ranking: ATP > fructose-6-phosphate > AMP > glucose-6-phosphate > ADP > sodium phytate > β-glycerophosphate. The phytase activity was moderately stimulated by Ca2+, but inhibited by Al3+, Mn2+, Zn2+, and Cu2+ at a tested concentration of 5 mM. PMID:23586045

  14. Future technologies for monitoring HIV drug resistance and cure.

    PubMed

    Parikh, Urvi M; McCormick, Kevin; van Zyl, Gert; Mellors, John W

    2017-03-01

    Sensitive, scalable and affordable assays are critically needed for monitoring the success of interventions for preventing, treating and attempting to cure HIV infection. This review evaluates current and emerging technologies that are applicable for both surveillance of HIV drug resistance (HIVDR) and characterization of HIV reservoirs that persist despite antiretroviral therapy and are obstacles to curing HIV infection. Next-generation sequencing (NGS) has the potential to be adapted into high-throughput, cost-efficient approaches for HIVDR surveillance and monitoring during continued scale-up of antiretroviral therapy and rollout of preexposure prophylaxis. Similarly, improvements in PCR and NGS are resulting in higher throughput single genome sequencing to detect intact proviruses and to characterize HIV integration sites and clonal expansions of infected cells. Current population genotyping methods for resistance monitoring are high cost and low throughput. NGS, combined with simpler sample collection and storage matrices (e.g. dried blood spots), has considerable potential to broaden global surveillance and patient monitoring for HIVDR. Recent adaptions of NGS to identify integration sites of HIV in the human genome and to characterize the integrated HIV proviruses are likely to facilitate investigations of the impact of experimental 'curative' interventions on HIV reservoirs.

  15. Comprehensive molecular characterization of human colon and rectal cancer.

    PubMed

    2012-07-18

    To characterize somatic alterations in colorectal carcinoma, we conducted a genome-scale analysis of 276 samples, analysing exome sequence, DNA copy number, promoter methylation and messenger RNA and microRNA expression. A subset of these samples (97) underwent low-depth-of-coverage whole-genome sequencing. In total, 16% of colorectal carcinomas were found to be hypermutated: three-quarters of these had the expected high microsatellite instability, usually with hypermethylation and MLH1 silencing, and one-quarter had somatic mismatch-repair gene and polymerase ε (POLE) mutations. Excluding the hypermutated cancers, colon and rectum cancers were found to have considerably similar patterns of genomic alteration. Twenty-four genes were significantly mutated, and in addition to the expected APC, TP53, SMAD4, PIK3CA and KRAS mutations, we found frequent mutations in ARID1A, SOX9 and FAM123B. Recurrent copy-number alterations include potentially drug-targetable amplifications of ERBB2 and newly discovered amplification of IGF2. Recurrent chromosomal translocations include the fusion of NAV2 and WNT pathway member TCF7L1. Integrative analyses suggest new markers for aggressive colorectal carcinoma and an important role for MYC-directed transcriptional activation and repression.

  16. J0811+4730: the most metal-poor star-forming dwarf galaxy known

    NASA Astrophysics Data System (ADS)

    Izotov, Y. I.; Thuan, T. X.; Guseva, N. G.; Liss, S. E.

    2018-01-01

    We report the discovery of the most metal-poor dwarf star-forming galaxy (SFG) known to date, J0811+4730. This galaxy, at a redshift z = 0.04444, has a Sloan Digital Sky Survey (SDSS) g-band absolute magnitude Mg = -15.41 mag. It was selected by inspecting the spectroscopic data base in the Data Release 13 (DR13) of the SDSS. Large Binocular Telescope/Multi-Object Double spectrograph (LBT/MODS) spectroscopic observations reveal its oxygen abundance to be 12 + log O/H = 6.98 ± 0.02, the lowest ever observed for an SFG. J0811+4730 strongly deviates from the main sequence defined by SFGs in the emission line diagnostic diagrams and the metallicity-luminosity diagram. These differences are caused mainly by the extremely low oxygen abundance in J0811+4730, which is ∼10 times lower than that in main-sequence SFGs with similar luminosities. By fitting the spectral energy distributions of the SDSS and LBT spectra, we derive a stellar mass of M⋆ = 10^6.24-10^6.29 M⊙, and we find that a considerable fraction of the galaxy stellar mass was formed during the most recent burst of star formation.

  17. Current overview of allergens of plant pathogenesis related protein families.

    PubMed

    Sinha, Mau; Singh, Rashmi Prabha; Kushwaha, Gajraj Singh; Iqbal, Naseer; Singh, Avinash; Kaushik, Sanket; Kaur, Punit; Sharma, Sujata; Singh, Tej P

    2014-01-01

    Pathogenesis related (PR) proteins are one of the major sources of plant derived allergens. These proteins are induced by the plants as a defense response system in stress conditions like microbial and insect infections, wounding, exposure to harsh chemicals, and atmospheric conditions. However, some plant tissues that are more exposed to environmental conditions like UV irradiation and insect or fungal attacks express these proteins constitutively. These proteins are mostly resistant to proteases and most of them show considerable stability at low pH. Many of these plant pathogenesis related proteins are found to act as food allergens, latex allergens, and pollen allergens. Proteins having similar amino acid sequences among the members of PR proteins may be responsible for cross-reactivity among allergens from diverse plants. This review analyzes the different pathogenesis related protein families that have been reported as allergens. Proteins of these families have been characterized in regard to their biological functions, amino acid sequence, and cross-reactivity. The three-dimensional structures of some of these allergens have also been evaluated to elucidate the antigenic determinants of these molecules and to explain the cross-reactivity among the various allergens.

  18. Comparison of topological clustering within protein networks using edge metrics that evaluate full sequence, full structure, and active site microenvironment similarity.

    PubMed

    Leuthaeuser, Janelle B; Knutson, Stacy T; Kumar, Kiran; Babbitt, Patricia C; Fetrow, Jacquelyn S

    2015-09-01

    The development of accurate protein function annotation methods has emerged as a major unsolved biological problem. Protein similarity networks, one approach to function annotation via annotation transfer, group proteins into similarity-based clusters. An underlying assumption is that the edge metric used to identify such clusters correlates with functional information. In this contribution, this assumption is evaluated by observing topologies in similarity networks using three different edge metrics: sequence (BLAST), structure (TM-Align), and active site similarity (active site profiling, implemented in DASP). Network topologies for four well-studied protein superfamilies (enolase, peroxiredoxin (Prx), glutathione transferase (GST), and crotonase) were compared with curated functional hierarchies and structure. As expected, network topology differs, depending on edge metric; comparison of topologies provides valuable information on structure/function relationships. Subnetworks based on active site similarity correlate with known functional hierarchies at a single edge threshold more often than sequence- or structure-based networks. Sequence- and structure-based networks are useful for identifying sequence and domain similarities and differences; therefore, it is important to consider the clustering goal before deciding appropriate edge metric. Further, conserved active site residues identified in enolase and GST active site subnetworks correspond with published functionally important residues. Extension of this analysis yields predictions of functionally determinant residues for GST subgroups. These results support the hypothesis that active site similarity-based networks reveal clusters that share functional details and lay the foundation for capturing functionally relevant hierarchies using an approach that is both automatable and can deliver greater precision in function annotation than current similarity-based methods. 
© 2015 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.

  19. Comparison of topological clustering within protein networks using edge metrics that evaluate full sequence, full structure, and active site microenvironment similarity

    PubMed Central

    Leuthaeuser, Janelle B; Knutson, Stacy T; Kumar, Kiran; Babbitt, Patricia C; Fetrow, Jacquelyn S

    2015-01-01

    The development of accurate protein function annotation methods has emerged as a major unsolved biological problem. Protein similarity networks, one approach to function annotation via annotation transfer, group proteins into similarity-based clusters. An underlying assumption is that the edge metric used to identify such clusters correlates with functional information. In this contribution, this assumption is evaluated by observing topologies in similarity networks using three different edge metrics: sequence (BLAST), structure (TM-Align), and active site similarity (active site profiling, implemented in DASP). Network topologies for four well-studied protein superfamilies (enolase, peroxiredoxin (Prx), glutathione transferase (GST), and crotonase) were compared with curated functional hierarchies and structure. As expected, network topology differs, depending on edge metric; comparison of topologies provides valuable information on structure/function relationships. Subnetworks based on active site similarity correlate with known functional hierarchies at a single edge threshold more often than sequence- or structure-based networks. Sequence- and structure-based networks are useful for identifying sequence and domain similarities and differences; therefore, it is important to consider the clustering goal before deciding appropriate edge metric. Further, conserved active site residues identified in enolase and GST active site subnetworks correspond with published functionally important residues. Extension of this analysis yields predictions of functionally determinant residues for GST subgroups. These results support the hypothesis that active site similarity-based networks reveal clusters that share functional details and lay the foundation for capturing functionally relevant hierarchies using an approach that is both automatable and can deliver greater precision in function annotation than current similarity-based methods. PMID:26073648
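
    The phrase "clusters at a single edge threshold" can be pictured as taking connected components of the similarity network after discarding weak edges. The sketch below is an assumption-laden illustration, not the authors' pipeline: the protein labels and scores are invented, and scores are taken to be "higher means more similar", so a metric such as a BLAST E-value would need to be transformed first.

```python
def cluster_at_threshold(nodes, scored_edges, threshold):
    """Group nodes into connected components using only edges whose
    similarity score is at least `threshold` (union-find)."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the component root, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b, score in scored_edges:
        if score >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), set()).add(n)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    # Hypothetical proteins and similarity scores, for illustration only.
    nodes = ["p1", "p2", "p3", "p4"]
    edges = [("p1", "p2", 0.9), ("p2", "p3", 0.4), ("p3", "p4", 0.8)]
    print(cluster_at_threshold(nodes, edges, 0.5))
    print(cluster_at_threshold(nodes, edges, 0.3))
```

    Sweeping the threshold and comparing the resulting components with a curated functional hierarchy is one simple way to ask how well a given edge metric tracks function.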

  20. Genome Sequences of Eight Aspergillus flavus spp. and One A. parasiticus sp., Isolated From Peanut Seeds in Georgia

    USDA-ARS's Scientific Manuscript database

    Aspergillus flavus and A. parasiticus fungi, carcinogen-mycotoxins producers, infect peanut seeds, causing considerable impact on both human health and the economy. Here we report 9 genome sequences of Aspergillus spp. isolated from peanut seeds. The information obtained will allow conducting biodiv...

  1. Predicted stem-loop structures and variation in nucleotide sequence of 3' noncoding regions among animal calicivirus genomes.

    PubMed

    Seal, B S; Neill, J D; Ridpath, J F

    1994-07-01

    Caliciviruses are nonenveloped with a polyadenylated genome of approximately 7.6 kb and a single capsid protein. The "RNA Fold" computer program was used to analyze 3'-terminal noncoding sequences of five feline calicivirus (FCV), rabbit hemorrhagic disease virus (RHDV), and two San Miguel sea lion virus (SMSV) isolates. The FCV 3'-terminal sequences are 40-46 nucleotides in length and 72-91% similar. The FCV sequences were predicted to contain two possible duplex structures and one stem-loop structure with free energies of -2.1 to -18.2 kcal/mole. The RHDV genomic 3'-terminal RNA sequences are 54 nucleotides in length and share 49% sequence similarity to homologous regions of the FCV genome. The RHDV sequence was predicted to form two duplex structures in the 3'-terminal noncoding region with a single stem-loop structure, resembling that of FCV. In contrast, the SMSV 1 and 4 genomic 3'-terminal noncoding sequences were 185 and 182 nucleotides in length, respectively. Ten possible duplex structures were predicted with an average structural free energy of -35 kcal/mole. Sequence similarity between the two SMSV isolates was 75%. Furthermore, extensive cloverleaflike structures are predicted in the 3' noncoding region of the SMSV genome, in contrast to the predicted single stem-loop structures of FCV or RHDV.

  2. Algorithm for Video Summarization of Bronchoscopy Procedures

    PubMed Central

    2011-01-01

    Background The duration of bronchoscopy examinations varies considerably depending on the diagnostic and therapeutic procedures used. It can last more than 20 minutes if a complex diagnostic work-up is included. With wide access to videobronchoscopy, the whole procedure can be recorded as a video sequence. Common practice relies on an active attitude of the bronchoscopist who initiates the recording process and usually chooses to archive only selected views and sequences. However, it may be important to record the full bronchoscopy procedure as documentation when liability issues are at stake. Furthermore, an automatic recording of the whole procedure enables the bronchoscopist to focus solely on the performed procedures. Video recordings registered during bronchoscopies include a considerable number of frames of poor quality due to blurry or unfocused images. It seems that such frames are unavoidable due to the relatively tight endobronchial space, rapid movements of the respiratory tract due to breathing or coughing, and secretions which occur commonly in the bronchi, especially in patients suffering from pulmonary disorders. Methods The use of recorded bronchoscopy video sequences for diagnostic, reference and educational purposes could be considerably extended with efficient, flexible summarization algorithms. Thus, the authors developed a prototype system to create shortcuts (called summaries or abstracts) of bronchoscopy video recordings. Such a system, based on models described in previously published papers, employs image analysis methods to exclude frames or sequences of limited diagnostic or education value. Results The algorithm for the selection or exclusion of specific frames or shots from video sequences recorded during bronchoscopy procedures is based on several criteria, including automatic detection of "non-informative", frames showing the branching of the airways and frames including pathological lesions. Conclusions The paper focuses on the challenge of generating summaries of bronchoscopy video recordings. PMID:22185344

  3. SeqAPASS (Sequence Alignment to Predict Across Species Susceptibility) software and documentation

    EPA Science Inventory

    SeqAPASS is a software application that facilitates rapid and streamlined, yet transparent, comparisons of the similarity of toxicologically-significant molecular targets across species. The present application facilitates analysis of primary amino acid sequence similarity (including ...

  4. GWFASTA: server for FASTA search in eukaryotic and microbial genomes.

    PubMed

    Issac, Biju; Raghava, G P S

    2002-09-01

    Similarity searches are a powerful method for solving important biological problems such as database scanning, evolutionary studies, gene prediction, and protein structure prediction. FASTA is a widely used sequence comparison tool for rapid database scanning. Here we describe the GWFASTA server that was developed to assist the FASTA user in similarity searches against partially and/or completely sequenced genomes. GWFASTA consists of more than 60 microbial genomes, eight eukaryote genomes, and proteomes of annotated genomes. In fact, it provides the maximum number of databases for similarity searching from a single platform. GWFASTA allows the submission of more than one sequence as a single query for a FASTA search. It also provides integrated post-processing of FASTA output, including compositional analysis of proteins, multiple sequence alignment, and phylogenetic analysis. Furthermore, it summarizes the search results organism-wise for prokaryotes and chromosome-wise for eukaryotes. Thus, the integration of different tools for sequence analyses makes GWFASTA a powerful tool for biologists.

  5. Multiple alignment-free sequence comparison

    PubMed Central

    Ren, Jie; Song, Kai; Sun, Fengzhu; Deng, Minghua; Reinert, Gesine

    2013-01-01

    Motivation: Recently, a range of new statistics have become available for the alignment-free comparison of two sequences based on k-tuple word content. Here, we extend these statistics to the simultaneous comparison of more than two sequences. Our suite of statistics contains, first, and , extensions of statistics for pairwise comparison of the joint k-tuple content of all the sequences, and second, , and , averages of sums of pairwise comparison statistics. The two tasks we consider are, first, to identify sequences that are similar to a set of target sequences, and, second, to measure the similarity within a set of sequences. Results: Our investigation uses both simulated data as well as cis-regulatory module data where the task is to identify cis-regulatory modules with similar transcription factor binding sites. We find that although for real data, all of our statistics show a similar performance, on simulated data the Shepp-type statistics are in some instances outperformed by star-type statistics. The multiple alignment-free statistics are more sensitive to contamination in the data than the pairwise average statistics. Availability: Our implementation of the five statistics is available as an R package named ‘multiAlignFree’ at http://www-rcf.usc.edu/∼fsun/Programs/multiAlignFree/multiAlignFreemain.html. Contact: reinert@stats.ox.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23990418
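
    As background for "comparison based on k-tuple word content", the sketch below computes the classic D2 statistic (the inner product of two k-tuple count vectors) and a plain average of D2 over all pairs in a set. It is an illustration under stated assumptions, not the statistics of this paper, whose symbols are missing from the record above, and it is unrelated to the multiAlignFree R package. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def ktuple_counts(seq, k):
    """Counts of all overlapping k-tuples (words of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Classic D2: sum over shared words of count_in_a * count_in_b."""
    counts_a = ktuple_counts(seq_a, k)
    counts_b = ktuple_counts(seq_b, k)
    return sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())


def average_pairwise_d2(seqs, k):
    """Average of D2 over all unordered pairs in a set of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(d2(a, b, k) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    print(d2("ACGT", "ACGA", 2))
    print(average_pairwise_d2(["ACGT", "ACGA", "TTTT"], 2))
```

    Raw D2 is dominated by background word frequencies, which is why normalized variants that centre the counts are generally preferred in practice.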

  6. Insights into the Emergent Bacterial Pathogen Cronobacter spp., Generated by Multilocus Sequence Typing and Analysis

    PubMed Central

    Joseph, Susan; Forsythe, Stephen J.

    2012-01-01

    Cronobacter spp. (previously known as Enterobacter sakazakii) is a bacterial pathogen affecting all age groups, with particularly severe clinical complications in neonates and infants. One recognized route of infection is the consumption of contaminated infant formula. As a recently recognized bacterial pathogen of considerable importance and regulatory control, appropriate detection and identification schemes are required. The application of multilocus sequence typing (MLST) and analysis (MLSA) of the seven alleles atpD, fusA, glnS, gltB, gyrB, infB, and ppsA (concatenated length 3036 base pairs) has led to considerable advances in our understanding of the genus. This approach is supported by both the reliability of DNA sequencing over subjective phenotyping and the establishment of a MLST database which has open access and is also curated; http://www.pubMLST.org/cronobacter. MLST has been used to describe the diversity of the newly recognized genus, has been instrumental in the formal recognition of new Cronobacter species (C. universalis and C. condimenti), and has revealed the high clonality of strains and the association of clonal complex 4 with neonatal meningitis cases. Clearly the MLST approach has considerable benefits over the use of non-DNA sequence based methods of analysis for newly emergent bacterial pathogens. The application of MLST and MLSA has dramatically enabled us to better understand this opportunistic bacterium which can cause irreparable damage to a newborn baby’s brain, and has contributed to improved control measures to protect neonatal health. PMID:23189075
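
    The bookkeeping behind MLST and MLSA can be sketched in a few lines: an isolate is described by one allele number per locus, the tuple of numbers is looked up to obtain a sequence type, and the allele sequences are concatenated in a fixed locus order for MLSA. Only the seven locus names come from the abstract. The toy sequences, allele numbers, sequence-type table and function names below are invented for illustration and do not reflect the pubMLST database or its interface.

```python
# Locus order as listed in the abstract.
LOCI = ("atpD", "fusA", "glnS", "gltB", "gyrB", "infB", "ppsA")


def concatenate_alleles(allele_seqs):
    """Join the allele sequences of one isolate in the fixed locus order (MLSA)."""
    missing = [locus for locus in LOCI if locus not in allele_seqs]
    if missing:
        raise KeyError(f"missing loci: {missing}")
    return "".join(allele_seqs[locus] for locus in LOCI)


def sequence_type(profile, st_table):
    """Look up the sequence type for an allelic profile; None if unknown."""
    key = tuple(profile[locus] for locus in LOCI)
    return st_table.get(key)


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    toy_seqs = {locus: "ACGT" for locus in LOCI}
    toy_profile = {locus: 1 for locus in LOCI}
    toy_table = {(1, 1, 1, 1, 1, 1, 1): "ST-A"}
    print(len(concatenate_alleles(toy_seqs)))
    print(sequence_type(toy_profile, toy_table))
```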

  7. A search for the binary companion of Polaris

    NASA Technical Reports Server (NTRS)

    Evans, Nancy Remage

    1988-01-01

    Polaris has a spectroscopic orbit determined from an extensive series of observations as well as a more uncertain astrometric orbit. The determination of its mass and evolutionary state is of considerable interest because it is a low-amplitude classical Cepheid with unusual period and amplitude variations. In this study, IUE spectra are investigated to search for light from the companion. The spectra of Polaris from 1600 A to 3200 A are a good match for nonvariable supergiants of similar spectral type. The lack of any excess flux at the shortest wavelengths implies that a main-sequence companion must be later than A8 V. Although this is the most likely companion, the ultraviolet observations cannot rule out a white dwarf 15,000 K or cooler. Both these companions are consistent with either an evolutionary mass or a smaller pulsation mass for the Cepheid.

  8. RPS8—a New Informative DNA Marker for Phylogeny of Babesia and Theileria Parasites in China

    PubMed Central

    Tian, Zhan-Cheng; Liu, Guang-Yuan; Yin, Hong; Luo, Jian-Xun; Guan, Gui-Quan; Luo, Jin; Xie, Jun-Ren; Shen, Hui; Tian, Mei-Yuan; Zheng, Jin-feng; Yuan, Xiao-song; Wang, Fang-fang

    2013-01-01

    Piroplasmosis is a serious debilitating and sometimes fatal disease. Phylogenetic relationships within piroplasmida are complex and remain unclear. We compared the intron–exon structure and DNA sequences of the RPS8 gene from Babesia and Theileria spp. isolates in China. Similar to 18S rDNA, the 40S ribosomal protein S8 gene, RPS8, including both coding and non-coding regions is a useful and novel genetic marker for defining species boundaries and for inferring phylogenies because it tends to have little intra-specific variation but considerable inter-specific difference. However, more samples are needed to verify the usefulness of the RPS8 (coding and non-coding regions) gene as a marker for the phylogenetic position and detection of most Babesia and Theileria species, particularly for some closely related species. PMID:24244571

  9. Proteomic endorsed transcriptomic profiles of venom glands from Tityus obscurus and T. serrulatus scorpions

    PubMed Central

    Nishiyama, Milton Yutaka; dos Santos, Maria Beatriz Viana; Santos-da-Silva, Andria de Paula; Chalkidis, Hipócrates de Menezes; Souza-Imberg, Andreia; Candido, Denise Maria; Yamanouye, Norma; Dorce, Valquíria Abrão Coronado; Junqueira-de-Azevedo, Inácio de Loiola Meirelles

    2018-01-01

    Background Except for the northern region, where the Amazonian black scorpion, T. obscurus, represents the predominant and most medically relevant scorpion species, Tityus serrulatus, the Brazilian yellow scorpion, is widely distributed throughout Brazil, causing most envenoming and fatalities due to scorpion sting. In order to evaluate and compare the diversity of venom components of Tityus obscurus and T. serrulatus, we performed a transcriptomic investigation of the telsons (venom glands) corroborated by a shotgun proteomic analysis of the venom from the two species. Results The putative venom components represented 11.4% and 16.7% of the total gene expression for T. obscurus and T. serrulatus, respectively. Transcriptome and proteome data revealed high abundance of metalloproteinases sequences followed by sodium and potassium channel toxins, making the toxin core of the venom. The phylogenetic analysis of metalloproteinases from T. obscurus and T. serrulatus suggested an intraspecific gene expansion, as we previously observed for T. bahiensis, indicating that this enzyme may be under evolutionary pressure for diversification. We also identified several putative venom components such as anionic peptides, antimicrobial peptides, bradykinin-potentiating peptide, cysteine rich protein, serine proteinases, cathepsins, angiotensin-converting enzyme, endothelin-converting enzyme and chymotrypsin like protein, proteinases inhibitors, phospholipases and hyaluronidases. Conclusion The present work shows that the venom composition of these two allopatric species of Tityus is considerably similar in terms of the major classes of proteins produced and secreted, although their individual toxin sequences are considerably divergent. These differences at the amino acid level may be reflected in different epitopes for the same protein classes in each species, explaining the basis for the poor recognition of T. obscurus venom by the antiserum raised against other species. PMID:29561852

  10. The genomes of three stocks comprising the most widely utilized live sporozoite Theileria parva vaccine exhibit very different degrees and patterns of sequence divergence.

    PubMed

    Norling, Martin; Bishop, Richard P; Pelle, Roger; Qi, Weihong; Henson, Sonal; Drábek, Elliott F; Tretina, Kyle; Odongo, David; Mwaura, Stephen; Njoroge, Thomas; Bongcam-Rudloff, Erik; Daubenberger, Claudia A; Silva, Joana C

    2015-09-24

    There are no commercially available vaccines against human protozoan parasitic diseases, despite the success of vaccination-induced long-term protection against infectious diseases. East Coast fever, caused by the protist Theileria parva, kills one million cattle each year in sub-Saharan Africa, and contributes significantly to hunger and poverty in the region. A highly effective, live, multi-isolate vaccine against T. parva exists, but its component isolates have not been characterized. Here we sequence and compare the three component T. parva stocks within this vaccine, the Muguga Cocktail, namely Muguga, Kiambu5 and Serengeti-transformed, aiming to identify genomic features that contribute to vaccine efficacy. We find that Serengeti-transformed, originally isolated from the wildlife carrier, the African Cape buffalo, is remarkably and unexpectedly similar to the Muguga isolate. The 420 detectable non-synonymous SNPs were distributed among only 53 genes, primarily subtelomeric antigens and antigenic families. The Kiambu5 isolate is considerably more divergent, with close to 40,000 SNPs relative to Muguga, including >8,500 non-synonymous mutations distributed among >1,700 (42.5 %) of the predicted genes. These genetic markers of the component stocks can be used to characterize the composition of new batches of the Muguga Cocktail. Differences among these three isolates, while extensive, represent only a small proportion of the genetic variation in the entire species. Given the efficacy of the Muguga Cocktail in inducing long-lasting protection against infections in the field, our results suggest that whole-organism vaccines against parasitic diseases can be highly efficacious despite considerable genome-wide differences relative to the isolates against which they protect.

  11. A discrete artificial bee colony algorithm for detecting transcription factor binding sites in DNA sequences.

    PubMed

    Karaboga, D; Aslan, S

    2016-04-27

    The great majority of biological sequences share significant similarity with other sequences as a result of evolutionary processes, and identifying these sequence similarities is one of the most challenging problems in bioinformatics. In this paper, we present a discrete artificial bee colony (ABC) algorithm, which is inspired by the intelligent foraging behavior of real honey bees, for the detection of highly conserved residue patterns or motifs within sequences. Experimental studies on three different data sets showed that the proposed discrete model, by adhering to the fundamental scheme of the ABC algorithm, produced competitive or better results than other metaheuristic motif discovery techniques.

  12. Mitogenomes of Giant-Skipper Butterflies reveal an ancient split between deep and shallow root feeders.

    PubMed

    Zhang, Jing; Cong, Qian; Fan, Xiao-Ling; Wang, Rongjiang; Wang, Min; Grishin, Nick V

    2017-01-01

    Background: Giant-Skipper butterflies from the genus Megathymus are North American endemics. These large and thick-bodied Skippers resemble moths and are unique in their life cycles. Grub-like at the later stages of development, caterpillars of these species feed and live inside yucca roots. Adults do not feed and are mostly local, not straying far from the patches of yucca plants. Methods: Pieces of muscle were dissected from the thorax of specimens and genomic DNA was extracted (also from the abdomen of a specimen collected nearly 60 years ago). Paired-end libraries were prepared and sequenced for 150 bp from both ends. The mitogenomes were assembled from the reads followed by a manual gap-closing procedure, and a phylogenetic tree was constructed using a maximum likelihood method from an alignment of the mitogenomes. Results: We determined mitogenome sequences of nominal subspecies of all five known species of Megathymus and Agathymus mariae to confidently root the phylogenetic tree. Pairwise sequence identity indicates high similarity, ranging from 88-96% among coding regions for 13 proteins, 22 tRNAs and 2 rRNAs, with a gene order typical for mitogenomes of Lepidoptera. Phylogenetic analysis confirms that Giant-Skippers (Megathymini) originate within the subfamily Hesperiinae and do not warrant a subfamily rank. Genus Megathymus is monophyletic and splits into two species groups. M. streckeri and M. cofaqui caterpillars feed deep in the main root system of yucca plants and deposit frass underground. M. ursus, M. beulahae and M. yuccae feed in the yucca caudex and roots near the ground, and deposit frass outside through a "tent" (a silk tube projecting from the center of the yucca plant). M. yuccae and M. beulahae are sister species, consistent with morphological similarities between them. Conclusions: We constructed the first DNA-based phylogeny of the genus Megathymus from their mitogenomes. The phylogeny agrees with morphological considerations.

  13. Mitochondrial Genomes Reveal Slow Rates of Molecular Evolution and the Timing of Speciation in Beavers (Castor), One of the Largest Rodent Species

    PubMed Central

    Horn, Susanne; Durka, Walter; Wolf, Ronny; Ermala, Aslak; Stubbe, Annegret; Stubbe, Michael; Hofreiter, Michael

    2011-01-01

    Background: Beavers are one of the largest and ecologically most distinct rodent species. Little is known about their evolution and even their closest phylogenetic relatives have not yet been identified with certainty. Similarly, little is known about the timing of divergence events within the genus Castor. Methodology/Principal Findings: We sequenced complete mitochondrial genomes from both extant beaver species and used these sequences to place beavers in the phylogenetic tree of rodents and date their divergence from other rodents as well as the divergence events within the genus Castor. Our analyses support the phylogenetic position of beavers as a sister lineage to the scaly tailed squirrel Anomalurus within the mouse related clade. Molecular dating places the divergence time of the lineages leading to beavers and Anomalurus as early as around 54 million years ago (mya). The living beaver species, Castor canadensis from North America and Castor fiber from Eurasia, although similar in appearance, appear to have diverged from a common ancestor more than seven mya. This result is consistent with the hypothesis that a migration of Castor from Eurasia to North America as early as 7.5 mya could have initiated their speciation. We date the common ancestor of the extant Eurasian beaver relict populations to around 210,000 years ago, much earlier than previously thought. Finally, the substitution rate of Castor mitochondrial DNA is considerably lower than that of other rodents. We found evidence that this is correlated with the longer life span of beavers compared to other rodents. Conclusions/Significance: A phylogenetic analysis of mitochondrial genome sequences suggests a sister-group relationship between Castor and Anomalurus, and allows molecular dating of species divergence in congruence with paleontological data. The implementation of a relaxed molecular clock enabled us to estimate mitochondrial substitution rates and to evaluate the effect of life history traits on it. PMID:21307956

  14. Systemic Lupus Erythematosus: Molecular Mimicry between Anti-dsDNA CDR3 Idiotype, Microbial and Self Peptides-As Antigens for Th Cells.

    PubMed

    Aas-Hanssen, Kristin; Thompson, Keith M; Bogen, Bjarne; Munthe, Ludvig A

    2015-01-01

    Systemic lupus erythematosus (SLE) is marked by a T helper (Th) cell-dependent B cell hyperresponsiveness, with frequent germinal center reactions, and gammaglobulinemia. A feature of SLE is the finding of IgG autoantibodies specific for dsDNA. The specificity of the Th cells that drive the expansion of anti-dsDNA B cells is unresolved. However, anti-microbial, anti-histone, and anti-idiotype Th cell responses have been hypothesized to play a role. It has been entirely unclear if these seemingly disparate Th cell responses and hypotheses could be related or unified. Here, we describe that H chain CDR3 idiotypes from IgG(+) B cells of lupus mice have sequence similarities with both microbial and self peptides. Matched sequences were more frequent within the mutated CDR3 repertoire and when sequences were derived from lupus mice with expanded anti-dsDNA B cells. Analyses of histone sequences showed that particular histone peptides were similar to VDJ junctions. Moreover, lupus mice had Th cell responses toward histone peptides similar to anti-dsDNA CDR3 sequences. The results suggest that Th cells in lupus may have multiple cross-reactive specificities linked to the IgVH CDR3 Id-peptide sequences as well as similar DNA-associated protein motifs.

  15. Dali server update.

    PubMed

    Holm, Liisa; Laakso, Laura M

    2016-07-08

    The Dali server (http://ekhidna2.biocenter.helsinki.fi/dali) is a network service for comparing protein structures in 3D. In favourable cases, comparing 3D structures may reveal biologically interesting similarities that are not detectable by comparing sequences. The Dali server has been running in various places for over 20 years and is used routinely by crystallographers on newly solved structures. The latest update of the server provides enhanced analytics for the study of sequence and structure conservation. The server performs three types of structure comparisons: (i) Protein Data Bank (PDB) search compares one query structure against those in the PDB and returns a list of similar structures; (ii) pairwise comparison compares one query structure against a list of structures specified by the user; and (iii) all against all structure comparison returns a structural similarity matrix, a dendrogram and a multidimensional scaling projection of a set of structures specified by the user. Structural superimpositions are visualized using the Java-free WebGL viewer PV. The structural alignment view is enhanced by sequence similarity searches against Uniprot. The combined structure-sequence alignment information is compressed to a stack of aligned sequence logos. In the stack, each structure is structurally aligned to the query protein and represented by a sequence logo. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. A plasma membrane sucrose-binding protein that mediates sucrose uptake shares structural and sequence similarity with seed storage proteins but remains functionally distinct.

    PubMed

    Overvoorde, P J; Chao, W S; Grimes, H D

    1997-06-20

    Photoaffinity labeling of a soybean cotyledon membrane fraction identified a sucrose-binding protein (SBP). Subsequent studies have shown that the SBP is a unique plasma membrane protein that mediates the linear uptake of sucrose in the presence of up to 30 mM external sucrose when ectopically expressed in yeast. Analysis of the SBP-deduced amino acid sequence indicates it lacks sequence similarity with other known transport proteins. Data presented here, however, indicate that the SBP shares significant sequence and structural homology with the vicilin-like seed storage proteins that organize into homotrimers. These similarities include a repeated sequence that forms the basis of the reiterated domain structure characteristic of the vicilin-like protein family. In addition, analytical ultracentrifugation and nonreducing SDS-polyacrylamide gel electrophoresis demonstrate that the SBP appears to be organized into oligomeric complexes with a Mr indicative of the existence of SBP homotrimers and homodimers. The structural similarity shared by the SBP and vicilin-like proteins provides a novel framework to explore the mechanistic basis of SBP-mediated sucrose uptake. Expression of the maize Glb protein (a vicilin-like protein closely related to the SBP) in yeast demonstrates that a closely related vicilin-like protein is unable to mediate sucrose uptake. Thus, despite sequence and structural similarities shared by the SBP and the vicilin-like protein family, the SBP is functionally divergent from other members of this group.

  17. Identification of Protective Brucella Antigens and their Expressions in Vaccinia Virus to Prevent Disease in Animals and Humans.

    DTIC Science & Technology

    1996-05-01

    see figure appendix; B. abortus sequence in similarity arrangement with secD of E. coli, H. influenzae, M. leprae and S. coelicolor). Highly related...low similarity (E. coli and M. leprae approx 13.5% similarity). The Brucella and E. coli sequences were 25% similar (see figures appendix). In E...1987. Mycobacterial growth inhibition by interferon-g activated bone marrow macrophages and differential susceptibility among strains of Mycobacterium

  18. Correlating low-similarity peptide sequences and allergenic epitopes.

    PubMed

    Kanduc, D

    2008-01-01

    Although a large number of allergenic peptide epitopes have been experimentally identified and defined, the molecular basis and the precise mechanisms underlying peptide allergenicity are unknown. This issue was analyzed by exploring the relationship between peptide allergenicity and sequence similarity to the human proteome. The structured analysis of the data reported in the literature showed that most IgE-binding epitopes are (or harbor) pentapeptide unit(s) with no/low similarity to the human proteome, suggesting that no or low sequence similarity to the host proteome might represent a minimum common denominator identifying allergenic peptides. The present literature analysis might be of relevance in devising and designing short amino acid modules to be used for blocking pathogenic IgE.

  19. Biosequence Similarity Search on the Mercury System

    PubMed Central

    Krishnamurthy, Praveen; Buhler, Jeremy; Chamberlain, Roger; Franklin, Mark; Gyang, Kwame; Jacob, Arpith; Lancaster, Joseph

    2007-01-01

    Biosequence similarity search is an important application in modern molecular biology. Search algorithms aim to identify sets of sequences whose extensional similarity suggests a common evolutionary origin or function. The most widely used similarity search tool for biosequences is BLAST, a program designed to compare query sequences to a database. Here, we present the design of BLASTN, the version of BLAST that searches DNA sequences, on the Mercury system, an architecture that supports high-volume, high-throughput data movement off a data store and into reconfigurable hardware. An important component of application deployment on the Mercury system is the functional decomposition of the application onto both the reconfigurable hardware and the traditional processor. Both the Mercury BLASTN application design and its performance analysis are described. PMID:18846267

  20. Domain similarity based orthology detection.

    PubMed

    Bitard-Feildel, Tristan; Kemena, Carsten; Greenwood, Jenny M; Bornberg-Bauer, Erich

    2015-05-13

    Orthologous protein detection software mostly uses pairwise comparisons of amino-acid sequences to assert whether two proteins are orthologous or not. Accordingly, when the number of sequences for comparison increases, the number of comparisons to compute grows in a quadratic order. A current challenge of bioinformatic research, especially when taking into account the increasing number of sequenced organisms available, is to make this ever-growing number of comparisons computationally feasible in a reasonable amount of time. We propose to speed up the detection of orthologous proteins by using strings of domains to characterize the proteins. We present two new protein similarity measures, a cosine and a maximal weight matching score based on domain content similarity, and new software, named porthoDom. The qualities of the cosine and the maximal weight matching similarity measures are compared against curated datasets. The measures show that domain content similarities are able to correctly group proteins into their families. Accordingly, the cosine similarity measure is used inside porthoDom, the wrapper developed for proteinortho. porthoDom makes use of domain content similarity measures to group proteins together before searching for orthologs. By using domains instead of amino acid sequences, the reduction of the search space decreases the computational complexity of an all-against-all sequence comparison. We demonstrate that representing and comparing proteins as strings of discrete domains, i.e. as a concatenation of their unique identifiers, allows a drastic simplification of search space. porthoDom has the advantage of speeding up orthology detection while maintaining a degree of accuracy similar to proteinortho. The implementation of porthoDom is released using python and C++ languages and is available under the GNU GPL licence 3 at http://www.bornberglab.org/pages/porthoda .
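
    The domain-content idea above can be illustrated with a minimal Python sketch. It is not porthoDom's code: the abstract does not say how the cosine score is weighted, so plain domain counts are assumed here, and the domain identifiers ("D1", "D2", "D3") and function names are hypothetical. The second helper shows why all-against-all comparison grows quadratically.

```python
from collections import Counter
from math import sqrt


def domain_cosine(domains_a, domains_b):
    """Cosine similarity between two proteins given as lists of domain IDs.

    Assumption: each protein is reduced to a vector of raw domain counts.
    """
    counts_a, counts_b = Counter(domains_a), Counter(domains_b)
    # Dot product over the domains the two proteins have in common.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[d] * counts_b[d] for d in shared)
    norm_a = sqrt(sum(c * c for c in counts_a.values()))
    norm_b = sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a protein with no annotated domains matches nothing
    return dot / (norm_a * norm_b)


def pairwise_comparisons(n):
    """Number of unordered pairs among n sequences: n(n-1)/2."""
    return n * (n - 1) // 2


# Hypothetical domain strings, made up for illustration only.
protein_1 = ["D1", "D2", "D2"]
protein_2 = ["D2", "D3"]
print(domain_cosine(protein_1, protein_2))
print(pairwise_comparisons(1000))
```

    Proteins whose domain vectors score below a chosen cutoff could be excluded before the more expensive sequence-level comparison, which is one way to read the "reduction of the search space" described in the abstract.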

  1. MHC-mediated sexual selection on birdsong: Generic polymorphism, particular alleles and acoustic signals.

    PubMed

    Garamszegi, László Zsolt; Zagalska-Neubauer, Magdalena; Canal, David; Blázi, György; Laczi, Miklós; Nagy, Gergely; Szöllősi, Eszter; Vaskuti, Éva; Török, János; Zsebők, Sándor

    2018-06-01

    Several hypotheses predict that the major histocompatibility complex (MHC) drives mating preference in females. Olfactory, colour or morphological traits are often found as reliable signals of the MHC profile, but the role of avian song mediating MHC-based female choice remains largely unexplored. We investigated the relationship between several MHC and acoustic features in the collared flycatcher (Ficedula albicollis), a European passerine with complex songs. We screened a fragment of the class IIB second exon of the MHC molecule, of which individuals harbour 4-15 alleles, while considerable sequence diversity is maintained at the population level. To make statistical inferences from a large number of comparisons, we adopted both null-hypothesis testing and effect size framework in combination with randomization procedures. After controlling for potential confounding factors, neither MHC allelic diversity nor the presence of particular alleles was associated remarkably with the investigated qualitative and quantitative song traits. Furthermore, genetic similarity among males based on MHC sequences was not reflected by the similarity in their song based on syllable content. Overall, these results suggest that the relationship between features of song and the allelic composition and diversity of MHC is not strong in the studied species. However, a biologically motivated analysis revealed that individuals that harbour an MHC allele that impairs survival perform songs with broader frequency range. This finding suggests that certain aspects of the song may bear reliable information concerning the MHC profile of the individuals, which can be used by females to optimize mate choice. © 2018 John Wiley & Sons Ltd.

  2. Parent and Public Interest in Whole Genome Sequencing

    PubMed Central

    Dodson, Daniel S.; Goldenberg, Aaron J.; Davis, Matthew M.; Singer, Dianne C.; Tarini, Beth A.

    2015-01-01

    Objective: To assess the baseline interest of the public in whole genome sequencing (WGS) for themselves, parents’ interest in WGS for their youngest children, and factors associated with such interest. Methods: A random sample of adults from a probability-based nationally representative online panel was surveyed. All participants were provided basic information about WGS and then asked their interest in WGS for themselves. Those participants who self-identified as parents were asked about their interest in WGS for their children. The order in which parents were asked about their interest in WGS for themselves and their child was randomized. The relationship between parent/child characteristics and interest in WGS was examined. Results: Overall response rate was 62% (55% among parents). 58.6% of the total population (parents and non-parents) was interested in WGS for themselves. Similarly, 61.8% of parents were interested in WGS for themselves and 57.8% were interested in WGS for their youngest children. Of note, 84.7% of parents showed an identical interest level in WGS for themselves and their youngest children. Mothers as a whole, and parents whose youngest children had ≥2 health conditions had significantly more interest in WGS for themselves and their youngest children, while those with conservative political ideologies had considerably less. Conclusions: While U.S. adults have varying interest levels in WGS, parents appear to have similar interests in genome testing for themselves and their youngest children. As WGS technology becomes available in the clinic and private market, clinicians should be prepared to discuss WGS risks and benefits with their patients. PMID:25765282

  3. Measures of phylogenetic differentiation provide robust and complementary insights into microbial communities.

    PubMed

    Parks, Donovan H; Beiko, Robert G

    2013-01-01

    High-throughput sequencing techniques have made large-scale spatial and temporal surveys of microbial communities routine. Gaining insight into microbial diversity requires methods for effectively analyzing and visualizing these extensive data sets. Phylogenetic β-diversity measures address this challenge by allowing the relationship between large numbers of environmental samples to be explored using standard multivariate analysis techniques. Despite the success and widespread use of phylogenetic β-diversity measures, an extensive comparative analysis of these measures has not been performed. Here, we compare 39 measures of phylogenetic β diversity in order to establish the relative similarity of these measures along with key properties and performance characteristics. While many measures are highly correlated, those commonly used within microbial ecology were found to be distinct from those popular within classical ecology, and from the recently recommended Gower and Canberra measures. Many of the measures are surprisingly robust to different rootings of the gene tree, the choice of similarity threshold used to define operational taxonomic units, and the presence of outlying basal lineages. Measures differ considerably in their sensitivity to rare organisms, and the effectiveness of measures can vary substantially under alternative models of differentiation. Consequently, the depth of sequencing required to reveal underlying patterns of relationships between environmental samples depends on the selected measure. Our results demonstrate that using complementary measures of phylogenetic β diversity can further our understanding of how communities are phylogenetically differentiated. Open-source software implementing the phylogenetic β-diversity measures evaluated in this manuscript is available at http://kiwi.cs.dal.ca/Software/ExpressBetaDiversity.

  4. Bacterial and archaeal phylogenetic diversity of a cold sulfur-rich spring on the shoreline of Lake Erie, Michigan

    USGS Publications Warehouse

    Chaudhary, A.; Haack, S.K.; Duris, J.W.; Marsh, T.L.

    2009-01-01

    Studies of sulfidic springs have provided new insights into microbial metabolism, groundwater biogeochemistry, and geologic processes. We investigated Great Sulphur Spring on the western shore of Lake Erie and evaluated the phylogenetic affiliations of 189 bacterial and 77 archaeal 16S rRNA gene sequences from three habitats: the spring origin (11-m depth), bacterial-algal mats on the spring pond surface, and whitish filamentous materials from the spring drain. Water from the spring origin was cold, pH 6.3, and anoxic (H2, 5.4 nM; CH4, 2.70 μM) with concentrations of S2- (0.03 mM), SO42- (14.8 mM), Ca2+ (15.7 mM), and HCO3- (4.1 mM) similar to those in groundwater from the local aquifer. No archaeal and few bacterial sequences were >95% similar to sequences of cultivated organisms. Bacterial sequences were largely affiliated with sulfur-metabolizing or chemolithotrophic taxa in Beta-, Gamma-, Delta-, and Epsilonproteobacteria. Epsilonproteobacteria sequences similar to those obtained from other sulfidic environments and a new clade of Cyanobacteria sequences were particularly abundant (16% and 40%, respectively) in the spring origin clone library. Crenarchaeota sequences associated with archaeal-bacterial consortia in whitish filaments at a German sulfidic spring were detected only in a similar habitat at Great Sulphur Spring. This study expands the geographic distribution of many uncultured Archaea and Bacteria sequences to the Laurentian Great Lakes, indicates possible roles for epsilonproteobacteria in local aquifer chemistry and karst formation, documents new oscillatorioid Cyanobacteria lineages, and shows that uncultured, cold-adapted Crenarchaeota sequences may comprise a significant part of the microbial community of some sulfidic environments. Copyright © 2009, American Society for Microbiology. All Rights Reserved.

  5. Next-Generation Sequencing in Oncology: Genetic Diagnosis, Risk Prediction and Cancer Classification

    PubMed Central

    Kamps, Rick; Brandão, Rita D.; van den Bosch, Bianca J.; Paulussen, Aimee D. C.; Xanthoulea, Sofia; Blok, Marinus J.; Romano, Andrea

    2017-01-01

    Next-generation sequencing (NGS) technology has expanded in the last decades with significant improvements in the reliability, sequencing chemistry, pipeline analyses, data interpretation and costs. Such advances make the use of NGS feasible in clinical practice today. This review describes the recent technological developments in NGS applied to the field of oncology. A number of clinical applications are reviewed, i.e., mutation detection in inherited cancer syndromes based on DNA-sequencing, detection of spliceogenic variants based on RNA-sequencing, DNA-sequencing to identify risk modifiers and application for pre-implantation genetic diagnosis, cancer somatic mutation analysis, pharmacogenetics and liquid biopsy. Conclusive remarks, clinical limitations, implications and ethical considerations that relate to the different applications are provided. PMID:28146134

  6. Rapid Threat Organism Recognition Pipeline

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Williams, Kelly P.; Solberg, Owen D.; Schoeniger, Joseph S.

    2013-05-07

    The RAPTOR computational pipeline identifies microbial nucleic acid sequences present in sequence data from clinical samples. It takes as input raw short-read genomic sequence data (in particular, the type generated by the Illumina sequencing platforms) and outputs taxonomic evaluation of detected microbes in various human-readable formats. This software was designed to assist in the diagnosis or characterization of infectious disease, by detecting pathogen sequences in nucleic acid sequence data from clinical samples. It has also been applied in the detection of algal pathogens, when algal biofuel ponds became unproductive. RAPTOR first trims and filters genomic sequence reads based on quality and related considerations, then performs a quick alignment to the human (or other host) genome to filter out host sequences, then performs a deeper search against microbial genomes. Alignment to a protein sequence database is optional. Alignment results are summarized and placed in a taxonomic framework using the Lowest Common Ancestor algorithm.
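
    The final step named above, placing alignment results with a Lowest Common Ancestor rule, can be sketched in Python as follows. This is an illustration of the general technique only, not RAPTOR's implementation, which the record does not describe. The taxonomy is a toy parent map with made-up node names, and the function names are hypothetical.

```python
def lineage(taxon, parent):
    """Return the path from a taxon up to the root, inclusive."""
    path = [taxon]
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    return path


def lowest_common_ancestor(taxa, parent):
    """Deepest node shared by the lineages of all given taxa."""
    if not taxa:
        raise ValueError("need at least one taxon")
    # Compare root-to-leaf lineages level by level; stop at the first split.
    lineages = [lineage(t, parent)[::-1] for t in taxa]
    lca = None
    for nodes in zip(*lineages):
        if len(set(nodes)) != 1:
            break
        lca = nodes[0]
    return lca


# Toy taxonomy (hypothetical names): child -> parent, root has no parent.
PARENT = {
    "root": None,
    "clade_A": "root",
    "genus_A1": "clade_A",
    "genus_A2": "clade_A",
    "clade_B": "root",
    "genus_B1": "clade_B",
}

# A read that aligns equally well to genus_A1 and genus_A2 is reported at clade_A.
print(lowest_common_ancestor(["genus_A1", "genus_A2"], PARENT))
```

    The design choice is conservative: a read matching several taxa is reported at the most specific rank on which all of its hits agree, so an ambiguous read is never assigned to a single one of them.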

  7. Incorrectly predicted genes in rice?

    PubMed

    Cruveiller, Stéphane; Jabbari, Kamel; Clay, Oliver; Bernardi, Giorgio

    2004-05-26

    Between one third and one half of the proposed rice genes appear to have no homologs in other species, including Arabidopsis. Compositional considerations, and a comparison of curated rice sequences with ex novo predictions, suggest that many or most of the putative genes without homologs may be false positive predictions, i.e., sequences that are never translated into functional proteins in vivo.

  8. The nop gene from Phanerochaete chrysosporium encodes a peroxidase with novel structural features

    Treesearch

    Luis F. Larrondo; Angel Gonzalez; Tomas Perez-Acle; Dan Cullen; Rafael Vicuna

    2005-01-01

    Inspection of the genome of the ligninolytic basidiomycete Phanerochaete chrysosporium revealed an unusual peroxidase-like sequence. The corresponding full length cDNA was sequenced and an archetypal secretion signal predicted. The deduced mature protein (NoP, novel peroxidase) contains 295 aa residues and is therefore considerably shorter than other Class II (fungal)...

  9. Mars Surface Operations via Low-Latency Telerobotics from Phobos

    NASA Technical Reports Server (NTRS)

    Wright, Michael; Lupisella, Mark

    2016-01-01

    To help assess the feasibility and timing of Low-Latency Telerobotics (LLT) operations on Mars via a Phobos telecommand base, operations concepts (ops cons) and timelines for several representative sequences for Mars surface operations have been developed. A summary of these LLT sequences and timelines will be presented, along with associated assumptions, operational considerations, and challenges.

  10. Diversity of 16S rRNA genes of new Ehrlichia strains isolated from horses with clinical signs of Potomac horse fever.

    PubMed

    Wen, B; Rikihisa, Y; Fuerst, P A; Chaichanasiriwithaya, W

    1995-04-01

    Ehrlichia risticii is the causative agent of Potomac horse fever. Variations among the major antigens of different local E. risticii strains have been detected previously. To further assess genetic variability in this species or species complex, the sequences of the 16S rRNA genes of several isolates obtained from sick horses diagnosed as having Potomac horse fever were determined. The sequences of six isolates obtained from Ohio and three isolates obtained from Kentucky were amplified by PCR. Three groups of sequences were identified. The sequences of five of the Ohio isolates were identical to the sequence of the type strain of E. risticii, the Illinois strain. The sequence of one Ohio isolate, isolate 081, was unique; this sequence differed in 10 nucleotides from the sequence of the type strain (level of similarity, 99.3%). The sequences of the three Kentucky isolates were identical to each other, but differed by five bases from the sequence of the type strain (level of similarity, 99.6%). The levels of sequence similarity of isolate 081, the Kentucky isolates, and the type strain to the next most closely related Ehrlichia sp., Ehrlichia sennetsu, were 99.3, 99.2, and 99.2%, respectively. On the basis of the distinct antigenic profiles and the levels of 16S rRNA sequence divergence, isolate 081 is as divergent from the type strain of E. risticii as E. sennetsu is. Therefore, we suggest that strain 081 and the Kentucky isolates may represent two new distinct Ehrlichia species.
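
    The "level of similarity" figures above relate a count of nucleotide differences to an aligned length. A minimal Python sketch of that calculation is given below, assuming similarity means simple percent identity over aligned, ungapped columns. The abstract does not state the exact convention used, and the toy sequences and function name are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns with a gap ('-') in either sequence are skipped;
    other conventions count gaps as mismatches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment (made up): one mismatch in ten compared positions.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
```

    Under the same assumption, 10 differences at 99.3% similarity would correspond to roughly 1,400 compared positions (10 / 0.007), which is in the range of a near full-length 16S rRNA gene.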

  11. Exploring the sequence-structure protein landscape in the glycosyltransferase family

    PubMed Central

    Zhang, Ziding; Kochhar, Sunil; Grigorov, Martin

    2003-01-01

    To understand the molecular basis of glycosyltransferases’ (GTFs) catalytic mechanism, extensive structural information is required. Here, fold recognition methods were employed to assign 3D protein shapes (folds) to the currently known GTF sequences, available in public databases such as GenBank and Swissprot. First, GTF sequences were retrieved and classified into clusters, based on sequence similarity only. Intracluster sequence similarity was chosen sufficiently high to ensure that the same fold is found within a given cluster. Then, a representative sequence from each cluster was selected to compose a subset of GTF sequences. The members of this reduced set were processed by three different fold recognition methods: 3D-PSSM, FUGUE, and GeneFold. Finally, the results from different fold recognition methods were analyzed and compared to sequence-similarity search methods (i.e., BLAST and PSI-BLAST). It was established that the folds of about 70% of all currently known GTF sequences can be confidently assigned by fold recognition methods, a value which is higher than the fold identification rate based on sequence comparison alone (48% for BLAST and 64% for PSI-BLAST). The identified folds were submitted to 3D clustering, and we found that most of the GTF sequences adopt the typical GTF A or GTF B folds. Our results indicate a lack of evidence that new GTF folds (i.e., folds other than GTF A and B) exist. Based on cases where fold identification was not possible, we suggest several sequences as the most promising targets for a structural genomics initiative focused on the GTF protein family. PMID:14500887

  12. Quantifying the Relationships among Drug Classes

    PubMed Central

    Hert, Jérôme; Keiser, Michael J.; Irwin, John J.; Oprea, Tudor I.; Shoichet, Brian K.

    2009-01-01

    The similarity of drug targets is typically measured using sequence or structural information. Here, we consider chemo-centric approaches that measure target similarity on the basis of their ligands, asking how chemoinformatics similarities differ from those derived bioinformatically, how stable the ligand networks are to changes in chemoinformatics metrics, and which network is the most reliable for prediction of pharmacology. We calculated the similarities between hundreds of drug targets and their ligands and mapped the relationship between them in a formal network. Bioinformatics networks were based on the BLAST similarity between sequences, while chemoinformatics networks were based on the ligand-set similarities calculated with either the Similarity Ensemble Approach (SEA) or a method derived from Bayesian statistics. By multiple criteria, bioinformatics and chemoinformatics networks differed substantially, and only occasionally did a high sequence similarity correspond to a high ligand-set similarity. In contrast, the chemoinformatics networks were stable to the method used to calculate the ligand-set similarities and to the chemical representation of the ligands. Also, the chemoinformatics networks were more natural and more organized, by network theory, than their bioinformatics counterparts: ligand-based networks were found to be small-world and broad-scale. PMID:18335977

  13. The use of high-throughput small RNA sequencing reveals differentially expressed microRNAs in response to aster yellows phytoplasma-infection in Vitis vinifera cv. ‘Chardonnay’

    PubMed Central

    Solofoharivelo, Marie-Chrystine; Souza-Richards, Rose; Stephan, Dirk; Murray, Shane; Burger, Johan T.

    2017-01-01

    Phytoplasmas are cell wall-less plant pathogenic bacteria responsible for major crop losses throughout the world. In grapevine they cause grapevine yellows, a detrimental disease associated with a variety of symptoms. The high economic impact of this disease has sparked considerable interest among researchers to understand molecular mechanisms related to pathogenesis. Increasing evidence exists that a class of small non-coding endogenous RNAs, known as microRNAs (miRNAs), play an important role in post-transcriptional gene regulation during plant development and responses to biotic and abiotic stresses. Thus, we aimed to dissect complex high-throughput small RNA sequencing data for the genome-wide identification of known and novel differentially expressed miRNAs, using read libraries constructed from healthy and phytoplasma-infected Chardonnay leaf material. Furthermore, we utilised computational resources to predict putative miRNA targets to explore the involvement of possible pathogen response pathways. We identified multiple known miRNA sequence variants (isomiRs), likely generated through post-transcriptional modifications. Sequences of 13 known, canonical miRNAs were shown to be differentially expressed. A total of 175 novel miRNA precursor sequences, each derived from a unique genomic location, were predicted, of which 23 were differentially expressed. A homology search revealed that some of these novel miRNAs shared high sequence similarity with conserved miRNAs from other plant species, as well as known grapevine miRNAs. The relative expression of randomly selected known and novel miRNAs was determined with real-time RT-qPCR analysis, thereby validating the trend of expression seen in the normalised small RNA sequencing read count data. Among the putative miRNA targets, we identified genes involved in plant morphology, hormone signalling, nutrient homeostasis, as well as plant stress. Our results may assist in understanding the role that miRNA pathways play during plant pathogenesis, and may be crucial in understanding disease symptom development in aster yellows phytoplasma-infected grapevines. PMID:28813447

  14. Molecular evolution of ependymin and the phylogenetic resolution of early divergences among euteleost fishes.

    PubMed

    Ortí, G; Meyer, A

    1996-04-01

    The rate and pattern of DNA evolution of ependymin, a single-copy gene coding for a highly expressed glycoprotein in the brain matrix of teleost fishes, is characterized and its phylogenetic utility for fish systematics is assessed. DNA sequences were determined from catfish, electric fish, and characiforms and compared with published ependymin sequences from cyprinids, salmon, pike, and herring. Among these groups, ependymin amino acid sequences were highly divergent (up to 60% sequence difference), but had surprisingly similar hydropathy profiles and invariant glycosylation sites, suggesting that functional properties of the proteins are conserved. Comparison of base composition at third codon positions and introns revealed AT-rich introns and GC-rich third codon positions, suggesting that the biased codon usage observed might not be due to mutational bias. Phylogenetic information content of third codon positions was surprisingly high and sufficient to recover the most basal nodes of the tree, in spite of the observation that pairwise distances (at third codon positions) were well above the presumed saturation level. This finding can be explained by the high proportion of phylogenetically informative nonsynonymous changes at third codon positions among these highly divergent proteins. Ependymin DNA sequences have established the first molecular evidence for the monophyly of a group containing salmonids and esociforms. In addition, ependymin suggests a sister group relationship of electric fish (Gymnotiformes) and Characiformes, constituting a significant departure from currently accepted classifications. However, relationships among characiform lineages were not completely resolved by ependymin sequences in spite of seemingly appropriate levels of variation among taxa and considerably low levels of homoplasy in the data (consistency index = 0.7). 
If the diversification of Characiformes took place in an "explosive" manner, over a relatively short period of time, this pattern should also be observed using other phylogenetic markers. Poor conservation of ependymin's primary structure hinders the design of efficient primers for PCR that could be used in wide-ranging fish systematic studies. However, alternative methods, like the PCR amplification from cDNA used here, should provide promising comparative sequence data for the resolution of phylogenetic relationships among other basal lineages of teleost fishes.

  15. Diverse molecular signatures for ribosomally ‘active’ Perkinsea in marine sediments

    PubMed Central

    2014-01-01

    Background: Perkinsea are a parasitic lineage within the eukaryotic superphylum Alveolata. Recent studies making use of environmental small sub-unit ribosomal RNA gene (SSU rDNA) sequencing methodologies have detected a significant diversity and abundance of Perkinsea-like phylotypes in freshwater environments. In contrast only a few Perkinsea environmental sequences have been retrieved from marine samples and only two groups of Perkinsea have been cultured and morphologically described and these are parasites of marine molluscs or marine protists. These two marine groups form separate and distantly related phylogenetic clusters, composed of closely related lineages on SSU rDNA trees. Here, we test the hypothesis that Perkinsea are a hitherto under-sampled group in marine environments. Using 454 diversity ‘tag’ sequencing we investigate the diversity and distribution of these protists in marine sediments and water column samples taken from the Deep Chlorophyll Maximum (DCM) and sub-surface using both DNA and RNA as the source template and sampling four European offshore locations. Results: We detected the presence of 265 sequences branching with known Perkinsea, the majority of them recovered from marine sediments. Moreover, 27% of these sequences were sampled from RNA derived cDNA libraries. Phylogenetic analyses classify a large proportion of these sequences into 38 cluster groups (including 30 novel marine cluster groups), which share less than 97% sequence similarity suggesting this diversity encompasses a range of biologically and ecologically distinct organisms. Conclusions: These results demonstrate that the Perkinsea lineage is considerably more diverse than previously detected in marine environments. This wide diversity of Perkinsea-like protists is largely retrieved in marine sediment with a significant proportion detected in RNA derived libraries suggesting this diversity represents ribosomally ‘active’ and intact cells. Given the phylogenetic range of hosts infected by known Perkinsea parasites, these data suggest that Perkinsea either play a significant but hitherto unrecognized role as parasites in marine sediments and/or members of this group are present in the marine sediment possibly as part of the ‘seed bank’ microbial community. PMID:24779375

  16. A galaxy of folds.

    PubMed

    Alva, Vikram; Remmert, Michael; Biegert, Andreas; Lupas, Andrei N; Söding, Johannes

    2010-01-01

    Many protein classification systems capture homologous relationships by grouping domains into families and superfamilies on the basis of sequence similarity. Superfamilies with similar 3D structures are further grouped into folds. In the absence of discernable sequence similarity, these structural similarities were long thought to have originated independently, by convergent evolution. However, the growth of databases and advances in sequence comparison methods have led to the discovery of many distant evolutionary relationships that transcend the boundaries of superfamilies and folds. To investigate the contributions of convergent versus divergent evolution in the origin of protein folds, we clustered representative domains of known structure by their sequence similarity, treating them as point masses in a virtual 2D space which attract or repel each other depending on their pairwise sequence similarities. As expected, families in the same superfamily form tight clusters. But often, superfamilies of the same fold are linked with each other, suggesting that the entire fold evolved from an ancient prototype. Strikingly, some links connect superfamilies with different folds. They arise from modular peptide fragments of between 20 and 40 residues that co-occur in the connected folds in disparate structural contexts. These may be descendants of an ancestral pool of peptide modules that evolved as cofactors in the RNA world and from which the first folded proteins arose by amplification and recombination. Our galaxy of folds summarizes, in a single image, most known and many yet undescribed homologous relationships between protein superfamilies, providing new insights into the evolution of protein domains.
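
    The idea of treating domains as point masses that attract or repel each other according to pairwise similarity can be illustrated with a toy layout loop. This is a hedged sketch and not the authors' software: the force law, the constants `attract` and `repel`, the similarity values, and the function name `layout_step` are assumptions chosen only to show the principle.

```python
import math


def layout_step(pos, sim, attract=0.1, repel=0.01):
    """One update of a toy force-directed layout in 2D.

    pos: list of (x, y) points; sim: symmetric matrix of similarities in [0, 1].
    Every pair repels weakly (~1/d); similar pairs also attract (~sim * d).
    """
    new_pos = []
    for i, (xi, yi) in enumerate(pos):
        fx = fy = 0.0
        for j, (xj, yj) in enumerate(pos):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            d = math.hypot(dx, dy) or 1e-9  # guard against coincident points
            force = attract * sim[i][j] * d - repel / d  # > 0 pulls i toward j
            fx += force * dx / d
            fy += force * dy / d
        new_pos.append((xi + fx, yi + fy))
    return new_pos


# Hypothetical example: points 0 and 1 are similar, point 2 is unrelated.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
similarity = [[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]]
for _ in range(50):
    points = layout_step(points, similarity)
print(math.dist(points[0], points[1]) < math.dist(points[0], points[2]))  # True
```

    After a number of iterations, similar items sit close together and unrelated items drift apart, which is the visual effect the abstract describes for families, superfamilies, and folds.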

  17. Predicting protein contact map using evolutionary and physical constraints by integer programming.

    PubMed

    Wang, Zhiyong; Xu, Jinbo

    2013-07-01

    A protein contact map describes the pairwise spatial and functional relationship of residues in a protein and contains key information for protein 3D structure prediction. Although contact maps have been studied extensively, predicting them from sequence information alone remains challenging. Most existing methods predict the contact map matrix element-by-element, ignoring correlation among contacts and physical feasibility of the whole contact map. A couple of recent methods predict contact maps by using mutual information, taking into consideration contact correlation and enforcing a sparsity restraint, but these methods demand a very large number of sequence homologs for the protein under consideration, and the resultant contact map may still be physically infeasible. This article presents a novel method, PhyCMAP, for contact map prediction, integrating both evolutionary and physical restraints by machine learning and integer linear programming. The evolutionary restraints are much more informative than mutual information, and the physical restraints specify more concrete relationships among contacts than the sparsity restraint. As such, our method greatly reduces the solution space of the contact map matrix and, thus, significantly improves prediction accuracy. Experimental results confirm that PhyCMAP outperforms currently popular methods no matter how many sequence homologs are available for the protein under consideration. http://raptorx.uchicago.edu.
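
    For readers unfamiliar with the object being predicted, the sketch below builds a binary contact map from known 3D coordinates. It illustrates only the definition of a contact map, not the PhyCMAP prediction method. The 8 Å distance cutoff is a commonly used convention assumed here, not a value stated in the abstract, and `contact_map`, `cutoff`, and `min_separation` are hypothetical names.

```python
import math


def contact_map(coords, cutoff=8.0, min_separation=1):
    """Return a symmetric 0/1 matrix marking residue pairs within `cutoff`.

    coords: one (x, y, z) point per residue (for example C-alpha or C-beta atoms).
    min_separation: smallest sequence separation |i - j| that is considered.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


# Hypothetical straight chain with 3.8 Angstrom spacing between residues.
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
for row in contact_map(chain):
    print(row)
# [0, 1, 1, 0]
# [1, 0, 1, 1]
# [1, 1, 0, 1]
# [0, 1, 1, 0]
```

    Because the matrix comes from real coordinates, it is physically feasible by construction; the abstract's point is that element-by-element predictions carry no such guarantee.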

  18. Cospeciation of Psyllids and Their Primary Prokaryotic Endosymbionts

    PubMed Central

    Thao, MyLo L.; Moran, Nancy A.; Abbot, Patrick; Brennan, Eric B.; Burckhardt, Daniel H.; Baumann, Paul

    2000-01-01

    Psyllids are plant sap-feeding insects that harbor prokaryotic endosymbionts in specialized cells within the body cavity. Four-kilobase DNA fragments containing 16S and 23S ribosomal DNA (rDNA) were amplified from the primary (P) endosymbiont of 32 species of psyllids representing three psyllid families and eight subfamilies. In addition, 0.54-kb fragments of the psyllid nuclear gene wingless were also amplified from 26 species. Phylogenetic trees derived from 16S-23S rDNA and from the host wingless gene are very similar, and tests of compatibility of the data sets show no significant conflict between host and endosymbiont phylogenies. This result is consistent with a single infection of a shared psyllid ancestor and subsequent cospeciation of the host and the endosymbiont. In addition, the phylogenies based on DNA sequences generally agreed with psyllid taxonomy based on morphology. The 3′ end of the 16S rDNA of the P endosymbionts differs from that of other members of the domain Bacteria in the lack of a sequence complementary to the mRNA ribosome binding site. The rate of sequence change in the 16S-23S rDNA of the psyllid P endosymbiont was considerably higher than that of other bacteria, including other fast-evolving insect endosymbionts. The lineage consisting of the P endosymbionts of psyllids was given the designation Candidatus Carsonella (gen. nov.) with a single species, Candidatus Carsonella ruddii (sp. nov.). PMID:10877784

  19. Next-Generation Sequencing of Matched Primary and Metastatic Rectal Adenocarcinomas Demonstrates Minimal Mutation Gain and Concordance to Colonic Adenocarcinomas.

    PubMed

    Crumley, Suzanne M; Pepper, Kristi L; Phan, Alexandria T; Olsen, Randall J; Schwartz, Mary R; Portier, Bryce P

    2016-06-01

    Context: Colorectal carcinoma is the third most common cause of cancer death in males and females in the United States. Rectal adenocarcinoma can have distinct therapeutic and surgical management from colonic adenocarcinoma owing to its location and anatomic considerations. Objective: To determine the oncologic driver mutations and better understand the molecular pathogenesis of rectal adenocarcinoma in relation to colon adenocarcinoma. Design: Next-generation sequencing was performed on 20 cases of primary rectal adenocarcinoma with a paired lymph node or solid organ metastasis by using an amplicon-based assay of more than 2800 Catalogue of Somatic Mutations in Cancer (COSMIC)-identified somatic mutations. Results: Next-generation sequencing data were obtained on both the primary tumor and metastasis from 16 patients. Most rectal adenocarcinoma cases demonstrated identical mutations in the primary tumor and metastasis (13 of 16, 81%). The mutations identified, listed in order of frequency, included TP53, KRAS, APC, FBXW7, GNAS, FGFR3, BRAF, NRAS, PIK3CA, and SMAD4. Conclusions: The somatic mutations identified in our rectal adenocarcinoma cohort showed a strong correlation to those previously characterized in colonic adenocarcinoma. In addition, most rectal adenocarcinomas harbored identical somatic mutations in both the primary tumor and metastasis. These findings demonstrate evidence that rectal adenocarcinoma follows a similar molecular pathogenesis as colonic adenocarcinoma and that sampling either the primary or metastatic lesion is valid for initial evaluation of somatic mutations and selection of possible targeted therapy.

  20. Comparative genomic sequence analysis of strawberry and other rosids reveals significant microsynteny

    PubMed Central

    2010-01-01

    Background: Fragaria belongs to the Rosaceae, an economically important family that includes a number of important fruit producing genera such as Malus and Prunus. Using genomic sequences from 50 Fragaria fosmids, we have examined the microsynteny between Fragaria and other plant models. Results: In more than half of the strawberry fosmids, we found syntenic regions that are conserved in Populus, Vitis, Medicago and/or Arabidopsis with Populus containing the greatest number of syntenic regions with Fragaria. The longest syntenic region was between LG VIII of the poplar genome and the strawberry fosmid 72E18, where seven out of twelve predicted genes were collinear. We also observed an unexpectedly high level of conserved synteny between Fragaria (rosid I) and Vitis (basal rosid). One of the strawberry fosmids, 34E24, contained a cluster of R gene analogs (RGAs) with NBS and LRR domains. We detected clusters of RGAs with high sequence similarity to those in 34E24 in all the genomes compared. In the phylogenetic tree we have generated, all the NBS-LRR genes grouped together with Arabidopsis CNL-A type NBS-LRR genes. The Fragaria RGA grouped together with those of Vitis and Populus in the phylogenetic tree. Conclusions: Our analysis shows considerable microsynteny between Fragaria and other plant genomes such as Populus, Medicago, Vitis, and Arabidopsis to a lesser degree. We also detected a cluster of NBS-LRR type genes that are conserved in all the genomes compared. PMID:20565715

  1. A conserved genetic module that encodes the major virion components in both the coliphage T4 and the marine cyanophage S-PM2

    PubMed Central

    Hambly, Emma; Tétart, Francoise; Desplats, Carine; Wilson, William H.; Krisch, Henry M.; Mann, Nicholas H.

    2001-01-01

    Sequence analysis of a 10-kb region of the genome of the marine cyanomyovirus S-PM2 reveals a homology to coliphage T4 that extends as a contiguous block from gene (g)18 to g23. The order of the S-PM2 genes in this region is similar to that of T4, but there are insertions and deletions of small ORFs of unknown function. In T4, g18 codes for the tail sheath, g19, the tail tube, g20, the head portal protein, g21, the prohead core protein, g22, a scaffolding protein, and g23, the major capsid protein. Thus, the entire module that determines the structural components of the phage head and contractile tail is conserved between T4 and this cyanophage. The significant differences in the morphology of these phages must reflect the considerable divergence of the amino acid sequence of their homologous virion proteins, which uniformly exceeds 50%. We suggest that their enormous diversity in the sea could be a result of genetic shuffling between disparate phages mediated by such commonly shared modules. These conserved sequences could facilitate genetic exchange by providing partially homologous substrates for recombination between otherwise divergent phage genomes. Such a mechanism would thus expand the pool of phage genes accessible by recombination to all those phages that share common modules. PMID:11553768

  2. Influence of quasi-specific sites on kinetics of target DNA search by a sequence-specific DNA-binding protein.

    PubMed

    Kemme, Catherine A; Esadze, Alexandre; Iwahara, Junji

    2015-11-10

    Functions of transcription factors require formation of specific complexes at particular sites in cis-regulatory elements of genes. However, chromosomal DNA contains numerous sites that are similar to the target sequences recognized by transcription factors. The influence of such "quasi-specific" sites on functions of the transcription factors is not well understood at present by experimental means. In this work, using fluorescence methods, we have investigated the influence of quasi-specific DNA sites on the efficiency of target location by the zinc finger DNA-binding domain of the inducible transcription factor Egr-1, which recognizes a 9 bp sequence. By stopped-flow assays, we measured the kinetics of Egr-1's association with a target site on 143 bp DNA in the presence of various competitor DNAs, including nonspecific and quasi-specific sites. The presence of quasi-specific sites on competitor DNA significantly decelerated the target association by the Egr-1 protein. The impact of the quasi-specific sites depended strongly on their affinity, their concentration, and the degree of their binding to the protein. To quantitatively describe the kinetic impact of the quasi-specific sites, we derived an analytical form of the apparent kinetic rate constant for the target association and used it for fitting to the experimental data. Our kinetic data with calf thymus DNA as a competitor suggested that there are millions of high-affinity quasi-specific sites for Egr-1 among the 3 billion bp of genomic DNA. This study quantitatively demonstrates that naturally abundant quasi-specific sites on DNA can considerably impede the target search processes of sequence-specific DNA-binding proteins.

  3. Influence of Quasi-Specific Sites on Kinetics of Target DNA Search by a Sequence-Specific DNA-Binding Protein

    PubMed Central

    2015-01-01

    Functions of transcription factors require formation of specific complexes at particular sites in cis-regulatory elements of genes. However, chromosomal DNA contains numerous sites that are similar to the target sequences recognized by transcription factors. The influence of such “quasi-specific” sites on functions of the transcription factors is not well understood at present by experimental means. In this work, using fluorescence methods, we have investigated the influence of quasi-specific DNA sites on the efficiency of target location by the zinc finger DNA-binding domain of the inducible transcription factor Egr-1, which recognizes a 9 bp sequence. By stopped-flow assays, we measured the kinetics of Egr-1’s association with a target site on 143 bp DNA in the presence of various competitor DNAs, including nonspecific and quasi-specific sites. The presence of quasi-specific sites on competitor DNA significantly decelerated the target association by the Egr-1 protein. The impact of the quasi-specific sites depended strongly on their affinity, their concentration, and the degree of their binding to the protein. To quantitatively describe the kinetic impact of the quasi-specific sites, we derived an analytical form of the apparent kinetic rate constant for the target association and used it for fitting to the experimental data. Our kinetic data with calf thymus DNA as a competitor suggested that there are millions of high-affinity quasi-specific sites for Egr-1 among the 3 billion bp of genomic DNA. This study quantitatively demonstrates that naturally abundant quasi-specific sites on DNA can considerably impede the target search processes of sequence-specific DNA-binding proteins. PMID:26502071

  4. Genome sequence and rapid evolution of the rice pathogen Xanthomonas oryzae pv. oryzae PXO99A

    PubMed Central

    Salzberg, Steven L; Sommer, Daniel D; Schatz, Michael C; Phillippy, Adam M; Rabinowicz, Pablo D; Tsuge, Seiji; Furutani, Ayako; Ochiai, Hirokazu; Delcher, Arthur L; Kelley, David; Madupu, Ramana; Puiu, Daniela; Radune, Diana; Shumway, Martin; Trapnell, Cole; Aparna, Gudlur; Jha, Gopaljee; Pandey, Alok; Patil, Prabhu B; Ishihara, Hiromichi; Meyer, Damien F; Szurek, Boris; Verdier, Valerie; Koebnik, Ralf; Dow, J Maxwell; Ryan, Robert P; Hirata, Hisae; Tsuyumu, Shinji; Won Lee, Sang; Ronald, Pamela C; Sonti, Ramesh V; Van Sluys, Marie-Anne; Leach, Jan E; White, Frank F; Bogdanove, Adam J

    2008-01-01

    Background: Xanthomonas oryzae pv. oryzae causes bacterial blight of rice (Oryza sativa L.), a major disease that constrains production of this staple crop in many parts of the world. We report here on the complete genome sequence of strain PXO99A and its comparison to two previously sequenced strains, KACC10331 and MAFF311018, which are highly similar to one another. Results: The PXO99A genome is a single circular chromosome of 5,240,075 bp, considerably longer than the genomes of the other strains (4,941,439 bp and 4,940,217 bp, respectively), and it contains 5083 protein-coding genes, including 87 not found in KACC10331 or MAFF311018. PXO99A contains a greater number of virulence-associated transcription activator-like effector genes and has at least ten major chromosomal rearrangements relative to KACC10331 and MAFF311018. PXO99A contains numerous copies of diverse insertion sequence elements, members of which are associated with 7 out of 10 of the major rearrangements. A rapidly-evolving CRISPR (clustered regularly interspaced short palindromic repeats) region contains evidence of dozens of phage infections unique to the PXO99A lineage. PXO99A also contains a unique, near-perfect tandem repeat of 212 kilobases close to the replication terminus. Conclusion: Our results provide striking evidence of genome plasticity and rapid evolution within Xanthomonas oryzae pv. oryzae. The comparisons point to sources of genomic variation and candidates for strain-specific adaptations of this pathogen that help to explain the extraordinary diversity of Xanthomonas oryzae pv. oryzae genotypes and races that have been isolated from around the world. PMID:18452608

  5. The Sudden Dominance of bla CTX–M Harbouring Plasmids in Shigella spp. Circulating in Southern Vietnam

    PubMed Central

    Nhu, Nguyen Thi Khanh; Vinh, Ha; Nga, Tran Vu Thieu; Stabler, Richard; Duy, Pham Thanh; Thi Minh Vien, Le; van Doorn, H. Rogier; Cerdeño-Tárraga, Ana; Thomson, Nicholas; Campbell, James; Van Minh Hoang, Nguyen; Thi Thu Nga, Tran; Minh, Pham Van; Thuy, Cao Thu; Wren, Brendan; Farrar, Jeremy; Baker, Stephen

    2010-01-01

    Background: Plasmid mediated antimicrobial resistance in the Enterobacteriaceae is a global problem. The rise of CTX-M class extended spectrum beta lactamases (ESBLs) has been well documented in industrialized countries. Vietnam is representative of a typical transitional middle income country where the spectrum of infectious diseases combined with the spread of drug resistance is shifting and bringing new healthcare challenges. Methodology: We collected hospital admission data from the pediatric population attending the hospital for tropical diseases in Ho Chi Minh City with Shigella infections. Organisms were cultured from all enrolled patients and subjected to antimicrobial susceptibility testing. Those that were ESBL positive were subjected to further investigation. These investigations included PCR amplification for common ESBL genes, plasmid investigation, conjugation, microarray hybridization and DNA sequencing of a bla CTX-M encoding plasmid. Principal Findings: We show that two different bla CTX-M genes are circulating in this bacterial population in this location. The sequence of one of the ESBL plasmids shows that rather than the gene being integrated into a preexisting MDR plasmid, the bla CTX-M gene is located on a relatively simple conjugative plasmid. The sequenced plasmid (pEG356) carried the bla CTX-M-24 gene on an ISEcp1 element and demonstrated considerable sequence homology with other IncFI plasmids. Significance: The rapid dissemination, spread of antimicrobial resistance and changing population of Shigella spp. concurrent with economic growth are pertinent to many other countries undergoing similar development. Third generation cephalosporins are commonly used empiric antibiotics in Ho Chi Minh City. We recommend that these agents should not be considered for therapy of dysentery in this setting. PMID:20544028

  6. Diversity of diazotrophic gut inhabitants of pikas (Ochotonidae) revealed by PCR-DGGE analysis.

    PubMed

    Kizilova, A K; Kravchenko, I K

    2014-01-01

    Diazotrophic gut symbionts are considered to act as nitrogen providers for their hosts, as was shown for various termite species. Although the diet of lagomorphs, like pikas or rabbits, is very poor in nitrogen and energy, their fecal matter contains 30-40% of protein. Since our hypothesis was that pikas maintained a diazotrophic consortium in their gastrointestinal tract, we conducted the first investigation of microbial diversity in pika guts. We obtained gut samples from animals of several Ochotona species, O. hyperborea (Northern pika), O. mantchurica (Manchurian pika), and O. dauurica (Daurian pika), in order to retrieve and compare the nitrogen-fixing communities of different pika species. The age and gender of the animals were taken into consideration. We amplified 320-bp long fragments of the nifH gene using the DNA extracted directly from the colon and cecum samples of pika's gut, resolved them by DGGE, and performed phylogenetic reconstruction of 51 sequences obtained from excised bands. No significant difference was detected between the nitrogen-fixing gut inhabitants of different pika species. NifH sequences fell into two clusters. The first cluster contained the sequences affiliated with NifH Cluster I (Zehr et al., 2003) with similarity to Sphingomonas sp., Bradyrhizobium sp., and various uncultured bacteria from soil and rhizosphere. Sequences from the second group were related to Treponema sp., Fibrobacter succinogenes, and uncultured clones from the guts of various termites and belonged to NifH Cluster III. We suggest that diazotrophic organisms from the second cluster are genuine endosymbionts of pikas and provide nitrogen for further synthesis processes thus allowing these animals not to be short of protein.

  7. Molecular beacon sequence design algorithm.

    PubMed

    Monroe, W Todd; Haselton, Frederick R

    2003-01-01

    A method based on Web-based tools is presented to design optimally functioning molecular beacons. Molecular beacons, fluorogenic hybridization probes, are a powerful tool for the rapid and specific detection of a particular nucleic acid sequence. However, their synthesis costs can be considerable. Since molecular beacon performance is based on its sequence, it is imperative to rationally design an optimal sequence before synthesis. The algorithm presented here uses simple Microsoft Excel formulas and macros to rank candidate sequences. This analysis is carried out using mfold structural predictions along with other free Web-based tools. For smaller laboratories where molecular beacons are not the focus of research, the public domain algorithm described here may be usefully employed to aid in molecular beacon design.

  8. The Social Meaning of Leisure in Uganda and America.

    ERIC Educational Resources Information Center

    Crandall, Rich; Thompson, Richard W.

    1978-01-01

    This paper analyzes cross-culturally the importance of social contact for leisure. The general findings of considerable similarity in evaluating preferences and the importance of social considerations provide a basis for preliminary comparisons and suggest that similar factors can affect leisure preferences in different cultural settings.…

  9. Identification of Mycobacterium spp. of veterinary importance using rpoB gene sequencing

    PubMed Central

    2011-01-01

    Background: Studies conducted on Mycobacterium spp. isolated from human patients indicate that sequencing of a 711 bp portion of the rpoB gene can be useful in assigning a species identity, particularly for members of the Mycobacterium avium complex (MAC). Given that MAC are important pathogens in livestock, companion animals, and zoo/exotic animals, we were interested in evaluating the use of rpoB sequencing for identification of Mycobacterium isolates of veterinary origin. Results: A total of 386 isolates, collected from 2008 to June 2011 from 378 animals (amphibians, reptiles, birds, and mammals), underwent PCR and sequencing of a ~711 bp portion of the rpoB gene; 310 isolates (80%) were identified to the species level based on similarity at ≥ 98% with a reference sequence. The remaining 76 isolates (20%) displayed < 98% similarity with reference sequences and were assigned to a clade based on their location in a neighbor-joining tree containing reference sequences. For a subset of 236 isolates that received both 16S rRNA and rpoB sequencing, 167 (70%) displayed a similar species/clade assignation for both sequencing methods. For the remaining 69 isolates, species/clade identities were different with each sequencing method. Mycobacterium avium subsp. hominissuis was the species most frequently isolated from specimens from pigs, cervids, companion animals, cattle, and exotic/zoo animals. Conclusions: rpoB sequencing proved useful in identifying Mycobacterium isolates of veterinary origin to clade, species, or subspecies levels, particularly for assemblages (such as the MAC) where 16S rRNA sequencing alone is not adequate to demarcate these taxa. rpoB sequencing can represent a cost-effective identification tool suitable for routine use in the veterinary diagnostic laboratory. PMID:22118247
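
    The decision rule in this abstract (species-level identification at ≥ 98% similarity to a reference, otherwise clade-level placement) can be sketched as a small function. This is an illustration under stated assumptions: the similarity values would come from an alignment tool that is not shown, the function name `assign_identity` and the example percentages are hypothetical, and the real clade assignment used a neighbor-joining tree rather than the nearest reference returned here.

```python
def assign_identity(hits, species_cutoff=98.0):
    """Classify an isolate from its percent similarities to reference sequences.

    hits: dict mapping reference name -> percent similarity (0-100).
    Returns (level, best_reference, best_percent), where level is "species"
    when the best hit reaches the cutoff and "clade" otherwise. For "clade",
    best_reference is only the nearest reference; the study placed such
    isolates using a neighbor-joining tree, which is not reproduced here.
    """
    if not hits:
        raise ValueError("no reference hits supplied")
    best_ref, best_pct = max(hits.items(), key=lambda kv: kv[1])
    level = "species" if best_pct >= species_cutoff else "clade"
    return level, best_ref, best_pct


# Hypothetical similarity values for two isolates.
print(assign_identity({"M. avium": 99.2, "M. intracellulare": 95.0}))
# ('species', 'M. avium', 99.2)
print(assign_identity({"M. avium": 97.9, "M. intracellulare": 96.4}))
# ('clade', 'M. avium', 97.9)
```

    Applied to the reported data, this rule splits the 386 isolates into the 310 identified to species level and the 76 assigned to a clade.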

  10. Characterization of minimal sequences associated with self-similar interval exchange maps

    NASA Astrophysics Data System (ADS)

    Cobo, Milton; Gutiérrez-Romo, Rodolfo; Maass, Alejandro

    2018-04-01

    The construction of affine interval exchange maps (IEMs) with wandering intervals that are semi-conjugate to a given self-similar IEM is strongly related to the existence of the so-called minimal sequences associated with local potentials, which are certain elements of the substitution subshift arising from the given IEM. In this article, under the condition called unique representation property, we characterize such minimal sequences for potentials coming from non-real eigenvalues of the substitution matrix. We also give conditions on the slopes of the affine extensions of a self-similar IEM that determine whether it exhibits a wandering interval or not.

  11. Genome Sequences of Akhmeta Virus, an Early Divergent Old World Orthopoxvirus.

    PubMed

    Gao, Jinxin; Gigante, Crystal; Khmaladze, Ekaterine; Liu, Pengbo; Tang, Shiyuyun; Wilkins, Kimberly; Zhao, Kun; Davidson, Whitni; Nakazawa, Yoshinori; Maghlakelidze, Giorgi; Geleishvili, Marika; Kokhreidze, Maka; Carroll, Darin S; Emerson, Ginny; Li, Yu

    2018-05-12

    Annotated whole genome sequences of three isolates of the Akhmeta virus (AKMV), a novel species of orthopoxvirus (OPXV), isolated from the Akhmeta and Vani regions of the country Georgia, are presented and discussed. The AKMV genome is similar in genomic content and structure to that of the cowpox virus (CPXV), but a lower sequence identity was found between AKMV and Old World OPXVs than between other known species of Old World OPXVs. Phylogenetic analysis showed that AKMV diverged prior to other Old World OPXVs. AKMV isolates formed a monophyletic clade in the OPXV phylogeny, yet the sequence variability between AKMV isolates was higher than between the monkeypox virus strains in the Congo basin and West Africa. An AKMV isolate from Vani contained an approximately 6 kb sequence in the left terminal region that shared a higher similarity with CPXV than with other AKMV isolates, whereas the rest of the genome was most similar to AKMV, suggesting recombination between AKMV and CPXV in a region containing several host range and virulence genes.

  12. Phonotactics, Neighborhood Activation, and Lexical Access for Spoken Words

    PubMed Central

    Vitevitch, Michael S.; Luce, Paul A.; Pisoni, David B.; Auer, Edward T.

    2012-01-01

    Probabilistic phonotactics refers to the relative frequencies of segments and sequences of segments in spoken words. Neighborhood density refers to the number of words that are phonologically similar to a given word. Despite a positive correlation between phonotactic probability and neighborhood density, nonsense words with high probability segments and sequences are responded to more quickly than nonsense words with low probability segments and sequences, whereas real words occurring in dense similarity neighborhoods are responded to more slowly than real words occurring in sparse similarity neighborhoods. This contradiction may be resolved by hypothesizing that effects of probabilistic phonotactics have a sublexical focus and that effects of similarity neighborhood density have a lexical focus. The implications of this hypothesis for models of spoken word recognition are discussed. PMID:10433774

  13. Protein-protein interaction network-based detection of functionally similar proteins within species.

    PubMed

    Song, Baoxing; Wang, Fen; Guo, Yang; Sang, Qing; Liu, Min; Li, Dengyun; Fang, Wei; Zhang, Deli

    2012-07-01

    Although functionally similar proteins across species have been widely studied, functionally similar proteins within species showing low sequence similarity have not been examined in detail. Identification of these proteins is of significant importance for understanding biological functions, evolution of protein families, progression of co-evolution, convergent evolution, and other questions that cannot be addressed by detecting functionally similar proteins across species. Here, we explored a method of detecting functionally similar proteins within species based on graph theory. After denoting protein-protein interaction networks using graphs, we split the graphs into subgraphs using the 1-hop method. Proteins with functional similarities in a species were detected using a method of modified shortest path to compare these subgraphs and to find the eligible optimal results. Using seven protein-protein interaction networks and this method, some functionally similar proteins with low sequence similarity that cannot be detected by sequence alignment were identified. By analyzing the results, we found that, sometimes, it is difficult to separate homologous from convergent evolution. Evaluation of the performance of our method by gene ontology term overlap showed that the precision of our method was excellent. Copyright © 2012 Wiley Periodicals, Inc.
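
    The subgraph-splitting step in the record above can be sketched as follows. The abstract does not define the "1-hop method", so this sketch assumes it means taking each protein together with its direct interaction partners; the function names and the toy edge list are hypothetical, and the modified shortest-path comparison is not reproduced.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (protein, protein) interaction pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-interactions in this sketch
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def one_hop_subgraph(adj, center):
    """Return the subgraph induced by `center` and its direct neighbors."""
    nodes = {center} | adj.get(center, set())
    # Keep only edges whose two endpoints both lie inside the neighborhood.
    return {n: adj.get(n, set()) & nodes for n in nodes}


# Toy interaction network (hypothetical protein names).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
adj = build_graph(edges)
sub_a = one_hop_subgraph(adj, "A")
print(sorted(sub_a))
```

    Here the neighborhood of A contains A, B and C; D is two hops away and is excluded.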

  14. 32 CFR 179.7 - Sequencing.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... and social factors. (3) Economic factors, including economic considerations pertaining to... alternatives to responses that entail significant capital investments, a lengthy period of operation, or costly...

  15. 32 CFR 179.7 - Sequencing.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... and social factors. (3) Economic factors, including economic considerations pertaining to... alternatives to responses that entail significant capital investments, a lengthy period of operation, or costly...

  16. Fibonacci chain polynomials: Identities from self-similarity

    NASA Technical Reports Server (NTRS)

    Lang, Wolfdieter

    1995-01-01

    Fibonacci chains are special diatomic, harmonic chains with uniform nearest neighbor interaction and two kinds of atoms (mass-ratio r) arranged according to the self-similar binary Fibonacci sequence ABAABABA..., which is obtained by repeated substitution of A yields AB and B yields A. The implications of the self-similarity of this sequence for the associated orthogonal polynomial systems which govern these Fibonacci chains with fixed mass-ratio r are studied.
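
    The substitution rule in the record above (A yields AB, B yields A) can be run directly. This is a minimal sketch; the function name and the choice of "A" as the starting word are assumptions made for illustration.

```python
def fibonacci_word(iterations: int, seed: str = "A") -> str:
    """Apply the substitution A -> AB, B -> A to `seed` the given number of times."""
    rules = {"A": "AB", "B": "A"}
    word = seed
    for _ in range(iterations):
        # Every letter is rewritten simultaneously in each pass.
        word = "".join(rules[letter] for letter in word)
    return word


for n in range(6):
    print(n, fibonacci_word(n))
```

    Four passes starting from "A" give ABAABABA, the prefix quoted in the abstract. Each word is a prefix of the next one, and the word lengths (1, 2, 3, 5, 8, 13) follow the Fibonacci numbers, which is one way to see the self-similarity.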

  17. Characterization of kinetoplast DNA from Phytomonas serpens.

    PubMed

    Sá-Carvalho, D; Perez-Morga, D; Traub-Cseko, Y M

    1993-01-01

    The restriction enzyme digestion of kinetoplast DNA from four Phytomonas serpens isolates shows an overall similar band pattern. One minicircle from isolate 30T was cloned and sequenced, showing low levels of homology but the same general features and organization as described for minicircles of other trypanosomatids. Extensive regions of the minicircle are composed of G and T on the H strand. These regions are very repetitive and similar to regions in a minicircle of Crithidia oncopelti and to telomeric sequences of Saccharomyces cerevisiae. Conserved Sequence Block 3, present in all trypanosomatids, is one nucleotide different from the consensus in P. serpens and provides a basis to differentiate P. serpens from other trypanosomatids. Electron microscopy of kinetoplast DNA revealed a network with organization similar to other trypanosomatids, and the measurement of minicircles confirmed the size of about 1.45 kb of the sequenced minicircle.

  18. A putative carbohydrate-binding domain of the lactose-binding Cytisus sessilifolius anti-H(O) lectin has a similar amino acid sequence to that of the L-fucose-binding Ulex europaeus anti-H(O) lectin.

    PubMed

    Konami, Y; Yamamoto, K; Osawa, T; Irimura, T

    1995-04-01

    The complete amino acid sequence of a lactose-binding Cytisus sessilifolius anti-H(O) lectin II (CSA-II) was determined using a protein sequencer. After digestion of CSA-II with endoproteinase Lys-C or Asp-N, the resulting peptides were purified by reversed-phase high performance liquid chromatography (HPLC) and then subjected to sequence analysis. Comparison of the complete amino acid sequence of CSA-II with the sequences of other leguminous seed lectins revealed regions of extensive homology. The amino acid sequence of a putative carbohydrate-binding domain of CSA-II was found to be similar to those of several anti-H(O) leguminous lectins, especially to that of the L-fucose-binding Ulex europaeus lectin I (UEA-I).

  19. Analysis of HIV-1 intersubtype recombination breakpoints suggests region with high pairing probability may be a more fundamental factor than sequence similarity affecting HIV-1 recombination.

    PubMed

    Jia, Lei; Li, Lin; Gui, Tao; Liu, Siyang; Li, Hanping; Han, Jingwan; Guo, Wei; Liu, Yongjian; Li, Jingyun

    2016-09-21

    With increasing data on HIV-1, a more relevant molecular model describing mechanism details of HIV-1 genetic recombination usually requires upgrades. Currently an incomplete structural understanding of the copy choice mechanism along with several other issues in the field that lack elucidation led us to perform an analysis of the correlation between breakpoint distributions and (1) the probability of base pairing, and (2) intersubtype genetic similarity to further explore structural mechanisms. Near full length sequences of URFs from Asia, Europe, and Africa (one sequence/patient), and representative sequences of worldwide CRFs were retrieved from the Los Alamos HIV database. Their recombination patterns were analyzed by jpHMM in detail. Then the relationships between breakpoint distributions and (1) the probability of base pairing, and (2) intersubtype genetic similarities were investigated. Pearson correlation test showed that all URF groups and the CRF group exhibit the same breakpoint distribution pattern. Additionally, the Wilcoxon two-sample test indicated a significant and inexplicable limitation of recombination in regions with high pairing probability. These regions have been found to be strongly conserved across distinct biological states (i.e., strong intersubtype similarity), and genetic similarity has been determined to be a very important factor promoting recombination. Thus, the results revealed an unexpected disagreement between intersubtype similarity and breakpoint distribution, which were further confirmed by genetic similarity analysis. Our analysis reveals a critical conflict between results from natural HIV-1 isolates and those from HIV-1-based assay vectors in which genetic similarity has been shown to be a very critical factor promoting recombination. These results indicate the region with high-pairing probabilities may be a more fundamental factor affecting HIV-1 recombination than sequence similarity in natural HIV-1 infections. Our findings will be relevant in furthering the understanding of HIV-1 recombination mechanisms.

  20. Comprehensive comparison of three commercial human whole-exome capture platforms.

    PubMed

    Asan; Xu, Yu; Jiang, Hui; Tyler-Smith, Chris; Xue, Yali; Jiang, Tao; Wang, Jiawei; Wu, Mingzhi; Liu, Xiao; Tian, Geng; Wang, Jun; Wang, Jian; Yang, Huangming; Zhang, Xiuqing

    2011-09-28

    Exome sequencing, which allows the global analysis of protein coding sequences in the human genome, has become an effective and affordable approach to detecting causative genetic mutations in diseases. Currently, there are several commercial human exome capture platforms; however, the relative performances of these have not been characterized sufficiently to know which is best for a particular study. We comprehensively compared three platforms: NimbleGen's Sequence Capture Array and SeqCap EZ, and Agilent's SureSelect. We assessed their performance in a variety of ways, including number of genes covered and capture efficacy. Differences that may impact on the choice of platform were that Agilent SureSelect covered approximately 1,100 more genes, while NimbleGen provided better flanking sequence capture. Although all three platforms achieved similar capture specificity of targeted regions, the NimbleGen platforms showed better uniformity of coverage and greater genotype sensitivity at 30- to 100-fold sequencing depth. All three platforms showed similar power in exome SNP calling, including medically relevant SNPs. Compared with genotyping and whole-genome sequencing data, the three platforms achieved a similar accuracy of genotype assignment and SNP detection. Importantly, all three platforms showed similar levels of reproducibility, GC bias and reference allele bias. We demonstrate key differences between the three platforms, particularly the advantages of solution-based over array-based capture and the importance of a large gene target set.

  1. Two new miniature inverted-repeat transposable elements in the genome of the clam Donax trunculus.

    PubMed

    Šatović, Eva; Plohl, Miroslav

    2017-10-01

    Repetitive sequences are important components of eukaryotic genomes that drive their evolution. Among them are different types of mobile elements that share the ability to spread throughout the genome and form interspersed repeats. To broaden the generally scarce knowledge on bivalves at the genome level, in the clam Donax trunculus we described two new non-autonomous DNA transposons, miniature inverted-repeat transposable elements (MITEs), named DTC M1 and DTC M2. Like other MITEs, they are characterized by their small size, their A + T richness, and the presence of terminal inverted repeats (TIRs). DTC M1 and DTC M2 are 261 and 286 bp long, respectively, and in addition to TIRs, both of them contain a long imperfect palindrome sequence in their central parts. These elements are present in complete and truncated versions within the genome of the clam D. trunculus. The two new MITEs share only structural similarity, but lack any nucleotide sequence similarity to each other. In a search for related elements in databases, blast search revealed within the Crassostrea gigas genome a larger element sharing sequence similarity only to DTC M1 in its TIR sequences. The lack of sequence similarity with any previously published mobile elements indicates that DTC M1 and DTC M2 elements may be unique to D. trunculus.
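
    The terminal inverted repeats (TIRs) mentioned in the record above are a structural feature that is easy to check computationally: the end of the element reads as the reverse complement of its start. The sketch below measures the longest perfect TIR of a DNA string. The function names and the toy sequence are hypothetical, and real TIRs are often imperfect, which this sketch does not handle.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def tir_length(seq: str) -> int:
    """Length of the longest perfect terminal inverted repeat of `seq`.

    A TIR of length k means the first k bases equal the reverse
    complement of the last k bases.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    limit = len(seq) // 2  # the two arms may not overlap
    k = 0
    while k < limit and seq[k] == rc[k]:
        k += 1
    return k


# Toy element: a 7-base arm, a 5-base spacer, and the arm's reverse complement.
element = "GGCCAAT" + "TTTTT" + "ATTGGCC"
print(tir_length(element))
```

    For the toy element the function reports a 7-base TIR; a sequence such as a run of A's has none.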

  2. Theory of winds in late-type evolved and pre-main-sequence stars

    NASA Technical Reports Server (NTRS)

    Macgregor, K. B.

    1983-01-01

    Recent observational results confirm that many of the physical processes which are known to occur in the Sun also occur among late-type stars in general. One such process is the continuous loss of mass from a star in the form of a wind. There now exists an abundance of either direct or circumstantial evidence which suggests that most (if not all) stars in the cool portion of the HR diagram possess winds. An attempt is made to assess the current state of theoretical understanding of mass loss from two distinctly different classes of late-type stars: the post-main-sequence giant/supergiant stars and the pre-main-sequence T Tauri stars. Toward this end, the observationally inferred properties of the wind associated with each of the two stellar classes under consideration are summarized and compared against the predictions of existing theoretical models. Although considerable progress has been made in attempting to identify the mechanisms responsible for mass loss from cool stars, many fundamental problems remain to be solved.

  3. Archaeal phylogeny: reexamination of the phylogenetic position of Archaeoglobus fulgidus in light of certain composition-induced artifacts

    NASA Technical Reports Server (NTRS)

    Woese, C. R.; Achenbach, L.; Rouviere, P.; Mandelco, L.

    1991-01-01

    A major and too little recognized source of artifact in phylogenetic analysis of molecular sequence data is compositional difference among sequences. The problem becomes particularly acute when alignments contain ribosomal RNAs from both mesophilic and thermophilic species. Among prokaryotes the latter are considerably higher in G + C content than the former, which often results in artificial clustering of thermophilic lineages and their being placed artificially deep in phylogenetic trees. In this communication we review archaeal phylogeny in the light of this consideration, focusing in particular on the phylogenetic position of the sulfate reducing species Archaeoglobus fulgidus, using both 16S rRNA and 23S rRNA sequences. The analysis shows clearly that the previously reported deep branching of the A. fulgidus lineage (very near the base of the euryarchaeal side of the archaeal tree) is incorrect, and that the lineage actually groups with a previously recognized unit that comprises the Methanomicrobiales and extreme halophiles.

  4. The language faculty that wasn't: a usage-based account of natural language recursion

    PubMed Central

    Christiansen, Morten H.; Chater, Nick

    2015-01-01

    In the generative tradition, the language faculty has been shrinking—perhaps to include only the mechanism of recursion. This paper argues that even this view of the language faculty is too expansive. We first argue that a language faculty is difficult to reconcile with evolutionary considerations. We then focus on recursion as a detailed case study, arguing that our ability to process recursive structure does not rely on recursion as a property of the grammar, but instead emerges gradually by piggybacking on domain-general sequence learning abilities. Evidence from genetics, comparative work on non-human primates, and cognitive neuroscience suggests that humans have evolved complex sequence learning skills, which were subsequently pressed into service to accommodate language. Constraints on sequence learning therefore have played an important role in shaping the cultural evolution of linguistic structure, including our limited abilities for processing recursive structure. Finally, we re-evaluate some of the key considerations that have often been taken to require the postulation of a language faculty. PMID:26379567

  5. The language faculty that wasn't: a usage-based account of natural language recursion.

    PubMed

    Christiansen, Morten H; Chater, Nick

    2015-01-01

    In the generative tradition, the language faculty has been shrinking, perhaps to include only the mechanism of recursion. This paper argues that even this view of the language faculty is too expansive. We first argue that a language faculty is difficult to reconcile with evolutionary considerations. We then focus on recursion as a detailed case study, arguing that our ability to process recursive structure does not rely on recursion as a property of the grammar, but instead emerges gradually by piggybacking on domain-general sequence learning abilities. Evidence from genetics, comparative work on non-human primates, and cognitive neuroscience suggests that humans have evolved complex sequence learning skills, which were subsequently pressed into service to accommodate language. Constraints on sequence learning therefore have played an important role in shaping the cultural evolution of linguistic structure, including our limited abilities for processing recursive structure. Finally, we re-evaluate some of the key considerations that have often been taken to require the postulation of a language faculty.

  6. Complete Genome Sequences of the Carlavirus Sweet potato chlorotic fleck virus from East Timor and Australia

    PubMed Central

    Maina, Solomon; Edwards, Owain R.; de Almeida, Luis; Ximenes, Abel

    2016-01-01

    We present here the first complete genome sequences of Sweet potato chlorotic fleck virus (SPCFV) from sweet potato in Australia and East Timor, and we compare these with four complete SPCFV genomes from South Korea and one from Uganda. The Australian, East Timorese, South Korean, and Ugandan genomes differed considerably from each other. PMID:27231359

  7. Colony-PCR Is a Rapid Method for DNA Amplification of Hyphomycetes

    PubMed Central

    Walch, Georg; Knapp, Maria; Rainer, Georg; Peintner, Ursula

    2016-01-01

    Fungal pure cultures identified with both classical morphological methods and through barcoding sequences are a basic requirement for reliable reference sequences in public databases. Improved techniques for an accelerated DNA barcode reference library construction will result in considerably improved sequence databases covering a wider taxonomic range. Fast, cheap, and reliable methods for obtaining DNA sequences from fungal isolates are, therefore, a valuable tool for the scientific community. Direct colony PCR was already successfully established for yeasts, but has not been evaluated for a wide range of anamorphic soil fungi up to now, and a direct amplification protocol for hyphomycetes without tissue pre-treatment has not been published so far. Here, we present a colony PCR technique directly from fungal hyphae without previous DNA extraction or other prior manipulation. Seven hundred eighty-eight fungal strains from 48 genera were tested with a success rate of 86%. PCR success varied considerably: DNA of fungi belonging to the genera Cladosporium, Geomyces, Fusarium, and Mortierella could be amplified with high success. DNA of soil-borne yeasts was always successfully amplified. Absidia, Mucor, Trichoderma, and Penicillium isolates had noticeably lower PCR success. PMID:29376929

  8. Analysis of sequence repeats of proteins in the PDB.

    PubMed

    Mary Rajathei, David; Selvaraj, Samuel

    2013-12-01

    Internal repeats in protein sequences play a significant role in the evolution of protein structure and function. Applications of different bioinformatics tools help in the identification and characterization of these repeats. In the present study, we analyzed sequence repeats in a non-redundant set of proteins available in the Protein Data Bank (PDB). We used RADAR for detecting internal repeats in a protein, PDBeFOLD for assessing structural similarity, PDBsum for finding functional involvement and Pfam for domain assignment of the repeats in a protein. Through the analysis of sequence repeats, we found that the identity of the sequence repeats falls in the range of 20-40% and that the superimposed structures of most of the sequence repeats maintain similar overall folding. Analysis of sequence repeats at the functional level reveals that most of the sequence repeats are involved in the function of the protein through functionally involved residues in the repeat regions. We also found that sequence repeats in single and two domain proteins often contained conserved sequence motifs for the function of the domain. Copyright © 2013 Elsevier Ltd. All rights reserved.

  9. What is a melody? On the relationship between pitch and brightness of timbre.

    PubMed

    Cousineau, Marion; Carcagno, Samuele; Demany, Laurent; Pressnitzer, Daniel

    2013-01-01

    Previous studies showed that the perceptual processing of sound sequences is more efficient when the sounds vary in pitch than when they vary in loudness. We show here that sequences of sounds varying in brightness of timbre are processed with the same efficiency as pitch sequences. The sounds used consisted of two simultaneous pure tones one octave apart, and the listeners' task was to make same/different judgments on pairs of sequences varying in length (one, two, or four sounds). In one condition, brightness of timbre was varied within the sequences by changing the relative level of the two pure tones. In other conditions, pitch was varied by changing fundamental frequency, or loudness was varied by changing the overall level. In all conditions, only two possible sounds could be used in a given sequence, and these two sounds were equally discriminable. When sequence length increased from one to four, discrimination performance decreased substantially for loudness sequences, but to a smaller extent for brightness sequences and pitch sequences. In the latter two conditions, sequence length had a similar effect on performance. These results suggest that the processes dedicated to pitch and brightness analysis, when probed with a sequence-discrimination task, share unexpected similarities.

  10. Genetic testing for inherited ocular disease: delivering on the promise at last?

    PubMed

    Gillespie, Rachel L; Hall, Georgina; Black, Graeme C

    2014-01-01

    Genetic testing is of increasing clinical utility for diagnosing inherited eye disease. Clarifying a clinical diagnosis is important for accurate estimation of prognosis, facilitating genetic counselling and management of families, and in the future will direct gene-specific therapeutic strategies. Often, precise diagnosis of genetic ophthalmic conditions is complicated by genetic heterogeneity, a difficulty that the so-called 'next-generation sequencing' technologies promise to overcome. Despite considerable counselling and ethical complexities, next-generation sequencing offers to revolutionize clinical practice. This will necessitate considerable adjustment to standard practice but has the power to deliver a personalized approach to genomic medicine for many more patients and enhance the potential for preventing vision loss. © 2013 Royal Australian and New Zealand College of Ophthalmologists.

  11. Fine-Scale Bacterial Beta Diversity within a Complex Ecosystem (Zodletone Spring, OK, USA): The Role of the Rare Biosphere

    PubMed Central

    Youssef, Noha H.; Couger, M. B.; Elshahed, Mostafa S.

    2010-01-01

    Background: The adaptation of pyrosequencing technologies for use in culture-independent diversity surveys allowed for deeper sampling of ecosystems of interest. One extremely well suited area of interest for pyrosequencing-based diversity surveys that has received surprisingly little attention so far is examining fine-scale (e.g. micrometer to millimeter) beta diversity in complex microbial ecosystems. Methodology/Principal Findings: We examined the patterns of fine-scale beta diversity in four adjacent sediment samples (1 mm apart) from the source of an anaerobic sulfide and sulfur rich spring (Zodletone spring) in southwestern Oklahoma, USA. Using pyrosequencing, a total of 292,130 16S rRNA gene sequences were obtained. The beta diversity patterns within the four datasets were examined using various qualitative and quantitative similarity indices. Low levels of beta diversity (high similarity indices) were observed between the four samples at the phylum level. However, at a putative species (OTU0.03) level, higher levels of beta diversity (lower similarity indices) were observed. Further examination of beta diversity patterns within dominant and rare members of the community indicated that at the putative species level, beta diversity is much higher within rare members of the community. Finally, sub-classification of rare members of Zodletone spring community based on patterns of novelty and uniqueness, and further examination of fine-scale beta diversity of each of these subgroups indicated that members of the community that are unique but non-novel showed the highest beta diversity within these subgroups of the rare biosphere. Conclusions/Significance: The results demonstrate the occurrence of high inter-sample diversity within seemingly identical samples from a complex habitat. We reason that such unexpected diversity should be taken into consideration when exploring gamma diversity of various ecosystems, as well as planning for sequencing-intensive metagenomic surveys of highly complex ecosystems. PMID:20865128
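
    The record above refers to qualitative and quantitative similarity indices without naming them. As an illustration only, the sketch below assumes two commonly used ones: the Jaccard index (qualitative, presence/absence) and the Bray-Curtis similarity (quantitative, abundance-based). The function names, the OTU labels, and the counts are hypothetical and are not data from the study.

```python
def jaccard_similarity(a: dict, b: dict) -> float:
    """Shared OTUs divided by all OTUs observed in either sample (presence/absence)."""
    present_a = {otu for otu, count in a.items() if count > 0}
    present_b = {otu for otu, count in b.items() if count > 0}
    union = present_a | present_b
    if not union:
        return 1.0  # two empty samples are treated as identical
    return len(present_a & present_b) / len(union)


def bray_curtis_similarity(a: dict, b: dict) -> float:
    """1 minus the Bray-Curtis dissimilarity, computed from OTU abundances."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 1.0
    # Sum, over all OTUs, the abundance the two samples have in common.
    shared = sum(min(a.get(otu, 0), b.get(otu, 0)) for otu in set(a) | set(b))
    return 2 * shared / total


# Toy OTU count tables for two samples (hypothetical values).
sample_1 = {"otu1": 10, "otu2": 5, "otu3": 0}
sample_2 = {"otu1": 10, "otu2": 0, "otu4": 5}
print(jaccard_similarity(sample_1, sample_2))
print(bray_curtis_similarity(sample_1, sample_2))
```

    Lower index values correspond to higher beta diversity between samples, which matches the pairing of "higher levels of beta diversity (lower similarity indices)" in the abstract.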

  12. TRANSAT-- method for detecting the conserved helices of functional RNA structures, including transient, pseudo-knotted and alternative structures.

    PubMed

    Wiebe, Nicholas J P; Meyer, Irmtraud M

    2010-06-24

    The prediction of functional RNA structures has attracted increased interest, as it allows us to study the potential functional roles of many genes. RNA structure prediction methods, however, assume that there is a unique functional RNA structure and also do not predict functional features required for in vivo folding. In order to understand how functional RNA structures form in vivo, we require sophisticated experiments or reliable prediction methods. So far, there exist only a few, experimentally validated transient RNA structures. On the computational side, there exist several computer programs which aim to predict the co-transcriptional folding pathway in vivo, but these make a range of simplifying assumptions and do not capture all features known to influence RNA folding in vivo. We want to investigate if evolutionarily related RNA genes fold in a similar way in vivo. To this end, we have developed a new computational method, Transat, which detects conserved helices of high statistical significance. We introduce the method, present a comprehensive performance evaluation and show that Transat is able to predict the structural features of known reference structures including pseudo-knotted ones as well as those of known alternative structural configurations. Transat can also identify unstructured sub-sequences bound by other molecules and provides evidence for new helices which may define folding pathways, supporting the notion that homologous RNA sequences not only assume a similar reference RNA structure, but also fold similarly. Finally, we show that the structural features predicted by Transat differ from those assuming thermodynamic equilibrium. Unlike the existing methods for predicting folding pathways, our method works in a comparative way. This has the disadvantage of not being able to predict features as a function of time, but has the considerable advantage of highlighting conserved features and of not requiring a detailed knowledge of the cellular environment.

  13. Transcriptome analysis of genes related to resistance against powdery mildew in wheat-Thinopyrum alien addition disomic line germplasm SN6306.

    PubMed

    Li, Quanquan; Niu, Zubiao; Bao, Yinguang; Tian, Qiuju; Wang, Honggang; Kong, Lingrang; Feng, Deshun

    2016-09-15

    Wheat powdery mildew, which is mainly caused by Blumeria graminis f. sp. tritici (Bgt), seriously damages wheat production. The wheat-Thinopyrum intermedium alien addition disomic line germplasm SN6306, being one of the important sources of genes for wheat resistance, is highly resistant to Bgt E09 and to many other powdery mildew physiological races. However, knowledge on the resistance mechanism of SN6306 remains limited. Our study employed high-throughput RNA sequencing based on next-generation sequencing technology (Illumina) to obtain an overview of the transcriptome characteristics of SN6306 and its parent wheat Yannong 15 (YN15) during Bgt infection. The sequencing generated 104,773 unigenes, 9909 of which showed varied expression levels. Among the 9909 unigenes, 1678 unigenes showed 0 reads in YN15. The expression levels in Bgt-inoculated SN6306 and YN15 of exactly 39 unigenes that showed 0 or considerably low reads in YN15 were validated to identify the genes involved in Bgt resistance. Among the 39 unigenes, 12 unigenes were upregulated in SN6306 by 3-45 times. These unigenes mainly encoded kinase, synthase, proteases, and signal transduction proteins, which may play an important role in the resistance against Bgt. To confirm whether the unigenes that showed 0 reads in YN15 are really unique to SN6306, 8 unigenes were cloned and sequenced. Results showed that the selected unigenes are more similar to SN6306 and Th. intermedium than to the wheat cultivar YN15. The sequencing results further confirmed that the unigenes showing 0 reads in YN15 are unique to SN6306 and are most likely derived from Th. intermedium (Host) Nevski. Thus, the genes from Th. intermedium most probably conferred the resistance of SN6306 to Bgt. Copyright © 2016 Elsevier B.V. All rights reserved.

  14. Molecular characterization of a novel orthomyxovirus from rainbow and steelhead trout (Oncorhynchus mykiss)

    USGS Publications Warehouse

    Batts, William N.; LaPatra, Scott E.; Katona, Ryan; Leis, Eric; Fei Fan Ng, Terry; Bruieuc, Marine S.O.; Breyta, Rachel; Purcell, Maureen; Waltzek, Thomas B.; Delwart, Eric; Winton, James

    2017-01-01

    A novel virus, rainbow trout orthomyxovirus (RbtOV), was isolated in 1997 and again in 2000 from commercially-reared rainbow trout (Oncorhynchus mykiss) in Idaho, USA. The virus grew optimally in the CHSE-214 cell line at 15°C producing a diffuse cytopathic effect; however, juvenile rainbow trout exposed to cell culture-grown virus showed no mortality or gross pathology. Electron microscopy of preparations from infected cell cultures revealed the presence of typical orthomyxovirus particles. The complete genome of RbtOV is comprised of eight linear segments of single-stranded, negative-sense RNA having highly conserved 5′ and 3′-terminal nucleotide sequences. Another virus isolated in 2014 from steelhead trout (also O. mykiss) in Wisconsin, USA, and designated SttOV was found to have eight genome segments with high amino acid sequence identities (89–99%) to the corresponding genes of RbtOV, suggesting these new viruses are isolates of the same virus species and may be more widespread than currently realized. The new isolates had the same genome segment order and the closest pairwise amino acid sequence identities of 16–42% with Infectious salmon anemia virus (ISAV), the type species and currently only member of the genus Isavirus in the family Orthomyxoviridae. However, pairwise comparisons of the predicted amino acid sequences of the 10 RbtOV and SttOV proteins with orthologs from representatives of the established orthomyxoviral genera and a phylogenetic analysis using the PB1 protein showed that while RbtOV and SttOV clustered most closely with ISAV, they diverged sufficiently to merit consideration as representatives of a novel genus. A set of PCR primers was designed using conserved regions of the PB1 gene to produce amplicons that may be sequenced for identification of similar fish orthomyxoviruses in the future.

  15. SU-F-T-540: Comprehensive Fluence Delivery Optimization with Multileaf Collimation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Weppler, S; Villarreal-Barajas, J; Department of Medical Physics, Tom Baker Cancer Center, Calgary, Alberta

    2016-06-15

    Purpose: Multileaf collimator (MLC) leaf sequencing is performed via commercial black-box implementations, on which a user has limited to no access. We have developed an explicit, generic MLC sequencing model to serve as a tool for future investigations of fluence map optimization, fluence delivery optimization, and rotational collimator delivery methods. Methods: We have developed a novel, comprehensive model to effectively account for a variety of transmission and penumbra effects previously treated on an ad hoc basis in the literature. As the model is capable of quantifying a variety of effects, we utilize the asymmetric leakage intensity across each leaf to deliver fluence maps with pixel size smaller than the narrowest leaf width. Developed using linear programming and mixed integer programming formulations, the model is implemented using state of the art open-source solvers. To demonstrate the versatility of the algorithm, a graphical user interface (GUI) was developed in MATLAB capable of accepting custom leaf specifications and transmission parameters. As a preliminary proof-of-concept, we have sequenced the leaves of a Varian 120 Leaf Millennium MLC for five prostate cancer patient fields and one head and neck field. Predetermined fluence maps have been processed by data smoothing methods to obtain pixel sizes of 2.5 cm². The quality of output was analyzed using computer simulations. Results: For the prostate fields, an average root mean squared error (RMSE) of 0.82 and gamma (0.5mm/0.5%) of 91.4% were observed compared to RMSE and gamma (0.5mm/0.5%) values of 7.04 and 34.0% when the leakage considerations were omitted. Similar results were observed for the head and neck case. Conclusion: A model to sequence MLC leaves to optimality has been proposed. Future work will involve extensive testing and evaluation of the method on clinical MLCs and comparison with black-box leaf sequencing algorithms currently used by commercial treatment planning systems.

  16. Paleomagnetic and Geochronologic Data from Central Asia: Inferences for Early Paleozoic Tectonic Evolution and Timing of Worldwide Glacial Events

    NASA Astrophysics Data System (ADS)

    Gregory, L. C.; Meert, J. G.; Levashova, N.; Grice, W. C.; Gibsher, A.; Rybanin, A.

    2007-12-01

    The Neoproterozoic to early Paleozoic Ural-Mongol belt that runs through Central Asia is crucial for determining the enigmatic amalgamation of microcontinents that make up the Eurasian subcontinent. Two unique models have been proposed for the evolution of the Ural-Mongol belt. One involves a complex assemblage of cratonic blocks that have collided and rifted apart during diachronous opening and closing of Neoproterozoic to Devonian aged ocean basins. The opposing model of Sengor and Natal'in proposes a long-standing volcanic arc system that connected Central Asian blocks with the Baltica continent. The Aktau-Mointy and Dzabkhan microcontinents in Kazakhstan and Central Mongolia make up the central section of the Ural-Mongol belt, and both contain glacial sequences characteristic of the hypothesized snowball earth event. These worldwide glaciations are currently under considerable debate, and paleomagnetic data from these microcontinents are a useful contribution to the snowball controversy. We have sampled volcanic and sedimentary sequences in Central Mongolia, Kazakhstan and Kyrgyzstan for paleomagnetic and geochronologic study. U-Pb data, 13C curves and abundant fossil records place age constraints on sequences that contain glacial deposits of the hypothesized snowball earth events. Carbonates in the Zavkhan Basin in Mongolia are likely remagnetized, but fossil evidence within the sequence suggests a readjusted age control on two glacial events that were previously labeled as Sturtian and Marinoan. U-Pb ages from both Kazakhstan and Mongolian volcanic sequences imply a similar evolution history of the areas as part of the Ural-Mongol fold belt, and these ages paired with paleomagnetic and 13C records have important tectonic implications. We will present these data in order to place better constraints on the Precambrian to early Paleozoic tectonic evolution of Central Asia and the timing of glacial events recorded in the area.

  17. Changes in environmental conditions as the cause of the marine biota Great Mass Extinction at the Triassic-Jurassic boundary

    NASA Astrophysics Data System (ADS)

    Barash, M. S.

    2016-02-01

    In the interval of the Triassic-Jurassic boundary, 80% of the marine species became extinct. Four main hypotheses about the causes of this mass extinction are considered: volcanism, climatic oscillations, sea level variations accompanied by anoxia, and asteroid impact events. The extinction was triggered by an extensive flooding of basalts in the Central Atlantic Magmatic Province. Furthermore, a number of meteoritic craters have been found. Under the effect of cosmic causes, two main sequences of events developed on the Earth: terrestrial ones, leading to intensive volcanism, and cosmic ones (asteroid impacts). Their aftermaths, however, were similar in terms of the chemical compounds and aerosols released. As a consequence, the greenhouse effect, dimming of the atmosphere (impeding photosynthesis), ocean stagnation, and anoxia emerged. Then, biological productivity decreased and food chains were destroyed. Thus, the entire ecosystem was disturbed and a considerable part of the biota became extinct.

  18. Mitochondrial genomes of parasitic flatworms.

    PubMed

    Le, Thanh H; Blair, David; McManus, Donald P

    2002-05-01

    Complete or near-complete mitochondrial genomes are now available for 11 species or strains of parasitic flatworms belonging to the Trematoda and the Cestoda. The organization of these genomes is not strikingly different from those of other eumetazoans, although one gene (atp8) commonly found in other phyla is absent from flatworms. The gene order in most flatworms has similarities to those seen in higher protostomes such as annelids. However, the gene order has been drastically altered in Schistosoma mansoni, which obscures this possible relationship. Among the sequenced taxa, base composition varies considerably, creating potential difficulties for phylogeny reconstruction. Long non-coding regions are present in all taxa, but these vary in length from only a few hundred to approximately 10000 nucleotides. Among Schistosoma spp., the long non-coding regions are rich in repeats and length variation among individuals is known. Data from mitochondrial genomes are valuable for studies on species identification, phylogenies and biogeography.

  19. Super high compression of line drawing data

    NASA Technical Reports Server (NTRS)

    Cooper, D. B.

    1976-01-01

    Models which can be used to accurately represent the type of line drawings which occur in teleconferencing and transmission for remote classrooms and which permit considerable data compression were described. The objective was to encode these pictures in binary sequences of shortest length but such that the pictures can be reconstructed without loss of important structure. It was shown that exploitation of reasonably simple structure permits compressions in the range of 30-100 to 1. When dealing with highly stylized material such as electronic or logic circuit schematics, it is unnecessary to reproduce configurations exactly. Rather, the symbols and configurations must be understood and be reproduced, but one can use fixed font symbols for resistors, diodes, capacitors, etc. Compression of pictures of natural phenomena such as can be realized by taking a similar approach, or essentially zero error reproducibility can be achieved but at a lower level of compression.

  20. Use of a FORTH-based PROLOG for real-time expert systems. 1: Spacelab life sciences experiment application

    NASA Technical Reports Server (NTRS)

    Paloski, William H.; Odette, Louis L.; Krever, Alfred J.; West, Allison K.

    1987-01-01

    A real-time expert system is being developed to serve as the astronaut interface for a series of Spacelab vestibular experiments. This expert system is written in a version of Prolog that is itself written in Forth. The Prolog contains a predicate that can be used to execute Forth definitions; thus, the Forth becomes an embedded real-time operating system within the Prolog programming environment. The expert system consists of a data base containing detailed operational instructions for each experiment, a rule base containing Prolog clauses used to determine the next step in an experiment sequence, and a procedure base containing Prolog goals formed from real-time routines coded in Forth. In this paper, we demonstrate and describe the techniques and considerations used to develop this real-time expert system, and we conclude that Forth-based Prolog provides a viable implementation vehicle for this and similar applications.

  1. Gait Analysis Methods for Rodent Models of Osteoarthritis

    PubMed Central

    Jacobs, Brittany Y.; Kloefkorn, Heidi E.; Allen, Kyle D.

    2014-01-01

    Patients with osteoarthritis (OA) primarily seek treatment due to pain and disability, yet the primary endpoints for rodent OA models tend to be histological measures of joint destruction. The discrepancy between clinical and preclinical evaluations is problematic, given that radiographic evidence of OA in humans does not always correlate to the severity of patient-reported symptoms. Recent advances in behavioral analyses have provided new methods to evaluate disease sequelae in rodents. Of particular relevance to rodent OA models are methods to assess rodent gait. While obvious differences exist between quadrupedal and bipedal gait sequences, the gait abnormalities seen in humans and in rodent OA models reflect similar compensatory behaviors that protect an injured limb from loading. The purpose of this review is to describe these compensations and current methods used to assess rodent gait characteristics, while detailing important considerations for the selection of gait analysis methods in rodent OA models. PMID:25160712

  2. PipeOnline 2.0: automated EST processing and functional data sorting.

    PubMed

    Ayoubi, Patricia; Jin, Xiaojing; Leite, Saul; Liu, Xianghui; Martajaja, Jeson; Abduraham, Abdurashid; Wan, Qiaolan; Yan, Wei; Misawa, Eduardo; Prade, Rolf A

    2002-11-01

    Expressed sequence tags (ESTs) are generated and deposited in the public domain, as redundant, unannotated, single-pass reactions, with virtually no biological content. PipeOnline automatically analyses and transforms large collections of raw DNA-sequence data from chromatograms or FASTA files by calling the quality of bases, screening and removing vector sequences, assembling and rewriting consensus sequences of redundant input files into a unigene EST data set and finally through translation, amino acid sequence similarity searches, annotation of public databases and functional data. PipeOnline generates an annotated database, retaining the processed unigene sequence, clone/file history, alignments with similar sequences, and proposed functional classification, if available. Functional annotation is automatic and based on a novel method that relies on homology of amino acid sequence multiplicity within GenBank records. Records are examined through a function ordered browser or keyword queries with automated export of results. PipeOnline offers customization for individual projects (MyPipeOnline), automated updating and alert service. PipeOnline is available at http://stress-genomics.org.

  3. galaxie--CGI scripts for sequence identification through automated phylogenetic analysis.

    PubMed

    Nilsson, R Henrik; Larsson, Karl-Henrik; Ursing, Björn M

    2004-06-12

    The prevalent use of similarity searches like BLAST to identify sequences and species implicitly assumes the reference database to be of extensive sequence sampling. This is often not the case, restraining the correctness of the outcome as a basis for sequence identification. Phylogenetic inference outperforms similarity searches in retrieving correct phylogenies and consequently sequence identities, and a project was initiated to design a freely available script package for sequence identification through automated Web-based phylogenetic analysis. Three CGI scripts were designed to facilitate qualified sequence identification from a Web interface. Query sequences are aligned to pre-made alignments or to alignments made by ClustalW with entries retrieved from a BLAST search. The subsequent phylogenetic analysis is based on the PHYLIP package for inferring neighbor-joining and parsimony trees. The scripts are highly configurable. A service installation and a version for local use are found at http://andromeda.botany.gu.se/galaxiewelcome.html and http://galaxie.cgb.ki.se

  4. Sequence information signal processor for local and global string comparisons

    DOEpatents

    Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

    1997-01-01

    A sequence information signal processing integrated circuit chip is designed to perform high speed calculation of a dynamic programming algorithm based upon the algorithm defined by Waterman and Smith. The signal processing chip of the present invention is designed to be a building block of a linear systolic array, the performance of which can be increased by connecting additional sequence information signal processing chips to the array. The chip provides a high speed, low cost linear array processor that can locate highly similar global sequences or segments thereof such as contiguous subsequences from two different DNA or protein sequences. The chip is implemented in a preferred embodiment using CMOS VLSI technology to provide the equivalent of about 400,000 transistors or 100,000 gates. Each chip provides 16 processing elements, and is designed to provide 16 bit, two's complement operation for maximum score precision of between -32,768 and +32,767. It is designed to provide a comparison between sequences as long as 4,194,304 elements without external software and between sequences of unlimited numbers of elements with the aid of external software. Each sequence can be assigned different deletion and insertion weight functions. Each processor is provided with a similarity measure device which is independently variable. Thus, each processor can contribute to maximum value score calculation using a different similarity measure.
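
    For readers unfamiliar with the dynamic programming recurrence that such a chip accelerates, the following is a minimal software sketch of local alignment scoring in the style of Smith and Waterman. It is not the patented hardware design: the function name `smith_waterman`, the linear gap penalty, and the default match/mismatch/gap values are illustrative assumptions, and the per-processor weight functions mentioned in the abstract are not modeled.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Assumed scoring (not from the patent): fixed match/mismatch scores
    and a linear gap penalty. Only two rows are kept in memory.
    """
    cols = len(b) + 1
    prev = [0] * cols          # scores for row i-1
    best = 0                   # best cell seen anywhere in the matrix
    for i in range(1, len(a) + 1):
        curr = [0] * cols      # scores for row i
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman("TTACGTTT", "GGACGGG"))
```

    Each cell depends only on its upper, left, and upper-left neighbours, so all cells on one anti-diagonal can be computed at the same time; that independence is what makes the recurrence a natural fit for a linear systolic array of processing elements.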

  5. An Alignment-Free Algorithm in Comparing the Similarity of Protein Sequences Based on Pseudo-Markov Transition Probabilities among Amino Acids

    PubMed Central

    Li, Yushuang; Yang, Jiasheng; Zhang, Yi

    2016-01-01

    In this paper, we have proposed a novel alignment-free method for comparing the similarity of protein sequences. We first encode a protein sequence into a 440 dimensional feature vector consisting of a 400 dimensional Pseudo-Markov transition probability vector among the 20 amino acids, a 20 dimensional content ratio vector, and a 20 dimensional position ratio vector of the amino acids in the sequence. By evaluating the Euclidean distances among the representing vectors, we compare the similarity of protein sequences. We then apply this method into the ND5 dataset consisting of the ND5 protein sequences of 9 species, and the F10 and G11 datasets representing two of the xylanases containing glycoside hydrolase families, i.e., families 10 and 11. As a result, our method achieves a correlation coefficient of 0.962 with the canonical protein sequence aligner ClustalW in the ND5 dataset, much higher than those of other 5 popular alignment-free methods. In addition, we successfully separate the xylanases sequences in the F10 family and the G11 family and illustrate that the F10 family is more heat stable than the G11 family, consistent with a few previous studies. Moreover, we prove mathematically an identity equation involving the Pseudo-Markov transition probability vector and the amino acids content ratio vector. PMID:27918587
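
    The 440-dimensional encoding described above (400 + 20 + 20 components) can be illustrated with a rough sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: transition probabilities are taken as row-normalized counts of adjacent residue pairs, the content ratio as residue frequency, and the position ratio as each residue's share of the summed 1-based positions. The names `feature_vector` and `AMINO_ACIDS` are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"            # the 20 standard residues
INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}


def feature_vector(seq):
    """Encode a protein sequence as a 440-dimensional vector.

    Assumed definitions (not taken from the paper):
      - 400 values: P(next = y | current = x) from adjacent-pair counts
      - 20 values: fraction of the sequence made up of each residue
      - 20 values: each residue's share of the summed 1-based positions
    """
    if not seq:
        raise ValueError("sequence must be non-empty")
    n = len(seq)
    pair_counts = [[0] * 20 for _ in range(20)]
    content = [0] * 20
    position = [0] * 20
    for pos, aa in enumerate(seq, start=1):
        k = INDEX[aa]              # raises KeyError on non-standard residues
        content[k] += 1
        position[k] += pos
    for x, y in zip(seq, seq[1:]):
        pair_counts[INDEX[x]][INDEX[y]] += 1
    transitions = []
    for row in pair_counts:
        total = sum(row)
        transitions.extend(c / total if total else 0.0 for c in row)
    total_pos = n * (n + 1) / 2    # 1 + 2 + ... + n
    content_ratio = [c / n for c in content]
    position_ratio = [p / total_pos for p in position]
    return transitions + content_ratio + position_ratio


if __name__ == "__main__":
    u = feature_vector("MKVLAAGIVGLLLAQ")
    v = feature_vector("MKVLSAGIVGLLLAK")
    # Smaller Euclidean distance is read as higher similarity.
    print(len(u), math.dist(u, v))
```

    Because every sequence maps to a vector of the same length, pairwise comparison needs no alignment step; `math.dist` (Python 3.8+) gives the Euclidean distance directly.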

  6. Sequence Alignment to Predict Across Species Susceptibility ...

    EPA Pesticide Factsheets

    Conservation of a molecular target across species can be used as a line-of-evidence to predict the likelihood of chemical susceptibility. The web-based Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool was developed to simplify, streamline, and quantitatively assess protein sequence/structural similarity across taxonomic groups as a means to predict relative intrinsic susceptibility. The intent of the tool is to allow for evaluation of any potential protein target, so it is amenable to variable degrees of protein characterization, depending on available information about the chemical/protein interaction and the molecular target itself. To allow for flexibility in the analysis, a layered strategy was adopted for the tool. The first level of the SeqAPASS analysis compares primary amino acid sequences to a query sequence, calculating a metric for sequence similarity (including detection of candidate orthologs), the second level evaluates sequence similarity within selected domains (e.g., ligand-binding domain, DNA binding domain), and the third level of analysis compares individual amino acid residue positions identified as being of importance for protein conformation and/or ligand binding upon chemical perturbation. Each level of the SeqAPASS analysis provides increasing evidence to apply toward rapid, screening-level assessments of probable cross species susceptibility. Such analyses can support prioritization of chemicals for further ev

  7. Molecular studies on larvae of Pseudoterranova parasite of Trichiurus lepturus Linnaeus, 1758 and Pomatomus saltatrix (Linnaeus, 1766) off Brazilian waters.

    PubMed

    Borges, Juliana N; Cunha, Luiz F G; Miranda, Daniele F; Monteiro-Neto, Cassiano; Santos, Cláudia P

    2015-12-01

    Pseudoterranova larvae parasitizing cutlassfish Trichiurus lepturus and bluefish Pomatomus saltatrix from the Southwest Atlantic coast of Brazil were studied in this work by morphological, ultrastructural and molecular approaches. The genetic analyses were performed for the ITS2 intergenic region specific for Pseudoterranova decipiens, the partial 28S (LSU) of ribosomal DNA and the mtDNA cox-1 region. We obtained results for the 28S region and mtDNA cox-1, which were amplified using the polymerase chain reaction and sequenced to evaluate the phylogenetic relationships between sequences of this study and sequences from GenBank. The morphological profile indicated that all nine specimens collected from both fish were L3 larvae of Pseudoterranova sp. The genetic profile confirmed the generic level, but due to the absence of similar sequences for adult parasites on GenBank for the regions amplified, it was not possible to identify them to the species level. The sequences obtained presented 89% similarity with Pseudoterranova decipiens (28S sequences) and Contracaecum osculatum B (mtDNA cox-1). The low similarity, allied to the fact that amplification with the specific primer for P. decipiens did not occur, leads us to conclude that our sequences do not belong to the P. decipiens complex.

  8. Visualizing and Clustering Protein Similarity Networks: Sequences, Structures, and Functions.

    PubMed

    Mai, Te-Lun; Hu, Geng-Ming; Chen, Chi-Ming

    2016-07-01

    Research in the recent decade has demonstrated the usefulness of protein network knowledge in furthering the study of molecular evolution of proteins, understanding the robustness of cells to perturbation, and annotating new protein functions. In this study, we aimed to provide a general clustering approach to visualize the sequence-structure-function relationship of protein networks, and investigate possible causes for inconsistency in the protein classifications based on sequences, structures, and functions. Such visualization of protein networks could facilitate our understanding of the overall relationship among proteins and help researchers comprehend various protein databases. As a demonstration, we clustered 1437 enzymes by their sequences and structures using the minimum span clustering (MSC) method. The general structure of this protein network was delineated at two clustering resolutions, and the second level MSC clustering was found to be highly similar to existing enzyme classifications. The clustering of these enzymes based on sequence, structure, and function information is consistent with each other. For proteases, the Jaccard's similarity coefficient is 0.86 between sequence and function classifications, 0.82 between sequence and structure classifications, and 0.78 between structure and function classifications. From our clustering results, we discussed possible examples of divergent evolution and convergent evolution of enzymes. Our clustering approach provides a panoramic view of the sequence-structure-function network of proteins, helps visualize the relation between related proteins intuitively, and is useful in predicting the structure and function of newly determined protein sequences.
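
    As an illustration of how a Jaccard coefficient between two classifications of the same items can be computed, here is a minimal sketch using the pair-counting form: the number of item pairs grouped together in both classifications, divided by the number grouped together in at least one. Whether the authors used exactly this variant is an assumption; the function name `jaccard_partitions` and the toy labels are hypothetical.

```python
from itertools import combinations


def jaccard_partitions(labels_a, labels_b):
    """Pair-counting Jaccard coefficient between two labelings.

    labels_a[i] and labels_b[i] are the cluster labels of item i under
    the two classifications (for example, sequence-based and
    function-based).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    both = only_a = only_b = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
    denom = both + only_a + only_b
    # If no pair is co-clustered in either labeling, treat them as identical.
    return both / denom if denom else 1.0


if __name__ == "__main__":
    by_sequence = [0, 0, 0, 1]
    by_function = [0, 0, 1, 1]
    print(jaccard_partitions(by_sequence, by_function))
```

    The loop visits every pair of items, so the cost grows quadratically with the number of items; for a set on the order of the 1437 enzymes mentioned above that is roughly a million pairs, which is still quick to evaluate.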

  9. Mass fingerprinting of the venom and transcriptome of venom gland of scorpion Centruroides tecomanus.

    PubMed

    Valdez-Velázquez, Laura L; Quintero-Hernández, Verónica; Romero-Gutiérrez, Maria Teresa; Coronas, Fredy I V; Possani, Lourival D

    2013-01-01

    Centruroides tecomanus is a Mexican scorpion endemic to the State of Colima that causes human fatalities. This communication describes a proteome analysis obtained from milked venom and a transcriptome analysis from a cDNA library constructed from two pairs of venom glands of this scorpion. High performance liquid chromatography separation of soluble venom produced 80 fractions, from which at least 104 individual components were identified by mass spectrometry analysis, with molecular masses from 259 to 44,392 Da. Most of these components are within the expected molecular masses for Na(+)- and K(+)-channel specific toxic peptides, supporting the clinical findings of intoxication when humans are stung by this scorpion. From the cDNA library 162 clones were randomly chosen, from which 130 sequences of good quality were identified and were clustered in 28 contigs, each containing two or more expressed sequence tags (EST), and 49 singlets with only one EST. Deduced amino acid sequence analysis from 53% of the total ESTs showed that 81% (24 sequences) are similar to known toxic peptides that affect Na(+)-channel activity, and 19% (7 unique sequences) are similar to K(+)-channel specific toxins. Out of the 31 sequences, at least 8 peptides were confirmed by direct Edman degradation, using components isolated directly from the venom. The remaining 19%, 4%, 4%, 15% and 5% of the ESTs correspond respectively to proteins involved in cellular processes, antimicrobial peptides, venom components, proteins without defined function and sequences without similarity in databases. Among the cloned genes are those similar to metalloproteinases.

  10. Purification and characterization of the restriction endonuclease RsrI, an isoschizomer of EcoRI.

    PubMed

    Greene, P J; Ballard, B T; Stephenson, F; Kohr, W J; Rodriguez, H; Rosenberg, J M; Boyer, H W

    1988-08-15

    Rhodobacter sphaeroides strain 630 produces restriction enzyme RsrI which is an isoschizomer of EcoRI. We have purified this enzyme and initiated a comparison with the EcoRI endonuclease. The properties of RsrI are consistent with a reaction mechanism similar to that of EcoRI: the position of cleavage within the -GAATTC-site is identical, the MgCl2 optimum for the cleavage is identical, and the pH profile is similar. Methylation of the substrate sequence by the EcoRI methylase protects the site from cleavage by the RsrI endonuclease. RsrI cross-reacts strongly with anti-EcoRI serum indicating three-dimensional structural similarities. We have determined the sequence of 34 N terminal amino acids for RsrI and this sequence possesses significant similarity to the EcoRI N terminus.

  11. Three 3D graphical representations of DNA primary sequences based on the classifications of DNA bases and their applications.

    PubMed

    Xie, Guosen; Mo, Zhongxi

    2011-01-21

    In this article, we introduce three 3D graphical representations of DNA primary sequences, which we call RY-curve, MK-curve and SW-curve, based on three classifications of the DNA bases. The advantages of our representations are that (i) these 3D curves are strictly non-degenerate and there is no loss of information when transferring a DNA sequence to its mathematical representation and (ii) the coordinates of every node on these 3D curves have clear biological implication. Two applications of these 3D curves are presented: (a) a simple formula is derived to calculate the content of the four bases (A, G, C and T) from the coordinates of nodes on the curves; and (b) a 12-component characteristic vector is constructed to compare similarity among DNA sequences from different species based on the geometrical centers of the 3D curves. As examples, we examine similarity among the coding sequences of the first exon of beta-globin gene from eleven species and validate similarity of cDNA sequences of beta-globin gene from eight species. Copyright © 2010 Elsevier Ltd. All rights reserved.

  12. Illustrative case studies in the return of exome and genome sequencing results

    PubMed Central

    Amendola, Laura M; Lautenbach, Denise; Scollon, Sarah; Bernhardt, Barbara; Biswas, Sawona; East, Kelly; Everett, Jessica; Gilmore, Marian J; Himes, Patricia; Raymond, Victoria M; Wynn, Julia; Hart, Ragan; Jarvik, Gail P

    2015-01-01

    Whole genome and exome sequencing tests are increasingly being ordered in clinical practice, creating a need for research exploring the return of results from these tests. A goal of the Clinical Sequencing and Exploratory Research (CSER) consortium is to gain experience with this process to develop best practice recommendations for offering exome and genome testing and returning results. Genetic counselors in the CSER consortium have an integral role in the return of results from these genomic sequencing tests and have gained valuable insight. We present seven emerging themes related to return of exome and genome sequencing results accompanied by case descriptions illustrating important lessons learned, counseling challenges specific to these tests and considerations for future research and practice. PMID:26478737

  13. Loeffler 4.0: Diagnostic Metagenomics.

    PubMed

    Höper, Dirk; Wylezich, Claudia; Beer, Martin

    2017-01-01

    A new world of possibilities for "virus discovery" was opened up with high-throughput sequencing becoming available in the last decade. While scientifically metagenomic analysis was established before the start of the era of high-throughput sequencing, the availability of the first second-generation sequencers was the kick-off for diagnosticians to use sequencing for the detection of novel pathogens. Today, diagnostic metagenomics is becoming the standard procedure for the detection and genetic characterization of new viruses or novel virus variants. Here, we provide an overview about technical considerations of high-throughput sequencing-based diagnostic metagenomics together with selected examples of "virus discovery" for animal diseases or zoonoses and metagenomics for food safety or basic veterinary research. © 2017 Elsevier Inc. All rights reserved.

  14. A Model of BGA Thermal Fatigue Life Prediction Considering Load Sequence Effects

    PubMed Central

    Hu, Weiwei; Li, Yaqiu; Sun, Yufeng; Mosleh, Ali

    2016-01-01

    Accurate testing history data is necessary for all fatigue life prediction approaches, but such data is always deficient especially for the microelectronic devices. Additionally, the sequence of the individual load cycle plays an important role in physical fatigue damage. However, most of the existing models based on the linear damage accumulation rule ignore the sequence effects. This paper proposes a thermal fatigue life prediction model for ball grid array (BGA) packages to take into consideration the load sequence effects. For the purpose of improving the availability and accessibility of testing data, a new failure criterion is discussed and verified by simulation and experimentation. The consequences for the fatigue underlying sequence load conditions are shown. PMID:28773980

  15. Molecular characterization of a novel Nucleorhabdovirus from black currant identified by high-throughput sequencing

    USDA-ARS's Scientific Manuscript database

    Contigs with sequence similarities to several nucleorhabdoviruses were identified by high-throughput sequencing analysis from a black currant (Ribes nigrum L.) cultivar. The complete genomic sequence of this new nucleorhabdovirus is 14,432 nucleotides. Its genomic organization is typical of nucleorh...

  16. rpoB Gene Sequencing for Identification of Corynebacterium Species

    PubMed Central

    Khamis, Atieh; Raoult, Didier; La Scola, Bernard

    2004-01-01

    The genus Corynebacterium is a heterogeneous group of species comprising human and animal pathogens and environmental bacteria. It is defined on the basis of several phenotypic characters and the results of DNA-DNA relatedness and, more recently, 16S rRNA gene sequencing. However, the 16S rRNA gene is not polymorphic enough to ensure reliable phylogenetic studies and needs to be completely sequenced for accurate identification. The almost complete rpoB sequences of 56 Corynebacterium species were determined by both PCR and genome walking methods. In all cases the percent similarities between different species were lower than those observed by 16S rRNA gene sequencing, even for those species with degrees of high similarity. Several clusters supported by high bootstrap values were identified. In order to propose a method for strain identification which does not require sequencing of the complete rpoB sequence (approximately 3,500 bp), we identified an area with a high degree of polymorphism, bordered by conserved sequences that can be used as universal primers for PCR amplification and sequencing. The sequence of this fragment (434 to 452 bp) allows accurate species identification and may be used in the future for routine sequence-based identification of Corynebacterium species. PMID:15364970

  17. An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species

    PubMed Central

    Galpert, Deborah; del Río, Sara; Herrera, Francisco; Ancede-Gallardo, Evys; Antunes, Agostinho; Agüero-Chapin, Guillermin

    2015-01-01

    Orthology detection requires more effective scaling algorithms. In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership to conserved regions, and physicochemical profiles) are combined in a supervised pairwise ortholog detection approach to improve effectiveness considering low ortholog ratios in relation to the possible pairwise comparison between two genomes. In this scenario, big data supervised classifiers managing imbalance between ortholog and nonortholog pair classes allow for an effective scaling solution built from two genomes and extended to other genome pairs. The supervised approach was compared with RBH, RSD, and OMA algorithms by using the following yeast genome pairs: Saccharomyces cerevisiae-Kluyveromyces lactis, Saccharomyces cerevisiae-Candida glabrata, and Saccharomyces cerevisiae-Schizosaccharomyces pombe as benchmark datasets. Because of the large amount of imbalanced data, the building and testing of the supervised model were only possible by using big data supervised classifiers managing imbalance. Evaluation metrics taking low ortholog ratios into account were applied. From the effectiveness perspective, MapReduce Random Oversampling combined with Spark SVM outperformed RBH, RSD, and OMA, probably because of the consideration of gene pair features beyond alignment similarities combined with the advances in big data supervised classification. PMID:26605337
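
    RBH (reciprocal best hits), one of the baselines named in the abstract above, can be sketched in a few lines. This is a simplified illustration, not the authors' pipeline: the function name is hypothetical, a single symmetric score per gene pair is assumed (real searches produce separate scores for each direction), and ties are broken by whichever pair is seen first.

```python
def reciprocal_best_hits(scores: dict) -> list:
    """Return gene pairs (a, b) where b is a's best hit and a is b's best hit.

    `scores` maps (gene_in_genome_A, gene_in_genome_B) -> similarity score.
    """
    best_for_a = {}  # gene in A -> (best partner in B, score)
    best_for_b = {}  # gene in B -> (best partner in A, score)
    for (a, b), score in scores.items():
        if a not in best_for_a or score > best_for_a[a][1]:
            best_for_a[a] = (b, score)
        if b not in best_for_b or score > best_for_b[b][1]:
            best_for_b[b] = (a, score)
    # Keep only pairs whose best-hit relationship holds in both directions.
    return sorted(
        (a, b) for a, (b, _) in best_for_a.items() if best_for_b[b][0] == a
    )


# Made-up scores: a1 and a2 both prefer b1, but b1 prefers a2.
toy_scores = {
    ("a1", "b1"): 90, ("a1", "b2"): 40,
    ("a2", "b1"): 95, ("a2", "b2"): 50,
}
pairs = reciprocal_best_hits(toy_scores)
```

    In the toy data only ("a2", "b1") is reciprocal, which shows why RBH discards many candidate pairs and why the pair classes in such data sets are strongly imbalanced.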

  18. An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species.

    PubMed

    Galpert, Deborah; Del Río, Sara; Herrera, Francisco; Ancede-Gallardo, Evys; Antunes, Agostinho; Agüero-Chapin, Guillermin

    2015-01-01

    Orthology detection requires more effective scaling algorithms. In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership to conserved regions, and physicochemical profiles) are combined in a supervised pairwise ortholog detection approach to improve effectiveness considering low ortholog ratios in relation to the possible pairwise comparison between two genomes. In this scenario, big data supervised classifiers managing imbalance between ortholog and nonortholog pair classes allow for an effective scaling solution built from two genomes and extended to other genome pairs. The supervised approach was compared with RBH, RSD, and OMA algorithms by using the following yeast genome pairs: Saccharomyces cerevisiae-Kluyveromyces lactis, Saccharomyces cerevisiae-Candida glabrata, and Saccharomyces cerevisiae-Schizosaccharomyces pombe as benchmark datasets. Because of the large amount of imbalanced data, the building and testing of the supervised model were only possible by using big data supervised classifiers managing imbalance. Evaluation metrics taking low ortholog ratios into account were applied. From the effectiveness perspective, MapReduce Random Oversampling combined with Spark SVM outperformed RBH, RSD, and OMA, probably because of the consideration of gene pair features beyond alignment similarities combined with the advances in big data supervised classification.

  19. Fluorescence Determination of Tryptophan Side-Chain Accessibility and Dynamics in Triple-Helical Collagen-Like Peptides

    PubMed Central

    Simon-Lukasik, Kristine V.; Persikov, Anton V.; Brodsky, Barbara; Ramshaw, John A. M.; Laws, William R.; Alexander Ross, J. B.; Ludescher, Richard D.

    2003-01-01

    We report tryptophan fluorescence measurements of emission intensity, iodide quenching, and anisotropy that describe the environment and dynamics at X and Y sites in stable collagen-like peptides of sequence (Gly-X-Y)n. About 90% of tryptophans at both sites have similar solvent exposed fluorescence properties and a lifetime of 8.5–9 ns. Analysis of anisotropy decays using an associative model indicates that these long lifetime populations undergo rapid depolarizing motion with a 0.5 ns correlation time; however, the extent of fast motion at the Y site is considerably less than the essentially unrestricted motion at the X site. About 10% of tryptophans at both sites have a shorter (∼3 ns) lifetime indicating proximity to a protein quenching group; these minor populations are immobile on the peptide surface, depolarizing only by overall trimer rotation. Iodide quenching indicates that tryptophans at the X site are more accessible to solvent. Side chains at X sites are more solvent accessible and considerably more mobile than residues at Y sites and can more readily fluctuate among alternate intermolecular interactions in collagen fibrils. This fluorescence analysis of collagen-like peptides lays a foundation for studies on the structure, dynamics, and function of collagen and of triple-helical junctions in gelatin gels. PMID:12524302

  20. Impact of cultivation on characterisation of species composition of soil bacterial communities.

    PubMed

    McCaig, A E.; Grayston, S J.; Prosser, J I.; Glover, L A.

    2001-03-01

    The species composition of culturable bacteria in Scottish grassland soils was investigated using a combination of Biolog and 16S rDNA analysis for characterisation of isolates. The inclusion of a molecular approach allowed direct comparison of sequences from culturable bacteria with sequences obtained during analysis of DNA extracted directly from the same soil samples. Bacterial strains were isolated on Pseudomonas isolation agar (PIA), a selective medium, and on tryptone soya agar (TSA), a general laboratory medium. In total, 12 and 21 morphologically different bacterial cultures were isolated on PIA and TSA, respectively. Biolog and sequencing placed PIA isolates in the same taxonomic groups, the majority of cultures belonging to the Pseudomonas (sensu stricto) group. However, analysis of 16S rDNA sequences proved more efficient than Biolog for characterising TSA isolates due to limitations of the Microlog database for identifying environmental bacteria. In general, 16S rDNA sequences from TSA isolates showed high similarities to cultured species represented in sequence databases, although TSA-8 showed only 92.5% similarity to the nearest relative, Bacillus insolitus. In general, there was very little overlap between the culturable and uncultured bacterial communities, although two sequences, PIA-2 and TSA-13, showed >99% similarity to soil clones. A cloning step was included prior to sequence analysis of two isolates, TSA-5 and TSA-14, and analysis of several clones confirmed that these cultures comprised at least four and three sequence types, respectively. All isolate clones were most closely related to uncultured bacteria, with clone TSA-5.1 showing 99.8% similarity to a sequence amplified directly from the same soil sample. Interestingly, one clone, TSA-5.4, clustered within a novel group comprising only uncultured sequences. This group, which is associated with the novel, deep-branching Acidobacterium capsulatum lineage, also included clones isolated during direct analysis of the same soil and from a wide range of other sample types studied elsewhere. The study demonstrates the value of fine-scale molecular analysis for identification of laboratory isolates and indicates the culturability of approximately 1% of the total population but under a restricted range of media and cultivation conditions.

  1. AlignMe—a membrane protein sequence alignment web server

    PubMed Central

    Stamm, Marcus; Staritzbichler, René; Khafizov, Kamil; Forrest, Lucy R.

    2014-01-01

    We present a web server for pair-wise alignment of membrane protein sequences, using the program AlignMe. The server makes available two operational modes of AlignMe: (i) sequence to sequence alignment, taking two sequences in fasta format as input, combining information about each sequence from multiple sources and producing a pair-wise alignment (PW mode); and (ii) alignment of two multiple sequence alignments to create family-averaged hydropathy profile alignments (HP mode). For the PW sequence alignment mode, four different optimized parameter sets are provided, each suited to pairs of sequences with a specific similarity level. These settings utilize different types of inputs: (position-specific) substitution matrices, secondary structure predictions and transmembrane propensities from transmembrane predictions or hydrophobicity scales. In the second (HP) mode, each input multiple sequence alignment is converted into a hydrophobicity profile averaged over the provided set of sequence homologs; the two profiles are then aligned. The HP mode enables qualitative comparison of transmembrane topologies (and therefore potentially of 3D folds) of two membrane proteins, which can be useful if the proteins have low sequence similarity. In summary, the AlignMe web server provides user-friendly access to a set of tools for analysis and comparison of membrane protein sequences. Access is available at http://www.bioinfo.mpg.de/AlignMe PMID:24753425
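
    The family-averaged hydropathy profile described for the HP mode can be illustrated with a small sketch. This is not AlignMe's implementation: the choice of the Kyte-Doolittle scale, the function name, and the rule of ignoring gaps when averaging a column are all assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_average_hydropathy(alignment: list) -> list:
    """Average hydropathy per alignment column, ignoring gaps.

    `alignment` is a list of equal-length aligned sequences. Columns with
    no scorable residue yield None. (Assumed gap handling, see lead-in.)
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    profile = []
    for col in range(length):
        values = [
            KYTE_DOOLITTLE[seq[col]]
            for seq in alignment
            if seq[col] in KYTE_DOOLITTLE
        ]
        profile.append(sum(values) / len(values) if values else None)
    return profile


# Made-up three-sequence alignment; the last column is mostly gaps.
profile = column_average_hydropathy(["AL-", "IL-", "VLK"])
```

    Two such profiles, one per input alignment, would then be aligned against each other; that alignment step is omitted here.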

  2. Sequence data - Magnitude and implications of some ambiguities.

    NASA Technical Reports Server (NTRS)

    Holmquist, R.; Jukes, T. H.

    1972-01-01

    A stochastic model is applied to the divergence of the horse-pig lineage from a common ancestor in terms of the alpha and beta chains of hemoglobin and fibrinopeptides. The results are compared with those based on the minimum mutation distance model of Fitch (1972). Buckwheat and cauliflower cytochrome c sequences are analyzed to demonstrate their ambiguities. A comparative analysis of evolutionary rates for various proteins of horses and pigs shows that errors of considerable magnitude are introduced by Glx and Asx ambiguities into evolutionary conclusions drawn from sequences of incompletely analyzed proteins.

  3. Algorithm, applications and evaluation for protein comparison by Ramanujan Fourier transform.

    PubMed

    Zhao, Jian; Wang, Jiasong; Hua, Wei; Ouyang, Pingkai

    2015-12-01

    The amino acid sequence of a protein determines its chemical properties, chain conformation and biological functions. Protein sequence comparison is of great importance to identify similarities of protein structures and infer their functions. Many properties of a protein correspond to the low-frequency signals within the sequence. Low frequency modes in protein sequences are linked to the secondary structures, membrane protein types, and sub-cellular localizations of the proteins. In this paper, we present Ramanujan Fourier transform (RFT) with a fast algorithm to analyze the low-frequency signals of protein sequences. The RFT method is applied to similarity analysis of protein sequences with the Resonant Recognition Model (RRM). The results show that the proposed fast RFT method on protein comparison is more efficient than commonly used discrete Fourier transform (DFT). RFT can detect common frequencies as significant feature for specific protein families, and the RFT spectrum heat-map of protein sequences demonstrates the information conservation in the sequence comparison. The proposed method offers a new tool for pattern recognition, feature extraction and structural analysis on protein sequences. Copyright © 2015 Elsevier Ltd. All rights reserved.

  4. Computationally predicted IgE epitopes of walnut allergens contribute to cross-reactivity with peanuts

    USDA-ARS's Scientific Manuscript database

    Cross reactivity between peanuts and tree nuts implies that similar IgE epitopes are present in their proteins. To determine whether walnut sequences similar to known peanut IgE binding sequences, according to the property distance (PD) scale implemented in the Structural Database of Allergenic Prot...

  5. Sequenced Integration and the Identification of a Problem-Solving Approach through a Learning Process

    ERIC Educational Resources Information Center

    Cormas, Peter C.

    2016-01-01

    Preservice teachers (N = 27) in two sections of a sequenced, methodological and process integrated mathematics/science course solved a levers problem with three similar learning processes and a problem-solving approach, and identified a problem-solving approach through one different learning process. Similar learning processes used included:…

  6. Analysis of genetic diversity in pigeon pea germplasm using retrotransposon-based molecular markers.

    PubMed

    Maneesha; Upadhyaya, Kailash C

    2017-09-01

    Pigeon pea (Cajanus cajan), an important legume crop, is predominantly cultivated in tropical and subtropical regions of Asia and Africa. It is normally considered to have a low degree of genetic diversity, an impediment in undertaking crop improvement programmes. We have analysed genetic polymorphism of domesticated pigeon pea germplasm (47 accessions) across the world using earlier characterized panzee retrotransposon-based molecular markers. It was conjectured that since retrotransposons are interspersed throughout the genome, retroelements-based markers would be able to uncover polymorphism possibly inherent in the diversity of retroelement sequences. Two PCR-based techniques, sequence-specific amplified polymorphism (SSAP) and retrotransposon microsatellite amplified polymorphism (REMAP), were utilized for the analyses. We show that a considerable degree of polymorphism could be detected using these techniques. Three primer combinations in SSAP generated 297 amplified products across 47 accessions with an average of 99 amplicons per assay. Degree of polymorphism varied from 84-95%. In the REMAP assays, the number of amplicons was much less but up to 73% polymorphism could be detected. On the basis of similarity coefficients, dendrograms were constructed. The results demonstrate that the retrotransposon-based markers could serve as a better alternative for the assessment of genetic diversity in crops with apparent low genetic base.

  7. Application of GRA for Sustainable Material Selection and Evaluation Using LCA

    NASA Astrophysics Data System (ADS)

    Jayakrishna, Kandasamy; Vinodh, Sekar; Sakthi Sanghvi, Vijayaselvan; Deepika, Chinadurai

    2016-07-01

    Material selection is identified as a key parameter in making any product sustainable, considering its end of life (EoL) characteristics. An accurate understanding of expected service conditions and environmental considerations is crucial, because the selection of material plays a vital role in the face of overwhelming customer expectations and stringent laws. Therefore, this article presents an integrated approach for sustainable material selection using grey relational analysis (GRA) considering the EoL disposal strategies with respect to an automotive product. GRA, an impact evaluation model, measures the degree of similarity between the comparability (choice of material) sequence and the reference (EoL strategies) sequence based on the relational grade. The ranking result shows the outranking relationships in the order ABS-REC > PP-INC > AL-REM > PP-LND > ABS-LND > ABS-INC > PU-LND > AL-REC > AL-LND > PU-INC > AL-INC. The best sustainable material selected was ABS and recycling was selected as the best EoL strategy with the grey relational value of 2.43856. The best material selected by this approach, ABS, was evaluated for its viability using life cycle assessment, and the estimated impacts also proved the practicability of the selected material, highlighting the focus on the dehumidification step in the manufacturing of the case product using this developed multi-criteria approach.
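
    The grey relational grade mentioned above can be sketched using the textbook form of GRA. This is an assumption-laden illustration rather than the authors' procedure: the function name, the equal criterion weights, the distinguishing coefficient of 0.5, and the toy data are all hypothetical, and inputs are assumed to be normalized to [0, 1] already. Because this form yields grades between 0 and 1, it is evidently not the exact aggregation behind the value of 2.43856 reported in the abstract.

```python
def grey_relational_grades(reference, alternatives, zeta=0.5):
    """Grey relational grade of each alternative against a reference sequence.

    reference:    list of normalized criterion values (the ideal sequence)
    alternatives: dict mapping a name to a list of normalized criterion values
    zeta:         distinguishing coefficient, conventionally 0.5
    """
    # Absolute deviation of every alternative from the reference, per criterion.
    deltas = {
        name: [abs(r - x) for r, x in zip(reference, seq)]
        for name, seq in alternatives.items()
    }
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:
        # Every alternative coincides with the reference.
        return {name: 1.0 for name in alternatives}
    grades = {}
    for name, ds in deltas.items():
        # Grey relational coefficient per criterion, then an unweighted mean.
        coeffs = [(d_min + zeta * d_max) / (d + zeta * d_max) for d in ds]
        grades[name] = sum(coeffs) / len(coeffs)
    return grades


# Made-up normalized scores for two hypothetical alternatives on 3 criteria.
grades = grey_relational_grades(
    [1.0, 1.0, 1.0],
    {"A": [1.0, 1.0, 1.0], "B": [0.0, 0.5, 1.0]},
)
ranking = sorted(grades, key=grades.get, reverse=True)
```

    Sorting the alternatives by grade, as in the last line, gives an outranking order of the kind listed in the abstract.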

  8. Life cycle as a stable trait in the evaluation of diversity of Nostoc from biofilms in rivers.

    PubMed

    Mateo, Pilar; Perona, Elvira; Berrendero, Esther; Leganés, Francisco; Martín, Marta; Golubić, Stjepko

    2011-05-01

    The diversity within the genus Nostoc is still controversial and more studies are needed to clarify its heterogeneity. Macroscopic species have been extensively studied and discussed; however, the microscopic forms of the genus, especially those from running waters, are poorly known and likely represented by many more species than currently described. Nostoc isolates from biofilms of two Spanish calcareous rivers were characterized comparing the morphology and life cycle in two culture media with different levels of nutrients and also comparing the 16S rRNA gene sequences. The results showed that trichome shape and cellular dimensions varied considerably depending on the culture media used, whereas the characteristics expressed in the course of the life cycle remained stable for each strain independent of the culture conditions. Molecular phylogenetic analysis confirmed the distinction between the studied strains established on morphological grounds. A balanced approach to the evaluation of diversity of Nostoc in the service of autecological studies requires both genotypic information and the evaluation of stable traits. The results of this study show that 16S rRNA gene sequence similarity serves as an important criterion for characterizing Nostoc strains and is consistent with stable attributes, such as the life cycle. © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.

  9. Microalgae-activated sludge treatment of molasses wastewater in sequencing batch photo-bioreactor.

    PubMed

    Tsioptsias, Costas; Lionta, Gesthimani; Samaras, Petros

    2017-05-01

    The aim of this work was the examination of the treatment potential of molasses wastewater, by the utilization of activated sludge and microalgae. The systems used included a sequencing batch bioreactor and a similar photo-bioreactor, favoring microalgae growth. The microalgae treatment of molasses wastewater mixture resulted in a considerable reduction in the total nitrogen content. A reduction in the ammonium and nitrate content was observed in the photo-bioreactor, while the effluent's total nitrogen consisted mainly of 50% organic nitrogen. The transformation of the nitrogen forms in the photo-bioreactor was attributed to microalgae activity, resulting in the production of a better quality effluent. Lower COD removal was observed for the photo-bioreactor than the control, which however increased, by the replacement of the anoxic phase by a long aeration period. The mechanism of nitrogen removal included both the denitrification process during the anoxic stage and the microalgae activities, as the replacement of the anoxic stage resulted in low total nitrogen removal capacities. A decrease in the photobioreactor performance was observed after 35 days of operation due to biofilm formation on the light tube surface, while the operation at higher temperature accelerated microalgae growth, resulting thus in the early failure of the photoreactor.

  10. Transspecies Transmission of Gammaretroviruses and the Origin of the Gibbon Ape Leukaemia Virus (GaLV) and the Koala Retrovirus (KoRV).

    PubMed

    Denner, Joachim

    2016-12-20

    Transspecies transmission of retroviruses is a frequent event, and the human immunodeficiency virus-1 (HIV-1) is a well-known example. The gibbon ape leukaemia virus (GaLV) and koala retrovirus (KoRV), two gammaretroviruses, are also the result of a transspecies transmission, however from a still unknown host. Related retroviruses have been found in Southeast Asian mice although the sequence similarity was limited. Viruses with a higher sequence homology were isolated from Melomys burtoni, the Australian and Indonesian grassland melomys. However, only the habitats of the koalas and the grassland melomys in Australia are overlapping, indicating that the melomys virus may not be the precursor of the GaLV. Viruses closely related to GaLV/KoRV were also detected in bats. Therefore, given the fact that the habitats of the gibbons in Thailand and the koalas in Australia are far away, and that bats are able to fly over long distances, the hypothesis that retroviruses of bats are the origin of GaLV and KoRV deserves consideration. Analysis of previous transspecies transmissions of retroviruses may help to evaluate the potential of transmission of related retroviruses in the future, e.g., that of porcine endogenous retroviruses (PERVs) during xenotransplantation using pig cells, tissues or organs.

  11. Predicting Hydrologic Function With Aquatic Gene Fragments

    NASA Astrophysics Data System (ADS)

    Good, S. P.; URycki, D. R.; Crump, B. C.

    2018-03-01

    Recent advances in microbiology techniques, such as genetic sequencing, allow for rapid and cost-effective collection of large quantities of genetic information carried within water samples. Here we posit that the unique composition of aquatic DNA material within a water sample contains relevant information about hydrologic function at multiple temporal scales. In this study, machine learning was used to develop discharge prediction models trained on the relative abundance of bacterial taxa classified into operational taxonomic units (OTUs) based on 16S rRNA gene sequences from six large arctic rivers. We term this approach "genohydrology," and show that OTU relative abundances can be used to predict river discharge at monthly and longer timescales. Based on a single DNA sample from each river, the average Nash-Sutcliffe efficiency (NSE) for predicted mean monthly discharge values throughout the year was 0.84, while the NSE for predicted discharge values across different return intervals was 0.67. These are considerable improvements over predictions based only on the area-scaled mean specific discharge of five similar rivers, which had average NSE values of 0.64 and -0.32 for seasonal and recurrence interval discharge values, respectively. The genohydrology approach demonstrates that genetic diversity within the aquatic microbiome is a large and underutilized data resource with benefits for prediction of hydrologic function.
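
    The Nash-Sutcliffe efficiency (NSE) values quoted above follow a standard definition: one minus the ratio of the squared prediction error to the variance of the observations around their mean. A minimal sketch follows; the function name and the sample numbers are illustrative assumptions and are not data from the study.

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).

    1.0 means a perfect match; 0.0 means the model is no better than
    always predicting the observed mean; negative values are worse than that.
    """
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("observed and simulated must be non-empty and equal length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance_sum = sum((o - mean_obs) ** 2 for o in observed)
    if variance_sum == 0:
        raise ValueError("NSE is undefined when all observations are identical")
    return 1.0 - squared_error / variance_sum


# Made-up discharge values (arbitrary units), for illustration only.
nse = nash_sutcliffe_efficiency([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
```

    Under this definition, the negative value of -0.32 reported for the area-scaled baseline means that baseline performed worse than simply predicting the observed mean.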

  12. Pitch chroma discrimination, generalization, and transfer tests of octave equivalence in humans.

    PubMed

    Hoeschele, Marisa; Weisman, Ronald G; Sturdy, Christopher B

    2012-11-01

    Octave equivalence occurs when notes separated by an octave (a doubling in frequency) are judged as being perceptually similar. Considerable evidence points to the importance of the octave in music and speech. Yet, experimental demonstration of octave equivalence has been problematic. Using go/no-go operant discrimination and generalization, we studied octave equivalence in humans. In Experiment 1, we found that a procedure that failed to show octave equivalence in European starlings also failed in humans. In Experiment 2, we modified the procedure to control for the effects of pitch height perception by training participants in Octave 4 and testing in Octave 5. We found that the pattern of responding developed by discrimination training in Octave 4 generalized to Octave 5. We replicated and extended our findings in Experiment 3 by adding a transfer phase: Participants were trained with either the same or a reversed pattern of rewards in Octave 5. Participants transferred easily to the same pattern of reward in Octave 5 but struggled to learn the reversed pattern. We provided minimal instruction, presented no ordered sequences of notes, and used only sine-wave tones, but participants nonetheless constructed pitch chroma information from randomly ordered sequences of notes. Training in music weakly hindered octave generalization but moderately facilitated both positive and negative transfer.

  13. A Novel Laccase with Potent Antiproliferative and HIV-1 Reverse Transcriptase Inhibitory Activities from Mycelia of Mushroom Coprinus comatus

    PubMed Central

    Zhao, Shuang; Rong, Cheng-Bo; Kong, Chang; Liu, Yu; Xu, Feng; Miao, Qian-Jiang; Wang, Shou-Xian; Wang, He-Xiang

    2014-01-01

    A novel laccase was isolated and purified from fermentation mycelia of mushroom Coprinus comatus with an isolation procedure including three ion-exchange chromatography steps on DEAE-cellulose, CM-cellulose, and Q-Sepharose and one gel-filtration step by fast protein liquid chromatography on Superdex 75. The purified enzyme was a monomeric protein with a molecular weight of 64 kDa. It possessed a unique N-terminal amino acid sequence of AIGPVADLKV, which has considerably high sequence similarity with that of other fungal laccases, but is different from that of C. comatus laccases reported. The enzyme manifested an optimal pH value of 2.0 and an optimal temperature of 60°C using 2,2′-azinobis(3-ethylbenzothiazolone-6-sulfonic acid) diammonium salt (ABTS) as the substrate. The laccase displayed, at pH 2.0 and 37°C, a Km value of 1.59 mM towards ABTS. It potently suppressed proliferation of tumor cell lines HepG2 and MCF7, and inhibited human immunodeficiency virus type 1 (HIV-1) reverse transcriptase (RT) with an IC50 value of 3.46 μM, 4.95 μM, and 5.85 μM, respectively, signifying that it is an antipathogenic protein. PMID:25540778

  14. GPS-PAIL: prediction of lysine acetyltransferase-specific modification sites from protein sequences.

    PubMed

    Deng, Wankun; Wang, Chenwei; Zhang, Ying; Xu, Yang; Zhang, Shuang; Liu, Zexian; Xue, Yu

    2016-12-22

    Protein acetylation catalyzed by specific histone acetyltransferases (HATs) is an essential post-translational modification (PTM) and is involved in the regulation of a broad spectrum of biological processes in eukaryotes. Although tens of thousands of acetylation sites have been experimentally identified, the upstream HATs for most of the sites are unclear. Thus, the identification of HAT-specific acetylation sites is fundamental for understanding the regulatory mechanisms of protein acetylation. In this work, we first collected 702 known HAT-specific acetylation sites of 205 proteins from the literature and public data resources, and a motif-based analysis demonstrated that different types of HATs exhibit similar but considerably distinct sequence preferences for substrate recognition. Using 544 human HAT-specific sites for training, we constructed a highly useful tool, GPS-PAIL, for the prediction of HAT-specific sites for up to seven HATs, including CREBBP, EP300, HAT1, KAT2A, KAT2B, KAT5 and KAT8. The prediction accuracy of GPS-PAIL was critically evaluated, with a satisfying performance. Using GPS-PAIL, we also performed a large-scale prediction of potential HATs for known acetylation sites identified from high-throughput experiments in nine eukaryotes. Both online service and local packages were implemented, and GPS-PAIL is freely available at: http://pail.biocuckoo.org.

  15. GPS-PAIL: prediction of lysine acetyltransferase-specific modification sites from protein sequences

    PubMed Central

    Deng, Wankun; Wang, Chenwei; Zhang, Ying; Xu, Yang; Zhang, Shuang; Liu, Zexian; Xue, Yu

    2016-01-01

    Protein acetylation catalyzed by specific histone acetyltransferases (HATs) is an essential post-translational modification (PTM) and is involved in the regulation of a broad spectrum of biological processes in eukaryotes. Although tens of thousands of acetylation sites have been experimentally identified, the upstream HATs for most of the sites are unclear. Thus, the identification of HAT-specific acetylation sites is fundamental for understanding the regulatory mechanisms of protein acetylation. In this work, we first collected 702 known HAT-specific acetylation sites of 205 proteins from the literature and public data resources, and a motif-based analysis demonstrated that different types of HATs exhibit similar but considerably distinct sequence preferences for substrate recognition. Using 544 human HAT-specific sites for training, we constructed a highly useful tool, GPS-PAIL, for the prediction of HAT-specific sites for up to seven HATs, including CREBBP, EP300, HAT1, KAT2A, KAT2B, KAT5 and KAT8. The prediction accuracy of GPS-PAIL was critically evaluated, with a satisfying performance. Using GPS-PAIL, we also performed a large-scale prediction of potential HATs for known acetylation sites identified from high-throughput experiments in nine eukaryotes. Both online service and local packages were implemented, and GPS-PAIL is freely available at: http://pail.biocuckoo.org. PMID:28004786

  16. Construction of a Cyber Attack Model for Nuclear Power Plants

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Varuttamaseni, Athi; Bari, Robert A.; Youngblood, Robert

    The consideration of how one compromised piece of digital equipment can impact neighboring equipment is critical to understanding the progression of cyber attacks. The degree of influence that one component may have on another depends on a variety of factors, including the sharing of resources such as network bandwidth or processing power, the level of trust between components, and the inclusion of segmentation devices such as firewalls. The interactions among components via mechanisms that are unique to the digital world are not usually considered in traditional PRA. This means potential sequences of events that may occur during an attack may be missed if one were to only look at conventional accident sequences. This paper presents a method where, starting from the initial attack vector, the progression of a cyber attack can be modeled. The propagation of the attack is modeled by considering certain attributes of the digital components in the system. These attributes determine the potential vulnerability of a component to a class of attack and the capability gained by the attackers once they are in control of the equipment. The use of attributes allows similar components (components with the same set of attributes) to be modeled in the same way, thereby reducing the computing resources required for analysis of large systems.

  17. Statistical analysis of the Bacterial Carbohydrate Structure Data Base (BCSDB): Characteristics and diversity of bacterial carbohydrates in comparison with mammalian glycans

    PubMed Central

    Herget, Stephan; Toukach, Philip V; Ranzinger, René; Hull, William E; Knirel, Yuriy A; von der Lieth, Claus-Wilhelm

    2008-01-01

    Background: There are considerable differences between bacterial and mammalian glycans. In contrast to most eukaryotic carbohydrates, bacterial glycans are often composed of repeating units with diverse functions ranging from structural reinforcement to adhesion, colonization and camouflage. Since bacterial glycans are typically displayed at the cell surface, they can interact with the environment and, therefore, have significant biomedical importance. Results: The sequence characteristics of glycans (monosaccharide composition, modifications, and linkage patterns) for the higher bacterial taxonomic classes have been examined and compared with the data for mammals, with both similarities and unique features becoming evident. Compared to mammalian glycans, the bacterial glycans deposited in the current databases have a more than ten-fold greater diversity at the monosaccharide level, and the disaccharide pattern space is approximately nine times larger. Specific bacterial subclasses exhibit characteristic glycans which can be distinguished on the basis of distinctive structural features or sequence properties. Conclusion: For the first time a systematic database analysis of the bacterial glycome has been performed. This study summarizes the current knowledge of bacterial glycan architecture and diversity and reveals putative targets for the rational design and development of therapeutic intervention strategies by comparing bacterial and mammalian glycans. PMID:18694500

  18. The transmission dynamics and diversity of human metapneumovirus in Peru.

    PubMed

    Pollett, Simon; Trovão, Nidia S; Tan, Yi; Eden, John-Sebastian; Halpin, Rebecca A; Bera, Jayati; Das, Suman R; Wentworth, David; Ocaña, Victor; Mendocilla, Silvia M; Álvarez, Carlos; Calisto, Maria E; Garcia, Josefina; Halsey, Eric; Ampuero, Julia S; Nelson, Martha I; Leguia, Mariana

    2017-12-29

    The transmission dynamics of human metapneumovirus (HMPV) in tropical countries remain unclear. Further understanding of the genetic diversity of the virus could aid in HMPV vaccine design and improve our understanding of respiratory virus transmission dynamics in low- and middle-income countries. We examined the evolution of HMPV in Peru through phylogenetic analysis of 61 full genome HMPV sequences collected in three ecologically diverse regions of Peru (Lima, Piura, and Iquitos) during 2008-2012, comprising the largest data set of HMPV whole genomes sequenced from any tropical country to date. We revealed extensive genetic diversity generated by frequent viral introductions, with little evidence of local persistence. While considerable viral traffic between non-Peruvian countries and Peru was observed, HMPV epidemics in Peruvian locales were more frequently epidemiologically linked with other sites within Peru. We showed that Iquitos experienced greater HMPV traffic than the similar sized city of Piura by both Bayesian and maximum likelihood methods. There is extensive HMPV genetic diversity even within smaller and relatively less connected cities of Peru and this virus is spatially fluid. Greater diversity of HMPV in Iquitos compared to Piura may relate to higher volumes of human movement, including air traffic to this location. © 2017 The Authors. Influenza and Other Respiratory Viruses Published by John Wiley & Sons Ltd.

  19. Sirius PSB: a generic system for analysis of biological sequences.

    PubMed

    Koh, Chuan Hock; Lin, Sharene; Jedd, Gregory; Wong, Limsoon

    2009-12-01

    Computational tools are essential components of modern biological research. For example, BLAST searches can be used to identify related proteins based on sequence homology, or when a new genome is sequenced, prediction models can be used to annotate functional sites such as transcription start sites, translation initiation sites and polyadenylation sites and to predict protein localization. Here we present Sirius Prediction Systems Builder (PSB), a new computational tool for sequence analysis, classification and searching. Sirius PSB has four main operations: (1) Building a classifier, (2) Deploying a classifier, (3) Search for proteins similar to query proteins, (4) Preliminary and post-prediction analysis. Sirius PSB supports all these operations via a simple and interactive graphical user interface. Besides being a convenient tool, Sirius PSB has also introduced two novelties in sequence analysis. Firstly, genetic algorithm is used to identify interesting features in the feature space. Secondly, instead of the conventional method of searching for similar proteins via sequence similarity, we introduced searching via features' similarity. To demonstrate the capabilities of Sirius PSB, we have built two prediction models - one for the recognition of Arabidopsis polyadenylation sites and another for the subcellular localization of proteins. Both systems are competitive against current state-of-the-art models based on evaluation of public datasets. More notably, the time and effort required to build each model is greatly reduced with the assistance of Sirius PSB. Furthermore, we show that under certain conditions when BLAST is unable to find related proteins, Sirius PSB can identify functionally related proteins based on their biophysical similarities. Sirius PSB and its related supplements are available at: http://compbio.ddns.comp.nus.edu.sg/~sirius.

  20. RNA sequencing confirms similarities between PPI-responsive oesophageal eosinophilia and eosinophilic oesophagitis.

    PubMed

    Peterson, K A; Yoshigi, M; Hazel, M W; Delker, D A; Lin, E; Krishnamurthy, C; Consiglio, N; Robson, J; Yandell, M; Clayton, F

    2018-06-04

    Although current American guidelines distinguish proton pump inhibitor-responsive oesophageal eosinophilia (PPI-REE) from eosinophilic oesophagitis (EoE), these entities are broadly similar. While two microarray studies showed that they have similar transcriptomes, more extensive RNA sequencing studies have not been done previously. To determine whether RNA sequencing identifies genetic markers distinguishing PPI-REE from EoE. We retrospectively examined 13 PPI-REE and 14 EoE biopsies, matched for tissue eosinophil content, and 14 normal controls. Patients and controls were not PPI-treated at the time of biopsy. We did RNA sequencing on formalin-fixed, paraffin-embedded tissue, with differential expression confirmation by quantitative polymerase chain reaction (PCR). We validated the use of formalin-fixed, paraffin-embedded vs RNAlater-preserved tissue, and compared our formalin-fixed, paraffin-embedded EoE results to a prior EoE study. By RNA sequencing, no genes were differentially expressed between the EoE and PPI-REE groups at the false discovery rate (FDR) ≤0.01 level. Compared to normal controls, 1996 genes were differentially expressed in the PPI-REE group and 1306 genes in the EoE group. By less stringent criteria, only MAPK8IP2 was differentially expressed between PPI-REE and EoE (FDR = 0.029, 2.2-fold less in EoE than in PPI-REE), with similar results by PCR. KCNJ2, which was differentially expressed in a prior study, was similar in the EoE and PPI-REE groups by both RNA sequencing and real-time PCR. Eosinophilic oesophagitis and PPI-REE have comparable transcriptomes, confirming that they are part of the same disease continuum. © 2018 John Wiley & Sons Ltd.

  1. Genome Sequencing and Assembly by Long Reads in Plants

    PubMed Central

    Li, Changsheng; Lin, Feng; An, Dong; Huang, Ruidong

    2017-01-01

    Plant genomes generated by Sanger and Next Generation Sequencing (NGS) have provided insight into species diversity and evolution. However, Sanger sequencing is limited in its applications due to high cost, labor intensity, and low throughput, while NGS reads are too short to resolve abundant repeats and polyploidy, leading to incomplete or ambiguous assemblies. The advent and improvement of long-read sequencing by Third Generation Sequencing (TGS) methods such as PacBio and Nanopore have shown promise in producing high-quality assemblies for complex genomes. Here, we review the development of sequencing, introducing the application as well as considerations of experimental design in TGS of plant genomes. We also introduce recent revolutionary scaffolding technologies including BioNano, Hi-C, and 10× Genomics. We expect that the informative guidance for genome sequencing and assembly by long reads will benefit the initiation of scientists’ projects. PMID:29283420

  2. Motion and Energy Chemical Reactions, Parts One and Two of an Integrated Science Sequence, Teacher's Guide, 1973 Edition.

    ERIC Educational Resources Information Center

    Portland Project Committee, OR.

    This teacher's guide is for the second year of the Portland Project, a three-year integrated secondary science curriculum sequence. The first of two parts in this volume, "Motion and Energy," begins with the study of motion, going from the quantitative description to a consideration of what causes motion and a discussion of Newton's…

  3. Draft genome sequence of Inquilinus limosus strain MP06, a multidrug-resistant clinical isolate

    PubMed Central

    Pino, Marylú; Conza, José Di; Gutkind, Gabriel

    2015-01-01

    The bacterium, Inquilinus limosus, with its remarkable antimicrobial multiresistant profile, has increasingly been isolated in cystic fibrosis patients. We report draft genome sequence of a strain MP06, which is of considerable interest in elucidating the associated mechanisms of antibiotic resistance in this bacterium and for an insight about its persistence in airways of these patients. PMID:26691451

  4. Proliferation of group II introns in the chloroplast genome of the green alga Oedocladium carolinianum (Chlorophyceae).

    PubMed

    Brouard, Jean-Simon; Turmel, Monique; Otis, Christian; Lemieux, Claude

    2016-01-01

    The chloroplast genome sustained extensive changes in architecture during the evolution of the Chlorophyceae, a morphologically and ecologically diverse class of green algae belonging to the Chlorophyta; however, the forces driving these changes are poorly understood. The five orders recognized in the Chlorophyceae form two major clades: the CS clade consisting of the Chlamydomonadales and Sphaeropleales, and the OCC clade consisting of the Oedogoniales, Chaetophorales, and Chaetopeltidales. In the OCC clade, considerable variations in chloroplast DNA (cpDNA) structure, size, gene order, and intron content have been observed. The large inverted repeat (IR), an ancestral feature characteristic of most green plants, is present in Oedogonium cardiacum (Oedogoniales) but is lacking in the examined members of the Chaetophorales and Chaetopeltidales. Remarkably, the Oedogonium 35.5-kb IR houses genes that were putatively acquired through horizontal DNA transfer. To better understand the dynamics of chloroplast genome evolution in the Oedogoniales, we analyzed the cpDNA of a second representative of this order, Oedocladium carolinianum. The Oedocladium cpDNA was sequenced and annotated. The evolutionary distances separating Oedocladium and Oedogonium cpDNAs and two other pairs of chlorophycean cpDNAs were estimated using a 61-gene data set. Phylogenetic analysis of an alignment of group IIA introns from members of the OCC clade was performed. Secondary structures and insertion sites of oedogonialean group IIA introns were analyzed. The 204,438-bp Oedocladium genome is 7.9 kb larger than the Oedogonium genome, but its repertoire of conserved genes is remarkably similar and gene order differs by only one reversal. Although the 23.7-kb IR is missing the putative foreign genes found in Oedogonium, it contains sequences coding for a putative phage or bacterial DNA primase and a hypothetical protein. Intergenic sequences are 1.5-fold longer and dispersed repeats are more abundant, but a smaller fraction of the Oedocladium genome is occupied by introns. Six additional group II introns are present, five of which lack ORFs and carry highly similar sequences to that of the ORF-less IIA intron shared with Oedogonium. Secondary structure analysis of the group IIA introns disclosed marked differences in the exon-binding sites; however, each intron showed perfect or nearly perfect base pairing interactions with its target site. Our results suggest that chloroplast genes rearrange more slowly in the Oedogoniales than in the Chaetophorales and raise questions as to what was the nature of the foreign coding sequences in the IR of the common ancestor of the Oedogoniales. They provide the first evidence for intragenomic proliferation of group IIA introns in the Viridiplantae, revealing that intron spread in the Oedocladium lineage likely occurred by retrohoming after sequence divergence of the exon-binding sites.

  5. Complete genome sequence of a coxsackievirus B3 recombinant isolated from an aseptic meningitis outbreak in eastern China.

    PubMed

    Zhang, Wenqiang; Lin, Xiaojuan; Jiang, Ping; Tao, Zexin; Liu, Xiaolin; Ji, Feng; Wang, Tongzhan; Wang, Suting; Lv, Hui; Xu, Aiqiang; Wang, Haiyan

    2016-08-01

    Coxsackievirus B3 (CV-B3) has frequently been associated with aseptic meningitis outbreaks in China. To identify sequence motifs related to aseptic meningitis and to construct an infectious clone, the genome sequence of 08TC170, a representative strain isolated from cerebrospinal fluid (CSF) samples from an outbreak in Shandong in 2008, was determined, and the coding regions for P1-P3 and VP1 were aligned. The first 21 and last 20 residues were "TTAAAACAGCCTGTGGGTTGT" and "ATTCTCCGCATTCGGTGCGG", respectively. The whole genome consisted of 7401 nucleotides, sharing 80.8 % identity with the prototype strain Nancy and low sequence similarity with members of clusters A-C. In contrast, 08TC170 showed high sequence similarity to members of cluster D. An especially high level of sequence identity (≥97.7 %) was found within a branch constituted by 08TC170 and four Chinese strains that clustered together in all of the P1-P3 phylogenic trees. In addition, 08TC170 also possessed a close relationship to the Hong Kong strain 26362/08 in VP1. Similarity plot analysis showed that 08TC170 was most similar to the Chinese CV-B3 strain SSM in P1 and the partial P2 coding region but to the CV-B5 or E-6 strain in 2C and following regions. A T277A mutation was found in 08TC170 and other strains isolated in 2008-2010, but not in strains isolated before 2008, which had high sequence similarity and formed the cluster A277. The results suggested that 08TC170 was the product of both intertypic recombination and point mutation, whose effects on viral neurovirulence will be investigated in a further study. The high homology between 08TC170 and other strains revealed their co-circulation in mainland China and Hong Kong and indicates that further surveillance is needed.
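
    Percent identity figures such as those quoted in this abstract are computed over aligned sequences. The sketch below shows one common convention (matching columns divided by all columns that are not gaps in both sequences); the abstract does not state which convention or software the authors used, so the function name and the gap handling are assumptions. The second sequence in the example is a made-up variant of the first ten 5' residues reported above.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two sequences taken from the same alignment.

    Columns where both sequences have a gap are ignored; a gap facing a
    residue counts as a mismatch. Other tools may use other denominators.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


reference = "TTAAAACAGC"   # first ten 5' residues quoted in the abstract
variant = "TTAAGACAGC"     # hypothetical sequence with one substitution
identity = percent_identity(reference, variant)
```

    Applying the same function to a sliding window along a genome alignment gives the kind of per-region comparison that a similarity plot summarizes.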

  6. Comparative analysis of the feline immunoglobulin repertoire.

    PubMed

    Steiniger, Sebastian C J; Glanville, Jacob; Harris, Douglas W; Wilson, Thomas L; Ippolito, Gregory C; Dunham, Steven A

    2017-03-01

    Next-Generation Sequencing combined with bioinformatics is a powerful tool for analyzing the large number of DNA sequences present in the expressed antibody repertoire and these data sets can be used to advance a number of research areas including antibody discovery and engineering. The accurate measurement of the immune repertoire sequence composition, diversity and abundance is important for understanding the repertoire response in infections, vaccinations and cancer immunology and could also be useful for elucidating novel molecular targets. In this study 4 individual domestic cats (Felis catus) were subjected to antibody repertoire sequencing, generating a total of 1079863 VH sequences for IgG, 1050824 VH sequences for IgM, 569518 sequences for VK and 450195 for VL. Our analysis suggests that similar VDJ expression patterns exist across all cats. Similar to the canine repertoire, the feline repertoire is dominated by a single subgroup, namely VH3. The antibody paratope of felines showed similar amino acid variation when compared to human, mouse and canine counterparts. All animals show a similarly skewed VH CDR-H3 profile and, when compared to canine, human and mouse, distinct differences are observed. Our study represents the first attempt to characterize sequence diversity in the expressed feline antibody repertoire and demonstrates the utility of using NGS to elucidate entire antibody repertoires from individual animals. These data provide significant insight into feline immune system function. Copyright © 2017 International Alliance for Biological Standardization. Published by Elsevier Ltd. All rights reserved.

  7. Sequence similarities and evolutionary relationships of microbial, plant and animal alpha-amylases.

    PubMed

    Janecek, S

    1994-09-01

    Amino acid sequence comparison of 37 alpha-amylases from microbial, plant and animal sources was performed to identify their mutual sequence similarities in addition to the five already described conserved regions. These sequence regions were examined from structure/function and evolutionary perspectives. An unrooted evolutionary tree of alpha-amylases was constructed on a subset of 55 residues from the alignment of sequence similarities along with conserved regions. The most important new information extracted from the tree was as follows: (a) the close evolutionary relationship of Alteromonas haloplanctis alpha-amylase (thermolabile enzyme from an antarctic psychrotroph) with the already known group of homologous alpha-amylases from streptomycetes, Thermomonospora curvata, insects and mammals, and (b) the remarkable 40.1% identity between starch-saccharifying Bacillus subtilis alpha-amylase and the enzyme from the ruminal bacterium Butyrivibrio fibrisolvens, an alpha-amylase with an unusually large polypeptide chain (943 residues in the mature enzyme). Due to a very high degree of similarity, the whole amino acid sequences of three groups of alpha-amylases, namely (a) fungi and yeasts, (b) plants, and (c) A. haloplanctis, streptomycetes, T. curvata, insects and mammals, were aligned independently and their unrooted distance trees were calculated using these alignments. Possible rooting of the trees was also discussed. Based on the knowledge of the location of the five disulfide bonds in the structure of pig pancreatic alpha-amylase, the possible disulfide bridges were established for each of these groups of homologous alpha-amylases.

  8. Complete nucleotide sequence of the freshwater unicellular cyanobacterium Synechococcus elongatus PCC 6301 chromosome: gene content and organization.

    PubMed

    Sugita, Chieko; Ogata, Koretsugu; Shikata, Masamitsu; Jikuya, Hiroyuki; Takano, Jun; Furumichi, Miho; Kanehisa, Minoru; Omata, Tatsuo; Sugiura, Masahiro; Sugita, Mamoru

    2007-01-01

    The entire genome of the unicellular cyanobacterium Synechococcus elongatus PCC 6301 (formerly Anacystis nidulans Berkeley strain 6301) was sequenced. The genome consisted of a circular chromosome 2,696,255 bp long. A total of 2,525 potential protein-coding genes, two sets of rRNA genes, 45 tRNA genes representing 42 tRNA species, and several genes for small stable RNAs were assigned to the chromosome by similarity searches and computer predictions. The translated products of 56% of the potential protein-coding genes showed sequence similarities to experimentally identified and predicted proteins of known function, and the products of 35% of the genes showed sequence similarities to the translated products of hypothetical genes. The remaining 9% of genes lacked significant similarities to genes for predicted proteins in the public DNA databases. Some 139 genes coding for photosynthesis-related components were identified. Thirty-seven genes for two-component signal transduction systems were also identified. This is the smallest number of such genes identified in cyanobacteria, except for marine cyanobacteria, suggesting that only simple signal transduction systems are found in this strain. The gene arrangement and nucleotide sequence of Synechococcus elongatus PCC 6301 were nearly identical to those of a closely related strain Synechococcus elongatus PCC 7942, except for the presence of a 188.6 kb inversion. The sequences as well as the gene information shown in this paper are available in the Web database, CYORF (http://www.cyano.genome.jp/).

  9. Querying Event Sequences by Exact Match or Similarity Search: Design and Empirical Evaluation

    PubMed Central

    Wongsuphasawat, Krist; Plaisant, Catherine; Taieb-Maimon, Meirav; Shneiderman, Ben

    2012-01-01

    Specifying event sequence queries is challenging even for skilled computer professionals familiar with SQL. Most graphical user interfaces for database search use an exact match approach, which is often effective, but near misses may also be of interest. We describe a new similarity search interface, in which users specify a query by simply placing events on a blank timeline and retrieve a similarity-ranked list of results. Behind this user interface is a new similarity measure for event sequences which the users can customize by four decision criteria, enabling them to adjust the impact of missing, extra, or swapped events or the impact of time shifts. We describe a use case with Electronic Health Records based on our ongoing collaboration with hospital physicians. A controlled experiment with 18 participants compared exact match and similarity search interfaces. We report on the advantages and disadvantages of each interface and suggest a hybrid interface combining the best of both. PMID:22379286
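
    The idea of a customizable similarity measure with separate weights for missing, extra, swapped and time-shifted events can be sketched as follows. This is a toy cost function written for illustration, not the measure defined by the authors; the function name, the greedy matching rule, the default weights and the example events are all assumptions.

```python
from itertools import combinations


def mismatch_cost(query, record, w_missing=1.0, w_extra=1.0,
                  w_swap=1.0, w_time=0.1):
    """Lower cost means the record is more similar to the query.

    query, record: lists of (event_type, time) tuples.
    Each query event is greedily matched to the closest unused record
    event of the same type.
    """
    unused = list(record)
    matched = []  # pairs of (query_event, record_event)
    missing = 0
    for q_event in query:
        candidates = [e for e in unused if e[0] == q_event[0]]
        if not candidates:
            missing += 1  # query event absent from the record
            continue
        best = min(candidates, key=lambda e: abs(e[1] - q_event[1]))
        unused.remove(best)
        matched.append((q_event, best))

    extra = len(unused)  # record events the query did not ask for
    time_shift = sum(abs(q[1] - r[1]) for q, r in matched)
    # A swap is a pair of matched events whose order differs between
    # the query and the record.
    swaps = sum(
        1
        for (q1, r1), (q2, r2) in combinations(matched, 2)
        if (q1[1] - q2[1]) * (r1[1] - r2[1]) < 0
    )
    return (w_missing * missing + w_extra * extra
            + w_swap * swaps + w_time * time_shift)


query = [("admit", 0), ("icu", 2), ("discharge", 5)]
records = {
    "exact": [("admit", 0), ("icu", 2), ("discharge", 5)],
    "late_icu": [("admit", 0), ("icu", 3), ("discharge", 5)],
    "no_icu": [("admit", 0), ("discharge", 5)],
}
# Similarity-ranked list of record names, most similar first.
ranking = sorted(records, key=lambda name: mismatch_cost(query, records[name]))
```

    Setting the weights very high approximates exact-match behaviour, since any deviation then pushes a record far down the ranked list; lowering them lets near misses surface.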

  10. Sequence History Update Tool

    NASA Technical Reports Server (NTRS)

    Khanampompan, Teerapat; Gladden, Roy; Fisher, Forest; DelGuercio, Chris

    2008-01-01

    The Sequence History Update Tool performs Web-based sequence statistics archiving for Mars Reconnaissance Orbiter (MRO). Using a single UNIX command, the software takes advantage of sequencing conventions to automatically extract the needed statistics from multiple files. This information is then used to populate a PHP database, which is then seamlessly formatted into a dynamic Web page. This tool replaces a previous tedious and error-prone process of manually editing HTML code to construct a Web-based table. Because the tool manages all of the statistics gathering and file delivery to and from multiple data sources spread across multiple servers, there are also considerable savings in time and effort. With the Sequence History Update Tool, what previously took minutes is now done in less than 30 seconds, and the tool provides a more accurate archival record of the sequence commanding for MRO.

  11. Genome Wide Re-Annotation of Caldicellulosiruptor saccharolyticus with New Insights into Genes Involved in Biomass Degradation and Hydrogen Production

    PubMed Central

    Chowdhary, Nupoor; Selvaraj, Ashok; KrishnaKumaar, Lakshmi; Kumar, Gopal Ramesh

    2015-01-01

    Caldicellulosiruptor saccharolyticus has proven itself to be an excellent candidate for biological hydrogen (H2) production, but it still has major drawbacks, such as sensitivity to high osmotic pressure and low volumetric H2 productivity, which should be considered before it can be used industrially. A whole-genome re-annotation has been carried out as an attempt to update the incomplete genome information that causes gaps in knowledge, especially in the area of metabolic engineering, to improve the H2-producing capabilities of C. saccharolyticus. Whole-genome re-annotation was performed manually for 2,682 Coding Sequences (CDSs). Bioinformatics tools based on sequence similarity, motif search, phylogenetic analysis and fold recognition were employed for re-annotation. Our methodology could successfully add functions for 409 hypothetical proteins (HPs) and 46 proteins previously annotated as putative, and assign more accurate functions for the known protein sequences. Homology-based gene annotation has been used as a standard method for assigning function to novel proteins, but over the past few years many non-homology-based methods, such as genomic context approaches for protein function prediction, have been developed. Using non-homology-based functional prediction methods, we were able to assign cellular processes or physical complexes for 249 hypothetical sequences. Our re-annotation pipeline highlights the addition of 231 new CDSs generated from the MicroScope Platform to the original genome, with functional prediction for 49 of them. The re-annotation of HPs and new CDSs is stored in the relational database that is available on the MicroScope web-based platform. In parallel, comparative genome analyses were performed among the members of the genus Caldicellulosiruptor to understand the function and evolutionary processes. Further, with results from integrated re-annotation studies (homology and genomic context approach), we strongly suggest that Csac_0437 and Csac_0424 encode glycoside hydrolases (GH) and are proposed to be involved in the decomposition of recalcitrant plant polysaccharides. Similarly, the HPs Csac_0732, Csac_1862, Csac_1294 and Csac_0668 are suggested to play a significant role in biohydrogen production. Function prediction of these HPs by using our integrated approach will considerably enhance the interpretation of large-scale experiments targeting this industrially important organism. PMID:26196387

  12. Genome Wide Re-Annotation of Caldicellulosiruptor saccharolyticus with New Insights into Genes Involved in Biomass Degradation and Hydrogen Production.

    PubMed

    Chowdhary, Nupoor; Selvaraj, Ashok; KrishnaKumaar, Lakshmi; Kumar, Gopal Ramesh

    2015-01-01

    Caldicellulosiruptor saccharolyticus has proven itself to be an excellent candidate for biological hydrogen (H2) production, but it still has major drawbacks, such as sensitivity to high osmotic pressure and low volumetric H2 productivity, which should be considered before it can be used industrially. A whole-genome re-annotation has been carried out as an attempt to update the incomplete genome information that causes gaps in knowledge, especially in the area of metabolic engineering, to improve the H2-producing capabilities of C. saccharolyticus. Whole-genome re-annotation was performed manually for 2,682 Coding Sequences (CDSs). Bioinformatics tools based on sequence similarity, motif search, phylogenetic analysis and fold recognition were employed for re-annotation. Our methodology could successfully add functions for 409 hypothetical proteins (HPs) and 46 proteins previously annotated as putative, and assign more accurate functions for the known protein sequences. Homology-based gene annotation has been used as a standard method for assigning function to novel proteins, but over the past few years many non-homology-based methods, such as genomic context approaches for protein function prediction, have been developed. Using non-homology-based functional prediction methods, we were able to assign cellular processes or physical complexes for 249 hypothetical sequences. Our re-annotation pipeline highlights the addition of 231 new CDSs generated from the MicroScope Platform to the original genome, with functional prediction for 49 of them. The re-annotation of HPs and new CDSs is stored in the relational database that is available on the MicroScope web-based platform. In parallel, comparative genome analyses were performed among the members of the genus Caldicellulosiruptor to understand the function and evolutionary processes. Further, with results from integrated re-annotation studies (homology and genomic context approach), we strongly suggest that Csac_0437 and Csac_0424 encode glycoside hydrolases (GH) and are proposed to be involved in the decomposition of recalcitrant plant polysaccharides. Similarly, the HPs Csac_0732, Csac_1862, Csac_1294 and Csac_0668 are suggested to play a significant role in biohydrogen production. Function prediction of these HPs by using our integrated approach will considerably enhance the interpretation of large-scale experiments targeting this industrially important organism.

  13. MoccaDB - an integrative database for functional, comparative and diversity studies in the Rubiaceae family

    PubMed Central

    Plechakova, Olga; Tranchant-Dubreuil, Christine; Benedet, Fabrice; Couderc, Marie; Tinaut, Alexandra; Viader, Véronique; De Block, Petra; Hamon, Perla; Campa, Claudine; de Kochko, Alexandre; Hamon, Serge; Poncet, Valérie

    2009-01-01

    Background: In the past few years, functional genomics information has been rapidly accumulating on Rubiaceae species and especially on those belonging to the Coffea genus (coffee trees). An increasing number of expressed sequence tag (EST) data and EST- or genomic-derived microsatellite markers have been generated, together with Conserved Ortholog Set (COS) markers. This considerably facilitates comparative genomics or map-based genetic studies through the common use of orthologous loci across different species. Similar genomic information is available for e.g. tomato or potato, members of the Solanaceae family. Since both Rubiaceae and Solanaceae belong to the Euasterids I (lamiids), integration of information on genetic markers would be possible and lead to more efficient analyses and discovery of key loci involved in important traits such as fruit development, quality, and maturation, or adaptation. Our goal was to develop a comprehensive web data source for integrated information on validated orthologous markers in Rubiaceae. Description: MoccaDB is an online MySQL-PHP driven relational database that houses annotated and/or mapped microsatellite markers in Rubiaceae. In its current release, the database stores 638 markers that have been defined on 259 ESTs and 379 genomic sequences. Marker information was retrieved from 11 published works, and completed with original data on 132 microsatellite markers validated in our laboratory. DNA sequences were derived from three Coffea species/hybrids. Microsatellite markers were checked for similarity, in vitro tested for cross-amplification and diversity/polymorphism status in up to 38 Rubiaceae species belonging to the Cinchonoideae and Rubioideae subfamilies. Functional annotation was provided and some markers associated with described metabolic pathways were also integrated. Users can search the database for marker, sequence, map or diversity information through multi-option query forms. The retrieved data can be browsed and downloaded, along with protocols used, using a standard web browser. MoccaDB also integrates bioinformatics tools (CMap viewer and local BLAST) and hyperlinks to related external data sources (NCBI GenBank and PubMed, SOL Genomic Network database). Conclusion: We believe that MoccaDB will be extremely useful for all researchers working in the areas of comparative and functional genomics and molecular evolution, in general, and population analysis and association mapping of Rubiaceae and Solanaceae species, in particular. PMID:19788737

  14. Complete mitochondrial genome sequence of Indian medium carp, Labeo gonius (Hamilton, 1822) and its comparison with other related carp species.

    PubMed

    Behera, Bijay Kumar; Kumari, Kavita; Baisvar, Vishwamitra Singh; Rout, Ajaya Kumar; Pakrashi, Sudip; Paria, Prasenjet; Jena, J K

    2017-01-01

    In the present study, the complete mitochondrial genome sequence of Labeo gonius is reported using a PGM sequencer (Ion Torrent). The complete mitogenome of L. gonius, 16,614 bp in length, was obtained by de novo assembly of genomic reads using the Torrent Mapping Alignment Program (TMAP). The mitogenome of L. gonius comprises 13 protein-coding genes, 22 tRNAs, 2 rRNA genes, and a D-loop as control region; its gene order and organization are similar to those of most other fish mitogenomes in NCBI databases. The mitogenome in the present study has 99% similarity to the complete mitogenome sequence of Labeo fimbriatus, as reported earlier. The phylogenetic analysis of Cypriniformes depicted that their mitogenomes are closely related to each other. The complete mitogenome sequence of L. gonius would be helpful in understanding the population genetics, phylogenetics, and evolution of Indian carps.

  15. Lactobacillus cypricasei Lawson et al. 2001 is a later heterotypic synonym of Lactobacillus acidipiscis Tanasupawat et al. 2000.

    PubMed

    Naser, Sabri M; Vancanneyt, Marc; Hoste, Bart; Snauwaert, Cindy; Swings, Jean

    2006-07-01

    The applicability of a multilocus sequence analysis (MLSA)-based identification system for lactobacilli was evaluated. Two housekeeping genes that code for the phenylalanyl-tRNA synthase alpha-subunit (pheS) and RNA polymerase alpha-subunit (rpoA) were sequenced and analysed for members of the Lactobacillus salivarius species group. The type strains of Lactobacillus acidipiscis and Lactobacillus cypricasei were investigated further using a third gene that encodes the alpha-subunit of ATP synthase (atpA). The MLSA data revealed close relatedness between L. acidipiscis and L. cypricasei, with 99.8-100 % pheS, rpoA and atpA gene sequence similarities. Comparison of the 16S rRNA gene sequences of the type strains of the two species confirmed the close relatedness (99.8 % gene sequence similarity) between the two taxa. Similar phenotypes and high DNA-DNA binding values in the range of 84 to 97.5 % confirmed that L. acidipiscis and L. cypricasei are synonymous species. On the basis of the present study, it is proposed that Lactobacillus cypricasei is a later heterotypic synonym of Lactobacillus acidipiscis.

  16. Molecular evidence for piroplasms in wild Reeves' muntjac (Muntiacus reevesi) in China.

    PubMed

    Yang, Ji-fei; Li, You-quan; Liu, Zhi-jie; Liu, Jun-long; Guan, Gui-quan; Chen, Ze; Luo, Jian-xun; Wang, Xiao-long; Yin, Hong

    2014-10-01

    DNA from liver samples of 17 free-ranging wild Reeves' muntjac (Muntiacus reevesi) was used for PCR amplification of the piroplasm 18S rRNA gene. Of 17 samples, 14 (82.4%) showed a specific PCR product, which was cloned and sequenced. BLAST analysis of the sequences obtained showed similarities to Babesia sp., Theileria capreoli, Theileria uilenbergi and Theileria sp. BO302-SE. Phylogenetic analysis showed that the Babesia sp. detected in the present study was distantly separated from known Babesia species of wild and domestic animals. Six sequences showed 100% similarity to T. capreoli while five sequences were separated from all known Theileria species and constituted an independent clade with Theileria sp. BO302-SE derived from roe deer in Italy; two sequences were close to T. uilenbergi with 97% similarity. This is the first description of hemoparasite infection in free-ranging wild Reeves' muntjac in China. Our results indicate that wild Reeves' muntjac may play an important reservoir role for hemoparasites. Crown Copyright © 2014. Published by Elsevier Ireland Ltd. All rights reserved.

  17. What is a melody? On the relationship between pitch and brightness of timbre

    PubMed Central

    Cousineau, Marion; Carcagno, Samuele; Demany, Laurent; Pressnitzer, Daniel

    2014-01-01

    Previous studies showed that the perceptual processing of sound sequences is more efficient when the sounds vary in pitch than when they vary in loudness. We show here that sequences of sounds varying in brightness of timbre are processed with the same efficiency as pitch sequences. The sounds used consisted of two simultaneous pure tones one octave apart, and the listeners’ task was to make same/different judgments on pairs of sequences varying in length (one, two, or four sounds). In one condition, brightness of timbre was varied within the sequences by changing the relative level of the two pure tones. In other conditions, pitch was varied by changing fundamental frequency, or loudness was varied by changing the overall level. In all conditions, only two possible sounds could be used in a given sequence, and these two sounds were equally discriminable. When sequence length increased from one to four, discrimination performance decreased substantially for loudness sequences, but to a smaller extent for brightness sequences and pitch sequences. In the latter two conditions, sequence length had a similar effect on performance. These results suggest that the processes dedicated to pitch and brightness analysis, when probed with a sequence-discrimination task, share unexpected similarities. PMID:24478638

  18. Parent and public interest in whole-genome sequencing.

    PubMed

    Dodson, Daniel S; Goldenberg, Aaron J; Davis, Matthew M; Singer, Dianne C; Tarini, Beth A

    2015-01-01

    The aim of this study was to assess the baseline interest of the public in whole-genome sequencing (WGS) for oneself, parents' interest in WGS for their youngest children, and factors associated with such interest. A random sample of adults from a probability-based nationally representative online panel was surveyed. All participants were provided basic information about WGS and then asked about their interest in WGS for themselves. Those participants who were parents were additionally asked about their interest in WGS for their children. The order in which parents were asked about their interest in WGS for themselves and for their child was randomized. The relationship between parent/child characteristics and interest in WGS was examined. The overall response rate was 62% (55% among parents). 58.6% of the total population (parents and nonparents) was interested in WGS for themselves. Similarly, 61.8% of the parents were interested in WGS for themselves and 57.8% were interested in WGS for their youngest children. Of note, 84.7% of the parents showed an identical interest level in WGS for themselves and their youngest children. Mothers as a group and parents whose youngest children had ≥2 health conditions had significantly more interest in WGS for themselves and their youngest children, while those with conservative political ideologies had considerably less. While US adults have varying interest levels in WGS, parents appear to have similar interests in genome testing for themselves and their youngest children. As WGS technology becomes available in the clinic and private market, clinicians should be prepared to discuss WGS risks and benefits with their patients. © 2015 S. Karger AG, Basel.

  19. Phosphorylation and cellular function of the human Rpa2 N-terminus in the budding yeast Saccharomyces cerevisiae.

    PubMed

    Ghospurkar, Padmaja L; Wilson, Timothy M; Liu, Shengqin; Herauf, Anna; Steffes, Jenna; Mueller, Erica N; Oakley, Gregory G; Haring, Stuart J

    2015-02-01

    Maintenance of genome integrity is critical for proper cell growth. This occurs through accurate DNA replication and repair of DNA lesions. A key factor involved in both DNA replication and the DNA damage response is the heterotrimeric single-stranded DNA (ssDNA) binding complex Replication Protein A (RPA). Although the RPA complex appears to be structurally conserved throughout eukaryotes, the primary amino acid sequence of each subunit can vary considerably. Examination of sequence differences along with the functional interchangeability of orthologous RPA subunits or regions could provide insight into important regions and their functions. This might also allow for study in simpler systems. We determined that substitution of yeast Replication Factor A (RFA) with human RPA does not support yeast cell viability. Exchange of a single yeast RFA subunit with the corresponding human RPA subunit does not function due to lack of inter-species subunit interactions. Substitution of yeast Rfa2 with domains/regions of human Rpa2 important for Rpa2 function (i.e., the N-terminus and the loop 3-4 region) supports viability in yeast cells, and hybrid proteins containing human Rpa2 N-terminal phospho-mutations result in similar DNA damage phenotypes to analogous yeast Rfa2 N-terminal phospho-mutants. Finally, the human Rpa2 N-terminus (NT) fused to yeast Rfa2 is phosphorylated in a manner similar to human Rpa2 in human cells, indicating that conserved kinases recognize the human domain in yeast. The implication is that budding yeast represents a potential model system for studying not only human Rpa2 N-terminal phosphorylation, but also phosphorylation of Rpa2 N-termini from other eukaryotic organisms. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.

  20. The complete CDS of the prion protein (PRNP) gene of African lion (Panthera leo).

    PubMed

    Maj, Andrzej; Spellman, Garth M; Sarver, Shane K

    2008-04-01

    We provide the complete PRNP CDS sequence for the African lion, which is different from the previously published sequence and more similar to other carnivore sequences. The newly obtained prion protein sequence differs from the domestic cat sequence at three amino acid positions and contains only four octapeptide repeats. We recommend that this sequence be used as the reference sequence for future studies of the PRNP gene for this species.

  1. PaperBLAST: Text Mining Papers for Information about Homologs

    DOE PAGES

    Price, Morgan N.; Arkin, Adam P.

    2017-08-15

    Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources (Swiss-Prot, GeneRIF, and EcoCyc) that link protein sequences to scientific articles. PaperBLAST’s database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. With the recent explosion of genome sequencing data, there are now millions of uncharacterized proteins. If a scientist becomes interested in one of these proteins, it can be very difficult to find information as to its likely function. Often a protein whose sequence is similar, and which is likely to have a similar function, has been studied already, but this information is not available in any database. To help find articles about similar proteins, PaperBLAST searches the full text of scientific articles for protein identifiers or gene identifiers, and it links these articles to protein sequences. Then, given a protein of interest, it can quickly find similar proteins in its database by using standard software (BLAST), and it can show snippets of text from relevant papers. We hope that PaperBLAST will make it easier for biologists to predict proteins’ functions.

  2. PaperBLAST: Text Mining Papers for Information about Homologs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Price, Morgan N.; Arkin, Adam P.

    Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources (Swiss-Prot, GeneRIF, and EcoCyc) that link protein sequences to scientific articles. PaperBLAST’s database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. With the recent explosion of genome sequencing data, there are now millions of uncharacterized proteins. If a scientist becomes interested in one of these proteins, it can be very difficult to find information as to its likely function. Often a protein whose sequence is similar, and which is likely to have a similar function, has been studied already, but this information is not available in any database. To help find articles about similar proteins, PaperBLAST searches the full text of scientific articles for protein identifiers or gene identifiers, and it links these articles to protein sequences. Then, given a protein of interest, it can quickly find similar proteins in its database by using standard software (BLAST), and it can show snippets of text from relevant papers. We hope that PaperBLAST will make it easier for biologists to predict proteins’ functions.

  3. PaperBLAST: Text Mining Papers for Information about Homologs

    PubMed Central

    Arkin, Adam P.

    2017-01-01

    ABSTRACT Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources (Swiss-Prot, GeneRIF, and EcoCyc) that link protein sequences to scientific articles. PaperBLAST’s database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. PaperBLAST is available at http://papers.genomics.lbl.gov/. IMPORTANCE With the recent explosion of genome sequencing data, there are now millions of uncharacterized proteins. If a scientist becomes interested in one of these proteins, it can be very difficult to find information as to its likely function. Often a protein whose sequence is similar, and which is likely to have a similar function, has been studied already, but this information is not available in any database. To help find articles about similar proteins, PaperBLAST searches the full text of scientific articles for protein identifiers or gene identifiers, and it links these articles to protein sequences. Then, given a protein of interest, it can quickly find similar proteins in its database by using standard software (BLAST), and it can show snippets of text from relevant papers. We hope that PaperBLAST will make it easier for biologists to predict proteins’ functions. PMID:28845458

  4. PaperBLAST: Text Mining Papers for Information about Homologs.

    PubMed

    Price, Morgan N; Arkin, Adam P

    2017-01-01

    Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources (Swiss-Prot, GeneRIF, and EcoCyc) that link protein sequences to scientific articles. PaperBLAST's database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. PaperBLAST is available at http://papers.genomics.lbl.gov/. IMPORTANCE With the recent explosion of genome sequencing data, there are now millions of uncharacterized proteins. If a scientist becomes interested in one of these proteins, it can be very difficult to find information as to its likely function. Often a protein whose sequence is similar, and which is likely to have a similar function, has been studied already, but this information is not available in any database. To help find articles about similar proteins, PaperBLAST searches the full text of scientific articles for protein identifiers or gene identifiers, and it links these articles to protein sequences. Then, given a protein of interest, it can quickly find similar proteins in its database by using standard software (BLAST), and it can show snippets of text from relevant papers. We hope that PaperBLAST will make it easier for biologists to predict proteins' functions.
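
    The text-mining step described above (searching article full text for gene or protein identifiers and linking articles to proteins) can be illustrated with a small sketch. This is not PaperBLAST's code: the tokenization rule, the function name, and the example identifiers and article IDs are all hypothetical and chosen only to show the idea of an identifier-to-article index.

```python
import re
from typing import Dict, List, Set

# Assumption: identifiers are single tokens of letters, digits, or underscores.
TOKEN_RE = re.compile(r"[A-Za-z0-9_]+")

def index_mentions(articles: Dict[str, str], identifiers: Set[str]) -> Dict[str, List[str]]:
    """Map each known identifier to the IDs of articles whose text mentions it."""
    index: Dict[str, List[str]] = {}
    for article_id, text in articles.items():
        tokens = set(TOKEN_RE.findall(text))
        # Keep only tokens that are known identifiers (exact, case-sensitive match).
        for ident in sorted(identifiers & tokens):
            index.setdefault(ident, []).append(article_id)
    return index

# Hypothetical usage with made-up article IDs and locus tags.
articles = {
    "PMC1": "The thrA gene (b0002) encodes a bifunctional enzyme.",
    "PMC2": "Deletion of b0002 and b0003 reduced growth.",
}
mentions = index_mentions(articles, {"b0002", "b0003", "b9999"})
```

    A similarity search would then start from a query protein, find similar database proteins, and look up their identifiers in such an index to retrieve the relevant articles.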

  5. Orthology detection combining clustering and synteny for very large datasets.

    PubMed

    Lechner, Marcus; Hernandez-Rosales, Maribel; Doerr, Daniel; Wieseke, Nicolas; Thévenin, Annelyse; Stoye, Jens; Hartmann, Roland K; Prohaska, Sonja J; Stadler, Peter F

    2014-01-01

    The elucidation of orthology relationships is an important step both in gene function prediction as well as towards understanding patterns of sequence evolution. Orthology assignments are usually derived directly from sequence similarities for large data because more exact approaches exhibit too high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance) was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis Proteinortho, making the software applicable to very large datasets.
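
    The abstract mentions a gene-order similarity measure based on adjacencies, related to the breakpoint distance. The sketch below is not the FFAdj-MCS heuristic. It only illustrates the basic notion of counting adjacencies conserved between two gene orders, assuming unsigned genes (orientation ignored), a single linear chromosome, and no duplicated genes. Function names are hypothetical.

```python
from typing import FrozenSet, List, Set

def adjacencies(order: List[str]) -> Set[FrozenSet[str]]:
    """Unordered pairs of neighbouring genes on one linear chromosome."""
    return {frozenset(pair) for pair in zip(order, order[1:])}

def shared_adjacencies(order_a: List[str], order_b: List[str]) -> int:
    """Number of adjacencies present in both gene orders."""
    return len(adjacencies(order_a) & adjacencies(order_b))

# Two orders over the same four genes; g3 and g4 are swapped in the second.
a = ["g1", "g2", "g3", "g4"]
b = ["g1", "g2", "g4", "g3"]
conserved = shared_adjacencies(a, b)
```

    Under these assumptions, the more adjacencies two gene orders share, the fewer breakpoints separate them, which is why adjacency counts can complement pure sequence similarity when assigning orthologs.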

  6. Orthology Detection Combining Clustering and Synteny for Very Large Datasets

    PubMed Central

    Lechner, Marcus; Hernandez-Rosales, Maribel; Doerr, Daniel; Wieseke, Nicolas; Thévenin, Annelyse; Stoye, Jens; Hartmann, Roland K.; Prohaska, Sonja J.; Stadler, Peter F.

    2014-01-01

    The elucidation of orthology relationships is an important step both in gene function prediction as well as towards understanding patterns of sequence evolution. Orthology assignments are usually derived directly from sequence similarities for large data because more exact approaches exhibit too high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance) was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis Proteinortho, making the software applicable to very large datasets. PMID:25137074

  7. Detecting atypical examples of known domain types by sequence similarity searching: the SBASE domain library approach.

    PubMed

    Dhir, Somdutta; Pacurar, Mircea; Franklin, Dino; Gáspári, Zoltán; Kertész-Farkas, Attila; Kocsor, András; Eisenhaber, Frank; Pongor, Sándor

    2010-11-01

    SBASE is a project initiated to detect known domain types and predict domain architectures using sequence similarity searching (Simon et al., Protein Seq Data Anal, 5: 39-42, 1992, Pongor et al, Nucl. Acids. Res. 21:3111-3115, 1992). The current approach uses a curated collection of domain sequences - the SBASE domain library - and standard similarity search algorithms, followed by postprocessing which is based on simple statistics of the domain similarity network (http://hydra.icgeb.trieste.it/sbase/). It is especially useful in detecting rare, atypical examples of known domain types which are sometimes missed even by more sophisticated methodologies. This approach does not require multiple alignment or machine learning techniques, and can be a useful complement to other domain detection methodologies. This article gives an overview of the project history as well as of the concepts and principles developed within this project.

  8. The Development of Mental Models for Auditory Events: Relational Complexity and Discrimination of Pitch and Duration

    ERIC Educational Resources Information Center

    Stevens, Catherine; Gallagher, Melinda

    2004-01-01

    This experiment investigated relational complexity and relational shift in judgments of auditory patterns. Pitch and duration values were used to construct two-note perceptually similar sequences (unary relations) and four-note relationally similar sequences (binary relations). It was hypothesized that 5-, 8- and 11-year-old children would perform…

  9. Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments

    DOE PAGES

    Yim, Won Cheol; Cushman, John C.

    2017-07-22

    Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs performs searches very rapidly, they have the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible and used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. In order to accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from one (1) to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and it overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. Thus, this freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.
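
    The "query sequence distribution approach" described above amounts to splitting a multi-sequence query file into chunks that can be searched independently and merged afterwards. The following is a minimal sketch of that idea only, not DCBLAST's implementation: the function names are hypothetical, and the round-robin assignment is an assumption chosen for simplicity.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)

def read_fasta(text: str) -> List[Record]:
    """Parse FASTA-formatted text into (header, sequence) records."""
    records: List[Record] = []
    header = None
    seq_lines: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq_lines)))
            header = line[1:]
            seq_lines = []
        elif header is not None:
            seq_lines.append(line)
    if header is not None:
        records.append((header, "".join(seq_lines)))
    return records

def split_queries(records: List[Record], n_chunks: int) -> List[List[Record]]:
    """Distribute query records round-robin over at most n_chunks chunks."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    # Never create more chunks than there are records (but at least one).
    n = max(1, min(n_chunks, len(records)))
    chunks: List[List[Record]] = [[] for _ in range(n)]
    for i, record in enumerate(records):
        chunks[i % n].append(record)
    return chunks

# Hypothetical usage: three queries split across two worker jobs.
queries = read_fasta(">a\nACGT\n>b\nGG\nTT\n>c\nA\n")
jobs = split_queries(queries, 2)
```

    Each chunk would then be written to its own file and submitted as a separate search job. Because every query is searched against the full database, the per-chunk results can simply be concatenated.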

  10. Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yim, Won Cheol; Cushman, John C.

    Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs performs searches very rapidly, they have the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible and used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. In order to accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from one (1) to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and it overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. Thus, this freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.

  11. Diversity of Babesia bovis merozoite surface antigen genes in the Philippines.

    PubMed

    Tattiyapong, Muncharee; Sivakumar, Thillaiampalam; Ybanez, Adrian Patalinghug; Ybanez, Rochelle Haidee Daclan; Perez, Zandro Obligado; Guswanto, Azirwan; Igarashi, Ikuo; Yokoyama, Naoaki

    2014-02-01

    Babesia bovis is the causative agent of fatal babesiosis in cattle. In the present study, we investigated the genetic diversity of B. bovis among Philippine cattle, based on the genes that encode merozoite surface antigens (MSAs). Forty-one B. bovis-positive blood DNA samples from cattle were used to amplify the msa-1, msa-2b, and msa-2c genes. In phylogenetic analyses, the msa-1, msa-2b, and msa-2c gene sequences generated from Philippine B. bovis-positive DNA samples were found in six, three, and four different clades, respectively. All of the msa-1 and most of the msa-2b sequences were found in clades that were formed only by Philippine msa sequences in the respective phylograms. While all the msa-1 sequences from the Philippines showed similarity to those formed by Australian msa-1 sequences, the msa-2b sequences showed similarity to either Australian or Mexican msa-2b sequences. In contrast, msa-2c sequences from the Philippines were distributed across all the clades of the phylogram, although one clade was formed exclusively by Philippine msa-2c sequences. Similarities among the deduced amino acid sequences of MSA-1, MSA-2b, and MSA-2c from the Philippines were 62.2-100, 73.1-100, and 67.3-100%, respectively. The present findings demonstrate that B. bovis populations are genetically diverse in the Philippines. This information will provide a good foundation for the future design and implementation of improved immunological preventive methodologies against bovine babesiosis in the Philippines. The study has also generated a set of data that will be useful for further understanding of the global genetic diversity of this important parasite. © 2013.

  12. DEMO: Sequence Alignment to Predict Across Species Susceptibility

    EPA Science Inventory

    The US Environmental Protection Agency Sequence Alignment to Predict Across Species Susceptibility tool (SeqAPASS; https://seqapass.epa.gov/seqapass/) was developed to comparatively evaluate protein sequence and structural similarity across species as a means to extrapolate toxic...

  13. Transcriptome Profiling of Khat (Catha edulis) and Ephedra sinica Reveals Gene Candidates Potentially Involved in Amphetamine-Type Alkaloid Biosynthesis

    PubMed Central

    Groves, Ryan A.; Hagel, Jillian M.; Zhang, Ye; Kilpatrick, Korey; Levy, Asaf; Marsolais, Frédéric; Lewinsohn, Efraim; Sensen, Christoph W.; Facchini, Peter J.

    2015-01-01

    Amphetamine analogues are produced by plants in the genus Ephedra and by khat (Catha edulis), and include the widely used decongestants and appetite suppressants (1S,2S)-pseudoephedrine and (1R,2S)-ephedrine. The production of these metabolites, which derive from L-phenylalanine, involves a multi-step pathway partially mapped out at the biochemical level using knowledge of benzoic acid metabolism established in other plants, and direct evidence using khat and Ephedra species as model systems. Despite the commercial importance of amphetamine-type alkaloids, only a single step in their biosynthesis has been elucidated at the molecular level. We have employed Illumina next-generation sequencing technology, paired with Trinity and Velvet-Oases assembly platforms, to establish data-mining frameworks for Ephedra sinica and khat plants. Sequence libraries representing a combined 200,000 unigenes were subjected to an annotation pipeline involving direct searches against public databases. Annotations included the assignment of Gene Ontology (GO) terms used to allocate unigenes to functional categories. As part of our functional genomics program aimed at novel gene discovery, the databases were mined for enzyme candidates putatively involved in alkaloid biosynthesis. Queries used for mining included enzymes with established roles in benzoic acid metabolism, as well as enzymes catalyzing reactions similar to those predicted for amphetamine alkaloid metabolism. Gene candidates were evaluated based on phylogenetic relationships, FPKM-based expression data, and mechanistic considerations. Establishment of expansive sequence resources is a critical step toward pathway characterization, a goal with both academic and industrial implications. PMID:25806807

  14. Genometa--a fast and accurate classifier for short metagenomic shotgun reads.

    PubMed

    Davenport, Colin F; Neugebauer, Jens; Beckmann, Nils; Friedrich, Benedikt; Kameri, Burim; Kokott, Svea; Paetow, Malte; Siekmann, Björn; Wieding-Drewes, Matthias; Wienhöfer, Markus; Wolf, Stefan; Tümmler, Burkhard; Ahlers, Volker; Sprengel, Frauke

    2012-01-01

    Metagenomic studies use high-throughput sequence data to investigate microbial communities in situ. However, considerable challenges remain in the analysis of these data, particularly with regard to speed and reliable analysis of microbial species as opposed to higher level taxa such as phyla. We here present Genometa, a computationally undemanding graphical user interface program that enables identification of bacterial species and gene content from datasets generated by inexpensive high-throughput short read sequencing technologies. Our approach was first verified on two simulated metagenomic short read datasets, detecting 100% and 94% of the bacterial species included with few false positives or false negatives. Subsequent comparative benchmarking analysis against three popular metagenomic algorithms on an Illumina human gut dataset revealed Genometa to attribute the most reads to bacteria at species level (i.e. including all strains of that species) and demonstrate similar or better accuracy than the other programs. Lastly, speed was demonstrated to be many times that of BLAST due to the use of modern short read aligners. Our method is highly accurate if bacteria in the sample are represented by genomes in the reference sequence but cannot find species absent from the reference. This method is one of the most user-friendly and resource efficient approaches and is thus feasible for rapidly analysing millions of short reads on a personal computer. The Genometa program, a step by step tutorial and Java source code are freely available from http://genomics1.mh-hannover.de/genometa/ and on http://code.google.com/p/genometa/. This program has been tested on Ubuntu Linux and Windows XP/7.

  15. alpha-Amylase gene of Streptomyces limosus: nucleotide sequence, expression motifs, and amino acid sequence homology to mammalian and invertebrate alpha-amylases.

    PubMed Central

    Long, C M; Virolle, M J; Chang, S Y; Chang, S; Bibb, M J

    1987-01-01

    The nucleotide sequence of the coding and regulatory regions of the alpha-amylase gene (aml) of Streptomyces limosus was determined. High-resolution S1 mapping was used to locate the 5' end of the transcript and demonstrated that the gene is transcribed from a unique promoter. The predicted amino acid sequence has considerable identity to mammalian and invertebrate alpha-amylases, but not to those of plant, fungal, or eubacterial origin. Consistent with this is the susceptibility of the enzyme to an inhibitor of mammalian alpha-amylases. The amino-terminal sequence of the extracellular enzyme was determined, revealing the presence of a typical signal peptide preceding the mature form of the alpha-amylase. PMID:3500166

  16. Development and integration of block operations for data invariant automation of digital preprocessing and analysis of biological and biomedical Raman spectra.

    PubMed

    Schulze, H Georg; Turner, Robin F B

    2015-06-01

    High-throughput information extraction from large numbers of Raman spectra is becoming an increasingly taxing problem due to the proliferation of new applications enabled using advances in instrumentation. Fortunately, in many of these applications, the entire process can be automated, yielding reproducibly good results with significant time and cost savings. Information extraction consists of two stages, preprocessing and analysis. We focus here on the preprocessing stage, which typically involves several steps, such as calibration, background subtraction, baseline flattening, artifact removal, smoothing, and so on, before the resulting spectra can be further analyzed. Because the results of some of these steps can affect the performance of subsequent ones, attention must be given to the sequencing of steps, the compatibility of these sequences, and the propensity of each step to generate spectral distortions. We outline here important considerations to effect full automation of Raman spectral preprocessing: what is considered full automation; putative general principles to effect full automation; the proper sequencing of processing and analysis steps; conflicts and circularities arising from sequencing; and the need for, and approaches to, preprocessing quality control. These considerations are discussed and illustrated with biological and biomedical examples reflecting both successful and faulty preprocessing.

  17. Exploring the limits of sequence and structure in a variant βγ-crystallin domain of the protein absent in melanoma-1 (AIM1)

    PubMed Central

    Aravind, Penmatsa; Wistow, Graeme; Sharma, Yogendra; Sankaranarayanan, Rajan

    2008-01-01

    βγ-Crystallins belong to a superfamily of proteins in prokaryotes and eukaryotes that are based on duplications of a characteristic, highly conserved Greek Key motif. Most members of the superfamily in vertebrates are structural proteins of the eye lens that contain four motifs arranged as two structural domains. Absent in melanoma-1 (AIM1), an unusual member of the superfamily whose expression is associated with suppression of malignancy in melanoma, contains 12 βγ-crystallin motifs in six domains. Some of these motifs diverge considerably from the canonical motif sequence. AIM1g1, the first βγ-crystallin domain of AIM1, is the most variant of βγ-crystallin domains currently known. In order to understand the limits of sequence variation on the structure, we report the crystal structure of AIM1g1 at 1.9Å resolution. In spite of having changes in key residues, the domain retains the overall βγ-crystallin fold. The domain also contains an unusual extended surface loop that significantly alters the shape of the domain and its charge profile. This structure illustrates the resilience of the βγ fold to considerable sequence changes and its remarkable ability to adapt for novel functions. PMID:18582473

  18. Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius

    PubMed Central

    Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.

    2010-01-01

    Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism. PMID:20502665

  19. Sequence similarity is more relevant than species specificity in probabilistic backtranslation.

    PubMed

    Ferro, Alfredo; Giugno, Rosalba; Pigola, Giuseppe; Pulvirenti, Alfredo; Di Pietro, Cinzia; Purrello, Michele; Ragusa, Marco

    2007-02-21

    Backtranslation is the process of decoding a sequence of amino acids into the corresponding codons. All synthetic gene design systems include a backtranslation module. The degeneracy of the genetic code makes backtranslation potentially ambiguous since most amino acids are encoded by multiple codons. The common approach to overcome this difficulty is based on imitation of codon usage within the target species. This paper describes EasyBack, a new parameter-free, fully-automated software for backtranslation using Hidden Markov Models. EasyBack is not based on imitation of codon usage within the target species, but instead uses a sequence-similarity criterion. The model is trained with a set of proteins with known cDNA coding sequences, constructed from the input protein by querying the NCBI databases with BLAST. Unlike existing software, the proposed method allows the quality of prediction to be estimated. When tested on a group of proteins that show different degrees of sequence conservation, EasyBack outperforms other published methods in terms of precision. The prediction quality of a protein backtranslation method is markedly increased by replacing the criterion of most used codon in the same species with a Hidden Markov Model trained with a set of most similar sequences from all species. Moreover, the proposed method allows the quality of prediction to be estimated probabilistically.
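
    The "most used codon" baseline that the abstract contrasts with EasyBack can be sketched as follows. This is only an illustration of that baseline, not of EasyBack's Hidden Markov Model; the function name and the truncated codon-usage table are hypothetical, and the frequencies are placeholder values rather than measured data.

```python
# Hypothetical, truncated codon-usage table: amino acid -> {codon: frequency}.
# The frequencies are illustrative placeholders, not real measurements.
CODON_USAGE = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.43, "AAG": 0.57},
    "F": {"TTT": 0.46, "TTC": 0.54},
    "W": {"TGG": 1.0},
}


def backtranslate_most_used(protein, usage=CODON_USAGE):
    """Pick, for every residue, the codon with the highest usage frequency."""
    codons = []
    for residue in protein:
        options = usage[residue]  # raises KeyError for residues not in the table
        codons.append(max(options, key=options.get))
    return "".join(codons)


print(backtranslate_most_used("MKFW"))
```

    Because every residue is decided independently from a fixed table, this baseline cannot use information from similar sequences, which is the gap the abstract says the similarity-based approach addresses.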

  20. A Global Comparison of the Human and T. brucei Degradomes Gives Insights about Possible Parasite Drug Targets

    PubMed Central

    Mashiyama, Susan T.; Koupparis, Kyriacos; Caffrey, Conor R.; McKerrow, James H.; Babbitt, Patricia C.

    2012-01-01

    We performed a genome-level computational study of sequence and structure similarity, the latter using crystal structures and models, of the proteases of Homo sapiens and the human parasite Trypanosoma brucei. Using sequence and structure similarity networks to summarize the results, we constructed global views that show visually the relative abundance and variety of proteases in the degradome landscapes of these two species, and provide insights into evolutionary relationships between proteases. The results also indicate how broadly these sequence sets are covered by three-dimensional structures. These views facilitate cross-species comparisons and offer clues for drug design from knowledge about the sequences and structures of potential drug targets and their homologs. Two protease groups (“M32” and “C51”) that are very different in sequence from human proteases are examined in structural detail, illustrating the application of this global approach in mining new pathogen genomes for potential drug targets. Based on our analyses, a human ACE2 inhibitor was selected for experimental testing on one of these parasite proteases, TbM32, and was shown to inhibit it. These sequence and structure data, along with interactive versions of the protein similarity networks generated in this study, are available at http://babbittlab.ucsf.edu/resources.html. PMID:23236535

  1. Shuttle/Agena study. Volume 2, part 3: Preliminary test plans

    NASA Technical Reports Server (NTRS)

    1972-01-01

    Proposed testing for the Agena tug program is based upon best estimates of shuttle and Agena tug requirements and upon the Agena configuration currently envisioned to meet these requirements. The proposed tests are presented in development, qualification, system, and launch base test plans. These plans are based upon generalized requirements and assumed situations. The limitations of this study precluded all but minimal consideration of related shuttle orbiter and shuttle ground systems. The test plans include provisions for all testing from major component to systems level, identified as necessary to aid in confirmation of the modified Agena configuration for the space tug; considerations that crew safety requirements and new environmental conditions from shuttle interface effects do impose some new Agena testing requirements; considerations that many existing Agena flight-qualified components will be utilized and qualification testing will be minimal; testing not only for the Agena tug but also for new or modified items of handling or servicing equipment for supporting the Agena factory-to-launch sequence; and the assembly of required testing into a sequence-ordered series of events.

  2. A novel model for DNA sequence similarity analysis based on graph theory.

    PubMed

    Qi, Xingqin; Wu, Qin; Zhang, Yusen; Fuller, Eddie; Zhang, Cun-Quan

    2011-01-01

    Determination of sequence similarity is one of the major steps in computational phylogenetic studies. During evolutionary history, not only mutations of individual nucleotides but also subsequent rearrangements occurred. It has been one of the major tasks of computational biologists to develop novel mathematical descriptors for similarity analysis such that information on various mutation phenomena is captured simultaneously. In this paper, in contrast to traditional methods that use, e.g., nucleotide frequency or geometric representations as bases for the construction of mathematical descriptors, we construct novel mathematical descriptors based on graph theory. In particular, for each DNA sequence, we set up a weighted directed graph. The adjacency matrix of the directed graph is used to induce a representative vector for the DNA sequence. This new approach measures similarity based on both ordering and frequency of nucleotides so that much more information is involved. As an application, the method is tested on a set of 0.9-kb mtDNA sequences of twelve different primate species. All output phylogenetic trees with various distance estimations have the same topology, and are generally consistent with the reported results from early studies, which proves the new method's efficiency; we also test the new method on a simulated data set, which shows our new method performs better than the traditional global alignment method when subsequent rearrangements happen frequently during evolutionary history.
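
    The general idea of turning a sequence into a graph-derived vector can be illustrated with a deliberately simplified stand-in. The abstract does not state how the authors weight their edges, so the sketch below assumes the simplest choice: the weight of edge a -> b is the fraction of adjacent positions where nucleotide a is followed by b. The names `adjacency_vector` and `distance` are hypothetical.

```python
from math import sqrt

NUCLEOTIDES = "ACGT"


def adjacency_vector(seq):
    """Flatten a 4x4 weighted adjacency matrix into a 16-element vector.

    Assumed weighting (not taken from the paper): edge a -> b carries the
    fraction of adjacent pairs in which a is immediately followed by b.
    Characters other than A, C, G, T raise KeyError.
    """
    index = {n: i for i, n in enumerate(NUCLEOTIDES)}
    matrix = [[0.0] * 4 for _ in range(4)]
    for a, b in zip(seq, seq[1:]):
        matrix[index[a]][index[b]] += 1.0
    total = max(len(seq) - 1, 1)  # avoid division by zero for very short input
    return [weight / total for row in matrix for weight in row]


def distance(seq_a, seq_b):
    """Euclidean distance between the representative vectors of two sequences."""
    va, vb = adjacency_vector(seq_a), adjacency_vector(seq_b)
    return sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))


print(distance("ACGTACGT", "ACGTTGCA"))
```

    A pairwise distance matrix built this way could be passed to any distance-based tree method; the vector reflects both which nucleotides occur and in what order they follow each other, which is the property the abstract emphasizes.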

  3. Eco-epidemiology of Novel Bartonella Genotypes from Parasitic Flies of Insectivorous Bats.

    PubMed

    Sándor, Attila D; Földvári, Mihály; Krawczyk, Aleksandra I; Sprong, Hein; Corduneanu, Alexandra; Barti, Levente; Görföl, Tamás; Estók, Péter; Kováts, Dávid; Szekeres, Sándor; László, Zoltán; Hornok, Sándor; Földvári, Gábor

    2018-04-29

    Bats are important zoonotic reservoirs for many pathogens worldwide. Although their highly specialized ectoparasites, bat flies (Diptera: Hippoboscoidea), can transmit Bartonella bacteria including human pathogens, their eco-epidemiology is unexplored. Here, we analyzed the prevalence and diversity of Bartonella strains sampled from 10 bat fly species from 14 European bat species. We found high prevalence of Bartonella spp. in most bat fly species with wide geographical distribution. Bat species explained most of the variance in Bartonella distribution with the highest prevalence of infected flies recorded in species living in dense groups exclusively in caves. Bat gender but not bat fly gender was also an important factor with the more mobile male bats giving more opportunity for the ectoparasites to access several host individuals. We detected high diversity of Bartonella strains (18 sequences, 7 genotypes, in 9 bat fly species) comparable with tropical assemblages of bat-bat fly association. Most genotypes are novel (15 out of 18 recorded strains have a similarity of 92-99%, with three sequences having 100% similarity to Bartonella spp. sequences deposited in GenBank) with currently unknown pathogenicity; however, 4 of these sequences are similar (up to 92% sequence similarity) to Bartonella spp. with known zoonotic potential. The high prevalence and diversity of Bartonella spp. suggests a long shared evolution of these bacteria with bat flies and bats providing excellent study targets for the eco-epidemiology of host-vector-pathogen cycles.

  4. Cloning and characterization of the gene encoding IMP dehydrogenase from Arabidopsis thaliana.

    PubMed

    Collart, F R; Osipiuk, J; Trent, J; Olsen, G J; Huberman, E

    1996-10-03

    We have cloned and characterized the gene encoding inosine monophosphate dehydrogenase (IMPDH) from Arabidopsis thaliana (At). The transcription unit of the At gene spans approximately 1900 bp and specifies a protein of 503 amino acids with a calculated relative molecular mass (M(r)) of 54,190. The gene is comprised of a minimum of four introns and five exons with all donor and acceptor splice sequences conforming to previously proposed consensus sequences. The deduced IMPDH amino-acid sequence from At shows a remarkable similarity to other eukaryotic IMPDH sequences, with a 48% identity to human Type II enzyme. Allowing for conservative substitutions, the enzyme is 69% similar to human Type II IMPDH. The putative active-site sequence of At IMPDH conforms to the IMP dehydrogenase/guanosine monophosphate reductase motif and contains an essential active-site cysteine residue.
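
    Identity figures such as the 48% quoted above depend on how aligned columns are counted. A minimal sketch, assuming identity is the share of matching residues among columns where neither sequence has a gap (one of several conventions in use, and not necessarily the one the authors applied), might look like this; `percent_identity` is a hypothetical helper name.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are excluded from the denominator
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


print(percent_identity("ACDE-F", "ACDQGF"))
```

    A "similarity" figure like the 69% in the abstract would additionally count conservative substitutions as matches, which requires a substitution grouping or scoring matrix that this sketch does not include.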

  5. Molecular Signatures of Microbial Metabolism in an Actively Growing, Silicified, Microbial Structure from Yellowstone National Park

    NASA Astrophysics Data System (ADS)

    Ferreira, M.; Creveling, J.; Hilburn, I.; Karlsson, E.; Pepe-Ranney, C.; Spear, J.; Dawson, S.; Geobio2008, I.

    2008-12-01

    Silicified structures that exhibit a putative biologic component in their formation permeate the rock record as stromatolites. We have studied a silicified microbial structure from a hot spring in Yellowstone National Park using phenotypic, phylogenetic, and metagenomic analyses to determine microbial carbon metabolic pathways and the phylogenetic affiliations of microbes present in this unique structure. In this multi-faceted approach, dominant physiologies, specifically with regards to anaerobic and aerobic metabolisms, were inferred from 16S rRNA gene sequences and 454 sequencing data from bulk DNA samples of the structure. Carbon utilization as indicated by ECO Biolog plates showed abundant heterotrophy and heterotrophic diversity throughout the microbial structure. Microbes within the structure are able to utilize all tested sources of carbohydrates, lipids/fatty acids, and protein/amino acids as carbon sources. ECO plate testing of the hot spring water yielded considerably less carbohydrate consumption (only 4 out of 13 tested carbohydrates) and similar lipids/fatty acids and protein/amino acids consumption (2 out of 3 and 5 out of 5 tested sources respectively). Full length 16S rRNA gene sequences and metagenomic 454 pyrosequencing of community DNA showed limited diversity among primary producers. From the 16S data, the majority of the autotrophs are inferred to utilize the Calvin cycle for CO2 fixation, followed by 3-hydroxypropionate/4-hydroxybutyrate CO2 fixation. However, an analysis of the metagenomic data compared to the KEGG database does not show genes directly involved with Calvin cycle carbon fixation. Further BLAST searches of our data failed to find significant matches within our 6514 metagenomic sequences to known RuBisCo sequences taken from the NCBI database. This is likely due to a far under-sampled dataset of metagenomic sequences, and the low number (958) that had matches to the KEGG pathways database.
    Anaerobic versus aerobic physiology also can be estimated from the 16S clone libraries. Phylogenetic analysis of recovered 16S sequences suggests that 15% of the 16S sequences can be attributed to anaerobic microbes while 42% likely come from aerobes. The remaining 43% of 16S rRNA gene sequences belong to metabolically unassigned phyla both known and novel. This preliminary study demonstrates that the small spatially stratified silicified microbial structure present on the margins of a hot spring contains a rich and complex microbial community with different trophic levels and enzymatic pathways.

  6. Simple chained guide trees give high-quality protein multiple sequence alignments

    PubMed Central

    Boyce, Kieran; Sievers, Fabian; Higgins, Desmond G.

    2014-01-01

    Guide trees are used to decide the order of sequence alignment in the progressive multiple sequence alignment heuristic. These guide trees are often the limiting factor in making large alignments, and considerable effort has been expended over the years in making these quickly or accurately. In this article we show that, at least for protein families with large numbers of sequences that can be benchmarked with known structures, simple chained guide trees give the most accurate alignments. These also happen to be the fastest and simplest guide trees to construct, computationally. Such guide trees have a striking effect on the accuracy of alignments produced by some of the most widely used alignment packages. There is a marked increase in accuracy and a marked decrease in computational time, once the number of sequences goes much above a few hundred. This is true, even if the order of sequences in the guide tree is random. PMID:25002495
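
    Assuming that "chained" here means a fully imbalanced (caterpillar) tree in which sequences are added one at a time, such a guide tree can be written out in Newick format with a few lines of code. The function name is hypothetical, and branch lengths are omitted for simplicity.

```python
def chained_guide_tree(names):
    """Return a Newick string for a fully imbalanced (chained) guide tree.

    The first two names form the innermost pair; each further name is
    joined to everything accumulated so far.
    """
    if not names:
        raise ValueError("at least one sequence name is required")
    tree = names[0]
    for name in names[1:]:
        tree = f"({tree},{name})"
    return tree + ";"


print(chained_guide_tree(["seqA", "seqB", "seqC", "seqD"]))
```

    Building the tree takes time linear in the number of sequences and needs no pairwise distances at all, which is consistent with the abstract's remark that these are the fastest and simplest guide trees to construct. Shuffling `names` before the call gives the random-order variant the abstract mentions.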

  7. Genetic diversity of Burkholderia (Proteobacteria) species from the Caatinga and Atlantic rainforest biomes in Bahia, Brazil.

    PubMed

    Santini, A C; Santos, H R M; Gross, E; Corrêa, R X

    2013-03-11

    The genus Burkholderia (β-Proteobacteria) currently comprises more than 60 species, including parasites, symbionts and free-living organisms. Several new species of Burkholderia have recently been described showing a great diversity of phenotypes. We examined the diversity of Burkholderia spp in environmental samples collected from Caatinga and Atlantic rainforest biomes of Bahia, Brazil. Legume nodules were collected from five locations, and 16S rDNA and recA genes of the isolated microorganisms were analyzed. Thirty-three contigs of 16S rRNA genes and four contigs of the recA gene related to the genus Burkholderia were obtained. The genetic dissimilarity of the strains ranged from 0 to 2.5% based on 16S rDNA analysis, indicating two main branches: one distinct branch of the dendrogram for the B. cepacia complex and another branch that rendered three major groups, partially reflecting host plants and locations. A dendrogram designed with sequences of this research and those designed with sequences of Burkholderia-type strains and the first hit BLAST had similar topologies. A dendrogram similar to that constructed by analysis of 16S rDNA was obtained using sequences of the fragment of the recA gene. The 16S rDNA sequences enabled sufficient identification of relevant similarities and groupings amongst isolates and the sequences that we obtained. Only 6 of the 33 isolates analyzed via 16S rDNA sequencing showed high similarity with the B. cepacia complex. Thus, over 3/4 of the isolates have potential for biotechnological applications.

  8. Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST): A web tool for generating protein sequence similarity networks.

    PubMed

    Gerlt, John A; Bouvier, Jason T; Davidson, Daniel B; Imker, Heidi J; Sadkhin, Boris; Slater, David R; Whalen, Katie L

    2015-08-01

    The Enzyme Function Initiative, an NIH/NIGMS-supported Large-Scale Collaborative Project (EFI; U54GM093342; http://enzymefunction.org/), is focused on devising and disseminating bioinformatics and computational tools as well as experimental strategies for the prediction and assignment of functions (in vitro activities and in vivo physiological/metabolic roles) to uncharacterized enzymes discovered in genome projects. Protein sequence similarity networks (SSNs) are visually powerful tools for analyzing sequence relationships in protein families (H.J. Atkinson, J.H. Morris, T.E. Ferrin, and P.C. Babbitt, PLoS One 2009, 4, e4345). However, the members of the biological/biomedical community have not had access to the capability to generate SSNs for their "favorite" protein families. In this article we announce the EFI-EST (Enzyme Function Initiative-Enzyme Similarity Tool) web tool (http://efi.igb.illinois.edu/efi-est/) that is available without cost for the automated generation of SSNs by the community. The tool can create SSNs for the "closest neighbors" of a user-supplied protein sequence from the UniProt database (Option A) or of members of any user-supplied Pfam and/or InterPro family (Option B). We provide an introduction to SSNs, a description of EFI-EST, and a demonstration of the use of EFI-EST to explore sequence-function space in the OMP decarboxylase superfamily (PF00215). This article is designed as a tutorial that will allow members of the community to use the EFI-EST web tool for exploring sequence/function space in protein families. Copyright © 2015 Elsevier B.V. All rights reserved.
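
    In general terms, a sequence similarity network treats each sequence as a node and connects two nodes when their pairwise similarity score passes a chosen threshold. The sketch below illustrates only that general idea with made-up scores; it does not reproduce how EFI-EST computes or filters its scores, and the names `build_ssn` and `clusters` are hypothetical.

```python
def build_ssn(scores, threshold):
    """Build an undirected network from pairwise scores.

    scores: dict mapping (id_a, id_b) -> similarity score (higher = more similar).
    Every sequence becomes a node; an edge is kept when score >= threshold.
    """
    network = {}
    for (a, b), score in scores.items():
        network.setdefault(a, set())
        network.setdefault(b, set())
        if score >= threshold:
            network[a].add(b)
            network[b].add(a)
    return network


def clusters(network):
    """Return connected components as sorted lists of node ids."""
    seen = set()
    result = []
    for start in sorted(network):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(network[node] - component)
        seen |= component
        result.append(sorted(component))
    return result


# Made-up example scores, for illustration only.
example = {("p1", "p2"): 80, ("p2", "p3"): 75, ("p1", "p3"): 20, ("p3", "p4"): 10}
print(clusters(build_ssn(example, threshold=50)))
print(clusters(build_ssn(example, threshold=78)))
```

    Raising the threshold splits the network into more, smaller clusters, which is the usual way such networks are explored to separate subgroups within a protein family.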

  9. Characteristic motifs for families of allergenic proteins

    PubMed Central

    Ivanciuc, Ovidiu; Garcia, Tzintzuni; Torres, Miguel; Schein, Catherine H.; Braun, Werner

    2008-01-01

    The identification of potential allergenic proteins is usually done by scanning a database of allergenic proteins and locating known allergens with a high sequence similarity. However, there is no universally accepted cut-off value for sequence similarity to indicate potential IgE cross-reactivity. Further, overall sequence similarity may be less important than discrete areas of similarity in proteins with homologous structure. To identify such areas, we first classified all allergens and their subdomains in the Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP/) to their closest protein families as defined in Pfam, and identified conserved physicochemical property motifs characteristic of each group of sequences. Allergens populate only a small subset of all known Pfam families, as all allergenic proteins in SDAP could be grouped to only 130 (of 9318 total) Pfams, and 31 families contain more than four allergens. Conserved physicochemical property motifs for the aligned sequences of the most populated Pfam families were identified with the PCPMer program suite and catalogued in the webserver Motif-Mate (http://born.utmb.edu/motifmate/summary.php). We also determined specific motifs for allergenic members of a family that could distinguish them from non-allergenic ones. These allergen specific motifs should be most useful in database searches for potential allergens. We found that sequence motifs unique to the allergens in three families (seed storage proteins, Bet v 1, and tropomyosin) overlap with known IgE epitopes, thus providing evidence that our motif based approach can be used to assess the potential allergenicity of novel proteins. PMID:18951633

  10. Reappraisal of the taxonomy of Streptococcus suis serotypes 20, 22 and 26: Streptococcus parasuis sp. nov.

    PubMed

    Nomoto, R; Maruyama, F; Ishida, S; Tohya, M; Sekizaki, T; Osawa, Ro

    2015-02-01

    In order to clarify the taxonomic position of serotypes 20, 22 and 26 of Streptococcus suis, biochemical and molecular genetic studies were performed on isolates (SUT-7, SUT-286(T), SUT-319, SUT-328 and SUT-380) that reacted with specific antisera of serotypes 20, 22 or 26 from the saliva of healthy pigs as well as reference strains of serotypes 20, 22 and 26. Comparative recN gene sequencing showed high genetic relatedness among our isolates, but marked differences from the type strain S. suis NCTC 10234(T), i.e. 74.8-75.7% sequence similarity. The genomic relatedness between the isolates and other strains of species of the genus Streptococcus, including S. suis, was calculated using the average nucleotide identity values of whole genome sequences, which indicated that serotypes 20, 22 and 26 should be removed taxonomically from S. suis and treated as a novel genomic species. Comparative sequence analysis revealed 99.0-100% sequence similarities for the 16S rRNA genes between the reference strains of serotypes 20, 22 and 26, and our isolates. Isolate SUT-286(T) had relatively high 16S rRNA gene sequence similarity with S. suis NCTC 10234(T) (98.8%). SUT-286(T) could be distinguished from S. suis and other closely related species of the genus Streptococcus using biochemical tests. Due to its phylogenetic and phenotypic similarities to S. suis we propose naming the novel species Streptococcus parasuis sp. nov., with SUT-286(T) (= JCM 30273(T) = DSM 29126(T)) as the type strain. © 2015 IUMS.

  11. A preliminary 'test case' manufacturing sequence for 50 cents/watt solar photovoltaic modules in 1986

    NASA Technical Reports Server (NTRS)

    Bickler, D. B.

    1979-01-01

    The paper describes a 'test case' manufacturing process sequence for solar photovoltaic modules which will cost 50 cents/watt in 1986. The process, which starts with the purification of silicon grown into 75-mm-wide thin ribbons, is discussed, and the plant layout is depicted; each department is sized to produce 250 MW of modules per year. The cost of this process sequence is compared to present technology at various companies showing considerable spread for each process; data are tabulated in a composite state-of-the-art cell processing cost summary for these processes.

  12. Precision medicine in the age of big data: The present and future role of large-scale unbiased sequencing in drug discovery and development.

    PubMed

    Vicini, P; Fields, O; Lai, E; Litwack, E D; Martin, A-M; Morgan, T M; Pacanowski, M A; Papaluca, M; Perez, O D; Ringel, M S; Robson, M; Sakul, H; Vockley, J; Zaks, T; Dolsten, M; Søgaard, M

    2016-02-01

    High throughput molecular and functional profiling of patients is a key driver of precision medicine. DNA and RNA characterization has been enabled at unprecedented cost and scale through rapid, disruptive progress in sequencing technology, but challenges persist in data management and interpretation. We analyze the state-of-the-art of large-scale unbiased sequencing in drug discovery and development, including technology, application, ethical, regulatory, policy and commercial considerations, and discuss issues of LUS implementation in clinical and regulatory practice. © 2015 American Society for Clinical Pharmacology and Therapeutics.

  13. Collaborative Filtering Recommendation on Users' Interest Sequences.

    PubMed

    Cheng, Weijie; Yin, Guisheng; Dong, Yuxin; Dong, Hongbin; Zhang, Wansong

    2016-01-01

    As an important factor for improving recommendations, time information has been introduced to model users' dynamic preferences in many papers. However, the sequence of users' behaviour is rarely studied in recommender systems. Due to the users' unique behavior evolution patterns and personalized interest transitions among items, users' similarity in sequential dimension should be introduced to further distinguish users' preferences and interests. In this paper, we propose a new collaborative filtering recommendation method based on users' interest sequences (IS) that rank users' ratings or other online behaviors according to the timestamps when they occurred. This method extracts the semantics hidden in the interest sequences by the length of users' longest common sub-IS (LCSIS) and the count of users' total common sub-IS (ACSIS). Then, these semantics are utilized to obtain users' IS-based similarities and, further, to refine the similarities acquired from traditional collaborative filtering approaches. With these updated similarities, transition characteristics and dynamic evolution patterns of users' preferences are considered. Our new proposed method was compared with state-of-the-art time-aware collaborative filtering algorithms on datasets MovieLens, Flixster and Ciao. The experimental results validate that the proposed recommendation method is effective and outperforms several existing algorithms in the accuracy of rating prediction.
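
    The abstract does not define the "longest common sub-IS" precisely. Assuming it is the classic longest common subsequence (order preserved, items not necessarily adjacent) of two users' time-ordered item lists, its length can be computed with standard dynamic programming. The function names and the normalisation by the shorter sequence are illustrative assumptions, not the authors' formulas.

```python
def lcs_length(seq_a, seq_b):
    """Length of the longest common subsequence of two item sequences."""
    prev = [0] * (len(seq_b) + 1)
    for x in seq_a:
        curr = [0]
        for j, y in enumerate(seq_b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sequence_similarity(seq_a, seq_b):
    """Assumed normalisation: LCS length divided by the shorter sequence length."""
    shorter = min(len(seq_a), len(seq_b))
    return lcs_length(seq_a, seq_b) / shorter if shorter else 0.0


# Items ordered by the timestamps at which each user rated them (made-up data).
user_1 = ["m1", "m2", "m3", "m4"]
user_2 = ["m2", "m5", "m3", "m4"]
print(lcs_length(user_1, user_2), sequence_similarity(user_1, user_2))
```

    Two users who rated the same items in a different order get a lower value here than two users who rated them in the same order, which is the sequential signal the abstract proposes adding to conventional collaborative filtering similarities.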

  14. Collaborative Filtering Recommendation on Users’ Interest Sequences

    PubMed Central

    Cheng, Weijie; Yin, Guisheng; Dong, Yuxin; Dong, Hongbin; Zhang, Wansong

    2016-01-01

    As an important factor for improving recommendations, time information has been introduced to model users’ dynamic preferences in many papers. However, the sequence of users’ behaviour is rarely studied in recommender systems. Due to the users’ unique behavior evolution patterns and personalized interest transitions among items, users’ similarity in sequential dimension should be introduced to further distinguish users’ preferences and interests. In this paper, we propose a new collaborative filtering recommendation method based on users’ interest sequences (IS) that rank users’ ratings or other online behaviors according to the timestamps when they occurred. This method extracts the semantics hidden in the interest sequences by the length of users’ longest common sub-IS (LCSIS) and the count of users’ total common sub-IS (ACSIS). Then, these semantics are utilized to obtain users’ IS-based similarities and, further, to refine the similarities acquired from traditional collaborative filtering approaches. With these updated similarities, transition characteristics and dynamic evolution patterns of users’ preferences are considered. Our new proposed method was compared with state-of-the-art time-aware collaborative filtering algorithms on datasets MovieLens, Flixster and Ciao. The experimental results validate that the proposed recommendation method is effective and outperforms several existing algorithms in the accuracy of rating prediction. PMID:27195787

  15. Structural and Sequence Similarity Makes a Significant Impact on Machine-Learning-Based Scoring Functions for Protein-Ligand Interactions.

    PubMed

    Li, Yang; Yang, Jianyi

    2017-04-24

    The prediction of protein-ligand binding affinity has recently been improved remarkably by machine-learning-based scoring functions. For example, using a set of simple descriptors representing the atomic distance counts, the RF-Score improves the Pearson correlation coefficient to about 0.8 on the core set of the PDBbind 2007 database, which is significantly higher than the performance of any conventional scoring function on the same benchmark. A few studies have been made to discuss the performance of machine-learning-based methods, but the reason for this improvement remains unclear. In this study, by systemically controlling the structural and sequence similarity between the training and test proteins of the PDBbind benchmark, we demonstrate that protein structural and sequence similarity makes a significant impact on machine-learning-based methods. After removal of training proteins that are highly similar to the test proteins identified by structure alignment and sequence alignment, machine-learning-based methods trained on the new training sets do not outperform the conventional scoring functions any more. On the contrary, the performance of conventional functions like X-Score is relatively stable no matter what training data are used to fit the weights of its energy terms.

  16. Wind data mining by Kohonen Neural Networks.

    PubMed

    Fayos, José; Fayos, Carolina

    2007-02-14

    Time series of Circulation Weather Type (CWT), including daily averaged wind direction and vorticity, are self-classified by similarity using Kohonen Neural Networks (KNN). It is shown that KNN is able to map by similarity all 7300 five-day CWT sequences during the period of 1975-94, in London, United Kingdom. It gives, as a first result, the most probable wind sequences preceding each one of the 27 CWT Lamb classes in that period. Inversely, as a second result, the observed diffuse correlation between both five-day CWT sequences and the CWT of the 6th day, in the long 20-year period, can be generalized to predict the last from the previous CWT sequence in a different test period, like 1995, as both time series are similar. Although the average prediction error is comparable to that obtained by forecasting standard methods, the KNN approach gives complementary results, as they depend only on an objective classification of observed CWT data, without any model assumption. The 27 CWT of the Lamb Catalogue were coded with binary three-dimensional vectors, pointing to faces, edges and vertices of a "wind-cube," so that similar CWT vectors were close.
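
    One way to read the "wind-cube" coding is that each class is a three-component vector with entries in {-1, 0, 1}: such vectors point to the centre, the 6 faces, the 12 edges and the 8 vertices of a cube, which gives exactly 27 codes. This reading is an assumption; the abstract does not say which Lamb class maps to which vector, so the sketch below only enumerates the codes and does not name them.

```python
from itertools import product


def wind_cube_codes():
    """Group the 27 vectors in {-1, 0, 1}^3 by where they point on a cube."""
    names = ["centre", "face", "edge", "vertex"]  # indexed by number of non-zero components
    groups = {name: [] for name in names}
    for vector in product((-1, 0, 1), repeat=3):
        nonzero = sum(1 for component in vector if component != 0)
        groups[names[nonzero]].append(vector)
    return groups


codes = wind_cube_codes()
print({name: len(vectors) for name, vectors in codes.items()})
```

    With a coding of this kind, classes that differ in only one component sit next to each other on the cube, so a distance-based method such as a Kohonen map would treat them as similar.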

  17. Relation between native ensembles and experimental structures of proteins

    PubMed Central

    Best, Robert B.; Lindorff-Larsen, Kresten; DePristo, Mark A.; Vendruscolo, Michele

    2006-01-01

    Different experimental structures of the same protein or of proteins with high sequence similarity contain many small variations. Here we construct ensembles of “high-sequence similarity Protein Data Bank” (HSP) structures and consider the extent to which such ensembles represent the structural heterogeneity of the native state in solution. We find that different NMR measurements probing structure and dynamics of given proteins in solution, including order parameters, scalar couplings, and residual dipolar couplings, are remarkably well reproduced by their respective high-sequence similarity Protein Data Bank ensembles; moreover, we show that the effects of uncertainties in structure determination are insufficient to explain the results. These results highlight the importance of accounting for native-state protein dynamics in making comparisons with ensemble-averaged experimental data and suggest that even a modest number of structures of a protein determined under different conditions, or with small variations in sequence, capture a representative subset of the true native-state ensemble. PMID:16829580

  18. Seismic stratigraphy, tectonics and depositional history in the Halk el Menzel region, NE Tunisia

    NASA Astrophysics Data System (ADS)

    Sebei, Kawthar; Inoubli, Mohamed Hédi; Boussiga, Haïfa; Tlig, Said; Alouani, Rabah; Boujamaoui, Mustapha

    2007-01-01

    In the Halk el Menzel area, the proximal- to pelagic platform transition and related tectonic events during the Upper Cretaceous-Lower Miocene have not been taken into adequate consideration. The integrated interpretation of outcrop and subsurface data help define a seismic stratigraphic model and clarify the geodynamic evolution of the Halk el Menzel block. The sedimentary column comprises marls and limestones of the Campanian to Upper Eocene, overlain by Oligocene to Lower Miocene aged siliciclastics and carbonates. Well to well correlations show sedimentary sequences vary considerably in lithofacies and thicknesses over short distances with remarkable gaps. The comparison of sedimentary sequences cut by borehole and seismic stratigraphic modelling as well help define ten third order depositional sequences (S1-S10). Sequences S1 through S6 (Campanian-Paleocene) are mainly characterized by oblique to sigmoid configurations with prograding sedimentary structures, whereas, sequences S7-S10 (Ypresian to Middle Miocene) are organized in shallow water deposits with marked clinoform ramp geometry. Sedimentary discontinuities developed at sequence boundaries are thought to indicate widespread fall in relative sea level. Angular unconformities record a transpressive tectonic regime that operated from the Campanian to Upper Eocene. The geometry of sequences with reduced thicknesses, differential dipping of internal seismic reflections and associated normal faulting located westerly in the area, draw attention to a depositional sedimentary system developed on a gentle slope evolving from a tectonically driven steepening towards the Northwest. The seismic profiles help delimit normal faulting control environments of deposition. In contrast, reef build-ups in the Eastern parts occupy paleohighs NE-SW in strike with bordering Upper Maastrichtian-Ypresian seismic facies onlapping Upper Cretaceous counterparts. 
    During the Middle-Upper Eocene, transpressive stress caused reactivation of faults from normal to reverse play. This has culminated in propagation folds located to the west; whereas, the eastern part of the block has suffered progressive subsidence. Transgressive carbonate depositional sequences have predominated during the Middle Miocene and have sealed pre-existing tectonic structures.

  19. Partial DNA sequencing of Douglas-fir cDNAs used in RFLP mapping

    Treesearch

    K.D. Jermstad; D.L. Bassoni; C.S. Kinlaw; D.B. Neale

    1998-01-01

    DNA sequences from 87 Douglas-fir (Pseudotsuga menziesii [Mirb.] Franco) cDNA RFLP probes were determined. Sequences were submitted to the GenBank dbEST database and searched for similarity against nucleotide and protein databases using the BLASTn and BLASTx programs. Twenty-one sequences (24%) were assigned putative functions; 18 of which...

  20. A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy.

    PubMed

    Gao, Xiang; Lin, Huaiying; Revanna, Kashi; Dong, Qunfeng

    2017-05-10

    Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unreliable. The unreliable results are due to the limitations in the existing methods which either lack solid probabilistic-based criteria to evaluate the confidence of their taxonomic assignments, or use nucleotide k-mer frequency as the proxy for sequence similarity measurement. We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species to the phylum levels based on the lowest common ancestors of multiple database hits for each query sequence, and further classification reliabilities are evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based upon the degree of sequence similarity of the database hit to the query sequence. Our method does not need any training datasets specific for different taxonomic groups. Instead only a reference database is required for aligning to the query sequences, making our method easily applicable for different regions of the 16S rRNA gene or other phylogenetic marker genes. Reliable species-level classification for 16S rRNA or other phylogenetic marker genes is critical for microbiome research. Our software shows significantly higher classification accuracy than the existing tools and we provide probabilistic-based confidence scores to evaluate the reliability of our taxonomic classification assignments based on multiple database matches to query sequences. 
    Despite its higher computational costs, our method is still suitable for analyzing large-scale microbiome datasets for practical purposes. Furthermore, our method can be applied for taxonomic classification of any phylogenetic marker gene sequences. Our software, called BLCA, is freely available at https://github.com/qunfengdong/BLCA .
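
    The idea of weighting each database hit's contribution to a taxonomic assignment can be illustrated with a simplified sketch. It is not the BLCA algorithm: the weights are plain numbers standing in for posterior probabilities, there is no alignment or bootstrap step, and the function name weighted_taxon_support and the example hits are hypothetical.

```python
from collections import defaultdict


def weighted_taxon_support(hits, rank):
    """Sum normalized hit weights per taxon at the given rank.

    hits is a list of (taxonomy, weight) pairs, where taxonomy maps a rank
    name to a taxon name and weight is a non-negative number.
    """
    total = sum(weight for _, weight in hits)
    support = defaultdict(float)
    for taxonomy, weight in hits:
        support[taxonomy[rank]] += weight / total
    return dict(support)


# Invented example hits; weights stand in for similarity-derived probabilities.
hits = [
    ({"genus": "Escherichia", "species": "E. coli"}, 6),
    ({"genus": "Escherichia", "species": "E. fergusonii"}, 3),
    ({"genus": "Shigella", "species": "S. flexneri"}, 1),
]

genus_support = weighted_taxon_support(hits, "genus")
species_support = weighted_taxon_support(hits, "species")
print(genus_support)
print(species_support)
```

    Support is higher at the genus level than at the species level here, which is the usual pattern when close relatives compete for the same query.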

  1. Critical Points and Traveling Wave in Locomotion: Experimental Evidence and Some Theoretical Considerations.

    PubMed

    Saltiel, Philippe; d'Avella, Andrea; Tresch, Matthew C; Wyler, Kuno; Bizzi, Emilio

    2017-01-01

    The central pattern generator (CPG) architecture for rhythm generation remains partly elusive. We compare cat and frog locomotion results, where the component unrelated to pattern formation appears as a temporal grid, and traveling wave respectively. Frog spinal cord microstimulation with N-methyl-D-Aspartate (NMDA), a CPG activator, produced a limited set of force directions, sometimes tonic, but more often alternating between directions similar to the tonic forces. The tonic forces were topographically organized, and sites evoking rhythms with different force subsets were located close to the constituent tonic force regions. Thus CPGs consist of topographically organized modules. Modularity was also identified as a limited set of muscle synergies whose combinations reconstructed the EMGs. The cat CPG was investigated using proprioceptive inputs during fictive locomotion. Critical points identified both as abrupt transitions in the effect of phasic perturbations, and burst shape transitions, had biomechanical correlates in intact locomotion. During tonic proprioceptive perturbations, discrete shifts between these critical points explained the burst durations changes, and amplitude changes occurred at one of these points. Besides confirming CPG modularity, these results suggest a fixed temporal grid of anchoring points, to shift modules onsets and offsets. Frog locomotion, reconstructed with the NMDA synergies, showed a partially overlapping synergy activation sequence. Using the early synergy output evoked by NMDA at different spinal sites, revealed a rostrocaudal topographic organization, where each synergy is preferentially evoked from a few, albeit overlapping, cord regions. Comparing the locomotor synergy sequence with this topography suggests that a rostrocaudal traveling wave would activate the synergies in the proper sequence for locomotion. This output was reproduced in a two-layer model using this topography and a traveling wave. 
    Together, our results suggest two CPG components: modules, i.e., synergies; and temporal patterning, seen as a temporal grid in the cat, and a traveling wave in the frog. Animal and limb navigation have similarities. Research relating grid cells to the theta rhythm and on segmentation during navigation may relate to our temporal grid and traveling wave results. Winfree's mathematical work, combining critical phases and a traveling wave, also appears important. We conclude by suggesting tracing and imaging experiments to investigate our CPG model.

  2. A statistical physics perspective on alignment-independent protein sequence comparison.

    PubMed

    Chattopadhyay, Amit K; Nasiev, Diar; Flower, Darren R

    2015-08-01

    Within bioinformatics, the textual alignment of amino acid sequences has long dominated the determination of similarity between proteins, with all that implies for shared structure, function and evolutionary descent. Despite the relative success of modern-day sequence alignment algorithms, so-called alignment-free approaches offer a complementary means of determining and expressing similarity, with potential benefits in certain key applications, such as regression analysis of protein structure-function studies, where alignment-base similarity has performed poorly. Here, we offer a fresh, statistical physics-based perspective focusing on the question of alignment-free comparison, in the process adapting results from 'first passage probability distribution' to summarize statistics of ensemble averaged amino acid propensity values. In this article, we introduce and elaborate this approach. © The Author 2015. Published by Oxford University Press.

  3. Understanding sequence similarity and framework analysis between centromere proteins using computational biology.

    PubMed

    Doss, C George Priya; Chakrabarty, Chiranjib; Debajyoti, C; Debottam, S

    2014-11-01

    Open questions about the recruitment pathways, cell cycle regulation mechanisms, spindle checkpoint assembly, and chromosome segregation roles of centromere proteins are a focus of attention in cancer research. With established databases now available, a range of computational platforms makes it possible to examine almost all physiological and biochemical evidence in disease-associated phenotypes. Using existing computational methods, we analyzed amino acid residues to understand the similarity within the evolutionary variance of different associated centromere proteins. This study of sequence similarity, protein-protein networking, co-expression analysis, and the evolutionary trajectory of centromere proteins should speed up understanding of centromere biology and provide a road map for researchers beginning clinical sequencing work using centromere proteins.

  4. Comparison of Metabolic Pathways in Escherichia coli by Using Genetic Algorithms.

    PubMed

    Ortegon, Patricia; Poot-Hernández, Augusto C; Perez-Rueda, Ernesto; Rodriguez-Vazquez, Katya

    2015-01-01

    In order to understand how cellular metabolism has taken its modern form, the conservation and variations between metabolic pathways were evaluated by using a genetic algorithm (GA). The GA approach considered information on the complete metabolism of the bacterium Escherichia coli K-12, as deposited in the KEGG database, and the enzymes belonging to a particular pathway were transformed into enzymatic step sequences by using the breadth-first search algorithm. These sequences represent contiguous enzymes linked to each other, based on their catalytic activities as they are encoded in the Enzyme Commission numbers. In a posterior step, these sequences were compared using a GA in an all-against-all (pairwise comparisons) approach. Individual reactions were chosen based on their measure of fitness to act as parents of offspring, which constitute the new generation. The sequences compared were used to construct a similarity matrix (of fitness values) that was then considered to be clustered by using a k-medoids algorithm. A total of 34 clusters of conserved reactions were obtained, and their sequences were finally aligned with a multiple-sequence alignment GA optimized to align all the reaction sequences included in each group or cluster. From these comparisons, maps associated with the metabolism of similar compounds also contained similar enzymatic step sequences, reinforcing the Patchwork Model for the evolution of metabolism in E. coli K-12, an observation that can be expanded to other organisms, for which there is metabolism information. Finally, our mapping of these reactions is discussed, with illustrations from a particular case.
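
    The step of turning a pathway into an enzymatic step sequence by breadth-first search might look roughly like the sketch below. The graph is a toy example built for illustration; how the authors derived edges from KEGG maps is not stated in the abstract, so the linkage shown here is an assumption.

```python
from collections import deque


def bfs_order(graph, start):
    """Return the nodes reachable from start in breadth-first visiting order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order


# Toy pathway: nodes are EC numbers, edges link consecutive enzymatic steps.
toy_pathway = {
    "2.7.1.1": ["5.3.1.9"],
    "5.3.1.9": ["2.7.1.11"],
    "2.7.1.11": ["4.1.2.13"],
}

step_sequence = bfs_order(toy_pathway, "2.7.1.1")
print(step_sequence)
```

    Sequences of this kind could then be compared pairwise to fill a similarity matrix for clustering.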

  5. Comparison of Metabolic Pathways in Escherichia coli by Using Genetic Algorithms

    PubMed Central

    Ortegon, Patricia; Poot-Hernández, Augusto C.; Perez-Rueda, Ernesto; Rodriguez-Vazquez, Katya

    2015-01-01

    In order to understand how cellular metabolism has taken its modern form, the conservation and variations between metabolic pathways were evaluated by using a genetic algorithm (GA). The GA approach considered information on the complete metabolism of the bacterium Escherichia coli K-12, as deposited in the KEGG database, and the enzymes belonging to a particular pathway were transformed into enzymatic step sequences by using the breadth-first search algorithm. These sequences represent contiguous enzymes linked to each other, based on their catalytic activities as they are encoded in the Enzyme Commission numbers. In a posterior step, these sequences were compared using a GA in an all-against-all (pairwise comparisons) approach. Individual reactions were chosen based on their measure of fitness to act as parents of offspring, which constitute the new generation. The sequences compared were used to construct a similarity matrix (of fitness values) that was then considered to be clustered by using a k-medoids algorithm. A total of 34 clusters of conserved reactions were obtained, and their sequences were finally aligned with a multiple-sequence alignment GA optimized to align all the reaction sequences included in each group or cluster. From these comparisons, maps associated with the metabolism of similar compounds also contained similar enzymatic step sequences, reinforcing the Patchwork Model for the evolution of metabolism in E. coli K-12, an observation that can be expanded to other organisms, for which there is metabolism information. Finally, our mapping of these reactions is discussed, with illustrations from a particular case. PMID:25973143

  6. The nucleotide sequences of 5S rRNAs from a rotifer, Brachionus plicatilis, and two nematodes, Rhabditis tokai and Caenorhabditis elegans.

    PubMed

    Kumazaki, T; Hori, H; Osawa, S; Ishii, N; Suzuki, K

    1982-11-11

    The nucleotide sequences of 5S rRNAs from a rotifer, Brachionus plicatilis, and two nematodes, Rhabditis tokai and Caenorhabditis elegans have been determined. The rotifer has two 5S rRNA species that are composed of 120 and 121 nucleotides, respectively. The sequences of these two 5S rRNAs are the same except that the latter has an additional base at its 3'-terminus. The 5S rRNAs from the two nematode species are both 119 nucleotides long. The sequence similarity percents are 79% (Brachionus/Rhabditis), 80% (Brachionus/Caenorhabditis), and 95% (Rhabditis/Caenorhabditis) among these three species. Brachionus revealed the highest similarity to Lingula (89%), but not to the nematodes (79%).
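
    A similarity percentage between two aligned sequences can be computed as in the following sketch. The abstract does not state how gaps were treated, so counting a gap opposite a base as a mismatch is an assumption, as are the example sequences.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns with identical bases.

    Columns that are gaps in both sequences are ignored; a gap opposite a
    base counts as a mismatch (an assumption made for this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


# Invented toy alignment: 9 of 10 columns match.
pid = percent_identity("ACGUACGUAC", "ACGUACGAAC")
print(pid)
```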

  7. Isolation and sequence analysis of a novel rhesus macaque foamy virus isolate with a serotype-1-like env.

    PubMed

    Ensser, Armin; Großkopf, Anna K; Mätz-Rensing, Kerstin; Roos, Christian; Hahn, Alexander S

    2018-06-02

    SFVmmu-DPZ9524 represents the third completely sequenced rhesus macaque simian foamy virus (SFV) isolate, alongside SFVmmu_K3T with a similar SFV-1-type env, and R289HybAGM with a SFV-2-like env. Sequence analysis demonstrates that, in gag and pol, SFVmmu-DPZ9524 is more closely related to R289HybAGM than to SFVmmu_K3T, which, outside of env, is more similar to a Japanese macaque isolate than to the other two rhesus macaque isolates SFVmmu-DPZ9524 and R289HybAGM. Further, we identify bel as another recombinant locus in R289HybAGM, confirming that recombination contributes to sequence diversity in SFV.

  8. Blood-Borne Candidatus Borrelia algerica in a Patient with Prolonged Fever in Oran, Algeria

    PubMed Central

    Fotso Fotso, Aurélien; Angelakis, Emmanouil; Mouffok, Nadjet; Drancourt, Michel; Raoult, Didier

    2015-01-01

    To improve the knowledge base of Borrelia in north Africa, we tested 257 blood samples collected from febrile patients in Oran, Algeria, between January and December 2012 for Borrelia species using flagellin gene polymerase chain reaction sequencing. A sequence indicative of a new Borrelia sp. named Candidatus Borrelia algerica was detected in one blood sample. Further multispacer sequence typing indicated this Borrelia sp. had 97% similarity with Borrelia crocidurae, Borrelia duttonii, and Borrelia recurrentis. In silico comparison of Candidatus B. algerica spacer sequences with those of Borrelia hispanica and Borrelia garinii revealed 94% and 89% similarity, respectively. Candidatus B. algerica is a new relapsing fever Borrelia sp. detected in Oran. Further studies may help predict its epidemiological importance. PMID:26416117

  9. Identification of distant drug off-targets by direct superposition of binding pocket surfaces.

    PubMed

    Schumann, Marcel; Armen, Roger S

    2013-01-01

    Correctly predicting off-targets for a given molecular structure, which would have the ability to bind a large range of ligands, is both particularly difficult and important if they share no significant sequence or fold similarity with the respective molecular target ("distant off-targets"). A novel approach for identification of off-targets by direct superposition of protein binding pocket surfaces is presented and applied to a set of well-studied and highly relevant drug targets, including representative kinases and nuclear hormone receptors. The entire Protein Data Bank is searched for similar binding pockets and convincing distant off-target candidates were identified that share no significant sequence or fold similarity with the respective target structure. These putative target off-target pairs are further supported by the existence of compounds that bind strongly to both with high topological similarity, and in some cases, literature examples of individual compounds that bind to both. Also, our results clearly show that it is possible for binding pockets to exhibit a striking surface similarity, while the respective off-target shares neither significant sequence nor significant fold similarity with the respective molecular target ("distant off-target").

  10. Identification of Distant Drug Off-Targets by Direct Superposition of Binding Pocket Surfaces

    PubMed Central

    Schumann, Marcel; Armen, Roger S.

    2013-01-01

    Correctly predicting off-targets for a given molecular structure, which would have the ability to bind a large range of ligands, is both particularly difficult and important if they share no significant sequence or fold similarity with the respective molecular target (“distant off-targets”). A novel approach for identification of off-targets by direct superposition of protein binding pocket surfaces is presented and applied to a set of well-studied and highly relevant drug targets, including representative kinases and nuclear hormone receptors. The entire Protein Data Bank is searched for similar binding pockets and convincing distant off-target candidates were identified that share no significant sequence or fold similarity with the respective target structure. These putative target off-target pairs are further supported by the existence of compounds that bind strongly to both with high topological similarity, and in some cases, literature examples of individual compounds that bind to both. Also, our results clearly show that it is possible for binding pockets to exhibit a striking surface similarity, while the respective off-target shares neither significant sequence nor significant fold similarity with the respective molecular target (“distant off-target”). PMID:24391782

  11. DNA sequence analysis of the photosynthesis region of Rhodobacter sphaeroides 2.4.1.

    PubMed

    Choudhary, M; Kaplan, S

    2000-02-15

    This paper describes the DNA sequence of the photosynthesis region of Rhodobacter sphaeroides 2.4.1 (T). The photosynthesis gene cluster is located within an approximately 73 kb Ase I genomic DNA fragment containing the puf, puhA, cycA and puc operons. A total of 65 open reading frames (ORFs) have been identified, of which 61 showed significant similarity to genes/proteins of other organisms while only four did not reveal any significant sequence similarity to any gene/protein sequences in the database. The data were compared with the corresponding genes/ORFs from a different strain of R. sphaeroides and Rhodobacter capsulatus, a close relative of R. sphaeroides. A detailed analysis of the gene organization in the photosynthesis region revealed a similar gene order in both species with some notable differences located to the pucBAC-cycA region. In addition, photosynthesis gene regulatory protein (PpsR, FNR, IHF) binding motifs in upstream sequences of a number of photosynthesis genes have been identified and shown to differ between these two species. The difference in gene organization relative to pucBAC and cycA suggests that this region originated independently of the photosynthesis gene cluster of R. sphaeroides.

  12. Early Life Stages

    EPA Pesticide Factsheets

    Childhood should be viewed as a sequence of lifestages, from birth through infancy and adolescence. When assessing early life risks, consideration is given to risks resulting from fetal exposure via the pregnant mother, as well as postnatal exposures.

  13. Quantification of the effects of eustasy, subsidence, and sediment supply on Miocene sequences, mid-Atlantic margin of the United States

    USGS Publications Warehouse

    Browning, J.V.; Miller, K.G.; McLaughlin, P.P.; Kominz, M.A.; Sugarman, P.J.; Monteverde, D.; Feigenson, M.D.; Hernandez, J.C.

    2006-01-01

    We use backstripping to quantify the roles of variations in global sea level (eustasy), subsidence, and sediment supply on the development of the Miocene stratigraphic record of the mid-Atlantic continental margin of the United States (New Jersey, Delaware, and Maryland). Eustasy is a primary influence on sequence patterns, determining the global template of sequences (i.e., times when sequences can be preserved) and explaining similarities in Miocene sequence architecture on margins throughout the world. Sequences can be correlated throughout the mid-Atlantic region with Sr-isotopic chronology (±0.6 m.y. to ±1.2 m.y.). Eight Miocene sequences correlate regionally and can be correlated to global δ18O increases, indicating glacioeustatic control. This margin is dominated by passive subsidence with little evidence for active tectonic overprints, except possibly in Maryland during the early Miocene. However, early Miocene sequences in New Jersey and Delaware display a patchwork distribution that is attributable to minor (tens of meters) intervals of excess subsidence. Backstripping quantifies that excess subsidence began in Delaware at ca. 21 Ma and continued until 12 Ma, with maximum rates from ca. 21-16 Ma. We attribute this enhanced subsidence to local flexural response to the progradation of thick sequences offshore and adjacent to this area. Removing this excess subsidence in Delaware yields a record that is remarkably similar to New Jersey eustatic estimates. We conclude that sea-level rise and fall is a first-order control on accommodation providing similar timing on all margins to the sequence record. Tectonic changes due to movement of the crust can overprint the record, resulting in large gaps in the stratigraphic record. Smaller differences in sequences can be attributed to local flexural loading effects, particularly in regions experiencing large-scale progradation. © 2006 Geological Society of America.

  14. Computational design of water-soluble α-helical barrels.

    PubMed

    Thomson, Andrew R; Wood, Christopher W; Burton, Antony J; Bartlett, Gail J; Sessions, Richard B; Brady, R Leo; Woolfson, Derek N

    2014-10-24

    The design of protein sequences that fold into prescribed de novo structures is challenging. General solutions to this problem require geometric descriptions of protein folds and methods to fit sequences to these. The α-helical coiled coils present a promising class of protein for this and offer considerable scope for exploring hitherto unseen structures. For α-helical barrels, which have more than four helices and accessible central channels, many of the possible structures remain unobserved. Here, we combine geometrical considerations, knowledge-based scoring, and atomistic modeling to facilitate the design of new channel-containing α-helical barrels. X-ray crystal structures of the resulting designs match predicted in silico models. Furthermore, the observed channels are chemically defined and have diameters related to oligomer state, which present routes to design protein function. Copyright © 2014, American Association for the Advancement of Science.

  15. The complete genome sequence and genetic analysis of ΦCA82 a novel uncultured microphage from the turkey gastrointestinal system

    PubMed Central

    2011-01-01

    The genomic DNA sequence of a novel enteric uncultured microphage, ΦCA82 from a turkey gastrointestinal system was determined utilizing metagenomics techniques. The entire circular, single-stranded nucleotide sequence of the genome was 5,514 nucleotides. The ΦCA82 genome is quite different from other microviruses as indicated by comparisons of nucleotide similarity, predicted protein similarity, and functional classifications. Only three genes showed significant similarity to microviral proteins as determined by local alignments using BLAST analysis. ORF1 encoded a predicted phage F capsid protein that was phylogenetically most similar to the Microviridae ΦMH2K member's major coat protein. The ΦCA82 genome also encoded a predicted minor capsid protein (ORF2) and putative replication initiation protein (ORF3) most similar to the microviral bacteriophage SpV4. The distant evolutionary relationship of ΦCA82 suggests that the divergence of this novel turkey microvirus from other microviruses may reflect unique evolutionary pressures encountered within the turkey gastrointestinal system. PMID:21714899

  16. Xylella genomics and bacterial pathogenicity to plants.

    PubMed

    Dow, J M; Daniels, M J

    2000-12-01

    Xylella fastidiosa, a pathogen of citrus, is the first plant pathogenic bacterium for which the complete genome sequence has been published. Inspection of the sequence reveals high relatedness to many genes of other pathogens, notably Xanthomonas campestris. Based on this, we suggest that Xylella possesses certain easily testable properties that contribute to pathogenicity. We also present some general considerations for deriving information on pathogenicity from bacterial genomics. Copyright 2000 John Wiley & Sons, Ltd.

  17. Genome Sequence of an Endophytic Fungus, Fusarium solani JS-169, Which Has Antifungal Activity.

    PubMed

    Kim, Jung A; Jeon, Jongbum; Park, Sook-Young; Kim, Ki-Tae; Choi, Gobong; Lee, Hyun-Jung; Kim, Yangsun; Yang, Hee-Sun; Yeo, Joo-Hong; Lee, Yong-Hwan; Kim, Soonok

    2017-10-19

    An endophytic fungus, Fusarium solani strain JS-169, isolated from a mulberry twig, showed considerable antifungal activity. Here, we report the draft genome sequence of this strain. The assembly comprises 17 scaffolds, with an N50 value of 4.93 Mb. The assembled genome was 45,813,297 bp in length, with a G+C content of 49.91%. Copyright © 2017 Kim et al.
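
    For readers unfamiliar with the assembly statistics quoted above, the sketch below shows how N50 and G+C content are commonly computed. The scaffold lengths and the sequence are invented toy values, not data from this genome.

```python
def n50(lengths):
    """Length L such that scaffolds of length >= L hold at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no scaffold lengths given")


def gc_percent(sequence):
    """Percentage of G and C bases in a nucleotide sequence."""
    sequence = sequence.upper()
    gc = sequence.count("G") + sequence.count("C")
    return 100.0 * gc / len(sequence)


toy_n50 = n50([5, 4, 3, 2, 1])   # toy scaffold lengths
toy_gc = gc_percent("GGCCAT")    # toy sequence
print(toy_n50, round(toy_gc, 2))
```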

  18. Innate Immune Complexity in the Purple Sea Urchin: Diversity of the Sp185/333 System

    PubMed Central

    Smith, L. Courtney

    2012-01-01

    The California purple sea urchin, Strongylocentrotus purpuratus, is a long-lived echinoderm with a complex and sophisticated innate immune system. There are several large gene families that function in immunity in this species including the Sp185/333 gene family that has ∼50 (±10) members. The family shows intriguing sequence diversity and encodes a broad array of diverse yet similar proteins. The genes have two exons of which the second encodes the mature protein and has repeats and blocks of sequence called elements. Mosaics of element patterns plus single nucleotide polymorphisms-based variants of the elements result in significant sequence diversity among the genes yet maintains similar structure among the members of the family. Sequence of a bacterial artificial chromosome insert shows a cluster of six, tightly linked Sp185/333 genes that are flanked by GA microsatellites. The sequences between the GA microsatellites in which the Sp185/333 genes and flanking regions are located, are much more similar to each other than are the sequences outside the microsatellites suggesting processes such as gene conversion, recombination, or duplication. However, close linkage does not correspond with greater sequence similarity compared to randomly cloned and sequenced genes that are unlikely to be linked. There are three segmental duplications that are bounded by GAT microsatellites and include three almost identical genes plus flanking regions. RNA editing is detectible throughout the mRNAs based on comparisons to the genes, which, in combination with putative post-translational modifications to the proteins, results in broad arrays of Sp185/333 proteins that differ among individuals. The mature proteins have an N-terminal glycine-rich region, a central RGD motif, and a C-terminal histidine-rich region. The Sp185/333 proteins are localized to the cell surface and are found within vesicles in subsets of polygonal and small phagocytes. 
    The coelomocyte proteome shows full-length and truncated proteins, including some with missense sequence. Current results suggest that both native Sp185/333 proteins and a recombinant protein bind bacteria and are likely important in sea urchin innate immunity. PMID:22566951

  19. Megabase sequencing of human genome by ordered-shotgun-sequencing (OSS) strategy

    NASA Astrophysics Data System (ADS)

    Chen, Ellson Y.

    1997-05-01

    So far we have used the OSS strategy to sequence over 2 megabases of DNA in large-insert clones from regions of human X chromosomes with different characteristic levels of GC content. The method starts by randomly fragmenting a BAC, YAC or PAC to 8-12 kb pieces and subcloning those into lambda phage. Insert-ends of these clones are sequenced and overlapped to create a partial map. Complete sequencing is then done on a minimal tiling path of selected subclones, recursively focusing on those at the edges of contigs to facilitate mergers of clones across the entire target. To reduce manual labor, PCR processes have been adapted to prepare sequencing templates throughout the entire operation. The streamlined process can thus lend itself to further automation. The OSS approach is suitable for large-scale genomic sequencing, providing considerable flexibility in the choice of subclones or regions for more or less intensive sequencing. For example, subclones containing contaminating host cell DNA or cloning vector can be recognized and ignored with minimal sequencing effort; regions overlapping a neighboring clone already sequenced need not be redone; and segments containing tandem repeats or long repetitive sequences can be spotted early on and targeted for additional attention.
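
    Choosing a minimal tiling path from mapped subclones can be illustrated with a standard greedy interval cover. The abstract does not describe the selection procedure actually used, so this is only a sketch under the assumption that each subclone is a (start, end) interval on the target; the coordinates are invented.

```python
def minimal_tiling_path(clones, target_end):
    """Greedily pick the fewest (start, end) clones covering [0, target_end]."""
    clones = sorted(clones)
    path = []
    covered = 0
    i = 0
    while covered < target_end:
        best = None
        # Among clones starting inside the covered region, take the one
        # that extends coverage the farthest.
        while i < len(clones) and clones[i][0] <= covered:
            if best is None or clones[i][1] > best[1]:
                best = clones[i]
            i += 1
        if best is None or best[1] <= covered:
            raise ValueError("gap in coverage at position %d" % covered)
        path.append(best)
        covered = best[1]
    return path


# Invented subclone coordinates in kb along a 30 kb target.
toy_clones = [(0, 10), (0, 6), (5, 12), (8, 20), (11, 25), (19, 30)]
tiling = minimal_tiling_path(toy_clones, 30)
print(tiling)
```

    Clones left out of the path correspond to the regions that, as the abstract notes, need not be sequenced again.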

  20. Investigation of Human Cancers for Retrovirus by Low-Stringency Target Enrichment and High-Throughput Sequencing.

    PubMed

    Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens; Gniadecki, Robert; Dybkaer, Karen; Rosenberg, Jacob; Langhoff, Jill Levin; Cruz, David Flores Santa; Fonager, Jannik; Izarzugaza, Jose M G; Gupta, Ramneek; Sicheritz-Ponten, Thomas; Brunak, Søren; Willerslev, Eske; Nielsen, Lars Peter; Hansen, Anders Johannes

    2015-08-19

    Although nearly one fifth of all human cancers have an infectious aetiology, the causes for the majority of cancers remain unexplained. Despite the enormous data output from high-throughput shotgun sequencing, viral DNA in a clinical sample typically constitutes a proportion of host DNA that is too small to be detected. Sequence variation among virus genomes complicates application of sequence-specific, and highly sensitive, PCR methods. Therefore, we aimed to develop and characterize a method that permits sensitive detection of sequences despite considerable variation. We demonstrate that our low-stringency in-solution hybridization method enables detection of <100 viral copies. Furthermore, distantly related proviral sequences may be enriched by orders of magnitude, enabling discovery of hitherto unknown viral sequences by high-throughput sequencing. The sensitivity was sufficient to detect retroviral sequences in clinical samples. We used this method to conduct an investigation for novel retrovirus in samples from three cancer types. In accordance with recent studies our investigation revealed no retroviral infections in human B-cell lymphoma cells, cutaneous T-cell lymphoma or colorectal cancer biopsies. Nonetheless, our generally applicable method makes sensitive detection possible and permits sequencing of distantly related sequences from complex material.

  1. Kickoff to Conflict: A Sequence Analysis of Intra-State Conflict-Preceding Event Structures

    PubMed Central

    D'Orazio, Vito; Yonamine, James E.

    2015-01-01

    While many studies have suggested or assumed that the periods preceding the onset of intra-state conflict are similar across time and space, few have empirically tested this proposition. Using the Integrated Crisis Early Warning System's domestic event data in Asia from 1998–2010, we subject this proposition to empirical analysis. We code the similarity of government-rebel interactions in sequences preceding the onset of intra-state conflict to those preceding further periods of peace using three different metrics: Euclidean, Levenshtein, and mutual information. These scores are then used as predictors in a bivariate logistic regression to forecast whether we are likely to observe conflict in neither, one, or both of the states. We find that our model accurately classifies cases where both sequences precede peace, but struggles to distinguish between cases in which one sequence escalates to conflict and where both sequences escalate to conflict. These findings empirically suggest that generalizable patterns exist between event sequences that precede peace. PMID:25951105
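    The Levenshtein metric named in the abstract above can be illustrated with a minimal sketch. The event labels below are hypothetical and are not taken from the study's data; the function is the textbook dynamic-programming edit distance, not the authors' implementation.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of event codes).

    Counts the minimum number of insertions, deletions and substitutions
    needed to turn sequence a into sequence b.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical government-rebel event sequences (illustrative labels only)
seq_a = ["protest", "arrest", "talks"]
seq_b = ["protest", "talks"]
distance = levenshtein(seq_a, seq_b)
```

    A smaller distance means the two event sequences are more alike; how such a score is normalized before being used as a regression predictor is a separate choice that this sketch does not cover.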

  2. Sequence analysis of 497 mouse brain ESTs expressed in the substantia nigra

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stewart, G.J.; Savioz, A.; Davies, R.W.

    1997-01-15

    The use of subtracted, region-specific cDNA libraries combined with single-pass cDNA sequencing allows the discovery of novel genes and facilitates molecular description of the tissue or region involved. We report the sequence of 497 mouse expressed sequence tags (ESTs) from two subtracted libraries enriched for cDNAs expressed in the substantia nigra, a brain region with important roles in movement control and Parkinson disease. Of these, 238 ESTs give no database matches and therefore derive from novel genes. A further 115 ESTs show sequence similarity to ESTs from other organisms, which themselves do not yield any significant database matches to genes of known function. Fifty-six ESTs show sequence similarity to previously identified genes whose mouse homologues have not been reported. The total number of ESTs reported that are new for the mouse is 407, which, together with the 90 ESTs corresponding to known mouse genes or cDNAs, contributes to the molecular description of the substantia nigra. 21 refs., 4 tabs.

  3. Cloning and characterization of two novel DNases from Streptococcus pyogenes.

    PubMed

    Hasegawa, Tadao; Torii, Keizo; Hashikawa, Shinnosuke; Iinuma, Yoshitsugu; Ohta, Michio

    2002-06-01

    The proteins in the culture supernatant (exoproteins) from Streptococcus pyogenes serotype M1 were separated by two-dimensional gel electrophoresis, and their N-terminal amino acid sequences were determined. The amino acid sequences were compared to sequences in the S. pyogenes genome database. The coding sequence showed similarity to sequences of two genes, mf2-v (mf2 variant) and mf3, which had sequence similarity to genes encoding mitogenic factor (MF); MF has DNase activity. The recombinant genes were expressed in Escherichia coli and the proteins were synthesized. Mf2-v and Mf3 had DNase activity. The activity of Mf2-v was localized to the C-terminal half of the protein. The mf3 gene was shown to be present in most clinically isolated strains of S. pyogenes tested, and the mf2 gene was detected in 20% of the isolates. The products of the mf2 and mf3 genes in clinically isolated S. pyogenes strains were thus shown to be DNases.

  4. Variability and transmission by Aphis glycines of North American and Asian Soybean mosaic virus isolates.

    PubMed

    Domier, L L; Latorre, I J; Steinlage, T A; McCoppin, N; Hartman, G L

    2003-10-01

    The variability of North American and Asian strains and isolates of Soybean mosaic virus was investigated. First, polymerase chain reaction (PCR) products representing the coat protein (CP)-coding regions of 38 SMVs were analyzed for restriction fragment length polymorphisms (RFLP). Second, the nucleotide and predicted amino acid sequence variability of the P1-coding region of 18 SMVs and the helper component/protease (HC/Pro) and CP-coding regions of 25 SMVs were assessed. The CP nucleotide and predicted amino acid sequences were the most similar and predicted phylogenetic relationships similar to those obtained from RFLP analysis. Neither RFLP nor sequence analyses of the CP-coding regions grouped the SMVs by geographical origin. The P1 and HC/Pro sequences were more variable and separated the North American and Asian SMV isolates into two groups similar to previously reported differences in pathogenic diversity of the two sets of SMV isolates. The P1 region was the most informative of the three regions analyzed. To assess the biological relevance of the sequence differences in the HC/Pro and CP coding regions, the transmissibility of 14 SMV isolates by Aphis glycines was tested. All field isolates of SMV were transmitted efficiently by A. glycines, but the laboratory isolates analyzed were transmitted poorly. The amino acid sequences from most, but not all, of the poorly transmitted isolates contained mutations in the aphid transmission-associated DAG and/or KLSC amino acid sequence motifs of CP and HC/Pro, respectively.

  5. Interbasin water transfer, riverine connectivity, and spatial controls on fish biodiversity

    USGS Publications Warehouse

    Grant, Evan H. Campbell; Lynch, Heather J.; Muneepeerakul, Rachata; Muthukumarasamy, Arunachalam; Rodríguez-Iturbe, Ignacio; Fagan, William F.

    2012-01-01

    Background Large-scale inter-basin water transfer (IBWT) projects are commonly proposed as solutions to water distribution and supply problems. These problems are likely to intensify under future population growth and climate change scenarios. Scarce data on the distribution of freshwater fishes frequently limits the ability to assess the potential implications of an IBWT project on freshwater fish communities. Because connectivity in habitat networks is expected to be critical to species' biogeography, consideration of changes in the relative isolation of riverine networks may provide a strategy for controlling impacts of IBWTs on freshwater fish communities. Methods/Principal Findings Using empirical data on the current patterns of freshwater fish biodiversity for rivers of peninsular India, we show here how the spatial changes alone under an archetypal IBWT project will (1) reduce freshwater fish biodiversity system-wide, (2) alter patterns of local species richness, (3) expand distributions of widespread species throughout peninsular rivers, and (4) decrease community richness by increasing inter-basin similarity (a mechanism for the observed decrease in biodiversity). Given the complexity of the IBWT, many paths to partial or full completion of the project are possible. We evaluate two strategies for step-wise implementation of the 11 canals, based on economic or ecological considerations. We find that for each step in the project, the impacts on freshwater fish communities are sensitive to which canal is added to the network. Conclusions/Significance Importantly, ecological impacts can be reduced by associating the sequence in which canals are added to characteristics of the links, except for the case when all 11 canals are implemented simultaneously (at which point the sequence of canal addition is inconsequential).
    By identifying the fundamental relationship between the geometry of riverine networks and freshwater fish biodiversity, our results will aid in assessing impacts of IBWT projects and balancing ecosystem and societal demands for freshwater, even in cases where biodiversity data are limited.

  6. Interbasin Water Transfer, Riverine Connectivity, and Spatial Controls on Fish Biodiversity

    PubMed Central

    Grant, Evan H. Campbell; Lynch, Heather J.; Muneepeerakul, Rachata; Arunachalam, Muthukumarasamy; Rodríguez-Iturbe, Ignacio; Fagan, William F.

    2012-01-01

    Background Large-scale inter-basin water transfer (IBWT) projects are commonly proposed as solutions to water distribution and supply problems. These problems are likely to intensify under future population growth and climate change scenarios. Scarce data on the distribution of freshwater fishes frequently limits the ability to assess the potential implications of an IBWT project on freshwater fish communities. Because connectivity in habitat networks is expected to be critical to species' biogeography, consideration of changes in the relative isolation of riverine networks may provide a strategy for controlling impacts of IBWTs on freshwater fish communities. Methods/Principal Findings Using empirical data on the current patterns of freshwater fish biodiversity for rivers of peninsular India, we show here how the spatial changes alone under an archetypal IBWT project will (1) reduce freshwater fish biodiversity system-wide, (2) alter patterns of local species richness, (3) expand distributions of widespread species throughout peninsular rivers, and (4) decrease community richness by increasing inter-basin similarity (a mechanism for the observed decrease in biodiversity). Given the complexity of the IBWT, many paths to partial or full completion of the project are possible. We evaluate two strategies for step-wise implementation of the 11 canals, based on economic or ecological considerations. We find that for each step in the project, the impacts on freshwater fish communities are sensitive to which canal is added to the network. Conclusions/Significance Importantly, ecological impacts can be reduced by associating the sequence in which canals are added to characteristics of the links, except for the case when all 11 canals are implemented simultaneously (at which point the sequence of canal addition is inconsequential). 
    By identifying the fundamental relationship between the geometry of riverine networks and freshwater fish biodiversity, our results will aid in assessing impacts of IBWT projects and balancing ecosystem and societal demands for freshwater, even in cases where biodiversity data are limited. PMID:22470533

  7. Convergence yet Continued Complexity: A Systematic Review and Critique of Health Economic Models of Relapsing-Remitting Multiple Sclerosis in the United Kingdom.

    PubMed

    Allen, Felicity; Montgomery, Stephen; Maruszczak, Maciej; Kusel, Jeanette; Adlard, Nicholas

    2015-09-01

    Several disease-modifying therapies have marketing authorizations for the treatment of relapsing-remitting multiple sclerosis (RRMS). Given their appraisal by the National Institute for Health and Care Excellence, the objective was to systematically identify and critically evaluate the structures and assumptions used in health economic models of disease-modifying therapies for RRMS in the United Kingdom. Embase, MEDLINE, The Cochrane Library, and the National Institute for Health and Care Excellence Web site were searched systematically on March 3, 2014, to identify articles relating to health economic models in RRMS with a UK perspective. Data sources, techniques, and assumptions of the included models were extracted, compared, and critically evaluated. Of 386 results, 26 full texts were evaluated, leading to the inclusion of 18 articles (relating to 12 models). Early models varied considerably in method and structure, but convergence over time toward a Markov model with states based on disability score, a 1-year cycle length, and a lifetime time horizon was apparent. Recent models also allowed for disability improvement within the natural history of the condition. Considerable variety remains, with increasing numbers of comparators, the need for treatment sequencing, and different assumptions around efficacy waning and treatment withdrawal. Despite convergence over time to a similar Markov structure, there are still significant discrepancies between health economic models of RRMS in the United Kingdom. Differing methods, assumptions, and data sources render the comparison of model implementation and results problematic. The commonly used Markov structure leads to problems such as incapability to deal with heterogeneous populations and multiplying complexity with the addition of treatment sequences; these would best be solved by using alternative models such as discrete event simulations. 
    Copyright © 2015 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.

  8. Interbasin water transfer, riverine connectivity, and spatial controls on fish biodiversity.

    PubMed

    Grant, Evan H Campbell; Lynch, Heather J; Muneepeerakul, Rachata; Arunachalam, Muthukumarasamy; Rodríguez-Iturbe, Ignacio; Fagan, William F

    2012-01-01

    Large-scale inter-basin water transfer (IBWT) projects are commonly proposed as solutions to water distribution and supply problems. These problems are likely to intensify under future population growth and climate change scenarios. Scarce data on the distribution of freshwater fishes frequently limits the ability to assess the potential implications of an IBWT project on freshwater fish communities. Because connectivity in habitat networks is expected to be critical to species' biogeography, consideration of changes in the relative isolation of riverine networks may provide a strategy for controlling impacts of IBWTs on freshwater fish communities. Using empirical data on the current patterns of freshwater fish biodiversity for rivers of peninsular India, we show here how the spatial changes alone under an archetypal IBWT project will (1) reduce freshwater fish biodiversity system-wide, (2) alter patterns of local species richness, (3) expand distributions of widespread species throughout peninsular rivers, and (4) decrease community richness by increasing inter-basin similarity (a mechanism for the observed decrease in biodiversity). Given the complexity of the IBWT, many paths to partial or full completion of the project are possible. We evaluate two strategies for step-wise implementation of the 11 canals, based on economic or ecological considerations. We find that for each step in the project, the impacts on freshwater fish communities are sensitive to which canal is added to the network. Importantly, ecological impacts can be reduced by associating the sequence in which canals are added to characteristics of the links, except for the case when all 11 canals are implemented simultaneously (at which point the sequence of canal addition is inconsequential). 
    By identifying the fundamental relationship between the geometry of riverine networks and freshwater fish biodiversity, our results will aid in assessing impacts of IBWT projects and balancing ecosystem and societal demands for freshwater, even in cases where biodiversity data are limited.

  9. Prediction of multi-drug resistance transporters using a novel sequence analysis method [version 2; referees: 2 approved]

    DOE PAGES

    McDermott, Jason E.; Bruillard, Paul; Overall, Christopher C.; ...

    2015-03-09

    There are many examples of groups of proteins that have similar function, but the determinants of functional specificity may be hidden by lack of sequence similarity, or by large groups of similar sequences with different functions. Transporters are one such protein group in that the general function, transport, can be easily inferred from the sequence, but the substrate specificity can be impossible to predict from sequence with current methods. In this paper we describe a linguistic-based approach to identify functional patterns from groups of unaligned protein sequences and its application to predict multi-drug resistance transporters (MDRs) from bacteria. We first show that our method can recreate known patterns from PROSITE for several motifs from unaligned sequences. We then show that the method, MDRpred, can predict MDRs with greater accuracy and positive predictive value than a collection of currently available family-based models from the Pfam database. Finally, we apply MDRpred to a large collection of protein sequences from an environmental microbiome study to make novel predictions about drug resistance in a potential environmental reservoir.

  10. Patterns and Sequences: Interactive Exploration of Clickstreams to Understand Common Visitor Paths.

    PubMed

    Liu, Zhicheng; Wang, Yang; Dontcheva, Mira; Hoffman, Matthew; Walker, Seth; Wilson, Alan

    2017-01-01

    Modern web clickstream data consists of long, high-dimensional sequences of multivariate events, making it difficult to analyze. Following the overarching principle that the visual interface should provide information about the dataset at multiple levels of granularity and allow users to easily navigate across these levels, we identify four levels of granularity in clickstream analysis: patterns, segments, sequences and events. We present an analytic pipeline consisting of three stages: pattern mining, pattern pruning and coordinated exploration between patterns and sequences. Based on this approach, we discuss properties of maximal sequential patterns, propose methods to reduce the number of patterns and describe design considerations for visualizing the extracted sequential patterns and the corresponding raw sequences. We demonstrate the viability of our approach through an analysis scenario and discuss the strengths and limitations of the methods based on user feedback.

  11. Arabidopsis thaliana type I and II chaperonins.

    PubMed

    Hill, J E; Hemmingsen, S M

    2001-07-01

    An examination of the Arabidopsis thaliana genome sequence led to the identification of 29 predicted genes with the potential to encode members of the chaperonin family of chaperones (CPN60 and CCT), their associated cochaperonins, and the cytoplasmic chaperonin cofactor prefoldin. These comprise the first complete set of plant chaperonin protein sequences and indicate that the CPN family is more diverse than previously described. In addition to surprising sequence diversity within CPN subclasses, the genomic data also suggest the existence of previously undescribed family members, including a 10-kDa chloroplast cochaperonin. Consideration of the sequence data described in this review prompts questions about the complexities of plant CPN systems and the evolutionary relationships and functions of the component proteins, most of which have not been studied experimentally.

  12. Trichoderma virens β-glucosidase I (BGLI) gene; expression in Saccharomyces cerevisiae including docking and molecular dynamics studies.

    PubMed

    Wickramasinghe, Gammadde Hewa Ishan Maduka; Rathnayake, Pilimathalawe Panditharathna Attanayake Mudiyanselage Samith Indika; Chandrasekharan, Naduviladath Vishvanath; Weerasinghe, Mahindagoda Siril Samantha; Wijesundera, Ravindra Lakshman Chundananda; Wijesundera, Wijepurage Sandhya Sulochana

    2017-06-21

    Cellulose, a linear polymer of β-1,4-linked glucose, is the most abundant renewable fraction of plant biomass (lignocellulose). It is synergistically converted to glucose by endoglucanase (EG), cellobiohydrolase (CBH) and β-glucosidase (BGL) of the cellulase complex. BGL plays a major role in the conversion of randomly cleaved cellooligosaccharides into glucose. As is well known, Saccharomyces cerevisiae can efficiently convert glucose into ethanol under anaerobic conditions. Therefore, S. cerevisiae was genetically modified with the objective of heterologous extracellular expression of the BGLI gene of Trichoderma virens, making it capable of utilizing cellobiose to produce ethanol. The cDNA and a genomic sequence of the BGLI gene of Trichoderma virens were cloned in the yeast expression vector pGAPZα and separately transformed into Saccharomyces cerevisiae. The size of the BGLI cDNA clone was 1363 bp and the genomic DNA clone contained an additional 76 bp single intron following the first exon. The gene was 90% similar to the DNA sequence and 99% similar to the deduced amino acid sequence of 1,4-β-D-glucosidase of T. atroviride (AC237343.1). The BGLI activity expressed by the recombinant genomic clone was 3.4 times greater (1.7 × 10⁻³ IU ml⁻¹) than that observed for the cDNA clone (5 × 10⁻⁴ IU ml⁻¹). Furthermore, the activity was similar to the activity of locally isolated Trichoderma virens (1.5 × 10⁻³ IU ml⁻¹). The estimated size of the protein was 52 kDa. In fermentation studies, the maximum ethanol yields of the genomic and the cDNA clones were 0.36 g and 0.06 g per g of cellobiose, respectively. Molecular docking results indicated that the bare protein and cellobiose-protein complex behave in a similar manner with considerable stability in aqueous medium. The deduced binding site and the binding affinity of the constructed homology model appeared to be reasonable.
    Moreover, it was identified that the five hydrogen bonds formed between the amino acid residues of BGLI and cellobiose are mainly involved in the integrity of enzyme-substrate association. The BGLI activity was remarkably higher in the genomic DNA clone compared to the cDNA clone. Cellobiose was successfully fermented into ethanol by the recombinant S. cerevisiae genomic DNA clone. It has the potential to be used in the industrial production of ethanol as it is capable of simultaneous saccharification and fermentation of cellobiose. Homology modeling, docking studies and molecular dynamics simulation studies will provide a realistic model for further studies in the modification of active site residues which could be followed by mutation studies to improve the catalytic action of BGLI.

  13. Comparison of traditional phenotypic identification methods with partial 5' 16S rRNA gene sequencing for species-level identification of nonfermenting Gram-negative bacilli.

    PubMed

    Cloud, Joann L; Harmsen, Dag; Iwen, Peter C; Dunn, James J; Hall, Gerri; Lasala, Paul Rocco; Hoggan, Karen; Wilson, Deborah; Woods, Gail L; Mellmann, Alexander

    2010-04-01

    Correct identification of nonfermenting Gram-negative bacilli (NFB) is crucial for patient management. We compared phenotypic identifications of 96 clinical NFB isolates with identifications obtained by 5' 16S rRNA gene sequencing. Sequencing identified 88 isolates (91.7%) with >99% similarity to a sequence from the assigned species; 61.5% of sequencing results were concordant with phenotypic results, indicating the usability of sequencing to identify NFB.
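    As a rough illustration of the figures in the abstract above, the sketch below computes a percent identity between two pre-aligned sequences and reproduces the 91.7% proportion from the reported counts (88 of 96 isolates). The identity definition used here (matches divided by aligned columns, skipping columns where both sequences have a gap) is an assumption chosen for illustration; the study's own alignment and scoring procedure is not described in the abstract, and the example sequences are made up.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned, equal-length sequences.

    Assumption: columns where both sequences carry a gap are skipped;
    a gap opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue  # column carries no information
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Made-up aligned fragments, not real 16S rRNA gene data
identity = percent_identity("ACGTACGT", "ACGTACGA")

# Proportion reported in the abstract: 88 of 96 isolates
share_identified = round(100 * 88 / 96, 1)
```

    Other identity definitions (for example, dividing by the length of the shorter sequence) give different values, so any threshold such as >99% only has meaning relative to the definition used.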

  14. Improving performance of DS-CDMA systems using chaotic complex Bernoulli spreading codes

    NASA Astrophysics Data System (ADS)

    Farzan Sabahi, Mohammad; Dehghanfard, Ali

    2014-12-01

    The most important goal of a spread spectrum communication system is to protect communication signals against interference and against exploitation of information by unintended listeners. In fact, low probability of detection and low probability of intercept are two important parameters for increasing the performance of the system. In Direct Sequence Code Division Multiple Access (DS-CDMA) systems, these properties are achieved by multiplying the data by spreading sequences. Chaotic sequences, with their particular properties, have numerous applications in constructing spreading codes. The use of a one-dimensional Bernoulli chaotic sequence as a spreading code has been proposed previously in the literature. The main feature of this sequence is its negative auto-correlation at a lag of 1, which, with proper design, leads to an increase in the efficiency of communication systems based on these codes. On the other hand, employing complex chaotic sequences as spreading sequences has also been discussed in several papers. In this paper, the use of two-dimensional Bernoulli chaotic sequences as spreading codes is proposed. The performance of multi-user synchronous and asynchronous DS-CDMA systems is evaluated by applying these sequences under Additive White Gaussian Noise (AWGN) and fading channels. Simulation results indicate improved performance in comparison with conventional spreading codes such as Gold codes, as well as with similar complex chaotic spreading sequences. Like one-dimensional Bernoulli chaotic sequences, the proposed sequences also have negative auto-correlation. In addition, construction of complex sequences with lower average cross-correlation is possible with the proposed method.

  15. Case-Based Plan Recognition Using Action Sequence Graphs

    DTIC Science & Technology

    2014-10-01

    resized as necessary. Similarly, trace-based reasoning (Zarka et al., 2013) and episode-based reasoning (Sánchez-Marré, 2005) store fixed-length...is a goal state of Π, where satisfies has the same semantics as originally laid out in Ghallab, Nau & Traverso (2004). Action 0 is ...Although there are syntactic similarities between planning encoding graphs and action sequence graphs, important semantic differences exist because the

  16. Cytogenetic evidence for asexual evolution of bdelloid rotifers.

    PubMed

    Mark Welch, Jessica L; Mark Welch, David B; Meselson, Matthew

    2004-02-10

    DNA sequencing has shown individual bdelloid rotifer genomes to contain two or more diverged copies of every gene examined and has revealed no closely similar copies. These and other findings are consistent with long-term asexual evolution of bdelloids. It is not entirely ruled out, however, that bdelloid genomes consist of previously undetected pairs of sequences so similar as to be identical over the regions sequenced, as might result if bdelloids were highly inbred sexual diploids or polyploids. Here, we employ fluorescent in situ hybridization with cosmid probes to determine the copy number and chromosomal distribution of the heat shock gene hsp82 and adjacent sequences in the bdelloid Philodina roseola. We conclude that the four copies identified by sequencing are the only ones present and that each is on a separate chromosome. Bdelloids therefore are not highly homozygous sexually reproducing diploids or polyploids.

  17. Complete genome sequence of an isolate of Potato virus X (PVX) infecting Cape gooseberry (Physalis peruviana) in Colombia.

    PubMed

    Gutiérrez, Pablo A; Alzate, Juan F; Montoya, Mauricio Marín

    2015-06-01

    Transcriptome analysis of a Cape gooseberry (Physalis peruviana) plant with leaf symptoms of a mild yellow mosaic typical of a viral disease revealed an infection with Potato virus X (PVX). The genome sequence of the PVX-Physalis isolate comprises 6435 nt and exhibits higher sequence similarity to members of the Eurasian group of PVX (~95 %) than to the American group (~77 %). Genome organization is similar to other PVX isolates with five open reading frames coding for proteins RdRp, TGBp1, TGBp2, TGBp3, and CP. 5' and 3' untranslated regions revealed all regulatory motifs typically found in PVX isolates. The PVX-Physalis genome is the only complete sequence available for a Potexvirus in Colombia and is a new addition to the restricted number of available sequences of PVX isolates infecting plant species different to potato.

  18. Comparison of ZP3 protein sequences among vertebrate species: to obtain a consensus sequence for immunocontraception.

    PubMed

    Zhu, X; Naz, R K

    1999-03-01

    The deduced ZP3 amino acid (aa) sequences of 13 vertebrate species namely mouse, hamster, rabbit, pig, porcine, cow, dog, cat, human, bonnet, marmoset, carp, and frog were compared using the PILEUP and PRETTY alignment programs (GCG, Wisconsin, USA). The published aa sequences obtained from 13 vertebrate species indicated overall evolutionary conservation in the N-terminus, central region, and C-terminus of the ZP3 polypeptide. More variations of ZP3 polypeptide sequences were seen in the alignments of carp and frog from the 11 mammalian species making the leader sequence more prominent. The canonical furin proteolytic processing signal at the C-terminus was found in all the ZP3 polypeptide sequences except those of carp and frog. In the central region, the ZP3 deduced aa sequences of all the 13 vertebrate species aligned well, and six relatively conserved sequences were found. There are 11 conserved cysteine residues in the central region across all species including carp and frog, indicating that these residues have a longer evolutionary history. The ZP3 aa sequence similarities were examined using the GAP program (GCG). The highest aa similarities are observed between the members of the same order within the class mammalia, and also (95.4%) between pig (ungulata) and rabbit (lagomorpha). The deduced ZP3 aa sequences per se may not be enough to build a phylogenetic tree.

  19. The first set of EST resource for gene discovery and marker development in pigeonpea (Cajanus cajan L.).

    PubMed

    Raju, Nikku L; Gnanesh, Belaghihalli N; Lekha, Pazhamala; Jayashree, Balaji; Pande, Suresh; Hiremath, Pavana J; Byregowda, Munishamappa; Singh, Nagendra K; Varshney, Rajeev K

    2010-03-11

    Pigeonpea (Cajanus cajan (L.) Millsp) is one of the major grain legume crops of the tropics and subtropics, but biotic stresses [Fusarium wilt (FW), sterility mosaic disease (SMD), etc.] are serious challenges for sustainable crop production. Modern genomic tools such as molecular markers and candidate genes associated with resistance to these stresses offer the possibility of facilitating pigeonpea breeding for improving biotic stress resistance. The limited availability of genomic resources, however, is a serious bottleneck to undertaking molecular breeding in pigeonpea to develop superior genotypes with enhanced resistance to the above-mentioned biotic stresses. With the objective of enhancing genomic resources in pigeonpea, this study reports the generation and analysis of a comprehensive resource of FW- and SMD-responsive expressed sequence tags (ESTs). A total of 16 cDNA libraries were constructed from four pigeonpea genotypes that are resistant and susceptible to FW ('ICPL 20102' and 'ICP 2376') and SMD ('ICP 7035' and 'TTB 7'), and a total of 9,888 (9,468 high quality) ESTs were generated and deposited in dbEST of GenBank under accession numbers GR463974 to GR473857 and GR958228 to GR958231. Clustering and assembly analyses of these ESTs resulted in 4,557 unique sequences (unigenes), including 697 contigs and 3,860 singletons. BLASTN analysis of the 4,557 unigenes showed significant identity with ESTs of different legumes (23.2-60.3%), rice (28.3%), Arabidopsis (33.7%) and poplar (35.4%). As expected, pigeonpea ESTs are more closely related to soybean (60.3%) and cowpea ESTs (43.6%) than to other plant ESTs. Similarly, BLASTX similarity results showed that only 1,603 (35.1%) out of 4,557 total unigenes correspond to known proteins in the UniProt database (≤ 1E-08). Functional categorization of the annotated unigene sequences showed that 153 (3.3%) genes were assigned to the cellular component category, 132 (2.8%) to biological process, and 132 (2.8%) to molecular function. Further, 19 genes were identified as differentially expressed between FW-responsive genotypes and 20 between SMD-responsive genotypes. The generated ESTs were compiled together with 908 ESTs available in the public domain at the time of analysis, and a set of 5,085 unigenes was defined and used for identification of molecular markers in pigeonpea. For instance, 3,583 simple sequence repeat (SSR) motifs were identified in 1,365 unigenes and 383 primer pairs were designed. Assessment of a set of 84 primer pairs on 40 elite pigeonpea lines showed polymorphism with 15 (28.8%) markers, with an average of four alleles per marker and an average polymorphic information content (PIC) value of 0.40. Similarly, in silico mining of 133 contigs with ≥ 5 sequences detected 102 single nucleotide polymorphisms (SNPs) in 37 contigs. As an example, a set of 10 contigs was used for confirming in silico predicted SNPs in a set of four genotypes using wet lab experiments. The occurrence of SNPs was confirmed for all 6 contigs for which scorable and sequenceable amplicons were generated. PCR amplicons were not obtained for 4 contigs. Recognition sites for restriction enzymes were identified for 102 SNPs in 37 contigs, which indicates the possibility of assaying SNPs in 37 genes using the cleaved amplified polymorphic sequences (CAPS) assay. The pigeonpea EST dataset generated here provides a transcriptomic resource for gene discovery and development of functional markers associated with biotic stress resistance. Sequence analyses of this dataset have shown conservation of a considerable number of pigeonpea transcripts across the legume and model plant species analysed, as well as some putative pigeonpea-specific genes. Validation of identified biotic stress responsive genes should provide candidate genes for allele mining as well as candidate markers for molecular breeding.

  20. The first set of EST resource for gene discovery and marker development in pigeonpea (Cajanus cajan L.)

    PubMed Central

    2010-01-01

    Background: Pigeonpea (Cajanus cajan (L.) Millsp) is one of the major grain legume crops of the tropics and subtropics, but biotic stresses [Fusarium wilt (FW), sterility mosaic disease (SMD), etc.] are serious challenges for sustainable crop production. Modern genomic tools such as molecular markers and candidate genes associated with resistance to these stresses offer the possibility of facilitating pigeonpea breeding for improving biotic stress resistance. The limited availability of genomic resources, however, is a serious bottleneck to undertaking molecular breeding in pigeonpea to develop superior genotypes with enhanced resistance to the above-mentioned biotic stresses. With the objective of enhancing genomic resources in pigeonpea, this study reports the generation and analysis of a comprehensive resource of FW- and SMD-responsive expressed sequence tags (ESTs). Results: A total of 16 cDNA libraries were constructed from four pigeonpea genotypes that are resistant and susceptible to FW ('ICPL 20102' and 'ICP 2376') and SMD ('ICP 7035' and 'TTB 7'), and a total of 9,888 (9,468 high quality) ESTs were generated and deposited in dbEST of GenBank under accession numbers GR463974 to GR473857 and GR958228 to GR958231. Clustering and assembly analyses of these ESTs resulted in 4,557 unique sequences (unigenes), including 697 contigs and 3,860 singletons. BLASTN analysis of the 4,557 unigenes showed significant identity with ESTs of different legumes (23.2-60.3%), rice (28.3%), Arabidopsis (33.7%) and poplar (35.4%). As expected, pigeonpea ESTs are more closely related to soybean (60.3%) and cowpea ESTs (43.6%) than to other plant ESTs. Similarly, BLASTX similarity results showed that only 1,603 (35.1%) out of 4,557 total unigenes correspond to known proteins in the UniProt database (≤ 1E-08). Functional categorization of the annotated unigene sequences showed that 153 (3.3%) genes were assigned to the cellular component category, 132 (2.8%) to biological process, and 132 (2.8%) to molecular function. Further, 19 genes were identified as differentially expressed between FW-responsive genotypes and 20 between SMD-responsive genotypes. The generated ESTs were compiled together with 908 ESTs available in the public domain at the time of analysis, and a set of 5,085 unigenes was defined and used for identification of molecular markers in pigeonpea. For instance, 3,583 simple sequence repeat (SSR) motifs were identified in 1,365 unigenes and 383 primer pairs were designed. Assessment of a set of 84 primer pairs on 40 elite pigeonpea lines showed polymorphism with 15 (28.8%) markers, with an average of four alleles per marker and an average polymorphic information content (PIC) value of 0.40. Similarly, in silico mining of 133 contigs with ≥ 5 sequences detected 102 single nucleotide polymorphisms (SNPs) in 37 contigs. As an example, a set of 10 contigs was used for confirming in silico predicted SNPs in a set of four genotypes using wet lab experiments. The occurrence of SNPs was confirmed for all 6 contigs for which scorable and sequenceable amplicons were generated. PCR amplicons were not obtained for 4 contigs. Recognition sites for restriction enzymes were identified for 102 SNPs in 37 contigs, which indicates the possibility of assaying SNPs in 37 genes using the cleaved amplified polymorphic sequences (CAPS) assay. Conclusion: The pigeonpea EST dataset generated here provides a transcriptomic resource for gene discovery and development of functional markers associated with biotic stress resistance. Sequence analyses of this dataset have shown conservation of a considerable number of pigeonpea transcripts across the legume and model plant species analysed, as well as some putative pigeonpea-specific genes. Validation of identified biotic stress responsive genes should provide candidate genes for allele mining as well as candidate markers for molecular breeding. PMID:20222972
