Sample records for comparative sequence derived

  1. Feedback shift register sequences versus uniformly distributed random sequences for correlation chromatography

    NASA Technical Reports Server (NTRS)

    Kaljurand, M.; Valentin, J. R.; Shao, M.

    1996-01-01

    Two alternative input sequences are commonly employed in correlation chromatography (CC). They are sequences derived according to the algorithm of the feedback shift register (i.e., pseudo random binary sequences (PRBS)) and sequences derived by using the uniform random binary sequences (URBS). These two sequences are compared. By applying the "cleaning" data processing technique to the correlograms that result from these sequences, we show that when the PRBS is used the S/N of the correlogram is much higher than the one resulting from using URBS.
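
    The feedback shift register construction mentioned above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the register length (4 bits), the tap positions, and the function names `prbs` and `circular_autocorr` are assumptions chosen for the example. It shows one commonly cited property of maximal-length PRBS: after mapping bits to ±1, the circular autocorrelation is flat (equal to -1) at every non-zero lag, which a uniformly random binary sequence does not guarantee.

```python
def prbs(taps=(4, 3), nbits=4, seed=1):
    """Return one full period (2**nbits - 1 bits) of a Fibonacci LFSR.

    taps and nbits are illustrative assumptions; the default taps
    give a maximal-length sequence for a 4-bit register.
    """
    state = seed
    out = []
    for _ in range(2 ** nbits - 1):
        out.append(state & 1)                 # output the lowest bit
        fb = 0
        for t in taps:                        # XOR the tapped bits
            fb ^= (state >> (nbits - t)) & 1
        state = (state >> 1) | (fb << (nbits - 1))
    return out


def circular_autocorr(bits, lag):
    """Circular autocorrelation of the +/-1 mapped sequence at a given lag."""
    s = [2 * b - 1 for b in bits]
    n = len(s)
    return sum(s[i] * s[(i + lag) % n] for i in range(n))


if __name__ == "__main__":
    seq = prbs()
    print(seq)
    print([circular_autocorr(seq, k) for k in range(len(seq))])
```

    The flat off-peak autocorrelation is one plausible reason a PRBS input yields a cleaner correlogram; the abstract itself attributes the comparison to the "cleaning" data processing technique, which is not reproduced here.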

  2. Genetic differences between blood- and brain-derived viral sequences from human immunodeficiency virus type 1-infected patients: evidence of conserved elements in the V3 region of the envelope protein of brain-derived sequences.

    PubMed Central

    Korber, B T; Kunstman, K J; Patterson, B K; Furtado, M; McEvilly, M M; Levy, R; Wolinsky, S M

    1994-01-01

    Human immunodeficiency virus type 1 (HIV-1) sequences were generated from blood and from brain tissue obtained by stereotactic biopsy from six patients undergoing a diagnostic neurosurgical procedure. Proviral DNA was directly amplified by nested PCR, and 8 to 36 clones from each sample were sequenced. Phylogenetic analysis of intrapatient envelope V3-V5 region HIV-1 DNA sequence sets revealed that brain viral sequences were clustered relative to the blood viral sequences, suggestive of tissue-specific compartmentalization of the virus in four of the six cases. In the other two cases, the blood and brain virus sequences were intermingled in the phylogenetic analyses, suggesting trafficking of virus between the two tissues. Slide-based PCR-driven in situ hybridization of two of the patients' brain biopsy samples confirmed our interpretation of the intrapatient phylogenetic analyses. Interpatient V3 region brain-derived sequence distances were significantly less than blood-derived sequence distances. Relative to the tip of the loop, the set of brain-derived viral sequences had a tendency towards negative or neutral charge compared with the set of blood-derived viral sequences. Entropy calculations were used as a measure of the variability at each position in alignments of blood and brain viral sequences. A relatively conserved set of positions was found, with a significantly lower entropy in the brain- than in the blood-derived viral sequences. These sites constitute a brain "signature pattern," or a noncontiguous set of amino acids in the V3 region conserved in viral sequences derived from brain tissue. This brain-derived signature pattern was also well preserved among isolates previously characterized in vitro as macrophage tropic. Macrophage-monocyte tropism may be the biological constraint that results in the conservation of the viral brain signature pattern. PMID:7933130
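
    The per-position entropy measure described above can be illustrated with a short sketch. It assumes the standard Shannon entropy over the residues observed in each alignment column; the function names and the treatment of gap characters as ordinary symbols are assumptions of this example, not details taken from the study.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (bits) of the symbols in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def alignment_entropy(seqs):
    """Entropy at each position of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*seqs)]
```

    A column in which every sequence carries the same residue scores 0 bits, while a column split evenly between two residues scores 1 bit; comparing such profiles between two sets of sequences is the kind of position-by-position contrast the abstract describes.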

  3. Exploiting rice-sorghum synteny for targeted development of EST-SSRs to enrich the sorghum genetic linkage map.

    PubMed

    Ramu, P; Kassahun, B; Senthilvel, S; Ashok Kumar, C; Jayashree, B; Folkertsma, R T; Reddy, L Ananda; Kuruvinashetti, M S; Haussmann, B I G; Hash, C T

    2009-11-01

    The sequencing and detailed comparative functional analysis of genomes of a number of select botanical models open new doors into comparative genomics among the angiosperms, with potential benefits for improvement of many orphan crops that feed large populations. In this study, a set of simple sequence repeat (SSR) markers was developed by mining the expressed sequence tag (EST) database of sorghum. Among the SSR-containing sequences, only those sharing considerable homology with rice genomic sequences across the lengths of the 12 rice chromosomes were selected. Thus, 600 SSR-containing sorghum EST sequences (50 homologous sequences on each of the 12 rice chromosomes) were selected, with the intention of providing coverage for corresponding homologous regions of the sorghum genome. Primer pairs were designed and polymorphism detection ability was assessed using parental pairs of two existing sorghum mapping populations. About 28% of these new markers detected polymorphism in this 4-entry panel. A subset of 55 polymorphic EST-derived SSR markers were mapped onto the existing skeleton map of a recombinant inbred population derived from cross N13 x E 36-1, which is segregating for Striga resistance and the stay-green component of terminal drought tolerance. These new EST-derived SSR markers mapped across all 10 sorghum linkage groups, mostly to regions expected based on prior knowledge of rice-sorghum synteny. The ESTs from which these markers were derived were then mapped in silico onto the aligned sorghum genome sequence, and 88% of the best hits corresponded to linkage-based positions. This study demonstrates the utility of comparative genomic information in targeted development of markers to fill gaps in linkage maps of related crop species for which sufficient genomic tools are not available.

  4. In vitro gene expression by cationized derivatives of an artificial protein with repeated RGD sequences, Pronectin.

    PubMed

    Hosseinkhani, Hossein; Tabata, Yasuhiko

    2003-01-09

    The objective of this study is to investigate the efficiency of a non-viral gene carrier with RGD sequences, Pronectin F(+) for gene transfection. The Pronectin F(+) was cationized by introducing ethylenediamine (Ed), spermidine (Sd), and spermine (Sm) to the hydroxyl groups while the corresponding gelatin derivative was prepared similarly because gelatin also has one RGD sequence per molecule. The zeta potential and molecular size of Pronectin F(+) and gelatin derivatives were examined before and after polyion complexation with a plasmid DNA of luciferase. When complexed with the plasmid DNA at the Pronectin F(+)/plasmid DNA mixing ratio of 50, the complex exhibited a zeta potential of about 10 mV, which is similar to that of the gelatin derivative-plasmid DNA complex. Irrespective of the type of Pronectin F(+) and gelatin derivatives, their complexation enabled the apparent molecular size of plasmid DNA to reduce to about 200 nm, the size decreasing with the increased derivative/plasmid DNA weight mixing ratio. The rat gastric mucosal (RGM)-1 cells treated with both complexes exhibited significantly stronger luciferase activities than free plasmid DNA although the enhanced extent was significant for the Sm derivative compared with the corresponding Ed and Sd derivatives. Cell attachment was enhanced by the Pronectin F(+) derivative to a significantly greater extent than by the gelatin derivative. The amount of plasmid DNA internalized into the cells was enhanced by the complexation with every Pronectin F(+) derivative compared with the gelatin derivative. For both Pronectin F(+) and gelatin carriers, the buffering capacity of Sm derivatives was higher than that of Ed and Sd derivatives and comparable to that of polyethyleneimine. It is likely that the high efficiency of gene transfection for the Sm derivative is due to the superior buffering effect. We conclude that the Sm derivative of Pronectin F(+) is promising as a non-viral vector of gene transfection.

  5. CRITICA: coding region identification tool invoking comparative analysis

    NASA Technical Reports Server (NTRS)

    Badger, J. H.; Olsen, G. J.; Woese, C. R. (Principal Investigator)

    1999-01-01

    Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in /pub/critica) and on the World Wide Web (http://rdpwww.life.uiuc.edu).

  6. Brassica ASTRA: an integrated database for Brassica genomic research.

    PubMed

    Love, Christopher G; Robinson, Andrew J; Lim, Geraldine A C; Hopkins, Clare J; Batley, Jacqueline; Barker, Gary; Spangenberg, German C; Edwards, David

    2005-01-01

    Brassica ASTRA is a public database for genomic information on Brassica species. The database incorporates expressed sequences with Swiss-Prot and GenBank comparative sequence annotation as well as secondary Gene Ontology (GO) annotation derived from the comparison with Arabidopsis TAIR GO annotations. Simple sequence repeat molecular markers are identified within resident sequences and mapped onto the closely related Arabidopsis genome sequence. Bacterial artificial chromosome (BAC) end sequences derived from the Multinational Brassica Genome Project are also mapped onto the Arabidopsis genome sequence enabling users to identify candidate Brassica BACs corresponding to syntenic regions of Arabidopsis. This information is maintained in a MySQL database with a web interface providing the primary means of interrogation. The database is accessible at http://hornbill.cspp.latrobe.edu.au.

  7. Analytical and Clinical Validation of a Digital Sequencing Panel for Quantitative, Highly Accurate Evaluation of Cell-Free Circulating Tumor DNA

    PubMed Central

    Zill, Oliver A.; Sebisanovic, Dragan; Lopez, Rene; Blau, Sibel; Collisson, Eric A.; Divers, Stephen G.; Hoon, Dave S. B.; Kopetz, E. Scott; Lee, Jeeyun; Nikolinakos, Petros G.; Baca, Arthur M.; Kermani, Bahram G.; Eltoukhy, Helmy; Talasaz, AmirAli

    2015-01-01

    Next-generation sequencing of cell-free circulating solid tumor DNA addresses two challenges in contemporary cancer care. First, this method of massively parallel and deep sequencing enables assessment of a comprehensive panel of genomic targets from a single sample, and second, it obviates the need for repeat invasive tissue biopsies. Digital Sequencing™ is a novel method for high-quality sequencing of circulating tumor DNA simultaneously across a comprehensive panel of over 50 cancer-related genes with a simple blood test. Here we report the analytic and clinical validation of the gene panel. Analytic sensitivity down to 0.1% mutant allele fraction is demonstrated via serial dilution studies of known samples. Near-perfect analytic specificity (> 99.9999%) enables complete coverage of many genes without the false positives typically seen with traditional sequencing assays at mutant allele frequencies or fractions below 5%. We compared digital sequencing of plasma-derived cell-free DNA to tissue-based sequencing on 165 consecutive matched samples from five outside centers in patients with stage III-IV solid tumor cancers. Clinical sensitivity of plasma-derived NGS was 85.0%, comparable to 80.7% sensitivity for tissue. The assay success rate on 1,000 consecutive samples in clinical practice was 99.8%. Digital sequencing of plasma-derived DNA is indicated in advanced cancer patients to prevent repeated invasive biopsies when the initial biopsy is inadequate, unobtainable for genomic testing, or uninformative, or when the patient’s cancer has progressed despite treatment. Its clinical utility is derived from reduction in the costs, complications and delays associated with invasive tissue biopsies for genomic testing. PMID:26474073

  8. Plastome Sequence Determination and Comparative Analysis for Members of the Lolium-Festuca Grass Species Complex

    PubMed Central

    Hand, Melanie L.; Spangenberg, German C.; Forster, John W.; Cogan, Noel O. I.

    2013-01-01

    Chloroplast genome sequences are of broad significance in plant biology, due to frequent use in molecular phylogenetics, comparative genomics, population genetics, and genetic modification studies. The present study used a second-generation sequencing approach to determine and assemble the plastid genomes (plastomes) of four representatives from the agriculturally important Lolium-Festuca species complex of pasture grasses (Lolium multiflorum, Festuca pratensis, Festuca altissima, and Festuca ovina). Total cellular DNA was extracted from either roots or leaves, was sequenced, and the output was filtered for plastome-related reads. A comparison between sources revealed fewer plastome-related reads from root-derived template but an increase in incidental bacterium-derived sequences. Plastome assembly and annotation indicated high levels of sequence identity and a conserved organization and gene content between species. However, frequent deletions within the F. ovina plastome appeared to contribute to a smaller plastid genome size. Comparative analysis with complete plastome sequences from other members of the Poaceae confirmed conservation of most grass-specific features. Detailed analysis of the rbcL–psaI intergenic region, however, revealed a “hot-spot” of variation characterized by independent deletion events. The evolutionary implications of this observation are discussed. The complete plastome sequences are anticipated to provide the basis for potential organelle-specific genetic modification of pasture grasses. PMID:23550121

  9. Verifying Digital Components of Physical Systems: Experimental Evaluation of Test Quality

    NASA Astrophysics Data System (ADS)

    Laputenko, A. V.; López, J. E.; Yevtushenko, N. V.

    2018-03-01

    This paper continues the study of high quality test derivation for verifying digital components which are used in various physical systems; those are sensors, data transfer components, etc. We have used logic circuits b01-b010 of the package of ITC'99 benchmarks (Second Release) for experimental evaluation which, as stated before, describe digital components of physical systems designed for various applications. Test sequences are derived for detecting the most known faults of the reference logic circuit using three different approaches to test derivation. Three widely used fault types such as stuck-at-faults, bridges, and faults which slightly modify the behavior of one gate are considered as possible faults of the reference behavior. The most interesting test sequences are short test sequences that can provide appropriate guarantees after testing, and thus, we experimentally study various approaches to the derivation of the so-called complete test suites which detect all fault types. In the first series of experiments, we compare two approaches for deriving complete test suites. In the first approach, a shortest test sequence is derived for testing each fault. In the second approach, a test sequence is pseudo-randomly generated using appropriate software for logic synthesis and verification (ABC system in our study) and thus, can be longer. However, after deleting sequences detecting the same set of faults, a test suite returned by the second approach is shorter. The latter underlines the fact that in many cases it is useless to spend "time and effort" on deriving a shortest distinguishing sequence; it is better to use test minimization afterwards. The performed experiments also show that the use of only randomly generated test sequences is not very efficient since such sequences do not detect all the faults of any type. After reaching the fault coverage around 70%, saturation is observed, and the fault coverage cannot be increased anymore. For deriving high quality short test suites, combining randomly generated sequences with sequences aimed at detecting the faults missed by the random tests allows good fault coverage to be reached using the shortest test sequences.
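
    The test minimization step mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the authors' tool: the input is a hypothetical mapping from test-sequence names to the set of fault identifiers each one detects, the first step drops sequences that detect the same set of faults as one already kept (as the abstract describes), and the second step is a greedy set-cover heuristic added here for illustration.

```python
def minimize_suite(detects):
    """Return a reduced list of test names that still covers every fault.

    detects: dict mapping a test name to the set of fault ids it detects
    (a hypothetical input format chosen for this sketch).
    """
    # Step 1: keep only the first test seen for each distinct fault set.
    unique = {}
    for name, faults in detects.items():
        unique.setdefault(frozenset(faults), name)

    # Step 2 (assumed heuristic): greedily pick the test that covers
    # the most still-undetected faults until nothing is left uncovered.
    remaining = set().union(*unique)
    chosen = []
    while remaining:
        best = max(unique, key=lambda fs: len(fs & remaining))
        chosen.append(unique[best])
        remaining -= best
    return chosen
```

    For example, with `{"t1": {1, 2}, "t2": {1, 2}, "t3": {3}, "t4": {2, 3}}` the duplicate `t2` is dropped first, and the greedy pass then keeps `t1` and `t3`. Greedy cover does not guarantee a minimum suite; it is used here only because it is short and easy to follow.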

  10. HOMFLYPT polynomial is the best quantifier for topological cascades of vortex knots

    NASA Astrophysics Data System (ADS)

    Ricca, Renzo L.; Liu, Xin

    2018-02-01

    In this paper we derive and compare numerical sequences obtained by adapted polynomials such as HOMFLYPT, Jones and Alexander-Conway for the topological cascade of vortex torus knots and links that progressively untie by a single reconnection event at a time. Two cases are considered: the alternate sequence of knots and co-oriented links (with positive crossings) and the sequence of two-component links with oppositely oriented components (negative crossings). New recurrence equations are derived and sequences of numerical values are computed. In all cases the adapted HOMFLYPT polynomial proves to be the best quantifier for the topological cascade of torus knots and links.

  11. Functionally conserved enhancers with divergent sequences in distant vertebrates

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Song; Oksenberg, Nir; Takayama, Sachiko

    To examine the contributions of sequence and function conservation in the evolution of enhancers, we systematically identified enhancers whose sequences are not conserved among distant groups of vertebrate species, but have homologous function and are likely to be derived from a common ancestral sequence. In conclusion, our approach combined comparative genomics and epigenomics to identify potential enhancer sequences in the genomes of three groups of distantly related vertebrate species.

  12. Functionally conserved enhancers with divergent sequences in distant vertebrates

    DOE PAGES

    Yang, Song; Oksenberg, Nir; Takayama, Sachiko; ...

    2015-10-30

    To examine the contributions of sequence and function conservation in the evolution of enhancers, we systematically identified enhancers whose sequences are not conserved among distant groups of vertebrate species, but have homologous function and are likely to be derived from a common ancestral sequence. In conclusion, our approach combined comparative genomics and epigenomics to identify potential enhancer sequences in the genomes of three groups of distantly related vertebrate species.

  13. Gene discovery in the hamster: a comparative genomics approach for gene annotation by sequencing of hamster testis cDNAs

    PubMed Central

    Oduru, Sreedhar; Campbell, Janee L; Karri, SriTulasi; Hendry, William J; Khan, Shafiq A; Williams, Simon C

    2003-01-01

    Background: Complete genome annotation will likely be achieved through a combination of computer-based analysis of available genome sequences combined with direct experimental characterization of expressed regions of individual genomes. We have utilized a comparative genomics approach involving the sequencing of randomly selected hamster testis cDNAs to begin to identify genes not previously annotated on the human, mouse, rat and Fugu (pufferfish) genomes. Results: 735 distinct sequences were analyzed for their relatedness to known sequences in public databases. Eight of these sequences were derived from previously unidentified genes and expression of these genes in testis was confirmed by Northern blotting. The genomic locations of each sequence were mapped in human, mouse, rat and pufferfish, where applicable, and the structure of their cognate genes was derived using computer-based predictions, genomic comparisons and analysis of uncharacterized cDNA sequences from human and macaque. Conclusion: The use of a comparative genomics approach resulted in the identification of eight cDNAs that correspond to previously uncharacterized genes in the human genome. The proteins encoded by these genes included a new member of the kinesin superfamily, a SET/MYND-domain protein, and six proteins for which no specific function could be predicted. Each gene was expressed primarily in testis, suggesting that they may play roles in the development and/or function of testicular cells. PMID:12783626

  14. Complementary DNA cloning and molecular evolution of opine dehydrogenases in some marine invertebrates.

    PubMed

    Kimura, Tomohiro; Nakano, Toshiki; Yamaguchi, Toshiyasu; Sato, Minoru; Ogawa, Tomohisa; Muramoto, Koji; Yokoyama, Takehiko; Kan-No, Nobuhiro; Nagahisa, Eizou; Janssen, Frank; Grieshaber, Manfred K

    2004-01-01

    The complete complementary DNA sequences of genes presumably coding for opine dehydrogenases from Arabella iricolor (sandworm), Haliotis discus hannai (abalone), and Patinopecten yessoensis (scallop) were determined, and partial cDNA sequences were derived for Meretrix lusoria (Japanese hard clam) and Spisula sachalinensis (Sakhalin surf clam). The primers ODH-9F and ODH-11R proved useful for amplifying the sequences for opine dehydrogenases from the 4 mollusk species investigated in this study. The sequence of the sandworm was obtained using primers constructed from the amino acid sequence of tauropine dehydrogenase, the main opine dehydrogenase in A. iricolor. The complete cDNA sequences of A. iricolor, H. discus hannai, and P. yessoensis encode 397, 400, and 405 amino acids, respectively. All sequences were aligned and compared with published databank sequences of Loligo opalescens, Loligo vulgaris (squid), Sepia officinalis (cuttlefish), and Pecten maximus (scallop). As expected, a high level of homology was observed for the cDNA from closely related species, such as for cephalopods or scallops, whereas cDNA from the other species showed lower-level homologies. A similar trend was observed when the deduced amino acid sequences were compared. Furthermore, alignment of these sequences revealed some structural motifs that are possibly related to the binding sites of the substrates. The phylogenetic trees derived from the nucleotide and amino acid sequences were consistent with the classification of species resulting from classical taxonomic analyses.

  15. From genomics to functional markers in the era of next-generation sequencing.

    PubMed

    Salgotra, R K; Gupta, B B; Stewart, C N

    2014-03-01

    The availability of complete genome sequences, along with other genomic resources for Arabidopsis, rice, pigeon pea, soybean and other crops, has revolutionized our understanding of the genetic make-up of plants. Next-generation DNA sequencing (NGS) has facilitated single nucleotide polymorphism discovery in plants. Functionally-characterized sequences can be identified and functional markers (FMs) for important traits can be developed with ever-increasing ease. FMs are derived from sequence polymorphisms found in allelic variants of a functional gene. Linkage disequilibrium-based association mapping and homologous recombinants have been developed for identification of "perfect" markers for their use in crop improvement practices. Compared with many other molecular markers, FMs derived from functionally characterized gene sequences using NGS techniques provide opportunities to develop, at a fast pace, high-yielding plant genotypes resistant to various stresses.

  16. In silico site-directed mutagenesis informs species-specific predictions of chemical susceptibility derived from the Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool

    EPA Science Inventory

    The Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool was developed to address needs for rapid, cost effective methods of species extrapolation of chemical susceptibility. Specifically, the SeqAPASS tool compares the primary sequence (Level 1), functiona...

  17. Improving de novo sequence assembly using machine learning and comparative genomics for overlap correction.

    PubMed

    Palmer, Lance E; Dejori, Mathaeus; Bolanos, Randall; Fasulo, Daniel

    2010-01-15

    With the rapid expansion of DNA sequencing databases, it is now feasible to identify relevant information from prior sequencing projects and completed genomes and apply it to de novo sequencing of new organisms. As an example, this paper demonstrates how such extra information can be used to improve de novo assemblies by augmenting the overlapping step. Finding all pairs of overlapping reads is a key task in many genome assemblers, and to this end, highly efficient algorithms have been developed to find alignments in large collections of sequences. It is well known that due to repeated sequences, many aligned pairs of reads nevertheless do not overlap. But no overlapping algorithm to date takes a rigorous approach to separating aligned but non-overlapping read pairs from true overlaps. We present an approach that extends the Minimus assembler by a data driven step to classify overlaps as true or false prior to contig construction. We trained several different classification models within the Weka framework using various statistics derived from overlaps of reads available from prior sequencing projects. These statistics included percent mismatch and k-mer frequencies within the overlaps as well as a comparative genomics score derived from mapping reads to multiple reference genomes. We show that in real whole-genome sequencing data from the E. coli and S. aureus genomes, by providing a curated set of overlaps to the contigging phase of the assembler, we nearly doubled the median contig length (N50) without sacrificing coverage of the genome or increasing the number of mis-assemblies. Machine learning methods that use comparative and non-comparative features to classify overlaps as true or false can be used to improve the quality of a sequence assembly.
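
    The N50 statistic cited above can be computed with a few lines of code. The sketch below uses the definition common in assembly literature: the length L such that contigs of length at least L together contain at least half of the total assembled bases. Under that definition N50 is a length-weighted statistic rather than a plain median of contig lengths. The function name `n50` and the return value of 0 for an empty input are assumptions of this example.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0
```

    For contig lengths 2 through 10 (54 bases in total), the three longest contigs (10, 9, 8) already cover 27 bases, so the N50 is 8 while the plain median length is 6.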

  18. Evaluating the protein coding potential of exonized transposable element sequences

    PubMed Central

    Piriyapongsa, Jittima; Rutledge, Mark T; Patel, Sanil; Borodovsky, Mark; Jordan, I King

    2007-01-01

    Background: Transposable element (TE) sequences, once thought to be merely selfish or parasitic members of the genomic community, have been shown to contribute a wide variety of functional sequences to their host genomes. Analysis of complete genome sequences has turned up numerous cases where TE sequences have been incorporated as exons into mRNAs, and it is widely assumed that such 'exonized' TEs encode protein sequences. However, the extent to which TE-derived sequences actually encode proteins is unknown and a matter of some controversy. We have tried to address this outstanding issue from two perspectives: (i) by evaluating ascertainment biases related to the search methods used to uncover TE-derived protein coding sequences (CDS) and (ii) through a probabilistic codon-frequency based analysis of the protein coding potential of TE-derived exons. Results: We compared the ability of three classes of sequence similarity search methods to detect TE-derived sequences among data sets of experimentally characterized proteins: (1) a profile-based hidden Markov model (HMM) approach, (2) BLAST methods and (3) RepeatMasker. Profile based methods are more sensitive and more selective than the other methods evaluated. However, the application of profile-based search methods to the detection of TE-derived sequences among well-curated experimentally characterized protein data sets did not turn up many more cases than had been previously detected and nowhere near as many cases as recent genome-wide searches have. We observed that the different search methods used were complementary in the sense that they yielded largely non-overlapping sets of hits and differed in their ability to recover known cases of TE-derived CDS. The probabilistic analysis of TE-derived exon sequences indicates that these sequences have low protein coding potential on average. In particular, non-autonomous TEs that do not encode protein sequences, such as Alu elements, are frequently exonized but unlikely to encode protein sequences. Conclusion: The exaptation of the numerous TE sequences found in exons as bona fide protein coding sequences may prove to be far less common than has been suggested by the analysis of complete genomes. We hypothesize that many exonized TE sequences actually function as post-transcriptional regulators of gene expression, rather than coding sequences, which may act through a variety of double stranded RNA related regulatory pathways. Indeed, their relatively high copy numbers and similarity to sequences dispersed throughout the genome suggest that exonized TE sequences could serve as master regulators with a wide scope of regulatory influence. Reviewers: This article was reviewed by Itai Yanai, Kateryna D. Makova, Melissa Wilson (nominated by Kateryna D. Makova) and Cedric Feschotte (nominated by John M. Logsdon Jr.). PMID:18036258

  19. Genetic diversity in Trypanosoma theileri from Sri Lankan cattle and water buffaloes.

    PubMed

    Yokoyama, Naoaki; Sivakumar, Thillaiampalam; Fukushi, Shintaro; Tattiyapong, Muncharee; Tuvshintulga, Bumduuren; Kothalawala, Hemal; Silva, Seekkuge Susil Priyantha; Igarashi, Ikuo; Inoue, Noboru

    2015-01-30

    Trypanosoma theileri is a hemoprotozoan parasite that infects various ruminant species. We investigated the epidemiology of this parasite among cattle and water buffalo populations bred in Sri Lanka, using a diagnostic PCR assay based on the cathepsin L-like protein (CATL) gene. Blood DNA samples sourced from cattle (n=316) and water buffaloes (n=320) bred in different geographical areas of Sri Lanka were PCR screened for T. theileri. Parasite DNA was detected in cattle and water buffaloes alike in all the sampling locations. The overall T. theileri-positive rate was higher in water buffaloes (15.9%) than in cattle (7.6%). Subsequently, PCR amplicons were sequenced and the partial CATL sequences were phylogenetically analyzed. The identity values for the CATL gene were 89.6-99.7% among the cattle-derived sequences, compared with values of 90.7-100% for the buffalo-derived sequences. However, the cattle-derived sequences shared 88.2-100% identity values with those from buffaloes. In the phylogenetic tree, the Sri Lankan CATL gene sequences fell into two major clades (TthI and TthII), both of which contain CATL sequences from several other countries. Although most of the CATL sequences from Sri Lankan cattle and buffaloes clustered independently, two buffalo-derived sequences were observed to be closely related to those of the Sri Lankan cattle. Furthermore, a Sri Lankan buffalo sequence clustered with CATL gene sequences from Brazilian buffalo and Thai cattle. In addition to reporting the first PCR-based survey of T. theileri among Sri Lankan-bred cattle and water buffaloes, the present study found that some of the CATL gene fragments sourced from water buffaloes shared similarity with those determined from cattle in this country. Copyright © 2014 Elsevier B.V. All rights reserved.

  20. Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences]

    NASA Technical Reports Server (NTRS)

    Gatlin, L. L.

    1974-01-01

    Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.

  1. Diversity of the P2 protein among nontypeable Haemophilus influenzae isolates.

    PubMed Central

    Bell, J; Grass, S; Jeanteur, D; Munson, R S

    1994-01-01

    The genes for outer membrane protein P2 of four nontypeable Haemophilus influenzae strains were cloned and sequenced. The derived amino acid sequences were compared with the outer membrane protein P2 sequence from H. influenzae type b MinnA and the sequences of P2 from three additional nontypeable H. influenzae strains. The sequences were 76 to 94% identical. The sequences had regions with considerable variability separated by regions which were highly conserved. The variable regions mapped to putative surface-exposed loops of the protein. PMID:8188390

  2. Anti-infective activity of apolipoprotein domain derived peptides in vitro: identification of novel antimicrobial peptides related to apolipoprotein B with anti-HIV activity

    PubMed Central

    2010-01-01

    Background Previous reports have shown that peptides derived from the apolipoprotein E receptor binding region and the amphipathic α-helical domains of apolipoprotein AI have broad anti-infective activity and antiviral activity respectively. Lipoproteins and viruses share a similar cell biological niche, being of overlapping size and displaying similar interactions with mammalian cells and receptors, which may have led to other antiviral sequences arising within apolipoproteins, in addition to those previously reported. We therefore designed a series of peptides based around either apolipoprotein receptor binding regions, or amphipathic α-helical domains, and tested these for antiviral and antibacterial activity. Results Of the nineteen new peptides tested, seven showed some anti-infective activity, with two of these being derived from two apolipoproteins not previously used to derive anti-infective sequences. Apolipoprotein J (151-170) - based on a predicted amphipathic alpha-helical domain from apolipoprotein J - had measurable anti-HSV1 activity, as did apolipoprotein B (3359-3367) dp (apoBdp), the latter being derived from the LDL receptor binding domain B of apolipoprotein B. The more active peptide - apoBdp - showed similarity to the previously reported apoE derived anti-infective peptide, and further modification of the apoBdp sequence to align the charge distribution more closely to that of apoEdp or to introduce aromatic residues resulted in increased breadth and potency of activity. The most active peptide of this type showed similar potent anti-HIV activity, comparable to that we previously reported for the apoE derived peptide apoEdpL-W. Conclusions These data suggest that further antimicrobial peptides may be obtained using human apolipoprotein sequences, selecting regions with either amphipathic α-helical structure, or those linked to receptor-binding regions. 
The finding that an amphipathic α-helical region of apolipoprotein J has antiviral activity comparable with that for the previously reported apolipoprotein AI derived peptide 18A, suggests that full-length apolipoprotein J may also have such activity, as has been reported for full-length apolipoprotein AI. Although the strength of the anti-infective activity of the sequences identified was limited, this could be increased substantially by developing related mutant peptides. Indeed the apolipoprotein B-derived peptide mutants uncovered by the present study may have utility as HIV therapeutics or microbicides. PMID:20298574

  3. Cardiomyocytes In Vitro Adhesion Is Actively Influenced by Biomimetic Synthetic Peptides for Cardiac Tissue Engineering

    PubMed Central

    Huerta-Cantillo, Rocio; Comisso, Marina; Danesin, Roberta; Ghezzo, Francesca; Naso, Filippo; Gastaldello, Alessandra; Schittullo, Eleonora; Buratto, Edward; Spina, Michele; Gerosa, Gino; Dettin, Monica

    2012-01-01

    Scaffolds for tissue engineering must be designed to direct desired events such as cell attachment, growth, and differentiation. The incorporation of extracellular matrix-derived peptides into biomaterials has been proposed to mimic biochemical signals. In this study, three synthetic fragments of fibronectin, vitronectin, and stromal-derived factor-1 were investigated for the first time as potential adhesive sequences for cardiomyocytes (CMs) compared to smooth muscle cells. CMs are responsive to all peptides to differing degrees, demonstrating the existence of diverse adhesion mechanisms. The pretreatment of nontissue culture well surfaces with the (Arginine-Glycine-Aspartic Acid) RGD sequence anticipated the appearance of CMs' contractility compared to the control (fibronectin-coated well) and doubled the length of cell viability. Future prospects are the inclusion of these sequences into biomaterial formulation with the improvement in cell adhesion that could play an important role in cell retention during dynamic cell seeding. PMID:22011064

  4. Phylogenetic Analysis of Shewanella Strains by DNA Relatedness Derived from Whole Genome Microarray DNA-DNA Hybridization and Comparison with Other Methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Liyou; Yi, T. Y.; Van Nostrand, Joy

    Phylogenetic analyses were done for the Shewanella strains isolated from the Baltic Sea (38 strains), US DOE Hanford Uranium bioremediation site [Hanford Reach of the Columbia River (HRCR), 11 strains], Pacific Ocean and Hawaiian sediments (8 strains), and strains from other resources (16 strains) with three outgroup strains, Rhodopseudomonas palustris, Clostridium cellulolyticum, and Thermoanaerobacter ethanolicus X514, using DNA relatedness derived from WCGA-based DNA-DNA hybridizations, sequence similarities of the 16S rRNA gene and gyrB gene, and sequence similarities of 6 loci of the Shewanella genome selected from a shared gene list of the Shewanella strains with whole genomes sequenced, based on their average nucleotide identity (ANI). The phylogenetic trees based on 16S rRNA and gyrB gene sequences, and DNA relatedness derived from WCGA hybridizations of the tested Shewanella strains share exactly the same sub-clusters with very few exceptions, in which the strains were basically grouped by species. However, the phylogenetic analysis based on DNA relatedness derived from WCGA hybridizations dramatically increased the differentiation resolution at the species and strain levels within the genus Shewanella. When the tree based on DNA relatedness derived from WCGA hybridizations was compared to the tree based on the combined sequences of the selected functional genes (6 loci), we found that the resolutions of both methods are similar, but the clustering of the tree based on DNA relatedness derived from WCGA hybridizations was clearer. These results indicate that WCGA-based DNA-DNA hybridization is an ideal alternative to conventional DNA-DNA hybridization methods and it is superior to the phylogenetic methods based on sequence similarities of single genes. Detailed analysis is being performed for the re-classification of the strains examined.

  5. A Contextualized, Differential Sequence Mining Method to Derive Students' Learning Behavior Patterns

    ERIC Educational Resources Information Center

    Kinnebrew, John S.; Loretz, Kirk M.; Biswas, Gautam

    2013-01-01

    Computer-based learning environments can produce a wealth of data on student learning interactions. This paper presents an exploratory data mining methodology for assessing and comparing students' learning behaviors from these interaction traces. The core algorithm employs a novel combination of sequence mining techniques to identify differentially…

  6. Sputnik: a database platform for comparative plant genomics.

    PubMed

    Rudd, Stephen; Mewes, Hans-Werner; Mayer, Klaus F X

    2003-01-01

    Two million plant ESTs, from 20 different plant species, and totalling more than 1000 Mbp of DNA sequence, represent a formidable transcriptomic resource. Sputnik uses the potential of this sequence resource to fill some of the information gap in the un-sequenced plant genomes and to serve as the foundation for in silico comparative plant genomics. The complexity of the individual EST collections has been reduced using optimised EST clustering techniques. Annotation of cluster sequences is performed by exploiting and transferring information from the comprehensive knowledgebase already produced for the completed model plant genome (Arabidopsis thaliana) and by performing additional state-of-the-art sequence analyses relevant to today's plant biologist. Functional predictions, comparative analyses and associative annotations for 500 000 plant EST derived peptides make Sputnik (http://mips.gsf.de/proj/sputnik/) a valid platform for contemporary plant genomics.

  7. Sputnik: a database platform for comparative plant genomics

    PubMed Central

    Rudd, Stephen; Mewes, Hans-Werner; Mayer, Klaus F.X.

    2003-01-01

    Two million plant ESTs, from 20 different plant species, and totalling more than 1000 Mbp of DNA sequence, represent a formidable transcriptomic resource. Sputnik uses the potential of this sequence resource to fill some of the information gap in the un-sequenced plant genomes and to serve as the foundation for in silico comparative plant genomics. The complexity of the individual EST collections has been reduced using optimised EST clustering techniques. Annotation of cluster sequences is performed by exploiting and transferring information from the comprehensive knowledgebase already produced for the completed model plant genome (Arabidopsis thaliana) and by performing additional state-of-the-art sequence analyses relevant to today's plant biologist. Functional predictions, comparative analyses and associative annotations for 500 000 plant EST derived peptides make Sputnik (http://mips.gsf.de/proj/sputnik/) a valid platform for contemporary plant genomics. PMID:12519965

  8. Thermodynamics-based models of transcriptional regulation with gene sequence.

    PubMed

    Wang, Shuqiang; Shen, Yanyan; Hu, Jinxing

    2015-12-01

    Quantitative models of gene regulatory activity have the potential to improve our mechanistic understanding of transcriptional regulation. However, the few models available today have been based on simplistic assumptions about the sequences being modeled or heuristic approximations of the underlying regulatory mechanisms. In this work, we have developed a thermodynamics-based model to predict gene expression driven by any DNA sequence. The proposed model relies on a continuous time, differential equation description of transcriptional dynamics. The sequence features of the promoter are used to derive the binding affinity on the basis of statistical molecular thermodynamics. Experimental results show that the proposed model can effectively identify the activity levels of transcription factors and the regulatory parameters. Compared with previous models, the proposed model yields results that make more biological sense.

  9. Human lactoferricin derived di-peptides deploying loop structures induce apoptosis specifically in cancer cells through targeting membranous phosphatidylserine.

    PubMed

    Riedl, Sabrina; Leber, Regina; Rinner, Beate; Schaider, Helmut; Lohner, Karl; Zweytick, Dagmar

    2015-11-01

    Host defense-derived peptides have emerged as a novel strategy for the development of alternative anticancer therapies. In this study we report on characteristic features of human lactoferricin (hLFcin) derivatives which facilitate specific killing of cancer cells of melanoma, glioblastoma and rhabdomyosarcoma compared with non-specific derivatives and the synthetic peptide RW-AH. Changes in the amino acid sequence of hLFcin providing derivatives of 9-11 amino acids (LF11-316, -318 and -322) yielded only low antitumor activity. However, the addition of the repeat (di-peptide) and the retro-repeat (di-retro-peptide) sequences highly improved cancer cell toxicity up to 100% at 20 μM peptide concentration. Compared to the complete parent sequence hLFcin, the derivatives showed toxicity on the melanoma cell line A375 increased by 10-fold and on the glioblastoma cell line U-87mg by 2-3-fold. Reduced killing velocity, apoptotic blebbing, activation of caspase 3/7 and formation of apoptotic DNA fragments proved that the active and cancer selective peptides, e.g. R-DIM-P-LF11-322, trigger apoptosis, whereas highly active, though non-selective peptides, such as DIM-LF11-318 and RW-AH, seem to kill rapidly via necrosis-inducing membrane lysis. Structural studies revealed specific toxicity on cancer cells by peptide derivatives with loop structures, whereas non-specific peptides comprised α-helical structures without a loop. Model studies with the cancer membrane mimic phosphatidylserine (PS) gave strong evidence that PS only exposed by cancer cells is an important target for specific hLFcin derivatives. Other negatively charged membrane-exposed molecules such as sialic acid, heparan and chondroitin sulfate were shown to have minor impact on peptide activity. Copyright © 2015. Published by Elsevier B.V.

  10. Comparative Genomics Analyses Reveal Extensive Chromosome Colinearity and Novel Quantitative Trait Loci in Eucalyptus.

    PubMed

    Li, Fagen; Zhou, Changpin; Weng, Qijie; Li, Mei; Yu, Xiaoli; Guo, Yong; Wang, Yu; Zhang, Xiaohong; Gan, Siming

    2015-01-01

    Dense genetic maps, along with quantitative trait loci (QTLs) detected on such maps, are powerful tools for genomics and molecular breeding studies. In the important woody genus Eucalyptus, the recent release of E. grandis genome sequence allows for sequence-based genomic comparison and searching for positional candidate genes within QTL regions. Here, dense genetic maps were constructed for E. urophylla and E. tereticornis using genomic simple sequence repeats (SSR), expressed sequence tag (EST) derived SSR, EST-derived cleaved amplified polymorphic sequence (EST-CAPS), and diversity arrays technology (DArT) markers. The E. urophylla and E. tereticornis maps comprised 700 and 585 markers across 11 linkage groups, totaling 1,208.2 and 1,241.4 cM in length, respectively. Extensive synteny and colinearity were observed as compared to three earlier DArT-based eucalypt maps (two maps with E. grandis × E. urophylla and one map of E. globulus) and with the E. grandis genome sequence. Fifty-three QTLs for growth (10-56 months of age) and wood density (56 months) were identified in 22 discrete regions on both maps, in which only one colocalization was found between growth and wood density. Novel QTLs were revealed as compared with those previously detected on DArT-based maps for similar ages in Eucalyptus. Eleven to 585 positional candidate genes were obtained for a 56-month-old QTL through aligning QTL confidence interval with the E. grandis genome. These results will assist in comparative genomics studies, targeted gene characterization, and marker-assisted selection in Eucalyptus and the related taxa.

  11. Comparative Genomics Analyses Reveal Extensive Chromosome Colinearity and Novel Quantitative Trait Loci in Eucalyptus

    PubMed Central

    Weng, Qijie; Li, Mei; Yu, Xiaoli; Guo, Yong; Wang, Yu; Zhang, Xiaohong; Gan, Siming

    2015-01-01

    Dense genetic maps, along with quantitative trait loci (QTLs) detected on such maps, are powerful tools for genomics and molecular breeding studies. In the important woody genus Eucalyptus, the recent release of E. grandis genome sequence allows for sequence-based genomic comparison and searching for positional candidate genes within QTL regions. Here, dense genetic maps were constructed for E. urophylla and E. tereticornis using genomic simple sequence repeats (SSR), expressed sequence tag (EST) derived SSR, EST-derived cleaved amplified polymorphic sequence (EST-CAPS), and diversity arrays technology (DArT) markers. The E. urophylla and E. tereticornis maps comprised 700 and 585 markers across 11 linkage groups, totaling 1,208.2 and 1,241.4 cM in length, respectively. Extensive synteny and colinearity were observed as compared to three earlier DArT-based eucalypt maps (two maps with E. grandis × E. urophylla and one map of E. globulus) and with the E. grandis genome sequence. Fifty-three QTLs for growth (10–56 months of age) and wood density (56 months) were identified in 22 discrete regions on both maps, in which only one colocalization was found between growth and wood density. Novel QTLs were revealed as compared with those previously detected on DArT-based maps for similar ages in Eucalyptus. Eleven to 585 positional candidate genes were obtained for a 56-month-old QTL through aligning QTL confidence interval with the E. grandis genome. These results will assist in comparative genomics studies, targeted gene characterization, and marker-assisted selection in Eucalyptus and the related taxa. PMID:26695430

  12. Optimization of parameter values for complex pulse sequences by simulated annealing: application to 3D MP-RAGE imaging of the brain.

    PubMed

    Epstein, F H; Mugler, J P; Brookeman, J R

    1994-02-01

    A number of pulse sequence techniques, including magnetization-prepared gradient echo (MP-GRE), segmented GRE, and hybrid RARE, employ a relatively large number of variable pulse sequence parameters and acquire the image data during a transient signal evolution. These sequences have recently been proposed and/or used for clinical applications in the brain, spine, liver, and coronary arteries. Thus, the need for a method of deriving optimal pulse sequence parameter values for this class of sequences now exists. Due to the complexity of these sequences, conventional optimization approaches, such as applying differential calculus to signal difference equations, are inadequate. We have developed a general framework for adapting the simulated annealing algorithm to pulse sequence parameter value optimization, and applied this framework to the specific case of optimizing the white matter-gray matter signal difference for a T1-weighted variable flip angle 3D MP-RAGE sequence. Using our algorithm, the values of 35 sequence parameters, including the magnetization-preparation RF pulse flip angle and delay time, 32 flip angles in the variable flip angle gradient-echo acquisition sequence, and the magnetization recovery time, were derived. Optimized 3D MP-RAGE achieved up to a 130% increase in white matter-gray matter signal difference compared with optimized 3D RF-spoiled FLASH with the same total acquisition time. The simulated annealing approach was effective at deriving optimal parameter values for a specific 3D MP-RAGE imaging objective, and may be useful for other imaging objectives and sequences in this general class.
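
    The general idea of simulated annealing for parameter-value optimization can be sketched as below. This is a generic illustration rather than the authors' framework: the cost function is a toy stand-in (a real use would return, for example, the negative of a simulated white matter-gray matter signal difference), and the step size, cooling schedule and all names are assumptions.

```python
import math
import random


def simulated_annealing(cost, start, step=0.5, t0=1.0, cooling=0.995,
                        iters=5000, seed=0):
    """Minimise cost(params) by perturbing one parameter at a time."""
    rng = random.Random(seed)
    current = list(start)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    temperature = t0
    for _ in range(iters):
        candidate = list(current)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.uniform(-step, step)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature falls.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        temperature *= cooling  # geometric cooling (assumed schedule)
    return best, best_cost


def toy_cost(params):
    """Hypothetical objective with its minimum at all parameters equal to 1."""
    return sum((p - 1.0) ** 2 for p in params)


best, best_cost = simulated_annealing(toy_cost, [5.0, -3.0])
print(best, best_cost)
```

    Accepting occasional uphill moves is what lets the search leave local optima, which is the property that makes the approach attractive when the objective is too complex for calculus-based optimization.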

  13. Sockeye: A 3D Environment for Comparative Genomics

    PubMed Central

    Montgomery, Stephen B.; Astakhova, Tamara; Bilenky, Mikhail; Birney, Ewan; Fu, Tony; Hassel, Maik; Melsopp, Craig; Rak, Marcin; Robertson, A. Gordon; Sleumer, Monica; Siddiqui, Asim S.; Jones, Steven J.M.

    2004-01-01

    Comparative genomics techniques are used in bioinformatics analyses to identify the structural and functional properties of DNA sequences. As the amount of available sequence data steadily increases, the ability to perform large-scale comparative analyses has become increasingly relevant. In addition, the growing complexity of genomic feature annotation means that new approaches to genomic visualization need to be explored. We have developed a Java-based application called Sockeye that uses three-dimensional (3D) graphics technology to facilitate the visualization of annotation and conservation across multiple sequences. This software uses the Ensembl database project to import sequence and annotation information from several eukaryotic species. A user can additionally import their own custom sequence and annotation data. Individual annotation objects are displayed in Sockeye by using custom 3D models. Ensembl-derived and imported sequences can be analyzed by using a suite of multiple and pair-wise alignment algorithms. The results of these comparative analyses are also displayed in the 3D environment of Sockeye. By using the Java3D API to visualize genomic data in a 3D environment, we are able to compactly display cross-sequence comparisons. This provides the user with a novel platform for visualizing and comparing genomic feature organization. PMID:15123592

  14. Draft Genome Sequence of Campylobacter jejuni 11168H

    PubMed Central

    Macdonald, Sarah E.; Gundogdu, Ozan; Dorrell, Nick; Wren, Brendan W.; Blake, Damer

    2017-01-01

    ABSTRACT Campylobacter jejuni is the most prevalent cause of food-borne gastroenteritis in the developed world. The reference and original sequenced strain C. jejuni NCTC11168 has low levels of motility compared to clinical isolates. Here, we describe the draft genome of the laboratory derived hypermotile variant named 11168H. PMID:28153902

  15. Insights from Human/Mouse genome comparisons

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pennacchio, Len A.

    2003-03-30

    Large-scale public genomic sequencing efforts have provided a wealth of vertebrate sequence data poised to provide insights into mammalian biology. These include deep genomic sequence coverage of human, mouse, rat, zebrafish, and two pufferfish (Fugu rubripes and Tetraodon nigroviridis) (Aparicio et al. 2002; Lander et al. 2001; Venter et al. 2001; Waterston et al. 2002). In addition, a high-priority has been placed on determining the genomic sequence of chimpanzee, dog, cow, frog, and chicken (Boguski 2002). While only recently available, whole genome sequence data have provided the unique opportunity to globally compare complete genome contents. Furthermore, the shared evolutionary ancestry of vertebrate species has allowed the development of comparative genomic approaches to identify ancient conserved sequences with functionality. Accordingly, this review focuses on the initial comparison of available mammalian genomes and describes various insights derived from such analysis.

  16. Comparative analyses of putative toxin gene homologs from an Old World viper, Daboia russelii

    PubMed Central

    Krishnan, Neeraja M.

    2017-01-01

    Availability of snake genome sequences has opened up exciting areas of research on comparative genomics and gene diversity. One of the challenges in studying snake genomes is the acquisition of biological material from live animals, especially from the venomous ones, making the process cumbersome and time-consuming. Here, we report comparative sequence analyses of putative toxin gene homologs from Russell’s viper (Daboia russelii) using whole-genome sequencing data obtained from shed skin. When compared with the major venom proteins in Russell’s viper studied previously, we found 45–100% sequence similarity between the venom proteins and their putative homologs in the skin. Additionally, comparative analyses of 20 putative toxin gene family homologs provided evidence of unique sequence motifs in nerve growth factor (NGF), platelet derived growth factor (PDGF), Kunitz/Bovine pancreatic trypsin inhibitor (Kunitz BPTI), cysteine-rich secretory proteins, antigen 5, and pathogenesis-related 1 proteins (CAP) and cysteine-rich secretory protein (CRISP). In those derived proteins, we identified V11 and T35 in the NGF domain; F23 and A29 in the PDGF domain; N69, K2 and A5 in the CAP domain; and Q17 in the CRISP domain to be responsible for differences in the largest pockets across the protein domain structures in crotalines, viperines and elapids from the in silico structure-based analysis. Similarly, residues F10, Y11 and E20 appear to play an important role in the protein structures across the kunitz protein domain of viperids and elapids. Our study highlights the usefulness of shed skin in obtaining good quality high-molecular weight DNA for comparative genomic studies, and provides evidence towards the unique features and evolution of putative venom gene homologs in vipers. PMID:29230357

  17. Molecular Diagnosis of Orthopedic-Device-Related Infection Directly from Sonication Fluid by Metagenomic Sequencing

    PubMed Central

    Sanderson, Nicholas D.; Atkins, Bridget L.; Brent, Andrew J.; Cole, Kevin; Foster, Dona; McNally, Martin A.; Oakley, Sarah; Peto, Leon; Taylor, Adrian; Peto, Tim E. A.; Crook, Derrick W.; Eyre, David W.

    2017-01-01

    ABSTRACT Culture of multiple periprosthetic tissue samples is the current gold standard for microbiological diagnosis of prosthetic joint infections (PJI). Additional diagnostic information may be obtained through culture of sonication fluid from explants. However, current techniques can have relatively low sensitivity, with prior antimicrobial therapy and infection by fastidious organisms influencing results. We assessed if metagenomic sequencing of total DNA extracts obtained direct from sonication fluid can provide an alternative rapid and sensitive tool for diagnosis of PJI. We compared metagenomic sequencing with standard aerobic and anaerobic culture in 97 sonication fluid samples from prosthetic joint and other orthopedic device infections. Reads from Illumina MiSeq sequencing were taxonomically classified using Kraken. Using 50 derivation samples, we determined optimal thresholds for the number and proportion of bacterial reads required to identify an infection and confirmed our findings in 47 independent validation samples. Compared to results from sonication fluid culture, the species-level sensitivity of metagenomic sequencing was 61/69 (88%; 95% confidence interval [CI], 77 to 94%; for derivation samples 35/38 [92%; 95% CI, 79 to 98%]; for validation samples, 26/31 [84%; 95% CI, 66 to 95%]), and genus-level sensitivity was 64/69 (93%; 95% CI, 84 to 98%). Species-level specificity, adjusting for plausible fastidious causes of infection, species found in concurrently obtained tissue samples, and prior antibiotics, was 85/97 (88%; 95% CI, 79 to 93%; for derivation samples, 43/50 [86%; 95% CI, 73 to 94%]; for validation samples, 42/47 [89%; 95% CI, 77 to 96%]). High levels of human DNA contamination were seen despite the use of laboratory methods to remove it. Rigorous laboratory good practice was required to minimize bacterial DNA contamination. We demonstrate that metagenomic sequencing can provide accurate diagnostic information in PJI. 
Our findings, combined with the increasing availability of portable, random-access sequencing technology, offer the potential to translate metagenomic sequencing into a rapid diagnostic tool in PJI. PMID:28490492
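
    The sensitivity and specificity figures in this record are proportions with 95% confidence intervals. The sketch below shows one way such an interval can be computed, using the Wilson score method. The record does not say which interval method the authors used, so the bounds produced here may differ slightly from the published ones; the function name is illustrative.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half_width = z * math.sqrt(
        p * (1.0 - p) / total + z * z / (4.0 * total * total)
    ) / denom
    return centre - half_width, centre + half_width


# Species-level sensitivity counts quoted in the abstract: 61 of 69.
sensitivity = 61 / 69
low, high = wilson_interval(61, 69)
print(f"sensitivity = {sensitivity:.1%}, Wilson 95% CI = {low:.1%} to {high:.1%}")
```

    The same function applies unchanged to the specificity counts (85 of 97) and to the derivation and validation subsets.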

  18. Whole-genome sequence of Escherichia coli serotype O157:H7 strain EDL932 (ATCC 43894)

    USDA-ARS's Scientific Manuscript database

    Escherichia coli serotype O157:H7 EDL 933 is a ground beef isolate associated with a 1983 hemorrhagic colitis outbreak. Considered the prototype O157:H7 strain, its derived genome sequence is a standard reference strain for comparative genomic studies of Shiga toxin-producing E. coli (STEC). Here we...

  19. A new molecular evolution model for limited insertion independent of substitution.

    PubMed

    Lèbre, Sophie; Michel, Christian J

    2013-10-01

    We recently introduced a new molecular evolution model called the IDIS model for Insertion Deletion Independent of Substitution [13,14]. In the IDIS model, the three independent processes of substitution, insertion and deletion of residues have constant rates. In order to control the genome expansion during evolution, we generalize here the IDIS model by introducing an insertion rate which decreases when the sequence grows and tends to 0 for a maximum sequence length nmax. This new model, called LIIS for Limited Insertion Independent of Substitution, defines a matrix differential equation satisfied by a vector P(t) describing the sequence content in each residue at evolution time t. An analytical solution is obtained for any diagonalizable substitution matrix M. Thus, the LIIS model gives an expression of the sequence content vector P(t) in each residue under evolution time t as a function of the eigenvalues and the eigenvectors of matrix M, the residue insertion rate vector R, the total insertion rate r, the initial and maximum sequence lengths n0 and nmax, respectively, and the sequence content vector P(t0) at initial time t0. The derivation of the analytical solution is much more technical, compared to the IDIS model, as it involves Gauss hypergeometric functions. Several propositions of the LIIS model are derived: proof that the IDIS model is a particular case of the LIIS model when the maximum sequence length nmax tends to infinity, fixed point, time scale, time step and time inversion. Using a relation between the sequence length l and the evolution time t, an expression of the LIIS model as a function of the sequence length l=n(t) is obtained. Formulas for 'insertion only', i.e. when the substitution rates are all equal to 0, are derived at evolution time t and sequence length l. 
Analytical solutions of the LIIS model are explicitly derived, as a function of either evolution time t or sequence length l, for two classical substitution matrices: the 3-parameter symmetric substitution matrix [12] (LIIS-SYM3) and the HKY asymmetric substitution matrix[9] (LIIS-HKY). An evaluation of the LIIS model (precisely, LIIS-HKY) based on four statistical analyses of the GC content in complete genomes of four prokaryotic taxonomic groups, namely Chlamydiae, Crenarchaeota, Spirochaetes and Thermotogae, shows the expected improvement from the theory of the LIIS model compared to the IDIS model. Copyright © 2013 Elsevier Inc. All rights reserved.

  20. A vertebrate case study of the quality of assemblies derived from next-generation sequences

    PubMed Central

    2011-01-01

    The unparalleled efficiency of next-generation sequencing (NGS) has prompted widespread adoption, but significant problems remain in the use of NGS data for whole genome assembly. We explore the advantages and disadvantages of chicken genome assemblies generated using a variety of sequencing and assembly methodologies. NGS assemblies are equivalent in some ways to a Sanger-based assembly yet deficient in others. Nonetheless, these assemblies are sufficient for the identification of the majority of genes and can reveal novel sequences when compared to existing assembly references. PMID:21453517

  1. RNA expression in a cartilaginous fish cell line reveals ancient 3′ noncoding regions highly conserved in vertebrates

    PubMed Central

    Forest, David; Nishikawa, Ryuhei; Kobayashi, Hiroshi; Parton, Angela; Bayne, Christopher J.; Barnes, David W.

    2007-01-01

    We have established a cartilaginous fish cell line [Squalus acanthias embryo cell line (SAE)], a mesenchymal stem cell line derived from the embryo of an elasmobranch, the spiny dogfish shark S. acanthias. Elasmobranchs (sharks and rays) first appeared >400 million years ago, and existing species provide useful models for comparative vertebrate cell biology, physiology, and genomics. Comparative vertebrate genomics among evolutionarily distant organisms can provide sequence conservation information that facilitates identification of critical coding and noncoding regions. Although these genomic analyses are informative, experimental verification of functions of genomic sequences depends heavily on cell culture approaches. Using ESTs defining mRNAs derived from the SAE cell line, we identified lengthy and highly conserved gene-specific nucleotide sequences in the noncoding 3′ UTRs of eight genes involved in the regulation of cell growth and proliferation. Conserved noncoding 3′ mRNA regions detected by using the shark nucleotide sequences as a starting point were found in a range of other vertebrate orders, including bony fish, birds, amphibians, and mammals. Nucleotide identity of shark and human in these regions was remarkably well conserved. Our results indicate that highly conserved gene sequences dating from the appearance of jawed vertebrates and representing potential cis-regulatory elements can be identified through the use of cartilaginous fish as a baseline. Because the expression of genes in the SAE cell line was prerequisite for their identification, this cartilaginous fish culture system also provides a physiologically valid tool to test functional hypotheses on the role of these ancient conserved sequences in comparative cell biology. PMID:17227856

  2. Gramene 2013: Comparative plant genomics resources

    USDA-ARS's Scientific Manuscript database

    Gramene (http://www.gramene.org) is a curated online resource for comparative functional genomics in crops and model plant species, currently hosting 27 fully and 10 partially sequenced reference genomes in its build number 38. Its strength derives from the application of a phylogenetic framework fo...

  3. Extrapolation methods for vector sequences

    NASA Technical Reports Server (NTRS)

    Smith, David A.; Ford, William F.; Sidi, Avram

    1987-01-01

    This paper derives, describes, and compares five extrapolation methods for accelerating convergence of vector sequences or transforming divergent vector sequences to convergent ones. These methods are the scalar epsilon algorithm (SEA), vector epsilon algorithm (VEA), topological epsilon algorithm (TEA), minimal polynomial extrapolation (MPE), and reduced rank extrapolation (RRE). MPE and RRE are first derived and proven to give the exact solution for the right 'essential degree' k. Then, Brezinski's (1975) generalization of the Shanks-Schmidt transform is presented; the generalized form leads from systems of equations to TEA. The necessary connections are then made with SEA and VEA. The algorithms are extended to the nonlinear case by cycling, the error analysis for MPE and VEA is sketched, and the theoretical support for quadratic convergence is discussed. Strategies for practical implementation of the methods are considered.
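
    As an illustration of the idea behind these methods, the scalar Shanks transformation (the building block that the epsilon algorithms generalize) can be sketched as below. This is a minimal sketch under stated assumptions, not the paper's vector algorithms: the function names `shanks` and `leibniz_partial_sums` are hypothetical, and the Leibniz series for pi is only a convenient slowly converging test sequence.

```python
import math


def shanks(seq):
    """Apply one scalar Shanks transformation to a sequence.

    For consecutive terms A[n-1], A[n], A[n+1] the transformed term is
    (A[n+1]*A[n-1] - A[n]**2) / (A[n+1] + A[n-1] - 2*A[n]).
    The result has two fewer terms than the input.
    """
    out = []
    for n in range(1, len(seq) - 1):
        a_prev, a, a_next = seq[n - 1], seq[n], seq[n + 1]
        denom = a_next + a_prev - 2.0 * a
        if denom == 0.0:
            # Second difference vanished; fall back to the newest term.
            out.append(a_next)
        else:
            out.append((a_next * a_prev - a * a) / denom)
    return out


def leibniz_partial_sums(count):
    """Partial sums of 4 * sum((-1)**k / (2k + 1)), which converge slowly to pi."""
    sums, total = [], 0.0
    for k in range(count):
        total += 4.0 * (-1) ** k / (2 * k + 1)
        sums.append(total)
    return sums


if __name__ == "__main__":
    raw = leibniz_partial_sums(10)
    accelerated = shanks(raw)
    print("raw error:        ", abs(raw[-1] - math.pi))
    print("accelerated error:", abs(accelerated[-1] - math.pi))
```

    The transformation is exact for a purely geometric sequence, which is why it works well when the error is dominated by a single geometric term.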

  4. DNA sequence analysis, expression, distribution, and physiological role of the Xaa-prolyldipeptidyl aminopeptidase gene from Lactobacillus helveticus CNRZ32.

    PubMed

    Yüksel, G U; Steele, J L

    1996-02-01

    Lactobacillus helveticus CNRZ32 possesses an Xaa-prolyldipeptidyl aminopeptidase (PepX), which releases amino-terminal dipeptides from peptides containing proline residues in the penultimate position. The PepX gene, designated pepX, from Lb. helveticus CNRZ32 was sequenced. Analysis of the sequence identified a putative 2379-bp pepX open-reading frame, which encodes a polypeptide of 793 amino acid residues with a deduced molecular mass of 88,111 Da. The gene shows significant sequence identity with sequenced pepX genes from lactic acid bacteria. The product of the gene contains a motif that is almost identical with the active-site motif of the serine-dependent PepX from lactococci. The introduction of pepX into Lactococcus lactis LM0230 on either pGK12 (a low-copy-number plasmid vector) or pIL253 (a high-copy-number plasmid vector) did not result in a significant increase in PepX activity, while the introduction of pepX into CNRZ32 on pGK12 resulted in a four-fold increase in PepX activity. Southern hybridization experiments revealed that the pepX gene from CNRZ32 is well conserved in lactobacilli, pediococci and streptococci. The physiological role of PepX during growth in lactobacillus MRS (a rich medium containing protein hydrolysates along with other ingredients) and milk was examined by comparing growth of CNRZ32 and a CNRZ32 PepX-negative derivative. No difference in growth rate or acid production was observed between CNRZ32 and its PepX-negative derivative in MRS. However, the CNRZ32 PepX-negative derivative grew in milk at a reduced specific growth rate when compared to wild-type CNRZ32. Introduction of the cloned PepX determinant into the CNRZ32 PepX-negative derivative resulted in a construct with a specific growth rate similar to that of wild-type CNRZ32.

  5. Coupling Deep Transcriptome Analysis with Untargeted Metabolic Profiling in Ophiorrhiza pumila to Further the Understanding of the Biosynthesis of the Anti-Cancer Alkaloid Camptothecin and Anthraquinones

    PubMed Central

    Yamazaki, Mami; Mochida, Keiichi; Asano, Takashi; Nakabayashi, Ryo; Chiba, Motoaki; Udomson, Nirin; Yamazaki, Yasuyo; Goodenowe, Dayan B.; Sankawa, Ushio; Yoshida, Takuhiro; Toyoda, Atsushi; Totoki, Yasushi; Sakaki, Yoshiyuki; Góngora-Castillo, Elsa; Buell, C. Robin; Sakurai, Tetsuya; Saito, Kazuki

    2013-01-01

    The Rubiaceae species, Ophiorrhiza pumila, accumulates camptothecin, an anti-cancer alkaloid with a potent DNA topoisomerase I inhibitory activity, as well as anthraquinones that are derived from the combination of the isochorismate and hemiterpenoid pathways. The biosynthesis of these secondary products is active in O. pumila hairy roots yet very low in cell suspension culture. Deep transcriptome analysis was conducted in O. pumila hairy roots and cell suspension cultures using the Illumina platform, yielding a total of 2 Gb of sequence for each sample. We generated a hybrid transcriptome assembly of O. pumila using the Illumina-derived short read sequences and conventional Sanger-derived expressed sequence tag clones derived from a full-length cDNA library constructed using RNA from hairy roots. Among 35,608 non-redundant unigenes, 3,649 were preferentially expressed in hairy roots compared with cell suspension culture. Candidate genes involved in the biosynthetic pathway for the monoterpenoid indole alkaloid camptothecin were identified; specifically, genes involved in post-strictosamide biosynthetic events and genes involved in the biosynthesis of anthraquinones and chlorogenic acid. Untargeted metabolomic analysis by Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR-MS) indicated that most of the proposed intermediates in the camptothecin biosynthetic pathway accumulated in hairy roots in a preferential manner compared with cell suspension culture. In addition, a number of anthraquinones and chlorogenic acid preferentially accumulated in hairy roots compared with cell suspension culture. These results suggest that deep transcriptome and metabolome data sets can facilitate the identification of genes and intermediates involved in the biosynthesis of secondary products including camptothecin in O. pumila. PMID:23503598

  6. A better sequence-read simulator program for metagenomics.

    PubMed

    Johnson, Stephen; Trost, Brett; Long, Jeffrey R; Pittet, Vanessa; Kusalik, Anthony

    2014-01-01

    There are many programs available for generating simulated whole-genome shotgun sequence reads. The data generated by many of these programs follow predefined models, which limits their use to the authors' original intentions. For example, many models assume that read lengths follow a uniform or normal distribution. Other programs generate models from actual sequencing data, but are limited to reads from single-genome studies. To our knowledge, there are no programs that allow a user to generate simulated data following non-parametric read-length distributions and quality profiles based on empirically-derived information from metagenomics sequencing data. We present BEAR (Better Emulation for Artificial Reads), a program that uses a machine-learning approach to generate reads with lengths and quality values that closely match empirically-derived distributions. BEAR can emulate reads from various sequencing platforms, including Illumina, 454, and Ion Torrent. BEAR requires minimal user input, as it automatically determines appropriate parameter settings from user-supplied data. BEAR also uses a unique method for deriving run-specific error rates, and extracts useful statistics from the metagenomic data itself, such as quality-error models. Many existing simulators are specific to a particular sequencing technology; however, BEAR is not restricted in this way. Because of its flexibility, BEAR is particularly useful for emulating the behaviour of technologies like Ion Torrent, for which no dedicated sequencing simulators are currently available. BEAR is also the first metagenomic sequencing simulator program that automates the process of generating abundances, which can be an arduous task. BEAR is useful for evaluating data processing tools in genomics. It has many advantages over existing comparable software, such as generating more realistic reads and being independent of sequencing technology, and has features particularly useful for metagenomics work.

  7. Prediction of beta-turns from amino acid sequences using the residue-coupled model.

    PubMed

    Guruprasad, K; Shukla, S

    2003-04-01

    We evaluated the prediction of beta-turns from amino acid sequences using the residue-coupled model with an enlarged representative protein data set selected from the Protein Data Bank. Our results show that the probability values derived from a data set comprising 425 protein chains yielded an overall beta-turn prediction accuracy of 68.74%, compared with 94.7% reported earlier on a data set of 30 proteins using the same method. However, we noted that the overall beta-turn prediction accuracy using probability values derived from the 30-protein data set reduces to 40.74% when tested on the data set comprising 425 protein chains. In contrast, using probability values derived from the 425 data set used in this analysis, the overall beta-turn prediction accuracy yielded consistent results when tested on either the 30-protein data set (64.62%) used earlier or a more recent representative data set comprising 619 protein chains (64.66%) or on a jackknife data set comprising 476 representative protein chains (63.38%). We therefore recommend the use of probability values derived from the 425 representative protein chains data set reported here, which gives more realistic and consistent predictions of beta-turns from amino acid sequences.

  8. Full-Length Venom Protein cDNA Sequences from Venom-Derived mRNA: Exploring Compositional Variation and Adaptive Multigene Evolution

    PubMed Central

    Modahl, Cassandra M.; Mackessy, Stephen P.

    2016-01-01

    Envenomation of humans by snakes is a complex and continuously evolving medical emergency, and treatment is made that much more difficult by the diverse biochemical composition of many venoms. Venomous snakes and their venoms also provide models for the study of molecular evolutionary processes leading to adaptation and genotype-phenotype relationships. To compare venom complexity and protein sequences, venom gland transcriptomes are assembled, which usually requires the sacrifice of snakes for tissue. However, toxin transcripts are also present in venoms, offering the possibility of obtaining cDNA sequences directly from venom. This study provides evidence that unknown full-length venom protein transcripts can be obtained from the venoms of multiple species from all major venomous snake families. These unknown venom protein cDNAs are obtained by the use of primers designed from conserved signal peptide sequences within each venom protein superfamily. This technique was used to assemble a partial venom gland transcriptome for the Middle American Rattlesnake (Crotalus simus tzabcan) by amplifying sequences for phospholipases A2, serine proteases, C-lectins, and metalloproteinases from within venom. Phospholipase A2 sequences were also recovered from the venoms of several rattlesnakes and an elapid snake (Pseudechis porphyriacus), and three-finger toxin sequences were recovered from multiple rear-fanged snake species, demonstrating that the three major clades of advanced snakes (Elapidae, Viperidae, Colubridae) have stable mRNA present in their venoms. These cDNA sequences from venom were then used to explore potential activities derived from protein sequence similarities and evolutionary histories within these large multigene superfamilies. Venom-derived sequences can also be used to aid in characterizing venoms that lack proteomic profiles and identify sequence characteristics indicating specific envenomation profiles. This approach, requiring only venom, provides access to cDNA sequences in the absence of living specimens, even from commercial venom sources, to evaluate important regional differences in venom composition and to study snake venom protein evolution. PMID:27280639

  9. Analysis and Prediction of Exon Skipping Events from RNA-Seq with Sequence Information Using Rotation Forest.

    PubMed

    Du, Xiuquan; Hu, Changlin; Yao, Yu; Sun, Shiwei; Zhang, Yanping

    2017-12-12

    In bioinformatics, exon skipping (ES) event prediction is an essential part of alternative splicing (AS) event analysis. Although many methods have been developed to predict ES events, a solution has yet to be found. In this study, given the limitations of machine learning algorithms with RNA-Seq data or genome sequences, a new feature, called RS (RNA-seq and sequence) features, was constructed. These features include RNA-Seq features derived from the RNA-Seq data and sequence features derived from genome sequences. We propose a novel Rotation Forest classifier to predict ES events with the RS features (RotaF-RSES). To validate the efficacy of RotaF-RSES, a dataset from two human tissues was used, and RotaF-RSES achieved an accuracy of 98.4%, a specificity of 99.2%, a sensitivity of 94.1%, and an area under the curve (AUC) of 98.6%. When compared to the other available methods, the results indicate that RotaF-RSES is efficient and can predict ES events with RS features.

  10. Research on respiratory motion correction method based on liver contrast-enhanced ultrasound images of single mode

    NASA Astrophysics Data System (ADS)

    Zhang, Ji; Li, Tao; Zheng, Shiqiang; Li, Yiyong

    2015-03-01

    The aim was to reduce the effects of respiratory motion in the quantitative analysis based on liver contrast-enhanced ultrasound (CEUS) image sequences of single mode. The image gating method and the iterative registration method using a model image were adopted to register liver contrast-enhanced ultrasound image sequences of single mode. The feasibility of the proposed respiratory motion correction method was explored preliminarily using 10 hepatocellular carcinoma CEUS cases. The positions of the lesions in the time series of 2D ultrasound images after correction were visually evaluated. Before and after correction, the quality of the weighted sum of transit time (WSTT) parametric images was also compared in terms of accuracy and spatial resolution. For the corrected and uncorrected sequences, the mean deviation values (mDVs) of time-intensity curve (TIC) fitting derived from CEUS sequences were measured. After the correction, the positions of the lesions in the time series of 2D ultrasound images were almost invariant. In contrast, the lesions in the uncorrected images all shifted noticeably. The quality of the WSTT parametric maps derived from liver CEUS image sequences was greatly improved. Moreover, the mDVs of TIC fitting derived from CEUS sequences after the correction decreased by an average of 48.48 ± 42.15. The proposed correction method could improve the accuracy of quantitative analysis based on liver CEUS image sequences of single mode, which would help in enhancing the differential diagnosis efficiency of liver tumors.

  11. Identification of HIV-1 genitourinary tract compartmentalization by analyzing the env gene sequences in urine.

    PubMed

    Blasi, Maria; Carpenter, J Harris; Balakumaran, Bala; Cara, Andrea; Gao, Feng; Klotman, Mary E

    2015-08-24

    HIV-1 persists indefinitely in memory CD4 T cells and other long-lived cellular reservoirs despite antiretroviral therapy. Our group had previously demonstrated that HIV-1 can establish a productive infection in renal epithelial cells and that the kidney represents a separate compartment for HIV-1 replication. Here, to better understand the viruses in this unique site, we genetically characterized and compared the viruses in blood and urine specimens from 24 HIV-1 infected patients with detectable viremia. Blood and urine samples were obtained from 35 HIV-1 positive patients. Single-genome amplification was performed on HIV-1 env RNA and DNA isolated from urine supernatants and urine-derived cell pellets, respectively, as well as from plasma and peripheral blood mononuclear cells from the same individuals. Neighbor-joining trees were constructed under the Kimura 2-parameter model. We amplified and sequenced the full-length HIV-1 envelope (env) gene from 12 of the 24 individuals, indicating that 50% of the viremic HIV-1-positive patients had viral RNA in their urine. Phylogenetic analysis of the env sequences from four individuals with more than 15 urine-derived env sequences showed that the majority of the sequences from urine formed distinct cluster(s) independent of those peripheral blood mononuclear cell and plasma-derived sequences, consistent with viral compartmentalization in the urine. Our results suggest the presence of a distinct HIV compartment in the genitourinary tract.
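
    The Kimura 2-parameter distance used to build such trees can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `k2p_distance` is hypothetical, the inputs are assumed to be two pre-aligned DNA strings of equal length, and sites containing gaps or ambiguity codes are simply skipped. With P the proportion of transitions and Q the proportion of transversions, the distance is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q).

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    term1 = 1.0 - 2.0 * p - q
    term2 = 1.0 - 2.0 * q
    if term1 <= 0.0 or term2 <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


if __name__ == "__main__":
    # Toy sequences invented for illustration: one transition in ten sites.
    print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"))
```

    A neighbor-joining tree would then be built from the matrix of all pairwise distances; that step is left out here.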

  12. Identification of HIV-1 Genitourinary Tract Compartmentalization by Analyzing the env Gene Sequences in Urine

    PubMed Central

    BLASI, Maria; CARPENTER, J. Harris; BALAKUMARAN, Bala; CARA, Andrea; GAO, Feng; KLOTMAN, Mary E.

    2015-01-01

    Objective: HIV-1 persists indefinitely in memory CD4+ T cells and other long-lived cellular reservoirs despite antiretroviral therapy (ART). Our group had previously demonstrated that HIV-1 can establish a productive infection in renal epithelial cells and that the kidney represents a separate compartment for HIV-1 replication. Here, to better understand the viruses in this unique site, we genetically characterized and compared the viruses in blood and urine specimens from twenty-four HIV-1 infected subjects with detectable viremia. Design and Methods: Blood and urine samples were obtained from 35 HIV-1 positive subjects. Single-genome amplification was performed on HIV-1 env RNA and DNA isolated from urine supernatants and urine-derived cell pellets, respectively, as well as from plasma and PBMC from the same individuals. Neighbor-joining trees were constructed under the Kimura 2-parameter model. Results: We amplified and sequenced the full-length HIV-1 envelope (env) gene from twelve of the twenty-four individuals, indicating that fifty percent (50%) of the viremic HIV-1 positive patients had viral RNA in their urine. Phylogenetic analysis of the env sequences from four subjects with more than fifteen urine-derived env sequences showed that the majority of the sequences from urine formed distinct cluster(s) independent of those PBMC and plasma-derived sequences, consistent with viral compartmentalization in the urine. Conclusions: Our results suggest the presence of a distinct HIV compartment in the genitourinary tract. PMID:26372275

  13. On analytic design of loudspeaker arrays with uniform radiation characteristics

    PubMed

    Aarts; Janssen

    2000-01-01

    Some notes on analytically derived loudspeaker arrays with uniform radiation characteristics are presented. The array coefficients are derived via analytical means and compared with so-called maximal flat sequences known from telecommunications and information theory. It appears that the newly derived array, i.e., the quadratic phase array, has a higher efficiency than the Bessel array and a flatter response than the Barker array. The method discussed admits generalization to the design of arrays with desired nonuniform radiating characteristics.

  14. Error propagation in eigenimage filtering.

    PubMed

    Soltanian-Zadeh, H; Windham, J P; Jenkins, J M

    1990-01-01

    Mathematical derivation of error (noise) propagation in eigenimage filtering is presented. Based on the mathematical expressions, a method for decreasing the propagated noise given a sequence of images is suggested. The signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) of the final composite image are compared to the SNRs and CNRs of the images in the sequence. The consistency of the assumptions and accuracy of the mathematical expressions are investigated using sequences of simulated and real magnetic resonance (MR) images of an agarose phantom and a human brain.

  15. Selfish supernumerary chromosome reveals its origin as a mosaic of host genome and organellar sequences.

    PubMed

    Martis, Mihaela Maria; Klemme, Sonja; Banaei-Moghaddam, Ali Mohammad; Blattner, Frank R; Macas, Jiří; Schmutzer, Thomas; Scholz, Uwe; Gundlach, Heidrun; Wicker, Thomas; Šimková, Hana; Novák, Petr; Neumann, Pavel; Kubaláková, Marie; Bauer, Eva; Haseneyer, Grit; Fuchs, Jörg; Doležel, Jaroslav; Stein, Nils; Mayer, Klaus F X; Houben, Andreas

    2012-08-14

    Supernumerary B chromosomes are optional additions to the basic set of A chromosomes, and occur in all eukaryotic groups. They differ from the basic complement in morphology, pairing behavior, and inheritance and are not required for normal growth and development. The current view is that B chromosomes are parasitic elements comparable to selfish DNA, like transposons. In contrast to transposons, they are autonomously inherited independent of the host genome and have their own mechanisms of mitotic or meiotic drive. Although B chromosomes were first described a century ago, little is known about their origin and molecular makeup. The widely accepted view is that they are derived from fragments of A chromosomes and/or generated in response to interspecific hybridization. Through next-generation sequencing of sorted A and B chromosomes, we show that B chromosomes of rye are rich in gene-derived sequences, allowing us to trace their origin to fragments of A chromosomes, with the largest parts corresponding to rye chromosomes 3R and 7R. Compared with A chromosomes, B chromosomes were also found to accumulate large amounts of specific repeats and insertions of organellar DNA. The origin of rye B chromosomes occurred an estimated ∼1.1-1.3 Mya, overlapping in time with the onset of the genus Secale (1.7 Mya). We propose a comprehensive model of B chromosome evolution, including its origin by recombination of several A chromosomes followed by capturing of additional A-derived and organellar sequences and amplification of B-specific repeats.

  16. Are special read alignment strategies necessary and cost-effective when handling sequencing reads from patient-derived tumor xenografts?

    PubMed

    Tso, Kai-Yuen; Lee, Sau Dan; Lo, Kwok-Wai; Yip, Kevin Y

    2014-12-23

    Patient-derived tumor xenografts in mice are widely used in cancer research and have become important in developing personalized therapies. When these xenografts are subject to DNA sequencing, the samples could contain various amounts of mouse DNA. It has been unclear how the mouse reads would affect data analyses. We conducted comprehensive simulations to compare three alignment strategies at different mutation rates, read lengths, sequencing error rates, human-mouse mixing ratios and sequenced regions. We also sequenced a nasopharyngeal carcinoma xenograft and a cell line to test how the strategies work on real data. We found the "filtering" and "combined reference" strategies performed better than aligning reads directly to the human reference in terms of alignment and variant calling accuracies. The combined reference strategy was particularly good at reducing false negative variant calls without significantly increasing the false positive rate. In some scenarios the performance gain of these two special handling strategies was too small for special handling to be cost-effective, but it was found crucial when false non-synonymous SNVs should be minimized, especially in exome sequencing. Our study systematically analyzes the effects of mouse contamination in the sequencing data of human-in-mouse xenografts. Our findings provide information for designing data analysis pipelines for these data.

  17. GFam: a platform for automatic annotation of gene families.

    PubMed

    Sasidharan, Rajkumar; Nepusz, Tamás; Swarbreck, David; Huala, Eva; Paccanaro, Alberto

    2012-10-01

    We have developed GFam, a platform for automatic annotation of gene/protein families. GFam provides a framework for genome initiatives and model organism resources to build domain-based families and derive meaningful functional labels, and it offers a seamless approach to propagate functional annotation across periodic genome updates. GFam is a hybrid approach that uses a greedy algorithm to chain component domains from InterPro annotation provided by its 12 member resources, followed by a sequence-based connected component analysis of un-annotated sequence regions, to derive a consensus domain architecture for each sequence and subsequently generate families based on common architectures. For the proteome of Arabidopsis, our integrated approach increases sequence coverage by 7.2 percentage points and residue coverage by 14.6 percentage points relative to the best single-constituent database within InterPro. The true power of GFam lies in maximizing annotation provided by the different InterPro data sources that offer resource-specific coverage for different regions of a sequence. GFam's capability to capture higher sequence and residue coverage can be useful for genome annotation, comparative genomics and functional studies. GFam is general-purpose software and can be used for any collection of protein sequences. The software is open source and can be obtained from http://www.paccanarolab.org/software/gfam/.

  18. Isolation and characterization of novel EST-derived genic markers in Pisum sativum (Fabaceae)

    PubMed Central

    Jain, Shalu; McPhee, Kevin E.

    2013-01-01

    • Premise of the study: Novel markers were developed for pea (Pisum sativum) from pea expressed sequence tags (ESTs) having significant homology to Medicago truncatula gene sequences to investigate genetic diversity, linkage mapping, and cross-species transferability. • Methods and Results: Seventy-seven EST-derived genic markers were developed through comparative mapping between M. truncatula and P. sativum in which 75 markers produced PCR products and 33 were polymorphic among 16 pea genotypes. • Conclusions: The novel markers described here will be useful for future genetic studies of P. sativum; their amplification in lentil (Lens culinaris) demonstrates their potential for use in closely related species. PMID:25202494

  19. Friend and Moloney murine leukemia viruses specifically recombine with different endogenous retroviral sequences to generate mink cell focus-forming viruses.

    PubMed

    Evans, L H; Cloyd, M W

    1985-01-01

    A group of mink cell focus-forming (MCF) viruses was derived by inoculation of NFS/N mice with Moloney murine leukemia virus (Mo-MuLV 1387) and was compared to a similarly derived group of MCF viruses from mice inoculated with Friend MuLV (Fr-MuLV 57). Antigenic analyses using monoclonal antibodies specific for MCF virus and xenotropic MuLV envelope proteins and genomic structural analyses by RNase T1-resistant oligonucleotide finger-printing indicated that the Moloney and Friend MCF viruses arose by recombination of the respective ecotropic MuLVs with different endogenous retrovirus sequences of NFS mice.

  20. Optimising diffusion-weighted imaging in the abdomen and pelvis: comparison of image quality between monopolar and bipolar single-shot spin-echo echo-planar sequences.

    PubMed

    Kyriazi, Stavroula; Blackledge, Matthew; Collins, David J; Desouza, Nandita M

    2010-10-01

    To compare geometric distortion, signal-to-noise ratio (SNR), apparent diffusion coefficient (ADC), efficacy of fat suppression and presence of artefact between monopolar (Stejskal and Tanner) and bipolar (twice-refocused, eddy-current-compensating) diffusion-weighted imaging (DWI) sequences in the abdomen and pelvis. A semiquantitative distortion index (DI) was derived from the subtraction images with b = 0 and 1,000 s/mm² in a phantom and compared between the two sequences. Seven subjects were imaged with both sequences using four b values (0, 600, 900 and 1,050 s/mm²) and SNR, ADC for different organs and fat-to-muscle signal ratio (FMR) were compared. Image quality was evaluated by two radiologists on a 5-point scale. DI was improved in the bipolar sequence, indicating less geometric distortion. SNR was significantly lower for all tissues and b values in the bipolar images compared with the monopolar (p < 0.05), whereas FMR was not statistically different. ADC in liver, kidney and sacrum was higher in the bipolar scheme compared to the monopolar (p < 0.03), whereas in muscle it was lower (p = 0.018). Image quality scores were higher for the bipolar sequence (p ≤ 0.025). Artefact reduction makes the bipolar DWI sequence preferable in abdominopelvic applications, although the trade-off in SNR may compromise ADC measurements in muscle.
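
    For readers unfamiliar with how an ADC is obtained from multiple b values, a minimal sketch follows. It assumes the common mono-exponential model S(b) = S0 * exp(-b * ADC) and a log-linear least-squares fit; the abstract does not state which fitting method the authors used, the function name `fit_adc` is hypothetical, and the signal values in the example are synthetic.

```python
import math


def fit_adc(b_values, signals):
    """Estimate ADC (mm^2/s) from signals measured at several b values (s/mm^2).

    Fits ln(S) = ln(S0) - b * ADC by ordinary least squares and returns
    the negated slope.
    """
    if len(b_values) != len(signals) or len(b_values) < 2:
        raise ValueError("need at least two matched (b, signal) pairs")

    logs = [math.log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_log = sum(logs) / n
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    sxy = sum((b - mean_b) * (l - mean_log) for b, l in zip(b_values, logs))
    return -sxy / sxx


if __name__ == "__main__":
    # The four b values from the abstract; signals are synthetic and noise-free.
    b_values = [0.0, 600.0, 900.0, 1050.0]
    true_adc = 1.5e-3
    signals = [1000.0 * math.exp(-b * true_adc) for b in b_values]
    print(fit_adc(b_values, signals))
```

    Because the fit is done on log-signal, low SNR at high b values biases the estimate, which is consistent with the abstract's caution about ADC measurements when SNR is reduced.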

  1. Real-time detection of BRAF V600E mutation from archival hairy cell leukemia FFPE tissue by nanopore sequencing.

    PubMed

    Vacca, Davide; Cancila, Valeria; Gulino, Alessandro; Lo Bosco, Giosuè; Belmonte, Beatrice; Di Napoli, Arianna; Florena, Ada Maria; Tripodo, Claudio; Arancio, Walter

    2018-02-01

    The MinION is a miniaturized high-throughput next generation sequencing platform of novel conception. The use of nucleic acids derived from formalin-fixed paraffin-embedded samples is highly desirable, but their adoption for molecular assays is hindered by the high degree of fragmentation and by the chemically induced mutations stemming from the fixation protocols. In order to investigate the suitability of MinION sequencing on formalin-fixed paraffin-embedded samples, the presence and frequency of the BRAF c.1799T>A mutation was investigated in two archival tissue specimens of Hairy cell leukemia and Hairy cell leukemia Variant. Despite the poor quality of the starting DNA, the BRAF mutation was successfully detected in the Hairy cell leukemia sample with around 50% of the reads obtained within 2 h of the sequencing start. Notably, the mutational burden of the Hairy cell leukemia sample as derived from nanopore sequencing proved to be comparable to that obtained with a sensitive method for the detection of point mutations, namely digital PCR, using a validated assay. Nanopore sequencing can be adopted for targeted sequencing of genetic lesions on critical DNA samples such as those extracted from archival routine formalin-fixed paraffin-embedded samples. This result suggests that nanopore sequencing could be reliably adopted for the real-time targeted sequencing of genetic lesions. Our report opens the door to the adoption of nanopore sequencing in molecular pathology for research and diagnostics.

  2. Stellar Diameters and Temperatures. III. Main-sequence A, F, G, and K Stars: Additional High-precision Measurements and Empirical Relations

    NASA Astrophysics Data System (ADS)

    Boyajian, Tabetha S.; von Braun, Kaspar; van Belle, Gerard; Farrington, Chris; Schaefer, Gail; Jones, Jeremy; White, Russel; McAlister, Harold A.; ten Brummelaar, Theo A.; Ridgway, Stephen; Gies, Douglas; Sturmann, Laszlo; Sturmann, Judit; Turner, Nils H.; Goldfinger, P. J.; Vargas, Norm

    2013-07-01

    Based on CHARA Array measurements, we present the angular diameters of 23 nearby, main-sequence stars, ranging from spectral types A7 to K0, 5 of which are exoplanet host stars. We derive linear radii, effective temperatures, and absolute luminosities of the stars using Hipparcos parallaxes and measured bolometric fluxes. The new data are combined with previously published values to create an Angular Diameter Anthology of measured angular diameters to main-sequence stars (luminosity classes V and IV). This compilation consists of 125 stars with diameter uncertainties of less than 5%, ranging in spectral types from A to M. The large quantity of empirical data is used to derive color-temperature relations to an assortment of color indices in the Johnson (BVR_J I_J JHK), Cousins (R_C I_C), Kron (R_K I_K), Sloan (griz), and WISE (W3 W4) photometric systems. These relations have an average standard deviation of ~3% and are valid for stars with spectral types A0-M4. To derive even more accurate relations for Sun-like stars, we also determined these temperature relations omitting early-type stars (T_eff > 6750 K) that may have biased luminosity estimates because of rapid rotation; for this subset the dispersion is only ~2.5%. We find effective temperatures in agreement within a couple of percent for the interferometrically characterized sample of main-sequence stars compared to those derived via the infrared flux method and spectroscopic analysis.
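
    The derivation of linear radius and effective temperature from an angular diameter can be sketched with the standard relations R = (theta / 2) * d and T_eff = (4 F_bol / (sigma * theta^2))^(1/4), with theta in radians and d = 1 / parallax. This is a minimal sketch, not the authors' code: the function names are hypothetical, limb-darkening and uncertainty propagation are ignored, and the constants are nominal values assumed here.

```python
import math

SIGMA_SB = 5.670374419e-8             # Stefan-Boltzmann constant, W m^-2 K^-4
MAS_TO_RAD = math.pi / (180.0 * 3600.0 * 1000.0)
PARSEC_M = 3.0856775814913673e16      # metres per parsec
R_SUN_M = 6.957e8                     # nominal solar radius, m
AU_M = 1.495978707e11                 # astronomical unit, m


def effective_temperature(theta_mas, f_bol):
    """T_eff in K from angular diameter (mas) and bolometric flux (W m^-2)."""
    theta = theta_mas * MAS_TO_RAD
    return (4.0 * f_bol / (SIGMA_SB * theta ** 2)) ** 0.25


def linear_radius_rsun(theta_mas, parallax_mas):
    """Linear radius in solar radii from angular diameter and parallax (both mas)."""
    distance_m = (1000.0 / parallax_mas) * PARSEC_M
    return 0.5 * theta_mas * MAS_TO_RAD * distance_m / R_SUN_M


if __name__ == "__main__":
    # Sanity check with the Sun seen from 1 au (solar constant ~1361 W m^-2).
    sun_theta_mas = (2.0 * R_SUN_M / AU_M) / MAS_TO_RAD
    print(effective_temperature(sun_theta_mas, 1361.0))
    # A 1 mas star at 10 pc (parallax 100 mas).
    print(linear_radius_rsun(1.0, 100.0))
```

    Note that the temperature depends only on the angular diameter and the flux, so it is independent of the distance, whereas the radius inherits the parallax uncertainty.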

  3. The accuracy of ultrashort echo time MRI sequences for medical additive manufacturing.

    PubMed

    van Eijnatten, Maureen; Rijkhorst, Erik-Jan; Hofman, Mark; Forouzanfar, Tymour; Wolff, Jan

    2016-01-01

    Additively manufactured bone models, implants and drill guides are becoming increasingly popular amongst maxillofacial surgeons and dentists. To date, such constructs are commonly manufactured using CT technology that induces ionizing radiation. Recently, ultrashort echo time (UTE) MRI sequences have been developed that allow radiation-free imaging of facial bones. The aim of the present study was to assess the feasibility of UTE MRI sequences for medical additive manufacturing (AM). Three morphologically different dry human mandibles were scanned using a CT and MRI scanner. Additionally, optical scans of all three mandibles were made to acquire a "gold standard". All CT and MRI scans were converted into Standard Tessellation Language (STL) models and geometrically compared with the gold standard. To quantify the accuracy of the AM process, the CT, MRI and gold-standard STL models of one of the mandibles were additively manufactured, optically scanned and compared with the original gold-standard STL model. Geometric differences between all three CT-derived STL models and the gold standard were <1.0 mm. All three MRI-derived STL models generally presented deviations <1.5 mm in the symphyseal and mandibular area. The AM process introduced minor deviations of <0.5 mm. This study demonstrates that MRI using UTE sequences is a feasible alternative to CT in generating STL models of the mandible and would therefore be suitable for surgical planning and AM. Further in vivo studies are necessary to assess the usability of UTE MRI sequences in clinical settings.

  4. Molecular Evolution of a Type 1 Wild-Vaccine Poliovirus Recombinant during Widespread Circulation in China

    PubMed Central

    Liu, Hong-Mei; Zheng, Du-Ping; Zhang, Li-Bi; Oberste, M. Steven; Pallansch, Mark A.; Kew, Olen M.

    2000-01-01

    Type 1 wild-vaccine recombinant polioviruses were isolated from poliomyelitis patients in China from 1991 to 1993. We compared the sequences of 34 recombinant isolates over the 1,353-nucleotide (nt) genomic interval (nt 2480 to 3832) encoding the major capsid protein, VP1, and the protease, 2A. All recombinants had a 367-nt block of sequence (nt 3271 to 3637) derived from the Sabin 1 oral poliovirus vaccine strain spanning the 3′-terminal sequences of VP1 (115 nt) and the 5′ half of 2A (252 nt). The remaining VP1 sequences were closely (up to 99.5%) related to those of a major genotype of wild type 1 poliovirus endemic to China up to 1994. In contrast, the non-vaccine-derived sequences at the 3′ half of 2A were more distantly related (<90% nucleotide sequence match) to those of other contemporary wild polioviruses from China. The vaccine-derived sequences of the earliest (April 1991) isolates completely matched those of Sabin 1. Later isolates diverged from the early isolates primarily by accumulation of synonymous base substitutions (at a rate of ∼3.7 × 10^-2 substitutions per synonymous site per year) over the entire VP1-2A interval. Distinct evolutionary lineages were found in different Chinese provinces. From the combined epidemiologic and evolutionary analyses, we propose that the recombinant virus arose during mixed infection of a single individual in northern China in early 1991 and that its progeny spread by multiple independent chains of transmission into some of the most populous areas of China within a year of the initiating infection. PMID:11070012

  5. Microsatellite genotyping and genome-wide single nucleotide polymorphism-based indices of Plasmodium falciparum diversity within clinical infections.

    PubMed

    Murray, Lee; Mobegi, Victor A; Duffy, Craig W; Assefa, Samuel A; Kwiatkowski, Dominic P; Laman, Eugene; Loua, Kovana M; Conway, David J

    2016-05-12

    In regions where malaria is endemic, individuals are often infected with multiple distinct parasite genotypes, a situation that may impact on evolution of parasite virulence and drug resistance. Most approaches to studying genotypic diversity have involved analysis of a modest number of polymorphic loci, although whole genome sequencing enables a broader characterisation of samples. PCR-based microsatellite typing of a panel of ten loci was performed on Plasmodium falciparum in 95 clinical isolates from a highly endemic area in the Republic of Guinea, to characterize within-isolate genetic diversity. Separately, single nucleotide polymorphism (SNP) data from genome-wide short-read sequences of the same samples were used to derive within-isolate fixation indices (F_ws), an inverse measure of diversity within each isolate compared to overall local genetic diversity. The latter indices were compared with the microsatellite results, and also with indices derived by randomly sampling modest numbers of SNPs. As expected, the number of microsatellite loci with more than one allele in each isolate was highly significantly inversely correlated with the genome-wide F_ws fixation index (r = -0.88, P < 0.001). However, the microsatellite analysis revealed that most isolates contained mixed genotypes, even those that had no detectable genome sequence heterogeneity. Random sampling of different numbers of SNPs showed that an F_ws index derived from ten or more SNPs with minor allele frequencies of >10% had high correlation (r > 0.90) with the index derived using all SNPs. Different types of data give highly correlated indices of within-infection diversity, although PCR-based analysis detects low-level minority genotypes not apparent in bulk sequence analysis. When whole-genome data are not obtainable, quantitative assay of ten or more SNPs can yield a reasonably accurate estimate of the within-infection fixation index (F_ws).
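
    The abstract describes the within-isolate fixation index only as an inverse measure of diversity relative to the local population. A simplified sketch, assuming the commonly used form F_ws = 1 - H_w / H_s with biallelic SNP heterozygosity H = 2p(1 - p), is given below; the function names and the plain summation over SNPs are illustrative assumptions and not necessarily the estimator used in the study.

```python
def heterozygosity(p):
    """Expected heterozygosity of a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fws(within_freqs, population_freqs):
    """Simplified within-isolate fixation index.

    within_freqs:     per-SNP allele frequencies within one isolate (e.g. from read counts)
    population_freqs: per-SNP allele frequencies in the local population, same SNP order
    """
    if len(within_freqs) != len(population_freqs):
        raise ValueError("frequency lists must cover the same SNPs")
    h_w = sum(heterozygosity(p) for p in within_freqs)       # diversity inside the isolate
    h_s = sum(heterozygosity(p) for p in population_freqs)   # diversity in the population
    if h_s == 0:
        raise ValueError("population shows no diversity at these SNPs")
    return 1.0 - h_w / h_s
```

    Under this sketch, an isolate with a single genotype (all within-isolate frequencies 0 or 1) scores 1, and an isolate as diverse as the whole population scores 0.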

  6. Protein Sequence Classification with Improved Extreme Learning Machine Algorithms

    PubMed Central

    2014-01-01

    Precisely classifying a protein sequence from a large database of biological protein sequences plays an important role in developing competitive pharmacological products. Conventional methods, which compare the unseen sequence with all identified protein sequences and return the category index of the protein with the highest similarity score, are usually time-consuming. Therefore, it is urgent and necessary to build an efficient protein sequence classification system. In this paper, we study the performance of protein sequence classification using single-hidden-layer feedforward networks (SLFNs). The recent efficient extreme learning machine (ELM) and its variants are utilized as the training algorithms. The optimal pruned ELM (OP-ELM) is first employed for protein sequence classification in this paper. To further enhance the performance, an ensemble-based SLFN structure is constructed in which multiple SLFNs with the same number of hidden nodes and the same activation function are used as ensembles. For each ensemble, the same training algorithm is adopted. The final category index is derived using the majority voting method. Two approaches, namely, the basic ELM and the OP-ELM, are adopted for the ensemble-based SLFNs. The performance is analyzed and compared with several existing methods using datasets obtained from the Protein Information Resource center. The experimental results show the superiority of the proposed algorithms. PMID:24795876
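
    The majority voting step mentioned above can be sketched as follows. This is a generic illustration, assuming each ensemble member returns one category index per sequence; the function name and the tie-breaking rule (first label in input order) are assumptions, not details from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of ensemble members.

    predictions: list of category labels, one per ensemble member.
    Ties are broken in favour of the label that appears first in the list.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:          # scan in input order for a stable tie-break
        if counts[label] == top:
            return label
```

    For example, if four networks vote 2, 1, 2, 3 for an unseen sequence, the ensemble assigns category 2.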

  7. Probabilistic Evaluation of Competing Climate Models

    NASA Astrophysics Data System (ADS)

    Braverman, A. J.; Chatterjee, S.; Heyman, M.; Cressie, N.

    2017-12-01

    A standard paradigm for assessing the quality of climate model simulations is to compare what these models produce for past and present time periods, to observations of the past and present. Many of these comparisons are based on simple summary statistics called metrics. Here, we propose an alternative: evaluation of competing climate models through probabilities derived from tests of the hypothesis that climate-model-simulated and observed time sequences share common climate-scale signals. The probabilities are based on the behavior of summary statistics of climate model output and observational data, over ensembles of pseudo-realizations. These are obtained by partitioning the original time sequences into signal and noise components, and using a parametric bootstrap to create pseudo-realizations of the noise sequences. The statistics we choose come from working in the space of decorrelated and dimension-reduced wavelet coefficients. We compare monthly sequences of CMIP5 model output of average global near-surface temperature anomalies to similar sequences obtained from the well-known HadCRUT4 data set, as an illustration.

  8. A comparative study of working memory: immediate serial spatial recall in baboons (Papio papio) and humans.

    PubMed

    Fagot, Joël; De Lillo, Carlo

    2011-12-01

    Two experiments assessed whether non-human primates can be meaningfully compared to humans in a non-verbal test of serial recall. A procedure was used that was derived from variations of the Corsi test, designed to test the effects of sequence structure and movement path length in humans. Two baboons were tested in Experiment 1. The monkeys showed several attributes of human serial recall. These included an easier recall of sequences with fewer items and of sequences characterized by a shorter path length when the number of items was kept constant. However, the accuracy and speed of processing did not indicate that the monkeys were able to benefit from the spatiotemporal structure of sequences. Humans tested in Experiment 2 showed a quantitatively longer memory span, and, in contrast with monkeys, benefitted from sequence structure. The results are discussed in relation to differences in how human and non-human primates segment complex visual patterns.

  9. Establishing gene models from the Pinus pinaster genome using gene capture and BAC sequencing.

    PubMed

    Seoane-Zonjic, Pedro; Cañas, Rafael A; Bautista, Rocío; Gómez-Maldonado, Josefa; Arrillaga, Isabel; Fernández-Pozo, Noé; Claros, M Gonzalo; Cánovas, Francisco M; Ávila, Concepción

    2016-02-27

    In the era of DNA throughput sequencing, assembling and understanding gymnosperm mega-genomes remains a challenge. Although drafts of three conifer genomes have recently been published, this number is too low to understand the full complexity of conifer genomes. Using techniques focused on specific genes, gene models can be established that can aid in the assembly of gene-rich regions, and this information can be used to compare genomes and understand functional evolution. In this study, gene capture technology combined with BAC isolation and sequencing was used as an experimental approach to establish de novo gene structures without a reference genome. Probes were designed for 866 maritime pine transcripts to sequence genes captured from genomic DNA. The gene models were constructed using GeneAssembler, a new bioinformatic pipeline, which reconstructed over 82% of the gene structures, and a high proportion (85%) of the captured gene models contained sequences from the promoter regulatory region. In a parallel experiment, the P. pinaster BAC library was screened to isolate clones containing genes whose cDNA sequence were already available. BAC clones containing the asparagine synthetase, sucrose synthase and xyloglucan endotransglycosylase gene sequences were isolated and used in this study. The gene models derived from the gene capture approach were compared with the genomic sequences derived from the BAC clones. This combined approach is a particularly efficient way to capture the genomic structures of gene families with a small number of members. The experimental approach used in this study is a valuable combined technique to study genomic gene structures in species for which a reference genome is unavailable. It can be used to establish exon/intron boundaries in unknown gene structures, to reconstruct incomplete genes and to obtain promoter sequences that can be used for transcriptional studies. 
    A bioinformatics algorithm (GeneAssembler) is also provided as a Ruby gem for this class of analyses.

  10. Free energy minimization to predict RNA secondary structures and computational RNA design.

    PubMed

    Churkin, Alexander; Weinbrand, Lina; Barash, Danny

    2015-01-01

    Determining the RNA secondary structure from sequence data by computational predictions is a long-standing problem. Its solution has been approached in two distinctive ways. If a multiple sequence alignment of a collection of homologous sequences is available, the comparative method uses phylogeny to determine conserved base pairs that are more likely to form as a result of billions of years of evolution than by chance. In the case of single sequences, recursive algorithms that compute free energy structures by using empirically derived energy parameters have been developed. This latter approach of RNA folding prediction by energy minimization is widely used to predict RNA secondary structure from sequence. For a significant number of RNA molecules, the secondary structure of the RNA molecule is indicative of its function and its computational prediction by minimizing its free energy is important for its functional analysis. A general method for free energy minimization to predict RNA secondary structures is dynamic programming, although other optimization methods have been developed as well along with empirically derived energy parameters. In this chapter, we introduce and illustrate by examples the approach of free energy minimization to predict RNA secondary structures.
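
    To illustrate the dynamic-programming idea mentioned above, the sketch below implements the Nussinov base-pair maximization recursion. It is a deliberately simplified stand-in: it maximizes the number of canonical and G-U pairs rather than minimizing free energy with empirically derived parameters, and the function name and the minimum hairpin loop length of 3 are assumptions made for the example.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in an RNA sequence (Nussinov-style DP).

    seq:      RNA string over A, C, G, U
    min_loop: minimum number of unpaired bases enclosed by any pair
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                      # case 1: base j stays unpaired
            for k in range(i, j - min_loop):         # case 2: base j pairs with base k
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]
```

    Energy-based folding uses the same kind of table filling over subsequences, but replaces the pair count with stacking and loop energy terms.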

  11. PrionHome: a database of prions and other sequences relevant to prion phenomena.

    PubMed

    Harbi, Djamel; Parthiban, Marimuthu; Gendoo, Deena M A; Ehsani, Sepehr; Kumar, Manish; Schmitt-Ulms, Gerold; Sowdhamini, Ramanathan; Harrison, Paul M

    2012-01-01

    Prions are units of propagation of an altered state of a protein or proteins; prions can propagate from organism to organism, through cooption of other protein copies. Prions contain no necessary nucleic acids, and are important both as pathogenic agents, and as a potential force in epigenetic phenomena. The original prions were derived from a misfolded form of the mammalian Prion Protein PrP. Infection by these prions causes neurodegenerative diseases. Other prions cause non-Mendelian inheritance in budding yeast, and sometimes act as diseases of yeast. We report the bioinformatic construction of the PrionHome, a database of >2000 prion-related sequences. The data was collated from various public and private resources and filtered for redundancy. The data was then processed according to a transparent classification system of prionogenic sequences (i.e., sequences that can make prions), prionoids (i.e., proteins that propagate like prions between individual cells), and other prion-related phenomena. There are eight PrionHome classifications for sequences. The first four classifications are derived from experimental observations: prionogenic sequences, prionoids, other prion-related phenomena, and prion interactors. The second four classifications are derived from sequence analysis: orthologs, paralogs, pseudogenes, and candidate-prionogenic sequences. Database entries list: supporting information for PrionHome classifications, prion-determinant areas (where relevant), and disordered and compositionally-biased regions. Also included are literature references for the PrionHome classifications, transcripts and genomic coordinates, and structural data (including comparative models made for the PrionHome from manually curated alignments). We provide database usage examples for both vertebrate and fungal prion contexts.
    Using the database data, we have performed a detailed analysis of the compositional biases in known budding-yeast prionogenic sequences, showing that the only abundant bias pattern is for asparagine bias with subsidiary serine bias. We anticipate that this database will be a useful experimental aid and reference resource. It is freely available at: http://libaio.biol.mcgill.ca/prion.

  12. PrionHome: A Database of Prions and Other Sequences Relevant to Prion Phenomena

    PubMed Central

    Harbi, Djamel; Parthiban, Marimuthu; Gendoo, Deena M. A.; Ehsani, Sepehr; Kumar, Manish; Schmitt-Ulms, Gerold; Sowdhamini, Ramanathan; Harrison, Paul M.

    2012-01-01

    Prions are units of propagation of an altered state of a protein or proteins; prions can propagate from organism to organism, through cooption of other protein copies. Prions contain no necessary nucleic acids, and are important both as pathogenic agents, and as a potential force in epigenetic phenomena. The original prions were derived from a misfolded form of the mammalian Prion Protein PrP. Infection by these prions causes neurodegenerative diseases. Other prions cause non-Mendelian inheritance in budding yeast, and sometimes act as diseases of yeast. We report the bioinformatic construction of the PrionHome, a database of >2000 prion-related sequences. The data was collated from various public and private resources and filtered for redundancy. The data was then processed according to a transparent classification system of prionogenic sequences (i.e., sequences that can make prions), prionoids (i.e., proteins that propagate like prions between individual cells), and other prion-related phenomena. There are eight PrionHome classifications for sequences. The first four classifications are derived from experimental observations: prionogenic sequences, prionoids, other prion-related phenomena, and prion interactors. The second four classifications are derived from sequence analysis: orthologs, paralogs, pseudogenes, and candidate-prionogenic sequences. Database entries list: supporting information for PrionHome classifications, prion-determinant areas (where relevant), and disordered and compositionally-biased regions. Also included are literature references for the PrionHome classifications, transcripts and genomic coordinates, and structural data (including comparative models made for the PrionHome from manually curated alignments). We provide database usage examples for both vertebrate and fungal prion contexts.
    Using the database data, we have performed a detailed analysis of the compositional biases in known budding-yeast prionogenic sequences, showing that the only abundant bias pattern is for asparagine bias with subsidiary serine bias. We anticipate that this database will be a useful experimental aid and reference resource. It is freely available at: http://libaio.biol.mcgill.ca/prion. PMID:22363733

  13. DAMe: a toolkit for the initial processing of datasets with PCR replicates of double-tagged amplicons for DNA metabarcoding analyses.

    PubMed

    Zepeda-Mendoza, Marie Lisandra; Bohmann, Kristine; Carmona Baez, Aldo; Gilbert, M Thomas P

    2016-05-03

    DNA metabarcoding is an approach for identifying multiple taxa in an environmental sample using specific genetic loci and taxa-specific primers. When combined with high-throughput sequencing it enables the taxonomic characterization of large numbers of samples in a relatively time- and cost-efficient manner. One recent laboratory development is the addition of 5'-nucleotide tags to both primers producing double-tagged amplicons and the use of multiple PCR replicates to filter erroneous sequences. However, there is currently no available toolkit for the straightforward analysis of datasets produced in this way. We present DAMe, a toolkit for the processing of datasets generated by double-tagged amplicons from multiple PCR replicates derived from an unlimited number of samples. Specifically, DAMe can be used to (i) sort amplicons by tag combination, (ii) evaluate PCR replicates dissimilarity, and (iii) filter sequences derived from sequencing/PCR errors, chimeras, and contamination. This is attained by calculating the following parameters: (i) sequence content similarity between the PCR replicates from each sample, (ii) reproducibility of each unique sequence across the PCR replicates, and (iii) copy number of the unique sequences in each PCR replicate. We showcase the insights that can be obtained using DAMe prior to taxonomic assignment, by applying it to two real datasets that vary in their complexity regarding number of samples, sequencing libraries, PCR replicates, and used tag combinations. Finally, we use a third mock dataset to demonstrate the impact and importance of filtering the sequences with DAMe. DAMe allows the user-friendly manipulation of amplicons derived from multiple samples with PCR replicates built in a single or multiple sequencing libraries. 
    It allows the user to: (i) collapse amplicons into unique sequences and sort them by tag combination while retaining the sample identifier and copy number information, (ii) identify sequences carrying unused tag combinations, (iii) evaluate the comparability of PCR replicates of the same sample, and (iv) filter tagged amplicons from a number of PCR replicates using parameters of minimum length, copy number, and reproducibility across the PCR replicates. This enables an efficient analysis of complex datasets, and ultimately increases the ease of handling datasets from large-scale studies.
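
    The filtering step described above (minimum length, copy number, and reproducibility across PCR replicates) can be sketched in a few lines. This is an illustration of the idea only and does not reproduce DAMe's actual interface; the function name, parameter names, and the dictionary-based input format are all hypothetical.

```python
def filter_sequences(replicates, min_length, min_copies, min_replicates):
    """Keep sequences that are long enough and reproducible across PCR replicates.

    replicates:     list of dicts, one per PCR replicate, mapping sequence -> copy number
    min_length:     minimum sequence length to keep
    min_copies:     minimum copy number for a sequence to count as present in a replicate
    min_replicates: minimum number of replicates in which the sequence must be present
    Returns a dict mapping each retained sequence to its summed copy number over
    the replicates in which it passed the copy-number threshold.
    """
    kept = {}
    all_seqs = set().union(*replicates) if replicates else set()
    for seq in all_seqs:
        if len(seq) < min_length:
            continue
        passing = [rep[seq] for rep in replicates if rep.get(seq, 0) >= min_copies]
        if len(passing) >= min_replicates:
            kept[seq] = sum(passing)
    return kept
```

    Raising `min_replicates` makes the filter stricter against sporadic PCR or sequencing errors, at the cost of discarding rare genuine sequences that fail to amplify in every replicate.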

  14. Comparison of variable region 3 sequences of human immunodeficiency virus type 1 from infected children with the RNA and DNA sequences of the virus populations of their mothers.

    PubMed Central

    Scarlatti, G; Leitner, T; Halapi, E; Wahlberg, J; Marchisio, P; Clerici-Schoeller, M A; Wigzell, H; Fenyö, E M; Albert, J; Uhlén, M

    1993-01-01

    We have compared the variable region 3 sequences from 10 human immunodeficiency virus type 1 (HIV-1)-infected infants to virus sequences from the corresponding mothers. The sequences were derived from DNA of uncultured peripheral blood mononuclear cells (PBMC), DNA of cultured PBMC, and RNA from serum collected at or shortly after delivery. The infected infants, in contrast to the mothers, harbored homogeneous virus populations. Comparison of sequences from the children and clones derived from DNA of the corresponding mothers showed that the transmitted virus represented either a minor or a major virus population of the mother. In contrast to an earlier study, we found no evidence of selection of minor virus variants during transmission. Furthermore, the transmitted virus variant did not show any characteristic molecular features. In some cases the transmitted virus was more related to the virus RNA population of the mother and in other cases it was more related to the virus DNA population. This suggests that either cell-free or cell-associated virus may be transmitted. These data will help AIDS researchers to understand the mechanism of transmission and to plan strategies for prevention of transmission. PMID:8446584

  15. [A new variant of the simian T-lymphotropic retrovirus type I (STLV-IF) in the Sukhumi colony of hamadryas baboons].

    PubMed

    Chikobaeva, M G; Schatzl, H; Rose, D; Bush, U; Iakovleva, L A; Deinhardt, F; Helm, K; Lapin, B A

    1993-01-01

    A polymerase chain reaction (PCR) assay was developed for the detection of simian T-lymphotropic virus type 1 (STLV-1) infection in P. hamadryas, together with direct sequencing, using oligonucleotide primer pairs specific for the tax and env regions of the related human T-lymphotropic virus type 1 (HTLV-1). Excellent specificity was shown in the detection of STLV-1 provirus in infected baboons by PCR using HTLV-1-derived primers. The nucleotide sequences of env 467bp and tax 159bp of the proviral genome (env position 5700-6137, tax position 7373-7498 HTLV-1, according to Seiki et al., 1983) derived from STLV-1-infected P. hamadryas were analysed using PCR and direct sequencing techniques. Two STLV-1 isolates from different sources (Sukhumi main-SuTLV-1 and forest stocks-STLV-1F) were compared. Two variants of STLV-1 among P. hamadryas with different levels of homology to HTLV-1 were found (83.8% and 95.2%, respectively). A possible role of nucleotide changes in the sequenced env and tax fragments in the oncogenicity of STLV-1 variants is discussed.

  16. Authentication of Cordyceps sinensis by DNA Analyses: Comparison of ITS Sequence Analysis and RAPD-Derived Molecular Markers.

    PubMed

    Lam, Kelly Y C; Chan, Gallant K L; Xin, Gui-Zhong; Xu, Hong; Ku, Chuen-Fai; Chen, Jian-Ping; Yao, Ping; Lin, Huang-Quan; Dong, Tina T X; Tsim, Karl W K

    2015-12-15

    Cordyceps sinensis is an endoparasitic fungus widely used as a tonic and medicinal food in the practice of traditional Chinese medicine (TCM). In historical usage, Cordyceps refers specifically to the species C. sinensis. However, a number of closely related species are also named Cordyceps and are commonly sold as C. sinensis. The substitutes and adulterants of C. sinensis are often introduced either intentionally or accidentally in the herbal market, which seriously affects the therapeutic effects or even leads to life-threatening poisoning. Here, we aim to identify Cordyceps by DNA sequencing technology. Two different DNA-based approaches were compared. The internal transcribed spacer (ITS) sequences and the random amplified polymorphic DNA (RAPD)-sequence characterized amplified region (SCAR) were developed here to authenticate different species of Cordyceps. Both approaches generally enabled discrimination of C. sinensis from others. The application of the two methods, supporting each other, increases the reliability of identification. For better reproducibility and faster analysis, the SCAR markers derived from the RAPD results provide a new method for quick authentication of Cordyceps.

  17. Analysis of temporal transcription expression profiles reveal links between protein function and developmental stages of Drosophila melanogaster.

    PubMed

    Wan, Cen; Lees, Jonathan G; Minneci, Federico; Orengo, Christine A; Jones, David T

    2017-10-01

    Accurate gene or protein function prediction is a key challenge in the post-genome era. Most current methods perform well on molecular function prediction, but struggle to provide useful annotations relating to biological process functions due to the limited power of sequence-based features in that functional domain. In this work, we systematically evaluate the predictive power of temporal transcription expression profiles for protein function prediction in Drosophila melanogaster. Our results show significantly better performance on predicting protein function when transcription expression profile-based features are integrated with sequence-derived features, compared with the sequence-derived features alone. We also observe that the combination of expression-based and sequence-based features leads to further improvement of accuracy on predicting all three domains of gene function. Based on the optimal feature combinations, we then propose a novel multi-classifier-based function prediction method for Drosophila melanogaster proteins, FFPred-fly+. Interpreting our machine learning models also allows us to identify some of the underlying links between biological processes and developmental stages of Drosophila melanogaster.

  18. Complete genomic sequence of Powassan virus: evaluation of genetic elements in tick-borne versus mosquito-borne flaviviruses.

    PubMed

    Mandl, C W; Holzmann, H; Kunz, C; Heinz, F X

    1993-05-01

    The complete nucleotide sequence of the positive-stranded RNA genome of the tick-borne flavivirus Powassan (10,839 nucleotides) was elucidated and the amino acid sequence of all viral proteins was derived. Based on this sequence as well as serological data, Powassan virus represents the most divergent member of the tick-borne serocomplex within the genus flaviviruses, family Flaviviridae. The primary nucleotide sequence and potential RNA secondary structures of the Powassan virus genome as well as the protein sequences and the reactivities of the virion with a panel of monoclonal antibodies were compared to other tick-borne and mosquito-borne flaviviruses. These analyses corroborated significant differences between tick-borne and mosquito-borne flaviviruses, but also emphasized structural elements that are conserved among both vector groups. The comparisons among tick-borne flaviviruses revealed conserved sequence elements that might represent important determinants of the tick-borne flavivirus phenotype.

  19. Comparative performance of high-density oligonucleotide sequencing and dideoxynucleotide sequencing of HIV type 1 pol from clinical samples.

    PubMed

    Günthard, H F; Wong, J K; Ignacio, C C; Havlir, D V; Richman, D D

    1998-07-01

    The performance of the high-density oligonucleotide array methodology (GeneChip) in detecting drug resistance mutations in HIV-1 pol was compared with that of automated dideoxynucleotide sequencing (ABI) of clinical samples, viral stocks, and plasmid-derived NL4-3 clones. Sequences from 29 clinical samples (plasma RNA, n = 17; lymph node RNA, n = 5; lymph node DNA, n = 7) from 12 patients, from 6 viral stock RNA samples, and from 13 NL4-3 clones were generated by both methods. Editing was done independently by a different investigator for each method before comparing the sequences. In addition, NL4-3 wild type (WT) and mutants were mixed in varying concentrations and sequenced by both methods. Overall, a concordance of 99.1% was found for a total of 30,865 bases compared. The comparison of clinical samples (plasma RNA and lymph node RNA and DNA) showed a slightly lower match of base calls, 98.8% for 19,831 nucleotides compared (protease region, 99.5%, n = 8272; RT region, 98.3%, n = 11,316), than for viral stocks and NL4-3 clones (protease region, 99.8%; RT region, 99.5%). Artificial mixing experiments showed a bias toward calling wild-type bases by GeneChip. Discordant base calls are most likely due to differential detection of mixtures. The concordance between GeneChip and ABI was high and appeared dependent on the nature of the templates (directly amplified versus cloned) and the complexity of mixes.
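
    The concordance figures reported above are, in essence, the fraction of positions at which the two methods call the same base. A minimal sketch of that calculation, assuming the two sequences are already aligned and of equal length (the function name is hypothetical):

```python
def concordance(calls_a, calls_b):
    """Fraction of positions with identical base calls in two aligned sequences."""
    if len(calls_a) != len(calls_b):
        raise ValueError("sequences must be aligned to the same length")
    if not calls_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(calls_a, calls_b))
    return matches / len(calls_a)
```

    Note that this simple comparison treats a mixture call (for example an ambiguity code) as a mismatch against a single base, which is one way that differential detection of mixtures can lower the apparent concordance.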

  20. Design, Synthesis, and Use of Peptides Derived from Human Papillomavirus L1 Protein for the Modification of Gold Electrode Surfaces by Self-Assembled Monolayers.

    PubMed

    Lara Carrillo, John Alejandro; Fierro Medina, Ricardo; Manríquez Rocha, Juan; Bustos Bustos, Erika; Insuasty Cepeda, Diego Sebastián; García Castañeda, Javier Eduardo; Rivera Monroy, Zuly Jenny

    2017-11-14

    In order to obtain gold electrode surfaces modified with Human Papillomavirus L1 protein (HPV L1)-derived peptides, two sequences, SPINNTKPHEAR and YIK, were chosen. Both have been recognized by means of sera from patients infected with HPV. The molecules, Fc-Ahx-SPINNTKPHEAR, Ac-C-Ahx-(Fc)KSPINNTKPHEAR, Ac-C-Ahx-SPINNTKPHEAR(Fc)K, C-Ahx-SPINNTKPHEAR, and (YIK)₂-Ahx-C, were designed, synthesized, and characterized. Our results suggest that peptides derived from the SPINNTKPHEAR sequence, containing ferrocene and cysteine residues, are not stable and not adequate for electrode surface modification. The surface of polycrystalline gold electrodes was modified with the peptides C-Ahx-SPINNTKPHEAR or (YIK)₂-Ahx-C through self-assembly. The modified polycrystalline gold electrodes were characterized via infrared spectroscopy and electrochemical measurements. The thermodynamic parameters, surface coverage factor, and medium pH effect were determined for these surfaces. The results indicate that surface modification depends on the peptide sequence (length, amino acid composition, polyvalence, etc.). The influence of antipeptide antibodies on the voltammetric response of the modified electrode was evaluated by comparing results obtained with pre-immune and post-immune serum samples.

  1. Tumour auto-contouring on 2d cine MRI for locally advanced lung cancer: A comparative study.

    PubMed

    Fast, Martin F; Eiben, Björn; Menten, Martin J; Wetscherek, Andreas; Hawkes, David J; McClelland, Jamie R; Oelfke, Uwe

    2017-12-01

    Radiotherapy guidance based on magnetic resonance imaging (MRI) is currently becoming a clinical reality. Fast 2d cine MRI sequences are expected to increase the precision of radiation delivery by facilitating tumour delineation during treatment. This study compares four auto-contouring algorithms for the task of delineating the primary tumour in six locally advanced (LA) lung cancer patients. Twenty-two cine MRI sequences were acquired using either a balanced steady-state free precession or a spoiled gradient echo imaging technique. Contours derived by the auto-contouring algorithms were compared against manual reference contours. A selection of eight image data sets was also used to assess the inter-observer delineation uncertainty. Algorithmically derived contours agreed well with the manual reference contours (median Dice similarity index: ⩾0.91). Multi-template matching and deformable image registration performed significantly better than feature-driven registration and the pulse-coupled neural network (PCNN). Neither MRI sequence nor image orientation was a conclusive predictor for algorithmic performance. Motion significantly degraded the performance of the PCNN. The inter-observer variability was of the same order of magnitude as the algorithmic performance. Auto-contouring of tumours on cine MRI is feasible in LA lung cancer patients. Despite large variations in implementation complexity, the different algorithms all have relatively similar performance. Copyright © 2017 The Author(s). Published by Elsevier B.V. All rights reserved.
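
    The Dice similarity index quoted above measures the overlap between two contours as 2|A∩B| / (|A| + |B|), where A and B are the sets of pixels inside each contour. A minimal sketch follows; the function name and the two small binary masks are illustrative assumptions and are not taken from the study.

```python
def dice_index(mask_a, mask_b):
    """Dice similarity of two binary masks given as equally sized nested lists."""
    # Collect the (row, column) coordinates of every foreground pixel.
    pixels_a = {(r, c) for r, row in enumerate(mask_a) for c, v in enumerate(row) if v}
    pixels_b = {(r, c) for r, row in enumerate(mask_b) for c, v in enumerate(row) if v}
    if not pixels_a and not pixels_b:
        return 1.0  # convention chosen here: two empty contours agree perfectly
    return 2 * len(pixels_a & pixels_b) / (len(pixels_a) + len(pixels_b))


# Hypothetical 3x3 masks: an automatic contour (4 pixels) and a manual one (3 pixels).
auto_contour = [[0, 1, 1],
                [0, 1, 1],
                [0, 0, 0]]
manual_contour = [[0, 1, 1],
                  [0, 1, 0],
                  [0, 0, 0]]
score = dice_index(auto_contour, manual_contour)
print(round(score, 3))
```

    A value of 1 means identical contours and 0 means no overlap, so the reported median of at least 0.91 indicates close agreement with the manual reference.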

  2. Visualization of genome signatures of eukaryote genomes by batch-learning self-organizing map with a special emphasis on Drosophila genomes.

    PubMed

    Abe, Takashi; Hamano, Yuta; Ikemura, Toshimichi

    2014-01-01

    A strategy of evolutionary studies that can compare vast numbers of genome sequences is becoming increasingly important with the remarkable progress of high-throughput DNA sequencing methods. We previously established a sequence alignment-free clustering method "BLSOM" for di-, tri-, and tetranucleotide compositions in genome sequences, which can characterize sequence characteristics (genome signatures) of a wide range of species. In the present study, we generated BLSOMs for tetra- and pentanucleotide compositions in approximately one million sequence fragments derived from 101 eukaryotes, for which almost complete genome sequences were available. BLSOM recognized phylotype-specific characteristics (e.g., key combinations of oligonucleotide frequencies) in the genome sequences, permitting phylotype-specific clustering of the sequences without any information regarding the species. In our detailed examination of 12 Drosophila species, the correlation between their phylogenetic classification and the classification on the BLSOMs was observed to visualize oligonucleotides diagnostic for species-specific clustering.
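
    The input to an oligonucleotide-composition method of this kind is a frequency vector per sequence fragment. The sketch below shows one plausible way to build a tetranucleotide frequency vector; it illustrates only the feature-extraction step, not the BLSOM itself, and the function name, the handling of ambiguous bases, and the example fragment are assumptions.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(fragment: str, k: int = 4) -> list:
    """Relative frequencies of all 4**k oligonucleotides, in lexicographic order."""
    fragment = fragment.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(fragment[i:i + k] for i in range(len(fragment) - k + 1))
    # Windows containing ambiguous bases (e.g. N) are not in `kmers` and are ignored.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


# Hypothetical fragment (assumption); real fragments would be far longer.
vector = kmer_frequencies("ACGTACGTAC")
print(len(vector), round(sum(vector), 6))
```

    Changing `k` to 5 gives the 1024-dimensional pentanucleotide vector; each fragment's vector would then be one input point for the clustering step.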

  3. Complete genome sequence and the expression pattern of plasmids of the model ethanologen Zymomonas mobilis ZM4 and its xylose-utilizing derivatives 8b and 2032.

    PubMed

    Yang, Shihui; Vera, Jessica M; Grass, Jeff; Savvakis, Giannis; Moskvin, Oleg V; Yang, Yongfu; McIlwain, Sean J; Lyu, Yucai; Zinonos, Irene; Hebert, Alexander S; Coon, Joshua J; Bates, Donna M; Sato, Trey K; Brown, Steven D; Himmel, Michael E; Zhang, Min; Landick, Robert; Pappas, Katherine M; Zhang, Yaoping

    2018-01-01

    Zymomonas mobilis is a natural ethanologen being developed and deployed as an industrial biofuel producer. To date, eight Z. mobilis strains have been completely sequenced and found to contain 2-8 native plasmids. However, systematic verification of predicted Z. mobilis plasmid genes and their contribution to cell fitness has not been hitherto addressed. Moreover, the precise number and identities of plasmids in Z. mobilis model strain ZM4 have been unclear. The lack of functional information about plasmid genes in ZM4 impedes ongoing studies for this model biofuel-producing strain. In this study, we determined the complete chromosome and plasmid sequences of ZM4 and its engineered xylose-utilizing derivatives 2032 and 8b. Compared to previously published and revised ZM4 chromosome sequences, the ZM4 chromosome sequence reported here contains 65 nucleotide sequence variations as well as a 2400-bp insertion. Four plasmids were identified in all three strains, with 150 plasmid genes predicted in strain ZM4 and 2032, and 153 plasmid genes predicted in strain 8b due to the insertion of heterologous DNA for expanded substrate utilization. Plasmid genes were then annotated using Blast2GO, InterProScan, and systems biology data analyses, and most genes were found to have apparent orthologs in other organisms or identifiable conserved domains. To verify plasmid gene prediction, RNA-Seq was used to map transcripts and also compare relative gene expression under various growth conditions, including anaerobic and aerobic conditions, or growth in different concentrations of biomass hydrolysates. Overall, plasmid genes were more responsive to varying hydrolysate concentrations than to oxygen availability. Additionally, our results indicated that although all plasmids were present in low copy number (about 1-2 per cell), the copy number of some plasmids varied under specific growth conditions or due to heterologous gene insertion. 
    The complete genome of ZM4 and two xylose-utilizing derivatives is reported in this study, with an emphasis on identifying and characterizing plasmid genes. Plasmid gene annotation, validation, expression levels at growth conditions of interest, and contribution to host fitness are reported for the first time.

  4. Comparative analysis of the full genome sequence of European bat lyssavirus type 1 and type 2 with other lyssaviruses and evidence for a conserved transcription termination and polyadenylation motif in the G-L 3' non-translated region.

    PubMed

    Marston, D A; McElhinney, L M; Johnson, N; Müller, T; Conzelmann, K K; Tordo, N; Fooks, A R

    2007-04-01

    We report the first full-length genomic sequences for European bat lyssavirus type-1 (EBLV-1) and type-2 (EBLV-2). The EBLV-1 genomic sequence was derived from a virus isolated from a serotine bat in Hamburg, Germany, in 1968 and the EBLV-2 sequence was derived from a virus isolate from a human case of rabies that occurred in Scotland in 2002. A long-distance PCR strategy was used to amplify the open reading frames (ORFs), followed by standard and modified RACE (rapid amplification of cDNA ends) techniques to amplify the 3' and 5' ends. The lengths of each complete viral genome for EBLV-1 and EBLV-2 were 11 966 and 11 930 base pairs, respectively, and follow the standard rhabdovirus genome organization of five viral proteins. Comparison with other lyssavirus sequences demonstrates variation in degrees of homology, with the genomic termini showing a high degree of complementarity. The nucleoprotein was the most conserved, both intra- and intergenotypically, followed by the polymerase (L), matrix proteins and glycoproteins, with the phosphoprotein being the most variable. In addition, we have shown that the two EBLVs utilize a conserved transcription termination and polyadenylation (TTP) motif, approximately 50 nt upstream of the L gene start codon. All available lyssavirus sequences to date, with the exception of Pasteur virus (PV) and PV-derived isolates, use the second TTP site. This observation may explain differences in pathogenicity between lyssavirus strains, dependent on the length of the untranslated region, which might affect transcriptional activity and RNA stability.

  5. The accuracy of ultrashort echo time MRI sequences for medical additive manufacturing

    PubMed Central

    Rijkhorst, Erik-Jan; Hofman, Mark; Forouzanfar, Tymour; Wolff, Jan

    2016-01-01

    Objectives: Additively manufactured bone models, implants and drill guides are becoming increasingly popular amongst maxillofacial surgeons and dentists. To date, such constructs are commonly manufactured using CT technology that induces ionizing radiation. Recently, ultrashort echo time (UTE) MRI sequences have been developed that allow radiation-free imaging of facial bones. The aim of the present study was to assess the feasibility of UTE MRI sequences for medical additive manufacturing (AM). Methods: Three morphologically different dry human mandibles were scanned using a CT and MRI scanner. Additionally, optical scans of all three mandibles were made to acquire a “gold standard”. All CT and MRI scans were converted into Standard Tessellation Language (STL) models and geometrically compared with the gold standard. To quantify the accuracy of the AM process, the CT, MRI and gold-standard STL models of one of the mandibles were additively manufactured, optically scanned and compared with the original gold-standard STL model. Results: Geometric differences between all three CT-derived STL models and the gold standard were <1.0 mm. All three MRI-derived STL models generally presented deviations <1.5 mm in the symphyseal and mandibular area. The AM process introduced minor deviations of <0.5 mm. Conclusions: This study demonstrates that MRI using UTE sequences is a feasible alternative to CT in generating STL models of the mandible and would therefore be suitable for surgical planning and AM. Further in vivo studies are necessary to assess the usability of UTE MRI sequences in clinical settings. PMID:26943179

  6. Cohabitational and marital histories of adults in Great Britain.

    PubMed

    Haskey, J

    1999-01-01

    This article presents findings on cohabitation derived from cohabitation and marriage histories collected in a specially designed module of the ONS Omnibus Survey. It examines the sequence of types of partnerships, and how this sequence varies by birth cohort of respondents. Also compared is the relative stability of cohabiting unions and married partnerships. Finally, the reasons for converting a cohabiting union into a marriage are analysed separately for men and women, and separately according to whether the marriage continued or ended.

  7. Not so bad after all: retroviruses and long terminal repeat retrotransposons as a source of new genes in vertebrates.

    PubMed

    Naville, M; Warren, I A; Haftek-Terreau, Z; Chalopin, D; Brunet, F; Levin, P; Galiana, D; Volff, J-N

    2016-04-01

    Viruses and transposable elements, once considered as purely junk and selfish sequences, have repeatedly been used as a source of novel protein-coding genes during the evolution of most eukaryotic lineages, a phenomenon called 'molecular domestication'. This is exemplified perfectly in mammals and other vertebrates, where many genes derived from long terminal repeat (LTR) retroelements (retroviruses and LTR retrotransposons) have been identified through comparative genomics and functional analyses. In particular, genes derived from gag structural protein and envelope (env) genes, as well as from the integrase-coding and protease-coding sequences, have been identified in humans and other vertebrates. Retroelement-derived genes are involved in many important biological processes including placenta formation, cognitive functions in the brain and immunity against retroelements, as well as in cell proliferation, apoptosis and cancer. These observations support an important role of retroelement-derived genes in the evolution and diversification of the vertebrate lineage. Copyright © 2016 European Society of Clinical Microbiology and Infectious Diseases. Published by Elsevier Ltd. All rights reserved.

  8. Using nearly full-genome HIV sequence data improves phylogeny reconstruction in a simulated epidemic

    PubMed Central

    Yebra, Gonzalo; Hodcroft, Emma B.; Ragonnet-Cronin, Manon L.; Pillay, Deenan; Brown, Andrew J. Leigh; Fraser, Christophe; Kellam, Paul; de Oliveira, Tulio; Dennis, Ann; Hoppe, Anne; Kityo, Cissy; Frampton, Dan; Ssemwanga, Deogratius; Tanser, Frank; Keshani, Jagoda; Lingappa, Jairam; Herbeck, Joshua; Wawer, Maria; Essex, Max; Cohen, Myron S.; Paton, Nicholas; Ratmann, Oliver; Kaleebu, Pontiano; Hayes, Richard; Fidler, Sarah; Quinn, Thomas; Novitsky, Vladimir; Haywards, Andrew; Nastouli, Eleni; Morris, Steven; Clark, Duncan; Kozlakidis, Zisis

    2016-01-01

    HIV molecular epidemiology studies analyse viral pol gene sequences due to their availability, but whole genome sequencing allows other genes to be used. We aimed to determine what gene(s) provide(s) the best approximation to the real phylogeny by analysing a simulated epidemic (created as part of the PANGEA_HIV project) with a known transmission tree. We sub-sampled a simulated dataset of 4662 sequences into different combinations of genes (gag-pol-env, gag-pol, gag, pol, env and partial pol) and sampling depths (100%, 60%, 20% and 5%), generating 100 replicates for each case. We built maximum-likelihood trees for each combination using RAxML (GTR + Γ), and compared their topologies to the corresponding true tree’s using CompareTree. The accuracy of the trees was significantly proportional to the length of the sequences used, with the gag-pol-env datasets showing the best performance and gag and partial pol sequences showing the worst. The lowest sampling depths (20% and 5%) greatly reduced the accuracy of tree reconstruction and showed high variability among replicates, especially when using the shortest gene datasets. In conclusion, using longer sequences derived from nearly whole genomes will improve the reliability of phylogenetic reconstruction. With low sample coverage, results can be highly variable, particularly when based on short sequences. PMID:28008945

  9. Using nearly full-genome HIV sequence data improves phylogeny reconstruction in a simulated epidemic.

    PubMed

    Yebra, Gonzalo; Hodcroft, Emma B; Ragonnet-Cronin, Manon L; Pillay, Deenan; Brown, Andrew J Leigh

    2016-12-23

    HIV molecular epidemiology studies analyse viral pol gene sequences due to their availability, but whole genome sequencing allows other genes to be used. We aimed to determine what gene(s) provide(s) the best approximation to the real phylogeny by analysing a simulated epidemic (created as part of the PANGEA_HIV project) with a known transmission tree. We sub-sampled a simulated dataset of 4662 sequences into different combinations of genes (gag-pol-env, gag-pol, gag, pol, env and partial pol) and sampling depths (100%, 60%, 20% and 5%), generating 100 replicates for each case. We built maximum-likelihood trees for each combination using RAxML (GTR + Γ), and compared their topologies to the corresponding true tree's using CompareTree. The accuracy of the trees was significantly proportional to the length of the sequences used, with the gag-pol-env datasets showing the best performance and gag and partial pol sequences showing the worst. The lowest sampling depths (20% and 5%) greatly reduced the accuracy of tree reconstruction and showed high variability among replicates, especially when using the shortest gene datasets. In conclusion, using longer sequences derived from nearly whole genomes will improve the reliability of phylogenetic reconstruction. With low sample coverage, results can be highly variable, particularly when based on short sequences.

  10. Transcriptome sequencing of different narrow-leafed lupin tissue types provides a comprehensive uni-gene assembly and extensive gene-based molecular markers

    PubMed Central

    Kamphuis, Lars G; Hane, James K; Nelson, Matthew N; Gao, Lingling; Atkins, Craig A; Singh, Karam B

    2015-01-01

    Narrow-leafed lupin (NLL; Lupinus angustifolius L.) is an important grain legume crop that is valuable for sustainable farming and is becoming recognized as a human health food. NLL breeding is directed at improving grain production, disease resistance, drought tolerance and health benefits. However, genetic and genomic studies have been hindered by a lack of extensive genomic resources for the species. Here, the generation, de novo assembly and annotation of transcriptome datasets derived from five different NLL tissue types of the reference accession cv. Tanjil are described. The Tanjil transcriptome was compared to transcriptomes of an early domesticated cv. Unicrop, a wild accession P27255, as well as accession 83A:476, together being the founding parents of two recombinant inbred line (RIL) populations. In silico predictions for transcriptome-derived gene-based length and SNP polymorphic markers were conducted and corroborated using a survey assembly sequence for NLL cv. Tanjil. This yielded extensive indel and SNP polymorphic markers for the two RIL populations. A total of 335 transcriptome-derived markers and 66 BAC-end sequence-derived markers were evaluated, and 275 polymorphic markers were selected to genotype the reference NLL 83A:476 × P27255 RIL population. This significantly improved the completeness, marker density and quality of the reference NLL genetic map. PMID:25060816

  11. Activation of Adhesion G Protein-coupled Receptors: AGONIST SPECIFICITY OF STACHEL SEQUENCE-DERIVED PEPTIDES.

    PubMed

    Demberg, Lilian M; Winkler, Jana; Wilde, Caroline; Simon, Kay-Uwe; Schön, Julia; Rothemund, Sven; Schöneberg, Torsten; Prömel, Simone; Liebscher, Ines

    2017-03-17

    Members of the adhesion G protein-coupled receptor (aGPCR) family carry an agonistic sequence within their large ectodomains. Peptides derived from this region, called the Stachel sequence, can activate the respective receptor. As the conserved core region of the Stachel sequence is highly similar between aGPCRs, the agonist specificity of Stachel sequence-derived peptides was tested between family members using cell culture-based second messenger assays. Stachel peptides derived from aGPCRs of subfamily VI (GPR110/ADGRF1, GPR116/ADGRF5) and subfamily VIII (GPR64/ADGRG2, GPR126/ADGRG6) are able to activate more than one member of the respective subfamily supporting their evolutionary relationship and defining them as pharmacological receptor subtypes. Extended functional analyses of the Stachel sequences and derived peptides revealed agonist promiscuity, not only within, but also between aGPCR subfamilies. For example, the Stachel-derived peptide of GPR110 (subfamily VI) can activate GPR64 and GPR126 (both subfamily VIII). Our results indicate that key residues in the Stachel sequence are very similar between aGPCRs allowing for agonist promiscuity of several Stachel-derived peptides. Therefore, aGPCRs appear to be pharmacologically more closely related than previously thought. Our findings have direct implications for many aGPCR studies, as potential functional overlap has to be considered for in vitro and in vivo studies. However, it also offers the possibility of a broader use of more potent peptides when the original Stachel sequence is less effective. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.

  12. Determination of the sequences of protein-derived peptides and peptide mixtures by mass spectrometry

    PubMed Central

    Morris, Howard R.; Williams, Dudley H.; Ambler, Richard P.

    1971-01-01

    Micro-quantities of protein-derived peptides have been converted into N-acetylated permethyl derivatives, and their sequences determined by low-resolution mass spectrometry without prior knowledge of their amino acid compositions or lengths. A new strategy is suggested for the mass spectrometric sequencing of oligopeptides or proteins, involving gel filtration of protein hydrolysates and subsequent sequence analysis of peptide mixtures. Finally, results are given that demonstrate for the first time the use of mass spectrometry for the analysis of a protein-derived peptide mixture, again without prior knowledge of the protein or components within the mixture. PMID:5158904

  13. Development of Scoring Functions for Antibody Sequence Assessment and Optimization

    PubMed Central

    Seeliger, Daniel

    2013-01-01

    Antibody development is still associated with substantial risks and difficulties as single mutations can radically change molecule properties like thermodynamic stability, solubility or viscosity. Since antibody generation methodologies cannot select and optimize for molecule properties which are important for biotechnological applications, careful sequence analysis and optimization is necessary to develop antibodies that fulfil the ambitious requirements of future drugs. While efforts to grasp the physical principles of undesired molecule properties from the very bottom are becoming increasingly powerful, the wealth of publicly available antibody sequences provides an alternative way to develop early assessment strategies for antibodies using a statistical approach which is the objective of this paper. Here, publicly available sequences were used to develop heuristic potentials for the framework regions of heavy and light chains of antibodies of human and murine origin. The potentials take into account position dependent probabilities of individual amino acids but also conditional probabilities which are inevitable for sequence assessment and optimization. It is shown that the potentials derived from human sequences clearly distinguish between human sequences and sequences from mice and, hence, can be used as a measure of humanness which compares a given sequence with the phenotypic pool of human sequences instead of comparing sequence identities to germline genes. Following this line, it is demonstrated that, using the developed potentials, humanization of an antibody can be described as a simple mathematical optimization problem and that the in-silico generated framework variants closely resemble native sequences in terms of predicted immunogenicity. PMID:24204701
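
    A statistical potential built from position-dependent amino acid probabilities can be sketched as a per-position log-probability score. The example below is an assumption-laden illustration, not the paper's method: it covers only the single-position term (the conditional, pairwise term mentioned above is omitted), uses an arbitrary pseudocount, expects gap-free aligned input of equal length, and all names and toy sequences are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_probabilities(aligned, pseudocount=1.0):
    """Per-position amino acid probabilities from equally long aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same aligned length")
    tables = []
    for i in range(length):
        counts = Counter(seq[i] for seq in aligned)
        # The pseudocount keeps unseen residues from getting probability zero.
        total = sum(counts[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        tables.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return tables


def log_score(seq, tables):
    """Sum of log-probabilities; higher means closer to the reference pool."""
    if len(seq) != len(tables):
        raise ValueError("sequence length does not match the probability tables")
    return sum(math.log(tables[i][aa]) for i, aa in enumerate(seq))


# Toy reference pool and query fragments (assumptions, not real antibody data).
reference_pool = ["EVQLV", "QVQLV", "EVQLL"]
tables = position_probabilities(reference_pool)
similar = log_score("EVQLV", tables)
dissimilar = log_score("DIVMT", tables)
print(similar > dissimilar)
```

    Under this scoring, replacing a residue with one that is more frequent at that position in the reference pool raises the score, which is the sense in which sequence optimization becomes a search over substitutions.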

  14. Data Interoperability of Whole Exome Sequencing (WES) Based Mutational Burden Estimates from Different Laboratories

    PubMed Central

    Qiu, Ping; Pang, Ling; Arreaza, Gladys; Maguire, Maureen; Chang, Ken C. N.; Marton, Matthew J.; Levitan, Diane

    2016-01-01

    Immune checkpoint inhibitors, which unleash a patient’s own T cells to kill tumors, are revolutionizing cancer treatment. Several independent studies suggest that higher non-synonymous mutational burden assessed by whole exome sequencing (WES) in tumors is associated with improved objective response, durable clinical benefit, and progression-free survival with immune checkpoint inhibitor treatment. Next-generation sequencing (NGS) is a promising technology being used in the clinic to direct patient treatment. Cancer genome WES poses a unique challenge due to tumor heterogeneity and sequencing artifacts introduced by formalin-fixed, paraffin-embedded (FFPE) tissue. In order to evaluate the data interoperability of WES data from different sources to survey the tumor mutational landscape, we compared WES data of several tumor/normal matched samples from five commercial vendors. A large data discrepancy was observed from vendors’ self-reported data. Independent data analysis from vendors’ raw NGS data shows that whole exome sequencing data from qualified vendors can be combined and analyzed uniformly to derive comparable quantitative estimates of tumor mutational burden. PMID:27136543

  15. FARME DB: a functional antibiotic resistance element database

    PubMed Central

    Wallace, James C.; Port, Jesse A.; Smith, Marissa N.; Faustman, Elaine M.

    2017-01-01

    Antibiotic resistance (AR) is a major global public health threat but few resources exist that catalog AR genes outside of a clinical context. Current AR sequence databases are assembled almost exclusively from genomic sequences derived from clinical bacterial isolates and thus do not include many microbial sequences derived from environmental samples that confer resistance in functional metagenomic studies. These environmental metagenomic sequences often show little or no similarity to AR sequences from clinical isolates using standard classification criteria. In addition, existing AR databases provide no information about flanking sequences containing regulatory or mobile genetic elements. To help address this issue, we created an annotated database of DNA and protein sequences derived exclusively from environmental metagenomic sequences showing AR in laboratory experiments. Our Functional Antibiotic Resistant Metagenomic Element (FARME) database is a compilation of publicly available DNA sequences and predicted protein sequences conferring AR as well as regulatory elements, mobile genetic elements and predicted proteins flanking antibiotic resistant genes. FARME is the first database to focus on functional metagenomic AR gene elements and provides a resource to better understand AR in the 99% of bacteria which cannot be cultured and the relationship between environmental AR sequences and antibiotic resistant genes derived from cultured isolates. Database URL: http://staff.washington.edu/jwallace/farme PMID:28077567

  16. Alleviating artifacts in 1H MRI thermometry by single scan spatiotemporal encoding.

    PubMed

    Schmidt, Rita; Frydman, Lucio

    2013-10-01

    Recent years have seen an increased interest in combining MRI thermometry with devices capable of destroying malignancies by heat ablation. Expected from the MR protocols are accurate and fast thermal characterizations, providing real time feedback on restricted tissue volumes and/or rapidly moving organs like liver. This article explores the potential advantages of relying on spatiotemporally encoded (SPEN) sequences for retrieving real-time thermometric images based on the water's proton resonance frequency (PRF) shifts. Hybrid spatiotemporal/k-space encoding single-scan MRI experiments were implemented on animal and human scanners, and their abilities to deliver single- and multi-slice real-time thermometric measurements based on PRF-derived phase maps in phantoms and in vivo, were compared against echo planar imaging (EPI) and gradient-echo counterparts. Under comparable acquisition conditions, SPEN exhibited advantages vis-à-vis EPI in terms of dealing with inhomogeneous magnetic field distortions, with shifts arising due to changes in the central frequency offsets, with PRF distributions, and for zooming into restricted fields-of-view without special pulse sequence provisions. This work confirms the ability of SPEN sequences, particularly when implemented under fully-refocused conditions, to exploit their built-in robustness to shift- and field-derived inhomogeneities for monitoring thermal changes in real-time under in vitro and in vivo conditions.

  17. Unveiling the metabolic potential of two soil-derived microbial consortia selected on wheat straw

    PubMed Central

    Jiménez, Diego Javier; Chaves-Moreno, Diego; van Elsas, Jan Dirk

    2015-01-01

    Based on the premise that plant biomass can be efficiently degraded by mixed microbial cultures and/or enzymes, we here applied a targeted metagenomics-based approach to explore the metabolic potential of two forest soil-derived lignocellulolytic microbial consortia, denoted RWS and TWS (bred on wheat straw). Using the metagenomes of three selected batches of two experimental systems, about 1.2 Gb of sequence was generated. Comparative analyses revealed an overrepresentation of predicted carbohydrate transporters (ABC, TonB and phosphotransferases), two-component sensing systems and β-glucosidases/galactosidases in the two consortia as compared to the forest soil inoculum. Additionally, “profiling” of carbohydrate-active enzymes showed significant enrichments of several genes encoding glycosyl hydrolases of families GH2, GH43, GH92 and GH95. Sequence analyses revealed these to be most strongly affiliated to genes present on the genomes of Sphingobacterium, Bacteroides, Flavobacterium and Pedobacter spp. Assembly of the RWS and TWS metagenomes generated 16,536 and 15,902 contigs of ≥10 Kb, respectively. Thirteen contigs, containing 39 glycosyl hydrolase genes, constitute novel (hemi)cellulose utilization loci with affiliation to sequences primarily found in the Bacteroidetes. Overall, this study provides deep insight into the plant polysaccharide-degrading capabilities of microbial consortia bred from forest soil, highlighting their biotechnological potential. PMID:26343383

  18. Repair of DNA double-strand breaks by templated nucleotide sequence insertions derived from distant regions of the genome.

    PubMed

    Onozawa, Masahiro; Zhang, Zhenhua; Kim, Yoo Jung; Goldberg, Liat; Varga, Tamas; Bergsagel, P Leif; Kuehl, W Michael; Aplan, Peter D

    2014-05-27

    We used the I-SceI endonuclease to produce DNA double-strand breaks (DSBs) and observed that a fraction of these DSBs were repaired by insertion of sequences, which we termed "templated sequence insertions" (TSIs), derived from distant regions of the genome. These TSIs were derived from genic, retrotransposon, or telomere sequences and were not deleted from the donor site in the genome, leading to the hypothesis that they were derived from reverse-transcribed RNA. Cotransfection of RNA and an I-SceI expression vector demonstrated insertion of RNA-derived sequences at the DNA-DSB site, and TSIs were suppressed by reverse-transcriptase inhibitors. Both observations support the hypothesis that TSIs were derived from RNA templates. In addition, similar insertions were detected at sites of DNA DSBs induced by transcription activator-like effector nuclease proteins. Whole-genome sequencing of myeloma cell lines revealed additional TSIs, demonstrating that repair of DNA DSBs via insertion was not restricted to experimentally produced DNA DSBs. Analysis of publicly available databases revealed that many of these TSIs are polymorphic in the human genome. Taken together, these results indicate that insertional events should be considered as alternatives to gross chromosomal rearrangements in the interpretation of whole-genome sequence data and that this mutagenic form of DNA repair may play a role in genetic disease, exon shuffling, and mammalian evolution.

  19. Genome sequence and analysis of Lactobacillus helveticus

    PubMed Central

    Cremonesi, Paola; Chessa, Stefania; Castiglioni, Bianca

    2013-01-01

    The microbiological characterization of lactobacilli is historically well developed, but the genomic analysis is recent. Because of the widespread use of Lactobacillus helveticus in cheese technology, information concerning the heterogeneity in this species is accumulating rapidly. Recently, the genomes of five L. helveticus strains were sequenced to completion and compared with other genomically characterized lactobacilli. The genomic analysis of the first sequenced strain, L. helveticus DPC 4571, isolated from cheese and selected for its characteristics of rapid lysis and high proteolytic activity, has revealed a plethora of genes with industrial potential including those responsible for key metabolic functions such as proteolysis, lipolysis, and cell lysis. These genes and their derived enzymes can facilitate the production of cheese and cheese derivatives with potential for use as ingredients in consumer foods. In addition, L. helveticus has the potential to produce peptides with a biological function, such as angiotensin converting enzyme (ACE) inhibitory activity, in fermented dairy products, demonstrating the therapeutic value of this species. A most intriguing feature of the genome of L. helveticus is the remarkable similarity in gene content with many intestinal lactobacilli. Comparative genomics has allowed the identification of key gene sets that facilitate a variety of lifestyles including adaptation to food matrices or the gastrointestinal tract. As genome sequence and functional genomic information continues to explode, key features of the genomes of L. helveticus strains continue to be discovered, answering many questions but also raising many new ones. PMID:23335916

  20. Mitochondrial and Nuclear Ribosomal DNA Evidence Supports the Existence of a New Trichuris Species in the Endangered François’ Leaf-Monkey

    PubMed Central

    Liu, Guo-Hua; Gasser, Robin B.; Nejsum, Peter; Wang, Yan; Chen, Qiang; Song, Hui-Qun; Zhu, Xing-Quan

    2013-01-01

    The whipworm of humans, Trichuris trichiura, is responsible for a neglected tropical disease (NTD) of major importance in tropical and subtropical countries of the world. Whipworms also infect animal hosts, including pigs, dogs and non-human primates, and cause clinical disease (trichuriasis) similar to that of humans. Although Trichuris species are usually considered to be host specific, it is not clear whether non-human primates are infected with T. trichiura or other species. In the present study, we sequenced the complete mitochondrial (mt) genome as well as the first and second internal transcribed spacers (ITS-1 and ITS-2) of Trichuris from the François’ leaf-monkey (langur), and compared them with homologous sequences from human- and pig-derived Trichuris. In addition, sequence comparison of a conserved mt ribosomal gene among multiple individual whipworms revealed substantial nucleotide differences among these three host species but limited sequence variation within each of them. The molecular data indicate that the monkey-derived whipworm is a separate species from that of humans. Future work should focus on detailed population genetic and morphological studies (by electron microscopy) of whipworms from various non-human primates and humans. PMID:23840431

  1. Mitochondrial and nuclear ribosomal DNA evidence supports the existence of a new Trichuris species in the endangered François' leaf-monkey.

    PubMed

    Liu, Guo-Hua; Gasser, Robin B; Nejsum, Peter; Wang, Yan; Chen, Qiang; Song, Hui-Qun; Zhu, Xing-Quan

    2013-01-01

    The whipworm of humans, Trichuris trichiura, is responsible for a neglected tropical disease (NTD) of major importance in tropical and subtropical countries of the world. Whipworms also infect animal hosts, including pigs, dogs and non-human primates, and cause clinical disease (trichuriasis) similar to that of humans. Although Trichuris species are usually considered to be host specific, it is not clear whether non-human primates are infected with T. trichiura or other species. In the present study, we sequenced the complete mitochondrial (mt) genome as well as the first and second internal transcribed spacers (ITS-1 and ITS-2) of Trichuris from the François' leaf-monkey (langur), and compared them with homologous sequences from human- and pig-derived Trichuris. In addition, sequence comparison of a conserved mt ribosomal gene among multiple individual whipworms revealed substantial nucleotide differences among these three host species but limited sequence variation within each of them. The molecular data indicate that the monkey-derived whipworm is a separate species from that of humans. Future work should focus on detailed population genetic and morphological studies (by electron microscopy) of whipworms from various non-human primates and humans.

  2. In-vitro Assessment of Knee MRI in the Presence of Metal Implants Comparing MAVRIC-SL and Conventional FSE Sequences at 1.5 and 3 Tesla Field Strength

    PubMed Central

    Liebl, Hans; Heilmeier, Ursula; Lee, Sonia; Nardo, Lorenzo; Patsch, Janina; Schuppert, Christopher; Han, Misung; Rondak, Ina-Christine; Banerjee, Suchandrima; Koch, Kevin; Link, Thomas M.; Krug, Roland

    2014-01-01

    PURPOSE To assess lesion detection and artifact size reduction of a MAVRIC-SEMAC hybrid sequence (MAVRIC-SL) compared to standard sequences at 1.5T and 3T in porcine knee specimens with metal hardware. METHODS Artificial cartilage and bone lesions of defined size were created in the proximity of titanium and steel screws with 2.5 mm diameter in 12 porcine knee specimens and were imaged at 1.5T and 3T MRI with MAVRIC-SL PD and STIR, standard FSE T2 PD and STIR and fat-saturated T2 FSE sequences. Three radiologists blinded to the lesion locations assessed lesion detection rates on randomized images for each sequence using ROC. Artifact length and width were measured. RESULTS Metal artifact sizes were largest in the presence of steel screws at 3T (FSE T2 FS: 28.7 cm²) and 1.5T (16.03 cm²). MAVRIC-SL PD and STIR reduced artifact sizes at both 3T (1.43 cm²; 2.46 cm²) and 1.5T (1.16 cm²; 1.59 cm²) compared to FS T2 FSE sequences (27.57 cm²; 13.20 cm²). At 3T, ROC derived AUC values using MAVRIC-SL sequences were significantly higher compared to standard sequences (MAVRIC-PD: 0.87, versus FSE-T2-FS: 0.73 (p=0.025); MAVRIC-STIR: 0.9 versus T2-STIR: 0.78 (p=0.001) and versus FSE-T2-FS: 0.73 (p=0.026)). Similar values were observed at 1.5T. Comparison of 3T and 1.5T showed no significant differences (MAVRIC-SL PD: p=0.382; MAVRIC-SL STIR: p=0.071). CONCLUSION MAVRIC-SL sequences provided superior lesion detection and reduced metal artifact size at both 1.5T and 3T compared to conventionally used FSE sequences. No significant disadvantage was found comparing MAVRIC-SL at 3T and 1.5T, though metal artifacts at 3T were larger. PMID:24912802
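
    The AUC figures quoted above summarize how well reader confidence scores separate images with a lesion from images without one. A minimal sketch of the rank-based (Mann-Whitney) AUC estimate follows; the function name and the confidence scores are hypothetical, and the abstract does not state which ROC procedure the study actually used.

```python
def roc_auc(scores_with_lesion, scores_without_lesion):
    """Rank-based AUC: chance that a lesion image outscores a lesion-free one.

    Ties between the two groups count as half a win.
    """
    if not scores_with_lesion or not scores_without_lesion:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for pos in scores_with_lesion:
        for neg in scores_without_lesion:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_with_lesion) * len(scores_without_lesion))


# Hypothetical 5-point confidence ratings (illustrative values, not study data).
example_auc = roc_auc([5, 4, 4, 2], [1, 2, 3, 4])
```

    An AUC of 0.5 corresponds to chance-level detection and 1.0 to perfect separation of the two groups.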

  3. In-depth genome analyses of viruses from vaccine-derived rabies cases and corresponding live-attenuated oral rabies vaccines.

    PubMed

    Pfaff, Florian; Müller, Thomas; Freuling, Conrad M; Fehlner-Gardiner, Christine; Nadin-Davis, Susan; Robardet, Emmanuelle; Cliquet, Florence; Vuta, Vlad; Hostnik, Peter; Mettenleiter, Thomas C; Beer, Martin; Höper, Dirk

    2018-02-10

    Live-attenuated rabies virus strains such as those derived from the field isolate Street Alabama Dufferin (SAD) have been used extensively and very effectively as oral rabies vaccines for the control of fox rabies in both Europe and Canada. Although these vaccines are safe, some cases of vaccine-derived rabies have been detected during rabies surveillance accompanying these campaigns. In recent analysis it was shown that some commercial SAD vaccines consist of diverse viral populations, rather than clonal genotypes. For cases of vaccine-derived rabies, only consensus sequence data have been available to date and information concerning their population diversity was thus lacking. In our study, we used high-throughput sequencing to analyze 11 cases of vaccine-derived rabies, and compared their viral population diversity to the related oral rabies vaccines using pairwise Manhattan distances. This extensive deep sequencing analysis of vaccine-derived rabies cases observed during oral vaccination programs provided deeper insights into the effect of accidental in vivo replication of genetically diverse vaccine strains in the central nervous system of target and non-target species under field conditions. The viral population in vaccine-derived cases appeared to be clonal in contrast to their parental vaccines. The change from a state of high population diversity present in the vaccine batches to a clonal genotype in the affected animal may indicate the presence of a strong bottleneck during infection. In conclusion, it is very likely that these few cases are the consequence of host factors and not the result of the selection of a more virulent genotype. Furthermore, this type of vaccine-derived rabies leads to the selection of clonal genotypes and the selected variants were genetically very similar to potent SAD vaccines that have undergone a history of in vitro selection. Copyright © 2018. Published by Elsevier Ltd.
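
    The pairwise Manhattan distance mentioned above can be illustrated with a short sketch. It assumes each viral population is summarized as a vector of variant frequencies over the same genome positions; that representation, the function names, and the example values are assumptions made here, not details given in the abstract.

```python
def manhattan_distance(freqs_a, freqs_b):
    """Sum of absolute differences between two equal-length frequency vectors."""
    if len(freqs_a) != len(freqs_b):
        raise ValueError("frequency vectors must cover the same sites")
    return sum(abs(a - b) for a, b in zip(freqs_a, freqs_b))


def pairwise_distances(samples):
    """Return {(name_i, name_j): distance} for every unordered pair of samples."""
    names = sorted(samples)
    return {
        (names[i], names[j]): manhattan_distance(samples[names[i]], samples[names[j]])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }


# Hypothetical variant frequencies at four sites (illustrative values only).
example_samples = {
    "vaccine_batch": [0.50, 0.25, 0.75, 0.00],
    "case_isolate": [1.00, 0.00, 1.00, 0.00],
}
example_distances = pairwise_distances(example_samples)
```

    In this toy input the clonal isolate has every site fixed at 0 or 1, while the diverse vaccine batch has intermediate frequencies, so the distance grows with the diversity that was lost.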

  4. Anomalous Diffusion Measured by a Twice-Refocused Spin Echo Pulse Sequence: Analysis Using Fractional Order Calculus

    PubMed Central

    2011-01-01

    Purpose To theoretically develop and experimentally validate a formalism based on a fractional order calculus (FC) diffusion model to characterize anomalous diffusion in brain tissues measured with a twice-refocused spin-echo (TRSE) pulse sequence. Materials and Methods The FC diffusion model is the fractional order generalization of the Bloch-Torrey equation. Using this model, an analytical expression was derived to describe the diffusion-induced signal attenuation in a TRSE pulse sequence. To experimentally validate this expression, a set of diffusion-weighted (DW) images was acquired at 3 Tesla from healthy human brains using a TRSE sequence with twelve b-values ranging from 0 to 2,600 s/mm². For comparison, DW images were also acquired using a Stejskal-Tanner diffusion gradient in a single-shot spin-echo echo planar sequence. For both datasets, a Levenberg-Marquardt fitting algorithm was used to extract three parameters: diffusion coefficient D, fractional order derivative in space β, and a spatial parameter μ (in units of μm). Using adjusted R-squared values and standard deviations, D, β and μ values and the goodness-of-fit in three specific regions of interest (ROI) in white matter, gray matter, and cerebrospinal fluid were evaluated for each of the two datasets. In addition, spatially resolved parametric maps were assessed qualitatively. Results The analytical expression for the TRSE sequence, derived from the FC diffusion model, accurately characterized the diffusion-induced signal loss in brain tissues at high b-values. In the selected ROIs, the goodness-of-fit and standard deviations for the TRSE dataset were comparable with the results obtained from the Stejskal-Tanner dataset, demonstrating the robustness of the FC model across multiple data acquisition strategies. Qualitatively, the D, β, and μ maps from the TRSE dataset exhibited fewer artifacts, reflecting the improved immunity to eddy currents.
Conclusion The diffusion-induced signal attenuation in a TRSE pulse sequence can be described by an FC diffusion model at high b-values. This model performs equally well for data acquired from the human brain tissues with a TRSE pulse sequence or a conventional Stejskal-Tanner sequence. PMID:21509877
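
    For orientation, the closed form usually quoted in the FC literature for a Stejskal-Tanner gradient pair is sketched below. It is not taken from this abstract: the gyromagnetic ratio γ, gradient amplitude G_d, pulse width δ, and pulse separation Δ are symbols assumed here, and the TRSE expression the authors derive is not reproduced.

```latex
\frac{S}{S_0} = \exp\!\left[ -D\,\mu^{2(\beta-1)}\,(\gamma G_d \delta)^{2\beta}
\left( \Delta - \frac{2\beta-1}{2\beta+1}\,\delta \right) \right]
```

    Setting β = 1 removes the dependence on μ and recovers the classical mono-exponential decay exp[-D (γ G_d δ)² (Δ - δ/3)], which is why β is read as a measure of departure from ordinary diffusion.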

  5. Anomalous diffusion measured by a twice-refocused spin echo pulse sequence: analysis using fractional order calculus.

    PubMed

    Gao, Qing; Srinivasan, Girish; Magin, Richard L; Zhou, Xiaohong Joe

    2011-05-01

    To theoretically develop and experimentally validate a formalism based on a fractional order calculus (FC) diffusion model to characterize anomalous diffusion in brain tissues measured with a twice-refocused spin-echo (TRSE) pulse sequence. The FC diffusion model is the fractional order generalization of the Bloch-Torrey equation. Using this model, an analytical expression was derived to describe the diffusion-induced signal attenuation in a TRSE pulse sequence. To experimentally validate this expression, a set of diffusion-weighted (DW) images was acquired at 3 Tesla from healthy human brains using a TRSE sequence with twelve b-values ranging from 0 to 2600 s/mm². For comparison, DW images were also acquired using a Stejskal-Tanner diffusion gradient in a single-shot spin-echo echo planar sequence. For both datasets, a Levenberg-Marquardt fitting algorithm was used to extract three parameters: diffusion coefficient D, fractional order derivative in space β, and a spatial parameter μ (in units of μm). Using adjusted R-squared values and standard deviations, D, β, and μ values and the goodness-of-fit in three specific regions of interest (ROIs) in white matter, gray matter, and cerebrospinal fluid, respectively, were evaluated for each of the two datasets. In addition, spatially resolved parametric maps were assessed qualitatively. The analytical expression for the TRSE sequence, derived from the FC diffusion model, accurately characterized the diffusion-induced signal loss in brain tissues at high b-values. In the selected ROIs, the goodness-of-fit and standard deviations for the TRSE dataset were comparable with the results obtained from the Stejskal-Tanner dataset, demonstrating the robustness of the FC model across multiple data acquisition strategies. Qualitatively, the D, β, and μ maps from the TRSE dataset exhibited fewer artifacts, reflecting the improved immunity to eddy currents.
The diffusion-induced signal attenuation in a TRSE pulse sequence can be described by an FC diffusion model at high b-values. This model performs equally well for data acquired from the human brain tissues with a TRSE pulse sequence or a conventional Stejskal-Tanner sequence. Copyright © 2011 Wiley-Liss, Inc.

  6. Identification of human-to-human transmissibility factors in PB2 proteins of influenza A by large-scale mutual information analysis

    PubMed Central

    Miotto, Olivo; Heiny, AT; Tan, Tin Wee; August, J Thomas; Brusic, Vladimir

    2008-01-01

    Background The identification of mutations that confer unique properties to a pathogen, such as host range, is of fundamental importance in the fight against disease. This paper describes a novel method for identifying amino acid sites that distinguish specific sets of protein sequences, by comparative analysis of matched alignments. The use of mutual information to identify distinctive residues responsible for functional variants makes this approach highly suitable for analyzing large sets of sequences. To support mutual information analysis, we developed the AVANA software, which utilizes sequence annotations to select sets for comparison, according to user-specified criteria. The method presented was applied to an analysis of influenza A PB2 protein sequences, with the objective of identifying the components of adaptation to human-to-human transmission, and reconstructing the mutation history of these components. Results We compared over 3,000 PB2 protein sequences of human-transmissible and avian isolates, to produce a catalogue of sites involved in adaptation to human-to-human transmission. This analysis identified 17 characteristic sites, five of which have been present in human-transmissible strains since the 1918 Spanish flu pandemic. Sixteen of these sites are located in functional domains, suggesting they may play functional roles in host-range specificity. The catalogue of characteristic sites was used to derive sequence signatures from historical isolates. These signatures, arranged in chronological order, reveal an evolutionary timeline for the adaptation of the PB2 protein to human hosts. Conclusion By providing the most complete elucidation to date of the functional components participating in PB2 protein adaptation to humans, this study demonstrates that mutual information is a powerful tool for comparative characterization of sequence sets. 
In addition to confirming previously reported findings, several novel characteristic sites within PB2 are reported. Sequence signatures generated using the characteristic sites catalogue characterize concisely the adaptation characteristics of individual isolates. Evolutionary timelines derived from signatures of early human influenza isolates suggest that characteristic variants emerged rapidly, and remained remarkably stable through subsequent pandemics. In addition, the signatures of human-infecting H5N1 isolates suggest that this avian subtype has low pandemic potential at present, although it presents more human adaptation components than most avian subtypes. PMID:18315849
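
    The core idea, scoring each alignment column by the mutual information between the residue observed there and the set a sequence belongs to, can be sketched as follows. This is a generic illustration under stated assumptions: the function names and the toy alignment are hypothetical, and the AVANA software may compute or normalize the quantity differently.

```python
from collections import Counter
from math import log2


def mutual_information(residues, labels):
    """Mutual information (bits) between residues at one column and group labels."""
    n = len(residues)
    if n == 0 or n != len(labels):
        raise ValueError("need one label per residue")
    joint = Counter(zip(residues, labels))
    residue_counts = Counter(residues)
    label_counts = Counter(labels)
    mi = 0.0
    for (residue, label), count in joint.items():
        # p(x, y) / (p(x) * p(y)) written with raw counts.
        ratio = count * n / (residue_counts[residue] * label_counts[label])
        mi += (count / n) * log2(ratio)
    return mi


def column_scores(alignment, labels):
    """Score every column of an equal-length alignment against the labels."""
    width = len(alignment[0])
    return [
        mutual_information([seq[i] for seq in alignment], labels)
        for i in range(width)
    ]


# Toy two-column alignment (hypothetical): column 0 is invariant,
# column 1 separates the two sets perfectly.
toy_alignment = ["AK", "AK", "AE", "AE"]
toy_labels = ["human", "human", "avian", "avian"]
toy_scores = column_scores(toy_alignment, toy_labels)
```

    Columns that score near the entropy of the labels (1 bit for two equally sized sets) are candidates for characteristic sites; invariant columns score zero.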

  7. CompaRNA: a server for continuous benchmarking of automated methods for RNA secondary structure prediction

    PubMed Central

    Puton, Tomasz; Kozlowski, Lukasz P.; Rother, Kristian M.; Bujnicki, Janusz M.

    2013-01-01

    We present a continuous benchmarking approach for the assessment of RNA secondary structure prediction methods implemented in the CompaRNA web server. As of 3 October 2012, the performance of 28 single-sequence and 13 comparative methods has been evaluated on RNA sequences/structures released weekly by the Protein Data Bank. We also provide a static benchmark generated on RNA 2D structures derived from the RNAstrand database. Benchmarks on both data sets offer insight into the relative performance of RNA secondary structure prediction methods on RNAs of different size and with respect to different types of structure. According to our tests, on the average, the most accurate predictions obtained by a comparative approach are generated by CentroidAlifold, MXScarna, RNAalifold and TurboFold. On the average, the most accurate predictions obtained by single-sequence analyses are generated by CentroidFold, ContextFold and IPknot. The best comparative methods typically outperform the best single-sequence methods if an alignment of homologous RNA sequences is available. This article presents the results of our benchmarks as of 3 October 2012, whereas the rankings presented online are continuously updated. We will gladly include new prediction methods and new measures of accuracy in the new editions of CompaRNA benchmarks. PMID:23435231

  8. Isolation, characterization, and evaluation of three Citrus sinensis-derived constitutive gene promoters.

    PubMed

    Erpen, L; Tavano, E C R; Harakava, R; Dutt, M; Grosser, J W; Piedade, S M S; Mendes, B M J; Mourão Filho, F A A

    2018-05-23

    Regulatory sequences from the citrus constitutive genes cyclophilin (CsCYP), glyceraldehyde-3-phosphate dehydrogenase C2 (CsGAPC2), and elongation factor 1-alpha (CsEF1) were isolated, fused to the uidA gene, and qualitatively and quantitatively evaluated in transgenic sweet orange plants. The 5' upstream region of a gene (the promoter) is the most important component for the initiation and regulation of gene transcription of both native genes and transgenes in plants. The isolation and characterization of gene regulatory sequences are essential to the development of intragenic or cisgenic genetic manipulation strategies, which imply the use of genetic material from the same species or from closely related species. We describe herein the isolation and evaluation of the promoter sequence from three constitutively expressed citrus genes: cyclophilin (CsCYP), glyceraldehyde-3-phosphate dehydrogenase C2 (CsGAPC2), and elongation factor 1-alpha (CsEF1). The functionality of the promoters was confirmed by a histochemical GUS assay in leaves, stems, and roots of stably transformed citrus plants expressing the promoter-uidA construct. Lower uidA mRNA levels were detected when the transgene was under the control of citrus promoters as compared to the expression under the control of the CaMV35S promoter. The association of the uidA gene with the citrus-derived promoters resulted in mRNA levels of up to 60-41.8% of the value obtained with the construct containing CaMV35S driving the uidA gene. Moreover, a lower inter-individual variability in transgene expression was observed amongst the different transgenic lines, where gene constructs containing citrus-derived promoters were used. In silico analysis of the citrus-derived promoter sequences revealed that their activity may be controlled by several putative cis-regulatory elements. These citrus promoters will expand the availability of regulatory sequences for driving gene expression in citrus gene-modification programs.

  9. Comparative analysis of Campylobacter isolates from wild birds and chickens using MALDI-TOF MS, biochemical testing, and DNA sequencing.

    PubMed

    Lawton, Samantha J; Weis, Allison M; Byrne, Barbara A; Fritz, Heather; Taff, Conor C; Townsend, Andrea K; Weimer, Bart C; Mete, Aslı; Wheeler, Sarah; Boyce, Walter M

    2018-05-01

    Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) was compared to conventional biochemical testing methods and nucleic acid analyses (16S rDNA sequencing, hippurate hydrolysis gene testing, whole genome sequencing [WGS]) for species identification of Campylobacter isolates obtained from chickens (Gallus gallus domesticus, n = 8), American crows (Corvus brachyrhynchos, n = 17), a mallard duck (Anas platyrhynchos, n = 1), and a western scrub-jay (Aphelocoma californica, n = 1). The test results for all 27 isolates were in 100% agreement between MALDI-TOF MS and the combined results of 16S rDNA sequencing and the hippurate hydrolysis gene PCR (p = 0.0027, kappa = 1). Likewise, the identifications derived from WGS from a subset of 14 isolates were in 100% agreement with the MALDI-TOF MS identification. In contrast, biochemical testing misclassified 5 isolates of C. jejuni as C. coli, and 16S rDNA sequencing alone was not able to differentiate between C. coli and C. jejuni for 11 sequences (p = 0.1573, kappa = 0.0857) when compared to MALDI-TOF MS and WGS. No agreement was observed between MALDI-TOF MS dendrograms and the phylogenetic relationships revealed by rDNA sequencing or WGS. Our results confirm that MALDI-TOF MS is a fast and reliable method for identifying Campylobacter isolates to the species level from wild birds and chickens, but not for elucidating phylogenetic relationships among Campylobacter isolates.
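
    The kappa values above measure agreement between two identification methods beyond what chance alone would produce. A minimal sketch of Cohen's kappa for two lists of species calls follows; the function name and the example calls are hypothetical and do not reproduce the study's per-isolate data.

```python
from collections import Counter


def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two methods that each label the same isolates."""
    n = len(calls_a)
    if n == 0 or n != len(calls_b):
        raise ValueError("both methods must label the same non-empty set")
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    freq_a = Counter(calls_a)
    freq_b = Counter(calls_b)
    # Agreement expected if the two methods assigned labels independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # undefined when both methods use one identical label
    return (observed - expected) / (1.0 - expected)


# Hypothetical calls for five isolates identified identically by both methods.
method_1 = ["jejuni", "jejuni", "jejuni", "coli", "coli"]
method_2 = ["jejuni", "jejuni", "jejuni", "coli", "coli"]
kappa_identical = cohens_kappa(method_1, method_2)

# Hypothetical calls that agree only as often as chance would predict.
kappa_chance = cohens_kappa(
    ["jejuni", "jejuni", "coli", "coli"],
    ["jejuni", "coli", "jejuni", "coli"],
)
```

    A kappa of 1 means complete agreement, as reported for MALDI-TOF MS against the combined molecular tests, while values near 0 mean the methods agree no more often than chance.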

  10. Neutrality and evolvability of designed protein sequences

    NASA Astrophysics Data System (ADS)

    Bhattacherjee, Arnab; Biswas, Parbati

    2010-07-01

    The effect of foldability on a protein’s evolvability is analyzed by a two-pronged approach consisting of a self-consistent mean-field theory and Monte Carlo simulations. Theory and simulation models representing protein sequences with binary patterning of amino acid residues compatible with a particular foldability criterion are used. This generalized foldability criterion is derived using the high temperature cumulant expansion approximating the free energy of folding. The effect of cumulative point mutations on these designed proteins is studied under neutral conditions. Robustness, a protein’s ability to tolerate random point mutations, is determined under a selective pressure for stability (ΔΔG) for the theory-designed sequences, which are found to be more robust than the Monte Carlo and mean-field-biased Monte Carlo generated sequences. The results show that this foldability criterion selects viable protein sequences more effectively than the Monte Carlo method, which has a marked effect on how the selective pressure shapes the evolutionary sequence space. These observations may impact de novo sequence design and its applications in protein engineering.

  11. MetaCompare: A computational pipeline for prioritizing environmental resistome risk.

    PubMed

    Oh, Min; Pruden, Amy; Chen, Chaoqi; Heath, Lenwood S; Xia, Kang; Zhang, Liqing

    2018-04-26

    The spread of antibiotic resistance is a growing public health concern. While numerous studies have highlighted the importance of environmental sources and pathways of the spread of antibiotic resistance, a systematic means of comparing and prioritizing risks represented by various environmental compartments is lacking. Here we introduce MetaCompare, a publicly-available tool for ranking 'resistome risk,' which we define as the potential for antibiotic resistance genes (ARGs) to be associated with mobile genetic elements (MGEs) and mobilize to pathogens based on metagenomic data. A computational pipeline was developed in which each ARG is evaluated based on relative abundance, mobility, and presence within a pathogen. This is determined through assembly of shotgun sequencing data and analysis of contigs containing ARGs to determine if they contain sequence similarity to MGEs or human pathogens. Based on the assembled metagenomes, samples are projected into a 3-D hazard space and assigned resistome risk scores. To validate, we tested previously published metagenomic data derived from distinct aquatic environments. Based on unsupervised machine learning, the test samples clustered in the hazard space in a manner consistent with their origin. The derived scores produced a well-resolved ascending resistome risk ranking of: wastewater treatment plant effluent, dairy lagoon, hospital sewage.

  12. Algorithms for accelerated convergence of adaptive PCA.

    PubMed

    Chatterjee, C; Kang, Z; Roychowdhury, V P

    2000-01-01

    We derive and discuss new adaptive algorithms for principal component analysis (PCA) that are shown to converge faster than the traditional PCA algorithms due to Oja, Sanger, and Xu. It is well known that traditional PCA algorithms that are derived by using gradient descent on an objective function are slow to converge. Furthermore, the convergence of these algorithms depends on appropriate choices of the gain sequences. Since online applications demand faster convergence and an automatic selection of gains, we present new adaptive algorithms to solve these problems. We first present an unconstrained objective function, which can be minimized to obtain the principal components. We derive adaptive algorithms from this objective function by using: 1) gradient descent; 2) steepest descent; 3) conjugate direction; and 4) Newton-Raphson methods. Although gradient descent produces Xu's LMSER algorithm, the steepest descent, conjugate direction, and Newton-Raphson methods produce new adaptive algorithms for PCA. We also provide a discussion on the landscape of the objective function, and present a global convergence proof of the adaptive gradient descent PCA algorithm using stochastic approximation theory. Extensive experiments with stationary and nonstationary multidimensional Gaussian sequences show faster convergence of the new algorithms over the traditional gradient descent methods. We also compare the steepest descent adaptive algorithm with state-of-the-art methods on stationary and nonstationary sequences.
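
    As background for the "traditional" baseline named above, here is a minimal sketch of Oja's single-unit rule, which updates an estimate of the first principal direction one sample at a time. It is not one of the new algorithms from this paper; the constant gain, the starting vector, and the synthetic Gaussian data are assumptions chosen for illustration.

```python
import random


def oja_first_component(samples, learning_rate=0.005, initial=(0.6, 0.8)):
    """Estimate the leading principal direction with Oja's rule.

    Update per sample x: y = w . x, then w <- w + gain * y * (x - y * w).
    A constant gain is used here; adaptive gain sequences are not modeled.
    """
    w = list(initial)
    for x in samples:
        y = sum(wi * xi for wi, xi in zip(w, x))  # projection onto current estimate
        w = [wi + learning_rate * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w


# Synthetic zero-mean data: standard deviation 2.0 along the first axis and
# 0.5 along the second, so the first axis is the principal direction.
rng = random.Random(0)
synthetic = [(rng.gauss(0.0, 2.0), rng.gauss(0.0, 0.5)) for _ in range(5000)]
estimated_direction = oja_first_component(synthetic)
```

    The estimate should align with the first axis up to sign and settle near unit length; how quickly it does so depends on the gain, which is the sensitivity the abstract describes.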

  13. Antibacterial Activity of Synthetic Peptides Derived from Lactoferricin against Escherichia coli ATCC 25922 and Enterococcus faecalis ATCC 29212

    PubMed Central

    León-Calvijo, María A.; Leal-Castro, Aura L.; Almanzar-Reina, Giovanni A.; Rosas-Pérez, Jaiver E.; García-Castañeda, Javier E.; Rivera-Monroy, Zuly J.

    2015-01-01

    Peptides derived from human and bovine lactoferricin were designed, synthesized, purified, and characterized using RP-HPLC and MALDI-TOF-MS. Specific changes in the sequences were designed as (i) the incorporation of unnatural amino acids in the sequence, (ii) the reduction or (iii) the elongation of the peptide chain length, and (iv) the synthesis of molecules with different numbers of branches containing the same sequence. For each peptide, the antibacterial activity against Escherichia coli ATCC 25922 and Enterococcus faecalis ATCC 29212 was evaluated. Our results showed that Peptides I.2 (RWQWRWQWR) and I.4 ((RRWQWR)4K2Ahx2C2) exhibit greater or similar activity against E. coli (MIC 4–33 μM) and E. faecalis (MIC 10–33 μM) when compared with lactoferricin protein (LF) and some of its derived peptides, such as II.1 (FKCRRWQWRMKKLGA) and IV.1 (FKCRRWQWRMKKLGAPSITCVRRAE). It should be pointed out that Peptides I.2 and I.4, containing the RWQWR motif, are short and easy to synthesize; our results demonstrate that it is possible to design and obtain synthetic peptides that exhibit enhanced antibacterial activity using a methodology that is fast and low-cost and that allows obtaining products with a high degree of purity and high yield. PMID:25815317

  14. Antibacterial activity of synthetic peptides derived from lactoferricin against Escherichia coli ATCC 25922 and Enterococcus faecalis ATCC 29212.

    PubMed

    León-Calvijo, María A; Leal-Castro, Aura L; Almanzar-Reina, Giovanni A; Rosas-Pérez, Jaiver E; García-Castañeda, Javier E; Rivera-Monroy, Zuly J

    2015-01-01

    Peptides derived from human and bovine lactoferricin were designed, synthesized, purified, and characterized using RP-HPLC and MALDI-TOF-MS. Specific changes in the sequences were designed as (i) the incorporation of unnatural amino acids in the sequence, (ii) the reduction or (iii) the elongation of the peptide chain length, and (iv) the synthesis of molecules with different numbers of branches containing the same sequence. For each peptide, the antibacterial activity against Escherichia coli ATCC 25922 and Enterococcus faecalis ATCC 29212 was evaluated. Our results showed that Peptides I.2 (RWQWRWQWR) and I.4 ((RRWQWR)4K2Ahx2C2) exhibit greater or similar activity against E. coli (MIC 4-33 μM) and E. faecalis (MIC 10-33 μM) when compared with lactoferricin protein (LF) and some of its derived peptides, such as II.1 (FKCRRWQWRMKKLGA) and IV.1 (FKCRRWQWRMKKLGAPSITCVRRAE). It should be pointed out that Peptides I.2 and I.4, containing the RWQWR motif, are short and easy to synthesize; our results demonstrate that it is possible to design and obtain synthetic peptides that exhibit enhanced antibacterial activity using a methodology that is fast and low-cost and that allows obtaining products with a high degree of purity and high yield.

  15. Archaebacterial rhodopsin sequences: Implications for evolution

    NASA Technical Reports Server (NTRS)

    Lanyi, J. K.

    1991-01-01

    It was proposed over 10 years ago that the archaebacteria represent a separate kingdom which diverged very early from the eubacteria and eukaryotes. It follows that investigations of archaebacterial characteristics might reveal features of early evolution. So far, two genes, one for bacteriorhodopsin and another for halorhodopsin, both from Halobacterium halobium, have been sequenced. We cloned and sequenced the gene coding for the polypeptide of another one of these rhodopsins, a halorhodopsin in Natronobacterium pharaonis. Peptide sequencing of cyanogen bromide fragments, and immuno-reactions of the protein and synthetic peptides derived from the C-terminal gene sequence, confirmed that the open reading frame was the structural gene for the pharaonis halorhodopsin polypeptide. The flanking DNA sequences of this gene, as well as those of other bacterial rhodopsins, were compared to previously proposed archaebacterial consensus sequences. In pairwise comparisons of the open reading frame with DNA sequences for bacterio-opsin and halo-opsin from Halobacterium halobium, silent divergences were calculated. These indicate very considerable evolutionary distance between each pair of genes, even in the same organism. In spite of this, the three protein sequences show extensive similarities, indicating strong selective pressures.

  16. Evolution of hard proteins in the sauropsid integument in relation to the cornification of skin derivatives in amniotes.

    PubMed

    Alibardi, Lorenzo; Dalla Valle, Luisa; Nardi, Alessia; Toni, Mattia

    2009-04-01

    Hard skin appendages in amniotes comprise scales, feathers and hairs. The cell organization of these appendages probably derived from the localization of specialized areas of dermal-epidermal interaction in the integument. The horny scales and the other derivatives were formed from large areas of dermal-epidermal interaction. The evolution of these skin appendages was characterized by the production of specific coiled-coil keratins and associated proteins in the inter-filament matrix. Unlike mammalian keratin-associated proteins, those of sauropsids contain a double beta-folded sequence of about 20 amino acids, known as the core-box. The core-box shows 60%-95% sequence identity with known reptilian and avian proteins. The core-box determines the polymerization of these proteins into filaments indicated as beta-keratin filaments. The nucleotide and derived amino acid sequences for these sauropsid keratin-associated proteins are presented in conjunction with a hypothesis about their evolution in reptiles-birds compared to mammalian keratin-associated proteins. It is suggested that genes coding for ancestral glycine-serine-rich sequences of alpha-keratins produced a new class of small matrix proteins. In sauropsids, matrix proteins may have originated after mutation and enrichment in proline, probably in a central region of the ancestral protein. This mutation gave rise to the core-box, and other regions of the original protein evolved differently in the various reptilian orders. In lepidosaurians, two main groups, the high glycine proline and the high cysteine proline proteins, were formed. In archosaurians and chelonians two main groups later diversified into the high glycine proline tyrosine, non-feather proteins, and into the glycine-tyrosine-poor group of feather proteins, which evolved in birds. The latter proteins were particularly suited for making the elongated barb/barbule cells of feathers. In therapsids-mammals, mutations of the ancestral proteins formed the high glycine-tyrosine or the high cysteine proteins but no core-box was produced in the matrix proteins of the hard corneous material of mammalian derivatives.

  17. Direct comparisons of Illumina vs. Roche 454 sequencing technologies on the same microbial community DNA sample.

    PubMed

    Luo, Chengwei; Tsementzi, Despina; Kyrpides, Nikos; Read, Timothy; Konstantinidis, Konstantinos T

    2012-01-01

    Next-generation sequencing (NGS) is commonly used in metagenomic studies of complex microbial communities, but whether different NGS platforms recover the same diversity from a sample, and whether their assembled sequences are of comparable quality, remains unclear. We compared the two most frequently used platforms, the Roche 454 FLX Titanium and the Illumina Genome Analyzer (GA) II, on the same DNA sample obtained from a complex freshwater planktonic community. Despite the substantial differences in read length and sequencing protocols, the platforms provided a comparable view of the community sampled. For instance, derived assemblies overlapped in ~90% of their total sequences and in situ abundances of genes and genotypes (estimated based on sequence coverage) correlated highly between the two platforms (R² > 0.9). Evaluation of base-call error, frameshift frequency, and contig length suggested that Illumina offered equivalent, if not better, assemblies than Roche 454. The results from metagenomic samples were further validated against DNA samples of eighteen isolate genomes, which showed a range of genome sizes and G+C% content. We also provide quantitative estimates of the errors in gene and contig sequences assembled from datasets characterized by different levels of complexity and G+C% content. For instance, we noted that homopolymer-associated, single-base errors affected ~1% of the protein sequences recovered in Illumina contigs of 10× coverage and 50% G+C; this frequency increased to ~3% when non-homopolymer errors were also considered. Collectively, our results should serve as a useful practical guide for choosing proper sampling strategies and data processing protocols for future metagenomic studies.

  18. Covariance Matrix Estimation for Massive MIMO

    NASA Astrophysics Data System (ADS)

    Upadhya, Karthik; Vorobyov, Sergiy A.

    2018-04-01

    We propose a novel pilot structure for covariance matrix estimation in massive multiple-input multiple-output (MIMO) systems in which each user transmits two pilot sequences, with the second pilot sequence multiplied by a random phase-shift. The covariance matrix of a particular user is obtained by computing the sample cross-correlation of the channel estimates obtained from the two pilot sequences. This approach relaxes the requirement that all the users transmit their uplink pilots over the same set of symbols. We derive expressions for the achievable rate and the mean-squared error of the covariance matrix estimate when the proposed method is used with staggered pilots. The performance of the proposed method is compared with existing methods through simulations.
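
    The cross-correlation step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' estimator: the function name sample_cross_covariance is hypothetical, each channel estimate is assumed to be a list of complex antenna coefficients, and the random phase-shift handling and staggered-pilot details from the paper are not modelled.

```python
def sample_cross_covariance(est_a, est_b):
    """Average the outer product a * b^H over paired channel estimates.

    est_a, est_b: lists of equal length; element k of each list is the
    channel estimate (a list of complex numbers, one per antenna) obtained
    from the first and second pilot sequence in coherence block k.
    Hypothetical helper for illustration only.
    """
    if len(est_a) != len(est_b) or not est_a:
        raise ValueError("need the same non-zero number of estimates")
    m = len(est_a[0])
    acc = [[0j] * m for _ in range(m)]
    for a, b in zip(est_a, est_b):
        for i in range(m):
            for j in range(m):
                # entry (i, j) of a * b^H is a_i times the conjugate of b_j
                acc[i][j] += a[i] * b[j].conjugate()
    n = len(est_a)
    return [[value / n for value in row] for row in acc]
```

    Using two separate estimates per block means that noise which is independent between the two pilot sequences averages out of the cross term, which is the usual motivation for a cross-correlation rather than an auto-correlation estimate.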

  19. A statistical learning approach to the modeling of chromatographic retention of oligonucleotides incorporating sequence and secondary structure data

    PubMed Central

    Sturm, Marc; Quinten, Sascha; Huber, Christian G.; Kohlbacher, Oliver

    2007-01-01

    We propose a new model for predicting the retention time of oligonucleotides. The model is based on ν support vector regression using features derived from base sequence and predicted secondary structure of oligonucleotides. Because of the secondary structure information, the model is applicable even at relatively low temperatures where the secondary structure is not suppressed by thermal denaturing. This makes the prediction of oligonucleotide retention time for arbitrary temperatures possible, provided that the target temperature lies within the temperature range of the training data. We describe different possibilities of feature calculation from base sequence and secondary structure, present the results and compare our model to existing models. PMID:17567619

  20. PatGen--a consolidated resource for searching genetic patent sequences.

    PubMed

    Rouse, Richard J D; Castagnetto, Jesus; Niedner, Roland H

    2005-04-15

    Compared to the wealth of online resources covering genomic, proteomic and derived data, the bioinformatics community is rather underserved when it comes to patent information related to biological sequences. The current online resources are either incomplete or rather expensive. This paper describes PatGen, an integrated database containing data from bioinformatic and patent resources. This effort addresses the inconsistency of publicly available genetic patent data coverage by providing access to a consolidated dataset. PatGen can be searched at http://www.patgendb.com. Contact: rjdrouse@patentinformatics.com.

  1. Large-Scale Concatenation cDNA Sequencing

    PubMed Central

    Yu, Wei; Andersson, Björn; Worley, Kim C.; Muzny, Donna M.; Ding, Yan; Liu, Wen; Ricafrente, Jennifer Y.; Wentland, Meredith A.; Lennon, Greg; Gibbs, Richard A.

    1997-01-01

    A total of 100 kb of DNA derived from 69 individual human brain cDNA clones of 0.7–2.0 kb were sequenced by concatenated cDNA sequencing (CCS), whereby multiple individual DNA fragments are sequenced simultaneously in a single shotgun library. The method yielded accurate sequences and a similar efficiency compared with other shotgun libraries constructed from single DNA fragments (>20 kb). Computer analyses were carried out on 65 cDNA clone sequences and their corresponding end sequences to examine both nucleic acid and amino acid sequence similarities in the databases. Thirty-seven clones revealed no DNA database matches, 12 clones generated exact matches (≥98% identity), and 16 clones generated nonexact matches (57%–97% identity) to either known human or other species genes. Of those 28 matched clones, 8 had corresponding end sequences that failed to identify similarities. In a protein similarity search, 27 clone sequences displayed significant matches, whereas only 20 of the end sequences had matches to known protein sequences. Our data indicate that full-length cDNA insert sequences provide significantly more nucleic acid and protein sequence similarity matches than expressed sequence tags (ESTs) for database searching. [All 65 cDNA clone sequences described in this paper have been submitted to the GenBank data library under accession nos. U79240–U79304.] PMID:9110174

  2. Precise Detection of IDH1/2 and BRAF Hotspot Mutations in Clinical Glioma Tissues by a Differential Calculus Analysis of High-Resolution Melting Data

    PubMed Central

    Hatae, Ryusuke; Yoshimoto, Koji; Kuga, Daisuke; Akagi, Yojiro; Murata, Hideki; Suzuki, Satoshi O.; Mizoguchi, Masahiro; Iihara, Koji

    2016-01-01

    High resolution melting (HRM) is a simple and rapid method for screening mutations. It offers various advantages for clinical diagnostic applications. Conventional HRM analysis often yields equivocal results, especially for surgically obtained tissues. We attempted to improve HRM analyses for more effective applications to clinical diagnostics. HRM analyses were performed for IDH1R132 and IDH2R172 mutations in 192 clinical glioma samples in duplicate and these results were compared with sequencing results. BRAFV600E mutations were analyzed in 52 additional brain tumor samples. The melting profiles were used for differential calculus analyses. Negative second derivative plots revealed additional peaks derived from heteroduplexes in PCR products that contained mutations; this enabled unequivocal visual discrimination of the mutations. We further developed a numerical expression, the HRM-mutation index (MI), to quantify the heteroduplex-derived peak of the mutational curves. Using this expression, all IDH1 mutation statuses matched those ascertained by sequencing, with the exception of three samples. These discordant results were all derived from the misinterpretation of sequencing data. The effectiveness of our approach was further validated by analyses of IDH2R172 and BRAFV600E mutations. The present analytical method enabled an unequivocal and objective HRM analysis and is suitable for reliable mutation scanning in surgically obtained glioma tissues. This approach could facilitate molecular diagnostics in clinical environments. PMID:27529619
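
    The negative second derivative plot mentioned above can be approximated numerically from a sampled melting curve. The sketch below assumes evenly spaced temperature readings and uses a plain central difference; the function name is hypothetical, and the HRM-mutation index itself is not reproduced because the abstract does not give its formula.

```python
def negative_second_derivative(temps, fluorescence):
    """Return -d2F/dT2 at the interior points of a melting curve.

    temps: evenly spaced temperatures (assumption of this sketch).
    fluorescence: fluorescence readings at those temperatures.
    The result has len(temps) - 2 entries, one per interior point.
    """
    if len(temps) != len(fluorescence) or len(temps) < 3:
        raise ValueError("need at least three paired readings")
    h = temps[1] - temps[0]
    result = []
    for i in range(1, len(temps) - 1):
        # central difference approximation of the second derivative
        second = (fluorescence[i + 1] - 2 * fluorescence[i] + fluorescence[i - 1]) / (h * h)
        result.append(-second)
    return result
```

    On real data the curve would normally be smoothed first, since second differences amplify measurement noise.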

  3. Remnants of an Ancient Deltaretrovirus in the Genomes of Horseshoe Bats (Rhinolophidae).

    PubMed

    Hron, Tomáš; Farkašová, Helena; Gifford, Robert J; Benda, Petr; Hulva, Pavel; Görföl, Tamás; Pačes, Jan; Elleder, Daniel

    2018-04-10

    Endogenous retrovirus (ERV) sequences provide a rich source of information about the long-term interactions between retroviruses and their hosts. However, most ERVs are derived from a subset of retrovirus groups, while ERVs derived from certain other groups remain extremely rare. In particular, only a single ERV sequence has been identified that shows evidence of being related to an ancient Deltaretrovirus, despite the large number of vertebrate genome sequences now available. In this report, we identify a second example of an ERV sequence putatively derived from a past deltaretroviral infection, in the genomes of several species of horseshoe bats (Rhinolophidae). This sequence represents a fragment of viral genome derived from a single integration. The time of the integration was estimated to be 11-19 million years ago. This finding, together with the previously identified endogenous Deltaretrovirus in long-fingered bats (Miniopteridae), suggests a close association of bats with ancient deltaretroviruses.

  4. Draft genome sequence of Thermoanaerobacterium sp. strain PSU-2 isolated from thermophilic hydrogen producing reactor.

    PubMed

    O-Thong, Sompong; Khongkliang, Peerawat; Mamimin, Chonticha; Singkhala, Apinya; Prasertsan, Poonsuk; Birkeland, Nils-Kåre

    2017-06-01

    Thermoanaerobacterium sp. strain PSU-2 was isolated from a thermophilic hydrogen-producing reactor, subjected to draft genome sequencing by 454 pyrosequencing, and annotated with RAST. The draft genome sequence of strain PSU-2 contains 2,552,497 bases with an estimated G + C content of 35.2%, 2555 CDS, 8 rRNAs and 57 tRNAs. The strain had a number of genes responsible for carbohydrate metabolism, amino acids and derivatives, and protein metabolism (17.7%, 14.39% and 9.81%, respectively). Strain PSU-2 also carried genes responsible for hydrogen biosynthesis as well as genes related to Ni-Fe hydrogenase. Comparative genomic analysis indicates that strain PSU-2 shares about 94% genome sequence similarity with Thermoanaerobacterium xylanolyticum LX-11. The nucleotide sequence of this draft genome was deposited into DDBJ/ENA/GenBank under the accession MSQD00000000.

  5. In silico Comparison of 19 Porphyromonas gingivalis Strains in Genomics, Phylogenetics, Phylogenomics and Functional Genomics.

    PubMed

    Chen, Tsute; Siddiqui, Huma; Olsen, Ingar

    2017-01-01

    Currently, genome sequences of a total of 19 Porphyromonas gingivalis strains are available, including eight completed genomes (strains W83, ATCC 33277, TDC60, HG66, A7436, AJW4, 381, and A7A1-28) and 11 high-coverage draft sequences (JCVI SC001, F0185, F0566, F0568, F0569, F0570, SJD2, W4087, W50, Ando, and MP4-504) that are assembled into fewer than 300 contigs. The objective was to compare these genomes at both nucleotide and protein sequence levels in order to understand their phylogenetic and functional relatedness. Four copies of 16S rRNA gene sequences were identified in each of the eight complete genomes and one in the other 11 unfinished genomes. These 43 16S rRNA sequences represent only 24 unique sequences and the derived phylogenetic tree suggests a possible evolutionary history for these strains. Phylogenomic comparison based on shared proteins and whole genome nucleotide sequences consistently showed two groups with closely related members: one consisted of ATCC 33277, 381, and HG66, another of W83, W50, and A7436. At least 1,037 core/shared proteins were identified in the 19 P. gingivalis genomes based on the most stringent detecting parameters. Comparative functional genomics based on genome-wide comparisons between NCBI and RAST annotations, as well as additional approaches, revealed functions that are unique or missing in individual P. gingivalis strains, or species-specific in all P. gingivalis strains, when compared to a neighboring species P. asaccharolytica. All the comparative results of this study are available online for download at ftp://www.homd.org/publication_data/20160425/.

  6. In silico Comparison of 19 Porphyromonas gingivalis Strains in Genomics, Phylogenetics, Phylogenomics and Functional Genomics

    PubMed Central

    Chen, Tsute; Siddiqui, Huma; Olsen, Ingar

    2017-01-01

    Currently, genome sequences of a total of 19 Porphyromonas gingivalis strains are available, including eight completed genomes (strains W83, ATCC 33277, TDC60, HG66, A7436, AJW4, 381, and A7A1-28) and 11 high-coverage draft sequences (JCVI SC001, F0185, F0566, F0568, F0569, F0570, SJD2, W4087, W50, Ando, and MP4-504) that are assembled into fewer than 300 contigs. The objective was to compare these genomes at both nucleotide and protein sequence levels in order to understand their phylogenetic and functional relatedness. Four copies of 16S rRNA gene sequences were identified in each of the eight complete genomes and one in the other 11 unfinished genomes. These 43 16S rRNA sequences represent only 24 unique sequences and the derived phylogenetic tree suggests a possible evolutionary history for these strains. Phylogenomic comparison based on shared proteins and whole genome nucleotide sequences consistently showed two groups with closely related members: one consisted of ATCC 33277, 381, and HG66, another of W83, W50, and A7436. At least 1,037 core/shared proteins were identified in the 19 P. gingivalis genomes based on the most stringent detecting parameters. Comparative functional genomics based on genome-wide comparisons between NCBI and RAST annotations, as well as additional approaches, revealed functions that are unique or missing in individual P. gingivalis strains, or species-specific in all P. gingivalis strains, when compared to a neighboring species P. asaccharolytica. All the comparative results of this study are available online for download at ftp://www.homd.org/publication_data/20160425/. PMID:28261563

  7. Complete genome sequence and the expression pattern of plasmids of the model ethanologen Zymomonas mobilis ZM4 and its xylose-utilizing derivatives 8b and 2032

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Shihui; Vera, Jessica M.; Grass, Jeff

    Zymomonas mobilis is a natural ethanologen being developed and deployed as an industrial biofuel producer. To date, eight Z. mobilis strains have been completely sequenced and found to contain 2-8 native plasmids. However, systematic verification of predicted Z. mobilis plasmid genes and their contribution to cell fitness has not been hitherto addressed. Moreover, the precise number and identities of plasmids in Z. mobilis model strain ZM4 have been unclear. The lack of functional information about plasmid genes in ZM4 impedes ongoing studies for this model biofuel-producing strain. In this study, we determined the complete chromosome and plasmid sequences of ZM4 and its engineered xylose-utilizing derivatives 2032 and 8b. Compared to previously published and revised ZM4 chromosome sequences, the ZM4 chromosome sequence reported here contains 65 nucleotide sequence variations as well as a 2400-bp insertion. Four plasmids were identified in all three strains, with 150 plasmid genes predicted in strain ZM4 and 2032, and 153 plasmid genes predicted in strain 8b due to the insertion of heterologous DNA for expanded substrate utilization. Plasmid genes were then annotated using Blast2GO, InterProScan, and systems biology data analyses, and most genes were found to have apparent orthologs in other organisms or identifiable conserved domains. To verify plasmid gene prediction, RNA-Seq was used to map transcripts and also compare relative gene expression under various growth conditions, including anaerobic and aerobic conditions, or growth in different concentrations of biomass hydrolysates. Overall, plasmid genes were more responsive to varying hydrolysate concentrations than to oxygen availability. Additionally, our results indicated that although all plasmids were present in low copy number (about 1-2 per cell), the copy number of some plasmids varied under specific growth conditions or due to heterologous gene insertion.
The complete genome of ZM4 and two xylose-utilizing derivatives is reported in this study, with an emphasis on identifying and characterizing plasmid genes. Furthermore, plasmid gene annotation, validation, expression levels at growth conditions of interest, and contribution to host fitness are reported for the first time.

  8. Complete genome sequence and the expression pattern of plasmids of the model ethanologen Zymomonas mobilis ZM4 and its xylose-utilizing derivatives 8b and 2032

    DOE PAGES

    Yang, Shihui; Vera, Jessica M.; Grass, Jeff; ...

    2018-05-02

    Zymomonas mobilis is a natural ethanologen being developed and deployed as an industrial biofuel producer. To date, eight Z. mobilis strains have been completely sequenced and found to contain 2-8 native plasmids. However, systematic verification of predicted Z. mobilis plasmid genes and their contribution to cell fitness has not been hitherto addressed. Moreover, the precise number and identities of plasmids in Z. mobilis model strain ZM4 have been unclear. The lack of functional information about plasmid genes in ZM4 impedes ongoing studies for this model biofuel-producing strain. In this study, we determined the complete chromosome and plasmid sequences of ZM4 and its engineered xylose-utilizing derivatives 2032 and 8b. Compared to previously published and revised ZM4 chromosome sequences, the ZM4 chromosome sequence reported here contains 65 nucleotide sequence variations as well as a 2400-bp insertion. Four plasmids were identified in all three strains, with 150 plasmid genes predicted in strain ZM4 and 2032, and 153 plasmid genes predicted in strain 8b due to the insertion of heterologous DNA for expanded substrate utilization. Plasmid genes were then annotated using Blast2GO, InterProScan, and systems biology data analyses, and most genes were found to have apparent orthologs in other organisms or identifiable conserved domains. To verify plasmid gene prediction, RNA-Seq was used to map transcripts and also compare relative gene expression under various growth conditions, including anaerobic and aerobic conditions, or growth in different concentrations of biomass hydrolysates. Overall, plasmid genes were more responsive to varying hydrolysate concentrations than to oxygen availability. Additionally, our results indicated that although all plasmids were present in low copy number (about 1-2 per cell), the copy number of some plasmids varied under specific growth conditions or due to heterologous gene insertion.
The complete genome of ZM4 and two xylose-utilizing derivatives is reported in this study, with an emphasis on identifying and characterizing plasmid genes. Furthermore, plasmid gene annotation, validation, expression levels at growth conditions of interest, and contribution to host fitness are reported for the first time.

  9. Sequencing and comparative analyses of Aegilops tauschii chromosome arm 3DS revealed rapid evolution of Triticeae genome

    USDA-ARS's Scientific Manuscript database

    Bread wheat (Triticum aestivum, AABBDD) is an allohexaploid species derived from multiple rounds of interspecific hybridizations. A high-quality genome assembly of diploid Ae. tauschii, the donor of the wheat D genome, will provide a useful platform to study polyploid wheat evolution. A combination...

  10. Analysis of drug binding pockets and repurposing opportunities for twelve essential enzymes of ESKAPE pathogens

    PubMed Central

    Naz, Sadia; Ngo, Tony; Farooq, Umar

    2017-01-01

    Background: The rapid increase in antibiotic resistance by various bacterial pathogens underscores the significance of developing new therapies and exploring different drug targets. A fraction of bacterial pathogens abbreviated as ESKAPE by the European Center for Disease Prevention and Control have been considered a major threat due to the rise in nosocomial infections. Here, we compared putative drug binding pockets of twelve essential and mostly conserved metabolic enzymes in numerous bacterial pathogens including those of the ESKAPE group and Mycobacterium tuberculosis. The comparative analysis will provide guidelines for the likelihood of transferability of the inhibitors from one species to another. Methods: Nine bacterial species including six ESKAPE pathogens, Mycobacterium tuberculosis along with Mycobacterium smegmatis and Escherichia coli, two non-pathogenic bacteria, have been selected for drug binding pocket analysis of twelve essential enzymes. The amino acid sequences were obtained from Uniprot, aligned using ICM v3.8-4a and matched against the Pocketome encyclopedia. We used known co-crystal structures of selected target enzyme orthologs to evaluate the location of their active sites and binding pockets and to calculate a matrix of pairwise sequence identities across each target enzyme across the different species. This was used to generate sequence maps. Results: High sequence identity of enzyme binding pockets, derived from experimentally determined co-crystallized structures, was observed among various species. Comparison at both full sequence level and for drug binding pockets of key metabolic enzymes showed that binding pockets are highly conserved (sequence similarity up to 100%) among various ESKAPE pathogens as well as Mycobacterium tuberculosis. Enzyme orthologs having conserved binding sites may have potential to interact with inhibitors in a similar way and might be helpful for the design of a similar class of inhibitors for a particular species.
The derived pocket alignments and distance-based maps provide guidelines for drug discovery and repurposing. In addition they also provide recommendations for the relevant model bacteria that may be used for initial drug testing. Discussion: Comparing ligand binding sites through sequence identity calculation could be an effective approach to identify conserved orthologs as drug binding pockets have shown higher level of conservation among various species. By using this approach we could avoid the problems associated with full sequence comparison. We identified essential metabolic enzymes among ESKAPE pathogens that share high sequence identity in their putative drug binding pockets (up to 100%), of which known inhibitors can potentially antagonize these identical pockets in the various species in a similar manner. PMID:28948099
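
    A matrix of pairwise sequence identities, as described in the methods above, can be computed from already-aligned pocket residues along the following lines. This is a minimal sketch, not the ICM or Pocketome procedure: the function names, the "-" gap character, and the choice to skip columns where both sequences have a gap are all assumptions made for illustration.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of compared alignment columns with identical residues.

    Both inputs must come from the same alignment (equal length).
    Columns where both sequences have a gap are ignored; a residue
    opposite a gap counts as a mismatch (assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned):
    """Map each (name_i, name_j) pair to its percent identity.

    aligned: dict of species name to aligned pocket sequence.
    """
    names = list(aligned)
    return {
        (i, j): percent_identity(aligned[i], aligned[j])
        for i in names
        for j in names
    }
```

    Restricting the input to pocket-lining residues rather than the full protein is what makes the comparison pocket-specific; the same function applied to full-length alignments gives the full-sequence identities.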

  11. Analysis of drug binding pockets and repurposing opportunities for twelve essential enzymes of ESKAPE pathogens.

    PubMed

    Naz, Sadia; Ngo, Tony; Farooq, Umar; Abagyan, Ruben

    2017-01-01

    The rapid increase in antibiotic resistance by various bacterial pathogens underscores the significance of developing new therapies and exploring different drug targets. A fraction of bacterial pathogens abbreviated as ESKAPE by the European Center for Disease Prevention and Control have been considered a major threat due to the rise in nosocomial infections. Here, we compared putative drug binding pockets of twelve essential and mostly conserved metabolic enzymes in numerous bacterial pathogens including those of the ESKAPE group and Mycobacterium tuberculosis. The comparative analysis will provide guidelines for the likelihood of transferability of the inhibitors from one species to another. Nine bacterial species including six ESKAPE pathogens, Mycobacterium tuberculosis along with Mycobacterium smegmatis and Escherichia coli, two non-pathogenic bacteria, have been selected for drug binding pocket analysis of twelve essential enzymes. The amino acid sequences were obtained from Uniprot, aligned using ICM v3.8-4a and matched against the Pocketome encyclopedia. We used known co-crystal structures of selected target enzyme orthologs to evaluate the location of their active sites and binding pockets and to calculate a matrix of pairwise sequence identities across each target enzyme across the different species. This was used to generate sequence maps. High sequence identity of enzyme binding pockets, derived from experimentally determined co-crystallized structures, was observed among various species. Comparison at both full sequence level and for drug binding pockets of key metabolic enzymes showed that binding pockets are highly conserved (sequence similarity up to 100%) among various ESKAPE pathogens as well as Mycobacterium tuberculosis. Enzyme orthologs having conserved binding sites may have potential to interact with inhibitors in a similar way and might be helpful for the design of a similar class of inhibitors for a particular species.
The derived pocket alignments and distance-based maps provide guidelines for drug discovery and repurposing. In addition they also provide recommendations for the relevant model bacteria that may be used for initial drug testing. Comparing ligand binding sites through sequence identity calculation could be an effective approach to identify conserved orthologs as drug binding pockets have shown higher level of conservation among various species. By using this approach we could avoid the problems associated with full sequence comparison. We identified essential metabolic enzymes among ESKAPE pathogens that share high sequence identity in their putative drug binding pockets (up to 100%), of which known inhibitors can potentially antagonize these identical pockets in the various species in a similar manner.

  12. Phenotypic variation in Lactococcus lactis subsp. lactis isolates derived from intestinal tracts of marine and freshwater fish.

    PubMed

    Itoi, S; Yuasa, K; Washio, S; Abe, T; Ikuno, E; Sugita, H

    2009-09-01

    We compared phenotypic characteristics of Lactococcus lactis subsp. lactis derived from different sources including the intestinal tract of marine fish and freshwater fish, and cheese starter culture. In the phylogenetic analysis based on partial 16S rRNA gene nucleotide sequences (1371 bp), the sequences of freshwater fish-, marine fish- and cheese starter culture-derived strains were identical to that of L. lactis subsp. lactis reported previously. Fermentation profiles determined using the API 50 CH system were similar except for fermentation of several sugars including l-arabinose, mannitol, amygdalin, saccharose, trehalose, inulin and gluconate. The strains did have distinct levels of halotolerance: marine fish-derived strains > cheese starter-derived strain > freshwater fish-derived isolate. Lactococcus lactis subsp. lactis showed extensive diversity in phenotypic adaptation to various environments. The phenotypic properties of these strains suggested that L. lactis subsp. lactis strains from fish intestine have additional functions compared with the cheese starter-derived strain that has been previously described. The unique phenotypic traits of the fish intestinal tract-derived L. lactis subsp. lactis might make them useful as probiotics in aquaculture, and contribute to the development of functional foods and novel food additives, since the strains derived from fish intestines might have additional functions such as antibacterial activity.

  13. Assembly of the Lactuca sativa, L. cv. Tizian draft genome sequence reveals differences within major resistance complex 1 as compared to the cv. Salinas reference genome.

    PubMed

    Verwaaijen, Bart; Wibberg, Daniel; Nelkner, Johanna; Gordin, Miriam; Rupp, Oliver; Winkler, Anika; Bremges, Andreas; Blom, Jochen; Grosch, Rita; Pühler, Alfred; Schlüter, Andreas

    2018-02-10

    Lettuce (Lactuca sativa, L.) is an important annual plant of the family Asteraceae (Compositae). The commercial lettuce cultivar Tizian has been used in various scientific studies investigating the interaction of the plant with phytopathogens or biological control agents. Here, we present the de novo draft genome sequencing and gene prediction for this specific cultivar derived from transcriptome sequence data. The assembled scaffolds amount to a size of 2.22 Gb. Based on RNAseq data, 31,112 transcript isoforms were identified. Functional predictions for these transcripts were determined within the GenDBE annotation platform. Comparison with the cv. Salinas reference genome revealed a high degree of sequence similarity on genome and transcriptome levels, with an average amino acid identity of 99%. Furthermore, it was observed that two large regions are either missing or are highly divergent within the cv. Tizian genome compared to cv. Salinas. One of these regions covers the major resistance complex 1 region of cv. Salinas. The cv. Tizian draft genome sequence provides a valuable resource for future functional and transcriptome analyses focused on this lettuce cultivar. Copyright © 2017 Elsevier B.V. All rights reserved.

  14. Predicting RNA-protein binding sites and motifs through combining local and global deep convolutional neural networks.

    PubMed

    Pan, Xiaoyong; Shen, Hong-Bin

    2018-05-02

    RNA-binding proteins (RBPs) account for 5-10% of the eukaryotic proteome and play key roles in many biological processes, e.g. gene regulation. Experimental detection of RBP binding sites is still time-intensive and costly. Instead, computational prediction of RBP binding sites using patterns learned from existing annotation knowledge is a fast approach. From the biological point of view, the local structure context derived from local sequences will be recognized by specific RBPs. However, in computational modeling using deep learning, to the best of our knowledge, only global representations of entire RNA sequences are employed. So far, the local sequence information is ignored in the deep model construction process. In this study, we present a computational method, iDeepE, to predict RNA-protein binding sites from RNA sequences by combining global and local convolutional neural networks (CNNs). For the global CNN, we pad the RNA sequences to the same length. For the local CNN, we split an RNA sequence into multiple overlapping fixed-length subsequences, where each subsequence is a signal channel of the whole sequence. Next, we train deep CNNs for multiple subsequences and the padded sequences to learn high-level features, respectively. Finally, the outputs from local and global CNNs are combined to improve the prediction. iDeepE demonstrates better performance than state-of-the-art methods on two large-scale datasets derived from CLIP-seq. We also find that the local CNN runs 1.8 times faster than the global CNN with comparable performance when using GPUs. Our results show that iDeepE has captured experimentally verified binding motifs. Availability: https://github.com/xypan1232/iDeepE. Contact: xypan172436@gmail.com or hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online.
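
    The two input preparations described above, padding for the global CNN and overlapping fixed-length windows for the local CNN, might look roughly like this. It is a sketch under assumptions rather than the iDeepE code: the function names, the "N" padding symbol, and the window and overlap sizes in the usage note are illustrative choices, not values taken from the paper.

```python
def pad_sequence(seq, length, pad="N"):
    """Truncate or right-pad a sequence to exactly `length` symbols."""
    return seq[:length].ljust(length, pad)


def split_overlapping(seq, window, overlap, pad="N"):
    """Split a sequence into fixed-length windows that share `overlap` symbols.

    The last window is right-padded so every piece has length `window`.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    pieces = []
    start = 0
    while True:
        pieces.append(seq[start:start + window].ljust(window, pad))
        if start + window >= len(seq):
            break  # this window already reaches the end of the sequence
        start += step
    return pieces
```

    For example, split_overlapping("ACGUACGU", 4, 2) yields three windows starting at positions 0, 2 and 4, each of which would then be encoded (for instance one-hot) before being passed to a network.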

  15. Study on multiple-hops performance of MOOC sequences-based optical labels for OPS networks

    NASA Astrophysics Data System (ADS)

    Zhang, Chongfu; Qiu, Kun; Ma, Chunli

    2009-11-01

    In this paper, we use a new analysis method, which treats the multiple optical orthogonal codes as independent, to derive the probability function of MOOCS-OPS networks, discuss the performance characteristics for a variety of parameters, and compare characteristics of systems that employ optical labels based on a single optical orthogonal code or on multiple optical orthogonal code sequences. The performance of the system is also calculated, and our results verify that the method is effective. Additionally, we find that the performance of MOOCS-OPS networks is worse than that of single optical orthogonal code-based optical labels for optical packet switching (SOOC-OPS); however, MOOCS-OPS networks can greatly enlarge the scalability of optical packet switching networks.

  16. High-throughput sequencing of natively paired antibody chains provides evidence for original antigenic sin shaping the antibody response to influenza vaccination.

    PubMed

    Tan, Yann-Chong; Blum, Lisa K; Kongpachith, Sarah; Ju, Chia-Hsin; Cai, Xiaoyong; Lindstrom, Tamsin M; Sokolove, Jeremy; Robinson, William H

    2014-03-01

    We developed a DNA barcoding method to enable high-throughput sequencing of the cognate heavy- and light-chain pairs of the antibodies expressed by individual B cells. We used this approach to elucidate the plasmablast antibody response to influenza vaccination. We show that >75% of the rationally selected plasmablast antibodies bind and neutralize influenza, and that antibodies from clonal families, defined by sharing both heavy-chain VJ and light-chain VJ sequence usage, do so most effectively. Vaccine-induced heavy-chain VJ regions contained on average >20 nucleotide mutations as compared to their predicted germline gene sequences, and some vaccine-induced antibodies exhibited higher binding affinities for hemagglutinins derived from prior years' seasonal influenza as compared to their affinities for the immunization strains. Our results show that influenza vaccination induces the recall of memory B cells that express antibodies that previously underwent affinity maturation against prior years' seasonal influenza, suggesting that 'original antigenic sin' shapes the antibody response to influenza vaccination. Published by Elsevier Inc.

  17. Combining real-time PCR and next-generation DNA sequencing to provide quantitative comparisons of fungal aerosol populations

    NASA Astrophysics Data System (ADS)

    Dannemiller, Karen C.; Lang-Yona, Naama; Yamamoto, Naomichi; Rudich, Yinon; Peccia, Jordan

    2014-02-01

    We examined fungal communities associated with the PM10 mass of Rehovot, Israel outdoor air samples collected in the spring and fall seasons. Fungal communities were described by 454 pyrosequencing of the internal transcribed spacer (ITS) region of the fungal ribosomal RNA encoding gene. To allow for a more quantitative comparison of fungal exposure in humans, the relative abundance values of specific taxa were transformed to absolute concentrations through multiplying these values by the sample's total fungal spore concentration (derived from universal fungal qPCR). Next, the sequencing-based absolute concentrations for Alternaria alternata, Cladosporium cladosporioides, Epicoccum nigrum, and Penicillium/Aspergillus spp. were compared to taxon-specific qPCR concentrations for A. alternata, C. cladosporioides, E. nigrum, and Penicillium/Aspergillus spp. derived from the same spring and fall aerosol samples. Results of these comparisons showed that the absolute concentration values generated from pyrosequencing were strongly associated with the concentration values derived from taxon-specific qPCR (for all four species, p < 0.005, all R > 0.70). The correlation coefficients were greater for species present in higher concentrations. Our microbial aerosol population analyses demonstrated that fungal diversity (number of fungal operational taxonomic units) was higher in the spring compared to the fall (p = 0.02), and principal coordinate analysis showed distinct seasonal differences in taxa distribution (ANOSIM p = 0.004). Among genera containing allergenic and/or pathogenic species, the absolute concentrations of Alternaria, Aspergillus, Fusarium, and Cladosporium were greater in the fall, while Cryptococcus, Penicillium, and Ulocladium concentrations were greater in the spring. 
    The transformation of pyrosequencing fungal population relative abundance data to absolute concentrations can improve next-generation DNA sequencing-based quantitative aerosol exposure assessment.
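
    The conversion described in this abstract, multiplying each taxon's relative abundance by the sample's total fungal spore concentration, can be sketched as follows. The function name, the read counts, and the total concentration are hypothetical values chosen for illustration; they are not data from the study.

```python
from typing import Dict


def absolute_concentrations(read_counts: Dict[str, float],
                            total_concentration: float) -> Dict[str, float]:
    """Scale per-taxon read counts to absolute concentrations.

    relative abundance = reads for taxon / total reads
    absolute concentration = relative abundance * total concentration
    (the total would come from a universal fungal qPCR assay).
    """
    total_reads = sum(read_counts.values())
    if total_reads <= 0:
        raise ValueError("read counts must sum to a positive number")
    return {taxon: count * total_concentration / total_reads
            for taxon, count in read_counts.items()}


if __name__ == "__main__":
    # Hypothetical sample: sequence reads per taxon and a qPCR-derived total
    # (units of the total carry through to the result, e.g. spores per m^3).
    reads = {"Alternaria alternata": 200,
             "Cladosporium cladosporioides": 300,
             "other fungi": 500}
    for taxon, conc in absolute_concentrations(reads, 1000.0).items():
        print(f"{taxon}: {conc:.1f}")
```

    The resulting per-taxon values are what would be compared against taxon-specific qPCR measurements from the same sample.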

  18. A chirality change in XPC- and Sfi1-derived peptides affects their affinity for centrin.

    PubMed

    Grecu, Dora; Irudayaraj, Victor Paul Raj; Martinez-Sanz, Juan; Mallet, Jean-Maurice; Assairi, Liliane

    2016-04-01

    The Ca(2+)-binding protein centrin binds to a hydrophobic motif (W(1)xxL(4)xxxL(8)) included in the sequence of several cellular targets: XPC (xeroderma pigmentosum group C protein), Sfi1 (suppressor of fermentation-induced loss of stress resistance protein1), and Sac3 [the central component of the transcription and mRNA export (TREX-2) complex]. However, centrin binding occurs in a reversed orientation (L(8)xxxL(4)xxW(1)) for Sfi1 and Sac3 compared with XPC. Because D-peptides have been investigated for future therapeutic use, we analyzed their centrin-binding properties. Their affinity for centrin was measured using isothermal titration calorimetry. The chirality change in the target-derived peptides affected their ability to bind centrin in a specific manner depending on the sequence orientation of the centrin-binding motif. In contrast to L-XPC-P10, D-XPC-P10 bound C-HsCen1 in a Ca(2+)-dependent manner and to a lesser extent. D-XPC-P10 exhibited a reduced affinity for C-HsCen1 (Ka=0.064 × 10(6) M(-1)) by a factor of 2000 compared with L-XPC-P10 (Ka=132 × 10(6) M(-1)). D-peptides have a lower affinity than L-peptides for centrin, and the strength of this affinity depends on the sequence orientation of the target-derived peptides. The residual affinity observed for D-XPC suggests that the use of D-peptides represents a promising strategy for inhibiting centrin binding to its targets. Copyright © 2016 Elsevier Inc. All rights reserved.
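
    The quoted factor of about 2000 follows directly from the two association constants given in the abstract. The short check below also converts them to dissociation constants using K_d = 1/K_a; the superscripts L and D are labels introduced here for the two peptides and are not the authors' notation.

```latex
\[
\frac{K_a^{\mathrm{L}}}{K_a^{\mathrm{D}}}
  = \frac{132 \times 10^{6}\ \mathrm{M^{-1}}}{0.064 \times 10^{6}\ \mathrm{M^{-1}}}
  = 2062.5 \approx 2 \times 10^{3}
\]
\[
K_d^{\mathrm{L}} = \frac{1}{K_a^{\mathrm{L}}} \approx 7.6 \times 10^{-9}\ \mathrm{M}\ (7.6\ \mathrm{nM}),
\qquad
K_d^{\mathrm{D}} = \frac{1}{K_a^{\mathrm{D}}} \approx 1.6 \times 10^{-5}\ \mathrm{M}\ (16\ \mu\mathrm{M})
\]
```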

  19. A white dwarf companion to the main-sequence star 4 Omicron(1) Orionis and the binary hypothesis for the origin of peculiar red giants

    NASA Technical Reports Server (NTRS)

    Ake, Thomas B.; Johnson, Hollis R.

    1988-01-01

    Ultraviolet spectra of the peculiar red giants (PRGs) called MS stars are investigated, and the discovery of a white dwarf (WD) companion to the MS star 4 Omicron(1) Orionis is reported. The observations and data analysis are discussed and compared with those for field WDs in order to derive parameters for the WD and the luminosity of the primary. Detection limits for the other MS stars investigated are derived, and the binary hypothesis for PRGs is reviewed.

  20. Crypteins derived from the mouse neuropeptide FF (NPFF)A precursor display NPFF-like effects in nociceptive tests in mice.

    PubMed

    Kotlinska, Jolanta H; Gibula-Bruzda, Ewa; Suder, Piotr; Wasielak, Magdalena; Bray, Lauriane; Raoof, Hana; Bodzon-Kulakowska, Anna; Silberring, Jerzy

    2012-07-01

    The NPFF precursor pro-NPFF(A) contains three known bioactive sequences: NPFF (FLFQPQRF-NH(2)), neuropeptide AF (NPAF; AGEGLSSPFWSLAAPQRF-NH(2)) and neuropeptide SF (NPSF; SLAAPQRF-NH(2)). The key feature of these fragments is their common PQRF-amidated sequence at their C termini. Here, we evaluated the biological activity of two other sequences derived from the mouse NPFF(A) precursor that do not have a PQRF-amidated C terminus. One peptide resides between positions 85 and 99 of mouse pro-NPFF(A). This peptide was referred to as neuropeptide SA (NPSA; SAWGSWSKEQLNPQA), named for its flanking amino acids. The other sequence used in the experiments was the N-terminal fragment of NPSA, here referred to as neuropeptide SS (NPSS; SAWGSWS). These two peptides, classified as crypteins, were synthesized and tested in the hot-plate and tail immersion tests in mice for their pharmacological activity in morphine-induced antinociception. The effects of both crypteins were compared with those of NPFF. Our experiments indicated that both crypteins inhibited morphine antinociception and their effects were reversed by RF9, an antagonist of NPFF receptors. These data show that NPSA and NPSS possess NPFF-like anti-opioid activity in these behavioral tests. Copyright © 2012 Elsevier Inc. All rights reserved.

  1. Role of Mitochondrial Inheritance on Prostate Cancer Outcome in African American Men. Addendum

    DTIC Science & Technology

    2016-11-01

    DNA sequencing technique developed by our collaborator using single amplicon long-range PCR that permits deep coverage (10,000-20,000X on average) of...the mitochondrial genome. We have sequenced 652 samples derived from frozen fully using this technology. The additional DNA samples derived from...paraffin embedded (FFPE) tissue were more challenging, but have now been sequenced. Mapping of DNA variants in our sequenced genomes to mitochondrial

  2. A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome

    DOE PAGES

    Chapman, Jarrod A.; Mascher, Martin; Buluc, Aydin; ...

    2015-01-31

    We report that polyploid species have long been thought to be recalcitrant to whole-genome assembly. By combining high-throughput sequencing, recent developments in parallel computing, and genetic mapping, we derive, de novo, a sequence assembly representing 9.1 Gbp of the highly repetitive 16 Gbp genome of hexaploid wheat, Triticum aestivum, and assign 7.1 Gb of this assembly to chromosomal locations. The genome representation and accuracy of our assembly is comparable to or even exceeds that of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short read sequencing technology and is applicable to any species where it is possible to construct a mapping population.

  3. Spectrophotometry of comets Giacobini-Zinner and Halley

    NASA Technical Reports Server (NTRS)

    Tegler, Stephen C.; O'Dell, C. R.

    1987-01-01

    Optical window spectrophotometry was performed on comets Giacobini-Zinner and Halley over the interval 300-1000 nm. Band and band-sequence fluxes were obtained for the brightest features of OH, CN, NH, and C2, special care having been given to determinations of extinction, instrumental sensitivities, and corrections for Fraunhofer lines. C2 Swan band-sequence flux ratios were determined with unprecedented accuracy and compared with the predictions of the detailed equilibrium models of Krishna Swamy et al. (1977, 1979, 1981, and 1987). It is found that these band sequences do not agree with the predictions, which calls into question the assumptions made in deriving the model, namely resonance fluorescence statistical equilibrium. Suggestions are made as to how to resolve this discrepancy.

  4. A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chapman, Jarrod A.; Mascher, Martin; Buluc, Aydin

    We report that polyploid species have long been thought to be recalcitrant to whole-genome assembly. By combining high-throughput sequencing, recent developments in parallel computing, and genetic mapping, we derive, de novo, a sequence assembly representing 9.1 Gbp of the highly repetitive 16 Gbp genome of hexaploid wheat, Triticum aestivum, and assign 7.1 Gb of this assembly to chromosomal locations. The genome representation and accuracy of our assembly is comparable to or even exceeds that of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short read sequencing technology and is applicable to any species where it is possible to construct a mapping population.

  5. 16S rRNA Gene Sequence Analysis of Drinking Water Using RNA and DNA Extracts as Targets for Clone Library Development

    EPA Science Inventory

    The bacterial composition of chlorinated drinking water was analyzed using 16S rRNA gene clone libraries derived from DNA extracts of 12 samples and compared to clone libraries previously generated using RNA extracts from the same samples. Phylogenetic analysis of 761 DNA-based ...

  6. Theory and practical application of out of sequence measurements with results for multi-static tracking

    NASA Astrophysics Data System (ADS)

    Iny, David

    2007-09-01

    This paper addresses the out-of-sequence measurement (OOSM) problem associated with multiple platform tracking systems. The problem arises due to different transmission delays in communication of detection reports across platforms. Much of the literature focuses on the improvement to the state estimate by incorporating the OOSM. As the time lag increases, there is diminishing improvement to the state estimate. However, this paper shows that optimal processing of OOSMs may still be beneficial by improving data association as part of a multi-target tracker. This paper derives exact multi-lag algorithms with the property that the standard log likelihood track scoring is independent of the order in which the measurements are processed. The orthogonality principle is applied to generalize the method of Bar-Shalom in deriving the exact A1 algorithm for 1-lag estimation. Theory is also developed for optimal filtering of time averaged measurements and measurements correlated through periodic updates of a target aim-point. An alternative derivation of the multi-lag algorithms is also achieved using an efficient variant of the augmented state Kalman filter (AS-KF). This results in practical and reasonably efficient multi-lag algorithms. Results are compared to a well-known ad hoc algorithm for incorporating OOSMs. Finally, the paper presents some simulated multi-target multi-static scenarios where there is a benefit to processing the data out of sequence in order to improve pruning efficiency.

  7. The first genetic characterization of a D4 measles virus strain derived from a patient with subacute sclerosing panencephalitis.

    PubMed

    Ivancic-Jelecki, Jelena; Baricevic, Marijana; Santak, Maja; Harcet, Matija; Tešović, Goran; Marusic Della Marina, Branka; Forcic, Dubravko

    2013-07-01

    Measles virus (MV) strains derived from patients with subacute sclerosing panencephalitis (SSPE), SSPE strains, possess numerous mutations when compared to viruses belonging to the same genotype and circulating in a similar time period. Although many SSPE strains have been extensively characterized, none of them belongs to the D4 genotype, which currently predominates in Europe where it has caused a number of recent outbreaks/epidemics. We sequenced an MV derived from a patient with long-term SSPE; the virus was named MVs/Zagreb.CRO/30.06[D4] (SSPE). Initial genetic analysis showed that it belongs to the D4 genotype. The sequences of genes encoding matrix and fusion proteins indicate premature protein terminations. The putative hemagglutinin (H) protein is lengthened by 20 amino acids, which is the longest H protein elongation so far found in SSPE viruses. Nucleotides 1421 A, 1422 G, 1507 C and 1542 C in the nucleoprotein gene open reading frame seem to be specific for this D4 strain, differentiating it from other D4 non-SSPE strains. In addition, a unique mutation at position 543 of the H protein was found: histidine instead of tyrosine. As persistent MV infections are initially established by "normal" wild-type MV strains, the presented comparative analyses describe alterations that could be involved in the maintenance of persistent infection, disease development and progression. Copyright © 2013 Elsevier B.V. All rights reserved.

  8. Comparative Sequence Analysis of Multidrug-Resistant IncA/C Plasmids from Salmonella enterica.

    PubMed

    Hoffmann, Maria; Pettengill, James B; Gonzalez-Escalona, Narjol; Miller, John; Ayers, Sherry L; Zhao, Shaohua; Allard, Marc W; McDermott, Patrick F; Brown, Eric W; Monday, Steven R

    2017-01-01

    Determinants of multidrug resistance (MDR) are often encoded on mobile elements, such as plasmids, transposons, and integrons, which have the potential to transfer among foodborne pathogens, as well as to other virulent pathogens, increasing the threats these traits pose to human and veterinary health. Our understanding of MDR among Salmonella has been limited by the lack of closed plasmid genomes for comparisons across resistance phenotypes, due to difficulties in effectively separating the DNA of these high-molecular weight, low-copy-number plasmids from chromosomal DNA. To resolve this problem, we demonstrate an efficient protocol for isolating, sequencing and closing IncA/C plasmids from Salmonella sp. using single molecule real-time sequencing on a Pacific Biosciences (Pacbio) RS II Sequencer. We obtained six Salmonella enterica isolates from poultry, representing six different serovars, each exhibiting the MDR-Ampc resistance profile. Salmonella plasmids were obtained using a modified mini preparation and transformed with Escherichia coli DH10Br. A Qiagen Large-Construct kit™ was used to recover highly concentrated and purified plasmid DNA that was sequenced using PacBio technology. These six closed IncA/C plasmids ranged in size from 104 to 191 kb and shared a stable, conserved backbone containing 98 core genes, with only six differences among those core genes. The plasmids encoded a number of antimicrobial resistance genes, including those for quaternary ammonium compounds and mercury. We then compared our six IncA/C plasmid sequences: first with 14 IncA/C plasmids derived from S. enterica available at the National Center for Biotechnology Information (NCBI), and then with an additional 38 IncA/C plasmids derived from different taxa. These comparisons allowed us to build an evolutionary picture of how antimicrobial resistance may be mediated by this common plasmid backbone. 
    Our project provides detailed genetic information about resistance genes in plasmids, advances in plasmid sequencing, and phylogenetic analyses, and important insights about how MDR evolution occurs across diverse serotypes from different animal sources, particularly in agricultural settings where antimicrobial drug use practices vary.

  9. Numerical Calculation of the Spectrum of the Severe (1%) Lighting Current and Its First Derivative

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown, C G; Ong, M M; Perkins, M P

    2010-02-12

    Recently, the direct-strike lightning environment for the stockpile-to-target sequence was updated [1]. In [1], the severe (1%) lightning current waveforms for first and subsequent return strokes are defined based on Heidler's waveform. This report presents numerical calculations of the spectra of those 1% lightning current waveforms and their first derivatives. First, the 1% lightning current models are repeated here for convenience. Then, the numerical method for calculating the spectra is presented and tested. The test uses a double-exponential waveform and its first derivative, which we fit to the previous 1% direct-strike lightning environment from [2]. Finally, the resulting spectra are given and are compared with those of the double-exponential waveform and its first derivative.
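
    A double-exponential waveform is a convenient test case for a numerical spectrum routine because its Fourier transform has a closed form. The expressions below are a standard textbook result, not taken from the report; the symbols I_0, \alpha, and \beta and the transform sign convention are assumptions made for this sketch.

```latex
% Assumed double-exponential waveform, zero for t < 0
\[
i(t) = I_0\left(e^{-\alpha t} - e^{-\beta t}\right), \qquad t \ge 0, \quad \beta > \alpha > 0
\]
% Spectrum, using the convention I(\omega) = \int_0^\infty i(t)\, e^{-j\omega t}\, dt
\[
I(\omega) = I_0\left(\frac{1}{\alpha + j\omega} - \frac{1}{\beta + j\omega}\right)
          = \frac{I_0\,(\beta - \alpha)}{(\alpha + j\omega)(\beta + j\omega)}
\]
% Spectrum of the first derivative; no boundary term appears because i(0) = 0
\[
\mathcal{F}\!\left\{\frac{di}{dt}\right\}(\omega) = j\omega\, I(\omega)
          = \frac{j\omega\, I_0\,(\beta - \alpha)}{(\alpha + j\omega)(\beta + j\omega)}
\]
```

    A numerically computed spectrum of the sampled waveform can be compared against these expressions at each frequency; at high frequency the magnitude of I(\omega) falls off as 1/\omega^2 and that of the derivative as 1/\omega.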

  10. Morphological and molecular characterization of a marine fish trypanosome from South Africa, including its development in a leech vector.

    PubMed

    Hayes, Polly M; Lawton, Scott P; Smit, Nico J; Gibson, Wendy C; Davies, Angela J

    2014-01-24

    Trypanosomes are ubiquitous blood parasites of marine and freshwater fishes, typically transmitted by aquatic leeches. Phylogenetic studies have been dominated by examples derived from freshwater fishes, with few marine representatives. Furthermore, life cycle studies on marine fish trypanosomes have focused on those of the northern hemisphere. In this investigation, we have examined the life cycle and molecular taxonomy of a marine fish trypanosome from South Africa. To locate trypanosome stages, leeches were removed from fishes captured on the west and south coasts of South Africa, and fish blood films and leech squashes were Giemsa-stained and screened; leeches were also examined histologically. To determine whether trypanosome stages in fishes and leeches were of the same genotype, DNA was extracted from Giemsa-stained fish blood films and leech squashes, and from fish whole blood. Fragments of the 18S rRNA gene were amplified by PCR using trypanosome-specific primers and sequenced. Resulting sequence data were compared with each other and with published trypanosome 18S rDNA sequences, and used for phylogenetic analysis. Trypanosomes were detected in blood films from fishes of the families Clinidae, Blenniidae and Gobiidae. The flagellates ranged in size and staining properties within the films and across fish hosts. In squashes and histological sections of adult and juvenile leeches, identified as Zeylanicobdella arugamensis, trypanosome developmental stages were predominantly slender epimastigotes. Sequence data showed that trypanosomes derived from fishes were identical, irrespective of whether they were small or large forms; sequences derived largely from leech epimastigotes were also identical to those obtained from fish trypanosomes. Fish and leech trypanosome sequences fell into a marine fish aquatic clade, and aligned most closely with two trypanosome sequences from marine fishes off Norway. 
    Combined morphological and molecular methods indicate that the trypanosomes examined here represent a single pleomorphic species, rather than the three species described originally. This species is identified as Trypanosoma nudigobii Fantham, 1919 with the leech Z. arugamensis as its vector, and T. capigobii Fantham, 1919 and T. blenniclini Fantham, 1930 are regarded as junior synonyms of the species. Phylogenetic analysis establishes its affinity with marine fish trypanosomes off Norway.

  11. Pancreatic cancer cell lines as patient-derived avatars: genetic characterisation and functional utility.

    PubMed

    Knudsen, Erik S; Balaji, Uthra; Mannakee, Brian; Vail, Paris; Eslinger, Cody; Moxom, Christopher; Mansour, John; Witkiewicz, Agnieszka K

    2018-03-01

    Pancreatic ductal adenocarcinoma (PDAC) is a therapy recalcitrant disease with the worst survival rate of common solid tumours. Preclinical models that accurately reflect the genetic and biological diversity of PDAC will be important for delineating features of tumour biology and therapeutic vulnerabilities. 27 primary PDAC tumours were employed for genetic analysis and development of tumour models. Tumour tissue was used for derivation of xenografts and cell lines. Exome sequencing was performed on the originating tumour and developed models. RNA sequencing, histological and functional analyses were employed to determine the relationship of the patient-derived models to clinical presentation of PDAC. The cohort employed captured the genetic diversity of PDAC. From most cases, both cell lines and xenograft models were developed. Exome sequencing confirmed preservation of the primary tumour mutations in developed cell lines, which remained stable with extended passaging. The level of genetic conservation in the cell lines was comparable to that observed with patient-derived xenograft (PDX) models. Unlike historically established PDAC cancer cell lines, patient-derived models recapitulated the histological architecture of the primary tumour and exhibited metastatic spread similar to that observed clinically. Detailed genetic analyses of tumours and derived models revealed features of ex vivo evolution and the clonal architecture of PDAC. Functional analysis was used to elucidate therapeutic vulnerabilities of relevance to treatment of PDAC. These data illustrate that with the appropriate methods it is possible to develop cell lines that maintain genetic features of PDAC. Such models serve as important substrates for analysing the significance of genetic variants and create a unique biorepository of annotated cell lines and xenografts that were established simultaneously from same primary tumour. 
    These models can be used to infer genetic and empirically determined therapeutic sensitivities that would be germane to the patient. Published by the BMJ Publishing Group Limited.

  12. On the Accuracy of Atmospheric Parameter Determination in BAFGK Stars

    NASA Astrophysics Data System (ADS)

    Ryabchikova, T.; Piskunov, N.; Shulyak, D.

    2015-04-01

    During the past few years, many papers determining the atmospheric parameters of FGK stars have appeared in the literature in which the accuracy of effective temperatures is given as 20-40 K. For main sequence stars within the 5 000-13 000 K temperature range, we have performed a comparative analysis of the parameters derived from the spectra by using the SME (Spectroscopy Made Easy) package and those found in the literature. Our sample includes standard stars Sirius, Procyon, δ Eri, and the Sun. Combining different spectral regions in the fitting procedure, we investigated the effect that different atomic species have on the derived atmospheric parameters. The temperature difference may exceed 100 K depending on the spectral regions used in the SME procedure. It is shown that the atmospheric parameters derived with the SME procedure that includes wings of hydrogen lines in the fitting agree better with the results derived by the other methods and tools across a large part of the main sequence. For three stars (π Cet, 21 Peg, and Procyon) the atmospheric parameters were also derived by fitting a calculated energy distribution to the observed one. We found a substantial difference in the parameters inferred from different sets and combinations of spectrophotometric observations. An intercomparison of our results and literature data shows that the average accuracy of effective temperature determination for cool stars and for the early B-stars is 70-85 K and 170-200 K, respectively.

  13. Assessment of fungal diversity in a water-damaged office building.

    PubMed

    Green, Brett J; Lemons, Angela R; Park, Yeonmi; Cox-Ganser, Jean M; Park, Ju-Hyeong

    2017-04-01

    Recent studies have described fungal communities in indoor environments using gene sequencing-based approaches. In this study, dust-borne fungal communities were elucidated from a water-damaged office building located in the northeastern region of the United States using internal transcribed spacer (ITS) rRNA gene sequencing. Genomic DNA was extracted from 5 mg of floor dust derived from 22 samples collected from either the lower floors (n = 8) or a top floor (n = 14) of the office building. ITS gene sequencing resolved a total of 933 ITS sequences, which were clustered into 216 fungal operational taxonomic units (OTUs). Analysis of fungal OTUs at the 97% similarity threshold showed a difference between the lower and top floors that was marginally significant (p = 0.049). Species richness and diversity indices were reduced in the lower floor samples compared to the top floor samples, and there was a high degree of compositional dissimilarity within and between the two different areas within the building. Fungal OTUs were placed in the phyla Ascomycota (55%), Basidiomycota (41%), Zygomycota (3%), Glomeromycota (0.4%), Chytridiomycota (0.3%), and unassigned fungi (0.5%). The Ascomycota classes with the highest relative abundances included the Dothideomycetes (30%) and Eurotiomycetes (16%). The Basidiomycota consisted of the classes Ustilaginomycetes (14%), Tremellomycetes (11%), and Agaricomycetes (8%). Sequence reads derived from the plant pathogen Ustilago syntherismae were the most abundant in the analysis, as were obligate Basidiomycota yeast species; these accounted for 12% and 11% of fungal ITS sequences, respectively. ITS gene sequencing provides additional insight into the diversity of fungal OTUs. These data further highlight the contribution of fungi placed in the phylum Basidiomycota, obligate yeasts, as well as xerophilic species that are typically not resolved using traditional culture methods.
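
    Species richness and a diversity index can be computed from a table of per-OTU sequence counts, as sketched below. The abstract does not say which diversity indices were used, so the Shannon index here is an assumption chosen as a common example, and the counts are hypothetical.

```python
import math
from typing import Sequence


def richness(otu_counts: Sequence[int]) -> int:
    """Number of OTUs observed at least once in a sample."""
    return sum(1 for count in otu_counts if count > 0)


def shannon_index(otu_counts: Sequence[int]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(otu_counts)
    if total <= 0:
        raise ValueError("sample contains no sequences")
    return -sum((count / total) * math.log(count / total)
                for count in otu_counts if count > 0)


if __name__ == "__main__":
    # Hypothetical per-OTU sequence counts for two dust samples.
    lower_floor = [40, 5, 3, 0, 0]
    top_floor = [12, 10, 9, 9, 8]
    for name, counts in (("lower floor", lower_floor), ("top floor", top_floor)):
        print(f"{name}: richness={richness(counts)}, "
              f"Shannon H={shannon_index(counts):.3f}")
```

    A sample dominated by one OTU gets a lower Shannon value than a sample with the same number of sequences spread evenly across more OTUs, which is the kind of contrast the abstract reports between floors.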

  14. Complete genome sequences and comparative genome analysis of Lactobacillus plantarum strain 5-2 isolated from fermented soybean.

    PubMed

    Liu, Chen-Jian; Wang, Rui; Gong, Fu-Ming; Liu, Xiao-Feng; Zheng, Hua-Jun; Luo, Yi-Yong; Li, Xiao-Ran

    2015-12-01

    Lactobacillus plantarum is an important probiotic and is mostly isolated from fermented foods. We sequenced the genome of L. plantarum strain 5-2, which was isolated from fermented soybean from Yunnan province, China. The strain was determined to contain 3114 genes. Fourteen complete insertion sequence (IS) elements were found in the 5-2 chromosome. There were 24 DNA replication proteins and 76 DNA repair proteins in the 5-2 genome. Consistent with the classification of L. plantarum as a facultative heterofermentative lactobacillus, the 5-2 genome encodes key enzymes required for the EMP (Embden-Meyerhof-Parnas) and phosphoketolase (PK) pathways. Several components of the secretion machinery are found in the 5-2 genome, which was compared with L. plantarum ST-III, JDM1 and WCFS1. Most of the specific proteins in the four genomes appeared to be related to their prophage elements. Copyright © 2015 Elsevier Inc. All rights reserved.

  15. Non-LTE Stellar Population Synthesis of Globular Clusters Using Synthetic Integrated Light Spectra. I. Constructing the IL Spectra

    NASA Astrophysics Data System (ADS)

    Young, Mitchell E.; Short, C. Ian

    2017-02-01

    We present an investigation of the globular cluster population synthesis method of McWilliam & Bernstein, focusing on the impact of non-LTE (NLTE) modeling effects and color-magnitude diagram (CMD) discretization. Johnson-Cousins-Bessel U - B, B-V, V-I, and J-K colors are produced for 96 synthetic integrated light (IL) spectra with two different discretization prescriptions and three degrees of NLTE treatment. These color values are used to compare NLTE- and LTE-derived population ages. Relative contributions of different spectral types to the IL spectra for different wavebands are measured. IL NLTE spectra are shown to be more luminous in the UV and optical than LTE spectra, but show stronger absorption features in the IR. The main features showing discrepancies between NLTE and LTE IL spectra may be attributed to light metals, primarily Fe I, Ca I, and Ti I, as well as TiO molecular bands. Main-sequence stars are shown to have negligible NLTE effects at IR wavelengths compared to more evolved stars. Photometric color values are shown to vary at the millimagnitude level as a function of CMD discretization. Finer CMD sampling for the upper main sequence and turnoff, base of the red giant branch, and the horizontal branch minimizes this variation. Differences in ages derived from LTE and NLTE IL spectra are found to range from 0.55 to 2.54 Gyr, comparable to the uncertainty in GC ages derived from color indices with observational uncertainties of 0.01 mag, the limiting precision of the Harris catalog.

  16. DNA sequences and composition from 12 BAC clones-derived MUSB SSR markers mapped to cotton (Gossypium hirsutum L. x G. barbadense L.) chromosomes 11 and 21

    USDA-ARS's Scientific Manuscript database

    To discover resistance (R) and/or pathogen-induced (PR) genes involved in disease response, 12 bacterial artificial chromosome (BAC) clones from cv. Acala Maxxa (G. hirsutum) were sequenced at the Clemson University, Genomics Institute, Clemson, SC. These BACs derived MUSB single sequence repeat (SS...

  17. snpAD: An ancient DNA genotype caller.

    PubMed

    Prüfer, Kay

    2018-06-21

    The study of ancient genomes can elucidate the evolutionary past. However, analyses are complicated by base-modifications in ancient DNA molecules that result in errors in DNA sequences. These errors are particularly common near the ends of sequences and pose a challenge for genotype calling. I describe an iterative method that estimates genotype frequencies and errors along sequences to allow for accurate genotype calling from ancient sequences. The implementation of this method, called snpAD, performs well on high-coverage ancient data, as shown by simulations and by subsampling the data of a high-coverage Neandertal genome. Although estimates for low-coverage genomes are less accurate, I am able to derive approximate estimates of heterozygosity from several low-coverage Neandertals. These estimates show that low heterozygosity, compared to modern humans, was common among Neandertals. The C++ code of snpAD is freely available at http://bioinf.eva.mpg.de/snpAD/. Supplementary data are available at Bioinformatics online.

  18. RNA circularization reveals terminal sequence heterogeneity in a double-stranded RNA virus.

    PubMed

    Widmer, G

    1993-03-01

    Double-stranded RNA viruses (dsRNA), termed LRV1, have been found in several strains of the protozoan parasite Leishmania. With the aim of constructing a full-length cDNA copy of the viral genome, including its terminal sequences, a protocol based on PCR amplification across the 3'-5' junction of circularized RNA was developed. This method proved to be applicable to dsRNA. It provided a relatively simple alternative to one-sided PCR, without loss of specificity inherent in the use of generic primers. LRV1 terminal nucleotide sequences obtained by this method showed a considerable variation in length, particularly at the 5' end of the positive strand, as well as the potential for forming 3' overhangs. The opposite genomic end terminates in 0, 1, or 2 TCA trinucleotide repeats. These results are compared with terminal sequences derived from one-sided PCR experiments.

  19. Molecular Epidemiological Survey and Genetic Characterization of Anaplasma Species in Mongolian Livestock.

    PubMed

    Ochirkhuu, Nyamsuren; Konnai, Satoru; Odbileg, Raadan; Murata, Shiro; Ohashi, Kazuhiko

    2017-08-01

    Anaplasma species are obligate intracellular rickettsial pathogens that cause great economic loss to the animal industry. Few studies on Anaplasma infections in Mongolian livestock have been conducted. This study examined the prevalence of Anaplasma marginale, Anaplasma ovis, Anaplasma phagocytophilum, and Anaplasma bovis by polymerase chain reaction assay in 928 blood samples collected from native cattle and dairy cattle (Bos taurus), yaks (Bos grunniens), sheep (Ovis aries), and goats (Capra aegagrus hircus) in four provinces of Ulaanbaatar city in Mongolia. We genetically characterized positive samples through sequencing analysis based on the heat-shock protein groEL, major surface protein 4 (msp4), and 16S rRNA genes. Only A. ovis was detected in Mongolian livestock (cattle, yaks, sheep, and goats), with 413 animals (44.5%) positive for groEL and 308 animals (33.2%) positive for msp4 genes. In the phylogenetic tree, we separated A. ovis sequences into two distinct clusters based on the groEL gene. One cluster comprised sequences derived mainly from sheep and goats, which was similar to that in A. ovis isolates from other countries. The other divergent cluster comprised sequences derived from cattle and yaks and appeared to be newly branched from that in previously published single isolates in Mongolian cattle. In addition, the msp4 gene of A. ovis using same and different samples with groEL gene of the pathogen demonstrated that all sequences derived from all animal species, except for three sequences derived from cattle and yak, were clustered together, and were identical or similar to those in isolates from other countries. We used 16S rRNA gene sequences to investigate the genetically divergent A. ovis and identified high homology of 99.3-100%. However, the sequences derived from cattle did not match those derived from sheep and goats. The results of this study on the prevalence and molecular characterization of A. ovis in Mongolian livestock can facilitate the control of infectious diseases in livestock.

  20. Incorporating evolution of transcription factor binding sites into annotated alignments.

    PubMed

    Bais, Abha S; Grossmann, Stefen; Vingron, Martin

    2007-08-01

    Identifying transcription factor binding sites (TFBSs) is essential to elucidate putative regulatory mechanisms. A common strategy is to combine cross-species conservation with single sequence TFBS annotation to yield "conserved TFBSs". Most current methods in this field adopt a multi-step approach that segregates the two aspects. Again, it is widely accepted that the evolutionary dynamics of binding sites differ from those of the surrounding sequence. Hence, it is desirable to have an approach that explicitly takes this factor into account. Although a plethora of approaches have been proposed for the prediction of conserved TFBSs, very few explicitly model TFBS evolutionary properties, while additionally being multi-step. Recently, we introduced a novel approach to simultaneously align and annotate conserved TFBSs in a pair of sequences. Building upon the standard Smith-Waterman algorithm for local alignments, SimAnn introduces additional states for profiles to output extended alignments or annotated alignments. That is, alignments with parts annotated as gaplessly aligned TFBSs (pair-profile hits) are generated. Moreover, the pair-profile related parameters are derived in a sound statistical framework. In this article, we extend this approach to explicitly incorporate evolution of binding sites in the SimAnn framework. We demonstrate the extension in the theoretical derivations through two position-specific evolutionary models, previously used for modelling TFBS evolution. In a simulated setting, we provide a proof of concept that the approach works given the underlying assumptions, as compared to the original work. Finally, using a real dataset of experimentally verified binding sites in human-mouse sequence pairs, we compare the new approach (eSimAnn) to an existing multi-step tool that also considers TFBS evolution.
    Although it is widely accepted that binding sites evolve differently from the surrounding sequences, most comparative TFBS identification methods do not explicitly consider this. Additionally, prediction of conserved binding sites is carried out in a multi-step approach that segregates alignment from TFBS annotation. In this paper, we demonstrate how the simultaneous alignment and annotation approach of SimAnn can be further extended to incorporate TFBS evolutionary relationships. We study how alignments and binding site predictions interplay at varying evolutionary distances and for various profile qualities.
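
    The record above builds on the standard Smith-Waterman algorithm for local alignments. As an illustration only, the following is a minimal sketch of that baseline recurrence with a linear gap penalty. It is not SimAnn or eSimAnn: it has no profile states and no evolutionary model. The function name and the scoring values (match, mismatch, gap) are assumptions chosen for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty).

    The scoring values are illustrative assumptions, not taken from the record.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap      # gap in b
            left = h[i][j - 1] + gap    # gap in a
            # The floor at 0 lets an alignment restart anywhere (local alignment).
            h[i][j] = max(0, diag, up, left)
            best = max(best, h[i][j])
    return best


# Two toy sequences sharing the local motif "ACG".
print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

    The floor at zero is what distinguishes a local alignment from a global one: a poorly matching prefix is dropped rather than penalised.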

  1. A Review of Multivariate Distributions for Count Data Derived from the Poisson Distribution.

    PubMed

    Inouye, David; Yang, Eunho; Allen, Genevera; Ravikumar, Pradeep

    2017-01-01

    The Poisson distribution has been widely studied and used for modeling univariate count-valued data. Multivariate generalizations of the Poisson distribution that permit dependencies, however, have been far less popular. Yet, real-world high-dimensional count-valued data found in word counts, genomics, and crime statistics, for example, exhibit rich dependencies, and motivate the need for multivariate distributions that can appropriately model this data. We review multivariate distributions derived from the univariate Poisson, categorizing these models into three main classes: 1) where the marginal distributions are Poisson, 2) where the joint distribution is a mixture of independent multivariate Poisson distributions, and 3) where the node-conditional distributions are derived from the Poisson. We discuss the development of multiple instances of these classes and compare the models in terms of interpretability and theory. Then, we empirically compare multiple models from each class on three real-world datasets that have varying data characteristics from different domains, namely traffic accident data, biological next generation sequencing data, and text data. These empirical experiments develop intuition about the comparative advantages and disadvantages of each class of multivariate distribution that was derived from the Poisson. Finally, we suggest new research directions as explored in the subsequent discussion section.

  2. Characterization of GM events by insert knowledge adapted re-sequencing approaches

    PubMed Central

    Yang, Litao; Wang, Congmao; Holst-Jensen, Arne; Morisset, Dany; Lin, Yongjun; Zhang, Dabing

    2013-01-01

    Detection methods and data from molecular characterization of genetically modified (GM) events are needed by stakeholders such as public risk assessors and regulators. Generally, the molecular characteristics of GM events are incompletely revealed by current approaches and biased towards detecting transformation vector derived sequences. GM events are classified based on available knowledge of the sequences of vectors and inserts (insert knowledge). Herein we present three insert knowledge-adapted approaches for characterization of GM events (TT51-1 and T1c-19 rice as examples) based on paired-end re-sequencing with the advantages of comprehensiveness, accuracy, and automation. The comprehensive molecular characteristics of two rice events were revealed, with additional unintended insertions compared with the results from PCR and Southern blotting. Comprehensive transgene characterization of TT51-1 and T1c-19 is shown to be independent of a priori knowledge of the insert and vector sequences employing the developed approaches. This provides an opportunity to also identify and characterize unknown GM events. PMID:24088728

  3. Characterization of GM events by insert knowledge adapted re-sequencing approaches.

    PubMed

    Yang, Litao; Wang, Congmao; Holst-Jensen, Arne; Morisset, Dany; Lin, Yongjun; Zhang, Dabing

    2013-10-03

    Detection methods and data from molecular characterization of genetically modified (GM) events are needed by stakeholders such as public risk assessors and regulators. Generally, the molecular characteristics of GM events are incompletely revealed by current approaches and biased towards detecting transformation vector derived sequences. GM events are classified based on available knowledge of the sequences of vectors and inserts (insert knowledge). Herein we present three insert knowledge-adapted approaches for characterization of GM events (TT51-1 and T1c-19 rice as examples) based on paired-end re-sequencing with the advantages of comprehensiveness, accuracy, and automation. The comprehensive molecular characteristics of two rice events were revealed, with additional unintended insertions compared with the results from PCR and Southern blotting. Comprehensive transgene characterization of TT51-1 and T1c-19 is shown to be independent of a priori knowledge of the insert and vector sequences employing the developed approaches. This provides an opportunity to also identify and characterize unknown GM events.

  4. Evolution of hard proteins in the sauropsid integument in relation to the cornification of skin derivatives in amniotes

    PubMed Central

    Alibardi, Lorenzo; Valle, Luisa Dalla; Nardi, Alessia; Toni, Mattia

    2009-01-01

    Hard skin appendages in amniotes comprise scales, feathers and hairs. The cell organization of these appendages probably derived from the localization of specialized areas of dermal–epidermal interaction in the integument. The horny scales and the other derivatives were formed from large areas of dermal–epidermal interaction. The evolution of these skin appendages was characterized by the production of specific coiled-coil keratins and associated proteins in the inter-filament matrix. Unlike mammalian keratin-associated proteins, those of sauropsids contain a double beta-folded sequence of about 20 amino acids, known as the core-box. The core-box shows 60%–95% sequence identity with known reptilian and avian proteins. The core-box determines the polymerization of these proteins into filaments indicated as beta-keratin filaments. The nucleotide and derived amino acid sequences for these sauropsid keratin-associated proteins are presented in conjunction with a hypothesis about their evolution in reptiles-birds compared to mammalian keratin-associated proteins. It is suggested that genes coding for ancestral glycine-serine-rich sequences of alpha-keratins produced a new class of small matrix proteins. In sauropsids, matrix proteins may have originated after mutation and enrichment in proline, probably in a central region of the ancestral protein. This mutation gave rise to the core-box, and other regions of the original protein evolved differently in the various reptilians orders. In lepidosaurians, two main groups, the high glycine proline and the high cysteine proline proteins, were formed. In archosaurians and chelonians two main groups later diversified into the high glycine proline tyrosine, non-feather proteins, and into the glycine-tyrosine-poor group of feather proteins, which evolved in birds. The latter proteins were particularly suited for making the elongated barb/barbule cells of feathers. 
    In therapsids-mammals, mutations of the ancestral proteins formed the high glycine-tyrosine or the high cysteine proteins but no core-box was produced in the matrix proteins of the hard corneous material of mammalian derivatives. PMID:19422429

  5. Classification of Plant Associated Bacteria Using RIF, a Computationally Derived DNA Marker

    PubMed Central

    Schneider, Kevin L.; Marrero, Glorimar; Alvarez, Anne M.; Presting, Gernot G.

    2011-01-01

    A DNA marker that distinguishes plant associated bacteria at the species level and below was derived by comparing six sequenced genomes of Xanthomonas, a genus that contains many important phytopathogens. This DNA marker comprises a portion of the dnaA replication initiation factor (RIF). Unlike the rRNA genes, dnaA is a single copy gene in the vast majority of sequenced bacterial genomes, and amplification of RIF requires genus-specific primers. In silico analysis revealed that RIF has equal or greater ability to differentiate closely related species of Xanthomonas than the widely used ribosomal intergenic spacer region (ITS). Furthermore, in a set of 263 Xanthomonas, Ralstonia and Clavibacter strains, the RIF marker was directly sequenced in both directions with a success rate approximately 16% higher than that for ITS. RIF frameworks for Xanthomonas, Ralstonia and Clavibacter were constructed using 682 reference strains representing different species, subspecies, pathovars, races, hosts and geographic regions, and contain a total of 109 different RIF sequences. RIF sequences showed subspecific groupings but did not place strains of X. campestris or X. axonopodis into currently named pathovars nor R. solanacearum strains into their respective races, confirming previous conclusions that pathovar and race designations do not necessarily reflect genetic relationships. The RIF marker also was sequenced for 24 reference strains from three genera in the Enterobacteriaceae: Pectobacterium, Pantoea and Dickeya. RIF sequences of 70 previously uncharacterized strains of Ralstonia, Clavibacter, Pectobacterium and Dickeya matched, or were similar to, those of known reference strains, illustrating the utility of the frameworks to classify bacteria below the species level and rapidly match unknown isolates to reference strains. 
    The RIF sequence frameworks are available at the online RIF database, RIFdb, and can be queried for diagnostic purposes with RIF sequences obtained from unknown strains in both chromatogram and FASTA format. PMID:21533033

  6. Complete Chloroplast Genome Sequence of Tartary Buckwheat (Fagopyrum tataricum) and Comparative Analysis with Common Buckwheat (F. esculentum)

    PubMed Central

    Cho, Kwang-Soo; Yun, Bong-Kyoung; Yoon, Young-Ho; Hong, Su-Young; Mekapogu, Manjulatha; Kim, Kyung-Hee; Yang, Tae-Jin

    2015-01-01

    We report the chloroplast (cp) genome sequence of tartary buckwheat (Fagopyrum tataricum) obtained by next-generation sequencing technology and compared this with the previously reported common buckwheat (F. esculentum ssp. ancestrale) cp genome. The cp genome of F. tataricum has a total sequence length of 159,272 bp, which is 327 bp shorter than the common buckwheat cp genome. The cp gene content, order, and orientation are similar to those of common buckwheat, but with some structural variation at tandem and palindromic repeat frequencies and junction areas. A total of seven InDels (around 100 bp) were found within the intergenic sequences and the ycf1 gene. Copy number variation of the 21-bp tandem repeat varied in F. tataricum (four repeats) and F. esculentum (one repeat), and the InDel of the ycf1 gene was 63 bp long. Nucleotide and amino acid have highly conserved coding sequence with about 98% homology and four genes—rpoC2, ycf3, accD, and clpP—have high synonymous (Ks) value. PCR based InDel markers were applied to diverse genetic resources of F. tataricum and F. esculentum, and the amplicon size was identical to that expected in silico. Therefore, these InDel markers are informative biomarkers to practically distinguish raw or processed buckwheat products derived from F. tataricum and F. esculentum. PMID:25966355

  7. A MAP-based image interpolation method via Viterbi decoding of Markov chains of interpolation functions.

    PubMed

    Vedadi, Farhang; Shirani, Shahram

    2014-01-01

    A new method of image resolution up-conversion (image interpolation) based on maximum a posteriori sequence estimation is proposed. Instead of making a hard decision about the value of each missing pixel, we estimate the missing pixels in groups. At each missing pixel of the high resolution (HR) image, we consider an ensemble of candidate interpolation methods (interpolation functions). The interpolation functions are interpreted as states of a Markov model. In other words, the proposed method undergoes state transitions from one missing pixel position to the next. Accordingly, the interpolation problem is translated to the problem of estimating the optimal sequence of interpolation functions corresponding to the sequence of missing HR pixel positions. We derive a parameter-free probabilistic model for this to-be-estimated sequence of interpolation functions. Then, we solve the estimation problem using a trellis representation and the Viterbi algorithm. Using directional interpolation functions and sequence estimation techniques, we classify the new algorithm as an adaptive directional interpolation using soft-decision estimation techniques. Experimental results show that the proposed algorithm yields images with higher or comparable peak signal-to-noise ratios compared with some benchmark interpolation methods in the literature while being efficient in terms of implementation and complexity considerations.
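
    The record above solves its sequence-estimation problem with a trellis and the Viterbi algorithm. As an illustration only, the following is a minimal sketch of generic Viterbi decoding for a discrete hidden Markov model, in log space. It is not the authors' interpolation method: in their setting the states would be candidate interpolation functions and the steps would be missing-pixel positions, while here the function names, the table layout, and the toy model are all assumptions made for the example.

```python
import math


def log_table(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {
        k: log_table(v) if isinstance(v, dict) else math.log(v)
        for k, v in table.items()
    }


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (most probable state path, its log-probability)."""
    # best[t][s] = best log-probability of any path that ends in state s at step t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s] = predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Toy two-state model (hypothetical numbers, for illustration only).
states = ["H", "F"]
start = log_table({"H": 0.6, "F": 0.4})
trans = log_table({"H": {"H": 0.7, "F": 0.3}, "F": {"H": 0.4, "F": 0.6}})
emit = log_table({
    "H": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "F": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
})
path, logp = viterbi(["normal", "cold", "dizzy"], states, start, trans, emit)
print(path, math.exp(logp))
```

    Working in log space avoids numerical underflow on long sequences, which matters when the sequence runs over every missing pixel position in an image row or column.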

  8. Complete Genome Sequence and Comparative Analysis of the Fish Pathogen Lactococcus garvieae

    PubMed Central

    Oshima, Kenshiro; Yoshizaki, Mariko; Kawanishi, Michiko; Nakaya, Kohei; Suzuki, Takehito; Miyauchi, Eiji; Ishii, Yasuo; Tanabe, Soichi; Murakami, Masaru; Hattori, Masahira

    2011-01-01

    Lactococcus garvieae causes fatal haemorrhagic septicaemia in fish such as yellowtail. The comparative analysis of genomes of a virulent strain Lg2 and a non-virulent strain ATCC 49156 of L. garvieae revealed that the two strains shared a high degree of sequence identity, but Lg2 had a 16.5-kb capsule gene cluster that is absent in ATCC 49156. The capsule gene cluster was composed of 15 genes, of which eight genes are highly conserved with those in exopolysaccharide biosynthesis gene cluster often found in Lactococcus lactis strains. Sequence analysis of the capsule gene cluster in the less virulent strain L. garvieae Lg2-S, Lg2-derived strain, showed that two conserved genes were disrupted by a single base pair deletion, respectively. These results strongly suggest that the capsule is crucial for virulence of Lg2. The capsule gene cluster of Lg2 may be a genomic island from several features such as the presence of insertion sequences flanked on both ends, different GC content from the chromosomal average, integration into the locus syntenic to other lactococcal genome sequences, and distribution in human gut microbiomes. The analysis also predicted other potential virulence factors such as haemolysin. The present study provides new insights into understanding of the virulence mechanisms of L. garvieae in fish. PMID:21829716

  9. Piscine reovirus: Genomic and molecular phylogenetic analysis from farmed and wild salmonids collected on the Canada/US Pacific Coast

    USGS Publications Warehouse

    Siah, Ahmed; Morrison, Diane B.; Fringuelli, Elena; Savage, Paul S.; Richmond, Zina; Purcell, Maureen K.; Johns, Robert; Johnson, Stewart C.; Saksida, Sonja M.

    2015-01-01

    Piscine reovirus (PRV) is a double stranded non-enveloped RNA virus detected in farmed and wild salmonids. This study examined the phylogenetic relationships among different PRV sequence types present in samples from salmonids in Western Canada and the US, including Alaska (US), British Columbia (Canada) and Washington State (US). Tissues testing positive for PRV were partially sequenced for segment S1, producing 71 sequences that grouped into 10 unique sequence types. Sequence analysis revealed no identifiable geographical or temporal variation among the sequence types. Identical sequence types were found in fish sampled in 2001, 2005 and 2014. In addition, PRV positive samples from fish derived from Alaska, British Columbia and Washington State share identical sequence types. Comparative analysis of the phylogenetic tree indicated that Canada/US Pacific Northwest sequences formed a subgroup with some Norwegian sequence types (group II), distinct from other Norwegian and Chilean sequences (groups I, III and IV). Representative PRV positive samples from farmed and wild fish in British Columbia and Washington State were subjected to genome sequencing using next generation sequencing methods. Individual analysis of each of the 10 partial segments indicated that the Canadian and US PRV sequence types clustered separately from available whole genome sequences of some Norwegian and Chilean sequences for all segments except the segment S4. In summary, PRV was genetically homogenous over a large geographic distance (Alaska to Washington State), and the sequence types were relatively stable over a 13 year period.

  10. Piscine Reovirus: Genomic and Molecular Phylogenetic Analysis from Farmed and Wild Salmonids Collected on the Canada/US Pacific Coast

    PubMed Central

    Siah, Ahmed; Morrison, Diane B.; Fringuelli, Elena; Savage, Paul; Richmond, Zina; Johns, Robert; Purcell, Maureen K.; Johnson, Stewart C.; Saksida, Sonja M.

    2015-01-01

    Piscine reovirus (PRV) is a double stranded non-enveloped RNA virus detected in farmed and wild salmonids. This study examined the phylogenetic relationships among different PRV sequence types present in samples from salmonids in Western Canada and the US, including Alaska (US), British Columbia (Canada) and Washington State (US). Tissues testing positive for PRV were partially sequenced for segment S1, producing 71 sequences that grouped into 10 unique sequence types. Sequence analysis revealed no identifiable geographical or temporal variation among the sequence types. Identical sequence types were found in fish sampled in 2001, 2005 and 2014. In addition, PRV positive samples from fish derived from Alaska, British Columbia and Washington State share identical sequence types. Comparative analysis of the phylogenetic tree indicated that Canada/US Pacific Northwest sequences formed a subgroup with some Norwegian sequence types (group II), distinct from other Norwegian and Chilean sequences (groups I, III and IV). Representative PRV positive samples from farmed and wild fish in British Columbia and Washington State were subjected to genome sequencing using next generation sequencing methods. Individual analysis of each of the 10 partial segments indicated that the Canadian and US PRV sequence types clustered separately from available whole genome sequences of some Norwegian and Chilean sequences for all segments except the segment S4. In summary, PRV was genetically homogenous over a large geographic distance (Alaska to Washington State), and the sequence types were relatively stable over a 13 year period. PMID:26536673

  11. Whole Genome Sequence of Two Wild-Derived Mus musculus domesticus Inbred Strains, LEWES/EiJ and ZALENDE/EiJ, with Different Diploid Numbers

    PubMed Central

    Morgan, Andrew P.; Didion, John P.; Doran, Anthony G.; Holt, James M.; McMillan, Leonard; Keane, Thomas M.; de Villena, Fernando Pardo-Manuel

    2016-01-01

    Wild-derived mouse inbred strains are becoming increasingly popular for complex traits analysis, evolutionary studies, and systems genetics. Here, we report the whole-genome sequencing of two wild-derived mouse inbred strains, LEWES/EiJ and ZALENDE/EiJ, of Mus musculus domesticus origin. These two inbred strains were selected based on their geographic origin, karyotype, and use in ongoing research. We generated 14× and 18× coverage sequence, respectively, and discovered over 1.1 million novel variants, most of which are private to one of these strains. This report expands the number of wild-derived inbred genomes in the Mus genus from six to eight. The sequence variation can be accessed via an online query tool; variant calls (VCF format) and alignments (BAM format) are available for download from a dedicated ftp site. Finally, the sequencing data have also been stored in a lossless, compressed, and indexed format using the multi-string Burrows-Wheeler transform. All data can be used without restriction. PMID:27765810

  12. A Critical Assessment of Ages Derived Using Pre-Main-Sequence Isochrones in Colour-Magnitude Diagrams

    NASA Astrophysics Data System (ADS)

    Bell, Cameron P. M.

    2012-11-01

    In this thesis a critical assessment of the ages derived using theoretical pre-main-sequence (pre-MS) stellar evolutionary models is presented by comparing the predictions to the low-mass pre-MS population of 14 young star-forming regions (SFRs) in colour-magnitude diagrams (CMDs). Deriving pre-MS ages requires precise distances and estimates of the reddening. Therefore, the main-sequence (MS) members of the SFRs have been used to derive a self-consistent set of statistically robust ages, distances and reddenings with associated uncertainties using a maximum-likelihood fitting statistic and MS evolutionary models. A photometric method for de-reddening individual stars - known as the Q-method - in regions where the extinction is spatially variable has been updated and is presented. The effects of both the model dependency and the SFR composition on these derived parameters are also discussed. The problem of calibrating photometric observations of red pre-MS stars is examined and it is shown that using observations of MS stars to transform the data into a standard photometric system can introduce significant errors in the position of the pre-MS locus in CMD space. Hence, it is crucial that precise photometric studies - especially of pre-MS objects - be carried out in the natural photometric system of the observations. This therefore requires a robust model of the system responses for the instrument used, and thus the calculated responses for the Wide-Field Camera on the Isaac Newton Telescope are presented. These system responses have been tested using standard star observations and have been shown to be a good representation of the photometric system. A benchmark test for the pre-MS evolutionary models is performed by comparing them to a set of well-calibrated CMDs of the Pleiades in the wavelength regime 0.4-2.5 μm. 
    The masses predicted by these models are also tested against dynamical masses using a sample of MS binaries by calculating the system magnitude in a given photometric bandpass. This analysis shows that for Teff ≤ 4000 K the models systematically overestimate the flux by a factor of 2 at 0.5 μm, though this decreases with wavelength, becoming negligible at 2.2 μm. Thus before the pre-MS models are used to derive ages, a recalibration of the models is performed by incorporating an empirical colour-Teff relation and bolometric corrections based on the Ks-band luminosity of Pleiades members, with theoretical corrections for the dependence on the surface gravity (log g). The recalibrated pre-MS model isochrones are used to derive ages from the pre-MS populations of the SFRs. These ages are then compared with the MS derivations, thus providing a powerful diagnostic tool with which to discriminate between the different pre-MS age scales that arise from a much stronger model dependency in the pre-MS regime. The revised ages assigned to each of the 14 SFRs are up to a factor of two older than previous derivations, a result with wide-ranging implications, including that circumstellar discs survive longer and that the average Class II lifetime is greater than currently believed.

  13. RAD tag sequencing as a source of SNP markers in Cynara cardunculus L

    PubMed Central

    2012-01-01

    Background: The globe artichoke (Cynara cardunculus L. var. scolymus) genome is relatively poorly explored, especially compared to those of the other major Asteraceae crops sunflower and lettuce. No SNP markers are in the public domain. We have combined the recently developed restriction-site associated DNA (RAD) approach with the Illumina DNA sequencing platform to effect the rapid and mass discovery of SNP markers for C. cardunculus. Results: RAD tags were sequenced from the genomic DNA of three C. cardunculus mapping population parents, generating 9.7 million reads, corresponding to ~1 Gbp of sequence. An assembly based on paired ends produced ~6.0 Mbp of genomic sequence, separated into ~19,000 contigs (mean length 312 bp), of which ~21% were fragments of putative coding sequence. The shared sequences allowed for the discovery of ~34,000 SNPs and nearly 800 indels, equivalent to a SNP frequency of 5.6 per 1,000 nt, and an indel frequency of 0.2 per 1,000 nt. A sample of heterozygous SNP loci was mapped by CAPS assays and this exercise provided validation of our mining criteria. The repetitive fraction of the genome had a high representation of retrotransposon sequence, followed by simple repeats, AT-low complexity regions and mobile DNA elements. The genomic k-mer distribution and CpG rate of C. cardunculus, compared with data derived from three whole genome-sequenced dicot species, provided further evidence of the random representation of the C. cardunculus genome generated by RAD sampling. Conclusion: The RAD tag sequencing approach is a cost-effective and rapid method to develop SNP markers in a highly heterozygous species. Our approach permitted the generation of a large and robust SNP dataset by the adoption of optimized filtering criteria. PMID:22214349

  14. Spin models inferred from patient-derived viral sequence data faithfully describe HIV fitness landscapes

    NASA Astrophysics Data System (ADS)

    Shekhar, Karthik; Ruberman, Claire F.; Ferguson, Andrew L.; Barton, John P.; Kardar, Mehran; Chakraborty, Arup K.

    2013-12-01

    Mutational escape from vaccine-induced immune responses has thwarted the development of a successful vaccine against AIDS, whose causative agent is HIV, a highly mutable virus. Knowing the virus' fitness as a function of its proteomic sequence can enable rational design of potent vaccines, as this information can focus vaccine-induced immune responses to target mutational vulnerabilities of the virus. Spin models have been proposed as a means to infer intrinsic fitness landscapes of HIV proteins from patient-derived viral protein sequences. These sequences are the product of nonequilibrium viral evolution driven by patient-specific immune responses and are subject to phylogenetic constraints. How can such sequence data allow inference of intrinsic fitness landscapes? We combined computer simulations and variational theory à la Feynman to show that, in most circumstances, spin models inferred from patient-derived viral sequences reflect the correct rank order of the fitness of mutant viral strains. Our findings are relevant for diverse viruses.

  15. General and inducible hypermutation facilitate parallel adaptation in Pseudomonas aeruginosa despite divergent mutation spectra.

    PubMed

    Weigand, Michael R; Sundin, George W

    2012-08-21

    The successful growth of hypermutator strains of bacteria contradicts a clear preference for lower mutation rates observed in the microbial world. Whether by general DNA repair deficiency or the inducible action of low-fidelity DNA polymerases, the evolutionary strategies of bacteria include methods of hypermutation. Although both raise mutation rate, general and inducible hypermutation operate through distinct molecular mechanisms and therefore likely impart unique adaptive consequences. Here we compare the influence of general and inducible hypermutation on adaptation in the model organism Pseudomonas aeruginosa PAO1 through experimental evolution. We observed divergent spectra of single base substitutions derived from general and inducible hypermutation by sequencing rpoB in spontaneous rifampicin-resistant (Rif(R)) mutants. Likewise, the pattern of mutation in a draft genome sequence of a derived inducible hypermutator isolate differed from those of general hypermutators reported in the literature. However, following experimental evolution, populations of both mutator types exhibited comparable improvements in fitness across varied conditions that differed from the highly specific adaptation of nonmutators. Our results suggest that despite their unique mutation spectra, general and inducible hypermutation can analogously influence the ecology and adaptation of bacteria, significantly shaping pathogenic populations where hypermutation has been most widely observed.

  16. Unexpected series of regular frequency spacing of δ Scuti stars in the non-asymptotic regime. II. Sample-Echelle diagrams and rotation

    DOE PAGES

    Paparo, M.; Benko, J. M.; Hareter, M.; ...

    2016-06-17

    A sequence search method was developed for searching for regular frequency spacing in δ Scuti stars by visual inspection (VI) and algorithmic search. The sample contains 90 δ Scuti stars observed by CoRoT. An example is given to represent the VI. The algorithm (SSA) is described in detail. The data treatment of the CoRoT light curves, the criteria for frequency filtering, and the spacings derived by two methods (i.e., three approaches: VI, SSA, and FT) are given for each target. Echelle diagrams are presented for 77 targets for which at least one sequence of regular spacing was identified. Comparing the spacing and the shifts between pairs of echelle ridges revealed that at least one pair of echelle ridges is shifted to midway between the spacing for 22 stars. The estimated rotational frequencies compared to the shifts revealed rotationally split doublets, triplets, and multiplets not only for single frequencies, but for the complete echelle ridges in 31 δ Scuti stars. Furthermore, using several possible assumptions for the origin of the spacings, we derived the large separation …

  17. Unexpected series of regular frequency spacing of δ Scuti stars in the non-asymptotic regime. II. Sample-Echelle diagrams and rotation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Paparo, M.; Benko, J. M.; Hareter, M.

    A sequence search method was developed for searching for regular frequency spacing in δ Scuti stars by visual inspection (VI) and algorithmic search. The sample contains 90 δ Scuti stars observed by CoRoT. An example is given to represent the VI. The algorithm (SSA) is described in detail. The data treatment of the CoRoT light curves, the criteria for frequency filtering, and the spacings derived by two methods (i.e., three approaches: VI, SSA, and FT) are given for each target. Echelle diagrams are presented for 77 targets for which at least one sequence of regular spacing was identified. Comparing the spacing and the shifts between pairs of echelle ridges revealed that at least one pair of echelle ridges is shifted to midway between the spacing for 22 stars. The estimated rotational frequencies compared to the shifts revealed rotationally split doublets, triplets, and multiplets not only for single frequencies, but for the complete echelle ridges in 31 δ Scuti stars. Furthermore, using several possible assumptions for the origin of the spacings, we derived the large separation …

  18. Structurally complex and highly active RNA ligases derived from random RNA sequences

    NASA Technical Reports Server (NTRS)

    Ekland, E. H.; Szostak, J. W.; Bartel, D. P.

    1995-01-01

    Seven families of RNA ligases, previously isolated from random RNA sequences, fall into three classes on the basis of secondary structure and regiospecificity of ligation. Two of the three classes of ribozymes have been engineered to act as true enzymes, catalyzing the multiple-turnover transformation of substrates into products. The most complex of these ribozymes has a minimal catalytic domain of 93 nucleotides. An optimized version of this ribozyme has a kcat exceeding one per second, a value far greater than that of most natural RNA catalysts and approaching that of comparable protein enzymes. The fact that such a large and complex ligase emerged from a very limited sampling of sequence space implies the existence of a large number of distinct RNA structures of equivalent complexity and activity.

  19. [Personalized urooncology based on molecular uropathology: what is the future?].

    PubMed

    Dahl, E; Haller, F

    2013-07-01

    Targeted therapies and biomarker validation are key drivers in the advancement of personalized oncology, which is a growing topic in all clinical areas. Compared with other specialties, such as pulmonology and gynecology, development in urology has so far been slow but has recently gained momentum. One basis for this is the growing, and in future accelerating, application of new knowledge derived from molecular biology in the field of uropathology. The rapid gain of knowledge is driven by a whole new class of analytical methods, such as massively parallel sequencing (deep sequencing or next generation sequencing), which opens up a virtually new universe of potential biomarkers for analysis. This article describes the emerging paradigm shift in molecular pathological diagnostics of urological tumors using the example of prostate cancer.

  20. Rapid in silico cloning of genes using expressed sequence tags (ESTs).

    PubMed

    Gill, R W; Sanseau, P

    2000-01-01

    Expressed sequence tags (ESTs) are short single-pass DNA sequences obtained from either end of cDNA clones. These ESTs are derived from a vast number of cDNA libraries obtained from different species. Human ESTs make up the bulk of the data and have been widely used to identify new members of gene families, as markers on the human chromosomes, to discover polymorphism sites and to compare expression patterns in different tissues or pathological states. Information strategies have been devised to query EST databases. Since most of the analysis is performed with a computer, the term "in silico" strategy has been coined. In this chapter we will review the current status of EST databases, the pros and cons of EST-type data and describe possible strategies to retrieve meaningful information.

  1. [Comparative analysis of clustered regularly interspaced short palindromic repeats (CRISPRs) loci in the genomes of halophilic archaea].

    PubMed

    Zhang, Fan; Zhang, Bing; Xiang, Hua; Hu, Songnian

    2009-11-01

    Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) is a widespread system that provides acquired resistance against phages in bacteria and archaea. Here we aim to analyze, genome-wide, the CRISPR loci in extremely halophilic archaea whose whole genome sequences are currently available. We used bioinformatics methods including alignment, conservation analysis, GC content and RNA structure prediction to analyze the CRISPR structures of 7 haloarchaeal genomes. We identified the CRISPR structures in 5 halophilic archaea and revealed a conserved palindromic motif in the flanking regions of these CRISPR structures. In addition, we found that the repeat sequences of large CRISPR structures in halophilic archaea were greatly conserved, and two types of predicted RNA secondary structures derived from the repeat sequences were likely determined by the fourth base of the repeat sequence. Our results support the proposal that the leader sequence may function as a recognition site by having palindromic structures in flanking regions, and the stem-loop secondary structure formed by repeat sequences may function in mediating the interaction between foreign genetic elements and CAS-encoded proteins.

  2. A comparative analysis of exome capture.

    PubMed

    Parla, Jennifer S; Iossifov, Ivan; Grabill, Ian; Spector, Mona S; Kramer, Melissa; McCombie, W Richard

    2011-09-29

    Human exome resequencing using commercial target capture kits has been and is being used for sequencing large numbers of individuals to search for variants associated with various human diseases. We rigorously evaluated the capabilities of two solution exome capture kits. These analyses help clarify the strengths and limitations of those data as well as systematically identify variables that should be considered in the use of those data. Each exome kit performed well at capturing the targets it was designed to capture, which mainly correspond to the consensus coding sequences (CCDS) annotations of the human genome. In addition, based on their respective targets, each capture kit coupled with high coverage Illumina sequencing produced highly accurate nucleotide calls. However, other databases, such as the Reference Sequence collection (RefSeq), define the exome more broadly, and so not surprisingly, the exome kits did not capture these additional regions. Commercial exome capture kits provide a very efficient way to sequence select areas of the genome at very high accuracy. Here we provide the data to help guide critical analyses of sequencing data derived from these products.

  3. MoRFpred, a computational tool for sequence-based prediction and characterization of short disorder-to-order transitioning binding regions in proteins

    PubMed Central

    Disfani, Fatemeh Miri; Hsu, Wei-Lun; Mizianty, Marcin J.; Oldfield, Christopher J.; Xue, Bin; Dunker, A. Keith; Uversky, Vladimir N.; Kurgan, Lukasz

    2012-01-01

    Motivation: Molecular recognition features (MoRFs) are short binding regions located within longer intrinsically disordered regions that bind to protein partners via disorder-to-order transitions. MoRFs are implicated in important processes including signaling and regulation. However, only a limited number of experimentally validated MoRFs is known, which motivates development of computational methods that predict MoRFs from protein chains. Results: We introduce a new MoRF predictor, MoRFpred, which identifies all MoRF types (α, β, coil and complex). We develop a comprehensive dataset of annotated MoRFs to build and empirically compare our method. MoRFpred utilizes a novel design in which annotations generated by sequence alignment are fused with predictions generated by a Support Vector Machine (SVM), which uses a custom designed set of sequence-derived features. The features provide information about evolutionary profiles, selected physicochemical properties of amino acids, and predicted disorder, solvent accessibility and B-factors. Empirical evaluation on several datasets shows that MoRFpred outperforms related methods: α-MoRF-Pred, which predicts α-MoRFs, and ANCHOR, which finds disordered regions that become ordered when bound to a globular partner. We show that our predicted (new) MoRF regions have non-random sequence similarity with native MoRFs. We use this observation along with the fact that predictions with higher probability are more accurate to identify putative MoRF regions. We also identify a few sequence-derived hallmarks of MoRFs. They are characterized by dips in the disorder predictions and higher hydrophobicity and stability when compared to adjacent (in the chain) residues. Availability: http://biomine.ece.ualberta.ca/MoRFpred/; http://biomine.ece.ualberta.ca/MoRFpred/Supplement.pdf Contact: lkurgan@ece.ualberta.ca Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22689782

  4. Generating Models of Surgical Procedures using UMLS Concepts and Multiple Sequence Alignment

    PubMed Central

    Meng, Frank; D’Avolio, Leonard W.; Chen, Andrew A.; Taira, Ricky K.; Kangarloo, Hooshang

    2005-01-01

    Surgical procedures can be viewed as a process composed of a sequence of steps performed on, by, or with the patient’s anatomy. This sequence is typically the pattern followed by surgeons when generating surgical report narratives for documenting surgical procedures. This paper describes a methodology for semi-automatically deriving a model of conducted surgeries, utilizing a sequence of derived Unified Medical Language System (UMLS) concepts for representing surgical procedures. A multiple sequence alignment was computed from a collection of such sequences and was used for generating the model. These models have the potential of being useful in a variety of informatics applications such as information retrieval and automatic document generation. PMID:16779094

  5. ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes.

    PubMed

    Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim

    2010-03-01

    Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith-Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. The database can be accessed through http://proteinworlddb.org

  6. Molecular Characterization and Comparative Sequence Analysis of Defense-Related Gene, Oryza rufipogon Receptor-Like Protein Kinase 1

    PubMed Central

    Law, Yee-Song; Gudimella, Ranganath; Song, Beng-Kah; Ratnam, Wickneswari; Harikrishna, Jennifer Ann

    2012-01-01

    Many of the plant leucine rich repeat receptor-like kinases (LRR-RLKs) have been found to regulate signaling during plant defense processes. In this study, we selected and sequenced an LRR-RLK gene, designated as Oryza rufipogon receptor-like protein kinase 1 (OrufRPK1), located within yield QTL yld1.1 from the wild rice Oryza rufipogon (accession IRGC105491). A 2055 bp coding region and two exons were identified. Southern blotting determined OrufRPK1 to be a single copy gene. Sequence comparison with cultivated rice orthologs (OsI219RPK1, OsI9311RPK1 and OsJNipponRPK1, respectively derived from O. sativa ssp. indica cv. MR219, O. sativa ssp. indica cv. 9311 and O. sativa ssp. japonica cv. Nipponbare) revealed the presence of 12 single nucleotide polymorphisms (SNPs) with five non-synonymous substitutions, and 23 insertion/deletion sites. The biological role of OrufRPK1 as a defense-related LRR-RLK is proposed on the basis of cDNA sequence characterization, domain subfamily classification, structural prediction of extracellular domains, cluster analysis and comparative gene expression. PMID:22942769

  7. Design of Protein Multi-specificity Using an Independent Sequence Search Reduces the Barrier to Low Energy Sequences

    PubMed Central

    Sevy, Alexander M.; Jacobs, Tim M.; Crowe, James E.; Meiler, Jens

    2015-01-01

    Computational protein design has found great success in engineering proteins for thermodynamic stability, binding specificity, or enzymatic activity in a ‘single state’ design (SSD) paradigm. Multi-specificity design (MSD), on the other hand, involves considering the stability of multiple protein states simultaneously. We have developed a novel MSD algorithm, which we refer to as REstrained CONvergence in multi-specificity design (RECON). The algorithm allows each state to adopt its own sequence throughout the design process rather than enforcing a single sequence on all states. Convergence to a single sequence is encouraged through an incrementally increasing convergence restraint for corresponding positions. Compared to MSD algorithms that enforce (constrain) an identical sequence on all states the energy landscape is simplified, which accelerates the search drastically. As a result, RECON can readily be used in simulations with a flexible protein backbone. We have benchmarked RECON on two design tasks. First, we designed antibodies derived from a common germline gene against their diverse targets to assess recovery of the germline, polyspecific sequence. Second, we design “promiscuous”, polyspecific proteins against all binding partners and measure recovery of the native sequence. We show that RECON is able to efficiently recover native-like, biologically relevant sequences in this diverse set of protein complexes. PMID:26147100

  8. Molecular characteristics and efficacy of 16D10 siRNAs in inhibiting root-knot nematode infection in transgenic grape hairy roots.

    PubMed

    Yang, Yingzhen; Jittayasothorn, Yingyos; Chronis, Demosthenis; Wang, Xiaohong; Cousins, Peter; Zhong, Gan-Yuan

    2013-01-01

    Root-knot nematodes (RKNs) infect many annual and perennial crops and are the most devastating soil-borne pests in vineyards. To develop a biotech-based solution for controlling RKNs in grapes, we evaluated the efficacy of plant-derived RNA interference (RNAi) silencing of a conserved RKN effector gene, 16D10, for nematode resistance in transgenic grape hairy roots. Two hairpin-based silencing constructs, containing a stem sequence of 42 bp (pART27-42) or 271 bp (pART27-271) of the 16D10 gene, were transformed into grape hairy roots and compared for their small interfering RNA (siRNA) production and efficacy on suppression of nematode infection. Transgenic hairy root lines carrying either of the two RNAi constructs showed less susceptibility to nematode infection compared with control. Small RNA libraries from four pART27-42 and two pART27-271 hairy root lines were sequenced using an Illumina sequencing technology. The pART27-42 lines produced a hundred times more 16D10-specific siRNAs than the pART27-271 lines. On average the 16D10 siRNA population had higher GC content than the 16D10 stem sequences in the RNAi constructs, supporting the previous observation that plant dicer-like enzymes prefer GC-rich sequences as substrates for siRNA production. The stems of the 16D10 RNAi constructs were not equally processed into siRNAs. Several hot spots for siRNA production were found in similar positions of the hairpin stems in pART27-42 and pART27-271. Interestingly, stem sequences at the loop terminus produced more siRNAs than those at the stem base. Furthermore, the relative abundance of guide and passenger single-stranded RNAs from putative siRNA duplexes was largely correlated with their 5' end thermodynamic strength. This study demonstrated the feasibility of using a plant-derived RNAi approach for generation of novel nematode resistance in grapes and revealed several interesting molecular characteristics of transgene siRNAs important for optimizing plant RNAi constructs.

  9. Molecular Characteristics and Efficacy of 16D10 siRNAs in Inhibiting Root-Knot Nematode Infection in Transgenic Grape Hairy Roots

    PubMed Central

    Chronis, Demosthenis; Wang, Xiaohong; Cousins, Peter; Zhong, Gan-Yuan

    2013-01-01

    Root-knot nematodes (RKNs) infect many annual and perennial crops and are the most devastating soil-borne pests in vineyards. To develop a biotech-based solution for controlling RKNs in grapes, we evaluated the efficacy of plant-derived RNA interference (RNAi) silencing of a conserved RKN effector gene, 16D10, for nematode resistance in transgenic grape hairy roots. Two hairpin-based silencing constructs, containing a stem sequence of 42 bp (pART27-42) or 271 bp (pART27-271) of the 16D10 gene, were transformed into grape hairy roots and compared for their small interfering RNA (siRNA) production and efficacy on suppression of nematode infection. Transgenic hairy root lines carrying either of the two RNAi constructs showed less susceptibility to nematode infection compared with control. Small RNA libraries from four pART27-42 and two pART27-271 hairy root lines were sequenced using an Illumina sequencing technology. The pART27-42 lines produced a hundred times more 16D10-specific siRNAs than the pART27-271 lines. On average the 16D10 siRNA population had higher GC content than the 16D10 stem sequences in the RNAi constructs, supporting the previous observation that plant dicer-like enzymes prefer GC-rich sequences as substrates for siRNA production. The stems of the 16D10 RNAi constructs were not equally processed into siRNAs. Several hot spots for siRNA production were found in similar positions of the hairpin stems in pART27-42 and pART27-271. Interestingly, stem sequences at the loop terminus produced more siRNAs than those at the stem base. Furthermore, the relative abundance of guide and passenger single-stranded RNAs from putative siRNA duplexes was largely correlated with their 5′ end thermodynamic strength. This study demonstrated the feasibility of using a plant-derived RNAi approach for generation of novel nematode resistance in grapes and revealed several interesting molecular characteristics of transgene siRNAs important for optimizing plant RNAi constructs. PMID:23874962

  10. Dissecting genomic hotspots underlying seed protein, oil, and sucrose content in an interspecific mapping population of soybean using high-density linkage mapping.

    PubMed

    Patil, Gunvant; Vuong, Tri D; Kale, Sandip; Valliyodan, Babu; Deshmukh, Rupesh; Zhu, Chengsong; Wu, Xiaolei; Bai, Yonghe; Yungbluth, Dennis; Lu, Fang; Kumpatla, Siva; Shannon, J Grover; Varshney, Rajeev K; Nguyen, Henry T

    2018-04-04

    The cultivated [Glycine max (L) Merr.] and wild [Glycine soja Siebold & Zucc.] soybean species comprise wide variation in seed composition traits. Compared to wild soybean, cultivated soybean contains low protein, high oil, and high sucrose. In this study, an interspecific population was derived from a cross between G. max (Williams 82) and G. soja (PI 483460B). This recombinant inbred line (RIL) population of 188 lines was sequenced at 0.3× depth. Based on 91 342 single nucleotide polymorphisms (SNPs), recombination events in RILs were defined, and a high-resolution bin map was developed (4070 bins). In addition to bin mapping, quantitative trait loci (QTL) analysis for protein, oil, and sucrose was performed using 3343 polymorphic SNPs (3K-SNP), derived from the Illumina Infinium BeadChip sequencing platform. The QTL regions from both platforms were compared, and a significant concordance was observed between bin and 3K-SNP markers. Importantly, the bin map derived from next-generation sequencing technology enhanced mapping resolution (from 1325 to 50 Kb). A total of five, nine, and four QTLs were identified for protein, oil, and sucrose content, respectively, and some of the QTLs coincided with soybean domestication-related genomic loci. The major QTL for protein and oil were mapped on Chr. 20 (qPro_20) and suggested a negative correlation between oil and protein. In terms of sucrose content, a novel and major QTL was identified on Chr. 8 (qSuc_08) that harbours putative genes involved in sugar transport. In addition, genome-wide association using 91 342 SNPs confirmed the genomic loci derived from QTL mapping. A QTL-based haplotype using whole-genome resequencing of 106 diverse soybean lines identified unique allelic variation in wild soybean that could be utilized to widen the genetic base in cultivated soybean. © 2018 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.

  11. The short interspersed repetitive element of Trypanosoma cruzi, SIRE, is part of VIPER, an unusual retroelement related to long terminal repeat retrotransposons

    PubMed Central

    Vázquez, Martín; Ben-Dov, Claudia; Lorenzi, Hernan; Moore, Troy; Schijman, Alejandro; Levin, Mariano J.

    2000-01-01

    The short interspersed repetitive element (SIRE) of Trypanosoma cruzi was first detected when comparing the sequences of loci that encode the TcP2β genes. It is present in about 1,500–3,000 copies per genome, depending on the strain, and it is distributed in all chromosomes. An initial analysis of SIRE sequences from 21 genomic fragments allowed us to derive a consensus nucleotide sequence and structure for the element, consisting of three regions (I, II, and III) each harboring distinctive features. Analysis of 158 transcribed SIREs demonstrates that the consensus is highly conserved. The sequences of 51 cDNAs show that SIRE is included in the 3′ end of several mRNAs, always transcribed from the sense strand, contributing the polyadenylation site in 63% of the cases. This study led to the characterization of VIPER (vestigial interposed retroelement), a 2,326-bp-long unusual retroelement. VIPER's 5′ end is formed by the first 182 bp of SIRE, whereas its 3′ end is formed by the last 220 bp of the element. Both SIRE moieties are connected by a 1,924-bp-long fragment that carries a unique ORF encoding a complete reverse transcriptase-RNase H gene whose 15 C-terminal amino acids derive from codons specified by SIRE's region II. The amino acid sequence of VIPER's reverse transcriptase-RNase H shares significant homology to that of long terminal repeat retrotransposons. The fact that SIRE and VIPER sequences are found only in the T. cruzi genome may be of relevance for studies concerning the evolution and the genome flexibility of this protozoan parasite. PMID:10688909

  12. Genetic characterization of Babesia and Theileria parasites in water buffaloes in Sri Lanka.

    PubMed

    Sivakumar, Thillaiampalam; Tattiyapong, Muncharee; Fukushi, Shintaro; Hayashida, Kyoko; Kothalawala, Hemal; Silva, Seekkuge Susil Priyantha; Vimalakumar, Singarayar Caniciyas; Kanagaratnam, Ratnam; Meewewa, Asela Sanjeewa; Suthaharan, Kalpana; Puvirajan, Thamotharampillai; de Silva, Weligodage Kumarawansa; Igarashi, Ikuo; Yokoyama, Naoaki

    2014-02-24

    Water buffaloes are thought to be the reservoir hosts for several hemoprotozoan parasites that infect cattle. In the present study, we surveyed Sri Lankan bred water buffaloes for infections with Babesia bovis, Babesia bigemina, Theileria annulata, and Theileria orientalis using parasite-specific PCR assays. When 320 blood-derived DNA samples from water buffaloes reared in three different districts (Polonnaruwa, Mannar, and Mullaitivu) of Sri Lanka were PCR screened, B. bovis, B. bigemina, and T. orientalis were detected. While T. orientalis was the predominant parasite (82.5%), low PCR-positive rates were observed for B. bovis (1.9%) and B. bigemina (1.6%). Amplicons of the gene sequences of the Rhoptry Associated Protein-1 (RAP-1) of B. bovis, the Apical Membrane Antigen-1 (AMA-1) of B. bigemina, and the Major Piroplasm Surface Protein (MPSP) of T. orientalis were compared with those characterized previously in Sri Lankan cattle. While the B. bigemina AMA-1 sequences from water buffaloes shared high identity values with those from cattle, B. bovis RAP-1 sequences from water buffaloes diverged genetically from those of cattle. For T. orientalis, none of the MPSP sequence types reported previously in Sri Lankan cattle (types 1, 3, 5, and 7) were detected in the water buffaloes, and the MPSP sequences analyzed in the present study belonged to types N1 or N2. In summary, in addition to reporting the first PCR-based survey of Babesia and Theileria parasites in water buffaloes in Sri Lanka, the present study found that the predominant variants of water buffalo-derived B. bovis RAP-1 and T. orientalis MPSP sequences were different from those previously described from cattle in this country. Copyright © 2013 Elsevier B.V. All rights reserved.

  13. Proposal of Vespertiliibacter pulmonis gen. nov., sp. nov. and two genomospecies as new members of the family Pasteurellaceae isolated from European bats.

    PubMed

    Mühldorfer, Kristin; Speck, Stephanie; Wibbelt, Gudrun

    2014-07-01

    Five bacterial strains isolated from bats of the family Vespertilionidae were characterized by phenotypic tests and multilocus sequence analysis (MLSA) using the 16S rRNA gene and four housekeeping genes (rpoA, rpoB, infB, recN). Phylogenetic analyses of individual and combined datasets indicated that the five strains represent a monophyletic cluster within the family Pasteurellaceae. Comparison of 16S rRNA gene sequences demonstrated a high degree of similarity (98.3-99.9%) among the group of bat-derived strains, while searches in nucleotide databases indicated less than 96% sequence similarity to known members of the Pasteurellaceae. The housekeeping genes rpoA, rpoB, infB and recN provided higher resolution compared with the 16S rRNA gene and subdivided the group according to the bat species from which the strains were isolated. Three strains derived from noctule bats shared 98.6-100% sequence similarity in all four genes investigated, whereas, based on rpoB, infB and recN gene sequences, 91.8-96% similarity was observed with and between the remaining two strains isolated from a serotine bat and a pipistrelle bat, respectively. Genome relatedness as deduced from recN gene sequences correlated well with the results of MLSA and indicated that the five strains represent a new genus. Based on these results, it is proposed to classify the five strains derived from bats within Vespertiliibacter pulmonis gen. nov., sp. nov. (the type species), Vespertiliibacter genomospecies 1 and Vespertiliibacter genomospecies 2. The genus can be distinguished phenotypically from recognized genera of the Pasteurellaceae by at least three characteristics. All strains are nutritionally fastidious and require a chemically defined supplement with NAD for growth. The DNA G+C content of strain E127/08(T) is 38.2 mol%. The type strain of Vespertiliibacter pulmonis gen. nov., sp. nov. is E127/08(T) ( = CCUG 64585(T) = DSM 27238(T)). The reference strains of Vespertiliibacter genomospecies 1 and 2 are E145/08 and E157/08, respectively. © 2014 IUMS.

  14. Identification of positional candidates for bovine placental genes responsible for early embryonic death during cloning-attempted pregnancy.

    PubMed

    Yamada, Takahisa; Muramatsu, Youji; Taniguchi, Yukio; Sasaki, Yoshiyuki

    Our previous study detected 291 and 77 genes showing early embryonic death-associated elevation and reduction of expression, respectively, in the fetal placenta of cows carrying a somatic nuclear transfer-derived cloned embryo. In this study, we mapped the 10 genes showing the most significant elevation and the 10 genes showing the most significant reduction, using somatic cell hybrids and the bovine draft genome sequence. We then compared the mapped positions of these genes with the genomic locations of bovine quantitative trait loci for still-birth and/or abortion. Among the mapped genes, peptidylglycine alpha-amidating monooxygenase (PAM), spectrin, beta, nonerythrocytic 1 (SPTBN1), and an unknown novel gene containing the AU277832 expressed sequence tag were intriguing, in that their mapped positions were consistent with the genomic locations of bovine still-birth and/or abortion quantitative trait loci. They were thus identified as positional candidates for bovine placental genes responsible for early embryonic death during pregnancy attempted by somatic nuclear transfer-derived cloning.

  15. RhoA Regulation of Cardiomyocyte Differentiation

    PubMed Central

    Kaarbø, Mari; Crane, Denis I.; Murrell, Wayne G.

    2013-01-01

    Earlier findings from our laboratory implicated RhoA in heart developmental processes. To investigate factors that potentially regulate RhoA expression, RhoA gene organisation and promoter activity were analysed. Comparative analysis indicated strict conservation of both gene organisation and coding sequence of the chick, mouse, and human RhoA genes. Bioinformatics analysis of the derived promoter region of mouse RhoA identified putative consensus sequence binding sites for several transcription factors involved in heart formation and organogenesis generally. Using luciferase reporter assays, RhoA promoter activity was shown to increase in mouse-derived P19CL6 cells that were induced to differentiate into cardiomyocytes. Overexpression of a dominant negative mutant of mouse RhoA (mRhoAN19) blocked this cardiomyocyte differentiation of P19CL6 cells and led to the accumulation of the cardiac transcription factors SRF and GATA4 and the early cardiac marker cardiac α-actin. Taken together, these findings indicate a fundamental role for RhoA in the differentiation of cardiomyocytes. PMID:23935420

  16. Humanization of Antibodies Using Heavy Chain Complementarity-determining Region 3 Grafting Coupled with in Vitro Somatic Hypermutation*

    PubMed Central

    Bowers, Peter M.; Neben, Tamlyn Y.; Tomlinson, Geoffery L.; Dalton, Jennifer L.; Altobell, Larry; Zhang, Xue; Macomber, John L.; Wu, Betty F.; Toobian, Rachelle M.; McConnell, Audrey D.; Verdino, Petra; Chau, Betty; Horlick, Robert A.; King, David J.

    2013-01-01

    A method for simultaneous humanization and affinity maturation of monoclonal antibodies has been developed using heavy chain complementarity-determining region (CDR) 3 grafting combined with somatic hypermutation in vitro. To minimize the amount of murine antibody-derived antibody sequence used during humanization, only the CDR3 region from a murine antibody that recognizes the cytokine hβNGF was grafted into a nonhomologous human germ line V region. The resulting CDR3-grafted HC was paired with a CDR-grafted light chain, displayed on the surface of HEK293 cells, and matured using in vitro somatic hypermutation. A high affinity humanized antibody was derived that was considerably more potent than the parental antibody, possessed a low pM dissociation constant, and demonstrated potent inhibition of hβNGF activity in vitro. The resulting antibody contained half the heavy chain murine donor sequence compared with the same antibody humanized using traditional methods. PMID:23355464

  17. TESS: a geometric hashing algorithm for deriving 3D coordinate templates for searching structural databases. Application to enzyme active sites.

    PubMed Central

    Wallace, A. C.; Borkakoti, N.; Thornton, J. M.

    1997-01-01

    It is well established that sequence templates such as those in the PROSITE and PRINTS databases are powerful tools for predicting the biological function and tertiary structure for newly derived protein sequences. The number of X-ray and NMR protein structures is increasing rapidly and it is apparent that a 3D equivalent of the sequence templates is needed. Here, we describe an algorithm called TESS that automatically derives 3D templates from structures deposited in the Brookhaven Protein Data Bank. While a new sequence can be searched for sequence patterns, a new structure can be scanned against these 3D templates to identify functional sites. As examples, 3D templates are derived for enzymes with an O-His-O "catalytic triad" and for the ribonucleases and lysozymes. When these 3D templates are applied to a large data set of nonidentical proteins, several interesting hits are located. This suggests that the development of a 3D template database may help to identify the function of new protein structures, if unknown, as well as to design proteins with specific functions. PMID:9385633

  18. Reprint of: Early Behavioural Facilitation by Temporal Expectations in Complex Visual-motor Sequences.

    PubMed

    Heideman, Simone G; van Ede, Freek; Nobre, Anna C

    2018-05-24

    In daily life, temporal expectations may derive from incidental learning of recurring patterns of intervals. We investigated the incidental acquisition and utilisation of combined temporal-ordinal (spatial/effector) structure in complex visual-motor sequences using a modified version of a serial reaction time (SRT) task. In this task, not only the series of targets/responses, but also the series of intervals between subsequent targets was repeated across multiple presentations of the same sequence. Each participant completed three sessions. In the first session, only the repeating sequence was presented. During the second and third session, occasional probe blocks were presented, where a new (unlearned) spatial-temporal sequence was introduced. We first confirm that participants not only got faster over time, but that they were slower and less accurate during probe blocks, indicating that they incidentally learned the sequence structure. Having established a robust behavioural benefit induced by the repeating spatial-temporal sequence, we next addressed our central hypothesis that implicit temporal orienting (evoked by the learned temporal structure) would have the largest influence on performance for targets following short (as opposed to longer) intervals between temporally structured sequence elements, paralleling classical observations in tasks using explicit temporal cues. We found that indeed, reaction time differences between new and repeated sequences were largest for the short interval, compared to the medium and long intervals, and that this was the case, even when comparing late blocks (where the repeated sequence had been incidentally learned), to early blocks (where this sequence was still unfamiliar). 
We conclude that incidentally acquired temporal expectations that follow a sequential structure can have a robust facilitatory influence on visually-guided behavioural responses and that, like more explicit forms of temporal orienting, this effect is most pronounced for sequence elements that are expected at short inter-element intervals. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.

  19. cis-β-Bromostyrene derivatives from cinnamic acids via a tandem substitutive bromination-decarboxylation sequence.

    PubMed

    Tang, Khanh G; Kent, Greggory T; Erden, Ihsan; Wu, Weiming

    2017-10-04

    cis-β-Bromostyrene derivatives were synthesized stereospecifically from cinnamic acids through β-lactone intermediates. The synthetic sequence did not require the purification of the β-lactone intermediates, although they were found to be stable and readily purified in most cases.

  20. Characterization of circulating transfer RNA-Derived RNA fragments in cattle

    USDA-ARS's Scientific Manuscript database

    The objective was to characterize naturally occurring circulating transfer RNA-derived RNA Fragments (tRFs) in cattle. Serum from eight clinically normal adult dairy cows was collected, and small non-coding RNAs were extracted immediately after collection and sequenced by Illumina MiSeq. Sequences a...

  1. Amino acid sequence of bovine muzzle epithelial desmocollin derived from cloned cDNA: a novel subtype of desmosomal cadherins.

    PubMed

    Koch, P J; Goldschmidt, M D; Walsh, M J; Zimbelmann, R; Schmelz, M; Franke, W W

    1991-05-01

    Desmosomes are cell-type-specific intercellular junctions found in epithelium, myocardium and certain other tissues. They consist of assemblies of molecules involved in the adhesion of specific cell types and in the anchorage of cell-type-specific cytoskeletal elements, the intermediate-size filaments, to the plasma membrane. To explore the individual desmosomal components and their functions we have isolated DNA clones encoding the desmosomal glycoprotein, desmocollin, using antibodies and a cDNA expression library from bovine muzzle epithelium. The cDNA-deduced amino-acid sequence of desmocollin (presently we cannot decide to which of the two desmocollins, DC I or DC II, this clone relates) defines a polypeptide with a calculated molecular weight of 85,000, with a single candidate sequence of 24 amino acids sufficiently long for a transmembrane arrangement, and an extracellular aminoterminal portion of 561 amino acid residues, compared to a cytoplasmic part of only 176 amino acids. Amino acid sequence comparisons have revealed that desmocollin is highly homologous to members of the cadherin family of cell adhesion molecules, including the previously sequenced desmoglein, another desmosome-specific cadherin. Using riboprobes derived from cDNAs for Northern-blot analyses, we have identified an mRNA of approximately 6 kb in stratified epithelia such as muzzle epithelium and tongue mucosa but not in two epithelial cell culture lines containing desmosomes and desmoplakins. The difference may indicate drastic differences in mRNA concentration or the existence of cell-type-specific desmocollin subforms. The molecular topology of desmocollin(s) is discussed in relation to possible functions of the individual molecular domains.

  2. Partial molar volumes of proteins: amino acid side-chain contributions derived from the partial molar volumes of some tripeptides over the temperature range 10-90 degrees C.

    PubMed

    Häckel, M; Hinz, H J; Hedwig, G R

    1999-11-15

    The partial molar volumes of tripeptides of sequence glycyl-X-glycine, where X is one of the amino acids alanine, leucine, threonine, glutamine, phenylalanine, histidine, cysteine, proline, glutamic acid, and arginine, have been determined in aqueous solution over the temperature range 10-90 degrees C using differential scanning densitometry. These data, together with those reported previously, have been used to derive the partial molar volumes of the side-chains of all 20 amino acids. The side-chain volumes are critically compared with literature values derived using partial molar volumes for alternative model compounds. The new amino acid side-chain volumes, along with that for the backbone glycyl group, were used to calculate the partial specific volumes of several proteins in aqueous solution. The results obtained are compared with those observed experimentally. The new side-chain volumes have also been used to re-determine residue volume changes upon protein folding.

  3. Generation of a Maize B Centromere Minimal Map Containing the Central Core Domain.

    PubMed

    Ellis, Nathanael A; Douglas, Ryan N; Jackson, Caroline E; Birchler, James A; Dawe, R Kelly

    2015-10-28

    The maize B centromere has been used as a model for centromere epigenetics and as the basis for building artificial chromosomes. However, there are no sequence resources for this important centromere. Here we used transposon display for the centromere-specific retroelement CRM2 to identify a collection of 40 sequence tags that flank CRM2 insertion points on the B chromosome. These were confirmed to lie within the centromere by assaying deletion breakpoints from centromere misdivision derivatives (intracentromere breakages caused by centromere fission). Markers were grouped together on the basis of their association with other markers in the misdivision series and assembled into a pseudocontig containing 10.1 kb of sequence. To identify sequences that interact directly with centromere proteins, we carried out chromatin immunoprecipitation using antibodies to centromeric histone H3 (CENH3), a defining feature of functional centromeric sequences. The CENH3 chromatin immunoprecipitation map was interpreted relative to the known transmission rates of centromere misdivision derivatives to identify a centromere core domain spanning 33 markers. A subset of seven markers was mapped in additional B centromere misdivision derivatives with the use of unique primer pairs. A derivative previously shown to have no canonical centromere sequences (Telo3-3) lacks these core markers. Our results provide a molecular map of the B chromosome centromere and identify key sequences within the map that interact directly with centromeric histone H3. Copyright © 2015 Ellis et al.

  4. Sequence analysis of the pyruvylated galactan sulfate-derived oligosaccharides by negative-ion electrospray tandem mass spectrometry.

    PubMed

    Li, Na; Mao, Wenjun; Liu, Xue; Wang, Shuyao; Xia, Zheng; Cao, Sujian; Li, Lin; Zhang, Qi; Liu, Shan

    2016-10-04

    Five sulfated oligosaccharide fragments, F1-F5, were prepared from a pyruvylated galactan sulfate from the green alga Codium divaricatum, by partial depolymerization using mild acid hydrolysis and purification with gel-permeation chromatography. Negative-ion electrospray tandem mass spectrometry with collision-induced dissociation (ES-CID-MS/MS) was attempted for sequence determination of the sulfated oligosaccharides. The sequence of F1 with homogeneous disaccharide composition was first characterized to be Galp-(4SO4)-(1 → 3)-Galp by detailed nuclear magnetic resonance spectroscopic analyses. The fragmentation pattern of F1 in the product ion spectra was established on the basis of negative-ion ES-CID-MS/MS, and was then applied to sequence analysis of the other sulfated oligosaccharides. The sequences of F2 and F3 were deduced to be Galp-(4SO4)-(1 → 3)-Galp-(1 → 3)-Galp-(1 → 3)-Galp and 3,4-O-(1-carboxyethylidene)-Galp-(6SO4)-(1 → 3)-Galp, respectively. The sequences of major fragments in F4 and F5 were also deduced. The investigation demonstrated that negative-ion ES-CID-MS/MS was an efficient method for the sequence analysis of the pyruvylated galactan sulfate-derived oligosaccharides, revealing the patterns of substitution and glycosidic linkages. The pyruvylated galactan sulfate-derived oligosaccharides were novel sulfated oligosaccharides different from other algal polysaccharide-derived oligosaccharides. Copyright © 2016 Elsevier Ltd. All rights reserved.

  5. The complete mitochondrial genome of the common sea slater, Ligia oceanica (Crustacea, Isopoda) bears a novel gene order and unusual control region features

    PubMed Central

    Kilpert, Fabian; Podsiadlowski, Lars

    2006-01-01

    Background Sequence data and other characters from mitochondrial genomes (gene translocations, secondary structure of RNA molecules) are useful in phylogenetic studies among metazoan animals from population to phylum level. Moreover, the comparison of complete mitochondrial sequences gives valuable information about the evolution of small genomes, e.g. about different mechanisms of gene translocation, gene duplication and gene loss, or concerning nucleotide frequency biases. The Peracarida (gammarids, isopods, etc.) comprise about 21,000 species of crustaceans, living in many environments from deep sea floor to arid terrestrial habitats. Ligia oceanica is a terrestrial isopod living at rocky seashores of the European North Sea and Atlantic coastlines. Results The study reveals the first complete mitochondrial DNA sequence from a peracarid crustacean. The mitochondrial genome of Ligia oceanica is a circular double-stranded DNA molecule, with a size of 15,289 bp. It shows several changes in mitochondrial gene order compared to other crustacean species. An overview of the mitochondrial gene order of all crustacean taxa yet sequenced is also presented. The largest non-coding part (the putative mitochondrial control region) of the mitochondrial genome of Ligia oceanica is unexpectedly not AT-rich compared to the remainder of the genome. It bears two repeat regions (4× 10 bp and 3× 64 bp), and a GC-rich hairpin-like secondary structure. Some of the transfer RNAs show secondary structures which deviate from the usual cloverleaf pattern. While some tRNA genes are putative targets for RNA editing, trnR could not be localized at all. Conclusion Gene order is not conserved among Peracarida, not even among isopods.
The two isopod species Ligia oceanica and Idotea baltica show a similarly derived gene order, compared to the arthropod ground pattern and to the amphipod Parhyale hawaiiensis, suggesting that most of the translocation events were already present in the last common ancestor of these isopods. Beyond that, the positions of three tRNA genes differ in the two isopod species. Strand bias in nucleotide frequency is reversed in both isopod species compared to other Malacostraca. This is probably due to a reversal of the replication origin, which is further supported by the fact that the hairpin structure typically found in the control region shows a reversed orientation in the isopod species, compared to other crustaceans. PMID:16987408

  6. A fast sequence assembly method based on compressed data structures.

    PubMed

    Liang, Peifeng; Zhang, Yancong; Lin, Kui; Hu, Jinglu

    2014-01-01

    Assembling a large genome from next generation sequencing reads requires large computer memory and a long execution time. To reduce these requirements, a memory- and time-efficient assembler called FMJ-Assembler is presented, which applies the FM-index in JR-Assembler; FM stands for the FMR-index, derived from the FM-index and BWT, and J stands for jumping extension. The FMJ-Assembler uses an expanded FM-index and BWT to compress read data and save memory, and its jumping extension method reduces CPU time. An extensive comparison of the FMJ-Assembler with current assemblers shows that it achieves better or comparable overall assembly quality while requiring less memory and less CPU time. These advantages indicate that the FMJ-Assembler will be an efficient assembly method for next generation sequencing technology.
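
As an illustration of the data structures named in this abstract, the following is a minimal textbook sketch of the Burrows-Wheeler transform (BWT) and FM-index backward search. It is not the FMJ-Assembler's expanded FMR-index or its jumping extension; the function names `bwt` and `fm_count` are hypothetical, and the naive rotation sort and linear-scan rank are only suitable for short strings.

```python
from collections import Counter


def bwt(text):
    """Burrows-Wheeler transform via sorted rotations.

    Assumes '$' does not occur in text; it is appended as the
    end-of-string sentinel and sorts before all letters.
    """
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def fm_count(bwt_str, pattern):
    """Count occurrences of pattern in the original text by backward search."""
    # first[c]: number of characters in the text strictly smaller than c
    counts = Counter(bwt_str)
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]

    def occ(c, i):
        # Occurrences of c in bwt_str[:i]; a real index precomputes this.
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # current suffix-array interval [lo, hi)
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ(c, lo)
        hi = first[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = bwt("banana")
    print(index, fm_count(index, "ana"))
```

The pattern is matched from its last character to its first, narrowing an interval of sorted suffixes at each step, so a query never needs the original reads once the transform has been built.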

  7. Mosaic HIV-1 vaccines expand the breadth and depth of cellular immune responses in rhesus monkeys.

    PubMed

    Barouch, Dan H; O'Brien, Kara L; Simmons, Nathaniel L; King, Sharon L; Abbink, Peter; Maxfield, Lori F; Sun, Ying-Hua; La Porte, Annalena; Riggs, Ambryice M; Lynch, Diana M; Clark, Sarah L; Backus, Katherine; Perry, James R; Seaman, Michael S; Carville, Angela; Mansfield, Keith G; Szinger, James J; Fischer, Will; Muldoon, Mark; Korber, Bette

    2010-03-01

    The worldwide diversity of HIV-1 presents an unprecedented challenge for vaccine development. Antigens derived from natural HIV-1 sequences have elicited only a limited breadth of cellular immune responses in nonhuman primate studies and clinical trials to date. Polyvalent 'mosaic' antigens, in contrast, are designed to optimize cellular immunologic coverage of global HIV-1 sequence diversity. Here we show that mosaic HIV-1 Gag, Pol and Env antigens expressed by recombinant, replication-incompetent adenovirus serotype 26 vectors markedly augmented both the breadth and depth without compromising the magnitude of antigen-specific T lymphocyte responses as compared with consensus or natural sequence HIV-1 antigens in rhesus monkeys. Polyvalent mosaic antigens therefore represent a promising strategy to expand cellular immunologic vaccine coverage for genetically diverse pathogens such as HIV-1.

  8. A global comparability approach for biosimilar monoclonal antibodies using LC-tandem MS based proteomics.

    PubMed

    Chen, Shun-Li; Wu, Shiaw-Lin; Huang, Li-Juan; Huang, Jia-Bao; Chen, Shu-Hui

    2013-06-01

    Liquid chromatography-tandem mass spectrometry-based proteomics for peptide mapping and sequencing was used to characterize the marketed monoclonal antibody trastuzumab and compare it with two biosimilar products, mAb A containing D359E and L361M variations at the Fc site and mAb B without variants. Complete sequence coverage (100%) including disulfide linkages, glycosylations and other commonly occurring modifications (i.e., deamidation, oxidation, dehydration and K-clipping) were identified using maps generated from multi-enzyme digestions. In addition to the targeted comparison for the relative populations of targeted modification forms, a non-targeted approach was used to globally compare ion intensities in tryptic maps. The non-targeted comparison provided an extra-dimensional view to examine any possible differences related to variants or modifications. A peptide containing the two variants in mAb A, D359E and L361M, was revealed using the non-targeted comparison of the tryptic maps. In contrast, no significant differences were observed when trastuzumab was self-compared or compared with mAb B. These results were consistent with the data derived from peptide sequencing via collision induced dissociation/electron transfer dissociation. Thus, combined targeted and non-targeted approaches using powerful mass spectrometry-based proteomic tools hold great promise for the structural characterization of biosimilar products. Copyright © 2013 Elsevier B.V. All rights reserved.

  9. Experimental and statistical post-validation of positive example EST sequences carrying peroxisome targeting signals type 1 (PTS1)

    PubMed Central

    Lingner, Thomas; Kataya, Amr R. A.; Reumann, Sigrun

    2012-01-01

    We recently developed the first algorithms specifically for plants to predict proteins carrying peroxisome targeting signals type 1 (PTS1) from genome sequences. As validated experimentally, the prediction methods are able to correctly predict unknown peroxisomal Arabidopsis proteins and to infer novel PTS1 tripeptides. The high prediction performance is primarily determined by the large number and sequence diversity of the underlying positive example sequences, which mainly derived from EST databases. However, a few constructs remained cytosolic in experimental validation studies, indicating sequencing errors in some ESTs. To identify erroneous sequences, we validated subcellular targeting of additional positive example sequences in the present study. Moreover, we analyzed the distribution of prediction scores separately for each orthologous group of PTS1 proteins, which generally resembled normal distributions with group-specific mean values. The cytosolic sequences commonly represented outliers of low prediction scores and were located at the very tail of a fitted normal distribution. Three statistical methods for identifying outliers were compared in terms of sensitivity and specificity. Their combined application allows elimination of erroneous ESTs from positive example data sets. This new post-validation method will further improve the prediction accuracy of both PTS1 and PTS2 protein prediction models for plants, fungi, and mammals. PMID:22415050
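
The idea of flagging scores in the low tail of a fitted normal distribution can be sketched as follows. The abstract does not name the three outlier methods that were compared, so this is only one simple possibility under stated assumptions: the function name `low_tail_outliers`, the threshold `alpha`, and the example scores are all hypothetical.

```python
from statistics import NormalDist, mean, stdev


def low_tail_outliers(scores, alpha=0.01):
    """Return scores whose fitted-normal lower-tail probability is below alpha.

    A normal distribution is fitted to one group of prediction scores
    (sample mean and sample standard deviation); a score is flagged when
    the fitted cumulative probability at that score is smaller than alpha.
    """
    mu, sigma = mean(scores), stdev(scores)
    if sigma == 0:
        return []  # all scores identical, nothing can be an outlier
    fitted = NormalDist(mu, sigma)
    return [s for s in scores if fitted.cdf(s) < alpha]


if __name__ == "__main__":
    # Hypothetical scores for one orthologous group, with one low value.
    group = [0.90, 0.92, 0.88, 0.91, 0.89, 0.93, 0.87, 0.90, 0.91, 0.20]
    print(low_tail_outliers(group))
```

Because the suspect value is included when the distribution is fitted, it inflates the standard deviation; a more robust variant would fit on the median and a robust scale estimate instead.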

  10. Experimental and statistical post-validation of positive example EST sequences carrying peroxisome targeting signals type 1 (PTS1).

    PubMed

    Lingner, Thomas; Kataya, Amr R A; Reumann, Sigrun

    2012-02-01

    We recently developed the first algorithms specifically for plants to predict proteins carrying peroxisome targeting signals type 1 (PTS1) from genome sequences. As validated experimentally, the prediction methods are able to correctly predict unknown peroxisomal Arabidopsis proteins and to infer novel PTS1 tripeptides. The high prediction performance is primarily determined by the large number and sequence diversity of the underlying positive example sequences, which mainly derived from EST databases. However, a few constructs remained cytosolic in experimental validation studies, indicating sequencing errors in some ESTs. To identify erroneous sequences, we validated subcellular targeting of additional positive example sequences in the present study. Moreover, we analyzed the distribution of prediction scores separately for each orthologous group of PTS1 proteins, which generally resembled normal distributions with group-specific mean values. The cytosolic sequences commonly represented outliers of low prediction scores and were located at the very tail of a fitted normal distribution. Three statistical methods for identifying outliers were compared in terms of sensitivity and specificity. Their combined application allows elimination of erroneous ESTs from positive example data sets. This new post-validation method will further improve the prediction accuracy of both PTS1 and PTS2 protein prediction models for plants, fungi, and mammals.

  11. A Review of Multivariate Distributions for Count Data Derived from the Poisson Distribution

    PubMed Central

    Inouye, David; Yang, Eunho; Allen, Genevera; Ravikumar, Pradeep

    2017-01-01

    The Poisson distribution has been widely studied and used for modeling univariate count-valued data. Multivariate generalizations of the Poisson distribution that permit dependencies, however, have been far less popular. Yet, real-world high-dimensional count-valued data found in word counts, genomics, and crime statistics, for example, exhibit rich dependencies, and motivate the need for multivariate distributions that can appropriately model this data. We review multivariate distributions derived from the univariate Poisson, categorizing these models into three main classes: 1) where the marginal distributions are Poisson, 2) where the joint distribution is a mixture of independent multivariate Poisson distributions, and 3) where the node-conditional distributions are derived from the Poisson. We discuss the development of multiple instances of these classes and compare the models in terms of interpretability and theory. Then, we empirically compare multiple models from each class on three real-world datasets that have varying data characteristics from different domains, namely traffic accident data, biological next generation sequencing data, and text data. These empirical experiments develop intuition about the comparative advantages and disadvantages of each class of multivariate distribution that was derived from the Poisson. Finally, we suggest new research directions as explored in the subsequent discussion section. PMID:28983398

  12. Functional noncoding sequences derived from SINEs in the mammalian genome.

    PubMed

    Nishihara, Hidenori; Smit, Arian F A; Okada, Norihiro

    2006-07-01

    Recent comparative analyses of mammalian sequences have revealed that a large number of nonprotein-coding genomic regions are under strong selective constraint. Here, we report that some of these loci have been derived from a newly defined family of ancient SINEs (short interspersed repetitive elements). This is a surprising result, as SINEs and other transposable elements are commonly thought to be genomic parasites. We named the ancient SINE family AmnSINE1, for Amniota SINE1, because we found it to be present in mammals as well as in birds, and some copies predate the mammalian-bird split 310 million years ago (Mya). AmnSINE1 has a chimeric structure of a 5S rRNA and a tRNA-derived SINE, and is related to five tRNA-derived SINE families that we characterized here in the coelacanth, dogfish shark, hagfish, and amphioxus genomes. All of the newly described SINE families have a common central domain that is also shared by zebrafish SINE3, and we collectively name them the DeuSINE (Deuterostomia SINE) superfamily. Notably, of the approximately 1000 still identifiable copies of AmnSINE1 in the human genome, 105 correspond to loci phylogenetically highly conserved among mammalian orthologs. The conservation is strongest over the central domain. Thus, AmnSINE1 appears to be the best example of a transposable element of which a significant fraction of the copies have acquired genomic functionality.

  13. Morphological and molecular characterization of a marine fish trypanosome from South Africa, including its development in a leech vector

    PubMed Central

    2014-01-01

    Background Trypanosomes are ubiquitous blood parasites of marine and freshwater fishes, typically transmitted by aquatic leeches. Phylogenetic studies have been dominated by examples derived from freshwater fishes, with few marine representatives. Furthermore, life cycle studies on marine fish trypanosomes have focused on those of the northern hemisphere. In this investigation, we have examined the life cycle and molecular taxonomy of a marine fish trypanosome from South Africa. Methods To locate trypanosome stages, leeches were removed from fishes captured on the west and south coasts of South Africa, and fish blood films and leech squashes were Giemsa-stained and screened; leeches were also examined histologically. To determine whether trypanosome stages in fishes and leeches were of the same genotype, DNA was extracted from Giemsa-stained fish blood films and leech squashes, and from fish whole blood. Fragments of the 18S rRNA gene were amplified by PCR using trypanosome-specific primers and sequenced. Resulting sequence data were compared with each other and with published trypanosome 18S rDNA sequences, and used for phylogenetic analysis. Results Trypanosomes were detected in blood films from fishes of the families Clinidae, Blenniidae and Gobiidae. The flagellates ranged in size and staining properties within the films and across fish hosts. In squashes and histological sections of adult and juvenile leeches, identified as Zeylanicobdella arugamensis, trypanosome developmental stages were predominantly slender epimastigotes. Sequence data showed that trypanosomes derived from fishes were identical, irrespective of whether they were small or large forms; sequences derived largely from leech epimastigotes were also identical to those obtained from fish trypanosomes. Fish and leech trypanosome sequences fell into a marine fish aquatic clade, and aligned most closely with two trypanosome sequences from marine fishes off Norway. 
Conclusions Combined morphological and molecular methods indicate that the trypanosomes examined here represent a single pleomorphic species, rather than the three species described originally. This species is identified as Trypanosoma nudigobii Fantham, 1919 with the leech Z. arugamensis as its vector, and T. capigobii Fantham, 1919 and T. blenniclini Fantham, 1930 are regarded as junior synonyms of the species. Phylogenetic analysis establishes its affinity with marine fish trypanosomes off Norway. PMID:24460725

  14. The testes transcriptome derived from the New World Screwworm, Cochliomyia hominivorax SRA

    USDA-ARS's Scientific Manuscript database

    In a collaboration with National Center for Genome Resources researchers, we sequenced and assembled the testes transcriptome derived from the Pacora, Panama, production plant strain J06 of the New World Screwworm, Cochliomyia hominivorax. This sequencing project produced 72,750,822 raw reads and th...

  15. Aligning a New Reference Genetic Map of Lupinus angustifolius with the Genome Sequence of the Model Legume, Lotus japonicus

    PubMed Central

    Nelson, Matthew N.; Moolhuijzen, Paula M.; Boersma, Jeffrey G.; Chudy, Magdalena; Lesniewska, Karolina; Bellgard, Matthew; Oliver, Richard P.; Święcicki, Wojciech; Wolko, Bogdan; Cowling, Wallace A.; Ellwood, Simon R.

    2010-01-01

    We have developed a dense reference genetic map of Lupinus angustifolius (2n = 40) based on a set of 106 publicly available recombinant inbred lines derived from a cross between domesticated and wild parental lines. The map comprised 1090 loci in 20 linkage groups and three small clusters, drawing together data from several previous mapping publications plus almost 200 new markers, of which 63 were gene-based markers. A total of 171 mainly gene-based, sequence-tagged site loci served as bridging points for comparing the Lu. angustifolius genome with the genome sequence of the model legume, Lotus japonicus via BLASTn homology searching. Comparative analysis indicated that the genomes of Lu. angustifolius and Lo. japonicus are highly diverged structurally but with significant regions of conserved synteny including the region of the Lu. angustifolius genome containing the pod-shatter resistance gene, lentus. We discuss the potential of synteny analysis for identifying candidate genes for domestication traits in Lu. angustifolius and in improving our understanding of Fabaceae genome evolution. PMID:20133394

  16. The Mouse Genomes Project: a repository of inbred laboratory mouse strain genomes.

    PubMed

    Adams, David J; Doran, Anthony G; Lilue, Jingtao; Keane, Thomas M

    2015-10-01

    The Mouse Genomes Project was initiated in 2009 with the goal of using next-generation sequencing technologies to catalogue molecular variation in the common laboratory mouse strains, and a selected set of wild-derived inbred strains. The initial sequencing and survey of sequence variation in 17 inbred strains was completed in 2011 and included a comprehensive catalogue of single nucleotide polymorphisms, short insertion/deletions, larger structural variants including their fine scale architecture and landscape of transposable element variation, and genomic sites subject to post-transcriptional alteration of RNA. From this beginning, the resource has expanded significantly to include 36 fully sequenced inbred laboratory mouse strains, a refined and updated data processing pipeline, and new variation querying and data visualisation tools which are available on the project's website ( http://www.sanger.ac.uk/resources/mouse/genomes/ ). The focus of the project is now the completion of de novo assembled chromosome sequences and strain-specific gene structures for the core strains. We discuss how the assembled chromosomes will power comparative analysis, data access tools and future directions of mouse genetics.

  17. Prediction of Spontaneous Protein Deamidation from Sequence-Derived Secondary Structure and Intrinsic Disorder.

    PubMed

    Lorenzo, J Ramiro; Alonso, Leonardo G; Sánchez, Ignacio E

    2015-01-01

    Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage "Protein and nucleic acid structure and sequence analysis".

  18. FMRI investigation of cross-modal interactions in beat perception: Audition primes vision, but not vice versa

    PubMed Central

    Grahn, Jessica A.; Henry, Molly J.; McAuley, J. Devin

    2011-01-01

    How we measure time and integrate temporal cues from different sensory modalities are fundamental questions in neuroscience. Sensitivity to a “beat” (such as that routinely perceived in music) differs substantially between auditory and visual modalities. Here we examined beat sensitivity in each modality, and examined cross-modal influences, using functional magnetic resonance imaging (fMRI) to characterize brain activity during perception of auditory and visual rhythms. In separate fMRI sessions, participants listened to auditory sequences or watched visual sequences. The order of auditory and visual sequence presentation was counterbalanced so that cross-modal order effects could be investigated. Participants judged whether sequences were speeding up or slowing down, and the pattern of tempo judgments was used to derive a measure of sensitivity to an implied beat. As expected, participants were less sensitive to an implied beat in visual sequences than in auditory sequences. However, visual sequences produced a stronger sense of beat when preceded by auditory sequences with identical temporal structure. Moreover, increases in brain activity were observed in the bilateral putamen for visual sequences preceded by auditory sequences when compared to visual sequences without prior auditory exposure. No such order-dependent differences (behavioral or neural) were found for the auditory sequences. The results provide further evidence for the role of the basal ganglia in internal generation of the beat and suggest that an internal auditory rhythm representation may be activated during visual rhythm perception. PMID:20858544

  19. An accurate algorithm for the detection of DNA fragments from dilution pool sequencing experiments.

    PubMed

    Bansal, Vikas

    2018-01-01

    The short read lengths of current high-throughput sequencing technologies limit the ability to recover long-range haplotype information. Dilution pool methods for preparing DNA sequencing libraries from high molecular weight DNA fragments enable the recovery of long DNA fragments from short sequence reads. These approaches require computational methods for identifying the DNA fragments using aligned sequence reads and assembling the fragments into long haplotypes. Although a number of computational methods have been developed for haplotype assembly, the problem of identifying DNA fragments from dilution pool sequence data has not received much attention. We formulate the problem of detecting DNA fragments from dilution pool sequencing experiments as a genome segmentation problem and develop an algorithm that uses dynamic programming to optimize a likelihood function derived from a generative model for the sequence reads. This algorithm uses an iterative approach to automatically infer the mean background read depth and the number of fragments in each pool. Using simulated data, we demonstrate that our method, FragmentCut, has 25-30% greater sensitivity compared with an HMM based method for fragment detection and can also detect overlapping fragments. On a whole-genome human fosmid pool dataset, the haplotypes assembled using the fragments identified by FragmentCut had greater N50 length, 16.2% lower switch error rate and 35.8% lower mismatch error rate compared with two existing methods. We further demonstrate the greater accuracy of our method using two additional dilution pool datasets. FragmentCut is available from https://bansal-lab.github.io/software/FragmentCut.

  20. RNA Editing in Plant Mitochondria

    NASA Astrophysics Data System (ADS)

    Hiesel, Rudolf; Wissinger, Bernd; Schuster, Wolfgang; Brennicke, Axel

    1989-12-01

    Comparative sequence analysis of genomic and complementary DNA clones from several mitochondrial genes in the higher plant Oenothera revealed nucleotide sequence divergences between the genomic and the messenger RNA-derived sequences. These sequence alterations could be most easily explained by specific post-transcriptional nucleotide modifications. Most of the nucleotide exchanges in coding regions lead to altered codons in the mRNA that specify amino acids better conserved in evolution than those encoded by the genomic DNA. Several instances show that the genomic arginine codon CGG is edited in the mRNA to the tryptophan codon TGG in amino acid positions that are highly conserved as tryptophan in the homologous proteins of other species. This editing suggests that the standard genetic code is used in plant mitochondria and resolves the frequent coincidence of CGG codons and tryptophan in different plant species. The apparently frequent and non-species-specific equivalency of CGG and TGG codons in particular suggests that RNA editing is a common feature of all higher plant mitochondria.
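
    The CGG-to-TGG example in the abstract above can be illustrated with a minimal sketch. It assumes the standard genetic code and uses the DNA alphabet (T in place of U), as in cDNA sequence; the names CODON_TABLE and edit_first_base are hypothetical and only the codons needed here are listed.

```python
# Tiny excerpt of the standard genetic code (DNA alphabet).
# Only the codons needed for this illustration are included.
CODON_TABLE = {
    "CGG": "Arg",  # arginine, as encoded in the genomic DNA
    "TGG": "Trp",  # tryptophan, as read from the mRNA-derived cDNA
}


def edit_first_base(codon: str) -> str:
    """Return the codon with a C at the first position replaced by T.

    Hypothetical helper: it models only the single C-to-T difference
    between genomic and cDNA sequence described in the abstract.
    """
    if len(codon) != 3 or codon[0] != "C":
        raise ValueError("expected a 3-letter codon starting with C")
    return "T" + codon[1:]


genomic_codon = "CGG"
edited_codon = edit_first_base(genomic_codon)
print(genomic_codon, CODON_TABLE[genomic_codon], "->",
      edited_codon, CODON_TABLE[edited_codon])
```

    Under the standard code the edited codon specifies tryptophan, which is why the abstract concludes that no deviant code is needed to explain conserved tryptophan positions.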

  1. Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications

    PubMed Central

    Harris, R. Alan; Wang, Ting; Coarfa, Cristian; Nagarajan, Raman P.; Hong, Chibo; Downey, Sara L.; Johnson, Brett E.; Fouse, Shaun D.; Delaney, Allen; Zhao, Yongjun; Olshen, Adam; Ballinger, Tracy; Zhou, Xin; Forsberg, Kevin J.; Gu, Junchen; Echipare, Lorigail; O’Geen, Henriette; Lister, Ryan; Pelizzola, Mattia; Xi, Yuanxin; Epstein, Charles B.; Bernstein, Bradley E.; Hawkins, R. David; Ren, Bing; Chung, Wen-Yu; Gu, Hongcang; Bock, Christoph; Gnirke, Andreas; Zhang, Michael Q.; Haussler, David; Ecker, Joseph; Li, Wei; Farnham, Peggy J.; Waterland, Robert A.; Meissner, Alexander; Marra, Marco A.; Hirst, Martin; Milosavljevic, Aleksandar; Costello, Joseph F.

    2010-01-01

    Sequencing-based DNA methylation profiling methods are comprehensive and, as accuracy and affordability improve, will increasingly supplant microarrays for genome-scale analyses. Here, four sequencing-based methodologies were applied to biological replicates of human embryonic stem cells to compare their CpG coverage genome-wide and in transposons, resolution, cost, concordance and its relationship with CpG density and genomic context. The two bisulfite methods reached concordance of 82% for CpG methylation levels and 99% for non-CpG cytosine methylation levels. Using binary methylation calls, two enrichment methods were 99% concordant, while regions assessed by all four methods were 97% concordant. To achieve comprehensive methylome coverage while reducing cost, an approach integrating two complementary methods was examined. The integrative methylome profile along with histone methylation, RNA, and SNP profiles derived from the sequence reads allowed genome-wide assessment of allele-specific epigenetic states, identifying most known imprinted regions and new loci with monoallelic epigenetic marks and monoallelic expression. PMID:20852635

  2. M dwarf spectra from 0.6 to 1.5 micron - A spectral sequence, model atmosphere fitting, and the temperature scale

    NASA Technical Reports Server (NTRS)

    Kirkpatrick, J. D.; Kelly, Douglas M.; Rieke, George H.; Liebert, James; Allard, France; Wehrse, Rainer

    1993-01-01

    Red/infrared (0.6-1.5 micron) spectra are presented for a sequence of well-studied M dwarfs ranging from M2 through M9. A variety of temperature-sensitive features useful for spectral classification are identified. Using these features, the spectral data are compared to recent theoretical models, from which a temperature scale is assigned. The red portion of the model spectra provides reasonably good fits for dwarfs earlier than M6. For later types, the infrared region provides a more reliable fit to the observations. In each case, the wavelength region used includes the broad peak of the energy distribution. For a given spectral type, the derived temperature sequence assigns higher temperatures than have earlier studies - the difference becoming more pronounced at lower luminosities. The positions of M dwarfs on the H-R diagram are, as a result, in closer agreement with theoretical tracks of the lower main sequence.

  3. An Efficient Strategy of Screening for Pathogens in Wild-Caught Ticks and Mosquitoes by Reusing Small RNA Deep Sequencing Data

    PubMed Central

    An, Xiaoping; Fan, Hang; Ma, Maijuan; Anderson, Benjamin D.; Jiang, Jiafu; Liu, Wei; Cao, Wuchun; Tong, Yigang

    2014-01-01

    This paper explored our hypothesis that the sRNA (18∼30 bp) deep sequencing technique can be used as an efficient strategy to identify microorganisms other than viruses, such as prokaryotic and eukaryotic pathogens. In the study, the clean reads derived from the sRNA deep sequencing data of wild-caught ticks and mosquitoes were compared against the NCBI nucleotide collection (non-redundant nt database) using Blastn. The blast results were then analyzed with in-house Python scripts. An empirical formula was proposed to identify the putative pathogens. Results showed that not only viruses but also prokaryotic and eukaryotic species of interest can be screened out and were subsequently confirmed with experiments. In particular, a novel Rickettsia spp. was indicated to exist in Haemaphysalis longicornis ticks collected in Beijing. Our study demonstrated that the reuse of sRNA deep sequencing data has the potential to trace the origin of pathogens or discover novel agents of emerging/re-emerging infectious diseases. PMID:24618575

  4. Classification of Culturable Bifidobacterial Population from Colonic Samples of Wild Pigs (Sus scrofa) Based on Three Molecular Genetic Methods.

    PubMed

    Pechar, Radko; Killer, Jiří; Mekadim, Chahrazed; Geigerová, Martina; Rada, Vojtěch

    2017-11-01

    Occurrence of bifidobacteria, known as health-promoting probiotic microorganisms, in the digestive tract of wild pigs (Sus scrofa) has not been examined yet. One hundred forty-nine fructose-6-phosphate phosphoketolase positive bacterial strains were isolated from the colonic content of twenty-two wild pigs originating from four localities in Czechia. Based on the PCR-DGGE technique targeting the variable V3 region of the 16S rRNA genes, strains were initially differentiated into four groups represented by: (i) probably a new Bifidobacterium species (89 strains), (ii) B. boum/B. thermophilum/B. thermacidophilum subsp. porcinum/B. thermacidophilum subsp. thermacidophilum (sub)species (49 strains), (iii) Pseudoscardovia suis (7 strains), and (iv) B. pseudolongum subsp. globosum/B. pseudolongum subsp. pseudolongum (4 strains), respectively. Because the DGGE technique could not differentiate the representatives of thermophilic bifidobacteria and B. pseudolongum subspecies, strains were further classified by the 16S rRNA and thrS gene sequences. Primers targeting the variable regions of the latter gene were designed to be applicable in identification and phylogeny of the Bifidobacteriaceae family. The 16S rRNA-derived phylogenetic study classified members of the first group into five subgroups in a separate cluster of thermophilic bifidobacteria. Comparable results were obtained by the thrS-derived phylogenetic analysis. Remarkably, variability among thrS sequences was higher compared with 16S rRNA gene sequences. Overall, the application of molecular genetic techniques made it possible to identify a new Bifidobacterium phylotype which is predominant in the digestive tract of the examined wild pigs.

  5. A nucleotide sequence comparison of coxsackievirus B4 isolates from aquatic samples and clinical specimens.

    PubMed Central

    Hughes, M. S.; Hoey, E. M.; Coyle, P. V.

    1993-01-01

    Ten coxsackievirus B4 (CVB4) strains isolated from clinical and environmental sources in Northern Ireland in 1985-7, were compared at the nucleotide sequence level. Dideoxynucleotide sequencing of a polymerase chain reaction (PCR) amplified fragment, spanning the VP1/P2A genomic region, classified the isolates into two distinct groups or genotypes as defined by Rico-Hesse and colleagues for poliovirus type 1. Isolates within each group shared approximately 99% sequence identity at the nucleotide level whereas ≤ 86% sequence identity was shared between groups. One isolate derived from a clinical specimen in 1987 was grouped with six CVB4 isolates recovered from the aquatic environment in 1986-7. The second group comprised CVB4 isolates from clinical specimens in 1985-6. Both groups were different at the nucleotide level from the prototype strain isolated in 1950. It was concluded that the method could be used to sub-type CVB4 isolates and would be of value in epidemiological studies of CVB4. Predicted amino acid sequences revealed non-conservation of the tyrosine residue at the VP1/P2A cleavage site but were of little value in distinguishing CVB4 variants. PMID:8386098
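
    The percent sequence identity figures quoted above can be illustrated with a minimal sketch. It assumes the two sequences are already aligned to equal length and that columns containing a gap character are skipped; the abstract does not state how identity was computed, so both choices, the function name percent_identity and the toy sequences are assumptions.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent of identical bases over the ungapped columns of an alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a == gap or base_b == gap:
            continue  # skip alignment columns that contain a gap
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy sequences (not CVB4 data): 9 of 10 aligned positions match.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAT")
print(f"{identity:.1f}% identity")
```

    With such a measure, isolates could be grouped by applying a threshold between the within-group (about 99%) and between-group (at most 86%) values reported in the abstract.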

  6. Three 3D graphical representations of DNA primary sequences based on the classifications of DNA bases and their applications.

    PubMed

    Xie, Guosen; Mo, Zhongxi

    2011-01-21

    In this article, we introduce three 3D graphical representations of DNA primary sequences, which we call RY-curve, MK-curve and SW-curve, based on three classifications of the DNA bases. The advantages of our representations are that (i) these 3D curves are strictly non-degenerate and there is no loss of information when transferring a DNA sequence to its mathematical representation and (ii) the coordinates of every node on these 3D curves have clear biological implication. Two applications of these 3D curves are presented: (a) a simple formula is derived to calculate the content of the four bases (A, G, C and T) from the coordinates of nodes on the curves; and (b) a 12-component characteristic vector is constructed to compare similarity among DNA sequences from different species based on the geometrical centers of the 3D curves. As examples, we examine similarity among the coding sequences of the first exon of beta-globin gene from eleven species and validate similarity of cDNA sequences of beta-globin gene from eight species.
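
    The idea that base content can be recovered from the three base classifications can be sketched as follows. This is not the paper's coordinate formula, which the abstract does not give; it only uses the standard IUPAC classes (R = A/G, Y = C/T, M = A/C, K = G/T, S = C/G, W = A/T) and the identity R + M + W = 2A + N, where N is the sequence length, with analogous identities for G, C and T. The function names are hypothetical.

```python
from collections import Counter

# Standard IUPAC two-base classes underlying the RY, MK and SW classifications.
CLASSES = {"R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT"}


def class_counts(seq: str) -> dict:
    """Count how many bases of the sequence fall in each two-base class."""
    base_counts = Counter(seq.upper())
    return {name: sum(base_counts[b] for b in members)
            for name, members in CLASSES.items()}


def base_content(seq: str) -> dict:
    """Recover A, G, C and T counts from class counts alone.

    Assumes the sequence contains only A, C, G and T.
    """
    c = class_counts(seq)
    n = c["R"] + c["Y"]  # every base is either a purine or a pyrimidine
    return {
        "A": (c["R"] + c["M"] + c["W"] - n) // 2,
        "G": (c["R"] + c["K"] + c["S"] - n) // 2,
        "C": (c["Y"] + c["M"] + c["S"] - n) // 2,
        "T": (c["Y"] + c["K"] + c["W"] - n) // 2,
    }


print(base_content("ATGGTGCACC"))
```

    Each base belongs to exactly one class in each of the three classifications, which is why the three class counts together determine the four base counts.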

  7. A two-step recognition of signal sequences determines the translocation efficiency of proteins.

    PubMed Central

    Belin, D; Bost, S; Vassalli, J D; Strub, K

    1996-01-01

    The cytosolic and secreted, N-glycosylated, forms of plasminogen activator inhibitor-2 (PAI-2) are generated by facultative translocation. To study the molecular events that result in the bi-topological distribution of proteins, we determined in vitro the capacities of several signal sequences to bind the signal recognition particle (SRP) during targeting, and to promote vectorial transport of murine PAI-2 (mPAI-2). Interestingly, the six signal sequences we compared (mPAI-2 and three mutated derivatives thereof, ovalbumin and preprolactin) were found to have differential activities in the two events. For example, the mPAI-2 signal sequence first binds SRP with moderate efficiency and secondly promotes the vectorial transport of only a fraction of the SRP-bound nascent chains. Our results provide evidence that the translocation efficiency of proteins can be controlled by the recognition of their signal sequences at two steps: during SRP-mediated targeting and during formation of a committed translocation complex. This second recognition may occur at several time points during the insertion/translocation step. In conclusion, signal sequences have a more complex structure than previously anticipated, allowing for multiple and independent interactions with the translocation machinery. PMID:8599930

  8. A two-step recognition of signal sequences determines the translocation efficiency of proteins.

    PubMed

    Belin, D; Bost, S; Vassalli, J D; Strub, K

    1996-02-01

    The cytosolic and secreted, N-glycosylated, forms of plasminogen activator inhibitor-2 (PAI-2) are generated by facultative translocation. To study the molecular events that result in the bi-topological distribution of proteins, we determined in vitro the capacities of several signal sequences to bind the signal recognition particle (SRP) during targeting, and to promote vectorial transport of murine PAI-2 (mPAI-2). Interestingly, the six signal sequences we compared (mPAI-2 and three mutated derivatives thereof, ovalbumin and preprolactin) were found to have differential activities in the two events. For example, the mPAI-2 signal sequence first binds SRP with moderate efficiency and secondly promotes the vectorial transport of only a fraction of the SRP-bound nascent chains. Our results provide evidence that the translocation efficiency of proteins can be controlled by the recognition of their signal sequences at two steps: during SRP-mediated targeting and during formation of a committed translocation complex. This second recognition may occur at several time points during the insertion/translocation step. In conclusion, signal sequences have a more complex structure than previously anticipated, allowing for multiple and independent interactions with the translocation machinery.

  9. Comparison of the genomic sequence of the microminipig, a novel breed of swine, with the genomic database for conventional pig.

    PubMed

    Miura, Naoki; Kucho, Ken-Ichi; Noguchi, Michiko; Miyoshi, Noriaki; Uchiumi, Toshiki; Kawaguchi, Hiroaki; Tanimoto, Akihide

    2014-01-01

    The microminipig, which weighs less than 10 kg at an early stage of maturity, has been reported as a potential experimental model animal. Its extremely small size and other distinct characteristics suggest the possibility of a number of differences between the genome of the microminipig and that of conventional pigs. In this study, we analyzed the genomes of two healthy microminipigs using the SOLiD™ next-generation sequencing system. We then compared the obtained genomic sequences with a genomic database for the domestic pig (Sus scrofa). The mapping coverage of sequenced tags from the microminipig to conventional pig genomic sequences was greater than 96% and we detected no clear, substantial genomic variance from these data. The results may indicate that the distinct characteristics of the microminipig derive from small-scale alterations in the genome, such as Single Nucleotide Polymorphisms or translational modifications, rather than large-scale deletion or insertion polymorphisms. Further investigation of the entire genomic sequence of the microminipig with methods enabling deeper coverage is required to elucidate the genetic basis of its distinct phenotypic traits.

  10. A palindrome-mediated mechanism distinguishes translocations involving LCR-B of chromosome 22q11.2.

    PubMed

    Gotter, Anthony L; Shaikh, Tamim H; Budarf, Marcia L; Rhodes, C Harker; Emanuel, Beverly S

    2004-01-01

    Two known recurrent constitutional translocations, t(11;22) and t(17;22), as well as a non-recurrent t(4;22), display derivative chromosomes that have joined to a common site within the low copy repeat B (LCR-B) region of 22q11.2. This breakpoint is located between two AT-rich inverted repeats that form a nearly perfect palindrome. Breakpoints within the 11q23, 17q11 and 4q35 partner chromosomes also fall near the center of palindromic sequences. In the present work the breakpoints of a fourth translocation involving LCR-B, a balanced ependymoma-associated t(1;22), were characterized not only to localize this junction relative to known genes, but also to further understand the mechanism underlying these rearrangements. FISH mapping was used to localize the 22q11.2 breakpoint to LCR-B and the 1p21 breakpoint to single BAC clones. STS mapping narrowed the 1p21.2 breakpoint to a 1990 bp AT-rich region, and junction fragments were amplified by nested PCR. Junction fragment-derived sequence indicates that the 1p21.2 breakpoint splits a 278 nt palindrome capable of forming stem-loop secondary structure. In contrast, the 1p21.2 reference genomic sequence from clones in the database does not exhibit this configuration, suggesting a predisposition for regional genomic instability perhaps etiologic for this rearrangement. Given its similarity to known chromosomal fragile site (FRA) sequences, this polymorphic 1p21.2 sequence may represent one of the FRA1 loci. Comparative analysis of the secondary structure of sequences surrounding translocation breakpoints that involve LCR-B with those not involving this region indicates a unique ability of the former to form stem-loop structures. The relative likelihood of forming these configurations appears to be related to the rate of translocation occurrence. Further analysis suggests that constitutional translocations in general occur between sequences of similar melting temperature and propensity for secondary structure.
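
    The notion of a DNA palindrome used above (a sequence that reads the same as its reverse complement, so the two halves are inverted repeats) can be illustrated with a minimal sketch. The helper names and toy sequences are hypothetical, and counting mismatches against the reverse complement is only one simple way to express "nearly perfect"; it is not the secondary-structure analysis used in the study.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def palindrome_mismatches(seq: str) -> int:
    """Count positions where the sequence differs from its reverse complement.

    Zero means a perfect palindrome; small values mean a near palindrome.
    """
    rc = reverse_complement(seq)
    return sum(a != b for a, b in zip(seq.upper(), rc))


print(palindrome_mismatches("ATATATAT"))  # perfect AT-rich palindrome
print(palindrome_mismatches("ATATGTAT"))  # one substitution breaks the symmetry
```

    A single substitution in one arm shows up as two mismatched positions, because the change also breaks the pairing with its partner base in the other arm.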

  11. Identification and Evaluation of Single-Nucleotide Polymorphisms in Allotetraploid Peanut (Arachis hypogaea L.) Based on Amplicon Sequencing Combined with High Resolution Melting (HRM) Analysis.

    PubMed

    Hong, Yanbin; Pandey, Manish K; Liu, Ying; Chen, Xiaoping; Liu, Hong; Varshney, Rajeev K; Liang, Xuanqiang; Huang, Shangzhi

    2015-01-01

    The cultivated peanut (Arachis hypogaea L.) is an allotetraploid (AABB) species derived from the A-genome (Arachis duranensis) and B-genome (Arachis ipaensis) progenitors. Presence of two versions of a DNA sequence based on the two progenitor genomes poses a serious technical and analytical problem during single nucleotide polymorphism (SNP) marker identification and analysis. In this context, we have analyzed 200 amplicons derived from expressed sequence tags (ESTs) and genome survey sequences (GSS) to identify SNPs in a panel of genotypes consisting of 12 cultivated peanut varieties and two diploid progenitors representing the ancestral genomes. A total of 18 EST-SNPs and 44 genomic-SNPs were identified in 12 peanut varieties by aligning the sequence of A. hypogaea with diploid progenitors. The average frequency of sequence polymorphism was higher for genomic-SNPs than for EST-SNPs, with one genomic-SNP every 1011 bp as compared to one EST-SNP every 2557 bp. In order to estimate the potential and further applicability of these identified SNPs, 96 peanut varieties were genotyped using the high resolution melting (HRM) method. Polymorphism information content (PIC) values for EST-SNPs ranged between 0.021 and 0.413 with a mean of 0.172 in the set of peanut varieties, while genomic-SNPs ranged between 0.080 and 0.478 with a mean of 0.249. A total of 33 SNPs were used for polymorphism detection among the parents and 10 selected lines from mapping population Y13Zh (Zhenzhuhei × Yueyou13). Of these 33 SNPs, nine showed polymorphism in the mapping population Y13Zh, and seven were successfully mapped into five linkage groups. Our results showed that SNPs can be identified in allotetraploid peanut with high accuracy through amplicon sequencing and HRM assay. The identified SNPs were very informative and can be used for different genetic and breeding applications in peanut.

  12. A palindrome-mediated mechanism distinguishes translocations involving LCR-B of chromosome 22q11.2

    PubMed Central

    Gotter, Anthony L.; Shaikh, Tamim H.; Budarf, Marcia L.; Rhodes, C. Harker; Emanuel, Beverly S.

    2010-01-01

    Two known recurrent constitutional translocations, t(11;22) and t(17;22), as well as a non-recurrent t(4;22), display derivative chromosomes that have joined to a common site within the low copy repeat B (LCR-B) region of 22q11.2. This breakpoint is located between two AT-rich inverted repeats that form a nearly perfect palindrome. Breakpoints within the 11q23, 17q11 and 4q35 partner chromosomes also fall near the center of palindromic sequences. In the present work the breakpoints of a fourth translocation involving LCR-B, a balanced ependymoma-associated t(1;22), were characterized not only to localize this junction relative to known genes, but also to further understand the mechanism underlying these rearrangements. FISH mapping was used to localize the 22q11.2 breakpoint to LCR-B and the 1p21 breakpoint to single BAC clones. STS mapping narrowed the 1p21.2 breakpoint to a 1990 bp AT-rich region, and junction fragments were amplified by nested PCR. Junction fragment-derived sequence indicates that the 1p21.2 breakpoint splits a 278 nt palindrome capable of forming stem–loop secondary structure. In contrast, the 1p21.2 reference genomic sequence from clones in the database does not exhibit this configuration, suggesting a predisposition for regional genomic instability perhaps etiologic for this rearrangement. Given its similarity to known chromosomal fragile site (FRA) sequences, this polymorphic 1p21.2 sequence may represent one of the FRA1 loci. Comparative analysis of the secondary structure of sequences surrounding translocation breakpoints that involve LCR-B with those not involving this region indicates a unique ability of the former to form stem–loop structures. The relative likelihood of forming these configurations appears to be related to the rate of translocation occurrence. Further analysis suggests that constitutional translocations in general occur between sequences of similar melting temperature and propensity for secondary structure. PMID:14613967

  13. Reference tissue quantification of DCE-MRI data without a contrast agent calibration

    NASA Astrophysics Data System (ADS)

    Walker-Samuel, Simon; Leach, Martin O.; Collins, David J.

    2007-02-01

    The quantification of dynamic contrast-enhanced (DCE) MRI data conventionally requires a conversion from signal intensity to contrast agent concentration by measuring a change in the tissue longitudinal relaxation rate, R1. In this paper, it is shown that the use of a spoiled gradient-echo acquisition sequence (optimized so that signal intensity scales linearly with contrast agent concentration) in conjunction with a reference tissue-derived vascular input function (VIF), avoids the need for the conversion to Gd-DTPA concentration. This study evaluates how to optimize such sequences and which dynamic time-series parameters are most suitable for this type of analysis. It is shown that signal difference and relative enhancement provide useful alternatives when full contrast agent quantification cannot be achieved, but that pharmacokinetic parameters derived from both contain sources of error (such as those caused by differences between reference tissue and region of interest proton density and native T1 values). It is shown in a rectal cancer study that these sources of uncertainty are smaller when using signal difference, compared with relative enhancement (15 ± 4% compared with 33 ± 4%). Both of these uncertainties are of the order of those associated with the conversion to Gd-DTPA concentration, according to literature estimates.
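
    The two time-series parameters compared above can be illustrated with a minimal sketch. It assumes the commonly used definitions, signal difference S(t) - S0 and relative enhancement (S(t) - S0) / S0, with S0 taken as the mean of the pre-contrast time points; the abstract does not spell these out, and the function names, the n_baseline parameter and the toy signal values are hypothetical.

```python
def baseline_signal(signal: list, n_baseline: int) -> float:
    """Mean of the first n_baseline (pre-contrast) time points."""
    if n_baseline < 1 or n_baseline > len(signal):
        raise ValueError("n_baseline must be between 1 and len(signal)")
    return sum(signal[:n_baseline]) / n_baseline


def signal_difference(signal: list, n_baseline: int) -> list:
    """Signal difference: S(t) - S0 at each time point."""
    s0 = baseline_signal(signal, n_baseline)
    return [s - s0 for s in signal]


def relative_enhancement(signal: list, n_baseline: int) -> list:
    """Relative enhancement: (S(t) - S0) / S0 at each time point."""
    s0 = baseline_signal(signal, n_baseline)
    if s0 == 0:
        raise ValueError("baseline signal must be non-zero")
    return [(s - s0) / s0 for s in signal]


# Toy signal intensities (arbitrary units); first two points are pre-contrast.
toy_signal = [100, 100, 150, 200]
print(signal_difference(toy_signal, n_baseline=2))
print(relative_enhancement(toy_signal, n_baseline=2))
```

    Relative enhancement divides by the baseline signal, so it additionally depends on native tissue properties that set S0, which is consistent with the larger uncertainty the abstract reports for it.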

  14. Simultaneous digital super-resolution and nonuniformity correction for infrared imaging systems.

    PubMed

    Meza, Pablo; Machuca, Guillermo; Torres, Sergio; Martin, Cesar San; Vera, Esteban

    2015-07-20

    In this article, we present a novel algorithm to achieve simultaneous digital super-resolution and nonuniformity correction from a sequence of infrared images. We propose to use spatial regularization terms that exploit nonlocal means and the absence of spatial correlation between the scene and the nonuniformity noise sources. We derive an iterative optimization algorithm based on a gradient descent minimization strategy. Results from infrared image sequences corrupted with simulated and real fixed-pattern noise show a competitive performance compared with state-of-the-art methods. A qualitative analysis on the experimental results obtained with images from a variety of infrared cameras indicates that the proposed method provides super-resolution images with significantly less fixed-pattern noise.

  15. Mass and age of red giant branch stars observed with LAMOST and Kepler

    NASA Astrophysics Data System (ADS)

    Wu, Yaqian; Xiang, Maosheng; Bi, Shaolan; Liu, Xiaowei; Yu, Jie; Hon, Marc; Sharma, Sanjib; Li, Tanda; Huang, Yang; Liu, Kang; Zhang, Xianfei; Li, Yaguang; Ge, Zhishuai; Tian, Zhijia; Zhang, Jinghua; Zhang, Jianwei

    2018-04-01

    Obtaining accurate and precise masses and ages for large numbers of giant stars is of great importance for unraveling the assemblage history of the Galaxy. In this paper, we estimate masses and ages of 6940 red giant branch (RGB) stars with asteroseismic parameters deduced from Kepler photometry and stellar atmospheric parameters derived from LAMOST spectra. The typical uncertainty in mass is a few per cent, and that in age is ~20 per cent. The sample stars reveal two separate sequences in the age-[α/Fe] relation - a high-α sequence with stars older than ~8 Gyr and a low-α sequence composed of stars with ages ranging from younger than 1 Gyr to older than 11 Gyr. We further investigate the feasibility of deducing ages and masses directly from LAMOST spectra with a machine learning method based on kernel based principal component analysis, taking a sub-sample of these RGB stars as a training data set. We demonstrate that ages thus derived achieve an accuracy of ~24 per cent. We also explored the feasibility of estimating ages and masses based on the spectroscopically measured carbon and nitrogen abundances. The results are quite satisfactory and significantly improved compared to the previous studies.

  16. Cloning and Sequencing of Defective Particles Derived from the Autonomous Parvovirus Minute Virus of Mice for the Construction of Vectors with Minimal cis-Acting Sequences

    PubMed Central

    Clément, Nathalie; Avalosse, Bernard; El Bakkouri, Karim; Velu, Thierry; Brandenburger, Annick

    2001-01-01

    The production of wild-type-free stocks of recombinant parvovirus minute virus of mice [MVM(p)] is difficult due to the presence of homologous sequences in vector and helper genomes that cannot easily be eliminated from the overlapping coding sequences. We have therefore cloned and sequenced spontaneously occurring defective particles of MVM(p) with very small genomes to identify the minimal cis-acting sequences required for DNA amplification and virus production. One of them has lost all capsid-coding sequences but is still able to replicate in permissive cells when nonstructural proteins are provided in trans by a helper plasmid. Vectors derived from this particle produce stocks with no detectable wild-type MVM after cotransfection with new, matched, helper plasmids that present no homology downstream from the transgene. PMID:11152501

  17. Evaluation of sequence alignments and oligonucleotide probes with respect to three-dimensional structure of ribosomal RNA using ARB software package

    PubMed Central

    Kumar, Yadhu; Westram, Ralf; Kipfer, Peter; Meier, Harald; Ludwig, Wolfgang

    2006-01-01

    Background Availability of high-resolution RNA crystal structures for the 30S and 50S ribosomal subunits and the subsequent validation of comparative secondary structure models have prompted biologists to use the three-dimensional structure of ribosomal RNA (rRNA) for evaluating sequence alignments of rRNA genes. Furthermore, the secondary and tertiary structural features of rRNA are highly useful and successfully employed in designing rRNA-targeted oligonucleotide probes intended for in situ hybridization experiments. RNA3D, a program to combine sequence alignment information with the three-dimensional structure of rRNA, was developed. Its integration into the ARB software package, which is used extensively by the scientific community for phylogenetic analysis and molecular probe design, has substantially extended the functionality of the ARB software suite with a 3D environment. Results The three-dimensional structure of rRNA is visualized in an OpenGL 3D environment with the ability to change the display and overlay information onto the molecule dynamically. Phylogenetic information derived from the multiple sequence alignments can be overlaid onto the molecule structure in real time. Superimposition of both statistical and non-statistical sequence-associated information onto the rRNA 3D structure can be done using a customizable color scheme, which is also applied to a textual sequence alignment for reference. Oligonucleotide probes designed by ARB probe design tools can be mapped onto the 3D structure along with the probe accessibility models for evaluation with respect to secondary and tertiary structural conformations of rRNA. Conclusion Visualization of the three-dimensional structure of rRNA in an intuitive display provides biologists with greater possibilities to carry out structure-based phylogenetic analysis. Coupled with secondary structure models of rRNA, the RNA3D program aids in validating the sequence alignments of rRNA genes and evaluating probe target sites. Superimposition of the information derived from the multiple sequence alignment onto the molecule dynamically allows researchers to observe any sequence-inherited characteristics (phylogenetic information) in a real-time environment. The extended ARB software package is made freely available for the scientific community via . PMID:16672074

  18. Nearly simultaneous observations of chromospheric and coronal radiative losses of cool stars

    NASA Technical Reports Server (NTRS)

    Schrijver, C. J.; Dobson, A. K.; Radick, R. R.

    1992-01-01

    The flux-flux relationships of cool stars are studied on the basis of nearly simultaneous measurements of Ca II H+K, Mg II h+k, and soft X-ray fluxes. A linear relationship is derived between IUE Mg II h+k fluxes and Mount Wilson Ca II H+K fluxes which were obtained within 36 hr of each other for a sample of 26 F5-K3 main-sequence stars. Nearly simultaneous EXOSAT soft X-ray fluxes are compared with Ca II H+K fluxes for a sample of 20 dwarfs and giants with spectral types ranging from F6 to K2; an additional 72 cool stars for which noncontemporaneous Ca II H+K and EINSTEIN soft X-ray fluxes are available are also compared. It is confirmed that a nonradiatively heated chromosphere exists on even the least active main-sequence stars. This basal chromosphere is probably independent of stellar magnetic activity.

  19. Unexpected DNA affinity and sequence selectivity through core rigidity in guanidinium-based minor groove binders.

    PubMed

    Nagle, Padraic S; McKeever, Caitriona; Rodriguez, Fernando; Nguyen, Binh; Wilson, W David; Rozas, Isabel

    2014-09-25

    In this paper we report the design and biophysical evaluation of novel rigid-core symmetric and asymmetric dicationic DNA binders containing 9H-fluorene and 9,10-dihydroanthracene cores, as well as the synthesis of one of these fluorene derivatives. First, the affinity of these compounds and of flexible-core derivatives toward particular DNA sequences was evaluated by means of surface plasmon resonance and thermal denaturation experiments, finding that the position of the cations significantly influences the binding strength. Then their affinity and mode of binding were further studied by performing circular dichroism and UV studies, and the results obtained were rationalized by means of DFT calculations. We found that the fluorene derivatives prepared have the ability to bind to the minor groove of certain DNA sequences and intercalate into others, whereas the dihydroanthracene compounds bind via intercalation to all the DNA sequences studied here.

  20. RNA Sequencing Reveals Differential Expression of Mitochondrial and Oxidation Reduction Genes in the Long-Lived Naked Mole-Rat When Compared to Mice

    PubMed Central

    Holmes, Andrew; Szafranski, Karol; Faulkes, Chris G.; Coen, Clive W.; Buffenstein, Rochelle; Platzer, Matthias; de Magalhães, João Pedro; Church, George M.

    2011-01-01

    The naked mole-rat (Heterocephalus glaber) is a long-lived, cancer resistant rodent and there is a great interest in identifying the adaptations responsible for these and other of its unique traits. We employed RNA sequencing to compare liver gene expression profiles between naked mole-rats and wild-derived mice. Our results indicate that genes associated with oxidoreduction and mitochondria were expressed at higher relative levels in naked mole-rats. The largest effect is nearly 300-fold higher expression of epithelial cell adhesion molecule (Epcam), a tumour-associated protein. Also of interest are the protease inhibitor, alpha2-macroglobulin (A2m), and the mitochondrial complex II subunit Sdhc, both ageing-related genes found strongly over-expressed in the naked mole-rat. These results hint at possible candidates for specifying species differences in ageing and cancer, and in particular suggest complex alterations in mitochondrial and oxidation reduction pathways in the naked mole-rat. Our differential gene expression analysis obviated the need for a reference naked mole-rat genome by employing a combination of Illumina/Solexa and 454 platforms for transcriptome sequencing and assembling transcriptome contigs of the non-sequenced species. Overall, our work provides new research foci and methods for studying the naked mole-rat's fascinating characteristics. PMID:22073188

  1. ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes

    PubMed Central

    Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim

    2010-01-01

    Motivation: Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith–Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid™, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. Availability: The database can be accessed through http://proteinworlddb.org Contact: otto@fiocruz.br PMID:20089515
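    The Smith–Waterman algorithm named in the abstract above is a dynamic programming method for local alignment. The sketch below is only a minimal illustration of that recurrence, not the ProteinWorldDB implementation: the function name `smith_waterman` and the simple match/mismatch/linear-gap scores are assumptions chosen for brevity, whereas protein comparisons normally use a substitution matrix and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, end_i, end_j) of the best local alignment.

    Illustrative scoring only (assumed values): a fixed match reward,
    a fixed mismatch penalty and a linear gap penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = (0, 0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            if H[i][j] > best[0]:
                best = (H[i][j], i, j)
    return best


if __name__ == "__main__":
    # The shared substring "ACG" gives the best local score.
    print(smith_waterman("TTACGTT", "GGACGGG"))
```

    Because every cell is clamped at zero, the score reflects only the best-matching region of the two sequences, which is what makes the method exact but costly for all-against-all comparisons of millions of sequences.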

  2. Isotopic (U-Pb, Nd) and geochemical constraints on the origins of the Aileu and Gondwana sequences of Timor

    NASA Astrophysics Data System (ADS)

    Boger, S. D.; Spelbrink, L. G.; Lee, R. I.; Sandiford, M.; Maas, R.; Woodhead, J. D.

    2017-02-01

    Detrital zircon U-Pb age data collected from the argillitic sedimentary rocks of the Timorese Aileu Complex and Gondwana Sequence indicate that both units were derived from a common source containing 200-600 Ma, 900-1250 Ma and 1450-1900 Ma zircon. The modally most significant age population within this range of ages dates to c. 260 Ma. The observed spectrum of ages can be traced to the eastern active margin of Pangaea and its immediate foreland, which today is best exposed along the northeast coast of Australia. Compared to the relative homogeneity of the detrital zircon age data, geochemical and Nd isotopic data show that the mudstones of the Aileu Complex are on average more siliceous, have higher K2O/Na2O, Rb/Sr, Th/Sc and yield notably older Nd TDM model ages when compared to those from the Gondwana Sequence. These data are interpreted to suggest that, although both sequences share a common east Pangaea provenance, they were eroded from different sections of this active margin and deposited in spatially separated basins. The present proximity of these units is a result of their tectonic juxtaposition during the Pliocene to Recent collision between the northern edge of the Indo-Australia plate and the Banda Arc.

  3. A comparative study of ChIP-seq sequencing library preparation methods.

    PubMed

    Sundaram, Arvind Y M; Hughes, Timothy; Biondi, Shea; Bolduc, Nathalie; Bowman, Sarah K; Camilli, Andrew; Chew, Yap C; Couture, Catherine; Farmer, Andrew; Jerome, John P; Lazinski, David W; McUsic, Andrew; Peng, Xu; Shazand, Kamran; Xu, Feng; Lyle, Robert; Gilfillan, Gregor D

    2016-10-21

    ChIP-seq is the primary technique used to investigate genome-wide protein-DNA interactions. As part of this procedure, immunoprecipitated DNA must undergo "library preparation" to enable subsequent high-throughput sequencing. To facilitate the analysis of biopsy samples and rare cell populations, there has been a recent proliferation of methods allowing sequencing library preparation from low-input DNA amounts. However, little information exists on the relative merits, performance, comparability and biases inherent to these procedures. Notably, recently developed single-cell ChIP procedures employing microfluidics must also employ library preparation reagents to allow downstream sequencing. In this study, seven methods designed for low-input DNA/ChIP-seq sample preparation (Accel-NGS® 2S, Bowman-method, HTML-PCR, SeqPlex™, DNA SMART™, TELP and ThruPLEX®) were performed on five replicates of 1 ng and 0.1 ng input H3K4me3 ChIP material, and compared to a "gold standard" reference PCR-free dataset. The performance of each method was examined for the prevalence of unmappable reads, amplification-derived duplicate reads, reproducibility, and for the sensitivity and specificity of peak calling. We identified consistent high performance in a subset of the tested reagents, which should aid researchers in choosing the most appropriate reagents for their studies. Furthermore, we expect this work to drive future advances by identifying and encouraging use of the most promising methods and reagents. The results may also aid judgements on how comparable are existing datasets that have been prepared with different sample library preparation reagents.

  4. Pre-main-sequence isochrones - II. Revising star and planet formation time-scales

    NASA Astrophysics Data System (ADS)

    Bell, Cameron P. M.; Naylor, Tim; Mayne, N. J.; Jeffries, R. D.; Littlefair, S. P.

    2013-09-01

    We have derived ages for 13 young (<30 Myr) star-forming regions and find that they are up to a factor of 2 older than the ages typically adopted in the literature. This result has wide-ranging implications, including that circumstellar discs survive longer (≃10-12 Myr) and that the average Class I lifetime is greater (≃1 Myr) than currently believed. For each star-forming region, we derived two ages from colour-magnitude diagrams. First, we fitted models of the evolution between the zero-age main sequence and terminal-age main sequence to derive a homogeneous set of main-sequence ages, distances and reddenings with statistically meaningful uncertainties. Our second age for each star-forming region was derived by fitting pre-main-sequence stars to new semi-empirical model isochrones. For the first time (for a set of clusters younger than 50 Myr), we find broad agreement between these two ages, and since these are derived from two distinct mass regimes that rely on different aspects of stellar physics, it gives us confidence in the new age scale. This agreement is largely due to our adoption of empirical colour-Teff relations and bolometric corrections for pre-main-sequence stars cooler than 4000 K. The revised ages for the star-forming regions in our sample are: ~2 Myr for NGC 6611 (Eagle Nebula; M 16), IC 5146 (Cocoon Nebula), NGC 6530 (Lagoon Nebula; M 8) and NGC 2244 (Rosette Nebula); ~6 Myr for σ Ori, Cep OB3b and IC 348; ≃10 Myr for λ Ori (Collinder 69); ≃11 Myr for NGC 2169; ≃12 Myr for NGC 2362; ≃13 Myr for NGC 7160; ≃14 Myr for χ Per (NGC 884); and ≃20 Myr for NGC 1960 (M 36).

  5. The low information content of Neurospora splicing signals: implications for RNA splicing and intron origin.

    PubMed

    Collins, Richard A; Stajich, Jason E; Field, Deborah J; Olive, Joan E; DeAbreu, Diane M

    2015-05-01

    When we expressed a small (0.9 kb) nonprotein-coding transcript derived from the mitochondrial VS plasmid in the nucleus of Neurospora we found that it was efficiently spliced at one or more of eight 5' splice sites and ten 3' splice sites, which are present apparently by chance in the sequence. Further experimental and bioinformatic analyses of other mitochondrial plasmids, random sequences, and natural nuclear genes in Neurospora and other fungi indicate that fungal spliceosomes recognize a wide range of 5' splice site and branchpoint sequences and predict introns to be present at high frequency in random sequence. In contrast, analysis of intronless fungal nuclear genes indicates that branchpoint, 5' splice site and 3' splice site consensus sequences are underrepresented compared with random sequences. This underrepresentation of splicing signals is sufficient to deplete the nuclear genome of splice sites at locations that do not comprise biologically relevant introns. Thus, the splicing machinery can recognize a wide range of splicing signal sequences, but splicing still occurs with great accuracy, not because the splicing machinery distinguishes correct from incorrect introns, but because incorrect introns are substantially depleted from the genome. © 2015 Collins et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  6. The Awesome Power of Yeast Evolutionary Genetics: New Genome Sequences and Strain Resources for the Saccharomyces sensu stricto Genus

    PubMed Central

    Scannell, Devin R.; Zill, Oliver A.; Rokas, Antonis; Payen, Celia; Dunham, Maitreya J.; Eisen, Michael B.; Rine, Jasper; Johnston, Mark; Hittinger, Chris Todd

    2011-01-01

    High-quality, well-annotated genome sequences and standardized laboratory strains fuel experimental and evolutionary research. We present improved genome sequences of three species of Saccharomyces sensu stricto yeasts: S. bayanus var. uvarum (CBS 7001), S. kudriavzevii (IFO 1802T and ZP 591), and S. mikatae (IFO 1815T), and describe their comparison to the genomes of S. cerevisiae and S. paradoxus. The new sequences, derived by assembling millions of short DNA sequence reads together with previously published Sanger shotgun reads, have vastly greater long-range continuity and far fewer gaps than the previously available genome sequences. New gene predictions defined a set of 5261 protein-coding orthologs across the five most commonly studied Saccharomyces yeasts, enabling a re-examination of the tempo and mode of yeast gene evolution and improved inferences of species-specific gains and losses. To facilitate experimental investigations, we generated genetically marked, stable haploid strains for all three of these Saccharomyces species. These nearly complete genome sequences and the collection of genetically marked strains provide a valuable toolset for comparative studies of gene function, metabolism, and evolution, and render Saccharomyces sensu stricto the most experimentally tractable model genus. These resources are freely available and accessible through www.SaccharomycesSensuStricto.org. PMID:22384314

  7. Unusual RNA plant virus integration in the soybean genome leads to the production of small RNAs.

    PubMed

    da Fonseca, Guilherme Cordenonsi; de Oliveira, Luiz Felipe Valter; de Morais, Guilherme Loss; Abdelnor, Ricardo Vilela; Nepomuceno, Alexandre Lima; Waterhouse, Peter M; Farinelli, Laurent; Margis, Rogerio

    2016-05-01

    Horizontal gene transfer (HGT) is known to be a major force in genome evolution. The acquisition of genes from viruses by eukaryotic genomes is a well-studied example of HGT, including rare cases of non-retroviral RNA virus integration. The present study describes the integration of cucumber mosaic virus RNA-1 into the soybean genome. After an initial metatranscriptomic analysis of small RNAs derived from soybean, the de novo assembly resulted in a 3029-nt contig homologous to RNA-1. The integration of this sequence in the soybean genome was confirmed by DNA deep sequencing. The locus where the integration occurred harbors the full RNA-1 sequence followed by the partial sequence of an endogenous mRNA and another sequence of RNA-1 as an inverted repeat, allowing the formation of a hairpin structure. This region recombined into a retrotransposon located inside an exon of a soybean gene. The nucleotide similarity of the integrated sequence compared to other cucumber mosaic virus sequences indicates that the integration event occurred recently. We describe a rare event of non-retroviral RNA virus integration in soybean that leads to the production of a double-stranded RNA in a similar fashion to virus resistance RNAi plants. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  8. False positives complicate ancient pathogen identifications using high-throughput shotgun sequencing

    PubMed Central

    2014-01-01

    Background Identification of historic pathogens is challenging since false positives and negatives are a serious risk. Environmental non-pathogenic contaminants are ubiquitous. Furthermore, public genetic databases contain limited information regarding these species. High-throughput sequencing may help reliably detect and identify historic pathogens. Results We shotgun-sequenced eight 16th-century Mixtec individuals from the site of Teposcolula Yucundaa (Oaxaca, Mexico) who are reported to have died from the huey cocoliztli (‘Great Pestilence’ in Nahuatl), an unknown disease that decimated native Mexican populations during the Spanish colonial period, in order to identify the pathogen. Comparison of these sequences with those deriving from the surrounding soil and from four precontact individuals from the site found a wide variety of contaminant organisms that confounded analyses. Without the comparative sequence data from the precontact individuals and soil, false positives for Yersinia pestis and rickettsiosis could have been reported. Conclusions False positives and negatives remain problematic in ancient DNA analyses despite the application of high-throughput sequencing. Our results suggest that several studies claiming the discovery of ancient pathogens may need further verification. Additionally, true single molecule sequencing’s short read lengths, inability to sequence through DNA lesions, and limited ancient-DNA-specific technical development hinder its application to palaeopathology. PMID:24568097

  9. RNA-Sequencing Gene Expression Profiling of Orbital Adipose-Derived Stem Cell Population Implicate HOX Genes and WNT Signaling Dysregulation in the Pathogenesis of Thyroid-Associated Orbitopathy.

    PubMed

    Tao, Wensi; Ayala-Haedo, Juan A; Field, Matthew G; Pelaez, Daniel; Wester, Sara T

    2017-12-01

    The purpose of this study was to characterize the intrinsic cellular properties of orbital adipose-derived stem cells (OASC) from patients with thyroid-associated orbitopathy (TAO) and healthy controls. Orbital adipose tissue was collected from a total of nine patients: four controls and five patients with TAO. Isolated OASC were characterized with mesenchymal stem cell-specific markers. Orbital adipose-derived stem cells were differentiated into three lineages: chondrocytes, osteocytes, and adipocytes. Reverse transcription PCR of genes involved in the adipogenesis, chondrogenesis, and osteogenesis pathways was used to assay the differentiation capacities. RNA sequencing analysis (RNA-seq) was performed and results were compared to assess for differences in gene expression between TAO and controls. Selected top-ranked results were confirmed by RT-PCR. Orbital adipose-derived stem cells isolated from orbital fat expressed high levels of mesenchymal stem cell markers, but low levels of the pluripotent stem cell markers. Orbital adipose-derived stem cells isolated from TAO patients exhibited an increase in adipogenesis, and a decrease in chondrogenesis and osteogenesis. RNA-seq disclosed 54 differentially expressed genes. In TAO OASC, expression of early neural crest progenitor markers (WNT signaling, ZIC genes and MSX2) was lost. Meanwhile, ectopic expression of HOXB2 and HOXB3 was found in the OASC from TAO. Our results suggest that there are intrinsic genetic and cellular differences in the OASC populations derived from TAO patients. The upregulation in adipogenesis in OASC of TAO may be consistent with the clinical phenotype. Downregulation of early neural crest markers and ectopic expression of HOXB2 and HOXB3 in TAO OASC demonstrate dysregulation of developmental and tissue patterning pathways.

  10. RNA-Sequencing Gene Expression Profiling of Orbital Adipose-Derived Stem Cell Population Implicate HOX Genes and WNT Signaling Dysregulation in the Pathogenesis of Thyroid-Associated Orbitopathy

    PubMed Central

    Tao, Wensi; Ayala-Haedo, Juan A.; Field, Matthew G.; Pelaez, Daniel; Wester, Sara T.

    2017-01-01

    Purpose The purpose of this study was to characterize the intrinsic cellular properties of orbital adipose-derived stem cells (OASC) from patients with thyroid-associated orbitopathy (TAO) and healthy controls. Methods Orbital adipose tissue was collected from a total of nine patients: four controls and five patients with TAO. Isolated OASC were characterized with mesenchymal stem cell–specific markers. Orbital adipose-derived stem cells were differentiated into three lineages: chondrocytes, osteocytes, and adipocytes. Reverse transcription PCR of genes involved in the adipogenesis, chondrogenesis, and osteogenesis pathways was used to assay the differentiation capacities. RNA sequencing analysis (RNA-seq) was performed and results were compared to assess for differences in gene expression between TAO and controls. Selected top-ranked results were confirmed by RT-PCR. Results Orbital adipose-derived stem cells isolated from orbital fat expressed high levels of mesenchymal stem cell markers, but low levels of the pluripotent stem cell markers. Orbital adipose-derived stem cells isolated from TAO patients exhibited an increase in adipogenesis, and a decrease in chondrogenesis and osteogenesis. RNA-seq disclosed 54 differentially expressed genes. In TAO OASC, expression of early neural crest progenitor markers (WNT signaling, ZIC genes and MSX2) was lost. Meanwhile, ectopic expression of HOXB2 and HOXB3 was found in the OASC from TAO. Conclusion Our results suggest that there are intrinsic genetic and cellular differences in the OASC populations derived from TAO patients. The upregulation in adipogenesis in OASC of TAO may be consistent with the clinical phenotype. Downregulation of early neural crest markers and ectopic expression of HOXB2 and HOXB3 in TAO OASC demonstrate dysregulation of developmental and tissue patterning pathways. PMID:29214313

  11. Identification of ovule transcripts from the Apospory-Specific Genomic Region (ASGR)-carrier chromosome

    PubMed Central

    2011-01-01

    Background Apomixis, asexual seed production in plants, holds great potential for agriculture as a means to fix hybrid vigor. Apospory is a form of apomixis where the embryo develops from an unreduced egg that is derived from a somatic nucellar cell, the aposporous initial, via mitosis. Understanding the molecular mechanism regulating aposporous initial specification will be a critical step toward elucidation of apomixis and also provide insight into developmental regulation and downstream signaling that results in apomixis. To discover candidate transcripts for regulating aposporous initial specification in P. squamulatum, we compared two transcriptomes derived from microdissected ovules at the stage of aposporous initial formation between the apomictic donor parent, P. squamulatum (accession PS26), and an apomictic derived backcross 8 (BC8) line containing only the Apospory-Specific Genomic Region (ASGR)-carrier chromosome from P. squamulatum. Toward this end, two transcriptomes derived from ovules of an apomictic donor parent and its apomictic backcross derivative at the stage of apospory initiation were sequenced using 454-FLX technology. Results Using 454-FLX technology, we generated 332,567 reads with an average read length of 147 base pairs (bp) for the PS26 ovule transcriptome library and 363,637 reads with an average read length of 142 bp for the BC8 ovule transcriptome library. A total of 33,977 contigs from the PS26 ovule transcriptome library and 26,576 contigs from the BC8 ovule transcriptome library were assembled using the Multifunctional Inertial Reference Assembly program. Using stringent in silico parameters, 61 transcripts were predicted to map to the ASGR-carrier chromosome, of which 49 transcripts were verified as ASGR-carrier chromosome specific. One of the alien expressed genes could be assigned as tightly linked to the ASGR by screening of apomictic and sexual F1s. Only one transcript, which did not map to the ASGR, showed expression primarily in reproductive tissue. Conclusions Our results suggest that a strategy of comparative sequencing of transcriptomes between donor parent and backcross lines containing an alien chromosome of interest can be an efficient method of identifying transcripts derived from an alien chromosome in a chromosome addition line. PMID:21521529

  12. Evaluation of anonymous and expressed sequence tag derived polymorphic microsatellite markers in the tobacco budworm Heliothis virescens (Lepidoptera: noctuidae)

    USDA-ARS's Scientific Manuscript database

    Polymorphic genetic markers were identified and characterized using a partial genomic library of Heliothis virescens enriched for simple sequence repeats (SSR) and nucleotide sequences of expressed sequence tags (EST). Nucleotide sequences of 192 clones from the partial genomic library yielded 147 u...

  13. Question 7: Comparative Genomics and Early Cell Evolution: A Cautionary Methodological Note

    NASA Astrophysics Data System (ADS)

    Islas, Sara; Hernández-Morales, Ricardo; Lazcano, Antonio

    2007-10-01

    Inventories of the gene content of the last common ancestor (LCA), i.e., the cenancestor, include sequences that may have undergone horizontal transfer events, as well as sequences that have originated in different pre-cenancestral epochs. However, the universal distribution of highly conserved genes involved in RNA metabolism provides insights into early stages of cell evolution during which RNA played a much more conspicuous biological role, and is consistent with the hypothesis that extant living systems were preceded by an RNA/protein world. Insights into the traits of primitive entities from which the LCA evolved may be derived from the analysis of paralogous gene families, including those formed by sequences that resulted from internal elongation events. Three major types of paralogous gene families can be recognized. The importance of this grouping for understanding the traits of early cells is discussed.

  14. Phylogenetic characterization of the ubiquitous electron transfer flavoprotein families ETF-alpha and ETF-beta.

    PubMed

    Tsai, M H; Saier, M H

    1995-06-01

    Electron transfer flavoproteins (ETF) are alpha beta-heterodimers found in eukaryotic mitochondria and bacteria. We have identified currently sequenced protein members of the ETF-alpha and ETF-beta families. Members of these two families include (a) the ETF subunits of mammals and bacteria, (b) homologous pairs of proteins (FixB/FixA) that are essential for nitrogen fixation in some bacteria, and (c) a pair of carnitine-inducible proteins encoded by two open reading frames in Escherichia coli (YaaQ and YaaR). These three groups of proteins comprise three clusters on both the ETF-alpha and ETF-beta phylogenetic trees, separated from each other by comparable phylogenetic distances. This fact suggests that these two protein families evolved with similar overall rates of evolutionary divergence. Relative regions of sequence conservation are evaluated, and signature sequences for both families are derived.

  15. Genome-wide analysis of wild-type Epstein-Barr virus genomes derived from healthy individuals of the 1,000 Genomes Project.

    PubMed

    Santpere, Gabriel; Darre, Fleur; Blanco, Soledad; Alcami, Antonio; Villoslada, Pablo; Mar Albà, M; Navarro, Arcadi

    2014-04-01

    Most people in the world (∼90%) are infected by the Epstein-Barr virus (EBV), which establishes itself permanently in B cells. Infection by EBV is related to a number of diseases including infectious mononucleosis, multiple sclerosis, and different types of cancer. So far, only seven complete EBV strains have been described, all of them coming from donors presenting EBV-related diseases. To perform a detailed comparative genomic analysis of EBV including, for the first time, EBV strains derived from healthy individuals, we reconstructed EBV sequences infecting lymphoblastoid cell lines (LCLs) from the 1000 Genomes Project. As strain B95-8 was used to transform B cells to obtain LCLs, it is always present, but a specific deletion in its genome sets it apart from natural EBV strains. After studying hundreds of individuals, we determined the presence of natural EBV in at least 10 of them and obtained a set of variants specific to wild-type EBV. By mapping the natural EBV reads into the EBV reference genome (NC007605), we constructed nearly complete wild-type viral genomes from three individuals. Adding them to the five disease-derived EBV genomic sequences available in the literature, we performed an in-depth comparative genomic analysis. We found that latency genes harbor more nucleotide diversity than lytic genes and that six out of nine latency-related genes, as well as other genes involved in viral attachment and entry into host cells, packaging, and the capsid, present the molecular signature of accelerated protein evolution rates, suggesting rapid host-parasite coevolution.
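    The comparison of nucleotide diversity between latency and lytic genes in the abstract above can be illustrated with a small calculation. The sketch below assumes the common definition of nucleotide diversity as the average number of pairwise differences per site among aligned sequences; the function name `nucleotide_diversity` and the toy sequences are hypothetical, and the abstract does not state which estimator or software the authors used.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site for equal-length aligned sequences.

    Assumes gap-free alignments of identical length; real analyses
    also have to handle gaps and ambiguous bases.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching positions over every pair of sequences.
    diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return diffs / (len(pairs) * length)


if __name__ == "__main__":
    # Toy alignment (hypothetical data): 4 differences over 3 pairs x 4 sites.
    print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
```

    Computing this value separately for each gene's alignment gives one number per gene, which is the kind of quantity that can then be compared between gene classes.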

  16. Derivatives of a Vancomycin-Resistant Staphylococcus aureus Strain Isolated at Hershey Medical Center

    PubMed Central

    Bozdogan, Bülent; Ednie, Lois; Credito, Kim; Kosowska, Klaudia; Appelbaum, Peter C.

    2004-01-01

    Antimicrobial susceptibilities and genetic relatedness of the vancomycin-resistant Staphylococcus aureus strain (VRSA) isolated at Hershey, Pa. (VRSA Hershey), and its vancomycin-susceptible and high-level-resistant derivatives were studied and compared to 32 methicillin-resistant S. aureus strains (MRSA) isolated from patients and medical staff in contact with the VRSA patient. Derivatives of VRSA were obtained by subculturing six VRSA colonies from the original culture with or without vancomycin. Ten days of drug-free subculture caused the loss of vanA in two vancomycin-susceptible derivatives for which vancomycin MICs were 1 to 4 μg/ml. Multistep selection of three VRSA clones with vancomycin for 10 days increased vancomycin MICs from 32 to 1,024 to 2,048 μg/ml. MICs of teicoplanin, dalbavancin, and oritavancin were also increased from 4, 0.5, and 0.12 to 64, 1, and 32 μg/ml, respectively. Pulsed-field gel electrophoresis (PFGE) and multilocus sequence typing analysis indicated that VRSA Hershey was the vanA-acquired variety of a common MRSA clone in our hospital with sequence type 5 (ST5). Three of five vancomycin-intermediate S. aureus strains tested from geographically different areas were also ST5, and the Michigan VRSA was ST371, a one-allele variant of ST5. Derivatives of VRSA Hershey had differences in PFGE profiles and the size of SmaI fragment that carries the vanA gene cluster, indicating instability of this cluster in VRSA Hershey. However, induction with vancomycin increased glycopeptide MICs and stabilized the resistance. PMID:15561854

  17. Derivatives of a vancomycin-resistant Staphylococcus aureus strain isolated at Hershey Medical Center.

    PubMed

    Bozdogan, Bülent; Ednie, Lois; Credito, Kim; Kosowska, Klaudia; Appelbaum, Peter C

    2004-12-01

    Antimicrobial susceptibilities and genetic relatedness of the vancomycin-resistant Staphylococcus aureus strain (VRSA) isolated at Hershey, Pa. (VRSA Hershey), and its vancomycin-susceptible and high-level-resistant derivatives were studied and compared to 32 methicillin-resistant S. aureus strains (MRSA) isolated from patients and medical staff in contact with the VRSA patient. Derivatives of VRSA were obtained by subculturing six VRSA colonies from the original culture with or without vancomycin. Ten days of drug-free subculture caused the loss of vanA in two vancomycin-susceptible derivatives for which vancomycin MICs were 1 to 4 μg/ml. Multistep selection of three VRSA clones with vancomycin for 10 days increased vancomycin MICs from 32 to 1,024 to 2,048 μg/ml. MICs of teicoplanin, dalbavancin, and oritavancin were also increased from 4, 0.5, and 0.12 to 64, 1, and 32 μg/ml, respectively. Pulsed-field gel electrophoresis (PFGE) and multilocus sequence typing analysis indicated that VRSA Hershey was the vanA-acquired variety of a common MRSA clone in our hospital with sequence type 5 (ST5). Three of five vancomycin-intermediate S. aureus strains tested from geographically different areas were also ST5, and the Michigan VRSA was ST371, a one-allele variant of ST5. Derivatives of VRSA Hershey had differences in PFGE profiles and the size of SmaI fragment that carries the vanA gene cluster, indicating instability of this cluster in VRSA Hershey. However, induction with vancomycin increased glycopeptide MICs and stabilized the resistance.

  18. Two EST-derived marker systems for cultivar identification in tree peony.

    PubMed

    Zhang, J J; Shu, Q Y; Liu, Z A; Ren, H X; Wang, L S; De Keyser, E

    2012-02-01

    Tree peony (Paeonia suffruticosa Andrews), a woody deciduous shrub, belongs to the section Moutan DC. in the genus of Paeonia of the Paeoniaceae family. To increase the efficiency of breeding, two EST-derived marker systems were developed based on a tree peony expressed sequence tag (EST) database. Using target region amplification polymorphism (TRAP), 19 of 39 primer pairs showed good amplification for 56 accessions with amplicons ranging from 120 to 3,000 bp long, among which 99.3% were polymorphic. In contrast, 7 of 21 primer pairs demonstrated adequate amplification with clear bands for simple sequence repeats (SSRs) developed from ESTs, and a total of 33 alleles were found in 56 accessions. The similarity matrices generated by TRAP and EST-SSR markers were compared, and the Mantel test (r = 0.57778, P = 0.0020) showed a moderate correlation between the two types of molecular markers. TRAP markers were suitable for DNA fingerprinting and EST-SSR markers were more appropriate for discriminating synonyms (the same cultivars with different names due to limited information exchanged among different geographic areas). The two sets of EST-derived markers will be used further for genetic linkage map construction and quantitative trait locus detection in tree peony.
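    The Mantel test mentioned in the abstract above compares two similarity (or distance) matrices built over the same set of samples. The following is only a minimal sketch of a one-sided permutation Mantel test, not the authors' procedure or software; the function names (upper_triangle, pearson, mantel), the permutation count, and the toy matrices are all hypothetical choices made for illustration.

```python
import random
from statistics import mean


def upper_triangle(matrix):
    """Return the off-diagonal upper-triangle entries of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def mantel(m1, m2, permutations=999, seed=0):
    """One-sided permutation Mantel test (illustrative assumption, see lead-in).

    Rows and columns of m2 are permuted together, and the p-value is the
    share of permutations whose correlation is at least the observed one.
    """
    rng = random.Random(seed)
    n = len(m1)
    v1 = upper_triangle(m1)
    observed = pearson(v1, upper_triangle(m2))
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[m2[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(v1, upper_triangle(permuted)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the observed ordering counts.
    p_value = (hits + 1) / (permutations + 1)
    return observed, p_value


# Toy symmetric matrices for four hypothetical accessions.
trap = [
    [1.0, 0.8, 0.3, 0.2],
    [0.8, 1.0, 0.4, 0.1],
    [0.3, 0.4, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
]
est_ssr = [
    [1.0, 0.7, 0.2, 0.3],
    [0.7, 1.0, 0.5, 0.2],
    [0.2, 0.5, 1.0, 0.6],
    [0.3, 0.2, 0.6, 1.0],
]
r, p = mantel(trap, est_ssr)
print(f"Mantel r = {r:.3f}, permutation p = {p:.3f}")
```

    Permuting sample labels rather than individual matrix entries keeps the dependence among pairwise values intact, which is why the test is used instead of an ordinary correlation test.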

  19. Estimating Bacterial Diversity for Ecological Studies: Methods, Metrics, and Assumptions

    PubMed Central

    Birtel, Julia; Walser, Jean-Claude; Pichon, Samuel; Bürgmann, Helmut; Matthews, Blake

    2015-01-01

    Methods to estimate microbial diversity have developed rapidly in an effort to understand the distribution and diversity of microorganisms in natural environments. For bacterial communities, the 16S rRNA gene is the phylogenetic marker gene of choice, but most studies select only a specific region of the 16S rRNA to estimate bacterial diversity. Whereas biases derived from DNA extraction, primer choice and PCR amplification are well documented, we here address how the choice of variable region can influence a wide range of standard ecological metrics, such as species richness, phylogenetic diversity, β-diversity and rank-abundance distributions. We have used Illumina paired-end sequencing to estimate the bacterial diversity of 20 natural lakes across Switzerland derived from three trimmed variable 16S rRNA regions (V3, V4, V5). Species richness, phylogenetic diversity, community composition, β-diversity, and rank-abundance distributions differed significantly between 16S rRNA regions. Overall, patterns of diversity quantified by the V3 and V5 regions were more similar to one another than those assessed by the V4 region. Similar results were obtained when analyzing the datasets with different sequence similarity thresholds used during sequence clustering and when the same analysis was used on a reference dataset of sequences from the Greengenes database. In addition, we measured species richness from the same lake samples using ARISA Fingerprinting, but did not find a strong relationship between species richness estimated by Illumina and ARISA. We conclude that the selection of 16S rRNA region significantly influences the estimation of bacterial diversity and species distributions and that caution is warranted when comparing data from different variable regions as well as when using different sequencing techniques. PMID:25915756

  20. Reference quality assembly of the 3.5-Gb genome of Capsicum annuum from a single linked-read library.

    PubMed

    Hulse-Kemp, Amanda M; Maheshwari, Shamoni; Stoffel, Kevin; Hill, Theresa A; Jaffe, David; Williams, Stephen R; Weisenfeld, Neil; Ramakrishnan, Srividya; Kumar, Vijay; Shah, Preyas; Schatz, Michael C; Church, Deanna M; Van Deynze, Allen

    2018-01-01

    Linked-Read sequencing technology has recently been employed successfully for de novo assembly of human genomes; however, the utility of this technology for complex plant genomes is unproven. We evaluated the technology for this purpose by sequencing the 3.5-gigabase (Gb) diploid pepper (Capsicum annuum) genome with a single Linked-Read library. Plant genomes, including pepper, are characterized by long, highly similar repetitive sequences. Accordingly, significant effort is used to ensure that the sequenced plant is highly homozygous and the resulting assembly is a haploid consensus. With a phased assembly approach, we targeted a heterozygous F1 derived from a wide cross to assess the ability to derive both haplotypes and characterize a pungency gene with a large insertion/deletion. The Supernova software generated a highly ordered, more contiguous sequence assembly than all currently available C. annuum reference genomes. Over 83% of the final assembly was anchored and oriented using four publicly available de novo linkage maps. A comparison of the annotation of conserved eukaryotic genes indicated the completeness of assembly. The validity of the phased assembly is further demonstrated with the complete recovery of both 2.5-Kb insertion/deletion haplotypes of the PUN1 locus in the F1 sample that represents pungent and nonpungent peppers, as well as nearly full recovery of the BUSCO2 gene set within each of the two haplotypes. The most contiguous pepper genome assembly to date has been generated, which demonstrates that Linked-Read library technology provides a tool to de novo assemble complex highly repetitive heterozygous plant genomes. This technology can provide an opportunity to cost-effectively develop high-quality genome assemblies for other complex plants and compare structural and gene differences through accurate haplotype reconstruction.

  1. Cell and molecular biology of the spiny dogfish Squalus acanthias and little skate Leucoraja erinacea: insights from in vitro cultured cells.

    PubMed

    Barnes, D W

    2012-04-01

    Two of the most commonly used elasmobranch experimental model species are the spiny dogfish Squalus acanthias and the little skate Leucoraja erinacea. Comparative biology and genomics with these species have provided useful information in physiology, pharmacology, toxicology, immunology, evolutionary developmental biology and genetics. A wealth of information has been obtained using in vitro approaches to study isolated cells and tissues from these organisms under circumstances in which the extracellular environment can be controlled. In addition to classical work with primary cell cultures, continuously proliferating cell lines have been derived recently, representing the first cell lines from cartilaginous fishes. These lines have proved to be valuable tools with which to explore functional genomic and biological questions and to test hypotheses at the molecular level. In genomic experiments, complementary (c)DNA libraries have been constructed, and c. 8000 unique transcripts identified, with over 3000 representing previously unknown gene sequences. A sub-set of messenger (m)RNAs has been detected for which the 3' untranslated regions show elements that are remarkably well conserved evolutionarily, representing novel, potentially regulatory gene sequences. The cell culture systems provide physiologically valid tools to study functional roles of these sequences and other aspects of elasmobranch molecular cell biology and physiology. Information derived from the use of in vitro cell cultures is valuable in revealing gene diversity and information for genomic sequence assembly, as well as for identification of new genes and molecular markers, construction of gene-array probes and acquisition of full-length cDNA sequences. © 2012 The Author. Journal of Fish Biology © 2012 The Fisheries Society of the British Isles.

  2. Analysis and functional annotation of expressed sequence tags from in vitro cell lines of elasmobranchs: spiny dogfish shark (Squalus acanthias) and little skate (Leucoraja erinacea)

    PubMed Central

    Parton, Angela; Bayne, Christopher J.; Barnes, David W.

    2010-01-01

    Elasmobranchs are the most commonly used experimental models among the jawed, cartilaginous fish (Chondrichthyes). Previously we developed cell lines from embryos of two elasmobranchs, Squalus acanthias the spiny dogfish shark (SAE line), and Leucoraja erinacea the little skate (LEE-1 line). From these lines cDNA libraries were derived and expressed sequence tags (ESTs) generated. From the SAE cell line 4303 unique transcripts were identified, with 1848 of these representing unknown sequences (showing no BLASTX identification). From the LEE-1 cell line, 3660 unique transcripts were identified, and unknown, unique sequences totaled 1333. Gene Ontology (GO) annotation showed that GO assignments for the two cell lines were in general similar. These results suggest that the procedures used to derive the cell lines led to isolation of cell types of the same general embryonic origin from both species. The LEE-1 transcripts included GO categories “envelope” and “oxidoreductase activity” but the SAE transcripts did not. GO analysis of SAE transcripts identified the category “anatomical structure formation” that was not present in LEE-1 cells. Increased organelle compartments may exist within LEE-1 cells compared to SAE cells, and the higher oxidoreductase activity in LEE-1 cells may indicate a role for these cells in responses associated with innate immunity or in steroidogenesis. These EST libraries from elasmobranch cell lines provide information for assembly of genomic sequences and are useful in revealing gene diversity, new genes and molecular markers, as well as in providing means for elucidation of full-length cDNAs and probes for gene array analyses. This is the first study of this type with members of the Chondrichthyes. PMID:20471924

  3. Analysis and functional annotation of expressed sequence tags from in vitro cell lines of elasmobranchs: Spiny dogfish shark (Squalus acanthias) and little skate (Leucoraja erinacea).

    PubMed

    Parton, Angela; Bayne, Christopher J; Barnes, David W

    2010-09-01

    Elasmobranchs are the most commonly used experimental models among the jawed, cartilaginous fish (Chondrichthyes). Previously we developed cell lines from embryos of two elasmobranchs, Squalus acanthias the spiny dogfish shark (SAE line), and Leucoraja erinacea the little skate (LEE-1 line). From these lines cDNA libraries were derived and expressed sequence tags (ESTs) generated. From the SAE cell line 4303 unique transcripts were identified, with 1848 of these representing unknown sequences (showing no BLASTX identification). From the LEE-1 cell line, 3660 unique transcripts were identified, and unknown, unique sequences totaled 1333. Gene Ontology (GO) annotation showed that GO assignments for the two cell lines were in general similar. These results suggest that the procedures used to derive the cell lines led to isolation of cell types of the same general embryonic origin from both species. The LEE-1 transcripts included GO categories "envelope" and "oxidoreductase activity" but the SAE transcripts did not. GO analysis of SAE transcripts identified the category "anatomical structure formation" that was not present in LEE-1 cells. Increased organelle compartments may exist within LEE-1 cells compared to SAE cells, and the higher oxidoreductase activity in LEE-1 cells may indicate a role for these cells in responses associated with innate immunity or in steroidogenesis. These EST libraries from elasmobranch cell lines provide information for assembly of genomic sequences and are useful in revealing gene diversity, new genes and molecular markers, as well as in providing means for elucidation of full-length cDNAs and probes for gene array analyses. This is the first study of this type with members of the Chondrichthyes. Copyright 2010 Elsevier Inc. All rights reserved.

  4. Unraveling the sequence and structure of the protein osteocalcin from a 42 ka fossil horse

    NASA Astrophysics Data System (ADS)

    Ostrom, Peggy H.; Gandhi, Hasand; Strahler, John R.; Walker, Angela K.; Andrews, Philip C.; Leykam, Joseph; Stafford, Thomas W.; Kelly, Robert L.; Walker, Danny N.; Buckley, Mike; Humpula, James

    2006-04-01

    We report the first complete amino acid sequence and evidence of secondary structure for osteocalcin from a temperate fossil. The osteocalcin derives from a 42 ka equid bone excavated from Juniper Cave, Wyoming. Results were determined by matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-MS) and Edman sequencing with independent confirmation of the sequence in two laboratories. The ancient sequence was compared to that of three modern taxa: horse (Equus caballus), zebra (Equus grevyi), and donkey (Equus asinus). Although there was no difference in sequence among modern taxa, MALDI-MS and Edman sequencing show that residues 48 and 49 of our modern horse are Thr, Ala rather than Pro, Val as previously reported (Carstanjen B., Wattiez, R., Armory, H., Lepage, O.M., Remy, B., 2002. Isolation and characterization of equine osteocalcin. Ann. Med. Vet. 146(1), 31-38). MALDI-MS and Edman sequencing data indicate that the osteocalcin sequence of the 42 ka fossil is similar to that of modern horse. Previously inaccessible structural attributes for ancient osteocalcin were observed. Glu 39 rather than Gln 39 is consistent with deamidation, a process known to occur during fossilization and aging. Two post-translational modifications were documented: Hyp 9 and a disulfide bridge. The latter suggests at least partial retention of secondary structure. As has been done for ancient DNA research, we recommend standards for preparation and criteria for authenticating results of ancient protein sequencing.

  5. Comparative mapping in intraspecific populations uncovers a high degree of macrosynteny between A- and B-genome diploid species of peanut

    PubMed Central

    2012-01-01

    Background: Cultivated peanut or groundnut (Arachis hypogaea L.) is an important oilseed crop with an allotetraploid genome (AABB, 2n = 4x = 40). Both the low level of genetic variation within the cultivated gene pool and its polyploid nature limit the utilization of molecular markers to explore genome structure and facilitate genetic improvement. Nevertheless, a wealth of genetic diversity exists in diploid Arachis species (2n = 2x = 20), which represent a valuable gene pool for cultivated peanut improvement. Interspecific populations have been used widely for genetic mapping in diploid species of Arachis. However, an intraspecific mapping strategy was essential to detect chromosomal rearrangements among species that could be obscured by mapping in interspecific populations. To develop intraspecific reference linkage maps and gain insights into karyotypic evolution within the genus, we comparatively mapped the A- and B-genome diploid species using intraspecific F2 populations. Exploring genome organization among diploid peanut species by comparative mapping will enhance our understanding of the cultivated tetraploid peanut genome. Moreover, new sources of molecular markers that are highly transferable between species and developed from expressed genes will be required to construct saturated genetic maps for peanut. Results: A total of 2,138 EST-SSR (expressed sequence tag-simple sequence repeat) markers were developed by mining a tetraploid peanut EST assembly including 101,132 unigenes (37,916 contigs and 63,216 singletons) derived from 70,771 long-read (Sanger) and 270,957 short-read (454) sequences. A set of 97 SSR markers were also developed by mining 9,517 genomic survey sequences of Arachis. An SSR-based intraspecific linkage map was constructed using an F2 population derived from a cross between K 9484 (PI 298639) and GKBSPSc 30081 (PI 468327) in the B-genome species A. batizocoi. A high degree of macrosynteny was observed when comparing the homoeologous linkage groups between A (A. duranensis) and B (A. batizocoi) genomes. Comparison of the A- and B-genome genetic linkage maps also showed a total of five inversions and one major reciprocal translocation between two pairs of chromosomes under our current mapping resolution. Conclusions: Our findings will contribute to understanding tetraploid peanut genome origin and evolution and eventually promote its genetic improvement. The newly developed EST-SSR markers will enrich current molecular marker resources in peanut. PMID:23140574

  6. Phytoplasma phylogenetics based on analysis of secA and 23S rRNA gene sequences for improved resolution of candidate species of 'Candidatus Phytoplasma'.

    PubMed

    Hodgetts, Jennifer; Boonham, Neil; Mumford, Rick; Harrison, Nigel; Dickinson, Matthew

    2008-08-01

    Phytoplasma phylogenetics has focused primarily on sequences of the non-coding 16S rRNA gene and the 16S-23S rRNA intergenic spacer region (16-23S ISR), and primers that enable amplification of these regions from all phytoplasmas by PCR are well established. In this study, primers based on the secA gene have been developed into a semi-nested PCR assay that results in a sequence of the expected size (about 480 bp) from all 34 phytoplasmas examined, including strains representative of 12 16Sr groups. Phylogenetic analysis of secA gene sequences showed similar clustering of phytoplasmas when compared with clusters resolved by similar sequence analyses of a 16-23S ISR-23S rRNA gene contig or of the 16S rRNA gene alone. The main differences between trees were in the branch lengths, which were elongated in the 16-23S ISR-23S rRNA gene tree when compared with the 16S rRNA gene tree and elongated still further in the secA gene tree, despite this being a shorter sequence. The improved resolution in the secA gene-derived phylogenetic tree resulted in the 16SrII group splitting into two distinct clusters, while phytoplasmas associated with coconut lethal yellowing-type diseases split into three distinct groups, thereby supporting past proposals that they represent different candidate species within 'Candidatus Phytoplasma'. The ability to differentiate 16Sr groups and subgroups by virtual RFLP analysis of secA gene sequences suggests that this gene may provide an informative alternative molecular marker for pathogen identification and diagnosis of phytoplasma diseases.

  7. Meaningful call combinations and compositional processing in the southern pied babbler

    PubMed Central

    Engesser, Sabrina; Ridley, Amanda R.; Townsend, Simon W.

    2016-01-01

    Language’s expressive power is largely attributable to its compositionality: meaningful words are combined into larger/higher-order structures with derived meaning. Despite its importance, little is known regarding the evolutionary origins and emergence of this syntactic ability. Although previous research has shown a rudimentary capability to combine meaningful calls in primates, because of a scarcity of comparative data, it is unclear to what extent analog forms might also exist outside of primates. Here, we address this ambiguity and provide evidence for rudimentary compositionality in the discrete vocal system of a social passerine, the pied babbler (Turdoides bicolor). Natural observations and predator presentations revealed that babblers produce acoustically distinct alert calls in response to close, low-urgency threats and recruitment calls when recruiting group members during locomotion. On encountering terrestrial predators, both vocalizations are combined into a “mobbing sequence,” potentially to recruit group members in a dangerous situation. To investigate whether babblers process the sequence in a compositional way, we conducted systematic experiments, playing back the individual calls in isolation as well as naturally occurring and artificial sequences. Babblers reacted most strongly to mobbing sequence playbacks, showing a greater attentiveness and a quicker approach to the loudspeaker, compared with individual calls or control sequences. We conclude that the sequence constitutes a compositional structure, communicating information on both the context and the requested action. Our work supports previous research suggesting combinatoriality as a viable mechanism to increase communicative output and indicates that the ability to combine and process meaningful vocal structures, a basic syntax, may be more widespread than previously thought. PMID:27155011

  8. Meaningful call combinations and compositional processing in the southern pied babbler.

    PubMed

    Engesser, Sabrina; Ridley, Amanda R; Townsend, Simon W

    2016-05-24

    Language's expressive power is largely attributable to its compositionality: meaningful words are combined into larger/higher-order structures with derived meaning. Despite its importance, little is known regarding the evolutionary origins and emergence of this syntactic ability. Although previous research has shown a rudimentary capability to combine meaningful calls in primates, because of a scarcity of comparative data, it is unclear to what extent analog forms might also exist outside of primates. Here, we address this ambiguity and provide evidence for rudimentary compositionality in the discrete vocal system of a social passerine, the pied babbler (Turdoides bicolor). Natural observations and predator presentations revealed that babblers produce acoustically distinct alert calls in response to close, low-urgency threats and recruitment calls when recruiting group members during locomotion. On encountering terrestrial predators, both vocalizations are combined into a "mobbing sequence," potentially to recruit group members in a dangerous situation. To investigate whether babblers process the sequence in a compositional way, we conducted systematic experiments, playing back the individual calls in isolation as well as naturally occurring and artificial sequences. Babblers reacted most strongly to mobbing sequence playbacks, showing a greater attentiveness and a quicker approach to the loudspeaker, compared with individual calls or control sequences. We conclude that the sequence constitutes a compositional structure, communicating information on both the context and the requested action. Our work supports previous research suggesting combinatoriality as a viable mechanism to increase communicative output and indicates that the ability to combine and process meaningful vocal structures, a basic syntax, may be more widespread than previously thought.

  9. Designing deep sequencing experiments: detecting structural variation and estimating transcript abundance.

    PubMed

    Bashir, Ali; Bansal, Vikas; Bafna, Vineet

    2010-06-18

    Massively parallel DNA sequencing technologies have enabled the sequencing of several individual human genomes. These technologies are also being used in novel ways for mRNA expression profiling, genome-wide discovery of transcription-factor binding sites, small RNA discovery, etc. The multitude of sequencing platforms, each with their unique characteristics, pose a number of design challenges, regarding the technology to be used and the depth of sequencing required for a particular sequencing application. Here we describe a number of analytical and empirical results to address design questions for two applications: detection of structural variations from paired-end sequencing and estimating mRNA transcript abundance. For structural variation, our results provide explicit trade-offs between the detection and resolution of rearrangement breakpoints, and the optimal mix of paired-read insert lengths. Specifically, we prove that optimal detection and resolution of breakpoints is achieved using a mix of exactly two insert library lengths. Furthermore, we derive explicit formulae to determine these insert length combinations, enabling a 15% improvement in breakpoint detection at the same experimental cost. On empirical short read data, these predictions show good concordance with Illumina 200 bp and 2 Kbp insert length libraries. For transcriptome sequencing, we determine the sequencing depth needed to detect rare transcripts from a small pilot study. With only 1 million reads, we derive corrections that enable almost perfect prediction of the underlying expression probability distribution, and use this to predict the sequencing depth required to detect low expressed genes with greater than 95% probability. Together, our results form a generic framework for many design considerations related to high-throughput sequencing. We provide software tools (http://bix.ucsd.edu/projects/NGS-DesignTools) to derive platform-independent guidelines for designing sequencing experiments (amount of sequencing, choice of insert length, mix of libraries) for novel applications of next generation sequencing.
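    As a rough illustration of the kind of depth calculation described in the abstract above, the sketch below uses a deliberately simple model that is an assumption of this example and not the authors' method: each read is drawn independently, and a transcript with relative abundance p is "detected" once at least one read maps to it. Under that assumption the detection probability after N reads is 1 - (1 - p)^N. The function names detection_probability and reads_needed are hypothetical.

```python
import math


def detection_probability(abundance: float, n_reads: int) -> float:
    """Probability that at least one of n_reads hits a transcript.

    Assumes independent reads and a fixed relative abundance (simplified model).
    """
    return 1.0 - (1.0 - abundance) ** n_reads


def reads_needed(abundance: float, detect_prob: float = 0.95) -> int:
    """Smallest read count N with 1 - (1 - abundance)^N >= detect_prob."""
    if not 0.0 < abundance < 1.0:
        raise ValueError("abundance must lie strictly between 0 and 1")
    if not 0.0 < detect_prob < 1.0:
        raise ValueError("detect_prob must lie strictly between 0 and 1")
    # Solve (1 - abundance)^N <= 1 - detect_prob for N; log1p keeps
    # precision when the abundance is very small.
    return math.ceil(math.log(1.0 - detect_prob) / math.log1p(-abundance))


# Hypothetical example: a transcript making up one read in a million.
n = reads_needed(1e-6, 0.95)
print(n, detection_probability(1e-6, n))
```

    Requiring more than one supporting read, or accounting for uneven coverage, would need a different model (for example a Poisson or negative binomial count), so the numbers from this sketch should be read as a lower bound under idealized sampling.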

  10. Muroid rodent phylogenetics: 900-species tree reveals increasing diversification rates

    PubMed Central

    Schenk, John J.

    2017-01-01

    We combined new sequence data for more than 300 muroid rodent species with our previously published sequences for up to five nuclear and one mitochondrial genes to generate the most widely and densely sampled hypothesis of evolutionary relationships across Muroidea. An exhaustive screening procedure for publicly available sequences was implemented to avoid the propagation of taxonomic errors that are common to supermatrix studies. The combined data set of carefully screened sequences derived from all available sequences on GenBank with our new data resulted in a robust maximum likelihood phylogeny for 900 of the approximately 1,620 muroids. Several regions that were equivocally resolved in previous studies are now more decisively resolved, and we estimated a chronogram using 28 fossil calibrations for the most integrated age and topological estimates to date. The results were used to update muroid classification and highlight questions needing additional data. We also compared the results of multigene supermatrix studies like this one with the principal published supertrees and concluded that the latter are unreliable for any comparative study in muroids. In addition, we explored diversification patterns as an explanation for why muroid rodents represent one of the most species-rich groups of mammals by detecting evidence for increasing net diversification rates through time across the muroid tree. We suggest the observation of increasing rates may be due to a combination of parallel increases in rate across clades and high average extinction rates. Five increased diversification-rate-shifts were inferred, suggesting that multiple, but perhaps not independent, events have led to the remarkable species diversity in the superfamily. Our results provide a phylogenetic framework for comparative studies that is not highly dependent upon the signal from any one gene. PMID:28813483

  11. Functional noncoding sequences derived from SINEs in the mammalian genome

    PubMed Central

    Nishihara, Hidenori; Smit, Arian F.A.; Okada, Norihiro

    2006-01-01

    Recent comparative analyses of mammalian sequences have revealed that a large number of nonprotein-coding genomic regions are under strong selective constraint. Here, we report that some of these loci have been derived from a newly defined family of ancient SINEs (short interspersed repetitive elements). This is a surprising result, as SINEs and other transposable elements are commonly thought to be genomic parasites. We named the ancient SINE family AmnSINE1, for Amniota SINE1, because we found it to be present in mammals as well as in birds, and some copies predate the mammalian-bird split 310 million years ago (Mya). AmnSINE1 has a chimeric structure of a 5S rRNA and a tRNA-derived SINE, and is related to five tRNA-derived SINE families that we characterized here in the coelacanth, dogfish shark, hagfish, and amphioxus genomes. All of the newly described SINE families have a common central domain that is also shared by zebrafish SINE3, and we collectively name them the DeuSINE (Deuterostomia SINE) superfamily. Notably, of the ∼1000 still identifiable copies of AmnSINE1 in the human genome, 105 correspond to loci phylogenetically highly conserved among mammalian orthologs. The conservation is strongest over the central domain. Thus, AmnSINE1 appears to be the best example of a transposable element of which a significant fraction of the copies have acquired genomic functionality. PMID:16717141

  12. A comparison of the enzymatic properties of the major cysteine proteinases from Trypanosoma congolense and Trypanosoma cruzi.

    PubMed

    Chagas, J R; Authie, E; Serveau, C; Lalmanach, G; Juliano, L; Gauthier, F

    1997-09-01

    Congopain and cruzipain, the major cysteine proteinases from Trypanosoma congolense and Trypanosoma cruzi, were compared for their activities towards a series of new, sensitive fluorogenic substrates of the papain family of cysteine proteinases and for their sensitivity to inhibition by cystatins and related biotinylated peptidyl diazomethanes. Low Ki values, in the 10 pM range, were found for the interaction of both proteinases with natural cystatin inhibitors. The kinetic constants for the hydrolysis of cystatin-derived substrates, and the inhibition by related diazomethanes were essentially identical. Unlike cathepsins B and L, the related mammal papain family proteinases, congopain and cruzipain accommodate a prolyl residue in P2'. Substrates having the sequence VGGP from P2 to P2' were hydrolysed by both congopain and cruzipain with a k(cat)/Km greater than 4 × 10^3 mM^-1 s^-1. Irreversible diazomethane inhibitors, deduced from the unprime sequence of cystatin-derived substrates, inhibited the two parasite proteinases. N-terminal labelling of diazomethanes with a biotin group did not alter the rate of inhibition significantly, which provides a useful tool for examining the distribution of these enzymes in the parasite and in the host. Despite their similar activities on cystatin-derived substrates, congopain and cruzipain had significantly different pH-activity profiles when assayed with a cystatin-derived substrate. They were correlated with structural differences, especially at the presumed S2 subsites.

  13. Identification of S-glutathionylation sites in species-specific proteins by incorporating five sequence-derived features into the general pseudo-amino acid composition.

    PubMed

    Zhao, Xiaowei; Ning, Qiao; Ai, Meiyue; Chai, Haiting; Yang, Guifu

    2016-06-07

    As a selective and reversible protein post-translational modification, S-glutathionylation generates mixed disulfides between glutathione (GSH) and cysteine residues, and plays an important role in regulating protein activity, stability, and redox regulation. To fully understand S-glutathionylation mechanisms, identification of substrates and specific S-glutathionylated sites is crucial. Experimental identification of S-glutathionylated sites is labor-intensive and time-consuming, so an effective computational method is highly desirable because of its convenience and speed. Therefore, in this study, a new bioinformatics tool named SSGlu (Species-Specific identification of Protein S-glutathionylation Sites) was developed to identify species-specific protein S-glutathionylated sites, utilizing support vector machines that combine multiple sequence-derived features with a two-step feature selection. By 5-fold cross validation, the performance of SSGlu was measured with an AUC of 0.8105 and 0.8041 for Homo sapiens and Mus musculus, respectively. Additionally, SSGlu was compared with the existing methods, and the higher MCC and AUC of SSGlu demonstrated that SSGlu is a promising tool for predicting S-glutathionylated sites. Furthermore, a site-specific analysis showed that S-glutathionylation intimately correlated with the features derived from its surrounding sites. The conclusions derived from this study might help to understand more of the S-glutathionylation mechanism and guide the related experimental validation. For public access, SSGlu is freely accessible at http://59.73.198.144:8080/SSGlu/. Copyright © 2016 Elsevier Ltd. All rights reserved.

  14. A SSR-based genetic linkage map of cultivated peanut (Arachis hypogaea L.)

    USDA-ARS's Scientific Manuscript database

    The objective of this study was to construct a molecular linkage map of cultivated tetraploid peanut using simple sequence repeat (SSR) markers derived primarily from peanut genomic sequences, expressed sequence tags (ESTs), and by "data mining" sequences released in GenBank. Three recombinant inbre...

  15. Application of whole genome shotgun sequencing for detection and characterization of genetically modified organisms and derived products.

    PubMed

    Holst-Jensen, Arne; Spilsberg, Bjørn; Arulandhu, Alfred J; Kok, Esther; Shi, Jianxin; Zel, Jana

    2016-07-01

    The emergence of high-throughput, massive or next-generation sequencing technologies has created a completely new foundation for molecular analyses. Various selective enrichment processes are commonly applied to facilitate detection of predefined (known) targets. Such approaches, however, inevitably introduce a bias and are prone to miss unknown targets. Here we review the application of high-throughput sequencing technologies and the preparation of fit-for-purpose whole genome shotgun sequencing libraries for the detection and characterization of genetically modified and derived products. The potential impact of these new sequencing technologies for the characterization, breeding selection, risk assessment, and traceability of genetically modified organisms and genetically modified products is yet to be fully acknowledged. The published literature is reviewed, and the prospects for future developments and use of the new sequencing technologies for these purposes are discussed.

  16. Generation and Analysis of a Large-Scale Expressed Sequence Tag Database from a Full-Length Enriched cDNA Library of Developing Leaves of Gossypium hirsutum L

    PubMed Central

    Pang, Chaoyou; Fan, Shuli; Song, Meizhen; Yu, Shuxun

    2013-01-01

    Background Cotton (Gossypium hirsutum L.) is one of the world’s most economically-important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. Methodology/Principal Findings In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representing 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes express preferentially in senescent leaves. Conclusions/Significance These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. These data will also facilitate future whole-genome sequence assembly and annotation in G. hirsutum and comparative genomics among Gossypium species. PMID:24146870

  17. ASFinder: a tool for genome-wide identification of alternatively splicing transcripts from EST-derived sequences.

    PubMed

    Min, Xiang Jia

    2013-01-01

    Expressed Sequence Tags (ESTs) are a rich resource for identifying Alternatively Splicing (AS) genes. The ASFinder webserver is designed to identify AS isoforms from EST-derived sequences. Two approaches are implemented in ASFinder. If no genomic sequences are provided, the server performs a local BLASTN to identify AS isoforms from ESTs having both ends aligned but an internal segment unaligned. Otherwise, ASFinder uses SIM4 to map ESTs to the genome, then the overlapping ESTs that are mapped to the same genomic locus and have internal variable exon/intron boundaries are identified as AS isoforms. The tool is available at http://proteomics.ysu.edu/tools/ASFinder.html.
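    The first approach described in this record (both ends of an EST aligned, an internal segment unaligned) can be illustrated with a small sketch. This is not ASFinder's code: the function name, the interval representation, and the `end_slack` and `min_gap` thresholds are hypothetical choices made only for illustration.

```python
def internal_unaligned_segment(query_len, aligned_blocks, end_slack=10, min_gap=20):
    """Return the first internal unaligned interval (1-based, inclusive), or None.

    aligned_blocks: list of (start, end) query coordinates covered by alignments.
    end_slack and min_gap are hypothetical thresholds, not ASFinder parameters.
    """
    if not aligned_blocks:
        return None
    blocks = sorted(aligned_blocks)
    # Both ends of the query must be aligned (within a small tolerance).
    starts_near_5prime = blocks[0][0] <= 1 + end_slack
    ends_near_3prime = max(end for _, end in blocks) >= query_len - end_slack
    if not (starts_near_5prime and ends_near_3prime):
        return None
    covered_to = blocks[0][1]
    for start, end in blocks[1:]:
        # A sufficiently long stretch between consecutive blocks is unaligned.
        if start - covered_to - 1 >= min_gap:
            return (covered_to + 1, start - 1)
        covered_to = max(covered_to, end)
    return None


# Hypothetical 500-nt EST whose ends align but whose middle does not.
candidate = internal_unaligned_segment(500, [(1, 200), (260, 500)])
```

    An EST flagged this way would only be a candidate; the sketch leaves out every further check a real pipeline would need.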

  18. Analysis of Post-Fire Vegetation Recovery in the Mediterranean Basin using MODIS Derived Vegetation Indices

    NASA Astrophysics Data System (ADS)

    Hawtree, Daniel; San Miguel, Jesus; Sedano, Fernando; Kempeneers, Pieter

    2010-05-01

    The Mediterranean basin region is highly susceptible to wildfire, with approximately 60,000 individual fires and half a million ha of natural vegetation burnt per year. Of particular concern in this region is the impact of repeated wildfires on the ability of natural lands to return to a pre-fire state, and of the possibility of desertification of semi-arid areas. Given these concerns, understanding the temporal patterns of vegetation recovery is important for the management of environmental resources in the region. Vegetation indices derived from remote sensing data are a valuable tool for evaluating these recovery patterns. Previous research on post-fire vegetation recovery conducted in this region has found significant variability in recovery times across different study sites. It is unclear what the primary variables are affecting the differences in the rates of recovery, and if any geographic patterns of behavior exist across the Mediterranean basin. This research has primarily been conducted using indices derived from Landsat imagery. However, no extensive analysis of vegetation regeneration for large regions has been published, and assessment of vegetation recovery on the basis of medium-spatial resolution imagery such as that of MODIS has not yet been analyzed. This study examines the temporal pattern of vegetation recovery in a number of fire sites in the Mediterranean basin, using data derived from MODIS 16-day composite vegetation indices. The intent is to develop a more complete picture of the temporal sequence of vegetation recovery, and to evaluate what additional factors impact variations in the recovery sequence. In addition, this study evaluates the utility of using MODIS derived vegetation indices for regeneration studies, and compares the findings to earlier studies which rely on Landsat data. Wildfires occurring between the years 2000 and 2004 were considered as potential study sites for this research. Using the EFFIS dataset, all wildfires covering an area of at least 1,000 ha were identified. The land-cover / land-use of these large fire sites were then evaluated using the CORINE land-cover data set, and the sites dominated primarily by natural vegetation were identified. Once these candidate sites were identified, a subset was selected across a range of locations and site characteristics for post-fire recovery analysis. To evaluate the post-fire recovery sequence in these locations, time-series of NDVI, EVI, and LAI were derived using 250 meter resolution MODIS data (MOD13Q). The vegetation index values were then compared to pre-fire values to determine recovery relative to the pre-fire vegetative state. The variability in rates of recovery is then considered with respect to moisture availability, vegetation type, and local site conditions to evaluate if any patterns of recovery can be determined.
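    As a rough illustration of comparing post-fire index values with pre-fire values, the sketch below computes NDVI with its standard definition, (NIR - Red) / (NIR + Red), and expresses each post-fire value as a fraction of the mean pre-fire value. The ratio-to-baseline measure, the function names, and the sample numbers are assumptions for illustration; the abstract does not say how the authors quantified recovery.

```python
from statistics import mean


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def relative_recovery(post_fire_series, pre_fire_series):
    """Express each post-fire index value as a fraction of the pre-fire mean.

    Assumed recovery measure (ratio to baseline), not the study's own metric.
    """
    baseline = mean(pre_fire_series)
    return [value / baseline for value in post_fire_series]


# Hypothetical NDVI values for one fire site (not data from the study).
pre_fire = [0.60, 0.62, 0.58]
post_fire = [0.20, 0.35, 0.50]
recovery = relative_recovery(post_fire, pre_fire)
```

    A value near 1.0 in `recovery` would indicate that the index has returned to roughly its pre-fire level under this assumed measure.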

  19. Dense depth maps from correspondences derived from perceived motion

    NASA Astrophysics Data System (ADS)

    Kirby, Richard; Whitaker, Ross

    2017-01-01

    Many computer vision applications require finding corresponding points between images and using the corresponding points to estimate disparity. Today's correspondence finding algorithms primarily use image features or pixel intensities common between image pairs. Some 3-D computer vision applications, however, do not produce the desired results using correspondences derived from image features or pixel intensities. Two examples are the multimodal camera rig and the center region of a coaxial camera rig. We present an image correspondence finding technique that aligns pairs of image sequences using optical flow fields. The optical flow fields provide information about the structure and motion of the scene, which are not available in still images but can be used in image alignment. We apply the technique to a dual focal length stereo camera rig consisting of a visible light-infrared camera pair and to a coaxial camera rig. We test our method on real image sequences and compare our results with the state-of-the-art multimodal and structure from motion (SfM) algorithms. Our method produces more accurate depth and scene velocity reconstruction estimates than the state-of-the-art multimodal and SfM algorithms.

  20. Impact of inoculum sources on biotransformation of pharmaceuticals and personal care products.

    PubMed

    Kim, Sunah; Rossmassler, Karen; Broeckling, Corey D; Galloway, Sarah; Prenni, Jessica; De Long, Susan K

    2017-11-15

    Limited knowledge of optimal microbial community composition for PPCP biotreatment, and of the microbial phylotypes that drive biotransformation within mixed microbial communities, has hindered the rational design and operation of effective and reliable biological PPCP treatment technologies. Herein, bacterial community composition was investigated as an isolated variable within batch biofilm reactors via comparison of PPCP removals for three distinct inocula. Inocula pre-acclimated to model PPCPs were derived from activated sludge (AS), ditch sediment historically-impacted by wastewater treatment plant effluent (Sd), and material from laboratory-scale soil aquifer treatment (SAT) columns. PPCP removals were found to be substantially higher for AS- and Sd-derived inocula compared to the SAT-derived inocula despite comparable biomass. Removal patterns differed among the 6 model compounds examined (diclofenac, 5-fluorouracil, gabapentin, gemfibrozil, ibuprofen, and triclosan) indicating differences in biotransformation mechanisms. Sphingomonas, Beijerinckia, Methylophilus, and unknown Cytophagaceae were linked with successful PPCP biodegradation via next-generation sequencing of 16S rRNA genes over time. Results indicate the criticality of applying engineering approaches to control bacterial community compositions in biotreatment systems. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. Short-term application of dexamethasone on stem cells derived from human gingiva reduces the expression of RUNX2 and β-catenin.

    PubMed

    Kim, Bo-Bae; Kim, Minji; Park, Yun-Hee; Ko, Youngkyung; Park, Jun-Beom

    2017-06-01

    Objective Next-generation sequencing was performed to evaluate the effects of short-term application of dexamethasone on human gingiva-derived mesenchymal stem cells. Methods Human gingiva-derived stem cells were treated with a final concentration of 10^-7 M dexamethasone and the same concentration of vehicle control. This was followed by mRNA sequencing and data analysis, gene ontology and pathway analysis, quantitative real-time polymerase chain reaction of mRNA, and western blot analysis of RUNX2 and β-catenin. Results In total, 26,364 mRNAs were differentially expressed. Comparison of the results of dexamethasone versus control at 2 hours revealed that 7 mRNAs were upregulated and 25 mRNAs were downregulated. The application of dexamethasone reduced the expression of RUNX2 and β-catenin in human gingiva-derived mesenchymal stem cells. Conclusion The effects of dexamethasone on stem cells were evaluated with mRNA sequencing, and validation of the expression was performed with quantitative real-time polymerase chain reaction and western blot analysis. The results of this study can provide new insights into the role of mRNA sequencing in maxillofacial areas.

  2. A new RT-PCR assay for the identification of the predominant recombination types in 2C and 3D genomic regions of vaccine-derived poliovirus strains.

    PubMed

    Pliaka, V; Dedepsidis, E; Kyriakopoulou, Z; Mpirli, K; Tsakogiannis, D; Pratti, A; Levidiotou-Stefanou, S; Markoulatos, P

    2010-06-01

    In the post-eradication era of wild polioviruses, the only remaining sources of poliovirus infection worldwide would be the vaccine-derived polioviruses (VDPVs). As the preponderance of countries certified to be polio-free has switched from OPV (oral poliovirus vaccine) to IPV (inactivated poliovirus vaccine), importation of recombinant evolved derivatives of vaccinal strains would have serious implication for public health. To test the robustness of the proposed RT-PCR screening analysis, eleven recombinant vaccine-derived polioviruses that were characterized previously by sequencing by our group, in addition to three recently identified recombinant environmental isolates were assayed. Although the most definitive characterization of VDPVs is by genomic sequencing, in this study we describe a new, inexpensive and broadly applicable RT-PCR assay for the identification of the predominant recombination types S3/Sx in 2C and S2/Sx in 3D genomic regions respectively of VDPVs, that can be readily implemented in laboratories lacking sequencing facilities as a first approach for the early detection of vaccine-derived poliovirus (VDPVs).

  3. Molecular diversity, cultivation, and improved detection by fluorescent in situ hybridization of a dominant group of human gut bacteria related to Roseburia spp. or Eubacterium rectale.

    PubMed

    Aminov, Rustam I; Walker, Alan W; Duncan, Sylvia H; Harmsen, Hermie J M; Welling, Gjalt W; Flint, Harry J

    2006-09-01

    Phylogenetic analysis was used to compare 16S rRNA sequences from 19 cultured human gut strains of Roseburia and Eubacterium rectale with 356 related sequences derived from clone libraries. The cultured strains were found to represent five of the six phylotypes identified. A new oligonucleotide probe, Rrec584, and the previous group probe Rint623, when used in conjunction with a new helper oligonucleotide, each recognized an average of 7% of bacteria detected by the eubacterial probe Eub338 in feces from 10 healthy volunteers. Most of the diversity within this important group of butyrate-producing gut bacteria can apparently be retrieved through cultivation.

  4. Molecular Diversity, Cultivation, and Improved Detection by Fluorescent In Situ Hybridization of a Dominant Group of Human Gut Bacteria Related to Roseburia spp. or Eubacterium rectale

    PubMed Central

    Aminov, Rustam I.; Walker, Alan W.; Duncan, Sylvia H.; Harmsen, Hermie J. M.; Welling, Gjalt W.; Flint, Harry J.

    2006-01-01

    Phylogenetic analysis was used to compare 16S rRNA sequences from 19 cultured human gut strains of Roseburia and Eubacterium rectale with 356 related sequences derived from clone libraries. The cultured strains were found to represent five of the six phylotypes identified. A new oligonucleotide probe, Rrec584, and the previous group probe Rint623, when used in conjunction with a new helper oligonucleotide, each recognized an average of 7% of bacteria detected by the eubacterial probe Eub338 in feces from 10 healthy volunteers. Most of the diversity within this important group of butyrate-producing gut bacteria can apparently be retrieved through cultivation. PMID:16957265

  5. Pathogenicity, sequence and phylogenetic analysis of Malaysian Chicken anaemia virus obtained after low and high passages in MSB-1 cells.

    PubMed

    Chowdhury, S M Z H; Omar, A R; Aini, I; Hair-Bejo, M; Jamaluddin, A A; Md-Zain, B M; Kono, Y

    2003-12-01

    Specific-pathogen-free (SPF) chickens inoculated with low passage Chicken anaemia virus (CAV), SMSC-1 and 3-1 isolates produced lesions suggestive of CAV infection. Repeated passages of the isolates in cell culture until passage 60 (P60) and passage 123 produced viruses that showed a significantly reduced level of pathogenicity in SPF chickens compared to the low passage isolates. Sequence comparison indicated that nucleotide changes in only the coding region of the P60 passage isolates were thought to contribute to virus attenuation. Phylogenetic analysis indicated that SMSC-1 and 3-1 were highly divergent, but their P60 passage derivatives shared significant homology to a Japanese isolate A2.

  6. Transcriptome-based differentiation of closely-related Miscanthus lines.

    PubMed

    Chouvarine, Philippe; Cooksey, Amanda M; McCarthy, Fiona M; Ray, David A; Baldwin, Brian S; Burgess, Shane C; Peterson, Daniel G

    2012-01-01

    Distinguishing between individuals is critical to those conducting animal/plant breeding, food safety/quality research, diagnostic and clinical testing, and evolutionary biology studies. Classical genetic identification studies are based on marker polymorphisms, but polymorphism-based techniques are time and labor intensive and often cannot distinguish between closely related individuals. Illumina sequencing technologies provide the detailed sequence data required for rapid and efficient differentiation of related species, lines/cultivars, and individuals in a cost-effective manner. Here we describe the use of Illumina high-throughput exome sequencing, coupled with SNP mapping, as a rapid means of distinguishing between related cultivars of the lignocellulosic bioenergy crop giant miscanthus (Miscanthus × giganteus). We provide the first exome sequence database for Miscanthus species complete with Gene Ontology (GO) functional annotations. A SNP comparative analysis of rhizome-derived cDNA sequences was successfully utilized to distinguish three Miscanthus × giganteus cultivars from each other and from other Miscanthus species. Moreover, the resulting phylogenetic tree generated from SNP frequency data parallels the known breeding history of the plants examined. Some of the giant miscanthus plants exhibit considerable sequence divergence. Here we describe an analysis of Miscanthus in which high-throughput exome sequencing was utilized to differentiate between closely related genotypes despite the current lack of a reference genome sequence. We functionally annotated the exome sequences and provide resources to support Miscanthus systems biology. In addition, we demonstrate the use of the commercial high-performance cloud computing to do computational GO annotation.

  7. Multidrug Resistance-Associated Protein 3 (Mrp3/Abcc3/Moat-D) Is Expressed in the SAE Squalus acanthias Shark Embryo–Derived Cell Line

    PubMed Central

    Kobayashi, Hiroshi; Parton, Angela; Czechanski, Anne; Durkin, Christopher; Kong, Chi-Chon; Barnes, David

    2008-01-01

    The multidrug resistance-associated protein 3 (MRP3/Mrp3) is a member of the ATP-binding cassette (ABC) protein family of membrane transporters and related proteins that act on a variety of xenobiotic and anionic molecules to transfer these substrates in an ATP-dependent manner. In recent years, useful comparative information regarding evolutionarily conserved structure and transport functions of these proteins has accrued through the use of primitive marine animals such as cartilaginous fish. Until recently, one missing tool in comparative studies with cartilaginous fish was cell culture. We have derived from the embryo of Squalus acanthias, the spiny dogfish shark, the S. acanthias embryo (SAE) mesenchymal stem cell line. This is the first continuously proliferating cell line from a cartilaginous fish. We identified expression of Mrp3 in this cell line, cloned the molecule, and examined molecular and cellular physiological aspects of the protein. Shark Mrp3 is characterized by three membrane-spanning domains and two nucleotide-binding domains. Multiple alignments with other species showed that the shark Mrp3 amino acid sequence was well conserved. The shark sequence was overall 64% identical to human MRP3, 72% identical to chicken Mrp3, and 71% identical to frog and stickleback Mrp3. Highest identity between shark and human amino acid sequence (82%) was seen in the carboxyl-terminal nucleotide-binding domain of the proteins. Cell culture experiments showed that mRNA for the protein was induced as much as 25-fold by peptide growth factors, fetal bovine serum, and lipid nutritional components, with the largest effect mediated by a combination of lipids including unsaturated and saturated fatty acids, cholesterol, and vitamin E. PMID:18284333

  8. Multidrug resistance-associated protein 3 (Mrp3/Abcc3/Moat-D) is expressed in the SAE Squalus acanthias shark embryo-derived cell line.

    PubMed

    Kobayashi, Hiroshi; Parton, Angela; Czechanski, Anne; Durkin, Christopher; Kong, Chi-Chon; Barnes, David

    2007-01-01

    The multidrug resistance-associated protein 3 (MRP3/Mrp3) is a member of the ATP-binding cassette (ABC) protein family of membrane transporters and related proteins that act on a variety of xenobiotic and anionic molecules to transfer these substrates in an ATP-dependent manner. In recent years, useful comparative information regarding evolutionarily conserved structure and transport functions of these proteins has accrued through the use of primitive marine animals such as cartilaginous fish. Until recently, one missing tool in comparative studies with cartilaginous fish was cell culture. We have derived from the embryo of Squalus acanthias, the spiny dogfish shark, the S. acanthias embryo (SAE) mesenchymal stem cell line. This is the first continuously proliferating cell line from a cartilaginous fish. We identified expression of Mrp3 in this cell line, cloned the molecule, and examined molecular and cellular physiological aspects of the protein. Shark Mrp3 is characterized by three membrane-spanning domains and two nucleotide-binding domains. Multiple alignments with other species showed that the shark Mrp3 amino acid sequence was well conserved. The shark sequence was overall 64% identical to human MRP3, 72% identical to chicken Mrp3, and 71% identical to frog and stickleback Mrp3. Highest identity between shark and human amino acid sequence (82%) was seen in the carboxyl-terminal nucleotide-binding domain of the proteins. Cell culture experiments showed that mRNA for the protein was induced as much as 25-fold by peptide growth factors, fetal bovine serum, and lipid nutritional components, with the largest effect mediated by a combination of lipids including unsaturated and saturated fatty acids, cholesterol, and vitamin E.

  9. Integrating motion, illumination, and structure in video sequences with applications in illumination-invariant tracking.

    PubMed

    Xu, Yilei; Roy-Chowdhury, Amit K

    2007-05-01

    In this paper, we present a theory for combining the effects of motion, illumination, 3D structure, albedo, and camera parameters in a sequence of images obtained by a perspective camera. We show that the set of all Lambertian reflectance functions of a moving object, at any position, illuminated by arbitrarily distant light sources, lies "close" to a bilinear subspace consisting of nine illumination variables and six motion variables. This result implies that, given an arbitrary video sequence, it is possible to recover the 3D structure, motion, and illumination conditions simultaneously using the bilinear subspace formulation. The derivation builds upon existing work on linear subspace representations of reflectance by generalizing it to moving objects. Lighting can change slowly or suddenly, locally or globally, and can originate from a combination of point and extended sources. We experimentally compare the results of our theory with ground truth data and also provide results on real data by using video sequences of a 3D face and the entire human body with various combinations of motion and illumination directions. We also show results of our theory in estimating 3D motion and illumination model parameters from a video sequence.

  10. Probing the Rare Biosphere of the North-West Mediterranean Sea: An Experiment with High Sequencing Effort.

    PubMed

    Crespo, Bibiana G; Wallhead, Philip J; Logares, Ramiro; Pedrós-Alió, Carlos

    2016-01-01

    High-throughput sequencing (HTS) techniques have suggested the existence of a wealth of species with very low relative abundance: the rare biosphere. We attempted to exhaustively map this rare biosphere in two water samples by performing an exceptionally deep pyrosequencing analysis (~500,000 final reads per sample). Species data were derived by a 97% identity criterion and various parametric distributions were fitted to the observed counts. Using the best-fitting Sichel distribution we estimate a total species richness of 1,568-1,669 (95% Credible Interval) and 5,027-5,196 for surface and deep water samples respectively, implying that 84-89% of the total richness in those two samples was sequenced, and we predict that a quadrupling of the present sequencing effort would suffice to observe 90% of the total richness in both samples. Comparing the HTS results with a culturing approach we found that most of the cultured taxa were not obtained by HTS, despite the high sequencing effort. Culturing therefore remains a useful tool for uncovering marine bacterial diversity, in addition to its other uses for studying the ecology of marine bacteria.

  11. Robust temporal alignment of multimodal cardiac sequences

    NASA Astrophysics Data System (ADS)

    Perissinotto, Andrea; Queirós, Sandro; Morais, Pedro; Baptista, Maria J.; Monaghan, Mark; Rodrigues, Nuno F.; D'hooge, Jan; Vilaça, João. L.; Barbosa, Daniel

    2015-03-01

    Given the dynamic nature of cardiac function, correct temporal alignment of pre-operative models and intraoperative images is crucial for augmented reality in cardiac image-guided interventions. As such, the current study focuses on the development of an image-based strategy for temporal alignment of multimodal cardiac imaging sequences, such as cine Magnetic Resonance Imaging (MRI) or 3D Ultrasound (US). First, we derive a robust, modality-independent signal from the image sequences, estimated by computing the normalized cross-correlation between each frame in the temporal sequence and the end-diastolic frame. This signal resembles the left-ventricle (LV) volume curve over time, whose variation indicates different temporal landmarks of the cardiac cycle. We then perform the temporal alignment of these surrogate signals derived from MRI and US sequences of the same patient through Dynamic Time Warping (DTW), allowing the two sequences to be synchronized. The proposed framework was evaluated in 98 patients who had undergone both 3D+t MRI and US scans. The end-systolic frame could be accurately estimated as the minimum of the image-derived surrogate signal, presenting a relative error of 1.6 +/- 1.9% and 4.0 +/- 4.2% for the MRI and US sequences, respectively, thus supporting its association with key temporal instants of the cardiac cycle. The use of DTW reduces the desynchronization of the cardiac events in MRI and US sequences, allowing temporal alignment of multimodal cardiac imaging sequences. Overall, a generic, fast and accurate method for temporal synchronization of MRI and US sequences of the same patient was introduced. This approach could be straightforwardly used for the correct temporal alignment of pre-operative MRI information and intra-operative US images.
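    The two steps named in this record, a normalized cross-correlation (NCC) signal against the end-diastolic frame followed by Dynamic Time Warping, can be sketched in plain Python. This is a minimal textbook version under stated assumptions, not the authors' implementation: frames are assumed to be flattened lists of intensities, the end-diastolic frame is assumed to be at a known index, and the function names are hypothetical.

```python
import math


def ncc(frame_a, frame_b):
    """Normalized cross-correlation of two equally sized, flattened frames."""
    mean_a = sum(frame_a) / len(frame_a)
    mean_b = sum(frame_b) / len(frame_b)
    da = [v - mean_a for v in frame_a]
    db = [v - mean_b for v in frame_b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    return sum(p * q for p, q in zip(da, db)) / denom if denom > 0 else 0.0


def surrogate_signal(frames, ed_index=0):
    """NCC of every frame against the (assumed known) end-diastolic frame."""
    reference = frames[ed_index]
    return [ncc(frame, reference) for frame in frames]


def dtw(x, y):
    """Classic DTW: return (total cost, warping path as index pairs)."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end to recover which samples are matched.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        options = [cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]]
        step = options.index(min(options))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return cost[n][m], path


# Hypothetical surrogate signals sampled at different frame rates.
signal_mri = [0.0, 1.0, 2.0, 1.0, 0.0]
signal_us = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
total_cost, path = dtw(signal_mri, signal_us)
```

    Each pair in `path` matches a frame index in one sequence to a frame index in the other, which is the synchronization the abstract refers to.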

  12. Cloning and expression of a cDNA coding for a human monocyte-derived plasminogen activator inhibitor.

    PubMed

    Antalis, T M; Clark, M A; Barnes, T; Lehrbach, P R; Devine, P L; Schevzov, G; Goss, N H; Stephens, R W; Tolstoshev, P

    1988-02-01

    Human monocyte-derived plasminogen activator inhibitor (mPAI-2) was purified to homogeneity from the U937 cell line and partially sequenced. Oligonucleotide probes derived from this sequence were used to screen a cDNA library prepared from U937 cells. One positive clone was sequenced and contained most of the coding sequence as well as a long incomplete 3' untranslated region (1112 base pairs). This cDNA sequence was shown to encode mPAI-2 by hybrid-select translation. A cDNA clone encoding the remainder of the mPAI-2 mRNA was obtained by primer extension of U937 poly(A)+ RNA using a probe complementary to the mPAI-2 coding region. The coding sequence for mPAI-2 was placed under the control of the lambda PL promoter, and the protein expressed in Escherichia coli formed a complex with urokinase that could be detected immunologically. By nucleotide sequence analysis, mPAI-2 cDNA encodes a protein containing 415 amino acids with a predicted unglycosylated Mr of 46,543. The predicted amino acid sequence of mPAI-2 is very similar to placental PAI-2 (3 amino acid differences) and shows extensive homology with members of the serine protease inhibitor (serpin) superfamily. mPAI-2 was found to be more homologous to ovalbumin (37%) than the endothelial plasminogen activator inhibitor, PAI-1 (26%). Like ovalbumin, mPAI-2 appears to have no typical amino-terminal signal sequence. The 3' untranslated region of the mPAI-2 cDNA contains a putative regulatory sequence that has been associated with the inflammatory mediators.

  13. Cloning and expression of a cDNA coding for a human monocyte-derived plasminogen activator inhibitor.

    PubMed Central

    Antalis, T M; Clark, M A; Barnes, T; Lehrbach, P R; Devine, P L; Schevzov, G; Goss, N H; Stephens, R W; Tolstoshev, P

    1988-01-01

    Human monocyte-derived plasminogen activator inhibitor (mPAI-2) was purified to homogeneity from the U937 cell line and partially sequenced. Oligonucleotide probes derived from this sequence were used to screen a cDNA library prepared from U937 cells. One positive clone was sequenced and contained most of the coding sequence as well as a long incomplete 3' untranslated region (1112 base pairs). This cDNA sequence was shown to encode mPAI-2 by hybrid-select translation. A cDNA clone encoding the remainder of the mPAI-2 mRNA was obtained by primer extension of U937 poly(A)+ RNA using a probe complementary to the mPAI-2 coding region. The coding sequence for mPAI-2 was placed under the control of the lambda PL promoter, and the protein expressed in Escherichia coli formed a complex with urokinase that could be detected immunologically. By nucleotide sequence analysis, mPAI-2 cDNA encodes a protein containing 415 amino acids with a predicted unglycosylated Mr of 46,543. The predicted amino acid sequence of mPAI-2 is very similar to placental PAI-2 (3 amino acid differences) and shows extensive homology with members of the serine protease inhibitor (serpin) superfamily. mPAI-2 was found to be more homologous to ovalbumin (37%) than the endothelial plasminogen activator inhibitor, PAI-1 (26%). Like ovalbumin, mPAI-2 appears to have no typical amino-terminal signal sequence. The 3' untranslated region of the mPAI-2 cDNA contains a putative regulatory sequence that has been associated with the inflammatory mediators. Images PMID:3257578

  14. A Benchmark Study on Error Assessment and Quality Control of CCS Reads Derived from the PacBio RS

    PubMed Central

    Jiao, Xiaoli; Zheng, Xin; Ma, Liang; Kutty, Geetha; Gogineni, Emile; Sun, Qiang; Sherman, Brad T.; Hu, Xiaojun; Jones, Kristine; Raley, Castle; Tran, Bao; Munroe, David J.; Stephens, Robert; Liang, Dun; Imamichi, Tomozumi; Kovacs, Joseph A.; Lempicki, Richard A.; Huang, Da Wei

    2013-01-01

    PacBio RS, a newly emerging third-generation DNA sequencing platform, is based on a real-time, single-molecule, nano-nitch sequencing technology that can generate very long reads (up to 20-kb) in contrast to the shorter reads produced by the first and second generation sequencing technologies. As a new platform, it is important to assess the sequencing error rate, as well as the quality control (QC) parameters associated with the PacBio sequence data. In this study, a mixture of 10 prior known, closely related DNA amplicons were sequenced using the PacBio RS sequencing platform. After aligning Circular Consensus Sequence (CCS) reads derived from the above sequencing experiment to the known reference sequences, we found that the median error rate was 2.5% without read QC, and improved to 1.3% with an SVM based multi-parameter QC method. In addition, a De Novo assembly was used as a downstream application to evaluate the effects of different QC approaches. This benchmark study indicates that even though CCS reads are post error-corrected it is still necessary to perform appropriate QC on CCS reads in order to produce successful downstream bioinformatics analytical results. PMID:24179701
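
    As a rough illustration of how a median per-read error rate could be computed once reads are paired with their known references, the sketch below uses plain edit distance divided by reference length. That definition of error rate is an assumption made for the example; the abstract does not specify the alignment tool or the exact formula, and the function names are hypothetical.

```python
from statistics import median

def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions, deletions all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # deletion from a
                           cur[j - 1] + 1,           # insertion into a
                           prev[j - 1] + (x != y)))  # match or substitution
        prev = cur
    return prev[-1]

def read_error_rate(read, reference):
    """Errors per reference base (assumed definition; reference must be non-empty)."""
    return edit_distance(read, reference) / len(reference)

def median_error_rate(reads, reference):
    """Median of the per-read error rates against one reference."""
    return median(read_error_rate(r, reference) for r in reads)
```

    A QC step of the kind the abstract describes would filter `reads` before calling `median_error_rate`; the filter itself (an SVM in the study) is not reproduced here.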

  15. A Benchmark Study on Error Assessment and Quality Control of CCS Reads Derived from the PacBio RS.

    PubMed

    Jiao, Xiaoli; Zheng, Xin; Ma, Liang; Kutty, Geetha; Gogineni, Emile; Sun, Qiang; Sherman, Brad T; Hu, Xiaojun; Jones, Kristine; Raley, Castle; Tran, Bao; Munroe, David J; Stephens, Robert; Liang, Dun; Imamichi, Tomozumi; Kovacs, Joseph A; Lempicki, Richard A; Huang, Da Wei

    2013-07-31

    PacBio RS, a newly emerging third-generation DNA sequencing platform, is based on a real-time, single-molecule, nano-nitch sequencing technology that can generate very long reads (up to 20-kb) in contrast to the shorter reads produced by the first and second generation sequencing technologies. As a new platform, it is important to assess the sequencing error rate, as well as the quality control (QC) parameters associated with the PacBio sequence data. In this study, a mixture of 10 prior known, closely related DNA amplicons were sequenced using the PacBio RS sequencing platform. After aligning Circular Consensus Sequence (CCS) reads derived from the above sequencing experiment to the known reference sequences, we found that the median error rate was 2.5% without read QC, and improved to 1.3% with an SVM based multi-parameter QC method. In addition, a De Novo assembly was used as a downstream application to evaluate the effects of different QC approaches. This benchmark study indicates that even though CCS reads are post error-corrected it is still necessary to perform appropriate QC on CCS reads in order to produce successful downstream bioinformatics analytical results.

  16. Differential recognition of the ORF2 region in a complete genome sequence of porcine circovirus type 2 (PCV2) isolated from boar bone marrow in Korea.

    PubMed

    Kweon, Chang-Hee; Nguyen, Lien Thi Kim; Yoo, Mi-Sun; Kang, Seung-Won

    2015-09-15

    Porcine circovirus type 2 (PCV2) is the causative agent of post-weaning multisystemic wasting syndrome (PMWS) in swine. Here, a phylogenetic tree was constructed using PCV2 nucleotide sequences derived from the bone marrow of Korean boar and previously reported PCV2 sequences isolated from various countries. PCV2 from Korean boar bone marrow (KC188796) was classified into the group containing PCV2a-Canada and other PCV2 strains from Korea. While the ORF1 region of the PCV2 genome was highly conserved, ORF2 (the capsid protein coding region) was relatively variable. The nucleotide sequences for bone marrow-derived PCV2 were 93.4-99.0% homologous to the other reference sequences. The deduced amino acid sequences for the ORF1 and ORF2 coding regions were 97.4-99.3% and 84.5-97.4% homologous with the other reference strains, respectively, indicating that KC188796 did not differ markedly from the other PCV2 strains. Phylogenetic analysis demonstrated that bone marrow-derived PCV2 was highly similar to PCV2a from Canada and may be related to persistent PCV2 infections in swine. Copyright © 2015 Elsevier B.V. All rights reserved.

  17. Theileria parva antigens recognized by CD8+ T cells show varying degrees of diversity in buffalo-derived infected cell lines.

    PubMed

    Sitt, Tatjana; Pelle, Roger; Chepkwony, Maurine; Morrison, W Ivan; Toye, Philip

    2018-05-06

    The extent of sequence diversity among the genes encoding 10 antigens (Tp1-10) known to be recognized by CD8+ T lymphocytes from cattle immune to Theileria parva was analysed. The sequences were derived from parasites in 23 buffalo-derived cell lines, three cattle-derived isolates and one cloned cell line obtained from a buffalo-derived stabilate. The results revealed substantial variation among the antigens through sequence diversity. The greatest nucleotide and amino acid diversity were observed in Tp1, Tp2 and Tp9. Tp5 and Tp7 showed the least amount of allelic diversity, and Tp5, Tp6 and Tp7 had the lowest levels of protein diversity. Tp6 was the most conserved protein; only a single non-synonymous substitution was found in all obtained sequences. The ratio of non-synonymous: synonymous substitutions varied from 0.84 (Tp1) to 0.04 (Tp6). Apart from Tp2 and Tp9, we observed no variation in the other defined CD8+ T cell epitopes (Tp4, 5, 7 and 8), indicating that epitope variation is not a universal feature of T. parva antigens. In addition to providing markers that can be used to examine the diversity in T. parva populations, the results highlight the potential for using conserved antigens to develop vaccines that provide broad protection against T. parva.

  18. Construction of an Integrated High Density Simple Sequence Repeat Linkage Map in Cultivated Strawberry (Fragaria × ananassa) and its Applicability

    PubMed Central

    Isobe, Sachiko N.; Hirakawa, Hideki; Sato, Shusei; Maeda, Fumi; Ishikawa, Masami; Mori, Toshiki; Yamamoto, Yuko; Shirasawa, Kenta; Kimura, Mitsuhiro; Fukami, Masanobu; Hashizume, Fujio; Tsuji, Tomoko; Sasamoto, Shigemi; Kato, Midori; Nanri, Keiko; Tsuruoka, Hisano; Minami, Chiharu; Takahashi, Chika; Wada, Tsuyuko; Ono, Akiko; Kawashima, Kumiko; Nakazaki, Naomi; Kishida, Yoshie; Kohara, Mitsuyo; Nakayama, Shinobu; Yamada, Manabu; Fujishiro, Tsunakazu; Watanabe, Akiko; Tabata, Satoshi

    2013-01-01

    The cultivated strawberry (Fragaria × ananassa) is an octoploid (2n = 8x = 56) of the Rosaceae family whose genomic architecture is still controversial. Several recent studies support the AAA′A′BBB′B′ model, but its complexity has hindered genetic and genomic analysis of this important crop. To overcome this difficulty and to assist genome-wide analysis of F. × ananassa, we constructed an integrated linkage map by organizing a total of 4474 simple sequence repeat (SSR) markers collected from published Fragaria sequences, including 3746 SSR markers [Fragaria vesca expressed sequence tag (EST)-derived SSR markers] derived from F. vesca ESTs, 603 markers (F. × ananassa EST-derived SSR markers) from F. × ananassa ESTs, and 125 markers (F. × ananassa transcriptome-derived SSR markers) from F. × ananassa transcripts. Along with the previously published SSR markers, these markers were mapped onto five parent-specific linkage maps derived from three mapping populations, which were then assembled into an integrated linkage map. The constructed map consists of 1856 loci in 28 linkage groups (LGs) that total 2364.1 cM in length. Macrosynteny at the chromosome level was observed between the LGs of F. × ananassa and the genome of F. vesca. Variety distinction on 129 F. × ananassa lines was demonstrated using 45 selected SSR markers. PMID:23248204

  19. The Complete Chloroplast and Mitochondrial Genome Sequences of Boea hygrometrica: Insights into the Evolution of Plant Organellar Genomes

    PubMed Central

    Wang, Xumin; Deng, Xin; Zhang, Xiaowei; Hu, Songnian; Yu, Jun

    2012-01-01

    The complete nucleotide sequences of the chloroplast (cp) and mitochondrial (mt) genomes of the resurrection plant Boea hygrometrica (Bh, Gesneriaceae) have been determined with the lengths of 153,493 bp and 510,519 bp, respectively. The smaller chloroplast genome contains more genes (147) with a 72% coding sequence, and the larger mitochondrial genome has fewer genes (65) with a coding fraction of 12%. Similar to other seed plants, the Bh cp genome has a typical quadripartite organization with a conserved gene in each region. The Bh mt genome has three recombinant sequence repeats of 222 bp, 843 bp, and 1474 bp in length, which divide the genome into a single master circle (MC) and four isomeric molecules. Compared to other angiosperms, one remarkable feature of the Bh mt genome is the frequent transfer of genetic material from the cp genome during recent Bh evolution. We also analyzed organellar genome evolution in general regarding genome features as well as compositional dynamics of sequence and gene structure/organization, providing clues for the understanding of the evolution of organellar genomes in plants. The cp-derived sequences including tRNAs found in angiosperm mt genomes support the conclusion that frequent gene transfer events may have begun early in the land plant lineage. PMID:22291979

  20. Bacterial population dynamics during the ensiling of Medicago sativa (alfalfa) and subsequent exposure to air.

    PubMed

    McGarvey, J A; Franco, R B; Palumbo, J D; Hnasko, R; Stanker, L; Mitloehner, F M

    2013-06-01

    To describe, at high resolution, the bacterial population dynamics and chemical transformations during the ensiling of alfalfa and subsequent exposure to air. Samples of alfalfa, ensiled alfalfa and silage exposed to air were collected and their bacterial population structures compared using 16S rRNA gene libraries containing approximately 1900 sequences each. Cultural and chemical analyses were also performed to complement the 16S gene sequence data. Sequence analysis revealed significant differences (P < 0·05) in the bacterial populations at each time point. The alfalfa-derived library contained mostly sequences associated with the Gammaproteobacteria (including the genera: Enterobacter, Erwinia and Pantoea); the ensiled material contained mostly sequences associated with the lactic acid bacteria (LAB) (including the genera: Lactobacillus, Pediococcus and Lactococcus). Exposure to air resulted in even greater percentages of LAB, especially among the genus Lactobacillus, and a significant drop in bacterial diversity. In-depth 16S rRNA gene sequence analysis revealed significant bacterial population structure changes during ensiling and again during exposure to air. This in-depth description of the bacterial population dynamics that occurred during ensiling and simulated feed out expands our knowledge of these processes. © 2013 The Society for Applied Microbiology No claim to US Government works.

  1. Comprehensive Survey of Genetic Diversity in Chloroplast Genomes and 45S nrDNAs within Panax ginseng Species

    PubMed Central

    Kim, Kyunghee; Lee, Sang-Choon; Lee, Junki; Lee, Hyun Oh; Joh, Ho Jun; Kim, Nam-Hoon; Park, Hyun-Seung; Yang, Tae-Jin

    2015-01-01

    We report complete sequences of chloroplast (cp) genome and 45S nuclear ribosomal DNA (45S nrDNA) for 11 Panax ginseng cultivars. We have obtained complete sequences of cp and 45S nrDNA, the representative barcoding target sequences for cytoplasm and nuclear genome, respectively, based on low coverage NGS sequence of each cultivar. The cp genomes sizes ranged from 156,241 to 156,425 bp and the major size variation was derived from differences in copy number of tandem repeats in the ycf1 gene and in the intergenic regions of rps16-trnUUG and rpl32-trnUAG. The complete 45S nrDNA unit sequences were 11,091 bp, representing a consensus single transcriptional unit with an intergenic spacer region. Comparative analysis of these sequences as well as those previously reported for three Chinese accessions identified very rare but unique polymorphism in the cp genome within P. ginseng cultivars. There were 12 intra-species polymorphisms (six SNPs and six InDels) among 14 cultivars. We also identified five SNPs from 45S nrDNA of 11 Korean ginseng cultivars. From the 17 unique informative polymorphic sites, we developed six reliable markers for analysis of ginseng diversity and cultivar authentication. PMID:26061692

  2. Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes).

    PubMed

    Dessimoz, Christophe; Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro

    2011-09-01

    Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references.

  3. Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes)

    PubMed Central

    Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro

    2011-01-01

    Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references. PMID:21712341

  4. Whale song analyses using bioinformatics sequence analysis approaches

    NASA Astrophysics Data System (ADS)

    Chen, Yian A.; Almeida, Jonas S.; Chou, Lien-Siang

    2005-04-01

    Animal songs are frequently analyzed using discrete hierarchical units, such as units, themes and songs. Because animal songs and bio-sequences may be understood as analogous, bioinformatics analysis tools (DNA/protein sequence alignment and alignment-free methods) are proposed to quantify the theme similarities of the songs of false killer whales recorded off northeast Taiwan. The eighteen themes with discrete units that were identified in an earlier study [Y. A. Chen, master's thesis, University of Charleston, 2001] were compared quantitatively using several distance metrics. These metrics included the scores calculated using the Smith-Waterman algorithm with the repeated procedure, the standardized Euclidean distance, and the angle metrics based on word frequencies. The theme classifications based on different metrics were summarized and compared in dendrograms using cluster analyses. The results agree qualitatively with earlier classifications derived by human observation. These methods further quantify the similarities among themes. These methods could be applied to the analyses of other animal songs on a larger scale. For instance, these techniques could be used to investigate song evolution and cultural transmission by quantifying the dissimilarities of humpback whale songs across different seasons, years, populations, and geographic regions. [Work supported by SC Sea Grant, and Ilan County Government, Taiwan.]
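
    Two of the metrics named in this abstract can be illustrated on sequences of unit labels: a Smith-Waterman local alignment score and an angle between word-frequency vectors. The sketch below is illustrative only. The scoring values (match 2, mismatch -1, linear gap -1), the word length `k=2`, and the function names are assumptions; the "repeated procedure" and the standardized Euclidean distance mentioned in the abstract are not reproduced.

```python
import math
from collections import Counter

def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between two sequences of unit labels."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Local alignment: scores are floored at zero.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
        best = max(best, max(cur))
        prev = cur
    return best

def word_counts(seq, k):
    """Counts of all overlapping length-k words in a sequence."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))

def word_angle(a, b, k=2):
    """Angle (radians) between the k-word frequency vectors of two sequences."""
    fa, fb = word_counts(a, k), word_counts(b, k)
    dot = sum(fa[w] * fb[w] for w in fa)
    na = math.sqrt(sum(v * v for v in fa.values()))
    nb = math.sqrt(sum(v * v for v in fb.values()))
    if na == 0 or nb == 0:
        return math.pi / 2  # no words to compare; treated as orthogonal
    cos = max(-1.0, min(1.0, dot / (na * nb)))
    return math.acos(cos)
```

    Either function accepts strings or lists of labels, so a theme could be passed as, for example, a list of unit names. A matrix of pairwise values could then feed a clustering routine to draw dendrograms of the kind the abstract mentions.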

  5. Genotype and Phenotype of Echinococcus granulosus Derived from Wild Sheep (Ovis orientalis) in Iran.

    PubMed

    Eslami, Ali; Meshgi, Behnam; Jalousian, Fatemeh; Rahmani, Shima; Salari, Mohammad Ali

    2016-02-01

    The aim of the present study is to determine the characteristics of genotype and phenotype of Echinococcus granulosus derived from wild sheep and to compare them with the strains of E. granulosus sensu stricto (sheep-dog) and E. granulosus camel strain (camel-dog) in Iran. In Khojir National Park, near Tehran, Iran, a fertile hydatid cyst was recently found in the liver of a dead wild sheep (Ovis orientalis). The number of protoscolices (n=6,000) proved enough for an experimental infection in a dog. The characteristics of large and small hooks of metacestode were statistically determined as the sensu stricto strain but not the camel strain (P=0.5). To determine E. granulosus genotype, 20 adult worms of this type were collected from the infected dog. The second internal transcribed spacer (ITS2) of the nuclear ribosomal DNA (rDNA) and cytochrome c oxidase 1 subunit (COX1) of the mitochondrial DNA were amplified from individual adult worm by PCR. Subsequently, the PCR product was sequenced by Sanger method. The lengths of ITS2 and COX1 sequences were 378 and 857 bp, respectively, for all the sequenced samples. The amplified DNA sequences from both ribosomal and mitochondrial genes were highly similar (99% and 98%, respectively) to that of the ovine strain in the GenBank database. The results of the present study indicate that the morpho-molecular features and characteristics of E. granulosus in the Iranian wild sheep are the same as those of the sheep-dog E. granulosus sensu stricto strain.

  6. In-depth characterization of breast cancer tumor-promoting cell transcriptome by RNA sequencing and microarrays

    PubMed Central

    Soldà, Giulia; Merlino, Giuseppe; Fina, Emanuela; Brini, Elena; Moles, Anna; Cappelletti, Vera; Daidone, Maria Grazia

    2016-01-01

    Numerous studies have reported the existence of tumor-promoting cells (TPC) with self-renewal potential and a relevant role in drug resistance. However, pathways and modifications involved in the maintenance of such tumor subpopulations are still only partially understood. Sequencing-based approaches offer the opportunity for a detailed study of TPC including their transcriptome modulation. Using microarrays and RNA sequencing approaches, we compared the transcriptional profiles of parental MCF7 breast cancer cells with MCF7-derived TPC (i.e. MCFS). Data were explored using different bioinformatic approaches, and major findings were experimentally validated. The different analytical pipelines (Lifescope and Cufflinks based) yielded similar although not identical results. RNA sequencing data partially overlapped microarray results and displayed a higher dynamic range, although overall the two approaches concordantly predicted pathway modifications. Several biological functions were altered in TPC, ranging from production of inflammatory cytokines (i.e., IL-8 and MCP-1) to proliferation and response to steroid hormones. More than 300 non-coding RNAs were defined as differentially expressed, and 2,471 potential splicing events were identified. A consensus signature of genes up-regulated in TPC was derived and was found to be significantly associated with insensitivity to fulvestrant in a public breast cancer patient dataset. Overall, we obtained a detailed portrait of the transcriptome of a breast cancer TPC line, highlighted the role of non-coding RNAs and differential splicing, and identified a gene signature with a potential as a context-specific biomarker in patients receiving endocrine treatment. PMID:26556871

  7. Long-term excretion of vaccine-derived poliovirus by a healthy child.

    PubMed

    Martín, Javier; Odoom, Kofi; Tuite, Gráinne; Dunn, Glynis; Hopewell, Nicola; Cooper, Gill; Fitzharris, Catherine; Butler, Karina; Hall, William W; Minor, Philip D

    2004-12-01

    A child was found to be excreting type 1 vaccine-derived poliovirus (VDPV) with a 1.1% sequence drift from Sabin type 1 vaccine strain in the VP1 coding region 6 months after he was immunized with oral live polio vaccine. Seventeen type 1 poliovirus isolates were recovered from stools taken from this child during the following 4 months. Contrary to expectation, the child was not deficient in humoral immunity and showed high levels of serum neutralization against poliovirus. Selected virus isolates were characterized in terms of their antigenic properties, virulence in transgenic mice, sensitivity for growth at high temperatures, and differences in nucleotide sequence from the Sabin type 1 strain. The VDPV isolates showed mutations at key nucleotide positions that correlated with the observed reversion to biological properties typical of wild polioviruses. A number of capsid mutations mapped at known antigenic sites leading to changes in the viral antigenic structure. Estimates of sequence evolution based on the accumulation of nucleotide changes in the VP1 coding region detected a "defective" molecular clock running at an apparent faster speed of 2.05% nucleotide changes per year versus 1% shown in previous studies. Remarkably, when compared to several type 1 VDPV strains of different origins, isolates from this child showed a much higher proportion of nonsynonymous versus synonymous nucleotide changes in the capsid coding region. This anomaly could explain the high VP1 sequence drift found and the ability of these virus strains to replicate in the gut for a longer period than expected.

  8. Tropism and infectivity of duck-derived egg drop syndrome virus in chickens.

    PubMed

    Kang, Min; Cha, Se-Yeoun; Jang, Hyung-Kwan

    2017-01-01

    Egg drop syndrome virus (EDSV) can markedly decrease egg production in laying hens. Duck is the natural host of EDSV. EDSV derived from ducks abrogate egg drop in laying hens. We have previously confirmed that duck-derived EDSVs have a variety of replication activities in chick embryo liver (CEL) cells. However, it is currently unclear whether duck-derived EDSV could display tropism and adaptation in laying hens. This study assessed whether duck-derived EDSV can adapt to laying hens, and estimated the inducing factors. Complete genome sequences of duck-derived EDSVs (D11-JW-012, D11-JW-017, and D11-JW-032 isolates) with various replication efficiencies in CEL cells and of the C10-GY-001 isolate causing disease in laying hens were analyzed to find their differences. Phylogenetic analysis of complete genome sequences revealed that C10-GY-001, D11-JW-032, and the strain 127 virus used as vaccine were clustered into the same group, with D11-JW-012 and D11-JW-017 clustered in another group. Comparison between the D11-JW-012 isolate, which replicated poorly, and the D11-JW-017 isolate, which replicated well in CEL cells, both in the same cluster, revealed six amino acid differences in IVa2, DNA polymerase, endopeptidase, and DNA-binding protein. These amino acids might be key candidates enhancing cellular tropism in chicken. When the pathogenicities of these isolates in laying hens were compared, D11-JW-032 showed severe signs similar to 127 virus, D11-JW-017 showed intermediate signs, while D11-JW-012 showed almost no signs. Eleven amino acids differed between D11-JW-032 and D11-JW-017, and 17 amino acids were different between D11-JW-032 and D11-JW-012. These results suggest that EDSVs derived from ducks have various pathogenicities in laying hens. Key amino acid candidates might have altered their affinity to tropism of laying hens, causing different pathogenicities.

  9. Characterization of EST-derived and non-EST simple sequence repeats in an F₁ hybrid population of Vitis vinifera L.

    PubMed

    Kayesh, E; Bilkish, N; Liu, G S; Chen, W; Leng, X P; Fang, J G

    2014-03-31

    Among different classes of molecular markers, expressed sequence tags (ESTs) are a new resource for developing simple sequence repeat (SSR) functional markers for genotyping and genetic mapping in F1 hybrid populations of Vitis vinifera L. Recently, because of the availability of an enormous amount of data for ESTs in the public domain, the emphasis has shifted from genomic SSRs to EST-SSRs, which belong to transcribed regions of the genome and may have a role in gene expression or function. The objective of this study was to assess the polymorphisms among 94 F1 hybrids from "Early Rose" and "Red Globe" using 25 EST-derived and 25 non-EST SSR markers. A total collection of 362,375 grape ESTs that were retrieved from the National Center for Biotechnology Information (NCBI) and 2522 EST-SSR sequences were identified. From them, 205 primer pairs were randomly selected, including 176 pairs that were EST-derived and 29 non-EST SSR primer pairs, for polymerase chain reaction amplification. A total of 131 alleles were amplified using 50 pairs of primers; 78 alleles were amplified using EST-derived SSR primers and 53 were from non-EST SSR primers. At most, 6 and 5 alleles were amplified by EST-derived and non-EST SSR primers, respectively. The EST-derived SSR markers showed a maximum polymorphic information content (PIC) value of 1 and a minimum of 0.33 while non-EST SSR markers had maximum and minimum PIC values of 1 and 0.25, respectively. The average PIC value was 0.56 for EST-derived SSR markers and 0.45 for non-EST SSR markers.
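
    For readers unfamiliar with polymorphic information content, the sketch below computes PIC from allele frequencies using the widely cited form PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2. The abstract does not say which PIC formula the authors used, so this choice, and the function name, are assumptions for illustration.

```python
def pic(freqs):
    """Polymorphic information content from a list of allele frequencies.

    Assumes the frequencies are non-negative and sum to 1.
    """
    n = len(freqs)
    homozygosity = sum(p * p for p in freqs)
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 1 - homozygosity - cross
```

    For two equally frequent alleles, `pic([0.5, 0.5])` evaluates to 0.375, and a monomorphic marker (`pic([1.0])`) gives 0.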

  10. Automatic prediction of protein domains from sequence information using a hybrid learning system.

    PubMed

    Nagarajan, Niranjan; Yona, Golan

    2004-06-12

    We describe a novel method for detecting the domain structure of a protein from sequence information alone. The method is based on analyzing multiple sequence alignments that are derived from a database search. Multiple measures are defined to quantify the domain information content of each position along the sequence and are combined into a single predictor using a neural network. The output is further smoothed and post-processed using a probabilistic model to predict the most likely transition positions between domains. The method was assessed using the domain definitions in SCOP and CATH for proteins of known structure and was compared with several other existing methods. Our method performs well both in terms of accuracy and sensitivity. It improves significantly over the best methods available, even some of the semi-manual ones, while being fully automatic. Our method can also be used to suggest and verify domain partitions based on structural data. A few examples of predicted domain definitions and alternative partitions, as suggested by our method, are also discussed. An online domain-prediction server is available at http://biozon.org/tools/domains/

  11. Characterization of genic microsatellite markers derived from expressed sequence tags in Pacific abalone (Haliotis discus hannai)

    NASA Astrophysics Data System (ADS)

    Li, Qi; Shu, Jing; Zhao, Cui; Liu, Shikai; Kong, Lingfeng; Zheng, Xiaodong

    2010-01-01

    Simple sequence repeat (SSR) markers were developed from the expressed sequence tags (ESTs) of Pacific abalone (Haliotis discus hannai). Repeat motifs were found in 4.95% of the ESTs at a frequency of one repeat every 10.04 kb of EST sequences, after redundancy elimination. Seventeen polymorphic EST-SSRs were developed. The number of alleles per locus varied from 2 to 17, with an average of 6.8 alleles per locus. The expected and observed heterozygosities ranged from 0.159 to 0.928 and from 0.132 to 0.922, respectively. Twelve of the 17 loci (70.6%) were successfully amplified in H. diversicolor. Seventeen loci segregated in three families, with three showing the presence of null alleles (17.6%). The adequate level of variability and low frequency of null alleles observed in H. discus hannai, together with the high rate of transportability across Haliotis species, make this set of EST-SSR markers an important tool for comparative mapping, marker-assisted selection, and evolutionary studies, not only in the Pacific abalone, but also in related species.
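
The expected and observed heterozygosities quoted above are standard per-locus summary statistics. The sketch below shows one common way to compute them from diploid genotypes, assuming the simple (uncorrected) expected heterozygosity He = 1 - sum(p_i^2); the abstract does not say which estimator the authors used. The function name and the example genotypes are hypothetical.

```python
from collections import Counter


def heterozygosities(genotypes):
    """Return (expected, observed) heterozygosity for one locus.

    `genotypes` is a list of (allele_a, allele_b) pairs, one per diploid
    individual. Expected heterozygosity here is the uncorrected
    1 - sum(p_i^2); a small-sample correction is deliberately left out.
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    alleles = [allele for pair in genotypes for allele in pair]
    n = len(alleles)
    counts = Counter(alleles)
    expected = 1.0 - sum((c / n) ** 2 for c in counts.values())
    # Observed heterozygosity: fraction of individuals with two different alleles.
    observed = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    return expected, observed


if __name__ == "__main__":
    # Hypothetical locus scored in four individuals (allele sizes in bp).
    he, ho = heterozygosities([(180, 180), (180, 184), (184, 184), (180, 184)])
    print(he, ho)
```

A marked deficit of observed relative to expected heterozygosity at a locus is one of the usual hints of null alleles, which is why the two ranges are reported side by side.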

  12. Principles of long noncoding RNA evolution derived from direct comparison of transcriptomes in 17 species.

    PubMed

    Hezroni, Hadas; Koppstein, David; Schwartz, Matthew G; Avrutin, Alexandra; Bartel, David P; Ulitsky, Igor

    2015-05-19

    The inability to predict long noncoding RNAs from genomic sequence has impeded the use of comparative genomics for studying their biology. Here, we develop methods that use RNA sequencing (RNA-seq) data to annotate the transcriptomes of 16 vertebrates and the echinoid sea urchin, uncovering thousands of previously unannotated genes, most of which produce long intervening noncoding RNAs (lincRNAs). Although in each species, >70% of lincRNAs cannot be traced to homologs in species that diverged >50 million years ago, thousands of human lincRNAs have homologs with similar expression patterns in other species. These homologs share short, 5'-biased patches of sequence conservation nested in exonic architectures that have been extensively rewired, in part by transposable element exonization. Thus, over a thousand human lincRNAs are likely to have conserved functions in mammals, and hundreds beyond mammals, but those functions require only short patches of specific sequences and can tolerate major changes in gene architecture. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  13. A database of annotated tentative orthologs from crop abiotic stress transcripts.

    PubMed

    Balaji, Jayashree; Crouch, Jonathan H; Petite, Prasad V N S; Hoisington, David A

    2006-10-07

    A minimal requirement to initiate a comparative genomics study on plant responses to abiotic stresses is a dataset of orthologous sequences. The availability of a large amount of sequence information, including that derived from stress cDNA libraries, allows for the identification of stress-related genes and orthologs associated with the stress response. Orthologous sequences serve as tools to explore genes and their relationships across species. For this purpose, ESTs from stress cDNA libraries across 16 crop species including 6 important cereal crops and 10 dicots were systematically collated and subjected to bioinformatics analysis such as clustering, grouping of tentative orthologous sets, identification of protein motifs/patterns in the predicted protein sequence, and annotation with stress conditions, tissue/library source and putative function. All data are available to the scientific community at http://intranet.icrisat.org/gt1/tog/homepage.htm. We believe that the availability of annotated plant abiotic stress ortholog sets will be a valuable resource for researchers studying the biology of environmental stresses in plant systems, molecular evolution and genomics.

  14. Phylogenetic Analysis of Aedes aegypti Based on Mitochondrial ND4 Gene Sequences in Almadinah, Saudi Arabia.

    PubMed

    Ali, Khalil H Al; El-Badry, Ayman A; Ali, Mouhanad Al; El-Sayed, Wael S M; El-Beshbishy, Hesham A

    2016-06-01

    Aedes aegypti is the main vector of the yellow fever and dengue viruses. This mosquito has become the major indirect cause of human morbidity and mortality worldwide. Dengue virus activity has been reported recently in the western areas of Saudi Arabia. There is no vaccine for dengue virus until now, and the control of the disease depends on the control of the vector. The present study aimed to perform phylogenetic analysis of Aedes aegypti based on the mitochondrial NADH dehydrogenase subunit 4 (ND4) gene at Almadinah, Saudi Arabia in order to get further insight into the epidemiology and transmission of this vector. The mitochondrial ND4 gene was sequenced in the eight isolated Aedes aegypti mosquitoes from Almadinah, Saudi Arabia; sequences were aligned, and phylogenetic analyses were performed and compared with 54 sequences of Aedes reported in previous studies from Mexico, Thailand, Brazil, and Africa. Our results suggest that increased gene flow among Aedes aegypti populations occurs between Africa and Saudi Arabia. Phylogenetic relationship analysis showed two genetically distinct Aedes aegypti in Saudi Arabia derived from dual African ancestors.

  15. Escherichia coli isolates from patients with bacteremic urinary tract infection are genetically distinct from those derived from sepsis following prostate transrectal biopsy.

    PubMed

    Dan, Michael; Yair, Yael; Samosav, Alex; Gottesman, Tamar; Yossepowitch, Orit; Harari-Schwartz, Orna; Tsivian, Alexander; Schreiber, Rachel; Gophna, Uri

    2015-01-01

    Transrectal ultrasound-guided (TRUS) prostate biopsy is a very common procedure that is generally considered relatively safe. However, severe sepsis can occur after TRUS prostate biopsies, with Escherichia coli being the predominant causative agent. A common perception is that the bacteria that cause post-TRUS prostate biopsy infections originate in the urinary tract, but this view has not been adequately tested. Yet other authors believe on the basis of indirect evidence that the pathogens are introduced into the bloodstream by the biopsy needle after passage through the rectal mucosa. We compared E. coli isolates from male patients with bacteremic urinary tract infection (B-UTI) to isolates of patients with post prostate biopsy sepsis (PPBS), in terms of their sequence types, determined by multi-locus sequence typing (MLST) and their virulence markers. B-UTI isolates were much richer in virulence genes than were PPBS isolates, supporting the hypothesis that E. coli causing PPBS derive directly from the rectum. Sequence type 131 (ST131) strains and related strain from the ST131 were common (>30%) among the E. coli isolates from PPBS patients as well as from B-UTI patients and all these strains expressed extended spectrum beta-lactamases. Our finding supports the hypothesis that E. coli causing PPBS derive directly from the rectum, bypassing the urinary tract, and therefore do not require many of the virulence capabilities necessary for an E. coli strain that must persist in the urinary tract. In light of the increasing prevalence of highly resistant E. coli strains, a new approach for prevention of PPBS is urgently required. Copyright © 2015. Published by Elsevier GmbH.

  16. Expression and characterization of constitutive heat shock protein 70.1 (HSPA-1A) gene in in vitro produced and in vivo-derived buffalo (Bubalus bubalis) embryos.

    PubMed

    Sharma, G T; Nath, A; Prasad, S; Singhal, S; Singh, N; Gade, N E; Dubey, P K; Saikumar, G

    2012-12-01

    Cells are blessed with a group of stress protector molecules known as heat shock proteins (HSPs), amongst them HSP70, encoded by HSPA-1A gene, is most abundant and highly conserved protein. Variety of stresses hampers the developmental competence of embryos under in vivo and in vitro conditions. Present work was designed to study the quantitative expression of HSPA-1A mRNA in immature oocytes (IMO), matured oocytes (MO), in vitro produced (IVP) and in vivo-derived (IVD) buffalo embryos to assess the level of stress to which embryos are exposed under in vivo and in vitro culture conditions. Further, HSPA-1A gene sequence was analysed to determine its homology with other mammalian sequences. The mRNA expression analysis was carried out on 72 oocytes (40 IMO; 32 MO), 76 IVP and 55 IVD buffalo embryos. Expression of HSPA-1A was found in oocytes and throughout the developmental stages of embryos examined irrespective of the embryo source; however, higher (p < 0.05) expression was observed in 8-16 cell, morula and blastocyst stages of IVP embryos as compared to IVD embryos. Phylogenetic analysis of bubaline HSPA-1A revealed that it shares 91-98% identity with other mammalian sequences. It can be concluded that higher level of HSPA-1A mRNA in IVP embryos in comparison with in vivo-derived embryos is an indicator of cellular stress in IVP system. This study suggests need for further optimization of in vitro culture system in which HSPA-1A gene could be used as a stress biomarker during pre-implantation development. © 2012 Blackwell Verlag GmbH.

  17. The alpha-fetoprotein (AFP) third domain: a search for AFP interaction sites of cell cycle proteins.

    PubMed

    Mizejewski, G J

    2016-09-01

    The carboxy-terminal third domain of alpha-fetoprotein (AFP-3D) is known to harbor binding and/or interaction sites for hydrophobic ligands, receptors, and binding proteins. Such reports have established that AFP-3D consists of amino acid (AA) sequence stretches on the AFP polypeptide that engages in protein-to-protein interactions with various ligands and receptors. Using a computer software program specifically designed for such interactions, the present report identified AA sequence fragments on AFP-3D that could potentially interact with a variety of cell cycle proteins. The cell cycle proteins identified were (1) cyclins, (2) cyclin-dependent kinases, (3) cell cycle-associated proteins (inhibitors, checkpoints, initiators), and (4) ubiquitin ligases. Following detection of the AFP-3D to cell cycle protein interaction sites, the computer-derived AFP localization AA sequences were compared and aligned with previously reported hydrophobic ligand and receptor interaction sites on AFP-3D. A literature survey of the association of cell cycle proteins with AFP showed both positive relationships and correlations. Previous reports of experimental AFP-derived peptides effects on various cell cycle proteins served to confirm and verify the present computer cell cycle protein identifications. Cell cycle protein interactions with AFP-CD peptides have been reported in cultured MCF-7 breast cancer cells subjected to mRNA microarray analysis. After 7 days in culture with MCF-7 cells, the AFP-derived peptides were shown to downregulate cyclin E, SKP2, checkpoint suppressors, cyclin-dependent kinases, and ubiquitin ligases that modulate cyclin E/CdK2 transition from the G1 to the S-phase of the cell cycle. Thus, the experimental data on AFP-CD interaction with cell cycle proteins were consistent with the "in silico" findings.

  18. Population-Genomic Insights into Variation in Prevotella intermedia and Prevotella nigrescens Isolates and Its Association with Periodontal Disease

    PubMed Central

    Zhang, Yifei; Zhen, Min; Zhan, Yalin; Song, Yeqing; Zhang, Qian; Wang, Jinfeng

    2017-01-01

    High-throughput sequencing has helped to reveal the close relationship between Prevotella and periodontal disease, but the roles of subspecies diversity and genomic variation within this genus in periodontal diseases still need to be investigated. We performed a comparative genome analysis of 48 Prevotella intermedia and Prevotella nigrescens isolates from the same cohort of subjects to identify the main drivers of their pathogenicity and adaptation to different environments. The comparisons were done between the two species and between disease and health based on pooled sequences. The results showed that both P. intermedia and P. nigrescens have highly dynamic genomes and can take up various exogenous factors through horizontal gene transfer. The major differences between disease-derived and health-derived samples of P. intermedia and P. nigrescens were factors related to genome modification and recombination, indicating that the Prevotella isolates from disease sites may be more capable of genomic reconstruction. We also identified genetic elements specific to each sample, and found that disease groups had more unique virulence factors related to capsule and lipopolysaccharide synthesis, secretion systems, proteinases, and toxins, suggesting that strains from disease sites may have more specific virulence, particularly for P. intermedia. The differentially represented pathways between samples from disease and health were related to energy metabolism, carbohydrate and lipid metabolism, and amino acid metabolism, consistent with data from the whole subgingival microbiome in periodontal disease and health. Disease-derived samples had gained or lost several metabolic genes compared to healthy-derived samples, which could be linked with the difference in virulence performance between diseased and healthy sample groups. Our findings suggest that P. intermedia and P. nigrescens may serve as “crucial substances” in subgingival plaque, which may reflect changes in microbial and environmental dynamics in subgingival microbial ecosystems. This provides insight into the potential of P. intermedia and P. nigrescens as new predictive biomarkers and targets for effective interventions in periodontal disease. PMID:28983469

  19. Population-Genomic Insights into Variation in Prevotella intermedia and Prevotella nigrescens Isolates and Its Association with Periodontal Disease.

    PubMed

    Zhang, Yifei; Zhen, Min; Zhan, Yalin; Song, Yeqing; Zhang, Qian; Wang, Jinfeng

    2017-01-01

    High-throughput sequencing has helped to reveal the close relationship between Prevotella and periodontal disease, but the roles of subspecies diversity and genomic variation within this genus in periodontal diseases still need to be investigated. We performed a comparative genome analysis of 48 Prevotella intermedia and Prevotella nigrescens isolates from the same cohort of subjects to identify the main drivers of their pathogenicity and adaptation to different environments. The comparisons were done between the two species and between disease and health based on pooled sequences. The results showed that both P. intermedia and P. nigrescens have highly dynamic genomes and can take up various exogenous factors through horizontal gene transfer. The major differences between disease-derived and health-derived samples of P. intermedia and P. nigrescens were factors related to genome modification and recombination, indicating that the Prevotella isolates from disease sites may be more capable of genomic reconstruction. We also identified genetic elements specific to each sample, and found that disease groups had more unique virulence factors related to capsule and lipopolysaccharide synthesis, secretion systems, proteinases, and toxins, suggesting that strains from disease sites may have more specific virulence, particularly for P. intermedia. The differentially represented pathways between samples from disease and health were related to energy metabolism, carbohydrate and lipid metabolism, and amino acid metabolism, consistent with data from the whole subgingival microbiome in periodontal disease and health. Disease-derived samples had gained or lost several metabolic genes compared to healthy-derived samples, which could be linked with the difference in virulence performance between diseased and healthy sample groups. Our findings suggest that P. intermedia and P. nigrescens may serve as "crucial substances" in subgingival plaque, which may reflect changes in microbial and environmental dynamics in subgingival microbial ecosystems. This provides insight into the potential of P. intermedia and P. nigrescens as new predictive biomarkers and targets for effective interventions in periodontal disease.

  20. Identification of Single-Copy Orthologous Genes between Physalis and Solanum lycopersicum and Analysis of Genetic Diversity in Physalis Using Molecular Markers

    PubMed Central

    Wei, Jingli; Hu, Xiaorong; Yang, Jingjing; Yang, Wencai

    2012-01-01

    The genus Physalis includes a number of commercially important edible and ornamental species. Its high nutritional value and potential medicinal properties leads to the increased commercial interest in the products of this genus worldwide. However, lack of molecular markers prevents the detailed study of genetics and phylogeny in Physalis, which limits the progress of breeding. In the present study, we compared the DNA sequences between Physalis and tomato, and attempted to analyze genetic diversity in Physalis using tomato markers. Blasting 23180 DNA sequences derived from Physalis against the International Tomato Annotation Group (ITAG) Release2.3 Predicted CDS (SL2.40) discovered 3356 single-copy orthologous genes between them. A total of 38 accessions from at least six species of Physalis were subjected to genetic diversity analysis using 97 tomato markers and 25 SSR markers derived from P. peruviana. Majority (73.2%) of tomato markers could amplify DNA fragments from at least one accession of Physalis. Diversity in Physalis at molecular level was also detected. The average Nei’s genetic distance between accessions was 0.3806 with a range of 0.2865 to 0.7091. These results indicated Physalis and tomato had similarity at both molecular marker and DNA sequence levels. Therefore, the molecular markers developed in tomato can be used in genetic study in Physalis. PMID:23166835

  1. Identification of single-copy orthologous genes between Physalis and Solanum lycopersicum and analysis of genetic diversity in Physalis using molecular markers.

    PubMed

    Wei, Jingli; Hu, Xiaorong; Yang, Jingjing; Yang, Wencai

    2012-01-01

    The genus Physalis includes a number of commercially important edible and ornamental species. Its high nutritional value and potential medicinal properties leads to the increased commercial interest in the products of this genus worldwide. However, lack of molecular markers prevents the detailed study of genetics and phylogeny in Physalis, which limits the progress of breeding. In the present study, we compared the DNA sequences between Physalis and tomato, and attempted to analyze genetic diversity in Physalis using tomato markers. Blasting 23180 DNA sequences derived from Physalis against the International Tomato Annotation Group (ITAG) Release2.3 Predicted CDS (SL2.40) discovered 3356 single-copy orthologous genes between them. A total of 38 accessions from at least six species of Physalis were subjected to genetic diversity analysis using 97 tomato markers and 25 SSR markers derived from P. peruviana. Majority (73.2%) of tomato markers could amplify DNA fragments from at least one accession of Physalis. Diversity in Physalis at molecular level was also detected. The average Nei's genetic distance between accessions was 0.3806 with a range of 0.2865 to 0.7091. These results indicated Physalis and tomato had similarity at both molecular marker and DNA sequence levels. Therefore, the molecular markers developed in tomato can be used in genetic study in Physalis.

  2. Inferring the expression variability of human transposable element-derived exons by linear model analysis of deep RNA sequencing data.

    PubMed

    Zhang, Wensheng; Edwards, Andrea; Fan, Wei; Fang, Zhide; Deininger, Prescott; Zhang, Kun

    2013-08-28

    The exonization of transposable elements (TEs) has proven to be a significant mechanism for the creation of novel exons. Existing knowledge of the retention patterns of TE exons in mRNAs were mainly established by the analysis of Expressed Sequence Tag (EST) data and microarray data. This study seeks to validate and extend previous studies on the expression of TE exons by an integrative statistical analysis of high throughput RNA sequencing data. We collected 26 RNA-seq datasets spanning multiple tissues and cancer types. The exon-level digital expressions (indicating retention rates in mRNAs) were quantified by a double normalized measure, called the rescaled RPKM (Reads Per Kilobase of exon model per Million mapped reads). We analyzed the distribution profiles and the variability (across samples and between tissue/disease groups) of TE exon expressions, and compared them with those of other constitutive or cassette exons. We inferred the effects of four genomic factors, including the location, length, cognate TE family and TE nucleotide proportion (RTE, see Methods section) of a TE exon, on the exons' expression level and expression variability. We also investigated the biological implications of an assembly of highly-expressed TE exons. Our analysis confirmed prior studies from the following four aspects. First, with relatively high expression variability, most TE exons in mRNAs, especially those without exact counterparts in the UCSC RefSeq (Reference Sequence) gene tables, demonstrate low but still detectable expression levels in most tissue samples. Second, the TE exons in coding DNA sequences (CDSs) are less highly expressed than those in 3' (5') untranslated regions (UTRs). Third, the exons derived from chronologically ancient repeat elements, such as MIRs, tend to be highly expressed in comparison with those derived from younger TEs. 
Fourth, the previously observed negative relationship between the lengths of exons and the inclusion levels in transcripts is also true for exonized TEs. Furthermore, our study resulted in several novel findings. They include: (1) for the TE exons with non-zero expression and as shown in most of the studied biological samples, a high TE nucleotide proportion leads to their lower retention rates in mRNAs; (2) the considered genomic features (i.e. a continuous variable such as the exon length or a category indicator such as 3'UTR) influence the expression level and the expression variability (CV) of TE exons in an inverse manner; (3) not only the exons derived from Alu elements but also the exons from the TEs of other families were preferentially established in zinc finger (ZNF) genes.
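
The abstract above spells out RPKM as Reads Per Kilobase of exon model per Million mapped reads. As a rough illustration, the standard (not rescaled) RPKM for a single exon can be computed as below. The "rescaled RPKM" of the study is a further normalization whose details are not given in the abstract, so it is not reproduced here. The function name and the example numbers are illustrative only.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Standard RPKM for one exon.

    RPKM = reads * 1e9 / (exon length in bp * total mapped reads),
    i.e. reads per kilobase of exon per million mapped reads.
    This is the plain definition, not the study's rescaled variant.
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Hypothetical exon: 500 reads on a 2,000 bp exon in a library
    # of 10 million mapped reads.
    print(rpkm(500, 2000, 10_000_000))
```

Dividing by both exon length and library size is what makes values comparable across exons of different sizes and across samples sequenced to different depths.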

  3. Functionalization of peptide nucleolipid bioconjugates and their structure anti-cancer activity relationship studies.

    PubMed

    Rana, Niki; Cultrara, Christopher; Phillips, Mariana; Sabatino, David

    2017-09-01

    In the search for more potent peptide-based anti-cancer conjugates the generation of new, functionally diverse nucleolipid derived D-(KLAKLAK)2-AK sequences has enabled a structure and anti-cancer activity relationship study. A reductive amination approach was key for the synthesis of alkylamine, diamine and polyamine derived nucleolipids as well as those incorporating heterocyclic functionality. The carboxy-derived nucleolipids were then coupled to the C-terminus of the D-(KLAKLAK)2-AK killer peptide sequence and produced with and without the FITC fluorophore for investigating biological activity in cancer cells. The amphiphilic, α-helical peptide-nucleolipid bioconjugates were found to exhibit variable effects on the viability of MM.1S cells, with the histamine derived nucleolipid peptide bioconjugate displaying the most significant anti-cancer effects. Thus, functionally diverse nucleolipids have been developed to fine-tune the structure and anti-cancer properties of killer peptide sequences, such as D-(KLAKLAK)2-AK. Copyright © 2017 Elsevier Ltd. All rights reserved.

  4. SequenceL: Automated Parallel Algorithms Derived from CSP-NT Computational Laws

    NASA Technical Reports Server (NTRS)

    Cooke, Daniel; Rushton, Nelson

    2013-01-01

    With the introduction of new parallel architectures like the cell and multicore chips from IBM, Intel, AMD, and ARM, as well as the petascale processing available for high-end computing, a larger number of programmers will need to write parallel codes. Adding the parallel control structure to the sequence, selection, and iterative control constructs increases the complexity of code development, which often results in increased development costs and decreased reliability. SequenceL is a high-level programming language; that is, a programming language that is closer to a human's way of thinking than to a machine's. Historically, high-level languages have resulted in decreased development costs and increased reliability, at the expense of performance. In recent applications at JSC and in industry, SequenceL has demonstrated the usual advantages of high-level programming in terms of low cost and high reliability. SequenceL programs, however, have run at speeds typically comparable with, and in many cases faster than, their counterparts written in C and C++ when run on single-core processors. Moreover, SequenceL is able to generate parallel executables automatically for multicore hardware, gaining parallel speedups without any extra effort from the programmer beyond what is required to write the sequential/single-core code. A SequenceL-to-C++ translator has been developed that automatically renders readable multithreaded C++ from a combination of a SequenceL program and sample data input. The SequenceL language is based on two fundamental computational laws, Consume-Simplify-Produce (CSP) and Normalize-Transpose (NT), which enable it to automate the creation of parallel algorithms from high-level code that has no annotations of parallelism whatsoever. 
In our anecdotal experience, SequenceL development has been in every case less costly than development of the same algorithm in sequential (that is, single-core, single process) C or C++, and an order of magnitude less costly than development of comparable parallel code. Moreover, SequenceL not only automatically parallelizes the code, but since it is based on CSP-NT, it is provably race free, thus eliminating the largest quality challenge the parallelized software developer faces.

  5. Evaluation of two main RNA-seq approaches for gene quantification in clinical RNA sequencing: polyA+ selection versus rRNA depletion.

    PubMed

    Zhao, Shanrong; Zhang, Ying; Gamini, Ramya; Zhang, Baohong; von Schack, David

    2018-03-19

    To allow efficient transcript/gene detection, highly abundant ribosomal RNAs (rRNA) are generally removed from total RNA either by positive polyA+ selection or by rRNA depletion (negative selection) before sequencing. Comparisons between the two methods have been carried out by various groups, but the assessments have relied largely on non-clinical samples. In this study, we evaluated these two RNA sequencing approaches using human blood and colon tissue samples. Our analyses showed that rRNA depletion captured more unique transcriptome features, whereas polyA+ selection outperformed rRNA depletion with higher exonic coverage and better accuracy of gene quantification. For blood- and colon-derived RNAs, we found that 220% and 50% more reads, respectively, would have to be sequenced to achieve the same level of exonic coverage in the rRNA depletion method compared with the polyA+ selection method. Therefore, in most cases we strongly recommend polyA+ selection over rRNA depletion for gene quantification in clinical RNA sequencing. Our evaluation revealed that a small number of lncRNAs and small RNAs made up a large fraction of the reads in the rRNA depletion RNA sequencing data. Thus, we recommend that these RNAs are specifically depleted to improve the sequencing depth of the remaining RNAs.

  6. Empirical Validation of Pooled Whole Genome Population Re-Sequencing in Drosophila melanogaster

    PubMed Central

    Zhu, Yuan; Bergland, Alan O.; González, Josefa; Petrov, Dmitri A.

    2012-01-01

    The sequencing of pooled non-barcoded individuals is an inexpensive and efficient means of assessing genome-wide population allele frequencies, yet its accuracy has not been thoroughly tested. We assessed the accuracy of this approach on whole, complex eukaryotic genomes by resequencing pools of largely isogenic, individually sequenced Drosophila melanogaster strains. We called SNPs in the pooled data and estimated false positive and false negative rates using the SNPs called in individual strains as a reference. We also estimated allele frequencies of the SNPs using “pooled” data and compared them with “true” frequencies taken from the estimates in the individual strains. We demonstrate that pooled sequencing provides a faithful estimate of population allele frequency with the error well approximated by binomial sampling, and is a reliable means of novel SNP discovery with low false positive rates. However, a sufficient number of strains should be used in the pooling because variation in the amount of DNA derived from individual strains is a substantial source of noise when the number of pooled strains is low. Our results and analysis confirm that pooled sequencing is a very powerful and cost-effective technique for assessing patterns of sequence variation in populations on genome-wide scales, and is applicable to any dataset where sequencing individuals or individual cells is impossible, difficult, time consuming, or expensive. PMID:22848651
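
The statement that the error is "well approximated by binomial sampling" can be made concrete with the textbook binomial standard error, sqrt(p(1 - p) / n). The sketch below is only that textbook formula, not the authors' error model. What counts as `n_sampled` (for example, read depth at a site) is an assumption the abstract does not settle.

```python
import math


def pooled_frequency_se(p, n_sampled):
    """Binomial standard error of an allele-frequency estimate.

    `p` is the allele frequency and `n_sampled` the number of
    independent draws behind the estimate (assumed here to be
    something like read depth; the abstract does not specify).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n_sampled <= 0:
        raise ValueError("n_sampled must be positive")
    return math.sqrt(p * (1.0 - p) / n_sampled)


if __name__ == "__main__":
    # Hypothetical site at frequency 0.5 covered by 100 reads.
    print(pooled_frequency_se(0.5, 100))
```

Because the error shrinks only with the square root of the number of draws, halving it requires roughly four times the coverage.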

  7. Genetic diversity and host specificity varies across three genera of blood parasites in ducks of the Pacific Americas Flyway

    USGS Publications Warehouse

    Reeves, Andrew B.; Smith, Matthew M.; Meixell, Brandt W.; Fleskes, Joseph P.; Ramey, Andrew M.

    2015-01-01

    Birds of the order Anseriformes, commonly referred to as waterfowl, are frequently infected by Haemosporidia of the genera Haemoproteus, Plasmodium, and Leucocytozoon via dipteran vectors. We analyzed nucleotide sequences of the Cytochrome b (Cytb) gene from parasites of these genera detected in six species of ducks from Alaska and California, USA to characterize the genetic diversity of Haemosporidia infecting waterfowl at two ends of the Pacific Americas Flyway. In addition, parasite Cytb sequences were compared to those available on a public database to investigate specificity of genetic lineages to hosts of the order Anseriformes. Haplotype and nucleotide diversity of Haemoproteus Cytb sequences was lower than was detected for Plasmodium and Leucocytozoon parasites. Although waterfowl are presumed to be infected by only a single species of Leucocytozoon, L. simondi, diversity indices were highest for haplotypes from this genus and sequences formed five distinct clades separated by genetic distances of 4.9%–7.6%, suggesting potential cryptic speciation. All Haemoproteus and Leucocytozoon haplotypes derived from waterfowl samples formed monophyletic clades in phylogenetic analyses and were unique to the order Anseriformes with few exceptions. In contrast, waterfowl-origin Plasmodium haplotypes were identical or closely related to lineages found in other avian orders. Our results suggest a more generalist strategy for Plasmodium parasites infecting North American waterfowl as compared to those of the genera Haemoproteus and Leucocytozoon.

  8. Bacterial taxa–area and distance–decay relationships in marine environments

    PubMed Central

    Zinger, L; Boetius, A; Ramette, A

    2014-01-01

    The taxa–area relationship (TAR) and the distance–decay relationship (DDR) both describe spatial turnover of taxa and are central patterns of biodiversity. Here, we compared TAR and DDR of bacterial communities across different marine realms and ecosystems at the global scale. To obtain reliable global estimates for both relationships, we quantified the poorly assessed effects of sequencing depth, rare taxa removal and number of sampling sites. Slope coefficients of bacterial TARs were within the range of those of plants and animals, whereas slope coefficients of bacterial DDR were much lower. Slope coefficients were mostly affected by removing rare taxa and by the number of sampling sites considered in the calculations. TAR and DDR slope coefficients were overestimated at sequencing depth <4000 sequences per sample. Noticeably, bacterial TAR and DDR patterns did not correlate with each other both within and across ecosystem types, suggesting that (i) TAR cannot be directly derived from DDR and (ii) TAR and DDR may be influenced by different ecological factors. Nevertheless, we found marine bacterial TAR and DDR to be steeper in ecosystems associated with high environmental heterogeneity or spatial isolation, namely marine sediments and coastal environments compared with pelagic ecosystems. Hence, our study provides information on macroecological patterns of marine bacteria, as well as methodological and conceptual insights, at a time when biodiversity surveys increasingly make use of high-throughput sequencing technologies. PMID:24460915

  9. Genetic Diversity and Host Specificity Varies across Three Genera of Blood Parasites in Ducks of the Pacific Americas Flyway

    PubMed Central

    Reeves, Andrew B.; Smith, Mathew M.; Meixell, Brandt W.; Fleskes, Joseph P; Ramey, Andrew M.

    2015-01-01

    Birds of the order Anseriformes, commonly referred to as waterfowl, are frequently infected by Haemosporidia of the genera Haemoproteus, Plasmodium, and Leucocytozoon via dipteran vectors. We analyzed nucleotide sequences of the Cytochrome b (Cytb) gene from parasites of these genera detected in six species of ducks from Alaska and California, USA to characterize the genetic diversity of Haemosporidia infecting waterfowl at two ends of the Pacific Americas Flyway. In addition, parasite Cytb sequences were compared to those available on a public database to investigate specificity of genetic lineages to hosts of the order Anseriformes. Haplotype and nucleotide diversity of Haemoproteus Cytb sequences was lower than was detected for Plasmodium and Leucocytozoon parasites. Although waterfowl are presumed to be infected by only a single species of Leucocytozoon, L. simondi, diversity indices were highest for haplotypes from this genus and sequences formed five distinct clades separated by genetic distances of 4.9%–7.6%, suggesting potential cryptic speciation. All Haemoproteus and Leucocytozoon haplotypes derived from waterfowl samples formed monophyletic clades in phylogenetic analyses and were unique to the order Anseriformes with few exceptions. In contrast, waterfowl-origin Plasmodium haplotypes were identical or closely related to lineages found in other avian orders. Our results suggest a more generalist strategy for Plasmodium parasites infecting North American waterfowl as compared to those of the genera Haemoproteus and Leucocytozoon. PMID:25710468
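
    The haplotype and nucleotide diversity indices mentioned above can be sketched with their textbook definitions: haplotype diversity as n/(n-1) * (1 - sum of squared haplotype frequencies), and nucleotide diversity as the mean proportion of differing sites over all sequence pairs. Whether the authors used exactly these estimators is an assumption, and the four short aligned sequences are invented for the example.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Haplotype diversity: n/(n-1) * (1 - sum(p_i**2))."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs):
    """Mean per-site pairwise difference across all pairs of aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


# Invented, equal-length sequences standing in for aligned Cytb haplotypes.
sample = ["ACGT", "ACGT", "ACGA", "TCGA"]
h = haplotype_diversity(sample)
pi = nucleotide_diversity(sample)
```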

  10. Expanding the cerebrospinal fluid endopeptidome.

    PubMed

    Hansson, Karl T; Skillbäck, Tobias; Pernevik, Elin; Kern, Silke; Portelius, Erik; Höglund, Kina; Brinkmalm, Gunnar; Holmén-Larsson, Jessica; Blennow, Kaj; Zetterberg, Henrik; Gobom, Johan

    2017-03-01

    Biomarkers of neurodegenerative disorders are needed to assist in diagnosis, to monitor disease progression and therapeutic interventions, and to provide insight into disease mechanisms. One route to identify such biomarkers is by proteomic and peptidomic analysis of cerebrospinal fluid (CSF). In the current study, we performed an in-depth analysis of the human CSF endopeptidome to establish an inventory that may serve as a basis for future targeted biomarker studies. High-pH RP HPLC was employed for off-line sample prefractionation followed by low-pH nano-LC-MS analysis. Different software programs and scoring algorithms for peptide identification were employed and compared. A total of 18 031 endogenous peptides were identified at a FDR of 1%, increasing the number of known endogenous CSF peptides 10-fold compared to previous studies. The peptides were derived from 2 053 proteins of which more than 60 have been linked to neurodegeneration. Notably, among the findings were six peptides derived from microtubule-associated protein tau, three of which span the diagnostically interesting threonine-181 (Tau-F isoform). Also, 213 peptides from amyloid precursor protein were identified, 58 of which were partially or completely within the sequence of amyloid β 1-40/42, as well as 109 peptides from apolipoprotein E, spanning sequences that discriminate between the E2/E3/E4 isoforms of the protein. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. Human placental lactogen mRNA and its structural genes during pregnancy: quantitation with a complementary DNA.

    PubMed Central

    McWilliams, D; Callahan, R C; Boime, I

    1977-01-01

    A complementary DNA (cDNA) strand was transcribed from human placental lactogen (hPL) mRNA. Based on alkaline sucrose gradient centrifugation, the size of the cDNA was about 8 S, which would represent at least 80% of the hPL mRNA. Previously we showed that four to five times more hPL was synthesized in cell-free extracts derived from term as compared to first trimester placentas. Hybridization of the cDNA with RNA derived from placental tissue revealed that there was about four times more hPL mRNA sequences in total RNA from term placenta than in a comparable quantity of total first trimester RNA. Only background hybridization was observed when the cDNA was incubated with RNA prepared from human kidney. To test if this differential accumulation of hPL mRNA was the result of an amplification of hPL genes, we hybridized the labeled cDNA with cellular DNA from first trimester and term placentas and with DNA isolated from human brain. In all cases, the amount of hPL sequences was approximately two copies per haploid genome. Thus, the enhanced synthesis of hPL mRNA appears to result from a transcriptional activation rather than an amplification of the hPL gene. The increase likely reflects placental differentiation in which the proportion of syncytial trophoblast increases at term. Images PMID:66681

  12. Evidence for recombination of mitochondrial DNA in triploid crucian carp.

    PubMed

    Guo, Xinhong; Liu, Shaojun; Liu, Yun

    2006-03-01

    In this study, we report the complete mitochondrial DNA (mtDNA) sequences of the allotetraploid and triploid crucian carp and compare the complete mtDNA sequences between the triploid crucian carp and its female parent Japanese crucian carp and between the triploid crucian carp and its male parent allotetraploid. Our results indicate that the complete mtDNA nucleotide identity (98%) between the triploid crucian carp and its male parent allotetraploid was higher than that (93%) between the triploid crucian carp and its female parent Japanese crucian carp. Moreover, the presence of a pattern of identity and difference at synonymous sites of mitochondrial genomes between the triploid crucian carp and its parents provides direct evidence that triploid crucian carp possessed the recombination mtDNA fragment (12,759 bp) derived from the paternal fish. These results suggest that mtDNA recombination was derived from the fusion of the maternal and paternal mtDNAs. Compared with the haploid egg with one set of genome from the Japanese crucian carp, the diploid sperm with two sets of genomes from the allotetraploid could more easily make its mtDNA fuse with the mtDNA of the haploid egg. In addition, the triple hybrid nature of the triploid crucian carp probably allowed its better mtDNA recombination. In summary, our results provide the first evidence of mtDNA combination in polyploid fish.

  13. Genetic variation of viral protein 1 genes of field strains of waterfowl parvoviruses and their attenuated derivatives.

    PubMed

    Tsai, Hsiang-Jung; Tseng, Chun-hsien; Chang, Poa-chun; Mei, Kai; Wang, Shih-Chi

    2004-09-01

    To understand the genetic variations between the field strains of waterfowl parvoviruses and their attenuated derivatives, we analyzed the complete nucleotide sequences of the viral protein 1 (VP1) genes of nine field strains and two vaccine strains of waterfowl parvoviruses. Sequence comparison of the VP1 proteins showed that these viruses could be divided into goose parvovirus (GPV) related and Muscovy duck parvovirus (MDPV) related groups. The amino acid difference between GPV- and MDPV-related groups ranged from 13.1% to 15.8%, and the most variable region resided in the N terminus of VP2. The vaccine strains of GPV and MDPV exhibited only 1.2% and 0.3% difference in amino acid when compared with their parental field strains, and most of these differences resided in residues 497-575 of VP1, suggesting that these residues might be important for the attenuation of GPV and MDPV. When the GPV strains isolated in 1982 (the strain 82-0308) and in 2001 (the strain 01-1001) were compared, only 0.3% difference in amino acid was found, while MDPV strains isolated in 1990 (the strain 90-0219) and 1997 (the strain 97-0104) showed only 0.4% difference in amino acid. The result indicates that the genome of waterfowl parvovirus had remained highly stable in the field.

  14. Endogenous siRNAs and noncoding RNA-derived small RNAs are expressed in adult mouse hippocampus and are up-regulated in olfactory discrimination training.

    PubMed

    Smalheiser, Neil R; Lugli, Giovanni; Thimmapuram, Jyothi; Cook, Edwin H; Larson, John

    2011-01-01

    We previously proposed that endogenous siRNAs may regulate synaptic plasticity and long-term gene expression in the mammalian brain. Here, a hippocampal-dependent task was employed in which adult mice were trained to execute a nose-poke in a port containing one of two simultaneously present odors in order to obtain a reward. Mice demonstrating olfactory discrimination training were compared to pseudo-training and nose-poke control groups; size-selected hippocampal RNA was subjected to Illumina deep sequencing. Sequences that aligned uniquely and exactly to the genome without uncertain nucleotide assignments, within exons or introns of MGI annotated genes, were examined further. The data confirm that small RNAs having features of endogenous siRNAs are expressed in brain; that many of them derive from genes that regulate synaptic plasticity (and have been implicated in neuropsychiatric diseases); and that hairpin-derived endo-siRNAs and the 20- to 23-nt size class of small RNAs show a significant increase during an early stage of training. The most abundant putative siRNAs arose from an intronic inverted repeat within the SynGAP1 locus; this inverted repeat was a substrate for dicer in vitro, and SynGAP1 siRNA was specifically associated with Argonaute proteins in vivo. Unexpectedly, a dramatic increase with training (more than 100-fold) was observed for a class of 25- to 30-nt small RNAs derived from specific sites within snoRNAs and abundant noncoding RNAs (Y1 RNA, RNA component of mitochondrial RNAse P, 28S rRNA, and 18S rRNA). Further studies are warranted to characterize the role(s) played by endogenous siRNAs and noncoding RNA-derived small RNAs in learning and memory.

  15. A high-speed on-chip pseudo-random binary sequence generator for multi-tone phase calibration

    NASA Astrophysics Data System (ADS)

    Gommé, Liesbeth; Vandersteen, Gerd; Rolain, Yves

    2011-07-01

    An on-chip reference generator is conceived by adopting the technique of decimating a pseudo-random binary sequence (PRBS) signal into parallel sequences. This is of great benefit when high-speed generation of PRBS and PRBS-derived signals is the objective. The design is implemented in standard CMOS logic that is available in commercial libraries to provide the logic functions for the generator. The design allows the user to select the periodicity of the PRBS and the PRBS-derived signals. The characterization of the on-chip generator marks its performance and reveals promising specifications.
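
    A software sketch of the idea of decimating a PRBS into parallel sequences is given below. It assumes a PRBS7 sequence produced by a 7-bit linear feedback shift register (feedback taps at bits 7 and 6, period 127), splits it into two half-rate lanes, and interleaves them back into the full-rate stream. The choice of PRBS7, the two-lane split and all function names are assumptions for illustration and do not describe the authors' circuit.

```python
def prbs7(n_bits, seed=0x02):
    """Generate n_bits of a PRBS7 sequence from a 7-bit Fibonacci LFSR."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero for an LFSR")
    out = []
    for _ in range(n_bits):
        # Feedback is the XOR of the two most significant state bits.
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        out.append(new_bit)
    return out


def decimate(bits, factor):
    """Split a serial stream into `factor` parallel lanes (one per phase)."""
    return [bits[phase::factor] for phase in range(factor)]


def interleave(lanes):
    """Multiplex equal-length parallel lanes back into one serial stream."""
    merged = []
    for group in zip(*lanes):
        merged.extend(group)
    return merged


serial = prbs7(2 * 127)      # two full periods
lanes = decimate(serial, 2)  # each lane runs at half the bit rate
rebuilt = interleave(lanes)  # multiplexing restores the full-rate sequence
```

    In hardware terms, the appeal of such a scheme is that each lane can be produced by slower logic and only the final multiplexer has to run at the full bit rate.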

  16. To Clone or Not To Clone: Method Analysis for Retrieving Consensus Sequences In Ancient DNA Samples

    PubMed Central

    Winters, Misa; Barta, Jodi Lynn; Monroe, Cara; Kemp, Brian M.

    2011-01-01

    The challenges associated with the retrieval and authentication of ancient DNA (aDNA) evidence are principally due to post-mortem damage which makes ancient samples particularly prone to contamination from “modern” DNA sources. The necessity for authentication of results has led many aDNA researchers to adopt methods considered to be “gold standards” in the field, including cloning aDNA amplicons as opposed to directly sequencing them. However, no standardized protocol has emerged regarding the necessary number of clones to sequence, how a consensus sequence is most appropriately derived, or how results should be reported in the literature. In addition, there has been no systematic demonstration of the degree to which direct sequences are affected by damage or whether direct sequencing would provide disparate results from a consensus of clones. To address this issue, a comparative study was designed to examine both cloned and direct sequences amplified from ∼3,500 year-old ancient northern fur seal DNA extracts. Majority rules and the Consensus Confidence Program were used to generate consensus sequences for each individual from the cloned sequences, which exhibited damage at 31 of 139 base pairs across all clones. In no instance did the consensus of clones differ from the direct sequence. This study demonstrates that, when appropriate, cloning need not be the default method, but instead, should be used as a measure of authentication on a case-by-case basis, especially when this practice adds time and cost to studies where it may be superfluous. PMID:21738625
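
    A majority-rule consensus of cloned sequences can be sketched as follows: for each alignment column, take the most frequent base and mark ties as ambiguous. This is a simplification written for illustration, assuming pre-aligned clones of equal length. It is not the Consensus Confidence Program, and the clone sequences and the use of "N" for ties are invented for the example.

```python
from collections import Counter


def majority_consensus(clones, ambiguous="N"):
    """Column-wise most frequent base; ties are reported as `ambiguous`."""
    if len({len(c) for c in clones}) != 1:
        raise ValueError("clones must be non-empty and of equal length")
    consensus = []
    for column in zip(*clones):
        counts = Counter(column).most_common()
        top_base, top_count = counts[0]
        tied = len(counts) > 1 and counts[1][1] == top_count
        consensus.append(ambiguous if tied else top_base)
    return "".join(consensus)


# Invented clones: two carry a single damage-like substitution each.
clones = ["ACGTAC", "ACGTAT", "ACATAC", "ACGTAC"]
direct_sequence = "ACGTAC"

consensus = majority_consensus(clones)
agrees_with_direct = consensus == direct_sequence
```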

  17. Symmetric convolution of asymmetric multidimensional sequences using discrete trigonometric transforms.

    PubMed

    Foltz, T M; Welsh, B M

    1999-01-01

    This paper uses the fact that the discrete Fourier transform diagonalizes a circulant matrix to provide an alternate derivation of the symmetric convolution-multiplication property for discrete trigonometric transforms. Derived in this manner, the symmetric convolution-multiplication property extends easily to multiple dimensions using the notion of block circulant matrices and generalizes to multidimensional asymmetric sequences. The symmetric convolution of multidimensional asymmetric sequences can then be accomplished by taking the product of the trigonometric transforms of the sequences and then applying an inverse trigonometric transform to the result. An example is given of how this theory can be used for applying a two-dimensional (2-D) finite impulse response (FIR) filter with nonlinear phase which models atmospheric turbulence.
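
    The starting fact used above, that the discrete Fourier transform diagonalizes a circulant matrix, is equivalent to saying that circular convolution becomes element-wise multiplication in the DFT domain. The sketch below checks this numerically in one dimension with a naive DFT. It illustrates only that underlying property, not the symmetric convolution with discrete trigonometric transforms derived in the paper; the function names and test vectors are assumptions.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def circular_convolve_direct(a, b):
    """Circular convolution from its definition (a circulant matrix times b)."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]


def circular_convolve_dft(a, b):
    """Same result via transform, element-wise product, inverse transform."""
    product = [p * q for p, q in zip(dft(a), dft(b))]
    return [v.real for v in dft(product, inverse=True)]


a = [1.0, 2.0, 3.0, 4.0]
b = [0.0, 1.0, 0.5, 0.0]
direct = circular_convolve_direct(a, b)
via_dft = circular_convolve_dft(a, b)
```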

  18. Diverse and Abundant Secondary Metabolism Biosynthetic Gene Clusters in the Genomes of Marine Sponge Derived Streptomyces spp. Isolates.

    PubMed

    Jackson, Stephen A; Crossman, Lisa; Almeida, Eduardo L; Margassery, Lekha Menon; Kennedy, Jonathan; Dobson, Alan D W

    2018-02-20

    The genus Streptomyces produces secondary metabolic compounds that are rich in biological activity. Many of these compounds are genetically encoded by large secondary metabolism biosynthetic gene clusters (smBGCs) such as polyketide synthases (PKS) and non-ribosomal peptide synthetases (NRPS) which are modular and can be highly repetitive. Due to the repeats, these gene clusters can be difficult to resolve using short read next generation datasets and are often quite poorly predicted using standard approaches. We have sequenced the genomes of 13 Streptomyces spp. strains isolated from shallow water and deep-sea sponges that display antimicrobial activities against a number of clinically relevant bacterial and yeast species. Draft genomes have been assembled and smBGCs have been identified using the antiSMASH (antibiotics and Secondary Metabolite Analysis Shell) web platform. We have compared the smBGCs amongst strains in the search for novel sequences conferring the potential to produce novel bioactive secondary metabolites. The strains in this study recruit to four distinct clades within the genus Streptomyces. The marine strains host abundant smBGCs which encode polyketides, NRPS, siderophores, bacteriocins and lantipeptides. The deep-sea strains appear to be enriched with gene clusters encoding NRPS. Marine adaptations are evident in the sponge-derived strains which are enriched for genes involved in the biosynthesis and transport of compatible solutes and for heat-shock proteins. Streptomyces spp. from marine environments are a promising source of novel bioactive secondary metabolites as the abundance and diversity of smBGCs show high degrees of novelty. Sponge derived Streptomyces spp. isolates appear to display genomic adaptations to marine living when compared to terrestrial strains.

  19. Micronuclear DNA of Oxytricha nova contains sequences with autonomously replicating activity in Saccharomyces cerevisiae.

    PubMed Central

    Colombo, M M; Swanton, M T; Donini, P; Prescott, D M

    1984-01-01

    Oxytricha nova is a hypotrichous ciliate with micronuclei and macronuclei. Micronuclei, which contain large, chromosomal-sized DNA, are genetically inert but undergo meiosis and exchange during cell mating. Macronuclei, which contain only small, gene-sized DNA molecules, provide all of the nuclear RNA needed to run the cell. After cell mating the macronucleus is derived from a micronucleus, a derivation that includes excision of the genes from chromosomes and elimination of the remaining DNA. The eliminated DNA includes all of the repetitious sequences and approximately 95% of the unique sequences. We cloned large restriction fragments from the micronucleus that confer replication ability on a replication-deficient plasmid in Saccharomyces cerevisiae. Sequences that confer replication ability are called autonomously replicating sequences. The frequency and effectiveness of autonomously replicating sequences in micronuclear DNA are similar to those reported for DNAs of other organisms introduced into yeast cells. Of the 12 micronuclear fragments with autonomously replicating sequence activity, 9 also showed homology to macronuclear DNA, indicating that they contain a macronuclear gene sequence. We conclude from this that autonomously replicating sequence activity is nonrandomly distributed throughout micronuclear DNA and is preferentially associated with those regions of micronuclear DNA that contain genes. Images PMID:6092934

  20. Revising Star and Planet Formation Timescales

    NASA Astrophysics Data System (ADS)

    Bell, Cameron P. M.; Naylor, Tim; Mayne, N. J.; Jeffries, R. D.; Littlefair, S. P.

    2013-07-01

    We have derived ages for 13 young (<30 Myr) star-forming regions and find that they are up to a factor of 2 older than the ages typically adopted in the literature. This result has wide-ranging implications, including that circumstellar discs survive longer (≃ 10-12 Myr) and that the average Class I lifetime is greater (≃1 Myr) than currently believed. For each star-forming region, we derived two ages from colour-magnitude diagrams. First, we fitted models of the evolution between the zero-age main sequence and terminal-age main sequence to derive a homogeneous set of main-sequence ages, distances and reddenings with statistically meaningful uncertainties. Our second age for each star-forming region was derived by fitting pre-main-sequence stars to new semi-empirical model isochrones. For the first time (for a set of clusters younger than 50 Myr), we find broad agreement between these two ages, and since these are derived from two distinct mass regimes that rely on different aspects of stellar physics, it gives us confidence in the new age scale. This agreement is largely due to our adoption of empirical colour-Teff relations and bolometric corrections for pre-main-sequence stars cooler than 4000 K. The revised ages for the star-forming regions in our sample are: 2 Myr for NGC 6611 (Eagle Nebula; M 16), IC 5146 (Cocoon Nebula), NGC 6530 (Lagoon Nebula; M 8) and NGC 2244 (Rosette Nebula); 6 Myr for σ Ori, Cep OB3b and IC 348; ≃10 Myr for λ Ori (Collinder 69); ≃11 Myr for NGC 2169; ≃12 Myr for NGC 2362; ≃13 Myr for NGC 7160; ≃14 Myr for χ Per (NGC 884); and ≃20 Myr for NGC 1960 (M 36).

  1. Complete coding regions of the prototypes enterovirus B93 and C95: phylogenetic analyses of the P1 and P3 regions of EV-B and EV-C strains.

    PubMed

    Junttila, N; Lévêque, N; Magnius, L O; Kabue, J P; Muyembe-Tamfum, J J; Maslin, J; Lina, B; Norder, H

    2015-03-01

    Complete coding regions were sequenced for two new enterovirus genomes: EV-B93 previously identified by VP1 sequencing, derived from a child with acute flaccid paralysis in the Democratic Republic of Congo; and EV-C95 from a French soldier with acute gastroenteritis in Djibouti. The EV-B93 P1 had more than 30% nucleotide divergence from other EV-B types, with highest similarity to E-15 and EV-B80. The P1 nucleotide sequence of EV-C95 was most similar, 71%, to CV-A21. Complete coding regions for the new enteroviruses were compared with those of 135 EV-B and 176 EV-C strains representing all types available in GenBank. When strains from the same outbreak or strains isolated during the same year in the same geographical region were excluded, 27 of the 58 EV-B, and 16 of the 23 EV-C types were represented by more than one sequence. However, for EV-B the P3 sequences formed three clades mainly according to origin or time of isolation, irrespective of type, while for EV-C the P3 sequences segregated mainly according to disease manifestation, with most strains causing paralysis, including polioviruses, forming one clade, and strains causing respiratory illness forming another. There was no intermixing of types between these two clades, apart from two EV-C96 strains. The EV-B P3 sequences had lower inter-clade and higher intra-clade variability as compared to the EV-C sequences, which may explain why inter-clade recombinations are more frequent in EV-B. Further analysis of more isolates may shed light on the role of recombinations in the evolution of EV-B in geographical context. © 2014 Wiley Periodicals, Inc.

  2. Characterization and comparative analysis of the genome of Puccinia sorghi Schwein, the causal agent of maize common rust.

    PubMed

    Rochi, Lucia; Diéguez, María José; Burguener, Germán; Darino, Martín Alejandro; Pergolesi, María Fernanda; Ingala, Lorena Romina; Cuyeu, Alba Romina; Turjanski, Adrián; Kreff, Enrique Domingo; Sacco, Francisco

    2018-03-01

    Rust fungi are among the most devastating pathogens of crop plants. The biotrophic fungus Puccinia sorghi Schwein (Ps) is responsible for maize common rust, an endemic disease of maize (Zea mays L.) in Argentina that causes significant yield losses in corn production. In spite of this, the Ps genomic sequence was not available. We used Illumina sequencing to rapidly produce the 99.6 Mb draft genome sequence of Ps race RO10H11247, derived from a single-uredinial isolate from infected maize leaves collected in the Argentine Corn Belt Region during 2010. High quality reads were obtained from 200 bp paired-end and 5000 bp mate-paired libraries and assembled in 15,722 scaffolds. A pipeline which combined an ab initio program with homology-based models and homology to in planta enriched ESTs from four cereal pathogenic fungi (the three sequenced wheat rusts and Ustilago maydis) was used to identify 21,087 putative coding sequences, of which 1599 might be part of the Ps RO10H11247 secretome. Among the 458 highly conserved protein families from the euKaryotic Orthologous Groups (KOG) that occur in a wide range of eukaryotic organisms, 97.5% have at least one member with high homology in the Ps assembly (TBlastN, E-value ⩽ e-10) covering more than 50% of the length of the KOG protein. Comparative studies with the three sequenced wheat rust fungi, and microsynteny analysis involving Puccinia striiformis f. sp. tritici (Pst, wheat stripe rust fungus), support the quality achieved. The results presented here show the effectiveness of the Illumina strategy for sequencing dikaryotic genomes of non-model organisms and provide reliable DNA sequence information for genomic studies, including pathogenic mechanisms of this maize fungus and molecular marker design. Copyright © 2016 Elsevier Inc. All rights reserved.

  3. StralSV: assessment of sequence variability within similar 3D structures and application to polio RNA-dependent RNA polymerase.

    PubMed

    Zemla, Adam T; Lang, Dorothy M; Kostova, Tanya; Andino, Raul; Ecale Zhou, Carol L

    2011-06-02

    Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory--still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could help overcome these difficulties by facilitating the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV (structure-alignment sequence variability), a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus, and we demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique, or that share structural similarity with proteins that would be considered distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local structural alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. 
    StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position. StralSV is provided as a web service at http://proteinmodel.org/AS2TS/STRALSV/.

  4. In-silico mining, type and frequency analysis of genic microsatellites of finger millet (Eleusine coracana (L.) Gaertn.): a comparative genomic analysis of NBS-LRR regions of finger millet with rice.

    PubMed

    Kalyana Babu, B; Pandey, Dinesh; Agrawal, P K; Sood, Salej; Kumar, Anil

    2014-05-01

    In recent years, the increased availability of DNA sequences has made it possible to develop and explore SSR markers derived from expressed sequence tags (ESTs). In the present study, a total of 1956 ESTs of finger millet were used to determine microsatellite type, distribution and frequency, and a total of 545 primer pairs were developed from the ESTs of finger millet. Thirty-two EST sequences had more than two microsatellites and 1357 sequences did not have any SSR repeats. The most frequent type of repeat was the trimeric motif, followed by the dimeric motif and then tetra-, hexa- and penta-repeat motifs. The most common dimer repeat motif was GA and, in the case of trimeric SSRs, it was CGG. The EST sequences of the NBS-LRR region of finger millet and rice showed higher synteny and were found at nearly the same positions on the rice chromosome map. A total of eight out of 15 EST-based SSR primers were polymorphic among the selected resistant and susceptible finger millet genotypes. The primer FMBLEST5 was able to differentiate them into resistant and susceptible genotypes. The alleles specific to the resistant and susceptible genotypes were sequenced using the ABI 3130XL genetic analyzer; they showed similarity to NBS-LRR regions of rice and finger millet and contained the characteristic kinase-2 and kinase 3a motifs of plant R-genes belonging to the NBS-LRR region. The in-silico and comparative analysis showed that the genes responsible for blast resistance can be identified, mapped and further introgressed through molecular breeding approaches for enhancing the blast resistance in finger millet.
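
    In-silico microsatellite mining of the kind described above can be sketched with regular expressions that look for a short motif repeated in tandem. The abstract does not give the tool or the repeat thresholds used, so the thresholds below (at least 6 repeats for dimers, 5 for trimers), the function name find_ssrs and the test sequence are assumptions for illustration only.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for dimer and trimer tandem repeats."""
    if min_repeats is None:
        # Assumed thresholds: motif length -> minimum number of tandem copies.
        min_repeats = {2: 6, 3: 5}
    hits = []
    for motif_len, min_count in sorted(min_repeats.items()):
        # A motif of the given length followed by at least min_count - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_count - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            repeats = len(match.group(0)) // motif_len
            hits.append((match.start(), motif, repeats))
    return sorted(hits)


# Invented EST-like fragment with one GA dimer repeat and one CGG trimer repeat.
est = "TT" + "GA" * 6 + "CC" + "CGG" * 5 + "AT"
ssrs = find_ssrs(est)
```

    Counting the motifs returned over a whole EST set would give frequency tables of the kind summarized in the abstract (for example, GA as the most common dimer).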

  5. Grading of Gliomas by Using Radiomic Features on Multiple Magnetic Resonance Imaging (MRI) Sequences.

    PubMed

    Qin, Jiang-Bo; Liu, Zhenyu; Zhang, Hui; Shen, Chen; Wang, Xiao-Chun; Tan, Yan; Wang, Shuo; Wu, Xiao-Feng; Tian, Jie

    2017-05-07

    BACKGROUND Gliomas are the most common primary brain neoplasms. Misdiagnosis occurs in glioma grading due to an overlap in conventional MRI manifestations. The aim of the present study was to evaluate the power of radiomic features based on multiple MRI sequences - T2-Weighted-Imaging-FLAIR (FLAIR), T1-Weighted-Imaging-Contrast-Enhanced (T1-CE), and Apparent Diffusion Coefficient (ADC) map - in glioma grading, and to improve the power of glioma grading by combining features. MATERIAL AND METHODS Sixty-six patients with histopathologically proven gliomas underwent T2-FLAIR and T1WI-CE sequence scanning with some patients (n=63) also undergoing DWI scanning. A total of 114 radiomic features were derived with radiomic methods by using in-house software. All radiomic features were compared between high-grade gliomas (HGGs) and low-grade gliomas (LGGs). Features with significant statistical differences were selected for receiver operating characteristic (ROC) curve analysis. The relationships between significantly different radiomic features and glial fibrillary acidic protein (GFAP) expression were evaluated. RESULTS A total of 8 radiomic features from 3 MRI sequences displayed significant differences between LGGs and HGGs. FLAIR GLCM Cluster Shade, T1-CE GLCM Entropy, and ADC GLCM Homogeneity were the best features to use in differentiating LGGs and HGGs in each MRI sequence. The combined feature was best able to differentiate LGGs and HGGs, which improved the accuracy of glioma grading compared to the above features in each MRI sequence. A significant correlation was found between GFAP and T1-CE GLCM Entropy, as well as between GFAP and ADC GLCM Homogeneity. CONCLUSIONS The combined radiomic feature had the highest efficacy in distinguishing LGGs from HGGs.

  6. The determination of high-resolution spatio-temporal glacier motion fields from time-lapse sequences

    NASA Astrophysics Data System (ADS)

    Schwalbe, Ellen; Maas, Hans-Gerd

    2017-12-01

    This paper presents a comprehensive method for the determination of glacier surface motion vector fields at high spatial and temporal resolution. These vector fields can be derived from monocular terrestrial camera image sequences and are a valuable data source for glaciological analysis of the motion behaviour of glaciers. The measurement concepts for the acquisition of image sequences are presented, and an automated monoscopic image sequence processing chain is developed. Motion vector fields can be derived with high precision by applying automatic subpixel-accuracy image matching techniques on grey value patterns in the image sequences. Well-established matching techniques have been adapted to the special characteristics of the glacier data in order to achieve high reliability in automatic image sequence processing, including the handling of moving shadows as well as motion effects induced by small instabilities in the camera set-up. Suitable geo-referencing techniques were developed to transform image measurements into a reference coordinate system. The result of monoscopic image sequence analysis is a dense raster of glacier surface point trajectories for each image sequence. Each translation vector component in these trajectories can be determined with an accuracy of a few centimetres for points at a distance of several kilometres from the camera. Extensive practical validation experiments have shown that motion vector and trajectory fields derived from monocular image sequences can be used for the determination of high-resolution velocity fields of glaciers, including the analysis of tidal effects on glacier movement, the investigation of a glacier's motion behaviour during calving events, the determination of the position and migration of the grounding line and the detection of subglacial channels during glacier lake outburst floods.

  7. Characterization of circulating transfer RNA-derived RNA fragments in cattle

    PubMed Central

    Casas, Eduardo; Cai, Guohong; Neill, John D.

    2015-01-01

    The objective was to characterize naturally occurring circulating transfer RNA-derived RNA fragments (tRFs) in cattle. Serum from eight clinically normal adult dairy cows was collected, and small non-coding RNAs were extracted immediately after collection and sequenced by Illumina MiSeq. Sequences aligned to transfer RNA (tRNA) genes or their flanking sequences were characterized. Sequences aligned to the beginning of the 5′ end of the mature tRNA were classified as tRF5; those aligned to the 3′ end of mature tRNA were classified as tRF3; and those aligned to the beginning of the 3′ end flanking sequences were classified as tRF1. There were 3,190,962 sequences that mapped to transfer RNA and small non-coding RNAs in the bovine genome. Of these, 2,323,520 were identified as tRF5s, 562 were tRF3s, and 81 were tRF1s. There were 866,799 sequences identified as other small non-coding RNAs (microRNA, rRNA, snoRNA, etc.) and were excluded from the study. The tRF5s ranged from 28 to 40 nucleotides; and 98.7% ranged from 30 to 34 nucleotides in length. The tRFs with the greatest number of sequences were derived from tRNA of histidine, glutamic acid, lysine, glycine, and valine. There was no association between number of codons for each amino acid and number of tRFs in the samples. The reason for tRF5s being the most abundant can only be explained if these sequences are associated with function within the animal. PMID:26379699
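
    The read counts quoted in this abstract can be checked for internal consistency with a few lines of arithmetic. The sketch below only re-uses the numbers stated above; the dictionary keys and the helper name `fraction_of_total` are illustrative choices, not part of the study.

```python
# Read counts as reported in the abstract above.
total_mapped = 3_190_962
counts = {
    "tRF5": 2_323_520,
    "tRF3": 562,
    "tRF1": 81,
    "other_small_ncRNA": 866_799,  # excluded from the study
}


def fraction_of_total(class_counts, total):
    """Return each class's share of the total mapped reads."""
    return {name: n / total for name, n in class_counts.items()}


accounted = sum(counts.values())
fractions = fraction_of_total(counts, total_mapped)

print("reads accounted for:", accounted)
for name, share in fractions.items():
    print(f"{name}: {share:.4%}")
```

    The four classes sum exactly to the reported total, and tRF5s make up roughly 73% of all mapped reads.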

  8. A short introduction to cytogenetic studies in mammals with reference to the present volume.

    PubMed

    Graphodatsky, A; Ferguson-Smith, M A; Stanyon, R

    2012-01-01

    Genome diversity has long been studied from the comparative cytogenetic perspective. Early workers documented differences between species in diploid chromosome number and fundamental number. Banding methods allowed more detailed descriptions of between-species rearrangements and classes of differentially staining chromosome material. The infusion of molecular methods into cytogenetics provided a third revolution, which is still not exhausted. Chromosome painting has provided a global view of the translocation history of mammalian genome evolution, well summarized in the contributions to this special volume. More recently, FISH of cloned DNA has provided details on defining breakpoint and intrachromosomal marker order, which have helped to document inversions and centromere repositioning. The most recent trend in comparative molecular cytogenetics is to integrate sequencing information in order to formulate and test reconstructions of ancestral genomes and phylogenomic hypotheses derived from comparative cytogenetics. The integration of comparative cytogenetics and sequencing promises to provide an understanding of what drives chromosome rearrangements and genome evolution in general. We believe that the contributions in this volume, in no small way, point the way to the next phase in cytogenetic studies. Copyright © 2012 S. Karger AG, Basel.

  9. Osteogenic differentiation of 3D cultured mesenchymal stem cells induced by bioactive peptides.

    PubMed

    Lukasova, Vera; Buzgo, Matej; Sovkova, Vera; Dankova, Jana; Rampichova, Michala; Amler, Evzen

    2017-08-01

    Bioactive peptides derived from receptor binding motifs of native proteins are a potent source of bioactive molecules that can induce signalling pathways. These peptides could substitute for osteogenesis promoting supplements. The work presented here compares three kinds of bioactive peptides derived from collagen III, bone morphogenetic protein 7 (BMP-7) and BMP-2 with their potential osteogenic activity on the model of porcine mesenchymal stem cells (pMSCs). pMSCs were cultured on electrospun polycaprolactone nanofibrous scaffolds with different concentrations of the bioactive peptides without addition of any osteogenic supplement. Analysis of pMSCs cultures included measurement of the metabolic activity and proliferation, immunofluorescence staining and also qPCR. Results showed no detrimental effect of the bioactive peptides to cultured pMSCs. Based on qPCR analysis, the bioactive peptides are specific for osteogenic differentiation with no detectable expression of collagen II. Our results further indicate that peptide derived from BMP-2 protein promoted the expression of mRNA for osteocalcin (OCN) and collagen I significantly compared to control groups and also supported deposition of OCN as observed by immunostaining method. The data suggest that bioactive peptide with an amino acid sequence of KIPKASSVPTELSAISTLYL derived from BMP-2 protein was the most potent for triggering osteogenic differentiation of pMSCs. © 2017 John Wiley & Sons Ltd.

  10. Heuristic reusable dynamic programming: efficient updates of local sequence alignment.

    PubMed

    Hong, Changjin; Tewfik, Ahmed H

    2009-01-01

    Recomputation of the previously evaluated similarity results between biological sequences becomes inevitable when researchers realize errors in their sequenced data or when the researchers have to compare nearly similar sequences, e.g., in a family of proteins. We present an efficient scheme for updating local sequence alignments with an affine gap model. In principle, using the previous matching result between two amino acid sequences, we perform a forward-backward alignment to generate heuristic searching bands which are bounded by a set of suboptimal paths. Given a correctly updated sequence, we initially predict a new score of the alignment path for each contour to select the best candidates among them. Then, we run the Smith-Waterman algorithm in this confined space. Furthermore, our heuristic alignment for an updated sequence shows that it can be further accelerated by using reusable dynamic programming (rDP), our prior work. In this study, we successfully validate "relative node tolerance bound" (RNTB) in the pruned searching space. Furthermore, we improve the computational performance by quantifying the successful RNTB tolerance probability and switch to rDP on perturbation-resilient columns only. In our searching space derived by a threshold value of 90 percent of the optimal alignment score, we find that 98.3 percent of contours contain correctly updated paths. We also find that our method consumes only 25.36 percent of the runtime cost of the sparse dynamic programming (sDP) method, and only 2.55 percent of that of normal dynamic programming with the Smith-Waterman algorithm.
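
    For orientation, the baseline that this abstract compares against (Smith-Waterman local alignment with an affine gap model) can be sketched as follows. This is the standard full-matrix recurrence, not the authors' banded or reusable scheme; the function name and the scoring values (`match`, `mismatch`, `gap_open`, `gap_extend`) are assumptions chosen for illustration.

```python
NEG_INF = float("-inf")


def smith_waterman_affine(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best local alignment score of strings a and b.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    Scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)
    # H: best score of an alignment ending at (i, j).
    # E: best score ending at (i, j) with a gap that consumes b[j-1].
    # F: best score ending at (i, j) with a gap that consumes a[i-1].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either extend an existing gap or open a new one.
            E[i][j] = max(E[i][j - 1] + gap_extend, H[i][j - 1] + gap_open)
            F[i][j] = max(F[i - 1][j] + gap_extend, H[i - 1][j] + gap_open)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 option is what makes the alignment local.
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


print(smith_waterman_affine("ACGTACGT", "ACGTTACGT"))
```

    The full matrix costs time proportional to the product of the two sequence lengths; the scheme described in the abstract reduces this by running the same recurrence only inside bands around previously found suboptimal paths.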

  11. The first genetic map of the American cranberry: exploration of synteny conservation and quantitative trait loci.

    PubMed

    Georgi, Laura; Johnson-Cicalese, Jennifer; Honig, Josh; Das, Sushma Parankush; Rajah, Veeran D; Bhattacharya, Debashish; Bassil, Nahla; Rowland, Lisa J; Polashock, James; Vorsa, Nicholi

    2013-03-01

    The first genetic map of cranberry (Vaccinium macrocarpon) has been constructed, comprising 14 linkage groups totaling 879.9 cM with an estimated coverage of 82.2 %. This map, based on four mapping populations segregating for field fruit-rot resistance, contains 136 distinct loci. Mapped markers include blueberry-derived simple sequence repeat (SSR) and cranberry-derived sequence-characterized amplified region markers previously used for fingerprinting cranberry cultivars. In addition, SSR markers were developed near cranberry sequences resembling genes involved in flavonoid biosynthesis or defense against necrotrophic pathogens, or conserved orthologous set (COS) sequences. The cranberry SSRs were developed from next-generation cranberry genomic sequence assemblies; thus, the positions of these SSRs on the genomic map provide information about the genomic location of the sequence scaffold from which they were derived. The use of SSR markers near COS and other functional sequences, plus 33 SSR markers from blueberry, facilitates comparisons of this map with maps of other plant species. Regions of the cranberry map were identified that showed conservation of synteny with Vitis vinifera and Arabidopsis thaliana. Positioned on this map are quantitative trait loci (QTL) for field fruit-rot resistance (FFRR), fruit weight, titratable acidity, and sound fruit yield (SFY). The SFY QTL is adjacent to one of the fruit weight QTL and may reflect pleiotropy. Two of the FFRR QTL are in regions of conserved synteny with grape and span defense gene markers, and the third FFRR QTL spans a flavonoid biosynthetic gene.

  12. Sequential de novo centromere formation and inactivation on a chromosomal fragment in maize.

    PubMed

    Liu, Yalin; Su, Handong; Pang, Junling; Gao, Zhi; Wang, Xiu-Jie; Birchler, James A; Han, Fangpu

    2015-03-17

    The ability of centromeres to alternate between active and inactive states indicates significant epigenetic aspects controlling centromere assembly and function. In maize (Zea mays), misdivision of the B chromosome centromere on a translocation with the short arm of chromosome 9 (TB-9Sb) can produce many variants with varying centromere sizes and centromeric DNA sequences. In such derivatives of TB-9Sb, we found a de novo centromere on chromosome derivative 3-3, which has no canonical centromeric repeat sequences. This centromere is derived from a 288-kb region on the short arm of chromosome 9, and is 19 megabases (Mb) removed from the translocation breakpoint of chromosome 9 in TB-9Sb. The functional B centromere in progenitor telo2-2 is deleted from derivative 3-3, but some B-repeat sequences remain. The de novo centromere of derivative 3-3 becomes inactive in three further derivatives with new centromeres being formed elsewhere on each chromosome. Our results suggest that de novo centromere initiation is quite common and can persist on chromosomal fragments without a canonical centromere. However, we hypothesize that when de novo centromeres are initiated in opposition to a larger normal centromere, they are cleared from the chromosome by inactivation, thus maintaining karyotype integrity.

  13. Sequential de novo centromere formation and inactivation on a chromosomal fragment in maize

    PubMed Central

    Liu, Yalin; Su, Handong; Pang, Junling; Gao, Zhi; Wang, Xiu-Jie; Birchler, James A.; Han, Fangpu

    2015-01-01

    The ability of centromeres to alternate between active and inactive states indicates significant epigenetic aspects controlling centromere assembly and function. In maize (Zea mays), misdivision of the B chromosome centromere on a translocation with the short arm of chromosome 9 (TB-9Sb) can produce many variants with varying centromere sizes and centromeric DNA sequences. In such derivatives of TB-9Sb, we found a de novo centromere on chromosome derivative 3-3, which has no canonical centromeric repeat sequences. This centromere is derived from a 288-kb region on the short arm of chromosome 9, and is 19 megabases (Mb) removed from the translocation breakpoint of chromosome 9 in TB-9Sb. The functional B centromere in progenitor telo2-2 is deleted from derivative 3-3, but some B-repeat sequences remain. The de novo centromere of derivative 3-3 becomes inactive in three further derivatives with new centromeres being formed elsewhere on each chromosome. Our results suggest that de novo centromere initiation is quite common and can persist on chromosomal fragments without a canonical centromere. However, we hypothesize that when de novo centromeres are initiated in opposition to a larger normal centromere, they are cleared from the chromosome by inactivation, thus maintaining karyotype integrity. PMID:25733907

  14. Deriving high-resolution protein backbone structure propensities from all crystal data using the information maximization device.

    PubMed

    Solis, Armando D

    2014-01-01

    The most informative probability distribution functions (PDFs) describing the Ramachandran phi-psi dihedral angle pair, a fundamental descriptor of backbone conformation of protein molecules, are derived from high-resolution X-ray crystal structures using an information-theoretic approach. The Information Maximization Device (IMD) is established, based on fundamental information-theoretic concepts, and then applied specifically to derive highly resolved phi-psi maps for all 20 single amino acid and all 8000 triplet sequences at an optimal resolution determined by the volume of current data. The paper shows that utilizing the latent information contained in all viable high-resolution crystal structures found in the Protein Data Bank (PDB), totaling more than 77,000 chains, permits the derivation of a large number of optimized sequence-dependent PDFs. This work demonstrates the effectiveness of the IMD and the superiority of the resulting PDFs by extensive fold recognition experiments and rigorous comparisons with previously published triplet PDFs. Because it automatically optimizes PDFs, IMD results in improved performance of knowledge-based potentials, which rely on such PDFs. Furthermore, it provides an easy computational recipe for empirically deriving other kinds of sequence-dependent structural PDFs with greater detail and precision. The high-resolution phi-psi maps derived in this work are available for download.

  15. Sequence Analysis of IncA/C and IncI1 Plasmids Isolated from Multidrug-Resistant Salmonella Newport Using Single-Molecule Real-Time Sequencing.

    PubMed

    Cao, Guojie; Allard, Marc; Hoffmann, Maria; Muruvanda, Tim; Luo, Yan; Payne, Justin; Meng, Kevin; Zhao, Shaohua; McDermott, Patrick; Brown, Eric; Meng, Jianghong

    2018-06-01

    Multidrug-resistant (MDR) plasmids play an important role in disseminating antimicrobial resistance genes. To elucidate the antimicrobial resistance gene compositions in A/C incompatibility complex (IncA/C) plasmids carried by animal-derived MDR Salmonella Newport, and to investigate the spread mechanism of IncA/C plasmids, this study characterizes the complete nucleotide sequences of IncA/C plasmids by comparative analysis. Complete nucleotide sequencing of plasmids and chromosomes of six MDR Salmonella Newport strains was performed using PacBio RSII. Open reading frames were assigned using prokaryotic genome annotation pipeline (PGAP). To understand genomic diversity and evolutionary relationships among Salmonella Newport IncA/C plasmids, we included three complete IncA/C plasmid sequences with similar backbones from Salmonella Newport and Escherichia coli: pSN254, pAM04528, and peH4H, and additional 200 draft chromosomes. With the exception of canine isolate CVM22462, which contained an additional IncI1 plasmid, each of the six MDR Salmonella Newport strains contained only the IncA/C plasmid. These IncA/C plasmids (including references) ranged in size from 80.1 (pCVM21538) to 176.5 kb (pSN254) and carried various resistance genes. Resistance genes floR, tetA, tetR, strA, strB, sul, and mer were identified in all IncA/C plasmids. Additionally, bla CMY-2 and sugE were present in all IncA/C plasmids, excepting pCVM21538. Plasmid pCVM22462 was capable of being transferred by conjugation. The IncI1 plasmid pCVM22462b in CVM22462 carried bla CMY-2 and sugE. Our data showed that MDR Salmonella Newport strains carrying similar IncA/C plasmids clustered together in the phylogenetic tree using chromosome sequences and the IncA/C plasmids from animal-derived Salmonella Newport contained diverse resistance genes. 
    In the current study, we analyzed genomic diversity and phylogenetic relationships among MDR Salmonella Newport using complete plasmid and chromosome sequences and proposed a possible spread mechanism of IncA/C plasmids in Salmonella Newport Lineage II.

  16. A small and efficient dimerization/packaging signal of rat VL30 RNA and its use in murine leukemia virus-VL30-derived vectors for gene transfer.

    PubMed

    Torrent, C; Gabus, C; Darlix, J L

    1994-02-01

    Retroviral genomes consist of two identical RNA molecules associated at their 5' ends by the dimer linkage structure located in the packaging element (Psi or E) necessary for RNA dimerization in vitro and packaging in vivo. In murine leukemia virus (MLV)-derived vectors designed for gene transfer, the Psi + sequence of 600 nucleotides directs the packaging of recombinant RNAs into MLV virions produced by helper cells. By using in vitro RNA dimerization as a screening system, a sequence of rat VL30 RNA located next to the 5' end of the Harvey mouse sarcoma virus genome and as small as 67 nucleotides was found to form stable dimeric RNA. In addition, a purine-rich sequence located at the 5' end of this VL30 RNA seems to be critical for RNA dimerization. When this VL30 element was extended by 107 nucleotides at its 3' end and inserted into an MLV-derived vector lacking MLV Psi +, it directed the efficient encapsidation of recombinant RNAs into MLV virions. Because this VL30 packaging signal is smaller and more efficient in packaging recombinant RNAs than the MLV Psi + and does not contain gag or glyco-gag coding sequences, its use in MLV-derived vectors should render even more unlikely recombinations which could generate replication-competent viruses. Therefore, utilization of the rat VL30 packaging sequence should improve the biological safety of MLV vectors for human gene transfer.

  17. Characterizing protein conformations by correlation analysis of coarse-grained contact matrices.

    PubMed

    Lindsay, Richard J; Siess, Jan; Lohry, David P; McGee, Trevor S; Ritchie, Jordan S; Johnson, Quentin R; Shen, Tongye

    2018-01-14

    We have developed a method to capture the essential conformational dynamics of folded biopolymers using statistical analysis of coarse-grained segment-segment contacts. Previously, the residue-residue contact analysis of simulation trajectories was successfully applied to the detection of conformational switching motions in biomolecular complexes. However, the application to large protein systems (larger than 1000 amino acid residues) is challenging using the description of residue contacts. Also, the residue-based method cannot be used to compare proteins with different sequences. To expand the scope of the method, we have tested several coarse-graining schemes that group a collection of consecutive residues into a segment. The definition of these segments may be derived from structural and sequence information, while the interaction strength of the coarse-grained segment-segment contacts is a function of the residue-residue contacts. We then perform covariance calculations on these coarse-grained contact matrices. We monitored how well the principal components of the contact matrices are preserved using various rendering functions. The new method was demonstrated to assist the reduction of the degrees of freedom for describing the conformation space, and it potentially allows for the analysis of a system that is approximately tenfold larger compared with the corresponding residue contact-based method. This method can also render a family of similar proteins into the same conformational space, and thus can be used to compare the structures of proteins with different sequences.
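
    The two steps named in this abstract, grouping consecutive residues into segments and then computing covariances over the resulting contact features, can be illustrated with a minimal sketch. The abstract does not specify the rendering function, so summing residue-residue contacts within each segment pair is an assumption here, as are the function names and the toy data.

```python
def coarse_grain_contacts(contacts, segments, combine=sum):
    """Collapse a residue-residue contact matrix into segment-segment strengths.

    contacts: square matrix (list of lists) of residue-residue contact values.
    segments: list of (start, end) half-open residue index ranges.
    combine:  assumed rendering function applied to all residue pairs
              between two segments (sum by default).
    """
    k = len(segments)
    cg = [[0.0] * k for _ in range(k)]
    for a, (s1, e1) in enumerate(segments):
        for b, (s2, e2) in enumerate(segments):
            vals = [contacts[i][j] for i in range(s1, e1) for j in range(s2, e2)]
            cg[a][b] = float(combine(vals))
    return cg


def covariance(frames):
    """Population covariance matrix of per-frame feature vectors."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(d)]
    return [
        [sum((f[p] - mean[p]) * (f[q] - mean[q]) for f in frames) / n for q in range(d)]
        for p in range(d)
    ]


# Toy example: 4 residues grouped into 2 segments of 2 residues each.
toy_contacts = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
toy_cg = coarse_grain_contacts(toy_contacts, [(0, 2), (2, 4)])
print(toy_cg)
```

    In a trajectory analysis, each frame's coarse-grained matrix would be flattened into a feature vector and passed to `covariance`; the principal components are then the eigenvectors of that covariance matrix.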

  18. Characterizing protein conformations by correlation analysis of coarse-grained contact matrices

    NASA Astrophysics Data System (ADS)

    Lindsay, Richard J.; Siess, Jan; Lohry, David P.; McGee, Trevor S.; Ritchie, Jordan S.; Johnson, Quentin R.; Shen, Tongye

    2018-01-01

    We have developed a method to capture the essential conformational dynamics of folded biopolymers using statistical analysis of coarse-grained segment-segment contacts. Previously, the residue-residue contact analysis of simulation trajectories was successfully applied to the detection of conformational switching motions in biomolecular complexes. However, the application to large protein systems (larger than 1000 amino acid residues) is challenging using the description of residue contacts. Also, the residue-based method cannot be used to compare proteins with different sequences. To expand the scope of the method, we have tested several coarse-graining schemes that group a collection of consecutive residues into a segment. The definition of these segments may be derived from structural and sequence information, while the interaction strength of the coarse-grained segment-segment contacts is a function of the residue-residue contacts. We then perform covariance calculations on these coarse-grained contact matrices. We monitored how well the principal components of the contact matrices are preserved using various rendering functions. The new method was demonstrated to assist the reduction of the degrees of freedom for describing the conformation space, and it potentially allows for the analysis of a system that is approximately tenfold larger compared with the corresponding residue contact-based method. This method can also render a family of similar proteins into the same conformational space, and thus can be used to compare the structures of proteins with different sequences.

  19. Number of 24-Hour Diet Recalls Needed to Estimate Energy Intake

    PubMed Central

    Ma, Yunsheng; Olendzki, Barbara C.; Pagoto, Sherry L.; Hurley, Thomas G.; Magner, Robert P.; Ockene, Ira S.; Schneider, Kristin L.; Merriam, Philip A.; Hébert, James R.

    2009-01-01

    Purpose: Twenty-four-hour diet recall interviews (24HRs) are used to assess diet and to validate other diet assessment instruments. Therefore it is important to know how many 24HRs are required to describe an individual's intake. Method: Seventy-nine middle-aged white women completed seven 24HRs over a 14-day period, during which energy expenditure (EE) was determined by the doubly labeled water method (DLW). Mean daily intakes were compared to DLW-derived EE using paired t tests. Linear mixed models were used to evaluate the effect of call sequence and day of the week on 24HR-derived energy intake while adjusting for education, relative body weight, social desirability, and an interaction between call sequence and social desirability. Results: Mean EE from DLW was 2115 kcal/day. Adjusted 24HR-derived energy intake was lowest at call 1 (1501 kcal/day); significantly higher energy intake was observed at calls 2 and 3 (2246 and 2315 kcal/day, respectively). Energy intake on Friday was significantly lower than on Sunday. Averaging energy intake from the first two calls better approximated true energy expenditure than did the first call, and averaging the first three calls further improved the estimate (p = 0.02 for both comparisons). Additional calls did not improve estimation. Conclusions: Energy intake is underreported on the first 24HR. Three 24HRs appear optimal for estimating energy intake. PMID:19576535
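
    The averaging argument in this abstract can be followed with the three adjusted per-call values it reports. The sketch below is plain arithmetic on those group-level figures and does not reproduce the study's linear mixed models; the helper name `running_means` is illustrative.

```python
# Values reported in the abstract above (kcal/day).
dlw_energy_expenditure = 2115
adjusted_intake_by_call = [1501, 2246, 2315]  # calls 1, 2 and 3


def running_means(values):
    """Mean of the first k values, for k = 1 .. len(values)."""
    means, total = [], 0
    for k, value in enumerate(values, start=1):
        total += value
        means.append(total / k)
    return means


means = running_means(adjusted_intake_by_call)
gaps = [dlw_energy_expenditure - m for m in means]

for k, (m, g) in enumerate(zip(means, gaps), start=1):
    print(f"first {k} call(s): mean intake {m:.1f}, shortfall vs DLW {g:.1f}")
```

    The shortfall relative to DLW-derived expenditure shrinks from 614 kcal/day with one call to about 94 kcal/day with three, in line with the abstract's conclusion.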

  20. Number of 24-hour diet recalls needed to estimate energy intake.

    PubMed

    Ma, Yunsheng; Olendzki, Barbara C; Pagoto, Sherry L; Hurley, Thomas G; Magner, Robert P; Ockene, Ira S; Schneider, Kristin L; Merriam, Philip A; Hébert, James R

    2009-08-01

    Twenty-four-hour diet recall interviews (24HRs) are used to assess diet and to validate other diet assessment instruments. Therefore it is important to know how many 24HRs are required to describe an individual's intake. Seventy-nine middle-aged white women completed seven 24HRs over a 14-day period, during which energy expenditure (EE) was determined by the doubly labeled water method (DLW). Mean daily intakes were compared to DLW-derived EE using paired t tests. Linear mixed models were used to evaluate the effect of call sequence and day of the week on 24HR-derived energy intake while adjusting for education, relative body weight, social desirability, and an interaction between call sequence and social desirability. Mean EE from DLW was 2115 kcal/day. Adjusted 24HR-derived energy intake was lowest at call 1 (1501 kcal/day); significantly higher energy intake was observed at calls 2 and 3 (2246 and 2315 kcal/day, respectively). Energy intake on Friday was significantly lower than on Sunday. Averaging energy intake from the first two calls better approximated true energy expenditure than did the first call, and averaging the first three calls further improved the estimate (p=0.02 for both comparisons). Additional calls did not improve estimation. Energy intake is underreported on the first 24HR. Three 24HRs appear optimal for estimating energy intake.

  1. Evidence for plant-derived xenomiRs based on a large-scale analysis of public small RNA sequencing data from human samples.

    PubMed

    Zhao, Qi; Liu, Yuanning; Zhang, Ning; Hu, Menghan; Zhang, Hao; Joshi, Trupti; Xu, Dong

    2018-01-01

    In recent years, an increasing number of studies have reported the presence of plant miRNAs in human samples, which resulted in a hypothesis asserting the existence of plant-derived exogenous microRNA (xenomiR). However, this hypothesis is not widely accepted in the scientific community due to possible sample contamination and the small sample size with lack of rigorous statistical analysis. This study provides a systematic statistical test that can validate (or invalidate) the plant-derived xenomiR hypothesis by analyzing 388 small RNA sequencing data from human samples in 11 types of body fluids/tissues. A total of 166 types of plant miRNAs were found in at least one human sample, of which 14 plant miRNAs represented more than 80% of the total plant miRNAs abundance in human samples. Plant miRNA profiles were characterized to be tissue-specific in different human samples. Meanwhile, the plant miRNAs identified from microbiome have an insignificant abundance compared to those from humans, while plant miRNA profiles in human samples were significantly different from those in plants, suggesting that sample contamination is an unlikely reason for all the plant miRNAs detected in human samples. This study also provides a set of testable synthetic miRNAs with isotopes that can be detected in situ after being fed to animals.

  2. Cluster-Based Multipolling Sequencing Algorithm for Collecting RFID Data in Wireless LANs

    NASA Astrophysics Data System (ADS)

    Choi, Woo-Yong; Chatterjee, Mainak

    2015-03-01

    With the growing use of RFID (Radio Frequency Identification), it is becoming important to devise ways to read RFID tags in real time. Access points (APs) of IEEE 802.11-based wireless Local Area Networks (LANs) are being integrated with RFID networks that can efficiently collect real-time RFID data. Several schemes, such as multipolling methods based on the dynamic search algorithm and random sequencing, have been proposed. However, as the number of RFID readers associated with an AP increases, it becomes difficult for the dynamic search algorithm to derive the multipolling sequence in real time. Though multipolling methods can eliminate the polling overhead, we still need to enhance the performance of the multipolling methods based on random sequencing. To that end, we propose a real-time cluster-based multipolling sequencing algorithm that eliminates more than 90% of the polling overhead, particularly when the dynamic search algorithm fails to derive the multipolling sequence in real time.

  3. Paleovirology of bornaviruses: What can be learned from molecular fossils of bornaviruses.

    PubMed

    Horie, Masayuki; Tomonaga, Keizo

    2018-04-06

    Endogenous viral elements (EVEs) are virus-derived sequences embedded in eukaryotic genomes formed by germline integration of viral sequences. As many EVEs were integrated into eukaryotic genomes millions of years ago, EVEs are considered molecular fossils of viruses. EVEs can be valuable informational sources about ancient viruses, including their time scale, geographical distribution, genetic information, and hosts. Although integration of viral sequences is not required for replications of viruses other than retroviruses, many non-retroviral EVEs have been reported to exist in eukaryotes. Investigation of these EVEs has expanded our knowledge regarding virus-host interactions, as well as provided information on ancient viruses. Among them, EVEs derived from bornaviruses, non-retroviral RNA viruses, have been relatively well studied. Bornavirus-derived EVEs are widely distributed in animal genomes, including the human genome, and the history of bornaviruses can be dated back to more than 65 million years. Although there are several reports focusing on the biological significance of bornavirus-derived sequences in mammals, paleovirology of bornaviruses has not yet been well described and summarized. In this paper, we describe what can be learned about bornaviruses from endogenous bornavirus-like elements from the view of paleovirology using published results and our novel data. Copyright © 2018 Elsevier B.V. All rights reserved.

  4. New fundamental parameters for attitude representation

    NASA Astrophysics Data System (ADS)

    Patera, Russell P.

    2017-08-01

    A new attitude parameter set is developed to clarify the geometry of combining finite rotations in a rotational sequence and in combining infinitesimal angular increments generated by angular rate. The resulting parameter set of six Pivot Parameters represents a rotation as a great circle arc on a unit sphere that can be located at any clocking location in the rotation plane. Two rotations are combined by linking their arcs at either of the two intersection points of the respective rotation planes. In a similar fashion, linking rotational increments produced by angular rate is used to derive the associated kinematical equations, which are linear and have no singularities. Included in this paper is the derivation of twelve Pivot Parameter elements that represent all twelve Euler Angle sequences, which enables efficient conversions between Pivot Parameters and any Euler Angle sequence. Applications of this new parameter set include the derivation of quaternions and the quaternion composition rule, as well as, the derivation of the analytical solution to time dependent coning motion. The relationships between Pivot Parameters and traditional parameter sets are included in this work. Pivot Parameters are well suited for a variety of aerospace applications due to their effective composition rule, singularity free kinematic equations, efficient conversion to and from Euler Angle sequences and clarity of their geometrical foundation.
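
    The abstract mentions the quaternion composition rule as one result that can be derived from Pivot Parameters. As a point of reference, the standard rule itself (the Hamilton product) is sketched below; this is the textbook formulation, not the paper's Pivot Parameter derivation, and the function names are illustrative.

```python
import math


def quat_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` radians about `axis`."""
    x, y, z = axis
    norm = math.sqrt(x * x + y * y + z * z)
    s = math.sin(angle / 2) / norm
    return (math.cos(angle / 2), x * s, y * s, z * s)


def quat_multiply(q, p):
    """Hamilton product q * p: the rotation p followed by the rotation q."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = p
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )


quarter_turn_z = quat_from_axis_angle((0, 0, 1), math.pi / 2)
quarter_turn_x = quat_from_axis_angle((1, 0, 0), math.pi / 2)

# Two quarter turns about z compose to a half turn about z.
half_turn_z = quat_multiply(quarter_turn_z, quarter_turn_z)
print(half_turn_z)

# Finite rotations about different axes do not commute.
print(quat_multiply(quarter_turn_z, quarter_turn_x))
print(quat_multiply(quarter_turn_x, quarter_turn_z))
```

    The last two lines differ only in the sign of the y component, which illustrates why the order of a rotational sequence matters.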

  5. Guidance to rational use of pharmaceuticals in gallbladder sarcomatoid carcinoma using patient-derived cancer cells and whole exome sequencing.

    PubMed

    Feng, Feiling; Cheng, Qingbao; Yang, Liang; Zhang, Dadong; Ji, Shunlong; Zhang, Qiangzu; Lin, Yihui; Li, Fugen; Xiong, Lei; Liu, Chen; Jiang, Xiaoqing

    2017-01-17

    Gallbladder sarcomatoid carcinoma is a rare cancer with no clinical standard treatment. With the rapid development of next generation sequencing, it has become possible to provide reasonable treatment options for patients based on genetic variations. However, most cancer drugs are not approved for gallbladder sarcomatoid carcinoma indications. The correlation between drug response and a genetic variation needs to be further elucidated. Three patient-derived cells-JXQ-3D-001, JXQ-3D-002, and JXQ-3D-003, were derived from biopsy samples of one gallbladder sarcomatoid carcinoma patient with progression and have been characterized. In order to study the relationship between drug sensitivity and gene alteration, genetic mutations of three patient-derived cells were discovered by whole exome sequencing, and drug screening has been performed based on the gene alterations and related signaling pathways that are associated with drug targets. It has been found that there are differences in biological characteristics such as morphology, cell proliferation, cell migration and colony formation activity among these three patient-derived cells although they are derived from the same patient. Their sensitivities to the chemotherapy drugs-Fluorouracil, Doxorubicin, and Cisplatin are distinct. Moreover, none of common chemotherapy drugs could inhibit the proliferations of all three patient-derived cells. Comprehensive analysis of their whole exome sequencing demonstrated that tumor-associated genes TP53, AKT2, FGFR3, FGF10, SDHA, and PIK3CA were mutated or amplified. Part of these alterations are actionable. By screening a set of compounds that are associated with the genetic alteration, it has been found that GDC-0941 and PF-04691502 for PI3K-AKT-mTOR pathway inhibitors could dramatically decrease the proliferation of three patient-derived cells.
Importantly, expression of phosphorylated AKT and phosphorylated S6 were markedly decreased after treatments with PI3K-AKT-mTOR pathway inhibitors GDC-0941 (0.5 μM) and PF-04691502 (0.1 μM) in all three patient-derived cells. These data suggested that inhibition of the PI3K-AKT-mTOR pathway that was activated by PIK3CA amplification in all three patient-derived cells could reduce the cell proliferation. A patient-derived cell model combined with whole exome sequencing is a powerful tool to elucidate relationship between drug sensitivities and genetic alternations. In these gallbladder sarcomatoid carcinoma patient-derived cells, it is found that PIK3CA amplification could be used as a biomarker to indicate PI3K-AKT-mTOR pathway activation. Block of the pathway may benefit the gallbladder sarcomatoid carcinoma patient with this alternation in hypothesis. The real efficacy needs to be confirmed in vivo or in a clinical trial.

  6. Inferences from structural comparison: flexibility, secondary structure wobble and sequence alignment optimization.

    PubMed

    Zhang, Gaihua; Su, Zhen

    2012-01-01

    Work on protein structure prediction is very useful in biological research. To evaluate their accuracy, experimental protein structures or their derived data are used as the 'gold standard'. However, as proteins are dynamic molecular machines with structural flexibility such a standard may be unreliable. To investigate the influence of the structure flexibility, we analysed 3,652 protein structures of 137 unique sequences from 24 protein families. The results showed that (1) the three-dimensional (3D) protein structures were not rigid: the root-mean-square deviation (RMSD) of the backbone Cα of structures with identical sequences was relatively large, with the average of the maximum RMSD from each of the 137 sequences being 1.06 Å; (2) the derived data of the 3D structure was not constant, e.g. the highest ratio of the secondary structure wobble site was 60.69%, with the sequence alignments from structural comparisons of two proteins in the same family sometimes being completely different. Proteins may have several stable conformations and the data derived from resolved structures as a 'gold standard' should be optimized before being utilized as criteria to evaluate the prediction methods, e.g. sequence alignment from structural comparison. Helix/β-sheet transition exists in normal free proteins. The coil ratio of the 3D structure could affect its resolution as determined by X-ray crystallography.
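
    The RMSD statistic quoted in this abstract (the average, over sequences, of the maximum pairwise RMSD among structures sharing a sequence) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the Cα coordinates are already superposed and listed in the same residue order, and the function names are hypothetical.

```python
import math
from itertools import combinations


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points.

    Assumption: both coordinate sets are already superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def max_pairwise_rmsd(structures):
    """Largest RMSD over all pairs of structures of one sequence."""
    return max(rmsd(a, b) for a, b in combinations(structures, 2))


def mean_of_max_rmsd(groups):
    """Average of the per-sequence maximum RMSD.

    `groups` holds one list of structures per unique sequence.
    """
    return sum(max_pairwise_rmsd(g) for g in groups) / len(groups)
```

    Each group needs at least two structures, since a single structure has no pair to compare.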

  7. Differential sequence diversity at merozoite surface protein-1 locus of Plasmodium knowlesi from humans and macaques in Thailand.

    PubMed

    Putaporntip, Chaturong; Thongaree, Siriporn; Jongwutiwes, Somchai

    2013-08-01

    To determine the genetic diversity and potential transmission routes of Plasmodium knowlesi, we analyzed the complete nucleotide sequence of the gene encoding the merozoite surface protein-1 of this simian malaria (Pkmsp-1), an asexual blood-stage vaccine candidate, from naturally infected humans and macaques in Thailand. Analysis of Pkmsp-1 sequences from humans (n=12) and monkeys (n=12) reveals five conserved and four variable domains. Most nucleotide substitutions in conserved domains were dimorphic whereas three of four variable domains contained complex repeats with extensive sequence and size variation. Besides purifying selection in conserved domains, evidence of intragenic recombination scattered across Pkmsp-1 was detected. The number of haplotypes, haplotype diversity, nucleotide diversity and recombination sites of human-derived sequences exceeded those of monkey-derived sequences. Phylogenetic networks based on concatenated conserved sequences of Pkmsp-1 displayed a character pattern that could have arisen from the sampling process or the presence of two independent routes of P. knowlesi transmission, i.e. from macaques to humans and from human to human in Thailand. Copyright © 2013 Elsevier B.V. All rights reserved.
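
    Haplotype diversity and nucleotide diversity, two of the summary statistics compared in this abstract, can be sketched as below. The abstract does not say which estimators or software were used, so this assumes the common textbook forms: haplotype diversity as n/(n-1) * (1 - sum of squared haplotype frequencies), and nucleotide diversity as the mean number of pairwise differences per site. It also assumes equal-length, gap-free aligned sequences.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Haplotype diversity: n/(n-1) * (1 - sum(p_i^2)) over distinct haplotypes."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    sum_sq = sum((count / n) ** 2 for count in Counter(seqs).values())
    return n / (n - 1) * (1 - sum_sq)


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site across all sequence pairs."""
    length = len(seqs[0])
    if len(seqs) < 2 or any(len(s) != length for s in seqs):
        raise ValueError("need at least two aligned sequences of equal length")
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return differences / (len(pairs) * length)
```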

  8. Cyanobacterial Diversity in Microbial Mats from the Hypersaline Lagoon System of Araruama, Brazil: An In-depth Polyphasic Study.

    PubMed

    Ramos, Vitor M C; Castelo-Branco, Raquel; Leão, Pedro N; Martins, Joana; Carvalhal-Gomes, Sinda; Sobrinho da Silva, Frederico; Mendonça Filho, João G; Vasconcelos, Vitor M

    2017-01-01

    Microbial mats are complex, micro-scale ecosystems that can be found in a wide range of environments. In the top layer of photosynthetic mats from hypersaline environments, a large diversity of cyanobacteria typically predominates. With the aim of strengthening the knowledge on the cyanobacterial diversity present in the coastal lagoon system of Araruama (state of Rio de Janeiro, Brazil), we have characterized three mat samples by means of a polyphasic approach. We have used morphological and molecular data obtained by culture-dependent and -independent methods. Moreover, we have compared different classification methodologies and discussed the outcomes, challenges, and pitfalls of these methods. Overall, we show that Araruama's lagoons harbor a high cyanobacterial diversity. Thirty-six unique morphospecies could be differentiated, which increases by more than 15% the number of morphospecies and genera already reported for the entire Araruama system. Morphology-based data were compared with the 16S rRNA gene phylogeny derived from isolate sequences and environmental sequences obtained by PCR-DGGE and pyrosequencing. Most of the 48 phylotypes could be associated with the observed morphospecies at the order level. More than one third of the sequences demonstrated to be closely affiliated (best BLAST hit results of ≥99%) with cyanobacteria from ecologically similar habitats. Some sequences had no close relatives in the public databases, including one from an isolate, being placed as "loner" sequences within different orders. This hints at hidden cyanobacterial diversity in the mats of the Araruama system, while reinforcing the relevance of using complementary approaches to study cyanobacterial diversity.

  9. Cyanobacterial Diversity in Microbial Mats from the Hypersaline Lagoon System of Araruama, Brazil: An In-depth Polyphasic Study

    PubMed Central

    Ramos, Vitor M. C.; Castelo-Branco, Raquel; Leão, Pedro N.; Martins, Joana; Carvalhal-Gomes, Sinda; Sobrinho da Silva, Frederico; Mendonça Filho, João G.; Vasconcelos, Vitor M.

    2017-01-01

    Microbial mats are complex, micro-scale ecosystems that can be found in a wide range of environments. In the top layer of photosynthetic mats from hypersaline environments, a large diversity of cyanobacteria typically predominates. With the aim of strengthening the knowledge on the cyanobacterial diversity present in the coastal lagoon system of Araruama (state of Rio de Janeiro, Brazil), we have characterized three mat samples by means of a polyphasic approach. We have used morphological and molecular data obtained by culture-dependent and -independent methods. Moreover, we have compared different classification methodologies and discussed the outcomes, challenges, and pitfalls of these methods. Overall, we show that Araruama's lagoons harbor a high cyanobacterial diversity. Thirty-six unique morphospecies could be differentiated, which increases by more than 15% the number of morphospecies and genera already reported for the entire Araruama system. Morphology-based data were compared with the 16S rRNA gene phylogeny derived from isolate sequences and environmental sequences obtained by PCR-DGGE and pyrosequencing. Most of the 48 phylotypes could be associated with the observed morphospecies at the order level. More than one third of the sequences demonstrated to be closely affiliated (best BLAST hit results of ≥99%) with cyanobacteria from ecologically similar habitats. Some sequences had no close relatives in the public databases, including one from an isolate, being placed as “loner” sequences within different orders. This hints at hidden cyanobacterial diversity in the mats of the Araruama system, while reinforcing the relevance of using complementary approaches to study cyanobacterial diversity. PMID:28713360

  10. Memory as embodiment: The case of modality and serial short-term memory.

    PubMed

    Macken, Bill; Taylor, John C; Kozlov, Michail D; Hughes, Robert W; Jones, Dylan M

    2016-10-01

    Classical explanations for the modality effect (superior short-term serial recall of auditory compared to visual sequences) typically recur to privileged processing of information derived from auditory sources. Here we critically appraise such accounts, and re-evaluate the nature of the canonical empirical phenomena that have motivated them. Three experiments show that the standard account of modality in memory is untenable, since auditory superiority in recency is often accompanied by visual superiority in mid-list serial positions. We explain this simultaneous auditory and visual superiority by reference to the way in which perceptual objects are formed in the two modalities and how those objects are mapped to speech motor forms to support sequence maintenance and reproduction. Specifically, stronger obligatory object formation operating in the standard auditory form of sequence presentation compared to that for visual sequences leads both to enhanced addressability of information at the object boundaries and reduced addressability for that in the interior. Because standard visual presentation does not lead to such object formation, such sequences do not show the boundary advantage observed for auditory presentation, but neither do they suffer loss of addressability associated with object information, thereby affording more ready mapping of that information into a rehearsal cohort to support recall. We show that a range of factors that impede this perceptual-motor mapping eliminate visual superiority while leaving auditory superiority unaffected. We make a general case for viewing short-term memory as an embodied, perceptual-motor process. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  11. Generation of expressed sequence tags for discovery of genes responsible for floral traits of Chrysanthemum morifolium by next-generation sequencing technology.

    PubMed

    Sasaki, Katsutomo; Mitsuda, Nobutaka; Nashima, Kenji; Kishimoto, Kyutaro; Katayose, Yuichi; Kanamori, Hiroyuki; Ohmiya, Akemi

    2017-09-04

    Chrysanthemum morifolium is one of the most economically valuable ornamental plants worldwide. Chrysanthemum is an allohexaploid plant with a large genome that is commercially propagated by vegetative reproduction. New cultivars with different floral traits, such as color, morphology, and scent, have been generated mainly by classical cross-breeding and mutation breeding. However, only limited genetic resources and their genome information are available for the generation of new floral traits. To obtain useful information about molecular bases for floral traits of chrysanthemums, we read expressed sequence tags (ESTs) of chrysanthemums by high-throughput sequencing using the 454 pyrosequencing technology. We constructed normalized cDNA libraries, consisting of full-length, 3'-UTR, and 5'-UTR cDNAs derived from various tissues of chrysanthemums. These libraries produced a total number of 3,772,677 high-quality reads, which were assembled into 213,204 contigs. By comparing the data obtained with those of full genome-sequenced species, we confirmed that our chrysanthemum contig set contained the majority of all expressed genes, which was sufficient for further molecular analysis in chrysanthemums. We confirmed that our chrysanthemum EST set (contigs) contained a number of contigs that encoded transcription factors and enzymes involved in pigment and aroma compound metabolism that was comparable to that of other species. This information can serve as an informative resource for identifying genes involved in various biological processes in chrysanthemums. Moreover, the findings of our study will contribute to a better understanding of the floral characteristics of chrysanthemums including the myriad cultivars at the molecular level.

  12. Developing expressed sequence tag libraries and the discovery of simple sequence repeat markers for two species of raspberry (Rubus L.).

    PubMed

    Bushakra, Jill M; Lewers, Kim S; Staton, Margaret E; Zhebentyayeva, Tetyana; Saski, Christopher A

    2015-10-26

    Due to a relatively high level of codominant inheritance and transferability within and among taxonomic groups, simple sequence repeat (SSR) markers are important elements in comparative mapping and delineation of genomic regions associated with traits of economic importance. Expressed sequence tags (ESTs) are a source of SSRs that can be used to develop markers to facilitate plant breeding and for more basic research across genera and higher plant orders. Leaf and meristem tissue from 'Heritage' red raspberry (Rubus idaeus) and 'Bristol' black raspberry (R. occidentalis) were utilized for RNA extraction. After conversion to cDNA and library construction, ESTs were sequenced, quality verified, assembled and scanned for SSRs.  Primers flanking the SSRs were designed and a subset tested for amplification, polymorphism and transferability across species. ESTs containing SSRs were functionally annotated using the GenBank non-redundant (nr) database and further classified using the gene ontology database. To accelerate development of EST-SSRs in the genus Rubus (Rosaceae), 1149 and 2358 cDNA sequences were generated from red raspberry and black raspberry, respectively. The cDNA sequences were screened using rigorous filtering criteria which resulted in the identification of 121 and 257 SSR loci for red and black raspberry, respectively. Primers were designed from the surrounding sequences resulting in 131 and 288 primer pairs, respectively, as some sequences contained more than one SSR locus. Sequence analysis revealed that the SSR-containing genes span a diversity of functions and share more sequence identity with strawberry genes than with other Rosaceous species. This resource of Rubus-specific, gene-derived markers will facilitate the construction of linkage maps composed of transferable markers for studying and manipulating important traits in this economically important genus.
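
    Scanning assembled ESTs for SSRs, as described in this abstract, can be illustrated with a small regular-expression search. This is only a sketch: the abstract does not give the filtering criteria, so the motif lengths and minimum repeat counts below are hypothetical defaults, and `find_ssrs` is an illustrative name rather than the tool the authors used.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    `min_repeats` maps motif length to the minimum number of tandem copies.
    The defaults are illustrative assumptions, not the study's criteria.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5, 4: 4}
    seq = seq.upper()
    hits = []
    for unit_len, reps in sorted(min_repeats.items()):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, reps - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. AA, ACAC),
            # so each locus is reported once under its shortest motif.
            if any(
                unit_len % d == 0 and motif == motif[:d] * (unit_len // d)
                for d in range(1, unit_len)
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)
```

    Primer design would then use the sequence flanking each reported start position; that step is not shown here.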

  13. Minimizing the average distance to a closest leaf in a phylogenetic tree.

    PubMed

    Matsen, Frederick A; Gallagher, Aaron; McCoy, Connor O

    2013-11-01

    When performing an analysis on a collection of molecular sequences, it can be convenient to reduce the number of sequences under consideration while maintaining some characteristic of a larger collection of sequences. For example, one may wish to select a subset of high-quality sequences that represent the diversity of a larger collection of sequences. One may also wish to specialize a large database of characterized "reference sequences" to a smaller subset that is as close as possible on average to a collection of "query sequences" of interest. Such a representative subset can be useful whenever one wishes to find a set of reference sequences that is appropriate to use for comparative analysis of environmentally derived sequences, such as for selecting "reference tree" sequences for phylogenetic placement of metagenomic reads. In this article, we formalize these problems in terms of the minimization of the Average Distance to the Closest Leaf (ADCL) and investigate algorithms to perform the relevant minimization. We show that the greedy algorithm is not effective, show that a variant of the Partitioning Around Medoids (PAM) heuristic gets stuck in local minima, and develop an exact dynamic programming approach. Using this exact program we note that the performance of PAM appears to be good for simulated trees, and is faster than the exact algorithm for small trees. On the other hand, the exact program gives solutions for all numbers of leaves less than or equal to the given desired number of leaves, whereas PAM only gives a solution for the prespecified number of leaves. Via application to real data, we show that the ADCL criterion chooses chimeric sequences less often than random subsets, whereas the maximization of phylogenetic diversity chooses them more often than random. These algorithms have been implemented in publicly available software.
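
    The ADCL objective named in this abstract can be written down directly once pairwise leaf-to-leaf distances are available. The sketch below assumes a precomputed symmetric distance matrix (for example, path distances along the tree) and uses exhaustive search over subsets, which is only practical for very small inputs. It is not the PAM variant or the exact dynamic program developed by the authors, and the function names are hypothetical.

```python
from itertools import combinations


def adcl(dist, chosen):
    """Average, over all leaves, of the distance to the closest chosen leaf.

    `dist[i][j]` is the distance between leaves i and j.
    """
    if not chosen:
        raise ValueError("at least one leaf must be chosen")
    return sum(min(row[j] for j in chosen) for row in dist) / len(dist)


def best_subset_bruteforce(dist, k):
    """Subset of k leaf indices minimizing ADCL (exhaustive search)."""
    return min(combinations(range(len(dist)), k), key=lambda s: adcl(dist, s))
```

    Brute force tries every subset of size k, so its cost grows combinatorially with the number of leaves. That growth is the motivation for the heuristic and exact algorithms the abstract compares.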

  14. CombAlign: a code for generating a one-to-many sequence alignment from a set of pairwise structure-based sequence alignments.

    PubMed

    Zhou, Carol L Ecale

    2015-01-01

    In order to better define regions of similarity among related protein structures, it is useful to identify the residue-residue correspondences among proteins. Few codes exist for constructing a one-to-many multiple sequence alignment derived from a set of structure or sequence alignments, and a need was evident for creating such a tool for combining pairwise structure alignments that would allow for insertion of gaps in the reference structure. This report describes a new Python code, CombAlign, which takes as input a set of pairwise sequence alignments (which may be structure based) and generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA). The use and utility of CombAlign was demonstrated by generating gapped MSSAs using sets of pairwise structure-based sequence alignments between structure models of the matrix protein (VP40) and pre-small/secreted glycoprotein (sGP) of Reston Ebolavirus and the corresponding proteins of several other filoviruses. The gapped MSSAs revealed structure-based residue-residue correspondences, which enabled identification of structurally similar versus differing regions in the Reston proteins compared to each of the other corresponding proteins. CombAlign is a new Python code that generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA) given a set of pairwise sequence alignments (which may be structure based). CombAlign has utility in assisting the user in distinguishing structurally conserved versus divergent regions on a reference protein structure relative to other closely related proteins. CombAlign was developed in Python 2.6, and the source code is available for download from the GitHub code repository.
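
    The idea of merging pairwise alignments that share one reference into a single gapped one-to-many alignment can be sketched as below. This is not CombAlign's code or interface. It is a minimal illustration that assumes each input is a pair of equal-length gapped strings (reference row, other row) using "-" for gaps, and it left-justifies insertions relative to the reference as an arbitrary choice.

```python
def combine_pairwise(alignments):
    """Merge (gapped_reference, gapped_other) pairs into one alignment.

    Returns a list of equal-length rows, with the reference row first.
    """
    reference = alignments[0][0].replace("-", "")
    n = len(reference)
    parsed = []
    for ref_row, other_row in alignments:
        if len(ref_row) != len(other_row) or ref_row.replace("-", "") != reference:
            raise ValueError("alignments must share one reference sequence")
        inserts = [""] * (n + 1)  # residues inserted before reference residue i
        matched = [""] * n        # residue (or '-') aligned to reference residue i
        i = 0
        for r, o in zip(ref_row, other_row):
            if r == "-":
                if o != "-":
                    inserts[i] += o
            else:
                matched[i] = o
                i += 1
        parsed.append((inserts, matched))
    # Width of each gap opened in the reference = longest insertion at that slot.
    widths = [max(len(ins[i]) for ins, _ in parsed) for i in range(n + 1)]
    rows = ["".join("-" * widths[i] + reference[i] for i in range(n)) + "-" * widths[n]]
    for inserts, matched in parsed:
        body = "".join(inserts[i].ljust(widths[i], "-") + matched[i] for i in range(n))
        rows.append(body + inserts[n].ljust(widths[n], "-"))
    return rows
```

    Residues from different sequences that land in the same insertion column are not claimed to be equivalent to one another. Only correspondences to the reference are carried over from the pairwise inputs.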

  15. High-Resolution Melting Analysis for Rapid Detection of Sequence Type 131 Escherichia coli.

    PubMed

    Harrison, Lucas B; Hanson, Nancy D

    2017-06-01

    Escherichia coli isolates belonging to the sequence type 131 (ST131) clonal complex have been associated with the global distribution of fluoroquinolone and β-lactam resistance. Whole-genome sequencing and multilocus sequence typing identify sequence type but are expensive when evaluating large numbers of samples. This study was designed to develop a cost-effective screening tool using high-resolution melting (HRM) analysis to differentiate ST131 from non-ST131 E. coli in large sample populations in the absence of sequence analysis. The method was optimized using DNA from 12 E. coli isolates. Singleplex PCR was performed using 10 ng of DNA, Type-it HRM buffer, and multilocus sequence typing primers and was followed by multiplex PCR. The amplicon sizes ranged from 630 to 737 bp. Melt temperature peaks were determined by performing HRM analysis at 0.1°C resolution from 50 to 95°C on a Rotor-Gene Q 5-plex HRM system. Derivative melt curves were compared between sequence types and analyzed by principal component analysis. A blinded study of 191 E. coli isolates of ST131 and unknown sequence types validated this methodology. This methodology returned 99.2% specificity (124 true negatives and 1 false positive) and 100% sensitivity (66 true positives and 0 false negatives). This HRM methodology distinguishes ST131 from non-ST131 E. coli without sequence analysis. The analysis can be accomplished in about 3 h in any laboratory with an HRM-capable instrument and principal component analysis software. Therefore, this assay is a fast and cost-effective alternative to sequencing-based ST131 identification. Copyright © 2017 Harrison and Hanson.
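
    The specificity and sensitivity reported in this abstract follow directly from the stated counts (66 true positives, 0 false negatives, 124 true negatives, 1 false positive), as the short check below shows. The function names are illustrative.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual positives that the assay calls positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of actual negatives that the assay calls negative."""
    return true_neg / (true_neg + false_pos)


# Counts taken from the abstract.
tp, fn, tn, fp = 66, 0, 124, 1
total_isolates = tp + fn + tn + fp
sens_percent = 100 * sensitivity(tp, fn)
spec_percent = 100 * specificity(tn, fp)
```

    The four counts sum to the 191 isolates of the blinded study, and 124/125 gives the quoted 99.2% specificity.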

  16. Varied prevalence of Borna disease virus infection in Arabic, thoroughbred and their cross-bred horses in Iran.

    PubMed

    Bahmani, M K; Nowrouzian, I; Nakaya, T; Nakamura, Y; Hagiwara, K; Takahashi, H; Rad, M A; Ikuta, K

    1996-11-01

    Borna disease virus (BDV) naturally infects horses and sheep and induces progressive poliomeningoencephalomyelitis. Here, BDV recombinant proteins of the first open reading frame (ORF-I; coding for p40 nucleoprotein) and the second ORF-II (coding for p24 polymerase cofactor) were immunoblotted with plasma derived from 72 healthy (28 Arabic, 17 thoroughbred and 27 cross-bred) race horses at Tehran in Iran to detect anti-BDV antibodies. In addition, their peripheral blood mononuclear cells (PBMCs) were also examined for BDV RNA by a nested reverse transcriptase-polymerase chain reaction (RT-PCR) at ORF-II. The prevalence of BDV antibodies and/or RNA was 41.2% in Arabic, 23.5% in thoroughbred, and 33.3% in cross-bred horses, but only 17.9, 5.9, and 11.1% of them, respectively, showed positive signals for both BDV antibodies and RNA. Especially, cross-bred horses showed a higher prevalence for BDV RNA, which was detected only in females. In addition, significantly higher prevalence for BDV RNA was observed in Arabic males and thoroughbred females. The BDV prevalence did not increase with aging of the horse. Sequencing at the region of BDV derived from Iranian horses revealed a slight difference from those of Japanese horse- and European horse-derived BDVs even in the amino acid residues, although those in the three groups of Iranian horses were quite similar. Thus, the varied prevalence of BDV was observed with the horse strain or sex in Iranian horses, although BDV sequences were very similar among all three groups in Iran compared with those derived from other countries.

  17. Dual-functioning peptides discovered by phage display increase the magnitude and specificity of BMSC attachment to mineralized biomaterials.

    PubMed

    Ramaraju, Harsha; Miller, Sharon J; Kohn, David H

    2017-07-01

    Design of biomaterials for cell-based therapies requires presentation of specific physical and chemical cues to cells, analogous to cues provided by native extracellular matrices (ECM). We previously identified a peptide sequence with high affinity towards apatite (VTKHLNQISQSY, VTK) using phage display. The aims of this study were to identify a human MSC-specific peptide sequence through phage display, combine it with the apatite-specific sequence, and verify the specificity of the combined dual-functioning peptide to both apatite and human bone marrow stromal cells. In this study, a combinatorial phage display identified the cell binding sequence (DPIYALSWSGMA, DPI) which was combined with the mineral binding sequence to generate the dual peptide DPI-VTK. DPI-VTK demonstrated significantly greater binding affinity (1/K_D) to apatite surfaces compared to VTK, phosphorylated VTK (VTK_phos), DPI-VTK_phos, RGD-VTK, and peptide-free apatite surfaces (p < 0.01), while significantly increasing hBMSC adhesion strength (τ_50, p < 0.01). MSCs demonstrated significantly greater adhesion strength to DPI-VTK compared to other cell types, while attachment of MC3T3 pre-osteoblasts and murine fibroblasts was limited (p < 0.01). MSCs on DPI-VTK coated surfaces also demonstrated increased spreading compared to pre-osteoblasts and fibroblasts. MSCs cultured on DPI-VTK coated apatite films exhibited significantly greater proliferation compared to controls (p < 0.001). Moreover, early and late stage osteogenic differentiation markers were elevated on DPI-VTK coated apatite films compared to controls. Taken together, phage display can identify non-obvious cell and material specific peptides to increase human MSC adhesion strength to specific biomaterial surfaces and subsequently increase cell proliferation and differentiation. These new peptides expand biomaterial design methodology for cell-based regeneration of bone defects.
This strategy of combining cell and material binding phage display derived peptides is broadly applicable to a variety of systems requiring targeted adhesion of specific cell populations, and may be generalized to the engineering of any adhesion surface. Copyright © 2017 Elsevier Ltd. All rights reserved.

  18. A Single Amino Acid Substitution in the v-Eyk Intracellular Domain Results in Activation of Stat3 and Enhances Cellular Transformation

    PubMed Central

    Besser, Daniel; Bromberg, Jacqueline F.; Darnell, James E.; Hanafusa, Hidesaburo

    1999-01-01

    The receptor tyrosine kinase Eyk, a member of the Axl/Tyro3 subfamily, activates the STAT pathway and transforms cells when constitutively activated. Here, we compared the potentials of the intracellular domains of Eyk molecules derived from c-Eyk and v-Eyk to transform rat 3Y1 fibroblasts. The v-Eyk molecule induced higher numbers of transformants in soft agar and stronger activation of Stat3; levels of Stat1 activation by the two Eyk molecules were similar. A mutation in the sequence Y933VPL, present in c-Eyk, to the v-Eyk sequence Y933VPQ led to increased activation of Stat3 and increased transformation efficiency. However, altering another sequence, Y862VNT, present in both Eyk molecules to F862VNT markedly decreased transformation without impairing Stat3 activation. These results indicate that activation of Stat3 enhances transformation efficiency and cooperates with another pathway to induce transformation. PMID:9891073

  19. Cloning and characterization of the novel D-aspartyl endopeptidase, paenidase, from Paenibacillus sp. B38.

    PubMed

    Nirasawa, Satoru; Nakahara, Kazuhiko; Takahashi, Saori

    2018-02-27

    Paenidase is the first microorganism-derived D-aspartyl endopeptidase that specifically recognizes an internal D-Asp residue to cleave [D-Asp]-X peptide bonds. Using peptide sequences obtained from the protein, we performed PCR with degenerate primers to amplify the paenidase I-encoding gene. Nucleotide sequencing revealed that mature paenidase I consists of 322 amino acid residues and that the protein is encoded as a pro-protein with a 197-amino-acid N-terminal extension compared to the mature protein. Paenidase I exhibits amino acid sequence similarity to several penicillin-binding proteins. In addition, paenidase I was classified into peptidase family S12 based on a MEROPS database search. Family S12 contains serine-type D-Ala-D-Ala carboxypeptidases that have three active site residues (Ser, Lys, and Tyr) in the conserved motifs Ser-Xaa-Thr-Lys and Tyr-Xaa-Asn. These motifs were conserved in the primary structure of paenidase I, and the role of these residues was confirmed by site-directed mutagenesis.

  20. Sequence and Analysis of the Tomato JOINTLESS Locus

    PubMed Central

    Mao, Long; Begum, Dilara; Goff, Stephen A.; Wing, Rod A.

    2001-01-01

    A 119-kb bacterial artificial chromosome from the JOINTLESS locus on the tomato (Lycopersicon esculentum) chromosome 11 contained 15 putative genes. Repetitive sequences in this region include one copia-like LTR retrotransposon, 13 simple sequence repeats, three copies of a novel type III foldback transposon, and four putative short DNA repeats. Database searches showed that the foldback transposon and the short DNA repeats seemed to be associated preferably with genes. The predicted tomato genes were compared with the complete Arabidopsis genome. Eleven out of 15 tomato open reading frames were found to be colinear with segments on five Arabidopsis bacterial artificial chromosome/P1-derived artificial chromosome clones. The synteny patterns, however, did not reveal duplicated segments in Arabidopsis, where over half of the genome is duplicated. Our analysis indicated that the microsynteny between the tomato and Arabidopsis genomes was still conserved at a very small scale but was complicated by the large number of gene families in the Arabidopsis genome. PMID:11457984

  1. Chemical-biogeographic survey of secondary metabolism in soil.

    PubMed

    Charlop-Powers, Zachary; Owen, Jeremy G; Reddy, Boojala Vijay B; Ternei, Melinda A; Brady, Sean F

    2014-03-11

    In this study, we compare biosynthetic gene richness and diversity of 96 soil microbiomes from diverse environments found throughout the southwestern and northeastern regions of the United States. The 454-pyrosequencing of nonribosomal peptide adenylation (AD) and polyketide ketosynthase (KS) domain fragments amplified from these microbiomes provides a means to evaluate the variation of secondary metabolite biosynthetic diversity in different soil environments. Through soil composition and AD- and KS-amplicon richness analysis, we identify soil types with elevated biosynthetic potential. In general, arid soils show the richest observed biosynthetic diversity, whereas brackish sediments and pine forest soils show the least. By mapping individual environmental amplicon sequences to sequences derived from functionally characterized biosynthetic gene clusters, we identified conserved soil type-specific secondary metabolome enrichment patterns despite significant sample-to-sample sequence variation. These data are used to create chemical biogeographic distribution maps for biomedically valuable families of natural products in the environment that should prove useful for directing the discovery of bioactive natural products in the future.

  2. Genetic stability of progeny from an artificial allotetraploid carp using sperm from five fish species.

    PubMed

    Ye, Yuzhen; Wang, Zhongwei; Zhou, Jianfeng; Wu, Qingjiang

    2009-08-01

    Microsatellite markers and D-loop sequences of mtDNA from a female allotetraploid parent carp and her progenies of generations 1 and 2 induced by sperm of five distant fish species were analyzed. Eleven microsatellite markers were used to identify 48 alleles from the allotetraploid female. The same number of alleles (48) appeared in the first and second generations of the gynogenetic offspring, regardless of the source of the sperm used as an activator. The mtDNA D-loop analysis was performed on the female tetraploid parent, 25 gynogenetic offspring, and 5 sperm-donor species. Fourteen variable sites from the 1,018 bp sequences were observed in the offspring as compared to the female tetraploid parent. Results from D-loop sequence and microsatellite marker analysis showed exclusive maternal transmission, and no genetic information was derived from the father. Our study suggests that progenies of artificial tetraploid carp are genetically stable, which is important for genetic breeding of this tetraploid fish.

  3. A +1 ribosomal frameshifting motif prevalent among plant amalgaviruses.

    PubMed

    Nibert, Max L; Pyle, Jesse D; Firth, Andrew E

    2016-11-01

    Sequence accessions attributable to novel plant amalgaviruses have been found in the Transcriptome Shotgun Assembly database. Sixteen accessions, derived from 12 different plant species, appear to encompass the complete protein-coding regions of the proposed amalgaviruses, which would substantially expand the size of genus Amalgavirus from 4 current species. Other findings include evidence for UUU_CGN as a +1 ribosomal frameshifting motif prevalent among plant amalgaviruses; for a variant version of this motif found thus far in only two amalgaviruses from solanaceous plants; for a region of α-helical coiled coil propensity conserved in a central region of the ORF1 translation product of plant amalgaviruses; and for conserved sequences in a C-terminal region of the ORF2 translation product (RNA-dependent RNA polymerase) of plant amalgaviruses, seemingly beyond the region of conserved polymerase motifs. These results additionally illustrate the value of mining the TSA database and others for novel viral sequences for comparative analyses. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  4. On the sum of generalized Fibonacci sequence

    NASA Astrophysics Data System (ADS)

    Chong, Chin-Yoon; Ho, C. K.

    2014-06-01

    We consider the generalized Fibonacci sequence $\{U_n\}$ defined by $U_0 = 0$, $U_1 = 1$, and $U_{n+2} = pU_{n+1} + qU_n$ for all $n \in \mathbb{Z}_0^+$ and $p, q \in \mathbb{Z}^+$. In this paper, we derived various sums of the generalized Fibonacci sequence from their recursive relations.
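
    To make the recurrence concrete, the sketch below generates the sequence and checks one partial-sum identity that follows from summing the recurrence over n: U_0 + ... + U_n = (U_{n+1} + q*U_n - 1) / (p + q - 1). This identity is a standard consequence of the definition and is offered only as an example; it is not claimed to be one of the specific sums derived in the paper, and the function names are hypothetical.

```python
from typing import List


def generalized_fibonacci(p: int, q: int, n_terms: int) -> List[int]:
    """Return [U_0, ..., U_{n_terms-1}] with U_0 = 0, U_1 = 1, U_{n+2} = p*U_{n+1} + q*U_n."""
    if n_terms <= 0:
        return []
    terms = [0, 1][:n_terms]
    while len(terms) < n_terms:
        terms.append(p * terms[-1] + q * terms[-2])
    return terms


def partial_sum_closed_form(p: int, q: int, n: int) -> int:
    """Sum U_0..U_n via (U_{n+1} + q*U_n - 1) / (p + q - 1).

    The denominator is at least 1 because p and q are positive integers.
    """
    u = generalized_fibonacci(p, q, n + 2)
    numerator = u[n + 1] + q * u[n] - 1
    denominator = p + q - 1
    return numerator // denominator
```

    With p = q = 1 this reduces to the familiar Fibonacci identity F_0 + ... + F_n = F_{n+2} - 1.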

  5. Deep Sequencing Analysis of Apple Infecting Viruses in Korea

    PubMed Central

    Cho, In-Sook; Igori, Davaajargal; Lim, Seungmo; Choi, Gug-Seoun; Hammond, John; Lim, Hyoun-Sub; Moon, Jae Sun

    2016-01-01

    Deep sequencing generated 52 contigs derived from five viruses: Apple chlorotic leaf spot virus (ACLSV), Apple stem grooving virus (ASGV), Apple stem pitting virus (ASPV), Apple green crinkle associated virus (AGCaV), and Apricot latent virus (ApLV), which were identified in eight apple samples showing small leaves and/or growth retardation. Nucleotide (nt) sequence identity of the assembled contigs ranged from 68% to 99% compared to the reference sequences of the five respective viral genomes. Sequences of ASPV and ASGV were the most abundantly represented among the 52 assembled contigs. The presence of the five viruses in the samples was confirmed by RT-PCR using specific primers based on the sequences of each assembled contig. All five viruses were detected in three of the samples, and all samples had mixed infections with at least two viruses. The most frequently detected virus was ASPV, followed by ASGV, ApLV, ACLSV, and AGCaV, all of which were found in mixed infections in the tested samples. AGCaV was identified in assembled contigs ID 1012480 and 93549, which showed 82% and 78% nt sequence identity, respectively, with ORF1 of AGCaV isolate Aurora-1. ApLV was identified in three assembled contigs, ID 65587, 1802365, and 116777, which showed 77%, 78%, and 76% nt sequence identity, respectively, with ORF1 of ApLV isolate LA2. The deep sequencing assay was shown to be a valuable and powerful tool for detection and identification of known and unknown viruses in infected apple trees, here identifying ApLV and AGCaV in commercial orchards in Korea for the first time. PMID:27721694

  6. MoccaDB - an integrative database for functional, comparative and diversity studies in the Rubiaceae family

    PubMed Central

    Plechakova, Olga; Tranchant-Dubreuil, Christine; Benedet, Fabrice; Couderc, Marie; Tinaut, Alexandra; Viader, Véronique; De Block, Petra; Hamon, Perla; Campa, Claudine; de Kochko, Alexandre; Hamon, Serge; Poncet, Valérie

    2009-01-01

    Background In the past few years, functional genomics information has been rapidly accumulating on Rubiaceae species and especially on those belonging to the Coffea genus (coffee trees). An increasing number of expressed sequence tag (EST) data and EST- or genomic-derived microsatellite markers have been generated, together with Conserved Ortholog Set (COS) markers. This considerably facilitates comparative genomics or map-based genetic studies through the common use of orthologous loci across different species. Similar genomic information is available for e.g. tomato or potato, members of the Solanaceae family. Since both Rubiaceae and Solanaceae belong to the Euasterids I (lamiids) integration of information on genetic markers would be possible and lead to more efficient analyses and discovery of key loci involved in important traits such as fruit development, quality, and maturation, or adaptation. Our goal was to develop a comprehensive web data source for integrated information on validated orthologous markers in Rubiaceae. Description MoccaDB is an online MySQL-PHP driven relational database that houses annotated and/or mapped microsatellite markers in Rubiaceae. In its current release, the database stores 638 markers that have been defined on 259 ESTs and 379 genomic sequences. Marker information was retrieved from 11 published works, and completed with original data on 132 microsatellite markers validated in our laboratory. DNA sequences were derived from three Coffea species/hybrids. Microsatellite markers were checked for similarity, in vitro tested for cross-amplification and diversity/polymorphism status in up to 38 Rubiaceae species belonging to the Cinchonoideae and Rubioideae subfamilies. Functional annotation was provided and some markers associated with described metabolic pathways were also integrated. Users can search the database for marker, sequence, map or diversity information through multi-option query forms. 
The retrieved data can be browsed and downloaded, along with protocols used, using a standard web browser. MoccaDB also integrates bioinformatics tools (CMap viewer and local BLAST) and hyperlinks to related external data sources (NCBI GenBank and PubMed, SOL Genomic Network database). Conclusion We believe that MoccaDB will be extremely useful for all researchers working in the areas of comparative and functional genomics and molecular evolution, in general, and population analysis and association mapping of Rubiaceae and Solanaceae species, in particular. PMID:19788737

  7. Sequences of multiple bacterial genomes and a Chlamydia trachomatis genotype from direct sequencing of DNA derived from a vaginal swab diagnostic specimen.

    PubMed

    Andersson, P; Klein, M; Lilliebridge, R A; Giffard, P M

    2013-09-01

    Ultra-deep Illumina sequencing was performed on whole genome amplified DNA derived from a Chlamydia trachomatis-positive vaginal swab. Alignment of reads with reference genomes allowed robust SNP identification from the C. trachomatis chromosome and plasmid. This revealed that the C. trachomatis in the specimen was very closely related to the sequenced urogenital, serovar F, clade T1 isolate F-SW4. In addition, high genome-wide coverage was obtained for Prevotella melaninogenica, Gardnerella vaginalis, Clostridiales genomosp. BVAB3 and Mycoplasma hominis. This illustrates the potential of metagenome data to provide high resolution bacterial typing data from multiple taxa in a diagnostic specimen. ©2013 The Authors Clinical Microbiology and Infection ©2013 European Society of Clinical Microbiology and Infectious Diseases.

  8. Optimized, unequal pulse spacing in multiple echo sequences improves refocusing in magnetic resonance.

    PubMed

    Jenista, Elizabeth R; Stokes, Ashley M; Branca, Rosa Tamara; Warren, Warren S

    2009-11-28

    A recent quantum computing paper (G. S. Uhrig, Phys. Rev. Lett. 98, 100504 (2007)) analytically derived optimal pulse spacings for a multiple spin echo sequence designed to remove decoherence in a two-level system coupled to a bath. The spacings in what has been called a "Uhrig dynamic decoupling (UDD) sequence" differ dramatically from the conventional, equal pulse spacing of a Carr-Purcell-Meiboom-Gill (CPMG) multiple spin echo sequence. The UDD sequence was derived for a model that is unrelated to magnetic resonance, but was recently shown theoretically to be more general. Here we show that the UDD sequence has theoretical advantages for magnetic resonance imaging of structured materials such as tissue, where diffusion in compartmentalized and microstructured environments leads to fluctuating fields on a range of different time scales. We also show experimentally, both in excised tissue and in a live mouse tumor model, that optimal UDD sequences produce different T2-weighted contrast than do CPMG sequences with the same number of pulses and total delay, with substantial enhancements in most regions. This permits improved characterization of low-frequency spectral density functions in a wide range of applications.
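
    The difference between the two pulse spacings can be sketched numerically. The code below assumes the commonly quoted UDD timing, t_j = T * sin^2(pi*j / (2n + 2)) for n pulses over a total delay T, and equally spaced CPMG pulses at t_j = T * (2j - 1) / (2n); the function names are hypothetical, and the abstract itself does not state these formulas.

```python
import math
from typing import List


def udd_pulse_times(n_pulses: int, total_time: float) -> List[float]:
    """Pulse times for an n-pulse UDD sequence: T * sin^2(pi * j / (2n + 2))."""
    return [
        total_time * math.sin(math.pi * j / (2 * n_pulses + 2)) ** 2
        for j in range(1, n_pulses + 1)
    ]


def cpmg_pulse_times(n_pulses: int, total_time: float) -> List[float]:
    """Pulse times for an n-pulse CPMG sequence: equally spaced, T * (2j - 1) / (2n)."""
    return [
        total_time * (2 * j - 1) / (2 * n_pulses)
        for j in range(1, n_pulses + 1)
    ]
```

    Under these formulas the two schedules coincide for one or two pulses and diverge from three pulses onward, with the UDD pulses crowding toward the start and end of the delay.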

  9. Bounds on the cross-correlation functions of state m-sequences

    NASA Astrophysics Data System (ADS)

    Woodcock, C. F.; Davies, Phillip A.; Shaar, Ahmed A.

    1987-03-01

    Lower and upper bounds on the peaks of the periodic Hamming cross-correlation function for state m-sequences, which are often used in frequency-hopped spread-spectrum systems, are derived. The state position mapped (SPM) sequences of the state m-sequences are described. The use of SPM sequences for OR-channel code division multiplexing is studied. The relation between the Hamming cross-correlation function and the correlation function of SPM sequence is examined. Numerical results which support the theoretical data are presented.
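
    For readers unfamiliar with the quantity being bounded, the sketch below computes a periodic Hamming cross-correlation in its usual form: for each cyclic shift, count the positions at which two equal-length frequency-hopping sequences use the same symbol. The function name and the toy sequences are hypothetical; the construction of state m-sequences and SPM sequences is not reproduced here.

```python
from typing import List, Sequence


def hamming_cross_correlation(x: Sequence[int], y: Sequence[int]) -> List[int]:
    """Periodic Hamming cross-correlation H(tau) for tau = 0..L-1.

    H(tau) counts indices i where x[i] == y[(i + tau) mod L], i.e. the
    number of coincidences ("hits") at cyclic shift tau.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same period")
    period = len(x)
    return [
        sum(1 for i in range(period) if x[i] == y[(i + tau) % period])
        for tau in range(period)
    ]
```

    The peak of this list over all shifts is the value that lower and upper bounds of the kind described in the abstract would constrain.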

  10. Palaeosymbiosis Revealed by Genomic Fossils of Wolbachia in a Strongyloidean Nematode

    PubMed Central

    Koutsovoulos, Georgios; Makepeace, Benjamin; Tanya, Vincent N.; Blaxter, Mark

    2014-01-01

    Wolbachia are common endosymbionts of terrestrial arthropods, and are also found in nematodes: the animal-parasitic filaria, and the plant-parasite Radopholus similis. Lateral transfer of Wolbachia DNA to the host genome is common. We generated a draft genome sequence for the strongyloidean nematode parasite Dictyocaulus viviparus, the cattle lungworm. In the assembly, we identified nearly 1 Mb of sequence with similarity to Wolbachia. The fragments were unlikely to derive from a live Wolbachia infection: most were short, and the genes were disabled through inactivating mutations. Many fragments were co-assembled with definitively nematode-derived sequence. We found limited evidence of expression of the Wolbachia-derived genes. The D. viviparus Wolbachia genes were most similar to filarial strains and strains from the host-promiscuous clade F. We conclude that D. viviparus was infected by Wolbachia in the past, and that clade F-like symbionts may have been the source of filarial Wolbachia infections. PMID:24901418

  11. Ultra-high field upper extremity peripheral nerve and non-contrast enhanced vascular imaging

    PubMed Central

    Raval, Shailesh B.; Britton, Cynthia A.; Zhao, Tiejun; Krishnamurthy, Narayanan; Santini, Tales; Gorantla, Vijay S.; Ibrahim, Tamer S.

    2017-01-01

    Objective The purpose of this study was to explore the efficacy of Ultra-high field [UHF] 7 Tesla [T] MRI as compared to 3T MRI in non-contrast enhanced [nCE] imaging of structural anatomy in the elbow, forearm, and hand [upper extremity]. Materials and methods A wide range of sequences including T1 weighted [T1] volumetric interpolated breath-hold exam [VIBE], T2 weighted [T2] double-echo steady state [DESS], susceptibility weighted imaging [SWI], time-of-flight [TOF], diffusion tensor imaging [DTI], and diffusion spectrum imaging [DSI] were optimized and incorporated with a radiofrequency [RF] coil system composed of a transverse electromagnetic [TEM] transmit coil combined with an 8-channel receive-only array for 7T upper extremity [UE] imaging. In addition, Siemens-optimized protocols/sequences were used on a 3T scanner and the resulting images from T1 VIBE and T2 DESS were compared to those obtained at 7T qualitatively and quantitatively [SWI was only qualitatively compared]. DSI Studio was utilized to identify nerves based on analysis of diffusion weighted derived fractional anisotropy images. Images of forearm vasculature were extracted using a paint grow manual segmentation method based on MIPAV [Medical Image Processing, Analysis, and Visualization]. Results High-resolution, high-quality images of the hand, forearm, and elbow, with high signal-to-noise ratio [SNR] and contrast-to-noise ratio [CNR], were acquired with nearly homogeneous 7T excitation. Measured [performed on the T1 VIBE and T2 DESS sequences] SNR and CNR values were almost doubled at 7T vs. 3T. Cartilage, synovial fluid and tendon structures could be seen with higher clarity in the 7T T1 and T2 weighted images. SWI allowed high resolution and better quality imaging of large and medium sized arteries and veins, capillary networks and arteriovenous anastomoses at 7T when compared to 3T.
7T diffusion weighted sequence [not performed at 3T] demonstrates that the forearm nerves are clearly delineated by fiber tractography. The proper digital palmar arteries and superficial palmar arch could also be clearly visualized using TOF nCE 7T MRI. Conclusion Ultra-high resolution neurovascular imaging in upper extremities is possible at 7T without use of renal toxic intravenous contrast. 7T MRI can provide superior peripheral nerve [based on fiber anisotropy and diffusion coefficient parameters derived from diffusion tensor/spectrum imaging] and vascular [nCE MRA and vessel segmentation] imaging. PMID:28662061

  12. Differential gene expression in the siphonophore Nanomia bijuga (Cnidaria) assessed with multiple next-generation sequencing workflows.

    PubMed

    Siebert, Stefan; Robinson, Mark D; Tintori, Sophia C; Goetz, Freya; Helm, Rebecca R; Smith, Stephen A; Shaner, Nathan; Haddock, Steven H D; Dunn, Casey W

    2011-01-01

    We investigated differential gene expression between functionally specialized feeding polyps and swimming medusae in the siphonophore Nanomia bijuga (Cnidaria) with a hybrid long-read/short-read sequencing strategy. We assembled a set of partial gene reference sequences from long-read data (Roche 454), and generated short-read sequences from replicated tissue samples that were mapped to the references to quantify expression. We collected and compared expression data with three short-read expression workflows that differ in sample preparation, sequencing technology, and mapping tools. These workflows were Illumina mRNA-Seq, which generates sequence reads from random locations along each transcript, and two tag-based approaches, SOLiD SAGE and Helicos DGE, which generate reads from particular tag sites. Differences in expression results across workflows were mostly due to the differential impact of missing data in the partial reference sequences. When all 454-derived gene reference sequences were considered, Illumina mRNA-Seq detected more than twice as many differentially expressed (DE) reference sequences as the tag-based workflows. This discrepancy was largely due to missing tag sites in the partial reference that led to false negatives in the tag-based workflows. When only the subset of reference sequences that unambiguously have tag sites was considered, we found broad congruence across workflows, and they all identified a similar set of DE sequences. Our results are promising in several regards for gene expression studies in non-model organisms. First, we demonstrate that a hybrid long-read/short-read sequencing strategy is an effective way to collect gene expression data when an annotated genome sequence is not available. Second, our replicated sampling indicates that expression profiles are highly consistent across field-collected animals in this case. 
Third, the impacts of partial reference sequences on the ability to detect DE can be mitigated through workflow choice and deeper reference sequencing.

  13. Differential Gene Expression in the Siphonophore Nanomia bijuga (Cnidaria) Assessed with Multiple Next-Generation Sequencing Workflows

    PubMed Central

    Siebert, Stefan; Robinson, Mark D.; Tintori, Sophia C.; Goetz, Freya; Helm, Rebecca R.; Smith, Stephen A.; Shaner, Nathan; Haddock, Steven H. D.; Dunn, Casey W.

    2011-01-01

    We investigated differential gene expression between functionally specialized feeding polyps and swimming medusae in the siphonophore Nanomia bijuga (Cnidaria) with a hybrid long-read/short-read sequencing strategy. We assembled a set of partial gene reference sequences from long-read data (Roche 454), and generated short-read sequences from replicated tissue samples that were mapped to the references to quantify expression. We collected and compared expression data with three short-read expression workflows that differ in sample preparation, sequencing technology, and mapping tools. These workflows were Illumina mRNA-Seq, which generates sequence reads from random locations along each transcript, and two tag-based approaches, SOLiD SAGE and Helicos DGE, which generate reads from particular tag sites. Differences in expression results across workflows were mostly due to the differential impact of missing data in the partial reference sequences. When all 454-derived gene reference sequences were considered, Illumina mRNA-Seq detected more than twice as many differentially expressed (DE) reference sequences as the tag-based workflows. This discrepancy was largely due to missing tag sites in the partial reference that led to false negatives in the tag-based workflows. When only the subset of reference sequences that unambiguously have tag sites was considered, we found broad congruence across workflows, and they all identified a similar set of DE sequences. Our results are promising in several regards for gene expression studies in non-model organisms. First, we demonstrate that a hybrid long-read/short-read sequencing strategy is an effective way to collect gene expression data when an annotated genome sequence is not available. Second, our replicated sampling indicates that expression profiles are highly consistent across field-collected animals in this case. 
Third, the impacts of partial reference sequences on the ability to detect DE can be mitigated through workflow choice and deeper reference sequencing. PMID:21829563

  14. Genome sequence of a Providencia stuartii strain isolated from Lucilia sericata salivary glands

    USDA-ARS's Scientific Manuscript database

    We present the draft genome sequence of a Providencia stuartii strain derived from salivary glands of larval Lucilia sericata; a common blow fly important to forensic, medical and veterinary science. The genome sequence will help dissect coinfections involving Providencia stuartii and Proteus mirab...

  15. From Arithmetic Sequences to Linear Equations

    ERIC Educational Resources Information Center

    Matsuura, Ryota; Harless, Patrick

    2012-01-01

    The first part of the article focuses on deriving the essential properties of arithmetic sequences by appealing to students' sense making and reasoning. The second part describes how to guide students to translate their knowledge of arithmetic sequences into an understanding of linear equations. Ryota Matsuura originally wrote these lessons for…

  16. Characterization of the gene encoding the polymorphic immunodominant molecule, a neutralizing antigen of Theileria parva

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Toye, P.G.; Metzelaar, M.J.; Wijngaard, P.L.J.

    1995-08-01

    Theileria parva, a tick-transmitted protozoan parasite related to Plasmodium spp., causes the disease East Coast fever, an acute and usually fatal lymphoproliferative disorder of cattle in Africa. Previous studies using sera from cattle that have survived infection identified a polymorphic immunodominant molecule (PIM) that is expressed by both the infective sporozoite stage of the parasite and the intracellular schizont. Here we show that mAb specific for the PIM Ag can inhibit sporozoite invasion of lymphocytes in vitro. A cDNA clone encoding the PIM Ag of the T. parva (Muguga) stock was obtained by using these mAb in a novel eukaryotic expression cloning system that allows isolation of cDNA encoding cytoplasmic or surface Ags. To establish the molecular basis of the polymorphism of PIM, the cDNA of the PIM Ag from a buffalo-derived T. parva stock was isolated and its sequence was compared with that of the cattle-derived Muguga PIM. The two cDNAs showed considerable identity in both the 5′ and 3′ regions, but there was substantial sequence divergence in the central regions. Several types of repeated sequences were identified in the variant regions. In the Muguga form of the molecule, there were five tandem repeats of the tetrapeptide, QPEP, that were shown, by transfection of a deleted version of the PIM gene, not to react with several anti-PIM mAbs. By isolating and sequencing the genomic version of the gene, we identified two small introns in the 3′ region of the gene. Finally, we showed that polyclonal rat Abs against recombinant PIM neutralize sporozoite infectivity in vitro, suggesting that the PIM Ag should be evaluated for its capacity to immunize cattle against East Coast fever.

  17. Structural details (kinks and non-α conformations) in transmembrane helices are intrahelically determined and can be predicted by sequence pattern descriptors

    PubMed Central

    Rigoutsos, Isidore; Riek, Peter; Graham, Robert M.; Novotny, Jiri

    2003-01-01

    One of the promising methods of protein structure prediction involves the use of amino acid sequence-derived patterns. Here we report on the creation of non-degenerate motif descriptors derived through data mining of training sets of residues taken from the transmembrane-spanning segments of polytopic proteins. These residues correspond to short regions in which there is a deviation from the regular α-helical character (i.e. π-helices, 3₁₀-helices and kinks). A ‘search engine’ derived from these motif descriptors correctly identifies, and discriminates amongst instances of the above ‘non-canonical’ helical motifs contained in the SwissProt/TrEMBL database of protein primary structures. Our results suggest that deviations from α-helicity are encoded locally in sequence patterns only about 7–9 residues long and can be determined in silico directly from the amino acid sequence. Delineation of such variations in helical habit is critical to understanding the complex structure–function relationships of polytopic proteins and for drug discovery. The success of our current methodology foretells development of similar prediction tools capable of identifying other structural motifs from sequence alone. The method described here has been implemented and is available on the World Wide Web at http://cbcsrv.watson.ibm.com/Ttkw.html. PMID:12888523

  18. Structural details (kinks and non-alpha conformations) in transmembrane helices are intrahelically determined and can be predicted by sequence pattern descriptors.

    PubMed

    Rigoutsos, Isidore; Riek, Peter; Graham, Robert M; Novotny, Jiri

    2003-08-01

    One of the promising methods of protein structure prediction involves the use of amino acid sequence-derived patterns. Here we report on the creation of non-degenerate motif descriptors derived through data mining of training sets of residues taken from the transmembrane-spanning segments of polytopic proteins. These residues correspond to short regions in which there is a deviation from the regular alpha-helical character (i.e. pi-helices, 3(10)-helices and kinks). A 'search engine' derived from these motif descriptors correctly identifies, and discriminates amongst instances of the above 'non-canonical' helical motifs contained in the SwissProt/TrEMBL database of protein primary structures. Our results suggest that deviations from alpha-helicity are encoded locally in sequence patterns only about 7-9 residues long and can be determined in silico directly from the amino acid sequence. Delineation of such variations in helical habit is critical to understanding the complex structure-function relationships of polytopic proteins and for drug discovery. The success of our current methodology foretells development of similar prediction tools capable of identifying other structural motifs from sequence alone. The method described here has been implemented and is available on the World Wide Web at http://cbcsrv.watson.ibm.com/Ttkw.html.

  19. SHARAKU: an algorithm for aligning and clustering read mapping profiles of deep sequencing in non-coding RNA processing.

    PubMed

    Tsuchiya, Mariko; Amano, Kojiro; Abe, Masaya; Seki, Misato; Hase, Sumitaka; Sato, Kengo; Sakakibara, Yasubumi

    2016-06-15

    Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After obtaining sequence reads, the short reads are mapped to a reference genome, and specific mapping patterns can be detected called read mapping profiles, which are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs. We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. Using a benchmark simulated dataset, SHARAKU exhibited superior performance to previous methods for correctly clustering the read mapping profiles with respect to 5'-end processing and 3'-end processing from degradation patterns and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental data of small RNA sequencing for the common marmoset brain, SHARAKU succeeded in identifying the significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain. The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work is available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA004502. 
    Contact: yasu@bio.keio.ac.jp. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  20. The Secrets of the Nearest Starburst Cluster. I. Very Large Telescope/ISAAC Photometry of NGC 3603

    NASA Astrophysics Data System (ADS)

    Stolte, Andrea; Brandner, Wolfgang; Brandl, Bernhard; Zinnecker, Hans; Grebel, Eva K.

    2004-08-01

    VLT/ISAAC JHKL photometry with subarcsecond resolution of the dense, massive starburst cluster NGC 3603 YC forming the core of the NGC 3603 giant molecular cloud is analyzed to reveal characteristics of the stellar population in unprecedented detail. The color-magnitude plane features a strong pre-main-sequence/main-sequence (PMS/MS) transition region, including the PMS/MS transition point, and reveals a secondary sequence for the first time in a nearby young starburst cluster. Arguments for a possible binary nature of this sequence are given. The resolved PMS/MS transition region allows isochrone fitting below the hydrogen-burning turn-on in NGC 3603 YC, yielding an independent estimate of global cluster parameters. A distance modulus of 13.9 mag, equivalent to d = 6.0 ± 0.3 kpc, is derived, as well as a line-of-sight extinction of A_V = 4.5 ± 0.6 toward PMS stars in the cluster center. The interpretation of a binary candidate sequence suggests a single age of 1 Myr for NGC 3603 YC, providing evidence for a single burst of star formation without the need to employ an age spread in the PMS population, as argued for in earlier studies. Disk fractions are derived from L-band excesses, indicating a radial increase in the disk frequency from 20% to 40% from the core to the cluster outskirts. The low disk fraction in the cluster core, as compared to the 42% L-band excess fraction found for massive stars in the Trapezium cluster of a comparably young age, indicates strong photoevaporation in the cluster center. The estimated binary fraction of 30%, as well as the low disk fraction, suggest strong impacts on low-mass star formation due to stellar interactions in the dense starburst. The significant differences between NGC 3603 YC and less dense and massive young star clusters in the Milky Way reveal the importance of using local starbursts as templates for massive extragalactic star formation.
Based on observations obtained at the ESO VLT on Paranal, Chile, under programs 63.I-0015 and 65.I-0135, and data from the public VLT archive provided by ESO, as well as observations made with the NASA/ESA Hubble Space Telescope, obtained at the Space Telescope Science Institute, which is operated by the Association of Universities for Research in Astronomy, Inc. under NASA contract NAS5-26555.
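
    The distance quoted in the abstract can be checked against the standard distance-modulus relation, mu = 5 * log10(d / 10 pc). The sketch below assumes the 13.9 mag value is an extinction-corrected modulus; the function names are hypothetical.

```python
import math


def distance_from_modulus_pc(mu: float) -> float:
    """Distance in parsecs for a distance modulus mu, from mu = 5 * log10(d / 10 pc)."""
    return 10.0 ** (mu / 5.0 + 1.0)


def modulus_from_distance_pc(distance_pc: float) -> float:
    """Distance modulus in magnitudes for a distance given in parsecs."""
    return 5.0 * math.log10(distance_pc / 10.0)
```

    A modulus of 13.9 mag gives roughly 6.0 kpc under this relation, consistent with the value stated in the abstract.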

  1. Construction of a small Mus musculus repetitive DNA library: identification of a new satellite sequence in Mus musculus.

    PubMed Central

    Pietras, D F; Bennett, K L; Siracusa, L D; Woodworth-Gutai, M; Chapman, V M; Gross, K W; Kane-Haas, C; Hastie, N D

    1983-01-01

    We report the construction of a small library of recombinant plasmids containing Mus musculus repetitive DNA inserts. The repetitive cloned fraction was derived from denatured genomic DNA by reassociation to a Cot value at which repetitive, but not unique, sequences have reannealed followed by exhaustive S1 nuclease treatment to degrade single stranded DNA. Initial characterizations of this library by colony filter hybridizations have led to the identification of a previously undetected M. musculus minor satellite as well as to clones containing M. musculus major satellite sequences. This new satellite is repeated 10-20 times less than the major satellite in the M. musculus genome. It has a repeat length of 130 nucleotides compared with the M. musculus major satellite with a repeat length of 234 nucleotides. Sequence analysis of the minor satellite has shown that it has a 29 base pair region with extensive homology to one of the major satellite repeating subunits. We also show by in situ hybridization that this minor satellite sequence is located at the centromeres and possibly the arms of at least half the M. musculus chromosomes. Sequences related to the minor satellite have been found in the DNA of a related Mus species, Mus spretus, and may represent the major satellite of that species. PMID:6314268

  2. Genomic analysis of NAC transcription factors in banana (Musa acuminata) and definition of NAC orthologous groups for monocots and dicots.

    PubMed

    Cenci, Albero; Guignon, Valentin; Roux, Nicolas; Rouard, Mathieu

    2014-05-01

    Identifying the molecular mechanisms underlying tolerance to abiotic stresses is important in crop breeding. A comprehensive understanding of the gene families associated with drought tolerance is therefore highly relevant. NAC transcription factors form a large plant-specific gene family involved in the regulation of tissue development and responses to biotic and abiotic stresses. The main goal of this study was to set up a framework of orthologous groups determined by an expert sequence comparison of NAC genes from both monocots and dicots. In order to clarify the orthologous relationships among NAC genes of different species, we performed an in-depth comparative study of four divergent taxa, in dicots and monocots, whose genomes have already been completely sequenced: Arabidopsis thaliana, Vitis vinifera, Musa acuminata and Oryza sativa. Due to independent evolution, NAC copy number is highly variable in these plant genomes. Based on an expert NAC sequence comparison, we propose forty orthologous groups of NAC sequences that were probably derived from an ancestor gene present in the most recent common ancestor of dicots and monocots. These orthologous groups provide a curated resource for large-scale protein sequence annotation of NAC transcription factors. The established orthology relationships also provide a useful reference for NAC function studies in newly sequenced genomes such as M. acuminata and other plant species.

  3. Recovering complete mitochondrial genome sequences from RNA-Seq: A case study of Polytomella non-photosynthetic green algae.

    PubMed

    Tian, Yao; Smith, David Roy

    2016-05-01

    Thousands of mitochondrial genomes have been sequenced, but there are comparatively few available mitochondrial transcriptomes. This might soon be changing. High-throughput RNA sequencing (RNA-Seq) techniques have made it fast and cheap to generate massive amounts of mitochondrial transcriptomic data. Here, we explore the utility of RNA-Seq for assembling mitochondrial genomes and studying their expression patterns. Specifically, we investigate the mitochondrial transcriptomes from Polytomella non-photosynthetic green algae, which have among the smallest, most reduced mitochondrial genomes from the Archaeplastida as well as fragmented rRNA-coding regions, palindromic genes, and linear chromosomes with telomeres. Isolation of whole genomic RNA from the four known Polytomella species followed by Illumina paired-end sequencing generated enough mitochondrial-derived reads to easily recover almost-entire mitochondrial genome sequences. Read-mapping and coverage statistics also gave insights into Polytomella mitochondrial transcriptional architecture, revealing polycistronic transcripts and the expression of telomeres and palindromic genes. Ultimately, RNA-Seq is a promising, cost-effective technique for studying mitochondrial genetics, but it does have drawbacks, which are discussed. One of its greatest potentials, as shown here, is that it can be used to generate near-complete mitochondrial genome sequences, which could be particularly useful in situations where there is a lack of available mtDNA data. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  4. Characterization of the repetitive DNA elements in the genome of fish lymphocystis disease viruses.

    PubMed

    Schnitzler, P; Darai, G

    1989-09-01

    The complete DNA nucleotide sequence of the repetitive DNA elements in the genome of fish lymphocystis disease virus (FLDV) isolated from two different species (flounder and dab) was determined. The size of these repetitive DNA elements was found to be 1413 bp which corresponds to the DNA sequences of the 5' terminus of the EcoRI DNA fragment B (0.034 to 0.052 m.u.) and to the EcoRI DNA fragment M (0.718 to 0.736 m.u.) of the FLDV genome causing lymphocystis disease in flounder and plaice. The degree of DNA nucleotide homology between both regions was found to be 99%. The repetitive DNA element in the genome of FLDV isolated from other fish species (dab) was identified and is located within the EcoRI DNA fragment B and J of the viral genome. The DNA nucleotide sequence of one duplicate of this repetition (EcoRI DNA fragment J) was determined (1410 bp) and compared to the DNA nucleotide sequences of the repetitive DNA elements of the genome of FLDV isolated from flounder. It was found that the repetitive DNA elements of the genome of FLDV derived from two different fish species are highly conserved and possess a degree of DNA sequence homology of 94%. The DNA sequences of each strand of the individual repetitive element possess one open reading frame.

  5. PANGEA: pipeline for analysis of next generation amplicons

    PubMed Central

    Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G; Chauliac, Diane; Mobberley, Jennifer M; Gano, Kelsey A; Mukherjee, Nabanita; Casella, George; Roesch, Luiz FW; Walts, Brandon; Riva, Alberto; King, Gary; Triplett, Eric W

    2010-01-01

    High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including preprocessing, clustering, database matching and classification, have been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OSX, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the χ2 step, are joined into one program called the ‘backbone’. PMID:20182525

  6. PANGEA: pipeline for analysis of next generation amplicons.

    PubMed

    Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G; Chauliac, Diane; Mobberley, Jennifer M; Gano, Kelsey A; Mukherjee, Nabanita; Casella, George; Roesch, Luiz F W; Walts, Brandon; Riva, Alberto; King, Gary; Triplett, Eric W

    2010-07-01

    High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including pre-processing, clustering, database matching and classification, have been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OSX, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the chi(2) step, are joined into one program called the 'backbone'.

  7. Amyloid fibril formation from sequences of a natural beta-structured fibrous protein, the adenovirus fiber.

    PubMed

    Papanikolopoulou, Katerina; Schoehn, Guy; Forge, Vincent; Forsyth, V Trevor; Riekel, Christian; Hernandez, Jean-François; Ruigrok, Rob W H; Mitraki, Anna

    2005-01-28

    Amyloid fibrils are fibrous beta-structures that derive from abnormal folding and assembly of peptides and proteins. Despite a wealth of structural studies on amyloids, the nature of the amyloid structure remains elusive; possible connections to natural, beta-structured fibrous motifs have been suggested. In this work we focus on understanding amyloid structure and formation from sequences of a natural, beta-structured fibrous protein. We show that short peptides (25 to 6 amino acids) corresponding to repetitive sequences from the adenovirus fiber shaft have an intrinsic capacity to form amyloid fibrils as judged by electron microscopy, Congo Red binding, infrared spectroscopy, and x-ray fiber diffraction. In the presence of the globular C-terminal domain of the protein that acts as a trimerization motif, the shaft sequences adopt a triple-stranded, beta-fibrous motif. We discuss the possible structure and arrangement of these sequences within the amyloid fibril, as compared with the one adopted within the native structure. A 6-amino acid peptide, corresponding to the last beta-strand of the shaft, was found to be sufficient to form amyloid fibrils. Structural analysis of these amyloid fibrils suggests that perpendicular stacking of beta-strand repeat units is an underlying common feature of amyloid formation.

  8. De novo assembly and characterization of the garlic (Allium sativum) bud transcriptome by Illumina sequencing.

    PubMed

    Sun, Xiudong; Zhou, Shumei; Meng, Fanlu; Liu, Shiqi

    2012-10-01

    Garlic is widely used as a spice throughout the world for the culinary value of its flavor and aroma, which are created by the chemical transformation of a series of organic sulfur compounds. To analyze the transcriptome of Allium sativum and discover the genes involved in sulfur metabolism, cDNAs derived from the total RNA of Allium sativum buds were analyzed by Illumina sequencing. Approximately 26.67 million 90 bp paired-end clean reads were obtained in two libraries. A total of 127,933 unigenes were generated by de novo assembly and were compared with the sequences in public databases. Of these, 45,286 unigenes had significant hits to the sequences in the Nr database, 29,514 showed significant similarity to known proteins in the Swiss-Prot database, and 20,706 and 21,952 unigenes had significant similarity to existing sequences in the KEGG and COG databases, respectively. Moreover, genes involved in organic sulfur biosynthesis were identified. These unigene data will provide the foundation for research on gene expression, genomics and functional genomics in Allium sativum. Key message: The obtained unigenes will provide the foundation for research on functional genomics in Allium sativum and its closely related species, and fill the gap in the existing plant EST database.

  9. Effect of hot acid hydrolysis and hot chlorine dioxide stage on bleaching effluent biodegradability.

    PubMed

    Gomes, C M; Colodette, J L; Delantonio, N R N; Mounteer, A H; Silva, C M

    2007-01-01

    The hot acid hydrolysis followed by chlorine dioxide (A/D*) and hot chlorine dioxide (D*) technologies have proven very useful for bleaching of eucalyptus kraft pulp. Although the characteristics and biodegradability of effluents from conventional chlorine dioxide bleaching are well known, such information is not yet available for effluents derived from hot acid hydrolysis and hot chlorine dioxide bleaching. This study discusses the characteristics and biodegradability of such effluents. Combined whole effluents from the complete sequences DEpD, D*EpD, A/D*EpD and ADEpD, and from the pre-bleaching sequences DEp, D*Ep, A/D*Ep and ADEp were characterized by quantifying their colour, AOX and organic load (BOD, COD, TOC). These effluents were also evaluated for their treatability by simulation of an activated sludge system. It was concluded that treatment in the laboratory sequencing batch reactor was efficient for removal of COD, BOD and TOC of all effluents. However, colour increased after biological treatment, with the greatest increase found for the effluent produced using the AD technology. Biological treatment was less efficient at removing AOX of effluents from the sequences with D*, A/D* and AD as the first stages, when compared to the reference D stage; there was evidence of the lower treatability of these organochlorine compounds from these sequences.

  10. Mapping-by-sequencing in complex polyploid genomes using genic sequence capture: a case study to map yellow rust resistance in hexaploid wheat.

    PubMed

    Gardiner, Laura-Jayne; Bansept-Basler, Pauline; Olohan, Lisa; Joynson, Ryan; Brenchley, Rachel; Hall, Neil; O'Sullivan, Donal M; Hall, Anthony

    2016-08-01

    Previously we extended the utility of mapping-by-sequencing by combining it with sequence capture and mapping sequence data to pseudo-chromosomes that were organized using wheat-Brachypodium synteny. This, with a bespoke haplotyping algorithm, enabled us to map the flowering time locus in the diploid wheat Triticum monococcum L. identifying a set of deleted genes (Gardiner et al., 2014). Here, we develop this combination of gene enrichment and sliding window mapping-by-synteny analysis to map the Yr6 locus for yellow stripe rust resistance in hexaploid wheat. A 110 MB NimbleGen capture probe set was used to enrich and sequence a doubled haploid mapping population of hexaploid wheat derived from an Avalon and Cadenza cross. The Yr6 locus was identified by mapping to the POPSEQ chromosomal pseudomolecules using a bespoke pipeline and algorithm (Chapman et al., 2015). Furthermore the same locus was identified using newly developed pseudo-chromosome sequences as a mapping reference that are based on the genic sequence used for sequence enrichment. The pseudo-chromosomes allow us to demonstrate the application of mapping-by-sequencing to even poorly defined polyploidy genomes where chromosomes are incomplete and sub-genome assemblies are collapsed. This analysis uniquely enabled us to: compare wheat genome annotations; identify the Yr6 locus - defining a smaller genic region than was previously possible; associate the interval with one wheat sub-genome and increase the density of SNP markers associated. Finally, we built the pipeline in iPlant, making it a user-friendly community resource for phenotype mapping. © 2016 The Authors. The Plant Journal published by Society for Experimental Biology and John Wiley & Sons Ltd.

  11. Sequencing of a QTL-rich region of the Theobroma cacao genome using pooled BACs and the identification of trait specific candidate genes

    PubMed Central

    2011-01-01

    Background BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library. Results This pooled BAC approach was taken to sequence and assemble a QTL-rich region, of ~3 Mbp and represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverages from paired-end and linear 454 libraries, multiple assemblies of varied quality were generated. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs individually sequenced and assembled using Sanger reads. A mixture of reads optimal for assembly was identified. We found, furthermore, that a quality assembly suitable for serving as a reference genome template could be obtained even with a reduced depth of sequencing coverage. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight. Conclusions Our results, as with other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method to target sub-genomic regions for sequencing. While we focused on a single QTL region, other QTL regions of importance could be similarly sequenced allowing for biological discovery to take place before a high quality whole-genome assembly is completed. PMID:21794110

  12. Sequencing of a QTL-rich region of the Theobroma cacao genome using pooled BACs and the identification of trait specific candidate genes.

    PubMed

    Feltus, Frank A; Saski, Christopher A; Mockaitis, Keithanne; Haiminen, Niina; Parida, Laxmi; Smith, Zachary; Ford, James; Staton, Margaret E; Ficklin, Stephen P; Blackmon, Barbara P; Cheng, Chun-Huai; Schnell, Raymond J; Kuhn, David N; Motamayor, Juan-Carlos

    2011-07-27

    BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library. This pooled BAC approach was taken to sequence and assemble a QTL-rich region, of ~3 Mbp and represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverages from paired-end and linear 454 libraries, multiple assemblies of varied quality were generated. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs individually sequenced and assembled using Sanger reads. A mixture of reads optimal for assembly was identified. We found, furthermore, that a quality assembly suitable for serving as a reference genome template could be obtained even with a reduced depth of sequencing coverage. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight. Our results, as with other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method to target sub-genomic regions for sequencing. While we focused on a single QTL region, other QTL regions of importance could be similarly sequenced allowing for biological discovery to take place before a high quality whole-genome assembly is completed.

  13. Construction of an ultra-high density consensus genetic map, and enhancement of the physical map from genome sequencing in Lupinus angustifolius.

    PubMed

    Zhou, Gaofeng; Jian, Jianbo; Wang, Penghao; Li, Chengdao; Tao, Ye; Li, Xuan; Renshaw, Daniel; Clements, Jonathan; Sweetingham, Mark; Yang, Huaan

    2018-01-01

    An ultra-high density genetic map containing 34,574 sequence-defined markers was developed in Lupinus angustifolius. Markers closely linked to nine genes of agronomic traits were identified. A physical map was improved to cover 560.5 Mb genome sequence. Lupin (Lupinus angustifolius L.) is a recently domesticated legume grain crop. In this study, we applied the restriction-site associated DNA sequencing (RADseq) method to genotype an F 9 recombinant inbred line population derived from a wild type × domesticated cultivar (W × D) cross. A high density linkage map was developed based on the W × D population. By integrating sequence-defined DNA markers reported in previous mapping studies, we established an ultra-high density consensus genetic map, which contains 34,574 markers consisting of 3508 loci covering 2399 cM on 20 linkage groups. The largest gap in the entire consensus map was 4.73 cM. The high density W × D map and the consensus map were used to develop an improved physical map, which covered 560.5 Mb of genome sequence data. The ultra-high density consensus linkage map, the improved physical map and the markers linked to genes of breeding interest reported in this study provide a common tool for genome sequence assembly, structural genomics, comparative genomics, functional genomics, QTL mapping, and molecular plant breeding in lupin.

  14. Mitochondrial genome sequencing helps show the evolutionary mechanism of mitochondrial genome formation in Brassica

    PubMed Central

    2011-01-01

    Background Angiosperm mitochondrial genomes are more complex than those of other organisms. Analyses of the mitochondrial genome sequences of at least 11 angiosperm species have shown several common properties; these cannot easily explain, however, how the diverse mitotypes evolved within each genus or species. We analyzed the evolutionary relationships of Brassica mitotypes by sequencing. Results We sequenced the mitotypes of cam (Brassica rapa), ole (B. oleracea), jun (B. juncea), and car (B. carinata) and analyzed them together with two previously sequenced mitotypes of B. napus (pol and nap). The sizes of whole single circular genomes of cam, jun, ole, and car are 219,747 bp, 219,766 bp, 360,271 bp, and 232,241 bp, respectively. The mitochondrial genome of ole is largest as a result of the duplication of a 141.8 kb segment. The jun mitotype is the result of an inherited cam mitotype, and pol is also derived from the cam mitotype with evolutionary modifications. Genes with known functions are conserved in all mitotypes, but clear variation in open reading frames (ORFs) with unknown functions among the six mitotypes was observed. Sequence relationship analysis showed that there has been genome compaction and inheritance in the course of Brassica mitotype evolution. Conclusions We have sequenced four Brassica mitotypes, compared six Brassica mitotypes and suggested a mechanism for mitochondrial genome formation in Brassica, including evolutionary events such as inheritance, duplication, rearrangement, genome compaction, and mutation. PMID:21988783

  15. StralSV: assessment of sequence variability within similar 3D structures and application to polio RNA-dependent RNA polymerase

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zemla, A; Lang, D; Kostova, T

    2010-11-29

    Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory; still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could overcome these difficulties and facilitate the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV, a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus and demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique or that share structural similarity with structures that are distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position.

  16. Characterizing DNA preservation in degraded specimens of Amara alpina (Carabidae: Coleoptera).

    PubMed

    Heintzman, Peter D; Elias, Scott A; Moore, Karen; Paszkiewicz, Konrad; Barnes, Ian

    2014-05-01

    DNA preserved in degraded beetle (Coleoptera) specimens, including those derived from dry-stored museum and ancient permafrost-preserved environments, could provide a valuable resource for researchers interested in species and population histories over timescales from decades to millennia. However, the potential of these samples as genetic resources is currently unassessed. Here, using Sanger and Illumina shotgun sequence data, we explored DNA preservation in specimens of the ground beetle Amara alpina, from both museum and ancient environments. Nearly all museum specimens had amplifiable DNA, with the maximum amplifiable fragment length decreasing with age. Amplification of DNA was only possible in 45% of ancient specimens. Preserved mitochondrial DNA fragments were significantly longer than those of nuclear DNA in both museum and ancient specimens. Metagenomic characterization of extracted DNA demonstrated that parasite-derived sequences, including Wolbachia and Spiroplasma, are recoverable from museum beetle specimens. Ancient DNA extracts contained beetle DNA in amounts comparable to museum specimens. Overall, our data demonstrate that there is great potential for both museum and ancient specimens of beetles in future genetic studies, and we see no reason why this would not be the case for other orders of insect. © 2013 John Wiley & Sons Ltd.

  17. Genome-wide analysis of esterase-like genes in the striped rice stem borer, Chilo suppressalis.

    PubMed

    Wang, Baoju; Wang, Ying; Zhang, Yang; Han, Ping; Li, Fei; Han, Zhaojun

    2015-06-01

    The striped rice stem borer, Chilo suppressalis, a destructive pest of rice, has developed high levels of resistance to certain insecticides. Esterases are reported to be involved in insecticide resistance in several insects. Therefore, this study systematically analyzed esterase-like genes in C. suppressalis. Fifty-one esterase-like genes were identified in the draft genomic sequences of the species, and 20 cDNA sequences were derived which encoded full- or nearly full-length proteins. The putative esterase proteins derived from these full-length genes are overall highly diversified. However, key residues that are functionally important, including the serine residue in the active site, are conserved in 18 out of the 20 proteins. Phylogenetic analysis revealed that most of these genes have homologues in other lepidopteran insects. Genes CsuEst6, CsuEst10, CsuEst11, and CsuEst51 were induced by the insecticide triazophos, and genes CsuEst9, CsuEst11, CsuEst14, and CsuEst51 were induced by the insecticide chlorantraniliprole. Our results provide a foundation for future studies of insecticide resistance in C. suppressalis and for comparative research with esterase genes from other insect species.

  18. Improve the prediction of RNA-binding residues using structural neighbours.

    PubMed

    Li, Quan; Cao, Zanxia; Liu, Haiyan

    2010-03-01

    The interactions between RNA-binding proteins (RBPs) and RNA play key roles in managing some of the cell's basic functions. The identification and prediction of RNA binding sites is important for understanding the RNA-binding mechanism. Computational approaches are being developed to predict RNA-binding residues based on sequence- or structure-derived features. To achieve higher prediction accuracy, improvements on current prediction methods are necessary. We identified that the structural neighbors of RNA-binding and non-RNA-binding residues have different amino acid compositions. Combining this structure-derived feature with evolutionary (PSSM) and other structural information (secondary structure and solvent accessibility) significantly improves the predictions over existing methods. Using a multiple linear regression approach and 6-fold cross validation, our best model achieves an overall correct rate of 87.8% and an MCC of 0.47, with a specificity of 93.4%, and correctly predicts 52.4% of the RNA-binding residues for a dataset containing 107 non-homologous RNA-binding proteins. Compared with existing methods, including the amino acid compositions of structural neighbors leads to a clear improvement. A web server was developed for predicting RNA-binding residues in a protein sequence (or structure), which is available at http://mcgill.3322.org/RNA/.
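
    The performance measures quoted in this abstract (overall correct rate, specificity, fraction of binding residues recovered, and MCC) can all be computed from a binary confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' code; the function name `binary_metrics` and the counts in the usage line are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity and MCC for a binary classifier.

    tp/fn count true binding residues predicted as binding/non-binding;
    tn/fp count true non-binding residues predicted as non-binding/binding.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)      # fraction of binding residues recovered
    specificity = tn / (tn + fp)      # fraction of non-binding residues recovered
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
print(binary_metrics(tp=50, fp=10, tn=90, fn=50))
```

    Because binding residues are usually a minority class, MCC is a more informative summary than the overall correct rate, which can be high even when few binding residues are found.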

  19. Terminal Restriction Fragment Length Polymorphism Analysis Program, a Web-Based Research Tool for Microbial Community Analysis

    PubMed Central

    Marsh, Terence L.; Saxman, Paul; Cole, James; Tiedje, James

    2000-01-01

    Rapid analysis of microbial communities has proven to be a difficult task. This is due, in part, to both the tremendous diversity of the microbial world and the high complexity of many microbial communities. Several techniques for community analysis have emerged over the past decade, and most take advantage of the molecular phylogeny derived from 16S rRNA comparative sequence analysis. We describe a web-based research tool located at the Ribosomal Database Project web site (http://www.cme.msu.edu/RDP/html/analyses.html) that facilitates microbial community analysis using terminal restriction fragment length polymorphism of 16S ribosomal DNA. The analysis function (designated TAP T-RFLP) permits the user to perform in silico restriction digestions of the entire 16S sequence database and derive terminal restriction fragment sizes, measured in base pairs, from the 5′ terminus of the user-specified primer to the 3′ terminus of the restriction endonuclease target site. The output can be sorted and viewed either phylogenetically or by size. It is anticipated that the site will guide experimental design as well as provide insight into interpreting results of community analysis with terminal restriction fragment length polymorphisms. PMID:10919828
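
    The in silico digestion described in this abstract can be sketched in a few lines. This is an illustrative simplification, not the TAP T-RFLP implementation: it assumes exact primer matching on the given strand, searches for the recognition site only downstream of the primer, and measures the fragment from the 5′ terminus of the primer to the 3′ terminus of the recognition site, as the abstract describes. The function name and the toy sequence are hypothetical.

```python
def terminal_fragment_length(sequence, primer, site):
    """Return the terminal restriction fragment size in bp, or None.

    The size runs from the first base of the primer match to the last base
    of the first recognition site found downstream of the primer.
    """
    seq = sequence.upper()
    primer = primer.upper()
    site = site.upper()

    start = seq.find(primer)
    if start == -1:
        return None                      # primer does not anneal (exact match only)

    hit = seq.find(site, start + len(primer))
    if hit == -1:
        return None                      # no recognition site downstream

    return hit + len(site) - start


# Toy sequence: 2 leading bases, a 20-nt primer, 20 filler bases, then GGCC.
primer = "AGAGTTTGATCCTGGCTCAG"
toy = "TT" + primer + "ACGT" * 5 + "GGCC" + "AAAA"
print(terminal_fragment_length(toy, primer, "GGCC"))  # 44
```

    Running such a function over every sequence in a database and grouping the results by fragment size gives the kind of size-sorted output the abstract mentions.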

  20. Circulation of Endemic Type 2 Vaccine-Derived Poliovirus in Egypt from 1983 to 1993

    PubMed Central

    Yang, Chen-Fu; Naguib, Tary; Yang, Su-Ju; Nasr, Eman; Jorba, Jaume; Ahmed, Nahed; Campagnoli, Ray; van der Avoort, Harrie; Shimizu, Hiroyuki; Yoneyama, Tetsuo; Miyamura, Tatsuo; Pallansch, Mark; Kew, Olen

    2003-01-01

    From 1988 to 1993, 30 cases of poliomyelitis associated with poliovirus type 2 were found in seven governorates of Egypt. Because many of the cases were geographically and temporally clustered and because the case isolates differed antigenically from the vaccine strain, it was initially assumed that the cases signaled the continued circulation of wild type 2 poliovirus. However, comparison of sequences encoding the major capsid protein, VP1 (903 nucleotides), revealed that the isolates were related (93 to 97% nucleotide sequence identity) to the Sabin type 2 oral poliovirus vaccine (OPV) strain and unrelated (<82% nucleotide sequence identity) to the wild type 2 polioviruses previously indigenous to Egypt (last known isolate: 1979) or to any contemporary wild type 2 polioviruses found elsewhere. The rate and pattern of VP1 divergence among the circulating vaccine-derived poliovirus (cVDPV) isolates suggested that all lineages were derived from a single OPV infection that occurred around 1983 and that progeny from the initiating infection circulated for approximately a decade within Egypt along several independent chains of transmission. Complete genomic sequences of an early (1988) and a late (1993) cVDPV isolate revealed that their 5′ untranslated region (5′ UTR) and noncapsid- 3′ UTR sequences were derived from other species C enteroviruses. Circulation of type 2 cVDPVs occurred at a time of low OPV coverage in the affected communities and ceased when OPV coverage rates increased. The potential for cVDPVs to circulate in populations with low immunity to poliovirus has important implications for current and future strategies to eradicate polio worldwide. PMID:12857906
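
    The percent nucleotide sequence identity figures used in this abstract to relate isolates to the vaccine strain are, in their simplest form, the share of matching positions between two aligned sequences. The sketch below assumes the sequences are already aligned to equal length and that gap columns (marked "-") are excluded; the record does not state how gaps were treated, so that choice and the function name are assumptions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1

    return 100.0 * matches / compared if compared else 0.0


print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # 90.0
```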

  1. A small and efficient dimerization/packaging signal of rat VL30 RNA and its use in murine leukemia virus-VL30-derived vectors for gene transfer.

    PubMed Central

    Torrent, C; Gabus, C; Darlix, J L

    1994-01-01

    Retroviral genomes consist of two identical RNA molecules associated at their 5' ends by the dimer linkage structure located in the packaging element (Psi or E) necessary for RNA dimerization in vitro and packaging in vivo. In murine leukemia virus (MLV)-derived vectors designed for gene transfer, the Psi + sequence of 600 nucleotides directs the packaging of recombinant RNAs into MLV virions produced by helper cells. By using in vitro RNA dimerization as a screening system, a sequence of rat VL30 RNA located next to the 5' end of the Harvey mouse sarcoma virus genome and as small as 67 nucleotides was found to form stable dimeric RNA. In addition, a purine-rich sequence located at the 5' end of this VL30 RNA seems to be critical for RNA dimerization. When this VL30 element was extended by 107 nucleotides at its 3' end and inserted into an MLV-derived vector lacking MLV Psi +, it directed the efficient encapsidation of recombinant RNAs into MLV virions. Because this VL30 packaging signal is smaller and more efficient in packaging recombinant RNAs than the MLV Psi + and does not contain gag or glyco-gag coding sequences, its use in MLV-derived vectors should render even more unlikely recombinations which could generate replication-competent viruses. Therefore, utilization of the rat VL30 packaging sequence should improve the biological safety of MLV vectors for human gene transfer. Images PMID:8289369

  2. Circulation of endemic type 2 vaccine-derived poliovirus in Egypt from 1983 to 1993.

    PubMed

    Yang, Chen-Fu; Naguib, Tary; Yang, Su-Ju; Nasr, Eman; Jorba, Jaume; Ahmed, Nahed; Campagnoli, Ray; van der Avoort, Harrie; Shimizu, Hiroyuki; Yoneyama, Tetsuo; Miyamura, Tatsuo; Pallansch, Mark; Kew, Olen

    2003-08-01

    From 1988 to 1993, 30 cases of poliomyelitis associated with poliovirus type 2 were found in seven governorates of Egypt. Because many of the cases were geographically and temporally clustered and because the case isolates differed antigenically from the vaccine strain, it was initially assumed that the cases signaled the continued circulation of wild type 2 poliovirus. However, comparison of sequences encoding the major capsid protein, VP1 (903 nucleotides), revealed that the isolates were related (93 to 97% nucleotide sequence identity) to the Sabin type 2 oral poliovirus vaccine (OPV) strain and unrelated (<82% nucleotide sequence identity) to the wild type 2 polioviruses previously indigenous to Egypt (last known isolate: 1979) or to any contemporary wild type 2 polioviruses found elsewhere. The rate and pattern of VP1 divergence among the circulating vaccine-derived poliovirus (cVDPV) isolates suggested that all lineages were derived from a single OPV infection that occurred around 1983 and that progeny from the initiating infection circulated for approximately a decade within Egypt along several independent chains of transmission. Complete genomic sequences of an early (1988) and a late (1993) cVDPV isolate revealed that their 5' untranslated region (5' UTR) and noncapsid- 3' UTR sequences were derived from other species C enteroviruses. Circulation of type 2 cVDPVs occurred at a time of low OPV coverage in the affected communities and ceased when OPV coverage rates increased. The potential for cVDPVs to circulate in populations with low immunity to poliovirus has important implications for current and future strategies to eradicate polio worldwide.

  3. Parameterizing time in electronic health record studies.

    PubMed

    Hripcsak, George; Albers, David J; Perotte, Adler

    2015-07-01

    Fields like nonlinear physics offer methods for analyzing time series, but many methods require that the time series be stationary, that is, show no change in properties over time. Objective: Medicine is far from stationary, but the challenge may be ameliorated by reparameterizing time because clinicians tend to measure patients more frequently when they are ill and are more likely to vary. We compared time parameterizations, measuring variability of rate of change and magnitude of change, and looking for homogeneity of bins of temporal separation between pairs of time points. We studied four common laboratory tests drawn from 25 years of electronic health records on 4 million patients. We found that sequence time (that is, simply counting the number of measurements from some start) produced more stationary time series, better explained the variation in values, and had more homogeneous bins than either traditional clock time or a recently proposed intermediate parameterization. Sequence time produced more accurate predictions in a single Gaussian process model experiment. Of the three parameterizations, sequence time appeared to produce the most stationary series, possibly because clinicians adjust their sampling to the acuity of the patient. Parameterizing by sequence time may be applicable to association and clustering experiments on electronic health record data. A limitation of this study is that laboratory data were derived from only one institution. Sequence time appears to be an important potential parameterization. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
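
    The contrast between clock time and sequence time described in this abstract can be illustrated with a short sketch. It assumes one patient's measurement timestamps are available as Python datetime objects; the function names and the example dates are hypothetical, and the intermediate parameterization mentioned in the abstract is not reproduced here.

```python
from datetime import datetime


def clock_time_days(timestamps):
    """Elapsed days since the earliest measurement (traditional clock time)."""
    start = min(timestamps)
    return [(t - start).total_seconds() / 86400.0 for t in timestamps]


def sequence_time(timestamps):
    """0-based count of measurements since the start, in chronological order."""
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    index = [0] * len(timestamps)
    for rank, i in enumerate(order):
        index[i] = rank
    return index


# Hypothetical lab draws: three close together during an illness, one much later.
draws = [
    datetime(2015, 1, 1, 0, 0),
    datetime(2015, 1, 1, 6, 0),
    datetime(2015, 1, 2, 0, 0),
    datetime(2015, 3, 1, 0, 0),
]
print(clock_time_days(draws))  # uneven spacing on the clock axis
print(sequence_time(draws))    # evenly spaced on the sequence axis
```

    Under sequence time the densely sampled illness period and the long quiet gap each advance the axis by one step per measurement, which is the property the abstract credits for the more stationary series.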

  4. Cell type-specific termination of transcription by transposable element sequences.

    PubMed

    Conley, Andrew B; Jordan, I King

    2012-09-30

    Transposable elements (TEs) encode sequences necessary for their own transposition, including signals required for the termination of transcription. TE sequences within the introns of human genes show an antisense orientation bias, which has been proposed to reflect selection against TE sequences in the sense orientation owing to their ability to terminate the transcription of host gene transcripts. While there is evidence in support of this model for some elements, the extent to which TE sequences actually terminate transcription of human genes across the genome remains an open question. Using high-throughput sequencing data, we have characterized over 9,000 distinct TE-derived sequences that provide transcription termination sites for 5,747 human genes across eight different cell types. Rarefaction curve analysis suggests that there may be twice as many TE-derived termination sites (TE-TTS) genome-wide among all human cell types. The local chromatin environment for these TE-TTS is similar to that seen for 3' UTR canonical TTS and distinct from the chromatin environment of other intragenic TE sequences. However, those TE-TTS located within the introns of human genes were found to be far more cell type-specific than the canonical TTS. TE-TTS were much more likely to be found in the sense orientation than other intragenic TE sequences of the same TE family, and TE-TTS in the sense orientation terminate transcription more efficiently than those found in the antisense orientation. Alu sequences were found to provide a large number of relatively weak TTS, whereas LTR elements provided a smaller number of much stronger TTS. TE sequences provide numerous termination sites to human genes, and TE-derived TTS are particularly cell type-specific. Thus, TE sequences provide a powerful mechanism for the diversification of transcriptional profiles between cell types and among evolutionary lineages, since most TE-TTS are evolutionarily young.
The extent of transcription termination by TEs seen here, along with the preference for sense-oriented TE insertions to provide TTS, is consistent with the observed antisense orientation bias of human TEs.

  5. A phylogeny of the bears (Ursidae) inferred from complete sequences of three mitochondrial genes.

    PubMed

    Talbot, S L; Shields, G F

    1996-06-01

    Complete sequences of DNA are described for the cytochrome b, tRNA(Thr), and tRNA(Pro) genes of mitochondria of four extant species of ursids and compared to sequences of four other species of ursids previously studied by us. Phylogenetic analyses indicate that the giant panda and the spectacled bear are the basal taxa of the ursid radiation. The ursines, a group which includes the sun bear, sloth bear, American black bear, Asiatic black bear, brown bear, and polar bear, experienced a rapid radiation during the mid Pliocene to early Pleistocene. The Asiatic black bear and American black bear are sister taxa. The brown bear and polar bear are the most recently derived of the ursines, with the polar bear originating from within a clade of brown bears during the Pleistocene. This paraphyletic association suggests that the rate of morphological evolution may be accelerated relative to that of molecular evolution when a new ecological niche is occupied. Calibration of the corrected average number of nucleotide differences per site with the fossil record indicates that transitions at third positions of codons in the ursid cytochrome b gene occur at a rate of approximately 6% per million years, which is considerably slower than comparable values reported for other species of mammal.

  6. A simple and novel method for RNA-seq library preparation of single cell cDNA analysis by hyperactive Tn5 transposase.

    PubMed

    Brouilette, Scott; Kuersten, Scott; Mein, Charles; Bozek, Monika; Terry, Anna; Dias, Kerith-Rae; Bhaw-Rosun, Leena; Shintani, Yasunori; Coppen, Steven; Ikebe, Chiho; Sawhney, Vinit; Campbell, Niall; Kaneko, Masahiro; Tano, Nobuko; Ishida, Hidekazu; Suzuki, Ken; Yashiro, Kenta

    2012-10-01

    Deep sequencing of single cell-derived cDNAs offers novel insights into oncogenesis and embryogenesis. However, traditional library preparation for RNA-seq analysis requires multiple steps with consequent sample loss and stochastic variation at each step significantly affecting output. Thus, a simpler and better protocol is desirable. The recently developed hyperactive Tn5-mediated library preparation, which brings high quality libraries, is likely one of the solutions. Here, we tested the applicability of hyperactive Tn5-mediated library preparation to deep sequencing of single cell cDNA, optimized the protocol, and compared it with the conventional method based on sonication. This new technique does not require any expensive or special equipment, which secures wider availability. A library was constructed from only 100 ng of cDNA, which enables the saving of precious specimens. Only a few steps of robust enzymatic reaction resulted in saved time, enabling more specimens to be prepared at once, and with a more reproducible size distribution among the different specimens. The obtained RNA-seq results were comparable to the conventional method. Thus, this Tn5-mediated preparation is applicable for anyone who aims to carry out deep sequencing for single cell cDNAs. Copyright © 2012 Wiley Periodicals, Inc.

  7. The role of the hippocampus in transitive inference

    PubMed Central

    Zalesak, Martin; Heckers, Stephan

    2009-01-01

    Transitive inference (TI) is the ability to infer the relationship between items (e.g., A>C) after having learned a set of premise pairs (e.g., A>B and B>C). Previous studies in humans have identified a distributed neural network, including cortex, hippocampus, and thalamus, during TI judgments. We studied two aspects of TI using fMRI of subjects who had acquired the 6-item sequence (A>B>C>D>E>F) of visual stimuli. First, the identification of novel pairs not containing end items (i.e., B>D, C>E, B>E) was associated with greater left hippocampal activation when compared to the identification of novel pairs containing end items A and F. This demonstrates that the identification of stimulus pairs requiring the flexible representation of a sequence is associated with hippocampal activation. Second, for the three novel pairs devoid of end items we found greater right hippocampal activation for pairs B>D and C>E compared with pair B>E. This indicates that TI decisions on pairs derived from more adjacent items in the sequence are associated with greater hippocampal activation. Hippocampal activation thus scales with the degree of relational processing necessary for TI judgments. Both findings confirm a role of the hippocampus in transitive inference in humans. PMID:19216061
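
    The logic of the task can be sketched in a few lines: recover the full ordering from the adjacent premise pairs, then list the novel (non-adjacent) pairs that exclude end items together with how many steps separate them. This is only an illustration of the inference structure described in the abstract, not of the fMRI analysis; the function and variable names are hypothetical, and the premises are assumed to form a single chain.

```python
from itertools import combinations

# Adjacent premise pairs (left item ranks above right item), as in A>B>C>D>E>F.
premises = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")]


def rank_items(premise_pairs):
    """Recover the full order from a single chain of adjacent premise pairs."""
    successor = dict(premise_pairs)
    lower_items = set(successor.values())
    tops = [item for item in successor if item not in lower_items]
    if len(tops) != 1:
        raise ValueError("premises must form a single chain")
    order = [tops[0]]
    while order[-1] in successor:
        order.append(successor[order[-1]])
    return order


order = rank_items(premises)
position = {item: i for i, item in enumerate(order)}
ends = {order[0], order[-1]}

# Novel pairs are separated by at least two steps; inner pairs avoid end items.
inner_pairs = [
    (x, y, position[y] - position[x])
    for x, y in combinations(order, 2)
    if position[y] - position[x] >= 2 and x not in ends and y not in ends
]
print(inner_pairs)
```

    The third element of each tuple is the separation between the two items, which distinguishes B>D and C>E (two steps) from B>E (three steps), the contrast reported for right hippocampal activation.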

  8. Genomes of the Mouse Collaborative Cross.

    PubMed

    Srivastava, Anuj; Morgan, Andrew P; Najarian, Maya L; Sarsani, Vishal Kumar; Sigmon, J Sebastian; Shorter, John R; Kashfeen, Anwica; McMullan, Rachel C; Williams, Lucy H; Giusti-Rodríguez, Paola; Ferris, Martin T; Sullivan, Patrick; Hock, Pablo; Miller, Darla R; Bell, Timothy A; McMillan, Leonard; Churchill, Gary A; de Villena, Fernando Pardo-Manuel

    2017-06-01

    The Collaborative Cross (CC) is a multiparent panel of recombinant inbred (RI) mouse strains derived from eight founder laboratory strains. RI panels are popular because of their long-term genetic stability, which enhances reproducibility and integration of data collected across time and conditions. Characterization of their genomes can be a community effort, reducing the burden on individual users. Here we present the genomes of the CC strains using two complementary approaches as a resource to improve power and interpretation of genetic experiments. Our study also provides a cautionary tale regarding the limitations imposed by such basic biological processes as mutation and selection. A distinct advantage of inbred panels is that genotyping only needs to be performed on the panel, not on each individual mouse. The initial CC genome data were haplotype reconstructions based on dense genotyping of the most recent common ancestors (MRCAs) of each strain followed by imputation from the genome sequence of the corresponding founder inbred strain. The MRCA resource captured segregating regions in strains that were not fully inbred, but it had limited resolution in the transition regions between founder haplotypes, and there was uncertainty about founder assignment in regions of limited diversity. Here we report the whole genome sequence of 69 CC strains generated by paired-end short reads at 30× coverage of a single male per strain. Sequencing leads to a substantial improvement in the fine structure and completeness of the genomes of the CC. Both MRCAs and sequenced samples show a significant reduction in the genome-wide haplotype frequencies from two wild-derived strains, CAST/EiJ and PWK/PhJ. In addition, analysis of the evolution of the patterns of heterozygosity indicates that selection against three wild-derived founder strains played a significant role in shaping the genomes of the CC. 
The sequencing resource provides the first description of tens of thousands of new genetic variants introduced by mutation and drift in the CC genomes. We estimate that new SNP mutations are accumulating in each CC strain at a rate of 2.4 ± 0.4 per gigabase per generation. The fixation of new mutations by genetic drift has introduced thousands of new variants into the CC strains. The majority of these mutations are novel compared to currently sequenced laboratory stocks and wild mice, and some are predicted to alter gene function. Approximately one-third of the CC inbred strains have acquired large deletions (>10 kb) many of which overlap known coding genes and functional elements. The sequence of these mice is a critical resource to CC users, increases threefold the number of mouse inbred strain genomes available publicly, and provides insight into the effect of mutation and drift on common resources. Copyright © 2017 Srivastava et al.

  9. Genomes of the Mouse Collaborative Cross

    PubMed Central

    Srivastava, Anuj; Morgan, Andrew P.; Najarian, Maya L.; Sarsani, Vishal Kumar; Sigmon, J. Sebastian; Shorter, John R.; Kashfeen, Anwica; McMullan, Rachel C.; Williams, Lucy H.; Giusti-Rodríguez, Paola; Ferris, Martin T.; Sullivan, Patrick; Hock, Pablo; Miller, Darla R.; Bell, Timothy A.; McMillan, Leonard; Churchill, Gary A.; de Villena, Fernando Pardo-Manuel

    2017-01-01

    The Collaborative Cross (CC) is a multiparent panel of recombinant inbred (RI) mouse strains derived from eight founder laboratory strains. RI panels are popular because of their long-term genetic stability, which enhances reproducibility and integration of data collected across time and conditions. Characterization of their genomes can be a community effort, reducing the burden on individual users. Here we present the genomes of the CC strains using two complementary approaches as a resource to improve power and interpretation of genetic experiments. Our study also provides a cautionary tale regarding the limitations imposed by such basic biological processes as mutation and selection. A distinct advantage of inbred panels is that genotyping only needs to be performed on the panel, not on each individual mouse. The initial CC genome data were haplotype reconstructions based on dense genotyping of the most recent common ancestors (MRCAs) of each strain followed by imputation from the genome sequence of the corresponding founder inbred strain. The MRCA resource captured segregating regions in strains that were not fully inbred, but it had limited resolution in the transition regions between founder haplotypes, and there was uncertainty about founder assignment in regions of limited diversity. Here we report the whole genome sequence of 69 CC strains generated by paired-end short reads at 30× coverage of a single male per strain. Sequencing leads to a substantial improvement in the fine structure and completeness of the genomes of the CC. Both MRCAs and sequenced samples show a significant reduction in the genome-wide haplotype frequencies from two wild-derived strains, CAST/EiJ and PWK/PhJ. In addition, analysis of the evolution of the patterns of heterozygosity indicates that selection against three wild-derived founder strains played a significant role in shaping the genomes of the CC. 
The sequencing resource provides the first description of tens of thousands of new genetic variants introduced by mutation and drift in the CC genomes. We estimate that new SNP mutations are accumulating in each CC strain at a rate of 2.4 ± 0.4 per gigabase per generation. The fixation of new mutations by genetic drift has introduced thousands of new variants into the CC strains. The majority of these mutations are novel compared to currently sequenced laboratory stocks and wild mice, and some are predicted to alter gene function. Approximately one-third of the CC inbred strains have acquired large deletions (>10 kb) many of which overlap known coding genes and functional elements. The sequence of these mice is a critical resource to CC users, increases threefold the number of mouse inbred strain genomes available publicly, and provides insight into the effect of mutation and drift on common resources. PMID:28592495

  10. A reference genetic map of C. clementina hort. ex Tan.; citrus evolution inferences from comparative mapping

    PubMed Central

    2012-01-01

    Background: Most modern citrus cultivars have an interspecific origin. As a foundational step towards deciphering the interspecific genome structures, a reference whole genome sequence was produced by the International Citrus Genome Consortium from a haploid derived from Clementine mandarin. The availability of a saturated genetic map of Clementine was identified as an essential prerequisite to assist the whole genome sequence assembly. Clementine is believed to be a ‘Mediterranean’ mandarin × sweet orange hybrid, and sweet orange likely arose from interspecific hybridizations between mandarin and pummelo gene pools. The primary goals of the present study were to establish a Clementine reference map using codominant markers, and to perform comparative mapping of pummelo, sweet orange, and Clementine. Results: Five parental genetic maps were established from three segregating populations, which were genotyped with Single Nucleotide Polymorphism (SNP), Simple Sequence Repeats (SSR) and Insertion-Deletion (Indel) markers. An initial medium density reference map (961 markers for 1084.1 cM) of the Clementine was established by combining male and female Clementine segregation data. This Clementine map was compared with two pummelo maps and a sweet orange map. The linear order of markers was highly conserved in the different species. However, significant differences in map size were observed, which suggests a variation in the recombination rates. Skewed segregations were much higher in the male than female Clementine mapping data. The mapping data confirmed that Clementine arose from hybridization between ‘Mediterranean’ mandarin and sweet orange. The results identified nine recombination break points for the sweet orange gamete that contributed to the Clementine genome. Conclusions: A reference genetic map of citrus, used to facilitate the chromosome assembly of the first citrus reference genome sequence, was established.
The high conservation of marker order observed at the interspecific level should allow reasonable inferences of most citrus genome sequences by mapping next-generation sequencing (NGS) data in the reference genome sequence. The genome of the haploid Clementine used to establish the citrus reference genome sequence appears to have been inherited primarily from the ‘Mediterranean’ mandarin. The high frequency of skewed allelic segregations in the male Clementine data underlines the probable extent of deviation from Mendelian segregation for characters controlled by heterozygous loci in male parents. PMID:23126659

  11. High-throughput sequence-based analysis of the bacterial composition of kefir and an associated kefir grain.

    PubMed

    Dobson, Alleson; O'Sullivan, Orla; Cotter, Paul D; Ross, Paul; Hill, Colin

    2011-07-01

    Lacticin 3147 is a two-peptide broad spectrum lantibiotic produced by Lactococcus lactis DPC3147 shown to inhibit a number of clinically relevant Gram-positive pathogens. Initially isolated from an Irish kefir grain, lacticin 3147 is one of the most extensively studied lantibiotics to date. In this study, the bacterial diversity of the Irish kefir grain from which L. lactis DPC3147 was originally isolated was for the first time investigated using a high-throughput parallel sequencing strategy. A total of 17 416 unique V4 variable regions of the 16S rRNA gene were analysed from both the kefir starter grain and its derivative kefir-fermented milk. Firmicutes (which includes the lactic acid bacteria) was the dominant phylum accounting for > 92% of sequences. Within the Firmicutes, dramatic differences in abundance were observed when the starter grain and kefir milk fermentate were compared. The kefir grain-associated bacterial community was largely composed of the Lactobacillaceae family while Streptococcaceae (primarily Lactococcus spp.) was the dominant family within the kefir milk fermentate. Sequencing data confirmed previous findings that the microbiota of kefir milk and the starter grain are quite different while at the same time, establishing that the microbial diversity of the starter grain is not uniform with a greater level of diversity associated with the interior kefir starter grain compared with the exterior. © 2011 Teagasc Food Research Centre, Moorepark. FEMS Microbiology Letters © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd.

  12. Hydrophobic cluster analysis of G protein-coupled receptors: a powerful tool to derive structural and functional information from 2D-representation of protein sequences.

    PubMed

    Lentes, K U; Mathieu, E; Bischoff, R; Rasmussen, U B; Pavirani, A

    1993-01-01

    Current methods for comparative analyses of protein sequences are 1D-alignments of amino acid sequences based on the maximization of amino acid identity (homology) and the prediction of secondary structure elements. This method has a major drawback once the amino acid identity drops below 20-25%, since maximization of a homology score does not take into account any structural information. A new technique called Hydrophobic Cluster Analysis (HCA) has been developed by Lemesle-Varloot et al. (Biochimie 72, 555-574, 1990). This consists of comparing several sequences simultaneously and combining homology detection with secondary structure analysis. HCA is primarily based on the detection and comparison of structural segments constituting the hydrophobic core of globular protein domains, with or without transmembrane domains. We have applied HCA to the analysis of different families of G-protein coupled receptors, such as catecholamine receptors as well as peptide hormone receptors. Utilizing HCA, the thrombin receptor, a new and as yet unique member of the family of G-protein coupled receptors, can be clearly classified as being closely related to the family of neuropeptide receptors rather than to the catecholamine receptors, for which the shape of the hydrophobic clusters and the length of their third cytoplasmic loop are very different. Furthermore, the potential of HCA to predict relationships between new putative and already characterized members of this family of receptors will be presented.

  13. Improved nucleic acid descriptors for siRNA efficacy prediction.

    PubMed

    Sciabola, Simone; Cao, Qing; Orozco, Modesto; Faustino, Ignacio; Stanton, Robert V

    2013-02-01

    Although considerable progress has been made recently in understanding how gene silencing is mediated by the RNAi pathway, the rational design of effective sequences is still a challenging task. In this article, we demonstrate that including three-dimensional descriptors improved the discrimination between active and inactive small interfering RNAs (siRNAs) in a statistical model. Five descriptor types were used: (i) nucleotide position along the siRNA sequence, (ii) nucleotide composition in terms of presence/absence of specific combinations of di- and trinucleotides, (iii) nucleotide interactions by means of a modified auto- and cross-covariance function, (iv) nucleotide thermodynamic stability derived by the nearest neighbor model representation and (v) nucleic acid structure flexibility. The duplex flexibility descriptors are derived from extended molecular dynamics simulations, which are able to describe the sequence-dependent elastic properties of RNA duplexes, even for non-standard oligonucleotides. The matrix of descriptors was analysed using three statistical packages in R (partial least squares, random forest, and support vector machine), and the most predictive model was implemented in a modeling tool we have made publicly available through SourceForge. Our implementation of new RNA descriptors coupled with appropriate statistical algorithms resulted in improved model performance for the selection of siRNA candidates when compared with publicly available siRNA prediction tools and previously published test sets. Additional validation studies based on in-house RNA interference projects confirmed the robustness of the scoring procedure in prospective studies.
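
    Descriptor types (i) and (ii) above are simple enough to sketch directly. The example below one-hot encodes the base at each position and records presence or absence of each dinucleotide. It is an illustrative assumption about how such features might be laid out, not the encoding used by the authors (whose tool was built in R); the function names and the 19-nt sequence are hypothetical, and trinucleotides and descriptor types (iii) to (v) are omitted.

```python
from itertools import product

BASES = "ACGU"
DINUCLEOTIDES = ["".join(pair) for pair in product(BASES, repeat=2)]


def position_descriptor(sirna):
    """One-hot encode the base at each position (four values per nucleotide)."""
    sirna = sirna.upper()
    if any(base not in BASES for base in sirna):
        raise ValueError("sequence must contain only A, C, G, U")
    return [int(base == b) for base in sirna for b in BASES]


def dinucleotide_presence(sirna):
    """1 if the dinucleotide occurs anywhere in the sequence, else 0."""
    sirna = sirna.upper()
    return [int(d in sirna) for d in DINUCLEOTIDES]


# Hypothetical 19-nt siRNA strand, used only to show the feature layout.
sirna = "GUCAUCACACUGAAUACCA"
features = position_descriptor(sirna) + dinucleotide_presence(sirna)
print(len(features))
```

    A matrix of such rows, one per siRNA, is the kind of input that a partial least squares, random forest, or support vector machine model would then be fitted to.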

  14. Development and application of microsatellites in candidate genes related to wood properties in the Chinese white poplar (Populus tomentosa Carr.).

    PubMed

    Du, Qingzhang; Gong, Chenrui; Pan, Wei; Zhang, Deqiang

    2013-02-01

    Gene-derived simple sequence repeats (genic SSRs), also known as functional markers, are often preferred over random genomic markers because they represent variation in gene coding and/or regulatory regions. We characterized 544 genic SSR loci derived from 138 candidate genes involved in wood formation, distributed throughout the genome of Populus tomentosa, a key ecological and cultivated wood production species. Of these SSRs, three-quarters were located in the promoter or intron regions, and dinucleotide (59.7%) and trinucleotide repeat motifs (26.5%) predominated. By screening 15 wild P. tomentosa ecotypes, we identified 188 polymorphic genic SSRs with 861 alleles, 2-7 alleles for each marker. Transferability analysis of 30 random genic SSRs, testing whether these SSRs work in 26 genotypes of five genus Populus sections (outgroup, Salix matsudana), showed that 72% of the SSRs could be amplified in Turanga and 100% could be amplified in Leuce. Based on genotyping of these 26 genotypes, a neighbour-joining analysis showed the expected six phylogenetic groupings. In silico analysis of SSR variation in 220 sequences that are homologous between P. tomentosa and Populus trichocarpa suggested that genic SSR variations between relatives were predominantly affected by repeat motif variations or flanking sequence mutations. Inheritance tests and single-marker associations demonstrated the power of genic SSRs in family-based linkage mapping and candidate gene-based association studies, as well as marker-assisted selection and comparative genomic studies of P. tomentosa and related species.
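
    Dinucleotide and trinucleotide SSRs of the kind counted above can be located with a regular expression. The sketch below is a simplified illustration rather than the authors' pipeline: the function name, the minimum of four repeat units, and the toy sequence are all assumptions chosen for the example.

```python
import re


def find_ssrs(sequence, min_repeats=4):
    """Return (start, motif, repeat_count) for di- and trinucleotide repeats."""
    # A 2- or 3-base motif followed by at least (min_repeats - 1) further copies.
    pattern = re.compile(r"([ACGT]{2,3}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Toy sequence: an (AC)5 repeat and an (AGC)4 repeat with short flanks.
sequence = "TTG" + "AC" * 5 + "GGT" + "AGC" * 4 + "ATT"
print(find_ssrs(sequence))
```

    Polymorphism between ecotypes would then show up as different repeat counts at the same locus, which is what the allele numbers per marker reflect.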

  15. Taxonomic and Functional Microbial Signatures of the Endemic Marine Sponge Arenosclera brasiliensis

    PubMed Central

    Trindade-Silva, Amaro E.; Rua, Cintia; Silva, Genivaldo G. Z.; Dutilh, Bas E.; Moreira, Ana Paula B.; Edwards, Robert A.; Hajdu, Eduardo; Lobo-Hajdu, Gisele; Vasconcelos, Ana Tereza; Berlinck, Roberto G. S.; Thompson, Fabiano L.

    2012-01-01

    The endemic marine sponge Arenosclera brasiliensis (Porifera, Demospongiae, Haplosclerida) is a known source of secondary metabolites such as arenosclerins A-C. In the present study, we established the composition of the A. brasiliensis microbiome and the metabolic pathways associated with this community. We used 454 shotgun pyrosequencing to generate approximately 640,000 high-quality sponge-derived sequences (∼150 Mb). Clustering analysis including sponge, seawater and twenty-three other metagenomes derived from marine animal microbiomes shows that A. brasiliensis contains a specific microbiome. Fourteen bacterial phyla (including Proteobacteria, Cyanobacteria, Actinobacteria, Bacteroidetes, Firmicutes and Chloroflexi) were consistently found in the A. brasiliensis metagenomes. The A. brasiliensis microbiome is enriched for Betaproteobacteria (e.g., Burkholderia) and Gammaproteobacteria (e.g., Pseudomonas and Alteromonas) compared with the surrounding planktonic microbial communities. Functional analysis based on Rapid Annotation using Subsystem Technology (RAST) indicated that the A. brasiliensis microbiome is enriched for sequences associated with membrane transport and one-carbon metabolism. In addition, there was an overrepresentation of sequences associated with aerobic and anaerobic metabolism as well as the synthesis and degradation of secondary metabolites. This study represents the first analysis of sponge-associated microbial communities via shotgun pyrosequencing, a strategy commonly applied in similar analyses in other marine invertebrate hosts, such as corals and algae. We demonstrate that A. brasiliensis has a unique microbiome that is distinct from that of the surrounding planktonic microbes and from other marine organisms, indicating a species-specific microbiome. PMID:22768320

  16. Bacterial Degraders of Coexisting Dichloromethane, Benzene, and Toluene, Identified by Stable-Isotope Probing.

    PubMed

    Yoshikawa, Miho; Zhang, Ming; Kurisu, Futoshi; Toyota, Koki

    2017-01-01

    Most bioremediation studies on volatile organic compounds (VOCs) have focused on a single contaminant or its derived compounds, and degraders have been identified under single-contaminant conditions. Bioremediation of multiple contaminants remains a challenging issue. To identify a bacterial consortium that degrades multiple VOCs (dichloromethane (DCM), benzene, and toluene), we applied DNA-stable isotope probing. For individual tests, we combined a 13C-labeled VOC with the other two unlabeled VOCs, and prepared three unlabeled VOCs as a reference. Over 11 days, DNA was periodically extracted from the consortia, and the bacterial community was evaluated by next-generation sequencing of bacterial 16S rRNA gene amplicons. Density gradient fractions of the DNA extracts were amplified by universal bacterial primers for the 16S rRNA gene sequences, and the amplicons were analyzed by terminal restriction fragment length polymorphism (T-RFLP) using the restriction enzymes HhaI and MspI. The T-RFLP fragments were identified by 16S rRNA gene cloning and sequencing. Under all test conditions, the consortia were dominated by Rhodanobacter, Bradyrhizobium/Afipia, Rhizobium, and Hyphomicrobium. DNA derived from Hyphomicrobium and Propioniferax shifted toward heavier fractions under the conditions with added 13C-DCM and 13C-benzene, respectively, compared with the reference, but no shifts were induced by 13C-toluene addition. This implies that Hyphomicrobium and Propioniferax were the main DCM and benzene degraders, respectively, under the coexisting condition. The known benzene degrader Pseudomonas sp. was present but not actively involved in the degradation.

  17. High-resolution melting genotyping of Enterococcus faecium based on multilocus sequence typing derived single nucleotide polymorphisms.

    PubMed

    Tong, Steven Y C; Xie, Shirley; Richardson, Leisha J; Ballard, Susan A; Dakh, Farshid; Grabsch, Elizabeth A; Grayson, M Lindsay; Howden, Benjamin P; Johnson, Paul D R; Giffard, Philip M

    2011-01-01

    We have developed a single nucleotide polymorphism (SNP) nucleated high-resolution melting (HRM) technique to genotype Enterococcus faecium. Eight SNPs were derived from the E. faecium multilocus sequence typing (MLST) database and amplified fragments containing these SNPs were interrogated by HRM. We tested the HRM genotyping scheme on 85 E. faecium bloodstream isolates and compared the results with MLST, pulsed-field gel electrophoresis (PFGE) and an allele specific real-time PCR (AS kinetic PCR) SNP typing method. In silico analysis based on predicted HRM curves according to the G+C content of each fragment for all 567 sequence types (STs) in the MLST database together with empiric data from the 85 isolates demonstrated that HRM analysis resolves E. faecium into 231 "melting types" (MelTs) and provides a Simpson's Index of Diversity (D) of 0.991 with respect to MLST. This is a significant improvement on the AS kinetic PCR SNP typing scheme that resolves 61 SNP types with D of 0.95. The MelTs were concordant with the known ST of the isolates. For the 85 isolates, there were 13 PFGE patterns, 17 STs, 14 MelTs and eight SNP types. There was excellent concordance between PFGE, MLST and MelTs with Adjusted Rand Indices of PFGE to MelT 0.936 and ST to MelT 0.973. In conclusion, this HRM based method appears rapid and reproducible. The results are concordant with MLST and the MLST based population structure.
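
    The Simpson's Index of Diversity (D) quoted above is, in typing studies, usually computed as D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where n_j is the number of isolates assigned to type j and N is the total number of isolates. The sketch below assumes that form; the function name and the toy type assignments are hypothetical and do not reproduce the study's data.

```python
from collections import Counter


def simpsons_index_of_diversity(type_labels):
    """Probability that two isolates drawn without replacement differ in type."""
    counts = Counter(type_labels).values()
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two isolates are required")
    return 1.0 - sum(c * (c - 1) for c in counts) / (n * (n - 1))


# Toy assignment of 10 isolates to 4 melting types (hypothetical data).
labels = ["MelT1"] * 4 + ["MelT2"] * 3 + ["MelT3"] * 2 + ["MelT4"]
print(round(simpsons_index_of_diversity(labels), 3))
```

    A value near 1 means two randomly chosen isolates almost always fall into different types, which is why a scheme with D of 0.991 discriminates better than one with D of 0.95.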

  18. The recurrence sequences via Sylvester matrices

    NASA Astrophysics Data System (ADS)

    Karaduman, Erdal; Deveci, Ömür

    2017-07-01

    In this work, we define the Pell-Jacobsthal-Sylvester sequence and the Jacobsthal-Pell-Sylvester sequence by using the Sylvester matrices obtained from the characteristic polynomials of the Pell and Jacobsthal sequences, and then study these sequences modulo m. We also obtain the cyclic groups and the semigroups from the generating matrices of these sequences when read modulo m, and derive the relationships among the orders of the cyclic groups and the periods of the sequences. Furthermore, we redefine the Pell-Jacobsthal-Sylvester sequence and the Jacobsthal-Pell-Sylvester sequence by means of the elements of the groups and examine them in finite groups.
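
    The idea of reading a recurrence sequence modulo m and measuring its period can be illustrated with the two base sequences named in the abstract. The sketch below is an assumption-laden illustration, not the authors' construction: it covers only the classical Pell (x_n = 2x_{n-1} + x_{n-2}) and Jacobsthal (x_n = x_{n-1} + 2x_{n-2}) recurrences with x_0 = 0, x_1 = 1, not the Sylvester-matrix sequences themselves, whose definition is not given in this record. The function name period_mod is hypothetical.

```python
def period_mod(a, b, m):
    """Return (preperiod, period) of x_n = a*x_{n-1} + b*x_{n-2} (mod m), x_0 = 0, x_1 = 1.

    The pair of consecutive terms is tracked as the state; the first repeated
    state marks the start of the cycle and its length.
    """
    if m < 1:
        raise ValueError("modulus must be positive")
    seen = {}
    state = (0 % m, 1 % m)
    step = 0
    while state not in seen:
        seen[state] = step
        state = (state[1], (a * state[1] + b * state[0]) % m)
        step += 1
    start = seen[state]
    return start, step - start


# Pell sequence: coefficients (2, 1); Jacobsthal sequence: coefficients (1, 2).
for m in (2, 3, 4, 5):
    print(m, period_mod(2, 1, m), period_mod(1, 2, m))
```

    The Pell recurrence is invertible modulo every m, so its preperiod is always 0. The Jacobsthal recurrence is not invertible modulo even m, which is why the sketch reports a preperiod as well as a period.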

  19. Implications for HIV testing policy derived from combining data on voluntary confidential testing with viral sequences and serological analyses.

    PubMed

    Brown, A E; Murphy, G; Rinck, G; Clewley, J P; Hill, C; Parry, J V; Johnson, A M; Pillay, D; Gill, O N

    2009-02-01

    Laboratory, clinical and sequence-based data were combined to assess the differential uptake of voluntary confidential HIV testing (VCT) according to risk and explore the occurrence of HIV transmission from individuals with recently acquired HIV infection, before the diagnostic opportunity. Between 1999 and 2002, nearly 30,000 anonymous tests for previously undiagnosed HIV infection were conducted among men who have sex with men (MSM) attending 15 sentinel sexually transmitted infection (STI) clinics in England, Wales and Northern Ireland. Using a serological testing algorithm, undiagnosed HIV-infected men were categorised into those with recent and non-recent infection. VCT uptake was compared between HIV-negative, recently HIV-infected and non-recently HIV-infected men. A phylogenetic analysis of HIV pol sequences from 127 recently HIV-infected MSM was conducted to identify instances in which transmission may have occurred before the diagnostic opportunity. HIV-negative MSM were more likely to receive VCT at clinic visits compared with undiagnosed HIV-infected MSM (56% (14,020/24,938) vs 31% (335/1072); p<0.001). Recently HIV-infected MSM were more likely to receive VCT compared with those with non-recent infections (42% (97/229) vs 28% (238/844); p<0.001). 22% (95/425) of undiagnosed HIV-infected MSM with STI received VCT. Phylogenetic analysis revealed at least seven transmissions may have been generated by recently HIV-infected MSM: a group that attended STI clinics soon after seroconversion. The integration of clinical, laboratory and sequence-based data reveals the need for specific targeting of the recently HIV exposed, and those with STI, for VCT. VCT promotion alone may be limited in its ability to prevent HIV transmission.
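
    The uptake comparisons in this abstract are differences between two proportions. The record does not state which statistical test produced the reported p-values, so the sketch below assumes a pooled two-sided two-proportion z-test purely for illustration; the counts are the ones quoted in the abstract, and the function name is hypothetical.

```python
from math import erfc, sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (p1, p2, z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return p1, p2, z, p_value


# HIV-negative vs undiagnosed HIV-infected MSM receiving VCT.
neg_vs_undiagnosed = two_proportion_z(14020, 24938, 335, 1072)
# Recently vs non-recently HIV-infected MSM receiving VCT.
recent_vs_nonrecent = two_proportion_z(97, 229, 238, 844)

for p1, p2, z, p in (neg_vs_undiagnosed, recent_vs_nonrecent):
    print(f"{p1:.1%} vs {p2:.1%}: z = {z:.2f}, p = {p:.2g}")
```

    Under this assumed test, both comparisons reproduce the rounded percentages in the abstract and give p-values below 0.001.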

  20. Comparative analysis of biological activities of Der p I-derived peptides on Fc epsilon receptor-bearing cells from Dermatophagoides pteronyssinus-sensitive patients.

    PubMed Central

    Jeannin, P; Pestel, J; Bossus, M; Lassalle, P; Tartar, A; Tonnel, A B

    1993-01-01

    The ability of four uncoupled synthetic peptides (p52-71, p117-133, p176-187, p188-199) derived from Der p I, a major allergen from the house dust mite Dermatophagoides pteronyssinus (Dpt), to stimulate Fc epsilon R+ cells from Dpt-sensitive patients was comparatively analysed. Each free peptide may specifically stimulate basophils (Fc epsilon RI+ cells) and platelets (Fc epsilon RII+ cells) from patients with significant levels of anti-Der p I IgE antibodies; p52-71 and p117-133 appear to be the best inducers of cell stimulation. Both concentration-dependent biological activities of Der p I-peptide on Fc epsilon R+ cells are enhanced by coupling the peptide to a carrier (such as human serum albumin). Interestingly, each Der p I-sensitive patient tested presents an individual pattern of response to the peptides. Thus, our results suggest that different Der p I sequences could be involved in the immune response to Der p I. PMID:7682161

  1. Comparative analysis of biological activities of Der p I-derived peptides on Fc epsilon receptor-bearing cells from Dermatophagoides pteronyssinus-sensitive patients.

    PubMed

    Jeannin, P; Pestel, J; Bossus, M; Lassalle, P; Tartar, A; Tonnel, A B

    1993-04-01

    The ability of four uncoupled synthetic peptides (p52-71, p117-133, p176-187, p188-199) derived from Der p I, a major allergen from the house dust mite Dermatophagoides pteronyssinus (Dpt), to stimulate Fc epsilon R+ cells from Dpt-sensitive patients was comparatively analysed. Each free peptide may specifically stimulate basophils (Fc epsilon RI+ cells) and platelets (Fc epsilon RII+ cells) from patients with significant levels of anti-Der p I IgE antibodies; p52-71 and p117-133 appear to be the best inducers of cell stimulation. Both concentration-dependent biological activities of Der p I-peptide on Fc epsilon R+ cells are enhanced by coupling the peptide to a carrier (such as human serum albumin). Interestingly, each Der p I-sensitive patient tested presents an individual pattern of response to the peptides. Thus, our results suggest that different Der p I sequences could be involved in the immune response to Der p I.

  2. Draft Genome Sequences of Three European Laboratory Derivatives from Enterohemorrhagic Escherichia coli O157:H7 Strain EDL933, Including Two Plasmids

    PubMed Central

    Fellner, Lea; Huptas, Christopher; Simon, Svenja; Mühlig, Anna; Neuhaus, Klaus

    2016-01-01

    Escherichia coli O157:H7 EDL933, isolated in 1982 in the United States, was the first enterohemorrhagic E. coli (EHEC) strain sequenced. Unfortunately, European labs can no longer receive the original strain. We checked three European EDL933 derivatives and found major genetic deviations (deletions, inversions) in two strains. All EDL933 strains contain the cryptic EHEC-plasmid, not reported before. PMID:27056239

  3. Asymmetric histone modifications between the original and derived loci of human segmental duplications

    PubMed Central

    Zheng, Deyou

    2008-01-01

    Background Sequencing and annotation of several mammalian genomes have revealed that segmental duplications are a common architectural feature of primate genomes; in fact, about 5% of the human genome is composed of large blocks of interspersed segmental duplications. These segmental duplications have been implicated in genomic copy-number variation, gene novelty, and various genomic disorders. However, the molecular processes involved in the evolution and regulation of duplicated sequences remain largely unexplored. Results In this study, the profile of about 20 histone modifications within human segmental duplications was characterized using high-resolution, genome-wide data derived from a ChIP-Seq study. The analysis demonstrates that derivative loci of segmental duplications often differ significantly from the original with respect to many histone methylations. Further investigation showed that genes are present three times more frequently in the original than in the derivative, whereas pseudogenes exhibit the opposite trend. These asymmetries tend to increase with the age of segmental duplications. The uneven distribution of genes and pseudogenes does not, however, fully account for the asymmetry in the profile of histone modifications. Conclusion The first systematic analysis of histone modifications between segmental duplications demonstrates that two seemingly 'identical' genomic copies are distinct in their epigenomic properties. Results here suggest that local chromatin environments may be implicated in the discrimination of derived copies of segmental duplications from their originals, leading to a biased pseudogenization of the new duplicates. The data also indicate that further exploration of the interactions between histone modification and sequence degeneration is necessary in order to understand the divergence of duplicated sequences. PMID:18598352

  4. Characterization of a native hammerhead ribozyme derived from schistosomes

    PubMed Central

    OSBORNE, EDITH M.; SCHAAK, JANELL E.; DEROSE, VICTORIA J.

    2005-01-01

    A recent re-examination of the role of the helices surrounding the conserved core of the hammerhead ribozyme has identified putative loop–loop interactions between stems I and II in native hammerhead sequences. These extended hammerhead sequences are more active at low concentrations of divalent cations than are minimal hammerheads. The loop–loop interactions are proposed to stabilize a more active conformation of the conserved core. Here, a kinetic and thermodynamic characterization of an extended hammerhead sequence derived from Schistosoma mansoni is performed. Biphasic kinetics are observed, suggesting the presence of at least two conformers, one cleaving with a fast rate and the other with a slow rate. Replacing loop II with a poly(U) sequence designed to eliminate the interaction between the two loops results in greatly diminished activity, suggesting that the loop–loop interactions do aid in forming a more active conformation. Previous studies with minimal hammerheads have shown deleterious effects of Rp-phosphorothioate substitutions at the cleavage site and 5′ to A9, both of which could be rescued with Cd2+. Here, phosphorothioate modifications at the cleavage site and 5′ to A9 were made in the schistosome-derived sequence. In Mg2+, both phosphorothioate substitutions decreased the overall fraction cleaved without significantly affecting the observed rate of cleavage. The addition of Cd2+ rescued cleavage in both cases, suggesting that these are still putative metal binding sites in this native sequence. PMID:15659358

  5. A case of canine borreliosis in Iran caused by Borrelia persica.

    PubMed

    Shirani, Darush; Rakhshanpoor, Alaleh; Cutler, Sally Jane; Ghazinezhad, Behnaz; Naddaf, Saied Reza

    2016-04-01

    Tick-borne relapsing fever is an endemic disease in Iran, with most cases attributed to infection by Borrelia persica, which is transmitted by Ornithodoros tholozani soft ticks. Here, we report spirochetemia in blood of a puppy residing in Tehran, Iran. The causative species was identified by use of highly discriminative IGS sequencing; the 489 bp IGS sequence obtained in our study showed 99% identity (100% coverage) when compared with B. persica sequences derived from clinical cases or from O. tholozani ticks. Our IGS sequence also showed 99% similarity over 414 bp (85% coverage) with a strain from a domestic dog, and 96% over 328 bp (69% coverage) with a strain from a domestic cat. Pet-keeping in cosmopolitan cities like Tehran has become increasingly popular in recent years. Animals are often transported into the city in cages or cardboard boxes that might also harbor minute tick larvae and/or early stages of the nymphs bringing them into the urban environment. This may pose a threat to household members who buy and keep these puppies and as a result may come into close contact with infected ticks. Copyright © 2016 Elsevier GmbH. All rights reserved.

  6. Interchromosomal recombination in Zea mays.

    PubMed Central

    Hu, W; Timmermans, M C; Messing, J

    1998-01-01

    A new allele of the 27-kD zein locus in maize has been generated by interchromosomal recombination between chromosomes of two different inbred lines. A continuous patch of at least 11,817 bp of inbred W64A, containing the previously characterized Ra allele of the 27-kD zein gene, has been inserted into the genome of A188 by a single crossover. While both junction sequences are conserved, sequences of the two homologs between these junctions differ considerably. W64A contains the 7313-bp-long retrotransposon, Zeon-1. A188 contains a second copy of the 27-kD zein gene and a 2-kb repetitive element. Therefore, recombination results in a 7.3-kb insertion and a 14-kb deletion compared to the original S+A188 allele. If nonpairing sequences are looped out, 206 single base changes, frequently clustered, are present. The structure of this allele may explain how a recently discovered example of somatic recombination occurred in an A188/W64A hybrid. This would indicate that despite these sequence differences, pairing between these alleles could occur early during plant development. Therefore, such a somatically derived chimeric chromosome can also be heritable and give rise to new alleles. PMID:9799274

  7. A Comparison of Three Random Number Generators for Aircraft Dynamic Modeling Applications

    NASA Technical Reports Server (NTRS)

    Grauer, Jared A.

    2017-01-01

    Three random number generators, which produce Gaussian white noise sequences, were compared to assess their suitability in aircraft dynamic modeling applications. The first generator considered was the MATLAB® implementation of the Mersenne-Twister algorithm. The second generator was a website called Random.org, which processes atmospheric noise measured using radios to create the random numbers. The third generator was based on synthesis of the Fourier series, where the random number sequences are constructed from prescribed amplitude and phase spectra. A total of 200 sequences, each having 601 random numbers, for each generator were collected and analyzed in terms of the mean, variance, normality, autocorrelation, and power spectral density. These sequences were then applied to two problems in aircraft dynamic modeling, namely estimating stability and control derivatives from simulated onboard sensor data, and simulating flight in atmospheric turbulence. In general, each random number generator had good performance and is well-suited for aircraft dynamic modeling applications. Specific strengths and weaknesses of each generator are discussed. For Monte Carlo simulation, the Fourier synthesis method is recommended because it most accurately and consistently approximated Gaussian white noise and can be implemented with reasonable computational effort.
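
    The Fourier synthesis approach described in this abstract can be sketched as a sum of harmonics with a prescribed amplitude spectrum and a chosen phase spectrum. The report's actual spectra are not given in this record, so the sketch below assumes a flat amplitude spectrum and uniformly random phases; the function names, the seed handling, and the restriction to an odd record length are all assumptions made for illustration.

```python
import math
import random


def fourier_noise(n_samples=601, seed=0):
    """Sum of equal-amplitude cosines with uniformly random phases (assumed spectra)."""
    if n_samples < 3 or n_samples % 2 == 0:
        raise ValueError("this sketch assumes an odd number of samples >= 3")
    rng = random.Random(seed)
    n_harm = (n_samples - 1) // 2  # harmonics 1 .. (N-1)/2, no DC term
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_harm)]
    amp = math.sqrt(2.0 / n_harm)  # scales the record to unit variance
    return [
        amp * sum(
            math.cos(2.0 * math.pi * k * n / n_samples + phases[k - 1])
            for k in range(1, n_harm + 1)
        )
        for n in range(n_samples)
    ]


def mean(x):
    return sum(x) / len(x)


def variance(x):
    m = mean(x)
    return sum((v - m) ** 2 for v in x) / len(x)


def circular_autocorr(x, lag):
    """Circular autocorrelation at the given lag (no mean removal)."""
    n = len(x)
    return sum(x[i] * x[(i + lag) % n] for i in range(n)) / n


noise = fourier_noise(601, seed=1)
print(mean(noise), variance(noise), circular_autocorr(noise, 5))
```

    With a flat amplitude spectrum the record has zero mean and unit variance by construction, and its circular autocorrelation at every nonzero lag is the same small constant, which is the sense in which the sequence is nearly white. The values are only approximately Gaussian, through the central limit effect of summing many harmonics.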

  8. The phylogeny of yellow fever virus 17D vaccines.

    PubMed

    Stock, Nina K; Boschetti, Nicola; Herzog, Christian; Appelhans, Marc S; Niedrig, Matthias

    2012-02-01

    In recent years the safety of the yellow fever live vaccine 17D has come under scrutiny. The focus was on serious adverse events after vaccination that resemble a wild-type infection with yellow fever and whose causes are still not known. The exact mechanism of attenuation of the vaccine also remains unknown to this day. In this context, the standards of safety and surveillance in vaccine production and administration have been discussed. Therein embodied was the demand for improved documentation of the derivation of the seed virus used for yellow fever vaccine production. So far, only a historical genealogy based on source area and passage level has been available. However, there is a need for documentation based on molecular information to gain better insights into the mechanisms of pathology. In this work we sequenced the whole genome of different passages of the YFV-17D strain used by Crucell Switzerland AG for vaccine production. Using all other publicly available 17D full genome sequences, we compared the sequence variance of all vaccine strains and contrasted a phylogenetic tree based on full genome sequences with the historical genealogy. Copyright © 2011 Elsevier Ltd. All rights reserved.

  9. Complete Sequence and Analysis of Coconut Palm (Cocos nucifera) Mitochondrial Genome.

    PubMed

    Aljohi, Hasan Awad; Liu, Wanfei; Lin, Qiang; Zhao, Yuhui; Zeng, Jingyao; Alamer, Ali; Alanazi, Ibrahim O; Alawad, Abdullah O; Al-Sadi, Abdullah M; Hu, Songnian; Yu, Jun

    2016-01-01

    Coconut (Cocos nucifera L.), a member of the palm family (Arecaceae), is one of the most economically important crops in the tropics, serving as an important source of food, drink, fuel, medicine, and construction material. Here we report an assembly of the coconut (C. nucifera, Oman local Tall cultivar) mitochondrial (mt) genome based on next-generation sequencing data. This genome, 678,653 bp in length and 45.5% in GC content, encodes 72 proteins, 9 pseudogenes, 23 tRNAs, and 3 ribosomal RNAs. Within the assembly, we find that the chloroplast (cp) derived regions account for 5.07% of the total assembly length, including 13 proteins, 2 pseudogenes, and 11 tRNAs. The mt genome has a relatively large fraction of repeat content (17.26%), including both forward (tandem) and inverted (palindromic) repeats. Sequence variation analysis shows that the Ti/Tv ratio of the mt genome is lower than that of the nuclear genome and the neutral expectation. By combining public RNA-Seq data for coconut, we identify 734 RNA editing sites supported by at least two datasets. In summary, our data provide the second complete mt genome sequence in the family Arecaceae, essential for further investigations on mitochondrial biology of seed plants.
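
    Two of the quantities reported in this abstract, GC content and the transition/transversion (Ti/Tv) ratio, are simple counts over sequence data. The sketch below shows one way to compute them, assuming plain strings of aligned nucleotides; the short sequences are hypothetical and the function names are illustrative, not taken from the authors' pipeline.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / total


def ti_tv_counts(ref, alt):
    """Count transitions and transversions between two aligned sequences."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")
    ti = tv = 0
    for r, a in zip(ref.upper(), alt.upper()):
        if r == a or r not in "ACGT" or a not in "ACGT":
            continue  # skip matches, gaps, and ambiguous bases
        if {r, a} <= PURINES or {r, a} <= PYRIMIDINES:
            ti += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1  # purine<->pyrimidine
    return ti, tv


# Hypothetical toy sequences.
print(gc_content("ATGCGC"))
ti, tv = ti_tv_counts("AACCGT", "GACTTT")
print(ti, tv, ti / tv)
```

    The Ti/Tv ratio is the first count divided by the second, so a caller should guard against a transversion count of zero on short alignments.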

  10. Analyses of mitochondrial amino acid sequence datasets support the proposal that specimens of Hypodontus macropi from three species of macropodid hosts represent distinct species

    PubMed Central

    2013-01-01

    Background Hypodontus macropi is a common intestinal nematode of a range of kangaroos and wallabies (macropodid marsupials). Based on previous multilocus enzyme electrophoresis (MEE) and nuclear ribosomal DNA sequence data sets, H. macropi has been proposed to be a complex of species. To test this proposal using independent molecular data, we sequenced the whole mitochondrial (mt) genomes of individuals of H. macropi from three different species of hosts (Macropus robustus robustus, Thylogale billardierii and Macropus [Wallabia] bicolor) as well as that of Macropicola ocydromi (a related nematode), and undertook a comparative analysis of the amino acid sequence datasets derived from these genomes. Results The mt genomes sequenced by next-generation (454) technology from H. macropi from the three host species varied from 13,634 bp to 13,699 bp in size. Pairwise comparisons of the amino acid sequences predicted from these three mt genomes revealed differences of 5.8% to 18%. Phylogenetic analysis of the amino acid sequence data sets using Bayesian Inference (BI) showed that H. macropi from the three different host species formed distinct, well-supported clades. In addition, sliding window analysis of the mt genomes defined variable regions for future population genetic studies of H. macropi in different macropodid hosts and geographical regions around Australia. Conclusions The present analyses of inferred mt protein sequence datasets clearly supported the hypothesis that H. macropi from M. robustus robustus, M. bicolor and T. billardierii represent distinct species. PMID:24261823
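
    The pairwise percentage differences and the sliding window analysis mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming two pre-aligned sequences of equal length with "-" as the gap character and an uncorrected difference measure; the authors' actual software, window size, and step are not given in this record, and the toy peptides are hypothetical.

```python
def percent_difference(a, b, gap="-"):
    """Uncorrected percentage of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # ignore positions where either sequence has a gap
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * diffs / compared


def sliding_window(a, b, window, step):
    """Percentage difference in successive windows along the alignment."""
    return [
        (start, percent_difference(a[start:start + window], b[start:start + window]))
        for start in range(0, len(a) - window + 1, step)
    ]


# Hypothetical toy peptides differing at two of eight positions.
seq_a = "MKTAYIAK"
seq_b = "MKSAYLAK"
print(percent_difference(seq_a, seq_b))
print(sliding_window(seq_a, seq_b, window=4, step=4))
```

    Windows with consistently high values mark the variable regions that such an analysis is meant to flag; a window consisting only of gaps raises an error in this sketch rather than reporting a value.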

  11. Proteolytic processing of the vitellogenin precursor in the boll weevil, Anthonomus grandis.

    PubMed

    Heilmann, L J; Trewitt, P M; Kumaran, A K

    1993-01-01

    The soluble proteins of the eggs of the coleopteran insect Anthonomus grandis Boheman, the cotton boll weevil, consist almost entirely of two vitellin types with M(r)s of 160,000 and 47,000. We sequenced their N-terminal ends and one internal cyanogen bromide fragment of the large vitellin and compared these sequences with the deduced amino acid sequence from the vitellogenin gene. The results suggest that both the boll weevil vitellin proteins are products of the proteolytic cleavage of a single precursor protein. The smaller 47,000 M(r) vitellin protein is derived from the N-terminal portion of the precursor adjacent to an 18 amino acid signal peptide. The cleavage site between the large and small vitellins at amino acid 362 is adjacent to a pentapeptide sequence containing two pairs of arginine residues. Comparison of the boll weevil sequences with limited known sequences from the single 180,000 M(r) honey bee protein shows that the honey bee vitellin N-terminal exhibits sequence homology to the N-terminal of the 47,000 M(r) boll weevil vitellin. Treatment of the vitellins with an N-glycosidase results in a decrease in molecular weight of both proteins, from 47,000 to 39,000 and from 160,000 to 145,000, indicating that about 10-15% of the molecular weight of each vitellin consists of N-linked carbohydrate. The molecular weight of the deglycosylated large vitellin is smaller than that predicted from the gene sequence, indicating possible further proteolytic processing at the C-terminal of that protein.

  12. A comparative study of retrotransposons in the centromeric regions of A and B chromosomes of maize.

    PubMed

    Theuri, J; Phelps-Durr, T; Mathews, S; Birchler, J

    2005-01-01

    Bacterial Artificial Chromosomes (BACs) derived from the B chromosome, based on homology with the B specific sequence, were subcloned and sequenced. Analysis of DNA sequence data indicated the presence of 23 common retroelements, as well as novel sequences of B chromosome origin. Generally, where the same retrotransposon type was observed in both A and B chromosomes, there were more copies per unit of sequence in the B centromeric region (the major site of B repeat) than in the A centromere, except for Huck-1. Based on previous estimates of the age of the major burst of transposition into the maize genome, the oldest retrotransposons (Ji-6 and Tekay, approximately 5.0 and 5.2 million years ago, respectively) were found in the B centromere region only, while the next two oldest (Huck-1 and Opie-1) were found in both the A and B sequences. Phylogenetic analysis of Opie retroelements from both A and B centromeres indicated that some of the B Opie centromeric sequences share a more recent common ancestor with A Opie retroelements than they do with other B Opie centromeric sequences. These results imply that the supernumerary maize B chromosome has coexisted with the A chromosomes during that period of transposition. They also support the hypothesis that the B chromosome had its origins from A chromosome elements, or that alternative origins, such as being donated to the maize genome in a wide species cross, preceded six million years ago, because the spectrum of retrotransposons in the two chromosomes is quite similar.

  13. Bacterial community composition in Brazilian Anthrosols and adjacent soils characterized using culturing and molecular identification.

    PubMed

    O'Neill, B; Grossman, J; Tsai, M T; Gomes, J E; Lehmann, J; Peterson, J; Neves, E; Thies, J E

    2009-07-01

    Microbial community composition was examined in two soil types, Anthrosols and adjacent soils, sampled from three locations in the Brazilian Amazon. The Anthrosols, also known as Amazonian dark earths, are highly fertile soils that are a legacy of pre-Columbian settlement. Both Anthrosols and adjacent soils are derived from the same parent material and subject to the same environmental conditions, including rainfall and temperature; however, the Anthrosols contain high levels of charcoal-like black carbon from which they derive their dark color. The Anthrosols typically have higher cation exchange capacity, higher pH, and higher phosphorus and calcium contents. We used culture media prepared from soil extracts to isolate bacteria unique to the two soil types and then sequenced their 16S rRNA genes to determine their phylogenetic placement. Higher numbers of culturable bacteria, by over two orders of magnitude at the deepest sampling depths, were counted in the Anthrosols. Sequences of bacteria isolated on soil extract media yielded five possible new bacterial families. Also, a higher number of families in the bacteria were represented by isolates from the deeper soil depths in the Anthrosols. Higher bacterial populations and a greater diversity of isolates were found in all of the Anthrosols, to a depth of up to 1 m, compared to adjacent soils located within 50-500 m of their associated Anthrosols. Compared to standard culture media, soil extract media revealed diverse soil microbial populations adapted to the unique biochemistry and physiological ecology of these Anthrosols.

  14. Nucleotide sequences specific to Yersinia pestis and methods for the detection of Yersinia pestis

    DOEpatents

    McCready, Paula M [Tracy, CA; Radnedge, Lyndsay [San Mateo, CA; Andersen, Gary L [Berkeley, CA; Ott, Linda L [Livermore, CA; Slezak, Thomas R [Livermore, CA; Kuczmarski, Thomas A [Livermore, CA; Motin, Vladinir L [League City, TX

    2009-02-24

    Nucleotide sequences specific to Yersinia pestis that serve as markers or signatures for identification of this bacterium were identified. In addition, forward and reverse primers and hybridization probes derived from these nucleotide sequences that are used in nucleotide detection methods to detect the presence of the bacterium are disclosed.

  15. Nucleotide sequences specific to Brucella and methods for the detection of Brucella

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McCready, Paula M; Radnedge, Lyndsay; Andersen, Gary L

    Nucleotide sequences specific to Brucella that serve as markers or signatures for identification of this bacterium were identified. In addition, forward and reverse primers and hybridization probes derived from these nucleotide sequences that are used in nucleotide detection methods to detect the presence of the bacterium are disclosed.

  16. Comparative genomics of Enterococcus faecalis from healthy Norwegian infants

    PubMed Central

    Solheim, Margrete; Aakra, Ågot; Snipen, Lars G; Brede, Dag A; Nes, Ingolf F

    2009-01-01

    Background Enterococcus faecalis, traditionally considered a harmless commensal of the intestinal tract, is now ranked among the leading causes of nosocomial infections. In an attempt to gain insight into the genetic make-up of commensal E. faecalis, we have studied genomic variation in a collection of community-derived E. faecalis isolated from the feces of Norwegian infants. Results The E. faecalis isolates were first sequence typed by multilocus sequence typing (MLST) and characterized with respect to antibiotic resistance and properties associated with virulence. A subset of the isolates was compared to the vancomycin resistant strain E. faecalis V583 (V583) by whole genome microarray comparison (comparative genomic hybridization (CGH)). Several of the putative enterococcal virulence factors were found to be highly prevalent among the commensal baby isolates. The genomic variation as observed by CGH was less between isolates displaying the same MLST sequence type than between isolates belonging to different evolutionary lineages. Conclusion The variations in gene content observed among the investigated commensal E. faecalis are comparable to the genetic variation previously reported among strains of various origins thought to be representative of the major E. faecalis lineages. Previous MLST analyses of E. faecalis have identified so-called high-risk enterococcal clonal complexes (HiRECC), defined as genetically distinct subpopulations, epidemiologically associated with enterococcal infections. The observed correlation between CGH and MLST presented here may offer a method for the identification of lineage-specific genes, and may therefore add clues on how to distinguish pathogenic from commensal E. faecalis. In this work, information on the core genome of E. faecalis is also substantially extended. PMID:19393078

  17. Comparative genomic analysis of four representative plant growth-promoting rhizobacteria in Pseudomonas.

    PubMed

    Shen, Xuemei; Hu, Hongbo; Peng, Huasong; Wang, Wei; Zhang, Xuehong

    2013-04-22

    Some Pseudomonas strains function as predominant plant growth-promoting rhizobacteria (PGPR). Within this group, Pseudomonas chlororaphis and Pseudomonas fluorescens are non-pathogenic biocontrol agents, and some Pseudomonas aeruginosa and Pseudomonas stutzeri strains are PGPR. P. chlororaphis GP72 is a plant growth-promoting rhizobacterium with a fully sequenced genome. We conducted a genomic analysis comparing GP72 with three other pseudomonad PGPR: P. fluorescens Pf-5, P. aeruginosa M18, and the nitrogen-fixing strain P. stutzeri A1501. Our aim was to identify the similarities and differences among these strains using a comparative genomic approach to clarify the mechanisms of plant growth-promoting activity. The genome sizes of GP72, Pf-5, M18, and A1501 ranged from 4.6 to 7.1 M, and the number of protein-coding genes varied among the four species. Clusters of Orthologous Groups (COGs) analysis assigned functions to predicted proteins. The COGs distributions were similar among the four species. However, the percentage of genes encoding transposases and their inactivated derivatives (COG L) was 1.33% of the total genes with COGs classifications in A1501, 0.21% in GP72, 0.02% in Pf-5, and 0.11% in M18. A phylogenetic analysis indicated that GP72 and Pf-5 were the most closely related strains, consistent with the genome alignment results. Comparisons of predicted coding sequences (CDSs) between GP72 and Pf-5 revealed 3544 conserved genes. There were fewer conserved genes when GP72 CDSs were compared with those of A1501 and M18. Comparisons among the four Pseudomonas species revealed 603 conserved genes in GP72, illustrating common plant growth-promoting traits shared among these PGPR. Conserved genes were related to catabolism, transport of plant-derived compounds, stress resistance, and rhizosphere colonization. Some strain-specific CDSs were related to different kinds of biocontrol activities or plant growth promotion. The GP72 genome contained the cus operon (related to heavy metal resistance) and a gene cluster involved in type IV pilus biosynthesis, which confers adhesion ability. Comparative genomic analysis of four representative PGPR revealed some conserved regions, indicating common characteristics (metabolism of plant-derived compounds, heavy metal resistance, and rhizosphere colonization) among these pseudomonad PGPR. Genomic regions specific to each strain provide clues to its lifestyle, ecological adaptation, and physiological role in the rhizosphere.

  18. PMS2 gene mutational analysis: direct cDNA sequencing to circumvent pseudogene interference.

    PubMed

    Wimmer, Katharina; Wernstedt, Annekatrin

    2014-01-01

    The presence of highly homologous pseudocopies can compromise the mutation analysis of a gene of interest. In particular, when using PCR-based strategies, pseudogene co-amplification has to be effectively prevented. This is often achieved by using primers designed to be parental gene specific according to the reference sequence and by applying stringent PCR conditions. However, there are cases in which this approach is of limited utility. For example, it has been shown that the PMS2 gene exchanges sequences with one of its pseudogenes, named PMS2CL. This results in functional PMS2 alleles containing pseudogene-derived sequences at their 3'-end and in nonfunctional PMS2CL pseudogene alleles that contain gene-derived sequences. Hence, the paralogues cannot be distinguished according to the reference sequence. This shortcoming can be effectively circumvented by using direct cDNA sequencing. This approach is based on the selective amplification of PMS2 transcripts in two overlapping 1.6-kb RT-PCR products. In addition to avoiding pseudogene co-amplification and allele dropout, this method also has the advantage of effectively identifying deletions, splice mutations, and de novo retrotransposon insertions that escape detection by most DNA-based mutation analysis protocols.

  19. Arrested neural and advanced mesenchymal differentiation of glioblastoma cells-comparative study with neural progenitors

    PubMed Central

    2009-01-01

    Background Although features of variable differentiation in glioblastoma cell cultures have been reported, a comparative analysis of differentiation properties of normal neural GFAP positive progenitors, and those shown by glioblastoma cells, has not been performed. Methods The following methods were used to compare glioblastoma cells and GFAP+NNP (NHA): exposure to neural differentiation medium, exposure to adipogenic and osteogenic medium, western blot analysis, immunocytochemistry, single cell assay, and BrdU incorporation assay. To characterize glioblastoma cells, EGFR amplification analysis, LOH/MSI analysis, and P53 nucleotide sequence analysis were performed. Results In vitro differentiation of cancer cells derived from eight glioblastomas was compared with GFAP-positive normal neural progenitors (GFAP+NNP). Prior to exposure to differentiation medium, both types of cells showed a similar multilineage phenotype (CD44+/MAP2+/GFAP+/Vimentin+/Beta III-tubulin+/Fibronectin+) and were positive for SOX-2 and Nestin. In contrast to GFAP+NNP, an efficient differentiation arrest was observed in all cell lines isolated from glioblastomas. Nevertheless, a subpopulation of cells isolated from four glioblastomas differentiated after serum-starvation with varying efficiency into derivatives indistinguishable from the neural derivatives of GFAP+NNP. Moreover, the cells derived from a majority of glioblastomas (7 out of 8), as well as GFAP+NNP, showed features of mesenchymal differentiation when exposed to medium with serum. Conclusion Our results showed that stable co-expression of multilineage markers by glioblastoma cells resulted from differentiation arrest. According to our data, up to 95% of glioblastoma cells can present a multilineage phenotype in vitro. The mesenchymal differentiation of glioblastoma cells is advanced and similar to mesenchymal differentiation of normal neural progenitors GFAP+NNP. PMID:19216795

  20. Diastereoselective synthesis and molecular docking studies of novel fused tetrahydropyridine derivatives as new inhibitors of HIV protease

    NASA Astrophysics Data System (ADS)

    Mohammadi, Ali A.; Taheri, Salman; Amouzegar, Ali; Ahdenov, Reza; Halvagar, Mohammad Reza; Sadr, Ahmad Shahir

    2017-07-01

    An efficient one-pot, catalyst-free, four-component procedure for the synthesis of novel 10b-hydroxy-4-nitro-5-phenyl-2,3,5,5a-tetrahydro-1H-imidazo[1,2-a]indeno[2,1-e]pyridin-6(10bH)-one derivatives from the corresponding diamine, nitro ketene dithioacetal, aldehydes and 1,3-indandione in ethanol has been achieved via a Knoevenagel condensation-Michael addition-tautomerism-cyclisation sequence. All the newly synthesized compounds were screened by molecular docking, which was carried out using the crystal structure of the HIV protease enzyme. Some of the compounds attain minimum binding energy and good affinity toward the active pocket of the HIV protease enzyme in comparison with Saquinavir, a standard HIV protease inhibitor.

  1. Carbazole ligands as c-myc G-quadruplex binders.

    PubMed

    Głuszyńska, Agata; Juskowiak, Bernard; Kuta-Siejkowska, Martyna; Hoffmann, Marcin; Haider, Shozeb

    2018-07-15

    The interactions of c-myc G-quadruplex with three carbazole derivatives were investigated by UV-Vis spectrophotometry, fluorescence, CD spectroscopy, and molecular modeling. The results showed that a combination of carbazole scaffold functionalized with ethyl, triazole and imidazole groups resulted in stabilization of the intramolecular G-quadruplex formed by the DNA sequence derived from the NHE III 1 region of c-myc oncogene (Pu22). Binding to the G-quadruplex Pu22 resulted in the significant increase in fluorescence intensity of complexed ligands 1-3. All ligands were capable of interacting with G4 DNA with binding stoichiometry indicating that two ligand molecules bind to G-quadruplex with comparable affinity, which agrees with binding model of end-stacking on terminal G-tetrads.

  2. A hierarchical approach to reliability modeling of fault-tolerant systems. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Gossman, W. E.

    1986-01-01

    A methodology for performing fault-tolerant system reliability analysis is presented. The method decomposes a system into its subsystems, evaluates event rates derived from the subsystem's conditional state probability vector, and incorporates those results into a hierarchical Markov model of the system. This is done in a manner that addresses failure sequence dependence associated with the system's redundancy management strategy. The method is derived for application to a specific system definition. Results are presented that compare the hierarchical model's unreliability prediction to that of a more complicated standard Markov model of the system. The results for the example given indicate that the hierarchical method predicts system unreliability to a desirable level of accuracy while achieving significant computational savings relative to a component-level Markov model of the system.

  3. Single cell RNA sequencing of stem cell-derived retinal ganglion cells.

    PubMed

    Daniszewski, Maciej; Senabouth, Anne; Nguyen, Quan H; Crombie, Duncan E; Lukowski, Samuel W; Kulkarni, Tejal; Sluch, Valentin M; Jabbari, Jafar S; Chamling, Xitiz; Zack, Donald J; Pébay, Alice; Powell, Joseph E; Hewitt, Alex W

    2018-02-13

    We used single cell sequencing technology to characterize the transcriptomes of 1,174 human embryonic stem cell-derived retinal ganglion cells (RGCs) at the single cell level. The human embryonic stem cell line BRN3B-mCherry (A81-H7) was differentiated to RGCs using a guided differentiation approach. Cells were harvested at day 36 and prepared for single cell RNA sequencing. Our data indicate the presence of three distinct subpopulations of cells, with various degrees of maturity. One cluster of 288 cells showed increased expression of genes involved in axon guidance together with semaphorin interactions, cell-extracellular matrix interactions and ECM proteoglycans, suggestive of a more mature RGC phenotype.

  4. Impact of exogenous sequences on the characteristics of an epidemic type 2 recombinant vaccine-derived poliovirus.

    PubMed

    Riquet, Franck B; Blanchard, Claire; Jegouic, Sophie; Balanant, Jean; Guillot, Sophie; Vibet, Marie-Anne; Rakoto-Andrianarivelo, Mala; Delpeyroux, Francis

    2008-09-01

    Pathogenic circulating vaccine-derived polioviruses (cVDPVs) have become a major obstacle to the successful completion of the global polio eradication program. Most cVDPVs are recombinant between the oral poliovirus vaccine (OPV) and human enterovirus species C (HEV-C). To study the role of HEV-C sequences in the phenotype of cVDPVs, we generated a series of recombinants between a Madagascar cVDPV isolate and its parental OPV type 2 strain. Results indicated that the HEV-C sequences present in this cVDPV contribute to its characteristics, including pathogenicity, suggesting that interspecific recombination contributes to the phenotypic biodiversity of polioviruses and may favor the emergence of cVDPVs.

  5. HLA genotyping by next-generation sequencing of complementary DNA.

    PubMed

    Segawa, Hidenobu; Kukita, Yoji; Kato, Kikuya

    2017-11-28

    Genotyping of the human leucocyte antigen (HLA) is indispensable for various medical treatments. However, unambiguous genotyping is technically challenging due to high polymorphism of the corresponding genomic region. Next-generation sequencing is changing the landscape of genotyping. In addition to high throughput of data, its additional advantage is that DNA templates are derived from single molecules, which is a strong merit for the phasing problem. Although most currently developed technologies use genomic DNA, use of cDNA could enable genotyping with reduced costs in data production and analysis. We thus developed an HLA genotyping system based on next-generation sequencing of cDNA. Each HLA gene was divided into 3 or 4 target regions subjected to PCR amplification and subsequent sequencing with Ion Torrent PGM. The sequence data were then subjected to an automated analysis. The principle of the analysis was to construct candidate sequences generated from all possible combinations of variable bases and arrange them in decreasing order of the number of reads. Upon collecting candidate sequences from all target regions, 2 haplotypes were usually assigned. Cases not assigned 2 haplotypes were forwarded to 4 additional processes: selection of candidate sequences applying more stringent criteria, removal of artificial haplotypes, selection of candidate sequences with a relaxed threshold for sequence matching, and countermeasure for incomplete sequences in the HLA database. The genotyping system was evaluated using 30 samples; the overall accuracy was 97.0% at the field 3 level and 98.3% at the G group level. With one sample, genotyping of DPB1 was not completed due to short read size. We then developed a method for complete sequencing of individual molecules of the DPB1 gene, using the molecular barcode technology. The performance of the automatic genotyping system was comparable to that of systems developed in previous studies. 
    Thus, next-generation sequencing of cDNA is a viable option for HLA genotyping.
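
    The candidate-ranking principle described in this abstract (build candidate sequences from all combinations of variable bases, then order them by read count) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name rank_candidates, the assumption that reads are pre-aligned strings of equal length, and the alphabetical tie-break are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import product


def rank_candidates(reads):
    """Rank candidate sequences for one target region by supporting reads.

    Assumption (not from the source): `reads` is a list of pre-aligned
    strings of equal length covering the same target region.
    """
    if not reads:
        return []
    length = len(reads[0])
    # Bases observed at each aligned position.
    observed = [sorted({r[i] for r in reads}) for i in range(length)]
    # Positions showing more than one base are treated as "variable bases".
    variable = [i for i, bases in enumerate(observed) if len(bases) > 1]
    # Count reads by the bases they carry at the variable positions.
    support = Counter(tuple(r[i] for i in variable) for r in reads)
    consensus = [bases[0] for bases in observed]
    candidates = []
    # Enumerate every combination of variable bases, observed or not.
    for combo in product(*(observed[i] for i in variable)):
        seq = consensus.copy()
        for pos, base in zip(variable, combo):
            seq[pos] = base
        candidates.append(("".join(seq), support.get(combo, 0)))
    # Decreasing read count; ties broken alphabetically (hypothetical choice).
    return sorted(candidates, key=lambda c: (-c[1], c[0]))


if __name__ == "__main__":
    toy_reads = ["ACGT"] * 5 + ["ATGA"] * 3 + ["ACGA"]
    for sequence, count in rank_candidates(toy_reads):
        print(sequence, count)
```

    In this toy input the two best-supported candidates would correspond to the two haplotypes that the abstract says are usually assigned; the additional filtering steps it lists are not modelled here.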

  6. RNABindRPlus: a predictor that combines machine learning and sequence homology-based methods to improve the reliability of predicted RNA-binding residues in proteins.

    PubMed

    Walia, Rasna R; Xue, Li C; Wilkins, Katherine; El-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant

    2014-01-01

    Protein-RNA interactions are central to essential cellular processes such as protein synthesis and regulation of gene expression and play roles in human infectious and genetic diseases. Reliable identification of protein-RNA interfaces is critical for understanding the structural bases and functional implications of such interactions and for developing effective approaches to rational drug design. Sequence-based computational methods offer a viable, cost-effective way to identify putative RNA-binding residues in RNA-binding proteins. Here we report two novel approaches: (i) HomPRIP, a sequence homology-based method for predicting RNA-binding sites in proteins; (ii) RNABindRPlus, a new method that combines predictions from HomPRIP with those from an optimized Support Vector Machine (SVM) classifier trained on a benchmark dataset of 198 RNA-binding proteins. Although highly reliable, HomPRIP cannot make predictions for the unaligned parts of query proteins and its coverage is limited by the availability of close sequence homologs of the query protein with experimentally determined RNA-binding sites. RNABindRPlus overcomes these limitations. We compared the performance of HomPRIP and RNABindRPlus with that of several state-of-the-art predictors on two test sets, RB44 and RB111. On a subset of proteins for which homologs with experimentally determined interfaces could be reliably identified, HomPRIP outperformed all other methods achieving an MCC of 0.63 on RB44 and 0.83 on RB111. RNABindRPlus was able to predict RNA-binding residues of all proteins in both test sets, achieving an MCC of 0.55 and 0.37, respectively, and outperforming all other methods, including those that make use of structure-derived features of proteins. More importantly, RNABindRPlus outperforms all other methods for any choice of tradeoff between precision and recall. 
    An important advantage of both HomPRIP and RNABindRPlus is that they rely on readily available sequence and sequence-derived features of RNA-binding proteins. A webserver implementation of both methods is freely available at http://einstein.cs.iastate.edu/RNABindRPlus/.
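
    The abstract reports performance as MCC (Matthews correlation coefficient). For readers unfamiliar with the measure, the sketch below computes it from per-residue binary labels using the standard definition. The helper names and the convention of returning 0.0 when the denominator vanishes are assumptions for illustration, not details taken from the paper.

```python
import math


def confusion_counts(predicted, actual):
    """Return (tp, tn, fp, fn) for two equal-length sequences of 0/1 labels."""
    tp = tn = fp = fn = 0
    for p, a in zip(predicted, actual):
        if p and a:
            tp += 1          # predicted binding, truly binding
        elif not p and not a:
            tn += 1          # predicted non-binding, truly non-binding
        elif p:
            fp += 1          # predicted binding, truly non-binding
        else:
            fn += 1          # predicted non-binding, truly binding
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in the range [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the table is empty
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    predicted = [1, 1, 0, 0, 1, 0]
    actual = [1, 0, 0, 0, 1, 1]
    print(mcc(*confusion_counts(predicted, actual)))
```

    Because MCC uses all four cells of the confusion table, it remains informative when binding residues are much rarer than non-binding ones, which is typical for interface prediction.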

  7. Childhood maternal care is associated with DNA methylation of the genes for brain-derived neurotrophic factor (BDNF) and oxytocin receptor (OXTR) in peripheral blood cells in adult men and women.

    PubMed

    Unternaehrer, Eva; Meyer, Andrea Hans; Burkhardt, Susan C A; Dempster, Emma; Staehli, Simon; Theill, Nathan; Lieb, Roselind; Meinlschmidt, Gunther

    2015-01-01

    In adults reporting low or high maternal care in childhood, we compared DNA methylation in two stress-associated genes (two target sequences in the oxytocin receptor gene, OXTR; one in the brain-derived neurotrophic factor gene, BDNF) in peripheral whole blood, in a cross-sectional study (University of Basel, Switzerland) during 2007-2008. We recruited 89 participants scoring < 27 (n = 47, 36 women) or > 33 (n = 42, 35 women) on the maternal care subscale of the Parental Bonding Instrument (PBI) at a previous assessment of a larger group (N = 709, range PBI maternal care = 0-36, age range = 19-66 years; median 24 years). 85 participants gave blood for DNA methylation analyses (Sequenom® EpiTYPER, San Diego, CA) and cell count (Sysmex PocH-100i™, Kobe, Japan). Mixed model statistical analysis showed greater DNA methylation in the low versus high maternal care group, in the BDNF target sequence [Likelihood-Ratio (1) = 4.47; p = 0.035] and in one OXTR target sequence [Likelihood-Ratio (1) = 4.33; p = 0.037], but not the second OXTR target sequence [Likelihood-Ratio (1) < 0.001; p = 0.995]. Mediation analyses indicated that differential blood cell count did not explain associations between low maternal care and BDNF (estimate = -0.005, 95% CI = -0.025 to 0.015; p = 0.626) or OXTR DNA methylation (estimate = -0.015, 95% CI = -0.038 to 0.008; p = 0.192). Hence, low maternal care in childhood was associated with greater DNA methylation in an OXTR and a BDNF target sequence in blood cells in adulthood. Although the study has limitations (cross-sectional, a wide age range, only three target sequences in two genes studied, small effects, uncertain relevance of changes in blood cells to gene methylation in brain), the findings may indicate components of the epiphenotype from early life stress.

  8. Analysis and functional annotation of expressed sequence tags from the fall armyworm Spodoptera frugiperda

    PubMed Central

    Deng, Youping; Dong, Yinghua; Thodima, Venkata; Clem, Rollie J; Passarelli, A Lorena

    2006-01-01

    Background: Little is known about the genome sequences of lepidopteran insects, although this group of insects has been studied extensively in the fields of endocrinology, development, immunity, and pathogen-host interactions. In addition, cell lines derived from Spodoptera frugiperda and other lepidopteran insects are routinely used for baculovirus foreign gene expression. This study reports the results of an expressed sequence tag (EST) sequencing project in cells from the lepidopteran insect S. frugiperda, the fall armyworm. Results: We have constructed an EST database using two cDNA libraries from the S. frugiperda-derived cell line, SF-21. The database consists of 2,367 ESTs which were assembled into 244 contigs and 951 singlets for a total of 1,195 unique sequences. Conclusion: S. frugiperda is an agriculturally important pest insect and genomic information will be instrumental for establishing initial transcriptional profiling and gene function studies, and for obtaining information about genes manipulated during infections by insect pathogens such as baculoviruses. PMID:17052344

  9. Intragenic sequences in the trophectoderm harbour the greatest proportion of methylation errors in day 17 bovine conceptuses generated using assisted reproductive technologies.

    PubMed

    O'Doherty, Alan M; McGettigan, Paul; Irwin, Rachelle E; Magee, David A; Gagne, Dominic; Fournier, Eric; Al-Naib, Abdullah; Sirard, Marc-André; Walsh, Colum P; Robert, Claude; Fair, Trudee

    2018-06-05

    Assisted reproductive technologies (ART) are widely used to treat fertility issues in humans and for the production of embryos in mammalian livestock. The use of these techniques, however, is not without consequence as they are often associated with inauspicious pre- and postnatal outcomes including premature birth, intrauterine growth restriction and increased incidence of epigenetic disorders in human and large offspring syndrome in cattle. Here, global DNA methylation profiles in the trophectoderm and embryonic discs of in vitro produced (IVP), superovulation-derived (SOV) and unstimulated, synchronised control day 17 bovine conceptuses (herein referred to as AI) were interrogated using the EmbryoGENE DNA Methylation Array (EDMA). Pyrosequencing was used to validate four loci identified as differentially methylated on the array and to assess the differentially methylated regions (DMRs) of six imprinted genes in these conceptuses. The impact of embryo-production induced DNA methylation aberrations was determined using Ingenuity Pathway Analysis, shedding light on the potential functional consequences of these differences. Of the total number of differentially methylated loci identified (3140) 77.3 and 22.7% were attributable to SOV and IVP, respectively. Differential methylation was most prominent at intragenic sequences within the trophectoderm of IVP and SOV-derived conceptuses, almost a third (30.8%) of the differentially methylated loci mapped to intragenic regions. Very few differentially methylated loci were detected in embryonic discs (ED); 0.16 and 4.9% of the differentially methylated loci were located in the ED of SOV-derived and IVP conceptuses, respectively. The overall effects of SOV and IVP on the direction of methylation changes were associated with increased methylation; 70.6% of the differentially methylated loci in SOV-derived conceptuses and 57.9% of the loci in IVP-derived conceptuses were more methylated compared to AI-conceptuses. 
    Ontology analysis of probes associated with intragenic sequences suggests enrichment for terms associated with cancer, cell morphology and growth. By examining (1) the effects of superovulation and (2) the effects of an in vitro system (oocyte maturation, fertilisation and embryo culture) we have identified that the assisted reproduction process of superovulation alone has the largest impact on the DNA methylome of subsequent embryos.

  10. Phylogenomics of Brazilian epidemic isolates of Mycobacterium abscessus subsp. bolletii reveals relationships of global outbreak strains

    PubMed Central

    Davidson, Rebecca M.; Hasan, Nabeeh A.; de Moura, Vinicius Calado Nogueira; Duarte, Rafael Silva; Jackson, Mary; Strong, Michael

    2013-01-01

    Rapidly growing, non-tuberculous mycobacteria (NTM) in the Mycobacterium abscessus (MAB) species are emerging pathogens that cause various diseases including skin and respiratory infections. The species has undergone recent taxonomic nomenclature refinement, and is currently recognized as two subspecies, M. abscessus subsp. abscessus (MAB-A) and M. abscessus subsp. bolletii (MAB-B). The recently reported outbreaks of MAB-B in surgical patients in Brazil from 2004 to 2009 and in cystic fibrosis patients in the United Kingdom (UK) in 2006 to 2012 underscore the need to investigate the genetic diversity of clinical MAB strains. To this end, we sequenced the genomes of two Brazilian MAB-B epidemic isolates (CRM-0019 and CRM-0020) derived from an outbreak of skin infections in Rio de Janeiro, two unrelated MAB strains from patients with pulmonary infections in the United States (US) (NJH8 and NJH11) and one type MAB-B strain (CCUG 48898) and compared them to 25 publicly available genomes of globally diverse MAB strains. Genome-wide analyses of 27,598 core genome single nucleotide polymorphisms (SNPs) revealed that the two Brazilian-derived CRM strains are nearly indistinguishable from one another and are more closely related to UK outbreak isolates infecting CF patients than to strains from the US, Malaysia or France. Comparative genomic analyses of six closely related outbreak strains revealed geographic-specific large-scale insertion/deletion variation that corresponds to bacteriophage insertions and recombination hotspots. Our study integrates new genome sequence data with existing genomic information to explore the global diversity of infectious M. abscessus isolates and to compare clinically relevant outbreak strains from different continents. PMID:24055961

  11. Correlation of RNA secondary structure statistics with thermodynamic stability and applications to folding.

    PubMed

    Wu, Johnny C; Gardner, David P; Ozer, Stuart; Gutell, Robin R; Ren, Pengyu

    2009-08-28

    The accurate prediction of the secondary and tertiary structure of an RNA with different folding algorithms is dependent on several factors, including the energy functions. However, an RNA higher-order structure cannot be predicted accurately from its sequence based on a limited set of energy parameters. The inter- and intramolecular forces between this RNA and other small molecules and macromolecules, in addition to other factors in the cell such as pH, ionic strength, and temperature, influence the complex dynamics associated with transition of a single stranded RNA to its secondary and tertiary structure. Since all of the factors that affect the formation of an RNA's 3D structure cannot be determined experimentally, statistically derived potential energy has been used in the prediction of protein structure. In the current work, we evaluate the statistical free energy of various secondary structure motifs, including base-pair stacks, hairpin loops, and internal loops, using their statistical frequency obtained from the comparative analysis of more than 50,000 RNA sequences stored in the RNA Comparative Analysis Database (rCAD) at the Comparative RNA Web (CRW) Site. Statistical energy was computed from the structural statistics for several datasets. While the statistical energy for a base-pair stack correlates with experimentally derived free energy values, suggesting a Boltzmann-like distribution, variation is observed between different molecules and their location on the phylogenetic tree of life. Our statistical energy values calculated for several structural elements were utilized in the Mfold RNA-folding algorithm. The combined statistical energy values for base-pair stacks, hairpins and internal loop flanks result in a significant improvement in the accuracy of secondary structure prediction; the hairpin flanks contribute the most.
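
    The abstract does not give the formula used to turn motif frequencies into statistical energies. A common way to do this under a Boltzmann-like assumption is the inverse Boltzmann relation, E = -RT ln(p_observed / p_reference), sketched below. The choice of reference distribution, the temperature of 310.15 K, the function name, and the toy counts are all assumptions for illustration and may differ from what the authors did.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def statistical_energy(counts, reference, temperature=310.15):
    """Inverse-Boltzmann energies (kcal/mol) for each motif.

    Assumptions (not from the source): `counts` and `reference` map motif
    names to observed and reference (background) counts; motifs with a zero
    or missing count on either side are skipped rather than smoothed.
    """
    total_obs = sum(counts.values())
    total_ref = sum(reference.values())
    energies = {}
    for motif, n_obs in counts.items():
        n_ref = reference.get(motif, 0)
        if n_obs <= 0 or n_ref <= 0:
            continue  # log of zero is undefined
        p_obs = n_obs / total_obs
        p_ref = n_ref / total_ref
        # Motifs seen more often than expected get negative (favourable) energy.
        energies[motif] = -R_KCAL * temperature * math.log(p_obs / p_ref)
    return energies


if __name__ == "__main__":
    # Hypothetical counts for two base-pair stacks, with a uniform reference.
    observed = {"GC/CG": 30, "AU/UA": 10}
    uniform = {"GC/CG": 1, "AU/UA": 1}
    for motif, energy in statistical_energy(observed, uniform).items():
        print(motif, round(energy, 3))
```

    Energies obtained this way could then be compared against experimentally derived free energies, which is the kind of correlation the abstract describes for base-pair stacks.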

  12. Cloning and expression of L-asparaginase gene in Escherichia coli.

    PubMed

    Wang, Y; Qian, S; Meng, G; Zhang, S

    2001-08-01

    The L-asparaginase (ASN) gene from Escherichia coli AS1.357 was cloned as a DNA fragment generated using polymerase chain reaction technology and primers derived from conserved regions of published ASN gene sequences. Recombinant plasmid pASN, containing the ASN gene and expression vector pBV220, was transformed into different E. coli host strains. The activity and expression level of ASN in the engineered strains could reach 228 IU/mL of culture fluid and about 50% of the total soluble cell protein, respectively, more than 40-fold the enzyme activity of the wild strain. The recombinant plasmid in E. coli AS1.357 remained stable after 72 h of cultivation and 5 h of heat induction without selective pressure. The ASN gene of E. coli AS1.357 was sequenced and had high homology compared to the reported data.

  13. Provenance implications of Th-U-Pb electron microprobe ages from detrital monazite in the Carboniferous Upper Silesia Coal Basin, Poland

    NASA Astrophysics Data System (ADS)

    Kusiak, Monika Agnieszka; Kędzior, Artur; Paszkowski, Mariusz; Suzuki, Kazuhiro; González-Álvarez, Ignacio; Wajsprych, Bolesław; Doktor, Marek

    2006-05-01

    This paper reports the results of CHIME (chemical Th-U-Pb isochron method) dating of detrital monazites from Carboniferous sandstones in the Upper Silesia Coal Basin (USCB). A total of 4739 spots on 863 monazite grains were analyzed from samples of sandstone derived from six stratigraphic units in the sedimentary sequence. Age distributions were identified in detrital monazites from the USCB sequence and correlated with specific dated domains in potential source areas. Most monazites in all samples yielded ca. 300-320 Ma (Variscan) ages; however, eo-Variscan, Caledonian and Cadomian ages were also obtained. The predominant ages are comparable to reported ages of certain tectonostratigraphic domains in the polyorogenic Bohemian Massif (BM), which suggests that various crystalline lithologies in the BM were the dominant sources of USCB sediments.

  14. Insertion sequence diversity in archaea.

    PubMed

    Filée, J; Siguier, P; Chandler, M

    2007-03-01

    Insertion sequences (ISs) can constitute an important component of prokaryotic (bacterial and archaeal) genomes. Over 1,500 individual ISs are included at present in the ISfinder database (www-is.biotoul.fr), and these represent only a small portion of those in the available prokaryotic genome sequences and those that are being discovered in ongoing sequencing projects. In spite of this diversity, the transposition mechanisms of only a few of these ubiquitous mobile genetic elements are known, and these are all restricted to those present in bacteria. This review presents an overview of ISs within the archaeal kingdom. We first provide a general historical summary of the known properties and behaviors of archaeal ISs. We then consider how transposition might be regulated in some cases by small antisense RNAs and by termination codon readthrough. This is followed by an extensive analysis of the IS content in the sequenced archaeal genomes present in the public databases as of June 2006, which provides an overview of their distribution among the major archaeal classes and species. We show that the diversity of archaeal ISs is very great and comparable to that of bacteria. We compare archaeal ISs to known bacterial ISs and find that most are clearly members of families first described for bacteria. Several cases of lateral gene transfer between bacteria and archaea are clearly documented, notably for methanogenic archaea. However, several archaeal ISs do not have bacterial equivalents but can be grouped into Archaea-specific groups or families. In addition to ISs, we identify and list nonautonomous IS-derived elements, such as miniature inverted-repeat transposable elements. Finally, we present a possible scenario for the evolutionary history of ISs in the Archaea.

  15. A search for debris disks in the Herschel-ATLAS

    NASA Astrophysics Data System (ADS)

    Thompson, M. A.; Smith, D. J. B.; Stevens, J. A.; Jarvis, M. J.; Vidal Perez, E.; Marshall, J.; Dunne, L.; Eales, S.; White, G. J.; Leeuw, L.; Sibthorpe, B.; Baes, M.; González-Solares, E.; Scott, D.; Vieiria, J.; Amblard, A.; Auld, R.; Bonfield, D. G.; Burgarella, D.; Buttiglione, S.; Cava, A.; Clements, D. L.; Cooray, A.; Dariush, A.; de Zotti, G.; Dye, S.; Eales, S.; Frayer, D.; Fritz, J.; Gonzalez-Nuevo, J.; Herranz, D.; Ibar, E.; Ivison, R. J.; Lagache, G.; Lopez-Caniego, M.; Maddox, S.; Negrello, M.; Pascale, E.; Pohlen, M.; Rigby, E.; Rodighiero, G.; Samui, S.; Serjeant, S.; Temi, P.; Valtchanov, I.; Verma, A.

    2010-07-01

    Aims: We aim to demonstrate that the Herschel-ATLAS (H-ATLAS) is suitable for a blind and unbiased survey for debris disks by identifying candidate debris disks associated with main sequence stars in the initial science demonstration field of the survey. We show that H-ATLAS reveals a population of far-infrared/sub-mm sources that are associated with stars or star-like objects on the SDSS main-sequence locus. We validate our approach by comparing the properties of the most likely candidate disks to those of the known population. Methods: We use a photometric selection technique to identify main sequence stars in the SDSS DR7 catalogue and a Bayesian Likelihood Ratio method to identify H-ATLAS catalogue sources associated with these main sequence stars. Following this photometric selection we apply distance cuts to identify the most likely candidate debris disks and rule out the presence of contaminating galaxies using UKIDSS LAS K-band images. Results: We identify 78 H-ATLAS sources associated with SDSS point sources on the main-sequence locus, of which two are the most likely debris disk candidates: H-ATLAS J090315.8 and H-ATLAS J090240.2. We show that they are plausible candidates by comparing their properties to the known population of debris disks. Our initial results indicate that bright debris disks are rare, with only 2 candidates identified in a search sample of 851 stars. We also show that H-ATLAS can derive useful upper limits for debris disks associated with Hipparcos stars in the field and outline the future prospects for our debris disk search programme. Herschel is an ESA space observatory with science instruments provided by European-led Principal Investigator consortia and with important participation from NASA.

  16. Diversity analysis in Cannabis sativa based on large-scale development of expressed sequence tag-derived simple sequence repeat markers.

    PubMed

    Gao, Chunsheng; Xin, Pengfei; Cheng, Chaohua; Tang, Qing; Chen, Ping; Wang, Changbiao; Zang, Gonggu; Zhao, Lining

    2014-01-01

    Cannabis sativa L. is an important economic plant for the production of food, fiber, oils, and intoxicants. However, lack of sufficient simple sequence repeat (SSR) markers has limited the development of cannabis genetic research. Here, large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed to obtain more informative genetic markers, and to assess genetic diversity in cannabis (Cannabis sativa L.). Based on the cannabis transcriptome, 4,577 SSRs were identified from 3,624 ESTs. From there, a total of 3,442 complementary primer pairs were designed as SSR markers. Among these markers, trinucleotide repeat motifs (50.99%) were the most abundant, followed by hexanucleotide (25.13%), dinucleotide (16.34%), tetranucleotide (3.8%), and pentanucleotide (3.74%) repeat motifs, respectively. The AAG/CTT trinucleotide repeat (17.96%) was the most abundant motif detected in the SSRs. One hundred and seventeen EST-SSR markers were randomly selected to evaluate primer quality in 24 cannabis varieties. Among these 117 markers, 108 (92.31%) were successfully amplified and 87 (74.36%) were polymorphic. Forty-five polymorphic primer pairs were selected to evaluate genetic diversity and relatedness among the 115 cannabis genotypes. The results showed that 115 varieties could be divided into 4 groups primarily based on geography: Northern China, Europe, Central China, and Southern China. Moreover, the coefficient of similarity when comparing cannabis from Northern China with the European group cannabis was higher than that when comparing with cannabis from the other two groups, owing to a similar climate. This study outlines the first large-scale development of SSR markers for cannabis. These data may serve as a foundation for the development of genetic linkage, quantitative trait loci mapping, and marker-assisted breeding of cannabis.

  17. Diversity Analysis in Cannabis sativa Based on Large-Scale Development of Expressed Sequence Tag-Derived Simple Sequence Repeat Markers

    PubMed Central

    Cheng, Chaohua; Tang, Qing; Chen, Ping; Wang, Changbiao; Zang, Gonggu; Zhao, Lining

    2014-01-01

    Cannabis sativa L. is an important economic plant for the production of food, fiber, oils, and intoxicants. However, lack of sufficient simple sequence repeat (SSR) markers has limited the development of cannabis genetic research. Here, large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed to obtain more informative genetic markers, and to assess genetic diversity in cannabis (Cannabis sativa L.). Based on the cannabis transcriptome, 4,577 SSRs were identified from 3,624 ESTs. From there, a total of 3,442 complementary primer pairs were designed as SSR markers. Among these markers, trinucleotide repeat motifs (50.99%) were the most abundant, followed by hexanucleotide (25.13%), dinucleotide (16.34%), tetranucleotide (3.8%), and pentanucleotide (3.74%) repeat motifs, respectively. The AAG/CTT trinucleotide repeat (17.96%) was the most abundant motif detected in the SSRs. One hundred and seventeen EST-SSR markers were randomly selected to evaluate primer quality in 24 cannabis varieties. Among these 117 markers, 108 (92.31%) were successfully amplified and 87 (74.36%) were polymorphic. Forty-five polymorphic primer pairs were selected to evaluate genetic diversity and relatedness among the 115 cannabis genotypes. The results showed that 115 varieties could be divided into 4 groups primarily based on geography: Northern China, Europe, Central China, and Southern China. Moreover, the coefficient of similarity when comparing cannabis from Northern China with the European group cannabis was higher than that when comparing with cannabis from the other two groups, owing to a similar climate. This study outlines the first large-scale development of SSR markers for cannabis. These data may serve as a foundation for the development of genetic linkage, quantitative trait loci mapping, and marker-assisted breeding of cannabis. PMID:25329551

  18. Comparative Genomics Analysis of Streptomyces Species Reveals Their Adaptation to the Marine Environment and Their Diversity at the Genomic Level

    PubMed Central

    Tian, Xinpeng; Zhang, Zhewen; Yang, Tingting; Chen, Meili; Li, Jie; Chen, Fei; Yang, Jin; Li, Wenjie; Zhang, Bing; Zhang, Zhang; Wu, Jiayan; Zhang, Changsheng; Long, Lijuan; Xiao, Jingfa

    2016-01-01

    Over 200 genomes of streptomycete strains that were isolated from various environments are available from the NCBI. However, little is known about the characteristics that are linked to marine adaptation in marine-derived streptomycetes. The particularity and complexity of the marine environment suggest that marine streptomycetes are genetically diverse. Here, we sequenced nine strains from the Streptomyces genus that were isolated from different longitudes, latitudes, and depths of the South China Sea. Then we compared these strains to 22 streptomycete strains downloaded from the NCBI. In the phylogenetic tree, the thirty-one streptomycete strains are clearly grouped into a marine-derived subgroup and a multiple-source subgroup. The phylogenetic analyses have revealed the dynamic process underlying streptomycete genome evolution, and lateral gene transfer is an important driving force during the process. Pan-genomics analyses have revealed that streptomycetes have an open pan-genome, which reflects the diversity of these streptomycetes and guarantees the species a quick and economical response to diverse environments. Functional and comparative genomics analyses indicate that the marine-derived streptomycetes subgroup possesses some common characteristics of marine adaptation. Our findings have expanded our knowledge of how ocean isolates of streptomycete strains adapt to marine environments. The availability of streptomycete genomes from the South China Sea will be beneficial for further analysis on marine streptomycetes and will enrich the South China Sea’s genetic data sources. PMID:27446038

  19. Optimizing exosomal RNA isolation for RNA-Seq analyses of archival sera specimens.

    PubMed

    Prendergast, Emily N; de Souza Fonseca, Marcos Abraão; Dezem, Felipe Segato; Lester, Jenny; Karlan, Beth Y; Noushmehr, Houtan; Lin, Xianzhi; Lawrenson, Kate

    2018-01-01

    Exosomes are endosome-derived membrane vesicles that contain proteins, lipids, and nucleic acids. The exosomal transcriptome mediates intercellular communication, and represents an understudied reservoir of novel biomarkers for human diseases. Next-generation sequencing enables complex quantitative characterization of exosomal RNAs from diverse sources. However, detailed protocols describing exosome purification for preparation of exosomal RNA-sequence (RNA-Seq) libraries are lacking. Here we compared methods for isolation of exosomes and extraction of exosomal RNA from human cell-free serum, as well as strategies for attaining equal representation of samples within pooled RNA-Seq libraries. We compared commercial precipitation with ultracentrifugation for exosome purification and confirmed the presence of exosomes via both transmission electron microscopy and immunoblotting. Exosomal RNA extraction was compared using four different RNA purification methods. We determined the minimal starting volume of serum required for exosome preparation and showed that high quality exosomal RNA can be isolated from sera stored for over a decade. Finally, RNA-Seq libraries were successfully prepared with exosomal RNAs extracted from human cell-free serum, cataloguing both coding and non-coding exosomal transcripts. This method provides researchers with strategic options to prepare RNA-Seq libraries and compare RNA-Seq data quantitatively from minimal volumes of fresh and archival human cell-free serum for disease biomarker discovery.

  20. A subset of conserved mammalian long non-coding RNAs are fossils of ancestral protein-coding genes.

    PubMed

    Hezroni, Hadas; Ben-Tov Perry, Rotem; Meir, Zohar; Housman, Gali; Lubelsky, Yoav; Ulitsky, Igor

    2017-08-30

    Only a small portion of human long non-coding RNAs (lncRNAs) appear to be conserved outside of mammals, but the events underlying the birth of new lncRNAs in mammals remain largely unknown. One potential source is remnants of protein-coding genes that transitioned into lncRNAs. We systematically compare lncRNA and protein-coding loci across vertebrates, and estimate that up to 5% of conserved mammalian lncRNAs are derived from lost protein-coding genes. These lncRNAs have specific characteristics, such as broader expression domains, that set them apart from other lncRNAs. Fourteen lncRNAs have sequence similarity with the loci of the contemporary homologs of the lost protein-coding genes. We propose that selection acting on enhancer sequences is mostly responsible for retention of these regions. As an example of an RNA element from a protein-coding ancestor that was retained in the lncRNA, we describe in detail a short translated ORF in the JPX lncRNA that was derived from an upstream ORF in a protein-coding gene and retains some of its functionality. We estimate that ~ 55 annotated conserved human lncRNAs are derived from parts of ancestral protein-coding genes, and loss of coding potential is thus a non-negligible source of new lncRNAs. Some lncRNAs inherited regulatory elements influencing transcription and translation from their protein-coding ancestors and those elements can influence the expression breadth and functionality of these lncRNAs.

  1. The Number of Point Mutations in Induced Pluripotent Stem Cells and Nuclear Transfer Embryonic Stem Cells Depends on the Method and Somatic Cell Type Used for Their Generation.

    PubMed

    Araki, Ryoko; Mizutani, Eiji; Hoki, Yuko; Sunayama, Misato; Wakayama, Sayaka; Nagatomo, Hiroaki; Kasama, Yasuji; Nakamura, Miki; Wakayama, Teruhiko; Abe, Masumi

    2017-05-01

    Induced pluripotent stem cells hold great promise for regenerative medicine but point mutations have been identified in these cells and have raised serious concerns about their safe use. We generated nuclear transfer embryonic stem cells (ntESCs) from both mouse embryonic fibroblasts (MEFs) and tail-tip fibroblasts (TTFs) and by whole genome sequencing found fewer mutations compared with iPSCs generated by retroviral gene transduction. Furthermore, TTF-derived ntESCs showed only a very small number of point mutations, approximately 80% less than the number observed in iPSCs generated using retrovirus. Base substitution profile analysis confirmed this greatly reduced number of point mutations. The point mutations in iPSCs are therefore not a Yamanaka factor-specific phenomenon but are intrinsic to genome reprogramming. Moreover, the dramatic reduction in point mutations in ntESCs suggests that most are not essential for genome reprogramming. Our results suggest that it is feasible to reduce the point mutation frequency in iPSCs by optimizing various genome reprogramming conditions. We conducted whole genome sequencing of ntES cells derived from MEFs or TTFs. We thereby succeeded in establishing TTF-derived ntES cell lines with far fewer point mutations. Base substitution profile analysis of these clones also indicated a reduced point mutation frequency, moving from a transversion-predominance to a transition-predominance. Stem Cells 2017;35:1189-1196. © 2017 AlphaMed Press.

  2. Draft Genome Sequences of Three European Laboratory Derivatives from Enterohemorrhagic Escherichia coli O157:H7 Strain EDL933, Including Two Plasmids.

    PubMed

    Fellner, Lea; Huptas, Christopher; Simon, Svenja; Mühlig, Anna; Scherer, Siegfried; Neuhaus, Klaus

    2016-04-07

    Escherichia coli O157:H7 EDL933, isolated in 1982 in the United States, was the first enterohemorrhagic E. coli (EHEC) strain sequenced. Unfortunately, European labs can no longer receive the original strain. We checked three European EDL933 derivatives and found major genetic deviations (deletions, inversions) in two strains. All EDL933 strains contain the cryptic EHEC-plasmid, not reported before. Copyright © 2016 Fellner et al.

  3. Enantioselective Synthesis of α-Methylene-β-hydroxy Carboxylic Acid Derivatives via a Diastereoselective Aldol-β-Elimination Sequence: Application to the C(15)–C(21) Fragment of Tedanolide C

    PubMed Central

    Barth, Roland; Roush, William R.

    2010-01-01

    An enantioselective synthesis of α-methylene-β-hydroxy carboxylic acid derivatives via a highly diastereoselective, one-pot syn-aldol and β-elimination sequence utilizing the chiral β-(phenylselenyl)propionyl imide 15 is described. This new method, which constitutes an alternative to the Baylis-Hillman reaction, has been applied to the synthesis of the C(15)-C(21) fragment of tedanolide C. PMID:20405855

  4. RNACompress: Grammar-based compression and informational complexity measurement of RNA secondary structure.

    PubMed

    Liu, Qi; Yang, Yu; Chen, Chun; Bu, Jiajun; Zhang, Yin; Ye, Xiuzi

    2008-03-31

    With the rapid emergence of RNA databases and newly identified non-coding RNAs, an efficient compression algorithm for RNA sequence and structural information is needed for the storage and analysis of such data. Although several algorithms for compressing DNA sequences have been proposed, none of them are suitable for the compression of RNA sequences with their secondary structures simultaneously. This kind of compression not only facilitates the maintenance of RNA data, but also supplies a novel way to measure the informational complexity of RNA structural data, raising the possibility of studying the relationship between the functional activities of RNA structures and their complexities, as well as various structural properties of RNA based on compression. RNACompress employs an efficient grammar-based model to compress RNA sequences and their secondary structures. The main goals of this algorithm are twofold: (1) present a robust and effective way for RNA structural data compression; (2) design a suitable model to represent RNA secondary structure as well as derive the informational complexity of the structural data based on compression. Our extensive tests have shown that RNACompress achieves a universally better compression ratio compared with other sequence-specific or common text-specific compression algorithms, such as Gencompress, winrar and gzip. Moreover, a test of the activities of distinct GTP-binding RNAs (aptamers) compared with their structural complexity shows that our defined informational complexity can be used to describe how complexity varies with activity. These results lead to an objective means of comparing the functional properties of heteropolymers from the information perspective. A universal algorithm for the compression of RNA secondary structure as well as the evaluation of its informational complexity is discussed in this paper. We have developed RNACompress, as a useful tool for academic users. Extensive tests have shown that RNACompress is a universally efficient algorithm for the compression of RNA sequences with their secondary structures. RNACompress also serves as a good measurement of the informational complexity of RNA secondary structure, which can be used to study the functional activities of RNA molecules.

  5. RNACompress: Grammar-based compression and informational complexity measurement of RNA secondary structure

    PubMed Central

    Liu, Qi; Yang, Yu; Chen, Chun; Bu, Jiajun; Zhang, Yin; Ye, Xiuzi

    2008-01-01

    Background: With the rapid emergence of RNA databases and newly identified non-coding RNAs, an efficient compression algorithm for RNA sequence and structural information is needed for the storage and analysis of such data. Although several algorithms for compressing DNA sequences have been proposed, none of them are suitable for the compression of RNA sequences with their secondary structures simultaneously. This kind of compression not only facilitates the maintenance of RNA data, but also supplies a novel way to measure the informational complexity of RNA structural data, raising the possibility of studying the relationship between the functional activities of RNA structures and their complexities, as well as various structural properties of RNA based on compression. Results: RNACompress employs an efficient grammar-based model to compress RNA sequences and their secondary structures. The main goals of this algorithm are twofold: (1) present a robust and effective way for RNA structural data compression; (2) design a suitable model to represent RNA secondary structure as well as derive the informational complexity of the structural data based on compression. Our extensive tests have shown that RNACompress achieves a universally better compression ratio compared with other sequence-specific or common text-specific compression algorithms, such as Gencompress, winrar and gzip. Moreover, a test of the activities of distinct GTP-binding RNAs (aptamers) compared with their structural complexity shows that our defined informational complexity can be used to describe how complexity varies with activity. These results lead to an objective means of comparing the functional properties of heteropolymers from the information perspective. Conclusion: A universal algorithm for the compression of RNA secondary structure as well as the evaluation of its informational complexity is discussed in this paper. We have developed RNACompress, as a useful tool for academic users. Extensive tests have shown that RNACompress is a universally efficient algorithm for the compression of RNA sequences with their secondary structures. RNACompress also serves as a good measurement of the informational complexity of RNA secondary structure, which can be used to study the functional activities of RNA molecules. PMID:18373878

  6. Improved Model for Predicting the Free Energy Contribution of Dinucleotide Bulges to RNA Duplex Stability.

    PubMed

    Tomcho, Jeremy C; Tillman, Magdalena R; Znosko, Brent M

    2015-09-01

    Predicting the secondary structure of RNA is an intermediate in predicting RNA three-dimensional structure. Commonly, determining RNA secondary structure from sequence uses free energy minimization and nearest neighbor parameters. Current algorithms utilize a sequence-independent model to predict free energy contributions of dinucleotide bulges. To determine if a sequence-dependent model would be more accurate, short RNA duplexes containing dinucleotide bulges with different sequences and nearest neighbor combinations were optically melted to derive thermodynamic parameters. These data suggested energy contributions of dinucleotide bulges were sequence-dependent, and a sequence-dependent model was derived. This model assigns free energy penalties based on the identity of nucleotides in the bulge (3.06 kcal/mol for two purines, 2.93 kcal/mol for two pyrimidines, 2.71 kcal/mol for 5'-purine-pyrimidine-3', and 2.41 kcal/mol for 5'-pyrimidine-purine-3'). The predictive model also includes a 0.45 kcal/mol penalty for an A-U pair adjacent to the bulge and a -0.28 kcal/mol bonus for a G-U pair adjacent to the bulge. The new sequence-dependent model results in predicted values within, on average, 0.17 kcal/mol of experimental values, a significant improvement over the sequence-independent model. This model and new experimental values can be incorporated into algorithms that predict RNA stability and secondary structure from sequence.
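
    The sequence-dependent penalties quoted in this abstract can be written as a small lookup. The Python sketch below uses only the numbers given above; the function name bulge_free_energy and the choice to apply the A-U and G-U terms once per adjacent pair are assumptions for illustration, since the abstract does not spell out how two adjacent pairs of the same type are counted.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}

def bulge_free_energy(bulge, adjacent_au=0, adjacent_gu=0):
    """Estimate the free energy contribution (kcal/mol) of a dinucleotide bulge.

    bulge        -- the two bulged nucleotides, written 5' to 3' (e.g. "GA")
    adjacent_au  -- number of A-U pairs adjacent to the bulge (assumed 0, 1 or 2)
    adjacent_gu  -- number of G-U pairs adjacent to the bulge (assumed 0, 1 or 2)
    """
    b = bulge.upper()
    if len(b) != 2 or any(nt not in PURINES | PYRIMIDINES for nt in b):
        raise ValueError("bulge must be two RNA nucleotides, e.g. 'GA'")
    five_is_purine = b[0] in PURINES
    three_is_purine = b[1] in PURINES
    if five_is_purine and three_is_purine:
        base = 3.06   # two purines
    elif not five_is_purine and not three_is_purine:
        base = 2.93   # two pyrimidines
    elif five_is_purine:
        base = 2.71   # 5'-purine-pyrimidine-3'
    else:
        base = 2.41   # 5'-pyrimidine-purine-3'
    # Penalty for adjacent A-U pairs, bonus for adjacent G-U pairs (values from the abstract).
    return base + 0.45 * adjacent_au - 0.28 * adjacent_gu

if __name__ == "__main__":
    print(round(bulge_free_energy("AG"), 2))                 # two purines
    print(round(bulge_free_energy("CA", adjacent_au=1), 2))  # pyrimidine-purine next to an A-U pair
```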

  7. Genetic and DNA sequence analysis of the kanamycin resistance transposon Tn903.

    PubMed Central

    Grindley, N D; Joyce, C M

    1980-01-01

    The kanamycin resistance transposon Tn903 consists of a unique region of about 1000 base pairs bounded by a pair of 1050-base-pair inverted repeat sequences. Each repeat contains two Pvu II endonuclease cleavage sites separated by 520 base pairs. We have constructed derivatives of Tn903 in which this 520-base-pair fragment is deleted from one or both repeats. Those derivatives that lack both 520-base-pair fragments cannot transpose, whereas those that lack just one remain transposition proficient. One such transposable derivative, Tn903 delta I, has been selected for further study. We have determined the sequence of the intact inverted repeat. The 18 base pairs at each end are identical and inverted relative to one another, a structure characteristic of insertion sequences. Additional experiments indicate that a single inverted repeat from Tn903 can, in fact, transpose; we propose that this element be called IS903. To correlate the DNA sequence with genetic activities, we have created mutations by inserting a 10-base-pair DNA fragment at several sites within the intact repeat of Tn903 delta 1, and we have examined the effect of such insertions on transposability. The results suggest that IS903 encodes a 307-amino-acid polypeptide (a "transposase") that is absolutely required for transposition of IS903 or Tn903. PMID:6261245
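
    The terminal structure described here, in which the two ends of an element are identical but inverted relative to one another, can be checked computationally by comparing one end with the reverse complement of the other. The Python sketch below is a generic illustration with hypothetical function names and a made-up test sequence; it does not use the actual IS903 sequence.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def terminal_inverted_repeat_length(seq, max_len=50):
    """Length of the perfect inverted repeat shared by the two ends of seq.

    The left end is compared base by base with the reverse complement of the
    right end; counting stops at the first mismatch.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    limit = min(max_len, len(seq) // 2)
    n = 0
    while n < limit and seq[n] == rc[n]:
        n += 1
    return n

if __name__ == "__main__":
    # Hypothetical element: a 10-bp end, a short spacer, and the inverted copy of that end.
    left_end = "GGCTTTGTTG"
    element = left_end + "TTAC" + reverse_complement(left_end)
    print(terminal_inverted_repeat_length(element))
```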

  8. Mapping Phylogenetic Trees to Reveal Distinct Patterns of Evolution

    PubMed Central

    Kendall, Michelle; Colijn, Caroline

    2016-01-01

    Evolutionary relationships are frequently described by phylogenetic trees, but a central barrier in many fields is the difficulty of interpreting data containing conflicting phylogenetic signals. We present a metric-based method for comparing trees which extracts distinct alternative evolutionary relationships embedded in data. We demonstrate detection and resolution of phylogenetic uncertainty in a recent study of anole lizards, leading to alternate hypotheses about their evolutionary relationships. We use our approach to compare trees derived from different genes of Ebolavirus and find that the VP30 gene has a distinct phylogenetic signature composed of three alternatives that differ in the deep branching structure. Key words: phylogenetics, evolution, tree metrics, genetics, sequencing. PMID:27343287

  9. Rooting gene trees without outgroups: EP rooting.

    PubMed

    Sinsheimer, Janet S; Little, Roderick J A; Lake, James A

    2012-01-01

    Gene sequences are routinely used to determine the topologies of unrooted phylogenetic trees, but many of the most important questions in evolution require knowing both the topologies and the roots of trees. However, general algorithms for calculating rooted trees from gene and genomic sequences in the absence of gene paralogs are few. Using the principles of evolutionary parsimony (EP) (Lake JA. 1987a. A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony. Mol Biol Evol. 4:167-181) and its extensions (Cavender, J. 1989. Mechanized derivation of linear invariants. Mol Biol Evol. 6:301-316; Nguyen T, Speed TP. 1992. A derivation of all linear invariants for a nonbalanced transversion model. J Mol Evol. 35:60-76), we explicitly enumerate all linear invariants that solely contain rooting information and derive algorithms for rooting gene trees directly from gene and genomic sequences. These new EP linear rooting invariants allow one to determine rooted trees, even in the complete absence of outgroups and gene paralogs. EP rooting invariants are explicitly derived for three taxon trees, and rules for their extension to four or more taxa are provided. The method is demonstrated using 18S ribosomal DNA to illustrate how the new animal phylogeny (Aguinaldo AMA et al. 1997. Evidence for a clade of nematodes, arthropods, and other moulting animals. Nature 387:489-493; Lake JA. 1990. Origin of the metazoa. Proc Natl Acad Sci USA 87:763-766) may be rooted directly from sequences, even when they are short and paralogs are unavailable. These results are consistent with the current root (Philippe H et al. 2011. Acoelomorph flatworms are deuterostomes related to Xenoturbella. Nature 470:255-260).

  10. Rooting Gene Trees without Outgroups: EP Rooting

    PubMed Central

    Sinsheimer, Janet S.; Little, Roderick J. A.; Lake, James A.

    2012-01-01

    Gene sequences are routinely used to determine the topologies of unrooted phylogenetic trees, but many of the most important questions in evolution require knowing both the topologies and the roots of trees. However, general algorithms for calculating rooted trees from gene and genomic sequences in the absence of gene paralogs are few. Using the principles of evolutionary parsimony (EP) (Lake JA. 1987a. A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony. Mol Biol Evol. 4:167–181) and its extensions (Cavender, J. 1989. Mechanized derivation of linear invariants. Mol Biol Evol. 6:301–316; Nguyen T, Speed TP. 1992. A derivation of all linear invariants for a nonbalanced transversion model. J Mol Evol. 35:60–76), we explicitly enumerate all linear invariants that solely contain rooting information and derive algorithms for rooting gene trees directly from gene and genomic sequences. These new EP linear rooting invariants allow one to determine rooted trees, even in the complete absence of outgroups and gene paralogs. EP rooting invariants are explicitly derived for three taxon trees, and rules for their extension to four or more taxa are provided. The method is demonstrated using 18S ribosomal DNA to illustrate how the new animal phylogeny (Aguinaldo AMA et al. 1997. Evidence for a clade of nematodes, arthropods, and other moulting animals. Nature 387:489–493; Lake JA. 1990. Origin of the metazoa. Proc Natl Acad Sci USA 87:763–766) may be rooted directly from sequences, even when they are short and paralogs are unavailable. These results are consistent with the current root (Philippe H et al. 2011. Acoelomorph flatworms are deuterostomes related to Xenoturbella. Nature 470:255–260). PMID:22593551

  11. HIV-1 RNAs are Not Part of the Argonaute 2 Associated RNA Interference Pathway in Macrophages.

    PubMed

    Vongrad, Valentina; Imig, Jochen; Mohammadi, Pejman; Kishore, Shivendra; Jaskiewicz, Lukasz; Hall, Jonathan; Günthard, Huldrych F; Beerenwinkel, Niko; Metzner, Karin J

    2015-01-01

    MiRNAs and other small noncoding RNAs (sncRNAs) are key players in post-transcriptional gene regulation. HIV-1 derived small noncoding RNAs (sncRNAs) have been described in HIV-1 infected cells, but their biological functions still remain to be elucidated. Here, we approached the question whether viral sncRNAs may play a role in the RNA interference (RNAi) pathway or whether viral mRNAs are targeted by cellular miRNAs in human monocyte derived macrophages (MDM). The incorporation of viral sncRNAs and/or their target RNAs into RNA-induced silencing complex was investigated using photoactivatable ribonucleoside-induced cross-linking and immunoprecipitation (PAR-CLIP) as well as high-throughput sequencing of RNA isolated by cross-linking immunoprecipitation (HITS-CLIP), which capture Argonaute2-bound miRNAs and their target RNAs. HIV-1 infected monocyte-derived macrophages (MDM) were chosen as target cells, as they have previously been shown to express HIV-1 sncRNAs. In addition, we applied small RNA deep sequencing to study differential cellular miRNA expression in HIV-1 infected versus non-infected MDMs. PAR-CLIP and HITS-CLIP data demonstrated the absence of HIV-1 RNAs in Ago2-RISC, although the presence of a multitude of HIV-1 sncRNAs in HIV-1 infected MDMs was confirmed by small RNA sequencing. Small RNA sequencing revealed that 1.4% of all sncRNAs were of HIV-1 origin. However, neither HIV-1 derived sncRNAs nor putative HIV-1 target sequences incorporated into Ago2-RISC were identified suggesting that HIV-1 sncRNAs are not involved in the canonical RNAi pathway nor is HIV-1 targeted by this pathway in HIV-1 infected macrophages.

  12. Phylogenetic characterization of a biogas plant microbial community integrating clone library 16S-rDNA sequences and metagenome sequence data obtained by 454-pyrosequencing.

    PubMed

    Kröber, Magdalena; Bekel, Thomas; Diaz, Naryttza N; Goesmann, Alexander; Jaenicke, Sebastian; Krause, Lutz; Miller, Dimitri; Runte, Kai J; Viehöver, Prisca; Pühler, Alfred; Schlüter, Andreas

    2009-06-01

    The phylogenetic structure of the microbial community residing in a fermentation sample from a production-scale biogas plant fed with maize silage, green rye and liquid manure was analysed by an integrated approach using clone library sequences and metagenome sequence data obtained by 454-pyrosequencing. Sequencing of 109 clones from a bacterial and an archaeal 16S-rDNA amplicon library revealed that the obtained nucleotide sequences are similar but not identical to 16S-rDNA database sequences derived from different anaerobic environments including digestors and bioreactors. Most of the bacterial 16S-rDNA sequences could be assigned to the phylum Firmicutes with the most abundant class Clostridia and to the class Bacteroidetes, whereas most archaeal 16S-rDNA sequences cluster close to the methanogen Methanoculleus bourgensis. Further sequences of the archaeal library most probably represent so far non-characterised species within the genus Methanoculleus. A similar result derived from phylogenetic analysis of mcrA clone sequences. The mcrA gene product encodes the alpha-subunit of methyl-coenzyme-M reductase involved in the final step of methanogenesis. BLASTn analysis applying stringent settings resulted in assignment of 16S-rDNA metagenome sequence reads to 62 16S-rDNA amplicon sequences thus enabling frequency of abundance estimations for 16S-rDNA clone library sequences. Ribosomal Database Project (RDP) Classifier processing of metagenome 16S-rDNA reads revealed abundance of the phyla Firmicutes, Bacteroidetes and Euryarchaeota and the orders Clostridiales, Bacteroidales and Methanomicrobiales. Moreover, a large fraction of 16S-rDNA metagenome reads could not be assigned to lower taxonomic ranks, demonstrating that numerous microorganisms in the analysed fermentation sample of the biogas plant are still unclassified or unknown.

  13. Genome sequencing of Deutsch strain of cattle ticks, Rhipicephalus microplus: Raw Pac Bio reads.

    USDA-ARS's Scientific Manuscript database

    Pac Bio RS II whole genome shotgun sequencing technology was used to sequence the genome of the cattle tick, Rhipicephalus microplus. The DNA was derived from 14 day old eggs from the Deutsch Texas outbreak strain reared at the USDA-ARS Cattle Fever Tick Research Laboratory, Edinburg, TX. Each corre...

  14. Draft Genome Sequence of “Cohnella kolymensis” B-2846

    PubMed Central

    Kudryashova, Ekaterina B.; Ariskina, Elena V.

    2016-01-01

    A draft genome sequence of “Cohnella kolymensis” strain B-2846 was derived using IonTorrent sequencing technology. The size of the assembly and G+C content were in agreement with those of other species of this genus. Characterization of the genome of a novel species of Cohnella will assist in bacterial systematics. PMID:26769947

  15. Frequency Effects on ESL Compositional Multi-Word Sequence Processing

    ERIC Educational Resources Information Center

    Supasiraprapa, Sarut

    2017-01-01

    The current study investigated whether adult native English speakers and English-as-a-second-language (ESL) learners exhibit sensitivity to compositional English multi-word sequences, which have a meaning derivable from word parts (e.g., don't have to worry as opposed to sequences like He left the US for good, where for good cannot be taken apart…

  16. Inference of Markovian properties of molecular sequences from NGS data and applications to comparative genomics.

    PubMed

    Ren, Jie; Song, Kai; Deng, Minghua; Reinert, Gesine; Cannon, Charles H; Sun, Fengzhu

    2016-04-01

    Next-generation sequencing (NGS) technologies generate large amounts of short read data for many different organisms. The fact that NGS reads are generally short makes it challenging to assemble the reads and reconstruct the original genome sequence. For clustering genomes using such NGS data, word-count based alignment-free sequence comparison is a promising approach, but for this approach, the underlying expected word counts are essential. A plausible model for this underlying distribution of word counts is given through modeling the DNA sequence as a Markov chain (MC). For single long sequences, efficient statistics are available to estimate the order of MCs and the transition probability matrix for the sequences. As NGS data do not provide a single long sequence, inference methods on Markovian properties of sequences based on single long sequences cannot be directly used for NGS short read data. Here we derive a normal approximation for such word counts. We also show that the traditional Chi-square statistic has an approximate gamma distribution, using the Lander-Waterman model for physical mapping. We propose several methods to estimate the order of the MC based on NGS reads and evaluate those using simulations. We illustrate the applications of our results by clustering genomic sequences of several vertebrate and tree species based on NGS reads using alignment-free sequence dissimilarity measures. We find that the estimated order of the MC has a considerable effect on the clustering results, and that the clustering results that use an MC of the estimated order give a plausible clustering of the species. Our implementation of the statistics developed here is available as R package 'NGS.MC' at http://www-rcf.usc.edu/∼fsun/Programs/NGS-MC/NGS-MC.html (contact: fsun@usc.edu). Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved.
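
    To make the Markov chain model of word counts concrete, the Python sketch below pools (k+1)-mer counts across short reads and converts them into maximum-likelihood transition probabilities for an order-k chain. This is a plain counting estimator written for illustration, with a hypothetical function name and toy reads; it is not the NGS.MC implementation and does not include the order-estimation statistics described in the abstract.

```python
from collections import Counter, defaultdict

def transition_probabilities(reads, order=1):
    """Estimate order-`order` Markov transition probabilities from short reads.

    Counts are pooled over all reads, so no single long sequence is required.
    Words containing characters other than A, C, G, T (e.g. N) are skipped.
    """
    counts = defaultdict(Counter)
    for read in reads:
        read = read.upper()
        for i in range(len(read) - order):
            context = read[i:i + order]
            nxt = read[i + order]
            if set(context + nxt) <= set("ACGT"):
                counts[context][nxt] += 1
    # Normalize each context's counts into conditional probabilities.
    return {
        context: {base: c / sum(nexts.values()) for base, c in nexts.items()}
        for context, nexts in counts.items()
    }

if __name__ == "__main__":
    toy_reads = ["ACGT", "ACGA", "ACC"]  # hypothetical reads
    for context, probs in sorted(transition_probabilities(toy_reads).items()):
        print(context, probs)
```

    Pooling counts treats every read as an independent sample from the same chain; it ignores read overlap and coverage variation, which the normal and gamma approximations in the paper are designed to handle.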

  17. MRI to delineate the gross tumor volume of nasopharyngeal cancers: which sequences and planes should be used?

    PubMed

    Popovtzer, Aron; Ibrahim, Mohannad; Tatro, Daniel; Feng, Felix Y; Ten Haken, Randall K; Eisbruch, Avraham

    2014-09-01

    Magnetic resonance imaging (MRI) has been found to be better than computed tomography for defining the extent of primary gross tumor volume (GTV) in advanced nasopharyngeal cancer. It is routinely applied for target delineation in planning radiotherapy. However, the specific MRI sequences/planes that should be used are unknown. Twelve patients with nasopharyngeal cancer underwent primary GTV evaluation with gadolinium-enhanced axial T1 weighted image (T1) and T2 weighted image (T2), coronal T1, and sagittal T1 sequences. Each sequence was registered with the planning computed tomography scans. Planning target volumes (PTVs) were derived by uniform expansions of the GTVs. The volumes encompassed by the various sequences/planes, and the volumes common to all sequences/planes, were compared quantitatively and anatomically to the volume delineated by the commonly used axial T1-based dataset. Addition of the axial T2 sequence increased the axial T1-based GTV by 12% on average (p = 0.004), and composite evaluations that included the coronal T1 and sagittal T1 planes increased the axial T1-based GTVs by 30% on average (p = 0.003). The axial T1-based PTVs were increased by 20% by the additional sequences (p = 0.04). Each sequence/plane added unique volume extensions. The GTVs common to all the T1 planes accounted for 38% of the total volumes of all the T1 planes. Anatomically, addition of the coronal and sagittal-based GTVs extended the axial T1-based GTV caudally and cranially, notably to the base of the skull. Adding MRI planes and sequences to the traditional axial T1 sequence yields significant quantitative and anatomically important extensions of the GTVs and PTVs. For accurate target delineation in nasopharyngeal cancer, we recommend that GTVs be outlined in all MRI sequences/planes and registered with the planning computed tomography scans.

  18. Complete Genome Sequence of Escherichia coli ER1821R, a Laboratory K-12 Derivative Engineered To Be Deficient in All Methylcytosine and Methyladenine Restriction Systems.

    PubMed

    Jobling, Michael G; Raleigh, Elisabeth A; Frank, Daniel N

    2016-08-11

    We present here the complete genomic sequence of a rifampin-resistant derivative of the Escherichia coli K-12 laboratory strain ER1821, engineered to be deficient in all known restriction systems, making it suitable for generating unbiased libraries from organisms with non-K-12 methylation patterns. The ER1821R genome is most closely related to that of DH1, another popular cloning strain (both derived from MM294), but is deleted for the e14 prophage (McrA(-)) and the immigration control (McrBC(-) EcoKI R(-) M(-) Mrr(-)) loci. Copyright © 2016 Jobling et al.

  19. Enhanced cellulase producing mutants developed from heterokaryotic Aspergillus strain.

    PubMed

    Kaur, Baljit; Oberoi, H S; Chadha, B S

    2014-03-01

    A heterokaryon 28, derived through protoplast fusion between Aspergillus nidulans and Aspergillus tubingensis (Dal8), was subjected to cyclic mutagenesis followed by selection on increasing levels of 2-deoxy glucose (2-DG) as selection marker. The derived deregulated cellulase hyper-producing mutant '64', when compared to fusant 28, produced 9.83-, 7.8-, 3.2-, 4.2- and 19.74-fold higher endoglucanase, β-glucosidase, cellobiohydrolase, FPase and xylanase, respectively, under shake cultures. The sequence analysis of the PCR-amplified β-glucosidase gene from wild and mutant strains showed nucleotide deletion/substitution. The mutants showed a catalytically highly efficient β-glucosidase, as evident from low Km and high Vmax values. The expression profiling through zymogram analysis also indicated over-expression of cellulases. The up/down-regulated expressed proteins observed through SDS-PAGE were identified by peptide mass fingerprinting. The cellulase produced by mutants in conjunction with cellulase-free xylanase derived from Thermomyces lanuginosus was used for efficient utilization of alkali-treated rice straw for obtaining xylo-oligosaccharides and ethanol. Copyright © 2014 Elsevier Ltd. All rights reserved.

  20. Isolation of Onchocerca lupi in Dogs and Black Flies, California, USA

    PubMed Central

    Hassan, Hassan K.; Bolcen, Shanna; Kubofcik, Joseph; Nutman, Thomas B.; Eberhard, Mark L.; Middleton, Kelly; Wekesa, Joseph Wakoli; Ruedas, Gimena; Nelson, Kimberly J.; Dubielzig, Richard; De Lombaert, Melissa; Silverman, Bruce; Schorling, Jamie J.; Adler, Peter H.; Beeler, Emily S.

    2015-01-01

    In southern California, ocular infections caused by Onchocerca lupi were diagnosed in 3 dogs (1 in 2006, 2 in 2012). The infectious agent was confirmed through morphologic analysis of fixed parasites in tissues and by PCR and sequencing of amplicons derived from 2 mitochondrially encoded genes and 1 nuclear-encoded gene. A nested PCR based on the sequence of the cytochrome oxidase subunit 1 gene of the parasite was developed and used to screen Simulium black flies collected from southern California for O. lupi DNA. Six (2.8%; 95% CI 0.6%–5.0%) of 213 black flies contained O. lupi DNA. Partial mitochondrial 16S rRNA gene sequences from the infected flies matched sequences derived from black fly larvae cytotaxonomically identified as Simulium tribulatum. These data implicate S. tribulatum flies as a putative vector for O. lupi in southern California. PMID:25897954
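
    The reported prevalence and confidence interval (6 of 213 flies; 2.8%, 95% CI 0.6%–5.0%) can be reproduced with a normal-approximation (Wald) interval, as the Python sketch below shows. The abstract does not say which interval method the authors used, so the choice of the Wald formula here is an assumption that happens to match the published figures.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Return (proportion, lower, upper) using the normal-approximation interval."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clamp to [0, 1] because the Wald interval can overshoot for small proportions.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

if __name__ == "__main__":
    p, lower, upper = wald_interval(6, 213)
    print(f"{100 * p:.1f}% (95% CI {100 * lower:.1f}%-{100 * upper:.1f}%)")
```

    With so few positives, an exact or Wilson interval would usually be preferred and would give slightly different bounds.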
