Science.gov

Sample records for biological sequence comparison

  1. Method and apparatus for biological sequence comparison

    DOEpatents

    Marr, Thomas G.; Chang, William I-Wei

    1997-01-01

    A method and apparatus for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence.
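
    The block-division step described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the parameters block_len and min_align_len, and the choice of step size are all assumptions made here. The step is picked so that every window of min_align_len residues lies wholly inside at least one block.

```python
def overlapping_blocks(query, block_len, min_align_len):
    """Split `query` into overlapping blocks (hypothetical helper).

    Assumption: consecutive blocks overlap by min_align_len - 1 residues,
    so any window of min_align_len residues fits inside some block.
    Returns a list of (start_offset, block_string) pairs.
    """
    if min_align_len < 1 or block_len < min_align_len:
        raise ValueError("need 1 <= min_align_len <= block_len")
    step = block_len - min_align_len + 1
    blocks = []
    start = 0
    while True:
        blocks.append((start, query[start:start + block_len]))
        if start + block_len >= len(query):
            break  # this block already reaches the end of the query
        start += step
    return blocks


if __name__ == "__main__":
    for offset, block in overlapping_blocks("ACDEFGHIKL", 6, 4):
        print(offset, block)
```

    Each block would then be passed to the filter stage; the filter itself is not sketched here because the abstract does not give enough detail to reproduce it.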

  2. Method and apparatus for biological sequence comparison

    DOEpatents

    Marr, T.G.; Chang, W.I.

    1997-12-23

    A method and apparatus are disclosed for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence. 5 figs.

  3. Supercomputers and biological sequence comparison algorithms.

    PubMed

    Core, N G; Edmiston, E W; Saltz, J H; Smith, R M

    1989-12-01

    Comparison of biological (DNA or protein) sequences provides insight into molecular structure, function, and homology and is increasingly important as the available databases become larger and more numerous. One method of increasing the speed of the calculations is to perform them in parallel. We present the results of initial investigations using two dynamic programming algorithms on the Intel iPSC hypercube and the Connection Machine as well as an inexpensive, heuristically-based algorithm on the Encore Multimax.

  4. Parallel Computation of Multiple Biological Sequence Comparisons

    DTIC Science & Technology

    1989-07-01

    ... Stearothermophilus; 408 Bacillus Megaterium; 411 Bacillus Brevis; 354 Pseudomonas Fluorescens; 375 Salmonella Typhi; 377 Escherichia Coli; 282 Saccharomyces Octosporus ... This included implied secondary structure and conservation of pairs of nucleotides that are complementary. The first four sequences are all Bacillus ... need to obtain sequences of ribonuclease P RNA from additional species to provide a more ... Length, Name: 401 Bacillus Subtilis; 417 Bacillus ...

  5. A platform for biological sequence comparison on parallel computers.

    PubMed

    Deshpande, A S; Richards, D S; Pearson, W R

    1991-04-01

    We have written two programs for searching biological sequence databases that run on Intel hypercube computers. PSCANLIB compares a single sequence against a sequence library, and PCOMPLIB compares all the entries in one sequence library against a second library. The programs provide a general framework for similarity searching; they include functions for reading in query sequences, search parameters and library entries, and reporting the results of a search. We have isolated the code for the specific function that calculates the similarity score between the query and library sequence; alternative searching algorithms can be implemented by editing two files. We have implemented the rapid FASTA sequence comparison algorithm and the more rigorous Smith-Waterman algorithm within this framework. The PSCANLIB program on a 16 node iPSC/2 80386-based hypercube can compare a 229 amino acid protein sequence with a 3.4 million residue sequence library in approximately 16 s with the FASTA algorithm. Using the Smith-Waterman algorithm, the same search takes 35 min. The PCOMPLIB program can compare a 0.8 million amino acid protein sequence library with itself in 5.3 min with FASTA on a third-generation 32 node Intel iPSC/860 hypercube.
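
    For readers unfamiliar with the more rigorous of the two algorithms named above, the following is a minimal sketch of a Smith-Waterman local alignment score. It is not the PSCANLIB code: the scoring values (match, mismatch, and a linear gap penalty) are assumptions chosen for illustration, whereas protein searches normally use a substitution matrix such as PAM or BLOSUM with affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b.

    Illustrative scoring only: fixed match/mismatch values and a linear
    gap penalty (all three are assumptions, not values from the paper).
    Uses two rows of the dynamic programming matrix, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: that is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("XXACGTXX", "YYACGTYY"))
```

    The nested loops visit every pair of positions, so the cost grows with the product of the two sequence lengths. This is consistent with the much longer Smith-Waterman search time reported in the abstract compared with the heuristic FASTA search.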

  6. A Guaranteed Similarity Metric Learning Framework for Biological Sequence Comparison.

    PubMed

    Hua, Keru; Yu, Qin; Zhang, Ruiming

    2016-01-01

    Similarity of sequences is a key mathematical notion for classification and phylogenetic studies in biology. The distance and similarity between two sequences are very important and widely studied. Over the last decades, similarity (distance) metric learning has been one of the hottest topics in machine learning and data mining, as well as in their applications to bioinformatics. It is feasible to introduce machine learning technology to learn a similarity metric from biological data. In this paper, we propose a novel framework of guaranteed similarity metric learning (GMSL) to perform alignment of biological sequences in any feature vector space. It introduces the (ϵ, γ, τ)-goodness similarity theory to Mahalanobis metric learning. As a theoretically guaranteed similarity metric learning approach, GMSL guarantees that the learned similarity function performs well in classification and clustering. Our experiments on the most used datasets demonstrate that our approach outperforms the state-of-the-art biological sequence alignment methods and other similarity metric learning algorithms in both accuracy and stability.

  7. Comparison of biological and chemical phosphorus removals in continuous and sequencing batch reactors

    SciTech Connect

    Ketchum, L.H.; Irvine, R.L. Jr.; Breyfogle, R.E.; Manning, J.F. Jr.

    1987-01-01

    A full-scale study of phosphorus removal has been conducted at Culver using continuous-flow operation, SBR operation, and several different chemical treatment schemes. A full-scale demonstration of SBR biological phosphorus removal also has been shown to be effective. Four contributing groups of organisms and their roles in biological SBR phosphorus removal have been described: denitrifying organisms, fermentation product-manufacturing organisms, phosphorus-accumulating organisms, and aerobic autotrophs and heterotrophs. The SBR can provide the proper balance of anoxic, anaerobic, and aerobic conditions to allow these groups of organisms to successfully remove phosphorus biologically, without chemical addition. Treatment results using various chemicals for phosphorus removal, both during conventional, continuous-flow operation and after the plant was converted for SBR operation, have also been provided for comparison. Effluent phosphorus concentrations were almost identical for each period, except for the period when phosphorus was removed biologically and without any chemical addition, when effluent phosphorus concentrations were the lowest. These removals were made as a result of settling alone; no tertiary rapid sand filter was used or required.

  8. Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

    PubMed Central

    Nadkarni, P. M.; Miller, P. L.

    1991-01-01

    A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations. PMID:1807632

  9. Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

    PubMed

    Nadkarni, P M; Miller, P L

    1991-01-01

    A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations.

  10. Iranian johnsongrass mosaic virus: the complete genome sequence, molecular and biological characterization, and comparison of coat protein gene sequences.

    PubMed

    Moradi, Zohreh; Mehrvar, Mohsen; Nazifi, Ehsan; Zakiaghl, Mohammad

    2017-02-01

    Iranian johnsongrass mosaic virus (IJMV) is one of the most prevalent viruses causing maize mosaic disease in Iran. An IJMV isolate, Maz-Bah, was obtained from the maize showing mosaic symptoms in Mazandaran, north of Iran. The complete genomic sequence of Maz-Bah is 9544 nucleotides, excluding the poly(A) tail. It contains one single open reading frame of 9165 nucleotides and encodes a large polyprotein of 3054 amino acids, flanked by a 5'-untranslated region (UTR) of 143 nucleotides and a 3'-UTR of 236 nucleotides. The entire genomic sequence of Maz-Bah isolate shares identities of 84.9 and 94.2 % with the IJMV (Shz) isolate, the lone complete genome sequence available in the GenBank at the nucleotide (nt) and deduced amino acid (aa) levels, respectively. The whole genome sequences share identities of 51.5-69.8 and 44.9-74.3 % with those of other Sugarcane mosaic virus (SCMV) subgroup potyviruses at nt and aa levels, respectively. In phylogenetic trees based on the multiple alignments of the entire nt and aa sequences, IJMV isolates formed a separate sublineage of the tree with potyviruses infecting monocotyledons of cereals, indicating that IJMV is a member of SCMV subgroup of potyviruses. IJMV is most closely related to Sorghum mosaic virus and Maize dwarf mosaic virus and less closely related to the Johnsongrass mosaic virus and Cocksfoot streak virus. To further investigate the genetic relationship of IJMV, 9 other isolates from different hosts were cloned and sequenced. The identity of IJMV CP nt and aa sequences of 11 Iranian isolates ranged from 86.4 to 99.8 % and 90.5 to 99.7 %, respectively, indicating a high nt variability in CP gene. Furthermore, in the CP-based phylogenetic tree, IJMV isolates were clustered together with a maize potyvirus described as Zea mosaic virus from Israel (with 86-89 % nt identity), indicating that both isolates probably are the strains of the same virus.

  11. Zucchini yellow mosaic virus: biological properties, detection procedures and comparison of coat protein gene sequences.

    PubMed

    Coutts, B A; Kehoe, M A; Webster, C G; Wylie, S J; Jones, R A C

    2011-12-01

    Between 2006 and 2010, 5324 samples from at least 34 weed, two cultivated legume and 11 native species were collected from three cucurbit-growing areas in tropical or subtropical Western Australia. Two new alternative hosts of zucchini yellow mosaic virus (ZYMV) were identified, the Australian native cucurbit Cucumis maderaspatanus, and the naturalised legume species Rhyncosia minima. Low-level (0.7%) seed transmission of ZYMV was found in seedlings grown from seed collected from zucchini (Cucurbita pepo) fruit infected with isolate Cvn-1. Seed transmission was absent in >9500 pumpkin (C. maxima and C. moschata) seedlings from fruit infected with isolate Knx-1. Leaf samples from symptomatic cucurbit plants collected from fields in five cucurbit-growing areas in four Australian states were tested for the presence of ZYMV. When 42 complete coat protein (CP) nucleotide (nt) sequences from the new ZYMV isolates obtained were compared to those of 101 complete CP nt sequences from five other continents, phylogenetic analysis of the 143 ZYMV sequences revealed three distinct groups (A, B and C), with four subgroups in A (I-IV) and two in B (I-II). The new Australian sequences grouped according to collection location, fitting within A-I, A-II and B-II. The 16 new sequences from one isolated location in tropical northern Western Australia all grouped into subgroup B-II, which contained no other isolates. In contrast, the three sequences from the Northern Territory fitted into A-II with 94.6-99.0% nt identities with isolates from the United States, Iran, China and Japan. The 23 new sequences from the central west coast and two east coast locations all fitted into A-I, with 95.9-98.9% nt identities to sequences from Europe and Japan. These findings suggest that (i) there have been at least three separate ZYMV introductions into Australia and (ii) there are few changes to local isolate CP sequences following their establishment in remote growing areas. 
Isolates from A-I and B

  12. Bringing Next-Generation Sequencing into the Classroom through a Comparison of Molecular Biology Techniques

    ERIC Educational Resources Information Center

    Bowling, Bethany; Zimmer, Erin; Pyatt, Robert E.

    2014-01-01

    Although the development of next-generation (NextGen) sequencing technologies has revolutionized genomic research and medicine, the incorporation of these topics into the classroom is challenging, given an implied high degree of technical complexity. We developed an easy-to-implement, interactive classroom activity investigating the similarities…

  14. Comparison of the Biolog OmniLog Identification System and 16S ribosomal RNA gene sequencing for accuracy in identification of atypical bacteria of clinical origin.

    PubMed

    Morgan, Megan C; Boyette, Marilyn; Goforth, Chris; Sperry, Katharine Volpe; Greene, Shermalyn R

    2009-12-01

    The Biolog OmniLog Identification System (Biolog) and the 16S ribosomal RNA (rRNA) gene sequencing methods were compared to conventional microbiological methods and evaluated for accuracy of bacterial identification. These methods were evaluated using 159 clinical isolates. Each isolate was initially identified by conventional biochemical tests and morphological characteristics and subsequently placed into one of seven categories: aerobic Actinomycetes, Bacillus, Coryneforms, fastidious Gram-negative rods (GNR), non-fermenting GNR, miscellaneous Gram-positive rods (GPR), and Vibrio/Aeromonas. After comparison to the conventional identification, the Biolog system and 16S rRNA gene sequence identifications were classified as follows: a) correct to the genus and species levels; b) correct to the genus level only; or c) neither (unacceptable) identification. Overall, 16S rRNA gene sequencing had the highest percent accuracy with 90.6% correct identifications, while the Biolog system identified 68.3% of the isolates correctly. For each category, 16S rRNA gene sequencing had a substantially higher percent accuracy compared to the conventional methods. It was determined that the Biolog system is deficient when identifying organisms in the fastidious GNR category (20.0%). The observed data suggest that 16S rRNA gene sequencing provides a more accurate identification of atypical bacteria than the Biolog system.

  15. Nonlinear analysis of biological sequences

    SciTech Connect

    Torney, D.C.; Bruno, W.; Detours, V.

    1998-11-01

    This is the final report of a three-year, Laboratory Directed Research and Development (LDRD) project at the Los Alamos National Laboratory (LANL). The main objectives of this project involved deriving new capabilities for analyzing biological sequences. The authors focused on tabulating the statistical properties exhibited by Human coding DNA sequences and on techniques of inferring the phylogenetic relationships among protein sequences related by descent.

  16. Large-Scale Sequence Comparison.

    PubMed

    Lal, Devi; Verma, Mansi

    2017-01-01

    There are millions of sequences deposited in genomic databases, and it is an important task to categorize them according to their structural and functional roles. Sequence comparison is a prerequisite for proper categorization of both DNA and protein sequences, and helps in assigning a putative or hypothetical structure and function to a given sequence. There are various methods available for comparing sequences, alignment being first and foremost for sequences with a small number of base pairs as well as for large-scale genome comparison. Various tools are available for performing pairwise large sequence comparison. The best known tools either perform global alignment or generate local alignments between the two sequences. In this chapter we first provide basic information regarding sequence comparison. This is followed by the description of the PAM and BLOSUM matrices that form the basis of sequence comparison. We also give a practical overview of currently available methods such as BLAST and FASTA, followed by a description and overview of tools available for genome comparison including LAGAN, MumMER, BLASTZ, and AVID.

  17. Parametric bootstrapping for biological sequence motifs.

    PubMed

    O'Neill, Patrick K; Erill, Ivan

    2016-10-06

    Biological sequence motifs drive the specific interactions of proteins and nucleic acids. Accordingly, the effective computational discovery and analysis of such motifs is a central theme in bioinformatics. Many practical questions about the properties of motifs can be recast as random sampling problems. In this light, the task is to determine for a given motif whether a certain feature of interest is statistically unusual among relevantly similar alternatives. Despite the generality of this framework, its use has been frustrated by the difficulties of defining an appropriate reference class of motifs for comparison and of sampling from it effectively. We define two distributions over the space of all motifs of given dimension. The first is the maximum entropy distribution subject to mean information content, and the second is the truncated uniform distribution over all motifs having information content within a given interval. We derive exact sampling algorithms for each. As a proof of concept, we employ these sampling methods to analyze a broad collection of prokaryotic and eukaryotic transcription factor binding site motifs. In addition to positional information content, we consider the informational Gini coefficient of the motif, a measure of the degree to which information is evenly distributed throughout a motif's positions. We find that both prokaryotic and eukaryotic motifs tend to exhibit higher informational Gini coefficients (IGC) than would be expected by chance under either reference distribution. As a second application, we apply maximum entropy sampling to the motif p-value problem and use it to give elementary derivations of two new estimators. Despite the historical centrality of biological sequence motif analysis, this study constitutes to our knowledge the first use of principled null hypotheses for sequence motifs given information content. 
Through their use, we are able to characterize for the first time differences in global motif statistics

  18. Comparison of Next-Generation Sequencing Versus Biological Indexing for the Optimal Detection of Viral Pathogens in Grapevine.

    PubMed

    Al Rwahnih, Maher; Daubert, Steve; Golino, Deborah; Islas, Christina; Rowhani, Adib

    2015-06-01

    A bioassay is routinely used to determine the viral phytosanitary status of commercial grapevine propagation material in many countries around the world. That test is based on the symptoms developed in the field by specific indicator host plants that are graft-inoculated from the vines being tested. We compared the bioassay against next-generation sequencing (NGS) analysis of grapevine material. NGS is a laboratory procedure that catalogs the genomic sequences of the viruses and other pathogens extracted as DNA and RNA from infected vines. NGS analysis was found to be superior to the standard bioassay in detection of viruses of agronomic significance, including virus infections at low titers. NGS was also found to be superior to the bioassay in its comprehensiveness, the speed of its analysis, and for the discovery of novel, uncharacterized viruses.

  19. Information theory applications for biological sequence analysis.

    PubMed

    Vinga, Susana

    2014-05-01

    Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology. In particular, alignment-free sequence analysis and comparison greatly benefited from concepts derived from IT, such as entropy and mutual information. This review covers several aspects of IT applications, ranging from genome global analysis and comparison, including block-entropy estimation and resolution-free metrics based on iterative maps, to local analysis, comprising the classification of motifs, prediction of transcription factor binding sites and sequence characterization based on linguistic complexity and entropic profiles. IT has also been applied to high-level correlations that combine DNA, RNA or protein features with sequence-independent properties, such as gene mapping and phenotype analysis, and has also provided models based on communication systems theory to describe information transmission channels at the cell level and also during evolutionary processes. While not exhaustive, this review attempts to categorize existing methods and to indicate their relation with broader transversal topics such as genomic signatures, data compression and complexity, time series analysis and phylogenetic classification, providing a resource for future developments in this promising area.
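
    As a small illustration of the entropy concept mentioned in this review, the sketch below computes the Shannon entropy of the k-mer distribution of a sequence, one of the simplest alignment-free statistics. The function name and the default k are assumptions made here, and this plug-in estimate is not one of the specific block-entropy estimators the review covers.

```python
import math
from collections import Counter


def kmer_entropy(seq, k=1):
    """Shannon entropy (in bits) of the overlapping k-mer frequencies in seq.

    Plug-in estimate from observed counts; returns 0.0 when the sequence
    is shorter than k and therefore contains no k-mers.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    print(kmer_entropy("ACGT" * 5))       # uniform base composition
    print(kmer_entropy("AAAAAAAAAA"))     # single repeated base
```

    A sequence with uniform base composition reaches the maximum of 2 bits per base for DNA at k = 1, while a homopolymer gives zero.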

  20. Function-Based Algorithms for Biological Sequences

    ERIC Educational Resources Information Center

    Mohanty, Pragyan Sheela P.

    2015-01-01

    Two problems at two different abstraction levels of computational biology are studied. At the molecular level, efficient pattern matching algorithms in DNA sequences are presented. For gene order data, an efficient data structure is presented capable of storing all gene re-orderings in a systematic manner. A common characteristic of presented…

  2. Fungal genome sequencing: basic biology to biotechnology.

    PubMed

    Sharma, Krishna Kant

    2016-08-01

    The genome sequences provide a first glimpse into the genomic basis of the biological diversity of filamentous fungi and yeast. The genome sequence of the budding yeast, Saccharomyces cerevisiae, with a small genome size, unicellular growth, and rich history of genetic and molecular analyses was a milestone of early genomics in the 1990s. The subsequent completion of fission yeast, Schizosaccharomyces pombe and genetic model, Neurospora crassa initiated a revolution in the genomics of the fungal kingdom. In due course of time, a substantial number of fungal genomes have been sequenced and publicly released, representing the widest sampling of genomes from any eukaryotic kingdom. An ambitious genome-sequencing program provides a wealth of data on metabolic diversity within the fungal kingdom, thereby enhancing research into medical science, agriculture science, ecology, bioremediation, bioenergy, and the biotechnology industry. Fungal genomics have higher potential to positively affect human health, environmental health, and the planet's stored energy. With a significant increase in sequenced fungal genomes, the known diversity of genes encoding organic acids, antibiotics, enzymes, and their pathways has increased exponentially. Currently, over a hundred fungal genome sequences are publicly available; however, no inclusive review has been published. This review is an initiative to address the significance of the fungal genome-sequencing program and provides the road map for basic and applied research.

  3. Reading biological processes from nucleotide sequences

    NASA Astrophysics Data System (ADS)

    Murugan, Anand

    Cellular processes have traditionally been investigated by techniques of imaging and biochemical analysis of the molecules involved. The recent rapid progress in our ability to manipulate and read nucleic acid sequences gives us direct access to the genetic information that directs and constrains biological processes. While sequence data is being used widely to investigate genotype-phenotype relationships and population structure, here we use sequencing to understand biophysical mechanisms. We present work on two different systems. First, in chapter 2, we characterize the stochastic genetic editing mechanism that produces diverse T-cell receptors in the human immune system. We do this by inferring statistical distributions of the underlying biochemical events that generate T-cell receptor coding sequences from the statistics of the observed sequences. This inferred model quantitatively describes the potential repertoire of T-cell receptors that can be produced by an individual, providing insight into its potential diversity and the probability of generation of any specific T-cell receptor. Then in chapter 3, we present work on understanding the functioning of regulatory DNA sequences in both prokaryotes and eukaryotes. Here we use experiments that measure the transcriptional activity of large libraries of mutagenized promoters and enhancers and infer models of the sequence-function relationship from this data. For the bacterial promoter, we infer a physically motivated 'thermodynamic' model of the interaction of DNA-binding proteins and RNA polymerase determining the transcription rate of the downstream gene. For the eukaryotic enhancers, we infer heuristic models of the sequence-function relationship and use these models to find synthetic enhancer sequences that optimize inducibility of expression. 
Both projects demonstrate the utility of sequence information in conjunction with sophisticated statistical inference techniques for dissecting underlying biophysical

  4. The computational linguistics of biological sequences

    SciTech Connect

    Searls, D.

    1995-12-31

    This tutorial was one of eight tutorials selected to be presented at the Third International Conference on Intelligent Systems for Molecular Biology which was held in the United Kingdom from July 16 to 19, 1995. Protein sequences are analogous in many respects, particularly their folding behavior. Proteins have a much richer variety of interactions, but in theory the same linguistic principles could come to bear in describing dependencies between distant residues that arise by virtue of three-dimensional structure. This tutorial will concentrate on nucleic acid sequences.

  5. Protein sequence comparison and protein evolution

    SciTech Connect

    Pearson, W.R.

    1995-12-31

    This tutorial was one of eight tutorials selected to be presented at the Third International Conference on Intelligent Systems for Molecular Biology which was held in the United Kingdom from July 16 to 19, 1995. This tutorial examines how the information conserved during the evolution of a protein molecule can be used to reliably infer homology, and thus a shared protein fold and possibly a shared active site or function. The authors start by reviewing a geological/evolutionary time scale. Next they look at the evolution of several protein families. During the tutorial, these families will be used to demonstrate that homologous protein ancestry can be inferred with confidence. They also examine different modes of protein evolution and consider some hypotheses that have been presented to explain the very earliest events in protein evolution. The next part of the tutorial will examine the technical aspects of protein sequence comparison. Both optimal and heuristic algorithms and their associated parameters that are used to characterize protein sequence similarities are discussed. Perhaps more importantly, they survey the statistics of local similarity scores, and how these statistics can be used both to improve the selectivity of a search and to evaluate the significance of a match. They then examine distantly related members of three protein families, the serine proteases, the glutathione transferases, and the G-protein-coupled receptors (GCRs). Finally, they discuss how sequence similarity can be used to examine internal repeated or mosaic structures in proteins.

  6. A natural M RNA reassortant arising from two species of plant- and insect-infecting bunyaviruses and comparison of its sequence and biological properties to parental species.

    PubMed

    Webster, Craig G; Reitz, Stuart R; Perry, Keith L; Adkins, Scott

    2011-05-10

    Reassortment allows multicomponent viruses to exchange genome segments, a process well-documented in the vertebrate- and arthropod-infecting members of the family Bunyaviridae but not between distinct species of the plant- and insect-infecting members of the genus Tospovirus. Genome sequence comparisons of a virus causing severe tospovirus-like symptoms in Florida tomato with Groundnut ringspot virus (GRSV) and Tomato chlorotic spot virus (TCSV) demonstrated that reassortment has occurred, with the large (L) and small (S) RNAs coming from GRSV and the medium (M) RNA coming from TCSV (i.e. L(G)M(T)S(G)). Neither parental genotype is known to occur in the U.S. suggesting that L(G)M(T)S(G) was introduced as a reassortant. L(G)M(T)S(G) was transmitted by western flower thrips (Frankliniella occidentalis [Pergande]), and was not able to overcome the Sw5 resistance gene of tomato. Our demonstration of reassortment between GRSV and TCSV suggests caution in defining species within the family Bunyaviridae based on their ability to reassort.

  7. Sequence comparisons via algorithmic mutual information

    SciTech Connect

    Milosavijevic, A.

    1994-12-31

    One of the main problems in DNA and protein sequence comparisons is to decide whether observed similarity of two sequences should be explained by their relatedness or by mere presence of some shared internal structure, e.g., shared internal tandem repeats. The standard methods that are based on statistics or classical information theory can be used to discover either internal structure or mutual sequence similarity, but cannot take into account both. Consequently, currently used methods for sequence comparison employ "masking" techniques that simply eliminate sequences that exhibit internal repetitive structure prior to sequence comparisons. The "masking" approach precludes discovery of homologous sequences of moderate or low complexity, which abound at both DNA and protein levels. As a solution to this problem, we propose a general method that is based on algorithmic information theory and minimal length encoding. We show that algorithmic mutual information factors out the sequence similarity that is due to shared internal structure and thus enables discovery of truly related sequences. We extend the recently developed algorithmic significance method to show that significance depends exponentially on algorithmic mutual information.
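
Illustrative sketch (not part of the cited record): algorithmic mutual information is usually written, up to logarithmic terms, as I(x:y) = K(x) + K(y) - K(x,y), where K is description length. Since K is not computable, the sketch below assumes a general-purpose compressor (zlib) as a crude stand-in for the minimal length encoding the abstract refers to; the function names are hypothetical and the estimate is only a rough proxy.

```python
import zlib


def clen(s: str) -> int:
    """Compressed size in bytes: a crude upper bound on description length."""
    return len(zlib.compress(s.encode("ascii"), 9))


def mutual_information_estimate(x: str, y: str) -> int:
    """Approximate I(x:y) as C(x) + C(y) - C(xy), with C = zlib compressed size.

    Assumption: zlib stands in for a proper minimal-length encoder.
    """
    return clen(x) + clen(y) - clen(x + y)
```

Under these assumptions, a sequence paired with a lightly mutated copy of itself should give a much larger estimate than the same sequence paired with an unrelated one, because the compressor can encode the copy by referring back to the original.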

  8. Discovering New Biology through Sequencing of RNA.

    PubMed

    Weber, Andreas P M

    2015-11-01

    Sequencing of RNA (RNA-Seq) was invented approximately 1 decade ago and has since revolutionized biological research. This update provides a brief historic perspective on the development of RNA-Seq and then focuses on the application of RNA-Seq in qualitative and quantitative analyses of transcriptomes. Particular emphasis is given to aspects of data analysis. Since the wet-lab and data analysis aspects of RNA-Seq are still rapidly evolving and novel applications are continuously reported, a printed review will be rapidly outdated and can only serve to provide some examples and general guidelines for planning and conducting RNA-Seq studies. Hence, selected references to frequently updated online resources are given. © 2015 American Society of Plant Biologists. All Rights Reserved.

  9. Multiple alignment-free sequence comparison

    PubMed Central

    Ren, Jie; Song, Kai; Sun, Fengzhu; Deng, Minghua; Reinert, Gesine

    2013-01-01

    Motivation: Recently, a range of new statistics have become available for the alignment-free comparison of two sequences based on k-tuple word content. Here, we extend these statistics to the simultaneous comparison of more than two sequences. Our suite of statistics contains, first, and , extensions of statistics for pairwise comparison of the joint k-tuple content of all the sequences, and second, , and , averages of sums of pairwise comparison statistics. The two tasks we consider are, first, to identify sequences that are similar to a set of target sequences, and, second, to measure the similarity within a set of sequences. Results: Our investigation uses both simulated data as well as cis-regulatory module data where the task is to identify cis-regulatory modules with similar transcription factor binding sites. We find that although for real data, all of our statistics show a similar performance, on simulated data the Shepp-type statistics are in some instances outperformed by star-type statistics. The multiple alignment-free statistics are more sensitive to contamination in the data than the pairwise average statistics. Availability: Our implementation of the five statistics is available as an R package named ‘multiAlignFree’ at http://www-rcf.usc.edu/∼fsun/Programs/multiAlignFree/multiAlignFreemain.html. Contact: reinert@stats.ox.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23990418

  10. Intra-species sequence comparisons for annotating genomes

    SciTech Connect

    Boffelli, Dario; Weer, Claire V.; Weng, Li; Lewis, Keith D.; Shoukry, Malak I.; Pachter, Lior; Keys, David N.; Rubin, Edward M.

    2004-07-15

    Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its high rate of allelic polymorphism and ease of genetic manipulability, we chose the sea squirt, Ciona intestinalis, to explore intra-species sequence comparisons for genome annotation. A large number of C. intestinalis specimens were collected from four continents and a set of genomic intervals amplified, resequenced and analyzed to determine the mutation rates at each nucleotide in the sequence. We found that regions with low mutation rates efficiently demarcated functionally constrained sequences: these include a set of noncoding elements, which we showed in C. intestinalis transgenic assays to act as tissue-specific enhancers, as well as the location of coding sequences. This illustrates that comparisons of multiple members of a species can be used for genome annotation, suggesting a path for the annotation of the sequenced genomes of organisms occupying uncharacterized phylogenetic branches of the animal kingdom and raises the possibility that the resequencing of a large number of Homo sapiens individuals might be used to annotate the human genome and identify sequences defining traits unique to our species. The sequence data from this study has been submitted to GenBank under accession nos. AY667278-AY667407.

  11. Alignment-free sequence comparison based on next-generation sequencing reads.

    PubMed

    Song, Kai; Ren, Jie; Zhai, Zhiyuan; Liu, Xuemei; Deng, Minghua; Sun, Fengzhu

    2013-02-01

    Next-generation sequencing (NGS) technologies have generated enormous amounts of shotgun read data, and assembly of the reads can be challenging, especially for organisms without template sequences. We study the power of genome comparison based on shotgun read data without assembly using three alignment-free sequence comparison statistics, D(2), D(*)(2) and D(s)(2), both theoretically and by simulations. Theoretical formulas for the power of detecting the relationship between two sequences related through a common motif model are derived. It is shown that both D(*)(2) and D(s)(2), outperform D(2) for detecting the relationship between two sequences based on NGS data. We then study the effects of length of the tuple, read length, coverage, and sequencing error on the power of D(*)(2) and D(s)(2). Finally, variations of these statistics, d(2), d(*)(2) and d(s)(2), respectively, are used to first cluster five mammalian species with known phylogenetic relationships, and then cluster 13 tree species whose complete genome sequences are not available using NGS shotgun reads. The clustering results using d(s)(2) are consistent with biological knowledge for the 5 mammalian and 13 tree species, respectively. Thus, the statistic d(s)(2) provides a powerful alignment-free comparison tool to study the relationships among different organisms based on NGS read data without assembly.
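
Illustrative sketch (not part of the cited record): the basic D2 statistic is the inner product of the k-tuple count vectors of two sequences, i.e. the number of k-tuple matches between them. The code below shows only that uncentered D2 on two plain strings; the centered and normalized variants named in the abstract, and the pooling of counts over shotgun reads, are not reproduced. Function names are hypothetical.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-tuple (k-mer) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a: str, seq_b: str, k: int) -> int:
    """D2 = sum over words w of count_a(w) * count_b(w)."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Words absent from counts_b contribute zero (Counter returns 0 for them).
    return sum(n * counts_b[word] for word, n in counts_a.items())
```

For example, d2("ACGT", "ACGT", 2) counts the three shared 2-tuples AC, CG and GT once each, while two sequences with no 2-tuple in common score zero.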

  12. Mining Contiguous Sequential Generators in Biological Sequences.

    PubMed

    Zhang, Jingsong; Wang, Yinglin; Zhang, Chao; Shi, Yongyong

    2016-01-01

    The discovery of conserved sequential patterns in biological sequences is essential to unveiling common shared functions. Mining sequential generators as well as mining closed sequential patterns can contribute to a more concise result set than mining all sequential patterns, especially in the analysis of big data in bioinformatics. Previous studies have also presented convincing arguments that the generator is preferable to the closed pattern in inductive inference and classification. However, classic sequential generator mining algorithms, due to the lack of consideration on the contiguous constraint along with the lower-closed one, still pose a great challenge at spawning a large number of inefficient and redundant patterns, which is too huge for effective usage. Driven by some extensive applications of patterns with contiguous feature, we propose ConSgen, an efficient algorithm for discovering contiguous sequential generators. It adopts the n-gram model, called shingles, to generate potential frequent subsequences and leverages several pruning techniques to prune the unpromising parts of search space. And then, the contiguous sequential generators are identified by using the equivalence class-based lower-closure checking scheme. Our experiments on both DNA and protein data sets demonstrate the compactness, efficiency, and scalability of ConSgen.

  13. Metabolic pathways variability and sequence/networks comparisons

    PubMed Central

    Tun, Kyaw; Dhar, Pawan K; Palumbo, Maria Concetta; Giuliani, Alessandro

    2006-01-01

    Background In this work a simple method for the computation of relative similarities between homologous metabolic network modules is presented. The method is similar to classical sequence alignment and allows for the generation of phenotypic trees amenable to be compared with correspondent sequence based trees. The procedure can be applied to both single metabolic modules and whole metabolic network data without the need of any specific assumption. Results We demonstrate both the ability of the proposed method to build reliable biological classification of a set of microorganisms and the strong correlation between the metabolic network wiring and involved enzymes sequence space. Conclusion The method represents a valuable tool for the investigation of genotype/phenotype correlations, allowing for a direct comparison of different species as for their metabolic machinery. In addition the detection of enzymes whose sequence space is maximally correlated with the metabolic network space gives an indication of the most crucial (on an evolutionary viewpoint) steps of the metabolic process. PMID:16420696

  14. SeqCompress: an algorithm for biological sequence compression.

    PubMed

    Sardaraz, Muhammad; Tahir, Muhammad; Ikram, Ataul Aziz; Bajwa, Hassan

    2014-10-01

    The growth of Next Generation Sequencing technologies presents significant research challenges, specifically to design bioinformatics tools that handle massive amount of data efficiently. Biological sequence data storage cost has become a noticeable proportion of total cost in the generation and analysis. Particularly increase in DNA sequencing rate is significantly outstripping the rate of increase in disk storage capacity, which may go beyond the limit of storage capacity. It is essential to develop algorithms that handle large data sets via better memory management. This article presents a DNA sequence compression algorithm SeqCompress that copes with the space complexity of biological sequences. The algorithm is based on lossless data compression and uses statistical model as well as arithmetic coding to compress DNA sequences. The proposed algorithm is compared with recent specialized compression tools for biological sequences. Experimental results show that proposed algorithm has better compression gain as compared to other existing algorithms.

  15. Extracting biological knowledge from DNA sequences

    SciTech Connect

    De La Vega, F.M.; Thieffry, D. |; Collado-Vides, J.

    1996-12-31

    This session describes the elucidation of information from DNA sequences and what challenges computational biologists face in their task of summarizing and deciphering the human genome. Techniques discussed include methods from statistics, information theory, artificial intelligence and linguistics. 1 ref.

  16. A probabilistic measure for alignment-free sequence comparison.

    PubMed

    Pham, Tuan D; Zuegg, Johannes

    2004-12-12

    Alignment-free sequence comparison methods are still in the early stages of development compared to those of alignment-based sequence analysis. In this paper, we introduce a probabilistic measure of similarity between two biological sequences without alignment. The method is based on the concept of comparing the similarity/dissimilarity between two constructed Markov models. The method was tested against six DNA sequences, which are the thrA, thrB and thrC genes of the threonine operons from Escherichia coli K-12 and from Shigella flexneri; and one random sequence having the same base composition as thrA from E.coli. These results were compared with those obtained from CLUSTAL W algorithm (alignment-based) and the chaos game representation (alignment-free). The method was further tested against a more complex set of 40 DNA sequences and compared with other existing sequence similarity measures (alignment-free). All datasets and computer codes written in MATLAB are available upon request from the first author.

  17. Sequence analysis in molecular biology: Treasure trove or trivial pursuit

    SciTech Connect

    Von Heijne, G.

    1987-01-01

    This book deals with sequence analysis on the computer. One of its aims is to serve as a brief survey of what one can do with protein and DNA sequences either directly on a microcomputer or by using one of the main sequence/programs data banks such as BioNet or the Wisconsin package. Equally important, the book traces the origins of some of the ideas that have come to be embodied in these programs from both biological and methodological points of view: What do the standard sequence analysis algorithms really analyze, and to what degree can we trust their outputs.

  18. Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

    PubMed

    Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo

    2016-07-19

    Computing alignments between two or more sequences are common operations frequently performed in computational molecular biology. The continuing growth of biological sequence databases establishes the need for their efficient parallel implementation on modern accelerators. This paper presents new approaches to high performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture; i.e. cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi .
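
Illustrative sketch (not part of the cited record): the Smith-Waterman recurrence that such database scans evaluate can be written in a few lines. The version below is a plain, sequential, score-only form with a linear gap penalty and made-up match/mismatch values; these parameters are assumptions for illustration, and it does not attempt the vectorized, multi-level parallel scheme the abstract describes. Each inner-loop iteration is one "cell update", the unit counted by the GCUPS figure.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gap penalty).

    Scoring values are illustrative assumptions, not from the cited work.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Keeping only two rows makes memory linear in the length of b, which is sufficient when only the score, not the alignment itself, is needed.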

  19. Efficient Mining of Interesting Patterns in Large Biological Sequences

    PubMed Central

    Rashid, Md. Mamunur; Karim, Md. Rezaul; Jeong, Byeong-Soo

    2012-01-01

    Pattern discovery in biological sequences (e.g., DNA sequences) is one of the most challenging tasks in computational biology and bioinformatics. So far, in most approaches, the number of occurrences is a major measure of determining whether a pattern is interesting or not. In computational biology, however, a pattern that is not frequent may still be considered very informative if its actual support frequency exceeds the prior expectation by a large margin. In this paper, we propose a new interesting measure that can provide meaningful biological information. We also propose an efficient index-based method for mining such interesting patterns. Experimental results show that our approach can find interesting patterns within an acceptable computation time. PMID:23105928

  20. Single cell sequencing approaches for complex biological systems.

    PubMed

    Baslan, Timour; Hicks, James

    2014-06-01

    Biological phenotype is the output of complex interactions between heterogeneous cells within a specified niche. These interactions are tightly governed and regulated by the genetic, epigenetic, and transcriptional states of single cells, with deregulation of these states resulting in disease. As such, genome wide single cell investigations are bound to enhance our knowledge of the underlying principles that govern biological systems. Recent technological advances have enabled such investigations in the form of single-cell sequencing. Here, we review the most recent developments in genome wide profiling of single cells, discuss some of the novel biological observations gleaned by such investigations, and touch upon the promise of single cell sequencing in unraveling biological systems.

  1. Using cellular automata to generate image representation for biological sequences.

    PubMed

    Xiao, X; Shao, S; Ding, Y; Huang, Z; Chen, X; Chou, K-C

    2005-02-01

    A novel approach to visualize biological sequences is developed based on cellular automata (Wolfram, S. Nature 1984, 311, 419-424), a set of discrete dynamical systems in which space and time are discrete. By transforming the symbolic sequence codes into the digital codes, and using some optimal space-time evolvement rules of cellular automata, a biological sequence can be represented by a unique image, the so-called cellular automata image. Many important features, which are originally hidden in a long and complicated biological sequence, can be clearly revealed through its cellular automata image. With biological sequences entering into databanks rapidly increasing in the post-genomic era, it is anticipated that the cellular automata image will become a very useful vehicle for investigation into their key features, identification of their function, as well as revelation of their "fingerprint". It is anticipated that by using the concept of the pseudo amino acid composition (Chou, K.C. Proteins: Structure, Function, and Genetics, 2001, 43, 246-255), the cellular automata image approach can also be used to improve the quality of predicting protein attributes, such as structural class and subcellular location.

  2. Bioinformatics comparison of sulfate-reducing metabolism nucleotide sequences

    NASA Astrophysics Data System (ADS)

    Tremberger, G.; Dehipawala, Sunil; Nguyen, A.; Cheung, E.; Sullivan, R.; Holden, T.; Lieberman, D.; Cheung, T.

    2015-09-01

    The sulfate-reducing bacteria can be traced back to 3.5 billion years ago. The thermodynamics details of the sulfur cycle have been well documented. A recent sulfate-reducing bacteria report (Robator, Jungbluth, et al., 2015 Jan, Front. Microbiol) with GenBank nucleotide data has been analyzed in terms of the sulfite reductase (dsrAB) via fractal dimension and entropy values. Comparison to oil field sulfate-reducing sequences was included. The AUCG translational mass fractal dimension versus ATCG transcriptional mass fractal dimension for the low temperature dsrB and dsrA sequences reported in Reference Thirteen shows correlation R-sq ~ 0.79, with a probability of about 3% in simulation. A recent report of using Cystathionine gamma-lyase sequence to produce CdS quantum dot in a biological method, where the sulfur is reduced just like in the H2S production process, was included for comparison. The AUCG mass fractal dimension versus ATCG mass fractal dimension for the Cystathionine gamma-lyase sequences was found to have R-sq of 0.72, similar to the low temperature dissimilatory sulfite reductase dsr group with 3% probability, in contrast to the oil field group having R-sq ~ 0.94, a highly probable outcome in the simulation. The other two simulation histograms, namely, fractal dimension versus entropy R-sq outcome values, and di-nucleotide entropy versus mono-nucleotide entropy R-sq outcome values are also discussed in the data analysis focusing on low probability outcomes.

  3. Identifying features in biological sequences: Sixth workshop report

    SciTech Connect

    Burks, C.; Myers, E.; Pearson, W.R.

    1995-12-31

    This report covers the sixth of an annual series of workshops held at the Aspen Center for Physics concentrating particularly on the identification of features in DNA sequence, and more broadly on related topics in computational molecular biology. The workshop series originally focused primarily on discussion of current needs and future strategies for identifying and predicting the presence of complex functional units on sequenced, but otherwise uncharacterized, genomic DNA. We addressed the need for computationally-based, automatic tools for synthesizing available data about individual consensus sequences and local compositional patterns into the composite objects (e.g., genes) that are -- as composite entities -- the true object of interest when scanning DNA sequences. The workshop was structured to promote sustained informal contact and exchange of expertise between molecular biologists, computer scientists, and mathematicians.

  4. A parallel computing approach to genetic sequence comparison: the master-worker paradigm with interworker communication.

    PubMed

    Sittig, D F; Foulser, D; Carriero, N; McCorkle, G; Miller, P L

    1991-04-01

    We have implemented a parallel version of a dynamic programming biological sequence comparison algorithm to study the potential applicability of using parallel computers for genetic sequence comparisons. Our parallel program is built using C-Linda, a machine-independent parallel programming language, and was tested on both a 10 CPU Sequent Symmetry and a 64 CPU Intel Hypercube. C-Linda implements a shared associative memory model, "tuple space," through which multiple processes can communicate and coordinate control. In our master-worker (MW) parallel implementation, a master process creates several worker processes, extracts a test sequence and multiple library sequences from a database and stores them in tuple space. Each worker reads the test sequence and then repeatedly extracts library strings from tuple space, performs pairwise sequence comparison using a local comparison algorithm to generate a similarity score, and returns the similarity scores to tuple space. The master collects the scores from tuple space and identifies the best match over all library sequences. We also implemented a method of global interworker communication to reduce the total search time by stopping those string comparisons that had no chance of improving on the current best match. Comparisons of the total run time, speedup, and efficiency were made for parallel and sequential versions of a basic MW implementation as well as versions with the global abort threshold.
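
Illustrative sketch (not part of the cited record): the basic master-worker pattern described above can be approximated with a thread pool, where the pool's task queue plays roughly the role of tuple space. This is an assumption-laden stand-in, not C-Linda: the similarity score is simply the length of the longest common substring, the function names are hypothetical, and the global abort threshold is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher


def similarity(test_seq: str, lib_seq: str) -> int:
    """Stand-in score: length of the longest common substring."""
    matcher = SequenceMatcher(None, test_seq, lib_seq, autojunk=False)
    return matcher.find_longest_match(0, len(test_seq), 0, len(lib_seq)).size


def best_match(test_seq: str, library: list, workers: int = 4) -> tuple:
    """Master: hand library sequences to workers, collect scores, keep the best."""
    if not library:
        raise ValueError("library must contain at least one sequence")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker scores one library sequence against the test sequence.
        scores = list(pool.map(lambda s: similarity(test_seq, s), library))
    best = max(range(len(library)), key=scores.__getitem__)
    return library[best], scores[best]
```

In CPython, threads do not run CPU-bound Python code in parallel, so a process pool would be the natural substitution for real speedup; threads are used here only to keep the sketch short.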

  5. Comparison of Next-Generation Sequencing Systems

    PubMed Central

    Liu, Lin; Li, Yinhu; Li, Siliang; Hu, Ni; He, Yimin; Pong, Ray; Lin, Danni; Lu, Lihua; Law, Maggie

    2012-01-01

    With fast development and wide applications of next-generation sequencing (NGS) technologies, genomic sequence information is within reach to aid the achievement of goals to decode life mysteries, make better crops, detect pathogens, and improve life qualities. NGS systems are typically represented by SOLiD/Ion Torrent PGM from Life Sciences, Genome Analyzer/HiSeq 2000/MiSeq from Illumina, and GS FLX Titanium/GS Junior from Roche. Beijing Genomics Institute (BGI), which possesses the world's biggest sequencing capacity, has multiple NGS systems including 137 HiSeq 2000, 27 SOLiD, one Ion Torrent PGM, one MiSeq, and one 454 sequencer. We have accumulated extensive experience in sample handling, sequencing, and bioinformatics analysis. In this paper, technologies of these systems are reviewed, and first-hand data from extensive experience is summarized and analyzed to discuss the advantages and specifics associated with each sequencing system. At last, applications of NGS are summarized. PMID:22829749

  6. Legume genomics: understanding biology through DNA and RNA sequencing

    PubMed Central

    O'Rourke, Jamie A.; Bolon, Yung-Tsi; Bucciarelli, Bruna; Vance, Carroll P.

    2014-01-01

    Background The legume family (Leguminosae) consists of approx. 17 000 species. A few of these species, including, but not limited to, Phaseolus vulgaris, Cicer arietinum and Cajanus cajan, are important dietary components, providing protein for approx. 300 million people worldwide. Additional species, including soybean (Glycine max) and alfalfa (Medicago sativa), are important crops utilized mainly in animal feed. In addition, legumes are important contributors to biological nitrogen, forming symbiotic relationships with rhizobia to fix atmospheric N2 and providing up to 30 % of available nitrogen for the next season of crops. The application of high-throughput genomic technologies including genome sequencing projects, genome re-sequencing (DNA-seq) and transcriptome sequencing (RNA-seq) by the legume research community has provided major insights into genome evolution, genomic architecture and domestication. Scope and Conclusions This review presents an overview of the current state of legume genomics and explores the role that next-generation sequencing technologies play in advancing legume genomics. The adoption of next-generation sequencing and implementation of associated bioinformatic tools has allowed researchers to turn each species of interest into their own model organism. To illustrate the power of next-generation sequencing, an in-depth overview of the transcriptomes of both soybean and white lupin (Lupinus albus) is provided. The soybean transcriptome focuses on analysing seed development in two near-isogenic lines, examining the role of transporters, oil biosynthesis and nitrogen utilization. The white lupin transcriptome analysis examines how phosphate deficiency alters gene expression patterns, inducing the formation of cluster roots. Such studies illustrate the power of next-generation sequencing and bioinformatic analyses in elucidating the gene networks underlying biological processes. PMID:24769535

  7. New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing

    PubMed Central

    Song, Kai; Ren, Jie; Reinert, Gesine; Deng, Minghua

    2014-01-01

    With the development of next-generation sequencing (NGS) technologies, a large amount of short read data has been generated. Assembly of these short reads can be challenging for genomes and metagenomes without template sequences, making alignment-based genome sequence comparison difficult. In addition, sequence reads from NGS can come from different regions of various genomes and they may not be alignable. Sequence signature-based methods for genome comparison based on the frequencies of word patterns in genomes and metagenomes can potentially be useful for the analysis of short reads data from NGS. Here we review the recent development of alignment-free genome and metagenome comparison based on the frequencies of word patterns with emphasis on the dissimilarity measures between sequences, the statistical power of these measures when two sequences are related and the applications of these measures to NGS data. PMID:24064230

  8. Bioclojure: a functional library for the manipulation of biological sequences

    PubMed Central

    Plieskatt, Jordan; Rinaldi, Gabriel; Brindley, Paul J.; Jia, Xinying; Potriquet, Jeremy; Bethony, Jeffrey; Mulvenna, Jason

    2014-01-01

    Motivation: BioClojure is an open-source library for the manipulation of biological sequence data written in the language Clojure. BioClojure aims to provide a functional framework for the processing of biological sequence data that provides simple mechanisms for concurrency and lazy evaluation of large datasets. Results: BioClojure provides parsers and accessors for a range of biological sequence formats, including UniProtXML, Genbank XML, FASTA and FASTQ. In addition, it provides wrappers for key analysis programs, including BLAST, SignalP, TMHMM and InterProScan, and parsers for analyzing their output. All interfaces leverage Clojure’s functional style and emphasize laziness and composability, so that BioClojure, and user-defined, functions can be chained into simple pipelines that are thread-safe and seamlessly integrate lazy evaluation. Availability and implementation: BioClojure is distributed under the Lesser GPL, and the source code is freely available from GitHub (https://github.com/s312569/clj-biosequence). Contact: jason.mulvenna@qimrberghofer.edu.au or jason.mulvenna@qimr.edu.au PMID:24794932

  9. Next-Generation Sequencing: From Understanding Biology to Personalized Medicine

    PubMed Central

    Frese, Karen S.; Katus, Hugo A.; Meder, Benjamin

    2013-01-01

    Within just a few years, the new methods for high-throughput next-generation sequencing have generated completely novel insights into the heritability and pathophysiology of human disease. In this review, we wish to highlight the benefits of the current state-of-the-art sequencing technologies for genetic and epigenetic research. We illustrate how these technologies help to constantly improve our understanding of genetic mechanisms in biological systems and summarize the progress made so far. This can be exemplified by the case of heritable heart muscle diseases, so-called cardiomyopathies. Here, next-generation sequencing is able to identify novel disease genes, and first clinical applications demonstrate the successful translation of this technology into personalized patient care. PMID:24832667

  10. Draft Sequencing and Comparative Genomics of Xylella fastidiosa Strains Reveal Novel Biological Insights

    PubMed Central

    Bhattacharyya, Anamitra; Stilwagen, Stephanie; Reznik, Gary; Feil, Helene; Feil, William S.; Anderson, Iain; Bernal, Axel; D'Souza, Mark; Ivanova, Natalia; Kapatral, Vinayak; Larsen, Niels; Los, Tamara; Lykidis, Athanasios; Selkov, Eugene; Walunas, Theresa L.; Purcell, Alexander; Edwards, Rob A.; Hawkins, Trevor; Haselkorn, Robert; Overbeek, Ross; Kyrpides, Nikos C.; Predki, Paul F.

    2002-01-01

    Draft sequencing is a rapid and efficient method for determining the near-complete sequence of microbial genomes. Here we report a comparative analysis of one complete and two draft genome sequences of the phytopathogenic bacterium, Xylella fastidiosa, which causes serious disease in plants, including citrus, almond, and oleander. We present highlights of an in silico analysis based on a comparison of reconstructions of core biological subsystems. Cellular pathway reconstructions have been used to identify a small number of genes, which are likely to reside within the draft genomes but are not captured in the draft assembly. These represented only a small fraction of all genes and were predominantly large and small ribosomal subunit protein components. By using this approach, some of the inherent limitations of draft sequence can be significantly reduced. Despite the incomplete nature of the draft genomes, it is possible to identify several phage-related genes, which appear to be absent from the draft genomes and not the result of insufficient sequence sampling. This region may therefore identify potential host-specific functions. Based on this first functional reconstruction of a phytopathogenic microbe, we spotlight an unusual respiration machinery as a potential target for biological control. We also predicted and developed a new defined growth medium for Xylella. [The sequence data from this study have been submitted to GenBank under accession nos. NC_002723 (X. fastidiosa Almond [Dixon]) and NC_002722 (X. fastidiosa Oleander [Ann-1]).] PMID:12368248

  11. Biologic: Gene circuits and feedback in an introductory physics sequence for biology and premedical students

    NASA Astrophysics Data System (ADS)

    Cahn, S. B.; Mochrie, S. G. J.

    2014-05-01

    We describe an educational module on feedback and gene circuits that constitutes the final topic in a new year-long introductory physics sequence aimed at biology and premedical students at Yale University. The overall goals of this sequence are threefold: first, to demonstrate the application of physics and mathematics in the life sciences; second, to introduce biological science majors to mathematical and physical tools, principles, and experiences; and third, to seed an enduring appreciation of quantitative approaches in biology and medicine. Here, we present a module on feedback and gene circuits that focuses on a genetic toggle switch and a repressilator. The genetic toggle switch consists of two genes, each of whose protein products represses the other's expression, while the repressilator consists of three genes, each of whose protein products represses the next gene's expression. Analytic, numerical, and electronic treatments of the genetic toggle switch show bistability. A similar treatment of the repressilator reveals sustained oscillations.
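
    The bistability of a two-gene toggle switch can be illustrated numerically. The sketch below is not taken from the module itself: it assumes a commonly used dimensionless mutual-repression model (du/dt = alpha/(1 + v^n) - u, dv/dt = alpha/(1 + u^n) - v), and the function name `simulate_toggle` and the parameter values alpha = 10, n = 2 are illustrative choices.

```python
def simulate_toggle(u0, v0, alpha=10.0, n=2.0, dt=0.01, steps=5000):
    """Integrate an assumed two-gene mutual-repression model with forward Euler.

    u and v are the (dimensionless) concentrations of the two repressors.
    Each protein is produced at a rate that falls as the other accumulates,
    and is removed at unit rate.
    """
    u, v = u0, v0
    for _ in range(steps):
        du = alpha / (1.0 + v ** n) - u   # production repressed by v, minus decay
        dv = alpha / (1.0 + u ** n) - v   # production repressed by u, minus decay
        u, v = u + dt * du, v + dt * dv
    return u, v


# Two different starting points settle into two different stable states.
state_a = simulate_toggle(3.0, 1.0)   # u starts ahead, so u ends high and v low
state_b = simulate_toggle(1.0, 3.0)   # v starts ahead, so v ends high and u low
```

    Under these assumed parameters, whichever repressor starts ahead ends up dominant, which is the bistable behaviour the abstract refers to.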

  12. Next-generation sequencing techniques for eukaryotic microorganisms: sequencing-based solutions to biological problems.

    PubMed

    Nowrousian, Minou

    2010-09-01

    Over the past 5 years, large-scale sequencing has been revolutionized by the development of several so-called next-generation sequencing (NGS) technologies. These have drastically increased the number of bases obtained per sequencing run while at the same time decreasing the costs per base. Compared to Sanger sequencing, NGS technologies yield shorter read lengths; however, despite this drawback, they have greatly facilitated genome sequencing, first for prokaryotic genomes and within the last year also for eukaryotic ones. This advance was possible due to a concomitant development of software that allows the de novo assembly of draft genomes from large numbers of short reads. In addition, NGS can be used for metagenomics studies as well as for the detection of sequence variations within individual genomes, e.g., single-nucleotide polymorphisms (SNPs), insertions/deletions (indels), or structural variants. Furthermore, NGS technologies have quickly been adopted for other high-throughput studies that were previously performed mostly by hybridization-based methods like microarrays. This includes the use of NGS for transcriptomics (RNA-seq) or the genome-wide analysis of DNA/protein interactions (ChIP-seq). This review provides an overview of NGS technologies that are currently available and the bioinformatics analyses that are necessary to obtain information from the flood of sequencing data as well as applications of NGS to address biological questions in eukaryotic microorganisms.

  13. PMBC: pattern mining from biological sequences with wildcard constraints.

    PubMed

    Wu, Xindong; Zhu, Xingquan; He, Yu; Arslan, Abdullah N

    2013-06-01

    Patterns/subsequences frequently appearing in sequences provide essential knowledge for domain experts, such as molecular biologists, to discover rules or patterns hidden behind the data. Due to the inherent complex nature of the biological data, patterns rarely exactly reproduce and repeat themselves, but rather appear with a slightly different form in each of their appearances. A gap constraint (in this paper, also referred to as a wildcard: a character that can be substituted for any character predefined in an alphabet) provides flexibility for users to capture useful patterns even if their appearances vary in the sequences. In order to find patterns, existing tools require users to explicitly specify gap constraints beforehand. In reality, it is often nontrivial or time-consuming for users to provide proper gap constraint values. In addition, a change made to the gap values may give completely different results, and require a separate time-consuming re-mining procedure. Therefore, it is desirable to automatically and efficiently find patterns without involving user-specified gap requirements. In this paper, we study the problem of frequent pattern mining without user-specified gap constraints and propose PMBC (namely Pattern Mining from Biological sequences with wildcard Constraints) to solve the problem. Given a sequence and a support threshold value (i.e. pattern frequency threshold), PMBC intends to discover all subsequences with their support values equal to or greater than the given threshold value. The frequent subsequences then form patterns later on. Two heuristic methods (one-way vs. two-way scans) are proposed to discover frequent subsequences and estimate their frequency in the sequences. Experimental results on both synthetic and real-world DNA sequences demonstrate the performance of both methods for frequent pattern mining and pattern frequency estimation.
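
    The notions of wildcard and support can be made concrete with a small sketch. This is not the PMBC algorithm (which discovers patterns without user-specified gaps); it only counts how often one given pattern occurs when a wildcard may stand for any character. The function name `support` and the `?` wildcard symbol are assumptions made for illustration.

```python
def support(sequence, pattern, wildcard="?"):
    """Count start positions where `pattern` matches `sequence`.

    A wildcard position in the pattern matches any single character;
    every other position must match exactly. Overlapping matches count.
    """
    m = len(pattern)
    count = 0
    for start in range(len(sequence) - m + 1):
        window = sequence[start:start + m]
        if all(p == wildcard or p == c for p, c in zip(pattern, window)):
            count += 1
    return count


dna = "ACGTAGGTATGT"
exact = support(dna, "ACGT")     # only the literal occurrence
relaxed = support(dna, "A?GT")   # also matches AGGT and ATGT
is_frequent = relaxed >= 3       # comparison against a support threshold of 3
```

    With the wildcard, the three variants ACGT, AGGT and ATGT are counted as appearances of one pattern, which is the flexibility the abstract describes.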

  14. Biological nanopore MspA for DNA sequencing

    NASA Astrophysics Data System (ADS)

    Manrao, Elizabeth A.

    Unlocking the information hidden in the human genome provides insight into the inner workings of complex biological systems and can be used to greatly improve health-care. In order to allow for widespread sequencing, new technologies are required that provide fast and inexpensive readings of DNA. Nanopore sequencing is a third generation DNA sequencing technology that is currently being developed to fulfill this need. In nanopore sequencing, a voltage is applied across a small pore in an electrolyte solution and the resulting ionic current is recorded. When DNA passes through the channel, the ionic current is partially blocked. If the DNA bases uniquely modulate the ionic current flowing through the channel, the time trace of the current can be related to the sequence of DNA passing through the pore. There are two main challenges to realizing nanopore sequencing: identifying a pore with sensitivity to single nucleotides and controlling the translocation of DNA through the pore so that the small single nucleotide current signatures are distinguishable from background noise. In this dissertation, I explore the use of Mycobacterium smegmatis porin A (MspA) for nanopore sequencing. In order to determine MspA's sensitivity to single nucleotides, DNA strands of various compositions are held in the pore as the resulting ionic current is measured. DNA is immobilized in MspA by attaching it to a large molecule which acts as an anchor. This technique confirms the single nucleotide resolution of the pore and additionally shows that MspA is sensitive to epigenetic modifications and single nucleotide polymorphisms. The forces from the electric field within MspA, the effective charge of nucleotides, and elasticity of DNA are estimated using a Freely Jointed Chain model of single stranded DNA. These results offer insight into the interactions of DNA within the pore. 
    With the nucleotide sensitivity of MspA confirmed, a method is introduced to controllably pass DNA through the pore

  15. Experience using web services for biological sequence analysis

    PubMed Central

    Attwood, Teresa; Chohan, Shahid Nadeem; Côté, Richard; Cudré-Mauroux, Philippe; Falquet, Laurent; Fernandes, Pedro; Finn, Robert D.; Hupponen, Taavi; Korpelainen, Eija; Labarga, Alberto; Laugraud, Aurelie; Lima, Tania; Pafilis, Evangelos; Pagni, Marco; Pettifer, Steve; Phan, Isabelle; Rahman, Nazim

    2008-01-01

    Programmatic access to data and tools through the web using so-called web services has an important role to play in bioinformatics. In this article, we discuss the most popular approaches based on SOAP/WS-I and REST and describe our experiences, as a cross section of the community, with providing and using web services in the context of biological sequence analysis. We briefly review the main technological approaches as well as best-practice hints that are useful for both users and developers. Finally, syntactic and semantic data integration issues with multiple web services are discussed. PMID:18621748

  16. Multiple hypothesis tracking for cluttered biological image sequences.

    PubMed

    Chenouard, Nicolas; Bloch, Isabelle; Olivo-Marin, Jean-Christophe

    2013-11-01

    In this paper, we present a method for simultaneously tracking thousands of targets in biological image sequences, which is of major importance in modern biology. The complexity and inherent randomness of the problem lead us to propose a unified probabilistic framework for tracking biological particles in microscope images. The framework includes realistic models of particle motion and existence and of fluorescence image features. For the track extraction process per se, the very cluttered conditions motivate the adoption of a multiframe approach that enforces tracking decision robustness to poor imaging conditions and to random target movements. We tackle the large-scale nature of the problem by adapting the multiple hypothesis tracking algorithm to the proposed framework, resulting in a method with a favorable tradeoff between the model complexity and the computational cost of the tracking procedure. When compared to the state-of-the-art tracking techniques for bioimaging, the proposed algorithm is shown to be the only method providing high-quality results despite the critically poor imaging conditions and the dense target presence. We thus demonstrate the benefits of advanced Bayesian tracking techniques for the accurate computational modeling of dynamical biological processes, which is promising for further developments in this domain.

  17. Biological characterization and complete nucleotide sequence of a Tunisian isolate of Moroccan watermelon mosaic virus.

    PubMed

    Yakoubi, S; Desbiez, C; Fakhfakh, H; Wipf-Scheibel, C; Marrakchi, M; Lecoq, H

    2008-01-01

    During a survey conducted in October 2005, cucurbit leaf samples showing virus-like symptoms were collected from the major cucurbit-growing areas in Tunisia. DAS-ELISA showed the presence of Moroccan watermelon mosaic virus (MWMV, Potyvirus), detected for the first time in Tunisia, in samples from the region of Cap Bon (Northern Tunisia). MWMV isolate TN05-76 (MWMV-Tn) was characterized biologically and its full-length genome sequence was established. MWMV-Tn was found to have biological properties similar to those reported for the MWMV type strain from Morocco. Phylogenetic analysis including the comparison of complete amino-acid sequences of 42 potyviruses confirmed that MWMV-Tn is related (65% amino-acid sequence identity) to Papaya ringspot virus (PRSV) isolates but is a member of a distinct virus species. Sequence analysis on parts of the CP gene of MWMV isolates from different geographical origins revealed some geographic structure of MWMV variability, with three different clusters: one cluster including isolates from the Mediterranean region, a second including isolates from western and central Africa, and a third one including isolates from the southern part of Africa. A significant correlation was observed between geographic and genetic distances between isolates. Isolates from countries in the Mediterranean region where MWMV has recently emerged (France, Spain, Portugal) have highly conserved sequences, suggesting that they may have a common and recent origin. MWMV from Sudan, a highly divergent variant, may be considered an evolutionary intermediate between MWMV and PRSV.

  18. An efficient binomial model-based measure for sequence comparison and its application.

    PubMed

    Liu, Xiaoqing; Dai, Qi; Li, Lihua; He, Zerong

    2011-04-01

    Sequence comparison is one of the major tasks in bioinformatics, which could serve as evidence of structural and functional conservation, as well as of evolutionary relations. There are several similarity/dissimilarity measures for sequence comparison, but challenges remain. This paper presented a binomial model-based measure to analyze biological sequences. With the help of a random indicator, the occurrence of a word at any position of a sequence can be regarded as a Bernoulli random variable, and the distribution of the sum of the word occurrences is well known to be binomial. By using a recursive formula, we computed the binomial probability of the word count and proposed a binomial model-based measure based on the relative entropy. The proposed measure was tested by extensive experiments including classification of HEV genotypes and phylogenetic analysis, and further compared with alignment-based and alignment-free measures. The results demonstrate that the proposed measure based on binomial model is more efficient.
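
    The two ingredients named in this abstract, a recursive computation of binomial probabilities and the relative entropy, can be sketched as follows. The abstract does not give the authors' formulas, so this is only an illustration of the standard recursion P(k+1) = P(k) * (n - k)/(k + 1) * p/(1 - p) and of the Kullback-Leibler divergence; the function names and the example probabilities 0.3 and 0.5 are hypothetical.

```python
import math


def binomial_pmf(n, p):
    """Return [P(X=0), ..., P(X=n)] for X ~ Binomial(n, p), with 0 < p < 1.

    Uses the recursion P(k+1) = P(k) * (n - k) / (k + 1) * p / (1 - p),
    starting from P(0) = (1 - p) ** n, so no factorials are needed.
    """
    probs = [(1.0 - p) ** n]
    ratio = p / (1.0 - p)
    for k in range(n):
        probs.append(probs[-1] * (n - k) / (k + 1) * ratio)
    return probs


def relative_entropy(p_dist, q_dist):
    """Kullback-Leibler divergence sum(p * log(p / q)), in nats."""
    return sum(a * math.log(a / b) for a, b in zip(p_dist, q_dist) if a > 0.0)


# Hypothetical example: the same word is assumed to occur with probability
# 0.3 per position in one sequence and 0.5 in another (10 positions each).
dist_a = binomial_pmf(10, 0.3)
dist_b = binomial_pmf(10, 0.5)
divergence = relative_entropy(dist_a, dist_b)
```

    Identical word-count distributions give a divergence of zero, and the value grows as the distributions move apart; how the published measure combines such terms across words is not specified in the abstract.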

  19. [Comparison study between biological vision and computer vision].

    PubMed

    Liu, W; Yuan, X G; Yang, C X; Liu, Z Q; Wang, R

    2001-08-01

    The development and bearing of biological vision in structure and mechanism were discussed, especially aspects including the anatomical structure of biological vision, a tentative classification of receptive fields, parallel processing of visual information, and the feedback and conformity effect of the visual cortex. New advances in the field were introduced through the study of the morphology of biological vision. In addition, a comparison between biological vision and computer vision was made, and their similarities and differences were pointed out.

  20. Comparison of 61 Sequenced Escherichia coli Genomes

    PubMed Central

    Lukjancenko, Oksana; Wassenaar, Trudy M.

    2010-01-01

    Escherichia coli is an important component of the biosphere and is an ideal model for studies of processes involved in bacterial genome evolution. Sixty-one publicly available E. coli and Shigella spp. sequenced genomes are compared, using basic methods to produce phylogenetic and proteomics trees, and to identify the pan- and core genomes of this set of sequenced strains. A hierarchical clustering of variable genes allowed clear separation of the strains into clusters, including known pathotypes; clinically relevant serotypes can also be resolved in this way. In contrast, when in silico MLST was performed, many of the various strains appear jumbled and less well resolved. The predicted pan-genome comprises 15,741 gene families, and only 993 (6%) of the families are represented in every genome, comprising the core genome. The variable or ‘accessory’ genes thus make up more than 90% of the pan-genome and about 80% of a typical genome; some of these variable genes tend to be co-localized on genomic islands. The diversity within the species E. coli, and the overlap in gene content between this and related species, suggests a continuum rather than sharp species borders in this group of Enterobacteriaceae. PMID:20623278
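
    The pan- and core-genome definitions used in this abstract reduce to a set union and a set intersection over gene families. The sketch below assumes each genome has already been reduced to a set of gene-family identifiers; the toy strain names and family labels are invented for illustration and are not data from the study.

```python
def pan_and_core(genomes):
    """Return (pan_genome, core_genome) for a dict of genome -> set of families.

    The pan-genome is every family seen in at least one genome;
    the core genome is every family present in all genomes.
    """
    family_sets = list(genomes.values())
    pan = set().union(*family_sets)
    core = set.intersection(*family_sets) if family_sets else set()
    return pan, core


# Invented toy data: three strains, five gene families in total.
toy = {
    "strain_1": {"famA", "famB", "famC"},
    "strain_2": {"famA", "famB", "famD"},
    "strain_3": {"famA", "famE"},
}
pan, core = pan_and_core(toy)
accessory = pan - core                       # the variable ("accessory") families
core_fraction = len(core) / len(pan)         # share of the pan-genome that is core

# The same ratio for the figures quoted in the abstract.
reported_core_percent = 993 / 15741 * 100    # about 6%
```

    In the toy data only one of five families is shared by every strain, mirroring (at a much smaller scale) the small core fraction reported for the 61 genomes.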

  1. COMPARISON OF BIOLOGICAL COMMUNITIES: THE PROBLEM OF SAMPLE REPRESENTATIVENESS

    EPA Science Inventory

    Obtaining an adequate, representative sample of biological communities or assemblages to make richness or compositional comparisons among sites is a continuing challenge. Traditionally, sample size is based on numbers of replicates or area collected or numbers of individuals enum...

  2. The Microsoft Biology Foundation Applications for High-Throughput Sequencing

    PubMed Central

    Mercer, S.

    2010-01-01

    The need for reusable libraries of bioinformatics functions has been recognized for many years and a number of language-specific toolkits have been constructed. Such toolkits have served as valuable nucleation points for the community, promoting the sharing of code and establishing standards. The majority of DNA sequencing machines and many other standard pieces of lab equipment are controlled by PCs using Windows, and a Microsoft genomics toolkit would enable initial processing and quality control to happen closer to the instrumentation and provide opportunities for added-value services within core facilities. The Microsoft Biology Foundation (MBF) is an open source software library, freely available for both commercial and academic use, available as an early-stage beta from mbf.codeplex.com. This presentation will describe the structure and goals of MBF and demonstrate some of its uses.

  3. Discovering New Biology through Sequencing of RNA

    PubMed Central

    Weber, Andreas P.M.

    2015-01-01

    Sequencing of RNA (RNA-Seq) was invented approximately 1 decade ago and has since revolutionized biological research. This update provides a brief historic perspective on the development of RNA-Seq and then focuses on the application of RNA-Seq in qualitative and quantitative analyses of transcriptomes. Particular emphasis is given to aspects of data analysis. Since the wet-lab and data analysis aspects of RNA-Seq are still rapidly evolving and novel applications are continuously reported, a printed review will be rapidly outdated and can only serve to provide some examples and general guidelines for planning and conducting RNA-Seq studies. Hence, selected references to frequently updated online resources are given. PMID:26353759

  4. Compressive biological sequence analysis and archival in the era of high-throughput sequencing technologies.

    PubMed

    Giancarlo, Raffaele; Rombo, Simona E; Utro, Filippo

    2014-05-01

    High-throughput sequencing technologies produce large collections of data, mainly DNA sequences with additional information, requiring the design of efficient and effective methodologies for both their compression and storage. In this context, we first provide a classification of the main techniques that have been proposed, according to three specific research directions that have emerged from the literature and, for each, we provide an overview of the current techniques. Finally, to make this review useful to researchers and technicians applying the existing software and tools, we include a synopsis of the main characteristics of the described approaches, including details on their implementation and availability. Performance of the various methods is also highlighted, although the state of the art does not lend itself to a consistent and coherent comparison among all the methods presented here.

  5. A comparison of biological and cultural evolution.

    PubMed

    Portin, Petter

    2015-03-01

    This review begins with a definition of biological evolution and a description of its general principles. This is followed by a presentation of the biological basis of culture, specifically the concept of social selection. Further, conditions for cultural evolution are proposed, including a suggestion for language being the cultural replicator corresponding to the concept of the gene in biological evolution. Principles of cultural evolution are put forward and compared to the principles of biological evolution. Special emphasis is laid on the principle of selection in cultural evolution, including presentation of the concept of cultural fitness. The importance of language as a necessary condition for cultural evolution is stressed. Subsequently, prime differences between biological and cultural evolution are presented, followed by a discussion on interaction of our genome and our culture. The review aims at contributing to the present discussion concerning the modern development of the general theory of evolution, for example by giving a tentative formulation of the necessary and sufficient conditions for cultural evolution, and proposing that human creativity and mind reading or theory of mind are motors specific for it. The paper ends with the notion of the still ongoing coevolution of genes and culture.

  6. A novel method of characterizing genetic sequences: genome space with biological distance and applications.

    PubMed

    Deng, Mo; Yu, Chenglong; Liang, Qian; He, Rong L; Yau, Stephen S-T

    2011-03-02

    Most existing methods for phylogenetic analysis involve developing an evolutionary model and then using some type of computational algorithm to perform multiple sequence alignment. There are two problems with this approach: (1) different evolutionary models can lead to different results, and (2) the computation time required for multiple alignments makes it impossible to analyse the phylogeny of a whole genome. This motivates us to create a new approach to characterize genetic sequences. To each DNA sequence, we associate a natural vector based on the distributions of nucleotides. This produces a one-to-one correspondence between the DNA sequence and its natural vector. We define the distance between two DNA sequences to be the distance between their associated natural vectors. This creates a genome space with a biological distance which makes global comparison of genomes with the same topology possible. We use our proposed method to analyze the genomes of the new influenza A (H1N1) virus, human rhinoviruses (HRV) and mammalian mitochondria. The result shows that a triple-reassortant swine virus circulating in North America and the Eurasian swine virus belong to the lineage of the influenza A (H1N1) virus. For the HRV and mammalian mitochondrial genomes, the results coincide with biologists' analyses. Our approach provides a powerful new tool for analyzing and annotating genomes and their phylogenetic relationships. Whole or partial genomes can be handled more easily and more quickly than using multiple alignment methods. Once a genome space has been constructed, it can be stored in a database. There is no need to reconstruct the genome space for subsequent applications, whereas in multiple alignment methods, realignment is needed to add new sequences. Furthermore, one can make a global comparison of all genomes simultaneously, which no other existing method can achieve.
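
    The idea of mapping a sequence to a vector of nucleotide-distribution statistics and comparing sequences by vector distance can be sketched as below. The abstract does not state the vector's components, so the 12-component form used here (for each nucleotide: its count, its mean position, and a normalized second central moment of its positions) is an assumption, as are the function names. A vector truncated at second moments is not guaranteed to give the one-to-one correspondence the abstract mentions.

```python
import math


def natural_vector(seq, alphabet="ACGT"):
    """Return an assumed 12-component vector of per-nucleotide statistics.

    For each nucleotide: count, mean position (1-based), and the sum of
    squared deviations from that mean divided by (count * sequence length).
    Nucleotides that never occur contribute three zeros.
    """
    n = len(seq)
    vec = []
    for base in alphabet:
        positions = [i + 1 for i, c in enumerate(seq) if c == base]
        count = len(positions)
        if count == 0:
            vec.extend([0.0, 0.0, 0.0])
            continue
        mean = sum(positions) / count
        d2 = sum((pos - mean) ** 2 for pos in positions) / (count * n)
        vec.extend([float(count), mean, d2])
    return vec


def sequence_distance(seq_a, seq_b):
    """Euclidean distance between the vectors of two sequences."""
    return math.dist(natural_vector(seq_a), natural_vector(seq_b))


vector = natural_vector("AACG")
same = sequence_distance("AACG", "AACG")      # identical sequences
different = sequence_distance("AACG", "GCAA")  # same composition, different order
```

    Because positions enter the vector, two sequences with the same composition but different ordering are still separated, and no alignment is needed to compare sequences of different lengths.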

  7. Comparison study on k-word statistical measures for protein: From sequence to 'sequence space'

    PubMed Central

    Dai, Qi; Wang, Tianming

    2008-01-01

    Background: Many proposed statistical measures can efficiently compare protein sequence to further infer protein structure, function and evolutionary information. They share the same idea of using k-word frequencies of protein sequences. Given a protein sequence, the information on its related protein sequences hasn't been used for protein sequence comparison until now. This paper proposed a scheme to construct protein 'sequence space' which was associated with protein sequences related to the given protein, and the performances of statistical measures were compared when they explored the information on protein 'sequence space' or not. This paper also presented two statistical measures for protein: gre.k (generalized relative entropy) and gsm.k (gapped similarity measure). Results: We tested statistical measures based on protein 'sequence space' or not with three data sets. This not only offers the systematic and quantitative experimental assessment of these statistical measures, but also naturally complements the available comparison of statistical measures based on protein sequence. Moreover, we compared our statistical measures with alignment-based measures and the existing statistical measures. The experiments were grouped into two sets. The first one, performed via ROC (Receiver Operating Curve) analysis, aims at assessing the intrinsic ability of the statistical measures to discriminate and classify protein sequences. The second set of the experiments aims at assessing how well our measure does in phylogenetic analysis. Based on the experiments, several conclusions can be drawn and, from them, novel valuable guidelines for the use of protein 'sequence space' and statistical measures were obtained. Conclusion: Alignment-based measures have a clear advantage when the data is highly redundant. The more efficient statistical measure is the novel gsm.k introduced in this article, followed by cos.k. 
    When the data becomes less redundant, gre.k proposed by us achieves a

  8. Comparison of mitochondrial genome sequences of pangolins (Mammalia, Pholidota).

    PubMed

    Hassanin, Alexandre; Hugot, Jean-Pierre; van Vuuren, Bettine Jansen

    2015-04-01

    The complete mitochondrial genome was sequenced for three species of pangolins, Manis javanica, Phataginus tricuspis, and Smutsia temminckii, and comparisons were made with two other species, Manis pentadactyla and Phataginus tetradactyla. The genome of Manidae contains the 37 genes found in a typical mammalian genome, and the structure of the control region is highly conserved among species. In Manis, the overall base composition differs from that found in African genera. Phylogenetic analyses support the monophyly of the genera Manis, Phataginus, and Smutsia, as well as the basal division between Maninae and Smutsiinae. Comparisons with GenBank sequences reveal that the reference genomes of M. pentadactyla and P. tetradactyla (accession numbers NC_016008 and NC_004027) were sequenced from misidentified taxa, and that a new species of tree pangolin should be described in Gabon.

  9. Two Approaches to Biology. Index Comparison

    ERIC Educational Resources Information Center

    Milby, T. H.

    1972-01-01

    A study of "Biological Abstracts" and "Index to American Botanical Literature" was made to determine: (1) the degree of overlap in coverage between the two services and (2) the time lapse between the publication of an article and its appearance in one of the indexes. (8 references) (NH)

  10. Sequenced genomes and rapidly emerging technologies pave the way for conifer evolutionary developmental biology

    PubMed Central

    Uddenberg, Daniel; Akhter, Shirin; Ramachandran, Prashanth; Sundström, Jens F.; Carlsbecker, Annelie

    2015-01-01

    Conifers, Ginkgo, cycads and gnetophytes comprise the four groups of extant gymnosperms holding a unique position of sharing common ancestry with the angiosperms. Comparative studies of gymnosperms and angiosperms are the key to a better understanding of ancient seed plant morphologies, how they have shifted over evolution to shape modern day species, and how the genes governing these morphologies have evolved. However, conifers and other gymnosperms have been notoriously difficult to study due to their long generation times, inaccessibility to genetic experimentation and unavailable genome sequences. Now, with three draft genomes from spruces and pines, rapid advances in next generation sequencing methods for genome wide expression analyses, and enhanced methods for genetic transformation, we are much better equipped to address a number of key evolutionary questions relating to seed plant evolution. In this mini-review we highlight recent progress in conifer developmental biology relevant to evo-devo questions. We discuss how genome sequence data and novel techniques might allow us to explore genetic variation and naturally occurring conifer mutants, approaches to reduce long generation times to allow for genetic studies in conifers, and other potential upcoming research avenues utilizing current and emergent techniques. Results from developmental studies of conifers and other gymnosperms in comparison to those in angiosperms will provide information to trace core molecular developmental control tool kits of ancestral seed plants, but foremost they will greatly improve our understanding of the biology of conifers and other gymnosperms in their own right. PMID:26579190

  11. Sequenced genomes and rapidly emerging technologies pave the way for conifer evolutionary developmental biology.

    PubMed

    Uddenberg, Daniel; Akhter, Shirin; Ramachandran, Prashanth; Sundström, Jens F; Carlsbecker, Annelie

    2015-01-01

    Conifers, Ginkgo, cycads and gnetophytes comprise the four groups of extant gymnosperms holding a unique position of sharing common ancestry with the angiosperms. Comparative studies of gymnosperms and angiosperms are the key to a better understanding of ancient seed plant morphologies, how they have shifted over evolution to shape modern day species, and how the genes governing these morphologies have evolved. However, conifers and other gymnosperms have been notoriously difficult to study due to their long generation times, inaccessibility to genetic experimentation and unavailable genome sequences. Now, with three draft genomes from spruces and pines, rapid advances in next generation sequencing methods for genome wide expression analyses, and enhanced methods for genetic transformation, we are much better equipped to address a number of key evolutionary questions relating to seed plant evolution. In this mini-review we highlight recent progress in conifer developmental biology relevant to evo-devo questions. We discuss how genome sequence data and novel techniques might allow us to explore genetic variation and naturally occurring conifer mutants, approaches to reduce long generation times to allow for genetic studies in conifers, and other potential upcoming research avenues utilizing current and emergent techniques. Results from developmental studies of conifers and other gymnosperms in comparison to those in angiosperms will provide information to trace core molecular developmental control tool kits of ancestral seed plants, but foremost they will greatly improve our understanding of the biology of conifers and other gymnosperms in their own right.

  12. Weighted measures based on maximizing deviation for alignment-free sequence comparison

    NASA Astrophysics Data System (ADS)

    Qian, Kun; Luan, Yihui

    2017-09-01

    Alignment-free sequence comparison is becoming fairly popular in many fields of computational biology because it places fewer requirements on the sequences themselves and is computationally efficient for large-scale sequence data sets. In particular, approaches based on k-tuples, such as D2, D2S and D2∗, are used widely and effectively. However, these measures treat each k-tuple equally without accounting for potential differences in importance among k-tuples. In this paper, we take advantage of the maximizing deviation method proposed in multiple attribute decision making to evaluate the weights of different k-tuples. We modify D2, D2S and D2∗ with weights and test them by similarity search and evaluation on functionally related regulatory sequences. The results demonstrate that the newly proposed measures are more efficient and robust compared to existing alignment-free methods.
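
    The basic D2 statistic is the inner product of the k-tuple count vectors of two sequences, and a weighted variant multiplies each term by a per-tuple weight. The sketch below shows that structure only. The `weights` argument is a hypothetical placeholder: the maximizing deviation procedure the authors use to derive weights is not described in the abstract and is not reproduced here, and D2S and D2∗ (which centre and normalize the counts) are omitted.

```python
from collections import Counter


def ktuple_counts(seq, k):
    """Count every overlapping k-tuple (k-mer) in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k, weights=None):
    """Sum over shared k-tuples of weight * count_in_a * count_in_b.

    With weights=None every k-tuple has weight 1, giving the plain D2
    statistic. `weights` is a hypothetical dict of k-tuple -> weight;
    tuples missing from it default to weight 1.
    """
    counts_a, counts_b = ktuple_counts(seq_a, k), ktuple_counts(seq_b, k)
    total = 0.0
    for word in counts_a.keys() & counts_b.keys():
        w = 1.0 if weights is None else weights.get(word, 1.0)
        total += w * counts_a[word] * counts_b[word]
    return total


plain = d2("ACGT", "ACGA", 2)                        # shared 2-tuples: AC and CG
weighted = d2("ACGT", "ACGA", 2, weights={"AC": 2.0})  # AC counted twice as heavily
```

    Only k-tuples present in both sequences contribute, so the loop runs over the intersection of the two count tables rather than over all 4^k possible tuples.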

  13. Applications of next-generation sequencing techniques in plant biology

    USDA-ARS's Scientific Manuscript database

    The last several years have seen revolutionary advances in DNA sequencing technologies with the advent of next generation sequencing (NGS) techniques. NGS methods now allow millions of bases to be sequenced in one round, at a fraction of the cost relative to traditional Sanger sequencing, allowing u...

  14. The DNA sequence and biology of human chromosome 19

    SciTech Connect

    Grimwood, J; Gordon, L A; Olsen, A; Terry, A; Schmutz, J; Lamerdin, J; Hellsten, U; Goodstein, D; Couronne, O; Tran-Gyamfi, M

    2004-04-06

    Chromosome 19 has the highest gene density of all human chromosomes, more than double the genome-wide average. The large clustered gene families, corresponding high GC content, CpG islands and density of repetitive DNA indicate a chromosome rich in biological and evolutionary significance. Here we describe 55.8 million base pairs of highly accurate finished sequence representing 99.9% of the euchromatin portion of the chromosome. Manual curation of gene loci reveals 1,461 protein-coding genes and 321 pseudogenes. Among these are genes directly implicated in Mendelian disorders, including familial hypercholesterolemia and insulin-resistant diabetes. Nearly one quarter of these genes belong to tandemly arranged families, encompassing more than 25% of the chromosome. Comparative analyses show a fascinating picture of conservation and divergence, revealing large blocks of gene orthology with rodents, scattered regions with more recent gene family expansions and deletions, and segments of coding and non-coding conservation with the distant fish species Takifugu.

  15. The DNA sequence and biology of human chromosome 19

    SciTech Connect

    Grimwood, Jane; Gordon, Laurie A.; Olsen, Anne; Terry, Astrid; Schmutz, Jeremy; Lamerdin, Jane; Hellsten, Uffe; Goodstein, David; Couronne, Olivier; Tran-Gyamfi, Mary; Aerts, Andrea; Altherr, Michael; Ashworth, Linda; Bajorek, Eva; Black, Stacey; Branscomb, Elbert; Caenepeel, Sean; Carrano, Anthony; Caoile, Chenier; Chan, Yee Man; Christensen, Mari; Cleland, Catherine A.; Copeland, Alex; Dalin, Eileen; Dehal, Paramvir; Denys, Mirian; Detter, John C.; Escobar, Julio; Flowers, Dave; Fotopulos, Dea; Garcia, Carmen; Georgescu, Anca M.; Glavina, Tijana; Gomez, Maria; Gonzales, Eldelyn; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Ho, Issac; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Larionov, Vladimer; Leem, Sun-Hee; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Malfatti, Stephanie; Martinez, Diego; McCready, Paula; Medina, Catherine; Morgan, Jenna; Nelson, Kathryn; Nolan, Matt; Ovcharenko, Ivan; Pitluck, Sam; Pollard, Martin; Popkie, Anthony P.; Predki, Paul; Quan, Glenda; Ramirez, Lucia; Rash, Sam; Retterer, James; Rodriguez, Alex; Rogers, Stephanine; Salamov, Asaf; Salazar, Angelica; She, Xinwei; Smith, Doug; Slezak, Tom; Solovyev, Victor; Thayer, Nina; Tice, Hope; Tsai, Ming; Ustaszewska, Anna; Vo, Nu; Wagner, Mark; Wheeler, Jeremy; Wu, Kevin; Xie, Gary; Yang, Joan; Dubchak, Inna; Furey, Terrence S.; DeJong, Pieter; Dickson, Mark; Gordon, David; Eichler, Evan E.; Pennacchio, Len A.; Richardson, Paul; Stubbs, Lisa; Rokhsar, Daniel S.; Myers, Richard M.; Rubin, Edward M.; Lucas, Susan M.

    2003-09-15

    Chromosome 19 has the highest gene density of all human chromosomes, more than double the genome-wide average. The large clustered gene families, corresponding high G+C content, CpG islands and density of repetitive DNA indicate a chromosome rich in biological and evolutionary significance. Here we describe 55.8 million base pairs of highly accurate finished sequence representing 99.9 percent of the euchromatin portion of the chromosome. Manual curation of gene loci reveals 1,461 protein-coding genes and 321 pseudogenes. Among these are genes directly implicated in mendelian disorders, including familial hypercholesterolaemia and insulin-resistant diabetes. Nearly one-quarter of these genes belong to tandemly arranged families, encompassing more than 25 percent of the chromosome. Comparative analyses show a fascinating picture of conservation and divergence, revealing large blocks of gene orthology with rodents, scattered regions with more recent gene family expansions and deletions, and segments of coding and non-coding conservation with the distant fish species Takifugu.

  16. It's more than stamp collecting: how genome sequencing can unify biological research.

    PubMed

    Richards, Stephen

    2015-07-01

    The availability of reference genome sequences, especially the human reference, has revolutionized the study of biology. However, while the genomes of some species have been fully sequenced, a wide range of biological problems still cannot be effectively studied for lack of genome sequence information. Here, I identify neglected areas of biology and describe how both targeted species sequencing and more broad taxonomic surveys of the tree of life can address important biological questions. I enumerate the significant benefits that would accrue from sequencing a broader range of taxa, as well as discuss the technical advances in sequencing and assembly methods that would allow for wide-ranging application of whole-genome analysis. Finally, I suggest that in addition to 'big science' survey initiatives to sequence the tree of life, a modified infrastructure-funding paradigm would better support reference genome sequence generation for research communities most in need. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. It’s More Than Stamp Collecting: How Genome Sequencing Can Unify Biological Research

    PubMed Central

    Richards, Stephen

    2015-01-01

    The availability of reference genome sequences, especially the human reference, has revolutionized the study of biology. However, whilst the genomes of some species have been fully sequenced, a wide range of biological problems still cannot be effectively studied for lack of genome sequence information. Here, I identify neglected areas of biology and describe how both targeted species sequencing and more broad taxonomic surveys of the tree of life can address important biological questions. I enumerate the significant benefits that would accrue from sequencing a broader range of taxa, as well as discuss the technical advances in sequencing and assembly methods that would allow for wide-ranging application of whole-genome analysis. Finally, I suggest that in addition to “Big Science” survey initiatives to sequence the tree of life, a modified infrastructure-funding paradigm would better support reference genome sequence generation for research communities most in need. PMID:26003218

  18. Automated methods of predicting the function of biological sequences using GO and BLAST

    PubMed Central

    Jones, Craig E; Baumann, Ute; Brown, Alfred L

    2005-01-01

    Background With the exponential increase in genomic sequence data there is a need to develop automated approaches to deducing the biological functions of novel sequences with high accuracy. Our aim is to demonstrate how accuracy benchmarking can be used in a decision-making process evaluating competing designs of biological function predictors. We utilise the Gene Ontology, GO, a directed acyclic graph of functional terms, to annotate sequences with functional information describing their biological context. Initially we examine the effect on accuracy scores of increasing the allowed distance between predicted and a test set of curator assigned terms. Next we evaluate several annotator methods using accuracy benchmarking. Given an unannotated sequence we use the Basic Local Alignment Search Tool, BLAST, to find similar sequences that have already been assigned GO terms by curators. A number of methods were developed that utilise terms associated with the best five matching sequences. These methods were compared against a benchmark method of simply using terms associated with the best BLAST-matched sequence (best BLAST approach). Results The precision and recall of estimates increases rapidly as the amount of distance permitted between a predicted term and a correct term assignment increases. Accuracy benchmarking allows a comparison of annotation methods. A covering graph approach performs poorly, except where the term assignment rate is high. A term distance concordance approach has a similar accuracy to the best BLAST approach, demonstrating lower precision but higher recall. However, a discriminant function method has higher precision and recall than the best BLAST approach and other methods shown here. Conclusion Allowing term predictions to be counted correct if closely related to a correct term decreases the reliability of the accuracy score. As such we recommend using accuracy measures that require exact matching of predicted terms with curator assigned

  19. Automated methods of predicting the function of biological sequences using GO and BLAST.

    PubMed

    Jones, Craig E; Baumann, Ute; Brown, Alfred L

    2005-11-15

    With the exponential increase in genomic sequence data there is a need to develop automated approaches to deducing the biological functions of novel sequences with high accuracy. Our aim is to demonstrate how accuracy benchmarking can be used in a decision-making process evaluating competing designs of biological function predictors. We utilise the Gene Ontology, GO, a directed acyclic graph of functional terms, to annotate sequences with functional information describing their biological context. Initially we examine the effect on accuracy scores of increasing the allowed distance between predicted and a test set of curator assigned terms. Next we evaluate several annotator methods using accuracy benchmarking. Given an unannotated sequence we use the Basic Local Alignment Search Tool, BLAST, to find similar sequences that have already been assigned GO terms by curators. A number of methods were developed that utilise terms associated with the best five matching sequences. These methods were compared against a benchmark method of simply using terms associated with the best BLAST-matched sequence (best BLAST approach). The precision and recall of estimates increases rapidly as the amount of distance permitted between a predicted term and a correct term assignment increases. Accuracy benchmarking allows a comparison of annotation methods. A covering graph approach performs poorly, except where the term assignment rate is high. A term distance concordance approach has a similar accuracy to the best BLAST approach, demonstrating lower precision but higher recall. However, a discriminant function method has higher precision and recall than the best BLAST approach and other methods shown here. Allowing term predictions to be counted correct if closely related to a correct term decreases the reliability of the accuracy score. As such we recommend using accuracy measures that require exact matching of predicted terms with curator assigned terms. Furthermore, we conclude
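
    The exact-match accuracy measure recommended in this abstract can be sketched as a set comparison between predicted and curator-assigned terms. This is a minimal sketch only: the function name and the placeholder term labels in the usage note are illustrative assumptions, and the distance-tolerant variants discussed by the authors are not implemented.

```python
def precision_recall(predicted, curated):
    """Exact-match precision and recall between two collections of terms.

    precision = correct predictions / all predictions
    recall    = correct predictions / all curator-assigned terms
    """
    predicted, curated = set(predicted), set(curated)
    hits = len(predicted & curated)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(curated) if curated else 0.0
    return precision, recall
```

    For example, predicting the placeholder terms {"term_a", "term_b", "term_c"} for a sequence curated with {"term_a", "term_c", "term_d", "term_e"} gives two exact hits, so precision is 2/3 and recall is 1/2.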

  20. Sequence information signal processor for local and global string comparisons

    DOEpatents

    Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

    1997-01-01

    A sequence information signal processing integrated circuit chip designed to perform high speed calculation of a dynamic programming algorithm based upon the algorithm defined by Waterman and Smith. The signal processing chip of the present invention is designed to be a building block of a linear systolic array, the performance of which can be increased by connecting additional sequence information signal processing chips to the array. The chip provides a high speed, low cost linear array processor that can locate highly similar global sequences or segments thereof such as contiguous subsequences from two different DNA or protein sequences. The chip is implemented in a preferred embodiment using CMOS VLSI technology to provide the equivalent of about 400,000 transistors or 100,000 gates. Each chip provides 16 processing elements, and is designed to provide 16 bit, two's complement operation for maximum score precision of between -32,768 and +32,767. It is designed to provide a comparison between sequences as long as 4,194,304 elements without external software and between sequences of unlimited numbers of elements with the aid of external software. Each sequence can be assigned different deletion and insertion weight functions. Each processor is provided with a similarity measure device which is independently variable. Thus, each processor can contribute to maximum value score calculation using a different similarity measure.
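
    For readers unfamiliar with the dynamic programming recurrence that such a chip accelerates, the following software sketch computes a local alignment score in the style of Smith and Waterman. It is not the patented hardware design: the scoring values (`match`, `mismatch`, `gap`) and the single linear gap penalty are assumptions chosen for brevity, whereas the abstract describes separately configurable deletion and insertion weight functions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty).

    Only two rows of the score matrix are kept, since each cell depends
    on its left, upper, and upper-left neighbours alone.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

    Cells on the same anti-diagonal do not depend on one another, which is the property that lets a linear array of processing elements compute them in parallel.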

  1. Expanding morphological dimensions in neuropathology, from sequence biology to pathological sequences and clinical consequences.

    PubMed

    Uchihara, Toshiki

    2011-06-01

    One of the challenges in neuropathology is to clarify how molecules, functional carriers of uni-dimensional sequence of amino acid or nucleic acid, behave to engender disease-specific pathological processes in complex three-dimensional (3D) structures such as the human brain in an ordered chronological sequence (four-dimensional extension as a whole). Along with expanding molecular explanations for brain diseases, parallel and independent hypotheses based on morphological observations are particularly useful and necessary for reasonable understanding of the brain and its dysfunction. For example, with classical methods such as silver impregnations, it is possible to differentiate underlying molecular pathologies (three-repeat tau/Campbell-Switzer vs. four-repeat tau/Gallyas silver impregnation) for improved histological diagnosis. Innovations with 3D reconstruction not only provide more realistic reproduction of the targets but also allow quantitative measurement on a 3D basis (3D volumetry). Contrary to the prevailing impression that pathological deposits are generally toxic to cells, quantification demonstrated possible countertoxic potentials of ubiquitin-positive intranuclear inclusions in CAG-repeat disorders on a two-dimensional basis and of glial cytoplasmic inclusions of multiple system atrophy on 3D volumetry. Furthermore, 3D extension of neurites around target lesions is now traceable in relation to the relevant clinical consequences. This neurite neuropathology may pave the way for early specific diagnosis of neurodegenerative disorders, as established through (123) I-metaiodobenzylguanidine cardiac scintigraphy for Parkinson disease, aiming at therapeutic intervention before depletion of mother neurons is feasible. For appropriate translation of sequence biology into the frame of human neuropathology, it is necessary to expand further the morphological dimensions so that comprehensive understanding of these disorders leads to specific diagnosis and

  2. Efficient combination of multiple word models for improved sequence comparison.

    PubMed

    Huang, Xiaoqiu; Ye, Liang; Chou, Hui-Hsien; Yang, I-Hsuan; Chao, Kun-Mao

    2004-11-01

    Studies of efficient and sensitive sequence comparison methods are driven by a need to find homologous regions of weak similarity between large genomes. We describe an improved method for finding similar regions between two sets of DNA sequences. The new method generalizes existing methods by locating word matches between sequences under two or more word models and extending word matches into high-scoring segment pairs (HSPs). The method is implemented as a computer program named DDS2. Experimental results show that DDS2 can find more HSPs by using several word models than by using one word model. The DDS2 program is freely available for academic use in binary code form at http://bioinformatics.iastate.edu/aat/align/align.html and in source code form from the corresponding author.
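
    The idea of locating word matches under more than one word model can be sketched as follows. The abstract does not define a word model, so this sketch assumes each model is a binary mask (for example "1101") in which a "1" marks a position that must match and a "0" a position that is ignored; the function names are likewise hypothetical, and the extension of matches into HSPs is not shown.

```python
def spaced_words(seq, mask):
    """Map each masked word to the start positions where it occurs in seq."""
    index = {}
    for i in range(len(seq) - len(mask) + 1):
        window = seq[i:i + len(mask)]
        word = "".join(c for c, m in zip(window, mask) if m == "1")
        index.setdefault(word, []).append(i)
    return index


def word_matches(seq_a, seq_b, masks):
    """Union of (pos_a, pos_b) word matches found under any of the masks."""
    hits = set()
    for mask in masks:
        index_b = spaced_words(seq_b, mask)
        for word, positions_a in spaced_words(seq_a, mask).items():
            for j in index_b.get(word, []):
                hits.update((i, j) for i in positions_a)
    return hits
```

    In this toy setting, comparing "ACGTAC" with "ACTTAC" under the contiguous mask "111" alone finds only the match at positions (3, 3); adding the mask "1101" also recovers (0, 0), where the two sequences differ at the ignored position.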

  3. Fast comparison of DNA sequences by oligonucleotide profiling

    PubMed Central

    Arnau, Vicente; Gallach, Miguel; Marín, Ignacio

    2008-01-01

    Background The comparison of DNA sequences is a traditional problem in genomics and bioinformatics. Many new opportunities have emerged from the improvement of personal computers, which allows the implementation of novel analysis strategies. Findings We describe a new program, called UVWORD, which determines the number of times that each DNA word present in a sequence (target) is found in a second sequence (source), a procedure that we have called oligonucleotide profiling. On a standard computer, the user may search for words of a size ranging from k = 1 to k = 14 nucleotides. Average counts for groups of contiguous words may also be established. The rate of analysis on standard computers ranges from 3.4 million (k = 14) to 16 million words per second (1 ≤ k ≤ 8). This makes fast screening of even the longest known DNA molecules feasible. Discussion We show that the combination of the ability to analyze relatively long words, which occur very rarely by chance, and the high speed of the program allows novel types of screening to be performed, complementary to those provided by standard programs such as BLAST. This method can be used to determine oligonucleotide content, to characterize the distribution of repetitive sequences in chromosomes, to determine the evolutionary conservation of sequences in different species, to establish regions of similar DNA among chromosomes or genomes, etc. PMID:18710530
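
    Oligonucleotide profiling as described in this abstract can be sketched in a few lines. This is not the UVWORD implementation: the function names are hypothetical, and averaging over sliding (overlapping) windows of contiguous words is an assumption, since the abstract does not say how groups of words are formed.

```python
from collections import Counter


def oligo_profile(target, source, k):
    """For each k-word position in target, count its occurrences in source."""
    source_counts = Counter(source[i:i + k] for i in range(len(source) - k + 1))
    # Counter returns 0 for words that never occur in source.
    return [source_counts[target[i:i + k]] for i in range(len(target) - k + 1)]


def window_means(profile, width):
    """Average count over each run of `width` contiguous words."""
    return [sum(profile[i:i + width]) / width
            for i in range(len(profile) - width + 1)]
```

    Building the word table for the source once and then scanning the target keeps the work linear in the combined sequence length, which is the kind of cost that makes screening very long sequences practical.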

  4. Weighting in sequence space: A comparison of methods in terms of generalized sequences

    SciTech Connect

    Vingron, M.; Sibbald, P.R.

    1993-10-01

    Four methods for weighting aligned biological sequences have recently appeared that differ mathematically, philosophically, and in their results. Thus, while there is consensus about the need to weight sequences, the method to use is contentious. A geometric analysis based on a continuous sequence space is presented that provides a common framework in which to compare the methods. It is concluded that there are two 'best' methods. When the sequences are known to be phylogenetically related and a tree can be generated without introducing excessive stress into the data, the method of Altschul et al. [Altschul, S.F., Carroll, R.J. & Lipman, D.J. (1989) J. Mol. Biol. 207, 647-653] is appropriate. When the sequences are not known to be phylogenetically related or a tree cannot be produced without unduly distorting the distances between the sequences, a modification of the method of Sibbald and Argos [Sibbald, P.R. & Argos, P. (1990) J. Mol. Biol. 216, 813-818] is preferable. 29 refs., 3 figs., 2 tabs.

  5. Advanced Applications of Next-Generation Sequencing Technologies to Orchid Biology.

    PubMed

    Yeh, Chuan-Ming; Liu, Zhong-Jian; Tsai, Wen-Chieh

    2017-09-08

    Next-generation sequencing technologies are revolutionizing biology by permitting transcriptome sequencing, whole-genome sequencing and resequencing, and genome-wide single nucleotide polymorphism profiling. Orchid research has benefited from this breakthrough, and a few orchid genomes are now available; new biological questions can be approached and new breeding strategies can be designed. The first part of this review describes the unique features of orchid biology. The second part provides an overview of the current next-generation sequencing platforms, many of which are already used in plant laboratories. The third part summarizes the state of orchid transcriptome and genome sequencing and illustrates current achievements. The genetic sequences currently obtained will not only provide a broad scope for the study of orchid biology, but also serve as a starting point for uncovering the mystery of orchid evolution.

  6. Multi-species sequence comparison: the next frontier in genome annotation.

    PubMed

    Dubchak, Inna; Frazer, Kelly

    2003-01-01

    Multi-species comparisons of DNA sequences are more powerful for discovering functional sequences than pairwise DNA sequence comparisons. Most current computational tools have been designed for pairwise comparisons, and efficient extension of these tools to multiple species will require knowledge of the ideal evolutionary distance to choose and the development of new algorithms for alignment, analysis of conservation, and visualization of results.

  7. Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment.

    PubMed

    Ferragina, Paolo; Giancarlo, Raffaele; Greco, Valentina; Manzini, Giovanni; Valiente, Gabriel

    2007-07-13

    Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in Biology. It is currently primarily handled using alignments. However, the alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined only to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric) has gained prominence. It is based on the deep theory of Kolmogorov Complexity and universality is its most novel striking feature. Since it can only be approximated via data compression, USM is a methodology rather than a formula quantifying the similarity of two strings. Three approximations of USM are available, namely UCD (Universal Compression Dissimilarity), NCD (Normalized Compression Dissimilarity) and CD (Compression Dissimilarity). Their applicability and robustness is tested on various data sets yielding a first massive quantitative estimate that the USM methodology and its approximations are of value. Despite the rich theory developed around USM, its experimental assessment has limitations: only a few data compressors have been tested in conjunction with USM and mostly at a qualitative level, no comparison among UCD, NCD and CD is available and no comparison of USM with existing methods, both based on alignments and not, seems to be available. We experimentally test the USM methodology by using 25 compressors, all three of its known approximations and six data sets of relevance to Molecular Biology. This offers the first systematic and quantitative experimental assessment of this methodology, that naturally complements the many theoretical and the preliminary experimental results available. Moreover, we compare the USM methodology both with methods based on alignments and not. We may group our experiments into two sets. The first one, performed via ROC (Receiver Operating Curve) analysis, aims at
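
    To make the compression-based approach concrete, the sketch below approximates NCD with a general-purpose compressor. It uses the commonly cited form NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The choice of zlib is an assumption made purely for convenience and is not stated to be one of the 25 compressors the authors evaluated.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (maximum compression)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression dissimilarity, approximated with zlib."""
    cx = compressed_size(x.encode())
    cy = compressed_size(y.encode())
    cxy = compressed_size((x + y).encode())
    return (cxy - min(cx, cy)) / max(cx, cy)
```

    Values near 0 suggest that one string adds little information beyond the other, and values near 1 suggest little shared structure. Because zlib matches repeats only within a 32 KB window, this particular compressor is a poor fit for long genomic sequences.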

  8. Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment

    PubMed Central

    Ferragina, Paolo; Giancarlo, Raffaele; Greco, Valentina; Manzini, Giovanni; Valiente, Gabriel

    2007-01-01

    Background Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in Biology. It is currently primarily handled using alignments. However, the alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined only to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric) has gained prominence. It is based on the deep theory of Kolmogorov Complexity and universality is its most novel striking feature. Since it can only be approximated via data compression, USM is a methodology rather than a formula quantifying the similarity of two strings. Three approximations of USM are available, namely UCD (Universal Compression Dissimilarity), NCD (Normalized Compression Dissimilarity) and CD (Compression Dissimilarity). Their applicability and robustness is tested on various data sets yielding a first massive quantitative estimate that the USM methodology and its approximations are of value. Despite the rich theory developed around USM, its experimental assessment has limitations: only a few data compressors have been tested in conjunction with USM and mostly at a qualitative level, no comparison among UCD, NCD and CD is available and no comparison of USM with existing methods, both based on alignments and not, seems to be available. Results We experimentally test the USM methodology by using 25 compressors, all three of its known approximations and six data sets of relevance to Molecular Biology. This offers the first systematic and quantitative experimental assessment of this methodology, that naturally complements the many theoretical and the preliminary experimental results available. Moreover, we compare the USM methodology both with methods based on alignments and not. We may group our experiments into two sets. The first one, performed via ROC (Receiver Operating Curve

  9. Revision of Begomovirus taxonomy based on pairwise sequence comparisons.

    PubMed

    Brown, Judith K; Zerbini, F Murilo; Navas-Castillo, Jesús; Moriones, Enrique; Ramos-Sobrinho, Roberto; Silva, José C F; Fiallo-Olivé, Elvira; Briddon, Rob W; Hernández-Zepeda, Cecilia; Idris, Ali; Malathi, V G; Martin, Darren P; Rivera-Bustamante, Rafael; Ueda, Shigenori; Varsani, Arvind

    2015-06-01

    Viruses of the genus Begomovirus (family Geminiviridae) are emergent pathogens of crops throughout the tropical and subtropical regions of the world. By virtue of having a small DNA genome that is easily cloned, and due to the recent innovations in cloning and low-cost sequencing, there has been a dramatic increase in the number of available begomovirus genome sequences. Even so, most of the available sequences have been obtained from cultivated plants and are likely a small and phylogenetically unrepresentative sample of begomovirus diversity, a factor constraining taxonomic decisions such as the establishment of operationally useful species demarcation criteria. In addition, problems in assigning new viruses to established species have highlighted shortcomings in the previously recommended mechanism of species demarcation. Based on the analysis of 3,123 full-length begomovirus genome (or DNA-A component) sequences available in public databases as of December 2012, a set of revised guidelines for the classification and nomenclature of begomoviruses are proposed. The guidelines primarily consider a) genus-level biological characteristics and b) results obtained using a standardized classification tool, Sequence Demarcation Tool, which performs pairwise sequence alignments and identity calculations. These guidelines are consistent with the recently published recommendations for the genera Mastrevirus and Curtovirus of the family Geminiviridae. Genome-wide pairwise identities of 91 % and 94 % are proposed as the demarcation threshold for begomoviruses belonging to different species and strains, respectively. Procedures and guidelines are outlined for resolving conflicts that may arise when assigning species and strains to categories wherever the pairwise identity falls on or very near the demarcation threshold value.
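
    The demarcation rule in this abstract can be sketched as a percent-identity calculation followed by two cutoffs. This is a simplified stand-in, not the Sequence Demarcation Tool itself: the input is assumed to be already aligned, columns containing a gap are assumed to be ignored, and values exactly on a threshold are assigned to the higher category, whereas the abstract notes that such borderline cases need separate conflict-resolution procedures.

```python
def pairwise_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def demarcate(identity, species_cutoff=91.0, strain_cutoff=94.0):
    """Apply the 91 % (species) and 94 % (strain) identity thresholds."""
    if identity < species_cutoff:
        return "different species"
    if identity < strain_cutoff:
        return "same species, different strain"
    return "same species, same strain"
```

    For instance, a genome-wide identity of 92.5 % would fall between the two cutoffs and be reported as the same species but a different strain.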

  10. High-throughput sequencing in veterinary infection biology and diagnostics.

    PubMed

    Belák, S; Karlsson, O E; Leijon, M; Granberg, F

    2013-12-01

    Sequencing methods have improved rapidly since the first versions of the Sanger techniques, facilitating the development of very powerful tools for detecting and identifying various pathogens, such as viruses, bacteria and other microbes. The ongoing development of high-throughput sequencing (HTS; also known as next-generation sequencing) technologies has resulted in a dramatic reduction in DNA sequencing costs, making the technology more accessible to the average laboratory. In this White Paper of the World Organisation for Animal Health (OIE) Collaborating Centre for the Biotechnology-based Diagnosis of Infectious Diseases in Veterinary Medicine (Uppsala, Sweden), several approaches and examples of HTS are summarised, and their diagnostic applicability is briefly discussed. Selected future aspects of HTS are outlined, including the need for bioinformatic resources, with a focus on improving the diagnosis and control of infectious diseases in veterinary medicine.

  11. Systems biology: model based evaluation and comparison of potential explanations for given biological data.

    PubMed

    Cedersund, Gunnar; Roll, Jacob

    2009-02-01

    Systems biology and its usage of mathematical modeling to analyse biological data is rapidly becoming an established approach to biology. A crucial advantage of this approach is that more information can be extracted from observations of intricate dynamics, which allows nontrivial complex explanations to be evaluated and compared. In this minireview we explain this process, and review some of the most central available analysis tools. The focus is on the evaluation and comparison of given explanations for a given set of experimental data and prior knowledge. Three types of methods are discussed: (a) for evaluation of whether a given model is sufficiently able to describe the given data to be nonrejectable; (b) for evaluation of whether a slightly superior model is significantly better; and (c) for a general evaluation and comparison of the biologically interesting features in a model. The most central methods are reviewed, both in terms of underlying assumptions, including references to more advanced literature for the theoretically oriented reader, and in terms of practical guidelines and examples, for the practically oriented reader. Many of the methods are based upon analysis tools from statistics and engineering, and we emphasize that the systems biology focus on acceptable explanations puts these methods in a nonstandard setting. We highlight some associated future improvements that will be essential for future developments of model based data analysis in biology.

  12. Single-cell sequencing in stem cell biology.

    PubMed

    Wen, Lu; Tang, Fuchou

    2016-04-15

    Cell-to-cell variation and heterogeneity are fundamental and intrinsic characteristics of stem cell populations, but these differences are masked when bulk cells are used for omic analysis. Single-cell sequencing technologies serve as powerful tools to dissect cellular heterogeneity comprehensively and to identify distinct phenotypic cell types, even within a 'homogeneous' stem cell population. These technologies, including single-cell genome, epigenome, and transcriptome sequencing technologies, have been developing rapidly in recent years. The application of these methods to different types of stem cells, including pluripotent stem cells and tissue-specific stem cells, has led to exciting new findings in the stem cell field. In this review, we discuss the recent progress as well as future perspectives in the methodologies and applications of single-cell omic sequencing technologies.

  13. Statistical Comparison of Spatial Point Patterns in Biological Imaging

    PubMed Central

    Burguet, Jasmine; Andrey, Philippe

    2014-01-01

    In biological systems, functions and spatial organizations are closely related. Spatial data in biology frequently consist of, or can be assimilated to, sets of points. An important goal in the quantitative analysis of such data is the evaluation and localization of differences in spatial distributions between groups. Because of experimental replications, achieving this goal requires comparing collections of point sets, a noticeably challenging issue for which no method has been proposed to date. We introduce a strategy to address this problem, based on the comparison of point intensities throughout space. Our method is based on a statistical test that determines whether local point intensities, estimated using replicated data, are significantly different or not. Repeating this test at different positions provides an intensity comparison map and reveals domains showing significant intensity differences. Simulated data were used to characterize and validate this approach. The method was then applied to two different neuroanatomical systems to evaluate its ability to reveal spatial differences in biological data sets. Applied to two distinct neuronal populations within the rat spinal cord, the method generated an objective representation of the spatial segregation established previously on a subjective visual basis. The method was also applied to analyze the spatial distribution of locus coeruleus neurons in control and mutant mice. The results objectively consolidated previous conclusions obtained from visual comparisons. Remarkably, they also provided new insights into the maturation of the locus coeruleus in mutant and control animals. Overall, the method introduced here is a new contribution to the quantitative analysis of biological organizations that provides meaningful spatial representations which are easy to understand and to interpret. Finally, because our approach is generic and punctual structures are widespread at the cellular and histological scales, it

  14. Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification.

    PubMed

    Borozan, Ivan; Watt, Stuart; Ferretti, Vincent

    2015-05-01

    Alignment-based sequence similarity searches, while accurate for some type of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim to improve the accuracy with which DNA and protein sequences are characterized. Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
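
    The notion of a combined similarity score can be sketched as a weighted average of per-measure scores, with the query assigned the label of its best-scoring reference. This is only a schematic reading of the abstract and not the authors' released code: the function names, the dictionary layout, and the nearest-reference decision rule are assumptions, and the weights are passed in by the caller rather than learned from the discriminatory ability of each measure on a training set.

```python
def combined_score(scores, weights):
    """Weighted average of per-measure similarity scores.

    scores:  {measure_name: similarity}
    weights: {measure_name: non-negative weight}
    """
    total_weight = sum(weights[m] for m in scores)
    return sum(weights[m] * s for m, s in scores.items()) / total_weight


def classify(query_scores, labels, weights):
    """Label of the reference with the highest combined score.

    query_scores: {reference_id: {measure_name: similarity}}
    labels:       {reference_id: class_label}
    """
    best = max(query_scores,
               key=lambda ref: combined_score(query_scores[ref], weights))
    return labels[best]
```

    Since the weights are an argument, a different weight set can be supplied for each query, in the spirit of the per-sequence weighting the abstract describes.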

  15. Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification

    PubMed Central

    Borozan, Ivan; Watt, Stuart; Ferretti, Vincent

    2015-01-01

    Motivation: Alignment-based sequence similarity searches, while accurate for some types of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim of improving the accuracy with which DNA and protein sequences are characterized. Results: Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. Availability and implementation: All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25573913
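
    The combined score described in this abstract can be illustrated with a minimal sketch: several similarity measures are evaluated for a query/target pair and merged as a weighted average. Everything named here is an assumption made for illustration: the two measures (a k-mer cosine similarity as the alignment-free measure and a crude ungapped positional identity standing in for an alignment-based one), the fixed weights, and the nearest-neighbour `classify` helper. The published model uses five measures and derives per-sequence weights from the training set, which this sketch does not attempt.

```python
from collections import Counter
from math import sqrt


def kmer_cosine(a: str, b: str, k: int = 3) -> float:
    """Alignment-free measure: cosine similarity of k-mer count vectors."""
    ca = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    cb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def positional_identity(a: str, b: str) -> float:
    """Crude stand-in for an alignment-based measure (assumption):
    fraction of matching positions in an ungapped, left-anchored comparison."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / longest


def combined_score(query, target, measures, weights):
    """Weighted average of the individual similarity measures."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * m(query, target) for m, w in zip(measures, weights)) / total


def classify(query, labelled, measures, weights):
    """Return the label of the reference sequence with the highest combined score."""
    scored = ((label, combined_score(query, seq, measures, weights)) for seq, label in labelled)
    return max(scored, key=lambda pair: pair[1])[0]
```

    In this sketch the weights are supplied by the caller; a per-query weighting scheme would replace the fixed `weights` argument with values chosen separately for each query.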

  16. Identifying satellites and periodic repetitions in biological sequences.

    PubMed

    Sagot, M F; Myers, E W

    1998-01-01

    We present in this paper an algorithm for identifying satellites in DNA sequences. Satellites (simple, micro, or mini) are repeats in number between 30 and as many as 1,000,000 whose lengths vary between 2 and hundreds of base pairs and that appear, with some mutations, in tandem along the sequence. We concentrate here on short to moderately long (up to 30-40 base pairs) approximate tandem repeats where copies may differ up to epsilon = 15-20% from a consensus model of the repeating unit (implying individual units may vary by 2 epsilon from each other). The algorithm is composed of two parts. The first one consists of a filter that basically eliminates all regions whose probability of containing a satellite is less than one in 10^4 when epsilon = 10%. The second part realizes an exhaustive exploration of the space of all possible models for the repeating units present in the sequence. It therefore has the advantage over previous work of being able to report a consensus model, say m, of the repeated unit as well as the span of the satellite. The first phase was designed for efficiency and takes only O(n) time where n is the length of the sequence. The second phase was designed for sensitivity and takes time O(n · N(e, k)) in the worst case where k is the length of the repeating unit m, e = [epsilon k] is the number of differences allowed between each repeat unit and the model m, and N(e, k) is the maximum number of words that are not more than e differences from another word of length k. That is, N(e, k) is the maximum size of an e-neighborhood of a string of length k. Experiments reveal the second phase to be considerably faster in practice than the worst-case complexity bound suggests. Finally, the present algorithm is easily adapted to finding tandem repeats in protein sequences, as well as extended to identifying mixed direct-inverse tandem repeats.
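
    To give a feel for how quickly the e-neighborhood term N(e, k) grows, the sketch below enumerates the neighborhood of a word and compares its size with a closed form. It assumes that "differences" means substitutions only (Hamming distance) over the DNA alphabet; the abstract does not say whether insertions and deletions are counted, and under an edit-distance definition the neighborhood would be different. The function names are hypothetical.

```python
from itertools import combinations, product
from math import comb

DNA = "ACGT"


def hamming_neighbourhood(word: str, e: int, alphabet: str = DNA) -> set:
    """All words of the same length that differ from `word` in at most e positions."""
    result = {word}
    for d in range(1, e + 1):
        for positions in combinations(range(len(word)), d):
            # At each chosen position, try every letter except the original one.
            choices = [[c for c in alphabet if c != word[p]] for p in positions]
            for replacement in product(*choices):
                chars = list(word)
                for p, c in zip(positions, replacement):
                    chars[p] = c
                result.add("".join(chars))
    return result


def hamming_neighbourhood_size(k: int, e: int, sigma: int = 4) -> int:
    """Closed form under the substitution-only assumption:
    sum over i = 0..e of C(k, i) * (sigma - 1)^i."""
    return sum(comb(k, i) * (sigma - 1) ** i for i in range(min(e, k) + 1))
```

    For a unit of length k = 6 with e = 2 substitutions allowed, both functions give 154 words, which illustrates why an exhaustive exploration of models is costly in the worst case.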

  17. De novo Sequencing, Characterization, and Comparison of Inflorescence Transcriptomes of Cornus canadensis and C. florida (Cornaceae)

    PubMed Central

    Zhang, Jian; Franks, Robert G.; Liu, Xiang; Kang, Ming; Keebler, Jonathan E. M.; Schaff, Jennifer E.; Huang, Hong-Wen; Xiang, Qiu-Yun (Jenny)

    2013-01-01

    Background Transcriptome sequencing analysis is a powerful tool in molecular genetics and evolutionary biology. Here we report the results of de novo 454 sequencing, characterization, and comparison of inflorescence transcriptomes of two closely related dogwood species, Cornus canadensis and C. florida (Cornaceae). Our goals were to build a preliminary source of genome sequence data, and to identify genes potentially expressed differentially between the inflorescence transcriptomes for these important horticultural species. Results The sequencing of cDNAs from inflorescence buds of C. canadensis (cc) and C. florida (cf), and normalized cDNAs from leaves of C. canadensis resulted in 251799 (ccBud), 96245 (ccLeaf) and 114648 (cfBud) raw reads, respectively. The de novo assembly of the high quality (HQ) reads resulted in 36088, 17802 and 21210 unigenes for ccBud, ccLeaf and cfBud. A reference transcriptome for C. canadensis was built by assembling HQ reads of ccBud and ccLeaf, containing 40884 unigenes. Reference mapping and comparative analyses found 10926 sequences were putatively specific to ccBud, and 6979 putatively specific to cfBud. Putative differentially expressed genes between ccBud and cfBud that are related to flower development and/or stress response were identified among 7718 shared sequences by ccBud and cfBud. Bi-directional BLAST found 87 (41.83% of 208) of Arabidopsis genes related to inflorescence development had putative orthologs in the dogwood transcriptomes. Comparisons of the shared sequences by ccBud and cfBud yielded 65931 high quality SNPs between two species. The twenty unigenes with the most SNPs are listed as potential genetic markers for evolutionary studies. Conclusions The data provide an important, although preliminary, information platform for functional genomics and evolutionary developmental biology in Cornus. The study identified putative candidates potentially involved in the genetic regulation of inflorescence evolution and

  18. International interlaboratory study comparing single organism 16S rRNA gene sequencing data: Beyond consensus sequence comparisons

    PubMed Central

    Olson, Nathan D.; Lund, Steven P.; Zook, Justin M.; Rojas-Cornejo, Fabiola; Beck, Brian; Foy, Carole; Huggett, Jim; Whale, Alexandra S.; Sui, Zhiwei; Baoutina, Anna; Dobeson, Michael; Partis, Lina; Morrow, Jayne B.

    2015-01-01

    This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants performed sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved position, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies. PMID:27077030

  19. International interlaboratory study comparing single organism 16S rRNA gene sequencing data: Beyond consensus sequence comparisons.

    PubMed

    Olson, Nathan D; Lund, Steven P; Zook, Justin M; Rojas-Cornejo, Fabiola; Beck, Brian; Foy, Carole; Huggett, Jim; Whale, Alexandra S; Sui, Zhiwei; Baoutina, Anna; Dobeson, Michael; Partis, Lina; Morrow, Jayne B

    2015-03-01

    This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants performed sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved position, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies.
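
    As a rough illustration of what estimating a variant copy ratio at one variable position can look like, the sketch below computes a maximum likelihood estimate and a simple Bayesian estimate from read counts. It assumes a plain binomial model with a Beta prior, which is a textbook simplification and not the statistical method developed in the study; the function names and the `copies` parameter (the number of gene copies in the genome) are hypothetical.

```python
def variant_ratio_mle(variant_reads: int, total_reads: int) -> float:
    """Maximum likelihood estimate of the variant fraction under a binomial model."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return variant_reads / total_reads


def variant_ratio_posterior_mean(variant_reads: int, total_reads: int,
                                 alpha: float = 1.0, beta: float = 1.0) -> float:
    """Posterior mean of the variant fraction with a Beta(alpha, beta) prior.
    The default Beta(1, 1) is a uniform prior (an assumption of this sketch)."""
    return (variant_reads + alpha) / (total_reads + alpha + beta)


def nearest_copy_count(ratio: float, copies: int) -> int:
    """Number of gene copies (out of `copies`) whose fraction is closest to `ratio`."""
    return round(ratio * copies)
```

    With deep coverage the two estimates converge, which is consistent with the abstract's remark that reproducible ratios required high-throughput data.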

  20. Next-generation sequencing workflows in veterinary infection biology: towards validation and quality assurance.

    PubMed

    Van Borm, S; Wang, J; Granberg, F; Colling, A

    2016-04-01

    Recent advancements in DNA sequencing methodologies and sequence data analysis have revolutionised research in many areas of biology and medicine, including veterinary infection biology. New technology is poised to bridge the gap between the research and diagnostic laboratory. This paper defines the potential diagnostic value and purposes of next-generation sequencing (NGS) applications in veterinary infection biology and explores their compatibility with the existing validation principles and methods of the World Organisation for Animal Health. Critical parameters for validation and quality control (quality metrics) are suggested, with reference to established validation and quality assurance guidelines for NGS-based methods of diagnosing human heritable diseases. Although most currently described NGS applications in veterinary infection biology are not primary diagnostic tests that directly result in control measures, this critical reflection on the advantages and remaining challenges of NGS technology should stimulate discussion on its diagnostic value and on the potential to validate NGS methods and monitor their diagnostic performance.

  1. Correlation between MCAT Biology Content Specifications and Topic Scope and Sequence of General Education College Biology Textbooks

    PubMed Central

    Rissing, Steven W.

    2013-01-01

    Most American colleges and universities offer gateway biology courses to meet the needs of three undergraduate audiences: biology and related science majors, many of whom will become biomedical researchers; premedical students meeting medical school requirements and preparing for the Medical College Admissions Test (MCAT); and students completing general education (GE) graduation requirements. Biology textbooks for these three audiences present a topic scope and sequence that correlates with the topic scope and importance ratings of the biology content specifications for the MCAT regardless of the intended audience. Texts for “nonmajors,” GE courses appear derived directly from their publisher's majors text. Topic scope and sequence of GE texts reflect those of “their” majors text and, indirectly, the MCAT. MCAT term density of GE texts equals or exceeds that of their corresponding majors text. Most American universities require a GE curriculum to promote a core level of academic understanding among their graduates. This includes civic scientific literacy, recognized as an essential competence for the development of public policies in an increasingly scientific and technological world. Deriving GE biology and related science texts from majors texts designed to meet very different learning objectives may defeat the scientific literacy goals of most schools’ GE curricula. PMID:24006392

  2. Correlation between MCAT biology content specifications and topic scope and sequence of general education college biology textbooks.

    PubMed

    Rissing, Steven W

    2013-01-01

    Most American colleges and universities offer gateway biology courses to meet the needs of three undergraduate audiences: biology and related science majors, many of whom will become biomedical researchers; premedical students meeting medical school requirements and preparing for the Medical College Admissions Test (MCAT); and students completing general education (GE) graduation requirements. Biology textbooks for these three audiences present a topic scope and sequence that correlates with the topic scope and importance ratings of the biology content specifications for the MCAT regardless of the intended audience. Texts for "nonmajors," GE courses appear derived directly from their publisher's majors text. Topic scope and sequence of GE texts reflect those of "their" majors text and, indirectly, the MCAT. MCAT term density of GE texts equals or exceeds that of their corresponding majors text. Most American universities require a GE curriculum to promote a core level of academic understanding among their graduates. This includes civic scientific literacy, recognized as an essential competence for the development of public policies in an increasingly scientific and technological world. Deriving GE biology and related science texts from majors texts designed to meet very different learning objectives may defeat the scientific literacy goals of most schools' GE curricula.

  3. Linking experimental results, biological networks and sequence analysis methods using Ontologies and Generalised Data Structures.

    PubMed

    Koehler, Jacob; Rawlings, Chris; Verrier, Paul; Mitchell, Rowan; Skusa, Andre; Ruegg, Alexander; Philippi, Stephan

    2005-01-01

    The structure of a closely integrated data warehouse is described that is designed to link different types and varying numbers of biological networks, sequence analysis methods and experimental results such as those coming from microarrays. The data schema is inspired by a combination of graph based methods and generalised data structures and makes use of ontologies and meta-data. The core idea is to consider and store biological networks as graphs, and to use generalised data structures (GDS) for the storage of further relevant information. This is possible because many biological networks can be stored as graphs: protein interactions, signal transduction networks, metabolic pathways, gene regulatory networks etc. Nodes in biological graphs represent entities such as promoters, proteins, genes and transcripts whereas the edges of such graphs specify how the nodes are related. The semantics of the nodes and edges are defined using ontologies of node and relation types. Besides generic attributes that most biological entities possess (name, attribute description), further information is stored using generalised data structures. By directly linking to underlying sequences (exons, introns, promoters, amino acid sequences) in a systematic way, close interoperability to sequence analysis methods can be achieved. This approach allows us to store, query and update a wide variety of biological information in a way that is semantically compact without requiring changes at the database schema level when new kinds of biological information are added. We describe how this data warehouse is being implemented by extending the text-mining framework ONDEX to link, support and complement different bioinformatics applications and research activities such as microarray analysis, sequence analysis and modelling/simulation of biological systems. The system is developed under the GPL license and can be downloaded from http://sourceforge.net/projects/ondex/
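
    The schema idea in this abstract (typed nodes and edges plus an open-ended attribute store) can be sketched in a few lines. The class and field names below are hypothetical and are not taken from ONDEX; the "ontology" is reduced to two sets of allowed type names, and the generalised data structure is modelled as a plain dictionary so that new kinds of information need no schema change.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Node:
    node_id: str
    node_type: str                      # must be a term of the node-type ontology
    name: str
    gds: Dict[str, Any] = field(default_factory=dict)  # open-ended attributes


@dataclass
class Edge:
    source: str
    target: str
    relation_type: str                  # must be a term of the relation-type ontology
    gds: Dict[str, Any] = field(default_factory=dict)


class BioGraph:
    """Graph whose node and edge semantics are constrained by type vocabularies."""

    def __init__(self, node_types, relation_types):
        self.node_types = set(node_types)
        self.relation_types = set(relation_types)
        self.nodes: Dict[str, Node] = {}
        self.edges: List[Edge] = []

    def add_node(self, node: Node) -> None:
        if node.node_type not in self.node_types:
            raise ValueError(f"unknown node type: {node.node_type}")
        self.nodes[node.node_id] = node

    def add_edge(self, edge: Edge) -> None:
        if edge.relation_type not in self.relation_types:
            raise ValueError(f"unknown relation type: {edge.relation_type}")
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(edge)

    def neighbours(self, node_id: str, relation_type: Optional[str] = None) -> List[str]:
        """Targets reachable from node_id, optionally restricted to one relation type."""
        return [e.target for e in self.edges
                if e.source == node_id
                and (relation_type is None or e.relation_type == relation_type)]
```

    Attaching a sequence to a node is then just `node.gds["sequence"] = "ATG..."`, which mirrors the abstract's point about linking entities to their underlying sequences without altering the schema.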

  4. Multidomain Peptides: Sequence-Nanostructure Relationships and Biological Applications

    NASA Astrophysics Data System (ADS)

    Bakota, Erica Laraine

    2011-12-01

    Peptides are materials that, as a result of their polymeric nature, possess enormous versatility and customizability. Multidomain peptides are a class of peptides that self-assemble to form stable, cytocompatible hydrogels. They have an ABA block motif, in which the A block is composed of charged amino acids, such as lysine, and the B block consists of alternating hydrophilic and hydrophobic amino acids, such as glutamine and leucine. The B block forms a facial amphiphile that drives self-assembly. The charged A blocks simultaneously limit self-assembly and improve solubility. Self-assembly is triggered by charge screening of these charged amino acids, enabling the formation of beta-sheet fibers. The development of an extended nanofiber network can result in the formation of a hydrogel. Systematic modifications to both the A and B blocks were investigated, and it was found that sequence modifications have a large impact on peptide nanostructure and hydrogel rheology. The first modification examined is the substitution of amino acids within the hydrophilic positions of the B block. The second set of modifications investigated was the incorporation of aromatic amino acids in the B block. Finally, the charged block was varied to generate different net charges on the peptides, a change which impacted the ability to use these peptides in cell culture. Two applications of multidomain peptide nanofibers are explored, the first of which is the delivery of novel therapies in vivo. One multidomain peptide is able to form hydrogels that undergo shear-thinning and rapid recovery. This gel can be loaded with cytokines and growth factors that have been secreted by embryonic stem cells, and these molecules can be subsequently released in a therapeutic setting. Another application for multidomain peptides is their use as biocompatible surfactants. Single-walled carbon nanotubes have been widely investigated for their unique optical and electrical properties, but their solubility in

  5. Elucidation of the sequence of canine (pro)-calcitonin. A molecular biological and protein chemical approach.

    PubMed

    Mol, J A; Kwant, M M; Arnold, I C; Hazewinkel, H A

    1991-09-03

    From the canine thyroid gland a calcitonin (CT) immunoreactive peptide was purified by successive aqueous acid acetone extraction, gel filtration and HPLC. Gas-phase sequencing of the purified peptide showed that the first 25 amino acids had 65% sequence homology with the amino-terminus of the human CT prohormone. A canine cDNA library was then made from the thyroid gland. A plasmid was isolated containing a sequence that is homologous to part of exon 3, and the complete sequence of exon 4 of the human mRNA encoding preproCT. From this cDNA the amino acid sequence of canine CT is predicted. In comparison with well-known CT sequences of other species, the strongest homology exists with bovine, porcine and ovine CT.

  6. Sequence comparison on a cluster of workstations using the PVM system

    SciTech Connect

    Guan, X.; Mural, R.J.; Uberbacher, E.C.

    1995-02-01

    We have implemented a distributed sequence comparison algorithm on a cluster of workstations using the PVM paradigm. This implementation has achieved similar performance to the Intel iPSC/860 Hypercube, a massively parallel computer. The distributed sequence comparison algorithm serves as a search tool for two Internet servers, GRAIL and GENQUEST. This paper describes the implementation and the performance of the algorithm.
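
    The general pattern behind a distributed database search, where one coordinator hands database sequences to workers and collects their scores, can be sketched as follows. This is not the PVM implementation from the paper: a Python thread pool stands in for message passing between workstations, and the scoring function is a basic Smith-Waterman local alignment score with linear gap costs chosen as an illustrative assumption. Because of Python's global interpreter lock, threads show the partitioning pattern here rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score, using two rows of memory."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def search(query: str, database: dict, workers: int = 4) -> list:
    """Score the query against every database entry in parallel.
    Returns (name, score) pairs sorted from best to worst."""
    def score_entry(item):
        name, sequence = item
        return name, local_alignment_score(query, sequence)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(score_entry, database.items()))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

    Each database entry is an independent task, which is what makes this kind of search easy to spread over a cluster.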

  7. Comparative systems biology between human and animal models based on next-generation sequencing methods.

    PubMed

    Zhao, Yu-Qi; Li, Gong-Hua; Huang, Jing-Fei

    2013-04-01

    Animal models provide myriad benefits to both experimental and clinical research. Unfortunately, in many situations, they fall short of expected results or provide contradictory results. In part, this can be the result of traditional molecular biological approaches that are relatively inefficient in elucidating underlying molecular mechanisms. To improve the efficacy of animal models, a technological breakthrough is required. The growing availability and application of high-throughput methods make systematic comparisons between human and animal models easier to perform. In the present study, we introduce the concept of comparative systems biology, which we define as "comparisons of biological systems in different states or species used to achieve an integrated understanding of life forms with all their characteristic complexity of interactions at multiple levels". Furthermore, we discuss the applications of RNA-seq and ChIP-seq technologies to comparative systems biology between human and animal models and assess the potential applications for this approach in future studies.

  8. Graphical visualization of the biologically significant segments in the sequence sets of the relative plant viruses.

    PubMed

    Shcherbatenko, I S

    2012-01-01

    The author's and collaborators' computational investigations of the conserved biologically significant segments within viral nucleotide and amino acid sequences are considered in the article. The results obtained suggest that the interactive graphical visualization of the short identical or similar sites in the sequence sets of related viruses makes it possible to reveal various specific elements such as right, inverted tandem, opposite and regular repeats; deletions/insertions; GC/AT-rich sites; contexts of translation initiation and termination codons; transcription initiation signals; spontaneous nucleotide substitutions; codon usage bias etc. To reveal and investigate different biologically significant sequences, very short and simple computer programs based on a common sequence scanning algorithm may be employed. Various graphic objects that appear during visualization of similar sites may be computationally converted into the corresponding nucleotide or amino acid sequences and then written to a text file. Changing some scanning parameters or slightly modifying certain program modules extends the program's capabilities. A set of small, simplified computer programs obtained by successive modifications of the initial program is a suitable tool for quickly revealing and investigating various biologically significant sequence sites.

  9. Comparison of next generation sequencing technologies for transcriptome characterization

    PubMed Central

    2009-01-01

    Background We have developed a simulation approach to help determine the optimal mixture of sequencing methods for most complete and cost effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis. Results The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc http://fgp.huck.psu.edu/NG_Sims/ngsim.pl, an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, the NG sequencing is a dramatic advance over capillary

  10. Cost-utility of biological treatment sequences for luminal Crohn's disease in Europe.

    PubMed

    Rencz, Fanni; Gulácsi, László; Péntek, Márta; Gecse, Krisztina B; Dignass, Axel; Halfvarson, Jonas; Gomollón, Fernando; Baji, Petra; Peyrin-Biroulet, Laurent; Lakatos, Peter L; Brodszky, Valentin

    2017-04-28

    This study aims to compare the cost-effectiveness of treatment sequences with available biologics, including adalimumab (ADA), biosimilar infliximab (bsIFX), originator infliximab (IFX) and vedolizumab (VEDO) for luminal Crohn's disease in nine European countries. A Markov-model was constructed to simulate five-year medical costs and quality-adjusted life years (QALYs). Data on clinical efficacy were obtained from randomised controlled trials. Country-specific unit costs, discount rates and a third-party payer perspective were applied. The bsIFX versus conventional therapy resulted in the most favourable incremental cost-utility ratios (ICURs) ranging from €34,580 (Hungary) to €77,062/QALY (Sweden). Compared to bsIFX, the bsIFX-ADA sequence was more cost-effective than the bsIFX-VEDO sequence with ICURs varying between €70,277 (France) and €162,069/QALY (Germany). The ICURs of the bsIFX-ADA-VEDO sequence versus the bsIFX-ADA strategy were between €206,266 (The Netherlands) and €363,232/QALY (Spain). We are the first to compare cost-effectiveness of multiple biological sequences for luminal Crohn's disease. Based on our findings, bsIFX can be recommended as a first-line treatment in patients unresponsive to conventional treatments. While biological sequences only slightly differ in their associated health gains, their costs vary greatly. The bsIFX-ADA-VEDO seems to be the most cost-effective sequence of the available biologics across Europe.
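
    The quantities reported in this abstract follow from a standard calculation: a Markov cohort model accumulates discounted costs and QALYs per strategy, and the incremental cost-utility ratio (ICUR) is the cost difference divided by the QALY difference. The sketch below shows that arithmetic only. The states, transition probabilities, cycle length, and all numbers a caller would pass in are hypothetical and are not taken from the study.

```python
def discounted_total(values_per_cycle, rate: float) -> float:
    """Sum per-cycle values, discounting cycle t by (1 + rate)^t."""
    return sum(v / (1 + rate) ** t for t, v in enumerate(values_per_cycle))


def run_markov_cohort(start, transition, cost_per_state, utility_per_state,
                      cycles: int, rate: float):
    """Simulate a cohort and return (discounted cost, discounted QALYs).

    start              -- initial share of the cohort in each state
    transition[i][j]   -- probability of moving from state i to state j per cycle
    cost_per_state     -- cost of spending one cycle in each state
    utility_per_state  -- QALYs gained per cycle in each state
    """
    dist = list(start)
    costs, qalys = [], []
    n = len(dist)
    for _ in range(cycles):
        costs.append(sum(p * c for p, c in zip(dist, cost_per_state)))
        qalys.append(sum(p * u for p, u in zip(dist, utility_per_state)))
        dist = [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return discounted_total(costs, rate), discounted_total(qalys, rate)


def icur(cost_new: float, qaly_new: float, cost_ref: float, qaly_ref: float) -> float:
    """Incremental cost per QALY gained of the new strategy versus the reference."""
    delta_qaly = qaly_new - qaly_ref
    if delta_qaly == 0:
        raise ZeroDivisionError("strategies yield identical QALYs")
    return (cost_new - cost_ref) / delta_qaly
```

    For example, a strategy costing 50,000 for 3.0 QALYs compared with one costing 20,000 for 2.5 QALYs gives an ICUR of 60,000 per QALY; small QALY differences in the denominator are what make ratios between similar sequences so large.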

  11. Genomic Sequence Comparisons, 1987-2003 Final Report

    SciTech Connect

    George M. Church

    2004-07-29

    This project was to develop new DNA sequencing and RNA and protein quantitation methods and related genome annotation tools. The project began in 1987 with the development of multiplex sequencing (published in Science in 1988), and one of the first automated sequencing methods. This led to the first commercial genome sequence in 1994 and to the establishment of the main commercial participants (GTC then Agencourt) in the public DOE/NIH genome project. In collaboration with GTC we contributed to one of the first complete DOE genome sequences, in 1997, that of Methanobacterium thermoautotrophicum, a species of great relevance to energy-rich gas production.

  12. High-throughput sequencing for the study of bacterial pathogen biology

    PubMed Central

    McAdam, Paul R; Richardson, Emily J; Fitzgerald, J Ross

    2014-01-01

    A revolution in sequencing technologies in recent years has led to dramatically increased throughput and reduced cost of bacterial genome sequencing. An increasing number of applications of the new technologies are providing broad insights into bacterial evolution, epidemiology, and pathogenesis. For example, the capacity to sequence large numbers of bacterial isolates is enabling high resolution phylogenetic analyses of bacterial populations leading to greatly enhanced understanding of the emergence, adaptation, and transmission of pathogenic clones. In addition, RNA-seq offers improved quantification and resolution for transcriptomic analysis, and the combination of high-throughput sequencing with transposon mutagenesis is a powerful approach for the identification of bacterial determinants required for survival in vivo. In this concise review we provide selected examples of how high throughput sequencing is being applied to understand the biology of bacterial pathogens, and discuss future technological advances likely to have a profound impact on the field. PMID:25033019

  13. MS4--Multi-Scale Selector of Sequence Signatures: an alignment-free method for classification of biological sequences.

    PubMed

    Corel, Eduardo; Pitschi, Florian; Laprevotte, Ivan; Grasseau, Gilles; Didier, Gilles; Devauchelle, Claudine

    2010-07-30

    While multiple alignment is the first step of usual classification schemes for biological sequences, alignment-free methods are being increasingly used as alternatives when multiple alignments fail. Subword-based combinatorial methods are popular for their low algorithmic complexity (suffix trees ...) or exhaustivity (motif search), in general with fixed length word and/or number of mismatches. We developed previously a method to detect local similarities (the N-local decoding) based on the occurrences of repeated subwords of fixed length, which does not impose a fixed number of mismatches. The resulting similarities are, for some "good" values of N, sufficiently relevant to form the basis of a reliable alignment-free classification. The aim of this paper is to develop a method that uses the similarities detected by N-local decoding while not imposing a fixed value of N. We present a procedure that selects for every position in the sequences an adaptive value of N, and we implement it as the MS4 classification tool. Among the equivalence classes produced by the N-local decodings for all N, we select a (relatively) small number of "relevant" classes corresponding to variable length subwords that carry enough information to perform the classification. The parameter N, for which correct values are data-dependent and thus hard to guess, is here replaced by the average repetitivity kappa of the sequences. We show that our approach yields classifications of several sets of HIV/SIV sequences that agree with the accepted taxonomy, even on usually discarded repetitive regions (like the non-coding part of LTR). The method MS4 satisfactorily classifies a set of sequences that are notoriously hard to align. This suggests that our approach forms the basis of a reliable alignment-free classification tool. The only parameter kappa of MS4 seems to give reasonable results even for its default value, which can be a great advantage for sequence sets for which little information is
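
    The raw material that the abstract describes, occurrences of repeated subwords of a fixed length N, can be collected with a short sketch like the one below. This covers only that first ingredient: it is not the N-local decoding itself, nor the MS4 selection of an adaptive N, and the function name is hypothetical.

```python
from collections import defaultdict


def repeated_subwords(sequences, n: int) -> dict:
    """Map every length-n subword that occurs at least twice (within one
    sequence or across sequences) to its occurrences as (sequence index, position)."""
    occurrences = defaultdict(list)
    for seq_index, seq in enumerate(sequences):
        for pos in range(len(seq) - n + 1):
            occurrences[seq[pos:pos + n]].append((seq_index, pos))
    return {word: places for word, places in occurrences.items() if len(places) >= 2}
```

    Running this for several values of n on the same sequences shows why a fixed N is hard to choose: small values make almost every subword repeated, while large values leave very few shared subwords.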

  14. The Genome Sequence of Taurine Cattle: A Window to Ruminant Biology and Evolution

    USDA-ARS's Scientific Manuscript database

    As a major step toward understanding the biology and evolution of ruminants, the cattle genome was sequenced to ~7x coverage using a combined whole genome shotgun and BAC skim approach. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs found in seven mammalian...

  15. The genome sequence of taurine cattle: A window to ruminant biology and evolution

    USDA-ARS's Scientific Manuscript database

    To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (ma...

  16. A direct method for computing extreme value (Gumbel) parameters for gapped biological sequence alignments.

    PubMed

    Quinn, Terrance; Sinkala, Zachariah

    2014-01-01

    We develop a general method for computing extreme value distribution (Gumbel, 1958) parameters for gapped alignments. Our approach uses mixture distribution theory to obtain associated BLOSUM matrices for gapped alignments, which in turn are used for determining significance of gapped alignment scores for pairs of biological sequences. We compare our results with parameters already obtained in the literature.
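    As a rough illustration of how Gumbel parameters are used once they are known, the sketch below applies the standard Karlin-Altschul extreme value approximation, P(S >= x) = 1 - exp(-K m n e^(-lambda x)), to turn an alignment score into a P-value. It does not reproduce the authors' mixture-distribution method for obtaining the parameters; the function name and the numeric values of lambda and K are placeholders chosen for illustration only.

```python
import math


def gumbel_p_value(score, m, n, lam, K):
    """P(S >= score) under the extreme value (Gumbel) approximation.

    m, n   : lengths of the two sequences being compared
    lam, K : Gumbel parameters (assumed to be known already)
    """
    # Expected number of chance alignments scoring at least `score` (E-value).
    expected_hits = K * m * n * math.exp(-lam * score)
    # 1 - exp(-E), written with expm1 so tiny E-values keep their precision.
    return -math.expm1(-expected_hits)


# Placeholder parameter values, not taken from the paper.
p = gumbel_p_value(score=60, m=250, n=300, lam=0.267, K=0.041)
print(f"P-value: {p:.3e}")
```

    For very small E-values the P-value is approximately equal to the E-value, which is why the two are often quoted interchangeably for strong hits.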

  17. IBS: an illustrator for the presentation and visualization of biological sequences.

    PubMed

    Liu, Wenzhong; Xie, Yubin; Ma, Jiyong; Luo, Xiaotong; Nie, Peng; Zuo, Zhixiang; Lahrmann, Urs; Zhao, Qi; Zheng, Yueyuan; Zhao, Yong; Xue, Yu; Ren, Jian

    2015-10-15

    Biological sequence diagrams are fundamental for visualizing various functional elements in protein or nucleotide sequences that enable a summarization and presentation of existing information as well as means of intuitive new discoveries. Here, we present a software package called illustrator of biological sequences (IBS) that can be used for representing the organization of either protein or nucleotide sequences in a convenient, efficient and precise manner. Multiple options are provided in IBS, and biological sequences can be manipulated, recolored or rescaled in a user-defined mode. Also, the final representational artwork can be directly exported into a publication-quality figure. The standalone package of IBS was implemented in JAVA, while the online service was implemented in HTML5 and JavaScript. Both the standalone package and online service are freely available at http://ibs.biocuckoo.org. Contact: renjian.sysu@gmail.com or xueyu@hust.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  18. IBS: an illustrator for the presentation and visualization of biological sequences

    PubMed Central

    Liu, Wenzhong; Xie, Yubin; Ma, Jiyong; Luo, Xiaotong; Nie, Peng; Zuo, Zhixiang; Lahrmann, Urs; Zhao, Qi; Zheng, Yueyuan; Zhao, Yong; Xue, Yu; Ren, Jian

    2015-01-01

    Summary: Biological sequence diagrams are fundamental for visualizing various functional elements in protein or nucleotide sequences that enable a summarization and presentation of existing information as well as means of intuitive new discoveries. Here, we present a software package called illustrator of biological sequences (IBS) that can be used for representing the organization of either protein or nucleotide sequences in a convenient, efficient and precise manner. Multiple options are provided in IBS, and biological sequences can be manipulated, recolored or rescaled in a user-defined mode. Also, the final representational artwork can be directly exported into a publication-quality figure. Availability and implementation: The standalone package of IBS was implemented in JAVA, while the online service was implemented in HTML5 and JavaScript. Both the standalone package and online service are freely available at http://ibs.biocuckoo.org. Contact: renjian.sysu@gmail.com or xueyu@hust.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26069263

  19. Direct Chloroplast Sequencing: Comparison of Sequencing Platforms and Analysis Tools for Whole Chloroplast Barcoding

    PubMed Central

    Brozynska, Marta; Furtado, Agnelo; Henry, Robert James

    2014-01-01

    Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technology) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, a total of 102 polymorphisms including 90 SNPs were predicted by both platforms. Indels were more variable with different sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNP but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis. PMID:25329378

  20. A data parallel strategy for aligning multiple biological sequences on multi-core computers.

    PubMed

    Zhu, Xiangyuan; Li, Kenli; Salah, Ahmad

    2013-05-01

    In this paper, we address the large-scale biological sequence alignment problem, which is in increasing demand in computational biology. We employ the data parallelism paradigm, which is suitable for handling large-scale processing on multi-core computers, to achieve a high degree of parallelism. Using this paradigm, we propose a general strategy which can be used to speed up any multiple sequence alignment method. We applied five different clustering algorithms in our strategy and implemented rigorous tests on an 8-core computer using four traditional benchmarks and artificially generated sequences. The results show that our multi-core-based implementations can achieve up to 151-fold improvements in execution time while losing 2.19% accuracy on average. The source code of the proposed strategy, together with the test sets used in our analysis, is available on request. Copyright © 2013 Elsevier Ltd. All rights reserved.

  1. Foldamers with Hybrid Biological and Synthetic Sequences as Selective DNA Fluorescent Probes

    PubMed Central

    Wang, Wei; Wan, Wei; Stachiw, Andrew; Li, Alexander D.Q.

    2008-01-01

    Foldable polymers with alternating single-strand deoxyribonucleic acid (ssDNA) and planar fluorescent organic chromophores can self-organize into folded nanostructures and hence are hybrid foldamers with biological sequences and synthetic properties. The biological sequence provides highly specific molecular recognition properties while the physical properties of synthetic chromophores offer sensitive fluorescence detection. In this paper, we report that rationally designed hybrid foldamers exhibit potential in the detection of polynucleotides. Under strictly controlled laboratory conditions, fluorescence measurements indicate that configuration changes due to binding of polynucleotides with one or two mismatched bases can be readily distinguished. These results shed light on the design and construction of nanostructured foldamers with actuator and sensory properties, which may find important applications as biological probes. PMID:16086577

  2. Numerical characteristics of word frequencies and their application to dissimilarity measure for sequence comparison.

    PubMed

    Dai, Qi; Liu, Xiaoqing; Yao, Yuhua; Zhao, Fukun

    2011-05-07

    Sequence comparison is one of the major tasks in bioinformatics and can be used to study structural and functional conservation, as well as evolutionary relations among sequences. Numerous dissimilarity measures achieve promising results in sequence comparison, but challenges remain. This paper studies numerical characteristics of word frequencies and proposes a novel dissimilarity measure for sequence comparison. Instead of using the word frequencies directly, the proposed measure considers both the word frequencies and the overlapping structures of words. To verify the effectiveness of the proposed measure, we tested it with two experiments and further compared it with alignment-based and alignment-free measures. The results demonstrate that the proposed measure, by extracting more information on the overlapping structures of the words, improves the efficiency of sequence comparison.
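    To make the baseline concrete, the sketch below computes plain word (k-mer) frequencies and a Euclidean dissimilarity between two sequences. This is only the "word frequencies used directly" approach that the abstract contrasts itself with; it does not model the overlapping structure of words, and the function names and the choice of Euclidean distance are assumptions made for illustration.

```python
from collections import Counter
import math


def word_frequencies(seq, k):
    """Relative frequency of every overlapping word of length k in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def euclidean_dissimilarity(a, b, k=2):
    """Euclidean distance between the k-word frequency vectors of a and b."""
    fa, fb = word_frequencies(a, k), word_frequencies(b, k)
    words = set(fa) | set(fb)  # words absent from one sequence count as 0
    return math.sqrt(sum((fa.get(w, 0.0) - fb.get(w, 0.0)) ** 2 for w in words))


print(euclidean_dissimilarity("ACGTACGTAC", "ACGTTTGTAC", k=2))
```

    Identical sequences give a dissimilarity of zero, and sequences that share no words of length k give the largest values.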

  3. Shot sequencing based on biological equivalent dose considerations for multiple isocenter Gamma Knife radiosurgery.

    PubMed

    Ma, Lijun; Lee, Letitia; Barani, Igor; Hwang, Andrew; Fogh, Shannon; Nakamura, Jean; McDermott, Michael; Sneed, Penny; Larson, David A; Sahgal, Arjun

    2011-11-21

    Rapid delivery of multiple shots or isocenters is one of the hallmarks of Gamma Knife radiosurgery. In this study, we investigated whether the temporal order of shots delivered with Gamma Knife Perfexion would significantly influence the biological equivalent dose for complex multi-isocenter treatments. Twenty single-target cases were selected for analysis. For each case, 3D dose matrices of individual shots were extracted and single-fraction equivalent uniform dose (sEUD) values were determined for all possible shot delivery sequences, corresponding to different patterns of temporal dose delivery within the target. We found significant variations in the sEUD values among these sequences exceeding 15% for certain cases. However, the sequences for the actual treatment delivery were found to agree (<3%) and to correlate (R² = 0.98) excellently with the sequences yielding the maximum sEUD values for all studied cases. This result is applicable for both fast and slow growing tumors with α/β values of 2 to 20 according to the linear-quadratic model. In conclusion, despite large potential variations in different shot sequences for multi-isocenter Gamma Knife treatments, current clinical delivery sequences exhibited consistent biological target dosing that approached that maximally achievable for all studied cases.

  4. Shot sequencing based on biological equivalent dose considerations for multiple isocenter Gamma Knife radiosurgery

    NASA Astrophysics Data System (ADS)

    Ma, Lijun; Lee, Letitia; Barani, Igor; Hwang, Andrew; Fogh, Shannon; Nakamura, Jean; McDermott, Michael; Sneed, Penny; Larson, David A.; Sahgal, Arjun

    2011-11-01

    Rapid delivery of multiple shots or isocenters is one of the hallmarks of Gamma Knife radiosurgery. In this study, we investigated whether the temporal order of shots delivered with Gamma Knife Perfexion would significantly influence the biological equivalent dose for complex multi-isocenter treatments. Twenty single-target cases were selected for analysis. For each case, 3D dose matrices of individual shots were extracted and single-fraction equivalent uniform dose (sEUD) values were determined for all possible shot delivery sequences, corresponding to different patterns of temporal dose delivery within the target. We found significant variations in the sEUD values among these sequences exceeding 15% for certain cases. However, the sequences for the actual treatment delivery were found to agree (<3%) and to correlate (R² = 0.98) excellently with the sequences yielding the maximum sEUD values for all studied cases. This result is applicable for both fast and slow growing tumors with α/β values of 2 to 20 according to the linear-quadratic model. In conclusion, despite large potential variations in different shot sequences for multi-isocenter Gamma Knife treatments, current clinical delivery sequences exhibited consistent biological target dosing that approached that maximally achievable for all studied cases.

  5. Large scale comparison of non-human sequences in human sequencing data

    PubMed Central

    Tae, Hongseok; Karunasena, Enusha; Bavarva, Jasmin H.; McIver, Lauren J.; Garner, Harold R.

    2014-01-01

    Several studies have demonstrated that unmapped reads in next generation sequencing data could be used to identify infectious agents or structural variants, but there has been no intensive effort to analyze and classify all non-human sequences found in individual large data sets. To identify commonality in non-human sequences by infectious agents and putative contamination events, we analyzed non-human sequences in 150 genomic sequencing data files from the 1000 Genomes Project and observed that 0.13% of reads on average showed similarities to non-human genomes. We compared results among different sample groups divided based on ethnicities, sequencing centers and enrichment methods (whole genome sequencing vs. exome sequencing) and found that sequencing centers had specific signatures of contaminating genomes as ‘time stamps’. We also observed many unmapped reads that falsely indicated contamination because of the high similarity of human sequences to sequences in non-human genome assemblies such as mouse and Nicotiana. PMID:25173571

  6. Searching for a family of orphan sequences with SAMBA, a parallel hardware dedicated to biological applications.

    PubMed

    Guerdoux-Jamet, P; Risler, J L

    1996-01-01

    A significant proportion of coding sequences or open reading frames discovered in the course of sequencing projects do not show any similarity with other sequences deposited with the protein databanks. In such cases the search for similarities must be performed with as many comparison algorithms as possible, so as to increase the chance of finding weak relationships. A specialised parallel hardware (SAMBA) implementing the Smith & Waterman algorithm has been developed at the 'Institut de Recherche en Informatique et Systèmes Aléatoires' (IRISA). It makes it possible to scan protein databanks at a speed comparable with that of BLAST or FASTA. We report here a study performed with SAMBA on 814 orphan sequences from S. cerevisiae and compare the results with those from BLAST and FASTA.
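    For readers unfamiliar with the Smith & Waterman algorithm mentioned above, the following is a minimal software sketch of its scoring recurrence. It assumes a simple match/mismatch score and a linear gap penalty (all three values are arbitrary placeholders), returns only the best local alignment score, and says nothing about how SAMBA implements the algorithm in hardware.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

    Keeping only two rows of the table is enough when the score alone is needed; recovering the alignment itself would require storing the full table or traceback pointers.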

  7. Development and Assessment of a Horizontally Integrated Biological Sciences Course Sequence for Pharmacy Education

    PubMed Central

    Wright, Nicholas J.D.; Alston, Gregory L.

    2015-01-01

    Objective. To design and assess a horizontally integrated biological sciences course sequence and to determine its effectiveness in imparting the foundational science knowledge necessary to successfully progress through the pharmacy school curriculum and produce competent pharmacy school graduates. Design. A 2-semester course sequence integrated principles from several basic science disciplines: biochemistry, molecular biology, cellular biology, anatomy, physiology, and pathophysiology. Each is a 5-credit course taught 5 days per week, with 50-minute class periods. Assessment. Achievement of outcomes was determined with course examinations, student lecture, and an annual skills mastery assessment. The North American Pharmacist Licensure Examination (NAPLEX) results were used as an indicator of competency to practice pharmacy. Conclusion. Students achieved course objectives and program level outcomes. The biological sciences integrated course sequence was successful in providing students with foundational basic science knowledge required to progress through the pharmacy program and to pass the NAPLEX. The percentage of the school’s students who passed the NAPLEX was not statistically different from the national percentage. PMID:26430276

  8. A comparison of Illumina and Ion Torrent sequencing platforms in the context of differential gene expression.

    PubMed

    Lahens, Nicholas F; Ricciotti, Emanuela; Smirnova, Olga; Toorens, Erik; Kim, Eun Ji; Baruzzo, Giacomo; Hayer, Katharina E; Ganguly, Tapan; Schug, Jonathan; Grant, Gregory R

    2017-08-10

    Though Illumina has largely dominated the RNA-Seq field, the simultaneous availability of Ion Torrent has left scientists wondering which platform is most effective for differential gene expression (DGE) analysis. Previous investigations of this question have typically used reference samples derived from cell lines and brain tissue, and do not involve biological variability. While these comparisons might inform studies of tissue-specific expression, marked by large-scale transcriptional differences, this is not the common use case. Here we employ a standard treatment/control experimental design, which enables us to evaluate these platforms in the context of the expression differences common in differential gene expression experiments. Specifically, we assessed the hepatic inflammatory response of mice by assaying liver RNA from control and IL-1β treated animals with both the Illumina HiSeq and the Ion Torrent Proton sequencing platforms. We found the greatest difference between the platforms at the level of read alignment, a moderate level of concordance at the level of DGE analysis, and nearly identical results at the level of differentially affected pathways. Interestingly, we also observed a strong interaction between sequencing platform and choice of aligner. By aligning both real and simulated Illumina and Ion Torrent data with the twelve most commonly-cited aligners in the literature, we observed that different aligner and platform combinations were better suited to probing different genomic features; for example, disentangling the source of expression in gene-pseudogene pairs. Taken together, our results indicate that while Illumina and Ion Torrent have similar capacities to detect changes in biology from a treatment/control experiment, these platforms may be tailored to interrogate different transcriptional phenomena through careful selection of alignment software.

  9. Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics.

    PubMed

    Asgari, Ehsaneddin; Mofrad, Mohammad R K

    2015-01-01

    We introduce a new representation and feature extraction method for biological sequences. Named bio-vectors (BioVec) to refer to biological sequences in general with protein-vectors (ProtVec) for proteins (amino-acid sequences) and gene-vectors (GeneVec) for gene sequences, this representation can be widely used in applications of deep learning in proteomics and genomics. In the present paper, we focus on protein-vectors that can be utilized in a wide array of bioinformatics investigations such as family classification, protein visualization, structure prediction, disordered protein identification, and protein-protein interaction prediction. In this method, we adopt artificial neural network approaches and represent a protein sequence with a single dense n-dimensional vector. To evaluate this method, we apply it in classification of 324,018 protein sequences obtained from Swiss-Prot belonging to 7,027 protein families, where an average family classification accuracy of 93%±0.06% is obtained, outperforming existing family classification methods. In addition, we use ProtVec representation to predict disordered proteins from structured proteins. Two databases of disordered sequences are used: the DisProt database as well as a database featuring the disordered regions of nucleoporins rich with phenylalanine-glycine repeats (FG-Nups). Using support vector machine classifiers, FG-Nup sequences are distinguished from structured protein sequences found in Protein Data Bank (PDB) with a 99.8% accuracy, and unstructured DisProt sequences are differentiated from structured DisProt sequences with 100.0% accuracy. These results indicate that by only providing sequence data for various proteins into this model, accurate information about protein structure can be determined. Importantly, this model needs to be trained only once and can then be applied to extract a comprehensive set of information regarding proteins of interest. 
    Moreover, this representation can be considered as

  10. Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics

    PubMed Central

    Asgari, Ehsaneddin; Mofrad, Mohammad R. K.

    2015-01-01

    We introduce a new representation and feature extraction method for biological sequences. Named bio-vectors (BioVec) to refer to biological sequences in general with protein-vectors (ProtVec) for proteins (amino-acid sequences) and gene-vectors (GeneVec) for gene sequences, this representation can be widely used in applications of deep learning in proteomics and genomics. In the present paper, we focus on protein-vectors that can be utilized in a wide array of bioinformatics investigations such as family classification, protein visualization, structure prediction, disordered protein identification, and protein-protein interaction prediction. In this method, we adopt artificial neural network approaches and represent a protein sequence with a single dense n-dimensional vector. To evaluate this method, we apply it in classification of 324,018 protein sequences obtained from Swiss-Prot belonging to 7,027 protein families, where an average family classification accuracy of 93%±0.06% is obtained, outperforming existing family classification methods. In addition, we use ProtVec representation to predict disordered proteins from structured proteins. Two databases of disordered sequences are used: the DisProt database as well as a database featuring the disordered regions of nucleoporins rich with phenylalanine-glycine repeats (FG-Nups). Using support vector machine classifiers, FG-Nup sequences are distinguished from structured protein sequences found in Protein Data Bank (PDB) with a 99.8% accuracy, and unstructured DisProt sequences are differentiated from structured DisProt sequences with 100.0% accuracy. These results indicate that by only providing sequence data for various proteins into this model, accurate information about protein structure can be determined. Importantly, this model needs to be trained only once and can then be applied to extract a comprehensive set of information regarding proteins of interest. 
    Moreover, this representation can be considered as

  11. Next-Generation Sequencing in the Understanding of Kaposi's Sarcoma-Associated Herpesvirus (KSHV) Biology.

    PubMed

    Strahan, Roxanne; Uppal, Timsy; Verma, Subhash C

    2016-03-31

    Non-Sanger-based novel nucleic acid sequencing techniques, referred to as Next-Generation Sequencing (NGS), provide a rapid, reliable, high-throughput, and massively parallel sequencing methodology that has improved our understanding of human cancers and cancer-related viruses. NGS has become a quintessential research tool for more effective characterization of complex viral and host genomes through its ever-expanding repertoire, which consists of whole-genome sequencing, whole-transcriptome sequencing, and whole-epigenome sequencing. These new NGS platforms provide a comprehensive and systematic genome-wide analysis of genomic sequences and a full transcriptional profile at single-nucleotide resolution. When combined, these techniques help unlock the function of novel genes and the related pathways that contribute to the overall viral pathogenesis. Ongoing research in the field of virology endeavors to identify the role of various underlying mechanisms that control the regulation of the herpesvirus biphasic lifecycle in order to discover potential therapeutic targets and treatment strategies. In this review, we have compiled the most recent findings about the application of NGS in Kaposi's sarcoma-associated herpesvirus (KSHV) biology, including identification of novel genomic features and whole-genome KSHV diversities, global gene regulatory network profiling for intricate transcriptome analyses, and surveying of epigenetic marks (DNA methylation, modified histones, and chromatin remodelers) during de novo, latent, and productive KSHV infections.

  12. The role of adalimumab in rheumatic and autoimmune disorders: comparison with other biologic agents

    PubMed Central

    Reimold, Andreas M

    2012-01-01

    Adalimumab (ADA) is a biologic medication that dampens inflammatory pathways by binding to the cytokine tumor necrosis factor alpha. The US Food and Drug Administration has approved ADA as a medication for use in rheumatoid arthritis, psoriatic arthritis, ankylosing spondylitis, Crohn’s disease, psoriasis, and juvenile idiopathic arthritis. This year marks 10 years of clinical experience with ADA. Long-term extension studies of some of the initial clinical trials, as well as data from large patient registries, are demonstrating ongoing benefit for responders. Potential side effects such as increased risk of infection, lymphoma, congestive heart failure, and demyelination continue to be examined, as the available data are not unanimous in showing an increase in incidence. In balancing both the advantages and the disadvantages of using ADA, the drug’s overall effectiveness and its availability for use in patients with hepatic or renal comorbidities are weighed against the high cost. ADA is expected to have a leading role in the treatment of rheumatoid arthritis and other inflammatory conditions for years to come. Future studies will need to address the optimal sequence of disease-modifying antirheumatic drugs and biologics to use, combinations of disease-modifying antirheumatic drugs and biologics, and head-to-head comparisons of biologics in clinical trials. For those who go into clinical remission on an anti-tumor necrosis factor medication, unanswered questions remain about identifying the patients who can maintain the remission off all drugs, or at least off injected medication. Given the cost of biologic drugs, even studies that increase the interval between drug doses in well-controlled patients could provide financial benefits. PMID:27790010

  13. An Efficient Machine Learning Approach To Low-Complexity Filtering In Biological Sequences

    SciTech Connect

    Barber, Christopher A; Oehmen, Christopher S

    2012-06-09

    Biological sequences contain low-complexity regions (LCRs) which produce superfluous matches in homology searches, and lead to slow execution of database search algorithms such as BLAST. These regions are efficiently identified by low-complexity filtering algorithms such as SDUST and SEG, which are included in the BLAST tool-suite. These algorithms target differing notions of complexity, so an algorithm which combines their sensitivities is pursued. A variety of features are derived from these algorithms, as well as a new filtering algorithm based on Lempel-Ziv complexity. Artificial sequences with known LCRs are used to train and evaluate an SVM classifier, which significantly outperforms the standalone filtering algorithms.
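    The idea of using Lempel-Ziv complexity as a low-complexity signal can be illustrated with a simple phrase count. The sketch below uses an LZ78-style parsing, chosen here only because it is short and easy to check; the abstract does not say which Lempel-Ziv variant the authors use, and this is not their filtering algorithm. The function name is hypothetical.

```python
def lz78_phrase_count(seq):
    """Number of phrases in an LZ78-style parsing of seq.

    Each phrase is the shortest prefix of the remaining input that has not
    been seen as a phrase before. Repetitive sequences yield fewer phrases.
    """
    phrases = set()
    current = ""
    for symbol in seq:
        current += symbol
        if current not in phrases:
            phrases.add(current)
            current = ""
    # A trailing, already-seen fragment still counts as one final phrase.
    return len(phrases) + (1 if current else 0)


print(lz78_phrase_count("AAAAAAAA"), lz78_phrase_count("ACGTACGT"))
```

    A homopolymer run parses into few, ever-longer phrases, while a more varied sequence of the same length parses into more phrases, so a low count per window is one possible feature for flagging a candidate LCR.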

  14. Identifying and Mitigating Bias in Next-Generation Sequencing Methods for Chromatin Biology

    PubMed Central

    Meyer, Clifford A.; Liu, X. Shirley

    2015-01-01

    Next generation sequencing (NGS) technologies have been used in diverse ways to investigate facets of chromatin biology by identifying genomic loci that are bound by transcription factors, occupied by nucleosomes, accessible to nuclease cleavage, or physically interact with remote genomic loci. Reaching sound biological conclusions from such NGS enrichment profiles, however, requires that many potential biases be taken into account. In this Review we discuss common ways in which bias may be introduced into NGS chromatin profiling data, ways in which these biases can be diagnosed, and analytical techniques to mitigate their effect. PMID:25223782

  15. Sequencing biological and physical events affects specific frequency bands within the human premotor cortex: an intracerebral EEG study.

    PubMed

    Caruana, Fausto; Sartori, Ivana; Lo Russo, Giorgio; Avanzini, Pietro

    2014-01-01

    Evidence that the human premotor cortex (PMC) is activated by cognitive functions involving the motor domain is classically explained as the reactivation of a motor program decoupled from its executive functions, and exploited for different purposes by means of a motor simulation. In contrast, the evidence that PMC contributes to the sequencing of non-biological events cannot be explained by the simulationist theory. Here we investigated how motor simulation and event sequencing coexist within the PMC and how these mechanisms interact when both functions are executed. We asked patients with depth electrodes implanted in the PMC to passively observe a randomized arrangement of images depicting biological actions and physical events and, in a second block, to sequence them in the correct order. This task allowed us to disambiguate between the simple observation of actions, their sequencing (recruiting different motor simulation processes), as well as the sequencing of non-biological events (recruiting a sequencer mechanism not dependent on motor simulation). We analysed the response of the gamma, alpha and beta frequency bands to evaluate the contribution of each brain rhythm to the observation and sequencing of both biological and non-biological stimuli. We found that motor simulation (biological>physical) and event sequencing (sequencing>observation) differently affect the three investigated frequency bands: motor simulation was reflected in the gamma and, partially, in the beta, but not in the alpha band. In contrast, event sequencing was also reflected in the alpha band.

  16. Comparison of Biolog GEN III MicroStation semi-automated bacterial identification system with matrix-assisted laser desorption ionization-time of flight mass spectrometry and 16S ribosomal RNA gene sequencing for the identification of bacteria of veterinary interest.

    PubMed

    Wragg, P; Randall, L; Whatmore, A M

    2014-10-01

    Recent advances in phenotypic and chemotaxonomic methods have improved the ability of systems to resolve bacterial identities at the species level. Key to the effective use of these systems is the ability to draw upon databases which can be augmented with new data gleaned from atypical or novel isolates. In this study we compared the performance of the Biolog GEN III identification system (hereafter, GEN III) with matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) and 16S rRNA gene sequencing in the identification of isolates of veterinary interest. The use of strains that had proven more difficult to identify by routine methods was designed to test the systems' abilities at the extremes of their performance range. Over an 18-month period, 100 strains were analysed by all three methods. To highlight the importance of identification to species level, a weighted scoring system was devised to differentiate the capacity to identify at genus and species levels. The overall relative weighted scores were 0.869:0.781:0.769, achieved by 16S rRNA gene sequencing, GEN III and MALDI-TOF MS respectively, when compared to the 'gold standard'. Performance to the genus level was significantly better using 16S rRNA gene sequencing; however, performance to the species level was similar for all three systems. Copyright © 2014. Published by Elsevier B.V.

  17. Networking Biology: The Origins of Sequence-Sharing Practices in Genomics.

    PubMed

    Stevens, Hallam

    2015-10-01

    The wide sharing of biological data, especially nucleotide sequences, is now considered to be a key feature of genomics. Historians and sociologists have attempted to account for the rise of this sharing by pointing to precedents in model organism communities and in natural history. This article supplements these approaches by examining the role that electronic networking technologies played in generating the specific forms of sharing that emerged in genomics. The links between early computer users at the Stanford Artificial Intelligence Laboratory in the 1960s, biologists using local computer networks in the 1970s, and GenBank in the 1980s, show how networking technologies carried particular practices of communication, circulation, and data distribution from computing into biology. In particular, networking practices helped to transform sequences themselves into objects that had value as a community resource.

  18. Unravelling biology and shifting paradigms in cancer with single-cell sequencing.

    PubMed

    Baslan, Timour; Hicks, James

    2017-08-24

    The fundamental operative unit of a cancer is the genetically and epigenetically innovative single cell. Whether proliferating or quiescent, in the primary tumour mass or disseminated elsewhere, single cells govern the parameters that dictate all facets of the biology of cancer. Thus, single-cell analyses provide the ultimate level of resolution in our quest for a fundamental understanding of this disease. Historically, this quest has been hampered by technological shortcomings. In this Opinion article, we argue that the rapidly evolving field of single-cell sequencing has unshackled the cancer research community of these shortcomings. From furthering an elemental understanding of intra-tumoural genetic heterogeneity and cancer genome evolution to illuminating the governing principles of disease relapse and metastasis, we posit that single-cell sequencing promises to unravel the biology of all facets of this disease.

  19. KISSa: a strategy to build multiple sequence alignments from pairwise comparisons of very closely related sequences.

    PubMed

    Marass, Francesco; Upton, Chris

    2009-05-20

    The volume of viral genomic sequence data continues to increase rapidly. This is especially true for the smaller RNA viruses, which are relatively easy to sequence in large numbers. The data volumes cause a number of significant problems for research applications that require large multiple alignments of essentially complete genomes, which are of the order of 10 kb. We present a simple strategy to enable the creation of large quasi-multiple sequence alignments from pairwise alignment data. This process is suitable for large, closely related sequences such as the polyproteins of dengue viruses, which need the insertion of very few indels. The quasi-multiple sequence alignments generated by KISSa are sufficiently accurate to support tree-based genome selection for interactive bioinformatics analysis tools. The speed of this process is critical to providing an interactive experience for the user.

  20. Sequence-based genotyping clarifies conflicting historical morphometric and biological data for 5 Eimeria species infecting turkeys.

    PubMed

    El-Sherry, S; Ogedengbe, M E; Hafeez, M A; Sayf-Al-Din, M; Gad, N; Barta, J R

    2015-02-01

    Unlike with Eimeria species infecting chickens, specific identification and nomenclature of Eimeria species infecting turkeys is complicated, and in the absence of molecular data, imprecise. In an attempt to reconcile contradictory data reported on oocyst morphometrics and biological descriptions of various Eimeria species infecting turkey, we established single oocyst derived lines of 5 important Eimeria species infecting turkeys, Eimeria meleagrimitis (USMN08-01 strain), Eimeria adenoeides (Guelph strain), Eimeria gallopavonis (Weybridge strain), Eimeria meleagridis (USAR97-01 strain), and Eimeria dispersa (Briston strain). Short portions (514 bp) of mitochondrial cytochrome c oxidase subunit I gene (mt COI) from each were amplified and sequenced. Comparison of these sequences showed sufficient species-specific sequence variation to recommend these short mt COI sequences as species-specific markers. Uniformity of oocyst features (dimensions and oocyst structure) of each pure line was observed. Additional morphological features of the oocysts of these species are described as useful for the microscopic differentiation of these Eimeria species. Combined molecular and morphometric data on these single species lines compared with the original species descriptions and more recent data have helped to clarify some confusing, and sometimes conflicting, features associated with these Eimeria spp. For example, these new data suggest that the KCH and KR strains of E. adenoeides reported previously represent 2 distinct species, E. adenoeides and E. meleagridis, respectively. Likewise, analysis of the Weybridge strain of E. adenoeides, which has long been used as a reference strain in various studies conducted on the pathogenicity of E. adenoeides, indicates that this coccidium is actually a strain of E. gallopavonis. We highly recommend mt COI sequence-based genotyping be incorporated into all studies using Eimeria spp. of turkeys to confirm species identifications and so

  1. [Biological ingredient analysis of traditional Chinese medicines utilizing metagenomic approach based on high-throughput-sequencing and big-data-mining].

    PubMed

    Bai, Hong; Ning, Kang; Wang, Chang-yun

    2015-03-01

    The quality of traditional Chinese medicines (TCMs) has been mainly evaluated based on chemical ingredients, yet recently more attention has been paid to biological ingredients, especially for pill-based preparations. Establishing a fast, accurate and systematic method of biological ingredient analysis is key to the modernization, industrialization and internationalization of TCMs. The biological ingredient analysis of TCM preparations can be abstracted as the identification of multiple species from a biological mixture. The metagenomic approach based on high-throughput sequencing (HTS) and big-data mining is considered one of the most effective methods for multiple-species analysis of a biological mixture, and would also be helpful for the analysis of biological ingredients in TCMs. Simultaneous identification of diverse species, including the prescribed species, adulterants, toxic species, protected species and even the biological impurities introduced through the production process, could be achieved by selecting appropriate DNA biomarkers and applying large-scale sequence comparison and data mining. This approach promises to offer an evaluation basis for the effectiveness, safety and legality of TCM preparations.

  2. Comparison of simple sequence repeats in 19 Archaea.

    PubMed

    Trivedi, S

    2006-12-05

    All organisms that have been studied until now have been found to have differential distribution of simple sequence repeats (SSRs), with more SSRs in intergenic than in coding sequences. SSR distribution was investigated in Archaea genomes where complete chromosome sequences of 19 Archaea were analyzed with the program SPUTNIK to find di- to penta-nucleotide repeats. The number of repeats was determined for the complete chromosome sequences and for the coding and non-coding sequences. Different from what has been found for other groups of organisms, there is an abundance of SSRs in coding regions of the genome of some Archaea. Dinucleotide repeats were rare and CG repeats were found in only two Archaea. In general, trinucleotide repeats are the most abundant SSR motifs; however, pentanucleotide repeats are abundant in some Archaea. Some of the tetranucleotide and pentanucleotide repeat motifs are organism specific. In general, repeats are short and CG-rich repeats are present in Archaea having a CG-rich genome. Among the 19 Archaea, SSR density was not correlated with genome size or with optimum growth temperature. Pentanucleotide density had an inverse correlation with the CG content of the genome.
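
    The repeat search described above can be illustrated with a minimal sketch. It is not the SPUTNIK program: it only finds perfect tandem repeats with a regular expression, and the function names and the `min_repeats` threshold are assumptions made for illustration.

```python
import re


def is_compound(unit):
    """True if the unit is itself a repetition of a shorter motif (e.g. 'ATAT')."""
    return unit in (unit + unit)[1:-1]


def find_ssrs(seq, min_unit=2, max_unit=5, min_repeats=3):
    """Return (start, unit, repeat_count) for perfect di- to penta-nucleotide repeats.

    Hypothetical helper: perfect repeats only, no mismatch scoring.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # One unit of unit_len bases followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if is_compound(unit):
                continue  # e.g. 'ACAC' is reported as the dinucleotide 'AC' instead
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits


if __name__ == "__main__":
    print(find_ssrs("TTCAGCAGCAGCAGTT"))
```

    Counting such hits separately over coding and non-coding coordinates would give the kind of per-region comparison the abstract reports; the coordinates themselves would have to come from a genome annotation.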

  3. Shotgun metagenomics of biological stains using ultra-deep DNA sequencing.

    PubMed

    Brenig, B; Beck, J; Schütz, E

    2010-07-01

    A detailed molecular analysis of blood or other biological stains at a crime scene is often hampered by the low quantity and quality of the extractable DNA. However, the determination of the origin and composition of a stain is in most cases a prerequisite for the final elucidation of a criminal case. Standard methodologies, e.g. amplification of DNA followed by microsatellite typing or mitochondrial DNA sequencing, are often not sensitive enough to result in sufficient and conclusive data. We have applied ultra-deep DNA sequencing using the 454 pyrosequencing technology on a whole genome amplified (WGA) environmental biological stain, which was analysed unsuccessfully with standard methodologies following WGA. With the combination of WGA and 454 pyrosequencing, however, we were able to generate 7242 single sequences with an average length of 195bp. A total of 1,441,971bp DNA sequences were generated and compared with public DNA sequence databases. Using RepeatMasker and basic local alignment search tool (BLAST) searches against known microbial and mammalian genomes it was possible to determine the metagenomic composition of the stain, i.e. 4.2% bacterial DNA, 0.3% viral DNA, 2.7% fungal DNA, 10.3% mammalian repetitive DNA, 0.9% porcine DNA, 0.13% human DNA and 81.5% DNA of unknown origin. Our data demonstrate that 454 pyrosequencing has the potential to become a powerful tool not only in basic research but also in the metagenomic analysis of biological trace materials for forensic genetics.

  4. MBEToolbox: a MATLAB toolbox for sequence data analysis in molecular biology and evolution.

    PubMed

    Cai, James J; Smith, David K; Xia, Xuhua; Yuen, Kwok-Yung

    2005-03-22

    MATLAB is a high-performance language for technical computing, integrating computation, visualization, and programming in an easy-to-use environment. It has been widely used in many areas, such as mathematics and computation, algorithm development, data acquisition, modeling, simulation, and scientific and engineering graphics. However, few functions are freely available in MATLAB to perform the sequence data analyses specifically required for molecular biology and evolution. We have developed a MATLAB toolbox, called MBEToolbox, aimed at filling this gap by offering efficient implementations of the most needed functions in molecular biology and evolution. It can be used to manipulate aligned sequences, calculate evolutionary distances, estimate synonymous and nonsynonymous substitution rates, and infer phylogenetic trees. Moreover, it provides an extensible, functional framework for users with more specialized requirements to explore and analyze aligned nucleotide or protein sequences from an evolutionary perspective. The full functions in the toolbox are accessible through the command-line for seasoned MATLAB users. A graphical user interface, that may be especially useful for non-specialist end users, is also provided. MBEToolbox is a useful tool that can aid in the exploration, interpretation and visualization of data in molecular biology and evolution. The software is publicly available at http://web.hku.hk/~jamescai/mbetoolbox/ and http://bioinformatics.org/project/?group_id=454

  5. Phylogenetic relationships of Cryptosporidium determined by ribosomal RNA sequence comparison.

    PubMed

    Johnson, A M; Fielke, R; Lumb, R; Baverstock, P R

    1990-04-01

    Reverse transcription of total cellular RNA was used to obtain a partial sequence of the small subunit ribosomal RNA of Cryptosporidium, a protist currently placed in the phylum Apicomplexa. The semi-conserved regions were aligned with homologous sequences in a range of other eukaryotes, and the evolutionary relationships of Cryptosporidium were determined by two different methods of phylogenetic analysis. The prokaryotes Escherichia coli and Halobacterium cuti were included as outgroups. The results do not show an especially close relationship of Cryptosporidium to other members of the phylum Apicomplexa.

  6. Next-generation biology: Sequencing and data analysis approaches for non-model organisms.

    PubMed

    da Fonseca, Rute R; Albrechtsen, Anders; Themudo, Gonçalo Espregueira; Ramos-Madrigal, Jazmín; Sibbesen, Jonas Andreas; Maretty, Lasse; Zepeda-Mendoza, M Lisandra; Campos, Paula F; Heller, Rasmus; Pereira, Ricardo J

    2016-12-01

    As sequencing technologies become more affordable, it is now realistic to propose studying the evolutionary history of virtually any organism on a genomic scale. However, when dealing with non-model organisms it is not always easy to choose the best approach given a specific biological question, a limited budget, and challenging sample material. Furthermore, although recent advances in technology offer unprecedented opportunities for research in non-model organisms, they also demand unprecedented awareness from the researcher regarding the assumptions and limitations of each method. In this review we present an overview of the current sequencing technologies and the methods used in typical high-throughput data analysis pipelines. Subsequently, we contextualize high-throughput DNA sequencing technologies within their applications in non-model organism biology. We include tips regarding managing unconventional sample material, comparative and population genetic approaches that do not require fully assembled genomes, and advice on how to deal with low depth sequencing data. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  7. A COMPARISON OF FIXED SEQUENCE AND OPTIONAL BRANCHING AUTOINSTRUCTIONAL METHODS.

    ERIC Educational Resources Information Center

    MELARAGNO, RALPH J.; AND OTHERS

    HYPOTHESES RELATED TO PROCEDURES PERMITTING STUDENTS TO BRANCH AT THEIR OWN OPTION WERE TESTED. THE FIRST HYPOTHESIS WAS THAT A FIXED-SEQUENCE PROGRAM WOULD BE LESS EFFECTIVE THAN THE SAME ITEMS CAST AS STATEMENTS IN TEXTBOOK FORMAT THROUGH WHICH THE STUDENT COULD SKIP AT HIS OWN OPTION. THE SECOND HYPOTHESIS WAS THAT PERFORMANCE ON A PROGRAM…

  8. High school biology students' participation in a year-long sequence of analogical activities: The relationship of development of analogical thought to student learning and classroom interactions

    NASA Astrophysics Data System (ADS)

    Hackney, Marcella Wichser

    1999-10-01

    This research explored development of analogical thought through high school biology students' participation in a year-long sequence of analogical activities. Analogizing involves: selecting a familiar analog; mapping similarities and differences between the analog and less familiar target; making inferences from the analogy; evaluating validity of the inferences; and ultimately, understanding the biological target (Holyoak & Thagard, 1995). This investigation considered: student development of independence in learning through analogical thought, student learning of biology, the relationship between development of students' analogical thinking and students' learning of biology, and the quality of student interactions in the classroom. This researcher, as teacher participant, used three approaches for teaching by analogy: traditional didactic, teacher-guided, and analogy-generated-by-the-student (Zeitoun, 1983). Within cooperative groups, students in one honors biology class actively engaged in research-based analogical activities that targeted specific biological topics. Two honors biology classes participated in similar, but nonanalogical activities that targeted the same biological topics. This two-class comparison group permitted analytical separation of effects of the analogical emphasis from the effects of biology content and activity-based learning. Data collected included: fieldnotes of researcher observations, student responses to guidesheets, tapes of group interactions, student products, student perceptions survey evaluations, ratings of students' expressed analogical development, pre- and posttest scores on a biology achievement test, essay responses, and selected student interviews. These data formed the basis for researcher qualitative analysis, augmented by quantitative techniques. Through participation in the sequence of analogical activities, students developed their abilities to engage in the processes of analogical thinking, but attained different

  9. Sequence comparison and classification of beet luteovirus isolates.

    PubMed

    de Miranda, J R; Stevens, M; de Bruyne, E; Smith, H G; Bird, C; Hull, R

    1995-01-01

    Three distinct sequence groups were found among partial nucleotide sequences of 38 isolates of beet western yellows virus (BWYV) and beet mild yellowing virus (BMYV) from Europe, Iran and the USA. The first group contains both sugar beet and oilseed rape specific isolates, and the differentiating characteristics linked to this host range specificity are 2 single base pair changes in a 1,200 nucleotide region of the genome. It is proposed that the European BWYV strains that can be transferred at low frequency between rape and sugar beet belong to this group. Also belonging to this group are the published BWYV sequences of Veidt et al. and of the California BWYV-ST9 isolate. The second group contains mostly rape-derived isolates which have an intergenic region highly distinct from that of group-1 isolates but similar polymerase and coat protein regions. It is proposed that the rape-specific BWYV isolates which cannot be transmitted to sugar beet belong to this group. The third group contains mostly beet-specific isolates from Southern Europe and Iran, and may be adapted to the Mediterranean climate and flora. It is distinct from groups 1 and 2 in all three genome regions investigated and its polymerase and intergenic regions are as much related to those of potato leafroll virus (PLRV) and cucurbit aphid-borne yellows virus (CABYV) as they are to those of groups 1 and 2. On the basis of sequence similarities and established nomenclature it is proposed to use BWYV for groups 1 and 2 (BWYV-1 and BWYV-2 respectively) and to use BMYV for group-3 isolates, which are distinct enough from the other two groups to merit a separate nomenclature.

  10. 3D reconstruction software comparison for short sequences

    NASA Astrophysics Data System (ADS)

    Strupczewski, Adam; Czupryński, Błażej

    2014-11-01

    Large scale multiview reconstruction has recently become a very popular area of research. There are many open source tools that can be downloaded and run on a personal computer. However, there are few, if any, comparisons between all the available software in terms of accuracy on small datasets that a single user can create. The typical datasets for testing of the software are archeological sites or cities, comprising thousands of images. This paper presents a comparison of currently available open source multiview reconstruction software for small datasets. It also compares the open source solutions with a simple structure from motion pipeline developed by the authors from scratch with the use of OpenCV and Eigen libraries.

  11. Numerical encoding of DNA sequences by chaos game representation with application in similarity comparison.

    PubMed

    Hoang, Tung; Yin, Changchuan; Yau, Stephen S-T

    2016-10-01

    Numerical encoding plays an important role in DNA sequence analysis via computational methods, in which numerical values are associated with corresponding symbolic characters. After numerical representation, digital signal processing methods can be exploited to analyze DNA sequences. To reflect the biological properties of the original sequence, it is vital that the representation is one-to-one. Chaos Game Representation (CGR) is an iterative mapping technique that assigns each nucleotide in a DNA sequence to a respective position on the plane that allows the depiction of the DNA sequence in the form of image. Using CGR, a biological sequence can be transformed one-to-one to a numerical sequence that preserves the main features of the original sequence. In this research, we propose to encode DNA sequences by considering 2D CGR coordinates as complex numbers, and apply digital signal processing methods to analyze their evolutionary relationship. Computational experiments indicate that this approach gives comparable results to the state-of-the-art multiple sequence alignment method, Clustal Omega, and is significantly faster. The MATLAB code for our method can be accessed from: www.mathworks.com/matlabcentral/fileexchange/57152. Copyright © 2016 Elsevier Inc. All rights reserved.
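
    As a rough illustration of the encoding step described above, the sketch below computes Chaos Game Representation points and stores them as complex numbers. The corner assignment (A, C, G, T on the unit square), the starting point, and the function name are conventional choices assumed here; they are not taken from the authors' MATLAB code, and the signal processing stage is not shown.

```python
# Assumed corner layout of the unit square; other assignments are possible.
CORNERS = {"A": 0 + 0j, "C": 0 + 1j, "G": 1 + 1j, "T": 1 + 0j}


def cgr_complex(seq, start=0.5 + 0.5j):
    """Return one complex CGR coordinate per nucleotide (hypothetical helper).

    Each point is the midpoint between the previous point and the corner
    assigned to the current nucleotide.
    """
    points = []
    z = start
    for base in seq.upper():
        z = (z + CORNERS[base]) / 2
        points.append(z)
    return points


if __name__ == "__main__":
    for base, point in zip("ACG", cgr_complex("ACG")):
        print(base, point)
```

    Because every step halves the distance to a corner, the last point identifies the final nucleotide by its quadrant, and the earlier nucleotides can be recovered by reversing the midpoint rule, which is why the mapping is one-to-one in exact arithmetic. With floating-point numbers that property holds only for sequences short enough to avoid rounding.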

  12. First Insights into the Evolution of Streptococcus uberis: a Multilocus Sequence Typing Scheme That Enables Investigation of Its Population Biology

    PubMed Central

    Coffey, Tracey J.; Pullinger, Gillian D.; Urwin, Rachel; Jolley, Keith A.; Wilson, Stephen M.; Maiden, Martin C.; Leigh, James A.

    2006-01-01

    Intramammary infection with Streptococcus uberis is a common cause of bovine mastitis throughout the world. Several procedures to differentiate S. uberis isolates have been proposed. However, all are prone to interlaboratory variation, and none is suitable for the description of the population structure. We describe here the development of a multilocus sequence typing (MLST) scheme for S. uberis to help address these issues. The sequences of seven housekeeping gene fragments from each of 160 United Kingdom milk isolates of S. uberis were determined. Between 5 and 17 alleles were obtained per locus, giving the potential to discriminate between 1.3 × 10^7 sequence types. In this study, 57 sequence types (STs) were identified. Statistical comparisons between the maximum-likelihood trees constructed by using the seven housekeeping gene fragments showed that the congruence was no better than that between each tree and trees of random topology, indicating there had been significant recombination within these loci. The population contained one major lineage (designated the ST-5 complex). This dominated the population, containing 24 STs and representing 112 isolates. The other 33 STs were not assigned to any clonal complex. All of the isolates in the ST-5 lineage carried hasA, a gene that is essential for capsule production. There was no clear association between ST or clonal complex and disease. The S. uberis MLST system offers researchers a valuable tool that allows further investigation of the population biology of this organism and insights into the epidemiology of this disease on a global scale. PMID:16461695

  13. New Powerful Statistics for Alignment-free Sequence Comparison Under a Pattern Transfer Model

    PubMed Central

    Liu, Xuemei; Wan, Lin; Li, Jing; Reinert, Gesine; Waterman, Michael S.; Sun, Fengzhu

    2011-01-01

    Alignment-free sequence comparison is widely used for comparing gene regulatory regions and for identifying horizontally transferred genes. Recent studies on the power of a widely used alignment-free comparison statistic D2 and its variants D2∗ and D2s showed that their power approximates a limit smaller than 1 as the sequence length tends to infinity under a pattern transfer model. We develop new alignment-free statistics based on D2, D2∗ and D2s by comparing local sequence pairs and then summing over all the local sequence pairs of certain length. We show that the new statistics are much more powerful than the corresponding statistics and the power tends to 1 as the sequence length tends to infinity under the pattern transfer model. PMID:21723298
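
    For orientation, the basic D2 statistic that these variants build on is the sum, over all words of length k, of the product of the word counts in the two sequences. The sketch below computes only that plain D2 value; it does not implement D2∗, D2s, or the new local-pair statistics from the paper, and the function names are assumptions.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Plain D2: sum over k-mers w of count_a(w) * count_b(w)."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Counter returns 0 for words absent from seq_b, so they contribute nothing.
    return sum(n * counts_b[word] for word, n in counts_a.items())


if __name__ == "__main__":
    print(d2("ACGTACGT", "TACGTTAC", k=3))
```

    A local version in the spirit of the abstract would apply such a statistic to pairs of fixed-length windows and sum the results, but the exact form used by the authors is not given here.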

  14. Graph mining for next generation sequencing: leveraging the assembly graph for biological insights.

    PubMed

    Warnke-Sommer, Julia; Ali, Hesham

    2016-05-06

    The assembly of Next Generation Sequencing (NGS) reads remains a challenging task. This is especially true for the assembly of metagenomics data that originate from environmental samples potentially containing hundreds to thousands of unique species. The principal objective of current assembly tools is to assemble NGS reads into contiguous stretches of sequence called contigs while maximizing for both accuracy and contig length. The end goal of this process is to produce longer contigs with the major focus being on assembly only. Sequence read assembly is an aggregative process, during which read overlap relationship information is lost as reads are merged into longer sequences or contigs. The assembly graph is information rich and capable of capturing the genomic architecture of an input read data set. We have developed a novel hybrid graph in which nodes represent sequence regions at different levels of granularity. This model, utilized in the assembly and analysis pipeline Focus, presents a concise yet feature rich view of a given input data set, allowing for the extraction of biologically relevant graph structures for graph mining purposes. Focus was used to create hybrid graphs to model metagenomics data sets obtained from the gut microbiomes of five individuals with Crohn's disease and eight healthy individuals. Repetitive and mobile genetic elements are found to be associated with hybrid graph structure. Using graph mining techniques, a comparative study of the Crohn's disease and healthy data sets was conducted with focus on antibiotics resistance genes associated with transposase genes. Results demonstrated significant differences in the phylogenetic distribution of categories of antibiotics resistance genes in the healthy and diseased patients. Focus was also evaluated as a pure assembly tool and produced excellent results when compared against the Meta-velvet, Omega, and UD-IDBA assemblers. Mining the hybrid graph can reveal biological phenomena captured

  15. MOLECULAR CLONING, SEQUENCING, EXPRESSION AND BIOLOGICAL ACTIVITY OF GIANT PANDA (AILUROPODA MELANOLEUCA) INTERFERON-GAMMA.

    PubMed

    Zhu, Hui; Wang, Wen-Xiu; Wang, Bao-Qin; Zhu, Xiao-Fu; Wu, Xu-Jin; Ma, Qing-Yi; Chen, De-Kun

    2012-06-29

    The giant panda (Ailuropoda melanoleuca) is an endangered species and indigenous to China. Interferon-gamma (IFN-γ) is the only member of type II IFN and is vital for the regulation of host adaptive immunity and inflammatory response. Little is known about the IFN-γ gene and its roles in giant panda. In this study, the IFN-γ gene of Qinling giant panda was amplified from total blood RNA by RT-PCR, cloned, sequenced and analysed. The open reading frame (ORF) of Qinling giant panda IFN-γ encodes 152 amino acids and is highly similar to Sichuan giant panda with an identity of 99.3% in cDNA sequence. The IFN-γ cDNA sequence was ligated to the pET32a vector and transformed into E. coli BL21 competent cells. Expression of recombinant IFN-γ protein of Qinling giant panda in E. coli was confirmed by SDS-PAGE and Western blot analysis. Biological activity assay indicated that the recombinant IFN-γ protein at the concentration of 4-10 µg/ml activated the giant panda peripheral blood lymphocytes, while at 12 µg/ml it inhibited the activation of the lymphocytes. These findings provide insights into the evolution of giant panda IFN-γ and information regarding amino acid residues essential for their biological activity.

  16. A Comparison of Morphological Taxonomy and Next Generation DNA Sequencing for the Assessment of Zooplankton Diversity

    NASA Astrophysics Data System (ADS)

    Harvey, J.; Fisher, J. L.; Johnson, S.; Morgan, S.; Peterson, W. T.; Satterthwaite, E. V.; Vrijenhoek, R. C.

    2016-02-01

    Our ability to accurately characterize the diversity of planktonic organisms is affected by both the methods we use to collect water samples and our approaches to assessing sample contents. Plankton nets collect organisms from high volumes of water, but integrate sample contents along the net's path. In contrast, plankton pumps collect water from discrete depths. Autonomous underwater vehicles (AUVs) can collect water samples with pinpoint accuracy from physical features such as upwelling fronts or biological features such as phytoplankton blooms, but sample volumes are necessarily much smaller than those possible with nets. Characterization of plankton diversity and abundances in water samples may also vary with the assessment method we apply. Morphological taxonomy provides visual identification and enumeration of organisms via microscopy, but is labor intensive. Next generation DNA sequencing (NGS) shows great promise for assessing plankton diversity in water samples, but accurate assessment of relative abundances may not be possible in all cases. Comparison of morphological taxonomy to molecular approaches is necessary to identify areas of overlap and also areas of disagreement between these methods. We have compared morphological taxonomic assessments to mitochondrial COI and nuclear 28S ribosomal RNA NGS results for plankton net samples collected in Monterey Bay, California. We have made a similar comparison for plankton pump samples, and have also applied our NGS methods to targeted, small volume water samples collected by an AUV. Our goal is to communicate current results and lessons learned regarding application of traditional taxonomy and novel molecular approaches to the study of plankton diversity in spatially and temporally variable, coastal marine environments.

  17. Close Sequence Comparisons are Sufficient to Identify Human cis-Regulatory Elements

    SciTech Connect

    Prabhakar, Shyam; Poulin, Francis; Shoukry, Malak; Afzal, Veena; Rubin, Edward M.; Couronne, Olivier; Pennacchio, Len A.

    2005-12-01

    Cross-species DNA sequence comparison is the primary method used to identify functional noncoding elements in human and other large genomes. However, little is known about the relative merits of evolutionarily close and distant sequence comparisons, due to the lack of a universal metric for sequence conservation, and also the paucity of empirically defined benchmark sets of cis-regulatory elements. To address this problem, we developed a general-purpose algorithm (Gumby) that detects slowly-evolving regions in primate, mammalian and more distant comparisons without requiring adjustment of parameters, and ranks conserved elements by P-value using Karlin-Altschul statistics. We benchmarked Gumby predictions against previously identified cis-regulatory elements at diverse genomic loci, and also tested numerous extremely conserved human-rodent sequences for transcriptional enhancer activity using reporter-gene assays in transgenic mice. Human regulatory elements were identified with acceptable sensitivity and specificity by comparison with 1-5 other eutherian mammals or 6 other simian primates. More distant comparisons (marsupial, avian, amphibian and fish) failed to identify many of the empirically defined functional noncoding elements. We derived an intuitive relationship between ancient and recent noncoding sequence conservation from whole genome comparative analysis, which explains some of these findings. Lastly, we determined that, in addition to strength of conservation, genomic location and/or density of surrounding conserved elements must also be considered in selecting candidate enhancers for testing at embryonic time points.

  18. A National Comparison of Biochemistry and Molecular Biology Capstone Experiences

    ERIC Educational Resources Information Center

    Aguanno, Ann; Mertz, Pamela; Martin, Debra; Bell, Ellis

    2015-01-01

    Recognizing the increasingly integrative nature of the molecular life sciences, the "American Society for Biochemistry and Molecular Biology" (ASBMB) recommends that Biochemistry and Molecular Biology (BMB) programs develop curricula based on concepts, content, topics, and expected student outcomes, rather than courses. To that end,…

  20. Basal Murphy belt and Chilhowee Group -- Sequence stratigraphic comparison

    SciTech Connect

    Aylor, J.G. Jr. (Dept. of Geology)

    1994-03-01

    The lower Murphy belt in the central western Blue Ridge is interpreted to be correlative to the Early Cambrian Chilhowee Group of the westernmost Blue Ridge and Appalachian fold and thrust belt. Basal Murphy belt depositional sequence stratigraphy represents a second-order, type-2 transgressive systems tract initiated with deposition of lowstand turbidites of the Dean Formation. These transgressive deposits of the Nantahala and Brasstown Formations are interpreted as middle to outer continental shelf deposits. Cyclic and stacked third-order regressive, coarsening upwards sequences of the Nantahala Formation display an overall increase in feldspar content stratigraphically upsection. These transgressive siliciclastic deposits are interpreted to be conformably overlain by a carbonate highstand systems tract of the Murphy Marble. Palinspastic reconstruction indicates that the Nantahala and Brasstown Formations possibly represent a basinward extension of a siliciclastic wedge up to 3 km thick. The wedge tapers to the southwest along the strike of the Murphy belt at 10° and thins northwestward to 2 km in the Tennessee depocenter where it is represented by the Chilhowee Group. The Murphy belt basin is believed to represent a transitional rift-to-drift facies deposited on the lower plate of the southern Blue Ridge rift zone.

  1. Molecular Evolution of the Escherichia Coli Chromosome. IV. Sequence Comparisons

    PubMed Central

    Milkman, R.; Bridges, M. M.

    1993-01-01

    DNA sequences have been compared in a 4,400-bp region for Escherichia coli K12 and 36 ECOR strains. Discontinuities in degree of similarity, previously inferred, are confirmed in detail. Three clonal frames are described on the basis of the present local high-resolution data, as well as previous analyses of restriction fragment length polymorphism (RFLP) and of multilocus enzyme electrophoresis (MLEE) covering small regions more widely dispersed on the chromosome. These three approaches show important consistency. The data illustrate the fact that, in the limited context of intraspecific genomic sequence variation, clonality and homology are synonymous. Two estimable quantitative properties are defined: recency of common ancestry (the reciprocal of the log(10) of the number of generations since the most recent common ancestor), and the number of nucleotide pairs over which a given recency of common ancestry applies. In principle, these parameters are measures of the degree and physical extent of homology. The small size of apparent recombinational replacements, together with the observation that they occasionally occur in discontinuous series, raises the question of whether they result from the superimposition of replacements of much larger size (as expected from an elementary interpretation of conjugation and transduction in experimental E. coli systems) or via an alternative mechanism. Length polymorphisms of several sorts are described. PMID:8095913
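
    The definition of recency of common ancestry given above (the reciprocal of the base-10 logarithm of the number of generations since the most recent common ancestor) can be written out directly. This is a minimal sketch of that stated definition only; the function name and the example generation counts are illustrative assumptions, not values from the paper.

```python
import math


def recency_of_common_ancestry(generations):
    """Return 1 / log10(generations since the most recent common ancestor).

    The quantity is undefined for generations <= 1, where log10 is zero
    or negative, so those inputs are rejected.
    """
    if generations <= 1:
        raise ValueError("generations must be greater than 1")
    return 1.0 / math.log10(generations)


# Hypothetical inputs: a more recent ancestor gives a larger recency value.
recent = recency_of_common_ancestry(1e4)
ancient = recency_of_common_ancestry(1e8)
```

    Because of the logarithm, a ten-thousand-fold difference in generations changes the value only by a factor of two in this example.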

  2. Development of a candidate reference material for adventitious virus detection in vaccine and biologicals manufacturing by deep sequencing.

    PubMed

    Mee, Edward T; Preston, Mark D; Minor, Philip D; Schepelmann, Silke

    2016-04-12

    Unbiased deep sequencing offers the potential for improved adventitious virus screening in vaccines and biotherapeutics. Successful implementation of such assays will require appropriate control materials to confirm assay performance and sensitivity. A common reference material containing 25 target viruses was produced and 16 laboratories were invited to process it using their preferred adventitious virus detection assay. Fifteen laboratories returned results, obtained using a wide range of wet-lab and informatics methods. Six of 25 target viruses were detected by all laboratories, with the remaining viruses detected by 4-14 laboratories. Six non-target viruses were detected by three or more laboratories. The study demonstrated that a wide range of methods are currently used for adventitious virus detection screening in biological products by deep sequencing and that they can yield significantly different results. This underscores the need for common reference materials to ensure satisfactory assay performance and enable comparisons between laboratories. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.

  3. Reconstruction of an ancestral Yersinia pestis genome and comparison with an ancient sequence

    PubMed Central

    2015-01-01

    Background We propose the computational reconstruction of a whole bacterial ancestral genome at the nucleotide scale, and its validation by a sequence of ancient DNA. This rare possibility is offered by an ancient sequence of the late middle ages plague agent. It has been hypothesized to be ancestral to extant Yersinia pestis strains based on the pattern of nucleotide substitutions. But the dynamics of indels, duplications, insertion sequences and rearrangements has impacted all genomes much more than the substitution process, which makes the ancestral reconstruction task challenging. Results We use a set of gene families from 13 Yersinia species, construct reconciled phylogenies for all of them, and determine gene orders in ancestral species. Gene trees integrate information from the sequence, the species tree and gene order. We reconstruct ancestral sequences for ancestral genic and intergenic regions, providing nearly a complete genome sequence for the ancestor, containing a chromosome and three plasmids. Conclusion The comparison of the ancestral and ancient sequences provides a unique opportunity to assess the quality of ancestral genome reconstruction methods. But the quality of the sequencing and assembly of the ancient sequence can also be questioned by this comparison. PMID:26450112

  4. Alignment-free Transcriptomic and Metatranscriptomic Comparison Using Sequencing Signatures with Variable Length Markov Chains

    PubMed Central

    Liao, Weinan; Ren, Jie; Wang, Kun; Wang, Shun; Zeng, Feng; Wang, Ying; Sun, Fengzhu

    2016-01-01

    The comparison between microbial sequencing data is critical to understand the dynamics of microbial communities. The alignment-based tools analyzing metagenomic datasets require reference sequences and read alignments. The available alignment-free dissimilarity approaches model the background sequences with Fixed Order Markov Chain (FOMC) yielding promising results for the comparison of microbial communities. However, in FOMC, the number of parameters grows exponentially with the increase of the order of Markov Chain (MC). Under a fixed high order of MC, the parameters might not be accurately estimated owing to the limitation of sequencing depth. In our study, we investigate an alternative to FOMC to model background sequences with the data-driven Variable Length Markov Chain (VLMC) in metatranscriptomic data. The VLMC originally designed for long sequences was extended to apply to high-throughput sequencing reads and the strategies to estimate the corresponding parameters were developed. The flexible number of parameters in VLMC avoids estimating the vast number of parameters of high-order MC under limited sequencing depth. Different from the manual selection in FOMC, VLMC determines the MC order adaptively. Several beta diversity measures based on VLMC were applied to compare the bacterial RNA-Seq and metatranscriptomic datasets. Experiments show that VLMC outperforms FOMC to model the background sequences in transcriptomic and metatranscriptomic samples. A software pipeline is available at https://d2vlmc.codeplex.com. PMID:27876823
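
    The exponential growth in parameters noted above follows from simple counting: a fixed-order Markov chain of order k over an alphabet of size a has a^k contexts, each with a - 1 free transition probabilities. The sketch below only illustrates that count for the DNA alphabet; the function name is hypothetical and nothing here reproduces the VLMC method or its software pipeline.

```python
def fomc_free_parameters(order, alphabet_size=4):
    """Count free parameters of a fixed-order Markov chain.

    Each of the alphabet_size ** order contexts has a probability
    distribution over the next symbol, which has alphabet_size - 1
    free values because the probabilities must sum to one.
    """
    if order < 0:
        raise ValueError("order must be non-negative")
    return alphabet_size ** order * (alphabet_size - 1)


# Parameter counts for nucleotide chains of increasing order.
counts = {k: fomc_free_parameters(k) for k in (0, 1, 2, 5, 10)}
```

    Each additional order multiplies the count by four for nucleotides, which is why a high fixed order can outrun the information available at limited sequencing depth.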

  5. SeqHound: biological sequence and structure database as a platform for bioinformatics research

    PubMed Central

    2002-01-01

    Background SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment. Results SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server through a Perl, C or C++ remote API or an optimized local API. It provides functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries. Conclusions The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the Sourceforge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit. PMID:12401134

  6. A comparison of several similarity indices used in the classification of protein sequences: a multivariate analysis.

    PubMed Central

    Landès, C; Hénaut, A; Risler, J L

    1992-01-01

    The present work describes an attempt to identify reliable criteria which could be used as distance indices between protein sequences. Seven different criteria have been tested: i and ii) the scores of the alignments as given by the BESTFIT and the FASTA programs; iii) the ratio parameter, i.e. the BESTFIT score divided by the length of the aligned peptides; iv and v) the statistical significance (Z-scores) of the scores calculated by BESTFIT and FASTA, as obtained by comparison with shuffled sequences; vi) the Z-scores provided by the program RELATE which performs a segment-by-segment comparison of 2 sequences, and vii) an original distance index calculated by the program DOCMA from all the pairwise dotplots between the sequences. These 7 criteria have been tested against the aminoacid sequences of 39 globins and those of the 20 aminoacyl-tRNA synthetases from E. coli. The distances between the sequences were analyzed by the multivariate analysis techniques. The results show that the distances calculated from the scores of the pairwise alignments are not adequately sensitive. The Z-score from RELATE is not selective enough and too demanding in computer time. Three criteria gave a classification consistent with the known similarities between the sequences in the sets, namely the Z-scores from BESTFIT and FASTA and the multiple dotplot comparison distance index from DOCMA. PMID:1641329
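
    The shuffle-based Z-scores used as criteria iv and v can be sketched as follows: score the real pair, score the first sequence against many shuffled copies of the second, and express the real score in standard deviations above the shuffled mean. This is a minimal sketch under stated assumptions: `match_score` is a toy ungapped identity count standing in for a real alignment score such as those from BESTFIT or FASTA, and the function names, shuffle count and seed are hypothetical.

```python
import random
import statistics


def match_score(a, b):
    """Toy score: count of identical residues at the same position (no gaps)."""
    return sum(x == y for x, y in zip(a, b))


def shuffle_z_score(a, b, n_shuffles=200, seed=0, score=match_score):
    """Z-score of score(a, b) against scores for shuffled versions of b.

    Shuffling keeps the composition and length of b but destroys its
    residue order, which gives a composition-matched null distribution.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = score(a, b)
    residues = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        null_scores.append(score(a, "".join(residues)))
    mean = statistics.mean(null_scores)
    sd = statistics.pstdev(null_scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance")
    return (observed - mean) / sd
```

    Swapping `match_score` for a real alignment scorer leaves the rest of the procedure unchanged, at the cost of one alignment per shuffle.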

  7. The genome sequence of taurine cattle: a window to ruminant biology and evolution.

    PubMed

    Elsik, Christine G; Tellam, Ross L; Worley, Kim C; Gibbs, Richard A; Muzny, Donna M; Weinstock, George M; Adelson, David L; Eichler, Evan E; Elnitski, Laura; Guigó, Roderic; Hamernik, Debora L; Kappes, Steve M; Lewin, Harris A; Lynn, David J; Nicholas, Frank W; Reymond, Alexandre; Rijnkels, Monique; Skow, Loren C; Zdobnov, Evgeny M; Schook, Lawrence; Womack, James; Alioto, Tyler; Antonarakis, Stylianos E; Astashyn, Alex; Chapple, Charles E; Chen, Hsiu-Chuan; Chrast, Jacqueline; Câmara, Francisco; Ermolaeva, Olga; Henrichsen, Charlotte N; Hlavina, Wratko; Kapustin, Yuri; Kiryutin, Boris; Kitts, Paul; Kokocinski, Felix; Landrum, Melissa; Maglott, Donna; Pruitt, Kim; Sapojnikov, Victor; Searle, Stephen M; Solovyev, Victor; Souvorov, Alexandre; Ucla, Catherine; Wyss, Carine; Anzola, Juan M; Gerlach, Daniel; Elhaik, Eran; Graur, Dan; Reese, Justin T; Edgar, Robert C; McEwan, John C; Payne, Gemma M; Raison, Joy M; Junier, Thomas; Kriventseva, Evgenia V; Eyras, Eduardo; Plass, Mireya; Donthu, Ravikiran; Larkin, Denis M; Reecy, James; Yang, Mary Q; Chen, Lin; Cheng, Ze; Chitko-McKown, Carol G; Liu, George E; Matukumalli, Lakshmi K; Song, Jiuzhou; Zhu, Bin; Bradley, Daniel G; Brinkman, Fiona S L; Lau, Lilian P L; Whiteside, Matthew D; Walker, Angela; Wheeler, Thomas T; Casey, Theresa; German, J Bruce; Lemay, Danielle G; Maqbool, Nauman J; Molenaar, Adrian J; Seo, Seongwon; Stothard, Paul; Baldwin, Cynthia L; Baxter, Rebecca; Brinkmeyer-Langford, Candice L; Brown, Wendy C; Childers, Christopher P; Connelley, Timothy; Ellis, Shirley A; Fritz, Krista; Glass, Elizabeth J; Herzig, Carolyn T A; Iivanainen, Antti; Lahmers, Kevin K; Bennett, Anna K; Dickens, C Michael; Gilbert, James G R; Hagen, Darren E; Salih, Hanni; Aerts, Jan; Caetano, Alexandre R; Dalrymple, Brian; Garcia, Jose Fernando; Gill, Clare A; Hiendleder, Stefan G; Memili, Erdogan; Spurlock, Diane; Williams, John L; Alexander, Lee; Brownstein, Michael J; Guan, Leluo; Holt, Robert A; Jones, Steven J M; Marra, Marco A; Moore, Richard; Moore, Stephen S; Roberts, Andy; Taniguchi, Masaaki; Waterman, Richard C; Chacko, Joseph; Chandrabose, Mimi M; Cree, Andy; Dao, Marvin Diep; Dinh, Huyen H; Gabisi, Ramatu Ayiesha; Hines, Sandra; Hume, Jennifer; Jhangiani, Shalini N; Joshi, Vandita; Kovar, Christie L; Lewis, Lora R; Liu, Yih-Shin; Lopez, John; Morgan, Margaret B; Nguyen, Ngoc Bich; Okwuonu, Geoffrey O; Ruiz, San Juana; Santibanez, Jireh; Wright, Rita A; Buhay, Christian; Ding, Yan; Dugan-Rocha, Shannon; Herdandez, Judith; Holder, Michael; Sabo, Aniko; Egan, Amy; Goodell, Jason; Wilczek-Boney, Katarzyna; Fowler, Gerald R; Hitchens, Matthew Edward; Lozado, Ryan J; Moen, Charles; Steffen, David; Warren, James T; Zhang, Jingkun; Chiu, Readman; Schein, Jacqueline E; Durbin, K James; Havlak, Paul; Jiang, Huaiyang; Liu, Yue; Qin, Xiang; Ren, Yanru; Shen, Yufeng; Song, Henry; Bell, Stephanie Nicole; Davis, Clay; Johnson, Angela Jolivet; Lee, Sandra; Nazareth, Lynne V; Patel, Bella Mayurkumar; Pu, Ling-Ling; Vattathil, Selina; Williams, Rex Lee; Curry, Stacey; Hamilton, Cerissa; Sodergren, Erica; Wheeler, David A; Barris, Wes; Bennett, Gary L; Eggen, André; Green, Ronnie D; Harhay, Gregory P; Hobbs, Matthew; Jann, Oliver; Keele, John W; Kent, Matthew P; Lien, Sigbjørn; McKay, Stephanie D; McWilliam, Sean; Ratnakumar, Abhirami; Schnabel, Robert D; Smith, Timothy; Snelling, Warren M; Sonstegard, Tad S; Stone, Roger T; Sugimoto, Yoshikazu; Takasuga, Akiko; Taylor, Jeremy F; Van Tassell, Curtis P; Macneil, Michael D; Abatepaulo, Antonio R R; Abbey, Colette A; Ahola, Virpi; Almeida, Iassudara G; Amadio, Ariel F; Anatriello, Elen; Bahadue, Suria M; Biase, Fernando H; Boldt, Clayton R; Carroll, Jeffery A; Carvalho, Wanessa A; Cervelatti, Eliane P; Chacko, Elsa; Chapin, Jennifer E; Cheng, Ye; Choi, Jungwoo; Colley, Adam J; de Campos, Tatiana A; De Donato, Marcos; Santos, Isabel K F de Miranda; de Oliveira, Carlo J F; Deobald, Heather; Devinoy, Eve; Donohue, Kaitlin E; Dovc, Peter; Eberlein, Annett; Fitzsimmons, Carolyn J; Franzin, Alessandra M; Garcia, Gustavo R; Genini, Sem; Gladney, Cody J; Grant, Jason R; Greaser, Marion L; Green, Jonathan A; Hadsell, Darryl L; Hakimov, Hatam A; Halgren, Rob; Harrow, Jennifer L; Hart, Elizabeth A; Hastings, Nicola; Hernandez, Marta; Hu, Zhi-Liang; Ingham, Aaron; Iso-Touru, Terhi; Jamis, Catherine; Jensen, Kirsty; Kapetis, Dimos; Kerr, Tovah; Khalil, Sari S; Khatib, Hasan; Kolbehdari, Davood; Kumar, Charu G; Kumar, Dinesh; Leach, Richard; Lee, Justin C-M; Li, Changxi; Logan, Krystin M; Malinverni, Roberto; Marques, Elisa; Martin, William F; Martins, Natalia F; Maruyama, Sandra R; Mazza, Raffaele; McLean, Kim L; Medrano, Juan F; Moreno, Barbara T; Moré, Daniela D; Muntean, Carl T; Nandakumar, Hari P; Nogueira, Marcelo F G; Olsaker, Ingrid; Pant, Sameer D; Panzitta, Francesca; Pastor, Rosemeire C P; Poli, Mario A; Poslusny, Nathan; Rachagani, Satyanarayana; Ranganathan, Shoba; Razpet, Andrej; Riggs, Penny K; Rincon, Gonzalo; Rodriguez-Osorio, Nelida; Rodriguez-Zas, Sandra L; Romero, Natasha E; Rosenwald, Anne; Sando, Lillian; Schmutz, Sheila M; Shen, Libing; Sherman, Laura; Southey, Bruce R; Lutzow, Ylva Strandberg; Sweedler, Jonathan V; Tammen, Imke; Telugu, Bhanu Prakash V L; Urbanski, Jennifer M; Utsunomiya, Yuri T; Verschoor, Chris P; Waardenberg, Ashley J; Wang, Zhiquan; Ward, Robert; Weikard, Rosemarie; Welsh, Thomas H; White, Stephen N; Wilming, Laurens G; Wunderlich, Kris R; Yang, Jianqi; Zhao, Feng-Qi

    2009-04-24

    To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production.

  8. The Genome Sequence of Taurine Cattle: A window to ruminant biology and evolution

    PubMed Central

    Elsik, Christine G.; Tellam, Ross L.; Worley, Kim C.

    2010-01-01

    To understand the biology and evolution of ruminants, the cattle genome was sequenced to ∼7× coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1,217 are absent or undetected in non-eutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides an enabling resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production. PMID:19390049

  9. Biologically inspired multilevel approach for multiple moving targets detection from airborne forward-looking infrared sequences.

    PubMed

    Li, Yansheng; Tan, Yihua; Li, Hang; Li, Tao; Tian, Jinwen

    2014-04-01

    In this paper, a biologically inspired multilevel approach for simultaneously detecting multiple independently moving targets from airborne forward-looking infrared (FLIR) sequences is proposed. Due to the moving platform, low contrast infrared images, and nonrepeatability of the target signature, moving targets detection from FLIR sequences is still an open problem. Avoiding six parameter affine or eight parameter planar projective transformation matrix estimation of two adjacent frames, which are utilized by existing moving targets detection approaches to cope with the moving infrared camera and have become the bottleneck for the further elevation of the moving targets detection performance, the proposed moving targets detection approach comprises three sequential modules: motion perception for efficiently extracting motion cues, attended motion views extraction for coarsely localizing moving targets, and appearance perception in the local attended motion views for accurately detecting moving targets. Experimental results demonstrate that the proposed approach is efficient and outperforms the compared state-of-the-art approaches.

  10. Understanding sequence similarity and framework analysis between centromere proteins using computational biology.

    PubMed

    Doss, C George Priya; Chakrabarty, Chiranjib; Debajyoti, C; Debottam, S

    2014-11-01

    Certain mysteries pointing toward their recruitment pathways, cell cycle regulation mechanisms, spindle checkpoint assembly, and chromosome segregation process are considered the centre of attraction in cancer research. In modern times, with the established databases, ranges of computational platforms have provided a platform to examine almost all the physiological and biochemical evidences in disease-associated phenotypes. Using existing computational methods, we have utilized the amino acid residues to understand the similarity within the evolutionary variance of different associated centromere proteins. This study related to sequence similarity, protein-protein networking, co-expression analysis, and evolutionary trajectory of centromere proteins will speed up the understanding about centromere biology and will create a road map for upcoming researchers who are initiating their work of clinical sequencing using centromere proteins.

  11. A Multiresolution Graphical Representation for Similarity Relationship and Multiresolution Clustering for Biological Sequences.

    PubMed

    Yang, Lianping; Zhang, Weilin

    2017-04-01

    How we can describe the similarity relationship between the biological sequences is a basic but important problem in bioinformatics. The first graphical representation method for the similarity relationship rather than for single sequence is proposed in this article, which makes the similarity intuitional. Some properties such as sensitivity and continuity of the similarity are proved theoretically, which indicate that the similarity describer has the advantage of both alignment and alignment-free methods. With the aid of multiresolution analysis tools, we can exhibit the similarity's different profiles, from high resolution to low resolution. Then the idea of multiresolution clustering is raised first. A reassortment analysis on a benchmark flu virus genome data set is to test our method and it shows a better performance than alignment method, especially in dealing with problems involving segments' order.

  12. Insertion sequences shared by Bordetella species and implications for the biological diagnosis of pertussis syndrome.

    PubMed

    Tizolova, A; Guiso, N; Guillot, S

    2013-01-01

    The molecular diagnosis of pertussis and parapertussis syndromes is based on the detection of insertion sequences (IS) 481 and 1001, respectively. However, these IS are also detected in the genomes of various Bordetella species, such that they are not specific for either B. pertussis or B. parapertussis. Therefore, we screened the genome of recently circulating isolates of Bordetella species to compare the prevalence of IS481, IS1001 and, also IS1002 with previously published data and to sequence all IS detected. We also investigated whether the numbers of IS481 and IS1001 copies vary in recently circulating isolates of the different Bordetella species. We used the polymerase chain reaction (PCR) method for screening the genome of circulating isolates and to prepare the fragments for sequencing. We used Southern blotting and quantitative real-time PCR for quantification of the numbers of IS. We found no significant diversity in the sequences of the IS harboured in the genomes of the Bordetella isolates screened, except for a 71-nucleotide deletion from IS1002 in B. bronchiseptica. The IS copy numbers in the genome of recently circulating isolates were similar to those in reference strains. Our results confirm that biological diagnosis targeting the IS481 and IS1001 elements are not specific and detect the species B. pertussis, B. holmesii and B. bronchiseptica (IS481), and B. parapertussis and B. bronchiseptica (IS1001).

  13. Multiplex parallel pair-end-ditag sequencing approaches in system biology.

    PubMed

    Ruan, Yijun; Wei, Chia-Lin

    2010-01-01

    Characterization of all the functional components of the human genome relies on our ability to completely elucidate the genetic/epigenetic regulatory networks, chromatin states, nuclear architectures, and genome variations. Such endeavors demand the development of robust and effective genomic technologies. In the past few years, the availability of disruptive next generation DNA sequencing technologies has offered new promise for whole genome interrogation. However, despite their massively parallel and ultra-high throughput capacity, the short read lengths common to these platforms limit their applications for many types of whole genome-based analyses. To overcome this constraint, the pair-end ditag (PET) based sequencing concept was conceived as an immediate solution to expand the information content and extend the linear coverage. By sequencing paired end signatures from any desired DNA fragment and mapping them to the reference genome, the PET strategy allows the accurate demarcation of target DNA boundaries and defines their locations on the genomic landscape. Furthermore, the ability to delineate the relationship between the two ends of a DNA molecule enables the full scale discovery of unconventional gene products, genome rearrangements, and chromatin interactions. Coupled with the massively parallel and ultra-high throughput sequencing platforms, such unique features of the PET strategy have the potential to revolutionize the approaches used to decipher regulatory networks in system biology, define the genome organizations, and characterize genome variations, which ultimately leads to the development of strategies for personalized medicine.

  14. Nucleotide sequence of a cloned woodchuck hepatitis virus genome: comparison with the hepatitis B virus sequence.

    PubMed Central

    Galibert, F; Chen, T N; Mandart, E

    1982-01-01

    The complete nucleotide sequence of a woodchuck hepatitis virus genome cloned in Escherichia coli was determined by the method of Maxam and Gilbert. This sequence was found to be 3,308 nucleotides long. Potential ATG initiator triplets and nonsense codons were identified and used to locate regions with a substantial coding capacity. A striking similarity was observed between the organization of human hepatitis B virus and woodchuck hepatitis virus. Nucleotide sequences of these open regions in the woodchuck virus were compared with corresponding regions present in hepatitis B virus. This allowed the location of four viral genes on the L strand and indicated the absence of protein coded by the S strand. Evolution rates of the various parts of the genome as well as of the four different proteins coded by hepatitis B virus and woodchuck hepatitis virus were compared. These results indicated that: (i) the core protein has evolved slightly less rapidly than the other proteins; and (ii) when a region of DNA codes for two different proteins, there is less freedom for the DNA to evolve and, moreover, one of the proteins can evolve more rapidly than the other. A hairpin structure, very well conserved in the two genomes, was located in the only region devoid of coding function, suggesting the location of the origin of replication of the viral DNA. Images PMID:7086958

  15. Understanding the granulation process of activated sludge in a biological phosphorus removal sequencing batch reactor.

    PubMed

    Wu, Chang-Yong; Peng, Yong-Zhen; Wang, Ran-Deng; Zhou, Yue-Xi

    2012-02-01

    The granulation of activated sludge was investigated using two parallel sequencing batch reactors (SBRs) operated in biological nitrogen and phosphorus removal conditions though the reactor configuration and operating parameters did not favor the granulation. Granules were not observed when the SBR was operated in biological nitrogen removal period for 30d. However, aerobic granules were formed naturally without the increase of aeration intensity when enhanced biological phosphorus removal (EBPR) was achieved. It can be detected that plenty of positive charged particles were formed with the release of phosphorus during the anaerobic period of EBPR. The size of the particles was about 5-20 μm and their highest positive ζ potential was about 73 mV. These positive charged particles can stimulate the granulation. Based on the experimental results, a hypothesis was proposed to interpret the granulation process of activated sludge in the EBPR process in SBR. Dense and compact subgranules were formed stimulated by the positive charged particles. The subgranules grew gradually by collision, adhesion and attached growth of bacteria. Finally, the extrusion and shear of hydrodynamic shear force would help the maturation of granules. Aerobic granular SBR showed excellent biological phosphorus removal ability. The average phosphorus removal efficiency was over 95% and the phosphorus in the effluent was below 0.50 mg L(-1) during the operation. Copyright © 2011 Elsevier Ltd. All rights reserved.

  16. Low temperature biological phosphorus removal and partial nitrification in a pilot sequencing batch reactor system.

    PubMed

    Yuan, Qiuyan; Oleszkiewicz, Jan A

    2011-01-01

    Partial nitrification and biological phosphorus removal appear to hold promise of a cost-effective and sustainable biological nutrient removal process. Pilot sequencing batch reactors (SBRs) were operated under anaerobic/aerobic configuration for 8 months. It was found that biological phosphorus removal can be achieved in an SBR system, along with the partial nitrification process. Sufficient volatile fatty acids supply was the key for enhanced biological phosphorus removal. This experiment demonstrated that partial nitrification can be achieved even at low temperature with high dissolved oxygen (>3 mg/L) concentration. Shorter solid retention time (SRT) for nitrite oxidizing bacteria (NOB) than for ammonia oxidizing bacteria due to the nitrite substrate limitation at the beginning of the aeration cycle was the reason that caused NOB wash-out. Controlling SRT should be the strategy for an SBR operated in cold climate to achieve partial nitrification. It was also found that the aerobic phosphorus accumulating organisms' P-uptake was more sensitive to nitrite inhibition than the process of anaerobic P-release.

  17. Biological characterization and complete genomic sequence of Apium virus Y infecting celery.

    PubMed

    Xu, Donglin; Liu, Hsing-Yeh; Koike, Steven T; Li, Fan; Li, Ruhui

    2011-01-01

    A celery isolate of Apium virus Y (ApVY-Ce) from diseased plants in a commercial field in California was characterized. The experimental host range of the virus included 13 plant species in the families Apiaceae, Chenopodiaceae and Solanaceae. Almost all infected plant species showed foliar chlorosis and distortion or severe stunting and systemic chlorosis. ApVY-Ce was transmitted to all 10 host species in the Apiaceae by green peach aphids. It reacted with the potyvirus group antibody and Celery mosaic virus (CeMV) antiserum. The complete genomic sequence of ApVY-Ce was determined to be 9917 nucleotides, excluding the 3' poly(A) tail, and it comprises a large open reading frame encoding a polyprotein of 3184 amino acid residues. Its genomic organization is typical of potyviruses, and contains conserved motifs found in the genus Potyvirus. Comparisons with available genomic sequences of other potyviruses indicate that ApVY-Ce shares 26.1-52.9% identities with species of the existing genera and unassigned viruses in the Potyviridae at the polyprotein sequence level. Extensive phylogenetic analysis based on the 3'-partial sequences confirms that ApVY-Ce is most closely related to CeMV and is a distinct species of the genus Potyvirus.

  18. Pollution control in pulp and paper industrial effluents using integrated chemical-biological treatment sequences.

    PubMed

    El-Bestawy, Ebtesam; El-Sokkary, Ibrahim; Hussein, Hany; Keela, Alaa Farouk Abu

    2008-11-01

    The main objective of the present study was to improve the quality of pulp and paper industrial wastewater of two local mills RAKTA and El-Ahlia, Alexandria, Egypt, and to bring their pollutant contents to safe discharge levels. Quality improvement was carried out using integrated chemical and biological treatment approaches after their optimization. Chemical treatment (alum, lime, and ferric chloride) was followed by oxidation using hydrogen peroxide and finally biological treatment using activated sludge (90 min for RAKTA and 60 min for El-Ahlia effluents). Chemical coagulation produced low-quality effluents, while pH adjustment during coagulation treatment did not enhance the quality of the effluents. Maximum removal of the tested pollutants was achieved using the integrated treatment and the pollutants recorded residual concentrations (RCs) of 34.67, 17.33, 0.13, and 0.43 mg/l and 15.0, 11.0, 0.0, and 0.13 mg/l for chemical oxygen demand (COD), biochemical oxygen demand (BOD5), tannin and lignin, and silica in RAKTA and El-Ahlia effluents, respectively, all of which were below their maximum permissible limits (MPLs) for the safe discharge into water courses. Specific oxygen uptake rate (SOUR) and sludge volume index (SVI) values reflect good conditions and healthy activated sludge. Based on the previous results, optimized conditions were applied as bench scale on the raw effluents of RAKTA and El-Ahlia via the batch chemical and the biological treatment sequences proposed. 
    For RAKTA effluents, the sequence was as follows: (1) coagulation with 375 mg/l FeCl3, (2) oxidation with 50 mg/l hydrogen peroxide, and (3) biological treatment using activated sludge with 2,000 mg/l initial concentration and 90 min hydraulic retention time (HRT), while for El-Ahlia raw effluents, the sequence was (1) coagulation with 250 mg/l FeCl3, (2) oxidation with 45 mg/l hydrogen peroxide, and (3) biological treatment using activated sludge with 2,000 mg/l initial concentration and 60

  19. Improvements to pairwise sequence comparison (PASC): a genome-based web tool for virus classification.

    PubMed

    Bao, Yiming; Chetvernin, Vyacheslav; Tatusova, Tatiana

    2014-12-01

    The number of viral genome sequences in the public databases is increasing dramatically, and these sequences are playing an important role in virus classification. Pairwise sequence comparison is a sequence-based virus classification method. A program using this method calculates the pairwise identities of virus sequences within a virus family and displays their distribution, and visual analysis helps to determine demarcations at different taxonomic levels such as strain, species, genus and subfamily. Subsequent comparison of new sequences against existing ones allows viruses from which the new sequences were derived to be classified. Although this method cannot be used as the only criterion for virus classification in some cases, it is a quantitative method and has many advantages over conventional virus classification methods. It has been applied to several virus families, and there is an increasing interest in using this method for other virus families/groups. The Pairwise Sequence Comparison (PASC) classification tool was created at the National Center for Biotechnology Information. The tool's database stores pairwise identities for complete genomes/segments of 56 virus families/groups. Data in the system are updated every day to reflect changes in virus taxonomy and additions of new virus sequences to the public database. The web interface of the tool ( http://www.ncbi.nlm.nih.gov/sutils/pasc/ ) makes it easy to navigate and perform analyses. Multiple new viral genome sequences can be tested simultaneously with this system to suggest the taxonomic position of virus isolates in a specific family. PASC eliminates potential discrepancies in the results caused by different algorithms and/or different data used by researchers.
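
    The idea of computing pairwise identities within a family and inspecting their distribution can be sketched as follows. This is only an illustration, not the PASC implementation: it assumes the sequences are already aligned to equal length with "-" as the gap character, and the function names and 10% bin width are hypothetical choices.

```python
from collections import Counter
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences (gap character '-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    # Ignore columns where both sequences have a gap.
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def identity_histogram(aligned: dict, bin_width: int = 10) -> Counter:
    """Count sequence pairs per identity bin, keyed by the bin's lower edge."""
    hist = Counter()
    for (_, s1), (_, s2) in combinations(aligned.items(), 2):
        pid = percent_identity(s1, s2)
        # Fold 100% into the top bin instead of opening a new one.
        lower = min(int(pid // bin_width) * bin_width, 100 - bin_width)
        hist[lower] += 1
    return hist
```

    In such a histogram, gaps between clusters of bins are the kind of feature one would inspect visually when looking for demarcations between taxonomic levels.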

  20. Comparison of agriculture biology and general biology testing outcomes in Utah

    NASA Astrophysics Data System (ADS)

    Despain, Deric Walter

    Agriculture education can take scientific topics to higher levels, emphasize scientific concepts, involve hands-on learning, and develop interrelationships with the other sciences, thus making the living and non-living world around them relevant for students. Prior to 1996, agriculture education was not considered adequate to prepare Utah high school students to meet state biology requirements. The appropriateness of making that equalizing decision in 1996 was not tested until this 2014 study, comparing student test scores on the state biology test for general biology and agriculture biology students. The 2008-2012 data were collected from the Utah Department of Education Data and Statistics, utilizing a descriptive comparative post-test only analysis. As seen in this study, not only did B/AS students tend to score lower than their General Biology counterparts, in multiple cases this difference was significant (p ≤ .05). This contrary finding challenges the theoretical foundation of this study. As a result of this study, three implications were drawn: (a) the Utah CRT-Biology test is not a reliable gauge of academic achievement in agriculture biology, (b) agriculture students in the sample population have not been taught with rigorous biology standards, and (c) biology standards taught in agricultural biology classes are not aligned with content tested by the biology portion of the Utah CRT-Biology test standards. The results of this study indicate to stakeholders that there is a gap within B/AS education and that the delivery of the biology curriculum to this population may need immediate reevaluation.

  1. Biochemical and in vitro biological significance of natural sequence variation in the ovine leptin gene.

    PubMed

    Reicher, Shay; Gertler, Arieh; Seroussi, Eyal; Shpilman, Michal; Gootwine, Elisha

    2011-08-01

    The hormone leptin is involved in diverse biological processes, including regulation of food intake, body-weight homeostasis and energy balance. Sequence variation in the bovine leptin gene has been found to be associated with variations in carcass fat content and average daily gain, as well as in milk yield, milk somatic cell count and several traits governing reproduction. We sequenced genomic DNA and cDNA samples of individuals from three divergent sheep breeds and revealed synonymous as well as novel non-synonymous allelic variation at the third exon of the ovine leptin gene (oLEP) as compared to the sequence published at Accession No. U84247 (reference sequence). In addition, two alternatively spliced oLEP transcripts were found in the abdominal fat tissue. The biochemical and the in vitro biological significance of the sequence variation in the oLEP was examined by generating recombinant oLEP-protein variants namely: p.Q28del, p.N78S, p.R84Q, p.P99Q, p.V123L and p.R138Q, carrying the corresponding sequence variation. Surface plasmon resonance experiments revealed, in most cases, reduced affinity of the oLEP protein variants examined, to human leptin-binding domain (hLBD), relative to the reference variant, being 0.75, 0.60, 0.60, 0.89, 0.92 and 1.03, respectively. In competitive binding assays between biotinylated oLEP and the recombinant leptin protein variants, p.N78S and p.R84Q variants exhibited the lowest affinity to hLBD (0.18 and 0.41, respectively) as compared to the reference hormone. We then tested the protein variants' ability to induce proliferation in Baf-3 cells stably expressing the long form of the human leptin receptor: significant differences in proliferative activity were only found for p.N78S (1.8-fold higher) and p.R138Q (4.2-fold lower) relative to the reference oLEP variant. Copyright © 2011 Elsevier Inc. All rights reserved.

  2. Synthetic biology for the directed evolution of protein biocatalysts: navigating sequence space intelligently

    PubMed Central

    Currin, Andrew; Swainston, Neil; Day, Philip J.

    2015-01-01

    The amino acid sequence of a protein affects both its structure and its function. Thus, the ability to modify the sequence, and hence the structure and activity, of individual proteins in a systematic way, opens up many opportunities, both scientifically and (as we focus on here) for exploitation in biocatalysis. Modern methods of synthetic biology, whereby increasingly large sequences of DNA can be synthesised de novo, allow an unprecedented ability to engineer proteins with novel functions. However, the number of possible proteins is far too large to test individually, so we need means for navigating the ‘search space’ of possible protein sequences efficiently and reliably in order to find desirable activities and other properties. Enzymologists distinguish binding (Kd) and catalytic (kcat) steps. In a similar way, judicious strategies have blended design (for binding, specificity and active site modelling) with the more empirical methods of classical directed evolution (DE) for improving kcat (where natural evolution rarely seeks the highest values), especially with regard to residues distant from the active site and where the functional linkages underpinning enzyme dynamics are both unknown and hard to predict. Epistasis (where the ‘best’ amino acid at one site depends on that or those at others) is a notable feature of directed evolution. The aim of this review is to highlight some of the approaches that are being developed to allow us to use directed evolution to improve enzyme properties, often dramatically. We note that directed evolution differs in a number of ways from natural evolution, including in particular the available mechanisms and the likely selection pressures. Thus, we stress the opportunities afforded by techniques that enable one to map sequence to (structure and) activity in silico, as an effective means of modelling and exploring protein landscapes. Because known landscapes may be assessed and reasoned about as a whole

  3. Secure distributed genome analysis for GWAS and sequence comparison computation

    PubMed Central

    2015-01-01

    Background The rapid increase in the availability and volume of genomic data makes significant advances in biomedical research possible, but sharing of genomic data poses challenges due to the highly sensitive nature of such data. To address the challenges, a competition for secure distributed processing of genomic data was organized by the iDASH research center. Methods In this work we propose techniques for securing computation with real-life genomic data for minor allele frequency and chi-squared statistics computation, as well as distance computation between two genomic sequences, as specified by the iDASH competition tasks. We put forward novel optimizations, including a generalization of a version of mergesort, which might be of independent interest. Results We provide implementation results of our techniques based on secret sharing that demonstrate practicality of the suggested protocols and also report on performance improvements due to our optimization techniques. Conclusions This work describes our techniques, findings, and experimental results developed and obtained as part of iDASH 2015 research competition to secure real-life genomic computations and shows feasibility of securely computing with genomic data in practice. PMID:26733307
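
    The abstract does not give protocol details, so the following is only a toy sketch of the general idea behind secret-sharing-based aggregation, not the authors' method. It uses additive secret sharing over a prime field to sum private allele counts without any single party seeing an individual input. The modulus, the three-party setup, and all function names are assumptions made for illustration.

```python
import secrets

# Assumed field modulus (a Mersenne prime); real totals must stay below it.
PRIME = 2**61 - 1


def share(value: int, n_parties: int) -> list:
    """Split value into n additive shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares: list) -> int:
    """Recombine additive shares into the value they encode."""
    return sum(shares) % PRIME


def secure_sum(private_values: list, n_parties: int = 3) -> int:
    """Sum private inputs; each party only ever adds the shares it holds."""
    all_shares = [share(v, n_parties) for v in private_values]
    # Party j locally adds the j-th share of every input.
    partial = [sum(s[j] for s in all_shares) % PRIME for j in range(n_parties)]
    # Only the combined total is revealed on reconstruction.
    return reconstruct(partial)
```

    Under these assumptions, a minor allele frequency could be obtained by securely summing the per-site minor allele counts and the per-site cohort sizes, then dividing the first total by twice the second.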

  4. The landscape of fusion transcripts in spitzoid melanoma and biologically indeterminate spitzoid tumors by RNA sequencing

    PubMed Central

    Wu, Gang; Barnhill, Raymond L.; Lee, Seungjae; Li, Yongjin; Shao, Ying; Easton, John; Dalton, James; Zhang, Jinghui; Pappo, Alberto; Bahrami, Armita

    2016-01-01

    Kinase activation by chromosomal translocations is a common mechanism that drives tumorigenesis in spitzoid neoplasms. To explore the landscape of fusion transcripts in these tumors, we performed whole-transcriptome sequencing using formalin-fixed paraffin-embedded tissues in malignant or biologically indeterminate spitzoid tumors from 7 patients (age 2–14 years). RNA sequence libraries enriched for coding regions were prepared and the sequencing was analyzed by a novel assembly-based algorithm designed for detecting complex fusions. In addition, tumor samples were screened for hotspot TERT promoter mutations, and telomerase expression was assessed by TERT mRNA in situ hybridization (ISH). Two patients had widespread metastasis and subsequently died of disease, and 5 patients had a benign clinical course on limited follow-up (mean: 30 months). RNA sequencing and TERT mRNA ISH were successful in 6 tumors and unsuccessful in 1 disseminating tumor due to low RNA quality. RNA sequencing identified a kinase fusion in 5 of the 6 sequenced tumors: TPM3–NTRK1 (2 tumors), complex rearrangements involving TPM3, ALK, and IL6R (1 tumor), BAIAP2L1–BRAF (1 tumor), and EML4–BRAF (1 disseminating tumor). All predicted chimeric transcripts were expressed at high levels and contained the intact kinase domain. In addition, 2 tumors each contained a second fusion gene, ARID1B-SNX9 or PTPRZ1-NFAM1. The detected chimeric genes were validated by home-brew break-apart or fusion fluorescence in situ hybridization. The 2 disseminating tumors each harbored the TERT promoter −124C>T (Chr 5:1,295,228 hg19 coordinate) mutation whereas the remaining 5 tumors retained the wild-type gene. The presence of the −124C>T mutation correlated with telomerase expression by TERT mRNA ISH. In summary, we demonstrated complex fusion transcripts and novel partner genes for BRAF by RNA sequencing of FFPE samples. The diversity of gene fusions demonstrated by RNA sequencing defines the molecular

  5. Pattern matching through Chaos Game Representation: bridging numerical and discrete data structures for biological sequence analysis

    PubMed Central

    2012-01-01

    Background Chaos Game Representation (CGR) is an iterated function that bijectively maps discrete sequences into a continuous domain. As a result, discrete sequences can be the object of statistical and topological analyses otherwise reserved to numerical systems. Characteristically, CGR coordinates of substrings sharing an L-long suffix will be located within 2^-L distance of each other. In the two decades since its original proposal, CGR has been generalized beyond its original focus on genomic sequences and has been successfully applied to a wide range of problems in bioinformatics. This report explores the possibility that it can be further extended to approach algorithms that rely on discrete, graph-based representations. Results The exploratory analysis described here consisted of selecting foundational string problems and refactoring them using CGR-based algorithms. We found that CGR can take the role of suffix trees and emulate sophisticated string algorithms, efficiently solving exact and approximate string matching problems such as finding all palindromes and tandem repeats, and matching with mismatches. The common feature of these problems is that they use longest common extension (LCE) queries as subtasks of their procedures, which we show to have a constant time solution with CGR. Additionally, we show that CGR can be used as a rolling hash function within the Rabin-Karp algorithm. Conclusions The analysis of biological sequences relies on algorithmic foundations facing mounting challenges, both logistic (performance) and analytical (lack of unifying mathematical framework). CGR is found to provide the latter and to promise the former: graph-based data structures for sequence analysis operations are entailed by numerical-based data structures produced by CGR maps, providing a unifying analytical framework for a diversity of pattern matching problems. PMID:22551152
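
    The iterated map and its suffix property can be illustrated with a minimal sketch of the standard CGR for DNA. The corner assignment, the starting point at the centre of the unit square, and the use of a per-coordinate (Chebyshev) distance are assumptions made here for illustration; the abstract does not specify them.

```python
from typing import List, Tuple

# Assumed corner assignment; any fixed one-to-one mapping of the four
# bases to the corners of the unit square gives the same behaviour.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq: str) -> List[Tuple[float, float]]:
    """Return the CGR coordinate reached after each symbol of seq."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq:
        cx, cy = CORNERS[base]
        # Move halfway from the current point to the symbol's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def chebyshev(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Largest per-coordinate difference between two points."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))
```

    Each step halves the distance between any two trajectories that read the same symbol, so two sequences ending in the same L symbols finish no more than 2^-L apart in each coordinate, whatever preceded the shared suffix.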

  6. Comparison of Biologic Disease-Modifying Antirheumatic Drug Therapy Persistence Between Biologics Among Rheumatoid Arthritis Patients Switching from Another Biologic.

    PubMed

    Johnston, Stephen S; McMorrow, Donna; Farr, Amanda M; Juneau, Paul; Ogale, Sarika

    2015-06-01

    To compare biologic disease-modifying antirheumatic drug therapy persistence between biologics among patients with rheumatoid arthritis (RA) who previously used ≥1 other biologic. Using a large United States administrative claims dataset, we identified adult patients with RA initiating abatacept, adalimumab, certolizumab, etanercept, golimumab, infliximab, or tocilizumab between January 1, 2010 and January 1, 2012 (initiation date = index). Patients were required to have used ≥1 other biologic before index. Outcomes were biologic persistence, defined in two alternative ways: (1) time from initiation until switching to a different biologic (time to switch) and (2) time from initiation until switching or the first occurrence of a 90-day gap in treatment with the initiated biologic (time to switch/discontinuation). Rituximab was excluded from analyses due to retreatment based on clinical evaluation, which complicates the measurement of persistence. Multivariable survival analyses compared persistence outcomes between tocilizumab and the other biologics, adjusting for patient characteristics. The sample comprised 9,782 biologic initiations; mean age 54 years and 82% female. Compared with tocilizumab, the hazards of switching biologic therapy were significantly higher for abatacept [hazard ratio (HR) = 1.19, P = 0.041], adalimumab (HR = 1.39, P < 0.001), certolizumab (HR = 1.39, P < 0.001), golimumab (HR = 1.20, P = 0.047), and infliximab (HR = 1.33, P < 0.001), but not significantly different for etanercept (HR = 1.19, P = 0.095); the hazards of switching/discontinuing biologic therapy were significantly higher for adalimumab (HR = 1.16, P = 0.014) and certolizumab (HR = 1.15, P < 0.012), but not significantly different for abatacept (HR = 1.08, P = 0.229), etanercept (HR = 0.97, P = 0.644), golimumab (HR = 0.99, P = 0.829), and infliximab (HR = 0.97, P = 0.721). This is one of the first studies of biologic

  7. A novel sensitive method for the detection of user-defined compositional bias in biological sequences.

    PubMed

    Kuznetsov, Igor B; Hwang, Seungwoo

    2006-05-01

    Most biological sequences contain compositionally biased segments in which one or more residue types are significantly overrepresented. The function and evolution of these segments are poorly understood. Usually, all types of compositionally biased segments are masked and ignored during sequence analysis. However, it has been shown for a number of proteins that biased segments that contain amino acids with similar chemical properties are involved in a variety of molecular functions and human diseases. A detailed large-scale analysis of the functional implications and evolutionary conservation of different compositionally biased segments requires a sensitive method capable of detecting user-specified types of compositional bias. We present BIAS, a novel sensitive method for the detection of compositionally biased segments composed of a user-specified set of residue types. BIAS uses the discrete scan statistics that provides a highly accurate correction for multiple tests to compute analytical estimates of the significance of each compositionally biased segment. The method can take into account global compositional bias when computing analytical estimates of the significance of local clusters. BIAS is benchmarked against SEG, SAPS and CAST programs. We also use BIAS to show that groups of proteins with the same biological function are significantly associated with particular types of compositionally biased segments.
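
    The quantity that a scan statistic assesses, the largest number of user-specified residues falling in any fixed-length window, can be computed with a simple sliding window. The sketch below shows only that counting step under assumed names and parameters; it does not reproduce the significance estimates of BIAS.

```python
from typing import Set, Tuple


def max_window_count(seq: str, residues: Set[str], window: int) -> Tuple[int, int]:
    """Return (best_count, start) for the window richest in residues.

    Ties are resolved in favour of the leftmost window.
    """
    if window <= 0 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    flags = [1 if c in residues else 0 for c in seq]
    count = sum(flags[:window])
    best, best_start = count, 0
    for start in range(1, len(seq) - window + 1):
        # Slide by one: add the entering residue, drop the leaving one.
        count += flags[start + window - 1] - flags[start - 1]
        if count > best:
            best, best_start = count, start
    return best, best_start
```

    A significance estimate would then ask how likely a count at least this large is in a sequence of the same length and overall composition, which is the role the abstract assigns to the discrete scan statistic.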

  8. Amphioxus mitochondrial DNA, chordate phylogeny, and the limits of inference based on comparisons of sequences.

    PubMed

    Naylor, G J; Brown, W M

    1998-03-01

    Analyses of both the nucleotide and amino acid sequences derived from all 13 mitochondrial protein-encoding genes (12,234 bp) of 19 metazoan species, including that of the lancelet Branchiostoma floridae ("amphioxus"), fail to yield the widely accepted phylogeny for chordates and, within chordates, for vertebrates. Given the breadth and the compelling nature of the data supporting that phylogeny, relationships supported by the mitochondrial sequence comparisons are almost certainly incorrect, despite their being supported by equally weighted parsimony, distance, and maximum-likelihood analyses. The incorrect groupings probably result in part from convergent base-compositional similarities among some of the taxa, similarities that are strong enough to overwhelm the historical signal. Comparisons among very distantly related taxa are likely to be particularly susceptible to such artifacts, because the historical signal is already greatly attenuated. Empirical results underscore the need for approaches to phylogenetic inference that go beyond simple site-by-site comparison of aligned sequences. This study and others indicate that, once a sequence sample of reasonable size has been obtained, accurate phylogenetic estimation may be better served by incorporating knowledge of molecular structures and processes into inference models and by seeking additional higher order characters embedded in those sequences, than by gathering ever larger sequence samples from the same organisms in the hope that the historical signal will eventually prevail.

  9. The Effects of Meiosis/Genetics Integration and Instructional Sequence on College Biology Student Achievement in Genetics.

    ERIC Educational Resources Information Center

    Browning, Mark

    The purpose of the research was to manipulate two aspects of genetics instruction in order to measure their effects on college, introductory biology students' achievement in genetics. One instructional sequence that was used dealt first with monohybrid autosomal inheritance patterns, then sex-linkage. The alternate sequence was the reverse.…

  10. Multifaceted biological insights from a draft genome sequence of the tobacco hornworm moth, Manduca sexta

    PubMed Central

    Kanost, Michael R.; Arrese, Estela L.; Cao, Xiaolong; Chen, Yun-Ru; Chellapilla, Sanjay; Goldsmith, Marian R; Grosse-Wilde, Ewald; Heckel, David G.; Herndon, Nicolae; Jiang, Haobo; Papanicolaou, Alexie; Qu, Jiaxin; Soulages, Jose L.; Vogel, Heiko; Walters, James; Waterhouse, Robert M.; Ahn, Seung-Joon; Almeida, Francisca C.; An, Chunju; Aqrawi, Peshtewani; Bretschneider, Anne; Bryant, William B.; Bucks, Sascha; Chao, Hsu; Chevignon, Germain; Christen, Jayne M.; Clarke, David F.; Dittmer, Neal T.; Ferguson, Laura C.F.; Garavelou, Spyridoula; Gordon, Karl H.J.; Gunaratna, Ramesh T.; Han, Yi; Hauser, Frank; He, Yan; Heidel-Fischer, Hanna; Hirsh, Ariana; Hu, Yingxia; Jiang, Hongbo; Kalra, Divya; Klinner, Christian; König, Christopher; Kovar, Christie; Kroll, Ashley R.; Kuwar, Suyog S.; Lee, Sandy L.; Lehman, Rüdiger; Li, Kai; Li, Zhaofei; Liang, Hanquan; Lovelace, Shanna; Lu, Zhiqiang; Mansfield, Jennifer H.; McCulloch, Kyle J.; Mathew, Tittu; Morton, Brian; Muzny, Donna M.; Neunemann, David; Ongeri, Fiona; Pauchet, Yannick; Pu, Ling-Ling; Pyrousis, Ioannis; Rao, Xiang-Jun; Redding, Amanda; Roesel, Charles; Sanchez-Gracia, Alejandro; Schaack, Sarah; Shukla, Aditi; Tetreau, Guillaume; Wang, Yang; Xiong, Guang-Hua; Traut, Walther; Walsh, Tom K.; Worley, Kim C.; Wu, Di; Wu, Wenbi; Wu, Yuan-Qing; Zhang, Xiufeng; Zou, Zhen; Zucker, Hannah; Briscoe, Adriana D.; Burmester, Thorsten; Clem, Rollie J.; Feyereisen, René; Grimmelikhuijzen, Cornelis J.P; Hamodrakas, Stavros J.; Hansson, Bill S.; Huguet, Elisabeth; Jermiin, Lars S.; Lan, Que; Lehman, Herman K.; Lorenzen, Marce; Merzendorfer, Hans; Michalopoulos, Ioannis; Morton, David B.; Muthukrishnan, Subbaratnam; Oakeshott, John G.; Palmer, Will; Park, Yoonseong; Passarelli, A. Lorena; Rozas, Julio; Schwartz, Lawrence M.; Smith, Wendy; Southgate, Agnes; Vilcinskas, Andreas; Vogt, Richard; Wang, Ping; Werren, John; Yu, Xiao-Qiang; Zhou, Jing-Jiang; Brown, Susan J.; Scherer, Steven E.; Richards, Stephen; Blissard, Gary W.

    2016-01-01

    Manduca sexta, known as the tobacco hornworm or Carolina sphinx moth, is a lepidopteran insect that is used extensively as a model system for research in insect biochemistry, physiology, neurobiology, development, and immunity. One important benefit of this species as an experimental model is its extremely large size, reaching more than 10 g in the larval stage. M. sexta larvae feed on solanaceous plants and thus must tolerate a substantial challenge from plant allelochemicals, including nicotine. We report the sequence and annotation of the M. sexta genome, and a survey of gene expression in various tissues and developmental stages. The Msex_1.0 genome assembly resulted in a total genome size of 419.4 Mbp. Repetitive sequences accounted for 25.8% of the assembled genome. The official gene set is comprised of 15,451 protein-coding genes, of which 2498 were manually curated. Extensive RNA-seq data from many tissues and developmental stages were used to improve gene models and for insights into gene expression patterns. Genome wide synteny analysis indicated a high level of macrosynteny in the Lepidoptera. Annotation and analyses were carried out for gene families involved in a wide spectrum of biological processes, including apoptosis, vacuole sorting, growth and development, structures of exoskeleton, egg shells, and muscle, vision, chemosensation, ion channels, signal transduction, neuropeptide signaling, neurotransmitter synthesis and transport, nicotine tolerance, lipid metabolism, and immunity. This genome sequence, annotation, and analysis provide an important new resource from a well-studied model insect species and will facilitate further biochemical and mechanistic experimental studies of many biological systems in insects. PMID:27522922

  11. Multifaceted biological insights from a draft genome sequence of the tobacco hornworm moth, Manduca sexta.

    PubMed

    Kanost, Michael R; Arrese, Estela L; Cao, Xiaolong; Chen, Yun-Ru; Chellapilla, Sanjay; Goldsmith, Marian R; Grosse-Wilde, Ewald; Heckel, David G; Herndon, Nicolae; Jiang, Haobo; Papanicolaou, Alexie; Qu, Jiaxin; Soulages, Jose L; Vogel, Heiko; Walters, James; Waterhouse, Robert M; Ahn, Seung-Joon; Almeida, Francisca C; An, Chunju; Aqrawi, Peshtewani; Bretschneider, Anne; Bryant, William B; Bucks, Sascha; Chao, Hsu; Chevignon, Germain; Christen, Jayne M; Clarke, David F; Dittmer, Neal T; Ferguson, Laura C F; Garavelou, Spyridoula; Gordon, Karl H J; Gunaratna, Ramesh T; Han, Yi; Hauser, Frank; He, Yan; Heidel-Fischer, Hanna; Hirsh, Ariana; Hu, Yingxia; Jiang, Hongbo; Kalra, Divya; Klinner, Christian; König, Christopher; Kovar, Christie; Kroll, Ashley R; Kuwar, Suyog S; Lee, Sandy L; Lehman, Rüdiger; Li, Kai; Li, Zhaofei; Liang, Hanquan; Lovelace, Shanna; Lu, Zhiqiang; Mansfield, Jennifer H; McCulloch, Kyle J; Mathew, Tittu; Morton, Brian; Muzny, Donna M; Neunemann, David; Ongeri, Fiona; Pauchet, Yannick; Pu, Ling-Ling; Pyrousis, Ioannis; Rao, Xiang-Jun; Redding, Amanda; Roesel, Charles; Sanchez-Gracia, Alejandro; Schaack, Sarah; Shukla, Aditi; Tetreau, Guillaume; Wang, Yang; Xiong, Guang-Hua; Traut, Walther; Walsh, Tom K; Worley, Kim C; Wu, Di; Wu, Wenbi; Wu, Yuan-Qing; Zhang, Xiufeng; Zou, Zhen; Zucker, Hannah; Briscoe, Adriana D; Burmester, Thorsten; Clem, Rollie J; Feyereisen, René; Grimmelikhuijzen, Cornelis J P; Hamodrakas, Stavros J; Hansson, Bill S; Huguet, Elisabeth; Jermiin, Lars S; Lan, Que; Lehman, Herman K; Lorenzen, Marce; Merzendorfer, Hans; Michalopoulos, Ioannis; Morton, David B; Muthukrishnan, Subbaratnam; Oakeshott, John G; Palmer, Will; Park, Yoonseong; Passarelli, A Lorena; Rozas, Julio; Schwartz, Lawrence M; Smith, Wendy; Southgate, Agnes; Vilcinskas, Andreas; Vogt, Richard; Wang, Ping; Werren, John; Yu, Xiao-Qiang; Zhou, Jing-Jiang; Brown, Susan J; Scherer, Steven E; Richards, Stephen; Blissard, Gary W

    2016-09-01

    Manduca sexta, known as the tobacco hornworm or Carolina sphinx moth, is a lepidopteran insect that is used extensively as a model system for research in insect biochemistry, physiology, neurobiology, development, and immunity. One important benefit of this species as an experimental model is its extremely large size, reaching more than 10 g in the larval stage. M. sexta larvae feed on solanaceous plants and thus must tolerate a substantial challenge from plant allelochemicals, including nicotine. We report the sequence and annotation of the M. sexta genome, and a survey of gene expression in various tissues and developmental stages. The Msex_1.0 genome assembly resulted in a total genome size of 419.4 Mbp. Repetitive sequences accounted for 25.8% of the assembled genome. The official gene set is comprised of 15,451 protein-coding genes, of which 2498 were manually curated. Extensive RNA-seq data from many tissues and developmental stages were used to improve gene models and for insights into gene expression patterns. Genome wide synteny analysis indicated a high level of macrosynteny in the Lepidoptera. Annotation and analyses were carried out for gene families involved in a wide spectrum of biological processes, including apoptosis, vacuole sorting, growth and development, structures of exoskeleton, egg shells, and muscle, vision, chemosensation, ion channels, signal transduction, neuropeptide signaling, neurotransmitter synthesis and transport, nicotine tolerance, lipid metabolism, and immunity. This genome sequence, annotation, and analysis provide an important new resource from a well-studied model insect species and will facilitate further biochemical and mechanistic experimental studies of many biological systems in insects.

  12. Sequence-, structure-, and dynamics-based comparisons of structurally homologous CheY-like proteins

    PubMed Central

    He, Yi; Maisuradze, Gia G.; Yin, Yanping; Kachlishvili, Khatuna; Rackovsky, S.; Scheraga, Harold A.

    2017-01-01

    We recently introduced a physically based approach to sequence comparison, the property factor method (PFM). In the present work, we apply the PFM approach to the study of a challenging set of sequences—the bacterial chemotaxis protein CheY, the N-terminal receiver domain of the nitrogen regulation protein NT-NtrC, and the sporulation response regulator Spo0F. These are all response regulators involved in signal transduction. Despite functional similarity and structural homology, they exhibit low sequence identity. PFM sequence comparison demonstrates a statistically significant qualitative difference between the sequence of CheY and those of the other two proteins that is not found using conventional alignment methods. This difference is shown to be consonant with structural characteristics, using distance matrix comparisons. We also demonstrate that residues participating strongly in native contacts during unfolding are distributed differently in CheY than in the other two proteins. The PFM result is also in accord with dynamic simulation results of several types. Molecular dynamics simulations of all three proteins were carried out at several temperatures, and it is shown that the dynamics of CheY are predicted to differ from those of NT-NtrC and Spo0F. The predicted dynamic properties of the three proteins are in good agreement with experimentally determined B factors and with fluctuations predicted by the Gaussian network model. We pinpoint the differences between the PFM and traditional sequence comparisons and discuss the informatic basis for the ability of the PFM approach to detect physical differences between these sequences that are not apparent from traditional alignment-based comparison. PMID:28143938

  13. HIGEDA: a hierarchical gene-set genetics based algorithm for finding subtle motifs in biological sequences.

    PubMed

    Le, Thanh; Altman, Tom; Gardiner, Katheleen

    2010-02-01

    Identification of motifs in biological sequences is a challenging problem because such motifs are often short, degenerate, and may contain gaps. Most algorithms that have been developed for motif-finding use the expectation-maximization (EM) algorithm iteratively. Although EM algorithms can converge quickly, they depend strongly on initialization parameters and can converge to local sub-optimal solutions. In addition, they cannot generate gapped motifs. The effectiveness of EM algorithms in motif finding can be improved by incorporating methods that choose different sets of initial parameters to enable escape from local optima, and that allow gapped alignments within motif models. We have developed HIGEDA, an algorithm that uses the hierarchical gene-set genetic algorithm (HGA) with EM to initiate and search for the best parameters for the motif model. In addition, HIGEDA can identify gapped motifs using a position weight matrix and dynamic programming to generate an optimal gapped alignment of the motif model with sequences from the dataset. We show that HIGEDA outperforms MEME and other motif-finding algorithms on both DNA and protein sequences. Source code and test datasets are available for download at http://ouray.cudenver.edu/~tnle/, implemented in C++ and supported on Linux and MS Windows.
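
    The position weight matrix mentioned in this abstract can be illustrated with a minimal Python sketch. It is not HIGEDA's code: the example sites, the pseudocount, the uniform background and the function names `build_pwm` and `best_match` are all assumptions made for illustration, and the scan is ungapped, so the EM search, the genetic algorithm and the gapped dynamic-programming alignment described above are not reproduced.

```python
import math

ALPHABET = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length aligned sites.

    The pseudocount and uniform background are illustrative assumptions.
    """
    width = len(sites[0])
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return pwm

def best_match(pwm, sequence):
    """Return (score, start) of the highest-scoring ungapped window.

    The sequence is assumed to contain only A, C, G and T.
    """
    width = len(pwm)
    scored = [
        (sum(pwm[k][sequence[start + k]] for k in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scored)

# Hypothetical aligned motif instances, used only to exercise the functions.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
pwm = build_pwm(sites)
score, start = best_match(pwm, "GGCGTATAATGC")
print(start, round(score, 2))
```

    In this toy input the best window starts at index 4, where the sequence contains TATAAT, the most frequent base at every column of the example sites.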

  14. Biological treatment of shrimp aquaculture wastewater using a sequencing batch reactor.

    PubMed

    Lyles, C; Boopathy, R; Fontenot, Q; Kilgen, M

    2008-12-01

    To improve the water quality in shrimp aquaculture, a sequencing batch reactor (SBR) has been tested for the treatment of shrimp wastewater. An SBR is a variation of the activated sludge biological treatment process. This process uses multiple steps in the same tank to take the place of multiple tanks in a conventional treatment system. The SBR accomplishes equalization, aeration, and clarification in a timed sequence in a single reactor basin. This is achieved in a simple tank, through sequencing stages, which include fill, react, settle, decant, and idle. A laboratory scale SBR and a pilot scale SBR were successfully operated using shrimp aquaculture wastewater. The wastewater contained high concentrations of carbon and nitrogen. By operating the reactor sequentially, viz., in aerobic and anoxic modes, nitrification and denitrification were achieved as well as removal of carbon in a laboratory scale SBR. To be specific, the initial chemical oxygen demand (COD) concentration of 1,593 mg/l was reduced to 44 mg/l within 10 days of reactor operation. Ammonia in the sludge was nitrified within 3 days. The denitrification of nitrate was achieved by the anaerobic process and 99% removal of nitrate was observed. Based on the laboratory study, a pilot scale SBR was designed and operated to remove excess nitrogen in the shrimp wastewater. The results mimicked the laboratory scale SBR.

  15. An introduction to Deep learning on biological sequence data - Examples and solutions.

    PubMed

    Jurtz, Vanessa Isabell; Rosenberg Johansen, Alexander; Nielsen, Morten; Almagro Armenteros, Jose Juan; Nielsen, Henrik; Kaae Sønderby, Casper; Winther, Ole; Kaae Sønderby, Søren

    2017-08-23

    Deep neural network architectures such as convolutional and long short-term memory networks have become increasingly popular as machine learning tools during the recent years. The availability of greater computational resources, more data, new algorithms for training deep models and easy to use libraries for implementation and training of neural networks are the drivers of this development. The use of deep learning has been especially successful in image recognition; and the development of tools, applications and code examples are in most cases centered within this field rather than within biology. Here, we aim to further the development of deep learning methods within biology by providing application examples and ready to apply and adapt code templates. Given such examples, we illustrate how architectures consisting of convolutional and long short-term memory neural networks can relatively easily be designed and trained to state-of-the-art performance on three biological sequence problems: prediction of subcellular localization, protein secondary structure and the binding of peptides to MHC Class II molecules. All implementations and datasets are available online to the scientific community at https://github.com/vanessajurtz/lasagne4bio . Supplementary data are available at Bioinformatics online.

  16. Treatment of textile effluent by chemical (Fenton's Reagent) and biological (sequencing batch reactor) oxidation.

    PubMed

    Rodrigues, Carmen S D; Madeira, Luis M; Boaventura, Rui A R

    2009-12-30

    The removal of organic compounds and colour from a synthetic effluent simulating a cotton dyeing wastewater was evaluated by using a combined process of Fenton's Reagent oxidation and biological degradation in a sequencing batch reactor (SBR). The experimental design methodology was first applied to the chemical oxidation process in order to determine the values of temperature, ferrous ion concentration and hydrogen peroxide concentration that maximize dissolved organic carbon (DOC) and colour removals and increase the effluent's biodegradability. Additional studies on the biological oxidation (SBR) of the raw effluent and of effluent previously submitted to Fenton's oxidation were performed over 15 cycles (i.e., up to steady-state conditions), each lasting 11.5 h; Fenton's oxidation was performed under conditions that maximize either the colour removal or the increase in biodegradability. The results showed that the combination of the two treatment processes provides much better removals of DOC, BOD(5) and colour than the biological or chemical treatment alone. Moreover, the removal of organic matter in the integrated process is particularly effective when Fenton's pre-oxidation is carried out under conditions that promote the maximum increase in wastewater biodegradability.

  17. Developing JSequitur to Study the Hierarchical Structure of Biological Sequences in a Grammatical Inference Framework of String Compression Algorithms.

    PubMed

    Galbadrakh, Bulgan; Lee, Kyung-Eun; Park, Hyun-Seok

    2012-12-01

    Grammatical inference methods are expected to find grammatical structures hidden in biological sequences. One hopes that studies of grammar serve as an appropriate tool for theory formation. Thus, we have developed JSequitur for automatically generating the grammatical structure of biological sequences in an inference framework of string compression algorithms. Our original motivation was to find any grammatical traits of several cancer genes that can be detected by string compression algorithms. Through this research, we could not find any meaningful unique traits of the cancer genes yet, but we could observe some interesting traits in regards to the relationship among gene length, similarity of sequences, the patterns of the generated grammar, and compression rate.

  18. Comparison of alignment software for genome-wide bisulphite sequence data

    PubMed Central

    Chatterjee, Aniruddha; Stockwell, Peter A.; Rodger, Euan J.; Morison, Ian M.

    2012-01-01

    Recent advances in next generation sequencing (NGS) technology now provide the opportunity to rapidly interrogate the methylation status of the genome. However, there are challenges in handling and interpretation of the methylation sequence data because of its large volume and the consequences of bisulphite modification. We sequenced reduced representation human genomes on the Illumina platform and efficiently mapped and visualized the data with different pipelines and software packages. We examined three pipelines for aligning bisulphite converted sequencing reads and compared their performance. We also comment on pre-processing and quality control of Illumina data. This comparison highlights differences in methods for NGS data processing and provides guidance to advance sequence-based methylation data analysis for molecular biologists. PMID:22344695

  19. tuple_plot: fast pairwise nucleotide sequence comparison with noise suppression.

    PubMed

    Szafranski, Karol; Jahn, Niels; Platzer, Matthias

    2006-08-01

    The program tuple_plot identifies and visualizes local similarities between two genomic sequences, typically 100 kb or longer, by applying the well-known dotplot principle. A dictionary of sequence words built from the input sequences serves to construct a task-specific expectancy model that is used to attribute significance values to pairwise word hits. The dictionary-based approach allows fast computation, the computation time scaling to O(N log N), depending on the size of the input sequences. The proposed scoring scheme appreciably increases the signal-to-noise ratio and may help to improve other word-based sequence comparison approaches. tuple_plot is available at http://genome.fli-leibniz.de/software.html and may be used under GNU public license.
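
    The dictionary-of-words idea behind a dotplot can be sketched in a few lines of Python. This is not tuple_plot's implementation: the function name `word_hits` and the word length `k` are assumptions for illustration, and the expectancy model that assigns significance values to hits is not reproduced here.

```python
from collections import defaultdict

def word_hits(seq_a, seq_b, k=4):
    """Return sorted (i, j) pairs where the k-mer starting at seq_a[i]
    is identical to the k-mer starting at seq_b[j]."""
    # Dictionary of sequence words: k-mer -> start positions in seq_a.
    index = defaultdict(list)
    for i in range(len(seq_a) - k + 1):
        index[seq_a[i:i + k]].append(i)

    # Look up every k-mer of seq_b in the dictionary.
    hits = []
    for j in range(len(seq_b) - k + 1):
        for i in index.get(seq_b[j:j + k], ()):
            hits.append((i, j))
    return sorted(hits)

hits = word_hits("ACGTACGT", "TTACGTAA", k=4)
print(hits)  # [(0, 2), (1, 3), (3, 1), (4, 2)]
```

    Each (i, j) pair is one dot in the plot; runs of hits along a diagonal, such as (0, 2) followed by (1, 3), indicate a locally similar region.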

  20. A Comparison of Base-calling Algorithms for Illumina Sequencing Technology.

    PubMed

    Cacho, Ashley; Smirnova, Ekaterina; Huzurbazar, Snehalata; Cui, Xinping

    2016-09-01

    Recent advances in next-generation sequencing technology have yielded increasing cost-effectiveness and higher throughput produced per run, in turn, greatly influencing the analysis of DNA sequences. Among the various sequencing technologies, Illumina is by far the most widely used platform. However, the Illumina sequencing platform suffers from several imperfections that can be attributed to the chemical processes inherent to the sequencing-by-synthesis technology. With the enormous amounts of reads produced, statistical methodologies and computationally efficient algorithms are required to improve the accuracy and speed of base-calling. Over the past few years, several papers have proposed methods to model the various imperfections, giving rise to accurate and/or efficient base-calling algorithms. In this article, we provide a comprehensive comparison of the performance of recently developed base-callers and we present a general statistical model that unifies a large majority of these base-callers.

  1. Comparison of Biology Student Performance in Quarter and Semester Systems

    ERIC Educational Resources Information Center

    Gibbens, Brian; Williams, Mary A.; Strain, Anna K.; Hoff, Courtney D. M.

    2015-01-01

    Curricula at most colleges and universities in the United States are scheduled according to quarters or semesters. While each schedule has several potential advantages over the other, it is unclear what effect each has on student performance. This study compares biology student performance during the two and a half years before and after the 1999…

  2. Comparison and quantitative verification of mapping algorithms for whole genome bisulfite sequencing

    USDA-ARS's Scientific Manuscript database

    Coupling bisulfite conversion with next-generation sequencing (Bisulfite-seq) enables genome-wide measurement of DNA methylation, but poses unique challenges for mapping. However, despite a proliferation of Bisulfite-seq mapping tools, no systematic comparison of their genomic coverage and quantitat...

  3. A COMPARISON OF THREE MODES OF PRESENTING A PROGRAMED INSTRUCTION SEQUENCE.

    ERIC Educational Resources Information Center

    EIGEN, LEWIS D.; AND OTHERS

    A COMPARISON IS MADE BETWEEN THE TEACHING MACHINE AND HORIZONTAL AND VERTICAL TEXT FORMATS. A TEACHING MACHINE PRESENTS AN ORDERED SEQUENCE OF INSTRUCTION TO THE LEARNER, ONE FRAME AT A TIME. AFTER RESPONDING TO A STIMULUS FRAME, THE LEARNER'S ANSWER IS IMMEDIATELY CONFIRMED OR CORRECTED. THE LEARNER PROCEEDS TO THE NEXT FRAME AND IS PREVENTED…

  4. Comparison of solution-based exome capture methods for next generation sequencing

    PubMed Central

    2011-01-01

    Background Techniques enabling targeted re-sequencing of the protein coding sequences of the human genome on next generation sequencing instruments are of great interest. We conducted a systematic comparison of the solution-based exome capture kits provided by Agilent and Roche NimbleGen. A control DNA sample was captured with all four capture methods and prepared for Illumina GAII sequencing. Sequence data from additional samples prepared with the same protocols were also used in the comparison. Results We developed a bioinformatics pipeline for quality control, short read alignment, variant identification and annotation of the sequence data. In our analysis, a larger percentage of the high quality reads from the NimbleGen captures than from the Agilent captures aligned to the capture target regions. High GC content of the target sequence was associated with poor capture success in all exome enrichment methods. Comparison of mean allele balances for heterozygous variants indicated a tendency to have more reference bases than variant bases in the heterozygous variant positions within the target regions in all methods. There was virtually no difference in the genotype concordance compared to genotypes derived from SNP arrays. A minimum of 11× coverage was required to make a heterozygote genotype call with 99% accuracy when compared to common SNPs on genome-wide association arrays. Conclusions Libraries captured with NimbleGen kits aligned more accurately to the target regions. The updated NimbleGen kit most efficiently covered the exome with a minimum coverage of 20×, yet none of the kits captured all the Consensus Coding Sequence annotated exons. PMID:21955854

  5. Genome sequencing and systems biology analysis of a lipase-producing bacterial strain.

    PubMed

    Li, N; Li, D D; Zhang, Y Z; Yuan, Y Z; Geng, H; Xiong, L; Liu, D L

    2016-03-18

    Lipase-producing bacteria are naturally-occurring, industrially-relevant microorganisms that produce lipases, which can be used to synthesize biodiesel from waste oils. The efficiency of lipase expression varies between various microbial strains. Therefore, strains that can produce lipases with high efficiency must be screened, and the conditions of lipase metabolism and optimization of the production process in a given environment must be thoroughly studied. A high efficiency lipase-producing strain was isolated from the sediments of Jinsha River, identified by 16S rRNA sequence analysis as Serratia marcescens, and designated as HS-L5. A schematic diagram of the genome sequence was constructed by high-throughput genome sequencing. A series of genes related to lipid degradation were identified by functional gene annotation through sequence homology analysis. A genome-scale metabolic model of HS-ML5 was constructed using systems biology techniques. The model consisted of 1722 genes and 1567 metabolic reactions. The topological graph of the genome-scale metabolic model was compared to that of conventional metabolic pathways using a visualization software and KEGG database. The basic components and boundaries of the tributyrin degradation subnetwork were determined, and its flux balance analyzed using Matlab and COBRA Toolbox to simulate the effects of different conditions on the catalytic efficiency of lipases produced by HS-ML5. We proved that the catalytic activity of microbial lipases was closely related to the carbon metabolic pathway. As production and catalytic efficiency of lipases varied greatly with the environment, the catalytic efficiency and environmental adaptability of microbial lipases can be improved by proper control of the production conditions.

  6. Beyond Linear Sequence Comparisons: The use of genome-level characters for phylogenetic reconstruction

    SciTech Connect

    Boore, Jeffrey L.

    2004-11-27

    Although the phylogenetic relationships of many organisms have been convincingly resolved by the comparisons of nucleotide or amino acid sequences, others have remained equivocal despite great effort. Now that large-scale genome sequencing projects are sampling many lineages, it is becoming feasible to compare large data sets of genome-level features and to develop this as a tool for phylogenetic reconstruction that has advantages over conventional sequence comparisons. Although it is unlikely that these will address a large number of evolutionary branch points across the broad tree of life due to the infeasibility of such sampling, they have great potential for convincingly resolving many critical, contested relationships for which no other data seems promising. However, it is important that we recognize potential pitfalls, establish reasonable standards for acceptance, and employ rigorous methodology to guard against a return to earlier days of scenario-driven evolutionary reconstructions.

  7. Comparison of biological and molecular characterization of Iranian lettuce mosaic virus isolates.

    PubMed

    Ormaz, B; Winter, S; Koohi-Habibi, M; Mosahebi, Gh; Izadpanah, K

    2006-01-01

    Lettuce mosaic virus (LMV) is one of the most damaging viruses in lettuce and endive cultivating regions. In order to review the characteristics of different LMV isolates of Iran, samples were collected during 2004-2005 from lettuce fields in Esfahan, Ghom, Khorasan, Khuzestan and Tehran provinces. All of the isolates were detected by LMV polyclonal antiserum (AS-0155, DSMZ Germany) in ELISA and TIPA tests. Biological purification was done for the LMV isolates and then they were maintained and propagated on Chenopodium quinoa. A range of plant species such as C. amaranticolor, C. album, Carthamus tinctorius, Gazania sp., Gomphrena globosa, Pisum sativum, Spinacia oleracea were inoculated with these isolates using potassium phosphate buffer (0.05 M). Molecular weight of coat protein was determined by polyacrylamide gel electrophoresis (PAGE). Immunocapture reverse transcription polymerase chain reaction (IC-RT-PCR) was performed using LMV polyclonal antiserum and specific primer pairs of LMV as described by Zerbini et al. (1995). The amplified fragments included the whole CP and 3'UTR regions, and their nucleotide sequences were determined. All isolates induced chlorotic local lesions on C. amaranticolor and chlorotic local lesions with symptoms of systemic infection (vein clearing) on C. album. The Tehran isolate, in addition, caused local lesions with a red border and white centre on Gomphrena globosa. This isolate infected Pisum sativum without any symptoms. Back inoculation on C. quinoa and DAS-ELISA confirmed the latent infection. None of these isolates infected Carthamus tinctorius, Gazania sp. and Spinacia oleracea. The molecular weight of the coat protein was determined to be 30.33 kDa. Western blot confirmed this band as the coat protein of the virus. IC-RT-PCR amplification of LMV isolates produced a product of the expected size, 1300 bp. The comparison of nucleotide sequences showed that the isolates shared 98% identity.

  8. Comparison of approaches for parameter identifiability analysis of biological systems.

    PubMed

    Raue, Andreas; Karlsson, Johan; Saccomani, Maria Pia; Jirstrand, Mats; Timmer, Jens

    2014-05-15

    Modeling of dynamical systems using ordinary differential equations is a popular approach in the field of Systems Biology. The amount of experimental data that are used to build and calibrate these models is often limited. In this setting, the model parameters may not be uniquely determinable. Structural or a priori identifiability is a property of the system equations that indicates whether, in principle, the unknown model parameters can be determined from the available data. We performed a case study using three current approaches for structural identifiability analysis for an application from cell biology. The approaches are conceptually different and were developed independently. The results of the three approaches are in agreement. We discuss strengths and weaknesses of each of them and illustrate how they can be applied to real world problems. To support use of the approaches in further applications, code representations (DAISY, Mathematica and MATLAB) for benchmark model and data are provided on the authors' webpage. andreas.raue@fdm.uni-freiburg.de.

  9. Patome: a database server for biological sequence annotation and analysis in issued patents and published patent applications

    PubMed Central

    Lee, Byungwook; Kim, Taehyung; Kim, Seon-Kyu; Lee, Kwang H.; Lee, Doheon

    2007-01-01

    With the advent of automated and high-throughput techniques, the number of patent applications containing biological sequences has been increasing rapidly. However, they have attracted relatively little attention compared to other sequence resources. We have built a database server called Patome, which contains biological sequence data disclosed in patents and published applications, as well as their analysis information. The analysis is divided into two steps. The first is an annotation step in which the disclosed sequences were annotated with RefSeq database. The second is an association step where the sequences were linked to Entrez Gene, OMIM and GO databases, and their results were saved as a gene–patent table. From the analysis, we found that 55% of human genes were associated with patenting. The gene–patent table can be used to identify whether a particular gene or disease is related to patenting. Patome is available at ; the information is updated bimonthly. PMID:17085479

  10. SinicView: a visualization environment for comparisons of multiple nucleotide sequence alignment tools.

    PubMed

    Shih, Arthur Chun-Chieh; Lee, D T; Lin, Laurent; Peng, Chin-Lin; Chen, Shiang-Heng; Wu, Yu-Wei; Wong, Chun-Yi; Chou, Meng-Yuan; Shiao, Tze-Chang; Hsieh, Mu-Fen

    2006-03-02

    Deluged by the rate and complexity of completed genomic sequences, the need to align longer sequences becomes more urgent, and many more tools have thus been developed. In the initial stage of genomic sequence analysis, a biologist is usually faced with the questions of how to choose the best tool to align sequences of interest and how to analyze and visualize the alignment results, and then with the question of whether poorly aligned regions produced by the tool are indeed not homologous or are just results due to inappropriate alignment tools or scoring systems used. Although several systematic evaluations of multiple sequence alignment (MSA) programs have been proposed, they may not provide a standard-bearer for most biologists because those poorly aligned regions in these evaluations are never discussed. Thus, a tool that allows cross comparison of the alignment results obtained by different tools simultaneously could help a biologist evaluate their correctness and accuracy. In this paper, we present a versatile alignment visualization system, called SinicView, (for Sequence-aligning INnovative and Interactive Comparison VIEWer), which allows the user to efficiently compare and evaluate assorted nucleotide alignment results obtained by different tools. SinicView calculates similarity of the alignment outputs under a fixed window using the sum-of-pairs method and provides scoring profiles of each set of aligned sequences. The user can visually compare alignment results either in graphic scoring profiles or in plain text format of the aligned nucleotides along with the annotations information. We illustrate the capabilities of our visualization system by comparing alignment results obtained by MLAGAN, MAVID, and MULTIZ, respectively. With SinicView, users can use their own data sequences to compare various alignment tools or scoring systems and select the most suitable one to perform alignment in the initial stage of sequence analysis.
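
    The windowed sum-of-pairs scoring described in this abstract can be sketched as follows. This is an illustrative Python sketch, not SinicView's code: the match, mismatch and gap values, the use of non-overlapping windows and the function names are assumptions, since the abstract does not specify them.

```python
from itertools import combinations

def column_score(column, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of one alignment column (assumed scoring values)."""
    total = 0
    for x, y in combinations(column, 2):
        if x == "-" and y == "-":
            continue  # a gap paired with a gap contributes nothing
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total

def window_profile(rows, window=4):
    """Mean sum-of-pairs column score in consecutive, non-overlapping windows."""
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("aligned rows must have equal length")
    scores = [column_score(col) for col in zip(*rows)]
    profile = []
    for start in range(0, length, window):
        chunk = scores[start:start + window]
        profile.append(sum(chunk) / len(chunk))
    return profile

# Hypothetical three-sequence alignment, used only to exercise the functions.
rows = ["ACGTAC-T", "ACGTACGT", "ACCTAC-T"]
profile = window_profile(rows, window=4)
print(profile)  # [2.0, 1.25]
```

    Computing such a profile for the output of each alignment tool over the same input gives curves that can be compared window by window, which is the kind of comparison the abstract describes.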

  11. Detection of Weakly Conserved Ancestral Mammalian Regulatory Sequences by Primate Comparisons

    SciTech Connect

    Wang, Qian-fei; Prabhakar, Shyam; Chanan, Sumita; Cheng, Jan-Fang; Rubin, Edward M.; Boffelli, Dario

    2006-06-01

    Genomic comparisons between human and distant, non-primate mammals are commonly used to identify cis-regulatory elements based on constrained sequence evolution. However, these methods fail to detect cryptic functional elements, which are too weakly conserved among mammals to distinguish from nonfunctional DNA. To address this problem, we explored the potential of deep intra-primate sequence comparisons. We sequenced the orthologs of 558 kb of human genomic sequence, covering multiple loci involved in cholesterol homeostasis, in 6 nonhuman primates. Our analysis identified 6 noncoding DNA elements displaying significant conservation among primates, but undetectable in more distant comparisons. In vitro and in vivo tests revealed that at least three of these 6 elements have regulatory function. Notably, the mouse orthologs of these three functional human sequences had regulatory activity despite their lack of significant sequence conservation, indicating that they are cryptic ancestral cis-regulatory elements. These regulatory elements could still be detected in a smaller set of three primate species including human, rhesus and marmoset. Since the human and rhesus genome sequences are already available, and the marmoset genome is actively being sequenced, the primate-specific conservation analysis described here can be applied in the near future on a whole-genome scale, to complement the annotation provided by more distant species comparisons.

  12. Chaotic motif sampler: detecting motifs from biological sequences by using chaotic neurodynamics

    NASA Astrophysics Data System (ADS)

    Matsuura, Takafumi; Ikeguchi, Tohru

    The motif extraction problem (MEP), the identification of a region in biological sequences, is an important problem in bioinformatics. However, the MEP is NP-hard, so it is almost impossible to obtain an optimal solution within a reasonable time frame. To find near-optimal solutions for NP-hard combinatorial optimization problems such as traveling salesman problems, quadratic assignment problems, and vehicle routing problems, chaotic search, which is one of the deterministic approaches, has been proposed and exhibits better performance than stochastic approaches. In this paper, we propose a new alignment method that employs chaotic dynamics to solve the MEP. It is called the Chaotic Motif Sampler. We show that the performance of the Chaotic Motif Sampler is considerably better than that of conventional methods such as the Gibbs Site Sampler and the Neighborhood Optimization for Multiple Alignment Discovery.

  13. The Molecular Revolution in Cutaneous Biology: Era of Next-Generation Sequencing.

    PubMed

    Sarig, Ofer; Sprecher, Eli

    2017-05-01

    Like any true conceptual revolution, next-generation sequencing (NGS) has not only radically changed research and clinical practice, it has also modified scientific culture. With the possibility to investigate DNA contents of any organism and in any context, including in somatic disorders or in tissues carrying complex microbial populations, it initially seemed as if the genetic underpinning of any biological phenomenon could now be deciphered in an almost streamlined fashion. However, over recent years, we have once again come to understand that there is no such thing as great opportunities without great challenges. The steadily expanding use of NGS and related applications now confronts biologists and physicians with novel technological obstacles, analytical hurdles and increasingly pressing ethical questions. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  14. Modeling RNA Secondary Structure with Sequence Comparison and Experimental Mapping Data.

    PubMed

    Tan, Zhen; Sharma, Gaurav; Mathews, David H

    2017-07-25

    Secondary structure prediction is an important problem in RNA bioinformatics because knowledge of structure is critical to understanding the functions of RNA sequences. Significant improvements in prediction accuracy have recently been demonstrated through the incorporation of experimentally obtained structural information, for instance using selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) mapping. However, such mapping data is currently available only for a limited number of RNA sequences. In this article, we present a method for extending the benefit of experimental mapping data in secondary structure prediction to homologous sequences. Specifically, we propose a method for integrating experimental mapping data into a comparative sequence analysis algorithm for secondary structure prediction of multiple homologs, whereby the mapping data benefits not only the prediction for the specific sequence that was mapped but also other homologs. The proposed method is realized by modifying the TurboFold II algorithm for prediction of RNA secondary structures to utilize basepairing probabilities guided by SHAPE experimental data when such data are available. The SHAPE-mapping-guided basepairing probabilities are obtained using the RSample method. Results demonstrate that the SHAPE mapping data for a sequence improves structure prediction accuracy of other homologous sequences beyond the accuracy obtained by sequence comparison alone (TurboFold II). The updated version of TurboFold II is freely available as part of the RNAstructure software package. Copyright © 2017 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  15. Enhanced methods for unbiased deep sequencing of Lassa and Ebola RNA viruses from clinical and biological samples.

    PubMed

    Matranga, Christian B; Andersen, Kristian G; Winnicki, Sarah; Busby, Michele; Gladden, Adrianne D; Tewhey, Ryan; Stremlau, Matthew; Berlin, Aaron; Gire, Stephen K; England, Eleina; Moses, Lina M; Mikkelsen, Tarjei S; Odia, Ikponmwonsa; Ehiane, Philomena E; Folarin, Onikepe; Goba, Augustine; Kahn, S Humarr; Grant, Donald S; Honko, Anna; Hensley, Lisa; Happi, Christian; Garry, Robert F; Malboeuf, Christine M; Birren, Bruce W; Gnirke, Andreas; Levin, Joshua Z; Sabeti, Pardis C

    2014-01-01

    We have developed a robust RNA sequencing method for generating complete de novo assemblies with intra-host variant calls of Lassa and Ebola virus genomes in clinical and biological samples. Our method uses targeted RNase H-based digestion to remove contaminating poly(rA) carrier and ribosomal RNA. This depletion step improves both the quality of data and quantity of informative reads in unbiased total RNA sequencing libraries. We have also developed a hybrid-selection protocol to further enrich the viral content of sequencing libraries. These protocols have enabled rapid deep sequencing of both Lassa and Ebola virus and are broadly applicable to other viral genomics studies.

  16. Comparison of latent and nominal rabbit Ig VHa1 allotype cDNA sequences.

    PubMed

    McCormack, W T; Dhanarajan, P; Roux, K H

    1988-09-15

    The genetic basis for the expression of a latent VH allotype in the rabbit was investigated. VH region cDNA libraries were produced from spleen mRNA derived from a homozygous a2a2 rabbit expressing an induced latent VHa1 allotype and, for comparison, from a normal homozygous a1a1 rabbit expressing nominal VHa1 allotype. The deduced amino acid sequences of the nominal VHa1 cDNA were concordant with previously published VHa1 protein sequences. A comparison of two complete VH-DH-JH and six partial VHa1 sequences reveals highly conserved sequence within VH framework regions (FR) and considerable diversity in complementarity-determining regions and D region sequences. Two functional JH genes or alleles are evident. Amino acid sequencing of the N-terminal 15 residues of pooled affinity-purified latent VHa1 H chain showed complete sequence identity with the nominal VHa1 sequences. Possible latent VHa1-encoding cDNA clones, derived from the a2a2 rabbit, were selected by hybridization with oligonucleotide probes corresponding to the VHa1 allotype-associated segments of the first and third framework regions (FR1 and FR3). cDNA sequence analysis reveals that the 5' untranslated regions of nominal and latent VHa1 cDNA were virtually identical to each other and to previously reported sequences associated with VHa2 and VHa-negative genes. Moreover, some latent VHa1 genes encode FR1 segments that are essentially homologous to the corresponding segment of a nominal VHa1 allotype. In contrast, other putative latent genes display blocks of VHa1 sequence in either FR1 or FR3 that are flanked by blocks of sequence identical to other rabbit VH genes (i.e., VHa2 or VHa-negative). These composite sequences may be directly encoded by composite germ-line VH genes or may be the products of somatically generated recombination or gene conversion between genes encoding latent and nominal allotypes. The data do not support the hypothesis that latent genes are the result of extensive modification…

  17. Effects of idle time on biological phosphorus removal by sequencing batch reactors.

    PubMed

    Gao, Dawen; Yin, Hang; Liu, Lin; Li, Xing; Liang, Hong

    2013-12-01

    Three identical sequencing batch reactors (SBRs) were operated to investigate the effects of various idle times on biological phosphorus (P) removal. The idle times were set to 3 hr (R1), 10 hr (R2) and 17 hr (R3). The results showed that the idle time of an SBR had a potential impact on biological phosphorus removal, especially when the influent phosphorus concentration increased. The phosphorus removal efficiencies of the R2 and R3 systems declined dramatically compared with the stable R1 system, and the P-release and P-uptake rates of the R3 system in particular decreased dramatically. The PCR-DGGE analysis showed that uncultured Pseudomonas sp. (GQ183242.1) and beta-Proteobacteria (AY823971) were the dominant phosphorus removal bacteria for the R1 and R2 systems, while uncultured gamma-Proteobacteria were the dominant phosphorus removal bacteria for the R3 system. Glycogen-accumulating organisms (GAOs), such as uncultured Sphingomonas sp. (AM889077), were found in the R2 and R3 systems. Overall, the R1 system was the most stable and exhibited the best phosphorus removal efficiency. It was found that although the idle time can be prolonged to allow the formation of intracellular polymers when the phosphorus concentration of the influent is low, systems with a long idle time can become unstable when the influent phosphorus concentration is increased.

  18. PAirwise Sequence Comparison (PASC) and its application in the classification of filoviruses.

    PubMed

    Bao, Yiming; Chetvernin, Vyacheslav; Tatusova, Tatiana

    2012-08-01

    PAirwise Sequence Comparison (PASC) is a tool that uses genome sequence similarity to help with virus classification. The PASC tool at NCBI uses two methods: local alignment based on BLAST and global alignment based on the Needleman-Wunsch algorithm. It works for complete genomes of viruses of several families/groups, and for the family Filoviridae, it currently includes 52 complete genomes available in GenBank. It has been shown that the BLAST-based alignment approach works better for filoviruses, and it is therefore recommended for establishing taxon demarcation criteria. When more genome sequences with high divergence become available, these demarcations will most likely become more precise. The tool can compare new genome sequences of filoviruses with the ones already in the database, and propose their taxonomic classification.
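
    The global-alignment side of such a pairwise comparison can be illustrated with a minimal Python sketch of Needleman-Wunsch scoring followed by a percent-identity calculation. The function name, the scoring values (match=1, mismatch=-1, gap=-2) and the definition of identity as matches per alignment column are assumptions chosen for illustration; the abstract does not state the parameters or identity definition that PASC itself uses.

```python
def needleman_wunsch_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b; return (alignment score, percent identity).

    Scoring values are illustrative assumptions, not PASC's parameters.
    Identity is counted per alignment column (gap columns included).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal path, counting columns and identical pairs.
    i, j = n, m
    matches = columns = 0
    while i > 0 or j > 0:
        columns += 1
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                matches += a[i - 1] == b[j - 1]
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
    identity = 100.0 * matches / columns if columns else 100.0
    return score[n][m], identity
```

    Applied to every pair of genomes in a family, identities of this kind form the distribution from which demarcation thresholds could be read; the quadratic memory use of this sketch makes it suitable only for short sequences.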

  19. A national comparison of biochemistry and molecular biology capstone experiences.

    PubMed

    Aguanno, Ann; Mertz, Pamela; Martin, Debra; Bell, Ellis

    2015-01-01

    Recognizing the increasingly integrative nature of the molecular life sciences, the American Society for Biochemistry and Molecular Biology (ASBMB) recommends that Biochemistry and Molecular Biology (BMB) programs develop curricula based on concepts, content, topics, and expected student outcomes, rather than courses. To that end, ASBMB conducted a series of regional workshops to build a BMB Concept Inventory containing validated assessment tools, based on foundational and discipline-specific knowledge and essential skills, for the community to use. A culminating activity, which integrates the educational experience, is often part of undergraduate molecular life science programs. These "capstone" experiences are commonly defined as an attempt to measure student ability to synthesize and integrate acquired knowledge. However, the format, implementation, and approach to outcome assessment of these experiences are quite varied across the nation. Here we report the results of a nation-wide survey on BMB capstone experiences and discuss this in the context of published reports about capstones and the findings of the workshops driving the development of the BMB Concept Inventory. Both the survey results and the published reports reveal that, although capstone practices do vary, certain formats for the experience are used more frequently and similarities in learning objectives were identified. The use of rubrics to measure student learning is also regularly reported, but details about these assessment instruments are sparse in the literature and were not a focus of our survey. Finally, we outline commonalities in the current practice of capstones and suggest the next steps needed to elucidate best practices.

  20. Structural comparison of biological networks based on dominant vertices.

    PubMed

    Luna, Beatriz; Galán-Vásquez, Edgardo; Ugalde, Edgardo; Martínez-Antonio, Agustino

    2013-07-01

    It is a current practice to organize biological data in a network structure where vertices represent biological components and arrows represent their interactions. A great diversity of graph theoretical notions, such as clustering coefficient, network motifs, centrality, degree distribution, etc., have been developed in order to characterize the structure of these networks. However, none of the existing characterizations allows us to determine global similarity among networks of different sizes. It is the aim of the present paper to introduce a mathematical tool to compare networks not only with regard to their topological structure, but also in their dynamical capabilities. For this reason we aim to propose a pseudo-distance between networks, built around the notions of determination and dominancy, concepts recently introduced in the context of regulatory dynamics on networks. We use our proposed pseudo-distance to compare networks from the following bacteria: E. coli, B. subtilis, P. aeruginosa, M. tuberculosis, S. aureus and C. glutamicum. We also use this pseudo-distance to compare these real bacterial networks with equivalent homogeneous, scale-free and geometric three-dimensional random networks. We found that even when bacterial networks are characterized with different levels of detail, have different sizes and represent different aspects of the organisms, the proposed pseudo-distance captures all these characteristics, and indicates how similar to or different from random networks they are.

  1. Biological Characterization and Next-Generation Genome Sequencing of the Unclassified Cotia Virus SPAn232 (Poxviridae)

    PubMed Central

    Afonso, Priscila P.; Silva, Patrícia M.; Schnellrath, Laila C.; Jesus, Desyreé M.; Hu, Jianhong; Yang, Yajie; Renne, Rolf; Attias, Marcia; Condit, Richard C.; Moussatché, Nissin

    2012-01-01

    Cotia virus (COTV) SPAn232 was isolated in 1961 from sentinel mice at Cotia field station, São Paulo, Brazil. Attempts to classify COTV within a recognized genus of the Poxviridae have generated contradictory findings. Studies by different researchers suggested some similarity to myxoma virus and swinepox virus, whereas another investigation characterized COTV SPAn232 as a vaccinia virus strain. Because of the lack of consensus, we have conducted an independent biological and molecular characterization of COTV. Virus growth curves reached maximum yields at approximately 24 to 48 h and were accompanied by virus DNA replication and a characteristic early/late pattern of viral protein synthesis. Interestingly, COTV did not induce detectable cytopathic effects in BSC-40 cells until 4 days postinfection and generated viral plaques only after 8 days. We determined the complete genomic sequence of COTV by using a combination of the next-generation DNA sequencing technologies 454 and Illumina. A unique contiguous sequence of 185,139 bp containing 185 genes, including the 90 genes conserved in all chordopoxviruses, was obtained. COTV has an interesting panel of open reading frames (ORFs) related to the evasion of host defense, including two novel genes encoding C-C chemokine-like proteins, each present in duplicate copies. Phylogenetic analysis revealed the highest amino acid identity scores with Cervidpoxvirus, Capripoxvirus, Suipoxvirus, Leporipoxvirus, and Yatapoxvirus. However, COTV grouped as an independent branch within this clade, which clearly excluded its classification as an Orthopoxvirus. Therefore, our data suggest that COTV could represent a new poxvirus genus. PMID:22345477

  2. iPBA: a tool for protein structure comparison using sequence alignment strategies

    PubMed Central

    Gelly, Jean-Christophe; Joseph, Agnel Praveen; Srinivasan, Narayanaswamy; de Brevern, Alexandre G.

    2011-01-01

    With the immense growth in the number of available protein structures, fast and accurate structure comparison has become essential. We propose an efficient method for structure comparison, based on a structural alphabet. Protein Blocks (PBs) is a widely used structural alphabet with 16 pentapeptide conformations that can fairly approximate a complete protein chain. Thus a 3D structure can be translated into a 1D sequence of PBs. With a simple Needleman–Wunsch approach and a raw PB substitution matrix, PB-based structural alignments were better than many popular methods. iPBA web server presents an improved alignment approach using (i) specialized PB Substitution Matrices (SM) and (ii) anchor-based alignment methodology. With these developments, the quality of ∼88% of alignments was improved. iPBA alignments were also better than DALI, MUSTANG and GANGSTA+ in >80% of the cases. The webserver is designed for both pairwise comparisons and database searches. Outputs are given as sequence alignment and superposed 3D structures displayed using PyMol and Jmol. A local alignment option for detecting sub-structural similarity is also embedded. As a fast and efficient ‘sequence-based’ structure comparison tool, we believe that it will be quite useful to the scientific community. iPBA can be accessed at http://www.dsimb.inserm.fr/dsimb_tools/ipba/. PMID:21586582

  3. Comparison between geochemical and biological estimates of subsurface microbial activities.

    PubMed

    Phelps, T J; Murphy, E M; Pfiffner, S M; White, D C

    1994-01-01

    Geochemical and biological estimates of in situ microbial activities were compared from the aerobic and microaerophilic sediments of the Atlantic Coastal Plain. Radioisotope time-course experiments suggested oxidation rates greater than millimolar quantities per year for acetate and glucose. Geochemical analyses assessing oxygen consumption, soluble organic carbon utilization, sulfate reduction, and carbon dioxide production suggested organic oxidation rates of nano- to micromolar quantities per year. Radiotracer time-course experiments appeared to overestimate rates of organic carbon oxidation, sulfate reduction, and biomass production by a factor of 10^3 to 10^6 relative to estimates calculated from groundwater analyses. Based on the geochemical evidence, in situ microbial metabolism was estimated to be in the nano- to micromolar range per year, and the average doubling time for the microbial community was estimated to be centuries.

  4. rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison

    PubMed Central

    Hahn, Lars; Leimeister, Chris-André; Morgenstern, Burkhard

    2016-01-01

    Many algorithms for sequence analysis rely on word matching or word statistics. Often, these approaches can be improved if binary patterns representing match and don’t-care positions are used as a filter, such that only those positions of words are considered that correspond to the match positions of the patterns. The performance of these approaches, however, depends on the underlying patterns. Herein, we show that the overlap complexity of a pattern set that was introduced by Ilie and Ilie is closely related to the variance of the number of matches between two evolutionarily related sequences with respect to this pattern set. We propose a modified hill-climbing algorithm to optimize pattern sets for database searching, read mapping and alignment-free sequence comparison of nucleic-acid sequences; our implementation of this algorithm is called rasbhari. Depending on the application at hand, rasbhari can either minimize the overlap complexity of pattern sets, maximize their sensitivity in database searching or minimize the variance of the number of pattern-based matches in alignment-free sequence comparison. We show that, for database searching, rasbhari generates pattern sets with slightly higher sensitivity than existing approaches. In our Spaced Words approach to alignment-free sequence comparison, pattern sets calculated with rasbhari led to more accurate estimates of phylogenetic distances than the randomly generated pattern sets that we previously used. Finally, we used rasbhari to generate patterns for short read classification with CLARK-S. Here too, the sensitivity of the results could be improved, compared to the default patterns of the program. We integrated rasbhari into Spaced Words; the source code of rasbhari is freely available at http://rasbhari.gobics.de/ PMID:27760124

  5. Cytochrome oxidase subunit III from Arbacia lixula: detection of functional constraints by comparison with homologous sequences.

    PubMed

    De Giorgi, C; Martiradonna, A; Saccone, C

    1993-01-01

    In this paper we report the comparison of the sequences of the cytochrome oxidase subunit III from three different sea urchin species. Both nucleotide and amino acid sequences have been analyzed. The nucleotide sequence analysis reveals that the sea urchin sequences obey some rules already found in mammals. The base substitution analysis carried out on the sequences of the three species pairs shows that the evolutionary dynamics of the first and the second codon positions are so slow that they do not allow a quantitative measurement of their genetic distances, thus demonstrating that the COIII gene is strongly conserved during evolution in these species as well. Changes occurring at the third codon positions indicate that the three species evolved from a common ancestor under different directional mutational pressure. The multi-alignment of the sea urchin proteins indicates the existence of the amino acid sequence motif N R T that represents a possible glycosylation site. Another glycosylation site has been detected in the mammalian cytochrome oxidase subunit III, in a slightly different position. Such an analysis revealed, for the first time, a new functional aspect of this sequence.

  6. mtDNAprofiler: a Web application for the nomenclature and comparison of human mitochondrial DNA sequences.

    PubMed

    Yang, In Seok; Lee, Hwan Young; Yang, Woo Ick; Shin, Kyoung-Jin

    2013-07-01

    Mitochondrial DNA (mtDNA) is a valuable tool in the fields of forensic, population, and medical genetics. However, recording and comparing mtDNA control region or entire genome sequences can be difficult for researchers who are not familiar with mtDNA nomenclature conventions. Therefore, mtDNAprofiler, a Web application, was designed for the analysis and comparison of mtDNA sequences in a string format or as a list of mtDNA single-nucleotide polymorphisms (mtSNPs). mtDNAprofiler, which comprises four mtDNA sequence-analysis tools (mtDNA nomenclature, mtDNA assembly, mtSNP conversion, and mtSNP concordance-check), supports not only the accurate analysis of mtDNA sequences via an automated nomenclature function, but also consistent management of mtSNP data via direct comparison and validity-check functions. Since mtDNAprofiler consists of four tools that are associated with key steps of mtDNA sequence analysis, mtDNAprofiler will be helpful for researchers working with mtDNA. mtDNAprofiler is freely available at http://mtprofiler.yonsei.ac.kr.

  7. Fast alignment-free sequence comparison using spaced-word frequencies

    PubMed Central

    Leimeister, Chris-Andre; Boden, Marcus; Horwege, Sebastian; Lindner, Sebastian; Morgenstern, Burkhard

    2014-01-01

    Motivation: Alignment-free methods for sequence comparison are increasingly used for genome analysis and phylogeny reconstruction; they circumvent various difficulties of traditional alignment-based approaches. In particular, alignment-free methods are much faster than pairwise or multiple alignments. They are, however, less accurate than methods based on sequence alignment. Most alignment-free approaches work by comparing the word composition of sequences. A well-known problem with these methods is that neighbouring word matches are far from independent. Results: To reduce the statistical dependency between adjacent word matches, we propose to use ‘spaced words’, defined by patterns of ‘match’ and ‘don’t care’ positions, for alignment-free sequence comparison. We describe a fast implementation of this approach using recursive hashing and bit operations, and we show that further improvements can be achieved by using multiple patterns instead of single patterns. To evaluate our approach, we use spaced-word frequencies as a basis for fast phylogeny reconstruction. Using real-world and simulated sequence data, we demonstrate that our multiple-pattern approach produces better phylogenies than approaches relying on contiguous words. Availability and implementation: Our program is freely available at http://spaced.gobics.de/. Contact: chris.leimeister@stud.uni-goettingen.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24700317
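
    The idea of spaced words can be illustrated with a short Python sketch: a binary pattern marks 'match' positions with 1 and 'don't care' positions with 0, and only the characters under the 1s are kept at each window. The function names, the example pattern and the Euclidean distance on relative frequencies are assumptions made for illustration; the abstract describes a faster implementation based on recursive hashing and bit operations, which this naive version does not reproduce.

```python
from collections import Counter
from math import sqrt


def spaced_words(seq, pattern):
    """Yield the spaced word at each window of seq.

    pattern is a string of '1' (match) and '0' (don't care);
    only characters under '1' positions are kept.
    """
    keep = [k for k, c in enumerate(pattern) if c == "1"]
    for start in range(len(seq) - len(pattern) + 1):
        yield "".join(seq[start + k] for k in keep)


def spaced_word_frequencies(seq, pattern):
    """Relative frequency of every spaced word occurring in seq."""
    counts = Counter(spaced_words(seq, pattern))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def frequency_distance(seq_a, seq_b, pattern):
    """Euclidean distance between spaced-word frequency vectors
    (one possible distance; chosen here only for illustration)."""
    fa = spaced_word_frequencies(seq_a, pattern)
    fb = spaced_word_frequencies(seq_b, pattern)
    words = set(fa) | set(fb)
    return sqrt(sum((fa.get(w, 0.0) - fb.get(w, 0.0)) ** 2 for w in words))
```

    With the pattern "101", the windows "ACG" and "ATG" both yield the spaced word "AG", so a substitution at the don't-care position does not change the word count. A multiple-pattern variant could average the distance over several patterns.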

  8. Enzyme sequence similarity improves the reaction alignment method for cross-species pathway comparison

    SciTech Connect

    Ovacik, Meric A.; Androulakis, Ioannis P.

    2013-09-15

    Pathway-based information has become an important source of information for both establishing evolutionary relationships and understanding the mode of action of a chemical or pharmaceutical among species. Cross-species comparison of pathways can address two broad questions: comparison in order to inform evolutionary relationships and to extrapolate species differences used in a number of different applications including drug and toxicity testing. Cross-species comparison of metabolic pathways is complex as there are multiple features of a pathway that can be modeled and compared. Among the various methods that have been proposed, reaction alignment has emerged as the most successful at predicting phylogenetic relationships based on NCBI taxonomy. We propose an improvement of the reaction alignment method that accounts for enzyme sequence similarity in addition to reaction alignment. Using nine species, including human and some model organisms and test species, we evaluate the standard and improved comparison methods by analyzing the conservation of the glycolysis and citrate cycle pathways. In addition, we demonstrate how organism comparison can be conducted by accounting for the cumulative information retrieved from nine pathways in central metabolism as well as a more complete study involving 36 pathways common in all nine species. Our results indicate that reaction alignment with enzyme sequence similarity results in a more accurate representation of pathway specific cross-species similarities and differences based on NCBI taxonomy.

  9. The new sequencer on the block: comparison of Life Technology's Proton sequencer to an Illumina HiSeq for whole-exome sequencing.

    PubMed

    Boland, Joseph F; Chung, Charles C; Roberson, David; Mitchell, Jason; Zhang, Xijun; Im, Kate M; He, Ji; Chanock, Stephen J; Yeager, Meredith; Dean, Michael

    2013-10-01

    We assessed the performance of the new Life Technologies Proton sequencer by comparing whole-exome sequence data in a Centre d'Etude du Polymorphisme Humain trio (family 1463) to the Illumina HiSeq instrument. To simulate a typical user's results, we utilized the standard capture, alignment and variant calling methods specific to each platform. We restricted data analysis to include the capture region common to both methods. The Proton produced high quality data at a comparable average depth and read length, and the Ion Reporter variant caller identified 96% of single nucleotide polymorphisms (SNPs) detected by the HiSeq and GATK pipeline. However, only 40% of small insertion and deletion variants (indels) were identified by both methods. Usage of the trio structure and segregation of platform-specific alleles supported this result. Further comparison of the trio data with Complete Genomics sequence data and Illumina SNP microarray genotypes documented high concordance and accurate SNP genotyping of both Proton and Illumina platforms. However, our study underscored the problem of accurate detection of indels for both the Proton and HiSeq platforms.

  10. Definition and Analysis of a System for the Automated Comparison of Curriculum Sequencing Algorithms in Adaptive Distance Learning

    ERIC Educational Resources Information Center

    Limongelli, Carla; Sciarrone, Filippo; Temperini, Marco; Vaste, Giulia

    2011-01-01

    LS-Lab provides automatic support to comparison/evaluation of the Learning Object Sequences produced by different Curriculum Sequencing Algorithms. Through this framework a teacher can verify the correspondence between the behaviour of different sequencing algorithms and her pedagogical preferences. In fact the teacher can compare algorithms…

  12. Correlation between MCAT Biology Content Specifications and Topic Scope and Sequence of General Education College Biology Textbooks

    ERIC Educational Resources Information Center

    Rissing, Steven W.

    2013-01-01

    Most American colleges and universities offer gateway biology courses to meet the needs of three undergraduate audiences: biology and related science majors, many of whom will become biomedical researchers; premedical students meeting medical school requirements and preparing for the Medical College Admissions Test (MCAT); and students completing…

  14. Bacterial community comparisons by taxonomy-supervised analysis independent of sequence alignment and clustering

    PubMed Central

    Sul, Woo Jun; Cole, James R.; Jesus, Ederson da C.; Wang, Qiong; Farris, Ryan J.; Fish, Jordan A.; Tiedje, James M.

    2011-01-01

    High-throughput sequencing of 16S rRNA genes has increased our understanding of microbial community structure, but now even higher-throughput methods to the Illumina scale allow the creation of much larger datasets with more samples and orders-of-magnitude more sequences that swamp current analytic methods. We developed a method capable of handling these larger datasets on the basis of assignment of sequences into an existing taxonomy using a supervised learning approach (taxonomy-supervised analysis). We compared this method with a commonly used clustering approach based on sequence similarity (taxonomy-unsupervised analysis). We sampled 211 different bacterial communities from various habitats and obtained ∼1.3 million 16S rRNA sequences spanning the V4 hypervariable region by pyrosequencing. Both methodologies gave similar ecological conclusions in that β-diversity measures calculated by using these two types of matrices were significantly correlated to each other, as were the ordination configurations and hierarchical clustering dendrograms. In addition, our taxonomy-supervised analyses were also highly correlated with phylogenetic methods, such as UniFrac. The taxonomy-supervised analysis has the advantages that it is not limited by the exhaustive computation required for the alignment and clustering necessary for the taxonomy-unsupervised analysis, is more tolerant of sequencing errors, and allows comparisons when sequences are from different regions of the 16S rRNA gene. With the tremendous expansion in 16S rRNA data acquisition underway, the taxonomy-supervised approach offers the potential to provide more rapid and extensive community comparisons across habitats and samples. PMID:21873204

  15. Testing statistical significance scores of sequence comparison methods with structure similarity

    PubMed Central

    Hulsen, Tim; de Vlieg, Jacob; Leunissen, Jack AM; Groenen, Peter MA

    2006-01-01

    Background: In the past years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search is not only determined by the algorithm but also by the statistical significance testing for an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte-Carlo statistics. Several papers have described the superiority of the Z-score as compared to the e-value, using simulated data. We were interested in whether this could be validated when applied to existing, evolutionarily related protein sequences. Results: All experiments are performed on the ASTRAL SCOP database. The Smith-Waterman sequence comparison algorithm with both e-value and Z-score statistics is evaluated, using ROC, CVE and AP measures. The BLAST and FASTA algorithms are used as reference. We find that two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score. SSEARCH especially has very high scores. Conclusion: The compute-intensive Z-score does not have a clear advantage over the e-value. The Smith-Waterman implementations give generally better results than their heuristic counterparts. We recommend using the SSEARCH algorithm combined with e-values for pairwise sequence comparisons. PMID:17038163
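
    A Monte-Carlo Z-score of the kind mentioned above can be sketched in Python as follows: the observed Smith-Waterman score is compared with the scores obtained after repeatedly shuffling one sequence, giving Z = (S - mean) / standard deviation. The function names, the linear gap penalty, the scoring values, the number of shuffles and the choice to shuffle only the subject sequence are all assumptions for illustration, not the settings of the implementations evaluated in the paper. The many realignments per pair also show why the abstract calls the Z-score compute-intensive.

```python
import random
from statistics import mean, pstdev


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty
    (illustrative scoring values, not a substitution matrix)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def monte_carlo_z_score(query, subject, shuffles=100, seed=0):
    """Z-score of the observed score against scores for shuffled subjects.

    Returns None when every shuffled score is identical, because the
    standard deviation is then zero and the Z-score is undefined.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = smith_waterman_score(query, subject)
    letters = list(subject)
    null_scores = []
    for _ in range(shuffles):
        rng.shuffle(letters)  # preserves composition, destroys order
        null_scores.append(smith_waterman_score(query, "".join(letters)))
    mu, sigma = mean(null_scores), pstdev(null_scores)
    if sigma == 0:
        return None
    return (observed - mu) / sigma
```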

  16. Simultaneous removal of nanosilver and fullerene in sequencing batch reactors for biological wastewater treatment.

    PubMed

    Yang, Yu; Wang, Yifei; Hristovski, Kiril; Westerhoff, Paul

    2015-04-01

    Increasing use of engineered nanomaterials (ENMs) inevitably leads to their potential release to the sewer system. The co-removal of nano fullerenes (nC60) and nanosilver as well as their impact on COD removal were studied in biological sequencing batch reactors (SBR) for a year. When dosing nC60 at 0.07-2 mg/L, the SBR removed greater than 95% of nC60, except when short-term interruptions occurred (i.e., dysfunction of the bioreactor caused by nanosilver addition) while nC60 and nanosilver were dosed simultaneously. During repeated 30-d periods of adding both 2 mg/L nC60 and 2 mg/L nanosilver, short-term interruption of SBRs for 4 d was observed and accompanied by (1) reduced total suspended solids in the reactor, (2) poor COD removal rate as low as 22%, and (3) decreased nC60 removal to 0%. After the short-term interruption, COD removal gradually returned to normal within one solids retention time. Except during these "short-term interruptions", the silver removal rate was above 90%. A series of bottle-point batch experiments was conducted to determine the distribution coefficients of nC60 between liquid and biomass phases. A linear distribution model for nC60, combined with a mass balance equation, simulated its removal rate well over the range of 0.07-0.76 mg/L in SBRs. This paper illustrates the effect of "pulse" inputs (i.e., addition for a short period of time) of ENMs into biological reactors, demonstrates the long-term capability of SBRs to remove ENMs and COD, and provides an example of predicting the removal of ENMs in SBRs from batch experiments.

  17. Fetal sex chromosome testing by maternal plasma DNA sequencing: clinical laboratory experience and biology.

    PubMed

    Bianchi, Diana W; Parsa, Saba; Bhatt, Sucheta; Halks-Miller, Meredith; Kurtzman, Kathryn; Sehnert, Amy J; Swanson, Amy

    2015-02-01

    To describe the clinical experience with noninvasive prenatal testing for fetal sex chromosomes using sequencing of maternal plasma cell-free DNA in a commercial laboratory. A noninvasive prenatal testing laboratory data set was examined for samples in which fetal sex chromosomes were reported. Available clinical outcomes were reviewed. Of 18,161 samples with sex chromosome results, no sex chromosome aneuploidy was detected in 98.9% and the fetal sex was reported as XY (9,236) or XX (8,721). In 4 of 32 cases in which the fetal sex was reportedly discordant between noninvasive prenatal testing and karyotype or ultrasonogram, a potential biological reason for the discordance exists, including two cases of documented co-twin demise, one case of a maternal kidney transplant from a male donor, and one case of fetal ambiguous genitalia. In the remaining 204 samples (1.1%), one of four sex chromosome aneuploidies (monosomy X, XXX, XXY, or XYY) was detected. The frequency of false positive results for sex chromosome aneuploidies is a minimum of 0.26% and a maximum of 1.05%. All but one of the discordant sex chromosome aneuploidy results involved the X chromosome. In two putative false-positive XXX cases, maternal XXX was confirmed by karyotype. For the false-positive cases, mean maternal age was significantly higher in monosomy X (P<.001) and lower in XXX (P=.008). Noninvasive prenatal testing results for sex chromosome aneuploidy can be confounded by maternal or fetal biological phenomena. When a discordant noninvasive prenatal testing result is encountered, resolution requires additional maternal history, detailed fetal ultrasonography, and determination of fetal and possibly maternal karyotypes.

  18. Genome sequences and structures of two biologically distinct strains of Grapevine leafroll-associated virus 2 and sequence analysis.

    PubMed

    Meng, Baozhong; Li, Caihong; Goszczynski, Dariusz E; Gonsalves, Dennis

    2005-08-01

    Grapevine leafroll-associated virus 2 (GLRaV-2), a member of the genus Closterovirus within Closteroviridae, is implicated in several important diseases of grapevines including "leafroll", "graft-incompatibility", and "quick decline" worldwide. Several GLRaV-2 isolates have been detected from different grapevine genotypes. However, the genomes of these isolates were not sequenced or only partially sequenced. Consequently, the relationship of these viral isolates at the molecular level has not been determined. Here, we group the various GLRaV-2 isolates into four strains based on their coat protein gene sequences. We show that isolates "PN" (originated from Vitis vinifera cv. "Pinot noir"), "Sem" (from V. vinifera cv. "Semillon") and "94/970" (from V. vinifera cv. "Muscat of Alexandria") belong to the same strain, "93/955" (from hybrid "LN-33") and "H4" (from V. rupestris "St. George") each represents a distinct strain, while Grapevine rootstock stem lesion-associated virus.

  19. Comparison of pulse sequences for R1-based electron paramagnetic resonance oxygen imaging

    NASA Astrophysics Data System (ADS)

    Epel, Boris; Halpern, Howard J.

    2015-05-01

    Electron paramagnetic resonance (EPR) spin-lattice relaxation (SLR) oxygen imaging has proven to be an indispensable tool for assessing oxygen partial pressure in live animals. EPR oxygen images show remarkable oxygen accuracy when combined with high precision and spatial resolution. Developing more effective means for obtaining SLR rates is of great practical, biological and medical importance. In this work we compared different pulse EPR imaging protocols and pulse sequences to establish advantages and areas of applicability for each method. Tests were performed using phantoms containing spin probes with oxygen concentrations relevant to in vivo oximetry. We have found that for small-animal-size objects the inversion recovery sequence combined with the filtered backprojection reconstruction method delivers the best accuracy and precision. For large animals, in which large radio frequency energy deposition might be critical, free induction decay and three-pulse stimulated echo sequences might find better practical usage.

  20. COMPARISON OF ANALYTICAL METHODS FOR THE MEASUREMENT OF NON-VIABLE BIOLOGICAL PM

    EPA Science Inventory

    The paper describes a preliminary research effort to develop a methodology for the measurement of non-viable biologically based particulate matter (PM), analyzing for mold, dust mite, and ragweed antigens and endotoxins. Using a comparison of analytical methods, the research obj...

  2. A Comparison of the First Two Sequenced Chloroplast Genomes in Asteraceae: Lettuce and Sunflower

    SciTech Connect

    Timme, Ruth E.; Kuehl, Jennifer V.; Boore, Jeffrey L.; Jansen, Robert K.

    2006-01-20

    Asteraceae is the second largest family of plants, with over 20,000 species. For the past few decades, numerous phylogenetic studies have contributed to our understanding of the evolutionary relationships within this family, including comparisons of the fast-evolving chloroplast gene ndhF, rbcL, as well as non-coding DNA from the trnL intron plus the trnL-trnF intergenic spacer, matK, and, with lesser resolution, psbA-trnH. This culminated in a study by Panero and Funk in 2002 that used over 13,000 bp per taxon for the largest taxonomic revision of Asteraceae in over a hundred years. Still, some uncertainties remain, and it would be very useful to have more information on the relative rates of sequence evolution among various genes and on genome structure as a potential set of phylogenetic characters to help guide future phylogenetic structures. By way of contributing to this, we report the first two complete chloroplast genome sequences from members of the Asteraceae, those of Helianthus annuus and Lactuca sativa. These plants belong to two distantly related subfamilies, Asteroideae and Cichorioideae, respectively. In addition to these, there is only one other published chloroplast genome sequence for any plant within the larger group called Euasterids II, that of Panax ginseng (Araliaceae, 156,318 bp, AY582139). Early chloroplast genome mapping studies demonstrated that H. annuus and L. sativa share a 22 kb inversion relative to members of the subfamily Barnadesioideae. By comparison to outgroups, this inversion was shown to be derived, indicating that the Asteroideae and Cichorioideae are more closely related than either is to the Barnadesioideae. A later sequencing study found that taxa that share this 22 kb inversion also contain within this region a second, smaller, 3.3 kb inversion. These sequences also enable an analysis of patterns of shared repeats in the genomes at a fine level and of RNA editing by comparison to available EST sequences. In addition, since

  3. Protein sequence comparison and fold recognition: progress and good-practice benchmarking.

    PubMed

    Söding, Johannes; Remmert, Michael

    2011-06-01

    Protein sequence comparison methods have grown increasingly sensitive during the last decade and can often identify distantly related proteins sharing a common ancestor some 3 billion years ago. Although cellular function is not conserved so long, molecular functions and structures of protein domains often are. In combination with a domain-centered approach to function and structure prediction, modern remote homology detection methods have a great and largely underexploited potential for elucidating protein functions and evolution. Advances during the last few years include nonlinear scoring functions combining various sequence features, the use of sequence context information, and powerful new software packages. Since progress depends on realistically assessing new and existing methods and published benchmarks are often hard to compare, we propose 10 rules of good-practice benchmarking.

  4. Disciplinary baptisms: a comparison of the naming stories of genetics, molecular biology, genomics, and systems biology.

    PubMed

    Powell, Alexander; O'Malley, Maureen A; Müller-Wille, Staffan; Calvert, Jane; Dupré, John

    2007-01-01

    Understanding how scientific activities use naming stories to achieve disciplinary status is important not only for insight into the past, but for evaluating current claims that new disciplines are emerging. In order to gain a historical understanding of how new disciplines develop in relation to these baptismal narratives, we compare two recently formed disciplines, systems biology and genomics, with two earlier related life sciences, genetics and molecular biology. These four disciplines span the twentieth century, a period in which the processes of disciplinary demarcation fundamentally changed from those characteristic of the nineteenth century. We outline how the establishment of each discipline relies upon an interplay of factors that include paradigmatic achievements, technological innovation, and social formations. Our focus, however, is the baptism stories that give the new discipline a founding narrative and articulate core problems, general approaches and constitutive methods. The highly plastic process of achieving disciplinary identity is further marked by the openness of disciplinary definition, tension between technological possibilities and the ways in which scientific issues are conceived and approached, synthesis of reductive and integrative strategies, and complex social interactions. The importance--albeit highly variable--of naming stories in these four cases indicates the scope for future studies that focus on failed disciplines or competing names. Further attention to disciplinary histories could, we suggest, give us richer insight into scientific development.

  5. RAMICS: trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA

    PubMed Central

    Wright, Imogen A.; Travers, Simon A.

    2014-01-01

    The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. PMID:24861618
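    The distinction the abstract draws between codon-sized indels and frameshift mutations rests on a simple rule: an insertion or deletion whose length is a multiple of three leaves the downstream reading frame intact, while any other length shifts it. The Python sketch below illustrates only that rule. It is not the RAMICS algorithm (which, per the abstract, uses profile hidden Markov models), and the function name `classify_indel` is hypothetical.

```python
def classify_indel(length: int) -> str:
    """Classify an indel by its effect on the reading frame.

    Hypothetical helper for illustration only: a length that is a multiple
    of three preserves the frame; any other length shifts it.
    """
    if length <= 0:
        raise ValueError("indel length must be positive")
    return "codon-sized" if length % 3 == 0 else "frameshift"


for n in (1, 3, 4, 6):
    print(n, classify_indel(n))
```

    A real aligner also has to decide whether an apparent frameshift is a genuine mutation or a sequencing error, for example at homopolymer regions, which this sketch does not attempt.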

  6. Metagenomes obtained by 'deep sequencing' - what do they tell about the enhanced biological phosphorus removal communities?

    PubMed

    Albertsen, Mads; Saunders, Aaron M; Nielsen, Kåre L; Nielsen, Per H

    2013-01-01

    Metagenomics enables studies of the genomic potential of complex microbial communities by sequencing bulk genomic DNA directly from the environment. Knowledge of the genetic potential of a community can be used to formulate and test ecological hypotheses about stability and performance. In this study deep metagenomics and fluorescence in situ hybridization (FISH) were used to study a full-scale wastewater treatment plant with enhanced biological phosphorus removal (EBPR), and the results were compared to an existing EBPR metagenome. EBPR is a widely used process that relies on a complex community of microorganisms to function properly. Insight into community and species level stability and dynamics is valuable for knowledge-driven optimization of the EBPR process. The metagenomes of the EBPR communities were distinct compared to metagenomes of communities from a wide range of other environments, which could be attributed to selection pressures of the EBPR process. The metabolic potential of one of the key microorganisms in the EBPR process, Accumulibacter, was investigated in more detail in the two plants, revealing a potential importance of phage predation on the dynamics of Accumulibacter populations. The results demonstrate that metagenomics can be used as a powerful tool for system-wide characterization of the EBPR community as well as for a deeper understanding of the function of specific community members. Furthermore, we discuss and illustrate some of the general pitfalls in metagenomics and stress the need for additional DNA-extraction-independent information in metagenome studies.

  7. Nitrification, denitrification and biological phosphorus removal in piggery wastewater using a sequencing batch reactor.

    PubMed

    Obaja, D; Macé, S; Costa, J; Sans, C; Mata-Alvarez, J

    2003-03-01

    Nutrients in piggery wastewater with high organic matter, nitrogen (N) and phosphorus (P) content were biologically removed in a sequencing batch reactor (SBR) with anaerobic, aerobic and anoxic stages. The SBR was operated with 3 cycles/day, temperature 30 degrees C, sludge retention time (SRT) 1 day and hydraulic retention time (HRT) 11 days. With a wastewater containing 1500 mg/l ammonium and 144 mg/l phosphate, a removal efficiency of 99.7% for nitrogen and 97.3% for phosphate was obtained. Experiments set up to evaluate the effect of temperature on the process showed that it should be run at temperatures higher than 16 degrees C to obtain good removals (> 95%). Batch tests (ammonia utilization rate, nitrogen utilization rate and oxygen utilization rate) proved to be good tools to evaluate heterotrophic and autotrophic biomass activity. The SBR proved to be a very flexible tool, and was particularly suitable for the treatment of piggery wastewater, characterized by high nutrient content and by frequent changes in composition and therefore affecting process conditions.

  8. Biological phosphorus removal in sequencing batch reactor with single-stage oxic process.

    PubMed

    Wang, Dong-Bo; Li, Xiao-Ming; Yang, Qi; Zeng, Guang-Ming; Liao, De-Xiang; Zhang, Jie

    2008-09-01

    The performance of biological phosphorus removal (BPR) in a sequencing batch reactor (SBR) with a single-stage oxic process was investigated using simulated municipal wastewater. The experimental results showed that BPR could be achieved in an SBR without an anaerobic phase, which is conventionally considered a key phase for BPR. A phosphorus (P) concentration of 0.22-1.79 mg L(-1) in the effluent was obtained after 4 h of aeration when the P concentration in the influent was about 15-20 mg L(-1), the dissolved oxygen (DO) was controlled at 3±0.2 mg L(-1) during the aerobic phase and the pH was maintained at 7±0.1, which indicated that P removal efficiencies above 90% were achieved. Experimental results also showed that P was mainly stored in the form of intracellular polyphosphate (poly-P), and about 207.235 mg phosphates were removed by the discharge of phosphorus-rich sludge in each SBR cycle. However, the energy-storage compound poly-beta-hydroxyalkanoate (PHA) remained almost constant at a low level (5-6 mg L(-1)) during the process. These results showed that phosphate could be transformed to poly-P in a single-stage oxic process without PHA accumulation, and that BPR could be realized as net phosphate removal.

  9. Whole-Genome Sequence of Pseudomonas graminis Strain UASWS1507, a Potential Biological Control Agent and Biofertilizer Isolated in Switzerland

    PubMed Central

    Crovadore, Julien; Calmin, Gautier; Chablais, Romain; Cochard, Bastien; Schulz, Torsten

    2016-01-01

    We report here the whole-genome shotgun sequence of the strain UASWS1507 of the species Pseudomonas graminis, isolated in Switzerland from an apple tree. This is the first genome registered for this species, which is considered as a potential and valuable resource of biological control agents and biofertilizers for agriculture. PMID:27795260

  10. Whole-Genome Sequence of Pseudomonas graminis Strain UASWS1507, a Potential Biological Control Agent and Biofertilizer Isolated in Switzerland.

    PubMed

    Crovadore, Julien; Calmin, Gautier; Chablais, Romain; Cochard, Bastien; Schulz, Torsten; Lefort, François

    2016-10-06

    We report here the whole-genome shotgun sequence of the strain UASWS1507 of the species Pseudomonas graminis, isolated in Switzerland from an apple tree. This is the first genome registered for this species, which is considered as a potential and valuable resource of biological control agents and biofertilizers for agriculture. Copyright © 2016 Crovadore et al.

  11. Long-Range Correlations in the Sequence of Human Heartbeats and Other Biological Signals

    NASA Astrophysics Data System (ADS)

    Teich, Malvin C.

    1998-03-01

    The sequence of heartbeat occurrence times provides information about the state of health of the heart. We used a variety of measures, including multiresolution wavelet analysis, to identify the form of the point process that describes the human heartbeat. These measures, which are based on both interbeat (R-R) intervals and counts (heart rate), have been applied to records for both normal and heart-failure patients drawn from a standard database, and various surrogate versions thereof. Several of these measures reveal scaling behavior (1/f-type fluctuations; long-range power-law correlations) [R. G. Turcott and M. C. Teich, Proc. SPIE 2036 (Chaos in Biology and Medicine), 22-39 (1993)]. Essentially all of the R-R and count-based measures we investigated, including those that exhibit scaling, differ in statistically significant ways for the normal and heart-failure patients. The wavelet measures, however, reveal a heretofore unknown scale window, between 16 and 32 heartbeats, over which the magnitudes of the wavelet-coefficient variances fall into disjoint sets for the normal and heart-failure patients [R. G. Turcott and M. C. Teich, Ann. Biomed. Eng. 24, 269-293 (1996); S. Thurner, M. C. Feurstein, and M. C. Teich, Phys. Rev. Lett. (in press)]. This enables us to correctly classify every patient in the standard data set as either belonging to the heart-failure or normal group with 100% accuracy, thereby providing a clinically significant measure of the presence of heart-failure. Previous approaches have provided only statistically significant measures. The tradeoff between sensitivity and

  12. Grouping and comparison of Indian citrus tristeza virus isolates based on coat protein gene sequences and restriction analysis patterns.

    PubMed

    Roy, A; Ramachandran, P; Brlansky, R H

    2003-04-01

    Citrus tristeza virus (CTV) is an aphid-transmitted closterovirus, which causes one of the most important citrus diseases worldwide. Isolates of CTV differ widely in their biological properties. CTV-infected samples were collected from four locations in India: Bangalore (CTV-B), Delhi (CTV-D), Nagpur (CTV-N), and Pune (CTV-P), and were maintained by grafting into Kagzi lime (Citrus aurantifolia (Christm.) Swing.). All isolates produced typical vein clearing and flecking symptoms 6-8 weeks after grafting. In addition, CTV-B and CTV-P isolates produced stem-pitting symptoms after 8-10 months. The CTV coat protein gene (CPG) was amplified by RT-PCR using CPG specific primers, yielding an amplicon of 672 bp for all the isolates. Sequence analysis of the CPG amplicon of all the four Indian isolates showed 93-94% nucleotide sequence homology to the Californian CTV severe stem pitting isolate SY568 and 92-93% homology to the Japanese seedling yellows isolate NUagA and Israeli VT p346 isolates. In phylogenetic tree analysis, Indian CTV isolates appeared far different from other isolates as they formed a separate branch. Comparison among the Indian isolates was carried out by restriction analysis and restriction fragment length polymorphism (RFLP). Specific primers to various genome segments of well-characterized CTV isolates were used to further classify the Indian CTV isolates.

  13. Structure-based evaluation of sequence comparison and fold recognition alignment accuracy.

    PubMed

    Domingues, F S; Lackner, P; Andreeva, A; Sippl, M J

    2000-04-07

    The biological role, biochemical function, and structure of uncharacterized protein sequences are often inferred from their similarity to known proteins. A constant goal is to increase the reliability, sensitivity, and accuracy of alignment techniques to enable the detection of increasingly distant relationships. Development, tuning, and testing of these methods benefit from appropriate benchmarks for the assessment of alignment accuracy. Here, we describe a benchmark protocol to estimate sequence-to-sequence and sequence-to-structure alignment accuracy. The protocol consists of structurally related pairs of proteins and procedures to evaluate alignment accuracy over the whole set. The set of protein pairs covers all the currently known fold types. The benchmark is challenging in the sense that it consists of proteins lacking clear sequence similarity. Correct target alignments are derived from the three-dimensional structures of these pairs by rigid body superposition. An evaluation engine computes the accuracy of alignments obtained from a particular algorithm in terms of alignment shifts with respect to the structure-derived alignments. Using this benchmark we estimate that the best results can be obtained from a combination of amino acid residue substitution matrices and knowledge-based potentials.

  14. Proactive interference of a sequence of tones in a two-tone pitch comparison task.

    PubMed

    Ruusuvirta, T

    2000-06-01

    Subjects compared pitches of a standard tone and a comparison tone separated by 1,300-3,000 msec and responded according to whether the comparison tone sounded higher or lower in pitch than the standard tone. Three interfering tones at 300-msec intervals were presented before each pair of tones. Their pitch range varied, being either below or above the pitch of the standard tone; in some of the trials, their pitches were identical to the pitch of the standard tone (no interference). The highest error rate in performance was found when the interfering tones and the comparison tone deviated in the same direction in pitch from the standard tone. In turn, their deviations in the opposite directions resulted in the lowest error rate. This effect was not found to be dependent on whether the interfering tones were randomly ordered or monotonically ordered, together with the standard tone, into melodically ascending/descending sequences. An intermediate error rate in performance was found when the interfering tones and the standard tone were identical. The results support earlier hypotheses, presented in the context of retroactive interference, by demonstrating proactive interference of a tone sequence at the level of representations of individual tones.

  15. Biological phosphorus removal in anoxic-aerobic sequencing batch reactor with starch as sole carbon source.

    PubMed

    Luo, Dacheng; Yuan, Linjiang; Liu, Lun; Chai, Lu; Wang, Xin

    2017-01-01

    In traditional biological phosphorus removal (BPR), phosphorus release in the anaerobic stage is the prerequisite for excess phosphorus uptake under aerobic conditions. Moreover, when low-molecular-weight organic substances such as volatile fatty acids (VFAs) are scarce in the bulk liquid, or when anaerobic conditions do not exist, phosphate accumulating organisms (PAOs) have difficulty removing phosphorus. However, in this work, phosphorus removal in two anoxic-aerobic sequencing batch reactors (SBRs) was observed when starch was supplied as the sole carbon source. The relation of BPR to the idle period was investigated in the two identical SBRs; the idle times were set to 0.5 hr (R1) and 4 hr (R2), respectively. Results of the study showed that, in the two SBRs, phosphorus concentrations of 0.26-3.11 mg/L in the effluent were obtained after aeration when the phosphorus concentration in the influent was about 8 mg/L. Moreover, lower accumulation/transformation of polyhydroxyalkanoates (PHAs) and higher transformation of glycogen occurred in the SBRs, indicating that glycogen was the main energy source, in contrast to the traditional mechanism of BPR. Under the different idle times, phosphorus removal differed slightly. In R2, which had a longer idle period, phosphorus release was pronounced, just as occurs in an anaerobic-aerobic regime, but there was an unusual increase in chemical oxygen demand, while VFAs showed no notable change. It is speculated that PAOs can assimilate organic compounds in the mixed liquor, which were generated from glycolysis by fermentative organisms, coupled with phosphorus release. In R1, which had a very short idle period, anaerobic conditions did not exist; the phosphorus removal rate reached 63%. This implies that a new metabolic pathway can occur even without anaerobic phosphorus release when starch is supplied as the sole carbon source.

  16. A comparison of inquiry-based teaching through concept maps and traditional teaching in biology

    NASA Astrophysics Data System (ADS)

    Gulati, Sangeeta

    2005-11-01

    The purpose of this study was to investigate affective outcomes and academic achievement for students enrolled in high school biology when instruction included concept-mapping. The research design was quasi-experimental and allowed for a comparison between an experimental group who constructed concept maps and a control group who received traditional biology instruction. The subjects were 140 ninth-grade students, distributed into six intact biology classes, three honors and three general biology classes. Chapter tests and a textbook-generated 9-week comprehensive posttest were used to measure achievement. ANCOVA analysis on the comprehensive posttest indicated no significant overall effect of concept mapping on biology achievement across the whole quarter when controlling for the quarter pretest. Chi-square analyses were performed to measure students' attitude toward biology class and activities. The experimental group indicated a higher than expected tendency to be positive about the instructional methods; the control group, however, indicated fewer than expected positive responses. T-tests were conducted to determine the differences between the experimental and control groups on chapter tests with or without concept mapping. The group with concept mapping scored significantly better than those with traditional methods. Honors class comparisons indicated a significant difference between groups at the p<.05 level on the chapter pretest. There was also a significant difference on the chapter test after intervention, but this time at the p<.001 level. Although the general class comparisons indicated no significant difference on the chapter pretest, the experimental group scored significantly better than the control group on the chapter test following intervention. This suggests that average ability students benefit from concept mapping more than traditional instruction. In narrative self-evaluations, only a small percentage of participants overall listed concept mapping as the

  17. The environmental biological signature: NGS profiling for forensic comparison of soils.

    PubMed

    Giampaoli, S; Berti, A; Di Maggio, R M; Pilli, E; Valentini, A; Valeriani, F; Gianfranceschi, G; Barni, F; Ripani, L; Romano Spica, V

    2014-07-01

    The identification of the source of a specific soil sample is a crucial step in forensic investigations. Rapid advances in next generation sequencing (NGS) technology and the strong reduction of the cost of sequencing have recently opened new perspectives. In the present work a metabarcoding approach has been successfully applied to forensic and environmental soil samples, allowing the accurate and sensitive analysis of microflora (mfDNA), plants, metazoa, and protozoa DNA. The identification of the biological component by DNA metabarcoding is a strong element for the discrimination of samples that are geologically very similar but come from distinct environments.

  18. CompNet: a GUI based tool for comparison of multiple biological interaction networks.

    PubMed

    Kuntal, Bhusan K; Dutta, Anirban; Mande, Sharmila S

    2016-04-26

    Network visualization and analysis tools aid in better understanding of complex biological systems. Furthermore, to understand the differences in behaviour of system(s) under various environmental conditions (e.g. stress, infection), comparing multiple networks becomes necessary. Such comparisons between multiple networks may help in asserting causation and in identifying key components of the studied biological system(s). Although many available network comparison methods exist, which employ techniques like network alignment and querying to compute pair-wise similarity between selected networks, most of them have limited features with respect to interactive visual comparison of multiple networks. In this paper, we present CompNet - a graphical user interface based network comparison tool, which allows visual comparison of multiple networks based on various network metrics. CompNet allows interactive visualization of the union, intersection and/or complement regions of a selected set of networks. Different visualization features (e.g. pie-nodes, edge-pie matrix, etc.) aid in easy identification of the key nodes/interactions and their significance across the compared networks. The tool also allows one to perform network comparisons on the basis of neighbourhood architecture of constituent nodes and community compositions, a feature particularly useful while analyzing biological networks. To demonstrate the utility of CompNet, we have compared a (time-series) human gene-expression dataset, post-infection by two strains of Mycobacterium tuberculosis, overlaid on the human protein-protein interaction network. Using various functionalities of CompNet not only allowed us to comprehend changes in interaction patterns over the course of infection, but also helped in inferring the probable fates of the host cells upon infection by the two strains. CompNet is expected to be a valuable visual data mining tool and is freely available for academic use from http

  19. Biological ingredient analysis of traditional Chinese medicine preparation based on high-throughput sequencing: the story for Liuwei Dihuang Wan

    PubMed Central

    Cheng, Xinwei; Su, Xiaoquan; Chen, Xiaohua; Zhao, Huanxin; Bo, Cunpei; Xu, Jian; Bai, Hong; Ning, Kang

    2014-01-01

    Although Traditional Chinese Medicine (TCM) preparations have a long history of successful applications, the scientific and systematic quality assessment of TCM preparations mainly focuses on chemical constituents and is far from comprehensive. There are currently only a few primitive studies on the assessment of biological ingredients in TCM preparations. Here, we have proposed a method, M-TCM, for biological assessment of the quality of TCM preparations based on high-throughput sequencing and metagenomic analysis. We have tested this method on Liuwei Dihuang Wan (LDW), a TCM whose ingredients have been well-defined. Our results have shown, first, that this method could determine the biological ingredients of LDW preparations. Second, the quality and stability of LDW vary significantly among different manufacturers. Third, the overall quality of LDW samples is significantly affected by their biological contaminations. This novel strategy has the potential to achieve comprehensive ingredient profiling of TCM preparations. PMID:24888649

  20. Unveiling cryptic species diversity of flowering plants: successful biological species identification of Asian Mitella using nuclear ribosomal DNA sequences.

    PubMed

    Okuyama, Yudai; Kato, Makoto

    2009-05-16

    Although DNA sequence analysis is becoming a powerful tool for identifying species, it is not easy to assess whether the observed genetic disparity corresponds to reproductive isolation. Here, we compared the efficiency of biological species identification between nuclear ribosomal and chloroplast DNA sequences, focusing on an Asian endemic perennial lineage of Mitella (Asimitellaria; Saxifragaceae). We performed artificial cross experiments for 43 pairs of ten taxonomic species, and examined their F1 hybrid pollen fertility in vitro as a quantitative measure of postzygotic reproductive isolation. A nonlinear, multiple regression analysis indicated that the nuclear ribosomal DNA distances are sufficient to explain the observed pattern of F1 hybrid pollen fertility, and supplementation with chloroplast DNA distance data does not improve the explanatory power. Overall, with the exception of a recently diverged species complex with more than three biological species, nuclear ribosomal DNA sequences successfully circumscribed ten distinct biological species, of which two have not been described (and an additional one has not been regarded as a distinct taxonomic species) to date. We propose that nuclear ribosomal DNA sequences contribute to reliable identification of reproductively isolated and cryptic species of Mitella. More comparable studies for other plant groups are needed to generalize our findings to flowering plants.

  1. The organization of biological sequences into constrained and unconstrained parts determines fundamental properties of genotype–phenotype maps

    PubMed Central

    Greenbury, S. F.; Ahnert, S. E.

    2015-01-01

    Biological information is stored in DNA, RNA and protein sequences, which can be understood as genotypes that are translated into phenotypes. The properties of genotype–phenotype (GP) maps have been studied in great detail for RNA secondary structure. These include a highly biased distribution of genotypes per phenotype, negative correlation of genotypic robustness and evolvability, positive correlation of phenotypic robustness and evolvability, shape-space covering, and a roughly logarithmic scaling of phenotypic robustness with phenotypic frequency. More recently similar properties have been discovered in other GP maps, suggesting that they may be fundamental to biological GP maps, in general, rather than specific to the RNA secondary structure map. Here we propose that the above properties arise from the fundamental organization of biological information into ‘constrained’ and ‘unconstrained’ sequences, in the broadest possible sense. As ‘constrained’ we describe sequences that affect the phenotype more immediately, and are therefore more sensitive to mutations, such as, e.g. protein-coding DNA or the stems in RNA secondary structure. ‘Unconstrained’ sequences, on the other hand, can mutate more freely without affecting the phenotype, such as, e.g. intronic or intergenic DNA or the loops in RNA secondary structure. To test our hypothesis we consider a highly simplified GP map that has genotypes with ‘coding’ and ‘non-coding’ parts. We term this the Fibonacci GP map, as it is equivalent to the Fibonacci code in information theory. Despite its simplicity the Fibonacci GP map exhibits all the above properties of much more complex and biologically realistic GP maps. These properties are therefore likely to be fundamental to many biological GP maps. PMID:26609063
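    One way to picture a map with 'coding' and 'non-coding' parts is the toy enumeration below. The abstract does not spell out the construction, so the details are assumptions made here for illustration: a binary alphabet, the two-letter stop codon `11` (the terminator of Fibonacci codewords), a phenotype defined as everything up to and including the first stop codon, and genotypes without a stop codon grouped under `None`. The names `STOP`, `GENOTYPE_LENGTH` and `phenotype` are hypothetical.

```python
from collections import Counter
from itertools import product

STOP = "11"          # assumed stop codon, as in Fibonacci codewords
GENOTYPE_LENGTH = 5  # small enough to enumerate every genotype


def phenotype(genotype: str):
    """Return the coding part (up to and including the first stop codon).

    Everything after the stop codon is treated as non-coding and ignored.
    Genotypes with no stop codon map to None (an assumption of this sketch).
    """
    stop_at = genotype.find(STOP)
    return None if stop_at < 0 else genotype[: stop_at + len(STOP)]


# Count how many genotypes map to each phenotype.
counts = Counter(
    phenotype("".join(bits)) for bits in product("01", repeat=GENOTYPE_LENGTH)
)

for pheno, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(pheno, n)
```

    Even in this small case the distribution of genotypes per phenotype is strongly biased: a short coding part leaves more non-coding positions free to vary, so more genotypes share that phenotype.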

  2. The organization of biological sequences into constrained and unconstrained parts determines fundamental properties of genotype-phenotype maps.

    PubMed

    Greenbury, S F; Ahnert, S E

    2015-12-06

    Biological information is stored in DNA, RNA and protein sequences, which can be understood as genotypes that are translated into phenotypes. The properties of genotype-phenotype (GP) maps have been studied in great detail for RNA secondary structure. These include a highly biased distribution of genotypes per phenotype, negative correlation of genotypic robustness and evolvability, positive correlation of phenotypic robustness and evolvability, shape-space covering, and a roughly logarithmic scaling of phenotypic robustness with phenotypic frequency. More recently similar properties have been discovered in other GP maps, suggesting that they may be fundamental to biological GP maps, in general, rather than specific to the RNA secondary structure map. Here we propose that the above properties arise from the fundamental organization of biological information into 'constrained' and 'unconstrained' sequences, in the broadest possible sense. As 'constrained' we describe sequences that affect the phenotype more immediately, and are therefore more sensitive to mutations, such as, e.g. protein-coding DNA or the stems in RNA secondary structure. 'Unconstrained' sequences, on the other hand, can mutate more freely without affecting the phenotype, such as, e.g. intronic or intergenic DNA or the loops in RNA secondary structure. To test our hypothesis we consider a highly simplified GP map that has genotypes with 'coding' and 'non-coding' parts. We term this the Fibonacci GP map, as it is equivalent to the Fibonacci code in information theory. Despite its simplicity the Fibonacci GP map exhibits all the above properties of much more complex and biologically realistic GP maps. These properties are therefore likely to be fundamental to many biological GP maps.

  3. Gene tree discordance of wild and cultivated Asian rice deciphered by genome-wide sequence comparison.

    PubMed

    Yang, Ching-chia; Sakai, Hiroaki; Numa, Hisataka; Itoh, Takeshi

    2011-05-15

    Although a large number of genes are expected to correctly solve a phylogenetic relationship, inconsistent gene tree topologies have been observed. This conflicting evidence in gene tree topologies, known as gene tree discordance, becomes increasingly important as advanced sequencing technologies produce an enormous amount of sequence information for phylogenomic studies among closely related species. Here, we aim to characterize the gene tree discordance of the Asian cultivated rice Oryza sativa and its progenitor, O. rufipogon, which will be an ideal case study of gene tree discordance. Using genome and cDNA sequences of O. sativa and O. rufipogon, we have conducted the first in-depth analyses of gene tree discordance in Asian rice. Our comparison of full-length cDNA sequences of O. rufipogon with the genome sequences of the japonica and indica cultivars of O. sativa revealed that 60% of the gene trees showed a topology consistent with the expected one, whereas the remaining genes supported significantly different topologies. Moreover, the proportions of the topologies deviated significantly from expectation, suggesting at least one hybridization event between the two subgroups of O. sativa, japonica and indica. In fact, a genome-wide alignment between japonica and indica indicated that significant portions of the indica genome are derived from japonica. In addition, literature concerning the pedigree of the indica cultivar strongly supported the hybridization hypothesis. Our molecular evolutionary analyses deciphered complicated evolutionary processes in closely related species. They also demonstrated the importance of gene tree discordance in the era of high-speed DNA sequencing.

  4. A statistical physics perspective on alignment-independent protein sequence comparison.

    PubMed

    Chattopadhyay, Amit K; Nasiev, Diar; Flower, Darren R

    2015-08-01

    Within bioinformatics, the textual alignment of amino acid sequences has long dominated the determination of similarity between proteins, with all that implies for shared structure, function and evolutionary descent. Despite the relative success of modern-day sequence alignment algorithms, so-called alignment-free approaches offer a complementary means of determining and expressing similarity, with potential benefits in certain key applications, such as regression analysis of protein structure-function studies, where alignment-based similarity has performed poorly. Here, we offer a fresh, statistical physics-based perspective focusing on the question of alignment-free comparison, in the process adapting results from 'first passage probability distribution' to summarize statistics of ensemble averaged amino acid propensity values. In this article, we introduce and elaborate this approach. © The Author 2015. Published by Oxford University Press.

  5. The sequence of carnation etched ring virus DNA: comparison with cauliflower mosaic virus and retroviruses

    PubMed Central

    Hull, R.; Sadler, J.; Longstaff, M.

    1986-01-01

    Carnation etched ring virus (CERV) DNA comprises 7932 bp. CERV primer binding sites and overall genome organization are similar to those of the related cauliflower mosaic virus (CaMV). The six open reading frames of CERV showed amino acid homology (50-80%) with CaMV ORFs I-VI; no homologues of CaMV ORFs VII or VIII were found. CERV ORFs 1-5 interface each other with the sequence ATGA. The comparison of CERV ORF5 with CaMV ORFV highlighted regions which show homologies to retrovirus gag/pol protease, RNase H and DNA polymerase domains; the possibility that the DNA polymerase domain comprises two subdomains, operating off different templates, is discussed. Both CERV and CaMV ORFs I have sequence homology to tobacco mosaic virus P30 and plastocyanin. PMID:16453731

  6. A comparison of tools for the simulation of genomic next-generation sequencing data.

    PubMed

    Escalona, Merly; Rocha, Sara; Posada, David

    2016-08-01

    Computer simulation of genomic data has become increasingly popular for assessing and validating biological models or for gaining an understanding of specific data sets. Several computational tools for the simulation of next-generation sequencing (NGS) data have been developed in recent years, which could be used to compare existing and new NGS analytical pipelines. Here we review 23 of these tools, highlighting their distinct functionality, requirements and potential applications. We also provide a decision tree for the informed selection of an appropriate NGS simulation tool for the specific question at hand.

  7. A comparison of tools for the simulation of genomic next-generation sequencing data

    PubMed Central

    Escalona, Merly; Rocha, Sara; Posada, David

    2017-01-01

    Computer simulation of genomic data has become increasingly popular for assessing and validating biological models or to gain understanding about specific datasets. Multiple computational tools for the simulation of next-generation sequencing (NGS) data have been developed in recent years, which could be used to compare existing and new NGS analytical pipelines. Here we review 23 of these tools, highlighting their distinct functionality, requirements and potential applications. We also provide a decision tree for the informed selection of an appropriate NGS simulation tool for the specific question at hand. PMID:27320129

  8. Comparison of exon 5 sequences from 35 class I genes of the BALB/c mouse

    PubMed Central

    1989-01-01

    DNA sequences of the fifth exon, which encodes the transmembrane domain, were determined for the BALB/c mouse class I MHC genes and used to study the relationships between them. Based on nucleotide sequence similarity, the exon 5 sequences can be divided into seven groups. Although most members within each group are at least 80% similar to each other, comparison between groups reveals that the groups share little similarity. However, in spite of the extensive variation of the fifth exon sequences, analysis of their predicted amino acid translations reveals that only four class I gene fifth exons have frameshifts or stop codons that terminate their translation and prevent them from encoding a domain that is both hydrophobic and long enough to span a lipid bilayer. Exactly 27 of the remaining fifth exons could encode a domain that is similar to those of the transplantation antigens in that it consists of a proline-rich connecting peptide, a transmembrane segment, and a cytoplasmic portion with membrane-anchoring basic residues. The conservation of this motif in the majority of the fifth exon translations in spite of extensive variation suggests that selective pressure exists for these exons to maintain their ability to encode a functional transmembrane domain, raising the possibility that many of the nonclassical class I genes encode functionally important products. PMID:2584927

  9. Cancer systems biology in the genome sequencing era: part 1, dissecting and modeling of tumor clones and their networks.

    PubMed

    Wang, Edwin; Zou, Jinfeng; Zaman, Naif; Beitel, Lenore K; Trifiro, Mark; Paliouras, Miltiadis

    2013-08-01

    Recent tumor genome sequencing confirmed that one tumor often consists of multiple cell subpopulations (clones) which bear different, but related, genetic profiles such as mutation and copy number variation profiles. Thus far, one tumor has been viewed as a whole entity in cancer functional studies. With the advances of genome sequencing and computational analysis, we are able to quantify and computationally dissect clones from tumors, and then conduct clone-based analysis. Emerging technologies such as single-cell genome sequencing and RNA-Seq could profile tumor clones. Thus, we should reconsider how to conduct cancer systems biology studies in the genome sequencing era. We will outline new directions for conducting cancer systems biology by considering that genome sequencing technology can be used for dissecting, quantifying and genetically characterizing clones from tumors. Topics discussed in Part 1 of this review include computational quantification of tumor subpopulations; clone-based network modeling; cancer hallmark-based networks and their high-order rewiring principles; and the principles of cell survival networks of fast-growing clones. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.

  10. Next-Generation Sequencing in the Understanding of Kaposi’s Sarcoma-Associated Herpesvirus (KSHV) Biology

    PubMed Central

    Strahan, Roxanne; Uppal, Timsy; Verma, Subhash C.

    2016-01-01

    Non-Sanger-based novel nucleic acid sequencing techniques, referred to as Next-Generation Sequencing (NGS), provide a rapid, reliable, high-throughput, and massively parallel sequencing methodology that has improved our understanding of human cancers and cancer-related viruses. NGS has become a quintessential research tool for more effective characterization of complex viral and host genomes through its ever-expanding repertoire, which consists of whole-genome sequencing, whole-transcriptome sequencing, and whole-epigenome sequencing. These new NGS platforms provide a comprehensive and systematic genome-wide analysis of genomic sequences and a full transcriptional profile at a single nucleotide resolution. When combined, these techniques help unlock the function of novel genes and the related pathways that contribute to the overall viral pathogenesis. Ongoing research in the field of virology endeavors to identify the role of various underlying mechanisms that control the regulation of the herpesvirus biphasic lifecycle in order to discover potential therapeutic targets and treatment strategies. In this review, we have compiled the most recent findings about the application of NGS in Kaposi’s sarcoma-associated herpesvirus (KSHV) biology, including identification of novel genomic features and whole-genome KSHV diversities, global gene regulatory network profiling for intricate transcriptome analyses, and surveying of epigenetic marks (DNA methylation, modified histones, and chromatin remodelers) during de novo, latent, and productive KSHV infections. PMID:27043613

  11. BEAUTY: an enhanced BLAST-based search tool that integrates multiple biological information resources into sequence similarity search results.

    PubMed

    Worley, K C; Wiese, B A; Smith, R F

    1995-09-01

    BEAUTY (BLAST enhanced alignment utility) is an enhanced version of the NCBI's BLAST data base search tool that facilitates identification of the functions of matched sequences. We have created new data bases of conserved regions and functional domains for protein sequences in NCBI's Entrez data base, and BEAUTY allows this information to be incorporated directly into BLAST search results. A Conserved Regions Data Base, containing the locations of conserved regions within Entrez protein sequences, was constructed by (1) clustering the entire data base into families, (2) aligning each family using our PIMA multiple sequence alignment program, and (3) scanning the multiple alignments to locate the conserved regions within each aligned sequence. A separate Annotated Domains Data Base was constructed by extracting the locations of all annotated domains and sites from sequences represented in the Entrez, PROSITE, BLOCKS, and PRINTS data bases. BEAUTY performs a BLAST search of those Entrez sequences with conserved regions and/or annotated domains. BEAUTY then uses the information from the Conserved Regions and Annotated Domains data bases to generate, for each matched sequence, a schematic display that allows one to directly compare the relative locations of (1) the conserved regions, (2) annotated domains and sites, and (3) the locally aligned regions matched in the BLAST search. In addition, BEAUTY search results include World-Wide Web hypertext links to a number of external data bases that provide a variety of additional types of information on the function of matched sequences. This convenient integration of protein families, conserved regions, annotated domains, alignment displays, and World-Wide Web resources greatly enhances the biological informativeness of sequence similarity searches. 
    BEAUTY searches can be performed remotely on our system using the "BCM Search Launcher" World-Wide Web pages (URL is <http://gc.bcm.tmc.edu:8088/search-launcher/launcher.html>).

  12. Implicit Sequence Learning in Dyslexia: A Within-Sequence Comparison of First- and Higher-Order Information

    ERIC Educational Resources Information Center

    Du, Wenchong; Kelly, Steve W.

    2013-01-01

    The present study examines implicit sequence learning in adult dyslexics with a focus on comparing sequence transitions with different statistical complexities. Learning of a 12-item deterministic sequence was assessed in 12 dyslexic and 12 non-dyslexic university students. Both groups showed equivalent standard reaction time increments when the…

  14. The InDeVal insertion/deletion evaluation tool: a program for finding target regions in DNA sequences and for aiding in sequence comparison.

    PubMed

    Stoneberg Holt, Sierra D; Holt, Jason A

    2004-10-29

    The program InDeVal was originally developed to help researchers find known regions of insertion/deletion activity (with the exception of isolated single-base indels) in newly determined Poaceae trnL-F sequences and compare them with 533 previously determined sequences. It is supplied with input files designed for this purpose. More broadly, the program is applicable for finding specific target regions (referred to as "variable regions") in DNA sequence. A variable region is any specific sequence fragment of interest, such as an indel region, a codon or codons, or sequence coding for a particular RNA secondary structure. InDeVal input is DNA sequence and a template file (sequence flanking each variable region). Additional files contain the variable regions and user-defined messages about the sequence found within them (e.g., taxa sharing each of the different indel patterns). Variable regions are found by determining the position of flanking sequence (referred to as "conserved regions") using the LPAM (Length-Preserving Alignment Method) algorithm. This algorithm was designed for InDeVal and is described here for the first time. InDeVal output is an interactive display of the analyzed sequence, broken into user-defined units. Once the user is satisfied with the organization of the display, the information can be exported to an annotated text file. InDeVal can find multiple variable regions simultaneously (28 indel regions in the Poaceae trnL-F files) and display user-selected messages specific to the sequence variants found. InDeVal output is designed to facilitate comparison between the analyzed sequence and previously evaluated sequence. The program's sensitivity to different levels of nucleotide and/or length variation in conserved regions can be adjusted. InDeVal is currently available for Windows in Additional file 1 or from http://www.sci.muni.cz/botany/elzdroje/indeval/.
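
    The general idea of locating a variable region by first locating its flanking conserved regions can be sketched in a few lines of Python. This is not the LPAM algorithm, which the abstract names but does not describe; exact substring matching is swapped in as a deliberately simple stand-in, and the function name, the example sequence and the message table are all hypothetical.

```python
def find_variable_region(sequence, left_flank, right_flank):
    """Return the subsequence between two flanking conserved regions.

    Uses exact matching only (a stand-in assumption, not LPAM).
    Returns None if either flank cannot be found in order.
    """
    start = sequence.find(left_flank)
    if start < 0:
        return None
    start += len(left_flank)
    end = sequence.find(right_flank, start)
    if end < 0:
        return None
    return sequence[start:end]


# Hypothetical lookup of user-defined messages keyed by the variant found.
MESSAGES = {
    "CCC": "variant A (hypothetical example message)",
    "": "region deleted (hypothetical example message)",
}

if __name__ == "__main__":
    region = find_variable_region("ACGTTTGACCCGGGATTACA", "TTGA", "GGGA")
    print(region, "->", MESSAGES.get(region, "unrecognized variant"))
```

    A tool that tolerates nucleotide or length variation in the conserved regions, as the abstract says InDeVal does, would need approximate rather than exact matching at the two find steps.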

  15. The InDeVal insertion/deletion evaluation tool: a program for finding target regions in DNA sequences and for aiding in sequence comparison

    PubMed Central

    Stoneberg Holt, Sierra D; Holt, Jason A

    2004-01-01

    Background: The program InDeVal was originally developed to help researchers find known regions of insertion/deletion activity (with the exception of isolated single-base indels) in newly determined Poaceae trnL-F sequences and compare them with 533 previously determined sequences. It is supplied with input files designed for this purpose. More broadly, the program is applicable for finding specific target regions (referred to as "variable regions") in DNA sequence. A variable region is any specific sequence fragment of interest, such as an indel region, a codon or codons, or sequence coding for a particular RNA secondary structure. Results: InDeVal input is DNA sequence and a template file (sequence flanking each variable region). Additional files contain the variable regions and user-defined messages about the sequence found within them (e.g., taxa sharing each of the different indel patterns). Variable regions are found by determining the position of flanking sequence (referred to as "conserved regions") using the LPAM (Length-Preserving Alignment Method) algorithm. This algorithm was designed for InDeVal and is described here for the first time. InDeVal output is an interactive display of the analyzed sequence, broken into user-defined units. Once the user is satisfied with the organization of the display, the information can be exported to an annotated text file. Conclusions: InDeVal can find multiple variable regions simultaneously (28 indel regions in the Poaceae trnL-F files) and display user-selected messages specific to the sequence variants found. InDeVal output is designed to facilitate comparison between the analyzed sequence and previously evaluated sequence. The program's sensitivity to different levels of nucleotide and/or length variation in conserved regions can be adjusted. InDeVal is currently available for Windows in Additional file 1 or from . PMID:15516260

  16. A comparison of sequencing platforms and bioinformatics pipelines for compositional analysis of the gut microbiome.

    PubMed

    Allali, Imane; Arnold, Jason W; Roach, Jeffrey; Cadenas, Maria Belen; Butz, Natasha; Hassan, Hosni M; Koci, Matthew; Ballou, Anne; Mendoza, Mary; Ali, Rizwana; Azcarate-Peril, M Andrea

    2017-09-13

    Advancements in Next Generation Sequencing (NGS) technologies regarding throughput, read length and accuracy had a major impact on microbiome research by significantly improving 16S rRNA amplicon sequencing. As rapid improvements in sequencing platforms and new data analysis pipelines are introduced, it is essential to evaluate their capabilities in specific applications. The aim of this study was to assess whether the same project-specific biological conclusions regarding microbiome composition could be reached using different sequencing platforms and bioinformatics pipelines. Chicken cecum microbiome was analyzed by 16S rRNA amplicon sequencing using Illumina MiSeq, Ion Torrent PGM, and Roche 454 GS FLX Titanium platforms, with standard and modified protocols for library preparation. We labeled the bioinformatics pipelines included in our analysis QIIME1 and QIIME2 (de novo OTU picking [not to be confused with QIIME version 2 commonly referred to as QIIME2]), QIIME3 and QIIME4 (open reference OTU picking), UPARSE1 and UPARSE2 (each pair differs only in the use of chimera depletion methods), and DADA2 (for Illumina data only). GS FLX+ yielded the longest reads and highest quality scores, while MiSeq generated the largest number of reads after quality filtering. Declines in quality scores were observed starting at bases 150-199 for GS FLX+ and bases 90-99 for MiSeq. Scores were stable for PGM-generated data. Overall microbiome compositional profiles were comparable between platforms; however, average relative abundance of specific taxa varied depending on sequencing platform, library preparation method, and bioinformatics analysis. Specifically, QIIME with de novo OTU picking yielded the highest number of unique species and alpha diversity was reduced with UPARSE and DADA2 compared to QIIME. The three platforms compared in this study were capable of discriminating samples by treatment, despite differences in diversity and abundance, leading to similar biological

  17. Complete Genome Sequence of the Mosquitocidal Bacterium Bacillus sphaericus C3-41 and Comparison with Those of Closely Related Bacillus Species

    PubMed Central

    Hu, Xiaomin; Fan, Wei; Han, Bei; Liu, Haizhou; Zheng, Dasheng; Li, Qibin; Dong, Wei; Yan, Jianping; Gao, Meiying; Berry, Colin; Yuan, Zhiming

    2008-01-01

    Bacillus sphaericus strain C3-41 is an aerobic, mesophilic, spore-forming bacterium that has been used with great success in mosquito control programs worldwide. Genome sequencing revealed that the complete genome of this entomopathogenic bacterium is composed of a chromosomal replicon of 4,639,821 bp and a plasmid replicon of 177,642 bp, containing 4,786 and 186 potential protein-coding sequences, respectively. Comparison of the genome with other published sequences indicated that the B. sphaericus C3-41 chromosome is most similar to that of Bacillus sp. strain NRRL B-14905, a marine species that, like B. sphaericus, is unable to metabolize polysaccharides. The lack of key enzymes and sugar transport systems in the two bacteria appears to be the main reason for this inability, and the abundance of proteolytic enzymes and transport systems may endow these bacteria with exclusive metabolic pathways for a wide variety of organic compounds and amino acids. The genes shared between B. sphaericus C3-41 and Bacillus sp. strain NRRL B-14905, including mobile genetic elements, membrane-associated proteins, and transport systems, demonstrated that these two species are a biologically and phylogenetically divergent group. Knowledge of the genome sequence of B. sphaericus C3-41 thus increases our understanding of the bacilli and may also offer prospects for future genetic improvement of this important biological control agent. PMID:18296527

  18. Structural biology of disease-associated repetitive DNA sequences and protein-DNA complexes involved in DNA damage and repair

    SciTech Connect

    Gupta, G.; Santhana Mariappan, S.V.; Chen, X.; Catasti, P.; Silks, L.A. III; Moyzis, R.K.; Bradbury, E.M.; Garcia, A.E.

    1997-07-01

    This project is aimed at formulating the sequence-structure-function correlations of various microsatellites in the human (and other eukaryotic) genomes. Here the authors have been able to develop and apply structural biology tools to understand the following: the molecular mechanism of length polymorphism microsatellites; the molecular mechanism by which the microsatellites in the noncoding regions alter the regulation of the associated gene; and finally, the molecular mechanism by which the expansion of these microsatellites impairs gene expression and causes the disease. Their multidisciplinary structural biology approach is quantitative and can be applied to all coding and noncoding DNA sequences associated with any gene. Both NIH and DOE are interested in developing quantitative tools for understanding the function of various human genes for prevention against diseases caused by genetic and environmental effects.

  19. Biologic diversity of polyomavirus BK genomic sequences: Implications for molecular diagnostic laboratories.

    PubMed

    Luo, C; Bueno, M; Kant, J; Randhawa, P

    2008-10-01

    Data on polyomavirus genomic diversity has greatly expanded in the past few years. The implications of viral DNA sequence variation on the performance of molecular diagnostic assays have not been systematically examined. 716 BK, 1626 JC, and 73 SV40 virus sequences available in GenBank were aligned using Clustal-X. Five different published BKV PCR assays currently in use at major medical centers were evaluated for primer and probe mismatches with available GenBank sequences. Coverage of naturally occurring BKV strains varied amongst different assay methods. Targeted viral sequences showed major mismatch with primer or probe sequence in up to 30.7% of known BKV strains. BKV subtypes IVa, IVb, and IVc were more prone to this problem, reflecting common use of Type I Dun sequence for assay design. Despite the known polymorphism of this gene, 484 VP-1 sequences with conserved areas potentially suitable for PCR assay design are available. Assay targets in the Large T-antigen and agnogene are less subject to genetic variation, but sequence information corresponding to the latter two genes is available only for 164 and 174 published strains, respectively. Cross reactivity of appropriately selected BKV primers with JCV and SV40 sequences available in current databases was not a significant problem.
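
    The kind of primer-versus-strain check described above can be illustrated with a short Python sketch. It assumes the target sites have already been extracted from an alignment so that each has the same length as the primer, and it uses an arbitrary threshold for what counts as a problematic mismatch; the abstract does not define "major mismatch", and the sequences shown are invented for illustration.

```python
def count_mismatches(primer, site):
    """Count positions at which a primer differs from an equal-length target site."""
    if len(primer) != len(site):
        raise ValueError("primer and target site must have the same length")
    return sum(a != b for a, b in zip(primer, site))


def fraction_mismatched(primer, sites, max_allowed=1):
    """Fraction of target sites with more than max_allowed mismatches.

    max_allowed is an assumed threshold, not one taken from the study.
    """
    flagged = sum(count_mismatches(primer, s) > max_allowed for s in sites)
    return flagged / len(sites)


if __name__ == "__main__":
    # Invented primer and pre-aligned target sites from four hypothetical strains.
    primer = "ACGTAC"
    sites = ["ACGTAC", "ACGTAA", "TCGAAC", "ACGTAC"]
    print(f"{fraction_mismatched(primer, sites):.1%} of strains exceed the threshold")
```

    A realistic evaluation would also weight mismatches by position, since mismatches near the 3' end of a primer generally matter more for amplification than those near the 5' end.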

  20. Production of biologically active recombinant goose FSH in a single chain form with a CTP linker sequence.

    PubMed

    Li, Hui; Zhu, Huanxi; Qin, Qinming; Lei, Mingming; Shi, Zhendan

    2017-02-01

    FSH is a glycoprotein hormone secreted by the pituitary gland that is essential for gonadal development and reproductive function. Research on avian reproduction, and on avian reproductive hormones in particular, is hindered by the lack of biologically active FSH. To overcome this shortcoming, we prepared recombinant goose FSH as a single chain molecule and tested its biological activities in the present study. Coding sequences for mature peptides of goose FSH α and β subunits were amplified from goose pituitary cDNA. A chimeric gene containing α and β subunit sequences linked by the hCG carboxyl terminal peptide coding sequence was constructed. The recombinant gene was inserted into the pcDNA3.1-Fc eukaryotic expression vector to form pcDNA-Fc-gFSHβ-CTP-α and then transfected into 293-F cells. A recombinant, single chain goose FSH was expressed and verified by SDS-PAGE and western blot analysis, and was purified using Protein A agarose affinity and gel filtration chromatography. Biological activity analysis showed that the recombinant, chimeric goose FSH stimulates estradiol secretion and cell proliferation in cultured chicken granulosa cells. These results indicated that bioactive, recombinant goose FSH has been successfully prepared in vitro. The recombinant goose FSH has the potential to be used as a research tool for studying avian reproductive activities, and as a standard for developing avian FSH bioassays.

  1. Protein identities from 'Graphocephala atropunctata' expressed sequence tags: Expanding leafhopper vector biology

    USDA-ARS's Scientific Manuscript database

    Heat shock proteins and 44 protein sequences from the blue-green sharpshooter, BGSS, were produced and identified. The sequences were submitted and published under accession numbers: DQ445499-DQ445542, in the National Center for Biotechnology Information, NCBI, Public Database. The blue-green sharps...

  2. Biological characterization and complete genomic sequence of Carrot thin leaf virus

    USDA-ARS's Scientific Manuscript database

    The host range of a cilantro isolate of Carrot thin leaf virus (CTLV-Cs) was determined to include 15 plant species. The virus was also transmitted to 9 of 11 tested apiaceous species by aphids. Complete genomic sequences of CTLV-Cs and a carrot isolate of CTLV were determined. Their genomic sequenc...

  3. Substrate-Driven Mapping of the Degradome by Comparison of Sequence Logos

    PubMed Central

    Fuchs, Julian E.; von Grafenstein, Susanne; Huber, Roland G.; Kramer, Christian; Liedl, Klaus R.

    2013-01-01

    Sequence logos are frequently used to illustrate substrate preferences and specificity of proteases. Here, we employed the compiled substrates of the MEROPS database to introduce a novel metric for comparison of protease substrate preferences. The constructed similarity matrix of 62 proteases can be used to intuitively visualize similarities in protease substrate readout via principal component analysis and construction of protease specificity trees. Since our new metric is solely based on substrate data, we can engraft the protease tree including proteolytic enzymes of different evolutionary origin. Thereby, our analyses confirm pronounced overlaps in substrate recognition not only between proteases closely related on sequence basis but also between proteolytic enzymes of different evolutionary origin and catalytic type. To illustrate the applicability of our approach we analyze the distribution of targets of small molecules from the ChEMBL database in our substrate-based protease specificity trees. We observe a striking clustering of annotated targets in tree branches even though these grouped targets do not necessarily share similarity on protein sequence level. This highlights the value and applicability of knowledge acquired from peptide substrates in drug design of small molecules, e.g., for the prediction of off-target effects or drug repurposing. Consequently, our similarity metric makes it possible to map the degradome and its associated drug target network via comparison of known substrate peptides. The substrate-driven view of protein-protein interfaces is not limited to the field of proteases but can be applied to any target class where a sufficient amount of known substrate data is available. PMID:24244149

  4. Nutrient removal, microbial community and sludge settlement in anaerobic/aerobic sequencing batch reactors without enhanced biological phosphorus removal.

    PubMed

    Wu, Guangxue; Rodgers, Michael

    2010-01-01

    Nutrient removal, microbial community and sludge settlement were examined in two 3-litre laboratory-scale anaerobic/aerobic sequencing batch reactors (SBRs). One SBR was operated at 10 degrees C and the other SBR at 20 degrees C. Different from conventional enhanced biological phosphorus removal, most of the soluble sodium acetate was removed in the aerobic phase and no organic carbon uptake or biological phosphorus release occurred in the anaerobic phase. In this type of anaerobic/aerobic SBR, the phosphorus removal and sludge settlement seemed to be unstable, and the dominant microorganism was Zoogloea sp. Although no excess biological phosphorus removal occurred, extracellular phosphorus precipitation contributed a significant proportion to total phosphorus removed. Sludge volume index decreased with increasing phosphorus contents in the biomass under all conditions. The functions of extracellular polymeric substances in sludge settlement and phosphorus removal depended on the environmental conditions applied.

  5. Comparison of Mycoplasma pneumoniae Genome Sequences from Strains Isolated from Symptomatic and Asymptomatic Patients

    PubMed Central

    Spuesens, Emiel B. M.; Brouwer, Rutger W. W.; Mol, Kristin H. J. M.; Hoogenboezem, Theo; Kockx, Christel E. M.; Jansen, Ruud; Van IJcken, Wilfred F. J.; Van Rossum, Annemarie M. C.; Vink, Cornelis

    2016-01-01

    Mycoplasma pneumoniae is a common cause of respiratory tract infections (RTIs) in children. We recently demonstrated that this bacterium can be carried asymptomatically in the respiratory tract of children. To identify potential genetic differences between M. pneumoniae strains that are carried asymptomatically and those that cause symptomatic infections, we performed whole-genome sequence analysis of 20 M. pneumoniae strains. The analyzed strains included 3 reference strains, 3 strains isolated from asymptomatic children, 13 strains isolated from clinically well-defined patients suffering from an upper (n = 4) or lower (n = 9) RTI, and one strain isolated from a follow-up patient who recently recovered from an RTI. The obtained sequences were each compared to the sequences of the reference strains. To find differences between strains isolated from asymptomatic and symptomatic individuals, a variant comparison was performed between the different groups of strains. Irrespective of the group (asymptomatic vs. symptomatic) from which the strains originated, subtype 1 and subtype 2 strains formed separate clusters. We could not identify a specific genotype associated with M. pneumoniae virulence. However, we found marked genetic differences between clinical isolates and the reference strains, which indicated that the latter strains may not be regarded as appropriate representatives of circulating M. pneumoniae strains. PMID:27833597

  6. Thermal adaptation analyzed by comparison of protein sequences from mesophilic and extremely thermophilic Methanococcus species

    NASA Technical Reports Server (NTRS)

    Haney, P. J.; Badger, J. H.; Buldak, G. L.; Reich, C. I.; Woese, C. R.; Olsen, G. J.

    1999-01-01

    The genome sequence of the extremely thermophilic archaeon Methanococcus jannaschii provides a wealth of data on proteins from a thermophile. In this paper, sequences of 115 proteins from M. jannaschii are compared with their homologs from mesophilic Methanococcus species. Although the growth temperatures of the mesophiles are about 50 degrees C below that of M. jannaschii, their genomic G+C contents are nearly identical. The properties most correlated with the proteins of the thermophile include higher residue volume, higher residue hydrophobicity, more charged amino acids (especially Glu, Arg, and Lys), and fewer uncharged polar residues (Ser, Thr, Asn, and Gln). These are recurring themes, with all trends applying to 83-92% of the proteins for which complete sequences were available. Nearly all of the amino acid replacements most significantly correlated with the temperature change are the same relatively conservative changes observed in all proteins, but in the case of the mesophile/thermophile comparison there is a directional bias. We identify 26 specific pairs of amino acids with a statistically significant (P < 0.01) preferred direction of replacement.

  8. Thermal adaptation analyzed by comparison of protein sequences from mesophilic and extremely thermophilic Methanococcus species

    PubMed Central

    Haney, Paul J.; Badger, Jonathan H.; Buldak, Gerald L.; Reich, Claudia I.; Woese, Carl R.; Olsen, Gary J.

    1999-01-01

    The genome sequence of the extremely thermophilic archaeon Methanococcus jannaschii provides a wealth of data on proteins from a thermophile. In this paper, sequences of 115 proteins from M. jannaschii are compared with their homologs from mesophilic Methanococcus species. Although the growth temperatures of the mesophiles are about 50°C below that of M. jannaschii, their genomic G+C contents are nearly identical. The properties most correlated with the proteins of the thermophile include higher residue volume, higher residue hydrophobicity, more charged amino acids (especially Glu, Arg, and Lys), and fewer uncharged polar residues (Ser, Thr, Asn, and Gln). These are recurring themes, with all trends applying to 83–92% of the proteins for which complete sequences were available. Nearly all of the amino acid replacements most significantly correlated with the temperature change are the same relatively conservative changes observed in all proteins, but in the case of the mesophile/thermophile comparison there is a directional bias. We identify 26 specific pairs of amino acids with a statistically significant (P < 0.01) preferred direction of replacement. PMID:10097079

  9. Thermal adaptation analyzed by comparison of protein sequences from mesophilic and extremely thermophilic Methanococcus species.

    PubMed

    Haney, P J; Badger, J H; Buldak, G L; Reich, C I; Woese, C R; Olsen, G J

    1999-03-30

    The genome sequence of the extremely thermophilic archaeon Methanococcus jannaschii provides a wealth of data on proteins from a thermophile. In this paper, sequences of 115 proteins from M. jannaschii are compared with their homologs from mesophilic Methanococcus species. Although the growth temperatures of the mesophiles are about 50 degrees C below that of M. jannaschii, their genomic G+C contents are nearly identical. The properties most correlated with the proteins of the thermophile include higher residue volume, higher residue hydrophobicity, more charged amino acids (especially Glu, Arg, and Lys), and fewer uncharged polar residues (Ser, Thr, Asn, and Gln). These are recurring themes, with all trends applying to 83-92% of the proteins for which complete sequences were available. Nearly all of the amino acid replacements most significantly correlated with the temperature change are the same relatively conservative changes observed in all proteins, but in the case of the mesophile/thermophile comparison there is a directional bias. We identify 26 specific pairs of amino acids with a statistically significant (P < 0.01) preferred direction of replacement.
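    The directional bias described in this abstract can be illustrated with a sign test: for one pair of amino acids, count the replacements observed in each direction and ask how surprising the imbalance would be if both directions were equally likely. The sketch below is only an illustration. The counts are hypothetical, the function name is invented here, and the abstract does not say which statistical test the authors applied.

```python
from math import comb


def two_sided_sign_test(forward: int, reverse: int) -> float:
    """Exact two-sided binomial p-value under the null hypothesis p = 0.5."""
    n = forward + reverse
    if n == 0:
        return 1.0
    k = max(forward, reverse)
    # Probability of seeing k or more replacements in the favoured direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # The null distribution is symmetric, so double the tail and cap at 1.
    return min(1.0, 2 * tail)


# Hypothetical counts for a single amino acid pair (not data from the paper).
p_value = two_sided_sign_test(forward=30, reverse=10)
print(f"p = {p_value:.4f}")
```

    A pair would count as having a preferred direction at the abstract's threshold when the resulting p-value falls below 0.01.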

  10. Performance evaluation of Sanger sequencing for the diagnosis of primary hyperoxaluria and comparison with targeted next generation sequencing

    PubMed Central

    Williams, Emma L; Bagg, Eleanor A L; Mueller, Michael; Vandrovcova, Jana; Aitman, Timothy J; Rumsby, Gill

    2015-01-01

    Definitive diagnosis of primary hyperoxaluria (PH) currently utilizes sequential Sanger sequencing of the AGXT, GRHPR, and HOGA1 genes but efficacy is unproven. This analysis is time-consuming, relatively expensive, and delays in diagnosis and inappropriate treatment can occur if not pursued early in the diagnostic work-up. We reviewed testing outcomes of Sanger sequencing in 200 consecutive patient samples referred for analysis. In addition, the Illumina Truseq custom amplicon system was evaluated for parallel next-generation sequencing (NGS) of AGXT, GRHPR, and HOGA1 in 90 known PH patients. AGXT sequencing was requested in all patients, permitting a diagnosis of PH1 in 50%. All remaining patients underwent targeted exon sequencing of GRHPR and HOGA1 with 8% diagnosed with PH2 and 8% with PH3. Complete sequencing of both GRHPR and HOGA1 was not requested in 25% of patients referred leaving their diagnosis in doubt. NGS analysis showed 98% agreement with Sanger sequencing and both approaches had 100% diagnostic specificity. Diagnostic sensitivity of Sanger sequencing was 98% and for NGS it was 97%. NGS has comparable diagnostic performance to Sanger sequencing for the diagnosis of PH and, if implemented, would screen for all forms of PH simultaneously ensuring prompt diagnosis at decreased cost. PMID:25629080
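    For readers unfamiliar with the two performance measures quoted in this abstract, the following minimal sketch shows how diagnostic sensitivity and specificity are computed from a table of test outcomes. The counts are hypothetical and were chosen only to make the arithmetic easy to follow; the abstract does not report the underlying counts.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of affected samples that the test correctly reports as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of unaffected samples that the test correctly reports as negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical outcome counts (not taken from the study).
sens = sensitivity(true_pos=97, false_neg=3)
spec = specificity(true_neg=50, false_pos=0)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```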

  11. A cross-cultural comparison of biology lessons between China and Germany: a video study

    NASA Astrophysics Data System (ADS)

    Liu, Ning; Neuhaus, Birgit Jana

    2017-08-01

    Given the globalization of science education and the different cultures between China and Germany, we tried to compare and explain the differences in teacher questions and real life instances in biology lessons between the two countries from a culture-related perspective. 22 biology teachers from China and 21 biology teachers from Germany participated in this study. Each teacher was videotaped for one lesson on the unit blood and circulatory system. Before the teaching unit, students' prior knowledge was tested with a pretest. After the teaching unit, students' content knowledge was tested with a posttest. The aim of the knowledge tests here was for the better selection of the four samples for qualitative comparison in the two countries. The quantitative analysis showed that more lower-order teacher questions and more real life instances that were introduced after learning relevant concepts were in Chinese lessons than in German lessons. There were no significant differences in the frequency of higher-order questions or real life instances that were introduced before learning concepts. Qualitative analysis showed that both German teachers guided students to analyze the reasoning process of the Landsteiner experiment, but neither of the Chinese teachers did. The findings reflected the subtle influence of culture on classroom teaching. Relatively, Chinese biology teachers focused more on learning content and the application of the content in real life; German biology teachers placed more emphasis on invoking students' reasoning and divergent thinking.

  12. Analysis of expressed sequence tags from Uromyces appendiculatus hyphae and haustoria and their comparison to sequences from other rust fungi

    USDA-ARS's Scientific Manuscript database

    Two separate cDNA libraries were prepared for RNA extracted from bean rust (Uromyces appendiculatus) hyphae and haustoria isolated from infected bean leaves (Phaseolus vulgaris cv Pint 111) between 2 and 8 dpi. Approximately 13,000 clones were sequenced from both ends and the sequences assem...

  13. What Next? The Next Transit from Biology to Diagnostics: Next Generation Sequencing for Immunogenetics

    PubMed Central

    Gabriel, Christian; Stabentheiner, Stephanie; Danzer, Martin; Pröll, Johannes

    2011-01-01

    The human genome project triggered the introduction of next generation sequencing (NGS) systems. Although originally developed for total genome sequencing, metagenomics and plant genetics, the ultra-deep sequencing feature of NGS was utilized for diagnostic purposes in HIV resistance and tropism as well as in detecting new mutations and tumor clones in oncology. Recent publications exploited the feature of clonal sequencing for immunogenetics to resolve the growing number of ambiguities. This concept is quite reliable if all exons of interest are tested and the amplification region includes flanking introns. Challenging questions on quality control, cost effectiveness, workflow, and management of enormous loads of data remain if NGS is considered as a routine method in the immunogenetics laboratory. If solved, NGS has big potential to have a major impact on immunogenetics by way of providing ambiguity-free HLA-typing results faster, but will also have a great influence on how immunogenetics testing and workflows are organized. PMID:22670120

  14. Protein science by DNA sequencing: how advances in molecular biology are accelerating biochemistry.

    PubMed

    Higgins, Sean Andrew; Savage, David F

    2017-10-09

    A fundamental goal of protein biochemistry is to determine the sequence-function relationship, but the vastness of sequence space makes comprehensive evaluation of this landscape difficult. However, advances in DNA synthesis and sequencing now allow researchers to assess the functional impact of every single mutation in many proteins, but challenges remain in library construction and the development of general assays applicable to a diverse range of protein functions. This perspective briefly outlines the technical innovations in DNA manipulation which allow massively parallel protein biochemistry, then summarizes the methods currently available for library construction and the functional assays of protein variants. Areas in need of future innovation are highlighted with a particular focus on assay development and the use of computational analysis with machine learning to effectively traverse the sequence-function landscape. Finally, applications in the fundamentals of protein biochemistry, disease prediction, and protein engineering are presented.

  15. Complete genome sequences of two biologically distinct isolates of Asparagus virus 1.

    PubMed

    Blockus, S; Lesker, T; Maiss, E

    2015-02-01

    The complete genome sequences of two asparagus virus 1 (AV-1) isolates differing in their ability to cause systemic infection in Nicotiana benthamiana were determined. Their genomes had 9,741 nucleotides excluding the 3'-terminal poly(A) tail, encoded a polyprotein of 3,112 amino acids, and shared 99.6 % nucleotide sequence identity. They differed at 37 nucleotide and 15 amino acid sequence positions (99.5 % identity) scattered over the polyprotein. The closest relatives of AV-1 in amino acid sequence identity were plum pox virus (54 %) and turnip mosaic virus (53 %), corroborating the classification of AV-1 as a member of a distinct species in the genus Potyvirus.
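    The identity percentages quoted in this abstract can be checked from the reported difference counts. The short sketch below assumes that identity is the fraction of matching positions over the full sequence length with no alignment gaps; that definition is an assumption here, since the abstract does not spell it out.

```python
def percent_identity(differences: int, length: int) -> float:
    """Percentage of positions that match, given the number that differ."""
    return 100.0 * (1 - differences / length)


# Figures taken from the abstract: 37 differences in 9,741 nucleotides,
# and 15 differences in the 3,112 amino acid polyprotein.
nt_identity = percent_identity(37, 9741)
aa_identity = percent_identity(15, 3112)
print(f"nucleotide: {nt_identity:.1f} %, amino acid: {aa_identity:.1f} %")
# prints "nucleotide: 99.6 %, amino acid: 99.5 %"
```

    Both values agree with the 99.6 % and 99.5 % stated in the abstract.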

  16. The future role of next-generation DNA sequencing and metagenetics in aquatic biology monitoring programs

    EPA Science Inventory

    The development of current biological monitoring and bioassessment programs was a drastic improvement over previous programs created for monitoring a limited number of specific chemical pollutants. Although these assessment programs are better designed to address the transient an...

  18. A System to Automatically Classify and Name Any Individual Genome-Sequenced Organism Independently of Current Biological Classification and Nomenclature

    PubMed Central

    Song, Yuhyun; Leman, Scotland; Monteil, Caroline L.; Heath, Lenwood S.; Vinatzer, Boris A.

    2014-01-01

    A broadly accepted and stable biological classification system is a prerequisite for biological sciences. It provides the means to describe and communicate about life without ambiguity. Current biological classification and nomenclature use the species as the basic unit and require lengthy and laborious species descriptions before newly discovered organisms can be assigned to a species and be named. The current system is thus inadequate to classify and name the immense genetic diversity within species that is now being revealed by genome sequencing on a daily basis. To address this lack of a general intra-species classification and naming system adequate for today’s speed of discovery of new diversity, we propose a classification and naming system that is exclusively based on genome similarity and that is suitable for automatic assignment of codes to any genome-sequenced organism without requiring any phenotypic or phylogenetic analysis. We provide examples demonstrating that genome similarity-based codes largely align with current taxonomic groups at many different levels in bacteria, animals, humans, plants, and viruses. Importantly, the proposed approach is only slightly affected by the order of code assignment and can thus provide codes that reflect similarity between organisms and that do not need to be revised upon discovery of new diversity. We envision genome similarity-based codes to complement current biological nomenclature and to provide a universal means to communicate unambiguously about any genome-sequenced organism in fields as diverse as biodiversity research, infectious disease control, human and microbial forensics, animal breed and plant cultivar certification, and human ancestry research. PMID:24586551

  19. Spontaneous vs. Coherent Raman Scattering: A Comparison Under Biologically Relevant Conditions

    NASA Astrophysics Data System (ADS)

    Nichols, Sarah R.; Bachler, Brandon R.; Cui, Meng; Ogilvie, Jennifer P.

    2009-05-01

    Coherent anti-Stokes Raman scattering (CARS) microscopy has become an active field of research due to the intrinsic molecular contrast it provides. Coherent signals such as CARS have been shown to be orders of magnitude larger than those obtained with spontaneous Raman scattering under certain conditions. However, under conditions appropriate for biological imaging, there has been a lack of systematic comparison between spontaneous and coherent Raman scattering signals. We perform such a comparison imaging study on polystyrene beads and find comparable signal levels for coherent Stokes Raman scattering (CSRS) and spontaneous Stokes scattering, contrary to many reports in the CARS microscopy literature. We present calculations to support the measurements, and discuss the implications for biological imaging. The advantages provided by coherent methods are mitigated in biological samples by the low incident power (~1 mW), short interaction lengths, and low concentrations. The nature of the sample and the necessary imaging conditions must be considered when choosing between coherent and spontaneous Raman methods.

  20. Identification of Campylobacter spp. and discrimination from Helicobacter and Arcobacter spp. by direct sequencing of PCR-amplified cpn60 sequences and comparison to cpnDB, a chaperonin reference sequence database.

    PubMed

    Hill, Janet E; Paccagnella, Ana; Law, Kee; Melito, Pasquale L; Woodward, David L; Price, Lawrence; Leung, Amy H; Ng, Lai-King; Hemmingsen, Sean M; Goh, Swee Han

    2006-04-01

    A robust method for the identification of Campylobacter spp. based on direct sequencing of PCR-amplified partial cpn60 sequences and comparison of these to a reference database of cpn60 sequences is reported. A total of 53 Campylobacter isolates, representing 15 species, were identified and distinguished from phenotypically similar Helicobacter and Arcobacter strains. Pairwise cpn60 sequence identities between Campylobacter spp. ranged from 71 to 92 %, with most between 71 and 79 %, making discrimination of these species obvious. The method described overcomes limitations of existing PCR-based methods, which require time-consuming and complex post-amplification steps such as the cloning of amplification products. The results of this study demonstrate the potential for use of the reference chaperonin sequence database, cpnDB, as a tool for identification of bacterial isolates based on cpn60 sequences amplified with universal primers.

  1. Effects of linker sequence modifications on the structure, stability, and biological activity of a cyclic α-conotoxin.

    PubMed

    Carstens, Bodil B; Swedberg, Joakim; Berecki, Géza; Adams, David J; Craik, David J; Clark, Richard J

    2016-11-01

    The cyclic conotoxin analogue cVc1.1 is a promising lead molecule for the development of new treatments for neuropathic and chronic pain. The design of this peptide includes a linker sequence that joins the N and C termini together, improving peptide stability while maintaining the structure and activity of the original linear Vc1.1. The effect of linker length on the structure, activity and stability of cyclised conotoxins has been studied previously but the effect of altering the composition of the linker sequence has not been investigated. In this study, we designed three analogues of cVc1.1 with linker sequences that varied in charge, hydrophobicity and hydrogen bonding capacity and examined the effect on structure, stability, membrane permeability and biological activity. The three designed peptides were successfully synthesized using solid phase peptide synthesis approaches and had similar structures and stability compared with cVc1.1. Despite modifications in charge, hydrophobicity and hydrogen bonding potential, which are all factors that can affect membrane permeability, no changes in the ability of the peptides to pass through membranes in either PAMPA or Caco-2 cell assay were observed. Surprisingly, modification of the linker sequence was deleterious to biological activity. These results suggest the linker sequence might be a useful part of the molecule for optimization of bioactivity and not just the physicochemical properties of cVc1.1. © 2016 Wiley Periodicals, Inc. Biopolymers (Pept Sci) 106: 864-875, 2016.

  2. Sequencing and characterizing the genome of Estrella lausannensis as an undergraduate project: training students and biological insights

    PubMed Central

    Bertelli, Claire; Aeby, Sébastien; Chassot, Bérénice; Clulow, James; Hilfiker, Olivier; Rappo, Samuel; Ritzmann, Sébastien; Schumacher, Paolo; Terrettaz, Céline; Benaglio, Paola; Falquet, Laurent; Farinelli, Laurent; Gharib, Walid H.; Goesmann, Alexander; Harshman, Keith; Linke, Burkhard; Miyazaki, Ryo; Rivolta, Carlo; Robinson-Rechavi, Marc; van der Meer, Jan Roelof; Greub, Gilbert

    2015-01-01

    With the widespread availability of high-throughput sequencing technologies, sequencing projects have become pervasive in the molecular life sciences. The huge bulk of data generated daily must be analyzed further by biologists with skills in bioinformatics and by “embedded bioinformaticians,” i.e., bioinformaticians integrated in wet lab research groups. Thus, students interested in molecular life sciences must be trained in the main steps of genomics: sequencing, assembly, annotation and analysis. To reach that goal, a practical course has been set up for master students at the University of Lausanne: the “Sequence a genome” class. At the beginning of the academic year, a few bacterial species whose genome is unknown are provided to the students, who sequence and assemble the genome(s) and perform manual annotation. Here, we report the progress of the first class from September 2010 to June 2011 and the results obtained by seven master students who specifically assembled and annotated the genome of Estrella lausannensis, an obligate intracellular bacterium related to Chlamydia. The draft genome of Estrella is composed of 29 scaffolds encompassing 2,819,825 bp that encode for 2233 putative proteins. Estrella also possesses a 9136 bp plasmid that encodes for 14 genes, among which we found an integrase and a toxin/antitoxin module. Like all other members of the Chlamydiales order, Estrella possesses a highly conserved type III secretion system, considered as a key virulence factor. The annotation of the Estrella genome also allowed the characterization of the metabolic abilities of this strictly intracellular bacterium. Altogether, the students provided the scientific community with the Estrella genome sequence and a preliminary understanding of the biology of this recently-discovered bacterial genus, while learning to use cutting-edge technologies for sequencing and to perform bioinformatics analyses. PMID:25745418

  3. Initial sequence of the chimpanzee genome and comparison with the human genome.

    PubMed

    2005-09-01

    Here we present a draft genome sequence of the common chimpanzee (Pan troglodytes). Through comparison with the human genome, we have generated a largely complete catalogue of the genetic differences that have accumulated since the human and chimpanzee species diverged from our common ancestor, constituting approximately thirty-five million single-nucleotide changes, five million insertion/deletion events, and various chromosomal rearrangements. We use this catalogue to explore the magnitude and regional variation of mutational forces shaping these two genomes, and the strength of positive and negative selection acting on their genes. In particular, we find that the patterns of evolution in human and chimpanzee protein-coding genes are highly correlated and dominated by the fixation of neutral and slightly deleterious alleles. We also use the chimpanzee genome as an outgroup to investigate human population genetics and identify signatures of selective sweeps in recent human evolution.

  4. Estimation of error in bankfull width comparisons from temporally sequenced raw and corrected aerial photographs

    NASA Astrophysics Data System (ADS)

    Mount, N. J.; Louis, J.; Teeuw, R. M.; Zukowskyj, P. M.; Stott, T.

    2003-11-01

    This study investigates the propagation of error through image-to-image comparison of 285 river bankfull width measurements of the Afon Trannon, mid-Wales. Bankfull width is quantified from both aerial photographs analysed as rectified images in ERDAS Imagine OrthoMax and raw images in Paintshop Pro. A method for the robust estimation of bankfull width measurement error through temporal sequences of scanned aerial photographs is presented and the improvement in accuracy achieved using rectified imagery is quantified. Results from this study are placed in the context of previously published rates of bankfull width change, from a wide range of river scales, and the bankfull change rates for robust medium-term analysis using approximately 1:10,000 historical aerial photography are identified.

  5. Molecular, biological, and morphometric comparisons between different geographical populations of Rhipicephalus sanguineus sensu lato (Acari: Ixodidae).

    PubMed

    Sanches, Gustavo S; Évora, Patrícia M; Mangold, Atílio J; Jittapalapong, Sattaporn; Rodriguez-Mallon, Alina; Guzmán, Pedro E E; Bechara, Gervásio H; Camargo-Mathias, Maria I

    2016-01-15

    In this study, different geographical populations of Rhipicephalus sanguineus sensu lato were compared by molecular, biological, and morphometric methods. Phylogenetic trees were constructed using 12S and 16S rDNA sequences and showed two distinct clades: one composed of ticks from Brazil (Jaboticabal, SP), Cuba (Havana), Thailand (Bangkok) and the so-called "tropical strain" ticks. The second clade was composed of ticks from Spain (Zaragoza), Argentina (Rafaela, Santa Fe) and the so-called "temperate strain" ticks. Morphometric analysis showed good separation between females of the two clades and within the temperate clade. Males also exhibited separation between the two clades, but with some overlap. Multiple biological parameters revealed differences between the two clades, especially the weight of the engorged female. These results confirm the existence of at least two species under the name "R. sanguineus".

  6. Sequence-related amplified polymorphism (SRAP) markers: A potential resource for studies in plant molecular biology

    PubMed Central

    Robarts, Daniel W. H.; Wolfe, Andrea D.

    2014-01-01

    In the past few decades, many investigations in the field of plant biology have employed selectively neutral, multilocus, dominant markers such as inter-simple sequence repeat (ISSR), random-amplified polymorphic DNA (RAPD), and amplified fragment length polymorphism (AFLP) to address hypotheses at lower taxonomic levels. More recently, sequence-related amplified polymorphism (SRAP) markers have been developed, which are used to amplify coding regions of DNA with primers targeting open reading frames. These markers have proven to be robust and highly variable, on par with AFLP, and are attained through a significantly less technically demanding process. SRAP markers have been used primarily for agronomic and horticultural purposes, developing quantitative trait loci in advanced hybrids and assessing genetic diversity of large germplasm collections. Here, we suggest that SRAP markers should be employed for research addressing hypotheses in plant systematics, biogeography, conservation, ecology, and beyond. We provide an overview of the SRAP literature to date, review descriptive statistics of SRAP markers in a subset of 171 publications, and present relevant case studies to demonstrate the applicability of SRAP markers to the diverse field of plant biology. Results of these selected works indicate that SRAP markers have the potential to enhance the current suite of molecular tools in a diversity of fields by providing an easy-to-use, highly variable marker with inherent biological significance. PMID:25202637

  7. Unique nucleotide sequence (UNS)-guided assembly of repetitive DNA parts for synthetic biology applications

    PubMed Central

    Torella, Joseph P.; Lienert, Florian; Boehm, Christian R.; Chen, Jan-Hung; Way, Jeffrey C.; Silver, Pamela A.

    2016-01-01

    Recombination-based DNA construction methods, such as Gibson assembly, have made it possible to easily and simultaneously assemble multiple DNA parts and hold promise for the development and optimization of metabolic pathways and functional genetic circuits. Over time, however, these pathways and circuits have become more complex, and the increasing need for standardization and insulation of genetic parts has resulted in sequence redundancies — for example repeated terminator and insulator sequences — that complicate recombination-based assembly. We and others have recently developed DNA assembly methods that we refer to collectively as unique nucleotide sequence (UNS)-guided assembly, in which individual DNA parts are flanked with UNSs to facilitate the ordered, recombination-based assembly of repetitive sequences. Here we present a detailed protocol for UNS-guided assembly that enables researchers to convert multiple DNA parts into sequenced, correctly-assembled constructs, or into high-quality combinatorial libraries in only 2–3 days. If the DNA parts must be generated from scratch, an additional 2–5 days are necessary. This protocol requires no specialized equipment and can easily be implemented by a student with experience in basic cloning techniques. PMID:25101822

  8. Unique nucleotide sequence-guided assembly of repetitive DNA parts for synthetic biology applications

    SciTech Connect

    Torella, JP; Lienert, F; Boehm, CR; Chen, JH; Way, JC; Silver, PA

    2014-08-07

    Recombination-based DNA construction methods, such as Gibson assembly, have made it possible to easily and simultaneously assemble multiple DNA parts, and they hold promise for the development and optimization of metabolic pathways and functional genetic circuits. Over time, however, these pathways and circuits have become more complex, and the increasing need for standardization and insulation of genetic parts has resulted in sequence redundancies (for example, repeated terminator and insulator sequences) that complicate recombination-based assembly. We and others have recently developed DNA assembly methods, which we refer to collectively as unique nucleotide sequence (UNS)-guided assembly, in which individual DNA parts are flanked with UNSs to facilitate the ordered, recombination-based assembly of repetitive sequences. Here we present a detailed protocol for UNS-guided assembly that enables researchers to convert multiple DNA parts into sequenced, correctly assembled constructs, or into high-quality combinatorial libraries in only 2-3 d. If the DNA parts must be generated from scratch, an additional 2-5 d are necessary. This protocol requires no specialized equipment and can easily be implemented by a student with experience in basic cloning techniques.

  9. Trace elemental content of biological materials. A comparison of NAA and ICP-MS analysis.

    PubMed

    Ward, N I; Abou-Shakra, F R; Durrant, S F

    1990-01-01

    The advantages and disadvantages of neutron activation analysis (NAA) and inductively coupled plasma-source mass spectrometry (ICP-MS) for the analysis of biological materials are reviewed. Comparison is made between NAA (instrumental) and ICP-MS (conventional pneumatic solution nebulization and laser ablation) analysis of the biological reference material National Bureau of Standards (NBS) SRM 1577 Bovine Liver. Relatively good agreement is achieved between the results for the 18 elements analyzed by both techniques and those either certified or reported in the literature. Elemental concentrations for Li, Mg, Al, Ca, Cr, Mn, Fe, Cu, Zn, Br, Rb, and Cs are also reported for IAEA Mixed Human Diet (H9), NBS SRM 909 Human Serum, and NBS SRM 1577a Bovine Liver, analyzed by solution nebulization ICP-MS.

  10. Comparison of biologically damaging spectral solar ultraviolet radiation at a southern hemisphere sub-tropical site.

    PubMed

    Parisi, A V; Sabburg, J; Kimlin, M G

    2003-04-21

    The first dataset of a complete year of biologically damaging spectral UV at a sub-tropical latitude in the southern hemisphere has been presented. The new data provides a baseline dataset against which comparisons can be made in the future to establish if there have been any long term trends in the biologically damaging UV. The general shape of the variation of the daily biologically damaging exposures through the year depends on the relative response of the various action spectra at the different wavelengths. The ratio of the daily erythemal to actinic exposures drops by approximately 20 to 25% from winter to summer. The ratio of the erythemal to DNA exposures drops by approximately 50% over the same period. In contrast, the ratio of the erythemal to plant damage exposures is higher in summer compared to winter. This is due to the changes in the relative proportion of UVA to UVB wavebands and relative responses of the different action spectra. The relative changes for the different action spectra show that the erythemal action spectrum cannot be used as a proxy for other biologically damaging responses.

  11. Comparison of fungi within the Gaeumannomyces-Phialophora complex by analysis of ribosomal DNA sequences.

    PubMed

    Bryan, G T; Daniels, M J; Osbourn, A E

    1995-02-01

    Four ascomycete species of the genus Gaeumannomyces infect roots of monocotyledons. Gaeumannomyces graminis contains four varieties, var. tritici, var. avenae, var. graminis, and var. maydis. G. graminis varieties tritici, avenae, and graminis have Phialophora-like anamorphs and, together with the other Gaeumannomyces and Phialophora species found on cereal roots, constitute the Gaeumannomyces-Phialophora complex. Relatedness of a number of Gaeumannomyces and Phialophora isolates was assessed by comparison of DNA sequences of the 18S rRNA gene, the 5.8S rRNA gene, and the internal transcribed spacers (ITS). G. graminis var. tritici, G. graminis var. avenae, and G. graminis var. graminis isolates can be distinguished from each other by nucleotide sequence differences in the ITS regions. The G. graminis var. tritici isolates can be further subdivided into R and N isolates (correlating with ability [R] or inability [N] to infect rye). Phylogenetic analysis of the ITS regions of several oat-infecting G. graminis var. tritici isolates suggests that these isolates are actually more closely related to G. graminis var. avenae. The isolates of Magnaporthe grisea included in the analysis showed a surprising degree of relatedness to members of the Gaeumannomyces-Phialophora complex. G. graminis variety-specific oligonucleotide primers were used in PCRs to amplify DNA from cereal seedlings infected with G. graminis var. tritici or G. graminis var. avenae, and these should be valuable for sensitive detection of pathogenic isolates and for diagnosis of take-all.

  12. Whole-genome sequence comparison as a method for improving bacterial species definition.

    PubMed

    Zhang, Wen; Du, Pengcheng; Zheng, Han; Yu, Weiwen; Wan, Li; Chen, Chen

    2014-01-01

    We compared pairs of 1,226 bacterial strains with whole genome sequences and calculated their average nucleotide identity (ANI) between genomes to determine whether whole genome comparison can be directly used for bacterial species definition. We found that genome comparisons of two bacterial strains from the same species (SGC) have a significantly higher ANI than those of two strains from different species (DGC), and that the ANI between the query and the reference genomes can be used to determine whether two genomes come from the same species. Bacterial species definition based on ANI with a cut-off value of 0.92 matched well (81.5%) with the current bacterial species definition. The ANI value was shown to be consistent with the standard for traditional bacterial species definition, and it could be used in bacterial taxonomy for species definition. A new bioinformatics program (ANItools) was also provided in this study for users to obtain the ANI value of any two bacterial genome pairs (http://genome.bioinfo-icdc.org/). This program can match a query strain to all bacterial genomes, and identify the highest ANI value of the strain at the species, genus and family levels respectively, providing valuable insights for species definition.
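
    The species decision described in this abstract can be illustrated with a minimal sketch. It assumes the genome fragments have already been aligned into equal-length pairs and that ANI is taken as the unweighted mean identity over those pairs; how ANItools actually aligns and averages fragments is not stated here, so the function names and the `>=` comparison are illustrative assumptions. Only the 0.92 cut-off comes from the abstract.

```python
def fragment_identity(a, b):
    """Fraction of identical positions in two pre-aligned, equal-length fragments."""
    if len(a) != len(b) or not a:
        raise ValueError("fragments must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return matches / len(a)


def average_nucleotide_identity(fragment_pairs):
    """Unweighted mean identity over aligned fragment pairs (an assumed simplification)."""
    identities = [fragment_identity(a, b) for a, b in fragment_pairs]
    return sum(identities) / len(identities)


def same_species(ani, cutoff=0.92):
    """Apply the 0.92 cut-off from the abstract; using >= is an assumption."""
    return ani >= cutoff


# Hypothetical toy fragments, for illustration only.
pairs = [("ACGTACGT", "ACGTACGA"), ("GGCC", "GGCC")]
ani = average_nucleotide_identity(pairs)
print(ani, same_species(ani))
```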

  13. Investigating the performance of a novel cyclic rotating-bed biological reactor compared with a sequencing continuous-inflow reactor for biodegradation of catechol in wastewater.

    PubMed

    Aghapour, Ali Ahmad; Moussavi, Gholamreza; Yaghmaeian, Kamyar

    2013-06-01

    The main objective of this study was to investigate the performance of a cyclic rotating-bed biological reactor (CRBR) in comparison to a sequencing continuous-inflow reactor (SCR) for the biodegradation and mineralization of catechol. Results showed that catechol degradation and mineralization in the SCR at an organic loading of 7.82 kg COD/(m³·d) and a hydraulic retention time (HRT) of 9 h were 28.2% and 10.3%, respectively. Under operating conditions similar to those of the SCR, the CRBR with polyurethane foam (PUF) media achieved steady-state catechol degradation and mineralization of 98.7% and 97.9%, respectively. In comparison, the CRBR with 2H media attained average steady-state catechol degradation and mineralization of 89.1% and 83.6%, respectively, under similar conditions. Accordingly, the CRBR with PUF media presents a promising process for efficiently treating wastewater containing high concentrations of toxic, inhibitory and resistant compounds at a relatively short HRT.

  14. Protein identities - Graphocephala atropunctata expressed sequenced tags: expanding leafhopper vector biology

    USDA-ARS's Scientific Manuscript database

    A small heat shock protein was isolated and sequenced from the Blue-green sharpshooter, BGSS, Graphocephala atropunctata (Signoret) (Hemiptera: Cicadellidae). The BGSS has been the native vector of Pierce’s disease in vineyards in California for nearly a century. The importance of this vector spec...

  15. Barcoding lichen-forming fungi using 454 pyrosequencing is challenged by artifactual and biological sequence variation.

    PubMed

    Mark, Kristiina; Cornejo, Carolina; Keller, Christine; Flück, Daniela; Scheidegger, Christoph

    2016-09-01

    Although lichens (lichen-forming fungi) play an important role in the ecological integrity of many vulnerable landscapes, only a minority of lichen-forming fungi have been barcoded out of the currently accepted ∼18 000 species. Regular Sanger sequencing can be problematic when analyzing lichens since saprophytic, endophytic, and parasitic fungi live intimately admixed, resulting in low-quality sequencing reads. Here, high-throughput, long-read 454 pyrosequencing in a GS FLX+ System was tested to barcode the fungal partner of 100 epiphytic lichen species from Switzerland using fungal-specific primers when amplifying the full internal transcribed spacer region (ITS). The present study shows the potential of DNA barcoding using pyrosequencing, in that the expected lichen fungus was successfully sequenced for all samples except one. Alignment solutions such as BLAST were found to be largely adequate for the generated long reads. In addition, the NCBI nucleotide database, currently the most complete database for lichen-forming fungi, can be used as a reference database when identifying common species, since the majority of analyzed lichens were identified correctly to the species or at least to the genus level. However, several issues were encountered, including a high sequencing error rate, multiple ITS versions in a genome (incomplete concerted evolution), and in some samples the presence of mixed lichen-forming fungi (possible lichen chimeras).

  16. Biology--Chemistry--Physics, Students' Guide, A Three-Year Sequence, Parts I and II.

    ERIC Educational Resources Information Center

    Scott, Arthur; And Others

    Parts I and II of the students' guide to the three-year integrated biology, chemistry, and physics course being prepared by the Portland Project Committee are contained in this guide. A committee reviewed and selected material developed by the national course improvement groups--Physical Science Study Committee, Chemical Bond Approach, Chemical…

  17. Biology-Chemistry-Physics, Teachers' Guide, a Three-Year Sequence, Parts I and II.

    ERIC Educational Resources Information Center

    Scott, Arthur; And Others

    This is one of two teacher's guides for a three-year integrated biology, chemistry, and physics course being prepared by the Portland Project Committee. This committee reviewed and selected material developed by the national course improvement groups--Physical Science Study Committee, Chemical Bond Approach, Chemical Education Materials Study,…

  18. Accelerated Integrated Science Sequence (AISS): An Introductory Biology, Chemistry, and Physics Course

    ERIC Educational Resources Information Center

    Purvis-Roberts, Kathleen L.; Edwalds-Gilbert, Gretchen; Landsberg, Adam S.; Copp, Newton; Ulsh, Lisa; Drew, David E.

    2009-01-01

    A new interdisciplinary, introductory science course was offered for the first time during the 2007-2008 school year. The purpose of the course is to introduce students to the idea of working at the intersections of biology, chemistry, and physics and to recognize interconnections between the disciplines. Interdisciplinary laboratories are a key…

  19. Biological nutrient removal in a sequencing batch reactor operated as oxic/anoxic/extended-idle regime.

    PubMed

    Li, Xiao-ming; Chen, Hong-bo; Yang, Qi; Wang, Dong-bo; Luo, Kun; Zeng, Guang-ming

    2014-06-01

    Previous research has demonstrated that biological phosphorus removal from wastewater can be induced by an oxic/extended-idle (O/EI) regime. In this study, an anoxic period was introduced after the aeration to realize biological nutrient removal. A high nitrite accumulation ratio and polyhydroxyalkanoate biosynthesis were obtained during aeration, and biological nutrient removal was well achieved in the oxic/anoxic/extended-idle (O/A/EI) regime for the wastewater used. In addition, nitrogen and phosphorus removal performance in the O/A/EI regime was compared with that in conventional anaerobic/anoxic/aerobic (A²/O) and O/EI processes. The results showed that the O/A/EI regime exhibited higher nitrogen and phosphorus removal than the A²/O and O/EI processes. More ammonium-oxidizing bacteria and polyphosphate-accumulating organisms, and fewer glycogen-accumulating organisms, in the biomass might be the principal reason for the better nitrogen and phosphorus removal in the O/A/EI regime. Furthermore, biological nutrient removal with the O/A/EI regime was demonstrated with municipal wastewater. The average TN, SOP and COD removal efficiencies were 93%, 95% and 87%, respectively.

  20. Sequencing Genetics Information: Integrating Data into Information Literacy for Undergraduate Biology Students

    ERIC Educational Resources Information Center

    MacMillan, Don

    2010-01-01

    This case study describes an information literacy lab for an undergraduate biology course that leads students through a range of resources to discover aspects of genetic information. The lab provides over 560 students per semester with the opportunity for hands-on exploration of resources in steps that simulate the pathways of higher-level…

  1. Designing and Evaluating a Context-Based Lesson Sequence Promoting Conceptual Coherence in Biology

    ERIC Educational Resources Information Center

    Ummels, M. H. J.; Kamp, M. J. A.; de Kroon, H.; Boersma, K. Th.

    2015-01-01

    Context-based education, in which students deal with biological concepts in a meaningful way, is showing promise in promoting the development of students' conceptual coherence. However, literature gives little guidance about how this kind of education should be designed. Therefore, our study aims at designing and evaluating the practicability of a…

  2. Research and Teaching: Instructor Use of Group Active Learning in an Introductory Biology Sequence

    ERIC Educational Resources Information Center

    Auerbach, Anna Jo; Schussler, Elisabeth E.

    2016-01-01

    Active learning (or learner-centered) pedagogies have been shown to enhance student learning in introductory biology courses. Student collaboration has also been shown to enhance student learning and may be a critical part of effective active learning practices. This study focused on documenting the use of individual active learning and group…

  6. Application Sequence and soil biology influence anaerobic soil disinfestation induced disease suppression

    USDA-ARS's Scientific Manuscript database

    Anaerobic soil disinfestation (ASD) and mustard seed meal (MSM) soil amendments can yield significant control of a diversity of soil-borne pests and pathogens. The mechanisms functional in disease suppression are diverse and with regard to MSM amendment, soil biology has been shown to have a signif...

  8. Sequence specificity in photoreaction of various psoralen derivatives with DNA: role in biological activity.

    PubMed

    Boyer, V; Moustacchi, E; Sage, E

    1988-04-19

    The sequence specificity in the photoreaction of various psoralen derivatives with DNA is investigated by using DNA sequencing methodology. The 3'-5' exonuclease activity associated with T4 DNA polymerase serves as a probe to map the psoralens' photoaddition (monoadducts plus biadducts) on DNA fragments of defined sequence. This approach has already allowed us to demonstrate a strong sequence context effect on the 8-methoxypsoralen photobinding to DNA [Sage, E., & Moustacchi, E. (1987) Biochemistry 26, 3307-3314]. The psoralens studied include bifunctional derivatives [8-methoxypsoralen, 5-methoxypsoralen, and 4'-(hydroxymethyl)-4,5',8-trimethylpsoralen] and monofunctional derivatives (angelicin, 3-carbethoxypsoralen, and three pyridopsoralens). Maps of photochemical binding on two DNA fragments of the lacI gene of Escherichia coli are established for all the derivatives. These maps demonstrate the following general qualitative rules in the photoreaction of the furocoumarins with DNA: thymine residues in a GC environment are cold, adjacent thymines are better targets, 5'-TpA sites are strongly preferred versus 5'-ApT, and alternating (AT)n sequences are hot spots for photoaddition. Depending on the chemical structure of the derivatives and on their affinity for DNA, some minor differences in the binding spectrum are detected. A most interesting example is 3-carbethoxypsoralen, which specifically reacts with (AT)n sites. Our observations lead us to define two types of target sites: the "strong sites", which are preferential targets for all psoralen derivatives, and the "weak sites", which are targets only for derivatives having a high affinity for DNA. The frequency of DNA lesions is much higher in the former sites. (ABSTRACT TRUNCATED AT 250 WORDS)
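
    The qualitative rules listed in this abstract refer to specific sequence motifs (5'-TpA, 5'-ApT, adjacent thymines, and alternating (AT)n runs). As a rough illustration only, the sketch below locates those motifs on one strand of a DNA string. It does not model photoaddition or binding strength; the function name, the 0-based coordinates, and the choice of n >= 2 for an (AT)n run are assumptions made for this example.

```python
import re


def psoralen_site_map(seq):
    """Return 0-based positions of the motifs named in the abstract (one strand only)."""
    seq = seq.upper()
    # Zero-width lookaheads so that overlapping occurrences are all reported.
    starts = lambda motif: [m.start() for m in re.finditer(f"(?={motif})", seq)]
    return {
        "TpA": starts("TA"),   # reported as strongly preferred
        "ApT": starts("AT"),   # reported as less favoured than TpA
        "TpT": starts("TT"),   # adjacent thymines
        # Alternating (AT)n runs as (start, end) spans; n >= 2 is an assumption.
        "AT_runs": [(m.start(), m.end()) for m in re.finditer(r"(?:AT){2,}", seq)],
    }


# Hypothetical sequence, for illustration only.
sites = psoralen_site_map("GCTAGCATATATGC")
print(sites)
```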

  9. Complete genome sequence and biological characterization of Moroccan pepper virus (MPV) and reclassification of Lettuce necrotic stunt virus as MPV.

    PubMed

    Wintermantel, William M; Hladky, Laura L

    2013-05-01

    Moroccan pepper virus (MPV) and Lettuce necrotic stunt virus (LNSV) have been steadily increasing in prevalence in central Asia and western North America, respectively, over the past decade. Recent sequence analysis of LNSV demonstrated a close relationship between the coat proteins of LNSV and MPV. To determine the full extent of the relationship between LNSV and MPV, the genomes of three MPV isolates were sequenced and compared with that of LNSV. Sequence analysis demonstrated that genomic nucleotide sequences as well as virus-encoded proteins of the three MPV isolates and LNSV shared 97% or greater identity. A full-length clone of a California LNSV isolate was developed and virus derived from infectious transcripts was used to evaluate host plant reactions under controlled conditions. Symptoms of LNSV matched those described previously for MPV on most of a select series of host plants, although some differences were observed. Collectively, these molecular and biological results demonstrate that LNSV should be classified as MPV within the family Tombusviridae, genus Tombusvirus, and confirm the presence of MPV in North America.

  10. Structural characterization and biological activity of recombinant human epidermal growth factor proteins with different N-terminal sequences.

    PubMed

    Svoboda, M; Bauhofer, A; Schwind, P; Bade, E; Rasched, I; Przybylski, M

    1994-05-18

    The primary structures and molecular homogeneity of recombinant human epidermal growth factors from different suppliers were characterized and their biological activities evaluated by a standard DNA synthesis assay. Molecular weight determinations using 252Cf-plasma-desorption and electrospray mass spectrometry in combination with N- and C-terminal sequence analysis and determination of intramolecular disulfide bridges revealed that one recombinant protein had the correct human-identical structure (54 aa residues; 6347 Da). In contrast, a second recombinant protein (7020 Da) was found to contain a pentapeptide (KKYPR) insert following its N-terminal methionine. This structural variant showed a significant reduction in its capacity to stimulate DNA synthesis.
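
    The two masses reported in this abstract can be cross-checked with simple arithmetic: an inserted peptide should raise the protein mass by the sum of its residue masses. The sketch below uses standard average residue masses, which are supplied here as reference values and are not taken from the abstract; the variable names are illustrative.

```python
# Average amino acid residue masses in Da (amino acid minus water).
# Standard reference values, supplied for this example and not from the abstract.
AVERAGE_RESIDUE_MASS = {"K": 128.174, "Y": 163.176, "P": 97.117, "R": 156.188}


def residue_mass_sum(peptide):
    """Sum of average residue masses for a peptide given in one-letter code."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in peptide)


insert_mass = residue_mass_sum("KKYPR")   # expected mass shift from the insert
observed_difference = 7020 - 6347         # Da, the two masses from the abstract

print(f"KKYPR insert: {insert_mass:.1f} Da; observed difference: {observed_difference} Da")
```

    Under these assumptions the expected shift (about 672.8 Da) agrees with the 673 Da difference between the two reported masses.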

  11. A comparison of the biological characteristics of EV71 C4 subtypes from different epidemic strains.

    PubMed

    Wang, Li-chun; Tang, Song-qing; Li, Yan-mei; Zhao, Hong-lin; Dong, Cheng-hong; Cui, Ping-fang; Ma, Shao-hui; Liao, Yun; Liu, Long-ding; Li, Qi-han

    2010-04-01

    The comparative analysis of the biological characteristics and genetic background of circulating EV71 strains is commonly recognized as basic work necessary for the development of an effective EV71 vaccine. In this study, we sequenced five circulating EV71 strains isolated from the cities of Fuyang, Hefei, Kunming and Shenzhen in China, and named them FY-23, FY-22, H44, K9 and S1, respectively. Sequence alignment demonstrated their genotype to be C4. The genetic distance of the VP1 gene from these isolates suggested that they are closely related, with genetic identity similar to that of other previously reported EV71 strains in China. Additionally, these strains displayed some obvious proliferation dynamics and plaque morphology when propagated in Vero cells. However, a distinctive difference in pathogenic ability in neonatal mice was found. Some differences in the cross-neutralization test and immunogenic analysis were also found. All these results relate to the biological characterization of circulating EV71 strains in China and aid in the future development of an EV71 vaccine.

  12. Evaluation of global sequence comparison and one-to-one FASTA local alignment in regulatory allergenicity assessment of transgenic proteins in food crops.

    PubMed

    Song, Ping; Herman, Rod A; Kumpatla, Siva

    2014-09-01

    To address the high false positive rate of the >35% identity over 80 amino acids criterion used in the regulatory assessment of transgenic proteins for potential allergenicity, and the change of E-value with database size, Needleman-Wunsch global sequence alignment and a one-to-one (1:1) local FASTA search (one protein in the target database at a time) were evaluated by comparing proteins randomly selected from Arabidopsis, rice, corn, and soybean with known allergens in a peer-reviewed allergen database (http://www.allergenonline.org/). Compared with the >35%/80aa+ search approach, the false positive rate, measured by the specificity for identification of true allergens, was reduced by a 1:1 global sequence alignment with a cut-off threshold of ≥30% identity and by a 1:1 FASTA local alignment with a cut-off E-value of ≤1.0E-09, while maintaining the same sensitivity. Hence, a 1:1 sequence comparison, especially using the FASTA local alignment tool with a biologically relevant E-value of 1.0E-09 as a threshold, is recommended for the regulatory assessment of sequence identities between transgenic proteins in food crops and known allergens.
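
    For readers unfamiliar with the global alignment step, the following is a minimal sketch of Needleman-Wunsch alignment followed by a percent-identity threshold. The scoring scheme (match +1, mismatch -1, linear gap -1) and the identity denominator (alignment length including gap columns) are simplifying assumptions; the study's actual substitution matrix, gap penalties, and identity definition are not given in the abstract. Only the 30% cut-off is taken from it.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment with a linear gap penalty. Returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    sub = lambda x, y: match if x == y else mismatch
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align two residues
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )
    # Trace back from the bottom-right corner, preferring the diagonal move.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aligned_a, aligned_b):
    """Identical non-gap columns as a percentage of alignment length (assumed definition)."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


def exceeds_identity_cutoff(a, b, cutoff=30.0):
    """Flag a 1:1 comparison whose global identity is at or above the cut-off."""
    aligned_a, aligned_b, _ = needleman_wunsch(a, b)
    return percent_identity(aligned_a, aligned_b) >= cutoff


# Hypothetical toy sequences, for illustration only.
aligned = needleman_wunsch("ACGT", "AGT")
print(aligned, percent_identity(aligned[0], aligned[1]))
```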

  13. Bayesian model comparison and parameter inference in systems biology using nested sampling.

    PubMed

    Pullen, Nick; Morris, Richard J

    2014-01-01

    Inferring parameters for models of biological processes is a current challenge in systems biology, as is the related problem of comparing competing models that explain the data. In this work we apply Skilling's nested sampling to address both of these problems. Nested sampling is a Bayesian method for exploring parameter space that transforms a multi-dimensional integral to a 1D integration over likelihood space. This approach focuses on the computation of the marginal likelihood or evidence. The ratio of evidences of different models leads to the Bayes factor, which can be used for model comparison. We demonstrate how nested sampling can be used to reverse-engineer a system's behaviour whilst accounting for the uncertainty in the results. The effect of missing initial conditions of the variables as well as unknown parameters is investigated. We show how the evidence and the model ranking can change as a function of the available data. Furthermore, the addition of data from extra variables of the system can deliver more information for model comparison than increasing the data from one variable, thus providing a basis for experimental design.
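
    The quantities named in this abstract can be written out as follows. This is the standard formulation of nested sampling, stated here for orientation; the symbols (likelihood L, prior π, enclosed prior mass X, evidence Z, Bayes factor B) are notation chosen for this note and do not come from the abstract.

```latex
% Evidence (marginal likelihood) of a model with parameters \theta
Z = \int L(\theta)\,\pi(\theta)\,d\theta

% Prior mass enclosed by the likelihood contour L(\theta) > \lambda
X(\lambda) = \int_{L(\theta) > \lambda} \pi(\theta)\,d\theta

% The multi-dimensional integral becomes a 1D integral over prior mass
Z = \int_0^1 L(X)\,dX

% Bayes factor comparing model 1 with model 2
B_{12} = \frac{Z_1}{Z_2}
```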

  14. Bayesian Model Comparison and Parameter Inference in Systems Biology Using Nested Sampling

    PubMed Central

    Pullen, Nick; Morris, Richard J.

    2014-01-01

    Inferring parameters for models of biological processes is a current challenge in systems biology, as is the related problem of comparing competing models that explain the data. In this work we apply Skilling's nested sampling to address both of these problems. Nested sampling is a Bayesian method for exploring parameter space that transforms a multi-dimensional integral to a 1D integration over likelihood space. This approach focusses on the computation of the marginal likelihood or evidence. The ratio of evidences of different models leads to the Bayes factor, which can be used for model comparison. We demonstrate how nested sampling can be used to reverse-engineer a system's behaviour whilst accounting for the uncertainty in the results. The effect of missing initial conditions of the variables as well as unknown parameters is investigated. We show how the evidence and the model ranking can change as a function of the available data. Furthermore, the addition of data from extra variables of the system can deliver more information for model comparison than increasing the data from one variable, thus providing a basis for experimental design. PMID:24523891

  15. A cytochrome ba3 functions as a quinol oxidase in Paracoccus denitrificans. Purification, cloning, and sequence comparison.

    PubMed

    Richter, O M; Tao, J S; Turba, A; Ludwig, B

    1994-09-16

    A quinol oxidase has been purified from the cytoplasmic membrane of Paracoccus denitrificans; its heme composition and CO binding properties identify it as a cytochrome ba3. On SDS gels, the purified enzyme complex is separated into five polypeptides. Using partial peptide sequence information for subunit II, the gene locus has been cloned and sequenced. In a typical operon pattern, four genes were identified: qoxA, -B, -C, and -D, coding for subunits II, I, III, and IV. DNA-derived amino acid sequence comparisons reveal extensive similarities to other members of the terminal oxidase superfamily.

  16. A Comparison of Biological and Adoptive Mothers and Fathers: The Relevance of Biological Kinship and Gendered Constructs of Parenthood.

    ERIC Educational Resources Information Center

    Miall, Charlene E.; March, Karen

    2003-01-01

    Used qualitative interviews to examine beliefs and values about biological and adoptive parents. Considered how biological kinship, gender, and actual parenting behavior affect the assessments respondents made of the emotional bonding between parents and children. Found that biological and adoptive parents viewed motherhood as instinctive and…

  17. Evidence for intramolecularly folded i-DNA structures in biologically relevant CCC-repeat sequences.

    PubMed Central

    Manzini, G; Yathindra, N; Xodo, L E

    1994-01-01

    The structural behaviour of repetitive cytosine DNA is examined in the oligodeoxynucleotide sequences (CCCTAA)3CCCT (HTC4), GC(TCCC)3TCCT(TCCC)3 (KRC6) and the methylated (CCCT)3TCCT(CCCT)3C (KRM6) by circular dichroism (CD), gel electrophoresis (PAGE), and ultraviolet (UV) absorbance studies. All three sequences exhibit a pH-induced cooperative structural transition as monitored by CD. An intense positive CD band around 285 nm develops on lowering the pH from 8 to slightly acidic conditions, indicative of the formation of base pairs between protonated cytosines. The oligomers are found to melt in a fully reversible and cooperative fashion, with a melting temperature (Tm) of around 50 °C at pH 5.5. The melting temperatures are independent of DNA concentration, indicating that an intramolecular process is involved in structure formation. PAGE experiments performed with 32P-labeled samples as well as with normal staining procedures show predominantly single-band migration for all three oligomers, suggestive of a unimolecular structure. From pH titrations, the number of protons required for generating the structures formed by HTC4, KRC6 and KRM6 is found to be around six. These findings strongly suggest that all three sequences adopt an intramolecular i-motif structure. The demonstration of an i-motif structure for KRC6, a critical functional stretch of the c-ki-ras proto-oncogene promoter, in addition to the human telomeric sequence HTC4, may suggest a larger significance in the functioning of DNA. PMID:7984411

  18. SeqDepot: streamlined database of biological sequences and precomputed features.

    PubMed

    Ulrich, Luke E; Zhulin, Igor B

    2014-01-15

    Assembling and/or producing integrated knowledge of sequence features continues to be an onerous and redundant task despite a large number of existing resources. We have developed SeqDepot, a novel database that focuses solely on two primary goals: (i) assimilating known primary sequences with predicted feature data and (ii) providing the simplest and most straightforward means to procure and readily use this information. Access to >28.5 million sequences and 300 million features is provided through a well-documented and flexible RESTful interface that supports fetching specific data subsets, bulk queries, visualization and searching by MD5 digests or external database identifiers. We have also developed an HTML5/JavaScript web application exemplifying how to interact with SeqDepot and Perl/Python scripts for use with local processing pipelines. Freely available on the web at http://seqdepot.net/. REST access via http://seqdepot.net/api/v1. Database files and scripts may be downloaded from http://seqdepot.net/download.
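
    Searching by MD5 digest means a sequence is identified by a hash of its residues rather than by a database accession. The sketch below shows one way such a digest could be computed locally. The normalization (removing whitespace, upper-casing) and the hexadecimal output are assumptions for illustration; the exact canonical form and digest encoding expected by SeqDepot are not described in the abstract, and no SeqDepot API call is shown.

```python
import hashlib


def sequence_md5(sequence):
    """Hex MD5 digest of a sequence after whitespace removal and upper-casing (assumed normalization)."""
    normalized = "".join(sequence.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


# Hypothetical sequences: formatting differences do not change the digest.
digest_a = sequence_md5("mkt ayiak\nqrq")
digest_b = sequence_md5("MKTAYIAKQRQ")
print(digest_a, digest_a == digest_b)
```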

  19. A Comparison of Molecular Biology Mechanism of Shewanella putrefaciens between Fresh and Terrestrial Sewage Wastewater

    PubMed Central

    Xu, Jiajie; He, Weina; Wang, Zhonghua; Zhang, Dijun; Sun, Jing; Zhou, Jun; Li, Yanyan; Su, Xiurong

    2016-01-01

    Municipal and industrial wastewater is often discharged into the environment without appropriate treatment, especially in developing countries. As a result, many rivers and oceans are contaminated. It is urgent to control and treat these contaminated rivers and oceans. However, most mechanisms of bacterial colonization in contaminated rivers and oceans are unknown, especially at sewage outlets. We found Shewanella putrefaciens to be the primary bacterium in the terrestrial sewage wastewater outlets around Ningbo City, China. Therefore, in this study, we applied a combination of differential proteomics, metabolomics, and real-time fluorescent quantitative PCR techniques to identify bacterial intracellular metabolites. We found that S. putrefaciens had 12 proteins that were differentially expressed between freshwater culture and wastewater culture, relating to the formation of biological membranes (Omp35, OmpW), energy metabolism (SOD, deoxyribose-phosphate pyrophosphokinase), fatty acid metabolism (beta-ketoacyl synthase), secondary metabolism, the TCA cycle, lysine degradation (2-oxoglutarate reductase), and propionic acid metabolism (succinyl coenzyme A synthetase). The sequences of these 12 differentially expressed proteins were aligned with sequences downloaded from NCBI. There were also 27 differentially concentrated metabolites detected by NMR, including alcohols (ethanol, isopropanol), amines (dimethylamine, ethanolamine), amino acids (alanine, leucine), amine compounds (bilinerurine), nucleic acid compounds (nucleosides, inosines), and organic acids (formate, acetate). Formate and ethanolamine show significant differences between the two environments and are possibly involved in energy metabolism and in glycerophospholipid and ether lipid metabolism, providing the energy supply and material basis for engraftment in sewage. Because understanding S. putrefaciens’s biological mechanism of colonization (protein, gene expression, and metabolites) in terrestrial

  20. A Comparison of Molecular Biology Mechanism of Shewanella putrefaciens between Fresh and Terrestrial Sewage Wastewater.

    PubMed

    Xu, Jiajie; He, Weina; Wang, Zhonghua; Zhang, Dijun; Sun, Jing; Zhou, Jun; Li, Yanyan; Su, Xiurong

    2016-01-01

    Municipal and industrial wastewater is often discharged into the environment without appropriate treatment, especially in developing countries. As a result, many rivers and oceans are contaminated. It is urgent to control and treat these contaminated rivers and oceans. However, most mechanisms of bacterial colonization in contaminated rivers and oceans are unknown, especially at sewage outlets. We found Shewanella putrefaciens to be the primary bacterium in the terrestrial sewage wastewater outlets around Ningbo City, China. Therefore, in this study, we applied a combination of differential proteomics, metabolomics, and real-time fluorescent quantitative PCR techniques to identify bacterial intracellular metabolites. We found that 12 proteins were differentially expressed in S. putrefaciens grown in freshwater culture compared with wastewater; these relate to the formation of biological membranes (Omp35, OmpW), energy metabolism (SOD, deoxyribose-phosphate pyrophosphokinase), fatty acid metabolism (beta-ketoacyl synthase), secondary metabolism, TCA cycle, lysine degradation (2-oxoglutarate reductase), and propionic acid metabolism (succinyl coenzyme A synthetase). The sequences of these 12 differentially expressed proteins were aligned with sequences downloaded from NCBI. There were also 27 differentially concentrated metabolites detected by NMR, including alcohols (ethanol, isopropanol), amines (dimethylamine, ethanolamine), amino acids (alanine, leucine), amine compounds (bilinerurine), nucleic acid compounds (nucleosides, inosines), and organic acids (formate, acetate). Formate and ethanolamine showed significant differences between the two environments and are possibly involved in energy metabolism and in glycerophospholipid and ether lipid metabolism, providing an energy supply and a material basis for engraftment in sewage. Because understanding S. putrefaciens's biological mechanism of colonization (protein, gene expression, and metabolites) in terrestrial

  1. A comparison of biological characteristics of three strains of Chinese sacbrood virus in Apis cerana

    PubMed Central

    Hu, Ying; Fei, Dongliang; Jiang, Lili; Wei, Dong; Li, Fangbing; Diao, Qingyun; Ma, Mingxiao

    2016-01-01

    We selected and sequenced the entire genomes of three strains of Chinese sacbrood virus (CSBV): LNQY-2008 (isolated in Qingyuan, Liaoning Province), SXYL-2015 (isolated in Yulin, Shanxi Province), and JLCBS-2014 (isolated in Changbaishan, Jilin Province), by VP1 amino acid (aa) analysis. These strains are endemic in China and infect Apis cerana. Nucleotide sequences, deduced amino acid sequences, genetic backgrounds, and other molecular biological characteristics were analysed. We also examined sensitivity of these virus strains to temperature, pH, and organic solvents, as well as to other physicochemical properties. On the basis of these observations, we compared pathogenicity and tested cross-immunogenicity and protective immunity, using antisera raised against each of the three strains. Our results showed that compared with SXYL-2015, LNQY-2008 has a 10-aa deletion and 3-aa deletion (positions 282–291 and 299–301, respectively), whereas JLCBS-2014 has a 17-aa deletion (positions 284–300). However, the three strains showed no obvious differences in physicochemical properties or pathogenicity. Moreover, there was immune cross-reactivity among the antisera raised against the different strains, implying good protective effects of such antisera. The present study should significantly advance the understanding of the pathogenesis of Chinese sacbrood disease, and offers insights into comprehensive prevention and treatment of, as well as possible protection from, the disease by means of an antiserum. PMID:27853294

  2. Comparison of aerobic and anaerobic stability indices through a MSW biological treatment process.

    PubMed

    Ponsá, Sergio; Gea, Teresa; Alerm, Llorenç; Cerezo, Javier; Sánchez, Antoni

    2008-12-01

    A complex mechanical-biological waste treatment plant designed for the processing of mixed municipal solid wastes (MSW) and source-selected organic fraction of municipal solid wastes (OFMSW) has been studied by using stability indices related to aerobic (respiration index, RI) and anaerobic conditions (biochemical methane potential, BMP). Several selected stages of the plant have been characterized: waste inputs, mechanically treated wastes, anaerobically digested materials and composted wastes, according to the treatment sequence used in the plant. The results showed that the main stages responsible for waste stabilization were the first two, mechanical separation and anaerobic digestion, with reductions in both RI and BMP of around 40% and 60%, respectively, whereas the third stage, composting of digested materials, produced less biological degradation (20-30%). The results related to waste stabilization were similar in both lines (MSW and OFMSW), although the indices obtained for MSW were significantly lower than those obtained for OFMSW, which demonstrated a high biodegradability of OFMSW. The methodology proposed can be used for the characterization of organic wastes and the determination of the efficiency of operation units used in mechanical-biological waste treatment plants.

  3. Cancer systems biology in the genome sequencing era: part 2, evolutionary dynamics of tumor clonal networks and drug resistance.

    PubMed

    Wang, Edwin; Zou, Jinfeng; Zaman, Naif; Beitel, Lenore K; Trifiro, Mark; Paliouras, Miltiadis

    2013-08-01

    A tumor often consists of multiple cell subpopulations (clones). Current chemo-treatments often target one clone of a tumor. Although the drug kills that clone, other clones overtake it and the tumor recurs. Genome sequencing and computational analysis allow computational dissection of clones from tumors, while single-cell genome sequencing, including RNA-Seq, allows profiling of these clones. This opens a new window for treating a tumor as a system in which clones are evolving. Future cancer systems biology studies should consider a tumor as an evolving system with multiple clones. Therefore, topics discussed in Part 2 of this review include evolutionary dynamics of clonal networks, early-warning signals (e.g., genome duplication events) for formation of fast-growing clones, dissecting tumor heterogeneity, and modeling of clone-clone-stroma interactions for drug resistance. The ultimate goal of future systems biology analysis is to obtain a 'whole-system' understanding of a tumor and thereby provide more efficient and personalized management strategies for cancer patients.

  4. Transformation of a Traditional, Freshman Biology, Three-Semester Sequence, to a Two-Semester, Integrated Thematically Organized, and Team-Taught Course

    ERIC Educational Resources Information Center

    Soto, Julio G.; Everhart, Jerry

    2016-01-01

    Biology faculty at San José State University developed, piloted, implemented, and assessed a freshman course sequence based on a macro-to-micro teaching approach that was team-taught and organized around unifying themes. Content learning assessment drove the conceptual framework of our course sequence. Student content learning increased…

  5. Analysis and comparison of a set of expressed sequence tags of the parthenogenetic water flea Daphnia carinata.

    PubMed

    Xu, Xiaoqian; Song, Shuhui; Wang, Qun; Qin, Fen; Liu, Kan; Zhang, Xiaowei; Hu, Songnian; Zhao, Yunlong

    2009-08-01

    The water flea Daphnia carinata (D. carinata) reproduces both sexually and parthenogenetically, yet little is known about the genes involved in these processes. To further clarify the reproductive biology of Daphnia and elucidate their unique mechanism of reproductive transformation, we have generated and characterized an expressed sequence tag (EST) data set from D. carinata. A set of 1,495 clusters was generated from sequencing 3,072 randomly chosen clones from a parthenogenetic, juvenile water flea cDNA library. The nucleic acid and deduced amino acid sequences were compared with known GenBank sequences. Functional annotation found that 959 clusters showed significant homology with known genes involved in a broad range of activities, including metabolism, translation, development and reproduction, as well as genes involved in sensing environmental factors. We speculate that genes involved in development and reproduction, along with genes that allow the organism to sense changes in the environment, play important roles in the process of parthenogenetic reproduction and could be markers of the early steps of sexual differentiation. Additionally, 86% of the D. carinata unique sequences could be stringently mapped to the D. pulex genome, of which 125 mapped to intergenic and intronic regions on the current assembly. Our results provide practical insight into crustacean reproductive biology, in addition to establishing a new animal model for reproductive and developmental biology.

  6. Microbial Analysis of Bite Marks by Sequence Comparison of Streptococcal DNA

    PubMed Central

    Kennedy, Darnell M.; Stanton, Jo-Ann L.; García, José A.; Mason, Chris; Rand, Christy J.; Kieser, Jules A.; Tompkins, Geoffrey R.

    2012-01-01

    Bite mark injuries often feature in violent crimes. Conventional morphometric methods for the forensic analysis of bite marks involve elements of subjective interpretation that threaten the credibility of this field. Human DNA recovered from bite marks has the highest evidentiary value, however recovery can be compromised by salivary components. This study assessed the feasibility of matching bacterial DNA sequences amplified from experimental bite marks to those obtained from the teeth responsible, with the aim of evaluating the capability of three genomic regions of streptococcal DNA to discriminate between participant samples. Bite mark and teeth swabs were collected from 16 participants. Bacterial DNA was extracted to provide the template for PCR primers specific for streptococcal 16S ribosomal RNA (16S rRNA) gene, 16S–23S intergenic spacer (ITS) and RNA polymerase beta subunit (rpoB). High throughput sequencing (GS FLX 454), followed by stringent quality filtering, generated reads from bite marks for comparison to those generated from teeth samples. For all three regions, the greatest overlaps of identical reads were between bite mark samples and the corresponding teeth samples. The average proportions of reads identical between bite mark and corresponding teeth samples were 0.31, 0.41 and 0.31, and for non-corresponding samples were 0.11, 0.20 and 0.016, for 16S rRNA, ITS and rpoB, respectively. The probabilities of correctly distinguishing matching and non-matching teeth samples were 0.92 for ITS, 0.99 for 16S rRNA and 1.0 for rpoB. These findings strongly support the tenet that bacterial DNA amplified from bite marks and teeth can provide corroborating information in the identification of assailants. PMID:23284761
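
    The read-overlap measure reported above can be illustrated with a minimal sketch. It assumes the proportion is the share of distinct bite-mark reads that also occur, base for base, in a teeth sample; the abstract does not state the normalisation, and the function name and toy reads below are hypothetical.

```python
def shared_read_proportion(bite_reads, teeth_reads):
    """Fraction of distinct bite-mark reads found identically in the teeth sample.

    Assumption: normalised by the number of distinct bite-mark reads.
    """
    bite = set(bite_reads)
    if not bite:
        return 0.0
    return len(bite & set(teeth_reads)) / len(bite)


# Toy reads, invented for illustration only.
bite = ["ACGT", "ACGA", "TTGA", "CCGT"]
own_teeth = ["ACGT", "TTGA", "GGGA"]
other_teeth = ["ACGT", "AAAA"]

print(shared_read_proportion(bite, own_teeth))    # corresponding sample
print(shared_read_proportion(bite, other_teeth))  # non-corresponding sample
```

    A higher proportion for the corresponding sample than for any non-corresponding sample is the pattern the study describes for all three regions.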

  7. Microbial analysis of bite marks by sequence comparison of streptococcal DNA.

    PubMed

    Kennedy, Darnell M; Stanton, Jo-Ann L; García, José A; Mason, Chris; Rand, Christy J; Kieser, Jules A; Tompkins, Geoffrey R

    2012-01-01

    Bite mark injuries often feature in violent crimes. Conventional morphometric methods for the forensic analysis of bite marks involve elements of subjective interpretation that threaten the credibility of this field. Human DNA recovered from bite marks has the highest evidentiary value, however recovery can be compromised by salivary components. This study assessed the feasibility of matching bacterial DNA sequences amplified from experimental bite marks to those obtained from the teeth responsible, with the aim of evaluating the capability of three genomic regions of streptococcal DNA to discriminate between participant samples. Bite mark and teeth swabs were collected from 16 participants. Bacterial DNA was extracted to provide the template for PCR primers specific for streptococcal 16S ribosomal RNA (16S rRNA) gene, 16S-23S intergenic spacer (ITS) and RNA polymerase beta subunit (rpoB). High throughput sequencing (GS FLX 454), followed by stringent quality filtering, generated reads from bite marks for comparison to those generated from teeth samples. For all three regions, the greatest overlaps of identical reads were between bite mark samples and the corresponding teeth samples. The average proportions of reads identical between bite mark and corresponding teeth samples were 0.31, 0.41 and 0.31, and for non-corresponding samples were 0.11, 0.20 and 0.016, for 16S rRNA, ITS and rpoB, respectively. The probabilities of correctly distinguishing matching and non-matching teeth samples were 0.92 for ITS, 0.99 for 16S rRNA and 1.0 for rpoB. These findings strongly support the tenet that bacterial DNA amplified from bite marks and teeth can provide corroborating information in the identification of assailants.

  8. Biologically active sequences in the mouse laminin alpha3 chain G domain.

    PubMed

    Urushibata, Shunsuke; Katagiri, Fumihiko; Takaki, Shu; Yamada, Yuji; Fujimori, Chikara; Hozumi, Kentaro; Kikkawa, Yamato; Kadoya, Yuichi; Nomizu, Motoyoshi

    2009-11-10

    The laminin alpha3 chain is mainly expressed at the skin, and its C-terminal G domain has a critical role in multiple biological functions. We screened for biologically active sites on the mouse laminin alpha3 chain G domain using 107 synthetic peptides on coated plates and conjugated to Sepharose beads with HT1080 human fibrosarcoma cells, HaCaT human skin keratinocyte cells, and human dermal fibroblasts (HDFs). Eleven peptides exhibited cell attachment activity with respect to the peptide-coated plates and/or peptide-Sepharose beads. MA3G28 (WTIQTTVDRGLL) strongly binds to HaCaT cells. Four peptides promoted PC12 cell neurite outgrowth. Heparin inhibited attachment of HDFs to eight peptides on the coated plates. In contrast, EDTA significantly inhibited attachment of HDFs to MA3G27 (NAPFPKLSWTIQ) and MA3G28 but had no effect on the attachment of the other peptides. HDF cells formed well-organized actin stress fibers and focal contacts with vinculin accumulation on MA3G27. Additionally, attachment of HDFs to MA3G27 was inhibited by anti-alpha6 and anti-beta1 integrin antibodies, suggesting that MA3G27 promotes alpha6beta1 integrin-mediated cell adhesion. MA3G57 (NQRLASFSNAQQS) exhibited cell attachment activity only in the peptide bead assay. MA3G57 conjugated to a chitosan membrane promoted HDF attachment and spreading with well-organized actin stress fibers. The anti-beta1 integrin antibody partially inhibited attachment of HDFs to the MA3G57-chitosan membrane, suggesting that the MA3G57 site is involved in beta1 integrin-mediated cell attachment. These active sites are likely important in the biological activities of the laminin alpha3 chain G domain and would be useful for the study of molecular mechanisms of laminin-receptor interactions.

  9. Whole-genome sequencing of Bacillus subtilis XF-1 reveals mechanisms for biological control and multiple beneficial properties in plants.

    PubMed

    Guo, Shengye; Li, Xingyu; He, Pengfei; Ho, Honhing; Wu, Yixin; He, Yueqiu

    2015-06-01

    Bacillus subtilis XF-1 is a gram-positive, plant-associated bacterium that stimulates plant growth and produces secondary metabolites that suppress soil-borne plant pathogens. In particular, it is highly efficient at controlling the clubroot disease of cruciferous crops. Its 4,061,186-bp genome contains an estimated 3853 protein-coding sequences and the 1155 genes of XF-1 are present in most genome-sequenced Bacillus strains: 3757 genes in B. subtilis 168, and 1164 in B. amyloliquefaciens FZB42. Analysis using the Cluster of Orthologous Groups database of proteins shows that 60 genes control bacterial mobility, 221 genes are related to cell wall and membrane biosynthesis, and more than 112 genes are associated with secondary metabolites. In addition, the genes contributed to the strain's plant colonization, bio-control and stimulation of plant growth. Sequencing of the genome is a fundamental step for developing a desired strain to serve as an efficient biological control agent and plant growth stimulator. Similar to other members of the taxon, XF-1 has a genome that contains giant gene clusters for the non-ribosomal synthesis of antifungal lipopeptides (surfactin and fengycin), the polyketides (macrolactin and bacillaene), the siderophore bacillibactin, and the dipeptide bacilysin. There are two synthesis pathways for volatile growth-promoting compounds. The expression of biosynthesized antibiotic peptides in XF-1 was revealed by matrix-assisted laser desorption/ionization-time of flight mass spectrometry.

  10. Sequence of a New World primate insulin having low biological potency and immunoreactivity

    SciTech Connect

    Seino, S.; Steiner, D.F.; Bell, G.I.

    1987-11-01

    The organization of the insulin gene of the owl or night monkey (Aotus trivirgatus), a New World primate, is similar to that of the human gene. The sequences of these two genes and flanking regions possess 84.3% homology. An unusual feature of the owl monkey gene is the partial duplication and insertion of a portion of the A-chain coding sequence into the 3' untranslated region. The insulin gene of this primate also lacks a region of tandem repeats that is present in the 5' flanking region of the human and chimpanzee genes. Owl monkey preproinsulin has 85.5% identity with the human insulin precursor and is the most divergent of the primate insulins/preproinsulins yet described. The differences between owl monkey and human preproinsulin include three substitutions in the signal peptide, two in the B chain, seven in the C peptide, and three in the A chain. One of these replacements is the conservative substitution of valine for isoleucine at position A2, an invariant site in all other vertebrate insulins and insulin-like growth factors. The substitutions in owl monkey insulin at B9, B27, A2, A4, and A17 alter its structure so that it has only 20% of the receptor-binding activity and 1% of the affinity with guinea pig anti-porcine insulin antibodies as compared to human insulin.
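
    Percent identity figures such as the one quoted above are computed over an alignment. The following is a minimal sketch only, assuming two sequences that are already aligned to equal length with "-" marking gaps and assuming gapped columns are excluded from the denominator; the function name and the toy peptide strings are hypothetical and are not the insulin sequences from the record.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of non-gap alignment columns where both sequences agree.

    Assumption: columns containing a gap ("-") in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Toy aligned peptides, invented for illustration: one mismatch in ten columns.
print(percent_identity("ACDEFGHIKL", "ACDQFGHIKL"))
```

    Other conventions (for example, counting gap columns as mismatches) give different percentages, so the convention should be stated alongside any reported value.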

  11. Biologically distinct subtypes of Mycobacterium avium differ in possession of insertion sequence IS901.

    PubMed

    Kunze, Z M; Portaels, F; McFadden, J J

    1992-09-01

    Mycobacterium avium causes disease, principally tuberculosis in immunocompromised individuals. It is the most frequent cause of disseminated infections in AIDS patients in the West. The pathogen is also associated with disease in animals, chiefly birds and livestock, and may be isolated from environmental samples such as soil and water. Analysis of strains of M. avium isolated from clinical, veterinary, and environmental sources for the presence of the mycobacterial insertion sequences IS900 and IS901 demonstrates the specific association of IS901 to animal pathogenic M. avium strains. In contrast, most clinical M. avium strains and all AIDS-derived strains examined so far lacked IS901. Significant differences in the plasmid contents and serotypes of strains with and without IS901 were also found. We therefore suggest that the presence of IS901 divides M. avium into two clearly distinct subtypes with differing host range, virulence, plasmid possession, and serotyping antigens. By using DNA sequence data from IS901 and M. avium DNA, a set of polymerase chain reactions were developed for the specific detection and differentiation of these subtypes.

  12. Biological phosphorus and nitrogen removal in sequencing batch reactors: effects of cycle length, dissolved oxygen concentration and influent particulate matter.

    PubMed

    Ginige, Maneesha P; Kayaalp, Ahmet S; Cheng, Ka Yu; Wylie, Jason; Kaksonen, Anna H

    2013-01-01

    Removal of phosphorus (P) and nitrogen (N) from municipal wastewaters is required to mitigate eutrophication of receiving water bodies. While most treatment plants achieve good N removal using influent carbon (C), the use of influent C to facilitate enhanced biological phosphorus removal (EBPR) is poorly explored. A number of operational parameters can facilitate optimum use of influent C and this study investigated the effects of cycle length, dissolved oxygen (DO) concentration during aerobic period and influent solids on biological P and N removal in sequencing batch reactors (SBRs) using municipal wastewaters. Increasing cycle length from 3 to 6 h increased P removal efficiency, which was attributed to a larger portion of N being removed via the nitrite pathway and more biodegradable organic C becoming available for EBPR. Further increasing cycle length from 6 to 8 h decreased P removal efficiencies as the demand for biodegradable organic C for denitrification increased as a result of complete nitrification. Decreasing DO concentration in the aerobic period from 2 to 0.8 mg L⁻¹ increased P removal efficiency but decreased nitrification rates, possibly due to oxygen limitation. Further, sedimented wastewater proved to be a better influent stream than non-sedimented wastewater, possibly due to the detrimental effect of particulate matter on biological nutrient removal.

  13. Sequencing treatment of landfill leachate using ammonia stripping, Fenton oxidation and biological treatment.

    PubMed

    Nurisepehr, Mohammad; Jorfi, Sahand; Rezaei Kalantary, Roshanak; Akbari, Hamideh; Soltani, Reza Darvishi Cheshmeh; Samaei, Mohamad

    2012-09-01

    Landfill leachates contain a wide variety of pollutants such as organic matter, refractory compounds, ammonia, particulate and dissolved solids and hazardous metals requiring application of advanced and well designed treatment processes before release to the environment. The main purpose of this research was to evaluate the efficiency of combined air stripping, Fenton oxidation and biological treatment in treating landfill leachate, especially the elimination of ammonia and refractory organics. The laboratory scale set-up consisted of three sequential but separate steps. The optimum conditions for air stripping and the Fenton oxidation were determined for landfill leachate from Karaj city, Iran. The final step was a moving bed bioreactor with HRTs of 18, 12 and 6 h. The highest NH₃-N removal was 79% in the air stripping process at pH 10.5. At the optimum conditions for the Fenton reaction at a reaction time of 90 min, pH 3 and a H₂O₂/Fe²⁺ mass ratio of 20, the COD removal was 61% and improved the BOD/COD ratio from 0.42 to 0.78. The overall COD removal including the final biological reactor with a HRT of 6 h resulted in an effluent COD concentration of less than 100 mg L⁻¹.

  14. A Biomechanical Comparison of Proportional Electromyography Control to Biological Torque Control Using a Powered Hip Exoskeleton.

    PubMed

    Young, Aaron J; Gannon, Hannah; Ferris, Daniel P

    2017-01-01

    Despite a large increase in robotic exoskeleton research, there are few studies that have examined human performance with different control strategies on the same exoskeleton device. Direct comparison studies are needed to determine how users respond to different types of control. The purpose of this study was to compare user performance using a robotic hip exoskeleton with two different controllers: a controller that targeted a biological hip torque profile and a proportional myoelectric controller. We tested both control approaches on 10 able-bodied subjects using a pneumatically powered hip exoskeleton. The state machine controller targeted a biological hip torque profile. The myoelectric controller used electromyography (EMG) of lower limb muscles to produce a proportional control signal for the hip exoskeleton. Each subject performed two 30-min exoskeleton walking trials (1.0 m/s) using each controller and a 10-min trial with the exoskeleton unpowered. During each trial, we measured subjects' metabolic cost of walking, lower limb EMG profiles, and joint kinematics and kinetics (torques and powers) using a force treadmill and motion capture. Compared to unassisted walking in the exoskeleton, myoelectric control significantly reduced metabolic cost by 13% (p = 0.005) and biological hip torque control reduced metabolic cost by 7% (p = 0.261). Subjects reduced muscle activity relative to the unpowered condition for a greater number of lower limb muscles using myoelectric control compared to the biological hip torque control. More subjects subjectively preferred the myoelectric controller to the biological hip torque control. Myoelectric control had more advantages (metabolic cost and muscle activity reduction) compared to a controller that targeted a biological torque profile for walking with a robotic hip exoskeleton. However, these results were obtained with a single exoskeleton device with specific control configurations while level walking at a

  15. Comparison of Two Massively Parallel Sequencing Platforms using 83 Single Nucleotide Polymorphisms for Human Identification.

    PubMed

    Apaga, Dame Loveliness T; Dennis, Sheila E; Salvador, Jazelyn M; Calacal, Gayvelline C; De Ungria, Maria Corazon A

    2017-03-24

    The potential of Massively Parallel Sequencing (MPS) technology to vastly expand the capabilities of human identification led to the emergence of different MPS platforms that use forensically relevant genetic markers. Two of the MPS platforms that are currently available are the MiSeq® FGx™ Forensic Genomics System (Illumina) and the HID-Ion Personal Genome Machine (PGM)™ (Thermo Fisher Scientific). These are coupled with the ForenSeq™ DNA Signature Prep kit (Illumina) and the HID-Ion AmpliSeq™ Identity Panel (Thermo Fisher Scientific), respectively. In this study, we compared the genotyping performance of the two MPS systems based on 83 SNP markers that are present in both MPS marker panels. Results show that MiSeq® FGx™ has greater sample-to-sample variation than the HID-Ion PGM™ in terms of read counts for all the 83 SNP markers. Allele coverage ratio (ACR) values show generally balanced heterozygous reads for both platforms. Two and four SNP markers from the MiSeq® FGx™ and HID-Ion PGM™, respectively, have average ACR values lower than the recommended value of 0.67. Comparison of genotype calls showed 99.7% concordance between the two platforms.
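
    The allele coverage ratio check mentioned above can be sketched as follows. This is an illustration only: the abstract does not define ACR, so the sketch assumes the common convention of dividing the lower allele read count by the higher one at a heterozygous marker. The marker names and read counts are hypothetical; only the 0.67 threshold comes from the record.

```python
ACR_THRESHOLD = 0.67  # recommended minimum quoted in the abstract


def allele_coverage_ratio(reads_allele_1, reads_allele_2):
    """Lower allele read count divided by the higher one (assumed definition)."""
    low, high = sorted((reads_allele_1, reads_allele_2))
    if high == 0:
        raise ValueError("no reads for either allele")
    return low / high


# Hypothetical heterozygous markers with invented read counts.
markers = {"snp_1": (480, 520), "snp_2": (300, 600)}
for name, (a, b) in markers.items():
    acr = allele_coverage_ratio(a, b)
    status = "balanced" if acr >= ACR_THRESHOLD else "imbalanced"
    print(name, round(acr, 2), status)
```

    Under this definition the ratio always lies between 0 and 1, with 1 meaning both alleles received the same number of reads.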

  16. [Comparison research on two-stage sequencing batch MBR and one-stage MBR].

    PubMed

    Yuan, Xin-Yan; Shen, Heng-Gen; Sun, Lei; Wang, Lin; Li, Shi-Feng

    2011-01-01

    Aiming to resolve problems in MBR operation such as low nitrogen and phosphorus removal efficiency and severe membrane fouling, this paper compares a two-stage sequencing batch MBR (TSBMBR) with a one-stage aerobic MBR. The results indicated that the TSBMBR retained the advantages of the SBR in removing nitrogen and phosphorus, which could make up for the deficiency of the traditional one-stage aerobic MBR in nitrogen and phosphorus removal. During the steady operation period, the average effluent NH₄⁺-N, TN and TP concentrations were 2.83, 12.20 and 0.42 mg/L, respectively, which could meet the domestic requirements for scenic environment use. From the point of view of membrane fouling control, the TSBMBR had lower SMP in the supernatant, a lower specific trans-membrane flux reduction rate and lower membrane fouling resistance than the one-stage aerobic MBR. The sedimentation and gel layer resistances of the TSBMBR were only 6.5% and 33.12% of those of the one-stage aerobic MBR. Besides its high efficiency in removing nitrogen and phosphorus, the TSBMBR could effectively reduce sedimentation and gel layer fouling on the membrane surface. Compared with the one-stage MBR, the TSBMBR could operate with higher trans-membrane flux, a lower membrane fouling rate and better pollutant removal.

  17. Removal of typical endocrine disrupting chemicals by membrane bioreactor: in comparison with sequencing batch reactor.

    PubMed

    Zhou, Yingjun; Huang, Xia; Zhou, Haidong; Chen, Jianhua; Xue, Wenchao

    2011-01-01

    The removal of endocrine disrupting chemicals (EDCs) by a laboratory-scale membrane bioreactor (MBR) fed with synthetic sewage was evaluated and compared with that by a sequencing batch reactor (SBR) operated in parallel under the same conditions. Eight kinds of typical EDCs, including 17β-estradiol (E2), estrone (E1), estriol (E3), 17α-ethinylestradiol (EE2), 4-octylphenol (4-OP), 4-nonylphenol (4-NP), bisphenol A (BPA) and nonylphenol ethoxylates (NPnEO), were spiked into the feed. Their concentrations in influent, effluent and supernatant were determined by gas chromatography-mass spectrometry. The overall estrogenicity was evaluated as 17β-estradiol equivalent quantity (EEQ), determined via yeast estrogen screen (YES) assay. E2, E3, BPA and 4-OP were well removed by both MBR and SBR, with removal rates of more than 95% and no significant differences between the two reactors. However, for the other four EDCs, whose removal rates were lower, the MBR performed better. Comparison between supernatant and effluent of the two reactors indicated that membrane separation of sludge and effluent, compared with sedimentation, can improve elimination of target EDCs and total estrogenicity. By applying different solids retention times (SRTs) (5, 10, 20 and 40 d) to the MBR, 10 and 5 d were found to be the lower critical SRTs for efficient target EDCs and EEQ removal, respectively.

  18. Identification of Simple Sequence Repeat Biomarkers through Cross-Species Comparison in a Tag Cloud Representation

    PubMed Central

    2014-01-01

    Simple sequence repeats (SSRs) are not only applied as genetic markers in evolutionary studies but they also play an important role in gene regulatory activities. Efficient identification of conserved and exclusive SSRs through cross-species comparison is helpful for understanding the evolutionary mechanisms and associations between specific gene groups and SSR motifs. In this paper, we developed an online cross-species comparative system and integrated it with a tag cloud visualization technique for identifying potential SSR biomarkers within fourteen frequently used model species. Ultraconserved or exclusive SSRs among cross-species orthologous genes could be effectively retrieved and displayed through a friendly interface design. Four different types of testing cases were applied to demonstrate and verify the retrieved SSR biomarker candidates. Through statistical analysis and enhanced tag cloud representation on defined functional related genes and cross-species clusters, the proposed system can correctly represent the patterns, loci, colors, and sizes of identified SSRs in accordance with gene functions, pattern qualities, and conserved characteristics among species. PMID:24800246
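
    As an illustration of what identifying an SSR involves, the sketch below scans a DNA string for perfect tandem repeats with a regular expression. It is not the system described in the record: the function name, the unit-length range (1 to 6 bp) and the minimum repeat count are assumptions chosen for the example.

```python
import re


def find_ssrs(seq, min_unit=1, max_unit=6, min_repeats=3):
    """Return (start, unit, repeat_count) for perfect tandem repeats in seq.

    The lazy quantifier makes the regex try the shortest repeat unit first,
    so "AAAAAA" is reported as unit "A" x 6 rather than "AA" x 3.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Toy sequence containing a (CA)4 repeat starting at index 2.
print(find_ssrs("GGCACACACATT"))
```

    Comparing the repeat units and positions returned for orthologous genes across species would be one way to flag conserved or species-exclusive SSRs, which is the kind of comparison the record describes.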

  19. Complete Genome Sequence of the Grouper Iridovirus and Comparison of Genomic Organization with Those of Other Iridoviruses

    PubMed Central

    Tsai, Chih-Tung; Ting, Jing-Wen; Wu, Ming-Hsien; Wu, Ming-Feng; Guo, Ing-Cherng; Chang, Chi-Yao

    2005-01-01

    The complete DNA sequence of grouper iridovirus (GIV) was determined using a whole-genome shotgun approach on virion DNA. The circular form genome was 139,793 bp in length with a 49% G+C content. It contained 120 predicted open reading frames (ORFs) with coding capacities ranging from 62 to 1,268 amino acids. A total of 21% (25 of 120) of GIV ORFs are conserved in the other five sequenced iridovirus genomes, including DNA replication, transcription, nucleotide metabolism, protein modification, viral structure, and virus-host interaction genes. The whole-genome nucleotide pairwise comparison showed that GIV was partially colinear with counterparts of previously sequenced ranaviruses (ATV and TFV). In addition, sequence analysis revealed that GIV possesses several unique features which are different from those of other completely sequenced iridovirus genomes: (i) GIV is the first ranavirus-like virus which has been sequenced completely and which infects fish rather than amphibians, (ii) GIV is the only vertebrate iridovirus without CpG sequence methylation and lacking DNA methyltransferase, (iii) GIV contains a purine nucleoside phosphorylase gene which is not found in other iridoviruses or in any other viruses, (iv) GIV contains 17 sets of repeat sequence, with basic unit sizes ranging from 9 to 63 bp, dispersed throughout the whole genome. These distinctive features of GIV further extend our understanding of molecular events taking place between ranavirus and its hosts and the iridovirus evolution. PMID:15681403
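
    The G+C content quoted above is the fraction of G and C bases in a sequence. A minimal sketch of that calculation follows, assuming ambiguous symbols such as N are excluded from the denominator; the function name and toy sequences are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / counted


# Toy sequence, invented for illustration.
print(gc_content("ATGCGCAT"))
```

    Applied to a full genome sequence, multiplying the result by 100 gives a percentage comparable to the 49% figure reported in the record.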

  20. Evolutionary connections of biological kingdoms based on protein and nucleic acid sequence evidence

    NASA Technical Reports Server (NTRS)

    Dayhoff, M. O.

    1983-01-01

    Prokaryotic and eukaryotic evolutionary trees are developed from protein and nucleic-acid sequences by the methods of numerical taxonomy. Trees are presented for bacterial ferredoxins, 5S ribosomal RNA, c-type cytochromes, cytochromes c2 and c', and 5.8S ribosomal RNA; the implications for early evolution are discussed; and a composite tree showing the branching of the anaerobes, aerobes, archaebacteria, and eukaryotes is shown. Single lines are found for all oxygen-evolving photosynthetic forms and for the salt-loving and high-temperature forms of archaebacteria. It is argued that the eukaryote mitochondria, chloroplasts, and cytoplasmic host material are descended from free-living prokaryotes that formed symbiotic associations, with more than one symbiotic event involved in the evolution of each organelle.

  2. Biological removal of selenate and ammonium by activated sludge in a sequencing batch reactor.

    PubMed

    Mal, J; Nancharaiah, Y V; van Hullebusch, E D; Lens, P N L

    2017-04-01

    Wastewaters contaminated by both selenium and ammonium need to be treated prior to discharge into natural water bodies, but there are no studies on the simultaneous removal of selenium and ammonium. A sequencing batch reactor (SBR) was inoculated with activated sludge and operated for 90 days. The highest ammonium removal efficiency achieved was 98%, while the total nitrogen removal was 75%. Nearly complete chemical oxygen demand removal was attained after 16 days of operation, whereas complete selenate removal was achieved only after 66 days. The highest total Se removal efficiency was 97%. Batch experiments showed that the total Se in the aqueous phase decreased by 21% with increasing initial ammonium concentration from 50 to 100 mg L(-1). This study showed that an SBR can remove both selenate and ammonium via, respectively, bioreduction and partial nitrification-denitrification and thus offers possibilities for treating selenium and ammonium contaminated effluents.

  3. Single-cell RNA-sequencing: The future of genome biology is now.

    PubMed

    Picelli, Simone

    2017-05-04

    Genome-wide single-cell analysis represents the ultimate frontier of genomics research. In particular, single-cell RNA-sequencing (scRNA-seq) studies have been boosted in the last few years by an explosion of new technologies enabling the study of the transcriptomic landscape of thousands of single cells in complex multicellular organisms. More sensitive and automated methods are being continuously developed and promise to deliver better data quality and higher throughput with less hands-on time. The outstanding amount of knowledge that is going to be gained from present and future studies will have a profound impact in many aspects of our society, from the introduction of truly tailored cancer treatments, to a better understanding of antibiotic resistance and host-pathogen interactions; from the discovery of the mechanisms regulating stem cell differentiation to the characterization of the early event of human embryogenesis.

  4. Single-cell RNA-sequencing: The future of genome biology is now

    PubMed Central

    Picelli, Simone

    2017-01-01

    Genome-wide single-cell analysis represents the ultimate frontier of genomics research. In particular, single-cell RNA-sequencing (scRNA-seq) studies have been boosted in the last few years by an explosion of new technologies enabling the study of the transcriptomic landscape of thousands of single cells in complex multicellular organisms. More sensitive and automated methods are being continuously developed and promise to deliver better data quality and higher throughput with less hands-on time. The outstanding amount of knowledge that is going to be gained from present and future studies will have a profound impact in many aspects of our society, from the introduction of truly tailored cancer treatments, to a better understanding of antibiotic resistance and host-pathogen interactions; from the discovery of the mechanisms regulating stem cell differentiation to the characterization of the early event of human embryogenesis. PMID:27442339

  5. Complete amino acid sequence of globin chains and biological activity of fragmented crocodile hemoglobin (Crocodylus siamensis).

    PubMed

    Srihongthong, Saowaluck; Pakdeesuwan, Anawat; Daduang, Sakda; Araki, Tomohiro; Dhiravisit, Apisak; Thammasirirak, Sompong

    2012-08-01

    Hemoglobin, α-chain, β-chain and fragmented hemoglobin of Crocodylus siamensis demonstrated both antibacterial and antioxidant activities. Antibacterial and antioxidant properties of the hemoglobin did not depend on the heme structure but could result from the compositions of amino acid residues and structures present in their primary structure. Furthermore, thirteen purified active peptides were obtained by RP-HPLC analyses, corresponding to fragments in the α-globin chain and the β-globin chain which are mostly located at the N-terminal and C-terminal parts. These active peptides operate on the bacterial cell membrane. The globin chains of Crocodylus siamensis showed similar amino acids to the sequences of Crocodylus niloticus. The novel amino acid substitutions of α-chain and β-chain are not associated with the heme binding site or the bicarbonate ion binding site, but could be important through their interactions with membranes of bacteria.

  6. Easy Bioinformatics Analysis (EBiAn): a package for manipulating and analysis of short biological sequences

    PubMed Central

    Bertucci Barbosa, Luiz Carlos; Garrido, Saulo Santesso; Garcia, Anderson; Delfino, Davi Barbosa; Gonçalves, Rodrigo Duarte; Marchetto, Reinaldo

    2010-01-01

    The work of biochemists and molecular biologists often depends on, or is greatly aided by, preliminary computer analysis. Thus, the development of an efficient and friendly computational tool is very important. In this work, we developed a package of programs in the JavaScript language which can be used online or locally. The programs depend exclusively on Web browsers and are compatible with Internet Explorer, Opera, Mozilla Firefox and Google Chrome. With the EBiAn package it is possible to perform the main analyses and manipulations of DNA, RNA, protein and peptide sequences. The programs can be freely accessed and adapted or modified to generate new programs. Availability: http://www.iq.unesp.br/EXTENSAO/EBiAn/html/ebian.html PMID:21346860

  7. Enhanced biological nutrient removal in sequencing batch reactors operated as static/oxic/anoxic (SOA) process.

    PubMed

    Xu, Dechao; Chen, Hongbo; Li, Xiaoming; Yang, Qi; Zeng, Tianjing; Luo, Kun; Zeng, Guangming

    2013-09-01

    An innovative static/oxic/anoxic (SOA) activated sludge process characterized by a static phase as a substitute for the conventional anaerobic stage was developed to enhance biological nutrient removal (BNR) with influent ammonia of 20 and 40 mg/L in the R1 and R2 reactors, respectively. The results demonstrated that the static phase could function as a conventional anaerobic stage. In R1, the lower influent ammonia concentration facilitated more growth of polyphosphate accumulating organisms (PAOs), but secondary phosphorus release occurred due to NOx(-) depletion during the post-anoxic period. In R2, however, denitrifying phosphorus removal proceeded with sufficient NOx(-). Both R1 and R2 saw simultaneous nitrification-denitrification. Glycogen was utilized to drive post-denitrification with denitrification rates in excess of typical endogenous decay rates. The anoxic stirring duration could be shortened from 3 to 1.5 h to avoid secondary phosphorus release in R1, and little adverse impact was found on nutrient removal in R2.

  8. Biological SOAP servers and web services provided by the public sequence data bank

    PubMed Central

    Sugawara, H.; Miyazaki, S.

    2003-01-01

    A number of biological data resources (i.e. databases and data analytical tools) are searchable and usable on-line thanks to the internet and the World Wide Web (WWW) servers. The output from the web server is easy for us to browse. However, it is laborious and sometimes impossible for us to write a computer program that finds a useful data resource, sends a proper query and processes the output. It is a serious obstacle to the integration of distributed heterogeneous data resources. To solve the issue, we have implemented a SOAP (Simple Object Access Protocol) server and web services that provide a program-friendly interface. The web services are accessible at http://www.xml.nig.ac.jp/. PMID:12824432

  9. RNA-Sequencing Reveals Biological Networks during Table Grapevine (‘Fujiminori’) Fruit Development

    PubMed Central

    Shangguan, Lingfei; Mu, Qian; Fang, Xiang; Zhang, Kekun; Jia, Haifeng; Li, Xiaoying; Bao, Yiqun; Fang, Jinggui

    2017-01-01

    Grapevine berry development is a complex and genetically controlled process, with many morphological, biochemical and physiological changes occurring during the maturation process. Research carried out on grapevine berry development has been mainly concerned with wine grape, while barely focusing on table grape. ‘Fujiminori’ is an important table grapevine cultivar, which is cultivated in most provinces of China. In order to uncover the dynamic networks involved in anthocyanin biosynthesis, cell wall development, lipid metabolism and starch-sugar metabolism in ‘Fujiminori’ fruit, we employed RNA-sequencing (RNA-seq) and analyzed the whole transcriptome of grape berry during development at the expanding period (40 days after full bloom, 40DAF), véraison period (65DAF), and mature period (90DAF). The sequencing depth in each sample was greater than 12×, and the expression levels of nearly half of the expressed genes were greater than 1. Moreover, greater than 64% of the clean reads were aligned to the Vitis vinifera reference genome, and 5,620, 3,381, and 5,196 differentially expressed genes (DEGs) were identified between different fruit stages, respectively. Results of the analysis of DEGs showed that the most significant changes in various processes occurred from the expanding stage to the véraison stage. The expression patterns of F3’H and F3’5’H were crucial in determining red or blue color of the fruit skin. The dynamic networks of cell wall development, lipid metabolism and starch-sugar metabolism were also constructed. A total of 4,934 SSR loci were also identified from 4,337 grapevine genes, which may be helpful for the development of phylogenetic analysis in grapevine and other fruit trees. Our work provides the foundation for developmental research of grapevine fruit as well as other non-climacteric fruits. PMID:28118385

  10. The optimal design of stepped wedge trials with equal allocation to sequences and a comparison to other trial designs.

    PubMed

    Thompson, Jennifer A; Fielding, Katherine; Hargreaves, James; Copas, Andrew

    2017-08-01

    Background/Aims We sought to optimise the design of stepped wedge trials with an equal allocation of clusters to sequences and explored sample size comparisons with alternative trial designs. Methods We developed a new expression for the design effect for a stepped wedge trial, assuming that observations are equally correlated within clusters and an equal number of observations in each period between sequences switching to the intervention. We minimised the design effect with respect to (1) the fraction of observations before the first and after the final sequence switches (the periods with all clusters in the control or intervention condition, respectively) and (2) the number of sequences. We compared the design effect of this optimised stepped wedge trial to the design effects of a parallel cluster-randomised trial, a cluster-randomised trial with baseline observations, and a hybrid trial design (a mixture of cluster-randomised trial and stepped wedge trial) with the same total cluster size for all designs. Results We found that a stepped wedge trial with an equal allocation to sequences is optimised by obtaining all observations after the first sequence switches and before the final sequence switches to the intervention; this means that the first sequence remains in the control condition and the last sequence remains in the intervention condition for the duration of the trial. With this design, the optimal number of sequences is [Formula: see text], where [Formula: see text] is the cluster-mean correlation, [Formula: see text] is the intracluster correlation coefficient, and m is the total cluster size. The optimal number of sequences is small when the intracluster correlation coefficient and cluster size are small and large when the intracluster correlation coefficient or cluster size is large. A cluster-randomised trial remains more efficient than the optimised stepped wedge trial when the intracluster correlation coefficient or cluster size is small. A

  11. Very high resolution single pass HLA genotyping using amplicon sequencing on the 454 next generation DNA sequencers: Comparison with Sanger sequencing.

    PubMed

    Yamamoto, F; Höglund, B; Fernandez-Vina, M; Tyan, D; Rastrou, M; Williams, T; Moonsamy, P; Goodridge, D; Anderson, M; Erlich, H A; Holcomb, C L

    2015-12-01

    Compared to Sanger sequencing, next-generation sequencing offers advantages for high resolution HLA genotyping including increased throughput, lower cost, and reduced genotype ambiguity. Here we describe an enhancement of the Roche 454 GS GType HLA genotyping assay to provide very high resolution (VHR) typing, by the addition of 8 primer pairs to the original 14, to genotype 11 HLA loci. These additional amplicons help resolve common and well-documented alleles and exclude commonly found null alleles in genotype ambiguity strings. Simplification of workflow to reduce the initial preparation effort using early pooling of amplicons or the Fluidigm Access Array™ is also described. Performance of the VHR assay was evaluated on 28 well characterized cell lines using Conexio Assign MPS software which uses genomic, rather than cDNA, reference sequence. Concordance was 98.4%; 1.6% had no genotype assignment. Of concordant calls, 53% were unambiguous. To further assess the assay, 59 clinical samples were genotyped and results compared to unambiguous allele assignments obtained by prior sequence-based typing supplemented with SSO and/or SSP. Concordance was 98.7% with 58.2% as unambiguous calls; 1.3% could not be assigned. Our results show that the amplicon-based VHR assay is robust and can replace current Sanger methodology. Together with software enhancements, it has the potential to provide even higher resolution HLA typing. Copyright © 2015. Published by Elsevier Inc.

  12. Structural Identifiability of Systems Biology Models: A Critical Comparison of Methods

    PubMed Central

    Chis, Oana-Teodora; Banga, Julio R.; Balsa-Canto, Eva

    2011-01-01

    Analysing the properties of a biological system through in silico experimentation requires a satisfactory mathematical representation of the system including accurate values of the model parameters. Fortunately, modern experimental techniques allow obtaining time-series data of appropriate quality which may then be used to estimate unknown parameters. However, in many cases, a subset of those parameters may not be uniquely estimated, independently of the experimental data available or the numerical techniques used for estimation. This lack of identifiability is related to the structure of the model, i.e. the system dynamics plus the observation function. Despite the interest in knowing a priori whether there is any chance of uniquely estimating all model unknown parameters, the structural identifiability analysis for general non-linear dynamic models is still an open question. There is no method amenable to every model, thus at some point we have to face the selection of one of the possibilities. This work presents a critical comparison of the currently available techniques. To this end, we perform the structural identifiability analysis of a collection of biological models. The results reveal that the generating series approach, in combination with identifiability tableaus, offers the most advantageous compromise among range of applicability, computational complexity and information provided. PMID:22132135

  13. Comparison of the Biological Characteristics of Mesenchymal Stem Cells Derived from Bone Marrow and Skin

    PubMed Central

    Liu, Ruifeng; Chang, Wenjuan; Wei, Hong; Zhang, Kaiming

    2016-01-01

    Mesenchymal stem cells (MSCs) exhibit high proliferation and self-renewal capabilities and are critical for tissue repair and regeneration during ontogenesis. They also play a role in immunomodulation. MSCs can be isolated from a variety of tissues and have many potential applications in the clinical setting. However, MSCs of different origins may possess different biological characteristics. In this study, we performed a comprehensive comparison of MSCs isolated from bone marrow and skin (BMMSCs and SMSCs, resp.), including analysis of the skin sampling area, separation method, culture conditions, primary and passage culture times, cell surface markers, multipotency, cytokine secretion, gene expression, and fibroblast-like features. The results showed that the MSCs from both sources had similar cell morphologies, surface markers, and differentiation capacities. However, the two cell types exhibited major differences in growth characteristics; the primary culture time of BMMSCs was significantly shorter than that of SMSCs, whereas the growth rate of BMMSCs was lower than that of SMSCs after passaging. Moreover, differences in gene expression and cytokine secretion profiles were observed. For example, secretion of proliferative cytokines was significantly higher for SMSCs than for BMMSCs. Our findings provide insights into the different biological functions of both cell types. PMID:27239202

  14. Interlaboratory comparison of size and surface charge measurements on nanoparticles prior to biological impact assessment

    NASA Astrophysics Data System (ADS)

    Roebben, G.; Ramirez-Garcia, S.; Hackley, V. A.; Roesslein, M.; Klaessig, F.; Kestens, V.; Lynch, I.; Garner, C. M.; Rawle, A.; Elder, A.; Colvin, V. L.; Kreyling, W.; Krug, H. F.; Lewicka, Z. A.; McNeil, S.; Nel, A.; Patri, A.; Wick, P.; Wiesner, M.; Xia, T.; Oberdörster, G.; Dawson, K. A.

    2011-07-01

    The International Alliance for NanoEHS Harmonization (IANH) organises interlaboratory comparisons of methods used to study the potential biological impacts of nanomaterials. The aim of IANH is to identify and reduce or remove sources of variability and irreproducibility in existing protocols. Here, we present results of the first IANH round robin studies into methods to assess the size and surface charge of suspended nanoparticles. The test materials used (suspensions of gold, silica, polystyrene, and ceria nanoparticles, with [primary] particle sizes between 10 nm and 80 nm) were first analysed in repeatability conditions to assess the possible contribution of between-sample heterogeneity to the between-laboratory variability. Reproducibility of the selected methods was investigated in an interlaboratory comparison between ten different laboratories in the USA and Europe. Robust statistical analysis was used to evaluate within- and between-laboratory variability. It is shown that, if detailed shipping, measurement, and reporting protocols are followed, measurement of the hydrodynamic particle diameter of nanoparticles in predispersed monomodal suspensions using the dynamic light scattering method is reproducible. On the other hand, measurements of more polydisperse suspensions of nanoparticle aggregates or agglomerates were not reproducible between laboratories. Ultrasonication, which is commonly used to prepare dispersions before cell exposures, was observed to further increase variability. The variability of the zeta potential values, which were also measured, indicates the need to define better surface charge test protocols and to identify sources of variability.

  15. Biological treatment of oil field wastewater in a sequencing batch reactor.

    PubMed

    Freire, D D; Cammarota, M C; Santanna, G L

    2001-10-01

    This work reports the results of experiments carried out in a sequencing batch reactor (SBR) operated under 24 hour cycles, treating an effluent containing a mixture of oil field wastewater and sewage, in different percentages. The removal of phenols, ammonium and COD was monitored in several experimental runs, varying the dilution degree of the oilfield wastewater (10 to 45% v/v). The volatile suspended solids (VSS) content in the reactor was also monitored and the protein (PTN) and polysaccharide (PS) contents of the suspended biomass were determined. The removal of ammonium and phenols did not vary significantly in the experimental runs, attaining average values of 95% and 65%, respectively. COD removal efficiencies in the range of 30 to 50% were attained in the experiments carried out with dilution percentages of 45 and 35% (v/v) respectively. An experiment carried out with a lower proportion of produced water (15% v/v), keeping the salinity level corresponding to a higher proportion of industrial effluent (45% v/v), led to an improvement in the COD removal, indicating that the recalcitrance of the organic compounds found in the effluent is the main cause of the moderate COD removal efficiencies attained in the SBR system. With regard to the composition of the microbial flocs, no significant variation was observed in the PS/PTN, PS/VSS and PTN/VSS ratios when the effluent composition changed (increased salinity and levels of organic material).

  16. Biological nutrient removal from meat processing wastewater using a sequencing batch reactor.

    PubMed

    Thayalakumaran, N; Bhamidimarri, R; Bickers, P O

    2003-01-01

    Meat processing effluents are rich in nutrients (nitrogen: 75-200 mg L(-1) and phosphorus: 20-40 mg L(-1)) and COD (800-2,000 mg L(-1)) after primary treatment. A laboratory scale sequencing batch reactor (SBR) was operated for the treatment of a beef processing effluent from slaughtering and boning operations. An effective SBR cycle was found for removal of COD, nitrogen and phosphorus at 22 degrees C. The solid retention time was 15 days while the hydraulic retention time (HRT) was 2.5 days. The total nitrogen in the wastewater was reduced to less than 10 mg L(-1), while the total phosphorus decreased to less than 1.0 mg L(-1). The residual effluent soluble COD was found to be non-biodegradable as reflected by no further soluble COD removal following prolonged aeration. Removal of biodegradable soluble COD, ammonia nitrogen and soluble phosphate phosphorus of greater than 99% was achieved in the SBR. Good prediction of ammonia and nitrate nitrogen removal was obtained using IWA Activated Sludge Model. The operating cycle is shown to be appropriate to achieve simultaneous removal of COD and nutrients from the meat processing wastewater. Alkalinity and pH have an inverse relationship during the initial anaerobic and aerobic stages due to production and stripping of CO2. Use of a low level of DO in the final aerobic stage ensured complete ammonia removal and enhanced denitrification.

  17. Automatic summarization of changes in biological image sequences using algorithmic information theory.

    PubMed

    Cohen, Andrew R; Bjornsson, Christopher S; Temple, Sally; Banker, Gary; Roysam, Badrinath

    2009-08-01

    An algorithmic information-theoretic method is presented for object-level summarization of meaningful changes in image sequences. Object extraction and tracking data are represented as an attributed tracking graph (ATG). Time courses of object states are compared using an adaptive information distance measure, aided by a closed-form multidimensional quantization. The notion of meaningful summarization is captured by using the gap statistic to estimate the randomness deficiency from algorithmic statistics. The summary is the clustering result and feature subset that maximize the gap statistic. This approach was validated on four bioimaging applications: 1) It was applied to a synthetic data set containing two populations of cells differing in the rate of growth, for which it correctly identified the two populations and the single feature out of 23 that separated them; 2) it was applied to 59 movies of three types of neuroprosthetic devices being inserted in the brain tissue at three speeds each, for which it correctly identified insertion speed as the primary factor affecting tissue strain; 3) when applied to movies of cultured neural progenitor cells, it correctly distinguished neurons from progenitors without requiring the use of a fixative stain; and 4) when analyzing intracellular molecular transport in cultured neurons undergoing axon specification, it automatically confirmed the role of kinesins in axon specification.
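
    The abstract compares object time courses with an "adaptive information distance measure" but does not give its form. As a loose illustration of the general idea, the sketch below uses the normalized compression distance (NCD), a well-known compressor-based approximation to information distance. Treating NCD with zlib as a stand-in is an assumption of this example, and the function names and toy byte strings are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form of data."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 otherwise."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Toy inputs (assumptions): a highly regular series and an irregular one.
regular = b"ACGT" * 200
irregular = bytes(range(256))

same_distance = ncd(regular, regular)     # small: the second copy adds little
cross_distance = ncd(regular, irregular)  # large: the inputs share no structure
```

    In a pipeline like the one described, pairwise distances of this kind would feed a clustering step, with the gap statistic used to choose the clustering and feature subset; those steps are not sketched here.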

  18. Effect of redox conditions on pharmaceutical loss during biological wastewater treatment using sequencing batch reactors.

    PubMed

    Stadler, Lauren B; Su, Lijuan; Moline, Christopher J; Ernstoff, Alexi S; Aga, Diana S; Love, Nancy G

    2015-01-23

    We lack a clear understanding of how wastewater treatment plant (WWTP) process parameters, such as redox environment, impact pharmaceutical fate. WWTPs increasingly install more advanced aeration control systems to save energy and achieve better nutrient removal performance. The impact of redox condition, and specifically the use of microaerobic (low dissolved oxygen) treatment, is poorly understood. In this study, the fate of a mixture of pharmaceuticals and several of their transformation products present in the primary effluent of a local WWTP was assessed in sequencing batch reactors operated under different redox conditions: fully aerobic, anoxic/aerobic, and microaerobic (DO concentration ≈0.3 mg/L). Among the pharmaceuticals that were tracked during this study (atenolol, trimethoprim, sulfamethoxazole, desvenlafaxine, venlafaxine, and phenytoin), overall loss varied between them and between redox environments. Losses of atenolol and trimethoprim were highest in the aerobic reactor; sulfamethoxazole loss was highest in the microaerobic reactors; and phenytoin was recalcitrant in all reactors. Transformation products of sulfamethoxazole and desvenlafaxine resulted in the reformation of their parent compounds during treatment. The results suggest that transformation products must be accounted for when assessing removal efficiencies and that redox environment influences the degree of pharmaceutical loss. Copyright © 2014 Elsevier B.V. All rights reserved.

  19. Uncovering disease mechanisms through network biology in the era of Next Generation Sequencing

    NASA Astrophysics Data System (ADS)

    Piñero, Janet; Berenstein, Ariel; Gonzalez-Perez, Abel; Chernomoretz, Ariel; Furlong, Laura I.

    2016-04-01

    Characterizing the behavior of disease genes in the context of biological networks has the potential to shed light on disease mechanisms, and to reveal both new candidate disease genes and therapeutic targets. Previous studies addressing the network properties of disease genes have produced contradictory results. Here we have explored the causes of these discrepancies and assessed the relationship between the network roles of disease genes and their tolerance to deleterious germline variants in human populations leveraging on: the abundance of interactome resources, a comprehensive catalog of disease genes and exome variation data. We found that the most salient network features of disease genes are driven by cancer genes and that genes related to different types of diseases play network roles whose centrality is inversely correlated to their tolerance to likely deleterious germline mutations. This proved to be a multiscale signature, including global, mesoscopic and local network centrality features. Cancer driver genes, the most sensitive to deleterious variants, occupy the most central positions, followed by dominant disease genes and then by recessive disease genes, which are tolerant to variants and isolated within their network modules.

  20. Uncovering disease mechanisms through network biology in the era of Next Generation Sequencing

    PubMed Central

    Piñero, Janet; Berenstein, Ariel; Gonzalez-Perez, Abel; Chernomoretz, Ariel; Furlong, Laura I.

    2016-01-01

    Characterizing the behavior of disease genes in the context of biological networks has the potential to shed light on disease mechanisms, and to reveal both new candidate disease genes and therapeutic targets. Previous studies addressing the network properties of disease genes have produced contradictory results. Here we have explored the causes of these discrepancies and assessed the relationship between the network roles of disease genes and their tolerance to deleterious germline variants in human populations leveraging on: the abundance of interactome resources, a comprehensive catalog of disease genes and exome variation data. We found that the most salient network features of disease genes are driven by cancer genes and that genes related to different types of diseases play network roles whose centrality is inversely correlated to their tolerance to likely deleterious germline mutations. This proved to be a multiscale signature, including global, mesoscopic and local network centrality features. Cancer driver genes, the most sensitive to deleterious variants, occupy the most central positions, followed by dominant disease genes and then by recessive disease genes, which are tolerant to variants and isolated within their network modules. PMID:27080396

  1. Full genome sequence of Brevibacillus laterosporus strain B9, a biological control strain isolated from Zhejiang, China.

    PubMed

    Li, Gengmi; Xu, Jie; Wu, Liwen; Ren, Deyong; Ye, Weijun; Dong, Guojun; Zhu, Li; Zeng, Dali; Guo, Longbiao

    2015-08-10

    Brevibacillus laterosporus, reclassified from Bacillus laterosporus, has the potential to be used as a biological control agent in crop fields. B. laterosporus strain B9 is an aerobic, motile, Gram-positive, spore-forming rod that was isolated from a field of Oryza sativa in Zhejiang, China in 2011. This bacterium has been confirmed to be a strong antagonist against bacterial brown stripe of rice caused by Acidovorax avenae subsp. avenae. Here we describe the features of B. laterosporus strain B9, together with the complete genome sequence and its annotation. The 5,272,435 bp genome contains 4804 protein-coding genes and 227 RNA-only encoding genes with 2 plasmids.

  2. Biological treatment of produced water in a sequencing batch reactor by a consortium of isolated halophilic microorganisms.

    PubMed

    Pendashteh, A R; Fakhru'l-Razi, A; Chuah, T G; Radiah, A B Dayang; Madaeni, S S; Zurina, Z A

    2010-10-01

    Produced water or oilfield wastewater is the largest-volume waste stream associated with oil and gas production. The aim of this study was to investigate the biological pretreatment of synthetic and real produced water in a sequencing batch reactor (SBR) to remove hydrocarbon compounds. The SBR was inoculated with isolated tropical halophilic microorganisms capable of degrading crude oil. A total sequence of 24 h (60 min filling phase; 21 h aeration; 60 min settling and 60 min decant phase) was employed and studied. Synthetic produced water was treated with various organic loading rates (OLR) (0.9 kg COD m(-3) d(-1), 1.8 kg COD m(-3) d(-1) and 3.6 kg COD m(-3) d(-1)) and different total dissolved solids (TDS) concentrations (35,000 mg L(-1), 100,000 mg L(-1), 150,000 mg L(-1), 200,000 mg L(-1) and 250,000 mg L(-1)). It was found that with an OLR of 0.9 kg COD m(-3) d(-1) and 1.8 kg COD m(-3) d(-1), average oil and grease (O&G) concentrations in the effluent were 7 mg L(-1) and 12 mg L(-1), respectively. At a TDS concentration of 35,000 mg L(-1) and an OLR of 1.8 kg COD m(-3) d(-1), COD and O&G removal efficiencies were more than 90%. However, with an increase in salt content to 250,000 mg L(-1), COD and O&G removal efficiencies decreased to 74% and 63%, respectively. The results of biological treatment of real produced water showed that the removal rates of the main pollutants of wastewater, such as COD, TOC and O&G, were above 81%, 83%, and 85%, respectively.

  3. Multiple Comparison Analysis of Two New Genomic Sequences of ILTV Strains from China with Other Strains from Different Geographic Regions.

    PubMed

    Zhao, Yan; Kong, Congcong; Wang, Yunfeng

    2015-01-01

    To date, twenty complete genome sequences of ILTV strains have been published in GenBank, including one strain from China and nineteen strains from Australia and the United States. To investigate the genomic information on ILTVs from different geographic regions, two additional individual complete genome sequences of WG and K317 strains from China were determined. The genomes of WG and K317 strains were 153,505 and 153,639 bp in length, respectively. Alignments performed on the amino acid sequences of the twelve glycoproteins showed that 13 out of 116 mutational sites were present only among the Chinese strain WG and the Australian strains SA2 and A20. The phylogenetic tree analysis suggested that the WG strain established close relationships with the Australian strain SA2. The recombination events were detected and confirmed in different subregions of the WG strain with the sequences of SA2 and K317 strains as parental. In this study, two new complete genome sequences of Chinese ILTV strains were used in comparative analysis with other complete genome sequences of ILTV strains from China, the United States, and Australia. The analysis of genome comparison, phylogenetic trees, and recombination events showed a close relationship between the Chinese strain WG and the Australian strain SA2. The information from the two new complete genome sequences from China will help to facilitate the analysis of phylogenetic relationships and the molecular differences among ILTV strains from different geographic regions.

  4. Biological degradation of catechol in wastewater using the sequencing continuous-inflow reactor (SCR)

    PubMed Central

    2013-01-01

    Catechol is used in many industries. It can be removed from wastewater by various methods, but biological processes are the superior and most commonly used technology. The SCR is a modified form of SBR used to degrade catechol. The objective of this study was to investigate the performance of SCR for biodegradation and mineralization of catechol under various inlet concentrations (630–1500 mg/L) and hydraulic retention times (HRT) (18–9 h). This study used a bench scale SCR setup to test catechol degradation. The acclimation time of biomass for catechol degradation at 630 mg/L was 41 d. The SCR operating cycle time was 6 h and the consecutive times taken for aerating, settling and decanting were 4, 1.5 and 0.5 h, respectively. This study investigated the effects of inlet catechol concentration (630–1560 mg/L) and HRT (18–9 h). The average catechol removal efficiencies in steady-state conditions of 630, 930, 12954 and 1559 mg/L of catechol were 98.5%, 98.5%, 98.2% and 96.9% in terms of catechol and 97.8%, 97.7%, 96.4% and 94.3% for COD, respectively. SCR with acclimated biomasses could effectively remove the catechol and the corresponding COD from wastewater with concentrations of up to 1560 mg/L, at a loading rate of 5.38 kg COD/m3.d and at an HRT of up to 13 h. The HRT was determined to be an important variable affecting catechol removal from wastewater. Reducing the HRT to below 13 h led to reduced removal of catechol and COD. PMID:24499534

  5. From First Base: The Sequence of the Tip of the X Chromosome of Drosophila melanogaster, a Comparison of Two Sequencing Strategies

    PubMed Central

    Benos, Panayiotis V.; Gatt, Melanie K.; Murphy, Lee; Harris, David; Barrell, Bart; Ferraz, Concepcion; Vidal, Sophie; Brun, Christine; Demaille, Jacques; Cadieu, Edouard; Dreano, Stephane; Gloux, Stéphanie; Lelaure, Valerie; Mottier, Stephanie; Galibert, Francis; Borkova, Dana; Miñana, Belen; Kafatos, Fotis C.; Bolshakov, Slava; Sidén-Kiamos, Inga; Papagiannakis, George; Spanos, Lefteris; Louis, Christos; Madueño, Encarnación; de Pablos, Beatriz; Modolell, Juan; Peter, Annette; Schöttler, Petra; Werner, Meike; Mourkioti, Fotini; Beinert, Nicole; Dowe, Gordon; Schäfer, Ulrich; Jäckle, Herbert; Bucheton, Alain; Callister, Debbie; Campbell, Lorna; Henderson, Nadine S.; McMillan, Paul J.; Salles, Cathy; Tait, Evelyn; Valenti, Phillipe; Saunders, Robert D.C.; Billaud, Alain; Pachter, Lior; Glover, David M.; Ashburner, Michael

    2001-01-01

    We present the sequence of a contiguous 2.63 Mb of DNA extending from the tip of the X chromosome of Drosophila melanogaster. Within this sequence, we predict 277 protein coding genes, of which 94 had been sequenced already in the course of studying the biology of their gene products, and examples of 12 different transposable elements. We show that an interval between bands 3A2 and 3C2, believed in the 1970s to show a correlation between the number of bands on the polytene chromosomes and the 20 genes identified by conventional genetics, is predicted to contain 45 genes from its DNA sequence. We have determined the insertion sites of P-elements from 111 mutant lines, about half of which are in a position likely to affect the expression of novel predicted genes, thus representing a resource for subsequent functional genomic analysis. We compare the European Drosophila Genome Project sequence with the corresponding part of the independently assembled and annotated Joint Sequence determined through “shotgun” sequencing. Discounting differences in the distribution of known transposable elements between the strains sequenced in the two projects, we detected three major sequence differences, two of which are probably explained by errors in assembly; the origin of the third major difference is unclear. In addition there are eight sequence gaps within the Joint Sequence. At least six of these eight gaps are likely to be sites of transposable elements; the other two are complex. Of the 275 genes in common to both projects, 60% are identical within 1% of their predicted amino-acid sequence and 31% show minor differences such as in choice of translation initiation or termination codons; the remaining 9% show major differences in interpretation. [All of the sequences analyzed in this paper have been deposited in the EMBL-Bank database under the following accession nos.: AL009146, AL009147, AL009171, AL009188–AL009196, AL021067, AL021086, AL021106–AL021108, AL021726, AL

  6. Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications

    USDA-ARS's Scientific Manuscript database

    Analysis of DNA methylation patterns relies increasingly on sequencing-based profiling methods. The four most frequently used sequencing-based technologies are the bisulfite-based methods MethylC-seq and reduced representation bisulfite sequencing (RRBS), and the enrichment-based techniques methylat...

  7. The detection of microRNA associated with Alzheimer's disease in biological fluids using next-generation sequencing technologies

    PubMed Central

    Cheng, Lesley; Quek, Camelia Y. J.; Sun, Xin; Bellingham, Shayne A.; Hill, Andrew F.

    2013-01-01

    Diagnostic tools for neurodegenerative diseases such as Alzheimer's disease (AD) currently involve subjective neuropsychological testing and specialized brain imaging techniques. While definitive diagnosis requires a pathological brain evaluation at autopsy, neurodegenerative changes are believed to begin years before the clinical presentation of cognitive decline. Therefore, there is an essential need for reliable biomarkers to aid in the early detection of disease in order to implement preventative strategies. microRNAs (miRNA) are small non-coding RNA species that are involved in post-transcriptional gene regulation. Expression levels of miRNAs have potential as diagnostic biomarkers as they are known to circulate and tissue specific profiles can be identified in a number of bodily fluids such as plasma, CSF and urine. Recent developments in deep sequencing technology present a viable approach to develop biomarker discovery pipelines in order to profile miRNA signatures in bodily fluids specific to neurodegenerative diseases. Here we review the potential use of miRNA deep sequencing in biomarker identification from biological fluids and its translation into clinical practice. PMID:23964286

  8. Effect of different carbon sources on the biological phosphorus removal by a sequencing batch reactor using pressurized pure oxygen

    PubMed Central

    Wei, Jie; Imai, Tsuyoshi; Higuchi, Takaya; Arfarita, Novi; Yamamoto, Koichi; Sekine, Masahiko; Kanno, Ariyo

    2014-01-01

    The effect of different carbon sources on the efficiency of enhanced biological phosphorus removal (EBPR) from synthetic wastewater with acetate and two ratios of acetate/starch as a carbon source was investigated. Three pressurized pure oxygen sequencing batch reactor (POSBR) experiments were operated. The reactors (POSBR1, POSBR2 and POSBR3) were developed and studied at different carbon source ratios of 100% acetate, 75% acetate plus 25% starch and 50% acetate plus 50% starch, respectively. The results showed that POSBR1 had a higher phosphate release-to-uptake ratio and, consequently, a much higher phosphorus removal efficiency (93.8%) than POSBR2 (84.7%) and POSBR3 (77.3%) within 30 days of operation. This indicated that the phosphorus removal efficiency decreased as the starch concentration increased. It was also found that POSBR1 produced more polyhydroxyalkanoates (PHAs) than the other reactors. Based on the effect of the carbon source on the PHA concentration and consumption, the conditions of POSBR1 were favourable for the growth of polyphosphate-accumulating organisms and therefore beneficial for the biological phosphorus removal process. PMID:26019532

  9. Effect of different carbon sources on the biological phosphorus removal by a sequencing batch reactor using pressurized pure oxygen.

    PubMed

    Wei, Jie; Imai, Tsuyoshi; Higuchi, Takaya; Arfarita, Novi; Yamamoto, Koichi; Sekine, Masahiko; Kanno, Ariyo

    2014-05-04

    The effect of different carbon sources on the efficiency of enhanced biological phosphorus removal (EBPR) from synthetic wastewater with acetate and two ratios of acetate/starch as a carbon source was investigated. Three pressurized pure oxygen sequencing batch reactor (POSBR) experiments were operated. The reactors (POSBR1, POSBR2 and POSBR3) were developed and studied at different carbon source ratios of 100% acetate, 75% acetate plus 25% starch and 50% acetate plus 50% starch, respectively. The results showed that POSBR1 had a higher phosphate release-to-uptake ratio and, consequently, a much higher phosphorus removal efficiency (93.8%) than POSBR2 (84.7%) and POSBR3 (77.3%) within 30 days of operation. This indicated that the phosphorus removal efficiency decreased as the starch concentration increased. It was also found that POSBR1 produced more polyhydroxyalkanoates (PHAs) than the other reactors. Based on the effect of the carbon source on the PHA concentration and consumption, the conditions of POSBR1 were favourable for the growth of polyphosphate-accumulating organisms and therefore beneficial for the biological phosphorus removal process.

  10. Automatic annotation for biological sequences by extraction of keywords from MEDLINE abstracts. Development of a prototype system.

    PubMed

    Andrade, M A; Valencia, A

    1997-01-01

    We have developed a prototype for the automatic annotation of functional characteristics in protein families. The system is able to extract biological information directly from scientific literature in the form of MEDLINE abstracts. The criterion for selecting relevant keywords is the difference between their frequency in the abstracts associated with the protein family under study and their frequency in other unrelated protein families. The concept of functional information associated with protein families is the key feature of our system and brings evolutionary information into the problem of functional annotation of biological sequences. The system has been tested in two different scenarios: first, a large set of protein families with a small number of abstracts per family and second, selected protein families with a large number of abstracts attached to each one. In both cases the performances are compared with annotations provided by human experts, showing a clear relation between the amount of information provided to the system and the quality of the annotations. The automatic annotations are in many cases of similar quality to the ones contained in current databases. The possibilities and difficulties to be encountered during the development of a full system for automatic annotation are discussed.
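
    The keyword criterion described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact scoring formula, so the version below is an assumption: each word is scored by the fraction of a family's abstracts that contain it minus the fraction of unrelated abstracts that contain it. The function name `keyword_scores`, the whitespace tokenization, and the toy abstracts are all hypothetical.

```python
from collections import Counter


def keyword_scores(family_abstracts, background_abstracts):
    """Score each word by its document frequency in the family's abstracts
    minus its document frequency in unrelated (background) abstracts.

    Assumed scoring rule for illustration only; both lists must be non-empty.
    """
    def doc_freq(abstracts):
        counts = Counter()
        for text in abstracts:
            # Count each word at most once per abstract.
            counts.update(set(text.lower().split()))
        total = len(abstracts)
        return {word: c / total for word, c in counts.items()}

    family = doc_freq(family_abstracts)
    background = doc_freq(background_abstracts)
    return {word: f - background.get(word, 0.0) for word, f in family.items()}


# Toy data (hypothetical): two abstracts for one family, two unrelated ones.
family = ["kinase phosphorylates the substrate", "the kinase binds atp"]
background = ["the ribosome binds rna", "the membrane protein"]

scores = keyword_scores(family, background)
top_keyword = max(scores, key=scores.get)
```

    In this toy case a word that is common everywhere, such as "the", scores zero, while a family-specific word such as "kinase" scores highest.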

  11. The performances of the chi-square test and complexity measures for signal recognition in biological sequences.

    PubMed

    Pirhaji, Leila; Kargar, Mehdi; Sheari, Armita; Poormohammadi, Hadi; Sadeghi, Mehdi; Pezeshk, Hamid; Eslahchi, Changiz

    2008-03-21

    With large amounts of experimental data, modern molecular biology needs appropriate methods to deal with biological sequences. In this work, we apply a statistical method (Pearson's chi-square test) to recognize the signals that appear in the whole genome of Escherichia coli. To show the effectiveness of the method, we compare the Pearson's chi-square test with linguistic complexity on the complete genome of E. coli. The results suggest that Pearson's chi-square test is an efficient method for distinguishing genes (coding regions) from pseudogenes (noncoding regions). On the other hand, the performance of the linguistic complexity is much lower than that of the chi-square test method. We also use the Pearson's chi-square test method to determine which parts of the Open Reading Frame (ORF) have a significant effect on discriminating genes from pseudogenes. Moreover, different complexity measures and Pearson's chi-square test were applied to the genes with a high value of Pearson's chi-square statistic. We also compute the measures on homologs of these genes. The results illustrate that there is a region near the start codon with a high value of the chi-square statistic and low complexity that is conserved among homologous genes.
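
    As a rough illustration of the statistic named in this abstract, the sketch below computes Pearson's chi-square statistic, the sum over categories of (observed - expected)^2 / expected, for the nucleotide counts of a sequence. The choice of single-nucleotide categories and a uniform background is an assumption made for the example; the abstract does not state which categories or expected frequencies the authors used, and `chi_square_stat` is a hypothetical name.

```python
from collections import Counter


def chi_square_stat(sequence, expected_freqs):
    """Pearson's chi-square statistic of symbol counts in `sequence`
    against the expected relative frequencies in `expected_freqs`."""
    counts = Counter(sequence)
    n = len(sequence)
    stat = 0.0
    for symbol, p in expected_freqs.items():
        expected = n * p
        observed = counts.get(symbol, 0)
        stat += (observed - expected) ** 2 / expected
    return stat


# Assumed background: all four nucleotides equally likely.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

balanced = chi_square_stat("ACGTACGT", uniform)  # composition matches background
skewed = chi_square_stat("AAAAAAAT", uniform)    # strongly biased composition
```

    A sequence whose composition matches the background gives a statistic of zero, and the value grows as the composition departs from it.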

  12. The complete genome sequences of poxviruses isolated from a penguin and a pigeon in South Africa and comparison to other sequenced avipoxviruses.

    PubMed

    Offerman, Kristy; Carulei, Olivia; van der Walt, Anelda Philine; Douglass, Nicola; Williamson, Anna-Lise

    2014-06-12

    Two novel avipoxviruses from South Africa have been sequenced, one from a Feral Pigeon (Columba livia) (FeP2) and the other from an African penguin (Spheniscus demersus) (PEPV). We present a purpose-designed bioinformatics pipeline for analysis of next generation sequence data of avian poxviruses and compare the different avipoxviruses sequenced to date with specific emphasis on their evolution and gene content. The FeP2 (282 kbp) and PEPV (306 kbp) genomes encode 271 and 284 open reading frames respectively and are more closely related to one another (94.4%) than to either fowlpox virus (FWPV) (85.3% and 84.0% respectively) or Canarypox virus (CNPV) (62.0% and 63.4% respectively). Overall, FeP2, PEPV and FWPV have syntenic gene arrangements; however, major differences exist throughout their genomes. The most striking difference between FeP2 and the FWPV-like avipoxviruses is a large deletion of ~16 kbp from the central region of the genome of FeP2 deleting a cc-chemokine-like gene, two Variola virus B22R orthologues, an N1R/p28-like gene and a V-type Ig domain family gene. FeP2 and PEPV both encode orthologues of vaccinia virus C7L and Interleukin 10. PEPV contains a 77 amino acid long orthologue of Ubiquitin sharing 97% amino acid identity to human ubiquitin. The genome sequences of FeP2 and PEPV have greatly added to the limited repository of genomic information available for the Avipoxvirus genus. In the comparison of FeP2 and PEPV to existing sequences, FWPV and CNPV, we have established insights into African avipoxvirus evolution. Our data supports the independent evolution of these South African avipoxviruses from a common ancestral virus to FWPV and CNPV.

  13. Complete genome sequence of virulent duck enteritis virus (DEV) strain 2085 and comparison with genome sequences of virulent and attenuated DEV strains.

    PubMed

    Wang, Jichun; Höper, Dirk; Beer, Martin; Osterrieder, Nikolaus

    2011-09-01

    We here report the complete genome sequence of the duck enteritis virus (DEV) wild-type strain 2085, an avian herpesvirus (GenBank ID: JF999965). The nucleotide sequence was derived from the 2085 genome cloned as an infectious bacterial artificial chromosome (BAC) clone. The DEV 2085 genome is 160,649 bp in length and encodes 78 predicted open reading frames (ORFs), a number identical to that identified for the attenuated DEV VAC strain (GenBank ID: EU082088.2). Comparison of the genome sequences of DEV 2085 and VAC with partial sequences of the virulent CHv strain and the attenuated strain Clone-03 was carried out to identify nucleotide or amino acid polymorphisms that potentially contribute to DEV virulence. No amino acid changes were identified in 24 of the 78 ORFs, a result indicating high conservation in DEV independently of strain origin or virulence. In addition, 39 ORFs contain non-synonymous nucleotide substitutions, while 15 ORFs had nucleotide insertions or deletions, frame-shift mutations and/or non-synonymous nucleotide substitutions with an effect on ORF initiation or termination. In 7 of the 15 ORFs with high and 27 of the 39 ORFs with low variability, polymorphisms were exclusively found in DEV 2085, a finding that is likely a result of the different origins of 2085 (Europe) and of VAC, Clone-03 and CHv (Eastern Asia). Five ORFs (UL2, UL12, US10, UL47 and UL41) with polymorphisms were identical between the virulent DEV 2085 and CHv but different from VAC or Clone-03. They, individually or in combination, may therefore represent DEV virulence factors. Our comparative analysis of four DEV sequences provides a comprehensive overview of DEV genome structure and identifies ORFs that are changed during serial virus passage.

  14. Origin and evolution of viruses: escaped DNA/RNA sequences as evolutionary accelerators and natural biological weapons.

    PubMed

    Bubanovic, Ivan; Najman, Stevo; Andjelkovic, Zlatibor

    2005-01-01

    Knowledge of the origin and evolution of viruses could provide a better understanding of a number of phenomena in the field of evolution such as the origin and development of multi-cellular organisms, the rapid diversification of species over the last 600-700 million years and the lack of transitional forms in the evolution of species ("missing links") etc. One of the possible effects of escaped DNA/RNA sequences or viruses on the evolution of multi-cellular organisms, especially vertebrates, could be the phenomenon of horizontal transmission and dissemination of genes. Interestingly, if so, this effect could be considered as a model of primeval and natural genetic engineering. Other possible links between the evolution of multi-cellular organisms and viruses are connected with the fact that viruses represent the source of different forms of selective pressure such as epidemics of infectious diseases, autoimmunity, malignant alteration, reproductive efficiency, etc. At the same time, these two models of "long-term evolutionary relations" could represent "key factors" in the evolution between viruses and multi-cellular organisms. The capability of a genome to produce and emit DNA/RNA sequences or de novo created viruses which can be a vector of horizontal gene transmission and/or cause selective pressure on concurrent or predator species gives a new characteristic to viruses--the possibility of their acting as natural biological weapons. Finally, possible evolutionary advantages of this genome capability could be one of the explanations for phenomena such as genome instability and its ability to emit DNA/RNA sequences and/or de novo created viruses, as well as the evolutionary conservation of this unique phenomenon.

  15. Solid-State and Biological Nanopore for Real-Time Sensing of Single Chemical and Sequencing of DNA

    PubMed Central

    Haque, Farzin; Li, Jinghong; Wu, Hai-Chen; Liang, Xing-Jie; Guo, Peixuan

    2013-01-01

    Sensitivity and specificity are the two most important factors to take into account for molecule sensing, chemical detection and disease diagnosis. A perfect sensitivity is to reach the level where a single molecule can be detected. An ideal specificity is to reach the level where the substance can be detected in the presence of many contaminants. The rapidly progressing nanopore technology is approaching this threshold. A wide assortment of biomotors and cellular pores in living organisms perform diverse biological functions. The elegant design of these transportation machineries has inspired the development of single molecule detection based on modulations of the individual current blockage events. The dynamic growth of nanotechnology and nanobiotechnology has stimulated rapid advances in the study of nanopore based instrumentation over the last decade, and inspired great interest in sensing of single molecules including ions, nucleotides, enantiomers, drugs, and polymers such as PEG, RNA, DNA, and polypeptides. This sensing technology has been extended to medical diagnostics and third generation high throughput DNA sequencing. This review covers current nanopore detection platforms including both biological pores and solid state counterparts. Several biological nanopores have been studied over the years, but this review will focus on the three best characterized systems including α-hemolysin and MspA, both containing a smaller channel for the detection of single-strand DNA, as well as bacteriophage phi29 DNA packaging motor connector that contains a larger channel for the passing of double stranded DNA. The advantages and disadvantages of each system are compared; their current and potential applications in nanomedicine, biotechnology, and nanotechnology are discussed. PMID:23504223

  16. Comparing Effects of Biologic Agents in Treating Patients with Rheumatoid Arthritis: A Multiple Treatment Comparison Regression Analysis

    PubMed Central

    Tvete, Ingunn Fride; Natvig, Bent; Gåsemyr, Jørund; Meland, Nils; Røine, Marianne; Klemp, Marianne

    2015-01-01

    Rheumatoid arthritis patients have been treated with disease modifying anti-rheumatic drugs (DMARDs) and the newer biologic drugs. We sought to compare and rank the biologics with respect to efficacy. We performed a literature search identifying 54 publications encompassing 9 biologics. We conducted a multiple treatment comparison regression analysis letting the number experiencing a 50% improvement on the ACR score be dependent upon dose level and disease duration for assessing the comparable relative effect between biologics and placebo or DMARD. The analysis embraced all treatment and comparator arms over all publications. Hence, all measured effects of any biologic agent contributed to the comparison of all biologic agents relative to each other either given alone or combined with DMARD. We found the drug effect to be dependent on dose level, but not on disease duration, and the impact of a high versus low dose level was the same for all drugs (higher doses indicated a higher frequency of ACR50 scores). The ranking of the drugs when given without DMARD was certolizumab (ranked highest), etanercept, tocilizumab/ abatacept and adalimumab. The ranking of the drugs when given with DMARD was certolizumab (ranked highest), tocilizumab, anakinra, rituximab, golimumab/ infliximab/ abatacept, adalimumab/ etanercept. Still, all drugs were effective. All biologic agents were effective compared to placebo, with certolizumab the most effective and adalimumab (without DMARD treatment) and adalimumab/ etanercept (combined with DMARD treatment) the least effective. The drugs were in general more effective, except for etanercept, when given together with DMARDs. PMID:26356639

  17. GENE SEQUENCE HOMOLOGY OF CHEMOKINES ACROSS SPECIES

    USDA-ARS's Scientific Manuscript database

    The abundance of expressed gene and protein sequences available in the biological information databases facilitates comparison of protein homologies. A high degree of sequence similarity typically implies homology regarding structure and function and may provide clues to antibody cross-react...

  18. Nucleic acid sequence of an internal image-bearing monoclonal anti-idiotype and its comparison to the sequence of the external antigen.

    PubMed Central

    Bruck, C; Co, M S; Slaoui, M; Gaulton, G N; Smith, T; Fields, B N; Mullins, J I; Greene, M I

    1986-01-01

    The monoclonal anti-idiotypic antibody (mAb2) 87.92.6 directed against the 9B.G5 antibody specific for the virus neutralizing epitope on the mammalian reovirus type 3 hemagglutinin was previously demonstrated to express an internal image of the receptor binding epitope of the reovirus type 3. Furthermore, this mAb2 has autoimmune reactivity to the cell surface receptor of the reovirus. The nucleotide and deduced amino acid sequences of the 87.92.6 mAb2 heavy and light chains are described in this report. The sequence analysis reveals that the same heavy chain variable and joining (VH and JH) gene segments are used by the 87.92.6 anti-idiotypic mAb2 and by the dominant idiotypes of the BALB/c anti-GAT (cGAT) and anti-NP (NPa) responses. [GAT; random polymer that is 60% glutamic acid, 30% alanine, and 10% tyrosine. NP; (4-hydroxy-3-nitrophenyl)-acetyl.] Despite extensive homology at the level of the heavy chain variable regions, the NPa positive BALB/c anti-NP monoclonal antibody 17.2.25 binds neither 9B.G5 nor the cellular receptor for the hemagglutinin. Amino acid sequence comparison between the viral hemagglutinin and the 87.92.6 mAb2 light chain "internal image," reveals an area of significant homology indicating that antigen mimicry by antibodies may be achieved by sharing primary structure. PMID:2428036

  19. HIV drug resistance testing among patients failing second line antiretroviral therapy. Comparison of in-house and commercial sequencing.

    PubMed

    Chimukangara, Benjamin; Varyani, Bhavini; Shamu, Tinei; Mutsvangwa, Junior; Manasa, Justen; White, Elizabeth; Chimbetete, Cleophas; Luethy, Ruedi; Katzenstein, David

    2017-05-01

    HIV genotyping is often unavailable in low and middle-income countries due to infrastructure requirements and cost. We compared genotype resistance testing in patients with virologic failure, by amplification of HIV pol gene, followed by "in-house" sequencing and commercial sequencing. Remnant plasma samples from adults and children failing second-line ART were amplified and sequenced using in-house and commercial di-deoxysequencing, and analyzed in Harare, Zimbabwe and at Stanford, U.S.A, respectively. HIV drug resistance mutations were determined using the Stanford HIV drug resistance database. Twenty-six of 28 samples were amplified and 25 were successfully genotyped. The average percent nucleotide and amino acid identities between 23 pairs sequenced in both laboratories were 99.51 (±0.56) and 99.11 (±0.95), respectively. All pairs clustered together in phylogenetic analysis. Sequencing analysis identified 6/23 pairs with mutation discordances resulting in differences in phenotype, but these did not impact future regimens. The results demonstrate our ability to produce good quality drug resistance data in-house. Despite discordant mutations in some sequence pairs, the phenotypic predictions were not clinically significant. Copyright © 2016 Elsevier B.V. All rights reserved.
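
    The pairwise identity figures in this abstract are averages over sequence pairs. A minimal sketch of that kind of summary follows, assuming each pair has already been aligned to equal length and that identity is the percentage of matching positions; the abstract does not describe the authors' actual procedure, and the function name and toy sequences are hypothetical.

```python
from statistics import mean, stdev


def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length aligned sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Toy pairs (hypothetical): the same sample sequenced in two laboratories.
pairs = [
    ("ACGTACGTAC", "ACGTACGTAC"),  # identical
    ("ACGTACGTAC", "ACGTACGTAT"),  # one mismatch in ten positions
]

identities = [percent_identity(a, b) for a, b in pairs]
average_identity = mean(identities)
spread = stdev(identities)  # sample standard deviation across pairs
```

    Reporting the mean together with the spread across pairs mirrors the "value (± deviation)" form used in the abstract.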

  20. Quantification of layered patterns with structural anisotropy: a comparison of biological and geological systems.

    PubMed

    Smolyar, I; Bromage, T; Wikelski, M

    2016-03-01

    Large-scale patterns evident from satellite images of aeolian landforms on Earth and other planets; those of intermediate scale in marine and terrestrial sand ripples and sediment profiles; and small-scale patterns such as lamellae in the bones of vertebrates and annuli in fish scales are each represented by layers of different thicknesses and lengths. Layered patterns are important because they form a record of the state of internal and external factors that regulate pattern formation in these geological and biological systems. It is therefore potentially possible to recognize trends, periodicities, and events in the history of the formation of these systems among the incremental sequences. Though the structures and sizes of these 2-D patterns are typically scale-free, they are also characteristically anisotropic; that is, the number of layers and their absolute thicknesses vary significantly during formation. The aim of the present work is to quantify the structure of layered patterns and to reveal similarities and differences in the processing and interpretation of layered landforms and biological systems. To reach this goal we used N-partite graphs and Boolean functions to quantify the structure of layers and plot charts for "layer thickness vs. layer number" and "layer area vs. layer number". These charts serve as a source of information about events in the history of formation of layered systems. The concept of synchronization of layer formation across a 2-D plane is introduced to develop the procedure for plotting "layer thickness vs. layer number" and "layer area vs. layer number", which takes into account the structural anisotropy of layered patterns and increases the signal-to-noise ratio in charts. Examples include landforms on Mars and Earth and incremental layers in human and iguana bones.

  1. A comparison of the biological activity of 2 formulations of enoxaparin in 12 healthy volunteers.

    PubMed

    Sharma, Vineeta; Madhu, Sirisha; Natarajan, Parthiban; Muniyandi, Ganesan; Jaiswal, Vijaya; Saxena, Renu

    2010-08-01

    India is one of the few countries where biosimilar enoxaparin is available for clinical use. Despite its availability for the past 4 to 5 years, there is a paucity of published literature regarding its biological activity. The aim of the current study is to compare the biological activity of an endogenously developed formulation of enoxaparin with the branded formulation. Twelve healthy male volunteers received 1 subcutaneous injection of 2 different formulations of enoxaparin in a randomized, open-label, balanced, 2-treatment, 2-period, 2-sequence, cross-over study. The test formulation was Injection Troynoxa (enoxaparin sodium 40 mg/0.4 mL, Troikaa Pharmaceuticals Ltd., India) and the reference formulation was Injection Clexane (enoxaparin sodium 40 mg/0.4 mL, Sanofi-Aventis, UK). The plasma anti-Xa activity and activated partial thromboplastin time (aPTT) were estimated on a fully automated coagulometer predose and at 2, 4, 6, 8, and 10 hours following dosing with 40 mg/0.4 mL of enoxaparin. The results of mixed model analysis of repeated measures analysis of variance (ANOVA) for estimating the difference between least square means of test and reference formulations, at all time points, showed no significant differences in anti-Xa activity and plasma aPTT levels. Both formulations were well tolerated and there were no bleeding episodes. After a single-dose injection in healthy participants, anti-Xa activities of the 2 formulations of LMWH enoxaparin were comparable. No significant difference was observed in the mean plasma aPTT. It remains to be seen whether the 2 formulations would show comparable clinical efficacy.

  2. Comparison of base composition analysis and Sanger sequencing of mitochondrial DNA for four U.S. population groups.

    PubMed

    Kiesler, Kevin M; Coble, Michael D; Hall, Thomas A; Vallone, Peter M

    2014-01-01

    A set of 711 samples from four U.S. population groups was analyzed using a novel mass spectrometry based method for mitochondrial DNA (mtDNA) base composition profiling. Comparison of the mass spectrometry results with Sanger sequencing derived data yielded a concordance rate of 99.97%. Length heteroplasmy was identified in 46% of samples and point heteroplasmy was observed in 6.6% of samples in the combined mass spectral and Sanger data set. Using discrimination capacity as a metric, Sanger sequencing of the full control region had the highest discriminatory power, followed by the mass spectrometry base composition method, which was more discriminating than Sanger sequencing of just the hypervariable regions. This trend is in agreement with the number of nucleotides covered by each of the three assays.

  3. Comparison of sequences of RNAs 3 and 4 of rice stripe virus from China with those of Japanese isolates.

    PubMed

    Qu, Z; Liang, D; Harper, G; Hull, R

    1997-01-01

    The sequences were determined of RNAs 3 and 4 of a Chinese isolate (Y) of rice stripe tenuivirus (RStV) and were compared with those of two RStV isolates (M and T) from Japan. Both RNAs of the Y isolate were longer than those of the M and T isolates. There was almost complete conservation in the 5' and 3' non-coding regions for each RNA between the isolates. The analogous ambisense coding regions for each isolate were exactly the same size and the sequences were highly conserved. The major differences were in the intergenic regions, the sizes of which accounted for the differences in size of each RNA of the three isolates. There were no obvious patterns of differences in comparisons of the two RNA over the three isolates. The significance of the similarities and differences in sequences of isolates of RStV separated by more than 3500 km is discussed.

  4. Intrahost evolution of envelope glycoprotein and OrfA sequences after experimental infection of cats with a molecular clone and a biological isolate of feline immunodeficiency virus.

    PubMed

    Huisman, Willem; Schrauwen, Eefje J A; Rimmelzwaan, Guus F; Osterhaus, Albert D M E

    2008-10-01

    Feline immunodeficiency virus (FIV) is a member of the genus Lentivirus and causes AIDS-like disease in its natural host, the cat. Like other lentiviruses, FIV displays a high degree of nucleotide sequence variability that is reflected in both the geographic distribution of the viruses and the different cat species that are infected. Although a lot of data on sequence variation at the population level is available, relatively little is known about the intrahost variation of FIV sequences. In the present study, cats were infected with either a biological isolate of FIV or a molecular clone that was derived from the same isolate, AM19. After infection, the cats were monitored for up to 3 years and at various time points sequences were obtained of virus circulating in the plasma. Regions of the env gene and the orfA gene were amplified, cloned and their nucleotide sequence analyzed. Furthermore, the extent of sequence variation in the original inocula was also determined. It was found that FIV displays relatively little sequence variation during infection of its host, both in the env and the orfA gene, especially after infection with molecular clone 19k1. Although the extent of variation was higher after infection with biological isolate AM19, a large portion of these variant sequences was already present in the inoculum.

  5. Comparison of Illumina and 454 Deep Sequencing in Participants Failing Raltegravir-Based Antiretroviral Therapy

    PubMed Central

    Li, Jonathan Z.; Chapman, Brad; Charlebois, Patrick; Hofmann, Oliver; Weiner, Brian; Porter, Alyssa J.; Samuel, Reshmi; Vardhanabhuti, Saran; Zheng, Lu; Eron, Joseph; Taiwo, Babafemi; Zody, Michael C.; Henn, Matthew R.; Kuritzkes, Daniel R.; Hide, Winston; Wilson, Cara C.; Berzins, Baiba I.; Acosta, Edward P.; Bastow, Barbara; Kim, Peter S.; Read, Sarah W.; Janik, Jennifer; Meres, Debra S.; Lederman, Michael M.; Mong-Kryspin, Lori; Shaw, Karl E.; Zimmerman, Louis G.; Leavitt, Randi; De La Rosa, Guy; Jennings, Amy

    2014-01-01

    Background: The impact of raltegravir-resistant HIV-1 minority variants (MVs) on raltegravir treatment failure is unknown. Illumina sequencing offers greater throughput than 454, but sequence analysis tools for viral sequencing are needed. We evaluated Illumina and 454 for the detection of HIV-1 raltegravir-resistant MVs. Methods: A5262 was a single-arm study of raltegravir and darunavir/ritonavir in treatment-naïve patients. Pre-treatment plasma was obtained from 5 participants with raltegravir resistance at the time of virologic failure. A control library was created by pooling integrase clones at predefined proportions. Multiplexed sequencing was performed with Illumina and 454 platforms at comparable costs. Illumina sequence analysis was performed with the novel snp-assess tool and 454 sequencing was analyzed with V-Phaser. Results: Illumina sequencing resulted in significantly higher sequence coverage and a 0.095% limit of detection. Illumina accurately detected all MVs in the control library at ≥0.5% and 7/10 MVs expected at 0.1%. 454 sequencing failed to detect any MVs at 0.1% with 5 false positive calls. For MVs detected in the patient samples by both 454 and Illumina, the correlation in the detected variant frequencies was high (R2 = 0.92, P<0.001). Illumina sequencing detected 2.4-fold greater nucleotide MVs and 2.9-fold greater amino acid MVs compared to 454. The only raltegravir-resistant MV detected was an E138K mutation in one participant by Illumina sequencing, but not by 454. Conclusions: In participants of A5262 with raltegravir resistance at virologic failure, baseline raltegravir-resistant MVs were rarely detected. At comparable costs to 454 sequencing, Illumina demonstrated greater depth of coverage, increased sensitivity for detecting HIV MVs, and fewer false positive variant calls. PMID:24603872

  6. Comparison of illumina and 454 deep sequencing in participants failing raltegravir-based antiretroviral therapy.

    PubMed

    Li, Jonathan Z; Chapman, Brad; Charlebois, Patrick; Hofmann, Oliver; Weiner, Brian; Porter, Alyssa J; Samuel, Reshmi; Vardhanabhuti, Saran; Zheng, Lu; Eron, Joseph; Taiwo, Babafemi; Zody, Michael C; Henn, Matthew R; Kuritzkes, Daniel R; Hide, Winston; Wilson, Cara C; Berzins, Baiba I; Acosta, Edward P; Bastow, Barbara; Kim, Peter S; Read, Sarah W; Janik, Jennifer; Meres, Debra S; Lederman, Michael M; Mong-Kryspin, Lori; Shaw, Karl E; Zimmerman, Louis G; Leavitt, Randi; De La Rosa, Guy; Jennings, Amy

    2014-01-01

    The impact of raltegravir-resistant HIV-1 minority variants (MVs) on raltegravir treatment failure is unknown. Illumina sequencing offers greater throughput than 454, but sequence analysis tools for viral sequencing are needed. We evaluated Illumina and 454 for the detection of HIV-1 raltegravir-resistant MVs. A5262 was a single-arm study of raltegravir and darunavir/ritonavir in treatment-naïve patients. Pre-treatment plasma was obtained from 5 participants with raltegravir resistance at the time of virologic failure. A control library was created by pooling integrase clones at predefined proportions. Multiplexed sequencing was performed with Illumina and 454 platforms at comparable costs. Illumina sequence analysis was performed with the novel snp-assess tool and 454 sequencing was analyzed with V-Phaser. Illumina sequencing resulted in significantly higher sequence coverage and a 0.095% limit of detection. Illumina accurately detected all MVs in the control library at ≥0.5% and 7/10 MVs expected at 0.1%. 454 sequencing failed to detect any MVs at 0.1% with 5 false positive calls. For MVs detected in the patient samples by both 454 and Illumina, the correlation in the detected variant frequencies was high (R2 = 0.92, P<0.001). Illumina sequencing detected 2.4-fold greater nucleotide MVs and 2.9-fold greater amino acid MVs compared to 454. The only raltegravir-resistant MV detected was an E138K mutation in one participant by Illumina sequencing, but not by 454. In participants of A5262 with raltegravir resistance at virologic failure, baseline raltegravir-resistant MVs were rarely detected. At comparable costs to 454 sequencing, Illumina demonstrated greater depth of coverage, increased sensitivity for detecting HIV MVs, and fewer false positive variant calls.

  7. Comparison of topological clustering within protein networks using edge metrics that evaluate full sequence, full structure, and active site microenvironment similarity

    PubMed Central

    Leuthaeuser, Janelle B; Knutson, Stacy T; Kumar, Kiran; Babbitt, Patricia C; Fetrow, Jacquelyn S

    2015-01-01

    The development of accurate protein function annotation methods has emerged as a major unsolved biological problem. Protein similarity networks, one approach to function annotation via annotation transfer, group proteins into similarity-based clusters. An underlying assumption is that the edge metric used to identify such clusters correlates with functional information. In this contribution, this assumption is evaluated by observing topologies in similarity networks using three different edge metrics: sequence (BLAST), structure (TM-Align), and active site similarity (active site profiling, implemented in DASP). Network topologies for four well-studied protein superfamilies (enolase, peroxiredoxin (Prx), glutathione transferase (GST), and crotonase) were compared with curated functional hierarchies and structure. As expected, network topology differs, depending on edge metric; comparison of topologies provides valuable information on structure/function relationships. Subnetworks based on active site similarity correlate with known functional hierarchies at a single edge threshold more often than sequence- or structure-based networks. Sequence- and structure-based networks are useful for identifying sequence and domain similarities and differences; therefore, it is important to consider the clustering goal before deciding appropriate edge metric. Further, conserved active site residues identified in enolase and GST active site subnetworks correspond with published functionally important residues. Extension of this analysis yields predictions of functionally determinant residues for GST subgroups. These results support the hypothesis that active site similarity-based networks reveal clusters that share functional details and lay the foundation for capturing functionally relevant hierarchies using an approach that is both automatable and can deliver greater precision in function annotation than current similarity-based methods. PMID:26073648

  8. Comparison of topological clustering within protein networks using edge metrics that evaluate full sequence, full structure, and active site microenvironment similarity.

    PubMed

    Leuthaeuser, Janelle B; Knutson, Stacy T; Kumar, Kiran; Babbitt, Patricia C; Fetrow, Jacquelyn S

    2015-09-01

    The development of accurate protein function annotation methods has emerged as a major unsolved biological problem. Protein similarity networks, one approach to function annotation via annotation transfer, group proteins into similarity-based clusters. An underlying assumption is that the edge metric used to identify such clusters correlates with functional information. In this contribution, this assumption is evaluated by observing topologies in similarity networks using three different edge metrics: sequence (BLAST), structure (TM-Align), and active site similarity (active site profiling, implemented in DASP). Network topologies for four well-studied protein superfamilies (enolase, peroxiredoxin (Prx), glutathione transferase (GST), and crotonase) were compared with curated functional hierarchies and structure. As expected, network topology differs, depending on edge metric; comparison of topologies provides valuable information on structure/function relationships. Subnetworks based on active site similarity correlate with known functional hierarchies at a single edge threshold more often than sequence- or structure-based networks. Sequence- and structure-based networks are useful for identifying sequence and domain similarities and differences; therefore, it is important to consider the clustering goal before deciding appropriate edge metric. Further, conserved active site residues identified in enolase and GST active site subnetworks correspond with published functionally important residues. Extension of this analysis yields predictions of functionally determinant residues for GST subgroups. These results support the hypothesis that active site similarity-based networks reveal clusters that share functional details and lay the foundation for capturing functionally relevant hierarchies using an approach that is both automatable and can deliver greater precision in function annotation than current similarity-based methods. © 2015 The Authors Protein Science

  9. Comparison of Exome and Genome Sequencing Technologies for the Complete Capture of Protein-Coding Regions.

    PubMed

    Lelieveld, Stefan H; Spielmann, Malte; Mundlos, Stefan; Veltman, Joris A; Gilissen, Christian

    2015-08-01

    For next-generation sequencing technologies, sufficient base-pair coverage is the foremost requirement for the reliable detection of genomic variants. We investigated whether whole-genome sequencing (WGS) platforms offer improved coverage of coding regions compared with whole-exome sequencing (WES) platforms, and compared single-base coverage for a large set of exome and genome samples. We find that WES platforms have improved considerably in the last years, but at comparable sequencing depth, WGS outperforms WES in terms of covered coding regions. At higher sequencing depth (95x-160x), WES successfully captures 95% of the coding regions with a minimal coverage of 20x, compared with 98% for WGS at 87-fold coverage. Three different assessments of sequence coverage bias showed consistent biases for WES but not for WGS. We found no clear differences for the technologies concerning their ability to achieve complete coverage of 2,759 clinically relevant genes. We show that WES performs comparable to WGS in terms of covered bases if sequenced at two to three times higher coverage. This does, however, go at the cost of substantially more sequencing biases in WES approaches. Our findings will guide laboratories to make an informed decision on which sequencing platform and coverage to choose.
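
    The coverage figures quoted above (for example, the share of coding bases reaching a minimal coverage of 20x) reduce to a simple per-base count. The following is only a minimal sketch of that arithmetic, not the authors' pipeline; the function name and the toy depth values are hypothetical.

```python
def fraction_covered(depths, min_depth=20):
    """Return the fraction of positions whose read depth is >= min_depth."""
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)

# Hypothetical per-base depths for a short coding region (not study data).
toy_depths = [25, 30, 10, 20]
print(fraction_covered(toy_depths))  # share of bases at >= 20x
```

    In practice the per-base depths would come from an alignment file restricted to the coding regions of interest; that step is outside the scope of this sketch.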

  10. Comparison of Exome and Genome Sequencing Technologies for the Complete Capture of Protein‐Coding Regions

    PubMed Central

    Lelieveld, Stefan H.; Spielmann, Malte; Mundlos, Stefan; Veltman, Joris A.

    2015-01-01

    For next‐generation sequencing technologies, sufficient base‐pair coverage is the foremost requirement for the reliable detection of genomic variants. We investigated whether whole‐genome sequencing (WGS) platforms offer improved coverage of coding regions compared with whole‐exome sequencing (WES) platforms, and compared single‐base coverage for a large set of exome and genome samples. We find that WES platforms have improved considerably in the last years, but at comparable sequencing depth, WGS outperforms WES in terms of covered coding regions. At higher sequencing depth (95x–160x), WES successfully captures 95% of the coding regions with a minimal coverage of 20x, compared with 98% for WGS at 87‐fold coverage. Three different assessments of sequence coverage bias showed consistent biases for WES but not for WGS. We found no clear differences for the technologies concerning their ability to achieve complete coverage of 2,759 clinically relevant genes. We show that WES performs comparable to WGS in terms of covered bases if sequenced at two to three times higher coverage. This does, however, go at the cost of substantially more sequencing biases in WES approaches. Our findings will guide laboratories to make an informed decision on which sequencing platform and coverage to choose. PMID:25973577

  11. Protein Sequence Comparison Based on Physicochemical Properties and the Position-Feature Energy Matrix

    PubMed Central

    Yu, Lulu; Zhang, Yusen; Gutman, Ivan; Shi, Yongtang; Dehmer, Matthias

    2017-01-01

    We develop a novel position-feature-based model for protein sequences by employing physicochemical properties of 20 amino acids and the measure of graph energy. The method puts the emphasis on sequence order information and describes local dynamic distributions of sequences, from which one can get a characteristic B-vector. Afterwards, we apply the relative entropy to the sequences representing B-vectors to measure their similarity/dissimilarity. The numerical results obtained in this study show that the proposed method leads to meaningful results compared with competitors such as Clustal W. PMID:28393857
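
    The abstract compares sequences by applying relative entropy to their characteristic vectors. As a rough illustration of that step only, the sketch below computes the standard relative entropy (Kullback-Leibler divergence) between two non-negative vectors after normalizing them to sum to 1. How the B-vectors are built, and whether the authors normalize or symmetrize in this way, is not stated in the abstract; the function name and inputs are assumptions.

```python
import math

def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Both vectors are normalized to sum to 1 first. Every entry of q must be
    positive wherever the matching entry of p is positive.
    """
    sum_p, sum_q = sum(p), sum(q)
    total = 0.0
    for a, b in zip(p, q):
        if a > 0:
            pa, qb = a / sum_p, b / sum_q
            total += pa * math.log2(pa / qb)
    return total

# Hypothetical feature vectors for two sequences (not from the paper).
vec_a = [1.0, 0.0]
vec_b = [1.0, 1.0]
print(relative_entropy(vec_a, vec_b))
```

    Relative entropy is not symmetric, so a dissimilarity matrix built from it usually needs a symmetrized form such as D(p || q) + D(q || p).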

  12. Protein Sequence Comparison Based on Physicochemical Properties and the Position-Feature Energy Matrix.

    PubMed

    Yu, Lulu; Zhang, Yusen; Gutman, Ivan; Shi, Yongtang; Dehmer, Matthias

    2017-04-10

    We develop a novel position-feature-based model for protein sequences by employing physicochemical properties of 20 amino acids and the measure of graph energy. The method puts the emphasis on sequence order information and describes local dynamic distributions of sequences, from which one can get a characteristic B-vector. Afterwards, we apply the relative entropy to the sequences representing B-vectors to measure their similarity/dissimilarity. The numerical results obtained in this study show that the proposed method leads to meaningful results compared with competitors such as Clustal W.

  13. Technologically important extremophile 16S rRNA sequence Shannon entropy and fractal property comparison with long term dormant microbes

    NASA Astrophysics Data System (ADS)

    Holden, Todd; Gadura, N.; Dehipawala, S.; Cheung, E.; Tuffour, M.; Schneider, P.; Tremberger, G., Jr.; Lieberman, D.; Cheung, T.

    2011-10-01

    Technologically important extremophiles including oil eating microbes, uranium and rocket fuel perchlorate reduction microbes, electron producing microbes and electrode electrons feeding microbes were compared in terms of their 16S rRNA sequences, a standard targeted sequence in comparative phylogeny studies. Microbes that were reported to have survived a prolonged dormant duration were also studied. Examples included the recently discovered microbe that survives after 34,000 years in a salty environment while feeding off organic compounds from other trapped dead microbes. Shannon entropy of the 16S rRNA nucleotide composition and fractal dimension of the nucleotide sequence in terms of its atomic number fluctuation analyses suggest a selected range for these extremophiles as compared to other microbes; consistent with the experience of relatively mild evolutionary pressure. However, most of the microbes that have been reported to survive in prolonged dormant duration carry sequences with fractal dimension between 1.995 and 2.005 (N = 10 out of 13). Similar results are observed for halophiles, red-shifted chlorophyll and radiation resistant microbes. The results suggest that prolonged dormant duration, by analogy with a high-salt or high-radiation environment, would select high fractal 16S rRNA sequences. Path analysis in structural equation modeling supports a causal relation between entropy and fractal dimension for the studied 16S rRNA sequences (N = 7). Candidate choices for high fractal 16S rRNA microbes could offer protection for prolonged spaceflights. BioBrick gene network manipulation could include extremophile 16S rRNA sequences in synthetic biology and shed more light on exobiology and future colonization in shielded spaceflights. Whether the high fractal 16S rRNA sequences contain an asteroidlike extra-terrestrial source could be speculative but interesting.
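
    One of the two measures named above, the Shannon entropy of nucleotide composition, is straightforward to compute. The sketch below shows the standard formula H = -Σ p_i log2 p_i over the observed base frequencies; it does not cover the fractal-dimension analysis, and the authors' exact procedure (windowing, alphabet handling) is not given in the abstract. The function name and the toy sequence are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(sequence: str) -> float:
    """Shannon entropy (bits per symbol) of the symbol composition of a sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Hypothetical toy fragment, not a real 16S rRNA sequence.
toy_fragment = "ACGUACGUAACC"
print(round(shannon_entropy(toy_fragment), 3))
```

    For a four-letter alphabet the value ranges from 0 bits (a single repeated base) to 2 bits (all four bases equally frequent).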

  14. Pleistocene glaciation of volcano Ajusco, central Mexico, and comparison with the standard Mexican glacial sequence

    NASA Astrophysics Data System (ADS)

    White, Sidney E.; Valastro, Salvatore

    1984-01-01

    Three Pleistocene glaciations and two Holocene Neoglacial advances occurred on volcano Ajusco in central Mexico. Lateral moraines of the oldest glaciation, the Marqués, above 3250 m are made of light-gray indurated till and are extensively modified by erosion. Below 3200 m the till is dark red, decomposed, and buried beneath volcanic colluvium and tephra. Very strongly to strongly developed soil profiles (Inceptisols) have formed in the Marqués till and in overlying colluvia and tephra. Large sharp-crested moraines of the second glaciation, the Santo Tomás, above 3300 m are composed of pale-brown firm till and are somewhat eroded by gullies. Below 3250 m the till is light reddish brown, cemented, and weathered. Less-strongly developed soil profiles (Inceptisols) have formed in the Santo Tomás till and in overlying colluvia and tephra. Narrow-crested moraines of yellowish-brown loose till of the third glaciation, the Albergue, are uneroded. Weakly developed soil profiles (Inceptisols) in the Albergue till have black ash in the upper horizon. Two small Neoglacial moraines of yellowish-brown bouldery till on the cirque floor of the largest valley support weakly developed soil profiles with only A and Cox horizons and no ash in the upper soil horizons. Radiocarbon dating of organic matter of the B horizons developed in tills, volcanic ash, and colluvial volcanic sand includes ages for both the soil-organic residue and the humic-acid fraction, with differences from 140 to 660 yr. The dating provides minimum ages of about 27,000 yr for the Marqués glaciation and about 25,000 yr for the Santo Tomás glaciation. Dates for the overlying tephra indicate a complex volcanic history for at least another 15,000 yr. Comparison of the Ajusco glacial sequence with that on Iztaccíhuatl to the east suggests that the Marqués and Santo Tomás glaciations may be equivalent to the Diamantes glaciation First and Second advances, the Albergue to the Alcalican glaciations, and the

  15. Comparison Between a Rapid Biological Screening Method (EPA 4425) for TCDDs/TCDFs and Chemical Analytical Methods

    SciTech Connect

    Anderson, Jack W.; Jones, Jennifer M.; McCoy, Daniel L.; Fujita, Akira; Yamamoto, Taichi; Iijima, Satoshi

    2003-08-01

    Seven polychlorinated dibenzo-p-dioxins (PCDDs), ten polychlorinated dibenzofurans (PCDFs) as well as twelve polychlorinated biphenyls (PCBs) are collectively referred to as dioxin-like compounds. The World Health Organization toxic equivalency factors (TEFs) for these persistent chlorinated organic compounds and their measured concentrations are used to produce the toxic equivalency quotient (TEQ) of a sample. TEF values are partially based on a common mechanism involving binding of the chemical to the aryl hydrocarbon receptor (AhR). Biological methods for the determination of TEQs are based on the assumption that all dioxin-related compounds act through the Ah receptor signal transduction pathway. Based on the biochemical response of CYP1A activation via the AhR, in vitro systems that utilize a reporter gene under transcriptional control of CYP1A have been developed. Several investigations have reported on the success of utilizing biological test systems to detect PCDDs, PCDFs, PCBs in environmental samples. The P450 Human Reporter Gene System assay (EPA Method 4425) utilizes a human hepatoma cell line (HepG2) carrying a plasmid in which the human CYP1A1 promoter and 5'-flanking sequences, with three xenobiotic responsive elements (XREs), are fused to the luciferase reporter gene. The enzyme luciferase is produced in the presence of compounds that bind the XREs, and can be detected by a simple assay that measures relative light units with a luminometer. Method 4425, used by Columbia Analytical Services (CAS), has gained acceptance as a rapid and inexpensive approach for screening solvent extracts of environmental samples of soil, sediment, tissue, and water to detect compounds that activate the AhR. Investigations in the U.S. and Japan comparing the results of 4425 and standard high-resolution GC/MS (HRGC/HRMS) will be reported here. The purpose of making these comparisons is to determine if risk assessments for large dioxin sites both before and after remediation
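
    The TEQ described above is a weighted sum: each congener's measured concentration is multiplied by its TEF and the products are added. The sketch below illustrates that arithmetic only. The two TEFs shown are the commonly cited WHO values for those congeners and should be checked against the current WHO table before any real use; the concentrations and the function name are hypothetical.

```python
def toxic_equivalency(concentrations, tefs):
    """TEQ = sum over congeners of (measured concentration x TEF).

    The result carries the same units as the input concentrations.
    """
    return sum(conc * tefs[name] for name, conc in concentrations.items())

# Commonly cited WHO TEFs for two congeners (assumption: verify before use).
who_tefs = {"2,3,7,8-TCDD": 1.0, "2,3,7,8-TCDF": 0.1}

# Hypothetical measured concentrations in pg/g (not data from the report).
sample = {"2,3,7,8-TCDD": 2.0, "2,3,7,8-TCDF": 10.0}

print(toxic_equivalency(sample, who_tefs))  # TEQ in pg/g
```

    A reporter-gene screen such as Method 4425 estimates a comparable quantity from the overall AhR response rather than from congener-by-congener concentrations.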

  16. Comparison of Biological Effectiveness of Carbon-Ion Beams in Japan and Germany

    SciTech Connect

    Uzawa, Akiko; Ando, Koichi; Koike, Sachiko; Furusawa, Yoshiya; Matsumoto, Yoshitaka; Takai, Nobuhiko; Hirayama, Ryoichi; Watanabe, Masahiko; Scholz, Michael; Elsaesser, Thilo; Peschke, Peter

    2009-04-01

    Purpose: To compare the biological effectiveness of 290 MeV/amu carbon-ion beams in Chiba, Japan and in Darmstadt, Germany, given that different methods for beam delivery are used for each. Methods and Materials: Murine small intestine and human salivary gland tumor (HSG) cells exponentially growing in vitro were irradiated with 6-cm width of spread-out Bragg peaks (SOBPs) adjusted to achieve nearly identical beam depth-dose profiles at the Heavy-Ion Medical Accelerator in Chiba, and the SchwerIonen Synchrotron in Darmstadt. Cell kill efficiencies of carbon ions were measured by colony formation for HSG cells and jejunum crypts survival in mice. Cobalt-60 γ rays were used as the reference radiation. Isoeffective doses at given survivals were used for relative biological effectiveness (RBE) calculations and interinstitutional comparisons. Results: Isoeffective D₁₀ doses (mean ± standard deviation) of HSG cells ranged from 2.37 ± 0.14 Gy to 3.47 ± 0.19 Gy for Chiba and from 2.31 ± 0.11 Gy to 3.66 ± 0.17 Gy for Darmstadt. Isoeffective D₁₀ doses of gut crypts after single doses ranged from 8.25 ± 0.17 Gy to 10.32 ± 0.14 Gy for Chiba and from 8.27 ± 0.10 Gy to 10.27 ± 0.27 Gy for Darmstadt, whereas isoeffective D₃₀ doses after three fractionated doses were 9.89 ± 0.17 Gy through 13.70 ± 0.54 Gy and 10.14 ± 0.20 Gy through 13.30 ± 0.41 Gy for Chiba and Darmstadt, respectively. Overall difference of RBE between the two facilities was 0-5% or 3-7% for gut crypt survival or HSG cell kill, respectively. Conclusion: The carbon-ion beams at the National Institute of Radiological Sciences in Chiba, Japan and the Gesellschaft fuer Schwerionenforschung in Darmstadt, Germany are biologically identical after single and daily fractionated irradiation.
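
    RBE is conventionally the ratio of the reference-radiation dose to the test-radiation dose that produces the same biological effect, which is how isoeffective doses such as those above feed into an RBE calculation. The sketch below shows that ratio and one possible way to express an inter-facility difference as a percentage. The abstract does not say how its percentage differences were computed, so the percent-of-mean formula is an assumption, and all dose values in the example are hypothetical rather than taken from the study.

```python
def rbe(reference_dose_gy: float, test_dose_gy: float) -> float:
    """RBE = reference-radiation dose / test-radiation dose at the same effect level."""
    return reference_dose_gy / test_dose_gy

def percent_difference(a: float, b: float) -> float:
    """Absolute difference between two values as a percentage of their mean."""
    return abs(a - b) / ((a + b) / 2) * 100

# Hypothetical isoeffective doses in Gy (not values from the study).
gamma_dose = 6.0          # reference radiation
carbon_dose_site_a = 3.0  # carbon-ion beam, facility A
carbon_dose_site_b = 3.1  # carbon-ion beam, facility B

rbe_a = rbe(gamma_dose, carbon_dose_site_a)
rbe_b = rbe(gamma_dose, carbon_dose_site_b)
print(rbe_a, rbe_b, percent_difference(rbe_a, rbe_b))
```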

  17. Next-generation sequencing reveals the biological significance of the N2,3-ethenoguanine lesion in vivo

    PubMed Central

    Chang, Shiou-chi; Fedeles, Bogdan I.; Wu, Jie; Delaney, James C.; Li, Deyu; Zhao, Linlin; Christov, Plamen P.; Yau, Emily; Singh, Vipender; Jost, Marco; Drennan, Catherine L.; Marnett, Lawrence J.; Rizzo, Carmelo J.; Levine, Stuart S.; Guengerich, F. Peter; Essigmann, John M.

    2015-01-01

    Etheno DNA adducts are a prevalent type of DNA damage caused by vinyl chloride (VC) exposure and oxidative stress. Etheno adducts are mutagenic and may contribute to the initiation of several pathologies; thus, elucidating the pathways by which they induce cellular transformation is critical. Although N2,3-ethenoguanine (N2,3-εG) is the most abundant etheno adduct, its biological consequences have not been well characterized in cells due to its labile glycosidic bond. Here, a stabilized 2′-fluoro-2′-deoxyribose analog of N2,3-εG was used to quantify directly its genotoxicity and mutagenicity. A multiplex method involving next-generation sequencing enabled a large-scale in vivo analysis, in which both N2,3-εG and its isomer 1,N2-ethenoguanine (1,N2-εG) were evaluated in various repair and replication backgrounds. We found that N2,3-εG potently induces G to A transitions, the same mutation previously observed in VC-associated tumors. By contrast, 1,N2-εG induces various substitutions and frameshifts. We also found that N2,3-εG is the only etheno lesion that cannot be repaired by AlkB, which partially explains its persistence. Both εG lesions are strong replication blocks and DinB, a translesion polymerase, facilitates the mutagenic bypass of both lesions. Collectively, our results indicate that N2,3-εG is a biologically important lesion and may have a functional role in VC-induced or inflammation-driven carcinogenesis. PMID:25837992

  18. Comparison of ribosomal RNA removal methods for transcriptome sequencing workflows in teleost fish

    USDA-ARS's Scientific Manuscript database

    RNA sequencing (RNA-Seq) is becoming the standard for transcriptome analysis. Removal of contaminating ribosomal RNA (rRNA) is a priority in the preparation of libraries suitable for sequencing. rRNAs are commonly removed from total RNA via either mRNA selection or rRNA depletion. These methods have...

  19. Genomic sequence and virulence comparison of four type 2 porcine reproductive and respiratory syndrome virus strains

    USDA-ARS's Scientific Manuscript database

    Porcine reproductive and respiratory syndrome virus (PRRSV) is a ubiquitous and costly virus that exhibits substantial sequence and virulence disparity among diverse isolates. In this study, we compared the whole genomic sequence and virulence of 4 North American Type 2 PRRSV isolates. Among the 4 i...

  20. Comparison of the solution conformations of a cell-adhesive peptide LBE and its reverse sequence EBL.

    PubMed

    Jois, S D; Hughes, R; Siahaan, T J

    1999-12-01

    T-cell adhesion is mediated by an ICAM-1/LFA-1 interaction; this interaction plays a crucial role in T-cell activation during immune response. LBE peptide, which is derived from the beta-subunit of LFA-1, has been shown to inhibit ICAM-1/LFA-1-mediated T-cell adhesion. In this work, we studied the solution conformations of LBE peptide and its reverse sequence (EBL) by NMR, CD and molecular dynamics simulations. Reverse peptides have been used as controls in biological studies. The effect of reversing the sequence of LBE to EBL peptides on their respective conformations is important in understanding their biological properties in vitro or in vivo. The NMR studies for these peptides were carried out in water and in TFE/water solvent systems. In 40% TFE/water, both peptides exhibited helical conformation. CD studies suggested that the LBE exhibits 30% helical conformation, while the EBL exhibits 20% helical conformation. From the NMR and MD simulation studies, it was evident that the peptides exhibited a stable helical conformation; a stable helical structure was found at Leu6 to Leu15 for LBE and at Gly9 to Leu17 for EBL. The helical conformations of LBE and EBL may be in equilibrium with other possible conformers; the other conformers contain loop and turn structures. Both peptides bind to divalent cations because the LBE is derived from the cation-binding region of the LFA-1. This study shows that reversing the peptide sequence did not alter the secondary structure of the corresponding sequence. Hence, caution must be exercised when using reverse peptides as controls in biological studies. This report will improve our ability to design a better inhibitor of ICAM-1/LFA-1 interaction.

  1. Comparison of ancient and modern Clonorchis sinensis based on ITS1 and ITS2 sequences.

    PubMed

    Liu, Wen-Qi; Liu, Juan; Zhang, Jun-Hua; Long, Xiao-Chun; Lei, Jia-Hui; Li, Yong-Long

    2007-02-01

    In 1975, an ancient corpse buried in 167 BC was found at Jiangling County, Hubei Province of China. The eggs of Clonorchis sinensis found in the gall bladder of the corpse were preserved well. In the present paper, we extracted the genomic DNA from the ancient eggs and modern eggs, respectively, and the internal transcribed spacer 1 and 2 (ITS1 and ITS2) at ribosomal RNA genes were studied. The results show that ITS2 sequences from the ancient sample were identical with those from modern samples, but in ITS1 differences in 15 nucleotide positions were found between the ancient and modern samples. The results demonstrated that it is possible to extract and sequence DNA from ancient parasite eggs. The ITS1 sequence obtained differed from all modern ones available to date. This might indicate sequence divergence through time, or might reflect a sequence polymorphism that may eventually be found also in modern samples.

  2. Comparison of winding-number sequences for symmetric and asymmetric oscillatory systems.

    PubMed

    Englisch, Volker; Parlitz, Ulrich; Lauterborn, Werner

    2015-08-01

    The bifurcation sets of symmetric and asymmetric periodically driven oscillators are investigated and classified by means of winding numbers. It is shown that periodic windows within chaotic regions form winding-number sequences on different levels. These sequences can be described by a simple formula that makes it possible to predict winding numbers at bifurcation points. Symmetric and asymmetric systems follow similar rules for the development of winding numbers within different sequences, and these sequences can be combined into a single general rule. The role of the two distinct period-doubling cascades is investigated in the light of the winding-number sequences discovered. Examples are taken from the double-well Duffing oscillator, a special two-parameter Duffing oscillator, and a bubble oscillator.

  3. Patterns of Treatment Sequences in Chemotherapy and Targeted Biologics for Metastatic Colorectal Cancer: Findings from a Large Community-Based Cohort of Elderly Patients.

    PubMed

    Parikh, Rohan C; Du, Xianglin L; Morgan, Robert O; Lairson, David R

    Over the last decade, multiple chemotherapies/targeted biologics have been approved for metastatic colorectal cancer (mCRC). However, evidence is limited with regard to the array of treatments received by mCRC patients. This study examines treatment sequences (first- to third-line chemotherapy/targeted biologics) and the factors associated with first-line targeted biologics and common treatment sequences for elderly mCRC patients treated in a community setting. A retrospective cohort study was conducted in mCRC patients diagnosed from January 2004 through December 2009 using the Surveillance, Epidemiology and End Results Medicare-linked database. The treatment sequences administered to elderly mCRC patients were empirically identified. Of 4418 mCRC patients who received treatment, 1370 (31%) received first, second, and third line; 1164 (26%) received first and second line; and 1884 (43%) received only first line. The most common first lines of treatment for mCRC patients were 5-fluorouracil/leucovorin + oxaliplatin (FOLFOX) + bevacizumab (23%) and FOLFOX (23%). 5-fluorouracil/leucovorin + irinotecan (FOLFIRI)-based regimens were commonly (22%) administered in second line. The most common treatment sequence was first-line oxaliplatin or irinotecan followed by second-line oxaliplatin or irinotecan + bevacizumab followed by a third-line targeted biologic. Of patients who received first-line therapy, 47% also received a targeted biologic, and the associated factors were age, comorbidity score, cancer site, geographic location, and year of diagnosis. Elderly mCRC patients receive a multitude of treatments in various sequences. Further exploration of the comparative effectiveness of treatment sequences may yield important information for improving mCRC survival.
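
    The shares of patients by line of therapy quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the counts stated above; the variable and function names are illustrative and do not come from the study.

```python
# Counts reported in the abstract, keyed by the lines of therapy received.
counts = {
    "first, second, and third line": 1370,
    "first and second line": 1164,
    "first line only": 1884,
}

total = sum(counts.values())  # should equal the 4418 treated patients


def share(n, total):
    """Return n as a whole-number percentage of total."""
    return round(100 * n / total)


shares = {label: share(n, total) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {counts[label]} of {total} ({pct}%)")
```

    The three groups sum to the stated total, and the rounded shares match the percentages given in the abstract.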

  4. Comparisons of the transmitted signals of time, aperture, and angle gating in biological tissues and a phantom

    NASA Astrophysics Data System (ADS)

    Yang, C. C.; Sun, Chia-Wei; Lee, Cheng-Kuan; Lu, Chih-Wei; Tsai, Meng-Tsan; Yang, C. C.; Kiang, Yean-Woei

    2004-03-01

    We measure transmitted signals with time, aperture, and angle gating for comparison in micro-sphere suspension, chicken breast and chicken liver tissues. We find that in each sample, the small aperture-gated (angle-gated) signals for imaging are essentially different from those of early time gating. Meanwhile, the signals obtained from aperture and angle gating come from quite different parts of the transmitted photons. For biological tissues of different structures, different gating methods may lead to different levels of imaging quality. Also, the results indicate that the scattering characteristics of biological tissues generally differ from those of a particle-based phantom. The scattering nature in the biological tissues may imply that random continuum scattering needs to be considered in biological imaging. Between chicken breast and liver tissues, the time-gated data show that the latter has stronger scattering and absorption.

  5. Nucleotide sequence of murine PCNA: interspecies comparison of the cDNA and the 5' flanking region of the gene.

    PubMed

    Shipman-Appasamy, P M; Cohen, K S; Prystowsky, M B

    1991-01-01

    Proliferating cell nuclear antigen (PCNA) RNA levels are regulated by transcription as well as by changes in stability in growing cells. We have cloned the murine PCNA cDNA and a fragment of the murine PCNA gene flanking the transcription initiation site. Comparison of the murine deduced amino acid sequence with the PCNA sequence from rat, human, Drosophila, Saccharomyces cerevisiae, and higher plants reveals extensive homology between species. The homology is likely to be related to the fundamental role of PCNA as an auxiliary protein for DNA replication. Consensus sequences for transcriptional regulatory factors identified within 520 bp 5' of the cap site of the murine PCNA gene include: an inverted CCAAT site, an enhancer core element (EBP-1), three cAMP-response elements (CRE-BP), one AP-2 site, three Sp1 sites, and two octamer sequences. The first 20 bp of the transcriptional unit are homologous to an initiator element, which may direct transcription from RNA polymerase II in the absence of a TATAA box. The consensus elements in the murine PCNA gene are similar in sequence and/or location to elements identified in the genes for human, Drosophila, and yeast PCNA.

  6. Performance comparison of second- and third-generation sequencers using a bacterial genome with two chromosomes.

    PubMed

    Miyamoto, Mari; Motooka, Daisuke; Gotoh, Kazuyoshi; Imai, Takamasa; Yoshitake, Kazutoshi; Goto, Naohisa; Iida, Tetsuya; Yasunaga, Teruo; Horii, Toshihiro; Arakawa, Kazuharu; Kasahara, Masahiro; Nakamura, Shota

    2014-08-21

    The availability of diverse second- and third-generation sequencing technologies enables the rapid determination of the sequences of bacterial genomes. However, identifying the sequencing technology most suitable for producing a finished genome with multiple chromosomes remains a challenge. We evaluated the abilities of the following three second-generation sequencers: Roche 454 GS Junior (GS Jr), Life Technologies Ion PGM (Ion PGM), and Illumina MiSeq (MiSeq) and a third-generation sequencer, the Pacific Biosciences RS sequencer (PacBio), by sequencing and assembling the genome of Vibrio parahaemolyticus, which consists of a 5-Mb genome comprising two circular chromosomes. We sequenced the genome of V. parahaemolyticus with GS Jr, Ion PGM, MiSeq, and PacBio and performed de novo assembly with several genome assemblers. Although GS Jr generated the longest mean read length of 418 bp among the second-generation sequencers, the maximum contig length of the best assembly from GS Jr was 165 kbp, and the number of contigs was 309. Single runs of Ion PGM and MiSeq produced data of considerably greater sequencing coverage, 279× and 1,927×, respectively. The optimized result for Ion PGM contained 61 contigs assembled from reads of 77× coverage, and the longest contig was 895 kbp in size. Those for MiSeq were 34 contigs, 58× coverage, and 733 kbp, respectively. These results suggest that higher coverage depth is unnecessary for a better assembly result. We observed that multiple rRNA coding regions were fragmented in the assemblies from the second-generation sequencers, whereas PacBio generated two exceptionally long contigs of 3,288,561 and 1,875,537 bps, each of which was from a single chromosome, with 73× coverage and mean read length 3,119 bp, allowing us to determine the absolute positions of all rRNA operons. PacBio outperformed the other sequencers in terms of the length of contigs and reconstructed the greatest portion of the genome, achieving a genome

  7. Comparison with Magnetic Resonance Three-Dimensional Sequence for Lumbar Nerve Root with Intervertebral Foramen

    PubMed Central

    Takashima, Hiroyuki; Shishido, Hiroki; Yoshimoto, Mitsunori; Imamura, Rui; Akatsuka, Yoshihiro; Terashima, Yoshinori; Fujiwara, Hiroyoshi; Nagae, Masateru; Kubo, Toshikazu; Yamashita, Toshihiko

    2016-01-01

    Study Design Prospective study based on magnetic resonance (MR) imaging of the lumbar spinal root of the intervertebral foramen. Purpose This study aimed to compare MR three-dimensional (3D) sequences for the evaluation of the lumbar spinal root of the intervertebral foramen. Overview of Literature The diagnosis of spinal disorders by MR imaging is commonly performed using two-dimensional T1- and T2-weighted images, whereas 3D MR images can be used for acquiring further detailed data using thin slices with multi-planar reconstruction. Methods In twenty healthy volunteers, we investigated the contrast-to-noise ratio (CNR) of the lumbar spinal root of the intervertebral foramen with a 3D balanced sequence. The sequences used were the fast imaging employing steady state acquisition and the coherent oscillatory state acquisition for the manipulation of image contrast (COSMIC). COSMIC can be used with or without fat suppression (FS). We compared these sequences to determine the optimized visualization sequence for the lumbar spinal root of the intervertebral foramen. Results For the CNR between the nerve root and the peripheral tissue, there were no significant differences between the sequences at the entry of foramen. There was a significant difference and the highest CNR was seen with COSMIC-FS for the intra- and extra-foramen. Conclusions In this study, the findings suggest that the COSMIC-FS sequences should be used for the internal or external foramen for spinal root disorders. PMID:26949459

  8. Magnetic resonance elastography: A comparison between pulse sequences across field strengths

    NASA Astrophysics Data System (ADS)

    Owen, Graham

    Several Magnetic Resonance Elastography (MRE) techniques have been developed to non-invasively measure tissue stiffness which can be altered by disease processes such as liver fibrosis. Different MRE sequences are needed to fill various roles clinically such as spin-echo based sequences for patients with iron overload, or rapid sequences for patients who cannot execute long breath holds. The purpose of this study was to compare the mean stiffness, variance, and presence of artifacts using three MRE sequences at 1.5T and 3T in phantoms and healthy volunteers. In the phantom study variance was found to decrease with increasing slice thickness as well as at higher field strength. The SE-EPI sequence tended to overestimate low stiffness and underestimate high stiffness while the rapid sequence significantly overestimated stiffness of both the soft and stiff phantom. In the volunteers no significant difference was found between the sequences in terms of measured stiffness. The variability between acquisitions in a single setup as well as between setups was minimal, showing that MRE is a very robust technique.

  9. [Comparison of rDNA internal transcribed spacer sequences in asparagus].

    PubMed

    Ou, Li-Jun; Ye, Wei; Zeng, Gui-Ping; Jiang, Xiang-Hui; She, Chao-Wen; Xu, Dong; Yang, Jia-Qiang

    2010-10-01

    ITS sequences of nine species were used to identify counterfeit medicine and to analyse the phylogeny of Asparagus. The ITS sequences were analysed by amplification, cloning, sequencing and alignment. The ITS sequences of the nine species ranged from 711 to 748 bp in length, and the G + C content was about 60%. The phylogenetic tree constructed on the basis of the ITS sequences showed that the nine species were divided into two branches: Asparagus cochinchinensis, Asparagus officinalis, Asparagus densiflorus, Asparagus densiflorus cv. Myers and Asparagus densiflorus cv. Sprengeri formed one branch and the others formed the other. In the first branch, Asparagus densiflorus and Asparagus densiflorus cv. Myers, both from Africa, clustered first and then clustered with Asparagus densiflorus cv. Sprengeri, a variant of Asparagus densiflorus. In the other branch, Asparagus setaceus had a relatively distant genetic relationship with the other three materials. The ITS sequences could distinguish species of Asparagus and so detect counterfeits. The placement of some species in the phylogenetic tree was debatable, and the ITS sequence was combined with other analytical tools to analyse the actual phylogeny.
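
    The G + C content reported in this abstract is the fraction of bases in a sequence that are G or C. A minimal sketch of that calculation follows; the function name and the toy sequence are illustrative assumptions and are not data from the study.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


# Toy sequence for illustration only; it is not an ITS sequence from the paper.
example = "GGCCGGATCC"
print(f"{gc_content(example):.1f}% G + C")
```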

  10. Human ribosomal RNA gene: nucleotide sequence of the transcription initiation region and comparison of three mammalian genes.

    PubMed Central

    Financsek, I; Mizumoto, K; Mishima, Y; Muramatsu, M

    1982-01-01

    The transcription initiation site of the human ribosomal RNA gene (rDNA) was located by using the single-strand specific nuclease protection method and by determining the first nucleotide of the in vitro capped 45S preribosomal RNA. The sequence of 1,211 nucleotides surrounding the initiation site was determined. The sequenced region was found to consist of 75% G and C and to contain a number of short direct and inverted repeats and palindromes. By comparison of the corresponding initiation regions of three mammalian species, several conserved sequences were found upstream and downstream from the transcription starting point. Two short A + T-rich sequences are present on human, mouse, and rat ribosomal RNA genes between the initiation site and 40 nucleotides upstream, and a C + T cluster is located at a position around -60. At and downstream from the initiation site, a common sequence, T-AG-C-T-G-A-C-A-C-G-C-T-G-T-C-C-T-CT-T, was found in the three genes from position -1 through +18. The strong conservation of these sequences suggests their functional significance in rDNA. The S1 nuclease protection experiments with cloned rDNA fragments indicated the presence in human 45S RNA of molecules several hundred nucleotides shorter than the supposed primary transcript. The first 19 nucleotides of these molecules appear identical--except for one mismatch--to the nucleotide sequence of the 5' end of a supposed early processing product of the mouse 45S RNA. PMID:6954460

  11. Comparison of Pre-Analytical FFPE Sample Preparation Methods and Their Impact on Massively Parallel Sequencing in Routine Diagnostics

    PubMed Central

    Heydt, Carina; Fassunke, Jana; Künstlinger, Helen; Ihle, Michaela Angelika; König, Katharina; Heukamp, Lukas Carl; Schildhaus, Hans-Ulrich; Odenthal, Margarete; Büttner, Reinhard; Merkelbach-Bruse, Sabine

    2014-01-01

    Over recent years, massively parallel sequencing has rapidly evolved and has now transitioned into molecular pathology routine laboratories. It is an attractive platform for analysing multiple genes at the same time with very little input material. Therefore, the need for high quality DNA obtained from automated DNA extraction systems has increased, especially for laboratories that deal with formalin-fixed paraffin-embedded (FFPE) material and high sample throughput. This study evaluated five automated FFPE DNA extraction systems as well as five DNA quantification systems using the three most common techniques, UV spectrophotometry, fluorescent dye-based quantification and quantitative PCR, on 26 FFPE tissue samples. Additionally, the effects on downstream applications were analysed to find the most suitable pre-analytical methods for massively parallel sequencing in routine diagnostics. The results revealed that the Maxwell 16 from Promega (Mannheim, Germany) seems to be the superior system for DNA extraction from FFPE material. The extracts had a 1.3–24.6-fold higher DNA concentration in comparison to the other extraction systems, a higher quality and were most suitable for downstream applications. The comparison of the five quantification methods showed intermethod variations but all methods could be used to estimate the right amount for PCR amplification and for massively parallel sequencing. Interestingly, the best results in massively parallel sequencing were obtained with a DNA input of 15 ng determined by the NanoDrop 2000c spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). No difference could be detected in mutation analysis based on the results of the quantification methods. These findings emphasise that it is particularly important to choose the most reliable and constant DNA extraction system, especially when using small biopsies and low elution volumes, and that all common DNA quantification techniques can be used for

  12. Accuracy of real-time MR temperature mapping in the brain: a comparison of fast sequences.

    PubMed

    Kickhefel, A; Roland, Jörg; Weiss, Clifford; Schick, Fritz

    2010-10-01

    To compare magnetic resonance (MR) thermometry based on the proton resonance frequency (PRF) method using a single shot echoplanar imaging (ss EPI) sequence to both of the standard sequences, gradient echo (GRE) and segmented echoplanar imaging (seg EPI) in the in vivo human brain, at 1.5T and 3T. Repetitive MR thermometry was performed on the brain of six volunteers using GRE, seg EPI, and ss EPI sequences on whole-body 1.5T and 3T clinical systems using comparable acquisition parameters. Phase stability and temperature data precision in the human head were determined over 12 min for the three sequences at both field strengths. An ex-vivo swine skeletal muscle model was used to evaluate temperature accuracy of the ss EPI sequence during heating by high intensity focused ultrasound (HIFU). In-vivo examinations of brain revealed an average temperature precision of 0.37 °C/0.39 °C/0.16 °C at 3T for the GRE/seg EPI/ss EPI sequences. At 1.5T, a precision of 0.58 °C/0.63 °C/0.21 °C was achieved. In the ex-vivo swine model, a strong correlation of temperature data derived using ss EPI and GRE sequences was found with a temperature deviation <1 °C. The ss EPI sequence was the fastest and the most precise sequence for MR thermometry, with significantly higher accuracy compared to GRE. Copyright © 2009 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.

  13. Assessing the effects of silver nanoparticles on biological nutrient removal in bench-scale activated sludge sequencing batch reactors.

    PubMed

    Alito, Christina L; Gunsch, Claudia K

    2014-01-21

    Consumer products such as clothing and medical products are increasingly integrating silver and silver nanoparticles (AgNPs) into base materials to serve as an antimicrobial agent. Thus, it is critical to assess the effects of AgNPs on wastewater microorganisms essential to biological nutrient removal. In the present study, pulse and continuous additions of 0.2 and 2 ppm gum arabic and citrate coated AgNPs as well as Ag as AgNO3 were fed into sequencing batch reactors (SBRs) inoculated with nitrifying sludge. Treatment efficiency (chemical oxygen demand (COD) and ammonia removal), Ag dissolution measurements, and 16S rRNA bacterial community analyses (terminal restriction fragment length polymorphism, T-RFLP) were performed to evaluate the response of the SBRs to Ag addition. Results suggest that the AgNPs may have been precipitating in the SBRs. While COD and ammonia removal decreased by as much as 30% or greater directly after spikes, SBRs were able to recover within 24 h (3 hydraulic retention times (HRTs)) and resume removal near 95%. T-RFLP results indicate that Ag-spiked SBRs had similar 16S rRNA bacterial communities. The results shown in this study indicate that wastewater treatment could be impacted by Ag and AgNPs in the short term but the amount of treatment disruption will depend on the magnitude of influent Ag.

  14. A review of the prevalence, utility, and caveats of using chloroplast simple sequence repeats for studies of plant biology

    PubMed Central

    Wheeler, Gregory L.; Dorman, Hanna E.; Buchanan, Alenda; Challagundla, Lavanya; Wallace, Lisa E.

    2014-01-01

    Microsatellites occur in all plant genomes and provide useful markers for studies of genetic diversity and structure. Chloroplast microsatellites (cpSSRs) are frequently targeted because they are more easily isolated than nuclear microsatellites. Here, we quantified the frequency and uses of cpSSRs based on a literature review of over 400 studies published 1995–2013. These markers are an important and economical tool for plant biologists and continue to be used alongside modern genomics approaches to study genetic diversity and structure, evolutionary history, and hybridization in native and agricultural species. Studies using species-specific primers reported a greater number of polymorphic loci than those employing universal primers. A major disadvantage to cpSSRs is fragment size homoplasy; therefore, we documented its occurrence at several cpSSR loci within and between species of Acmispon (Fabaceae). Based on our empirical data set, we recommend targeted sequencing of a subset of samples combined with fragment genotyping as a cost-efficient, data-rich approach to the use of cpSSRs and as a test of homoplasy. The availability of genomic resources for plants aids in the development of primers for new study systems, thereby enhancing the utility of cpSSRs across plant biology. PMID:25506520

  15. Isolation, amino acid sequence and biological characterization of an "aspartic-49" phospholipase A₂ from Bothrops (Rhinocerophis) ammodytoides venom.

    PubMed

    Clement, Herlinda; Costa de Oliveira, Vanessa; Zamudio, Fernando Z; Lago, Néstor R; Valdez-Cruz, Norma A; Bérnard Valle, Melisa; Hajos, Silvia E; Alagón, Alejandro; Possani, Lourival D; de Roodt, Adolfo R

    2012-12-01

    A phospholipase enzyme was separated by chromatography from the venom of the snake Bothrops (Rhinocerophis) ammodytoides and characterized. The experimentally determined molecular weight was 13,853.65 Da, and the full primary structure was determined by Edman degradation and mass spectrometry analysis. The enzyme contains 122 amino acid residues closely stabilized by 7 disulfide bridges with an isoelectric point of 6.13. Sequence comparison with other known secretory PLA2 shows that the enzyme isolated belongs to the group II, presenting an aspartic acid residue at position 48 (numbered by convention as Asp49) of the active site, and accordingly displaying enzymatic activity. The enzyme corresponds to 3% of the total mass of the venom. The enzyme is mildly toxic to mice. The intravenous LD₅₀ of this phospholipase in CD-1 mice was around 6 μg/g of mouse body weight (more exactly 117 μg/mouse of 20 g) and the minimal mortal dose (MMD) was estimated to be close to 10 μg/g. In contrast, the LD₅₀ of the venom was circa 2 μg/g mouse body weight. Toxicological analyses of the purified enzyme were performed in vitro and in vivo using experimental animals (mice and rats). The enzyme at high doses caused pulmonary congestion, intraperitoneal bleeding, inhibition of clot retraction and muscle tissue alterations with increased creatine kinase levels.
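
    The dose figures in this abstract can be related to one another with simple arithmetic: the per-mouse LD₅₀ divided by body weight gives the per-gram value, which can then be compared with the LD₅₀ of the whole venom. The sketch below uses only numbers quoted above; the variable names are illustrative.

```python
# Figures quoted in the abstract.
ld50_per_mouse_ug = 117.0    # intravenous LD50 of the enzyme, micrograms per mouse
mouse_weight_g = 20.0        # body weight the per-mouse figure refers to, grams
venom_ld50_ug_per_g = 2.0    # approximate LD50 of whole venom, micrograms per gram

# Convert the per-mouse dose to a per-gram dose.
enzyme_ld50_ug_per_g = ld50_per_mouse_ug / mouse_weight_g

# A higher LD50 means lower toxicity, so this ratio says how many times
# more enzyme than whole venom is needed per gram of body weight.
potency_ratio = enzyme_ld50_ug_per_g / venom_ld50_ug_per_g

print(f"Enzyme LD50: {enzyme_ld50_ug_per_g:.2f} ug/g")
print(f"Enzyme LD50 / venom LD50: {potency_ratio:.2f}")
```

    The per-gram value of 5.85 μg/g is consistent with the "around 6 μg/g" stated in the abstract, and the ratio shows the purified enzyme is roughly three times less toxic by mass than the whole venom.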

  16. Genome sequence of "Candidatus Microthrix parvicella" Bio17-1, a long-chain-fatty-acid-accumulating filamentous actinobacterium from a biological wastewater treatment plant.

    PubMed

    Muller, Emilie E L; Pinel, Nicolás; Gillece, John D; Schupp, James M; Price, Lance B; Engelthaler, David M; Levantesi, Caterina; Tandoi, Valter; Luong, Khai; Baliga, Nitin S; Korlach, Jonas; Keim, Paul S; Wilmes, Paul

    2012-12-01

    "Candidatus Microthrix" bacteria are deeply branching filamentous actinobacteria which occur at the water-air interface of biological wastewater treatment plants, where they are often responsible for foaming and bulking. Here, we report the first draft genome sequence of a strain from this genus: "Candidatus Microthrix parvicella" strain Bio17-1.

  17. Complete amino acid sequence of branched-chain amino acid aminotransferase (transaminase B) of Salmonella typhimurium, identification of the coenzyme-binding site and sequence comparison analysis

    SciTech Connect

    Feild, M.J.

    1988-01-01

    The complete amino acid sequence of the subunit of branched-chain amino acid aminotransferase of Salmonella typhimurium was determined by automated Edman degradation of peptide fragments generated by chemical and enzymatic digestion of S-carboxymethylated and S-pyridylethylated transaminase B. Peptide fragments of transaminase B were generated by treatment of the enzyme with trypsin, Staphylococcus aureus V8 protease, endoproteinase Lys-C, and cyanogen bromide. Protocols were developed for separation of the peptide fragments by reverse-phase high performance liquid chromatography (HPLC), ion-exchange HPLC, and SDS-urea gel electrophoresis. The enzyme subunit contains 308 amino acid residues and has a molecular weight of 33,920 daltons. The coenzyme-binding site was determined by treatment of the enzyme, containing bound pyridoxal 5-phosphate, with tritiated sodium borohydride prior to trypsin digestion. Monitoring radioactivity incorporation and peptide map comparisons with an apoenzyme tryptic digest, allowed identification of the pyridoxylated-peptide which was isolated by reverse-phase HPLC and sequenced. The coenzyme-binding site is a lysyl residue at position 159. Some peptides were further characterized by fast atom bombardment mass spectrometry.

  18. Massively parallel rRNA gene sequencing exacerbates the potential for biased community diversity comparisons due to variable library sizes

    SciTech Connect

    Gihring, Thomas; Green, Stefan; Schadt, Christopher Warren

    2011-01-01

    Technologies for massively parallel sequencing are revolutionizing microbial ecology and are vastly increasing the scale of ribosomal RNA (rRNA) gene studies. Although pyrosequencing has increased the breadth and depth of possible rRNA gene sampling, one drawback is that the number of reads obtained per sample is difficult to control. Pyrosequencing libraries typically vary widely in the number of sequences per sample, even within individual studies, and there is a need to revisit the behaviour of richness estimators and diversity indices with variable gene sequence library sizes. Multiple reports and review papers have demonstrated the bias in non-parametric richness estimators (e.g. Chao1 and ACE) and diversity indices when using clone libraries. However, we found that biased community comparisons are accumulating in the literature. Here we demonstrate the effects of sample size on Chao1, ACE, CatchAll, Shannon, Chao-Shen and Simpson's estimations specifically using pyrosequencing libraries. The need to equalize the number of reads being compared across libraries is reiterated, and investigators are directed towards available tools for making unbiased diversity comparisons.
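
    The point about library size can be illustrated with a short sketch: compute a richness estimator and a diversity index on per-taxon read counts, then subsample (rarefy) libraries to a common depth before comparing them. The code below is a sketch under stated assumptions, not the authors' pipeline: it uses the bias-corrected form of Chao1 and the natural-log Shannon index, and the function names and toy counts are hypothetical.

```python
import math
import random
from collections import Counter


def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-taxon read counts."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def shannon(counts):
    """Shannon diversity index (natural log) from per-taxon read counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def rarefy(counts, depth, rng):
    """Randomly subsample `depth` reads without replacement."""
    reads = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    sample = rng.sample(reads, depth)
    return list(Counter(sample).values())


# Two toy libraries of very different size (hypothetical counts).
large = [400, 300, 150, 80, 40, 15, 8, 4, 2, 1]
small = [40, 30, 15, 8, 4, 2, 1]

rng = random.Random(0)
depth = min(sum(large), sum(small))   # equalize to the smaller library
large_rarefied = rarefy(large, depth, rng)

print("Chao1, large library as sequenced:", chao1(large))
print("Chao1, large library rarefied:   ", chao1(large_rarefied))
print("Chao1, small library:            ", chao1(small))
```

    Comparing the rarefied large library with the small one, rather than the raw counts, keeps differences in read number from being mistaken for differences in richness.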

  19. Heat*seq: an interactive web tool for high-throughput sequencing experiment comparison with public data

    PubMed Central

    Devailly, Guillaume; Mantsoki, Anna; Joshi, Anagha

    2016-01-01

    Summary: Better protocols and decreasing costs have made high-throughput sequencing experiments now accessible even to small experimental laboratories. However, comparing one or few experiments generated by an individual lab to the vast amount of relevant data freely available in the public domain might be limited due to lack of bioinformatics expertise. Though several tools, including genome browsers, allow such comparison at a single gene level, they do not provide a genome-wide view. We developed Heat*seq, a web-tool that allows genome scale comparison of high throughput experiments (chromatin immuno-precipitation followed by sequencing, RNA-sequencing and Cap Analysis of Gene Expression) provided by a user, to the data in the public domain. Heat*seq currently contains over 12 000 experiments across diverse tissues and cell types in human, mouse and drosophila. Heat*seq displays interactive correlation heatmaps, with an ability to dynamically subset datasets to contextualize user experiments. High quality figures and tables are produced and can be downloaded in multiple formats. Availability and Implementation: Web application: http://www.heatstarseq.roslin.ed.ac.uk/. Source code: https://github.com/gdevailly. Contact: Guillaume.Devailly@roslin.ed.ac.uk or Anagha.Joshi@roslin.ed.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27378302

  20. Microsequence analysis of electroblotted proteins. II. Comparison of sequence performance on different types of PVDF membranes.

    PubMed

    Reim, D F; Speicher, D W

    1992-11-15

    The influence of different types of polyvinylidene difluoride (PVDF) membranes on gas phase sequence performance has been evaluated. These PVDF membranes have been classified as either high retention (Trans-Blot and ProBlott) or low retention membranes (Immobilon-P) based on their ability to bind proteins during electroblotting from gels. Initial yields, repetitive yields, and extraction efficiency of the anilinothiazolinone amino acid derivatives have been compared for several standard proteins that have been either electroblotted or loaded onto PVDF membranes by direct adsorption. These results show that the major differences in initial sequence yields between membranes arise from differences in the amount of protein actually transferred to the membrane rather than sequencer-related factors. In contrast to several previous observations from other laboratories, more tightly bound proteins do not sequence with lower initial yields and initial yields are not affected by the ratio of surface area to protein. The stronger binding on high retention PVDF membranes does not adversely affect recoveries of difficult to extract, or very hydrophobic, amino acid derivatives. Several amino acids, especially tryptophan, are actually recovered in dramatically higher yield on high retention membranes compared with either Immobilon or glass filters. At the same time, the protein and peptide binding properties of high retention membranes will frequently improve the repetitive yield by minimizing sample extraction during the sequencer cycle. Stronger protein binding together with improved electroblotting yields offer substantially improved sequence performance when high retention PVDF membranes are used.

  1. Sequence Comparison for Non-Enhanced MRA of the Lower Extremity Arteries at 7 Tesla

    PubMed Central

    Johst, Sören; Orzada, Stephan; Fischer, Anja; Schäfer, Lena C.; Nassenstein, Kai; Umutlu, Lale; Lauenstein, Thomas C.; Ladd, Mark E.; Maderwald, Stefan

    2014-01-01

    In this study three sequences for non-contrast-enhanced MRA of the lower extremity arteries at 7T were compared. Cardiac triggering was used with the aim of reducing signal variations in the arteries. Two fast single-shot 2D sequences, a modified Ultrafast Spoiled Gradient Echo (UGRE) sequence and a variant of the Quiescent-Interval Single-Shot (QISS) sequence, were triggered via phonocardiogram and compared in volunteer examinations to a non-triggered 2D gradient echo (GRE) sequence. For image acquisition, a 16-channel transmit/receive coil and a manually positionable AngioSURF table were used. To tackle B1 inhomogeneities at 7T, Time-Interleaved Acquisition of Modes (TIAMO) was integrated in GRE and UGRE. To compare the three sequences quantitatively, a vessel-to-background ratio (VBR) was measured in all volunteers and stations. In conclusion, cardiac triggering was able to suppress flow artifacts satisfactorily. The modified UGRE showed only moderate image artifacts. Averaged over all volunteers and stations, GRE reached a VBR of 4.18±0.05, UGRE 5.20±0.06, and QISS 2.72±0.03. Using cardiac triggering and the TIAMO imaging technique was essential to perform non-enhanced MRA of the lower extremity vessels at 7T. The modified UGRE performed best, as observed artifacts were only moderate and the highest average VBR was reached. PMID:24454963

  2. Comparison of serological and sequence-based methods for typing feline calicivirus isolates from vaccine failures.

    PubMed

    Radford, A D; Dawson, S; Wharmby, C; Ryvar, R; Gaskell, R M

    2000-01-29

    Feline calicivirus (FCV) can be typed by exploiting antigenic differences between isolates or, more recently, by the sequence analysis of a hypervariable region of the virus's capsid gene. These two methods were used to characterise FCV isolates from 20 vaccine failures which occurred after the use of a commercial, live-attenuated vaccine. Using virus neutralisation, the isolates showed a spectrum of relatedness to the vaccine; depending on the criterion adopted for identity, 10 to 40 per cent of them appeared to be similar to the vaccine virus. Using sequence analysis, the isolates fell into one of two categories; 20 per cent had a similar sequence to the vaccine (0.67 to 2.67 per cent distant), and the remainder had a dissimilar sequence (21.3 to 36.0 per cent distant). Sequence analysis identified one cat that appeared to be infected with two distinct FCVs. The serological and sequence-based typing methods gave the same result in 80 to 95 per cent of individual cases, depending on the criterion adopted for serological identity. It is suggested that molecular typing is a more definitive method for characterising the relatedness of FCV isolates.

  3. Pierre Robin Sequence and Treacher Collins Hypoplastic Mandible Comparison Using Three-Dimensional Morphometric Analysis

    PubMed Central

    Chung, Michael T.; Levi, Benjamin; Hyun, Jeong S.; Lo, David D.; Montoro, Daniel T.; Lisiecki, Jeffrey; Bradley, James P.; Buchman, Steven R.; Longaker, Michael T.; Wan, Derrick C.

    2012-01-01

    Pierre Robin sequence and Treacher Collins syndrome are both associated with mandibular hypoplasia. It has been hypothesized, however, that the mandible may be differentially affected. The purpose of this study was to therefore compare mandibular morphology in children with Pierre Robin sequence to children with Treacher Collins syndrome using three-dimensional analysis of computed tomography (CT) scans. A retrospective analysis was performed identifying children with Pierre Robin sequence and Treacher Collins syndrome receiving CT scans. Three-dimensional reconstruction was performed and ramus height, mandibular body length, and gonial angle were measured. These were then compared to control children with normal mandibles and to clinical norms corrected for age and sex based on previously published measurements. Mandibular body length was found to be significantly shorter for children with Pierre Robin sequence while ramus height was significantly shorter for children with Treacher Collins syndrome. This resulted in distinctly different ramus height/mandibular body length ratios. In addition, the gonial angle was more obtuse in both the Pierre Robin sequence and Treacher Collins syndrome groups compared with the controls. Three-dimensional mandibular morphometric analysis in patients with Pierre Robin sequence and Treacher Collins syndrome thus revealed distinctly different patterns of mandibular hypoplasia relative to normal controls. These findings underscore distinct considerations which must be made in surgical planning for reconstruction. PMID:23154353

  4. Pierre Robin sequence and Treacher Collins hypoplastic mandible comparison using three-dimensional morphometric analysis.

    PubMed

    Chung, Michael T; Levi, Benjamin; Hyun, Jeong S; Lo, David D; Montoro, Daniel T; Lisiecki, Jeffrey; Bradley, James P; Buchman, Steven R; Longaker, Michael T; Wan, Derrick C

    2012-11-01

    Pierre Robin sequence and Treacher Collins syndrome are both associated with mandibular hypoplasia. It has been hypothesized, however, that the mandible may be differentially affected. The purpose of this study was to therefore compare mandibular morphology in children with Pierre Robin sequence with children with Treacher Collins syndrome using three-dimensional analysis of computed tomographic scans. A retrospective analysis was performed identifying children with Pierre Robin sequence and Treacher Collins syndrome undergoing computed tomography. Three-dimensional reconstruction was performed, and ramus height, mandibular body length, and gonial angle were measured. These were then compared with those in control children with normal mandibles and with the clinical norms corrected for age and sex based on previously published measurements. Mandibular body length was found to be significantly shorter for children with Pierre Robin sequence, whereas ramus height was significantly shorter for children with Treacher Collins syndrome. This resulted in distinctly different ramus height-mandibular body length ratios. In addition, the gonial angle was more obtuse in both the Pierre Robin sequence and Treacher Collins syndrome groups compared with the controls. Three-dimensional mandibular morphometric analysis in patients with Pierre Robin sequence and Treacher Collins syndrome thus revealed distinctly different patterns of mandibular hypoplasia relative to normal controls. These findings underscore distinct considerations that must be made in surgical planning for reconstruction.

  5. Biological diversity of created forested wetlands in comparison to reference forested wetlands in the Bay watershed

    USGS Publications Warehouse

    Perry, M.C.; Osenton, P.C.; Stoll, C.S.; Therres, Glenn D.

    2001-01-01

    Amphibians, reptiles, birds, and mammals were surveyed at six created forested wetlands in central Maryland and at six adjacent reference forested wetlands during 1993-1996 to determine comparative biological diversity of these habitats. Amphibians and reptiles were caught in pitfall and funnel traps associated with 15.4m (50 ft) drift fences. Birds were surveyed with a complete count while walking through each area. Mammals were surveyed by capture in live traps. More species and total individuals of amphibians were caught on the reference wetlands than on the created wetlands. The red-backed salamander (Plethodon cinereus), the four-toed salamander (Hemidactylium scutatum), the eastern spadefoot (Scaphiopus holbrooki), and the wood frog (Rana sylvatica) were captured on the reference wetlands, but not on the created sites. The wood frog was captured at all reference sites and may represent the best amphibian species to characterize a forested wetland. Reptiles were not caught in sufficient numbers to warrant comparisons. Ninety-two bird species were recorded on created sites and 55 bird species on the reference sites. Bird species on the created sites represented those typically found in nonforested habitats. Mammal species were similar on both sites, but overall the reference sites had three times the number caught on created sites. The meadow vole (Microtus pennsylvanicus) was the dominant species captured on created sites, and the white-footed mouse (Peromyscus leucopus) was the dominant species on reference sites, with little habitat overlap for these two species. Although species richness and total number of animals were high for created forested wetlands, these survey results show major differences from species expected for a forested wetland. The created forested wetlands appear to provide good habitat for wildlife, but are probably not providing the full functions and values of the forested wetlands that they were constructed to replace.

  6. Comparison of Biological and Immunological Characterization of Lipopolysaccharides From Brucella abortus RB51 and S19

    PubMed Central

    Kianmehr, Zahra; Kaboudanian Ardestani, Sussan; Soleimanjahi, Hoorieh; Fotouhi, Fatemeh; Alamian, Saeed; Ahmadian, Shahin

    2015-01-01

    Background: Brucella abortus RB51 is a rough stable mutant strain, which has been widely used as a live vaccine for prevention of brucellosis in cattle instead of B. abortus strain S19. B. abortus lipopolysaccharide (LPS) has unique properties in comparison to other bacterial LPS. Objectives: In the current study, two types of LPS, smooth (S-LPS) and rough (R-LPS) were purified from B. abortus S19 and RB51, respectively. The aim of this study was to evaluate biological and immunological properties of purified LPS as an immunogenical determinant. Materials and Methods: Primarily, S19 and RB51 LPS were extracted and purified by two different modifications of the phenol water method. The final purity of LPS was determined by chemical analysis (2-keto-3-deoxyoctonate (KDO), glycan, phosphate and protein content) and different staining methods, following sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE). C57BL/6 mice were immunized subcutaneously three times at biweekly intervals with the same amount of purified LPSs. The humoral immunity was evaluated by measuring specific IgG levels and also different cytokine levels, such as IFN-γ, TNF-α, IL-4 and IL-10, were determined for assessing T-cell immune response. Results: Biochemical analysis data and SDS-PAGE profile showed that the chemical nature of S19 LPS is different from RB51 LPS. Both S and R-LPS induce an immune response. T-cell immune response induced by both S and R-LPS had almost the same pattern whereas S19 LPS elicited humoral immunity, which was higher than RB51 LPS. Conclusions: Purified LPS can be considered as a safe adjuvant and can be used as a component in prophylactic and therapeutic vaccines targeting infectious disease, cancer and allergies. PMID:26862376

  7. Comparison of hepatocellular carcinoma miRNA expression profiling as evaluated by next generation sequencing and microarray.

    PubMed

    Murakami, Yoshiki; Tanahashi, Toshihito; Okada, Rina; Toyoda, Hidenori; Kumada, Takashi; Enomoto, Masaru; Tamori, Akihiro; Kawada, Norifumi; Taguchi, Y-h; Azuma, Takeshi

    2014-01-01

    MicroRNA (miRNA) expression profiling has proven useful in diagnosing and understanding the development and progression of several diseases. Microarray is the standard method for analyzing miRNA expression profiles; however, it has several disadvantages, including its limited detection of miRNAs. In recent years, advances in genome sequencing have led to the development of next-generation sequencing (NGS) technologies, which significantly advance genome sequencing speed and discovery. In this study, we compared the expression profiles obtained by NGS with the profiles created using microarray to assess if NGS could produce a more accurate and complete miRNA profile. Total RNA from 14 hepatocellular carcinoma (HCC) tumors and 6 matched non-tumor control tissues was sequenced with Illumina MiSeq 50-bp single-end reads. MicroRNA expression profiles were estimated using miRDeep2 software. As a comparison, miRNA expression profiles for 11 out of 14 HCCs were also established by microarray (Agilent human microRNA microarray). The average total sequencing exceeded 2.2 million reads per sample and of those reads, approximately 57% mapped to the human genome. The average correlation for miRNA expression between microarray and NGS and subtraction were 0.613 and 0.587, respectively, while miRNA expression between technical replicates was 0.976. The diagnostic accuracy of HCC, p-value, and AUC were 90.0%, 7.22×10^-4, and 0.92, respectively. In summary, NGS created an miRNA expression profile that was reproducible and comparable to that produced by microarray. Moreover, NGS discovered novel miRNAs that were otherwise undetectable by microarray. We believe that miRNA expression profiling by NGS can be a useful diagnostic tool applicable to multiple fields of medicine.

  8. Phylogenetic relationships of Salmonella based on DNA sequence comparison of atpD encoding the beta subunit of ATP synthase.

    PubMed

    Christensen, H; Olsen, J E

    1998-04-01

    DNA sequences covering 57% of atpD encoding the beta subunit of ATP synthase were determined for 16 strains of Salmonella enterica, two strains of S. bongori, and one strain each of Citrobacter freundii and Yersinia enterocolitica, and comparison was made with the published Escherichia coli and Enterobacter aerogenes sequences. The phylogenetic tree based on maximum-likelihood analysis showed separation of the subspecies of S. enterica except for two serotypes of subspecies II which were unsupported by a common node. The two serotypes of S. bongori were separated from S. enterica and related to the serotypes of subspecies II. A tight relationship was found between S. enterica subspecies IIIa consisting of monophasic serotypes and subspecies IIIb consisting of diphasic serotypes. This is in conflict with results obtained for most other housekeeping genes and the 23S rRNA gene separating mono- from diphasic subspecies.

  9. Next-Generation Sequencing of the Bacterial 16S rRNA Gene for Forensic Soil Comparison: A Feasibility Study.

    PubMed

    Jesmok, Ellen M; Hopkins, James M; Foran, David R

    2016-05-01

    Soil has the potential to be valuable forensic evidence linking a person or item to a crime scene; however, there is no established soil individualization technique. In this study, the utility of soil bacterial profiling via next-generation sequencing of the 16S rRNA gene was examined for associating soils with their place of origin. Soil samples were collected from ten diverse and nine similar habitats over time, and within three habitats at various horizontal and vertical distances. Bacterial profiles were analyzed using four methods: abundance charts and nonmetric multidimensional scaling provided simplification and visualization of the massive datasets, potentially aiding in expert testimony, while analysis of similarities and k-nearest neighbor offered objective statistical comparisons. The vast majority of soil bacterial profiles (95.4%) were classified to their location of origin, highlighting the potential of bacterial profiling via next-generation sequencing for the forensic analysis of soil samples.

  10. Digital DNA-DNA hybridization for microbial species delineation by means of genome-to-genome sequence comparison

    PubMed Central

    Auch, Alexander F.; von Jan, Mathias; Klenk, Hans-Peter; Göker, Markus

    2010-01-01

    The pragmatic species concept for Bacteria and Archaea is ultimately based on DNA-DNA hybridization (DDH). While enabling the taxonomist, in principle, to obtain an estimate of the overall similarity between the genomes of two strains, this technique is tedious and error-prone and cannot be used to incrementally build up a comparative database. Recent technological progress in the area of genome sequencing calls for bioinformatics methods to replace the wet-lab DDH by in-silico genome-to-genome comparison. Here we investigate state-of-the-art methods for inferring whole-genome distances in their ability to mimic DDH. Algorithms to efficiently determine high-scoring segment pairs or maximally unique matches perform well as a basis of inferring intergenomic distances. The examined distance functions, which are able to cope with heavily reduced genomes and repetitive sequence regions, outperform previously described ones regarding the correlation with and error ratios in emulating DDH. Simulation of incompletely sequenced genomes indicates that some distance formulas are very robust against missing fractions of genomic information. Digitally derived genome-to-genome distances show a better correlation with 16S rRNA gene sequence distances than DDH values. The future perspectives of genome-informed taxonomy are discussed, and the investigated methods are made available as a web service for genome-based species delineation. PMID:21304684

  11. Comparison of sequences formed in Marine sabkha (subaerial) and salina (Subaqueous) settings-modern and ancient

    SciTech Connect

    Warren, J.K.; St. Kendall, C.G.

    1985-06-01

    Marine evaporites occurring in modern subaqueous (salina) settings and subaerial (sabkha) settings are different. Subaqueous Holocene evaporites occur as shoaling-upward lacustrine sequences up to 10 m thick. They are evaporite dominated and are composed primarily of bottom-nucleated crystals that may be deposited as massive, laminated, or rippled units. Each coastal lake is dominated by laminated evaporites with subordinate carbonate sediments. In plan view, they show a well-developed bull's-eye pattern with a sulfate center and a carbonate rim. In contrast, subaerial (sabkha) evaporites occur as part of a laterally prograding, shoaling-upward, peritidal sequence in which the supratidal unit is usually no more than 1 m thick. Sabkha sequences are matrix dominated, not evaporite dominated, with the bulk of the sulfate phase occurring as diagenetic nodules, enteroliths, or diapirlike structures. These sulfates were formed during syndepositional diagenesis by replacement and displacement processes. The various facies of the sequence tend to accumulate in belts parallel with the shoreline. Relative to the sea level or the brine level, sabkhas tend to form over paleotopographic highs whereas salinas tend to occur in paleotopographic lows. Some of the characteristics that distinguish Holocene subaerial and subaqueous evaporite sequences can be used to do the same for similar ancient facies, even when gypsum has been converted to nodular anhydrite. The distinction is important for it can be used by explorationists in the oil industry to define the paleotopography of the associated underlying porous and nonporous carbonates.

  12. A comparison of the VP7 gene sequences of human and bovine rotaviruses.

    PubMed

    Gerna, G; Steele, A D; Hoshino, Y; Sereno, M; Garcia, D; Sarasini, A; Flores, J

    1994-07-01

    The sequences of the gene encoding VP7 (the major outer capsid protein) from one bovine and three human rotavirus strains were determined because of their unusual VP7 specificities. Two of the human strains (PA 169 and PA 151) had VP7 serotype 6 specificity whereas the two other strains, recovered from a child (HAL 1166) and a calf (678) belonged to VP7 serotype 8. The serotype 8 strains exhibited a high degree of sequence conservation when compared with each other and with other serotype 8 strains previously sequenced. The serotype 6 human strains shared a greater degree of sequence similarity with previously reported serotype 6 bovine strains than with other rotavirus serotypes; however the degree of sequence similarity among PA 169, PA 151 and the bovine strains was lower than had been previously reported for strains belonging to the same serotype. The demonstration of rotavirus serotypes that are shared between human and animal species supports the concept that interspecies transmission occurs and may play a role in rotavirus evolution.

  13. [Comparison of whole genome sequences and replication ability in cell cultures between two avian leukosis viruses of subgroup B].

    PubMed

    Wu, Zhuan-Chang; Zhu, Mei-Zhen; Bian, Xiao-Ming; Ma, Cheng-Tai; Zhao, Peng; Cui, Zhi-Zhong

    2011-09-01

    The purpose of this study was to compare the whole genome sequences and replication dynamics in cell cultures of two avian leukosis virus subgroup B (ALV-B) isolates, SDAU09E3 and SDAU09C2. Comparison of the amino acid sequences indicated that the gp85 identity between these two subgroup B isolates was 95.4%, their identity with three other ALV-B reference strains was 91.0%-94.9%, and their identity with ALV subgroups A, C, D, E and J was less than 87.9%. Comparison of the nucleotide sequences of the gag and pol genes indicated that the homologies of the gag and pol genes of these two ALV-B isolates with all compared reference strains of different subgroups were above 93%. Homologies of the LTR sequences of these two ALV-B isolates with exogenous ALVs of subgroups A, B, C, D and J were 72.6%-88.3%, but only 51.5% when compared with endogenous ALV subgroup E. The LTR identity between these two ALV-B strains was only 74.8%, which was far lower than the identity of other genes. The identity of the U3 region of the LTR between these two ALV-B isolates was only 68.8%, and there were obvious differences in the number of CAAT boxes. Replication dynamics in DF-1 cells indicated that the TCID50 value was similar between the two isolates, but the concentration of nucleocapsid protein p27 antigen in cell culture supernatant was significantly higher for SDAU09E3 than for SDAU09C2, which indicated that there was no parallel relationship between p27 antigen concentration and infectious virus particles. Further studies with recombinant infectious clones are necessary to determine whether this difference results from the diversity of the U3 region of the LTR.

  14. Comparison of whole genome amplification techniques for human single cell exome sequencing

    PubMed Central

    Borgström, Erik; Paterlini, Marta; Mold, Jeff E.; Frisen, Jonas; Lundeberg, Joakim

    2017-01-01

    Background Whole genome amplification (WGA) is currently a prerequisite for single cell whole genome or exome sequencing. Depending on the method used the rate of artifact formation, allelic dropout and sequence coverage over the genome may differ significantly. Results The largest difference between the evaluated protocols was observed when analyzing the target coverage and read depth distribution. These differences also had impact on the downstream variant calling. Conclusively, the products from the AMPLI1 and MALBAC kits were shown to be most similar to the bulk samples and are therefore recommended for WGA of single cells. Discussion In this study four commercial kits for WGA (AMPLI1, MALBAC, Repli-G and PicoPlex) were used to amplify human single cells. The WGA products were exome sequenced together with non-amplified bulk samples from the same source. The resulting data was evaluated in terms of genomic coverage, allelic dropout and SNP calling. PMID:28207771

  15. A comparison of methods for estimating the transition:transversion ratio from DNA sequences.

    PubMed

    Kristina Strandberg, A K; Salter, Laura A

    2004-08-01

    Estimation of the ratio of the rates of transitions to transversions (TI:TV ratio) for a collection of aligned nucleotide sequences is important because it provides insight into the process of molecular evolution and because such estimates may be used to further model the evolutionary process for the sequences under consideration. In this paper, we compare several methods for estimating the TI:TV ratio, including the pairwise method [TREE 11 (1996) 158], a modification of the pairwise method due to Ina [J. Mol. Evol. 46 (1998) 521], a method based on parsimony (TREE 11 (1996) 158), a method due to Purvis and Bromham [J. Mol. Evol. 44 (1997) 112] that uses phylogenetically independent pairs of sequences, the maximum likelihood method, and a Bayesian method [Bioinformatics 17 (2001) 754]. We examine the performance of each estimator under several conditions using both simulated and real data.
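
    As a rough illustration of the raw quantity that pairwise approaches start from, the following Python sketch counts observed transitions (A/G and C/T exchanges) and transversions between two aligned sequences and reports their ratio. It is a simplification under stated assumptions, not any of the estimators compared in the paper: it applies no correction for multiple substitutions at a site, it skips gaps and ambiguity codes, and the function names are hypothetical.

```python
# Hypothetical helper names; an uncorrected count, not a published estimator.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def count_ti_tv(seq_a, seq_b):
    """Count observed transitions and transversions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ti = tv = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip gaps, ambiguity codes, and identical sites.
        if a not in "ACGT" or b not in "ACGT" or a == b:
            continue
        if (a, b) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    return ti, tv


def ti_tv_ratio(seq_a, seq_b):
    """Return the raw TI:TV ratio, or None when no transversions are observed."""
    ti, tv = count_ti_tv(seq_a, seq_b)
    return ti / tv if tv else None


if __name__ == "__main__":
    print(count_ti_tv("ACGTACGT", "GCATACTA"))
    print(ti_tv_ratio("ACGTACGT", "GCATACTA"))
```

    Because observed differences saturate as sequences diverge, a raw count of this kind tends to understate the underlying ratio, which is one reason corrected and model-based estimators exist.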

  16. A highly conserved N-terminal sequence for teleost vitellogenin with potential value to the biochemistry, molecular biology and pathology of vitellogenesis

    USGS Publications Warehouse

    Folmar, L.D.; Denslow, N.D.; Wallace, R.A.; LaFleur, G.; Gross, T.S.; Bonomelli, S.; Sullivan, C.V.

    1995-01-01

    N-terminal amino acid sequences for vitellogenin (Vtg) from six species of teleost fish (striped bass, mummichog, pinfish, brown bullhead, medaka and yellow perch) and the sturgeon are compared with published N-terminal Vtg sequences for the lamprey, clawed frog and domestic chicken. Striped bass and mummichog had 100% identical amino acids between positions 7 and 21, while pinfish, brown bullhead, sturgeon, lamprey, Xenopus and chicken had 87%, 93%, 60%, 47%, 47-60% (for four transcripts) and 40% identical amino acids, respectively, with striped bass for the same positions. Partial sequences obtained for medaka and yellow perch were 100% identical between positions 5 and 10. The potential utility of this conserved sequence for studies on the biochemistry, molecular biology and pathology of vitellogenesis is discussed.
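
    The percentages quoted for positions 7 to 21 are consistent with a 15-residue window (for example, 13 of 15 matches is about 87% and 14 of 15 is about 93%). A minimal Python sketch of that calculation follows; the function name is hypothetical and the two peptide strings are placeholders invented for illustration, not real vitellogenin sequences.

```python
def percent_identity(seq_a, seq_b, start, end):
    """Percent identical residues over 1-based, inclusive positions start..end."""
    window_a = seq_a[start - 1:end]
    window_b = seq_b[start - 1:end]
    if not window_a or len(window_a) != len(window_b):
        raise ValueError("both sequences must cover the requested positions")
    matches = sum(a == b for a, b in zip(window_a, window_b))
    return 100.0 * matches / len(window_a)


if __name__ == "__main__":
    # Placeholder peptides (NOT real Vtg data): 21 residues, differing at 2 sites.
    reference = "XXXXXX" + "ACDEFGHIKLMNPQR"
    other = "XXXXXX" + "ACDEFGHIKLMNPAA"
    print(round(percent_identity(reference, other, 7, 21)))  # 13 of 15 identical
```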

  17. Extracting data from the muck: deriving biological insight from complex microbial communities and non-model organisms with next generation sequencing.

    PubMed

    Solomon, Kevin V; Haitjema, Charles H; Thompson, Dawn A; O'Malley, Michelle A

    2014-08-01

    It is becoming increasingly clear that microbes within microbial communities, for which cultured isolates have not yet been obtained, have an immense, untapped reservoir of enzymes that could help address grand challenges in human health, energy, and sustainability. Despite the obstacles associated with culturing these microbes, recent advances in next-generation sequencing (NGS) have made it possible to explore complex microbial communities in their native context for the first time. Key to extracting meaning from rapidly growing NGS datasets are bioinformatics tools that assemble the sequence data, annotate homologous sequences and interrogate it to reveal regulatory patterns. Complementing this are advances in proteomics that can link NGS data to biological function. This combination of next generation sequencing, proteomics and bioinformatic analysis forms a powerful tool to study non-model microbes, which will transform what we know about these dynamic systems.

  18. Comparison of Modern 3D and 2D MR Imaging Sequences of the Wrist at 3 Tesla.

    PubMed

    Rehnitz, C; Klaan, B; von Stillfried, F; Amarteifio, E; Burkholder, I; Kauczor, H U; Weber, M A

    2016-08-01

    To compare the image quality of modern 3D and 2D sequences for dedicated wrist imaging at 3 Tesla (T) MRI. At 3 T MRI, 18 patients (mean age: 36.2 years) with wrist pain and 16 healthy volunteers (mean age: 26.4 years) were examined using 2D proton density-weighted fat-saturated (PDfs), isotropic 3D TrueFISP, 3D MEDIC, and 3D PDfs SPACE sequences. Image quality was rated on a five-point scale (0-4) including overall image quality (OIQ), visibility of important structures (cartilage, ligaments, TFCC) and degree of artifacts. Signal-to-noise ratios (SNR) and contrast-to-noise ratios (CNR) of cartilage/bone/muscle/fluid as well as the mean overall SNR/CNR were calculated using region-of-interest analysis. ANOVA, paired t-, and Wilcoxon signed-rank tests were applied. The image quality of all tested sequences was superior to 3D PDfs SPACE (p < 0.01). 3D TrueFISP had the highest combined cartilage score (mean: 3.4) and performed better in cartilage comparisons against 3D PDfs SPACE in both groups and 2D PDfs in volunteers (p < 0.05). 3D MEDIC performed better in 7 of 8 comparisons (p < 0.05) regarding ligaments and TFCC. 2D PDfs provided consistently high scores. The mean overall SNR/CNR for 2D PDfs, 3D PDfs SPACE, 3D TrueFISP, and 3D MEDIC were 68/65, 32/27, 45/47, and 57/45, respectively. 2D PDfs performed best in most SNR/CNR comparisons (p < 0.05) and 3D MEDIC performed best within the 3D sequences (p < 0.05). Except for 3D PDfs SPACE, all tested 3D and 2D sequences provided high image quality. 3D TrueFISP was best for cartilage imaging, 3D MEDIC for ligaments and TFCC and 2D PDfs for general wrist imaging. • 3D TrueFISP is recommended for cartilage imaging of the wrist at 3 T. • 3D MEDIC is recommended for ligaments and TFCC. • Robust 2D PDfs should be used in routine protocols. 3D sequences may be added depending on the clinical question.

  19. Genomic Analysis of a Marine Bacterium: Bioinformatics for Comparison, Evaluation, and Interpretation of DNA Sequences.

    PubMed

    Rekadwad, Bhagwan N; Gonzalez, Juan M; Khobragade, Chandrahasya N

    2016-01-01

    A total of five highly related strains of an unidentified marine bacterium were analyzed through their short genome sequences (AM260709-AM260713). Genome-to-Genome Distance (GGDC) showed high similarity to Pseudoalteromonas haloplanktis (X67024). The generated unique Quick Response (QR) codes indicated no identity to other microbial species or gene sequences. Chaos Game Representation (CGR) showed the number of bases concentrated in the area. Guanine residues were highest in number followed by cytosine. Frequency of Chaos Game Representation (FCGR) indicated that CC and GG blocks have higher frequency in the sequence from the evaluated marine bacterium strains. Maximum GC content for the marine bacterium strains ranged 53-54%. The use of QR codes, CGR, FCGR, and GC dataset helped in identifying and interpreting short genome sequences from specific isolates. A phylogenetic tree was constructed with the bootstrap test (1000 replicates) using MEGA6 software. Principal Component Analysis (PCA) was carried out using EMBL-EBI MUSCLE program. Thus, generated genomic data are of great assistance for hierarchical classification in Bacterial Systematics which combined with phenotypic features represents a basic procedure for a polyphasic approach on unambiguous bacterial isolate taxonomic classification.
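
    For readers unfamiliar with the measures named in this abstract, the Python sketch below computes GC content, the Chaos Game Representation (CGR) coordinates of a sequence, and k-mer counts of the kind that a frequency CGR (FCGR) tabulates. It is a generic illustration, not the authors' pipeline: the corner assignment (A bottom-left, C top-left, G top-right, T bottom-right) is one common convention and is assumed here, and all function names are hypothetical.

```python
from collections import Counter

# Assumed corner layout of the unit square; other layouts are also in use.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def gc_content(seq):
    """Percentage of G and C among unambiguous bases."""
    bases = [b for b in seq.upper() if b in CORNERS]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def cgr_points(seq):
    """CGR: start at the centre and move halfway to each base's corner in turn."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # ambiguity codes and gaps are skipped
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def kmer_counts(seq, k=2):
    """Overlapping k-mer counts; k=2 gives dinucleotide blocks such as CC and GG."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    demo = "CCGGCC"  # toy string, not one of the AM260709-AM260713 records
    print(gc_content(demo), cgr_points(demo)[:2], kmer_counts(demo))
```

    In a CGR plot, points cluster in the sub-squares that correspond to over-represented k-mers, so a sequence rich in CC and GG would show denser regions near the C and G corners under this layout.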

  20. Comparison of human papillomavirus detection and genotyping with four different prime sets by PCR-sequencing.

    PubMed

    Cai, Yu Pin; Yang, Yi; Zhu, Bao Li; Li, Yuan; Xia, Xiao Yu; Zhang, Rui Fen; Xiang, Yang

    2013-01-01

    To assess and compare the Human Papillomavirus (HPV) detection efficiency and the potential clinical utility of PCR sequencing-based technology. Four HPV consensus primer sets (GP5+/6+, MGP, MY09/11, and PGMY09/11) were used in order to amplify a broad spectrum of HPV types for HPV infection in 325 cervical samples and the PCR products were sequenced afterwards for the HPV genotyping. The HPV-positive rate was 75.4%, of which 35.5% harbored more than one HPV genotype. A total of 36 different genotypes was found, with HPV 16 (24.1%) being the most prevalent, followed by HPV 58 (13.3%) and HPV 52 (9.6%). There were substantial to almost perfect agreements between different primer sets regarding HPV detection efficiency, with the kappa value varying from 0.751 to 0.925. MGP and PGMY09/11 were the most effective in detecting multiple infections (P < 0.001). With each of the primer sets, a broad range of HPV types could be identified, though there were several differences for a few genotypes. The substantial agreement between PCR-sequencing and HC2 for the detection of high-risk HPV (kappa=0.761) indicated that PCR-sequencing is also suitable for routine HPV screening. Copyright © 2013 The Editorial Board of Biomedical and Environmental Sciences. Published by China CDC. All rights reserved.
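
    The agreement statistic reported here is Cohen's kappa, which compares observed agreement between two methods with the agreement expected by chance. A minimal Python sketch is given below; the function name is hypothetical and the example calls are invented counts for illustration, not data from the study.

```python
def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two equal-length lists of categorical calls."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(calls_a)
    # Observed agreement: fraction of samples on which both methods agree.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Chance agreement: product of marginal frequencies, summed over labels.
    labels = set(calls_a) | set(calls_b)
    expected = sum((calls_a.count(lab) / n) * (calls_b.count(lab) / n) for lab in labels)
    if expected == 1.0:
        return 1.0  # both methods gave one identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Invented example: 100 samples, 1 = HPV detected, 0 = not detected.
    primer_set_1 = [1] * 40 + [0] * 40 + [1] * 10 + [0] * 10
    primer_set_2 = [1] * 40 + [0] * 40 + [0] * 10 + [1] * 10
    print(round(cohens_kappa(primer_set_1, primer_set_2), 3))  # 0.6
```

    On the widely used Landis and Koch scale, values of 0.61 to 0.80 are described as substantial agreement and 0.81 to 1.00 as almost perfect, which matches the wording of the abstract for kappa values between 0.751 and 0.925.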

  1. Genomic Analysis of a Marine Bacterium: Bioinformatics for Comparison, Evaluation, and Interpretation of DNA Sequences

    PubMed Central

    Khobragade, Chandrahasya N.

    2016-01-01

    A total of five highly related strains of an unidentified marine bacterium were analyzed through their short genome sequences (AM260709–AM260713). Genome-to-Genome Distance (GGDC) showed high similarity to Pseudoalteromonas haloplanktis (X67024). The generated unique Quick Response (QR) codes indicated no identity to other microbial species or gene sequences. Chaos Game Representation (CGR) showed the number of bases concentrated in the area. Guanine residues were highest in number followed by cytosine. Frequency of Chaos Game Representation (FCGR) indicated that CC and GG blocks have higher frequency in the sequence from the evaluated marine bacterium strains. Maximum GC content for the marine bacterium strains ranged 53-54%. The use of QR codes, CGR, FCGR, and GC dataset helped in identifying and interpreting short genome sequences from specific isolates. A phylogenetic tree was constructed with the bootstrap test (1000 replicates) using MEGA6 software. Principal Component Analysis (PCA) was carried out using EMBL-EBI MUSCLE program. Thus, generated genomic data are of great assistance for hierarchical classification in Bacterial Systematics which combined with phenotypic features represents a basic procedure for a polyphasic approach on unambiguous bacterial isolate taxonomic classification. PMID:27882328

  2. Comparison of ribotyping and sequence-based typing for discriminating among isolates of Bordetella bronchiseptica

    USDA-ARS's Scientific Manuscript database

    Aims: Our goal was to compare the discriminatory power of PvuII ribotyping and MLST using a single set of diverse Bordetella bronchiseptica isolates and to determine whether subtyping based on repeat region sequences of the pertactin gene (prn) provides additional resolution. Methods and Results: ...

  3. Comparison of Computer Vision and Photogrammetric Approaches for Epipolar Resampling of Image Sequence

    PubMed Central

    Kim, Jae-In; Kim, Taejung

    2016-01-01

    Epipolar resampling is the procedure of eliminating vertical disparity between stereo images. Due to its importance, many methods have been developed in the computer vision and photogrammetry fields. However, we argue that epipolar resampling of image sequences, instead of a single pair, has not been studied thoroughly. In this paper, we compare epipolar resampling methods developed in both fields for handling image sequences. Firstly, we briefly review the uncalibrated and calibrated epipolar resampling methods developed in computer vision and the photogrammetric epipolar resampling methods. While it is well known that epipolar resampling methods developed in computer vision and in photogrammetry are mathematically identical, we also point out differences in parameter estimation between them. Secondly, we tested representative resampling methods in both fields and performed an analysis. We showed that for epipolar resampling of a single image pair all uncalibrated and photogrammetric methods tested could be used. More importantly, we also showed that, for image sequences, all methods tested, except the photogrammetric Bayesian method, showed significant variations in epipolar resampling performance. Our results indicate that the Bayesian method is favorable for epipolar resampling of image sequences. PMID:27011186

  4. Comparison of Synchronization Techniques for the AFIT Direct Sequence Spread Spectrum System

    DTIC Science & Technology

    1992-04-21

    13. Sklar, Bernard. Digital Communications - Fundamentals and Applications. Englewood Cliffs, N.J.: Prentice Hall, 1988. 14. Stephens, James P. Direct... Transmitted Reference Synchronization Diagram ... 3. Baseband Digital Matched Filter ... 4. Delay Line Matched Filter ... direct sequence technique uses a pseudo-noise (PN) digital code to directly modulate a conventional frequency modulated (FM) carrier in the AFIT system

  5. Next-Generation Sequencing of Aquatic Oligochaetes: Comparison of Experimental Communities

    PubMed Central

    Vivien, Régis; Lejzerowicz, Franck; Pawlowski, Jan

    2016-01-01

    Aquatic oligochaetes are a common group of freshwater benthic invertebrates known to be very sensitive to environmental changes and currently used as bioindicators in some countries. However, more extensive application of oligochaetes for assessing the ecological quality of sediments in watercourses and lakes would require overcoming the difficulties related to morphology-based identification of oligochaete species. This study tested the Next-Generation Sequencing (NGS) of a standard cytochrome c oxidase I (COI) barcode as a tool for the rapid assessment of oligochaete diversity in environmental samples, based on mixed specimen samples. To determine the composition of each sample, we Sanger-sequenced every specimen present in these samples. Our study showed that a large majority of OTUs (Operational Taxonomic Units) could be detected by NGS analyses. We also observed congruence between the NGS and specimen abundance data for several but not all OTUs. Because the differences in sequence abundance data were consistent across samples, we exploited these variations to empirically design correction factors. We showed that such factors increased the congruence between the values of oligochaete-based indices inferred from the NGS and the Sanger-sequenced specimen data. The validation of these correction factors by further experimental studies will be needed for the adaptation and use of NGS technology in biomonitoring studies based on oligochaete communities. PMID:26866802

  6. The 5'-flanking regions of three pea legumin genes: comparison of the DNA sequences.

    PubMed Central

    Lycett, G W; Croy, R R; Shirsat, A H; Richards, D M; Boulter, D

    1985-01-01

    Approximately 1200 nucleotides of sequence data from the promoter and 5'-flanking regions of each of three pea (Pisum sativum L.) legumin genes (legA, legB and legC) are presented. The promoter regions of all three genes were found to be identical, including the "TATA box", the "CAAT box", and sequences showing homology to the SV40 enhancers. The legA sequence begins to diverge from the others about 300 bp from the start codon, whereas the other two genes remain identical for another 550 bp. The regions of partial homology exhibit deletions or insertions and some short, comparatively well conserved sequences. The significance of these features is discussed in terms of evolutionary mechanisms and their possible functional roles. The legC gene contains a region that may potentially form either of two mutually exclusive stem-loop structures, one of which has a stem 42 bp long, which suggests that it could be fairly stable. We suggest that a mechanism of switching between such alternative structures may play some role in gene control or may represent the insertion of a transposable element. PMID:2997721

  7. Next-Generation Sequencing of Aquatic Oligochaetes: Comparison of Experimental Communities.

    PubMed

    Vivien, Régis; Lejzerowicz, Franck; Pawlowski, Jan

    2016-01-01

    Aquatic oligochaetes are a common group of freshwater benthic invertebrates known to be very sensitive to environmental changes and currently used as bioindicators in some countries. However, more extensive application of oligochaetes for assessing the ecological quality of sediments in watercourses and lakes would require overcoming the difficulties relat