Science.gov

Sample records for biological sequence comparison

  1. Method and apparatus for biological sequence comparison

    DOEpatents

    Marr, T.G.; Chang, W.I.

    1997-12-23

    A method and apparatus are disclosed for comparing biological sequences from a known source of sequences with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM) and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level and are long enough to be statistically significant. The device filters out fragments of the known sequences that are too short or that have a lower average similarity to the subject sequence than each target similarity level requires. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each large enough to contain a minimum-length alignment from a known sequence. For each block, the filtering member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The resulting set of short-fragment best matches for the block provides an upper threshold on alignment values. Regions of a certain length from the known sequences whose mean alignment-value upper threshold exceeds a target unit score are concatenated to form a union. The current block is then compared to the union, which indicates the best local alignment with the subject sequence. 5 figs.

  2. Method and apparatus for biological sequence comparison

    DOEpatents

    Marr, Thomas G.; Chang, William I-Wei

    1997-01-01

    A method and apparatus for comparing biological sequences from a known source of sequences with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM) and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level and are long enough to be statistically significant. The device filters out fragments of the known sequences that are too short or that have a lower average similarity to the subject sequence than each target similarity level requires. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each large enough to contain a minimum-length alignment from a known sequence. For each block, the filtering member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The resulting set of short-fragment best matches for the block provides an upper threshold on alignment values. Regions of a certain length from the known sequences whose mean alignment-value upper threshold exceeds a target unit score are concatenated to form a union. The current block is then compared to the union, which indicates the best local alignment with the subject sequence.
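    The block-filtering idea in the two patent abstracts can be illustrated with a deliberately simplified Python sketch. It is not the patented method: it assumes DNA-style +1/-1 match/mismatch scoring instead of PAM-based scores, uses ungapped per-residue fragment scores as a stand-in for the "upper threshold", and the function names (`fragment_upper_bounds`, `candidate_regions`) and parameters (`w`, `region_len`, `target`) are hypothetical.

```python
def fragment_upper_bounds(block, known, w, match=1, mismatch=-1):
    """For each length-w fragment of `known`, return its best per-residue
    ungapped score against any length-w window of `block` (hypothetical scoring)."""
    if len(block) < w or len(known) < w:
        raise ValueError("block and known sequence must be at least w long")
    bounds = []
    for j in range(len(known) - w + 1):
        frag = known[j:j + w]
        best = max(
            sum(match if a == b else mismatch for a, b in zip(frag, block[i:i + w]))
            for i in range(len(block) - w + 1)
        )
        bounds.append(best / w)
    return bounds


def candidate_regions(block, known, w, region_len, target):
    """Return merged (start, end) intervals of `known` covering every region of
    length `region_len` whose mean fragment bound exceeds `target`."""
    if region_len < w:
        raise ValueError("region_len must be at least w")
    bounds = fragment_upper_bounds(block, known, w)
    n_frag = region_len - w + 1  # number of fragments that start inside one region
    intervals = []
    for start in range(len(bounds) - n_frag + 1):
        mean = sum(bounds[start:start + n_frag]) / n_frag
        if mean > target:
            end = start + region_len
            if intervals and start <= intervals[-1][1]:
                # overlapping or adjacent region: extend the current interval (the "union")
                intervals[-1] = (intervals[-1][0], max(intervals[-1][1], end))
            else:
                intervals.append((start, end))
    return intervals
```

    In this sketch only the returned intervals would be passed on to a full local alignment against the block; everything else in the known sequence is skipped.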

  3. Circular Helix-Like Curve: An Effective Tool of Biological Sequence Analysis and Comparison.

    PubMed

    Li, Yushuang; Xiao, Wenli

    2016-01-01

    This paper constructs a novel injection from a DNA sequence to a 3D graph, named the circular helix-like curve (CHC). The graphical representation can be used to visualize the characteristics of a single DNA sequence and to identify similarities and differences among several DNA sequences. A 12-dimensional vector extracted from the CHC, as its numerical characterization, was applied to analyze phylogenetic relationships in datasets of 11 species, 74 ribosomal RNAs, 48 hepatitis E viruses, and 18 eutherian mammals. The successful experiments illustrate that the CHC is an effective tool for biological sequence analysis and comparison. PMID:27403205

  4. Circular Helix-Like Curve: An Effective Tool of Biological Sequence Analysis and Comparison

    PubMed Central

    Li, Yushuang

    2016-01-01

    This paper constructs a novel injection from a DNA sequence to a 3D graph, named the circular helix-like curve (CHC). The graphical representation can be used to visualize the characteristics of a single DNA sequence and to identify similarities and differences among several DNA sequences. A 12-dimensional vector extracted from the CHC, as its numerical characterization, was applied to analyze phylogenetic relationships in datasets of 11 species, 74 ribosomal RNAs, 48 hepatitis E viruses, and 18 eutherian mammals. The successful experiments illustrate that the CHC is an effective tool for biological sequence analysis and comparison. PMID:27403205

  5. Comparison of biological and chemical phosphorus removals in continuous and sequencing batch reactors

    SciTech Connect

    Ketchum, L.H.; Irvine, R.L. Jr.; Breyfogle, R.E.; Manning, J.F. Jr.

    1987-01-01

    A full-scale study of phosphorus removal was conducted at Culver using continuous-flow operation, SBR operation, and several different chemical treatment schemes. A full-scale demonstration also showed SBR biological phosphorus removal to be effective. Four contributing groups of organisms and their roles in biological SBR phosphorus removal are described: denitrifying organisms, fermentation product-manufacturing organisms, phosphorus-accumulating organisms, and aerobic autotrophs and heterotrophs. The SBR can provide the proper balance of anoxic, anaerobic, and aerobic conditions to allow these groups of organisms to remove phosphorus biologically, without chemical addition. Treatment results using various chemicals for phosphorus removal, both during conventional continuous-flow operation and after the plant was converted to SBR operation, are also provided for comparison. Effluent phosphorus concentrations were almost identical for each period, except for the period when phosphorus was removed biologically without any chemical addition, when effluent phosphorus concentrations were lowest. These removals were achieved by settling alone; no tertiary rapid sand filter was used or required.

  6. Zucchini yellow mosaic virus: biological properties, detection procedures and comparison of coat protein gene sequences.

    PubMed

    Coutts, B A; Kehoe, M A; Webster, C G; Wylie, S J; Jones, R A C

    2011-12-01

    Between 2006 and 2010, 5324 samples from at least 34 weed species, two cultivated legume species and 11 native species were collected from three cucurbit-growing areas in tropical or subtropical Western Australia. Two new alternative hosts of zucchini yellow mosaic virus (ZYMV) were identified: the Australian native cucurbit Cucumis maderaspatanus and the naturalised legume species Rhynchosia minima. Low-level (0.7%) seed transmission of ZYMV was found in seedlings grown from seed collected from zucchini (Cucurbita pepo) fruit infected with isolate Cvn-1. Seed transmission was absent in >9500 pumpkin (C. maxima and C. moschata) seedlings from fruit infected with isolate Knx-1. Leaf samples from symptomatic cucurbit plants collected from fields in five cucurbit-growing areas in four Australian states were tested for the presence of ZYMV. When 42 complete coat protein (CP) nucleotide (nt) sequences from the new ZYMV isolates were compared with 101 complete CP nt sequences from five other continents, phylogenetic analysis of the 143 ZYMV sequences revealed three distinct groups (A, B and C), with four subgroups in A (I-IV) and two in B (I-II). The new Australian sequences grouped according to collection location, fitting within A-I, A-II and B-II. The 16 new sequences from one isolated location in tropical northern Western Australia all grouped into subgroup B-II, which contained no other isolates. In contrast, the three sequences from the Northern Territory fitted into A-II, with 94.6-99.0% nt identities to isolates from the United States, Iran, China and Japan. The 23 new sequences from the central west coast and two east coast locations all fitted into A-I, with 95.9-98.9% nt identities to sequences from Europe and Japan. These findings suggest that (i) there have been at least three separate ZYMV introductions into Australia and (ii) local isolate CP sequences change little after becoming established in remote growing areas. Isolates from A-I and B…

  7. Bringing Next-Generation Sequencing into the Classroom through a Comparison of Molecular Biology Techniques

    ERIC Educational Resources Information Center

    Bowling, Bethany; Zimmer, Erin; Pyatt, Robert E.

    2014-01-01

    Although the development of next-generation (NextGen) sequencing technologies has revolutionized genomic research and medicine, the incorporation of these topics into the classroom is challenging, given an implied high degree of technical complexity. We developed an easy-to-implement, interactive classroom activity investigating the similarities…

  8. Indigenous and introduced potyviruses of legumes and Passiflora spp. from Australia: biological properties and comparison of coat protein sequences

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The coat protein genes of 33 Potyvirus isolates from legume and Passiflora spp. were sequenced to determine the identity of the infecting viruses. Phylogenetic analysis of the sequences revealed the presence of seven distinct virus species....

  9. Indigenous and introduced potyviruses of legumes and Passiflora spp. from Australia: biological properties and comparison of coat protein nucleotide sequences.

    PubMed

    Coutts, Brenda A; Kehoe, Monica A; Webster, Craig G; Wylie, Stephen J; Jones, Roger A C

    2011-10-01

    Five Australian potyviruses, passion fruit woodiness virus (PWV), passiflora mosaic virus (PaMV), passiflora virus Y (PaVY), clitoria chlorosis virus (ClCV) and hardenbergia mosaic virus (HarMV), and two introduced potyviruses, bean common mosaic virus (BCMV) and cowpea aphid-borne mosaic virus (CAbMV), were detected in nine wild or cultivated Passiflora and legume species growing in tropical, subtropical or Mediterranean climatic regions of Western Australia. When ClCV (1), PaMV (1), PaVY (8) and PWV (5) isolates were inoculated to 15 plant species, PWV and two PaVY P. foetida isolates infected P. edulis and P. caerulea readily but legumes only occasionally. Another PaVY P. foetida isolate resembled five PaVY legume isolates in infecting legumes readily but not infecting P. edulis. PaMV resembled PaVY legume isolates in legumes but also infected P. edulis. ClCV did not infect P. edulis or P. caerulea and behaved differently from PaVY legume isolates and PaMV when inoculated to two legume species. When complete coat protein (CP) nucleotide (nt) sequences of 33 new isolates were compared with 41 others, PWV (8), HarMV (4), PaMV (1) and ClCV (1) were within a large group of Australian isolates, while PaVY (14), CAbMV (1) and BCMV (3) isolates were in three other groups. Variation among PWV and PaVY isolates was sufficient for division into four clades each (I-IV). A variable block of 56 amino acid residues at the N-terminal region of the CPs of PaMV and ClCV distinguished them from PWV. Comparison of PWV, PaMV and ClCV CP sequences showed that nt identities were both above and below the 76-77% potyvirus species threshold level. This research gives insights into the invasion of new hosts by potyviruses at the interface between natural vegetation and cultivated areas, and illustrates the potential of indigenous viruses to emerge to infect introduced plants. PMID:21744001

  10. Biological sequence classification with multivariate string kernels.

    PubMed

    Kuksa, Pavel P

    2013-01-01

    String kernel-based machine learning methods have yielded great success in practical tasks of structured/sequential data analysis. They often exhibit state-of-the-art performance on many practical tasks of sequence analysis, such as biological sequence classification, remote homology detection, or protein superfamily and fold prediction. However, typical string kernel methods rely on the analysis of discrete 1D string data (e.g., DNA or amino acid sequences). In this paper, we address multiclass biological sequence classification problems using multivariate representations in the form of sequences of feature vectors (as in biological sequence profiles, or sequences of individual amino acid physicochemical descriptors) and a class of multivariate string kernels that exploit these representations. On three protein sequence classification tasks, the proposed multivariate representations and kernels show significant improvements of 15-20 percent compared to existing state-of-the-art sequence classification methods. PMID:24384708

  11. Biological Sequence Analysis with Multivariate String Kernels.

    PubMed

    Kuksa, Pavel P

    2013-03-01

    String kernel-based machine learning methods have yielded great success in practical tasks of structured/sequential data analysis. They often exhibit state-of-the-art performance on many practical tasks of sequence analysis, such as biological sequence classification, remote homology detection, or protein superfamily and fold prediction. However, typical string kernel methods rely on analysis of discrete one-dimensional (1D) string data (e.g., DNA or amino acid sequences). In this work, we address multi-class biological sequence classification problems using multivariate representations in the form of sequences of feature vectors (as in biological sequence profiles, or sequences of individual amino acid physico-chemical descriptors) and a class of multivariate string kernels that exploit these representations. On a number of protein sequence classification tasks, the proposed multivariate representations and kernels show significant improvements of 15-20% compared to existing state-of-the-art sequence classification methods. PMID:23509193

  12. Adaptive seeds tame genomic sequence comparison.

    PubMed

    Kiełbasa, Szymon M; Wan, Raymond; Sato, Kengo; Horton, Paul; Frith, Martin C

    2011-03-01

    The main way of analyzing biological sequences is by comparing and aligning them to each other. It remains difficult, however, to compare modern multi-billion-base DNA data sets. The difficulty is caused by the nonuniform (oligo)nucleotide composition of these sequences, rather than by their size per se. To solve this problem, we modified the standard seed-and-extend approach (e.g., BLAST) to use adaptive seeds. Adaptive seeds are matches chosen on the basis of their rareness, instead of fixed-length matches. This method guarantees that the number of matches, and thus the running time, increases linearly, instead of quadratically, with sequence length. LAST, our open source implementation of adaptive seeds, enables fast and sensitive comparison of large sequences with arbitrarily nonuniform composition. PMID:21209072
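    The rareness criterion behind adaptive seeds can be shown with a naive Python sketch: at each query position the match is lengthened until it occurs at most `max_freq` times in the reference. The function names and the brute-force `str.find` counting are illustrative assumptions; LAST itself uses suffix arrays, and this naive version does not reproduce the linear-time behaviour described in the abstract.

```python
def count_occurrences(ref, pattern):
    """Count possibly overlapping occurrences of `pattern` in `ref`."""
    count = 0
    start = ref.find(pattern)
    while start != -1:
        count += 1
        start = ref.find(pattern, start + 1)
    return count


def adaptive_seeds(query, ref, max_freq):
    """For each query position, return (position, length, hits) for the shortest
    match that occurs at least once but at most `max_freq` times in `ref`."""
    seeds = []
    for i in range(len(query)):
        for k in range(1, len(query) - i + 1):
            hits = count_occurrences(ref, query[i:i + k])
            if hits == 0:
                break  # a longer match cannot occur if this one does not
            if hits <= max_freq:
                seeds.append((i, k, hits))  # rare enough: stop lengthening
                break
    return seeds
```

    In a repeat-rich region the seeds automatically become longer, which is how the number of seed hits stays bounded regardless of composition.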

  13. Meeting Highlights: Genome Sequencing and Biology 2001

    PubMed Central

    2001-01-01

    We bring you a report from the CSHL Genome Sequencing and Biology Meeting, which has a long and prestigious history. This year there were sessions on large-scale sequencing and analysis, polymorphisms (covering discovery and technologies and mapping and analysis), comparative genomics of mammalian and model organism genomes, functional genomics and bioinformatics. PMID:18628920

  14. Sequence comparison via polar coordinates representation and curve tree.

    PubMed

    Dai, Qi; Guo, Xiaodong; Li, Lihua

    2012-01-01

    Sequence comparison has become one of the essential tools in bioinformatics research; it can serve as evidence of structural and functional conservation, as well as of evolutionary relations among sequences. Existing graphical representation methods have achieved promising results in sequence comparison, but graphical representations and feature-based measures still pose some design challenges. We report here a new method for sequence comparison. It considers the whole distribution of dual bases and uses polar coordinates to map a biological sequence onto a closed curve. A curve tree is then constructed to characterize the closed curve numerically, and sequences are compared by evaluating the distance between the curve tree of the query sequence and the corresponding curve tree of the template sequence. The proposed method was tested by phylogenetic analysis, and its performance was compared with that of alignment-based methods. The results demonstrate that using a polar coordinates representation and curve tree to compare sequences is more efficient. PMID:22001081

  15. Function-Based Algorithms for Biological Sequences

    ERIC Educational Resources Information Center

    Mohanty, Pragyan Sheela P.

    2015-01-01

    Two problems at two different abstraction levels of computational biology are studied. At the molecular level, efficient pattern matching algorithms in DNA sequences are presented. For gene order data, an efficient data structure is presented capable of storing all gene re-orderings in a systematic manner. A common characteristic of presented…

  16. Learning Interpretable SVMs for Biological Sequence Classification

    PubMed Central

    Rätsch, Gunnar; Sonnenburg, Sören; Schäfer, Christin

    2006-01-01

    Background: Support Vector Machines (SVMs) using a variety of string kernels have been successfully applied to biological sequence classification problems. While SVMs achieve high classification accuracy, they lack interpretability. In many applications, it is not enough for an algorithm to detect a biological signal in the sequence; it should also provide a means of interpreting its solution in order to gain biological insight. Results: We propose novel and efficient algorithms for solving the so-called Support Vector Multiple Kernel Learning problem. The developed techniques can be used to understand the obtained support vector decision function in order to extract biologically relevant knowledge about the sequence analysis problem at hand. We apply the proposed methods to the task of acceptor splice site prediction and to the problem of recognizing alternatively spliced exons. Our algorithms compute sparse weightings of substring locations, highlighting which parts of the sequence are important for discrimination. Conclusion: The proposed method is able to deal with thousands of examples while combining hundreds of kernels within reasonable time, and reliably identifies a few statistically significant positions. PMID:16723012

  17. Protein sequence comparison and protein evolution

    SciTech Connect

    Pearson, W.R.

    1995-12-31

    This tutorial was one of eight tutorials selected for presentation at the Third International Conference on Intelligent Systems for Molecular Biology, held in the United Kingdom from July 16 to 19, 1995. The tutorial examines how the information conserved during the evolution of a protein molecule can be used to reliably infer homology, and thus a shared protein fold and possibly a shared active site or function. The authors start by reviewing a geological/evolutionary time scale. Next they look at the evolution of several protein families. During the tutorial, these families will be used to demonstrate that homologous protein ancestry can be inferred with confidence. They also examine different modes of protein evolution and consider some hypotheses that have been presented to explain the very earliest events in protein evolution. The next part of the tutorial examines the technical aspects of protein sequence comparison. The optimal and heuristic algorithms used to characterize protein sequence similarities, and their associated parameters, are discussed. Perhaps more importantly, they survey the statistics of local similarity scores, and how these statistics can be used both to improve the selectivity of a search and to evaluate the significance of a match. They then examine distantly related members of three protein families: the serine proteases, the glutathione transferases, and the G-protein-coupled receptors (GCRs). Finally, they discuss how sequence similarity can be used to examine internal repeated or mosaic structures in proteins.
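    As an illustration of the optimal local-alignment algorithms the tutorial refers to, here is a minimal Python sketch of the Smith-Waterman score with a linear gap penalty. The scoring values (match=2, mismatch=-1, gap=-2) are arbitrary assumptions, not parameters from the tutorial; real protein searches use substitution matrices such as BLOSUM or PAM together with affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b
    (score only, linear gap penalty, two rolling rows of the DP table)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # the 0 term lets a local alignment restart anywhere
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

    The significance of such a score is what the local-similarity statistics address: the same raw score means far less in a large database search than in a single pairwise comparison.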

  18. Fungal genome sequencing: basic biology to biotechnology.

    PubMed

    Sharma, Krishna Kant

    2016-08-01

    The genome sequences provide a first glimpse into the genomic basis of the biological diversity of filamentous fungi and yeast. The genome sequence of the budding yeast Saccharomyces cerevisiae, with its small genome, unicellular growth, and rich history of genetic and molecular analysis, was a milestone of early genomics in the 1990s. The subsequent completion of the genomes of the fission yeast Schizosaccharomyces pombe and the genetic model Neurospora crassa initiated a revolution in the genomics of the fungal kingdom. Since then, a substantial number of fungal genomes have been sequenced and publicly released, representing the widest sampling of genomes from any eukaryotic kingdom. An ambitious genome-sequencing program provides a wealth of data on metabolic diversity within the fungal kingdom, thereby enhancing research in medical science, agricultural science, ecology, bioremediation, bioenergy, and the biotechnology industry. Fungal genomics has great potential to positively affect human health, environmental health, and the planet's stored energy. With a significant increase in sequenced fungal genomes, the known diversity of genes encoding organic acids, antibiotics, enzymes, and their pathways has increased exponentially. Currently, over a hundred fungal genome sequences are publicly available; however, no comprehensive review has been published. This review is an initiative to address the significance of the fungal genome-sequencing program and provides a road map for basic and applied research. PMID:25721271

  19. Reading biological processes from nucleotide sequences

    NASA Astrophysics Data System (ADS)

    Murugan, Anand

    Cellular processes have traditionally been investigated by techniques of imaging and biochemical analysis of the molecules involved. The recent rapid progress in our ability to manipulate and read nucleic acid sequences gives us direct access to the genetic information that directs and constrains biological processes. While sequence data is being used widely to investigate genotype-phenotype relationships and population structure, here we use sequencing to understand biophysical mechanisms. We present work on two different systems. First, in chapter 2, we characterize the stochastic genetic editing mechanism that produces diverse T-cell receptors in the human immune system. We do this by inferring statistical distributions of the underlying biochemical events that generate T-cell receptor coding sequences from the statistics of the observed sequences. This inferred model quantitatively describes the potential repertoire of T-cell receptors that can be produced by an individual, providing insight into its potential diversity and the probability of generation of any specific T-cell receptor. Then, in chapter 3, we present work on understanding the functioning of regulatory DNA sequences in both prokaryotes and eukaryotes. Here we use experiments that measure the transcriptional activity of large libraries of mutagenized promoters and enhancers and infer models of the sequence-function relationship from these data. For the bacterial promoter, we infer a physically motivated 'thermodynamic' model of the interaction of DNA-binding proteins and RNA polymerase determining the transcription rate of the downstream gene. For the eukaryotic enhancers, we infer heuristic models of the sequence-function relationship and use these models to find synthetic enhancer sequences that optimize inducibility of expression. Both projects demonstrate the utility of sequence information in conjunction with sophisticated statistical inference techniques for dissecting underlying biophysical…

  20. The computational linguistics of biological sequences

    SciTech Connect

    Searls, D.

    1995-12-31

    This tutorial was one of eight tutorials selected to be presented at the Third International Conference on Intelligent Systems for Molecular Biology which was held in the United Kingdom from July 16 to 19, 1995. Protein sequences are analogous in many respects, particularly their folding behavior. Proteins have a much richer variety of interactions, but in theory the same linguistic principles could come to bear in describing dependencies between distant residues that arise by virtue of three-dimensional structure. This tutorial will concentrate on nucleic acid sequences.

  1. A simple method for global sequence comparison.

    PubMed Central

    Pizzi, E; Attimonelli, M; Liuni, S; Frontali, C; Saccone, C

    1992-01-01

    A simple method of sequence comparison, based on a correlation analysis of oligonucleotide frequency distributions, is here shown to be a reliable test of overall sequence similarity. The method does not involve sequence alignment procedures and permits the rapid screening of large amounts of sequence data. It identifies those sequences which deserve more careful analysis of sequence similarity at the level of resolution of the single nucleotide. It uses observed quantities only and does not involve the adoption of any theoretical model. PMID:1738591
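    The core idea, correlating oligonucleotide frequency profiles instead of aligning, can be sketched in Python as follows. The word length `k`, the restriction to the ACGT alphabet and the use of Pearson's r are assumptions for illustration; the paper's exact statistic and normalisation may differ, and the function names are hypothetical.

```python
from itertools import product
from math import sqrt


def kmer_frequencies(seq, k):
    """Relative frequencies of all 4**k DNA words of length k, in a fixed order."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip words containing N or other symbols
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in words]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a constant profile; treated as uncorrelated here
    return cov / (sx * sy)


def oligo_correlation(seq_a, seq_b, k=2):
    """Alignment-free similarity: correlation of k-mer frequency profiles."""
    return pearson(kmer_frequencies(seq_a.upper(), k),
                   kmer_frequencies(seq_b.upper(), k))
```

    Because only word counts are compared, the cost is linear in sequence length, which is what makes such a measure usable as a rapid screen before nucleotide-level analysis.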

  2. Frequent patterns mining in multiple biological sequences.

    PubMed

    Chen, Ling; Liu, Wei

    2013-10-01

    Existing algorithms for mining frequent patterns in multiple biosequences may generate multiple projected databases and short candidate patterns, which can increase computation time and memory requirements. To overcome these shortcomings, we propose a fast and efficient algorithm for mining frequent patterns in multiple biological sequences (MSPM). We first present the concept of a primary pattern, which can be extended to form larger patterns in the sequence. To detect frequent primary patterns, a prefix tree is constructed. Based on this prefix tree, a pattern-extending approach is also presented to mine frequent patterns without producing a large number of irrelevant candidate patterns. The experimental results show that the MSPM algorithm achieves not only faster speed but also higher-quality results compared with other methods. PMID:24034736
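    MSPM's prefix tree is not described here in enough detail to reproduce, but the general pattern-growth idea, extending only patterns that are already frequent and only by symbols actually observed after their occurrences, can be sketched in Python. This is a generic illustration under hypothetical names, not the MSPM algorithm; support is assumed to mean the number of sequences that contain the contiguous pattern.

```python
from collections import defaultdict


def frequent_substrings(seqs, min_support):
    """Return {pattern: support} for contiguous patterns found in at least
    `min_support` sequences, grown one symbol at a time from their occurrences."""
    # level maps pattern -> list of (sequence index, end position) occurrences
    level = defaultdict(list)
    for s_idx, s in enumerate(seqs):
        for pos, ch in enumerate(s):
            level[ch].append((s_idx, pos + 1))
    result = {}
    while level:
        next_level = defaultdict(list)
        for pattern, occs in level.items():
            support = len({s_idx for s_idx, _ in occs})
            if support < min_support:
                continue  # infrequent patterns are never extended
            result[pattern] = support
            for s_idx, end in occs:
                if end < len(seqs[s_idx]):
                    # extend only by the symbol that really follows this occurrence
                    next_level[pattern + seqs[s_idx][end]].append((s_idx, end + 1))
        level = next_level
    return result
```

    Because extensions come from real occurrences, no candidate pattern is ever created that does not appear in at least one sequence.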

  3. Intra-species sequence comparisons for annotating genomes

    SciTech Connect

    Boffelli, Dario; Weer, Claire V.; Weng, Li; Lewis, Keith D.; Shoukry, Malak I.; Pachter, Lior; Keys, David N.; Rubin, Edward M.

    2004-07-15

    Analysis of sequence variation among members of a single species offers a potential approach to identifying functional DNA elements responsible for biological features unique to that species. Because of its high rate of allelic polymorphism and ease of genetic manipulation, we chose the sea squirt, Ciona intestinalis, to explore intra-species sequence comparisons for genome annotation. A large number of C. intestinalis specimens were collected from four continents, and a set of genomic intervals was amplified, resequenced and analyzed to determine the mutation rate at each nucleotide in the sequence. We found that regions with low mutation rates efficiently demarcated functionally constrained sequences: these include a set of noncoding elements, which we showed in C. intestinalis transgenic assays to act as tissue-specific enhancers, as well as the locations of coding sequences. This illustrates that comparisons of multiple members of a species can be used for genome annotation. It suggests a path for annotating the sequenced genomes of organisms occupying uncharacterized phylogenetic branches of the animal kingdom, and raises the possibility that resequencing a large number of Homo sapiens individuals might be used to annotate the human genome and identify sequences defining traits unique to our species. The sequence data from this study have been submitted to GenBank under accession nos. AY667278-AY667407.

  4. A natural M RNA reassortant arising from two species of plant-and-insect-infecting bunyaviruses and comparison of its sequences and biological properties to parental species

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Nucleic acid sequencing showed that viruses collected from Florida tomatoes contained a reassortment of two distinct species, with the S and L segments belonging to Groundnut ringspot virus (GRSV) and the M segment belonging to tomato chlorotic spot virus (TCSV). Inoculations on resistance gene containin...

  5. Classification and identification of geminiviruses using sequence comparisons.

    PubMed

    Padidam, M; Beachy, R N; Fauquet, C M

    1995-02-01

    The genomes and ORFs of 36 geminiviruses were compared to obtain phylogenetic trees and frequency distributions of all possible pairwise comparisons, with the objective of classifying geminiviruses. Such comparisons show that geminiviruses form two distinct clusters: leafhopper-transmitted viruses that infect monocots (subgroup I) and whitefly-transmitted viruses that infect dicots (subgroup III). This holds irrespective of the part of the genome considered. Of the two leafhopper-transmitted viruses that infect dicots, tobacco yellow dwarf virus has a sequence most similar to subgroup I viruses, whereas the affinity of beet curly top virus differed depending on the ORF considered. The distributions of identities within subgroups are significantly different, suggesting that the taxonomic status of a particular isolate within a subgroup can be quantified. All the recognized strains of any one virus have greater than 90% sequence identity. The 200-nucleotide intercistronic regions of geminiviruses were observed to be more variable than the remainder of the genome. The amino acid sequences of the coat protein (CP) of subgroup III viruses are more conserved than the remainder of the genome. However, a short N-terminal region (60-70 amino acids) of the CP is more variable than the rest of the CP sequence and is a close representation of the genome. PCR primers based on conserved sequences can be used to clone and sequence the N-terminal sequences of the CP of the geminiviruses; this sequence is sufficient to classify a virus isolate. A possible taxonomic structure for geminiviruses is proposed after considering the sequence comparisons and biological properties. PMID:7844548
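    Distributions of pairwise identities like these start from a percent-identity value for every pair of sequences. The Python sketch below assumes the sequences are already aligned to equal length and, as an arbitrary convention, ignores any column in which either sequence has a gap (`-`); the function names are hypothetical and other identity definitions (for example, dividing by alignment length) give different numbers.

```python
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """Percent identity of two pre-aligned sequences, ignoring gapped columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        identical += x == y
    return 100.0 * identical / compared if compared else 0.0


def pairwise_identities(aligned):
    """Map each index pair (i, j), i < j, to its percent identity."""
    return {(i, j): percent_identity(aligned[i], aligned[j])
            for i, j in combinations(range(len(aligned)), 2)}
```

    Plotting a histogram of the values returned by `pairwise_identities` gives the kind of frequency distribution on which the subgroup and strain observations (such as the 90% figure) are based.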

  6. A Parallel Non-Alignment Based Approach to Efficient Sequence Comparison using Longest Common Subsequences

    NASA Astrophysics Data System (ADS)

    Bhowmick, S.; Shafiullah, M.; Rai, H.; Bastola, D.

    2010-11-01

    Biological sequence comparison programs have revolutionized the practice of biochemistry and of molecular and evolutionary biology. Pairwise comparison of genomic sequences is a popular method for analyzing genetic sequence data. However, the quality of results from most sequence comparison methods is significantly affected by small perturbations in the data, and there is a dearth of computational tools for comparing sequences beyond a certain length. In this paper, we describe a parallel algorithm for comparing genetic sequences using an alignment-free method based on computing the Longest Common Subsequence (LCS) between genetic sequences. We validate the quality of our results by comparing the phylogenetic trees obtained from ClustalW and LCS. We also show, through an isoefficiency complexity analysis and by empirical measurement of the running time, that our algorithm is very scalable.
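    The comparison rests on the longest common subsequence. Below is a minimal sequential Python sketch of the classic LCS dynamic program, plus a length-normalised distance that could feed a tree-building step; the normalisation `1 - LCS / max(len)` and the function names are assumptions for illustration, not necessarily the paper's measure, and the parallel scheme is not reproduced.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (two-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j, other in enumerate(b, start=1):
            if ch == other:
                curr.append(prev[j - 1] + 1)  # extend the LCS of both prefixes
            else:
                curr.append(max(prev[j], curr[j - 1]))  # drop a symbol from one side
        prev = curr
    return prev[-1]


def lcs_distance(a, b):
    """Hypothetical normalised distance in [0, 1]; 0 means one sequence contains the other as a subsequence and both have equal length."""
    if not a and not b:
        return 0.0
    return 1.0 - lcs_length(a, b) / max(len(a), len(b))
```

    Cells on the same anti-diagonal of the full DP table depend only on earlier anti-diagonals, which is one common way such computations are parallelised.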

  7. Mining frequent biological sequences based on bitmap without candidate sequence generation.

    PubMed

    Wang, Qian; Davis, Darryl N; Ren, Jiadong

    2016-02-01

    Biological sequences carry much of the important genetic information of organisms. Furthermore, there is an inheritance law related to protein function and structure that is useful for applications such as disease prediction. Frequent sequence mining is a core technique for association rule discovery, but existing algorithms suffer from low efficiency or high error rates because biological sequences have more characteristics than general sequences. In this paper, an algorithm for mining Frequent Biological Sequences based on Bitmap, FBSB, is proposed. FBSB uses bitmaps as a simple data structure and transforms each row into a quicksort list (QS-list) for sequence growth. To meet the continuity and accuracy requirements of biological sequence mining, the sequences tested during the mining process of FBSB are real ones rather than generated candidates, and all the frequent sequences can be mined without any errors. Compared with other algorithms, FBSB achieves better performance in both run time and scalability in the experiments. PMID:26773937
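
The sketch below illustrates the general idea of bitmap-based support counting for contiguous patterns that are grown only from occurrences actually present in the data. It is not FBSB: Python integers stand in for the bitmaps, there is no QS-list, and restricting patterns to contiguous substrings is an assumption made to reflect the continuity requirement mentioned above.

```python
def mine_frequent_substrings(seqs, min_support):
    """Return {pattern: support} for every contiguous pattern that occurs in at
    least min_support sequences. Support counts sequences, not occurrences."""
    # Per sequence, one bitmap per symbol: bit i is set when seq[i] == symbol.
    symbol_maps = []
    for s in seqs:
        maps = {}
        for i, ch in enumerate(s):
            maps[ch] = maps.get(ch, 0) | (1 << i)
        symbol_maps.append(maps)

    results = {}
    frontier = []
    # Length-1 patterns: the end-position bitmap is just the symbol bitmap.
    for ch in sorted({c for s in seqs for c in s}):
        ends = [m.get(ch, 0) for m in symbol_maps]
        support = sum(1 for e in ends if e)
        if support >= min_support:
            results[ch] = support
            frontier.append((ch, ends))

    while frontier:
        next_frontier = []
        for pattern, ends in frontier:
            # Extend only with symbols that really follow an occurrence.
            followers = set()
            for s, e in zip(seqs, ends):
                pos = 0
                while e:
                    if e & 1 and pos + 1 < len(s):
                        followers.add(s[pos + 1])
                    e >>= 1
                    pos += 1
            for ch in sorted(followers):
                # Shift end positions by one and keep those where ch follows.
                new_ends = [(e << 1) & m.get(ch, 0) for e, m in zip(ends, symbol_maps)]
                support = sum(1 for e in new_ends if e)
                if support >= min_support:
                    results[pattern + ch] = support
                    next_frontier.append((pattern + ch, new_ends))
        frontier = next_frontier
    return results
```

Because a pattern cannot occur in more sequences than its own prefix, pruning infrequent patterns never discards a frequent extension.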

  8. A prime number approach to biological sequencing.

    PubMed

    Greer, W; Barrett, A N; Sowden, J M

    1985-03-01

    Computational sequencing of nucleic acid and amino acid sequences is placing increasing demands on computer resources. The use of prime numbers is explored as a convenient means of improving program speed and reducing storage requirements. It is concluded that the application of the prime number approach leads to significant increases in speed and some reduction in storage requirements. PMID:3840126
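
The abstract does not say how primes are used. One illustrative, hypothetical use is shown below: each base is mapped to a distinct prime, so by unique factorization the product over a window identifies its base composition exactly, and sliding the window costs one multiplication and one exact division. The base-to-prime assignment is arbitrary and is not taken from the paper.

```python
PRIMES = {"A": 2, "C": 3, "G": 5, "T": 7}  # hypothetical base-to-prime assignment


def composition_code(seq: str) -> int:
    """Product of base primes; equal codes <=> equal base composition."""
    code = 1
    for base in seq:
        code *= PRIMES[base]
    return code


def same_composition(a: str, b: str) -> bool:
    return composition_code(a) == composition_code(b)


def window_codes(seq: str, w: int) -> list:
    """Composition codes of every length-w window, updated incrementally."""
    if w < 1 or w > len(seq):
        return []
    code = composition_code(seq[:w])
    codes = [code]
    for i in range(w, len(seq)):
        # Multiply in the entering base, then divide out the leaving one (exact).
        code = code * PRIMES[seq[i]] // PRIMES[seq[i - w]]
        codes.append(code)
    return codes
```

Python integers have arbitrary precision, so long windows do not overflow, although the codes grow quickly.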

  9. The DNA sequence and biological annotation of human chromosome 1.

    PubMed

    Gregory, S G; Barlow, K F; McLay, K E; Kaul, R; Swarbreck, D; Dunham, A; Scott, C E; Howe, K L; Woodfine, K; Spencer, C C A; Jones, M C; Gillson, C; Searle, S; Zhou, Y; Kokocinski, F; McDonald, L; Evans, R; Phillips, K; Atkinson, A; Cooper, R; Jones, C; Hall, R E; Andrews, T D; Lloyd, C; Ainscough, R; Almeida, J P; Ambrose, K D; Anderson, F; Andrew, R W; Ashwell, R I S; Aubin, K; Babbage, A K; Bagguley, C L; Bailey, J; Beasley, H; Bethel, G; Bird, C P; Bray-Allen, S; Brown, J Y; Brown, A J; Buckley, D; Burton, J; Bye, J; Carder, C; Chapman, J C; Clark, S Y; Clarke, G; Clee, C; Cobley, V; Collier, R E; Corby, N; Coville, G J; Davies, J; Deadman, R; Dunn, M; Earthrowl, M; Ellington, A G; Errington, H; Frankish, A; Frankland, J; French, L; Garner, P; Garnett, J; Gay, L; Ghori, M R J; Gibson, R; Gilby, L M; Gillett, W; Glithero, R J; Grafham, D V; Griffiths, C; Griffiths-Jones, S; Grocock, R; Hammond, S; Harrison, E S I; Hart, E; Haugen, E; Heath, P D; Holmes, S; Holt, K; Howden, P J; Hunt, A R; Hunt, S E; Hunter, G; Isherwood, J; James, R; Johnson, C; Johnson, D; Joy, A; Kay, M; Kershaw, J K; Kibukawa, M; Kimberley, A M; King, A; Knights, A J; Lad, H; Laird, G; Lawlor, S; Leongamornlert, D A; Lloyd, D M; Loveland, J; Lovell, J; Lush, M J; Lyne, R; Martin, S; Mashreghi-Mohammadi, M; Matthews, L; Matthews, N S W; McLaren, S; Milne, S; Mistry, S; Moore, M J F; Nickerson, T; O'Dell, C N; Oliver, K; Palmeiri, A; Palmer, S A; Parker, A; Patel, D; Pearce, A V; Peck, A I; Pelan, S; Phelps, K; Phillimore, B J; Plumb, R; Rajan, J; Raymond, C; Rouse, G; Saenphimmachak, C; Sehra, H K; Sheridan, E; Shownkeen, R; Sims, S; Skuce, C D; Smith, M; Steward, C; Subramanian, S; Sycamore, N; Tracey, A; Tromans, A; Van Helmond, Z; Wall, M; Wallis, J M; White, S; Whitehead, S L; Wilkinson, J E; Willey, D L; Williams, H; Wilming, L; Wray, P W; Wu, Z; Coulson, A; Vaudin, M; Sulston, J E; Durbin, R; Hubbard, T; Wooster, R; Dunham, I; Carter, N P; McVean, G; Ross, M T; Harrow, J; Olson, M V; Beck, S; Rogers, J; Bentley, D R; Banerjee, R; Bryant, S P; Burford, D C; Burrill, W D H; Clegg, S M; Dhami, P; Dovey, O; Faulkner, L M; Gribble, S M; Langford, C F; Pandian, R D; Porter, K M; Prigmore, E

    2006-05-18

    The reference sequence for each human chromosome provides the framework for understanding genome function, variation and evolution. Here we report the finished sequence and biological annotation of human chromosome 1. Chromosome 1 is gene-dense, with 3,141 genes and 991 pseudogenes, and many coding sequences overlap. Rearrangements and mutations of chromosome 1 are prevalent in cancer and many other diseases. Patterns of sequence variation reveal signals of recent selection in specific genes that may contribute to human fitness, and also in regions where no function is evident. Fine-scale recombination occurs in hotspots of varying intensity along the sequence, and is enriched near genes. These and other studies of human biology and disease encoded within chromosome 1 are made possible with the highly accurate annotated sequence, as part of the completed set of chromosome sequences that comprise the reference human genome. PMID:16710414

  10. RCARE: RNA Sequence Comparison and Annotation for RNA Editing

    PubMed Central

    2015-01-01

    The post-transcriptional sequence modification of transcripts through RNA editing is an important mechanism for regulating protein function and is associated with human disease phenotypes. The identification of RNA editing or RNA-DNA difference (RDD) sites is a fundamental step in the study of RNA editing. However, a substantial number of false-positive RDD sites have been identified recently. A major challenge in identifying RDD sites is distinguishing true RNA editing sites from false positives. Furthermore, determining the location of condition-specific RDD sites and elucidating their functional roles will help toward understanding various biological phenomena that are mediated by RNA editing. The present study developed RNA-sequence comparison and annotation for RNA editing (RCARE) for searching, annotating, and visualizing RDD sites using thousands of previously known editing sites; RCARE can be used for comparative analyses between multiple samples. RCARE also provides evidence for improving the reliability of identified RDD sites. RCARE is a web-based comparison, annotation, and visualization tool that provides rich biological annotations and useful summary plots. The developers of previous tools that identify or annotate RNA-editing sites seldom discuss the reliability of their tools. To address this issue, RCARE draws on a number of scientific publications and databases to find documentation specific to a particular RNA-editing site, from which it generates evidence levels that convey the reliability of its results. Sequence-based alignment files can be converted into VCF files using a Python script and uploaded to the RCARE server for further analysis. RCARE is available for free at http://www.snubi.org/software/rcare/. PMID:26043858
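
A minimal, hypothetical sketch of turning RNA-DNA differences into VCF records, in the spirit of the conversion step mentioned above. It is not RCARE's Python script: it assumes the DNA reference and the RNA-derived bases are already aligned base-for-base (no indels), and it emits only the eight mandatory VCF 4.2 columns.

```python
def rdd_sites(chrom, dna, rna, start=1):
    """List (chrom, pos, ref, alt) where an aligned RNA base differs from the DNA base.

    Assumes dna and rna are aligned base-for-base; positions are 1-based from start.
    """
    sites = []
    rna = rna.upper().replace("U", "T")  # compare RNA on the DNA alphabet
    for i, (d, r) in enumerate(zip(dna.upper(), rna)):
        if d in "ACGT" and r in "ACGT" and d != r:
            sites.append((chrom, start + i, d, r))
    return sites


def to_vcf(sites):
    """Render sites as minimal VCF 4.2 text (mandatory columns only)."""
    lines = ["##fileformat=VCFv4.2",
             "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"]
    for chrom, pos, ref, alt in sites:
        lines.append(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.")
    return "\n".join(lines) + "\n"


# An A-to-G difference (the signature of A-to-I editing) at position 104.
print(to_vcf(rdd_sites("chr1", "ACGTA", "ACGUG", start=100)), end="")
```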

  11. Bioinformatics comparison of sulfate-reducing metabolism nucleotide sequences

    NASA Astrophysics Data System (ADS)

    Tremberger, G.; Dehipawala, Sunil; Nguyen, A.; Cheung, E.; Sullivan, R.; Holden, T.; Lieberman, D.; Cheung, T.

    2015-09-01

    Sulfate-reducing bacteria can be traced back 3.5 billion years. The thermodynamic details of the sulfur cycle have been well documented. GenBank nucleotide data from a recent sulfate-reducing bacteria report (Robador, Jungbluth, et al., 2015 Jan, Front. Microbiol.) have been analyzed in terms of the sulfite reductase genes (dsrAB) via fractal dimension and entropy values. A comparison with oil field sulfate-reducing sequences was included. The AUCG translational mass fractal dimension versus the ATCG transcriptional mass fractal dimension for the low-temperature dsrB and dsrA sequences reported in Reference Thirteen shows a correlation of R-sq ~ 0.79, with a probability of about 3% in simulation. A recent report of using a cystathionine gamma-lyase sequence to produce CdS quantum dots by a biological method, in which sulfur is reduced just as in H2S production, was included for comparison. The AUCG mass fractal dimension versus ATCG mass fractal dimension for the cystathionine gamma-lyase sequences was found to have an R-sq of 0.72, similar to the low-temperature dissimilatory sulfite reductase (dsr) group with 3% probability, in contrast to the oil field group, which has R-sq ~ 0.94, a highly probable outcome in the simulation. The other two simulation histograms, namely the R-sq outcome values of fractal dimension versus entropy and of di-nucleotide entropy versus mono-nucleotide entropy, are also discussed, with the data analysis focusing on low-probability outcomes.
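
Two of the quantities discussed above are easy to state in code. The sketch assumes Shannon entropy in bits over overlapping k-mers (k = 1 for mono-nucleotide, k = 2 for di-nucleotide entropy) and R-sq as the squared Pearson correlation of a simple linear fit. The authors' normalization may differ, and the mass fractal dimension is not reproduced here.

```python
import math
from collections import Counter


def shannon_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy in bits of the overlapping k-mer distribution of seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    n = len(kmers)
    return -sum(c / n * math.log2(c / n) for c in Counter(kmers).values())


def r_squared(xs, ys) -> float:
    """Coefficient of determination of a simple linear fit (squared Pearson r)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)
```

Computing `shannon_entropy(s, 1)` and `shannon_entropy(s, 2)` for each sequence in a group and passing the two lists to `r_squared` gives one di- versus mono-nucleotide R-sq value of the kind compared above.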

  12. Biological Processes Discovered by High-Throughput Sequencing.

    PubMed

    Reon, Brian J; Dutta, Anindya

    2016-04-01

    Advances in DNA and RNA sequencing technologies have completely transformed the field of genomics. High-throughput sequencing (HTS) is now a widely used and accessible technology that allows scientists to sequence an entire transcriptome or genome in a timely and cost-effective manner. Application of HTS techniques has led to many key discoveries, including the identification of long noncoding RNAs; microDNAs, a family of small extrachromosomal circular DNA species; and tRNA-derived fragments, a group of small non-miRNAs derived from tRNAs. Furthermore, public sequencing repositories provide unique opportunities for laboratories to parse large sequencing databases to identify proteins and noncoding RNAs at a scale that was not possible a decade ago. Herein, we review how HTS has led to the discovery of novel nucleic acid species and, in the process, uncovered new biological processes. PMID:26828742

  13. Local predictability in biological sequences, algorithm and applications.

    PubMed

    Lebbe, J; Vignes, R

    1993-01-01

    The goal of this paper is to propose an algorithm based on k nearest neighbours to compute a local predictability measure in biological sequences. Some ideas about the usefulness of this measure are discussed on the basis of preliminary experiments. PMID:8347724
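
The abstract does not define the measure, so the sketch below uses a hypothetical definition: for each position, take the w preceding symbols as its context, find the k nearest other contexts by Hamming distance, and score the fraction of those neighbours whose following symbol matches the actual one. The parameters `w` and `k` and the tie-breaking rule (earlier positions first) are assumptions.

```python
def local_predictability(seq: str, w: int = 3, k: int = 3) -> dict:
    """Map each position i >= w to the fraction of its k nearest contexts
    (Hamming distance on the w preceding symbols) followed by seq[i]."""
    if w < 1 or k < 1:
        raise ValueError("w and k must be positive")
    scores = {}
    for i in range(w, len(seq)):
        ctx = seq[i - w:i]
        candidates = []
        for j in range(w, len(seq)):
            if j == i:
                continue  # never use the position being predicted
            dist = sum(a != b for a, b in zip(ctx, seq[j - w:j]))
            candidates.append((dist, j))  # ties broken by position j
        candidates.sort()
        neighbours = candidates[:k]
        if neighbours:
            hits = sum(seq[j] == seq[i] for _, j in neighbours)
            scores[i] = hits / len(neighbours)
    return scores
```

In a perfectly periodic sequence, the nearest contexts of each position are exact copies that predict it correctly; lower scores flag regions that are locally harder to predict. The all-pairs search is O(n²) and meant only for illustration.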

  14. Reads meet rotamers: structural biology in the age of deep sequencing.

    PubMed

    Sethi, Anurag; Clarke, Declan; Chen, Jieming; Kumar, Sushant; Galeev, Timur R; Regan, Lynne; Gerstein, Mark

    2015-12-01

    Structure has traditionally been interrelated with sequence, usually in the framework of comparing sequences across species sharing a common fold. However, the nature of information within the sequence and structure databases is evolving, changing the type of comparisons possible. In particular, we now have a vast amount of personal genome sequences from human populations and a greater fraction of new structures contain interacting proteins within large complexes. Consequently, we have to recast our conception of sequence conservation and its relation to structure, for example, focusing more on selection within the human population. Moreover, within structural biology there is less emphasis on the discovery of novel folds and more on relating structures to networks of protein interactions. We cover this changing mindset here. PMID:26658741

  15. Computing distribution of scale independent motifs in biological sequences

    PubMed Central

    Almeida, Jonas S; Vinga, Susana

    2006-01-01

    The use of Chaos Game Representation (CGR) or its generalization, Universal Sequence Maps (USM), to describe the distribution of biological sequences has been found objectionable because of the fractal structure of that coordinate system. Consequently, investigation of the distribution of symbolic motifs at multiple scales is hampered by an inexact association between distance and sequence dissimilarity. A solution to this problem could unleash the use of iterative maps as phase-state representations of sequences whose statistical properties can be conveniently investigated. In this study, a family of kernel density functions is described that accommodates the fractal nature of iterative function representations of symbolic sequences and, consequently, enables the exact investigation of sequence motifs of arbitrary length in that scale-independent representation. Furthermore, the proposed kernel density includes both Markovian succession and currently used alignment-free sequence dissimilarity metrics as special cases. The fractal kernel described is therefore a generalization that provides a common framework for a diverse suite of sequence analysis techniques. PMID:17049089
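
The Chaos Game Representation referred to above can be sketched as follows, using the common corner assignment A = (0, 0), C = (0, 1), G = (1, 1), T = (1, 0) and a start at the centre of the unit square. Each step moves halfway towards the corner of the next base, so two sequences that share their last k bases end within 2^-k of each other in each coordinate. This suffix-based, fractal structure is what the kernel densities are designed to accommodate; the kernels themselves are not sketched.

```python
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq: str) -> list:
    """CGR coordinates after each base of seq, starting from (0.5, 0.5)."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway to the base's corner
        points.append((x, y))
    return points


# Both sequences end in "ACGT", so their final points lie within 1/16.
p = cgr_points("GGGACGT")[-1]
q = cgr_points("TTACGT")[-1]
print(max(abs(p[0] - q[0]), abs(p[1] - q[1])) <= 1 / 16)  # True
```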

  16. Comparison of Next-Generation Sequencing Systems

    PubMed Central

    Liu, Lin; Li, Yinhu; Li, Siliang; Hu, Ni; He, Yimin; Pong, Ray; Lin, Danni; Lu, Lihua; Law, Maggie

    2012-01-01

    With the fast development and wide application of next-generation sequencing (NGS) technologies, genomic sequence information is within reach to aid in decoding the mysteries of life, improving crops, detecting pathogens, and improving quality of life. NGS systems are typically represented by SOLiD/Ion Torrent PGM from Life Technologies, Genome Analyzer/HiSeq 2000/MiSeq from Illumina, and GS FLX Titanium/GS Junior from Roche. Beijing Genomics Institute (BGI), which possesses the world's biggest sequencing capacity, has multiple NGS systems including 137 HiSeq 2000, 27 SOLiD, one Ion Torrent PGM, one MiSeq, and one 454 sequencer. We have accumulated extensive experience in sample handling, sequencing, and bioinformatics analysis. In this paper, the technologies of these systems are reviewed, and first-hand data from this experience are summarized and analyzed to discuss the advantages and specifics associated with each sequencing system. Finally, applications of NGS are summarized. PMID:22829749

  17. Identifying features in biological sequences: Sixth workshop report

    SciTech Connect

    Burks, C.; Myers, E.; Pearson, W.R.

    1995-12-31

    This report covers the sixth of an annual series of workshops held at the Aspen Center for Physics concentrating particularly on the identification of features in DNA sequence, and more broadly on related topics in computational molecular biology. The workshop series originally focused primarily on discussion of current needs and future strategies for identifying and predicting the presence of complex functional units on sequenced, but otherwise uncharacterized, genomic DNA. We addressed the need for computationally based, automatic tools for synthesizing available data about individual consensus sequences and local compositional patterns into the composite objects (e.g., genes) that are, as composite entities, the true objects of interest when scanning DNA sequences. The workshop was structured to promote sustained informal contact and exchange of expertise between molecular biologists, computer scientists, and mathematicians.

  18. CloneQC: lightweight sequence verification for synthetic biology

    PubMed Central

    Lee, Pablo A.; Dymond, Jessica S.; Scheifele, Lisa Z.; Richardson, Sarah M.; Foelber, Katrina J.; Boeke, Jef D.; Bader, Joel S.

    2010-01-01

    Synthetic biology projects aim to produce physical DNA that matches a designed target sequence. Chemically synthesized oligomers are generally used as the starting point for building larger and larger sequences. Due to the error rate of chemical synthesis, these oligomers can have many differences from the target sequence. As oligomers are joined together to make larger and larger synthetic intermediates, it becomes essential to perform quality control to eliminate intermediates with errors and retain only those DNA molecules that are error free with respect to the target. This step is often performed by transforming bacteria with synthetic DNA and sequencing colonies until a clone with a perfect sequence is identified. Here we present CloneQC, a lightweight software pipeline available as a free web server and as source code that performs quality control on sequenced clones. Input to the server is a list of desired sequences and forward and reverse reads for each clone. The server generates summary statistics (error rates and success rates target-by-target) and a detailed report of perfect clones. This software will be useful to laboratories conducting in-house DNA synthesis and is available at http://cloneqc.thruhere.net/ and as Berkeley Software Distribution (BSD) licensed source. PMID:20211841
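
A minimal sketch of per-target summary statistics, under stated assumptions: each clone's forward and reverse reads have already been assembled into one sequence, and the error rate is the Levenshtein edit distance to the target divided by the target length. Both assumptions, and the function names, are hypothetical; CloneQC's own pipeline is not described in the abstract.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def summarize_clones(target: str, clones: dict) -> dict:
    """Per-clone error rates, the perfect clones, and the success rate for one target."""
    rates = {name: edit_distance(seq, target) / len(target) for name, seq in clones.items()}
    perfect = sorted(name for name, rate in rates.items() if rate == 0)
    return {"error_rates": rates, "perfect": perfect,
            "success_rate": len(perfect) / len(clones)}
```

For example, with target `"ACGTACGT"` and clones `{"c1": "ACGTACGT", "c2": "ACGTTCGT", "c3": "ACGACGT"}`, only `c1` is perfect; `c2` (one substitution) and `c3` (one deletion) each have an error rate of 0.125.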

  19. Legume genomics: understanding biology through DNA and RNA sequencing

    PubMed Central

    O'Rourke, Jamie A.; Bolon, Yung-Tsi; Bucciarelli, Bruna; Vance, Carroll P.

    2014-01-01

    Background The legume family (Leguminosae) consists of approx. 17 000 species. A few of these species, including, but not limited to, Phaseolus vulgaris, Cicer arietinum and Cajanus cajan, are important dietary components, providing protein for approx. 300 million people worldwide. Additional species, including soybean (Glycine max) and alfalfa (Medicago sativa), are important crops utilized mainly in animal feed. In addition, legumes are important contributors of biologically fixed nitrogen, forming symbiotic relationships with rhizobia to fix atmospheric N2 and providing up to 30 % of available nitrogen for the next season of crops. The application of high-throughput genomic technologies including genome sequencing projects, genome re-sequencing (DNA-seq) and transcriptome sequencing (RNA-seq) by the legume research community has provided major insights into genome evolution, genomic architecture and domestication. Scope and Conclusions This review presents an overview of the current state of legume genomics and explores the role that next-generation sequencing technologies play in advancing legume genomics. The adoption of next-generation sequencing and implementation of associated bioinformatic tools has allowed researchers to turn each species of interest into its own model organism. To illustrate the power of next-generation sequencing, an in-depth overview of the transcriptomes of both soybean and white lupin (Lupinus albus) is provided. The soybean transcriptome analysis focuses on seed development in two near-isogenic lines, examining the role of transporters, oil biosynthesis and nitrogen utilization. The white lupin transcriptome analysis examines how phosphate deficiency alters gene expression patterns, inducing the formation of cluster roots. Such studies illustrate the power of next-generation sequencing and bioinformatic analyses in elucidating the gene networks underlying biological processes. PMID:24769535

  20. Recognition of Yeast Species from Gene Sequence Comparisons

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This review discusses recognition of yeast species from gene sequence comparisons, which have been responsible for doubling the number of known yeasts over the past decade. The resolution provided by various single gene sequences is examined for both ascomycetous and basidiomycetous species, and th...

  1. Pegasys: software for executing and integrating analyses of biological sequences

    PubMed Central

    Shah, Sohrab P; He, David YM; Sawkins, Jessica N; Druce, Jeffrey C; Quon, Gerald; Lett, Drew; Zheng, Grace XY; Xu, Tao; Ouellette, BF Francis

    2004-01-01

    Background We present Pegasys – a flexible, modular and customizable software system that facilitates the execution of, and integration of data from, heterogeneous biological sequence analysis tools. Results The Pegasys system includes numerous tools for pairwise and multiple sequence alignment, ab initio gene prediction, RNA gene detection, and masking repetitive sequences in genomic DNA, as well as filters for database formatting and for processing raw output from various analysis tools. We introduce a novel data structure for creating workflows of sequence analyses and a unified data model to store their results. The software allows users to create analysis workflows dynamically at run time by manipulating a graphical user interface. All analyses without serial dependencies are executed in parallel on a compute cluster for efficient data generation. The uniform data model and backend relational database management system of Pegasys allow results of heterogeneous programs included in the workflow to be integrated and exported into General Feature Format for further analyses in GFF-dependent tools, or into GAME XML for import into the Apollo genome editor. The modularity of the design allows new tools to be added to the system with little programmer overhead. The database application programming interface allows programmatic access to the data stored in the backend through SQL queries. Conclusions The Pegasys system enables biologists and bioinformaticians to create and manage sequence analysis workflows. The software is released under the Open Source GNU General Public License. All source code and documentation is available for download at . PMID:15096276
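
Executing every analysis that has no serial dependency in parallel amounts to scheduling a dependency graph. The sketch below groups a workflow into stages by Kahn-style topological layering and runs each stage on a local thread pool, which stands in for a compute cluster. The task names and the `run` callback are hypothetical, and this is not Pegasys' own scheduler.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_stages(deps: dict) -> list:
    """Group tasks into stages; no task depends on another task in its stage.

    deps maps each task to the set of tasks it depends on.
    """
    remaining = {task: set(d) for task, d in deps.items()}
    for d in deps.values():
        for task in d:
            remaining.setdefault(task, set())  # include dependency-only tasks
    stages, done = [], set()
    while remaining:
        ready = sorted(t for t, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("cycle detected in workflow")
        stages.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return stages


def run_workflow(deps: dict, run) -> dict:
    """Run each stage's tasks concurrently, waiting for a stage before the next."""
    results = {}
    with ThreadPoolExecutor() as pool:
        for stage in parallel_stages(deps):
            for task, output in zip(stage, pool.map(run, stage)):
                results[task] = output
    return results


# Hypothetical workflow: mask repeats, then two independent analyses, then export.
deps = {"repeatmask": set(), "genscan": {"repeatmask"},
        "blastx": {"repeatmask"}, "gff_export": {"genscan", "blastx"}}
print(parallel_stages(deps))
```

Stage barriers are the simplest correct policy; a scheduler that starts each task as soon as its own dependencies finish would keep more workers busy.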

  2. Effective Automated Feature Construction and Selection for Classification of Biological Sequences

    PubMed Central

    Kamath, Uday; De Jong, Kenneth; Shehu, Amarda

    2014-01-01

    Background Many open problems in bioinformatics involve elucidating underlying functional signals in biological sequences. DNA sequences, in particular, are characterized by rich architectures in which functional signals are increasingly found to combine local and distal interactions at the nucleotide level. Problems of interest include detection of regulatory regions, splice sites, exons, hypersensitive sites, and more. These problems naturally lend themselves to formulation as classification problems in machine learning. When classification is based on features extracted from the sequences under investigation, success is critically dependent on the chosen set of features. Methodology We present an algorithmic framework (EFFECT) for automated detection of functional signals in biological sequences. We focus here on classification problems involving DNA sequences that, as state-of-the-art work in machine learning shows, are challenging and involve complex combinations of local and distal features. EFFECT uses a two-stage process to first construct a set of candidate sequence-based features and then select the most effective subset for the classification task at hand. Both stages make heavy use of evolutionary algorithms to efficiently guide the search towards informative features capable of discriminating between sequences that contain a particular functional signal and those that do not. Results To demonstrate its generality, EFFECT is applied to three separate problems of importance in DNA research: the recognition of hypersensitive sites, splice sites, and ALU sites. Comparisons with state-of-the-art algorithms show that the framework is both general and powerful. In addition, a detailed analysis of the constructed features shows that they contain valuable biological information about DNA architecture, allowing biologists and other researchers to directly inspect the features and potentially use the insights obtained to assist wet-laboratory studies on retainment or
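
The two-stage construct-then-select structure can be illustrated with a much simpler stand-in. The sketch below builds k-mer count features and then ranks them by the absolute difference of class means. It deliberately replaces EFFECT's evolutionary search in both stages with fixed rules, so it shows the shape of the pipeline, not EFFECT itself.

```python
from itertools import product


def kmer_features(seq: str, k: int) -> dict:
    """Stage 1 (stand-in): overlapping counts of every k-mer over ACGT."""
    counts = dict.fromkeys(("".join(p) for p in product("ACGT", repeat=k)), 0)
    for i in range(len(seq) - k + 1):
        sub = seq[i:i + k]
        if sub in counts:  # skip k-mers containing N or other symbols
            counts[sub] += 1
    return counts


def select_features(pos: list, neg: list, k: int, top: int) -> list:
    """Stage 2 (stand-in): rank k-mers by |mean count(pos) - mean count(neg)|."""
    pos_f = [kmer_features(s, k) for s in pos]
    neg_f = [kmer_features(s, k) for s in neg]
    scores = {}
    for kmer in pos_f[0]:
        mean_pos = sum(f[kmer] for f in pos_f) / len(pos_f)
        mean_neg = sum(f[kmer] for f in neg_f) / len(neg_f)
        scores[kmer] = abs(mean_pos - mean_neg)
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores, key=lambda km: (-scores[km], km))[:top]
```

A filter like this scores each feature in isolation, which is one reason search-based methods such as the evolutionary algorithms described above are used to find informative combinations.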

  3. High-resolution network biology: connecting sequence with function

    PubMed Central

    Ryan, Colm J.; Cimermančič, Peter; Szpiech, Zachary A.; Sali, Andrej; Hernandez, Ryan D.; Krogan, Nevan J.

    2014-01-01

    Proteins are not monolithic entities; rather, they can contain multiple domains that mediate distinct interactions, and their functionality can be regulated through post-translational modifications at multiple distinct sites. Traditionally, network biology has ignored such properties of proteins and has instead examined either the physical interactions of whole proteins or the consequences of removing entire genes. In this Review, we discuss experimental and computational methods to increase the resolution of protein–protein, genetic and drug–gene interaction studies to the domain and residue levels. Such work will be crucial for using interaction networks to connect sequence and structural information, and to understand the biological consequences of disease-associated mutations, which will hopefully lead to more effective therapeutic strategies. PMID:24197012

  4. Biological nanopore MspA for DNA sequencing

    NASA Astrophysics Data System (ADS)

    Manrao, Elizabeth A.

    Unlocking the information hidden in the human genome provides insight into the inner workings of complex biological systems and can be used to greatly improve health care. To allow widespread sequencing, new technologies are required that provide fast and inexpensive readings of DNA. Nanopore sequencing is a third-generation DNA sequencing technology currently being developed to fulfill this need. In nanopore sequencing, a voltage is applied across a small pore in an electrolyte solution and the resulting ionic current is recorded. When DNA passes through the channel, the ionic current is partially blocked. If the DNA bases uniquely modulate the ionic current flowing through the channel, the time trace of the current can be related to the sequence of DNA passing through the pore. There are two main challenges to realizing nanopore sequencing: identifying a pore with sensitivity to single nucleotides, and controlling the translocation of DNA through the pore so that the small single-nucleotide current signatures are distinguishable from background noise. In this dissertation, I explore the use of Mycobacterium smegmatis porin A (MspA) for nanopore sequencing. To determine MspA's sensitivity to single nucleotides, DNA strands of various compositions are held in the pore while the resulting ionic current is measured. DNA is immobilized in MspA by attaching it to a large molecule that acts as an anchor. This technique confirms the single-nucleotide resolution of the pore and additionally shows that MspA is sensitive to epigenetic modifications and single nucleotide polymorphisms. The forces from the electric field within MspA, the effective charge of nucleotides, and the elasticity of DNA are estimated using a Freely Jointed Chain model of single-stranded DNA. These results offer insight into the interactions of DNA within the pore. With the nucleotide sensitivity of MspA confirmed, a method is introduced to controllably pass DNA through the pore
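
The Freely Jointed Chain mentioned above predicts the mean extension of a chain of N rigid segments of length b (the Kuhn length) under a force F as ⟨x⟩ = N b L(F b / kT), where L(u) = coth(u) − 1/u is the Langevin function. A minimal sketch, assuming forces in pN, lengths in nm, and kT ≈ 4.11 pN·nm (about 298 K); the dissertation's parameter values are not given here.

```python
import math


def langevin(u: float) -> float:
    """L(u) = coth(u) - 1/u, using the series u/3 near zero to avoid 0/0."""
    if abs(u) < 1e-4:
        return u / 3.0
    return 1.0 / math.tanh(u) - 1.0 / u


def fjc_extension(force_pn: float, n_segments: int, kuhn_nm: float,
                  kT: float = 4.11) -> float:
    """Mean end-to-end extension (nm) of a freely jointed chain under force (pN)."""
    contour = n_segments * kuhn_nm  # fully stretched length N * b
    return contour * langevin(force_pn * kuhn_nm / kT)
```

At low force the extension is linear, ⟨x⟩ ≈ N b² F / (3 kT); at high force it approaches the contour length N b.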

  5. Experience using web services for biological sequence analysis.

    PubMed

    Stockinger, Heinz; Attwood, Teresa; Chohan, Shahid Nadeem; Côté, Richard; Cudré-Mauroux, Philippe; Falquet, Laurent; Fernandes, Pedro; Finn, Robert D; Hupponen, Taavi; Korpelainen, Eija; Labarga, Alberto; Laugraud, Aurelie; Lima, Tania; Pafilis, Evangelos; Pagni, Marco; Pettifer, Steve; Phan, Isabelle; Rahman, Nazim

    2008-11-01

    Programmatic access to data and tools through the web using so-called web services has an important role to play in bioinformatics. In this article, we discuss the most popular approaches, based on SOAP/WS-I and REST, and describe our experiences, as a cross section of the community, with providing and using web services in the context of biological sequence analysis. We briefly review the main technological approaches as well as best-practice hints that are useful for both users and developers. Finally, syntactic and semantic data integration issues with multiple web services are discussed. PMID:18621748

  6. Membrane platforms for biological nanopore sensing and sequencing.

    PubMed

    Schmidt, Jacob

    2016-06-01

    In the past two decades, biological nanopores have been developed and explored for use in sensing applications as a result of their exquisite sensitivity and easily engineered, reproducible, and economically manufactured structures. Nanopore sensing has been shown to differentiate between highly similar analytes, measure polymer size, detect the presence of specific genes, and rapidly sequence nucleic acids translocating through the pore. Devices featuring protein nanopores have been limited in part by the membrane support containing the nanopore, the shortcomings of which have been addressed in recent work developing new materials, approaches, and apparatus resulting in membrane platforms featuring automatability and increased robustness, lifetime, and measurement throughput. PMID:26773300

  7. COMPARISON OF BIOLOGICAL COMMUNITIES: THE PROBLEM OF SAMPLE REPRESENTATIVENESS

    EPA Science Inventory

    Obtaining an adequate, representative sample of biological communities or assemblages to make richness or compositional comparisons among sites is a continuing challenge. Traditionally, sample size is based on numbers of replicates or area collected or numbers of individuals enum...

  8. Discovering Motifs in Biological Sequences Using the Micron Automata Processor.

    PubMed

    Roy, Indranil; Aluru, Srinivas

    2016-01-01

    Finding approximately conserved sequences, called motifs, across multiple DNA or protein sequences is an important problem in computational biology. In this paper, we consider the (l, d) motif search problem: identifying one or more motifs of length l present in at least q of the n given sequences, with each occurrence differing from the motif by at most d substitutions. The problem is known to be NP-complete, and the largest solved instance reported to date is (26, 11). We propose a novel algorithm for the (l, d) motif search problem using streaming execution over a large set of non-deterministic finite automata (NFA). This solution is designed to take advantage of the Micron Automata Processor, a new technology close to deployment that can execute many NFA in parallel. We demonstrate the capability for solving much larger instances of the (l, d) motif search problem using the resources available within a single automata processor board, by estimating run-times for problem instances (39, 18) and (40, 17). The paper serves as a useful guide to solving problems using this new accelerator technology. PMID:26886735
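
For small instances, the (l, d) motif problem as defined above can be solved by exhaustive search over all 4^l candidate motifs, as sketched below. This brute-force baseline is not the NFA-based method; its cost grows as 4^l, which is why instances such as (26, 11), with 4^26 ≈ 4.5 × 10^15 candidates, call for specialized approaches.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of substitutions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif: str, seq: str, d: int) -> bool:
    """True if some window of seq differs from motif in at most d positions."""
    m = len(motif)
    return any(hamming(motif, seq[i:i + m]) <= d for i in range(len(seq) - m + 1))


def find_motifs(seqs, motif_len: int, d: int, q: int, alphabet: str = "ACGT") -> list:
    """All (motif_len, d) motifs present in at least q of the sequences."""
    motifs = []
    for letters in product(alphabet, repeat=motif_len):  # 4**motif_len candidates
        motif = "".join(letters)
        if sum(occurs_within(motif, s, d) for s in seqs) >= q:
            motifs.append(motif)
    return motifs
```

For example, `find_motifs(["ACGTT", "TACGA", "GGACG"], 3, 0, 3)` returns `["ACG"]`, the only 3-mer present exactly in all three sequences.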

  9. Comparison between optimized GRE and RARE sequences for 19F MRI studies

    NASA Astrophysics Data System (ADS)

    Soffientini, Chiara D.; Mastropietro, Alfonso; Caffini, Matteo; Cocco, Sara; Zucca, Ileana; Scotti, Alessandro; Baselli, Giuseppe; Bruzzone, Maria Grazia

    2014-03-01

    In 19F-MRI studies, the limiting factors are the low signal, due to the low concentration of 19F nuclei that biological applications require, and the inherently low sensitivity of MRI. Hence, acquiring images with the pulse sequence that gives the best signal-to-noise ratio (SNR), by optimizing the acquisition parameters for a specific 19F compound, is a core issue. In 19F-MRI, multiple-spin-echo (RARE) and gradient-echo (GRE) are the two most frequently used pulse sequence families; we therefore performed an optimization study of GRE pulse sequences based on numerical simulations and experimental acquisitions on fluorinated compounds, and compared GRE performance with an optimized RARE sequence. Images were acquired on a 7T preclinical MRI scanner using phantoms containing different fluorinated compounds. Actual relaxation times (T1, T2, T2*) were evaluated in order to predict the dependence of SNR on sequence parameters. Experimental comparisons between spoiled GRE and RARE, obtained at a fixed acquisition time and in steady-state conditions, showed the RARE sequence outperforming spoiled GRE (up to 406% higher). Conversely, unbalanced SSFP showed a significant increase in SNR compared with RARE (up to 28% higher). Moreover, this sequence (like GRE in general) was confirmed to be virtually insensitive to T1 and T2 relaxation times after proper optimization, thus improving marker independence from the biological environment. These results confirm the efficacy of the proposed optimization tool and encourage further investigation of in vivo applicability.
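
The dependence of spoiled GRE signal on flip angle, TR, TE, T1 and T2* is commonly modelled with the standard steady-state spoiled gradient-echo equation S = M0 sin α (1 − E1) / (1 − E1 cos α) · exp(−TE/T2*), with E1 = exp(−TR/T1), which is maximized at the Ernst angle α = arccos(E1). The sketch below evaluates it; whether the authors' simulations used exactly this model is an assumption, and RARE and unbalanced SSFP are not modelled.

```python
import math


def spoiled_gre_signal(flip_deg, tr, te, t1, t2star, m0=1.0):
    """Steady-state spoiled GRE signal; tr, te, t1 and t2star share one time unit."""
    alpha = math.radians(flip_deg)
    e1 = math.exp(-tr / t1)
    return (m0 * math.sin(alpha) * (1 - e1) / (1 - e1 * math.cos(alpha))
            * math.exp(-te / t2star))


def ernst_angle_deg(tr, t1):
    """Flip angle (degrees) that maximizes spoiled GRE signal for given TR and T1."""
    return math.degrees(math.acos(math.exp(-tr / t1)))
```

Because SNR per unit time also scales with the number of averages that fit in a fixed acquisition time, comparisons at fixed time usually weigh this per-excitation signal against TR as well.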

  10. Comparison of mitochondrial genome sequences of pangolins (Mammalia, Pholidota).

    PubMed

    Hassanin, Alexandre; Hugot, Jean-Pierre; van Vuuren, Bettine Jansen

    2015-04-01

    The complete mitochondrial genome was sequenced for three species of pangolins, Manis javanica, Phataginus tricuspis, and Smutsia temminckii, and comparisons were made with two other species, Manis pentadactyla and Phataginus tetradactyla. The genome of Manidae contains the 37 genes found in a typical mammalian genome, and the structure of the control region is highly conserved among species. In Manis, the overall base composition differs from that found in African genera. Phylogenetic analyses support the monophyly of the genera Manis, Phataginus, and Smutsia, as well as the basal division between Maninae and Smutsiinae. Comparisons with GenBank sequences reveal that the reference genomes of M. pentadactyla and P. tetradactyla (accession numbers NC_016008 and NC_004027) were sequenced from misidentified taxa, and that a new species of tree pangolin should be described in Gabon. PMID:25746396

  11. Quantitative Comparison of Large-Scale DNA Enrichment Sequencing Data.

    PubMed

    Lienhard, Matthias; Chavez, Lukas

    2016-01-01

    DNA enrichment followed by sequencing (DNA-IP seq) is a versatile tool in molecular biology with a wide variety of applications. Computational analysis of differential DNA enrichment between conditions is important for identifying epigenetic alterations in disease compared to healthy controls and for revealing dynamic epigenetic modifications throughout normal and distorted cell differentiation and development. We present a protocol for genome-wide comparative analysis of DNA-IP sequencing data to identify statistically significant differential sequencing coverage between two conditions by considering variation across replicates. The protocol provides a detailed description for the comparative analysis of DNA-IP sequencing data including basic data processing, quality controls, and identification of differential enrichment using the Bioconductor package "MEDIPS". PMID:27008016
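
    The first step of such a comparison, binning reads into genomic windows and normalizing for library size, can be sketched in Python as follows. This is a deliberately simplified illustration and not the MEDIPS API: it assumes reads are given as start positions on a single chromosome, uses counts per million (CPM) with a pseudocount, and ignores replicates and significance testing, which the protocol handles statistically.

```python
import math
from collections import Counter

def window_counts(read_starts, window_size):
    """Number of reads whose start position falls in each fixed-size window."""
    return Counter(pos // window_size for pos in read_starts)

def log2_fold_changes(reads_a, reads_b, window_size=100, pseudocount=1.0):
    """Per-window log2(CPM_b / CPM_a) with a pseudocount; no replicates or tests."""
    ca, cb = window_counts(reads_a, window_size), window_counts(reads_b, window_size)
    na, nb = len(reads_a), len(reads_b)  # library sizes (must be non-zero)
    changes = {}
    for w in sorted(set(ca) | set(cb)):
        cpm_a = ca[w] / na * 1e6
        cpm_b = cb[w] / nb * 1e6
        changes[w] = math.log2((cpm_b + pseudocount) / (cpm_a + pseudocount))
    return changes
```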

  12. Alignment-Free Sequence Comparison Based on Next-Generation Sequencing Reads

    PubMed Central

    Song, Kai; Ren, Jie; Zhai, Zhiyuan; Liu, Xuemei

    2013-01-01

    Next-generation sequencing (NGS) technologies have generated enormous amounts of shotgun read data, and assembly of the reads can be challenging, especially for organisms without template sequences. We study the power of genome comparison based on shotgun read data without assembly using three alignment-free sequence comparison statistics, $D_2$, $D_2^*$, and $D_2^S$, both theoretically and by simulations. Theoretical formulas for the power of detecting the relationship between two sequences related through a common motif model are derived. It is shown that both $D_2^*$ and …
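
    As a reference point for these statistics, the sketch below computes them for two assembled sequences using their common definitions under an i.i.d. background model: $D_2 = \sum_w X_w Y_w$, $D_2^* = \sum_w \tilde X_w \tilde Y_w / (\sqrt{\bar n \bar m}\, p_w)$ and $D_2^S = \sum_w \tilde X_w \tilde Y_w / \sqrt{\tilde X_w^2 + \tilde Y_w^2}$, where $X_w, Y_w$ are k-word counts, $\bar n = n - k + 1$, $\bar m = m - k + 1$, $\tilde X_w = X_w - \bar n p_w$, $\tilde Y_w = Y_w - \bar m p_w$, and $p_w$ is the word probability. Estimating letter probabilities from the two sequences pooled is an assumption of this sketch; it does not work on shotgun reads as the paper does, and it loops over all |alphabet|^k words, so it is only practical for small k.

```python
from collections import Counter
from itertools import product
from math import prod, sqrt

def kword_counts(seq, k):
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def d2_statistics(x, y, k):
    """Return (D2, D2*, D2S) for sequences x and y under an i.i.d. background.

    Letter probabilities are estimated from x and y pooled (an assumption)."""
    cx, cy = kword_counts(x, k), kword_counts(y, k)
    letters = Counter(x + y)
    total = sum(letters.values())
    p = {a: c / total for a, c in letters.items()}
    nbar, mbar = len(x) - k + 1, len(y) - k + 1
    d2 = sum(cx[w] * cy[w] for w in cx)
    d2_star = d2_s = 0.0
    for letters_w in product(sorted(p), repeat=k):
        w = "".join(letters_w)
        pw = prod(p[a] for a in w)
        xt = cx[w] - nbar * pw  # centred count in x
        yt = cy[w] - mbar * pw  # centred count in y
        d2_star += xt * yt / (sqrt(nbar * mbar) * pw)
        denom = sqrt(xt * xt + yt * yt)
        if denom > 0:
            d2_s += xt * yt / denom
    return d2, d2_star, d2_s
```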

  13. Sequenced genomes and rapidly emerging technologies pave the way for conifer evolutionary developmental biology

    PubMed Central

    Uddenberg, Daniel; Akhter, Shirin; Ramachandran, Prashanth; Sundström, Jens F.; Carlsbecker, Annelie

    2015-01-01

    Conifers, Ginkgo, cycads and gnetophytes comprise the four groups of extant gymnosperms holding a unique position of sharing common ancestry with the angiosperms. Comparative studies of gymnosperms and angiosperms are the key to a better understanding of ancient seed plant morphologies, how they have shifted over evolution to shape modern day species, and how the genes governing these morphologies have evolved. However, conifers and other gymnosperms have been notoriously difficult to study due to their long generation times, inaccessibility to genetic experimentation and unavailable genome sequences. Now, with three draft genomes from spruces and pines, rapid advances in next generation sequencing methods for genome wide expression analyses, and enhanced methods for genetic transformation, we are much better equipped to address a number of key evolutionary questions relating to seed plant evolution. In this mini-review we highlight recent progress in conifer developmental biology relevant to evo-devo questions. We discuss how genome sequence data and novel techniques might allow us to explore genetic variation and naturally occurring conifer mutants, approaches to reduce long generation times to allow for genetic studies in conifers, and other potential upcoming research avenues utilizing current and emergent techniques. Results from developmental studies of conifers and other gymnosperms in comparison to those in angiosperms will provide information to trace core molecular developmental control tool kits of ancestral seed plants, but foremost they will greatly improve our understanding of the biology of conifers and other gymnosperms in their own right. PMID:26579190

  14. Detecting frame shifts by amino acid sequence comparison.

    PubMed

    Claverie, J M

    1993-12-20

    Various amino acid substitution scoring matrices are used in conjunction with local alignment programs to detect regions of similarity and infer potential common ancestry between proteins. The usual scoring schemes derive from the implicit hypothesis that related proteins evolve from a common ancestor by the accumulation of point mutations, and that amino acids tend to be progressively substituted by others with similar properties. However, other frequent single mutation events, such as nucleotide insertion or deletion and gene inversion, change the translation reading frame and make previously encoded amino acid sequences unrecognizable at once. Here, I derive five new types of scoring matrix, each capable of detecting a specific frame shift (deletion, insertion and inversion in 3 frames), and use them with a regular local alignment program to detect amino acid sequences that may have derived from alternative reading frames of the same nucleotide sequence. Frame shifts are inferred solely from comparison of the protein sequences. The five scoring matrices were used with the BLASTP program to compare all the protein sequences in the Swissprot database. Surprisingly, the searches revealed hundreds of highly significant frame shift matches, many of which are likely to represent sequencing errors. Others provide some evidence that frame shift mutations might be used in protein evolution as a way to create new amino acid sequences from pre-existing coding regions. PMID:7903399
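
    The effect these matrices are designed to detect can be illustrated by translating one nucleotide sequence in alternative reading frames. The Python sketch below uses only the standard genetic code, its helper names are hypothetical, and it does not implement the paper's frame-shift scoring matrices; it translates the three forward frames and the three frames of the reverse complement, the latter corresponding to reading an inverted segment.

```python
BASES = "TCAG"
# Standard genetic code, codons in TTT, TTC, TTA, TTG, TCT, ... order ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def translate(dna: str, frame: int = 0) -> str:
    """Translate from offset `frame` (0, 1 or 2); a trailing partial codon is dropped."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))

def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def six_frames(dna: str) -> list[str]:
    """Three forward-frame and three reverse-complement-frame translations."""
    rc = reverse_complement(dna)
    return [translate(dna, f) for f in range(3)] + [translate(rc, f) for f in range(3)]
```

    For example, translate("ATGGCCAAA") gives "MAK", while deleting the fourth nucleotide gives translate("ATGCCAAA") = "MP": every codon after the deletion is read in a new frame.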

  15. Sequence information signal processor for local and global string comparisons

    DOEpatents

    Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

    1997-01-01

    A sequence information signal processing integrated circuit chip designed to perform high-speed calculation of a dynamic programming algorithm based upon the algorithm defined by Waterman and Smith. The signal processing chip of the present invention is designed to be a building block of a linear systolic array, the performance of which can be increased by connecting additional sequence information signal processing chips to the array. The chip provides a high-speed, low-cost linear array processor that can locate highly similar global sequences or segments thereof, such as contiguous subsequences from two different DNA or protein sequences. In a preferred embodiment, the chip is implemented in CMOS VLSI technology, providing the equivalent of about 400,000 transistors or 100,000 gates. Each chip provides 16 processing elements and is designed for 16-bit, two's complement operation, giving a maximum score range of -32,768 to +32,767. It is designed to compare sequences as long as 4,194,304 elements without external software, and sequences of unlimited length with the aid of external software. Each sequence can be assigned different deletion and insertion weight functions. Each processor is provided with an independently variable similarity measure device, so each processor can contribute to the maximum score calculation using a different similarity measure.
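
    For reference, the recurrence such a chip evaluates can be written as a short, score-only Smith-Waterman routine in Python. This sketch assumes a single linear gap penalty and fixed match/mismatch scores (the chip allows separate insertion and deletion weight functions and a per-processor similarity measure), and it uses Python's unbounded integers rather than 16-bit arithmetic. Each cell depends only on its left, upper and upper-left neighbours, so all cells on one anti-diagonal can be computed at once, which is the property a linear systolic array exploits.

```python
def smith_waterman(a: str, b: str, match: int = 3, mismatch: int = -3, gap: int = -2) -> int:
    """Best local alignment score with a linear gap penalty, using two DP rows."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

    With the default scores, smith_waterman("TGTTACGG", "GGTTGACTA") returns 13, from the local alignment GTT-AC / GTTGAC.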

  16. It’s More Than Stamp Collecting: How Genome Sequencing Can Unify Biological Research

    PubMed Central

    Richards, Stephen

    2015-01-01

    The availability of reference genome sequences, especially the human reference, has revolutionized the study of biology. However, whilst the genomes of some species have been fully sequenced, a wide range of biological problems still cannot be effectively studied for lack of genome sequence information. Here, I identify neglected areas of biology and describe how both targeted species sequencing and more broad taxonomic surveys of the tree of life can address important biological questions. I enumerate the significant benefits that would accrue from sequencing a broader range of taxa, as well as discuss the technical advances in sequencing and assembly methods that would allow for wide-ranging application of whole-genome analysis. Finally, I suggest that in addition to “Big Science” survey initiatives to sequence the tree of life, a modified infrastructure-funding paradigm would better support reference genome sequence generation for research communities most in need. PMID:26003218

  17. Applications of next-generation sequencing techniques in plant biology

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The last several years have seen revolutionary advances in DNA sequencing technologies with the advent of next generation sequencing (NGS) techniques. NGS methods now allow millions of bases to be sequenced in one round, at a fraction of the cost relative to traditional Sanger sequencing, allowing u...

  18. The DNA sequence and biology of human chromosome 19

    SciTech Connect

    Grimwood, J; Gordon, L A; Olsen, A; Terry, A; Schmutz, J; Lamerdin, J; Hellsten, U; Goodstein, D; Couronne, O; Tran-Gyamfi, M

    2004-04-06

    Chromosome 19 has the highest gene density of all human chromosomes, more than double the genome-wide average. The large clustered gene families, corresponding high GC content, CpG islands and density of repetitive DNA indicate a chromosome rich in biological and evolutionary significance. Here we describe 55.8 million base pairs of highly accurate finished sequence representing 99.9% of the euchromatin portion of the chromosome. Manual curation of gene loci reveals 1,461 protein-coding genes and 321 pseudogenes. Among these are genes directly implicated in Mendelian disorders, including familial hypercholesterolemia and insulin-resistant diabetes. Nearly one quarter of these genes belong to tandemly arranged families, encompassing more than 25% of the chromosome. Comparative analyses show a fascinating picture of conservation and divergence, revealing large blocks of gene orthology with rodents, scattered regions with more recent gene family expansions and deletions, and segments of coding and non-coding conservation with the distant fish species Takifugu.

  19. The DNA sequence and biology of human chromosome 19

    SciTech Connect

    Grimwood, Jane; Gordon, Laurie A.; Olsen, Anne; Terry, Astrid; Schmutz, Jeremy; Lamerdin, Jane; Hellsten, Uffe; Goodstein, David; Couronne, Olivier; Tran-Gyamfi, Mary; Aerts, Andrea; Altherr, Michael; Ashworth, Linda; Bajorek, Eva; Black, Stacey; Branscomb, Elbert; Caenepeel, Sean; Carrano, Anthony; Caoile, Chenier; Chan, Yee Man; Christensen, Mari; Cleland, Catherine A.; Copeland, Alex; Dalin, Eileen; Dehal, Paramvir; Denys, Mirian; Detter, John C.; Escobar, Julio; Flowers, Dave; Fotopulos, Dea; Garcia, Carmen; Georgescu, Anca M.; Glavina, Tijana; Gomez, Maria; Gonzales, Eldelyn; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Ho, Issac; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Larionov, Vladimer; Leem, Sun-Hee; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Malfatti, Stephanie; Martinez, Diego; McCready, Paula; Medina, Catherine; Morgan, Jenna; Nelson, Kathryn; Nolan, Matt; Ovcharenko, Ivan; Pitluck, Sam; Pollard, Martin; Popkie, Anthony P.; Predki, Paul; Quan, Glenda; Ramirez, Lucia; Rash, Sam; Retterer, James; Rodriguez, Alex; Rogers, Stephanine; Salamov, Asaf; Salazar, Angelica; She, Xinwei; Smith, Doug; Slezak, Tom; Solovyev, Victor; Thayer, Nina; Tice, Hope; Tsai, Ming; Ustaszewska, Anna; Vo, Nu; Wagner, Mark; Wheeler, Jeremy; Wu, Kevin; Xie, Gary; Yang, Joan; Dubchak, Inna; Furey, Terrence S.; DeJong, Pieter; Dickson, Mark; Gordon, David; Eichler, Evan E.; Pennacchio, Len A.; Richardson, Paul; Stubbs, Lisa; Rokhsar, Daniel S.; Myers, Richard M.; Rubin, Edward M.; Lucas, Susan M.

    2003-09-15

    Chromosome 19 has the highest gene density of all human chromosomes, more than double the genome-wide average. The large clustered gene families, corresponding high G+C content, CpG islands and density of repetitive DNA indicate a chromosome rich in biological and evolutionary significance. Here we describe 55.8 million base pairs of highly accurate finished sequence representing 99.9 percent of the euchromatin portion of the chromosome. Manual curation of gene loci reveals 1,461 protein-coding genes and 321 pseudogenes. Among these are genes directly implicated in mendelian disorders, including familial hypercholesterolaemia and insulin-resistant diabetes. Nearly one-quarter of these genes belong to tandemly arranged families, encompassing more than 25 percent of the chromosome. Comparative analyses show a fascinating picture of conservation and divergence, revealing large blocks of gene orthology with rodents, scattered regions with more recent gene family expansions and deletions, and segments of coding and non-coding conservation with the distant fish species Takifugu.

  20. Extraction of high quality k-words for alignment-free sequence comparison.

    PubMed

    Gunasinghe, Upuli; Alahakoon, Damminda; Bedingfield, Susan

    2014-10-01

    The weighted Euclidean distance ($D_2$) is one of the earliest dissimilarity measures used for alignment-free comparison of biological sequences. Because it is fast to compute, this measure has been used in numerous applications, and many variants of it have since been introduced. The $D_2$ distance is based on the counts of k-words in the two sequences being compared. Traditionally, all k-words are compared when computing the distance. In this paper we show that similar accuracy in sequence comparison can be achieved by using a selected subset of k-words. We introduce a quality measure based on term variance for identifying the important k-words. We demonstrate the application of the proposed technique in phylogeny reconstruction and show that up to 99% of the k-words can be filtered out for certain datasets, resulting in faster sequence comparison. The paper also presents an evaluation, based on exploratory analysis, of optimal k-word values and discusses the impact of using subsets of k-words in such optimal instances. PMID:24846728
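
    One way to read the selection step is sketched below in Python: compute relative k-word frequencies for each sequence, rank k-words by the variance of their frequency across the set, keep the top fraction, and compare sequences on the kept k-words only. Interpreting "term variance" as this across-sequence variance, and using an unweighted squared Euclidean distance, are assumptions of this sketch rather than details given in the abstract.

```python
from collections import Counter
from statistics import pvariance

def kword_freqs(seq, k):
    """Relative frequency of each k-word in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def select_kwords(seqs, k, keep_fraction):
    """Keep the k-words whose relative frequency varies most across sequences."""
    freqs = [kword_freqs(s, k) for s in seqs]
    vocab = set().union(*freqs)
    variance = {w: pvariance([f.get(w, 0.0) for f in freqs]) for w in vocab}
    n_keep = max(1, round(keep_fraction * len(vocab)))
    ranked = sorted(vocab, key=lambda w: (-variance[w], w))  # ties broken alphabetically
    return ranked[:n_keep], freqs

def squared_euclidean(f1, f2, words):
    """Squared Euclidean distance restricted to the selected k-words."""
    return sum((f1.get(w, 0.0) - f2.get(w, 0.0)) ** 2 for w in words)
```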

  1. Toward in Silico Biology (from Sequences to Systems)

    NASA Astrophysics Data System (ADS)

    Yamato, Ichiro; Ando, Tadashi; Suzuki, Ayumi; Harada, Kazuo; Itoh, Seigo; Miyazaki, Satoru; Kobayashi, Naoki; Takeda, Masayuki

    2008-03-01

    Thanks to the many large-scale genome sequencing projects, thousands of primary sequences have been determined, and the number of uncharacterized sequences will continue to grow. One of the major goals of genome science is to create a living system in a computer using such sequence information. Toward this goal, many researchers are developing algorithms to predict folding structures of proteins from their amino acid sequences, to predict functions from sequence information or structures, and finally to simulate living systems in a computer. In this review, we describe our own attempts toward this goal.

  2. The nucleotide sequence of the mouse immunoglobulin epsilon gene: comparison with the human epsilon gene sequence.

    PubMed Central

    Ishida, N; Ueda, S; Hayashida, H; Miyata, T; Honjo, T

    1982-01-01

    We have determined the nucleotide sequence of the immunoglobulin epsilon gene cloned from newborn mouse DNA. The epsilon gene sequence allows prediction of the amino acid sequence of the constant region of the epsilon chain and its comparison with sequences of the human epsilon and other mouse immunoglobulin genes. Among the immunoglobulin genes, the epsilon gene was shown to be under the weakest selection pressure at the protein level, although the divergence at synonymous positions is similar. Our results suggest that the epsilon gene may be dispensable, which accords with the fact that IgE has only obscure roles in the immune defense system but has an undesirable role as a mediator of hypersensitivity. The sequence data suggest that the human and murine epsilon genes were derived from different ancestors that were duplicated a long time ago. The amino acid sequence of the epsilon chain is more homologous to those of the gamma chains than to those of the other mouse heavy chains. Two membrane exons, separated by an 80-base intron, were identified 1.7 kb 3' to the CH4 domain of the epsilon gene and shown to conserve a hydrophobic portion similar to those of other heavy chain genes. RNA blot hybridization showed that the epsilon membrane exons are transcribed into two species of mRNA in an IgE hybridoma. PMID:6329728

  3. Comparison of DNA Quantification Methods for Next Generation Sequencing

    PubMed Central

    Robin, Jérôme D.; Ludlow, Andrew T.; LaRanger, Ryan; Wright, Woodring E.; Shay, Jerry W.

    2016-01-01

    Next Generation Sequencing (NGS) is a powerful tool that depends on loading a precise amount of DNA onto a flowcell. NGS strategies have expanded our ability to investigate genomic phenomena by referencing mutations in cancer and diseases through large-scale genotyping, developing methods to map rare chromatin interactions (4C, 5C and Hi-C) and identifying chromatin features associated with regulatory elements (ChIP-seq, Bis-Seq, ChIA-PET). While many methods are available for DNA library quantification, there is no unambiguous gold standard. Most techniques use PCR to amplify DNA libraries to obtain sufficient quantities for optical density measurement. However, increased PCR cycles can distort the library’s heterogeneity and prevent the detection of rare variants. In this analysis, we compared new digital PCR technologies (droplet digital PCR; ddPCR, ddPCR-Tail) with standard methods for the titration of NGS libraries. ddPCR-Tail is comparable to qPCR and fluorometry (Qubit) and allows sensitive quantification by analysis of barcode distribution after sequencing of multiplexed samples. This study provides a direct comparison between quantification methods throughout a complete sequencing experiment and provides the impetus to use ddPCR-based quantification for improvement of NGS quality. PMID:27048884
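
    The core of droplet digital PCR quantification is a Poisson correction: if a fraction p of droplets is positive, the mean number of target copies per droplet is λ = −ln(1 − p). A minimal Python sketch follows. The default droplet volume of 0.85 nL is an assumption (it varies by instrument and should be taken from the manufacturer), and converting to library molarity would additionally need the dilution factor and fragment length, which are not modelled here.

```python
import math

def ddpcr_copies_per_ul(positive: int, total: int, droplet_volume_nl: float = 0.85) -> float:
    """Poisson-corrected target concentration in copies per microlitre of reaction."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total droplets")
    lam = -math.log(1 - positive / total)  # mean copies per droplet
    return lam / (droplet_volume_nl * 1e-3)  # convert nL to uL
```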

  4. Comparison of DNA Quantification Methods for Next Generation Sequencing.

    PubMed

    Robin, Jérôme D; Ludlow, Andrew T; LaRanger, Ryan; Wright, Woodring E; Shay, Jerry W

    2016-01-01

    Next Generation Sequencing (NGS) is a powerful tool that depends on loading a precise amount of DNA onto a flowcell. NGS strategies have expanded our ability to investigate genomic phenomena by referencing mutations in cancer and diseases through large-scale genotyping, developing methods to map rare chromatin interactions (4C, 5C and Hi-C) and identifying chromatin features associated with regulatory elements (ChIP-seq, Bis-Seq, ChIA-PET). While many methods are available for DNA library quantification, there is no unambiguous gold standard. Most techniques use PCR to amplify DNA libraries to obtain sufficient quantities for optical density measurement. However, increased PCR cycles can distort the library's heterogeneity and prevent the detection of rare variants. In this analysis, we compared new digital PCR technologies (droplet digital PCR; ddPCR, ddPCR-Tail) with standard methods for the titration of NGS libraries. ddPCR-Tail is comparable to qPCR and fluorometry (Qubit) and allows sensitive quantification by analysis of barcode distribution after sequencing of multiplexed samples. This study provides a direct comparison between quantification methods throughout a complete sequencing experiment and provides the impetus to use ddPCR-based quantification for improvement of NGS quality. PMID:27048884

  5. High-throughput sequencing in veterinary infection biology and diagnostics.

    PubMed

    Belák, S; Karlsson, O E; Leijon, M; Granberg, F

    2013-12-01

    Sequencing methods have improved rapidly since the first versions of the Sanger techniques, facilitating the development of very powerful tools for detecting and identifying various pathogens, such as viruses, bacteria and other microbes. The ongoing development of high-throughput sequencing (HTS; also known as next-generation sequencing) technologies has resulted in a dramatic reduction in DNA sequencing costs, making the technology more accessible to the average laboratory. In this White Paper of the World Organisation for Animal Health (OIE) Collaborating Centre for the Biotechnology-based Diagnosis of Infectious Diseases in Veterinary Medicine (Uppsala, Sweden), several approaches and examples of HTS are summarised, and their diagnostic applicability is briefly discussed. Selected future aspects of HTS are outlined, including the need for bioinformatic resources, with a focus on improving the diagnosis and control of infectious diseases in veterinary medicine. PMID:24761741

  6. Bacterial epidemiology and biology - lessons from genome sequencing

    PubMed Central

    2011-01-01

    Next-generation sequencing has ushered in a new era of microbial genomics, enabling the detailed historical and geographical tracing of bacteria. This is helping to shape our understanding of bacterial evolution. PMID:22027015

  7. A singular value decomposition approach for improved taxonomic classification of biological sequences

    PubMed Central

    2011-01-01

    Background Singular value decomposition (SVD) is a powerful technique for information retrieval; it helps uncover relationships between elements that are not prima facie related. SVD was initially developed to reduce the time needed for information retrieval and analysis of very large data sets in the complex internet environment. Since information retrieval from large-scale genome and proteome data sets has a similar level of complexity, SVD-based methods could also facilitate data analysis in this research area. Results We found that SVD applied to amino acid sequences demonstrates relationships and provides a basis for producing clusters and cladograms, demonstrating evolutionary relatedness of species that correlates well with Linnaean taxonomy. The choice of a reasonable number of singular values is crucial for SVD-based studies. We found that fewer singular values are needed to produce biologically significant clusters when SVD is employed. Subsequently, we developed a method to determine the lowest number of singular values and fewest clusters needed to guarantee biological significance; this system was developed and validated by comparison with Linnaean taxonomic classification. Conclusions By using SVD, we can reduce uncertainty concerning the appropriate rank value necessary to perform accurate information retrieval analyses. In tests, clusters that we developed with SVD perfectly matched what was expected based on Linnaean taxonomy. PMID:22369633
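
    A minimal NumPy sketch of the general idea: build a sequence-by-k-mer count matrix, keep only the top `rank` singular values, and compare sequences by cosine similarity in the reduced space. Using amino-acid 2-mer counts as features and cosine similarity as the comparison are assumptions of this sketch; the abstract does not specify the paper's exact matrix construction or clustering step.

```python
from collections import Counter

import numpy as np

def kmer_matrix(seqs, k=2):
    """Rows = sequences, columns = k-mers observed anywhere in the set."""
    counts = [Counter(s[i:i + k] for i in range(len(s) - k + 1)) for s in seqs]
    vocab = sorted(set().union(*counts))
    return np.array([[c[w] for w in vocab] for c in counts], dtype=float)

def reduced_similarity(seqs, rank, k=2):
    """Cosine similarity between sequences after a rank-`rank` truncated SVD."""
    a = kmer_matrix(seqs, k)
    u, s, _ = np.linalg.svd(a, full_matrices=False)
    coords = u[:, :rank] * s[:rank]  # sequence coordinates in the reduced space
    norms = np.linalg.norm(coords, axis=1, keepdims=True)
    unit = coords / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T
```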

  8. Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification

    PubMed Central

    Borozan, Ivan; Watt, Stuart; Ferretti, Vincent

    2015-01-01

    Motivation: Alignment-based sequence similarity searches, while accurate for some types of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim of improving the accuracy with which DNA and protein sequences are characterized. Results: Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure than with another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. Availability and implementation: All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25573913

  9. Effect of k-tuple length on sample-comparison with high-throughput sequencing data.

    PubMed

    Wang, Ying; Lei, Xiaoye; Wang, Shun; Wang, Zicheng; Song, Nianfeng; Zeng, Feng; Chen, Ting

    2016-01-22

    High-throughput metagenomic sequencing offers a powerful technique for comparing microbial communities. Without requiring extra reference sequences, alignment-free models with short k-tuples (k = 2-10 bp) have yielded promising results. Short k-tuples describe the overall statistical distribution but struggle to capture the specific characteristics within a single microbial community. Longer k-tuples carry more information. However, because the frequency vector of long k-tuples (k ≥ 30 bp) is sparse, the statistical measures designed for short k-tuples are not applicable. In our study, we treated each tuple as a meaningful word and each sequencing dataset as a document composed of those words. The comparison between two sequencing datasets is thus processed as "topic analysis of documents" in text mining. We designed a pipeline that compares metagenomic samples using long k-tuple features combined with algorithms from text mining and pattern recognition. The pipeline is available at http://culotuple.codeplex.com/. Experiments show that our pipeline with long k-tuple features: ① separates genomes with high similarity; ② outperforms short k-tuple models in all experiments (when k ≥ 12, the short k-tuple measures are no longer applicable, and when k is between 20 and 40, the long k-tuple pipeline obtains much better grouping results); ③ is free from the effect of sequencing platforms/protocols. We also obtained meaningful and supported biological results on the 40-tuples selected for comparison. PMID:26721429
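
    The "k-tuples as words, samples as documents" idea can be sketched in Python with plain TF-IDF weighting and cosine similarity. This is a simplified stand-in, not the authors' pipeline: TF-IDF is swapped in for their topic-analysis step, the smoothed IDF formula is an assumption, and the tiny k used in the test is only for readability (the paper's long-tuple setting uses much larger k, such as 40).

```python
import math
from collections import Counter

def kmer_document(reads, k):
    """One sample = one document; every k-tuple in its reads is a word."""
    return Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))

def tfidf_vectors(docs):
    """TF-IDF weights per document, with a smoothed IDF."""
    n = len(docs)
    df = Counter(w for d in docs for w in d)  # number of documents containing each word
    idf = {w: math.log((1 + n) / (1 + c)) + 1 for w, c in df.items()}
    vectors = []
    for d in docs:
        total = sum(d.values()) or 1
        vectors.append({w: c / total * idf[w] for w, c in d.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```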

  10. International interlaboratory study comparing single organism 16S rRNA gene sequencing data: Beyond consensus sequence comparisons.

    PubMed

    Olson, Nathan D; Lund, Steven P; Zook, Justin M; Rojas-Cornejo, Fabiola; Beck, Brian; Foy, Carole; Huggett, Jim; Whale, Alexandra S; Sui, Zhiwei; Baoutina, Anna; Dobeson, Michael; Partis, Lina; Morrow, Jayne B

    2015-03-01

    This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants used the sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved positions, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high-throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies. PMID:27077030
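
    As a simplified illustration of estimating a variant copy ratio at one variable position, the Python sketch below picks the number of gene copies k (out of a known copy number) that maximizes a binomial likelihood of the observed variant read count. The per-read error rate, the single-position model, and the assumption that all copies are sequenced equally are simplifications of this sketch; the study's Bayesian and maximum likelihood methods also estimate variant combinations across copies, which this does not attempt.

```python
import math

def ml_variant_copies(variant_reads, total_reads, gene_copies, error_rate=0.01):
    """Most likely number of gene copies carrying the variant (binomial model)."""
    best_k, best_ll = 0, -math.inf
    for k in range(gene_copies + 1):
        frac = k / gene_copies
        # P(read shows variant): variant copies read correctly, plus
        # reference copies misread as the variant.
        p = frac * (1 - error_rate) + (1 - frac) * error_rate
        # Binomial log-likelihood without the constant binomial coefficient.
        ll = variant_reads * math.log(p) + (total_reads - variant_reads) * math.log(1 - p)
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k
```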

  11. Test on the structure of biological sequences via Chaos Game Representation.

    PubMed

    Cénac, Peggy

    2005-01-01

    In this paper biological sequences are modelled by stationary ergodic sequences. A new family of statistical tests to characterize the randomness of the inputs is proposed and analyzed. Tests for independence and for the determination of the appropriate order of a Markov chain are constructed with the Chaos Game Representation (CGR), and applied to several genomes. PMID:16646845
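
    For readers unfamiliar with CGR, the construction itself is short: each nucleotide is assigned a corner of the unit square, and each new point lies halfway between the previous point and the corner of the current nucleotide. The Python sketch below uses one common corner assignment (A bottom-left, C top-left, G top-right, T bottom-right) and counts points in a 2^k × 2^k grid, where each cell corresponds to one k-word; it does not implement the paper's statistical tests.

```python
from collections import Counter

CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(seq: str) -> list[tuple[float, float]]:
    """Chaos Game Representation: move halfway toward each base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points

def cgr_cell_counts(seq: str, k: int) -> Counter:
    """Points from index k-1 on fall in the grid cell of the k-word ending there."""
    size = 2 ** k
    return Counter((int(x * size), int(y * size)) for x, y in cgr_points(seq)[k - 1:])
```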

  12. TRANSFAC database as a bridge between sequence data libraries and biological function

    SciTech Connect

    Wingender, E.; Karas, H.; Knueppel, R.

    1996-12-31

    The TRANSFAC database contains information about regulatory DNA sequences and the proteins (transcription factors) that bind to and act through them. It may thus serve as a dictionary for the biological meaning of these sequence elements. Moreover, the TRANSFAC data can be used to describe these elements, to define consensus sequences and matrices for elements of a given function, and thus to provide means of identifying regulatory signals in newly unravelled genomic sequences. 12 refs., 1 fig.
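
    As an illustration of how such matrices can identify candidate regulatory signals, the Python sketch below converts aligned binding sites into a log-odds position weight matrix and scores every window of a sequence. The pseudocount, uniform background and the TATA-like example sites are assumptions made for this sketch; it is not TRANSFAC's own scoring scheme.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log2-odds position weight matrix from aligned sites of equal length."""
    pwm = []
    for pos in range(len(sites[0])):
        column = [s[pos] for s in sites]
        total = len(sites) + 4 * pseudocount
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm

def scan(seq, pwm):
    """Score every window of seq; returns (best_start, scores)."""
    w = len(pwm)
    scores = [sum(col[seq[i + j]] for j, col in enumerate(pwm))
              for i in range(len(seq) - w + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores
```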

  13. A fast algorithm for exact sequence search in biological sequences using polyphase decomposition

    PubMed Central

    Srikantha, Abhilash; Bopardikar, Ajit S.; Kaipa, Kalyan Kumar; Venkataraman, Parthasarathy; Lee, Kyusang; Ahn, TaeJin; Narayanan, Rangavittal

    2010-01-01

    Motivation: Exact sequence search allows a user to search for a specific DNA subsequence in a larger DNA sequence or database. It serves as a vital block in many areas such as Pharmacogenetics, Phylogenetics and Personal Genomics. As sequencing of genomic data becomes increasingly affordable, the amount of sequence data that must be processed will also increase exponentially. In this context, fast sequence search algorithms will play an important role in exploiting the information contained in the newly sequenced data. Many existing algorithms do not scale up well for large sequences or databases because of their high-computational costs. This article describes an efficient algorithm for performing fast searches on large DNA sequences. It makes use of hash tables of Q-grams that are constructed after downsampling the database, to enable efficient search and memory use. Time complexity for pattern search is reduced using beam pruning techniques. Theoretical complexity calculations and performance figures are presented to indicate the potential of the proposed algorithm. Contact: s.abhilash@samsung.com; ajit.b@samsung.com PMID:20823301
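The idea of a hash table of Q-grams built over a downsampled database can be illustrated with a sparse q-gram index: only q-grams starting at every step-th position are stored, and each query is probed at step different offsets so that no occurrence is missed. This is a sketch under that assumption only. It does not reproduce the paper's polyphase decomposition or beam pruning, and the function names are hypothetical.

```python
from collections import defaultdict


def build_sparse_qgram_index(text, q, step):
    """Index the q-grams that start at every `step`-th position of `text`."""
    index = defaultdict(list)
    for i in range(0, len(text) - q + 1, step):
        index[text[i:i + q]].append(i)
    return index


def exact_search(text, pattern, index, q, step):
    """Return all start positions of `pattern` in `text`, in increasing order."""
    m = len(pattern)
    if m < q + step - 1:
        # Pattern too short to be guaranteed a sampled seed: scan directly.
        return [p for p in range(len(text) - m + 1) if text.startswith(pattern, p)]
    hits = set()
    for j in range(step):
        # Any occurrence at p has some offset j < step with p + j sampled,
        # so probing the seed pattern[j:j+q] at every offset finds it.
        for s in index.get(pattern[j:j + q], []):
            p = s - j
            if p >= 0 and text.startswith(pattern, p):  # verify the full match
                hits.add(p)
    return sorted(hits)


text = "ACGTACGTTACG"
idx = build_sparse_qgram_index(text, q=3, step=2)
print(exact_search(text, "CGTT", idx, 3, 2))  # [5]
```

Storing only every step-th q-gram shrinks the index roughly by a factor of step, at the cost of step hash lookups per query. That is the memory/time trade-off the abstract alludes to.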

  14. Evaluation of intra- and interspecific divergence of satellite DNA sequences by nucleotide frequency calculation and pairwise sequence comparison

    PubMed Central

    2003-01-01

    Satellite DNA sequences are known to be highly variable and to have been subjected to concerted evolution that homogenizes member sequences within species. We have analyzed the mode of evolution of satellite DNA sequences in four fishes from the genus Diplodus by calculating the nucleotide frequency of the sequence array and the phylogenetic distances between member sequences. Calculation of nucleotide frequency and pairwise sequence comparison enabled us to characterize the divergence among member sequences in this satellite DNA family. The results suggest that the evolutionary rate of satellite DNA in D. bellottii is about two-fold greater than the average of the other three fishes, and that the sequence homogenization event occurred in D. puntazzo more recently than in the others. The procedures described here are effective for characterizing the mode of evolution of satellite DNA. PMID:12734555

  15. Evaluation of intra- and interspecific divergence of satellite DNA sequences by nucleotide frequency calculation and pairwise sequence comparison.

    PubMed

    Kato, Mikio

    2003-01-01

    Satellite DNA sequences are known to be highly variable and to have been subjected to concerted evolution that homogenizes member sequences within species. We have analyzed the mode of evolution of satellite DNA sequences in four fishes from the genus Diplodus by calculating the nucleotide frequency of the sequence array and the phylogenetic distances between member sequences. Calculation of nucleotide frequency and pairwise sequence comparison enabled us to characterize the divergence among member sequences in this satellite DNA family. The results suggest that the evolutionary rate of satellite DNA in D. bellottii is about two-fold greater than the average of the other three fishes, and that the sequence homogenization event occurred in D. puntazzo more recently than in the others. The procedures described here are effective for characterizing the mode of evolution of satellite DNA. PMID:12734555
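The two measurements this abstract combines, pooled nucleotide frequency and pairwise distances between aligned member sequences, can be sketched as follows. The choice of uncorrected p-distance with an optional Jukes-Cantor correction is an assumption made for illustration. The abstract does not state which distance model the author used, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def nucleotide_frequencies(seqs):
    """Pooled A/C/G/T frequencies over a set of member sequences."""
    counts = Counter(b for s in seqs for b in s.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences.

    Columns with a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; infinite once p reaches saturation (0.75)."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def mean_pairwise_distance(seqs, corrected=True):
    """Average (optionally Jukes-Cantor corrected) distance over all member pairs."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    dists = [p_distance(a, b) for a, b in combinations(seqs, 2)]
    if corrected:
        dists = [jukes_cantor(d) for d in dists]
    return sum(dists) / len(dists)


members = ["AAAA", "AAAT", "AATT"]
print(mean_pairwise_distance(members, corrected=False))  # mean of 0.25, 0.5, 0.25
```

Comparing the mean within-species distance across species gives a rough proxy for relative divergence rates. A low mean distance is also consistent with recent homogenization, but only under the simplifying assumptions above.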

  16. Whole Chloroplast Genome Sequencing in Fragaria Using Deep Sequencing: A Comparison of Three Methods

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Chloroplast sequences previously investigated in Fragaria revealed low amounts of variation. Deep sequencing technologies enable economical sequencing of complete chloroplast genomes. These sequences can potentially provide robust phylogenetic resolution, even at low taxonomic levels within plant gr...

  17. Next-generation sequencing workflows in veterinary infection biology: towards validation and quality assurance.

    PubMed

    Van Borm, S; Wang, J; Granberg, F; Colling, A

    2016-04-01

    Recent advancements in DNA sequencing methodologies and sequence data analysis have revolutionised research in many areas of biology and medicine, including veterinary infection biology. New technology is poised to bridge the gap between the research and diagnostic laboratory. This paper defines the potential diagnostic value and purposes of next-generation sequencing (NGS) applications in veterinary infection biology and explores their compatibility with the existing validation principles and methods of the World Organisation for Animal Health. Critical parameters for validation and quality control (quality metrics) are suggested, with reference to established validation and quality assurance guidelines for NGS-based methods of diagnosing human heritable diseases. Although most currently described NGS applications in veterinary infection biology are not primary diagnostic tests that directly result in control measures, this critical reflection on the advantages and remaining challenges of NGS technology should stimulate discussion on its diagnostic value and on the potential to validate NGS methods and monitor their diagnostic performance. PMID:27217169

  18. Alignment-free comparison of genome sequences by a new numerical characterization.

    PubMed

    Huang, Guohua; Zhou, Houqing; Li, Yongfan; Xu, Lixin

    2011-07-21

    In order to compare different genome sequences, an alignment-free method has been proposed. First, we presented a new graphical representation of DNA sequences without degeneracy, which is conducive to intuitive comparison of sequences. Then, a new numerical characterization based on the representation was introduced to quantitatively depict the intrinsic nature of genome sequences; this characterization is treated as a 10-dimensional vector in mathematical space. Alignment-free comparison of sequences was performed by computing the distances between the vectors of the corresponding numerical characterizations, which define the evolutionary relationship. Two data sets of DNA sequences were constructed to assess the performance on sequence comparison. The results demonstrate the validity of the method. The new numerical characterization provides a powerful tool for genome comparison. PMID:21536050

  19. The Transcriptome Sequence of Dientamoeba fragilis Offers New Biological Insights on its Metabolism, Kinome, Degradome and Potential Mechanisms of Pathogenicity.

    PubMed

    Barratt, Joel L N; Cao, Maisie; Stark, Damien J; Ellis, John T

    2015-09-01

    Dientamoeba fragilis is a human bowel parasite with a worldwide distribution. Dientamoeba was once described as a rare and harmless commensal though recent reports suggest it is common and potentially pathogenic. Molecular data on Dientamoeba is scarce which limits our understanding of this parasite. To address this, sequencing of the Dientamoeba transcriptome was performed. Messenger RNA was extracted from cultured Dientamoeba trophozoites originating from clinical stool specimens, and sequenced using Roche GS FLX and Illumina HiSeq technologies. In total 6,595 Dientamoeba transcripts were identified. These sequences were analysed using the BLAST2GO software suite and via BLAST comparisons to sequences available from TrichDB, GenBank, MEROPS and kinase.com. Several novel KEGG pathway maps were generated and gene ontology analysis was also performed. These results are thoroughly discussed guided by knowledge available for other related protozoa. Attention is paid to the novel biological insights afforded by this data including peptidases and kinases of Dientamoeba, as well as its metabolism, novel chemotherapeutics and possible mechanisms of pathogenicity. Currently, this work represents the largest contribution to our understanding of Dientamoeba molecular biology and also represents a major contribution to our understanding of the trichomonads generally, many of which are important pathogens of humans and animals. PMID:26188431

  20. Comparison on extreme pathways reveals nature of different biological processes

    PubMed Central

    2014-01-01

    Background Constraint-based reconstruction and analysis (COBRA) is used for modeling genome-scale metabolic networks (MNs). In a COBRA model, extreme pathways (ExPas) are the edges of its conical solution space, which is formed by all viable steady-state flux distributions. ExPa analysis has been successfully applied to MNs to reveal their phenotypic capabilities and properties. Recently, the COBRA framework has been extended to transcriptional regulatory networks (TRNs) and transcriptional and translational networks (TTNs), so efforts are needed to determine whether ExPa analysis is also effective on these two types of networks. Results In this paper, the ExPas resulting from the COBRA models of E. coli's MN, TRN and TTN were horizontally compared from 5 aspects: (1) total number and the ratio of their amount to reaction amount; (2) length distribution; (3) reaction participation; (4) correlated reaction sets (CoSets); (5) interconnectivity degree. Significant discrepancies in the above properties were observed during the comparison, which reveal the distinct natures of the different biological processes. In addition, by demonstrating the application of ExPa analysis on E. coli, we provide practical guidance on an improved approach to computing ExPas on COBRA models of TRNs. Conclusions ExPas of E. coli's MN, TRN and TTN have different properties, which are strongly connected with various biological natures of biochemical networks, such as topological structure, specificity and redundancy. Our study shows that ExPas are biologically meaningful on these newly developed models and suggests the effectiveness of ExPa analysis on them. PMID:24565046

  1. FastAlert--an automatic search system to alert about new entries in biological sequence databanks.

    PubMed

    Eggenberger, F; Redaschi, N; Doelz, R

    1996-04-01

    This paper describes a new tool enabling awareness of new sequence databank entries of interest. The FastAlert system relieves the researcher from the burden of repeating FASTA searches in order to keep up with the rapidly growing amount of information found in biological sequence databanks. The query sequence can be submitted from any computer connected to the Internet. Upon registration, the databank, including the updates, is scanned at periodic intervals with the sequence provided. The results, so-called FastAlert reports, are delivered via electronic mail. The reports contain the FASTA best-scores list and the similarity statistics for each entry listed. PMID:8744775

  2. Correlation between MCAT Biology Content Specifications and Topic Scope and Sequence of General Education College Biology Textbooks

    PubMed Central

    Rissing, Steven W.

    2013-01-01

    Most American colleges and universities offer gateway biology courses to meet the needs of three undergraduate audiences: biology and related science majors, many of whom will become biomedical researchers; premedical students meeting medical school requirements and preparing for the Medical College Admissions Test (MCAT); and students completing general education (GE) graduation requirements. Biology textbooks for these three audiences present a topic scope and sequence that correlates with the topic scope and importance ratings of the biology content specifications for the MCAT regardless of the intended audience. Texts for “nonmajors,” GE courses appear derived directly from their publisher's majors text. Topic scope and sequence of GE texts reflect those of “their” majors text and, indirectly, the MCAT. MCAT term density of GE texts equals or exceeds that of their corresponding majors text. Most American universities require a GE curriculum to promote a core level of academic understanding among their graduates. This includes civic scientific literacy, recognized as an essential competence for the development of public policies in an increasingly scientific and technological world. Deriving GE biology and related science texts from majors texts designed to meet very different learning objectives may defeat the scientific literacy goals of most schools’ GE curricula. PMID:24006392

  3. Correlation between MCAT biology content specifications and topic scope and sequence of general education college biology textbooks.

    PubMed

    Rissing, Steven W

    2013-01-01

    Most American colleges and universities offer gateway biology courses to meet the needs of three undergraduate audiences: biology and related science majors, many of whom will become biomedical researchers; premedical students meeting medical school requirements and preparing for the Medical College Admissions Test (MCAT); and students completing general education (GE) graduation requirements. Biology textbooks for these three audiences present a topic scope and sequence that correlates with the topic scope and importance ratings of the biology content specifications for the MCAT regardless of the intended audience. Texts for "nonmajors," GE courses appear derived directly from their publisher's majors text. Topic scope and sequence of GE texts reflect those of "their" majors text and, indirectly, the MCAT. MCAT term density of GE texts equals or exceeds that of their corresponding majors text. Most American universities require a GE curriculum to promote a core level of academic understanding among their graduates. This includes civic scientific literacy, recognized as an essential competence for the development of public policies in an increasingly scientific and technological world. Deriving GE biology and related science texts from majors texts designed to meet very different learning objectives may defeat the scientific literacy goals of most schools' GE curricula. PMID:24006392

  4. Sequence comparison on a cluster of workstations using the PVM system

    SciTech Connect

    Guan, X.; Mural, R.J.; Uberbacher, E.C.

    1995-02-01

    We have implemented a distributed sequence comparison algorithm on a cluster of workstations using the PVM paradigm. This implementation has achieved performance similar to that of the Intel iPSC/860 Hypercube, a massively parallel computer. The distributed sequence comparison algorithm serves as a search tool for two Internet servers, GRAIL and GENQUEST. This paper describes the implementation and the performance of the algorithm.

  5. Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis.

    PubMed

    Bonham-Carter, Oliver; Steele, Joe; Bastola, Dhundy

    2014-11-01

    Modern sequencing and genome assembly technologies have provided a wealth of data, which will soon require an analysis by comparison for discovery. Sequence alignment, a fundamental task in bioinformatics research, may be used but with some caveats. Seminal techniques and methods from dynamic programming are proving ineffective for this work owing to their inherent computational expense when processing large amounts of sequence data. These methods are prone to giving misleading information because of genetic recombination, genetic shuffling and other inherent biological events. New approaches from information theory, frequency analysis and data compression are available and provide powerful alternatives to dynamic programming. These new methods are often preferred, as their algorithms are simpler and are not affected by synteny-related problems. In this review, we provide a detailed discussion of computational tools, which stem from alignment-free methods based on statistical analysis from word frequencies. We provide several clear examples to demonstrate applications and the interpretations over several different areas of alignment-free analysis such as base-base correlations, feature frequency profiles, compositional vectors, an improved string composition and the D2 statistic metric. Additionally, we provide detailed discussion and an example of analysis by Lempel-Ziv techniques from data compression. PMID:23904502
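The D2 statistic named at the end of this review is simple to state: it is the number of matching k-word pairs between two sequences, i.e. the inner product of their k-word count vectors. A minimal sketch follows. The cosine-normalized distance is one common way to turn D2 into a dissimilarity and is an illustrative choice here, not necessarily the variant the review recommends. Function names are hypothetical.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping words of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a, b, k):
    """D2 statistic: number of k-word matches (inner product of count vectors)."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Counter returns 0 for words absent from b, so only shared words contribute.
    return sum(n * cb[w] for w, n in ca.items())


def d2_cosine_distance(a, b, k):
    """1 - cosine similarity of the k-word count vectors (0 = identical composition)."""
    norm = math.sqrt(d2(a, a, k) * d2(b, b, k))
    return 1.0 - d2(a, b, k) / norm if norm else 1.0


print(d2("AAAA", "AAA", 2))  # 6: "AA" occurs 3 times in one and 2 times in the other
```

No alignment is computed at any point. The cost is linear in sequence length for counting plus the number of distinct words for the inner product, which is the efficiency argument the review makes against dynamic programming.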

  6. Development of a candidate reference material for adventitious virus detection in vaccine and biologicals manufacturing by deep sequencing

    PubMed Central

    Mee, Edward T.; Preston, Mark D.; Minor, Philip D.; Schepelmann, Silke; Huang, Xuening; Nguyen, Jenny; Wall, David; Hargrove, Stacey; Fu, Thomas; Xu, George; Li, Li; Cote, Colette; Delwart, Eric; Li, Linlin; Hewlett, Indira; Simonyan, Vahan; Ragupathy, Viswanath; Alin, Voskanian-Kordi; Mermod, Nicolas; Hill, Christiane; Ottenwälder, Birgit; Richter, Daniel C.; Tehrani, Arman; Jacqueline, Weber-Lehmann; Cassart, Jean-Pol; Letellier, Carine; Vandeputte, Olivier; Ruelle, Jean-Louis; Deyati, Avisek; La Neve, Fabio; Modena, Chiara; Mee, Edward; Schepelmann, Silke; Preston, Mark; Minor, Philip; Eloit, Marc; Muth, Erika; Lamamy, Arnaud; Jagorel, Florence; Cheval, Justine; Anscombe, Catherine; Misra, Raju; Wooldridge, David; Gharbia, Saheer; Rose, Graham; Ng, Siemon H.S.; Charlebois, Robert L.; Gisonni-Lex, Lucy; Mallet, Laurent; Dorange, Fabien; Chiu, Charles; Naccache, Samia; Kellam, Paul; van der Hoek, Lia; Cotten, Matt; Mitchell, Christine; Baier, Brian S.; Sun, Wenping; Malicki, Heather D.

    2016-01-01

    Background Unbiased deep sequencing offers the potential for improved adventitious virus screening in vaccines and biotherapeutics. Successful implementation of such assays will require appropriate control materials to confirm assay performance and sensitivity. Methods A common reference material containing 25 target viruses was produced and 16 laboratories were invited to process it using their preferred adventitious virus detection assay. Results Fifteen laboratories returned results, obtained using a wide range of wet-lab and informatics methods. Six of 25 target viruses were detected by all laboratories, with the remaining viruses detected by 4–14 laboratories. Six non-target viruses were detected by three or more laboratories. Conclusion The study demonstrated that a wide range of methods are currently used for adventitious virus detection screening in biological products by deep sequencing and that they can yield significantly different results. This underscores the need for common reference materials to ensure satisfactory assay performance and enable comparisons between laboratories. PMID:26709640

  7. Water Mediates Recognition of DNA Sequence via Ionic Current Blockade in a Biological Nanopore.

    PubMed

    Bhattacharya, Swati; Yoo, Jejoong; Aksimentiev, Aleksei

    2016-04-26

    Electric field-driven translocation of DNA strands through biological nanopores has been shown to produce blockades of the nanopore ionic current that depend on the nucleotide composition of the strands. Coupling a biological nanopore MspA to a DNA processing enzyme has made DNA sequencing via measurement of ionic current blockades possible. Nevertheless, the physical mechanism enabling the DNA sequence readout has remained undetermined. Here, we report the results of all-atom molecular dynamics simulations that elucidated the physical mechanism of ionic current blockades in the biological nanopore MspA. We find that the amount of water displaced from the nanopore by the DNA strand determines the nanopore ionic current, whereas the steric and base-stacking properties of the DNA nucleotides determine the amount of water displaced. Unexpectedly, we find the effective force on DNA in MspA to undergo large fluctuations, which may produce insertion errors in the DNA sequence readout. PMID:27054820

  8. Multidomain Peptides: Sequence-Nanostructure Relationships and Biological Applications

    NASA Astrophysics Data System (ADS)

    Bakota, Erica Laraine

    2011-12-01

    Peptides are materials that, as a result of their polymeric nature, possess enormous versatility and customizability. Multidomain peptides are a class of peptides that self-assemble to form stable, cytocompatible hydrogels. They have an ABA block motif, in which the A block is composed of charged amino acids, such as lysine, and the B block consists of alternating hydrophilic and hydrophobic amino acids, such as glutamine and leucine. The B block forms a facial amphiphile that drives self-assembly. The charged A blocks simultaneously limit self-assembly and improve solubility. Self-assembly is triggered by charge screening of these charged amino acids, enabling the formation of beta-sheet fibers. The development of an extended nanofiber network can result in the formation of a hydrogel. Systematic modifications to both the A and B blocks were investigated, and it was found that sequence modifications have a large impact on peptide nanostructure and hydrogel rheology. The first modification examined is the substitution of amino acids within the hydrophilic positions of the B block. The second set of modifications investigated was the incorporation of aromatic amino acids in the B block. Finally, the charged block was varied to generate different net charges on the peptides, a change which impacted the ability to use these peptides in cell culture. Two applications of multidomain peptide nanofibers are explored, the first of which is the delivery of novel therapies in vivo. One multidomain peptide is able to form hydrogels that undergo shear-thinning and rapid recovery. This gel can be loaded with cytokines and growth factors that have been secreted by embryonic stem cells, and these molecules can be subsequently released in a therapeutic setting. Another application for multidomain peptides is their use as biocompatible surfactants. Single-walled carbon nanotubes have been widely investigated for their unique optical and electrical properties, but their solubility in...

  9. Elucidation of the sequence of canine (pro)-calcitonin. A molecular biological and protein chemical approach.

    PubMed

    Mol, J A; Kwant, M M; Arnold, I C; Hazewinkel, H A

    1991-09-01

    From the canine thyroid gland a calcitonin (CT) immunoreactive peptide was purified by successive aqueous acid acetone extraction, gel filtration and HPLC. Gas-phase sequencing of the purified peptide showed that the first 25 amino acids had 65% sequence homology with the amino-terminus of the human CT prohormone. A canine cDNA library was then made from the thyroid gland. A plasmid was isolated containing a sequence that is homologous to part of exon 3, and the complete sequence of exon 4 of the human mRNA encoding preproCT. From this cDNA the amino acid sequence of canine CT is predicted. In comparison with well-known CT sequences of other species, the strongest homology exists with bovine, porcine and ovine CT. PMID:1758974

  10. Comparison of Whole-Genome Sequences from Two Colony Morphovars of Burkholderia pseudomallei

    PubMed Central

    Hsueh, Pei-Tan; Chen, Yao-Shen; Lin, Hsi-Hsu; Liu, Pei-Ju; Ni, Wen-Fan; Liu, Mei-Chun

    2015-01-01

    The entire genomes of two isogenic morphovars (vgh16W and vgh16R) of Burkholderia pseudomallei were sequenced. A comparison of the sequences from both strains indicates that they show 99.99% identity, are composed of 22 tandem repeated sequences with <100 bp of indels, and have 199 single-base variants. PMID:26472836

  11. nWayComp: A Tool for Universal Comparison of DNA and Protein Sequences

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The increasing number of whole genomic sequences of microorganisms has increased the complexity of genome-wide annotation and gene sequence comparison among multiple microorganisms. To address this problem, we developed nWayComp software that compares DNA and protein sequences of phylogenetically-r...

  12. Genomic Sequence Comparisons, 1987-2003 Final Report

    SciTech Connect

    George M. Church

    2004-07-29

    This project was to develop new DNA sequencing and RNA and protein quantitation methods and related genome annotation tools. The project began in 1987 with the development of multiplex sequencing (published in Science in 1988), and one of the first automated sequencing methods. This led to the first commercial genome sequence in 1994 and to the establishment of the main commercial participants (GTC then Agencourt) in the public DOE/NIH genome project. In collaboration with GTC we contributed to one of the first complete DOE genome sequences, in 1997, that of Methanobacterium thermoautotrophicum, a species of great relevance to energy-rich gas production.

  13. Direct Chloroplast Sequencing: Comparison of Sequencing Platforms and Analysis Tools for Whole Chloroplast Barcoding

    PubMed Central

    Brozynska, Marta; Furtado, Agnelo; Henry, Robert James

    2014-01-01

    Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technologies) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, a total of 102 polymorphisms including 90 SNPs were predicted by both platforms. Indels were more variable with different sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNP but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis. PMID:25329378

  14. The Genome Sequence of Taurine Cattle: A Window to Ruminant Biology and Evolution

    Technology Transfer Automated Retrieval System (TEKTRAN)

    As a major step toward understanding the biology and evolution of ruminants, the cattle genome was sequenced to ~7x coverage using a combined whole genome shotgun and BAC skim approach. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs found in seven mammalian...

  15. The genome sequence of taurine cattle: A window to ruminant biology and evolution

    Technology Transfer Automated Retrieval System (TEKTRAN)

    To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (ma...

  16. Evolutionary sequence comparisons using high-density oligonucleotide arrays.

    PubMed

    Hacia, J G; Makalowski, W; Edgemon, K; Erdos, M R; Robbins, C M; Fodor, S P; Brody, L C; Collins, F S

    1998-02-01

    We explored the utility of high-density oligonucleotide arrays (DNA chips) for obtaining sequence information from homologous genes in closely related species. Orthologues of the human BRCA1 exon 11, all approximately 3.4 kb in length and ranging from 98.2% to 83.5% nucleotide identity, were subjected to hybridization-based and conventional dideoxysequencing analysis. Retrospective guidelines for identifying high-fidelity hybridization-based sequence calls were formulated based upon dideoxysequencing results. Prospective application of these rules yielded base-calling with at least 98.8% accuracy over orthologous sequence tracts shown to have approximately 99% identity. For higher primate sequences with greater than 97% nucleotide identity, base-calling was made with at least 99.91% accuracy covering a minimum of 97% of the sequence. Using a second-tier confirmatory hybridization chip strategy, shown in several cases to confirm the identity of predicted sequence changes, the complete sequence of the chimpanzee, gorilla and orangutan orthologues should be deducible solely through hybridization-based methodologies. Analysis of less highly conserved orthologues can still identify conserved nucleotide tracts of at least 15 nucleotides and can provide useful information for designing primers. DNA-chip based assays can be a valuable new technology for obtaining high-throughput cost-effective sequence information from related genomes. PMID:9462745

  17. KeBABS: an R package for kernel-based analysis of biological sequences.

    PubMed

    Palme, Johannes; Hochreiter, Sepp; Bodenhofer, Ulrich

    2015-08-01

    KeBABS provides a powerful, flexible and easy to use framework for Kernel-Based Analysis of Biological Sequences in R. It includes efficient implementations of the most important sequence kernels, also including variants that allow for taking sequence annotations and positional information into account. KeBABS seamlessly integrates three common support vector machine (SVM) implementations with a unified interface. It allows for hyperparameter selection by cross validation, nested cross validation and also features grouped cross validation. The biological interpretation of SVM models is supported by (1) the computation of weights of sequence patterns and (2) prediction profiles that highlight the contributions of individual sequence positions or sections. PMID:25812745
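The two interpretation aids mentioned here, pattern weights and prediction profiles, can be illustrated for the simplest sequence kernel, the k-mer spectrum kernel. KeBABS itself is an R package, and this Python sketch does not reproduce its API. The dual coefficients are a hypothetical input standing in for a trained SVM, and spreading each k-mer's weight evenly over the positions it covers is one simple way to build a profile that may differ from the package's exact definition.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield (start, word) for every overlapping k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def feature_weights(support_seqs, dual_coefs, k):
    """Explicit k-mer weights of a linear spectrum-kernel model.

    dual_coefs[i] stands for alpha_i * y_i of support sequence i (hypothetical input).
    """
    weights = defaultdict(float)
    for seq, coef in zip(support_seqs, dual_coefs):
        for _, w in kmers(seq, k):
            weights[w] += coef
    return dict(weights)


def decision_value(seq, weights, k, bias=0.0):
    """Model output: bias plus the weights of all k-mers occurring in seq."""
    return bias + sum(weights.get(w, 0.0) for _, w in kmers(seq, k))


def prediction_profile(seq, weights, k):
    """Spread each k-mer's weight evenly over the k positions it covers."""
    profile = [0.0] * len(seq)
    for i, w in kmers(seq, k):
        share = weights.get(w, 0.0) / k
        for j in range(i, i + k):
            profile[j] += share
    return profile


weights = {"AC": 1.0, "GT": -0.5}  # hypothetical learned weights
print(prediction_profile("ACGT", weights, 2))  # [0.5, 0.5, -0.25, -0.25]
```

By construction the profile sums to the decision value minus the bias, so each position's entry can be read as its share of the prediction.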

  18. Development and Assessment of a Horizontally Integrated Biological Sciences Course Sequence for Pharmacy Education

    PubMed Central

    Wright, Nicholas J.D.; Alston, Gregory L.

    2015-01-01

    Objective. To design and assess a horizontally integrated biological sciences course sequence and to determine its effectiveness in imparting the foundational science knowledge necessary to successfully progress through the pharmacy school curriculum and produce competent pharmacy school graduates. Design. A 2-semester course sequence integrated principles from several basic science disciplines: biochemistry, molecular biology, cellular biology, anatomy, physiology, and pathophysiology. Each is a 5-credit course taught 5 days per week, with 50-minute class periods. Assessment. Achievement of outcomes was determined with course examinations, student lecture, and an annual skills mastery assessment. The North American Pharmacist Licensure Examination (NAPLEX) results were used as an indicator of competency to practice pharmacy. Conclusion. Students achieved course objectives and program level outcomes. The biological sciences integrated course sequence was successful in providing students with foundational basic science knowledge required to progress through the pharmacy program and to pass the NAPLEX. The percentage of the school’s students who passed the NAPLEX was not statistically different from the national percentage. PMID:26430276

  19. Close sequence comparisons are sufficient to identify human cis-regulatory elements.

    PubMed

    Prabhakar, Shyam; Poulin, Francis; Shoukry, Malak; Afzal, Veena; Rubin, Edward M; Couronne, Olivier; Pennacchio, Len A

    2006-07-01

    Cross-species DNA sequence comparison is the primary method used to identify functional noncoding elements in human and other large genomes. However, little is known about the relative merits of evolutionarily close and distant sequence comparisons. To address this problem, we identified evolutionarily conserved noncoding regions in primate, mammalian, and more distant comparisons using a uniform approach (Gumby) that facilitates unbiased assessment of the impact of evolutionary distance on predictive power. We benchmarked computational predictions against previously identified cis-regulatory elements at diverse genomic loci and also tested numerous extremely conserved human-rodent sequences for transcriptional enhancer activity using an in vivo enhancer assay in transgenic mice. Human regulatory elements were identified with acceptable sensitivity (53%-80%) and true-positive rate (27%-67%) by comparison with one to five other eutherian mammals or six other simian primates. More distant comparisons (marsupial, avian, amphibian, and fish) failed to identify many of the empirically defined functional noncoding elements. Our results highlight the practical utility of close sequence comparisons, and the loss of sensitivity entailed by more distant comparisons. We derived an intuitive relationship between ancient and recent noncoding sequence conservation from whole-genome comparative analysis that explains most of the observations from empirical benchmarking. Lastly, we determined that, in addition to strength of conservation, genomic location and/or density of surrounding conserved elements must also be considered in selecting candidate enhancers for in vivo testing at embryonic time points. PMID:16769978

  20. Next-Generation Sequencing in the Understanding of Kaposi's Sarcoma-Associated Herpesvirus (KSHV) Biology.

    PubMed

    Strahan, Roxanne; Uppal, Timsy; Verma, Subhash C

    2016-01-01

    Non-Sanger-based novel nucleic acid sequencing techniques, referred to as Next-Generation Sequencing (NGS), provide a rapid, reliable, high-throughput, and massively parallel sequencing methodology that has improved our understanding of human cancers and cancer-related viruses. NGS has become a quintessential research tool for more effective characterization of complex viral and host genomes through its ever-expanding repertoire, which consists of whole-genome sequencing, whole-transcriptome sequencing, and whole-epigenome sequencing. These new NGS platforms provide a comprehensive and systematic genome-wide analysis of genomic sequences and a full transcriptional profile at single-nucleotide resolution. When combined, these techniques help unlock the function of novel genes and the related pathways that contribute to the overall viral pathogenesis. Ongoing research in the field of virology endeavors to identify the role of various underlying mechanisms that control the regulation of the herpesvirus biphasic lifecycle in order to discover potential therapeutic targets and treatment strategies. In this review, we have compiled the most recent findings about the application of NGS in Kaposi's sarcoma-associated herpesvirus (KSHV) biology, including identification of novel genomic features and whole-genome KSHV diversities, global gene regulatory network profiling for intricate transcriptome analyses, and surveying of epigenetic marks (DNA methylation, modified histones, and chromatin remodelers) during de novo, latent, and productive KSHV infections. PMID:27043613

  1. A novel algorithm for detecting multiple covariance and clustering of biological sequences.

    PubMed

    Shen, Wei; Li, Yan

    2016-01-01

    Single genetic mutations are always followed by a set of compensatory mutations. Thus, multiple changes commonly occur in biological sequences and play crucial roles in maintaining conformational and functional stability. Although many methods are available to detect single mutations or covariant pairs, detecting non-synchronous multiple changes at different sites in sequences remains challenging. Here, we develop a novel algorithm, named Fastcov, to identify multiple correlated changes in biological sequences using an independent pair model followed by a tandem model of site-residue elements based on inter-restriction thinking. Fastcov performed exceptionally well at harvesting co-pairs and detecting multiple covariant patterns. By 10-fold cross-validation using datasets of different scales, the characteristic patterns successfully classified the sequences into target groups with an accuracy of greater than 98%. Moreover, we demonstrated that the multiple covariant patterns represent co-evolutionary modes corresponding to the phylogenetic tree, and provide a new understanding of protein structural stability. In contrast to other methods, Fastcov provides not only a reliable and effective approach to identify covariant pairs but also more powerful functions, including multiple covariance detection and sequence classification, that are most useful for studying the point and compensatory mutations caused by natural selection, drug induction, environmental pressure, etc. PMID:27451921

  2. A novel algorithm for detecting multiple covariance and clustering of biological sequences

    PubMed Central

    Shen, Wei; Li, Yan

    2016-01-01

    Single genetic mutations are always followed by a set of compensatory mutations. Thus, multiple changes commonly occur in biological sequences and play crucial roles in maintaining conformational and functional stability. Although many methods are available to detect single mutations or covariant pairs, detecting non-synchronous multiple changes at different sites in sequences remains challenging. Here, we develop a novel algorithm, named Fastcov, to identify multiple correlated changes in biological sequences using an independent pair model followed by a tandem model of site-residue elements based on inter-restriction thinking. Fastcov performed exceptionally well at harvesting co-pairs and detecting multiple covariant patterns. By 10-fold cross-validation using datasets of different scales, the characteristic patterns successfully classified the sequences into target groups with an accuracy of greater than 98%. Moreover, we demonstrated that the multiple covariant patterns represent co-evolutionary modes corresponding to the phylogenetic tree, and provide a new understanding of protein structural stability. In contrast to other methods, Fastcov provides not only a reliable and effective approach to identify covariant pairs but also more powerful functions, including multiple covariance detection and sequence classification, that are most useful for studying the point and compensatory mutations caused by natural selection, drug induction, environmental pressure, etc. PMID:27451921
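Covariation between alignment positions is often scored with mutual information (MI). The sketch below is a generic pairwise MI scan over alignment columns, offered only to illustrate what "covariant pairs" means; it is not Fastcov's independent pair model or tandem model. The 0.5-bit threshold and the toy alignment are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def column_mi(alignment: List[str], i: int, j: int) -> float:
    """Mutual information (bits) between columns i and j of an equal-length alignment."""
    n = len(alignment)
    ci = Counter(seq[i] for seq in alignment)
    cj = Counter(seq[j] for seq in alignment)
    cij = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in cij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((ci[a] / n) * (cj[b] / n)))
    return mi


def covariant_pairs(alignment: List[str], threshold: float = 0.5) -> Dict[Tuple[int, int], float]:
    """All column pairs whose MI reaches the (hypothetical) threshold."""
    length = len(alignment[0])
    scores = {}
    for i, j in combinations(range(length), 2):
        score = column_mi(alignment, i, j)
        if score >= threshold:
            scores[(i, j)] = score
    return scores


# Toy alignment: columns 0 and 1 change together (A<->C, G<->T); column 2 is invariant.
alignment = ["ACGT", "ACGA", "GTGT", "GTGA"]
print(covariant_pairs(alignment))  # {(0, 1): 1.0}
```

Raw MI is inflated by phylogenetic relatedness and small sample sizes, which is one reason dedicated methods add further modelling on top of a pairwise score like this.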

  3. Comparison of simple sequence repeats in 19 Archaea.

    PubMed

    Trivedi, S

    2006-01-01

    All organisms studied to date have been found to have a differential distribution of simple sequence repeats (SSRs), with more SSRs in intergenic than in coding sequences. SSR distribution was investigated in Archaea: the complete chromosome sequences of 19 Archaea were analyzed with the program SPUTNIK to find di- to penta-nucleotide repeats. The number of repeats was determined for the complete chromosome sequences and for the coding and non-coding sequences. Unlike what has been found for other groups of organisms, SSRs are abundant in the coding regions of the genomes of some Archaea. Dinucleotide repeats were rare, and CG repeats were found in only two Archaea. In general, trinucleotide repeats are the most abundant SSR motifs; however, pentanucleotide repeats are abundant in some Archaea. Some of the tetranucleotide and pentanucleotide repeat motifs are organism-specific. In general, repeats are short, and CG-rich repeats are present in Archaea that have a CG-rich genome. Among the 19 Archaea, SSR density was not correlated with genome size or with optimum growth temperature. Pentanucleotide density was inversely correlated with the CG content of the genome. PMID:17183484
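A di- to penta-nucleotide repeat scan of the kind described above can be sketched with regular-expression backreferences. This is not SPUTNIK's algorithm: the sketch finds only perfect tandem repeats, and the minimum of three copies is a hypothetical threshold. Motifs that are themselves repeats of a shorter unit (homopolymers, or "ATAT" read as a tetramer) are skipped so each repeat is reported at its shortest period.

```python
import re
from typing import List, Tuple


def is_primitive(motif: str) -> bool:
    """True unless motif is itself a repeat of a shorter unit (e.g. 'AA' or 'ATAT')."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq: str, min_copies: int = 3) -> List[Tuple[int, str, int]]:
    """Return (start, motif, copies) for perfect di- to penta-nucleotide tandem repeats."""
    seq = seq.upper()
    hits = []
    for k in range(2, 6):  # motif lengths 2..5
        # A k-mer followed by at least (min_copies - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def gc_content(seq: str) -> float:
    """Fraction of G and C bases (the 'CG content' referred to in the abstract)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


print(find_ssrs("TTCAGCAGCAGCAGTTACACACACTT"))  # [(2, 'CAG', 4), (16, 'AC', 4)]
```

Splitting the hits into coding and non-coding sets would additionally require gene coordinates, which this sketch does not model.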

  4. Use of gene sequence analyses and genome comparisons for yeast systematics

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Detection, identification, and classification of yeasts has undergone a major transformation in the past decade and a half following application of gene sequence analyses and genome comparisons. Development of a database (barcode) of easily determined gene sequences from domains 1 and 2 of large sub...

  5. An Efficient Machine Learning Approach To Low-Complexity Filtering In Biological Sequences

    SciTech Connect

    Barber, Christopher A; Oehmen, Christopher S

    2012-06-09

    Biological sequences contain low-complexity regions (LCRs) which produce superfluous matches in homology searches, and lead to slow execution of database search algorithms such as BLAST. These regions are efficiently identified by low-complexity filtering algorithms such as SDUST and SEG, which are included in the BLAST tool-suite. These algorithms target differing notions of complexity, so an algorithm which combines their sensitivities is pursued. A variety of features are derived from these algorithms, as well as a new filtering algorithm based on Lempel-Ziv complexity. Artificial sequences with known LCRs are used to train and evaluate an SVM classifier, which significantly outperforms the standalone filtering algorithms.
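The Lempel-Ziv complexity mentioned above counts how many new "phrases" are needed to parse a sequence; repetitive, low-complexity stretches need few. The sketch below uses the exhaustive-history parsing of Lempel and Ziv (1976). The paper's exact feature definition, window size, and cutoff are not given in the abstract, so `window=12` and `threshold=4` are hypothetical, and the SVM step is not shown.

```python
from typing import List


def lz76_complexity(s: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) exhaustive-history parsing of s."""
    n = len(s)
    if n == 0:
        return 0
    phrases, start = 1, 1  # the first symbol is always a new phrase
    while start < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text
        # (the copy source may overlap the phrase itself, minus its last symbol).
        while start + length <= n and s[start:start + length] in s[:start + length - 1]:
            length += 1
        phrases += 1
        start += length
    return phrases


def low_complexity_windows(seq: str, window: int = 12, threshold: int = 4) -> List[int]:
    """Start positions of windows whose LZ complexity is at or below a hypothetical cutoff."""
    return [i for i in range(len(seq) - window + 1)
            if lz76_complexity(seq[i:i + window]) <= threshold]


print(lz76_complexity("0001101001000101"))  # 6: parsed as 0|001|10|100|1000|101
print(lz76_complexity("AAAAAAAA"))          # 2
```

A per-window score like this is the sort of feature that could be combined with SEG- or DUST-style scores as classifier input.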

  6. Identifying and Mitigating Bias in Next-Generation Sequencing Methods for Chromatin Biology

    PubMed Central

    Meyer, Clifford A.; Liu, X. Shirley

    2015-01-01

    Next generation sequencing (NGS) technologies have been used in diverse ways to investigate facets of chromatin biology by identifying genomic loci that are bound by transcription factors, occupied by nucleosomes, accessible to nuclease cleavage, or physically interact with remote genomic loci. Reaching sound biological conclusions from such NGS enrichment profiles, however, requires that many potential biases be taken into account. In this Review we discuss common ways in which bias may be introduced into NGS chromatin profiling data, ways in which these biases can be diagnosed, and analytical techniques to mitigate their effect. PMID:25223782

  7. Phylogenetic relationships of Cryptosporidium determined by ribosomal RNA sequence comparison.

    PubMed

    Johnson, A M; Fielke, R; Lumb, R; Baverstock, P R

    1990-04-01

    Reverse transcription of total cellular RNA was used to obtain a partial sequence of the small subunit ribosomal RNA of Cryptosporidium, a protist currently placed in the phylum Apicomplexa. The semi-conserved regions were aligned with homologous sequences in a range of other eukaryotes, and the evolutionary relationships of Cryptosporidium were determined by two different methods of phylogenetic analysis. The prokaryotes Escherichia coli and Halobacterium cuti were included as outgroups. The results do not show an especially close relationship of Cryptosporidium to other members of the phylum Apicomplexa. PMID:2332273
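The abstract does not name its two phylogenetic methods. As a generic illustration of the distance-based family, the sketch below computes the uncorrected p-distance between two aligned sequences and its Jukes-Cantor correction, d = -3/4 ln(1 - 4p/3), which estimates substitutions per site. Skipping gap columns is an assumption, and the sequences are toy data.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gap columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable (ungapped) sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(a: str, b: str) -> float:
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(p_distance("ACGTACGTAC", "ACGTACGTTT"))              # 0.2
print(round(jukes_cantor("ACGTACGTAC", "ACGTACGTTT"), 4))  # 0.2326
```

A matrix of such distances over all taxa, including outgroups, is the input to tree-building methods such as neighbor joining.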

  8. A special-purpose computer for exploring similar biological sequences: Bioler-2 with multi-pipeline and multi-sequence architecture

    NASA Astrophysics Data System (ADS)

    Sugie, Takashige; Ito, Tomoyoshi; Ebisuzaki, Toshikazu

    2004-09-01

    We developed Bioler-2 (BIOLogical sequence explorER), a special-purpose computer for exploring similar biological sequences with the Smith-Waterman method. It can compute a complete similarity score between two biological sequences of fewer than 10,000 characters each. We integrated the system on two Xilinx XC2V6000 FPGA (Field Programmable Gate Array) chips (6M gates), mounted on a 32-bit PCI (Peripheral Component Interconnect) bus card connected to the host computer. Bioler-2 runs 142 times faster than a general-purpose computer with a 3.2 GHz Pentium 4 processor (Hyper-Threading enabled) running Linux kernel 2.4.25, with software compiled by gcc (GNU Compiler Collection) version 3.3.3. Bioler-2 is effective for biological sequence analysis.
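For reference, the Smith-Waterman score that Bioler-2 computes in hardware can be written in software as a dynamic-programming recurrence. This is a minimal sketch with a linear gap penalty. The scoring values (match +2, mismatch -1, gap -1) are hypothetical, since the abstract does not state Bioler-2's scoring scheme.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score with a linear gap penalty (score only, O(len(b)) memory)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr[j] = max(0, diag, up, left)  # the 0 floor makes the alignment local
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTGG", "CCACGTAA"))  # 8, from the shared "ACGT"
```

Each cell depends only on its left, upper, and upper-left neighbours, so all cells on one anti-diagonal are independent. Hardware accelerators commonly exploit that independence by computing many cells at once. The abstract does not describe Bioler-2's internal pipeline, so this is only background.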

  9. Sequence-based genotyping clarifies conflicting historical morphometric and biological data for 5 Eimeria species infecting turkeys.

    PubMed

    El-Sherry, S; Ogedengbe, M E; Hafeez, M A; Sayf-Al-Din, M; Gad, N; Barta, J R

    2015-02-01

    Unlike with Eimeria species infecting chickens, specific identification and nomenclature of Eimeria species infecting turkeys is complicated, and in the absence of molecular data, imprecise. In an attempt to reconcile contradictory data reported on oocyst morphometrics and biological descriptions of various Eimeria species infecting turkey, we established single oocyst derived lines of 5 important Eimeria species infecting turkeys, Eimeria meleagrimitis (USMN08-01 strain), Eimeria adenoeides (Guelph strain), Eimeria gallopavonis (Weybridge strain), Eimeria meleagridis (USAR97-01 strain), and Eimeria dispersa (Briston strain). Short portions (514 bp) of mitochondrial cytochrome c oxidase subunit I gene (mt COI) from each were amplified and sequenced. Comparison of these sequences showed sufficient species-specific sequence variation to recommend these short mt COI sequences as species-specific markers. Uniformity of oocyst features (dimensions and oocyst structure) of each pure line was observed. Additional morphological features of the oocysts of these species are described as useful for the microscopic differentiation of these Eimeria species. Combined molecular and morphometric data on these single species lines compared with the original species descriptions and more recent data have helped to clarify some confusing, and sometimes conflicting, features associated with these Eimeria spp. For example, these new data suggest that the KCH and KR strains of E. adenoeides reported previously represent 2 distinct species, E. adenoeides and E. meleagridis, respectively. Likewise, analysis of the Weybridge strain of E. adenoeides, which has long been used as a reference strain in various studies conducted on the pathogenicity of E. adenoeides, indicates that this coccidium is actually a strain of E. gallopavonis. We highly recommend mt COI sequence-based genotyping be incorporated into all studies using Eimeria spp. of turkeys to confirm species identifications and so

  10. Networking Biology: The Origins of Sequence-Sharing Practices in Genomics.

    PubMed

    Stevens, Hallam

    2015-10-01

    The wide sharing of biological data, especially nucleotide sequences, is now considered to be a key feature of genomics. Historians and sociologists have attempted to account for the rise of this sharing by pointing to precedents in model organism communities and in natural history. This article supplements these approaches by examining the role that electronic networking technologies played in generating the specific forms of sharing that emerged in genomics. The links between early computer users at the Stanford Artificial Intelligence Laboratory in the 1960s, biologists using local computer networks in the 1970s, and GenBank in the 1980s, show how networking technologies carried particular practices of communication, circulation, and data distribution from computing into biology. In particular, networking practices helped to transform sequences themselves into objects that had value as a community resource. PMID:26593711

  11. A comparison of fixed sequence and optional branching autoinstructional methods.

    ERIC Educational Resources Information Center

    Melaragno, Ralph J.; and others

    Hypotheses related to procedures permitting students to branch at their own option were tested. The first hypothesis was that a fixed-sequence program would be less effective than the same items cast as statements in textbook format, through which the student could skip at his own option. The second hypothesis was that performance on a program…

  12. Model annotation for synthetic biology: automating model to nucleotide sequence conversion

    PubMed Central

    Misirli, Goksel; Hallinan, Jennifer S.; Yu, Tommy; Lawson, James R.; Wimalaratne, Sarala M.; Cooling, Michael T.; Wipat, Anil

    2011-01-01

    Motivation: The need for the automated computational design of genetic circuits is becoming increasingly apparent with the advent of ever more complex and ambitious synthetic biology projects. Currently, most circuits are designed through the assembly of models of individual parts such as promoters, ribosome binding sites and coding sequences. These low level models are combined to produce a dynamic model of a larger device that exhibits a desired behaviour. The larger model then acts as a blueprint for physical implementation at the DNA level. However, the conversion of models of complex genetic circuits into DNA sequences is a non-trivial undertaking due to the complexity of mapping the model parts to their physical manifestation. Automating this process is further hampered by the lack of computationally tractable information in most models. Results: We describe a method for automatically generating DNA sequences from dynamic models implemented in CellML and Systems Biology Markup Language (SBML). We also identify the metadata needed to annotate models to facilitate automated conversion, and propose and demonstrate a method for the markup of these models using RDF. Our algorithm has been implemented in a software tool called MoSeC. Availability: The software is available from the authors' web site http://research.ncl.ac.uk/synthetic_biology/downloads.html. Contact: anil.wipat@ncl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21296753

  13. Marine organism cell biology and regulatory sequence discovery in comparative functional genomics.

    PubMed

    Barnes, David W; Mattingly, Carolyn J; Parton, Angela; Dowell, Lori M; Bayne, Christopher J; Forrest, John N

    2004-10-01

    The use of bioinformatics to integrate phenotypic and genomic data from mammalian models is well established as a means of understanding human biology and disease. Beyond direct biomedical applications of these approaches in predicting structure-function relationships between coding sequences and protein activities, comparative studies also promote understanding of molecular evolution and the relationship between genomic sequence and morphological and physiological specialization. Recently recognized is the potential of comparative studies to identify functionally significant regulatory regions and to generate experimentally testable hypotheses that contribute to understanding mechanisms that regulate gene expression, including transcriptional activity, alternative splicing and transcript stability. Functional tests of hypotheses generated by computational approaches require experimentally tractable in vitro systems, including cell cultures. Comparative sequence analysis strategies that use genomic sequences from a variety of evolutionarily diverse organisms are critical for identifying conserved regulatory motifs in the 5'-upstream, 3'-downstream and introns of genes. Genomic sequences and gene orthologues in the first aquatic vertebrate and protovertebrate organisms to be fully sequenced (Fugu rubripes, Ciona intestinalis, Tetraodon nigroviridis, Danio rerio) as well as in the elasmobranchs, spiny dogfish shark (Squalus acanthias) and little skate (Raja erinacea), and marine invertebrate models such as the sea urchin (Strongylocentrotus purpuratus) are valuable in the prediction of putative genomic regulatory regions. Cell cultures have been derived for these and other model species. Data and tools resulting from these kinds of studies will contribute to understanding transcriptional regulation of biomedically important genes and provide new avenues for medical therapeutics and disease prevention. PMID:19003267

  14. Structuring temporal sequences: comparison of models and factors of complexity.

    PubMed

    Essens, P

    1995-05-01

    Two stages for structuring tone sequences have been distinguished by Povel and Essens (1985). In the first, a mental clock segments a sequence into equal time units (clock model); in the second, intervals are specified in terms of subdivisions of these units. The present findings support the clock model in that it predicts human performance better than three other algorithmic models. Two further experiments in which clock and subdivision characteristics were varied did not support the hypothesized effect of the nature of the subdivisions on complexity. A model focusing on the variations in the beat-anchored envelopes of the tone clusters was proposed. Errors in reproduction suggest a dual-code representation comprising temporal and figural characteristics. The temporal part of the representation is based on the clock model but specifies, in addition, the metric of the level below the clock. The beat-tone-cluster envelope concept was proposed to specify the figural part. PMID:7596749

  15. A National Comparison of Biochemistry and Molecular Biology Capstone Experiences

    ERIC Educational Resources Information Center

    Aguanno, Ann; Mertz, Pamela; Martin, Debra; Bell, Ellis

    2015-01-01

    Recognizing the increasingly integrative nature of the molecular life sciences, the "American Society for Biochemistry and Molecular Biology" (ASBMB) recommends that Biochemistry and Molecular Biology (BMB) programs develop curricula based on concepts, content, topics, and expected student outcomes, rather than courses. To that end,…

  16. 3D reconstruction software comparison for short sequences

    NASA Astrophysics Data System (ADS)

    Strupczewski, Adam; Czupryński, Błażej

    2014-11-01

    Large-scale multiview reconstruction has recently become a very popular area of research. Many open-source tools can be downloaded and run on a personal computer. However, there are few, if any, comparisons of all the available software in terms of accuracy on the small datasets that a single user can create. The typical datasets for testing such software are archeological sites or cities, comprising thousands of images. This paper presents a comparison of currently available open-source multiview reconstruction software for small datasets. It also compares the open-source solutions with a simple structure-from-motion pipeline that the authors developed from scratch using the OpenCV and Eigen libraries.

  17. High school biology students' participation in a year-long sequence of analogical activities: The relationship of development of analogical thought to student learning and classroom interactions

    NASA Astrophysics Data System (ADS)

    Hackney, Marcella Wichser

    1999-10-01

    This research explored development of analogical thought through high school biology students' participation in a year-long sequence of analogical activities. Analogizing involves: selecting a familiar analog; mapping similarities and differences between the analog and less familiar target; making inferences from the analogy; evaluating validity of the inferences; and ultimately, understanding the biological target (Holyoak & Thagard, 1995). This investigation considered: student development of independence in learning through analogical thought, student learning of biology, the relationship between development of students' analogical thinking and students' learning of biology, and the quality of student interactions in the classroom. This researcher, as teacher participant, used three approaches for teaching by analogy: traditional didactic, teacher-guided, and analogy-generated-by-the-student (Zeitoun, 1983). Within cooperative groups, students in one honors biology class actively engaged in research-based analogical activities that targeted specific biological topics. Two honors biology classes participated in similar but nonanalogical activities that targeted the same biological topics. This two-class comparison group permitted analytical separation of effects of the analogical emphasis from the effects of biology content and activity-based learning. Data collected included: fieldnotes of researcher observations, student responses to guidesheets, tapes of group interactions, student products, student perceptions survey evaluations, ratings of students' expressed analogical development, pre- and posttest scores on a biology achievement test, essay responses, and selected student interviews. These data formed the basis for researcher qualitative analysis, augmented by quantitative techniques.
Through participation in the sequence of analogical activities, students developed their abilities to engage in the processes of analogical thinking, but attained different

  18. Comparison of inoculation strategies to assess biological interactions between sorghum lines and fungi

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Bioassays were assessed for utility to characterize fungal species associated with sorghum or to screen germplasm for advancement in breeding programs. Isolates of Alternaria alternata, Fusarium e...

  19. Close Sequence Comparisons are Sufficient to Identify Human cis-Regulatory Elements

    SciTech Connect

    Prabhakar, Shyam; Poulin, Francis; Shoukry, Malak; Afzal, Veena; Rubin, Edward M.; Couronne, Olivier; Pennacchio, Len A.

    2005-12-01

    Cross-species DNA sequence comparison is the primary method used to identify functional noncoding elements in human and other large genomes. However, little is known about the relative merits of evolutionarily close and distant sequence comparisons, due to the lack of a universal metric for sequence conservation, and also the paucity of empirically defined benchmark sets of cis-regulatory elements. To address this problem, we developed a general-purpose algorithm (Gumby) that detects slowly-evolving regions in primate, mammalian and more distant comparisons without requiring adjustment of parameters, and ranks conserved elements by P-value using Karlin-Altschul statistics. We benchmarked Gumby predictions against previously identified cis-regulatory elements at diverse genomic loci, and also tested numerous extremely conserved human-rodent sequences for transcriptional enhancer activity using reporter-gene assays in transgenic mice. Human regulatory elements were identified with acceptable sensitivity and specificity by comparison with 1-5 other eutherian mammals or 6 other simian primates. More distant comparisons (marsupial, avian, amphibian and fish) failed to identify many of the empirically defined functional noncoding elements. We derived an intuitive relationship between ancient and recent noncoding sequence conservation from whole genome comparative analysis, which explains some of these findings. Lastly, we determined that, in addition to strength of conservation, genomic location and/or density of surrounding conserved elements must also be considered in selecting candidate enhancers for testing at embryonic time points.
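Karlin-Altschul statistics, used above to rank conserved elements, convert a local alignment score S into an expected chance count, E = K m n e^(-λS), and a p-value, P = 1 - e^(-E). Here m and n are the lengths of the compared sequences, and K and λ depend on the scoring system. The sketch below simply evaluates these formulas. The values λ = ln 2 and K = 0.1 and the toy element scores are hypothetical placeholders, and Gumby's exact choices for m, n and any edge corrections are not stated in the abstract.

```python
import math


def karlin_altschul_evalue(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Expected number of chance local alignments scoring >= score: E = K*m*n*exp(-lambda*S)."""
    return k * m * n * math.exp(-lam * score)


def karlin_altschul_pvalue(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Probability of at least one chance alignment scoring >= score: P = 1 - exp(-E)."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate when E is tiny.
    return -math.expm1(-karlin_altschul_evalue(score, m, n, lam, k))


# Hypothetical conserved elements with alignment scores; rank by P-value (smallest first).
elements = {"elem1": 12.0, "elem2": 30.0, "elem3": 20.0}
ranked = sorted(elements, key=lambda e: karlin_altschul_pvalue(elements[e], 1000, 1000, math.log(2), 0.1))
print(ranked)  # ['elem2', 'elem3', 'elem1']
```

Because P decreases monotonically as S grows for fixed m, n, K and λ, ranking by P-value within one comparison gives the same order as ranking by score. The P-value matters when elements come from comparisons with different lengths or scoring parameters.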

  20. Reconstruction of an ancestral Yersinia pestis genome and comparison with an ancient sequence

    PubMed Central

    2015-01-01

    Background: We propose the computational reconstruction of a whole bacterial ancestral genome at the nucleotide scale, and its validation by a sequence of ancient DNA. This rare possibility is offered by an ancient sequence of the late Middle Ages plague agent, which has been hypothesized to be ancestral to extant Yersinia pestis strains based on the pattern of nucleotide substitutions. However, the dynamics of indels, duplications, insertion sequences and rearrangements have impacted all genomes much more than the substitution process, which makes the ancestral reconstruction task challenging. Results: We use a set of gene families from 13 Yersinia species, construct reconciled phylogenies for all of them, and determine gene orders in ancestral species. Gene trees integrate information from the sequence, the species tree and gene order. We reconstruct ancestral sequences for ancestral genic and intergenic regions, providing a nearly complete genome sequence for the ancestor, containing a chromosome and three plasmids. Conclusion: The comparison of the ancestral and ancient sequences provides a unique opportunity to assess the quality of ancestral genome reconstruction methods. Conversely, this comparison can also call into question the quality of the sequencing and assembly of the ancient sequence. PMID:26450112
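The simplest form of nucleotide-level ancestral reconstruction is site-by-site parsimony on a fixed tree. The sketch below applies Fitch's small-parsimony algorithm to aligned leaf sequences. It is a substitution-only illustration and not the reconciliation-based method described above, which also handles indels, duplications, insertion sequences and rearrangements. The tree shape, leaf names and sequences are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple, Union

# A rooted binary tree: a leaf name, or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[FrozenSet[str], int]:
    """Fitch small parsimony for one site: (candidate states at this node, minimum changes)."""
    if isinstance(tree, str):
        return frozenset(states[tree]), 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Disjoint child sets: take the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1


def ancestral_root_sequence(tree: Tree, seqs: Dict[str, str]) -> Tuple[str, int]:
    """Site-by-site Fitch reconstruction of the root sequence from aligned leaf sequences."""
    length = len(next(iter(seqs.values())))
    root, total = [], 0
    for i in range(length):
        site = {name: s[i] for name, s in seqs.items()}
        candidates, cost = fitch(tree, site)
        root.append(min(candidates))  # ties broken alphabetically (an arbitrary choice)
        total += cost
    return "".join(root), total


tree = (("A", "B"), ("C", "D"))
seqs = {"A": "ACGT", "B": "ACGA", "C": "ACTT", "D": "GCTT"}
print(ancestral_root_sequence(tree, seqs))  # ('ACGT', 3)
```

At the third site both G and T are equally parsimonious at the root. The alphabetical tie-break hides that ambiguity, which is the kind of uncertainty a comparison with an ancient sequence can help expose.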

  1. Correlation between Protein Sequence Similarity and Crystallization Reagents in the Biological Macromolecule Crystallization Database

    PubMed Central

    Lu, Hui-Meng; Yin, Da-Chuan; Liu, Yong-Ming; Guo, Wei-Hong; Zhou, Ren-Bin

    2012-01-01

    The number of protein structure entries has grown far more slowly than the number of sequence entries. This is partly due to the bottleneck in obtaining diffraction-quality protein crystals for structure determination by X-ray crystallography. The first step toward protein crystallization is to find suitable chemical reagents, which is not an easy task: exhaustive trial-and-error tests of numerous combinations of different reagents mixed with the protein solution are usually necessary to screen for suitable crystallization conditions. Therefore, any approach that helps identify suitable reagents for protein crystallization is valuable. In this paper, we present an analysis of the relationship between protein sequence similarity and crystallization reagents, based on information from existing databases. We extracted information on reagents and sequences from the Biological Macromolecule Crystallization Database (BMCD) and the Protein Data Bank (PDB), classified the proteins into clusters according to sequence similarity, and statistically analyzed the relationship between sequence similarity and the crystallization reagents. The results showed a pronounced positive correlation between them. Based on this correlation, it is possible to predict feasible chemical reagents for use in crystallization screens for a specific protein. PMID:22949812

  2. Basal Murphy belt and Chilhowee Group -- Sequence stratigraphic comparison

    SciTech Connect

    Aylor, J.G. Jr. (… Dept. of Geology)

    1994-03-01

    The lower Murphy belt in the central western Blue Ridge is interpreted to be correlative to the Early Cambrian Chilhowee Group of the westernmost Blue Ridge and Appalachian fold and thrust belt. Basal Murphy belt depositional sequence stratigraphy represents a second-order, type-2 transgressive systems tract initiated with deposition of lowstand turbidites of the Dean Formation. These transgressive deposits of the Nantahala and Brasstown Formations are interpreted as middle to outer continental shelf deposits. Cyclic and stacked third-order regressive, coarsening-upward sequences of the Nantahala Formation display an overall increase in feldspar content stratigraphically upsection. These transgressive siliciclastic deposits are interpreted to be conformably overlain by a carbonate highstand systems tract of the Murphy Marble. Palinspastic reconstruction indicates that the Nantahala and Brasstown Formations possibly represent a basinward extension of a siliciclastic wedge up to 3 km thick. The wedge tapers to the southwest along the strike of the Murphy belt at 10° and thins northwestward to 2 km in the Tennessee depocenter, where it is represented by the Chilhowee Group. The Murphy belt basin is believed to represent a transitional rift-to-drift facies deposited on the lower plate of the southern Blue Ridge rift zone.

  3. Molecular evolution of the Escherichia coli chromosome. IV. Sequence comparisons.

    PubMed

    Milkman, R; Bridges, M M

    1993-03-01

    DNA sequences have been compared in a 4,400-bp region for Escherichia coli K12 and 36 ECOR strains. Discontinuities in degree of similarity, previously inferred, are confirmed in detail. Three clonal frames are described on the basis of the present local high-resolution data, as well as previous analyses of restriction fragment length polymorphism (RFLP) and of multilocus enzyme electrophoresis (MLEE) covering small regions more widely dispersed on the chromosome. These three approaches show important consistency. The data illustrate the fact that, in the limited context of intraspecific genomic sequence variation, clonality and homology are synonymous. Two estimable quantitative properties are defined: recency of common ancestry (the reciprocal of the log10 of the number of generations since the most recent common ancestor), and the number of nucleotide pairs over which a given recency of common ancestry applies. In principle, these parameters are measures of the degree and physical extent of homology. The small size of apparent recombinational replacements, together with the observation that they occasionally occur in discontinuous series, raises the question of whether they result from the superimposition of replacements of much larger size (as expected from an elementary interpretation of conjugation and transduction in experimental E. coli systems) or via an alternative mechanism. Length polymorphisms of several sorts are described. PMID:8095913

  4. Comparison of Sequencing (Barcode Region) and Sequence-Tagged-Site PCR for Blastocystis Subtyping

    PubMed Central

    2013-01-01

    Blastocystis is the most common nonfungal microeukaryote of the human intestinal tract and comprises numerous subtypes (STs), nine of which have been found in humans (ST1 to ST9). While efforts continue to explore the relationship between human health status and subtypes, no consensus regarding subtyping methodology exists. It has been speculated that differences detected in subtype distribution in various cohorts may to some extent reflect different approaches. Blastocystis subtypes have been determined primarily in one of two ways: (i) sequencing of small subunit rRNA gene (SSU-rDNA) PCR products and (ii) PCR with subtype-specific sequence-tagged-site (STS) diagnostic primers. Here, STS primers were evaluated against a panel of samples (n = 58) already subtyped by SSU-rDNA sequencing (barcode region), including subtypes for which STS primers are not available, and a small panel of DNAs from four other eukaryotes often present in feces (n = 18). Although the STS primers appeared to be highly specific, their sensitivity was only moderate, and the results indicated that some infections may go undetected when this method is used. False-negative STS results were not linked exclusively to certain subtypes or alleles, and evidence of substantial genetic variation in STS loci was obtained. Since the majority of DNAs included here were extracted from feces, it is possible that STS primers may generally work better with DNAs extracted from Blastocystis cultures. In conclusion, due to its higher applicability and sensitivity, and since sequence information is useful for other forms of research, SSU-rDNA barcoding is recommended as the method of choice for Blastocystis subtyping. PMID:23115257
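
The contrast between "highly specific" and "only moderate" sensitivity can be made concrete by treating the SSU-rDNA barcode result as the reference for a given subtype. The sketch below assumes that framing; the counts are hypothetical and are not the study's results.

```python
from dataclasses import dataclass

@dataclass
class Counts:
    tp: int  # STS positive, barcode reference positive
    fp: int  # STS positive, barcode reference negative
    fn: int  # STS negative, barcode reference positive (missed infection)
    tn: int  # STS negative, barcode reference negative

def sensitivity(c: Counts) -> float:
    """Fraction of reference-positive samples that the STS assay detects."""
    return c.tp / (c.tp + c.fn)

def specificity(c: Counts) -> float:
    """Fraction of reference-negative samples that the STS assay correctly calls negative."""
    return c.tn / (c.tn + c.fp)

# Hypothetical counts for one subtype, for illustration only.
example = Counts(tp=30, fp=1, fn=10, tn=19)
print(f"sensitivity = {sensitivity(example):.2f}")
print(f"specificity = {specificity(example):.2f}")
```

False negatives lower sensitivity without affecting specificity, which is how an assay can be highly specific yet still miss infections.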

  5. Nucleotide sequence of a cloned woodchuck hepatitis virus genome: comparison with the hepatitis B virus sequence.

    PubMed Central

    Galibert, F; Chen, T N; Mandart, E

    1982-01-01

    The complete nucleotide sequence of a woodchuck hepatitis virus genome cloned in Escherichia coli was determined by the method of Maxam and Gilbert. This sequence was found to be 3,308 nucleotides long. Potential ATG initiator triplets and nonsense codons were identified and used to locate regions with a substantial coding capacity. A striking similarity was observed between the organization of human hepatitis B virus and woodchuck hepatitis virus. Nucleotide sequences of these open regions in the woodchuck virus were compared with corresponding regions present in hepatitis B virus. This allowed the location of four viral genes on the L strand and indicated the absence of protein coded by the S strand. Evolution rates of the various parts of the genome as well as of the four different proteins coded by hepatitis B virus and woodchuck hepatitis virus were compared. These results indicated that: (i) the core protein has evolved slightly less rapidly than the other proteins; and (ii) when a region of DNA codes for two different proteins, there is less freedom for the DNA to evolve and, moreover, one of the proteins can evolve more rapidly than the other. A hairpin structure, very well conserved in the two genomes, was located in the only region devoid of coding function, suggesting the location of the origin of replication of the viral DNA. PMID:7086958
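
Locating coding regions from ATG initiator triplets and nonsense (stop) codons amounts to scanning each reading frame for open reading frames. The sketch below is a generic illustration, not the authors' procedure; it assumes the standard stop codons, a hypothetical minimum ORF length, and a single strand of a linear sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ATG-initiated ORFs on the given strand.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    The sequence is treated as linear, so ORFs spanning the origin of a
    circular genome are not reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons only (i + 3 <= len(seq)).
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Codons from ATG up to, but not including, the stop codon.
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs

print(find_orfs("CCATGAAATTTTAAGG", min_codons=2))
```

Running the same scan on the reverse complement would cover the opposite strand, which matters for a genome with genes annotated on both strands.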

  6. Secure distributed genome analysis for GWAS and sequence comparison computation

    PubMed Central

    2015-01-01

    Background The rapid increase in the availability and volume of genomic data makes significant advances in biomedical research possible, but sharing of genomic data poses challenges due to the highly sensitive nature of such data. To address the challenges, a competition for secure distributed processing of genomic data was organized by the iDASH research center. Methods In this work we propose techniques for securing computation with real-life genomic data for minor allele frequency and chi-squared statistics computation, as well as distance computation between two genomic sequences, as specified by the iDASH competition tasks. We put forward novel optimizations, including a generalization of a version of mergesort, which might be of independent interest. Results We provide implementation results of our techniques based on secret sharing that demonstrate practicality of the suggested protocols and also report on performance improvements due to our optimization techniques. Conclusions This work describes our techniques, findings, and experimental results developed and obtained as part of iDASH 2015 research competition to secure real-life genomic computations and shows feasibility of securely computing with genomic data in practice. PMID:26733307
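
The abstract states only that the techniques are based on secret sharing. As a generic illustration (not the authors' protocol), additive secret sharing lets several parties compute a total, such as a minor-allele count across sites, without any party seeing another's private value. The ring size and the site counts below are hypothetical.

```python
import secrets

MODULUS = 2**32  # hypothetical ring size; shares are uniform in [0, MODULUS)

def share(value: int, n_parties: int) -> list[int]:
    """Split value into n additive shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares: list[int]) -> int:
    """Recombine shares; any proper subset reveals nothing about the value."""
    return sum(shares) % MODULUS

# Each hypothetical site holds a private minor-allele count at one locus.
site_counts = [12, 7, 30]
n = 3
all_shares = [share(c, n) for c in site_counts]

# Party j receives the j-th share from every site and adds them locally.
party_sums = [sum(s[j] for s in all_shares) % MODULUS for j in range(n)]

total = reconstruct(party_sums)
print(total)
```

Addition is free in this scheme because it is done locally on shares; statistics such as chi-squared or sequence distances also need multiplications and comparisons, which is where specialized protocols and optimizations come in.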

  7. Biologically inspired multilevel approach for multiple moving targets detection from airborne forward-looking infrared sequences.

    PubMed

    Li, Yansheng; Tan, Yihua; Li, Hang; Li, Tao; Tian, Jinwen

    2014-04-01

    In this paper, a biologically inspired multilevel approach for simultaneously detecting multiple independently moving targets from airborne forward-looking infrared (FLIR) sequences is proposed. Due to the moving platform, low-contrast infrared images, and nonrepeatability of the target signature, moving-target detection from FLIR sequences is still an open problem. Existing moving-target detection approaches cope with the moving infrared camera by estimating a six-parameter affine or eight-parameter planar projective transformation matrix between two adjacent frames, and this estimation has become the bottleneck to further improving detection performance. The proposed approach avoids such estimation and comprises three sequential modules: motion perception for efficiently extracting motion cues, attended motion views extraction for coarsely localizing moving targets, and appearance perception in the local attended motion views for accurately detecting moving targets. Experimental results demonstrate that the proposed approach is efficient and outperforms the compared state-of-the-art approaches. PMID:24695135

  8. The Genome Sequence of Taurine Cattle: A window to ruminant biology and evolution

    PubMed Central

    Elsik, Christine G.; Tellam, Ross L.; Worley, Kim C.

    2010-01-01

    To understand the biology and evolution of ruminants, the cattle genome was sequenced to ∼7× coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1,217 are absent or undetected in non-eutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides an enabling resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production. PMID:19390049

  9. The genome sequence of taurine cattle: a window to ruminant biology and evolution.

    PubMed

    Elsik, Christine G; Tellam, Ross L; Worley, Kim C; Gibbs, Richard A; Muzny, Donna M; Weinstock, George M; Adelson, David L; Eichler, Evan E; Elnitski, Laura; Guigó, Roderic; Hamernik, Debora L; Kappes, Steve M; Lewin, Harris A; Lynn, David J; Nicholas, Frank W; Reymond, Alexandre; Rijnkels, Monique; Skow, Loren C; Zdobnov, Evgeny M; Schook, Lawrence; Womack, James; Alioto, Tyler; Antonarakis, Stylianos E; Astashyn, Alex; Chapple, Charles E; Chen, Hsiu-Chuan; Chrast, Jacqueline; Câmara, Francisco; Ermolaeva, Olga; Henrichsen, Charlotte N; Hlavina, Wratko; Kapustin, Yuri; Kiryutin, Boris; Kitts, Paul; Kokocinski, Felix; Landrum, Melissa; Maglott, Donna; Pruitt, Kim; Sapojnikov, Victor; Searle, Stephen M; Solovyev, Victor; Souvorov, Alexandre; Ucla, Catherine; Wyss, Carine; Anzola, Juan M; Gerlach, Daniel; Elhaik, Eran; Graur, Dan; Reese, Justin T; Edgar, Robert C; McEwan, John C; Payne, Gemma M; Raison, Joy M; Junier, Thomas; Kriventseva, Evgenia V; Eyras, Eduardo; Plass, Mireya; Donthu, Ravikiran; Larkin, Denis M; Reecy, James; Yang, Mary Q; Chen, Lin; Cheng, Ze; Chitko-McKown, Carol G; Liu, George E; Matukumalli, Lakshmi K; Song, Jiuzhou; Zhu, Bin; Bradley, Daniel G; Brinkman, Fiona S L; Lau, Lilian P L; Whiteside, Matthew D; Walker, Angela; Wheeler, Thomas T; Casey, Theresa; German, J Bruce; Lemay, Danielle G; Maqbool, Nauman J; Molenaar, Adrian J; Seo, Seongwon; Stothard, Paul; Baldwin, Cynthia L; Baxter, Rebecca; Brinkmeyer-Langford, Candice L; Brown, Wendy C; Childers, Christopher P; Connelley, Timothy; Ellis, Shirley A; Fritz, Krista; Glass, Elizabeth J; Herzig, Carolyn T A; Iivanainen, Antti; Lahmers, Kevin K; Bennett, Anna K; Dickens, C Michael; Gilbert, James G R; Hagen, Darren E; Salih, Hanni; Aerts, Jan; Caetano, Alexandre R; Dalrymple, Brian; Garcia, Jose Fernando; Gill, Clare A; Hiendleder, Stefan G; Memili, Erdogan; Spurlock, Diane; Williams, John L; Alexander, Lee; Brownstein, Michael J; Guan, Leluo; Holt, Robert A; Jones, Steven J M; Marra, Marco A; Moore, Richard; Moore, Stephen S; Roberts, Andy; Taniguchi, Masaaki; Waterman, Richard C; Chacko, Joseph; Chandrabose, Mimi M; Cree, Andy; Dao, Marvin Diep; Dinh, Huyen H; Gabisi, Ramatu Ayiesha; Hines, Sandra; Hume, Jennifer; Jhangiani, Shalini N; Joshi, Vandita; Kovar, Christie L; Lewis, Lora R; Liu, Yih-Shin; Lopez, John; Morgan, Margaret B; Nguyen, Ngoc Bich; Okwuonu, Geoffrey O; Ruiz, San Juana; Santibanez, Jireh; Wright, Rita A; Buhay, Christian; Ding, Yan; Dugan-Rocha, Shannon; Herdandez, Judith; Holder, Michael; Sabo, Aniko; Egan, Amy; Goodell, Jason; Wilczek-Boney, Katarzyna; Fowler, Gerald R; Hitchens, Matthew Edward; Lozado, Ryan J; Moen, Charles; Steffen, David; Warren, James T; Zhang, Jingkun; Chiu, Readman; Schein, Jacqueline E; Durbin, K James; Havlak, Paul; Jiang, Huaiyang; Liu, Yue; Qin, Xiang; Ren, Yanru; Shen, Yufeng; Song, Henry; Bell, Stephanie Nicole; Davis, Clay; Johnson, Angela Jolivet; Lee, Sandra; Nazareth, Lynne V; Patel, Bella Mayurkumar; Pu, Ling-Ling; Vattathil, Selina; Williams, Rex Lee; Curry, Stacey; Hamilton, Cerissa; Sodergren, Erica; Wheeler, David A; Barris, Wes; Bennett, Gary L; Eggen, André; Green, Ronnie D; Harhay, Gregory P; Hobbs, Matthew; Jann, Oliver; Keele, John W; Kent, Matthew P; Lien, Sigbjørn; McKay, Stephanie D; McWilliam, Sean; Ratnakumar, Abhirami; Schnabel, Robert D; Smith, Timothy; Snelling, Warren M; Sonstegard, Tad S; Stone, Roger T; Sugimoto, Yoshikazu; Takasuga, Akiko; Taylor, Jeremy F; Van Tassell, Curtis P; Macneil, Michael D; Abatepaulo, Antonio R R; Abbey, Colette A; Ahola, Virpi; Almeida, Iassudara G; Amadio, Ariel F; Anatriello, Elen; Bahadue, Suria M; Biase, Fernando H; Boldt, Clayton R; Carroll, Jeffery A; Carvalho, Wanessa A; Cervelatti, Eliane P; Chacko, Elsa; Chapin, Jennifer E; Cheng, Ye; Choi, Jungwoo; Colley, Adam J; de Campos, Tatiana A; De Donato, Marcos; Santos, Isabel K F de Miranda; de Oliveira, Carlo J F; Deobald, Heather; Devinoy, Eve; Donohue, Kaitlin E; Dovc, Peter; Eberlein, Annett; Fitzsimmons, Carolyn J; Franzin, Alessandra M; Garcia, Gustavo R; Genini, Sem; Gladney, Cody J; Grant, Jason R; Greaser, Marion L; Green, Jonathan A; Hadsell, Darryl L; Hakimov, Hatam A; Halgren, Rob; Harrow, Jennifer L; Hart, Elizabeth A; Hastings, Nicola; Hernandez, Marta; Hu, Zhi-Liang; Ingham, Aaron; Iso-Touru, Terhi; Jamis, Catherine; Jensen, Kirsty; Kapetis, Dimos; Kerr, Tovah; Khalil, Sari S; Khatib, Hasan; Kolbehdari, Davood; Kumar, Charu G; Kumar, Dinesh; Leach, Richard; Lee, Justin C-M; Li, Changxi; Logan, Krystin M; Malinverni, Roberto; Marques, Elisa; Martin, William F; Martins, Natalia F; Maruyama, Sandra R; Mazza, Raffaele; McLean, Kim L; Medrano, Juan F; Moreno, Barbara T; Moré, Daniela D; Muntean, Carl T; Nandakumar, Hari P; Nogueira, Marcelo F G; Olsaker, Ingrid; Pant, Sameer D; Panzitta, Francesca; Pastor, Rosemeire C P; Poli, Mario A; Poslusny, Nathan; Rachagani, Satyanarayana; Ranganathan, Shoba; Razpet, Andrej; Riggs, Penny K; Rincon, Gonzalo; Rodriguez-Osorio, Nelida; Rodriguez-Zas, Sandra L; Romero, Natasha E; Rosenwald, Anne; Sando, Lillian; Schmutz, Sheila M; Shen, Libing; Sherman, Laura; Southey, Bruce R; Lutzow, Ylva Strandberg; Sweedler, Jonathan V; Tammen, Imke; Telugu, Bhanu Prakash V L; Urbanski, Jennifer M; Utsunomiya, Yuri T; Verschoor, Chris P; Waardenberg, Ashley J; Wang, Zhiquan; Ward, Robert; Weikard, Rosemarie; Welsh, Thomas H; White, Stephen N; Wilming, Laurens G; Wunderlich, Kris R; Yang, Jianqi; Zhao, Feng-Qi

    2009-04-24

    To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production. PMID:19390049

  10. Biology of Treponema pallidum: correlation of functional activities with genome sequence data.

    PubMed

    Norris, S J; Cox, D L; Weinstock, G M

    2001-01-01

    Aspects of the biology of T. pallidum subsp. pallidum, the agent of syphilis, are examined in the context of a century of experimental studies and the recently determined genome sequence. T. pallidum and a group of closely related pathogenic spirochetes have evolved to become highly invasive, persistent pathogens with little toxigenic activity and an inability to survive outside the mammalian host. Analysis of the genome sequence confirms morphologic studies indicating the lack of lipopolysaccharide and lipid biosynthesis mechanisms, as well as a paucity of outer membrane protein candidates. The metabolic capabilities and adaptability of T. pallidum are minimal, and this relative deficiency is reflected by the absence of many pathways, including the tricarboxylic acid cycle, components of oxidative phosphorylation, and most biosynthetic pathways. Although multiplication of T. pallidum has been obtained in a tissue culture system, continuous in vitro culture has not been achieved. The balance of oxygen utilization and toxicity is key to the survival and growth of T. pallidum, and the genome sequence reveals a similarity to lactic acid bacteria that may be useful in understanding this relationship. The identification of relatively few genes potentially involved in pathogenesis reflects our lack of understanding of invasive pathogens relative to toxigenic organisms. The genome sequence will provide useful raw data for additional functional studies on the structure, metabolism, and pathogenesis of this enigmatic organism. PMID:11200228

  11. Uniform Accuracy of the Maximum Likelihood Estimates for Probabilistic Models of Biological Sequences

    PubMed Central

    Ekisheva, Svetlana

    2010-01-01

    Probabilistic models for biological sequences (DNA and proteins) have many useful applications in bioinformatics. Normally, the values of parameters of these models have to be estimated from empirical data. However, the properties of even the most common estimates, the maximum likelihood (ML) estimates, have not been completely explored. Here we assess the uniform accuracy of the ML estimates for models of several types: the independence model, the Markov chain and the hidden Markov model (HMM). In particular, we derive rates of decay of the maximum estimation error by employing measure concentration as well as the Gaussian approximation, and compare these rates. PMID:21318122
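
For the two simplest model types named here, the ML estimates are normalized counts: base frequencies for the independence model and row-normalized dinucleotide counts for a first-order Markov chain. The sketch below computes only these estimates, assuming a DNA alphabet; the paper's subject, how fast the maximum error of such estimates decays, is not reproduced.

```python
from collections import Counter

ALPHABET = "ACGT"

def ml_base_frequencies(seq: str) -> dict[str, float]:
    """Independence model: the ML estimate of each base probability is count / length."""
    counts = Counter(seq)
    return {b: counts[b] / len(seq) for b in ALPHABET}

def ml_transition_matrix(seq: str) -> dict[str, dict[str, float]]:
    """First-order Markov chain: the ML estimate of P(b | a) is count(ab) / count(a followed by any base)."""
    pair_counts = Counter(zip(seq, seq[1:]))
    matrix = {}
    for a in ALPHABET:
        row_total = sum(pair_counts[(a, b)] for b in ALPHABET)
        # A base that is never followed by another base provides no data;
        # its row is left at 0.0 here, although the ML estimate is undefined.
        matrix[a] = {b: (pair_counts[(a, b)] / row_total if row_total else 0.0) for b in ALPHABET}
    return matrix

seq = "AACGTA"
print(ml_base_frequencies(seq))
print(ml_transition_matrix(seq)["A"])
```

With short sequences like this toy example the estimates are very noisy, which is why bounds on the estimation error as a function of sequence length are of interest.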

  12. Customized care 2020: how medical sequencing and network biology will enable personalized medicine

    PubMed Central

    Arnaout, Ramy; Hill, Colin

    2009-01-01

    Applications of next-generation nucleic acid sequencing technologies will lead to the development of precision diagnostics that will, in turn, be a major technology enabler of precision medicine. Terabyte-scale, multidimensional data sets derived using these technologies will be used to reverse engineer the specific disease networks that underlie individual patients’ conditions. Modeling and simulation of these networks in the presence of virtual drugs, and combinations of drugs, will identify the most efficacious therapy for precision medicine and customized care. In coming years the practice of medicine will routinely employ network biology analytics supported by high-performance supercomputing. PMID:20948615

  13. Comparison of Biology Student Performance in Quarter and Semester Systems

    ERIC Educational Resources Information Center

    Gibbens, Brian; Williams, Mary A.; Strain, Anna K.; Hoff, Courtney D. M.

    2015-01-01

    Curricula at most colleges and universities in the United States are scheduled according to quarters or semesters. While each schedule has several potential advantages over the other, it is unclear what effect each has on student performance. This study compares biology student performance during the two and a half years before and after the 1999…

  14. Genome-Wide SNP Calling from Genotyping by Sequencing (GBS) Data: A Comparison of Seven Pipelines and Two Sequencing Technologies

    PubMed Central

    Torkamaneh, Davoud; Laroche, Jérôme; Belzile, François

    2016-01-01

    Next-generation sequencing (NGS) has revolutionized plant and animal research in many ways, including new methods of high-throughput genotyping. Genotyping-by-sequencing (GBS) has been demonstrated to be a robust and cost-effective genotyping method capable of producing thousands to millions of SNPs across a wide range of species. Undoubtedly, the greatest barrier to its broader use is the challenge of data analysis. Herein we describe a comprehensive comparison of seven GBS bioinformatics pipelines developed to process raw GBS sequence data into SNP genotypes. We compared five pipelines requiring a reference genome (TASSEL-GBS v1 & v2, Stacks, IGST, and Fast-GBS) and two de novo pipelines that do not require a reference genome (UNEAK and Stacks). Using Illumina sequence data from a set of 24 re-sequenced soybean lines, we performed SNP calling with these pipelines and compared the GBS SNP calls with the re-sequencing data to assess their accuracy. The number of SNPs called without a reference genome was lower (13k to 24k) than with a reference genome (25k to 54k SNPs) while accuracy was high (92.3 to 98.7%) for all but one pipeline (TASSEL-GBSv1, 76.1%). Among pipelines offering a high accuracy (>95%), Fast-GBS called the greatest number of polymorphisms (close to 35,000 SNPs + Indels) and yielded the highest accuracy (98.7%). Using Ion Torrent sequence data for the same 24 lines, we compared the performance of Fast-GBS with that of TASSEL-GBSv2. Fast-GBS again called more polymorphisms (25.8K vs 22.9K) and these proved more accurate (95.2 vs 91.1%). Typically, SNP catalogues called from the same sequencing data using different pipelines overlapped extensively (79–92% overlap). In contrast, overlap between SNP catalogues obtained using the same pipeline but different sequencing technologies was less extensive (~50–70%). PMID:27547936
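
The two quantities reported here, accuracy against re-sequencing genotypes and overlap between SNP catalogues, can be sketched as simple set and dictionary comparisons. The abstract does not define "overlap"; the Jaccard index used below is an assumption, as are the record keys and the toy calls (hypothetical line names and soybean-style chromosome labels).

```python
def call_accuracy(gbs_calls: dict, truth: dict) -> float:
    """Fraction of GBS genotype calls that agree with the re-sequencing genotype,
    evaluated only at (line, chromosome, position) keys present in both call sets."""
    shared = gbs_calls.keys() & truth.keys()
    if not shared:
        return float("nan")
    agree = sum(gbs_calls[k] == truth[k] for k in shared)
    return agree / len(shared)

def catalogue_overlap(a: set, b: set) -> float:
    """Jaccard overlap |A & B| / |A | B| between two SNP catalogues (assumed metric)."""
    union = a | b
    return len(a & b) / len(union) if union else float("nan")

# Hypothetical genotype calls keyed by (line, chromosome, position).
gbs = {("line1", "Gm01", 1050): "A", ("line1", "Gm01", 2300): "G", ("line2", "Gm01", 1050): "T"}
truth = {("line1", "Gm01", 1050): "A", ("line1", "Gm01", 2300): "C", ("line2", "Gm01", 1050): "T"}

# Hypothetical SNP catalogues from two pipelines, keyed by (chromosome, position).
pipeline_x = {("Gm01", 1050), ("Gm01", 2300), ("Gm02", 500)}
pipeline_y = {("Gm01", 1050), ("Gm02", 500), ("Gm03", 77)}

print(f"accuracy = {call_accuracy(gbs, truth):.3f}")
print(f"overlap = {catalogue_overlap(pipeline_x, pipeline_y):.2f}")
```

Other overlap definitions (for example, the fraction of the smaller catalogue shared with the larger) give different numbers, so comparisons should use the same definition throughout.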

  15. Genome-Wide SNP Calling from Genotyping by Sequencing (GBS) Data: A Comparison of Seven Pipelines and Two Sequencing Technologies.

    PubMed

    Torkamaneh, Davoud; Laroche, Jérôme; Belzile, François

    2016-01-01

    Next-generation sequencing (NGS) has revolutionized plant and animal research in many ways, including new methods of high-throughput genotyping. Genotyping-by-sequencing (GBS) has been demonstrated to be a robust and cost-effective genotyping method capable of producing thousands to millions of SNPs across a wide range of species. Undoubtedly, the greatest barrier to its broader use is the challenge of data analysis. Herein we describe a comprehensive comparison of seven GBS bioinformatics pipelines developed to process raw GBS sequence data into SNP genotypes. We compared five pipelines requiring a reference genome (TASSEL-GBS v1 & v2, Stacks, IGST, and Fast-GBS) and two de novo pipelines that do not require a reference genome (UNEAK and Stacks). Using Illumina sequence data from a set of 24 re-sequenced soybean lines, we performed SNP calling with these pipelines and compared the GBS SNP calls with the re-sequencing data to assess their accuracy. The number of SNPs called without a reference genome was lower (13k to 24k) than with a reference genome (25k to 54k SNPs) while accuracy was high (92.3 to 98.7%) for all but one pipeline (TASSEL-GBSv1, 76.1%). Among pipelines offering a high accuracy (>95%), Fast-GBS called the greatest number of polymorphisms (close to 35,000 SNPs + Indels) and yielded the highest accuracy (98.7%). Using Ion Torrent sequence data for the same 24 lines, we compared the performance of Fast-GBS with that of TASSEL-GBSv2. Fast-GBS again called more polymorphisms (25.8K vs 22.9K) and these proved more accurate (95.2 vs 91.1%). Typically, SNP catalogues called from the same sequencing data using different pipelines overlapped extensively (79-92% overlap). In contrast, overlap between SNP catalogues obtained using the same pipeline but different sequencing technologies was less extensive (~50-70%). PMID:27547936

  16. Identification of the bacteriophage T5 dUTPase by protein sequence comparisons.

    PubMed

    Kaliman, A V

    1996-01-01

    It is shown by protein sequence comparisons that a 148 amino acid open reading frame (ORF 148) located at 67% of the bacteriophage T5 genome encodes a protein with strong similarity to known dUTPases. This protein contains five characteristic amino acid sequence motifs that are common to the dUTPase gene family. A similarity in size and high degree of sequence identity strongly suggest that the protein encoded by the ORF 148 of bacteriophage T5 is dUTPase. PMID:8988373
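
"Degree of sequence identity" is usually reported as percent identity over an alignment. The sketch below assumes the two sequences are already aligned (gaps written as "-") and counts identity over gap-free columns only; tools differ on the denominator (alignment length, shorter sequence, or gap-free columns), so this choice is an assumption. The sequences are hypothetical, not the T5 or dUTPase sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)

# Hypothetical pre-aligned protein fragments.
print(f"{percent_identity('MKVL-AG', 'MRVLSAG'):.1f}%")
```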

  17. Synthetic biology for the directed evolution of protein biocatalysts: navigating sequence space intelligently

    PubMed Central

    Currin, Andrew; Swainston, Neil; Day, Philip J.

    2015-01-01

    The amino acid sequence of a protein affects both its structure and its function. Thus, the ability to modify the sequence, and hence the structure and activity, of individual proteins in a systematic way, opens up many opportunities, both scientifically and (as we focus on here) for exploitation in biocatalysis. Modern methods of synthetic biology, whereby increasingly large sequences of DNA can be synthesised de novo, allow an unprecedented ability to engineer proteins with novel functions. However, the number of possible proteins is far too large to test individually, so we need means for navigating the ‘search space’ of possible protein sequences efficiently and reliably in order to find desirable activities and other properties. Enzymologists distinguish binding (Kd) and catalytic (kcat) steps. In a similar way, judicious strategies have blended design (for binding, specificity and active site modelling) with the more empirical methods of classical directed evolution (DE) for improving kcat (where natural evolution rarely seeks the highest values), especially with regard to residues distant from the active site and where the functional linkages underpinning enzyme dynamics are both unknown and hard to predict. Epistasis (where the ‘best’ amino acid at one site depends on that or those at others) is a notable feature of directed evolution. The aim of this review is to highlight some of the approaches that are being developed to allow us to use directed evolution to improve enzyme properties, often dramatically. We note that directed evolution differs in a number of ways from natural evolution, including in particular the available mechanisms and the likely selection pressures. Thus, we stress the opportunities afforded by techniques that enable one to map sequence to (structure and) activity in silico, as an effective means of modelling and exploring protein landscapes. Because known landscapes may be assessed and reasoned about as a whole

  18. Ribosomal DNA ITS-1 and ITS-2 sequence comparisons as a tool for predicting genetic relatedness.

    PubMed

    Coleman, A W; Mai, J C

    1997-08-01

    The determination of the secondary structure of the internal transcribed spacer (ITS) regions separating nuclear ribosomal RNA genes of Chlorophytes has improved the fidelity of alignment of nuclear ribosomal ITS sequences from related organisms. Application of this information to sequences from green algae and plants suggested that a subset of the ITS-2 positions is relatively conserved. Organisms that can mate are identical at all of these 116 positions, or differ by at most one nucleotide. Here we sequenced and compared the ITS-1 and ITS-2 of 40 green flagellates in search of the nearest relative to Chlamydomonas reinhardtii. The analysis clearly revealed one unique candidate, C. incerta. Several ancillary benefits of the analysis included the identification of mislabelled cultures, the resolution of confusion concerning C. smithii, the discovery of misidentified sequences in GenBank derived from a green algal contaminant, and an overview of evolutionary relationships among the Volvocales, which is congruent with that derived from rDNA gene sequence comparisons but improves upon its resolution. The study further delineates the taxonomic level at which ITS sequences, in comparison to ribosomal gene sequences, are most useful in systematic and other studies. PMID:9236277
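
The observation about the conserved ITS-2 subset reduces to counting mismatches at a fixed set of alignment columns. In the sketch below, the column list is a hypothetical placeholder for the 116 positions (which the study identifies from secondary-structure-guided alignments), and the sequences are toy examples. Note the direction of the stated observation: organisms that can mate differ at no more than one of these positions, so passing the check is consistent with, but does not demonstrate, interfertility.

```python
def differences_at_positions(seq_a: str, seq_b: str, positions: list[int]) -> int:
    """Count mismatches between two aligned sequences at the given 0-based columns."""
    return sum(seq_a[p] != seq_b[p] for p in positions)

def within_mating_threshold(seq_a: str, seq_b: str, positions: list[int], max_diffs: int = 1) -> bool:
    """True if the pair differs at no more than max_diffs of the conserved columns."""
    return differences_at_positions(seq_a, seq_b, positions) <= max_diffs

# Hypothetical aligned ITS-2 fragments and placeholder conserved columns.
conserved = [0, 2, 4, 6]
a = "ACGTACGT"
b = "ACCTACGA"  # differs at column 2 (conserved) and column 7 (not conserved)
print(differences_at_positions(a, b, conserved), within_mating_threshold(a, b, conserved))
```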

  19. Comparison of alignment software for genome-wide bisulphite sequence data

    PubMed Central

    Chatterjee, Aniruddha; Stockwell, Peter A.; Rodger, Euan J.; Morison, Ian M.

    2012-01-01

    Recent advances in next generation sequencing (NGS) technology now provide the opportunity to rapidly interrogate the methylation status of the genome. However, there are challenges in handling and interpretation of the methylation sequence data because of its large volume and the consequences of bisulphite modification. We sequenced reduced representation human genomes on the Illumina platform and efficiently mapped and visualized the data with different pipelines and software packages. We examined three pipelines for aligning bisulphite converted sequencing reads and compared their performance. We also comment on pre-processing and quality control of Illumina data. This comparison highlights differences in methods for NGS data processing and provides guidance to advance sequence-based methylation data analysis for molecular biologists. PMID:22344695
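
One consequence of bisulphite modification for alignment is that unmethylated cytosines are read as thymines, so reads no longer match the reference exactly. A common strategy, used by aligners such as Bismark (which also handles G-to-A conversions for the complementary strand), is to convert C to T in silico in both read and reference before mapping, then compare the original bases to call methylation. The sketch below shows only the top-strand idea, with naive exact string matching in place of a real aligner; the reference and read are hypothetical.

```python
def c_to_t(seq: str) -> str:
    """In silico bisulphite conversion of one strand: every C becomes T."""
    return seq.upper().replace("C", "T")

def naive_map(read: str, reference: str) -> list[int]:
    """0-based positions where the converted read exactly matches the converted reference."""
    conv_read, conv_ref = c_to_t(read), c_to_t(reference)
    hits = []
    start = conv_ref.find(conv_read)
    while start != -1:
        hits.append(start)
        start = conv_ref.find(conv_read, start + 1)
    return hits

def methylation_calls(read: str, reference: str, offset: int) -> dict[int, bool]:
    """For each reference C covered by the read: True if the read kept C (methylated),
    False if it shows T (unmethylated, converted)."""
    calls = {}
    for i, base in enumerate(read.upper()):
        if reference[offset + i].upper() == "C" and base in "CT":
            calls[offset + i] = base == "C"
    return calls

reference = "ATCGGACGTTCA"
read = "TCGGATGT"  # hypothetical read: C at reference index 2 protected, C at index 6 converted
for hit in naive_map(read, reference):
    print(hit, methylation_calls(read, reference, hit))
```

Converting to a three-letter alphabet reduces sequence complexity, so real pipelines must also handle the increased rate of ambiguous (multi-mapping) reads.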

  20. The landscape of fusion transcripts in spitzoid melanoma and biologically indeterminate spitzoid tumors by RNA sequencing

    PubMed Central

    Wu, Gang; Barnhill, Raymond L.; Lee, Seungjae; Li, Yongjin; Shao, Ying; Easton, John; Dalton, James; Zhang, Jinghui; Pappo, Alberto; Bahrami, Armita

    2016-01-01

    Kinase activation by chromosomal translocations is a common mechanism that drives tumorigenesis in spitzoid neoplasms. To explore the landscape of fusion transcripts in these tumors, we performed whole-transcriptome sequencing using formalin-fixed paraffin-embedded tissues in malignant or biologically indeterminate spitzoid tumors from 7 patients (age 2–14 years). RNA sequence libraries enriched for coding regions were prepared and the sequencing data were analyzed by a novel assembly-based algorithm designed for detecting complex fusions. In addition, tumor samples were screened for hotspot TERT promoter mutations, and telomerase expression was assessed by TERT mRNA in situ hybridization (ISH). Two patients had widespread metastasis and subsequently died of disease, and 5 patients had a benign clinical course on limited follow-up (mean: 30 months). RNA sequencing and TERT mRNA ISH were successful in 6 tumors and unsuccessful in 1 disseminating tumor due to low RNA quality. RNA sequencing identified a kinase fusion in 5 of the 6 sequenced tumors: TPM3–NTRK1 (2 tumors), complex rearrangements involving TPM3, ALK, and IL6R (1 tumor), BAIAP2L1–BRAF (1 tumor), and EML4–BRAF (1 disseminating tumor). All predicted chimeric transcripts were expressed at high levels and contained the intact kinase domain. In addition, 2 tumors each contained a second fusion gene, ARID1B-SNX9 or PTPRZ1-NFAM1. The detected chimeric genes were validated by home-brew break-apart or fusion fluorescence in situ hybridization. The 2 disseminating tumors each harbored the TERT promoter −124C>T (Chr 5:1,295,228 hg19 coordinate) mutation whereas the remaining 5 tumors retained the wild-type gene. The presence of the −124C>T mutation correlated with telomerase expression by TERT mRNA ISH. In summary, we demonstrated complex fusion transcripts and novel partner genes for BRAF by RNA sequencing of FFPE samples. The diversity of gene fusions demonstrated by RNA sequencing defines the molecular

  1. The Effects of Meiosis/Genetics Integration and Instructional Sequence on College Biology Student Achievement in Genetics.

    ERIC Educational Resources Information Center

    Browning, Mark

    The purpose of the research was to manipulate two aspects of genetics instruction in order to measure their effects on college, introductory biology students' achievement in genetics. One instructional sequence that was used dealt first with monohybrid autosomal inheritance patterns, then sex-linkage. The alternate sequence was the reverse.…

  2. Comparison of solution-based exome capture methods for next generation sequencing

    PubMed Central

    2011-01-01

    Background Techniques enabling targeted re-sequencing of the protein coding sequences of the human genome on next generation sequencing instruments are of great interest. We conducted a systematic comparison of the solution-based exome capture kits provided by Agilent and Roche NimbleGen. A control DNA sample was captured with all four capture methods and prepared for Illumina GAII sequencing. Sequence data from additional samples prepared with the same protocols were also used in the comparison. Results We developed a bioinformatics pipeline for quality control, short read alignment, variant identification and annotation of the sequence data. In our analysis, a larger percentage of the high quality reads from the NimbleGen captures than from the Agilent captures aligned to the capture target regions. High GC content of the target sequence was associated with poor capture success in all exome enrichment methods. Comparison of mean allele balances for heterozygous variants indicated a tendency to have more reference bases than variant bases in the heterozygous variant positions within the target regions in all methods. There was virtually no difference in the genotype concordance compared to genotypes derived from SNP arrays. A minimum of 11× coverage was required to make a heterozygote genotype call with 99% accuracy when compared to common SNPs on genome-wide association arrays. Conclusions Libraries captured with NimbleGen kits aligned more accurately to the target regions. The updated NimbleGen kit most efficiently covered the exome with a minimum coverage of 20×, yet none of the kits captured all the Consensus Coding Sequence annotated exons. PMID:21955854
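
Mean allele balance at heterozygous sites is a per-site read ratio averaged over sites. The abstract does not give its exact definition; the sketch below assumes allele balance is the fraction of reads carrying the reference base, so a mean above 0.5 corresponds to the reported excess of reference bases. The read counts are hypothetical.

```python
from statistics import mean

def allele_balance(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads carrying the reference base at a site (assumed definition)."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no informative reads")
    return ref_reads / total

# Hypothetical (reference, alternate) read counts at heterozygous sites in capture targets.
het_sites = [(12, 8), (15, 10), (9, 11), (20, 14)]
balances = [allele_balance(r, a) for r, a in het_sites]
print(f"mean allele balance = {mean(balances):.3f}")
```

Under ideal, unbiased capture and sequencing, a heterozygous site is expected to sit near 0.5; a systematic shift toward the reference allele reduces the effective depth available for calling the variant allele.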

  3. Comparison and quantitative verification of mapping algorithms for whole genome bisulfite sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Coupling bisulfite conversion with next-generation sequencing (Bisulfite-seq) enables genome-wide measurement of DNA methylation, but poses unique challenges for mapping. However, despite a proliferation of Bisulfite-seq mapping tools, no systematic comparison of their genomic coverage and quantitat...

  4. Mining and comparison of haplotype-based expressed sequence tag single nucleotide polymorphisms among citrus cultivars

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In this paper, haplotype-based SNPs were mined out of publicly available citrus expressed sequence tags (ESTs) from different citrus cultivars (genotypes) individually and collectively for comparison. There were a total of 567,297 ESTs belonging to 27 cultivars in varying numbers and consequentially...

  5. Genomic sequence comparison of eif(iso)4E between Arabidopsis and melon

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Eukaryotic initiation factors (eifs) bind to mRNA and initiate translation in plants. Mutations in eifs condition recessively inherited virus resistances. While coding regions among eifs have been compared both within and among species, comparisons among flanking genomic sequences are lacking. We ...

  6. Progressive structure-based alignment of homologous proteins: Adopting sequence comparison strategies.

    PubMed

    Joseph, Agnel Praveen; Srinivasan, Narayanaswamy; de Brevern, Alexandre G

    2012-09-01

    Comparison of multiple protein structures has a broad range of applications in the analysis of protein structure, function and evolution. Multiple structure alignment tools (MSTAs) are necessary to obtain a simultaneous comparison of a family of related folds. In this study, we have developed a method for multiple structure comparison largely based on sequence alignment techniques. A widely used Structural Alphabet named Protein Blocks (PBs) was used to transform the information on 3D protein backbone conformation into a 1D sequence string. A progressive alignment strategy similar to CLUSTALW was adopted for multiple PB sequence alignment (mulPBA). Highly similar stretches identified by the pairwise alignments are given higher weights during the alignment. The residue equivalences from PB based alignments are used to obtain a three dimensional fit of the structures, followed by an iterative refinement of the structural superposition. Systematic comparisons using benchmark datasets of MSTAs underline that the alignment quality is better than MULTIPROT, MUSTANG and the alignments in HOMSTRAD in more than 85% of the cases. Comparison with other rigid-body and flexible MSTAs also indicates that mulPBA alignments are superior to most of the rigid-body MSTAs and highly comparable to the flexible alignment methods. PMID:22676903

  7. Beyond Linear Sequence Comparisons: The use of genome-level characters for phylogenetic reconstruction

    SciTech Connect

    Boore, Jeffrey L.

    2004-11-27

    Although the phylogenetic relationships of many organisms have been convincingly resolved by the comparisons of nucleotide or amino acid sequences, others have remained equivocal despite great effort. Now that large-scale genome sequencing projects are sampling many lineages, it is becoming feasible to compare large data sets of genome-level features and to develop this as a tool for phylogenetic reconstruction that has advantages over conventional sequence comparisons. Although it is unlikely that these will address a large number of evolutionary branch points across the broad tree of life due to the infeasibility of such sampling, they have great potential for convincingly resolving many critical, contested relationships for which no other data seems promising. However, it is important that we recognize potential pitfalls, establish reasonable standards for acceptance, and employ rigorous methodology to guard against a return to earlier days of scenario-driven evolutionary reconstructions.

  8. Multifaceted biological insights from a draft genome sequence of the tobacco hornworm moth, Manduca sexta.

    PubMed

    Kanost, Michael R; Arrese, Estela L; Cao, Xiaolong; Chen, Yun-Ru; Chellapilla, Sanjay; Goldsmith, Marian R; Grosse-Wilde, Ewald; Heckel, David G; Herndon, Nicolae; Jiang, Haobo; Papanicolaou, Alexie; Qu, Jiaxin; Soulages, Jose L; Vogel, Heiko; Walters, James; Waterhouse, Robert M; Ahn, Seung-Joon; Almeida, Francisca C; An, Chunju; Aqrawi, Peshtewani; Bretschneider, Anne; Bryant, William B; Bucks, Sascha; Chao, Hsu; Chevignon, Germain; Christen, Jayne M; Clarke, David F; Dittmer, Neal T; Ferguson, Laura C F; Garavelou, Spyridoula; Gordon, Karl H J; Gunaratna, Ramesh T; Han, Yi; Hauser, Frank; He, Yan; Heidel-Fischer, Hanna; Hirsh, Ariana; Hu, Yingxia; Jiang, Hongbo; Kalra, Divya; Klinner, Christian; König, Christopher; Kovar, Christie; Kroll, Ashley R; Kuwar, Suyog S; Lee, Sandy L; Lehman, Rüdiger; Li, Kai; Li, Zhaofei; Liang, Hanquan; Lovelace, Shanna; Lu, Zhiqiang; Mansfield, Jennifer H; McCulloch, Kyle J; Mathew, Tittu; Morton, Brian; Muzny, Donna M; Neunemann, David; Ongeri, Fiona; Pauchet, Yannick; Pu, Ling-Ling; Pyrousis, Ioannis; Rao, Xiang-Jun; Redding, Amanda; Roesel, Charles; Sanchez-Gracia, Alejandro; Schaack, Sarah; Shukla, Aditi; Tetreau, Guillaume; Wang, Yang; Xiong, Guang-Hua; Traut, Walther; Walsh, Tom K; Worley, Kim C; Wu, Di; Wu, Wenbi; Wu, Yuan-Qing; Zhang, Xiufeng; Zou, Zhen; Zucker, Hannah; Briscoe, Adriana D; Burmester, Thorsten; Clem, Rollie J; Feyereisen, René; Grimmelikhuijzen, Cornelis J P; Hamodrakas, Stavros J; Hansson, Bill S; Huguet, Elisabeth; Jermiin, Lars S; Lan, Que; Lehman, Herman K; Lorenzen, Marce; Merzendorfer, Hans; Michalopoulos, Ioannis; Morton, David B; Muthukrishnan, Subbaratnam; Oakeshott, John G; Palmer, Will; Park, Yoonseong; Passarelli, A Lorena; Rozas, Julio; Schwartz, Lawrence M; Smith, Wendy; Southgate, Agnes; Vilcinskas, Andreas; Vogt, Richard; Wang, Ping; Werren, John; Yu, Xiao-Qiang; Zhou, Jing-Jiang; Brown, Susan J; Scherer, Steven E; Richards, Stephen; Blissard, Gary W

    2016-09-01

    Manduca sexta, known as the tobacco hornworm or Carolina sphinx moth, is a lepidopteran insect that is used extensively as a model system for research in insect biochemistry, physiology, neurobiology, development, and immunity. One important benefit of this species as an experimental model is its extremely large size, reaching more than 10 g in the larval stage. M. sexta larvae feed on solanaceous plants and thus must tolerate a substantial challenge from plant allelochemicals, including nicotine. We report the sequence and annotation of the M. sexta genome, and a survey of gene expression in various tissues and developmental stages. The Msex_1.0 genome assembly resulted in a total genome size of 419.4 Mbp. Repetitive sequences accounted for 25.8% of the assembled genome. The official gene set comprises 15,451 protein-coding genes, of which 2498 were manually curated. Extensive RNA-seq data from many tissues and developmental stages were used to improve gene models and for insights into gene expression patterns. Genome-wide synteny analysis indicated a high level of macrosynteny in the Lepidoptera. Annotation and analyses were carried out for gene families involved in a wide spectrum of biological processes, including apoptosis, vacuole sorting, growth and development, structures of exoskeleton, egg shells, and muscle, vision, chemosensation, ion channels, signal transduction, neuropeptide signaling, neurotransmitter synthesis and transport, nicotine tolerance, lipid metabolism, and immunity. This genome sequence, annotation, and analysis provide an important new resource from a well-studied model insect species and will facilitate further biochemical and mechanistic experimental studies of many biological systems in insects. PMID:27522922

  9. Biological treatment of shrimp aquaculture wastewater using a sequencing batch reactor.

    PubMed

    Lyles, C; Boopathy, R; Fontenot, Q; Kilgen, M

    2008-12-01

    To improve water quality in shrimp aquaculture, a sequencing batch reactor (SBR) has been tested for the treatment of shrimp wastewater. An SBR is a variation of the activated sludge biological treatment process. This process uses multiple steps in the same tank to take the place of multiple tanks in a conventional treatment system. The SBR accomplishes equalization, aeration, and clarification in a timed sequence in a single reactor basin. This is achieved in a simple tank through sequencing stages, which include fill, react, settle, decant, and idle. A laboratory scale SBR and a pilot scale SBR were successfully operated using shrimp aquaculture wastewater. The wastewater contained high concentrations of carbon and nitrogen. By operating the reactor sequentially, viz., in aerobic and anoxic modes, nitrification and denitrification were achieved, as well as removal of carbon, in the laboratory scale SBR. Specifically, the initial chemical oxygen demand (COD) concentration of 1,593 mg/l was reduced to 44 mg/l within 10 days of reactor operation. Ammonia in the sludge was nitrified within 3 days. The denitrification of nitrate was achieved by the anaerobic process and 99% removal of nitrate was observed. Based on the laboratory study, a pilot scale SBR was designed and operated to remove excess nitrogen in the shrimp wastewater. The results mirrored those of the laboratory scale SBR. PMID:18561032

  10. Clinical next-generation sequencing reveals aggressive cancer biology in adolescent and young adult patients

    PubMed Central

    Subbiah, Vivek; Bupathi, Manojkumar; Kato, Shumei; Livingston, Andrew; Slopis, John; Anderson, Pete M.; Hong, David S.

    2015-01-01

    Background: The aggressive biology of cancers arising in adolescent and young adult (AYA; ages 15–39 years) patients is thought to contribute to poor survival outcomes. Methods: We used clinical next-generation sequencing (NGS) results to examine the molecular alterations and diverse biology of cancer in AYA patients referred to the Phase 1 program at UT MD Anderson Cancer Center. Results: Among the 28 patients analyzed (14 female and 14 male), 12 had pediatric-type cancers, six had adult-type cancers, and ten had orphan cancers. Unique, hitherto unreported aberrations were identified in all types of cancers. Aberrations in TP53, NKX2-1, KRAS, CDKN2A, MDM4, MCL1, MYC, BCL2L2, and RB1 were demonstrated across all tumor types. Five patients harbored TP53 aberrations; three patients harbored MYC, MCL1, and CDKN2A aberrations; and two patients harbored NKX2-1, KRAS, MDM4, BCL2L2, and RB1 alterations. Several patients had multiple aberrations; a patient with wild-type gastrointestinal stromal tumor harbored five alterations (MDM4, MCL1, KIT, AKT3, and PDGFRA). Conclusions: This preliminary report of NGS of cancer in AYA patients reveals diverse and unique aberrations. Further molecular profiling and a deeper understanding of the biology of these unique aberrations are warranted and may lead to targeted therapeutic interventions. PMID:26328274

  11. Comparison of the Legionella pneumophila population structure as determined by sequence-based typing and whole genome sequencing

    PubMed Central

    2013-01-01

    Background: Legionella pneumophila is an opportunistic pathogen of humans whose infections usually originate from contaminated man-made water systems. When an outbreak of Legionnaires’ disease caused by L. pneumophila occurs, it is necessary to discover the source of infection. A seven allele sequence-based typing scheme (SBT) has been very successful in providing the means to attribute outbreaks of L. pneumophila to a particular source or sources. Particular sequence types described by this scheme are known to exhibit specific phenotypes. For instance, some types are seen often in clinical cases but are rarely isolated from the environment, and vice versa. Among those causing human disease, some types are thought to be more likely to cause severe disease. It is possible that the genetic basis for these differences is vertically inherited and associated with particular genetic lineages within the population. In order to provide a framework within which to test this hypothesis and others relating to the population biology of L. pneumophila, a set of genomes covering the known diversity of the organism is required. Results: Firstly, this study describes a means to group L. pneumophila strains into pragmatic clusters, using a methodology that takes into consideration the genetic forces operating on the population. These clusters can be used as a standardised nomenclature, so that those wishing to describe a group of strains can do so. Secondly, the clusters generated from the first part of the study were used to select strains rationally for whole genome sequencing (WGS). The data generated were used to compare phylogenies derived from SBT and WGS. In general, the SBT sequence type (ST) accurately reflects the whole genome-based genotype. Where there are exceptions and recombination has resulted in the ST no longer reflecting the genetic lineage described by the whole genome sequence, the clustering technique employed detects these sequence types as being admixed

  12. Detection of Weakly Conserved Ancestral Mammalian Regulatory Sequences by Primate Comparisons

    SciTech Connect

    Wang, Qian-fei; Prabhakar, Shyam; Chanan, Sumita; Cheng, Jan-Fang; Rubin, Edward M.; Boffelli, Dario

    2006-06-01

    Genomic comparisons between human and distant, non-primate mammals are commonly used to identify cis-regulatory elements based on constrained sequence evolution. However, these methods fail to detect cryptic functional elements, which are too weakly conserved among mammals to distinguish from nonfunctional DNA. To address this problem, we explored the potential of deep intra-primate sequence comparisons. We sequenced the orthologs of 558 kb of human genomic sequence, covering multiple loci involved in cholesterol homeostasis, in 6 nonhuman primates. Our analysis identified 6 noncoding DNA elements displaying significant conservation among primates, but undetectable in more distant comparisons. In vitro and in vivo tests revealed that at least three of these 6 elements have regulatory function. Notably, the mouse orthologs of these three functional human sequences had regulatory activity despite their lack of significant sequence conservation, indicating that they are cryptic ancestral cis-regulatory elements. These regulatory elements could still be detected in a smaller set of three primate species including human, rhesus and marmoset. Since the human and rhesus genome sequences are already available, and the marmoset genome is actively being sequenced, the primate-specific conservation analysis described here can be applied in the near future on a whole-genome scale, to complement the annotation provided by more distant species comparisons.

  13. Comparison of Prostate IMRT and VMAT Biologically Optimised Treatment Plans

    SciTech Connect

    Hardcastle, Nicholas; Tome, Wolfgang A.; Foo, Kerwyn; Miller, Andrew; Carolan, Martin; Metcalfe, Peter

    2011-10-01

    Recently, a new radiotherapy delivery technique has become clinically available: volumetric modulated arc therapy (VMAT). VMAT delivers IMRT with dynamic leaf motion while the gantry is in motion. The perceived benefit of VMAT over IMRT is a reduction in delivery time. In this study, VMAT was compared directly with IMRT for a series of prostate cases. For 10 patients, a biologically optimized seven-field IMRT plan was compared with a biologically optimized VMAT plan using the same planning objectives. The Pinnacle RTPS was used. The resultant target and organ-at-risk dose-volume histograms (DVHs) were compared. The normal tissue complication probability (NTCP) for the IMRT and VMAT plans was calculated for 3 model parameter sets. The delivery efficiency and time for the IMRT and VMAT plans were compared. The VMAT plans resulted in a statistically significant reduction in the rectal V25Gy parameter of 8.2% on average over the IMRT plans. For one of the NTCP parameter sets, the VMAT plans had a statistically significant lower rectal NTCP. These reductions in rectal dose were achieved using 18.6% fewer monitor units and a delivery time reduction of up to 69%. VMAT plans resulted in reductions in rectal doses for all 10 patients in the study. This was achieved with significant reductions in delivery time and monitor units. Given that target coverage was equivalent, the VMAT plans were superior.

  14. Genome sequencing and systems biology analysis of a lipase-producing bacterial strain.

    PubMed

    Li, N; Li, D D; Zhang, Y Z; Yuan, Y Z; Geng, H; Xiong, L; Liu, D L

    2016-01-01

    Lipase-producing bacteria are naturally-occurring, industrially-relevant microorganisms that produce lipases, which can be used to synthesize biodiesel from waste oils. The efficiency of lipase expression varies among microbial strains. Therefore, strains that can produce lipases with high efficiency must be screened, and the conditions of lipase metabolism and optimization of the production process in a given environment must be thoroughly studied. A high efficiency lipase-producing strain was isolated from the sediments of the Jinsha River, identified by 16S rRNA sequence analysis as Serratia marcescens, and designated HS-L5. A schematic diagram of the genome sequence was constructed by high-throughput genome sequencing. A series of genes related to lipid degradation were identified by functional gene annotation through sequence homology analysis. A genome-scale metabolic model of HS-ML5 was constructed using systems biology techniques. The model consisted of 1722 genes and 1567 metabolic reactions. The topological graph of the genome-scale metabolic model was compared to that of conventional metabolic pathways using visualization software and the KEGG database. The basic components and boundaries of the tributyrin degradation subnetwork were determined, and its flux balance was analyzed using Matlab and the COBRA Toolbox to simulate the effects of different conditions on the catalytic efficiency of lipases produced by HS-ML5. We showed that the catalytic activity of microbial lipases was closely related to the carbon metabolic pathway. As production and catalytic efficiency of lipases varied greatly with the environment, the catalytic efficiency and environmental adaptability of microbial lipases can be improved by proper control of the production conditions. PMID:27050954

  15. Comparison of Dixon Sequences for Estimation of Percent Breast Fibroglandular Tissue

    PubMed Central

    Ledger, Araminta E. W.; Scurr, Erica D.; Hughes, Julie; Macdonald, Alison; Wallace, Toni; Thomas, Karen; Wilson, Robin; Leach, Martin O.; Schmidt, Maria A.

    2016-01-01

    Objectives: To evaluate sources of error in the Magnetic Resonance Imaging (MRI) measurement of percent fibroglandular tissue (%FGT) using two-point Dixon sequences for fat-water separation. Methods: Ten female volunteers (median age: 31 yrs, range: 23–50 yrs) gave informed consent following Research Ethics Committee approval. Each volunteer was scanned twice following repositioning to enable an estimation of measurement repeatability from high-resolution gradient-echo (GRE) proton-density (PD)-weighted Dixon sequences. Differences in measures of %FGT attributable to resolution, T1 weighting and sequence type were assessed by comparison of this Dixon sequence with low-resolution GRE PD-weighted Dixon data, and against gradient-echo (GRE) or spin-echo (SE) based T1-weighted Dixon datasets, respectively. Results: %FGT measurement from high-resolution PD-weighted Dixon sequences had a coefficient of repeatability of ±4.3%. There was no significant difference in %FGT between high-resolution and low-resolution PD-weighted data. Values of %FGT from GRE and SE T1-weighted data were strongly correlated with that derived from PD-weighted data (r = 0.995 and 0.96, respectively). However, both sequences exhibited higher mean %FGT by 2.9% (p < 0.0001) and 12.6% (p < 0.0001), respectively, in comparison with PD-weighted data; the increase in %FGT from the SE T1-weighted sequence was significantly larger at lower breast densities. Conclusion: Although measurement of %FGT at low resolution is feasible, T1 weighting and sequence type affect the accuracy of Dixon-based %FGT measurements; Dixon MRI protocols for %FGT measurement should be carefully considered, particularly for longitudinal or multi-centre studies. PMID:27011312

  16. A Multiple-Sequence Variant of the Multiple-Baseline Design: A Strategy for Analysis of Sequence Effects and Treatment Comparison.

    ERIC Educational Resources Information Center

    Noell, George H.; Gresham, Frank M.

    2001-01-01

    Describes design logic and potential uses of a variant of the multiple-baseline design. The multiple-baseline multiple-sequence (MBL-MS) design consists of multiple-baseline designs that are interlaced with one another and include all possible sequences of treatments. The MBL-MS design appears to be primarily useful for comparison of treatments taking…

  17. PepTool and GeneTool: platform-independent tools for biological sequence analysis.

    PubMed

    Wishart, D S; Stothard, P; Van Domselaar, G H

    2000-01-01

    Although we are unable to discuss all of the functionality available in PepTool and GeneTool, it should be evident from this brief review that both packages offer a great deal in terms of functionality and ease-of-use. Furthermore, a number of useful innovations including platform-independent GUI design, networked parallelism, direct internet connectivity, database compression, and a variety of enhanced or improved algorithms should make these two programs particularly useful in the rapidly changing world of biological sequence analysis. More complete descriptions of the programs, algorithms and operation of PepTool and GeneTool are available on the BioTools web site (www.biotools.com), in the associated program user manuals and in the on-line Help pages. PMID:10547833

  18. Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence.

    PubMed

    Cole, S T; Brosch, R; Parkhill, J; Garnier, T; Churcher, C; Harris, D; Gordon, S V; Eiglmeier, K; Gas, S; Barry, C E; Tekaia, F; Badcock, K; Basham, D; Brown, D; Chillingworth, T; Connor, R; Davies, R; Devlin, K; Feltwell, T; Gentles, S; Hamlin, N; Holroyd, S; Hornsby, T; Jagels, K; Krogh, A; McLean, J; Moule, S; Murphy, L; Oliver, K; Osborne, J; Quail, M A; Rajandream, M A; Rogers, J; Rutter, S; Seeger, K; Skelton, J; Squares, R; Squares, S; Sulston, J E; Taylor, K; Whitehead, S; Barrell, B G

    1998-06-11

    Countless millions of people have died from tuberculosis, a chronic infectious disease caused by the tubercle bacillus. The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has been determined and analysed in order to improve our understanding of the biology of this slow-growing pathogen and to help the conception of new prophylactic and therapeutic interventions. The genome comprises 4,411,529 base pairs, contains around 4,000 genes, and has a very high guanine + cytosine content that is reflected in the biased amino-acid content of the proteins. M. tuberculosis differs radically from other bacteria in that a very large portion of its coding capacity is devoted to the production of enzymes involved in lipogenesis and lipolysis, and to two new families of glycine-rich proteins with a repetitive structure that may represent a source of antigenic variation. PMID:9634230

  19. Comparison of biological chromophores: photophysical properties of cyanophenylalanine derivatives.

    PubMed

    Martin, Joshua P; Fetto, Natalie R; Tucker, Matthew J

    2016-07-27

    Within this work, the family of cyanophenylalanine spectroscopic reporters is extended by showing the ortho and meta derivatives have intrinsic photophysical properties that are useful for studies of protein structure and dynamics. The molar absorptivities of 2-cyanophenylalanine and 3-cyanophenylalanine are shown to be comparable to that of 4-cyanophenylalanine with similar spectral features in their absorbance and emission profiles, demonstrating that these probes can be utilized interchangeably. The fluorescence quantum yields are also on the same scale as commonly used fluorophores in peptides and proteins, tyrosine and tryptophan. These new cyano-fluorophores can be paired with either 4-cyanophenylalanine or tryptophan to capture distances in peptide structure through Förster resonance energy transfer. Additionally, the spectroscopic properties of these chromophores can report the local solvent environment via changes in fluorescence emission intensity as a result of hydrogen bonding and/or hydration. A decrease in the quantum yield is also observed in basic environments due to photoinduced electron transfer from a deprotonated amine in the free PheCN species and at the N-terminus of a short peptide, providing an avenue to detect pH in biological systems. Our results show the potential of these probes, 2-cyanophenylalanine and 3-cyanophenylalanine, to be incorporated into a single peptide chain, either individually or in tandem with 4-cyanophenylalanine, tryptophan, or tyrosine, in order to obtain information about peptide structure and dynamics. PMID:27412819

  20. Definition and Analysis of a System for the Automated Comparison of Curriculum Sequencing Algorithms in Adaptive Distance Learning

    ERIC Educational Resources Information Center

    Limongelli, Carla; Sciarrone, Filippo; Temperini, Marco; Vaste, Giulia

    2011-01-01

    LS-Lab provides automatic support to comparison/evaluation of the Learning Object Sequences produced by different Curriculum Sequencing Algorithms. Through this framework a teacher can verify the correspondence between the behaviour of different sequencing algorithms and her pedagogical preferences. In fact the teacher can compare algorithms…

  1. Enzyme sequence similarity improves the reaction alignment method for cross-species pathway comparison

    SciTech Connect

    Ovacik, Meric A.; Androulakis, Ioannis P.

    2013-09-15

    Pathway-based information has become an important resource both for establishing evolutionary relationships and for understanding the mode of action of a chemical or pharmaceutical among species. Cross-species comparison of pathways can address two broad questions: informing evolutionary relationships, and extrapolating species differences for use in a number of applications, including drug and toxicity testing. Cross-species comparison of metabolic pathways is complex, as there are multiple features of a pathway that can be modeled and compared. Among the various methods that have been proposed, reaction alignment has emerged as the most successful at predicting phylogenetic relationships based on NCBI taxonomy. We propose an improvement of the reaction alignment method that accounts for enzyme sequence similarity in addition to reaction alignment. Using nine species, including human, some model organisms, and test species, we evaluate the standard and improved comparison methods by analyzing conservation of the glycolysis and citrate cycle pathways. In addition, we demonstrate how organism comparison can be conducted by accounting for the cumulative information retrieved from nine pathways in central metabolism, as well as in a more complete study involving 36 pathways common to all nine species. Our results indicate that reaction alignment with enzyme sequence similarity yields a more accurate representation of pathway-specific cross-species similarities and differences based on NCBI taxonomy.

  2. Effect of cycle changes on simultaneous biological nutrient removal in a sequencing batch reactor (SBR).

    PubMed

    Coma, M; Puig, S; Monclús, H; Balaguer, M D; Colprim, J

    2010-03-01

    The destabilization of a microbial population is sometimes hard to resolve when different biological reactions are coupled in the same reactor, as in sequencing batch reactors (SBRs). Drawing on practical experience, this paper aims to guide the recovery of simultaneous nitrogen and phosphorus removal in an SBR after an increase in wastewater treatment demand, by taking advantage of the SBR's flexibility. The results demonstrate that the length of phases and the optimization of influent distribution are key factors in stabilizing the system over long periods with high nutrient removal (88%, 93% and 99% of carbon, nitrogen and phosphorus, respectively). In order to recover a biological nutrient removal (BNR) system, interactions such as simultaneous nitrification and denitrification, as well as phosphorus removal, must be taken into account. As a general conclusion, there is no such thing as a perfect SBR operation, and much will depend on the state of the BNR system. Hence, the SBR operating strategy must be based on a dynamic cycle definition in line with process efficiency. PMID:20426270

  3. Effects of idle time on biological phosphorus removal by sequencing batch reactors.

    PubMed

    Gao, Dawen; Yin, Hang; Liu, Lin; Li, Xing; Liang, Hong

    2013-12-01

    Three identical sequencing batch reactors (SBRs) were operated to investigate the effects of various idle times on biological phosphorus (P) removal. The idle times were set to 3 hr (R1), 10 hr (R2) and 17 hr (R3). The results showed that the idle time of an SBR had a potential impact on biological phosphorus removal, especially when the influent phosphorus concentration increased. The phosphorus removal efficiencies of the R2 and R3 systems declined dramatically compared with the stable R1 system, and the P-release and P-uptake rates of the R3 system in particular decreased dramatically. The PCR-DGGE analysis showed that uncultured Pseudomonas sp. (GQ183242.1) and beta-Proteobacteria (AY823971) were the dominant phosphorus removal bacteria for the R1 and R2 systems, while uncultured gamma-Proteobacteria were the dominant phosphorus removal bacteria for the R3 system. Glycogen-accumulating organisms (GAOs), such as uncultured Sphingomonas sp. (AM889077), were found in the R2 and R3 systems. Overall, the R1 system was the most stable and exhibited the best phosphorus removal efficiency. It was found that although the idle time can be prolonged to allow the formation of intracellular polymers when the phosphorus concentration of the influent is low, systems with a long idle time can become unstable when the influent phosphorus concentration is increased. PMID:24649669

  4. Biological Characterization and Next-Generation Genome Sequencing of the Unclassified Cotia Virus SPAn232 (Poxviridae)

    PubMed Central

    Afonso, Priscila P.; Silva, Patrícia M.; Schnellrath, Laila C.; Jesus, Desyreé M.; Hu, Jianhong; Yang, Yajie; Renne, Rolf; Attias, Marcia; Condit, Richard C.; Moussatché, Nissin

    2012-01-01

    Cotia virus (COTV) SPAn232 was isolated in 1961 from sentinel mice at Cotia field station, São Paulo, Brazil. Attempts to classify COTV within a recognized genus of the Poxviridae have generated contradictory findings. Studies by different researchers suggested some similarity to myxoma virus and swinepox virus, whereas another investigation characterized COTV SPAn232 as a vaccinia virus strain. Because of the lack of consensus, we have conducted an independent biological and molecular characterization of COTV. Virus growth curves reached maximum yields at approximately 24 to 48 h and were accompanied by virus DNA replication and a characteristic early/late pattern of viral protein synthesis. Interestingly, COTV did not induce detectable cytopathic effects in BSC-40 cells until 4 days postinfection and generated viral plaques only after 8 days. We determined the complete genomic sequence of COTV by using a combination of the next-generation DNA sequencing technologies 454 and Illumina. A unique contiguous sequence of 185,139 bp containing 185 genes, including the 90 genes conserved in all chordopoxviruses, was obtained. COTV has an interesting panel of open reading frames (ORFs) related to the evasion of host defense, including two novel genes encoding C-C chemokine-like proteins, each present in duplicate copies. Phylogenetic analysis revealed the highest amino acid identity scores with Cervidpoxvirus, Capripoxvirus, Suipoxvirus, Leporipoxvirus, and Yatapoxvirus. However, COTV grouped as an independent branch within this clade, which clearly excluded its classification as an Orthopoxvirus. Therefore, our data suggest that COTV could represent a new poxvirus genus. PMID:22345477

  5. COMPARISON OF ANALYTICAL METHODS FOR THE MEASUREMENT OF NON-VIABLE BIOLOGICAL PM

    EPA Science Inventory

    The paper describes a preliminary research effort to develop a methodology for the measurement of non-viable biologically based particulate matter (PM), analyzing for mold, dust mite, and ragweed antigens and endotoxins. Using a comparison of analytical methods, the research obj...

  6. Visual Literacy in Biology: A Comparison of Visual Representations in Textbooks and Journal Articles

    ERIC Educational Resources Information Center

    Rybarczyk, Brian

    2011-01-01

    Using course materials to promote visual literacy skills is an important aspect of undergraduate science education. A comparison study was undertaken to determine the composition of visual representations, specifically representations of data generated from experimental research, found in general biology and discipline-specific textbooks compared…

  7. Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications.

    PubMed

    Harris, R Alan; Wang, Ting; Coarfa, Cristian; Nagarajan, Raman P; Hong, Chibo; Downey, Sara L; Johnson, Brett E; Fouse, Shaun D; Delaney, Allen; Zhao, Yongjun; Olshen, Adam; Ballinger, Tracy; Zhou, Xin; Forsberg, Kevin J; Gu, Junchen; Echipare, Lorigail; O'Geen, Henriette; Lister, Ryan; Pelizzola, Mattia; Xi, Yuanxin; Epstein, Charles B; Bernstein, Bradley E; Hawkins, R David; Ren, Bing; Chung, Wen-Yu; Gu, Hongcang; Bock, Christoph; Gnirke, Andreas; Zhang, Michael Q; Haussler, David; Ecker, Joseph R; Li, Wei; Farnham, Peggy J; Waterland, Robert A; Meissner, Alexander; Marra, Marco A; Hirst, Martin; Milosavljevic, Aleksandar; Costello, Joseph F

    2010-10-01

    Analysis of DNA methylation patterns relies increasingly on sequencing-based profiling methods. The four most frequently used sequencing-based technologies are the bisulfite-based methods MethylC-seq and reduced representation bisulfite sequencing (RRBS), and the enrichment-based techniques methylated DNA immunoprecipitation sequencing (MeDIP-seq) and methylated DNA binding domain sequencing (MBD-seq). We applied all four methods to biological replicates of human embryonic stem cells to assess their genome-wide CpG coverage, resolution, cost, concordance and the influence of CpG density and genomic context. The methylation levels assessed by the two bisulfite methods were concordant (their difference did not exceed a given threshold) for 82% of CpGs and 99% of the non-CpG cytosines. Using binary methylation calls, the two enrichment methods were 99% concordant and regions assessed by all four methods were 97% concordant. We combined MeDIP-seq with methylation-sensitive restriction enzyme (MRE-seq) sequencing for comprehensive methylome coverage at lower cost. This, along with RNA-seq and ChIP-seq of the ES cells, enabled us to detect regions with allele-specific epigenetic states, identifying most known imprinted regions and new loci with monoallelic epigenetic marks and monoallelic expression. PMID:20852635
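    A minimal sketch of the threshold-based concordance measure described above, assuming per-CpG methylation levels between 0 and 1 from two methods, keyed by a site ID. The function name, the site IDs, and the 0.2 threshold are hypothetical illustrations; the study's actual threshold is not given in this abstract.

```python
def concordance(levels_a, levels_b, threshold=0.2):
    """Fraction of sites whose methylation levels differ by at most `threshold`.

    levels_a / levels_b: dicts mapping a site ID to a methylation level in [0, 1]
    from two methods. Only sites measured by both methods are compared.
    The default threshold is a hypothetical value, not the study's.
    """
    shared = levels_a.keys() & levels_b.keys()
    if not shared:
        return 0.0
    agree = sum(1 for site in shared if abs(levels_a[site] - levels_b[site]) <= threshold)
    return agree / len(shared)


# Hypothetical per-CpG levels from two bisulfite-based methods.
methylc = {"chr1:100": 0.90, "chr1:150": 0.10, "chr1:200": 0.55, "chr1:250": 0.80}
rrbs = {"chr1:100": 0.85, "chr1:150": 0.45, "chr1:200": 0.50, "chr1:300": 0.20}
print(concordance(methylc, rrbs))
```

    Here three sites are shared and two of them agree within the threshold, so two thirds of the shared CpGs count as concordant.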

  8. 3D representations of amino acids—applications to protein sequence comparison and classification

    PubMed Central

    Li, Jie; Koehl, Patrice

    2014-01-01

    The amino acid sequence of a protein is the key to understanding its structure and ultimately its function in the cell. This paper addresses the fundamental issue of encoding amino acids in such a way that the representation of a protein sequence facilitates the decoding of its information content. We show that a feature-based representation in a three-dimensional (3D) space derived from amino acid substitution matrices provides an adequate representation that can be used for direct comparison of protein sequences based on geometry. We measure the performance of such a representation in the context of the protein structural fold prediction problem. We compare the results of classifying different sets of proteins belonging to distinct structural folds against classifications of the same proteins obtained from sequence alone or directly from structural information. We find that sequence alone performs poorly as a structure classifier. We show in contrast that the use of the three-dimensional representation of the sequences significantly improves the classification accuracy. We conclude with a discussion of the current limitations of such a representation and with a description of potential improvements. PMID:25379143

  9. Comparison of pulse sequences for R1-based electron paramagnetic resonance oxygen imaging.

    PubMed

    Epel, Boris; Halpern, Howard J

    2015-05-01

    Electron paramagnetic resonance (EPR) spin-lattice relaxation (SLR) oxygen imaging has proven to be an indispensable tool for assessing oxygen partial pressure in live animals. EPR oxygen images show remarkable oxygen accuracy when combined with high precision and spatial resolution. Developing more effective means for obtaining SLR rates is of great practical, biological and medical importance. In this work we compared different pulse EPR imaging protocols and pulse sequences to establish advantages and areas of applicability for each method. Tests were performed using phantoms containing spin probes with oxygen concentrations relevant to in vivo oxymetry. We have found that for small-animal-sized objects the inversion recovery sequence combined with the filtered backprojection reconstruction method delivers the best accuracy and precision. For large animals, in which large radio frequency energy deposition might be critical, free induction decay and three-pulse stimulated echo sequences might find better practical usage. PMID:25828242
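    For readers unfamiliar with inversion recovery, a minimal sketch of estimating the spin-lattice relaxation rate R1 from inversion-recovery data. It assumes an ideal inversion pulse, a known equilibrium signal M0, and noise-free data, so that the signal follows M(t) = M0 (1 - 2 exp(-R1 t)). This log-linear fit is an illustration only, not the imaging reconstruction or fitting procedure used in the paper; the function name and data are hypothetical.

```python
import math


def fit_r1_inversion_recovery(times, signals, m0):
    """Estimate R1 from inversion-recovery data by log-linear least squares.

    Assumes the ideal model M(t) = M0 * (1 - 2 * exp(-R1 * t)) with known M0,
    so that ln((M0 - M) / (2 * M0)) = -R1 * t, a straight line through the origin.
    The slope of that line is found by least squares without an intercept.
    """
    num = 0.0
    den = 0.0
    for t, m in zip(times, signals):
        y = math.log((m0 - m) / (2.0 * m0))
        num += t * y
        den += t * t
    return -num / den


# Synthetic, noise-free data with R1 = 0.25 (arbitrary inverse time units).
true_r1 = 0.25
ts = [1.0, 2.0, 4.0, 8.0]
ms = [1.0 * (1 - 2 * math.exp(-true_r1 * t)) for t in ts]
print(fit_r1_inversion_recovery(ts, ms, m0=1.0))
```

    With real, noisy data a nonlinear fit that also estimates M0 and the inversion efficiency would usually be preferred; the linearized form is used here only because it is short and transparent.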

  10. Comparison of pulse sequences for R1-based electron paramagnetic resonance oxygen imaging

    NASA Astrophysics Data System (ADS)

    Epel, Boris; Halpern, Howard J.

    2015-05-01

    Electron paramagnetic resonance (EPR) spin-lattice relaxation (SLR) oxygen imaging has proven to be an indispensable tool for assessing oxygen partial pressure in live animals. EPR oxygen images show remarkable oxygen accuracy when combined with high precision and spatial resolution. Developing more effective means for obtaining SLR rates is of great practical, biological and medical importance. In this work we compared different pulse EPR imaging protocols and pulse sequences to establish advantages and areas of applicability for each method. Tests were performed using phantoms containing spin probes with oxygen concentrations relevant to in vivo oxymetry. We have found that for small-animal-sized objects the inversion recovery sequence combined with the filtered backprojection reconstruction method delivers the best accuracy and precision. For large animals, in which large radio frequency energy deposition might be critical, free induction decay and three-pulse stimulated echo sequences might find better practical usage.

  11. Comparison of Pulse Sequences for R1–based Electron Paramagnetic Resonance Oxygen Imaging

    PubMed Central

    Epel, Boris; Halpern, Howard J.

    2015-01-01

    Electron paramagnetic resonance (EPR) spin-lattice relaxation (SLR) oxygen imaging has proven to be an indispensable tool for assessing oxygen partial pressure in live animals. EPR oxygen images show remarkable oxygen accuracy when combined with high precision and spatial resolution. Developing more effective means for obtaining SLR rates is of great practical, biological and medical importance. In this work we compared different pulse EPR imaging protocols and pulse sequences to establish advantages and areas of applicability for each method. Tests were performed using phantoms containing spin probes with oxygen concentrations relevant to in vivo oxymetry. We have found that for small-animal-sized objects the inversion recovery sequence combined with the filtered backprojection reconstruction method delivers the best accuracy and precision. For large animals, in which large radio frequency energy deposition might be critical, free induction decay and three-pulse stimulated echo sequences might find better practical usage. PMID:25828242

  12. A Comparison of the First Two Sequenced Chloroplast Genomes in Asteraceae: Lettuce and Sunflower

    SciTech Connect

    Timme, Ruth E.; Kuehl, Jennifer V.; Boore, Jeffrey L.; Jansen, Robert K.

    2006-01-20

    Asteraceae is the second largest family of plants, with over 20,000 species. Over the past few decades, numerous phylogenetic studies have contributed to our understanding of the evolutionary relationships within this family, including comparisons of the fast-evolving chloroplast gene ndhF and of rbcL, as well as of non-coding DNA from the trnL intron plus the trnL-trnF intergenic spacer, matK, and, with lesser resolution, psbA-trnH. This culminated in a study by Panero and Funk in 2002 that used over 13,000 bp per taxon for the largest taxonomic revision of Asteraceae in over a hundred years. Still, some uncertainties remain, and it would be very useful to have more information on the relative rates of sequence evolution among various genes and on genome structure as a potential set of phylogenetic characters to help guide future phylogenetic structures. By way of contributing to this, we report the first two complete chloroplast genome sequences from members of the Asteraceae, those of Helianthus annuus and Lactuca sativa. These plants belong to two distantly related subfamilies, Asteroideae and Cichorioideae, respectively. In addition to these, there is only one other published chloroplast genome sequence for any plant within the larger group called Eusterids II, that of Panax ginseng (Araliaceae, 156,318 bp, AY582139). Early chloroplast genome mapping studies demonstrated that H. annuus and L. sativa share a 22 kb inversion relative to members of the subfamily Barnadesioideae. By comparison to outgroups, this inversion was shown to be derived, indicating that the Asteroideae and Cichorioideae are more closely related to each other than either is to the Barnadesioideae. A later sequencing study found that taxa that share this 22 kb inversion also contain within this region a second, smaller, 3.3 kb inversion. These sequences also enable an analysis of patterns of shared repeats in the genomes at a fine level and of RNA editing by comparison to available EST sequences. In addition, since…

  13. Sequence-matched probes produce increased cross-platform consistency and more reproducible biological results in microarray-based gene expression measurements

    PubMed Central

    Mecham, Brigham H.; Klus, Gregory T.; Strovel, Jeffrey; Augustus, Meena; Byrne, David; Bozso, Peter; Wetmore, Daniel Z.; Mariani, Thomas J.; Kohane, Isaac S.; Szallasi, Zoltan

    2004-01-01

    Cancer-derived microarray data sets are routinely produced by various platforms that are either commercially available or manufactured by academic groups. The fundamental differences in their probe selection strategies hold the promise that identical observations produced by more than one platform will prove more robust when validated by biology. However, cross-platform comparison requires matching corresponding probe sets. Here we introduce sequence-based matching of probes instead of gene identifier-based matching. We analyzed breast cancer cell line-derived RNA aliquots using Agilent cDNA and Affymetrix oligonucleotide microarray platforms to assess the advantage of this method. We show that, at different levels of the analysis, including gene expression ratios and difference calls, cross-platform consistency is significantly improved by sequence-based matching. We also present evidence that sequence-based probe matching produces more consistent results when comparing similar biological data sets obtained by different microarray platforms. This strategy allowed a more efficient transfer of classification of breast cancer samples between data sets produced by cDNA microarray and Affymetrix gene-chip platforms. PMID:15161944
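    A minimal sketch of the idea behind sequence-based (rather than identifier-based) probe matching: two probes from different platforms are paired when their sequences map to a common transcript. Exact substring search stands in here for a real sequence-alignment step, and all probe IDs, transcript IDs, sequences, and function names are hypothetical; the abstract does not specify how the authors performed the mapping.

```python
from collections import defaultdict


def map_probes(probes, transcripts):
    """Map each probe ID to the set of transcript IDs whose sequence contains it exactly.

    Exact containment is a simplifying assumption; real pipelines align probes
    to transcripts and tolerate mismatches. Probes with no hit are omitted.
    """
    hits = defaultdict(set)
    for probe_id, probe_seq in probes.items():
        for tx_id, tx_seq in transcripts.items():
            if probe_seq in tx_seq:
                hits[probe_id].add(tx_id)
    return hits


def match_across_platforms(probes_a, probes_b, transcripts):
    """Pair probes from two platforms that hit at least one common transcript."""
    hits_a = map_probes(probes_a, transcripts)
    hits_b = map_probes(probes_b, transcripts)
    pairs = set()
    for pa, tx_a in hits_a.items():
        for pb, tx_b in hits_b.items():
            if tx_a & tx_b:
                pairs.add((pa, pb))
    return sorted(pairs)


# Hypothetical transcripts and probes from a cDNA and an oligonucleotide platform.
transcripts = {"tx1": "ATGGCGTACGTTAGCCTAGGA", "tx2": "TTGACCGGTAACGTTGCA"}
cdna_probes = {"c1": "GCGTACGTTAG", "c2": "ACCGGTAAC"}
oligo_probes = {"o1": "TACGTTAGCC", "o2": "GTAACGTTG", "o3": "CCCCCCCC"}
print(match_across_platforms(cdna_probes, oligo_probes, transcripts))
```

    Probe o3 matches no transcript and therefore is never paired, which mirrors how sequence-based matching drops probes whose identifiers might otherwise suggest a spurious correspondence.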

  14. Sequence-matched probes produce increased cross-platform consistency and more reproducible biological results in microarray-based gene expression measurements.

    PubMed

    Mecham, Brigham H; Klus, Gregory T; Strovel, Jeffrey; Augustus, Meena; Byrne, David; Bozso, Peter; Wetmore, Daniel Z; Mariani, Thomas J; Kohane, Isaac S; Szallasi, Zoltan

    2004-01-01

    Cancer-derived microarray data sets are routinely produced by various platforms that are either commercially available or manufactured by academic groups. The fundamental differences in their probe selection strategies hold the promise that identical observations produced by more than one platform will prove more robust when validated by biology. However, cross-platform comparison requires matching corresponding probe sets. Here we introduce sequence-based matching of probes instead of gene identifier-based matching. We analyzed breast cancer cell line-derived RNA aliquots using Agilent cDNA and Affymetrix oligonucleotide microarray platforms to assess the advantage of this method. We show that, at different levels of the analysis, including gene expression ratios and difference calls, cross-platform consistency is significantly improved by sequence-based matching. We also present evidence that sequence-based probe matching produces more consistent results when comparing similar biological data sets obtained by different microarray platforms. This strategy allowed a more efficient transfer of classification of breast cancer samples between data sets produced by cDNA microarray and Affymetrix gene-chip platforms. PMID:15161944

  15. Correlation between MCAT Biology Content Specifications and Topic Scope and Sequence of General Education College Biology Textbooks

    ERIC Educational Resources Information Center

    Rissing, Steven W.

    2013-01-01

    Most American colleges and universities offer gateway biology courses to meet the needs of three undergraduate audiences: biology and related science majors, many of whom will become biomedical researchers; premedical students meeting medical school requirements and preparing for the Medical College Admissions Test (MCAT); and students completing…

  16. Multi-species sequence comparison reveals conservation of ghrelin gene-derived splice variants encoding a truncated ghrelin peptide.

    PubMed

    Seim, Inge; Jeffery, Penny L; Thomas, Patrick B; Walpole, Carina M; Maugham, Michelle; Fung, Jenny N T; Yap, Pei-Yi; O'Keeffe, Angela J; Lai, John; Whiteside, Eliza J; Herington, Adrian C; Chopin, Lisa K

    2016-06-01

    The peptide hormone ghrelin is a potent orexigen produced predominantly in the stomach. It has a number of other biological actions, including roles in appetite stimulation, energy balance, the stimulation of growth hormone release and the regulation of cell proliferation. Recently, several ghrelin gene splice variants have been described. Here, we attempted to identify conserved alternative splicing of the ghrelin gene by cross-species sequence comparisons. We identified a novel human exon 2-deleted variant and provide preliminary evidence that this splice variant and in1-ghrelin encode a C-terminally truncated form of the ghrelin peptide, termed minighrelin. These variants are expressed in humans and mice, demonstrating conservation of alternative splicing spanning 90 million years. Minighrelin appears to have similar actions to full-length ghrelin, as treatment with exogenous minighrelin peptide stimulates appetite and feeding in mice. Forced expression of the exon 2-deleted preproghrelin variant mirrors the effect of the canonical preproghrelin, stimulating cell proliferation and migration in the PC3 prostate cancer cell line. This is the first study to characterise an exon 2-deleted preproghrelin variant and to demonstrate sequence conservation of ghrelin gene-derived splice variants that encode a truncated ghrelin peptide. This adds further impetus for studies into the alternative splicing of the ghrelin gene and the function of novel ghrelin peptides in vertebrates. PMID:26792793

  17. Simultaneous removal of nanosilver and fullerene in sequencing batch reactors for biological wastewater treatment.

    PubMed

    Yang, Yu; Wang, Yifei; Hristovski, Kiril; Westerhoff, Paul

    2015-04-01

    Increasing use of engineered nanomaterials (ENMs) inevitably leads to their potential release to the sewer system. The co-removal of nano fullerenes (nC60) and nanosilver, as well as their impact on COD removal, were studied in biological sequencing batch reactors (SBRs) for a year. When nC60 was dosed at 0.07-2 mg L(-1), the SBRs removed more than 95% of the nC60, except during short-term interruptions (i.e., bioreactor dysfunction caused by nanosilver addition) that occurred when nC60 and nanosilver were dosed simultaneously. During repeated 30-d periods of adding both 2 mg L(-1) nC60 and 2 mg L(-1) nanosilver, a 4-d short-term interruption of the SBRs was observed, accompanied by (1) reduced total suspended solids in the reactor, (2) COD removal rates as low as 22%, and (3) a decrease in nC60 removal to 0%. After the short-term interruption, COD removal gradually returned to normal within one solids retention time. Except during these "short-term interruptions", the silver removal rate was above 90%. A series of bottle-point batch experiments was conducted to determine the distribution coefficients of nC60 between the liquid and biomass phases. A linear distribution model for nC60 combined with a mass balance equation simulated its removal well over a range of 0.07-0.76 mg L(-1) in SBRs. This paper illustrates the effect of "pulse" inputs (i.e., addition for a short period of time) of ENMs into biological reactors, demonstrates the long-term capability of SBRs to remove ENMs and COD, and provides an example of predicting the removal of ENMs in SBRs from batch experiments. PMID:25532763
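    The linear distribution model mentioned above can be illustrated with a short calculation. This sketch assumes equilibrium partitioning q = Kd * C_w between the liquid phase and biomass, plus a closed mass balance; it is not the paper's full SBR model, and the Kd and biomass values in the usage lines are hypothetical, not the measured coefficients.

```python
def fraction_sorbed(kd_l_per_g, biomass_g_per_l):
    """Equilibrium fraction of a nanomaterial associated with biomass.

    Linear distribution model: q = Kd * C_w (q in mg per g biomass, C_w in mg/L),
    combined with the mass balance C_total = C_w + q * X for biomass concentration X.
    Solving gives C_w = C_total / (1 + Kd * X), so the sorbed fraction is Kd*X / (1 + Kd*X).
    """
    kx = kd_l_per_g * biomass_g_per_l
    return kx / (1.0 + kx)


def liquid_phase_concentration(c_total_mg_per_l, kd_l_per_g, biomass_g_per_l):
    """Concentration remaining in the liquid phase at equilibrium (mg/L)."""
    return c_total_mg_per_l / (1.0 + kd_l_per_g * biomass_g_per_l)


# Hypothetical values: Kd = 10 L/g and 2 g/L of mixed-liquor solids.
print(fraction_sorbed(10.0, 2.0))
print(liquid_phase_concentration(0.76, 10.0, 2.0))
```

    The design point the sketch makes is that, under a linear model, the removable fraction depends only on the product Kd * X, so raising either the distribution coefficient or the biomass concentration increases predicted removal in the same way.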

  18. Long-Range Correlations in the Sequence of Human Heartbeats and Other Biological Signals

    NASA Astrophysics Data System (ADS)

    Teich, Malvin C.

    1998-03-01

    The sequence of heartbeat occurrence times provides information about the state of health of the heart. We used a variety of measures, including multiresolution wavelet analysis, to identify the form of the point process that describes the human heartbeat. These measures, which are based on both interbeat (R-R) intervals and counts (heart rate), have been applied to records for both normal and heart-failure patients drawn from a standard database, and to various surrogate versions thereof. Several of these measures reveal scaling behavior (1/f-type fluctuations; long-range power-law correlations) (R. G. Turcott and M. C. Teich, Proc. SPIE 2036 (Chaos in Biology and Medicine), 22–39 (1993)). Essentially all of the R-R and count-based measures we investigated, including those that exhibit scaling, differ in statistically significant ways between the normal and heart-failure patients. The wavelet measures, however, reveal a heretofore unknown scale window, between 16 and 32 heartbeats, over which the magnitudes of the wavelet-coefficient variances fall into disjoint sets for the normal and heart-failure patients (R. G. Turcott and M. C. Teich, Ann. Biomed. Eng. 24, 269–293 (1996); S. Thurner, M. C. Feurstein, and M. C. Teich, Phys. Rev. Lett. (in press)). This enables us to correctly classify every patient in the standard data set as belonging to either the heart-failure group or the normal group with 100% accuracy, thereby providing a clinically significant measure of the presence of heart failure. Previous approaches have provided only statistically significant measures. The tradeoff between sensitivity and…
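    A minimal sketch of a Haar-wavelet measure on an R-R interval series: coefficients are computed over non-overlapping blocks of 2m beats, and their standard deviation (the square root of the wavelet-coefficient variance) is reported per scale m, for example m = 16 or 32, the scale window highlighted above. The orthonormal normalization, the use of the population standard deviation, the function names, and the synthetic data are assumptions for illustration; this is not the authors' classifier and it sets no decision threshold.

```python
import math
import random
import statistics


def haar_coefficients(intervals, m):
    """Haar wavelet coefficients of a series at scale m (in beats).

    Each coefficient uses a non-overlapping block of 2*m samples:
    (sum of the first m values - sum of the next m values) / sqrt(2*m),
    i.e. an orthonormal Haar wavelet (this normalization is an assumption).
    Trailing samples that do not fill a full block are ignored.
    """
    coeffs = []
    for start in range(0, len(intervals) - 2 * m + 1, 2 * m):
        first = sum(intervals[start:start + m])
        second = sum(intervals[start + m:start + 2 * m])
        coeffs.append((first - second) / math.sqrt(2 * m))
    return coeffs


def wavelet_std(intervals, m):
    """Population standard deviation of the Haar coefficients at scale m.

    Requires at least 2*m intervals so that one coefficient exists.
    """
    return statistics.pstdev(haar_coefficients(intervals, m))


# Synthetic R-R intervals in seconds (hypothetical, uncorrelated noise around 0.8 s).
random.seed(0)
rr = [0.8 + random.gauss(0, 0.05) for _ in range(1024)]
for m in (16, 32):
    print(m, wavelet_std(rr, m))
```

    For uncorrelated noise, an orthonormal Haar transform keeps the coefficient spread roughly constant across scales; long-range correlated series would instead show the spread growing with m, which is the kind of scale dependence the wavelet measures probe.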

  19. Genome sequence comparison of two United States live attenuated vaccines of infectious laryngotracheitis virus (ILTV).

    PubMed

    Chandra, Yohanna Gita; Lee, Jeongyoon; Kong, Byung-Whi

    2012-06-01

    This study was conducted to identify unique nucleotide differences in the genomes of two U.S. chicken embryo origin (CEO) vaccines of infectious laryngotracheitis virus (ILTV) [LT Blen (GenBank accession: JQ083493), designated vaccine 1; Laryngo-Vac(®) (GenBank accession: JQ083494), designated vaccine 2] compared with an Australian Serva vaccine reference ILTV genome sequence [Gallid herpesvirus 1 (GaHV-1); GenBank accession number: HQ630064]. The genomes of the two vaccine ILTV strains were sequenced on an Illumina Genome Analyzer IIx using 36-cycle single-end reads. Few nucleotide differences were found (23 in vaccine 1; 31 in vaccine 2), indicating that the US CEO strains are practically identical to the Australian Serva CEO strain, which is a European-origin vaccine. The sequence differences demonstrated the spectrum of variability among vaccine strains. Only eight amino acid differences were found in vaccine 1, in the ILTV proteins UL54, UL27, UL28, UL20, UL1, ICP4, and US8. Similarly, eight amino acid differences were found in vaccine 2, in UL54, UL27, UL28, UL36, UL1, ICP4, US10, and US8. Further comparison of the US CEO vaccines with several ILTV genome sequences revealed that they are genetically close to both the Serva vaccine and 63140/C/08/BR (GenBank accession: HM188407) and are distinct from the two Australian-origin CEO vaccines, SA2 (GenBank accession: JN596962) and A20 (GenBank accession: JN596963), which showed close similarity to each other. These data demonstrate the potential of high-throughput sequencing technology to yield insight into the sequence variation of different ILTV strains. This information can be used to discriminate between vaccine ILTV strains and, further, to identify newly emerging mutant strains among field isolates. PMID:22382591
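    Once sequences are aligned, counting nucleotide differences against a reference reduces to listing mismatched alignment columns. A minimal sketch, assuming the vaccine and reference sequences are already globally aligned to equal length with '-' marking gaps (real whole-genome comparisons rely on read mapping and variant calling); the function name and sequences are hypothetical.

```python
def list_differences(reference, sample):
    """Return (1-based column, reference base, sample base) for each mismatch.

    Both strings must come from the same pairwise alignment (equal length);
    '-' marks a gap, so indel columns are reported as differences too.
    Positions are alignment columns, not reference coordinates.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to equal length")
    return [(i + 1, r, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


# Hypothetical aligned fragments of a reference and a vaccine genome.
ref = "ATGCGT-ACCTA"
vac = "ATGAGTTACC-A"
for pos, r, s in list_differences(ref, vac):
    print(pos, r, s)
```

    This reports one substitution and two single-base indel columns; mapping alignment columns back to reference coordinates would require skipping the gap columns in the reference row.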

  20. A comparison of tools for the simulation of genomic next-generation sequencing data.

    PubMed

    Escalona, Merly; Rocha, Sara; Posada, David

    2016-08-01

    Computer simulation of genomic data has become increasingly popular for assessing and validating biological models or for gaining an understanding of specific data sets. Several computational tools for the simulation of next-generation sequencing (NGS) data have been developed in recent years, which could be used to compare existing and new NGS analytical pipelines. Here we review 23 of these tools, highlighting their distinct functionality, requirements and potential applications. We also provide a decision tree for the informed selection of an appropriate NGS simulation tool for the specific question at hand. PMID:27320129
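    To make concrete what an NGS read simulator does at its simplest, here is a hypothetical toy that samples single-end reads uniformly from a reference and introduces independent substitution errors at a fixed per-base rate. It models no indels, quality scores, paired ends, or coverage bias, and it does not reproduce any of the 23 reviewed tools; the function name and parameters are assumptions.

```python
import random

BASES = "ACGT"


def simulate_reads(reference, n_reads, read_len, error_rate, seed=None):
    """Sample single-end reads uniformly from a reference with substitution errors.

    Each read starts at a uniformly random position; every base is independently
    replaced by a different base with probability `error_rate` (substitutions only).
    Returns a list of (start, read) tuples with 0-based start positions.
    """
    if read_len > len(reference):
        raise ValueError("read length exceeds reference length")
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        read = []
        for base in reference[start:start + read_len]:
            if rng.random() < error_rate:
                base = rng.choice([b for b in BASES if b != base])
            read.append(base)
        reads.append((start, "".join(read)))
    return reads


ref = "ACGTTGCAAGGCTTACCGATCGGATCCA"
for start, read in simulate_reads(ref, n_reads=3, read_len=10, error_rate=0.01, seed=42):
    print(start, read)
```

    Keeping the true start position alongside each read is what makes simulated data useful for validating pipelines: a mapper's output can be checked against the known origin of every read.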

  1. A statistical physics perspective on alignment-independent protein sequence comparison

    PubMed Central

    Chattopadhyay, Amit K.; Nasiev, Diar; Flower, Darren R.

    2015-01-01

    Motivation: Within bioinformatics, the textual alignment of amino acid sequences has long dominated the determination of similarity between proteins, with all that implies for shared structure, function and evolutionary descent. Despite the relative success of modern-day sequence alignment algorithms, so-called alignment-free approaches offer a complementary means of determining and expressing similarity, with potential benefits in certain key applications, such as regression analysis of protein structure-function studies, where alignment-based similarity has performed poorly. Results: Here, we offer a fresh, statistical physics-based perspective focusing on the question of alignment-free comparison, in the process adapting results from ‘first passage probability distribution’ to summarize statistics of ensemble averaged amino acid propensity values. In this article, we introduce and elaborate this approach. Contact: d.r.flower@aston.ac.uk PMID:25810434
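    As a point of contrast with alignment-based scoring, a generic alignment-free comparison can be sketched by representing each protein as a vector of overlapping k-mer (here dipeptide) counts and comparing the vectors by cosine similarity. This is a commonly used alternative technique, not the statistical-physics, first-passage approach of the paper; the function names and sequences are hypothetical.

```python
import math
from collections import Counter


def kmer_counts(seq, k=2):
    """Count overlapping k-mers (k=2 gives dipeptide composition for proteins)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(seq_a, seq_b, k=2):
    """Alignment-free similarity: cosine of the angle between k-mer count vectors."""
    a, b = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    dot = sum(a[kmer] * b[kmer] for kmer in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity("MKTAYIAKQR", "MKTAYIAKQL"))
print(cosine_similarity("MKTAYIAKQR", "GGSGGSGGSG"))
```

    The first pair shares 8 of 9 dipeptides and scores about 0.89, while the second pair shares none and scores 0. No alignment or gap penalty is ever computed, which is the defining property of alignment-free methods.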

  2. Comparison of Complete Genome Sequences of Usutu Virus Strains Detected in Spain, Central Europe, and Africa

    PubMed Central

    Busquets, Núria; Nowotny, Norbert

    2014-01-01

    The complete genomic sequence of Usutu virus (USUV, genus Flavivirus, family Flaviviridae) strain MB119/06, detected in a pool of Culex pipiens mosquitoes in northeastern Spain (Viladecans, Catalonia) in 2006, was determined and analyzed. The phylogenetic relationship with all other available complete USUV genome sequences was established. The Spanish sequence investigated showed the closest relationship to the USUV prototype strain SA AR 1776 isolated in South Africa in 1959 (96.9% nucleotide and 98.8% amino acid identities). Conserved structural elements and enzyme motifs of the putative polyprotein precursor were identified. Unique amino acid substitutions were recognized; however, their potential roles as virulence markers could not be verified. Comparisons of the polyprotein precursor sequences of USUV strains detected in mosquitoes, birds, and humans could not confirm the predicted role of unique amino acid substitutions in relation to virulence in humans. Phylogenetic analysis of a partial coding section of the NS5 protein gene region indicated that USUV strains circulating in Europe form three different genetic clusters. Broad and targeted surveys for USUV in mosquitoes could reveal further details of the geographic distribution and genetic diversity of the virus in Europe and in Africa. PMID:24746182

  3. PipTools: a computational toolkit to annotate and analyze pairwise comparisons of genomic sequences.

    PubMed

    Elnitski, Laura; Riemer, Cathy; Petrykowska, Hanna; Florea, Liliana; Schwartz, Scott; Miller, Webb; Hardison, Ross

    2002-12-01

    Sequence conservation between species is useful both for locating coding regions of genes and for identifying functional noncoding segments. Hence interspecies alignment of genomic sequences is an important computational technique. However, its utility is limited without extensive annotation. We describe a suite of software tools, PipTools, and related programs that facilitate the annotation of genes and putative regulatory elements in pairwise alignments. The alignment server PipMaker uses the output of these tools to display detailed information needed to interpret alignments. These programs are provided in a portable format for use on common desktop computers and both the toolkit and the PipMaker server can be found at our Web site (http://bio.cse.psu.edu/). We illustrate the utility of the toolkit using annotation of a pairwise comparison of the mouse MHC class II and class III regions with orthologous human sequences and subsequently identify conserved, noncoding sequences that are DNase I hypersensitive sites in chromatin of mouse cells. PMID:12504859
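    Interpreting a pairwise genomic alignment often starts with percent identity along the sequence. A minimal sketch, assuming a gapped pairwise alignment given as two equal-length strings, computes identity in sliding windows and flags windows at or above a cutoff as candidate conserved segments. The window size, step, 70% cutoff, and function names are hypothetical, and this is not the algorithm implemented by PipTools or PipMaker.

```python
def windowed_identity(aln_a, aln_b, window=10, step=5):
    """Percent identity of two aligned sequences in sliding windows of alignment columns.

    Gap columns ('-' in either sequence) count as mismatches. Returns a list of
    (start_column_0_based, percent_identity) tuples for every full window.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("alignment rows must have equal length")
    result = []
    for start in range(0, len(aln_a) - window + 1, step):
        cols = zip(aln_a[start:start + window], aln_b[start:start + window])
        matches = sum(1 for x, y in cols if x == y and x != "-")
        result.append((start, 100.0 * matches / window))
    return result


def conserved_windows(aln_a, aln_b, window=10, step=5, cutoff=70.0):
    """Start columns of windows whose identity meets a (hypothetical) conservation cutoff."""
    return [s for s, pid in windowed_identity(aln_a, aln_b, window, step) if pid >= cutoff]


mouse = "AAAAAAAAAACCCCCCCCCC"
human = "AAAAAAAAAAGGGGGGGGGG"
print(windowed_identity(mouse, human))
print(conserved_windows(mouse, human))
```

    In practice, flagged windows would then be intersected with gene annotations so that conserved segments outside exons stand out as candidate regulatory elements, which is the kind of annotation the toolkit described above supports.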

  4. In Silico Genome Comparison and Distribution Analysis of Simple Sequences Repeats in Cassava

    PubMed Central

    Vásquez, Andrea; López, Camilo

    2014-01-01

    We conducted an SSR density analysis in different cassava genomic regions. The information obtained was used to compare the genomic distribution of SSRs in cassava with those in poplar, flax, and Jatropha. In general, cassava has a low SSR density (~50 SSRs/Mbp) and a high proportion of pentanucleotides (24.2 SSRs/Mbp). Coding sequences have 15.5 SSRs/Mbp, introns 82.3 SSRs/Mbp, 5′ UTRs 196.1 SSRs/Mbp, and 3′ UTRs 50.5 SSRs/Mbp. Motif analysis of cassava's genomic SSRs showed that the most abundant motif was AT/AT, whereas in introns and UTRs it was AG/CT. In coding sequences, the most frequent motif was AAG/CTT; in fact, it is the third most used codon in cassava. Sequences containing SSRs were classified according to their functional annotation in Gene Ontology categories. The SSRs identified here may be a valuable addition for genetic mapping and for future studies of phylogenetics and genome evolution. PMID:25374887
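    The density figures above are SSR counts divided by sequence length in megabase pairs. A minimal sketch of detecting perfect SSRs and computing that density, assuming minimum repeat numbers per motif length (10 for mononucleotides, 6 for dinucleotides, 5 for longer motifs; a commonly used setting, but not necessarily the one in this study). Motifs that are themselves repeats of a shorter unit (such as ATAT) are skipped, and compound or overlapping SSRs are not merged; all names are hypothetical.

```python
# Hypothetical minimum repeat numbers per motif length (1 = mono ... 6 = hexanucleotide).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT' is not)."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Find perfect SSRs; returns (start, motif, repeat_count) tuples with 0-based starts."""
    seq = seq.upper()
    found = []
    for k, min_rep in min_repeats.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            if set(motif) <= set("ACGT") and is_primitive(motif):
                reps = 1
                # Extend while the next k bases repeat the motif exactly.
                while seq[i + reps * k:i + (reps + 1) * k] == motif:
                    reps += 1
                if reps >= min_rep:
                    found.append((i, motif, reps))
                    i += reps * k
                    continue
            i += 1
    return found


def ssr_density_per_mbp(n_ssrs, seq_len_bp):
    """SSR density expressed as SSRs per megabase pair."""
    return n_ssrs / (seq_len_bp / 1_000_000)


seq = "GC" + "AG" * 7 + "T" * 10 + "CCA"
ssrs = find_ssrs(seq)
print(ssrs)
print(ssr_density_per_mbp(50, 1_000_000))
```

    On a real genome, the same density function applied to SSR counts within coding sequences, introns, or UTRs (each divided by the total length of that region class) yields per-region figures of the kind reported above.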

  5. Phallometric comparison of pedophilic interest in nonadmitting sexual offenders against stepdaughters, biological daughters, other biologically related girls, and unrelated girls.

    PubMed

    Blanchard, Ray; Kuban, Michael E; Blak, Thomas; Cantor, James M; Klassen, Philip; Dickey, Robert

    2006-01-01

    This study compared the mean levels of sexual response to children produced by four groups of men with sexual offences against prepubescent girls and two comparison groups with other offences or no offences. All groups (N = 291) consisted of patients referred for clinical assessment of their sexual behavior or interests. Group assignment was determined by the victim's age and her relation to the patient: biological daughter; stepdaughter; other biologically related girl (e.g., sister, niece, granddaughter); unrelated girl; adult woman; and no known victim. The men with sexual offences had precisely one known victim each. The patients with offences may or may not have denied the act of which they were accused, but all patients denied an erotic preference for children. Sexual response to children was assessed by means of phallometric testing, a psychophysiological technique in which the individual's penile blood volume is monitored while he is presented with a standardized set of laboratory stimuli depicting male and female children and adults. The results indicated that the mean level of pedophilic response in men with offences against daughters or stepdaughters is intermediate between that in men with offences against otherwise-related or unrelated girls and that in men with no offences against girls at all. PMID:16598663

  6. Biological ingredient analysis of traditional Chinese medicine preparation based on high-throughput sequencing: the story for Liuwei Dihuang Wan.

    PubMed

    Cheng, Xinwei; Su, Xiaoquan; Chen, Xiaohua; Zhao, Huanxin; Bo, Cunpei; Xu, Jian; Bai, Hong; Ning, Kang

    2014-01-01

    Although Traditional Chinese Medicine (TCM) preparations have a long history of successful application, the scientific and systematic quality assessment of TCM preparations focuses mainly on chemical constituents and is far from comprehensive. There are currently only a few preliminary studies on the assessment of biological ingredients in TCM preparations. Here, we propose a method, M-TCM, for biological assessment of the quality of TCM preparations based on high-throughput sequencing and metagenomic analysis. We have tested this method on Liuwei Dihuang Wan (LDW), a TCM whose ingredients have been well defined. Our results show, first, that this method could determine the biological ingredients of LDW preparations; second, that the quality and stability of LDW vary significantly among manufacturers; and third, that the overall quality of LDW samples is significantly affected by biological contamination. This novel strategy has the potential to achieve comprehensive ingredient profiling of TCM preparations. PMID:24888649

  7. Biological ingredient analysis of traditional Chinese medicine preparation based on high-throughput sequencing: the story for Liuwei Dihuang Wan

    PubMed Central

    Cheng, Xinwei; Su, Xiaoquan; Chen, Xiaohua; Zhao, Huanxin; Bo, Cunpei; Xu, Jian; Bai, Hong; Ning, Kang

    2014-01-01

    Although Traditional Chinese Medicine (TCM) preparations have a long history of successful application, the scientific and systematic quality assessment of TCM preparations focuses mainly on chemical constituents and is far from comprehensive. There are currently only a few preliminary studies on the assessment of biological ingredients in TCM preparations. Here, we propose a method, M-TCM, for biological assessment of the quality of TCM preparations based on high-throughput sequencing and metagenomic analysis. We have tested this method on Liuwei Dihuang Wan (LDW), a TCM whose ingredients have been well defined. Our results show, first, that this method could determine the biological ingredients of LDW preparations; second, that the quality and stability of LDW vary significantly among manufacturers; and third, that the overall quality of LDW samples is significantly affected by biological contamination. This novel strategy has the potential to achieve comprehensive ingredient profiling of TCM preparations. PMID:24888649

  8. Phylogenetic comparison of the pre-mRNA adenosine deaminase ADAR2 genes and transcripts: conservation and diversity in editing site sequence and alternative splicing patterns.

    PubMed

    Slavov, D; Gardiner, K

    2002-10-16

    Adenosine deaminase that acts on RNA-2 (ADAR2) is a member of a family of vertebrate genes that encode adenosine (A)-to-inosine (I) RNA deaminases, enzymes that deaminate specific A residues in specific pre-mRNAs to produce I. Known substrates of ADAR2 include sites within the coding regions of pre-mRNAs of the ionotropic glutamate receptors, GluR2-6, and the serotonin receptor, 5HT2C. Mammalian ADAR2 expression is itself regulated by A-to-I editing and by several alternative splicing events. Because the biological consequences of ADAR2 function are significant, we have undertaken a phylogenetic comparison of these features. Here we report a comparison of cDNA sequences, genomic organization, editing site sequences and patterns of alternative splicing of ADAR2 genes from human, mouse, chicken, pufferfish and zebrafish. Coding sequences and intron/exon organization are highly conserved. All ADAR2 genes show evidence of transcript editing, with the required sequences and predicted secondary structures very highly conserved. Patterns and levels of editing and alternative splicing vary among organisms, and include novel N-terminal exons and splicing events. PMID:12459255

  9. nWayComp: a genome-wide sequence comparison tool for multiple strains/species of phylogenetically related microorganisms.

    PubMed

    Yao, Jiqiang; Lin, Hong; Doddapaneni, Harshavardhan; Civerolo, Edwin L

    2007-01-01

    The increasing number of whole-genome sequences of microorganisms has made genome-wide annotation and gene sequence comparison among multiple microorganisms increasingly complex. To address this problem, we have developed nWayComp, software that compares DNA and protein sequences of phylogenetically related microorganisms. This package integrates a series of bioinformatics tools such as BLAST, ClustalW, ALIGN, PHYLIP and PRIMER3 for sequence comparison. It searches for homologous sequences among multiple organisms and identifies genes that are unique to a particular organism. The homologous gene sets are then ranked in descending order of sequence similarity. For each set of homologous sequences, a table of sequence identity among the homologous genes, along with sequence variations such as SNPs and indels, is generated, and a phylogenetic tree is constructed. In addition, a common set of primers that can amplify all the homologous sequences is generated. The nWayComp package provides users with a quick and convenient tool to compare genomic sequences among multiple organisms at the whole-genome level. PMID:17688445
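
    The per-gene identity table and SNP/indel calls described above can be illustrated with a short Python sketch. It assumes the homologous sequences have already been aligned (for example, by ClustalW) into equal-length strings with '-' for gaps, and it computes identity only over columns where neither sequence has a gap. That definition, the function names `compare_aligned` and `identity_table`, and the strain names are illustrative assumptions, not nWayComp's actual code or output format.

```python
from itertools import combinations

def compare_aligned(a, b):
    """Return (percent identity, SNP columns, indel columns) for two aligned
    sequences of equal length, using '-' as the gap character."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    snps, indels = [], []
    for i, (x, y) in enumerate(zip(a, b)):
        if x == "-" and y == "-":
            continue  # gap in both sequences: column carries no information
        if x == "-" or y == "-":
            indels.append(i)  # gap in exactly one sequence
            continue
        compared += 1
        if x == y:
            matches += 1
        else:
            snps.append(i)  # substitution between the two sequences
    identity = 100.0 * matches / compared if compared else 0.0
    return identity, snps, indels

def identity_table(aligned):
    """Percent identity for every unordered pair in a {name: aligned_seq} dict."""
    return {
        (n1, n2): compare_aligned(s1, s2)[0]
        for (n1, s1), (n2, s2) in combinations(aligned.items(), 2)
    }

# Hypothetical aligned homologs from three strains.
aligned = {"strainA": "ATGC-TA", "strainB": "ATGCGTA", "strainC": "ATCC-TA"}
for pair, pct in identity_table(aligned).items():
    print(pair, round(pct, 1))
```

    With this gap-excluding definition, two sequences that differ only by an insertion still score 100% identity, with the insertion reported separately as an indel; tools that count gaps against identity will give different numbers for the same alignment.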

  10. The organization of biological sequences into constrained and unconstrained parts determines fundamental properties of genotype-phenotype maps.

    PubMed

    Greenbury, S F; Ahnert, S E

    2015-12-01

    Biological information is stored in DNA, RNA and protein sequences, which can be understood as genotypes that are translated into phenotypes. The properties of genotype-phenotype (GP) maps have been studied in great detail for RNA secondary structure. These include a highly biased distribution of genotypes per phenotype, negative correlation of genotypic robustness and evolvability, positive correlation of phenotypic robustness and evolvability, shape-space covering, and a roughly logarithmic scaling of phenotypic robustness with phenotypic frequency. More recently, similar properties have been discovered in other GP maps, suggesting that they may be fundamental to biological GP maps in general, rather than specific to the RNA secondary structure map. Here we propose that the above properties arise from the fundamental organization of biological information into 'constrained' and 'unconstrained' sequences, in the broadest possible sense. As 'constrained' we describe sequences that affect the phenotype more immediately, and are therefore more sensitive to mutations, such as protein-coding DNA or the stems in RNA secondary structure. 'Unconstrained' sequences, on the other hand, can mutate more freely without affecting the phenotype, such as intronic or intergenic DNA or the loops in RNA secondary structure. To test our hypothesis we consider a highly simplified GP map that has genotypes with 'coding' and 'non-coding' parts. We term this the Fibonacci GP map, as it is equivalent to the Fibonacci code in information theory. Despite its simplicity, the Fibonacci GP map exhibits all the above properties of much more complex and biologically realistic GP maps. These properties are therefore likely to be fundamental to many biological GP maps. PMID:26609063
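
    To make the constrained/unconstrained idea concrete, the Python sketch below enumerates a toy version of the Fibonacci GP map. It assumes binary genotypes in which the coding part is the prefix up to and including the first '11' (treated as a stop codon), everything after that is non-coding and phenotypically neutral, and a genotype containing no '11' is entirely coding. These rules are our reading for illustration only; the paper's precise definition may differ. Counting genotypes per phenotype exposes the strongly biased distribution mentioned above.

```python
from collections import Counter
from itertools import product

def phenotype(genotype: str) -> str:
    """Coding prefix: everything up to and including the first '11'.
    Assumption: a genotype with no '11' is treated as entirely coding."""
    stop = genotype.find("11")
    return genotype if stop == -1 else genotype[: stop + 2]

def neutral_set_sizes(length: int) -> Counter:
    """Number of binary genotypes of `length` mapping to each phenotype."""
    return Counter(
        phenotype("".join(bits)) for bits in product("01", repeat=length)
    )

sizes = neutral_set_sizes(8)
print(len(sizes), sizes["11"], max(sizes.values()))  # 88 64 64
```

    For genotypes of length 8 this toy map yields 88 phenotypes for 256 genotypes: the shortest phenotype, '11', is reached by 64 genotypes, while each of the 55 genotypes without a stop codon is the only member of its phenotype. Short coding regions leave many unconstrained sites free to mutate, which is the mechanism the authors propose.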

  11. The organization of biological sequences into constrained and unconstrained parts determines fundamental properties of genotype–phenotype maps

    PubMed Central

    Greenbury, S. F.; Ahnert, S. E.

    2015-01-01

    Biological information is stored in DNA, RNA and protein sequences, which can be understood as genotypes that are translated into phenotypes. The properties of genotype–phenotype (GP) maps have been studied in great detail for RNA secondary structure. These include a highly biased distribution of genotypes per phenotype, negative correlation of genotypic robustness and evolvability, positive correlation of phenotypic robustness and evolvability, shape-space covering, and a roughly logarithmic scaling of phenotypic robustness with phenotypic frequency. More recently, similar properties have been discovered in other GP maps, suggesting that they may be fundamental to biological GP maps in general, rather than specific to the RNA secondary structure map. Here we propose that the above properties arise from the fundamental organization of biological information into ‘constrained’ and ‘unconstrained’ sequences, in the broadest possible sense. As ‘constrained’ we describe sequences that affect the phenotype more immediately, and are therefore more sensitive to mutations, such as protein-coding DNA or the stems in RNA secondary structure. ‘Unconstrained’ sequences, on the other hand, can mutate more freely without affecting the phenotype, such as intronic or intergenic DNA or the loops in RNA secondary structure. To test our hypothesis we consider a highly simplified GP map that has genotypes with ‘coding’ and ‘non-coding’ parts. We term this the Fibonacci GP map, as it is equivalent to the Fibonacci code in information theory. Despite its simplicity, the Fibonacci GP map exhibits all the above properties of much more complex and biologically realistic GP maps. These properties are therefore likely to be fundamental to many biological GP maps. PMID:26609063

  12. Implicit Sequence Learning in Dyslexia: A Within-Sequence Comparison of First- and Higher-Order Information

    ERIC Educational Resources Information Center

    Du, Wenchong; Kelly, Steve W.

    2013-01-01

    The present study examines implicit sequence learning in adult dyslexics with a focus on comparing sequence transitions with different statistical complexities. Learning of a 12-item deterministic sequence was assessed in 12 dyslexic and 12 non-dyslexic university students. Both groups showed equivalent standard reaction time increments when the…

  13. Computational and biological analysis of 680 kb of DNA sequence from the human 5q31 cytokine gene cluster region.

    PubMed

    Frazer, K A; Ueda, Y; Zhu, Y; Gifford, V R; Garofalo, M R; Mohandas, N; Martin, C H; Palazzolo, M J; Cheng, J F; Rubin, E M

    1997-05-01

    With the human genome project advancing into what will be a 7- to 10-year DNA sequencing phase, we are presented with the challenge of developing strategies to convert genomic sequence data, as they become available, into biologically meaningful information. We have analyzed 680 kb of noncontiguous DNA sequence from a 1-Mb region of human chromosome 5q31, coupling computational analysis with gene expression studies of tissues isolated from humans as well as from mice containing human YAC transgenes. This genomic interval has been noted previously for containing the cytokine gene cluster and a quantitative trait locus associated with inflammatory diseases. Our analysis identified and verified expression of 16 new genes, as well as 7 previously known genes. Of the total of 23 genes in this region, 78% had similarity matches to sequences in protein databases and 83% had exact expressed sequence tag (EST) database matches. Comparative mapping studies of eight of the new human genes discovered in the 5q31 region revealed that all are located in the syntenic region of mouse chromosome 11q. Our analysis demonstrates an approach for examining human sequence as it is made available from large sequencing programs and has resulted in the discovery of several biomedically important genes, including a cyclin, a transcription factor that is homologous to an oncogene, a protein involved in DNA repair, and several new members of a family of transporter proteins. PMID:9149945

  14. Next-Generation Sequencing in the Understanding of Kaposi’s Sarcoma-Associated Herpesvirus (KSHV) Biology

    PubMed Central

    Strahan, Roxanne; Uppal, Timsy; Verma, Subhash C.

    2016-01-01

    Non-Sanger-based novel nucleic acid sequencing techniques, referred to as Next-Generation Sequencing (NGS), provide a rapid, reliable, high-throughput, and massively parallel sequencing methodology that has improved our understanding of human cancers and cancer-related viruses. NGS has become a quintessential research tool for more effective characterization of complex viral and host genomes through its ever-expanding repertoire, which consists of whole-genome sequencing, whole-transcriptome sequencing, and whole-epigenome sequencing. These new NGS platforms provide a comprehensive and systematic genome-wide analysis of genomic sequences and a full transcriptional profile at single-nucleotide resolution. When combined, these techniques help unlock the function of novel genes and the related pathways that contribute to the overall viral pathogenesis. Ongoing research in the field of virology endeavors to identify the role of various underlying mechanisms that control the regulation of the herpesvirus biphasic lifecycle in order to discover potential therapeutic targets and treatment strategies. In this review, we have compiled the most recent findings about the application of NGS in Kaposi’s sarcoma-associated herpesvirus (KSHV) biology, including identification of novel genomic features and whole-genome KSHV diversities, global gene regulatory network profiling for intricate transcriptome analyses, and surveying of epigenetic marks (DNA methylation, modified histones, and chromatin remodelers) during de novo, latent, and productive KSHV infections. PMID:27043613

  15. BEAUTY: an enhanced BLAST-based search tool that integrates multiple biological information resources into sequence similarity search results.

    PubMed

    Worley, K C; Wiese, B A; Smith, R F

    1995-09-01

    BEAUTY (BLAST enhanced alignment utility) is an enhanced version of the NCBI's BLAST data base search tool that facilitates identification of the functions of matched sequences. We have created new data bases of conserved regions and functional domains for protein sequences in NCBI's Entrez data base, and BEAUTY allows this information to be incorporated directly into BLAST search results. A Conserved Regions Data Base, containing the locations of conserved regions within Entrez protein sequences, was constructed by (1) clustering the entire data base into families, (2) aligning each family using our PIMA multiple sequence alignment program, and (3) scanning the multiple alignments to locate the conserved regions within each aligned sequence. A separate Annotated Domains Data Base was constructed by extracting the locations of all annotated domains and sites from sequences represented in the Entrez, PROSITE, BLOCKS, and PRINTS data bases. BEAUTY performs a BLAST search of those Entrez sequences with conserved regions and/or annotated domains. BEAUTY then uses the information from the Conserved Regions and Annotated Domains data bases to generate, for each matched sequence, a schematic display that allows one to directly compare the relative locations of (1) the conserved regions, (2) annotated domains and sites, and (3) the locally aligned regions matched in the BLAST search. In addition, BEAUTY search results include World-Wide Web hypertext links to a number of external data bases that provide a variety of additional types of information on the function of matched sequences. This convenient integration of protein families, conserved regions, annotated domains, alignment displays, and World-Wide Web resources greatly enhances the biological informativeness of sequence similarity searches. BEAUTY searches can be performed remotely on our system using the "BCM Search Launcher" World-Wide Web pages (URL is <http://gc.bcm.tmc.edu:8088/search

  16. Biological activities of a synthetic peptide composed of two unlinked domains from a retroviral transmembrane protein sequence.

    PubMed Central

    Wegemer, D E; Kabat, K G; Kloetzer, W S

    1990-01-01

    We report several biological activities of a synthetic peptide whose sequence contains the highly conserved region of feline leukemia virus transmembrane protein (TM) synthetically linked to another short TM-derived sequence particularly rich in polar positive residues. This 29-amino-acid peptide blocked [3H]thymidine uptake by 30 to 50% in concanavalin A-stimulated CD4(+)-enriched, but not CD8(+)-enriched, murine splenocytes. Maximal suppression was detected at 12.5 micrograms (3 microM) to 75 micrograms (19 microM) per ml of growth medium; stimulation of [3H]thymidine uptake was observed at higher peptide concentrations. The synthetic peptide inhibited but did not stimulate [3H]thymidine uptake by mitogen-activated thymocytes and antibody production by splenocytes as determined in a liquid hemolytic plaque assay. Similarities are reported between a consensus sequence of diverse retroviral TMs and a region of alpha interferons shown by others to be important for antiviral and cytostatic properties. The TM sequence-derived synthetic peptide blocked in a nontoxic and sequence-specific manner the release of murine leukemia virus from two chronically infected cell lines. We suggest that some of the biological effects of retroviral TM are mediated through a common pathway shared with alpha interferons. PMID:1969500

  17. uShuffle: A useful tool for shuffling biological sequences while preserving the k-let counts

    PubMed Central

    Jiang, Minghui; Anderson, James; Gillespie, Joel; Mayne, Martin

    2008-01-01

    Background Randomly shuffled sequences are routinely used in sequence analysis to evaluate the statistical significance of a biological sequence. In many cases, biologists need sophisticated shuffling tools that preserve not only the counts of distinct letters but also higher-order statistics such as doublet counts, triplet counts, and, in general, k-let counts. Results We present a sequence analysis tool (named uShuffle) for generating uniform random permutations of biological sequences (such as DNAs, RNAs, and proteins) that preserve the exact k-let counts. The uShuffle tool implements the latest variant of the Euler algorithm and uses Wilson's algorithm in the crucial step of arborescence generation. It is carefully engineered and extremely efficient. The uShuffle tool achieves maximum flexibility by allowing arbitrary alphabet size and let size. It can be used as a command-line program, a web application, or a utility library. Source code in C, Java, and C#, and integration instructions for Perl and Python are provided. Conclusion The uShuffle tool surpasses existing implementations of the Euler algorithm in both performance and flexibility. It is a useful tool for the bioinformatics community. PMID:18405375
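
    The Euler-based idea behind k-let-preserving shuffles can be sketched in Python as below. This is not uShuffle's source (which is distributed in C, Java, and C#); it is a minimal illustration assuming a plain string input. Each k-let occurrence becomes an edge between (k-1)-lets; Wilson's algorithm draws a random arborescence that fixes the last edge leaving each vertex, the remaining edges are shuffled, and the resulting Eulerian trail spells a new sequence with identical k-let counts. The function name `klet_shuffle` and its `rng` parameter are hypothetical. As with Euler-based shuffles generally, this sketch keeps the first and last (k-1)-lets in place.

```python
import random
from collections import defaultdict

def klet_shuffle(seq: str, k: int, rng=random) -> str:
    """Random permutation of seq with exactly the same k-let counts (k >= 1)."""
    if k == 1:
        chars = list(seq)
        rng.shuffle(chars)
        return "".join(chars)
    if len(seq) <= k:
        return seq  # at most one k-let: nothing to rearrange
    # Multigraph on (k-1)-lets: one edge per k-let occurrence.
    edges = defaultdict(list)
    for i in range(len(seq) - k + 1):
        edges[seq[i:i + k - 1]].append(seq[i + 1:i + k])
    start, root = seq[:k - 1], seq[len(seq) - k + 1:]
    # Wilson's algorithm: random arborescence pointing toward root; last[v]
    # is the edge that will be used last when leaving vertex v.
    in_tree, last = {root}, {}
    for v in edges:
        u = v
        while u not in in_tree:  # loop-erased random walk
            last[u] = rng.choice(edges[u])  # overwriting erases loops
            u = last[u]
        u = v
        while u not in in_tree:  # attach the loop-erased path to the tree
            in_tree.add(u)
            u = last[u]
    # Shuffle the other out-edges of each vertex; the tree edge goes last.
    order = {}
    for v, outs in edges.items():
        outs = list(outs)
        if v != root:
            outs.remove(last[v])
        rng.shuffle(outs)
        if v != root:
            outs.append(last[v])
        order[v] = outs
    # Follow the resulting Eulerian trail from the first (k-1)-let.
    out, used, u = [start], defaultdict(int), start
    for _ in range(len(seq) - k + 1):
        nxt = order[u][used[u]]
        used[u] += 1
        out.append(nxt[-1])  # each step adds one new letter
        u = nxt
    return "".join(out)
```

    For example, `klet_shuffle("ACGTTGCAACGT", 2, random.Random(1))` returns a string with the same dinucleotide counts as the input. Choosing the last-exit edges so that they form an arborescence toward the final vertex is what guarantees the walk never gets stuck before every edge has been used.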

  18. Sequence of subunit c of the Na(+)-translocating F1F0 ATPase of Acetobacterium woodii: proposal for determinants of Na+ specificity as revealed by sequence comparisons.

    PubMed

    Rahlfs, S; Müller, V

    1997-03-10

    A 3.2 kb EcoRI fragment carrying genes for Na(+)-F1F0 ATPase was cloned from chromosomal DNA of Acetobacterium woodii. DNA sequence analysis revealed an open reading frame that, on the basis of data base searches and comparison with the experimentally derived N-terminal amino acid sequence, was identified as coding for subunit c of Na(+)-F1F0 ATPase. A comparison of the primary sequences of the two well-established Na(+)-translocating F1F0 ATPases from Acetobacterium woodii and Propionigenium modestum with H(+)-translocating enzymes indicates that the length of the C-terminus, as well as specific residues located in the cytoplasmic membrane, are important for Na+ transport. PMID:9119076

  19. Substrate-Driven Mapping of the Degradome by Comparison of Sequence Logos

    PubMed Central

    Fuchs, Julian E.; von Grafenstein, Susanne; Huber, Roland G.; Kramer, Christian; Liedl, Klaus R.

    2013-01-01

    Sequence logos are frequently used to illustrate substrate preferences and specificity of proteases. Here, we employed the compiled substrates of the MEROPS database to introduce a novel metric for comparison of protease substrate preferences. The constructed similarity matrix of 62 proteases can be used to intuitively visualize similarities in protease substrate readout via principal component analysis and construction of protease specificity trees. Since our new metric is solely based on substrate data, we can engraft the protease tree including proteolytic enzymes of different evolutionary origin. Thereby, our analyses confirm pronounced overlaps in substrate recognition not only between proteases closely related on sequence basis but also between proteolytic enzymes of different evolutionary origin and catalytic type. To illustrate the applicability of our approach we analyze the distribution of targets of small molecules from the ChEMBL database in our substrate-based protease specificity trees. We observe a striking clustering of annotated targets in tree branches even though these grouped targets do not necessarily share similarity on protein sequence level. This highlights the value and applicability of knowledge acquired from peptide substrates in drug design of small molecules, e.g., for the prediction of off-target effects or drug repurposing. Consequently, our similarity metric allows us to map the degradome and its associated drug target network via comparison of known substrate peptides. The substrate-driven view of protein-protein interfaces is not limited to the field of proteases but can be applied to any target class where a sufficient amount of known substrate data is available. PMID:24244149
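
    As a rough illustration of comparing substrate preferences position by position, the Python sketch below builds per-position amino-acid frequencies from aligned substrate peptides and averages the Jensen-Shannon divergence across positions. This is a swapped-in measure chosen for simplicity, not the similarity metric introduced in the paper. It assumes equal-length peptide windows (for example, P4 to P4') containing only the 20 standard residues; the function names and the peptide sets in the tests are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def position_frequencies(peptides):
    """Per-position amino-acid frequencies for equal-length aligned peptides."""
    freqs = []
    for column in zip(*peptides):  # one tuple of residues per position
        counts = Counter(column)
        freqs.append({aa: counts[aa] / len(column) for aa in AMINO_ACIDS})
    return freqs

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    m = {aa: 0.5 * (p[aa] + q[aa]) for aa in p}

    def kl(a):  # Kullback-Leibler divergence of a from the mixture m
        return sum(a[x] * math.log2(a[x] / m[x]) for x in a if a[x] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

def logo_distance(peptides_a, peptides_b):
    """Mean per-position JS divergence between two substrate sets."""
    fa = position_frequencies(peptides_a)
    fb = position_frequencies(peptides_b)
    if len(fa) != len(fb):
        raise ValueError("substrate windows must have the same length")
    return sum(js_divergence(p, q) for p, q in zip(fa, fb)) / len(fa)
```

    Identical substrate sets give a distance of 0, and sets that share no residue at any position give 1. A matrix of such pairwise distances could then be passed to principal component analysis or tree building, analogous to what the abstract describes for its own metric.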

  20. Comparison of ribotyping and sequence-based typing for discriminating among isolates of Bordetella bronchiseptica.

    PubMed

    Register, Karen B; Nicholson, Tracy L; Brunelle, Brian W

    2016-10-01

    PvuII ribotyping and MLST are each highly discriminatory methods for genotyping Bordetella bronchiseptica, but a direct comparison between these approaches has not been undertaken. The goal of this study was to directly compare the discriminatory power of PvuII ribotyping and MLST, using a single set of geographically and genetically diverse strains, and to determine whether subtyping based on repeat region sequences of the pertactin gene (prn) provides additional resolution. One hundred twenty-two isolates were analyzed, representing 11 mammalian or avian hosts, sourced from the United States, Europe, Israel and Australia. Thirty-two ribotype patterns were identified; one isolate could not be typed. In comparison, all isolates were typeable by MLST and a total of 30 sequence types was identified. An analysis based on Simpson's Index of Diversity (SID) revealed that ribotyping and MLST are nearly equally discriminatory, with SIDs of 0.920 for ribotyping and 0.919 for MLST. Nonetheless, for ten ribotypes and eight MLST sequence types, the alternative method discriminates among isolates that otherwise type identically. Pairing prn repeat region typing with ribotyping yielded 54 genotypes and increased the SID to 0.954. Repeat region typing combined with MLST resulted in 47 genotypes and an SID of 0.944. Given the technical and practical advantages of MLST over ribotyping, and the nominal difference in their SIDs, we conclude MLST is the preferred primary typing tool. We recommend the combination of MLST and prn repeat region typing as a high-resolution, objective and standardized approach valuable for investigating the population structure and epidemiology of B. bronchiseptica. PMID:27542997
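
    Simpson's Index of Diversity values like those quoted above are typically computed for typing methods with the Hunter-Gaston formula, SID = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where N is the number of typed isolates and n_j the number assigned to type j; it is the probability that two isolates drawn at random without replacement receive different types. A minimal Python sketch follows. Combining two methods, as done for prn repeat region typing, amounts to treating each pair of labels as one genotype. The labels are hypothetical, not data from the study.

```python
from collections import Counter

def simpsons_index_of_diversity(types):
    """Hunter-Gaston discriminatory index: probability that two isolates drawn
    without replacement are assigned different types."""
    counts = Counter(types)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two typed isolates")
    same = sum(c * (c - 1) for c in counts.values())  # ordered same-type pairs
    return 1.0 - same / (n * (n - 1))

ribotypes = ["R1", "R1", "R2", "R3"]
sequence_types = ["ST1", "ST2", "ST2", "ST2"]
print(round(simpsons_index_of_diversity(ribotypes), 3))  # 0.833
# Pairing labels from two methods gives a combined genotype per isolate.
print(simpsons_index_of_diversity(list(zip(ribotypes, sequence_types))))  # 1.0
```

    Here four isolates fall into three ribotypes, giving 1 - 2/12, and pairing each ribotype with a sequence type separates every isolate, giving 1.0. This mirrors, on a toy scale, why combined typing schemes in the study reach higher SIDs.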

  1. A Practical Comparison of De Novo Genome Assembly Software Tools for Next-Generation Sequencing Technologies

    PubMed Central

    Zhang, Wenyu; Chen, Jiajia; Yang, Yang; Tang, Yifei; Shang, Jing; Shen, Bairong

    2011-01-01

    The advent of next-generation sequencing technologies has been accompanied by the development of many whole-genome sequence assembly methods and software tools, especially for de novo fragment assembly. Because little is known about the applicability and performance of these tools, choosing a suitable assembler is a difficult task. Here, we describe the applicability of each program and, above all, compare the performance of eight distinct tools on eight groups of simulated datasets from the Solexa sequencing platform. Considering computational time, maximum random access memory (RAM) occupancy, assembly accuracy and integrity, our study indicates that string-based assemblers are well suited to very short reads and overlap-layout-consensus (OLC) assemblers to longer reads of small genomes. For large datasets of more than a hundred million short reads, De Bruijn graph-based assemblers would be more appropriate. In terms of software implementation, string-based assemblers are superior to graph-based ones; among the latter, SOAPdenovo requires a complex configuration file. Our comparison will assist researchers in selecting a suitable assembler and offers essential information for improving existing assemblers or developing new ones. PMID:21423806
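
    For readers unfamiliar with the De Bruijn graph approach mentioned above, the Python sketch below shows its core: reads are broken into k-mers, each k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and contigs are spelled along non-branching paths. It is a toy under stated assumptions (error-free reads, a single strand with no reverse complements, and isolated cycles ignored) and does not reproduce any of the eight benchmarked assemblers; the reads are hypothetical.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def unitigs(graph):
    """Spell maximal non-branching paths, starting from every branching or
    source node. Cycles made only of 1-in/1-out nodes are not reported."""
    indegree = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            indegree[v] += 1

    def non_branching(v):
        return indegree[v] == 1 and len(graph.get(v, ())) == 1

    contigs = []
    for u in list(graph):
        if non_branching(u):
            continue  # interior node: covered by a walk from a branching node
        for v in graph[u]:
            path = u + v[-1]
            while non_branching(v):
                v = next(iter(graph[v]))
                path += v[-1]
            contigs.append(path)
    return contigs

reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
print(unitigs(de_bruijn(reads, 4)))  # ['ATGGCGTGCAA']
```

    Because each distinct k-mer is stored only once, overlapping reads collapse into shared edges without any pairwise read comparison, which is one reason graph-based assemblers scale to the very large short-read datasets discussed in the abstract.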

  2. Thermal adaptation analyzed by comparison of protein sequences from mesophilic and extremely thermophilic Methanococcus species

    NASA Technical Reports Server (NTRS)

    Haney, P. J.; Badger, J. H.; Buldak, G. L.; Reich, C. I.; Woese, C. R.; Olsen, G. J.

    1999-01-01

    The genome sequence of the extremely thermophilic archaeon Methanococcus jannaschii provides a wealth of data on proteins from a thermophile. In this paper, sequences of 115 proteins from M. jannaschii are compared with their homologs from mesophilic Methanococcus species. Although the growth temperatures of the mesophiles are about 50 degrees C below that of M. jannaschii, their genomic G+C contents are nearly identical. The properties most correlated with the proteins of the thermophile include higher residue volume, higher residue hydrophobicity, more charged amino acids (especially Glu, Arg, and Lys), and fewer uncharged polar residues (Ser, Thr, Asn, and Gln). These are recurring themes, with all trends applying to 83-92% of the proteins for which complete sequences were available. Nearly all of the amino acid replacements most significantly correlated with the temperature change are the same relatively conservative changes observed in all proteins, but in the case of the mesophile/thermophile comparison there is a directional bias. We identify 26 specific pairs of amino acids with a statistically significant (P < 0.01) preferred direction of replacement.
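
    The "83-92% of the proteins" figures are proportions of homolog pairs in which a compositional trend holds. A minimal Python sketch of that bookkeeping follows, assuming charged residues are D, E, K, and R (His excluded) and uncharged polar residues are the Ser, Thr, Asn, and Gln named above. The function names and sequence pairs are hypothetical short fragments, not Methanococcus proteins.

```python
CHARGED = set("DEKR")          # assumption: His counted as uncharged
UNCHARGED_POLAR = set("STNQ")  # Ser, Thr, Asn, Gln, as named in the abstract

def fraction(seq, residues):
    """Fraction of positions in seq occupied by the given residues."""
    return sum(aa in residues for aa in seq) / len(seq)

def share_following_trend(pairs, residues, thermophile_higher=True):
    """Share of (mesophile, thermophile) homolog pairs in which the thermophile
    protein has a higher (or, if thermophile_higher is False, lower) fraction
    of `residues` than its mesophilic homolog."""
    hits = 0
    for meso, thermo in pairs:
        diff = fraction(thermo, residues) - fraction(meso, residues)
        trend = diff > 0 if thermophile_higher else diff < 0
        if trend:
            hits += 1
    return hits / len(pairs)

pairs = [("MSTNQAKL", "MKEERAKL"), ("MSSGALQT", "MSEGKLRT")]
print(share_following_trend(pairs, CHARGED))  # 1.0
print(share_following_trend(pairs, UNCHARGED_POLAR, thermophile_higher=False))  # 1.0
```

    A real analysis would first restrict the comparison to aligned, complete homologs, as the authors did, and would test whether such proportions differ significantly from chance.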

  3. Structural biology of disease-associated repetitive DNA sequences and protein-DNA complexes involved in DNA damage and repair

    SciTech Connect

    Gupta, G.; Santhana Mariappan, S.V.; Chen, X.; Catasti, P.; Silks, L.A. III; Moyzis, R.K.; Bradbury, E.M.; Garcia, A.E.

    1997-07-01

    This project is aimed at formulating the sequence-structure-function correlations of various microsatellites in the human (and other eukaryotic) genomes. Here the authors have developed and applied structural biology tools to understand the following: the molecular mechanism of length polymorphism in microsatellites; the molecular mechanism by which microsatellites in noncoding regions alter the regulation of the associated gene; and, finally, the molecular mechanism by which the expansion of these microsatellites impairs gene expression and causes disease. Their multidisciplinary structural biology approach is quantitative and can be applied to all coding and noncoding DNA sequences associated with any gene. Both NIH and DOE are interested in developing quantitative tools for understanding the function of various human genes, with a view to preventing diseases caused by genetic and environmental effects.

  4. Sequence comparisons in the aminoacyl-tRNA synthetases with emphasis on regions of likely homology with sequences in the Rossmann fold in the methionyl and tyrosyl enzymes.

    PubMed

    Walker, E J; Jeffrey, P D

    1988-02-01

    Amino acid sequences of aminoacyl-tRNA synthetases specific for 12 different amino acids have now been published. Differences in origin at the species and organelle level result in 20 distinct sequences being available for comparison. Some of these were compared in small groups as they were determined and, although some homologies were detected, it was generally concluded that there was surprisingly little sequence homology in this functionally related group of enzymes. We have made comparisons of all of the available sequences by using a combination of computer and manual alignment methods and knowledge of the sequences in the Rossmann fold region of methionyl-tRNA synthetase from E. coli and tyrosyl-tRNA synthetase from B. stearothermophilus, enzymes whose three-dimensional structures have been described. It emerges that all of the aminoacyl-tRNA synthetase sequences thus examined show considerable homology with each other over at least parts of this region, some over virtually all of it. We conclude that a great deal more similarity than had previously been suspected exists in these proteins. In particular, the alignments we have made strongly imply the existence of a mononucleotide binding site of the Rossmann fold configuration in all of the synthetases compared. PMID:3283733

  5. Statistical physics approach to categorize biologic signals: From heart rate dynamics to DNA sequences

    NASA Astrophysics Data System (ADS)

    Peng, C.-K.; Yang, Albert C.-C.; Goldberger, Ary L.

    2007-03-01

    We recently proposed a novel approach to categorize information carried by symbolic sequences based on their usage of repetitive patterns. A simple quantitative index to measure the dissimilarity between two symbolic sequences can be defined. This information dissimilarity index, defined by our formula, is closely related to the Shannon entropy and rank order of the repetitive patterns in the symbolic sequences. Here we discuss the underlying statistical physics assumptions of this dissimilarity index. We use human cardiac interbeat interval time series and DNA sequences as examples to illustrate the applicability of this generic approach to real-world problems.
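
    The abstract does not reproduce the index itself. The Python sketch below assumes one form of a rank-order, entropy-weighted dissimilarity: rank every m-letter word by frequency in each sequence, weight each word's absolute rank difference by its combined Shannon entropy contribution -p1 log p1 - p2 log p2 (normalized so the weights sum to 1), and divide the weighted sum by the number of words minus one. The formula, the alphabetical tie-breaking, and the function names are assumptions for illustration and may differ from the authors' definition.

```python
import math
from collections import Counter
from itertools import product

def word_ranks(seq, m, alphabet):
    """Frequency and rank (1 = most frequent) of every m-letter word.
    Assumes seq uses only letters from `alphabet`."""
    counts = Counter(seq[i:i + m] for i in range(len(seq) - m + 1))
    total = sum(counts.values())
    words = ["".join(w) for w in product(alphabet, repeat=m)]
    # Assumption: words with equal frequency are ranked alphabetically.
    ordered = sorted(words, key=lambda w: (-counts[w], w))
    rank = {w: r for r, w in enumerate(ordered, start=1)}
    freq = {w: counts[w] / total for w in words}
    return words, freq, rank

def dissimilarity(s1, s2, m, alphabet):
    """Entropy-weighted mean rank difference of m-words (assumed form)."""
    words, p1, r1 = word_ranks(s1, m, alphabet)
    _, p2, r2 = word_ranks(s2, m, alphabet)

    def h(p):  # Shannon entropy contribution of one word, in nats
        return -p * math.log(p) if p > 0 else 0.0

    weights = {w: h(p1[w]) + h(p2[w]) for w in words}
    z = sum(weights.values())
    if z == 0:
        raise ValueError("both sequences repeat a single word; weights undefined")
    weighted = sum(abs(r1[w] - r2[w]) * weights[w] / z for w in words)
    return weighted / (len(words) - 1)
```

    Two sequences that rank their words identically score 0 regardless of the exact frequencies, which reflects the rank-order emphasis of the approach: it compares which patterns dominate, not how often each occurs.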

  6. Eimeria maxima phosphatidylinositol 4-phosphate 5-kinase: locus sequencing, characterization, and cross-phylum comparison.

    PubMed

    Goh, Mei-Yen; Pan, Mei-Zhen; Blake, Damer P; Wan, Kiew-Lian; Song, Beng-Kah

    2011-03-01

    Phosphatidylinositol 4-phosphate 5-kinase (PIP5K) may play an important role in host-cell invasion by Eimeria species, protozoan parasites which can cause severe intestinal disease in livestock. Here, we report the structural organization of the PIP5K gene in Eimeria maxima (Weybridge strain). Two E. maxima BAC clones carrying the E. maxima PIP5K (EmPIP5K) coding sequences were selected for shotgun sequencing, yielding a 9.1-kb genomic segment. The EmPIP5K coding region was initially identified using in silico gene-prediction approaches and subsequently confirmed by mapping rapid amplification of cDNA ends and RT-PCR-generated cDNA sequence to its genomic segment. The putative EmPIP5K gene was located at position 710-8036 nt on the complementary strand and consisted of 23 exons. Alignment of the 1147-amino-acid sequence with previously annotated PIP5K proteins from other Apicomplexa species detected three conserved motifs encompassing the kinase core domain, which has been shown by previous protein deletion studies to be necessary for PIP5K protein function. Phylogenetic analysis provided further evidence that the putative EmPIP5K protein is orthologous to that of other Apicomplexa. Subsequent comparative gene structure characterization revealed events of intron loss/gain throughout the evolution of the apicomplexan PIP5K gene. Further scrutiny of the genomic structure revealed a possible trend towards "intron gain" between two of the motif regions. Our findings offer preliminary insights into the structural variations that have occurred during the evolution of the PIP5K locus and may aid in understanding the functional role of this gene in the cellular biology of apicomplexan parasites. PMID:20938684

  7. Molecular, biological, and morphometric comparisons between different geographical populations of Rhipicephalus sanguineus sensu lato (Acari: Ixodidae).

    PubMed

    Sanches, Gustavo S; Évora, Patrícia M; Mangold, Atílio J; Jittapalapong, Sattaporn; Rodriguez-Mallon, Alina; Guzmán, Pedro E E; Bechara, Gervásio H; Camargo-Mathias, Maria I

    2016-01-15

    In this study, different geographical populations of Rhipicephalus sanguineus sensu lato were compared by molecular, biological, and morphometric methods. Phylogenetic trees were constructed using 12S and 16S rDNA sequences and showed two distinct clades: one composed of ticks from Brazil (Jaboticabal, SP), Cuba (Havana), Thailand (Bangkok) and the so-called "tropical strain" ticks. The second clade was composed of ticks from Spain (Zaragoza), Argentina (Rafaela, Santa Fe) and the so-called "temperate strain" ticks. Morphometric analysis showed good separation between females of the two clades and within the temperate clade. Males also exhibited separation between the two clades, but with some overlap. Multiple biological parameters revealed differences between the two clades, especially the weight of the engorged female. These results confirm the existence of at least two species under the name "R. sanguineus". PMID:26790741

  8. Biological characterization and complete genomic sequence of Carrot thin leaf virus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The host range of a cilantro isolate of Carrot thin leaf virus (CTLV-Cs) was determined to include 15 plant species. The virus was also transmitted to 9 of 11 tested apiaceous species by aphids. Complete genomic sequences of CTLV-Cs and a carrot isolate of CTLV were determined. Their genomic sequenc...

  9. Protein identities from 'Graphocephala atropunctata' expressed sequence tags: Expanding leafhopper vector biology

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Heat shock proteins and 44 protein sequences from the blue-green sharpshooter, BGSS, were produced and identified. The sequences were submitted and published under accession numbers: DQ445499-DQ445542, in the National Center for Biotechnology Information, NCBI, Public Database. The blue-green sharps...

  10. Sequence-dependent collective properties of DNAs and their role in biological systems

    NASA Astrophysics Data System (ADS)

    De Santis, Pasquale; Scipioni, Anita

    2013-03-01

    DNA actively interacts with proteins involved in replication, transcription, repair, and regulation processes inside the cell. The base sequence encodes the dynamics of these transformations on length scales from the atomic to the nanometre and beyond. In fact, although an important part of the informational content of DNA acts locally, it exerts its functions as collective properties of relatively long sequences and manifests as static and dynamic curvature. Physical models that explore different aspects of the collective properties of DNA associated with such sequence-encoded superstructural properties are reviewed. The B-DNA periodicity operates as a band-pass filter: only the local physical-chemical variance associated with the sequence that is in phase with the helical periodicity sums up and becomes apparent at larger scales. In this light, the gel electrophoresis behaviour of DNAs and the thermodynamic stability and positioning of nucleosomes along genomes are interpreted and discussed. Finally, part of this review is devoted to the ability of some inorganic crystal surfaces to recognize and stabilize certain DNA tracts with peculiar sequences. The collective superstructural properties of DNAs could be involved in the selective interaction between DNA sequences and particular crystal surfaces. It is conceivable that sequences strongly adsorbed on surfaces could nucleate and expand bits of information in primeval DNA (and/or RNA) chains, initially characterized by random sequences, since they would be better protected against physical-chemical damage from the environment, and could therefore be involved in the evolution of their informational content.
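
    The band-pass behaviour described in this abstract can be illustrated with a toy calculation. The Python sketch below is not one of the reviewed physical models: it assumes 10.5 bp per helical turn and a hypothetical numeric value per base step (standing in for a local physical-chemical property), rotates each value by its helical phase, and sums. A signal that repeats every 10.5 steps accumulates, while one that repeats every 7 steps cancels out.

```python
import cmath
import math

# Assumption: canonical B-DNA helical repeat of 10.5 bp per turn.
HELICAL_PERIOD = 10.5


def helical_phase_sum(step_values, period=HELICAL_PERIOD):
    """Magnitude of the per-step values summed after rotation by helical phase.

    Components that repeat with the helical period add coherently; components
    at other periods largely cancel, which acts like a band-pass filter.
    """
    total = 0j
    for k, value in enumerate(step_values):
        total += value * cmath.exp(-2j * math.pi * k / period)
    return abs(total)


n_steps = 105  # exactly 10 helical turns at 10.5 bp per turn
# Hypothetical per-step profiles, not measured DNA parameters.
in_phase = [math.cos(2 * math.pi * k / 10.5) for k in range(n_steps)]
off_phase = [math.cos(2 * math.pi * k / 7.0) for k in range(n_steps)]
print(round(helical_phase_sum(in_phase), 3))   # 52.5
print(round(helical_phase_sum(off_phase), 3))  # 0.0
```

    Over a whole number of turns the in-phase profile sums to half the number of steps, while the period-7 profile contributes nothing; real sequence-dependent parameters would sit between these extremes.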

  11. Analysis of expressed sequence tags from Uromyces appendiculatus hyphae and haustoria and their comparison to sequences from other rust fungi

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Two separate cDNA libraries were prepared from RNA extracted from bean rust (Uromyces appendiculatus) hyphae and haustoria isolated from infected bean leaves (Phaseolus vulgaris cv. Pinto 111) between 2 and 8 dpi. Approximately 13,000 clones were sequenced from both ends and the sequences assem...

  12. A comparison of 454 sequencing and clonal sequencing for the characterization of hepatitis C virus NS3 variants.

    PubMed

    Ho, Cynthia K Y; Welkers, Matthijs R A; Thomas, Xiomara V; Sullivan, James C; Kieffer, Tara L; Reesink, Henk W; Rebers, Sjoerd P H; de Jong, Menno D; Schinkel, Janke; Molenkamp, Richard

    2015-07-01

    We compared 454 amplicon sequencing with clonal sequencing for the characterization of intra-host hepatitis C virus (HCV) NS3 variants. Clonal and 454 sequences were obtained from 12 patients enrolled in a clinical phase I study for telaprevir, an NS3-4a protease inhibitor. Thirty-nine datasets were used to compare the consensus sequence, average pairwise distance, normalized Shannon entropy, phylogenetic tree topology and the number and frequency of variants derived from both sequencing techniques. In general, a good concordance was observed between both techniques for the majority of datasets. Discordant results were observed for 5 out of 39 clonal and 454 datasets, which could be attributed to primer-related selective amplification used for clonal sequencing. Both 454 and clonal datasets consisted of a few major variants and a large number of low-frequency variants. Telaprevir resistance-associated variants were observed in low frequencies and were detected more often by 454. We conclude that performance of 454 and clonal sequencing is comparable for the characterization of intra-host virus populations. Not surprisingly, 454 is superior for the detection of low frequency resistance-associated variants. However, despite the greater coverage, 454 failed to detect some low frequency variants detected by clonal sequencing. PMID:25818622
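
    Two of the diversity measures compared in this abstract can be computed directly from a set of aligned variant sequences. The Python sketch below is not the study's pipeline: it assumes equal-length, gap-free alignments, uses uncorrected p-distances, and normalizes Shannon entropy by ln N, with N the number of sequences, which is one common convention in viral quasispecies work.

```python
import math
from collections import Counter
from itertools import combinations


def normalized_shannon_entropy(sequences):
    """Sn = -sum(p_i * ln p_i) / ln N over distinct variants.

    Assumption: N is the total number of sequences (clones or reads).
    """
    n = len(sequences)
    if n < 2:
        return 0.0
    counts = Counter(sequences)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(n)


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_pairwise_distance(sequences):
    """Mean p-distance over all unordered pairs of sequences."""
    pairs = list(combinations(sequences, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy dataset of four aligned clones.
clones = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "TCGTACGT"]
print(round(normalized_shannon_entropy(clones), 3))  # 0.75
print(average_pairwise_distance(clones))             # 0.125
```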

  13. H3 and H4 histone cDNA sequences from Xenopus: a sequence comparison of H4 genes.

    PubMed Central

    Turner, P C; Woodland, H R

    1982-01-01

    Ovarian poly(A)+ RNA from Xenopus laevis and Xenopus borealis was used to construct two cDNA libraries which were screened for histone sequences. cDNA clones to H4 mRNA were obtained from both species and an H3 cDNA clone from Xenopus laevis. The complete DNA sequences of these clones have been determined and are presented. These new sequences are compared with other H3 and H4 DNA sequences both in the coding and 3' noncoding regions. We find that there is considerable non-random codon usage in ten H4 genes. In addition there are some sequence similarities in the 3' noncoding regions of H3 and H4 genes. PMID:6896750

  14. Comparison of the nucleotide sequences of wheat dwarf virus (WDV) isolates from Hungary and Ukraine.

    PubMed

    Tóbiás, Istvan; Shevchenko, Oleksiy; Kiss, Balázs; Bysov, Andriy; Snihur, Halina; Polischuk, Valery; Salánki, Katalin; Palkovics, László

    2011-01-01

    Wheat dwarf virus (WDV) is the most ubiquitous virus in cereals causing huge losses in both Hungary and Ukraine. The presence of barley- and wheat-adapted strains has been confirmed, suggesting that the barley strain is restricted to barley, while the wheat strain is present in both wheat and barley plants. Five WDV isolates from wheat plants sampled in Hungary and Ukraine were sequenced and compared with known WDV isolates from GenBank. Four WDV isolates belonged to the wheat strain. Our results indicate that WDV-Odessa is an isolate of special interest since it has originated from wheat, but belongs to the barley-adapted strain, providing novel data on WDV biology and raising issues of pathogen epidemiology. PMID:21905629

  15. Shotguns and SNPs: how fast and cheap sequencing is revolutionizing plant biology.

    PubMed

    Rounsley, Steven D; Last, Robert L

    2010-03-01

    In 1998 Cereon Genomics LLC, a subsidiary of Monsanto Co., performed a shotgun sequencing of the Arabidopsis thaliana Landsberg erecta genome to a depth of twofold coverage using 'classic' Sanger sequencing. This sequence was assembled and aligned to the Columbia ecotype sequence produced by the Arabidopsis Genome Initiative. The analysis provided tens of thousands of high-confidence predictions of polymorphisms between these two varieties of A. thaliana, and the predicted polymorphisms and Landsberg erecta sequence were subsequently made available to the not-for-profit research community by Monsanto. These data have been used for a wide variety of published studies, including map-based gene identification from forward genetic screens, studies of recombination and organelle genetics, and gene expression studies. The combination of resequencing approaches with next-generation sequencing technology has led to an increasing number of similar studies of genome-wide genetic diversity in A. thaliana, including the 1001 genomes project (http://1001genomes.org). Similar approaches are becoming possible in any number of crop species as DNA sequencing costs plummet and throughput rapidly increases, promising to lay the groundwork for revolutionizing our understanding of the relationship between genotype and phenotype in plants. PMID:20409267

  16. Complete genome sequences of two biologically distinct isolates of Asparagus virus 1.

    PubMed

    Blockus, S; Lesker, T; Maiss, E

    2015-02-01

    The complete genome sequences of two asparagus virus 1 (AV-1) isolates differing in their ability to cause systemic infection in Nicotiana benthamiana were determined. Their genomes had 9,741 nucleotides excluding the 3'-terminal poly(A) tail, encoded a polyprotein of 3,112 amino acids, and shared 99.6 % nucleotide sequence identity. They differed at 37 nucleotide and 15 amino acid sequence positions (99.5 % identity) scattered over the polyprotein. The closest relatives of AV-1 in amino acid sequence identity were plum pox virus (54 %) and turnip mosaic virus (53 %), corroborating the classification of AV-1 as a member of a distinct species in the genus Potyvirus. PMID:25216774
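
    The two identity figures are consistent with simple arithmetic over the reported lengths. A minimal Python check, assuming identity is computed over the full ungapped length (9,741 nt for the genome, 3,112 aa for the polyprotein) as (length minus differences) divided by length:

```python
def percent_identity(length, differences):
    """Percent identity for an ungapped comparison of two equal-length sequences.

    Assumption: no indels, so every position counts toward the denominator.
    """
    return 100.0 * (length - differences) / length


genome_nt = 9741       # nucleotides, excluding the 3'-terminal poly(A) tail
polyprotein_aa = 3112  # amino acids
print(f"{percent_identity(genome_nt, 37):.1f} % nucleotide identity")       # 99.6
print(f"{percent_identity(polyprotein_aa, 15):.1f} % amino acid identity")  # 99.5
```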

  17. What Next? The Next Transit from Biology to Diagnostics: Next Generation Sequencing for Immunogenetics

    PubMed Central

    Gabriel, Christian; Stabentheiner, Stephanie; Danzer, Martin; Pröll, Johannes

    2011-01-01

    The human genome project triggered the introduction of next generation sequencing (NGS) systems. Although originally developed for total genome sequencing, metagenomics and plant genetics, the ultra-deep sequencing feature of NGS has been used for diagnostic purposes in HIV resistance and tropism testing, as well as for detecting new mutations and tumor clones in oncology. Recent publications have exploited the clonal sequencing feature in immunogenetics to resolve the growing number of ambiguities. This concept is quite reliable if all exons of interest are tested and the amplification region includes flanking introns. Challenging questions on quality control, cost effectiveness, workflow, and management of enormous loads of data remain if NGS is to be adopted as a routine method in the immunogenetics laboratory. If these are solved, NGS has great potential to have a major impact on immunogenetics by providing ambiguity-free HLA typing results faster, and it will also greatly influence how immunogenetics testing and workflows are organized. PMID:22670120

  18. Comparison of fungi within the Gaeumannomyces-Phialophora complex by analysis of ribosomal DNA sequences.

    PubMed Central

    Bryan, G T; Daniels, M J; Osbourn, A E

    1995-01-01

    Four ascomycete species of the genus Gaeumannomyces infect roots of monocotyledons. Gaeumannomyces graminis contains four varieties, var. tritici, var. avenae, var. graminis, and var. maydis. G. graminis varieties tritici, avenae, and graminis have Phialophora-like anamorphs and, together with the other Gaeumannomyces and Phialophora species found on cereal roots, constitute the Gaeumannomyces-Phialophora complex. Relatedness of a number of Gaeumannomyces and Phialophora isolates was assessed by comparison of DNA sequences of the 18S rRNA gene, the 5.8S rRNA gene, and the internal transcribed spacers (ITS). G. graminis var. tritici, G. graminis var. avenae, and G. graminis var. graminis isolates can be distinguished from each other by nucleotide sequence differences in the ITS regions. The G. graminis var. tritici isolates can be further subdivided into R and N isolates (correlating with ability [R] or inability [N] to infect rye). Phylogenetic analysis of the ITS regions of several oat-infecting G. graminis var. tritici isolates suggests that these isolates are actually more closely related to G. graminis var. avenae. The isolates of Magnaporthe grisea included in the analysis showed a surprising degree of relatedness to members of the Gaeumannomyces-Phialophora complex. G. graminis variety-specific oligonucleotide primers were used in PCRs to amplify DNA from cereal seedlings infected with G. graminis var. tritici or G. graminis var. avenae, and these should be valuable for sensitive detection of pathogenic isolates and for diagnosis of take-all. PMID:7574606

  19. Comparison and quantitative verification of mapping algorithms for whole-genome bisulfite sequencing.

    PubMed

    Kunde-Ramamoorthy, Govindarajan; Coarfa, Cristian; Laritsky, Eleonora; Kessler, Noah J; Harris, R Alan; Xu, Mingchu; Chen, Rui; Shen, Lanlan; Milosavljevic, Aleksandar; Waterland, Robert A

    2014-04-01

    Coupling bisulfite conversion with next-generation sequencing (Bisulfite-seq) enables genome-wide measurement of DNA methylation, but poses unique challenges for mapping. However, despite a proliferation of Bisulfite-seq mapping tools, no systematic comparison of their genomic coverage and quantitative accuracy has been reported. We sequenced bisulfite-converted DNA from two tissues from each of two healthy human adults and systematically compared five widely used Bisulfite-seq mapping algorithms: Bismark, BSMAP, Pash, BatMeth and BS Seeker. We evaluated their computational speed and genomic coverage and verified their percentage methylation estimates. With the exception of BatMeth, all mappers covered >70% of CpG sites genome-wide and yielded highly concordant estimates of percentage methylation (r² ≥ 0.95). Fourfold variation in mapping time was found between BSMAP (fastest) and Pash (slowest). In each library, 8-12% of genomic regions covered by Bismark and Pash were not covered by BSMAP. An experiment using simulated reads confirmed that Pash has an exceptional ability to uniquely map reads in genomic regions of structural variation. Independent verification by bisulfite pyrosequencing generally confirmed the percentage methylation estimates by the mappers. Of these algorithms, Bismark provides an attractive combination of processing speed, genomic coverage and quantitative accuracy, whereas Pash offers considerably higher genomic coverage. PMID:24391148
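
    The quantity these mappers ultimately estimate, percentage methylation at a CpG, can be illustrated on reads that are already aligned. In this Python sketch, `bisulfite_convert` and `percent_methylation` are hypothetical helpers, not functions of any of the compared tools: only the top strand is modeled, reads are assumed to be aligned to reference coordinates without gaps, and the hard part the tools actually solve (mapping C-to-T-converted reads) is skipped.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Simulate top-strand bisulfite conversion followed by PCR.

    Unmethylated C is read as T; C at a 0-based position listed in
    `methylated_positions` is protected and stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def percent_methylation(reference, reads, cpg_pos):
    """Percent of informative reads that keep C at a reference CpG cytosine.

    Assumption: every read is aligned to the reference with no gaps.
    """
    if reference[cpg_pos:cpg_pos + 2] != "CG":
        raise ValueError("position is not the C of a CpG")
    calls = [read[cpg_pos] for read in reads]
    methylated = calls.count("C")
    informative = methylated + calls.count("T")
    return 100.0 * methylated / informative if informative else float("nan")


ref = "ACGTTCGA"  # hypothetical reference with CpGs at positions 1 and 5
reads = [bisulfite_convert(ref, {1}), bisulfite_convert(ref, {1, 5}),
         bisulfite_convert(ref), bisulfite_convert(ref, {1})]
print(percent_methylation(ref, reads, 1))  # 75.0
print(percent_methylation(ref, reads, 5))  # 25.0
```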

  20. Comparison and quantitative verification of mapping algorithms for whole-genome bisulfite sequencing

    PubMed Central

    Kunde-Ramamoorthy, Govindarajan; Coarfa, Cristian; Laritsky, Eleonora; Kessler, Noah J.; Harris, R. Alan; Xu, Mingchu; Chen, Rui; Shen, Lanlan; Milosavljevic, Aleksandar; Waterland, Robert A.

    2014-01-01

    Coupling bisulfite conversion with next-generation sequencing (Bisulfite-seq) enables genome-wide measurement of DNA methylation, but poses unique challenges for mapping. However, despite a proliferation of Bisulfite-seq mapping tools, no systematic comparison of their genomic coverage and quantitative accuracy has been reported. We sequenced bisulfite-converted DNA from two tissues from each of two healthy human adults and systematically compared five widely used Bisulfite-seq mapping algorithms: Bismark, BSMAP, Pash, BatMeth and BS Seeker. We evaluated their computational speed and genomic coverage and verified their percentage methylation estimates. With the exception of BatMeth, all mappers covered >70% of CpG sites genome-wide and yielded highly concordant estimates of percentage methylation (r² ≥ 0.95). Fourfold variation in mapping time was found between BSMAP (fastest) and Pash (slowest). In each library, 8–12% of genomic regions covered by Bismark and Pash were not covered by BSMAP. An experiment using simulated reads confirmed that Pash has an exceptional ability to uniquely map reads in genomic regions of structural variation. Independent verification by bisulfite pyrosequencing generally confirmed the percentage methylation estimates by the mappers. Of these algorithms, Bismark provides an attractive combination of processing speed, genomic coverage and quantitative accuracy, whereas Pash offers considerably higher genomic coverage. PMID:24391148

  1. The future role of next-generation DNA sequencing and metagenetics in aquatic biology monitoring programs

    EPA Science Inventory

    The development of current biological monitoring and bioassessment programs was a drastic improvement over previous programs created for monitoring a limited number of specific chemical pollutants. Although these assessment programs are better designed to address the transient an...

  2. Sequencing and characterizing the genome of Estrella lausannensis as an undergraduate project: training students and biological insights.

    PubMed

    Bertelli, Claire; Aeby, Sébastien; Chassot, Bérénice; Clulow, James; Hilfiker, Olivier; Rappo, Samuel; Ritzmann, Sébastien; Schumacher, Paolo; Terrettaz, Céline; Benaglio, Paola; Falquet, Laurent; Farinelli, Laurent; Gharib, Walid H; Goesmann, Alexander; Harshman, Keith; Linke, Burkhard; Miyazaki, Ryo; Rivolta, Carlo; Robinson-Rechavi, Marc; van der Meer, Jan Roelof; Greub, Gilbert

    2015-01-01

    With the widespread availability of high-throughput sequencing technologies, sequencing projects have become pervasive in the molecular life sciences. The huge bulk of data generated daily must be analyzed further by biologists with skills in bioinformatics and by "embedded bioinformaticians," i.e., bioinformaticians integrated in wet lab research groups. Thus, students interested in molecular life sciences must be trained in the main steps of genomics: sequencing, assembly, annotation and analysis. To reach that goal, a practical course has been set up for master's students at the University of Lausanne: the "Sequence a genome" class. At the beginning of the academic year, a few bacterial species whose genome is unknown are provided to the students, who sequence and assemble the genome(s) and perform manual annotation. Here, we report the progress of the first class from September 2010 to June 2011 and the results obtained by seven master's students who specifically assembled and annotated the genome of Estrella lausannensis, an obligate intracellular bacterium related to Chlamydia. The draft genome of Estrella is composed of 29 scaffolds encompassing 2,819,825 bp that encode 2233 putative proteins. Estrella also possesses a 9136 bp plasmid that encodes 14 genes, among which we found an integrase and a toxin/antitoxin module. Like all other members of the Chlamydiales order, Estrella possesses a highly conserved type III secretion system, considered as a key virulence factor. The annotation of the Estrella genome also allowed the characterization of the metabolic abilities of this strictly intracellular bacterium. Altogether, the students provided the scientific community with the Estrella genome sequence and a preliminary understanding of the biology of this recently discovered bacterial genus, while learning to use cutting-edge technologies for sequencing and to perform bioinformatics analyses. PMID:25745418

  3. Sequencing and characterizing the genome of Estrella lausannensis as an undergraduate project: training students and biological insights

    PubMed Central

    Bertelli, Claire; Aeby, Sébastien; Chassot, Bérénice; Clulow, James; Hilfiker, Olivier; Rappo, Samuel; Ritzmann, Sébastien; Schumacher, Paolo; Terrettaz, Céline; Benaglio, Paola; Falquet, Laurent; Farinelli, Laurent; Gharib, Walid H.; Goesmann, Alexander; Harshman, Keith; Linke, Burkhard; Miyazaki, Ryo; Rivolta, Carlo; Robinson-Rechavi, Marc; van der Meer, Jan Roelof; Greub, Gilbert

    2015-01-01

    With the widespread availability of high-throughput sequencing technologies, sequencing projects have become pervasive in the molecular life sciences. The huge bulk of data generated daily must be analyzed further by biologists with skills in bioinformatics and by “embedded bioinformaticians,” i.e., bioinformaticians integrated in wet lab research groups. Thus, students interested in molecular life sciences must be trained in the main steps of genomics: sequencing, assembly, annotation and analysis. To reach that goal, a practical course has been set up for master's students at the University of Lausanne: the “Sequence a genome” class. At the beginning of the academic year, a few bacterial species whose genome is unknown are provided to the students, who sequence and assemble the genome(s) and perform manual annotation. Here, we report the progress of the first class from September 2010 to June 2011 and the results obtained by seven master's students who specifically assembled and annotated the genome of Estrella lausannensis, an obligate intracellular bacterium related to Chlamydia. The draft genome of Estrella is composed of 29 scaffolds encompassing 2,819,825 bp that encode 2233 putative proteins. Estrella also possesses a 9136 bp plasmid that encodes 14 genes, among which we found an integrase and a toxin/antitoxin module. Like all other members of the Chlamydiales order, Estrella possesses a highly conserved type III secretion system, considered as a key virulence factor. The annotation of the Estrella genome also allowed the characterization of the metabolic abilities of this strictly intracellular bacterium. Altogether, the students provided the scientific community with the Estrella genome sequence and a preliminary understanding of the biology of this recently discovered bacterial genus, while learning to use cutting-edge technologies for sequencing and to perform bioinformatics analyses. PMID:25745418

  4. A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers

    PubMed Central

    2012-01-01

    Background Next generation sequencing (NGS) technology has revolutionized genomic and genetic research. The pace of change in this area is rapid with three major new sequencing platforms having been released in 2011: Ion Torrent’s PGM, Pacific Biosciences’ RS and the Illumina MiSeq. Here we compare the results obtained with those platforms to the performance of the Illumina HiSeq, the current market leader. In order to compare these platforms, and get sufficient coverage depth to allow meaningful analysis, we have sequenced a set of 4 microbial genomes with mean GC content ranging from 19.3 to 67.7%. Together, these represent a comprehensive range of genome content. Here we report our analysis of that sequence data in terms of coverage distribution, bias, GC distribution, variant detection and accuracy. Results Sequence generated by Ion Torrent, MiSeq and Pacific Biosciences technologies displays near perfect coverage behaviour on GC-rich, neutral and moderately AT-rich genomes, but a profound bias was observed upon sequencing the extremely AT-rich genome of Plasmodium falciparum on the PGM, resulting in no coverage for approximately 30% of the genome. We analysed the ability to call variants from each platform and found that we could call slightly more variants from Ion Torrent data compared to MiSeq data, but at the expense of a higher false positive rate. Variant calling from Pacific Biosciences data was possible but higher coverage depth was required. Context-specific errors were observed in both PGM and MiSeq data, but not in that from the Pacific Biosciences platform. Conclusions All three fast turnaround sequencers evaluated here were able to generate usable sequence. However, there are key differences between the quality of that data and the applications it will support. PMID:22827831
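
    Mean GC content, used in this abstract to describe the four test genomes, is the share of G and C among called bases. A minimal Python sketch; counting only unambiguous A/C/G/T (so N is ignored) and the windowed helper `gc_by_window` are assumptions for illustration, not the study's analysis pipeline.

```python
def gc_content(seq):
    """Percent G+C among unambiguous A/C/G/T bases (case-insensitive).

    Assumption: ambiguity codes such as N are excluded from the denominator.
    """
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        raise ValueError("no A/C/G/T bases to count")
    return 100.0 * (seq.count("G") + seq.count("C")) / called


def gc_by_window(seq, window, step):
    """GC percentage of each full window, e.g. to compare coverage against local GC."""
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


print(round(gc_content("ATATGCNN"), 1))  # 33.3
print(gc_by_window("GGGGAAAA", 4, 4))    # [100.0, 0.0]
```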

  5. Bayesian Model Comparison and Parameter Inference in Systems Biology Using Nested Sampling

    PubMed Central

    Pullen, Nick; Morris, Richard J.

    2014-01-01

    Inferring parameters for models of biological processes is a current challenge in systems biology, as is the related problem of comparing competing models that explain the data. In this work we apply Skilling's nested sampling to address both of these problems. Nested sampling is a Bayesian method for exploring parameter space that transforms a multi-dimensional integral to a 1D integration over likelihood space. This approach focusses on the computation of the marginal likelihood or evidence. The ratio of evidences of different models leads to the Bayes factor, which can be used for model comparison. We demonstrate how nested sampling can be used to reverse-engineer a system's behaviour whilst accounting for the uncertainty in the results. The effect of missing initial conditions of the variables as well as unknown parameters is investigated. We show how the evidence and the model ranking can change as a function of the available data. Furthermore, the addition of data from extra variables of the system can deliver more information for model comparison than increasing the data from one variable, thus providing a basis for experimental design. PMID:24523891
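
    The core loop of nested sampling, and how two evidences give a Bayes factor, can be sketched for a one-parameter toy model. This Python code is not the authors' implementation: it assumes a uniform prior, the deterministic shrinkage estimate X_i = exp(-i/N) for the remaining prior mass, and plain rejection sampling for the constrained prior draw (practical codes use more efficient samplers). The Gaussian "likelihood" and both priors are illustrative stand-ins.

```python
import math
import random


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a < b:
        a, b = b, a
    if b == -math.inf:
        return a
    return a + math.log1p(math.exp(b - a))


def nested_sampling_log_evidence(log_likelihood, lower, upper,
                                 n_live=100, n_iter=600, seed=0):
    """Estimate log Z for a 1-D model with a uniform prior on [lower, upper].

    Each iteration removes the lowest-likelihood live point, credits it with
    the prior-mass shell X_{i-1} - X_i, with X_i = exp(-i / n_live), then
    replaces it by a prior draw whose likelihood exceeds the removed one.
    """
    rng = random.Random(seed)
    live = [rng.uniform(lower, upper) for _ in range(n_live)]  # parameter values
    live_logl = [log_likelihood(t) for t in live]
    log_z = -math.inf
    log_x_prev = 0.0  # log of remaining prior mass; X_0 = 1
    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=live_logl.__getitem__)
        logl_min = live_logl[worst]
        log_x = -i / n_live
        # log(X_{i-1} - X_i), computed without leaving log space
        log_shell = log_x_prev + math.log1p(-math.exp(log_x - log_x_prev))
        log_z = log_add(log_z, logl_min + log_shell)
        log_x_prev = log_x
        while True:  # rejection sampling from the prior until L > L_min
            theta = rng.uniform(lower, upper)
            logl = log_likelihood(theta)
            if logl > logl_min:
                live[worst], live_logl[worst] = theta, logl
                break
    # Remaining live points share the final prior mass X_final equally.
    for logl in live_logl:
        log_z = log_add(log_z, logl + log_x_prev - math.log(n_live))
    return log_z


def gaussian_log_likelihood(theta):
    """Standard normal log-density, a stand-in for a real data likelihood."""
    return -0.5 * theta * theta - 0.5 * math.log(2 * math.pi)


log_z_wide = nested_sampling_log_evidence(gaussian_log_likelihood, -5.0, 5.0)
log_z_narrow = nested_sampling_log_evidence(gaussian_log_likelihood, -2.0, 2.0)
print(f"log Z, prior U[-5, 5]: {log_z_wide:.2f} (analytic {math.log(0.1):.2f})")
print(f"Bayes factor, narrow vs wide prior: {math.exp(log_z_narrow - log_z_wide):.2f}")
```

    For these stand-ins the analytic evidences are about 0.1 (wide prior) and 0.239 (narrow prior), so the estimated Bayes factor should land near 2.4, favouring the narrower prior that wastes less prior mass. The estimate is stochastic, with a spread that shrinks as `n_live` grows.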

  6. Bayesian model comparison and parameter inference in systems biology using nested sampling.

    PubMed

    Pullen, Nick; Morris, Richard J

    2014-01-01

    Inferring parameters for models of biological processes is a current challenge in systems biology, as is the related problem of comparing competing models that explain the data. In this work we apply Skilling's nested sampling to address both of these problems. Nested sampling is a Bayesian method for exploring parameter space that transforms a multi-dimensional integral to a 1D integration over likelihood space. This approach focuses on the computation of the marginal likelihood or evidence. The ratio of evidences of different models leads to the Bayes factor, which can be used for model comparison. We demonstrate how nested sampling can be used to reverse-engineer a system's behaviour whilst accounting for the uncertainty in the results. The effect of missing initial conditions of the variables as well as unknown parameters is investigated. We show how the evidence and the model ranking can change as a function of the available data. Furthermore, the addition of data from extra variables of the system can deliver more information for model comparison than increasing the data from one variable, thus providing a basis for experimental design. PMID:24523891

  7. Sequence-related amplified polymorphism (SRAP) markers: A potential resource for studies in plant molecular biology

    PubMed Central

    Robarts, Daniel W. H.; Wolfe, Andrea D.

    2014-01-01

    In the past few decades, many investigations in the field of plant biology have employed selectively neutral, multilocus, dominant markers such as inter-simple sequence repeat (ISSR), random-amplified polymorphic DNA (RAPD), and amplified fragment length polymorphism (AFLP) to address hypotheses at lower taxonomic levels. More recently, sequence-related amplified polymorphism (SRAP) markers have been developed, which are used to amplify coding regions of DNA with primers targeting open reading frames. These markers have proven to be robust and highly variable, on par with AFLP, and are attained through a significantly less technically demanding process. SRAP markers have been used primarily for agronomic and horticultural purposes, developing quantitative trait loci in advanced hybrids and assessing genetic diversity of large germplasm collections. Here, we suggest that SRAP markers should be employed for research addressing hypotheses in plant systematics, biogeography, conservation, ecology, and beyond. We provide an overview of the SRAP literature to date, review descriptive statistics of SRAP markers in a subset of 171 publications, and present relevant case studies to demonstrate the applicability of SRAP markers to the diverse field of plant biology. Results of these selected works indicate that SRAP markers have the potential to enhance the current suite of molecular tools in a diversity of fields by providing an easy-to-use, highly variable marker with inherent biological significance. PMID:25202637

  8. A Comparison of Biological and Adoptive Mothers and Fathers: The Relevance of Biological Kinship and Gendered Constructs of Parenthood.

    ERIC Educational Resources Information Center

    Miall, Charlene E.; March, Karen

    2003-01-01

    Used qualitative interviews to examine beliefs and values about biological and adoptive parents. Considered how biological kinship, gender, and actual parenting behavior affect the assessments respondents made of the emotional bonding between parents and children. Found that biological and adoptive parents viewed motherhood as instinctive and…

  9. Mean-Field Analysis of Recursive Entropic Segmentation of Biological Sequences

    NASA Astrophysics Data System (ADS)

    Cheong, Siew-Ann; Stodghill, Paul; Schneider, David; Myers, Christopher

    2007-03-01

    Horizontal gene transfer in bacteria results in genomic sequences which are mosaic in nature. An important first step in the analysis of a bacterial genome would thus be to model the statistically nonstationary nucleotide or protein sequence with a collection of P stationary Markov chains, and partition the sequence of length N into M statistically stationary segments/domains. This can be done for Markov chains of order K = 0 using a recursive segmentation scheme based on the Jensen-Shannon divergence, where the unknown parameters P and M are estimated from a hypothesis testing/model selection process. In this talk, we describe how the Jensen-Shannon divergence can be generalized to Markov chains of order K > 0, as well as an algorithm optimizing the positions of a fixed number of domain walls. We then describe a mean field analysis of the generalized recursive Jensen-Shannon segmentation scheme, and show how most domain walls appear as local maxima in the divergence spectrum of the sequence, before highlighting the main problem associated with the recursive segmentation scheme, i.e. the strengths of the domain walls selected recursively do not decrease monotonically. This problem is especially severe in repetitive sequences, whose statistical signatures we will also discuss.
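
    For Markov chains of order K = 0, the recursive scheme in this abstract reduces to repeatedly cutting a sequence where the Jensen-Shannon divergence between base compositions is largest. A minimal Python sketch: the fixed stopping threshold (in bits) and `min_len` are simplifying assumptions standing in for the hypothesis-testing / model-selection step the abstract describes, and the helper names are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(counts, n):
    """Shannon entropy (bits) of a symbol composition with total size n."""
    return -sum((c / n) * math.log2(c / n) for c in counts.values() if c)


def best_split(seq):
    """Cut point maximizing the order-0 Jensen-Shannon divergence.

    D(i) = H(S) - (i/N) H(S[:i]) - ((N - i)/N) H(S[i:]); running counts avoid
    recounting each prefix. Returns (None, 0.0) if no cut has D > 0.
    """
    n = len(seq)
    total = Counter(seq)
    h_total = entropy_bits(total, n)
    left = Counter()
    best_pos, best_d = None, 0.0
    for i in range(1, n):
        left[seq[i - 1]] += 1
        right = total - left  # Counter subtraction keeps positive counts only
        d = (h_total - (i / n) * entropy_bits(left, i)
             - ((n - i) / n) * entropy_bits(right, n - i))
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d


def segment(seq, threshold=0.1, min_len=4, offset=0):
    """Recursively cut while the best divergence reaches a fixed threshold.

    Assumption: a fixed threshold replaces the statistical stopping rule.
    Returns sorted cut positions (domain walls) in the original coordinates.
    """
    pos, d = best_split(seq)
    if pos is None or d < threshold or min(pos, len(seq) - pos) < min_len:
        return []
    return (segment(seq[:pos], threshold, min_len, offset)
            + [offset + pos]
            + segment(seq[pos:], threshold, min_len, offset + pos))


print(segment("A" * 20 + "G" * 20 + "A" * 20))  # [20, 40]
```

    On this toy sequence the first wall found has a divergence of about 0.25 bits, while the wall found one level down has 1 bit: a small instance of recursively selected wall strengths not decreasing monotonically.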

  10. Unique nucleotide sequence (UNS)-guided assembly of repetitive DNA parts for synthetic biology applications

    PubMed Central

    Torella, Joseph P.; Lienert, Florian; Boehm, Christian R.; Chen, Jan-Hung; Way, Jeffrey C.; Silver, Pamela A.

    2016-01-01

    Recombination-based DNA construction methods, such as Gibson assembly, have made it possible to easily and simultaneously assemble multiple DNA parts and hold promise for the development and optimization of metabolic pathways and functional genetic circuits. Over time, however, these pathways and circuits have become more complex, and the increasing need for standardization and insulation of genetic parts has resulted in sequence redundancies (for example, repeated terminator and insulator sequences) that complicate recombination-based assembly. We and others have recently developed DNA assembly methods that we refer to collectively as unique nucleotide sequence (UNS)-guided assembly, in which individual DNA parts are flanked with UNSs to facilitate the ordered, recombination-based assembly of repetitive sequences. Here we present a detailed protocol for UNS-guided assembly that enables researchers to convert multiple DNA parts into sequenced, correctly assembled constructs, or into high-quality combinatorial libraries in only 2–3 days. If the DNA parts must be generated from scratch, an additional 2–5 days are necessary. This protocol requires no specialized equipment and can easily be implemented by a student with experience in basic cloning techniques. PMID:25101822

  11. Unique nucleotide sequence-guided assembly of repetitive DNA parts for synthetic biology applications

    SciTech Connect

    Torella, JP; Lienert, F; Boehm, CR; Chen, JH; Way, JC; Silver, PA

    2014-08-07

    Recombination-based DNA construction methods, such as Gibson assembly, have made it possible to easily and simultaneously assemble multiple DNA parts, and they hold promise for the development and optimization of metabolic pathways and functional genetic circuits. Over time, however, these pathways and circuits have become more complex, and the increasing need for standardization and insulation of genetic parts has resulted in sequence redundancies (for example, repeated terminator and insulator sequences) that complicate recombination-based assembly. We and others have recently developed DNA assembly methods, which we refer to collectively as unique nucleotide sequence (UNS)-guided assembly, in which individual DNA parts are flanked with UNSs to facilitate the ordered, recombination-based assembly of repetitive sequences. Here we present a detailed protocol for UNS-guided assembly that enables researchers to convert multiple DNA parts into sequenced, correctly assembled constructs, or into high-quality combinatorial libraries in only 2-3 d. If the DNA parts must be generated from scratch, an additional 2-5 d are necessary. This protocol requires no specialized equipment and can easily be implemented by a student with experience in basic cloning techniques.

  12. Nucleotide sequence of the phosphoglycerate kinase gene from the extreme thermophile Thermus thermophilus. Comparison of the deduced amino acid sequence with that of the mesophilic yeast phosphoglycerate kinase.

    PubMed Central

    Bowen, D; Littlechild, J A; Fothergill, J E; Watson, H C; Hall, L

    1988-01-01

    Using oligonucleotide probes derived from amino acid sequencing information, the structural gene for phosphoglycerate kinase from the extreme thermophile, Thermus thermophilus, was cloned in Escherichia coli and its complete nucleotide sequence determined. The gene consists of an open reading frame corresponding to a protein of 390 amino acid residues (calculated Mr 41,791) with an extreme bias for G or C (93.1%) in the codon third base position. Comparison of the deduced amino acid sequence with that of the corresponding mesophilic yeast enzyme indicated a number of significant differences. These are discussed in terms of the unusual codon bias and their possible role in enhanced protein thermal stability. PMID:3052437
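    The 93.1% figure above is a third-codon-position GC content, often called GC3. The minimal Python sketch below computes that percentage for an in-frame coding sequence and is an illustration only. The function name `gc3_content` and the toy input are hypothetical, and published analyses may also exclude stop codons or ambiguous bases.

```python
def gc3_content(cds: str) -> float:
    """Percentage of codons whose third base is G or C (GC3).

    Assumes `cds` is an in-frame coding sequence; a trailing partial
    codon is ignored and no codons (e.g. stop codons) are excluded.
    """
    seq = cds.upper()
    # Collect the base at position 3 of every complete codon.
    third_bases = [seq[i + 2] for i in range(0, len(seq) - 2, 3)]
    if not third_bases:
        raise ValueError("sequence shorter than one codon")
    gc = sum(1 for base in third_bases if base in "GC")
    return 100.0 * gc / len(third_bases)


# Hypothetical toy sequence, not the T. thermophilus pgk gene.
print(gc3_content("ATGGCCGAGCTGAAA"))  # → 80.0
```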

  13. The Complete Genome Sequence of Escherichia coli DH10B: Insights into the Biology of a Laboratory Workhorse

    PubMed Central

    Durfee, Tim; Nelson, Richard; Baldwin, Schuyler; Plunkett, Guy; Burland, Valerie; Mau, Bob; Petrosino, Joseph F.; Qin, Xiang; Muzny, Donna M.; Ayele, Mulu; Gibbs, Richard A.; Csörgő, Bálint; Pósfai, György; Weinstock, George M.; Blattner, Frederick R.

    2008-01-01

    Escherichia coli DH10B was designed for the propagation of large insert DNA library clones. It is used extensively, taking advantage of properties such as high DNA transformation efficiency and maintenance of large plasmids. The strain was constructed by serial genetic recombination steps, but the underlying sequence changes remained unverified. We report the complete genomic sequence of DH10B by using reads accumulated from the bovine sequencing project at Baylor College of Medicine and assembled with DNAStar's SeqMan genome assembler. The DH10B genome is largely colinear with that of the wild-type K-12 strain MG1655, although it is substantially more complex than previously appreciated, allowing DH10B biology to be further explored. The 226 mutated genes in DH10B relative to MG1655 are mostly attributable to the extensive genetic manipulations the strain has undergone. However, we demonstrate that DH10B has a 13.5-fold higher mutation rate than MG1655, resulting from a dramatic increase in insertion sequence (IS) transposition, especially IS150. IS elements appear to have remodeled genome architecture, providing homologous recombination sites for a 113,260-bp tandem duplication and an inversion. DH10B requires leucine for growth on minimal medium due to the deletion of leuLABCD and harbors both the relA1 and spoT1 alleles causing both sensitivity to nutritional downshifts and slightly lower growth rates relative to the wild type. Finally, while the sequence confirms most of the reported alleles, the sequence of deoR is wild type, necessitating reexamination of the assumed basis for the high transformability of DH10B. PMID:18245285

  14. Protein identities - Graphocephala atropunctata expressed sequenced tags: expanding leafhopper vector biology

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A small heat shock protein was isolated and sequenced from the Blue-green sharpshooter, BGSS, Graphocephala atropunctata (Signoret) (Hemiptera: Cicadellidae). The BGSS has been the native vector of Pierce’s disease in vineyards in California for nearly a century. The importance of this vector spec...

  15. Accelerated Integrated Science Sequence (AISS): An Introductory Biology, Chemistry, and Physics Course

    ERIC Educational Resources Information Center

    Purvis-Roberts, Kathleen L.; Edwalds-Gilbert, Gretchen; Landsberg, Adam S.; Copp, Newton; Ulsh, Lisa; Drew, David E.

    2009-01-01

    A new interdisciplinary, introductory science course was offered for the first time during the 2007-2008 school year. The purpose of the course is to introduce students to the idea of working at the intersections of biology, chemistry, and physics and to recognize interconnections between the disciplines. Interdisciplinary laboratories are a key…

  16. Biology-Chemistry-Physics, Teachers' Guide, a Three-Year Sequence, Parts I and II.

    ERIC Educational Resources Information Center

    Scott, Arthur; And Others

    This is one of two teacher's guides for a three-year integrated biology, chemistry, and physics course being prepared by the Portland Project Committee. This committee reviewed and selected material developed by the national course improvement groups--Physical Science Study Committee, Chemical Bond Approach, Chemical Education Materials Study,…

  17. Designing and Evaluating a Context-Based Lesson Sequence Promoting Conceptual Coherence in Biology

    ERIC Educational Resources Information Center

    Ummels, M. H. J.; Kamp, M. J. A.; de Kroon, H.; Boersma, K. Th.

    2015-01-01

    Context-based education, in which students deal with biological concepts in a meaningful way, is showing promise in promoting the development of students' conceptual coherence. However, literature gives little guidance about how this kind of education should be designed. Therefore, our study aims at designing and evaluating the practicability…

  18. Biological nutrient removal in a sequencing batch reactor operated as oxic/anoxic/extended-idle regime.

    PubMed

    Li, Xiao-ming; Chen, Hong-bo; Yang, Qi; Wang, Dong-bo; Luo, Kun; Zeng, Guang-ming

    2014-06-01

    Previous research has demonstrated that biological phosphorus removal from wastewater can be induced by an oxic/extended-idle (O/EI) regime. In this study, an anoxic period was introduced after aeration to achieve biological nutrient removal. A high nitrite accumulation ratio and polyhydroxyalkanoate biosynthesis were obtained during aeration, and biological nutrient removal was effectively achieved in the oxic/anoxic/extended-idle (O/A/EI) regime for the wastewater used. In addition, nitrogen and phosphorus removal performance in the O/A/EI regime was compared with that of conventional anaerobic/anoxic/aerobic (A(2)/O) and O/EI processes. The O/A/EI regime exhibited higher nitrogen and phosphorus removal than both the A(2)/O and O/EI processes. Larger populations of ammonium-oxidizing bacteria and polyphosphate-accumulating organisms, and a smaller population of glycogen-accumulating organisms, in the biomass might be the principal reason for the better nitrogen and phosphorus removal in the O/A/EI regime. Furthermore, biological nutrient removal with the O/A/EI regime was demonstrated with municipal wastewater. The average total nitrogen (TN), soluble orthophosphate (SOP) and chemical oxygen demand (COD) removal efficiencies were 93%, 95% and 87%, respectively. PMID:24393562
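    Removal efficiencies like those reported above are conventionally computed from influent and effluent concentrations. The abstract does not state its formula, so the expression below is the standard definition and is assumed rather than taken from the paper:

```latex
E = \frac{C_{\mathrm{in}} - C_{\mathrm{out}}}{C_{\mathrm{in}}} \times 100\%
```

    For example, hypothetical influent and effluent TN concentrations of 40 and 2.8 mg/L would give E = (40 - 2.8)/40 × 100% = 93%. These values are for illustration only and are not data from the study.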

  19. Sequencing Genetics Information: Integrating Data into Information Literacy for Undergraduate Biology Students

    ERIC Educational Resources Information Center

    MacMillan, Don

    2010-01-01

    This case study describes an information literacy lab for an undergraduate biology course that leads students through a range of resources to discover aspects of genetic information. The lab provides over 560 students per semester with the opportunity for hands-on exploration of resources in steps that simulate the pathways of higher-level…

  20. The Short ITS2 Sequence Serves as an Efficient Taxonomic Sequence Tag in Comparison with the Full-Length ITS

    PubMed Central

    Han, Jianping; Zhu, Yingjie; Chen, Xiaochen; Liao, Baoshen; Yao, Hui; Song, Jingyuan; Chen, Shilin; Meng, Fanyun

    2013-01-01

    An ideal DNA barcoding region should be short enough to be amplified from degraded DNA. In this paper, we discuss the possibility of using a short nuclear DNA sequence as a barcode to identify a wide range of medicinal plant species. First, the PCR and sequencing success rates of ITS and ITS2 were evaluated using only materials from dry medicinal products and herbarium voucher specimens, including some samples collected as long as 90 years ago. With a single primer pair, ITS2 achieved a PCR and sequencing success rate of 91%, whereas ITS achieved only 23%. Second, 12,861 ITS and ITS2 plant sequences were used to compare the identification efficiency of the two regions. Four identification criteria (BLAST, inter- and intradivergence Wilcoxon signed rank tests, and TaxonDNA) were evaluated. Our results supported the hypothesis that ITS2 can be used as a minibarcode to effectively identify species in a wide variety of specimens and medicinal materials. PMID:23484151

  1. Microbial Analysis of Bite Marks by Sequence Comparison of Streptococcal DNA

    PubMed Central

    Kennedy, Darnell M.; Stanton, Jo-Ann L.; García, José A.; Mason, Chris; Rand, Christy J.; Kieser, Jules A.; Tompkins, Geoffrey R.

    2012-01-01

    Bite mark injuries often feature in violent crimes. Conventional morphometric methods for the forensic analysis of bite marks involve elements of subjective interpretation that threaten the credibility of this field. Human DNA recovered from bite marks has the highest evidentiary value; however, recovery can be compromised by salivary components. This study assessed the feasibility of matching bacterial DNA sequences amplified from experimental bite marks to those obtained from the teeth responsible, with the aim of evaluating the capability of three genomic regions of streptococcal DNA to discriminate between participant samples. Bite mark and teeth swabs were collected from 16 participants. Bacterial DNA was extracted to provide the template for PCR primers specific for the streptococcal 16S ribosomal RNA (16S rRNA) gene, the 16S–23S intergenic spacer (ITS) and the RNA polymerase beta subunit (rpoB). High throughput sequencing (GS FLX 454), followed by stringent quality filtering, generated reads from bite marks for comparison with those generated from teeth samples. For all three regions, the greatest overlaps of identical reads were between bite mark samples and the corresponding teeth samples. The average proportions of reads identical between bite mark and corresponding teeth samples were 0.31, 0.41 and 0.31, and for non-corresponding samples were 0.11, 0.20 and 0.016, for 16S rRNA, ITS and rpoB, respectively. The probabilities of correctly distinguishing matching and non-matching teeth samples were 0.92 for ITS, 0.99 for 16S rRNA and 1.0 for rpoB. These findings strongly support the tenet that bacterial DNA amplified from bite marks and teeth can provide corroborating information in the identification of assailants. PMID:23284761
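    The abstract reports "proportions of reads identical" between samples without giving an exact definition. The Python sketch below assumes one plausible reading: the fraction of bite-mark reads whose exact sequence also occurs in a teeth sample. Under that reading, the candidate with the highest proportion is treated as the match. Both functions and the participant IDs are hypothetical, and this is not necessarily the study's procedure.

```python
from collections.abc import Iterable


def shared_read_proportion(bite_reads: Iterable[str], teeth_reads: Iterable[str]) -> float:
    """Fraction of bite-mark reads whose exact sequence also appears in the teeth reads.

    Assumption: reads are already quality-filtered and comparable in length.
    """
    bite = list(bite_reads)
    if not bite:
        return 0.0
    teeth = set(teeth_reads)  # exact-sequence lookup
    return sum(1 for read in bite if read in teeth) / len(bite)


def best_matching_teeth(bite_reads: Iterable[str], teeth_samples: dict[str, list[str]]) -> str:
    """Return the participant ID whose teeth reads share the largest proportion with the bite mark."""
    bite = list(bite_reads)
    return max(teeth_samples, key=lambda pid: shared_read_proportion(bite, teeth_samples[pid]))


bite = ["ACGT", "ACGA", "TTGA", "ACGT"]
teeth = ["ACGT", "TTGA", "CCCC"]
print(shared_read_proportion(bite, teeth))  # → 0.75
```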

  2. Co-evolutionary Models for Reconstructing Ancestral Genomic Sequences: Computational Issues and Biological Examples

    NASA Astrophysics Data System (ADS)

    Tuller, Tamir; Birin, Hadas; Kupiec, Martin; Ruppin, Eytan

    The inference of ancestral genomes is a fundamental problem in molecular evolution. Due to the statistical nature of this problem, the most likely or the most parsimonious ancestral genomes usually include considerable error rates. In general, these errors cannot be abolished by utilizing more exhaustive computational approaches, by using longer genomic sequences, or by analyzing more taxa. In recent studies we showed that co-evolution is an important force that can be used for significantly improving the inference of ancestral genome content.

  3. Complete genome sequence of the hyperthermophilic archaeon Thermococcus kodakaraensis KOD1 and comparison with Pyrococcus genomes

    PubMed Central

    Fukui, Toshiaki; Atomi, Haruyuki; Kanai, Tamotsu; Matsumi, Rie; Fujiwara, Shinsuke; Imanaka, Tadayuki

    2005-01-01

    The genus Thermococcus, composed of sulfur-reducing hyperthermophilic archaea, belongs to the order Thermococcales in Euryarchaeota along with the closely related genus Pyrococcus. The members of Thermococcus are ubiquitously present in natural high-temperature environments, and are therefore considered to play a major role in the ecology and metabolic activity of microbial consortia within hot-water ecosystems. To obtain insight into this important genus, we have determined and annotated the complete 2,088,737-base genome of Thermococcus kodakaraensis strain KOD1, followed by a comparison with the three complete genomes of Pyrococcus spp. A total of 2306 coding DNA sequences (CDSs) have been identified, among which half (1165 CDSs) are annotatable, whereas the functions of 41% (936 CDSs) cannot be predicted from the primary structures. The genome contains seven genes for probable transposases and four virus-related regions. Several proteins within these genetic elements show high similarities to those in Pyrococcus spp., implying the natural occurrence of horizontal gene transfer of such mobile elements among the order Thermococcales. Comparative genomics clarified that 1204 proteins, including those for information processing and basic metabolism, are shared among T. kodakaraensis and the three Pyrococcus spp. On the other hand, among the set of 689 proteins unique to T. kodakaraensis, there are several intriguing proteins that might be responsible for the specific traits of the genus Thermococcus, such as proteins involved in additional pyruvate oxidation, nucleotide metabolism, unique or additional metal ion transporters, an improved stress response system, and a distinct restriction system. PMID:15710748

  4. A comparison between equations describing in vivo MT: The effects of noise and sequence parameters

    NASA Astrophysics Data System (ADS)

    Cercignani, Mara; Barker, Gareth J.

    2008-04-01

    Quantitative models of magnetization transfer (MT) allow the estimation of physical properties of tissue which are thought to reflect myelination, and are therefore likely to be useful for clinical application. Although a model describing a two-pool system under continuous wave-saturation has been available for two decades, generalizing such a model to pulsed MT, and therefore to in vivo applications, is not straightforward, and only recently have a range of equations predicting the outcome of pulsed MT experiments been proposed. These solutions of the 2-pool model are based on differing assumptions and involve differing degrees of complexity, so their individual advantages and limitations are not always obvious. This paper is concerned with the comparison of three differing signal equations. After reviewing the theory behind each of them, their accuracy and precision are investigated using numerical simulations under variable experimental conditions such as degree of T1-weighting of the acquisition sequence and SNR, and the consistency of numerical results is tested using in vivo data. We show that while in conditions of minimal T1-weighting, high SNR, and large duty cycle the solutions of the three equations are consistent, they have a different tolerance to deviations from the basic assumptions behind their development, which should be taken into account when designing a quantitative MT protocol.

  5. Removal of typical endocrine disrupting chemicals by membrane bioreactor: in comparison with sequencing batch reactor.

    PubMed

    Zhou, Yingjun; Huang, Xia; Zhou, Haidong; Chen, Jianhua; Xue, Wenchao

    2011-01-01

    The removal of endocrine disrupting chemicals (EDCs) by a laboratory-scale membrane bioreactor (MBR) fed with synthetic sewage was evaluated and compared with that by a sequencing batch reactor (SBR) operated in parallel under the same conditions. Eight typical EDCs, namely 17β-estradiol (E2), estrone (E1), estriol (E3), 17α-ethinylestradiol (EE2), 4-octylphenol (4-OP), 4-nonylphenol (4-NP), bisphenol A (BPA) and nonylphenol ethoxylates (NPnEO), were spiked into the feed. Their concentrations in the influent, effluent and supernatant were determined by gas chromatography-mass spectrometry. The overall estrogenicity was evaluated as 17β-estradiol equivalent quantity (EEQ), determined via the yeast estrogen screen (YES) assay. E2, E3, BPA and 4-OP were well removed by both the MBR and the SBR, with removal rates above 95% and no significant differences between the two reactors. For the other four EDCs, whose removal rates were lower, the MBR performed better. Comparison between the supernatant and effluent of the two reactors indicated that membrane separation of sludge and effluent, compared with sedimentation, can improve elimination of the target EDCs and total estrogenicity. By applying different solids retention times (SRTs) (5, 10, 20 and 40 d) to the MBR, 10 and 5 d were found to be the lower critical SRTs for efficient removal of target EDCs and EEQ, respectively. PMID:22105134

  6. Identification of Simple Sequence Repeat Biomarkers through Cross-Species Comparison in a Tag Cloud Representation

    PubMed Central

    2014-01-01

    Simple sequence repeats (SSRs) are not only applied as genetic markers in evolutionary studies but also play an important role in gene regulatory activities. Efficient identification of conserved and exclusive SSRs through cross-species comparison is helpful for understanding the evolutionary mechanisms and associations between specific gene groups and SSR motifs. In this paper, we developed an online cross-species comparative system and integrated it with a tag cloud visualization technique for identifying potential SSR biomarkers within fourteen frequently used model species. Ultraconserved or exclusive SSRs among cross-species orthologous genes could be effectively retrieved and displayed through a user-friendly interface. Four different types of testing cases were applied to demonstrate and verify the retrieved SSR biomarker candidates. Through statistical analysis and an enhanced tag cloud representation of defined functionally related genes and cross-species clusters, the proposed system can correctly represent the patterns, loci, colors, and sizes of identified SSRs in accordance with gene functions, pattern qualities, and conserved characteristics among species. PMID:24800246

  7. Genetic diversity of human immunodeficiency virus type 2: evidence for distinct sequence subtypes with differences in virus biology.

    PubMed Central

    Gao, F; Yue, L; Robertson, D L; Hill, S C; Hui, H; Biggar, R J; Neequaye, A E; Whelan, T M; Ho, D D; Shaw, G M

    1994-01-01

    The virulence properties of human immunodeficiency virus type 2 (HIV-2) are known to vary significantly and to range from relative attenuation in certain individuals to high-level pathogenicity in others. These differences in clinical manifestations may, at least in part, be determined by genetic differences among infecting virus strains. Evaluation of the full spectrum of HIV-2 genetic diversity is thus a necessary first step towards understanding its molecular epidemiology, natural history of infection, and biological diversity. In this study, we have used nested PCR techniques to amplify viral sequences from the DNA of uncultured peripheral blood mononuclear cells from 12 patients with HIV-2 seroreactivity. Sequence analysis of four nonoverlapping genomic regions allowed a comprehensive analysis of HIV-2 phylogeny. The results revealed (i) the existence of five distinct and roughly equidistant evolutionary lineages of HIV-2 which, by analogy with HIV-1, have been termed sequence subtypes A to E; (ii) evidence for a mosaic HIV-2 genome, indicating that coinfection with genetically divergent strains and recombination can occur in HIV-2-infected individuals; and (iii) evidence supporting the conclusion that some of the HIV-2 subtypes may have arisen from independent introductions of genetically diverse sooty mangabey viruses into the human population. Importantly, only a subset of HIV-2 strains replicated in culture: all subtype A viruses grew to high titers, but attempts to isolate representatives of subtypes C, D, and E, as well as the majority of subtype B viruses, remained unsuccessful. Infection with all five viral subtypes was detectable by commercially available serological (Western immunoblot) assays, despite intersubtype sequence differences of up to 25% in the gag, pol, and env regions. These results indicate that the genetic and biological diversity of HIV-2 is far greater than previously appreciated and suggest that there may be subtype
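    Figures such as "intersubtype sequence differences of up to 25%" are typically proportions of differing sites between aligned sequences. The Python sketch below computes an uncorrected p-distance, skipping gapped sites. It only illustrates the concept: the abstract does not say which distance measure underlies the 25% figure, and the function name and example sequences are hypothetical.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences of equal length.

    Sites where either sequence has a gap ('-') are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore alignment gaps
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


print(p_distance("ATGCATGC", "ATGAATGG"))  # → 0.25
```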

  8. Comparison of the Biological Characteristics of Mesenchymal Stem Cells Derived from Bone Marrow and Skin.

    PubMed

    Liu, Ruifeng; Chang, Wenjuan; Wei, Hong; Zhang, Kaiming

    2016-01-01

    Mesenchymal stem cells (MSCs) exhibit high proliferation and self-renewal capabilities and are critical for tissue repair and regeneration during ontogenesis. They also play a role in immunomodulation. MSCs can be isolated from a variety of tissues and have many potential applications in the clinical setting. However, MSCs of different origins may possess different biological characteristics. In this study, we performed a comprehensive comparison of MSCs isolated from bone marrow and skin (BMMSCs and SMSCs, resp.), including analysis of the skin sampling area, separation method, culture conditions, primary and passage culture times, cell surface markers, multipotency, cytokine secretion, gene expression, and fibroblast-like features. The results showed that the MSCs from both sources had similar cell morphologies, surface markers, and differentiation capacities. However, the two cell types exhibited major differences in growth characteristics; the primary culture time of BMMSCs was significantly shorter than that of SMSCs, whereas the growth rate of BMMSCs was lower than that of SMSCs after passaging. Moreover, differences in gene expression and cytokine secretion profiles were observed. For example, secretion of proliferative cytokines was significantly higher for SMSCs than for BMMSCs. Our findings provide insights into the different biological functions of both cell types. PMID:27239202

  9. Structural Identifiability of Systems Biology Models: A Critical Comparison of Methods

    PubMed Central

    Chis, Oana-Teodora; Banga, Julio R.; Balsa-Canto, Eva

    2011-01-01

    Analysing the properties of a biological system through in silico experimentation requires a satisfactory mathematical representation of the system, including accurate values of the model parameters. Fortunately, modern experimental techniques make it possible to obtain time-series data of appropriate quality, which may then be used to estimate unknown parameters. However, in many cases, a subset of those parameters cannot be uniquely estimated, regardless of the experimental data available or the numerical techniques used for estimation. This lack of identifiability is related to the structure of the model, i.e. the system dynamics plus the observation function. Despite the interest in knowing a priori whether there is any chance of uniquely estimating all unknown model parameters, structural identifiability analysis for general non-linear dynamic models is still an open question. There is no method amenable to every model, so at some point one has to choose among the available possibilities. This work presents a critical comparison of the currently available techniques. To this end, we perform the structural identifiability analysis of a collection of biological models. The results reveal that the generating series approach, in combination with identifiability tableaus, offers the most advantageous compromise among range of applicability, computational complexity and information provided. PMID:22132135
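    A textbook-style toy model, which is not one of the models analysed in the paper, shows what structural non-identifiability means. Suppose a species decays through two parallel first-order pathways with rate constants k_1 and k_2, and only its own concentration is observed:

```latex
\dot{x}(t) = -(k_1 + k_2)\,x(t), \quad x(0) = x_0, \quad y(t) = x(t)
\quad\Longrightarrow\quad y(t) = x_0\,e^{-(k_1 + k_2)\,t}
```

    Any pair (k_1', k_2') with k_1' + k_2' = k_1 + k_2 produces an identical output y(t). Only the sum can be estimated, however noise-free or densely sampled the data. Changing the observation function, for instance by also measuring the product formed through the k_1 pathway, could restore identifiability of the individual constants.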

  10. Comparison of the Biological Characteristics of Mesenchymal Stem Cells Derived from Bone Marrow and Skin

    PubMed Central

    Liu, Ruifeng; Chang, Wenjuan; Wei, Hong; Zhang, Kaiming

    2016-01-01

    Mesenchymal stem cells (MSCs) exhibit high proliferation and self-renewal capabilities and are critical for tissue repair and regeneration during ontogenesis. They also play a role in immunomodulation. MSCs can be isolated from a variety of tissues and have many potential applications in the clinical setting. However, MSCs of different origins may possess different biological characteristics. In this study, we performed a comprehensive comparison of MSCs isolated from bone marrow and skin (BMMSCs and SMSCs, resp.), including analysis of the skin sampling area, separation method, culture conditions, primary and passage culture times, cell surface markers, multipotency, cytokine secretion, gene expression, and fibroblast-like features. The results showed that the MSCs from both sources had similar cell morphologies, surface markers, and differentiation capacities. However, the two cell types exhibited major differences in growth characteristics; the primary culture time of BMMSCs was significantly shorter than that of SMSCs, whereas the growth rate of BMMSCs was lower than that of SMSCs after passaging. Moreover, differences in gene expression and cytokine secretion profiles were observed. For example, secretion of proliferative cytokines was significantly higher for SMSCs than for BMMSCs. Our findings provide insights into the different biological functions of both cell types. PMID:27239202