Sample records for biological sequence simulation

  1. Use of Internet Resources in the Biology Lecture Classroom.

    ERIC Educational Resources Information Center

    Francis, Joseph W.

    2000-01-01

    Introduces internet resources that are available for instructional use in biology classrooms. Provides information on video-based technologies to create and capture video sequences, interactive web sites that allow interaction with biology simulations, online texts, and interactive videos that display animated video sequences. (YDS)

  2. Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing

    PubMed Central

    2012-01-01

    Background RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression with both high-throughput and high resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available, these can be utilised to increase the numbers of samples or replicates profiled at the cost of decreased sequencing depth generated per sample. These strategies impact on the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. Results Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices. Conclusions This work quantitatively explores comparisons between contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates relative to library (technical) replicates and sequencing depth. Strikingly, sequencing depth could be reduced as low as 15% without substantial impacts on false positive or true positive rates. PMID:22985019

  3. Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing.

    PubMed

    Robles, José A; Qureshi, Sumaira E; Stephen, Stuart J; Wilson, Susan R; Burden, Conrad J; Taylor, Jennifer M

    2012-09-17

    RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression with both high-throughput and high resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available, these can be utilised to increase the numbers of samples or replicates profiled at the cost of decreased sequencing depth generated per sample. These strategies impact on the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices. This work quantitatively explores comparisons between contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates relative to library (technical) replicates and sequencing depth. Strikingly, sequencing depth could be reduced as low as 15% without substantial impacts on false positive or true positive rates.

  4. High performance computing in biology: multimillion atom simulations of nanoscale systems

    PubMed Central

    Sanbonmatsu, K. Y.; Tung, C.-S.

    2007-01-01

    Computational methods have been used in biology for sequence analysis (bioinformatics), all-atom simulation (molecular dynamics and quantum calculations), and more recently for modeling biological networks (systems biology). Of these three techniques, all-atom simulation is currently the most computationally demanding, in terms of compute load, communication speed, and memory load. Breakthroughs in electrostatic force calculation and dynamic load balancing have enabled molecular dynamics simulations of large biomolecular complexes. Here, we report simulation results for the ribosome, using approximately 2.64 million atoms, the largest all-atom biomolecular simulation published to date. Several other nanoscale systems with different numbers of atoms were studied to measure the performance of the NAMD molecular dynamics simulation program on the Los Alamos National Laboratory Q Machine. We demonstrate that multimillion atom systems represent a 'sweet spot' for the NAMD code on large supercomputers. NAMD displays an unprecedented 85% parallel scaling efficiency for the ribosome system on 1024 CPUs. We also review recent targeted molecular dynamics simulations of the ribosome that prove useful for studying conformational changes of this large biomolecular complex in atomic detail. PMID:17187988

  5. The effects of computer-simulated experiments on high school biology students' problem-solving skills and achievement

    NASA Astrophysics Data System (ADS)

    Carmack, Gay Lynn Dickinson

    2000-10-01

    This two-part quasi-experimental repeated measures study examined whether computer simulated experiments have an effect on the problem solving skills of high school biology students in a school-within-a-school magnet program. Specifically, the study identified episodes in a simulation sequence where problem solving skills improved. In the Fall academic semester, experimental group students (n = 30) were exposed to two simulations: CaseIt! and EVOLVE!. Control group students participated in an internet research project and a paper Hardy-Weinberg activity. In the Spring academic semester, experimental group students were exposed to three simulations: Genetics Construction Kit, CaseIt! and EVOLVE! . Spring control group students participated in a Drosophila lab, an internet research project, and Advanced Placement lab 8. Results indicate that the Fall and Spring experimental groups experienced significant gains in scientific problem solving after the second simulation in the sequence. These gains were independent of the simulation sequence or the amount of time spent on the simulations. These gains were significantly greater than control group scores in the Fall. The Spring control group significantly outscored all other study groups on both pretest measures. Even so, the Spring experimental group problem solving performance caught up to the Spring control group performance after the third simulation. There were no significant differences between control and experimental groups on content achievement. Results indicate that CSE is as effective as traditional laboratories in promoting scientific problem solving and that CSE is a useful tool for improving students' scientific problem solving skills. Moreover, retention of problem solving skills is enhanced by utilizing more than one simulation.

  6. Pydna: a simulation and documentation tool for DNA assembly strategies using python.

    PubMed

    Pereira, Filipa; Azevedo, Flávio; Carvalho, Ângela; Ribeiro, Gabriela F; Budde, Mark W; Johansson, Björn

    2015-05-02

    Recent advances in synthetic biology have provided tools to efficiently construct complex DNA molecules which are an important part of many molecular biology and biotechnology projects. The planning of such constructs has traditionally been done manually using a DNA sequence editor which becomes error-prone as scale and complexity of the construction increase. A human-readable formal description of cloning and assembly strategies, which also allows for automatic computer simulation and verification, would therefore be a valuable tool. We have developed pydna, an extensible, free and open source Python library for simulating basic molecular biology DNA unit operations such as restriction digestion, ligation, PCR, primer design, Gibson assembly and homologous recombination. A cloning strategy expressed as a pydna script provides a description that is complete, unambiguous and stable. Execution of the script automatically yields the sequence of the final molecule(s) and that of any intermediate constructs. Pydna has been designed to be understandable for biologists with limited programming skills by providing interfaces that are semantically similar to the description of molecular biology unit operations found in literature. Pydna simplifies both the planning and sharing of cloning strategies and is especially useful for complex or combinatorial DNA molecule construction. An important difference compared to existing tools with similar goals is the use of Python instead of a specifically constructed language, providing a simulation environment that is more flexible and extensible by the user.

  7. Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores.

    PubMed

    Bastien, Olivier; Maréchal, Eric

    2008-08-07

    Confidence in pairwise alignments of biological sequences, obtained by various methods such as Blast or Smith-Waterman, is critical for automatic analyses of genomic data. Two statistical models have been proposed. In the asymptotic limit of long sequences, the Karlin-Altschul model is based on the computation of a P-value, assuming that the number of high scoring matching regions above a threshold is Poisson distributed. Alternatively, the Lipman-Pearson model is based on the computation of a Z-value from a random score distribution obtained by a Monte-Carlo simulation. Z-values allow the deduction of an upper bound of the P-value (1/Z-value2) following the TULIP theorem. Simulations of Z-value distribution is known to fit with a Gumbel law. This remarkable property was not demonstrated and had no obvious biological support. We built a model of evolution of sequences based on aging, as meant in Reliability Theory, using the fact that the amount of information shared between an initial sequence and the sequences in its lineage (i.e., mutual information in Information Theory) is a decreasing function of time. This quantity is simply measured by a sequence alignment score. In systems aging, the failure rate is related to the systems longevity. The system can be a machine with structured components, or a living entity or population. "Reliability" refers to the ability to operate properly according to a standard. Here, the "reliability" of a sequence refers to the ability to conserve a sufficient functional level at the folded and maturated protein level (positive selection pressure). Homologous sequences were considered as systems 1) having a high redundancy of information reflected by the magnitude of their alignment scores, 2) which components are the amino acids that can independently be damaged by random DNA mutations. From these assumptions, we deduced that information shared at each amino acid position evolved with a constant rate, corresponding to the information hazard rate, and that pairwise sequence alignment scores should follow a Gumbel distribution, which parameters could find some theoretical rationale. In particular, one parameter corresponds to the information hazard rate. Extreme value distribution of alignment scores, assessed from high scoring segments pairs following the Karlin-Altschul model, can also be deduced from the Reliability Theory applied to molecular sequences. It reflects the redundancy of information between homologous sequences, under functional conservative pressure. This model also provides a link between concepts of biological sequence analysis and of systems biology.

  8. Computational complexity of algorithms for sequence comparison, short-read assembly and genome alignment.

    PubMed

    Baichoo, Shakuntala; Ouzounis, Christos A

    A multitude of algorithms for sequence comparison, short-read assembly and whole-genome alignment have been developed in the general context of molecular biology, to support technology development for high-throughput sequencing, numerous applications in genome biology and fundamental research on comparative genomics. The computational complexity of these algorithms has been previously reported in original research papers, yet this often neglected property has not been reviewed previously in a systematic manner and for a wider audience. We provide a review of space and time complexity of key sequence analysis algorithms and highlight their properties in a comprehensive manner, in order to identify potential opportunities for further research in algorithm or data structure optimization. The complexity aspect is poised to become pivotal as we will be facing challenges related to the continuous increase of genomic data on unprecedented scales and complexity in the foreseeable future, when robust biological simulation at the cell level and above becomes a reality. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. Sequencing Genetics Information: Integrating Data into Information Literacy for Undergraduate Biology Students

    ERIC Educational Resources Information Center

    MacMillan, Don

    2010-01-01

    This case study describes an information literacy lab for an undergraduate biology course that leads students through a range of resources to discover aspects of genetic information. The lab provides over 560 students per semester with the opportunity for hands-on exploration of resources in steps that simulate the pathways of higher-level…

  10. An experimental phylogeny to benchmark ancestral sequence reconstruction

    PubMed Central

    Randall, Ryan N.; Radford, Caelan E.; Roof, Kelsey A.; Natarajan, Divya K.; Gaucher, Eric A.

    2016-01-01

    Ancestral sequence reconstruction (ASR) is a still-burgeoning method that has revealed many key mechanisms of molecular evolution. One criticism of the approach is an inability to validate its algorithms within a biological context as opposed to a computer simulation. Here we build an experimental phylogeny using the gene of a single red fluorescent protein to address this criticism. The evolved phylogeny consists of 19 operational taxonomic units (leaves) and 17 ancestral bifurcations (nodes) that display a wide variety of fluorescent phenotypes. The 19 leaves then serve as ‘modern' sequences that we subject to ASR analyses using various algorithms and to benchmark against the known ancestral genotypes and ancestral phenotypes. We confirm computer simulations that show all algorithms infer ancient sequences with high accuracy, yet we also reveal wide variation in the phenotypes encoded by incorrectly inferred sequences. Specifically, Bayesian methods incorporating rate variation significantly outperform the maximum parsimony criterion in phenotypic accuracy. Subsampling of extant sequences had minor effect on the inference of ancestral sequences. PMID:27628687

  11. Structurally detailed coarse-grained model for Sec-facilitated co-translational protein translocation and membrane integration

    PubMed Central

    Miller, Thomas F.

    2017-01-01

    We present a coarse-grained simulation model that is capable of simulating the minute-timescale dynamics of protein translocation and membrane integration via the Sec translocon, while retaining sufficient chemical and structural detail to capture many of the sequence-specific interactions that drive these processes. The model includes accurate geometric representations of the ribosome and Sec translocon, obtained directly from experimental structures, and interactions parameterized from nearly 200 μs of residue-based coarse-grained molecular dynamics simulations. A protocol for mapping amino-acid sequences to coarse-grained beads enables the direct simulation of trajectories for the co-translational insertion of arbitrary polypeptide sequences into the Sec translocon. The model reproduces experimentally observed features of membrane protein integration, including the efficiency with which polypeptide domains integrate into the membrane, the variation in integration efficiency upon single amino-acid mutations, and the orientation of transmembrane domains. The central advantage of the model is that it connects sequence-level protein features to biological observables and timescales, enabling direct simulation for the mechanistic analysis of co-translational integration and for the engineering of membrane proteins with enhanced membrane integration efficiency. PMID:28328943

  12. Rényi continuous entropy of DNA sequences.

    PubMed

    Vinga, Susana; Almeida, Jonas S

    2004-12-07

    Entropy measures of DNA sequences estimate their randomness or, inversely, their repeatability. L-block Shannon discrete entropy accounts for the empirical distribution of all length-L words and has convergence problems for finite sequences. A new entropy measure that extends Shannon's formalism is proposed. Renyi's quadratic entropy calculated with Parzen window density estimation method applied to CGR/USM continuous maps of DNA sequences constitute a novel technique to evaluate sequence global randomness without some of the former method drawbacks. The asymptotic behaviour of this new measure was analytically deduced and the calculation of entropies for several synthetic and experimental biological sequences was performed. The results obtained were compared with the distributions of the null model of randomness obtained by simulation. The biological sequences have shown a different p-value according to the kernel resolution of Parzen's method, which might indicate an unknown level of organization of their patterns. This new technique can be very useful in the study of DNA sequence complexity and provide additional tools for DNA entropy estimation. The main MATLAB applications developed and additional material are available at the webpage . Specialized functions can be obtained from the authors.

  13. Simulating Next-Generation Sequencing Datasets from Empirical Mutation and Sequencing Models

    PubMed Central

    Stephens, Zachary D.; Hudson, Matthew E.; Mainzer, Liudmila S.; Taschuk, Morgan; Weber, Matthew R.; Iyer, Ravishankar K.

    2016-01-01

    An obstacle to validating and benchmarking methods for genome analysis is that there are few reference datasets available for which the “ground truth” about the mutational landscape of the sample genome is known and fully validated. Additionally, the free and public availability of real human genome datasets is incompatible with the preservation of donor privacy. In order to better analyze and understand genomic data, we need test datasets that model all variants, reflecting known biology as well as sequencing artifacts. Read simulators can fulfill this requirement, but are often criticized for limited resemblance to true data and overall inflexibility. We present NEAT (NExt-generation sequencing Analysis Toolkit), a set of tools that not only includes an easy-to-use read simulator, but also scripts to facilitate variant comparison and tool evaluation. NEAT has a wide variety of tunable parameters which can be set manually on the default model or parameterized using real datasets. The software is freely available at github.com/zstephens/neat-genreads. PMID:27893777

  14. Modeling Structure-Function Relationships in Synthetic DNA Sequences using Attribute Grammars

    PubMed Central

    Cai, Yizhi; Lux, Matthew W.; Adam, Laura; Peccoud, Jean

    2009-01-01

    Recognizing that certain biological functions can be associated with specific DNA sequences has led various fields of biology to adopt the notion of the genetic part. This concept provides a finer level of granularity than the traditional notion of the gene. However, a method of formally relating how a set of parts relates to a function has not yet emerged. Synthetic biology both demands such a formalism and provides an ideal setting for testing hypotheses about relationships between DNA sequences and phenotypes beyond the gene-centric methods used in genetics. Attribute grammars are used in computer science to translate the text of a program source code into the computational operations it represents. By associating attributes with parts, modifying the value of these attributes using rules that describe the structure of DNA sequences, and using a multi-pass compilation process, it is possible to translate DNA sequences into molecular interaction network models. These capabilities are illustrated by simple example grammars expressing how gene expression rates are dependent upon single or multiple parts. The translation process is validated by systematically generating, translating, and simulating the phenotype of all the sequences in the design space generated by a small library of genetic parts. Attribute grammars represent a flexible framework connecting parts with models of biological function. They will be instrumental for building mathematical models of libraries of genetic constructs synthesized to characterize the function of genetic parts. This formalism is also expected to provide a solid foundation for the development of computer assisted design applications for synthetic biology. PMID:19816554

  15. Sub-Terrahertz Spectroscopy of E.COLI Dna: Experiment, Statistical Model, and MD Simulations

    NASA Astrophysics Data System (ADS)

    Sizov, I.; Dorofeeva, T.; Khromova, T.; Gelmont, B.; Globus, T.

    2012-06-01

    We will present result of combined experimental and computational study of sub-THz absorption spectra from Escherichia coli (E.coli) DNA. Measurements were conducted using a Bruker FTIR spectrometer with a liquid helium cooled bolometer and a recently developed frequency domain sensor operating at room temperature, with spectral resolution of 0.25 cm-1 and 0.03 cm-1, correspondingly. We have earlier demonstrated that molecular dynamics (MD) simulation can be effectively applied for characterizing relatively small biological molecules, such as transfer RNA or small protein thioredoxin from E. coli , and help to understand and predict their absorption spectra. Large size of DNA macromolecules ( 5 million base pairs for E. coli DNA) prevents, however, direct application of MD simulation at the current level of computational capabilities. Therefore, by applying a second order Markov chain approach and Monte-Carlo technique, we have developed a new statistical model to construct DNA sequences from biological cells. These short representative sequences (20-60 base pairs) are built upon the most frequently repeated fragments (2-10 base pairs) in the original DNA. Using this new approach, we constructed DNA sequences for several non-pathogenic strains of E.coli, including a well-known strain BL21, uro-pathogenic strain, CFT073, and deadly EDL933 strain (O157:H7), and used MD simulations to calculate vibrational absorption spectra of these strains. Significant differences are clearly present in spectra of strains in averaged spectra and in all components for particular orientations. The mechanism of interaction of THz radiation with a biological molecule is studied by analyzing dynamics of atoms and correlation of local vibrations in the modeled molecule. Simulated THz vibrational spectra of DNA are compared with experimental results. With the spectral resolution of 0.1 cm-1 or better, which is now available in experiments, the very easy discrimination between different strains of the same bacteria becomes possible.

  16. Interpreting Microbial Biosynthesis in the Genomic Age: Biological and Practical Considerations

    PubMed Central

    Miller, Ian J.; Chevrette, Marc G.; Kwan, Jason C.

    2017-01-01

    Genome mining has become an increasingly powerful, scalable, and economically accessible tool for the study of natural product biosynthesis and drug discovery. However, there remain important biological and practical problems that can complicate or obscure biosynthetic analysis in genomic and metagenomic sequencing projects. Here, we focus on limitations of available technology as well as computational and experimental strategies to overcome them. We review the unique challenges and approaches in the study of symbiotic and uncultured systems, as well as those associated with biosynthetic gene cluster (BGC) assembly and product prediction. Finally, to explore sequencing parameters that affect the recovery and contiguity of large and repetitive BGCs assembled de novo, we simulate Illumina and PacBio sequencing of the Salinispora tropica genome focusing on assembly of the salinilactam (slm) BGC. PMID:28587290

  17. Combining protein sequence, structure, and dynamics: A novel approach for functional evolution analysis of PAS domain superfamily.

    PubMed

    Dong, Zheng; Zhou, Hongyu; Tao, Peng

    2018-02-01

    PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore functional evolutionary relationship among proteins in the PAS domain superfamily in view of the sequence-structure-dynamics-function relationship. We collected protein sequences and crystal structure data from RCSB Protein Data Bank of the PAS domain superfamily belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence-conserved residues and build phylogenetic tree. Three-dimensional structure alignment was also applied to obtain structure-conserved residues. The protein dynamics were analyzed using elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The result showed that the proteins with same function could be grouped by sequence similarity, and proteins in different functional groups displayed statistically significant difference in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have a lower fluctuation than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggested a direct connection in which the protein sequences were related to various functions through structural dynamics. This is a new attempt to delineate functional evolution of proteins using the integrated information of sequence, structure, and dynamics. © 2017 The Protein Society.

  18. SynBioSS designer: a web-based tool for the automated generation of kinetic models for synthetic biological constructs

    PubMed Central

    Weeding, Emma; Houle, Jason

    2010-01-01

    Modeling tools can play an important role in synthetic biology the same way modeling helps in other engineering disciplines: simulations can quickly probe mechanisms and provide a clear picture of how different components influence the behavior of the whole. We present a brief review of available tools and present SynBioSS Designer. The Synthetic Biology Software Suite (SynBioSS) is used for the generation, storing, retrieval and quantitative simulation of synthetic biological networks. SynBioSS consists of three distinct components: the Desktop Simulator, the Wiki, and the Designer. SynBioSS Designer takes as input molecular parts involved in gene expression and regulation (e.g. promoters, transcription factors, ribosome binding sites, etc.), and automatically generates complete networks of reactions that represent transcription, translation, regulation, induction and degradation of those parts. Effectively, Designer uses DNA sequences as input and generates networks of biomolecular reactions as output. In this paper we describe how Designer uses universal principles of molecular biology to generate models of any arbitrary synthetic biological system. These models are useful as they explain biological phenotypic complexity in mechanistic terms. In turn, such mechanistic explanations can assist in designing synthetic biological systems. We also discuss, giving practical guidance to users, how Designer interfaces with the Registry of Standard Biological Parts, the de facto compendium of parts used in synthetic biology applications. PMID:20639523

  19. GEM System: automatic prototyping of cell-wide metabolic pathway models from genomes.

    PubMed

    Arakawa, Kazuharu; Yamada, Yohei; Shinoda, Kosaku; Nakayama, Yoichi; Tomita, Masaru

    2006-03-23

    Successful realization of a "systems biology" approach to analyzing cells is a grand challenge for our understanding of life. However, current modeling approaches to cell simulation are labor-intensive, manual affairs, and therefore constitute a major bottleneck in the evolution of computational cell biology. We developed the Genome-based Modeling (GEM) System for the purpose of automatically prototyping simulation models of cell-wide metabolic pathways from genome sequences and other public biological information. Models generated by the GEM System include an entire Escherichia coli metabolism model comprising 968 reactions of 1195 metabolites, achieving 100% coverage when compared with the KEGG database, 92.38% with the EcoCyc database, and 95.06% with iJR904 genome-scale model. The GEM System prototypes qualitative models to reduce the labor-intensive tasks required for systems biology research. Models of over 90 bacterial genomes are available at our web site.

  20. Graphene Nanopores for Protein Sequencing.

    PubMed

    Wilson, James; Sloman, Leila; He, Zhiren; Aksimentiev, Aleksei

    2016-07-19

    An inexpensive, reliable method for protein sequencing is essential to unraveling the biological mechanisms governing cellular behavior and disease. Current protein sequencing methods suffer from limitations associated with the size of proteins that can be sequenced, the time, and the cost of the sequencing procedures. Here, we report the results of all-atom molecular dynamics simulations that investigated the feasibility of using graphene nanopores for protein sequencing. We focus our study on the biologically significant phenylalanine-glycine repeat peptides (FG-nups)-parts of the nuclear pore transport machinery. Surprisingly, we found FG-nups to behave similarly to single stranded DNA: the peptides adhere to graphene and exhibit step-wise translocation when subject to a transmembrane bias or a hydrostatic pressure gradient. Reducing the peptide's charge density or increasing the peptide's hydrophobicity was found to decrease the translocation speed. Yet, unidirectional and stepwise translocation driven by a transmembrane bias was observed even when the ratio of charged to hydrophobic amino acids was as low as 1:8. The nanopore transport of the peptides was found to produce stepwise modulations of the nanopore ionic current correlated with the type of amino acids present in the nanopore, suggesting that protein sequencing by measuring ionic current blockades may be possible.

  1. PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences.

    PubMed

    Mirarab, Siavash; Nguyen, Nam; Guo, Sheng; Wang, Li-San; Kim, Junhyong; Warnow, Tandy

    2015-05-01

    We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATé). We also show that trees estimated on PASTA alignments are highly accurate--slightly better than SATé trees, but with substantial improvements relative to other methods. Finally, PASTA is faster than SATé, highly parallelizable, and requires relatively little memory.

  2. Simulating Chemical-Induced Injury Using Virtual Hepatic Tissues

    EPA Science Inventory

    Chemical-induced liver injury involves a dynamic sequence of events that span multiple levels of biological organization. Current methods for testing the toxicity of a single chemical can cost millions of dollars, take up to two years and sacrifice thousands of animals. It is dif...

  3. Clustering evolving proteins into homologous families.

    PubMed

    Chan, Cheong Xin; Mahbob, Maisarah; Ragan, Mark A

    2013-04-08

    Clustering sequences into groups of putative homologs (families) is a critical first step in many areas of comparative biology and bioinformatics. The performance of clustering approaches in delineating biologically meaningful families depends strongly on characteristics of the data, including content bias and degree of divergence. New, highly scalable methods have recently been introduced to cluster the very large datasets being generated by next-generation sequencing technologies. However, there has been little systematic investigation of how characteristics of the data impact the performance of these approaches. Using clusters from a manually curated dataset as reference, we examined the performance of a widely used graph-based Markov clustering algorithm (MCL) and a greedy heuristic approach (UCLUST) in delineating protein families coded by three sets of bacterial genomes of different G+C content. Both MCL and UCLUST generated clusters that are comparable to the reference sets at specific parameter settings, although UCLUST tends to under-cluster compositionally biased sequences (G+C content 33% and 66%). Using simulated data, we sought to assess the individual effects of sequence divergence, rate heterogeneity, and underlying G+C content. Performance decreased with increasing sequence divergence, decreasing among-site rate variation, and increasing G+C bias. Two MCL-based methods recovered the simulated families more accurately than did UCLUST. MCL using local alignment distances is more robust across the investigated range of sequence features than are greedy heuristics using distances based on global alignment. Our results demonstrate that sequence divergence, rate heterogeneity and content bias can individually and in combination affect the accuracy with which MCL and UCLUST can recover homologous protein families. For application to data that are more divergent, and exhibit higher among-site rate variation and/or content bias, MCL may often be the better choice, especially if computational resources are not limiting.

  4. Computing Life

    ERIC Educational Resources Information Center

    National Institute of General Medical Sciences (NIGMS), 2009

    2009-01-01

    Computer advances now let researchers quickly search through DNA sequences to find gene variations that could lead to disease, simulate how flu might spread through one's school, and design three-dimensional animations of molecules that rival any video game. By teaming computers and biology, scientists can answer new and old questions that could…

  5. NRGC: a novel referential genome compression algorithm.

    PubMed

    Saha, Subrata; Rajasekaran, Sanguthevar

    2016-11-15

    Next-generation sequencing techniques produce millions to billions of short reads. The procedure is not only very cost effective but also can be done in laboratory environment. The state-of-the-art sequence assemblers then construct the whole genomic sequence from these reads. Current cutting edge computing technology makes it possible to build genomic sequences from the billions of reads within a minimal cost and time. As a consequence, we see an explosion of biological sequences in recent times. In turn, the cost of storing the sequences in physical memory or transmitting them over the internet is becoming a major bottleneck for research and future medical applications. Data compression techniques are one of the most important remedies in this context. We are in need of suitable data compression algorithms that can exploit the inherent structure of biological sequences. Although standard data compression algorithms are prevalent, they are not suitable to compress biological sequencing data effectively. In this article, we propose a novel referential genome compression algorithm (NRGC) to effectively and efficiently compress the genomic sequences. We have done rigorous experiments to evaluate NRGC by taking a set of real human genomes. The simulation results show that our algorithm is indeed an effective genome compression algorithm that performs better than the best-known algorithms in most of the cases. Compression and decompression times are also very impressive. The implementations are freely available for non-commercial purposes. They can be downloaded from: http://www.engr.uconn.edu/~rajasek/NRGC.zip CONTACT: rajasek@engr.uconn.edu. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  6. PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences

    PubMed Central

    Mirarab, Siavash; Nguyen, Nam; Guo, Sheng; Wang, Li-San; Kim, Junhyong

    2015-01-01

    Abstract We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATé). We also show that trees estimated on PASTA alignments are highly accurate—slightly better than SATé trees, but with substantial improvements relative to other methods. Finally, PASTA is faster than SATé, highly parallelizable, and requires relatively little memory. PMID:25549288

  7. Towards programming languages for genetic engineering of living cells

    PubMed Central

    Pedersen, Michael; Phillips, Andrew

    2009-01-01

    Synthetic biology aims at producing novel biological systems to carry out some desired and well-defined functions. An ultimate dream is to design these systems at a high level of abstraction using engineering-based tools and programming languages, press a button, and have the design translated to DNA sequences that can be synthesized and put to work in living cells. We introduce such a programming language, which allows logical interactions between potentially undetermined proteins and genes to be expressed in a modular manner. Programs can be translated by a compiler into sequences of standard biological parts, a process that relies on logic programming and prototype databases that contain known biological parts and protein interactions. Programs can also be translated to reactions, allowing simulations to be carried out. While current limitations on available data prevent full use of the language in practical applications, the language can be used to develop formal models of synthetic systems, which are otherwise often presented by informal notations. The language can also serve as a concrete proposal on which future language designs can be discussed, and can help to guide the emerging standard of biological parts which so far has focused on biological, rather than logical, properties of parts. PMID:19369220

  8. Towards programming languages for genetic engineering of living cells.

    PubMed

    Pedersen, Michael; Phillips, Andrew

    2009-08-06

    Synthetic biology aims at producing novel biological systems to carry out some desired and well-defined functions. An ultimate dream is to design these systems at a high level of abstraction using engineering-based tools and programming languages, press a button, and have the design translated to DNA sequences that can be synthesized and put to work in living cells. We introduce such a programming language, which allows logical interactions between potentially undetermined proteins and genes to be expressed in a modular manner. Programs can be translated by a compiler into sequences of standard biological parts, a process that relies on logic programming and prototype databases that contain known biological parts and protein interactions. Programs can also be translated to reactions, allowing simulations to be carried out. While current limitations on available data prevent full use of the language in practical applications, the language can be used to develop formal models of synthetic systems, which are otherwise often presented by informal notations. The language can also serve as a concrete proposal on which future language designs can be discussed, and can help to guide the emerging standard of biological parts which so far has focused on biological, rather than logical, properties of parts.

  9. Evolutionary distances in the twilight zone--a rational kernel approach.

    PubMed

    Schwarz, Roland F; Fletcher, William; Förster, Frank; Merget, Benjamin; Wolf, Matthias; Schultz, Jörg; Markowetz, Florian

    2010-12-31

    Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets.

  10. A Bayesian hierarchical model to detect differentially methylated loci from single nucleotide resolution sequencing data

    PubMed Central

    Feng, Hao; Conneely, Karen N.; Wu, Hao

    2014-01-01

    DNA methylation is an important epigenetic modification that has essential roles in cellular processes including gene regulation, development and disease and is widely dysregulated in most types of cancer. Recent advances in sequencing technology have enabled the measurement of DNA methylation at single nucleotide resolution through methods such as whole-genome bisulfite sequencing and reduced representation bisulfite sequencing. In DNA methylation studies, a key task is to identify differences under distinct biological contexts, for example, between tumor and normal tissue. A challenge in sequencing studies is that the number of biological replicates is often limited by the costs of sequencing. The small number of replicates leads to unstable variance estimation, which can reduce accuracy to detect differentially methylated loci (DML). Here we propose a novel statistical method to detect DML when comparing two treatment groups. The sequencing counts are described by a lognormal-beta-binomial hierarchical model, which provides a basis for information sharing across different CpG sites. A Wald test is developed for hypothesis testing at each CpG site. Simulation results show that the proposed method yields improved DML detection compared to existing methods, particularly when the number of replicates is low. The proposed method is implemented in the Bioconductor package DSS. PMID:24561809

  11. Simulation and estimation of gene number in a biological pathway using almost complete saturation mutagenesis screening of haploid mouse cells.

    PubMed

    Tokunaga, Masahiro; Kokubu, Chikara; Maeda, Yusuke; Sese, Jun; Horie, Kyoji; Sugimoto, Nakaba; Kinoshita, Taroh; Yusa, Kosuke; Takeda, Junji

    2014-11-24

    Genome-wide saturation mutagenesis and subsequent phenotype-driven screening has been central to a comprehensive understanding of complex biological processes in classical model organisms such as flies, nematodes, and plants. The degree of "saturation" (i.e., the fraction of possible target genes identified) has been shown to be a critical parameter in determining all relevant genes involved in a biological function, without prior knowledge of their products. In mammalian model systems, however, the relatively large scale and labor intensity of experiments have hampered the achievement of actual saturation mutagenesis, especially for recessive traits that require biallelic mutations to manifest detectable phenotypes. By exploiting the recently established haploid mouse embryonic stem cells (ESCs), we present an implementation of almost complete saturation mutagenesis in a mammalian system. The haploid ESCs were mutagenized with the chemical mutagen N-ethyl-N-nitrosourea (ENU) and processed for the screening of mutants defective in various steps of the glycosylphosphatidylinositol-anchor biosynthetic pathway. The resulting 114 independent mutant clones were characterized by a functional complementation assay, and were shown to be defective in any of 20 genes among all 22 known genes essential for this well-characterized pathway. Ten mutants were further validated by whole-exome sequencing. The predominant generation of single-nucleotide substitutions by ENU resulted in a gene mutation rate proportional to the length of the coding sequence, which facilitated the experimental design of saturation mutagenesis screening with the aid of computational simulation. Our study enables mammalian saturation mutagenesis to become a realistic proposition. Computational simulation, combined with a pilot mutagenesis experiment, could serve as a tool for the estimation of the number of genes essential for biological processes such as drug target pathways when a positive selection of mutants is available.

  12. Measuring the Evolutionary Rewiring of Biological Networks

    PubMed Central

    Shou, Chong; Bhardwaj, Nitin; Lam, Hugo Y. K.; Yan, Koon-Kiu; Kim, Philip M.; Snyder, Michael; Gerstein, Mark B.

    2011-01-01

    We have accumulated a large amount of biological network data and expect even more to come. Soon, we anticipate being able to compare many different biological networks as we commonly do for molecular sequences. It has long been believed that many of these networks change, or “rewire”, at different rates. It is therefore important to develop a framework to quantify the differences between networks in a unified fashion. We developed such a formalism based on analogy to simple models of sequence evolution, and used it to conduct a systematic study of network rewiring on all the currently available biological networks. We found that, similar to sequences, biological networks show a decreased rate of change at large time divergences, because of saturation in potential substitutions. However, different types of biological networks consistently rewire at different rates. Using comparative genomics and proteomics data, we found a consistent ordering of the rewiring rates: transcription regulatory, phosphorylation regulatory, genetic interaction, miRNA regulatory, protein interaction, and metabolic pathway network, from fast to slow. This ordering was found in all comparisons we did of matched networks between organisms. To gain further intuition on network rewiring, we compared our observed rewirings with those obtained from simulation. We also investigated how readily our formalism could be mapped to other network contexts; in particular, we showed how it could be applied to analyze changes in a range of “commonplace” networks such as family trees, co-authorships and linux-kernel function dependencies. PMID:21253555

  13. Cell illustrator 4.0: a computational platform for systems biology.

    PubMed

    Nagasaki, Masao; Saito, Ayumu; Jeong, Euna; Li, Chen; Kojima, Kaname; Ikeda, Emi; Miyano, Satoru

    2011-01-01

    Cell Illustrator is a software platform for Systems Biology that uses the concept of Petri net for modeling and simulating biopathways. It is intended for biological scientists working at bench. The latest version of Cell Illustrator 4.0 uses Java Web Start technology and is enhanced with new capabilities, including: automatic graph grid layout algorithms using ontology information; tools using Cell System Markup Language (CSML) 3.0 and Cell System Ontology 3.0; parameter search module; high-performance simulation module; CSML database management system; conversion from CSML model to programming languages (FORTRAN, C, C++, Java, Python and Perl); import from SBML, CellML, and BioPAX; and, export to SVG and HTML. Cell Illustrator employs an extension of hybrid Petri net in an object-oriented style so that biopathway models can include objects such as DNA sequence, molecular density, 3D localization information, transcription with frame-shift, translation with codon table, as well as biochemical reactions.

  14. Cell Illustrator 4.0: a computational platform for systems biology.

    PubMed

    Nagasaki, Masao; Saito, Ayumu; Jeong, Euna; Li, Chen; Kojima, Kaname; Ikeda, Emi; Miyano, Satoru

    2010-01-01

    Cell Illustrator is a software platform for Systems Biology that uses the concept of Petri net for modeling and simulating biopathways. It is intended for biological scientists working at bench. The latest version of Cell Illustrator 4.0 uses Java Web Start technology and is enhanced with new capabilities, including: automatic graph grid layout algorithms using ontology information; tools using Cell System Markup Language (CSML) 3.0 and Cell System Ontology 3.0; parameter search module; high-performance simulation module; CSML database management system; conversion from CSML model to programming languages (FORTRAN, C, C++, Java, Python and Perl); import from SBML, CellML, and BioPAX; and, export to SVG and HTML. Cell Illustrator employs an extension of hybrid Petri net in an object-oriented style so that biopathway models can include objects such as DNA sequence, molecular density, 3D localization information, transcription with frame-shift, translation with codon table, as well as biochemical reactions.

  15. Occurrence probability of structured motifs in random sequences.

    PubMed

    Robin, S; Daudin, J-J; Richard, H; Sagot, M-F; Schbath, S

    2002-01-01

    The problem of extracting from a set of nucleic acid sequences motifs which may have biological function is more and more important. In this paper, we are interested in particular motifs that may be implicated in the transcription process. These motifs, called structured motifs, are composed of two ordered parts separated by a variable distance and allowing for substitutions. In order to assess their statistical significance, we propose approximations of the probability of occurrences of such a structured motif in a given sequence. An application of our method to evaluate candidate promoters in E. coli and B. subtilis is presented. Simulations show the goodness of the approximations.

  16. Highly multiplexed subcellular RNA sequencing in situ

    PubMed Central

    Lee, Je Hyuk; Daugharthy, Evan R.; Scheiman, Jonathan; Kalhor, Reza; Ferrante, Thomas C.; Yang, Joyce L.; Terry, Richard; Jeanty, Sauveur S. F.; Li, Chao; Amamoto, Ryoji; Peters, Derek T.; Turczyk, Brian M.; Marblestone, Adam H.; Inverso, Samuel A.; Bernard, Amy; Mali, Prashant; Rios, Xavier; Aach, John; Church, George M.

    2014-01-01

    Understanding the spatial organization of gene expression with single nucleotide resolution requires localizing the sequences of expressed RNA transcripts within a cell in situ. Here we describe fluorescent in situ RNA sequencing (FISSEQ), in which stably cross-linked cDNA amplicons are sequenced within a biological sample. Using 30-base reads from 8,742 genes in situ, we examined RNA expression and localization in human primary fibroblasts using a simulated wound healing assay. FISSEQ is compatible with tissue sections and whole mount embryos, and reduces the limitations of optical resolution and noisy signals on single molecule detection. Our platform enables massively parallel detection of genetic elements, including gene transcripts and molecular barcodes, and can be used to investigate cellular phenotype, gene regulation, and environment in situ. PMID:24578530

  17. A synthetic biology approach to the development of transcriptional regulatory models and custom enhancer design☆,☆☆

    PubMed Central

    Martinez, Carlos A.; Barr, Kenneth; Kim, Ah-Ram; Reinitz, John

    2013-01-01

    Synthetic biology offers novel opportunities for elucidating transcriptional regulatory mechanisms and enhancer logic. Complex cis-regulatory sequences—like the ones driving expression of the Drosophila even-skipped gene—have proven difficult to design from existing knowledge, presumably due to the large number of protein-protein interactions needed to drive the correct expression patterns of genes in multicellular organisms. This work discusses two novel computational methods for the custom design of enhancers that employ a sophisticated, empirically validated transcriptional model, optimization algorithms, and synthetic biology. These synthetic elements have both utilitarian and academic value, including improving existing regulatory models as well as evolutionary questions. The first method involves the use of simulated annealing to explore the sequence space for synthetic enhancers whose expression output fit a given search criterion. The second method uses a novel optimization algorithm to find functionally accessible pathways between two enhancer sequences. These paths describe a set of mutations wherein the predicted expression pattern does not significantly vary at any point along the path. Both methods rely on a predictive mathematical framework that maps the enhancer sequence space to functional output. PMID:23732772

  18. In silico characterization and analysis of RTBP1 and NgTRF1 protein through MD simulation and molecular docking - A comparative study.

    PubMed

    Mukherjee, Koel; Pandey, Dev Mani; Vidyarthi, Ambarish Saran

    2015-02-06

    Gaining access to sequence and structure information of telomere binding proteins helps in understanding the essential biological processes involve in conserved sequence specific interaction between DNA and the proteins. Rice telomere binding protein (RTBP1) and Nicotiana glutinosa telomere repeat binding factor (NgTRF1) are helix turn helix motif type of proteins that plays role in telomeric DNA protection and length regulation. Both the proteins share same type of domain but till now there is very less communication on the in silico studies of these complete proteins.Here we intend to do a comparative study between two proteins through modeling of the complete proteins, physiochemical characterization, MD simulation and DNA-protein docking. I-TASSER and CLC protein work bench was performed to find out the protein 3D structure as well as the different parameters to characterize the proteins. MD simulation was completed by GROMOS forcefield of GROMACS for 10 ns of time stretch. The simulated 3D structures were docked with template DNA (3D DNA modeled through 3D-DART) of TTTAGGG conserved sequence motif using HADDOCK web server.Digging up all the facts about the proteins it was reveled that around 120 amino acids in the tail part was showing a good sequence similarity between the proteins. Molecular modeling, sequence characterization and secondary structure prediction also indicates the similarity between the protein's structure and sequence. The result of MD simulation highlights on the RMSD, RMSF, Rg, PCA and Energy plots which also conveys the similar type of motional behavior between them. The best complex formation for both the proteins in docking result also indicates for the first interaction site which is mainly the helix3 region of the DNA binding domain. The overall computational analysis reveals that RTBP1 and NgTRF1 proteins display good amount of similarity in their physicochemical properties, structure, dynamics and binding mode.

  19. In Silico Characterization and Analysis of RTBP1 and NgTRF1 Protein Through MD Simulation and Molecular Docking: A Comparative Study.

    PubMed

    Mukherjee, Koel; Pandey, Dev Mani; Vidyarthi, Ambarish Saran

    2015-09-01

    Gaining access to sequence and structure information of telomere-binding proteins helps in understanding the essential biological processes involve in conserved sequence-specific interaction between DNA and the proteins. Rice telomere-binding protein (RTBP1) and Nicotiana glutinosa telomere repeat binding factor (NgTRF1) are helix-turn-helix motif type of proteins that plays role in telomeric DNA protection and length regulation. Both the proteins share same type of domain, but till now there is very less communication on the in silico studies of these complete proteins. Here we intend to do a comparative study between two proteins through modeling of the complete proteins, physiochemical characterization, MD simulation and DNA-protein docking. I-TASSER and CLC protein work bench was performed to find out the protein 3D structure as well as the different parameters to characterize the proteins. MD simulation was completed by GROMOS forcefield of GROMACS for 10 ns of time stretch. The simulated 3D structures were docked with template DNA (3D DNA modeled through 3D-DART) of TTTAGGG conserved sequence motif using HADDOCK Web server. By digging up all the facts about the proteins, it was revealed that around 120 amino acids in the tail part were showing a good sequence similarity between the proteins. Molecular modeling, sequence characterization and secondary structure prediction also indicate the similarity between the protein's structure and sequence. The result of MD simulation highlights on the RMSD, RMSF, Rg, PCA and energy plots which also conveys the similar type of motional behavior between them. The best complex formation for both the proteins in docking result also indicates for the first interaction site which is mainly the helix3 region of the DNA-binding domain. The overall computational analysis reveals that RTBP1 and NgTRF1 proteins display good amount of similarity in their physicochemical properties, structure, dynamics and binding mode.

  20. A random walk in physical biology

    NASA Astrophysics Data System (ADS)

    Peterson, Eric Lee

    Biology as a scientific discipline is becoming evermore quantitative as tools become available to probe living systems on every scale from the macro to the micro and now even to the nanoscale. In quantitative biology the challenge is to understand the living world in an in vivo context, where it is often difficult for simple theoretical models to connect with the full richness and complexity of the observed data. Computational models and simulations offer a way to bridge the gap between simple theoretical models and real biological systems; towards that aspiration are presented in this thesis three case studies in applying computational models that may give insight into native biological structures.The first is concerned with soluble proteins; proteins, like DNA, are linear polymers written in a twenty-letter "language" of amino acids. Despite the astronomical number of possible proteins sequences, a great amount of similarity is observed among the folded structures of globular proteins. One useful way of discovering similar sequences is to align their sequences, as done e.g. by the popular BLAST program. By clustering together amino acids and reducing the alphabet that proteins are written in to fewer than twenty letters, we find that pairwise sequence alignments are actually more sensitive to proteins with similar structures.The second case study is concerned with the measurement of forces applied to a membrane. We demonstrate a general method for extracting the forces applied to a fluid lipid bilayer of arbitrary shape and show that the subpiconewton forces applied by optical tweezers to vesicles can be accurately measured in this way.In the third and final case study we examine the forces between proteins in a lipid bilayer membrane. Due to the bending of the membrane surrounding them, such proteins feel mutually attractive forces which can help them to self-organize and act in concert. These finding are relevant at the areal densities estimated for membrane proteins such as the MscL mechanosensitive channel. The findings of the analytical studies were confirmed by a Monte Carlo Markov Chain simulation using the fully two-dimensional potentials between two model proteins in a membrane.Living systems present us with beautiful and intricate structures, from the helices and sheets of a folded protein to the dynamic morphology of cellular organelles and the self-organization of proteins in a biomembrane and a synergy of theoretical and it in silico approaches should enable us to build and refine models of in vivo biological data.

  1. DNA-based random number generation in security circuitry.

    PubMed

    Gearheart, Christy M; Arazi, Benjamin; Rouchka, Eric C

    2010-06-01

    DNA-based circuit design is an area of research in which traditional silicon-based technologies are replaced by naturally occurring phenomena taken from biochemistry and molecular biology. This research focuses on further developing DNA-based methodologies to mimic digital data manipulation. While exhibiting fundamental principles, this work was done in conjunction with the vision that DNA-based circuitry, when the technology matures, will form the basis for a tamper-proof security module, revolutionizing the meaning and concept of tamper-proofing and possibly preventing it altogether based on accurate scientific observations. A paramount part of such a solution would be self-generation of random numbers. A novel prototype schema employs solid phase synthesis of oligonucleotides for random construction of DNA sequences; temporary storage and retrieval is achieved through plasmid vectors. A discussion of how to evaluate sequence randomness is included, as well as how these techniques are applied to a simulation of the random number generation circuitry. Simulation results show generated sequences successfully pass three selected NIST random number generation tests specified for security applications.

  2. Effect of hot acid hydrolysis and hot chlorine dioxide stage on bleaching effluent biodegradability.

    PubMed

    Gomes, C M; Colodette, J L; Delantonio, N R N; Mounteer, A H; Silva, C M

    2007-01-01

    The hot acid hydrolysis followed by chlorine dioxide (A/D*) and hot chlorine dioxide (D*) technologies have proven very useful for bleaching of eucalyptus kraft pulp. Although the characteristics and biodegradability of effluents from conventional chlorine dioxide bleaching are well known, such information is not yet available for effluents derived from hot acid hydrolysis and hot chorine dioxide bleaching. This study discusses the characteristics and biodegradability of such effluents. Combined whole effluents from the complete sequences DEpD, D*EpD, A/D*EpD and ADEpD, and from the pre-bleaching sequences DEp, D*Ep, A/D*Ep and ADEp were characterized by quantifying their colour, AOX and organic load (BOD, COD, TOC). These effluents were also evaluated for their treatability by simulation of an activated sludge system. It was concluded that treatment in the laboratory sequencing batch reactor was efficient for removal of COD, BOD and TOC of all effluents. However, colour increased after biological treatment, with the greatest increase found for the effluent produced using the AD technology. Biological treatment was less efficient at removing AOX of effluents from the sequences with D*, A/D* and AD as the first stages, when compared to the reference D stage; there was evidence of the lower treatability of these organochlorine compounds from these sequences.

  3. SynBioSS-aided design of synthetic biological constructs.

    PubMed

    Kaznessis, Yiannis N

    2011-01-01

    We present walkthrough examples of using SynBioSS to design, model, and simulate synthetic gene regulatory networks. SynBioSS stands for Synthetic Biology Software Suite, a platform that is publicly available with Open Licenses at www.synbioss.org. An important aim of computational synthetic biology is the development of a mathematical modeling formalism that is applicable to a wide variety of simple synthetic biological constructs. SynBioSS-based modeling of biomolecular ensembles that interact away from the thermodynamic limit and not necessarily at steady state affords for a theoretical framework that is generally applicable to known synthetic biological systems, such as bistable switches, AND gates, and oscillators. Here, we discuss how SynBioSS creates links between DNA sequences and targeted dynamic phenotypes of these simple systems. Copyright © 2011 Elsevier Inc. All rights reserved.

  4. Your actions in my cerebellum: subclinical deficits in action observation in patients with unilateral chronic cerebellar stroke.

    PubMed

    Cattaneo, Luigi; Fasanelli, Monica; Andreatta, Olaf; Bonifati, Domenico Marco; Barchiesi, Guido; Caruana, Fausto

    2012-03-01

    Empirical evidence indicates that cognitive consequences of cerebellar lesions tend to be mild and less important than the symptoms due to lesions to cerebral areas. By contrast, imaging studies consistently report strong cerebellar activity during tasks of action observation and action understanding. This has been interpreted as part of the automatic motor simulation process that takes place in the context of action observation. The function of the cerebellum as a sequencer during executed movements makes it a good candidate, within the framework of embodied cognition, for a pivotal role in understanding the timing of action sequences. Here, we investigated a cohort of eight patients with chronic, first-ever, isolated, ischemic lesions of the cerebellum. The experimental task consisted in identifying a plausible sequence of pictures from a randomly ordered group of still frames extracted from (a) a complex action performed by a human actor ("biological action" test) or (b) a complex physical event occurring to an inanimate object ("folk physics" test). A group of 16 healthy participants was used as control. The main result showed that cerebellar patients performed significantly worse than controls in both sequencing tasks, but performed much worse in the "biological action" test than in the "folk physics" test. The dissociation described here suggests that observed sequences of simple motor acts seem to be represented differentially from other sequences in the cerebellum.

  5. Memetic algorithms for de novo motif-finding in biomedical sequences.

    PubMed

    Bi, Chengpeng

    2012-09-01

    The objectives of this study are to design and implement a new memetic algorithm for de novo motif discovery, which is then applied to detect important signals hidden in various biomedical molecular sequences. In this paper, memetic algorithms are developed and tested in de novo motif-finding problems. Several strategies in the algorithm design are employed that are to not only efficiently explore the multiple sequence local alignment space, but also effectively uncover the molecular signals. As a result, there are a number of key features in the implementation of the memetic motif-finding algorithm (MaMotif), including a chromosome replacement operator, a chromosome alteration-aware local search operator, a truncated local search strategy, and a stochastic operation of local search imposed on individual learning. To test the new algorithm, we compare MaMotif with a few of other similar algorithms using simulated and experimental data including genomic DNA, primary microRNA sequences (let-7 family), and transmembrane protein sequences. The new memetic motif-finding algorithm is successfully implemented in C++, and exhaustively tested with various simulated and real biological sequences. In the simulation, it shows that MaMotif is the most time-efficient algorithm compared with others, that is, it runs 2 times faster than the expectation maximization (EM) method and 16 times faster than the genetic algorithm-based EM hybrid. In both simulated and experimental testing, results show that the new algorithm is compared favorably or superior to other algorithms. Notably, MaMotif is able to successfully discover the transcription factors' binding sites in the chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) data, correctly uncover the RNA splicing signals in gene expression, and precisely find the highly conserved helix motif in the transmembrane protein sequences, as well as rightly detect the palindromic segments in the primary microRNA sequences. The memetic motif-finding algorithm is effectively designed and implemented, and its applications demonstrate it is not only time-efficient, but also exhibits excellent performance while compared with other popular algorithms. Copyright © 2012 Elsevier B.V. All rights reserved.

  6. Rogue taxa phenomenon: a biological companion to simulation analysis

    PubMed Central

    Westover, Kristi M.; Rusinko, Joseph P.; Hoin, Jon; Neal, Matthew

    2013-01-01

    To provide a baseline biological comparison to simulation study predictions about the frequency of rogue taxa effects, we evaluated the frequency of a rogue taxa effect using viral data sets which differed in diversity. Using a quartet-tree framework, we measured the frequency of a rogue taxa effect in three data sets of increasing genetic variability (within viral serotype, between viral serotype, and between viral family) to test whether the rogue taxa was correlated with the mean sequence diversity of the respective data sets. We found a slight increase in the percentage of rogues as nucleotide diversity increased. Even though the number of rogues increased with diversity, the distribution of the types of rogues (friendly, crazy, or evil) did not depend on the diversity and in the case of the order-level data set the net rogue effect was slightly positive. This study, assessing frequency of the rogue taxa effect using biological data, indicated that simulation studies may over-predict the prevalence of the rogue taxa effect. Further investigations are necessary to understand which types of data sets are susceptible to a negative rogue effect and thus merit the removal of taxa from large phylogenetic reconstructions. PMID:23707704

  7. Rogue taxa phenomenon: a biological companion to simulation analysis.

    PubMed

    Westover, Kristi M; Rusinko, Joseph P; Hoin, Jon; Neal, Matthew

    2013-10-01

    To provide a baseline biological comparison to simulation study predictions about the frequency of rogue taxa effects, we evaluated the frequency of a rogue taxa effect using viral data sets which differed in diversity. Using a quartet-tree framework, we measured the frequency of a rogue taxa effect in three data sets of increasing genetic variability (within viral serotype, between viral serotype, and between viral family) to test whether the rogue taxa was correlated with the mean sequence diversity of the respective data sets. We found a slight increase in the percentage of rogues as nucleotide diversity increased. Even though the number of rogues increased with diversity, the distribution of the types of rogues (friendly, crazy, or evil) did not depend on the diversity and in the case of the order-level data set the net rogue effect was slightly positive. This study, assessing frequency of the rogue taxa effect using biological data, indicated that simulation studies may over-predict the prevalence of the rogue taxa effect. Further investigations are necessary to understand which types of data sets are susceptible to a negative rogue effect and thus merit the removal of taxa from large phylogenetic reconstructions. Copyright © 2013 Elsevier Inc. All rights reserved.

  8. Nanoscale Bio-engineering Solutions for Space Exploration: The Nanopore Sequencer

    NASA Technical Reports Server (NTRS)

    Stolc, Viktor; Cozmuta, Ioana

    2004-01-01

    Characterization of biological systems at the molecular level and extraction of essential information for nano-engineering design to guide the nano-fabrication of solid-state sensors and molecular identification devices is a computational challenge. The alpha hemolysin protein ion channel is used as a model system for structural analysis of nucleic acids like DNA. Applied voltage draws a DNA strand and surrounding ionic solution through the biological nanopore. The subunits in the DNA strand block ion flow by differing amounts. Atomistic scale simulations are employed using NASA supercomputers to study DNA translocation, with the aim to enhance single DNA subunit identification. Compared to protein channels, solid-state nanopores offer a better temporal control of the translocation of DNA and the possibility to easily tune its chemistry to increase the signal resolution. Potential applications for NASA missions, besides real-time genome sequencing include astronaut health, life detection and decoding of various genomes.

  9. Nanoscale Bioengineering Solutions for Space Exploration the Nanopore Sequencer

    NASA Technical Reports Server (NTRS)

    Ioana, Cozmuta; Viktor, Stoic

    2005-01-01

    Characterization of biological systems at the molecular level and extraction of essential information for nano-engineering design to guide the nano-fabrication of solid-state sensors and molecular identification devices is a computational challenge. The alpha hemolysin protein ion channel is used as a model system for structural analysis of nucleic acids like DNA. Applied voltage draws a DNA strand and surrounding ionic solution through the biological nanopore. The subunits in the DNA strand block ion flow by differing amounts. Atomistic scale simulations are employed using NASA supercomputers to study DNA translocation. with the aim to enhance single DNA subunit identification. Compared to protein channels, solid-state nanopores offer a better temporal control of the translocation of DNA and the possibility to easily tune its chemistry to increase the signal resolution. Potential applications for NASA missions, besides real-time genome sequencing include astronaut health, life detection and decoding of various genomes. http://phenomrph.arc.nasa.gov/index.php

  10. Effect of the SH3-SH2 domain linker sequence on the structure of Hck kinase.

    PubMed

    Meiselbach, Heike; Sticht, Heinrich

    2011-08-01

    The coordination of activity in biological systems requires the existence of different signal transduction pathways that interact with one another and must be precisely regulated. The Src-family tyrosine kinases, which are found in many signaling pathways, differ in their physiological function despite their high overall structural similarity. In this context, the differences in the SH3-SH2 domain linkers might play a role for differential regulation, but the structural consequences of linker sequence remain poorly understood. We have therefore performed comparative molecular dynamics simulations of wildtype Hck and of a mutant Hck in which the SH3-SH2 domain linker is replaced by the corresponding sequence from the homologous kinase Lck. These simulations reveal that linker replacement not only affects the orientation of the SH3 domain itself, but also leads to an alternative conformation of the activation segment in the Hck kinase domain. The sequence of the SH3-SH2 domain linker thus exerts a remote effect on the active site geometry and might therefore play a role in modulating the structure of the inactive kinase or in fine-tuning the activation process itself.

  11. Open-pNovo: De Novo Peptide Sequencing with Thousands of Protein Modifications.

    PubMed

    Yang, Hao; Chi, Hao; Zhou, Wen-Jing; Zeng, Wen-Feng; He, Kun; Liu, Chao; Sun, Rui-Xiang; He, Si-Min

    2017-02-03

    De novo peptide sequencing has improved remarkably, but sequencing full-length peptides with unexpected modifications is still a challenging problem. Here we present an open de novo sequencing tool, Open-pNovo, for de novo sequencing of peptides with arbitrary types of modifications. Although the search space increases by ∼300 times, Open-pNovo is close to or even ∼10-times faster than the other three proposed algorithms. Furthermore, considering top-1 candidates on three MS/MS data sets, Open-pNovo can recall over 90% of the results obtained by any one traditional algorithm and report 5-87% more peptides, including 14-250% more modified peptides. On a high-quality simulated data set, ∼85% peptides with arbitrary modifications can be recalled by Open-pNovo, while hardly any results can be recalled by others. In summary, Open-pNovo is an excellent tool for open de novo sequencing and has great potential for discovering unexpected modifications in the real biological applications.

  12. Computation of repetitions and regularities of biologically weighted sequences.

    PubMed

    Christodoulakis, M; Iliopoulos, C; Mouchard, L; Perdikuri, K; Tsakalidis, A; Tsichlas, K

    2006-01-01

    Biological weighted sequences are used extensively in molecular biology as profiles for protein families, in the representation of binding sites and often for the representation of sequences produced by a shotgun sequencing strategy. In this paper, we address three fundamental problems in the area of biologically weighted sequences: (i) computation of repetitions, (ii) pattern matching, and (iii) computation of regularities. Our algorithms can be used as basic building blocks for more sophisticated algorithms applied on weighted sequences.

  13. Noise-robust speech recognition through auditory feature detection and spike sequence decoding.

    PubMed

    Schafer, Phillip B; Jin, Dezhe Z

    2014-03-01

    Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound by its sequence of spikes. We compare two methods for decoding the spike sequences--one using a hidden Markov model-based recognizer, the other using a novel template-based recognition scheme. In the latter case, words are recognized by comparing their spike sequences to template sequences obtained from clean training data, using a similarity measure based on the length of the longest common sub-sequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.

  14. Brownian dynamics simulations of sequence-dependent duplex denaturation in dynamically superhelical DNA

    NASA Astrophysics Data System (ADS)

    Mielke, Steven P.; Grønbech-Jensen, Niels; Krishnan, V. V.; Fink, William H.; Benham, Craig J.

    2005-09-01

    The topological state of DNA in vivo is dynamically regulated by a number of processes that involve interactions with bound proteins. In one such process, the tracking of RNA polymerase along the double helix during transcription, restriction of rotational motion of the polymerase and associated structures, generates waves of overtwist downstream and undertwist upstream from the site of transcription. The resulting superhelical stress is often sufficient to drive double-stranded DNA into a denatured state at locations such as promoters and origins of replication, where sequence-specific duplex opening is a prerequisite for biological function. In this way, transcription and other events that actively supercoil the DNA provide a mechanism for dynamically coupling genetic activity with regulatory and other cellular processes. Although computer modeling has provided insight into the equilibrium dynamics of DNA supercoiling, to date no model has appeared for simulating sequence-dependent DNA strand separation under the nonequilibrium conditions imposed by the dynamic introduction of torsional stress. Here, we introduce such a model and present results from an initial set of computer simulations in which the sequences of dynamically superhelical, 147 base pair DNA circles were systematically altered in order to probe the accuracy with which the model can predict location, extent, and time of stress-induced duplex denaturation. The results agree both with well-tested statistical mechanical calculations and with available experimental information. Additionally, we find that sites susceptible to denaturation show a propensity for localizing to supercoil apices, suggesting that base sequence determines locations of strand separation not only through the energetics of interstrand interactions, but also by influencing the geometry of supercoiling.

  15. Brownian dynamics simulations of sequence-dependent duplex denaturation in dynamically superhelical DNA.

    PubMed

    Mielke, Steven P; Grønbech-Jensen, Niels; Krishnan, V V; Fink, William H; Benham, Craig J

    2005-09-22

    The topological state of DNA in vivo is dynamically regulated by a number of processes that involve interactions with bound proteins. In one such process, the tracking of RNA polymerase along the double helix during transcription, restriction of rotational motion of the polymerase and associated structures, generates waves of overtwist downstream and undertwist upstream from the site of transcription. The resulting superhelical stress is often sufficient to drive double-stranded DNA into a denatured state at locations such as promoters and origins of replication, where sequence-specific duplex opening is a prerequisite for biological function. In this way, transcription and other events that actively supercoil the DNA provide a mechanism for dynamically coupling genetic activity with regulatory and other cellular processes. Although computer modeling has provided insight into the equilibrium dynamics of DNA supercoiling, to date no model has appeared for simulating sequence-dependent DNA strand separation under the nonequilibrium conditions imposed by the dynamic introduction of torsional stress. Here, we introduce such a model and present results from an initial set of computer simulations in which the sequences of dynamically superhelical, 147 base pair DNA circles were systematically altered in order to probe the accuracy with which the model can predict location, extent, and time of stress-induced duplex denaturation. The results agree both with well-tested statistical mechanical calculations and with available experimental information. Additionally, we find that sites susceptible to denaturation show a propensity for localizing to supercoil apices, suggesting that base sequence determines locations of strand separation not only through the energetics of interstrand interactions, but also by influencing the geometry of supercoiling.

  16. EdiPy: a resource to simulate the evolution of plant mitochondrial genes under the RNA editing.

    PubMed

    Picardi, Ernesto; Quagliariello, Carla

    2006-02-01

    EdiPy is an online resource appropriately designed to simulate the evolution of plant mitochondrial genes in a biologically realistic fashion. EdiPy takes into account the presence of sites subjected to RNA editing and provides multiple artificial alignments corresponding to both genomic and cDNA sequences. Each artificial data set can successively be submitted to main and widespread evolutionary and phylogenetic software packages such as PAUP, Phyml, PAML and Phylip. As an online bioinformatic resource, EdiPy is available at the following web page: http://biologia.unical.it/py_script/index.html.

  17. Epsilon-Q: An Automated Analyzer Interface for Mass Spectral Library Search and Label-Free Protein Quantification.

    PubMed

    Cho, Jin-Young; Lee, Hyoung-Joo; Jeong, Seul-Ki; Paik, Young-Ki

    2017-12-01

    Mass spectrometry (MS) is a widely used proteome analysis tool for biomedical science. In an MS-based bottom-up proteomic approach to protein identification, sequence database (DB) searching has been routinely used because of its simplicity and convenience. However, searching a sequence DB with multiple variable modification options can increase processing time, false-positive errors in large and complicated MS data sets. Spectral library searching is an alternative solution, avoiding the limitations of sequence DB searching and allowing the detection of more peptides with high sensitivity. Unfortunately, this technique has less proteome coverage, resulting in limitations in the detection of novel and whole peptide sequences in biological samples. To solve these problems, we previously developed the "Combo-Spec Search" method, which uses manually multiple references and simulated spectral library searching to analyze whole proteomes in a biological sample. In this study, we have developed a new analytical interface tool called "Epsilon-Q" to enhance the functions of both the Combo-Spec Search method and label-free protein quantification. Epsilon-Q performs automatically multiple spectral library searching, class-specific false-discovery rate control, and result integration. It has a user-friendly graphical interface and demonstrates good performance in identifying and quantifying proteins by supporting standard MS data formats and spectrum-to-spectrum matching powered by SpectraST. Furthermore, when the Epsilon-Q interface is combined with the Combo-Spec search method, called the Epsilon-Q system, it shows a synergistic function by outperforming other sequence DB search engines for identifying and quantifying low-abundance proteins in biological samples. The Epsilon-Q system can be a versatile tool for comparative proteome analysis based on multiple spectral libraries and label-free quantification.

  18. A Study of the Comparative Effectiveness of Zoology Prerequisites at Slippery Rock State College.

    ERIC Educational Resources Information Center

    Morrison, William Sechler

    This study compared the effectiveness of three sequences of prerequisite courses required before taking zoology. Sequence 1 prerequisite courses consisted of general biology and human biology; Sequence 2 consisted of general biology; and Sequence 3 required cell biology. Zoology students in the spring of 1972 were pretest and a posttest. The mean…

  19. An improved immune algorithm for optimizing the pulse width modulation control sequence of inverters

    NASA Astrophysics Data System (ADS)

    Sheng, L.; Qian, S. Q.; Ye, Y. Q.; Wu, Y. H.

    2017-09-01

    In this article, an improved immune algorithm (IIA), based on the fundamental principles of the biological immune system, is proposed for optimizing the pulse width modulation (PWM) control sequence of a single-phase full-bridge inverter. The IIA takes advantage of the receptor editing and adaptive mutation mechanisms of the immune system to develop two operations that enhance the population diversity and convergence of the proposed algorithm. To verify the effectiveness and examine the performance of the IIA, 17 cases are considered, including fixed and disturbed resistances. Simulation results show that the IIA is able to obtain an effective PWM control sequence. Furthermore, when compared with existing immune algorithms (IAs), genetic algorithms (GAs), a non-traditional GA, simplified simulated annealing, and a generalized Hopfield neural network method, the IIA can achieve small total harmonic distortion (THD) and large magnitude. Meanwhile, a non-parametric test indicates that the IIA is significantly better than most comparison algorithms. Supplemental data for this article can be accessed at http://dx.doi.org/10.1080/0305215X.2016.1250894.

  20. Using evolutionary computations to understand the design and evolution of gene and cell regulatory networks.

    PubMed

    Spirov, Alexander; Holloway, David

    2013-07-15

    This paper surveys modeling approaches for studying the evolution of gene regulatory networks (GRNs). Modeling of the design or 'wiring' of GRNs has become increasingly common in developmental and medical biology, as a means of quantifying gene-gene interactions, the response to perturbations, and the overall dynamic motifs of networks. Drawing from developments in GRN 'design' modeling, a number of groups are now using simulations to study how GRNs evolve, both for comparative genomics and to uncover general principles of evolutionary processes. Such work can generally be termed evolution in silico. Complementary to these biologically-focused approaches, a now well-established field of computer science is Evolutionary Computations (ECs), in which highly efficient optimization techniques are inspired from evolutionary principles. In surveying biological simulation approaches, we discuss the considerations that must be taken with respect to: (a) the precision and completeness of the data (e.g. are the simulations for very close matches to anatomical data, or are they for more general exploration of evolutionary principles); (b) the level of detail to model (we proceed from 'coarse-grained' evolution of simple gene-gene interactions to 'fine-grained' evolution at the DNA sequence level); (c) to what degree is it important to include the genome's cellular context; and (d) the efficiency of computation. With respect to the latter, we argue that developments in computer science EC offer the means to perform more complete simulation searches, and will lead to more comprehensive biological predictions. Copyright © 2013 Elsevier Inc. All rights reserved.

  1. Alignment-free microbial phylogenomics under scenarios of sequence divergence, genome rearrangement and lateral genetic transfer.

    PubMed

    Bernard, Guillaume; Chan, Cheong Xin; Ragan, Mark A

    2016-07-01

    Alignment-free (AF) approaches have recently been highlighted as alternatives to methods based on multiple sequence alignment in phylogenetic inference. However, the sensitivity of AF methods to genome-scale evolutionary scenarios is little known. Here, using simulated microbial genome data we systematically assess the sensitivity of nine AF methods to three important evolutionary scenarios: sequence divergence, lateral genetic transfer (LGT) and genome rearrangement. Among these, AF methods are most sensitive to the extent of sequence divergence, less sensitive to low and moderate frequencies of LGT, and most robust against genome rearrangement. We describe the application of AF methods to three well-studied empirical genome datasets, and introduce a new application of the jackknife to assess node support. Our results demonstrate that AF phylogenomics is computationally scalable to multi-genome data and can generate biologically meaningful phylogenies and insights into microbial evolution.

  2. It’s More Than Stamp Collecting: How Genome Sequencing Can Unify Biological Research

    PubMed Central

    Richards, Stephen

    2015-01-01

    The availability of reference genome sequences, especially the human reference, has revolutionized the study of biology. However, whilst the genomes of some species have been fully sequenced, a wide range of biological problems still cannot be effectively studied for lack of genome sequence information. Here, I identify neglected areas of biology and describe how both targeted species sequencing and more broad taxonomic surveys of the tree of life can address important biological questions. I enumerate the significant benefits that would accrue from sequencing a broader range of taxa, as well as discuss the technical advances in sequencing and assembly methods that would allow for wide-ranging application of whole-genome analysis. Finally, I suggest that in addition to “Big Science” survey initiatives to sequence the tree of life, a modified infrastructure-funding paradigm would better support reference genome sequence generation for research communities most in need. PMID:26003218

  3. It's more than stamp collecting: how genome sequencing can unify biological research.

    PubMed

    Richards, Stephen

    2015-07-01

    The availability of reference genome sequences, especially the human reference, has revolutionized the study of biology. However, while the genomes of some species have been fully sequenced, a wide range of biological problems still cannot be effectively studied for lack of genome sequence information. Here, I identify neglected areas of biology and describe how both targeted species sequencing and more broad taxonomic surveys of the tree of life can address important biological questions. I enumerate the significant benefits that would accrue from sequencing a broader range of taxa, as well as discuss the technical advances in sequencing and assembly methods that would allow for wide-ranging application of whole-genome analysis. Finally, I suggest that in addition to 'big science' survey initiatives to sequence the tree of life, a modified infrastructure-funding paradigm would better support reference genome sequence generation for research communities most in need. Copyright © 2015 Elsevier Ltd. All rights reserved.

  4. Quantum annealing versus classical machine learning applied to a simplified computational biology problem

    PubMed Central

    Li, Richard Y.; Di Felice, Rosa; Rohs, Remo; Lidar, Daniel A.

    2018-01-01

    Transcription factors regulate gene expression, but how these proteins recognize and specifically bind to their DNA targets is still debated. Machine learning models are effective means to reveal interaction mechanisms. Here we studied the ability of a quantum machine learning approach to predict binding specificity. Using simplified datasets of a small number of DNA sequences derived from actual binding affinity experiments, we trained a commercially available quantum annealer to classify and rank transcription factor binding. The results were compared to state-of-the-art classical approaches for the same simplified datasets, including simulated annealing, simulated quantum annealing, multiple linear regression, LASSO, and extreme gradient boosting. Despite technological limitations, we find a slight advantage in classification performance and nearly equal ranking performance using the quantum annealer for these fairly small training data sets. Thus, we propose that quantum annealing might be an effective method to implement machine learning for certain computational biology problems. PMID:29652405

  5. Detecting Recombination Hotspots from Patterns of Linkage Disequilibrium.

    PubMed

    Wall, Jeffrey D; Stevison, Laurie S

    2016-08-09

    With recent advances in DNA sequencing technologies, it has become increasingly easy to use whole-genome sequencing of unrelated individuals to assay patterns of linkage disequilibrium (LD) across the genome. One type of analysis that is commonly performed is to estimate local recombination rates and identify recombination hotspots from patterns of LD. One method for detecting recombination hotspots, LDhot, has been used in a handful of species to further our understanding of the basic biology of recombination. For the most part, the effectiveness of this method (e.g., power and false positive rate) is unknown. In this study, we run extensive simulations to compare the effectiveness of three different implementations of LDhot. We find large differences in the power and false positive rates of these different approaches, as well as a strong sensitivity to the window size used (with smaller window sizes leading to more accurate estimation of hotspot locations). We also compared our LDhot simulation results with comparable simulation results obtained from a Bayesian maximum-likelihood approach for identifying hotspots. Surprisingly, we found that the latter computationally intensive approach had substantially lower power over the parameter values considered in our simulations. Copyright © 2016 Wall and Stevison.

  6. Conformational landscape of the HIV-V3 hairpin loop from all-atom free-energy simulations

    NASA Astrophysics Data System (ADS)

    Verma, Abhinav; Wenzel, Wolfgang

    2008-03-01

    Small beta hairpins have many distinct biological functions, including their involvement in chemokine and viral receptor recognition. The relevance of structural similarities between different hairpin loops with near homologous sequences is not yet understood, calling for the development of methods for de novo hairpin structure prediction and simulation. De novo folding of beta strands is more difficult than that of helical proteins because of nonlocal hydrogen bonding patterns that connect amino acids that are distant in the amino acid sequence and there is a large variety of possible hydrogen bond patterns. Here we use a greedy version of the basin hopping technique with our free-energy forcefield PFF02 to reproducibly and predictively fold the hairpin structure of a HIV-V3 loop. We performed 20 independent basin hopping runs for 500cycles corresponding to 7.4×107 energy evaluations each. The lowest energy structure found in the simulation has a backbone root mean square deviation (bRMSD) of only 2.04Å to the native conformation. The lowest 9 out of the 20 simulations converged to conformations deviating less than 2.5Å bRMSD from native.

  7. Conformational landscape of the HIV-V3 hairpin loop from all-atom free-energy simulations.

    PubMed

    Verma, Abhinav; Wenzel, Wolfgang

    2008-03-14

    Small beta hairpins have many distinct biological functions, including their involvement in chemokine and viral receptor recognition. The relevance of structural similarities between different hairpin loops with near homologous sequences is not yet understood, calling for the development of methods for de novo hairpin structure prediction and simulation. De novo folding of beta strands is more difficult than that of helical proteins because of nonlocal hydrogen bonding patterns that connect amino acids that are distant in the amino acid sequence and there is a large variety of possible hydrogen bond patterns. Here we use a greedy version of the basin hopping technique with our free-energy forcefield PFF02 to reproducibly and predictively fold the hairpin structure of a HIV-V3 loop. We performed 20 independent basin hopping runs for 500 cycles corresponding to 7.4 x 10(7) energy evaluations each. The lowest energy structure found in the simulation has a backbone root mean square deviation (bRMSD) of only 2.04 A to the native conformation. The lowest 9 out of the 20 simulations converged to conformations deviating less than 2.5 A bRMSD from native.

  8. Design and Engineering of a Multi-Target (Multiplex) DNA Simulant to Evaluate Nulceic Acid Based Assays for Detection of Biological Threat Agents

    DTIC Science & Technology

    2006-11-01

    mallei , Burkholderia pseudomallei and Variola virus (smallpox virus). A chimera of 2040 bp was engineered to produce PCR amplicons of different sizes...potential bio-warfare use have been completely sequenced, B. mallei , the etiologic agent of glanders , and B. pseudomallei, causative agent of... Burkholderia mallei Nierman et al, 2004 Burkholderia pseudomallei Holden et al, 2004 Burkholderia thailandensis

  9. Unified Deep Learning Architecture for Modeling Biology Sequence.

    PubMed

    Wu, Hongjie; Cao, Chengyuan; Xia, Xiaoyan; Lu, Qiang

    2017-10-09

    Prediction of the spatial structure or function of biological macromolecules based on their sequence remains an important challenge in bioinformatics. When modeling biological sequences using traditional sequencing models, characteristics, such as long-range interactions between basic units, the complicated and variable output of labeled structures, and the variable length of biological sequences, usually lead to different solutions on a case-by-case basis. This study proposed the use of bidirectional recurrent neural networks based on long short-term memory or a gated recurrent unit to capture long-range interactions by designing the optional reshape operator to adapt to the diversity of the output labels and implementing a training algorithm to support the training of sequence models capable of processing variable-length sequences. Additionally, the merge and pooling operators enhanced the ability to capture short-range interactions between basic units of biological sequences. The proposed deep-learning model and its training algorithm might be capable of solving currently known biological sequence-modeling problems through the use of a unified framework. We validated our model on one of the most difficult biological sequence-modeling problems currently known, with our results indicating the ability of the model to obtain predictions of protein residue interactions that exceeded the accuracy of current popular approaches by 10% based on multiple benchmarks.

  10. Colored petri net modeling of small interfering RNA-mediated messenger RNA degradation.

    PubMed

    Nickaeen, Niloofar; Moein, Shiva; Heidary, Zarifeh; Ghaisari, Jafar

    2016-01-01

    Mathematical modeling of biological systems is an attractive way for studying complex biological systems and their behaviors. Petri Nets, due to their ability to model systems with various levels of qualitative information, have been wildly used in modeling biological systems in which enough qualitative data may not be at disposal. These nets have been used to answer questions regarding the dynamics of different cell behaviors including the translation process. In one stage of the translation process, the RNA sequence may be degraded. In the process of degradation of RNA sequence, small-noncoding RNA molecules known as small interfering RNA (siRNA) match the target RNA sequence. As a result of this matching, the target RNA sequence is destroyed. In this context, the process of matching and destruction is modeled using Colored Petri Nets (CPNs). The model is constructed using CPNs which allow tokens to have a value or type on them. Thus, CPN is a suitable tool to model string structures in which each element of the string has a different type. Using CPNs, long RNA, and siRNA strings are modeled with a finite set of colors. The model is simulated via CPN Tools. A CPN model of the matching between RNA and siRNA strings is constructed in CPN Tools environment. In previous studies, a network of stoichiometric equations was modeled. However, in this particular study, we modeled the mechanism behind the silencing process. Modeling this kind of mechanisms provides us with a tool to examine the effects of different factors such as mutation or drugs on the process.

  11. Advanced Applications of Next-Generation Sequencing Technologies to Orchid Biology.

    PubMed

    Yeh, Chuan-Ming; Liu, Zhong-Jian; Tsai, Wen-Chieh

    2018-01-01

    Next-generation sequencing technologies are revolutionizing biology by permitting, transcriptome sequencing, whole-genome sequencing and resequencing, and genome-wide single nucleotide polymorphism profiling. Orchid research has benefited from this breakthrough, and a few orchid genomes are now available; new biological questions can be approached and new breeding strategies can be designed. The first part of this review describes the unique features of orchid biology. The second part provides an overview of the current next-generation sequencing platforms, many of which are already used in plant laboratories. The third part summarizes the state of orchid transcriptome and genome sequencing and illustrates current achievements. The genetic sequences currently obtained will not only provide a broad scope for the study of orchid biology, but also serves as a starting point for uncovering the mystery of orchid evolution.

  12. Biological phosphorus removal in an extended ASM2 model: Roles of extracellular polymeric substances and kinetic modeling.

    PubMed

    Yang, Shan-Shan; Pang, Ji-Wei; Guo, Wan-Qian; Yang, Xiao-Yin; Wu, Zhong-Yang; Ren, Nan-Qi; Zhao, Zhi-Qing

    2017-05-01

    This paper presents the results of an extended ASM2 model for the modeling and calibration of the role of extracellular polymeric substances (EPS) in phosphorus (P) removal in an anaerobic-aerobic process. In this extended ASM2 model, two new components, the bound EPS (X EPS ) and the soluble EPS (S EPS ), are introduced. Compared with the ASM2, 7.71, 8.53, and 9.28% decreases in polyphosphate (polyP) were observed in the extended ASM2 in three sequencing batch reactors feeding with different COD/P ratios, indicating that 7.71-9.28% of P in the liquid was adsorbed by EPS. Sensitive analysis indicated that, five parameters were the significant influential parameters and had been chosen for further model calibration by using the least square method to simulate by MATLAB. This extended ASM2 has been successfully established to simulate the output variables and provides a useful reference for the mathematic simulations of the role of EPS in biological phosphorus removal process. Copyright © 2017. Published by Elsevier Ltd.

  13. Cysteine-containing peptide tag for site-specific conjugation of proteins

    DOEpatents

    Backer, Marina V.; Backer, Joseph M.

    2008-04-08

    The present invention is directed to a biological conjugate, comprising: (a) a targeting moiety comprising a polypeptide having an amino acid sequence comprising the polypeptide sequence of SEQ ID NO:2 and the polypeptide sequence of a selected targeting protein; and (b) a binding moiety bound to the targeting moiety; the biological conjugate having a covalent bond between the thiol group of SEQ ID NO:2 and a functional group in the binding moiety. The present invention is directed to a biological conjugate, comprising: (a) a targeting moiety comprising a polypeptide having an amino acid sequence comprising the polypeptide sequence of SEQ ID NO:2 and the polypeptide sequence of a selected targeting protein; and (b) a binding moiety that comprises an adapter protein, the adapter protein having a thiol group; the biological conjugate having a disulfide bond between the thiol group of SEQ ID NO:2 and the thiol group of the adapter protein. The present invention is also directed to biological sequences employed in the above biological conjugates, as well as pharmaceutical preparations and methods using the above biological conjugates.

  14. Cysteine-containing peptide tag for site-specific conjugation of proteins

    DOEpatents

    Backer, Marina V.; Backer, Joseph M.

    2010-10-05

    The present invention is directed to a biological conjugate, comprising: (a) a targeting moiety comprising a polypeptide having an amino acid sequence comprising the polypeptide sequence of SEQ ID NO:2 and the polypeptide sequence of a selected targeting protein; and (b) a binding moiety bound to the targeting moiety; the biological conjugate having a covalent bond between the thiol group of SEQ ID NO:2 and a functional group in the binding moiety. The present invention is directed to a biological conjugate, comprising: (a) a targeting moiety comprising a polypeptide having an amino acid sequence comprising the polypeptide sequence of SEQ ID NO:2 and the polypeptide sequence of a selected targeting protein; and (b) a binding moiety that comprises an adapter protein, the adapter protein having a thiol group; the biological conjugate having a disulfide bond between the thiol group of SEQ ID NO:2 and the thiol group of the adapter protein. The present invention is also directed to biological sequences employed in the above biological conjugates, as well as pharmaceutical preparations and methods using the above biological conjugates.

  15. Evolutionary Dynamics on Protein Bi-stability Landscapes can Potentially Resolve Adaptive Conflicts

    PubMed Central

    Sikosek, Tobias; Bornberg-Bauer, Erich; Chan, Hue Sun

    2012-01-01

    Experimental studies have shown that some proteins exist in two alternative native-state conformations. It has been proposed that such bi-stable proteins can potentially function as evolutionary bridges at the interface between two neutral networks of protein sequences that fold uniquely into the two different native conformations. Under adaptive conflict scenarios, bi-stable proteins may be of particular advantage if they simultaneously provide two beneficial biological functions. However, computational models that simulate protein structure evolution do not yet recognize the importance of bi-stability. Here we use a biophysical model to analyze sequence space to identify bi-stable or multi-stable proteins with two or more equally stable native-state structures. The inclusion of such proteins enhances phenotype connectivity between neutral networks in sequence space. Consideration of the sequence space neighborhood of bridge proteins revealed that bi-stability decreases gradually with each mutation that takes the sequence further away from an exactly bi-stable protein. With relaxed selection pressures, we found that bi-stable proteins in our model are highly successful under simulated adaptive conflict. Inspired by these model predictions, we developed a method to identify real proteins in the PDB with bridge-like properties, and have verified a clear bi-stability gradient for a series of mutants studied by Alexander et al. (Proc Nat Acad Sci USA 2009, 106:21149–21154) that connect two sequences that fold uniquely into two different native structures via a bridge-like intermediate mutant sequence. Based on these findings, new testable predictions for future studies on protein bi-stability and evolution are discussed. PMID:23028272

  16. Node fingerprinting: an efficient heuristic for aligning biological networks.

    PubMed

    Radu, Alex; Charleston, Michael

    2014-10-01

    With the continuing increase in availability of biological data and improvements to biological models, biological network analysis has become a promising area of research. An emerging technique for the analysis of biological networks is through network alignment. Network alignment has been used to calculate genetic distance, similarities between regulatory structures, and the effect of external forces on gene expression, and to depict conditional activity of expression modules in cancer. Network alignment is algorithmically complex, and therefore we must rely on heuristics, ideally as efficient and accurate as possible. The majority of current techniques for network alignment rely on precomputed information, such as with protein sequence alignment, or on tunable network alignment parameters, which may introduce an increased computational overhead. Our presented algorithm, which we call Node Fingerprinting (NF), is appropriate for performing global pairwise network alignment without precomputation or tuning, can be fully parallelized, and is able to quickly compute an accurate alignment between two biological networks. It has performed as well as or better than existing algorithms on biological and simulated data, and with fewer computational resources. The algorithmic validation performed demonstrates the low computational resource requirements of NF.

  17. A standard-enabled workflow for synthetic biology.

    PubMed

    Myers, Chris J; Beal, Jacob; Gorochowski, Thomas E; Kuwahara, Hiroyuki; Madsen, Curtis; McLaughlin, James Alastair; Mısırlı, Göksel; Nguyen, Tramy; Oberortner, Ernst; Samineni, Meher; Wipat, Anil; Zhang, Michael; Zundel, Zach

    2017-06-15

    A synthetic biology workflow is composed of data repositories that provide information about genetic parts, sequence-level design tools to compose these parts into circuits, visualization tools to depict these designs, genetic design tools to select parts to create systems, and modeling and simulation tools to evaluate alternative design choices. Data standards enable the ready exchange of information within such a workflow, allowing repositories and tools to be connected from a diversity of sources. The present paper describes one such workflow that utilizes, among others, the Synthetic Biology Open Language (SBOL) to describe genetic designs, the Systems Biology Markup Language to model these designs, and SBOL Visual to visualize these designs. We describe how a standard-enabled workflow can be used to produce types of design information, including multiple repositories and software tools exchanging information using a variety of data standards. Recently, the ACS Synthetic Biology journal has recommended the use of SBOL in their publications. © 2017 The Author(s); published by Portland Press Limited on behalf of the Biochemical Society.

  18. Hands-on-Entropy, Energy Balance with Biological Relevance

    NASA Astrophysics Data System (ADS)

    Reeves, Mark

    2015-03-01

    Entropy changes underlie the physics that dominates biological interactions. Indeed, introductory biology courses often begin with an exploration of the qualities of water that are important to living systems. However, one idea that is not explicitly addressed in most introductory physics or biology textbooks is important contribution of the entropy in driving fundamental biological processes towards equilibrium. From diffusion to cell-membrane formation, to electrostatic binding in protein folding, to the functioning of nerve cells, entropic effects often act to counterbalance deterministic forces such as electrostatic attraction and in so doing, allow for effective molecular signaling. A small group of biology, biophysics and computer science faculty have worked together for the past five years to develop curricular modules (based on SCALEUP pedagogy). This has enabled students to create models of stochastic and deterministic processes. Our students are first-year engineering and science students in the calculus-based physics course and they are not expected to know biology beyond the high-school level. In our class, they learn to reduce complex biological processes and structures in order model them mathematically to account for both deterministic and probabilistic processes. The students test these models in simulations and in laboratory experiments that are biologically relevant such as diffusion, ionic transport, and ligand-receptor binding. Moreover, the students confront random forces and traditional forces in problems, simulations, and in laboratory exploration throughout the year-long course as they move from traditional kinematics through thermodynamics to electrostatic interactions. This talk will present a number of these exercises, with particular focus on the hands-on experiments done by the students, and will give examples of the tangible material that our students work with throughout the two-semester sequence of their course on introductory physics with a bio focus. Supported by NSF DUE.

  19. TinkerCell: modular CAD tool for synthetic biology.

    PubMed

    Chandran, Deepak; Bergmann, Frank T; Sauro, Herbert M

    2009-10-29

    Synthetic biology brings together concepts and techniques from engineering and biology. In this field, computer-aided design (CAD) is necessary in order to bridge the gap between computational modeling and biological data. Using a CAD application, it would be possible to construct models using available biological "parts" and directly generate the DNA sequence that represents the model, thus increasing the efficiency of design and construction of synthetic networks. An application named TinkerCell has been developed in order to serve as a CAD tool for synthetic biology. TinkerCell is a visual modeling tool that supports a hierarchy of biological parts. Each part in this hierarchy consists of a set of attributes that define the part, such as sequence or rate constants. Models that are constructed using these parts can be analyzed using various third-party C and Python programs that are hosted by TinkerCell via an extensive C and Python application programming interface (API). TinkerCell supports the notion of a module, which are networks with interfaces. Such modules can be connected to each other, forming larger modular networks. TinkerCell is a free and open-source project under the Berkeley Software Distribution license. Downloads, documentation, and tutorials are available at http://www.tinkercell.com. An ideal CAD application for engineering biological systems would provide features such as: building and simulating networks, analyzing robustness of networks, and searching databases for components that meet the design criteria. At the current state of synthetic biology, there are no established methods for measuring robustness or identifying components that fit a design. The same is true for databases of biological parts. TinkerCell's flexible modeling framework allows it to cope with changes in the field. Such changes may involve the way parts are characterized or the way synthetic networks are modeled and analyzed computationally. TinkerCell can readily accept third-party algorithms, allowing it to serve as a platform for testing different methods relevant to synthetic biology.

  20. TinkerCell: modular CAD tool for synthetic biology

    PubMed Central

    Chandran, Deepak; Bergmann, Frank T; Sauro, Herbert M

    2009-01-01

    Background Synthetic biology brings together concepts and techniques from engineering and biology. In this field, computer-aided design (CAD) is necessary in order to bridge the gap between computational modeling and biological data. Using a CAD application, it would be possible to construct models using available biological "parts" and directly generate the DNA sequence that represents the model, thus increasing the efficiency of design and construction of synthetic networks. Results An application named TinkerCell has been developed in order to serve as a CAD tool for synthetic biology. TinkerCell is a visual modeling tool that supports a hierarchy of biological parts. Each part in this hierarchy consists of a set of attributes that define the part, such as sequence or rate constants. Models that are constructed using these parts can be analyzed using various third-party C and Python programs that are hosted by TinkerCell via an extensive C and Python application programming interface (API). TinkerCell supports the notion of a module, which are networks with interfaces. Such modules can be connected to each other, forming larger modular networks. TinkerCell is a free and open-source project under the Berkeley Software Distribution license. Downloads, documentation, and tutorials are available at . Conclusion An ideal CAD application for engineering biological systems would provide features such as: building and simulating networks, analyzing robustness of networks, and searching databases for components that meet the design criteria. At the current state of synthetic biology, there are no established methods for measuring robustness or identifying components that fit a design. The same is true for databases of biological parts. TinkerCell's flexible modeling framework allows it to cope with changes in the field. Such changes may involve the way parts are characterized or the way synthetic networks are modeled and analyzed computationally. TinkerCell can readily accept third-party algorithms, allowing it to serve as a platform for testing different methods relevant to synthetic biology. PMID:19874625

  1. Biospark: scalable analysis of large numerical datasets from biological simulations and experiments using Hadoop and Spark.

    PubMed

    Klein, Max; Sharma, Rati; Bohrer, Chris H; Avelis, Cameron M; Roberts, Elijah

    2017-01-15

    Data-parallel programming techniques can dramatically decrease the time needed to analyze large datasets. While these methods have provided significant improvements for sequencing-based analyses, other areas of biological informatics have not yet adopted them. Here, we introduce Biospark, a new framework for performing data-parallel analysis on large numerical datasets. Biospark builds upon the open source Hadoop and Spark projects, bringing domain-specific features for biology. Source code is licensed under the Apache 2.0 open source license and is available at the project website: https://www.assembla.com/spaces/roberts-lab-public/wiki/Biospark CONTACT: eroberts@jhu.eduSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  2. Simulations Meet Experiment to Reveal New Insights into DNA Intrinsic Mechanics

    PubMed Central

    Ben Imeddourene, Akli; Elbahnsi, Ahmad; Guéroult, Marc; Oguey, Christophe; Foloppe, Nicolas; Hartmann, Brigitte

    2015-01-01

    The accurate prediction of the structure and dynamics of DNA remains a major challenge in computational biology due to the dearth of precise experimental information on DNA free in solution and limitations in the DNA force-fields underpinning the simulations. A new generation of force-fields has been developed to better represent the sequence-dependent B-DNA intrinsic mechanics, in particular with respect to the BI ↔ BII backbone equilibrium, which is essential to understand the B-DNA properties. Here, the performance of MD simulations with the newly updated force-fields Parmbsc0εζOLI and CHARMM36 was tested against a large ensemble of recent NMR data collected on four DNA dodecamers involved in nucleosome positioning. We find impressive progress towards a coherent, realistic representation of B-DNA in solution, despite residual shortcomings. This improved representation allows new and deeper interpretation of the experimental observables, including regarding the behavior of facing phosphate groups in complementary dinucleotides, and their modulation by the sequence. It also provides the opportunity to extensively revisit and refine the coupling between backbone states and inter base pair parameters, which emerges as a common theme across all the complementary dinucleotides. In sum, the global agreement between simulations and experiment reveals new aspects of intrinsic DNA mechanics, a key component of DNA-protein recognition. PMID:26657165

  3. Detection of Non-hazardous, Fluorescent Ricin-B Via an Immunoassay on Simulated Plastic Wings

    DTIC Science & Technology

    2012-09-01

    biological weapon . Large quantities of the toxin could potentially wind up in the wrong hands and be armed with relative ease. This danger has spawned many...classified as a toxin), which poses as the antigen (target) molecule. A relatively dilute solution of the tagged antigen complex was aerosolized and...validates the antibody- antigen binding sequence. A test station was developed to further evaluate hazard detection capability in an experimental setup

  4. Dynamics at a Peptide-TiO2 Anatase (101) Interface.

    PubMed

    Polimeni, Marco; Petridis, Loukas; Smith, Jeremy C; Arcangeli, Caterina

    2017-09-28

    The interface between biological matter and inorganic materials is a widely investigated research topic due to possible applications in biomedicine and nanotechnology. In this context, the molecular level adsorption mechanism that drives specific recognition between small peptide sequences and inorganic surfaces represents an important topic likely to provide much information useful for designing bioderived materials. Here, we investigate the dynamics at the interface between a Ti-binding peptide sequence (AMRKLPDAPGMHC) and a TiO 2 anatase surface by using molecular dynamics (MD) simulations. In the simulations the adsorption mechanism is characterized by diffusion of the peptide from the bulk water phase toward the TiO 2 surface, followed by the anchoring of the peptide to the surface. The anchoring is mediated by the interfacial water layers by means of the charged groups of the side chains of the peptide. The peptide samples anchored and dissociated states from the surface and its conformation is not affected by the surface when anchored.

  5. Dynamics at a Peptide–TiO 2 Anatase (101) Interface

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Polimeni, Marco; Petridis, Loukas; Smith, Jeremy C.

    The interface between biological matter and inorganic materials is a widely investigated research topic due to possible applications in biomedicine and nanotechnology. In this context, the molecular level adsorption mechanism that drives specific recognition between small peptide sequences and inorganic surfaces represents an important topic likely to provide much information useful for designing bioderived materials. In this paper, we investigate the dynamics at the interface between a Ti-binding peptide sequence (AMRKLPDAPGMHC) and a TiO 2 anatase surface by using molecular dynamics (MD) simulations. In the simulations the adsorption mechanism is characterized by diffusion of the peptide from the bulk watermore » phase toward the TiO 2 surface, followed by the anchoring of the peptide to the surface. The anchoring is mediated by the interfacial water layers by means of the charged groups of the side chains of the peptide. Finally, the peptide samples anchored and dissociated states from the surface and its conformation is not affected by the surface when anchored.« less

  6. Dynamics at a Peptide–TiO 2 Anatase (101) Interface

    DOE PAGES

    Polimeni, Marco; Petridis, Loukas; Smith, Jeremy C.; ...

    2017-08-29

    The interface between biological matter and inorganic materials is a widely investigated research topic due to possible applications in biomedicine and nanotechnology. In this context, the molecular level adsorption mechanism that drives specific recognition between small peptide sequences and inorganic surfaces represents an important topic likely to provide much information useful for designing bioderived materials. In this paper, we investigate the dynamics at the interface between a Ti-binding peptide sequence (AMRKLPDAPGMHC) and a TiO 2 anatase surface by using molecular dynamics (MD) simulations. In the simulations the adsorption mechanism is characterized by diffusion of the peptide from the bulk watermore » phase toward the TiO 2 surface, followed by the anchoring of the peptide to the surface. The anchoring is mediated by the interfacial water layers by means of the charged groups of the side chains of the peptide. Finally, the peptide samples anchored and dissociated states from the surface and its conformation is not affected by the surface when anchored.« less

  7. A confidence interval analysis of sampling effort, sequencing depth, and taxonomic resolution of fungal community ecology in the era of high-throughput sequencing.

    PubMed

    Oono, Ryoko

    2017-01-01

    High-throughput sequencing technology has helped microbial community ecologists explore ecological and evolutionary patterns at unprecedented scales. The benefits of a large sample size still typically outweigh that of greater sequencing depths per sample for accurate estimations of ecological inferences. However, excluding or not sequencing rare taxa may mislead the answers to the questions 'how and why are communities different?' This study evaluates the confidence intervals of ecological inferences from high-throughput sequencing data of foliar fungal endophytes as case studies through a range of sampling efforts, sequencing depths, and taxonomic resolutions to understand how technical and analytical practices may affect our interpretations. Increasing sampling size reliably decreased confidence intervals across multiple community comparisons. However, the effects of sequencing depths on confidence intervals depended on how rare taxa influenced the dissimilarity estimates among communities and did not significantly decrease confidence intervals for all community comparisons. A comparison of simulated communities under random drift suggests that sequencing depths are important in estimating dissimilarities between microbial communities under neutral selective processes. Confidence interval analyses reveal important biases as well as biological trends in microbial community studies that otherwise may be ignored when communities are only compared for statistically significant differences.

  8. Design of Protein Multi-specificity Using an Independent Sequence Search Reduces the Barrier to Low Energy Sequences

    PubMed Central

    Sevy, Alexander M.; Jacobs, Tim M.; Crowe, James E.; Meiler, Jens

    2015-01-01

    Computational protein design has found great success in engineering proteins for thermodynamic stability, binding specificity, or enzymatic activity in a ‘single state’ design (SSD) paradigm. Multi-specificity design (MSD), on the other hand, involves considering the stability of multiple protein states simultaneously. We have developed a novel MSD algorithm, which we refer to as REstrained CONvergence in multi-specificity design (RECON). The algorithm allows each state to adopt its own sequence throughout the design process rather than enforcing a single sequence on all states. Convergence to a single sequence is encouraged through an incrementally increasing convergence restraint for corresponding positions. Compared to MSD algorithms that enforce (constrain) an identical sequence on all states the energy landscape is simplified, which accelerates the search drastically. As a result, RECON can readily be used in simulations with a flexible protein backbone. We have benchmarked RECON on two design tasks. First, we designed antibodies derived from a common germline gene against their diverse targets to assess recovery of the germline, polyspecific sequence. Second, we design “promiscuous”, polyspecific proteins against all binding partners and measure recovery of the native sequence. We show that RECON is able to efficiently recover native-like, biologically relevant sequences in this diverse set of protein complexes. PMID:26147100

  9. A confidence interval analysis of sampling effort, sequencing depth, and taxonomic resolution of fungal community ecology in the era of high-throughput sequencing

    PubMed Central

    2017-01-01

    High-throughput sequencing technology has helped microbial community ecologists explore ecological and evolutionary patterns at unprecedented scales. The benefits of a large sample size still typically outweigh that of greater sequencing depths per sample for accurate estimations of ecological inferences. However, excluding or not sequencing rare taxa may mislead the answers to the questions ‘how and why are communities different?’ This study evaluates the confidence intervals of ecological inferences from high-throughput sequencing data of foliar fungal endophytes as case studies through a range of sampling efforts, sequencing depths, and taxonomic resolutions to understand how technical and analytical practices may affect our interpretations. Increasing sampling size reliably decreased confidence intervals across multiple community comparisons. However, the effects of sequencing depths on confidence intervals depended on how rare taxa influenced the dissimilarity estimates among communities and did not significantly decrease confidence intervals for all community comparisons. A comparison of simulated communities under random drift suggests that sequencing depths are important in estimating dissimilarities between microbial communities under neutral selective processes. Confidence interval analyses reveal important biases as well as biological trends in microbial community studies that otherwise may be ignored when communities are only compared for statistically significant differences. PMID:29253889

  10. SeqCompress: an algorithm for biological sequence compression.

    PubMed

    Sardaraz, Muhammad; Tahir, Muhammad; Ikram, Ataul Aziz; Bajwa, Hassan

    2014-10-01

    The growth of Next Generation Sequencing technologies presents significant research challenges, specifically to design bioinformatics tools that handle massive amount of data efficiently. Biological sequence data storage cost has become a noticeable proportion of total cost in the generation and analysis. Particularly increase in DNA sequencing rate is significantly outstripping the rate of increase in disk storage capacity, which may go beyond the limit of storage capacity. It is essential to develop algorithms that handle large data sets via better memory management. This article presents a DNA sequence compression algorithm SeqCompress that copes with the space complexity of biological sequences. The algorithm is based on lossless data compression and uses statistical model as well as arithmetic coding to compress DNA sequences. The proposed algorithm is compared with recent specialized compression tools for biological sequences. Experimental results show that proposed algorithm has better compression gain as compared to other existing algorithms. Copyright © 2014 Elsevier Inc. All rights reserved.

  11. Probabilities and predictions: modeling the development of scientific problem-solving skills.

    PubMed

    Stevens, Ron; Johnson, David F; Soller, Amy

    2005-01-01

    The IMMEX (Interactive Multi-Media Exercises) Web-based problem set platform enables the online delivery of complex, multimedia simulations, the rapid collection of student performance data, and has already been used in several genetic simulations. The next step is the use of these data to understand and improve student learning in a formative manner. This article describes the development of probabilistic models of undergraduate student problem solving in molecular genetics that detailed the spectrum of strategies students used when problem solving, and how the strategic approaches evolved with experience. The actions of 776 university sophomore biology majors from three molecular biology lecture courses were recorded and analyzed. Each of six simulations were first grouped by artificial neural network clustering to provide individual performance measures, and then sequences of these performances were probabilistically modeled by hidden Markov modeling to provide measures of progress. The models showed that students with different initial problem-solving abilities choose different strategies. Initial and final strategies varied across different sections of the same course and were not strongly correlated with other achievement measures. In contrast to previous studies, we observed no significant gender differences. We suggest that instructor interventions based on early student performances with these simulations may assist students to recognize effective and efficient problem-solving strategies and enhance learning.

  12. Sequence-specific backbone resonance assignments and microsecond timescale molecular dynamics simulation of human eosinophil-derived neurotoxin.

    PubMed

    Gagné, Donald; Narayanan, Chitra; Bafna, Khushboo; Charest, Laurie-Anne; Agarwal, Pratul K; Doucet, Nicolas

    2017-10-01

    Eight active canonical members of the pancreatic-like ribonuclease A (RNase A) superfamily have been identified in human. All structural homologs share similar RNA-degrading functions, while also cumulating other various biological activities in different tissues. The functional homologs eosinophil-derived neurotoxin (EDN, or RNase 2) and eosinophil cationic protein (ECP, or RNase 3) are known to be expressed and secreted by eosinophils in response to infection, and have thus been postulated to play an important role in host defense and inflammatory response. We recently initiated the biophysical and dynamical investigation of several vertebrate RNase homologs and observed that clustering residue dynamics appear to be linked with the phylogeny and biological specificity of several members. Here we report the 1 H, 13 C and 15 N backbone resonance assignments of human EDN (RNase 2) and its molecular dynamics simulation on the microsecond timescale, providing means to pursue this comparative atomic-scale functional and dynamical analysis by NMR and computation over multiple time frames.

  13. Quantum annealing versus classical machine learning applied to a simplified computational biology problem

    NASA Astrophysics Data System (ADS)

    Li, Richard Y.; Di Felice, Rosa; Rohs, Remo; Lidar, Daniel A.

    2018-03-01

    Transcription factors regulate gene expression, but how these proteins recognize and specifically bind to their DNA targets is still debated. Machine learning models are effective means to reveal interaction mechanisms. Here we studied the ability of a quantum machine learning approach to classify and rank binding affinities. Using simplified data sets of a small number of DNA sequences derived from actual binding affinity experiments, we trained a commercially available quantum annealer to classify and rank transcription factor binding. The results were compared to state-of-the-art classical approaches for the same simplified data sets, including simulated annealing, simulated quantum annealing, multiple linear regression, LASSO, and extreme gradient boosting. Despite technological limitations, we find a slight advantage in classification performance and nearly equal ranking performance using the quantum annealer for these fairly small training data sets. Thus, we propose that quantum annealing might be an effective method to implement machine learning for certain computational biology problems.

  14. Universality of long-range correlations in expansion randomization systems

    NASA Astrophysics Data System (ADS)

    Messer, P. W.; Lässig, M.; Arndt, P. F.

    2005-10-01

    We study the stochastic dynamics of sequences evolving by single-site mutations, segmental duplications, deletions, and random insertions. These processes are relevant for the evolution of genomic DNA. They define a universality class of non-equilibrium 1D expansion-randomization systems with generic stationary long-range correlations in a regime of growing sequence length. We obtain explicitly the two-point correlation function of the sequence composition and the distribution function of the composition bias in sequences of finite length. The characteristic exponent χ of these quantities is determined by the ratio of two effective rates, which are explicitly calculated for several specific sequence evolution dynamics of the universality class. Depending on the value of χ, we find two different scaling regimes, which are distinguished by the detectability of the initial composition bias. All analytic results are accurately verified by numerical simulations. We also discuss the non-stationary build-up and decay of correlations, as well as more complex evolutionary scenarios, where the rates of the processes vary in time. Our findings provide a possible example for the emergence of universality in molecular biology.

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vassilevska, Tanya

    This is the first code, designed to run on a desktop, which models the intracellular replication and the cell-to-cell infection and demonstrates virus evolution at the molecular level. This code simulates the infection of a population of "idealized biological cells" (represented as objects that do not divide or have metabolism) with "virus" (represented by its genetic sequence), the replication and simultaneous mutation of the virus which leads to evolution of the population of genetically diverse viruses. The code is built to simulate single-stranded RNA viruses. The input for the code is 1. the number of biological cells in the culture,more » 2. the initial composition of the virus population, 3. the reference genome of the RNA virus, 4. the coordinates of the genome regions and their significance and, 5. parameters determining the dynamics of virus replication, such as the mutation rate. The simulation ends when all cells have been infected or when no more infections occurs after a given number of attempts. The code has the ability to simulate the evolution of the virus in serial passage of cell "cultures", i.e. after the end of a simulation, a new one is immediately scheduled with a new culture of infected cells. The code outputs characteristics of the resulting virus population dynamics and genetic composition of the virus population, such as the top dominant genomes, percentage of a genome with specific characteristics.« less

  16. Detection and interrogation of biomolecules via nanoscale probes: From fundamental physics to DNA sequencing

    NASA Astrophysics Data System (ADS)

    Zwolak, Michael

    2013-03-01

    A rapid and low-cost method to sequence DNA would revolutionize personalized medicine, where genetic information is used to diagnose, treat, and prevent diseases. There is a longstanding interest in nanopores as a platform for rapid interrogation of single DNA molecules. I will discuss a sequencing protocol based on the measurement of transverse electronic currents during the translocation of single-stranded DNA through nanopores. Using molecular dynamics simulations coupled to quantum mechanical calculations of the tunneling current, I will show that the DNA nucleotides are predicted to have distinguishable electronic signatures in experimentally realizable systems. Several recent experiments support our theoretical predictions. In addition to their possible impact in medicine and biology, the above methods offer ideal test beds to study open scientific issues in the relatively unexplored area at the interface between solids, liquids, and biomolecules at the nanometer length scale. http://mike.zwolak.org

  17. Reconstructing metastatic seeding patterns of human cancers

    PubMed Central

    Reiter, Johannes G.; Makohon-Moore, Alvin P.; Gerold, Jeffrey M.; Bozic, Ivana; Chatterjee, Krishnendu; Iacobuzio-Donahue, Christine A.; Vogelstein, Bert; Nowak, Martin A.

    2017-01-01

    Reconstructing the evolutionary history of metastases is critical for understanding their basic biological principles and has profound clinical implications. Genome-wide sequencing data has enabled modern phylogenomic methods to accurately dissect subclones and their phylogenies from noisy and impure bulk tumour samples at unprecedented depth. However, existing methods are not designed to infer metastatic seeding patterns. Here we develop a tool, called Treeomics, to reconstruct the phylogeny of metastases and map subclones to their anatomic locations. Treeomics infers comprehensive seeding patterns for pancreatic, ovarian, and prostate cancers. Moreover, Treeomics correctly disambiguates true seeding patterns from sequencing artifacts; 7% of variants were misclassified by conventional statistical methods. These artifacts can skew phylogenies by creating illusory tumour heterogeneity among distinct samples. In silico benchmarking on simulated tumour phylogenies across a wide range of sample purities (15–95%) and sequencing depths (25-800 × ) demonstrates the accuracy of Treeomics compared with existing methods. PMID:28139641

  18. An improved stochastic fractal search algorithm for 3D protein structure prediction.

    PubMed

    Zhou, Changjun; Sun, Chuan; Wang, Bin; Wang, Xiaojun

    2018-05-03

    Protein structure prediction (PSP) is a significant area for biological information research, disease treatment, and drug development and so on. In this paper, three-dimensional structures of proteins are predicted based on the known amino acid sequences, and the structure prediction problem is transformed into a typical NP problem by an AB off-lattice model. This work applies a novel improved Stochastic Fractal Search algorithm (ISFS) to solve the problem. The Stochastic Fractal Search algorithm (SFS) is an effective evolutionary algorithm that performs well in exploring the search space but falls into local minimums sometimes. In order to avoid the weakness, Lvy flight and internal feedback information are introduced in ISFS. In the experimental process, simulations are conducted by ISFS algorithm on Fibonacci sequences and real peptide sequences. Experimental results prove that the ISFS performs more efficiently and robust in terms of finding the global minimum and avoiding getting stuck in local minimums.

  19. Application of multi-objective optimization to pooled experiments of next generation sequencing for detection of rare mutations.

    PubMed

    Zilinskas, Julius; Lančinskas, Algirdas; Guarracino, Mario Rosario

    2014-01-01

    In this paper we propose some mathematical models to plan a Next Generation Sequencing experiment to detect rare mutations in pools of patients. A mathematical optimization problem is formulated for optimal pooling, with respect to minimization of the experiment cost. Then, two different strategies to replicate patients in pools are proposed, which have the advantage to decrease the overall costs. Finally, a multi-objective optimization formulation is proposed, where the trade-off between the probability to detect a mutation and overall costs is taken into account. The proposed solutions are devised in pursuance of the following advantages: (i) the solution guarantees mutations are detectable in the experimental setting, and (ii) the cost of the NGS experiment and its biological validation using Sanger sequencing is minimized. Simulations show replicating pools can decrease overall experimental cost, thus making pooling an interesting option.

  20. Structure and specificity of the RNA-guided endonuclease Cas9 during DNA interrogation, target binding and cleavage

    PubMed Central

    Josephs, Eric A.; Kocak, D. Dewran; Fitzgibbon, Christopher J.; McMenemy, Joshua; Gersbach, Charles A.; Marszalek, Piotr E.

    2015-01-01

    CRISPR-associated endonuclease Cas9 cuts DNA at variable target sites designated by a Cas9-bound RNA molecule. Cas9's ability to be directed by single ‘guide RNA’ molecules to target nearly any sequence has been recently exploited for a number of emerging biological and medical applications. Therefore, understanding the nature of Cas9's off-target activity is of paramount importance for its practical use. Using atomic force microscopy (AFM), we directly resolve individual Cas9 and nuclease-inactive dCas9 proteins as they bind along engineered DNA substrates. High-resolution imaging allows us to determine their relative propensities to bind with different guide RNA variants to targeted or off-target sequences. Mapping the structural properties of Cas9 and dCas9 to their respective binding sites reveals a progressive conformational transformation at DNA sites with increasing sequence similarity to its target. With kinetic Monte Carlo (KMC) simulations, these results provide evidence of a ‘conformational gating’ mechanism driven by the interactions between the guide RNA and the 14th–17th nucleotide region of the targeted DNA, the stabilities of which we find correlate significantly with reported off-target cleavage rates. KMC simulations also reveal potential methodologies to engineer guide RNA sequences with improved specificity by considering the invasion of guide RNAs into targeted DNA duplex. PMID:26384421

  1. Cost-Effectiveness of Treatment Sequences of Chemotherapies and Targeted Biologics for Elderly Metastatic Colorectal Cancer Patients.

    PubMed

    Parikh, Rohan C; Du, Xianglin L; Robert, Morgan O; Lairson, David R

    2017-01-01

    Treatment patterns for metastatic colorectal cancer (mCRC) patients have changed considerably over the last decade with the introduction of new chemotherapies and targeted biologics. These treatments are often administered in various sequences with limited evidence regarding their cost-effectiveness. To conduct a pharmacoeconomic evaluation of commonly administered treatment sequences among elderly mCRC patients. A probabilistic discrete event simulation model assuming Weibull distribution was developed to evaluate the cost-effectiveness of the following common treatment sequences: (a) first-line oxaliplatin/irinotecan followed by second-line oxaliplatin/irinotecan + bevacizumab (OI-OIB); (b) first-line oxaliplatin/irinotecan + bevacizumab followed by second-line oxaliplatin/irinotecan + bevacizumab (OIB-OIB); (c) OI-OIB followed by a third-line targeted biologic (OI-OIB-TB); and (d) OIB-OIB followed by a third-line targeted biologic (OIB-OIB-TB). Input parameters for the model were primarily obtained from the Surveillance, Epidemiology, and End Results-Medicare linked dataset for incident mCRC patients aged 65 years and older diagnosed from January 2004 through December 2009. A probabilistic sensitivity analysis was performed to account for parameter uncertainty. Costs (2014 U.S. dollars) and effectiveness were discounted at an annual rate of 3%. In the base case analyses, at the willingness-to-pay (WTP) threshold of $100,000/quality-adjusted life-year (QALY) gained, the treatment sequence OIB-OIB (vs. OI-OIB) was not cost-effective with an incremental cost-effectiveness ratio (ICER) per patient of $119,007/QALY; OI-OIB-TB (vs. OIB-OIB) was dominated; and OIB-OIB-TB (vs. OIB-OIB) was not cost-effective with an ICER of $405,857/QALY. Results similar to the base case analysis were obtained assuming log-normal distribution. Cost-effectiveness acceptability curves derived from a probabilistic sensitivity analysis showed that at a WTP of $100,000/QALY gained, sequence OI-OIB was 34% cost-effective, followed by OIB-OIB (31%), OI-OIB-TB (20%), and OIB-OIB-TB (15%). Overall, survival increases marginally with the addition of targeted biologics, such as bevacizumab, at first line and third line at substantial costs. Treatment sequences with bevacizumab at first line and targeted biologics at third line may not be cost-effective at the commonly used threshold of $100,000/QALY gained, but a marginal decrease in the cost of bevacizumab may make treatment sequences with first-line bevacizumab cost-effective. Future economic evaluations should validate the study results using parameters from ongoing clinical trials. This study was supported in part by a grant from the Agency for Healthcare Research and Quality (R01-HS018956) and in part by a grant from the Cancer Prevention and Research Institute of Texas (RP130051), which were obtained by Du. The authors report no conflicts of interest. Study concept and design were primarily contributed by Parikh, along with the other authors. All authors participated in data collection, and Parikh took the lead in data interpretation and analysis, along with Lairson and Morgan, with assistance from Du. The manuscript was written primarily by Parikh, along with Lairson, Morgan, and Du, and revised by Parikh.

  2. DRME: Count-based differential RNA methylation analysis at small sample size scenario.

    PubMed

    Liu, Lian; Zhang, Shao-Wu; Gao, Fan; Zhang, Yixin; Huang, Yufei; Chen, Runsheng; Meng, Jia

    2016-04-15

    Differential methylation, which concerns difference in the degree of epigenetic regulation via methylation between two conditions, has been formulated as a beta or beta-binomial distribution to address the within-group biological variability in sequencing data. However, a beta or beta-binomial model is usually difficult to infer at small sample size scenario with discrete reads count in sequencing data. On the other hand, as an emerging research field, RNA methylation has drawn more and more attention recently, and the differential analysis of RNA methylation is significantly different from that of DNA methylation due to the impact of transcriptional regulation. We developed DRME to better address the differential RNA methylation problem. The proposed model can effectively describe within-group biological variability at small sample size scenario and handles the impact of transcriptional regulation on RNA methylation. We tested the newly developed DRME algorithm on simulated and 4 MeRIP-Seq case-control studies and compared it with Fisher's exact test. It is in principle widely applicable to several other RNA-related data types as well, including RNA Bisulfite sequencing and PAR-CLIP. The code together with an MeRIP-Seq dataset is available online (https://github.com/lzcyzm/DRME) for evaluation and reproduction of the figures shown in this article. Copyright © 2016 Elsevier Inc. All rights reserved.

  3. MitoGen: A Framework for Generating 3D Synthetic Time-Lapse Sequences of Cell Populations in Fluorescence Microscopy.

    PubMed

    Svoboda, David; Ulman, Vladimir

    2017-01-01

    The proper analysis of biological microscopy images is an important and complex task. Therefore, it requires verification of all steps involved in the process, including image segmentation and tracking algorithms. It is generally better to verify algorithms with computer-generated ground truth datasets, which, compared to manually annotated data, nowadays have reached high quality and can be produced in large quantities even for 3D time-lapse image sequences. Here, we propose a novel framework, called MitoGen, which is capable of generating ground truth datasets with fully 3D time-lapse sequences of synthetic fluorescence-stained cell populations. MitoGen shows biologically justified cell motility, shape and texture changes as well as cell divisions. Standard fluorescence microscopy phenomena such as photobleaching, blur with real point spread function (PSF), and several types of noise, are simulated to obtain realistic images. The MitoGen framework is scalable in both space and time. MitoGen generates visually plausible data that shows good agreement with real data in terms of image descriptors and mean square displacement (MSD) trajectory analysis. Additionally, it is also shown in this paper that four publicly available segmentation and tracking algorithms exhibit similar performance on both real and MitoGen-generated data. The implementation of MitoGen is freely available.

  4. Using Genotype Abundance to Improve Phylogenetic Inference

    PubMed Central

    Mesin, Luka; Victora, Gabriel D; Minin, Vladimir N; Matsen, Frederick A

    2018-01-01

    Abstract Modern biological techniques enable very dense genetic sampling of unfolding evolutionary histories, and thus frequently sample some genotypes multiple times. This motivates strategies to incorporate genotype abundance information in phylogenetic inference. In this article, we synthesize a stochastic process model with standard sequence-based phylogenetic optimality, and show that tree estimation is substantially improved by doing so. Our method is validated with extensive simulations and an experimental single-cell lineage tracing study of germinal center B cell receptor affinity maturation. PMID:29474671

  5. Single-cell sequencing technologies: current and future.

    PubMed

    Liang, Jialong; Cai, Wanshi; Sun, Zhongsheng

    2014-10-20

    Intensively developed in the last few years, single-cell sequencing technologies now present numerous advantages over traditional sequencing methods for solving the problems of biological heterogeneity and low quantities of available biological materials. The application of single-cell sequencing technologies has profoundly changed our understanding of a series of biological phenomena, including gene transcription, embryo development, and carcinogenesis. However, before single-cell sequencing technologies can be used extensively, researchers face the serious challenge of overcoming inherent issues of high amplification bias, low accuracy and reproducibility. Here, we simply summarize the techniques used for single-cell isolation, and review the current technologies used in single-cell genomic, transcriptomic, and epigenomic sequencing. We discuss the merits, defects, and scope of application of single-cell sequencing technologies and then speculate on the direction of future developments. Copyright © 2014 Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, and Genetics Society of China. Published by Elsevier Ltd. All rights reserved.

  6. Bridging the physical scales in evolutionary biology: From protein sequence space to fitness of organisms and populations

    PubMed Central

    Bershtein, Shimon; Serohijos, Adrian W.R.; Shakhnovich, Eugene I.

    2016-01-01

    Bridging the gap between the molecular properties of proteins and organismal/population fitness is essential for understanding evolutionary processes. This task requires the integration of the several physical scales of biological organization, each defined by a distinct set of mechanisms and constraints, into a single unifying model. The molecular scale is dominated by the constraints imposed by the physico-chemical properties of proteins and their substrates, which give rise to trade-offs and epistatic (non-additive) effects of mutations. At the systems scale, biological networks modulate protein expression and can either buffer or enhance the fitness effects of mutations. The population scale is influenced by the mutational input, selection regimes, and stochastic changes affecting the size and structure of populations, which eventually determine the evolutionary fate of mutations. Here, we summarize the recent advances in theory, computer simulations, and experiments that advance our understanding of the links between various physical scales in biology. PMID:27810574

  7. Bridging the physical scales in evolutionary biology: from protein sequence space to fitness of organisms and populations.

    PubMed

    Bershtein, Shimon; Serohijos, Adrian Wr; Shakhnovich, Eugene I

    2017-02-01

    Bridging the gap between the molecular properties of proteins and organismal/population fitness is essential for understanding evolutionary processes. This task requires the integration of the several physical scales of biological organization, each defined by a distinct set of mechanisms and constraints, into a single unifying model. The molecular scale is dominated by the constraints imposed by the physico-chemical properties of proteins and their substrates, which give rise to trade-offs and epistatic (non-additive) effects of mutations. At the systems scale, biological networks modulate protein expression and can either buffer or enhance the fitness effects of mutations. The population scale is influenced by the mutational input, selection regimes, and stochastic changes affecting the size and structure of populations, which eventually determine the evolutionary fate of mutations. Here, we summarize the recent advances in theory, computer simulations, and experiments that advance our understanding of the links between various physical scales in biology. Copyright © 2016 Elsevier Ltd. All rights reserved.

  8. Polyester: simulating RNA-seq datasets with differential transcript expression.

    PubMed

    Frazee, Alyssa C; Jaffe, Andrew E; Langmead, Ben; Leek, Jeffrey T

    2015-09-01

    Statistical methods development for differential expression analysis of RNA sequencing (RNA-seq) requires software tools to assess accuracy and error rate control. Since true differential expression status is often unknown in experimental datasets, artificially constructed datasets must be utilized, either by generating costly spike-in experiments or by simulating RNA-seq data. Polyester is an R package designed to simulate RNA-seq data, beginning with an experimental design and ending with collections of RNA-seq reads. Its main advantage is the ability to simulate reads indicating isoform-level differential expression across biological replicates for a variety of experimental designs. Data generated by Polyester is a reasonable approximation to real RNA-seq data and standard differential expression workflows can recover differential expression set in the simulation by the user. Polyester is freely available from Bioconductor (http://bioconductor.org/). jtleek@gmail.com Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  9. Elongation Factor-Tu (EF-Tu) proteins structural stability and bioinformatics in ancestral gene reconstruction

    NASA Astrophysics Data System (ADS)

    Dehipawala, Sunil; Nguyen, A.; Tremberger, G.; Cheung, E.; Schneider, P.; Lieberman, D.; Holden, T.; Cheung, T.

    2013-09-01

    A paleo-experimental evolution report on elongation factor EF-Tu structural stability results has provided an opportunity to rewind the tape of life using the ancestral protein sequence reconstruction modeling approach; consistent with the book of life dogma in current biology and being an important component in the astrobiology community. Fractal dimension via the Higuchi fractal method and Shannon entropy of the DNA sequence classification could be used in a diagram that serves as a simple summary. Results from biomedical gene research provide examples on the diagram methodology. Comparisons between biomedical genes such as EEF2 (elongation factor 2 human, mouse, etc), WDR85 in epigenetics, HAR1 in human specificity, DLG1 in cognitive skill, and HLA-C in mosquito bite immunology with EF Tu DNA sequences have accounted for the reported circular dichroism thermo-stability data systematically; the results also infer a relatively less volatility geologic time period from 2 to 3 Gyr from adaptation viewpoint. Comparison to Thermotoga maritima MSB8 and Psychrobacter shows that Thermus thermophilus HB8 EF-Tu calibration sequence could be an outlier, consistent with free energy calculation by NUPACK. Diagram methodology allows computer simulation studies and HAR1 shows about 0.5% probability from chimp to human in terms of diagram location, and SNP simulation results such as amoebic meningoencephalitis NAF1 suggest correlation. Extensions to the studies of the translation and transcription elongation factor sequences in Megavirus Chiliensis, Megavirus Lba and Pandoravirus show that the studied Pandoravirus sequence could be an outlier with the highest fractal dimension and lowest entropy, as compared to chicken as a deviant in the DNMT3A DNA methylation gene sequences from zebrafish to human and to the less than one percent probability in computer simulation using the HAR1 0.5% probability as reference. The diagram methodology would be useful in ancestral gene reconstruction studies in astrobiology and also be applicable to the study of point mutation in conformational thermostabilization research with Synchrotron based X-ray data for drug applications such as Parkinson's disease.

  10. Transport of Proteins through Nanopores

    NASA Astrophysics Data System (ADS)

    Luan, Binquan

    In biological cells, a malfunctioned protein (such as misfolded or damaged) is degraded by a protease in which an unfoldase actively drags the protein into a nanopore-like structure and then a peptidase cuts the linearized protein into small fragments (i.e. a recycling process). Mimicking this biological process, many experimental studies have focused on the transport of proteins through a biological protein pore or a synthetic solid-state nanopore. Potentially, the nanopore-based sensors can provide a platform for interrogating proteins that might be disease-related or be targeted by a new drug molecule. The single-profile of a protein chain inside an extremely small nanopore might even permit the sequencing of the protein. Here, through all-atom molecular dynamics simulations, I will show various types of protein transport through a nanopore and reveal the nanoscale mechanics/energetics that plays an important role governing the protein transport.

  11. Large-Scale Simulations of Plastic Neural Networks on Neuromorphic Hardware

    PubMed Central

    Knight, James C.; Tully, Philip J.; Kaplan, Bernhard A.; Lansner, Anders; Furber, Steve B.

    2016-01-01

    SpiNNaker is a digital, neuromorphic architecture designed for simulating large-scale spiking neural networks at speeds close to biological real-time. Rather than using bespoke analog or digital hardware, the basic computational unit of a SpiNNaker system is a general-purpose ARM processor, allowing it to be programmed to simulate a wide variety of neuron and synapse models. This flexibility is particularly valuable in the study of biological plasticity phenomena. A recently proposed learning rule based on the Bayesian Confidence Propagation Neural Network (BCPNN) paradigm offers a generic framework for modeling the interaction of different plasticity mechanisms using spiking neurons. However, it can be computationally expensive to simulate large networks with BCPNN learning since it requires multiple state variables for each synapse, each of which needs to be updated every simulation time-step. We discuss the trade-offs in efficiency and accuracy involved in developing an event-based BCPNN implementation for SpiNNaker based on an analytical solution to the BCPNN equations, and detail the steps taken to fit this within the limited computational and memory resources of the SpiNNaker architecture. We demonstrate this learning rule by learning temporal sequences of neural activity within a recurrent attractor network which we simulate at scales of up to 2.0 × 104 neurons and 5.1 × 107 plastic synapses: the largest plastic neural network ever to be simulated on neuromorphic hardware. We also run a comparable simulation on a Cray XC-30 supercomputer system and find that, if it is to match the run-time of our SpiNNaker simulation, the super computer system uses approximately 45× more power. This suggests that cheaper, more power efficient neuromorphic systems are becoming useful discovery tools in the study of plasticity in large-scale brain models. PMID:27092061

  12. Equilibrium point control of a monkey arm simulator by a fast learning tree structured artificial neural network.

    PubMed

    Dornay, M; Sanger, T D

    1993-01-01

    A planar 17 muscle model of the monkey's arm based on realistic biomechanical measurements was simulated on a Symbolics Lisp Machine. The simulator implements the equilibrium point hypothesis for the control of arm movements. Given initial and final desired positions, it generates a minimum-jerk desired trajectory of the hand and uses the backdriving algorithm to determine an appropriate sequence of motor commands to the muscles (Flash 1987; Mussa-Ivaldi et al. 1991; Dornay 1991b). These motor commands specify a temporal sequence of stable (attractive) equilibrium positions which lead to the desired hand movement. A strong disadvantage of the simulator is that it has no memory of previous computations. Determining the desired trajectory using the minimum-jerk model is instantaneous, but the laborious backdriving algorithm is slow, and can take up to one hour for some trajectories. The complexity of the required computations makes it a poor model for biological motor control. We propose a computationally simpler and more biologically plausible method for control which achieves the benefits of the backdriving algorithm. A fast learning, tree-structured network (Sanger 1991c) was trained to remember the knowledge obtained by the backdriving algorithm. The neural network learned the nonlinear mapping from a 2-dimensional cartesian planar hand position (x,y) to a 17-dimensional motor command space (u1, . . ., u17). Learning 20 training trajectories, each composed of 26 sample points [[x,y], [u1, . . ., u17] took only 20 min on a Sun-4 Sparc workstation. After the learning stage, new, untrained test trajectories as well as the original trajectories of the hand were given to the neural network as input. The network calculated the required motor commands for these movements. The resulting movements were close to the desired ones for both the training and test cases.

  13. diffHic: a Bioconductor package to detect differential genomic interactions in Hi-C data.

    PubMed

    Lun, Aaron T L; Smyth, Gordon K

    2015-08-19

    Chromatin conformation capture with high-throughput sequencing (Hi-C) is a technique that measures the in vivo intensity of interactions between all pairs of loci in the genome. Most conventional analyses of Hi-C data focus on the detection of statistically significant interactions. However, an alternative strategy involves identifying significant changes in the interaction intensity (i.e., differential interactions) between two or more biological conditions. This is more statistically rigorous and may provide more biologically relevant results. Here, we present the diffHic software package for the detection of differential interactions from Hi-C data. diffHic provides methods for read pair alignment and processing, counting into bin pairs, filtering out low-abundance events and normalization of trended or CNV-driven biases. It uses the statistical framework of the edgeR package to model biological variability and to test for significant differences between conditions. Several options for the visualization of results are also included. The use of diffHic is demonstrated with real Hi-C data sets. Performance against existing methods is also evaluated with simulated data. On real data, diffHic is able to successfully detect interactions with significant differences in intensity between biological conditions. It also compares favourably to existing software tools on simulated data sets. These results suggest that diffHic is a viable approach for differential analyses of Hi-C data.

  14. An investigation of the impact of science course sequencing on student performance in high school science and math

    NASA Astrophysics Data System (ADS)

    Mary, Michael Todd

    High school students in the United States for the past century have typically taken science courses in a sequence of biology followed by chemistry and concluding with physics. An alternative sequence, typically referred to as "physics first" inverts the traditional sequence by having students begin with physics and end with biology. Proponents of physics first cite advances in biological sciences that have dramatically changed the nature of high school biology and the potential benefit to student learning in math that would accompany taking an algebra-based physics course in the early years of high school to support changing the sequence. Using a quasi-experimental, quantitative research design, the purpose of this study was to investigate the impact of science course sequencing on student achievement in math and science at a school district that offered both course sequences. The Texas state end-of-course exams in biology, chemistry, physics, algebra I and geometry were used as the instruments measuring student achievement in math and science at the end of each academic year. Various statistical models were used to analyze these achievement data. The conclusion was, for students in this study, the sequence in which students took biology, chemistry, and physics had little or no impact on performance on the end-of-course assessments in each of these courses. Additionally there was only a minimal effect found with respect to math performance, leading to the conclusion that neither the traditional or "physics first" science course sequence presented an advantage for student achievement in math or science.

  15. Next-Generation Sequencing Platforms

    NASA Astrophysics Data System (ADS)

    Mardis, Elaine R.

    2013-06-01

    Automated DNA sequencing instruments embody an elegant interplay among chemistry, engineering, software, and molecular biology and have built upon Sanger's founding discovery of dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative physical mapping approaches that helped to establish long-range relationships between cloned stretches of genomic DNA, fluorescent DNA sequencers produced reference genome sequences for model organisms and for the reference human genome. New types of sequencing instruments that permit amazing acceleration of data-collection rates for DNA sequencing have been developed. The ability to generate genome-scale data sets is now transforming the nature of biological inquiry. Here, I provide an historical perspective of the field, focusing on the fundamental developments that predated the advent of next-generation sequencing instruments and providing information about how these instruments work, their application to biological research, and the newest types of sequencers that can extract data from single DNA molecules.

  16. Using cellular automata to generate image representation for biological sequences.

    PubMed

    Xiao, X; Shao, S; Ding, Y; Huang, Z; Chen, X; Chou, K-C

    2005-02-01

    A novel approach to visualize biological sequences is developed based on cellular automata (Wolfram, S. Nature 1984, 311, 419-424), a set of discrete dynamical systems in which space and time are discrete. By transforming the symbolic sequence codes into the digital codes, and using some optimal space-time evolvement rules of cellular automata, a biological sequence can be represented by a unique image, the so-called cellular automata image. Many important features, which are originally hidden in a long and complicated biological sequence, can be clearly revealed thru its cellular automata image. With biological sequences entering into databanks rapidly increasing in the post-genomic era, it is anticipated that the cellular automata image will become a very useful vehicle for investigation into their key features, identification of their function, as well as revelation of their "fingerprint". It is anticipated that by using the concept of the pseudo amino acid composition (Chou, K.C. Proteins: Structure, Function, and Genetics, 2001, 43, 246-255), the cellular automata image approach can also be used to improve the quality of predicting protein attributes, such as structural class and subcellular location.

  17. Electro-peroxone pretreatment for enhanced simulated hospital wastewater treatment and antibiotic resistance genes reduction.

    PubMed

    Zheng, He-Shan; Guo, Wan-Qian; Wu, Qu-Li; Ren, Nan-Qi; Chang, Jo-Shu

    2018-06-01

    Hospital wastewater is one of the possible sources responsible for antibiotic resistant bacteria spread into the environment. This study proposed a promising strategy, electro-peroxone (E-peroxone) pretreatment followed by a sequencing batch reactor (SBR) for simulated hospital wastewater treatment, aiming to enhance the wastewater treatment performance and to reduce antibiotic resistance genes production simultaneously. The highest chemical oxygen demand (COD) and total organic carbon (TOC) removal efficiency of 94.3% and 92.8% were obtained using the E-peroxone-SBR process. The microbial community analysis through high-throughput sequencing showed that E-peroxone pretreatment could guarantee microbial richness and diversity in SBR, as well as reduce the microbial inhibitions caused by antibiotic and raise the amount of nitrification and denitrification genera. Specially, quantitative real-time PCRs revealed that E-peroxone pretreatment could largely reduce the numbers and contents of antibiotic resistance genes (ARGs) production in the following biological treatment unit. It was indicated that E-peroxone-SBR process may provide an effective way for hospital wastewater treatment and possible ARGs reduction. Copyright © 2018 Elsevier Ltd. All rights reserved.

  18. Role of mutation on fibril formation in small peptides by REMD

    NASA Astrophysics Data System (ADS)

    Mahmoudinobar, Farbod; Dias, Cristiano

    Amyloid fibrils are now recognized as a common form of protein structure. They have wide implications for neurological diseases and entities involved in the survival of living organisms, e.g., silkmoth eggshells. Biological functions of these entities are often related to the superior mechanical strength of fibrils that persists over a broad range of chemical and thermal conditions desirable for various biotechnological applications, e.g., to encapsulate drugs. Mechanical properties of fibrils was shown to depend strongly on the amino acid sequence of its constituent peptides whereby bending rigidities can vary by two orders of magnitude. Therefore, the rational design of new fibril-prone peptides with tailored properties depends on our understanding of the relation between amino acid sequence and its propensity to fibrillize. In this presentation I will show results from extensive Replica Exchange Molecular Dynamics (REMD) simulations of a 12-residue peptide containing the fibril-prone motif KFFE and its mutants. Simulations are performed on monomers, dimers, and tetramers. I will discuss effects of side chain packing, hydrophobicity, charges and beta-sheet propensity on fibril formation. Physics Department, University Heights, Newark, New Jersey, 07102-1982, USA.

  19. A comparison of students' achievement and attitude as a function of lecture/lab sequencing in a non-science majors introductory biology course

    NASA Astrophysics Data System (ADS)

    Hurst March, Robin Denise

    This investigation compared student achievement and attitudes toward science from three different sequencing approaches used in teaching biology to nonscience students. The three sequencing approaches were the lecture course only, lecture/laboratory courses taken together, and laboratory with previously taken lecture approach. The purposes of this study were to determine if (1) a relationship exists between the Attitude Towards Science in School Assessment (ATSSA) scores (Germann, 1988) and biology achievement, (2) a difference exists among the ATSSA scores and sequencing, (3) a difference exists among the biology achievement scores and sequencing, and (4) the ATSSA is a reliable instrument of science attitude assessment for the undergraduate students in an introductory biology nonmajors laboratory and lecture courses at a research I institution during the fall semester 1996. Fifty-four students comprised the lecture only group, 90 students comprised the lecture and laboratory taken together approach, and 23 students comprised the laboratory only approach. Research questions addressed were (1) What are the differences in student biology achievement as a function of the three different methods of instruction? (2) What are the differences in student attitude towards science as a function of the three different methods of instruction? (3) What is the relationship between post-attitude (ATSSA) and biology achievement for each of the three methods of instruction? An analysis of variance utilized the mean posttest scores on the ATSSA and mean achievement scores as the dependent variables. The independent variables were the three different sequences of enrollment in introductory biology. At the.05 level of significance, it was found that no significant difference existed between the ATTS and laboratory/lecture sequence. At the.05 level of significance, it was found that no significant difference existed between achievement and laboratory/lecture sequence. A Pearson product moment correlation was used to see if a relationship existed between posttest ATSSA scores and achievement totals in each sequence. A significant relationship was noted between the ATSSA and achievement in each sequence that involved a laboratory component.

  20. A distributed system for fast alignment of next-generation sequencing data.

    PubMed

    Srimani, Jaydeep K; Wu, Po-Yen; Phan, John H; Wang, May D

    2010-12-01

    We developed a scalable distributed computing system using the Berkeley Open Interface for Network Computing (BOINC) to align next-generation sequencing (NGS) data quickly and accurately. NGS technology is emerging as a promising platform for gene expression analysis due to its high sensitivity compared to traditional genomic microarray technology. However, despite the benefits, NGS datasets can be prohibitively large, requiring significant computing resources to obtain sequence alignment results. Moreover, as the data and alignment algorithms become more prevalent, it will become necessary to examine the effect of the multitude of alignment parameters on various NGS systems. We validate the distributed software system by (1) computing simple timing results to show the speed-up gained by using multiple computers, (2) optimizing alignment parameters using simulated NGS data, and (3) computing NGS expression levels for a single biological sample using optimal parameters and comparing these expression levels to that of a microarray sample. Results indicate that the distributed alignment system achieves approximately a linear speed-up and correctly distributes sequence data to and gathers alignment results from multiple compute clients.

  1. HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

    PubMed

    Wan, Shixiang; Zou, Quan

    2017-01-01

    Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. Extreme increase in next-generation sequencing results in shortage of efficient ultra-large biological sequence alignment approaches for coping with different sequence types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files more than 1 GB) sequence analyses. Based on HAlign and Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. The experiments in the DNA and protein large scale data sets, which are more than 1GB files, showed that HAlign II could save time and space. It outperformed the current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resource. THAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II with open-source codes and datasets was established at http://lab.malab.cn/soft/halign.

  2. Implicit Learning of a Finger Motor Sequence by Patients with Cerebral Palsy After Neurofeedback.

    PubMed

    Alves-Pinto, Ana; Turova, Varvara; Blumenstein, Tobias; Hantuschke, Conny; Lampe, Renée

    2017-03-01

    Facilitation of implicit learning of a hand motor sequence after a single session of neurofeedback training of alpha power recorded from the motor cortex has been shown in healthy individuals (Ros et al., Biological Psychology 95:54-58, 2014). This facilitation effect could be potentially applied to improve the outcome of rehabilitation in patients with impaired hand motor function. In the current study a group of ten patients diagnosed with cerebral palsy trained reduction of alpha power derived from brain activity recorded from right and left motor areas. Training was distributed in three periods of 8 min each. In between, participants performed a serial reaction time task with their non-dominant hand, to a total of five runs. A similar procedure was repeated a week or more later but this time training was based on simulated brain activity. Reaction times pooled across participants decreased on each successive run faster after neurofeedback training than after the simulation training. Also recorded were two 3-min baseline conditions, once with the eyes open, another with the eyes closed, at the beginning and end of the experimental session. No significant changes in alpha power with neurofeedback or with simulation training were obtained and no correlation with the reductions in reaction time could be established. Contributions for this are discussed.

  3. Biological sequence compression algorithms.

    PubMed

    Matsumoto, T; Sadakane, K; Imai, H

    2000-01-01

    Today, more and more DNA sequences are becoming available. The information about DNA sequences are stored in molecular biology databases. The size and importance of these databases will be bigger and bigger in the future, therefore this information must be stored or communicated efficiently. Furthermore, sequence compression can be used to define similarities between biological sequences. The standard compression algorithms such as gzip or compress cannot compress DNA sequences, but only expand them in size. On the other hand, CTW (Context Tree Weighting Method) can compress DNA sequences less than two bits per symbol. These algorithms do not use special structures of biological sequences. Two characteristic structures of DNA sequences are known. One is called palindromes or reverse complements and the other structure is approximate repeats. Several specific algorithms for DNA sequences that use these structures can compress them less than two bits per symbol. In this paper, we improve the CTW so that characteristic structures of DNA sequences are available. Before encoding the next symbol, the algorithm searches an approximate repeat and palindrome using hash and dynamic programming. If there is a palindrome or an approximate repeat with enough length then our algorithm represents it with length and distance. By using this preprocessing, a new program achieves a little higher compression ratio than that of existing DNA-oriented compression algorithms. We also describe new compression algorithm for protein sequences.

  4. Evaluation of a Secondary School Science Program Inversion: Moving from a Traditional to a Modifified-PCB Sequence

    ERIC Educational Resources Information Center

    Gaubatz, Julie

    2013-01-01

    Studies of high-school science course sequences have been limited primarily to a small number of site-specific investigations comparing traditional science sequences (e.g., Biology-Chemistry-Physics: BCP) to various Physics First-influenced sequences (Physics-Chemistry-Biology: PCB). The present study summarizes a five-year program evaluation…

  5. UltraPse: A Universal and Extensible Software Platform for Representing Biological Sequences.

    PubMed

    Du, Pu-Feng; Zhao, Wei; Miao, Yang-Yang; Wei, Le-Yi; Wang, Likun

    2017-11-14

    With the avalanche of biological sequences in public databases, one of the most challenging problems in computational biology is to predict their biological functions and cellular attributes. Most of the existing prediction algorithms can only handle fixed-length numerical vectors. Therefore, it is important to be able to represent biological sequences with various lengths using fixed-length numerical vectors. Although several algorithms, as well as software implementations, have been developed to address this problem, these existing programs can only provide a fixed number of representation modes. Every time a new sequence representation mode is developed, a new program will be needed. In this paper, we propose the UltraPse as a universal software platform for this problem. The function of the UltraPse is not only to generate various existing sequence representation modes, but also to simplify all future programming works in developing novel representation modes. The extensibility of UltraPse is particularly enhanced. It allows the users to define their own representation mode, their own physicochemical properties, or even their own types of biological sequences. Moreover, UltraPse is also the fastest software of its kind. The source code package, as well as the executables for both Linux and Windows platforms, can be downloaded from the GitHub repository.

  6. Probability of misclassifying biological elements in surface waters.

    PubMed

    Loga, Małgorzata; Wierzchołowska-Dziedzic, Anna

    2017-11-24

    Measurement uncertainties are inherent to assessment of biological indices of water bodies. The effect of these uncertainties on the probability of misclassification of ecological status is the subject of this paper. Four Monte-Carlo (M-C) models were applied to simulate the occurrence of random errors in the measurements of metrics corresponding to four biological elements of surface waters: macrophytes, phytoplankton, phytobenthos, and benthic macroinvertebrates. Long series of error-prone measurement values of these metrics, generated by M-C models, were used to identify cases in which values of any of the four biological indices lay outside of the "true" water body class, i.e., outside the class assigned from the actual physical measurements. Fraction of such cases in the M-C generated series was used to estimate the probability of misclassification. The method is particularly useful for estimating the probability of misclassification of the ecological status of surface water bodies in the case of short sequences of measurements of biological indices. The results of the Monte-Carlo simulations show a relatively high sensitivity of this probability to measurement errors of the river macrophyte index (MIR) and high robustness to measurement errors of the benthic macroinvertebrate index (MMI). The proposed method of using Monte-Carlo models to estimate the probability of misclassification has significant potential for assessing the uncertainty of water body status reported to the EC by the EU member countries according to WFD. The method can be readily applied also in risk assessment of water management decisions before adopting the status dependent corrective actions.

  7. Sim3C: simulation of Hi-C and Meta3C proximity ligation sequencing technologies.

    PubMed

    DeMaere, Matthew Z; Darling, Aaron E

    2018-02-01

    Chromosome conformation capture (3C) and Hi-C DNA sequencing methods have rapidly advanced our understanding of the spatial organization of genomes and metagenomes. Many variants of these protocols have been developed, each with their own strengths. Currently there is no systematic means for simulating sequence data from this family of sequencing protocols, potentially hindering the advancement of algorithms to exploit this new datatype. We describe a computational simulator that, given simple parameters and reference genome sequences, will simulate Hi-C sequencing on those sequences. The simulator models the basic spatial structure in genomes that is commonly observed in Hi-C and 3C datasets, including the distance-decay relationship in proximity ligation, differences in the frequency of interaction within and across chromosomes, and the structure imposed by cells. A means to model the 3D structure of randomly generated topologically associating domains is provided. The simulator considers several sources of error common to 3C and Hi-C library preparation and sequencing methods, including spurious proximity ligation events and sequencing error. We have introduced the first comprehensive simulator for 3C and Hi-C sequencing protocols. We expect the simulator to have use in testing of Hi-C data analysis algorithms, as well as more general value for experimental design, where questions such as the required depth of sequencing, enzyme choice, and other decisions can be made in advance in order to ensure adequate statistical power with respect to experimental hypothesis testing.

  8. Using VMD - An Introductory Tutorial

    PubMed Central

    Hsin, Jen; Arkhipov, Anton; Yin, Ying; Stone, John E.; Schulten, Klaus

    2010-01-01

    VMD (Visual Molecular Dynamics) is a molecular visualization and analysis program designed for biological systems such as proteins, nucleic acids, lipid bilayer assemblies, etc. This unit will serve as an introductory VMD tutorial. We will present several step-by-step examples of some of VMD’s most popular features, including visualizing molecules in three dimensions with different drawing and coloring methods, rendering publication-quality figures, animate and analyze the trajectory of a molecular dynamics simulation, scripting in the text-based Tcl/Tk interface, and analyzing both sequence and structure data for proteins. PMID:19085979

  9. BlackOPs: increasing confidence in variant detection through mappability filtering.

    PubMed

    Cabanski, Christopher R; Wilkerson, Matthew D; Soloway, Matthew; Parker, Joel S; Liu, Jinze; Prins, Jan F; Marron, J S; Perou, Charles M; Hayes, D Neil

    2013-10-01

    Identifying variants using high-throughput sequencing data is currently a challenge because true biological variants can be indistinguishable from technical artifacts. One source of technical artifact results from incorrectly aligning experimentally observed sequences to their true genomic origin ('mismapping') and inferring differences in mismapped sequences to be true variants. We developed BlackOPs, an open-source tool that simulates experimental RNA-seq and DNA whole exome sequences derived from the reference genome, aligns these sequences by custom parameters, detects variants and outputs a blacklist of positions and alleles caused by mismapping. Blacklists contain thousands of artifact variants that are indistinguishable from true variants and, for a given sample, are expected to be almost completely false positives. We show that these blacklist positions are specific to the alignment algorithm and read length used, and BlackOPs allows users to generate a blacklist specific to their experimental setup. We queried the dbSNP and COSMIC variant databases and found numerous variants indistinguishable from mapping errors. We demonstrate how filtering against blacklist positions reduces the number of potential false variants using an RNA-seq glioblastoma cell line data set. In summary, accounting for mapping-caused variants tuned to experimental setups reduces false positives and, therefore, improves genome characterization by high-throughput sequencing.

  10. An Introduction to Programming for Bioscientists: A Python-Based Primer

    PubMed Central

    Mura, Cameron

    2016-01-01

    Computing has revolutionized the biological sciences over the past several decades, such that virtually all contemporary research in molecular biology, biochemistry, and other biosciences utilizes computer programs. The computational advances have come on many fronts, spurred by fundamental developments in hardware, software, and algorithms. These advances have influenced, and even engendered, a phenomenal array of bioscience fields, including molecular evolution and bioinformatics; genome-, proteome-, transcriptome- and metabolome-wide experimental studies; structural genomics; and atomistic simulations of cellular-scale molecular assemblies as large as ribosomes and intact viruses. In short, much of post-genomic biology is increasingly becoming a form of computational biology. The ability to design and write computer programs is among the most indispensable skills that a modern researcher can cultivate. Python has become a popular programming language in the biosciences, largely because (i) its straightforward semantics and clean syntax make it a readily accessible first language; (ii) it is expressive and well-suited to object-oriented programming, as well as other modern paradigms; and (iii) the many available libraries and third-party toolkits extend the functionality of the core language into virtually every biological domain (sequence and structure analyses, phylogenomics, workflow management systems, etc.). This primer offers a basic introduction to coding, via Python, and it includes concrete examples and exercises to illustrate the language’s usage and capabilities; the main text culminates with a final project in structural bioinformatics. A suite of Supplemental Chapters is also provided. Starting with basic concepts, such as that of a “variable,” the Chapters methodically advance the reader to the point of writing a graphical user interface to compute the Hamming distance between two DNA sequences. PMID:27271528

  11. An Introduction to Programming for Bioscientists: A Python-Based Primer.

    PubMed

    Ekmekci, Berk; McAnany, Charles E; Mura, Cameron

    2016-06-01

    Computing has revolutionized the biological sciences over the past several decades, such that virtually all contemporary research in molecular biology, biochemistry, and other biosciences utilizes computer programs. The computational advances have come on many fronts, spurred by fundamental developments in hardware, software, and algorithms. These advances have influenced, and even engendered, a phenomenal array of bioscience fields, including molecular evolution and bioinformatics; genome-, proteome-, transcriptome- and metabolome-wide experimental studies; structural genomics; and atomistic simulations of cellular-scale molecular assemblies as large as ribosomes and intact viruses. In short, much of post-genomic biology is increasingly becoming a form of computational biology. The ability to design and write computer programs is among the most indispensable skills that a modern researcher can cultivate. Python has become a popular programming language in the biosciences, largely because (i) its straightforward semantics and clean syntax make it a readily accessible first language; (ii) it is expressive and well-suited to object-oriented programming, as well as other modern paradigms; and (iii) the many available libraries and third-party toolkits extend the functionality of the core language into virtually every biological domain (sequence and structure analyses, phylogenomics, workflow management systems, etc.). This primer offers a basic introduction to coding, via Python, and it includes concrete examples and exercises to illustrate the language's usage and capabilities; the main text culminates with a final project in structural bioinformatics. A suite of Supplemental Chapters is also provided. Starting with basic concepts, such as that of a "variable," the Chapters methodically advance the reader to the point of writing a graphical user interface to compute the Hamming distance between two DNA sequences.

  12. Oligo kernels for datamining on biological sequences: a case study on prokaryotic translation initiation sites

    PubMed Central

    Meinicke, Peter; Tech, Maike; Morgenstern, Burkhard; Merkl, Rainer

    2004-01-01

    Background Kernel-based learning algorithms are among the most advanced machine learning methods and have been successfully applied to a variety of sequence classification tasks within the field of bioinformatics. Conventional kernels utilized so far do not provide an easy interpretation of the learnt representations in terms of positional and compositional variability of the underlying biological signals. Results We propose a kernel-based approach to datamining on biological sequences. With our method it is possible to model and analyze positional variability of oligomers of any length in a natural way. On one hand this is achieved by mapping the sequences to an intuitive but high-dimensional feature space, well-suited for interpretation of the learnt models. On the other hand, by means of the kernel trick we can provide a general learning algorithm for that high-dimensional representation because all required statistics can be computed without performing an explicit feature space mapping of the sequences. By introducing a kernel parameter that controls the degree of position-dependency, our feature space representation can be tailored to the characteristics of the biological problem at hand. A regularized learning scheme enables application even to biological problems for which only small sets of example sequences are available. Our approach includes a visualization method for transparent representation of characteristic sequence features. Thereby importance of features can be measured in terms of discriminative strength with respect to classification of the underlying sequences. To demonstrate and validate our concept on a biochemically well-defined case, we analyze E. coli translation initiation sites in order to show that we can find biologically relevant signals. For that case, our results clearly show that the Shine-Dalgarno sequence is the most important signal upstream a start codon. The variability in position and composition we found for that signal is in accordance with previous biological knowledge. We also find evidence for signals downstream of the start codon, previously introduced as transcriptional enhancers. These signals are mainly characterized by occurrences of adenine in a region of about 4 nucleotides next to the start codon. Conclusions We showed that the oligo kernel can provide a valuable tool for the analysis of relevant signals in biological sequences. In the case of translation initiation sites we could clearly deduce the most discriminative motifs and their positional variation from example sequences. Attractive features of our approach are its flexibility with respect to oligomer length and position conservation. By means of these two parameters oligo kernels can easily be adapted to different biological problems. PMID:15511290

  13. IBS: an illustrator for the presentation and visualization of biological sequences.

    PubMed

    Liu, Wenzhong; Xie, Yubin; Ma, Jiyong; Luo, Xiaotong; Nie, Peng; Zuo, Zhixiang; Lahrmann, Urs; Zhao, Qi; Zheng, Yueyuan; Zhao, Yong; Xue, Yu; Ren, Jian

    2015-10-15

    Biological sequence diagrams are fundamental for visualizing various functional elements in protein or nucleotide sequences that enable a summarization and presentation of existing information as well as means of intuitive new discoveries. Here, we present a software package called illustrator of biological sequences (IBS) that can be used for representing the organization of either protein or nucleotide sequences in a convenient, efficient and precise manner. Multiple options are provided in IBS, and biological sequences can be manipulated, recolored or rescaled in a user-defined mode. Also, the final representational artwork can be directly exported into a publication-quality figure. The standalone package of IBS was implemented in JAVA, while the online service was implemented in HTML5 and JavaScript. Both the standalone package and online service are freely available at http://ibs.biocuckoo.org. renjian.sysu@gmail.com or xueyu@hust.edu.cn Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  14. IBS: an illustrator for the presentation and visualization of biological sequences

    PubMed Central

    Liu, Wenzhong; Xie, Yubin; Ma, Jiyong; Luo, Xiaotong; Nie, Peng; Zuo, Zhixiang; Lahrmann, Urs; Zhao, Qi; Zheng, Yueyuan; Zhao, Yong; Xue, Yu; Ren, Jian

    2015-01-01

    Summary: Biological sequence diagrams are fundamental for visualizing various functional elements in protein or nucleotide sequences that enable a summarization and presentation of existing information as well as means of intuitive new discoveries. Here, we present a software package called illustrator of biological sequences (IBS) that can be used for representing the organization of either protein or nucleotide sequences in a convenient, efficient and precise manner. Multiple options are provided in IBS, and biological sequences can be manipulated, recolored or rescaled in a user-defined mode. Also, the final representational artwork can be directly exported into a publication-quality figure. Availability and implementation: The standalone package of IBS was implemented in JAVA, while the online service was implemented in HTML5 and JavaScript. Both the standalone package and online service are freely available at http://ibs.biocuckoo.org. Contact: renjian.sysu@gmail.com or xueyu@hust.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26069263

  15. A Personal Journey of Discovery: Developing Technology and Changing Biology

    NASA Astrophysics Data System (ADS)

    Hood, Lee

    2008-07-01

    This autobiographical article describes my experiences in developing chemically based, biological technologies for deciphering biological information: DNA, RNA, proteins, interactions, and networks. The instruments developed include protein and DNA sequencers and synthesizers, as well as ink-jet technology for synthesizing DNA chips. Diverse new strategies for doing biology also arose from novel applications of these instruments. The functioning of these instruments can be integrated to generate powerful new approaches to cloning and characterizing genes from a small amount of protein sequence or to using gene sequences to synthesize peptide fragments so as to characterize various properties of the proteins. I also discuss the five paradigm changes in which I have participated: the development and integration of biological instrumentation; the human genome project; cross-disciplinary biology; systems biology; and predictive, personalized, preventive, and participatory (P4) medicine. Finally, I discuss the origins, the philosophy, some accomplishments, and the future trajectories of the Institute for Systems Biology.

  16. Survey of local and global biological network alignment: the need to reconcile the two sides of the same coin.

    PubMed

    Guzzi, Pietro Hiram; Milenkovic, Tijana

    2018-05-01

    Analogous to genomic sequence alignment that allows for across-species transfer of biological knowledge between conserved sequence regions, biological network alignment can be used to guide the knowledge transfer between conserved regions of molecular networks of different species. Hence, biological network alignment can be used to redefine the traditional notion of a sequence-based homology to a new notion of network-based homology. Analogous to genomic sequence alignment, there exist local and global biological network alignments. Here, we survey prominent and recent computational approaches of each network alignment type and discuss their (dis)advantages. Then, as it was recently shown that the two approach types are complementary, in the sense that they capture different slices of cellular functioning, we discuss the need to reconcile the two network alignment types and present a recent first step in this direction. We conclude with some open research problems on this topic and comment on the usefulness of network alignment in other domains besides computational biology.

  17. Coupled binding-bending-folding: The complex conformational dynamics of protein-DNA binding studied by atomistic molecular dynamics simulations.

    PubMed

    van der Vaart, Arjan

    2015-05-01

    Protein-DNA binding often involves dramatic conformational changes such as protein folding and DNA bending. While thermodynamic aspects of this behavior are understood, and its biological function is often known, the mechanism by which the conformational changes occur is generally unclear. By providing detailed structural and energetic data, molecular dynamics simulations have been helpful in elucidating and rationalizing protein-DNA binding. This review will summarize recent atomistic molecular dynamics simulations of the conformational dynamics of DNA and protein-DNA binding. A brief overview of recent developments in DNA force fields is given as well. Simulations have been crucial in rationalizing the intrinsic flexibility of DNA, and have been instrumental in identifying the sequence of binding events, the triggers for the conformational motion, and the mechanism of binding for a number of important DNA-binding proteins. Molecular dynamics simulations are an important tool for understanding the complex binding behavior of DNA-binding proteins. With recent advances in force fields and rapid increases in simulation time scales, simulations will become even more important for future studies. This article is part of a Special Issue entitled Recent developments of molecular dynamics. Copyright © 2014. Published by Elsevier B.V.

  18. Efficient Mining of Interesting Patterns in Large Biological Sequences

    PubMed Central

    Rashid, Md. Mamunur; Karim, Md. Rezaul; Jeong, Byeong-Soo

    2012-01-01

    Pattern discovery in biological sequences (e.g., DNA sequences) is one of the most challenging tasks in computational biology and bioinformatics. So far, in most approaches, the number of occurrences is a major measure of determining whether a pattern is interesting or not. In computational biology, however, a pattern that is not frequent may still be considered very informative if its actual support frequency exceeds the prior expectation by a large margin. In this paper, we propose a new interesting measure that can provide meaningful biological information. We also propose an efficient index-based method for mining such interesting patterns. Experimental results show that our approach can find interesting patterns within an acceptable computation time. PMID:23105928

  19. Efficient mining of interesting patterns in large biological sequences.

    PubMed

    Rashid, Md Mamunur; Karim, Md Rezaul; Jeong, Byeong-Soo; Choi, Ho-Jin

    2012-03-01

    Pattern discovery in biological sequences (e.g., DNA sequences) is one of the most challenging tasks in computational biology and bioinformatics. So far, in most approaches, the number of occurrences is a major measure of determining whether a pattern is interesting or not. In computational biology, however, a pattern that is not frequent may still be considered very informative if its actual support frequency exceeds the prior expectation by a large margin. In this paper, we propose a new interesting measure that can provide meaningful biological information. We also propose an efficient index-based method for mining such interesting patterns. Experimental results show that our approach can find interesting patterns within an acceptable computation time.

  20. Systems Biology for Smart Crops and Agricultural Innovation: Filling the Gaps between Genotype and Phenotype for Complex Traits Linked with Robust Agricultural Productivity and Sustainability

    PubMed Central

    Pathak, Rajesh Kumar; Gupta, Sanjay Mohan; Gaur, Vikram Singh; Pandey, Dinesh

    2015-01-01

    Abstract In recent years, rapid developments in several omics platforms and next generation sequencing technology have generated a huge amount of biological data about plants. Systems biology aims to develop and use well-organized and efficient algorithms, data structure, visualization, and communication tools for the integration of these biological data with the goal of computational modeling and simulation. It studies crop plant systems by systematically perturbing them, checking the gene, protein, and informational pathway responses; integrating these data; and finally, formulating mathematical models that describe the structure of system and its response to individual perturbations. Consequently, systems biology approaches, such as integrative and predictive ones, hold immense potential in understanding of molecular mechanism of agriculturally important complex traits linked to agricultural productivity. This has led to identification of some key genes and proteins involved in networks of pathways involved in input use efficiency, biotic and abiotic stress resistance, photosynthesis efficiency, root, stem and leaf architecture, and nutrient mobilization. The developments in the above fields have made it possible to design smart crops with superior agronomic traits through genetic manipulation of key candidate genes. PMID:26484978

  1. Image-based computer-assisted diagnosis system for benign paroxysmal positional vertigo

    NASA Astrophysics Data System (ADS)

    Kohigashi, Satoru; Nakamae, Koji; Fujioka, Hiromu

    2005-04-01

    We develop the image based computer assisted diagnosis system for benign paroxysmal positional vertigo (BPPV) that consists of the balance control system simulator, the 3D eye movement simulator, and the extraction method of nystagmus response directly from an eye movement image sequence. In the system, the causes and conditions of BPPV are estimated by searching the database for record matching with the nystagmus response for the observed eye image sequence of the patient with BPPV. The database includes the nystagmus responses for simulated eye movement sequences. The eye movement velocity is obtained by using the balance control system simulator that allows us to simulate BPPV under various conditions such as canalithiasis, cupulolithiasis, number of otoconia, otoconium size, and so on. Then the eye movement image sequence is displayed on the CRT by the 3D eye movement simulator. The nystagmus responses are extracted from the image sequence by the proposed method and are stored in the database. In order to enhance the diagnosis accuracy, the nystagmus response for a newly simulated sequence is matched with that for the observed sequence. From the matched simulation conditions, the causes and conditions of BPPV are estimated. We apply our image based computer assisted diagnosis system to two real eye movement image sequences for patients with BPPV to show its validity.

  2. Blind Predictions of DNA and RNA Tweezers Experiments with Force and Torque

    PubMed Central

    Chou, Fang-Chieh; Lipfert, Jan; Das, Rhiju

    2014-01-01

    Single-molecule tweezers measurements of double-stranded nucleic acids (dsDNA and dsRNA) provide unprecedented opportunities to dissect how these fundamental molecules respond to forces and torques analogous to those applied by topoisomerases, viral capsids, and other biological partners. However, tweezers data are still most commonly interpreted post facto in the framework of simple analytical models. Testing falsifiable predictions of state-of-the-art nucleic acid models would be more illuminating but has not been performed. Here we describe a blind challenge in which numerical predictions of nucleic acid mechanical properties were compared to experimental data obtained recently for dsRNA under applied force and torque. The predictions were enabled by the HelixMC package, first presented in this paper. HelixMC advances crystallography-derived base-pair level models (BPLMs) to simulate kilobase-length dsDNAs and dsRNAs under external forces and torques, including their global linking numbers. These calculations recovered the experimental bending persistence length of dsRNA within the error of the simulations and accurately predicted that dsRNA's “spring-like” conformation would give a two-fold decrease of stretch modulus relative to dsDNA. Further blind predictions of helix torsional properties, however, exposed inaccuracies in current BPLM theory, including three-fold discrepancies in torsional persistence length at the high force limit and the incorrect sign of dsRNA link-extension (twist-stretch) coupling. Beyond these experiments, HelixMC predicted that ‘nucleosome-excluding’ poly(A)/poly(T) is at least two-fold stiffer than random-sequence dsDNA in bending, stretching, and torsional behaviors; Z-DNA to be at least three-fold stiffer than random-sequence dsDNA, with a near-zero link-extension coupling; and non-negligible effects from base pair step correlations. We propose that experimentally testing these predictions should be powerful next steps for understanding the flexibility of dsDNA and dsRNA in sequence contexts and under mechanical stresses relevant to their biology. PMID:25102226

  3. A biological compression model and its applications.

    PubMed

    Cao, Minh Duc; Dix, Trevor I; Allison, Lloyd

    2011-01-01

    A biological compression model, expert model, is presented which is superior to existing compression algorithms in both compression performance and speed. The model is able to compress whole eukaryotic genomes. Most importantly, the model provides a framework for knowledge discovery from biological data. It can be used for repeat element discovery, sequence alignment and phylogenetic analysis. We demonstrate that the model can handle statistically biased sequences and distantly related sequences where conventional knowledge discovery tools often fail.

  4. Mean-field approaches to the totally asymmetric exclusion process with quenched disorder and large particles

    NASA Astrophysics Data System (ADS)

    Shaw, Leah B.; Sethna, James P.; Lee, Kelvin H.

    2004-08-01

    The process of protein synthesis in biological systems resembles a one-dimensional driven lattice gas in which the particles (ribosomes) have spatial extent, covering more than one lattice site. Realistic, nonuniform gene sequences lead to quenched disorder in the particle hopping rates. We study the totally asymmetric exclusion process with large particles and quenched disorder via several mean-field approaches and compare the mean-field results with Monte Carlo simulations. Mean-field equations obtained from the literature are found to be reasonably effective in describing this system. A numerical technique is developed for computing the particle current rapidly. The mean-field approach is extended to include two-point correlations between adjacent sites. The two-point results are found to match Monte Carlo simulations more closely.

  5. Simulations of Karenia Brevis on the West Florida Shelf

    NASA Astrophysics Data System (ADS)

    Lenes, J. M.; Darrow, B. P.; Chen, F. R.; Walsh, J. J.; Dieterle, D. A.; Weisberg, R. H.

    2010-12-01

    The ecological model, HABSIM, was developed and tested to examine the initiation and maintenance of red tides of the toxic dinoflagellate Karenia brevis on the West Florida shelf (WFS). Phytoplankton competition among K. brevis, nitrogen fixing cyanophytes (Trichodesmium spp.), large siliceous phytoplankton (diatoms), and small non-siliceous phytoplankton (microflagellates) thus explores the sequence of events required to support the observed bloom from August to December 2001. The ecological model contains twenty two state variables within four submodels: atmospheric (iron deposition), biological (phytoplankton, bacteria, zooplankton, and fish), chemical (multiple species of carbon, nitrogen, phosphorus, silica, and iron), and benthic (nutrient regeneration). Here, we present results for the 2001 1-d hindcast simulations, with and without data assimilation of the Karenia state variable, as well as preliminary 3-d results.

  6. Protein Structure and Function Prediction Using I-TASSER

    PubMed Central

    Yang, Jianyi; Zhang, Yang

    2016-01-01

    I-TASSER is a hierarchical protocol for automated protein structure prediction and structure-based function annotation. Starting from the amino acid sequence of target proteins, I-TASSER first generates full-length atomic structural models from multiple threading alignments and iterative structural assembly simulations followed by atomic-level structure refinement. The biological functions of the protein, including ligand-binding sites, enzyme commission number, and gene ontology terms, are then inferred from known protein function databases based on sequence and structure profile comparisons. I-TASSER is freely available as both an on-line server and a stand-alone package. This unit describes how to use the I-TASSER protocol to generate structure and function prediction and how to interpret the prediction results, as well as alternative approaches for further improving the I-TASSER modeling quality for distant-homologous and multi-domain protein targets. PMID:26678386

  7. Conservation of Dynamics Associated with Biological Function in an Enzyme Superfamily.

    PubMed

    Narayanan, Chitra; Bernard, David N; Bafna, Khushboo; Gagné, Donald; Chennubhotla, Chakra S; Doucet, Nicolas; Agarwal, Pratul K

    2018-03-06

    Enzyme superfamily members that share common chemical and/or biological functions also share common features. While the role of structure is well characterized, the link between enzyme function and dynamics is not well understood. We present a systematic characterization of intrinsic dynamics of over 20 members of the pancreatic-type RNase superfamily, which share a common structural fold. This study is motivated by the fact that the range of chemical activity as well as molecular motions of RNase homologs spans over 10 5 folds. Dynamics was characterized using a combination of nuclear magnetic resonance experiments and computer simulations. Phylogenetic clustering led to the grouping of sequences into functionally distinct subfamilies. Detailed characterization of the diverse RNases showed conserved dynamical traits for enzymes within subfamilies. These results suggest that selective pressure for the conservation of dynamical behavior, among other factors, may be linked to the distinct chemical and biological functions in an enzyme superfamily. Copyright © 2018 Elsevier Ltd. All rights reserved.

  8. The role of sequence in altering the unfolding pathway of an RNA pseudoknot: a steered molecular dynamics study.

    PubMed

    Gupta, Asmita; Bansal, Manju

    2016-10-19

    Mechanical unfolding studies on Ribonucleic Acid (RNA) structures are a subject of tremendous interest as they shed light on the principles of higher order assembly of these structures. Pseudoknotting is one of the most elementary ways in which this higher order assembly is achieved as discrete secondary structural units in RNA are brought in close proximity to form a tertiary structure. Using steered molecular dynamics (SMD) simulations, we have studied the unfolding of five RNA pseudoknot structures that differ from each other either by base substitutions in helices or loops. Our SMD simulations reveal the manner in which a biologically functional RNA pseudoknot unfolds and the effect of changes in the primary structure on this unfolding pathway, providing necessary insights into the driving forces behind the functioning of these structures. We observed that an A → C mutation in the loop sequence makes the pseudoknot far more resistant against force induced disruption relative to its wild type structure. In contrast to this, a base-pair substitution GC → AU near the pseudoknot junction region renders it more vulnerable to this disruption. The quantitative estimation of differences in the unfolding paths was carried out using force extension curves, potential of mean force profiles, and the opening of different Watson-Crick and non-Watson-Crick interactions. The results provide a quantified view in which the unfolding paths of the small RNA structures can be used for investigating the programmability of RNA chains for designing RNA switches and aptamers as their biological folding and unfolding could be assessed and manipulated.

  9. Peptoid nanosheets exhibit a new secondary-structure motif.

    PubMed

    Mannige, Ranjan V; Haxton, Thomas K; Proulx, Caroline; Robertson, Ellen J; Battigelli, Alessia; Butterfoss, Glenn L; Zuckermann, Ronald N; Whitelam, Stephen

    2015-10-15

    A promising route to the synthesis of protein-mimetic materials that are capable of complex functions, such as molecular recognition and catalysis, is provided by sequence-defined peptoid polymers--structural relatives of biologically occurring polypeptides. Peptoids, which are relatively non-toxic and resistant to degradation, can fold into defined structures through a combination of sequence-dependent interactions. However, the range of possible structures that are accessible to peptoids and other biological mimetics is unknown, and our ability to design protein-like architectures from these polymer classes is limited. Here we use molecular-dynamics simulations, together with scattering and microscopy data, to determine the atomic-resolution structure of the recently discovered peptoid nanosheet, an ordered supramolecular assembly that extends macroscopically in only two dimensions. Our simulations show that nanosheets are structurally and dynamically heterogeneous, can be formed only from peptoids of certain lengths, and are potentially porous to water and ions. Moreover, their formation is enabled by the peptoids' adoption of a secondary structure that is not seen in the natural world. This structure, a zigzag pattern that we call a Σ('sigma')-strand, results from the ability of adjacent backbone monomers to adopt opposed rotational states, thereby allowing the backbone to remain linear and untwisted. Linear backbones tiled in a brick-like way form an extended two-dimensional nanostructure, the Σ-sheet. The binary rotational-state motif of the Σ-strand is not seen in regular protein structures, which are usually built from one type of rotational state. We also show that the concept of building regular structures from multiple rotational states can be generalized beyond the peptoid nanosheet system.

  10. International interlaboratory study comparing single organism 16S rRNA gene sequencing data: Beyond consensus sequence comparisons

    PubMed Central

    Olson, Nathan D.; Lund, Steven P.; Zook, Justin M.; Rojas-Cornejo, Fabiola; Beck, Brian; Foy, Carole; Huggett, Jim; Whale, Alexandra S.; Sui, Zhiwei; Baoutina, Anna; Dobeson, Michael; Partis, Lina; Morrow, Jayne B.

    2015-01-01

    This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants performed sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved position, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies. PMID:27077030

  11. [Progress on molecular biology of Isaria farinosa, pathogen of host of Ophiocordyceps sinensis during the artificial culture].

    PubMed

    Liu, Fei; Wu, Xiao-Li; Liu, Ying; Chen, Da-Xia; Zhang, De-Li; Yang, Da-Jian

    2016-02-01

    Isaria farinosa is the pathogen of the host of Ophiocordyceps sinensis. The present research has analyzed the progress on the molecular biology according to the bibliometrics, the sequences (including the gene sequences) of I. farinosa in the NCBI. The results indicated that different country had published different number of the papers, and had landed different kinds and different number of the sequences (including the gene sequences). China had published the most number of the papers, and had landed the most number of the sequences (including the gene sequences). America had landed the most numbers of the function genes. The main content about the pathogen study was focus on the biological controlling. The main content about the molecular study concentrated on the phylogenies classification. In recent years some protease genes and chitinase genes had been researched. With the increase of the effect on the healthy of O. sinensis, and the whole sequence and more and more pharmacological activities of I. farinosa being made known to the public, the study on the molecular biology of the I. farinosa would be deeper and wider. Copyright© by the Chinese Pharmaceutical Association.

  12. Simulation Studies as Designed Experiments: The Comparison of Penalized Regression Models in the “Large p, Small n” Setting

    PubMed Central

    Chaibub Neto, Elias; Bare, J. Christopher; Margolin, Adam A.

    2014-01-01

    New algorithms are continuously proposed in computational biology. Performance evaluation of novel methods is important in practice. Nonetheless, the field experiences a lack of rigorous methodology aimed to systematically and objectively evaluate competing approaches. Simulation studies are frequently used to show that a particular method outperforms another. Often times, however, simulation studies are not well designed, and it is hard to characterize the particular conditions under which different methods perform better. In this paper we propose the adoption of well established techniques in the design of computer and physical experiments for developing effective simulation studies. By following best practices in planning of experiments we are better able to understand the strengths and weaknesses of competing algorithms leading to more informed decisions about which method to use for a particular task. We illustrate the application of our proposed simulation framework with a detailed comparison of the ridge-regression, lasso and elastic-net algorithms in a large scale study investigating the effects on predictive performance of sample size, number of features, true model sparsity, signal-to-noise ratio, and feature correlation, in situations where the number of covariates is usually much larger than sample size. Analysis of data sets containing tens of thousands of features but only a few hundred samples is nowadays routine in computational biology, where “omics” features such as gene expression, copy number variation and sequence data are frequently used in the predictive modeling of complex phenotypes such as anticancer drug response. The penalized regression approaches investigated in this study are popular choices in this setting and our simulations corroborate well established results concerning the conditions under which each one of these methods is expected to perform best while providing several novel insights. PMID:25289666

  13. Genome sequences of three strains of Aspergillus flavus for the biological control of Aflatoxin

    USDA-ARS?s Scientific Manuscript database

    The genomes of three strains of Aspergillus flavus with demonstrated utility for the biological control of aflatoxin were sequenced. These sequences were assembled with MIRA and annotated with Augustus using A. flavus strain 3357 (NCBI EQ963472) as a reference. Each strain had a genome of 36.3 to ...

  14. SNAD: Sequence Name Annotation-based Designer.

    PubMed

    Sidorov, Igor A; Reshetov, Denis A; Gorbalenya, Alexander E

    2009-08-14

    A growing diversity of biological data is tagged with unique identifiers (UIDs) associated with polynucleotides and proteins to ensure efficient computer-mediated data storage, maintenance, and processing. These identifiers, which are not informative for most people, are often substituted by biologically meaningful names in various presentations to facilitate utilization and dissemination of sequence-based knowledge. This substitution is commonly done manually that may be a tedious exercise prone to mistakes and omissions. Here we introduce SNAD (Sequence Name Annotation-based Designer) that mediates automatic conversion of sequence UIDs (associated with multiple alignment or phylogenetic tree, or supplied as plain text list) into biologically meaningful names and acronyms. This conversion is directed by precompiled or user-defined templates that exploit wealth of annotation available in cognate entries of external databases. Using examples, we demonstrate how this tool can be used to generate names for practical purposes, particularly in virology. A tool for controllable annotation-based conversion of sequence UIDs into biologically meaningful names and acronyms has been developed and placed into service, fostering links between quality of sequence annotation, and efficiency of communication and knowledge dissemination among researchers.

  15. Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

    PubMed

    Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo

    2016-07-19

    Computing alignments between two or more sequences are common operations frequently performed in computational molecular biology. The continuing growth of biological sequence databases establishes the need for their efficient parallel implementation on modern accelerators. This paper presents new approaches to high performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture; i.e. cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi .

  16. Planetary and Space Simulation Facilities (PSI) at DLR

    NASA Astrophysics Data System (ADS)

    Panitz, Corinna; Rabbow, E.; Rettberg, P.; Kloss, M.; Reitz, G.; Horneck, G.

    2010-05-01

    The Planetary and Space Simulation facilities at DLR offer the possibility to expose biological and physical samples individually or integrated into space hardware to defined and controlled space conditions like ultra high vacuum, low temperature and extraterrestrial UV radiation. An x-ray facility stands for the simulation of the ionizing component at the disposal. All of the simulation facilities are required for the preparation of space experiments: - for testing of the newly developed space hardware - for investigating the effect of different space parameters on biological systems as a preparation for the flight experiment - for performing the 'Experiment Verification Tests' (EVT) for the specification of the test parameters - and 'Experiment Sequence Tests' (EST) by simulating sample assemblies, exposure to selected space parameters, and sample disassembly. To test the compatibility of the different biological and chemical systems and their adaptation to the opportunities and constraints of space conditions a profound ground support program has been developed among many others for the ESA facilities of the ongoing missions EXPOSE-R and EXPOSE-E on board of the International Space Station ISS . Several experiment verification tests EVTs and an experiment sequence test EST have been conducted in the carefully equipped and monitored planetary and space simulation facilities PSI of the Institute of Aerospace Medicine at DLR in Cologne, Germany. These ground based pre-flight studies allowed the investigation of a much wider variety of samples and the selection of the most promising organisms for the flight experiment. EXPOSE-E had been attached to the outer balcony of the European Columbus module of the ISS in February 2008 and stayed for 1,5 years in space; EXPOSE-R has been attached to the Russian Svezda module of the ISS in spring 2009 and mission duration will be approx. 1,5 years. The missions will give new insights into the survivability of terrestrial organisms in space and will contribute to the understanding of the organic chemistry processes in space, the biological adaptation strategies to extreme conditions, e.g. on early Earth and Mars, and the distribution of life beyond its planet of origin The results gained during the simulation experiments demonstrated mission preparation as a basic requirement for successful and significant results of every space flight experiment. Hence, the Mission preparation program that was performed in the context of the space missions EXPOSE-E and EXPOSE-R proofed the outstanding importance and accentuated need for ground based experiments before and during a space mission. The facilities are also necessary for the performance of the ground control experiment during the mission, the so-called Mission Simulation Test (MST) under simulated space conditions, by parallel exposure of samples to simulated space parameters according to flight data received by telemetry. Finally the facilities also provide the possibility to simulate the surface and climate conditions of the planet Mars. In this way they offer the possibility to investigate under simulated Mars conditions the chances for development of life on Mars and to gain previous knowledge for the search for life on today's Mars and in this context especially the parameters for a manned mission to Mars. References [1] Rabbow E, Rettberg P, Panitz C, Drescher J, Horneck G, Reitz G (2005) SSIOUX - Space Simulation for Investigating Organics, Evolution and Exobiology, Adv. Space Res. 36 (2) 297-302, doi:10.1016/j.asr.2005.08.040Aman, A. and Bman, B. (1997) JGR, 90,1151-1154. [2] Fekete A, Modos K, Hegedüs M, Kovacs G, Ronto Gy, Peter A, Lammer H, Panitz C (2005) DNA Damage under simulated extraterrestrial conditions in bacteriophage T7 Adv. Space Res. 305-310Aman, A. et al. (1997) Meteoritics & Planet. Sci., 32,A74. [3] Cockell Ch, Schuerger AC, Billi D., Friedmann EI, Panitz C (2005) Effects of a Simulated Martian UV Flux on the Cyanobacterium, Chroococcidiopsis sp. 029, Astrobiology, 5/2 127-140Aman, A. (1996) LPS XXVII, 1344-1 [4] de la Torre Noetzel, R.; Sancho, L.G.; Pintado,A.; Rettberg, Petra; Rabbow, Elke; Panitz,Corinna; Deutschmann, U.; Reina, M.; Horneck, Gerda (2007): BIOPAN experiment LICHENS on the Foton M2 mission Pre-flight verification tests of the Rhizocarpon geographicum-granite ecosystem. COSPAR [Hrsg.]: Advances in Space Research, 40, Elsevier, S. 1665 - 1671, DOI 10.1016/j.asr.2007.02.022

  17. Quantitative analysis of image quality for acceptance and commissioning of an MRI simulator with a semiautomatic method.

    PubMed

    Chen, Xinyuan; Dai, Jianrong

    2018-05-01

    Magnetic Resonance Imaging (MRI) simulation differs from diagnostic MRI in purpose, technical requirements, and implementation. We propose a semiautomatic method for image acceptance and commissioning for the scanner, the radiofrequency (RF) coils, and pulse sequences for an MRI simulator. The ACR MRI accreditation large phantom was used for image quality analysis with seven parameters. Standard ACR sequences with a split head coil were adopted to examine the scanner's basic performance. The performance of simulation RF coils were measured and compared using the standard sequence with different clinical diagnostic coils. We used simulation sequences with simulation coils to test the quality of image and advanced performance of the scanner. Codes and procedures were developed for semiautomatic image quality analysis. When using standard ACR sequences with a split head coil, image quality passed all ACR recommended criteria. The image intensity uniformity with a simulation RF coil decreased about 34% compared with the eight-channel diagnostic head coil, while the other six image quality parameters were acceptable. Those two image quality parameters could be improved to more than 85% by built-in intensity calibration methods. In the simulation sequences test, the contrast resolution was sensitive to the FOV and matrix settings. The geometric distortion of simulation sequences such as T1-weighted and T2-weighted images was well-controlled in the isocenter and 10 cm off-center within a range of ±1% (2 mm). We developed a semiautomatic image quality analysis method for quantitative evaluation of images and commissioning of an MRI simulator. The baseline performances of simulation RF coils and pulse sequences have been established for routine QA. © 2018 The Authors. Journal of Applied Clinical Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.

  18. in silico Whole Genome Sequencer & Analyzer (iWGS): A Computational Pipeline to Guide the Design and Analysis of de novo Genome Sequencing Studies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhou, Xiaofan; Peris, David; Kominek, Jacek

    The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms have augmented the complexity of de novo genome sequencing projects in nonmodel organisms. To reduce the costs and challenges in de novo genome sequencing projects and streamline their experimentalmore » design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to enable the user to have great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects, and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with a detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS.« less

  19. in silico Whole Genome Sequencer & Analyzer (iWGS): A Computational Pipeline to Guide the Design and Analysis of de novo Genome Sequencing Studies

    DOE PAGES

    Zhou, Xiaofan; Peris, David; Kominek, Jacek; ...

    2016-09-16

    The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms have augmented the complexity of de novo genome sequencing projects in nonmodel organisms. To reduce the costs and challenges in de novo genome sequencing projects and streamline their experimentalmore » design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to enable the user to have great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects, and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with a detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS.« less

  20. Yeast Genomics for Bread, Beer, Biology, Bucks and Breath

    NASA Astrophysics Data System (ADS)

    Sakharkar, Kishore R.; Sakharkar, Meena K.

    The rapid advances and scale up of projects in DNA sequencing dur ing the past two decades have produced complete genome sequences of several eukaryotic species. The versatile genetic malleability of the yeast, and the high degree of conservation between its cellular processes and those of human cells have made it a model of choice for pioneering research in molecular and cell biology. The complete sequence of yeast genome has proven to be extremely useful as a reference towards the sequences of human and for providing systems to explore key gene functions. Yeast has been a ‘legendary model’ for new technologies and gaining new biological insights into basic biological sciences and biotechnology. This chapter describes the awesome power of yeast genetics, genomics and proteomics in understanding of biological function. The applications of yeast as a screening tool to the field of drug discovery and development are highlighted and the traditional importance of yeast for bakers and brewers is discussed.

  1. Development and Assessment of a Horizontally Integrated Biological Sciences Course Sequence for Pharmacy Education

    PubMed Central

    Wright, Nicholas J.D.; Alston, Gregory L.

    2015-01-01

    Objective. To design and assess a horizontally integrated biological sciences course sequence and to determine its effectiveness in imparting the foundational science knowledge necessary to successfully progress through the pharmacy school curriculum and produce competent pharmacy school graduates. Design. A 2-semester course sequence integrated principles from several basic science disciplines: biochemistry, molecular biology, cellular biology, anatomy, physiology, and pathophysiology. Each is a 5-credit course taught 5 days per week, with 50-minute class periods. Assessment. Achievement of outcomes was determined with course examinations, student lecture, and an annual skills mastery assessment. The North American Pharmacist Licensure Examination (NAPLEX) results were used as an indicator of competency to practice pharmacy. Conclusion. Students achieved course objectives and program level outcomes. The biological sciences integrated course sequence was successful in providing students with foundational basic science knowledge required to progress through the pharmacy program and to pass the NAPLEX. The percentage of the school’s students who passed the NAPLEX was not statistically different from the national percentage. PMID:26430276

  2. The Anopheles stephensi odorant binding protein 1 (AsteObp1) gene: a new molecular marker for biological forms diagnosis.

    PubMed

    Gholizadeh, S; Firooziyan, S; Ladonni, H; Hajipirloo, H Mohammadzadeh; Djadid, N Dinparast; Hosseini, A; Raz, A

    2015-06-01

    Anopheles (Cellia) stephensi Liston 1901 is known as an Asian malaria vector. Three biological forms, namely "mysorensis", "intermediate", and "type" have been earlier reported in this species. Nevertheless, the present morphological and molecular information is insufficient to diagnose these forms. During this investigation, An. stephensi biological forms were morphologically identified and sequenced for odorant-binding protein 1 (Obp1) gene. Also, intron I sequences were used to construct phylogenetic trees. Despite nucleotide sequence variation in exon of AsteObp1, nearly 100% identity was observed at the amino acid level among the three biological forms. In order to overcome difficulties in using egg morphology characters, intron I sequences of An. stephensi Obp1 opens new molecular way to the identification of the main Asian malaria vector biological forms. However, multidisciplinary studies are needed to establish the taxonomic status of An. stephensi. Copyright © 2015 Elsevier B.V. All rights reserved.

  3. Atomistic molecular dynamics simulations of bioactive engrailed 1 interference peptides (EN1-iPeps).

    PubMed

    Gandhi, Neha S; Blancafort, Pilar; Mancera, Ricardo L

    2018-04-27

    The neural-specific transcription factor Engrailed 1 - is overexpressed in basal-like breast tumours. Synthetic interference peptides - comprising a cell-penetrating peptide/nuclear localisation sequence and the Engrailed 1-specific sequence from the N-terminus have been engineered to produce a strong apoptotic response in tumour cells overexpressing EN1, with no toxicity to normal or non Engrailed 1-expressing cells. Here scaled molecular dynamics simulations were used to study the conformational dynamics of these interference peptides in aqueous solution to characterise their structure and dynamics. Transitions from disordered to α-helical conformation, stabilised by hydrogen bonds and proline-aromatic interactions, were observed throughout the simulations. The backbone of the wild-type peptide folds to a similar conformation as that found in ternary complexes of anterior Hox proteins with conserved hexapeptide motifs important for recognition of pre-B-cell leukemia Homeobox 1, indicating that the motif may possess an intrinsic preference for helical structure. The predicted NMR chemical shifts of these peptides are consistent with the Hox hexapeptides in solution and Engrailed 2 NMR data. These findings highlight the importance of aromatic residues in determining the structure of Engrailed 1 interference peptides, shedding light on the rational design strategy of molecules that could be adopted to inhibit other transcription factors overexpressed in other cancer types, potentially including other transcription factor families that require highly conserved and cooperative protein-protein partnerships for biological activity.

  4. Molecular mechanisms of adaptation emerging from the physics and evolution of nucleic acids and proteins.

    PubMed

    Goncearenco, Alexander; Ma, Bin-Guang; Berezovsky, Igor N

    2014-03-01

    DNA, RNA and proteins are major biological macromolecules that coevolve and adapt to environments as components of one highly interconnected system. We explore here sequence/structure determinants of mechanisms of adaptation of these molecules, links between them, and results of their mutual evolution. We complemented statistical analysis of genomic and proteomic sequences with folding simulations of RNA molecules, unraveling causal relations between compositional and sequence biases reflecting molecular adaptation on DNA, RNA and protein levels. We found many compositional peculiarities related to environmental adaptation and the life style. Specifically, thermal adaptation of protein-coding sequences in Archaea is characterized by a stronger codon bias than in Bacteria. Guanine and cytosine load in the third codon position is important for supporting the aerobic life style, and it is highly pronounced in Bacteria. The third codon position also provides a tradeoff between arginine and lysine, which are favorable for thermal adaptation and aerobicity, respectively. Dinucleotide composition provides stability of nucleic acids via strong base-stacking in ApG dinucleotides. In relation to coevolution of nucleic acids and proteins, thermostability-related demands on the amino acid composition affect the nucleotide content in the second codon position in Archaea.

  5. Molecular mechanisms of adaptation emerging from the physics and evolution of nucleic acids and proteins

    PubMed Central

    Goncearenco, Alexander; Ma, Bin-Guang; Berezovsky, Igor N.

    2014-01-01

    DNA, RNA and proteins are major biological macromolecules that coevolve and adapt to environments as components of one highly interconnected system. We explore here sequence/structure determinants of mechanisms of adaptation of these molecules, links between them, and results of their mutual evolution. We complemented statistical analysis of genomic and proteomic sequences with folding simulations of RNA molecules, unraveling causal relations between compositional and sequence biases reflecting molecular adaptation on DNA, RNA and protein levels. We found many compositional peculiarities related to environmental adaptation and the life style. Specifically, thermal adaptation of protein-coding sequences in Archaea is characterized by a stronger codon bias than in Bacteria. Guanine and cytosine load in the third codon position is important for supporting the aerobic life style, and it is highly pronounced in Bacteria. The third codon position also provides a tradeoff between arginine and lysine, which are favorable for thermal adaptation and aerobicity, respectively. Dinucleotide composition provides stability of nucleic acids via strong base-stacking in ApG dinucleotides. In relation to coevolution of nucleic acids and proteins, thermostability-related demands on the amino acid composition affect the nucleotide content in the second codon position in Archaea. PMID:24371267

  6. Hydrodynamic Radii of Intrinsically Disordered Proteins Determined from Experimental Polyproline II Propensities

    PubMed Central

    Tomasso, Maria E.; Tarver, Micheal J.; Devarajan, Deepa; Whitten, Steven T.

    2016-01-01

    The properties of disordered proteins are thought to depend on intrinsic conformational propensities for polyproline II (PP II) structure. While intrinsic PP II propensities have been measured for the common biological amino acids in short peptides, the ability of these experimentally determined propensities to quantitatively reproduce structural behavior in intrinsically disordered proteins (IDPs) has not been established. Presented here are results from molecular simulations of disordered proteins showing that the hydrodynamic radius (R h) can be predicted from experimental PP II propensities with good agreement, even when charge-based considerations are omitted. The simulations demonstrate that R h and chain propensity for PP II structure are linked via a simple power-law scaling relationship, which was tested using the experimental R h of 22 IDPs covering a wide range of peptide lengths, net charge, and sequence composition. Charge effects on R h were found to be generally weak when compared to PP II effects on R h. Results from this study indicate that the hydrodynamic dimensions of IDPs are evidence of considerable sequence-dependent backbone propensities for PP II structure that qualitatively, if not quantitatively, match conformational propensities measured in peptides. PMID:26727467

  7. The Effects of Meiosis/Genetics Integration and Instructional Sequence on College Biology Student Achievement in Genetics.

    ERIC Educational Resources Information Center

    Browning, Mark

    The purpose of the research was to manipulate two aspects of genetics instruction in order to measure their effects on college, introductory biology students' achievement in genetics. One instructional sequence that was used dealt first with monohybrid autosomal inheritance patterns, then sex-linkage. The alternate sequence was the reverse.…

  8. Neuron array with plastic synapses and programmable dendrites.

    PubMed

    Ramakrishnan, Shubha; Wunderlich, Richard; Hasler, Jennifer; George, Suma

    2013-10-01

    We describe a novel neuromorphic chip architecture that models neurons for efficient computation. Traditional architectures of neuron array chips consist of large scale systems that are interfaced with AER for implementing intra- or inter-chip connectivity. We present a chip that uses AER for inter-chip communication but uses fast, reconfigurable FPGA-style routing with local memory for intra-chip connectivity. We model neurons with biologically realistic channel models, synapses and dendrites. This chip is suitable for small-scale network simulations and can also be used for sequence detection, utilizing directional selectivity properties of dendrites, ultimately for use in word recognition.

  9. Network Analysis Reveals the Recognition Mechanism for Mannose-binding Lectins

    NASA Astrophysics Data System (ADS)

    Zhao, Yunjie; Jian, Yiren; Zeng, Chen; Computational Biophysics Lab Team

    The specific carbohydrate binding of mannose-binding lectin (MBL) protein in plants makes it a very useful molecular tool for cancer cell detection and other applications. The biological states of most MBL proteins are dimeric. Using dynamics network analysis on molecular dynamics (MD) simulations on the model protein of MBL, we elucidate the short- and long-range driving forces behind the dimer formation. The results are further supported by sequence coevolution analysis. We propose a general framework for deciphering the recognition mechanism underlying protein-protein interactions that may have potential applications in signaling pathways.

  10. Is Mutation Random or Targeted?: No Evidence for Hypermutability in Snail Toxin Genes.

    PubMed

    Roy, Scott W

    2016-10-01

    Ever since Luria and Delbruck, the notion that mutation is random with respect to fitness has been foundational to modern biology. However, various studies have claimed striking exceptions to this rule. One influential case involves toxin-encoding genes in snails of the genus Conus, termed conotoxins, a large gene family that undergoes rapid diversification of their protein-coding sequences by positive selection. Previous reconstructions of the sequence evolution of conotoxin genes claimed striking patterns: (1) elevated synonymous change, interpreted as being due to targeted "hypermutation" in this region; (2) elevated transversion-to-transition ratios, interpreted as reflective of the particular mechanism of hypermutation; and (3) much lower rates of synonymous change in the codons encoding several highly conserved cysteine residues, interpreted as strong position-specific codon bias. This work has spawned a variety of studies on the potential mechanisms of hypermutation and on causes for cysteine codon bias, and has inspired hypermutation hypotheses for various other fast-evolving genes. Here, I show that all three findings are likely to be artifacts of statistical reconstruction. First, by simulating nonsynonymous change I show that high rates of dN can lead to overestimation of dS. Second, I show that there is no evidence for any of these three patterns in comparisons of closely related conotoxin sequences, suggesting that the reported findings are due to breakdown of statistical methods at high levels of sequence divergence. The current findings suggest that mutation and codon bias in conotoxin genes may not be atypical, and that random mutation and selection can explain the evolution of even these exceptional loci. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  11. MultiSeq: unifying sequence and structure data for evolutionary analysis

    PubMed Central

    Roberts, Elijah; Eargle, John; Wright, Dan; Luthey-Schulten, Zaida

    2006-01-01

    Background Since the publication of the first draft of the human genome in 2000, bioinformatic data have been accumulating at an overwhelming pace. Currently, more than 3 million sequences and 35 thousand structures of proteins and nucleic acids are available in public databases. Finding correlations in and between these data to answer critical research questions is extremely challenging. This problem needs to be approached from several directions: information science to organize and search the data; information visualization to assist in recognizing correlations; mathematics to formulate statistical inferences; and biology to analyze chemical and physical properties in terms of sequence and structure changes. Results Here we present MultiSeq, a unified bioinformatics analysis environment that allows one to organize, display, align and analyze both sequence and structure data for proteins and nucleic acids. While special emphasis is placed on analyzing the data within the framework of evolutionary biology, the environment is also flexible enough to accommodate other usage patterns. The evolutionary approach is supported by the use of predefined metadata, adherence to standard ontological mappings, and the ability for the user to adjust these classifications using an electronic notebook. MultiSeq contains a new algorithm to generate complete evolutionary profiles that represent the topology of the molecular phylogenetic tree of a homologous group of distantly related proteins. The method, based on the multidimensional QR factorization of multiple sequence and structure alignments, removes redundancy from the alignments and orders the protein sequences by increasing linear dependence, resulting in the identification of a minimal basis set of sequences that spans the evolutionary space of the homologous group of proteins. Conclusion MultiSeq is a major extension of the Multiple Alignment tool that is provided as part of VMD, a structural visualization program for analyzing molecular dynamics simulations. Both are freely distributed by the NIH Resource for Macromolecular Modeling and Bioinformatics and MultiSeq is included with VMD starting with version 1.8.5. The MultiSeq website has details on how to download and use the software: PMID:16914055

  12. Metabolic pathways for the whole community.

    PubMed

    Hanson, Niels W; Konwar, Kishori M; Hawley, Alyse K; Altman, Tomer; Karp, Peter D; Hallam, Steven J

    2014-07-22

    A convergence of high-throughput sequencing and computational power is transforming biology into information science. Despite these technological advances, converting bits and bytes of sequence information into meaningful insights remains a challenging enterprise. Biological systems operate on multiple hierarchical levels from genomes to biomes. Holistic understanding of biological systems requires agile software tools that permit comparative analyses across multiple information levels (DNA, RNA, protein, and metabolites) to identify emergent properties, diagnose system states, or predict responses to environmental change. Here we adopt the MetaPathways annotation and analysis pipeline and Pathway Tools to construct environmental pathway/genome databases (ePGDBs) that describe microbial community metabolism using MetaCyc, a highly curated database of metabolic pathways and components covering all domains of life. We evaluate Pathway Tools' performance on three datasets with different complexity and coding potential, including simulated metagenomes, a symbiotic system, and the Hawaii Ocean Time-series. We define accuracy and sensitivity relationships between read length, coverage and pathway recovery and evaluate the impact of taxonomic pruning on ePGDB construction and interpretation. Resulting ePGDBs provide interactive metabolic maps, predict emergent metabolic pathways associated with biosynthesis and energy production and differentiate between genomic potential and phenotypic expression across defined environmental gradients. This multi-tiered analysis provides the user community with specific operating guidelines, performance metrics and prediction hazards for more reliable ePGDB construction and interpretation. Moreover, it demonstrates the power of Pathway Tools in predicting metabolic interactions in natural and engineered ecosystems.

  13. The Importance of Biological Databases in Biological Discovery.

    PubMed

    Baxevanis, Andreas D; Bateman, Alex

    2015-06-19

    Biological databases play a central role in bioinformatics. They offer scientists the opportunity to access a wide variety of biologically relevant data, including the genomic sequences of an increasingly broad range of organisms. This unit provides a brief overview of major sequence databases and portals, such as GenBank, the UCSC Genome Browser, and Ensembl. Model organism databases, including WormBase, The Arabidopsis Information Resource (TAIR), and those made available through the Mouse Genome Informatics (MGI) resource, are also covered. Non-sequence-centric databases, such as Online Mendelian Inheritance in Man (OMIM), the Protein Data Bank (PDB), MetaCyc, and the Kyoto Encyclopedia of Genes and Genomes (KEGG), are also discussed. Copyright © 2015 John Wiley & Sons, Inc.

  14. The technology and biology of single-cell RNA sequencing.

    PubMed

    Kolodziejczyk, Aleksandra A; Kim, Jong Kyoung; Svensson, Valentine; Marioni, John C; Teichmann, Sarah A

    2015-05-21

    The differences between individual cells can have profound functional consequences, in both unicellular and multicellular organisms. Recently developed single-cell mRNA-sequencing methods enable unbiased, high-throughput, and high-resolution transcriptomic analysis of individual cells. This provides an additional dimension to transcriptomic information relative to traditional methods that profile bulk populations of cells. Already, single-cell RNA-sequencing methods have revealed new biology in terms of the composition of tissues, the dynamics of transcription, and the regulatory relationships between genes. Rapid technological developments at the level of cell capture, phenotyping, molecular biology, and bioinformatics promise an exciting future with numerous biological and medical applications. Copyright © 2015 Elsevier Inc. All rights reserved.

  15. Cost-effectiveness of sequenced treatment of rheumatoid arthritis with targeted immune modulators.

    PubMed

    Jansen, Jeroen P; Incerti, Devin; Mutebi, Alex; Peneva, Desi; MacEwan, Joanna P; Stolshek, Bradley; Kaur, Primal; Gharaibeh, Mahdi; Strand, Vibeke

    2017-07-01

    To determine the cost-effectiveness of treatment sequences of biologic disease-modifying anti-rheumatic drugs or Janus kinase/STAT pathway inhibitors (collectively referred to as bDMARDs) vs conventional DMARDs (cDMARDs) from the US societal perspective for treatment of patients with moderately to severely active rheumatoid arthritis (RA) with inadequate responses to cDMARDs. An individual patient simulation model was developed that assesses the impact of treatments on disease based on clinical trial data and real-world evidence. Treatment strategies included sequences starting with etanercept, adalimumab, certolizumab, or abatacept. Each of these treatment strategies was compared with cDMARDs. Incremental cost, incremental quality-adjusted life-years (QALYs), and incremental cost-effectiveness ratios (ICERs) were calculated for each treatment sequence relative to cDMARDs. The cost-effectiveness of each strategy was determined using a US willingness-to-pay (WTP) threshold of $150,000/QALY. For the base-case scenario, bDMARD treatment sequences were associated with greater treatment benefit (i.e. more QALYs), lower lost productivity costs, and greater treatment-related costs than cDMARDs. The expected ICERs for bDMARD sequences ranged from ∼$126,000 to $140,000 per QALY gained, which is below the US-specific WTP. Alternative scenarios examining the effects of homogeneous patients, dose increases, increased costs of hospitalization for severely physically impaired patients, and a lower baseline Health Assessment Questionnaire (HAQ) Disability Index score resulted in similar ICERs. bDMARD treatment sequences are cost-effective from a US societal perspective.

  16. Identifying Differentially Abundant Metabolic Pathways in Metagenomic Datasets

    NASA Astrophysics Data System (ADS)

    Liu, Bo; Pop, Mihai

    Enabled by rapid advances in sequencing technology, metagenomic studies aim to characterize entire communities of microbes bypassing the need for culturing individual bacterial members. One major goal of such studies is to identify specific functional adaptations of microbial communities to their habitats. Here we describe a powerful analytical method (MetaPath) that can identify differentially abundant pathways in metagenomic data-sets, relying on a combination of metagenomic sequence data and prior metabolic pathway knowledge. We show that MetaPath outperforms other common approaches when evaluated on simulated datasets. We also demonstrate the power of our methods in analyzing two, publicly available, metagenomic datasets: a comparison of the gut microbiome of obese and lean twins; and a comparison of the gut microbiome of infant and adult subjects. We demonstrate that the subpathways identified by our method provide valuable insights into the biological activities of the microbiome.

  17. Pattern statistics on Markov chains and sensitivity to parameter estimation

    PubMed Central

    Nuel, Grégory

    2006-01-01

    Background: In order to compute pattern statistics in computational biology a Markov model is commonly used to take into account the sequence composition. Usually its parameter must be estimated. The aim of this paper is to determine how sensitive these statistics are to parameter estimation, and what are the consequences of this variability on pattern studies (finding the most over-represented words in a genome, the most significant common words to a set of sequences,...). Results: In the particular case where pattern statistics (overlap counting only) computed through binomial approximations we use the delta-method to give an explicit expression of σ, the standard deviation of a pattern statistic. This result is validated using simulations and a simple pattern study is also considered. Conclusion: We establish that the use of high order Markov model could easily lead to major mistakes due to the high sensitivity of pattern statistics to parameter estimation. PMID:17044916

  18. Pattern statistics on Markov chains and sensitivity to parameter estimation.

    PubMed

    Nuel, Grégory

    2006-10-17

    In order to compute pattern statistics in computational biology a Markov model is commonly used to take into account the sequence composition. Usually its parameter must be estimated. The aim of this paper is to determine how sensitive these statistics are to parameter estimation, and what are the consequences of this variability on pattern studies (finding the most over-represented words in a genome, the most significant common words to a set of sequences,...). In the particular case where pattern statistics (overlap counting only) computed through binomial approximations we use the delta-method to give an explicit expression of sigma, the standard deviation of a pattern statistic. This result is validated using simulations and a simple pattern study is also considered. We establish that the use of high order Markov model could easily lead to major mistakes due to the high sensitivity of pattern statistics to parameter estimation.

  19. Information transduction capacity reduces the uncertainties in annotation-free isoform discovery and quantification

    PubMed Central

    Deng, Yue; Bao, Feng; Yang, Yang; Ji, Xiangyang; Du, Mulong; Zhang, Zhengdong

    2017-01-01

    Abstract The automated transcript discovery and quantification of high-throughput RNA sequencing (RNA-seq) data are important tasks of next-generation sequencing (NGS) research. However, these tasks are challenging due to the uncertainties that arise in the inference of complete splicing isoform variants from partially observed short reads. Here, we address this problem by explicitly reducing the inherent uncertainties in a biological system caused by missing information. In our approach, the RNA-seq procedure for transforming transcripts into short reads is considered an information transmission process. Consequently, the data uncertainties are substantially reduced by exploiting the information transduction capacity of information theory. The experimental results obtained from the analyses of simulated datasets and RNA-seq datasets from cell lines and tissues demonstrate the advantages of our method over state-of-the-art competitors. Our algorithm is an open-source implementation of MaxInfo. PMID:28911101

  20. Protein location prediction using atomic composition and global features of the amino acid sequence

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cherian, Betsy Sheena, E-mail: betsy.skb@gmail.com; Nair, Achuthsankar S.

    2010-01-22

    Subcellular location of protein is constructive information in determining its function, screening for drug candidates, vaccine design, annotation of gene products and in selecting relevant proteins for further studies. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all possible relevant biological features that contribute to the subcellular localization. In this work, we extracted the biological features from the full length protein sequence to incorporate more biological information. A new biological feature, distribution of atomic composition is effectivelymore » used with, multiple physiochemical properties, amino acid composition, three part amino acid composition, and sequence similarity for predicting the subcellular location of the protein. Support Vector Machines are designed for four modules and prediction is made by a weighted voting system. Our system makes prediction with an accuracy of 100, 82.47, 88.81 for self-consistency test, jackknife test and independent data test respectively. Our results provide evidence that the prediction based on the biological features derived from the full length amino acid sequence gives better accuracy than those derived from N-terminal alone. Considering the features as a distribution within the entire sequence will bring out underlying property distribution to a greater detail to enhance the prediction accuracy.« less

  1. Examination of Signatures of Recent Positive Selection on Genes Involved in Human Sialic Acid Biology.

    PubMed

    Moon, Jiyun M; Aronoff, David M; Capra, John A; Abbot, Patrick; Rokas, Antonis

    2018-03-28

    Sialic acids are nine carbon sugars ubiquitously found on the surfaces of vertebrate cells and are involved in various immune response-related processes. In humans, at least 58 genes spanning diverse functions, from biosynthesis and activation to recycling and degradation, are involved in sialic acid biology. Because of their role in immunity, sialic acid biology genes have been hypothesized to exhibit elevated rates of evolutionary change. Consistent with this hypothesis, several genes involved in sialic acid biology have experienced higher rates of non-synonymous substitutions in the human lineage than their counterparts in other great apes, perhaps in response to ancient pathogens that infected hominins millions of years ago (paleopathogens). To test whether sialic acid biology genes have also experienced more recent positive selection during the evolution of the modern human lineage, reflecting adaptation to contemporary cosmopolitan or geographically-restricted pathogens, we examined whether their protein-coding regions showed evidence of recent hard and soft selective sweeps. This examination involved the calculation of four measures that quantify changes in allele frequency spectra, extent of population differentiation, and haplotype homozygosity caused by recent hard and soft selective sweeps for 55 sialic acid biology genes using publicly available whole genome sequencing data from 1,668 humans from three ethnic groups. To disentangle evidence for selection from confounding demographic effects, we compared the observed patterns in sialic acid biology genes to simulated sequences of the same length under a model of neutral evolution that takes into account human demographic history. We found that the patterns of genetic variation of most sialic acid biology genes did not significantly deviate from neutral expectations and were not significantly different among genes belonging to different functional categories. Those few sialic acid biology genes that significantly deviated from neutrality either experienced soft sweeps or population-specific hard sweeps. Interestingly, while most hard sweeps occurred on genes involved in sialic acid recognition, most soft sweeps involved genes associated with recycling, degradation and activation, transport, and transfer functions. We propose that the lack of signatures of recent positive selection for the majority of the sialic acid biology genes is consistent with the view that these genes regulate immune responses against ancient rather than contemporary cosmopolitan or geographically restricted pathogens. Copyright © 2018 Moon et al.

  2. Transformation of a Traditional, Freshman Biology, Three-Semester Sequence, to a Two-Semester, Integrated Thematically Organized, and Team-Taught Course

    ERIC Educational Resources Information Center

    Soto, Julio G.; Everhart, Jerry

    2016-01-01

    Biology faculty at San José State University developed, piloted, implemented, and assessed a freshmen course sequence based on the macro-to micro-teaching approach that was team-taught, and organized around unifying themes. Content learning assessment drove the conceptual framework of our course sequence. Content student learning increased…

  3. Improved Atomistic Monte Carlo Simulations Demonstrate that Poly-L-Proline Adopts Heterogeneous Ensembles of Conformations of Semi-Rigid Segments Interrupted by Kinks

    PubMed Central

    Radhakrishnan, Aditya; Vitalis, Andreas; Mao, Albert H.; Steffen, Adam T.; Pappu, Rohit V.

    2012-01-01

    Poly-L-proline (PLP) polymers are useful mimics of biologically relevant proline-rich sequences. Biophysical and computational studies of PLP polymers in aqueous solutions are challenging because of the diversity of length scales and the slow time scales for conformational conversions. We describe an atomistic simulation approach that combines an improved ABSINTH implicit solvation model, with conformational sampling based on standard and novel Metropolis Monte Carlo moves. Refinements to forcefield parameters were guided by published experimental data for proline-rich systems. We assessed the validity of our simulation results through quantitative comparisons to experimental data that were not used in refining the forcefield parameters. Our analysis shows that PLP polymers form heterogeneous ensembles of conformations characterized by semi-rigid, rod-like segments interrupted by kinks, which result from a combination of internal cis peptide bonds, flexible backbone ψ-angles, and the coupling between ring puckering and backbone degrees of freedom. PMID:22329658

  4. Bayesian selection of Markov models for symbol sequences: application to microsaccadic eye movements.

    PubMed

    Bettenbühl, Mario; Rusconi, Marco; Engbert, Ralf; Holschneider, Matthias

    2012-01-01

    Complex biological dynamics often generate sequences of discrete events which can be described as a Markov process. The order of the underlying Markovian stochastic process is fundamental for characterizing statistical dependencies within sequences. As an example for this class of biological systems, we investigate the Markov order of sequences of microsaccadic eye movements from human observers. We calculate the integrated likelihood of a given sequence for various orders of the Markov process and use this in a Bayesian framework for statistical inference on the Markov order. Our analysis shows that data from most participants are best explained by a first-order Markov process. This is compatible with recent findings of a statistical coupling of subsequent microsaccade orientations. Our method might prove to be useful for a broad class of biological systems.

  5. Computing Platforms for Big Biological Data Analytics: Perspectives and Challenges.

    PubMed

    Yin, Zekun; Lan, Haidong; Tan, Guangming; Lu, Mian; Vasilakos, Athanasios V; Liu, Weiguo

    2017-01-01

    The last decade has witnessed an explosion in the amount of available biological sequence data, due to the rapid progress of high-throughput sequencing projects. However, the biological data amount is becoming so great that traditional data analysis platforms and methods can no longer meet the need to rapidly perform data analysis tasks in life sciences. As a result, both biologists and computer scientists are facing the challenge of gaining a profound insight into the deepest biological functions from big biological data. This in turn requires massive computational resources. Therefore, high performance computing (HPC) platforms are highly needed as well as efficient and scalable algorithms that can take advantage of these platforms. In this paper, we survey the state-of-the-art HPC platforms for big biological data analytics. We first list the characteristics of big biological data and popular computing platforms. Then we provide a taxonomy of different biological data analysis applications and a survey of the way they have been mapped onto various computing platforms. After that, we present a case study to compare the efficiency of different computing platforms for handling the classical biological sequence alignment problem. At last we discuss the open issues in big biological data analytics.

  6. Self-Organizing Hidden Markov Model Map (SOHMMM): Biological Sequence Clustering and Cluster Visualization.

    PubMed

    Ferles, Christos; Beaufort, William-Scott; Ferle, Vanessa

    2017-01-01

    The present study devises mapping methodologies and projection techniques that visualize and demonstrate biological sequence data clustering results. The Sequence Data Density Display (SDDD) and Sequence Likelihood Projection (SLP) visualizations represent the input symbolical sequences in a lower-dimensional space in such a way that the clusters and relations of data elements are depicted graphically. Both operate in combination/synergy with the Self-Organizing Hidden Markov Model Map (SOHMMM). The resulting unified framework is in position to analyze automatically and directly raw sequence data. This analysis is carried out with little, or even complete absence of, prior information/domain knowledge.

  7. Using Phylogenetic Analysis to Detect Market Substitution of Atlantic Salmon for Pacific Salmon: An Introductory Biology Laboratory Experiment

    ERIC Educational Resources Information Center

    Cline, Erica; Gogarten, Jennifer

    2012-01-01

    We describe a laboratory exercise developed for the cell and molecular biology quarter of a year-long majors' undergraduate introductory biology sequence. In an analysis of salmon samples collected by students in their local stores and restaurants, DNA sequencing and phylogenetic analysis were used to detect market substitution of Atlantic salmon…

  8. Ab Initio structure prediction for Escherichia coli: towards genome-wide protein structure modeling and fold assignment

    PubMed Central

    Xu, Dong; Zhang, Yang

    2013-01-01

    Genome-wide protein structure prediction and structure-based function annotation have been a long-term goal in molecular biology but not yet become possible due to difficulties in modeling distant-homology targets. We developed a hybrid pipeline combining ab initio folding and template-based modeling for genome-wide structure prediction applied to the Escherichia coli genome. The pipeline was tested on 43 known sequences, where QUARK-based ab initio folding simulation generated models with TM-score 17% higher than that by traditional comparative modeling methods. For 495 unknown hard sequences, 72 are predicted to have a correct fold (TM-score > 0.5) and 321 have a substantial portion of structure correctly modeled (TM-score > 0.35). 317 sequences can be reliably assigned to a SCOP fold family based on structural analogy to existing proteins in PDB. The presented results, as a case study of E. coli, represent promising progress towards genome-wide structure modeling and fold family assignment using state-of-the-art ab initio folding algorithms. PMID:23719418

  9. Comparative Analysis of Single-Cell RNA Sequencing Methods.

    PubMed

    Ziegenhain, Christoph; Vieth, Beate; Parekh, Swati; Reinius, Björn; Guillaumet-Adkins, Amy; Smets, Martha; Leonhardt, Heinrich; Heyn, Holger; Hellmann, Ines; Enard, Wolfgang

    2017-02-16

    Single-cell RNA sequencing (scRNA-seq) offers new possibilities to address biological and medical questions. However, systematic comparisons of the performance of diverse scRNA-seq protocols are lacking. We generated data from 583 mouse embryonic stem cells to evaluate six prominent scRNA-seq methods: CEL-seq2, Drop-seq, MARS-seq, SCRB-seq, Smart-seq, and Smart-seq2. While Smart-seq2 detected the most genes per cell and across cells, CEL-seq2, Drop-seq, MARS-seq, and SCRB-seq quantified mRNA levels with less amplification noise due to the use of unique molecular identifiers (UMIs). Power simulations at different sequencing depths showed that Drop-seq is more cost-efficient for transcriptome quantification of large numbers of cells, while MARS-seq, SCRB-seq, and Smart-seq2 are more efficient when analyzing fewer cells. Our quantitative comparison offers the basis for an informed choice among six prominent scRNA-seq methods, and it provides a framework for benchmarking further improvements of scRNA-seq protocols. Copyright © 2017 Elsevier Inc. All rights reserved.

  10. Comparison of next generation sequencing technologies for transcriptome characterization

    PubMed Central

    2009-01-01

    Background We have developed a simulation approach to help determine the optimal mixture of sequencing methods for most complete and cost effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis. Results The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc http://fgp.huck.psu.edu/NG_Sims/ngsim.pl, an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, the NG sequencing is a dramatic advance over capillary-based sequencing, but NG sequencing also presents significant challenges in assembly and sequence accuracy due to short read lengths, method-specific sequencing errors, and the absence of physical clones. These problems may be overcome by hybrid sequencing strategies using a mixture of sequencing methodologies, by new assemblers, and by sequencing more deeply. Sequencing and microarray outcomes from multiple experiments suggest that our simulator will be useful for guiding NG transcriptome sequencing projects in a wide range of organisms. PMID:19646272

  11. Apoptosis: a four-week laboratory investigation for advanced molecular and cellular biology students.

    PubMed

    DiBartolomeis, Susan M; Moné, James P

    2003-01-01

    Over the past decade, apoptosis has emerged as an important field of study central to ongoing research in many diverse fields, from developmental biology to cancer research. Apoptosis proceeds by a highly coordinated series of events that includes enzyme activation, DNA fragmentation, and alterations in plasma membrane permeability. The detection of each of these phenotypic changes is accessible to advanced undergraduate cell and molecular biology students. We describe a 4-week laboratory sequence that integrates cell culture, fluorescence microscopy, DNA isolation and analysis, and western blotting (immunoblotting) to follow apoptosis in cultured human cells. Students working in teams chemically induce apoptosis, and harvest, process, and analyze cells, using their data to determine the order of events during apoptosis. We, as instructors, expose the students to an environment closely simulating what they would encounter in an active cell or molecular biology research laboratory by having students coordinate and perform multiple tasks simultaneously and by having them experience experimental design using current literature, data interpretation, and analysis to answer a single question. Students are assessed by examination of laboratory notebooks for completeness of experimental protocols and analysis of results and for completion of an assignment that includes questions pertaining to data interpretation and apoptosis.

  12. SIRW: A web server for the Simple Indexing and Retrieval System that combines sequence motif searches with keyword searches.

    PubMed

    Ramu, Chenna

    2003-07-01

    SIRW (http://sirw.embl.de/) is a World Wide Web interface to the Simple Indexing and Retrieval System (SIR) that is capable of parsing and indexing various flat file databases. In addition it provides a framework for doing sequence analysis (e.g. motif pattern searches) for selected biological sequences through keyword search. SIRW is an ideal tool for the bioinformatics community for searching as well as analyzing biological sequences of interest.

  13. XS: a FASTQ read simulator.

    PubMed

    Pratas, Diogo; Pinho, Armando J; Rodrigues, João M O S

    2014-01-16

    The emerging next-generation sequencing (NGS) is bringing, besides the natural huge amounts of data, an avalanche of new specialized tools (for analysis, compression, alignment, among others) and large public and private network infrastructures. Therefore, a direct necessity of specific simulation tools for testing and benchmarking is rising, such as a flexible and portable FASTQ read simulator, without the need of a reference sequence, yet correctly prepared for producing approximately the same characteristics as real data. We present XS, a skilled FASTQ read simulation tool, flexible, portable (does not need a reference sequence) and tunable in terms of sequence complexity. It has several running modes, depending on the time and memory available, and is aimed at testing computing infrastructures, namely cloud computing of large-scale projects, and testing FASTQ compression algorithms. Moreover, XS offers the possibility of simulating the three main FASTQ components individually (headers, DNA sequences and quality-scores). XS provides an efficient and convenient method for fast simulation of FASTQ files, such as those from Ion Torrent (currently uncovered by other simulators), Roche-454, Illumina and ABI-SOLiD sequencing machines. This tool is publicly available at http://bioinformatics.ua.pt/software/xs/.

  14. The perceptual shaping of anticipatory actions.

    PubMed

    Maffei, Giovanni; Herreros, Ivan; Sanchez-Fibla, Marti; Friston, Karl J; Verschure, Paul F M J

    2017-12-20

    Humans display anticipatory motor responses to minimize the adverse effects of predictable perturbations. A widely accepted explanation for this behaviour relies on the notion of an inverse model that, learning from motor errors, anticipates corrective responses. Here, we propose and validate the alternative hypothesis that anticipatory control can be realized through a cascade of purely sensory predictions that drive the motor system, reflecting the causal sequence of the perceptual events preceding the error. We compare both hypotheses in a simulated anticipatory postural adjustment task. We observe that adaptation in the sensory domain, but not in the motor one, supports the robust and generalizable anticipatory control characteristic of biological systems. Our proposal unites the neurobiology of the cerebellum with the theory of active inference and provides a concrete implementation of its core tenets with great relevance both to our understanding of biological control systems and, possibly, to their emulation in complex artefacts. © 2017 The Author(s).

  15. G =  MAT: linking transcription factor expression and DNA binding data.

    PubMed

    Tretyakov, Konstantin; Laur, Sven; Vilo, Jaak

    2011-01-31

    Transcription factors are proteins that bind to motifs on the DNA and thus affect gene expression regulation. The qualitative description of the corresponding processes is therefore important for a better understanding of essential biological mechanisms. However, wet lab experiments targeted at the discovery of the regulatory interplay between transcription factors and binding sites are expensive. We propose a new, purely computational method for finding putative associations between transcription factors and motifs. This method is based on a linear model that combines sequence information with expression data. We present various methods for model parameter estimation and show, via experiments on simulated data, that these methods are reliable. Finally, we examine the performance of this model on biological data and conclude that it can indeed be used to discover meaningful associations. The developed software is available as a web tool and Scilab source code at http://biit.cs.ut.ee/gmat/.

  16. G = MAT: Linking Transcription Factor Expression and DNA Binding Data

    PubMed Central

    Tretyakov, Konstantin; Laur, Sven; Vilo, Jaak

    2011-01-01

    Transcription factors are proteins that bind to motifs on the DNA and thus affect gene expression regulation. The qualitative description of the corresponding processes is therefore important for a better understanding of essential biological mechanisms. However, wet lab experiments targeted at the discovery of the regulatory interplay between transcription factors and binding sites are expensive. We propose a new, purely computational method for finding putative associations between transcription factors and motifs. This method is based on a linear model that combines sequence information with expression data. We present various methods for model parameter estimation and show, via experiments on simulated data, that these methods are reliable. Finally, we examine the performance of this model on biological data and conclude that it can indeed be used to discover meaningful associations. The developed software is available as a web tool and Scilab source code at http://biit.cs.ut.ee/gmat/. PMID:21297945

  17. BioPepDB: an integrated data platform for food-derived bioactive peptides.

    PubMed

    Li, Qilin; Zhang, Chao; Chen, Hongjun; Xue, Jitong; Guo, Xiaolei; Liang, Ming; Chen, Ming

    2018-03-12

    Food-derived bioactive peptides play critical roles in regulating most biological processes and have considerable biological, medical and industrial importance. However, a large number of active peptides data, including sequence, function, source, commercial product information, references and other information are poorly integrated. BioPepDB is a searchable database of food-derived bioactive peptides and their related articles, including more than four thousand bioactive peptide entries. Moreover, BioPepDB provides modules of prediction and hydrolysis-simulation for discovering novel peptides. It can serve as a reference database to investigate the function of different bioactive peptides. BioPepDB is available at http://bis.zju.edu.cn/biopepdbr/ . The web page utilises Apache, PHP5 and MySQL to provide the user interface for accessing the database and predict novel peptides. The database itself is operated on a specialised server.

  18. Developing JSequitur to Study the Hierarchical Structure of Biological Sequences in a Grammatical Inference Framework of String Compression Algorithms.

    PubMed

    Galbadrakh, Bulgan; Lee, Kyung-Eun; Park, Hyun-Seok

    2012-12-01

    Grammatical inference methods are expected to find grammatical structures hidden in biological sequences. One hopes that studies of grammar serve as an appropriate tool for theory formation. Thus, we have developed JSequitur for automatically generating the grammatical structure of biological sequences in an inference framework of string compression algorithms. Our original motivation was to find any grammatical traits of several cancer genes that can be detected by string compression algorithms. Through this research, we could not find any meaningful unique traits of the cancer genes yet, but we could observe some interesting traits in regards to the relationship among gene length, similarity of sequences, the patterns of the generated grammar, and compression rate.

  19. Biological intuition in alignment-free methods: response to Posada.

    PubMed

    Ragan, Mark A; Chan, Cheong Xin

    2013-08-01

    A recent editorial in Journal of Molecular Evolution highlights opportunities and challenges facing molecular evolution in the era of next-generation sequencing. Abundant sequence data should allow more-complex models to be fit at higher confidence, making phylogenetic inference more reliable and improving our understanding of evolution at the molecular level. However, concern that approaches based on multiple sequence alignment may be computationally infeasible for large datasets is driving the development of so-called alignment-free methods for sequence comparison and phylogenetic inference. The recent editorial characterized these approaches as model-free, not based on the concept of homology, and lacking in biological intuition. We argue here that alignment-free methods have not abandoned models or homology, and can be biologically intuitive.

  20. Ensembler: Enabling High-Throughput Molecular Simulations at the Superfamily Scale.

    PubMed

    Parton, Daniel L; Grinaway, Patrick B; Hanson, Sonya M; Beauchamp, Kyle A; Chodera, John D

    2016-06-01

    The rapidly expanding body of available genomic and protein structural data provides a rich resource for understanding protein dynamics with biomolecular simulation. While computational infrastructure has grown rapidly, simulations on an omics scale are not yet widespread, primarily because software infrastructure to enable simulations at this scale has not kept pace. It should now be possible to study protein dynamics across entire (super)families, exploiting both available structural biology data and conformational similarities across homologous proteins. Here, we present a new tool for enabling high-throughput simulation in the genomics era. Ensembler takes any set of sequences-from a single sequence to an entire superfamily-and shepherds them through various stages of modeling and refinement to produce simulation-ready structures. This includes comparative modeling to all relevant PDB structures (which may span multiple conformational states of interest), reconstruction of missing loops, addition of missing atoms, culling of nearly identical structures, assignment of appropriate protonation states, solvation in explicit solvent, and refinement and filtering with molecular simulation to ensure stable simulation. The output of this pipeline is an ensemble of structures ready for subsequent molecular simulations using computer clusters, supercomputers, or distributed computing projects like Folding@home. Ensembler thus automates much of the time-consuming process of preparing protein models suitable for simulation, while allowing scalability up to entire superfamilies. A particular advantage of this approach can be found in the construction of kinetic models of conformational dynamics-such as Markov state models (MSMs)-which benefit from a diverse array of initial configurations that span the accessible conformational states to aid sampling. We demonstrate the power of this approach by constructing models for all catalytic domains in the human tyrosine kinase family, using all available kinase catalytic domain structures from any organism as structural templates. Ensembler is free and open source software licensed under the GNU General Public License (GPL) v2. It is compatible with Linux and OS X. The latest release can be installed via the conda package manager, and the latest source can be downloaded from https://github.com/choderalab/ensembler.

  1. De Novo Design and Experimental Characterization of Ultrashort Self-Associating Peptides

    PubMed Central

    Xue, Bo; Robinson, Robert C.; Hauser, Charlotte A. E.; Floudas, Christodoulos A.

    2014-01-01

    Self-association is a common phenomenon in biology and one that can have positive and negative impacts, from the construction of the architectural cytoskeleton of cells to the formation of fibrils in amyloid diseases. Understanding the nature and mechanisms of self-association is important for modulating these systems and in creating biologically-inspired materials. Here, we present a two-stage de novo peptide design framework that can generate novel self-associating peptide systems. The first stage uses a simulated multimeric template structure as input into the optimization-based Sequence Selection to generate low potential energy sequences. The second stage is a computational validation procedure that calculates Fold Specificity and/or Approximate Association Affinity (K*association) based on metrics that we have devised for multimeric systems. This framework was applied to the design of self-associating tripeptides using the known self-associating tripeptide, Ac-IVD, as a structural template. Six computationally predicted tripeptides (Ac-LVE, Ac-YYD, Ac-LLE, Ac-YLD, Ac-MYD, Ac-VIE) were chosen for experimental validation in order to illustrate the self-association outcomes predicted by the three metrics. Self-association and electron microscopy studies revealed that Ac-LLE formed bead-like microstructures, Ac-LVE and Ac-YYD formed fibrillar aggregates, Ac-VIE and Ac-MYD formed hydrogels, and Ac-YLD crystallized under ambient conditions. An X-ray crystallographic study was carried out on a single crystal of Ac-YLD, which revealed that each molecule adopts a β-strand conformation that stack together to form parallel β-sheets. As an additional validation of the approach, the hydrogel-forming sequences of Ac-MYD and Ac-VIE were shuffled. The shuffled sequences were computationally predicted to have lower K*association values and were experimentally verified to not form hydrogels. This illustrates the robustness of the framework in predicting self-associating tripeptides. We expect that this enhanced multimeric de novo peptide design framework will find future application in creating novel self-associating peptides based on unnatural amino acids, and inhibitor peptides of detrimental self-aggregating biological proteins. PMID:25010703

  2. BAC sequencing using pooled methods.

    PubMed

    Saski, Christopher A; Feltus, F Alex; Parida, Laxmi; Haiminen, Niina

    2015-01-01

    Shotgun sequencing and assembly of a large, complex genome can be both expensive and challenging to accurately reconstruct the true genome sequence. Repetitive DNA arrays, paralogous sequences, polyploidy, and heterozygosity are main factors that plague de novo genome sequencing projects that typically result in highly fragmented assemblies and are difficult to extract biological meaning. Targeted, sub-genomic sequencing offers complexity reduction by removing distal segments of the genome and a systematic mechanism for exploring prioritized genomic content through BAC sequencing. If one isolates and sequences the genome fraction that encodes the relevant biological information, then it is possible to reduce overall sequencing costs and efforts that target a genomic segment. This chapter describes the sub-genome assembly protocol for an organism based upon a BAC tiling path derived from a genome-scale physical map or from fine mapping using BACs to target sub-genomic regions. Methods that are described include BAC isolation and mapping, DNA sequencing, and sequence assembly.

  3. MySSP: Non-stationary evolutionary sequence simulation, including indels

    PubMed Central

    Rosenberg, Michael S.

    2007-01-01

    MySSP is a new program for the simulation of DNA sequence evolution across a phylogenetic tree. Although many programs are available for sequence simulation, MySSP is unique in its inclusion of indels, flexibility in allowing for non-stationary patterns, and output of ancestral sequences. Some of these features can individually be found in existing programs, but have not all have been previously available in a single package. PMID:19325855

  4. Simulation-Based Evaluation of Learning Sequences for Instructional Technologies

    ERIC Educational Resources Information Center

    McEneaney, John E.

    2016-01-01

    Instructional technologies critically depend on systematic design, and learning hierarchies are a commonly advocated tool for designing instructional sequences. But hierarchies routinely allow numerous sequences and choosing an optimal sequence remains an unsolved problem. This study explores a simulation-based approach to modeling learning…

  5. EEG and ECG changes during simulator operation reflect mental workload and vigilance.

    PubMed

    Dussault, Caroline; Jouanin, Jean-Claude; Philippe, Matthieu; Guezennec, Charles-Yannick

    2005-04-01

    Performing mission tasks in a simulator influences many neurophysiological measures. Quantitative assessments of electroencephalography (EEG) and electrocardiography (ECG) have made it possible to develop indicators of mental workload and to estimate relative physiological responses to cognitive requirements. To evaluate the effects of mental workload without actual physical risk, we studied the cortical and cardiovascular changes that occurred during simulated flight. There were 12 pilots (8 novices and 4 experts) who simulated a flight composed of 10 sequences that induced several different mental workload levels. EEG was recorded at 12 electrode sites during rest and flight sequences; ECG activity was also recorded. Subjective tests were used to evaluate anxiety and vigilance levels. Theta band activity was lower during the two simulated flight rest sequences than during visual and instrument flight sequences at central, parietal, and occipital sites (p < 0.05). On the other hand, rest sequences resulted in higher beta (at the C4 site; p < 0.05) and gamma (at the central, parietal, and occipital sites; p < 0.05) power than active segments. The mean heart rate (HR) was not significantly different during any simulated flight sequence, but HR was lower for expert subjects than for novices. The subjective tests revealed no significant anxiety and high values for vigilance levels before and during flight. The different flight sequences performed on the simulator resulted in electrophysiological changes that expressed variations in mental workload. These results corroborate those found during study of real flights, particularly during sequences requiring the heaviest mental workload.

  6. Patome: a database server for biological sequence annotation and analysis in issued patents and published patent applications.

    PubMed

    Lee, Byungwook; Kim, Taehyung; Kim, Seon-Kyu; Lee, Kwang H; Lee, Doheon

    2007-01-01

    With the advent of automated and high-throughput techniques, the number of patent applications containing biological sequences has been increasing rapidly. However, they have attracted relatively little attention compared to other sequence resources. We have built a database server called Patome, which contains biological sequence data disclosed in patents and published applications, as well as their analysis information. The analysis is divided into two steps. The first is an annotation step in which the disclosed sequences were annotated with RefSeq database. The second is an association step where the sequences were linked to Entrez Gene, OMIM and GO databases, and their results were saved as a gene-patent table. From the analysis, we found that 55% of human genes were associated with patenting. The gene-patent table can be used to identify whether a particular gene or disease is related to patenting. Patome is available at http://www.patome.org/; the information is updated bimonthly.

  7. Patome: a database server for biological sequence annotation and analysis in issued patents and published patent applications

    PubMed Central

    Lee, Byungwook; Kim, Taehyung; Kim, Seon-Kyu; Lee, Kwang H.; Lee, Doheon

    2007-01-01

    With the advent of automated and high-throughput techniques, the number of patent applications containing biological sequences has been increasing rapidly. However, they have attracted relatively little attention compared to other sequence resources. We have built a database server called Patome, which contains biological sequence data disclosed in patents and published applications, as well as their analysis information. The analysis is divided into two steps. The first is an annotation step in which the disclosed sequences were annotated with RefSeq database. The second is an association step where the sequences were linked to Entrez Gene, OMIM and GO databases, and their results were saved as a gene–patent table. From the analysis, we found that 55% of human genes were associated with patenting. The gene–patent table can be used to identify whether a particular gene or disease is related to patenting. Patome is available at ; the information is updated bimonthly. PMID:17085479

  8. Prediction of phenotypes of missense mutations in human proteins from biological assemblies.

    PubMed

    Wei, Qiong; Xu, Qifang; Dunbrack, Roland L

    2013-02-01

    Single nucleotide polymorphisms (SNPs) are the most frequent variation in the human genome. Nonsynonymous SNPs that lead to missense mutations can be neutral or deleterious, and several computational methods have been presented that predict the phenotype of human missense mutations. These methods use sequence-based and structure-based features in various combinations, relying on different statistical distributions of these features for deleterious and neutral mutations. One structure-based feature that has not been studied significantly is the accessible surface area within biologically relevant oligomeric assemblies. These assemblies are different from the crystallographic asymmetric unit for more than half of X-ray crystal structures. We find that mutations in the core of proteins or in the interfaces in biological assemblies are significantly more likely to be disease-associated than those on the surface of the biological assemblies. For structures with more than one protein in the biological assembly (whether the same sequence or different), we find the accessible surface area from biological assemblies provides a statistically significant improvement in prediction over the accessible surface area of monomers from protein crystal structures (P = 6e-5). When adding this information to sequence-based features such as the difference between wildtype and mutant position-specific profile scores, the improvement from biological assemblies is statistically significant but much smaller (P = 0.018). Combining this information with sequence-based features in a support vector machine leads to 82% accuracy on a balanced dataset of 50% disease-associated mutations from SwissVar and 50% neutral mutations from human/primate sequence differences in orthologous proteins. Copyright © 2012 Wiley Periodicals, Inc.

  9. Free Energy Landscape - Settlements of Key Residues.

    NASA Astrophysics Data System (ADS)

    Aroutiounian, Svetlana

    2007-03-01

    FEL perspective in studies of protein folding transitions reflects notion that since there are ˜10^N conformations to scan in search of lowest free energy state, random search is beyond biological timescale. Protein folding must follow certain fel pathways and folding kinetics of evolutionary selected proteins dominates kinetic traps. Good model for functional robustness of natural proteins - coarse-grained model protein is not very accurate but affords bringing simulations closer to biological realm; Go-like potential secures the fel funnel shape; biochemical contacts signify the funnel bottleneck. Boltzmann-weighted ensemble of protein conformations and histogram method are used to obtain from MC sampling of protein conformational space the approximate probability distribution. The fel is F(rmsd) = -1/βLn[Hist(rmsd)], β=kBT and rmsd is root-mean-square-deviation from native conformation. The sperm whale myoglobin has rich dynamic behavior, is small and large - on computational scale, has a symmetry in architecture and unusual sextet of residue pairs. Main idea: there is a mathematical relation between protein fel and a key residues set providing stability to folding transition. Is the set evolutionary conserved also for functional reasons? Hypothesis: primary sequence determines the key residues positions conserved as stabilizers and the fel is the battlefield for the folding stability. Preliminary results: primary sequence - not the architecture, is the rule settler, indeed.

  10. From Random Walks to Brownian Motion, from Diffusion to Entropy: Statistical Principles in Introductory Physics

    NASA Astrophysics Data System (ADS)

    Reeves, Mark

    2014-03-01

    Entropy changes underlie the physics that dominates biological interactions. Indeed, introductory biology courses often begin with an exploration of the qualities of water that are important to living systems. However, one idea that is not explicitly addressed in most introductory physics or biology textbooks is dominant contribution of the entropy in driving important biological processes towards equilibrium. From diffusion to cell-membrane formation, to electrostatic binding in protein folding, to the functioning of nerve cells, entropic effects often act to counterbalance deterministic forces such as electrostatic attraction and in so doing, allow for effective molecular signaling. A small group of biology, biophysics and computer science faculty have worked together for the past five years to develop curricular modules (based on SCALEUP pedagogy) that enable students to create models of stochastic and deterministic processes. Our students are first-year engineering and science students in the calculus-based physics course and they are not expected to know biology beyond the high-school level. In our class, they learn to reduce seemingly complex biological processes and structures to be described by tractable models that include deterministic processes and simple probabilistic inference. The students test these models in simulations and in laboratory experiments that are biologically relevant. The students are challenged to bridge the gap between statistical parameterization of their data (mean and standard deviation) and simple model-building by inference. This allows the students to quantitatively describe realistic cellular processes such as diffusion, ionic transport, and ligand-receptor binding. Moreover, the students confront ``random'' forces and traditional forces in problems, simulations, and in laboratory exploration throughout the year-long course as they move from traditional kinematics through thermodynamics to electrostatic interactions. This talk will present a number of these exercises, with particular focus on the hands-on experiments done by the students, and will give examples of the tangible material that our students work with throughout the two-semester sequence of their course on introductory physics with a bio focus. Supported by NSF DUE.

  11. Points of View: A Survey of Survey Courses--Are They Effective? A Unique Approach? Four Semesters of Biology Core Curriculum

    ERIC Educational Resources Information Center

    Batzli, Janet M.

    2005-01-01

    ''Why four semesters? How does this track differ from the two-semester course sequence?'' These are the most common questions students have when they learn about the Biology Core Curriculum (Biocore), a unique four-semester honors biology sequence at University of Wisconsin-Madison (UW-Madison). Biocore was first taught at University of Wisconsin…

  12. Nonparametric Combinatorial Sequence Models

    NASA Astrophysics Data System (ADS)

    Wauthier, Fabian L.; Jordan, Michael I.; Jojic, Nebojsa

    This work considers biological sequences that exhibit combinatorial structures in their composition: groups of positions of the aligned sequences are "linked" and covary as one unit across sequences. If multiple such groups exist, complex interactions can emerge between them. Sequences of this kind arise frequently in biology but methodologies for analyzing them are still being developed. This paper presents a nonparametric prior on sequences which allows combinatorial structures to emerge and which induces a posterior distribution over factorized sequence representations. We carry out experiments on three sequence datasets which indicate that combinatorial structures are indeed present and that combinatorial sequence models can more succinctly describe them than simpler mixture models. We conclude with an application to MHC binding prediction which highlights the utility of the posterior distribution induced by the prior. By integrating out the posterior our method compares favorably to leading binding predictors.

  13. String Mining in Bioinformatics

    NASA Astrophysics Data System (ADS)

    Abouelhoda, Mohamed; Ghanem, Moustafa

    Sequence analysis is a major area in bioinformatics encompassing the methods and techniques for studying the biological sequences, DNA, RNA, and proteins, on the linear structure level. The focus of this area is generally on the identification of intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting common segments among two or multiple sequences. From a data mining point of view, sequence analysis is nothing but string- or pattern mining specific to biological strings. For a long time, this point of view, however, has not been explicitly embraced neither in the data mining nor in the sequence analysis text books, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the word "data-mining" is almost missing in the sequence analysis literature, its basic concepts have been implicitly applied. Interestingly, recent research in biological sequence analysis introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages [20], fighting spam mails [50], detecting plagiarism [22], and spotting duplications in software systems [14].

  14. String Mining in Bioinformatics

    NASA Astrophysics Data System (ADS)

    Abouelhoda, Mohamed; Ghanem, Moustafa

    Sequence analysis is a major area in bioinformatics encompassing the methods and techniques for studying the biological sequences, DNA, RNA, and proteins, on the linear structure level. The focus of this area is generally on the identification of intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting common segments among two or multiple sequences. From a data mining point of view, sequence analysis is nothing but string- or pattern mining specific to biological strings. For a long time, this point of view, however, has not been explicitly embraced neither in the data mining nor in the sequence analysis text books, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the word “data-mining” is almost missing in the sequence analysis literature, its basic concepts have been implicitly applied. Interestingly, recent research in biological sequence analysis introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages [20], fighting spam mails [50], detecting plagiarism [22], and spotting duplications in software systems [14].

  15. Mean field analysis of algorithms for scale-free networks in molecular biology

    PubMed Central

    2017-01-01

    The sampling of scale-free networks in Molecular Biology is usually achieved by growing networks from a seed using recursive algorithms with elementary moves which include the addition and deletion of nodes and bonds. These algorithms include the Barabási-Albert algorithm. Later algorithms, such as the Duplication-Divergence algorithm, the Solé algorithm and the iSite algorithm, were inspired by biological processes underlying the evolution of protein networks, and the networks they produce differ essentially from networks grown by the Barabási-Albert algorithm. In this paper the mean field analysis of these algorithms is reconsidered, and extended to variant and modified implementations of the algorithms. The degree sequences of scale-free networks decay according to a powerlaw distribution, namely P(k) ∼ k−γ, where γ is a scaling exponent. We derive mean field expressions for γ, and test these by numerical simulations. Generally, good agreement is obtained. We also found that some algorithms do not produce scale-free networks (for example some variant Barabási-Albert and Solé networks). PMID:29272285

  16. Computational biology in the cloud: methods and new insights from computing at scale.

    PubMed

    Kasson, Peter M

    2013-01-01

    The past few years have seen both explosions in the size of biological data sets and the proliferation of new, highly flexible on-demand computing capabilities. The sheer amount of information available from genomic and metagenomic sequencing, high-throughput proteomics, experimental and simulation datasets on molecular structure and dynamics affords an opportunity for greatly expanded insight, but it creates new challenges of scale for computation, storage, and interpretation of petascale data. Cloud computing resources have the potential to help solve these problems by offering a utility model of computing and storage: near-unlimited capacity, the ability to burst usage, and cheap and flexible payment models. Effective use of cloud computing on large biological datasets requires dealing with non-trivial problems of scale and robustness, since performance-limiting factors can change substantially when a dataset grows by a factor of 10,000 or more. New computing paradigms are thus often needed. The use of cloud platforms also creates new opportunities to share data, reduce duplication, and to provide easy reproducibility by making the datasets and computational methods easily available.

  17. Mean field analysis of algorithms for scale-free networks in molecular biology.

    PubMed

    Konini, S; Janse van Rensburg, E J

    2017-01-01

    The sampling of scale-free networks in Molecular Biology is usually achieved by growing networks from a seed using recursive algorithms with elementary moves which include the addition and deletion of nodes and bonds. These algorithms include the Barabási-Albert algorithm. Later algorithms, such as the Duplication-Divergence algorithm, the Solé algorithm and the iSite algorithm, were inspired by biological processes underlying the evolution of protein networks, and the networks they produce differ essentially from networks grown by the Barabási-Albert algorithm. In this paper the mean field analysis of these algorithms is reconsidered, and extended to variant and modified implementations of the algorithms. The degree sequences of scale-free networks decay according to a powerlaw distribution, namely P(k) ∼ k-γ, where γ is a scaling exponent. We derive mean field expressions for γ, and test these by numerical simulations. Generally, good agreement is obtained. We also found that some algorithms do not produce scale-free networks (for example some variant Barabási-Albert and Solé networks).

  18. The World Health Organization Global Programme on AIDS proposal for standardization of HIV sequence nomenclature. WHO Network for HIV Isolation and Characterization.

    PubMed

    Korber, B T; Osmanov, S; Esparza, J; Myers, G

    1994-11-01

    The World Health Organization Global Programme on AIDS (WHO/GPA) is conducting a large-scale collaborative study of human immunodeficiency virus type 1 (HIV-1) variation, based in four potential vaccine-trial site countries: Brazil, Rwanda, Thailand, and Uganda. Through the course of this study, it was crucial to keep track of certain attributes of the samples from which the viral nucleotide sequences were derived (e.g., country of origin and viral culture characterization), so that meaningful sequence comparisons could be made. Here we describe a system developed in the context of the WHO/GPA study that summarizes such critical attributes by representing them as standardized characters directly incorporated into sequence names. This nomenclature allows linkage of clinical, phenotypic, and geographic information with molecular data. We propose that other investigators involved in human immunodeficiency virus (HIV) nucleotide sequencing efforts adopt a similar standardized sequence nomenclature to facilitate cross-study sequence comparison. HIV sequence data are being generated at an ever-increasing rate; directly coupled to this increase is our deepening understanding of biological parameters that influence or result from sequence variability. A standardized sequence nomenclature that includes relevant biological information would enable researchers to better utilize the growing body of sequence data, and enhance their ability to interpret the biological implications of their own data through facilitating comparisons with previously published work.

  19. Cost-effectiveness of biological treatment sequences for fistulising Crohn’s disease across Europe

    PubMed Central

    Baji, Petra; Gulácsi, László; Brodszky, Valentin; Végh, Zsuzsanna; Danese, Silvio; Irving, Peter M; Peyrin-Biroulet, Laurent; Schreiber, Stefan; Rencz, Fanni; Lakatos, Péter L; Péntek, Márta

    2017-01-01

    Background In clinical practice, treatment sequences of biologicals are applied for active fistulising Crohn’s disease, however underlying health economic analyses are lacking. Objective The purpose of this study was to analyse the cost-effectiveness of different biological sequences including infliximab, biosimilar-infliximab, adalimumab and vedolizumab in nine European countries. Methods A Markov model was developed to compare treatment sequences of one, two and three biologicals from the payer’s perspective on a five-year time horizon. Data on effectiveness and health state utilities were obtained from the literature. Country-specific costs were considered. Calculations were performed with both official list prices and estimated real prices of biologicals. Results Biosimilar-infliximab is the most cost-effective treatment against standard care across the countries (with list prices: €34684–€72551/quality adjusted life year; with estimated real prices: €24364–€56086/quality adjusted life year). The most cost-effective two-agent sequence, except for Germany, is the biosimilar-infliximab–adalimumab therapy compared with single biosimilar-infliximab (with list prices: €58533–€133831/quality adjusted life year; with estimated prices: €45513–€105875/quality adjusted life year). The cost-effectiveness of the biosimilar-infliximab–adalimumab–vedolizumab three-agent sequence compared wit biosimilar-infliximab –adalimumab is €87214–€152901/quality adjusted life year. Conclusions The suggested first-choice biological treatment is biosimilar-infliximab. In case of treatment failure, switching to adalimumab then to vedolizumab provides meaningful additional health gains but at increased costs. Inter-country differences in cost-effectiveness are remarkable due to significant differences in costs. PMID:29511561

  20. DLocalMotif: a discriminative approach for discovering local motifs in protein sequences.

    PubMed

    Mehdi, Ahmed M; Sehgal, Muhammad Shoaib B; Kobe, Bostjan; Bailey, Timothy L; Bodén, Mikael

    2013-01-01

    Local motifs are patterns of DNA or protein sequences that occur within a sequence interval relative to a biologically defined anchor or landmark. Current protein motif discovery methods do not adequately consider such constraints to identify biologically significant motifs that are only weakly over-represented but spatially confined. Using negatives, i.e. sequences known to not contain a local motif, can further increase the specificity of their discovery. This article introduces the method DLocalMotif that makes use of positional information and negative data for local motif discovery in protein sequences. DLocalMotif combines three scoring functions, measuring degrees of motif over-representation, entropy and spatial confinement, specifically designed to discriminatively exploit the availability of negative data. The method is shown to outperform current methods that use only a subset of these motif characteristics. We apply the method to several biological datasets. The analysis of peroxisomal targeting signals uncovers several novel motifs that occur immediately upstream of the dominant peroxisomal targeting signal-1 signal. The analysis of proline-tyrosine nuclear localization signals uncovers multiple novel motifs that overlap with C2H2 zinc finger domains. We also evaluate the method on classical nuclear localization signals and endoplasmic reticulum retention signals and find that DLocalMotif successfully recovers biologically relevant sequence properties. http://bioinf.scmb.uq.edu.au/dlocalmotif/

  1. Automated numerical simulation of biological pattern formation based on visual feedback simulation framework

    PubMed Central

    Sun, Mingzhu; Xu, Hui; Zeng, Xingjuan; Zhao, Xin

    2017-01-01

    There are various fantastic biological phenomena in biological pattern formation. Mathematical modeling using reaction-diffusion partial differential equation systems is employed to study the mechanism of pattern formation. However, model parameter selection is both difficult and time consuming. In this paper, a visual feedback simulation framework is proposed to calculate the parameters of a mathematical model automatically based on the basic principle of feedback control. In the simulation framework, the simulation results are visualized, and the image features are extracted as the system feedback. Then, the unknown model parameters are obtained by comparing the image features of the simulation image and the target biological pattern. Considering two typical applications, the visual feedback simulation framework is applied to fulfill pattern formation simulations for vascular mesenchymal cells and lung development. In the simulation framework, the spot, stripe, labyrinthine patterns of vascular mesenchymal cells, the normal branching pattern and the branching pattern lacking side branching for lung branching are obtained in a finite number of iterations. The simulation results indicate that it is easy to achieve the simulation targets, especially when the simulation patterns are sensitive to the model parameters. Moreover, this simulation framework can expand to other types of biological pattern formation. PMID:28225811

  2. Automated numerical simulation of biological pattern formation based on visual feedback simulation framework.

    PubMed

    Sun, Mingzhu; Xu, Hui; Zeng, Xingjuan; Zhao, Xin

    2017-01-01

    There are various fantastic biological phenomena in biological pattern formation. Mathematical modeling using reaction-diffusion partial differential equation systems is employed to study the mechanism of pattern formation. However, model parameter selection is both difficult and time consuming. In this paper, a visual feedback simulation framework is proposed to calculate the parameters of a mathematical model automatically based on the basic principle of feedback control. In the simulation framework, the simulation results are visualized, and the image features are extracted as the system feedback. Then, the unknown model parameters are obtained by comparing the image features of the simulation image and the target biological pattern. Considering two typical applications, the visual feedback simulation framework is applied to fulfill pattern formation simulations for vascular mesenchymal cells and lung development. In the simulation framework, the spot, stripe, labyrinthine patterns of vascular mesenchymal cells, the normal branching pattern and the branching pattern lacking side branching for lung branching are obtained in a finite number of iterations. The simulation results indicate that it is easy to achieve the simulation targets, especially when the simulation patterns are sensitive to the model parameters. Moreover, this simulation framework can expand to other types of biological pattern formation.

  3. Review of the fundamental theories behind small angle X-ray scattering, molecular dynamics simulations, and relevant integrated application.

    PubMed

    Boldon, Lauren; Laliberte, Fallon; Liu, Li

    2015-01-01

    In this paper, the fundamental concepts and equations necessary for performing small angle X-ray scattering (SAXS) experiments, molecular dynamics (MD) simulations, and MD-SAXS analyses were reviewed. Furthermore, several key biological and non-biological applications for SAXS, MD, and MD-SAXS are presented in this review; however, this article does not cover all possible applications. SAXS is an experimental technique used for the analysis of a wide variety of biological and non-biological structures. SAXS utilizes spherical averaging to produce one- or two-dimensional intensity profiles, from which structural data may be extracted. MD simulation is a computer simulation technique that is used to model complex biological and non-biological systems at the atomic level. MD simulations apply classical Newtonian mechanics' equations of motion to perform force calculations and to predict the theoretical physical properties of the system. This review presents several applications that highlight the ability of both SAXS and MD to study protein folding and function in addition to non-biological applications, such as the study of mechanical, electrical, and structural properties of non-biological nanoparticles. Lastly, the potential benefits of combining SAXS and MD simulations for the study of both biological and non-biological systems are demonstrated through the presentation of several examples that combine the two techniques.

  4. Introduction to Single-Cell RNA Sequencing.

    PubMed

    Olsen, Thale Kristin; Baryawno, Ninib

    2018-04-01

    During the last decade, high-throughput sequencing methods have revolutionized the entire field of biology. The opportunity to study entire transcriptomes in great detail using RNA sequencing (RNA-seq) has fueled many important discoveries and is now a routine method in biomedical research. However, RNA-seq is typically performed in "bulk," and the data represent an average of gene expression patterns across thousands to millions of cells; this might obscure biologically relevant differences between cells. Single-cell RNA-seq (scRNA-seq) represents an approach to overcome this problem. By isolating single cells, capturing their transcripts, and generating sequencing libraries in which the transcripts are mapped to individual cells, scRNA-seq allows assessment of fundamental biological properties of cell populations and biological systems at unprecedented resolution. Here, we present the most common scRNA-seq protocols in use today and the basics of data analysis and discuss factors that are important to consider before planning and designing an scRNA-seq project. © 2018 by John Wiley & Sons, Inc. Copyright © 2018 John Wiley & Sons, Inc.

  5. Comparison of the protein-protein interfaces in the p53-DNA crystal structures: towards elucidation of the biological interface.

    PubMed

    Ma, Buyong; Pan, Yongping; Gunasekaran, K; Venkataraghavan, R Babu; Levine, Arnold J; Nussinov, Ruth

    2005-03-15

    p53, the tumor suppressor protein, functions as a dimer of dimers. However, how the tetramer binds to the DNA is still an open question. In the crystal structure, three copies of the p53 monomers (containing chains A, B, and C) were crystallized with the DNA-consensus element. Although the structure provides crucial data on the p53-DNA contacts, the active oligomeric state is unclear because the two dimeric (A-B and B-C) interfaces present in the crystal cannot both exist in the tetramer. Here, we address the question of which of these two dimeric interfaces may be more biologically relevant. We analyze the sequence and structural properties of the p53-p53 dimeric interfaces and carry out extensive molecular dynamics simulations of the crystal structures of the human and mouse p53 dimers. We find that the A-B interface residues are more conserved than those of the B-C. Molecular dynamics simulations show that the A-B interface can provide a stable DNA-binding motif in the dimeric state, unlike B-C. Our results indicate that the interface between chains A-B in the p53-DNA complex constitutes a better candidate for a stable biological interface, whereas the B-C interface is more likely to be due to crystal packing. Thus, they have significant implications toward our understanding of DNA binding by p53 as well as p53-mediated interactions with other proteins.

  6. CAMELOT: A machine learning approach for coarse-grained simulations of aggregation of block-copolymeric protein sequences

    PubMed Central

    Ruff, Kiersten M.; Harmon, Tyler S.; Pappu, Rohit V.

    2015-01-01

    We report the development and deployment of a coarse-graining method that is well suited for computer simulations of aggregation and phase separation of protein sequences with block-copolymeric architectures. Our algorithm, named CAMELOT for Coarse-grained simulations Aided by MachinE Learning Optimization and Training, leverages information from converged all atom simulations that is used to determine a suitable resolution and parameterize the coarse-grained model. To parameterize a system-specific coarse-grained model, we use a combination of Boltzmann inversion, non-linear regression, and a Gaussian process Bayesian optimization approach. The accuracy of the coarse-grained model is demonstrated through direct comparisons to results from all atom simulations. We demonstrate the utility of our coarse-graining approach using the block-copolymeric sequence from the exon 1 encoded sequence of the huntingtin protein. This sequence comprises of 17 residues from the N-terminal end of huntingtin (N17) followed by a polyglutamine (polyQ) tract. Simulations based on the CAMELOT approach are used to show that the adsorption and unfolding of the wild type N17 and its sequence variants on the surface of polyQ tracts engender a patchy colloid like architecture that promotes the formation of linear aggregates. These results provide a plausible explanation for experimental observations, which show that N17 accelerates the formation of linear aggregates in block-copolymeric N17-polyQ sequences. The CAMELOT approach is versatile and is generalizable for simulating the aggregation and phase behavior of a range of block-copolymeric protein sequences. PMID:26723608

  7. Bioaccessible peptides released by in vitro gastrointestinal digestion of fermented goat milks.

    PubMed

    Moreno-Montoro, Miriam; Jauregi, Paula; Navarro-Alarcón, Miguel; Olalla-Herrera, Manuel; Giménez-Martínez, Rafael; Amigo, Lourdes; Miralles, Beatriz

    2018-06-01

    In this study, ultrafiltered goat milks fermented with the classical starter bacteria Lactobacillus delbrueckii subsp. bulgaricus and Streptococcus salivarus subsp. thermophilus or with the classical starter plus the Lactobacillus plantarum C4 probiotic strain were analyzed using ultra-high performance liquid chromatography-quadrupole-time-of-flight tandem mass spectrometry (UPLC-Q-TOF-MS/MS) and/or high performance liquid chromatography-ion trap (HPLC-IT-MS/MS). Partial overlapping of the identified sequences with regard to fermentation culture was observed. Evaluation of the cleavage specificity suggested a lower proteolytic activity of the probiotic strain. Some of the potentially identified peptides had been previously reported as angiotensin-converting enzyme (ACE) inhibitory, antioxidant, and antibacterial and might account for the in vitro activity previously reported for these fermented milks. Simulated digestion of the products was conducted in the presence of a dialysis membrane to retrieve the bioaccessible peptide fraction. Some sequences with reported physiological activity resisted digestion but were found in the non-dialyzable fraction. However, new forms released by digestion, such as the antioxidant α s1 -casein 144 YFYPQL 149 , the antihypertensive α s2 -casein 90 YQKFPQY 96 , and the antibacterial α s2 -casein 165 LKKISQ 170 , were found in the dialyzable fraction of both fermented milks. Moreover, in the fermented milk including the probiotic strain, the k-casein dipeptidyl peptidase IV inhibitor (DPP-IV) 51 INNQFLPYPY 60 as well as additional ACE inhibitory or antioxidant sequences could be identified. With the aim of anticipating further biological outcomes, quantitative structure activity relationship (QSAR) analysis was applied to the bioaccessible fragments and led to potential ACE inhibitory sequences being proposed. Graphical abstract Ultrafiltered goat milks were fermented with the classical starter bacteria (St) and with St plus the L. plantarum C4 probiotic strain. Samples were analyzed using HPLC-IT-MS/MS and UPLC-Q-TOF-MS/MS. After simulated digestion and dialysis, some of the active sequences remained and new peptides with reported beneficial activities were released.

  8. Introduction to bioinformatics.

    PubMed

    Can, Tolga

    2014-01-01

    Bioinformatics is an interdisciplinary field mainly involving molecular biology and genetics, computer science, mathematics, and statistics. Data intensive, large-scale biological problems are addressed from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps: Collect statistics from biological data. Build a computational model. Solve a computational modeling problem. Test and evaluate a computational algorithm. This chapter gives a brief introduction to bioinformatics by first providing an introduction to biological terminology and then discussing some classical bioinformatics problems organized by the types of data sources. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function and includes subproblems such as identification of homologs, multiple sequence alignment, searching sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data is usually represented as matrices and analysis of microarray data mostly involves statistics analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein-protein interaction networks are usually modeled as graphs and graph theoretic approaches are used to solve associated problems such as construction and analysis of large-scale networks.

  9. Computational design of RNA parts, devices, and transcripts with kinetic folding algorithms implemented on multiprocessor clusters.

    PubMed

    Thimmaiah, Tim; Voje, William E; Carothers, James M

    2015-01-01

    With progress toward inexpensive, large-scale DNA assembly, the demand for simulation tools that allow the rapid construction of synthetic biological devices with predictable behaviors continues to increase. By combining engineered transcript components, such as ribosome binding sites, transcriptional terminators, ligand-binding aptamers, catalytic ribozymes, and aptamer-controlled ribozymes (aptazymes), gene expression in bacteria can be fine-tuned, with many corollaries and applications in yeast and mammalian cells. The successful design of genetic constructs that implement these kinds of RNA-based control mechanisms requires modeling and analyzing kinetically determined co-transcriptional folding pathways. Transcript design methods using stochastic kinetic folding simulations to search spacer sequence libraries for motifs enabling the assembly of RNA component parts into static ribozyme- and dynamic aptazyme-regulated expression devices with quantitatively predictable functions (rREDs and aREDs, respectively) have been described (Carothers et al., Science 334:1716-1719, 2011). Here, we provide a detailed practical procedure for computational transcript design by illustrating a high throughput, multiprocessor approach for evaluating spacer sequences and generating functional rREDs. This chapter is written as a tutorial, complete with pseudo-code and step-by-step instructions for setting up a computational cluster with an Amazon, Inc. web server and performing the large numbers of kinefold-based stochastic kinetic co-transcriptional folding simulations needed to design functional rREDs and aREDs. The method described here should be broadly applicable for designing and analyzing a variety of synthetic RNA parts, devices and transcripts.

  10. The transcriptome of Lutzomyia longipalpis (Diptera: Psychodidae) male reproductive organs.

    PubMed

    Azevedo, Renata V D M; Dias, Denise B S; Bretãs, Jorge A C; Mazzoni, Camila J; Souza, Nataly A; Albano, Rodolpho M; Wagner, Glauber; Davila, Alberto M R; Peixoto, Alexandre A

    2012-01-01

    It has been suggested that genes involved in the reproductive biology of insect disease vectors are potential targets for future alternative methods of control. Little is known about the molecular biology of reproduction in phlebotomine sand flies and there is no information available concerning genes that are expressed in male reproductive organs of Lutzomyia longipalpis, the main vector of American visceral leishmaniasis and a species complex. We generated 2678 high quality ESTs ("Expressed Sequence Tags") of L. longipalpis male reproductive organs that were grouped in 1391 non-redundant sequences (1136 singlets and 255 clusters). BLAST analysis revealed that only 57% of these sequences share similarity with a L. longipalpis female EST database. Although no more than 36% of the non-redundant sequences showed similarity to protein sequences deposited in databases, more than half of them presented the best-match hits with mosquito genes. Gene ontology analysis identified subsets of genes involved in biological processes such as protein biosynthesis and DNA replication, which are probably associated with spermatogenesis. A number of non-redundant sequences were also identified as putative male reproductive gland proteins (mRGPs), also known as male accessory gland protein genes (Acps). The transcriptome analysis of L. longipalpis male reproductive organs is one step further in the study of the molecular basis of the reproductive biology of this important species complex. It has allowed the identification of genes potentially involved in spermatogenesis as well as putative mRGPs sequences, which have been studied in many insect species because of their effects on female post-mating behavior and physiology and their potential role in sexual selection and speciation. These data open a number of new avenues for further research in the molecular and evolutionary reproductive biology of sand flies.

  11. The Transcriptome of Lutzomyia longipalpis (Diptera: Psychodidae) Male Reproductive Organs

    PubMed Central

    Bretãs, Jorge A. C.; Mazzoni, Camila J.; Souza, Nataly A.; Albano, Rodolpho M.; Wagner, Glauber; Davila, Alberto M. R.; Peixoto, Alexandre A.

    2012-01-01

    Background It has been suggested that genes involved in the reproductive biology of insect disease vectors are potential targets for future alternative methods of control. Little is known about the molecular biology of reproduction in phlebotomine sand flies and there is no information available concerning genes that are expressed in male reproductive organs of Lutzomyia longipalpis, the main vector of American visceral leishmaniasis and a species complex. Methods/Principal Findings We generated 2678 high quality ESTs (“Expressed Sequence Tags”) of L. longipalpis male reproductive organs that were grouped in 1391 non-redundant sequences (1136 singlets and 255 clusters). BLAST analysis revealed that only 57% of these sequences share similarity with a L. longipalpis female EST database. Although no more than 36% of the non-redundant sequences showed similarity to protein sequences deposited in databases, more than half of them presented the best-match hits with mosquito genes. Gene ontology analysis identified subsets of genes involved in biological processes such as protein biosynthesis and DNA replication, which are probably associated with spermatogenesis. A number of non-redundant sequences were also identified as putative male reproductive gland proteins (mRGPs), also known as male accessory gland protein genes (Acps). Conclusions The transcriptome analysis of L. longipalpis male reproductive organs is one step further in the study of the molecular basis of the reproductive biology of this important species complex. It has allowed the identification of genes potentially involved in spermatogenesis as well as putative mRGPs sequences, which have been studied in many insect species because of their effects on female post-mating behavior and physiology and their potential role in sexual selection and speciation. These data open a number of new avenues for further research in the molecular and evolutionary reproductive biology of sand flies. PMID:22496818

  12. SIRAH: a structurally unbiased coarse-grained force field for proteins with aqueous solvation and long-range electrostatics.

    PubMed

    Darré, Leonardo; Machado, Matías Rodrigo; Brandner, Astrid Febe; González, Humberto Carlos; Ferreira, Sebastián; Pantano, Sergio

    2015-02-10

    Modeling of macromolecular structures and interactions represents an important challenge for computational biology, involving different time and length scales. However, this task can be facilitated through the use of coarse-grained (CG) models, which reduce the number of degrees of freedom and allow efficient exploration of complex conformational spaces. This article presents a new CG protein model named SIRAH, developed to work with explicit solvent and to capture sequence, temperature, and ionic strength effects in a topologically unbiased manner. SIRAH is implemented in GROMACS, and interactions are calculated using a standard pairwise Hamiltonian for classical molecular dynamics simulations. We present a set of simulations that test the capability of SIRAH to produce a qualitatively correct solvation on different amino acids, hydrophilic/hydrophobic interactions, and long-range electrostatic recognition leading to spontaneous association of unstructured peptides and stable structures of single polypeptides and protein-protein complexes.

  13. Evaluation of nine popular de novo assemblers in microbial genome assembly.

    PubMed

    Forouzan, Esmaeil; Maleki, Masoumeh Sadat Mousavi; Karkhane, Ali Asghar; Yakhchali, Bagher

    2017-12-01

    Next generation sequencing (NGS) technologies are revolutionizing biology, with Illumina being the most popular NGS platform. Short read assembly is a critical part of most genome studies using NGS. Hence, in this study, the performance of nine well-known assemblers was evaluated in the assembly of seven different microbial genomes. Effect of different read coverage and k-mer parameters on the quality of the assembly were also evaluated on both simulated and actual read datasets. Our results show that the performance of assemblers on real and simulated datasets could be significantly different, mainly because of coverage bias. According to outputs on actual read datasets, for all studied read coverages (of 7×, 25× and 100×), SPAdes and IDBA-UD clearly outperformed other assemblers based on NGA50 and accuracy metrics. Velvet is the most conservative assembler with the lowest NGA50 and error rate. Copyright © 2017. Published by Elsevier B.V.

  14. Developmental Self-Construction and -Configuration of Functional Neocortical Neuronal Networks

    PubMed Central

    Bauer, Roman; Zubler, Frédéric; Pfister, Sabina; Hauri, Andreas; Pfeiffer, Michael; Muir, Dylan R.; Douglas, Rodney J.

    2014-01-01

    The prenatal development of neural circuits must provide sufficient configuration to support at least a set of core postnatal behaviors. Although knowledge of various genetic and cellular aspects of development is accumulating rapidly, there is less systematic understanding of how these various processes play together in order to construct such functional networks. Here we make some steps toward such understanding by demonstrating through detailed simulations how a competitive co-operative (‘winner-take-all’, WTA) network architecture can arise by development from a single precursor cell. This precursor is granted a simplified gene regulatory network that directs cell mitosis, differentiation, migration, neurite outgrowth and synaptogenesis. Once initial axonal connection patterns are established, their synaptic weights undergo homeostatic unsupervised learning that is shaped by wave-like input patterns. We demonstrate how this autonomous genetically directed developmental sequence can give rise to self-calibrated WTA networks, and compare our simulation results with biological data. PMID:25474693

  15. An experimentally-informed coarse-grained 3-site-per-nucleotide model of DNA: Structure, thermodynamics, and dynamics of hybridization

    PubMed Central

    Hinckley, Daniel M.; Freeman, Gordon S.; Whitmer, Jonathan K.; de Pablo, Juan J.

    2013-01-01

    A new 3-Site-Per-Nucleotide coarse-grained model for DNA is presented. The model includes anisotropic potentials between bases involved in base stacking and base pair interactions that enable the description of relevant structural properties, including the major and minor grooves. In an improvement over available coarse-grained models, the correct persistence length is recovered for both ssDNA and dsDNA, allowing for simulation of non-canonical structures such as hairpins. DNA melting temperatures, measured for duplexes and hairpins by integrating over free energy surfaces generated using metadynamics simulations, are shown to be in quantitative agreement with experiment for a variety of sequences and conditions. Hybridization rate constants, calculated using forward-flux sampling, are also shown to be in good agreement with experiment. The coarse-grained model presented here is suitable for use in biological and engineering applications, including nucleosome positioning and DNA-templated engineering. PMID:24116642

  16. Elucidation of Peptide-Directed Palladium Surface Structure for Biologically Tunable Nanocatalysts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bedford, Nicholas M.; Ramezani-Dakhel, Hadi; Slocik, Joseph M.

    Peptide-enabled synthesis of inorganic nanostructures represents an avenue to access catalytic materials with tunable and optimized properties. This is achieved via peptide complexity and programmability that is missing in traditional ligands for catalytic nanomaterials. Unfortunately, there is limited information available to correlate peptide sequence to particle structure and catalytic activity to date. As such, the application of peptide-enabled nanocatalysts remains limited to trial and error approaches. In this paper, a hybrid experimental and computational approach is introduced to systematically elucidate biomolecule-dependent structure/function relationships for peptide-capped Pd nanocatalysts. Synchrotron X-ray techniques were used to uncover substantial particle surface structural disorder, whichmore » was dependent upon the amino acid sequence of the peptide capping ligand. Nanocatalyst configurations were then determined directly from experimental data using reverse Monte Carlo methods and further refined using molecular dynamics simulation, obtaining thermodynamically stable peptide-Pd nanoparticle configurations. Sequence-dependent catalytic property differences for C-C coupling and olefin hydrogenation were then eluddated by identification of the catalytic active sites at the atomic level and quantitative prediction of relative reaction rates. This hybrid methodology provides a clear route to determine peptide-dependent structure/function relationships, enabling the generation of guidelines for catalyst design through rational tailoring of peptide sequences« less

  17. Computational analysis of stochastic heterogeneity in PCR amplification efficiency revealed by single molecule barcoding

    PubMed Central

    Best, Katharine; Oakes, Theres; Heather, James M.; Shawe-Taylor, John; Chain, Benny

    2015-01-01

    The polymerase chain reaction (PCR) is one of the most widely used techniques in molecular biology. In combination with High Throughput Sequencing (HTS), PCR is widely used to quantify transcript abundance for RNA-seq, and in the context of analysis of T and B cell receptor repertoires. In this study, we combine DNA barcoding with HTS to quantify PCR output from individual target molecules. We develop computational tools that simulate both the PCR branching process itself, and the subsequent subsampling which typically occurs during HTS sequencing. We explore the influence of different types of heterogeneity on sequencing output, and compare them to experimental results where the efficiency of amplification is measured by barcodes uniquely identifying each molecule of starting template. Our results demonstrate that the PCR process introduces substantial amplification heterogeneity, independent of primer sequence and bulk experimental conditions. This heterogeneity can be attributed both to inherited differences between different template DNA molecules, and the inherent stochasticity of the PCR process. The results demonstrate that PCR heterogeneity arises even when reaction and substrate conditions are kept as constant as possible, and therefore single molecule barcoding is essential in order to derive reproducible quantitative results from any protocol combining PCR with HTS. PMID:26459131

  18. Elucidation of peptide-directed palladium surface structure for biologically tunable nanocatalysts.

    PubMed

    Bedford, Nicholas M; Ramezani-Dakhel, Hadi; Slocik, Joseph M; Briggs, Beverly D; Ren, Yang; Frenkel, Anatoly I; Petkov, Valeri; Heinz, Hendrik; Naik, Rajesh R; Knecht, Marc R

    2015-05-26

    Peptide-enabled synthesis of inorganic nanostructures represents an avenue to access catalytic materials with tunable and optimized properties. This is achieved via peptide complexity and programmability that is missing in traditional ligands for catalytic nanomaterials. Unfortunately, there is limited information available to correlate peptide sequence to particle structure and catalytic activity to date. As such, the application of peptide-enabled nanocatalysts remains limited to trial and error approaches. In this paper, a hybrid experimental and computational approach is introduced to systematically elucidate biomolecule-dependent structure/function relationships for peptide-capped Pd nanocatalysts. Synchrotron X-ray techniques were used to uncover substantial particle surface structural disorder, which was dependent upon the amino acid sequence of the peptide capping ligand. Nanocatalyst configurations were then determined directly from experimental data using reverse Monte Carlo methods and further refined using molecular dynamics simulation, obtaining thermodynamically stable peptide-Pd nanoparticle configurations. Sequence-dependent catalytic property differences for C-C coupling and olefin hydrogenation were then elucidated by identification of the catalytic active sites at the atomic level and quantitative prediction of relative reaction rates. This hybrid methodology provides a clear route to determine peptide-dependent structure/function relationships, enabling the generation of guidelines for catalyst design through rational tailoring of peptide sequences.

  19. Simulating Biological and Non-Biological Motion

    ERIC Educational Resources Information Center

    Bruzzo, Angela; Gesierich, Benno; Wohlschlager, Andreas

    2008-01-01

    It is widely accepted that the brain processes biological and non-biological movements in distinct neural circuits. Biological motion, in contrast to non-biological motion, refers to active movements of living beings. Aim of our experiment was to investigate the mechanisms underlying mental simulation of these two movement types. Subjects had to…

  20. Pyvolve: A Flexible Python Module for Simulating Sequences along Phylogenies.

    PubMed

    Spielman, Stephanie J; Wilke, Claus O

    2015-01-01

    We introduce Pyvolve, a flexible Python module for simulating genetic data along a phylogeny using continuous-time Markov models of sequence evolution. Easily incorporated into Python bioinformatics pipelines, Pyvolve can simulate sequences according to most standard models of nucleotide, amino-acid, and codon sequence evolution. All model parameters are fully customizable. Users can additionally specify custom evolutionary models, with custom rate matrices and/or states to evolve. This flexibility makes Pyvolve a convenient framework not only for simulating sequences under a wide variety of conditions, but also for developing and testing new evolutionary models. Pyvolve is an open-source project under a FreeBSD license, and it is available for download, along with a detailed user-manual and example scripts, from http://github.com/sjspielman/pyvolve.

  1. Studies and simulations of the DigiCipher system

    NASA Technical Reports Server (NTRS)

    Sayood, K.; Chen, Y. C.; Kipp, G.

    1993-01-01

    During this period the development of simulators for the various high definition television (HDTV) systems proposed to the FCC was continued. The FCC has indicated that it wants the various proposers to collaborate on a single system. Based on all available information this system will look very much like the advanced digital television (ADTV) system with major contributions only from the DigiCipher system. The results of our simulations of the DigiCipher system are described. This simulator was tested using test sequences from the MPEG committee. The results are extrapolated to HDTV video sequences. Once again, some caveats are in order. The sequences used for testing the simulator and generating the results are those used for testing the MPEG algorithm. The sequences are of much lower resolution than the HDTV sequences would be, and therefore the extrapolations are not totally accurate. One would expect to get significantly higher compression in terms of bits per pixel with sequences that are of higher resolution. However, the simulator itself is a valid one, and should HDTV sequences become available, they could be used directly with the simulator. A brief overview of the DigiCipher system is given. Some coding results obtained using the simulator are looked at. These results are compared to those obtained using the ADTV system. These results are evaluated in the context of the CCSDS specifications and make some suggestions as to how the DigiCipher system could be implemented in the NASA network. Simulations such as the ones reported can be biased depending on the particular source sequence used. In order to get more complete information about the system one needs to obtain a reasonable set of models which mirror the various kinds of sources encountered during video coding. A set of models which can be used to effectively model the various possible scenarios is provided. As this is somewhat tangential to the other work reported, the results are included as an appendix.

  2. Phylogenetic estimates of diversification rate are affected by molecular rate variation.

    PubMed

    Duchêne, D A; Hua, X; Bromham, L

    2017-10-01

    Molecular phylogenies are increasingly being used to investigate the patterns and mechanisms of macroevolution. In particular, node heights in a phylogeny can be used to detect changes in rates of diversification over time. Such analyses rest on the assumption that node heights in a phylogeny represent the timing of diversification events, which in turn rests on the assumption that evolutionary time can be accurately predicted from DNA sequence divergence. But there are many influences on the rate of molecular evolution, which might also influence node heights in molecular phylogenies, and thus affect estimates of diversification rate. In particular, a growing number of studies have revealed an association between the net diversification rate estimated from phylogenies and the rate of molecular evolution. Such an association might, by influencing the relative position of node heights, systematically bias estimates of diversification time. We simulated the evolution of DNA sequences under several scenarios where rates of diversification and molecular evolution vary through time, including models where diversification and molecular evolutionary rates are linked. We show that commonly used methods, including metric-based, likelihood and Bayesian approaches, can have a low power to identify changes in diversification rate when molecular substitution rates vary. Furthermore, the association between the rates of speciation and molecular evolution rate can cause the signature of a slowdown or speedup in speciation rates to be lost or misidentified. These results suggest that the multiple sources of variation in molecular evolutionary rates need to be considered when inferring macroevolutionary processes from phylogenies. © 2017 European Society For Evolutionary Biology. Journal of Evolutionary Biology © 2017 European Society For Evolutionary Biology.

  3. Goodness-of-fit tests and model diagnostics for negative binomial regression of RNA sequencing data.

    PubMed

    Mi, Gu; Di, Yanming; Schafer, Daniel W

    2015-01-01

    This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.

  4. Hierarchy of folding and unfolding events of protein G, CI2, and ACBP from explicit-solvent simulations

    NASA Astrophysics Data System (ADS)

    Camilloni, Carlo; Broglia, Ricardo A.; Tiana, Guido

    2011-01-01

    The study of the mechanism which is at the basis of the phenomenon of protein folding requires the knowledge of multiple folding trajectories under biological conditions. Using a biasing molecular-dynamics algorithm based on the physics of the ratchet-and-pawl system, we carry out all-atom, explicit solvent simulations of the sequence of folding events which proteins G, CI2, and ACBP undergo in evolving from the denatured to the folded state. Starting from highly disordered conformations, the algorithm allows the proteins to reach, at the price of a modest computational effort, nativelike conformations, within a root mean square deviation (RMSD) of approximately 1 Å. A scheme is developed to extract, from the myriad of events, information concerning the sequence of native contact formation and of their eventual correlation. Such an analysis indicates that all the studied proteins fold hierarchically, through pathways which, although not deterministic, are well-defined with respect to the order of contact formation. The algorithm also allows one to study unfolding, a process which looks, to a large extent, like the reverse of the major folding pathway. This is also true in situations in which many pathways contribute to the folding process, like in the case of protein G.

  5. Simulations of HIV Capsid Protein Dimerization Reveal the Effect of Chemistry and Topography on the Mechanism of Hydrophobic Protein Association

    PubMed Central

    Yu, Naiyin; Hagan, Michael F.

    2012-01-01

    Recent work has shown that the hydrophobic protein surfaces in aqueous solution sit near a drying transition. The tendency for these surfaces to expel water from their vicinity leads to self-assembly of macromolecular complexes. In this article, we show with a realistic model for a biologically pertinent system how this phenomenon appears at the molecular level. We focus on the association of the C-terminal domain (CA-C) of the human immunodeficiency virus capsid protein. By combining all-atom simulations with specialized sampling techniques, we measure the water density distribution during the approach of two CA-C proteins as a function of separation and amino acid sequence in the interfacial region. The simulations demonstrate that CA-C protein-protein interactions sit at the edge of a dewetting transition and that this mesoscopic manifestation of the underlying liquid-vapor phase transition can be readily manipulated by biology or protein engineering to significantly affect association behavior. Although the wild-type protein remains wet until contact, we identify a set of in silico mutations, in which three hydrophilic amino acids are replaced with nonpolar residues, that leads to dewetting before association. The existence of dewetting depends on the size and relative locations of substituted residues separated by nanometer length scales, indicating long-range cooperativity and a sensitivity to surface topography. These observations identify important details that are missing from descriptions of protein association based on buried hydrophobic surface area. PMID:22995509

  6. Analysis of plant microbe interactions in the era of next generation sequencing technologies

    PubMed Central

    Knief, Claudia

    2014-01-01

    Next generation sequencing (NGS) technologies have impressively accelerated research in biological science during the last years by enabling the production of large volumes of sequence data to a drastically lower price per base, compared to traditional sequencing methods. The recent and ongoing developments in the field allow addressing research questions in plant-microbe biology that were not conceivable just a few years ago. The present review provides an overview of NGS technologies and their usefulness for the analysis of microorganisms that live in association with plants. Possible limitations of the different sequencing systems, in particular sources of errors and bias, are critically discussed and methods are disclosed that help to overcome these shortcomings. A focus will be on the application of NGS methods in metagenomic studies, including the analysis of microbial communities by amplicon sequencing, which can be considered as a targeted metagenomic approach. Different applications of NGS technologies are exemplified by selected research articles that address the biology of the plant associated microbiota to demonstrate the worth of the new methods. PMID:24904612

  7. Cancer systems biology in the genome sequencing era: part 1, dissecting and modeling of tumor clones and their networks.

    PubMed

    Wang, Edwin; Zou, Jinfeng; Zaman, Naif; Beitel, Lenore K; Trifiro, Mark; Paliouras, Miltiadis

    2013-08-01

    Recent tumor genome sequencing confirmed that one tumor often consists of multiple cell subpopulations (clones) which bear different, but related, genetic profiles such as mutation and copy number variation profiles. Thus far, one tumor has been viewed as a whole entity in cancer functional studies. With the advances of genome sequencing and computational analysis, we are able to quantify and computationally dissect clones from tumors, and then conduct clone-based analysis. Emerging technologies such as single-cell genome sequencing and RNA-Seq could profile tumor clones. Thus, we should reconsider how to conduct cancer systems biology studies in the genome sequencing era. We will outline new directions for conducting cancer systems biology by considering that genome sequencing technology can be used for dissecting, quantifying and genetically characterizing clones from tumors. Topics discussed in Part 1 of this review include computationally quantifying of tumor subpopulations; clone-based network modeling, cancer hallmark-based networks and their high-order rewiring principles and the principles of cell survival networks of fast-growing clones. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.

  8. Shot sequencing based on biological equivalent dose considerations for multiple isocenter Gamma Knife radiosurgery.

    PubMed

    Ma, Lijun; Lee, Letitia; Barani, Igor; Hwang, Andrew; Fogh, Shannon; Nakamura, Jean; McDermott, Michael; Sneed, Penny; Larson, David A; Sahgal, Arjun

    2011-11-21

    Rapid delivery of multiple shots or isocenters is one of the hallmarks of Gamma Knife radiosurgery. In this study, we investigated whether the temporal order of shots delivered with Gamma Knife Perfexion would significantly influence the biological equivalent dose for complex multi-isocenter treatments. Twenty single-target cases were selected for analysis. For each case, 3D dose matrices of individual shots were extracted and single-fraction equivalent uniform dose (sEUD) values were determined for all possible shot delivery sequences, corresponding to different patterns of temporal dose delivery within the target. We found significant variations in the sEUD values among these sequences exceeding 15% for certain cases. However, the sequences for the actual treatment delivery were found to agree (<3%) and to correlate (R² = 0.98) excellently with the sequences yielding the maximum sEUD values for all studied cases. This result is applicable for both fast and slow growing tumors with α/β values of 2 to 20 according to the linear-quadratic model. In conclusion, despite large potential variations in different shot sequences for multi-isocenter Gamma Knife treatments, current clinical delivery sequences exhibited consistent biological target dosing that approached that maximally achievable for all studied cases.

  9. Function-Based Algorithms for Biological Sequences

    ERIC Educational Resources Information Center

    Mohanty, Pragyan Sheela P.

    2015-01-01

    Two problems at two different abstraction levels of computational biology are studied. At the molecular level, efficient pattern matching algorithms in DNA sequences are presented. For gene order data, an efficient data structure is presented capable of storing all gene re-orderings in a systematic manner. A common characteristic of presented…

  10. Finding the missing honey bee genes: lessons learned from a genome upgrade

    USDA-ARS?s Scientific Manuscript database

    The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. ...

  11. Review of the fundamental theories behind small angle X-ray scattering, molecular dynamics simulations, and relevant integrated application

    PubMed Central

    Boldon, Lauren; Laliberte, Fallon; Liu, Li

    2015-01-01

    In this paper, the fundamental concepts and equations necessary for performing small angle X-ray scattering (SAXS) experiments, molecular dynamics (MD) simulations, and MD-SAXS analyses were reviewed. Furthermore, several key biological and non-biological applications for SAXS, MD, and MD-SAXS are presented in this review; however, this article does not cover all possible applications. SAXS is an experimental technique used for the analysis of a wide variety of biological and non-biological structures. SAXS utilizes spherical averaging to produce one- or two-dimensional intensity profiles, from which structural data may be extracted. MD simulation is a computer simulation technique that is used to model complex biological and non-biological systems at the atomic level. MD simulations apply classical Newtonian mechanics’ equations of motion to perform force calculations and to predict the theoretical physical properties of the system. This review presents several applications that highlight the ability of both SAXS and MD to study protein folding and function in addition to non-biological applications, such as the study of mechanical, electrical, and structural properties of non-biological nanoparticles. Lastly, the potential benefits of combining SAXS and MD simulations for the study of both biological and non-biological systems are demonstrated through the presentation of several examples that combine the two techniques. PMID:25721341

  12. NullSeq: A Tool for Generating Random Coding Sequences with Desired Amino Acid and GC Contents.

    PubMed

    Liu, Sophia S; Hockenberry, Adam J; Lancichinetti, Andrea; Jewett, Michael C; Amaral, Luís A N

    2016-11-01

    The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. In order to accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. While many tools have been developed to create random nucleotide sequences, protein coding sequences are subject to a unique set of constraints that complicates the process of generating appropriate null models. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content for the purpose of hypothesis testing. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content, which we have developed into a python package. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. Furthermore, this approach can easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes as well as more effective engineering of biological systems.

  13. The Effects of 3D Computer Simulation on Biology Students' Achievement and Memory Retention

    ERIC Educational Resources Information Center

    Elangovan, Tavasuria; Ismail, Zurida

    2014-01-01

    A quasi experimental study was conducted for six weeks to determine the effectiveness of two different 3D computer simulation based teaching methods, that is, realistic simulation and non-realistic simulation on Form Four Biology students' achievement and memory retention in Perak, Malaysia. A sample of 136 Form Four Biology students in Perak,…

  14. RAMICS: trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA

    PubMed Central

    Wright, Imogen A.; Travers, Simon A.

    2014-01-01

    The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. PMID:24861618

  15. Inference, simulation, modeling, and analysis of complex networks, with special emphasis on complex networks in systems biology

    NASA Astrophysics Data System (ADS)

    Christensen, Claire Petra

    Across diverse fields ranging from physics to biology, sociology, and economics, the technological advances of the past decade have engendered an unprecedented explosion of data on highly complex systems with thousands, if not millions of interacting components. These systems exist at many scales of size and complexity, and it is becoming ever-more apparent that they are, in fact, universal, arising in every field of study. Moreover, they share fundamental properties---chief among these, that the individual interactions of their constituent parts may be well-understood, but the characteristic behaviour produced by the confluence of these interactions---by these complex networks---is unpredictable; in a nutshell, the whole is more than the sum of its parts. There is, perhaps, no better illustration of this concept than the discoveries being made regarding complex networks in the biological sciences. In particular, though the sequencing of the human genome in 2003 was a remarkable feat, scientists understand that the "cellular-level blueprints" for the human being are cellular-level parts lists, but they say nothing (explicitly) about cellular-level processes. The challenge of modern molecular biology is to understand these processes in terms of the networks of parts---in terms of the interactions among proteins, enzymes, genes, and metabolites---as it is these processes that ultimately differentiate animate from inanimate, giving rise to life! It is the goal of systems biology---an umbrella field encapsulating everything from molecular biology to epidemiology in social systems---to understand processes in terms of fundamental networks of core biological parts, be they proteins or people. By virtue of the fact that there are literally countless complex systems, not to mention tools and techniques used to infer, simulate, analyze, and model these systems, it is impossible to give a truly comprehensive account of the history and study of complex systems. The author's own publications have contributed network inference, simulation, modeling, and analysis methods to the much larger body of work in systems biology, and indeed, in network science. The aim of this thesis is therefore twofold: to present this original work in the historical context of network science, but also to provide sufficient review and reference regarding complex systems (with an emphasis on complex networks in systems biology) and tools and techniques for their inference, simulation, analysis, and modeling, such that the reader will be comfortable in seeking out further information on the subject. The review-like Chapters 1, 2, and 4 are intended to convey the co-evolution of network science and the slow but noticeable breakdown of boundaries between disciplines in academia as research and comparison of diverse systems has brought to light the shared properties of these systems. It is the author's hope that theses chapters impart some sense of the remarkable and rapid progress in complex systems research that has led to this unprecedented academic synergy. Chapters 3 and 5 detail the author's original work in the context of complex systems research. Chapter 3 presents the methods and results of a two-stage modeling process that generates candidate gene-regulatory networks of the bacterium B.subtilis from experimentally obtained, yet mathematically underdetermined microchip array data. These networks are then analyzed from a graph theoretical perspective, and their biological viability is critiqued by comparing the networks' graph theoretical properties to those of other biological systems. The results of topological perturbation analyses revealing commonalities in behavior at multiple levels of complexity are also presented, and are shown to be an invaluable means by which to ascertain the level of complexity to which the network inference process is robust to noise. Chapter 5 outlines a learning algorithm for the development of a realistic, evolving social network (a city) into which a disease is introduced. The results of simulations in populations spanning two orders of magnitude are compared to prevaccine era measles data for England and Wales and demonstrate that the simulations are able to capture the quantitative and qualitative features of epidemics in populations as small as 10,000 people. The work presented in Chapter 5 validates the utility of network simulation in concurrently probing contact network dynamics and disease dynamics.

  16. Simulation studies of DNA at the nanoscale: Interactions with proteins, polycations, and surfaces

    NASA Astrophysics Data System (ADS)

    Elder, Robert M.

    Understanding the nanoscale interactions of DNA, a multifunctional biopolymer with sequence-dependent properties, with other biological and synthetic substrates and molecules is essential to advancing these technologies. This doctoral thesis research is aimed at understanding the thermodynamics and molecular-level structure when DNA interacts with proteins, polycations, and functionalized surfaces. First, we investigate the ability of a DNA damage recognition protein (HMGB1a) to bind to anti-cancer drug-induced DNA damage, seeking to explain how HMGB1a differentiates between the drugs in vivo. Using atomistic molecular dynamics simulations, we show that the structure of the drug-DNA molecule exhibits drug- and base sequence-dependence that explains some of the experimentally observed differential recognition of the drugs in various sequence contexts. Then, we show how steric hindrance from the drug decreases the deformability of the drug-DNA molecule, which decreases recognition by the protein, a concept that can be applied to rational drug design. Second, we study how polycation architecture and chemistry affect polycation-DNA binding so as to design optimal polycations for high efficiency gene (DNA) delivery. Using a multiscale computational approach involving atomistic and coarse-grained simulations, we examine how rearranging polylysine from a linear to a grafted architecture, and several aspects of the grafted architecture, affect polycation-DNA binding and the structure of polycation-DNA complexes. Next, going beyond lysine we examine how oligopeptide chemistry and sequence in the grafted architecture affects polycation-DNA binding and find that strategic placement of hydrophobic peptides might be used to tailor binding strength. Third, we study the adsorption and conformations of single-stranded DNA (an amphiphilic biopolymer) on model hydrophilic and hydrophobic surfaces. Short ssDNA oligomers adsorb to both surfaces with similar strength, with the strength of adsorption to the hydrophobic surface depending on the composition of the DNA strands, i.e. purine or pyrimidine bases. Additionally, DNA-surface and DNA-water interactions near the surfaces govern the adsorption. For longer ssDNA oligomers, the effects of surface chemistry and temperature on ssDNA conformations are rather small, but either the hydrophilic surface or increased temperature favor slightly more compact conformations due to energetic and entropic effects, respectively.

  17. Metagenomic ventures into outer sequence space.

    PubMed

    Dutilh, Bas E

    Sequencing DNA or RNA directly from the environment often results in many sequencing reads that have no homologs in the database. These are referred to as "unknowns," and reflect the vast unexplored microbial sequence space of our biosphere, also known as "biological dark matter." However, unknowns also exist because metagenomic datasets are not optimally mined. There is a pressure on researchers to publish and move on, and the unknown sequences are often left for what they are, and conclusions drawn based on reads with annotated homologs. This can cause abundant and widespread genomes to be overlooked, such as the recently discovered human gut bacteriophage crAssphage. The unknowns may be enriched for bacteriophage sequences, the most abundant and genetically diverse component of the biosphere and of sequence space. However, it remains an open question, what is the actual size of biological sequence space? The de novo assembly of shotgun metagenomes is the most powerful tool to address this question.

  18. Robust k-mer frequency estimation using gapped k-mers

    PubMed Central

    Ghandi, Mahmoud; Mohammad-Noori, Morteza

    2013-01-01

    Oligomers of fixed length, k, commonly known as k-mers, are often used as fundamental elements in the description of DNA sequence features of diverse biological function, or as intermediate elements in the constuction of more complex descriptors of sequence features such as position weight matrices. k-mers are very useful as general sequence features because they constitute a complete and unbiased feature set, and do not require parameterization based on incomplete knowledge of biological mechanisms. However, a fundamental limitation in the use of k-mers as sequence features is that as k is increased, larger spatial correlations in DNA sequence elements can be described, but the frequency of observing any specific k-mer becomes very small, and rapidly approaches a sparse matrix of binary counts. Thus any statistical learning approach using k-mers will be susceptible to noisy estimation of k-mer frequencies once k becomes large. Because all molecular DNA interactions have limited spatial extent, gapped k-mers often carry the relevant biological signal. Here we use gapped k-mer counts to more robustly estimate the ungapped k-mer frequencies, by deriving an equation for the minimum norm estimate of k-mer frequencies given an observed set of gapped k-mer frequencies. We demonstrate that this approach provides a more accurate estimate of the k-mer frequencies in real biological sequences using a sample of CTCF binding sites in the human genome. PMID:23861010

  19. Robust k-mer frequency estimation using gapped k-mers.

    PubMed

    Ghandi, Mahmoud; Mohammad-Noori, Morteza; Beer, Michael A

    2014-08-01

    Oligomers of fixed length, k, commonly known as k-mers, are often used as fundamental elements in the description of DNA sequence features of diverse biological function, or as intermediate elements in the constuction of more complex descriptors of sequence features such as position weight matrices. k-mers are very useful as general sequence features because they constitute a complete and unbiased feature set, and do not require parameterization based on incomplete knowledge of biological mechanisms. However, a fundamental limitation in the use of k-mers as sequence features is that as k is increased, larger spatial correlations in DNA sequence elements can be described, but the frequency of observing any specific k-mer becomes very small, and rapidly approaches a sparse matrix of binary counts. Thus any statistical learning approach using k-mers will be susceptible to noisy estimation of k-mer frequencies once k becomes large. Because all molecular DNA interactions have limited spatial extent, gapped k-mers often carry the relevant biological signal. Here we use gapped k-mer counts to more robustly estimate the ungapped k-mer frequencies, by deriving an equation for the minimum norm estimate of k-mer frequencies given an observed set of gapped k-mer frequencies. We demonstrate that this approach provides a more accurate estimate of the k-mer frequencies in real biological sequences using a sample of CTCF binding sites in the human genome.

  20. DNA capture elements for rapid detection and identification of biological agents

    NASA Astrophysics Data System (ADS)

    Kiel, Johnathan L.; Parker, Jill E.; Holwitt, Eric A.; Vivekananda, Jeeva

    2004-08-01

    DNA capture elements (DCEs; aptamers) are artificial DNA sequences, from a random pool of sequences, selected for their specific binding to potential biological warfare agents. These sequences were selected by an affinity method using filters to which the target agent was attached and the DNA isolated and amplified by polymerase chain reaction (PCR) in an iterative, increasingly stringent, process. Reporter molecules were attached to the finished sequences. To date, we have made DCEs to Bacillus anthracis spores, Shiga toxin, Venezuelan Equine Encephalitis (VEE) virus, and Francisella tularensis. These DCEs have demonstrated specificity and sensitivity equal to or better than antibody.

  1. Dynamic Biological Functioning Important for Simulating and Stabilizing Ocean Biogeochemistry

    NASA Astrophysics Data System (ADS)

    Buchanan, P. J.; Matear, R. J.; Chase, Z.; Phipps, S. J.; Bindoff, N. L.

    2018-04-01

    The biogeochemistry of the ocean exerts a strong influence on the climate by modulating atmospheric greenhouse gases. In turn, ocean biogeochemistry depends on numerous physical and biological processes that change over space and time. Accurately simulating these processes is fundamental for accurately simulating the ocean's role within the climate. However, our simulation of these processes is often simplistic, despite a growing understanding of underlying biological dynamics. Here we explore how new parameterizations of biological processes affect simulated biogeochemical properties in a global ocean model. We combine 6 different physical realizations with 6 different biogeochemical parameterizations (36 unique ocean states). The biogeochemical parameterizations, all previously published, aim to more accurately represent the response of ocean biology to changing physical conditions. We make three major findings. First, oxygen, carbon, alkalinity, and phosphate fields are more sensitive to changes in the ocean's physical state. Only nitrate is more sensitive to changes in biological processes, and we suggest that assessment protocols for ocean biogeochemical models formally include the marine nitrogen cycle to assess their performance. Second, we show that dynamic variations in the production, remineralization, and stoichiometry of organic matter in response to changing environmental conditions benefit the simulation of ocean biogeochemistry. Third, dynamic biological functioning reduces the sensitivity of biogeochemical properties to physical change. Carbon and nitrogen inventories were 50% and 20% less sensitive to physical changes, respectively, in simulations that incorporated dynamic biological functioning. These results highlight the importance of a dynamic biology for ocean properties and climate.

  2. Modular protein domains: an engineering approach toward functional biomaterials.

    PubMed

    Lin, Charng-Yu; Liu, Julie C

    2016-08-01

    Protein domains and peptide sequences are a powerful tool for conferring specific functions to engineered biomaterials. Protein sequences with a wide variety of functionalities, including structure, bioactivity, protein-protein interactions, and stimuli responsiveness, have been identified, and advances in molecular biology continue to pinpoint new sequences. Protein domains can be combined to make recombinant proteins with multiple functionalities. The high fidelity of the protein translation machinery results in exquisite control over the sequence of recombinant proteins and the resulting properties of protein-based materials. In this review, we discuss protein domains and peptide sequences in the context of functional protein-based materials, composite materials, and their biological applications. Copyright © 2016 Elsevier Ltd. All rights reserved.

  3. Bayesian Nonparametric Ordination for the Analysis of Microbial Communities.

    PubMed

    Ren, Boyu; Bacallado, Sergio; Favaro, Stefano; Holmes, Susan; Trippa, Lorenzo

    2017-01-01

    Human microbiome studies use sequencing technologies to measure the abundance of bacterial species or Operational Taxonomic Units (OTUs) in samples of biological material. Typically the data are organized in contingency tables with OTU counts across heterogeneous biological samples. In the microbial ecology community, ordination methods are frequently used to investigate latent factors or clusters that capture and describe variations of OTU counts across biological samples. It remains important to evaluate how uncertainty in estimates of each biological sample's microbial distribution propagates to ordination analyses, including visualization of clusters and projections of biological samples on low dimensional spaces. We propose a Bayesian analysis for dependent distributions to endow frequently used ordinations with estimates of uncertainty. A Bayesian nonparametric prior for dependent normalized random measures is constructed, which is marginally equivalent to the normalized generalized Gamma process, a well-known prior for nonparametric analyses. In our prior, the dependence and similarity between microbial distributions is represented by latent factors that concentrate in a low dimensional space. We use a shrinkage prior to tune the dimensionality of the latent factors. The resulting posterior samples of model parameters can be used to evaluate uncertainty in analyses routinely applied in microbiome studies. Specifically, by combining them with multivariate data analysis techniques we can visualize credible regions in ecological ordination plots. The characteristics of the proposed model are illustrated through a simulation study and applications in two microbiome datasets.

  4. Incorporating modeling and simulations in undergraduate biophysical chemistry course to promote understanding of structure-dynamics-function relationships in proteins.

    PubMed

    Hati, Sanchita; Bhattacharyya, Sudeep

    2016-01-01

    A project-based biophysical chemistry laboratory course, which is offered to the biochemistry and molecular biology majors in their senior year, is described. In this course, the classroom study of the structure-function of biomolecules is integrated with the discovery-guided laboratory study of these molecules using computer modeling and simulations. In particular, modern computational tools are employed to elucidate the relationship between structure, dynamics, and function in proteins. Computer-based laboratory protocols that we introduced in three modules allow students to visualize the secondary, super-secondary, and tertiary structures of proteins, analyze non-covalent interactions in protein-ligand complexes, develop three-dimensional structural models (homology model) for new protein sequences and evaluate their structural qualities, and study proteins' intrinsic dynamics to understand their functions. In the fourth module, students are assigned to an authentic research problem, where they apply their laboratory skills (acquired in modules 1-3) to answer conceptual biophysical questions. Through this process, students gain in-depth understanding of protein dynamics-the missing link between structure and function. Additionally, the requirement of term papers sharpens students' writing and communication skills. Finally, these projects result in new findings that are communicated in peer-reviewed journals. © 2016 The International Union of Biochemistry and Molecular Biology.

  5. Software for rapid time dependent ChIP-sequencing analysis (TDCA).

    PubMed

    Myschyshyn, Mike; Farren-Dai, Marco; Chuang, Tien-Jui; Vocadlo, David

    2017-11-25

    Chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) and associated methods are widely used to define the genome wide distribution of chromatin associated proteins, post-translational epigenetic marks, and modifications found on DNA bases. An area of emerging interest is to study time dependent changes in the distribution of such proteins and marks by using serial ChIP-seq experiments performed in a time resolved manner. Despite such time resolved studies becoming increasingly common, software to facilitate analysis of such data in a robust automated manner is limited. We have designed software called Time-Dependent ChIP-Sequencing Analyser (TDCA), which is the first program to automate analysis of time-dependent ChIP-seq data by fitting to sigmoidal curves. We provide users with guidance for experimental design of TDCA for modeling of time course (TC) ChIP-seq data using two simulated data sets. Furthermore, we demonstrate that this fitting strategy is widely applicable by showing that automated analysis of three previously published TC data sets accurately recapitulates key findings reported in these studies. Using each of these data sets, we highlight how biologically relevant findings can be readily obtained by exploiting TDCA to yield intuitive parameters that describe behavior at either a single locus or sets of loci. TDCA enables customizable analysis of user input aligned DNA sequencing data, coupled with graphical outputs in the form of publication-ready figures that describe behavior at either individual loci or sets of loci sharing common traits defined by the user. TDCA accepts sequencing data as standard binary alignment map (BAM) files and loci of interest in browser extensible data (BED) file format. TDCA accurately models the number of sequencing reads, or coverage, at loci from TC ChIP-seq studies or conceptually related TC sequencing experiments. TC experiments are reduced to intuitive parametric values that facilitate biologically relevant data analysis, and the uncovering of variations in the time-dependent behavior of chromatin. TDCA automates the analysis of TC ChIP-seq experiments, permitting researchers to easily obtain raw and modeled data for specific loci or groups of loci with similar behavior while also enhancing consistency of data analysis of TC data within the genomics field.

  6. A better sequence-read simulator program for metagenomics.

    PubMed

    Johnson, Stephen; Trost, Brett; Long, Jeffrey R; Pittet, Vanessa; Kusalik, Anthony

    2014-01-01

    There are many programs available for generating simulated whole-genome shotgun sequence reads. The data generated by many of these programs follow predefined models, which limits their use to the authors' original intentions. For example, many models assume that read lengths follow a uniform or normal distribution. Other programs generate models from actual sequencing data, but are limited to reads from single-genome studies. To our knowledge, there are no programs that allow a user to generate simulated data following non-parametric read-length distributions and quality profiles based on empirically-derived information from metagenomics sequencing data. We present BEAR (Better Emulation for Artificial Reads), a program that uses a machine-learning approach to generate reads with lengths and quality values that closely match empirically-derived distributions. BEAR can emulate reads from various sequencing platforms, including Illumina, 454, and Ion Torrent. BEAR requires minimal user input, as it automatically determines appropriate parameter settings from user-supplied data. BEAR also uses a unique method for deriving run-specific error rates, and extracts useful statistics from the metagenomic data itself, such as quality-error models. Many existing simulators are specific to a particular sequencing technology; however, BEAR is not restricted in this way. Because of its flexibility, BEAR is particularly useful for emulating the behaviour of technologies like Ion Torrent, for which no dedicated sequencing simulators are currently available. BEAR is also the first metagenomic sequencing simulator program that automates the process of generating abundances, which can be an arduous task. BEAR is useful for evaluating data processing tools in genomics. It has many advantages over existing comparable software, such as generating more realistic reads and being independent of sequencing technology, and has features particularly useful for metagenomics work.

  7. Modeling the evolution of regulatory elements by simultaneous detection and alignment with phylogenetic pair HMMs.

    PubMed

    Majoros, William H; Ohler, Uwe

    2010-12-16

    The computational detection of regulatory elements in DNA is a difficult but important problem impacting our progress in understanding the complex nature of eukaryotic gene regulation. Attempts to utilize cross-species conservation for this task have been hampered both by evolutionary changes of functional sites and poor performance of general-purpose alignment programs when applied to non-coding sequence. We describe a new and flexible framework for modeling binding site evolution in multiple related genomes, based on phylogenetic pair hidden Markov models which explicitly model the gain and loss of binding sites along a phylogeny. We demonstrate the value of this framework for both the alignment of regulatory regions and the inference of precise binding-site locations within those regions. As the underlying formalism is a stochastic, generative model, it can also be used to simulate the evolution of regulatory elements. Our implementation is scalable in terms of numbers of species and sequence lengths and can produce alignments and binding-site predictions with accuracy rivaling or exceeding current systems that specialize in only alignment or only binding-site prediction. We demonstrate the validity and power of various model components on extensive simulations of realistic sequence data and apply a specific model to study Drosophila enhancers in as many as ten related genomes and in the presence of gain and loss of binding sites. Different models and modeling assumptions can be easily specified, thus providing an invaluable tool for the exploration of biological hypotheses that can drive improvements in our understanding of the mechanisms and evolution of gene regulation.

  8. Integration of multi-omics data for integrative gene regulatory network inference.

    PubMed

    Zarayeneh, Neda; Ko, Euiseong; Oh, Jung Hun; Suh, Sang; Liu, Chunyu; Gao, Jean; Kim, Donghyun; Kang, Mingon

    2017-01-01

    Gene regulatory networks provide comprehensive insights and indepth understanding of complex biological processes. The molecular interactions of gene regulatory networks are inferred from a single type of genomic data, e.g., gene expression data in most research. However, gene expression is a product of sequential interactions of multiple biological processes, such as DNA sequence variations, copy number variations, histone modifications, transcription factors, and DNA methylations. The recent rapid advances of high-throughput omics technologies enable one to measure multiple types of omics data, called 'multi-omics data', that represent the various biological processes. In this paper, we propose an Integrative Gene Regulatory Network inference method (iGRN) that incorporates multi-omics data and their interactions in gene regulatory networks. In addition to gene expressions, copy number variations and DNA methylations were considered for multi-omics data in this paper. The intensive experiments were carried out with simulation data, where iGRN's capability that infers the integrative gene regulatory network is assessed. Through the experiments, iGRN shows its better performance on model representation and interpretation than other integrative methods in gene regulatory network inference. iGRN was also applied to a human brain dataset of psychiatric disorders, and the biological network of psychiatric disorders was analysed.

  9. Integration of multi-omics data for integrative gene regulatory network inference

    PubMed Central

    Zarayeneh, Neda; Ko, Euiseong; Oh, Jung Hun; Suh, Sang; Liu, Chunyu; Gao, Jean; Kim, Donghyun

    2017-01-01

    Gene regulatory networks provide comprehensive insights and indepth understanding of complex biological processes. The molecular interactions of gene regulatory networks are inferred from a single type of genomic data, e.g., gene expression data in most research. However, gene expression is a product of sequential interactions of multiple biological processes, such as DNA sequence variations, copy number variations, histone modifications, transcription factors, and DNA methylations. The recent rapid advances of high-throughput omics technologies enable one to measure multiple types of omics data, called ‘multi-omics data’, that represent the various biological processes. In this paper, we propose an Integrative Gene Regulatory Network inference method (iGRN) that incorporates multi-omics data and their interactions in gene regulatory networks. In addition to gene expressions, copy number variations and DNA methylations were considered for multi-omics data in this paper. The intensive experiments were carried out with simulation data, where iGRN’s capability that infers the integrative gene regulatory network is assessed. Through the experiments, iGRN shows its better performance on model representation and interpretation than other integrative methods in gene regulatory network inference. iGRN was also applied to a human brain dataset of psychiatric disorders, and the biological network of psychiatric disorders was analysed. PMID:29354189

  10. From genes to proteins to behavior: a laboratory project that enhances student understanding in cell and molecular biology.

    PubMed

    Aronson, Benjamin D; Silveira, Linda A

    2009-01-01

    In the laboratory, students can actively explore concepts and experience the nature of scientific research. We have devised a 5-wk laboratory project in our introductory college biology course whose aim was to improve understanding in five major concepts that are central to basic cellular, molecular biology, and genetics while teaching molecular biology techniques. The project was focused on the production of adenine in Saccharomyces cerevisiae and investigated the nature of mutant red colonies of this yeast. Students created red mutants from a wild-type strain, amplified the two genes capable of giving rise to the red phenotype, and then analyzed the nucleotide sequences. A quiz assessing student understanding in the five areas was given at the start and the end of the course. Analysis of the quiz showed significant improvement in each of the areas. These areas were taught in the laboratory and the classroom; therefore, students were surveyed to determine whether the laboratory played a role in their improved understanding of the five areas. Student survey data demonstrated that the laboratory did have an important role in their learning of the concepts. This project simulated steps in a research project and could be adapted for an advanced course in genetics.

  11. Consistent design schematics for biological systems: standardization of representation in biological engineering

    PubMed Central

    Matsuoka, Yukiko; Ghosh, Samik; Kitano, Hiroaki

    2009-01-01

    The discovery by design paradigm driving research in synthetic biology entails the engineering of de novo biological constructs with well-characterized input–output behaviours and interfaces. The construction of biological circuits requires iterative phases of design, simulation and assembly, leading to the fabrication of a biological device. In order to represent engineered models in a consistent visual format and further simulating them in silico, standardization of representation and model formalism is imperative. In this article, we review different efforts for standardization, particularly standards for graphical visualization and simulation/annotation schemata adopted in systems biology. We identify the importance of integrating the different standardization efforts and provide insights into potential avenues for developing a common framework for model visualization, simulation and sharing across various tools. We envision that such a synergistic approach would lead to the development of global, standardized schemata in biology, empowering deeper understanding of molecular mechanisms as well as engineering of novel biological systems. PMID:19493898

  12. Combinatorial Pooling Enables Selective Sequencing of the Barley Gene Space

    PubMed Central

    Lonardi, Stefano; Duma, Denisa; Alpert, Matthew; Cordero, Francesca; Beccuti, Marco; Bhat, Prasanna R.; Wu, Yonghui; Ciardo, Gianfranco; Alsaihati, Burair; Ma, Yaqin; Wanamaker, Steve; Resnik, Josh; Bozdag, Serdar; Luo, Ming-Cheng; Close, Timothy J.

    2013-01-01

    For the vast majority of species – including many economically or ecologically important organisms, progress in biological research is hampered due to the lack of a reference genome sequence. Despite recent advances in sequencing technologies, several factors still limit the availability of such a critical resource. At the same time, many research groups and international consortia have already produced BAC libraries and physical maps and now are in a position to proceed with the development of whole-genome sequences organized around a physical map anchored to a genetic map. We propose a BAC-by-BAC sequencing protocol that combines combinatorial pooling design and second-generation sequencing technology to efficiently approach denovo selective genome sequencing. We show that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when preparing sequencing libraries for hundreds or thousands of DNA samples, such as in this case gene-bearing minimum-tiling-path BAC clones. The novelty of the protocol hinges on the computational ability to efficiently compare hundred millions of short reads and assign them to the correct BAC clones (deconvolution) so that the assembly can be carried out clone-by-clone. Experimental results on simulated data for the rice genome show that the deconvolution is very accurate, and the resulting BAC assemblies have high quality. Results on real data for a gene-rich subset of the barley genome confirm that the deconvolution is accurate and the BAC assemblies have good quality. While our method cannot provide the level of completeness that one would achieve with a comprehensive whole-genome sequencing project, we show that it is quite successful in reconstructing the gene sequences within BACs. In the case of plants such as barley, this level of sequence knowledge is sufficient to support critical end-point objectives such as map-based cloning and marker-assisted breeding. PMID:23592960

  13. Combinatorial pooling enables selective sequencing of the barley gene space.

    PubMed

    Lonardi, Stefano; Duma, Denisa; Alpert, Matthew; Cordero, Francesca; Beccuti, Marco; Bhat, Prasanna R; Wu, Yonghui; Ciardo, Gianfranco; Alsaihati, Burair; Ma, Yaqin; Wanamaker, Steve; Resnik, Josh; Bozdag, Serdar; Luo, Ming-Cheng; Close, Timothy J

    2013-04-01

    For the vast majority of species - including many economically or ecologically important organisms, progress in biological research is hampered due to the lack of a reference genome sequence. Despite recent advances in sequencing technologies, several factors still limit the availability of such a critical resource. At the same time, many research groups and international consortia have already produced BAC libraries and physical maps and now are in a position to proceed with the development of whole-genome sequences organized around a physical map anchored to a genetic map. We propose a BAC-by-BAC sequencing protocol that combines combinatorial pooling design and second-generation sequencing technology to efficiently approach denovo selective genome sequencing. We show that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when preparing sequencing libraries for hundreds or thousands of DNA samples, such as in this case gene-bearing minimum-tiling-path BAC clones. The novelty of the protocol hinges on the computational ability to efficiently compare hundred millions of short reads and assign them to the correct BAC clones (deconvolution) so that the assembly can be carried out clone-by-clone. Experimental results on simulated data for the rice genome show that the deconvolution is very accurate, and the resulting BAC assemblies have high quality. Results on real data for a gene-rich subset of the barley genome confirm that the deconvolution is accurate and the BAC assemblies have good quality. While our method cannot provide the level of completeness that one would achieve with a comprehensive whole-genome sequencing project, we show that it is quite successful in reconstructing the gene sequences within BACs. In the case of plants such as barley, this level of sequence knowledge is sufficient to support critical end-point objectives such as map-based cloning and marker-assisted breeding.

  14. A Probabilistic Framework for Constructing Temporal Relations in Replica Exchange Molecular Trajectories.

    PubMed

    Chattopadhyay, Aditya; Zheng, Min; Waller, Mark Paul; Priyakumar, U Deva

    2018-05-23

    Knowledge of the structure and dynamics of biomolecules is essential for elucidating the underlying mechanisms of biological processes. Given the stochastic nature of many biological processes, like protein unfolding, it's almost impossible that two independent simulations will generate the exact same sequence of events, which makes direct analysis of simulations difficult. Statistical models like Markov Chains, transition networks etc. help in shedding some light on the mechanistic nature of such processes by predicting long-time dynamics of these systems from short simulations. However, such methods fall short in analyzing trajectories with partial or no temporal information, for example, replica exchange molecular dynamics or Monte Carlo simulations. In this work we propose a probabilistic algorithm, borrowing concepts from graph theory and machine learning, to extract reactive pathways from molecular trajectories in the absence of temporal data. A suitable vector representation was chosen to represent each frame in the macromolecular trajectory (as a series of interaction and conformational energies) and dimensionality reduction was performed using principal component analysis (PCA). The trajectory was then clustered using a density-based clustering algorithm, where each cluster represents a metastable state on the potential energy surface (PES) of the biomolecule under study. A graph was created with these clusters as nodes with the edges learnt using an iterative expectation maximization algorithm. The most reactive path is conceived as the widest path along this graph. We have tested our method on RNA hairpin unfolding trajectory in aqueous urea solution. Our method makes the understanding of the mechanism of unfolding in RNA hairpin molecule more tractable. As this method doesn't rely on temporal data it can be used to analyze trajectories from Monte Carlo sampling techniques and replica exchange molecular dynamics (REMD).

  15. Folding and self-assembly of polypeptides: Dynamics and thermodynamics from molecular simulation

    NASA Astrophysics Data System (ADS)

    Fluitt, Aaron Michael

    Empowered by their exquisite three-dimensional structures, or "folds," proteins carry out biological tasks with high specificity, efficiency, and fidelity. The fold that optimizes biological function represents a stable configuration of the constituent polypeptide molecule(s) under physiological conditions. Proteins and polypeptides are not static, however: battered by thermal motion, they explore a distribution of folds that is determined by the sequence of amino acids, the presence and identity of other molecules, and the thermodynamic conditions. In this dissertation, we apply molecular simulation techniques to the study of two polypeptides that have unusually diffuse distributions of folds under physiological conditions: polyglutamine (polyQ) and islet amyloid polypeptide (IAPP). Neither polyQ nor IAPP adopts a predominant fold in dilute aqueous solution, but at sufficient concentrations, both are prone to self-assemble into stable, periodic, and highly regular aggregate structures known as amyloid. The appearance of amyloid deposits of polyQ in the brain, and of IAPP in the pancreas, are associated with Huntington's disease and type 2 diabetes, respectively. A molecular view of the mechanism(s) by which polyQ and IAPP fold and self-assemble will enhance our understanding of disease pathogenesis, and it has the potential to accelerate the development of therapeutics that target early-stage aggregates. Using molecular simulations with spatial and temporal resolution on the atomic scale, we present analyses of the structural distributions of polyQ and IAPP under various conditions, both in and out of equilibrium. In particular, we examine amyloid fibers of polyQ, the IAPP dimer in solution, and single IAPP fragments at a lipid bilayer. We also benchmark the molecular models, or "force fields," available for such studies, and we introduce a novel simulation algorithm.

  16. Teaching Biology around Themes: Teach Proteins and DNA Together.

    ERIC Educational Resources Information Center

    Offner, Susan

    1992-01-01

    Proposes as a unifying theme for high school biology the question of "how chromosomes determine what we are." Describes a sequence of lessons in which students learn about proteins, enzymes, and amino acids. Includes three dry laboratory exercises to demonstrate the DNA sequences for sickle cell anemia and cystic fibrosis. (MDH)

  17. Targeted enrichment strategies for next-generation plant biology

    Treesearch

    Richard Cronn; Brian J. Knaus; Aaron Liston; Peter J. Maughan; Matthew Parks; John V. Syring; Joshua Udall

    2012-01-01

    The dramatic advances offered by modem DNA sequencers continue to redefine the limits of what can be accomplished in comparative plant biology. Even with recent achievements, however, plant genomes present obstacles that can make it difficult to execute large-scale population and phylogenetic studies on next-generation sequencing platforms. Factors like large genome...

  18. Functional Dynamics of Hexameric Helicase Probed by Hydrogen Exchange and Simulation

    PubMed Central

    Radou, Gaël; Dreyer, Frauke N.; Tuma, Roman; Paci, Emanuele

    2014-01-01

    The biological function of large macromolecular assemblies depends on their structure and their dynamics over a broad range of timescales; for this reason, it is a significant challenge to investigate these assemblies using conventional experimental techniques. One of the most promising experimental techniques is hydrogen-deuterium exchange detected by mass spectrometry. Here, we describe to our knowledge a new computational method for quantitative interpretation of deuterium exchange kinetics and apply it to a hexameric viral helicase P4 that unwinds and translocates RNA into a virus capsid at the expense of ATP hydrolysis. Room-temperature dynamics probed by a hundred nanoseconds of all-atom molecular dynamics simulations is sufficient to predict the exchange kinetics of most sequence fragments and provide a residue-level interpretation of the low-resolution experimental results. The strategy presented here is also a valuable tool to validate experimental data, e.g., assignments, and to probe mechanisms that cannot be observed by x-ray crystallography, or that occur over timescales longer than those that can be realistically simulated, such as the opening of the hexameric ring. PMID:25140434

  19. The mechanism by which P250L mutation impairs flavivirus-NS1 dimerization: an investigation based on molecular dynamics simulations.

    PubMed

    Oliveira, Edson R A; de Alencastro, Ricardo B; Horta, Bruno A C

    2016-09-01

    The flavivirus non-structural protein 1 (NS1) is a conserved glycoprotein with as yet undefined biological function. This protein dimerizes when inside infected cells or associated to cell membranes but also forms lipid-associated hexamers when secreted to the extracellular space. A single amino acid substitution (P250L) is capable of preventing the dimerization of NS1 resulting in lower virulence and slower virus replication. In this work, based on molecular dynamics simulations of the dengue-2 virus NS1 [Formula: see text]-ladder monomer as a core model, we found that this mutation can induce several conformational changes that importantly affect critical monomer-monomer interactions. Based on additional simulations, we suggest a mechanism by which a highly orchestrated sequence of events propagate the local perturbations around the mutation site towards the dimer interface. The elucidation of such a mechanism could potentially support new strategies for rational production of live-attenuated vaccines and highlights a step forward in the development of novel anti-flavivirus measures.

  20. BM-Map: Bayesian Mapping of Multireads for Next-Generation Sequencing Data

    PubMed Central

    Ji, Yuan; Xu, Yanxun; Zhang, Qiong; Tsui, Kam-Wah; Yuan, Yuan; Norris, Clift; Liang, Shoudan; Liang, Han

    2011-01-01

    Summary Next-generation sequencing (NGS) technology generates millions of short reads, which provide valuable information for various aspects of cellular activities and biological functions. A key step in NGS applications (e.g., RNA-Seq) is to map short reads to correct genomic locations within the source genome. While most reads are mapped to a unique location, a significant proportion of reads align to multiple genomic locations with equal or similar numbers of mismatches; these are called multireads. The ambiguity in mapping the multireads may lead to bias in downstream analyses. Currently, most practitioners discard the multireads in their analysis, resulting in a loss of valuable information, especially for the genes with similar sequences. To refine the read mapping, we develop a Bayesian model that computes the posterior probability of mapping a multiread to each competing location. The probabilities are used for downstream analyses, such as the quantification of gene expression. We show through simulation studies and RNA-Seq analysis of real life data that the Bayesian method yields better mapping than the current leading methods. We provide a C++ program for downloading that is being packaged into a user-friendly software. PMID:21517792

  1. Engineering fluid flow using sequenced microstructures

    NASA Astrophysics Data System (ADS)

    Amini, Hamed; Sollier, Elodie; Masaeli, Mahdokht; Xie, Yu; Ganapathysubramanian, Baskar; Stone, Howard A.; di Carlo, Dino

    2013-05-01

    Controlling the shape of fluid streams is important across scales: from industrial processing to control of biomolecular interactions. Previous approaches to control fluid streams have focused mainly on creating chaotic flows to enhance mixing. Here we develop an approach to apply order using sequences of fluid transformations rather than enhancing chaos. We investigate the inertial flow deformations around a library of single cylindrical pillars within a microfluidic channel and assemble these net fluid transformations to engineer fluid streams. As these transformations provide a deterministic mapping of fluid elements from upstream to downstream of a pillar, we can sequentially arrange pillars to apply the associated nested maps and, therefore, create complex fluid structures without additional numerical simulation. To show the range of capabilities, we present sequences that sculpt the cross-sectional shape of a stream into complex geometries, move and split a fluid stream, perform solution exchange and achieve particle separation. A general strategy to engineer fluid streams into a broad class of defined configurations in which the complexity of the nonlinear equations of fluid motion are abstracted from the user is a first step to programming streams of any desired shape, which would be useful for biological, chemical and materials automation.

  2. Quantifying Selection with Pool-Seq Time Series Data.

    PubMed

    Taus, Thomas; Futschik, Andreas; Schlötterer, Christian

    2017-11-01

    Allele frequency time series data constitute a powerful resource for unraveling mechanisms of adaptation, because the temporal dimension captures important information about evolutionary forces. In particular, Evolve and Resequence (E&R), the whole-genome sequencing of replicated experimentally evolving populations, is becoming increasingly popular. Based on computer simulations several studies proposed experimental parameters to optimize the identification of the selection targets. No such recommendations are available for the underlying parameters selection strength and dominance. Here, we introduce a highly accurate method to estimate selection parameters from replicated time series data, which is fast enough to be applied on a genome scale. Using this new method, we evaluate how experimental parameters can be optimized to obtain the most reliable estimates for selection parameters. We show that the effective population size (Ne) and the number of replicates have the largest impact. Because the number of time points and sequencing coverage had only a minor effect, we suggest that time series analysis is feasible without major increase in sequencing costs. We anticipate that time series analysis will become routine in E&R studies. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  3. Enabling plant synthetic biology through genome engineering.

    PubMed

    Baltes, Nicholas J; Voytas, Daniel F

    2015-02-01

    Synthetic biology seeks to create new biological systems, including user-designed plants and plant cells. These systems can be employed for a variety of purposes, ranging from producing compounds of industrial or therapeutic value, to reducing crop losses by altering cellular responses to pathogens or climate change. To realize the full potential of plant synthetic biology, techniques are required that provide control over the genetic code - enabling targeted modifications to DNA sequences within living plant cells. Such control is now within reach owing to recent advances in the use of sequence-specific nucleases to precisely engineer genomes. We discuss here the enormous potential provided by genome engineering for plant synthetic biology. Copyright © 2014 Elsevier Ltd. All rights reserved.

  4. The computational linguistics of biological sequences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Searls, D.

    1995-12-31

    This tutorial was one of eight tutorials selected to be presented at the Third International Conference on Intelligent Systems for Molecular Biology which was held in the United Kingdom from July 16 to 19, 1995. Protein sequences are analogous in many respects, particularly their folding behavior. Proteins have a much richer variety of interactions, but in theory the same linguistic principles could come to bear in describing dependencies between distant residues that arise by virtue of three-dimensional structure. This tutorial will concentrate on nucleic acid sequences.

  5. Multi-agent-based bio-network for systems biology: protein-protein interaction network as an example.

    PubMed

    Ren, Li-Hong; Ding, Yong-Sheng; Shen, Yi-Zhen; Zhang, Xiang-Feng

    2008-10-01

    Recently, a collective effort from multiple research areas has been made to understand biological systems at the system level. This research requires the ability to simulate particular biological systems as cells, organs, organisms, and communities. In this paper, a novel bio-network simulation platform is proposed for system biology studies by combining agent approaches. We consider a biological system as a set of active computational components interacting with each other and with an external environment. Then, we propose a bio-network platform for simulating the behaviors of biological systems and modelling them in terms of bio-entities and society-entities. As a demonstration, we discuss how a protein-protein interaction (PPI) network can be seen as a society of autonomous interactive components. From interactions among small PPI networks, a large PPI network can emerge that has a remarkable ability to accomplish a complex function or task. We also simulate the evolution of the PPI networks by using the bio-operators of the bio-entities. Based on the proposed approach, various simulators with different functions can be embedded in the simulation platform, and further research can be done from design to development, including complexity validation of the biological system.

  6. Binary Interval Search: a scalable algorithm for counting interval intersections.

    PubMed

    Layer, Ryan M; Skadron, Kevin; Robins, Gabriel; Hall, Ira M; Quinlan, Aaron R

    2013-01-01

    The comparison of diverse genomic datasets is fundamental to understand genome biology. Researchers must explore many large datasets of genome intervals (e.g. genes, sequence alignments) to place their experimental results in a broader context and to make new discoveries. Relationships between genomic datasets are typically measured by identifying intervals that intersect, that is, they overlap and thus share a common genome interval. Given the continued advances in DNA sequencing technologies, efficient methods for measuring statistically significant relationships between many sets of genomic features are crucial for future discovery. We introduce the Binary Interval Search (BITS) algorithm, a novel and scalable approach to interval set intersection. We demonstrate that BITS outperforms existing methods at counting interval intersections. Moreover, we show that BITS is intrinsically suited to parallel computing architectures, such as graphics processing units by illustrating its utility for efficient Monte Carlo simulations measuring the significance of relationships between sets of genomic intervals. https://github.com/arq5x/bits.

  7. Bambus 2: scaffolding metagenomes.

    PubMed

    Koren, Sergey; Treangen, Todd J; Pop, Mihai

    2011-11-01

    Sequencing projects increasingly target samples from non-clonal sources. In particular, metagenomics has enabled scientists to begin to characterize the structure of microbial communities. The software tools developed for assembling and analyzing sequencing data for clonal organisms are, however, unable to adequately process data derived from non-clonal sources. We present a new scaffolder, Bambus 2, to address some of the challenges encountered when analyzing metagenomes. Our approach relies on a combination of a novel method for detecting genomic repeats and algorithms that analyze assembly graphs to identify biologically meaningful genomic variants. We compare our software to current assemblers using simulated and real data. We demonstrate that the repeat detection algorithms have higher sensitivity than current approaches without sacrificing specificity. In metagenomic datasets, the scaffolder avoids false joins between distantly related organisms while obtaining long-range contiguity. Bambus 2 represents a first step toward automated metagenomic assembly. Bambus 2 is open source and available from http://amos.sf.net. mpop@umiacs.umd.edu. Supplementary data are available at Bioinformatics online.

  8. Bambus 2: scaffolding metagenomes

    PubMed Central

    Koren, Sergey; Treangen, Todd J.; Pop, Mihai

    2011-01-01

    Motivation: Sequencing projects increasingly target samples from non-clonal sources. In particular, metagenomics has enabled scientists to begin to characterize the structure of microbial communities. The software tools developed for assembling and analyzing sequencing data for clonal organisms are, however, unable to adequately process data derived from non-clonal sources. Results: We present a new scaffolder, Bambus 2, to address some of the challenges encountered when analyzing metagenomes. Our approach relies on a combination of a novel method for detecting genomic repeats and algorithms that analyze assembly graphs to identify biologically meaningful genomic variants. We compare our software to current assemblers using simulated and real data. We demonstrate that the repeat detection algorithms have higher sensitivity than current approaches without sacrificing specificity. In metagenomic datasets, the scaffolder avoids false joins between distantly related organisms while obtaining long-range contiguity. Bambus 2 represents a first step toward automated metagenomic assembly. Availability: Bambus 2 is open source and available from http://amos.sf.net. Contact: mpop@umiacs.umd.edu Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:21926123

  9. Direct evidence for sequence-dependent attraction between double-stranded DNA controlled by methylation.

    PubMed

    Yoo, Jejoong; Kim, Hajin; Aksimentiev, Aleksei; Ha, Taekjip

    2016-03-22

    Although proteins mediate highly ordered DNA organization in vivo, theoretical studies suggest that homologous DNA duplexes can preferentially associate with one another even in the absence of proteins. Here we combine molecular dynamics simulations with single-molecule fluorescence resonance energy transfer experiments to examine the interactions between duplex DNA in the presence of spermine, a biological polycation. We find that AT-rich DNA duplexes associate more strongly than GC-rich duplexes, regardless of the sequence homology. Methyl groups of thymine acts as a steric block, relocating spermine from major grooves to interhelical regions, thereby increasing DNA-DNA attraction. Indeed, methylation of cytosines makes attraction between GC-rich DNA as strong as that between AT-rich DNA. Recent genome-wide chromosome organization studies showed that remote contact frequencies are higher for AT-rich and methylated DNA, suggesting that direct DNA-DNA interactions that we report here may play a role in the chromosome organization and gene regulation.

  10. Genetic circuit design automation.

    PubMed

    Nielsen, Alec A K; Der, Bryan S; Shin, Jonghyeon; Vaidyanathan, Prashant; Paralanov, Vanya; Strychalski, Elizabeth A; Ross, David; Densmore, Douglas; Voigt, Christopher A

    2016-04-01

    Computation can be performed in living cells by DNA-encoded circuits that process sensory information and control biological functions. Their construction is time-intensive, requiring manual part assembly and balancing of regulator expression. We describe a design environment, Cello, in which a user writes Verilog code that is automatically transformed into a DNA sequence. Algorithms build a circuit diagram, assign and connect gates, and simulate performance. Reliable circuit design requires the insulation of gates from genetic context, so that they function identically when used in different circuits. We used Cello to design 60 circuits forEscherichia coli(880,000 base pairs of DNA), for which each DNA sequence was built as predicted by the software with no additional tuning. Of these, 45 circuits performed correctly in every output state (up to 10 regulators and 55 parts), and across all circuits 92% of the output states functioned as predicted. Design automation simplifies the incorporation of genetic circuits into biotechnology projects that require decision-making, control, sensing, or spatial organization. Copyright © 2016, American Association for the Advancement of Science.

  11. Tilted pillar array fabrication by the combination of proton beam writing and soft lithography for microfluidic cell capture Part 2: Image sequence analysis based evaluation and biological application.

    PubMed

    Járvás, Gábor; Varga, Tamás; Szigeti, Márton; Hajba, László; Fürjes, Péter; Rajta, István; Guttman, András

    2018-02-01

    As a continuation of our previously published work, this paper presents a detailed evaluation of a microfabricated cell capture device utilizing a doubly tilted micropillar array. The device was fabricated using a novel hybrid technology based on the combination of proton beam writing and conventional lithography techniques. Tilted pillars offer unique flow characteristics and support enhanced fluidic interaction for improved immunoaffinity based cell capture. The performance of the microdevice was evaluated by an image sequence analysis based in-house developed single-cell tracking system. Individual cell tracking allowed in-depth analysis of the cell-chip surface interaction mechanism from hydrodynamic point of view. Simulation results were validated by using the hybrid device and the optimized surface functionalization procedure. Finally, the cell capture capability of this new generation microdevice was demonstrated by efficiently arresting cells from a HT29 cell-line suspension. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  12. Limited utility of residue masking for positive-selection inference.

    PubMed

    Spielman, Stephanie J; Dawson, Eric T; Wilke, Claus O

    2014-09-01

    Errors in multiple sequence alignments (MSAs) can reduce accuracy in positive-selection inference. Therefore, it has been suggested to filter MSAs before conducting further analyses. One widely used filter, Guidance, allows users to remove MSA positions aligned with low confidence. However, Guidance's utility in positive-selection inference has been disputed in the literature. We have conducted an extensive simulation-based study to characterize fully how Guidance impacts positive-selection inference, specifically for protein-coding sequences of realistic divergence levels. We also investigated whether novel scoring algorithms, which phylogenetically corrected confidence scores, and a new gap-penalization score-normalization scheme improved Guidance's performance. We found that no filter, including original Guidance, consistently benefitted positive-selection inferences. Moreover, all improvements detected were exceedingly minimal, and in certain circumstances, Guidance-based filters worsened inferences. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  13. Direct evidence for sequence-dependent attraction between double-stranded DNA controlled by methylation

    NASA Astrophysics Data System (ADS)

    Yoo, Jejoong; Kim, Hajin; Aksimentiev, Aleksei; Ha, Taekjip

    2016-03-01

    Although proteins mediate highly ordered DNA organization in vivo, theoretical studies suggest that homologous DNA duplexes can preferentially associate with one another even in the absence of proteins. Here we combine molecular dynamics simulations with single-molecule fluorescence resonance energy transfer experiments to examine the interactions between duplex DNA in the presence of spermine, a biological polycation. We find that AT-rich DNA duplexes associate more strongly than GC-rich duplexes, regardless of the sequence homology. Methyl groups of thymine acts as a steric block, relocating spermine from major grooves to interhelical regions, thereby increasing DNA-DNA attraction. Indeed, methylation of cytosines makes attraction between GC-rich DNA as strong as that between AT-rich DNA. Recent genome-wide chromosome organization studies showed that remote contact frequencies are higher for AT-rich and methylated DNA, suggesting that direct DNA-DNA interactions that we report here may play a role in the chromosome organization and gene regulation.

  14. Chirality- and sequence-selective successive self-sorting via specific homo- and complementary-duplex formations

    PubMed Central

    Makiguchi, Wataru; Tanabe, Junki; Yamada, Hidekazu; Iida, Hiroki; Taura, Daisuke; Ousaka, Naoki; Yashima, Eiji

    2015-01-01

    Self-recognition and self-discrimination within complex mixtures are of fundamental importance in biological systems, which entirely rely on the preprogrammed monomer sequences and homochirality of biological macromolecules. Here we report artificial chirality- and sequence-selective successive self-sorting of chiral dimeric strands bearing carboxylic acid or amidine groups joined by chiral amide linkers with different sequences through homo- and complementary-duplex formations. A mixture of carboxylic acid dimers linked by racemic-1,2-cyclohexane bis-amides with different amide sequences (NHCO or CONH) self-associate to form homoduplexes in a completely sequence-selective way, the structures of which are different from each other depending on the linker amide sequences. The further addition of an enantiopure amide-linked amidine dimer to a mixture of the racemic carboxylic acid dimers resulted in the formation of a single optically pure complementary duplex with a 100% diastereoselectivity and complete sequence specificity stabilized by the amidinium–carboxylate salt bridges, leading to the perfect chirality- and sequence-selective duplex formation. PMID:26051291

  15. Protein interface classification by evolutionary analysis

    PubMed Central

    2012-01-01

    Background Distinguishing biologically relevant interfaces from lattice contacts in protein crystals is a fundamental problem in structural biology. Despite efforts towards the computational prediction of interface character, many issues are still unresolved. Results We present here a protein-protein interface classifier that relies on evolutionary data to detect the biological character of interfaces. The classifier uses a simple geometric measure, number of core residues, and two evolutionary indicators based on the sequence entropy of homolog sequences. Both aim at detecting differential selection pressure between interface core and rim or rest of surface. The core residues, defined as fully buried residues (>95% burial), appear to be fundamental determinants of biological interfaces: their number is in itself a powerful discriminator of interface character and together with the evolutionary measures it is able to clearly distinguish evolved biological contacts from crystal ones. We demonstrate that this definition of core residues leads to distinctively better results than earlier definitions from the literature. The stringent selection and quality filtering of structural and sequence data was key to the success of the method. Most importantly we demonstrate that a more conservative selection of homolog sequences - with relatively high sequence identities to the query - is able to produce a clearer signal than previous attempts. Conclusions An evolutionary approach like the one presented here is key to the advancement of the field, which so far was missing an effective method exploiting the evolutionary character of protein interfaces. Its coverage and performance will only improve over time thanks to the incessant growth of sequence databases. Currently our method reaches an accuracy of 89% in classifying interfaces of the Ponstingl 2003 datasets and it lends itself to a variety of useful applications in structural biology and bioinformatics. We made the corresponding software implementation available to the community as an easy-to-use graphical web interface at http://www.eppic-web.org. PMID:23259833

  16. Homology and phylogeny and their automated inference

    NASA Astrophysics Data System (ADS)

    Fuellen, Georg

    2008-06-01

    The analysis of the ever-increasing amount of biological and biomedical data can be pushed forward by comparing the data within and among species. For example, an integrative analysis of data from the genome sequencing projects for various species traces the evolution of the genomes and identifies conserved and innovative parts. Here, I review the foundations and advantages of this “historical” approach and evaluate recent attempts at automating such analyses. Biological data is comparable if a common origin exists (homology), as is the case for members of a gene family originating via duplication of an ancestral gene. If the family has relatives in other species, we can assume that the ancestral gene was present in the ancestral species from which all the other species evolved. In particular, describing the relationships among the duplicated biological sequences found in the various species is often possible by a phylogeny, which is more informative than homology statements. Detecting and elaborating on common origins may answer how certain biological sequences developed, and predict what sequences are in a particular species and what their function is. Such knowledge transfer from sequences in one species to the homologous sequences of the other is based on the principle of ‘my closest relative looks and behaves like I do’, often referred to as ‘guilt by association’. To enable knowledge transfer on a large scale, several automated ‘phylogenomics pipelines’ have been developed in recent years, and seven of these will be described and compared. Overall, the examples in this review demonstrate that homology and phylogeny analyses, done on a large (and automated) scale, can give insights into function in biology and biomedicine.

  17. BioNSi: A Discrete Biological Network Simulator Tool.

    PubMed

    Rubinstein, Amir; Bracha, Noga; Rudner, Liat; Zucker, Noga; Sloin, Hadas E; Chor, Benny

    2016-08-05

    Modeling and simulation of biological networks is an effective and widely used research methodology. The Biological Network Simulator (BioNSi) is a tool for modeling biological networks and simulating their discrete-time dynamics, implemented as a Cytoscape App. BioNSi includes a visual representation of the network that enables researchers to construct, set the parameters, and observe network behavior under various conditions. To construct a network instance in BioNSi, only partial, qualitative biological data suffices. The tool is aimed for use by experimental biologists and requires no prior computational or mathematical expertise. BioNSi is freely available at http://bionsi.wix.com/bionsi , where a complete user guide and a step-by-step manual can also be found.

  18. The Use of Microgravity Simulators for Space Research

    NASA Technical Reports Server (NTRS)

    Zhang, Ye; Richards, Stephanie E.; Richards, Jeffrey T.; Levine, Howard G.

    2016-01-01

    The spaceflight environment is known to influence biological processes ranging from stimulation of cellular metabolism to possible impacts on cellular damage repair, suppression of immune functions, and bone loss in astronauts. Microgravity is one of the most significant stress factors experienced by living organisms during spaceflight, and therefore, understanding cellular responses to altered gravity at the physiological and molecular level is critical for expanding our knowledge of life in space. Since opportunities to conduct experiments in space are scarce, various microgravity simulators and analogues have been widely used in space biology ground studies. Even though simulated microgravity conditions have produced some, but not all of the biological effects observed in the true microgravity environment, they provide test beds that are effective, affordable, and readily available to facilitate microgravity research. Kennedy Space Center (KSC) provides ground microgravity simulator support to offer a variety of microgravity simulators and platforms for Space Biology investigators. Assistance will be provided by both KSC and external experts in molecular biology, microgravity simulation, and engineering. Comparisons between the physical differences in microgravity simulators, examples of experiments using the simulators, and scientific questions regarding the use of microgravity simulators will be discussed.

  19. The Use of Microgravity Simulators for Space Research

    NASA Technical Reports Server (NTRS)

    Zhang, Ye; Richards, Stephanie E.; Wade, Randall I.; Richards, Jeffrey T.; Fritsche, Ralph F.; Levine, Howard G.

    2016-01-01

    The spaceflight environment is known to influence biological processes ranging from stimulation of cellular metabolism to possible impacts on cellular damage repair, suppression of immune functions, and bone loss in astronauts. Microgravity is one of the most significant stress factors experienced by living organisms during spaceflight, and therefore, understanding cellular responses to altered gravity at the physiological and molecular level is critical for expanding our knowledge of life in space. Since opportunities to conduct experiments in space are scarce, various microgravity simulators and analogues have been widely used in space biology ground studies. Even though simulated microgravity conditions have produced some, but not all of the biological effects observed in the true microgravity environment, they provide test beds that are effective, affordable, and readily available to facilitate microgravity research. A Micro-g Simulator Center is being developed at Kennedy Space Center (KSC) to offer a variety of microgravity simulators and platforms for Space Biology investigators. Assistance will be provided by both KSC and external experts in molecular biology, microgravity simulation, and engineering. Comparisons between the physical differences in microgravity simulators, examples of experiments using the simulators, and scientific questions regarding the use of microgravity simulators will be discussed.

  20. Cell and molecular biology of the spiny dogfish Squalus acanthias and little skate Leucoraja erinacea: insights from in vitro cultured cells.

    PubMed

    Barnes, D W

    2012-04-01

    Two of the most commonly used elasmobranch experimental model species are the spiny dogfish Squalus acanthias and the little skate Leucoraja erinacea. Comparative biology and genomics with these species have provided useful information in physiology, pharmacology, toxicology, immunology, evolutionary developmental biology and genetics. A wealth of information has been obtained using in vitro approaches to study isolated cells and tissues from these organisms under circumstances in which the extracellular environment can be controlled. In addition to classical work with primary cell cultures, continuously proliferating cell lines have been derived recently, representing the first cell lines from cartilaginous fishes. These lines have proved to be valuable tools with which to explore functional genomic and biological questions and to test hypotheses at the molecular level. In genomic experiments, complementary (c)DNA libraries have been constructed, and c. 8000 unique transcripts identified, with over 3000 representing previously unknown gene sequences. A sub-set of messenger (m)RNAs has been detected for which the 3' untranslated regions show elements that are remarkably well conserved evolutionarily, representing novel, potentially regulatory gene sequences. The cell culture systems provide physiologically valid tools to study functional roles of these sequences and other aspects of elasmobranch molecular cell biology and physiology. Information derived from the use of in vitro cell cultures is valuable in revealing gene diversity and information for genomic sequence assembly, as well as for identification of new genes and molecular markers, construction of gene-array probes and acquisition of full-length cDNA sequences. © 2012 The Author. Journal of Fish Biology © 2012 The Fisheries Society of the British Isles.

  1. Shot sequencing based on biological equivalent dose considerations for multiple isocenter Gamma Knife radiosurgery

    NASA Astrophysics Data System (ADS)

    Ma, Lijun; Lee, Letitia; Barani, Igor; Hwang, Andrew; Fogh, Shannon; Nakamura, Jean; McDermott, Michael; Sneed, Penny; Larson, David A.; Sahgal, Arjun

    2011-11-01

    Rapid delivery of multiple shots or isocenters is one of the hallmarks of Gamma Knife radiosurgery. In this study, we investigated whether the temporal order of shots delivered with Gamma Knife Perfexion would significantly influence the biological equivalent dose for complex multi-isocenter treatments. Twenty single-target cases were selected for analysis. For each case, 3D dose matrices of individual shots were extracted and single-fraction equivalent uniform dose (sEUD) values were determined for all possible shot delivery sequences, corresponding to different patterns of temporal dose delivery within the target. We found significant variations in the sEUD values among these sequences exceeding 15% for certain cases. However, the sequences for the actual treatment delivery were found to agree (<3%) and to correlate (R2 = 0.98) excellently with the sequences yielding the maximum sEUD values for all studied cases. This result is applicable for both fast and slow growing tumors with α/β values of 2 to 20 according to the linear-quadratic model. In conclusion, despite large potential variations in different shot sequences for multi-isocenter Gamma Knife treatments, current clinical delivery sequences exhibited consistent biological target dosing that approached that maximally achievable for all studied cases.

  2. A Statistical Guide to the Design of Deep Mutational Scanning Experiments

    PubMed Central

    Matuszewski, Sebastian; Hildebrandt, Marcel E.; Ghenu, Ana-Hermina; Jensen, Jeffrey D.; Bank, Claudia

    2016-01-01

    The characterization of the distribution of mutational effects is a key goal in evolutionary biology. Recently developed deep-sequencing approaches allow for accurate and simultaneous estimation of the fitness effects of hundreds of engineered mutations by monitoring their relative abundance across time points in a single bulk competition. Naturally, the achievable resolution of the estimated fitness effects depends on the specific experimental setup, the organism and type of mutations studied, and the sequencing technology utilized, among other factors. By means of analytical approximations and simulations, we provide guidelines for optimizing time-sampled deep-sequencing bulk competition experiments, focusing on the number of mutants, the sequencing depth, and the number of sampled time points. Our analytical results show that sampling more time points together with extending the duration of the experiment improves the achievable precision disproportionately compared with increasing the sequencing depth or reducing the number of competing mutants. Even if the duration of the experiment is fixed, sampling more time points and clustering these at the beginning and the end of the experiment increase experimental power and allow for efficient and precise assessment of the entire range of selection coefficients. Finally, we provide a formula for calculating the 95%-confidence interval for the measurement error estimate, which we implement as an interactive web tool. This allows for quantification of the maximum expected a priori precision of the experimental setup, as well as for a statistical threshold for determining deviations from neutrality for specific selection coefficient estimates. PMID:27412710

  3. RAMICS: trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA.

    PubMed

    Wright, Imogen A; Travers, Simon A

    2014-07-01

    The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Hidden Markov models of biological primary sequence information.

    PubMed Central

    Baldi, P; Chauvin, Y; Hunkapiller, T; McClure, M A

    1994-01-01

    Hidden Markov model (HMM) techniques are used to model families of biological sequences. A smooth and convergent algorithm is introduced to iteratively adapt the transition and emission parameters of the models from the examples in a given family. The HMM approach is applied to three protein families: globins, immunoglobulins, and kinases. In all cases, the models derived capture the important statistical characteristics of the family and can be used for a number of tasks, including multiple alignments, motif detection, and classification. For K sequences of average length N, this approach yields an effective multiple-alignment algorithm which requires O(KN2) operations, linear in the number of sequences. PMID:8302831

  5. Farming system context drives the value of deep wheat roots in semi-arid environments.

    PubMed

    Lilley, Julianne M; Kirkegaard, John A

    2016-06-01

    The capture of subsoil water by wheat roots can make a valuable contribution to grain yield on deep soils. More extensive root systems can capture more water, but leave the soil in a drier state, potentially limiting water availability to subsequent crops. To evaluate the importance of these legacy effects, a long-term simulation analysis at eight sites in the semi-arid environment of Australia compared the yield of standard wheat cultivars with cultivars that were (i) modified to have root systems which extract more water at depth and/or (ii) sown earlier to increase the duration of the vegetative period and hence rooting depth. We compared simulations with and without annual resetting of soil water to investigate the legacy effects of drier subsoils related to modified root systems. Simulated mean yield benefits from modified root systems declined from 0.1-0.6 t ha(-1) when annually reset, to 0-0.2 t ha(-1) in the continuous simulation due to a legacy of drier soils (mean 0-32mm) at subsequent crop sowing. For continuous simulations, predicted yield benefits of >0.2 t ha(-1) from more extensive root systems were rare (3-10% of years) at sites with shallow soils (<1.0 m), but occurred in 14-44% of years at sites with deeper soils (1.6-2.5 m). Earlier sowing had a larger impact than modified root systems on water uptake (14-31 vs 2-17mm) and mean yield increase (up to 0.7 vs 0-0.2 t ha(-1)) and the benefits occurred on deep and shallow soils and in more years (9-79 vs 3-44%). Increasing the proportion of crops in the sequence which dry the subsoil extensively has implications for the farming system productivity, and the crop sequence must be managed tactically to optimize overall system benefits. © The Author 2016. Published by Oxford University Press on behalf of the Society for Experimental Biology.

  6. The Genome Sequence of Taurine Cattle: A Window to Ruminant Biology and Evolution

    USDA-ARS?s Scientific Manuscript database

    As a major step toward understanding the biology and evolution of ruminants, the cattle genome was sequenced to ~7x coverage using a combined whole genome shotgun and BAC skim approach. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs found in seven mammalian...

  7. A computational chemistry perspective on the current status and future direction of hepatitis B antiviral drug discovery.

    PubMed

    Morgnanesi, Dante; Heinrichs, Eric J; Mele, Anthony R; Wilkinson, Sean; Zhou, Suzanne; Kulp, John L

    2015-11-01

    Computational chemical biology, applied to research on hepatitis B virus (HBV), has two major branches: bioinformatics (statistical models) and first-principle methods (molecular physics). While bioinformatics focuses on statistical tools and biological databases, molecular physics uses mathematics and chemical theory to study the interactions of biomolecules. Three computational techniques most commonly used in HBV research are homology modeling, molecular docking, and molecular dynamics. Homology modeling is a computational simulation to predict protein structure and has been used to construct conformers of the viral polymerase (reverse transcriptase domain and RNase H domain) and the HBV X protein. Molecular docking is used to predict the most likely orientation of a ligand when it is bound to a protein, as well as determining an energy score of the docked conformation. Molecular dynamics is a simulation that analyzes biomolecule motions and determines conformation and stability patterns. All of these modeling techniques have aided in the understanding of resistance mutations on HBV non-nucleos(t)ide reverse-transcriptase inhibitor binding. Finally, bioinformatics can be used to study the DNA and RNA protein sequences of viruses to both analyze drug resistance and to genotype the viral genomes. Overall, with these techniques, and others, computational chemical biology is becoming more and more necessary in hepatitis B research. This article forms part of a symposium in Antiviral Research on "An unfinished story: from the discovery of the Australia antigen to the development of new curative therapies for hepatitis B." Copyright © 2015 Elsevier B.V. All rights reserved.

  8. Functionalized poly(ethylene glycol) diacrylate microgels by microfluidics: In situ peptide encapsulation for in serum selective protein detection.

    PubMed

    Celetti, Giorgia; Natale, Concetta Di; Causa, Filippo; Battista, Edmondo; Netti, Paolo A

    2016-09-01

    Polymeric microparticles represent a robustly platform for the detection of clinically relevant analytes in biological samples; they can be functionalized encapsulating a multiple types of biologics entities, enhancing their applications as a new class of colloid materials. Microfluidic offers a versatile platform for the synthesis of monodisperse and engineered microparticles. In this work, we report microfluidic synthesis of novel polymeric microparticles endowed with specific peptide due to its superior specificity for target binding in complex media. A peptide sequence was efficiently encapsulated into the polymeric network and protein binding occurred with high affinity (KD 0.1-0.4μM). Fluidic dynamics simulation was performed to optimize the production conditions for monodisperse and stable functionalized microgels. The results demonstrate the easy and fast realization, in a single step, of functionalized monodisperse microgels using droplet-microfluidic technique, and how the inclusion of the peptide within polymeric network improve both the affinity and the specificity of protein capture. Copyright © 2016 Elsevier B.V. All rights reserved.

  9. A Doubly Stochastic Change Point Detection Algorithm for Noisy Biological Signals.

    PubMed

    Gold, Nathan; Frasch, Martin G; Herry, Christophe L; Richardson, Bryan S; Wang, Xiaogang

    2017-01-01

    Experimentally and clinically collected time series data are often contaminated with significant confounding noise, creating short, noisy time series. This noise, due to natural variability and measurement error, poses a challenge to conventional change point detection methods. We propose a novel and robust statistical method for change point detection for noisy biological time sequences. Our method is a significant improvement over traditional change point detection methods, which only examine a potential anomaly at a single time point. In contrast, our method considers all suspected anomaly points and considers the joint probability distribution of the number of change points and the elapsed time between two consecutive anomalies. We validate our method with three simulated time series, a widely accepted benchmark data set, two geological time series, a data set of ECG recordings, and a physiological data set of heart rate variability measurements of fetal sheep model of human labor, comparing it to three existing methods. Our method demonstrates significantly improved performance over the existing point-wise detection methods.

  10. Factoring local sequence composition in motif significance analysis.

    PubMed

    Ng, Patrick; Keich, Uri

    2008-01-01

    We recently introduced a biologically realistic and reliable significance analysis of the output of a popular class of motif finders. In this paper we further improve our significance analysis by incorporating local base composition information. Relying on realistic biological data simulation, as well as on FDR analysis applied to real data, we show that our method is significantly better than the increasingly popular practice of using the normal approximation to estimate the significance of a finder's output. Finally we turn to leveraging our reliable significance analysis to improve the actual motif finding task. Specifically, endowing a variant of the Gibbs Sampler with our improved significance analysis we demonstrate that de novo finders can perform better than has been perceived. Significantly, our new variant outperforms all the finders reviewed in a recently published comprehensive analysis of the Harbison genome-wide binding location data. Interestingly, many of these finders incorporate additional information such as nucleosome positioning and the significance of binding data.

  11. From Network Analysis to Functional Metabolic Modeling of the Human Gut Microbiota.

    PubMed

    Bauer, Eugen; Thiele, Ines

    2018-01-01

    An important hallmark of the human gut microbiota is its species diversity and complexity. Various diseases have been associated with a decreased diversity leading to reduced metabolic functionalities. Common approaches to investigate the human microbiota include high-throughput sequencing with subsequent correlative analyses. However, to understand the ecology of the human gut microbiota and consequently design novel treatments for diseases, it is important to represent the different interactions between microbes with their associated metabolites. Computational systems biology approaches can give further mechanistic insights by constructing data- or knowledge-driven networks that represent microbe interactions. In this minireview, we will discuss current approaches in systems biology to analyze the human gut microbiota, with a particular focus on constraint-based modeling. We will discuss various community modeling techniques with their advantages and differences, as well as their application to predict the metabolic mechanisms of intestinal microbial communities. Finally, we will discuss future perspectives and current challenges of simulating realistic and comprehensive models of the human gut microbiota.

  12. Detecting Signatures of Positive Selection along Defined Branches of a Population Tree Using LSD.

    PubMed

    Librado, Pablo; Orlando, Ludovic

    2018-06-01

    Identifying the genomic basis underlying local adaptation is paramount to evolutionary biology, and bears many applications in the fields of conservation biology, crop, and animal breeding, as well as personalized medicine. Although many approaches have been developed to detect signatures of positive selection within single populations and population pairs, the increasing wealth of high-throughput sequencing data requires improved methods capable of handling multiple, and ideally large number of, populations in a single analysis. In this study, we introduce LSD (levels of exclusively shared differences), a fast and flexible framework to perform genome-wide selection scans, along the internal and external branches of a given population tree. We use forward simulations to demonstrate that LSD can identify branches targeted by positive selection with remarkable sensitivity and specificity. We illustrate a range of potential applications by analyzing data from the 1000 Genomes Project and uncover a list of adaptive candidates accompanying the expansion of anatomically modern humans out of Africa and their spread to Europe.

  13. Diffusion of low-energy electrons in tissue-like liquids.

    PubMed

    Malamut, C; Paes-Leme, P J; Paschoa, A S

    1992-11-01

    The spatial-energetic distribution of low-energy electrons was studied for a source located in a liquid medium simulating biological tissue. A time-independent Boltzmann equation was used to model this distribution microscopically. Ionization was treated as a perturbation to a quasi-elastic collision process between the electron and the medium. A diffusion limit was obtained by using a scale parameter, leading to a sequence of recursive partial differential equations whose solutions, associated with a macroscopic scale, were obtained by numerical approximations. As an application, electron ranges were estimated based on these solutions and then compared with values reported in the open literature based on experimental results and on Monte Carlo calculation. Local dosimetry, i.e., the energy imparted to a volume of a sphere with radius equal to the range of low-energy electrons, of low-energy electrons from internal emitters can benefit by the knowledge of the ranges estimated for biological tissue. Auger electron emitters, for example, have been the object of a number of investigations because of their radiobiological significance.

  14. Inverse statistical physics of protein sequences: a key issues review.

    PubMed

    Cocco, Simona; Feinauer, Christoph; Figliuzzi, Matteo; Monasson, Rémi; Weigt, Martin

    2018-03-01

    In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview over some biologically important questions, and how statistical-mechanics inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.

  15. Inverse statistical physics of protein sequences: a key issues review

    NASA Astrophysics Data System (ADS)

    Cocco, Simona; Feinauer, Christoph; Figliuzzi, Matteo; Monasson, Rémi; Weigt, Martin

    2018-03-01

    In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview over some biologically important questions, and how statistical-mechanics inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.

  16. Independent test assessment using the extreme value distribution theory.

    PubMed

    Almeida, Marcio; Blondell, Lucy; Peralta, Juan M; Kent, Jack W; Jun, Goo; Teslovich, Tanya M; Fuchsberger, Christian; Wood, Andrew R; Manning, Alisa K; Frayling, Timothy M; Cingolani, Pablo E; Sladek, Robert; Dyer, Thomas D; Abecasis, Goncalo; Duggirala, Ravindranath; Blangero, John

    2016-01-01

    The new generation of whole genome sequencing platforms offers great possibilities and challenges for dissecting the genetic basis of complex traits. With a very high number of sequence variants, a naïve multiple hypothesis threshold correction hinders the identification of reliable associations by the overreduction of statistical power. In this report, we examine 2 alternative approaches to improve the statistical power of a whole genome association study to detect reliable genetic associations. The approaches were tested using the Genetic Analysis Workshop 19 (GAW19) whole genome sequencing data. The first tested method estimates the real number of effective independent tests actually being performed in whole genome association project by the use of an extreme value distribution and a set of phenotype simulations. Given the familiar nature of the GAW19 data and the finite number of pedigree founders in the sample, the number of correlations between genotypes is greater than in a set of unrelated samples. Using our procedure, we estimate that the effective number represents only 15 % of the total number of independent tests performed. However, even using this corrected significance threshold, no genome-wide significant association could be detected for systolic and diastolic blood pressure traits. The second approach implements a biological relevance-driven hypothesis tested by exploiting prior computational predictions on the effect of nonsynonymous genetic variants detected in a whole genome sequencing association study. This guided testing approach was able to identify 2 promising single-nucleotide polymorphisms (SNPs), 1 for each trait, targeting biologically relevant genes that could help shed light on the genesis of the human hypertension. The first gene, PFH14 , associated with systolic blood pressure, interacts directly with genes involved in calcium-channel formation and the second gene, MAP4 , encodes a microtubule-associated protein and had already been detected by previous genome-wide association study experiments conducted in an Asian population. Our results highlight the necessity of the development of alternative approached to improve the efficiency on the detection of reasonable candidate associations in whole genome sequencing studies.

  17. Identifying structural variation in haploid microbial genomes from short-read resequencing data using breseq.

    PubMed

    Barrick, Jeffrey E; Colburn, Geoffrey; Deatherage, Daniel E; Traverse, Charles C; Strand, Matthew D; Borges, Jordan J; Knoester, David B; Reba, Aaron; Meyer, Austin G

    2014-11-29

    Mutations that alter chromosomal structure play critical roles in evolution and disease, including in the origin of new lifestyles and pathogenic traits in microbes. Large-scale rearrangements in genomes are often mediated by recombination events involving new or existing copies of mobile genetic elements, recently duplicated genes, or other repetitive sequences. Most current software programs for predicting structural variation from short-read DNA resequencing data are intended primarily for use on human genomes. They typically disregard information in reads mapping to repeat sequences, and significant post-processing and manual examination of their output is often required to rule out false-positive predictions and precisely describe mutational events. We have implemented an algorithm for identifying structural variation from DNA resequencing data as part of the breseq computational pipeline for predicting mutations in haploid microbial genomes. Our method evaluates the support for new sequence junctions present in a clonal sample from split-read alignments to a reference genome, including matches to repeat sequences. Then, it uses a statistical model of read coverage evenness to accept or reject these predictions. Finally, breseq combines predictions of new junctions and deleted chromosomal regions to output biologically relevant descriptions of mutations and their effects on genes. We demonstrate the performance of breseq on simulated Escherichia coli genomes with deletions generating unique breakpoint sequences, new insertions of mobile genetic elements, and deletions mediated by mobile elements. Then, we reanalyze data from an E. coli K-12 mutation accumulation evolution experiment in which structural variation was not previously identified. Transposon insertions and large-scale chromosomal changes detected by breseq account for ~25% of spontaneous mutations in this strain. In all cases, we find that breseq is able to reliably predict structural variation with modest read-depth coverage of the reference genome (>40-fold). Using breseq to predict structural variation should be useful for studies of microbial epidemiology, experimental evolution, synthetic biology, and genetics when a reference genome for a closely related strain is available. In these cases, breseq can discover mutations that may be responsible for important or unintended changes in genomes that might otherwise go undetected.

  18. Differential evolution-simulated annealing for multiple sequence alignment

    NASA Astrophysics Data System (ADS)

    Addawe, R. C.; Addawe, J. M.; Sueño, M. R. K.; Magadia, J. C.

    2017-10-01

    Multiple sequence alignments (MSA) are used in the analysis of molecular evolution and sequence structure relationships. In this paper, a hybrid algorithm, Differential Evolution - Simulated Annealing (DESA) is applied in optimizing multiple sequence alignments (MSAs) based on structural information, non-gaps percentage and totally conserved columns. DESA is a robust algorithm characterized by self-organization, mutation, crossover, and SA-like selection scheme of the strategy parameters. Here, the MSA problem is treated as a multi-objective optimization problem of the hybrid evolutionary algorithm, DESA. Thus, we name the algorithm as DESA-MSA. Simulated sequences and alignments were generated to evaluate the accuracy and efficiency of DESA-MSA using different indel sizes, sequence lengths, deletion rates and insertion rates. The proposed hybrid algorithm obtained acceptable solutions particularly for the MSA problem evaluated based on the three objectives.

  19. Biology First: A History of the Grade Placement of High School Biology

    ERIC Educational Resources Information Center

    Sheppard, Keith; Robbins, Dennis M.

    2006-01-01

    This article outlines the history of the high school "general biology" course and details how biology came to be placed first in the traditional order of science subjects (biology-chemistry-physics). The article briefly discusses the implications of the development of this sequence for the present day biology course.

  20. Chemomechanical coupling in hexameric protein-protein interfaces harness energy within V–type ATPases

    PubMed Central

    Singharoy, Abhishek; Chipot, Christophe; Moradi, Mahmoud

    2017-01-01

    ATP synthase is the most prominent bioenergetic macromolecular motor in all life-forms, utilizing the proton gradient across the cell membrane to fuel the synthesis of ATP. Notwithstanding the wealth of available biochemical and structural information inferred from years of experiments, the precise molecular mechanism whereby vacuolar (V–type) ATP synthase fulfills its biological function remains largely fragmentary. Recently, crystallographers provided the first high-resolution view of ATP activity in Enterococcus hirae V1–ATPase. Employing a combination of transition-path sampling and high-performance free-energy methods, the sequence of conformational transitions involved in a functional cycle accompanying ATP hydrolysis has been investigated in unprecedented detail over an aggregate simulation time of 65 μs. Our simulated pathways reveal that the chemical energy produced by ATP hydrolysis is harnessed via the concerted motion of the protein–protein interfaces in the V1-ring, and is nearly entirely consumed in the rotation of the central stalk. Surprisingly, in an ATPase devoid of a central stalk, the interfaces of this ring are perfectly designed for inducing ATP hydrolysis. Yet, in a complete V1-ATPase, the mechanical property of the central stalk is a key determinant of the rate of ATP turnover. The simulations further unveil a sequence of events, whereby unbinding of the hydrolysis product (ADP+Pi) is followed by ATP uptake, which, in turn, leads to the torque generation step and rotation of the center stalk. Molecular trajectories also bring to light multiple intermediates, two of which have been isolated in independent crystallography experiments. PMID:27936329

  1. Chemomechanical Coupling in Hexameric Protein-Protein Interfaces Harnesses Energy within V-Type ATPases.

    PubMed

    Singharoy, Abhishek; Chipot, Christophe; Moradi, Mahmoud; Schulten, Klaus

    2017-01-11

    ATP synthase is the most prominent bioenergetic macromolecular motor in all life forms, utilizing the proton gradient across the cell membrane to fuel the synthesis of ATP. Notwithstanding the wealth of available biochemical and structural information inferred from years of experiments, the precise molecular mechanism whereby vacuolar (V-type) ATP synthase fulfills its biological function remains largely fragmentary. Recently, crystallographers provided the first high-resolution view of ATP activity in Enterococcus hirae V 1 -ATPase. Employing a combination of transition-path sampling and high-performance free-energy methods, the sequence of conformational transitions involved in a functional cycle accompanying ATP hydrolysis has been investigated in unprecedented detail over an aggregate simulation time of 65 μs. Our simulated pathways reveal that the chemical energy produced by ATP hydrolysis is harnessed via the concerted motion of the protein-protein interfaces in the V 1 -ring, and is nearly entirely consumed in the rotation of the central stalk. Surprisingly, in an ATPase devoid of a central stalk, the interfaces of this ring are perfectly designed for inducing ATP hydrolysis. However, in a complete V 1 -ATPase, the mechanical property of the central stalk is a key determinant of the rate of ATP turnover. The simulations further unveil a sequence of events, whereby unbinding of the hydrolysis product (ADP + P i ) is followed by ATP uptake, which, in turn, leads to the torque generation step and rotation of the center stalk. Molecular trajectories also bring to light multiple intermediates, two of which have been isolated in independent crystallography experiments.

  2. Modeling and simulation of biological systems using SPICE language

    PubMed Central

    Lallement, Christophe; Haiech, Jacques

    2017-01-01

    The article deals with BB-SPICE (SPICE for Biochemical and Biological Systems), an extension of the famous Simulation Program with Integrated Circuit Emphasis (SPICE). BB-SPICE environment is composed of three modules: a new textual and compact description formalism for biological systems, a converter that handles this description and generates the SPICE netlist of the equivalent electronic circuit and NGSPICE which is an open-source SPICE simulator. In addition, the environment provides back and forth interfaces with SBML (System Biology Markup Language), a very common description language used in systems biology. BB-SPICE has been developed in order to bridge the gap between the simulation of biological systems on the one hand and electronics circuits on the other hand. Thus, it is suitable for applications at the interface between both domains, such as development of design tools for synthetic biology and for the virtual prototyping of biosensors and lab-on-chip. Simulation results obtained with BB-SPICE and COPASI (an open-source software used for the simulation of biochemical systems) have been compared on a benchmark of models commonly used in systems biology. Results are in accordance from a quantitative viewpoint but BB-SPICE outclasses COPASI by 1 to 3 orders of magnitude regarding the computation time. Moreover, as our software is based on NGSPICE, it could take profit of incoming updates such as the GPU implementation, of the coupling with powerful analysis and verification tools or of the integration in design automation tools (synthetic biology). PMID:28787027

  3. In silico evolution of the hunchback gene indicates redundancy in cis-regulatory organization and spatial gene expression

    PubMed Central

    Zagrijchuk, Elizaveta A.; Sabirov, Marat A.; Holloway, David M.; Spirov, Alexander V.

    2014-01-01

    Biological development depends on the coordinated expression of genes in time and space. Developmental genes have extensive cis-regulatory regions which control their expression. These regions are organized in a modular manner, with different modules controlling expression at different times and locations. Both how modularity evolved and what function it serves are open questions. We present a computational model for the cis-regulation of the hunchback (hb) gene in the fruit fly (Drosophila). We simulate evolution (using an evolutionary computation approach from computer science) to find the optimal cis-regulatory arrangements for fitting experimental hb expression patterns. We find that the cis-regulatory region tends to readily evolve modularity. These cis-regulatory modules (CRMs) do not tend to control single spatial domains, but show a multi-CRM/multi-domain correspondence. We find that the CRM-domain correspondence seen in Drosophila evolves with a high probability in our model, supporting the biological relevance of the approach. The partial redundancy resulting from multi-CRM control may confer some biological robustness against corruption of regulatory sequences. The technique developed on hb could readily be applied to other multi-CRM developmental genes. PMID:24712536

  4. The activation of fibroblast growth factors by heparin: synthesis, structure, and biological activity of heparin-like oligosaccharides.

    PubMed

    de Paz, J L; Angulo, J; Lassaletta, J M; Nieto, P M; Redondo-Horcajo, M; Lozano, R M; Giménez-Gallego, G; Martín-Lomas, M

    2001-09-03

    An effective strategy has been designed for the synthesis of oligosaccharides of different sizes structurally related to the regular region of heparin; this is illustrated by the preparation of hexasaccharide 1 and octasaccharide 2. This synthetic strategy provides the oligosaccharide sequence containing a D-glucosamine unit at the nonreducing end that is not available either by enzymatic or chemical degradation of heparin. It may permit, after slight modifications, the preparation of oligosaccharide fragments with different charge distribution as well. NMR spectroscopy and molecular dynamics simulations have shown that the overall structure of 1 in solution is a stable right-hand helix with four residues per turn. Hexasaccharide 1 and, most likely, octasaccharide 2 are, therefore, chemically well-defined structural models of naturally occurring heparin-like oligosaccharides for use in binding and biological activity studies. Both compounds 1 and 2 induce the mitogenic activity of acid fibroblast growth factor (FGF1), with the half-maximum activating concentration of 2 being equivalent to that of heparin. Sedimentation equilibrium analysis with compound 2 suggests that heparin-induced FGF1 dimerization is not an absolute requirement for biological activity.

  5. Chemometric analysis of Hymenoptera toxins and defensins: A model for predicting the biological activity of novel peptides from venoms and hemolymph.

    PubMed

    Saidemberg, Daniel M; Baptista-Saidemberg, Nicoli B; Palma, Mario S

    2011-09-01

    When searching for prospective novel peptides, it is difficult to determine the biological activity of a peptide based only on its sequence. The "trial and error" approach is generally laborious, expensive and time consuming due to the large number of different experimental setups required to cover a reasonable number of biological assays. To simulate a virtual model for Hymenoptera insects, 166 peptides were selected from the venoms and hemolymphs of wasps, bees and ants and applied to a mathematical model of multivariate analysis, with nine different chemometric components: GRAVY, aliphaticity index, number of disulfide bonds, total residues, net charge, pI value, Boman index, percentage of alpha helix, and flexibility prediction. Principal component analysis (PCA) with non-linear iterative projections by alternating least-squares (NIPALS) algorithm was performed, without including any information about the biological activity of the peptides. This analysis permitted the grouping of peptides in a way that strongly correlated to the biological function of the peptides. Six different groupings were observed, which seemed to correspond to the following groups: chemotactic peptides, mastoparans, tachykinins, kinins, antibiotic peptides, and a group of long peptides with one or two disulfide bonds and with biological activities that are not yet clearly defined. The partial overlap between the mastoparans group and the chemotactic peptides, tachykinins, kinins and antibiotic peptides in the PCA score plot may be used to explain the frequent reports in the literature about the multifunctionality of some of these peptides. The mathematical model used in the present investigation can be used to predict the biological activities of novel peptides in this system, and it may also be easily applied to other biological systems. Copyright © 2011 Elsevier Inc. All rights reserved.

  6. MATLAB Algorithms for Rapid Detection and Embedding of Palindrome and Emordnilap Electronic Watermarks in Simulated Chemical and Biological Image Data

    DTIC Science & Technology

    2004-11-16

    MATLAB Algorithms for Rapid Detection and Embedding of Palindrome and Emordnilap Electronic Watermarks in Simulated Chemical and Biological Image ...and Emordnilap Electronic Watermarks in Simulated Chemical and Biological Image Data 5a. CONTRACT NUMBER 5b. GRANT NUMBER 5c. PROGRAM ELEMENT...Conference on Chemical and Biological Defense Research. Held in Hunt Valley, Maryland on 15-17 November 2004., The original document contains color images

  7. High performance hybrid functional Petri net simulations of biological pathway models on CUDA.

    PubMed

    Chalkidis, Georgios; Nagasaki, Masao; Miyano, Satoru

    2011-01-01

    Hybrid functional Petri nets are a wide-spread tool for representing and simulating biological models. Due to their potential of providing virtual drug testing environments, biological simulations have a growing impact on pharmaceutical research. Continuous research advancements in biology and medicine lead to exponentially increasing simulation times, thus raising the demand for performance accelerations by efficient and inexpensive parallel computation solutions. Recent developments in the field of general-purpose computation on graphics processing units (GPGPU) enabled the scientific community to port a variety of compute intensive algorithms onto the graphics processing unit (GPU). This work presents the first scheme for mapping biological hybrid functional Petri net models, which can handle both discrete and continuous entities, onto compute unified device architecture (CUDA) enabled GPUs. GPU accelerated simulations are observed to run up to 18 times faster than sequential implementations. Simulating the cell boundary formation by Delta-Notch signaling on a CUDA enabled GPU results in a speedup of approximately 7x for a model containing 1,600 cells.

  8. Ancient dna from pleistocene fossils: Preservation, recovery, and utility of ancient genetic information for quaternary research

    NASA Astrophysics Data System (ADS)

    Yang, Hong

    Until recently, recovery and analysis of genetic information encoded in ancient DNA sequences from Pleistocene fossils were impossible. Recent advances in molecular biology offered technical tools to obtain ancient DNA sequences from well-preserved Quaternary fossils and opened the possibilities to directly study genetic changes in fossil species to address various biological and paleontological questions. Ancient DNA studies involving Pleistocene fossil material and ancient DNA degradation and preservation in Quaternary deposits are reviewed. The molecular technology applied to isolate, amplify, and sequence ancient DNA is also presented. Authentication of ancient DNA sequences and technical problems associated with modern and ancient DNA contamination are discussed. As illustrated in recent studies on ancient DNA from proboscideans, it is apparent that fossil DNA sequence data can shed light on many aspects of Quaternary research such as systematics and phylogeny. conservation biology, evolutionary theory, molecular taphonomy, and forensic sciences. Improvement of molecular techniques and a better understanding of DNA degradation during fossilization are likely to build on current strengths and to overcome existing problems, making fossil DNA data a unique source of information for Quaternary scientists.

  9. The use of museum specimens with high-throughput DNA sequencers

    PubMed Central

    Burrell, Andrew S.; Disotell, Todd R.; Bergey, Christina M.

    2015-01-01

    Natural history collections have long been used by morphologists, anatomists, and taxonomists to probe the evolutionary process and describe biological diversity. These biological archives also offer great opportunities for genetic research in taxonomy, conservation, systematics, and population biology. They allow assays of past populations, including those of extinct species, giving context to present patterns of genetic variation and direct measures of evolutionary processes. Despite this potential, museum specimens are difficult to work with because natural postmortem processes and preservation methods fragment and damage DNA. These problems have restricted geneticists’ ability to use natural history collections primarily by limiting how much of the genome can be surveyed. Recent advances in DNA sequencing technology, however, have radically changed this, making truly genomic studies from museum specimens possible. We review the opportunities and drawbacks of the use of museum specimens, and suggest how to best execute projects when incorporating such samples. Several high-throughput (HT) sequencing methodologies, including whole genome shotgun sequencing, sequence capture, and restriction digests (demonstrated here), can be used with archived biomaterials. PMID:25532801

  10. Indigenous and introduced potyviruses of legumes and Passiflora spp. from Australia: biological properties and comparison of coat protein sequences

    USDA-ARS?s Scientific Manuscript database

    Coat protein sequences of 33 Potyvirus isolates from legume and Passiflora spp. were sequenced to determine the identity of infecting viruses. Phylogenetic analysis of the sequences revealed the presence of seven distinct virus species....

  11. A Novel Cylindrical Representation for Characterizing Intrinsic Properties of Protein Sequences.

    PubMed

    Yu, Jia-Feng; Dou, Xiang-Hua; Wang, Hong-Bo; Sun, Xiao; Zhao, Hui-Ying; Wang, Ji-Hua

    2015-06-22

    The composition and sequence order of amino acid residues are the two most important characteristics to describe a protein sequence. Graphical representations facilitate visualization of biological sequences and produce biologically useful numerical descriptors. In this paper, we propose a novel cylindrical representation by placing the 20 amino acid residue types in a circle and sequence positions along the z axis. This representation allows visualization of the composition and sequence order of amino acids at the same time. Ten numerical descriptors and one weighted numerical descriptor have been developed to quantitatively describe intrinsic properties of protein sequences on the basis of the cylindrical model. Their applications to similarity/dissimilarity analysis of nine ND5 proteins indicated that these numerical descriptors are more effective than several classical numerical matrices. Thus, the cylindrical representation obtained here provides a new useful tool for visualizing and charactering protein sequences. An online server is available at http://biophy.dzu.edu.cn:8080/CNumD/input.jsp .

  12. Correlation between MCAT biology content specifications and topic scope and sequence of general education college biology textbooks.

    PubMed

    Rissing, Steven W

    2013-01-01

    Most American colleges and universities offer gateway biology courses to meet the needs of three undergraduate audiences: biology and related science majors, many of whom will become biomedical researchers; premedical students meeting medical school requirements and preparing for the Medical College Admissions Test (MCAT); and students completing general education (GE) graduation requirements. Biology textbooks for these three audiences present a topic scope and sequence that correlates with the topic scope and importance ratings of the biology content specifications for the MCAT regardless of the intended audience. Texts for "nonmajors," GE courses appear derived directly from their publisher's majors text. Topic scope and sequence of GE texts reflect those of "their" majors text and, indirectly, the MCAT. MCAT term density of GE texts equals or exceeds that of their corresponding majors text. Most American universities require a GE curriculum to promote a core level of academic understanding among their graduates. This includes civic scientific literacy, recognized as an essential competence for the development of public policies in an increasingly scientific and technological world. Deriving GE biology and related science texts from majors texts designed to meet very different learning objectives may defeat the scientific literacy goals of most schools' GE curricula.

  13. Correlation between MCAT Biology Content Specifications and Topic Scope and Sequence of General Education College Biology Textbooks

    PubMed Central

    Rissing, Steven W.

    2013-01-01

    Most American colleges and universities offer gateway biology courses to meet the needs of three undergraduate audiences: biology and related science majors, many of whom will become biomedical researchers; premedical students meeting medical school requirements and preparing for the Medical College Admissions Test (MCAT); and students completing general education (GE) graduation requirements. Biology textbooks for these three audiences present a topic scope and sequence that correlates with the topic scope and importance ratings of the biology content specifications for the MCAT regardless of the intended audience. Texts for “nonmajors,” GE courses appear derived directly from their publisher's majors text. Topic scope and sequence of GE texts reflect those of “their” majors text and, indirectly, the MCAT. MCAT term density of GE texts equals or exceeds that of their corresponding majors text. Most American universities require a GE curriculum to promote a core level of academic understanding among their graduates. This includes civic scientific literacy, recognized as an essential competence for the development of public policies in an increasingly scientific and technological world. Deriving GE biology and related science texts from majors texts designed to meet very different learning objectives may defeat the scientific literacy goals of most schools’ GE curricula. PMID:24006392

  14. Hemoglobin mass and biological passport for the detection of autologous blood doping.

    PubMed

    Pottgiesser, Torben; Echteler, Tobias; Sottas, Pierre-Edouard; Umhau, Markus; Schumacher, Yorck Olaf

    2012-05-01

    The most promising attempt to reveal otherwise undetectable autologous blood doping is the Athlete Biological Passport enabling a longitudinal monitoring of hematological measures. Recently, the determination of hemoglobin mass (tHb) was suggested to be incorporated in the adaptive model of the Athlete Biological Passport. The purpose therefore was to evaluate the performance of tHb as part of the adaptive model for the detection of autologous blood transfusions in a longitudinal blinded study. Twenty-one subjects were divided into a doped group (n = 11) and a control group (n = 10). During the time course of a simulated cycling season (42 wk) including three major competitions (Classics, Grand Tour, World Championships), multiple autologous transfusions of erythrocyte concentrates were assigned in the doped group. A blinded investigator ordered up to 10 tHb measurements (carbon monoxide rebreathing) per subject, mimicking an intelligent doping testing approach in obtaining hematological data (tHb, OFFmass (novel marker including reticulocytes), and respective sequences) for the adaptive model. The final analysis included 199 of 206 overall tHb measurements. The use of tHb, OFFmass, and their sequences as markers of the adaptive model at the 99% specificity level allowed identification of 10 of 11 doped subjects (91% sensitivity) including one false positive in the control group. At the 99.9% specificity level, 8 of 11 subjects were identified without false positives (73% sensitivity). It seems that the problems of tHb determination by carbon monoxide rebreathing limit the application of this method in antidoping. Because of its potential to detect individual abnormalities associated with autologous blood transfusions shown in this study, a method for tHb determination that is compatible with today's standards of testing should be the focus of future research.

  15. CORNAS: coverage-dependent RNA-Seq analysis of gene expression data without biological replicates.

    PubMed

    Low, Joel Z B; Khang, Tsung Fei; Tammi, Martti T

    2017-12-28

    In current statistical methods for calling differentially expressed genes in RNA-Seq experiments, the assumption is that an adjusted observed gene count represents an unknown true gene count. This adjustment usually consists of a normalization step to account for heterogeneous sample library sizes, and then the resulting normalized gene counts are used as input for parametric or non-parametric differential gene expression tests. A distribution of true gene counts, each with a different probability, can result in the same observed gene count. Importantly, sequencing coverage information is currently not explicitly incorporated into any of the statistical models used for RNA-Seq analysis. We developed a fast Bayesian method which uses the sequencing coverage information determined from the concentration of an RNA sample to estimate the posterior distribution of a true gene count. Our method has better or comparable performance compared to NOISeq and GFOLD, according to the results from simulations and experiments with real unreplicated data. We incorporated a previously unused sequencing coverage parameter into a procedure for differential gene expression analysis with RNA-Seq data. Our results suggest that our method can be used to overcome analytical bottlenecks in experiments with limited number of replicates and low sequencing coverage. The method is implemented in CORNAS (Coverage-dependent RNA-Seq), and is available at https://github.com/joel-lzb/CORNAS .

  16. Visualization in simulation tools: requirements and a tool specification to support the teaching of dynamic biological processes.

    PubMed

    Jørgensen, Katarina M; Haddow, Pauline C

    2011-08-01

    Simulation tools are playing an increasingly important role behind advances in the field of systems biology. However, the current generation of biological science students has either little or no experience with such tools. As such, this educational glitch is limiting both the potential use of such tools as well as the potential for tighter cooperation between the designers and users. Although some simulation tool producers encourage their use in teaching, little attempt has hitherto been made to analyze and discuss their suitability as an educational tool for noncomputing science students. In general, today's simulation tools assume that the user has a stronger mathematical and computing background than that which is found in most biological science curricula, thus making the introduction of such tools a considerable pedagogical challenge. This paper provides an evaluation of the pedagogical attributes of existing simulation tools for cell signal transduction based on Cognitive Load theory. Further, design recommendations for an improved educational simulation tool are provided. The study is based on simulation tools for cell signal transduction. However, the discussions are relevant to a broader biological simulation tool set.

  17. Linking genotype to phenotype in a changing ocean: inferring the genomic architecture of a blue mussel stress response with genome-wide association.

    PubMed

    Kingston, S E; Martino, P; Melendy, M; Reed, F A; Carlon, D B

    2018-03-01

    A key component to understanding the evolutionary response to a changing climate is linking underlying genetic variation to phenotypic variation in stress response. Here, we use a genome-wide association approach (GWAS) to understand the genetic architecture of calcification rates under simulated climate stress. We take advantage of the genomic gradient across the blue mussel hybrid zone (Mytilus edulis and Mytilus trossulus) in the Gulf of Maine (GOM) to link genetic variation with variance in calcification rates in response to simulated climate change. Falling calcium carbonate saturation states are predicted to negatively impact many marine organisms that build calcium carbonate shells - like blue mussels. We sampled wild mussels and measured net calcification phenotypes after exposing mussels to a 'climate change' common garden, where we raised temperature by 3°C, decreased pH by 0.2 units and limited food supply by filtering out planktonic particles >5 μm, compared to ambient GOM conditions in the summer. This climate change exposure greatly increased phenotypic variation in net calcification rates compared to ambient conditions. We then used regression models to link the phenotypic variation with over 170 000 single nucleotide polymorphism loci (SNPs) generated by genotype by sequencing to identify genomic locations associated with calcification phenotype, and estimate heritability and architecture of the trait. We identified at least one of potentially 2-10 genomic regions responsible for 30% of the phenotypic variation in calcification rates that are potential targets of natural selection by climate change. Our simulations suggest a power of 13.7% with our study's average effective sample size of 118 individuals and rare alleles, but a power of >90% when effective sample size is 900. © 2017 European Society For Evolutionary Biology. Journal of Evolutionary Biology © 2017 European Society For Evolutionary Biology.

  18. Molecular dynamics simulations and applications in computational toxicology and nanotoxicology.

    PubMed

    Selvaraj, Chandrabose; Sakkiah, Sugunadevi; Tong, Weida; Hong, Huixiao

    2018-02-01

    Nanotoxicology studies toxicity of nanomaterials and has been widely applied in biomedical researches to explore toxicity of various biological systems. Investigating biological systems through in vivo and in vitro methods is expensive and time taking. Therefore, computational toxicology, a multi-discipline field that utilizes computational power and algorithms to examine toxicology of biological systems, has gained attractions to scientists. Molecular dynamics (MD) simulations of biomolecules such as proteins and DNA are popular for understanding of interactions between biological systems and chemicals in computational toxicology. In this paper, we review MD simulation methods, protocol for running MD simulations and their applications in studies of toxicity and nanotechnology. We also briefly summarize some popular software tools for execution of MD simulations. Published by Elsevier Ltd.

  19. Simbios: an NIH national center for physics-based simulation of biological structures.

    PubMed

    Delp, Scott L; Ku, Joy P; Pande, Vijay S; Sherman, Michael A; Altman, Russ B

    2012-01-01

    Physics-based simulation provides a powerful framework for understanding biological form and function. Simulations can be used by biologists to study macromolecular assemblies and by clinicians to design treatments for diseases. Simulations help biomedical researchers understand the physical constraints on biological systems as they engineer novel drugs, synthetic tissues, medical devices, and surgical interventions. Although individual biomedical investigators make outstanding contributions to physics-based simulation, the field has been fragmented. Applications are typically limited to a single physical scale, and individual investigators usually must create their own software. These conditions created a major barrier to advancing simulation capabilities. In 2004, we established a National Center for Physics-Based Simulation of Biological Structures (Simbios) to help integrate the field and accelerate biomedical research. In 6 years, Simbios has become a vibrant national center, with collaborators in 16 states and eight countries. Simbios focuses on problems at both the molecular scale and the organismal level, with a long-term goal of uniting these in accurate multiscale simulations.

  20. A program code generator for multiphysics biological simulation using markup languages.

    PubMed

    Amano, Akira; Kawabata, Masanari; Yamashita, Yoshiharu; Rusty Punzalan, Florencio; Shimayoshi, Takao; Kuwabara, Hiroaki; Kunieda, Yoshitoshi

    2012-01-01

    To cope with the complexity of the biological function simulation models, model representation with description language is becoming popular. However, simulation software itself becomes complex in these environment, thus, it is difficult to modify the simulation conditions, target computation resources or calculation methods. In the complex biological function simulation software, there are 1) model equations, 2) boundary conditions and 3) calculation schemes. Use of description model file is useful for first point and partly second point, however, third point is difficult to handle for various calculation schemes which is required for simulation models constructed from two or more elementary models. We introduce a simulation software generation system which use description language based description of coupling calculation scheme together with cell model description file. By using this software, we can easily generate biological simulation code with variety of coupling calculation schemes. To show the efficiency of our system, example of coupling calculation scheme with three elementary models are shown.

  1. Simbios: an NIH national center for physics-based simulation of biological structures

    PubMed Central

    Delp, Scott L; Ku, Joy P; Pande, Vijay S; Sherman, Michael A

    2011-01-01

    Physics-based simulation provides a powerful framework for understanding biological form and function. Simulations can be used by biologists to study macromolecular assemblies and by clinicians to design treatments for diseases. Simulations help biomedical researchers understand the physical constraints on biological systems as they engineer novel drugs, synthetic tissues, medical devices, and surgical interventions. Although individual biomedical investigators make outstanding contributions to physics-based simulation, the field has been fragmented. Applications are typically limited to a single physical scale, and individual investigators usually must create their own software. These conditions created a major barrier to advancing simulation capabilities. In 2004, we established a National Center for Physics-Based Simulation of Biological Structures (Simbios) to help integrate the field and accelerate biomedical research. In 6 years, Simbios has become a vibrant national center, with collaborators in 16 states and eight countries. Simbios focuses on problems at both the molecular scale and the organismal level, with a long-term goal of uniting these in accurate multiscale simulations. PMID:22081222

  2. Whole-genome sequencing approaches for conservation biology: Advantages, limitations and practical recommendations.

    PubMed

    Fuentes-Pardo, Angela P; Ruzzante, Daniel E

    2017-10-01

    Whole-genome resequencing (WGR) is a powerful method for addressing fundamental evolutionary biology questions that have not been fully resolved using traditional methods. WGR includes four approaches: the sequencing of individuals to a high depth of coverage with either unresolved or resolved haplotypes, the sequencing of population genomes to a high depth by mixing equimolar amounts of unlabelled-individual DNA (Pool-seq) and the sequencing of multiple individuals from a population to a low depth (lcWGR). These techniques require the availability of a reference genome. This, along with the still high cost of shotgun sequencing and the large demand for computing resources and storage, has limited their implementation in nonmodel species with scarce genomic resources and in fields such as conservation biology. Our goal here is to describe the various WGR methods, their pros and cons and potential applications in conservation biology. WGR offers an unprecedented marker density and surveys a wide diversity of genetic variations not limited to single nucleotide polymorphisms (e.g., structural variants and mutations in regulatory elements), increasing their power for the detection of signatures of selection and local adaptation as well as for the identification of the genetic basis of phenotypic traits and diseases. Currently, though, no single WGR approach fulfils all requirements of conservation genetics, and each method has its own limitations and sources of potential bias. We discuss proposed ways to minimize such biases. We envision a not distant future where the analysis of whole genomes becomes a routine task in many nonmodel species and fields including conservation biology. © 2017 John Wiley & Sons Ltd.

  3. From non-random molecular structure to life and mind

    NASA Technical Reports Server (NTRS)

    Fox, S. W.

    1989-01-01

    The evolutionary hierarchy molecular structure-->macromolecular structure-->protobiological structure-->biological structure-->biological functions has been traced by experiments. The sequence always moves through protein. Extension of the experiments traces the formation of nucleic acids instructed by proteins. The proteins themselves were, in this picture, instructed by the self-sequencing of precursor amino acids. While the sequence indicated explains the thread of the emergence of life, protein in cellular membrane also provides the only known material basis for the emergence of mind in the context of emergence of life.

  4. Lineage-Specific Biology Revealed by a Finished Genome Assembly of the Mouse

    PubMed Central

    Hillier, LaDeana W.; Zody, Michael C.; Goldstein, Steve; She, Xinwe; Bult, Carol J.; Agarwala, Richa; Cherry, Joshua L.; DiCuccio, Michael; Hlavina, Wratko; Kapustin, Yuri; Meric, Peter; Maglott, Donna; Birtle, Zoë; Marques, Ana C.; Graves, Tina; Zhou, Shiguo; Teague, Brian; Potamousis, Konstantinos; Churas, Christopher; Place, Michael; Herschleb, Jill; Runnheim, Ron; Forrest, Daniel; Amos-Landgraf, James; Schwartz, David C.; Cheng, Ze; Lindblad-Toh, Kerstin; Eichler, Evan E.; Ponting, Chris P.

    2009-01-01

    The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more of novel sequence, compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non–protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into the reasons for its resistance to sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be largely composed of sequence regions containing transposable elements and duplicated protein-coding genes; of these, some may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly-changing gene families associated with reproductive functions. The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows the delineation of ancestral biological functions that are shared with human from derived functions that are not. PMID:19468303

  5. Genome Sequences of Three Strains of Aspergillus flavus for the Biological Control of Aflatoxin

    PubMed Central

    Scheffler, Brian E.; Duke, Mary; Ballard, Linda; Abbas, Hamed K.; Grodowitz, Michael J.

    2017-01-01

    ABSTRACT Aflatoxin is a carcinogenic contaminant of many commodities that are infected by Aspergillus flavus. Nonaflatoxigenic strains of A. flavus have been utilized as biological control agents. Here, we report the genome sequences from three biocontrol strains. This information will be useful in developing markers for postrelease monitoring of these fungi. PMID:29097466

  6. The Effect of Field-Dependence-Independence and Instructional Sequence on the Achievement of High School Biology Students.

    ERIC Educational Resources Information Center

    Douglass, Claudia B.

    The primary purpose of the reported study was to identify a possible interaction between the cognitive style of the students and the instructional sequence of the materials and their combined effect on achievement. The subjects were 627 biology students from six midwestern high schools. The students were ranked and classified as field-dependent…

  7. Meta-analysis of RNA-Seq data across cohorts in a multi-season feed efficiency study of crossbred beef steers accounts for biological and technical variability within season

    USDA-ARS?s Scientific Manuscript database

    High-throughput sequencing is often used for studies of the transcriptome, particularly for comparisons between experimental conditions. Due to sequencing costs, a limited number of biological replicates are typically considered in such experiments, leading to low detection power for differential ex...

  8. Whole-Exome Sequencing to Identify Novel Biological Pathways Associated With Infertility After Pelvic Inflammatory Disease.

    PubMed

    Taylor, Brandie D; Zheng, Xiaojing; Darville, Toni; Zhong, Wujuan; Konganti, Kranti; Abiodun-Ojo, Olayinka; Ness, Roberta B; O'Connell, Catherine M; Haggerty, Catherine L

    2017-01-01

    Ideal management of sexually transmitted infections (STI) may require risk markers for pathology or vaccine development. Previously, we identified common genetic variants associated with chlamydial pelvic inflammatory disease (PID) and reduced fecundity. As this explains only a proportion of the long-term morbidity risk, we used whole-exome sequencing to identify biological pathways that may be associated with STI-related infertility. We obtained stored DNA from 43 non-Hispanic black women with PID from the PID Evaluation and Clinical Health Study. Infertility was assessed at a mean of 84 months. Principal component analysis revealed no population stratification. Potential covariates did not significantly differ between groups. Sequencing kernel association test was used to examine associations between aggregates of variants on a single gene and infertility. The results from the sequencing kernel association test were used to choose "focus genes" (P < 0.01; n = 150) for subsequent Ingenuity Pathway Analysis to identify "gene sets" that are enriched in biologically relevant pathways. Pathway analysis revealed that focus genes were enriched in canonical pathways including, IL-1 signaling, P2Y purinergic receptor signaling, and bone morphogenic protein signaling. Focus genes were enriched in pathways that impact innate and adaptive immunity, protein kinase A activity, cellular growth, and DNA repair. These may alter host resistance or immunopathology after infection. Targeted sequencing of biological pathways identified in this study may provide insight into STI-related infertility.

  9. Heuristic Identification of Biological Architectures for Simulating Complex Hierarchical Genetic Interactions

    PubMed Central

    Moore, Jason H; Amos, Ryan; Kiralis, Jeff; Andrews, Peter C

    2015-01-01

    Simulation plays an essential role in the development of new computational and statistical methods for the genetic analysis of complex traits. Most simulations start with a statistical model using methods such as linear or logistic regression that specify the relationship between genotype and phenotype. This is appealing due to its simplicity and because these statistical methods are commonly used in genetic analysis. It is our working hypothesis that simulations need to move beyond simple statistical models to more realistically represent the biological complexity of genetic architecture. The goal of the present study was to develop a prototype genotype–phenotype simulation method and software that are capable of simulating complex genetic effects within the context of a hierarchical biology-based framework. Specifically, our goal is to simulate multilocus epistasis or gene–gene interaction where the genetic variants are organized within the framework of one or more genes, their regulatory regions and other regulatory loci. We introduce here the Heuristic Identification of Biological Architectures for simulating Complex Hierarchical Interactions (HIBACHI) method and prototype software for simulating data in this manner. This approach combines a biological hierarchy, a flexible mathematical framework, a liability threshold model for defining disease endpoints, and a heuristic search strategy for identifying high-order epistatic models of disease susceptibility. We provide several simulation examples using genetic models exhibiting independent main effects and three-way epistatic effects. PMID:25395175

  10. antaRNA: ant colony-based RNA sequence design.

    PubMed

    Kleinkauf, Robert; Mann, Martin; Backofen, Rolf

    2015-10-01

    RNA sequence design is studied at least as long as the classical folding problem. Although for the latter the functional fold of an RNA molecule is to be found ,: inverse folding tries to identify RNA sequences that fold into a function-specific target structure. In combination with RNA-based biotechnology and synthetic biology ,: reliable RNA sequence design becomes a crucial step to generate novel biochemical components. In this article ,: the computational tool antaRNA is presented. It is capable of compiling RNA sequences for a given structure that comply in addition with an adjustable full range objective GC-content distribution ,: specific sequence constraints and additional fuzzy structure constraints. antaRNA applies ant colony optimization meta-heuristics and its superior performance is shown on a biological datasets. http://www.bioinf.uni-freiburg.de/Software/antaRNA CONTACT: backofen@informatik.uni-freiburg.de Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  11. Reproducibility of Illumina platform deep sequencing errors allows accurate determination of DNA barcodes in cells.

    PubMed

    Beltman, Joost B; Urbanus, Jos; Velds, Arno; van Rooij, Nienke; Rohr, Jan C; Naik, Shalin H; Schumacher, Ton N

    2016-04-02

    Next generation sequencing (NGS) of amplified DNA is a powerful tool to describe genetic heterogeneity within cell populations that can both be used to investigate the clonal structure of cell populations and to perform genetic lineage tracing. For applications in which both abundant and rare sequences are biologically relevant, the relatively high error rate of NGS techniques complicates data analysis, as it is difficult to distinguish rare true sequences from spurious sequences that are generated by PCR or sequencing errors. This issue, for instance, applies to cellular barcoding strategies that aim to follow the amount and type of offspring of single cells, by supplying these with unique heritable DNA tags. Here, we use genetic barcoding data from the Illumina HiSeq platform to show that straightforward read threshold-based filtering of data is typically insufficient to filter out spurious barcodes. Importantly, we demonstrate that specific sequencing errors occur at an approximately constant rate across different samples that are sequenced in parallel. We exploit this observation by developing a novel approach to filter out spurious sequences. Application of our new method demonstrates its value in the identification of true sequences amongst spurious sequences in biological data sets.

  12. Simulations in Medicine and Biology: Insights and perspectives

    NASA Astrophysics Data System (ADS)

    Spyrou, George M.

    2015-01-01

    Modern medicine and biology have been transformed into quantitative sciences of high complexity, with challenging objectives. The aims of medicine are related to early diagnosis, effective therapy, accurate intervention, real time monitoring, procedures/systems/instruments optimization, error reduction, and knowledge extraction. Concurrently, following the explosive production of biological data concerning DNA, RNA, and protein biomolecules, a plethora of questions has been raised in relation to their structure and function, the interactions between them, their relationships and dependencies, their regulation and expression, their location, and their thermodynamic characteristics. Furthermore, the interplay between medicine and biology gives rise to fields like molecular medicine and systems biology which are further interconnected with physics, mathematics, informatics, and engineering. Modelling and simulation is a powerful tool in the fields of Medicine and Biology. Simulating the phenomena hidden inside a diagnostic or therapeutic medical procedure, we are able to obtain control on the whole system and perform multilevel optimization. Furthermore, modelling and simulation gives insights in the various scales of biological representation, facilitating the understanding of the huge amounts of derived data and the related mechanisms behind them. Several examples, as well as the insights and the perspectives of simulations in biomedicine will be presented.

  13. Climate matching: implications for the biological control of hemlock woolly adelgid

    Treesearch

    R. Talbot III Trotter

    2008-01-01

    Classical biological control programs are faced with a daunting challenge: inserting a new species into an existing ecological system. In order for the newly introduced biological control species to survive and reproduce, the recipient ecosystem must provide the required biotic and abiotic requirements. The Adelgid Biological Control simulator (ABCs), a simulation...

  14. Assembling in Sequence: A Saleable Work Skill. Occupation Simulation Packet. Grades 3rd-4th.

    ERIC Educational Resources Information Center

    Hueston, Jean

    This teacher's guide for grades 3 and 4 contains simulated work experiences for students using the isolated skill concept - assembling in sequence. Teacher instructions include objectives, evaluation, and sequence of activities. The guide contains pre-tests and post-tests with instructions and answer keys. Three pre-skill activities are suggested,…

  15. Integrating Mathematics into the Introductory Biology Laboratory Course

    ERIC Educational Resources Information Center

    White, James D.; Carpenter, Jenna P.

    2008-01-01

    Louisiana Tech University has an integrated science curriculum for its mathematics, chemistry, physics, computer science, biology-research track and secondary mathematics and science education majors. The curriculum focuses on the calculus sequence and introductory labs in biology, physics, and chemistry. In the introductory biology laboratory…

  16. YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs

    PubMed Central

    Shigematsu, Megumi; Honda, Shozo; Loher, Phillipe; Telonis, Aristeidis G.; Rigoutsos, Isidore

    2017-01-01

    Abstract Besides translation, transfer RNAs (tRNAs) play many non-canonical roles in various biological pathways and exhibit highly variable expression profiles. To unravel the emerging complexities of tRNA biology and molecular mechanisms underlying them, an efficient tRNA sequencing method is required. However, the rigid structure of tRNA has been presenting a challenge to the development of such methods. We report the development of Y-shaped Adapter-ligated MAture TRNA sequencing (YAMAT-seq), an efficient and convenient method for high-throughput sequencing of mature tRNAs. YAMAT-seq circumvents the issue of inefficient adapter ligation, a characteristic of conventional RNA sequencing methods for mature tRNAs, by employing the efficient and specific ligation of Y-shaped adapter to mature tRNAs using T4 RNA Ligase 2. Subsequent cDNA amplification and next-generation sequencing successfully yield numerous mature tRNA sequences. YAMAT-seq has high specificity for mature tRNAs and high sensitivity to detect most isoacceptors from minute amount of total RNA. Moreover, YAMAT-seq shows quantitative capability to estimate expression levels of mature tRNAs, and has high reproducibility and broad applicability for various cell lines. YAMAT-seq thus provides high-throughput technique for identifying tRNA profiles and their regulations in various transcriptomes, which could play important regulatory roles in translation and other biological processes. PMID:28108659

  17. From cheek swabs to consensus sequences: an A to Z protocol for high-throughput DNA sequencing of complete human mitochondrial genomes

    PubMed Central

    2014-01-01

    Background Next-generation DNA sequencing (NGS) technologies have made huge impacts in many fields of biological research, but especially in evolutionary biology. One area where NGS has shown potential is for high-throughput sequencing of complete mtDNA genomes (of humans and other animals). Despite the increasing use of NGS technologies and a better appreciation of their importance in answering biological questions, there remain significant obstacles to the successful implementation of NGS-based projects, especially for new users. Results Here we present an ‘A to Z’ protocol for obtaining complete human mitochondrial (mtDNA) genomes – from DNA extraction to consensus sequence. Although designed for use on humans, this protocol could also be used to sequence small, organellar genomes from other species, and also nuclear loci. This protocol includes DNA extraction, PCR amplification, fragmentation of PCR products, barcoding of fragments, sequencing using the 454 GS FLX platform, and a complete bioinformatics pipeline (primer removal, reference-based mapping, output of coverage plots and SNP calling). Conclusions All steps in this protocol are designed to be straightforward to implement, especially for researchers who are undertaking next-generation sequencing for the first time. The molecular steps are scalable to large numbers (hundreds) of individuals and all steps post-DNA extraction can be carried out in 96-well plate format. Also, the protocol has been assembled so that individual ‘modules’ can be swapped out to suit available resources. PMID:24460871

  18. Production of a full-length infectious GFP-tagged cDNA clone of Beet mild yellowing virus for the study of plant-polerovirus interactions.

    PubMed

    Stevens, Mark; Viganó, Felicita

    2007-04-01

    The full-length cDNA of Beet mild yellowing virus (Broom's Barn isolate) was sequenced and cloned into the vector pLitmus 29 (pBMYV-BBfl). The sequence of BMYV-BBfl (5721 bases) shared 96% and 98% nucleotide identity with the other complete sequences of BMYV (BMYV-2ITB, France and BMYV-IPP, Germany respectively). Full-length capped RNA transcripts of pBMYV-BBfl were synthesised and found to be biologically active in Arabidopsis thaliana protoplasts following electroporation or PEG inoculation when the protoplasts were subsequently analysed using serological and molecular methods. The BMYV sequence was modified by inserting DNA that encoded the jellyfish green fluorescent protein (GFP) into the P5 gene close to its 3' end. A. thaliana protoplasts electroporated with these RNA transcripts were biologically active and up to 2% of transfected protoplasts showed GFP-specific fluorescence. The exploitation of these cDNA clones for the study of the biology of beet poleroviruses is discussed.

  19. [Big Data Revolution or Data Hubris? : On the Data Positivism of Molecular Biology].

    PubMed

    Gramelsberger, Gabriele

    2017-12-01

    Genome data, the core of the 2008 proclaimed big data revolution in biology, are automatically generated and analyzed. The transition from the manual laboratory practice of electrophoresis sequencing to automated DNA-sequencing machines and software-based analysis programs was completed between 1982 and 1992. This transition facilitated the first data deluge, which was considerably increased by the second and third generation of DNA-sequencers during the 2000s. However, the strategies for evaluating sequence data were also transformed along with this transition. The paper explores both the computational strategies of automation, as well as the data evaluation culture connected with it, in order to provide a complete picture of the complexity of today's data generation and its intrinsic data positivism. This paper is thereby guided by the question, whether this data positivism is the basis of the big data revolution of molecular biology announced today, or it marks the beginning of its data hubris.

  20. A Fast-Time Simulation Tool for Analysis of Airport Arrival Traffic

    NASA Technical Reports Server (NTRS)

    Erzberger, Heinz; Meyn, Larry A.; Neuman, Frank

    2004-01-01

    The basic objective of arrival sequencing in air traffic control automation is to match traffic demand and airport capacity while minimizing delays. The performance of an automated arrival scheduling system, such as the Traffic Management Advisor developed by NASA for the FAA, can be studied by a fast-time simulation that does not involve running expensive and time-consuming real-time simulations. The fast-time simulation models runway configurations, the characteristics of arrival traffic, deviations from predicted arrival times, as well as the arrival sequencing and scheduling algorithm. This report reviews the development of the fast-time simulation method used originally by NASA in the design of the sequencing and scheduling algorithm for the Traffic Management Advisor. The utility of this method of simulation is demonstrated by examining the effect on delays of altering arrival schedules at a hub airport.

  1. RNA folding kinetics using Monte Carlo and Gillespie algorithms.

    PubMed

    Clote, Peter; Bayegan, Amir H

    2018-04-01

    RNA secondary structure folding kinetics is known to be important for the biological function of certain processes, such as the hok/sok system in E. coli. Although linear algebra provides an exact computational solution of secondary structure folding kinetics with respect to the Turner energy model for tiny ([Formula: see text]20 nt) RNA sequences, the folding kinetics for larger sequences can only be approximated by binning structures into macrostates in a coarse-grained model, or by repeatedly simulating secondary structure folding with either the Monte Carlo algorithm or the Gillespie algorithm. Here we investigate the relation between the Monte Carlo algorithm and the Gillespie algorithm. We prove that asymptotically, the expected time for a K-step trajectory of the Monte Carlo algorithm is equal to [Formula: see text] times that of the Gillespie algorithm, where [Formula: see text] denotes the Boltzmann expected network degree. If the network is regular (i.e. every node has the same degree), then the mean first passage time (MFPT) computed by the Monte Carlo algorithm is equal to MFPT computed by the Gillespie algorithm multiplied by [Formula: see text]; however, this is not true for non-regular networks. In particular, RNA secondary structure folding kinetics, as computed by the Monte Carlo algorithm, is not equal to the folding kinetics, as computed by the Gillespie algorithm, although the mean first passage times are roughly correlated. Simulation software for RNA secondary structure folding according to the Monte Carlo and Gillespie algorithms is publicly available, as is our software to compute the expected degree of the network of secondary structures of a given RNA sequence-see http://bioinformatics.bc.edu/clote/RNAexpNumNbors .

  2. Discriminative motif discovery via simulated evolution and random under-sampling.

    PubMed

    Song, Tao; Gu, Hong

    2014-01-01

    Conserved motifs in biological sequences are closely related to their structure and functions. Recently, discriminative motif discovery methods have attracted more and more attention. However, little attention has been devoted to the data imbalance problem, which is one of the main reasons affecting the performance of the discriminative models. In this article, a simulated evolution method is applied to solve the multi-class imbalance problem at the stage of data preprocessing, and at the stage of Hidden Markov Models (HMMs) training, a random under-sampling method is introduced for the imbalance between the positive and negative datasets. It is shown that, in the task of discovering targeting motifs of nine subcellular compartments, the motifs found by our method are more conserved than the methods without considering data imbalance problem and recover the most known targeting motifs from Minimotif Miner and InterPro. Meanwhile, we use the found motifs to predict protein subcellular localization and achieve higher prediction precision and recall for the minority classes.

  3. Hybrid Modeling for Testing Intelligent Software for Lunar-Mars Closed Life Support

    NASA Technical Reports Server (NTRS)

    Malin, Jane T.; Nicholson, Leonard S. (Technical Monitor)

    1999-01-01

    Intelligent software is being developed for closed life support systems with biological components, for human exploration of the Moon and Mars. The intelligent software functions include planning/scheduling, reactive discrete control and sequencing, management of continuous control, and fault detection, diagnosis, and management of failures and errors. Four types of modeling information have been essential to system modeling and simulation to develop and test the software and to provide operational model-based what-if analyses: discrete component operational and failure modes; continuous dynamic performance within component modes, modeled qualitatively or quantitatively; configuration of flows and power among components in the system; and operations activities and scenarios. CONFIG, a multi-purpose discrete event simulation tool that integrates all four types of models for use throughout the engineering and operations life cycle, has been used to model components and systems involved in the production and transfer of oxygen and carbon dioxide in a plant-growth chamber and between that chamber and a habitation chamber with physicochemical systems for gas processing.

  4. An experimentally-informed coarse-grained 3-site-per-nucleotide model of DNA: Structure, thermodynamics, and dynamics of hybridization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hinckley, Daniel M.; Freeman, Gordon S.; Whitmer, Jonathan K.

    2013-10-14

    A new 3-Site-Per-Nucleotide coarse-grained model for DNA is presented. The model includes anisotropic potentials between bases involved in base stacking and base pair interactions that enable the description of relevant structural properties, including the major and minor grooves. In an improvement over available coarse-grained models, the correct persistence length is recovered for both ssDNA and dsDNA, allowing for simulation of non-canonical structures such as hairpins. DNA melting temperatures, measured for duplexes and hairpins by integrating over free energy surfaces generated using metadynamics simulations, are shown to be in quantitative agreement with experiment for a variety of sequences and conditions. Hybridizationmore » rate constants, calculated using forward-flux sampling, are also shown to be in good agreement with experiment. The coarse-grained model presented here is suitable for use in biological and engineering applications, including nucleosome positioning and DNA-templated engineering.« less

  5. Draft sequencing and comparative genomics of Xylella fastidiosa strains reveal novel biological insights.

    PubMed

    Bhattacharyya, Anamitra; Stilwagen, Stephanie; Reznik, Gary; Feil, Helene; Feil, William S; Anderson, Iain; Bernal, Axel; D'Souza, Mark; Ivanova, Natalia; Kapatral, Vinayak; Larsen, Niels; Los, Tamara; Lykidis, Athanasios; Selkov, Eugene; Walunas, Theresa L; Purcell, Alexander; Edwards, Rob A; Hawkins, Trevor; Haselkorn, Robert; Overbeek, Ross; Kyrpides, Nikos C; Predki, Paul F

    2002-10-01

    Draft sequencing is a rapid and efficient method for determining the near-complete sequence of microbial genomes. Here we report a comparative analysis of one complete and two draft genome sequences of the phytopathogenic bacterium, Xylella fastidiosa, which causes serious disease in plants, including citrus, almond, and oleander. We present highlights of an in silico analysis based on a comparison of reconstructions of core biological subsystems. Cellular pathway reconstructions have been used to identify a small number of genes, which are likely to reside within the draft genomes but are not captured in the draft assembly. These represented only a small fraction of all genes and were predominantly large and small ribosomal subunit protein components. By using this approach, some of the inherent limitations of draft sequence can be significantly reduced. Despite the incomplete nature of the draft genomes, it is possible to identify several phage-related genes, which appear to be absent from the draft genomes and not the result of insufficient sequence sampling. This region may therefore identify potential host-specific functions. Based on this first functional reconstruction of a phytopathogenic microbe, we spotlight an unusual respiration machinery as a potential target for biological control. We also predicted and developed a new defined growth medium for Xylella.

  6. Fossils out of sequence: Computer simulations and strategies for dealing with stratigraphic disorder

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cutler, A.H.; Flessa, K.W.

    Microstratigraphic resolution is limited by vertical mixing and reworking of fossils. Stratigraphic disorder is the degree to which fossils within a stratigraphic sequence are not in proper chronological order. Stratigraphic disorder arises through in situ vertical mixing of fossils and reworking of older fossils into younger deposits. The authors simulated the effects of mixing and reworking by simple computer models, and measured stratigraphic disorder using rank correlation between age and stratigraphic position (Spearman and Kendall coefficients). Mixing was simulated by randomly transposing pairs of adjacent fossils in a sequence. Reworking was simulated by randomly inserting older fossils into a youngermore » sequence. Mixing is an inefficient means of producing disorder; after 500 mixing steps stratigraphic order is still significant at the 99% to 95% level, depending on the coefficient used. Reworking disorders sequences very efficiently: significant order begins to be lost when reworked shells make up 35% of the sequence. Thus a sequence can be dominated by undisturbed, autochthonous shells and still be disordered. The effects of mixing-produced disorder can be minimized by increasing sample size at each horizon. Increased spacing between samples is of limited utility in dealing with disordered sequences: while widely separated samples are more likely to be stratigraphically ordered, the smaller number of samples makes the detection of trends problematic.« less

  7. On Designing Multicore-Aware Simulators for Systems Biology Endowed with OnLine Statistics

    PubMed Central

    Calcagno, Cristina; Coppo, Mario

    2014-01-01

    The paper arguments are on enabling methodologies for the design of a fully parallel, online, interactive tool aiming to support the bioinformatics scientists .In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are shown on a simulation tool to perform the modeling, the tuning, and the sensitivity analysis of stochastic biological models. A stochastic simulation needs thousands of independent simulation trajectories turning into big data that should be analysed by statistic and data mining tools. In the considered approach the two stages are pipelined in such a way that the simulation stage streams out the partial results of all simulation trajectories to the analysis stage that immediately produces a partial result. The simulation-analysis workflow is validated for performance and effectiveness of the online analysis in capturing biological systems behavior on a multicore platform and representative proof-of-concept biological systems. The exploited methodologies include pattern-based parallel programming and data streaming that provide key features to the software designers such as performance portability and efficient in-memory (big) data management and movement. Two paradigmatic classes of biological systems exhibiting multistable and oscillatory behavior are used as a testbed. PMID:25050327

  8. On designing multicore-aware simulators for systems biology endowed with OnLine statistics.

    PubMed

    Aldinucci, Marco; Calcagno, Cristina; Coppo, Mario; Damiani, Ferruccio; Drocco, Maurizio; Sciacca, Eva; Spinella, Salvatore; Torquati, Massimo; Troina, Angelo

    2014-01-01

    The paper arguments are on enabling methodologies for the design of a fully parallel, online, interactive tool aiming to support the bioinformatics scientists .In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are shown on a simulation tool to perform the modeling, the tuning, and the sensitivity analysis of stochastic biological models. A stochastic simulation needs thousands of independent simulation trajectories turning into big data that should be analysed by statistic and data mining tools. In the considered approach the two stages are pipelined in such a way that the simulation stage streams out the partial results of all simulation trajectories to the analysis stage that immediately produces a partial result. The simulation-analysis workflow is validated for performance and effectiveness of the online analysis in capturing biological systems behavior on a multicore platform and representative proof-of-concept biological systems. The exploited methodologies include pattern-based parallel programming and data streaming that provide key features to the software designers such as performance portability and efficient in-memory (big) data management and movement. Two paradigmatic classes of biological systems exhibiting multistable and oscillatory behavior are used as a testbed.

  9. Virtual Transgenics: Using a Molecular Biology Simulation to Impact Student Academic Achievement and Attitudes

    NASA Astrophysics Data System (ADS)

    Shegog, Ross; Lazarus, Melanie M.; Murray, Nancy G.; Diamond, Pamela M.; Sessions, Nathalie; Zsigmond, Eva

    2012-10-01

    The transgenic mouse model is useful for studying the causes and potential cures for human genetic diseases. Exposing high school biology students to laboratory experience in developing transgenic animal models is logistically prohibitive. Computer-based simulation, however, offers this potential in addition to advantages of fidelity and reach. This study describes and evaluates a computer-based simulation to train advanced placement high school science students in laboratory protocols, a transgenic mouse model was produced. A simulation module on preparing a gene construct in the molecular biology lab was evaluated using a randomized clinical control design with advanced placement high school biology students in Mercedes, Texas ( n = 44). Pre-post tests assessed procedural and declarative knowledge, time on task, attitudes toward computers for learning and towards science careers. Students who used the simulation increased their procedural and declarative knowledge regarding molecular biology compared to those in the control condition (both p < 0.005). Significant increases continued to occur with additional use of the simulation ( p < 0.001). Students in the treatment group became more positive toward using computers for learning ( p < 0.001). The simulation did not significantly affect attitudes toward science in general. Computer simulation of complex transgenic protocols have potential to provide a "virtual" laboratory experience as an adjunct to conventional educational approaches.

  10. Sequence-dependent DNA deformability studied using molecular dynamics simulations.

    PubMed

    Fujii, Satoshi; Kono, Hidetoshi; Takenaka, Shigeori; Go, Nobuhiro; Sarai, Akinori

    2007-01-01

    Proteins recognize specific DNA sequences not only through direct contact between amino acids and bases, but also indirectly based on the sequence-dependent conformation and deformability of the DNA (indirect readout). We used molecular dynamics simulations to analyze the sequence-dependent DNA conformations of all 136 possible tetrameric sequences sandwiched between CGCG sequences. The deformability of dimeric steps obtained by the simulations is consistent with that by the crystal structures. The simulation results further showed that the conformation and deformability of the tetramers can highly depend on the flanking base pairs. The conformations of xATx tetramers show the most rigidity and are not affected by the flanking base pairs and the xYRx show by contrast the greatest flexibility and change their conformations depending on the base pairs at both ends, suggesting tetramers with the same central dimer can show different deformabilities. These results suggest that analysis of dimeric steps alone may overlook some conformational features of DNA and provide insight into the mechanism of indirect readout during protein-DNA recognition. Moreover, the sequence dependence of DNA conformation and deformability may be used to estimate the contribution of indirect readout to the specificity of protein-DNA recognition as well as nucleosome positioning and large-scale behavior of nucleic acids.

  11. A direct method for computing extreme value (Gumbel) parameters for gapped biological sequence alignments.

    PubMed

    Quinn, Terrance; Sinkala, Zachariah

    2014-01-01

    We develop a general method for computing extreme value distribution (Gumbel, 1958) parameters for gapped alignments. Our approach uses mixture distribution theory to obtain associated BLOSUM matrices for gapped alignments, which in turn are used for determining significance of gapped alignment scores for pairs of biological sequences. We compare our results with parameters already obtained in the literature.

  12. International Space Station Research Plan, Assembly Sequence Rev., F

    DTIC Science & Technology

    2000-08-01

    muscles ü Higher risk for bone fracture upon return to Earth ü Potential for “slipped discs” ü Diminished ability to quickly respond to emergencies...Office of Biological and Physical Research International Space Station Research Plan Assembly Sequence Rev. F, Aug. 2000l . , . Report...Organization Name(s) and Address(es) NASA, Office of Biological and Physical Research Performing Organization Report Number Sponsoring/Monitoring Agency

  13. Matriarch: A Python Library for Materials Architecture.

    PubMed

    Giesa, Tristan; Jagadeesan, Ravi; Spivak, David I; Buehler, Markus J

    2015-10-12

    Biological materials, such as proteins, often have a hierarchical structure ranging from basic building blocks at the nanoscale (e.g., amino acids) to assembled structures at the macroscale (e.g., fibers). Current software for materials engineering allows the user to specify polypeptide chains and simple secondary structures prior to molecular dynamics simulation, but is not flexible in terms of the geometric arrangement of unequilibrated structures. Given some knowledge of a larger-scale structure, instructing the software to create it can be very difficult and time-intensive. To this end, the present paper reports a mathematical language, using category theory, to describe the architecture of a material, i.e., its set of building blocks and instructions for combining them. While this framework applies to any hierarchical material, here we concentrate on proteins. We implement this mathematical language as an open-source Python library called Matriarch. It is a domain-specific language that gives the user the ability to create almost arbitrary structures with arbitrary amino acid sequences and, from them, generate Protein Data Bank (PDB) files. In this way, Matriarch is more powerful than commercial software now available. Matriarch can be used in tandem with molecular dynamics simulations and helps engineers design and modify biologically inspired materials based on their desired functionality. As a case study, we use our software to alter both building blocks and building instructions for tropocollagen, and determine their effect on its structure and mechanical properties.

  14. Molecular mechanics and dynamics characterization of an in silico mutated protein: a stand-alone lab module or support activity for in vivo and in vitro analyses of targeted proteins.

    PubMed

    Chiang, Harry; Robinson, Lucy C; Brame, Cynthia J; Messina, Troy C

    2013-01-01

    Over the past 20 years, the biological sciences have increasingly incorporated chemistry, physics, computer science, and mathematics to aid in the development and use of mathematical models. Such combined approaches have been used to address problems from protein structure-function relationships to the workings of complex biological systems. Computer simulations of molecular events can now be accomplished quickly and with standard computer technology. Also, simulation software is freely available for most computing platforms, and online support for the novice user is ample. We have therefore created a molecular dynamics laboratory module to enhance undergraduate student understanding of molecular events underlying organismal phenotype. This module builds on a previously described project in which students use site-directed mutagenesis to investigate functions of conserved sequence features in members of a eukaryotic protein kinase family. In this report, we detail the laboratory activities of a MD module that provide a complement to phenotypic outcomes by providing a hypothesis-driven and quantifiable measure of predicted structural changes caused by targeted mutations. We also present examples of analyses students may perform. These laboratory activities can be integrated with genetics or biochemistry experiments as described, but could also be used independently in any course that would benefit from a quantitative approach to protein structure-function relationships. Copyright © 2013 Wiley Periodicals, Inc.

  15. Impact of paint shop decanter effluents on biological treatability of automotive industry wastewater.

    PubMed

    Güven, Didem; Hanhan, Oytun; Aksoy, Elif Ceren; Insel, Güçlü; Çokgör, Emine

    2017-05-15

    A lab-scale Sequencing Batch Reactor (SBR) was implemented to investigate biological treatability and kinetic characteristics of paint shop wastewater (PSW) together with main stream wastewater (MSW) of a bus production factory. Readily biodegradable and slowly biodegradable COD fractions of MWS were determined by respirometric analysis: 4.2% (S S ), 10.4% (S H ) and 59.3% (X S ). Carbon and nitrogen removal performance of the SBR feeding with MSW alone were obtained as 89% and 58%, respectively. When PSW was introduced to MSW, both carbon and nitrogen removal were deteriorated. Model simulation indicated that maximum heterotrophic growth rate decreased from 7.2 to 5.7day -1 , maximum hydrolysis rates were reduced from 6 to 4day -1 (k hS ) and 4 to 1day -1 (k hX ). Based on the dynamic model simulation for the evaluation of nitrogen removal, a maximum specific nitrifier growth rate was obtained as 0.45day -1 for MSW feeding alone. When PSW was introduced, nitrification was completely inhibited and following the termination of PSW addition, nitrogen removal performance was recovered in about 100 days, however with a much lower nitrifier growth rate (0.1day -1 ), possibly due to accumulation of toxic compounds in the sludge. Obviously, a longer recovery period is required to ensure an active nitrifier community. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. The nature and use of prediction skills in a biological computer simulation

    NASA Astrophysics Data System (ADS)

    Lavoie, Derrick R.; Good, Ron

    The primary goal of this study was to examine the science process skill of prediction using qualitative research methodology. The think-aloud interview, modeled after Ericsson and Simon (1984), let to the identification of 63 program exploration and prediction behaviors.The performance of seven formal and seven concrete operational high-school biology students were videotaped during a three-phase learning sequence on water pollution. Subjects explored the effects of five independent variables on two dependent variables over time using a computer-simulation program. Predictions were made concerning the effect of the independent variables upon dependent variables through time. Subjects were identified according to initial knowledge of the subject matter and success at solving three selected prediction problems.Successful predictors generally had high initial knowledge of the subject matter and were formal operational. Unsuccessful predictors generally had low initial knowledge and were concrete operational. High initial knowledge seemed to be more important to predictive success than stage of Piagetian cognitive development.Successful prediction behaviors involved systematic manipulation of the independent variables, note taking, identification and use of appropriate independent-dependent variable relationships, high interest and motivation, and in general, higher-level thinking skills. Behaviors characteristic of unsuccessful predictors were nonsystematic manipulation of independent variables, lack of motivation and persistence, misconceptions, and the identification and use of inappropriate independent-dependent variable relationships.

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Heng, E-mail: hengli@mdanderson.org; Zhu, X. Ronald; Zhang, Xiaodong

    Purpose: To develop and validate a novel delivery strategy for reducing the respiratory motion–induced dose uncertainty of spot-scanning proton therapy. Methods and Materials: The spot delivery sequence was optimized to reduce dose uncertainty. The effectiveness of the delivery sequence optimization was evaluated using measurements and patient simulation. One hundred ninety-one 2-dimensional measurements using different delivery sequences of a single-layer uniform pattern were obtained with a detector array on a 1-dimensional moving platform. Intensity modulated proton therapy plans were generated for 10 lung cancer patients, and dose uncertainties for different delivery sequences were evaluated by simulation. Results: Without delivery sequence optimization,more » the maximum absolute dose error can be up to 97.2% in a single measurement, whereas the optimized delivery sequence results in a maximum absolute dose error of ≤11.8%. In patient simulation, the optimized delivery sequence reduces the mean of fractional maximum absolute dose error compared with the regular delivery sequence by 3.3% to 10.6% (32.5-68.0% relative reduction) for different patients. Conclusions: Optimizing the delivery sequence can reduce dose uncertainty due to respiratory motion in spot-scanning proton therapy, assuming the 4-dimensional CT is a true representation of the patients' breathing patterns.« less

  18. BinQuasi: a peak detection method for ChIP-sequencing data with biological replicates.

    PubMed

    Goren, Emily; Liu, Peng; Wang, Chao; Wang, Chong

    2018-04-19

    ChIP-seq experiments that are aimed at detecting DNA-protein interactions require biological replication to draw inferential conclusions, however there is no current consensus on how to analyze ChIP-seq data with biological replicates. Very few methodologies exist for the joint analysis of replicated ChIP-seq data, with approaches ranging from combining the results of analyzing replicates individually to joint modeling of all replicates. Combining the results of individual replicates analyzed separately can lead to reduced peak classification performance compared to joint modeling. Currently available methods for joint analysis may fail to control the false discovery rate at the nominal level. We propose BinQuasi, a peak caller for replicated ChIP-seq data, that jointly models biological replicates using a generalized linear model framework and employs a one-sided quasi-likelihood ratio test to detect peaks. When applied to simulated data and real datasets, BinQuasi performs favorably compared to existing methods, including better control of false discovery rate than existing joint modeling approaches. BinQuasi offers a flexible approach to joint modeling of replicated ChIP-seq data which is preferable to combining the results of replicates analyzed individually. Source code is freely available for download at https://cran.r-project.org/package=BinQuasi, implemented in R. pliu@iastate.edu or egoren@iastate.edu. Supplementary material is available at Bioinformatics online.

  19. A normal mode-based geometric simulation approach for exploring biologically relevant conformational transitions in proteins.

    PubMed

    Ahmed, Aqeel; Rippmann, Friedrich; Barnickel, Gerhard; Gohlke, Holger

    2011-07-25

    A three-step approach for multiscale modeling of protein conformational changes is presented that incorporates information about preferred directions of protein motions into a geometric simulation algorithm. The first two steps are based on a rigid cluster normal-mode analysis (RCNMA). Low-frequency normal modes are used in the third step (NMSim) to extend the recently introduced idea of constrained geometric simulations of diffusive motions in proteins by biasing backbone motions of the protein, whereas side-chain motions are biased toward favorable rotamer states. The generated structures are iteratively corrected regarding steric clashes and stereochemical constraint violations. The approach allows performing three simulation types: unbiased exploration of conformational space; pathway generation by a targeted simulation; and radius of gyration-guided simulation. When applied to a data set of proteins with experimentally observed conformational changes, conformational variabilities are reproduced very well for 4 out of 5 proteins that show domain motions, with correlation coefficients r > 0.70 and as high as r = 0.92 in the case of adenylate kinase. In 7 out of 8 cases, NMSim simulations starting from unbound structures are able to sample conformations that are similar (root-mean-square deviation = 1.0-3.1 Å) to ligand bound conformations. An NMSim generated pathway of conformational change of adenylate kinase correctly describes the sequence of domain closing. The NMSim approach is a computationally efficient alternative to molecular dynamics simulations for conformational sampling of proteins. The generated conformations and pathways of conformational transitions can serve as input to docking approaches or as starting points for more sophisticated sampling techniques.

  20. Analysis of simulated image sequences from sensors for restricted-visibility operations

    NASA Technical Reports Server (NTRS)

    Kasturi, Rangachar

    1991-01-01

    A real time model of the visible output from a 94 GHz sensor, based on a radiometric simulation of the sensor, was developed. A sequence of images as seen from an aircraft as it approaches for landing was simulated using this model. Thirty frames from this sequence of 200 x 200 pixel images were analyzed to identify and track objects in the image using the Cantata image processing package within the visual programming environment provided by the Khoros software system. The image analysis operations are described.

  1. Genome Science: A Video Tour of the Washington University Genome Sequencing Center for High School and Undergraduate Students

    ERIC Educational Resources Information Center

    Flowers, Susan K.; Easter, Carla; Holmes, Andrea; Cohen, Brian; Bednarski, April E.; Mardis, Elaine R.; Wilson, Richard K.; Elgin, Sarah C. R.

    2005-01-01

    Sequencing of the human genome has ushered in a new era of biology. The technologies developed to facilitate the sequencing of the human genome are now being applied to the sequencing of other genomes. In 2004, a partnership was formed between Washington University School of Medicine Genome Sequencing Center's Outreach Program and Washington…

  2. Method and apparatus for biological sequence comparison

    DOEpatents

    Marr, T.G.; Chang, W.I.

    1997-12-23

    A method and apparatus are disclosed for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence. 5 figs.

  3. Method and apparatus for biological sequence comparison

    DOEpatents

    Marr, Thomas G.; Chang, William I-Wei

    1997-01-01

    A method and apparatus for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence.

  4. A highly conserved N-terminal sequence for teleost vitellogenin with potential value to the biochemistry, molecular biology and pathology of vitellogenesis

    USGS Publications Warehouse

    Folmar, L.D.; Denslow, N.D.; Wallace, R.A.; LaFleur, G.; Gross, T.S.; Bonomelli, S.; Sullivan, C.V.

    1995-01-01

    N-terminal amino acid sequences for vitellogenin (Vtg) from six species of teleost fish (striped bass, mummichog, pinfish, brown bullhead, medaka, yellow perch and the sturgeon) are compared with published N-terminal Vtg sequences for the lamprey, clawed frog and domestic chicken. Striped bass and mummichog had 100% identical amino acids between positions 7 and 21, while pinfish, brown bullhead, sturgeon, lamprey, Xenopus and chicken had 87%, 93%, 60%, 47%, 47-60%) for four transcripts and had 40% identical, respectively, with striped bass for the same positions. Partial sequences obtained for medaka and yellow perch were 100% identical between positions 5 to 10. The potential utility of this conserved sequence for studies on the biochemistry, molecular biology and pathology of vitellogenesis is discussed.

  5. Imagery May Arise from Associations Formed through Sensory Experience: A Network of Spiking Neurons Controlling a Robot Learns Visual Sequences in Order to Perform a Mental Rotation Task

    PubMed Central

    McKinstry, Jeffrey L.; Fleischer, Jason G.; Chen, Yanqing; Gall, W. Einar; Edelman, Gerald M.

    2016-01-01

    Mental imagery occurs “when a representation of the type created during the initial phases of perception is present but the stimulus is not actually being perceived.” How does the capability to perform mental imagery arise? Extending the idea that imagery arises from learned associations, we propose that mental rotation, a specific form of imagery, could arise through the mechanism of sequence learning–that is, by learning to regenerate the sequence of mental images perceived while passively observing a rotating object. To demonstrate the feasibility of this proposal, we constructed a simulated nervous system and embedded it within a behaving humanoid robot. By observing a rotating object, the system learns the sequence of neural activity patterns generated by the visual system in response to the object. After learning, it can internally regenerate a similar sequence of neural activations upon briefly viewing the static object. This system learns to perform a mental rotation task in which the subject must determine whether two objects are identical despite differences in orientation. As with human subjects, the time taken to respond is proportional to the angular difference between the two stimuli. Moreover, as reported in humans, the system fills in intermediate angles during the task, and this putative mental rotation activates the same pathways that are activated when the system views physical rotation. This work supports the proposal that mental rotation arises through sequence learning and the idea that mental imagery aids perception through learned associations, and suggests testable predictions for biological experiments. PMID:27653977

  6. Open-Source Sequence Clustering Methods Improve the State Of the Art.

    PubMed

    Kopylova, Evguenia; Navas-Molina, Jose A; Mercier, Céline; Xu, Zhenjiang Zech; Mahé, Frédéric; He, Yan; Zhou, Hong-Wei; Rognes, Torbjørn; Caporaso, J Gregory; Knight, Rob

    2016-01-01

    Sequence clustering is a common early step in amplicon-based microbial community analysis, when raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent analysis steps. Here, we evaluated the performance of recently released state-of-the-art open-source clustering software products, namely, OTUCLUST, Swarm, SUMACLUST, and SortMeRNA, against current principal options (UCLUST and USEARCH) in QIIME, hierarchical clustering methods in mothur, and USEARCH's most recent clustering algorithm, UPARSE. All the latest open-source tools showed promising results, reporting up to 60% fewer spurious OTUs than UCLUST, indicating that the underlying clustering algorithm can vastly reduce the number of these derived OTUs. Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results. Swarm, SUMACLUST, and SortMeRNA have been included in the QIIME 1.9.0 release. IMPORTANCE Massive collections of next-generation sequencing data call for fast, accurate, and easily accessible bioinformatics algorithms to perform sequence clustering. A comprehensive benchmark is presented, including open-source tools and the popular USEARCH suite. Simulated, mock, and environmental communities were used to analyze sensitivity, selectivity, species diversity (alpha and beta), and taxonomic composition. The results demonstrate that recent clustering algorithms can significantly improve accuracy and preserve estimated diversity without the application of aggressive filtering. Moreover, these tools are all open source, apply multiple levels of multithreading, and scale to the demands of modern next-generation sequencing data, which is essential for the analysis of massive multidisciplinary studies such as the Earth Microbiome Project (EMP) (J. A. Gilbert, J. K. Jansson, and R. Knight, BMC Biol 12:69, 2014, http://dx.doi.org/10.1186/s12915-014-0069-1).

  7. Sequence-specific bias correction for RNA-seq data using recurrent neural networks.

    PubMed

    Zhang, Yao-Zhong; Yamaguchi, Rui; Imoto, Seiya; Miyano, Satoru

    2017-01-25

    The recent success of deep learning techniques in machine learning and artificial intelligence has stimulated a great deal of interest among bioinformaticians, who now wish to bring the power of deep learning to bare on a host of bioinformatical problems. Deep learning is ideally suited for biological problems that require automatic or hierarchical feature representation for biological data when prior knowledge is limited. In this work, we address the sequence-specific bias correction problem for RNA-seq data redusing Recurrent Neural Networks (RNNs) to model nucleotide sequences without pre-determining sequence structures. The sequence-specific bias of a read is then calculated based on the sequence probabilities estimated by RNNs, and used in the estimation of gene abundance. We explore the application of two popular RNN recurrent units for this task and demonstrate that RNN-based approaches provide a flexible way to model nucleotide sequences without knowledge of predetermined sequence structures. Our experiments show that training a RNN-based nucleotide sequence model is efficient and RNN-based bias correction methods compare well with the-state-of-the-art sequence-specific bias correction method on the commonly used MAQC-III data set. RNNs provides an alternative and flexible way to calculate sequence-specific bias without explicitly pre-determining sequence structures.

  8. SeqSIMLA2_exact: simulate multiple disease sites in large pedigrees with given disease status for diseases with low prevalence.

    PubMed

    Yao, Po-Ju; Chung, Ren-Hua

    2016-02-15

    It is difficult for current simulation tools to simulate sequence data in a pre-specified pedigree structure and pre-specified affection status. Previously, we developed a flexible tool, SeqSIMLA2, for simulating sequence data in either unrelated case-control or family samples with different disease and quantitative trait models. Here we extended the tool to efficiently simulate sequences with multiple disease sites in large pedigrees with a given disease status for each pedigree member, assuming that the disease prevalence is low. SeqSIMLA2_exact is implemented with C++ and is available at http://seqsimla.sourceforge.net. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  9. Streamlining the Design-to-Build Transition with Build-Optimization Software Tools.

    PubMed

    Oberortner, Ernst; Cheng, Jan-Fang; Hillson, Nathan J; Deutsch, Samuel

    2017-03-17

    Scaling-up capabilities for the design, build, and test of synthetic biology constructs holds great promise for the development of new applications in fuels, chemical production, or cellular-behavior engineering. Construct design is an essential component in this process; however, not every designed DNA sequence can be readily manufactured, even using state-of-the-art DNA synthesis methods. Current biological computer-aided design and manufacture tools (bioCAD/CAM) do not adequately consider the limitations of DNA synthesis technologies when generating their outputs. Designed sequences that violate DNA synthesis constraints may require substantial sequence redesign or lead to price-premiums and temporal delays, which adversely impact the efficiency of the DNA manufacturing process. We have developed a suite of build-optimization software tools (BOOST) to streamline the design-build transition in synthetic biology engineering workflows. BOOST incorporates knowledge of DNA synthesis success determinants into the design process to output ready-to-build sequences, preempting the need for sequence redesign. The BOOST web application is available at https://boost.jgi.doe.gov and its Application Program Interfaces (API) enable integration into automated, customized DNA design processes. The herein presented results highlight the effectiveness of BOOST in reducing DNA synthesis costs and timelines.

  10. Sequence and Structure Dependent DNA-DNA Interactions

    NASA Astrophysics Data System (ADS)

    Kopchick, Benjamin; Qiu, Xiangyun

    Molecular forces between dsDNA strands are largely dominated by electrostatics and have been extensively studied. Quantitative knowledge has been accumulated on how DNA-DNA interactions are modulated by varied biological constituents such as ions, cationic ligands, and proteins. Despite its central role in biology, the sequence of DNA has not received substantial attention and ``random'' DNA sequences are typically used in biophysical studies. However, ~50% of human genome is composed of non-random-sequence DNAs, particularly repetitive sequences. Furthermore, covalent modifications of DNA such as methylation play key roles in gene functions. Such DNAs with specific sequences or modifications often take on structures other than the canonical B-form. Here we present series of quantitative measurements of the DNA-DNA forces with the osmotic stress method on different DNA sequences, from short repeats to the most frequent sequences in genome, and to modifications such as bromination and methylation. We observe peculiar behaviors that appear to be strongly correlated with the incurred structural changes. We speculate the causalities in terms of the differences in hydration shell and DNA surface structures.

  11. Advances in Discrete-Event Simulation for MSL Command Validation

    NASA Technical Reports Server (NTRS)

    Patrikalakis, Alexander; O'Reilly, Taifun

    2013-01-01

    In the last five years, the discrete event simulator, SEQuence GENerator (SEQGEN), developed at the Jet Propulsion Laboratory to plan deep-space missions, has greatly increased uplink operations capacity to deal with increasingly complicated missions. In this paper, we describe how the Mars Science Laboratory (MSL) project makes full use of an interpreted environment to simulate change in more than fifty thousand flight software parameters and conditional command sequences to predict the result of executing a conditional branch in a command sequence, and enable the ability to warn users whenever one or more simulated spacecraft states change in an unexpected manner. Using these new SEQGEN features, operators plan more activities in one sol than ever before.

  12. A Statistical Guide to the Design of Deep Mutational Scanning Experiments.

    PubMed

    Matuszewski, Sebastian; Hildebrandt, Marcel E; Ghenu, Ana-Hermina; Jensen, Jeffrey D; Bank, Claudia

    2016-09-01

    The characterization of the distribution of mutational effects is a key goal in evolutionary biology. Recently developed deep-sequencing approaches allow for accurate and simultaneous estimation of the fitness effects of hundreds of engineered mutations by monitoring their relative abundance across time points in a single bulk competition. Naturally, the achievable resolution of the estimated fitness effects depends on the specific experimental setup, the organism and type of mutations studied, and the sequencing technology utilized, among other factors. By means of analytical approximations and simulations, we provide guidelines for optimizing time-sampled deep-sequencing bulk competition experiments, focusing on the number of mutants, the sequencing depth, and the number of sampled time points. Our analytical results show that sampling more time points together with extending the duration of the experiment improves the achievable precision disproportionately compared with increasing the sequencing depth or reducing the number of competing mutants. Even if the duration of the experiment is fixed, sampling more time points and clustering these at the beginning and the end of the experiment increase experimental power and allow for efficient and precise assessment of the entire range of selection coefficients. Finally, we provide a formula for calculating the 95%-confidence interval for the measurement error estimate, which we implement as an interactive web tool. This allows for quantification of the maximum expected a priori precision of the experimental setup, as well as for a statistical threshold for determining deviations from neutrality for specific selection coefficient estimates. Copyright © 2016 by the Genetics Society of America.

  13. Isoform-level gene expression patterns in single-cell RNA-sequencing data.

    PubMed

    Vu, Trung Nghia; Wills, Quin F; Kalari, Krishna R; Niu, Nifang; Wang, Liewei; Pawitan, Yudi; Rantalainen, Mattias

    2018-02-27

    RNA sequencing of single cells enables characterization of transcriptional heterogeneity in seemingly homogeneous cell populations. Single-cell sequencing has been applied in a wide range of researches fields. However, few studies have focus on characterization of isoform-level expression patterns at the single-cell level. In this study we propose and apply a novel method, ISOform-Patterns (ISOP), based on mixture modeling, to characterize the expression patterns of isoform pairs from the same gene in single-cell isoform-level expression data. We define six principal patterns of isoform expression relationships and describe a method for differential-pattern analysis. We demonstrate ISOP through analysis of single-cell RNA-sequencing data from a breast cancer cell line, with replication in three independent datasets. We assigned the pattern types to each of 16,562 isoform-pairs from 4,929 genes. Among those, 26% of the discovered patterns were significant (p<0.05), while remaining patterns are possibly effects of transcriptional bursting, drop-out and stochastic biological heterogeneity. Furthermore, 32% of genes discovered through differential-pattern analysis were not detected by differential-expression analysis. The effect of drop-out events, mean expression level, and properties of the expression distribution on the performances of ISOP were also investigated through simulated datasets. To conclude, ISOP provides a novel approach for characterization of isoformlevel preference, commitment and heterogeneity in single-cell RNA-sequencing data. The ISOP method has been implemented as a R package and is available at https://github.com/nghiavtr/ISOP under a GPL-3 license. mattias.rantalainen@ki.se. Supplementary data are available at Bioinformatics online.

  14. WE-H-BRA-04: Biological Geometries for the Monte Carlo Simulation Toolkit TOPASNBio

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McNamara, A; Held, K; Paganetti, H

    2016-06-15

    Purpose: New advances in radiation therapy are most likely to come from the complex interface of physics, chemistry and biology. Computational simulations offer a powerful tool for quantitatively investigating radiation interactions with biological tissue and can thus help bridge the gap between physics and biology. The aim of TOPAS-nBio is to provide a comprehensive tool to generate advanced radiobiology simulations. Methods: TOPAS wraps and extends the Geant4 Monte Carlo (MC) simulation toolkit. TOPAS-nBio is an extension to TOPAS which utilizes the physics processes in Geant4-DNA to model biological damage from very low energy secondary electrons. Specialized cell, organelle and molecularmore » geometries were designed for the toolkit. Results: TOPAS-nBio gives the user the capability of simulating biological geometries, ranging from the micron-scale (e.g. cells and organelles) to complex nano-scale geometries (e.g. DNA and proteins). The user interacts with TOPAS-nBio through easy-to-use input parameter files. For example, in a simple cell simulation the user can specify the cell type and size as well as the type, number and size of included organelles. For more detailed nuclear simulations, the user can specify chromosome territories containing chromatin fiber loops, the later comprised of nucleosomes on a double helix. The chromatin fibers can be arranged in simple rigid geometries or within factual globules, mimicking realistic chromosome territories. TOPAS-nBio also provides users with the capability of reading protein data bank 3D structural files to simulate radiation damage to proteins or nucleic acids e.g. histones or RNA. TOPAS-nBio has been validated by comparing results to other track structure simulation software and published experimental measurements. Conclusion: TOPAS-nBio provides users with a comprehensive MC simulation tool for radiobiological simulations, giving users without advanced programming skills the ability to design and run complex simulations.« less

  15. MEGA5: Molecular Evolutionary Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods

    PubMed Central

    Tamura, Koichiro; Peterson, Daniel; Peterson, Nicholas; Stecher, Glen; Nei, Masatoshi; Kumar, Sudhir

    2011-01-01

    Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and inferring the nature and extent of selective forces shaping the evolution of genes and species. Here, we announce the release of Molecular Evolutionary Genetics Analysis version 5 (MEGA5), which is a user-friendly software for mining online databases, building sequence alignments and phylogenetic trees, and using methods of evolutionary bioinformatics in basic biology, biomedicine, and evolution. The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models (nucleotide or amino acid), inferring ancestral states and sequences (along with probabilities), and estimating evolutionary rates site-by-site. In computer simulation analyses, ML tree inference algorithms in MEGA5 compared favorably with other software packages in terms of computational efficiency and the accuracy of the estimates of phylogenetic trees, substitution parameters, and rate variation among sites. The MEGA user interface has now been enhanced to be activity driven to make it easier for the use of both beginners and experienced scientists. This version of MEGA is intended for the Windows platform, and it has been configured for effective use on Mac OS X and Linux desktops. It is available free of charge from http://www.megasoftware.net. PMID:21546353

  16. A Hybrid Parallel Strategy Based on String Graph Theory to Improve De Novo DNA Assembly on the TianHe-2 Supercomputer.

    PubMed

    Zhang, Feng; Liao, Xiangke; Peng, Shaoliang; Cui, Yingbo; Wang, Bingqiang; Zhu, Xiaoqian; Liu, Jie

    2016-06-01

    ' The de novo assembly of DNA sequences is increasingly important for biological researches in the genomic era. After more than one decade since the Human Genome Project, some challenges still exist and new solutions are being explored to improve de novo assembly of genomes. String graph assembler (SGA), based on the string graph theory, is a new method/tool developed to address the challenges. In this paper, based on an in-depth analysis of SGA we prove that the SGA-based sequence de novo assembly is an NP-complete problem. According to our analysis, SGA outperforms other similar methods/tools in memory consumption, but costs much more time, of which 60-70 % is spent on the index construction. Upon this analysis, we introduce a hybrid parallel optimization algorithm and implement this algorithm in the TianHe-2's parallel framework. Simulations are performed with different datasets. For data of small size the optimized solution is 3.06 times faster than before, and for data of middle size it's 1.60 times. The results demonstrate an evident performance improvement, with the linear scalability for parallel FM-index construction. This results thus contribute significantly to improving the efficiency of de novo assembly of DNA sequences.

  17. Networking Biology: The Origins of Sequence-Sharing Practices in Genomics.

    PubMed

    Stevens, Hallam

    2015-10-01

    The wide sharing of biological data, especially nucleotide sequences, is now considered to be a key feature of genomics. Historians and sociologists have attempted to account for the rise of this sharing by pointing to precedents in model organism communities and in natural history. This article supplements these approaches by examining the role that electronic networking technologies played in generating the specific forms of sharing that emerged in genomics. The links between early computer users at the Stanford Artificial Intelligence Laboratory in the 1960s, biologists using local computer networks in the 1970s, and GenBank in the 1980s, show how networking technologies carried particular practices of communication, circulation, and data distribution from computing into biology. In particular, networking practices helped to transform sequences themselves into objects that had value as a community resource.

  18. Unravelling biology and shifting paradigms in cancer with single-cell sequencing.

    PubMed

    Baslan, Timour; Hicks, James

    2017-08-24

    The fundamental operative unit of a cancer is the genetically and epigenetically innovative single cell. Whether proliferating or quiescent, in the primary tumour mass or disseminated elsewhere, single cells govern the parameters that dictate all facets of the biology of cancer. Thus, single-cell analyses provide the ultimate level of resolution in our quest for a fundamental understanding of this disease. Historically, this quest has been hampered by technological shortcomings. In this Opinion article, we argue that the rapidly evolving field of single-cell sequencing has unshackled the cancer research community of these shortcomings. From furthering an elemental understanding of intra-tumoural genetic heterogeneity and cancer genome evolution to illuminating the governing principles of disease relapse and metastasis, we posit that single-cell sequencing promises to unravel the biology of all facets of this disease.

  19. Nanopore-based fourth-generation DNA sequencing technology.

    PubMed

    Feng, Yanxiao; Zhang, Yuechuan; Ying, Cuifeng; Wang, Deqiang; Du, Chunlei

    2015-02-01

    Nanopore-based sequencers, as the fourth-generation DNA sequencing technology, have the potential to quickly and reliably sequence the entire human genome for less than $1000, and possibly for even less than $100. The single-molecule techniques used by this technology allow us to further study the interaction between DNA and protein, as well as between protein and protein. Nanopore analysis opens a new door to molecular biology investigation at the single-molecule scale. In this article, we have reviewed academic achievements in nanopore technology from the past as well as the latest advances, including both biological and solid-state nanopores, and discussed their recent and potential applications. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd.. All rights reserved.

  20. [Accuracy Check of Monte Carlo Simulation in Particle Therapy Using Gel Dosimeters].

    PubMed

    Furuta, Takuya

    2017-01-01

    Gel dosimeters are a three-dimensional imaging tool for dose distribution induced by radiations. They can be used for accuracy check of Monte Carlo simulation in particle therapy. An application was reviewed in this article. An inhomogeneous biological sample placing a gel dosimeter behind it was irradiated by carbon beam. The recorded dose distribution in the gel dosimeter reflected the inhomogeneity of the biological sample. Monte Carlo simulation was conducted by reconstructing the biological sample from its CT image. The accuracy of the particle transport by Monte Carlo simulation was checked by comparing the dose distribution in the gel dosimeter between simulation and experiment.

  1. Simulating complex intracellular processes using object-oriented computational modelling.

    PubMed

    Johnson, Colin G; Goldman, Jacki P; Gullick, William J

    2004-11-01

    The aim of this paper is to give an overview of computer modelling and simulation in cellular biology, in particular as applied to complex biochemical processes within the cell. This is illustrated by the use of the techniques of object-oriented modelling, where the computer is used to construct abstractions of objects in the domain being modelled, and these objects then interact within the computer to simulate the system and allow emergent properties to be observed. The paper also discusses the role of computer simulation in understanding complexity in biological systems, and the kinds of information which can be obtained about biology via simulation.

  2. A supervised learning rule for classification of spatiotemporal spike patterns.

    PubMed

    Lilin Guo; Zhenzhong Wang; Adjouadi, Malek

    2016-08-01

    This study introduces a novel supervised algorithm for spiking neurons that take into consideration synapse delays and axonal delays associated with weights. It can be utilized for both classification and association and uses several biologically influenced properties, such as axonal and synaptic delays. This algorithm also takes into consideration spike-timing-dependent plasticity as in Remote Supervised Method (ReSuMe). This paper focuses on the classification aspect alone. Spiked neurons trained according to this proposed learning rule are capable of classifying different categories by the associated sequences of precisely timed spikes. Simulation results have shown that the proposed learning method greatly improves classification accuracy when compared to the Spike Pattern Association Neuron (SPAN) and the Tempotron learning rule.

  3. Connecting Artificial Brains to Robots in a Comprehensive Simulation Framework: The Neurorobotics Platform

    PubMed Central

    Falotico, Egidio; Vannucci, Lorenzo; Ambrosano, Alessandro; Albanese, Ugo; Ulbrich, Stefan; Vasquez Tieck, Juan Camilo; Hinkel, Georg; Kaiser, Jacques; Peric, Igor; Denninger, Oliver; Cauli, Nino; Kirtay, Murat; Roennau, Arne; Klinker, Gudrun; Von Arnim, Axel; Guyot, Luc; Peppicelli, Daniel; Martínez-Cañada, Pablo; Ros, Eduardo; Maier, Patrick; Weber, Sandro; Huber, Manuel; Plecher, David; Röhrbein, Florian; Deser, Stefan; Roitberg, Alina; van der Smagt, Patrick; Dillman, Rüdiger; Levi, Paul; Laschi, Cecilia; Knoll, Alois C.; Gewaltig, Marc-Oliver

    2017-01-01

    Combined efforts in the fields of neuroscience, computer science, and biology allowed to design biologically realistic models of the brain based on spiking neural networks. For a proper validation of these models, an embodiment in a dynamic and rich sensory environment, where the model is exposed to a realistic sensory-motor task, is needed. Due to the complexity of these brain models that, at the current stage, cannot deal with real-time constraints, it is not possible to embed them into a real-world task. Rather, the embodiment has to be simulated as well. While adequate tools exist to simulate either complex neural networks or robots and their environments, there is so far no tool that allows to easily establish a communication between brain and body models. The Neurorobotics Platform is a new web-based environment that aims to fill this gap by offering scientists and technology developers a software infrastructure allowing them to connect brain models to detailed simulations of robot bodies and environments and to use the resulting neurorobotic systems for in silico experimentation. In order to simplify the workflow and reduce the level of the required programming skills, the platform provides editors for the specification of experimental sequences and conditions, environments, robots, and brain–body connectors. In addition to that, a variety of existing robots and environments are provided. This work presents the architecture of the first release of the Neurorobotics Platform developed in subproject 10 “Neurorobotics” of the Human Brain Project (HBP).1 At the current state, the Neurorobotics Platform allows researchers to design and run basic experiments in neurorobotics using simulated robots and simulated environments linked to simplified versions of brain models. We illustrate the capabilities of the platform with three example experiments: a Braitenberg task implemented on a mobile robot, a sensory-motor learning task based on a robotic controller, and a visual tracking embedding a retina model on the iCub humanoid robot. These use-cases allow to assess the applicability of the Neurorobotics Platform for robotic tasks as well as in neuroscientific experiments. PMID:28179882

  4. Connecting Artificial Brains to Robots in a Comprehensive Simulation Framework: The Neurorobotics Platform.

    PubMed

    Falotico, Egidio; Vannucci, Lorenzo; Ambrosano, Alessandro; Albanese, Ugo; Ulbrich, Stefan; Vasquez Tieck, Juan Camilo; Hinkel, Georg; Kaiser, Jacques; Peric, Igor; Denninger, Oliver; Cauli, Nino; Kirtay, Murat; Roennau, Arne; Klinker, Gudrun; Von Arnim, Axel; Guyot, Luc; Peppicelli, Daniel; Martínez-Cañada, Pablo; Ros, Eduardo; Maier, Patrick; Weber, Sandro; Huber, Manuel; Plecher, David; Röhrbein, Florian; Deser, Stefan; Roitberg, Alina; van der Smagt, Patrick; Dillman, Rüdiger; Levi, Paul; Laschi, Cecilia; Knoll, Alois C; Gewaltig, Marc-Oliver

    2017-01-01

    Combined efforts in the fields of neuroscience, computer science, and biology allowed to design biologically realistic models of the brain based on spiking neural networks. For a proper validation of these models, an embodiment in a dynamic and rich sensory environment, where the model is exposed to a realistic sensory-motor task, is needed. Due to the complexity of these brain models that, at the current stage, cannot deal with real-time constraints, it is not possible to embed them into a real-world task. Rather, the embodiment has to be simulated as well. While adequate tools exist to simulate either complex neural networks or robots and their environments, there is so far no tool that allows to easily establish a communication between brain and body models. The Neurorobotics Platform is a new web-based environment that aims to fill this gap by offering scientists and technology developers a software infrastructure allowing them to connect brain models to detailed simulations of robot bodies and environments and to use the resulting neurorobotic systems for in silico experimentation. In order to simplify the workflow and reduce the level of the required programming skills, the platform provides editors for the specification of experimental sequences and conditions, environments, robots, and brain-body connectors. In addition to that, a variety of existing robots and environments are provided. This work presents the architecture of the first release of the Neurorobotics Platform developed in subproject 10 "Neurorobotics" of the Human Brain Project (HBP). At the current state, the Neurorobotics Platform allows researchers to design and run basic experiments in neurorobotics using simulated robots and simulated environments linked to simplified versions of brain models. We illustrate the capabilities of the platform with three example experiments: a Braitenberg task implemented on a mobile robot, a sensory-motor learning task based on a robotic controller, and a visual tracking embedding a retina model on the iCub humanoid robot. These use-cases allow to assess the applicability of the Neurorobotics Platform for robotic tasks as well as in neuroscientific experiments.

  5. multiDE: a dimension reduced model based statistical method for differential expression analysis using RNA-sequencing data with multiple treatment conditions.

    PubMed

    Kang, Guangliang; Du, Li; Zhang, Hong

    2016-06-22

    The growing complexity of biological experiment design based on high-throughput RNA sequencing (RNA-seq) is calling for more accommodative statistical tools. We focus on differential expression (DE) analysis using RNA-seq data in the presence of multiple treatment conditions. We propose a novel method, multiDE, for facilitating DE analysis using RNA-seq read count data with multiple treatment conditions. The read count is assumed to follow a log-linear model incorporating two factors (i.e., condition and gene), where an interaction term is used to quantify the association between gene and condition. The number of the degrees of freedom is reduced to one through the first order decomposition of the interaction, leading to a dramatically power improvement in testing DE genes when the number of conditions is greater than two. In our simulation situations, multiDE outperformed the benchmark methods (i.e. edgeR and DESeq2) even if the underlying model was severely misspecified, and the power gain was increasing in the number of conditions. In the application to two real datasets, multiDE identified more biologically meaningful DE genes than the benchmark methods. An R package implementing multiDE is available publicly at http://homepage.fudan.edu.cn/zhangh/softwares/multiDE . When the number of conditions is two, multiDE performs comparably with the benchmark methods. When the number of conditions is greater than two, multiDE outperforms the benchmark methods.

  6. The Science and Issues of Human DNA Polymorphisms: A Training Workshop for High School Biology Teachers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Micklos, David A.

    2006-10-30

    This project achieved its goal of implementing a nationwide training program to introduce high school biology teachers to the key uses and societal implications of human DNA polymorphisms. The 2.5-day workshop introduced high school biology faculty to a laboratory-based unit on human DNA polymorphisms â which provides a uniquely personal perspective on the science and Ethical, Legal and Social Implications (ELSI) of the Human Genome Project. As proposed, 12 workshops were conducted at venues across the United States. The workshops were attended by 256 high school faculty, exceeding proposed attendance of 240 by 7%. Each workshop mixed theoretical, laboratory, andmore » computer work with practical and ethical implications. Program participants learned simplified lab techniques for amplifying three types of chromosomal polymorphisms: an Alu insertion (PV92), a VNTR (pMCT118/D1S80), and single nucleotide polymorphisms (SNPs) in the mitochondrial control region. These polymorphisms illustrate the use of DNA variations in disease diagnosis, forensic biology, and identity testing - and provide a starting point for discussing the uses and potential abuses of genetic technology. Participants also learned how to use their Alu and mitochondrial data as an entrée to human population genetics and evolution. Our work to simplify lab techniques for amplifying human DNA polymorphisms in educational settings culminated with the release in 1998 of three Advanced Technology (AT) PCR kits by Carolina Biological Supply Company, the nationâÂÂs oldest educational science supplier. The kits use a simple 30-minute method to isolate template DNA from hair sheaths or buccal cells and streamlined PCR chemistry based on Pharmacia Ready-To-Go Beads, which incorporate Taq polymerase, deoxynucleotide triphosphates, and buffer in a freeze-dried pellet. These kits have greatly simplified teacher implementation of human PCR labs, and their use is growing at a rapid pace. Sales of human polymorphism kits by Carolina Biological rose from 700 units in 1999 to 1,132 in 2000 â a 62% increase. Competing kits using the Alu system, and based substantially on our earlier work, are also marketed by Biorad and Edvotek. In parallel with the lab experiments, we developed a suite of database/statistical applications and easy-to-use interfaces that allow students to use their own DNA data to explore human population genetics and to test theories of human evolution. Database searches and statistical analyses are launched from a centralized workspace. Workshop participants were introduced to these and other resources available at the DNALC WWW site (http://vector.cshl.org/bioserver/): 1) Allele Server tests Hardy-Weinberg equilibrium and statistically compares PV92 data from world populations. 2) Sequence Server uses DNA sequence data to search Genbank using BLASTN, compare sequences using CLUSTALW, and create phylogenetic trees using PHYLIP. 3) Simulation Server uses a Monte Carlo generator to model the long-term effects of drift, selection, and population bottlenecks. By targeting motivated and innovative biology faculty, we believe that this project offered a cost-effective means to bring high school biology education up-to-the-minute with genomic biology. The workshop reached a target audience of highly professional faculty who have already implemented hands-on labs in molecular genetics and many of whom offer laboratory electives in biotechnology. Many attend professional meetings, develop curriculum, collaborate with scientists, teach faculty workshops, and manage equipment-sharing programs. These individuals are life-long learners, anxious for deeper insight and additional training to further extend their leadership. This contention was supported by data from a mail survey, conducted in February-March 2000 and 2001, of 256 faculty who participated in workshops conducted during the current term of DOE support. Seventy percent of participants responded, providing direct reports on how their teaching behavior had changed since taking the DOE workshop. About nine of ten respondents said they had provided new classroom materials and first-hand accounts of DNA typing, sequencing, or PCR. Three-fourths had introduced new units on human molecular genetics. Most strikingly, half had students use PCR to amplify their own insertion polymorphisms (PV92), and better than one-fourth amplified a VNTR polymorphism and the mitochondrial control region. One in five had mitochondrial DNA sequenced by the DNALC Sequencing Service. A majority (58%) used online materials at the DNALC WWW site, and 28% analyzed student polymorphism data with Bioservers at the DNALC site. A majority (58%) assisted other faculty with student labs on polymorphisms, reaching an additional 786 teachers.« less

  7. The Science and Issues of Human DNA Polymoprhisms: A Training Workshop for High School Biology Teachers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    David. A Micklos

    2006-10-30

    This project achieved its goal of implementing a nationwide training program to introduce high school biology teachers to the key uses and societal implications of human DNA polymorphisms. The 2.5-day workshop introduced high school biology faculty to a laboratory-based unit on human DNA polymorphisms – which provides a uniquely personal perspective on the science and Ethical, Legal and Social Implications (ELSI) of the Human Genome Project. As proposed, 12 workshops were conducted at venues across the United States. The workshops were attended by 256 high school faculty, exceeding proposed attendance of 240 by 7%. Each workshop mixed theoretical, laboratory, andmore » computer work with practical and ethical implications. Program participants learned simplified lab techniques for amplifying three types of chromosomal polymorphisms: an Alu insertion (PV92), a VNTR (pMCT118/D1S80), and single nucleotide polymorphisms (SNPs) in the mitochondrial control region. These polymorphisms illustrate the use of DNA variations in disease diagnosis, forensic biology, and identity testing - and provide a starting point for discussing the uses and potential abuses of genetic technology. Participants also learned how to use their Alu and mitochondrial data as an entrée to human population genetics and evolution. Our work to simplify lab techniques for amplifying human DNA polymorphisms in educational settings culminated with the release in 1998 of three Advanced Technology (AT) PCR kits by Carolina Biological Supply Company, the nation’s oldest educational science supplier. The kits use a simple 30-minute method to isolate template DNA from hair sheaths or buccal cells and streamlined PCR chemistry based on Pharmacia Ready-To-Go Beads, which incorporate Taq polymerase, deoxynucleotide triphosphates, and buffer in a freeze-dried pellet. These kits have greatly simplified teacher implementation of human PCR labs, and their use is growing at a rapid pace. Sales of human polymorphism kits by Carolina Biological rose from 700 units in 1999 to 1,132 in 2000 – a 62% increase. Competing kits using the Alu system, and based substantially on our earlier work, are also marketed by Biorad and Edvotek. In parallel with the lab experiments, we developed a suite of database/statistical applications and easy-to-use interfaces that allow students to use their own DNA data to explore human population genetics and to test theories of human evolution. Database searches and statistical analyses are launched from a centralized workspace. Workshop participants were introduced to these and other resources available at the DNALC WWW site (http://vector.cshl.org/bioserver/): 1) Allele Server tests Hardy-Weinberg equilibrium and statistically compares PV92 data from world populations. 2) Sequence Server uses DNA sequence data to search Genbank using BLASTN, compare sequences using CLUSTALW, and create phylogenetic trees using PHYLIP. 3) Simulation Server uses a Monte Carlo generator to model the long-term effects of drift, selection, and population bottlenecks. By targeting motivated and innovative biology faculty, we believe that this project offered a cost-effective means to bring high school biology education up-to-the-minute with genomic biology. The workshop reached a target audience of highly professional faculty who have already implemented hands-on labs in molecular genetics and many of whom offer laboratory electives in biotechnology. Many attend professional meetings, develop curriculum, collaborate with scientists, teach faculty workshops, and manage equipment-sharing programs. These individuals are life-long learners, anxious for deeper insight and additional training to further extend their leadership. This contention was supported by data from a mail survey, conducted in February-March 2000 and 2001, of 256 faculty who participated in workshops conducted during the current term of DOE support. Seventy percent of participants responded, providing direct reports on how their teaching behavior had changed since taking the DOE workshop. About nine of ten respondents said they had provided new classroom materials and first-hand accounts of DNA typing, sequencing, or PCR. Three-fourths had introduced new units on human molecular genetics. Most strikingly, half had students use PCR to amplify their own insertion polymorphisms (PV92), and better than one-fourth amplified a VNTR polymorphism and the mitochondrial control region. One in five had mitochondrial DNA sequenced by the DNALC Sequencing Service. A majority (58%) used online materials at the DNALC WWW site, and 28% analyzed student polymorphism data with Bioservers at the DNALC site. A majority (58%) assisted other faculty with student labs on polymorphisms, reaching an additional 786 teachers.« less

  8. Adaptive compressive learning for prediction of protein-protein interactions from primary sequence.

    PubMed

    Zhang, Ya-Nan; Pan, Xiao-Yong; Huang, Yan; Shen, Hong-Bin

    2011-08-21

    Protein-protein interactions (PPIs) play an important role in biological processes. Although much effort has been devoted to the identification of novel PPIs by integrating experimental biological knowledge, there are still many difficulties because of lacking enough protein structural and functional information. It is highly desired to develop methods based only on amino acid sequences for predicting PPIs. However, sequence-based predictors are often struggling with the high-dimensionality causing over-fitting and high computational complexity problems, as well as the redundancy of sequential feature vectors. In this paper, a novel computational approach based on compressed sensing theory is proposed to predict yeast Saccharomyces cerevisiae PPIs from primary sequence and has achieved promising results. The key advantage of the proposed compressed sensing algorithm is that it can compress the original high-dimensional protein sequential feature vector into a much lower but more condensed space taking the sparsity property of the original signal into account. What makes compressed sensing much more attractive in protein sequence analysis is its compressed signal can be reconstructed from far fewer measurements than what is usually considered necessary in traditional Nyquist sampling theory. Experimental results demonstrate that proposed compressed sensing method is powerful for analyzing noisy biological data and reducing redundancy in feature vectors. The proposed method represents a new strategy of dealing with high-dimensional protein discrete model and has great potentiality to be extended to deal with many other complicated biological systems. Copyright © 2011 Elsevier Ltd. All rights reserved.

  9. A group LASSO-based method for robustly inferring gene regulatory networks from multiple time-course datasets.

    PubMed

    Liu, Li-Zhi; Wu, Fang-Xiang; Zhang, Wen-Jun

    2014-01-01

    As an abstract mapping of the gene regulations in the cell, gene regulatory network is important to both biological research study and practical applications. The reverse engineering of gene regulatory networks from microarray gene expression data is a challenging research problem in systems biology. With the development of biological technologies, multiple time-course gene expression datasets might be collected for a specific gene network under different circumstances. The inference of a gene regulatory network can be improved by integrating these multiple datasets. It is also known that gene expression data may be contaminated with large errors or outliers, which may affect the inference results. A novel method, Huber group LASSO, is proposed to infer the same underlying network topology from multiple time-course gene expression datasets as well as to take the robustness to large error or outliers into account. To solve the optimization problem involved in the proposed method, an efficient algorithm which combines the ideas of auxiliary function minimization and block descent is developed. A stability selection method is adapted to our method to find a network topology consisting of edges with scores. The proposed method is applied to both simulation datasets and real experimental datasets. It shows that Huber group LASSO outperforms the group LASSO in terms of both areas under receiver operating characteristic curves and areas under the precision-recall curves. The convergence analysis of the algorithm theoretically shows that the sequence generated from the algorithm converges to the optimal solution of the problem. The simulation and real data examples demonstrate the effectiveness of the Huber group LASSO in integrating multiple time-course gene expression datasets and improving the resistance to large errors or outliers.

  10. Whole-Genome Sequence of Pseudomonas graminis Strain UASWS1507, a Potential Biological Control Agent and Biofertilizer Isolated in Switzerland.

    PubMed

    Crovadore, Julien; Calmin, Gautier; Chablais, Romain; Cochard, Bastien; Schulz, Torsten; Lefort, François

    2016-10-06

    We report here the whole-genome shotgun sequence of the strain UASWS1507 of the species Pseudomonas graminis, isolated in Switzerland from an apple tree. This is the first genome registered for this species, which is considered as a potential and valuable resource of biological control agents and biofertilizers for agriculture. Copyright © 2016 Crovadore et al.

  11. Mind the gap; seven reasons to close fragmented genome assemblies

    USDA-ARS?s Scientific Manuscript database

    Like other domains of life, research into the biology of filamentous microbes has greatly benefited from the advent of whole-genome sequencing. Next-generation sequencing (NGS) technologies have revolutionized sequencing, making genomic sciences accessible to many academic laboratories including tho...

  12. Comparative genomics reveals high biological diversity and specific adaptations in the industrially and medically important fungal genus Aspergillus.

    PubMed

    de Vries, Ronald P; Riley, Robert; Wiebenga, Ad; Aguilar-Osorio, Guillermo; Amillis, Sotiris; Uchima, Cristiane Akemi; Anderluh, Gregor; Asadollahi, Mojtaba; Askin, Marion; Barry, Kerrie; Battaglia, Evy; Bayram, Özgür; Benocci, Tiziano; Braus-Stromeyer, Susanna A; Caldana, Camila; Cánovas, David; Cerqueira, Gustavo C; Chen, Fusheng; Chen, Wanping; Choi, Cindy; Clum, Alicia; Dos Santos, Renato Augusto Corrêa; Damásio, André Ricardo de Lima; Diallinas, George; Emri, Tamás; Fekete, Erzsébet; Flipphi, Michel; Freyberg, Susanne; Gallo, Antonia; Gournas, Christos; Habgood, Rob; Hainaut, Matthieu; Harispe, María Laura; Henrissat, Bernard; Hildén, Kristiina S; Hope, Ryan; Hossain, Abeer; Karabika, Eugenia; Karaffa, Levente; Karányi, Zsolt; Kraševec, Nada; Kuo, Alan; Kusch, Harald; LaButti, Kurt; Lagendijk, Ellen L; Lapidus, Alla; Levasseur, Anthony; Lindquist, Erika; Lipzen, Anna; Logrieco, Antonio F; MacCabe, Andrew; Mäkelä, Miia R; Malavazi, Iran; Melin, Petter; Meyer, Vera; Mielnichuk, Natalia; Miskei, Márton; Molnár, Ákos P; Mulé, Giuseppina; Ngan, Chew Yee; Orejas, Margarita; Orosz, Erzsébet; Ouedraogo, Jean Paul; Overkamp, Karin M; Park, Hee-Soo; Perrone, Giancarlo; Piumi, Francois; Punt, Peter J; Ram, Arthur F J; Ramón, Ana; Rauscher, Stefan; Record, Eric; Riaño-Pachón, Diego Mauricio; Robert, Vincent; Röhrig, Julian; Ruller, Roberto; Salamov, Asaf; Salih, Nadhira S; Samson, Rob A; Sándor, Erzsébet; Sanguinetti, Manuel; Schütze, Tabea; Sepčić, Kristina; Shelest, Ekaterina; Sherlock, Gavin; Sophianopoulou, Vicky; Squina, Fabio M; Sun, Hui; Susca, Antonia; Todd, Richard B; Tsang, Adrian; Unkles, Shiela E; van de Wiele, Nathalie; van Rossen-Uffink, Diana; Oliveira, Juliana Velasco de Castro; Vesth, Tammi C; Visser, Jaap; Yu, Jae-Hyuk; Zhou, Miaomiao; Andersen, Mikael R; Archer, David B; Baker, Scott E; Benoit, Isabelle; Brakhage, Axel A; Braus, Gerhard H; Fischer, Reinhard; Frisvad, Jens C; Goldman, Gustavo H; Houbraken, Jos; Oakley, Berl; Pócsi, István; Scazzocchio, Claudio; Seiboth, Bernhard; vanKuyk, Patricia A; Wortman, Jennifer; Dyer, Paul S; Grigoriev, Igor V

    2017-02-14

    The fungal genus Aspergillus is of critical importance to humankind. Species include those with industrial applications, important pathogens of humans, animals and crops, a source of potent carcinogenic contaminants of food, and an important genetic model. The genome sequences of eight aspergilli have already been explored to investigate aspects of fungal biology, raising questions about evolution and specialization within this genus. We have generated genome sequences for ten novel, highly diverse Aspergillus species and compared these in detail to sister and more distant genera. Comparative studies of key aspects of fungal biology, including primary and secondary metabolism, stress response, biomass degradation, and signal transduction, revealed both conservation and diversity among the species. Observed genomic differences were validated with experimental studies. This revealed several highlights, such as the potential for sex in asexual species, organic acid production genes being a key feature of black aspergilli, alternative approaches for degrading plant biomass, and indications for the genetic basis of stress response. A genome-wide phylogenetic analysis demonstrated in detail the relationship of the newly genome sequenced species with other aspergilli. Many aspects of biological differences between fungal species cannot be explained by current knowledge obtained from genome sequences. The comparative genomics and experimental study, presented here, allows for the first time a genus-wide view of the biological diversity of the aspergilli and in many, but not all, cases linked genome differences to phenotype. Insights gained could be exploited for biotechnological and medical applications of fungi.

  13. Genome Science: A Video Tour of the Washington University Genome Sequencing Center for High School and Undergraduate Students

    PubMed Central

    2005-01-01

    Sequencing of the human genome has ushered in a new era of biology. The technologies developed to facilitate the sequencing of the human genome are now being applied to the sequencing of other genomes. In 2004, a partnership was formed between Washington University School of Medicine Genome Sequencing Center's Outreach Program and Washington University Department of Biology Science Outreach to create a video tour depicting the processes involved in large-scale sequencing. “Sequencing a Genome: Inside the Washington University Genome Sequencing Center” is a tour of the laboratory that follows the steps in the sequencing pipeline, interspersed with animated explanations of the scientific procedures used at the facility. Accompanying interviews with the staff illustrate different entry levels for a career in genome science. This video project serves as an example of how research and academic institutions can provide teachers and students with access and exposure to innovative technologies at the forefront of biomedical research. Initial feedback on the video from undergraduate students, high school teachers, and high school students provides suggestions for use of this video in a classroom setting to supplement present curricula. PMID:16341256

  14. Advanced EMT and Phasor-Domain Hybrid Simulation with Simulation Mode Switching Capability for Transmission and Distribution Systems

    DOE PAGES

    Huang, Qiuhua; Vittal, Vijay

    2018-05-09

    Conventional electromagnetic transient (EMT) and phasor-domain hybrid simulation approaches presently exist for trans-mission system level studies. Their simulation efficiency is generally constrained by the EMT simulation. With an increasing number of distributed energy resources and non-conventional loads being installed in distribution systems, it is imperative to extend the hybrid simulation application to include distribution systems and integrated transmission and distribution systems. Meanwhile, it is equally important to improve the simulation efficiency as the modeling scope and complexity of the detailed system in the EMT simulation increases. To meet both requirements, this paper introduces an advanced EMT and phasor-domain hybrid simulationmore » approach. This approach has two main features: 1) a comprehensive phasor-domain modeling framework which supports positive-sequence, three-sequence, three-phase and mixed three-sequence/three-phase representations and 2) a robust and flexible simulation mode switching scheme. The developed scheme enables simulation switching from hybrid simulation mode back to pure phasor-domain dynamic simulation mode to achieve significantly improved simulation efficiency. The proposed method has been tested on integrated transmission and distribution systems. In conclusion, the results show that with the developed simulation switching feature, the total computational time is significantly reduced compared to running the hybrid simulation for the whole simulation period, while maintaining good simulation accuracy.« less

  15. Advanced EMT and Phasor-Domain Hybrid Simulation with Simulation Mode Switching Capability for Transmission and Distribution Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huang, Qiuhua; Vittal, Vijay

    Conventional electromagnetic transient (EMT) and phasor-domain hybrid simulation approaches presently exist for trans-mission system level studies. Their simulation efficiency is generally constrained by the EMT simulation. With an increasing number of distributed energy resources and non-conventional loads being installed in distribution systems, it is imperative to extend the hybrid simulation application to include distribution systems and integrated transmission and distribution systems. Meanwhile, it is equally important to improve the simulation efficiency as the modeling scope and complexity of the detailed system in the EMT simulation increases. To meet both requirements, this paper introduces an advanced EMT and phasor-domain hybrid simulationmore » approach. This approach has two main features: 1) a comprehensive phasor-domain modeling framework which supports positive-sequence, three-sequence, three-phase and mixed three-sequence/three-phase representations and 2) a robust and flexible simulation mode switching scheme. The developed scheme enables simulation switching from hybrid simulation mode back to pure phasor-domain dynamic simulation mode to achieve significantly improved simulation efficiency. The proposed method has been tested on integrated transmission and distribution systems. In conclusion, the results show that with the developed simulation switching feature, the total computational time is significantly reduced compared to running the hybrid simulation for the whole simulation period, while maintaining good simulation accuracy.« less

  16. Short reads from honey bee (Apis sp.) sequencing projects reflect microbial associate diversity

    PubMed Central

    Hurst, Gregory D.D.

    2017-01-01

    High throughput (or ‘next generation’) sequencing has transformed most areas of biological research and is now a standard method that underpins empirical study of organismal biology, and (through comparison of genomes), reveals patterns of evolution. For projects focused on animals, these sequencing methods do not discriminate between the primary target of sequencing (the animal genome) and ‘contaminating’ material, such as associated microbes. A common first step is to filter out these contaminants to allow better assembly of the animal genome or transcriptome. Here, we aimed to assess if these ‘contaminations’ provide information with regard to biologically important microorganisms associated with the individual. To achieve this, we examined whether the short read data from Apis retrieved elements of its well established microbiome. To this end, we screened almost 1,000 short read libraries of honey bee (Apis sp.) DNA sequencing project for the presence of microbial sequences, and find sequences from known honey bee microbial associates in at least 11% of them. Further to this, we screened ∼500 Apis RNA sequencing libraries for evidence of viral infections, which were found to be present in about half of them. We then used the data to reconstruct draft genomes of three Apis associated bacteria, as well as several viral strains de novo. We conclude that ‘contamination’ in short read sequencing libraries can provide useful genomic information on microbial taxa known to be associated with the target organisms, and may even lead to the discovery of novel associations. Finally, we demonstrate that RNAseq samples from experiments commonly carry uneven viral loads across libraries. We note variation in viral presence and load may be a confounding feature of differential gene expression analyses, and as such it should be incorporated as a random factor in analyses. PMID:28717593

  17. Optimizing multiple sequence alignments using a genetic algorithm based on three objectives: structural information, non-gaps percentage and totally conserved columns.

    PubMed

    Ortuño, Francisco M; Valenzuela, Olga; Rojas, Fernando; Pomares, Hector; Florido, Javier P; Urquiza, Jose M; Rojas, Ignacio

    2013-09-01

    Multiple sequence alignments (MSAs) are widely used approaches in bioinformatics to carry out other tasks such as structure predictions, biological function analyses or phylogenetic modeling. However, current tools usually provide partially optimal alignments, as each one is focused on specific biological features. Thus, the same set of sequences can produce different alignments, above all when sequences are less similar. Consequently, researchers and biologists do not agree about which is the most suitable way to evaluate MSAs. Recent evaluations tend to use more complex scores including further biological features. Among them, 3D structures are increasingly being used to evaluate alignments. Because structures are more conserved in proteins than sequences, scores with structural information are better suited to evaluate more distant relationships between sequences. The proposed multiobjective algorithm, based on the non-dominated sorting genetic algorithm, aims to jointly optimize three objectives: STRIKE score, non-gaps percentage and totally conserved columns. It was significantly assessed on the BAliBASE benchmark according to the Kruskal-Wallis test (P < 0.01). This algorithm also outperforms other aligners, such as ClustalW, Multiple Sequence Alignment Genetic Algorithm (MSA-GA), PRRP, DIALIGN, Hidden Markov Model Training (HMMT), Pattern-Induced Multi-sequence Alignment (PIMA), MULTIALIGN, Sequence Alignment Genetic Algorithm (SAGA), PILEUP, Rubber Band Technique Genetic Algorithm (RBT-GA) and Vertical Decomposition Genetic Algorithm (VDGA), according to the Wilcoxon signed-rank test (P < 0.05), whereas it shows results not significantly different to 3D-COFFEE (P > 0.05) with the advantage of being able to use less structures. Structural information is included within the objective function to evaluate more accurately the obtained alignments. The source code is available at http://www.ugr.es/~fortuno/MOSAStrE/MO-SAStrE.zip.

  18. Short reads from honey bee (Apis sp.) sequencing projects reflect microbial associate diversity.

    PubMed

    Gerth, Michael; Hurst, Gregory D D

    2017-01-01

    High throughput (or 'next generation') sequencing has transformed most areas of biological research and is now a standard method that underpins empirical study of organismal biology, and (through comparison of genomes), reveals patterns of evolution. For projects focused on animals, these sequencing methods do not discriminate between the primary target of sequencing (the animal genome) and 'contaminating' material, such as associated microbes. A common first step is to filter out these contaminants to allow better assembly of the animal genome or transcriptome. Here, we aimed to assess if these 'contaminations' provide information with regard to biologically important microorganisms associated with the individual. To achieve this, we examined whether the short read data from Apis retrieved elements of its well established microbiome. To this end, we screened almost 1,000 short read libraries of honey bee ( Apis sp.) DNA sequencing project for the presence of microbial sequences, and find sequences from known honey bee microbial associates in at least 11% of them. Further to this, we screened ∼500 Apis RNA sequencing libraries for evidence of viral infections, which were found to be present in about half of them. We then used the data to reconstruct draft genomes of three Apis associated bacteria, as well as several viral strains de novo . We conclude that 'contamination' in short read sequencing libraries can provide useful genomic information on microbial taxa known to be associated with the target organisms, and may even lead to the discovery of novel associations. Finally, we demonstrate that RNAseq samples from experiments commonly carry uneven viral loads across libraries. We note variation in viral presence and load may be a confounding feature of differential gene expression analyses, and as such it should be incorporated as a random factor in analyses.

  19. RBT-GA: a novel metaheuristic for solving the Multiple Sequence Alignment problem.

    PubMed

    Taheri, Javid; Zomaya, Albert Y

    2009-07-07

    Multiple Sequence Alignment (MSA) has always been an active area of research in Bioinformatics. MSA is mainly focused on discovering biologically meaningful relationships among different sequences or proteins in order to investigate the underlying main characteristics/functions. This information is also used to generate phylogenetic trees. This paper presents a novel approach, namely RBT-GA, to solve the MSA problem using a hybrid solution methodology combining the Rubber Band Technique (RBT) and the Genetic Algorithm (GA) metaheuristic. RBT is inspired by the behavior of an elastic Rubber Band (RB) on a plate with several poles, which is analogues to locations in the input sequences that could potentially be biologically related. A GA attempts to mimic the evolutionary processes of life in order to locate optimal solutions in an often very complex landscape. RBT-GA is a population based optimization algorithm designed to find the optimal alignment for a set of input protein sequences. In this novel technique, each alignment answer is modeled as a chromosome consisting of several poles in the RBT framework. These poles resemble locations in the input sequences that are most likely to be correlated and/or biologically related. A GA-based optimization process improves these chromosomes gradually yielding a set of mostly optimal answers for the MSA problem. RBT-GA is tested with one of the well-known benchmarks suites (BALiBASE 2.0) in this area. The obtained results show that the superiority of the proposed technique even in the case of formidable sequences.

  20. YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs.

    PubMed

    Shigematsu, Megumi; Honda, Shozo; Loher, Phillipe; Telonis, Aristeidis G; Rigoutsos, Isidore; Kirino, Yohei

    2017-05-19

    Besides translation, transfer RNAs (tRNAs) play many non-canonical roles in various biological pathways and exhibit highly variable expression profiles. To unravel the emerging complexities of tRNA biology and molecular mechanisms underlying them, an efficient tRNA sequencing method is required. However, the rigid structure of tRNA has been presenting a challenge to the development of such methods. We report the development of Y-shaped Adapter-ligated MAture TRNA sequencing (YAMAT-seq), an efficient and convenient method for high-throughput sequencing of mature tRNAs. YAMAT-seq circumvents the issue of inefficient adapter ligation, a characteristic of conventional RNA sequencing methods for mature tRNAs, by employing the efficient and specific ligation of Y-shaped adapter to mature tRNAs using T4 RNA Ligase 2. Subsequent cDNA amplification and next-generation sequencing successfully yield numerous mature tRNA sequences. YAMAT-seq has high specificity for mature tRNAs and high sensitivity to detect most isoacceptors from minute amount of total RNA. Moreover, YAMAT-seq shows quantitative capability to estimate expression levels of mature tRNAs, and has high reproducibility and broad applicability for various cell lines. YAMAT-seq thus provides high-throughput technique for identifying tRNA profiles and their regulations in various transcriptomes, which could play important regulatory roles in translation and other biological processes. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. Comparing viral metagenomics methods using a highly multiplexed human viral pathogens reagent

    PubMed Central

    Li, Linlin; Deng, Xutao; Mee, Edward T.; Collot-Teixeira, Sophie; Anderson, Rob; Schepelmann, Silke; Minor, Philip D.; Delwart, Eric

    2014-01-01

    Unbiased metagenomic sequencing holds significant potential as a diagnostic tool for the simultaneous detection of any previously genetically described viral nucleic acids in clinical samples. Viral genome sequences can also inform on likely phenotypes including drug susceptibility or neutralization serotypes. In this study, different variables of the laboratory methods often used to generate viral metagenomics libraries on the efficiency of viral detection and virus genome coverage were compared. A biological reagent consisting of 25 different human RNA and DNA viral pathogens was used to estimate the effect of filtration and nuclease digestion, DNA/RNA extraction methods, pre-amplification and the use of different library preparation kits on the detection of viral nucleic acids. Filtration and nuclease treatment led to slight decreases in the percentage of viral sequence reads and number of viruses detected. For nucleic acid extractions silica spin columns improved viral sequence recovery relative to magnetic beads and Trizol extraction. Pre-amplification using random RT-PCR while generating more viral sequence reads resulted in detection of fewer viruses, more overlapping sequences, and lower genome coverage. The ScriptSeq library preparation method retrieved more viruses and a greater fraction of their genomes than the TruSeq and Nextera methods. Viral metagenomics sequencing was able to simultaneously detect up to 22 different viruses in the biological reagent analyzed including all those detected by qPCR. Further optimization will be required for the detection of viruses in biologically more complex samples such as tissues, blood, or feces. PMID:25497414

  2. Biological Pathways

    MedlinePlus

    ... Sheets A Brief Guide to Genomics About NHGRI Research About the International HapMap Project Biological Pathways Chromosome Abnormalities Chromosomes Cloning Comparative Genomics DNA Microarray Technology DNA Sequencing Deoxyribonucleic Acid ( ...

  3. Sequence-dependent nucleosome sliding in rotation-coupled and uncoupled modes revealed by molecular simulations

    PubMed Central

    Tan, Cheng; Takada, Shoji

    2017-01-01

    While nucleosome positioning on eukaryotic genome play important roles for genetic regulation, molecular mechanisms of nucleosome positioning and sliding along DNA are not well understood. Here we investigated thermally-activated spontaneous nucleosome sliding mechanisms developing and applying a coarse-grained molecular simulation method that incorporates both long-range electrostatic and short-range hydrogen-bond interactions between histone octamer and DNA. The simulations revealed two distinct sliding modes depending on the nucleosomal DNA sequence. A uniform DNA sequence showed frequent sliding with one base pair step in a rotation-coupled manner, akin to screw-like motions. On the contrary, a strong positioning sequence, the so-called 601 sequence, exhibits rare, abrupt transitions of five and ten base pair steps without rotation. Moreover, we evaluated the importance of hydrogen bond interactions on the sliding mode, finding that strong and weak bonds favor respectively the rotation-coupled and -uncoupled sliding movements. PMID:29194442

  4. Real time simulation of computer-assisted sequencing of terminal area operations

    NASA Technical Reports Server (NTRS)

    Dear, R. G.

    1981-01-01

    A simulation was developed to investigate the utilization of computer assisted decision making for the task of sequencing and scheduling aircraft in a high density terminal area. The simulation incorporates a decision methodology termed Constrained Position Shifting. This methodology accounts for aircraft velocity profiles, routes, and weight classes in dynamically sequencing and scheduling arriving aircraft. A sample demonstration of Constrained Position Shifting is presented where six aircraft types (including both light and heavy aircraft) are sequenced to land at Denver's Stapleton International Airport. A graphical display is utilized and Constrained Position Shifting with a maximum shift of four positions (rearward or forward) is compared to first come, first serve with respect to arrival at the runway. The implementation of computer assisted sequencing and scheduling methodologies is investigated. A time based control concept will be required and design considerations for such a system are discussed.

  5. Communicating the Benefits of a Full Sequence of High School Science Courses

    ERIC Educational Resources Information Center

    Nicholas, Catherine Marie

    2014-01-01

    High school students are generally uninformed about the benefits of enrolling in a full sequence of science courses, therefore only about a third of our nation's high school graduates have completed the science sequence of Biology, Chemistry and Physics. The lack of students completing a full sequence of science courses contributes to the deficit…

  6. Putting Physics First: Three Case Studies of High School Science Department and Course Sequence Reorganization

    ERIC Educational Resources Information Center

    Larkin, Douglas B.

    2016-01-01

    This article examines the process of shifting to a "Physics First" sequence in science course offerings in three school districts in the United States. This curricular sequence reverses the more common U.S. high school sequence of biology/chemistry/physics, and has gained substantial support in the physics education community over the…

  7. Binary Interval Search: a scalable algorithm for counting interval intersections

    PubMed Central

    Layer, Ryan M.; Skadron, Kevin; Robins, Gabriel; Hall, Ira M.; Quinlan, Aaron R.

    2013-01-01

    Motivation: The comparison of diverse genomic datasets is fundamental to understand genome biology. Researchers must explore many large datasets of genome intervals (e.g. genes, sequence alignments) to place their experimental results in a broader context and to make new discoveries. Relationships between genomic datasets are typically measured by identifying intervals that intersect, that is, they overlap and thus share a common genome interval. Given the continued advances in DNA sequencing technologies, efficient methods for measuring statistically significant relationships between many sets of genomic features are crucial for future discovery. Results: We introduce the Binary Interval Search (BITS) algorithm, a novel and scalable approach to interval set intersection. We demonstrate that BITS outperforms existing methods at counting interval intersections. Moreover, we show that BITS is intrinsically suited to parallel computing architectures, such as graphics processing units by illustrating its utility for efficient Monte Carlo simulations measuring the significance of relationships between sets of genomic intervals. Availability: https://github.com/arq5x/bits. Contact: arq5x@virginia.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23129298

  8. Sequence-dependent response of DNA to torsional stress: a potential biological regulation mechanism.

    PubMed

    Reymer, Anna; Zakrzewska, Krystyna; Lavery, Richard

    2018-02-28

    Torsional restraints on DNA change in time and space during the life of the cell and are an integral part of processes such as gene expression, DNA repair and packaging. The mechanical behavior of DNA under torsional stress has been studied on a mesoscopic scale, but little is known concerning its response at the level of individual base pairs and the effects of base pair composition. To answer this question, we have developed a geometrical restraint that can accurately control the total twist of a DNA segment during all-atom molecular dynamics simulations. By applying this restraint to four different DNA oligomers, we are able to show that DNA responds to both under- and overtwisting in a very heterogeneous manner. Certain base pair steps, in specific sequence environments, are able to absorb most of the torsional stress, leaving other steps close to their relaxed conformation. This heterogeneity also affects the local torsional modulus of DNA. These findings suggest that modifying torsional stress on DNA could act as a modulator for protein binding via the heterogeneous changes in local DNA structure.

  9. Sequence-dependent response of DNA to torsional stress: a potential biological regulation mechanism

    PubMed Central

    Reymer, Anna; Zakrzewska, Krystyna; Lavery, Richard

    2018-01-01

    Abstract Torsional restraints on DNA change in time and space during the life of the cell and are an integral part of processes such as gene expression, DNA repair and packaging. The mechanical behavior of DNA under torsional stress has been studied on a mesoscopic scale, but little is known concerning its response at the level of individual base pairs and the effects of base pair composition. To answer this question, we have developed a geometrical restraint that can accurately control the total twist of a DNA segment during all-atom molecular dynamics simulations. By applying this restraint to four different DNA oligomers, we are able to show that DNA responds to both under- and overtwisting in a very heterogeneous manner. Certain base pair steps, in specific sequence environments, are able to absorb most of the torsional stress, leaving other steps close to their relaxed conformation. This heterogeneity also affects the local torsional modulus of DNA. These findings suggest that modifying torsional stress on DNA could act as a modulator for protein binding via the heterogeneous changes in local DNA structure. PMID:29267977

  10. Cumulant expansions for measuring water exchange using diffusion MRI

    NASA Astrophysics Data System (ADS)

    Ning, Lipeng; Nilsson, Markus; Lasič, Samo; Westin, Carl-Fredrik; Rathi, Yogesh

    2018-02-01

    The rate of water exchange across cell membranes is a parameter of biological interest and can be measured by diffusion magnetic resonance imaging (dMRI). In this work, we investigate a stochastic model for the diffusion-and-exchange of water molecules. This model provides a general solution for the temporal evolution of dMRI signal using any type of gradient waveform, thereby generalizing the signal expressions for the Kärger model. Moreover, we also derive a general nth order cumulant expansion of the dMRI signal accounting for water exchange, which has not been explored in earlier studies. Based on this analytical expression, we compute the cumulant expansion for dMRI signals for the special case of single diffusion encoding (SDE) and double diffusion encoding (DDE) sequences. Our results provide a theoretical guideline on optimizing experimental parameters for SDE and DDE sequences, respectively. Moreover, we show that DDE signals are more sensitive to water exchange at short-time scale but provide less attenuation at long-time scale than SDE signals. Our theoretical analysis is also validated using Monte Carlo simulations on synthetic structures.

  11. Integrated Modular Teaching of Human Biology for Primary Care Practitioners

    ERIC Educational Resources Information Center

    Glasgow, Michael S.

    1977-01-01

    Describes the use of integrated modular teaching of the human biology component of the Health Associate Program at Johns Hopkins University, where the goal is to develop an understanding of the sciences as applied to primary care. Discussion covers the module sequence, the human biology faculty, goals of the human biology faculty, laboratory…

  12. Symposium: The Role of Biological Sciences in the Optometric Curriculum.

    ERIC Educational Resources Information Center

    And Others; Rapp, Jerry

    1980-01-01

    Papers from a symposium probing some of the curricular elements of the program in biological sciences at a school or college of optometry are provided. The overall program sequence in the biological sciences, microbiology, pharmacology, and the curriculum in the biological sciences from a clinical perspective are discussed. (Author/MLW)

  13. Torque measurements reveal sequence-specific cooperative transitions in supercoiled DNA

    PubMed Central

    Oberstrass, Florian C.; Fernandes, Louis E.; Bryant, Zev

    2012-01-01

    B-DNA becomes unstable under superhelical stress and is able to adopt a wide range of alternative conformations including strand-separated DNA and Z-DNA. Localized sequence-dependent structural transitions are important for the regulation of biological processes such as DNA replication and transcription. To directly probe the effect of sequence on structural transitions driven by torque, we have measured the torsional response of a panel of DNA sequences using single molecule assays that employ nanosphere rotational probes to achieve high torque resolution. The responses of Z-forming d(pGpC)n sequences match our predictions based on a theoretical treatment of cooperative transitions in helical polymers. “Bubble” templates containing 50–100 bp mismatch regions show cooperative structural transitions similar to B-DNA, although less torque is required to disrupt strand–strand interactions. Our mechanical measurements, including direct characterization of the torsional rigidity of strand-separated DNA, establish a framework for quantitative predictions of the complex torsional response of arbitrary sequences in their biological context. PMID:22474350

  14. Learning Quantitative Sequence-Function Relationships from Massively Parallel Experiments

    NASA Astrophysics Data System (ADS)

    Atwal, Gurinder S.; Kinney, Justin B.

    2016-03-01

    A fundamental aspect of biological information processing is the ubiquity of sequence-function relationships—functions that map the sequence of DNA, RNA, or protein to a biochemically relevant activity. Most sequence-function relationships in biology are quantitative, but only recently have experimental techniques for effectively measuring these relationships been developed. The advent of such "massively parallel" experiments presents an exciting opportunity for the concepts and methods of statistical physics to inform the study of biological systems. After reviewing these recent experimental advances, we focus on the problem of how to infer parametric models of sequence-function relationships from the data produced by these experiments. Specifically, we retrace and extend recent theoretical work showing that inference based on mutual information, not the standard likelihood-based approach, is often necessary for accurately learning the parameters of these models. Closely connected with this result is the emergence of "diffeomorphic modes"—directions in parameter space that are far less constrained by data than likelihood-based inference would suggest. Analogous to Goldstone modes in physics, diffeomorphic modes arise from an arbitrarily broken symmetry of the inference problem. An analytically tractable model of a massively parallel experiment is then described, providing an explicit demonstration of these fundamental aspects of statistical inference. This paper concludes with an outlook on the theoretical and computational challenges currently facing studies of quantitative sequence-function relationships.

  15. Experimental strategies for the identification and characterization of adhesive proteins in animals: a review

    PubMed Central

    Hennebert, Elise; Maldonado, Barbara; Ladurner, Peter; Flammang, Patrick; Santos, Romana

    2015-01-01

    Adhesive secretions occur in both aquatic and terrestrial animals, in which they perform diverse functions. Biological adhesives can therefore be remarkably complex and involve a large range of components with different functions and interactions. However, being mainly protein based, biological adhesives can be characterized by classical molecular methods. This review compiles experimental strategies that were successfully used to identify, characterize and obtain the full-length sequence of adhesive proteins from nine biological models: echinoderms, barnacles, tubeworms, mussels, sticklebacks, slugs, velvet worms, spiders and ticks. A brief description and practical examples are given for a variety of tools used to study adhesive molecules at different levels from genes to secreted proteins. In most studies, proteins, extracted from secreted materials or from adhesive organs, are analysed for the presence of post-translational modifications and submitted to peptide sequencing. The peptide sequences are then used directly for a BLAST search in genomic or transcriptomic databases, or to design degenerate primers to perform RT-PCR, both allowing the recovery of the sequence of the cDNA coding for the investigated protein. These sequences can then be used for functional validation and recombinant production. In recent years, the dual proteomic and transcriptomic approach has emerged as the best way leading to the identification of novel adhesive proteins and retrieval of their complete sequences. PMID:25657842

  16. Generalizing Gillespie’s Direct Method to Enable Network-Free Simulations

    DOE PAGES

    Suderman, Ryan T.; Mitra, Eshan David; Lin, Yen Ting; ...

    2018-03-28

    Gillespie’s direct method for stochastic simulation of chemical kinetics is a staple of computational systems biology research. However, the algorithm requires explicit enumeration of all reactions and all chemical species that may arise in the system. In many cases, this is not feasible due to the combinatorial explosion of reactions and species in biological networks. Rule-based modeling frameworks provide a way to exactly represent networks containing such combinatorial complexity, and generalizations of Gillespie’s direct method have been developed as simulation engines for rule-based modeling languages. Here, we provide both a high-level description of the algorithms underlying the simulation engines, termedmore » network-free simulation algorithms, and how they have been applied in systems biology research. We also define a generic rule-based modeling framework and describe a number of technical details required for adapting Gillespie’s direct method for network-free simulation. Lastly, we briefly discuss potential avenues for advancing network-free simulation and the role they continue to play in modeling dynamical systems in biology.« less

  17. Generalizing Gillespie’s Direct Method to Enable Network-Free Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Suderman, Ryan T.; Mitra, Eshan David; Lin, Yen Ting

    Gillespie’s direct method for stochastic simulation of chemical kinetics is a staple of computational systems biology research. However, the algorithm requires explicit enumeration of all reactions and all chemical species that may arise in the system. In many cases, this is not feasible due to the combinatorial explosion of reactions and species in biological networks. Rule-based modeling frameworks provide a way to exactly represent networks containing such combinatorial complexity, and generalizations of Gillespie’s direct method have been developed as simulation engines for rule-based modeling languages. Here, we provide both a high-level description of the algorithms underlying the simulation engines, termedmore » network-free simulation algorithms, and how they have been applied in systems biology research. We also define a generic rule-based modeling framework and describe a number of technical details required for adapting Gillespie’s direct method for network-free simulation. Lastly, we briefly discuss potential avenues for advancing network-free simulation and the role they continue to play in modeling dynamical systems in biology.« less

  18. LS³: A Method for Improving Phylogenomic Inferences When Evolutionary Rates Are Heterogeneous among Taxa.

    PubMed

    Rivera-Rivera, Carlos J; Montoya-Burgos, Juan I

    2016-06-01

    Phylogenetic inference artifacts can occur when sequence evolution deviates from assumptions made by the models used to analyze them. The combination of strong model assumption violations and highly heterogeneous lineage evolutionary rates can become problematic in phylogenetic inference, and lead to the well-described long-branch attraction (LBA) artifact. Here, we define an objective criterion for assessing lineage evolutionary rate heterogeneity among predefined lineages: the result of a likelihood ratio test between a model in which the lineages evolve at the same rate (homogeneous model) and a model in which different lineage rates are allowed (heterogeneous model). We implement this criterion in the algorithm Locus Specific Sequence Subsampling (LS³), aimed at reducing the effects of LBA in multi-gene datasets. For each gene, LS³ sequentially removes the fastest-evolving taxon of the ingroup and tests for lineage rate homogeneity until all lineages have uniform evolutionary rates. The sequences excluded from the homogeneously evolving taxon subset are flagged as potentially problematic. The software implementation provides the user with the possibility to remove the flagged sequences for generating a new concatenated alignment. We tested LS³ with simulations and two real datasets containing LBA artifacts: a nucleotide dataset regarding the position of Glires within mammals and an amino-acid dataset concerning the position of nematodes within bilaterians. The initially incorrect phylogenies were corrected in all cases upon removing data flagged by LS³. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  19. Modelling outcomes of complex treatment strategies following a clinical guideline for treatment decisions in patients with rheumatoid arthritis.

    PubMed

    Tran-Duy, An; Boonen, Annelies; Kievit, Wietske; van Riel, Piet L C M; van de Laar, Mart A F J; Severens, Johan L

    2014-10-01

    Management of rheumatoid arthritis (RA) is characterised by a sequence of disease-modifying antirheumatic drugs (DMARDs) and biological response modifiers (BRMs). In most of the Western countries, the drug sequences are determined based on disease activity and treatment history of the patients. A model for realistic patient outcomes should reflect the treatment pathways relevant for patients with specific characteristics. This study aimed at developing a model that could simulate long-term patient outcomes and cost effectiveness of treatment strategies with and without inclusion of BRMs following a clinical guideline for treatment decisions. Discrete event simulation taking into account patient characteristics and treatment history was used for model development. Treatment effect on disease activity, costs, health utilities and times to events were estimated using Dutch observational studies. Long-term progression of physical functioning was quantified using a linear mixed-effects model. Costs and health utilities were estimated using two-part models. The treatment strategy recommended by the Dutch Society for Rheumatology where both DMARDs and BRMs were available (Strategy 2) was compared with the treatment strategy without BRMs (Strategy 1). Ten thousand theoretical patients were tracked individually until death. In the probabilistic sensitivity analysis, Monte Carlo simulations were performed with 1,000 sets of parameters sampled from appropriate probability distributions. The simulated changes over time in disease activity and physical functioning were plausible. The incremental cost per quality-adjusted life-year gained of Strategy 2 compared with Strategy 1 was 124,011. At a willingness-to-pay threshold higher than 119,167, Strategy 2 dominated Strategy 1 in terms of cost effectiveness but the probability that the Strategy 2 is cost effective never exceeded 0.87. It is possible to model the outcomes of complex treatment strategies based on a clinical guideline for the management of RA. Following the Dutch guideline and using real-life data, inclusion of BRMs in the treatment strategy for RA appeared to be less favourable in our model than in most of the existing models that compared drug sequences independent of patient characteristics and used data from randomised controlled clinical trials. Despite complexity and demand for extensive data, our modelling approach can help to identify the knowledge gaps in clinical guidelines for RA management and priorities for future research.

  20. College Students' Misconceptions about Evolutionary Trees

    ERIC Educational Resources Information Center

    Meir, Eli; Perry, Judy; Herron, Jon C.; Kingsolver, Joel

    2007-01-01

    Evolution is at the center of the biological sciences and is therefore a required topic for virtually every college biology student. Over the past year, the authors have been building a new simulation software package called EvoBeaker to teach college-level evolutionary biology through simulated experiments. They have built both micro and…

  1. BlochSolver: A GPU-optimized fast 3D MRI simulator for experimentally compatible pulse sequences

    NASA Astrophysics Data System (ADS)

    Kose, Ryoichi; Kose, Katsumi

    2017-08-01

    A magnetic resonance imaging (MRI) simulator, which reproduces MRI experiments using computers, has been developed using two graphic-processor-unit (GPU) boards (GTX 1080). The MRI simulator was developed to run according to pulse sequences used in experiments. Experiments and simulations were performed to demonstrate the usefulness of the MRI simulator for three types of pulse sequences, namely, three-dimensional (3D) gradient-echo, 3D radio-frequency spoiled gradient-echo, and gradient-echo multislice with practical matrix sizes. The results demonstrated that the calculation speed using two GPU boards was typically about 7 TFLOPS and about 14 times faster than the calculation speed using CPUs (two 18-core Xeons). We also found that MR images acquired by experiment could be reproduced using an appropriate number of subvoxels, and that 3D isotropic and two-dimensional multislice imaging experiments for practical matrix sizes could be simulated using the MRI simulator. Therefore, we concluded that such powerful MRI simulators are expected to become an indispensable tool for MRI research and development.

  2. High performance MRI simulations of motion on multi-GPU systems.

    PubMed

    Xanthis, Christos G; Venetis, Ioannis E; Aletras, Anthony H

    2014-07-04

    MRI physics simulators have been developed in the past for optimizing imaging protocols and for training purposes. However, these simulators have only addressed motion within a limited scope. The purpose of this study was the incorporation of realistic motion, such as cardiac motion, respiratory motion and flow, within MRI simulations in a high performance multi-GPU environment. Three different motion models were introduced in the Magnetic Resonance Imaging SIMULator (MRISIMUL) of this study: cardiac motion, respiratory motion and flow. Simulation of a simple Gradient Echo pulse sequence and a CINE pulse sequence on the corresponding anatomical model was performed. Myocardial tagging was also investigated. In pulse sequence design, software crushers were introduced to accommodate the long execution times in order to avoid spurious echoes formation. The displacement of the anatomical model isochromats was calculated within the Graphics Processing Unit (GPU) kernel for every timestep of the pulse sequence. Experiments that would allow simulation of custom anatomical and motion models were also performed. Last, simulations of motion with MRISIMUL on single-node and multi-node multi-GPU systems were examined. Gradient Echo and CINE images of the three motion models were produced and motion-related artifacts were demonstrated. The temporal evolution of the contractility of the heart was presented through the application of myocardial tagging. Better simulation performance and image quality were presented through the introduction of software crushers without the need to further increase the computational load and GPU resources. Last, MRISIMUL demonstrated an almost linear scalable performance with the increasing number of available GPU cards, in both single-node and multi-node multi-GPU computer systems. MRISIMUL is the first MR physics simulator to have implemented motion with a 3D large computational load on a single computer multi-GPU configuration. The incorporation of realistic motion models, such as cardiac motion, respiratory motion and flow may benefit the design and optimization of existing or new MR pulse sequences, protocols and algorithms, which examine motion related MR applications.

  3. Detecting gene subnetworks under selection in biological pathways.

    PubMed

    Gouy, Alexandre; Daub, Joséphine T; Excoffier, Laurent

    2017-09-19

    Advances in high throughput sequencing technologies have created a gap between data production and functional data analysis. Indeed, phenotypes result from interactions between numerous genes, but traditional methods treat loci independently, missing important knowledge brought by network-level emerging properties. Therefore, detecting selection acting on multiple genes affecting the evolution of complex traits remains challenging. In this context, gene network analysis provides a powerful framework to study the evolution of adaptive traits and facilitates the interpretation of genome-wide data. We developed a method to analyse gene networks that is suitable to evidence polygenic selection. The general idea is to search biological pathways for subnetworks of genes that directly interact with each other and that present unusual evolutionary features. Subnetwork search is a typical combinatorial optimization problem that we solve using a simulated annealing approach. We have applied our methodology to find signals of adaptation to high-altitude in human populations. We show that this adaptation has a clear polygenic basis and is influenced by many genetic components. Our approach, implemented in the R package signet, improves on gene-level classical tests for selection by identifying both new candidate genes and new biological processes involved in adaptation to altitude. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Electrical detection of the biological interaction of a charged peptide via gallium arsenide junction-field-effect transistors

    PubMed Central

    Lee, Kangho; Nair, Pradeep R.; Alam, Muhammad A.; Janes, David B.; Wampler, Heeyeon P.; Zemlyanov, Dmitry Y.; Ivanisevic, Albena

    2008-01-01

    GaAs junction-field-effect transistors (JFETs) are utilized to achieve label-free detection of biological interaction between a probe transactivating transcriptional activator (TAT) peptide and the target trans-activation-responsive (TAR) RNA. The TAT peptide is a short sequence derived from the human immunodeficiency virus-type 1 TAT protein. The GaAs JFETs are modified with a mixed adlayer of 1-octadecanethiol (ODT) and TAT peptide, with the ODT passivating the GaAs surface from polar ions in physiological solutions and the TAT peptide providing selective binding sites for TAR RNA. The devices modified with the mixed adlayer exhibit a negative pinch-off voltage (VP) shift, which is attributed to the fixed positive charges from the arginine-rich regions in the TAT peptide. Immersing the modified devices into a TAR RNA solution results in a large positive VP shift (>1 V) and a steeper subthreshold slope (∼80 mV∕decade), whereas “dummy” RNA induced a small positive VP shift (∼0.3 V) without a significant change in subthreshold slopes (∼330 mV∕decade). The observed modulation of device characteristics is analyzed with analytical modeling and two-dimensional numerical device simulations to investigate the electronic interactions between the GaAs JFETs and biological molecules. PMID:19484151

  5. Destruction of Spores on Building Decontamination Residue in a Commercial Autoclave▿

    PubMed Central

    Lemieux, P.; Sieber, R.; Osborne, A.; Woodard, A.

    2006-01-01

    The U.S. Environmental Protection Agency conducted an experiment to evaluate the effectiveness of a commercial autoclave for treating simulated building decontamination residue (BDR). The BDR was intended to simulate porous materials removed from a building deliberately contaminated with biological agents such as Bacillus anthracis (anthrax) in a terrorist attack. The purpose of the tests was to assess whether the standard operating procedure for a commercial autoclave provided sufficiently robust conditions to adequately destroy bacterial spores bound to the BDR. In this study we investigated the effects of several variables related to autoclaving BDR, including time, temperature, pressure, item type, moisture content, packing density, packing orientation, autoclave bag integrity, and autoclave process sequence. The test team created simulated BDR from wallboard, ceiling tiles, carpet, and upholstered furniture, and embedded in the BDR were Geobacillus stearothermophilus biological indicator (BI) strips containing 106 spores and thermocouples to obtain time and temperature profile data associated with each BI strip. The results indicated that a single standard autoclave cycle did not effectively decontaminate the BDR. Autoclave cycles consisting of 120 min at 31.5 lb/in2 and 275°F and 75 min at 45 lb/in2 and 292°F effectively decontaminated the BDR material. Two sequential standard autoclave cycles consisting of 40 min at 31.5 lb/in2 and 275°F proved to be particularly effective, probably because the second cycle's evacuation step pulled the condensed water out of the pores of the materials, allowing better steam penetration. The results also indicated that the packing density and material type of the BDR in the autoclave could have a significant impact on the effectiveness of the decontamination process. PMID:17012597

  6. Combining Experiments and Simulations Using the Maximum Entropy Principle

    PubMed Central

    Boomsma, Wouter; Ferkinghoff-Borg, Jesper; Lindorff-Larsen, Kresten

    2014-01-01

    A key component of computational biology is to compare the results of computer modelling with experimental measurements. Despite substantial progress in the models and algorithms used in many areas of computational biology, such comparisons sometimes reveal that the computations are not in quantitative agreement with experimental data. The principle of maximum entropy is a general procedure for constructing probability distributions in the light of new data, making it a natural tool in cases when an initial model provides results that are at odds with experiments. The number of maximum entropy applications in our field has grown steadily in recent years, in areas as diverse as sequence analysis, structural modelling, and neurobiology. In this Perspectives article, we give a broad introduction to the method, in an attempt to encourage its further adoption. The general procedure is explained in the context of a simple example, after which we proceed with a real-world application in the field of molecular simulations, where the maximum entropy procedure has recently provided new insight. Given the limited accuracy of force fields, macromolecular simulations sometimes produce results that are at not in complete and quantitative accordance with experiments. A common solution to this problem is to explicitly ensure agreement between the two by perturbing the potential energy function towards the experimental data. So far, a general consensus for how such perturbations should be implemented has been lacking. Three very recent papers have explored this problem using the maximum entropy approach, providing both new theoretical and practical insights to the problem. We highlight each of these contributions in turn and conclude with a discussion on remaining challenges. PMID:24586124

  7. Destruction of spores on building decontamination residue in a commercial autoclave.

    PubMed

    Lemieux, P; Sieber, R; Osborne, A; Woodard, A

    2006-12-01

    The U.S. Environmental Protection Agency conducted an experiment to evaluate the effectiveness of a commercial autoclave for treating simulated building decontamination residue (BDR). The BDR was intended to simulate porous materials removed from a building deliberately contaminated with biological agents such as Bacillus anthracis (anthrax) in a terrorist attack. The purpose of the tests was to assess whether the standard operating procedure for a commercial autoclave provided sufficiently robust conditions to adequately destroy bacterial spores bound to the BDR. In this study we investigated the effects of several variables related to autoclaving BDR, including time, temperature, pressure, item type, moisture content, packing density, packing orientation, autoclave bag integrity, and autoclave process sequence. The test team created simulated BDR from wallboard, ceiling tiles, carpet, and upholstered furniture, and embedded in the BDR were Geobacillus stearothermophilus biological indicator (BI) strips containing 10(6) spores and thermocouples to obtain time and temperature profile data associated with each BI strip. The results indicated that a single standard autoclave cycle did not effectively decontaminate the BDR. Autoclave cycles consisting of 120 min at 31.5 lb/in2 and 275 degrees F and 75 min at 45 lb/in2 and 292 degrees F effectively decontaminated the BDR material. Two sequential standard autoclave cycles consisting of 40 min at 31.5 lb/in2 and 275 degrees F proved to be particularly effective, probably because the second cycle's evacuation step pulled the condensed water out of the pores of the materials, allowing better steam penetration. The results also indicated that the packing density and material type of the BDR in the autoclave could have a significant impact on the effectiveness of the decontamination process.

  8. Natural product-inspired cascade synthesis yields modulators of centrosome integrity.

    PubMed

    Dückert, Heiko; Pries, Verena; Khedkar, Vivek; Menninger, Sascha; Bruss, Hanna; Bird, Alexander W; Maliga, Zoltan; Brockmeyer, Andreas; Janning, Petra; Hyman, Anthony; Grimme, Stefan; Schürmann, Markus; Preut, Hans; Hübel, Katja; Ziegler, Slava; Kumar, Kamal; Waldmann, Herbert

    2011-12-25

    In biology-oriented synthesis, the scaffolds of biologically relevant compound classes inspire the synthesis of focused compound collections enriched in bioactivity. This criterion is, in particular, met by the scaffolds of natural products selected in evolution. The synthesis of natural product-inspired compound collections calls for efficient reaction sequences that preferably combine multiple individual transformations in one operation. Here we report the development of a one-pot, twelve-step cascade reaction sequence that includes nine different reactions and two opposing kinds of organocatalysis. The cascade sequence proceeds within 10-30 min and transforms readily available substrates into complex indoloquinolizines that resemble the core tetracyclic scaffold of numerous polycyclic indole alkaloids. Biological investigation of a corresponding focused compound collection revealed modulators of centrosome integrity, termed centrocountins, which caused fragmented and supernumerary centrosomes, chromosome congression defects, multipolar mitotic spindles, acentrosomal spindle poles and multipolar cell division by targeting the centrosome-associated proteins nucleophosmin and Crm1.

  9. Automated design of genetic toggle switches with predetermined bistability.

    PubMed

    Chen, Shuobing; Zhang, Haoqian; Shi, Handuo; Ji, Weiyue; Feng, Jingchen; Gong, Yan; Yang, Zhenglin; Ouyang, Qi

    2012-07-20

    Synthetic biology aims to rationally construct biological devices with required functionalities. Methods that automate the design of genetic devices without post-hoc adjustment are therefore highly desired. Here we provide a method to predictably design genetic toggle switches with predetermined bistability. To accomplish this task, a biophysical model that links ribosome binding site (RBS) DNA sequence to toggle switch bistability was first developed by integrating a stochastic model with RBS design method. Then, to parametrize the model, a library of genetic toggle switch mutants was experimentally built, followed by establishing the equivalence between RBS DNA sequences and switch bistability. To test this equivalence, RBS nucleotide sequences for different specified bistabilities were in silico designed and experimentally verified. Results show that the deciphered equivalence is highly predictive for the toggle switch design with predetermined bistability. This method can be generalized to quantitative design of other probabilistic genetic devices in synthetic biology.

  10. Affordable hands-on DNA sequencing and genotyping: an exercise for teaching DNA analysis to undergraduates.

    PubMed

    Shah, Kushani; Thomas, Shelby; Stein, Arnold

    2013-01-01

    In this report, we describe a 5-week laboratory exercise for undergraduate biology and biochemistry students in which students learn to sequence DNA and to genotype their DNA for selected single nucleotide polymorphisms (SNPs). Students use miniaturized DNA sequencing gels that require approximately 8 min to run. The students perform G, A, T, C Sanger sequencing reactions. They prepare and run the gels, perform Southern blots (which require only 10 min), and detect sequencing ladders using a colorimetric detection system. Students enlarge their sequencing ladders from digital images of their small nylon membranes, and read the sequence manually. They compare their reads with the actual DNA sequence using BLAST2. After mastering the DNA sequencing system, students prepare their own DNA from a cheek swab, polymerase chain reaction-amplify a region of their DNA that encompasses a SNP of interest, and perform sequencing to determine their genotype at the SNP position. A family pedigree can also be constructed. The SNP chosen by the instructor was rs17822931, which is in the ABCC11 gene and is the determinant of human earwax type. Genotypes at the rs178229931 site vary in different ethnic populations. © 2013 by The International Union of Biochemistry and Molecular Biology.

  11. STOCHSIMGPU: parallel stochastic simulation for the Systems Biology Toolbox 2 for MATLAB.

    PubMed

    Klingbeil, Guido; Erban, Radek; Giles, Mike; Maini, Philip K

    2011-04-15

    The importance of stochasticity in biological systems is becoming increasingly recognized and the computational cost of biologically realistic stochastic simulations urgently requires development of efficient software. We present a new software tool STOCHSIMGPU that exploits graphics processing units (GPUs) for parallel stochastic simulations of biological/chemical reaction systems and show that significant gains in efficiency can be made. It is integrated into MATLAB and works with the Systems Biology Toolbox 2 (SBTOOLBOX2) for MATLAB. The GPU-based parallel implementation of the Gillespie stochastic simulation algorithm (SSA), the logarithmic direct method (LDM) and the next reaction method (NRM) is approximately 85 times faster than the sequential implementation of the NRM on a central processing unit (CPU). Using our software does not require any changes to the user's models, since it acts as a direct replacement of the stochastic simulation software of the SBTOOLBOX2. The software is open source under the GPL v3 and available at http://www.maths.ox.ac.uk/cmb/STOCHSIMGPU. The web site also contains supplementary information. klingbeil@maths.ox.ac.uk Supplementary data are available at Bioinformatics online.

  12. Biological Simulations in Distance Learning. CAL Research Group Technical Report No. 12.

    ERIC Educational Resources Information Center

    Murphy, P. J.

    When two biological simulations on evolution and genetics (one originally developed for a conventional university undergraduate course) were introduced into Open University distance education classes, the difficulties encountered required a reappraisal of the concept of using computer simulation for distance learning and decisions on which…

  13. Scalable metagenomic taxonomy classification using a reference genome database

    PubMed Central

    Ames, Sasha K.; Hysom, David A.; Gardner, Shea N.; Lloyd, G. Scott; Gokhale, Maya B.; Allen, Jonathan E.

    2013-01-01

    Motivation: Deep metagenomic sequencing of biological samples has the potential to recover otherwise difficult-to-detect microorganisms and accurately characterize biological samples with limited prior knowledge of sample contents. Existing metagenomic taxonomic classification algorithms, however, do not scale well to analyze large metagenomic datasets, and balancing classification accuracy with computational efficiency presents a fundamental challenge. Results: A method is presented to shift computational costs to an off-line computation by creating a taxonomy/genome index that supports scalable metagenomic classification. Scalable performance is demonstrated on real and simulated data to show accurate classification in the presence of novel organisms on samples that include viruses, prokaryotes, fungi and protists. Taxonomic classification of the previously published 150 giga-base Tyrolean Iceman dataset was found to take <20 h on a single node 40 core large memory machine and provide new insights on the metagenomic contents of the sample. Availability: Software was implemented in C++ and is freely available at http://sourceforge.net/projects/lmat Contact: allen99@llnl.gov Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23828782

  14. An information-based approach to change-point analysis with applications to biophysics and cell biology.

    PubMed

    Wiggins, Paul A

    2015-07-21

    This article describes the application of a change-point algorithm to the analysis of stochastic signals in biological systems whose underlying state dynamics consist of transitions between discrete states. Applications of this analysis include molecular-motor stepping, fluorophore bleaching, electrophysiology, particle and cell tracking, detection of copy number variation by sequencing, tethered-particle motion, etc. We present a unified approach to the analysis of processes whose noise can be modeled by Gaussian, Wiener, or Ornstein-Uhlenbeck processes. To fit the model, we exploit explicit, closed-form algebraic expressions for maximum-likelihood estimators of model parameters and estimated information loss of the generalized noise model, which can be computed extremely efficiently. We implement change-point detection using the frequentist information criterion (which, to our knowledge, is a new information criterion). The frequentist information criterion specifies a single, information-based statistical test that is free from ad hoc parameters and requires no prior probability distribution. We demonstrate this information-based approach in the analysis of simulated and experimental tethered-particle-motion data. Copyright © 2015 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  15. Fractals in biology and medicine

    NASA Technical Reports Server (NTRS)

    Havlin, S.; Buldyrev, S. V.; Goldberger, A. L.; Mantegna, R. N.; Ossadnik, S. M.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    Our purpose is to describe some recent progress in applying fractal concepts to systems of relevance to biology and medicine. We review several biological systems characterized by fractal geometry, with a particular focus on the long-range power-law correlations found recently in DNA sequences containing noncoding material. Furthermore, we discuss the finding that the exponent alpha quantifying these long-range correlations ("fractal complexity") is smaller for coding than for noncoding sequences. We also discuss the application of fractal scaling analysis to the dynamics of heartbeat regulation, and report the recent finding that the normal heart is characterized by long-range "anticorrelations" which are absent in the diseased heart.

  16. Using Maximum Entropy to Find Patterns in Genomes

    NASA Astrophysics Data System (ADS)

    Liu, Sophia; Hockenberry, Adam; Lancichinetti, Andrea; Jewett, Michael; Amaral, Luis

    The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. To accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. This approach can also be easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes. National Institute of General Medical Science, Northwestern University Presidential Fellowship, National Science Foundation, David and Lucile Packard Foundation, Camille Dreyfus Teacher Scholar Award.

  17. BASiNET-BiologicAl Sequences NETwork: a case study on coding and non-coding RNAs identification.

    PubMed

    Ito, Eric Augusto; Katahira, Isaque; Vicente, Fábio Fernandes da Rocha; Pereira, Luiz Filipe Protasio; Lopes, Fabrício Martins

    2018-06-05

    With the emergence of Next Generation Sequencing (NGS) technologies, a large volume of sequence data in particular de novo sequencing was rapidly produced at relatively low costs. In this context, computational tools are increasingly important to assist in the identification of relevant information to understand the functioning of organisms. This work introduces BASiNET, an alignment-free tool for classifying biological sequences based on the feature extraction from complex network measurements. The method initially transform the sequences and represents them as complex networks. Then it extracts topological measures and constructs a feature vector that is used to classify the sequences. The method was evaluated in the classification of coding and non-coding RNAs of 13 species and compared to the CNCI, PLEK and CPC2 methods. BASiNET outperformed all compared methods in all adopted organisms and datasets. BASiNET have classified sequences in all organisms with high accuracy and low standard deviation, showing that the method is robust and non-biased by the organism. The proposed methodology is implemented in open source in R language and freely available for download at https://cran.r-project.org/package=BASiNET.

  18. Systematization of the protein sequence diversity in enzymes related to secondary metabolic pathways in plants, in the context of big data biology inspired by the KNApSAcK motorcycle database.

    PubMed

    Ikeda, Shun; Abe, Takashi; Nakamura, Yukiko; Kibinge, Nelson; Hirai Morita, Aki; Nakatani, Atsushi; Ono, Naoaki; Ikemura, Toshimichi; Nakamura, Kensuke; Altaf-Ul-Amin, Md; Kanaya, Shigehiko

    2013-05-01

    Biology is increasingly becoming a data-intensive science with the recent progress of the omics fields, e.g. genomics, transcriptomics, proteomics and metabolomics. The species-metabolite relationship database, KNApSAcK Core, has been widely utilized and cited in metabolomics research, and chronological analysis of that research work has helped to reveal recent trends in metabolomics research. To meet the needs of these trends, the KNApSAcK database has been extended by incorporating a secondary metabolic pathway database called Motorcycle DB. We examined the enzyme sequence diversity related to secondary metabolism by means of batch-learning self-organizing maps (BL-SOMs). Initially, we constructed a map by using a big data matrix consisting of the frequencies of all possible dipeptides in the protein sequence segments of plants and bacteria. The enzyme sequence diversity of the secondary metabolic pathways was examined by identifying clusters of segments associated with certain enzyme groups in the resulting map. The extent of diversity of 15 secondary metabolic enzyme groups is discussed. Data-intensive approaches such as BL-SOM applied to big data matrices are needed for systematizing protein sequences. Handling big data has become an inevitable part of biology.

  19. Conserved noncoding sequences conserve biological networks and influence genome evolution.

    PubMed

    Xie, Jianbo; Qian, Kecheng; Si, Jingna; Xiao, Liang; Ci, Dong; Zhang, Deqiang

    2018-05-01

    Comparative genomics approaches have identified numerous conserved cis-regulatory sequences near genes in plant genomes. Despite the identification of these conserved noncoding sequences (CNSs), our knowledge of their functional importance and selection remains limited. Here, we used a combination of DNA methylome analysis, microarray expression analyses, and functional annotation to study these sequences in the model tree Populus trichocarpa. Methylation in CG contexts and non-CG contexts was lower in CNSs, particularly CNSs in the 5'-upstream regions of genes, compared with other sites in the genome. We observed that CNSs are enriched in genes with transcription and binding functions, and this also associated with syntenic genes and those from whole-genome duplications, suggesting that cis-regulatory sequences play a key role in genome evolution. We detected a significant positive correlation between CNS number and protein interactions, suggesting that CNSs may have roles in the evolution and maintenance of biological networks. The divergence of CNSs indicates that duplication-degeneration-complementation drives the subfunctionalization of a proportion of duplicated genes from whole-genome duplication. Furthermore, population genomics confirmed that most CNSs are under strong purifying selection and only a small subset of CNSs shows evidence of adaptive evolution. These findings provide a foundation for future studies exploring these key genomic features in the maintenance of biological networks, local adaptation, and transcription.

  20. incaRNAfbinv: a web server for the fragment-based design of RNA sequences

    PubMed Central

    Drory Retwitzer, Matan; Reinharz, Vladimir; Ponty, Yann; Waldispühl, Jérôme; Barash, Danny

    2016-01-01

    Abstract In recent years, new methods for computational RNA design have been developed and applied to various problems in synthetic biology and nanotechnology. Lately, there is considerable interest in incorporating essential biological information when solving the inverse RNA folding problem. Correspondingly, RNAfbinv aims at including biologically meaningful constraints and is the only program to-date that performs a fragment-based design of RNA sequences. In doing so it allows the design of sequences that do not necessarily exactly fold into the target, as long as the overall coarse-grained tree graph shape is preserved. Augmented by the weighted sampling algorithm of incaRNAtion, our web server called incaRNAfbinv implements the method devised in RNAfbinv and offers an interactive environment for the inverse folding of RNA using a fragment-based design approach. It takes as input: a target RNA secondary structure; optional sequence and motif constraints; optional target minimum free energy, neutrality and GC content. In addition to the design of synthetic regulatory sequences, it can be used as a pre-processing step for the detection of novel natural occurring RNAs. The two complementary methodologies RNAfbinv and incaRNAtion are merged together and fully implemented in our web server incaRNAfbinv, available at http://www.cs.bgu.ac.il/incaRNAfbinv. PMID:27185893

  1. The Structural Biology Knowledgebase: a portal to protein structures, sequences, functions, and methods.

    PubMed

    Gabanyi, Margaret J; Adams, Paul D; Arnold, Konstantin; Bordoli, Lorenza; Carter, Lester G; Flippen-Andersen, Judith; Gifford, Lida; Haas, Juergen; Kouranov, Andrei; McLaughlin, William A; Micallef, David I; Minor, Wladek; Shah, Raship; Schwede, Torsten; Tao, Yi-Ping; Westbrook, John D; Zimmerman, Matthew; Berman, Helen M

    2011-07-01

    The Protein Structure Initiative's Structural Biology Knowledgebase (SBKB, URL: http://sbkb.org ) is an open web resource designed to turn the products of the structural genomics and structural biology efforts into knowledge that can be used by the biological community to understand living systems and disease. Here we will present examples on how to use the SBKB to enable biological research. For example, a protein sequence or Protein Data Bank (PDB) structure ID search will provide a list of related protein structures in the PDB, associated biological descriptions (annotations), homology models, structural genomics protein target status, experimental protocols, and the ability to order available DNA clones from the PSI:Biology-Materials Repository. A text search will find publication and technology reports resulting from the PSI's high-throughput research efforts. Web tools that aid in research, including a system that accepts protein structure requests from the community, will also be described. Created in collaboration with the Nature Publishing Group, the Structural Biology Knowledgebase monthly update also provides a research library, editorials about new research advances, news, and an events calendar to present a broader view of structural genomics and structural biology.

  2. Toward University Modeling Instruction—Biology: Adapting Curricular Frameworks from Physics to Biology

    PubMed Central

    Manthey, Seth; Brewe, Eric

    2013-01-01

    University Modeling Instruction (UMI) is an approach to curriculum and pedagogy that focuses instruction on engaging students in building, validating, and deploying scientific models. Modeling Instruction has been successfully implemented in both high school and university physics courses. Studies within the physics education research (PER) community have identified UMI's positive impacts on learning gains, equity, attitudinal shifts, and self-efficacy. While the success of this pedagogical approach has been recognized within the physics community, the use of models and modeling practices is still being developed for biology. Drawing from the existing research on UMI in physics, we describe the theoretical foundations of UMI and how UMI can be adapted to include an emphasis on models and modeling for undergraduate introductory biology courses. In particular, we discuss our ongoing work to develop a framework for the first semester of a two-semester introductory biology course sequence by identifying the essential basic models for an introductory biology course sequence. PMID:23737628

  3. Toward university modeling instruction--biology: adapting curricular frameworks from physics to biology.

    PubMed

    Manthey, Seth; Brewe, Eric

    2013-06-01

    University Modeling Instruction (UMI) is an approach to curriculum and pedagogy that focuses instruction on engaging students in building, validating, and deploying scientific models. Modeling Instruction has been successfully implemented in both high school and university physics courses. Studies within the physics education research (PER) community have identified UMI's positive impacts on learning gains, equity, attitudinal shifts, and self-efficacy. While the success of this pedagogical approach has been recognized within the physics community, the use of models and modeling practices is still being developed for biology. Drawing from the existing research on UMI in physics, we describe the theoretical foundations of UMI and how UMI can be adapted to include an emphasis on models and modeling for undergraduate introductory biology courses. In particular, we discuss our ongoing work to develop a framework for the first semester of a two-semester introductory biology course sequence by identifying the essential basic models for an introductory biology course sequence.

  4. MetaCRAST: reference-guided extraction of CRISPR spacers from unassembled metagenomes.

    PubMed

    Moller, Abraham G; Liang, Chun

    2017-01-01

    Clustered regularly interspaced short palindromic repeat (CRISPR) systems are the adaptive immune systems of bacteria and archaea against viral infection. While CRISPRs have been exploited as a tool for genetic engineering, their spacer sequences can also provide valuable insights into microbial ecology by linking environmental viruses to their microbial hosts. Despite this importance, metagenomic CRISPR detection remains a major challenge. Here we present a reference-guided CRISPR spacer detection tool ( Meta genomic C RISPR R eference- A ided S earch T ool-MetaCRAST) that constrains searches based on user-specified direct repeats (DRs). These DRs could be expected from assembly or taxonomic profiles of metagenomes. We compared the performance of MetaCRAST to those of two existing metagenomic CRISPR detection tools-Crass and MinCED-using both real and simulated acid mine drainage (AMD) and enhanced biological phosphorus removal (EBPR) metagenomes. Our evaluation shows MetaCRAST improves CRISPR spacer detection in real metagenomes compared to the de novo CRISPR detection methods Crass and MinCED. Evaluation on simulated metagenomes show it performs better than de novo tools for Illumina metagenomes and comparably for 454 metagenomes. It also has comparable performance dependence on read length and community composition, run time, and accuracy to these tools. MetaCRAST is implemented in Perl, parallelizable through the Many Core Engine (MCE), and takes metagenomic sequence reads and direct repeat queries (FASTA or FASTQ) as input. It is freely available for download at https://github.com/molleraj/MetaCRAST.

  5. Use of simulated data sets to evaluate the fidelity of metagenomic processing methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mavromatis, K; Ivanova, N; Barry, Kerrie

    2007-01-01

    Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene-finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based ( blast hit distribution) and twomore » sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.« less

  6. Use of simulated data sets to evaluate the fidelity of Metagenomicprocessing methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mavromatis, Konstantinos; Ivanova, Natalia; Barry, Kerri

    2006-12-01

    Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity--based (blast hit distribution) and twomore » sequence composition--based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.« less

  7. Correlation between MCAT Biology Content Specifications and Topic Scope and Sequence of General Education College Biology Textbooks

    ERIC Educational Resources Information Center

    Rissing, Steven W.

    2013-01-01

    Most American colleges and universities offer gateway biology courses to meet the needs of three undergraduate audiences: biology and related science majors, many of whom will become biomedical researchers; premedical students meeting medical school requirements and preparing for the Medical College Admissions Test (MCAT); and students completing…

  8. A "Vision and Change" Reform of Introductory Biology Shifts Faculty Perceptions and Use of Active Learning

    ERIC Educational Resources Information Center

    Auerbach, Anna Jo; Schussler, Elisabeth

    2017-01-01

    Increasing faculty use of active-learning (AL) pedagogies in college classrooms is a persistent challenge in biology education. A large research-intensive university implemented changes to its biology majors' two-course introductory sequence as outlined by the "Vision and Change in Undergraduate Biology Education" final report. One goal…

  9. Scalable Kernel Methods and Algorithms for General Sequence Analysis

    ERIC Educational Resources Information Center

    Kuksa, Pavel

    2011-01-01

    Analysis of large-scale sequential data has become an important task in machine learning and pattern recognition, inspired in part by numerous scientific and technological applications such as the document and text classification or the analysis of biological sequences. However, current computational methods for sequence comparison still lack…

  10. Fluorescence in situ hybridization and optical mapping to correct scaffold arrangement in the tomato genome

    USDA-ARS?s Scientific Manuscript database

    Modern biological analyses are often assisted by recent technologies making the sequencing of complex genomes both technically possible and feasible. We recently sequenced the tomato genome that, like many eukaryotic genomes, is large and complex. Current sequencing technologies allow the developmen...

  11. Synthesis, Characterization, and Multimillion-Atom Simulation of Halogen-Based Energetic Materials for Agent Defeat

    DTIC Science & Technology

    2013-04-01

    DTRA-TR-13-23 Synthesis, Characterization, and Multimillion -Atom Simulation of Halogen-Based Energetic Materials for Agent Defeat Approved for...reagents for the destruction of biologically active materials and a simulation of their reactions on a multimillion atom scale with quantum...explosives for destruction of chemical & biological agents. Multimillion -atom molecular dynamics simulations with quantum mechanical accuracy were

  12. Discovery of Escherichia coli CRISPR sequences in an undergraduate laboratory.

    PubMed

    Militello, Kevin T; Lazatin, Justine C

    2017-05-01

    Clustered regularly interspaced short palindromic repeats (CRISPRs) represent a novel type of adaptive immune system found in eubacteria and archaebacteria. CRISPRs have recently generated a lot of attention due to their unique ability to catalog foreign nucleic acids, their ability to destroy foreign nucleic acids in a mechanism that shares some similarity to RNA interference, and the ability to utilize reconstituted CRISPR systems for genome editing in numerous organisms. In order to introduce CRISPR biology into an undergraduate upper-level laboratory, a five-week set of exercises was designed to allow students to examine the CRISPR status of uncharacterized Escherichia coli strains and to allow the discovery of new repeats and spacers. Students started the project by isolating genomic DNA from E. coli and amplifying the iap CRISPR locus using the polymerase chain reaction (PCR). The PCR products were analyzed by Sanger DNA sequencing, and the sequences were examined for the presence of CRISPR repeat sequences. The regions between the repeats, the spacers, were extracted and analyzed with BLASTN searches. Overall, CRISPR loci were sequenced from several previously uncharacterized E. coli strains and one E. coli K-12 strain. Sanger DNA sequencing resulted in the discovery of 36 spacer sequences and their corresponding surrounding repeat sequences. Five of the spacers were homologous to foreign (non-E. coli) DNA. Assessment of the laboratory indicates that improvements were made in the ability of students to answer questions relating to the structure and function of CRISPRs. Future directions of the laboratory are presented and discussed. © 2016 by The International Union of Biochemistry and Molecular Biology, 45(3):262-269, 2017. © 2016 The International Union of Biochemistry and Molecular Biology.

  13. Predicting PDZ domain mediated protein interactions from structure

    PubMed Central

    2013-01-01

    Background PDZ domains are structural protein domains that recognize simple linear amino acid motifs, often at protein C-termini, and mediate protein-protein interactions (PPIs) in important biological processes, such as ion channel regulation, cell polarity and neural development. PDZ domain-peptide interaction predictors have been developed based on domain and peptide sequence information. Since domain structure is known to influence binding specificity, we hypothesized that structural information could be used to predict new interactions compared to sequence-based predictors. Results We developed a novel computational predictor of PDZ domain and C-terminal peptide interactions using a support vector machine trained with PDZ domain structure and peptide sequence information. Performance was estimated using extensive cross validation testing. We used the structure-based predictor to scan the human proteome for ligands of 218 PDZ domains and show that the predictions correspond to known PDZ domain-peptide interactions and PPIs in curated databases. The structure-based predictor is complementary to the sequence-based predictor, finding unique known and novel PPIs, and is less dependent on training–testing domain sequence similarity. We used a functional enrichment analysis of our hits to create a predicted map of PDZ domain biology. This map highlights PDZ domain involvement in diverse biological processes, some only found by the structure-based predictor. Based on this analysis, we predict novel PDZ domain involvement in xenobiotic metabolism and suggest new interactions for other processes including wound healing and Wnt signalling. Conclusions We built a structure-based predictor of PDZ domain-peptide interactions, which can be used to scan C-terminal proteomes for PDZ interactions. We also show that the structure-based predictor finds many known PDZ mediated PPIs in human that were not found by our previous sequence-based predictor and is less dependent on training–testing domain sequence similarity. Using both predictors, we defined a functional map of human PDZ domain biology and predict novel PDZ domain function. Users may access our structure-based and previous sequence-based predictors at http://webservice.baderlab.org/domains/POW. PMID:23336252

  14. A detailed gene expression study of the Miscanthus genus reveals changes in the transcriptome associated with the rejuvenation of spring rhizomes.

    PubMed

    Barling, Adam; Swaminathan, Kankshita; Mitros, Therese; James, Brandon T; Morris, Juliette; Ngamboma, Ornella; Hall, Megan C; Kirkpatrick, Jessica; Alabady, Magdy; Spence, Ashley K; Hudson, Matthew E; Rokhsar, Daniel S; Moose, Stephen P

    2013-12-09

    The Miscanthus genus of perennial C4 grasses contains promising biofuel crops for temperate climates. However, few genomic resources exist for Miscanthus, which limits understanding of its interesting biology and future genetic improvement. A comprehensive catalog of expressed sequences were generated from a variety of Miscanthus species and tissue types, with an emphasis on characterizing gene expression changes in spring compared to fall rhizomes. Illumina short read sequencing technology was used to produce transcriptome sequences from different tissues and organs during distinct developmental stages for multiple Miscanthus species, including Miscanthus sinensis, Miscanthus sacchariflorus, and their interspecific hybrid Miscanthus × giganteus. More than fifty billion base-pairs of Miscanthus transcript sequence were produced. Overall, 26,230 Sorghum gene models (i.e., ~ 96% of predicted Sorghum genes) had at least five Miscanthus reads mapped to them, suggesting that a large portion of the Miscanthus transcriptome is represented in this dataset. The Miscanthus × giganteus data was used to identify genes preferentially expressed in a single tissue, such as the spring rhizome, using Sorghum bicolor as a reference. Quantitative real-time PCR was used to verify examples of preferential expression predicted via RNA-Seq. Contiguous consensus transcript sequences were assembled for each species and annotated using InterProScan. Sequences from the assembled transcriptome were used to amplify genomic segments from a doubled haploid Miscanthus sinensis and from Miscanthus × giganteus to further disentangle the allelic and paralogous variations in genes. This large expressed sequence tag collection creates a valuable resource for the study of Miscanthus biology by providing detailed gene sequence information and tissue preferred expression patterns. We have successfully generated a database of transcriptome assemblies and demonstrated its use in the study of genes of interest. Analysis of gene expression profiles revealed biological pathways that exhibit altered regulation in spring compared to fall rhizomes, which are consistent with their different physiological functions. The expression profiles of the subterranean rhizome provides a better understanding of the biological activities of the underground stem structures that are essentials for perenniality and the storage or remobilization of carbon and nutrient resources.

  15. Systematic analysis of signaling pathways using an integrative environment.

    PubMed

    Visvanathan, Mahesh; Breit, Marc; Pfeifer, Bernhard; Baumgartner, Christian; Modre-Osprian, Robert; Tilg, Bernhard

    2007-01-01

    Understanding the biological processes of signaling pathways as a whole system requires an integrative software environment that has comprehensive capabilities. The environment should include tools for pathway design, visualization, simulation and a knowledge base concerning signaling pathways as one. In this paper we introduce a new integrative environment for the systematic analysis of signaling pathways. This system includes environments for pathway design, visualization, simulation and a knowledge base that combines biological and modeling information concerning signaling pathways that provides the basic understanding of the biological system, its structure and functioning. The system is designed with a client-server architecture. It contains a pathway designing environment and a simulation environment as upper layers with a relational knowledge base as the underlying layer. The TNFa-mediated NF-kB signal trans-duction pathway model was designed and tested using our integrative framework. It was also useful to define the structure of the knowledge base. Sensitivity analysis of this specific pathway was performed providing simulation data. Then the model was extended showing promising initial results. The proposed system offers a holistic view of pathways containing biological and modeling data. It will help us to perform biological interpretation of the simulation results and thus contribute to a better understanding of the biological system for drug identification.

  16. A System to Automatically Classify and Name Any Individual Genome-Sequenced Organism Independently of Current Biological Classification and Nomenclature

    PubMed Central

    Song, Yuhyun; Leman, Scotland; Monteil, Caroline L.; Heath, Lenwood S.; Vinatzer, Boris A.

    2014-01-01

    A broadly accepted and stable biological classification system is a prerequisite for biological sciences. It provides the means to describe and communicate about life without ambiguity. Current biological classification and nomenclature use the species as the basic unit and require lengthy and laborious species descriptions before newly discovered organisms can be assigned to a species and be named. The current system is thus inadequate to classify and name the immense genetic diversity within species that is now being revealed by genome sequencing on a daily basis. To address this lack of a general intra-species classification and naming system adequate for today’s speed of discovery of new diversity, we propose a classification and naming system that is exclusively based on genome similarity and that is suitable for automatic assignment of codes to any genome-sequenced organism without requiring any phenotypic or phylogenetic analysis. We provide examples demonstrating that genome similarity-based codes largely align with current taxonomic groups at many different levels in bacteria, animals, humans, plants, and viruses. Importantly, the proposed approach is only slightly affected by the order of code assignment and can thus provide codes that reflect similarity between organisms and that do not need to be revised upon discovery of new diversity. We envision genome similarity-based codes to complement current biological nomenclature and to provide a universal means to communicate unambiguously about any genome-sequenced organism in fields as diverse as biodiversity research, infectious disease control, human and microbial forensics, animal breed and plant cultivar certification, and human ancestry research. PMID:24586551

  17. Biological function in the twilight zone of sequence conservation.

    PubMed

    Ponting, Chris P

    2017-08-16

    Strong DNA conservation among divergent species is an indicator of enduring functionality. With weaker sequence conservation we enter a vast 'twilight zone' in which sequence subject to transient or lower constraint cannot be distinguished easily from neutrally evolving, non-functional sequence. Twilight zone functional sequence is illuminated instead by principles of selective constraint and positive selection using genomic data acquired from within a species' population. Application of these principles reveals that despite being biochemically active, most twilight zone sequence is not functional.

  18. Rational Diversification of a Promoter Providing Fine-Tuned Expression and Orthogonal Regulation for Synthetic Biology

    PubMed Central

    Blount, Benjamin A.; Weenink, Tim; Vasylechko, Serge; Ellis, Tom

    2012-01-01

    Yeast is an ideal organism for the development and application of synthetic biology, yet there remain relatively few well-characterised biological parts suitable for precise engineering of this chassis. In order to address this current need, we present here a strategy that takes a single biological part, a promoter, and re-engineers it to produce a fine-graded output range promoter library and new regulated promoters desirable for orthogonal synthetic biology applications. A highly constitutive Saccharomyces cerevisiae promoter, PFY1p, was identified by bioinformatic approaches, characterised in vivo and diversified at its core sequence to create a 36-member promoter library. TetR regulation was introduced into PFY1p to create a synthetic inducible promoter (iPFY1p) that functions in an inverter device. Orthogonal and scalable regulation of synthetic promoters was then demonstrated for the first time using customisable Transcription Activator-Like Effectors (TALEs) modified and designed to act as orthogonal repressors for specific PFY1-based promoters. The ability to diversify a promoter at its core sequences and then independently target Transcription Activator-Like Orthogonal Repressors (TALORs) to virtually any of these sequences shows great promise toward the design and construction of future synthetic gene networks that encode complex “multi-wire” logic functions. PMID:22442681

  19. Rational diversification of a promoter providing fine-tuned expression and orthogonal regulation for synthetic biology.

    PubMed

    Blount, Benjamin A; Weenink, Tim; Vasylechko, Serge; Ellis, Tom

    2012-01-01

    Yeast is an ideal organism for the development and application of synthetic biology, yet there remain relatively few well-characterised biological parts suitable for precise engineering of this chassis. In order to address this current need, we present here a strategy that takes a single biological part, a promoter, and re-engineers it to produce a fine-graded output range promoter library and new regulated promoters desirable for orthogonal synthetic biology applications. A highly constitutive Saccharomyces cerevisiae promoter, PFY1p, was identified by bioinformatic approaches, characterised in vivo and diversified at its core sequence to create a 36-member promoter library. TetR regulation was introduced into PFY1p to create a synthetic inducible promoter (iPFY1p) that functions in an inverter device. Orthogonal and scalable regulation of synthetic promoters was then demonstrated for the first time using customisable Transcription Activator-Like Effectors (TALEs) modified and designed to act as orthogonal repressors for specific PFY1-based promoters. The ability to diversify a promoter at its core sequences and then independently target Transcription Activator-Like Orthogonal Repressors (TALORs) to virtually any of these sequences shows great promise toward the design and construction of future synthetic gene networks that encode complex "multi-wire" logic functions.

  20. RBT-GA: a novel metaheuristic for solving the multiple sequence alignment problem

    PubMed Central

    Taheri, Javid; Zomaya, Albert Y

    2009-01-01

    Background Multiple Sequence Alignment (MSA) has always been an active area of research in Bioinformatics. MSA is mainly focused on discovering biologically meaningful relationships among different sequences or proteins in order to investigate the underlying main characteristics/functions. This information is also used to generate phylogenetic trees. Results This paper presents a novel approach, namely RBT-GA, to solve the MSA problem using a hybrid solution methodology combining the Rubber Band Technique (RBT) and the Genetic Algorithm (GA) metaheuristic. RBT is inspired by the behavior of an elastic Rubber Band (RB) on a plate with several poles, which is analogues to locations in the input sequences that could potentially be biologically related. A GA attempts to mimic the evolutionary processes of life in order to locate optimal solutions in an often very complex landscape. RBT-GA is a population based optimization algorithm designed to find the optimal alignment for a set of input protein sequences. In this novel technique, each alignment answer is modeled as a chromosome consisting of several poles in the RBT framework. These poles resemble locations in the input sequences that are most likely to be correlated and/or biologically related. A GA-based optimization process improves these chromosomes gradually yielding a set of mostly optimal answers for the MSA problem. Conclusion RBT-GA is tested with one of the well-known benchmarks suites (BALiBASE 2.0) in this area. The obtained results show that the superiority of the proposed technique even in the case of formidable sequences. PMID:19594869

  1. Genome sequence of an Australian kangaroo, Macropus eugenii, provides insight into the evolution of mammalian reproduction and development

    PubMed Central

    2011-01-01

    Background We present the genome sequence of the tammar wallaby, Macropus eugenii, which is a member of the kangaroo family and the first representative of the iconic hopping mammals that symbolize Australia to be sequenced. The tammar has many unusual biological characteristics, including the longest period of embryonic diapause of any mammal, extremely synchronized seasonal breeding and prolonged and sophisticated lactation within a well-defined pouch. Like other marsupials, it gives birth to highly altricial young, and has a small number of very large chromosomes, making it a valuable model for genomics, reproduction and development. Results The genome has been sequenced to 2 × coverage using Sanger sequencing, enhanced with additional next generation sequencing and the integration of extensive physical and linkage maps to build the genome assembly. We also sequenced the tammar transcriptome across many tissues and developmental time points. Our analyses of these data shed light on mammalian reproduction, development and genome evolution: there is innovation in reproductive and lactational genes, rapid evolution of germ cell genes, and incomplete, locus-specific X inactivation. We also observe novel retrotransposons and a highly rearranged major histocompatibility complex, with many class I genes located outside the complex. Novel microRNAs in the tammar HOX clusters uncover new potential mammalian HOX regulatory elements. Conclusions Analyses of these resources enhance our understanding of marsupial gene evolution, identify marsupial-specific conserved non-coding elements and critical genes across a range of biological systems, including reproduction, development and immunity, and provide new insight into marsupial and mammalian biology and genome evolution. PMID:21854559

  2. Genome sequence of an Australian kangaroo, Macropus eugenii, provides insight into the evolution of mammalian reproduction and development.

    PubMed

    Renfree, Marilyn B; Papenfuss, Anthony T; Deakin, Janine E; Lindsay, James; Heider, Thomas; Belov, Katherine; Rens, Willem; Waters, Paul D; Pharo, Elizabeth A; Shaw, Geoff; Wong, Emily S W; Lefèvre, Christophe M; Nicholas, Kevin R; Kuroki, Yoko; Wakefield, Matthew J; Zenger, Kyall R; Wang, Chenwei; Ferguson-Smith, Malcolm; Nicholas, Frank W; Hickford, Danielle; Yu, Hongshi; Short, Kirsty R; Siddle, Hannah V; Frankenberg, Stephen R; Chew, Keng Yih; Menzies, Brandon R; Stringer, Jessica M; Suzuki, Shunsuke; Hore, Timothy A; Delbridge, Margaret L; Patel, Hardip R; Mohammadi, Amir; Schneider, Nanette Y; Hu, Yanqiu; O'Hara, William; Al Nadaf, Shafagh; Wu, Chen; Feng, Zhi-Ping; Cocks, Benjamin G; Wang, Jianghui; Flicek, Paul; Searle, Stephen M J; Fairley, Susan; Beal, Kathryn; Herrero, Javier; Carone, Dawn M; Suzuki, Yutaka; Sugano, Sumio; Toyoda, Atsushi; Sakaki, Yoshiyuki; Kondo, Shinji; Nishida, Yuichiro; Tatsumoto, Shoji; Mandiou, Ion; Hsu, Arthur; McColl, Kaighin A; Lansdell, Benjamin; Weinstock, George; Kuczek, Elizabeth; McGrath, Annette; Wilson, Peter; Men, Artem; Hazar-Rethinam, Mehlika; Hall, Allison; Davis, John; Wood, David; Williams, Sarah; Sundaravadanam, Yogi; Muzny, Donna M; Jhangiani, Shalini N; Lewis, Lora R; Morgan, Margaret B; Okwuonu, Geoffrey O; Ruiz, San Juana; Santibanez, Jireh; Nazareth, Lynne; Cree, Andrew; Fowler, Gerald; Kovar, Christie L; Dinh, Huyen H; Joshi, Vandita; Jing, Chyn; Lara, Fremiet; Thornton, Rebecca; Chen, Lei; Deng, Jixin; Liu, Yue; Shen, Joshua Y; Song, Xing-Zhi; Edson, Janette; Troon, Carmen; Thomas, Daniel; Stephens, Amber; Yapa, Lankesha; Levchenko, Tanya; Gibbs, Richard A; Cooper, Desmond W; Speed, Terence P; Fujiyama, Asao; Graves, Jennifer A M; O'Neill, Rachel J; Pask, Andrew J; Forrest, Susan M; Worley, Kim C

    2011-08-29

    We present the genome sequence of the tammar wallaby, Macropus eugenii, which is a member of the kangaroo family and the first representative of the iconic hopping mammals that symbolize Australia to be sequenced. The tammar has many unusual biological characteristics, including the longest period of embryonic diapause of any mammal, extremely synchronized seasonal breeding and prolonged and sophisticated lactation within a well-defined pouch. Like other marsupials, it gives birth to highly altricial young, and has a small number of very large chromosomes, making it a valuable model for genomics, reproduction and development. The genome has been sequenced to 2 × coverage using Sanger sequencing, enhanced with additional next generation sequencing and the integration of extensive physical and linkage maps to build the genome assembly. We also sequenced the tammar transcriptome across many tissues and developmental time points. Our analyses of these data shed light on mammalian reproduction, development and genome evolution: there is innovation in reproductive and lactational genes, rapid evolution of germ cell genes, and incomplete, locus-specific X inactivation. We also observe novel retrotransposons and a highly rearranged major histocompatibility complex, with many class I genes located outside the complex. Novel microRNAs in the tammar HOX clusters uncover new potential mammalian HOX regulatory elements. Analyses of these resources enhance our understanding of marsupial gene evolution, identify marsupial-specific conserved non-coding elements and critical genes across a range of biological systems, including reproduction, development and immunity, and provide new insight into marsupial and mammalian biology and genome evolution.

  3. Biologically-inspired hexapod robot design and simulation

    NASA Technical Reports Server (NTRS)

    Espenschied, Kenneth S.; Quinn, Roger D.

    1994-01-01

    The design and construction of a biologically-inspired hexapod robot is presented. A previously developed simulation is modified to include models of the DC drive motors, the motor driver circuits and their transmissions. The application of this simulation to the design and development of the robot is discussed. The mechanisms thought to be responsible for the leg coordination of the walking stick insect were previously applied to control the straight-line locomotion of a robot. We generalized these rules for a robot walking on a plane. This biologically-inspired control strategy is used to control the robot in simulation. Numerical results show that the general body motion and performance of the simulated robot is similar to that of the robot based on our preliminary experimental results.

  4. Model annotation for synthetic biology: automating model to nucleotide sequence conversion

    PubMed Central

    Misirli, Goksel; Hallinan, Jennifer S.; Yu, Tommy; Lawson, James R.; Wimalaratne, Sarala M.; Cooling, Michael T.; Wipat, Anil

    2011-01-01

    Motivation: The need for the automated computational design of genetic circuits is becoming increasingly apparent with the advent of ever more complex and ambitious synthetic biology projects. Currently, most circuits are designed through the assembly of models of individual parts such as promoters, ribosome binding sites and coding sequences. These low level models are combined to produce a dynamic model of a larger device that exhibits a desired behaviour. The larger model then acts as a blueprint for physical implementation at the DNA level. However, the conversion of models of complex genetic circuits into DNA sequences is a non-trivial undertaking due to the complexity of mapping the model parts to their physical manifestation. Automating this process is further hampered by the lack of computationally tractable information in most models. Results: We describe a method for automatically generating DNA sequences from dynamic models implemented in CellML and Systems Biology Markup Language (SBML). We also identify the metadata needed to annotate models to facilitate automated conversion, and propose and demonstrate a method for the markup of these models using RDF. Our algorithm has been implemented in a software tool called MoSeC. Availability: The software is available from the authors' web site http://research.ncl.ac.uk/synthetic_biology/downloads.html. Contact: anil.wipat@ncl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21296753

  5. Using nearly full-genome HIV sequence data improves phylogeny reconstruction in a simulated epidemic

    PubMed Central

    Yebra, Gonzalo; Hodcroft, Emma B.; Ragonnet-Cronin, Manon L.; Pillay, Deenan; Brown, Andrew J. Leigh; Fraser, Christophe; Kellam, Paul; de Oliveira, Tulio; Dennis, Ann; Hoppe, Anne; Kityo, Cissy; Frampton, Dan; Ssemwanga, Deogratius; Tanser, Frank; Keshani, Jagoda; Lingappa, Jairam; Herbeck, Joshua; Wawer, Maria; Essex, Max; Cohen, Myron S.; Paton, Nicholas; Ratmann, Oliver; Kaleebu, Pontiano; Hayes, Richard; Fidler, Sarah; Quinn, Thomas; Novitsky, Vladimir; Haywards, Andrew; Nastouli, Eleni; Morris, Steven; Clark, Duncan; Kozlakidis, Zisis

    2016-01-01

    HIV molecular epidemiology studies analyse viral pol gene sequences due to their availability, but whole genome sequencing allows to use other genes. We aimed to determine what gene(s) provide(s) the best approximation to the real phylogeny by analysing a simulated epidemic (created as part of the PANGEA_HIV project) with a known transmission tree. We sub-sampled a simulated dataset of 4662 sequences into different combinations of genes (gag-pol-env, gag-pol, gag, pol, env and partial pol) and sampling depths (100%, 60%, 20% and 5%), generating 100 replicates for each case. We built maximum-likelihood trees for each combination using RAxML (GTR + Γ), and compared their topologies to the corresponding true tree’s using CompareTree. The accuracy of the trees was significantly proportional to the length of the sequences used, with the gag-pol-env datasets showing the best performance and gag and partial pol sequences showing the worst. The lowest sampling depths (20% and 5%) greatly reduced the accuracy of tree reconstruction and showed high variability among replicates, especially when using the shortest gene datasets. In conclusion, using longer sequences derived from nearly whole genomes will improve the reliability of phylogenetic reconstruction. With low sample coverage, results can be highly variable, particularly when based on short sequences. PMID:28008945

  6. Using nearly full-genome HIV sequence data improves phylogeny reconstruction in a simulated epidemic.

    PubMed

    Yebra, Gonzalo; Hodcroft, Emma B; Ragonnet-Cronin, Manon L; Pillay, Deenan; Brown, Andrew J Leigh

    2016-12-23

    HIV molecular epidemiology studies analyse viral pol gene sequences due to their availability, but whole genome sequencing allows to use other genes. We aimed to determine what gene(s) provide(s) the best approximation to the real phylogeny by analysing a simulated epidemic (created as part of the PANGEA_HIV project) with a known transmission tree. We sub-sampled a simulated dataset of 4662 sequences into different combinations of genes (gag-pol-env, gag-pol, gag, pol, env and partial pol) and sampling depths (100%, 60%, 20% and 5%), generating 100 replicates for each case. We built maximum-likelihood trees for each combination using RAxML (GTR + Γ), and compared their topologies to the corresponding true tree's using CompareTree. The accuracy of the trees was significantly proportional to the length of the sequences used, with the gag-pol-env datasets showing the best performance and gag and partial pol sequences showing the worst. The lowest sampling depths (20% and 5%) greatly reduced the accuracy of tree reconstruction and showed high variability among replicates, especially when using the shortest gene datasets. In conclusion, using longer sequences derived from nearly whole genomes will improve the reliability of phylogenetic reconstruction. With low sample coverage, results can be highly variable, particularly when based on short sequences.

  7. Application of artificial neural networks to identify equilibration in computer simulations

    NASA Astrophysics Data System (ADS)

    Leibowitz, Mitchell H.; Miller, Evan D.; Henry, Michael M.; Jankowski, Eric

    2017-11-01

    Determining which microstates generated by a thermodynamic simulation are representative of the ensemble for which sampling is desired is a ubiquitous, underspecified problem. Artificial neural networks are one type of machine learning algorithm that can provide a reproducible way to apply pattern recognition heuristics to underspecified problems. Here we use the open-source TensorFlow machine learning library and apply it to the problem of identifying which hypothetical observation sequences from a computer simulation are “equilibrated” and which are not. We generate training populations and test populations of observation sequences with embedded linear and exponential correlations. We train a two-neuron artificial network to distinguish the correlated and uncorrelated sequences. We find that this simple network is good enough for > 98% accuracy in identifying exponentially-decaying energy trajectories from molecular simulations.

  8. A fortran program for Monte Carlo simulation of oil-field discovery sequences

    USGS Publications Warehouse

    Bohling, Geoffrey C.; Davis, J.C.

    1993-01-01

    We have developed a program for performing Monte Carlo simulation of oil-field discovery histories. A synthetic parent population of fields is generated as a finite sample from a distribution of specified form. The discovery sequence then is simulated by sampling without replacement from this parent population in accordance with a probabilistic discovery process model. The program computes a chi-squared deviation between synthetic and actual discovery sequences as a function of the parameters of the discovery process model, the number of fields in the parent population, and the distributional parameters of the parent population. The program employs the three-parameter log gamma model for the distribution of field sizes and employs a two-parameter discovery process model, allowing the simulation of a wide range of scenarios. ?? 1993.

  9. Synthetic Biology Open Language (SBOL) Version 2.1.0.

    PubMed

    Beal, Jacob; Cox, Robert Sidney; Grünberg, Raik; McLaughlin, James; Nguyen, Tramy; Bartley, Bryan; Bissell, Michael; Choi, Kiri; Clancy, Kevin; Macklin, Chris; Madsen, Curtis; Misirli, Goksel; Oberortner, Ernst; Pocock, Matthew; Roehner, Nicholas; Samineni, Meher; Zhang, Michael; Zhang, Zhen; Zundel, Zach; Gennari, John H; Myers, Chris; Sauro, Herbert; Wipat, Anil

    2016-09-01

    Synthetic biology builds upon the techniques and successes of genetics, molecular biology, and metabolic engineering by applying engineering principles to the design of biological systems. The field still faces substantial challenges, including long development times, high rates of failure, and poor reproducibility. One method to ameliorate these problems would be to improve the exchange of information about designed systems between laboratories. The Synthetic Biology Open Language (SBOL) has been developed as a standard to support the specification and exchange of biological design information in synthetic biology, filling a need not satisfied by other pre-existing standards. This document details version 2.1 of SBOL that builds upon version 2.0 published in last year's JIB special issue. In particular, SBOL 2.1 includes improved rules for what constitutes a valid SBOL document, new role fields to simplify the expression of sequence features and how components are used in context, and new best practices descriptions to improve the exchange of basic sequence topology information and the description of genetic design provenance, as well as miscellaneous other minor improvements.

  10. Synthetic Biology Open Language (SBOL) Version 2.1.0.

    PubMed

    Beal, Jacob; Cox, Robert Sidney; Grünberg, Raik; McLaughlin, James; Nguyen, Tramy; Bartley, Bryan; Bissell, Michael; Choi, Kiri; Clancy, Kevin; Macklin, Chris; Madsen, Curtis; Misirli, Goksel; Oberortner, Ernst; Pocock, Matthew; Roehner, Nicholas; Samineni, Meher; Zhang, Michael; Zhang, Zhen; Zundel, Zach; Gennari, John; Myers, Chris; Sauro, Herbert; Wipat, Anil

    2016-12-18

    Synthetic biology builds upon the techniques and successes of genetics, molecular biology, and metabolic engineering by applying engineering principles to the design of biological systems. The field still faces substantial challenges, including long development times, high rates of failure, and poor reproducibility. One method to ameliorate these problems would be to improve the exchange of information about designed systems between laboratories. The Synthetic Biology Open Language (SBOL) has been developed as a standard to support the specification and exchange of biological design information in synthetic biology, filling a need not satisfied by other pre-existing standards. This document details version 2.1 of SBOL that builds upon version 2.0 published in last year’s JIB special issue. In particular, SBOL 2.1 includes improved rules for what constitutes a valid SBOL document, new role fields to simplify the expression of sequence features and how components are used in context, and new best practices descriptions to improve the exchange of basic sequence topology information and the description of genetic design provenance, as well as miscellaneous other minor improvements.

  11. Online model checking approach based parameter estimation to a neuronal fate decision simulation model in Caenorhabditis elegans with hybrid functional Petri net with extension.

    PubMed

    Li, Chen; Nagasaki, Masao; Koh, Chuan Hock; Miyano, Satoru

    2011-05-01

    Mathematical modeling and simulation studies are playing an increasingly important role in helping researchers elucidate how living organisms function in cells. In systems biology, researchers typically tune many parameters manually to achieve simulation results that are consistent with biological knowledge. This severely limits the size and complexity of simulation models built. In order to break this limitation, we propose a computational framework to automatically estimate kinetic parameters for a given network structure. We utilized an online (on-the-fly) model checking technique (which saves resources compared to the offline approach), with a quantitative modeling and simulation architecture named hybrid functional Petri net with extension (HFPNe). We demonstrate the applicability of this framework by the analysis of the underlying model for the neuronal cell fate decision model (ASE fate model) in Caenorhabditis elegans. First, we built a quantitative ASE fate model containing 3327 components emulating nine genetic conditions. Then, using our developed efficient online model checker, MIRACH 1.0, together with parameter estimation, we ran 20-million simulation runs, and were able to locate 57 parameter sets for 23 parameters in the model that are consistent with 45 biological rules extracted from published biological articles without much manual intervention. To evaluate the robustness of these 57 parameter sets, we run another 20 million simulation runs using different magnitudes of noise. Our simulation results concluded that among these models, one model is the most reasonable and robust simulation model owing to the high stability against these stochastic noises. Our simulation results provide interesting biological findings which could be used for future wet-lab experiments.

  12. Draft Genome Sequences of Six Mycobacterium immunogenum, Strains Obtained from a Chloraminated Drinking Water Distribution System Simulator

    EPA Science Inventory

    We report the draft genome sequences of six Mycobacterium immunogenum isolated from a chloraminated drinking water distribution system simulator subjected to changes in operational parameters. M. immunogenum, a rapidly growing mycobacteria previously reported as the cause of hyp...

  13. Optimization of parameter values for complex pulse sequences by simulated annealing: application to 3D MP-RAGE imaging of the brain.

    PubMed

    Epstein, F H; Mugler, J P; Brookeman, J R

    1994-02-01

    A number of pulse sequence techniques, including magnetization-prepared gradient echo (MP-GRE), segmented GRE, and hybrid RARE, employ a relatively large number of variable pulse sequence parameters and acquire the image data during a transient signal evolution. These sequences have recently been proposed and/or used for clinical applications in the brain, spine, liver, and coronary arteries. Thus, the need for a method of deriving optimal pulse sequence parameter values for this class of sequences now exists. Due to the complexity of these sequences, conventional optimization approaches, such as applying differential calculus to signal difference equations, are inadequate. We have developed a general framework for adapting the simulated annealing algorithm to pulse sequence parameter value optimization, and applied this framework to the specific case of optimizing the white matter-gray matter signal difference for a T1-weighted variable flip angle 3D MP-RAGE sequence. Using our algorithm, the values of 35 sequence parameters, including the magnetization-preparation RF pulse flip angle and delay time, 32 flip angles in the variable flip angle gradient-echo acquisition sequence, and the magnetization recovery time, were derived. Optimized 3D MP-RAGE achieved up to a 130% increase in white matter-gray matter signal difference compared with optimized 3D RF-spoiled FLASH with the same total acquisition time. The simulated annealing approach was effective at deriving optimal parameter values for a specific 3D MP-RAGE imaging objective, and may be useful for other imaging objectives and sequences in this general class.

  14. PetriScape - A plugin for discrete Petri net simulations in Cytoscape.

    PubMed

    Almeida, Diogo; Azevedo, Vasco; Silva, Artur; Baumbach, Jan

    2016-06-04

    Systems biology plays a central role for biological network analysis in the post-genomic era. Cytoscape is the standard bioinformatics tool offering the community an extensible platform for computational analysis of the emerging cellular network together with experimental omics data sets. However, only few apps/plugins/tools are available for simulating network dynamics in Cytoscape 3. Many approaches of varying complexity exist but none of them have been integrated into Cytoscape as app/plugin yet. Here, we introduce PetriScape, the first Petri net simulator for Cytoscape. Although discrete Petri nets are quite simplistic models, they are capable of modeling global network properties and simulating their behaviour. In addition, they are easily understood and well visualizable. PetriScape comes with the following main functionalities: (1) import of biological networks in SBML format, (2) conversion into a Petri net, (3) visualization as Petri net, and (4) simulation and visualization of the token flow in Cytoscape. PetriScape is the first Cytoscape plugin for Petri nets. It allows a straightforward Petri net model creation, simulation and visualization with Cytoscape, providing clues about the activity of key components in biological networks.

  15. PetriScape - A plugin for discrete Petri net simulations in Cytoscape.

    PubMed

    Almeida, Diogo; Azevedo, Vasco; Silva, Artur; Baumbach, Jan

    2016-03-01

    Systems biology plays a central role for biological network analysis in the post-genomic era. Cytoscape is the standard bioinformatics tool offering the community an extensible platform for computational analysis of the emerging cellular network together with experimental omics data sets. However, only few apps/plugins/tools are available for simulating network dynamics in Cytoscape 3. Many approaches of varying complexity exist but none of them have been integrated into Cytoscape as app/plugin yet. Here, we introduce PetriScape, the first Petri net simulator for Cytoscape. Although discrete Petri nets are quite simplistic models, they are capable of modeling global network properties and simulating their behaviour. In addition, they are easily understood and well visualizable. PetriScape comes with the following main functionalities: (1) import of biological networks in SBML format, (2) conversion into a Petri net, (3) visualization as Petri net, and (4) simulation and visualization of the token flow in Cytoscape. PetriScape is the first Cytoscape plugin for Petri nets. It allows a straightforward Petri net model creation, simulation and visualization with Cytoscape, providing clues about the activity of key components in biological networks.

  16. Unraveling Hydrophobic Interactions at the Molecular Scale Using Force Spectroscopy and Molecular Dynamics Simulations.

    PubMed

    Stock, Philipp; Monroe, Jacob I; Utzig, Thomas; Smith, David J; Shell, M Scott; Valtiner, Markus

    2017-03-28

    Interactions between hydrophobic moieties steer ubiquitous processes in aqueous media, including the self-organization of biologic matter. Recent decades have seen tremendous progress in understanding these for macroscopic hydrophobic interfaces. Yet, it is still a challenge to experimentally measure hydrophobic interactions (HIs) at the single-molecule scale and thus to compare with theory. Here, we present a combined experimental-simulation approach to directly measure and quantify the sequence dependence and additivity of HIs in peptide systems at the single-molecule scale. We combine dynamic single-molecule force spectroscopy on model peptides with fully atomistic, both equilibrium and nonequilibrium, molecular dynamics (MD) simulations of the same systems. Specifically, we mutate a flexible (GS) 5 peptide scaffold with increasing numbers of hydrophobic leucine monomers and measure the peptides' desorption from hydrophobic self-assembled monolayer surfaces. Based on the analysis of nonequilibrium work-trajectories, we measure an interaction free energy that scales linearly with 3.0-3.4 k B T per leucine. In good agreement, simulations indicate a similar trend with 2.1 k B T per leucine, while also providing a detailed molecular view into HIs. This approach potentially provides a roadmap for directly extracting qualitative and quantitative single-molecule interactions at solid/liquid interfaces in a wide range of fields, including interactions at biointerfaces and adhesive interactions in industrial applications.

  17. Atomistic Free Energy Model for Nucleic Acids: Simulations of Single-Stranded DNA and the Entropy Landscape of RNA Stem-Loop Structures.

    PubMed

    Mak, Chi H

    2015-11-25

    While single-stranded (ss) segments of DNAs and RNAs are ubiquitous in biology, details about their structures have only recently begun to emerge. To study ssDNA and RNAs, we have developed a new Monte Carlo (MC) simulation using a free energy model for nucleic acids that has the atomisitic accuracy to capture fine molecular details of the sugar-phosphate backbone. Formulated on the basis of a first-principle calculation of the conformational entropy of the nucleic acid chain, this free energy model correctly reproduced both the long and short length-scale structural properties of ssDNA and RNAs in a rigorous comparison against recent data from fluorescence resonance energy transfer, small-angle X-ray scattering, force spectroscopy and fluorescence correlation transport measurements on sequences up to ∼100 nucleotides long. With this new MC algorithm, we conducted a comprehensive investigation of the entropy landscape of small RNA stem-loop structures. From a simulated ensemble of ∼10(6) equilibrium conformations, the entropy for the initiation of different size RNA hairpin loops was computed and compared against thermodynamic measurements. Starting from seeded hairpin loops, constrained MC simulations were then used to estimate the entropic costs associated with propagation of the stem. The numerical results provide new direct molecular insights into thermodynaimc measurement from macroscopic calorimetry and melting experiments.

  18. Optimal word sizes for dissimilarity measures and estimation of the degree of dissimilarity between DNA sequences.

    PubMed

    Wu, Tiee-Jian; Huang, Ying-Hsueh; Li, Lung-An

    2005-11-15

    Several measures of DNA sequence dissimilarity have been developed. The purpose of this paper is 3-fold. Firstly, we compare the performance of several word-based or alignment-based methods. Secondly, we give a general guideline for choosing the window size and determining the optimal word sizes for several word-based measures at different window sizes. Thirdly, we use a large-scale simulation method to simulate data from the distribution of SK-LD (symmetric Kullback-Leibler discrepancy). These simulated data can be used to estimate the degree of dissimilarity beta between any pair of DNA sequences. Our study shows (1) for whole sequence similiarity/dissimilarity identification the window size taken should be as large as possible, but probably not >3000, as restricted by CPU time in practice, (2) for each measure the optimal word size increases with window size, (3) when the optimal word size is used, SK-LD performance is superior in both simulation and real data analysis, (4) the estimate beta of beta based on SK-LD can be used to filter out quickly a large number of dissimilar sequences and speed alignment-based database search for similar sequences and (5) beta is also applicable in local similarity comparison situations. For example, it can help in selecting oligo probes with high specificity and, therefore, has potential in probe design for microarrays. The algorithm SK-LD, estimate beta and simulation software are implemented in MATLAB code, and are available at http://www.stat.ncku.edu.tw/tjwu

  19. Metamodels for Transdisciplinary Analysis of Wildlife Population Dynamics

    PubMed Central

    Lacy, Robert C.; Miller, Philip S.; Nyhus, Philip J.; Pollak, J. P.; Raboy, Becky E.; Zeigler, Sara L.

    2013-01-01

    Wildlife population models have been criticized for their narrow disciplinary perspective when analyzing complexity in coupled biological – physical – human systems. We describe a “metamodel” approach to species risk assessment when diverse threats act at different spatiotemporal scales, interact in non-linear ways, and are addressed by distinct disciplines. A metamodel links discrete, individual models that depict components of a complex system, governing the flow of information among models and the sequence of simulated events. Each model simulates processes specific to its disciplinary realm while being informed of changes in other metamodel components by accessing common descriptors of the system, populations, and individuals. Interactions among models are revealed as emergent properties of the system. We introduce a new metamodel platform, both to further explain key elements of the metamodel approach and as an example that we hope will facilitate the development of other platforms for implementing metamodels in population biology, species risk assessments, and conservation planning. We present two examples – one exploring the interactions of dispersal in metapopulations and the spread of infectious disease, the other examining predator-prey dynamics – to illustrate how metamodels can reveal complex processes and unexpected patterns when population dynamics are linked to additional extrinsic factors. Metamodels provide a flexible, extensible method for expanding population viability analyses beyond models of isolated population demographics into more complete representations of the external and intrinsic threats that must be understood and managed for species conservation. PMID:24349567

  20. The Dynamics of DNA Sequencing.

    ERIC Educational Resources Information Center

    Morvillo, Nancy

    1997-01-01

    Describes a paper-and-pencil activity that helps students understand DNA sequencing and expands student understanding of DNA structure, replication, and gel electrophoresis. Appropriate for advanced biology students who are familiar with the Sanger method. (DDR)

Top