Smith, Gretchen N. L.; Conway, Christopher M.; Bauernschmidt, Althea; Pisoni, David B.
2015-01-01
Recent research suggests that language acquisition may rely on domain-general learning abilities, such as structured sequence processing, which is the ability to extract, encode, and represent structured patterns in a temporal sequence. If structured sequence processing supports language, then it may be possible to improve language function by enhancing this foundational learning ability. The goal of the present study was to use a novel computerized training task as a means to better understand the relationship between structured sequence processing and language function. Participants first were assessed on pre-training tasks to provide baseline behavioral measures of structured sequence processing and language abilities. Participants were then quasi-randomly assigned to either a treatment group involving adaptive structured visuospatial sequence training, a treatment group involving adaptive non-structured visuospatial sequence training, or a control group. Following four days of sequence training, all participants were assessed with the same pre-training measures. Overall comparison of the post-training means revealed no group differences. However, in order to examine the potential relations between sequence training, structured sequence processing, and language ability, we used a mediation analysis that showed two competing effects. In the indirect effect, adaptive sequence training with structural regularities had a positive impact on structured sequence processing performance, which in turn had a positive impact on language processing. This finding not only identifies a potential novel intervention to treat language impairments but also may be the first demonstration that structured sequence processing can be improved and that this, in turn, has an impact on language processing. However, in the direct effect, adaptive sequence training with structural regularities had a direct negative impact on language processing. 
This unexpected finding suggests that adaptive training with structural regularities might potentially interfere with language processing. Taken together, these findings underscore the importance of pursuing designs that promote a better understanding of the mechanisms underlying training-related changes, so that regimens can be developed that help reduce these types of negative effects while simultaneously maximizing the benefits to outcome measures of interest. PMID:25946222
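The two competing effects described in this abstract can be written in the standard single-mediator form. This is a generic sketch and not necessarily the authors' exact model. The symbols are assumptions introduced here: X is the training condition, M is structured sequence processing performance, Y is language processing, and a, b, c' are regression coefficients.

```latex
\begin{align}
M &= i_M + aX + \varepsilon_M \\
Y &= i_Y + c'X + bM + \varepsilon_Y \\
c &= c' + ab
\end{align}
```

Under this reading, the indirect effect ab is positive (a > 0, b > 0) and the direct effect c' is negative, so the total effect c can be close to zero. That would be consistent with the reported absence of group differences in the post-training means.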
(Pea)nuts and bolts of visual narrative: Structure and meaning in sequential image comprehension
Cohn, Neil; Paczynski, Martin; Jackendoff, Ray; Holcomb, Phillip J.; Kuperberg, Gina R.
2012-01-01
Just as syntax differentiates coherent sentences from scrambled word strings, the comprehension of sequential images must also use a cognitive system to distinguish coherent narrative sequences from random strings of images. We conducted experiments analogous to two classic studies of language processing to examine the contributions of narrative structure and semantic relatedness to processing sequential images. We compared four types of comic strips: 1) Normal sequences with both structure and meaning, 2) Semantic Only sequences (in which the panels were related to a common semantic theme, but had no narrative structure), 3) Structural Only sequences (narrative structure but no semantic relatedness), and 4) Scrambled sequences of randomly-ordered panels. In Experiment 1, participants monitored for target panels in sequences presented panel-by-panel. Reaction times were slowest to panels in Scrambled sequences, intermediate in both Structural Only and Semantic Only sequences, and fastest in Normal sequences. This suggests that both semantic relatedness and narrative structure offer advantages to processing. Experiment 2 measured ERPs to all panels across the whole sequence. The N300/N400 was largest to panels in both the Scrambled and Structural Only sequences, intermediate in Semantic Only sequences and smallest in the Normal sequences. This implies that a combination of narrative structure and semantic relatedness can facilitate semantic processing of upcoming panels (as reflected by the N300/N400). Also, panels in the Scrambled sequences evoked a larger left-lateralized anterior negativity than panels in the Structural Only sequences. This localized effect was distinct from the N300/N400, and appeared despite the fact that these two sequence types were matched on local semantic relatedness between individual panels. These findings suggest that sequential image comprehension uses a narrative structure that may be independent of semantic relatedness. 
Altogether, we argue that the comprehension of visual narrative is guided by an interaction between structure and meaning. PMID:22387723
Music and language perception: expectations, structural integration, and cognitive sequencing.
Tillmann, Barbara
2012-10-01
Music can be described as sequences of events that are structured in pitch and time. Studying music processing provides insight into how complex event sequences are learned, perceived, and represented by the brain. Given the temporal nature of sound, expectations, structural integration, and cognitive sequencing are central in music perception (i.e., which sounds are most likely to come next and at what moment should they occur?). This paper focuses on similarities in music and language cognition research, showing that music cognition research provides insight into the understanding of not only music processing but also language processing and the processing of other structured stimuli. The hypothesis of shared resources between music and language processing and of domain-general dynamic attention has motivated the development of research to test music as a means to stimulate sensory, cognitive, and motor processes. Copyright © 2012 Cognitive Science Society, Inc.
Heinke, Florian; Bittrich, Sebastian; Kaiser, Florian; Labudde, Dirk
2016-01-01
To understand the molecular function of biopolymers, studying their structural characteristics is of central importance. Graphics programs are often utilized to conceive these properties, but with the increasing number of available structures in databases or structure models produced by automated modeling frameworks this process requires assistance from tools that allow automated structure visualization. In this paper a web server and its underlying method for generating graphical sequence representations of molecular structures is presented. The method, called SequenceCEROSENE (color encoding of residues obtained by spatial neighborhood embedding), retrieves the sequence of each amino acid or nucleotide chain in a given structure and produces a color coding for each residue based on three-dimensional structure information. From this, color-highlighted sequences are obtained, where residue coloring represents three-dimensional residue locations in the structure. This color encoding thus provides a one-dimensional representation, from which spatial interactions, proximity and relations between residues or entire chains can be deduced quickly and solely from color similarity. Furthermore, additional heteroatoms and chemical compounds bound to the structure, like ligands or coenzymes, are processed and reported as well. To provide free access to SequenceCEROSENE, a web server has been implemented that allows generating color codings for structures deposited in the Protein Data Bank or structure models uploaded by the user. Besides retrieving visualizations in popular graphic formats, underlying raw data can be downloaded as well. In addition, the server provides user interactivity with generated visualizations and the three-dimensional structure in question. 
Color-encoded sequences generated by SequenceCEROSENE can help users quickly perceive the general characteristics of a structure of interest (or entire sets of complexes), thus supporting the researcher in the initial phase of structure-based studies. In this respect, the web server can be a valuable tool, as users are allowed to process multiple structures, quickly switch between results, and interact with generated visualizations in an intuitive manner. The SequenceCEROSENE web server is available at https://biosciences.hs-mittweida.de/seqcerosene.
Sequencing of Dust Filter Production Process Using Design Structure Matrix (DSM)
NASA Astrophysics Data System (ADS)
Sari, R. M.; Matondang, A. R.; Syahputri, K.; Anizar; Siregar, I.; Rizkya, I.; Ursula, C.
2018-01-01
A metal casting company produces machinery spare parts for manufacturers. One of its products is a dust filter, which is used by most palm oil mills. Because the product is so widely used, the company often has problems handling it. One problem is a disordered production process, which arises from the job sequencing: important jobs that should be completed first are carried out last, while less important jobs that could be completed later are carried out first. A Design Structure Matrix (DSM) was used to analyse and determine priorities in the production process. DSM analysis orders the production process through dependency sequencing. The resulting dependency sequence gives the process order according to the linkages between processes, taking preceding and succeeding activities into account. Finally, it identifies the coupled activities: metal smelting, refining, grinding, cutting container castings, metal expenditure of molds, metal casting, coating processes, and manufacture of sand molds.
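Dependency sequencing of the kind this abstract describes can be illustrated with a topological sort. The sketch below is not the authors' DSM procedure. The activity names are paraphrased from the abstract, and the dependency map `DEPENDS_ON` is entirely hypothetical, chosen only to show the mechanics.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map (an assumption, not data from the paper):
# each activity maps to the set of activities that must finish before it.
DEPENDS_ON = {
    "mold making": set(),
    "metal smelting": set(),
    "refining": {"metal smelting"},
    "metal casting": {"mold making", "refining"},
    "mold removal": {"metal casting"},
    "cutting": {"mold removal"},
    "grinding": {"cutting"},
    "coating": {"grinding"},
}


def sequence_activities(depends_on):
    """Return one ordering in which every activity follows its prerequisites."""
    return list(TopologicalSorter(depends_on).static_order())


ORDER = sequence_activities(DEPENDS_ON)
print(ORDER)
```

A plain topological sort only works when the dependencies contain no cycles. Coupled activities, which depend on each other, form a cycle and make `graphlib` raise `CycleError`, so a full DSM analysis would have to group them into a block first.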
Biocuration in the structure-function linkage database: the anatomy of a superfamily.
Holliday, Gemma L; Brown, Shoshana D; Akiva, Eyal; Mischel, David; Hicks, Michael A; Morris, John H; Huang, Conrad C; Meng, Elaine C; Pegg, Scott C-H; Ferrin, Thomas E; Babbitt, Patricia C
2017-01-01
With ever-increasing amounts of sequence data available in both the primary literature and sequence repositories, there is a bottleneck in annotating molecular function to a sequence. This article describes the biocuration process and methods used in the structure-function linkage database (SFLD) to help address some of the challenges. We discuss how the hierarchy within the SFLD allows us to infer detailed functional properties for functionally diverse enzyme superfamilies in which all members are homologous, conserve an aspect of their chemical function and have associated conserved structural features that enable the chemistry. Also presented is the Enzyme Structure-Function Ontology (ESFO), which has been designed to capture the relationships between enzyme sequence, structure and function that underlie the SFLD and is used to guide the biocuration processes within the SFLD. http://sfld.rbvi.ucsf.edu/. © The Author 2017. Published by Oxford University Press.
Topological Structure of the Space of Phenotypes: The Case of RNA Neutral Networks
Aguirre, Jacobo; Buldú, Javier M.; Stich, Michael; Manrubia, Susanna C.
2011-01-01
The evolution and adaptation of molecular populations is constrained by the diversity accessible through mutational processes. RNA is a paradigmatic example of biopolymer where genotype (sequence) and phenotype (approximated by the secondary structure fold) are identified in a single molecule. The extreme redundancy of the genotype-phenotype map leads to large ensembles of RNA sequences that fold into the same secondary structure and can be connected through single-point mutations. These ensembles define neutral networks of phenotypes in sequence space. Here we analyze the topological properties of neutral networks formed by 12-nucleotide RNA sequences, obtained through the exhaustive folding of sequence space. A total of 4^12 sequences fragments into 645 subnetworks that correspond to 57 different secondary structures. The topological analysis reveals that each subnetwork is far from being random: it has a degree distribution with a well-defined average and a small dispersion, a high clustering coefficient, and an average shortest path between nodes close to its minimum possible value, i.e. the Hamming distance between sequences. RNA neutral networks are assortative due to the correlation in the composition of neighboring sequences, a feature that together with the symmetries inherent to the folding process explains the existence of communities. Several topological relationships can be analytically derived attending to structural restrictions and generic properties of the folding process. The average degree of these phenotypic networks grows logarithmically with their size, such that abundant phenotypes have the additional advantage of being more robust to mutations. This property prevents fragmentation of neutral networks and thus enhances the navigability of sequence space. In summary, RNA neutral networks show unique topological properties, unknown to other networks previously described. PMID:22028856
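The notions of sequence-space size, Hamming distance, and single-point mutation used in this abstract can be made concrete with a short sketch. The example sequence `SEQ` and the function names are assumptions introduced here. Deciding which neighbours belong to the same neutral network would also need a folding routine, which is not included.

```python
ALPHABET = "ACGU"


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def point_mutants(seq):
    """All sequences reachable from seq by exactly one point mutation."""
    return [
        seq[:i] + nt + seq[i + 1:]
        for i, current in enumerate(seq)
        for nt in ALPHABET
        if nt != current
    ]


SEQ = "GGGAAAUCCCAA"          # hypothetical 12-nucleotide sequence
SPACE_SIZE = len(ALPHABET) ** len(SEQ)  # size of the 12-nt sequence space
NEIGHBOURS = point_mutants(SEQ)
print(SPACE_SIZE, len(NEIGHBOURS))
```

Each 12-nucleotide sequence has 3 x 12 = 36 single-point neighbours in sequence space. Its degree in a neutral network is the number of those neighbours that fold into the same secondary structure.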
Prediction of RNA secondary structures: from theory to models and real molecules
NASA Astrophysics Data System (ADS)
Schuster, Peter
2006-05-01
RNA secondary structures are derived from RNA sequences, which are strings built from the natural four letter nucleotide alphabet, {AUGC}. These coarse-grained structures, in turn, are tantamount to constrained strings over a three letter alphabet. Hence, the secondary structures are discrete objects and the number of sequences always exceeds the number of structures. The sequences built from two letter alphabets form perfect structures when the nucleotides can form a base pair, as is the case with {GC} or {AU}, but the relation between the sequences and structures differs strongly from the four letter alphabet. A comprehensive theory of RNA structure is presented, which is based on the concepts of sequence space and shape space, being a space of structures. It sets the stage for modelling processes in ensembles of RNA molecules like evolutionary optimization or kinetic folding as dynamical phenomena guided by mappings between the two spaces. The number of minimum free energy (mfe) structures is always smaller than the number of sequences, even for two letter alphabets. Folding of RNA molecules into mfe structures constitutes a non-invertible mapping from sequence space onto shape space. The preimage of a structure in sequence space is defined as its neutral network. Similarly the set of suboptimal structures is the preimage of a sequence in shape space. This set represents the conformation space of a given sequence. The evolutionary optimization of structures in populations is a process taking place in sequence space, whereas kinetic folding occurs in molecular ensembles that optimize free energy in conformation space. Efficient folding algorithms based on dynamic programming are available for the prediction of secondary structures for given sequences. The inverse problem, the computation of sequences for predefined structures, is an important tool for the design of RNA molecules with tailored properties. 
Simultaneous folding or cofolding of two or more RNA molecules can be modelled readily at the secondary structure level and allows prediction of the most stable (mfe) conformations of complexes together with suboptimal states. Cofolding algorithms are important tools for efficient and highly specific primer design in the polymerase chain reaction (PCR) and help to explain the mechanisms of small interference RNA (si-RNA) molecules in gene regulation. The evolutionary optimization of RNA structures is illustrated by the search for a target structure and mimics aptamer selection in evolutionary biotechnology. It occurs typically in steps consisting of short adaptive phases interrupted by long epochs of little or no obvious progress in optimization. During these quasi-stationary epochs the populations are essentially confined to neutral networks where they search for sequences that allow a continuation of the adaptive process. Modelling RNA evolution as a simultaneous process in sequence and shape space provides answers to questions of the optimal population size and mutation rates. Kinetic folding is a stochastic process in conformation space. Exact solutions are derived by direct simulation in the form of trajectory sampling or by solving the master equation. The exact solutions can be approximated straightforwardly by Arrhenius kinetics on barrier trees, which represent simplified versions of conformational energy landscapes. The existence of at least one sequence forming any arbitrarily chosen pair of structures is granted by the intersection theorem. Folding kinetics is the key to understanding and designing multistable RNA molecules or RNA switches. These RNAs form two or more long lived conformations, and conformational changes occur either spontaneously or are induced through binding of small molecules or other biopolymers. RNA switches are found in nature where they act as elements in genetic and metabolic regulation. 
The reliability of RNA secondary structure prediction is limited by the accuracy with which the empirical parameters can be determined and by principal deficiencies, for example by the lack of energy contributions resulting from tertiary interactions. In addition, native structures may be determined by folding kinetics rather than by thermodynamics. We address the first problem by considering base pair probabilities or base pairing entropies, which are derived from the partition function of conformations. A high base pair probability corresponding to a low pairing entropy is taken as an indicator of a high reliability of prediction. Pseudoknots are discussed as an example of a tertiary interaction that is highly important for RNA function. Moreover, pseudoknot formation is readily incorporated into structure prediction algorithms. Some examples of experimental data on RNA secondary structures that are readily explained using the landscape concept are presented. They deal with (i) properties of RNA molecules with random sequences, (ii) RNA molecules from restricted alphabets, (iii) existence of neutral networks, (iv) shape space covering, (v) riboswitches and (vi) evolution of non-coding RNAs as an example of evolution restricted to neutral networks.
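The dynamic-programming idea mentioned in this abstract can be illustrated with a Nussinov-style recursion that maximizes the number of nested base pairs. This is a deliberately simplified stand-in, not the energy-based mfe folding the abstract refers to. The function name, the set of allowed pairs, and the minimum hairpin loop length of 3 are assumptions made for this sketch.

```python
# Watson-Crick and G-U wobble pairs (a common simplifying assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (Nussinov-style DP).

    dp[i][j] holds the best pair count for the subsequence seq[i..j].
    A pair (k, j) is allowed only if at least min_loop bases lie between them.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


print(max_base_pairs("GGGAAAUCCC"))
```

Counting pairs ignores stacking and loop energies, so the result is not an mfe structure. The cubic-time table-filling scheme is nonetheless the same general pattern that energy-based folding algorithms build on.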
Predicting PDZ domain mediated protein interactions from structure
2013-01-01
Background PDZ domains are structural protein domains that recognize simple linear amino acid motifs, often at protein C-termini, and mediate protein-protein interactions (PPIs) in important biological processes, such as ion channel regulation, cell polarity and neural development. PDZ domain-peptide interaction predictors have been developed based on domain and peptide sequence information. Since domain structure is known to influence binding specificity, we hypothesized that structural information could be used to predict new interactions compared to sequence-based predictors. Results We developed a novel computational predictor of PDZ domain and C-terminal peptide interactions using a support vector machine trained with PDZ domain structure and peptide sequence information. Performance was estimated using extensive cross validation testing. We used the structure-based predictor to scan the human proteome for ligands of 218 PDZ domains and show that the predictions correspond to known PDZ domain-peptide interactions and PPIs in curated databases. The structure-based predictor is complementary to the sequence-based predictor, finding unique known and novel PPIs, and is less dependent on training–testing domain sequence similarity. We used a functional enrichment analysis of our hits to create a predicted map of PDZ domain biology. This map highlights PDZ domain involvement in diverse biological processes, some only found by the structure-based predictor. Based on this analysis, we predict novel PDZ domain involvement in xenobiotic metabolism and suggest new interactions for other processes including wound healing and Wnt signalling. Conclusions We built a structure-based predictor of PDZ domain-peptide interactions, which can be used to scan C-terminal proteomes for PDZ interactions. 
We also show that the structure-based predictor finds many known PDZ mediated PPIs in human that were not found by our previous sequence-based predictor and is less dependent on training–testing domain sequence similarity. Using both predictors, we defined a functional map of human PDZ domain biology and predict novel PDZ domain function. Users may access our structure-based and previous sequence-based predictors at http://webservice.baderlab.org/domains/POW. PMID:23336252
Sequence-structure mapping errors in the PDB: OB-fold domains
Venclovas, Česlovas; Ginalski, Krzysztof; Kang, Chulhee
2004-01-01
The Protein Data Bank (PDB) is the single most important repository of structural data for proteins and other biologically relevant molecules. Therefore, it is critically important to keep the PDB data, as much as possible, error-free. In this study, we have analyzed PDB crystal structures possessing oligonucleotide/oligosaccharide binding (OB)-fold, one of the highly populated folds, for the presence of sequence-structure mapping errors. Using energy-based structure quality assessment coupled with sequence analyses, we have found that there are at least five OB-structures in the PDB that have regions where sequences have been incorrectly mapped onto the structure. We have demonstrated that the combination of these computation techniques is effective not only in detecting sequence-structure mapping errors, but also in providing guidance to correct them. Namely, we have used results of computational analysis to direct a revision of X-ray data for one of the PDB entries containing a fairly inconspicuous sequence-structure mapping error. The revised structure has been deposited with the PDB. We suggest use of computational energy assessment and sequence analysis techniques to facilitate structure determination when homologs having known structure are available to use as a reference. Such computational analysis may be useful in either guiding the sequence-structure assignment process or verifying the sequence mapping within poorly defined regions. PMID:15133161
Evolution of ribozymes in the presence of a mineral surface
Stephenson, James D.; Popović, Milena; Bristow, Thomas F.
2016-01-01
Mineral surfaces are often proposed as the sites of critical processes in the emergence of life. Clay minerals in particular are thought to play significant roles in the origin of life including polymerizing, concentrating, organizing, and protecting biopolymers. In these scenarios, the impact of minerals on biopolymer folding is expected to influence evolutionary processes. These processes include both the initial emergence of functional structures in the presence of the mineral and the subsequent transition away from the mineral-associated niche. The initial evolution of function depends upon the number and distribution of sequences capable of functioning in the presence of the mineral, and the transition to new environments depends upon the overlap between sequences that evolve on the mineral surface and sequences that can perform the same functions in the mineral's absence. To examine these processes, we evolved self-cleaving ribozymes in vitro in the presence or absence of Na-saturated montmorillonite clay mineral particles. Starting from a shared population of random sequences, RNA populations were evolved in parallel, along separate evolutionary trajectories. Comparative sequence analysis and activity assays show that the impact of this clay mineral on functional structure selection was minimal; it neither prevented common structures from emerging, nor did it promote the emergence of new structures. This suggests that montmorillonite does not improve RNA's ability to evolve functional structures; however, it also suggests that RNAs that do evolve in contact with montmorillonite retain the same structures in mineral-free environments, potentially facilitating an evolutionary transition away from a mineral-associated niche. PMID:27793980
Levels of Integration in Cognitive Control and Sequence Processing in the Prefrontal Cortex
Bahlmann, Jörg; Korb, Franziska M.; Gratton, Caterina; Friederici, Angela D.
2012-01-01
Cognitive control is necessary to flexibly act in changing environments. Sequence processing is needed in language comprehension to build the syntactic structure in sentences. Functional imaging studies suggest that sequence processing engages the left ventrolateral prefrontal cortex (PFC). In contrast, cognitive control processes additionally recruit bilateral rostral lateral PFC regions. The present study aimed to investigate these two types of processes in one experimental paradigm. Sequence processing was manipulated using two different sequencing rules varying in complexity. Cognitive control was varied with different cue-sets that determined the choice of a sequencing rule. Univariate analyses revealed distinct PFC regions for the two types of processing (i.e. sequence processing: left ventrolateral PFC and cognitive control processing: bilateral dorsolateral and rostral PFC). Moreover, in a common brain network (including left lateral PFC and intraparietal sulcus) no interaction between sequence and cognitive control processing was observed. In contrast, a multivariate pattern analysis revealed an interaction of sequence and cognitive control processing, such that voxels in left lateral PFC and parietal cortex showed different tuning functions for tasks involving different sequencing and cognitive control demands. These results suggest that the difference between the process of rule selection (i.e. cognitive control) and the process of rule-based sequencing (i.e. sequence processing) find their neuronal underpinnings in distinct activation patterns in lateral PFC. Moreover, the combination of rule selection and rule sequencing can shape the response of neurons in lateral PFC and parietal cortex. PMID:22952762
How the Sequence of a Gene Specifies Structural Symmetry in Proteins
Shen, Xiaojuan; Huang, Tongcheng; Wang, Guanyu; Li, Guanglin
2015-01-01
Internal symmetry is commonly observed in the majority of fundamental protein folds. Meanwhile, sufficient evidence suggests that nascent polypeptide chains of proteins have the potential to start the co-translational folding process and this process allows mRNA to contain additional information on protein structure. In this paper, we study the relationship between gene sequences and protein structures from the viewpoint of symmetry to explore how gene sequences code for structural symmetry in proteins. We found that, for a set of two-fold symmetric proteins from left-handed beta-helix fold, intragenic symmetry always exists in their corresponding gene sequences. Meanwhile, codon usage bias and local mRNA structure might be involved in modulating translation speed for the formation of structural symmetry: a major decrease of local codon usage bias in the middle of the codon sequence can be identified as a common feature; and major or consecutive decreases in local mRNA folding energy near the boundaries of the symmetric substructures can also be observed. The results suggest that gene duplication and fusion may be an evolutionarily conserved process for this protein fold. In addition, the usage of rare codons and the formation of higher order of secondary structure near the boundaries of symmetric substructures might have coevolved as conserved mechanisms to slow down translation elongation and to facilitate effective folding of symmetric substructures. These findings provide valuable insights into our understanding of the mechanisms of translation and its evolution, as well as the design of proteins via symmetric modules. PMID:26641668
DOE Office of Scientific and Technical Information (OSTI.GOV)
Larsen, P. E.; Trivedi, G.; Sreedasyam, A.
2010-07-06
Accurate structural annotation is important for prediction of function and required for in vitro approaches to characterize or validate the gene expression products. Despite significant efforts in the field, determination of the gene structure from genomic data alone is a challenging and inaccurate process. The ease of acquisition of transcriptomic sequence provides a direct route to identify expressed sequences and determine the correct gene structure. We developed methods to utilize RNA-seq data to correct errors in the structural annotation and extend the boundaries of current gene models using assembly approaches. The methods were validated with a transcriptomic data set derived from the fungus Laccaria bicolor, which develops a mycorrhizal symbiotic association with the roots of many tree species. Our analysis focused on the subset of 1501 gene models that are differentially expressed in the free living vs. mycorrhizal transcriptome and are expected to be important elements related to carbon metabolism, membrane permeability and transport, and intracellular signaling. Of the set of 1501 gene models, 1439 (96%) successfully generated modified gene models in which all error flags were successfully resolved and the sequences aligned to the genomic sequence. The remaining 4% (62 gene models) either had deviations from transcriptomic data that could not be spanned or generated sequence that did not align to genomic sequence. The outcome of this process is a set of high confidence gene models that can be reliably used for experimental characterization of protein function. 69% of expressed mycorrhizal JGI 'best' gene models deviated from the transcript sequence derived by this method. The transcriptomic sequence enabled correction of a majority of the structural inconsistencies and resulted in a set of validated models for 96% of the mycorrhizal genes. The method described here can be applied to improve gene structural annotation in other species, provided that there is a sequenced genome and a set of gene models.
Lee, Sooncheol; Kang, Changwon
2011-05-06
The RNA oligo(U) sequence, along with an immediately preceding RNA hairpin structure, is an essential cis-acting element for bacterial class I intrinsic termination. This sequence not only causes a pause in transcription during the beginning of the termination process but also facilitates transcript release at the end of the process. In this study, the oligo(U) sequence of the bacteriophage T7 intrinsic terminator Tφ, rather than the hairpin structure, induced pauses of phage T7 RNA polymerase not only at the termination site, triggering a termination process, but also 3 bp upstream, exerting an antitermination effect. The upstream pause presumably allowed RNA to form a thermodynamically more stable secondary structure rather than a terminator hairpin and to persist because the 5'-half of the terminator hairpin-forming sequence could be sequestered by a farther upstream sequence via sequence-specific hybridization, prohibiting formation of the terminator hairpin and termination. The putative antiterminator RNA structure lacked several base pairs essential for termination when probed using RNases A, T1, and V1. When the antiterminator was destabilized by incorporation of IMP into nascent RNA at G residue positions, antitermination was abolished. Furthermore, antitermination strength increased with more stable antiterminator secondary structures and longer pauses. Thus, the oligo(U)-mediated pause prior to the termination site can exert a cis-acting antitermination activity on intrinsic terminator Tφ, and the termination efficiency depends primarily on the termination-interfering pause that precedes the termination-facilitating pause at the termination site.
Processing multiple non-adjacent dependencies: evidence from sequence learning
de Vries, Meinou H.; Petersson, Karl Magnus; Geukes, Sebastian; Zwitserlood, Pienie; Christiansen, Morten H.
2012-01-01
Processing non-adjacent dependencies is considered to be one of the hallmarks of human language. Assuming that sequence-learning tasks provide a useful way to tap natural-language-processing mechanisms, we cross-modally combined serial reaction time and artificial-grammar learning paradigms to investigate the processing of multiple nested (A1A2A3B3B2B1) and crossed dependencies (A1A2A3B1B2B3), containing either three or two dependencies. Both reaction times and prediction errors highlighted problems with processing the middle dependency in nested structures (A1A2A3B3_B1), reminiscent of the ‘missing-verb effect’ observed in English and French, but not with crossed structures (A1A2A3B1_B3). Prior linguistic experience did not play a major role: native speakers of German and Dutch—which permit nested and crossed dependencies, respectively—showed a similar pattern of results for sequences with three dependencies. As for sequences with two dependencies, reaction times and prediction errors were similar for both nested and crossed dependencies. The results suggest that constraints on the processing of multiple non-adjacent dependencies are determined by the specific ordering of the non-adjacent dependencies (i.e. nested or crossed), as well as the number of non-adjacent dependencies to be resolved (i.e. two or three). Furthermore, these constraints may not be specific to language but instead derive from limitations on structured sequence learning. PMID:22688641
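The difference between the nested (A1A2A3B3B2B1) and crossed (A1A2A3B1B2B3) orderings described above can be made concrete with a short sketch. The function names `dependency_sequence` and `dependency_distances` are hypothetical and are not taken from the study; the code only builds the symbol orderings and measures how far apart each A and its matching B sit, and it does not reproduce the experimental task.

```python
def dependency_sequence(n, order):
    """Build a sequence with n non-adjacent dependencies A_i ... B_i.

    order="nested"  gives A1 A2 ... An Bn ... B2 B1
    order="crossed" gives A1 A2 ... An B1 B2 ... Bn
    """
    a_items = [f"A{i}" for i in range(1, n + 1)]
    if order == "nested":
        b_items = [f"B{i}" for i in range(n, 0, -1)]
    elif order == "crossed":
        b_items = [f"B{i}" for i in range(1, n + 1)]
    else:
        raise ValueError("order must be 'nested' or 'crossed'")
    return a_items + b_items


def dependency_distances(sequence):
    """Return, for each i, the distance in positions between A_i and B_i."""
    position = {symbol: index for index, symbol in enumerate(sequence)}
    n = len(sequence) // 2
    return [position[f"B{i}"] - position[f"A{i}"] for i in range(1, n + 1)]


nested = dependency_sequence(3, "nested")
crossed = dependency_sequence(3, "crossed")
print(nested, dependency_distances(nested))
print(crossed, dependency_distances(crossed))
```

In the nested ordering the three dependencies span different distances, whereas in the crossed ordering every dependency spans the same distance; this is only a structural observation about the sequences, not an explanation of the reported behavioral results.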
On Cognition, Structured Sequence Processing, and Adaptive Dynamical Systems
NASA Astrophysics Data System (ADS)
Petersson, Karl Magnus
2008-11-01
Cognitive neuroscience approaches the brain as a cognitive system: a system that functionally is conceptualized in terms of information processing. We outline some aspects of this concept and consider a physical system to be an information processing device when a subclass of its physical states can be viewed as representational/cognitive and transitions between these can be conceptualized as a process operating on these states by implementing operations on the corresponding representational structures. We identify a generic and fundamental problem in cognition: sequentially organized structured processing. Structured sequence processing provides the brain, in an essential sense, with its processing logic. In an approach addressing this problem, we illustrate how to integrate levels of analysis within a framework of adaptive dynamical systems. We note that the dynamical system framework lends itself to a description of asynchronous event-driven devices, which is likely to be important in cognition because the brain appears to be an asynchronous processing system. We use the human language faculty and natural language processing as a concrete example throughout.
Xie, Guangfa; Wang, Lan; Gao, Qikang; Yu, Wenjing; Hong, Xutao; Zhao, Lingyun; Zou, Huijun
2013-09-01
To understand the role of the community structure of microbes in the environment in the fermentation of Shaoxing rice wine, samples collected from a wine factory were subjected to Illumina-based metagenomic sequencing. De novo assembly of the sequencing reads allowed the characterisation of more than 23 thousand microbial genes derived from 1.7 and 1.88 Gbp of sequences from two samples fermented for 5 and 30 days respectively. The microbial community structure at different fermentation times of Shaoxing rice wine was revealed, showing the different roles of the microbiota in the fermentation process of Shaoxing rice wine. The gene function of both samples was also studied in the COG database, with most genes belonging to category S (function unknown), category E (amino acid transport and metabolism) and unclassified group. The results show that both the microbial community structure and gene function composition change greatly at different time points of Shaoxing rice wine fermentation. © 2013 Society of Chemical Industry.
Template-based protein structure modeling using the RaptorX web server.
Källberg, Morten; Wang, Haipeng; Wang, Sheng; Peng, Jian; Wang, Zhiyong; Lu, Hui; Xu, Jinbo
2012-07-19
A key challenge of modern biology is to uncover the functional role of the protein entities that compose cellular proteomes. To this end, the availability of reliable three-dimensional atomic models of proteins is often crucial. This protocol presents a community-wide web-based method using RaptorX (http://raptorx.uchicago.edu/) for protein secondary structure prediction, template-based tertiary structure modeling, alignment quality assessment and sophisticated probabilistic alignment sampling. RaptorX distinguishes itself from other servers by the quality of the alignment between a target sequence and one or multiple distantly related template proteins (especially those with sparse sequence profiles) and by a novel nonlinear scoring function and a probabilistic-consistency algorithm. Consequently, RaptorX delivers high-quality structural models for many targets with only remote templates. At present, it takes RaptorX ~35 min to finish processing a sequence of 200 amino acids. Since its official release in August 2011, RaptorX has processed ~6,000 sequences submitted by ~1,600 users from around the world.
Template-based protein structure modeling using the RaptorX web server
Källberg, Morten; Wang, Haipeng; Wang, Sheng; Peng, Jian; Wang, Zhiyong; Lu, Hui; Xu, Jinbo
2016-01-01
A key challenge of modern biology is to uncover the functional role of the protein entities that compose cellular proteomes. To this end, the availability of reliable three-dimensional atomic models of proteins is often crucial. This protocol presents a community-wide web-based method using RaptorX (http://raptorx.uchicago.edu/) for protein secondary structure prediction, template-based tertiary structure modeling, alignment quality assessment and sophisticated probabilistic alignment sampling. RaptorX distinguishes itself from other servers by the quality of the alignment between a target sequence and one or multiple distantly related template proteins (especially those with sparse sequence profiles) and by a novel nonlinear scoring function and a probabilistic-consistency algorithm. Consequently, RaptorX delivers high-quality structural models for many targets with only remote templates. At present, it takes RaptorX ~35 min to finish processing a sequence of 200 amino acids. Since its official release in August 2011, RaptorX has processed ~6,000 sequences submitted by ~1,600 users from around the world. PMID:22814390
NASA Astrophysics Data System (ADS)
Kraljić, K.; Strüngmann, L.; Fimmel, E.; Gumbel, M.
2018-01-01
The genetic code is degenerated and it is assumed that redundancy provides error detection and correction mechanisms in the translation process. However, the biological meaning of the code's structure is still under current research. This paper presents a Genetic Code Analysis Toolkit (GCAT) which provides workflows and algorithms for the analysis of the structure of nucleotide sequences. In particular, sets or sequences of codons can be transformed and tested for circularity, comma-freeness, dichotomic partitions and others. GCAT comes with a fertile editor custom-built to work with the genetic code and a batch mode for multi-sequence processing. With the ability to read FASTA files or load sequences from GenBank, the tool can be used for the mathematical and statistical analysis of existing sequence data. GCAT is Java-based and provides a plug-in concept for extensibility. Availability: Open source. Homepage: http://www.gcat.bio/
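One of the properties mentioned above, comma-freeness, has a compact standard definition: a set of codons is comma-free if no codon of the set appears in either shifted reading frame of the concatenation of any two codons of the set. The sketch below illustrates that definition only; it is not GCAT code (GCAT is Java-based), and the function name `is_comma_free` is hypothetical.

```python
from itertools import product


def is_comma_free(codons):
    """Check whether a set of trinucleotides is a comma-free code.

    For every ordered pair (x, y), including x == y, the two codons read in
    the shifted frames of the six-letter word xy must not belong to the set.
    """
    code = set(codons)
    if any(len(codon) != 3 for codon in code):
        raise ValueError("every codon must have exactly three letters")
    for x, y in product(code, repeat=2):
        joined = x + y
        # joined[1:4] is the frame shifted by one, joined[2:5] by two.
        if joined[1:4] in code or joined[2:5] in code:
            return False
    return True


print(is_comma_free({"ACG", "UCG"}))
print(is_comma_free({"ACG", "CGA"}))
```

The second set fails because reading ACGACG one position later yields CGA, which is itself in the set, so the reading frame could not be recovered unambiguously.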
Structural study of Bombyx mori silk fibroin during processing for regeneration
NASA Astrophysics Data System (ADS)
Ha, Sung-Won
Bombyx mori silk fibroin has excellent mechanical properties combined with flexibility, tissue compatibility, and high oxygen permeability in the wet condition. This important material should be dissolved and regenerated to be utilized as useful forms such as gel, film, fiber, powder, or non-woven. However, it has long been a problem that the regenerated fibroin materials show poor mechanical properties and brittleness. These problems were technically solved by improving a fiber processing method reported here. The regenerated fibroin fibers showed much better mechanical properties compared to the original silk fibers. This improved technique for the fiber processing of Bombyx mori silk fibroin may be used as a model system for other semi-crystalline fiber forming proteins, becoming available through biotechnology. The physical and chemical properties of the regenerated fibers were characterized by SinTech® tensile testing, X-ray diffraction, solid state 13C NMR spectroscopy, and SEM. Unlike synthetic polymers, the molecular weight distribution of Bombyx mori silk fibroin is mono-disperse because silk fibroin is synthesized from DNA template. Genetic studies have revealed the entire amino acid sequence of Bombyx mori silk fibroin. It is known that the crystalline silk II structure is composed of hexa-amino acid sequences, GAGAGS. However, in the amino acid sequence of Bombyx mori silk fibroin heavy chain, there are present 11 chemically irregular but evolutionarily conserved sequences with about 31 amino acid residues (irregular GT~GT sequences). The structure and role of these irregular sequences have remained unknown. One of the most frequently appearing irregular sequences was synthesized by a peptide synthesizer. The three-dimensional structure of this irregular silk peptide was studied by the high resolution two-dimensional NMR technique. The three-dimensional structure of this peptide shows that it makes a turn or loop structure (distorted O shape), which means the proceeding backbone direction is changed 180° by this sequence. This may facilitate the beta-sheet formation of the crystal forming building blocks, GAGAGS/GY~GY sequences, in fibroin heavy chain. It may also facilitate the solubilization of the fibroin heavy chain within the silk gland.
Georgiev, O; Birnstiel, M L
1985-01-01
Analysis of cDNA sequences obtained from the small nuclear RNA U7 has previously suggested specific contacts, by base pairing, between the conserved stem-loop structure and CAAGAAAGA sequence of the histone pre-mRNA and the 5'-terminal sequence of the U7 RNA during RNA processing. In order to test some aspects of the model we have created a series of linker scan, deletion and insertion mutants of the 3' terminus of a sea urchin H3 histone gene and have injected mutant DNAs or in vitro synthesized precursors into frog oocyte nuclei for interpretation. We find that, in addition to the stem-loop structure of the mRNA, the CAAGAAAGA spacer transcript within the histone pre-mRNA is required absolutely for RNA processing, as predicted from our model. Spacer sequences immediately downstream of the CAAGAAAGA motif are not complementary to U7 RNA. Nevertheless, they are necessary for obtaining a maximal rate of RNA processing, as is the ACCA sequence coding for the 3' terminus of the mature mRNA. An increase of distance between the mRNA palindrome and the CAAGAAAGA by as little as six nucleotides abolishes all processing. It may, therefore, be useful to regard both these sequence motifs as part of one and the same RNA processing signal with narrowly defined topologies. Interestingly, U7 RNA-dependent 3' processing of histone pre-mRNA can occur in RNA injection experiments only when the in vitro synthesized pre-mRNA contains sequence extensions well beyond the region of sequence complementarities to the U7 RNA. In addition to directing 3' processing the terminal mRNA sequences may have a role in histone mRNA stabilization in the cytoplasmic compartment. PMID:2410259
Sequence Segmentation with changeptGUI.
Tasker, Edward; Keith, Jonathan M
2017-01-01
Many biological sequences have a segmental structure that can provide valuable clues to their content, structure, and function. The program changept is a tool for investigating the segmental structure of a sequence, and can also be applied to multiple sequences in parallel to identify a common segmental structure, thus providing a method for integrating multiple data types to identify functional elements in genomes. In the previous edition of this book, a command line interface for changept is described. Here we present a graphical user interface for this package, called changeptGUI. This interface also includes tools for pre- and post-processing of data and results to facilitate investigation of the number and characteristics of segment classes.
Sequence Memory Constraints Give Rise to Language-Like Structure through Iterated Learning
Cornish, Hannah; Dale, Rick; Kirby, Simon; Christiansen, Morten H.
2017-01-01
Human language is composed of sequences of reusable elements. The origin of the sequential structure of language is a hotly debated topic in evolutionary linguistics. In this paper, we show that sets of sequences with language-like statistical properties can emerge from a process of cultural evolution under pressure from chunk-based memory constraints. We employ a novel experimental task that is non-linguistic and non-communicative in nature, in which participants are trained on and later asked to recall a set of sequences one-by-one. Recalled sequences from one participant become training data for the next participant. In this way, we simulate cultural evolution in the laboratory. Our results show a cumulative increase in structure, and by comparing this structure to data from existing linguistic corpora, we demonstrate a close parallel between the sets of sequences that emerge in our experiment and those seen in natural language. PMID:28118370
Protein Interaction Profile Sequencing (PIP-seq).
Foley, Shawn W; Gregory, Brian D
2016-10-10
Every eukaryotic RNA transcript undergoes extensive post-transcriptional processing from the moment of transcription up through degradation. This regulation is performed by a distinct cohort of RNA-binding proteins which recognize their target transcript by both its primary sequence and secondary structure. Here, we describe protein interaction profile sequencing (PIP-seq), a technique that uses ribonuclease-based footprinting followed by high-throughput sequencing to globally assess both protein-bound RNA sequences and RNA secondary structure. PIP-seq utilizes single- and double-stranded RNA-specific nucleases in the absence of proteins to infer RNA secondary structure. These libraries are also compared to samples that undergo nuclease digestion in the presence of proteins in order to find enriched protein-bound sequences. Combined, these four libraries provide a comprehensive, transcriptome-wide view of RNA secondary structure and RNA protein interaction sites from a single experimental technique. © 2016 by John Wiley & Sons, Inc. Copyright © 2016 John Wiley & Sons, Inc.
A Generative Angular Model of Protein Structure Evolution
Golden, Michael; García-Portugués, Eduardo; Sørensen, Michael; Mardia, Kanti V.; Hamelryck, Thomas; Hein, Jotun
2017-01-01
Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a generative, evolutionary model of protein structure and sequence that is valid on a local length scale. The model concerns the local dependencies between sequence and structure evolution in a pair of homologous proteins. The evolutionary trajectory between the two structures in the protein pair is treated as a random walk in dihedral angle space, which is modeled using a novel angular diffusion process on the two-dimensional torus. Coupling sequence and structure evolution in our model allows for modeling both “smooth” conformational changes and “catastrophic” conformational jumps, conditioned on the amino acid changes. The model has interpretable parameters and is comparatively more realistic than previous stochastic models, providing new insights into the relationship between sequence and structure evolution. For example, using the trained model we were able to identify an apparent sequence–structure evolutionary motif present in a large number of homologous protein pairs. The generative nature of our model enables us to evaluate its validity and its ability to simulate aspects of protein evolution conditioned on an amino acid sequence, a related amino acid sequence, a related structure or any combination thereof. PMID:28453724
Evolutionary optimization of biopolymers and sequence structure maps
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reidys, C.M.; Kopp, S.; Schuster, P.
1996-06-01
Searching for biopolymers having a predefined function is a core problem of biotechnology, biochemistry and pharmacy. On the level of RNA sequences and their corresponding secondary structures we show that this problem can be analyzed mathematically. The strategy will be to study the properties of the RNA sequence to secondary structure mapping that is essential for the understanding of the search process. We show that to each secondary structure s there exists a neutral network consisting of all sequences folding into s. This network can be modeled as a random graph and has the following generic properties: it is dense and has a giant component within the graph of compatible sequences. The neutral network percolates sequence space and any two neutral nets come close in terms of Hamming distance. We investigate the distribution of the orders of neutral nets and show that above a certain threshold the topology of neutral nets allows to find practically all frequent secondary structures.
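Two notions used above, compatibility of a sequence with a secondary structure and Hamming distance between sequences, can be sketched briefly. The sketch assumes dot-bracket notation for the structure and allows G-U wobble pairs alongside Watson-Crick pairs; both are common conventions but are assumptions here, and the names `pair_table`, `is_compatible` and `hamming` are hypothetical. Compatibility is only a necessary condition: a compatible sequence can form the structure but need not fold into it.

```python
# Base pairs treated as allowed (assumption: Watson-Crick plus G-U wobble).
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}


def pair_table(structure):
    """Return the list of (i, j) paired positions of a dot-bracket string."""
    stack, pairs = [], []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(index)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), index))
        elif symbol != ".":
            raise ValueError("structure may contain only '(', ')' and '.'")
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def is_compatible(sequence, structure):
    """True if every paired position of the structure holds an allowed pair."""
    if len(sequence) != len(structure):
        return False
    return all((sequence[i], sequence[j]) in ALLOWED_PAIRS
               for i, j in pair_table(structure))


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


print(is_compatible("GGGAAACCC", "(((...)))"))
print(is_compatible("GGGAAACCA", "(((...)))"))
print(hamming("GGGAAACCC", "GGGAAACCA"))
```

The two example sequences differ at a single position, yet only one of them is compatible with the hairpin, which illustrates why neutral networks are studied inside the graph of compatible sequences.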
An algorithm to compute the sequency ordered Walsh transform
NASA Technical Reports Server (NTRS)
Larsen, H.
1976-01-01
A fast sequency-ordered Walsh transform algorithm is presented; this sequency-ordered fast transform is complementary to the sequency-ordered fast Walsh transform introduced by Manz (1972) and eliminates Gray code reordering through a modification of the basic fast Hadamard transform structure. The new algorithm retains the advantages of its complement (it is in place and is its own inverse), while differing in having a decimation-in-time structure, accepting data in normal order, and returning the coefficients in bit-reversed sequency order. Applications include estimation of Walsh power spectra for a random process, sequency filtering and computing logical autocorrelations, and selective bit reversing.
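For orientation, the conventional route that the algorithms above improve on can be sketched as follows: compute a fast Walsh-Hadamard transform in natural (Hadamard) order, then permute the coefficients into sequency order using a Gray code followed by a bit reversal. This is explicitly not the algorithm of Larsen or of Manz, which avoid the separate reordering pass; the function names `fwht` and `sequency_order` are hypothetical and the transform is left unnormalized.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform in natural (Hadamard) order."""
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    half = 1
    while half < n:
        # Butterfly stage: combine elements that are `half` positions apart.
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                x, y = data[i], data[i + half]
                data[i], data[i + half] = x + y, x - y
        half *= 2
    return data


def sequency_order(coefficients):
    """Permute Hadamard-ordered coefficients into sequency (Walsh) order.

    Sequency index k maps to Hadamard index bit_reverse(gray(k)).
    """
    n = len(coefficients)
    bits = n.bit_length() - 1
    ordered = []
    for k in range(n):
        gray = k ^ (k >> 1)
        hadamard_index = int(format(gray, f"0{bits}b")[::-1], 2) if bits else 0
        ordered.append(coefficients[hadamard_index])
    return ordered


signal = [1, 0, 1, 0, 0, 1, 1, 0]
print(sequency_order(fwht(signal)))
```

Applying `fwht` twice returns the input scaled by the length, which is the unnormalized form of the self-inverse property mentioned in the abstract.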
LncRNA Structural Characteristics in Epigenetic Regulation
Wang, Chenguang; Wang, Lianzong; Ding, Yu; Lu, Xiaoyan; Zhang, Guosi; Yang, Jiaxin; Zheng, Hewei; Wang, Hong; Jiang, Yongshuai; Xu, Liangde
2017-01-01
The rapid development of new generation sequencing technology has deepened the understanding of genomes and functional products. RNA-sequencing studies in mammals show that approximately 85% of the DNA sequences have RNA products, of which those longer than 200 nucleotides (nt) are called long non-coding RNAs (lncRNAs). LncRNAs now have been shown to play important epigenetic regulatory roles in key molecular processes, such as gene expression, genetic imprinting, histone modification, chromatin dynamics, and other activities by forming specific structures and interacting with all kinds of molecules. This paper mainly discusses the correlation between the structure and function of lncRNAs with the recent progress in epigenetic regulation, which is important to the understanding of the mechanism of lncRNAs in physiological and pathological processes. PMID:29292750
Protein structure determination by exhaustive search of Protein Data Bank derived databases.
Stokes-Rees, Ian; Sliz, Piotr
2010-12-14
Parallel sequence and structure alignment tools have become ubiquitous and invaluable at all levels in the study of biological systems. We demonstrate the application and utility of this same parallel search paradigm to the process of protein structure determination, benefitting from the large and growing corpus of known structures. Such searches were previously computationally intractable. Through the method of Wide Search Molecular Replacement, developed here, they can be completed in a few hours with the aid of national-scale federated cyberinfrastructure. By dramatically expanding the range of models considered for structure determination, we show that small (less than 12% structural coverage) and low sequence identity (less than 20% identity) template structures can be identified through multidimensional template scoring metrics and used for structure determination. Many new macromolecular complexes can benefit significantly from such a technique due to the lack of known homologous protein folds or sequences. We demonstrate the effectiveness of the method by determining the structure of a full-length p97 homologue from Trichoplusia ni. Example cases with the MHC/T-cell receptor complex and the EmoB protein provide systematic estimates of minimum sequence identity, structure coverage, and structural similarity required for this method to succeed. We describe how this structure-search approach and other novel computationally intensive workflows are made tractable through integration with the US national computational cyberinfrastructure, allowing, for example, rapid processing of the entire Structural Classification of Proteins protein fragment database.
Analyzing Student Inquiry Data Using Process Discovery and Sequence Classification
ERIC Educational Resources Information Center
Emond, Bruno; Buffett, Scott
2015-01-01
This paper reports on results of applying process discovery mining and sequence classification mining techniques to a data set of semi-structured learning activities. The main research objective is to advance educational data mining to model and support self-regulated learning in heterogeneous environments of learning content, activities, and…
Molecular Structure and Sequence in Complex Coacervates
NASA Astrophysics Data System (ADS)
Sing, Charles; Lytle, Tyler; Madinya, Jason; Radhakrishna, Mithun
Oppositely-charged polyelectrolytes in aqueous solution can undergo associative phase separation, in a process known as complex coacervation. This results in a polyelectrolyte-dense phase (coacervate) and polyelectrolyte-dilute phase (supernatant). There remain challenges in understanding this process, despite a long history in polymer physics. We use Monte Carlo simulation to demonstrate that molecular features (charge spacing, size) play a crucial role in governing the equilibrium in coacervates. We show how these molecular features give rise to strong monomer sequence effects, due to a combination of counterion condensation and correlation effects. We distinguish between structural and sequence-based correlations, which can be designed to tune the phase diagram of coacervation. Sequence effects further inform the physical understanding of coacervation, and provide the basis for new coacervation models that take monomer-level features into account.
Yang, Qin; Gilmartin, Gregory M.; Doublié, Sylvie
2010-01-01
Human Cleavage Factor Im (CFIm) is an essential component of the pre-mRNA 3′ processing complex that functions in the regulation of poly(A) site selection through the recognition of UGUA sequences upstream of the poly(A) site. Although the highly conserved 25 kDa subunit (CFIm25) of the CFIm complex possesses a characteristic α/β/α Nudix fold, CFIm25 has no detectable hydrolase activity. Here we report the crystal structures of the human CFIm25 homodimer in complex with UGUAAA and UUGUAU RNA sequences. CFIm25 is the first Nudix protein to be reported to bind RNA in a sequence-specific manner. The UGUA sequence contributes to binding specificity through an intramolecular G:A Watson–Crick/sugar-edge base interaction, an unusual pairing previously found to be involved in the binding specificity of the SAM-III riboswitch. The structures, together with mutational data, suggest a novel mechanism for the simultaneous sequence-specific recognition of two UGUA elements within the pre-mRNA. Furthermore, the mutually exclusive binding of RNA and the signaling molecule Ap4A (diadenosine tetraphosphate) by CFIm25 suggests a potential role for small molecules in the regulation of mRNA 3′ processing. PMID:20479262
Yang, Qin; Gilmartin, Gregory M; Doublié, Sylvie
2010-06-01
Human Cleavage Factor Im (CFI(m)) is an essential component of the pre-mRNA 3' processing complex that functions in the regulation of poly(A) site selection through the recognition of UGUA sequences upstream of the poly(A) site. Although the highly conserved 25 kDa subunit (CFI(m)25) of the CFI(m) complex possesses a characteristic alpha/beta/alpha Nudix fold, CFI(m)25 has no detectable hydrolase activity. Here we report the crystal structures of the human CFI(m)25 homodimer in complex with UGUAAA and UUGUAU RNA sequences. CFI(m)25 is the first Nudix protein to be reported to bind RNA in a sequence-specific manner. The UGUA sequence contributes to binding specificity through an intramolecular G:A Watson-Crick/sugar-edge base interaction, an unusual pairing previously found to be involved in the binding specificity of the SAM-III riboswitch. The structures, together with mutational data, suggest a novel mechanism for the simultaneous sequence-specific recognition of two UGUA elements within the pre-mRNA. Furthermore, the mutually exclusive binding of RNA and the signaling molecule Ap(4)A (diadenosine tetraphosphate) by CFI(m)25 suggests a potential role for small molecules in the regulation of mRNA 3' processing.
ASSET: Analysis of Sequences of Synchronous Events in Massively Parallel Spike Trains
Canova, Carlos; Denker, Michael; Gerstein, George; Helias, Moritz
2016-01-01
With the ability to observe the activity from large numbers of neurons simultaneously using modern recording technologies, the chance to identify sub-networks involved in coordinated processing increases. Sequences of synchronous spike events (SSEs) constitute one type of such coordinated spiking that propagates activity in a temporally precise manner. The synfire chain was proposed as one potential model for such network processing. Previous work introduced a method for visualization of SSEs in massively parallel spike trains, based on an intersection matrix that contains in each entry the degree of overlap of active neurons in two corresponding time bins. Repeated SSEs are reflected in the matrix as diagonal structures of high overlap values. The method as such, however, leaves the task of identifying these diagonal structures to visual inspection rather than to a quantitative analysis. Here we present ASSET (Analysis of Sequences of Synchronous EvenTs), an improved, fully automated method which determines diagonal structures in the intersection matrix by a robust mathematical procedure. The method consists of a sequence of steps that i) assess which entries in the matrix potentially belong to a diagonal structure, ii) cluster these entries into individual diagonal structures and iii) determine the neurons composing the associated SSEs. We employ parallel point processes generated by stochastic simulations as test data to demonstrate the performance of the method under a wide range of realistic scenarios, including different types of non-stationarity of the spiking activity and different correlation structures. Finally, the ability of the method to discover SSEs is demonstrated on complex data from large network simulations with embedded synfire chains. Thus, ASSET represents an effective and efficient tool to analyze massively parallel spike data for temporal sequences of synchronous activity. PMID:27420734
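The intersection matrix described above, whose entry (i, j) holds the overlap of neurons active in time bins i and j, can be illustrated with a minimal sketch. It assumes spike trains already binned into 0/1 values and uses the raw count of shared active neurons as the overlap measure; any normalization, the statistical assessment of entries, and the clustering steps of ASSET are not shown. The function name `intersection_matrix` is hypothetical and is not the API of any particular analysis library.

```python
def intersection_matrix(binned_trains):
    """Count, for every pair of time bins, the neurons active in both.

    binned_trains: one list per neuron, each holding 0/1 per time bin
    (all lists must have the same length).
    """
    if not binned_trains:
        return []
    n_bins = len(binned_trains[0])
    if any(len(train) != n_bins for train in binned_trains):
        raise ValueError("all spike trains must have the same number of bins")
    # Set of active neuron indices for each time bin.
    active = [{neuron for neuron, train in enumerate(binned_trains) if train[b]}
              for b in range(n_bins)]
    return [[len(active[i] & active[j]) for j in range(n_bins)]
            for i in range(n_bins)]


# Three neurons, four bins: neurons 0 and 1 fire together in bins 0 and 2,
# neuron 2 fires in bins 1 and 3, so the pattern of bins 0-1 repeats in 2-3.
trains = [
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
for row in intersection_matrix(trains):
    print(row)
```

In this toy example the repeated pattern shows up as an off-diagonal run of non-zero entries at (0, 2) and (1, 3), the kind of diagonal structure the abstract refers to.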
Christiansen, Morten H.; Conway, Christopher M.; Onnis, Luca
2011-01-01
We used event-related potentials (ERPs) to investigate the time course and distribution of brain activity while adults performed (a) a sequential learning task involving complex structured sequences, and (b) a language processing task. The same positive ERP deflection, the P600 effect, typically linked to difficult or ungrammatical syntactic processing, was found for structural incongruencies in both sequential learning as well as natural language, and with similar topographical distributions. Additionally, a left anterior negativity (LAN) was observed for language but not for sequential learning. These results are interpreted as an indication that the P600 provides an index of violations and the cost of integration of expectations for upcoming material when processing complex sequential structure. We conclude that the same neural mechanisms may be recruited for both syntactic processing of linguistic stimuli and sequential learning of structured sequence patterns more generally. PMID:23678205
Stewart, Mikaela; Dunlap, Tori; Dourlain, Elizabeth; Grant, Bryce; McFail-Isom, Lori
2013-01-01
The fine conformational subtleties of DNA structure modulate many fundamental cellular processes including gene activation/repression, cellular division, and DNA repair. Most of these cellular processes rely on the conformational heterogeneity of specific DNA sequences. Factors including those structural characteristics inherent in the particular base sequence as well as those induced through interaction with solvent components combine to produce fine DNA structural variation including helical flexibility and conformation. Cation-pi interactions between solvent cations or their first hydration shell waters and the faces of DNA bases form sequence selectively and contribute to DNA structural heterogeneity. In this paper, we detect and characterize the binding patterns found in cation-pi interactions between solvent cations and DNA bases in a set of high resolution X-ray crystal structures. Specifically, we found that monovalent cations (Tl⁺) and the polarized first hydration shell waters of divalent cations (Mg²⁺, Ca²⁺) form cation-pi interactions with DNA bases, stabilizing unstacked conformations. When these cation-pi interactions are combined with electrostatic interactions, a pattern of specific binding motifs is formed within the grooves. PMID:23940752
Learning of pitch and time structures in an artificial grammar setting.
Prince, Jon B; Stevens, Catherine J; Jones, Mari Riess; Tillmann, Barbara
2018-04-12
Despite the empirical evidence for the power of the cognitive capacity of implicit learning of structures and regularities in several modalities and materials, it remains controversial whether implicit learning extends to the learning of temporal structures and regularities. We investigated whether (a) an artificial grammar can be learned equally well when expressed in duration sequences as when expressed in pitch sequences, (b) learning of the artificial grammar in either duration or pitch (as the primary dimension) sequences can be influenced by the properties of the secondary dimension (invariant vs. randomized), and (c) learning can be boosted when the artificial grammar is expressed in both pitch and duration. After an exposure phase with grammatical sequences, learning in a subsequent test phase was assessed in a grammaticality judgment task. Participants in both the pitch and duration conditions showed incidental (not fully implicit) learning of the artificial grammar when the secondary dimension was invariant, but randomizing the pitch sequence prevented learning of the artificial grammar in duration sequences. Expressing the artificial grammar in both pitch and duration resulted in disproportionately better performance, suggesting an interaction between the learning of pitch and temporal structure. The findings are relevant to research investigating the learning of temporal structures and the learning of structures presented simultaneously in 2 dimensions (e.g., space and time, space and objects). By investigating learning, the findings provide further insight into the potential specificity of pitch and time processing, and their integrated versus independent processing, as previously debated in music cognition research.
Using the self-select paradigm to delineate the nature of speech motor programming.
Wright, David L; Robin, Don A; Rhee, Jooyhun; Vaculin, Amber; Jacks, Adam; Guenther, Frank H; Fox, Peter T
2009-06-01
The authors examined the involvement of 2 speech motor programming processes identified by S. T. Klapp (1995, 2003) during the articulation of utterances differing in syllable and sequence complexity. According to S. T. Klapp, 1 process, INT, resolves the demands of the programmed unit, whereas a second process, SEQ, oversees the serial order demands of longer sequences. A modified reaction time paradigm was used to assess INT and SEQ demands. Specifically, syllable complexity was dependent on syllable structure, whereas sequence complexity involved either repeated or unique syllabi within an utterance. INT execution was slowed when articulating single syllables in the form CCCV compared to simpler CV syllables. Planning unique syllables within a multisyllabic utterance rather than repetitions of the same syllable slowed INT but not SEQ. The INT speech motor programming process, important for mental syllabary access, is sensitive to changes in both syllable structure and the number of unique syllables in an utterance.
Learning predictive statistics from temporal sequences: Dynamics and strategies
Wang, Rui; Shen, Yuan; Tino, Peter; Welchman, Andrew E.; Kourtzi, Zoe
2017-01-01
Human behavior is guided by our expectations about the future. Often, we make predictions by monitoring how event sequences unfold, even though such sequences may appear incomprehensible. Event structures in the natural environment typically vary in complexity, from simple repetition to complex probabilistic combinations. How do we learn these structures? Here we investigate the dynamics of structure learning by tracking human responses to temporal sequences that change in structure unbeknownst to the participants. Participants were asked to predict the upcoming item following a probabilistic sequence of symbols. Using a Markov process, we created a family of sequences, from simple frequency statistics (e.g., some symbols are more probable than others) to context-based statistics (e.g., symbol probability is contingent on preceding symbols). We demonstrate the dynamics with which individuals adapt to changes in the environment's statistics—that is, they extract the behaviorally relevant structures to make predictions about upcoming events. Further, we show that this structure learning relates to individual decision strategy; faster learning of complex structures relates to selection of the most probable outcome in a given context (maximizing) rather than matching of the exact sequence statistics. Our findings provide evidence for alternate routes to learning of behaviorally relevant statistics that facilitate our ability to predict future events in variable environments. PMID:28973111
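The contrast between context-based sequence statistics and the two decision strategies named in the abstract above (maximizing versus matching) can be sketched as follows. The transition table, symbol names, and function names are hypothetical and chosen only for illustration; they are not the stimuli or the analysis used in the study.

```python
import random

# Hypothetical first-order Markov process: the next symbol depends on the current one.
TRANSITIONS = {
    "A": {"B": 0.8, "C": 0.2},
    "B": {"C": 0.8, "A": 0.2},
    "C": {"A": 0.8, "B": 0.2},
}


def generate_sequence(transitions, start, length, rng):
    """Sample a symbol sequence of the given length from the Markov process."""
    seq = [start]
    for _ in range(length - 1):
        options = transitions[seq[-1]]
        symbols = list(options)
        weights = [options[s] for s in symbols]
        seq.append(rng.choices(symbols, weights=weights, k=1)[0])
    return seq


def predict_maximizing(transitions, context):
    """Always choose the most probable next symbol for this context."""
    options = transitions[context]
    return max(options, key=options.get)


def predict_matching(transitions, context, rng):
    """Choose each next symbol with the probability at which it actually occurs."""
    options = transitions[context]
    symbols = list(options)
    return rng.choices(symbols, weights=[options[s] for s in symbols], k=1)[0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sequence = generate_sequence(TRANSITIONS, "A", 50, rng)
print("".join(sequence))
print(predict_maximizing(TRANSITIONS, "A"))  # B
```

With these assumed probabilities, a maximizer is expected to be correct on 80% of trials, whereas a matcher is expected to be correct on 0.8 × 0.8 + 0.2 × 0.2 = 68% of trials, which shows why the two strategies are behaviorally distinguishable.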
Learning predictive statistics from temporal sequences: Dynamics and strategies.
Wang, Rui; Shen, Yuan; Tino, Peter; Welchman, Andrew E; Kourtzi, Zoe
2017-10-01
Human behavior is guided by our expectations about the future. Often, we make predictions by monitoring how event sequences unfold, even though such sequences may appear incomprehensible. Event structures in the natural environment typically vary in complexity, from simple repetition to complex probabilistic combinations. How do we learn these structures? Here we investigate the dynamics of structure learning by tracking human responses to temporal sequences that change in structure unbeknownst to the participants. Participants were asked to predict the upcoming item following a probabilistic sequence of symbols. Using a Markov process, we created a family of sequences, from simple frequency statistics (e.g., some symbols are more probable than others) to context-based statistics (e.g., symbol probability is contingent on preceding symbols). We demonstrate the dynamics with which individuals adapt to changes in the environment's statistics; that is, they extract the behaviorally relevant structures to make predictions about upcoming events. Further, we show that this structure learning relates to individual decision strategy; faster learning of complex structures relates to selection of the most probable outcome in a given context (maximizing) rather than matching of the exact sequence statistics. Our findings provide evidence for alternate routes to learning of behaviorally relevant statistics that facilitate our ability to predict future events in variable environments.
Van de Cavey, Joris; Hartsuiker, Robert J
2016-01-01
Cognitive processing in many domains (e.g., sentence comprehension, music listening, and math solving) requires sequential information to be organized into an integrational structure. There appears to be some overlap in integrational processing across domains, as shown by cross-domain interference effects when, for example, linguistic and musical stimuli are jointly presented (Koelsch, Gunter, Wittfoth, & Sammler, 2005; Slevc, Rosenberg, & Patel, 2009). These findings support theories of overlapping resources for integrational processing across domains (cfr. SSIRH Patel, 2003; SWM, Kljajevic, 2010). However, there are some limitations to the studies mentioned above, such as the frequent use of unnaturalistic integrational difficulties. In recent years, the idea has arisen that evidence for domain-generality in structural processing might also be yielded through priming paradigms (cfr. Scheepers, 2003). The rationale behind this is that integrational processing across domains regularly requires the processing of dependencies across short or long distances in the sequence, involving respectively less or more syntactic working memory resources (cfr. SWM, Kljajevic, 2010), and such processing decisions might persist over time. However, whereas recent studies have shown suggestive priming of integrational structure between language and arithmetic (though often dependent on arithmetic performance, cfr. Scheepers et al., 2011; Scheepers & Sturt, 2014), it remains to be investigated to what extent we can also find evidence for priming in other domains, such as music and action (cfr. SWM, Kljajevic, 2010). Experiment 1a showed structural priming from the processing of musical sequences onto the position in the sentence structure (early or late) to which a relative clause was attached in subsequent sentence completion. 
Importantly, Experiment 1b showed that a similar structural manipulation based on non-hierarchically ordered color sequences did not yield any priming effect, suggesting that the priming effect is not based on linear order, but integrational dependency. Finally, Experiment 2 presented primes in four domains (relative clause sentences, music, mathematics, and structured descriptions of actions), and consistently showed priming within and across domains. These findings provide clear evidence for domain-general structural processing mechanisms.
Lorenzo, J Ramiro; Alonso, Leonardo G; Sánchez, Ignacio E
2015-01-01
Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage "Protein and nucleic acid structure and sequence analysis".
De Novo Protein Structure Prediction
NASA Astrophysics Data System (ADS)
Hung, Ling-Hong; Ngan, Shing-Chung; Samudrala, Ram
An unparalleled amount of sequence data is being made available from large-scale genome sequencing efforts. The data provide a shortcut to the determination of the function of a gene of interest, as long as there is an existing sequenced gene with similar sequence and of known function. This has spurred structural genomic initiatives with the goal of determining as many protein folds as possible (Brenner and Levitt, 2000; Burley, 2000; Brenner, 2001; Heinemann et al., 2001). The purpose of this is twofold: First, the structure of a gene product can often lead to direct inference of its function. Second, since the function of a protein is dependent on its structure, direct comparison of the structures of gene products can be more sensitive than the comparison of sequences of genes for detecting homology. Presently, structural determination by crystallography and NMR techniques is still slow and expensive in terms of manpower and resources, despite attempts to automate the processes. Computer structure prediction algorithms, while not providing the accuracy of the traditional techniques, are extremely quick and inexpensive and can provide useful low-resolution data for structure comparisons (Bonneau and Baker, 2001). Given the immense number of structures which the structural genomic projects are attempting to solve, there would be a considerable gain even if the computer structure prediction approach were applicable to a subset of proteins.
Ghosh, Pritha; Mathew, Oommen K; Sowdhamini, Ramanathan
2016-10-07
RNA-binding proteins (RBPs) interact with their cognate RNA(s) to form large biomolecular assemblies. They are versatile in their functionality and are involved in a myriad of processes inside the cell. RBPs with similar structural features and common biological functions are grouped together into families and superfamilies. It will be useful to obtain an early understanding and association of RNA-binding property of sequences of gene products. Here, we report a web server, RStrucFam, to predict the structure, type of cognate RNA(s) and function(s) of proteins, where possible, from mere sequence information. The web server employs Hidden Markov Model scan (hmmscan) to enable association to a back-end database of structural and sequence families. The database (HMMRBP) comprises 437 HMMs of RBP families of known structure that have been generated using structure-based sequence alignments and 746 sequence-centric RBP family HMMs. The input protein sequence is associated with structural or sequence domain families, if structure or sequence signatures exist. In case of association of the protein with a family of known structures, output features such as a multiple structure-based sequence alignment (MSSA) of the query with all other members of that family are provided. Further, cognate RNA partner(s) for that protein, Gene Ontology (GO) annotations, if any, and a homology model of the protein can be obtained. The users can also browse through the database for details pertaining to each family, protein or RNA and their related information based on keyword search or RNA motif search. RStrucFam is a web server that exploits structurally conserved features of RBPs, derived from known family members and imprinted in mathematical profiles, to predict putative RBPs from sequence information. Proteins that fail to associate with such structure-centric families are further queried against the sequence-centric RBP family HMMs in the HMMRBP database. 
Further, all other essential information pertaining to an RBP, such as overall function annotations, is provided. The web server can be accessed at the following link: http://caps.ncbs.res.in/rstrucfam.
Secondary Structure Predictions for Long RNA Sequences Based on Inversion Excursions and MapReduce.
Yehdego, Daniel T; Zhang, Boyu; Kodimala, Vikram K R; Johnson, Kyle L; Taufer, Michela; Leung, Ming-Ying
2013-05-01
Secondary structures of ribonucleic acid (RNA) molecules play important roles in many biological processes including gene expression and regulation. Experimental observations and computing limitations suggest that we can approach the secondary structure prediction problem for long RNA sequences by segmenting them into shorter chunks, predicting the secondary structures of each chunk individually using existing prediction programs, and then assembling the results to give the structure of the original sequence. The selection of cutting points is a crucial component of the segmenting step. Noting that stem-loops and pseudoknots always contain an inversion, i.e., a stretch of nucleotides followed closely by its inverse complementary sequence, we developed two cutting methods for segmenting long RNA sequences based on inversion excursions: the centered and optimized method. Each step of searching for inversions, chunking, and predictions can be performed in parallel. In this paper we use a MapReduce framework, i.e., Hadoop, to extensively explore meaningful inversion stem lengths and gap sizes for the segmentation and identify correlations between chunking methods and prediction accuracy. We show that for a set of long RNA sequences in the RFAM database, whose secondary structures are known to contain pseudoknots, our approach predicts secondary structures more accurately than methods that do not segment the sequence, when the latter predictions are possible computationally. We also show that, as sequences exceed certain lengths, some programs cannot computationally predict pseudoknots while our chunking methods can. Overall, our predicted structures still retain the accuracy level of the original prediction programs when compared with known experimental secondary structure.
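The notion of an inversion used in the abstract above (a stretch of nucleotides followed closely by its inverse complementary sequence) can be made concrete with a small sketch. The function names and the `stem_len` and `max_gap` parameters are assumptions for illustration, only Watson-Crick pairs are considered, and the authors' inversion-excursion scoring and MapReduce parallelization are not reproduced here.

```python
# Watson-Crick complements for RNA; G-U wobble pairs are deliberately ignored in this sketch.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the inverse complementary sequence of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_inversions(seq, stem_len, max_gap):
    """Return (stem_start, partner_start) pairs where the stem of length stem_len
    is followed, after a gap of at most max_gap bases, by its reverse complement."""
    hits = []
    for i in range(len(seq) - 2 * stem_len + 1):
        target = reverse_complement(seq[i:i + stem_len])
        first = i + stem_len                                  # gap of 0
        last = min(first + max_gap, len(seq) - stem_len)      # largest allowed start
        for j in range(first, last + 1):
            if seq[j:j + stem_len] == target:
                hits.append((i, j))
    return hits


# Hypothetical toy sequence: stem GGGA, a 2-base gap, then its reverse complement UCCC.
print(find_inversions("GGGAAAUCCC", stem_len=4, max_gap=3))  # [(0, 6)]
```

Each stem position is searched independently of the others, which is consistent with the abstract's remark that the inversion search can be performed in parallel.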
Hu, Xihao; Wu, Yang; Lu, Zhi John; Yip, Kevin Y
2016-11-01
High-throughput sequencing has been used to study posttranscriptional regulations, where the identification of protein-RNA binding is a major and fast-developing sub-area, which is in turn benefited by the sequencing methods for whole-transcriptome probing of RNA secondary structures. In the study of RNA secondary structures using high-throughput sequencing, bases are modified or cleaved according to their structural features, which alter the resulting composition of sequencing reads. In the study of protein-RNA binding, methods have been proposed to immuno-precipitate (IP) protein-bound RNA transcripts in vitro or in vivo. By sequencing these transcripts, the protein-RNA interactions and the binding locations can be identified. For both types of data, read counts are affected by a combination of confounding factors, including expression levels of transcripts, sequence biases, mapping errors and the probing or IP efficiency of the experimental protocols. Careful processing of the sequencing data and proper extraction of important features are fundamentally important to a successful analysis. Here we review and compare different experimental methods for probing RNA secondary structures and binding sites of RNA-binding proteins (RBPs), and the computational methods proposed for analyzing the corresponding sequencing data. We suggest how these two types of data should be integrated to study the structural properties of RBP binding sites as a systematic way to better understand posttranscriptional regulations.
Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke
2008-05-01
Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. 
The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods.
Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke
2008-01-01
Background: Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results: SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. Conclusion: The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. 
The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods. PMID:18452616
Sensitivity to structure in action sequences: An infant event-related potential study.
Monroy, Claire D; Gerson, Sarah A; Domínguez-Martínez, Estefanía; Kaduk, Katharina; Hunnius, Sabine; Reid, Vincent
2017-05-06
Infants are sensitive to structure and patterns within continuous streams of sensory input. This sensitivity relies on statistical learning, the ability to detect predictable regularities in spatial and temporal sequences. Recent evidence has shown that infants can detect statistical regularities in action sequences they observe, but little is known about the neural processes that give rise to this ability. In the current experiment, we combined electroencephalography (EEG) with eye-tracking to identify electrophysiological markers that indicate whether 8-11-month-old infants detect violations to learned regularities in action sequences, and to relate these markers to behavioral measures of anticipation during learning. In a learning phase, infants observed an actor performing a sequence featuring two deterministic pairs embedded within an otherwise random sequence. Thus, the first action of each pair was predictive of what would occur next. One of the pairs caused an action-effect, whereas the second did not. In a subsequent test phase, infants observed another sequence that included deviant pairs, violating the previously observed action pairs. Event-related potential (ERP) responses were analyzed and compared between the deviant and the original action pairs. Findings reveal that infants demonstrated a greater Negative central (Nc) ERP response to the deviant actions for the pair that caused the action-effect, which was consistent with their visual anticipations during the learning phase. Findings are discussed in terms of the neural and behavioral processes underlying perception and learning of structured action sequences.
Quantitative analysis and prediction of G-quadruplex forming sequences in double-stranded DNA
Kim, Minji; Kreig, Alex; Lee, Chun-Ying; Rube, H. Tomas; Calvert, Jacob; Song, Jun S.; Myong, Sua
2016-01-01
G-quadruplex (GQ) is a four-stranded DNA structure that can be formed in guanine-rich sequences. GQ structures have been proposed to regulate diverse biological processes including transcription, replication, translation and telomere maintenance. Recent studies have demonstrated the existence of GQ DNA in live mammalian cells and a significant number of potential GQ forming sequences in the human genome. We present a systematic and quantitative analysis of GQ folding propensity on a large set of 438 GQ forming sequences in double-stranded DNA by integrating fluorescence measurement, single-molecule imaging and computational modeling. We find that short minimum loop length and the thymine base are two main factors that lead to high GQ folding propensity. Linear and Gaussian process regression models further validate that the GQ folding potential can be predicted with high accuracy based on the loop length distribution and the nucleotide content of the loop sequences. Our study provides important new parameters that can inform the evaluation and classification of putative GQ sequences in the human genome. PMID:27095201
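The two sequence features highlighted in the abstract above, minimum loop length and thymine content of the loops, can be extracted with a short sketch. It assumes the commonly used convention that a putative GQ consists of four runs of at least three guanines separated by three loops; that convention, the function name, and the toy sequence are assumptions, and the study's regression models are not reproduced.

```python
import re


def loop_features(seq):
    """Return the three loops between the first four G-tracts (runs of >= 3 G),
    the shortest loop length, and the fraction of loop bases that are thymine.
    Returns None when fewer than four G-tracts are present."""
    tracts = list(re.finditer(r"G{3,}", seq))
    if len(tracts) < 4:
        return None
    # A loop is the stretch between the end of one G-tract and the start of the next.
    loops = [seq[tracts[k].end():tracts[k + 1].start()] for k in range(3)]
    loop_bases = "".join(loops)
    return {
        "loops": loops,
        "min_loop_len": min(len(loop) for loop in loops),
        "thymine_fraction": loop_bases.count("T") / len(loop_bases),
    }


# Hypothetical toy sequence with loops T, TT and TTA.
features = loop_features("GGGTGGGTTGGGTTAGGG")
print(features["loops"], features["min_loop_len"])  # ['T', 'TT', 'TTA'] 1
```

Features of this kind are the sort of input that a regression model of folding propensity could use, although the exact feature encoding in the study may differ.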
RaptorX server: a resource for template-based protein structure modeling.
Källberg, Morten; Margaryan, Gohar; Wang, Sheng; Ma, Jianzhu; Xu, Jinbo
2014-01-01
Assigning functional properties to a newly discovered protein is a key challenge in modern biology. To this end, computational modeling of the three-dimensional atomic arrangement of the amino acid chain is often crucial in determining the role of the protein in biological processes. We present a community-wide web-based protocol, RaptorX server (http://raptorx.uchicago.edu), for automated protein secondary structure prediction, template-based tertiary structure modeling, and probabilistic alignment sampling. Given a target sequence, RaptorX server is able to detect even remotely related template sequences by means of a novel nonlinear context-specific alignment potential and probabilistic consistency algorithm. Using the protocol presented here it is thus possible to obtain high-quality structural models for many target protein sequences when only distantly related protein domains have experimentally solved structures. At present, RaptorX server can perform secondary and tertiary structure prediction of a 200 amino acid target sequence in approximately 30 min.
Expectation, information processing, and subjective duration.
Simchy-Gross, Rhimmon; Margulis, Elizabeth Hellmuth
2018-01-01
In research on psychological time, it is important to examine the subjective duration of entire stimulus sequences, such as those produced by music (Teki, Frontiers in Neuroscience, 10, 2016). Yet research on the temporal oddball illusion (according to which oddball stimuli seem longer than standard stimuli of the same duration) has examined only the subjective duration of single events contained within sequences, not the subjective duration of sequences themselves. Does the finding that oddballs seem longer than standards translate to entire sequences, such that entire sequences that contain oddballs seem longer than those that do not? Is this potential translation influenced by the mode of information processing, that is, whether people are engaged in direct or indirect temporal processing? Two experiments aimed to answer both questions using different manipulations of information processing. In both experiments, musical sequences either did or did not contain oddballs (auditory sliding tones). To manipulate information processing, we varied the task (Experiment 1), the sequence event structure (Experiments 1 and 2), and the sequence familiarity (Experiment 2) independently within subjects. Overall, in both experiments, the sequences that contained oddballs seemed shorter than those that did not when people were engaged in direct temporal processing, but longer when people were engaged in indirect temporal processing. These findings support the dual-process contingency model of time estimation (Zakay, Attention, Perception & Psychophysics, 54, 656-664, 1993). Theoretical implications for attention-based and memory-based models of time estimation, the pacemaker accumulator and coding efficiency hypotheses of time perception, and dynamic attending theory are discussed.
Occurrence probability of structured motifs in random sequences.
Robin, S; Daudin, J-J; Richard, H; Sagot, M-F; Schbath, S
2002-01-01
The problem of extracting from a set of nucleic acid sequences motifs which may have biological function is more and more important. In this paper, we are interested in particular motifs that may be implicated in the transcription process. These motifs, called structured motifs, are composed of two ordered parts separated by a variable distance and allowing for substitutions. In order to assess their statistical significance, we propose approximations of the probability of occurrences of such a structured motif in a given sequence. An application of our method to evaluate candidate promoters in E. coli and B. subtilis is presented. Simulations show the goodness of the approximations.
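The quantity described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' approximation: it estimates by Monte Carlo the probability that a structured motif (two boxes separated by a variable gap, each allowing a few substitutions) occurs at least once in a random sequence. The function names, the uniform i.i.d. nucleotide model, and the example boxes are assumptions made for illustration only.

```python
import random


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def has_structured_motif(seq, box1, box2, d_min, d_max, max_subs=0):
    """Return True if box1 is followed by box2 after a gap of d_min..d_max
    bases, with each box matching up to max_subs substitutions."""
    n, m1, m2 = len(seq), len(box1), len(box2)
    for i in range(n - m1 + 1):
        if hamming(seq[i:i + m1], box1) > max_subs:
            continue
        for d in range(d_min, d_max + 1):
            j = i + m1 + d  # start of the candidate second box
            if j + m2 > n:
                break
            if hamming(seq[j:j + m2], box2) <= max_subs:
                return True
    return False


def estimate_occurrence_probability(box1, box2, d_min, d_max, seq_len,
                                    max_subs=0, trials=2000, seed=0):
    """Monte Carlo estimate of P(at least one occurrence) under an assumed
    uniform i.i.d. model over A, C, G, T (a simplification)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        seq = "".join(rng.choice("ACGT") for _ in range(seq_len))
        hits += has_structured_motif(seq, box1, box2, d_min, d_max, max_subs)
    return hits / trials
```

A call such as `estimate_occurrence_probability("TTGACA", "TATAAT", 16, 18, 1000, max_subs=1)` returns an empirical frequency against which an analytical approximation could be compared. A more realistic background model (for example a Markov chain fitted to the genome) would replace the uniform sampler.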
Using the Self-Select Paradigm to Delineate the Nature of Speech Motor Programming
Wright, David L.; Robin, Don A.; Rhee, Jooyhun; Vaculin, Amber; Jacks, Adam; Guenther, Frank H.; Fox, Peter T.
2015-01-01
Purpose: The authors examined the involvement of 2 speech motor programming processes identified by S. T. Klapp (1995, 2003) during the articulation of utterances differing in syllable and sequence complexity. According to S. T. Klapp, 1 process, INT, resolves the demands of the programmed unit, whereas a second process, SEQ, oversees the serial order demands of longer sequences. Method: A modified reaction time paradigm was used to assess INT and SEQ demands. Specifically, syllable complexity was dependent on syllable structure, whereas sequence complexity involved either repeated or unique syllables within an utterance. Results: INT execution was slowed when articulating single syllables in the form CCCV compared to simpler CV syllables. Planning unique syllables within a multisyllabic utterance rather than repetitions of the same syllable slowed INT but not SEQ. Conclusions: The INT speech motor programming process, important for mental syllabary access, is sensitive to changes in both syllable structure and the number of unique syllables in an utterance. PMID:19474396
Spatial analysis of extension fracture systems: A process modeling approach
Ferguson, C.C.
1985-01-01
Little consensus exists on how best to analyze natural fracture spacings and their sequences. Field measurements and analyses published in geotechnical literature imply fracture processes radically different from those assumed by theoretical structural geologists. The approach adopted in this paper recognizes that disruption of rock layers by layer-parallel extension results in two spacing distributions, one representing layer-fragment lengths and another separation distances between fragments. These two distributions and their sequences reflect mechanics and history of fracture and separation. Such distributions and sequences, represented by a 2 × n matrix of lengths L, can be analyzed using a method that is history sensitive and which yields also a scalar estimate of bulk extension, e(L). The method is illustrated by a series of Monte Carlo experiments representing a variety of fracture-and-separation processes, each with distinct implications for extension history. Resulting distributions of e(L) are process-specific, suggesting that the inverse problem of deducing fracture-and-separation history from final structure may be tractable. © 1985 Plenum Publishing Corporation.
Meaningful call combinations and compositional processing in the southern pied babbler
Engesser, Sabrina; Ridley, Amanda R.; Townsend, Simon W.
2016-01-01
Language’s expressive power is largely attributable to its compositionality: meaningful words are combined into larger/higher-order structures with derived meaning. Despite its importance, little is known regarding the evolutionary origins and emergence of this syntactic ability. Although previous research has shown a rudimentary capability to combine meaningful calls in primates, because of a scarcity of comparative data, it is unclear to what extent analog forms might also exist outside of primates. Here, we address this ambiguity and provide evidence for rudimentary compositionality in the discrete vocal system of a social passerine, the pied babbler (Turdoides bicolor). Natural observations and predator presentations revealed that babblers produce acoustically distinct alert calls in response to close, low-urgency threats and recruitment calls when recruiting group members during locomotion. On encountering terrestrial predators, both vocalizations are combined into a “mobbing sequence,” potentially to recruit group members in a dangerous situation. To investigate whether babblers process the sequence in a compositional way, we conducted systematic experiments, playing back the individual calls in isolation as well as naturally occurring and artificial sequences. Babblers reacted most strongly to mobbing sequence playbacks, showing a greater attentiveness and a quicker approach to the loudspeaker, compared with individual calls or control sequences. We conclude that the sequence constitutes a compositional structure, communicating information on both the context and the requested action. Our work supports previous research suggesting combinatoriality as a viable mechanism to increase communicative output and indicates that the ability to combine and process meaningful vocal structures, a basic syntax, may be more widespread than previously thought. PMID:27155011
Identification of sequence-structure RNA binding motifs for SELEX-derived aptamers.
Hoinka, Jan; Zotenko, Elena; Friedman, Adam; Sauna, Zuben E; Przytycka, Teresa M
2012-06-15
Systematic Evolution of Ligands by EXponential Enrichment (SELEX) represents a state-of-the-art technology to isolate single-stranded (ribo)nucleic acid fragments, named aptamers, which bind to a molecule (or molecules) of interest via specific structural regions induced by their sequence-dependent fold. This powerful method has applications in designing protein inhibitors, molecular detection systems, therapeutic drugs and antibody replacement among others. However, full understanding and consequently optimal utilization of the process has lagged behind its wide application due to the lack of dedicated computational approaches. At the same time, the combination of SELEX with novel sequencing technologies is beginning to provide the data that will allow the examination of a variety of properties of the selection process. To close this gap we developed Aptamotif, a computational method for the identification of sequence-structure motifs in SELEX-derived aptamers. To increase the chances of identifying functional motifs, Aptamotif uses an ensemble-based approach. We validated the method using two published aptamer datasets containing experimentally determined motifs of increasing complexity. We were able to recreate the authors' findings to a high degree, thus proving the capability of our approach to identify binding motifs in SELEX data. Additionally, using our new experimental dataset, we illustrate the application of Aptamotif to elucidate several properties of the selection process.
Refining the structure and content of clinical genomic reports.
Dorschner, Michael O; Amendola, Laura M; Shirts, Brian H; Kiedrowski, Lesli; Salama, Joseph; Gordon, Adam S; Fullerton, Stephanie M; Tarczy-Hornoch, Peter; Byers, Peter H; Jarvik, Gail P
2014-03-01
To effectively articulate the results of exome and genome sequencing we refined the structure and content of molecular test reports. To communicate results of a randomized control trial aimed at the evaluation of exome sequencing for clinical medicine, we developed a structured narrative report. With feedback from genetics and non-genetics professionals, we developed separate indication-specific and incidental findings reports. Standard test report elements were supplemented with research study-specific language, which highlighted the limitations of exome sequencing and provided detailed, structured results, and interpretations. The report format we developed to communicate research results can easily be transformed for clinical use by removal of research-specific statements and disclaimers. The development of clinical reports for exome sequencing has shown that accurate and open communication between the clinician and laboratory is ideally an ongoing process to address the increasing complexity of molecular genetic testing. © 2014 Wiley Periodicals, Inc.
Computer constructed imagery of distant plasma interaction boundaries
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grenstadt, E.W.; Schurr, H.D.; Tsugawa, R.K.
1982-01-01
Computer constructed sketches of plasma boundaries arising from the interaction between the solar wind and the magnetosphere can serve as both didactic and research tools. In particular, the structure of the earth's bow shock can be represented as a nonuniform surface according to the instantaneous orientation of the IMF, and temporal changes in structural distribution can be modeled as a sequence of sketches based on observed sequences of spacecraft-based measurements. Viewed rapidly, such a sequence of sketches can be the basis for representation of plasma processes by computer animation.
Torque measurements reveal sequence-specific cooperative transitions in supercoiled DNA
Oberstrass, Florian C.; Fernandes, Louis E.; Bryant, Zev
2012-01-01
B-DNA becomes unstable under superhelical stress and is able to adopt a wide range of alternative conformations including strand-separated DNA and Z-DNA. Localized sequence-dependent structural transitions are important for the regulation of biological processes such as DNA replication and transcription. To directly probe the effect of sequence on structural transitions driven by torque, we have measured the torsional response of a panel of DNA sequences using single molecule assays that employ nanosphere rotational probes to achieve high torque resolution. The responses of Z-forming d(pGpC)n sequences match our predictions based on a theoretical treatment of cooperative transitions in helical polymers. “Bubble” templates containing 50–100 bp mismatch regions show cooperative structural transitions similar to B-DNA, although less torque is required to disrupt strand–strand interactions. Our mechanical measurements, including direct characterization of the torsional rigidity of strand-separated DNA, establish a framework for quantitative predictions of the complex torsional response of arbitrary sequences in their biological context. PMID:22474350
Fiebach, Christian J; Schubotz, Ricarda I
2006-05-01
This paper proposes a domain-general model for the functional contribution of ventral premotor cortex (PMv) and adjacent Broca's area to perceptual, cognitive, and motor processing. We propose to understand this frontal region as a highly flexible sequence processor, with the PMv mapping sequential events onto stored structural templates and Broca's Area involved in more complex, hierarchical or hypersequential processing. This proposal is supported by reference to previous functional neuroimaging studies investigating abstract sequence processing and syntactic processing.
DNA nanostructures: Through, rather than across
NASA Astrophysics Data System (ADS)
Bruchez, Marcel P.
2018-02-01
Dye molecules are shown to assemble into J-aggregate arrays by sequence-specific organization in the minor groove of DNA duplex sequences. Energy transfer through these structures displays the hallmarks of coherent coupling over distances that exceed those of conventional dipole-coupling processes.
Sequence co-evolution gives 3D contacts and structures of protein complexes
Hopf, Thomas A; Schärfe, Charlotta P I; Rodrigues, João P G L M; Green, Anna G; Kohlbacher, Oliver; Sander, Chris; Bonvin, Alexandre M J J; Marks, Debora S
2014-01-01
Protein–protein interactions are fundamental to many biological processes. Experimental screens have identified tens of thousands of interactions, and structural biology has provided detailed functional insight for select 3D protein complexes. An alternative rich source of information about protein interactions is the evolutionary sequence record. Building on earlier work, we show that analysis of correlated evolutionary sequence changes across proteins identifies residues that are close in space with sufficient accuracy to determine the three-dimensional structure of the protein complexes. We evaluate prediction performance in blinded tests on 76 complexes of known 3D structure, predict protein–protein contacts in 32 complexes of unknown structure, and demonstrate how evolutionary couplings can be used to distinguish between interacting and non-interacting protein pairs in a large complex. With the current growth of sequences, we expect that the method can be generalized to genome-wide elucidation of protein–protein interaction networks and used for interaction predictions at residue resolution. DOI: http://dx.doi.org/10.7554/eLife.03430.001 PMID:25255213
Automated use of mutagenesis data in structure prediction.
Nanda, Vikas; DeGrado, William F
2005-05-15
In the absence of experimental structural determination, numerous methods are available to indirectly predict or probe the structure of a target molecule. Genetic modification of a protein sequence is a powerful tool for identifying key residues involved in binding reactions or protein stability. Mutagenesis data is usually incorporated into the modeling process either through manual inspection of model compatibility with empirical data, or through the generation of geometric constraints linking sensitive residues to a binding interface. We present an approach derived from statistical studies of lattice models for introducing mutation information directly into the fitness score. The approach takes into account the phenotype of mutation (neutral or disruptive) and calculates the energy for a given structure over an ensemble of sequences. The structure prediction procedure searches for the optimal conformation where neutral sequences either have no impact or improve stability and disruptive sequences reduce stability relative to wild type. We examine three types of sequence ensembles: information from saturation mutagenesis, scanning mutagenesis, and homologous proteins. Incorporating multiple sequences into a statistical ensemble serves to energetically separate the native state and misfolded structures. As a result, the prediction of structure with a poor force field is sufficiently enhanced by mutational information to improve accuracy. Furthermore, by separating misfolded conformations from the target score, the ensemble energy serves to speed up conformational search algorithms such as Monte Carlo-based methods. Copyright 2005 Wiley-Liss, Inc.
Cross-correlation patterns in social opinion formation with sequential data
NASA Astrophysics Data System (ADS)
Chakrabarti, Anindya S.
2016-11-01
Recent research on large-scale internet data suggests the existence of patterns in the collective behavior of billions of people even though each of them may pursue their own activities. In this paper, we interpret online rating activity as a process of forming social opinion about individual items, where people sequentially choose a rating based on the current information set comprising all previous ratings and their own preferences. We construct an opinion index from the sequence of ratings and we show that (1) movie-specific opinion converges much more slowly than an independent and identically distributed (i.i.d.) sequence of ratings, (2) the rating sequence for individual movies shows less variation than an i.i.d. sequence of ratings, (3) the probability density function of the asymptotic opinions has more spread than that defined over opinion arising from an i.i.d. sequence of ratings, (4) opinion sequences across movies are correlated with significantly higher and lower correlation compared to opinion constructed from an i.i.d. sequence of ratings, creating a bimodal cross-correlation structure. By decomposing the temporal correlation structures from panel data of movie ratings, we show that the social effects are very prominent whereas group effects cannot be differentiated from those of surrogate data and individual effects are quite small. The social effects explain a large part of the extreme positive or negative correlations between sequences of opinions. In general, this method can be applied to any rating data to extract social or group-specific effects in correlation structures. We conclude that in this particular case, social effects are important in the opinion formation process.
Molecular dynamics studies on the DNA-binding process of ERG.
Beuerle, Matthias G; Dufton, Neil P; Randi, Anna M; Gould, Ian R
2016-11-15
The ETS family of transcription factors regulate gene targets by binding to a core GGAA DNA-sequence. The ETS factor ERG is required for homeostasis and lineage-specific functions in endothelial cells, some subset of haemopoietic cells and chondrocytes; its ectopic expression is linked to oncogenesis in multiple tissues. To date details of the DNA-binding process of ERG including DNA-sequence recognition outside the core GGAA-sequence are largely unknown. We combined available structural and experimental data to perform molecular dynamics simulations to study the DNA-binding process of ERG. In particular we were able to reproduce the ERG DNA-complex with a DNA-binding simulation starting in an unbound configuration with a final root-mean-square-deviation (RMSD) of 2.1 Å to the core ETS domain DNA-complex crystal structure. This allowed us to elucidate the relevance of amino acids involved in the formation of the ERG DNA-complex and to identify Arg385 as a novel key residue in the DNA-binding process. Moreover we were able to show that water-mediated hydrogen bonds are present between ERG and DNA in our simulations and that those interactions have the potential to achieve sequence recognition outside the GGAA core DNA-sequence. The methodology employed in this study shows the promising capabilities of modern molecular dynamics simulations in the field of protein DNA-interactions.
Applying Agrep to r-NSA to solve multiple sequences approximate matching.
Ni, Bing; Wong, Man-Hon; Lam, Chi-Fai David; Leung, Kwong-Sak
2014-01-01
This paper addresses the approximate matching problem in a database consisting of multiple DNA sequences, where the proposed approach applies Agrep to a new truncated suffix array, r-NSA. The construction time of the structure is linear in the database size, and indexing a substring in the structure takes constant time. The number of characters processed in applying Agrep is analysed theoretically, and the theoretical upper bound closely approximates the empirical number of characters, which is obtained by enumerating the characters in the actual structure built. Experiments are carried out using (synthetic) random DNA sequences, as well as (real) genome sequences including Hepatitis-B Virus and X-chromosome. Experimental results show that, compared to the straightforward approach that applies Agrep to multiple sequences individually, the proposed approach solves the matching problem in much less time. The speed-up of our approach depends on the sequence patterns, and for highly similar homologous genome sequences, which are the common cases in real-life genomes, it can be up to several orders of magnitude.
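For orientation, the baseline that the abstract compares against (approximate matching applied to each sequence individually) can be sketched as follows. This is a minimal illustration that swaps in a plain dynamic-programming search (Sellers' algorithm) for Agrep's bit-parallel method and does not implement r-NSA; the function names and the dictionary-of-sequences layout are assumptions for this example.

```python
def approx_find(pattern, text, k):
    """Return end positions (exclusive) in text where some substring ending
    there is within edit distance k of pattern (Sellers' algorithm)."""
    m = len(pattern)
    # col[i] = best edit distance between pattern[:i] and any substring of
    # the text ending at the current position; before any text is read,
    # pattern[:i] can only be matched by i deletions.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        new = [0] * (m + 1)  # new[0] = 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            new[i] = min(col[i - 1] + cost,  # match or substitution
                         col[i] + 1,         # extra character in the text
                         new[i - 1] + 1)     # pattern character missing
        col = new
        if col[m] <= k:
            ends.append(j)
    return ends


def search_database(pattern, sequences, k):
    """Naive baseline: scan every sequence in the database separately."""
    return {name: approx_find(pattern, seq, k)
            for name, seq in sequences.items()}
```

Each scan costs time proportional to the pattern length times the sequence length, so repeating it over many near-identical genomes redoes much the same work. That redundancy is what a shared index over the whole database is meant to avoid.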
Nomura, Nobuhiko; Nakamura, Kouji
2013-01-01
The Gram-positive anaerobic bacterium Clostridium perfringens is pathogenic to humans and animals, and the production of its toxins is strictly regulated during the exponential phase. We recently found that the 5′ leader sequence of the colA transcript encoding collagenase, which is a major toxin of this organism, is processed and stabilized in the presence of the small RNA VR-RNA. The primary colA 5′-untranslated region (5′UTR) forms a long stem-loop structure containing an internal bulge and masks its own ribosomal binding site. Here we found that VR-RNA directly regulates colA expression through base pairing with colA mRNA in vivo. However, when the internal bulge structure was closed by point mutations in colA mRNA, translation ceased despite the presence of VR-RNA. In addition, a mutation disrupting the colA stem-loop structure induced mRNA processing and ColA-FLAG translational activation in the absence of VR-RNA, indicating that the stem-loop and internal bulge structure of the colA 5′ leader sequence is important for regulation by VR-RNA. On the other hand, processing was required for maximal ColA expression but was not essential for VR-RNA-dependent colA regulation. Finally, colA processing and translational activation were induced at a high temperature without VR-RNA. These results suggest that inhibition of the colA 5′ leader structure through base pairing is the primary role of VR-RNA in colA regulation and that the colA 5′ leader structure is a possible thermosensor. PMID:23585542
PANGEA: pipeline for analysis of next generation amplicons.
Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G; Chauliac, Diane; Mobberley, Jennifer M; Gano, Kelsey A; Mukherjee, Nabanita; Casella, George; Roesch, Luiz F W; Walts, Brandon; Riva, Alberto; King, Gary; Triplett, Eric W
2010-07-01
High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including pre-processing, clustering, database matching and classification, have been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OS X, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the chi-squared step, are joined into one program called the 'backbone'.
Dynamic peptide libraries for the discovery of supramolecular nanomaterials
NASA Astrophysics Data System (ADS)
Pappas, Charalampos G.; Shafi, Ramim; Sasselli, Ivan R.; Siccardi, Henry; Wang, Tong; Narang, Vishal; Abzalimov, Rinat; Wijerathne, Nadeesha; Ulijn, Rein V.
2016-11-01
Sequence-specific polymers, such as oligonucleotides and peptides, can be used as building blocks for functional supramolecular nanomaterials. The design and selection of suitable self-assembling sequences is, however, challenging because of the vast combinatorial space available. Here we report a methodology that allows the peptide sequence space to be searched for self-assembling structures. In this approach, unprotected homo- and heterodipeptides (including aromatic, aliphatic, polar and charged amino acids) are subjected to continuous enzymatic condensation, hydrolysis and sequence exchange to create a dynamic combinatorial peptide library. The free-energy change associated with the assembly process itself gives rise to selective amplification of self-assembling candidates. By changing the environmental conditions during the selection process, different sequences and consequent nanoscale morphologies are selected.
Current challenges in genome annotation through structural biology and bioinformatics.
Furnham, Nicholas; de Beer, Tjaart A P; Thornton, Janet M
2012-10-01
With the huge volume of genomic sequences being generated from high-throughput sequencing projects, the requirement for providing accurate and detailed annotations of gene products has never been greater. It is proving to be a huge challenge for computational biologists to use as much information as possible from experimental data to provide annotations for genome data of unknown function. A central component to this process is to use experimentally determined structures, which provide a means to detect homology that is not discernable from just the sequence and permit the consequences of genomic variation to be realized at the molecular level. In particular, structures also form the basis of many bioinformatics methods for improving the detailed functional annotations of enzymes in combination with similarities in sequence and chemistry. Copyright © 2012. Published by Elsevier Ltd.
The language faculty that wasn't: a usage-based account of natural language recursion
Christiansen, Morten H.; Chater, Nick
2015-01-01
In the generative tradition, the language faculty has been shrinking—perhaps to include only the mechanism of recursion. This paper argues that even this view of the language faculty is too expansive. We first argue that a language faculty is difficult to reconcile with evolutionary considerations. We then focus on recursion as a detailed case study, arguing that our ability to process recursive structure does not rely on recursion as a property of the grammar, but instead emerges gradually by piggybacking on domain-general sequence learning abilities. Evidence from genetics, comparative work on non-human primates, and cognitive neuroscience suggests that humans have evolved complex sequence learning skills, which were subsequently pressed into service to accommodate language. Constraints on sequence learning therefore have played an important role in shaping the cultural evolution of linguistic structure, including our limited abilities for processing recursive structure. Finally, we re-evaluate some of the key considerations that have often been taken to require the postulation of a language faculty. PMID:26379567
Vicario, David S.
2017-01-01
Sensory and motor brain structures work in collaboration during perception. To evaluate their respective contributions, the present study recorded neural responses to auditory stimulation at multiple sites simultaneously in both the higher-order auditory area NCM and the premotor area HVC of the songbird brain in awake zebra finches (Taeniopygia guttata). Bird’s own song (BOS) and various conspecific songs (CON) were presented in both blocked and shuffled sequences. Neural responses showed plasticity in the form of stimulus-specific adaptation, with markedly different dynamics between the two structures. In NCM, the response decrease with repetition of each stimulus was gradual and long-lasting and did not differ between the stimuli or the stimulus presentation sequences. In contrast, HVC responses to CON stimuli decreased much more rapidly in the blocked than in the shuffled sequence. Furthermore, this decrease was more transient in HVC than in NCM, as shown by differential dynamics in the shuffled sequence. Responses to BOS in HVC decreased more gradually than to CON stimuli. The quality of neural representations, computed as the mutual information between stimuli and neural activity, was higher in NCM than in HVC. Conversely, internal functional correlations, estimated as the coherence between recording sites, were greater in HVC than in NCM. The cross-coherence between the two structures was weak and limited to low frequencies. These findings suggest that auditory communication signals are processed according to very different but complementary principles in NCM and HVC, a contrast that may inform study of the auditory and motor pathways for human speech processing. NEW & NOTEWORTHY Neural responses to auditory stimulation in sensory area NCM and premotor area HVC of the songbird forebrain show plasticity in the form of stimulus-specific adaptation with markedly different dynamics. These two structures also differ in stimulus representations and internal functional correlations. Accordingly, NCM seems to process the individually specific complex vocalizations of others based on prior familiarity, while HVC responses appear to be modulated by transitions and/or timing in the ongoing sequence of sounds. PMID:28031398
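The abstract above scores the quality of neural representations as the mutual information between stimuli and neural activity. As a rough illustration only, the following sketch computes a plug-in estimate of that quantity from paired observations. The discretized response labels ("high", "low") and the function name are hypothetical, and the estimator actually used in the study is not stated in the abstract.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of I(S; R) in bits from (stimulus, response) pairs.

    Assumes responses have already been discretized into labels; this is an
    illustrative choice, not the procedure reported in the study.
    """
    n = len(pairs)
    joint = Counter(pairs)                      # counts of each (s, r) pair
    stim = Counter(s for s, _ in pairs)         # marginal counts of stimuli
    resp = Counter(r for _, r in pairs)         # marginal counts of responses
    mi = 0.0
    for (s, r), c in joint.items():
        p_sr = c / n
        # p(s, r) / (p(s) * p(r)) simplifies to c * n / (count_s * count_r)
        mi += p_sr * math.log2(c * n / (stim[s] * resp[r]))
    return mi


# Responses that fully identify the stimulus carry 1 bit for two equiprobable stimuli.
informative = [("BOS", "high")] * 4 + [("CON", "low")] * 4
# Responses that are independent of the stimulus carry 0 bits.
independent = [("BOS", "high"), ("BOS", "low"), ("CON", "high"), ("CON", "low")]
```

With few trials the plug-in estimate is biased upward, so published analyses usually apply a bias correction; that step is omitted here.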
Unraveling the sequence and structure of the protein osteocalcin from a 42 ka fossil horse
NASA Astrophysics Data System (ADS)
Ostrom, Peggy H.; Gandhi, Hasand; Strahler, John R.; Walker, Angela K.; Andrews, Philip C.; Leykam, Joseph; Stafford, Thomas W.; Kelly, Robert L.; Walker, Danny N.; Buckley, Mike; Humpula, James
2006-04-01
We report the first complete amino acid sequence and evidence of secondary structure for osteocalcin from a temperate fossil. The osteocalcin derives from a 42 ka equid bone excavated from Juniper Cave, Wyoming. Results were determined by matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-MS) and Edman sequencing with independent confirmation of the sequence in two laboratories. The ancient sequence was compared to that of three modern taxa: horse (Equus caballus), zebra (Equus grevyi), and donkey (Equus asinus). Although there was no difference in sequence among modern taxa, MALDI-MS and Edman sequencing show that residues 48 and 49 of our modern horse are Thr, Ala rather than Pro, Val as previously reported (Carstanjen B., Wattiez, R., Armory, H., Lepage, O.M., Remy, B., 2002. Isolation and characterization of equine osteocalcin. Ann. Med. Vet. 146(1), 31-38). MALDI-MS and Edman sequencing data indicate that the osteocalcin sequence of the 42 ka fossil is similar to that of modern horse. Previously inaccessible structural attributes for ancient osteocalcin were observed. Glu 39 rather than Gln 39 is consistent with deamidation, a process known to occur during fossilization and aging. Two post-translational modifications were documented: Hyp 9 and a disulfide bridge. The latter suggests at least partial retention of secondary structure. As has been done for ancient DNA research, we recommend standards for preparation and criteria for authenticating results of ancient protein sequencing.
RNA design using simulated SHAPE data.
Lotfi, Mohadeseh; Zare-Mirakabad, Fatemeh; Montaseri, Soheila
2018-05-03
It has long been established that in addition to being involved in protein translation, RNA plays essential roles in numerous other cellular processes, including gene regulation and DNA replication. Such roles are known to be dictated by higher-order structures of RNA molecules. It is therefore of prime importance to find an RNA sequence that can fold to acquire a particular function that is desirable for use in pharmaceuticals and basic research. The challenge of finding an RNA sequence for a given structure is known as the RNA design problem. Although there are several algorithms to solve this problem, they mainly consider hard constraints, such as minimum free energy, to evaluate the predicted sequences. Recently, SHAPE data has emerged as a new soft constraint for RNA secondary structure prediction. To take advantage of this new experimental constraint, we report here a new method for accurate design of RNA sequences based on their secondary structures using SHAPE data as pseudo-free energy. We then compare our algorithm with four others: INFO-RNA, ERD, MODENA and RNAiFold 2.0. Our algorithm precisely predicts 26 out of 29 new sequences for the structures extracted from the Rfam dataset, while the other four algorithms predict no more than 22 out of 29. The proposed algorithm is comparable to the above algorithms on RNA-SSD datasets, where they can predict up to 33 appropriate sequences for RNA secondary structures out of 34.
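To make "SHAPE data as pseudo-free energy" concrete, the sketch below uses one widely cited conversion, the log-linear form of Deigan et al. (2009), in which a per-nucleotide reactivity becomes an energy term added when that nucleotide is paired. Whether the method in this abstract uses this exact form or these parameter values is an assumption; the function name is hypothetical.

```python
import math


def shape_pseudo_energy(reactivity, m=2.6, b=-0.8):
    """Pseudo-free energy (kcal/mol) for pairing a nucleotide with this SHAPE reactivity.

    Uses dG = m * ln(reactivity + 1) + b, with the commonly quoted defaults
    m = 2.6 and b = -0.8 kcal/mol. Missing data (None or negative values)
    contributes nothing. Assumed form; not confirmed by the abstract.
    """
    if reactivity is None or reactivity < 0:
        return 0.0
    return m * math.log(reactivity + 1.0) + b


# Low reactivity rewards pairing (negative term); high reactivity penalizes it.
low = shape_pseudo_energy(0.0)
high = shape_pseudo_energy(2.0)
```

Because the term is a soft constraint, a highly reactive nucleotide can still be paired if the rest of the structure is stable enough to absorb the penalty.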
Pearston, Douglas H.; Gordon, Mairi; Hardman, Norman
1985-01-01
A family of long, highly-repetitive sequences, referred to previously as 'HpaII-repeats', dominates the genome of the eukaryotic slime mould Physarum polycephalum. These sequences are found exclusively in scrambled clusters. They account for about one-half of the total complement of repetitive DNA in Physarum, and represent the major sequence component found in hypermethylated, 20-50 kb segments of Physarum genomic DNA that fail to be cleaved using the restriction endonuclease HpaII. The structure of this abundant repetitive element was investigated by analysing cloned segments derived from the hypermethylated genomic DNA compartment. We show that the 'HpaII-repeat' forms part of a larger repetitive DNA structure, ∼8.6 kb in length, with several structural features in common with recognised eukaryotic transposable genetic elements. Scrambled clusters of the sequence probably arise as a result of transposition-like events, during which the element preferentially recombines in either orientation with target sites located in other copies of the same repeated sequence. The target sites for transposition/recombination are not related in sequence but in all cases studied they are potentially capable of promoting the formation of small 'cruciforms' or 'Z-DNA' structures which might be recognised during the recombination process. PMID:16453652
Supplementary motor area as key structure for domain-general sequence processing: A unified account.
Cona, Giorgia; Semenza, Carlo
2017-01-01
The Supplementary Motor Area (SMA) is considered as an anatomically and functionally heterogeneous region and is implicated in several functions. We propose that SMA plays a crucial role in domain-general sequence processes, contributing to the integration of sequential elements into higher-order representations regardless of the nature of such elements (e.g., motor, temporal, spatial, numerical, linguistic, etc.). This review emphasizes the domain-general involvement of the SMA, as this region has been found to support sequence operations in a variety of cognitive domains that, albeit different, share an inherent sequence processing. These include action, time and spatial processing, numerical cognition, music and language processing, and working memory. In this light, we reviewed and synthesized recent neuroimaging, stimulation and electrophysiological studies in order to compare and reconcile the distinct sources of data by proposing a unifying account for the role of the SMA. We also discussed the differential contribution of the pre-SMA and SMA-proper in sequence operations, and possible neural mechanisms by which such operations are executed. Copyright © 2016 Elsevier Ltd. All rights reserved.
A Tentative Application Of Morphological Filters To Time-Varying Images
NASA Astrophysics Data System (ADS)
Billard, D.; Poquillon, B.
1989-03-01
In this paper, morphological filters, which are commonly used to process either 2D or multidimensional static images, are generalized to the analysis of time-varying image sequences. The introduction of the time dimension then induces interesting properties when designing such spatio-temporal morphological filters. In particular, the specification of spatio-temporal structuring elements (equivalent to time-varying spatial structuring elements) can be adjusted according to the temporal variations of the image sequences to be processed: this makes it possible to derive specific morphological transforms to perform noise filtering or moving-object discrimination on dynamic images viewed by a non-stationary sensor. First, a brief introduction to the basic principles underlying morphological filters will be given. Then, a straightforward generalization of these principles to time-varying images will be proposed. This will lead us to define spatio-temporal opening and closing and to introduce some of their possible applications to process dynamic images. Finally, preliminary results obtained using a natural forward looking infrared (FLIR) image sequence are presented.
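The spatio-temporal opening and closing mentioned above can be sketched with a flat structuring element expressed as (dt, dy, dx) offsets. This is a minimal illustration under stated assumptions, not the authors' implementation: the nested-list layout seq[t][y][x], the function names, and the choice to ignore out-of-range neighbours at the borders are all hypothetical.

```python
def _filter(seq, offsets, reduce_fn):
    """Apply min or max over the neighbourhood defined by (dt, dy, dx) offsets.

    Offsets must include (0, 0, 0) so every neighbourhood is non-empty.
    Neighbours that fall outside the sequence are ignored.
    """
    T, H, W = len(seq), len(seq[0]), len(seq[0][0])
    out = [[[0] * W for _ in range(H)] for _ in range(T)]
    for t in range(T):
        for y in range(H):
            for x in range(W):
                vals = [
                    seq[t + dt][y + dy][x + dx]
                    for dt, dy, dx in offsets
                    if 0 <= t + dt < T and 0 <= y + dy < H and 0 <= x + dx < W
                ]
                out[t][y][x] = reduce_fn(vals)
    return out


def erode(seq, offsets):
    return _filter(seq, offsets, min)


def dilate(seq, offsets):
    # Dilation uses the reflected structuring element.
    return _filter(seq, [(-dt, -dy, -dx) for dt, dy, dx in offsets], max)


def opening(seq, offsets):
    return dilate(erode(seq, offsets), offsets)


def closing(seq, offsets):
    return erode(dilate(seq, offsets), offsets)


# Purely temporal element spanning three consecutive frames.
temporal = [(-1, 0, 0), (0, 0, 0), (1, 0, 0)]
# Five frames of a 1x2 image: pixel 0 flashes for one frame, pixel 1 stays bright for three.
frames = [[[0, 0]], [[0, 9]], [[9, 9]], [[0, 9]], [[0, 0]]]
opened = opening(frames, temporal)
```

In this toy sequence the temporal opening removes the one-frame flash while keeping the pixel that stays bright for three frames, which is the noise-filtering behaviour the abstract alludes to.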
The Influence of Task-Irrelevant Music on Language Processing: Syntactic and Semantic Structures
Hoch, Lisianne; Poulin-Charronnat, Benedicte; Tillmann, Barbara
2011-01-01
Recent research has suggested that music and language processing share neural resources, leading to new hypotheses about interference in the simultaneous processing of these two structures. The present study investigated the effect of a musical chord's tonal function on syntactic processing (Experiment 1) and semantic processing (Experiment 2) using a cross-modal paradigm and controlling for acoustic differences. Participants read sentences and performed a lexical decision task on the last word, which was, syntactically or semantically, expected or unexpected. The simultaneously presented (task-irrelevant) musical sequences ended on either an expected tonic or a less-expected subdominant chord. Experiment 1 revealed interactive effects between music-syntactic and linguistic-syntactic processing. Experiment 2 showed only main effects of both music-syntactic and linguistic-semantic expectations. An additional analysis over the two experiments revealed that linguistic violations interacted with musical violations, though not differently as a function of the type of linguistic violations. The present findings were discussed in light of currently available data on the processing of music as well as of syntax and semantics in language, leading to the hypothesis that resources might be shared for structural integration processes and sequencing. PMID:21713122
Tsuchiya, Mariko; Amano, Kojiro; Abe, Masaya; Seki, Misato; Hase, Sumitaka; Sato, Kengo; Sakakibara, Yasubumi
2016-06-15
Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After obtaining sequence reads, the short reads are mapped to a reference genome, and specific mapping patterns can be detected called read mapping profiles, which are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs. We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. Using a benchmark simulated dataset, SHARAKU exhibited superior performance to previous methods for correctly clustering the read mapping profiles with respect to 5'-end processing and 3'-end processing from degradation patterns and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental data of small RNA sequencing for the common marmoset brain, SHARAKU succeeded in identifying the significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain. The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work is available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA004502. 
Contact: yasu@bio.keio.ac.jp. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Fagot, Joël; De Lillo, Carlo
2011-12-01
Two experiments assessed whether non-human primates can be meaningfully compared to humans in a non-verbal test of serial recall. A procedure was used that was derived from variations of the Corsi test, designed to test the effects of sequence structure and movement path length in humans. Two baboons were tested in Experiment 1. The monkeys showed several attributes of human serial recall. These included easier recall of sequences with fewer items and of sequences characterized by a shorter path length when the number of items was kept constant. However, the accuracy and speed of processing did not indicate that the monkeys were able to benefit from the spatiotemporal structure of sequences. Humans tested in Experiment 2 showed a quantitatively longer memory span, and, in contrast with monkeys, benefitted from sequence structure. The results are discussed in relation to differences in how human and non-human primates segment complex visual patterns. Copyright © 2011 Elsevier Ltd. All rights reserved.
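Movement path length, one of the variables manipulated above, can be read as the total distance travelled between successive target locations. The sketch below computes it for a sequence of (x, y) positions; the coordinates and the function name are hypothetical, and the abstract does not specify how the study measured path length.

```python
import math


def path_length(locations):
    """Sum of Euclidean distances between consecutive (x, y) target locations."""
    return sum(math.dist(a, b) for a, b in zip(locations, locations[1:]))


# Two three-item sequences with the same number of items but different path lengths.
short_path = [(0, 0), (1, 0), (2, 0)]
long_path = [(0, 0), (3, 4), (3, 0)]
```

Holding the number of items fixed while varying this quantity is what lets path length be separated from sequence length as a source of recall difficulty.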
Extraordinary Structured Noncoding RNAs Revealed by Bacterial Metagenome Analysis
Weinberg, Zasha; Perreault, Jonathan; Meyer, Michelle M.; Breaker, Ronald R.
2012-01-01
Estimates of the total number of bacterial species [1-3] suggest that existing DNA sequence databases carry only a tiny fraction of the total amount of DNA sequence space represented by this division of life. Indeed, environmental DNA samples have been shown to encode many previously unknown classes of proteins [4] and RNAs [5]. Bioinformatics searches [6-10] of genomic DNA from bacteria commonly identify novel noncoding RNAs (ncRNAs) [10-12] such as riboswitches [13,14]. In rare instances, RNAs that exhibit more extensive sequence and structural conservation across a wide range of bacteria are encountered [15,16]. Given that large structured RNAs are known to carry out complex biochemical functions such as protein synthesis and RNA processing reactions, identifying more RNAs of great size and intricate structure is likely to reveal additional biochemical functions that can be achieved by RNA. We applied an updated computational pipeline [17] to discover ncRNAs that rival the known large ribozymes in size and structural complexity or that are among the most abundant RNAs in bacteria that encode them. These RNAs would have been difficult or impossible to detect without examining environmental DNA sequences, suggesting that numerous RNAs with extraordinary size, structural complexity, or other exceptional characteristics remain to be discovered in unexplored sequence space. PMID:19956260
Brain processing of meter and rhythm in music. Electrophysiological evidence of a common network.
Kuck, Helen; Grossbach, Michael; Bangert, Marc; Altenmüller, Eckart
2003-11-01
To determine cortical structures involved in "global" meter and "local" rhythm processing, slow brain potentials (DC potentials) were recorded from the scalp of 18 musically trained subjects while listening to pairs of monophonic sequences with both metric structure and rhythmic variations. The second sequence could be either identical to or different from the first one. Differences were either of a metric or a rhythmic nature. The subjects' task was to judge whether the sequences were identical or not. During processing of the auditory tasks, brain activation patterns along with the subjects' performance were assessed using 32-channel DC electroencephalography. Data were statistically analyzed using MANOVA. Processing of both meter and rhythm produced sustained cortical activation over bilateral frontal and temporal brain regions. A shift towards right hemispheric activation was pronounced during presentation of the second stimulus. Processing of rhythmic differences yielded a more centroparietal activation compared to metric processing. These results do not support Lerdahl and Jackendoff's two-component model, predicting a dissociation of left hemispheric rhythm and right hemispheric meter processing. We suggest that the uniform right temporofrontal predominance reflects auditory working memory and a pattern recognition module, which participates in both rhythm and meter processing. More pronounced parietal activation during rhythm processing may be related to switching of task-solving strategies towards mental imagination of the score.
Sequencing proteins with transverse ionic transport in nanochannels.
Boynton, Paul; Di Ventra, Massimiliano
2016-05-03
De novo protein sequencing is essential for understanding cellular processes that govern the function of living organisms and all sequence modifications that occur after a protein has been constructed from its corresponding DNA code. By obtaining the order of the amino acids that compose a given protein one can then determine both its secondary and tertiary structures through structure prediction, which is used to create models for protein aggregation diseases such as Alzheimer's Disease. Here, we propose a new technique for de novo protein sequencing that involves translocating a polypeptide through a synthetic nanochannel and measuring the ionic current of each amino acid through an intersecting perpendicular nanochannel. We find that the distribution of ionic currents for each of the 20 proteinogenic amino acids encoded by eukaryotic genes is statistically distinct, showing this technique's potential for de novo protein sequencing.
Random sequences generation through optical measurements by phase-shifting interferometry
NASA Astrophysics Data System (ADS)
François, M.; Grosges, T.; Barchiesi, D.; Erra, R.; Cornet, A.
2012-04-01
The development of new techniques for producing random sequences with a high level of security is a challenging topic of research in modern cryptography. The proposed method is based on the measurement by phase-shifting interferometry of the speckle signals of the interaction between light and structures. We show how the combination of amplitude and phase distributions (maps) under a numerical process can produce random sequences. The produced sequences satisfy all the statistical requirements of randomness and can be used in cryptographic schemes.
Present Day Biology seen in the Looking Glass of Physics of Complexity
NASA Astrophysics Data System (ADS)
Schuster, P.
Darwin's theory of variation and selection in its simplest form is directly applicable to RNA evolution in vitro as well as to virus evolution, and it allows for quantitative predictions. Understanding evolution at the molecular level is ultimately related to the central paradigm of structural biology: sequence⇒ structure ⇒ function. We elaborate on the state of the art in modeling and understanding evolution of RNA driven by reproduction and mutation. The focus will be laid on the landscape concept—originally introduced by Sewall Wright—and its application to problems in biology. The relation between genotypes and phenotypes is the result of two consecutive mappings from a space of genotypes called sequence space onto a space of phenotypes or structures, and fitness is the result of a mapping from phenotype space into non-negative real numbers. Realistic landscapes as derived from folding of RNA sequences into structures are characterized by two properties: (i) they are rugged in the sense that sequences lying nearby in sequence space may have very different fitness values and (ii) they are characterized by an appreciable degree of neutrality implying that a certain fraction of genotypes and/or phenotypes cannot be distinguished in the selection process. Evolutionary dynamics on realistic landscapes will be studied as a function of the mutation rate, and the role of neutrality in the selection process will be discussed.
Bacterial Inclusion Bodies Contain Amyloid-Like Structure
Wang, Lei; Maji, Samir K; Sawaya, Michael R; Eisenberg, David; Riek, Roland
2008-01-01
Protein aggregation is a process in which identical proteins self-associate into imperfectly ordered macroscopic entities. Such aggregates are generally classified as amorphous, lacking any long-range order, or highly ordered fibrils. Protein fibrils can be composed of native globular molecules, such as the hemoglobin molecules in sickle-cell fibrils, or can be reorganized β-sheet–rich aggregates, termed amyloid-like fibrils. Amyloid fibrils are associated with several pathological conditions in humans, including Alzheimer disease and diabetes type II. We studied the structure of bacterial inclusion bodies, which have been believed to belong to the amorphous class of aggregates. We demonstrate that all three in vivo-derived inclusion bodies studied are amyloid-like and comprised of amino-acid sequence-specific cross-β structure. These findings suggest that inclusion bodies are structured, that amyloid formation is an omnipresent process both in eukaryotes and prokaryotes, and that amino acid sequences evolve to avoid the amyloid conformation. PMID:18684013
Primary and secondary structural analyses of glutathione S-transferase pi from human placenta.
Ahmad, H; Wilson, D E; Fritz, R R; Singh, S V; Medh, R D; Nagle, G T; Awasthi, Y C; Kurosky, A
1990-05-01
The primary structure of glutathione S-transferase (GST) pi from a single human placenta was determined. The structure was established by chemical characterization of tryptic and cyanogen bromide peptides as well as automated sequence analysis of the intact enzyme. The structural analysis indicated that the protein is comprised of 209 amino acid residues and gave no evidence of post-translational modifications. The amino acid sequence differed from that of the deduced amino acid sequence determined by nucleotide sequence analysis of a cDNA clone (Kano, T., Sakai, M., and Muramatsu, M., 1987, Cancer Res. 47, 5626-5630) at position 104 which contained both valine and isoleucine whereas the deduced sequence from nucleotide sequence analysis identified only isoleucine at this position. These results demonstrated that in the one individual placenta studied at least two GST pi genes are coexpressed, probably as a result of allelomorphism. Computer assisted consensus sequence evaluation identified a hydrophobic region in GST pi (residues 155-181) that was predicted to be either a buried transmembrane helical region or a signal sequence region. The significance of this hydrophobic region was interpreted in relation to the mode of action of the enzyme especially in regard to the potential involvement of a histidine in the active site mechanism. A comparison of the chemical similarity of five known human GST complete enzyme structures, one of pi, one of mu, two of alpha, and one microsomal, gave evidence that all five enzymes have evolved by a divergent evolutionary process after gene duplication, with the microsomal enzyme representing the most divergent form.
Jmol-Enhanced Biochemistry Research Projects
ERIC Educational Resources Information Center
Saderholm, Matthew; Reynolds, Anthony
2011-01-01
We developed a protein research project for a one-semester biochemistry lecture class to enhance learning and more effectively train students to understand protein structure and function. During this semester-long process, students select a protein with known structure and then research its structure, sequence, and function. This project…
Crystal Structure of the HEAT Domain from the Pre-mRNA Processing Factor Symplekin
Kennedy, Sarah A.; Frazier, Monica L.; Steiniger, Mindy; Mast, Ann M.; Marzluff, William F.; Redinbo, Matthew R.
2009-01-01
The majority of eukaryotic pre-mRNAs are processed by 3′-end cleavage and polyadenylation, although in metazoa the replication-dependent histone mRNAs are processed by 3′-end cleavage but not polyadenylation. The macromolecular complex responsible for processing both canonical and histone pre-mRNAs contains the ~1,160-residue protein Symplekin. Secondary structural prediction algorithms identified putative HEAT domains in the 300 N-terminal residues of all Symplekins of known sequence. The structure and dynamics of this domain were investigated to begin elucidating the role Symplekin plays in mRNA maturation. The crystal structure of the Drosophila melanogaster Symplekin HEAT domain was determined to 2.4 Å resolution using SAD phasing methods. The structure exhibits 5 canonical HEAT repeats along with an extended 31 amino acid loop (loop 8) between the fourth and fifth repeat that is conserved within closely related Symplekin sequences. Molecular dynamics simulations of this domain show that the presence of loop 8 dampens correlated and anticorrelated motion in the HEAT domain, therefore providing a neutral surface for potential protein-protein interactions. HEAT domains are often employed for such macromolecular contacts. The Symplekin HEAT region not only structurally aligns with several established scaffolding proteins, but also has been reported to contact proteins essential for regulating 3′-end processing. Taken together, these data support the conclusion that the Symplekin HEAT domain serves as a scaffold for protein-protein interactions essential to the mRNA maturation process. PMID:19576221
A semi-supervised learning approach for RNA secondary structure prediction.
Yonemoto, Haruka; Asai, Kiyoshi; Hamada, Michiaki
2015-08-01
RNA secondary structure prediction is a key technology in RNA bioinformatics. Most algorithms for RNA secondary structure prediction use probabilistic models, in which the model parameters are trained with reliable RNA secondary structures. Because of the difficulty of determining RNA secondary structures by experimental procedures, such as NMR or X-ray crystal structural analyses, there are still many RNA sequences that could be useful for training whose secondary structures have not been experimentally determined. In this paper, we introduce a novel semi-supervised learning approach for training parameters in a probabilistic model of RNA secondary structures in which we employ not only RNA sequences with annotated secondary structures but also ones with unknown secondary structures. Our model is based on a hybrid of generative (stochastic context-free grammars) and discriminative models (conditional random fields) that has been successfully applied to natural language processing. Computational experiments indicate that the accuracy of secondary structure prediction is improved by incorporating RNA sequences with unknown secondary structures into training. To our knowledge, this is the first study of a semi-supervised learning approach for RNA secondary structure prediction. This technique will be useful when the number of reliable structures is limited. Copyright © 2015 Elsevier Ltd. All rights reserved.
Otsuka, Sachio; Saiki, Jun
2016-02-01
Prior studies have shown that visual statistical learning (VSL) enhances familiarity (a type of memory) of sequences. How do statistical regularities influence the processing of each triplet element and inserted distractors that disrupt the regularity? Given the increased attention to triplets induced by VSL and the inhibition of unattended triplets, we predicted that VSL would promote memory for each triplet constituent, and degrade memory for inserted stimuli. Across the first two experiments, we found that objects from structured sequences were more likely to be remembered than objects from random sequences, and that letters (Experiment 1) or objects (Experiment 2) inserted into structured sequences were less likely to be remembered than those inserted into random sequences. In the subsequent two experiments, we examined an alternative account for our results, whereby the difference in memory for inserted items between structured and random conditions is due to individuation of items within random sequences. Our findings replicated even when control letters (Experiment 3A) or objects (Experiment 3B) were presented before or after, rather than inserted into, random sequences. Our findings suggest that statistical learning enhances memory for each item in a regular set and impairs memory for items that disrupt the regularity. Copyright © 2015 Elsevier B.V. All rights reserved.
Exploring the Sequence-based Prediction of Folding Initiation Sites in Proteins.
Raimondi, Daniele; Orlando, Gabriele; Pancsa, Rita; Khan, Taushif; Vranken, Wim F
2017-08-18
Protein folding is a complex process that can lead to disease when it fails. Especially poorly understood are the very early stages of protein folding, which are likely defined by intrinsic local interactions between amino acids close to each other in the protein sequence. We here present EFoldMine, a method that predicts, from the primary amino acid sequence of a protein, which amino acids are likely involved in early folding events. The method is based on early folding data from hydrogen deuterium exchange (HDX) data from NMR pulsed labelling experiments, and uses backbone and sidechain dynamics as well as secondary structure propensities as features. The EFoldMine predictions give insights into the folding process, as illustrated by a qualitative comparison with independent experimental observations. Furthermore, on a quantitative proteome scale, the predicted early folding residues tend to become the residues that interact the most in the folded structure, and they are often residues that display evolutionary covariation. The connection of the EFoldMine predictions with both folding pathway data and the folded protein structure suggests that the initial statistical behavior of the protein chain with respect to local structure formation has a lasting effect on its subsequent states.
Interactive computer graphics system for structural sizing and analysis of aircraft structures
NASA Technical Reports Server (NTRS)
Bendavid, D.; Pipano, A.; Raibstein, A.; Somekh, E.
1975-01-01
A computerized system for preliminary sizing and analysis of aircraft wing and fuselage structures was described. The system is based upon repeated application of analytical program modules, which are interactively interfaced and sequence-controlled during the iterative design process with the aid of design-oriented graphics software modules. The entire process is initiated and controlled via low-cost interactive graphics terminals driven by a remote computer in a time-sharing mode.
Bedbrook, Claire N; Yang, Kevin K; Rice, Austin J; Gradinaru, Viviana; Arnold, Frances H
2017-10-01
There is growing interest in studying and engineering integral membrane proteins (MPs) that play key roles in sensing and regulating cellular response to diverse external signals. An MP must be expressed, correctly inserted and folded in a lipid bilayer, and trafficked to the proper cellular location in order to function. The sequence and structural determinants of these processes are complex and highly constrained. Here we describe a predictive, machine-learning approach that captures this complexity to facilitate successful MP engineering and design. Machine learning on carefully chosen training sequences made by structure-guided SCHEMA recombination has enabled us to accurately predict the rare sequences in a diverse library of channelrhodopsins (ChRs) that express and localize to the plasma membrane of mammalian cells. These light-gated channel proteins of microbial origin are of interest for neuroscience applications, where expression and localization to the plasma membrane are prerequisites for function. We trained Gaussian process (GP) classification and regression models with expression and localization data from 218 ChR chimeras chosen from a 118,098-variant library designed by SCHEMA recombination of three parent ChRs. We use these GP models to identify ChRs that express and localize well and show that our models can elucidate sequence and structure elements important for these processes. We also used the predictive models to convert a naturally occurring ChR incapable of mammalian localization into one that localizes well.
PMID:29059183
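The library size quoted in the abstract above is consistent with a simple combinatorial count. The sketch below assumes, without support from the abstract itself, that each chimera is assembled from ten sequence blocks, that each block is inherited from one of the three parents, and that two block designs were pooled. The block count, the number of designs, and the function names are illustrative assumptions only.

```python
from itertools import product


def chimera_count(n_parents: int, n_blocks: int, n_designs: int = 1) -> int:
    """Number of chimeras when every block can come from any parent."""
    return n_designs * n_parents ** n_blocks


def enumerate_chimeras(n_parents: int, n_blocks: int):
    """Yield each chimera as a tuple of parent indices, one per block."""
    return product(range(n_parents), repeat=n_blocks)


if __name__ == "__main__":
    # Assumed layout: 3 parents, 10 blocks, 2 pooled designs.
    print(chimera_count(3, 10, 2))  # 118098
    # First few chimeras of a small 3-parent, 4-block example.
    for chimera in list(enumerate_chimeras(3, 4))[:3]:
        print(chimera)
```

Under these assumptions, the 218 characterized chimeras cover roughly 0.2% of the library, which is why a predictive model is needed to rank the untested remainder.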
Tracking prominent points in image sequences
NASA Astrophysics Data System (ADS)
Hahn, Michael
1994-03-01
Measuring image motion and inferring scene geometry and camera motion are main aspects of image sequence analysis. The determination of image motion and the structure-from-motion problem are tasks that can be addressed independently or in cooperative processes. In this paper we focus on tracking prominent points. High stability, reliability, and accuracy are criteria for the extraction of prominent points. This implies that tracking should work quite well with those features; unfortunately, the reality looks quite different. In the experimental investigations we processed a long sequence of 128 images. This mono sequence is taken in an outdoor environment at the experimental field of Mercedes Benz in Rastatt. Different tracking schemes are explored and the results with respect to stability and quality are reported.
Ran, Kun; Yang, Hongqiang; Sun, Xiaoli; Li, Qiang; Jiang, Qianqian; Zhang, Weiwei; Shen, Wei
2014-05-01
Vacuolar processing enzymes (VPEs) have received considerable attention recently, as they exhibit caspase-1-like cleavage activity and regulate programmed cell death (PCD). However, knowledge about their detailed characteristics and structures is relatively limited. In this study, a gamma vacuolar processing enzyme gene, MhVPEγ, was isolated from the leaves of Malus hupehensis (Ramp) Rehd. var pinyiensis Jiang. The protein sequence encoded by MhVPEγ comprises 494 amino acids, with a signal peptide and a transmembrane helix at the N-terminus, a peptidase_C13 domain, and a vacuolar sorting signal at the C-terminus. A genome walking approach was then used to isolate its upstream sequence. Computational analysis identified several promoter motifs putatively responsive to MeJA, ABA, and light, as well as typical promoter elements such as the TATA-box and CAAT-box. The MhVPEγ transcript level was enhanced during wounding treatment, and the WUN-motif, one of the cis-acting regulatory elements in the upstream sequence, may regulate its expression. In silico-constructed 3D models revealed that MhCPYL interacts with MhVPEγ in the manner of the "Induced Fit-Lock and Key" model, providing molecular conformation evidence that CPY is a direct substrate of VPEγ. This study is a first step toward understanding the molecular mechanism of VPEγ and CPYL interactions.
PANGEA: pipeline for analysis of next generation amplicons
Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G; Chauliac, Diane; Mobberley, Jennifer M; Gano, Kelsey A; Mukherjee, Nabanita; Casella, George; Roesch, Luiz FW; Walts, Brandon; Riva, Alberto; King, Gary; Triplett, Eric W
2010-01-01
High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including preprocessing, clustering, database matching and classification, have been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OS X, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the χ² step, are joined into one program called the ‘backbone’. PMID:20182525
Folding thermodynamics of pseudoknotted chain conformations
Kopeikin, Zoia; Chen, Shi-Jie
2008-01-01
We develop a statistical mechanical framework for the folding thermodynamics of pseudoknotted structures. As applications of the theory, we investigate the folding stability and the free energy landscapes for both the thermal and the mechanical unfolding of pseudoknotted chains. For the mechanical unfolding process, we predict the force-extension curves, from which we can obtain the information about structural transitions in the unfolding process. In general, a pseudoknotted structure unfolds through multiple structural transitions. The interplay between the helix stems and the loops plays an important role in the folding stability of pseudoknots. For instance, variations in loop sizes can lead to the destabilization of some intermediate states and change the (equilibrium) folding pathways (e.g., two helix stems unfold either cooperatively or sequentially). In both thermal and mechanical unfolding, depending on the nucleotide sequence, misfolded intermediate states can emerge in the folding process. In addition, thermal and mechanical unfoldings often have different (equilibrium) pathways. For example, for certain sequences, the misfolded intermediates, which generally have longer tails, can fold, unfold, and refold again in the pulling process, which means that these intermediates can switch between two different average end-end extensions. PMID:16674261
2007-03-01
the P5abc subdomain of the Tetrahymena thermophila ribozyme that was studied by Wu and Tinoco [24]. The results for the second sequence are found in...virus ribozyme that was studied by Lazinski et al. [25], for its regulation of self-cleavage activity. The results for the third sequence are found...mention the existence of eight possible mutations that provide the desired non-linear effect in the ribozyme structure, and this may explain the
An improved stochastic fractal search algorithm for 3D protein structure prediction.
Zhou, Changjun; Sun, Chuan; Wang, Bin; Wang, Xiaojun
2018-05-03
Protein structure prediction (PSP) is a significant area for biological information research, disease treatment, and drug development. In this paper, three-dimensional structures of proteins are predicted from known amino acid sequences, and the structure prediction problem is transformed into a typical NP problem by an AB off-lattice model. This work applies a novel improved Stochastic Fractal Search algorithm (ISFS) to solve the problem. The Stochastic Fractal Search algorithm (SFS) is an effective evolutionary algorithm that performs well in exploring the search space but sometimes falls into local minima. To avoid this weakness, Lévy flight and internal feedback information are introduced in ISFS. In the experiments, simulations are conducted with the ISFS algorithm on Fibonacci sequences and real peptide sequences. Experimental results show that ISFS performs more efficiently and robustly in finding the global minimum and avoiding local minima.
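To make the objective in the abstract above concrete, the following sketch generates Fibonacci test sequences and evaluates one commonly used Stillinger-style form of the AB off-lattice energy (a bending term plus a species-dependent Lennard-Jones term). The abstract does not state which energy variant the authors used, and some 3D variants add a torsion term, so the exact formula, the recursion convention S0 = "A", S1 = "B", and all function names here are assumptions for illustration. The search algorithm itself (SFS or ISFS) is not reproduced.

```python
import math


def fibonacci_sequence(k: int) -> str:
    """Return S_k with S_0 = 'A', S_1 = 'B', S_(i+1) = S_(i-1) + S_i."""
    prev, curr = "A", "B"
    if k == 0:
        return prev
    for _ in range(k - 1):
        prev, curr = curr, prev + curr
    return curr


def coupling(a: str, b: str) -> float:
    """Species-dependent coefficient: AA -> 1, BB -> 0.5, AB -> -0.5."""
    xa = 1 if a == "A" else -1
    xb = 1 if b == "A" else -1
    return (1 + xa + xb + 5 * xa * xb) / 8


def ab_energy(seq: str, coords) -> float:
    """Bending plus Lennard-Jones energy of a chain of 3D points."""
    n = len(seq)
    # Bond vectors between consecutive residues.
    bonds = [tuple(q - p for p, q in zip(coords[i], coords[i + 1]))
             for i in range(n - 1)]
    bend = 0.0
    for u, v in zip(bonds, bonds[1:]):
        dot = sum(a * b for a, b in zip(u, v))
        cos_t = dot / (math.hypot(*u) * math.hypot(*v))
        bend += 0.25 * (1.0 - cos_t)  # zero for a straight chain
    lj = 0.0
    for i in range(n - 2):
        for j in range(i + 2, n):  # non-adjacent pairs only
            r = math.dist(coords[i], coords[j])
            lj += 4.0 * (r ** -12 - coupling(seq[i], seq[j]) * r ** -6)
    return bend + lj


if __name__ == "__main__":
    print(fibonacci_sequence(6))  # ABBABBABABBAB
    straight = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    print(ab_energy("AAA", straight))
```

An optimizer would treat the bond and torsion angles of the chain as the search variables and call a function like `ab_energy` as its fitness.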
Cohn, Neil; Kutas, Marta
2015-01-01
Inference has long been emphasized in the comprehension of verbal and visual narratives. Here, we measured event-related brain potentials to visual sequences designed to elicit inferential processing. In Impoverished sequences, an expressionless “onlooker” watches an undepicted event (e.g., person throws a ball for a dog, then watches the dog chase it) just prior to a surprising finale (e.g., someone else returns the ball), which should lead to an inference (i.e., the different person retrieved the ball). Implied sequences alter this narrative structure by adding visual cues to the critical panel such as a surprised facial expression to the onlooker implying they saw an unexpected, albeit undepicted, event. In contrast, Expected sequences show a predictable, but then confounded, event (i.e., dog retrieves ball, then different person returns it), and Explicit sequences depict the unexpected event (i.e., different person retrieves then returns ball). At the critical penultimate panel, sequences representing depicted events (Explicit, Expected) elicited a larger posterior positivity (P600) than the relatively passive events of an onlooker (Impoverished, Implied), though Implied sequences were slightly more positive than Impoverished sequences. At the subsequent and final panel, a posterior positivity (P600) was greater to images in Impoverished sequences than those in Explicit and Implied sequences, which did not differ. In addition, both sequence types requiring inference (Implied, Impoverished) elicited a larger frontal negativity than those explicitly depicting events (Expected, Explicit). These results show that neural processing differs for visual narratives omitting events versus those depicting events, and that the presence of subtle visual cues can modulate such effects presumably by altering narrative structure. PMID:26320706
Computationally mapping sequence space to understand evolutionary protein engineering.
Armstrong, Kathryn A; Tidor, Bruce
2008-01-01
Evolutionary protein engineering has been dramatically successful, producing a wide variety of new proteins with altered stability, binding affinity, and enzymatic activity. However, the success of such procedures is often unreliable, and the impact of the choice of protein, engineering goal, and evolutionary procedure is not well understood. We have created a framework for understanding aspects of the protein engineering process by computationally mapping regions of feasible sequence space for three small proteins using structure-based design protocols. We then tested the ability of different evolutionary search strategies to explore these sequence spaces. The results point to a non-intuitive relationship between the error-prone PCR mutation rate and the number of rounds of replication. The evolutionary relationships among feasible sequences reveal hub-like sequences that serve as particularly fruitful starting sequences for evolutionary search. Moreover, genetic recombination procedures were examined, and tradeoffs relating sequence diversity and search efficiency were identified. This framework allows us to consider the impact of protein structure on the allowed sequence space and therefore on the challenges that each protein presents to error-prone PCR and genetic recombination procedures.
Deresiewicz, R L; Flaxenburg, J; Leng, K; Kasper, D L
1996-01-01
To explore whether a novel staphylococcal clone or structural variant of toxic shock syndrome toxin 1 is associated with Kawasaki syndrome, six toxigenic strains of Staphylococcus aureus from Kawasaki syndrome patients were studied. The strains were divisible into two groups based on phenotypic and genotypic characteristics and are therefore unequivocally not clonal. Portions of the tstH genes of each strain were sequenced. Three were sequenced in their entirety, while the remainder were sequenced from codon 66 to codon 137 of the mature protein only. Two of the former group differed slightly in the sequences of their signal peptides relative to the sequence published for the tstH signal peptide. Those differences did not affect toxin processing or secretion. The sequenced portions of the regions encoding mature toxic shock syndrome toxin 1 were identical in all six strains and corresponded exactly to the published sequence of tstH. No evidence was found for the existence of a structural variant of tstH uniquely associated with Kawasaki syndrome. PMID:8757881
Pérez Sirkin, Daniela I; Lafont, Anne-Gaëlle; Kamech, Nédia; Somoza, Gustavo M; Vissio, Paula G; Dufour, Sylvie
2017-01-01
GnRH-associated peptide (GAP) is the C-terminal portion of the gonadotropin-releasing hormone (GnRH) preprohormone. Although it was reported in mammals that GAP may act as a prolactin-inhibiting factor and can be co-secreted with GnRH into the hypophyseal portal blood, GAP has been practically out of the research circuit for about 20 years. Comparative studies highlighted the low conservation of GAP primary amino acid sequences among vertebrates, contributing to the view that this peptide participates only in the folding or carrying process of GnRH. Considering that the three-dimensional (3D) structure of a protein may define its function, the aim of this study was to evaluate whether GAP sequences and 3D structures are conserved in the vertebrate lineage. GAP sequences from various vertebrates were retrieved from databases. Analysis of primary amino acid sequence identity and similarity, molecular phylogeny, and prediction of 3D structures were performed. Amino acid sequence comparison and phylogeny analyses confirmed the large variation of GAP sequences throughout vertebrate radiation. In contrast, prediction of the 3D structure revealed a striking conservation of the 3D structure of GAP1 (GAP associated with the hypophysiotropic type 1 GnRH), despite low amino acid sequence conservation. This GAP1 peptide presented a typical helix-loop-helix (HLH) structure in all the vertebrate species analyzed. This HLH structure could also be predicted for GAP2 in some but not all vertebrate species and in none of the GAP3 sequences analyzed. These results allowed us to infer that selective pressures have maintained GAP1 HLH structure throughout the vertebrate lineage. The conservation of the HLH motif, known to confer biological activity to various proteins, suggests that GAP1 peptides may exert some hypophysiotropic biological functions across vertebrate radiation.
PMID:28878737
Aircraft stress sequence development: A complex engineering process made simple
NASA Technical Reports Server (NTRS)
Schrader, K. H.; Butts, D. G.; Sparks, W. A.
1994-01-01
Development of stress sequences for critical aircraft structure requires flight measured usage data, known aircraft loads, and established relationships between aircraft flight loads and structural stresses. Resulting cycle-by-cycle stress sequences can be directly usable for crack growth analysis and coupon spectra tests. Often, an expert in loads and spectra development manipulates the usage data into a typical sequence of representative flight conditions for which loads and stresses are calculated. For a fighter/trainer type aircraft, this effort is repeated many times for each of the fatigue critical locations (FCL) resulting in expenditure of numerous engineering hours. The Aircraft Stress Sequence Computer Program (ACSTRSEQ), developed by Southwest Research Institute under contract to San Antonio Air Logistics Center, presents a unique approach for making complex technical computations in a simple, easy to use method. The program is written in Microsoft Visual Basic for the Microsoft Windows environment.
Single molecule sequencing-guided scaffolding and correction of draft assemblies.
Zhu, Shenglong; Chen, Danny Z; Emrich, Scott J
2017-12-06
Although single molecule sequencing is still improving, the lengths of the generated sequences are inevitably an advantage in genome assembly. Prior work that utilizes long reads to conduct genome assembly has mostly focused on correcting sequencing errors and improving contiguity of de novo assemblies. We propose a disassembling-reassembling approach for both correcting structural errors in the draft assembly and scaffolding a target assembly based on error-corrected single molecule sequences. To achieve this goal, we formulate a maximum alternating path cover problem. We prove that this problem is NP-hard, and solve it by a 2-approximation algorithm. Our experimental results show that our approach can improve the structural correctness of target assemblies at the cost of some contiguity, even with smaller amounts of long reads. In addition, our reassembling process can also serve as a competitive scaffolder relative to well-established assembly benchmarks.
Tree-Structured Digital Organisms Model
NASA Astrophysics Data System (ADS)
Suzuki, Teruhiko; Nobesawa, Shiho; Tahara, Ikuo
Tierra and Avida are well-known models of digital organisms. They describe a life process as a sequence of computation codes. A linear sequence model is very simple for a computer-based model, but it may not be the only way to describe a digital organism. We therefore propose a new digital organism model based on a tree structure, which is rather similar to genetic programming. In our model, a life process is a combination of various functions, as life in the real world is. This implies that our model can easily describe the hierarchical structure of life, and it can simulate evolutionary computation through mutual interaction of functions. We verified by simulation that our model can be regarded as a digital organism model according to its definitions. Our model even succeeded in creating species such as viruses and parasites.
Simulating protein folding initiation sites using an alpha-carbon-only knowledge-based force field
Buck, Patrick M.; Bystroff, Christopher
2015-01-01
Protein folding is a hierarchical process where structure forms locally first, then globally. Some short sequence segments initiate folding through strong structural preferences that are independent of their three-dimensional context in proteins. We have constructed a knowledge-based force field in which the energy functions are conditional on local sequence patterns, as expressed in the hidden Markov model for local structure (HMMSTR). Carbon-alpha force field (CALF) builds sequence specific statistical potentials based on database frequencies for α-carbon virtual bond opening and dihedral angles, pairwise contacts and hydrogen bond donor-acceptor pairs, and simulates folding via Brownian dynamics. We introduce hydrogen bond donor and acceptor potentials as α-carbon probability fields that are conditional on the predicted local sequence. Constant temperature simulations were carried out using 27 peptides selected as putative folding initiation sites, each 12 residues in length, representing several different local structure motifs. Each 0.6 μs trajectory was clustered based on structure. Simulation convergence or representativeness was assessed by subdividing trajectories and comparing clusters. For 21 of the 27 sequences, the largest cluster made up more than half of the total trajectory. Of these 21 sequences, 14 had cluster centers that were at most 2.6 Å root mean square deviation (RMSD) from their native structure in the corresponding full-length protein. To assess the adequacy of the energy function on nonlocal interactions, 11 full length native structures were relaxed using Brownian dynamics simulations. Equilibrated structures deviated from their native states but retained their overall topology and compactness. A simple potential that folds proteins locally and stabilizes proteins globally may enable a more realistic understanding of hierarchical folding pathways. PMID:19137613
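Two quantities in the abstract above, the RMSD of a cluster center from the native structure and the share of the trajectory taken by the largest cluster, can be sketched as follows. This is a minimal illustration that assumes the two coordinate sets are already optimally superimposed and that cluster labels come from some earlier clustering step. The abstract does not describe the superposition or clustering procedure, so neither is reproduced here, and the function names are hypothetical.

```python
import math
from collections import Counter


def rmsd(coords_a, coords_b) -> float:
    """Root mean square deviation between two pre-aligned point sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def largest_cluster_fraction(labels) -> float:
    """Fraction of trajectory frames that fall in the most populated cluster."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


if __name__ == "__main__":
    native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(native, model))                     # 1.0
    print(largest_cluster_fraction([0, 0, 1, 0]))  # 0.75
```

With helpers like these, the criteria in the abstract would read as `largest_cluster_fraction(labels) > 0.5` and `rmsd(center, native) <= 2.6` (in Å).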
The right inferior frontal gyrus processes nested non-local dependencies in music.
Cheung, Vincent K M; Meyer, Lars; Friederici, Angela D; Koelsch, Stefan
2018-02-28
Complex auditory sequences known as music have often been described as hierarchically structured. This permits the existence of non-local dependencies, which relate elements of a sequence beyond their temporal sequential order. Previous studies in music have reported differential activity in the inferior frontal gyrus (IFG) when comparing regular and irregular chord-transitions based on theories in Western tonal harmony. However, it is unclear if the observed activity reflects the interpretation of hierarchical structure as the effects are confounded by local irregularity. Using functional magnetic resonance imaging (fMRI), we found that violations to non-local dependencies in nested sequences of three-tone musical motifs in musicians elicited increased activity in the right IFG. This is in contrast to similar studies in language which typically report the left IFG in processing grammatical syntax. Effects of increasing auditory working memory demands are moreover reflected by distributed activity in frontal and parietal regions. Our study therefore demonstrates the role of the right IFG in processing non-local dependencies in music, and suggests that hierarchical processing in different cognitive domains relies on similar mechanisms that are subserved by domain-selective neuronal subpopulations.
Billoud, B; Kontic, M; Viari, A
1996-01-01
At the DNA/RNA level, biological signals are defined by a combination of spatial structures and sequence motifs. Until now, few attempts had been made at writing general purpose search programs that take into account both sequence and structure criteria. Indeed, the most successful structure scanning programs are usually dedicated to particular structures and are written using general purpose programming languages through a complex and time consuming process where the biological problem of defining the structure and the computer engineering problem of looking for it are intimately intertwined. In this paper, we describe a general representation of structures, suitable for database scanning, together with a programming language, Palingol, designed to manipulate it. Palingol has specific data types, corresponding to structural elements (basically helices), that can be arranged in any way to form a complex structure. As a consequence of the declarative approach used in Palingol, the user should only focus on 'what to search for' while the language engine takes care of 'how to look for it'. Therefore, it becomes simpler to write a scanning program and the structural constraints that define the required structure are more clearly identified. PMID:8628670
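The idea of scanning a sequence for a helix defined by structural constraints can be illustrated with a small, hand-written search. The sketch below is plain Python, not Palingol syntax, and is not taken from the paper. It assumes an RNA alphabet, allows Watson-Crick and G-U pairs, and reports simple hairpins (one helix closing one loop). The parameter names and default values are hypothetical.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_hairpins(seq: str, stem_len: int = 4, min_loop: int = 3, max_loop: int = 8):
    """Return (start, end, loop_size) for each hairpin with a perfect stem.

    start is the index of the first base of the 5' strand and end is the
    index of the last base of the 3' strand (both 0-based, inclusive).
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for i in range(n - 2 * stem_len - min_loop + 1):
        for loop in range(min_loop, max_loop + 1):
            j = i + 2 * stem_len + loop - 1
            if j >= n:
                break  # longer loops would run past the end of the sequence
            # Base i+k on the 5' strand must pair with base j-k on the 3' strand.
            if all((seq[i + k], seq[j - k]) in PAIRS for k in range(stem_len)):
                hits.append((i, j, loop))
    return hits


if __name__ == "__main__":
    print(find_hairpins("GGGGAAAACCCC"))  # [(0, 11, 4)]
```

In a declarative language of the kind the abstract describes, the stem length and loop bounds would be stated as constraints and the enumeration loop would be left to the engine.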
Accelerated probabilistic inference of RNA structure evolution
Holmes, Ian
2005-01-01
Background: Pairwise stochastic context-free grammars (Pair SCFGs) are powerful tools for evolutionary analysis of RNA, including simultaneous RNA sequence alignment and secondary structure prediction, but the associated algorithms are intensive in both CPU and memory usage. The same problem is faced by other RNA alignment-and-folding algorithms based on Sankoff's 1985 algorithm. It is therefore desirable to constrain such algorithms, by pre-processing the sequences and using this first pass to limit the range of structures and/or alignments that can be considered. Results: We demonstrate how flexible classes of constraint can be imposed, greatly reducing the computational costs while maintaining a high quality of structural homology prediction. Any score-attributed context-free grammar (e.g. energy-based scoring schemes, or conditionally normalized Pair SCFGs) is amenable to this treatment. It is now possible to combine independent structural and alignment constraints of unprecedented general flexibility in Pair SCFG alignment algorithms. We outline several applications to the bioinformatics of RNA sequence and structure, including Waterman-Eggert N-best alignments and progressive multiple alignment. We evaluate the performance of the algorithm on test examples from the RFAM database. Conclusion: A program, Stemloc, that implements these algorithms for efficient RNA sequence alignment and structure prediction is available under the GNU General Public License. PMID:15790387
Dodge, D.A.; Beroza, G.C.; Ellsworth, W.L.
1996-01-01
We find that foreshocks provide clear evidence for an extended nucleation process before some earthquakes. In this study, we examine in detail the evolution of six California foreshock sequences: the 1986 Mount Lewis (ML = 5.5), the 1986 Chalfant (ML = 6.4), the 1986 Stone Canyon (ML = 4.7), the 1990 Upland (ML = 5.2), the 1992 Joshua Tree (MW = 6.1), and the 1992 Landers (MW = 7.3) sequence. Typically, uncertainties in hypocentral parameters are too large to establish the geometry of foreshock sequences and hence to understand their evolution. However, the similarity of location and focal mechanisms for the events in these sequences leads to similar foreshock waveforms that we cross correlate to obtain extremely accurate relative locations. We use these results to identify small-scale fault zone structures that could influence nucleation and to determine the stress evolution leading up to the mainshock. In general, these foreshock sequences are not compatible with a cascading failure nucleation model in which the foreshocks all occur on a single fault plane and trigger the mainshock by static stress transfer. Instead, the foreshocks seem to concentrate near structural discontinuities in the fault and may themselves be a product of an aseismic nucleation process. Fault zone heterogeneity may also be important in controlling the number of foreshocks, i.e., the stronger the heterogeneity, the greater the number of foreshocks. The size of the nucleation region, as measured by the extent of the foreshock sequence, appears to scale with mainshock moment in the same manner as determined independently by measurements of the seismic nucleation phase. We also find evidence for slip localization as predicted by some models of earthquake nucleation. Copyright 1996 by the American Geophysical Union.
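The waveform cross-correlation step mentioned in the abstract above can be illustrated with a minimal time-domain search for the lag at which two similar traces align best. This is a simplified sketch, not the authors' procedure. It uses an unnormalized correlation at integer sample lags, whereas precise relative relocation would typically also need normalization, sub-sample interpolation, and an inversion of the delays from many stations. The function name and the toy traces are hypothetical.

```python
def best_lag(x, y, max_lag: int) -> int:
    """Return the integer lag that maximizes sum over t of x[t] * y[t + lag].

    A positive result means that y is a delayed copy of x.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for t, xv in enumerate(x):
            u = t + lag
            if 0 <= u < len(y):  # ignore samples shifted outside the trace
                score += xv * y[u]
        if score > best_score:
            best, best_score = lag, score
    return best


if __name__ == "__main__":
    event_a = [0, 0, 1, 2, 1, 0, 0, 0]
    event_b = [0, 0, 0, 0, 1, 2, 1, 0]  # same pulse, two samples later
    print(best_lag(event_a, event_b, max_lag=4))  # 2
```

Differential arrival times measured this way at several stations constrain the relative positions of two events far more tightly than their individual catalog locations do.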
A 3D sequence-independent representation of the protein data bank.
Fischer, D; Tsai, C J; Nussinov, R; Wolfson, H
1995-10-01
Here we address the following questions. How many structurally different entries are there in the Protein Data Bank (PDB)? How do the proteins populate the structural universe? To investigate these questions a structurally non-redundant set of representative entries was selected from the PDB. Construction of such a dataset is not trivial: (i) the considerable size of the PDB requires a large number of comparisons (there were more than 3250 structures of protein chains available in May 1994); (ii) the PDB is highly redundant, containing many structurally similar entries, not necessarily with significant sequence homology, and (iii) there is no clear-cut definition of structural similarity. The latter depend on the criteria and methods used. Here, we analyze structural similarity ignoring protein topology. To date, representative sets have been selected either by hand, by sequence comparison techniques which ignore the three-dimensional (3D) structures of the proteins or by using sequence comparisons followed by linear structural comparison (i.e. the topology, or the sequential order of the chains, is enforced in the structural comparison). Here we describe a 3D sequence-independent automated and efficient method to obtain a representative set of protein molecules from the PDB which contains all unique structures and which is structurally non-redundant. The method has two novel features. The first is the use of strictly structural criteria in the selection process without taking into account the sequence information. To this end we employ a fast structural comparison algorithm which requires on average approximately 2 s per pairwise comparison on a workstation. The second novel feature is the iterative application of a heuristic clustering algorithm that greatly reduces the number of comparisons required. We obtain a representative set of 220 chains with resolution better than 3.0 A, or 268 chains including lower resolution entries, NMR entries and models. 
The resulting set can serve as a basis for extensive structural classification and studies of 3D recurring motifs and of sequence-structure relationships. The clustering algorithm succeeds in classifying into the same structural family chains with no significant sequence homology, e.g. all the globins in one single group, all the trypsin-like serine proteases in another or all the immunoglobulin-like folds into a third. In addition, unexpected structural similarities of interest have been automatically detected between pairs of chains. A cluster analysis of the representative structures demonstrates the way the "structural universe" is populated.
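The abstract above does not spell out its heuristic clustering step, so the following is only a generic greedy "leader" clustering sketch showing how comparing each item against current representatives, rather than against every other item, cuts the number of pairwise comparisons. The function name `select_representatives` and the toy similarity test are hypothetical; a real application would plug in a structural comparison routine.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def select_representatives(
    items: Sequence[T],
    is_similar: Callable[[T, T], bool],
) -> Tuple[List[T], List[List[T]]]:
    """Greedy leader clustering (illustrative, not the authors' algorithm).

    Each item is compared only with the representatives chosen so far.
    It joins the first cluster whose representative it resembles;
    otherwise it becomes the representative of a new cluster.
    """
    representatives: List[T] = []
    clusters: List[List[T]] = []
    for item in items:
        for idx, rep in enumerate(representatives):
            if is_similar(item, rep):
                clusters[idx].append(item)
                break
        else:
            # No representative matched: start a new cluster.
            representatives.append(item)
            clusters.append([item])
    return representatives, clusters


if __name__ == "__main__":
    # Toy stand-in for a structural comparison: numbers within 1.0 are "similar".
    reps, groups = select_representatives(
        [0.0, 0.5, 3.0, 3.2, 10.0], lambda a, b: abs(a - b) <= 1.0
    )
    print(reps, groups)
```

The result of a greedy pass depends on input order, which is one reason such a procedure might be applied iteratively, as the abstract describes.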
Holland, M J; Holland, J P; Thill, G P; Jackson, K A
1981-02-10
Segments of yeast genomic DNA containing two enolase structural genes have been isolated by subculture cloning procedures using a cDNA hybridization probe synthesized from purified yeast enolase mRNA. Based on restriction endonuclease and transcriptional maps of these two segments of yeast DNA, each hybrid plasmid contains a region of extensive nucleotide sequence homology which forms hybrids with the cDNA probe. The DNA sequences which flank this homologous region in the two hybrid plasmids are nonhomologous, indicating that these sequences are nontandemly repeated in the yeast genome. The complete nucleotide sequence of the coding as well as the flanking noncoding regions of these genes has been determined. The amino acid sequence predicted from one reading frame of both structural genes is extremely similar to that determined for yeast enolase (Chin, C. C. Q., Brewer, J. M., Eckard, E., and Wold, F. (1981) J. Biol. Chem. 256, 1370-1376), confirming that these isolated structural genes encode yeast enolase. The nucleotide sequences of the coding regions of the genes are approximately 95% homologous, and neither gene contains an intervening sequence. Codon utilization in the enolase genes follows the same biased pattern previously described for two yeast glyceraldehyde-3-phosphate dehydrogenase structural genes (Holland, J. P., and Holland, M. J. (1980) J. Biol. Chem. 255, 2596-2605). DNA blotting analysis confirmed that the isolated segments of yeast DNA are colinear with yeast genomic DNA and that there are two nontandemly repeated enolase genes per haploid yeast genome. The noncoding portions of the two enolase genes adjacent to the initiation and termination codons are approximately 70% homologous and contain sequences thought to be involved in the synthesis and processing of messenger RNA. 
Finally, there are regions of extensive homology between the two enolase structural genes and two yeast glyceraldehyde-3-phosphate dehydrogenase structural genes within the 5'-noncoding portions of these glycolytic genes.
De Lillo, Carlo; Kirby, Melissa; Poole, Daniel
2016-01-01
Immediate serial spatial recall measures the ability to retain sequences of locations in short-term memory and is considered the spatial equivalent of digit span. It is tested by requiring participants to reproduce sequences of movements performed by an experimenter or displayed on a monitor. Different organizational factors dramatically affect serial spatial recall but they are often confounded or underspecified. Untangling them is crucial for the characterization of working-memory models and for establishing the contribution of structure and memory capacity to spatial span. We report five experiments assessing the relative role and independence of factors that have been reported in the literature. Experiment 1 disentangled the effects of spatial clustering and path-length by manipulating the distance of items displayed on a touchscreen monitor. Long-path sequences segregated by spatial clusters were compared with short-path sequences not segregated by clusters. Recall was more accurate for sequences segregated by clusters independently from path-length. Experiment 2 featured conditions where temporal pauses were introduced between or within cluster boundaries during the presentation of sequences with the same paths. Thus, the temporal structure of the sequences was either consistent or inconsistent with a hierarchical representation based on segmentation by spatial clusters but the effect of structure could not be confounded with effects of path-characteristics. Pauses at cluster boundaries yielded more accurate recall, as predicted by a hierarchical model. In Experiment 3, the systematic manipulation of sequence structure, path-length, and presence of path-crossings of sequences showed that structure explained most of the variance, followed by the presence/absence of path-crossings, and path-length. 
Experiments 4 and 5 replicated the results of the previous experiments in immersive virtual reality navigation tasks where the viewpoint of the observer changed dynamically during encoding and recall. This suggested that the effects of structure in spatial span are not dependent on perceptual grouping processes induced by the aerial view of the stimulus array typically afforded by spatial recall tasks. These results demonstrate the independence of coding strategies based on structure from effects of path characteristics and perceptual grouping in immediate serial spatial recall. PMID:27891101
Composite Structural Motifs of Binding Sites for Delineating Biological Functions of Proteins
Kinjo, Akira R.; Nakamura, Haruki
2012-01-01
Most biological processes are described as a series of interactions between proteins and other molecules, and interactions are in turn described in terms of atomic structures. To annotate protein functions as sets of interaction states at atomic resolution, and thereby to better understand the relation between protein interactions and biological functions, we conducted exhaustive all-against-all atomic structure comparisons of all known binding sites for ligands including small molecules, proteins and nucleic acids, and identified recurring elementary motifs. By integrating the elementary motifs associated with each subunit, we defined composite motifs that represent context-dependent combinations of elementary motifs. It is demonstrated that function similarity can be better inferred from composite motif similarity compared to the similarity of protein sequences or of individual binding sites. By integrating the composite motifs associated with each protein function, we define meta-composite motifs each of which is regarded as a time-independent diagrammatic representation of a biological process. It is shown that meta-composite motifs provide richer annotations of biological processes than sequence clusters. The present results serve as a basis for bridging atomic structures to higher-order biological phenomena by classification and integration of binding site structures. PMID:22347478
ERIC Educational Resources Information Center
Massaro, Dominic W., Ed.
In an information-processing approach to language processing, language processing is viewed as a sequence of psychological stages that occur between the initial presentation of the language stimulus and the meaning in the mind of the language processor. This book defines each of the processes and structures involved, explains how each of them…
Kinact: a computational approach for predicting activating missense mutations in protein kinases.
Rodrigues, Carlos H M; Ascher, David B; Pires, Douglas E V
2018-05-21
Protein phosphorylation is tightly regulated due to its vital role in many cellular processes. While gain of function mutations leading to constitutive activation of protein kinases are known to be driver events of many cancers, the identification of these mutations has proven challenging. Here we present Kinact, a novel machine learning approach for predicting kinase activating missense mutations using information from sequence and structure. By adapting our graph-based signatures, Kinact represents both structural and sequence information, which are used as evidence to train predictive models. We show the combination of structural and sequence features significantly improved the overall accuracy compared to considering either primary or tertiary structure alone, highlighting their complementarity. Kinact achieved a precision of 87% and 94% and Area Under ROC Curve of 0.89 and 0.92 on 10-fold cross-validation, and on blind tests, respectively, outperforming well established tools (P < 0.01). We further show that Kinact performs equally well on homology models built using templates with sequence identity as low as 33%. Kinact is freely available as a user-friendly web server at http://biosig.unimelb.edu.au/kinact/.
Davey, James A; Chica, Roberto A
2015-04-01
Computational protein design (CPD) predictions are highly dependent on the structure of the input template used. However, it is unclear how small differences in template geometry translate to large differences in stability prediction accuracy. Herein, we explored how structural changes to the input template affect the outcome of stability predictions by CPD. To do this, we prepared alternate templates by Rotamer Optimization followed by energy Minimization (ROM) and used them to recapitulate the stability of 84 protein G domain β1 mutant sequences. In the ROM process, side-chain rotamers for wild-type (WT) or mutant sequences are optimized on crystal or nuclear magnetic resonance (NMR) structures prior to template minimization, resulting in alternate structures termed ROM templates. We show that use of ROM templates prepared from sequences known to be stable results predominantly in improved prediction accuracy compared to using the minimized crystal or NMR structures. Conversely, ROM templates prepared from sequences that are less stable than the WT reduce prediction accuracy by increasing the number of false positives. These observed changes in prediction outcomes are attributed to differences in side-chain contacts made by rotamers in ROM templates. Finally, we show that ROM templates prepared from sequences that are unfolded or that adopt a nonnative fold result in the selective enrichment of sequences that are also unfolded or that adopt a nonnative fold, respectively. Our results demonstrate the existence of a rotamer bias caused by the input template that can be harnessed to skew predictions toward sequences displaying desired characteristics. © 2014 The Protein Society.
Exploring the sequence-structure protein landscape in the glycosyltransferase family
Zhang, Ziding; Kochhar, Sunil; Grigorov, Martin
2003-01-01
To understand the molecular basis of glycosyltransferases’ (GTFs) catalytic mechanism, extensive structural information is required. Here, fold recognition methods were employed to assign 3D protein shapes (folds) to the currently known GTF sequences, available in public databases such as GenBank and Swissprot. First, GTF sequences were retrieved and classified into clusters, based on sequence similarity only. Intracluster sequence similarity was chosen sufficiently high to ensure that the same fold is found within a given cluster. Then, a representative sequence from each cluster was selected to compose a subset of GTF sequences. The members of this reduced set were processed by three different fold recognition methods: 3D-PSSM, FUGUE, and GeneFold. Finally, the results from different fold recognition methods were analyzed and compared to sequence-similarity search methods (i.e., BLAST and PSI-BLAST). It was established that the folds of about 70% of all currently known GTF sequences can be confidently assigned by fold recognition methods, a value which is higher than the fold identification rate based on sequence comparison alone (48% for BLAST and 64% for PSI-BLAST). The identified folds were submitted to 3D clustering, and we found that most of the GTF sequences adopt the typical GTF A or GTF B folds. Our results indicate a lack of evidence that new GTF folds (i.e., folds other than GTF A and B) exist. Based on cases where fold identification was not possible, we suggest several sequences as the most promising targets for a structural genomics initiative focused on the GTF protein family. PMID:14500887
Flynn, Theodore M.; Koval, Jason C.; Greenwald, Stephanie M.; Owens, Sarah M.; Kemner, Kenneth M.; Antonopoulos, Dionysios A.
2017-01-01
We present DNA sequence data in FASTA-formatted files from aerobic environmental microcosms inoculated with a sole carbon source. DNA sequences are of 16S rRNA genes present in DNA extracted from each microcosm along with the environmental samples (soil, water) used to inoculate them. These samples were sequenced using the Illumina MiSeq platform at the Environmental Sample Preparation and Sequencing Facility at Argonne National Laboratory. This data is compatible with standard microbiome analysis pipelines (e.g., QIIME, mothur, etc.).
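Because the dataset above is distributed as FASTA-formatted files, a minimal reader can be sketched as follows. This is an illustrative stdlib-only parser, not part of QIIME or mothur; the function name `parse_fasta` and the example records are made up for the demonstration.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}.

    The record id is the first whitespace-delimited token after '>'.
    Sequence lines belonging to one record are concatenated.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


if __name__ == "__main__":
    example = [">seq1 sample A", "ACGT", "TTGA", ">seq2", "GGCC"]
    print(parse_fasta(example))
```

For real analyses an established parser is usually preferable, since this sketch silently overwrites records that share an id.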
NASA Astrophysics Data System (ADS)
Tene, Yair; Tene, Noam; Tene, G.
1993-08-01
An interactive data fusion methodology of video, audio, and nonlinear structural dynamic analysis for potential application in forensic engineering is presented. The methodology was developed and successfully demonstrated in the analysis of the collapse of a heavy transportable bridge during preparation for testing. Multiple bridge element failures were identified after the collapse, including fracture, cracks and rupture of high performance structural materials. Videotape recording by a hand-held camcorder was the only source of information about the collapse sequence. The interactive data fusion methodology resulted in extracting relevant information from the videotape and from dynamic nonlinear structural analysis, leading to a full account of the sequence of events during the bridge collapse.
Smola, Matthew J; Rice, Greggory M; Busan, Steven; Siegfried, Nathan A; Weeks, Kevin M
2015-11-01
Selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) chemistries exploit small electrophilic reagents that react with 2'-hydroxyl groups to interrogate RNA structure at single-nucleotide resolution. Mutational profiling (MaP) identifies modified residues by using reverse transcriptase to misread a SHAPE-modified nucleotide and then counting the resulting mutations by massively parallel sequencing. The SHAPE-MaP approach measures the structure of large and transcriptome-wide systems as accurately as can be done for simple model RNAs. This protocol describes the experimental steps, implemented over 3 d, that are required to perform SHAPE probing and to construct multiplexed SHAPE-MaP libraries suitable for deep sequencing. Automated processing of MaP sequencing data is accomplished using two software packages. ShapeMapper converts raw sequencing files into mutational profiles, creates SHAPE reactivity plots and provides useful troubleshooting information. SuperFold uses these data to model RNA secondary structures, identify regions with well-defined structures and visualize probable and alternative helices, often in under 1 d. SHAPE-MaP can be used to make nucleotide-resolution biophysical measurements of individual RNA motifs, rare components of complex RNA ensembles and entire transcriptomes.
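The mutation-counting idea behind MaP can be illustrated with a simplified calculation: a per-nucleotide mutation rate is the number of observed mutations divided by read depth, and a background rate from an untreated control can be subtracted. The sketch below is an assumption-laden illustration only; it is not ShapeMapper's pipeline, and the function names and the handling of zero-depth positions are hypothetical choices.

```python
from typing import List, Optional, Sequence


def mutation_rates(
    mutation_counts: Sequence[int], read_depths: Sequence[int]
) -> List[Optional[float]]:
    """Per-position mutation rate = mutations / read depth.

    Positions with zero depth have no defined rate and are reported as None.
    """
    if len(mutation_counts) != len(read_depths):
        raise ValueError("counts and depths must have the same length")
    return [m / d if d > 0 else None for m, d in zip(mutation_counts, read_depths)]


def background_subtracted(
    modified: Sequence[Optional[float]], untreated: Sequence[Optional[float]]
) -> List[Optional[float]]:
    """Subtract the untreated-control rate from the reagent-treated rate.

    A position is None if either input rate is missing.
    """
    if len(modified) != len(untreated):
        raise ValueError("rate lists must have the same length")
    return [
        None if m is None or u is None else m - u
        for m, u in zip(modified, untreated)
    ]


if __name__ == "__main__":
    treated = mutation_rates([12, 3, 40], [1000, 1000, 1000])
    control = mutation_rates([2, 3, 5], [1000, 1000, 1000])
    print(background_subtracted(treated, control))
```

Further normalization steps used in practice are deliberately left out here.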
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shi, Yuqian; Hellinga, Homme W.; Beese, Lorena S.
Human exonuclease 1 (hExo1) is a member of the RAD2/XPG structure-specific 5'-nuclease superfamily. Its dominant, processive 5'–3' exonuclease and secondary 5'-flap endonuclease activities participate in various DNA repair, recombination, and replication processes. A single active site processes both recessed ends and 5'-flap substrates. By initiating enzyme reactions in crystals, we have trapped hExo1 reaction intermediates that reveal structures of these substrates before and after their exo- and endonucleolytic cleavage, as well as structures of uncleaved, unthreaded, and partially threaded 5' flaps. Their distinctive 5' ends are accommodated by a small, mobile arch in the active site that binds recessed ends at its base and threads 5' flaps through a narrow aperture within its interior. A sequence of successive, interlocking conformational changes guides the two substrate types into a shared reaction mechanism that catalyzes their cleavage by an elaborated variant of the two-metal, in-line hydrolysis mechanism. Coupling of substrate-dependent arch motions to transition-state stabilization suppresses inappropriate or premature cleavage, enhancing processing fidelity. The striking reduction in flap conformational entropy is catalyzed, in part, by arch motions and transient binding interactions between the flap and unprocessed DNA strand. At the end of the observed reaction sequence, hExo1 resets without relinquishing DNA binding, suggesting a structural basis for its processivity.
Shi, Yuqian; Hellinga, Homme W; Beese, Lorena S
2017-06-06
Human exonuclease 1 (hExo1) is a member of the RAD2/XPG structure-specific 5'-nuclease superfamily. Its dominant, processive 5'-3' exonuclease and secondary 5'-flap endonuclease activities participate in various DNA repair, recombination, and replication processes. A single active site processes both recessed ends and 5'-flap substrates. By initiating enzyme reactions in crystals, we have trapped hExo1 reaction intermediates that reveal structures of these substrates before and after their exo- and endonucleolytic cleavage, as well as structures of uncleaved, unthreaded, and partially threaded 5' flaps. Their distinctive 5' ends are accommodated by a small, mobile arch in the active site that binds recessed ends at its base and threads 5' flaps through a narrow aperture within its interior. A sequence of successive, interlocking conformational changes guides the two substrate types into a shared reaction mechanism that catalyzes their cleavage by an elaborated variant of the two-metal, in-line hydrolysis mechanism. Coupling of substrate-dependent arch motions to transition-state stabilization suppresses inappropriate or premature cleavage, enhancing processing fidelity. The striking reduction in flap conformational entropy is catalyzed, in part, by arch motions and transient binding interactions between the flap and unprocessed DNA strand. At the end of the observed reaction sequence, hExo1 resets without relinquishing DNA binding, suggesting a structural basis for its processivity.
MolIDE: a homology modeling framework you can click with.
Canutescu, Adrian A; Dunbrack, Roland L
2005-06-15
Molecular Integrated Development Environment (MolIDE) is an integrated application designed to provide homology modeling tools and protocols under a uniform, user-friendly graphical interface. Its main purpose is to combine the most frequent modeling steps in a semi-automatic, interactive way, guiding the user from the target protein sequence to the final three-dimensional protein structure. The typical basic homology modeling process is composed of building sequence profiles of the target sequence family, secondary structure prediction, sequence alignment with PDB structures, assisted alignment editing, side-chain prediction and loop building. All of these steps are available through a graphical user interface. MolIDE's user-friendly and streamlined interactive modeling protocol allows the user to focus on the important modeling questions, hiding the raw data generation and conversion steps. MolIDE was designed from the ground up as an open-source, cross-platform, extensible framework. This allows developers to integrate additional third-party programs into MolIDE. MolIDE is available at http://dunbrack.fccc.edu/molide/molide.php (contact: rl_dunbrack@fccc.edu).
The rRNA evolution and procaryotic phylogeny
NASA Technical Reports Server (NTRS)
Fox, G. E.
1986-01-01
Studies of ribosomal RNA primary structure allow reconstruction of phylogenetic trees for prokaryotic organisms. Such studies reveal a major dichotomy among the bacteria that separates them into eubacteria and archaebacteria. Both groupings are further segmented into several major divisions. The results obtained from 5S rRNA sequences are essentially the same as those obtained with the 16S rRNA data. In the case of Gram-negative bacteria the ribosomal RNA sequencing results can also be directly compared with hybridization studies and cytochrome c sequencing studies. There is again excellent agreement among the several methods. It seems likely then that the overall picture of microbial phylogeny that is emerging from the RNA sequence studies is a good approximation of the true history of these organisms. The RNA data allow examination of the evolutionary process in a semi-quantitative way. The secondary structures of these RNAs are largely established. As a result it is possible to recognize examples of local structural evolution. Evolutionary pathways accounting for these events can be proposed and their probability can be assessed.
High taxonomic variability despite stable functional structure across microbial communities.
Louca, Stilianos; Jacques, Saulo M S; Pires, Aliny P F; Leal, Juliana S; Srivastava, Diane S; Parfrey, Laura Wegener; Farjalla, Vinicius F; Doebeli, Michael
2016-12-05
Understanding the processes that are driving variation of natural microbial communities across space or time is a major challenge for ecologists. Environmental conditions strongly shape the metabolic function of microbial communities; however, other processes such as biotic interactions, random demographic drift or dispersal limitation may also influence community dynamics. The relative importance of these processes and their effects on community function remain largely unknown. To address this uncertainty, here we examined bacterial and archaeal communities in replicate 'miniature' aquatic ecosystems contained within the foliage of wild bromeliads. We used marker gene sequencing to infer the taxonomic composition within nine metabolic functional groups, and shotgun environmental DNA sequencing to estimate the relative abundances of these groups. We found that all of the bromeliads exhibited remarkably similar functional community structures, but that the taxonomic composition within individual functional groups was highly variable. Furthermore, using statistical analyses, we found that non-neutral processes, including environmental filtering and potentially biotic interactions, at least partly shaped the composition within functional groups and were more important than spatial dispersal limitation and demographic drift. Hence both the functional structure and taxonomic composition within functional groups of natural microbial communities may be shaped by non-neutral and roughly separate processes.
Next Generation Sequencing Technologies: The Doorway to the Unexplored Genomics of Non-Model Plants
Unamba, Chibuikem I. N.; Nag, Akshay; Sharma, Ram K.
2015-01-01
Non-model plants, i.e., species with one or all of the characteristics such as a long life cycle, difficulty of growing in the laboratory, or poor fecundity, were previously left out of sequencing projects because of the high running cost of Sanger sequencing. Consequently, information about their genomics and key biological processes is inadequate. However, the advent of fast and cost-effective next generation sequencing (NGS) platforms in the recent past has enabled the unearthing of certain characteristic gene structures unique to these species. It has also aided in gaining insight into the mechanisms underlying gene expression and secondary metabolism, and has facilitated the development of genomic resources for diversity characterization, evolutionary analysis and marker-assisted breeding even without prior availability of genomic sequence information. In this review we explore how different NGS platforms, as well as recent advances in NGS-based high-throughput genotyping technologies, are rewarding efforts on de novo whole genome/transcriptome sequencing and the development of genome-wide sequence-based marker resources for the improvement of non-model crops that are less costly than phenotyping. PMID:26734016
Target Site Recognition by a Diversity-Generating Retroelement
Guo, Huatao; Tse, Longping V.; Nieh, Angela W.; Czornyj, Elizabeth; Williams, Steven; Oukil, Sabrina; Liu, Vincent B.; Miller, Jeff F.
2011-01-01
Diversity-generating retroelements (DGRs) are in vivo sequence diversification machines that are widely distributed in bacterial, phage, and plasmid genomes. They function to introduce vast amounts of targeted diversity into protein-encoding DNA sequences via mutagenic homing. Adenine residues are converted to random nucleotides in a retrotransposition process from a donor template repeat (TR) to a recipient variable repeat (VR). Using the Bordetella bacteriophage BPP-1 element as a prototype, we have characterized requirements for DGR target site function. Although sequences upstream of VR are dispensable, a 24 bp sequence immediately downstream of VR, which contains short inverted repeats, is required for efficient retrohoming. The inverted repeats form a hairpin or cruciform structure and mutational analysis demonstrated that, while the structure of the stem is important, its sequence can vary. In contrast, the loop has a sequence-dependent function. Structure-specific nuclease digestion confirmed the existence of a DNA hairpin/cruciform, and marker coconversion assays demonstrated that it influences the efficiency, but not the site of cDNA integration. Comparisons with other phage DGRs suggested that similar structures are a conserved feature of target sequences. Using a kanamycin resistance determinant as a reporter, we found that transplantation of the IMH and hairpin/cruciform-forming region was sufficient to target the DGR diversification machinery to a heterologous gene. In addition to furthering our understanding of DGR retrohoming, our results suggest that DGRs may provide unique tools for directed protein evolution via in vivo DNA diversification. PMID:22194701
Protein structure recognition: From eigenvector analysis to structural threading method
NASA Astrophysics Data System (ADS)
Cao, Haibo
In this work, we try to understand the protein folding problem using pair-wise hydrophobic interaction as the dominant interaction for the protein folding process. We found a strong correlation between the amino acid sequence and the corresponding native structure of the protein. Applications of this correlation discussed in this dissertation include domain partition and a new structural threading method, as well as the performance of this method in the CASP5 competition. In the first part, we give a brief introduction to the protein folding problem and discuss some essential knowledge and progress from other research groups. This part includes discussions of interactions among amino acid residues, the lattice HP model, and the designability principle. In the second part, we try to establish the correlation between the amino acid sequence and the corresponding native structure of the protein. This correlation was observed in our eigenvector study of the protein contact matrix. We believe the correlation is universal, and thus it can be used in the automatic partition of protein structures into folding domains. In the third part, we discuss a threading method based on the correlation between the amino acid sequence and the dominant eigenvector of the structure contact matrix. A mathematically straightforward iteration scheme provides a self-consistent optimum global sequence-structure alignment. The computational efficiency of this method makes it possible to search whole protein structure databases for structural homology without relying on sequence similarity. The sensitivity and specificity of this method are discussed, along with a case of blind test prediction. In the appendix, we list the overall performance of this threading method in the CASP5 blind test in comparison with other existing approaches.
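The dominant eigenvector of a contact matrix, which the abstract above uses as the structural signal, can be approximated with plain power iteration. The sketch below is a generic illustration under stated assumptions, not the dissertation's code: the function name `dominant_eigenvector` and the four-residue toy contact map are hypothetical, and convergence assumes a single eigenvalue of largest magnitude (true for this toy map, which contains a triangle of contacts).

```python
import math
from typing import List, Sequence, Tuple


def dominant_eigenvector(
    matrix: Sequence[Sequence[float]],
    iterations: int = 200,
    tol: float = 1e-12,
) -> Tuple[float, List[float]]:
    """Approximate the largest eigenvalue and its unit eigenvector.

    Uses power iteration: repeatedly multiply a vector by the matrix and
    renormalize. Intended for symmetric, non-negative contact matrices.
    """
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        nxt = [sum(row[j] * vec[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            raise ValueError("matrix maps the current vector to zero")
        nxt = [x / norm for x in nxt]
        delta = max(abs(a - b) for a, b in zip(nxt, vec))
        vec = nxt
        if delta < tol:
            break
    # Rayleigh quotient v^T A v for the unit vector v.
    eigenvalue = sum(
        vec[i] * sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)
    )
    return eigenvalue, vec


if __name__ == "__main__":
    # Toy contact map: 1 where two residues are in contact, 0 otherwise.
    contacts = [
        [0, 1, 1, 0],
        [1, 0, 1, 0],
        [1, 1, 0, 1],
        [0, 0, 1, 0],
    ]
    value, vector = dominant_eigenvector(contacts)
    print(value, vector)
```

Residues with more, and better connected, contacts receive larger components in this vector, which is the kind of per-residue profile a sequence-derived score could be compared against.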
Pitch structure, but not selective attention, affects accent weightings in metrical grouping.
Prince, Jon B
2014-10-01
Among other cues, pitch and temporal accents contribute to grouping in musical sequences. However, exactly how they combine remains unclear, possibly because of the role of structural organization. In 3 experiments, participants rated the perceived metrical grouping of sequences that either adhered to the rules of tonal Western musical pitch structure (musical key) or did not (atonal). The tonal status of sequences did not provide any grouping cues and was irrelevant to the task. Experiment 1 established equally strong levels of pitch leap accents and duration accents in baseline conditions, which were then recombined in subsequent experiments. Neither accent type was stronger or weaker for tonal and atonal contexts. In Experiment 2, pitch leap accents dominated over duration accents, but the extent of this advantage was greater when sequences were tonal. Experiment 3 ruled out an attentional origin of this effect by replicating this finding while explicitly manipulating attention to pitch or duration accents between participant groups. Overall, the presence of tonal pitch structure made the dimension of pitch more salient at the expense of time. These findings support a dimensional salience framework in which the presence of organizational structure prioritizes the processing of the more structured dimension regardless of task relevance, independent from psychophysical difficulty, and impervious to attentional allocation.
Orenstein, Yaron; Wang, Yuhao; Berger, Bonnie
2016-06-15
Protein-RNA interactions, which play vital roles in many processes, are mediated through both RNA sequence and structure. CLIP-based methods, which measure protein-RNA binding in vivo, suffer from experimental noise and systematic biases, whereas in vitro experiments capture a clearer signal of protein-RNA binding. Among them, RNAcompete provides binding affinities of a specific protein to more than 240 000 unstructured RNA probes in one experiment. The computational challenge is to infer RNA structure- and sequence-based binding models from these data. The state-of-the-art in sequence models, Deepbind, does not model structural preferences. RNAcontext models both sequence and structure preferences, but is outperformed by GraphProt. Unfortunately, GraphProt cannot detect structural preferences from RNAcompete data due to the unstructured nature of the data, as noted by its developers, nor can it be tractably run on the full RNAcompete dataset. We develop RCK, an efficient, scalable algorithm that infers both sequence and structure preferences based on a new k-mer based model. Remarkably, even though RNAcompete data is designed to be unstructured, RCK can still learn structural preferences from it. RCK significantly outperforms both RNAcontext and Deepbind in in vitro binding prediction for 244 RNAcompete experiments. Moreover, RCK is also faster and uses less memory, which enables scalability. While currently on par with existing methods in in vivo binding prediction on a small scale test, we demonstrate that RCK will increasingly benefit from experimentally measured RNA structure profiles as compared to computationally predicted ones. By running RCK on the entire RNAcompete dataset, we generate and provide as a resource a set of protein-RNA structure-based models on an unprecedented scale. Software and models are freely available at http://rck.csail.mit.edu/ (contact: bab@mit.edu). Supplementary data are available at Bioinformatics online. © The Author 2016. 
Published by Oxford University Press.
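The basic building block of any k-mer based model, such as the one the RCK abstract above refers to, is enumerating the overlapping length-k substrings of a sequence. The sketch below shows only that counting step; it is not RCK's model, and the function name `count_kmers` is a hypothetical choice.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in the sequence.

    Sequences shorter than k yield an empty Counter.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    for kmer, count in sorted(count_kmers("ACGUACG", 3).items()):
        print(kmer, count)
```

A binding model would then attach learned parameters to k-mers (and, for structure-aware models, to their structural contexts), which this sketch does not attempt.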
Application of the AMPLE cluster-and-truncate approach to NMR structures for molecular replacement
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bibby, Jaclyn; Keegan, Ronan M.; Mayans, Olga
2013-11-01
Processing of NMR structures for molecular replacement by AMPLE works well. AMPLE is a program developed for clustering and truncating ab initio protein structure predictions into search models for molecular replacement. Here, it is shown that its core cluster-and-truncate methods also work well for processing NMR ensembles into search models. Rosetta remodelling helps to extend success to NMR structures bearing low sequence identity or high structural divergence from the target protein. Potential future routes to improved performance are considered and practical, general guidelines on using AMPLE are provided.
Maurer-Stroh, Sebastian; Gao, He; Han, Hao; Baeten, Lies; Schymkowitz, Joost; Rousseau, Frederic; Zhang, Louxin; Eisenhaber, Frank
2013-02-01
Data mining in protein databases, derivatives from more fundamental protein 3D structure and sequence databases, has considerable unearthed potential for the discovery of sequence motif--structural motif--function relationships as the finding of the U-shape (Huf-Zinc) motif, originally a small student's project, exemplifies. The metal ion zinc is critically involved in universal biological processes, ranging from protein-DNA complexes and transcription regulation to enzymatic catalysis and metabolic pathways. Proteins have evolved a series of motifs to specifically recognize and bind zinc ions. Many of these, so called zinc fingers, are structurally independent globular domains with discontinuous binding motifs made up of residues mostly far apart in sequence. Through a systematic approach starting from the BRIX structure fragment database, we discovered that there exists another predictable subset of zinc-binding motifs that not only have a conserved continuous sequence pattern but also share a characteristic local conformation, despite being included in totally different overall folds. While this does not allow general prediction of all Zn binding motifs, a HMM-based web server, Huf-Zinc, is available for prediction of these novel, as well as conventional, zinc finger motifs in protein sequences. The Huf-Zinc webserver can be freely accessed through this URL (http://mendel.bii.a-star.edu.sg/METHODS/hufzinc/).
Prediction of the translocon-mediated membrane insertion free energies of protein sequences.
Park, Yungki; Helms, Volkhard
2008-05-15
Helical membrane proteins (HMPs) play crucial roles in a variety of cellular processes. Unlike water-soluble proteins, HMPs need not only to fold but also get inserted into the membrane to be fully functional. This process of membrane insertion is mediated by the translocon complex. Thus, it is of great interest to develop computational methods for predicting the translocon-mediated membrane insertion free energies of protein sequences. We have developed Membrane Insertion (MINS), a novel sequence-based computational method for predicting the membrane insertion free energies of protein sequences. A benchmark test gives a correlation coefficient of 0.74 between predicted and observed free energies for 357 known cases, which corresponds to a mean unsigned error of 0.41 kcal/mol. These results are significantly better than those obtained by traditional hydropathy analysis. Moreover, the ability of MINS to reasonably predict membrane insertion free energies of protein sequences allows for effective identification of transmembrane (TM) segments. Subsequently, MINS was applied to predict the membrane insertion free energies of 316 TM segments found in known structures. An in-depth analysis of the predicted free energies reveals a number of interesting findings about the biogenesis and structural stability of HMPs. A web server for MINS is available at http://service.bioinformatik.uni-saarland.de/mins
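The benchmark above is summarized by a correlation coefficient and a mean unsigned error. As a small sketch of how such figures are computed from paired predicted and observed free energies, the code below assumes the correlation is Pearson's r (the abstract does not say which coefficient was used); the function names are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_unsigned_error(predicted, observed):
    """Average absolute difference, in the units of the inputs (e.g. kcal/mol)."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)
```

The two measures answer different questions: the correlation reports how well the ranking and trend are reproduced, while the mean unsigned error reports the typical size of the deviation in energy units.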
In vitro selection using a dual RNA library that allows primerless selection
Jarosch, Florian; Buchner, Klaus; Klussmann, Sven
2006-01-01
High-affinity target-binding aptamers are identified from random oligonucleotide libraries by an in vitro selection process called Systematic Evolution of Ligands by EXponential enrichment (SELEX). Since the SELEX process includes a PCR amplification step, the randomized region of the oligonucleotide libraries needs to be flanked by two fixed primer binding sequences. These primer binding sites are often difficult to truncate because they may be necessary to maintain the structure of the aptamer or may even be part of the target binding motif. We designed a novel type of RNA library that carries fixed sequences which constrain the oligonucleotides into a partly double-stranded structure, thereby minimizing the risk that the primer binding sequences become part of the target-binding motif. Moreover, the specific design of the library, including the use of tandem RNA polymerase promoters, allows the selection of oligonucleotides without any primer binding sequences. The library was used to select aptamers to the mirror-image peptide of ghrelin. Ghrelin is a potent stimulator of growth-hormone release and food intake. After selection, the identified aptamer sequences were directly synthesized in their mirror-image configuration. The final 44 nt Spiegelmer, named NOX-B11-3, blocks ghrelin action in a cell culture assay, displaying an IC50 of 4.5 nM at 37°C. PMID:16855281
The influence of focused-attention meditation states on the cognitive control of sequence learning.
Chan, Russell W; Immink, Maarten A; Lushington, Kurt
2017-10-01
Cognitive control processes influence how motor sequence information is utilised and represented. Since cognitive control processes are shared amongst goal-oriented tasks, motor sequence learning and performance might be influenced by preceding cognitive tasks such as focused-attention meditation (FAM). Prior to a serial reaction time task (SRTT), participants completed either a single-session of FAM, a single-session of FAM followed by delay (FAM+) or no meditation (CONTROL). Relative to CONTROL, FAM benefitted performance in early, random-ordered blocks. However, across subsequent sequence learning blocks, FAM+ supported the highest levels of performance improvement resulting in superior performance at the end of the SRTT. Performance following FAM+ demonstrated greater reliance on embedded sequence structures than FAM. These findings illustrate that increased top-down control immediately after FAM biases the implementation of stimulus-based planning. Introduction of a delay following FAM relaxes top-down control allowing for implementation of response-based planning resulting in sequence learning benefits. Copyright © 2017 Elsevier Inc. All rights reserved.
Zhang, Bo; Wu, Wen-Qiang; Liu, Na-Nv; Duan, Xiao-Lei; Li, Ming; Dou, Shuo-Xing; Hou, Xi-Miao; Xi, Xu-Guang
2016-01-01
Alternative DNA structures that deviate from B-form double-stranded DNA such as G-quadruplex (G4) DNA can be formed by G-rich sequences that are widely distributed throughout the human genome. We have previously shown that Pif1p not only unfolds G4, but also unwinds the downstream duplex DNA in a G4-stimulated manner. In the present study, we further characterized the G4-stimulated duplex DNA unwinding phenomenon by means of single-molecule fluorescence resonance energy transfer. It was found that Pif1p did not unwind the partial duplex DNA immediately after unfolding the upstream G4 structure, but rather, it would dwell at the ss/dsDNA junction with a ‘waiting time’. Further studies revealed that the waiting time was in fact related to a protein dimerization process that was sensitive to ssDNA sequence and would become rapid if the sequence is G-rich. Furthermore, we identified that the G-rich sequence, as the G4 structure, equally stimulates duplex DNA unwinding. The present work sheds new light on the molecular mechanism by which G4-unwinding helicase Pif1p resolves physiological G4/duplex DNA structures in cells. PMID:27471032
2014-01-01
Background: Due to rapid sequencing of genomes, there are now millions of deposited protein sequences with no known function. Fast sequence-based comparisons allow detecting close homologs for a protein of interest to transfer functional information from the homologs to the given protein. Sequence-based comparison cannot detect remote homologs, in which evolution has adjusted the sequence while largely preserving structure. Structure-based comparisons can detect remote homologs, but most methods for doing so are too expensive to apply at a large scale over structural databases of proteins. Recently, fragment-based structural representations have been proposed that allow fast detection of remote homologs with reasonable accuracy. These representations have also been used to obtain linearly-reducible maps of protein structure space. It has been shown, and is additionally supported by the analysis in this paper, that such maps preserve functional co-localization of the protein structure space. Methods: Inspired by a recent application of the Latent Dirichlet Allocation (LDA) model for conducting structural comparisons of proteins, we propose higher-order LDA-obtained topic-based representations of protein structures to provide an alternative route for remote homology detection and organization of the protein structure space in a few dimensions. Various techniques based on natural language processing are proposed and employed to aid the analysis of topics in the protein structure domain. Results: We show that a topic-based representation is just as effective as a fragment-based one at automated detection of remote homologs and organization of protein structure space. We conduct a detailed analysis of the information content in the topic-based representation, showing that topics have semantic meaning. The fragment-based and topic-based representations are also shown to allow prediction of superfamily membership.
Conclusions: This work opens exciting avenues in designing novel representations to extract information about protein structures, as well as organizing and mining protein structure space with mature text mining tools. PMID:25080993
Choosing order of operations to accelerate strip structure analysis in parameter range
NASA Astrophysics Data System (ADS)
Kuksenko, S. P.; Akhunov, R. R.; Gazizov, T. R.
2018-05-01
The paper considers the issue of using iteration methods in solving the sequence of linear algebraic systems obtained in quasistatic analysis of strip structures with the method of moments. Using the analysis of 4 strip structures, the authors have proved that additional acceleration (up to 2.21 times) of the iterative process can be obtained during the process of solving linear systems repeatedly by means of choosing a proper order of operations and a preconditioner. The obtained results can be used to accelerate the process of computer-aided design of various strip structures. The choice of the order of operations to accelerate the process is quite simple, universal and could be used not only for strip structure analysis but also for a wide range of computational problems.
Computational studies of sequence-specific driving forces in peptide self-assembly
NASA Astrophysics Data System (ADS)
Jeon, Joohyun
Peptides are biopolymers made from various sequences of twenty different types of amino acids, connected by peptide bonds. There are practically an infinite number of possible sequences and tremendous possible combinations of peptide-peptide interactions. Recently, an increasing number of studies have shown a stark variety of peptide self-assembled nanomaterials whose detailed structures depend on their sequences and environmental factors; these have end uses in medical and bio-electronic applications, for example. To understand the underlying physics of complex peptide self-assembly processes and to delineate sequence specific effects, in this study, I use various simulation tools spanning all-atom molecular dynamics to simple lattice models and quantify the balance of interactions in the peptide self-assembly processes. In contrast to the existing view that peptides' aggregation propensities are proportional to the net sequence hydrophobicity and inversely proportional to the net charge, I show the more nuanced effects of electrostatic interactions, including the cooperative effects between hydrophobic and electrostatic interactions. Notably, I suggest rather unexpected, yet important roles of entropies in the small scale oligomerization processes. Overall, this study broadens our understanding of the role of thermodynamic driving forces in peptide self-assembly.
FANCJ promotes DNA synthesis through G-quadruplex structures
Castillo Bosch, Pau; Segura-Bayona, Sandra; Koole, Wouter; van Heteren, Jane T; Dewar, James M; Tijsterman, Marcel; Knipscheer, Puck
2014-01-01
Our genome contains many G-rich sequences, which have the propensity to fold into stable secondary DNA structures called G4 or G-quadruplex structures. These structures have been implicated in cellular processes such as gene regulation and telomere maintenance. However, G4 sequences are prone to mutations particularly upon replication stress or in the absence of specific helicases. To investigate how G-quadruplex structures are resolved during DNA replication, we developed a model system using ssDNA templates and Xenopus egg extracts that recapitulates eukaryotic G4 replication. Here, we show that G-quadruplex structures form a barrier for DNA replication. Nascent strand synthesis is blocked at one or two nucleotides from the G4. After transient stalling, G-quadruplexes are efficiently unwound and replicated. In contrast, depletion of the FANCJ/BRIP1 helicase causes persistent replication stalling at G-quadruplex structures, demonstrating a vital role for this helicase in resolving these structures. FANCJ performs this function independently of the classical Fanconi anemia pathway. These data provide evidence that the G4 sequence instability in FANCJ−/− cells and Fancj/dog1 deficient C. elegans is caused by replication stalling at G-quadruplexes. PMID:25193968
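The abstract above refers to G-rich sequences with the propensity to fold into G-quadruplexes but does not say how such sequences are located. A common heuristic, assumed here and not taken from the paper, is to look for four runs of at least three guanines separated by loops of one to seven nucleotides. A minimal sketch, with hypothetical names:

```python
import re

# Assumed heuristic: four G-runs (>= 3 G each) separated by 1-7 nt loops.
G4_PATTERN = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")


def find_putative_g4(seq: str):
    """Return (start, end, motif) for non-overlapping matches on the given strand."""
    return [(m.start(), m.end(), m.group())
            for m in G4_PATTERN.finditer(seq.upper())]
```

The search covers only the strand supplied; a motif on the opposite strand would appear as the corresponding C-rich pattern, which this sketch does not scan for. A match indicates sequence potential only, not that a quadruplex actually forms.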
Zhang, Yiming; Jin, Quan; Wang, Shuting; Ren, Ren
2011-05-01
The mobile behavior of 1481 peptides in ion mobility spectrometry (IMS), which are generated by protease digestion of the Drosophila melanogaster proteome, is modeled and predicted based on two different types of characterization methods, i.e. a sequence-based approach and a structure-based approach. In this procedure, the sequence-based approach considers both the amino acid composition of a peptide and the local environment profile of each amino acid in the peptide; the structure-based approach is performed with the CODESSA protocol, which regards a peptide as a common organic compound and generates more than 200 statistically significant variables to characterize the whole structure profile of a peptide molecule. Subsequently, the nonlinear support vector machine (SVM) and Gaussian process (GP) as well as linear partial least squares (PLS) regression are employed to correlate the structural parameters of the characterizations with the IMS drift times of these peptides. The obtained quantitative structure-spectrum relationship (QSSR) models are evaluated rigorously and investigated systematically via both one-deep and two-deep cross-validations as well as the rigorous Monte Carlo cross-validation (MCCV). We also give a comprehensive comparison of the resulting statistics arising from the different combinations of variable types with modeling methods and find that the sequence-based approach can give QSSR models with better fitting ability and predictive power but worse interpretability than the structure-based approach. In addition, because QSSR modeling with the sequence-based approach does not require preparing energy-minimized peptide structures before the modeling, it is considerably more efficient than modeling with the structure-based approach. Copyright © 2011 Elsevier Ltd. All rights reserved.
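Monte Carlo cross-validation, mentioned above, repeatedly splits the data at random into training and test parts and averages the test error. The sketch below illustrates the procedure only; it substitutes a one-descriptor least-squares line for the SVM, GP, and PLS models of the study, and the split count, test fraction, and function names are assumptions.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def mccv_rmse(xs, ys, n_splits=100, test_fraction=0.3, seed=0):
    """Mean test RMSE over repeated random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_test = max(1, int(round(test_fraction * len(xs))))
    rmses = []
    for _ in range(n_splits):
        rng.shuffle(indices)  # a fresh random split on every repeat
        test, train = indices[:n_test], indices[n_test:]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        sq_err = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test]
        rmses.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(rmses) / len(rmses)
```

Unlike k-fold cross-validation, the random splits may overlap between repeats, so a sample can appear in several test sets or, by chance, in none.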
Unified Deep Learning Architecture for Modeling Biology Sequence.
Wu, Hongjie; Cao, Chengyuan; Xia, Xiaoyan; Lu, Qiang
2017-10-09
Prediction of the spatial structure or function of biological macromolecules based on their sequence remains an important challenge in bioinformatics. When modeling biological sequences using traditional sequencing models, characteristics, such as long-range interactions between basic units, the complicated and variable output of labeled structures, and the variable length of biological sequences, usually lead to different solutions on a case-by-case basis. This study proposed the use of bidirectional recurrent neural networks based on long short-term memory or a gated recurrent unit to capture long-range interactions by designing the optional reshape operator to adapt to the diversity of the output labels and implementing a training algorithm to support the training of sequence models capable of processing variable-length sequences. Additionally, the merge and pooling operators enhanced the ability to capture short-range interactions between basic units of biological sequences. The proposed deep-learning model and its training algorithm might be capable of solving currently known biological sequence-modeling problems through the use of a unified framework. We validated our model on one of the most difficult biological sequence-modeling problems currently known, with our results indicating the ability of the model to obtain predictions of protein residue interactions that exceeded the accuracy of current popular approaches by 10% based on multiple benchmarks.
NASA Astrophysics Data System (ADS)
Liu, Feng-xiang; Liu, Rang-su; Hou, Zhao-yang; Liu, Hai-Rong; Tian, Ze-an; Zhou, Li-li
2009-02-01
The rapid solidification processes of Al50Mg50 liquid alloy consisting of 50,000 atoms have been simulated using the molecular dynamics method based on the effective pair potential derived from the pseudopotential theory. The formation mechanisms of atomic clusters during the rapid solidification processes have been investigated by adopting a new cluster description method, the cluster-type index method (CTIM). The simulated partial structure factors are in good agreement with the experimental results. An Al-Mg amorphous structure characterized by Al-centered icosahedral topological short-range order (SRO) is found to form during the rapid solidification processes. The icosahedral cluster plays a key role in the microstructure transition. It is also found that the size distribution of various clusters in the system presents a magic number sequence of 13, 19, 23, 25, 29, 31, 33, 37, …. The magic clusters are more stable and mainly correspond to the non-compact arrangements of linked icosahedra in the form of rings, chains or dendrites. Each magic number point corresponds to one particular combining form of icosahedra. This magic number sequence is different from that generated in the solidification structure of liquid Al and from those obtained by methods such as gaseous deposition and ionic spray.
Antunes, Deborah; Jorge, Natasha A. N.; Caffarena, Ernesto R.; Passetti, Fabio
2018-01-01
RNA molecules are essential players in many fundamental biological processes. Prokaryotes and eukaryotes have distinct RNA classes with specific structural features and functional roles. Computational prediction of protein structures is a research field in which high-confidence three-dimensional protein models can be proposed based on the sequence alignment between target and templates. However, to date, only a few approaches have been developed for the computational prediction of RNA structures. Similar to proteins, RNA structures may be altered due to the interaction with various ligands, including proteins, other RNAs, and metabolites. A riboswitch is a molecular mechanism, found in the three kingdoms of life, in which the RNA structure is modified by the binding of a metabolite. It can regulate multiple gene expression mechanisms, such as transcription, translation initiation, and mRNA splicing and processing. Due to their nature, these entities also act on the regulation of gene expression and detection of small metabolites and have the potential to help in the discovery of new classes of antimicrobial agents. In this review, we describe software and web servers currently available for riboswitch aptamer identification and secondary and tertiary structure prediction, including applications. PMID:29403526
Smola, Matthew J.; Rice, Greggory M.; Busan, Steven; Siegfried, Nathan A.; Weeks, Kevin M.
2016-01-01
SHAPE chemistries exploit small electrophilic reagents that react with the 2′-hydroxyl group to interrogate RNA structure at single-nucleotide resolution. Mutational profiling (MaP) identifies modified residues based on the ability of reverse transcriptase to misread a SHAPE-modified nucleotide and then counting the resulting mutations by massively parallel sequencing. The SHAPE-MaP approach measures the structure of large and transcriptome-wide systems as accurately as for simple model RNAs. This protocol describes the experimental steps, implemented over three days, required to perform SHAPE probing and construct multiplexed SHAPE-MaP libraries suitable for deep sequencing. These steps include RNA folding and SHAPE structure probing, mutational profiling by reverse transcription, library construction, and sequencing. Automated processing of MaP sequencing data is accomplished using two software packages. ShapeMapper converts raw sequencing files into mutational profiles, creates SHAPE reactivity plots, and provides useful troubleshooting information, often within an hour. SuperFold uses these data to model RNA secondary structures, identify regions with well-defined structures, and visualize probable and alternative helices, often in under a day. We illustrate these algorithms with the E. coli thiamine pyrophosphate riboswitch, E. coli 16S rRNA, and HIV-1 genomic RNAs. SHAPE-MaP can be used to make nucleotide-resolution biophysical measurements of individual RNA motifs, rare components of complex RNA ensembles, and entire transcriptomes. The straightforward MaP strategy greatly expands the number, length, and complexity of analyzable RNA structures. PMID:26426499
Sources of PCR-induced distortions in high-throughput sequencing data sets
Kebschull, Justus M.; Zador, Anthony M.
2015-01-01
PCR permits the exponential and sequence-specific amplification of DNA, even from minute starting quantities. PCR is a fundamental step in preparing DNA samples for high-throughput sequencing. However, there are errors associated with PCR-mediated amplification. Here we examine the effects of four important sources of error—bias, stochasticity, template switches and polymerase errors—on sequence representation in low-input next-generation sequencing libraries. We designed a pool of diverse PCR amplicons with a defined structure, and then used Illumina sequencing to search for signatures of each process. We further developed quantitative models for each process, and compared predictions of these models to our experimental data. We find that PCR stochasticity is the major force skewing sequence representation after amplification of a pool of unique DNA amplicons. Polymerase errors become very common in later cycles of PCR but have little impact on the overall sequence distribution as they are confined to small copy numbers. PCR template switches are rare and confined to low copy numbers. Our results provide a theoretical basis for removing distortions from high-throughput sequencing data. In addition, our findings on PCR stochasticity will have particular relevance to quantification of results from single cell sequencing, in which sequences are represented by only one or a few molecules. PMID:26187991
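The abstract above reports that PCR stochasticity dominates the skew in sequence representation when unique molecules are amplified. The quantitative models of the paper are not reproduced here; the following is a minimal sketch of one simple way to picture the effect, assuming each molecule is copied independently with a fixed per-cycle probability. The efficiency value, pool size, and function names are all hypothetical.

```python
import random


def amplify(copies: int, cycles: int, efficiency: float, rng: random.Random) -> int:
    """Simulate PCR cycles; each molecule is duplicated with probability `efficiency`."""
    for _ in range(cycles):
        copies += sum(1 for _ in range(copies) if rng.random() < efficiency)
    return copies


def simulate_pool(n_amplicons=50, cycles=10, efficiency=0.8, seed=1):
    """Amplify a pool in which every amplicon starts as a single molecule."""
    rng = random.Random(seed)
    return [amplify(1, cycles, efficiency, rng) for _ in range(n_amplicons)]
```

Although every amplicon starts with one molecule and the same efficiency, the final copy numbers differ, because a failed duplication in an early cycle is inherited by all later cycles. With an efficiency of exactly 1.0 the process becomes deterministic and every amplicon doubles in each cycle.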
Ciotlos, Serban; Mao, Qing; Zhang, Rebecca Yu; Li, Zhenyu; Chin, Robert; Gulbahce, Natali; Liu, Sophie Jia; Drmanac, Radoje; Peters, Brock A
2016-01-01
The cell line BT-474 is a popular cell line for studying the biology of cancer and developing novel drugs. However, there is no complete, published genome sequence for this highly utilized scientific resource. In this study we sought to provide a comprehensive and useful data set for the scientific community by generating a whole genome sequence for BT-474. Five μg of genomic DNA, isolated from an early passage of the BT-474 cell line, was used to generate a whole genome sequence (114X coverage) using Complete Genomics' standard sequencing process. To provide additional variant phasing and structural variation data we also processed and analyzed two separate libraries of 5 and 6 individual cells to depths of 99X and 87X, respectively, using Complete Genomics' Long Fragment Read (LFR) technology. BT-474 is a highly aneuploid cell line with an extremely complex genome sequence. This ~300X total coverage genome sequence provides a more complete understanding of this highly utilized cell line at the genomic level.
Hazes, Bart
2014-02-28
Protein-coding DNA sequences and their corresponding amino acid sequences are routinely used to study relationships between sequence, structure, function, and evolution. The rapidly growing size of sequence databases increases the power of such comparative analyses but it makes it more challenging to prepare high quality sequence data sets with control over redundancy, quality, completeness, formatting, and labeling. Software tools for some individual steps in this process exist but manual intervention remains a common and time consuming necessity. CDSbank is a database that stores both the protein-coding DNA sequence (CDS) and amino acid sequence for each protein annotated in Genbank. CDSbank also stores Genbank feature annotation, a flag to indicate incomplete 5' and 3' ends, full taxonomic data, and a heuristic to rank the scientific interest of each species. This rich information allows fully automated data set preparation with a level of sophistication that aims to meet or exceed manual processing. Defaults ensure ease of use for typical scenarios while allowing great flexibility when needed. Access is via a free web server at http://hazeslab.med.ualberta.ca/CDSbank/. CDSbank presents a user-friendly web server to download, filter, format, and name large sequence data sets. Common usage scenarios can be accessed via pre-programmed default choices, while optional sections give full control over the processing pipeline. Particular strengths are: extract protein-coding DNA sequences just as easily as amino acid sequences, full access to taxonomy for labeling and filtering, awareness of incomplete sequences, and the ability to take one protein sequence and extract all synonymous CDS or identical protein sequences in other species. Finally, CDSbank can also create labeled property files to, for instance, annotate or re-label phylogenetic trees.
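To make the relationship between a CDS, its protein, and the idea of incomplete ends concrete, here is a small sketch using the standard genetic code. It is not CDSbank code: the completeness check (an ATG start codon and an in-frame stop codon) is a simplifying assumption made for illustration, and all names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def translate(cds: str) -> str:
    """Translate a CDS codon by codon; unknown codons become 'X'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                   for i in range(0, usable, 3))


def end_flags(cds: str) -> dict:
    """Assumed heuristic: complete ends mean ATG start and in-frame stop."""
    cds = cds.upper()
    return {
        "complete_5prime": cds.startswith("ATG"),
        "complete_3prime": len(cds) % 3 == 0 and cds[-3:] in STOP_CODONS,
    }
```

Two different CDSs such as `ATGGCCTAA` and `ATGGCTTAG` translate to the same protein (`MA*`), which is the sense in which one protein sequence can have several synonymous coding sequences.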
Robakis, Thalia; Bak, Beata; Lin, Shu-huei; Bernard, Daniel J.; Scheiffele, Peter
2008-01-01
Precursor proteolysis is a crucial mechanism for regulating protein structure and function. Signal peptidase (SP) is an enzyme with a well-defined role in cleaving N-terminal signal sequences but no demonstrated function in the proteolysis of cellular precursor proteins. We provide evidence that SP mediates intraprotein cleavage of IgSF1, a large cellular Ig domain protein that is processed into two separate Ig domain proteins. In addition, our results suggest the involvement of signal peptide peptidase (SPP), an intramembrane protease, which acts on substrates that have been previously cleaved by SP. We show that IgSF1 is processed through sequential proteolysis by SP and SPP. Cleavage is directed by an internal signal sequence and generates two separate Ig domain proteins from a polytopic precursor. Our findings suggest that SP and SPP are not restricted to N-terminal signal sequence cleavage but also contribute to the processing of cellular transmembrane proteins. PMID:18981173
Use of simulated data sets to evaluate the fidelity of metagenomic processing methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mavromatis, K; Ivanova, N; Barry, Kerrie
2007-01-01
Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene-finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (BLAST hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.
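The simulated data sets above were built by combining reads randomly selected from isolate genomes, so that the true origin of every read is known. The study drew on real sequencing reads; the sketch below only illustrates the sampling idea by cutting fixed-length substrings from genome strings, and the abundance weights, read length, and function name are assumptions.

```python
import random


def sample_reads(genomes: dict, n_reads: int, read_len: int,
                 abundances: dict = None, seed: int = 0):
    """Draw reads at random positions, keeping the source genome as a label.

    `abundances` (hypothetical) sets the relative chance of picking each
    genome; by default all genomes are equally likely.
    """
    rng = random.Random(seed)
    names = [name for name, seq in genomes.items() if len(seq) >= read_len]
    if not names:
        raise ValueError("no genome is long enough for the requested read length")
    weights = [1.0 if abundances is None else abundances.get(name, 0.0)
               for name in names]
    reads = []
    for _ in range(n_reads):
        name = rng.choices(names, weights=weights)[0]
        start = rng.randrange(len(genomes[name]) - read_len + 1)
        reads.append((name, start, genomes[name][start:start + read_len]))
    return reads
```

Keeping the genome name and start position with each read is what allows a later assembly or binning result to be scored against the known origin. Skewing `abundances` toward one genome gives a community dominated by a single organism, one way of varying complexity.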
The Histone Database: an integrated resource for histones and histone fold-containing proteins
Mariño-Ramírez, Leonardo; Levine, Kevin M.; Morales, Mario; Zhang, Suiyuan; Moreland, R. Travis; Baxevanis, Andreas D.; Landsman, David
2011-01-01
Eukaryotic chromatin is composed of DNA and protein components—core histones—that act to compactly pack the DNA into nucleosomes, the fundamental building blocks of chromatin. These nucleosomes are connected to adjacent nucleosomes by linker histones. Nucleosomes are highly dynamic and, through various core histone post-translational modifications and incorporation of diverse histone variants, can serve as epigenetic marks to control processes such as gene expression and recombination. The Histone Sequence Database is a curated collection of sequences and structures of histones and non-histone proteins containing histone folds, assembled from major public databases. Here, we report a substantial increase in the number of sequences and taxonomic coverage for histone and histone fold-containing proteins available in the database. Additionally, the database now contains an expanded dataset that includes archaeal histone sequences. The database also provides comprehensive multiple sequence alignments for each of the four core histones (H2A, H2B, H3 and H4), the linker histones (H1/H5) and the archaeal histones. The database also includes current information on solved histone fold-containing structures. The Histone Sequence Database is an inclusive resource for the analysis of chromatin structure and function focused on histones and histone fold-containing proteins. Database URL: The Histone Sequence Database is freely available and can be accessed at http://research.nhgri.nih.gov/histones/. PMID:22025671
Genetic variability and evolutionary dynamics of viruses of the family Closteroviridae
Rubio, Luis; Guerri, José; Moreno, Pedro
2013-01-01
RNA viruses have a great potential for genetic variation, rapid evolution and adaptation. Characterization of the genetic variation of viral populations provides relevant information on the processes involved in virus evolution and epidemiology, and it is crucial for designing reliable diagnostic tools and developing efficient and durable disease control strategies. Here we performed an updated analysis of sequences available in Genbank and reviewed present knowledge on the genetic variability and evolutionary processes of viruses of the family Closteroviridae. Several factors have shaped the genetic structure and diversity of closteroviruses. (1) A strong negative selection seems to be responsible for the high genetic stability in space and time for some viruses. (2) Long-distance migration, probably by human transport of infected propagative plant material, has resulted in genetically similar virus isolates being found in distant geographical regions. (3) Recombination between divergent sequence variants has generated new genotypes and plays an important role in the evolution of some viruses of the family Closteroviridae. (4) Interaction between virus strains or between different viruses in mixed infections may alter accumulation of certain strains. (5) Host change or virus transmission by insect vectors induced changes in the viral population structure due to positive selection of sequence variants with higher fitness for host-virus or vector-virus interaction (adaptation) or by genetic drift due to random selection of sequence variants during the population bottleneck associated with the transmission process. PMID:23805130
Sequence of the chloroplast 16S rRNA gene and its surrounding regions of Chlamydomonas reinhardii.
Dron, M; Rahire, M; Rochaix, J D
1982-01-01
The sequence of a 2 kb DNA fragment containing the chloroplast 16S ribosomal RNA gene from Chlamydomonas reinhardii and its flanking regions has been determined. The algal 16S rRNA sequence (1475 nucleotides) and secondary structure are highly related to those found in bacteria and in the chloroplasts of higher plants. In contrast, the flanking regions are very different. In C. reinhardii the 16S rRNA gene is surrounded by AT-rich segments of about 180 bases, which are followed by a long stretch of complementary bases separated from each other by 1833 nucleotides. It is likely that these structures play an important role in the folding and processing of the precursor of 16S rRNA. The primary and secondary structures of the binding sites of two ribosomal proteins in the 16S rRNAs of E. coli and C. reinhardii are considerably related. PMID:6296784
Seligmann, Hervé
2016-01-01
In mitochondria, secondary structures punctuate post-transcriptional RNA processing. Recently described transcripts, called delRNAs, match the human mitogenome after systematic deletion of every 4th nucleotide or of every 4th and 5th nucleotide. Here I explore predicted stem-loop hairpin formation by delRNAs, and their associations with delRNA transcription and with detected peptides matching their translation. Despite missing 25% and 40%, respectively, of the nucleotides in the original sequence, del-transformed sequences form significantly more secondary structures than corresponding randomly shuffled sequences, indicating biological function, independently of, and in combination with, previously detected delRNAs and the peptides translated from them. Self-hybridization decreases delRNA abundances, indicating downregulation. Systematic deletions of the human mitogenome reveal new, unsuspected coding and structural information. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
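The del-transformation described in this abstract can be illustrated with a minimal sketch. It removes every 4th nucleotide, or the 4th and 5th nucleotide of each 5-nucleotide window, from a sequence. The helper name, the toy sequence, and the choice of starting frame are assumptions made for illustration only; the abstract does not specify them.

```python
def delete_systematically(seq, period, drop):
    """Remove the 1-based in-window positions listed in `drop` from every
    consecutive window of `period` nucleotides.
    Hypothetical helper; the frame is assumed to start at the first base."""
    return "".join(
        base for i, base in enumerate(seq) if (i % period) + 1 not in drop
    )

toy = "ACGTACGTACGTACGTACGT"  # made-up 20-nt sequence, not mitogenome data

# Drop every 4th nucleotide: 25% of the bases are removed.
every_4th = delete_systematically(toy, period=4, drop={4})

# Drop the 4th and 5th nucleotide of each 5-nt window: 40% are removed.
fourth_and_fifth = delete_systematically(toy, period=5, drop={4, 5})

print(len(toy), len(every_4th), len(fourth_and_fifth))
```

The length ratios reproduce the 25% and 40% losses quoted in the abstract. Folding the transformed sequences would need a separate secondary-structure predictor, which is outside this sketch.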
Lopopolo, Alessandro; Frank, Stefan L; van den Bosch, Antal; Willems, Roel M
2017-01-01
Language comprehension involves the simultaneous processing of information at the phonological, syntactic, and lexical level. We track these three distinct streams of information in the brain by using stochastic measures derived from computational language models to detect neural correlates of phoneme, part-of-speech, and word processing in an fMRI experiment. Probabilistic language models have proven to be useful tools for studying how language is processed as a sequence of symbols unfolding in time. Conditional probabilities between sequences of words are at the basis of probabilistic measures such as surprisal and perplexity which have been successfully used as predictors of several behavioural and neural correlates of sentence processing. Here we computed perplexity from sequences of words and their parts of speech, and their phonemic transcriptions. Brain activity time-locked to each word is regressed on the three model-derived measures. We observe that the brain keeps track of the statistical structure of lexical, syntactic and phonological information in distinct areas.
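Surprisal and perplexity, as mentioned in this abstract, follow directly from conditional probabilities. The sketch below uses a toy maximum-likelihood bigram model to show the generic definitions. The function names, the toy sentence, and the absence of smoothing are assumptions; this is not the language model used in the study.

```python
import math
from collections import Counter

def bigram_model(tokens):
    """Maximum-likelihood estimates of P(word | previous word), no smoothing."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter(tokens[:-1])
    return {pair: n / context_counts[pair[0]] for pair, n in pair_counts.items()}

def surprisal(model, prev, word):
    """Surprisal in bits: -log2 P(word | prev)."""
    return -math.log2(model[(prev, word)])

def perplexity(model, tokens):
    """Two to the power of the mean per-word surprisal over the sequence."""
    pairs = list(zip(tokens, tokens[1:]))
    total = sum(surprisal(model, p, w) for p, w in pairs)
    return 2 ** (total / len(pairs))

toy = "the dog saw the cat".split()  # made-up example sentence
model = bigram_model(toy)

print(surprisal(model, "the", "dog"))  # "the" is followed by "dog" half the time
print(perplexity(model, toy))
```

The same functions apply unchanged to sequences of part-of-speech tags or phonemes, since they only assume a list of symbols.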
NASA Astrophysics Data System (ADS)
Tang, Le; Zhu, Songling; Mastriani, Emilio; Fang, Xin; Zhou, Yu-Jie; Li, Yong-Guo; Johnston, Randal N.; Guo, Zheng; Liu, Gui-Rong; Liu, Shu-Lin
2017-03-01
Highly conserved short sequences help identify functional genomic regions and facilitate genomic annotation. We used Salmonella as the model to search the genome for evolutionarily conserved regions and focused on the tetranucleotide sequence CTAG for its potentially important functions. In Salmonella, CTAG is highly conserved across the lineages and large numbers of CTAG-containing short sequences fall in intergenic regions, strongly indicating their biological importance. Computer modeling demonstrated stable stem-loop structures in some of the CTAG-containing intergenic regions, and substitution of a nucleotide of the CTAG sequence would radically rearrange the free energy and disrupt the structure. The postulated degeneration of CTAG takes distinct patterns among Salmonella lineages and provides novel information about genomic divergence and evolution of these bacterial pathogens. Comparison of the vertically and horizontally transmitted genomic segments showed different CTAG distribution landscapes, with the genome amelioration process to remove CTAG taking place inward from both terminals of the horizontally acquired segment.
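Locating a tetranucleotide such as CTAG in a genome is the kind of scan this abstract builds on, and it can be sketched as follows. The function name and the toy sequence are assumptions, and this is not the authors' pipeline. Because CTAG is its own reverse complement, scanning one strand is sufficient.

```python
def motif_positions(seq, motif="CTAG"):
    """Return 0-based start positions of every occurrence of `motif`,
    including overlapping ones. Hypothetical helper for illustration."""
    seq = seq.upper()
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == motif]

toy = "ACTAGGCTAGTT"  # made-up sequence, not Salmonella data

positions = motif_positions(toy)
density_per_kb = 1000 * len(positions) / len(toy)

print(positions, density_per_kb)
```

Comparing such positions against gene coordinates from an annotation file would show how many occurrences fall in intergenic regions. The annotation step is left out here because it depends on the file format in use.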
Rapid phylogenetic dissection of prokaryotic community structure in tidal flat using pyrosequencing.
Kim, Bong-Soo; Kim, Byung Kwon; Lee, Jae-Hak; Kim, Myungjin; Lim, Young Woon; Chun, Jongsik
2008-08-01
Dissection of prokaryotic community structure is a prerequisite to understanding the ecological roles of prokaryotes. Various methods are available for this purpose, among which amplification and sequencing of 16S rRNA genes has gained popularity. However, conventional methods based on the Sanger sequencing technique require a cloning step prior to sequencing, and are expensive and labor-intensive. We investigated prokaryotic community structure in tidal flat sediments, Korea, using pyrosequencing and a subsequent automated bioinformatic pipeline for the rapid and accurate taxonomic assignment of each amplicon. The combination of pyrosequencing and bioinformatic analysis showed that bacterial and archaeal communities were more diverse than previously reported in clone library studies. Pyrosequencing analysis revealed 21 bacterial divisions and 37 candidate divisions. Proteobacteria was the most abundant division in the bacterial community, of which Gamma- and Delta-Proteobacteria were the most abundant. Similarly, 4 archaeal divisions were found in tidal flat sediments. Euryarchaeota was the most abundant division in the archaeal sequences, which were further divided into 8 classes and 11 unclassified euryarchaeota groups. The system developed here provides a simple, in-depth and automated way of dissecting a prokaryotic community structure without extensive pretreatment such as cloning.
Effect of the SH3-SH2 domain linker sequence on the structure of Hck kinase.
Meiselbach, Heike; Sticht, Heinrich
2011-08-01
The coordination of activity in biological systems requires the existence of different signal transduction pathways that interact with one another and must be precisely regulated. The Src-family tyrosine kinases, which are found in many signaling pathways, differ in their physiological function despite their high overall structural similarity. In this context, the differences in the SH3-SH2 domain linkers might play a role for differential regulation, but the structural consequences of linker sequence remain poorly understood. We have therefore performed comparative molecular dynamics simulations of wildtype Hck and of a mutant Hck in which the SH3-SH2 domain linker is replaced by the corresponding sequence from the homologous kinase Lck. These simulations reveal that linker replacement not only affects the orientation of the SH3 domain itself, but also leads to an alternative conformation of the activation segment in the Hck kinase domain. The sequence of the SH3-SH2 domain linker thus exerts a remote effect on the active site geometry and might therefore play a role in modulating the structure of the inactive kinase or in fine-tuning the activation process itself.
Shi, Yuqian; Hellinga, Homme W.; Beese, Lorena S.
2017-01-01
Human exonuclease 1 (hExo1) is a member of the RAD2/XPG structure-specific 5′-nuclease superfamily. Its dominant, processive 5′–3′ exonuclease and secondary 5′-flap endonuclease activities participate in various DNA repair, recombination, and replication processes. A single active site processes both recessed ends and 5′-flap substrates. By initiating enzyme reactions in crystals, we have trapped hExo1 reaction intermediates that reveal structures of these substrates before and after their exo- and endonucleolytic cleavage, as well as structures of uncleaved, unthreaded, and partially threaded 5′ flaps. Their distinctive 5′ ends are accommodated by a small, mobile arch in the active site that binds recessed ends at its base and threads 5′ flaps through a narrow aperture within its interior. A sequence of successive, interlocking conformational changes guides the two substrate types into a shared reaction mechanism that catalyzes their cleavage by an elaborated variant of the two-metal, in-line hydrolysis mechanism. Coupling of substrate-dependent arch motions to transition-state stabilization suppresses inappropriate or premature cleavage, enhancing processing fidelity. The striking reduction in flap conformational entropy is catalyzed, in part, by arch motions and transient binding interactions between the flap and unprocessed DNA strand. At the end of the observed reaction sequence, hExo1 resets without relinquishing DNA binding, suggesting a structural basis for its processivity. PMID:28533382
Tateishi-Karimata, Hisae; Isono, Noburu; Sugimoto, Naoki
2014-01-01
The thermal stability and topology of non-canonical structures of G-quadruplexes and hairpins in template DNA were investigated, and the effect of non-canonical structures on transcription fidelity was evaluated quantitatively. We designed ten template DNAs: a linear sequence that does not have significant higher-order structure, three sequences that form hairpin structures, and six sequences that form G-quadruplex structures with different stabilities. Templates with non-canonical structures induced the production of an arrested, a slipped, and a full-length transcript, whereas the linear sequence produced only a full-length transcript. The efficiency of production for run-off transcripts (full-length and slipped transcripts) from templates that formed the non-canonical structures was lower than that from the linear sequence. G-quadruplex structures were more effective inhibitors of full-length product formation than were hairpin structures, even when the stability of the G-quadruplex in an aqueous solution was the same as that of the hairpin. We considered that intra-polymerase conditions may differentially affect the stability of non-canonical structures. The values of transcription efficiencies of run-off or arrest transcripts were correlated with stabilities of non-canonical structures in the intra-polymerase condition mimicked by 20 wt% polyethylene glycol (PEG). Transcriptional arrest was induced when the stability of the G-quadruplex structure (−ΔG°37) in the presence of 20 wt% PEG was more than 8.2 kcal mol⁻¹. Thus, values of stability in the presence of 20 wt% PEG are an important indicator of transcription perturbation. Our results further our understanding of the impact of template structure on the transcription process and may guide logical design of transcription-regulating drugs. PMID:24594642
Heparin Characterization: Challenges and Solutions
NASA Astrophysics Data System (ADS)
Jones, Christopher J.; Beni, Szabolcs; Limtiaco, John F. K.; Langeslay, Derek J.; Larive, Cynthia K.
2011-07-01
Although heparin is an important and widely prescribed pharmaceutical anticoagulant, its high degree of sequence microheterogeneity and size polydispersity make molecular-level characterization challenging. Unlike nucleic acids and proteins that are biosynthesized through template-driven assembly processes, heparin and the related glycosaminoglycan heparan sulfate are actively remodeled during biosynthesis through a series of enzymatic reactions that lead to variable levels of O- and N-sulfonation and uronic acid epimers. As summarized in this review, heparin sequence information is determined through a bottom-up approach that relies on depolymerization reactions, size- and charge-based separations, and sensitive mass spectrometric and nuclear magnetic resonance experiments to determine the structural identity of component oligosaccharides. The structure-elucidation process, along with its challenges and opportunities for future analytical improvements, is reviewed and illustrated for a heparin-derived hexasaccharide.
Martins, Mauricio Dias; Gingras, Bruno; Puig-Waldmueller, Estela; Fitch, W Tecumseh
2017-04-01
The human ability to process hierarchical structures has been a longstanding research topic. However, the nature of the cognitive machinery underlying this faculty remains controversial. Recursion, the ability to embed structures within structures of the same kind, has been proposed as a key component of our ability to parse and generate complex hierarchies. Here, we investigated the cognitive representation of both recursive and iterative processes in the auditory domain. The experiment used a two-alternative forced-choice paradigm: participants were exposed to three-step processes in which pure-tone sequences were built either through recursive or iterative processes, and had to choose the correct completion. Foils were constructed according to generative processes that did not match the previous steps. Both musicians and non-musicians were able to represent recursion in the auditory domain, although musicians performed better. We also observed that general 'musical' aptitudes played a role in both recursion and iteration, although the influence of musical training was somehow independent from melodic memory. Moreover, unlike iteration, recursion in audition was well correlated with its non-auditory (recursive) analogues in the visual and action sequencing domains. These results suggest that the cognitive machinery involved in establishing recursive representations is domain-general, even though this machinery requires access to information resulting from domain-specific processes. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Probing the Structures of Viral RNA Regulatory Elements with SHAPE and Related Methodologies
Rausch, Jason W.; Sztuba-Solinska, Joanna; Le Grice, Stuart F. J.
2018-01-01
Viral RNAs were selected by evolution to possess maximum functionality in a minimal sequence. Depending on the classification of the virus and the type of RNA in question, viral RNAs must alternately be replicated, spliced, transcribed, transported from the nucleus into the cytoplasm, translated and/or packaged into nascent virions, and in most cases, provide the sequence and structural determinants to facilitate these processes. One consequence of this compact multifunctionality is that viral RNA structures can be exquisitely complex, often involving intermolecular interactions with RNA or protein, intramolecular interactions between sequence segments separated by several thousands of nucleotides, or specialized motifs such as pseudoknots or kissing loops. The fluidity of viral RNA structure can also present a challenge when attempting to characterize it, as genomic RNAs especially are likely to sample numerous conformations at various stages of the virus life cycle. Here we review advances in chemoenzymatic structure probing that have made it possible to address such challenges with respect to cis-acting elements, full-length viral genomes and long non-coding RNAs that play a major role in regulating viral gene expression. PMID:29375504
Yang, Xiaoxia; Wang, Jia; Sun, Jun; Liu, Rong
2015-01-01
Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder) by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.
Segmenting Dynamic Human Action via Statistical Structure
ERIC Educational Resources Information Center
Baldwin, Dare; Andersson, Annika; Saffran, Jenny; Meyer, Meredith
2008-01-01
Human social, cognitive, and linguistic functioning depends on skills for rapidly processing action. Identifying distinct acts within the dynamic motion flow is one basic component of action processing; for example, skill at segmenting action is foundational to action categorization, verb learning, and comprehension of novel action sequences. Yet…
NASA Technical Reports Server (NTRS)
McMillan, R. Andrew; Howard, Jeanie; Zaluzec, Nestor J.; Kagawa, Hiromi K.; Li, Yi-Fen; Paavola, Chad D.; Trent, Jonathan D.
2004-01-01
Self-assembling biomolecules that form highly ordered structures have attracted interest as potential alternatives to conventional lithographic processes for patterning materials. Here we introduce a general technique for patterning materials on the nanoscale using genetically modified protein cage structures called chaperonins that self-assemble into crystalline templates. Constrained chemical synthesis of transition metal nanoparticles is specific to templates genetically functionalized with poly-Histidine sequences. These arrays of materials are ordered by the nanoscale structure of the crystallized protein. This system may be easily adapted to pattern a variety of materials given the rapidly growing list of peptide sequences selected by screening for specificity for inorganic materials.
Underwound DNA under Tension: Structure, Elasticity, and Sequence-Dependent Behaviors
NASA Astrophysics Data System (ADS)
Sheinin, Maxim Y.; Forth, Scott; Marko, John F.; Wang, Michelle D.
2011-09-01
DNA melting under torsion plays an important role in a wide variety of cellular processes. In the present Letter, we have investigated DNA melting at the single-molecule level using an angular optical trap. By directly measuring force, extension, torque, and angle of DNA, we determined the structural and elastic parameters of torsionally melted DNA. Our data reveal that under moderate forces, the melted DNA assumes a left-handed structure as opposed to an open bubble conformation and is highly torsionally compliant. We have also discovered that at low forces melted DNA properties are highly dependent on DNA sequence. These results provide a more comprehensive picture of the global DNA force-torque phase diagram.
Study of mould design and forming process on advanced polymer-matrix composite complex structure
NASA Astrophysics Data System (ADS)
Li, S. J.; Zhan, L. H.; Bai, H. M.; Chen, X. P.; Zhou, Y. Q.
2015-07-01
Advanced carbon fibre-reinforced polymer-matrix composites are widely applied in the aviation manufacturing field due to their outstanding performance. In this paper, the mould design and forming process of a complex composite structure are discussed in detail, using the hat-stiffened structure as an example. The key issues of the mould design are analyzed, and the corresponding solutions are presented. The crucial control points of the forming process, such as the determination of materials and stacking sequence and the temperature and pressure route of the co-curing process, are introduced. In order to guarantee the forming quality of the composite hat-stiffened structure, a mathematical model for the aperture of the rubber mandrel is introduced. The study presented in this paper may provide a practical reference for the design and manufacture of important complex composite structures.
How Messenger RNA and Nascent Chain Sequences Regulate Translation Elongation.
Choi, Junhong; Grosely, Rosslyn; Prabhakar, Arjun; Lapointe, Christopher P; Wang, Jinfan; Puglisi, Joseph D
2018-06-20
Translation elongation is a highly coordinated, multistep, multifactor process that ensures accurate and efficient addition of amino acids to a growing nascent-peptide chain encoded in the sequence of translated messenger RNA (mRNA). Although translation elongation is heavily regulated by external factors, there is clear evidence that mRNA and nascent-peptide sequences control elongation dynamics, determining both the sequence and structure of synthesized proteins. Advances in methods have driven experiments that revealed the basic mechanisms of elongation as well as the mechanisms of regulation by mRNA and nascent-peptide sequences. In this review, we highlight how mRNA and nascent-peptide elements manipulate the translation machinery to alter the dynamics and pathway of elongation.
Structural basis of DNA target recognition by the B3 domain of Arabidopsis epigenome reader VAL1
Sasnauskas, Giedrius; Kauneckaitė, Kotryna; Siksnys, Virginijus
2018-01-01
Abstract Arabidopsis thaliana requires a prolonged period of cold exposure during winter to initiate flowering in a process termed vernalization. Exposure to cold induces epigenetic silencing of the FLOWERING LOCUS C (FLC) gene by Polycomb group (PcG) proteins. A key role in this epigenetic switch is played by transcriptional repressors VAL1 and VAL2, which specifically recognize Sph/RY DNA sequences within FLC via B3 DNA binding domains, and mediate recruitment of PcG silencing machinery. To understand the structural mechanism of site-specific DNA recognition by VAL1, we have solved the crystal structure of VAL1 B3 domain (VAL1-B3) bound to a 12 bp oligoduplex containing the canonical Sph/RY DNA sequence 5′-CATGCA-3′/5′-TGCATG-3′. We find that VAL1-B3 makes H-bonds and van der Waals contacts to DNA bases of all six positions of the canonical Sph/RY element. In agreement with the structure, in vitro DNA binding studies show that VAL1-B3 does not tolerate substitutions at any position of the 5′-TGCATG-3′ sequence. The VAL1-B3–DNA structure presented here provides a structural model for understanding the specificity of plant B3 domains interacting with the Sph/RY and other DNA sequences. PMID:29660015
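The two strands of the canonical Sph/RY element quoted in this abstract, 5′-CATGCA-3′ and 5′-TGCATG-3′, are reverse complements of each other. The sketch below checks that relationship and scans a sequence for the element in either orientation. The function names and toy sequences are assumptions for illustration, and the exact-match scan only mirrors the reported intolerance to substitutions. It is not a binding model.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def scan_site(seq, site="CATGCA"):
    """Return (position, strand) for exact matches of `site` on the given
    strand ('+') or of its reverse complement ('-'). Hypothetical helper."""
    rc = reverse_complement(site)
    hits = []
    for i in range(len(seq) - len(site) + 1):
        window = seq[i:i + len(site)]
        if window == site:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits

print(reverse_complement("CATGCA"))
print(scan_site("GGCATGCATT"))  # made-up sequence
print(scan_site("AATGCATGAA"))  # made-up sequence
```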
Structural Insights into the HIV-1 Minus-strand Strong-stop DNA*
Chen, Yingying; Maskri, Ouerdia; Chaminade, Françoise; René, Brigitte; Benkaroun, Jessica; Godet, Julien; Mély, Yves; Mauffret, Olivier; Fossé, Philippe
2016-01-01
An essential step of human immunodeficiency virus type 1 (HIV-1) reverse transcription is the first strand transfer that requires base pairing of the R region at the 3′-end of the genomic RNA with the complementary r region at the 3′-end of minus-strand strong-stop DNA (ssDNA). HIV-1 nucleocapsid protein (NC) facilitates this annealing process. Determination of the ssDNA structure is needed to understand the molecular basis of NC-mediated genomic RNA-ssDNA annealing. For this purpose, we investigated ssDNA using structural probes (nucleases and potassium permanganate). This study is the first to determine the secondary structure of the full-length HIV-1 ssDNA in the absence or presence of NC. The probing data and phylogenetic analysis support the folding of ssDNA into three stem-loop structures and the presence of four high-affinity binding sites for NC. Our results support a model for the NC-mediated annealing process in which the preferential binding of NC to four sites triggers unfolding of the three-dimensional structure of ssDNA, thus facilitating interaction of the r sequence of ssDNA with the R sequence of the genomic RNA. In addition, using gel retardation assays and ssDNA mutants, we show that the NC-mediated annealing process does not rely on a single pathway (zipper intermediate or kissing complex). PMID:26668324
Influence of processing sequence on the tribological properties of VGCF-X/PA6/SEBS composites
NASA Astrophysics Data System (ADS)
Osada, Yu; Nishitani, Yosuke; Kitano, Takeshi
2016-03-01
In order to develop new tribomaterials for mechanical sliding parts with a sufficient balance of mechanical and tribological properties, we investigated the influence of processing sequence on the tribological properties of ternary nanocomposites: polymer blends of polyamide 6 (PA6) and styrene-ethylene/butylene-styrene copolymer (SEBS) filled with vapor grown carbon fiber (VGCF-X), a type of carbon nanofiber (CNF) with a diameter of 15 nm and a length of 3 μm. Five different processing sequences were attempted for preparing the ternary nanocomposites (VGCF-X/PA6/SEBS composites): (1) VGCF-X, PA6 and SEBS were mixed simultaneously (Process A); (2) the materials prepared by Process A were re-mixed in a second compounding step (Process AR); (3) SEBS was blended with PA6 (PA6/SEBS blends) and these blends were then mixed with VGCF-X (Process B); (4) VGCF-X was mixed with PA6 (VGCF-X/PA6 composites) and these composites were then blended with SEBS (Process C); and (5) VGCF-X was mixed with SEBS (VGCF-X/SEBS composites) and these composites were then blended with PA6 (Process D). These ternary polymer nanocomposites were extruded by a twin screw extruder and injection-molded. Their tribological properties were evaluated using a ring-on-plate type sliding wear tester under dry conditions. The tribological properties, such as the frictional coefficient and the specific wear rate, were influenced by the processing sequence. These results may be attributed to changes in the internal structure, namely the dispersibility of SEBS particles and VGCF-X in the ternary nanocomposites (VGCF-X/PA6/SEBS), caused by the different processing sequences. In particular, processing sequences AR, B and D, which involve re-mixing of VGCF-X, give a good dispersibility of VGCF-X, which improves the tribological properties.
Cortical Sensitivity to Guitar Note Patterns: EEG Entrainment to Repetition and Key.
Bridwell, David A; Leslie, Emily; McCoy, Dakarai Q; Plis, Sergey M; Calhoun, Vince D
2017-01-01
Music is ubiquitous throughout recent human culture, and many individuals have an innate ability to appreciate and understand music. Our appreciation of music likely emerges from the brain's ability to process a series of repeated complex acoustic patterns. In order to understand these processes further, cortical responses were measured to a series of guitar notes presented with a musical pattern or without a pattern. ERP responses to individual notes were measured using a 24 electrode Bluetooth mobile EEG system (Smarting mBrainTrain) while 13 healthy non-musicians listened to structured (i.e., within musical keys and with repetition) or random sequences of guitar notes for 10 min each. We demonstrate an increased amplitude of the ERP that appears at ~200 ms to notes presented within the musical sequence. This amplitude difference between random notes and patterned notes likely reflects individuals' cortical sensitivity to guitar note patterns. These amplitudes were compared to ERP responses to a rare note embedded within a stream of frequent notes to determine whether the sensitivity to complex musical structure overlaps with the sensitivity to simple irregularities reflected in traditional auditory oddball experiments. Response amplitudes to the negative peak at ~175 ms are statistically correlated with the mismatch negativity (MMN) response measured to a rare note presented among a series of frequent notes (i.e., in a traditional oddball sequence), but responses to the subsequent positive peak at ~200 ms do not show a statistical relationship with the P300 response. Thus, the sensitivity to musical structure identified to 4 Hz note patterns appears somewhat distinct from the sensitivity to statistical regularities reflected in the traditional "auditory oddball" sequence. Overall, we suggest that this is a promising approach to examine individuals' sensitivity to complex acoustic patterns, which may overlap with higher level cognitive processes, including language.
Automatic prediction of protein domains from sequence information using a hybrid learning system.
Nagarajan, Niranjan; Yona, Golan
2004-06-12
We describe a novel method for detecting the domain structure of a protein from sequence information alone. The method is based on analyzing multiple sequence alignments that are derived from a database search. Multiple measures are defined to quantify the domain information content of each position along the sequence and are combined into a single predictor using a neural network. The output is further smoothed and post-processed using a probabilistic model to predict the most likely transition positions between domains. The method was assessed using the domain definitions in SCOP and CATH for proteins of known structure and was compared with several other existing methods. Our method performs well both in terms of accuracy and sensitivity. It improves significantly over the best methods available, even some of the semi-manual ones, while being fully automatic. Our method can also be used to suggest and verify domain partitions based on structural data. A few examples of predicted domain definitions and alternative partitions, as suggested by our method, are also discussed. An online domain-prediction server is available at http://biozon.org/tools/domains/
The evolution processes of DNA sequences, languages and carols
NASA Astrophysics Data System (ADS)
Hauck, Jürgen; Henkel, Dorothea; Mika, Klaus
2001-04-01
The sequences of bases A, T, C and G of about 100 enolase, secA and cytochrome DNA were analyzed for attractive or repulsive interactions by the numbers T1, T2, T3 and r: the counts of nearest, next-nearest and third neighbor bases of the same kind, and the concentration r = other bases/analyzed base. The area of possible T1, T2 values is limited by the linear borders T2 = 2T1 - 2, T2 = 0 or T1 = 0 for clustering, attractive or repulsive interactions and the border T2 = -2T1 + 2(2 - r) for a variation from repulsive to attractive interactions at r ≤ 2. Clustering is preferred by most bases in sequences of enolases and secA. Major deviations with repulsive interactions of some bases are observed for archaea bacteria in secA and for highly developed animals and the human species in enolase sequences. The borders of the structure map for enthalpy stabilized structures with maximum interactions are approached in few cases. Most letters of the natural languages and some music notes are at the borders of the structure map.
Structural analysis of vibroacoustical processes
NASA Technical Reports Server (NTRS)
Gromov, A. P.; Myasnikov, L. L.; Myasnikova, Y. N.; Finagin, B. A.
1973-01-01
The method of automatic identification of acoustical signals by means of segmentation was used to investigate noises and vibrations in machines and mechanisms for cybernetic diagnostics. The structural analysis consists of representing a noise or vibroacoustical signal as a sequence of segments, determined by time quantization, in which each segment is characterized by specific spectral characteristics. The structural spectrum is plotted as a histogram of the segments, and also as a relation of the probability density of appearance of a segment to the segment type. It is assumed that the conditions of ergodic processes are maintained.
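As a rough illustration of the structural spectrum described above, the following minimal sketch cuts a signal into fixed-length segments, labels each segment by its strongest spectral component, and reports the relative frequency of each label. The labeling rule (dominant DFT bin) and the names `dominant_bin` and `structural_spectrum` are assumptions made for this example; the report does not specify how segment types were assigned.

```python
import cmath
import math
from collections import Counter


def dominant_bin(segment):
    """Return the index of the strongest non-DC DFT bin of one segment.

    Assumption: a segment's "type" is its dominant frequency bin.
    """
    n = len(segment)
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(segment))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k


def structural_spectrum(signal, seg_len):
    """Histogram of segment types, normalized to relative frequencies."""
    segments = [signal[i:i + seg_len]
                for i in range(0, len(signal) - seg_len + 1, seg_len)]
    counts = Counter(dominant_bin(s) for s in segments)
    total = sum(counts.values())
    return {seg_type: c / total for seg_type, c in sorted(counts.items())}


def tone(k, n=16):
    """One segment of a pure tone that falls exactly on DFT bin k."""
    return [math.sin(2 * math.pi * k * t / n) for t in range(n)]


# Three segments of one tone followed by one segment of another.
example_signal = tone(2) * 3 + tone(5)
example_spectrum = structural_spectrum(example_signal, 16)
```

Under the ergodicity assumption stated in the abstract, such a histogram estimated from one long recording would stand in for the probability of each segment type.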
Maleki, Ehsan; Babashah, Hossein; Koohi, Somayyeh; Kavehvash, Zahra
2017-07-01
This paper presents an optical processing approach for exploring a large number of genome sequences. Specifically, we propose an optical correlator for global alignment and an extended moiré matching technique for local analysis of spatially coded DNA, whose output is fed to a novel three-dimensional artificial neural network for local DNA alignment. All-optical implementation of the proposed 3D artificial neural network is developed and its accuracy is verified in Zemax. Thanks to its parallel processing capability, the proposed structure performs local alignment of 4 million sequences of 150 base pairs in a few seconds, which is much faster than its electrical counterparts, such as the basic local alignment search tool.
VAMPS: a website for visualization and analysis of microbial population structures.
Huse, Susan M; Mark Welch, David B; Voorhis, Andy; Shipunova, Anna; Morrison, Hilary G; Eren, A Murat; Sogin, Mitchell L
2014-02-05
The advent of next-generation DNA sequencing platforms has revolutionized molecular microbial ecology by making the detailed analysis of complex communities over time and space a tractable research pursuit for small research groups. However, the ability to generate 10⁵-10⁸ reads with relative ease brings with it many downstream complications. Beyond the computational resources and skills needed to process and analyze data, it is difficult to compare datasets in an intuitive and interactive manner that leads to hypothesis generation and testing. We developed the free web service VAMPS (Visualization and Analysis of Microbial Population Structures, http://vamps.mbl.edu) to address these challenges and to facilitate research by individuals or collaborating groups working on projects with large-scale sequencing data. Users can upload marker gene sequences and associated metadata; reads are quality filtered and assigned to both taxonomic structures and to taxonomy-independent clusters. A simple point-and-click interface allows users to select for analysis any combination of their own or their collaborators' private data and data from public projects, filter these by their choice of taxonomic and/or abundance criteria, and then explore these data using a wide range of analytic methods and visualizations. Each result is extensively hyperlinked to other analysis and visualization options, promoting data exploration and leading to a greater understanding of data relationships. VAMPS allows researchers using marker gene sequence data to analyze the diversity of microbial communities and the relationships between communities, to explore these analyses in an intuitive visual context, and to download data, results, and images for publication. 
VAMPS obviates the need for individual research groups to make the considerable investment in computational infrastructure and bioinformatic support otherwise necessary to process, analyze, and interpret massive amounts of next-generation sequence data. Any web-capable device can be used to upload, process, explore, and extract data and results from VAMPS. VAMPS encourages researchers to share sequence and metadata, and fosters collaboration between researchers of disparate biomes who recognize common patterns in shared data.
Shahinyan, Grigor; Margaryan, Armine; Panosyan, Hovik; Trchounian, Armen
2017-05-02
Among the huge diversity of thermophilic bacteria, mainly bacilli have been reported as active thermostable lipase producers. Geothermal springs serve as the main source for isolation of thermostable lipase-producing bacilli. Thermostable lipolytic enzymes, functioning in harsh conditions, have promising applications in processing of organic chemicals, detergent formulation, synthesis of biosurfactants, pharmaceutical processing, etc. In order to study the distribution of lipase-producing thermophilic bacilli and their specific lipase protein primary structures, three lipase producers from different genera were isolated from mesothermal (27.5-70 °C) springs distributed on the territory of Armenia and Nagorno Karabakh. Based on phenotypic characteristics and 16S rRNA gene sequencing, the isolates were identified as Geobacillus sp., Bacillus licheniformis and Anoxibacillus flavithermus strains. The lipase genes of the isolates were sequenced by using initially designed primer sets. Multiple alignments generated from primary structures of the lipase proteins and annotated lipase protein sequences, conserved regions analysis and amino acid composition have illustrated the similarity (98-99%) of the lipases with true lipases (family I) and GDSL esterase family (family II). A conserved sequence block that determines the thermostability has been identified in the multiple alignments of the lipase proteins. The results shed light on the distribution of lipase-producing bacilli in geothermal springs in Armenia and Nagorno Karabakh. Newly isolated bacilli strains could be a prospective source of thermostable lipases and their genes.
DNA Secondary Structure at Chromosomal Fragile Sites in Human Disease
Thys, Ryan G; Lehman, Christine E; Pierce, Levi C. T; Wang, Yuh-Hwa
2015-01-01
DNA has the ability to form a variety of secondary structures that can interfere with normal cellular processes, and many of these structures have been associated with neurological diseases and cancer. Secondary structure-forming sequences are often found at chromosomal fragile sites, which are hotspots for sister chromatid exchange, chromosomal translocations, and deletions. Structures formed at fragile sites can lead to instability by disrupting normal cellular processes such as DNA replication and transcription. The instability caused by disruption of replication and transcription can lead to DNA breakage, resulting in gene rearrangements and deletions that cause disease. In this review, we discuss the role of DNA secondary structure at fragile sites in human disease. PMID:25937814
DOE Office of Scientific and Technical Information (OSTI.GOV)
Han, S.; Tainer, J.A.
2001-08-01
ADP-ribosylation is a widely occurring and biologically critical covalent chemical modification process in pathogenic mechanisms, intracellular signaling systems, DNA repair, and cell division. The reaction is catalyzed by ADP-ribosyltransferases, which transfer the ADP-ribose moiety of NAD to a target protein with nicotinamide release. A family of bacterial toxins and eukaryotic enzymes has been termed the mono-ADP-ribosyltransferases, in distinction to the poly-ADP-ribosyltransferases, which catalyze the addition of multiple ADP-ribose groups to the carboxyl terminus of eukaryotic nucleoproteins. Despite the limited primary sequence homology among the different ADP-ribosyltransferases, a central cleft bearing NAD-binding pocket formed by the two perpendicular β-sheet core has been remarkably conserved between bacterial toxins and eukaryotic mono- and poly-ADP-ribosyltransferases. The majority of bacterial toxins and eukaryotic mono-ADP-ribosyltransferases are characterized by conserved His and catalytic Glu residues. In contrast, Diphtheria toxin, Pseudomonas exotoxin A, and eukaryotic poly-ADP-ribosyltransferases are characterized by conserved Arg and catalytic Glu residues. The NAD-binding core of a binary toxin and a C3-like toxin family identified an ARTT motif (ADP-ribosylating turn-turn motif) that is implicated in substrate specificity and recognition by structural and mutagenic studies. Here we apply structure-based sequence alignment and comparative structural analyses of all known structures of ADP-ribosyltransferases to suggest that this ARTT motif is functionally important in many ADP-ribosylating enzymes that bear a NAD binding cleft as characterized by conserved Arg and catalytic Glu residues.
Overall, structure-based sequence analysis reveals common core structures and conserved active sites of ADP-ribosyltransferases to support similar NAD binding mechanisms but differing mechanisms of target protein binding via sequence variations within the ARTT motif structural framework. Thus, we propose here that the ARTT motif represents an experimentally testable general recognition motif region for many ADP-ribosyltransferases and thereby potentially provides a unified structural understanding of substrate recognition in ADP-ribosylation processes.
NASA Technical Reports Server (NTRS)
Dayhoff, M. O.
1971-01-01
The amino acid sequences of proteins from living organisms are dealt with. The structure of proteins is first discussed; the variation in this structure from one biological group to another is illustrated by the first halves of the sequences of cytochrome c, and a phylogenetic tree is derived from the cytochrome c data. The relative geological times associated with the events of this tree are discussed. Errors which occur in the duplication of cells during the evolutionary process are examined. Particular attention is given to evolution of mutant proteins, globins, ferredoxin, and transfer ribonucleic acids (tRNA's). Finally, a general outline of biological evolution is presented.
The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4).
Huntemann, Marcel; Ivanova, Natalia N; Mavromatis, Konstantinos; Tripp, H James; Paez-Espino, David; Palaniappan, Krishnaveni; Szeto, Ernest; Pillay, Manoj; Chen, I-Min A; Pati, Amrita; Nielsen, Torben; Markowitz, Victor M; Kyrpides, Nikos C
2015-01-01
The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. Structural annotation is followed by assignment of protein product names and functions.
Neural Sequence Generation Using Spatiotemporal Patterns of Inhibition.
Cannon, Jonathan; Kopell, Nancy; Gardner, Timothy; Markowitz, Jeffrey
2015-11-01
Stereotyped sequences of neural activity are thought to underlie reproducible behaviors and cognitive processes ranging from memory recall to arm movement. One of the most prominent theoretical models of neural sequence generation is the synfire chain, in which pulses of synchronized spiking activity propagate robustly along a chain of cells connected by highly redundant feedforward excitation. But recent experimental observations in the avian song production pathway during song generation have shown excitatory activity interacting strongly with the firing patterns of inhibitory neurons, suggesting a process of sequence generation more complex than feedforward excitation. Here we propose a model of sequence generation inspired by these observations in which a pulse travels along a spatially recurrent excitatory chain, passing repeatedly through zones of local feedback inhibition. In this model, synchrony and robust timing are maintained not through redundant excitatory connections, but rather through the interaction between the pulse and the spatiotemporal pattern of inhibition that it creates as it circulates the network. These results suggest that spatially and temporally structured inhibition may play a key role in sequence generation.
The standard operating procedure of the DOE-JGI Metagenome Annotation Pipeline (MAP v.4)
Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos; ...
2016-02-24
The DOE-JGI Metagenome Annotation Pipeline (MAP v.4) performs structural and functional annotation for metagenomic sequences that are submitted to the Integrated Microbial Genomes with Microbiomes (IMG/M) system for comparative analysis. The pipeline runs on nucleotide sequences provided via the IMG submission site. Users must first define their analysis projects in GOLD and then submit the associated sequence datasets consisting of scaffolds/contigs with optional coverage information and/or unassembled reads in fasta and fastq file formats. The MAP processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNAs, as well as CRISPR elements. Structural annotation is followed by functional annotation including assignment of protein product names and connection to various protein family databases.
Algorithm to find distant repeats in a single protein sequence
Banerjee, Nirjhar; Sarani, Rangarajan; Ranjani, Chellamuthu Vasuki; Sowmiya, Govindaraj; Michael, Daliah; Balakrishnan, Narayanasamy; Sekar, Kanagaraj
2008-01-01
Distant repeats in protein sequences play an important role in various aspects of protein analysis. A careful analysis of the distant repeats would make it possible to establish a firm relation between the repeats and their function and three-dimensional structure during the evolutionary process. Further, it sheds light on the diversity of duplication during evolution. To this end, an algorithm has been developed to find all distant repeats in a protein sequence. The scores from the Point Accepted Mutation (PAM) matrix have been deployed for the identification of amino acid substitutions while detecting the distant repeats. Due to the biological importance of distant repeats, the proposed algorithm will be of importance to structural biologists, molecular biologists, biochemists and researchers involved in phylogenetic and evolutionary studies. PMID:19052663
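The idea of scoring substitutions while searching for repeats can be sketched as follows. This is not the published algorithm: it is a brute-force comparison of non-overlapping windows, and the scoring function `toy_score` with its residue grouping `GROUPS` is a hypothetical stand-in for real PAM matrix scores, chosen only to keep the example self-contained.

```python
# Hypothetical residue grouping used in place of a real PAM matrix.
GROUPS = ["AVLIM", "FWY", "KRH", "DE", "STNQ", "CGP"]


def toy_score(a, b):
    """Toy substitution score: identity 2, same group 1, otherwise -1."""
    if a == b:
        return 2
    if any(a in g and b in g for g in GROUPS):
        return 1
    return -1


def distant_repeats(seq, width, min_score):
    """Find pairs of non-overlapping windows that score at least min_score.

    Returns (start_first, start_second, score) tuples in scan order.
    """
    hits = []
    last_start = len(seq) - width
    for i in range(last_start + 1):
        for j in range(i + width, last_start + 1):
            score = sum(toy_score(a, b)
                        for a, b in zip(seq[i:i + width], seq[j:j + width]))
            if score >= min_score:
                hits.append((i, j, score))
    return hits


# "KVLA" and "RILA" differ by two conservative substitutions (K/R, V/I).
example_hits = distant_repeats("KVLAGGGGRILA", width=4, min_score=6)
```

An exact-match search would miss the pair in this example; tolerating scored substitutions is what lets diverged copies of a repeat be detected.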
A statistical physics perspective on alignment-independent protein sequence comparison.
Chattopadhyay, Amit K; Nasiev, Diar; Flower, Darren R
2015-08-01
Within bioinformatics, the textual alignment of amino acid sequences has long dominated the determination of similarity between proteins, with all that implies for shared structure, function and evolutionary descent. Despite the relative success of modern-day sequence alignment algorithms, so-called alignment-free approaches offer a complementary means of determining and expressing similarity, with potential benefits in certain key applications, such as regression analysis of protein structure-function studies, where alignment-based similarity has performed poorly. Here, we offer a fresh, statistical physics-based perspective focusing on the question of alignment-free comparison, in the process adapting results from 'first passage probability distribution' to summarize statistics of ensemble averaged amino acid propensity values. In this article, we introduce and elaborate this approach. © The Author 2015. Published by Oxford University Press.
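For readers unfamiliar with alignment-free comparison in general, a minimal sketch of one common variant is given below: each sequence is summarized by its k-mer counts and two sequences are compared by the cosine of their count vectors. This is a generic illustration, not the first-passage approach of the article, and the function names are chosen for this example only.

```python
import math
from collections import Counter


def kmer_profile(seq, k=2):
    """Count every overlapping k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(p, q):
    """Cosine of the angle between two k-mer count vectors (0.0 if empty)."""
    dot = sum(p[key] * q[key] for key in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


# No alignment is computed: only the k-mer content of each sequence matters.
same = cosine_similarity(kmer_profile("MKVLAAGK"), kmer_profile("MKVLAAGK"))
disjoint = cosine_similarity(kmer_profile("AAAA"), kmer_profile("GGGG"))
```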
Nayak, Dhananjaya; Siller, Sylvester; Guo, Qing; Sousa, Rui
2008-02-15
The T7 RNA polymerase (RNAP) elongation complex (EC) pauses and is destabilized at a unique 8 nucleotide (nt) sequence found at the junction of the head-to-tail concatemers of T7 genomic DNA generated during T7 DNA replication. The paused EC may recruit the T7 DNA processing machinery, which cleaves the concatemerized DNA within this 8 nt concatemer junction (CJ). Pausing of the EC at the CJ involves structural changes in both the RNAP and transcription bubble. However, these structural changes have not been fully defined, nor is it understood how the CJ sequence itself causes the EC to change its structure, to pause, and to become less stable. Here we use solution and RNAP-tethered chemical nucleases to probe the CJ transcript and changes in the EC structure as the polymerase pauses and terminates at the CJ. Together with extensive mutational scanning of regions of the polymerase that are likely to be involved in recognition of the CJ, we are able to develop a description of the events that occur as the EC transcribes through the CJ and subsequently pauses. In this process, a local change in the structure of the transcription bubble drives a large change in the architecture of the EC. This altered EC structure may then serve as the signal that recruits the processing machinery to the CJ.
Harkness, Robert W; Mittermaier, Anthony K
2017-11-01
G-quadruplexes (GQs) are four-stranded nucleic acid secondary structures formed by guanosine (G)-rich DNA and RNA sequences. It is becoming increasingly clear that cellular processes including gene expression and mRNA translation are regulated by GQs. GQ structures have been extensively characterized; however, little attention to date has been paid to their conformational dynamics, despite the fact that many biological GQ sequences populate multiple structures of similar free energies, leading to an ensemble of exchanging conformations. The impact of these dynamics on biological function is currently not well understood. Recently, structural dynamics have been demonstrated to entropically stabilize GQ ensembles, potentially modulating gene expression. Transient, low-populated states in GQ ensembles may additionally regulate nucleic acid interactions and function. This review will underscore the interplay of GQ dynamics and biological function, focusing on several dynamic processes for biological GQs and the characterization of GQ dynamics by nuclear magnetic resonance (NMR) spectroscopy in conjunction with other biophysical techniques. This article is part of a Special Issue entitled: Biophysics in Canada, edited by Lewis Kay, John Baenziger, Albert Berghuis and Peter Tieleman. Copyright © 2017 Elsevier B.V. All rights reserved.
Milne, Alice E; Petkov, Christopher I; Wilson, Benjamin
2017-07-05
Language flexibly supports the human ability to communicate using different sensory modalities, such as writing and reading in the visual modality and speaking and listening in the auditory domain. Although it has been argued that nonhuman primate communication abilities are inherently multisensory, direct behavioural comparisons between human and nonhuman primates are scant. Artificial grammar learning (AGL) tasks and statistical learning experiments can be used to emulate ordering relationships between words in a sentence. However, previous comparative work using such paradigms has primarily investigated sequence learning within a single sensory modality. We used an AGL paradigm to evaluate how humans and macaque monkeys learn and respond to identically structured sequences of either auditory or visual stimuli. In the auditory and visual experiments, we found that both species were sensitive to the ordering relationships between elements in the sequences. Moreover, the humans and monkeys produced largely similar response patterns to the visual and auditory sequences, indicating that the sequences are processed in comparable ways across the sensory modalities. These results provide evidence that human sequence processing abilities stem from an evolutionarily conserved capacity that appears to operate comparably across the sensory modalities in both human and nonhuman primates. The findings set the stage for future neurobiological studies to investigate the multisensory nature of these sequencing operations in nonhuman primates and how they compare to related processes in humans. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.
Backbone hydration determines the folding signature of amino acid residues.
Bignucolo, Olivier; Leung, Hoi Tik Alvin; Grzesiek, Stephan; Bernèche, Simon
2015-04-08
The relation between the sequence of a protein and its three-dimensional structure remains largely unknown. A lasting dream is to elucidate the side-chain-dependent driving forces that govern the folding process. Different structural data suggest that aromatic amino acids play a particular role in the stabilization of protein structures. To better understand the underlying mechanism, we studied peptides of the sequence EGAAXAASS (X = Gly, Ile, Tyr, Trp) through comparison of molecular dynamics (MD) trajectories and NMR residual dipolar coupling (RDC) measurements. The RDC data for aromatic substitutions provide evidence for a kink in the peptide backbone. Analysis of the MD simulations shows that the formation of internal hydrogen bonds underlying a helical turn is key to reproduce the experimental RDC values. The simulations further reveal that the driving force leading to such helical-turn conformations arises from the lack of hydration of the peptide chain on either side of the bulky aromatic side chain, which can potentially act as a nucleation point initiating the folding process.
Di Pierro, Michele; Cheng, Ryan R; Lieberman Aiden, Erez; Wolynes, Peter G; Onuchic, José N
2017-11-14
Inside the cell nucleus, genomes fold into organized structures that are characteristic of cell type. Here, we show that this chromatin architecture can be predicted de novo using epigenetic data derived from chromatin immunoprecipitation-sequencing (ChIP-Seq). We exploit the idea that chromosomes encode a 1D sequence of chromatin structural types. Interactions between these chromatin types determine the 3D structural ensemble of chromosomes through a process similar to phase separation. First, a neural network is used to infer the relation between the epigenetic marks present at a locus, as assayed by ChIP-Seq, and the genomic compartment in which those loci reside, as measured by DNA-DNA proximity ligation (Hi-C). Next, types inferred from this neural network are used as an input to an energy landscape model for chromatin organization [Minimal Chromatin Model (MiChroM)] to generate an ensemble of 3D chromosome conformations at a resolution of 50 kilobases (kb). After training the model, dubbed Maximum Entropy Genomic Annotation from Biomarkers Associated to Structural Ensembles (MEGABASE), on odd-numbered chromosomes, we predict the sequences of chromatin types and the subsequent 3D conformational ensembles for the even chromosomes. We validate these structural ensembles by using ChIP-Seq tracks alone to predict Hi-C maps, as well as distances measured using 3D fluorescence in situ hybridization (FISH) experiments. Both sets of experiments support the hypothesis of phase separation being the driving process behind compartmentalization. These findings strongly suggest that epigenetic marking patterns encode sufficient information to determine the global architecture of chromosomes and that de novo structure prediction for whole genomes may be increasingly possible. Copyright © 2017 the Author(s). Published by PNAS.
Role of Sequence and Structural Polymorphism on the Mechanical Properties of Amyloid Fibrils
Kim, Jae In; Na, Sungsoo; Eom, Kilho
2014-01-01
Amyloid fibrils, which play a critical role in disease expression, have recently been found to exhibit excellent mechanical properties, such as an elastic modulus on the order of 10 GPa, which is comparable to that of other mechanical proteins such as microtubules, actin filaments, and spider silk. These remarkable mechanical properties of amyloid fibrils are correlated with their functional role in disease expression. This suggests the importance of understanding how these excellent mechanical properties originate through a self-assembly process that may depend on the amino acid sequence. However, the sequence-structure-property relationship of amyloid fibrils has not been fully understood yet. In this work, we characterize the mechanical properties of human islet amyloid polypeptide (hIAPP) fibrils with respect to their molecular structures as well as their amino acid sequence by using all-atom explicit-water molecular dynamics (MD) simulation. The simulation results suggest that the remarkable bending rigidity of amyloid fibrils can be achieved through a specific self-aggregation pattern such as antiparallel stacking of β strands (peptide chains). Moreover, we show that a single point mutation of the hIAPP chain constituting an hIAPP fibril significantly affects the thermodynamic stability of hIAPP fibrils formed by parallel stacking of peptide chains, and that a single point mutation results in a significant change in the bending rigidity of hIAPP fibrils formed by antiparallel stacking of β strands. This clearly elucidates the role of amino acid sequence in not only the equilibrium conformations of amyloid fibrils but also their mechanical properties. Our study sheds light on sequence-structure-property relationships of amyloid fibrils, which suggests that the mechanical properties of amyloid fibrils are encoded in their sequence-dependent molecular architecture. PMID:24551113
Reading biological processes from nucleotide sequences
NASA Astrophysics Data System (ADS)
Murugan, Anand
Cellular processes have traditionally been investigated by techniques of imaging and biochemical analysis of the molecules involved. The recent rapid progress in our ability to manipulate and read nucleic acid sequences gives us direct access to the genetic information that directs and constrains biological processes. While sequence data is being used widely to investigate genotype-phenotype relationships and population structure, here we use sequencing to understand biophysical mechanisms. We present work on two different systems. First, in chapter 2, we characterize the stochastic genetic editing mechanism that produces diverse T-cell receptors in the human immune system. We do this by inferring statistical distributions of the underlying biochemical events that generate T-cell receptor coding sequences from the statistics of the observed sequences. This inferred model quantitatively describes the potential repertoire of T-cell receptors that can be produced by an individual, providing insight into its potential diversity and the probability of generation of any specific T-cell receptor. Then in chapter 3, we present work on understanding the functioning of regulatory DNA sequences in both prokaryotes and eukaryotes. Here we use experiments that measure the transcriptional activity of large libraries of mutagenized promoters and enhancers and infer models of the sequence-function relationship from this data. For the bacterial promoter, we infer a physically motivated 'thermodynamic' model of the interaction of DNA-binding proteins and RNA polymerase determining the transcription rate of the downstream gene. For the eukaryotic enhancers, we infer heuristic models of the sequence-function relationship and use these models to find synthetic enhancer sequences that optimize inducibility of expression. 
Both projects demonstrate the utility of sequence information in conjunction with sophisticated statistical inference techniques for dissecting underlying biophysical mechanisms.
Gao, Feng; Song, Weibo; Katz, Laura A.
2014-01-01
In most lineages, diversity among gene family members results from gene duplication followed by sequence divergence. Because of the genome rearrangements during the development of somatic nuclei, gene family evolution in ciliates involves more complex processes. Previous work on the ciliate Chilodonella uncinata revealed that macronuclear β-tubulin gene family members are generated by alternative processing, in which germline regions are alternatively used in multiple macronuclear chromosomes. To further study genome evolution in this ciliate, we analyzed its transcriptome and found that: 1) alternative processing is extensive among gene families; and 2) such gene families are likely to be C. uncinata-specific. We characterized additional macronuclear and micronuclear copies of one candidate alternatively processed gene family -- a protein kinase domain containing protein (PKc) -- from two C. uncinata strains. Analysis of the PKc sequences reveals: 1) multiple PKc gene family members in the macronucleus share some identical regions flanked by divergent regions; and 2) the shared identical regions are processed from a single micronuclear chromosome. We discuss analogous processes in lineages across the eukaryotic tree of life to provide further insights on the impact of genome structure on gene family evolution in eukaryotes. PMID:24749903
Biosynthesis and processing of the somatostatin family of peptide hormones.
Andrews, P C; Dixon, J E
1986-01-01
Understanding of the biosynthesis of the somatostatin family of peptide hormones has greatly increased in recent years. Isolation and sequencing of the rat somatostatin gene indicates that it contains a single intron located between the codons for Gln(-57) and Glu(-56) of pre-prosomatostatin. The gene contains three repetitive sequences, one at the 5' end of the gene and two of them 3' to the coding portion. Two of the sequences consist of alternating purine-pyrimidine bases and have been shown to adopt Z-DNA structures in vitro. The cDNA for rat somatostatin codes for a 116-residue peptide structurally similar to the anglerfish and catfish precursors to the 14-residue somatostatin (SST-14). In addition to SST-14, the catfish and the anglerfish both contain an additional pancreatic somatostatin, each derived from a different gene. The catfish contains a 22-residue somatostatin, which is O-glycosylated at Thr-5. The second somatostatin gene from anglerfish encodes a prosomatostatin that is processed to a 28-residue peptide. The mature peptide contains a hydroxylated lysine at position 23.
Disentangling perceptual from motor implicit sequence learning with a serial color-matching task.
Gheysen, Freja; Gevers, Wim; De Schutter, Erik; Van Waelvelde, Hilde; Fias, Wim
2009-08-01
This paper contributes to the domain of implicit sequence learning by presenting a new version of the serial reaction time (SRT) task that unambiguously separates perceptual from motor learning. Participants matched the colors of three small squares with the color of a subsequently presented large target square. An identical sequential structure was tied to the colors of the target square (perceptual version, Experiment 1) or to the manual responses (motor version, Experiment 2). Short blocks of sequenced and randomized trials alternated and hence provided continuous monitoring of the learning process. Reaction time measurements provided clear evidence that perceptual and motor serial information were learned independently, although the two learning processes followed different time courses. No explicit awareness of the serial structure was needed for either type of learning to occur. The paradigm introduced in this paper shows that perceptual learning can be detected with SRT measurements and opens important perspectives for future imaging studies on the ongoing question of which brain areas are involved in the implicit learning of modality-specific (motor vs. perceptual) or general serial order.
Chunk formation in immediate memory and how it relates to data compression.
Chekaf, Mustapha; Cowan, Nelson; Mathy, Fabien
2016-10-01
This paper attempts to evaluate the capacity of immediate memory to cope with new situations in relation to the compressibility of information likely to allow the formation of chunks. We constructed a task in which untrained participants had to immediately recall sequences of stimuli with possible associations between them. Compressibility of information was used to measure the chunkability of each sequence on a single trial. Compressibility refers to the recoding of information in a more compact representation. Although compressibility has almost exclusively been used to study long-term memory, our theory suggests that a compression process relying on redundancies within the structure of the list materials can occur very rapidly in immediate memory. The results indicated a span of about three items when the list had no structure, but increased linearly as structure was added. The amount of information retained in immediate memory was maximal for the most compressible sequences, particularly when information was ordered in a way that facilitated the compression process. We discuss the role of immediate memory in the rapid formation of chunks made up of new associations that did not already exist in long-term memory, and we conclude that immediate memory is the starting place for the reorganization of information. Copyright © 2016 Elsevier B.V. All rights reserved.
Adelman, K; Salmon, B; Baines, J D
2001-03-13
The product of the herpes simplex virus type 1 U(L)28 gene is essential for cleavage of concatemeric viral DNA into genome-length units and packaging of this DNA into viral procapsids. To address the role of U(L)28 in this process, purified U(L)28 protein was assayed for the ability to recognize conserved herpesvirus DNA packaging sequences. We report that DNA fragments containing the pac1 DNA packaging motif can be induced by heat treatment to adopt novel DNA conformations that migrate faster than the corresponding duplex in nondenaturing gels. Surprisingly, these novel DNA structures are high-affinity substrates for U(L)28 protein binding, whereas double-stranded DNA of identical sequence composition is not recognized by U(L)28 protein. We demonstrate that only one strand of the pac1 motif is responsible for the formation of novel DNA structures that are bound tightly and specifically by U(L)28 protein. To determine the relevance of the observed U(L)28 protein-pac1 interaction to the cleavage and packaging process, we have analyzed the binding affinity of U(L)28 protein for pac1 mutants previously shown to be deficient in cleavage and packaging in vivo. Each of the pac1 mutants exhibited a decrease in DNA binding by U(L)28 protein that correlated directly with the reported reduction in cleavage and packaging efficiency, thereby supporting a role for the U(L)28 protein-pac1 interaction in vivo. These data therefore suggest that the formation of novel DNA structures by the pac1 motif confers added specificity on recognition of DNA packaging sequences by the U(L)28-encoded component of the herpesvirus cleavage and packaging machinery.
NASA Astrophysics Data System (ADS)
Senge, S.; Brachmann, J.; Hirt, G.; Bührig-Polaczek, A.
2017-10-01
Lightweight design is a major driving force of innovation, especially in the automotive industry. Using hybrid components made of two or more different materials is one approach to reduce the vehicle's weight and decrease fuel consumption. As a possible way to increase the stiffness of multi-material components, this paper presents a process chain to produce such components made of steel sheets and high-pressure die cast aluminium. Prior to the casting sequence, the steel sheets are structured in a modified rolling process which enables continuous interlocking with the aluminium. Two structures manufactured by this rolling process are tested. The first one is a channel-like structure and the second one is a channel-like structure with undercuts. These undercuts enable the formation of small anchors when the molten aluminium fills them. The correlation between thickness reduction during rolling and the shape of the resulting structure was evaluated for both structures. Channels with a depth of up to 0.5 mm and a width of 1 mm could be created. Undercuts of different sizes, depending on the thickness reduction, could be realised. Subsequent aluminium high-pressure die casting experiments were performed to determine whether the surface structure can be filled gap-free with molten aluminium during the casting sequence and whether a gap-free connection can be achieved after contraction of the aluminium. The casting experiments showed that both structures could be filled during the high-pressure die casting. The channel-like structure results in a gap between steel and aluminium after contraction of the cast metal, whereas the structure with undercuts leads to a good interlocking resulting in a gap-free connection.
The Central Italy Seismic Sequence (2016): Spatial Patterns and Dynamic Fingerprints
NASA Astrophysics Data System (ADS)
Suteanu, Cristian; Liucci, Luisa; Melelli, Laura
2018-01-01
The paper investigates spatio-temporal aspects of the seismic sequence that started in Central Italy (Amatrice, Lazio region) in August 2016, causing hundreds of fatalities and producing major damage to settlements. On one hand, scaling properties of the landscape topography are identified and related to geomorphological processes, supporting the identification of preferential spatial directions in tectonic activity and confirming the role of the past tectonic periods and ongoing processes with respect to the driving of the geomorphological evolution of the area. On the other hand, relations between the spatio-temporal evolution of the sequence and the seismogenic fault systems are studied. The dynamic fingerprints of seismicity are established with the help of events thread analysis (ETA), which characterizes anisotropy in spatio-temporal earthquake patterns. ETA confirms the fact that the direction of the seismogenic normal fault-oriented (N)NW-(S)SE is characterized by persistent seismic activity. More importantly, it also highlights the role of the pre-existing compressive structures, Neogenic thrust and transpressive regional fronts, with a trend-oriented (N)NE-(S)SW, in the stress transfer. Both the fractal features of the topographic surface and the dynamic fingerprint of the recent seismic sequence point to the hypothesis of an active interaction between the Quaternary fault systems and the pre-existing compressional structures.
A Stochastic Evolutionary Model for Protein Structure Alignment and Phylogeny
Challis, Christopher J.; Schmidler, Scott C.
2012-01-01
We present a stochastic process model for the joint evolution of protein primary and tertiary structure, suitable for use in alignment and estimation of phylogeny. Indels arise from a classic Links model, and mutations follow a standard substitution matrix, whereas backbone atoms diffuse in three-dimensional space according to an Ornstein–Uhlenbeck process. The model allows for simultaneous estimation of evolutionary distances, indel rates, structural drift rates, and alignments, while fully accounting for uncertainty. The inclusion of structural information enables phylogenetic inference on time scales not previously attainable with sequence evolution models. The model also provides a tool for testing evolutionary hypotheses and improving our understanding of protein structural evolution. PMID:22723302
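The structural component mentioned in the abstract above can be illustrated with a minimal sketch of a one-dimensional Ornstein–Uhlenbeck process, simulated with its exact transition distribution. This is not the authors' model, which treats backbone atoms in three dimensions jointly with an indel and substitution process; the function name `simulate_ou` and the parameters `theta`, `mu`, `sigma` and their values are assumptions chosen only for illustration.

```python
import math
import random


def simulate_ou(x0, theta, mu, sigma, dt, n_steps, seed=0):
    """Simulate a 1-D Ornstein-Uhlenbeck path, dX = theta*(mu - X) dt + sigma dW.

    Uses the exact Gaussian transition over a step of length dt.
    All parameter names and values are illustrative assumptions.
    """
    rng = random.Random(seed)
    decay = math.exp(-theta * dt)
    # Standard deviation of X(t + dt) conditional on X(t)
    step_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * theta))
    path = [x0]
    for _ in range(n_steps):
        # Conditional mean relaxes exponentially towards mu
        mean = mu + (path[-1] - mu) * decay
        path.append(rng.gauss(mean, step_sd))
    return path


# A coordinate starting far from its equilibrium position drifts back towards mu
path = simulate_ou(x0=5.0, theta=1.0, mu=0.0, sigma=0.5, dt=0.01, n_steps=1000)
print(len(path), path[0])
```

The mean-reverting term is what keeps simulated coordinates from wandering arbitrarily far apart over long evolutionary times, in contrast to plain Brownian motion.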
SwiSpot: modeling riboswitches by spotting out switching sequences.
Barsacchi, Marco; Novoa, Eva Maria; Kellis, Manolis; Bechini, Alessio
2016-11-01
Riboswitches are cis-regulatory elements in mRNA, mostly found in Bacteria, which exhibit two main secondary structure conformations. Although one of them prevents the gene from being expressed, the other conformation allows its expression, and this switching process is typically driven by the presence of a specific ligand. Although there are a handful of known riboswitches, our knowledge in this field has been greatly limited due to our inability to identify their alternate structures from their sequences. Indeed, current methods are not able to predict the presence of the two functionally distinct conformations just from the knowledge of the plain RNA nucleotide sequence. Whether this would be possible, for which cases, and what prediction accuracy can be achieved, are currently open questions. Here we show that the two alternate secondary structures of riboswitches can be accurately predicted once the 'switching sequence' of the riboswitch has been properly identified. The proposed SwiSpot approach is capable of identifying the switching sequence inside a putative, complete riboswitch sequence, on the basis of pairing behaviors, which are evaluated on proper sets of configurations. Moreover, it is able to model the switching behavior of riboswitches whose generated ensemble covers both alternate configurations. Beyond structural predictions, the approach can also be paired to homology-based riboswitch searches. SwiSpot software, along with the reference dataset files, is available at: http://www.iet.unipi.it/a.bechini/swispot/ Supplementary information: Supplementary data are available at Bioinformatics online. Contact: a.bechini@ing.unipi.it. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Introduction to bioinformatics.
Can, Tolga
2014-01-01
Bioinformatics is an interdisciplinary field mainly involving molecular biology and genetics, computer science, mathematics, and statistics. Data intensive, large-scale biological problems are addressed from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps: Collect statistics from biological data. Build a computational model. Solve a computational modeling problem. Test and evaluate a computational algorithm. This chapter gives a brief introduction to bioinformatics by first providing an introduction to biological terminology and then discussing some classical bioinformatics problems organized by the types of data sources. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function and includes subproblems such as identification of homologs, multiple sequence alignment, searching sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data is usually represented as matrices and analysis of microarray data mostly involves statistics analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein-protein interaction networks are usually modeled as graphs and graph theoretic approaches are used to solve associated problems such as construction and analysis of large-scale networks.
Learning Orthographic Structure With Sequential Generative Neural Networks.
Testolin, Alberto; Stoianov, Ivilin; Sperduti, Alessandro; Zorzi, Marco
2016-04-01
Learning the structure of event sequences is a ubiquitous problem in cognition and particularly in language. One possible solution is to learn a probabilistic generative model of sequences that allows making predictions about upcoming events. Though appealing from a neurobiological standpoint, this approach is typically not pursued in connectionist modeling. Here, we investigated a sequential version of the restricted Boltzmann machine (RBM), a stochastic recurrent neural network that extracts high-order structure from sensory data through unsupervised generative learning and can encode contextual information in the form of internal, distributed representations. We assessed whether this type of network can extract the orthographic structure of English monosyllables by learning a generative model of the letter sequences forming a word training corpus. We show that the network learned an accurate probabilistic model of English graphotactics, which can be used to make predictions about the letter following a given context as well as to autonomously generate high-quality pseudowords. The model was compared to an extended version of simple recurrent networks, augmented with a stochastic process that allows autonomous generation of sequences, and to non-connectionist probabilistic models (n-grams and hidden Markov models). We conclude that sequential RBMs and stochastic simple recurrent networks are promising candidates for modeling cognition in the temporal domain. Copyright © 2015 Cognitive Science Society, Inc.
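One of the non-connectionist baselines named in the abstract above, the n-gram model, can be sketched in a few lines. The example below is a letter bigram model trained on a toy word list; the word list, the boundary markers `^` and `$`, and the function names are assumptions for illustration and do not reproduce the corpus or models used in the study.

```python
from collections import Counter, defaultdict


def train_bigram(words):
    """Count letter-to-letter transitions, with ^ and $ marking word boundaries."""
    counts = defaultdict(Counter)
    for word in words:
        padded = "^" + word + "$"
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts


def next_letter_probs(counts, prev):
    """Return the estimated distribution over the symbol following `prev`."""
    total = sum(counts[prev].values())
    return {symbol: n / total for symbol, n in counts[prev].items()}


toy_corpus = ["cat", "can", "cot", "bat"]  # hypothetical training words
model = train_bigram(toy_corpus)
probs = next_letter_probs(model, "c")
print(probs)
```

Sampling repeatedly from these conditional distributions, starting at `^` and stopping at `$`, is the simplest way such a model can generate pseudowords.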
SPMBR: a scalable algorithm for mining sequential patterns based on bitmaps
NASA Astrophysics Data System (ADS)
Xu, Xiwei; Zhang, Changhai
2013-12-01
Some current sequential pattern mining algorithms generate too many candidate sequences, which increases the processing cost of support counting. We therefore present an effective and scalable algorithm called SPMBR (Sequential Patterns Mining based on Bitmap Representation) for mining sequential patterns in large databases. The main difference from previous work on sequential pattern mining is that the sequence database is represented by bitmaps, for which we first present a simplified bitmap structure. The algorithm first generates candidate sequences by SE (Sequence Extension) and IE (Item Extension), and then obtains all frequent sequences by comparing the original bitmap with the extended item bitmap. This method simplifies the problem of mining sequential patterns and avoids the high processing cost of support counting. Both theoretical analysis and experiments indicate that the performance of SPMBR is superior for large transaction databases, that the memory required for storing temporal data during the mining process is much smaller, and that all sequential patterns can be mined feasibly.
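To make the idea of bitmap-based support counting concrete, here is a minimal sketch of a generic bitmap scheme with sequence extension and item extension. It is not the SPMBR algorithm or its simplified bitmap structure, whose details the abstract does not give; the encoding (one Python integer per sequence, bit j standing for the j-th itemset) and all function names are assumptions for illustration.

```python
def build_bitmaps(database):
    """Map each item to one integer bitmap per sequence (bit j = itemset j)."""
    bitmaps = {}
    for sid, sequence in enumerate(database):
        for pos, itemset in enumerate(sequence):
            for item in itemset:
                rows = bitmaps.setdefault(item, [0] * len(database))
                rows[sid] |= 1 << pos
    return bitmaps


def s_extend(pattern_rows, item_rows, lengths):
    """Sequence extension: the item must occur strictly after the pattern."""
    out = []
    for b, item_b, n in zip(pattern_rows, item_rows, lengths):
        if b == 0:
            out.append(0)
            continue
        lowest = b & -b  # earliest position at which the pattern ends
        later = ((1 << n) - 1) & ~((lowest << 1) - 1)  # all later positions
        out.append(later & item_b)
    return out


def i_extend(pattern_rows, item_rows):
    """Item extension: the item must occur in the same itemset as the pattern's end."""
    return [b & item_b for b, item_b in zip(pattern_rows, item_rows)]


def support(rows):
    """Number of sequences in which the pattern occurs at least once."""
    return sum(1 for b in rows if b)


# Hypothetical toy database: each sequence is a list of itemsets
db = [
    [{"a"}, {"b"}, {"c"}],
    [{"a", "b"}, {"c"}],
    [{"b"}, {"a"}],
]
lengths = [len(seq) for seq in db]
bm = build_bitmaps(db)
print(support(s_extend(bm["a"], bm["b"], lengths)))  # pattern <a, b>
print(support(i_extend(bm["a"], bm["b"])))           # pattern <{a, b}>
print(support(s_extend(bm["a"], bm["c"], lengths)))  # pattern <a, c>
```

Because support is obtained from bitwise operations on precomputed bitmaps, no rescan of the database is needed for each candidate, which is the cost the abstract aims to avoid.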
Robust analysis of semiparametric renewal process models
Lin, Feng-Chang; Truong, Young K.; Fine, Jason P.
2013-01-01
Summary: A rate model is proposed for a modulated renewal process comprising a single long sequence, where the covariate process may not capture the dependencies in the sequence as in standard intensity models. We consider partial likelihood-based inferences under a semiparametric multiplicative rate model, which has been widely studied in the context of independent and identical data. Under an intensity model, gap times in a single long sequence may be used naively in the partial likelihood with variance estimation utilizing the observed information matrix. Under a rate model, the gap times cannot be treated as independent and studying the partial likelihood is much more challenging. We employ a mixing condition in the application of limit theory for stationary sequences to obtain consistency and asymptotic normality. The estimator's variance is quite complicated owing to the unknown gap times dependence structure. We adapt block bootstrapping and cluster variance estimators to the partial likelihood. Simulation studies and an analysis of a semiparametric extension of a popular model for neural spike train data demonstrate the practical utility of the rate approach in comparison with the intensity approach. PMID:24550568
Carré-Eusèbe, D; Lederer, F; Lê, K H; Elsevier, S M
1991-01-01
Protamine P2, the major basic chromosomal protein of mouse spermatozoa, is synthesized as a precursor almost twice as long as the mature protein, its extra length arising from an N-terminal extension of 44 amino acid residues. This precursor is integrated into chromatin of spermatids, and the extension is processed during chromatin condensation in the haploid cells. We have studied processing in the mouse and have identified two intermediates generated by proteolytic cleavage of the precursor. H.p.l.c. separated protamine P2 from four other spermatid proteins, including the precursor and three proteins known to possess physiological characteristics expected of processing intermediates. Peptide mapping indicated that all of these proteins were structurally similar. Two major proteins were further purified by PAGE, transferred to poly(vinylidene difluoride) membranes and submitted to automated N-terminal sequence analysis. Both sequences were found within the deduced sequence of the precursor extension. The N-terminus of the larger intermediate, PP2C, was Gly-12, whereas the N-terminus of the smaller, PP2D, was His-21. Both processing sites involved a peptide bond in which the carbonyl function was contributed by an acidic amino acid. PMID:1854346
ERIC Educational Resources Information Center
Sonnenberg, Christoph; Bannert, Maria
2015-01-01
According to research examining self-regulated learning (SRL), we regard individual regulation as a specific sequence of regulatory activities. Ideally, students perform various learning activities, such as analyzing, monitoring, and evaluating cognitive and motivational aspects during learning. Metacognitive prompts can foster SRL by inducing…
Hazardous Waste Processing in the Chemical Engineering Curriculum.
ERIC Educational Resources Information Center
Dorland, Dianne; Baria, Dorab N.
1995-01-01
Describes a sequence of two courses included in the chemical engineering program at the University of Minnesota, Duluth that deal with the processing of hazardous wastes. Covers course content and structure, and discusses developments in pollution prevention and waste management that led to the addition of these courses to the curriculum.…
Modeling Structure-Function Relationships in Synthetic DNA Sequences using Attribute Grammars
Cai, Yizhi; Lux, Matthew W.; Adam, Laura; Peccoud, Jean
2009-01-01
Recognizing that certain biological functions can be associated with specific DNA sequences has led various fields of biology to adopt the notion of the genetic part. This concept provides a finer level of granularity than the traditional notion of the gene. However, a method of formally relating how a set of parts relates to a function has not yet emerged. Synthetic biology both demands such a formalism and provides an ideal setting for testing hypotheses about relationships between DNA sequences and phenotypes beyond the gene-centric methods used in genetics. Attribute grammars are used in computer science to translate the text of a program source code into the computational operations it represents. By associating attributes with parts, modifying the value of these attributes using rules that describe the structure of DNA sequences, and using a multi-pass compilation process, it is possible to translate DNA sequences into molecular interaction network models. These capabilities are illustrated by simple example grammars expressing how gene expression rates are dependent upon single or multiple parts. The translation process is validated by systematically generating, translating, and simulating the phenotype of all the sequences in the design space generated by a small library of genetic parts. Attribute grammars represent a flexible framework connecting parts with models of biological function. They will be instrumental for building mathematical models of libraries of genetic constructs synthesized to characterize the function of genetic parts. This formalism is also expected to provide a solid foundation for the development of computer assisted design applications for synthetic biology. PMID:19816554
Wang, Zhangjie; Zhang, Tianji; Xie, Shaoshuai; Liu, Xinyue; Li, Hongmei; Linhardt, Robert J; Chi, Lianli
2018-03-01
Low molecular weight heparins (LMWHs) are widely used anticoagulant drugs. The composition and sequence of LMWH oligosaccharides determine their safety and efficacy. The short oligosaccharide pool in LMWHs undergoes more depolymerization reactions than the longer chains and is the most sensitive indicator of the manufacturing process. Electrospray ionization tandem mass spectrometry (ESI-MS/MS) has been demonstrated to be a powerful tool for sequencing synthetic heparin oligosaccharides but has never been applied to the analysis of complicated mixtures such as LMWHs. We established an offline strong anion exchange (SAX)-high performance liquid chromatography (HPLC) and ESI-MS/MS approach to sequence the short oligosaccharides of dalteparin sodium. With the help of in-house developed MS/MS interpretation software, the sequences of 18 representative species ranging from tetrasaccharide to octasaccharide were obtained. Interestingly, we found a novel 2,3-disulfated hexauronic acid structure and reconfirmed it by complementary heparinase digestion and LC-MS/MS analysis. This approach provides straightforward and in-depth insight into the structure of LMWHs and the reaction mechanism of heparin depolymerization. Copyright © 2017 Elsevier Ltd. All rights reserved.
The Mouse Genomes Project: a repository of inbred laboratory mouse strain genomes.
Adams, David J; Doran, Anthony G; Lilue, Jingtao; Keane, Thomas M
2015-10-01
The Mouse Genomes Project was initiated in 2009 with the goal of using next-generation sequencing technologies to catalogue molecular variation in the common laboratory mouse strains, and a selected set of wild-derived inbred strains. The initial sequencing and survey of sequence variation in 17 inbred strains was completed in 2011 and included a comprehensive catalogue of single nucleotide polymorphisms, short insertion/deletions, larger structural variants including their fine scale architecture and landscape of transposable element variation, and genomic sites subject to post-transcriptional alteration of RNA. From this beginning, the resource has expanded significantly to include 36 fully sequenced inbred laboratory mouse strains, a refined and updated data processing pipeline, and new variation querying and data visualisation tools which are available on the project's website (http://www.sanger.ac.uk/resources/mouse/genomes/). The focus of the project is now the completion of de novo assembled chromosome sequences and strain-specific gene structures for the core strains. We discuss how the assembled chromosomes will power comparative analysis, data access tools and future directions of mouse genetics.
Links, Matthew G; Chaban, Bonnie; Hemmingsen, Sean M; Muirhead, Kevin; Hill, Janet E
2013-08-15
Formation of operational taxonomic units (OTU) is a common approach to data aggregation in microbial ecology studies based on amplification and sequencing of individual gene targets. The de novo assembly of OTU sequences has been recently demonstrated as an alternative to widely used clustering methods, providing robust information from experimental data alone, without any reliance on an external reference database. Here we introduce mPUMA (microbial Profiling Using Metagenomic Assembly, http://mpuma.sourceforge.net), a software package for identification and analysis of protein-coding barcode sequence data. It was developed originally for Cpn60 universal target sequences (also known as GroEL or Hsp60). Using an unattended process that is independent of external reference sequences, mPUMA forms OTUs by DNA sequence assembly and is capable of tracking OTU abundance. mPUMA processes microbial profiles both in terms of the direct DNA sequence as well as in the translated amino acid sequence for protein coding barcodes. By forming OTUs and calculating abundance through an assembly approach, mPUMA is capable of generating inputs for several popular microbiota analysis tools. Using SFF data from sequencing of a synthetic community of Cpn60 sequences derived from the human vaginal microbiome, we demonstrate that mPUMA can faithfully reconstruct all expected OTU sequences and produce compositional profiles consistent with actual community structure. mPUMA enables analysis of microbial communities while empowering the discovery of novel organisms through OTU assembly.
McCarthy, Davis J; Campbell, Kieran R; Lun, Aaron T L; Wills, Quin F
2017-04-15
Single-cell RNA sequencing (scRNA-seq) is increasingly used to study gene expression at the level of individual cells. However, preparing raw sequence data for further analysis is not a straightforward process. Biases, artifacts and other sources of unwanted variation are present in the data, requiring substantial time and effort to be spent on pre-processing, quality control (QC) and normalization. We have developed the R/Bioconductor package scater to facilitate rigorous pre-processing, quality control, normalization and visualization of scRNA-seq data. The package provides a convenient, flexible workflow to process raw sequencing reads into a high-quality expression dataset ready for downstream analysis. scater provides a rich suite of plotting tools for single-cell data and a flexible data structure that is compatible with existing tools and can be used as infrastructure for future software development. The open-source code, along with installation instructions, vignettes and case studies, is available through Bioconductor at http://bioconductor.org/packages/scater . Contact: davis@ebi.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
Wan, Dongjin; Liu, Yongde; Niu, Zhenhua; Xiao, Shuhu; Li, Daorong
2016-02-01
Hydrogen autotrophic reduction of perchlorate has the advantages of high removal efficiency and of being harmless to drinking water. So far, however, reported information about the microbial community structure has been comparatively limited, and changes in biodiversity and in the dominant bacteria during the acclimation process require detailed study. In this study, perchlorate-reducing hydrogen autotrophic bacteria were acclimated from activated sludge by hydrogen aeration. For the first time, high-throughput sequencing was applied to analyze changes in biodiversity and the dominant bacteria during the acclimation process. The Michaelis-Menten model described the perchlorate reduction kinetics well. Model parameters q(max) and K(s) were 2.521-3.245 (mg ClO4(-)/gVSS h) and 5.44-8.23 (mg/l), respectively. Microbial perchlorate reduction occurred across the pH range 5.0-11.0; removal was highest at pH 9.0. The enriched mixed bacteria could use perchlorate, nitrate and sulfate as electron acceptors, and the sequence of preference was: NO3(-) > ClO4(-) > SO4(2-). Compared to the feed culture, biodiversity decreased greatly during the acclimation process, and the microbial community structure gradually stabilized after 9 acclimation cycles. The Thauera genus, related to Rhodocyclales, was the dominant perchlorate-reducing bacteria (PRB) in the mixed culture.
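The Michaelis-Menten rate law referred to in the abstract above has the form q = q(max) · S / (K(s) + S), where S is the perchlorate concentration. The sketch below evaluates it for a few concentrations. The values `Q_MAX = 3.0` and `K_S = 6.0` are assumed round numbers picked from within the reported parameter ranges, not fitted values from the study, and the function name is hypothetical.

```python
def specific_reduction_rate(s, q_max, k_s):
    """Michaelis-Menten rate q = q_max * S / (K_s + S).

    s     : substrate (perchlorate) concentration, mg/l
    q_max : maximum specific reduction rate, mg ClO4-/gVSS/h
    k_s   : half-saturation constant, mg/l
    """
    return q_max * s / (k_s + s)


Q_MAX = 3.0  # assumed, within the reported 2.521-3.245 range
K_S = 6.0    # assumed, within the reported 5.44-8.23 range

for s in (1.0, 6.0, 50.0):
    q = specific_reduction_rate(s, Q_MAX, K_S)
    print(f"S = {s:5.1f} mg/l -> q = {q:.3f} mg ClO4-/gVSS/h")
```

At S equal to K(s) the rate is exactly half of q(max), which is the usual interpretation of the half-saturation constant; at concentrations well above K(s) the rate approaches q(max).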
Graph mining for next generation sequencing: leveraging the assembly graph for biological insights.
Warnke-Sommer, Julia; Ali, Hesham
2016-05-06
The assembly of Next Generation Sequencing (NGS) reads remains a challenging task. This is especially true for the assembly of metagenomics data that originate from environmental samples potentially containing hundreds to thousands of unique species. The principal objective of current assembly tools is to assemble NGS reads into contiguous stretches of sequence called contigs while maximizing for both accuracy and contig length. The end goal of this process is to produce longer contigs with the major focus being on assembly only. Sequence read assembly is an aggregative process, during which read overlap relationship information is lost as reads are merged into longer sequences or contigs. The assembly graph is information rich and capable of capturing the genomic architecture of an input read data set. We have developed a novel hybrid graph in which nodes represent sequence regions at different levels of granularity. This model, utilized in the assembly and analysis pipeline Focus, presents a concise yet feature rich view of a given input data set, allowing for the extraction of biologically relevant graph structures for graph mining purposes. Focus was used to create hybrid graphs to model metagenomics data sets obtained from the gut microbiomes of five individuals with Crohn's disease and eight healthy individuals. Repetitive and mobile genetic elements are found to be associated with hybrid graph structure. Using graph mining techniques, a comparative study of the Crohn's disease and healthy data sets was conducted with focus on antibiotics resistance genes associated with transposase genes. Results demonstrated significant differences in the phylogenetic distribution of categories of antibiotics resistance genes in the healthy and diseased patients. Focus was also evaluated as a pure assembly tool and produced excellent results when compared against the Meta-velvet, Omega, and UD-IDBA assemblers.
Mining the hybrid graph can reveal biological phenomena captured by its structure. We demonstrate the advantages of considering assembly graphs as data-mining support in addition to their role as frameworks for assembly.
Diversity of Secondary Structure in Catalytic Peptides with β-Turn-Biased Sequences
2016-01-01
X-ray crystallography has been applied to the structural analysis of a series of tetrapeptides that were previously assessed for catalytic activity in an atroposelective bromination reaction. Common to the series is a central Pro-Xaa sequence, where Pro is either l- or d-proline, which was chosen to favor nucleation of canonical β-turn secondary structures. Crystallographic analysis of 35 different peptide sequences revealed a range of conformational states. The observed differences appear not only in cases where the Pro-Xaa loop-region is altered, but also when seemingly subtle alterations to the flanking residues are introduced. In many instances, distinct conformers of the same sequence were observed, either as symmetry-independent molecules within the same unit cell or as polymorphs. Computational studies using DFT provided additional insight into the analysis of solid-state structural features. Select X-ray crystal structures were compared to the corresponding solution structures derived from measured proton chemical shifts, 3J-values, and 1H–1H-NOESY contacts. These findings imply that the conformational space available to simple peptide-based catalysts is more diverse than precedent might suggest. The direct observation of multiple ground state conformations for peptides of this family, as well as the dynamic processes associated with conformational equilibria, underscore not only the challenge of designing peptide-based catalysts, but also the difficulty in predicting their accessible transition states. These findings implicate the advantages of low-barrier interconversions between conformations of peptide-based catalysts for multistep, enantioselective reactions. PMID:28029251
DOE Office of Scientific and Technical Information (OSTI.GOV)
2003-05-29
AUTOGEN computes collision-free sequences of robot motion instructions to permit traversal of three-dimensional space curves. Order and direction of curve traversal and orientation of end effector are constrained by a set of manufacturing rules. Input can be provided as a collection of solid models or in terms of wireframe objects and structural cross-section definitions. Entity juxtaposition can be inferred, with appropriate structural features automatically provided. Process control is asserted as a function of position and orientation along each space curve, and is currently implemented for welding processes.
Bedoin, Nathalie; Brisseau, Lucie; Molinier, Pauline; Roch, Didier; Tillmann, Barbara
2016-01-01
Children with developmental language disorders have been shown to be also impaired in rhythm and meter perception. Temporal processing and its link to language processing can be understood within the dynamic attending theory. An external stimulus can stimulate internal oscillators, which orient attention over time and drive speech signal segmentation to provide benefits for syntax processing, which is impaired in various patient populations. For children with Specific Language Impairment (SLI) and dyslexia, previous research has shown the influence of an external rhythmic stimulation on subsequent language processing by comparing the influence of a temporally regular musical prime to that of a temporally irregular prime. Here we tested whether the observed rhythmic stimulation effect is indeed due to a benefit provided by the regular musical prime (rather than a cost subsequent to the temporally irregular prime). Sixteen children with SLI and 16 age-matched controls listened to either a regular musical prime sequence or an environmental sound scene (without temporal regularities in event occurrence; i.e., referred to as "baseline condition") followed by grammatically correct and incorrect sentences. They were required to perform grammaticality judgments for each auditorily presented sentence. Results revealed that performance for the grammaticality judgments was better after the regular prime sequences than after the baseline sequences. Our findings are interpreted in the theoretical framework of the dynamic attending theory (Jones, 1976) and the temporal sampling (oscillatory) framework for developmental language disorders (Goswami, 2011). Furthermore, they encourage the use of rhythmic structures (even in non-verbal materials) to boost linguistic structure processing and outline perspectives for rehabilitation.
Sequence, Structure, and Context Preferences of Human RNA Binding Proteins.
Dominguez, Daniel; Freese, Peter; Alexis, Maria S; Su, Amanda; Hochman, Myles; Palden, Tsultrim; Bazile, Cassandra; Lambert, Nicole J; Van Nostrand, Eric L; Pratt, Gabriel A; Yeo, Gene W; Graveley, Brenton R; Burge, Christopher B
2018-06-07
RNA binding proteins (RBPs) orchestrate the production, processing, and function of mRNAs. Here, we present the affinity landscapes of 78 human RBPs using an unbiased assay that determines the sequence, structure, and context preferences of these proteins in vitro by deep sequencing of bound RNAs. These data enable construction of "RNA maps" of RBP activity without requiring crosslinking-based assays. We found an unexpectedly low diversity of RNA motifs, implying frequent convergence of binding specificity toward a relatively small set of RNA motifs, many with low compositional complexity. Offsetting this trend, however, we observed extensive preferences for contextual features distinct from short linear RNA motifs, including spaced "bipartite" motifs, biased flanking nucleotide composition, and bias away from or toward RNA structure. Our results emphasize the importance of contextual features in RNA recognition, which likely enable targeting of distinct subsets of transcripts by different RBPs that recognize the same linear motif. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
Sequence Determinants of Compaction in Intrinsically Disordered Proteins
Marsh, Joseph A.; Forman-Kay, Julie D.
2010-01-01
Intrinsically disordered proteins (IDPs), which lack folded structure and are disordered under nondenaturing conditions, have been shown to perform important functions in a large number of cellular processes. These proteins have interesting structural properties that deviate from the random-coil-like behavior exhibited by chemically denatured proteins. In particular, IDPs are often observed to exhibit significant compaction. In this study, we have analyzed the hydrodynamic radii of a number of IDPs to investigate the sequence determinants of this compaction. Net charge and proline content are observed to be strongly correlated with increased hydrodynamic radii, suggesting that these are the dominant contributors to compaction. Hydrophobicity and secondary structure, on the other hand, appear to have negligible effects on compaction, which implies that the determinants of structure in folded and intrinsically disordered proteins are profoundly different. Finally, we observe that polyhistidine tags seem to increase IDP compaction, which suggests that these tags have significant perturbing effects and thus should be removed before any structural characterizations of IDPs. Using the relationships observed in this analysis, we have developed a sequence-based predictor of hydrodynamic radius for IDPs that shows substantial improvement over a simple model based upon chain length alone. PMID:20483348
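The two sequence features named in this abstract, net charge and proline content, can be computed directly from a one-letter amino-acid string. The sketch below is only an illustration of those features, not the authors' predictor: the function name `sequence_features` is hypothetical, and treating Lys/Arg as +1, Asp/Glu as -1 and histidine as neutral is a simplifying assumption.

```python
def sequence_features(seq):
    """Return (net charge per residue, proline fraction) for a protein sequence.

    Assumption: K and R count as +1, D and E as -1, everything else
    (including histidine) as neutral.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    net_charge = positive - negative
    return net_charge / n, seq.count("P") / n


# Toy sequence, not taken from the study.
ncpr, pro = sequence_features("MKDEPPRKEE")
print(ncpr, pro)
```

Normalizing by chain length keeps the two features comparable across sequences of different size, which matters when they are used alongside a length-based baseline.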
GeneBee-net: Internet-based server for analyzing biopolymers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brodsky, L.I.; Ivanov, V.V.; Nikolaev, V.K.
This work describes a network server for searching databanks of biopolymer structures and performing other biocomputing procedures; it is available via direct Internet connection. Basic server procedures are dedicated to homology (similarity) search of sequence and 3D structure of proteins. The homologies found could be used to build multiple alignments, predict protein and RNA secondary structure, and construct phylogenetic trees. In addition to traditional methods of sequence similarity search, the authors propose "non-matrix" (correlational) search. An analogous approach is used to identify regions of similar tertiary structure of proteins. Algorithm concepts and usage examples are presented for new methods. Service logic is based upon interaction of a client program and server procedures. The client program allows the compilation of queries and the processing of results of an analysis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Man, Viet Hoang; Pan, Feng; Sagui, Celeste, E-mail: sagui@ncsu.edu
We explore the use of a fast laser melting simulation approach combined with atomistic molecular dynamics simulations in order to determine the melting and healing responses of B-DNA and Z-DNA dodecamers with the same d(5′-CGCGCGCGCGCG-3′)₂ sequence. The frequency of the laser pulse is specifically tuned to disrupt Watson-Crick hydrogen bonds, thus inducing melting of the DNA duplexes. Subsequently, the structures relax and partially refold, depending on the field strength. In addition to the inherent interest of the nonequilibrium melting process, we propose that fast melting by an infrared laser pulse could be used as a technique for a fast comparison of relative stabilities of same-sequence oligonucleotides with different secondary structures with full atomistic detail of the structures and solvent. This could be particularly useful for nonstandard secondary structures involving non-canonical base pairs, mismatches, etc.
Unsupervised learning of natural languages
Solan, Zach; Horn, David; Ruppin, Eytan; Edelman, Shimon
2005-01-01
We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The adios (automatic distillation of structure) algorithm relies on a statistical method for pattern extraction and on structured generalization, two processes that have been implicated in language acquisition. It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, and on protein data correlating sequence with function. This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics. PMID:16087885
Unsupervised learning of natural languages.
Solan, Zach; Horn, David; Ruppin, Eytan; Edelman, Shimon
2005-08-16
We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The adios (automatic distillation of structure) algorithm relies on a statistical method for pattern extraction and on structured generalization, two processes that have been implicated in language acquisition. It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, and on protein data correlating sequence with function. This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics.
Prospects and limitations of full-text index structures in genome analysis
Vyverman, Michaël; De Baets, Bernard; Fack, Veerle; Dawyndt, Peter
2012-01-01
The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared. PMID:22584621
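As a concrete instance of a full-text index structure, the sketch below builds a suffix array and uses binary search to locate every occurrence of a pattern. It is a deliberately naive illustration (construction sorts whole suffixes, which is far from the memory- and time-efficient variants the review surveys), and the function names are hypothetical.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return sorted start positions of pattern in text using the suffix array sa."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


text = "GATTACATTA"
sa = build_suffix_array(text)
print(find_occurrences(text, sa, "TTA"))
```

Once the array exists, each query costs a logarithmic number of string comparisons, which is the kind of memory-time trade-off the article compares across index variants.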
ERIC Educational Resources Information Center
Frith, Uta
1970-01-01
Findings are consistent with the hypothesis of an input processing deficit in autistic children. Autistic children were insensitive to differences in the structures present and tended to impose their own simple stereotyped patterns. Normal children imposed such patterns in the absence of structured input only. Paper reports work which has been…
Mukherjee, Koel; Pandey, Dev Mani; Vidyarthi, Ambarish Saran
2015-02-06
Gaining access to sequence and structure information of telomere binding proteins helps in understanding the essential biological processes involved in conserved sequence-specific interactions between DNA and the proteins. Rice telomere binding protein (RTBP1) and Nicotiana glutinosa telomere repeat binding factor (NgTRF1) are helix-turn-helix motif proteins that play a role in telomeric DNA protection and length regulation. Both proteins share the same type of domain, but so far there have been very few reports of in silico studies of the complete proteins. Here we undertake a comparative study of the two proteins through modeling of the complete proteins, physicochemical characterization, MD simulation and DNA-protein docking. I-TASSER and CLC Protein Workbench were used to obtain the protein 3D structures as well as the different parameters to characterize the proteins. MD simulation was carried out with the GROMOS force field in GROMACS for 10 ns. The simulated 3D structures were docked with template DNA (3D DNA modeled through 3D-DART) of the TTTAGGG conserved sequence motif using the HADDOCK web server. The analysis revealed that around 120 amino acids in the tail part show good sequence similarity between the proteins. Molecular modeling, sequence characterization and secondary structure prediction also indicate similarity between the proteins' structure and sequence. The MD simulation results, in terms of RMSD, RMSF, Rg, PCA and energy plots, also convey a similar type of motional behavior between them. The best complex formation for both proteins in the docking results also points to the first interaction site, which is mainly the helix3 region of the DNA binding domain. The overall computational analysis reveals that RTBP1 and NgTRF1 proteins display a good amount of similarity in their physicochemical properties, structure, dynamics and binding mode.
Mukherjee, Koel; Pandey, Dev Mani; Vidyarthi, Ambarish Saran
2015-09-01
Gaining access to sequence and structure information of telomere-binding proteins helps in understanding the essential biological processes involved in conserved sequence-specific interactions between DNA and the proteins. Rice telomere-binding protein (RTBP1) and Nicotiana glutinosa telomere repeat binding factor (NgTRF1) are helix-turn-helix motif proteins that play a role in telomeric DNA protection and length regulation. Both proteins share the same type of domain, but so far there have been very few reports of in silico studies of the complete proteins. Here we undertake a comparative study of the two proteins through modeling of the complete proteins, physicochemical characterization, MD simulation and DNA-protein docking. I-TASSER and CLC Protein Workbench were used to obtain the protein 3D structures as well as the different parameters to characterize the proteins. MD simulation was carried out with the GROMOS force field in GROMACS for 10 ns. The simulated 3D structures were docked with template DNA (3D DNA modeled through 3D-DART) of the TTTAGGG conserved sequence motif using the HADDOCK Web server. The analysis revealed that around 120 amino acids in the tail part show good sequence similarity between the proteins. Molecular modeling, sequence characterization and secondary structure prediction also indicate similarity between the proteins' structure and sequence. The MD simulation results, in terms of RMSD, RMSF, Rg, PCA and energy plots, also convey a similar type of motional behavior between them. The best complex formation for both proteins in the docking results also points to the first interaction site, which is mainly the helix3 region of the DNA-binding domain. The overall computational analysis reveals that RTBP1 and NgTRF1 proteins display a good amount of similarity in their physicochemical properties, structure, dynamics and binding mode.
The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4)
Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos; ...
2015-10-26
The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are subsequently included in the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. Structural annotation is followed by assignment of protein product names and functions.
High-Throughput Analysis of T-DNA Location and Structure Using Sequence Capture.
Inagaki, Soichi; Henry, Isabelle M; Lieberman, Meric C; Comai, Luca
2015-01-01
Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious, and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA-genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the need for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. Our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.
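The idea of finding read pairs that span a genome/T-DNA junction can be illustrated with a small filter over already-aligned pairs. This is a sketch under stated assumptions, not the authors' tool: the record layout `(pair_id, (ref, pos), (ref, pos))`, the reference names and the function `junction_pairs` are all hypothetical, and both mates are assumed to be mapped.

```python
def junction_pairs(pairs, tdna_refs):
    """Return (pair_id, genome_hit) for pairs with exactly one mate on a T-DNA reference.

    pairs: iterable of (pair_id, (ref1, pos1), (ref2, pos2)), both mates mapped.
    tdna_refs: set of reference names that belong to T-DNA / foreign DNA.
    """
    hits = []
    for pair_id, (ref1, pos1), (ref2, pos2) in pairs:
        in1, in2 = ref1 in tdna_refs, ref2 in tdna_refs
        if in1 != in2:  # one mate in the T-DNA, the other in the genome
            genome_hit = (ref2, pos2) if in1 else (ref1, pos1)
            hits.append((pair_id, genome_hit))
    return hits


# Toy alignments; "pTDNA" is a made-up vector name.
pairs = [
    ("p1", ("Chr1", 1500), ("pTDNA", 30)),
    ("p2", ("Chr2", 10), ("Chr2", 300)),
    ("p3", ("pTDNA", 5), ("pTDNA", 200)),
    ("p4", ("pTDNA", 4000), ("Chr4", 77)),
]
spanning = junction_pairs(pairs, {"pTDNA"})
print(spanning)
```

The genome-side coordinates of the retained pairs give approximate insertion locations; clustering them by position would be a natural next step.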
Sequence-similar, structure-dissimilar protein pairs in the PDB.
Kosloff, Mickey; Kolodny, Rachel
2008-05-01
It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such protein pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular, sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information and one or the other might be preferable in a given application. Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information. We have established a database of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).
Miller, Thomas F.
2017-01-01
We present a coarse-grained simulation model that is capable of simulating the minute-timescale dynamics of protein translocation and membrane integration via the Sec translocon, while retaining sufficient chemical and structural detail to capture many of the sequence-specific interactions that drive these processes. The model includes accurate geometric representations of the ribosome and Sec translocon, obtained directly from experimental structures, and interactions parameterized from nearly 200 μs of residue-based coarse-grained molecular dynamics simulations. A protocol for mapping amino-acid sequences to coarse-grained beads enables the direct simulation of trajectories for the co-translational insertion of arbitrary polypeptide sequences into the Sec translocon. The model reproduces experimentally observed features of membrane protein integration, including the efficiency with which polypeptide domains integrate into the membrane, the variation in integration efficiency upon single amino-acid mutations, and the orientation of transmembrane domains. The central advantage of the model is that it connects sequence-level protein features to biological observables and timescales, enabling direct simulation for the mechanistic analysis of co-translational integration and for the engineering of membrane proteins with enhanced membrane integration efficiency. PMID:28328943
SIMBAD : a sequence-independent molecular-replacement pipeline
Simpkin, Adam J.; Simkovic, Felix; Thomas, Jens M. H.; ...
2018-06-08
The conventional approach to finding structurally similar search models for use in molecular replacement (MR) is to use the sequence of the target to search against those of a set of known structures. Sequence similarity often correlates with structure similarity. Given sufficient similarity, a known structure correctly positioned in the target cell by the MR process can provide an approximation to the unknown phases of the target. An alternative approach to identifying homologous structures suitable for MR is to exploit the measured data directly, comparing the lattice parameters or the experimentally derived structure-factor amplitudes with those of known structures. Here, SIMBAD, a new sequence-independent MR pipeline which implements these approaches, is presented. SIMBAD can identify cases of contaminant crystallization and other mishaps such as mistaken identity (swapped crystallization trays), as well as solving unsequenced targets and providing a brute-force approach where sequence-dependent search-model identification may be nontrivial, for example because of conformational diversity among identifiable homologues. The program implements a three-step pipeline to efficiently identify a suitable search model in a database of known structures. The first step performs a lattice-parameter search against the entire Protein Data Bank (PDB), rapidly determining whether or not a homologue exists in the same crystal form. The second step is designed to screen the target data for the presence of a crystallized contaminant, a not uncommon occurrence in macromolecular crystallography. Solving structures with MR in such cases can remain problematic for many years, since the search models, which are assumed to be similar to the structure of interest, are not necessarily related to the structures that have actually crystallized. To cater for this eventuality, SIMBAD rapidly screens the data against a database of known contaminant structures. Where the first two steps fail to yield a solution, a final step in SIMBAD can be invoked to perform a brute-force search of a nonredundant PDB database provided by the MoRDa MR software. Through early-access usage of SIMBAD, this approach has solved novel cases that have otherwise proved difficult to solve.
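The first step of the pipeline, a lattice-parameter search, can be pictured as a tolerance filter over unit-cell parameters. The sketch below is an illustration only and does not reproduce SIMBAD's implementation: the function names, the 5% length tolerance, the 5 degree angle tolerance and the database entries are all assumptions, and a real search would also have to handle alternative cell settings.

```python
def cells_match(target, candidate, length_tol=0.05, angle_tol=5.0):
    """Compare two unit cells given as (a, b, c, alpha, beta, gamma).

    Lengths must agree within a relative tolerance, angles within an
    absolute tolerance in degrees. Both tolerances are illustrative guesses.
    """
    lengths_ok = all(
        abs(t - c) <= length_tol * t for t, c in zip(target[:3], candidate[:3])
    )
    angles_ok = all(
        abs(t - c) <= angle_tol for t, c in zip(target[3:], candidate[3:])
    )
    return lengths_ok and angles_ok


def lattice_search(target, database):
    """Return names of database entries whose cell matches the target cell."""
    return [name for name, cell in database.items() if cells_match(target, cell)]


# Made-up cells standing in for a table of known structures.
database = {
    "entry_A": (79.1, 79.1, 37.9, 90.0, 90.0, 90.0),
    "entry_B": (50.0, 60.0, 70.0, 90.0, 105.0, 90.0),
    "entry_C": (78.0, 78.0, 37.0, 90.0, 90.0, 120.0),
}
target_cell = (78.0, 78.0, 37.0, 90.0, 90.0, 90.0)
matches = lattice_search(target_cell, database)
print(matches)
```

Because the comparison needs only six numbers per entry, a filter of this kind can scan a very large table quickly, which is consistent with the abstract's description of this step as rapid.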
Knutson, Stacy T; Westwood, Brian M; Leuthaeuser, Janelle B; Turner, Brandon E; Nguyendac, Don; Shea, Gabrielle; Kumar, Kiran; Hayden, Julia D; Harper, Angela F; Brown, Shoshana D; Morris, John H; Ferrin, Thomas E; Babbitt, Patricia C; Fetrow, Jacquelyn S
2017-04-01
Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow identification of mechanistic determinants: amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two-Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure-Function Linkage Database, SFLD) self-identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self-identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well-curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP-identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F-measure and performance analysis on the enolase search results and comparison to GEMMA and SCI-PHY demonstrate that TuLIP avoids the over-division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results. © 2017 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
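The control flow described here (split, test each candidate cluster, stop when every structure is in an accepted group or is a singlet) can be sketched as a generic divisive loop. This is not TuLIP itself: `self_identifies` and `split` are caller-supplied stand-ins for the DASP2 self-identification test and the active-site-profile clustering step, and the toy versions used below are purely hypothetical.

```python
from collections import deque


def divisive_clustering(items, self_identifies, split):
    """Divide items until every cluster is accepted or is a single item.

    self_identifies(cluster) -> bool decides whether a cluster is accepted.
    split(cluster) -> list of sub-clusters (at least two non-empty parts).
    Returns (accepted_groups, singlets).
    """
    groups, singlets = [], []
    queue = deque([list(items)])
    while queue:
        cluster = queue.popleft()
        if len(cluster) == 1:
            singlets.append(cluster[0])
        elif self_identifies(cluster):
            groups.append(cluster)
        else:
            parts = [p for p in split(cluster) if p]
            if len(parts) < 2:
                raise ValueError("split must return at least two non-empty parts")
            queue.extend(parts)
    return groups, singlets


# Toy stand-ins: the first letter plays the role of the functional family.
def same_family(cluster):
    return len({name[0] for name in cluster}) == 1


def halve(cluster):
    mid = len(cluster) // 2
    return [cluster[:mid], cluster[mid:]]


groups, singlets = divisive_clustering(
    ["A1", "A2", "B1", "B2", "C1"], same_family, halve
)
print(groups, singlets)
```

With the naive `halve` splitter the two B structures end up as singlets, which shows how strongly the result depends on the quality of the splitting step rather than on the loop itself.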
Broxson, Christopher; Beckett, Joshua; Tornaletti, Silvia
2011-05-17
Noncanonical DNA structures correspond to genomic regions particularly susceptible to genetic instability. The transcription process facilitates formation of these structures and plays a major role in generating the instability associated with these genomic sites. However, little is known about how noncanonical structures are processed when encountered by an elongating RNA polymerase. Here we have studied the behavior of T7 RNA polymerase (T7 RNAP) when encountering a G-quadruplex-forming (GGA)₄ repeat located in the human c-myb proto-oncogene. To make direct correlations between formation of the structure and effects on transcription, we have taken advantage of the ability of the T7 polymerase to transcribe single-stranded substrates and of G4 DNA to form in single-stranded G-rich sequences in the presence of potassium ions. Under physiological KCl concentrations, we found that T7 RNAP transcription was arrested at two sites that mapped to the c-myb (GGA)₄ repeat sequence. The extent of arrest did not change with time, indicating that the c-myb repeat represented an absolute block and not a transient pause to T7 RNAP. Consistent with G4 DNA formation, arrest was not observed in the absence of KCl or in the presence of LiCl. Furthermore, mutations in the c-myb (GGA)₄ repeat, expected to prevent transition to G4, also eliminated the transcription block. We show T7 RNAP arrest at the c-myb repeat in double-stranded DNA under conditions mimicking the cellular concentration of biomolecules and potassium ions, suggesting that the G4 structure formed in the c-myb repeat may represent a transcription roadblock in vivo. Our results support a mechanism of transcription-coupled DNA repair initiated by arrest of transcription at G4 structures.
Sroubek, Jakub; Krishnan, Yamini; McDonald, Thomas V.
2013-01-01
Human ether-à-go-go-related gene (HERG) encodes a potassium channel that is highly susceptible to deleterious mutations resulting in susceptibility to fatal cardiac arrhythmias. Most mutations adversely affect HERG channel assembly and trafficking. Why the channel is so vulnerable to missense mutations is not well understood. Since nothing is known of how mRNA structural elements factor in channel processing, we synthesized a codon-modified HERG cDNA (HERG-CM) where the codons were synonymously changed to reduce GC content, secondary structure, and rare codon usage. HERG-CM produced typical IKr-like currents; however, channel synthesis and processing were markedly different. Translation efficiency was reduced for HERG-CM, as determined by heterologous expression, in vitro translation, and polysomal profiling. Trafficking efficiency to the cell surface was greatly enhanced, as assayed by immunofluorescence, subcellular fractionation, and surface labeling. Chimeras of HERG-NT/CM indicated that trafficking efficiency was largely dependent on 5′ sequences, while translation efficiency involved multiple areas. These results suggest that HERG translation and trafficking rates are independently governed by noncoding information in various regions of the mRNA molecule. Noncoding information embedded within the mRNA may play a role in the pathogenesis of hereditary arrhythmia syndromes and could provide an avenue for targeted therapeutics. Sroubek, J., Krishnan, Y., McDonald, T. V. Sequence- and structure-specific elements of HERG mRNA determine channel synthesis and trafficking efficiency. PMID:23608144
NASA Astrophysics Data System (ADS)
Stow, Dorrik A. V.; Shanmugam, Ganapathy
1980-01-01
A comparative study of the sequence of sedimentary structures in ancient and modern fine-grained turbidites is made in three contrasting areas. They are (1) Holocene and Pleistocene deep-sea muds of the Nova Scotian Slope and Rise, (2) Middle Ordovician Sevier Shale of the Valley and Ridge Province of the Southern Appalachians, and (3) Cambro-Ordovician Halifax Slate of the Meguma Group in Nova Scotia. A standard sequence of structures is proposed for fine-grained turbidites. The complete sequence has nine sub-divisions that are here termed T0 to T8. The lower subdivision (T0) comprises a silt lamina which has a sharp, scoured and load-cast base, internal parallel-lamination and cross-lamination, and a sharp current-lineated or wavy surface with 'fading-ripples' (= Type C ripple-drift cross-lamination, Jopling and Walker, 1968). The overlying sequence shows textural and compositional grading through alternating silt and mud laminae. A convolute-laminated sub-division (T1) is overlain by low-amplitude climbing ripples (T2), thin regular laminae (T3), thin indistinct laminae (T4), and thin wispy or convolute laminae (T5). The topmost three divisions, graded mud (T6), ungraded mud (T7) and bioturbated mud (T8), do not have silt laminae but rare patchy silt lenses and silt pseudonodules and a thin zone of micro-burrowing near the upper surface. The proposed sequence is analogous to the Bouma (1962) structural scheme for sandy turbidites and is approximately equivalent to Bouma's (C)DE divisions. The repetition of partial sequences characterizes different parts of the slope/base-of-slope/basin plain environment, and represents deposition from different stages of evolution of a large, muddy, turbidity flow. Microstructural detail and sequence are well preserved in ancient and even slightly metamorphosed sediments. Their recognition is important for determining depositional processes and for palaeoenvironmental interpretation.
Transcription blockage by stable H-DNA analogs in vitro
Pandey, Shristi; Ogloblina, Anna M.; Belotserkovskii, Boris P.; Dolinnaya, Nina G.; Yakubovskaya, Marianna G.; Mirkin, Sergei M.; Hanawalt, Philip C.
2015-01-01
DNA sequences that can form unusual secondary structures are implicated in regulating gene expression and causing genomic instability. H-palindromes are an important class of such DNA sequences that can form an intramolecular triplex structure, H-DNA. Within an H-palindrome, the H-DNA and canonical B-DNA are in a dynamic equilibrium that shifts toward H-DNA with increased negative supercoiling. The interplay between H- and B-DNA and the fact that the process of transcription affects supercoiling makes it difficult to elucidate the effects of H-DNA upon transcription. We constructed a stable structural analog of H-DNA that cannot flip into B-DNA, and studied the effects of this structure on transcription by T7 RNA polymerase in vitro. We found multiple transcription blockage sites adjacent to and within sequences engaged in this triplex structure. Triplex-mediated transcription blockage varied significantly with changes in ambient conditions: it was exacerbated in the presence of Mn2+ or by increased concentrations of K+ and Li+. Analysis of the detailed pattern of the blockage suggests that RNA polymerase is sterically hindered by H-DNA and has difficulties in unwinding triplex DNA. The implications of these findings for the biological roles of triple-stranded DNA structures are discussed. PMID:26101261
Kumar, Anil; Sharma, Divya; Tiwari, Apoorv; Jaiswal, J P; Singh, N K; Sood, Salej
2016-07-01
Finger millet [Eleusine coracana (L.) Gaertn.] is grown mainly by subsistence farmers in arid and semiarid regions of the world. To broaden its genetic base and to boost its production, it is of paramount importance to characterize and genotype the diverse gene pool of this important food and nutritional security crop. However, because the genome sequence of finger millet has not been available, little progress has been made in understanding the molecular basis of the unique qualities of the crop. In the present investigation, attempts have been made to characterize a genetically diverse collection of 113 finger millet accessions through whole-genome genotyping-by-sequencing (GBS), which resulted in a genome-wide set of 23,000 single-nucleotide polymorphisms (SNPs) segregating across the entire collection and several thousand SNPs segregating within every accession. A model-based population structure analysis reveals the presence of three subpopulations among the finger millet accessions, in agreement with the results of phylogenetic analysis. The observed population structure is consistent with the hypothesis that finger millet was domesticated first in Africa, and from there it was introduced to India some 3000 yr ago. A total of 1128 gene ontology (GO) terms were assigned to SNP-carrying genes for three main categories: biological process, cellular component, and molecular function. Facilitated access to high-throughput genotyping and sequencing technologies is likely to improve the breeding process in developing countries, and as such, these data will be very useful to breeders who are working on the genetic improvement of finger millet. Copyright © 2016 Crop Science Society of America.
Potential in vivo roles of nucleic acid triple-helices
Buske, Fabian A
2011-01-01
The ability of double-stranded DNA to form a triple-helical structure by hydrogen bonding with a third strand is well established, but the biological functions of these structures remain largely unknown. There is considerable albeit circumstantial evidence for the existence of nucleic triplexes in vivo and their potential participation in a variety of biological processes including chromatin organization, DNA repair, transcriptional regulation and RNA processing has been investigated in a number of studies to date. There is also a range of possible mechanisms to regulate triplex formation through differential expression of triplex-forming RNAs, alteration of chromatin accessibility, sequence unwinding and nucleotide modifications. With the advent of next generation sequencing technology combined with targeted approaches to isolate triplexes, it is now possible to survey triplex formation with respect to their genomic context, abundance and dynamical changes during differentiation and development, which may open up new vistas in understanding genome biology and gene regulation. PMID:21525785
Xie, Xuehui; Liu, Na; Ping, Jing; Zhang, Qingyun; Zheng, Xiulin; Liu, Jianshe
2018-06-01
In the present study, a hydrolysis acidification (HA) reactor was used for simulated dyeing wastewater treatment. Co-substrates including starch, glucose, sucrose, yeast extract (YE) and peptone were fed sequentially into the HA reactor to enhance the HA process. The performance of the HA reactor and the microbial community structure in the HA process were investigated under different co-substrate conditions. Results showed that different co-substrates had different influences on the performance of the HA reactor. The highest decolorization (50.64%) and COD removal rate (60.73%) of the HA reactor were obtained when sucrose was the co-substrate. The carbon co-substrates starch, glucose and sucrose gave better decolorization and higher COD removal efficiency than the nitrogen co-substrates YE and peptone. Microbial community structure in the HA process was analyzed by Illumina MiSeq sequencing. Results revealed that different co-substrates had different influences on the community structure and microbial diversity in the HA process. Sucrose appeared to enrich genera such as Raoultella, Desulfovibrio, Tolumonas, and Clostridium, which might be capable of degrading the dyes. Sucrose was considered to be the best co-substrate for enhancing the HA reactor's performance in this study. This work provides insight into the influence of different co-substrates on HA reactor performance and microbial communities in the HA process. Copyright © 2018 Elsevier Ltd. All rights reserved.
Exploring Connectivity in Sequence Space of Functional RNA
NASA Technical Reports Server (NTRS)
Wei, Chenyu; Pohorille, Andrzej; Popovic, Milena; Ditzler, Mark
2017-01-01
Emergence of replicable genetic molecules was one of the marking points in the origin of life, evolution of which can be conceptualized as a walk through the space of all possible sequences. A theoretical concept of fitness landscape helps to understand evolutionary processes through assigning a value of fitness to each genotype. Then, evolution of a phenotype is viewed as a series of consecutive, single-point mutations. Natural selection biases evolution toward peaks of high fitness and away from valleys of low fitness, whereas neutral drift occurs in the sequence space without direction as mutations are introduced at random. Large networks of neutral or near-neutral mutations on a fitness landscape, especially for sufficiently long genomes, are possible or even inevitable. Their detection in experiments, however, has been elusive. Although a few near-neutral evolutionary pathways have been found, recent experimental evidence indicates landscapes consist of largely isolated islands. The generality of these results, however, is not clear, as the genome length or the fraction of functional molecules in the genotypic space might have been insufficient for the emergence of large, neutral networks. Thorough investigation of the structure of the fitness landscape is essential to understand the mechanisms of evolution of early genomes. RNA molecules are commonly assumed to play the pivotal role in the origin of genetic systems. They are widely believed to be early, if not the earliest, genetic and catalytic molecules, with abundant biochemical activities as aptamers and ribozymes, i.e. RNA molecules capable, respectively, of binding small molecules or catalyzing chemical reactions. Here, we present results of our recent studies on the structure of the sequence space of RNA ligase ribozymes selected through in vitro evolution. Several hundred thousand sequences active to a different degree were obtained by way of deep sequencing.
Analysis of these sequences revealed several large clusters defined such that every sequence in a cluster can be reached from any other sequence in the same cluster through a series of single point mutations. Sequences in a single cluster appear to adopt more than one secondary structure. The mechanism of refolding within a single cluster was examined. To shed light on possible evolutionary paths in the space of ribozymes, the connectivity between clusters was investigated. The effect of length of RNA molecules on the structure of the fitness landscape and possible evolutionary paths was examined by way of comparing functional sequences of 20 and 80 nucleobases in length. It was found that sequences of different lengths shared secondary structure motifs that were presumed responsible for catalytic activity, with increasing complexity and global structural rearrangements emerging in longer molecules.
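The cluster definition used in the abstract above (every sequence reachable from any other through a series of single point mutations) corresponds to connected components of a graph whose edges join sequences at Hamming distance 1. The Python sketch below illustrates that idea under stated assumptions: point mutations are treated as substitutions only (no insertions or deletions), all names are hypothetical, and the six short sequences are made-up toy data, not results from the study.

```python
from collections import deque

ALPHABET = "ACGU"  # RNA bases; substitutions only (assumption)

def single_mutants(seq):
    """Yield every sequence that differs from seq by one substitution."""
    for i, base in enumerate(seq):
        for alt in ALPHABET:
            if alt != base:
                yield seq[:i] + alt + seq[i + 1:]

def mutation_clusters(sequences):
    """Group sequences into clusters connected by single point mutations,
    using breadth-first search over the observed sequences only."""
    remaining = set(sequences)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster = {seed}
        queue = deque([seed])
        while queue:
            current = queue.popleft()
            for candidate in single_mutants(current):
                if candidate in remaining:
                    remaining.remove(candidate)
                    cluster.add(candidate)
                    queue.append(candidate)
        clusters.append(cluster)
    return clusters

toy = ["AAAA", "AAAU", "AAUU", "GGGG", "GGGC", "CCCC"]
clusters = sorted((sorted(c) for c in mutation_clusters(toy)), key=len, reverse=True)
print(clusters)
```

Enumerating the 3L single mutants of each sequence and looking them up in a set avoids comparing all pairs, which matters when the data set holds hundreds of thousands of sequences.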
Comparative Protein Structure Modeling Using MODELLER.
Webb, Benjamin; Sali, Andrej
2014-09-08
Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. Copyright © 2014 John Wiley & Sons, Inc.
Generation of control sequences for a pilot-disassembly system
NASA Astrophysics Data System (ADS)
Seliger, Guenther; Kim, Hyung-Ju; Keil, Thomas
2002-02-01
Closing the product and material cycles has emerged as a paradigm for industry in the 21st century. Disassembly plays a key role in a life cycle economy since it enables the recovery of resources. A partly automated disassembly system should adapt to a large variety of products and different degrees of devaluation. The amounts of products to be disassembled can also vary strongly. To cope with these demands, an approach to generate on-line disassembly control sequences will be presented. Technological feasibility is considered within a procedure for the generation of disassembly control sequences. Procedures are designed to find available and technologically feasible disassembly processes. The control system is formed by modularised and parameterised control units at the cell level within the entire control architecture. In the first development stage, product and process analyses were carried out on a sample product, a washing machine. Furthermore, a generalized disassembly process was defined. Afterwards these processes were structured into primary and secondary functions. In the second stage the disassembly control at the technological level was investigated. Factors were the availability of the disassembly tools and the technological feasibility of the disassembly processes within the disassembly system. Technically alternative disassembly processes are determined as a result of the availability of the tools and the technological feasibility of the processes. The fourth phase was the concept for the generation of the disassembly control sequences. The approach will be proved in a prototypical disassembly system.
SimRNAweb: a web server for RNA 3D structure modeling with optional restraints.
Magnus, Marcin; Boniecki, Michał J; Dawson, Wayne; Bujnicki, Janusz M
2016-07-08
RNA function in many biological processes depends on the formation of three-dimensional (3D) structures. However, RNA structure is difficult to determine experimentally, which has prompted the development of predictive computational methods. Here, we introduce a user-friendly online interface for modeling RNA 3D structures using SimRNA, a method that uses a coarse-grained representation of RNA molecules, utilizes the Monte Carlo method to sample the conformational space, and relies on a statistical potential to describe the interactions in the folding process. SimRNAweb makes SimRNA accessible to users who do not normally use high performance computational facilities or are unfamiliar with using the command line tools. The simplest input consists of an RNA sequence to fold RNA de novo. Alternatively, a user can provide a 3D structure in the PDB format, for instance a preliminary model built with some other technique, to jump-start the modeling close to the expected final outcome. The user can optionally provide secondary structure and distance restraints, and can freeze a part of the starting 3D structure. SimRNAweb can be used to model single RNA sequences and RNA-RNA complexes (up to 52 chains). The webserver is available at http://genesilico.pl/SimRNAweb. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Czar, Michael J; Cai, Yizhi; Peccoud, Jean
2009-07-01
Chemical synthesis of custom DNA made to order calls for software streamlining the design of synthetic DNA sequences. GenoCAD (www.genocad.org) is a free web-based application to design protein expression vectors, artificial gene networks and other genetic constructs composed of multiple functional blocks called genetic parts. By capturing design strategies in grammatical models of DNA sequences, GenoCAD guides the user through the design process. By successively clicking on icons representing structural features or actual genetic parts, complex constructs composed of dozens of functional blocks can be designed in a matter of minutes. GenoCAD automatically derives the construct sequence from its comprehensive libraries of genetic parts. Upon completion of the design process, users can download the sequence for synthesis or further analysis. Users who elect to create a personal account on the system can customize their workspace by creating their own parts libraries, adding new parts to the libraries, or reusing designs to quickly generate sets of related constructs.
Use of simulated data sets to evaluate the fidelity of Metagenomic processing methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mavromatis, Konstantinos; Ivanova, Natalia; Barry, Kerri
2006-12-01
Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (blast hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.
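As a rough illustration of how a simulated data set of this kind might be assembled, the Python sketch below draws reads at random from per-genome read pools and keeps each read's genome of origin, so that downstream assembly or binning output can later be checked against the known source. It is a sketch under stated assumptions, not the authors' procedure: the function name, the input layout, and the toy read strings are all hypothetical.

```python
import random

def simulate_metagenome(reads_by_genome, reads_per_genome, seed=0):
    """Sample reads without replacement from each genome's read pool.

    reads_by_genome:  dict mapping genome name -> list of read strings
    reads_per_genome: dict mapping genome name -> number of reads to draw
                      (unequal counts mimic unequal community abundance)
    Returns a shuffled list of (genome, read) pairs; the genome label is
    the ground truth used later to score processing fidelity.
    """
    rng = random.Random(seed)  # fixed seed keeps the data set reproducible
    sampled = []
    for genome, count in reads_per_genome.items():
        for read in rng.sample(reads_by_genome[genome], count):
            sampled.append((genome, read))
    rng.shuffle(sampled)
    return sampled

# Toy pools (made-up reads, for illustration only).
pools = {
    "genome_A": ["ACGT", "CGTA", "GTAC", "TACG"],
    "genome_B": ["GGCC", "GCCG", "CCGG"],
}
dataset = simulate_metagenome(pools, {"genome_A": 3, "genome_B": 1}, seed=42)
print(dataset)
```

Varying the per-genome counts is one simple way to obtain data sets of differing complexity, for example one dominated by a single genome versus one with even representation.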
Primer-Free Aptamer Selection Using A Random DNA Library
Pan, Weihua; Xin, Ping; Patrick, Susan; Dean, Stacey; Keating, Christine; Clawson, Gary
2010-01-01
Aptamers are highly structured oligonucleotides (DNA or RNA) that can bind to targets with affinities comparable to antibodies 1. They are identified through an in vitro selection process called Systematic Evolution of Ligands by EXponential enrichment (SELEX) to recognize a wide variety of targets, from small molecules to proteins and other macromolecules 2-4. Aptamers have properties that are well suited for in vivo diagnostic and/or therapeutic applications: besides good specificity and affinity, they are easily synthesized, survive more rigorous processing conditions, are poorly immunogenic, and their relatively small size can result in facile penetration of tissues. Aptamers that are identified through the standard SELEX process usually comprise ~80 nucleotides (nt), since they are typically selected from nucleic acid libraries with ~40 nt long randomized regions plus fixed primer sites of ~20 nt on each side. The fixed primer sequences thus can comprise nearly 50% of the library sequences, and therefore may positively or negatively affect identification of aptamers in the selection process 3, although bioinformatics approaches suggest that the fixed sequences do not contribute significantly to aptamer structure after selection 5. To address these potential problems, primer sequences have been blocked by complementary oligonucleotides or switched to different sequences midway during the rounds of SELEX 6, or they have been trimmed to 6-9 nt 7, 8. Wen and Gray 9 designed a primer-free genomic SELEX method, in which the primer sequences were completely removed from the library before selection and were then regenerated to allow amplification of the selected genomic fragments. However, to employ the technique, a unique genomic library has to be constructed, which possesses limited diversity, and regeneration after rounds of selection relies on a linear reamplification step.
Alternatively, efforts to circumvent problems caused by fixed primer sequences using high efficiency partitioning are met with problems regarding PCR amplification 10. We have developed a primer-free (PF) selection method that significantly simplifies SELEX procedures and effectively eliminates primer-interference problems 11, 12. The protocols work in a straightforward manner. The central random region of the library is purified without extraneous flanking sequences and is bound to a suitable target (for example to a purified protein or complex mixtures such as cell lines). Then the bound sequences are obtained, reunited with flanking sequences, and re-amplified to generate selected sub-libraries. As an example, here we selected aptamers to S100B, a protein marker for melanoma. Binding assays showed Kd values in the 10^-7 to 10^-8 M range after a few rounds of selection, and we demonstrate that the aptamers function effectively in a sandwich binding format. PMID:20689511
Evaluating the solution from MrBUMP and BALBES
Keegan, Ronan M.; Long, Fei; Fazio, Vincent J.; Winn, Martyn D.; Murshudov, Garib N.; Vagin, Alexei A.
2011-01-01
Molecular replacement is one of the key methods used to solve the problem of determining the phases of structure factors in protein structure solution from X-ray image diffraction data. Its success rate has been steadily improving with the development of improved software methods and the increasing number of structures available in the PDB for use as search models. Despite this, in cases where there is low sequence identity between the target-structure sequence and that of its set of possible homologues it can be a difficult and time-consuming chore to isolate and prepare the best search model for molecular replacement. MrBUMP and BALBES are two recent developments from CCP4 that have been designed to automate and speed up the process of determining and preparing the best search models and putting them through molecular replacement. Their intention is to provide the user with a broad set of results using many search models and to highlight the best of these for further processing. An overview of both programs is presented along with a description of how best to use them, citing case studies and the results of large-scale testing of the software. PMID:21460449
Zhang, Chengxin; Zheng, Wei; Freddolino, Peter L; Zhang, Yang
2018-03-10
Homology-based transferal remains the major approach to computational protein function annotations, but it becomes increasingly unreliable when the sequence identity between query and template decreases below 30%. We propose a novel pipeline, MetaGO, to deduce Gene Ontology attributes of proteins by combining sequence homology-based annotation with low-resolution structure prediction and comparison, and partner's homology-based protein-protein network mapping. The pipeline was tested on a large-scale set of 1000 non-redundant proteins from the CAFA3 experiment. Under the stringent benchmark conditions where templates with >30% sequence identity to the query are excluded, MetaGO achieves average F-measures of 0.487, 0.408, and 0.598, for Molecular Function, Biological Process, and Cellular Component, respectively, which are significantly higher than those achieved by other state-of-the-art function annotations methods. Detailed data analysis shows that the major advantage of the MetaGO lies in the new functional homolog detections from partner's homology-based network mapping and structure-based local and global structure alignments, the confidence scores of which can be optimally combined through logistic regression. These data demonstrate the power of using a hybrid model incorporating protein structure and interaction networks to deduce new functional insights beyond traditional sequence homology-based referrals, especially for proteins that lack homologous function templates. The MetaGO pipeline is available at http://zhanglab.ccmb.med.umich.edu/MetaGO/. Copyright © 2018. Published by Elsevier Ltd.
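For readers unfamiliar with the F-measure quoted in the abstract above, the Python sketch below computes it for a single protein as the harmonic mean of precision and recall over sets of annotation terms. This is a simplified, per-protein illustration with hypothetical term labels; it is an assumption that this matches the spirit of the reported metric, and it does not reproduce the thresholding or averaging used in the CAFA benchmark.

```python
def f_measure(predicted, truth):
    """F-measure (harmonic mean of precision and recall) between a set of
    predicted annotation terms and a set of true annotation terms."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0  # also covers empty prediction or empty truth sets
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)

# Hypothetical term labels, for illustration only.
score = f_measure({"term_a", "term_b", "term_c", "term_d"},
                  {"term_a", "term_b", "term_e"})
print(round(score, 3))
```

Here two of four predictions are correct (precision 0.5) and two of three true terms are recovered (recall 2/3), so the score is 4/7.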
Processing of Archaebacterial Intron-Containing tRNA Gene Transcripts.
1987-07-31
1. Project Goals: A. To determine the mechanism of tRNA intron processing in the halophilic archaebacteria. B. Characterize and compare the...enzyme(s) responsible for the removal of 5’-flanking sequences from halophilic and sulfur-dependent tRNA gene transcripts. C. Examine the structure and...distribution of tRNA introns in the halophilic archaebacteria. 2. Accomplishments: A. Intron processing mechanism We have succeeded in our primary
Lo, Yu-Sheng; Tseng, Wen-Hsuan; Chuang, Chien-Ying; Hou, Ming-Hon
2013-01-01
The potent anticancer drug actinomycin D (ActD) functions by intercalating into DNA at GpC sites, thereby interrupting essential biological processes including replication and transcription. Certain neurological diseases are correlated with the expansion of (CGG)n trinucleotide sequences, which contain many contiguous GpC sites separated by a single G:G mispair. To characterize the binding of ActD to CGG triplet repeat sequences, the structural basis for the strong binding of ActD to neighbouring GpC sites flanking a G:G mismatch has been determined based on the crystal structure of ActD bound to ATGCGGCAT, which contains a CGG triplet sequence. The binding of ActD molecules to GCGGC causes many unexpected conformational changes including nucleotide flipping out, a sharp bend and a left-handed twist in the DNA helix via a two site-binding model. Heat denaturation, circular dichroism and surface plasmon resonance analyses showed that adjacent GpC sequences flanking a G:G mismatch are preferred ActD-binding sites. In addition, ActD was shown to bind the hairpin conformation of (CGG)16 in a pairwise combination and with greater stability than that of other DNA intercalators. Our results provide evidence of a possible biological consequence of ActD binding to CGG triplet repeat sequences. PMID:23408860
Dynamic Energy Landscapes of Riboswitches Help Interpret Conformational Rearrangements and Function
Quarta, Giulio; Sin, Ken; Schlick, Tamar
2012-01-01
Riboswitches are RNAs that modulate gene expression by ligand-induced conformational changes. However, the way in which sequence dictates alternative folding pathways of gene regulation remains unclear. In this study, we compute energy landscapes, which describe the accessible secondary structures for a range of sequence lengths, to analyze the transcriptional process as a given sequence elongates to full length. In line with experimental evidence, we find that most riboswitch landscapes can be characterized by three broad classes as a function of sequence length in terms of the distribution and barrier type of the conformational clusters: low-barrier landscape with an ensemble of different conformations in equilibrium before encountering a substrate; barrier-free landscape in which a direct, dominant “downhill” pathway to the minimum free energy structure is apparent; and a barrier-dominated landscape with two isolated conformational states, each associated with a different biological function. Sharing concepts with the “new view” of protein folding energy landscapes, we term the three sequence ranges above as the sensing, downhill folding, and functional windows, respectively. We find that these energy landscape patterns are conserved in various riboswitch classes, though the order of the windows may vary. In fact, the order of the three windows suggests either kinetic or thermodynamic control of ligand binding. These findings help understand riboswitch structure/function relationships and open new avenues to riboswitch design. PMID:22359488
van Atteveldt, Nienke; Musacchia, Gabriella; Zion-Golumbic, Elana; Sehatpour, Pejman; Javitt, Daniel C.; Schroeder, Charles
2015-01-01
The brain’s fascinating ability to adapt its internal neural dynamics to the temporal structure of the sensory environment is becoming increasingly clear. It is thought to be metabolically beneficial to align ongoing oscillatory activity to the relevant inputs in a predictable stream, so that they will enter at optimal processing phases of the spontaneously occurring rhythmic excitability fluctuations. However, some contexts have a more predictable temporal structure than others. Here, we tested the hypothesis that the processing of rhythmic sounds is more efficient than the processing of irregularly timed sounds. To do this, we simultaneously measured functional magnetic resonance imaging (fMRI) and electro-encephalograms (EEG) while participants detected oddball target sounds in alternating blocks of rhythmic (e.g., with equal inter-stimulus intervals) or random (e.g., with randomly varied inter-stimulus intervals) tone sequences. Behaviorally, participants detected target sounds faster and more accurately when embedded in rhythmic streams. The fMRI response in the auditory cortex was stronger during random compared to rhythmic tone sequence processing. Simultaneously recorded N1 responses showed larger peak amplitudes and longer latencies for tones in the random (vs. the rhythmic) streams. These results reveal complementary evidence for more efficient neural and perceptual processing during temporally predictable sensory contexts. PMID:26579044
Gao, Feng; Song, Weibo; Katz, Laura A
2014-08-01
In most lineages, diversity among gene family members results from gene duplication followed by sequence divergence. Because of the genome rearrangements during the development of somatic nuclei, gene family evolution in ciliates involves more complex processes. Previous work on the ciliate Chilodonella uncinata revealed that macronuclear β-tubulin gene family members are generated by alternative processing, in which germline regions are alternatively used in multiple macronuclear chromosomes. To further study genome evolution in this ciliate, we analyzed its transcriptome and found that (1) alternative processing is extensive among gene families; and (2) such gene families are likely to be C. uncinata specific. We characterized additional macronuclear and micronuclear copies of one candidate alternatively processed gene family, a protein kinase domain-containing protein (PKc), from two C. uncinata strains. Analysis of the PKc sequences reveals that (1) multiple PKc gene family members in the macronucleus share some identical regions flanked by divergent regions; and (2) the shared identical regions are processed from a single micronuclear chromosome. We discuss analogous processes in lineages across the eukaryotic tree of life to provide further insights on the impact of genome structure on gene family evolution in eukaryotes. © 2014 The Author(s). Evolution © 2014 The Society for the Study of Evolution.
Bassi, G S; Murchie, A I; Lilley, D M
1996-01-01
The hammerhead ribozyme undergoes an ion-dependent folding process into the active conformation. We find that the folding can be blocked at specific stages by changes of sequence or functionality within the core. In the absence of added metal ions, the global structure of the hammerhead is extended, with a large angle subtended between stems I and II. No core sequence changes appear to alter this geometry, consistent with an unstructured core under these conditions. Upon addition of low concentrations of magnesium ions, the hammerhead folds by an association of stems II and III, to include a large angle between them. This stage is inhibited or altered by mutations within the oligopurine sequence lying between stems II and III, and folding is completely prevented by an A14G mutation. Further increase in magnesium ion concentration brings about a second stage of folding in the natural sequence hammerhead, involving a reorientation of stem I, which rotates around into the same direction as stem II. Because this transition occurs over the same range of magnesium ion concentration over which the hammerhead ribozyme becomes active, it is likely that the final conformation is most closely related to the active form of the structure. Magnesium ion-dependent folding into this conformation is prevented by changes at G5, notably removal of the 2'-hydroxyl group and replacement of the base by cytidine. The ability to dissect the folding process by means of sequence changes suggests that two separate ion-dependent stages are involved in the folding of the hammerhead ribozyme into the active conformation. PMID:8752086
Sun, Lianpeng; Chen, Jianfan; Wei, Xiange; Guo, Wuzhen; Lin, Meishan; Yu, Xiaoyu
2016-05-01
To further reveal the mechanism of sludge reduction in the oxic-settling-anaerobic (OSA) process, the polymerase chain reaction - denaturing gradient gel electrophoresis protocol was used to study the possible difference in the microbial communities between a sequencing batch reactor (SBR)-OSA process and its modified process, by analyzing the change in the diversity of the microbial communities in each reactor of both systems. The results indicated that the structure of the microbial communities in aerobic reactors of the 2 processes was very different, but the predominant microbial populations in anaerobic reactors were similar. The predominant microbial population in the aerobic reactor of the SBR-OSA belonged to Burkholderia cepacia, class Betaproteobacteria, while those of the modified process belonged to the classes Alphaproteobacteria, Betaproteobacteria, and Gammaproteobacteria. These 3 types of microbes had a cryptic growth characteristic, which was the main cause of a greater sludge reduction efficiency achieved by the modified process.
PSI:Biology-Materials Repository: A Biologist’s Resource for Protein Expression Plasmids
Cormier, Catherine Y.; Park, Jin G.; Fiacco, Michael; Steel, Jason; Hunter, Preston; Kramer, Jason; Singla, Rajeev; LaBaer, Joshua
2011-01-01
The Protein Structure Initiative:Biology-Materials Repository (PSI:Biology-MR; MR; http://psimr.asu.edu) sequence-verifies, annotates, stores, and distributes the protein expression plasmids and vectors created by the Protein Structure Initiative (PSI). The MR has developed an informatics and sample processing pipeline that manages this process for thousands of samples per month from nearly a dozen PSI centers. DNASU (http://dnasu.asu.edu), a freely searchable database, stores the plasmid annotations, which include the full-length sequence, vector information, and associated publications for over 130,000 plasmids created by our laboratory, by the PSI and other consortia, and by individual laboratories for distribution to researchers worldwide. Each plasmid links to external resources, including the PSI Structural Biology Knowledgebase (http://sbkb.org), which facilitates cross-referencing of a particular plasmid to additional protein annotations and experimental data. To expedite and simplify plasmid requests, the MR uses an expedited material transfer agreement (EP-MTA) network, where researchers from network institutions can order and receive PSI plasmids without institutional delays. Currently over 39,000 protein expression plasmids and 78 empty vectors from the PSI are available upon request from DNASU. Overall, the MR’s repository of expression-ready plasmids, its automated pipeline, and the rapid process for receiving and distributing these plasmids more effectively allows the research community to dissect the biological function of proteins whose structures have been studied by the PSI. PMID:21360289
Zhang, Lin; Bai, Zhitong; Ban, Heng; Liu, Ling
2015-11-21
Recent experiments have discovered very different thermal conductivities between the spider silk and the silkworm silk. Decoding the molecular mechanisms underpinning the distinct thermal properties may guide the rational design of synthetic silk materials and other biomaterials for multifunctionality and tunable properties. However, such an understanding is lacking, mainly due to the complex structure and phonon physics associated with the silk materials. Here, using non-equilibrium molecular dynamics, we demonstrate that the amino acid sequence plays a key role in the thermal conduction process through β-sheets, essential building blocks of natural silks and a variety of other biomaterials. Three representative β-sheet types, i.e. poly-A, poly-(GA), and poly-G, are shown to have distinct structural features and phonon dynamics leading to different thermal conductivities. A fundamental understanding of the sequence effects may stimulate the design and engineering of polymers and biopolymers for desired thermal properties.
Sequences required for transcription termination at the intrinsic lambdatI terminator.
Martínez-Trujillo, Miguel; Sánchez-Trujillo, Alejandra; Ceja, Víctor; Avila-Moreno, Federico; Bermúdez-Cruz, Rosa María; Court, Donald; Montañez, Cecilia
2010-02-01
The lambdatI terminator is located approximately 280 bp beyond the lambdaint gene, and it has a typical structure of an intrinsic terminator. To identify sequences required for lambdatI transcription termination, a set of deletion mutants was generated, either from the 5' or the 3' end onto the lambdatI region. The termination efficiency was determined by measuring galactokinase (galK) levels by Northern blot assays and by in vitro transcription termination. The importance of the uridines and the stability of the stem structure in the termination were demonstrated. The nontranscribed DNA beyond the 3' end also affects termination. Additionally, sequences upstream have a small effect on transcription termination. The in vivo RNA termination sites at lambdatI were determined by S1 mapping and were located at 8 different positions. Processing of transcripts from the 3' end confirmed the importance of the hairpin stem in protection against exonuclease.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ma, Xiang; Zhang, Shuai; Jiao, Fang
Two-step nucleation pathways in which disordered, amorphous, or dense liquid states precede appearance of crystalline phases have been reported for a wide range of materials, but the dynamics of such pathways are poorly understood. Moreover, whether these pathways are general features of crystallizing systems or a consequence of system-specific structural details that select for direct vs two-step processes is unknown. Using atomic force microscopy to directly observe crystallization of sequence-defined polymers, we show that crystallization pathways are indeed sequence dependent. When a short hydrophobic region is added to a sequence that directly forms crystalline particles, crystallization instead follows a two-step pathway that begins with creation of disordered clusters of 10-20 molecules and is characterized by highly non-linear crystallization kinetics in which clusters transform into ordered structures that then enter the growth phase. The results shed new light on non-classical crystallization mechanisms and have implications for design of self-assembling polymer systems.
Mainardi, L T; Pattini, L; Cerutti, S
2007-01-01
A novel method is presented for the investigation of protein properties of sequences using Ramanujan Fourier Transform (RFT). The new methodology involves the preprocessing of protein sequence data by numerically encoding it and then applying the RFT. The RFT is based on projecting the obtained numerical series on a set of basis functions constituted by Ramanujan sums (RS). In RS components, periodicities of finite integer length, rather than frequency, (as in classical harmonic analysis) are considered. The potential of the new approach is documented by a few examples in the analysis of hydrophobic profiles of proteins in two classes including abundance of alpha-helices (group A) or beta-strands (group B). Different patterns are provided as evidence. RFT can be used to characterize the structural properties of proteins and integrate complementary information provided by other signal processing transforms.
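The projection onto Ramanujan sums described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the input is assumed to be an already numerically encoded sequence, and the coefficient is a finite-length estimate of the mean of x(n)·c_q(n) divided by Euler's totient φ(q).

```python
from math import cos, gcd, pi


def ramanujan_sum(q, n):
    """c_q(n): sum of cos(2*pi*k*n/q) over 1 <= k <= q with gcd(k, q) == 1."""
    total = sum(cos(2 * pi * k * n / q)
                for k in range(1, q + 1) if gcd(k, q) == 1)
    return round(total)  # Ramanujan sums are always integers


def totient(q):
    """Euler's totient: how many k in 1..q are coprime to q."""
    return sum(1 for k in range(1, q + 1) if gcd(k, q) == 1)


def rft_coefficient(x, q):
    """Finite-length estimate of the q-th coefficient (assumed normalization):
    (1 / (phi(q) * N)) * sum over n = 1..N of x(n) * c_q(n)."""
    n_points = len(x)
    weighted = sum(v * ramanujan_sum(q, n) for n, v in enumerate(x, start=1))
    return weighted / (totient(q) * n_points)


# Toy encoded series with an exact period of 3 residues (illustrative only).
series = [0, 0, 1] * 4
spectrum = {q: rft_coefficient(series, q) for q in range(1, 5)}
```

In this toy series the q = 3 component is non-zero while the q = 2 and q = 4 components vanish, which is the sense in which the transform picks out integer periodicities rather than frequencies.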
NASA Astrophysics Data System (ADS)
Ginsburger, Kévin; Poupon, Fabrice; Beaujoin, Justine; Estournet, Delphine; Matuschke, Felix; Mangin, Jean-François; Axer, Markus; Poupon, Cyril
2018-02-01
White matter is composed of irregularly packed axons leading to a structural disorder in the extra-axonal space. Diffusion MRI experiments using oscillating gradient spin echo sequences have shown that the diffusivity transverse to axons in this extra-axonal space is dependent on the frequency of the employed sequence. In this study, we observe the same frequency-dependence using 3D simulations of the diffusion process in disordered media. We design a novel white matter numerical phantom generation algorithm which constructs biomimicking geometric configurations with few design parameters, and makes it possible to control the level of disorder of the generated phantoms. The influence of various geometrical parameters present in white matter, such as global angular dispersion, tortuosity, presence of Ranvier nodes, and beading, on the extra-cellular perpendicular diffusivity frequency dependence was investigated by simulating the diffusion process in numerical phantoms of increasing complexity and fitting the resulting simulated diffusion MR signal attenuation with an adequate analytical model designed for trapezoidal OGSE sequences. This work suggests that angular dispersion and especially beading have non-negligible effects on this extracellular diffusion metric, which may be measured using standard OGSE DW-MRI clinical protocols.
Thermodynamic stability of biomolecules and evolution.
Chakravarty, Ashim K
2017-08-01
The thermodynamic stability of biomolecules in the perspective of evolution is a complex issue and needs discussion. Intramolecular bonds maintain the structure and the state of internal energy (E) of a biomolecule at "local minima". In this communication, the possibility of a loss in the internal energy level of a biomolecule through changes in its bonds is discussed, which might earn more thermodynamic stability for the molecule. In the process, variations in the structure and functions of the molecule could occur. Thus, E of a biomolecule is likely to have energy stature for minimization. Such a change in energy status is an intrinsic factor for evolving biomolecules, buying more stability and generating variations in the structure and function of DNA molecules undergoing natural selection. Thus, the variations might very well contribute towards the process of evolution. A brief discussion on conserved sequences in the light of the proposition in this communication is given at the end. Extension of the idea may resolve certain standing problems in evolution, such as maintenance of conserved sequences in the genomes of diverse species, pre- versus post-adaptive mutations, 'orthogenesis', etc. Copyright © 2017 Elsevier Ltd. All rights reserved.
Towards fully automated structure-based function prediction in structural genomics: a case study.
Watson, James D; Sanderson, Steve; Ezersky, Alexandra; Savchenko, Alexei; Edwards, Aled; Orengo, Christine; Joachimiak, Andrzej; Laskowski, Roman A; Thornton, Janet M
2007-04-13
As the global Structural Genomics projects have picked up pace, the number of structures annotated in the Protein Data Bank as hypothetical protein or unknown function has grown significantly. A major challenge now involves the development of computational methods to assign functions to these proteins accurately and automatically. As part of the Midwest Center for Structural Genomics (MCSG) we have developed a fully automated functional analysis server, ProFunc, which performs a battery of analyses on a submitted structure. The analyses combine a number of sequence-based and structure-based methods to identify functional clues. After the first stage of the Protein Structure Initiative (PSI), we review the success of the pipeline and the importance of structure-based function prediction. As a dataset, we have chosen all structures solved by the MCSG during the 5 years of the first PSI. Our analysis suggests that two of the structure-based methods are particularly successful and provide examples of local similarity that is difficult to identify using current sequence-based methods. No one method is successful in all cases, so, through the use of a number of complementary sequence and structural approaches, the ProFunc server increases the chances that at least one method will find a significant hit that can help elucidate function. Manual assessment of the results is a time-consuming process and subject to individual interpretation and human error. We present a method based on the Gene Ontology (GO) schema using GO-slims that can allow the automated assessment of hits with a success rate approaching that of expert manual assessment.
NASA Astrophysics Data System (ADS)
Sethaphong, Latsavongsakda
This work examines smart material properties of rational self-assembly and molecular recognition found in nano-biosystems. Exploiting the sequence and structural information encoded within nucleic acids and proteins will permit programmed synthesis of nanomaterials and help create molecular machines that may carry out new roles involving chemical catalysis and bioenergy. Responsive to different ionic environments through self-reorganization, nucleic acids (NA) are nature's signature smart material; organisms such as viruses and bacteria use features of NAs to react to their environment and orchestrate their lifecycle. Furthermore, nucleic acid systems (both RNA and DNA) are currently exploited as scaffolds; recent applications have been showcased to build bioelectronics and biotemplated nanostructures via directed assembly of multidimensional nanoelectronic devices 1. Since the most stable and rudimentary structure of nucleic acids is the helical duplex, these were modeled in order to examine the influence of the microenvironment, sequence, and cation-dependent perturbations of their canonical forms. Due to their negatively charged phosphate backbone, NAs rely on counterions to overcome the inherent repulsive forces that arise from the assembly of two complementary strands. As a realistic model system, we chose the HIV-TAR helix (PDB ID: 397D) to study specific sequence motifs on cation sequestration. At physiologically relevant concentrations of sodium and potassium ions, we observed sequence based effects where purine stretches were adept in retaining high residency cations. The transitional space between adenine and guanosine nucleotides (ApG step) in a sequence proved the most favorable. This work was the first to directly show these subtle interactions of sequence based cationic sequestration and may be useful for controlling metallization of nucleic acids in conductive nanowires.
Extending the study further, we explored the degree to which the structure of NA duplexes alone interacted with cations distinct from a specific sequence. Under physiologically relevant conditions, a duplex of RNA polyguanine-polycytidine was highly responsive and able to sequester cations to the middle of the purine stretches. The least responsive structure was a DNA polyadenine-polythymine duplex. A random sequence DNA duplex contorted into an RNA-like helix resulted in cationic dynamics similar to RNA systems. These studies showed that cation diffusive binding events in nucleic acid duplex structures are sequence specific and heavily influenced by structural aspects of helical forms, which account for much of the differences observed. Although structural information in nucleic acids is encoded within their sequence, linking amino acid sequence to protein structure is murkier; the structural information within proteins is encoded by the folding process itself: a complex phenomenon driven toward the equilibrium state of the active conformation. Upwards of two thirds of a protein's sequence can be substituted with similar amino acids without significantly perturbing its function; conserved residues of about 10% seem to be vital; since evolutionary selection pressure in proteins operates 3-dimensionally, a linear sequence is only partially informative. We explored this problem by folding de novo the cytosolic portion of the membrane protein, cellulose synthase, CESA1 from upland cotton, Gossypium hirsutum (Ghcesa1). The cytoplasmic region was generated by homology modeling and refined with molecular dynamics. These mutations impair local structural flexibility which likely results in cellulose that is produced at a lower rate and is less crystalline. Additional modeling of fragments of cellulose synthases from the model plant, Arabidopsis thaliana, offered novel insights into the function of conserved cytosolic domains within plant cellulose synthases.
Transport mechanisms related to the transmembrane region revealed significant differences between plants and a bacterial complex. These studies generated possible mutations that may allow for the creation of new synthases and identified other avenues of research in order to develop technologies that may alter the crystallinity and other useful properties of cellulose. 1. Karplus, K., SAM-T08, HMM-based protein structure prediction. Nucleic Acids Research, 2009. 37: p. W492-W497.
Ibrahim, Kalibulla Syed; Muniyandi, Jeyaraj; Pandian, Shunmugiah Karutha
2011-10-01
Leather industries release large amounts of pollution-causing chemicals, making them a major source of industrial pollution. The development of enzyme based processes as a potent alternative to pollution-causing chemicals is useful to overcome this issue. Proteases are enzymes which have extensive applications in leather processing and in several bioremediation processes due to their high alkaline protease activity and dehairing efficacy. In the present study, we report the cloning and characterization of a Mn2+ dependent alkaline serine protease gene (MASPT) of Bacillus pumilus TMS55. The gene encoding the protease from B. pumilus TMS55 was cloned and its nucleotide sequence was determined. This gene has an open reading frame (ORF) of 1,149 bp that encodes a polypeptide of 383 amino acid residues. Our analysis showed that this polypeptide is composed of a 29-residue N-terminal signal peptide, a propeptide of 79 residues and a mature protein of 275 amino acids. We performed bioinformatics analysis to compare MASPT enzyme with other proteases. Homology modeling was employed to model the three dimensional structure of MASPT. Structural analysis showed that the MASPT structure is composed of nine α-helices and nine β-strands. It has 3 catalytic residues and 14 metal binding residues. Docking analysis showed that residues S223, A260, N263, T328 and S329 interact with Mn2+. This study allows initial inferences about the structure of the protease and will allow the rational design of its derivatives for structure-function studies and also for further improvement of the enzyme.
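The lengths quoted in the abstract above can be cross-checked with a few lines of arithmetic. The sketch below only restates the reported numbers; the helper names are hypothetical, and it assumes the 1,149 bp figure counts coding codons only, since 1,149 / 3 gives exactly the 383 residues reported.

```python
def codons_in_orf(orf_bp):
    """Number of complete codons in an ORF of the given length in base pairs."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return orf_bp // 3


def precursor_length(signal_peptide, propeptide, mature):
    """Total residues of a pre-pro-protein from its three reported segments."""
    return signal_peptide + propeptide + mature


# Values as reported for MASPT in the abstract.
residues_from_orf = codons_in_orf(1149)
residues_from_segments = precursor_length(29, 79, 275)
consistent = residues_from_orf == residues_from_segments == 383
```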
McGarvey, J A; Franco, R B; Palumbo, J D; Hnasko, R; Stanker, L; Mitloehner, F M
2013-06-01
To describe, at high resolution, the bacterial population dynamics and chemical transformations during the ensiling of alfalfa and subsequent exposure to air. Samples of alfalfa, ensiled alfalfa and silage exposed to air were collected and their bacterial population structures compared using 16S rRNA gene libraries containing approximately 1900 sequences each. Cultural and chemical analyses were also performed to complement the 16S gene sequence data. Sequence analysis revealed significant differences (P < 0·05) in the bacterial populations at each time point. The alfalfa-derived library contained mostly sequences associated with the Gammaproteobacteria (including the genera: Enterobacter, Erwinia and Pantoea); the ensiled material contained mostly sequences associated with the lactic acid bacteria (LAB) (including the genera: Lactobacillus, Pediococcus and Lactococcus). Exposure to air resulted in even greater percentages of LAB, especially among the genus Lactobacillus, and a significant drop in bacterial diversity. In-depth 16S rRNA gene sequence analysis revealed significant bacterial population structure changes during ensiling and again during exposure to air. This in-depth description of the bacterial population dynamics that occurred during ensiling and simulated feed out expands our knowledge of these processes. © 2013 The Society for Applied Microbiology No claim to US Government works.
SIBIS: a Bayesian model for inconsistent protein sequence estimation.
Khenoussi, Walyd; Vanhoutrève, Renaud; Poch, Olivier; Thompson, Julie D
2014-09-01
The prediction of protein coding genes is a major challenge that depends on the quality of genome sequencing, the accuracy of the model used to elucidate the exonic structure of the genes and the complexity of the gene splicing process leading to different protein variants. As a consequence, today's protein databases contain a huge amount of inconsistency, due to both natural variants and sequence prediction errors. We have developed a new method, called SIBIS, to detect such inconsistencies based on the evolutionary information in multiple sequence alignments. A Bayesian framework, combined with Dirichlet mixture models, is used to estimate the probability of observing specific amino acids and to detect inconsistent or erroneous sequence segments. We evaluated the performance of SIBIS on a reference set of protein sequences with experimentally validated errors and showed that the sensitivity is significantly higher than previous methods, with only a small loss of specificity. We also assessed a large set of human sequences from the UniProt database and found evidence of inconsistency in 48% of the previously uncharacterized sequences. We conclude that the integration of quality control methods like SIBIS in automatic analysis pipelines will be critical for the robust inference of structural, functional and phylogenetic information from these sequences. Source code, implemented in C on a Linux system, and the datasets of protein sequences are freely available for download at http://www.lbgi.fr/~julie/SIBIS. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
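The abstract above does not give the SIBIS equations, so the following is only a generic sketch of how a Dirichlet mixture prior can turn the residue counts of one alignment column into estimated residue probabilities. It uses the standard posterior-mean estimator for Dirichlet mixtures; the function names, the two-letter toy alphabet and the mixture parameters are all assumptions made for illustration.

```python
from math import exp, lgamma, log


def log_beta(values):
    """Log of the multivariate Beta function of a vector of positive numbers."""
    return sum(lgamma(v) for v in values) - lgamma(sum(values))


def dirichlet_mixture_posterior_mean(counts, components):
    """Estimate symbol probabilities for one alignment column.

    counts: observed count per symbol in the column.
    components: list of (mixture_weight, alphas) pairs, one per Dirichlet.
    """
    # Log posterior weight of each component given the observed counts.
    log_weights = []
    for weight, alphas in components:
        posterior = [c + a for c, a in zip(counts, alphas)]
        log_weights.append(log(weight) + log_beta(posterior) - log_beta(alphas))
    top = max(log_weights)
    unnormalized = [exp(lw - top) for lw in log_weights]
    norm = sum(unnormalized)

    # Mix the per-component posterior means with the normalized weights.
    total = sum(counts)
    probabilities = [0.0] * len(counts)
    for u, (_, alphas) in zip(unnormalized, components):
        denominator = total + sum(alphas)
        for i, (c, a) in enumerate(zip(counts, alphas)):
            probabilities[i] += (u / norm) * (c + a) / denominator
    return probabilities


# Toy two-symbol column; a real protein column would use 20 counts.
toy_components = [(0.5, [10.0, 1.0]), (0.5, [1.0, 10.0])]
column_probabilities = dirichlet_mixture_posterior_mean([5, 0], toy_components)
```

A residue whose estimated probability in its column is very low would then be a candidate for an inconsistent segment; how SIBIS sets that threshold is not stated in the abstract.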
Brain potentials predict learning, transmission and modification of an artificial symbolic system.
Lumaca, Massimo; Baggio, Giosuè
2016-12-01
It has recently been argued that symbolic systems evolve while they are being transmitted across generations of learners, gradually adapting to the relevant brain structures and processes. In the context of this hypothesis, little is known about whether individual differences in neural processing capacity account for aspects of 'variation' observed in symbolic behavior and symbolic systems. We addressed this issue in the domain of auditory processing. We conducted a combined behavioral and EEG study on 2 successive days. On day 1, participants listened to standard and deviant five-tone sequences: as in previous oddball studies, a mismatch negativity (MMN) was elicited by deviant tones. On day 2, participants learned an artificial signaling system from a trained confederate of the experimenters in a coordination game in which five-tone sequences were associated with affective meanings (emotion-laden pictures of human faces). In a subsequent game with identical structure, participants transmitted and occasionally changed the signaling system learned during the first game. The MMN latency from day 1 predicted learning, transmission and structural modification of signaling systems on day 2. Our study introduces neurophysiological methods into research on cultural transmission and evolution, and relates aspects of variation in symbolic systems to individual differences in neural information processing. © The Author (2016). Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
Kanamori, Hiroshi; Yuhashi, Kazuhito; Ohnishi, Shin; Koike, Kazuhiko; Kodama, Tatsuhiko
2010-05-01
The hepatitis C virus NS5B RNA-dependent RNA polymerase (RdRp) is a key enzyme involved in viral replication. Interaction between NS5B RdRp and the viral RNA sequence is likely to be an important step in viral RNA replication. The C-terminal half of the NS5B-coding sequence, which contains the important cis-acting replication element, has been identified as an NS5B-binding sequence. In the present study, we confirm the specific binding of NS5B to one of the RNA stem-loop structures in the region, 5BSL3.2. In addition, we show that NS5B binds to the complementary strand of 5BSL3.2 (5BSL3.2N). The bulge structure of 5BSL3.2N was shown to be indispensable for tight binding to NS5B. In vitro RdRp activity was inhibited by 5BSL3.2N, indicating the importance of the RNA element in the polymerization by RdRp. These results suggest the involvement of the RNA stem-loop structure of the negative strand in the replication process.
High-throughput analysis of T-DNA location and structure using sequence capture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Inagaki, Soichi; Henry, Isabelle M.; Lieberman, Meric C.
Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA–genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the need for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. As a result, our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.
High-throughput analysis of T-DNA location and structure using sequence capture
Inagaki, Soichi; Henry, Isabelle M.; Lieberman, Meric C.; ...
2015-10-07
Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA–genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the need for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. As a result, our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.
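The idea of finding read pairs that span a genome/T-DNA junction can be illustrated with a small sketch. This is not the authors' tool: it assumes that each mapped mate has already been reduced to a (pair id, reference name, position) tuple, and the reference name "tdna_vector_A" and the function name are hypothetical.

```python
from collections import defaultdict


def find_junction_pairs(mate_alignments, tdna_references):
    """Return read pairs with one mate on the genome and one on a T-DNA.

    mate_alignments: iterable of (pair_id, reference_name, position) tuples.
    tdna_references: set of reference names that are T-DNA or vector sequences.
    """
    mates_by_pair = defaultdict(list)
    for pair_id, reference, position in mate_alignments:
        mates_by_pair[pair_id].append((reference, position))

    junction_pairs = []
    for pair_id, mates in mates_by_pair.items():
        on_tdna = [m for m in mates if m[0] in tdna_references]
        on_genome = [m for m in mates if m[0] not in tdna_references]
        if on_tdna and on_genome:
            # The genome mate gives an approximate insertion location.
            junction_pairs.append((pair_id, on_genome[0], on_tdna[0]))
    return junction_pairs


example_mates = [
    ("pair1", "Chr1", 100), ("pair1", "tdna_vector_A", 20),   # spans a junction
    ("pair2", "Chr1", 500), ("pair2", "Chr1", 700),           # genome only
    ("pair3", "tdna_vector_A", 5), ("pair3", "tdna_vector_A", 300),  # T-DNA only
]
spanning = find_junction_pairs(example_mates, {"tdna_vector_A"})
```

Clustering the genome-side positions of such pairs would then suggest candidate insertion sites; reads that themselves cross the junction would need split-read handling, which this sketch leaves out.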
Shinshi, H.; Wenzler, H.; Neuhaus, J.-M.; Felix, G.; Hofsteenge, J.; Meins, F.
1988-01-01
Tobacco glucan endo-1,3-β-glucosidase (β-1,3-glucanase; 1,3-β-D-glucan glucanohydrolase; EC 3.2.1.39) exhibits complex hormonal and developmental regulation and is induced when plants are infected with pathogens. We determined the primary structure of this enzyme from the nucleotide sequence of five partial cDNA clones and the amino acid sequence of five peptides covering a total of 70 residues. β-1,3-Glucanase is produced as a 359-residue preproenzyme with an N-terminal hydrophobic signal peptide of 21 residues and a C-terminal extension of 22 residues containing a putative N-glycosylation site. The results of pulse-chase experiments with tunicamycin provide evidence that the first step in processing is loss of the signal peptide and addition of an oligosaccharide side chain. The glycosylated intermediate is further processed with the loss of the oligosaccharide side chain and C-terminal extension to give the mature enzyme. Heterogeneity in the sequences of cDNA clones and of mature protein and in Southern blot analysis of restriction endonuclease fragments indicates that tobacco β-1,3-glucanase is encoded by a small gene family. Two or three members of this family appear to have their evolutionary origin in each of the progenitors of tobacco, Nicotiana sylvestris and Nicotiana tomentosiformis. PMID:16593965
Knutson, Stacy T.; Westwood, Brian M.; Leuthaeuser, Janelle B.; Turner, Brandon E.; Nguyendac, Don; Shea, Gabrielle; Kumar, Kiran; Hayden, Julia D.; Harper, Angela F.; Brown, Shoshana D.; Morris, John H.; Ferrin, Thomas E.; Babbitt, Patricia C.
2017-01-01
Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow identification of mechanistic determinants (amino acids that distinguish details between functional families within a superfamily). Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two‐Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure‐Function Linkage Database, SFLD) self‐identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self‐identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well‐curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP‐identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F‐measure and performance analysis on the enolase search results and comparison to GEMMA and SCI‐PHY demonstrate that TuLIP avoids the over‐division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results. PMID:28054422
Zhou, Carol L Ecale
2015-01-01
In order to better define regions of similarity among related protein structures, it is useful to identify the residue-residue correspondences among proteins. Few codes exist for constructing a one-to-many multiple sequence alignment derived from a set of structure or sequence alignments, and a need was evident for creating such a tool for combining pairwise structure alignments that would allow for insertion of gaps in the reference structure. This report describes a new Python code, CombAlign, which takes as input a set of pairwise sequence alignments (which may be structure based) and generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA). The use and utility of CombAlign was demonstrated by generating gapped MSSAs using sets of pairwise structure-based sequence alignments between structure models of the matrix protein (VP40) and pre-small/secreted glycoprotein (sGP) of Reston Ebolavirus and the corresponding proteins of several other filoviruses. The gapped MSSAs revealed structure-based residue-residue correspondences, which enabled identification of structurally similar versus differing regions in the Reston proteins compared to each of the other corresponding proteins. CombAlign is a new Python code that generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA) given a set of pairwise sequence alignments (which may be structure based). CombAlign has utility in assisting the user in distinguishing structurally conserved versus divergent regions on a reference protein structure relative to other closely related proteins. CombAlign was developed in Python 2.6, and the source code is available for download from the GitHub code repository.
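The idea of combining pairwise alignments against a common reference into a one-to-many gapped alignment can be sketched as follows. This is an illustrative assumption about how such a merge might work, not CombAlign's actual code: the function name `merge_pairwise` is hypothetical, and insertions from different sequences at the same reference position are simply left-justified into shared gap columns rather than aligned to each other.

```python
def merge_pairwise(alignments):
    """Merge pairwise alignments (ref_row, other_row) that share one reference.

    Each row is a gapped string using '-' for gaps. Returns the gapped
    reference row and one gapped row per other sequence.
    """
    ref = alignments[0][0].replace("-", "")
    n = len(ref)
    parsed = []
    for ref_row, other_row in alignments:
        if len(ref_row) != len(other_row):
            raise ValueError("rows of a pairwise alignment must have equal length")
        if ref_row.replace("-", "") != ref:
            raise ValueError("all alignments must share the same reference")
        inserts = [""] * (n + 1)  # inserts[i]: residues placed before ref residue i
        matched = []              # matched[i]: symbol aligned to ref residue i
        i = 0
        for r, o in zip(ref_row, other_row):
            if r == "-":
                inserts[i] += o
            else:
                matched.append(o)
                i += 1
        parsed.append((inserts, matched))

    # Each gap in the reference must be wide enough for the longest insertion.
    widths = [max(len(ins[i]) for ins, _ in parsed) for i in range(n + 1)]

    ref_out = []
    other_out = [[] for _ in parsed]
    for i in range(n + 1):
        ref_out.append("-" * widths[i])
        for k, (ins, _) in enumerate(parsed):
            other_out[k].append(ins[i].ljust(widths[i], "-"))
        if i < n:
            ref_out.append(ref[i])
            for k, (_, matched) in enumerate(parsed):
                other_out[k].append(matched[i])
    return "".join(ref_out), ["".join(row) for row in other_out]


reference_row, other_rows = merge_pairwise([("AC-GT", "ACXGT"), ("ACGT", "A-GT")])
```

In this toy case the reference gains one gap column to accommodate the insertion `X`, and the second sequence is padded in that column so that all rows keep the same length.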
Matsuoka, Masanari; Sugita, Masatake; Kikuchi, Takeshi
2014-09-18
Proteins that share a high sequence homology while exhibiting drastically different 3D structures are investigated in this study. Recently, artificial proteins related to the sequences of the GA and IgG binding GB domains of human serum albumin have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but exhibit different 3D structures, namely, a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to using bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem. It was hard to distinguish which structure a given sequence would take only with the results of ordinary analyses like BLAST and conservation analyses. However, in addition to these analyses, with the analysis based on the inter-residue average distance statistics and our sequence tendency analysis, we could infer which part would play an important role in its structural formation. The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.
Mourier, Pierre A J; Guichard, Olivier Y; Herman, Fréderic; Sizun, Philippe; Viskov, Christian
2017-03-08
Low Molecular Weight Heparins (LMWH) are complex anticoagulant drugs that mainly inhibit the blood coagulation cascade through indirect interaction with antithrombin. While inhibition of factor Xa is well described, little is known about the polysaccharide structure inhibiting thrombin. In fact, a minimal chain length of 18 saccharide units, including an antithrombin (AT) binding pentasaccharide, is mandatory to form the active ternary complex for LMWH obtained by alkaline β-elimination (e.g., enoxaparin). However, the relationship between the structure of octadecasaccharides and their thrombin inhibition has not yet been assessed on natural compounds due to technical hurdles to isolate sufficiently pure material. We report the preparation of five octadecasaccharides by using orthogonal separation methods including size exclusion, AT affinity, ion pairing and strong anion exchange chromatography. Each of these octadecasaccharides possesses two AT binding pentasaccharide sequences located at various positions. After structural elucidation using enzymatic sequencing and NMR, in vitro aFXa and aFIIa were determined. The biological activities reveal the critical role of each pentasaccharide sequence position within the octadecasaccharides and structural requirements to inhibit thrombin. Significant differences in potency, such as the twenty-fold magnitude difference observed between two regioisomers, further highlight the importance of depolymerisation process conditions on LMWH biological activity.
MUFOLD-SS: New deep inception-inside-inception networks for protein secondary structure prediction.
Fang, Chao; Shang, Yi; Xu, Dong
2018-05-01
Protein secondary structure prediction can provide important information for protein 3D structure prediction and protein functions. Deep learning offers a new opportunity to significantly improve prediction accuracy. In this article, a new deep neural network architecture, named the Deep inception-inside-inception (Deep3I) network, is proposed for protein secondary structure prediction and implemented as a software tool MUFOLD-SS. The input to MUFOLD-SS is a carefully designed feature matrix corresponding to the primary amino acid sequence of a protein, which consists of a rich set of information derived from individual amino acids, as well as the context of the protein sequence. Specifically, the feature matrix is a composition of physio-chemical properties of amino acids, PSI-BLAST profile, and HHBlits profile. MUFOLD-SS is composed of a sequence of nested inception modules and maps the input matrix to either eight states or three states of secondary structures. The architecture of MUFOLD-SS enables effective processing of local and global interactions between amino acids in making accurate predictions. In extensive experiments on multiple datasets, MUFOLD-SS outperformed the best existing methods and other deep neural networks significantly. MUFOLD-SS can be downloaded from http://dslsrv8.cs.missouri.edu/~cf797/MUFoldSS/download.html. © 2018 Wiley Periodicals, Inc.
Fine-tuning structural RNA alignments in the twilight zone.
Bremges, Andreas; Schirmer, Stefanie; Giegerich, Robert
2010-04-30
A widely used method to find conserved secondary structure in RNA is to first construct a multiple sequence alignment, and then fold the alignment, optimizing a score based on thermodynamics and covariance. This method works best around 75% sequence similarity. However, in a "twilight zone" below 55% similarity, the sequence alignment tends to obscure the covariance signal used in the second phase. Therefore, while the overall shape of the consensus structure may still be found, the degree of conservation cannot be estimated reliably. Based on a combination of available methods, we present a method named planACstar for improving structure conservation in structural alignments in the twilight zone. After constructing a consensus structure by alignment folding, planACstar abandons the original sequence alignment, refolds the sequences individually but consistent with the consensus, aligns the structures, irrespective of sequence, by a pure structure alignment method, and derives an improved sequence alignment from the alignment of structures, to be re-submitted to alignment folding, and so on. This cycle may be iterated as long as structural conservation improves, but normally, one step suffices. Employing the tools ClustalW, RNAalifold, and RNAforester, we find that for sequences with 30-55% sequence identity, structural conservation can be improved by 10% on average, with a large variation, measured in terms of RNAalifold's own criterion, the structure conservation index.
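The "iterate as long as structural conservation improves" control flow can be sketched independently of the underlying tools. The sketch below is an assumption-laden illustration: `refine` and `score` are hypothetical callables standing in for one refold/realign round and for a conservation measure such as the structure conservation index, and no real ClustalW, RNAalifold, or RNAforester calls are made.

```python
def iterate_while_improving(alignment, refine, score, max_rounds=10):
    """Apply refine() repeatedly and keep the result only while score() rises.

    refine(alignment) -> new alignment (hypothetical refinement round)
    score(alignment)  -> number, higher is better (hypothetical conservation score)
    """
    best, best_score = alignment, score(alignment)
    for _ in range(max_rounds):
        candidate = refine(best)
        candidate_score = score(candidate)
        if candidate_score <= best_score:
            break  # no improvement: keep the previous alignment
        best, best_score = candidate, candidate_score
    return best, best_score


# Toy stand-ins: the "alignment" is an integer and the score peaks at 3.
result, result_score = iterate_while_improving(
    0, refine=lambda a: a + 1, score=lambda a: -(a - 3) ** 2
)
```

The loop stops at the first round that fails to improve the score, which matches the abstract's remark that a single step is usually enough in practice.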
Exploring Fold Space Preferences of New-born and Ancient Protein Superfamilies
Edwards, Hannah; Abeln, Sanne; Deane, Charlotte M.
2013-01-01
The evolution of proteins is one of the fundamental processes that has delivered the diversity and complexity of life we see around ourselves today. While we tend to define protein evolution in terms of sequence level mutations, insertions and deletions, it is hard to translate these processes to a more complete picture incorporating a polypeptide's structure and function. By considering how protein structures change over time we can gain an entirely new appreciation of their long-term evolutionary dynamics. In this work we seek to identify how populations of proteins at different stages of evolution explore their possible structure space. We use an annotation of superfamily age to this space and explore the relationship between these ages and a diverse set of properties pertaining to a superfamily's sequence, structure and function. We note several marked differences between the populations of newly evolved and ancient structures, such as in their length distributions, secondary structure content and tertiary packing arrangements. In particular, many of these differences suggest a less elaborate structure for newly evolved superfamilies when compared with their ancient counterparts. We show that the structural preferences we report are not a residual effect of a more fundamental relationship with function. Furthermore, we demonstrate the robustness of our results, using significant variation in the algorithm used to estimate the ages. We present these age estimates as a useful tool to analyse protein populations. In particular, we apply this in a comparison of domains containing greek key or jelly roll motifs. PMID:24244135
Reilly, Kevin J.; Spencer, Kristie A.
2013-01-01
The current study investigated the processes responsible for selection of sounds and syllables during production of speech sequences in 10 adults with hypokinetic dysarthria from Parkinson’s disease, five adults with ataxic dysarthria, and 14 healthy control speakers. Speech production data from a choice reaction time task were analyzed to evaluate the effects of sequence length and practice on speech sound sequencing. Speakers produced sequences that were between one and five syllables in length over five experimental runs of 60 trials each. In contrast to the healthy speakers, speakers with hypokinetic dysarthria demonstrated exaggerated sequence length effects for both inter-syllable intervals (ISIs) and speech error rates. Conversely, speakers with ataxic dysarthria failed to demonstrate a sequence length effect on ISIs and were also the only group that did not exhibit practice-related changes in ISIs and speech error rates over the five experimental runs. The exaggerated sequence length effects in the hypokinetic speakers with Parkinson’s disease are consistent with an impairment of action selection during speech sequence production. The absent length effects observed in the speakers with ataxic dysarthria are consistent with previous findings that indicate a limited capacity to buffer speech sequences in advance of their execution. In addition, the lack of practice effects in these speakers suggests that learning-related improvements in the production rate and accuracy of speech sequences involve processing by structures of the cerebellum. Together, the current findings inform models of serial control for speech in healthy speakers and support the notion that sequencing deficits contribute to speech symptoms in speakers with hypokinetic or ataxic dysarthria. In addition, these findings indicate that speech sequencing is differentially impaired in hypokinetic and ataxic dysarthria. PMID:24137121
A divergent Pumilio repeat protein family for pre-rRNA processing and mRNA localization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Qiu, Chen; McCann, Kathleen L.; Wine, Robert N.; ...
2014-12-15
Pumilio/feminization of XX and XO animals (fem)-3 mRNA-binding factor (PUF) proteins bind sequence specifically to mRNA targets using a single-stranded RNA-binding domain comprising eight Pumilio (PUM) repeats. PUM repeats have now been identified in proteins that function in pre-rRNA processing, including human Puf-A and yeast Puf6. This is a role not previously ascribed to PUF proteins. In this paper we present crystal structures of human Puf-A that reveal a class of nucleic acid-binding proteins with 11 PUM repeats arranged in an “L”-like shape. In contrast to classical PUF proteins, Puf-A forms sequence-independent interactions with DNA or RNA, mediated by conserved basic residues. We demonstrate that equivalent basic residues in yeast Puf6 are important for RNA binding, pre-rRNA processing, and mRNA localization. Finally, PUM repeats can be assembled into alternative folds that bind to structured nucleic acids in addition to forming canonical eight-repeat crescent-shaped RNA-binding domains found in classical PUF proteins.
FIR Filter of DS-CDMA UWB Modem Transmitter
NASA Astrophysics Data System (ADS)
Kang, Kyu-Min; Cho, Sang-In; Won, Hui-Chul; Choi, Sang-Sung
This letter presents low-complexity digital pulse shaping filter structures for a direct sequence code division multiple access (DS-CDMA) ultra wide-band (UWB) modem transmitter with a ternary spreading code. The proposed finite impulse response (FIR) filter structures, which use a look-up table (LUT), reduce the amount of memory required by about 50% to 80% in comparison to conventional FIR filter structures, and consequently are suitable for high-speed parallel data processing.
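The general idea of a LUT-based FIR filter for ternary chips can be illustrated with a short sketch. It is not the filter structure proposed in the letter and does not reproduce its memory savings: the grouping of taps, the function names, and the integer tap values are assumptions chosen only to show that table look-ups can replace multiplications when inputs are limited to -1, 0, and +1.

```python
from itertools import product


def build_luts(taps, group):
    """Precompute partial sums for every ternary pattern of each tap group."""
    luts = []
    for start in range(0, len(taps), group):
        chunk = taps[start:start + group]
        table = {
            pattern: sum(h * x for h, x in zip(chunk, pattern))
            for pattern in product((-1, 0, 1), repeat=len(chunk))
        }
        luts.append((start, len(chunk), table))
    return luts


def fir_lut(chips, taps, group=2):
    """FIR filtering of ternary chips using look-ups instead of multiplications."""
    luts = build_luts(taps, group)
    history = [0] * len(taps)  # history[k] holds x[n - k]
    output = []
    for chip in chips:
        history = [chip] + history[:-1]
        output.append(
            sum(table[tuple(history[s:s + n])] for s, n, table in luts)
        )
    return output


def fir_direct(chips, taps):
    """Reference implementation: y[n] = sum_k taps[k] * x[n - k]."""
    return [
        sum(taps[k] * chips[n - k] for k in range(len(taps)) if n - k >= 0)
        for n in range(len(chips))
    ]


taps = [1, 3, 3, 1]             # hypothetical integer pulse-shaping taps
chips = [1, 0, -1, 1, -1, 0]    # hypothetical ternary spread chips
```

Each table has 3^g entries for a group of g taps, so the group size trades table memory against the number of additions per output sample.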
Topological impact of noncanonical DNA structures on Klenow fragment of DNA polymerase.
Takahashi, Shuntaro; Brazier, John A; Sugimoto, Naoki
2017-09-05
Noncanonical DNA structures that stall DNA replication can cause errors in genomic DNA. Here, we investigated how the noncanonical structures formed by sequences in genes associated with a number of diseases impacted DNA polymerization by the Klenow fragment of DNA polymerase. Replication of a DNA sequence forming an i-motif from a telomere, hypoxia-induced transcription factor, and an insulin-linked polymorphic region was effectively inhibited. On the other hand, replication of a mixed-type G-quadruplex (G4) from a telomere was less inhibited than that of the antiparallel type or parallel type. Interestingly, the i-motif was a better inhibitor of replication than were mixed-type G4s or hairpin structures, even though all had similar thermodynamic stabilities. These results indicate that both the stability and topology of structures formed in DNA templates impact the processivity of a DNA polymerase. This suggests that i-motif formation may trigger genomic instability by stalling the replication of DNA, causing intractable diseases. PMID:28827350
Conlon, J M; Davis, M S; Falkmer, S; Thim, L
1987-11-02
The primary structures of three peptides from extracts from the pancreatic islets of the daddy sculpin (Cottus scorpius) and three analogous peptides from the islets of the flounder (Platichthys flesus), two species of teleostean fish, have been determined by automated Edman degradation. The structures of the flounder peptides were confirmed by fast-atom bombardment mass spectrometry. The peptides show strong homology to residues (49-60), (63-96) and (98-125) of the predicted sequence of preprosomatostatin II from the anglerfish (Lophius americanus). The amino acid sequences of the peptides suggest that, in the sculpin, prosomatostatin II is cleaved at a dibasic amino acid residue processing site (corresponding to Lys61-Arg62 in anglerfish preprosomatostatin II). The resulting fragments are further cleaved at monobasic residue processing sites (corresponding to Arg48 and Arg97 in anglerfish preprosomatostatin II). In the flounder the same dibasic residue processing site is utilised but cleavage at different monobasic sites takes place (corresponding to Arg50 and Arg97 in anglerfish preprosomatostatin II). A peptide identical to mammalian somatostatin-14 was also isolated from the islets of both species and is presumed to represent a cleavage product of prosomatostatin I.
Mechanisms of molecular mimicry involving the microbiota in neurodegeneration.
Friedland, Robert P
2015-01-01
The concept of molecular mimicry was established to explain commonalities of structure which developed in response to evolutionary pressures. Most examples of molecular mimicry in medicine have involved homologies of primary protein structure which cause disease. Molecular mimicry can be expanded beyond amino acid sequence to include microRNA and proteomic effects which are either pathogenic or salutogenic (beneficial) in regard to Parkinson's disease, Alzheimer's disease, and related disorders. Viruses of animal or plant origin may mimic nucleotide sequences of microRNAs and influence protein expression. Both Parkinson's and Alzheimer's diseases involve the formation of transmissible self-propagating prion-like proteins. However, the initiating factors responsible for creation of these misfolded nucleating factors are unknown. Amyloid patterns of protein folding are highly conserved through evolution and are widely distributed in the world. Similarities of tertiary protein structure may be involved in the creation of these prion-like agents through molecular mimicry. Cross-seeding of amyloid misfolding, altered proteostasis, and oxidative stress may be induced by amyloid proteins residing in bacteria in our microbiota in the gut and in the diet. Pathways of molecular mimicry processes induced by bacterial amyloid in neurodegeneration may involve TLR 2/1, CD14, and NFκB, among others. Furthermore, priming of the innate immune system by the microbiota may enhance the inflammatory response to cerebral amyloids (such as amyloid-β and α-synuclein). This paper describes the specific molecular pathways of these cross-seeding and neuroinflammatory processes. Evolutionary conservation of proteins provides the opportunity for conserved sequences and structures to influence neurological disease through molecular mimicry.
Oh, Jeongsu; Choi, Chi-Hwan; Park, Min-Kyu; Kim, Byung Kwon; Hwang, Kyuin; Lee, Sang-Heon; Hong, Soon Gyu; Nasir, Arshan; Cho, Wan-Sup; Kim, Kyung Mo
2016-01-01
High-throughput sequencing can produce hundreds of thousands of 16S rRNA sequence reads corresponding to different organisms present in the environmental samples. Typically, analysis of microbial diversity in bioinformatics starts from pre-processing followed by clustering 16S rRNA reads into relatively fewer operational taxonomic units (OTUs). The OTUs are reliable indicators of microbial diversity and greatly accelerate the downstream analysis time. However, existing hierarchical clustering algorithms, which are generally more accurate than greedy heuristic algorithms, struggle with large sequence datasets. To keep pace with the rapid rise in sequencing data, we present CLUSTOM-CLOUD, which is the first distributed sequence clustering program based on In-Memory Data Grid (IMDG) technology, a distributed data structure that stores all data in the main memory of multiple computing nodes. The IMDG technology helps CLUSTOM-CLOUD to enhance both its capability of handling larger datasets and its computational scalability better than its ancestor, CLUSTOM, while maintaining high accuracy. Clustering speed of CLUSTOM-CLOUD was evaluated on published 16S rRNA human microbiome sequence datasets using the small laboratory cluster (10 nodes) and under the Amazon EC2 cloud-computing environments. Under the laboratory environment, it required only ~3 hours to process a dataset of 200 K reads regardless of the complexity of the human microbiome data. In turn, one million reads were processed in approximately 20, 14, and 11 hours when utilizing 20, 30, and 40 nodes on the Amazon EC2 cloud-computing environment. The running time evaluation indicates that CLUSTOM-CLOUD can handle much larger sequence datasets than CLUSTOM and is also a scalable distributed processing system. The comparative accuracy test using 16S rRNA pyrosequences of a mock community shows that CLUSTOM-CLOUD achieves higher accuracy than DOTUR, mothur, ESPRIT-Tree, UCLUST and Swarm.
CLUSTOM-CLOUD is written in JAVA and is freely available at http://clustomcloud.kopri.re.kr. PMID:26954507
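For orientation, the kind of greedy heuristic OTU clustering that the abstract contrasts with hierarchical methods can be sketched in a few lines. This is not CLUSTOM-CLOUD's algorithm: the function names are hypothetical, identity is computed position by position on equal-length toy reads with no alignment, and the threshold is set far below the usual 97% only so that four-letter reads can be used.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length reads."""
    if len(a) != len(b):
        raise ValueError("this toy identity needs equal-length reads")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, threshold=0.97):
    """Assign each read to the first centroid it matches, else start a new OTU."""
    centroids, otus = [], []
    for read in reads:
        for index, centroid in enumerate(centroids):
            if identity(read, centroid) >= threshold:
                otus[index].append(read)
                break
        else:
            # No existing centroid is similar enough: the read seeds a new OTU.
            centroids.append(read)
            otus.append([read])
    return otus


toy_otus = greedy_otus(["ACGT", "ACGA", "TTTT", "TTTA"], threshold=0.75)
```

Because each read is compared only with existing centroids, the result depends on input order, which is one reason greedy heuristics tend to be faster but less accurate than hierarchical clustering.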
Felsenstein, K M; Goff, S P
1992-01-01
The gag-pol polyprotein of the murine and feline leukemia viruses is expressed by translational readthrough of a UAG terminator codon at the 3' end of the gag gene. To explore the cis-acting sequence requirements for the readthrough event in vivo, we generated a library of mutants of the Moloney murine leukemia virus with point mutations near the terminator codon and tested the mutant viral DNAs for the ability to direct synthesis of the gag-pol fusion protein and formation of infectious virus. The analysis showed that sequences 3' to the terminator are necessary and sufficient for the process. The results do not support a role for one proposed stem-loop structure that includes the terminator but are consistent with the involvement of another stem-loop 3' to the terminator. One mutant, containing two compensatory changes in this stem structure, was temperature sensitive for replication and for formation of the gag-pol protein. The results suggest that RNA sequence and structure are critical determinants of translational readthrough in vivo. PMID:1404606
Ribeyre, Cyril; Lopes, Judith; Boulé, Jean-Baptiste; Piazza, Aurèle; Guédin, Aurore; Zakian, Virginia A; Mergny, Jean-Louis; Nicolas, Alain
2009-05-01
In budding yeast, the Pif1 DNA helicase is involved in the maintenance of both nuclear and mitochondrial genomes, but its role in these processes is still poorly understood. Here, we provide evidence for a new Pif1 function by demonstrating that its absence promotes genetic instability of alleles of the G-rich human minisatellite CEB1 inserted in the Saccharomyces cerevisiae genome, but not of other tandem repeats. Inactivation of other DNA helicases, including Sgs1, had no effect on CEB1 stability. In vitro, we show that CEB1 repeats formed stable G-quadruplex (G4) secondary structures and the Pif1 protein unwinds these structures more efficiently than regular B-DNA. Finally, synthetic CEB1 arrays in which we mutated the potential G4-forming sequences were no longer destabilized in pif1Δ cells. Hence, we conclude that CEB1 instability in pif1Δ cells depends on the potential to form G-quadruplex structures, suggesting that Pif1 could play a role in the metabolism of G4-forming sequences.
Blome, C.D.; Reed, K.M.
1993-01-01
Destruction of radiolarians during both diagenesis and HF processing severely reduces faunal abundance and diversity and affects the taxonomic and biostratigraphic utility of chert residues. The robust forms that survive the processing represent only a small fraction of the death assemblage, and delicate skeletal structures used for species differentiation are either poorly preserved or dissolved in many coeval chert residues. First and last occurrences of taxa in chert sequences are likely to be coarse approximations of their true stratigraphic ranges. Precise correlation is difficult between biozonations based solely on index species from cherts and those constructed from limestone faunas. Careful selection of samples in sequence, use of weaker HF solutions, and study of both chert and limestone faunas should yield better biostratigraphic information. -from Authors
Sequence-structure relationships in RNA loops: establishing the basis for loop homology modeling.
Schudoma, Christian; May, Patrick; Nikiforova, Viktoria; Walther, Dirk
2010-01-01
The specific function of RNA molecules frequently resides in their seemingly unstructured loop regions. We performed a systematic analysis of RNA loops extracted from experimentally determined three-dimensional structures of RNA molecules. A comprehensive loop-structure data set was created and organized into distinct clusters based on structural and sequence similarity. We detected clear evidence of the hallmark of homology present in the sequence-structure relationships in loops. Loops differing by <25% in sequence identity fold into very similar structures. Thus, our results support the application of homology modeling for RNA loop model building. We established a threshold that may guide the sequence divergence-based selection of template structures for RNA loop homology modeling. Of all possible sequences that are, under the assumption of isosteric relationships, theoretically compatible with actual sequences observed in RNA structures, only a small fraction is contained in the Rfam database of RNA sequences and classes implying that the actual RNA loop space may consist of a limited number of unique loop structures and conserved sequences. The loop-structure data sets are made available via an online database, RLooM. RLooM also offers functionalities for the modeling of RNA loop structures in support of RNA engineering and design efforts.
SimRNA: a coarse-grained method for RNA folding simulations and 3D structure prediction.
Boniecki, Michal J; Lach, Grzegorz; Dawson, Wayne K; Tomala, Konrad; Lukasz, Pawel; Soltysinski, Tomasz; Rother, Kristian M; Bujnicki, Janusz M
2016-04-20
RNA molecules play fundamental roles in cellular processes. Their function and interactions with other biomolecules are dependent on the ability to form complex three-dimensional (3D) structures. However, experimental determination of RNA 3D structures is laborious and challenging, and therefore, the majority of known RNAs remain structurally uncharacterized. Here, we present SimRNA: a new method for computational RNA 3D structure prediction, which uses a coarse-grained representation, relies on the Monte Carlo method for sampling the conformational space, and employs a statistical potential to approximate the energy and identify conformations that correspond to biologically relevant structures. SimRNA can fold RNA molecules using only sequence information, and, on established test sequences, it recapitulates secondary structure with high accuracy, including correct prediction of pseudoknots. For modeling of complex 3D structures, it can use additional restraints, derived from experimental or computational analyses, including information about secondary structure and/or long-range contacts. SimRNA also can be used to analyze conformational landscapes and identify potential alternative structures. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
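The Monte Carlo sampling step mentioned in this abstract follows the general Metropolis scheme, which can be sketched on a toy problem. This is a generic illustration and not SimRNA's code: the one-dimensional quadratic `toy_energy`, the `toy_move` proposal, and all parameter values are assumptions standing in for a statistical potential over coarse-grained RNA conformations.

```python
import math
import random


def metropolis(energy, propose, start, steps, temperature, rng):
    """Metropolis sampling that returns the lowest-energy state visited."""
    state, state_energy = start, energy(start)
    best, best_energy = state, state_energy
    for _ in range(steps):
        candidate = propose(state, rng)
        candidate_energy = energy(candidate)
        delta = candidate_energy - state_energy
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            state, state_energy = candidate, candidate_energy
            if state_energy < best_energy:
                best, best_energy = state, state_energy
    return best, best_energy


def toy_energy(x):
    return (x - 2.0) ** 2  # hypothetical energy with its minimum at x = 2


def toy_move(x, rng):
    return x + rng.uniform(-0.5, 0.5)  # hypothetical small random perturbation


best_state, best_value = metropolis(
    toy_energy, toy_move, start=10.0, steps=5000, temperature=0.1,
    rng=random.Random(0),
)
```

Occasionally accepting uphill moves is what lets the sampler escape local minima; in a real folding simulation the state would be a full conformation and the move set would perturb its coordinates.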
Transcription blockage by stable H-DNA analogs in vitro.
Pandey, Shristi; Ogloblina, Anna M; Belotserkovskii, Boris P; Dolinnaya, Nina G; Yakubovskaya, Marianna G; Mirkin, Sergei M; Hanawalt, Philip C
2015-08-18
DNA sequences that can form unusual secondary structures are implicated in regulating gene expression and causing genomic instability. H-palindromes are an important class of such DNA sequences that can form an intramolecular triplex structure, H-DNA. Within an H-palindrome, the H-DNA and canonical B-DNA are in a dynamic equilibrium that shifts toward H-DNA with increased negative supercoiling. The interplay between H- and B-DNA and the fact that the process of transcription affects supercoiling makes it difficult to elucidate the effects of H-DNA upon transcription. We constructed a stable structural analog of H-DNA that cannot flip into B-DNA, and studied the effects of this structure on transcription by T7 RNA polymerase in vitro. We found multiple transcription blockage sites adjacent to and within sequences engaged in this triplex structure. Triplex-mediated transcription blockage varied significantly with changes in ambient conditions: it was exacerbated in the presence of Mn(2+) or by increased concentrations of K(+) and Li(+). Analysis of the detailed pattern of the blockage suggests that RNA polymerase is sterically hindered by H-DNA and has difficulties in unwinding triplex DNA. The implications of these findings for the biological roles of triple-stranded DNA structures are discussed. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Observing complex action sequences: The role of the fronto-parietal mirror neuron system.
Molnar-Szakacs, Istvan; Kaplan, Jonas; Greenfield, Patricia M; Iacoboni, Marco
2006-11-15
A fronto-parietal mirror neuron network in the human brain supports the ability to represent and understand observed actions allowing us to successfully interact with others and our environment. Using functional magnetic resonance imaging (fMRI), we wanted to investigate the response of this network in adults during observation of hierarchically organized action sequences of varying complexity that emerge at different developmental stages. We hypothesized that fronto-parietal systems may play a role in coding the hierarchical structure of object-directed actions. The observation of all action sequences recruited a common bilateral network including the fronto-parietal mirror neuron system and occipito-temporal visual motion areas. Activity in mirror neuron areas varied according to the motoric complexity of the observed actions, but not according to the developmental sequence of action structures, possibly due to the fact that our subjects were all adults. These results suggest that the mirror neuron system provides a fairly accurate simulation process of observed actions, mimicking internally the level of motoric complexity. We also discuss the results in terms of the links between mirror neurons, language development and evolution.
Jézéquel, Laetitia; Loeper, Jacqueline; Pompon, Denis
2008-11-01
Combinatorial libraries coding for mosaic enzymes with predefined crossover points constitute useful tools to address and model structure-function relationships and for functional optimization of enzymes based on multivariate statistics. The presented method, called sequence-independent generation of a chimera-ordered library (SIGNAL), allows easy shuffling of any predefined amino acid segment between two or more proteins. This method is particularly well adapted to the exchange of protein structural modules. The procedure could also be well suited to generate ordered combinatorial libraries independent of sequence similarities in a robotized manner. Sequence segments to be recombined are first extracted by PCR from a single-stranded template coding for an enzyme of interest using a biotin-avidin-based method. This technique allows the reduction of parental template contamination in the final library. Specific PCR primers allow amplification of two complementary mosaic DNA fragments, overlapping in the region to be exchanged. Fragments are finally reassembled using a fusion PCR. The process is illustrated via the construction of a set of mosaic CYP2B enzymes using this highly modular approach.
Panzer, Katrin; Yilmaz, Pelin; Weiß, Michael; Reich, Lothar; Richter, Michael; Wiese, Jutta; Schmaljohann, Rolf; Labes, Antje; Imhoff, Johannes F.; Glöckner, Frank Oliver; Reich, Marlis
2015-01-01
Molecular diversity surveys have demonstrated that aquatic fungi are highly diverse, and that they play fundamental ecological roles in aquatic systems. Unfortunately, comparative studies of aquatic fungal communities are few and far between, due to the scarcity of adequate datasets. We combined all publicly available fungal 18S ribosomal RNA (rRNA) gene sequences with new sequence data from a marine fungi culture collection. We further enriched this dataset by adding validated contextual data. Specifically, we included data on the habitat type of the samples assigning fungal taxa to ten different habitat categories. This dataset has been created with the intention to serve as a valuable reference dataset for aquatic fungi including a phylogenetic reference tree. The combined data enabled us to infer fungal community patterns in aquatic systems. Pairwise habitat comparisons showed significant phylogenetic differences, indicating that habitat strongly affects fungal community structure. Fungal taxonomic composition differed considerably even on phylum and class level. Freshwater fungal assemblage was most different from all other habitat types and was dominated by basal fungal lineages. For most communities, phylogenetic signals indicated clustering of sequences suggesting that environmental factors were the main drivers of fungal community structure, rather than species competition. Thus, the diversification process of aquatic fungi must be highly clade specific in some cases. PMID:26226014
Inter-subject synchronization of brain responses during natural music listening
Abrams, Daniel A.; Ryali, Srikanth; Chen, Tianwen; Chordia, Parag; Khouzam, Amirah; Levitin, Daniel J.; Menon, Vinod
2015-01-01
Music is a cultural universal and a rich part of the human experience. However, little is known about common brain systems that support the processing and integration of extended, naturalistic ‘real-world’ music stimuli. We examined this question by presenting extended excerpts of symphonic music, and two pseudomusical stimuli in which the temporal and spectral structure of the Natural Music condition were disrupted, to non-musician participants undergoing functional brain imaging and analysing synchronized spatiotemporal activity patterns between listeners. We found that music synchronizes brain responses across listeners in bilateral auditory midbrain and thalamus, primary auditory and auditory association cortex, right-lateralized structures in frontal and parietal cortex, and motor planning regions of the brain. These effects were greater for natural music compared to the pseudo-musical control conditions. Remarkably, inter-subject synchronization in the inferior colliculus and medial geniculate nucleus was also greater for the natural music condition, indicating that synchronization at these early stages of auditory processing is not simply driven by spectro-temporal features of the stimulus. Increased synchronization during music listening was also evident in a right-hemisphere fronto-parietal attention network and bilateral cortical regions involved in motor planning. While these brain structures have previously been implicated in various aspects of musical processing, our results are the first to show that these regions track structural elements of a musical stimulus over extended time periods lasting minutes. Our results show that a hierarchical distributed network is synchronized between individuals during the processing of extended musical sequences, and provide new insight into the temporal integration of complex and biologically salient auditory sequences. PMID:23578016
Gruenstaeudl, Michael; Gerschler, Nico; Borsch, Thomas
2018-06-21
The sequencing and comparison of plastid genomes are becoming a standard method in plant genomics, and many researchers are using this approach to infer plant phylogenetic relationships. Due to the widespread availability of next-generation sequencing, plastid genome sequences are being generated at breakneck pace. This trend towards massive sequencing of plastid genomes highlights the need for standardized bioinformatic workflows. In particular, documentation and dissemination of the details of genome assembly, annotation, alignment and phylogenetic tree inference are needed, as these processes are highly sensitive to the choice of software and the precise settings used. Here, we present the procedure and results of sequencing, assembling, annotating and quality-checking of three complete plastid genomes of the aquatic plant genus Cabomba as well as subsequent gene alignment and phylogenetic tree inference. We accompany our findings by a detailed description of the bioinformatic workflow employed. Importantly, we share a total of eleven software scripts for each of these bioinformatic processes, enabling other researchers to evaluate and replicate our analyses step by step. The results of our analyses illustrate that the plastid genomes of Cabomba are highly conserved in both structure and gene content.
Fine-tuning structural RNA alignments in the twilight zone
2010-01-01
Background: A widely used method to find conserved secondary structure in RNA is to first construct a multiple sequence alignment, and then fold the alignment, optimizing a score based on thermodynamics and covariance. This method works best around 75% sequence similarity. However, in a "twilight zone" below 55% similarity, the sequence alignment tends to obscure the covariance signal used in the second phase. Therefore, while the overall shape of the consensus structure may still be found, the degree of conservation cannot be estimated reliably. Results: Based on a combination of available methods, we present a method named planACstar for improving structure conservation in structural alignments in the twilight zone. After constructing a consensus structure by alignment folding, planACstar abandons the original sequence alignment, refolds the sequences individually, but consistent with the consensus, aligns the structures, irrespective of sequence, by a pure structure alignment method, and derives an improved sequence alignment from the alignment of structures, which is then re-submitted to alignment folding, and so on. This cycle may be iterated as long as structural conservation improves, but normally, one step suffices. Conclusions: Employing the tools ClustalW, RNAalifold, and RNAforester, we find that for sequences with 30-55% sequence identity, structural conservation can be improved by 10% on average, with a large variation, measured in terms of RNAalifold's own criterion, the structure conservation index. PMID:20433706
Distinct frontal regions for processing sentence syntax and story grammar.
Sirigu, A; Cohen, L; Zalla, T; Pradat-Diehl, P; Van Eeckhout, P; Grafman, J; Agid, Y
1998-12-01
Time is a fundamental dimension of cognition. It is expressed in the sequential ordering of individual elements in a wide variety of activities such as language, motor control or in the broader domain of long range goal-directed actions. Several studies have shown the importance of the frontal lobes in sequencing information. The question addressed in this study is whether this brain region hosts a single supramodal sequence processor, or whether separate mechanisms are required for different kinds of temporally organised knowledge structures such as syntax and action knowledge. Here we show that so-called agrammatic patients, with lesions in Broca's area, ordered word groups correctly to form a logical sequence of actions but they were severely impaired when similar word groups had to be ordered as a syntactically well-formed sentence. The opposite performance was observed in patients with dorsolateral prefrontal lesions, that is, while their syntactic processing was intact at the sentence level, they demonstrated a pronounced deficit in producing temporally coherent sequences of actions. Anatomical reconstruction of lesions from brain scans revealed that the sentence and action grammar deficits involved distinct, non-overlapping sites within the frontal lobes. Finally, in a third group of patients whose lesions encompassed both Broca's area and the prefrontal cortex, the two types of deficits were found. We conclude that sequence processing is specific to knowledge domains and involves different networks within the frontal lobes.
Schnare, Murray N.; Collings, James C.; Spencer, David F.; Gray, Michael W.
2000-01-01
In Crithidia fasciculata, the ribosomal RNA (rRNA) gene repeats range in size from ∼11 to 12 kb. This length heterogeneity is localized to a region of the intergenic spacer (IGS) that contains tandemly repeated copies of a 19mer sequence. The IGS also contains four copies of an ∼55 nt repeat that has an internal inverted repeat and is also present in the IGS of Leishmania species. We have mapped the C. fasciculata transcription initiation site as well as two other reverse transcriptase stop sites that may be analogous to the A0 and A′ pre-rRNA processing sites within the 5′ external transcribed spacer (ETS) of other eukaryotes. Features that could influence processing at these sites include two stretches of conserved primary sequence and three secondary structure elements present in the 5′ ETS. We also characterized the C. fasciculata U3 snoRNA, which has the potential for base-pairing with pre-rRNA sequences. Finally, we demonstrate that biosynthesis of large subunit rRNA in both C. fasciculata and Trypanosoma brucei involves 3′-terminal addition of three A residues that are not present in the corresponding DNA sequences. PMID:10982863
In-cell RNA structure probing with SHAPE-MaP.
Smola, Matthew J; Weeks, Kevin M
2018-06-01
This protocol is an extension to: Nat. Protoc. 10, 1643-1669 (2015); doi:10.1038/nprot.2015.103; published online 01 October 2015. RNAs play key roles in many cellular processes. The underlying structure of RNA is an important determinant of how transcripts function, are processed, and interact with RNA-binding proteins and ligands. RNA structure analysis by selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) takes advantage of the reactivity of small electrophilic chemical probes that react with the 2'-hydroxyl group to assess RNA structure at nucleotide resolution. When coupled with mutational profiling (MaP), in which modified nucleotides are detected as internal miscodings during reverse transcription and then read out by massively parallel sequencing, SHAPE yields quantitative per-nucleotide measurements of RNA structure. Here, we provide an extension to our previous in vitro SHAPE-MaP protocol with detailed guidance for undertaking and analyzing SHAPE-MaP probing experiments in live cells. The MaP strategy works for both abundant-transcriptome experiments and for cellular RNAs of low to moderate abundance, which are not well examined by whole-transcriptome methods. In-cell SHAPE-MaP, performed in roughly 3 d, can be applied in cell types ranging from bacteria to cultured mammalian cells and is compatible with a variety of structure-probing reagents. We detail several strategies by which in-cell SHAPE-MaP can inform new biological hypotheses and emphasize downstream analyses that reveal sequence or structure motifs important for RNA interactions in cells.
Novel in situ resistance measurement for the investigation of CIGS growth in a selenization process
NASA Astrophysics Data System (ADS)
Liu, Wei; Tian, Jian-Guo; Li, Zu-Bin; He, Qing; Li, Feng-Yan; Li, Chang-Jian; Sun, Yun
2009-03-01
During the selenization process of CIGS thin films, the relation between the element loss rate and the precursor depositions is analyzed. The growth of the CIGS thin films during the selenization process is investigated by a novel in situ resistance measurement, by which the formation of compound semiconductors can be observed directly and simultaneously. Their structures, phase evolutions and element losses are analyzed by XRD and XRF. Based on the experimental results, it can be concluded that the phase transformations do not depend on the deposition sequences of the precursors, while the element loss rates are related to the deposition sequences in this process. In addition, element loss mechanisms of CIGS thin films prepared by the selenization process are analyzed in terms of the phase evolutions and chemical combination paths in the In, Ga-Se reaction processes. Moreover, it is verified that the element losses are suppressed by increasing the ramping-up rate. The results provide effective methods to fabricate high-quality CIGS thin films with low element losses.
Dissecting the relationship between protein structure and sequence variation
NASA Astrophysics Data System (ADS)
Shahmoradi, Amir; Wilke, Claus; Wilke Lab Team
2015-03-01
Over the past decade several independent works have shown that some structural properties of proteins are capable of predicting protein evolution. The strength and significance of these structure-sequence relations, however, appear to vary widely among different proteins, with absolute correlation strengths ranging from 0.1 to 0.8. Here we present the results from a comprehensive search for the potential biophysical and structural determinants of protein evolution by studying more than 200 structural and evolutionary properties in a dataset of 209 monomeric enzymes. We discuss the main protein characteristics responsible for the general patterns of protein evolution, and identify sequence divergence as the main determinant of the strengths of virtually all structure-evolution relationships, explaining ~10-30% of observed variation in sequence-structure relations. In addition to sequence divergence, we identify several protein structural properties that are moderately but significantly coupled with the strength of sequence-structure relations. In particular, proteins with more homogeneous backbone hydrogen bond energies, large fractions of helical secondary structures and low fraction of beta sheets tend to have the strongest sequence-structure relation. BEACON-NSF center for the study of evolution in action.
Yamashita, Satoshi; Masuya, Hayato; Abe, Shin; Masaki, Takashi; Okabe, Kimiko
2015-01-01
We examined the relationship between the community structure of wood-decaying fungi, detected by high-throughput sequencing, and the decomposition rate using 13 years of data from a forest dynamics plot. For molecular analysis and wood density measurements, drill dust samples were collected from logs and stumps of Fagus and Quercus in the plot. Regression using a negative exponential model between wood density and time since death revealed that the decomposition rate of Fagus was greater than that of Quercus. The residual between the expected value obtained from the regression curve and the observed wood density was used as a decomposition rate index. Principal component analysis showed that the fungal community compositions of both Fagus and Quercus changed with time since death. Principal component analysis axis scores were used as an index of fungal community composition. A structural equation model for each wood genus was used to assess the effect of fungal community structure traits on the decomposition rate and how the fungal community structure was determined by the traits of coarse woody debris. Results of the structural equation model suggested that the decomposition rate of Fagus was affected by two fungal community composition components: one that was affected by time since death and another that was not affected by the traits of coarse woody debris. In contrast, the decomposition rate of Quercus was not affected by coarse woody debris traits or fungal community structure. These findings suggest that, in the case of Fagus coarse woody debris, the fungal community structure is related to the decomposition process of its host substrate. Because fungal community structure is affected partly by the decay stage and wood density of its substrate, these factors influence each other. Further research on interactive effects is needed to improve our understanding of the relationship between fungal community structure and the woody debris decomposition process. 
PMID:26110605
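The negative exponential regression and residual-based index described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the log-linear least-squares fit, the sign convention of the residual (observed minus expected), and the function names are all assumptions, and the sample values in the usage note are hypothetical.

```python
import math

def fit_negative_exponential(times, densities):
    """Fit density(t) = d0 * exp(-k * t) by least squares on log(density).

    Assumption: a log-linear fit; the abstract does not say how the
    regression was actually carried out.
    """
    logs = [math.log(d) for d in densities]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx
    d0 = math.exp(mean_y - slope * mean_t)
    return d0, -slope  # k is the decay constant (positive for decaying wood)

def residuals(times, densities, d0, k):
    """Observed minus expected density (sign convention is an assumption)."""
    return [d - d0 * math.exp(-k * t) for t, d in zip(times, densities)]
```

For example, `fit_negative_exponential([0, 2, 5, 9, 13], densities)` returns the initial density and decay constant for hypothetical measurements taken at those years since death; a larger `k` corresponds to faster decomposition, and each residual could then serve as a per-sample index in the sense the abstract describes.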
SL1 revisited: functional analysis of the structure and conformation of HIV-1 genome RNA.
Sakuragi, Sayuri; Yokoyama, Masaru; Shioda, Tatsuo; Sato, Hironori; Sakuragi, Jun-Ichi
2016-11-11
The dimer initiation site/dimer linkage sequence (DIS/DLS) region of HIV is located on the 5' end of the viral genome and suggested to form complex secondary/tertiary structures. Within this structure, stem-loop 1 (SL1) is believed to be most important and an essential key to dimerization, since the sequence and predicted secondary structure of SL1 are highly stable and conserved among various virus subtypes. In particular, a six-base palindromic sequence is always present at the hairpin loop of SL1 and the formation of kissing-loop structure at this position between the two strands of genomic RNA is suggested to trigger dimerization. Although the higher-order structure model of SL1 is well accepted and perhaps even undoubted lately, there could still be room for consideration to depict the functional SL1 structure while in vivo (in virion or cell). In this study, we performed several analyses to identify the nucleotides and/or basepairing within SL1 which are necessary for HIV-1 genome dimerization, encapsidation, recombination and infectivity. We unexpectedly found that some nucleotides that are believed to contribute to the formation of the stem do not impact dimerization or infectivity. On the other hand, we found that one G-C basepair involved in stem formation may serve as an alternative dimer interactive site. We also report on our further investigation of the roles of the palindromic sequences on viral replication. Collectively, we aim to assemble a more-comprehensive functional map of SL1 on the HIV-1 viral life cycle. We discovered several possibilities for a novel structure of SL1 in HIV-1 DLS. The newly proposed structure model suggested that the hairpin loop of SL1 appeared larger, and that the genome dimerization process might involve a more complicated mechanism than previously understood. Further investigations would still be required to fully understand the genome packaging and dimerization of HIV.
Structure-based Analysis to Hu-DNA Binding
DOE Office of Scientific and Technical Information (OSTI.GOV)
Swinger, K.; Rice, P.
2007-01-01
HU and IHF are prokaryotic proteins that induce very large bends in DNA. They are present in high concentrations in the bacterial nucleoid and aid in chromosomal compaction. They also function as regulatory cofactors in many processes, such as site-specific recombination and the initiation of replication and transcription. HU and IHF have become paradigms for understanding DNA bending and indirect readout of sequence. While IHF shows significant sequence specificity, HU binds preferentially to certain damaged or distorted DNAs. However, none of the structurally diverse HU substrates previously studied in vitro is identical with the distorted substrates in the recently published Anabaena HU(AHU)-DNA cocrystal structures. Here, we report binding affinities for AHU and the DNA in the cocrystal structures. The binding free energies for formation of these AHU-DNA complexes range from 10-14.5 kcal/mol, representing K_d values in the nanomolar to low picomolar range, and a maximum stabilization of at least 6.3 kcal/mol relative to complexes with undistorted, non-specific DNA. We investigated IHF binding and found that appropriate structural distortions can greatly enhance its affinity. On the basis of the coupling of structural and relevant binding data, we estimate the amount of conformational strain in an IHF-mediated DNA kink that is relieved by a nick (at least 0.76 kcal/mol) and pinpoint the location of the strain. We show that AHU has a sequence preference for an A+T-rich region in the center of its DNA-binding site, correlating with an unusually narrow minor groove. This is similar to sequence preferences shown by the eukaryotic nucleosome.
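The correspondence between the reported binding free energies and dissociation constants follows from the standard relation K_d = exp(-ΔG/RT), with ΔG taken as the magnitude of the binding free energy. The sketch below works through it; the temperature of 298.15 K is an assumption (the abstract does not state the measurement conditions), and the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def dissociation_constant(binding_free_energy_kcal, temperature_k=298.15):
    """Return K_d in mol/L for a binding free energy given as a positive magnitude.

    Assumption: 298.15 K unless stated otherwise; the abstract gives no temperature.
    """
    return math.exp(-binding_free_energy_kcal / (R_KCAL * temperature_k))

# The two ends of the range quoted in the abstract.
for dg in (10.0, 14.5):
    print(f"{dg:5.1f} kcal/mol -> K_d ~ {dissociation_constant(dg):.1e} M")
```

Under this assumed temperature, 10 kcal/mol corresponds to a K_d of tens of nanomolar and 14.5 kcal/mol to tens of picomolar, consistent with the "nanomolar to low picomolar" range stated in the abstract.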
Accelerating calculations of RNA secondary structure partition functions using GPUs
2013-01-01
Background: RNA performs many diverse functions in the cell in addition to its role as a messenger of genetic information. These functions depend on its ability to fold to a unique three-dimensional structure determined by the sequence. The conformation of RNA is in part determined by its secondary structure, or the particular set of contacts between pairs of complementary bases. Prediction of the secondary structure of RNA from its sequence is therefore of great interest, but can be computationally expensive. In this work we accelerate computations of base-pair probabilities using parallel graphics processing units (GPUs). Results: Calculation of the probabilities of base pairs in RNA secondary structures using nearest-neighbor standard free energy change parameters has been implemented using CUDA to run on hardware with multiprocessor GPUs. A modified set of recursions was introduced, which reduces memory usage by about 25%. GPUs are fastest in single precision, and for some hardware, restricted to single precision. This may introduce significant roundoff error. However, deviations in base-pair probabilities calculated using single precision were found to be negligible compared to those resulting from shifting the nearest-neighbor parameters by a random amount of magnitude similar to their experimental uncertainties. For large sequences running on our particular hardware, the GPU implementation reduces execution time by a factor of close to 60 compared with an optimized serial implementation, and by a factor of 116 compared with the original code. Conclusions: Using GPUs can greatly accelerate computation of RNA secondary structure partition functions, allowing calculation of base-pair probabilities for large sequences in a reasonable amount of time, with a negligible compromise in accuracy due to working in single precision. The source code is integrated into the RNAstructure software package and available for download at http://rna.urmc.rochester.edu.
PMID:24180434
Lücke, S; Xu, G L; Palfi, Z; Cross, M; Bellofatto, V; Bindereif, A
1996-01-01
In trypanosomes mRNAs are generated through trans splicing. The spliced leader (SL) RNA, which donates the 5'-terminal mini-exon to each of the protein coding exons, plays a central role in the trans splicing process. We have established in vivo assays to study in detail trans splicing, cap4 modification, and RNP assembly of the SL RNA in the trypanosomatid species Leptomonas seymouri. First, we found that extensive sequences within the mini-exon are required for SL RNA function in vivo, although a conserved length of 39 nt is not essential. In contrast, the intron sequence appears to be surprisingly tolerant to mutation; only the stem-loop II structure is indispensable. The asymmetry of the sequence requirements in the stem I region suggests that this domain may exist in different functional conformations. Second, distinct mini-exon sequences outside the modification site are important for efficient cap4 formation. Third, all SL RNA mutations tested allowed core RNP assembly, suggesting flexible requirements for core protein binding. In sum, the results of our mutational analysis provide evidence for a discrete domain structure of the SL RNA and help to explain the strong phylogenetic conservation of the mini-exon sequence and of the overall SL RNA secondary structure; they also suggest that there may be certain differences between trans splicing in nematodes and trypanosomes. This approach provides a basis for studying RNA-RNA interactions in the trans spliceosome. PMID:8861965
Parker, K A; Steitz, J A
1987-01-01
The human U3 ribonucleoprotein (RNP) has been analyzed to determine its protein constituents, sites of protein-RNA interaction, and RNA secondary structure. By using anti-U3 RNP antibodies and extracts prepared from HeLa cells labeled in vivo, the RNP was found to contain four nonphosphorylated proteins of 36, 30, 13, and 12.5 kilodaltons and two phosphorylated proteins of 74 and 59 kilodaltons. U3 nucleotides 72-90, 106-121, 154-166, and 190-217 must contain sites that interact with proteins since these regions are immunoprecipitated after treatment of the RNP with RNase A or T1. The secondary structure was probed with specific nucleases and by chemical modification with single-strand-specific reagents that block subsequent reverse transcription. Regions that are single stranded (and therefore potentially able to interact with a substrate RNA) include an evolutionarily conserved sequence at nucleotides 104-112 and nonconserved sequences at nucleotides 65-74, 80-84, and 88-93. Nucleotides 159-168 do not appear to be highly accessible, thus making it unlikely that this U3 sequence base pairs with sequences near the 5.8S rRNA-internal transcribed spacer II junction, as previously proposed. Alternative functions of the U3 RNP are discussed, including the possibility that U3 may participate in a processing event near the 3' end of 28S rRNA. PMID:2959855
Cui, Xuefeng; Lu, Zhiwu; Wang, Sheng; Jing-Yan Wang, Jim; Gao, Xin
2016-06-15
Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information. We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence-structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM-HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods. Our program is freely available for download from http://sfb.kaust.edu.sa/Pages/Software.aspx. Contact: xin.gao@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
ERIC Educational Resources Information Center
Stahl, Robert J.; Murphy, Gary T.
Weaknesses in the structure, levels, and sequence of Bloom's taxonomy of cognitive domains emphasize the need for both a new model of how individual learners process information and a new taxonomy of the different levels of memory, thinking, and learning. Both the model and the taxonomy should be consistent with current research findings. The…
Structator: fast index-based search for RNA sequence-structure patterns
2011-01-01
Background The secondary structure of RNA molecules is intimately related to their function and often more conserved than the sequence. Hence, the important task of searching databases for RNAs requires to match sequence-structure patterns. Unfortunately, current tools for this task have, in the best case, a running time that is only linear in the size of sequence databases. Furthermore, established index data structures for fast sequence matching, like suffix trees or arrays, cannot benefit from the complementarity constraints introduced by the secondary structure of RNAs. Results We present a novel method and readily applicable software for time efficient matching of RNA sequence-structure patterns in sequence databases. Our approach is based on affix arrays, a recently introduced index data structure, preprocessed from the target database. Affix arrays support bidirectional pattern search, which is required for efficiently handling the structural constraints of the pattern. Structural patterns like stem-loops can be matched inside out, such that the loop region is matched first and then the pairing bases on the boundaries are matched consecutively. This allows to exploit base pairing information for search space reduction and leads to an expected running time that is sublinear in the size of the sequence database. The incorporation of a new chaining approach in the search of RNA sequence-structure patterns enables the description of molecules folding into complex secondary structures with multiple ordered patterns. The chaining approach removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our method runs up to two orders of magnitude faster than previous methods. Conclusions The presented method's sublinear expected running time makes it well suited for RNA sequence-structure pattern matching in large sequence databases. 
RNA molecules containing several stem-loop substructures can be described by multiple sequence-structure patterns and their matches are efficiently handled by a novel chaining method. Beyond our algorithmic contributions, we provide with Structator a complete and robust open-source software solution for index-based search of RNA sequence-structure patterns. The Structator software is available at http://www.zbh.uni-hamburg.de/Structator. PMID:21619640
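The inside-out matching order described in this abstract can be illustrated with a small sketch. It is not Structator's implementation: it scans the sequence linearly with `str.find` instead of querying an affix array, so it shows only the order of comparisons (loop first, then base pairs outward), not the sublinear running time. The function name `find_stem_loops`, the set of allowed pairs, and the example sequence are assumptions made for illustration.

```python
# Watson-Crick pairs plus G-U wobble; which pairs to allow is an assumption.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stem_loops(seq, loop, stem_len):
    """Return (start, end) spans (end exclusive) of stem-loops in seq.

    The loop string is matched first; each match is then extended outward
    one base pair at a time until stem_len pairs have been verified.
    """
    hits = []
    pos = seq.find(loop)
    while pos != -1:
        left, right = pos - 1, pos + len(loop)
        paired = 0
        # Extend to the left and to the right in the same step.
        while (paired < stem_len and left >= 0 and right < len(seq)
               and (seq[left], seq[right]) in PAIRS):
            left -= 1
            right += 1
            paired += 1
        if paired == stem_len:
            hits.append((left + 1, right))
        pos = seq.find(loop, pos + 1)
    return hits


# Hypothetical example: a GAAA loop closed by three base pairs.
hits = find_stem_loops("AAGGCGAAAGCCUU", "GAAA", 3)
```

Because a candidate is dropped as soon as one pair fails to match, most loop occurrences are rejected after very few comparisons, which is the pruning effect the abstract attributes to base pairing information.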
The MIGenAS integrated bioinformatics toolkit for web-based sequence analysis
Rampp, Markus; Soddemann, Thomas; Lederer, Hermann
2006-01-01
We describe a versatile and extensible integrated bioinformatics toolkit for the analysis of biological sequences over the Internet. The web portal offers convenient interactive access to a growing pool of chainable bioinformatics software tools and databases that are centrally installed and maintained by the RZG. Currently, supported tasks comprise sequence similarity searches in public or user-supplied databases, computation and validation of multiple sequence alignments, phylogenetic analysis and protein–structure prediction. Individual tools can be seamlessly chained into pipelines allowing the user to conveniently process complex workflows without the necessity to take care of any format conversions or tedious parsing of intermediate results. The toolkit is part of the Max-Planck Integrated Gene Analysis System (MIGenAS) of the Max Planck Society available at (click ‘Start Toolkit’). PMID:16844980
Facilitated sequence counting and assembly by template mutagenesis
Levy, Dan; Wigler, Michael
2014-01-01
Presently, inferring the long-range structure of the DNA templates is limited by short read lengths. Accurate template counts suffer from distortions occurring during PCR amplification. We explore the utility of introducing random mutations in identical or nearly identical templates to create distinguishable patterns that are inherited during subsequent copying. We simulate the applications of this process under assumptions of error-free sequencing and perfect mapping, using cytosine deamination as a model for mutation. The simulations demonstrate that within readily achievable conditions of nucleotide conversion and sequence coverage, we can accurately count the number of otherwise identical molecules as well as connect variants separated by long spans of identical sequence. We discuss many potential applications, such as transcript profiling, isoform assembly, haplotype phasing, and de novo genome assembly. PMID:25313059
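The counting idea in this abstract can be sketched as a toy simulation. It keeps the abstract's assumptions of error-free sequencing and adds a stronger one that the paper does not make: every read covers the whole template, so no assembly step is needed. The template string, conversion rate, copy numbers, and function names are hypothetical.

```python
import random


def mutate(template, rate, rng):
    """Convert each C to T independently with probability `rate`."""
    return "".join("T" if base == "C" and rng.random() < rate else base
                   for base in template)


def count_templates(reads):
    """Count distinct mutation patterns among full-length reads."""
    return len(set(reads))


rng = random.Random(0)
template = "ACGC" * 20      # hypothetical template with 40 cytosines
n_templates = 20
# Mark each identical template with its own random deamination pattern.
marked = [mutate(template, 0.5, rng) for _ in range(n_templates)]
# Uneven amplification: each marked template is copied 1 to 50 times.
reads = [pattern for pattern in marked for _ in range(rng.randint(1, 50))]
estimate = count_templates(reads)
```

The estimate is unaffected by how unevenly the templates were amplified, because copies inherit the pattern of their parent molecule and therefore collapse into one entry.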
Random Sequence for Optimal Low-Power Laser Generated Ultrasound
NASA Astrophysics Data System (ADS)
Vangi, D.; Virga, A.; Gulino, M. S.
2017-08-01
Low-power laser generated ultrasounds have recently been gaining importance in research, thanks to the possibility of investigating the structural integrity of a mechanical component through a non-contact, Non-Destructive Testing (NDT) procedure. The ultrasounds are, however, very low in amplitude, making it necessary to apply pre-processing and post-processing operations to the signals in order to detect them. The cross-correlation technique is used in this work, meaning that a random signal must be used as laser input. For this purpose, a highly random and simple-to-create code called T sequence, capable of enhancing the ultrasound detectability and not previously available in the state of the art, is introduced. Several important parameters that characterize the T sequence can influence the process: the number of pulses Npulses, the pulse duration δ and the distance between pulses dpulses. A Finite Element (FE) model of a 3 mm steel disk was initially developed to analytically study the longitudinal ultrasound generation mechanism and the obtainable outputs. Later, experimental tests showed that the T sequence is highly flexible for ultrasound detection purposes, making it optimal to use high Npulses and δ but low dpulses. In the end, apart from describing all phenomena that arise in the low-power laser generation process, the results of this study are also important for setting up an effective NDT procedure using this technology.
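The role of cross-correlation with a random input can be shown with a minimal sketch. The construction of the T sequence is not given in this abstract, so a generic random on/off pulse train stands in for it; the delay, echo amplitude, and noise level are arbitrary assumptions, and the code is plain Python rather than a signal-processing library.

```python
import random


def cross_correlate(code, received, max_lag):
    """Correlate the mean-removed code with the received signal per lag."""
    mean = sum(code) / len(code)
    centered = [c - mean for c in code]
    scores = []
    for lag in range(max_lag + 1):
        total = 0.0
        for i in range(lag, len(received)):
            total += centered[i - lag] * received[i]
        scores.append(total)
    return scores


rng = random.Random(1)
n, true_delay, amplitude, noise_sd = 4000, 37, 0.2, 0.5
code = [rng.choice((0, 1)) for _ in range(n)]   # stand-in random pulse train
# Received signal: a delayed, attenuated copy of the code buried in noise.
received = [
    (amplitude * code[i - true_delay] if i >= true_delay else 0.0)
    + rng.gauss(0.0, noise_sd)
    for i in range(n)
]
scores = cross_correlate(code, received, max_lag=60)
estimated_delay = max(range(len(scores)), key=scores.__getitem__)
```

The echo amplitude is smaller than the noise standard deviation in every single sample, yet the correlation peak stands out because the contributions of thousands of pulses add coherently only at the true lag.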
Metzger, Julia; Tonda, Raul; Beltran, Sergi; Agueda, Lídia; Gut, Marta; Distl, Ottmar
2014-07-04
Domestication has shaped the horse and led to a group of many different types. Some have been under strong human selection while others developed in close relationship with nature. The aim of our study was to perform next generation sequencing of breed and non-breed horses to provide an insight into genetic influences on selective forces. Whole genome sequencing of five horses of four different populations revealed 10,193,421 single nucleotide polymorphisms (SNPs) and 1,361,948 insertion/deletion polymorphisms (indels). In comparison to horse variant databases and previous reports, we were able to identify 3,394,883 novel SNPs and 868,525 novel indels. We analyzed the distribution of individual variants and found significant enrichment of private mutations in coding regions of genes involved in primary metabolic processes, anatomical structures, morphogenesis and cellular components in non-breed horses and, in contrast, of private mutations in genes affecting cell communication, lipid metabolic process, neurological system process, muscle contraction, ion transport, developmental processes of the nervous system and ectoderm in breed horses. Our next generation sequencing data constitute an important first step for the characterization of non-breed in comparison to breed horses and provide a large number of novel variants for future analyses. Functional annotations suggest specific variants that could play a role for the characterization of breed or non-breed horses.
NASA Technical Reports Server (NTRS)
Rudd, Michelle T.; Hilburger, Mark W.; Lovejoy, Andrew E.; Lindell, Michael C.; Gardner, Nathaniel W.; Schultz, Marc R.
2018-01-01
The NASA Engineering Safety Center (NESC) Shell Buckling Knockdown Factor Project (SBKF) was established in 2007 by the NESC with the primary objective to develop analysis-based buckling design factors and guidelines for metallic and composite launch-vehicle structures. A secondary objective of the project is to advance technologies that have the potential to increase the structural efficiency of launch-vehicles. The SBKF Project has determined that weld-land stiffness discontinuities can significantly reduce the buckling load of a cylinder. In addition, the welding process can introduce localized geometric imperfections that can further exacerbate the inherent buckling imperfection sensitivity of the cylinder. Therefore, single-piece barrel fabrication technologies can improve structural efficiency by eliminating these weld-land issues. As part of this effort, SBKF partnered with the Advanced Materials and Processing Branch (AMPB) at NASA Langley Research Center (LaRC), the Mechanical and Fabrication Branch at NASA Marshall Space Flight Center (MSFC), and ATI Forged Products to design and fabricate an 8-ft-diameter orthogrid-stiffened seamless metallic cylinder. The cylinder was subjected to seven subcritical load sequences (load levels that are not intended to induce test article buckling or material failure) and one load sequence to failure. The purpose of this test effort was to demonstrate the potential benefits of building cylindrical structures with no weld lands using the flow-formed manufacturing process. This seamless barrel is the ninth 8-ft-diameter metallic barrel and the first single-piece metallic structure to be tested under this program.
Rincon, Sergio A; Paoletti, Anne
2016-01-01
Unveiling the function of a novel protein is a challenging task that requires careful experimental design. Yeast cytokinesis is a conserved process that involves modular structural and regulatory proteins. For such proteins, an important step is to identify their domains and structural organization. Here we briefly discuss a collection of methods commonly used for sequence alignment and prediction of protein structure that represent powerful tools for the identification of homologous domains and the design of structure-function approaches to test experimentally the function of multi-domain proteins such as those implicated in yeast cytokinesis.
DNA viewed as an out-of-equilibrium structure
NASA Astrophysics Data System (ADS)
Provata, A.; Nicolis, C.; Nicolis, G.
2014-05-01
The complexity of the primary structure of human DNA is explored using methods from nonequilibrium statistical mechanics, dynamical systems theory, and information theory. A collection of statistical analyses is performed on the DNA data and the results are compared with sequences derived from different stochastic processes. The use of χ² tests shows that DNA cannot be described as a low order Markov chain of order up to r = 6. Although detailed balance seems to hold at the level of a binary alphabet, it fails when all four base pairs are considered, suggesting spatial asymmetry and irreversibility. Furthermore, the block entropy does not increase linearly with the block size, reflecting the long-range nature of the correlations in the human genomic sequences. To probe locally the spatial structure of the chain, we study the exit distances from a specific symbol, the distribution of recurrence distances, and the Hurst exponent, all of which show power law tails and long-range characteristics. These results suggest that human DNA can be viewed as a nonequilibrium structure maintained in its state through interactions with a constantly changing environment. Based solely on the exit distance distribution accounting for the nonequilibrium statistics and using the Monte Carlo rejection sampling method, we construct a model DNA sequence. This method allows us to keep both long- and short-range statistical characteristics of the native DNA data. The model sequence presents the same characteristic exponents as the natural DNA but fails to capture spatial correlations and point-to-point details.
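The block entropy mentioned in this abstract can be computed as sketched below. This is a generic estimator over overlapping blocks, not necessarily the authors' procedure, and the function name and toy sequence are assumptions. For a memoryless sequence with four equally likely bases the entropy grows linearly, by 2 bits per added symbol; correlated sequences depart from that line.

```python
from collections import Counter
from math import log2


def block_entropy(seq, n):
    """Shannon entropy (in bits) of the overlapping length-n blocks of seq."""
    blocks = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    total = len(blocks)
    counts = Counter(blocks)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# A strictly periodic toy sequence: the extreme case of correlation.
periodic = "ACGT" * 25
h1 = block_entropy(periodic, 1)   # four equally frequent bases
h2 = block_entropy(periodic, 2)   # still only four distinct blocks
```

In the periodic toy case the entropy stays near 2 bits when the block size goes from 1 to 2 instead of doubling, since each base fully determines the next one.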
The mechanism and control of DNA transfer by the conjugative relaxase of resistance plasmid pCU1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nash, Rebekah Potts; Habibi, Sohrab; Cheng, Yuan
2010-11-15
Bacteria expand their genetic diversity, spread antibiotic resistance genes, and obtain virulence factors through the highly coordinated process of conjugative plasmid transfer (CPT). A plasmid-encoded relaxase enzyme initiates and terminates CPT by nicking and religating the transferred plasmid in a sequence-specific manner. We solved the 2.3 Å crystal structure of the relaxase responsible for the spread of the resistance plasmid pCU1 and determined its DNA binding and nicking capabilities. The overall fold of the pCU1 relaxase is similar to that of the F plasmid and plasmid R388 relaxases. However, in the pCU1 structure, the conserved tyrosine residues (Y18,19,26,27) that are required for DNA nicking and religation were displaced up to 14 Å out of the relaxase active site, revealing a high degree of mobility in this region of the enzyme. In spite of this flexibility, the tyrosines still cleaved the nic site of the plasmid's origin of transfer, and did so in a sequence-specific, metal-dependent manner. Unexpectedly, the pCU1 relaxase lacked the sequence-specific DNA binding previously reported for the homologous F and R388 relaxase enzymes, despite its high sequence and structural similarity with both proteins. In summary, our work outlines novel structural and functional aspects of the relaxase-mediated conjugative transfer of plasmid pCU1.
StructRNAfinder: an automated pipeline and web server for RNA families prediction.
Arias-Carrasco, Raúl; Vásquez-Morán, Yessenia; Nakaya, Helder I; Maracaja-Coutinho, Vinicius
2018-02-17
The function of many noncoding RNAs (ncRNAs) depends upon their secondary structures. Over the last decades, several methodologies have been developed to predict such structures or to use them to functionally annotate RNAs into RNA families. However, to fully perform this analysis, researchers must use multiple tools, which requires constant parsing and processing of several intermediate files. This makes the large-scale prediction and annotation of RNAs a daunting task even for researchers with good computational or bioinformatics skills. We present an automated pipeline named StructRNAfinder that predicts and annotates RNA families in transcript or genome sequences. This single tool not only displays the sequence/structural consensus alignments for each RNA family, according to the Rfam database, but also provides a taxonomic overview for each assigned functional RNA. Moreover, we implemented a user-friendly web service that allows researchers to upload their own nucleotide sequences in order to perform the whole analysis. Finally, we provided a stand-alone version of StructRNAfinder to be used in large-scale projects. The tool was developed under GNU General Public License (GPLv3) and is freely available at http://structrnafinder.integrativebioinformatics.me. The main advantage of StructRNAfinder lies in the large-scale processing and integration of the data obtained by each tool and database employed along the workflow, from which several files are generated and displayed in user-friendly reports, useful for downstream analyses and data exploration.
Integrated aerodynamic-structural design of a forward-swept transport wing
NASA Technical Reports Server (NTRS)
Haftka, Raphael T.; Grossman, Bernard; Kao, Pi-Jen; Polen, David M.; Sobieszczanski-Sobieski, Jaroslaw
1989-01-01
The introduction of composite materials is having a profound effect on aircraft design. Since these materials permit the designer to tailor material properties to improve structural, aerodynamic and acoustic performance, they require an integrated multidisciplinary design process. Furthermore, because of the complexity of the design process, numerical optimization methods are required. The utilization of integrated multidisciplinary design procedures for improving aircraft design is not currently feasible because of software coordination problems and the enormous computational burden. Even with the expected rapid growth of supercomputers and parallel architectures, these tasks will not be practical without the development of efficient methods for cross-disciplinary sensitivities and efficient optimization procedures. The present research is part of an ongoing effort which is focused on the processes of simultaneous aerodynamic and structural wing design as a prototype for design integration. A sequence of integrated wing design procedures has been developed in order to investigate various aspects of the design process.
Using mobile sequencers in an academic classroom
Zaaijer, Sophie; Erlich, Yaniv
2016-01-01
The advent of mobile DNA sequencers has made it possible to generate DNA sequencing data outside of laboratories and genome centers. Here, we report our experience of using the MinION, a mobile sequencer, in a 13-week academic course for undergraduate and graduate students. The course consisted of theoretical sessions that presented fundamental topics in genomics and several applied hackathon sessions. In these hackathons, the students used MinION sequencers to generate and analyze their own data and gain hands-on experience in the topics discussed in the theoretical classes. The manuscript describes the structure of our class, the educational material, and the lessons we learned in the process. We hope that the knowledge and material presented here will provide the community with useful tools to help educate future generations of genome scientists. DOI: http://dx.doi.org/10.7554/eLife.14258.001 PMID:27054412
GWFASTA: server for FASTA search in eukaryotic and microbial genomes.
Issac, Biju; Raghava, G P S
2002-09-01
Similarity searches are a powerful method for solving important biological problems such as database scanning, evolutionary studies, gene prediction, and protein structure prediction. FASTA is a widely used sequence comparison tool for rapid database scanning. Here we describe the GWFASTA server that was developed to assist the FASTA user in similarity searches against partially and/or completely sequenced genomes. GWFASTA consists of more than 60 microbial genomes, eight eukaryote genomes, and proteomes of annotated genomes. In fact, it provides the maximum number of databases for similarity searching from a single platform. GWFASTA allows the submission of more than one sequence as a single query for a FASTA search. It also provides integrated post-processing of FASTA output, including compositional analysis of proteins, multiple sequence alignment, and phylogenetic analysis. Furthermore, it summarizes the search results organism-wise for prokaryotes and chromosome-wise for eukaryotes. Thus, the integration of different tools for sequence analyses makes GWFASTA a powerful tool for biologists.
General properties of magnetic CP stars
NASA Astrophysics Data System (ADS)
Glagolevskij, Yu. V.
2017-07-01
We present a review of our previous studies related to observational evidence for the fossil field hypothesis of the formation and evolution of magnetic and non-magnetic chemically peculiar stars. Analysis of the observed data shows that these stars acquire their main properties in the process of gravitational collapse. In the non-stationary Hayashi phase, the magnetic field becomes weakened and its configuration complicated, but the global orientation of the fossil field remains. After the non-stationary phase, relaxation of the young star's tangled field takes place, and by the time the star joins the ZAMS (Zero Age Main Sequence) the field is generally restored to a dipole structure. The stability of dipole structures allows them to remain unchanged up to the end of the star's life on the Main Sequence, which is 10^9 years at most.
Seal, B S; Neill, J D; Ridpath, J F
1994-07-01
Caliciviruses are nonenveloped with a polyadenylated genome of approximately 7.6 kb and a single capsid protein. The "RNA Fold" computer program was used to analyze 3'-terminal noncoding sequences of five feline calicivirus (FCV), rabbit hemorrhagic disease virus (RHDV), and two San Miguel sea lion virus (SMSV) isolates. The FCV 3'-terminal sequences are 40-46 nucleotides in length and 72-91% similar. The FCV sequences were predicted to contain two possible duplex structures and one stem-loop structure with free energies of -2.1 to -18.2 kcal/mole. The RHDV genomic 3'-terminal RNA sequences are 54 nucleotides in length and share 49% sequence similarity to homologous regions of the FCV genome. The RHDV sequence was predicted to form two duplex structures in the 3'-terminal noncoding region with a single stem-loop structure, resembling that of FCV. In contrast, the SMSV 1 and 4 genomic 3'-terminal noncoding sequences were 185 and 182 nucleotides in length, respectively. Ten possible duplex structures were predicted with an average structural free energy of -35 kcal/mole. Sequence similarity between the two SMSV isolates was 75%. Furthermore, extensive cloverleaflike structures are predicted in the 3' noncoding region of the SMSV genome, in contrast to the predicted single stem-loop structures of FCV or RHDV.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Espínola, Fernando; Dionisi, Hebe M.; Borglin, Sharon
In this work, we analyzed the community structure and metabolic potential of sediment microbial communities in high-latitude coastal environments subjected to low to moderate levels of chronic pollution. Subtidal sediments from four low-energy inlets located in polar and subpolar regions from both Hemispheres were analyzed using large-scale 16S rRNA gene and metagenomic sequencing. Communities showed high diversity (Shannon’s index 6.8 to 10.2), with distinct phylogenetic structures (<40% shared taxa at the Phylum level among regions) but similar metabolic potential in terms of sequences assigned to KOs. Environmental factors (mainly salinity, temperature, and to a lesser extent organic pollution) were drivers of both phylogenetic and functional traits. Bacterial taxa correlating with hydrocarbon pollution included families of anaerobic or facultative anaerobic lifestyle, such as Desulfuromonadaceae, Geobacteraceae, and Rhodocyclaceae. In accordance, biomarker genes for anaerobic hydrocarbon degradation (bamA, ebdA, bcrA, and bssA) were prevalent, only outnumbered by alkB, and their sequences were taxonomically binned to the same bacterial groups. BssA-assigned metagenomic sequences showed an extremely wide diversity distributed all along the phylogeny known for this gene, including bssA sensu stricto, nmsA, assA, and other clusters from poorly or not yet described variants. This work increases our understanding of microbial community patterns in cold coastal sediments, and highlights the relevance of anaerobic hydrocarbon degradation processes in subtidal environments.
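For readers unfamiliar with the diversity measure quoted in this abstract, a minimal sketch of Shannon's index follows. The abstract does not state the logarithm base or the taxonomic unit used, so the base is left as a parameter and the default of 2 is an assumption; the counts in the example are made up.

```python
from math import log


def shannon_index(counts, base=2.0):
    """Shannon diversity H = -sum(p_i * log(p_i)) over observed taxa."""
    total = sum(counts)
    return -sum((c / total) * log(c / total, base) for c in counts if c > 0)


# Hypothetical community: four taxa with equal read counts.
even = shannon_index([10, 10, 10, 10])
# Same number of taxa, but dominated by one of them.
skewed = shannon_index([37, 1, 1, 1])
```

The index is maximal when all taxa are equally abundant, where it equals the logarithm of the number of taxa. A value of 10.2 therefore implies at least roughly 1,200 taxa if base 2 was used, or roughly 27,000 if natural logarithms were used.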
Quantifying the relationship between sequence and three-dimensional structure conservation in RNA
2010-01-01
Background: In recent years, the number of available RNA structures has grown rapidly, reflecting the increased interest in RNA biology. As with the studies carried out two decades ago for proteins, which laid the groundwork for developing comparative protein structure prediction methods, we are now able to quantify the relationship between sequence and structure conservation in RNA. Results: Here we introduce an all-against-all sequence- and three-dimensional (3D) structure-based comparison of a representative set of RNA structures, which has allowed us to quantitatively confirm that: (i) there is a measurable relationship between sequence and structure conservation that weakens for alignments with less than 60% sequence identity, (ii) evolution tends to conserve more RNA structure than sequence, and (iii) there is a twilight zone for RNA homology detection. Discussion: The computational analysis presented here quantitatively describes the relationship between sequence and structure for RNA molecules and defines a twilight zone region for detecting RNA homology. Our work could represent the theoretical basis and limitations for future developments in comparative RNA 3D structure prediction. PMID:20550657
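The 60% threshold in this abstract refers to the sequence identity of an alignment. A minimal sketch of one common way to compute it is given below; the abstract does not say which denominator the authors used, so counting only columns without gaps is an assumption, as are the function name and the example sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over alignment columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Hypothetical pairwise RNA alignment with one gap column.
identity = percent_identity("GGCA-UCC", "GGUAAUCG")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which gives lower values for the same alignment, so thresholds are only comparable when the same definition is used.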
QUASAR--scoring and ranking of sequence-structure alignments.
Birzele, Fabian; Gewehr, Jan E; Zimmer, Ralf
2005-12-15
Sequence-structure alignments are a common means for protein structure prediction in the fields of fold recognition and homology modeling, and there is a broad variety of programs that provide such alignments based on sequence similarity, secondary structure or contact potentials. Nevertheless, finding the best sequence-structure alignment in a pool of alignments remains a difficult problem. QUASAR (quality of sequence-structure alignments ranking) provides a unifying framework for scoring sequence-structure alignments that aids finding well-performing combinations of well-known and custom-made scoring schemes. Those scoring functions can be benchmarked against widely accepted quality scores like MaxSub, TMScore, Touch and APDB, thus enabling users to test their own alignment scores against 'standard-of-truth' structure-based scores. Furthermore, individual score combinations can be optimized with respect to benchmark sets based on known structural relationships using QUASAR's in-built optimization routines.
HCV IRES domain IIb affects the configuration of coding RNA in the 40S subunit's decoding groove
Filbin, Megan E.; Kieft, Jeffrey S.
2011-01-01
Hepatitis C virus (HCV) uses a structured internal ribosome entry site (IRES) RNA to recruit the translation machinery to the viral RNA and begin protein synthesis without the ribosomal scanning process required for canonical translation initiation. Different IRES structural domains are used in this process, which begins with direct binding of the 40S ribosomal subunit to the IRES RNA and involves specific manipulation of the translational machinery. We have found that upon initial 40S subunit binding, the stem–loop domain of the IRES that contains the start codon unwinds and adopts a stable configuration within the subunit's decoding groove. This configuration depends on the sequence and structure of a different stem–loop domain (domain IIb) located far from the start codon in sequence, but spatially proximal in the IRES•40S complex. Mutation of domain IIb results in misconfiguration of the HCV RNA in the decoding groove that includes changes in the placement of the AUG start codon, and a substantial decrease in the ability of the IRES to initiate translation. Our results show that two distal regions of the IRES are structurally communicating at the initial step of 40S subunit binding and suggest that this is an important step in driving protein synthesis. PMID:21606179
Jamaluddin, Jamsari Amirul Firdaus; Pau, Tan Min; Siti-Azizah, Mohd Nor
2011-01-01
Nucleotide sequences of a partial cytochrome c oxidase subunit I gene were used to assess the manner in which historical processes and geomorphological effects may have influenced genetic structuring and phylogeographic patterns in Channa striata. Assaying was based on individuals from twelve populations in four river systems, which were separated into two regions, the eastern and western, of the biodiversely rich state of Perak in central Peninsular Malaysia. In 238 specimens, a total of 368-bp sequences with ten polymorphic sites and eleven unique haplotypes were detected. Data on all the twelve populations revealed incomplete divergence due to past historical coalescence and the short period of separation. Nevertheless, SAMOVA and FST revealed geographical structuring existed to a certain extent in both regions. For the eastern region, the data also showed that the upstream populations were genetically significantly different compared to the mid- and downstream ones. It is inferred that physical barriers and historical processes played a dominant role in structuring the genetic dispersal of the species. A further inference is that the Grik, Tanjung Rambutan and Sungkai are potential candidates for conservation and aquaculture programmes since they contained most of the total diversity in this area. PMID:21637559
NASA Astrophysics Data System (ADS)
Iriondo, M. H.; Kröhling, D. M.
2007-12-01
The purpose of this contribution is to describe the sequence of physical and chemical processes resulting in the sediment type named loess, a fine-grained sediment deposit of universal occurrence. Owing to historical causes, loess has been (and still is) implicitly linked to glacial/periglacial environments among most naturalists. However, it is known today that most eolian dust is deflated from tropical deserts. Hence, that sequence of processes is more comprehensive than the former narrow cold scenario. Six examples of different "non-classical" cases (from South America and Europe) that fit well to the loess definition are developed: 1) volcanic loess in Ecuador: pyroclastic eruptions/valley wind/mountain prairie/silica structuring; 2) tropical loess in northeastern Argentina, Brazil and Uruguay: deflation of river and fan splays/savanna/iron sesquioxide structuring; 3) gypsum loess in northern Spain: destruction of anhydrite/gypsiferous layers in a dry climate/valley wind/Saharan shrub peridesert/gypsum structuring; 4) trade-wind deposits in Venezuela and Brazil: deflation in tidal flats/trade wind into the continent/savanna/iron hydroxide structuring; 5) anticyclonic gray loess in Argentina: continental anticyclone on plains/anti-clockwise winds and whirls/steppe/carbonate structuring. All these non-classical types conform to the accepted loess definitions and they also share the most important field characteristics of loess such as grain size, friability, vertical or sub-vertical slopes in outcrops, subfusion and others. Other cases can probably be recognized when systematically scrutinized.
Implicit structured sequence learning: an fMRI study of the structural mere-exposure effect
Folia, Vasiliki; Petersson, Karl Magnus
2014-01-01
In this event-related fMRI study we investigated the effect of 5 days of implicit acquisition on preference classification by means of an artificial grammar learning (AGL) paradigm based on the structural mere-exposure effect and preference classification using a simple right-linear unification grammar. This allowed us to investigate implicit AGL in a proper learning design by including baseline measurements prior to grammar exposure. After 5 days of implicit acquisition, the fMRI results showed activations in a network of brain regions including the inferior frontal (centered on BA 44/45) and the medial prefrontal regions (centered on BA 8/32). Importantly, and central to this study, the inclusion of a naive preference fMRI baseline measurement allowed us to conclude that these fMRI findings were the intrinsic outcomes of the learning process itself and not a reflection of a preexisting functionality recruited during classification, independent of acquisition. Support for the implicit nature of the knowledge utilized during preference classification on day 5 comes from the fact that the basal ganglia, associated with implicit procedural learning, were activated during classification, while the medial temporal lobe system, associated with explicit declarative memory, was consistently deactivated. Thus, preference classification in combination with structural mere-exposure can be used to investigate structural sequence processing (syntax) in unsupervised AGL paradigms with proper learning designs. PMID:24550865
SSMART: Sequence-structure motif identification for RNA-binding proteins.
Munteanu, Alina; Mukherjee, Neelanjan; Ohler, Uwe
2018-06-11
RNA-binding proteins (RBPs) regulate every aspect of RNA metabolism and function. There are hundreds of RBPs encoded in the eukaryotic genomes, and each recognizes its RNA targets through a specific mixture of RNA sequence and structure properties. For most RBPs, however, only a primary sequence motif has been determined, while the structure of the binding sites is uncharacterized. We developed SSMART, an RNA motif finder that simultaneously models the primary sequence and the structural properties of the RNA target sites. The sequence-structure motifs are represented as consensus strings over a degenerate alphabet, extending the IUPAC codes for nucleotides to account for secondary structure preferences. Evaluation on synthetic data showed that SSMART is able to recover both sequence and structure motifs implanted into 3'UTR-like sequences, for various degrees of structured/unstructured binding sites. In addition, we successfully used SSMART on high-throughput in vivo and in vitro data, showing that we not only recover the known sequence motif, but also gain insight into the structural preferences of the RBP. Availability: SSMART is freely available at https://ohlerlab.mdc-berlin.de/software/SSMART_137/. Supplementary data are available at Bioinformatics online.
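As an illustration of what a consensus string over a degenerate alphabet means, the following minimal sketch scans an RNA for sites matching a consensus written in the standard IUPAC nucleotide codes. It covers the sequence part only: the structure-aware extension of the alphabet used by SSMART is not specified in this abstract and is not reproduced here. The function names `matches` and `find_sites` and the example motif are hypothetical, chosen for illustration.

```python
# Standard IUPAC nucleotide codes, written for RNA (U instead of T).
# This is NOT SSMART's extended sequence-structure alphabet, only the
# plain IUPAC base codes that the abstract says it builds on.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU",
    "K": "GU", "M": "AC", "B": "CGU", "D": "AGU",
    "H": "ACU", "V": "ACG", "N": "ACGU",
}


def matches(consensus, site):
    """Return True if every base of `site` is allowed by the code at that position."""
    if len(consensus) != len(site):
        return False
    return all(base in IUPAC[code] for code, base in zip(consensus, site))


def find_sites(consensus, rna):
    """Return the 0-based start positions in `rna` that match `consensus`."""
    k = len(consensus)
    return [i for i in range(len(rna) - k + 1) if matches(consensus, rna[i:i + k])]


if __name__ == "__main__":
    # Hypothetical consensus with one fully degenerate position (N).
    print(find_sites("UGUANAUA", "AAUGUACAUAGG"))
```

A motif finder has to learn such a consensus from data; this sketch only shows how a given consensus is read once it is known.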
Reproducibility and quantitation of amplicon sequencing-based detection
Zhou, Jizhong; Wu, Liyou; Deng, Ye; Zhi, Xiaoyang; Jiang, Yi-Huei; Tu, Qichao; Xie, Jianping; Van Nostrand, Joy D; He, Zhili; Yang, Yunfeng
2011-01-01
To determine the reproducibility and quantitation of the amplicon sequencing-based detection approach for analyzing microbial community structure, a total of 24 microbial communities from a long-term global change experimental site were examined. Genomic DNA obtained from each community was used to amplify 16S rRNA genes with two or three barcode tags as technical replicates in the presence of a small quantity (0.1% wt/wt) of genomic DNA from Shewanella oneidensis MR-1 as the control. The technical reproducibility of the amplicon sequencing-based detection approach is quite low, with an average operational taxonomic unit (OTU) overlap of 17.2%±2.3% between two technical replicates, and 8.2%±2.3% among three technical replicates, which is most likely due to problems associated with random sampling processes. Such variations in technical replicates could have substantial effects on estimating β-diversity but less on α-diversity. A high variation was also observed in the control across different samples (for example, 66.7-fold for the forward primer), suggesting that the amplicon sequencing-based detection approach could not be quantitative. In addition, various strategies were examined to improve the comparability of amplicon sequencing data, such as increasing biological replicates, and removing singleton sequences and less-representative OTUs across biological replicates. Finally, as expected, various statistical analyses with preprocessed experimental data revealed clear differences in the composition and structure of microbial communities between warming and non-warming, or between clipping and non-clipping. Taken together, these results suggest that amplicon sequencing-based detection is useful in analyzing microbial community structure even though it is not reproducible and quantitative. 
However, great caution should be taken in experimental design and data interpretation when the amplicon sequencing-based detection approach is used for quantitative analysis of the β-diversity of microbial communities. PMID:21346791
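The abstract reports OTU overlap between technical replicates without giving the formula. A minimal sketch, assuming overlap is the share of OTUs detected in every replicate out of all OTUs detected in any replicate (a Jaccard-style percentage, which may differ from the authors' definition), could look like this. The function name `otu_overlap` and the example OTU labels are hypothetical.

```python
def otu_overlap(*replicates):
    """Percentage of OTUs found in every replicate, relative to all OTUs found in any.

    Assumption: overlap = |intersection| / |union| * 100. The abstract does not
    state the exact definition used in the study.
    """
    if not replicates:
        raise ValueError("at least one replicate is required")
    sets = [set(r) for r in replicates]
    union = set().union(*sets)
    if not union:
        return 0.0
    shared = set.intersection(*sets)
    return 100.0 * len(shared) / len(union)


if __name__ == "__main__":
    rep1 = {"OTU_a", "OTU_b", "OTU_c"}
    rep2 = {"OTU_b", "OTU_c", "OTU_d"}
    rep3 = {"OTU_c", "OTU_e"}
    print(otu_overlap(rep1, rep2))        # two replicates
    print(otu_overlap(rep1, rep2, rep3))  # three replicates
```

Under this definition, adding a replicate can only keep the overlap the same or lower it, which matches the direction of the reported drop from two to three technical replicates.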
Seo, Eunyoung; Woo, Jongchan; Park, Eunsook; Bertolani, Steven J; Siegel, Justin B; Choi, Doil; Dinesh-Kumar, Savithramma P
2016-11-01
Autophagy is important for degradation and recycling of intracellular components. In a diversity of genera and species, orthologs and paralogs of the yeast Atg4 and Atg8 proteins are crucial in the biogenesis of double-membrane autophagosomes that carry the cellular cargoes to vacuoles and lysosomes. Although many plant genome sequences are available, the ATG4 and ATG8 sequence analysis is limited to some model plants. We identified 28 ATG4 and 116 ATG8 genes from the available 18 different plant genome sequences. Gene structures and protein domain sequences of ATG4 and ATG8 are conserved in plant lineages. Phylogenetic analyses classified ATG8s into 3 subgroups suggesting divergence from the common ancestor. The ATG8 expansion in plants might be attributed to whole genome duplication, segmental and dispersed duplication, and purifying selection. Our results revealed that the yeast Atg4 processes Arabidopsis ATG8 but not human LC3A (HsLC3A). In contrast, HsATG4B can process yeast and plant ATG8s in vitro but yeast and plant ATG4s cannot process HsLC3A. Interestingly, in Nicotiana benthamiana plants the yeast Atg8 is processed compared to HsLC3A. However, HsLC3A is processed when coexpressed with HsATG4B in plants. Molecular modeling indicates that lack of processing of HsLC3A by plant and yeast ATG4 is not due to lack of interaction with HsLC3A. Our in-depth analyses of ATG4 and ATG8 in the plant lineage combined with results of cross-kingdom ATG8 processing by ATG4 further support the evolutionarily conserved maturation of ATG8. Broad ATG8 processing by HsATG4B and lack of processing of HsLC3A by yeast and plant ATG4s suggest that the cross-kingdom ATG8 processing is determined by ATG8 sequence rather than ATG4.
Meng, Yijun; Yu, Dongliang; Xue, Jie; Lu, Jiangjie; Feng, Shangguo; Shen, Chenjia; Wang, Huizhong
2016-01-01
Dendrobium officinale is an important traditional Chinese herb. Here, we did a transcriptome-wide, organ-specific study on this valuable plant by combining RNA, small RNA (sRNA) and degradome sequencing. RNA sequencing of four organs (flower, root, leaf and stem) of Dendrobium officinale enabled us to obtain 536,558 assembled transcripts, from which 2,645, 256, 42 and 54 were identified to be highly expressed in the four organs respectively. Based on sRNA sequencing, 2,038, 2, 21 and 24 sRNAs were identified to be specifically accumulated in the four organs respectively. A total of 1,047 mature microRNA (miRNA) candidates were detected. Based on secondary structure predictions and sequencing, tens of potential miRNA precursors were identified from the assembled transcripts. Interestingly, phase-distributed sRNAs with degradome-based processing evidence were discovered on the long-stem structures of two precursors. Target identification was performed for the 1,047 miRNA candidates, resulting in the discovery of 1,257 miRNA-target pairs. Finally, some biologically meaningful subnetworks involving hormone signaling, development, secondary metabolism and Argonaute 1-related regulation were established. All of the sequencing data sets are available at NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra/). In summary, our study provides a valuable resource for in-depth molecular and functional studies on this important Chinese orchid herb. PMID:26732614
SARA-Coffee web server, a tool for the computation of RNA sequence and structure multiple alignments
Di Tommaso, Paolo; Bussotti, Giovanni; Kemena, Carsten; Capriotti, Emidio; Chatzou, Maria; Prieto, Pablo; Notredame, Cedric
2014-01-01
This article introduces the SARA-Coffee web server, a service allowing the online computation of 3D structure based multiple RNA sequence alignments. The server makes it possible to combine sequences with and without known 3D structures. Given a set of sequences, SARA-Coffee outputs a multiple sequence alignment along with a reliability index for every sequence, column and aligned residue. SARA-Coffee combines SARA, a pairwise structural RNA aligner, with the R-Coffee multiple RNA aligner in a way that has been shown to improve alignment accuracy over most sequence aligners when enough structural data is available. The server can be accessed from http://tcoffee.crg.cat/apps/tcoffee/do:saracoffee. PMID:24972831
Colored petri net modeling of small interfering RNA-mediated messenger RNA degradation.
Nickaeen, Niloofar; Moein, Shiva; Heidary, Zarifeh; Ghaisari, Jafar
2016-01-01
Mathematical modeling of biological systems is an attractive way of studying complex biological systems and their behaviors. Petri Nets, due to their ability to model systems with various levels of qualitative information, have been widely used in modeling biological systems in which enough qualitative data may not be available. These nets have been used to answer questions regarding the dynamics of different cell behaviors including the translation process. In one stage of the translation process, the RNA sequence may be degraded. In the process of degradation of the RNA sequence, small noncoding RNA molecules known as small interfering RNA (siRNA) match the target RNA sequence. As a result of this matching, the target RNA sequence is destroyed. In this context, the process of matching and destruction is modeled using Colored Petri Nets (CPNs). The model is constructed using CPNs, which allow tokens to have a value or type on them. Thus, CPN is a suitable tool to model string structures in which each element of the string has a different type. Using CPNs, long RNA and siRNA strings are modeled with a finite set of colors. The model is simulated via CPN Tools. A CPN model of the matching between RNA and siRNA strings is constructed in the CPN Tools environment. In previous studies, a network of stoichiometric equations was modeled. However, in this particular study, we modeled the mechanism behind the silencing process. Modeling this kind of mechanism provides us with a tool to examine the effects of different factors such as mutation or drugs on the process.
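The matching step described above can be pictured with plain strings, leaving the Petri net formalism aside. The sketch below is not a Colored Petri Net and is not the authors' model; it assumes perfect Watson-Crick complementarity between the siRNA guide strand and the target, and the names `reverse_complement` and `find_target_sites` are hypothetical.

```python
# Watson-Crick pairing for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (both read 5' to 3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_target_sites(mrna, sirna):
    """Return 0-based positions in `mrna` where the siRNA guide pairs perfectly.

    Assumption: only exact, full-length complementarity counts as a match;
    mismatches and wobble pairs are ignored in this sketch.
    """
    site = reverse_complement(sirna)
    k = len(site)
    return [i for i in range(len(mrna) - k + 1) if mrna[i:i + k] == site]


if __name__ == "__main__":
    # Toy sequences, far shorter than a real siRNA (about 21 nucleotides).
    print(find_target_sites("AAGCUAGG", "UAGC"))
```

In the colored Petri net setting, each base would be a token color and the comparison would be carried out by transitions; the string version only conveys what is being compared.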
Improved growth of GaN layers on ultra thin silicon nitride/Si (1 1 1) by RF-MBE
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kumar, Mahesh; Roul, Basanta; Central Research Laboratory, Bharat Electronics, Bangalore 560013
High-quality GaN epilayers were grown on Si (1 1 1) substrates by molecular beam epitaxy using a new growth process sequence which involved a substrate nitridation at low temperatures, annealing at high temperatures, followed by nitridation at high temperatures, deposition of a low-temperature buffer layer, and a high-temperature overgrowth. The material quality of the GaN films was also investigated as a function of nitridation time and temperature. Crystallinity and surface roughness of GaN were found to improve when the Si substrate was treated under the new growth process sequence. Micro-Raman and photoluminescence (PL) measurement results indicate that the GaN film grown by the new process sequence has less tensile stress and is optically good. The surface and interface structures of an ultra thin silicon nitride film grown on the Si surface are investigated by core-level photoelectron spectroscopy and it clearly indicates that the quality of silicon nitride notably affects the properties of GaN growth.
Natural mummification of the human gut preserves bacteriophage DNA.
Santiago-Rodriguez, Tasha M; Fornaciari, Gino; Luciani, Stefania; Dowd, Scot E; Toranzos, Gary A; Marota, Isolina; Cano, Raul J
2016-01-01
The natural mummification process of the human gut represents a unique opportunity to study the resulting microbial community structure and composition. While results are providing insights into the preservation of bacteria, fungi, pathogenic eukaryotes and eukaryotic viruses, no studies have demonstrated that the process of natural mummification also results in the preservation of bacteriophage DNA. We characterized the gut microbiome of three pre-Columbian Andean mummies, namely FI3, FI9 and FI12, and found sequences homologous to viruses. From the sequences attributable to viruses, 50.4% (mummy FI3), 1.0% (mummy FI9) and 84.4% (mummy FI12) were homologous to bacteriophages. Sequences corresponding to the Siphoviridae, Myoviridae, Podoviridae and Microviridae families were identified. Predicted putative bacterial hosts corresponded mainly to the Firmicutes and Proteobacteria, and included Bacillus, Staphylococcus, Clostridium, Escherichia, Vibrio, Klebsiella, Pseudomonas and Yersinia. Predicted functional categories associated with bacteriophages showed a representation of structural, replication, integration and entry and lysis genes. The present study suggests that the natural mummification of the human gut results in the preservation of bacteriophage DNA, representing an opportunity to elucidate the ancient phageome and to hypothesize possible mechanisms of preservation.
Gomez-Alvarez, Vicente; Humrighouse, Ben W; Revetta, Randy P; Santo Domingo, Jorge W
2015-03-01
We investigated the bacterial composition of water samples from two service areas within a drinking water distribution system (DWDS), each associated with a different primary source of water (groundwater, GW; surface water, SW) and different treatment process. Community analysis based on 16S rRNA gene clone libraries indicated that Actinobacteria (Mycobacterium spp.) and α-Proteobacteria represented nearly 43 and 38% of the total sequences, respectively. Sequences closely related to Legionella, Pseudomonas, and Vibrio spp. were also identified. In spite of the high number of sequences (71%) shared in both areas, multivariable analysis revealed significant differences between the GW and SW areas. While the dominant phylotypes were not contributing significantly to the ordination of samples, the populations associated with the core of phylotypes (1-10% in each sample) significantly contributed to the differences between both service areas. Diversity indices indicate that the microbial community inhabiting the SW area is more diverse and contains more distantly related species coexisting with local assemblages as compared with the GW area. The bacterial community structures of the SW and GW service areas were dissimilar, suggesting that their respective source water and/or water quality parameters shaped by the treatment processes may contribute to the differences in community structure observed.
Ganguli, Sayak; Gupta, Manoj Kumar; Basu, Protip; Banik, Rahul; Singh, Pankaj Kumar; Vishal, Vineet; Bera, Abhisek Ranjan; Chakraborty, Hirak Jyoti; Das, Sasti Gopal
2014-01-01
With the advent of the age of big data and advances in high-throughput technology, accessing data has become one of the most important steps in the entire knowledge discovery process. Most users are not able to decipher the query result that is obtained when non-specific keywords or a combination of keywords are used. Intelligent access to sequence and structure databases (IASSD) is a desktop application for the Windows operating system. It is written in Java and utilizes the web service description language (WSDL) files and Jar files of E-utilities of various databases such as National Centre for Biotechnology Information (NCBI) and Protein Data Bank (PDB). Apart from that, IASSD allows the user to view protein structure using a JMOL application which supports conditional editing. The Jar file is freely available through e-mail from the corresponding author.
The BIRN Project: Imaging the Nervous System
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ellisman, Mark
The grand goal in neuroscience research is to understand how the interplay of structural, chemical and electrical signals in nervous tissue gives rise to behavior. Experimental advances of the past decades have given the individual neuroscientist an increasingly powerful arsenal for obtaining data, from the level of molecules to nervous systems. Scientists have begun the arduous and challenging process of adapting and assembling neuroscience data at all scales of resolution and across disciplines into computerized databases and other easily accessed sources. These databases will complement the vast structural and sequence databases created to catalogue, organize and analyze gene sequences and protein products. The general premise of the neuroscience goal is simple; namely that with "complete" knowledge of the genome and protein structures accruing rapidly we next need to assemble an infrastructure that will facilitate acquisition of an understanding for how functional complexes operate in their cell and tissue contexts.
HDAPD: a web tool for searching the disease-associated protein structures
2010-01-01
Background: The protein structures of disease-associated proteins are important for structure-based drug design against a particular disease. Up until now, protein structures have usually been searched through a PDB id or some sequence information. However, in the HDAPD database presented here, the protein structure of a disease-associated protein can be searched directly through the associated disease name keyed in. Description: The search in HDAPD can be easily initiated by keying in some key words of a disease, protein name, protein type, or PDB id. The protein sequence can be presented in FASTA format and directly copied for a BLAST search. HDAPD is also interfaced with Jmol so that users can observe and operate a protein structure with Jmol. The gene ontological data such as cellular components, molecular functions, and biological processes are provided once a hyperlink to Gene Ontology (GO) is clicked. Further, HDAPD provides a link to the KEGG map so that the protein's position and its relationship with other proteins in a metabolic pathway can be found from the map. The latest literature for the protein searched from PubMed, namely titles, journals, authors, and abstracts, is also presented as a length-controllable list. Conclusions: Since the HDAPD data content can be routinely updated through a purpose-built PHP-MySQL web page, the new database is useful for searching the structures of disease-associated proteins that may play important roles in disease development, in support of structure-based drug design against those diseases. PMID:20158919
Thientosapol, Eddy Sanchai; Sharbeen, George; Lau, K.K. Edwin; Bosnjak, Daniel; Durack, Timothy; Stevanovski, Igor; Weninger, Wolfgang
2017-01-01
AID deaminates C to U in either strand of Ig genes, exclusively producing C:G/G:C to T:A/A:T transition mutations if U is left unrepaired. Error-prone processing by UNG2 or mismatch repair diversifies mutation, predominantly at C:G or A:T base pairs, respectively. Here, we show that transversions at C:G base pairs occur by two distinct processing pathways that are dictated by sequence context. Within and near AGCT mutation hotspots, transversion mutation at C:G was driven by UNG2 without requirement for mismatch repair. Deaminations in AGCT were refractive both to processing by UNG2 and to high-fidelity base excision repair (BER) downstream of UNG2, regardless of mismatch repair activity. We propose that AGCT sequences resist faithful BER because they bind BER-inhibitory protein(s) and/or because hemi-deaminated AGCT motifs innately form a BER-resistant DNA structure. Distal to AGCT sequences, transversions at G were largely co-dependent on UNG2 and mismatch repair. We propose that AGCT-distal transversions are produced when apyrimidinic sites are exposed in mismatch excision patches, because completion of mismatch repair would require bypass of these sites. PMID:28039326
PASS2: an automated database of protein alignments organised as structural superfamilies.
Bhaduri, Anirban; Pugalenthi, Ganesan; Sowdhamini, Ramanathan
2004-04-02
The functional selection and three-dimensional structural constraints of proteins in nature often relate to the retention of significant sequence similarity between proteins of similar fold and function despite poor sequence identity. Organization of structure-based sequence alignments for distantly related proteins provides a map of the conserved and critical regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination. The Protein Alignment organised as Structural Superfamily (PASS2) database represents continuously updated, structural alignments for evolutionarily related, sequentially distant proteins. An automated and updated version of PASS2 is in direct correspondence with SCOP 1.63 and consists of sequences having identity below 40% among themselves. Protein domains have been grouped into 628 multi-member superfamilies and 566 single-member superfamilies. Structure-based sequence alignments for the superfamilies have been obtained using COMPARER, while initial equivalencies have been derived from a preliminary superposition using LSQMAN or STAMP 4.0. The final sequence alignments have been annotated for structural features using JOY4.0. The database is supplemented with sequence relatives belonging to different genomes, conserved spatially interacting and structural motifs, probabilistic hidden Markov models of superfamilies based on the alignments and useful links to other databases. Probabilistic models and sensitive position-specific profiles obtained from reliable superfamily alignments aid annotation of remote homologues and are useful tools in structural and functional genomics. PASS2 presents the phylogeny of its members based on both sequence and structural dissimilarities. Clustering of members allows us to understand diversification of the family members.
The search engine has been improved for simpler browsing of the database. The database resolves alignments among the structural domains consisting of evolutionarily diverged set of sequences. Availability of reliable sequence alignments of distantly related proteins despite poor sequence identity and single-member superfamilies permit better sampling of structures in libraries for fold recognition of new sequences and for the understanding of protein structure-function relationships of individual superfamilies. PASS2 is accessible at http://www.ncbs.res.in/~faculty/mini/campass/pass2.html
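The 40% identity cutoff mentioned above refers to pairwise sequence identity. A minimal sketch of that quantity, assuming identity is counted over alignment columns where neither sequence has a gap (other conventions divide by the alignment length or by the shorter sequence, and the abstract does not say which one PASS2 uses), is shown below. The function name `percent_identity` and the example strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of an alignment.

    Assumption: only columns where both sequences have a residue are counted.
    The two rows must already be aligned and therefore of equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


if __name__ == "__main__":
    # Toy aligned fragments; real superfamily members are full domains.
    identity = percent_identity("AC-DE", "ACFDQ")
    print(identity, "below 40% cutoff" if identity < 40.0 else "at or above 40% cutoff")
```

A filter of this kind would keep a set of domains only if every pair falls below the cutoff, which is what makes the retained members sequentially distant.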
Mapping the acquisition of the number word sequence in the first year of school
NASA Astrophysics Data System (ADS)
Gould, Peter
2017-03-01
Learning to count and to produce the correct sequence of number words in English is not a simple process. In NSW government schools taking part in Early Action for Success, over 800 students in each of the first 3 years of school were assessed every 5 weeks over the school year to determine the highest correct oral count they could produce. Rather than displaying a steady increase in the accurate sequence of the number words produced, the kindergarten data reported here identified clear, substantial hurdles in the acquisition of the counting sequence. The large-scale, longitudinal data also provided evidence of learning to count through the teens being facilitated by the semi-regular structure of the number words in English. Instead of occurring as hurdles to starting the next counting sequence, number words corresponding to some multiples of ten (10, 20 and 100) acted as if they were rest points. These rest points appear to be artefacts of how the counting sequence is acquired.
Harnessing glycomics technologies: integrating structure with function for glycan characterization
Robinson, Luke N.; Artpradit, Charlermchai; Raman, Rahul; Shriver, Zachary H.; Ruchirawat, Mathuros; Sasisekharan, Ram
2013-01-01
Glycans, or complex carbohydrates, are a ubiquitous class of biological molecules which impinge on a variety of physiological processes ranging from signal transduction to tissue development and microbial pathogenesis. In comparison to DNA and proteins, glycans present unique challenges to the study of their structure and function owing to their complex and heterogeneous structures and the dominant role played by multivalency in their sequence-specific biological interactions. Arising from these challenges, there is a need to integrate information from multiple complementary methods to decode structure-function relationships. Focusing on acidic glycans, we describe here key glycomics technologies for characterizing their structural attributes, including linkage, modifications, and topology, as well as for elucidating their role in biological processes. Two case studies, one involving sialylated branched glycans and the other sulfated glycosaminoglycans, are used to highlight how integration of orthogonal information from diverse datasets enables rapid convergence of glycan characterization for development of robust structure-function relationships. PMID:22522536
Fine-scale population structure and the era of next-generation sequencing.
Henn, Brenna M; Gravel, Simon; Moreno-Estrada, Andres; Acevedo-Acevedo, Suehelay; Bustamante, Carlos D
2010-10-15
Fine-scale population structure characterizes most continents and is especially pronounced in non-cosmopolitan populations. Roughly half of the world's population remains non-cosmopolitan and even populations within cities often assort along ethnic and linguistic categories. Barriers to random mating can be ecologically extreme, such as the Sahara Desert, or cultural, such as the Indian caste system. In either case, subpopulations accumulate genetic differences if the barrier is maintained over multiple generations. Genome-wide polymorphism data, initially with only a few hundred autosomal microsatellites, have clearly established differences in allele frequency not only among continental regions, but also within continents and within countries. We review recent evidence from the analysis of genome-wide polymorphism data for genetic boundaries delineating human population structure and the main demographic and genomic processes shaping variation, and discuss the implications of population structure for the distribution and discovery of disease-causing genetic variants, in the light of the imminent availability of sequencing data for a multitude of diverse human genomes.
Structures of Human Pumilio with Noncognate RNAs Reveal Molecular Mechanisms for Binding Promiscuity
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gupta, Y.; Nair, D.; Wharton, R.
2008-01-01
Pumilio is a founder member of the evolutionarily conserved Puf family of RNA-binding proteins that control a number of physiological processes in eukaryotes. A structure of human Pumilio (hPum) Puf domain bound to a Drosophila regulatory sequence showed that each Puf repeat recognizes a single nucleotide. Puf domains in general bind promiscuously to a large set of degenerate sequences, but the structural basis for this promiscuity has been unclear. Here, we describe the structures of hPum Puf domain complexed to two noncognate RNAs, CycBreverse and Puf5. In each complex, one of the nucleotides is ejected from the binding surface, in effect, acting as a 'spacer.' The complexes also reveal the plasticity of several Puf repeats, which recognize noncanonical nucleotides. Together, these complexes provide a molecular basis for recognition of degenerate binding sites, which significantly increases the number of mRNAs targeted for regulation by Puf proteins in vivo.
Staufen1 senses overall transcript secondary structure to regulate translation
Ricci, Emiliano P; Kucukural, Alper; Cenik, Can; Mercier, Blandine C; Singh, Guramrit; Heyer, Erin E; Ashar-Patel, Ami; Peng, Lingtao; Moore, Melissa J
2015-01-01
Human Staufen1 (Stau1) is a double-stranded RNA (dsRNA)-binding protein implicated in multiple post-transcriptional gene-regulatory processes. Here we combined RNA immunoprecipitation in tandem (RIPiT) with RNase footprinting, formaldehyde cross-linking, sonication-mediated RNA fragmentation and deep sequencing to map Staufen1-binding sites transcriptome wide. We find that Stau1 binds complex secondary structures containing multiple short helices, many of which are formed by inverted Alu elements in annotated 3′ untranslated regions (UTRs) or in ‘strongly distal’ 3′ UTRs. Stau1 also interacts with actively translating ribosomes and with mRNA coding sequences (CDSs) and 3′ UTRs in proportion to their GC content and propensity to form internal secondary structure. On mRNAs with high CDS GC content, higher Stau1 levels lead to greater ribosome densities, thus suggesting a general role for Stau1 in modulating translation elongation through structured CDS regions. Our results also indicate that Stau1 regulates translation of transcription-regulatory proteins. PMID:24336223
Observations of disconnection of open coronal magnetic structures
NASA Technical Reports Server (NTRS)
Mccomas, D. J.; Phillips, J. L.; Hundhausen, A. J.; Burkepile, J. T.
1991-01-01
The Solar Maximum Mission coronagraph/polarimeter observations are surveyed for evidence of magnetic disconnection of previously open magnetic structures, and several sequences of images consistent with this interpretation are identified. Such disconnection occurs when open field lines above helmet streamers reconnect, in contrast to previously suggested disconnections of CMEs into closed plasmoids. In this paper a clear example of open field disconnection is shown in detail. The event, on June 27, 1988, is preceded by compression of a preexisting helmet streamer and the open coronal field around it. The compressed helmet streamer and surrounding open field region detach in a large U-shaped structure which subsequently accelerates outward from the Sun. The observed sequence of events is consistent with reconnection across the heliospheric current sheet and the creation of a detached U-shaped magnetic structure. Unlike CMEs, which may open new magnetic flux into interplanetary space, this process could serve to close off previously open flux, perhaps helping to maintain the roughly constant amount of open magnetic flux observed in interplanetary space.
Jalili, Seifollah; Karami, Leila; Schofield, Jeremy
2013-06-01
Proline-rich homeodomain (PRH) is a regulatory protein controlling transcription and gene expression processes by binding to the specific sequence of DNA, especially to the sequence 5'-TAATNN-3'. The impact of base pair mutations on the binding between the PRH protein and DNA is investigated using molecular dynamics and free energy simulations to identify DNA sequences that form stable complexes with PRH. Three 20-ns molecular dynamics simulations (PRH-TAATTG, PRH-TAATTA and PRH-TAATGG complexes) in explicit solvent water were performed to investigate three complexes structurally. Structural analysis shows that the native TAATTG sequence forms a complex that is more stable than complexes with base pair mutations. It is also observed that upon mutation, the number and occupancy of the direct and water-mediated hydrogen bonds decrease. Free energy calculations performed with the thermodynamic integration method predict relative binding free energies of 0.64 and 2 kcal/mol for GC to AT and TA to GC mutations, respectively, suggesting that among the three DNA sequences, the PRH-TAATTG complex is more stable than the two mutated complexes. In addition, it is demonstrated that the stability of the PRH-TAATTA complex is greater than that of the PRH-TAATGG complex.
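To give a feel for the magnitude of the relative binding free energies quoted above, the following minimal sketch converts a ΔΔG value into an approximate fold change in binding affinity via exp(ΔΔG/RT). The temperature of 300 K, the sign convention (positive ΔΔG means the mutated complex binds more weakly) and the function name are assumptions made for illustration; they are not stated in the abstract.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3

def fold_weaker(ddg_kcal_per_mol: float, temperature_k: float = 300.0) -> float:
    """Approximate factor by which binding affinity drops for a given ddG.

    Assumes ddG = dG(mutant) - dG(native), so a positive value means the
    mutated complex is less stable. The 300 K default is an assumption.
    """
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temperature_k))

if __name__ == "__main__":
    # The two relative binding free energies reported in the abstract.
    for ddg in (0.64, 2.0):
        print(f"ddG = {ddg} kcal/mol -> about {fold_weaker(ddg):.1f}-fold weaker binding")
```

Under these assumptions, the smaller value corresponds to a few-fold loss of affinity and the larger one to a loss of more than an order of magnitude.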
The sponge microbiome project.
Moitinho-Silva, Lucas; Nielsen, Shaun; Amir, Amnon; Gonzalez, Antonio; Ackermann, Gail L; Cerrano, Carlo; Astudillo-Garcia, Carmen; Easson, Cole; Sipkema, Detmer; Liu, Fang; Steinert, Georg; Kotoulas, Giorgos; McCormack, Grace P; Feng, Guofang; Bell, James J; Vicente, Jan; Björk, Johannes R; Montoya, Jose M; Olson, Julie B; Reveillaud, Julie; Steindler, Laura; Pineda, Mari-Carmen; Marra, Maria V; Ilan, Micha; Taylor, Michael W; Polymenakou, Paraskevi; Erwin, Patrick M; Schupp, Peter J; Simister, Rachel L; Knight, Rob; Thacker, Robert W; Costa, Rodrigo; Hill, Russell T; Lopez-Legentil, Susanna; Dailianis, Thanos; Ravasi, Timothy; Hentschel, Ute; Li, Zhiyong; Webster, Nicole S; Thomas, Torsten
2017-10-01
Marine sponges (phylum Porifera) are a diverse, phylogenetically deep-branching clade known for forming intimate partnerships with complex communities of microorganisms. To date, 16S rRNA gene sequencing studies have largely utilised different extraction and amplification methodologies to target the microbial communities of a limited number of sponge species, severely limiting comparative analyses of sponge microbial diversity and structure. Here, we provide an extensive and standardised dataset that will facilitate sponge microbiome comparisons across large spatial, temporal, and environmental scales. Samples from marine sponges (n = 3569 specimens), seawater (n = 370), marine sediments (n = 65) and other environments (n = 29) were collected from different locations across the globe. This dataset incorporates at least 268 different sponge species, including several yet unidentified taxa. The V4 region of the 16S rRNA gene was amplified and sequenced from extracted DNA using standardised procedures. Raw sequences (total of 1.1 billion sequences) were processed and clustered with (i) a standard protocol using QIIME closed-reference picking resulting in 39 543 operational taxonomic units (OTU) at 97% sequence identity, (ii) a de novo clustering using Mothur resulting in 518 246 OTUs, and (iii) a new high-resolution Deblur protocol resulting in 83 908 unique bacterial sequences. Abundance tables, representative sequences, taxonomic classifications, and metadata are provided. This dataset represents a comprehensive resource of sponge-associated microbial communities based on 16S rRNA gene sequences that can be used to address overarching hypotheses regarding host-associated prokaryotes, including host specificity, convergent evolution, environmental drivers of microbiome structure, and the sponge-associated rare biosphere. © The Authors 2017. Published by Oxford University Press.
Cerebellum, temporal predictability and the updating of a mental model.
Kotz, Sonja A; Stockert, Anika; Schwartze, Michael
2014-12-19
We live in a dynamic and changing environment, which necessitates that we adapt to and efficiently respond to changes of stimulus form ('what') and stimulus occurrence ('when'). Consequently, behaviour is optimal when we can anticipate both the 'what' and 'when' dimensions of a stimulus. For example, to perceive a temporally expected stimulus, a listener needs to establish a fairly precise internal representation of its external temporal structure, a function ascribed to classical sensorimotor areas such as the cerebellum. Here we investigated how patients with cerebellar lesions and healthy matched controls exploit temporal regularity during auditory deviance processing. We expected modulations of the N2b and P3b components of the event-related potential in response to deviant tones, and also a stronger P3b response when deviant tones are embedded in temporally regular compared to irregular tone sequences. We further tested to what degree structural damage to the cerebellar temporal processing system affects the N2b and P3b responses associated with voluntary attention to change detection and the predictive adaptation of a mental model of the environment, respectively. Results revealed that healthy controls and cerebellar patients display an increased N2b response to deviant tones independent of temporal context. However, while healthy controls showed the expected enhanced P3b response to deviant tones in temporally regular sequences, the P3b response in cerebellar patients was significantly smaller in these sequences. The current data provide evidence that structural damage to the cerebellum affects the predictive adaptation to the temporal structure of events and the updating of a mental model of the environment under voluntary attention. © 2014 The Author(s) Published by the Royal Society. All rights reserved.
Cerebellum, temporal predictability and the updating of a mental model
Kotz, Sonja A.; Stockert, Anika; Schwartze, Michael
2014-01-01
We live in a dynamic and changing environment, which necessitates that we adapt to and efficiently respond to changes of stimulus form (‘what’) and stimulus occurrence (‘when’). Consequently, behaviour is optimal when we can anticipate both the ‘what’ and ‘when’ dimensions of a stimulus. For example, to perceive a temporally expected stimulus, a listener needs to establish a fairly precise internal representation of its external temporal structure, a function ascribed to classical sensorimotor areas such as the cerebellum. Here we investigated how patients with cerebellar lesions and healthy matched controls exploit temporal regularity during auditory deviance processing. We expected modulations of the N2b and P3b components of the event-related potential in response to deviant tones, and also a stronger P3b response when deviant tones are embedded in temporally regular compared to irregular tone sequences. We further tested to what degree structural damage to the cerebellar temporal processing system affects the N2b and P3b responses associated with voluntary attention to change detection and the predictive adaptation of a mental model of the environment, respectively. Results revealed that healthy controls and cerebellar patients display an increased N2b response to deviant tones independent of temporal context. However, while healthy controls showed the expected enhanced P3b response to deviant tones in temporally regular sequences, the P3b response in cerebellar patients was significantly smaller in these sequences. The current data provide evidence that structural damage to the cerebellum affects the predictive adaptation to the temporal structure of events and the updating of a mental model of the environment under voluntary attention. PMID:25385781
Barrick, Jeffrey E; Colburn, Geoffrey; Deatherage, Daniel E; Traverse, Charles C; Strand, Matthew D; Borges, Jordan J; Knoester, David B; Reba, Aaron; Meyer, Austin G
2014-11-29
Mutations that alter chromosomal structure play critical roles in evolution and disease, including in the origin of new lifestyles and pathogenic traits in microbes. Large-scale rearrangements in genomes are often mediated by recombination events involving new or existing copies of mobile genetic elements, recently duplicated genes, or other repetitive sequences. Most current software programs for predicting structural variation from short-read DNA resequencing data are intended primarily for use on human genomes. They typically disregard information in reads mapping to repeat sequences, and significant post-processing and manual examination of their output is often required to rule out false-positive predictions and precisely describe mutational events. We have implemented an algorithm for identifying structural variation from DNA resequencing data as part of the breseq computational pipeline for predicting mutations in haploid microbial genomes. Our method evaluates the support for new sequence junctions present in a clonal sample from split-read alignments to a reference genome, including matches to repeat sequences. Then, it uses a statistical model of read coverage evenness to accept or reject these predictions. Finally, breseq combines predictions of new junctions and deleted chromosomal regions to output biologically relevant descriptions of mutations and their effects on genes. We demonstrate the performance of breseq on simulated Escherichia coli genomes with deletions generating unique breakpoint sequences, new insertions of mobile genetic elements, and deletions mediated by mobile elements. Then, we reanalyze data from an E. coli K-12 mutation accumulation evolution experiment in which structural variation was not previously identified. Transposon insertions and large-scale chromosomal changes detected by breseq account for ~25% of spontaneous mutations in this strain. 
In all cases, we find that breseq is able to reliably predict structural variation with modest read-depth coverage of the reference genome (>40-fold). Using breseq to predict structural variation should be useful for studies of microbial epidemiology, experimental evolution, synthetic biology, and genetics when a reference genome for a closely related strain is available. In these cases, breseq can discover mutations that may be responsible for important or unintended changes in genomes that might otherwise go undetected.
Buenrostro, Jason D.; Chircus, Lauren M.; Araya, Carlos L.; Layton, Curtis J.; Chang, Howard Y.; Snyder, Michael P.; Greenleaf, William J.
2015-01-01
RNA-protein interactions drive fundamental biological processes and are targets for molecular engineering, yet quantitative and comprehensive understanding of the sequence determinants of affinity remains limited. Here we repurpose a high-throughput sequencing instrument to quantitatively measure binding and dissociation of MS2 coat protein to >10^7 RNA targets generated on a flow-cell surface by in situ transcription and inter-molecular tethering of RNA to DNA. We decompose the binding energy contributions from primary and secondary RNA structure, finding that differences in affinity are often driven by sequence-specific changes in association rates. By analyzing the biophysical constraints and modeling mutational paths describing the molecular evolution of MS2 from low- to high-affinity hairpins, we quantify widespread molecular epistasis, and a long-hypothesized structure-dependent preference for G:U base pairs over C:A intermediates in evolutionary trajectories. Our results suggest that quantitative analysis of RNA on a massively parallel array (RNAMaP) relationships across molecular variants. PMID:24727714
Savary, Brett J; Vasu, Prasanna; Cameron, Randall G; McCollum, T Gregory; Nuñez, Alberto
2013-12-26
Despite the longstanding importance of the thermally tolerant pectin methylesterase (TT-PME) activity in citrus juice processing and product quality, the unequivocal identification of the protein and its corresponding gene has remained elusive. TT-PME was purified from sweet orange [ Citrus sinensis (L.) Osbeck] finisher pulp (8.0 mg/1.3 kg tissue) with an improved purification scheme that provided 20-fold increased enzyme yield over previous results. Structural characterization of electrophoretically pure TT-PME by MALDI-TOF MS determined molecular masses of approximately 47900 and 53000 Da for two principal glycoisoforms. De novo sequences generated from tryptic peptides by MALDI-TOF/TOF MS matched multiple anonymous Citrus EST cDNA accessions. The complete tt-pme cDNA (1710 base pair) was cloned from a fruit mRNA library using RT- and RLM-RACE PCR. Citrus TT-PME is a novel isoform that showed higher sequence identity with the multiply glycosylated kiwifruit PME than to previously described Citrus thermally labile PME isoforms.
Percolation in random-Sierpiński carpets: A real space renormalization group approach
NASA Astrophysics Data System (ADS)
Perreau, Michel; Peiro, Joaquina; Berthier, Serge
1996-11-01
The site percolation transition in random Sierpiński carpets is investigated by real space renormalization. The fixed point is not unique, as it is in regular translationally invariant lattices, but depends on the number k of segmentation steps of the generation process of the fractal. It is shown that, for each scale invariance ratio n, the sequence of fixed points p_{n,k} is increasing with k, and converges when k → ∞ toward a limit p_n strictly less than 1. Moreover, in such scale invariant structures, the percolation threshold does not depend only on the scale invariance ratio n, but also on the scale. The sequence p_{n,k} and p_n are calculated for n=4, 8, 16, 32, and 64, and for k=1 to k=11, and k=∞. The corresponding thermal exponent sequence ν_{n,k} is calculated for n=8 and 16, and for k=1 to k=5, and k=∞. Suggestions are made for an experimental test in physical self-similar structures.
ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data
Krestel, Ralf; Ohler, Uwe; Vingron, Martin; Marsico, Annalisa
2017-01-01
RNA-binding proteins (RBPs) play an important role in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. The extent to which RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA motif finders either take the structure of the RNA only partially into account, or employ models which are not directly interpretable as sequence-structure motifs. We developed ssHMM, an RNA motif finder based on a hidden Markov model (HMM) and Gibbs sampling which fully captures the relationship between RNA sequence and secondary structure preference of a given RBP. Compared to previous methods which output separate logos for sequence and structure, it directly produces a combined sequence-structure motif when trained on a large set of sequences. ssHMM’s model is visualized intuitively as a graph and facilitates biological interpretation. ssHMM can be used to find novel bona fide sequence-structure motifs of uncharacterized RBPs, such as the one presented here for the YY1 protein. ssHMM reaches a high motif recovery rate on synthetic data, recovers known RBP motifs from CLIP-Seq data, and scales linearly with the input size, being considerably faster than MEMERIS and RNAcontext on large datasets while being on par with GraphProt. It is freely available on GitHub and as a Docker image. PMID:28977546
Memory and learning with rapid audiovisual sequences
Keller, Arielle S.; Sekuler, Robert
2015-01-01
We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent, but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed. PMID:26575193
Precipitation process in a Mg–Gd–Y alloy grain-refined by Al addition
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dai, Jichun; CAST Cooperative Research Centre, Department of Materials Engineering, Monash University, Victoria 3800; Zhu, Suming, E-mail: suming.zhu@monash.edu
2014-02-15
The precipitation process in Mg–10Gd–3Y (wt.%) alloy grain-refined by 0.8 wt.% Al addition has been investigated by transmission electron microscopy. The alloy was given a solution treatment at 520 °C for 6 h plus 550 °C for 7 h before ageing at 250 °C. Plate-shaped intermetallic particles with the 18R-type long-period stacking ordered structure were observed in the solution-treated state. Upon isothermal ageing at 250 °C, the following precipitation sequence was identified for the α-Mg supersaturated solution: β″ (D0₁₉) → β′ (bco) → β₁ (fcc) → β (fcc). The observed precipitation process and age hardening response in the Al grain-refined Mg–10Gd–3Y alloy are compared with those reported in the Zr grain-refined counterpart. Highlights: • The precipitation process in Mg–10Gd–3Y–0.8Al (wt.%) alloy has been investigated. • Particles with the 18R-type LPSO structure were observed in the solution state. • Upon ageing at 250 °C, the precipitation sequence is: β″ → β′ → β₁ (fcc) → β. • The Al grain-refined alloy has a lower hardness than the Zr refined counterpart.
Using structure to explore the sequence alignment space of remote homologs.
Kuziemko, Andrew; Honig, Barry; Petrey, Donald
2011-10-01
Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is "optimal" in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are "suboptimal" in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for "modelability", we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.
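As background for the dynamic programming (DP) score that the abstract contrasts with structure-based alignment, here is a minimal sketch of a Needleman-Wunsch global alignment score. It is not the authors' fragment-based method; the function name and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for illustration only.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of a and b (Needleman-Wunsch)."""
    # Row 0: aligning the empty prefix of a with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] with the empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Either pair a[i-1] with b[j-1], or open a gap in one sequence.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

if __name__ == "__main__":
    print(nw_score("GATTACA", "GATTACA"))
```

The "optimal" alignment in this sense maximizes only the sequence-based score, which is why, as the abstract notes, it need not coincide with the alignment implied by structural superposition.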
High throughput profile-profile based fold recognition for the entire human proteome.
McGuffin, Liam J; Smith, Richard T; Bryson, Kevin; Sørensen, Søren-Aksel; Jones, David T
2006-06-07
In order to maintain the most comprehensive structural annotation databases we must carry out regular updates for each proteome using the latest profile-profile fold recognition methods. The ability to carry out these updates on demand is necessary to keep pace with the regular updates of sequence and structure databases. Providing the highest quality structural models requires the most intensive profile-profile fold recognition methods running with the very latest available sequence databases and fold libraries. However, running these methods on such a regular basis for every sequenced proteome requires large amounts of processing power. In this paper we describe and benchmark the JYDE (Job Yield Distribution Environment) system, which is a meta-scheduler designed to work above cluster schedulers, such as Sun Grid Engine (SGE) or Condor. We demonstrate the ability of JYDE to distribute the load of genomic-scale fold recognition across multiple independent Grid domains. We use the most recent profile-profile version of our mGenTHREADER software in order to annotate the latest version of the Human proteome against the latest sequence and structure databases in as short a time as possible. We show that our JYDE system is able to scale to large numbers of intensive fold recognition jobs running across several independent computer clusters. Using our JYDE system we have been able to annotate 99.9% of the protein sequences within the Human proteome in less than 24 hours, by harnessing over 500 CPUs from 3 independent Grid domains. This study clearly demonstrates the feasibility of carrying out on demand high quality structural annotations for the proteomes of major eukaryotic organisms. Specifically, we have shown that it is now possible to provide complete regular updates of profile-profile based fold recognition models for entire eukaryotic proteomes, through the use of Grid middleware such as JYDE.
Thrombin-like enzymes from snake venom: Structural characterization and mechanism of action.
Ullah, Anwar; Masood, Rehana; Ali, Ijaz; Ullah, Kifayat; Ali, Hamid; Akbar, Haji; Betzel, Christian
2018-07-15
Snake venom thrombin-like enzymes (SVTLEs) constitute the major portion (10-24%) of snake venom and are the second most abundant enzymes present in the crude venom. During envenomation, these enzymes produce various pathological effects, such as disturbance of the hemostatic system, fibrinogenolysis, fibrinolysis, platelet aggregation, thrombosis, neurologic disorders, activation of coagulation factors, and coagulant and procoagulant activity. These enzymes have also been used as therapeutic agents for the treatment of various diseases such as congestive heart failure, ischemic stroke, and thrombotic disorders. Although the crystal structures of five SVTLEs are available in the Protein Data Bank (PDB), no single article in the literature has described all of them. The current work describes the structural aspects, structure-based mechanism of action, processing and inhibition of these enzymes. The sequence analysis indicates that these enzymes show a high sequence identity (57-85%) with each other and low sequence identity with trypsin (36-43%), human alpha-thrombin (29-36%) and other snake venom serine proteinases (57-85%). Three-dimensional structural analysis indicates that the loops surrounding the active site are variable both in amino acid composition and length, which may convey variable substrate specificity to these enzymes. The surface charge distributions also vary in these enzymes. Docking analysis with suramin shows that this inhibitor preferably binds to the C-terminal region of these enzymes and causes the destabilization of their three-dimensional structure. Copyright © 2018 Elsevier B.V. All rights reserved.
Robert Slevc, L.; Rosenberg, Jason C.; Patel, Aniruddh D.
2009-01-01
Linguistic processing–especially syntactic processing–is often considered a hallmark of human cognition, thus the domain-specificity or domain-generality of syntactic processing has attracted considerable debate. These experiments address this issue by simultaneously manipulating syntactic processing demands in language and music. Participants performed self-paced reading of garden-path sentences in which structurally unexpected words cause temporary syntactic processing difficulty. A musical chord accompanied each sentence segment, with the resulting sequence forming a coherent chord progression. When structurally unexpected words were paired with harmonically unexpected chords, participants showed substantially enhanced garden-path effects. No such interaction was observed when the critical words violated semantic expectancy, nor when the critical chords violated timbral expectancy. These results support a prediction of the shared syntactic integration resource hypothesis (SSIRH, Patel, 2003), which suggests that music and language draw on a common pool of limited processing resources for integrating incoming elements into syntactic structures. PMID:19293110
Identification of sequence–structure RNA binding motifs for SELEX-derived aptamers
Hoinka, Jan; Zotenko, Elena; Friedman, Adam; Sauna, Zuben E.; Przytycka, Teresa M.
2012-01-01
Motivation: Systematic Evolution of Ligands by EXponential Enrichment (SELEX) represents a state-of-the-art technology to isolate single-stranded (ribo)nucleic acid fragments, named aptamers, which bind to a molecule (or molecules) of interest via specific structural regions induced by their sequence-dependent fold. This powerful method has applications in designing protein inhibitors, molecular detection systems, therapeutic drugs and antibody replacement among others. However, full understanding and consequently optimal utilization of the process has lagged behind its wide application due to the lack of dedicated computational approaches. At the same time, the combination of SELEX with novel sequencing technologies is beginning to provide the data that will allow the examination of a variety of properties of the selection process. Results: To close this gap we developed Aptamotif, a computational method for the identification of sequence–structure motifs in SELEX-derived aptamers. To increase the chances of identifying functional motifs, Aptamotif uses an ensemble-based approach. We validated the method using two published aptamer datasets containing experimentally determined motifs of increasing complexity. We were able to recreate the authors' findings to a high degree, thus proving the capability of our approach to identify binding motifs in SELEX data. Additionally, using our new experimental dataset, we illustrate the application of Aptamotif to elucidate several properties of the selection process. Contact: przytyck@ncbi.nlm.nih.gov, Zuben.Sauna@fda.hhs.gov PMID:22689764
Visualizing and Clustering Protein Similarity Networks: Sequences, Structures, and Functions.
Mai, Te-Lun; Hu, Geng-Ming; Chen, Chi-Ming
2016-07-01
Research in the recent decade has demonstrated the usefulness of protein network knowledge in furthering the study of molecular evolution of proteins, understanding the robustness of cells to perturbation, and annotating new protein functions. In this study, we aimed to provide a general clustering approach to visualize the sequence-structure-function relationship of protein networks, and investigate possible causes for inconsistency in the protein classifications based on sequences, structures, and functions. Such visualization of protein networks could facilitate our understanding of the overall relationship among proteins and help researchers comprehend various protein databases. As a demonstration, we clustered 1437 enzymes by their sequences and structures using the minimum span clustering (MSC) method. The general structure of this protein network was delineated at two clustering resolutions, and the second level MSC clustering was found to be highly similar to existing enzyme classifications. The clusterings of these enzymes based on sequence, structure, and function information are consistent with one another. For proteases, the Jaccard's similarity coefficient is 0.86 between sequence and function classifications, 0.82 between sequence and structure classifications, and 0.78 between structure and function classifications. From our clustering results, we discussed possible examples of divergent evolution and convergent evolution of enzymes. Our clustering approach provides a panoramic view of the sequence-structure-function network of proteins, helps visualize the relation between related proteins intuitively, and is useful in predicting the structure and function of newly determined protein sequences.
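The abstract does not say how the Jaccard coefficient between two classifications was computed. One common convention, sketched below as an assumption rather than as the authors' procedure, compares the sets of item pairs that each classification places in the same cluster. The function names and the toy enzyme labels are hypothetical.

```python
from itertools import combinations

def co_clustered_pairs(labels: dict) -> set:
    """Set of unordered item pairs that share a cluster label."""
    return {
        frozenset((x, y))
        for x, y in combinations(sorted(labels), 2)
        if labels[x] == labels[y]
    }

def jaccard(labels_a: dict, labels_b: dict) -> float:
    """Pair-based Jaccard coefficient between two classifications.

    Both dicts are assumed to map the same items to cluster labels.
    """
    pairs_a = co_clustered_pairs(labels_a)
    pairs_b = co_clustered_pairs(labels_b)
    union = pairs_a | pairs_b
    # Two classifications with no co-clustered pairs at all are treated as identical.
    return len(pairs_a & pairs_b) / len(union) if union else 1.0

if __name__ == "__main__":
    # Toy labels, invented purely for illustration.
    by_sequence = {"e1": "A", "e2": "A", "e3": "B", "e4": "B"}
    by_function = {"e1": "X", "e2": "X", "e3": "X", "e4": "Y"}
    print(jaccard(by_sequence, by_function))
```

A value of 1 means the two classifications group exactly the same pairs together, and a value near 0 means they share almost no groupings.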
Consistent global structures of complex RNA states through multidimensional chemical mapping
Cheng, Clarence Yu; Chou, Fang-Chieh; Kladwang, Wipapat; Tian, Siqi; Cordero, Pablo; Das, Rhiju
2015-01-01
Accelerating discoveries of non-coding RNA (ncRNA) in myriad biological processes pose major challenges to structural and functional analysis. Despite progress in secondary structure modeling, high-throughput methods have generally failed to determine ncRNA tertiary structures, even at the 1-nm resolution that enables visualization of how helices and functional motifs are positioned in three dimensions. We report that integrating a new method called MOHCA-seq (Multiplexed •OH Cleavage Analysis with paired-end sequencing) with mutate-and-map secondary structure inference guides Rosetta 3D modeling to consistent 1-nm accuracy for intricately folded ncRNAs with lengths up to 188 nucleotides, including a blind RNA-puzzle challenge, the lariat-capping ribozyme. This multidimensional chemical mapping (MCM) pipeline resolves unexpected tertiary proximities for cyclic-di-GMP, glycine, and adenosylcobalamin riboswitch aptamers without their ligands and a loose structure for the recently discovered human HoxA9D internal ribosome entry site regulon. MCM offers a sequencing-based route to uncovering ncRNA 3D structure, applicable to functionally important but potentially heterogeneous states. DOI: http://dx.doi.org/10.7554/eLife.07600.001 PMID:26035425
Inter-subject synchronization of brain responses during natural music listening.
Abrams, Daniel A; Ryali, Srikanth; Chen, Tianwen; Chordia, Parag; Khouzam, Amirah; Levitin, Daniel J; Menon, Vinod
2013-05-01
Music is a cultural universal and a rich part of the human experience. However, little is known about common brain systems that support the processing and integration of extended, naturalistic 'real-world' music stimuli. We examined this question by presenting extended excerpts of symphonic music, and two pseudomusical stimuli in which the temporal and spectral structure of the Natural Music condition were disrupted, to non-musician participants undergoing functional brain imaging and analysing synchronized spatiotemporal activity patterns between listeners. We found that music synchronizes brain responses across listeners in bilateral auditory midbrain and thalamus, primary auditory and auditory association cortex, right-lateralized structures in frontal and parietal cortex, and motor planning regions of the brain. These effects were greater for natural music compared to the pseudo-musical control conditions. Remarkably, inter-subject synchronization in the inferior colliculus and medial geniculate nucleus was also greater for the natural music condition, indicating that synchronization at these early stages of auditory processing is not simply driven by spectro-temporal features of the stimulus. Increased synchronization during music listening was also evident in a right-hemisphere fronto-parietal attention network and bilateral cortical regions involved in motor planning. While these brain structures have previously been implicated in various aspects of musical processing, our results are the first to show that these regions track structural elements of a musical stimulus over extended time periods lasting minutes. Our results show that a hierarchical distributed network is synchronized between individuals during the processing of extended musical sequences, and provide new insight into the temporal integration of complex and biologically salient auditory sequences. © 2013 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Modeling repetitive, non‐globular proteins
Basu, Koli; Campbell, Robert L.; Guo, Shuaiqi; Sun, Tianjun
2016-01-01
Abstract While ab initio modeling of protein structures is not routine, certain types of proteins are more straightforward to model than others. Proteins with short repetitive sequences typically exhibit repetitive structures. These repetitive sequences can be more amenable to modeling if some information is known about the predominant secondary structure or other key features of the protein sequence. We have successfully built models of a number of repetitive structures with novel folds using knowledge of the consensus sequence within the sequence repeat and an understanding of the likely secondary structures that these may adopt. Our methods for achieving this success are reviewed here. PMID:26914323
Computational analysis of sequence selection mechanisms.
Meyerguz, Leonid; Grasso, Catherine; Kleinberg, Jon; Elber, Ron
2004-04-01
Mechanisms leading to gene variations are responsible for the diversity of species and are important components of the theory of evolution. One constraint on gene evolution is that of protein foldability; the three-dimensional shapes of proteins must be thermodynamically stable. We explore the impact of this constraint and calculate properties of foldable sequences using 3660 structures from the Protein Data Bank. We seek a selection function that receives sequences as input, and outputs survival probability based on sequence fitness to structure. We compute the number of sequences that match a particular protein structure with energy lower than the native sequence, the density of the number of sequences, the entropy, and the "selection" temperature. The mechanism of structure selection for sequences longer than 200 amino acids is approximately universal. For shorter sequences, it is not. We speculate on concrete evolutionary mechanisms that show this behavior.
@TOME-2: a new pipeline for comparative modeling of protein-ligand complexes.
Pons, Jean-Luc; Labesse, Gilles
2009-07-01
@TOME 2.0 is a new web pipeline dedicated to protein structure modeling and small ligand docking based on comparative analyses. @TOME 2.0 allows fold recognition, template selection, structural alignment editing, structure comparisons, 3D-model building and evaluation. These tasks are routinely used in sequence analyses for structure prediction. In our pipeline the necessary software is efficiently interconnected in an original manner to accelerate all the processes. Furthermore, we have also connected comparative docking of small ligands that is performed using protein-protein superposition. The input is a simple protein sequence in one-letter code with no comment. The resulting 3D model, protein-ligand complexes and structural alignments can be visualized through dedicated Web interfaces or can be downloaded for further studies. These original features will aid in the functional annotation of proteins and the selection of templates for molecular modeling and virtual screening. Several examples are described to highlight some of the new functionalities provided by this pipeline. The server and its documentation are freely available at http://abcis.cbs.cnrs.fr/AT2/
@TOME-2: a new pipeline for comparative modeling of protein–ligand complexes
Pons, Jean-Luc; Labesse, Gilles
2009-01-01
@TOME 2.0 is a new web pipeline dedicated to protein structure modeling and small ligand docking based on comparative analyses. @TOME 2.0 allows fold recognition, template selection, structural alignment editing, structure comparisons, 3D-model building and evaluation. These tasks are routinely used in sequence analyses for structure prediction. In our pipeline the necessary software is efficiently interconnected in an original manner to accelerate all the processes. Furthermore, we have also connected comparative docking of small ligands that is performed using protein–protein superposition. The input is a simple protein sequence in one-letter code with no comment. The resulting 3D model, protein–ligand complexes and structural alignments can be visualized through dedicated Web interfaces or can be downloaded for further studies. These original features will aid in the functional annotation of proteins and the selection of templates for molecular modeling and virtual screening. Several examples are described to highlight some of the new functionalities provided by this pipeline. The server and its documentation are freely available at http://abcis.cbs.cnrs.fr/AT2/ PMID:19443448
Pairwise Sequence Alignment Library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jeff Daily, PNNL
2015-05-20
Vector extensions, such as SSE, have been part of the x86 CPU since the 1990s, with applications in graphics, signal processing, and scientific applications. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. The trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm that can better exploit wider SIMD units was implemented as part of the Parallel Sequence Alignment Library (parasail). Parasail features: reference implementations of all known vectorized sequence alignment approaches; implementations of Smith Waterman (SW), semi-global (SG), and Needleman Wunsch (NW) sequence alignment algorithms; implementations across all modern CPU instruction sets including AVX2 and KNC; and language interfaces for C/C++ and Python.
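To make the "complex data dependencies" concrete, here is a minimal scalar Smith Waterman scorer in plain Python. It is a reference sketch only, not parasail's code or API: it uses a linear gap penalty, and the default scores are assumptions chosen for illustration. Each cell depends on its left, upper, and upper-left neighbours, which is the dependency pattern that striped and scan-based SIMD layouts have to work around.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b (linear gap penalty).

    Scalar reference version; scoring values are illustrative defaults.
    """
    prev = [0] * (len(b) + 1)   # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Diagonal move: align a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACACACTA", "AGCACACA"))
```

Keeping only two rows bounds memory at O(len(b)); recovering the alignment itself would additionally require a traceback table.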
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lapidus, Alla L.
From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists have tried to decode the secrets of life hidden within these long molecules, and technologists have invented and improved methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error/10,000 bp), validated through a number of computer and laboratory experiments.
The UMR Conception Cycle of Vocational School Students in Solving Linear Equation
ERIC Educational Resources Information Center
Li, Shao-Ying; Leon, Shian
2013-01-01
The authors designed instruments based on theory and the literature. Data were collected throughout remedial teaching processes and through interviews with vocational school students. Using the SOLO (structure of the observed learning outcome) taxonomy, the authors constructed the UMR (unistructural-multistructural-relational sequence) conception cycle of the formative and…
Inferring Action Structure and Causal Relationships in Continuous Sequences of Human Action
2014-01-01
language processing literature (e.g., Brent, 1999; Venkataraman, 2001), and which were also used by Goldwater et al. (2009). Precision (P) is the...trees in oriented linear graphs. Simon Stevin: Wis-en Natuurkundig Tijdschrift, 28, 203. Venkataraman, A. (2001). A statistical model for word discovery
The Decision Tree: A Tool for Achieving Behavioral Change.
ERIC Educational Resources Information Center
Saren, Dru
1999-01-01
Presents a "Decision Tree" process for structuring team decision making and problem solving about specific student behavioral goals. The Decision Tree involves a sequence of questions/decisions that can be answered in "yes/no" terms. Questions address reasonableness of the goal, time factors, importance of the goal, responsibilities, safety,…
Vouille, V; Amiche, M; Nicolas, P
1997-09-01
We cloned the genes of two members of the dermaseptin family, broad-spectrum antimicrobial peptides isolated from the skin of the arboreal frog Phyllomedusa bicolor. The dermaseptin gene Drg2 has a 2-exon coding structure interrupted by a small 137-bp intron, wherein exon 1 encoded a 22-residue hydrophobic signal peptide and the first three amino acids of the acidic propiece; exon 2 contained the 18 additional acidic residues of the propiece plus a typical prohormone processing signal Lys-Arg and a 32-residue dermaseptin progenitor sequence. The dermaseptin genes Drg2 and Drg1g2 have conserved sequences at both untranslated ends and in the first and second coding exons. In contrast, Drg1g2 comprises a third coding exon for a short version of the acidic propiece and a second dermaseptin progenitor sequence. Structural conservation between the two genes suggests that Drg1g2 arose recently from an ancestral Drg2-like gene through amplification of part of the second coding exon and 3'-untranslated region. Analysis of the cDNAs coding precursors for several frog skin peptides of highly different structures and activities demonstrates that the signal peptides and part of the acidic propieces are encoded by conserved nucleotides encompassed by the first coding exon of the dermaseptin genes. The organization of the genes that belong to this family, with the signal peptide and the progenitor sequence on separate exons, permits strikingly different peptides to be directed into the secretory pathway. The recruitment of such a homologous 'secretory' exon by otherwise non-homologous genes may have been an early event in the evolution of amphibian.
Holm, Liisa; Laakso, Laura M
2016-07-08
The Dali server (http://ekhidna2.biocenter.helsinki.fi/dali) is a network service for comparing protein structures in 3D. In favourable cases, comparing 3D structures may reveal biologically interesting similarities that are not detectable by comparing sequences. The Dali server has been running in various places for over 20 years and is used routinely by crystallographers on newly solved structures. The latest update of the server provides enhanced analytics for the study of sequence and structure conservation. The server performs three types of structure comparisons: (i) Protein Data Bank (PDB) search compares one query structure against those in the PDB and returns a list of similar structures; (ii) pairwise comparison compares one query structure against a list of structures specified by the user; and (iii) all against all structure comparison returns a structural similarity matrix, a dendrogram and a multidimensional scaling projection of a set of structures specified by the user. Structural superimpositions are visualized using the Java-free WebGL viewer PV. The structural alignment view is enhanced by sequence similarity searches against Uniprot. The combined structure-sequence alignment information is compressed to a stack of aligned sequence logos. In the stack, each structure is structurally aligned to the query protein and represented by a sequence logo. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Jossinet, Fabrice; Westhof, Eric
2005-08-01
Efficient RNA sequence manipulations (such as multiple alignments) need to be constrained by rules of RNA structure folding. The structural knowledge has increased dramatically in recent years with the accumulation of several large RNA structures similar to those of the bacterial ribosome subunits. However, no tool in the RNA community provides an easy way to link and integrate progress made at the sequence level using the available three-dimensional information. Sequence to Structure (S2S) proposes a framework in which a user can easily display, manipulate and interconnect heterogeneous RNA data, such as multiple sequence alignments, secondary and tertiary structures. S2S has been implemented using the Java language and has been developed and tested under UNIX systems, such as Linux and MacOSX. S2S is available at http://bioinformatics.org/S2S/.
JPL-ANTOPT antenna structure optimization program
NASA Technical Reports Server (NTRS)
Strain, D. M.
1994-01-01
New antenna path-length error and pointing-error structure optimization codes were recently added to the MSC/NASTRAN structural analysis computer program. Path-length and pointing errors are important measures of structure-related antenna performance. The path-length and pointing errors are treated as scalar displacements for statics loading cases. These scalar displacements can be subject to constraint during the optimization process. Path-length and pointing-error calculations supplement the other optimization and sensitivity capabilities of NASTRAN. The analysis and design functions were implemented as 'DMAP ALTERs' to the Design Optimization (SOL 200) Solution Sequence of MSC-NASTRAN, Version 67.5.
Slevc, L Robert; Rosenberg, Jason C; Patel, Aniruddh D
2009-04-01
Linguistic processing, especially syntactic processing, is often considered a hallmark of human cognition; thus, the domain specificity or domain generality of syntactic processing has attracted considerable debate. The present experiments address this issue by simultaneously manipulating syntactic processing demands in language and music. Participants performed self-paced reading of garden path sentences, in which structurally unexpected words cause temporary syntactic processing difficulty. A musical chord accompanied each sentence segment, with the resulting sequence forming a coherent chord progression. When structurally unexpected words were paired with harmonically unexpected chords, participants showed substantially enhanced garden path effects. No such interaction was observed when the critical words violated semantic expectancy or when the critical chords violated timbral expectancy. These results support a prediction of the shared syntactic integration resource hypothesis (Patel, 2003), which suggests that music and language draw on a common pool of limited processing resources for integrating incoming elements into syntactic structures. Notations of the stimuli from this study may be downloaded from pbr.psychonomic-journals.org/content/supplemental.
Cryo-EM Structure Determination Using Segmented Helical Image Reconstruction.
Fromm, S A; Sachse, C
2016-01-01
Treating helices as single-particle-like segments followed by helical image reconstruction has become the method of choice for high-resolution structure determination of well-ordered helical viruses as well as flexible filaments. In this review, we will illustrate how the combination of the latest hardware developments with optimized image processing routines has led to a series of near-atomic resolution structures of helical assemblies. Originally, the treatment of helices as a sequence of segments followed by Fourier-Bessel reconstruction revealed the potential to determine near-atomic resolution structures from helical specimens. In the meantime, real-space image processing of helices in a stack of single particles was developed and enabled the structure determination of specimens that resisted classical Fourier helical reconstruction and also facilitated high-resolution structure determination. Despite the progress in real-space analysis, the combination of Fourier and real-space processing is still commonly used to better estimate the symmetry parameters, as the imposition of the correct helical symmetry is essential for high-resolution structure determination. Recent hardware advances, notably the introduction of direct electron detectors, have significantly enhanced the image quality and, together with improved image processing procedures, have made segmented helical reconstruction a very productive cryo-EM structure determination method. © 2016 Elsevier Inc. All rights reserved.
GDAP: a web tool for genome-wide protein disulfide bond prediction.
O'Connor, Brian D; Yeates, Todd O
2004-07-01
The Genomic Disulfide Analysis Program (GDAP) provides web access to computationally predicted protein disulfide bonds for over one hundred microbial genomes, including both bacterial and archaeal species. In the GDAP process, sequences of unknown structure are mapped, when possible, to known homologous Protein Data Bank (PDB) structures, after which specific distance criteria are applied to predict disulfide bonds. GDAP also accepts user-supplied protein sequences and subsequently queries the PDB sequence database for the best matches, scans for possible disulfide bonds and returns the results to the client. These predictions are useful for a variety of applications and have previously been used to show a dramatic preference in certain thermophilic archaea and bacteria for disulfide bonds within intracellular proteins. Given the central role these stabilizing, covalent bonds play in such organisms, the predictions available from GDAP provide a rich data source for designing site-directed mutants with more stable thermal profiles. The GDAP web application is a gateway to this information and can be used to understand the role disulfide bonds play in protein stability both in these unusual organisms and in sequences of interest to the individual researcher. The prediction server can be accessed at http://www.doe-mbi.ucla.edu/Services/GDAP.
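The abstract does not state which distance criteria GDAP applies, so the following is only a sketch of the general idea: once cysteine positions have been mapped onto a homologous structure, pairs whose side chains lie close enough together are flagged as candidate disulfides. The use of Cβ coordinates, the 4.5 Å cutoff, and the function name are all assumptions made for illustration, not GDAP's actual parameters.

```python
import math
from itertools import combinations


def candidate_disulfides(cys_cb_coords, max_dist=4.5):
    """Return cysteine pairs whose C-beta atoms lie within max_dist angstroms.

    cys_cb_coords maps residue number -> (x, y, z) C-beta coordinates taken
    from the mapped template structure. The cutoff is a hypothetical value.
    """
    pairs = []
    for (res_i, xyz_i), (res_j, xyz_j) in combinations(sorted(cys_cb_coords.items()), 2):
        dist = math.dist(xyz_i, xyz_j)   # Euclidean distance (Python 3.8+)
        if dist <= max_dist:
            pairs.append((res_i, res_j, round(dist, 2)))
    return pairs


# Hypothetical coordinates for three cysteines.
cysteines = {5: (0.0, 0.0, 0.0), 20: (3.8, 0.0, 0.0), 40: (10.0, 0.0, 0.0)}
print(candidate_disulfides(cysteines))
```

A real pipeline would also need the sequence-to-structure alignment step and would likely check geometry beyond a single distance.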
Bystrykh, L V; Vonck, J; van Bruggen, E F; van Beeumen, J; Samyn, B; Govorukhina, N I; Arfman, N; Duine, J A; Dijkhuizen, L
1993-01-01
The quaternary protein structure of two methanol:N,N'-dimethyl-4-nitrosoaniline (NDMA) oxidoreductases purified from Amycolatopsis methanolica and Mycobacterium gastri MB19 was analyzed by electron microscopy and image processing. The enzymes are decameric proteins (displaying fivefold symmetry) with estimated molecular masses of 490 to 500 kDa based on their subunit molecular masses of 49 to 50 kDa. Both methanol:NDMA oxidoreductases possess a tightly but noncovalently bound NADP(H) cofactor at an NADPH-to-subunit molar ratio of 0.7. These cofactors are redox active toward alcohol and aldehyde substrates. Both enzymes contain significant amounts of Zn2+ and Mg2+ ions. The primary amino acid sequences of the A. methanolica and M. gastri MB19 methanol:NDMA oxidoreductases share a high degree of identity, as indicated by N-terminal sequence analysis (63% identity among the first 27 N-terminal amino acids), internal peptide sequence analysis, and overall amino acid composition. The amino acid sequence analysis also revealed significant similarity to a decameric methanol dehydrogenase of Bacillus methanolicus C1. Images PMID:8449887
System for plotting subsoil structure and method therefor
NASA Technical Reports Server (NTRS)
Narasimhan, K. Y.; Nathan, R.; Parthasarathy, S. P. (Inventor)
1980-01-01
Data for use in producing a tomograph of subsoil structure between boreholes is derived by placing spaced geophones in one borehole, and on the Earth's surface if desired, and by producing a sequence of shots at spaced apart locations in the other borehole. The signals, detected by each of the geophones from the various shots, are processed either on a time of arrival basis, or on the basis of signal amplitude, to provide information of the characteristics of a large number of incremental areas between the boreholes. Such information is useable to produce a tomograph of the subsoil structure between the boreholes. By processing signals of relatively high frequencies, e.g., up to 100 Hz, and by closely spacing the geophones, a high resolution tomograph can be produced.
Iterative refinement of structure-based sequence alignments by Seed Extension
Kim, Changhoon; Tai, Chin-Hsien; Lee, Byungkook
2009-01-01
Background Accurate sequence alignment is required in many bioinformatics applications but, when sequence similarity is low, it is difficult to obtain accurate alignments based on sequence similarity alone. The accuracy improves when the structures are available, but current structure-based sequence alignment procedures still mis-align substantial numbers of residues. In order to correct such errors, we previously explored the possibility of replacing the residue-based dynamic programming algorithm in structure alignment procedures with the Seed Extension algorithm, which does not use a gap penalty. Here, we describe a new procedure called RSE (Refinement with Seed Extension) that iteratively refines a structure-based sequence alignment. Results RSE uses SE (Seed Extension) in its core, which is an algorithm that we reported recently for obtaining a sequence alignment from two superimposed structures. The RSE procedure was evaluated by comparing the correctly aligned fractions of residues before and after the refinement of the structure-based sequence alignments produced by popular programs. CE, DaliLite, FAST, LOCK2, MATRAS, MATT, TM-align, SHEBA and VAST were included in this analysis and the NCBI's CDD root node set was used as the reference alignments. RSE improved the average accuracy of sequence alignments for all programs tested when no shift error was allowed. The amount of improvement varied depending on the program. The average improvements were small for DaliLite and MATRAS but about 5% for CE and VAST. More substantial improvements have been seen in many individual cases. The additional computation times required for the refinements were negligible compared to the times taken by the structure alignment programs. Conclusion RSE is a computationally inexpensive way of improving the accuracy of a structure-based sequence alignment. 
It can be used as a standalone procedure following a regular structure-based sequence alignment or to replace the traditional iterative refinement procedures based on residue-level dynamic programming algorithm in many structure alignment programs. PMID:19589133
Dykeman, Eric C; Stockley, Peter G; Twarock, Reidun
2013-09-09
The current paradigm for assembly of single-stranded RNA viruses is based on a mechanism involving non-sequence-specific packaging of genomic RNA driven by electrostatic interactions. Recent experiments, however, provide compelling evidence for sequence specificity in this process both in vitro and in vivo. The existence of multiple RNA packaging signals (PSs) within viral genomes has been proposed, which facilitates assembly by binding coat proteins in such a way that they promote the protein-protein contacts needed to build the capsid. The binding energy from these interactions enables the confinement or compaction of the genomic RNAs. Identifying the nature of such PSs is crucial for a full understanding of assembly, which is an as yet untapped potential drug target for this important class of pathogens. Here, for two related bacterial viruses, we determine the sequences and locations of their PSs using Hamiltonian paths, a concept from graph theory, in combination with bioinformatics and structural studies. Their PSs have a common secondary structure motif but distinct consensus sequences and positions within the respective genomes. Despite these differences, the distributions of PSs in both viruses imply defined conformations for the packaged RNA genomes in contact with the protein shell in the capsid, consistent with a recent asymmetric structure determination of the MS2 virion. The PS distributions identified moreover imply a preferred, evolutionarily conserved assembly pathway with respect to the RNA sequence with potentially profound implications for other single-stranded RNA viruses known to have RNA PSs, including many animal and human pathogens. Copyright © 2013 Elsevier Ltd. All rights reserved.
SeqRate: sequence-based protein folding type classification and rates prediction
2010-01-01
Background Protein folding rate is an important property of a protein. Predicting protein folding rate is useful for understanding protein folding process and guiding protein design. Most previous methods of predicting protein folding rate require the tertiary structure of a protein as an input. And most methods do not distinguish the different kinetic nature (two-state folding or multi-state folding) of the proteins. Here we developed a method, SeqRate, to predict both protein folding kinetic type (two-state versus multi-state) and real-value folding rate using sequence length, amino acid composition, contact order, contact number, and secondary structure information predicted from only protein sequence with support vector machines. Results We systematically studied the contributions of individual features to folding rate prediction. On a standard benchmark dataset, the accuracy of folding kinetic type classification is 80%. The Pearson correlation coefficient and the mean absolute difference between predicted and experimental folding rates (sec-1) in the base-10 logarithmic scale are 0.81 and 0.79 for two-state protein folders, and 0.80 and 0.68 for three-state protein folders. SeqRate is the first sequence-based method for protein folding type classification and its accuracy of fold rate prediction is improved over previous sequence-based methods. Its performance can be further enhanced with additional information, such as structure-based geometric contacts, as inputs. Conclusions Both the web server and software of predicting folding rate are publicly available at http://casp.rnet.missouri.edu/fold_rate/index.html. PMID:20438647
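The two figures of merit quoted above, the Pearson correlation coefficient and the mean absolute difference on a base-10 logarithmic scale, can be computed as in the sketch below. The function names and the example rates are hypothetical and are not data from the SeqRate benchmark.

```python
import math


def log10_rates(rates):
    """Convert folding rates (in s^-1) to the base-10 logarithmic scale."""
    return [math.log10(r) for r in rates]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_abs_diff(x, y):
    """Mean absolute difference between paired values."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Hypothetical predicted and experimental folding rates in s^-1.
predicted = log10_rates([10.0, 100.0, 1000.0])
observed = log10_rates([100.0, 1000.0, 10000.0])
print(pearson(predicted, observed), mean_abs_diff(predicted, observed))
```

Note that the two measures capture different things: in this toy case the predictions are perfectly correlated with the observations yet are off by a full order of magnitude on average.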
Predicting residue-wise contact orders in proteins by support vector regression.
Song, Jiangning; Burrage, Kevin
2006-10-03
The residue-wise contact order (RWCO) describes the sequence separations between the residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. 
Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance, raising the CC to 0.57 with an RMSE of 0.79. In addition, incorporating the secondary structure predicted by PSIPRED was found to significantly improve the prediction performance and yielded the best prediction accuracy, with a CC of 0.60 and RMSE of 0.78, which is at least comparable to the performance of other existing methods. The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the view that support vector regression is a powerful tool for extracting the protein sequence-structure relationship and for estimating protein structural profiles from amino acid sequences.
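The quantities named in this abstract can be illustrated with a minimal sketch. It assumes RWCO for residue i is the sum of sequence separations |i - j| over all residues j in spatial contact with i, divided by the chain length. The 12 Å cutoff, the minimum separation of 3, the hard (non-smoothed) contact test, and all function names are illustrative assumptions, not settings taken from the paper.

```python
import math


def rwco(coords, cutoff=12.0, min_sep=3):
    """Residue-wise contact order for every residue of one chain.

    coords  : list of (x, y, z) tuples, one representative atom per residue.
    cutoff  : contact distance in angstroms (assumed value).
    min_sep : smallest sequence separation counted as a contact (assumed value).
    """
    n = len(coords)
    values = []
    for i in range(n):
        total = 0
        for j in range(n):
            sep = abs(i - j)
            if sep < min_sep:
                continue  # skip near neighbours along the chain
            if math.dist(coords[i], coords[j]) < cutoff:
                total += sep  # a contact contributes its sequence separation
        values.append(total / n)  # normalise by chain length
    return values


def pearson_cc(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def rmse(xs, ys):
    """Root mean square error between predicted and observed values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))
```

Given observed values from `rwco` and predictions from any regressor, `pearson_cc` and `rmse` give the two figures of merit the abstract reports (CC and RMSE).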
Ghosh, Jayadri Sekhar; Bhattacharya, Samik; Pal, Amita
2017-06-01
The unavailability of the reproductive structure and unpredictability of vegetative characters for the identification and phylogenetic study of bamboo prompted the application of molecular techniques for greater resolution and consensus. We first employed internal transcribed spacer (ITS1, 5.8S rRNA and ITS2) sequences to construct the phylogenetic tree of 21 tropical bamboo species. While the sequence alone could grossly reconstruct the traditional phylogeny amongst the 21-tropical species studied, some anomalies were encountered that prompted a further refinement of the phylogenetic analyses. Therefore, we integrated the secondary structure of the ITS sequences to derive individual sequence-structure matrix to gain more resolution on the phylogenetic reconstruction. The results showed that ITS sequence-structure is the reliable alternative to the conventional phenotypic method for the identification of bamboo species. The best-fit topology obtained by the sequence-structure based phylogeny over the sole sequence based one underscores closer clustering of all the studied Bambusa species (Sub-tribe Bambusinae), while Melocanna baccifera, which belongs to Sub-Tribe Melocanneae, disjointedly clustered as an out-group within the consensus phylogenetic tree. In this study, we demonstrated the dependability of the combined (ITS sequence+structure-based) approach over the only sequence-based analysis for phylogenetic relationship assessment of bamboo.
nextPARS: parallel probing of RNA structures in Illumina
Saus, Ester; Willis, Jesse R.; Pryszcz, Leszek P.; Hafez, Ahmed; Llorens, Carlos; Himmelbauer, Heinz
2018-01-01
RNA molecules play important roles in virtually every cellular process. These functions are often mediated through the adoption of specific structures that enable RNAs to interact with other molecules. Thus, determining the secondary structures of RNAs is central to understanding their function and evolution. In recent years several sequencing-based approaches have been developed that allow probing structural features of thousands of RNA molecules present in a sample. Here, we describe nextPARS, a novel Illumina-based implementation of in vitro parallel probing of RNA structures. Our approach achieves comparable accuracy to previous implementations, while enabling higher throughput and sample multiplexing. PMID:29358234
Structural basis of viral invasion: lessons from paramyxovirus F
Lamb, Robert A.; Jardetzky, Theodore S.
2007-01-01
The structures of glycoproteins that mediate enveloped virus entry into cells have revealed dramatic structural changes that accompany membrane fusion and provided mechanistic insights into this process. The group of class I viral fusion proteins includes the influenza hemagglutinin, paramyxovirus F, HIV env and other mechanistically related fusogens, but these proteins are unrelated in sequence and exhibit clearly distinct structural features. Recently determined crystal structures of the paramyxovirus F protein in two conformations, representing prefusion and postfusion states, reveal a novel protein architecture that undergoes large-scale, irreversible refolding during membrane fusion, extending our understanding of this diverse group of membrane fusion machines. PMID:17870467
Mustafa, Farah; Vivet-Boudou, Valérie; Jabeen, Ayesha; Ali, Lizna M; Kalloush, Rawan M; Marquet, Roland; Rizvi, Tahir A
2018-06-21
Packaging the mouse mammary tumor virus (MMTV) genomic RNA (gRNA) requires the entire 5' untranslated region (UTR) in conjunction with the first 120 nucleotides of the gag gene. This region includes several palindromic (pal) sequence(s) and stable stem loops (SLs). Among these, stem loop 4 (SL4) adopts a bifurcated structure consisting of three stems, two apical loops, and an internal loop. Pal II, located in one of the apical loops, mediates gRNA dimerization, a process intricately linked to packaging. We thus hypothesized that the bifurcated SL4 structure could constitute the major gRNA packaging determinant. To test this hypothesis, the two apical loops and the flanking sequences forming the bifurcated SL4 were individually mutated. These mutations all had deleterious effects on gRNA packaging and propagation. Next, single and compensatory mutants were designed to destabilize then recreate the bifurcated SL4 structure. A structure-function analysis using bioinformatics predictions and RNA chemical probing revealed that mutations that led to the loss of the SL4 bifurcated structure abrogated RNA packaging and propagation, while compensatory mutations that recreated the native SL4 structure restored RNA packaging and propagation to wild type levels. Altogether, our results demonstrate that SL4 constitutes the principal packaging determinant of MMTV gRNA. Our findings further suggest that SL4 acts as a structural switch that can not only differentiate between RNA for translation versus packaging/dimerization, but its location also allows differentiation between spliced and unspliced RNAs during gRNA encapsidation.
Cloning and bioinformatic analysis of lovastatin biosynthesis regulatory gene lovE.
Huang, Xin; Li, Hao-ming
2009-08-05
Lovastatin is an effective drug for the treatment of hyperlipidemia. This study aimed to clone the lovastatin biosynthesis regulatory gene lovE and to analyze the structure and function of its encoded protein. Primers were designed from the lovastatin synthase gene sequence in GenBank to amplify and clone lovE from Aspergillus terreus genomic DNA. Bioinformatic analysis of lovE and its encoded amino acid sequence was performed using internet resources and software such as DNAMAN. The target fragment lovE, almost 1500 bp in length, was amplified from Aspergillus terreus genomic DNA, and the secondary and three-dimensional structures of the LovE protein were predicted. In lovastatin biosynthesis, lovE is a regulatory gene and the LovE protein is a GAL4-like transcription factor.
Jang, Bora; Kim, Boyoung; Kim, Hyunsook; Kwon, Hyokyoung; Kim, Minjeong; Seo, Yunmi; Colas, Marion; Jeong, Hansaem; Jeong, Eun Hye; Lee, Kyuri; Lee, Hyukjin
2018-06-08
Enzymatic synthesis of RNA nanostructures is achieved by isothermal rolling circle transcription (RCT). Each arm of RNA nanostructures provides a functional role of Dicer substrate RNA inducing sequence specific RNA interference (RNAi). Three different RNAi sequences (GFP, RFP, and BFP) are incorporated within the three-arm junction RNA nanostructures (Y-RNA). The template and helper DNA strands are designed for the large-scale in vitro synthesis of RNA strands to prepare self-assembled Y-RNA. Interestingly, Dicer processing of Y-RNA is highly influenced by its physical structure and different gene silencing activity is achieved depending on its arm length and overhang. In addition, enzymatic synthesis allows the preparation of various Y-RNA structures using a single DNA template offering on demand regulation of multiple target genes.
Overcoming Sequence Misalignments with Weighted Structural Superposition
Khazanov, Nickolay A.; Damm-Ganamet, Kelly L.; Quang, Daniel X.; Carlson, Heather A.
2012-01-01
An appropriate structural superposition identifies similarities and differences between homologous proteins that are not evident from sequence alignments alone. We have coupled our Gaussian-weighted RMSD (wRMSD) tool with a sequence aligner and seed extension (SE) algorithm to create a robust technique for overlaying structures and aligning sequences of homologous proteins (HwRMSD). HwRMSD overcomes errors in the initial sequence alignment that would normally propagate into a standard RMSD overlay. SE can generate a corrected sequence alignment from the improved structural superposition obtained by wRMSD. HwRMSD’s robust performance and its superiority over standard RMSD are demonstrated over a range of homologous proteins. Its better overlay results in corrected sequence alignments with good agreement to HOMSTRAD. Finally, HwRMSD is compared to established structural alignment methods: FATCAT, SSM, CE, and Dalilite. Most methods are comparable at placing residue pairs within 2 Å, but HwRMSD places many more residue pairs within 1 Å, providing a clear advantage. Such high accuracy is essential in drug design, where small distances can have a large impact on computational predictions. This level of accuracy is also needed to correct sequence alignments in an automated fashion, especially for omics-scale analysis. HwRMSD can align homologs with low sequence identity and large conformational differences, cases where both sequence-based and structural-based methods may fail. The HwRMSD pipeline overcomes the dependency of structural overlays on initial sequence pairing and removes the need to determine the best sequence-alignment method, substitution matrix, and gap parameters for each unique pair of homologs. PMID:22733542
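The effect of Gaussian weighting can be seen in a minimal sketch. It assumes each paired atom receives a weight exp(-d²/c), where d is the distance between the paired atoms and c is a scaling constant. The value c = 5.0 Å² and the function name are assumptions made for illustration. The sketch evaluates one fixed overlay only and does not perform the superposition or the sequence re-alignment that HwRMSD carries out.

```python
import math


def gaussian_weighted_rmsd(a, b, c=5.0):
    """Return (standard RMSD, Gaussian-weighted RMSD) for paired atoms.

    a, b : equal-length lists of (x, y, z) tuples, already overlaid.
    c    : scaling constant in square angstroms (assumed value).
    """
    # squared distance between each pair of matched atoms
    d2 = [sum((p - q) ** 2 for p, q in zip(u, v)) for u, v in zip(a, b)]
    # pairs that sit far apart receive exponentially small weights
    w = [math.exp(-x / c) for x in d2]
    rmsd = math.sqrt(sum(d2) / len(d2))
    wrmsd = math.sqrt(sum(wi * x for wi, x in zip(w, d2)) / sum(w))
    return rmsd, wrmsd
```

With one badly misplaced pair among otherwise identical atoms, the standard RMSD is dominated by the outlier while the weighted value stays near zero, which is why a weighted overlay is less sensitive to errors in the initial pairing.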
DNA secondary structures: stability and function of G-quadruplex structures
Bochman, Matthew L.; Paeschke, Katrin; Zakian, Virginia A.
2013-01-01
In addition to the canonical double helix, DNA can fold into various other inter- and intramolecular secondary structures. Although many such structures were long thought to be in vitro artefacts, bioinformatics demonstrates that DNA sequences capable of forming these structures are conserved throughout evolution, suggesting the existence of non-B-form DNA in vivo. In addition, genes whose products promote formation or resolution of these structures are found in diverse organisms, and a growing body of work suggests that the resolution of DNA secondary structures is critical for genome integrity. This Review focuses on emerging evidence relating to the characteristics of G-quadruplex structures and the possible influence of such structures on genomic stability and cellular processes, such as transcription. PMID:23032257
Inverted temperature sequences: role of deformation partitioning
NASA Astrophysics Data System (ADS)
Grujic, D.; Ashley, K. T.; Coble, M. A.; Coutand, I.; Kellett, D.; Whynot, N.
2015-12-01
The inverted metamorphism associated with the Main Central thrust zone in the Himalaya has been historically attributed to a number of tectonic processes. Here we show that there is actually a composite peak and deformation temperature sequence that formed in succession via different tectonic processes. The deformation partitioning seems to have played a key role, and the magnitude of each process has varied along strike of the orogen. To explain the formation of the inverted metamorphic sequence across the Lesser Himalayan Sequence (LHS) in eastern Bhutan, we used Raman spectroscopy of carbonaceous material (RSCM) to determine the peak metamorphic temperatures and Ti-in-quartz thermobarometry to determine the deformation temperatures combined with thermochronology including published apatite and zircon U-Th/He and fission-track data and new 40Ar/39Ar dating of muscovite. The dataset was inverted using 3D-thermal-kinematic modeling to constrain the ranges of geological parameters such as fault geometry and slip rates, location and rates of localized basal accretion, and thermal properties of the crust. RSCM results indicate that there are two peak temperature sequences separated by a major thrust within the LHS. The internal temperature sequence shows an inverted peak temperature gradient of 12 °C/km; in the external (southern) sequence, the peak temperatures are constant across the structural sequence. Thermo-kinematic modeling suggests that the thermochronologic and thermobarometric data are compatible with a two-stage scenario: an Early-Middle Miocene phase of fast overthrusting of a hot hanging wall over a downgoing footwall and inversion of the synkinematic isotherms, followed by the formation of the external duplex developed by dominant underthrusting and basal accretion.
To reconcile our observations with the experimental data, we suggest that pervasive ductile deformation within the upper LHS and along the Main Central thrust zone at its top stopped at ~11 Ma at which time the deformation shifted and focused within the external duplex and the Main Boundary Thrust.
NASA Astrophysics Data System (ADS)
Walker, David Lee
1999-12-01
This study uses dynamical analysis to examine in a quantitative fashion the information coding mechanism in DNA sequences. This exceeds the simple dichotomy of either modeling the mechanism by comparing DNA sequence walks as Fractal Brownian Motion (fbm) processes. The 2-D mappings of the DNA sequences for this research are from Iterated Function System (IFS) (also known as the "Chaos Game Representation" (CGR)) mappings of the DNA sequences. This technique converts a 1-D sequence into a 2-D representation that preserves subsequence structure and provides a visual representation. The second step of this analysis involves the application of Wavelet Packet Transforms, a recently developed technique from the field of signal processing. A multi-fractal model is built by using wavelet transforms to estimate the Hurst exponent, H. The Hurst exponent is a non-parametric measurement of the dynamism of a system. This procedure is used to evaluate gene-coding events in the DNA sequence of cystic fibrosis mutations. The H exponent is calculated for various mutation sites in this gene. The results of this study indicate the presence of anti-persistent, random walks and persistent "sub-periods" in the sequence. This indicates the hypothesis of a multi-fractal model of DNA information encoding warrants further consideration. This work examines the model's behavior in both pathological (mutations) and non-pathological (healthy) base pair sequences of the cystic fibrosis gene. These mutations both natural and synthetic were introduced by computer manipulation of the original base pair text files. The results show that disease severity and system "information dynamics" correlate. These results have implications for genetic engineering as well as in mathematical biology. They suggest that there is scope for more multi-fractal models to be developed.
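The 1-D to 2-D mapping described here can be sketched in a few lines. In this minimal version each base is assigned a corner of the unit square, and every step moves halfway from the current point toward the corner of the next base. The corner assignment and the function name are assumptions, since conventions differ and the abstract does not state which one the study used. The wavelet and Hurst-exponent steps are not covered.

```python
# Assumed corner convention; other assignments are also in use.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game(seq):
    """Map a DNA string to a list of 2-D points in the unit square."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguity codes such as N
        cx, cy = CORNERS[base]
        # move halfway toward the corner that belongs to this base
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points
```

Because each step halves the distance to a corner, the sub-square in which a point lands is determined by the most recent bases, which is the sense in which the mapping preserves subsequence structure.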
Nucleotide sequence of the gag gene and gag-pol junction of feline leukemia virus.
Laprevotte, I; Hampe, A; Sherr, C J; Galibert, F
1984-01-01
The nucleotide sequence of the gag gene of feline leukemia virus and its flanking sequences were determined and compared with the corresponding sequences of two strains of feline sarcoma virus and with that of the Moloney strain of murine leukemia virus. A high degree of nucleotide sequence homology between the feline leukemia virus and murine leukemia virus gag genes was observed, suggesting that retroviruses of domestic cats and laboratory mice have a common, proximal evolutionary progenitor. The predicted structure of the complete feline leukemia virus gag gene precursor suggests that the translation of nonglycosylated and glycosylated gag gene polypeptides is initiated at two different AUG codons. These initiator codons fall in the same reading frame and are separated by a 222-base-pair segment which encodes an amino terminal signal peptide. The nucleotide sequence predicts the order of amino acids in each of the individual gag-coded proteins (p15, p12, p30, p10), all of which derive from the gag gene precursor. Stable stem-and-loop secondary structures are proposed for two regions of viral RNA. The first falls within sequences at the 5' end of the viral genome, together with adjacent palindromic sequences which may play a role in dimer linkage of RNA subunits. The second includes coding sequences at the gag-pol junction and is proposed to be involved in translation of the pol gene product. Sequence analysis of the latter region shows that the gag and pol genes are translated in different reading frames. Classical consensus splice donor and acceptor sequences could not be localized to regions which would permit synthesis of the expected gag-pol precursor protein. Alternatively, we suggest that the pol gene product (RNA-dependent DNA polymerase) could be translated by a frameshift suppressing mechanism which could involve cleavage modification of stems and loops in a manner similar to that observed in tRNA processing. PMID:6328019
Harrigan, Robert L; Smith, Alex K; Mawn, Louise A; Smith, Seth A; Landman, Bennett A
2016-02-27
The optic nerve (ON) plays a crucial role in human vision transporting all visual information from the retina to the brain for higher order processing. There are many diseases that affect the ON structure such as optic neuritis, anterior ischemic optic neuropathy and multiple sclerosis. Because the ON is the sole pathway for visual information from the retina to areas of higher level processing, measures of ON damage have been shown to correlate well with visual deficits. Increased intracranial pressure has been shown to correlate with the size of the cerebrospinal fluid (CSF) surrounding the ON. These measures are generally taken at an arbitrary point along the nerve and do not account for changes along the length of the ON. We propose a high contrast and high-resolution 3-D acquired isotropic imaging sequence optimized for ON imaging. We have acquired scan-rescan data using the optimized sequence and a current standard of care protocol for 10 subjects. We show that this sequence has superior contrast-to-noise ratio to the current standard of care while achieving a factor of 11 higher resolution. We apply a previously published automatic pipeline to segment the ON and CSF sheath and measure the size of each individually. We show that these measures of ON size have lower short-term reproducibility than the population variance and the variability along the length of the nerve. We find that the proposed imaging protocol is (1) useful in detecting population differences and local changes and (2) a promising tool for investigating biomarkers related to structural changes of the ON.
Morgan, Alexander A.; Rubenstein, Edward
2013-01-01
Proline is an anomalous amino acid. Its nitrogen atom is covalently locked within a ring, thus it is the only proteinogenic amino acid with a constrained phi angle. Sequences of three consecutive prolines can fold into polyproline helices, structures that join alpha helices and beta pleats as architectural motifs in protein configuration. Triproline helices are participants in protein-protein signaling interactions. Longer spans of repeat prolines also occur, containing as many as 27 consecutive proline residues. Little is known about the frequency, positioning, and functional significance of these proline sequences. Therefore we have undertaken a systematic bioinformatics study of proline residues in proteins. We analyzed the distribution and frequency of 687,434 proline residues among 18,666 human proteins, identifying single residues, dimers, trimers, and longer repeats. Proline accounts for 6.3% of the 10,882,808 protein amino acids. Of all proline residues, 4.4% are in trimers or longer spans. We detected patterns that influence function based on proline location, spacing, and concentration. We propose a classification based on proline-rich, polyproline-rich, and proline-poor status. Whereas singlet proline residues are often found in proteins that display recurring architectural patterns, trimers or longer proline sequences tend be associated with the absence of repetitive structural motifs. Spans of 6 or more are associated with DNA/RNA processing, actin, and developmental processes. We also suggest a role for proline in Kruppel-type zinc finger protein control of DNA expression, and in the nucleation and translocation of actin by the formin complex. PMID:23372670
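The kind of tally described in this abstract (singlets, dimers, trimers, and longer repeats) can be sketched as follows. The function names and the example sequence are hypothetical, and the study's own pipeline is not described in enough detail to reproduce it.

```python
import re
from collections import Counter


def proline_runs(seq):
    """Count runs of consecutive prolines: run length -> number of runs."""
    return Counter(len(m.group()) for m in re.finditer(r"P+", seq.upper()))


def fraction_in_long_runs(seq, min_len=3):
    """Fraction of all proline residues that sit in runs of min_len or more."""
    runs = proline_runs(seq)
    total = sum(length * count for length, count in runs.items())
    if total == 0:
        return 0.0
    in_long = sum(length * count for length, count in runs.items() if length >= min_len)
    return in_long / total


# Consistency check on the figures quoted in the abstract:
# 687,434 prolines out of 10,882,808 residues is about 6.3%.
proline_percent = round(100 * 687434 / 10882808, 1)
```

Applied to a whole proteome, `fraction_in_long_runs` with `min_len=3` corresponds to the abstract's statistic on prolines in trimers or longer spans.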
NASA Astrophysics Data System (ADS)
Harrigan, Robert L.; Smith, Alex K.; Mawn, Louise A.; Smith, Seth A.; Landman, Bennett A.
2016-03-01
The optic nerve (ON) plays a crucial role in human vision transporting all visual information from the retina to the brain for higher order processing. There are many diseases that affect the ON structure such as optic neuritis, anterior ischemic optic neuropathy and multiple sclerosis. Because the ON is the sole pathway for visual information from the retina to areas of higher level processing, measures of ON damage have been shown to correlate well with visual deficits. Increased intracranial pressure has been shown to correlate with the size of the cerebrospinal fluid (CSF) surrounding the ON. These measures are generally taken at an arbitrary point along the nerve and do not account for changes along the length of the ON. We propose a high contrast and high-resolution 3-D acquired isotropic imaging sequence optimized for ON imaging. We have acquired scan-rescan data using the optimized sequence and a current standard of care protocol for 10 subjects. We show that this sequence has superior contrast-to-noise ratio to the current standard of care while achieving a factor of 11 higher resolution. We apply a previously published automatic pipeline to segment the ON and CSF sheath and measure the size of each individually. We show that these measures of ON size have lower short-term reproducibility than the population variance and the variability along the length of the nerve. We find that the proposed imaging protocol is (1) useful in detecting population differences and local changes and (2) a promising tool for investigating biomarkers related to structural changes of the ON.
NASA Astrophysics Data System (ADS)
Omar, M. A.; Parvataneni, R.; Zhou, Y.
2010-09-01
This manuscript describes the implementation of a two-step processing procedure composed of self-referencing and Principal Component Thermography (PCT). The combined approach enables the processing of thermograms from transient (flash), steady (halogen) and selective (induction) thermal perturbations. First, the research discusses the three basic processing schemes typically applied in thermography, namely mathematical-transformation-based processing, curve-fitting processing, and direct contrast-based calculations. The proposed algorithm uses the self-referencing scheme to create a sub-sequence that contains the maximum contrast information and also to compute the anomalies' depth values. The Principal Component Thermography step then operates on the sub-sequence frames, re-arranging their data content (pixel values) spatially and temporally to highlight the data variance. The PCT is mainly used as a mathematical means of enhancing the defects' contrast, thus enabling retrieval of their shape and size. The results show that the proposed combined scheme is effective in processing multiple size defects in sandwich steel structure in real-time (<30 Hz) and with full spatial coverage, without the need for a priori defect-free area.
Bobrova, E V; Liakhovetskiĭ, V A; Borshchevskaia, E R
2011-01-01
The dependence of errors during reproduction of a sequence of hand movements without visual feedback on the previous right- and left-hand performance ("prehistory") and on positions in space of sequence elements (random or ordered by the explicit rule) was analyzed. It was shown that the preceding information about the ordered positions of the sequence elements was used during right-hand movements, whereas left-hand movements were performed with involvement of the information about the random sequence. The data testify to a central mechanism of the analysis of spatial structure of sequence elements. This mechanism activates movement coding specific for the left hemisphere (vector coding) in case of an ordered sequence structure and positional coding specific for the right hemisphere in case of a random sequence structure.
Fang, Jing; Nevin, Philip; Kairys, Visvaldas; Venclovas, Česlovas; Engen, John R; Beuning, Penny J
2014-04-08
The relationship between protein sequence, structure, and dynamics has been elusive. Here, we report a comprehensive analysis using an in-solution experimental approach to study how the conservation of tertiary structure correlates with protein dynamics. Hydrogen exchange measurements of eight processivity clamp proteins from different species revealed that, despite highly similar three-dimensional structures, clamp proteins display a wide range of dynamic behavior. Differences were apparent both for structurally similar domains within proteins and for corresponding domains of different proteins. Several of the clamps contained regions that underwent local unfolding with different half-lives. We also observed a conserved pattern of alternating dynamics of the α helices lining the inner pore of the clamps as well as a correlation between dynamics and the number of salt bridges in these α helices. Our observations reveal that tertiary structure and dynamics are not directly correlated and that primary structure plays an important role in dynamics.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mueser, Timothy C., E-mail: timothy.mueser@utoledo.edu; Griffith, Wendell P.; Kovalevsky, Andrey Y.
2010-11-01
X-ray and neutron diffraction studies of cyanomethemoglobin are being used to evaluate the structural waters within the dimer–dimer interface involved in quaternary-state transitions. Improvements in neutron diffraction instrumentation are affording the opportunity to re-examine the structures of vertebrate hemoglobins and to interrogate proton and solvent position changes between the different quaternary states of the protein. For hemoglobins of unknown primary sequence, structural studies of cyanomethemoglobin (CNmetHb) are being used to help to resolve sequence ambiguity in the mass spectra. These studies have also provided additional structural evidence for the involvement of oxidized hemoglobin in the process of erythrocyte senescence. X-ray crystal studies of Tibetan snow leopard CNmetHb have shown that this protein crystallizes in the B state, a structure with a more open dyad, which possibly has relevance to RBC band 3 protein binding and erythrocyte senescence. R-state equine CNmetHb crystal studies elaborate the solvent differences in the switch and hinge region compared with a human deoxyhemoglobin T-state neutron structure. Lastly, comparison of histidine protonation between the T and R state should enumerate the Bohr-effect protons.
Use of designed sequences in protein structure recognition.
Kumar, Gayatri; Mudgal, Richa; Srinivasan, Narayanaswamy; Sandhya, Sankaran
2018-05-09
Knowledge of the protein structure is a pre-requisite for improved understanding of molecular function. The gap in the sequence-structure space has increased in the post-genomic era. Grouping related protein sequences into families can aid in narrowing the gap. In the Pfam database, structure description is provided for part or full-length proteins of 7726 families. For the remaining 52% of the families, information on 3-D structure is not yet available. We use the computationally designed sequences that are intermediately related to two protein domain families, which are already known to share the same fold. These strategically designed sequences enable detection of distant relationships and here, we have employed them for the purpose of structure recognition of protein families of yet unknown structure. We first measured the success rate of our approach using a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of yet unknown structure, we made structural assignments for part/full length of the proteins. Fold association for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation. The results indicate that knowledge-based filling of gaps in protein sequence space is a lucrative approach for structure recognition. Such sequences assist in traversal through protein sequence space and effectively function as 'linkers', where natural linkers between distant proteins are unavailable. This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chain, P; Garcia, E
2003-02-06
The goal of this proposed effort was to assess the difficulty in identifying and characterizing virulence candidate genes in an organism for which very limited data exists. This was accomplished by first addressing the finishing phase of draft-sequenced F. tularensis genomes and conducting comparative analyses to determine the coding potential of each genome; to discover the differences in genome structure and content, and to identify potential genes whose products may be involved in the F. tularensis virulence process. The project was divided into three parts: (1) Genome finishing: This part involves determining the order and orientation of the consensus sequences of contigs obtained from Phrap assemblies of random draft genomic sequences. This tedious process consists of linking contig ends using information embedded in each sequence file that relates the sequence to the original cloned insert. Since inserts are sequenced from both ends, we can establish a link between these paired-ends in different contigs and thus order and orient contigs. Since these genomes carry numerous copies of insertion sequences, these repeated elements "confuse" the Phrap assembly program. It is thus necessary to break these contigs apart at the repeated sequences and individually join the proper flanking regions using paired-end information, or using results of comparisons against a similar genome. Larger repeated elements such as the small subunit ribosomal RNA operon require verification with PCR. Tandem repeats require manual intervention and typically rely on single nucleotide polymorphisms to be resolved. Remaining gaps require PCR reactions and sequencing. Once the genomes have been "closed", low quality regions are addressed by resequencing reactions. (2) Genome analysis: The final consensus sequences are processed by combining the results of three gene modelers: Glimmer, Critica and Generation.
The final gene models are submitted to a battery of homology searches and domain prediction programs in order to annotate them (e.g. BLAST, Pfam, TIGRfam, COG, KEGG, InterPro, TMhmm, SignalP). The genome structure is also assessed in terms of G+C content, GC bias (GC skew), and locations of repeated regions (e.g. IS elements) and phage-like genes. (3) Comparative genomics: The results of the various genome analyses are compared between the finished (or almost finished) genomes. Here, we have compared the F. tularensis genomes from the extremely lethal strain Schu4 (subsp. tularensis), the vaccine strain LVS (subsp. holarctica), and strain UT01-4992 of the less virulent, opportunistic subsp. novicida. Regions present in the highly virulent strain that are absent from the other less virulent strains may provide insight into what factors are required for the high level of virulence.
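The G+C content and GC skew assessment mentioned under genome analysis can be sketched per window, using the common definition of skew as (G - C) / (G + C). The window size, the non-overlapping default, and the function name are assumptions for illustration, since the report does not state the parameters it used.

```python
def gc_stats(seq, window=1000, step=None):
    """Return (start, G+C content, GC skew) for each window along seq.

    window : window length in bases (assumed value).
    step   : distance between window starts; defaults to non-overlapping.
    """
    seq = seq.upper()
    step = step or window
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_content = (g + c) / window
        # skew is undefined when the window holds no G or C; report 0.0
        skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, gc_content, skew))
    return results
```

Plotting the skew column against the start position is the usual way to look for the sign changes that accompany replication origin and terminus regions in bacterial chromosomes.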
Automatic processing of spoken dialogue in the home hemodialysis domain.
Lacson, Ronilda; Barzilay, Regina
2005-01-01
Spoken medical dialogue is a valuable source of information, and it forms a foundation for diagnosis, prevention and therapeutic management. However, understanding even a perfect transcript of spoken dialogue is challenging for humans because of the lack of structure and the verbosity of dialogues. This work presents a first step towards automatic analysis of spoken medical dialogue. The backbone of our approach is an abstraction of a dialogue into a sequence of semantic categories. This abstraction uncovers structure in informal, verbose conversation between a caregiver and a patient, thereby facilitating automatic processing of dialogue content. Our method induces this structure based on a range of linguistic and contextual features that are integrated in a supervised machine-learning framework. Our model has a classification accuracy of 73%, compared to 33% achieved by a majority baseline (p<0.01). This work demonstrates the feasibility of automatically processing spoken medical dialogue.
The importance of the specific Z-DNA structure and polyamines in carcinogenesis: fact or fiction.
Juranic, Z; Kidric, M; Tomin, R; Juranić, I; Spuzić, I; Petrović, J
1991-08-01
This work presents some aspects of carcinogenesis. The importance of the emergence of Z or H DNA structure in a gene, or in its flanking sequences, for gene deletion and unusual gene recombination is discussed. Some considerations on the role of selective pressure (of polyamines, of Mg2+, of the various levels of topoisomerase II, and of ATP) in the process of oncogene amplification are also given.
The correlation of fragmentation and structure of a protein
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Qinyuan; Cheng, Xueheng; Van Orden, S.
1995-12-31
Characterization of proteins of similar structures is important to understanding the biological function of the proteins and the processes with which they are involved. Cytochrome c variants typically have similar sequences, and have similar conformations in solution with almost identical absorption spectra and redox potentials. The authors chose cytochrome c's from bovine, tuna, rabbit and horse as a model system in studying large biomolecules using MS^n of multiply charged ions generated from electrospray ionization (ESI).
Schmidt, Christoph; Piper, Diana; Pester, Britta; Mierau, Andreas; Witte, Herbert
2018-05-01
Identification of module structure in brain functional networks is a promising way to obtain novel insights into neural information processing, as modules correspond to delineated brain regions in which interactions are strongly increased. Tracking of network modules in time-varying brain functional networks is not yet commonly considered in neuroscience despite its potential for gaining an understanding of the time evolution of functional interaction patterns and associated changing degrees of functional segregation and integration. We introduce a general computational framework for extracting consensus partitions from defined time windows in sequences of weighted directed edge-complete networks and show how the temporal reorganization of the module structure can be tracked and visualized. Part of the framework is a new approach for computing edge weight thresholds for individual networks based on multiobjective optimization of module structure quality criteria as well as an approach for matching modules across time steps. By testing our framework using synthetic network sequences and applying it to brain functional networks computed from electroencephalographic recordings of healthy subjects that were exposed to a major balance perturbation, we demonstrate the framework's potential for gaining meaningful insights into dynamic brain function in the form of evolving network modules. The precise chronology of the neural processing inferred with our framework and its interpretation helps to improve the currently incomplete understanding of the cortical contribution for the compensation of such balance perturbations.
NASA Astrophysics Data System (ADS)
Sarkar, R.; Das, P.; Basu Sarbadhikari, A.
2017-12-01
A 2 km thick layered sequence within the Noachian Terby crater (174 km diameter, 28.0°S - 74.0°E), located at the northern rim of Hellas basin, has been re-classified here into three major categories, i.e. mega-slump, debris flows, and turbidites, based on sedimentation process. A wide spectrum of deformation structures, such as large scale isoclinal moderately inclined folds, pinch and swells, disharmonic folds, sediment loading structures, normal faults and thrust duplexes, suggests that the amplitude of the syndepositional deformation spanned from hydroplastic to brittle domains. These structures provide ample evidence of sediment remobilization in Terby. The dominance of such mass-flow deposits in different stratigraphic horizons indicates that the basin was reactivated at frequent intervals during the filling process. However, an undeformed thinning-up sequence of beds, well exhibited at the basinal-lows, identified as ponded/confined turbidites, indicates that the basin experienced a stable bathymetric condition at the up-dip areas of the mega-slumps. An overall enrichment of phyllosilicates and scarcity of large boulders at the basin margins indicates that the provenance materials were deposited under stable and low-energy conditions before being transported and re-deposited within the crater during the Terby impact. We presume that the inter-crater layered terrain of Hellas acted as a provenance of Terby's mass-transport deposits.
Process for growing a film epitaxially upon an oxide surface and structures formed with the process
McKee, Rodney Allen; Walker, Frederick Joseph
1998-01-01
A process and structure wherein a film comprised of a perovskite or a spinel is built epitaxially upon a surface, such as an alkaline earth oxide surface, involves the epitaxial build up of alternating constituent metal oxide planes of the perovskite or spinel. The first layer of metal oxide built upon the surface includes a metal element which provides a small cation in the crystalline structure of the perovskite or spinel, and the second layer of metal oxide built upon the surface includes a metal element which provides a large cation in the crystalline structure of the perovskite or spinel. The layering sequence involved in the film build up reduces problems which would otherwise result from the interfacial electrostatics at the first atomic layers, and these oxides can be stabilized as commensurate thin films at a unit cell thickness or grown with high crystal quality to thicknesses of 0.5-0.7 μm for optical device applications.
Process for growing a film epitaxially upon an oxide surface and structures formed with the process
McKee, Rodney A.; Walker, Frederick J.
1995-01-01
A process and structure wherein a film comprised of a perovskite or a spinel is built epitaxially upon a surface, such as an alkaline earth oxide surface, involves the epitaxial build up of alternating constituent metal oxide planes of the perovskite or spinel. The first layer of metal oxide built upon the surface includes a metal element which provides a small cation in the crystalline structure of the perovskite or spinel, and the second layer of metal oxide built upon the surface includes a metal element which provides a large cation in the crystalline structure of the perovskite or spinel. The layering sequence involved in the film build up reduces problems which would otherwise result from the interfacial electrostatics at the first atomic layers, and these oxides can be stabilized as commensurate thin films at a unit cell thickness or grown with high crystal quality to thicknesses of 0.5-0.7 μm for optical device applications.
Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns
2013-01-01
Background: It is well known that the search for homologous RNAs is more effective if both sequence and structure information is incorporated into the search. However, current tools for searching with RNA sequence-structure patterns cannot fully handle mutations occurring on both these levels or are simply not fast enough for searching large sequence databases because of the high computational costs of the underlying sequence-structure alignment problem. Results: We present new fast index-based and online algorithms for approximate matching of RNA sequence-structure patterns supporting a full set of edit operations on single bases and base pairs. Our methods efficiently compute semi-global alignments of structural RNA patterns and substrings of the target sequence whose costs satisfy a user-defined sequence-structure edit distance threshold. For this purpose, we introduce a new computing scheme to optimally reuse the entries of the required dynamic programming matrices for all substrings and combine it with a technique for avoiding the alignment computation of non-matching substrings. Our new index-based methods exploit suffix arrays preprocessed from the target database and achieve running times that are sublinear in the size of the searched sequences. To support the description of RNA molecules that fold into complex secondary structures with multiple ordered sequence-structure patterns, we use fast algorithms for the local or global chaining of approximate sequence-structure pattern matches. The chaining step removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our improved online algorithm is faster than the best previous method by a factor of up to 45. Our best new index-based algorithm achieves a speedup by a factor of 560. Conclusions: The presented methods achieve considerable speedups compared to the best previous method. 
This, together with the expected sublinear running time of the presented index-based algorithms, allows for the first time approximate matching of RNA sequence-structure patterns in large sequence databases. Beyond the algorithmic contributions, we provide with RaligNAtor a robust and well documented open-source software package implementing the algorithms presented in this manuscript. The RaligNAtor software is available at http://www.zbh.uni-hamburg.de/ralignator. PMID:23865810
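To make the idea of semi-global approximate matching under an edit distance threshold concrete, here is a deliberately simplified Python sketch. It is not the RaligNAtor algorithm. It assumes unit costs, works on the sequence only (no base-pair operations and no suffix-array index), and uses the classic dynamic programming scheme in which a match may start at any text position. The function name and parameters are hypothetical.

```python
def approximate_matches(pattern: str, text: str, max_cost: int):
    """Return (end_position, cost) for each 1-based text position where some
    substring ending there matches the pattern with edit distance <= max_cost.

    Semi-global alignment: the whole pattern must be aligned, but the match
    may begin anywhere in the text (row 0 of the DP matrix is all zeros).
    """
    m = len(pattern)
    # prev[i]: cost of aligning pattern[:i] against the empty text prefix
    prev = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text, start=1):
        curr = [0] * (m + 1)  # curr[0] = 0: a match may start at any position
        for i in range(1, m + 1):
            substitution = prev[i - 1] + (pattern[i - 1] != ch)
            deletion = curr[i - 1] + 1   # pattern character missing from the text
            insertion = prev[i] + 1      # extra character in the text
            curr[i] = min(substitution, deletion, insertion)
        if curr[m] <= max_cost:
            hits.append((j, curr[m]))
        prev = curr
    return hits
```

This baseline takes time proportional to the pattern length times the text length. The abstract's contribution lies in reusing matrix entries, skipping non-matching substrings, and using an index, none of which is attempted here.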
Architecture of a Species: Phylogenomics of Staphylococcus aureus.
Planet, Paul J; Narechania, Apurva; Chen, Liang; Mathema, Barun; Boundy, Sam; Archer, Gordon; Kreiswirth, Barry
2017-02-01
A deluge of whole-genome sequencing has begun to give insights into the patterns and processes of microbial evolution, but genome sequences have accrued in a haphazard manner, with biased sampling of natural variation that is driven largely by medical and epidemiological priorities. For instance, there is a strong bias for sequencing epidemic lineages of methicillin-resistant Staphylococcus aureus (MRSA) over sensitive isolates (methicillin-sensitive S. aureus: MSSA). As more diverse genomes are sequenced the emerging picture is of a highly subdivided species with a handful of relatively clonal groups (complexes) that, at any given moment, dominate in particular geographical regions. The establishment of hegemony of particular clones appears to be a dynamic process of successive waves of replacement of the previously dominant clone. Here we review the phylogenomic structure of a diverse range of S. aureus, including both MRSA and MSSA. We consider the utility of the concept of the 'core' genome and the impact of recombination and horizontal transfer. We argue that whole-genome surveillance of S. aureus populations could lead to better forecasting of antibiotic resistance and virulence of emerging clones, and a better understanding of the elusive biological factors that determine repeated strain replacement. Copyright © 2016. Published by Elsevier Ltd.
A Spiking Neural Network System for Robust Sequence Recognition.
Yu, Qiang; Yan, Rui; Tang, Huajin; Tan, Kay Chen; Li, Haizhou
2016-03-01
This paper proposes a biologically plausible network architecture with spiking neurons for sequence recognition. This architecture is a unified and consistent system with functional parts of sensory encoding, learning, and decoding. This is the first systematic model attempting to reveal the neural mechanisms considering both the upstream and the downstream neurons together. The whole system is a consistent temporal framework, where the precise timing of spikes is employed for information processing and cognitive computing. Experimental results show that the system is able to perform sequence recognition, being robust to noisy sensory inputs and invariant to changes in the intervals between input stimuli within a certain range. The classification ability of the temporal learning rule used in the system is investigated through two benchmark tasks, in which it outperforms two other widely used learning rules for classification. The results also demonstrate the computational power of spiking neurons over perceptrons for processing spatiotemporal patterns. In summary, the system provides a general way with spiking neurons to encode external stimuli into spatiotemporal spikes, to learn the encoded spike patterns with temporal learning rules, and to decode the sequence order with downstream neurons. The system structure would be beneficial for developments in both hardware and software.
Schwartze, Michael; Keller, Peter E; Patel, Aniruddh D; Kotz, Sonja A
2011-01-20
The basal ganglia (BG) are part of extensive subcortico-cortical circuits that are involved in a variety of motor and non-motor cognitive functions. Accumulating evidence suggests that one specific function that engages the BG and associated cortico-striato-thalamo-cortical circuitry is temporal processing, i.e., the mechanisms that underlie the encoding, decoding and evaluation of temporal relations or temporal structure. In the current study we investigated the interplay of two processes that require precise representations of temporal structure, namely the perception of an auditory pacing signal and manual motor production by means of finger tapping in a sensorimotor synchronization task. Patients with focal lesions of the BG and healthy control participants were asked to align finger taps to tone sequences that either did or did not contain a tempo acceleration or tempo deceleration at a predefined position, and to continue tapping at the final tempo after the pacing sequence had ceased. Performance in this adaptive synchronization-continuation paradigm differed between the two groups. Selective damage to the BG affected the abilities to detect tempo changes and to perform attention-dependent error correction, particularly in response to tempo decelerations. An additional assessment of preferred spontaneous, i.e., unpaced but regular, production rates yielded more heterogeneous results in the patient group. Together these findings provide evidence for less efficient processing in the perception and the production of temporal structure in patients with focal BG lesions. The results also support the functional role of the BG system in attention-dependent temporal processing. Copyright © 2010 Elsevier B.V. All rights reserved.
Dong, Zheng; Zhou, Hongyu; Tao, Peng
2018-02-01
PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore the functional evolutionary relationships among proteins in the PAS domain superfamily in view of the sequence-structure-dynamics-function relationship. We collected protein sequences and crystal structure data from the RCSB Protein Data Bank for PAS domain superfamily members belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence-conserved residues and build a phylogenetic tree. Three-dimensional structure alignment was also applied to obtain structure-conserved residues. The protein dynamics were analyzed using an elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The results showed that proteins with the same function could be grouped by sequence similarity, and proteins in different functional groups displayed statistically significant differences in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have a lower fluctuation than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggested a direct connection in which the protein sequences were related to various functions through structural dynamics. This is a new attempt to delineate functional evolution of proteins using the integrated information of sequence, structure, and dynamics. © 2017 The Protein Society.
How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis.
Tian, Pengfei; Best, Robert B
2017-10-17
Quantifying the relationship between protein sequence and structure is key to understanding the protein universe. A fundamental measure of this relationship is the total number of amino acid sequences that can fold to a target protein structure, known as the "sequence capacity," which has been suggested as a proxy for how designable a given protein fold is. Although sequence capacity has been extensively studied using lattice models and theory, numerical estimates for real protein structures are currently lacking. In this work, we have quantitatively estimated the sequence capacity of 10 proteins with a variety of different structures using a statistical model based on residue-residue co-evolution to capture the variation of sequences from the same protein family. Remarkably, we find that even for the smallest protein folds, such as the WW domain, the number of foldable sequences is extremely large, exceeding the Avogadro constant. In agreement with earlier theoretical work, the calculated sequence capacity is positively correlated with the size of the protein, or better, the density of contacts. This allows the absolute sequence capacity of a given protein to be approximately predicted from its structure. On the other hand, the relative sequence capacity, i.e., normalized by the total number of possible sequences, is an extremely tiny number and is strongly anti-correlated with the protein length. Thus, although there may be more foldable sequences for larger proteins, it will be much harder to find them. Lastly, we have correlated the evolutionary age of proteins in the CATH database with their sequence capacity as predicted by our model. The results suggest a trade-off between the opposing requirements of high designability and the likelihood of a novel fold emerging by chance. Published by Elsevier Inc.
Cube - an online tool for comparison and contrasting of protein sequences.
Zhang, Zong Hong; Khoo, Aik Aun; Mihalek, Ivana
2013-01-01
When comparing sequences of similar proteins, two kinds of questions can be asked, and the related two kinds of inference made. First, one may ask to what degree they are similar, and then, how they differ. In the first case one may tentatively conclude that the conserved elements common to all sequences are of central and common importance to the protein's function. In the latter case the regions of specialization may be discriminative of the function or binding partners across subfamilies of related proteins. Experimental efforts - mutagenesis or pharmacological intervention - can then be pointed in either direction, depending on the context of the study. Cube simplifies this process for users that already have their favorite sets of sequences, and helps them collate the information by visualization of the conservation and specialization scores on the sequence and on the structure, and by spreadsheet tabulation. All information can be visualized on the spot, or downloaded for reference and later inspection. http://eopsf.org/cube.
The effects of processing and sequence organization on the timing of turn taking: a corpus study
Roberts, Seán G.; Torreira, Francisco; Levinson, Stephen C.
2015-01-01
The timing of turn taking in conversation is extremely rapid given the cognitive demands on speakers to comprehend, plan and execute turns in real time. Findings from psycholinguistics predict that the timing of turn taking is influenced by demands on processing, such as word frequency or syntactic complexity. An alternative view comes from the field of conversation analysis, which predicts that the rules of turn-taking and sequence organization may dictate the variation in gap durations (e.g., the functional role of each turn in communication). In this paper, we estimate the role of these two different kinds of factors in determining the speed of turn-taking in conversation. We use the Switchboard corpus of English telephone conversation, already richly annotated for syntactic structure, speech act sequences, and segmental alignment. To this we add further information including Floor Transfer Offset (the amount of time between the end of one turn and the beginning of the next), word frequency, concreteness, and surprisal values. We then apply a novel statistical framework (“random forests”) to show that these two dimensions are interwoven together with indexical properties of the speakers as explanatory factors determining the speed of response. We conclude that an explanation of the timing of turn taking will require insights from both processing and sequence organization. PMID:26029125
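The Floor Transfer Offset defined in the abstract above (time between the end of one turn and the beginning of the next) can be sketched in a few lines of Python. The `Turn` record, its field names, and the sign convention (positive for a gap, negative for an overlap) are assumptions made for illustration. The actual Switchboard annotation format is not reproduced here.

```python
from typing import NamedTuple


class Turn(NamedTuple):
    speaker: str
    start: float  # turn onset, in seconds
    end: float    # turn offset, in seconds


def floor_transfer_offsets(turns):
    """Return the offset in seconds for each pair of consecutive turns
    (ordered by start time) in which the speaker changes.

    Assumed convention: positive values are gaps, negative values are overlaps.
    """
    ordered = sorted(turns, key=lambda t: t.start)
    offsets = []
    for previous, following in zip(ordered, ordered[1:]):
        if previous.speaker != following.speaker:
            offsets.append(following.start - previous.end)
    return offsets
```

Each offset could then be paired with per-turn predictors such as word frequency or surprisal before fitting a model. That modelling step is outside the scope of this sketch.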
Oh, Jeong-Wook; Lim, Dong-Kwon; Kim, Gyeong-Hwan; Suh, Yung Doug; Nam, Jwa-Min
2014-10-08
The design, synthesis and control of plasmonic nanostructures, especially with ultrasmall plasmonically coupled nanogap (∼1 nm or smaller), are of significant interest and importance in chemistry, nanoscience, materials science, optics and nanobiotechnology. Here, we studied and established the thiolated DNA-based synthetic principles and methods in forming and controlling Au core-nanogap-Au shell structures [Au-nanobridged nanogap particles (Au-NNPs)] with various interior nanogap and Au shell structures. We found that differences in the binding affinities and modes among four different bases to Au core, DNA sequence, DNA grafting density and chemical reagents alter Au shell growth mechanism and interior nanogap-forming process on thiolated DNA-modified Au core. Importantly, poly A or poly C sequence creates a wider interior nanogap with a smoother Au shell, while poly T sequence results in a narrower interstitial interior gap with rougher Au shell, and on the basis of the electromagnetic field calculation and experimental results, we unraveled the relationships between the width of the interior plasmonic nanogap, Au shell structure, electromagnetic field and surface-enhanced Raman scattering. These principles and findings shown in this paper offer the fundamental basis for the thiolated DNA-based chemistry in forming and controlling metal nanostructures with ∼1 nm plasmonic gap and insight in the optical properties of the plasmonic NNPs, and these plasmonic nanogap structures are useful as strong and controllable optical signal-generating nanoprobes.
Phage display as a technology delivering on the promise of peptide drug discovery.
Hamzeh-Mivehroud, Maryam; Alizadeh, Ali Akbar; Morris, Michael B; Church, W Bret; Dastmalchi, Siavoush
2013-12-01
Phage display represents an important approach in the development pipeline for producing peptides and peptidomimetics therapeutics. Using randomly generated DNA sequences and molecular biology techniques, large diverse peptide libraries can be displayed on the phage surface. The phage library can be incubated with a target of interest and the phage which bind can be isolated and sequenced to reveal the displayed peptides' primary structure. In this review, we focus on the 'mechanics' of the phage display process, whilst highlighting many diverse and subtle ways it has been used to further the drug-development process, including the potential for the phage particle itself to be used as a drug carrier targeted to a particular pathogen or cell type in the body. Copyright © 2013 Elsevier Ltd. All rights reserved.
Detecting Coevolution in and among Protein Domains
Yeang, Chen-Hsiang; Haussler, David
2007-01-01
Correlated changes of nucleic or amino acids have provided strong information about the structures and interactions of molecules. Despite the rich literature in coevolutionary sequence analysis, previous methods often have to trade off between generality, simplicity, phylogenetic information, and specific knowledge about interactions. Furthermore, despite the evidence of coevolution in selected protein families, a comprehensive screening of coevolution among all protein domains is still lacking. We propose an augmented continuous-time Markov process model for sequence coevolution. The model can handle different types of interactions, incorporate phylogenetic information and sequence substitution, has only one extra free parameter, and requires no knowledge about interaction rules. We employ this model to large-scale screenings on the entire protein domain database (Pfam). Strikingly, with 0.1 trillion tests executed, the majority of the inferred coevolving protein domains are functionally related, and the coevolving amino acid residues are spatially coupled. Moreover, many of the coevolving positions are located at functionally important sites of proteins/protein complexes, such as the subunit linkers of superoxide dismutase, the tRNA binding sites of ribosomes, the DNA binding region of RNA polymerase, and the active and ligand binding sites of various enzymes. The results suggest sequence coevolution manifests structural and functional constraints of proteins. The intricate relations between sequence coevolution and various selective constraints are worth pursuing at a deeper level. PMID:17983264
Vembar, Shruthi Sridhar; Seetin, Matthew; Lambert, Christine; Nattestad, Maria; Schatz, Michael C.; Baybayan, Primo; Scherf, Artur; Smith, Melissa Laird
2016-01-01
The application of next-generation sequencing to estimate genetic diversity of Plasmodium falciparum, the most lethal malaria parasite, has proved challenging due to the skewed AT-richness [∼80.6% (A + T)] of its genome and the lack of technology to assemble highly polymorphic subtelomeric regions that contain clonally variant, multigene virulence families (e.g., var and rifin). To address this, we performed amplification-free, single molecule, real-time sequencing of P. falciparum genomic DNA and generated reads of average length 12 kb, with 50% of the reads between 15.5 and 50 kb in length. Next, using the Hierarchical Genome Assembly Process, we assembled the P. falciparum genome de novo and successfully compiled all 14 nuclear chromosomes telomere-to-telomere. We also accurately resolved centromeres [∼90–99% (A + T)] and subtelomeric regions and identified large insertions and duplications that add extra var and rifin genes to the genome, along with smaller structural variants such as homopolymer tract expansions. Overall, we show that amplification-free, long-read sequencing combined with de novo assembly overcomes major challenges inherent to studying the P. falciparum genome. Indeed, this technology may not only identify the polymorphic and repetitive subtelomeric sequences of parasite populations from endemic areas but may also evaluate structural variation linked to virulence, drug resistance and disease transmission. PMID:27345719
Droplet barcoding for single cell transcriptomics applied to embryonic stem cells
Klein, Allon M; Mazutis, Linas; Akartuna, Ilke; Tallapragada, Naren; Veres, Adrian; Li, Victor; Peshkin, Leonid; Weitz, David A; Kirschner, Marc W
2015-01-01
Summary: It has long been the dream of biologists to map gene expression at the single cell level. With such data one might track heterogeneous cell sub-populations, and infer regulatory relationships between genes and pathways. Recently, RNA sequencing has achieved single cell resolution. What is limiting is an effective way to routinely isolate and process large numbers of individual cells for quantitative in-depth sequencing. We have developed a high-throughput droplet-microfluidic approach for barcoding the RNA from thousands of individual cells for subsequent analysis by next-generation sequencing. The method shows a surprisingly low noise profile and is readily adaptable to other sequencing-based assays. We analyzed mouse embryonic stem cells, revealing in detail the population structure and the heterogeneous onset of differentiation after LIF withdrawal. The reproducibility of these high-throughput single cell data allowed us to deconstruct cell populations and infer gene expression relationships. PMID:26000487
Preparation of 2D sequences of corneal images for 3D model building.
Elbita, Abdulhakim; Qahwaji, Rami; Ipson, Stanley; Sharif, Mhd Saeed; Ghanchi, Faruque
2014-04-01
A confocal microscope provides a sequence of images, at incremental depths, of the various corneal layers and structures. From these, medical practitioners can extract clinical information on the state of health of the patient's cornea. In this work we address problems associated with capturing and processing these images, including blurring, non-uniform illumination and noise, as well as the displacement of images laterally and in the anterior-posterior direction caused by subject movement. The latter may cause some of the captured images to be out of sequence in terms of depth. In this paper we introduce automated algorithms for classification, reordering, registration and segmentation to solve these problems. The successful implementation of these algorithms could open the door for another interesting development, which is the 3D modelling of these sequences. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Denoising time-resolved microscopy image sequences with singular value thresholding.
Furnival, Tom; Leary, Rowan K; Midgley, Paul A
2017-07-01
Time-resolved imaging in microscopy is important for the direct observation of a range of dynamic processes in both the physical and life sciences. However, the image sequences are often corrupted by noise, either as a result of high frame rates or a need to limit the radiation dose received by the sample. Here we exploit both spatial and temporal correlations using low-rank matrix recovery methods to denoise microscopy image sequences. We also make use of an unbiased risk estimator to address the issue of how much thresholding to apply in a robust and automated manner. The performance of the technique is demonstrated using simulated image sequences, as well as experimental scanning transmission electron microscopy data, where surface adatom motion and nanoparticle structural dynamics are recovered at rates of up to 32 frames per second. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Armstrong, Miles R; Husmeier, Dirk; Phillips, Mark S; Blok, Vivian C
2007-06-01
The discovery that the potato cyst nematode Globodera pallida has a multipartite mitochondrial DNA (mtDNA) composed, at least in part, of six small circular mtDNAs (scmtDNAs) raised a number of questions concerning the population-level processes that might act on such a complex genome. Here we report our observations on the distribution of some scmtDNAs among a sample of European and South American G. pallida populations. The occurrence of sequence variants of scmtDNA IV in population P4A from South America, and that particular sequence variants are common to the individuals within a single cyst, is described. Evidence for recombination of sequence variants of scmtDNA IV in P4A is also reported. The mosaic structure of P4A scmtDNA IV sequences was revealed using several detection methods and recombination breakpoints were independently detected by maximum likelihood and Bayesian MCMC methods.
ERIC Educational Resources Information Center
Pontrello, Jason K.
2016-01-01
Introductory organic laboratory courses frequently begin with a set of activities built around developing basic experimental skills and techniques, often with guided-inquiry components. A sequence of skill-based activities is described to promote reflection, analysis of, and interpersonal communication around science. A multistage process was used…
What Artificial Grammar Learning Reveals about the Neurobiology of Syntax
ERIC Educational Resources Information Center
Petersson, Karl-Magnus; Folia, Vasiliki; Hagoort, Peter
2012-01-01
In this paper we examine the neurobiological correlates of syntax, the processing of structured sequences, by comparing FMRI results on artificial and natural language syntax. We discuss these and similar findings in the context of formal language and computability theory. We used a simple right-linear unification grammar in an implicit artificial…
(Pea)nuts and Bolts of Visual Narrative: Structure and Meaning in Sequential Image Comprehension
ERIC Educational Resources Information Center
Cohn, Neil; Paczynski, Martin; Jackendoff, Ray; Holcomb, Phillip J.; Kuperberg, Gina R.
2012-01-01
Just as syntax differentiates coherent sentences from scrambled word strings, the comprehension of sequential images must also use a cognitive system to distinguish coherent narrative sequences from random strings of images. We conducted experiments analogous to two classic studies of language processing to examine the contributions of narrative…
Compartmentalization of pathogens in fire-injured trees
Kevin T. Smith
2013-01-01
Wildland fire is an episodic process that greatly influences the composition, structure, and developmental sequence of forests. Most news reports of wildland fire involve blazes fueled by slash, standing dead stems, and snags that reach into tree crowns and burn deeply into the forest floor, causing extensive tree mortality and the eventual replacement of the standing...
Deriving Process-Driven Collaborative Editing Pattern from Collaborative Learning Flow Patterns
ERIC Educational Resources Information Center
Marjanovic, Olivera; Skaf-Molli, Hala; Molli, Pascal; Godart, Claude
2007-01-01
Collaborative Learning Flow Patterns (CLFPs) have recently emerged as a new method to formulate best practices in structuring the flow of activities within various collaborative learning scenarios. The term "learning flow" is used to describe coordination and sequencing of learning tasks. This paper adopts the existing concept of CLFP and argues…
ERIC Educational Resources Information Center
Botvinick, Matthew; Plaut, David C.
2004-01-01
In everyday tasks, selecting actions in the proper sequence requires a continuously updated representation of temporal context. Previous models have addressed this problem by positing a hierarchy of processing units, mirroring the roughly hierarchical structure of naturalistic tasks themselves. The present study considers an alternative framework,…
Neural Bases of Sequence Processing in Action and Language
ERIC Educational Resources Information Center
Carota, Francesca; Sirigu, Angela
2008-01-01
Real-time estimation of what we will do next is a crucial prerequisite of purposive behavior. During the planning of goal-oriented actions, for instance, the temporal and causal organization of upcoming subsequent moves needs to be predicted based on our knowledge of events. A forward computation of sequential structure is also essential for…
Evolutionary Origins of a Bioactive Peptide Buried within Preproalbumin
Elliott, Alysha G.; Delay, Christina; Liu, Huanle; Phua, Zaiyang; Rosengren, K. Johan; Benfield, Aurélie H.; Panero, Jose L.; Colgrave, Michelle L.; Jayasena, Achala S.; Dunse, Kerry M.; Anderson, Marilyn A.; Schilling, Edward E.; Ortiz-Barrientos, Daniel; Craik, David J.; Mylne, Joshua S.
2014-01-01
The de novo evolution of proteins is now considered a frequented route for biological innovation, but the genetic and biochemical processes that lead to each newly created protein are often poorly documented. The common sunflower (Helianthus annuus) contains the unusual gene PawS1 (Preproalbumin with SFTI-1) that encodes a precursor for seed storage albumin; however, in a region usually discarded during albumin maturation, its sequence is matured into SFTI-1, a protease-inhibiting cyclic peptide with a motif homologous to unrelated inhibitors from legumes, cereals, and frogs. To understand how PawS1 acquired this additional peptide with novel biochemical functionality, we cloned PawS1 genes and showed that this dual destiny is over 18 million years old. This new family of mostly backbone-cyclic peptides is structurally diverse, but the protease-inhibitory motif was restricted to peptides from sunflower and close relatives from its subtribe. We describe a widely distributed, potential evolutionary intermediate PawS-Like1 (PawL1), which is matured into storage albumin, but makes no stable peptide despite possessing residues essential for processing and cyclization from within PawS1. Using sequences we cloned, we retrodict the likely stepwise creation of PawS1’s additional destiny within a simple albumin precursor. We propose that relaxed selection enabled SFTI-1 to evolve its inhibitor function by converging upon a successful sequence and structure. PMID:24681618
Xiao, Yibei; Luo, Min; Hayes, Robert P; Kim, Jonathan; Ng, Sherwin; Ding, Fang; Liao, Maofu; Ke, Ailong
2017-06-29
Type I CRISPR systems feature a sequential dsDNA target searching and degradation process, by crRNA-displaying Cascade and nuclease-helicase fusion enzyme Cas3, respectively. Here we present two cryo-EM snapshots of the Thermobifida fusca type I-E Cascade: (1) unwinding 11 bp of dsDNA at the seed-sequence region to scout for sequence complementarity, and (2) further unwinding of the entire protospacer to form a full R-loop. These structures provide the much-needed temporal and spatial resolution to resolve key mechanistic steps leading to Cas3 recruitment. In the early steps, PAM recognition causes severe DNA bending, leading to spontaneous DNA unwinding to form a seed-bubble. The full R-loop formation triggers conformational changes in Cascade, licensing Cas3 to bind. The same process also generates a bulge in the non-target DNA strand, enabling its handover to Cas3 for cleavage. The combination of both negative and positive checkpoints ensures stringent yet efficient target degradation in type I CRISPR-Cas systems. Copyright © 2017 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Nair, Nisha; Pandey, Dhananjai K.
2018-02-01
Interpretation of multichannel seismic (MCS) reflection data along the Mumbai Offshore Basin (MOB) revealed the tectonic processes that led to the development of sedimentary basins during Cenozoic evolution. Structural interpretation along three selected MCS profiles from the MOB revealed seven major sedimentary sequences (∼3.0 s TWT thick) and the associated complex fault patterns. These stratigraphic sequences are interpreted to host detritus of syn- to post-rift events during the rift-drift process. The acoustic basement appears to be faulted, with interspersed intrusive bodies. The sections also depict slumping of sediments, subsidence, marginal basins, rollover anticlines, mud diapirs, etc., accompanied by normal to thrust faults related to recent tectonics. The presence of upthrusts in the slope region marks the locations of local compression during collision. Forward gravity modeling, constrained by seismic and drilling results, revealed that the crust beneath the MOB has undergone extensional tectonics and has been intruded by intrusive bodies. Results from the seismo-gravity modeling, in association with litholog data from drilled wells from the western continental margin of India (WCMI), are presented here.
Chierotti, Michele R; Gobetto, Roberto; Nervi, Carlo; Bacchi, Alessia; Pelagatti, Paolo; Colombo, Valentina; Sironi, Angelo
2014-01-06
The hydrogen bond network of three polymorphs (1α, 1β, and 1γ) and one solvate form (1·H2O) arising from the hydration-dehydration process of the Ru(II) complex [(p-cymene)Ru(κN-INA)Cl2] (where INA is isonicotinic acid), has been ascertained by means of one-dimensional (1D) and two-dimensional (2D) double quantum ¹H CRAMPS (Combined Rotation and Multiple Pulses Sequences) and ¹³C CPMAS solid-state NMR experiments. The resolution improvement provided by homonuclear decoupling pulse sequences, with respect to fast MAS experiments, has been highlighted. The solid-state structure of 1γ has been fully characterized by combining X-ray powder diffraction (XRPD), solid-state NMR, and periodic plane-wave first-principles calculations. None of the forms show the expected supramolecular cyclic dimerization of the carboxylic functions of INA, because of the presence of Cl atoms as strong hydrogen bond (HB) acceptors. The hydration-dehydration process of the complex has been discussed in terms of structure and HB rearrangements.
Zhang, Gaihua; Su, Zhen
2012-01-01
Work on protein structure prediction is very useful in biological research. To evaluate the accuracy of prediction methods, experimental protein structures or their derived data are used as the 'gold standard'. However, as proteins are dynamic molecular machines with structural flexibility, such a standard may be unreliable. To investigate the influence of structural flexibility, we analysed 3,652 protein structures of 137 unique sequences from 24 protein families. The results showed that (1) the three-dimensional (3D) protein structures were not rigid: the root-mean-square deviation (RMSD) of the backbone Cα of structures with identical sequences was relatively large, with the average of the maximum RMSD from each of the 137 sequences being 1.06 Å; (2) the derived data of the 3D structure were not constant, e.g. the highest ratio of the secondary structure wobble site was 60.69%, with the sequence alignments from structural comparisons of two proteins in the same family sometimes being completely different. Proteins may have several stable conformations and the data derived from resolved structures as a 'gold standard' should be optimized before being utilized as criteria to evaluate the prediction methods, e.g. sequence alignment from structural comparison. Helix/β-sheet transition exists in normal free proteins. The coil ratio of the 3D structure could affect its resolution as determined by X-ray crystallography.
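The flexibility statistic quoted above (the maximum backbone Cα RMSD among structures sharing one sequence) can be illustrated with a short sketch. It assumes the structures have already been superimposed and are given as equal-length lists of (x, y, z) Cα coordinates in ångströms; the function names `rmsd` and `max_pairwise_rmsd` are hypothetical and the authors' actual superposition procedure is not reproduced here.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    # Sum of squared distances between corresponding atoms.
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def max_pairwise_rmsd(structures):
    """Largest RMSD over all pairs of structures of the same sequence."""
    return max(
        (rmsd(a, b) for i, a in enumerate(structures) for b in structures[i + 1:]),
        default=0.0,  # fewer than two structures: no pair to compare
    )
```

Averaging `max_pairwise_rmsd` over every unique sequence would give a figure of the same kind as the 1.06 Å reported in the abstract, provided the same superposition and atom selection were used.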
Distributed biotin–streptavidin transcription roadblocks for mapping cotranscriptional RNA folding
Strobel, Eric J.; Nedialkov, Yuri; Artsimovitch, Irina
2017-01-01
RNA folding during transcription directs an order of folding that can determine RNA structure and function. However, the experimental study of cotranscriptional RNA folding has been limited by the lack of easily approachable methods that can interrogate nascent RNA structure at nucleotide resolution. To address this, we previously developed cotranscriptional selective 2′-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq) to simultaneously probe all intermediate RNA transcripts during transcription by stalling elongation complexes at catalytically dead EcoRIE111Q roadblocks. While effective, the distribution of elongation complexes using EcoRIE111Q requires laborious PCR using many different oligonucleotides for each sequence analyzed. Here, we improve the broad applicability of cotranscriptional SHAPE-Seq by developing a sequence-independent biotin–streptavidin (SAv) roadblocking strategy that simplifies the preparation of roadblocking DNA templates. We first determine the properties of biotin–SAv roadblocks. We then show that randomly distributed biotin–SAv roadblocks can be used in cotranscriptional SHAPE-Seq experiments to identify the same RNA structural transitions related to a riboswitch decision-making process that we previously identified using EcoRIE111Q. Lastly, we find that EcoRIE111Q maps nascent RNA structure to specific transcript lengths more precisely than biotin–SAv and propose guidelines to leverage the complementary strengths of each transcription roadblock in cotranscriptional SHAPE-Seq. PMID:28398514
Takeda, Ryuta; Petrov, Anton I.; Leontis, Neocles B.; Ding, Biao
2011-01-01
Cell-to-cell trafficking of RNA is an emerging biological principle that integrates systemic gene regulation, viral infection, antiviral response, and cell-to-cell communication. A key mechanistic question is how an RNA is specifically selected for trafficking from one type of cell into another type. Here, we report the identification of an RNA motif in Potato spindle tuber viroid (PSTVd) required for trafficking from palisade mesophyll to spongy mesophyll in Nicotiana benthamiana leaves. This motif, called loop 6, has the sequence 5′-CGA-3′...5′-GAC-3′ flanked on both sides by cis Watson-Crick G/C and G/U wobble base pairs. We present a three-dimensional (3D) structural model of loop 6 that specifies all non-Watson-Crick base pair interactions, derived by isostericity-based sequence comparisons with 3D RNA motifs from the RNA x-ray crystal structure database. The model is supported by available chemical modification patterns, natural sequence conservation/variations in PSTVd isolates and related species, and functional characterization of all possible mutants for each of the loop 6 base pairs. Our findings and approaches have broad implications for studying the 3D RNA structural motifs mediating trafficking of diverse RNA species across specific cellular boundaries and for studying the structure-function relationships of RNA motifs in other biological processes. PMID:21258006
Takeda, Ryuta; Petrov, Anton I; Leontis, Neocles B; Ding, Biao
2011-01-01
Cell-to-cell trafficking of RNA is an emerging biological principle that integrates systemic gene regulation, viral infection, antiviral response, and cell-to-cell communication. A key mechanistic question is how an RNA is specifically selected for trafficking from one type of cell into another type. Here, we report the identification of an RNA motif in Potato spindle tuber viroid (PSTVd) required for trafficking from palisade mesophyll to spongy mesophyll in Nicotiana benthamiana leaves. This motif, called loop 6, has the sequence 5'-CGA-3'...5'-GAC-3' flanked on both sides by cis Watson-Crick G/C and G/U wobble base pairs. We present a three-dimensional (3D) structural model of loop 6 that specifies all non-Watson-Crick base pair interactions, derived by isostericity-based sequence comparisons with 3D RNA motifs from the RNA x-ray crystal structure database. The model is supported by available chemical modification patterns, natural sequence conservation/variations in PSTVd isolates and related species, and functional characterization of all possible mutants for each of the loop 6 base pairs. Our findings and approaches have broad implications for studying the 3D RNA structural motifs mediating trafficking of diverse RNA species across specific cellular boundaries and for studying the structure-function relationships of RNA motifs in other biological processes.
Distributed biotin-streptavidin transcription roadblocks for mapping cotranscriptional RNA folding.
Strobel, Eric J; Watters, Kyle E; Nedialkov, Yuri; Artsimovitch, Irina; Lucks, Julius B
2017-07-07
RNA folding during transcription directs an order of folding that can determine RNA structure and function. However, the experimental study of cotranscriptional RNA folding has been limited by the lack of easily approachable methods that can interrogate nascent RNA structure at nucleotide resolution. To address this, we previously developed cotranscriptional selective 2′-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq) to simultaneously probe all intermediate RNA transcripts during transcription by stalling elongation complexes at catalytically dead EcoRIE111Q roadblocks. While effective, the distribution of elongation complexes using EcoRIE111Q requires laborious PCR using many different oligonucleotides for each sequence analyzed. Here, we improve the broad applicability of cotranscriptional SHAPE-Seq by developing a sequence-independent biotin-streptavidin (SAv) roadblocking strategy that simplifies the preparation of roadblocking DNA templates. We first determine the properties of biotin-SAv roadblocks. We then show that randomly distributed biotin-SAv roadblocks can be used in cotranscriptional SHAPE-Seq experiments to identify the same RNA structural transitions related to a riboswitch decision-making process that we previously identified using EcoRIE111Q. Lastly, we find that EcoRIE111Q maps nascent RNA structure to specific transcript lengths more precisely than biotin-SAv and propose guidelines to leverage the complementary strengths of each transcription roadblock in cotranscriptional SHAPE-Seq. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Brylinski, Michal; Konieczny, Leszek; Kononowicz, Andrzej; Roterman, Irena
2008-03-21
The well-known sequence comparison procedure implemented in ClustalW was applied to structure comparison. A consensus sequence as well as a consensus structure was defined for proteins belonging to the serpin family. The structure of the early-stage intermediate was the object of the similarity search. High values of W(sequence) appeared to accord with high values of W(structure), making structure comparison possible using common criteria for sequence and structure comparison. Since the early-stage structural form is created within a limited conformational sub-space that does not include the beta-structure (this structure is mediated by the C7eq structural form), it is particularly important to see that the C7eq structural form may be treated as the seed for the beta-structure present in the final native structure of the protein. The applicability of the ClustalW procedure to structure comparison unifies these two comparisons.
Jones, Darryl R; Thomas, Dallas; Alger, Nicholas; Ghavidel, Ata; Inglis, G Douglas; Abbott, D Wade
2018-01-01
Deposition of new genetic sequences in online databases is expanding at an unprecedented rate. As a result, sequence identification continues to outpace functional characterization of carbohydrate active enzymes (CAZymes). In this paradigm, the discovery of enzymes with novel functions is often hindered by high volumes of uncharacterized sequences, particularly when the enzyme sequence belongs to a family that exhibits diverse functional specificities (i.e., polyspecificity). Therefore, to direct sequence-based discovery and characterization of new enzyme activities we have developed an automated in silico pipeline entitled: Sequence Analysis and Clustering of CarboHydrate Active enzymes for Rapid Informed prediction of Specificity (SACCHARIS). This pipeline streamlines the selection of uncharacterized sequences for discovery of new CAZyme or CBM specificity from families currently maintained on the CAZy website or within user-defined datasets. SACCHARIS was used to generate a phylogenetic tree of GH43, a CAZyme family with defined subfamily designations. This analysis confirmed that large datasets can be organized into sequence clusters of manageable sizes that possess related functions. Seeding this tree with a GH43 sequence from Bacteroides dorei DSM 17855 (BdGH43b) revealed that it partitioned as a single sequence within the tree. This pattern was consistent with it possessing a unique enzyme activity for GH43, as BdGH43b is the first α-glucanase described for this family. The capacity of SACCHARIS to extract and cluster characterized carbohydrate binding module sequences was demonstrated using family 6 CBMs (i.e., CBM6s). This CBM family displays a polyspecific ligand binding profile and contains many structurally determined members. Using SACCHARIS to identify a cluster of divergent sequences, a CBM6 sequence from a unique clade was demonstrated to bind yeast mannan, which represents the first description of an α-mannan binding CBM.
Additionally, we have performed a CAZome analysis of an in-house sequenced bacterial genome and a comparative analysis of B. thetaiotaomicron VPI-5482 and B. thetaiotaomicron 7330, to demonstrate that SACCHARIS can generate "CAZome fingerprints", which differentiate between the saccharolytic potential of two related strains in silico. Establishing sequence-function and sequence-structure relationships in polyspecific CAZyme families are promising approaches for streamlining enzyme discovery. SACCHARIS facilitates this process by embedding CAZyme and CBM family trees generated from biochemically to structurally characterized sequences, with protein sequences that have unknown functions. In addition, these trees can be integrated with user-defined datasets (e.g., genomics, metagenomics, and transcriptomics) to inform experimental characterization of new CAZymes or CBMs not currently curated, and for researchers to compare differential sequence patterns between entire CAZomes. In this light, SACCHARIS provides an in silico tool that can be tailored for enzyme bioprospecting in datasets of increasing complexity and for diverse applications in glycobiotechnology.
Metamorphic Proteins: Emergence of Dual Protein Folds from One Primary Sequence.
Lella, Muralikrishna; Mahalakshmi, Radhakrishnan
2017-06-20
Every amino acid exhibits a different propensity for distinct structural conformations. Hence, decoding how the primary amino acid sequence undergoes the transition to a defined secondary structure and its final three-dimensional fold is presently considered predictable with reasonable certainty. However, protein sequences that defy the first principles of secondary structure prediction (they attain two different folds) have recently been discovered. Such proteins, aptly named metamorphic proteins, decrease the conformational constraint by increasing flexibility in the secondary structure and thereby result in efficient functionality. In this review, we discuss the major factors driving the conformational switch related both to protein sequence and to structure using illustrative examples. We discuss the concept of an evolutionary transition in sequence and structure, the functional impact of the tertiary fold, and the pressure of intrinsic and external factors that give rise to metamorphic proteins. We mainly focus on the major components of protein architecture, namely, the α-helix and β-sheet segments, which are involved in conformational switching within the same or highly similar sequences. These chameleonic sequences are widespread in both cytosolic and membrane proteins, and these folds are equally important for protein structure and function. We discuss the implications of metamorphic proteins and chameleonic peptide sequences in de novo peptide design.
SMARTIV: combined sequence and structure de-novo motif discovery for in-vivo RNA binding data.
Polishchuk, Maya; Paz, Inbal; Yakhini, Zohar; Mandel-Gutfreund, Yael
2018-05-25
Gene expression regulation is highly dependent on binding of RNA-binding proteins (RBPs) to their RNA targets. Growing evidence supports the notion that both RNA primary sequence and its local secondary structure play a role in specific Protein-RNA recognition and binding. Despite the great advance in high-throughput experimental methods for identifying sequence targets of RBPs, predicting the specific sequence and structure binding preferences of RBPs remains a major challenge. We present a novel webserver, SMARTIV, designed for discovering and visualizing combined RNA sequence and structure motifs from high-throughput RNA-binding data, generated from in-vivo experiments. The uniqueness of SMARTIV is that it predicts motifs from enriched k-mers that combine information from ranked RNA sequences and their predicted secondary structure, obtained using various folding methods. Consequently, SMARTIV generates Position Weight Matrices (PWMs) in a combined sequence and structure alphabet with assigned P-values. SMARTIV concisely represents the sequence and structure motif content as a single graphical logo, which is informative and easy for visual perception. SMARTIV was examined extensively on a variety of high-throughput binding experiments for RBPs from different families, generated from different technologies, showing consistent and accurate results. Finally, SMARTIV is a user-friendly webserver, highly efficient in run-time and freely accessible via http://smartiv.technion.ac.il/.
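To make the idea of a matrix over a combined sequence and structure alphabet concrete, the sketch below builds a simple position frequency matrix from aligned k-mers. The encoding used here (uppercase letter for one structural state, lowercase for the other) and the function name `position_frequency_matrix` are assumptions chosen for illustration; the example does not reproduce SMARTIV's k-mer ranking, its actual alphabet, or its P-value computation, and it stops at frequencies rather than log-odds weights.

```python
from collections import Counter


def position_frequency_matrix(kmers):
    """Per-position symbol frequencies for equal-length k-mers.

    Each symbol is assumed to carry both nucleotide identity and a
    structural state (hypothetical encoding: case of the letter).
    """
    if not kmers:
        raise ValueError("at least one k-mer is required")
    k = len(kmers[0])
    if any(len(kmer) != k for kmer in kmers):
        raise ValueError("all k-mers must have the same length")
    matrix = []
    for pos in range(k):
        # Count how often each combined symbol occurs at this position.
        counts = Counter(kmer[pos] for kmer in kmers)
        total = sum(counts.values())
        matrix.append({symbol: n / total for symbol, n in counts.items()})
    return matrix
```

Because the case of each letter is kept, "A" and "a" are counted as different symbols, so a column can show both which nucleotide is preferred and in which structural context it tends to occur.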
New Nomenclatures for Heat Treatments of Additively Manufactured Titanium Alloys
NASA Astrophysics Data System (ADS)
Baker, Andrew H.; Collins, Peter C.; Williams, James C.
2017-07-01
The heat-treatment designations and microstructure nomenclatures for many structural metallic alloys were established for traditional metals processing, such as casting, hot rolling or forging. These terms do not necessarily apply for additively manufactured (i.e., three-dimensionally printed or "3D printed") metallic structures. The heat-treatment terminology for titanium alloys generally implies the heat-treatment temperatures and their sequence relative to a thermomechanical processing step (e.g., forging, rolling). These designations include: β-processing, α + β-processing, β-annealing, duplex annealing and mill annealing. Owing to the absence of a thermomechanical processing step, these traditional designations can pose a problem when titanium alloys are first produced via additive manufacturing, and then heat-treated. This communication proposes new nomenclatures for heat treatments of additively manufactured titanium alloys, and uses the distinct microstructural features to provide a correlation between traditional nomenclature and the proposed nomenclature.
Voss, T; Falkner, E; Ahorn, H; Krystek, E; Maurer-Fogy, I; Bodo, G; Hauptmann, R
1994-01-01
Human interferon-alpha 2c (IFN-alpha 2c) was produced in Escherichia coli under the control of the alkaline phosphatase promoter using a periplasmic expression system. Compared with other leader sequences, the heat-stable enterotoxin II leader of E. coli (STII) resulted in the highest rate of correct processing as judged by Western-blot analysis. The fermentation was designed as a batch-fed process in order to obtain a high yield of biomass. The processing rate of IFN-alpha 2c could be increased from 25% to more than 50% by shifting the fermentation pH from 7.0 to 6.7. IFN-alpha 2c extracted from the periplasm was purified by a new four-step chromatographic procedure. Whereas cytoplasmically produced IFN-alpha 2c does not have its full native structure, IFN-alpha 2c extracted from the periplasm was found to be correctly folded, as shown by c.d. spectroscopy. Peptide-map analysis in combination with m.s. revealed the correct formation of disulphide bridges. N-terminal sequence analysis showed complete removal of the leader sequence, creating the authentic N-terminus starting with cysteine. PMID:8141788
Automated hierarchical time gain compensation for in-vivo ultrasound imaging
NASA Astrophysics Data System (ADS)
Moshavegh, Ramin; Hemmsen, Martin C.; Martins, Bo; Brandt, Andreas H.; Hansen, Kristoffer L.; Nielsen, Michael B.; Jensen, Jørgen A.
2015-03-01
Time gain compensation (TGC) is essential to ensure the optimal image quality of the clinical ultrasound scans. When large fluid collections are present within the scan plane, the attenuation distribution is changed drastically and TGC compensation becomes challenging. This paper presents an automated hierarchical TGC (AHTGC) algorithm that accurately adapts to the large attenuation variation between different types of tissues and structures. The algorithm relies on estimates of tissue attenuation, scattering strength, and noise level to gain a more quantitative understanding of the underlying tissue and the ultrasound signal strength. The proposed algorithm was applied to a set of 44 in vivo abdominal movie sequences each containing 15 frames. Matching pairs of in vivo sequences, unprocessed and processed with the proposed AHTGC were visualized side by side and evaluated by two radiologists in terms of image quality. Wilcoxon signed-rank test was used to evaluate whether radiologists preferred the processed sequences or the unprocessed data. The results indicate that the average visual analogue scale (VAS) is positive (p-value: 2.34 × 10⁻¹³) and estimated to be 1.01 (95% CI: 0.85; 1.16) favoring the processed data with the proposed AHTGC algorithm.
Biological sequence compression algorithms.
Matsumoto, T; Sadakane, K; Imai, H
2000-01-01
Today, more and more DNA sequences are becoming available. The information about DNA sequences is stored in molecular biology databases. The size and importance of these databases will continue to grow, so this information must be stored or communicated efficiently. Furthermore, sequence compression can be used to define similarities between biological sequences. Standard compression algorithms such as gzip or compress cannot compress DNA sequences; they only expand them in size. On the other hand, CTW (Context Tree Weighting Method) can compress DNA sequences to less than two bits per symbol. These algorithms do not use the special structures of biological sequences. Two characteristic structures of DNA sequences are known: one is called palindromes or reverse complements, and the other is approximate repeats. Several algorithms specific to DNA sequences that use these structures can compress them to less than two bits per symbol. In this paper, we improve the CTW so that the characteristic structures of DNA sequences can be exploited. Before encoding the next symbol, the algorithm searches for an approximate repeat and a palindrome using hashing and dynamic programming. If there is a palindrome or an approximate repeat of sufficient length, our algorithm represents it by its length and distance. With this preprocessing, the new program achieves a slightly higher compression ratio than existing DNA-oriented compression algorithms. We also describe a new compression algorithm for protein sequences.
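Two notions from the abstract above can be shown in a few lines: the two-bits-per-symbol baseline that DNA-specific compressors try to beat, and the reverse complement that defines a DNA "palindrome". This is only an illustrative sketch; the names `pack_2bit` and `reverse_complement` are hypothetical, the input is assumed to contain only the letters A, C, G, and T, and neither CTW nor the authors' repeat search is implemented.

```python
# Fixed 2-bit code for each base: the naive baseline of two bits per symbol.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the strand direction."""
    return seq.translate(COMPLEMENT)[::-1]


def pack_2bit(seq):
    """Pack a DNA string into bytes at exactly two bits per base.

    The sequence length must be stored separately, because leading
    'A' bases (code 0) are indistinguishable from padding bits.
    """
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value.to_bytes((2 * len(seq) + 7) // 8, "big")
```

A compressor that exploits repeats would, instead of packing every base, emit a (length, distance) pair whenever the upcoming bases match an earlier stretch or the reverse complement of one, which is how it can drop below two bits per symbol.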
Zhang, Fan; Zhang, Bing; Xiang, Hua; Hu, Songnian
2009-11-01
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) is a widespread system that provides acquired resistance against phages in bacteria and archaea. Here we aim to analyze CRISPR genome-wide in the extremely halophilic archaea whose whole genome sequences are currently available. We used bioinformatics methods including alignment, conservation analysis, GC content and RNA structure prediction to analyze the CRISPR structures of 7 haloarchaeal genomes. We identified the CRISPR structures in 5 halophilic archaea and revealed a conserved palindromic motif in the flanking regions of these CRISPR structures. In addition, we found that the repeat sequences of large CRISPR structures in halophilic archaea were highly conserved, and two types of predicted RNA secondary structures derived from the repeat sequences were likely determined by the fourth base of the repeat sequence. Our results support the proposal that the leader sequence may function as a recognition site by having palindromic structures in flanking regions, and the stem-loop secondary structure formed by repeat sequences may function in mediating the interaction between foreign genetic elements and CAS-encoded proteins.
BIOPEP database and other programs for processing bioactive peptide sequences.
Minkiewicz, Piotr; Dziuba, Jerzy; Iwaniak, Anna; Dziuba, Marta; Darewicz, Małgorzata
2008-01-01
This review presents the potential for application of computational tools in peptide science based on a sample BIOPEP database and program as well as other programs and databases available via the World Wide Web. The BIOPEP application contains a database of biologically active peptide sequences and a program enabling construction of profiles of the potential biological activity of protein fragments, calculation of quantitative descriptors as measures of the value of proteins as potential precursors of bioactive peptides, and prediction of bonds susceptible to hydrolysis by endopeptidases in a protein chain. Other bioactive and allergenic peptide sequence databases are also presented. Programs enabling the construction of binary and multiple alignments between peptide sequences, the construction of sequence motifs attributed to a given type of bioactivity, searching for potential precursors of bioactive peptides, and the prediction of sites susceptible to proteolytic cleavage in protein chains are available via the Internet as are other approaches concerning secondary structure prediction and calculation of physicochemical features based on amino acid sequence. Programs for prediction of allergenic and toxic properties have also been developed. This review explores the possibilities of cooperation between various programs.
Prediction of redox-sensitive cysteines using sequential distance and other sequence-based features.
Sun, Ming-An; Zhang, Qing; Wang, Yejun; Ge, Wei; Guo, Dianjing
2016-08-24
Reactive oxygen species can modify the structure and function of proteins and may also act as important signaling molecules in various cellular processes. Cysteine thiol groups of proteins are particularly susceptible to oxidation, and their reversible oxidation plays critical roles in redox regulation and signaling. Recently, several computational tools have been developed for predicting redox-sensitive cysteines; however, those methods either focus only on catalytic redox-sensitive cysteines in thiol oxidoreductases or depend heavily on protein structural data, and thus cannot be widely used. In this study, we analyzed various sequence-based features potentially related to cysteine redox-sensitivity, and identified three types of features for efficient computational prediction of redox-sensitive cysteines. These features are: sequential distance to the nearby cysteines, PSSM profile, and predicted secondary structure of flanking residues. After further feature selection using SVM-RFE, we developed Redox-Sensitive Cysteine Predictor (RSCP), an SVM-based classifier for redox-sensitive cysteine prediction using primary sequence only. Using 10-fold cross-validation on the RSC758 dataset, the accuracy, sensitivity, specificity, MCC and AUC were estimated as 0.679, 0.602, 0.756, 0.362 and 0.727, respectively. When evaluated using 10-fold cross-validation with the BALOSCTdb dataset, which has structure information, the model achieved performance comparable to that of the current structure-based method. Further validation using an independent dataset indicates that it is robust and of relatively better accuracy for predicting redox-sensitive cysteines from non-enzyme proteins. In this study, we developed a sequence-based classifier for predicting redox-sensitive cysteines. The major advantage of this method is that it does not rely on protein structure data, which ensures more extensive application compared to other current implementations.
Accurate prediction of redox-sensitive cysteines not only enhances our understanding of the redox sensitivity of cysteine but may also complement the proteomics approach and facilitate further experimental investigation of important redox-sensitive cysteines.
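The first feature type named above, sequential distance to nearby cysteines, can be sketched in a few lines of Python. This is an illustrative reading of the feature, not the RSCP implementation: the function name is hypothetical, positions are 0-based, and the distance is assumed to be the number of residues separating a cysteine from its closest neighboring cysteine in the primary sequence.

```python
def nearest_cysteine_distances(sequence):
    """For each cysteine in a one-letter protein sequence, return a tuple
    (position, distance to the closest other cysteine).

    The distance is None when the sequence contains only one cysteine.
    """
    positions = [i for i, aa in enumerate(sequence) if aa == "C"]
    result = []
    for idx, pos in enumerate(positions):
        gaps = []
        if idx > 0:  # neighbor on the N-terminal side
            gaps.append(pos - positions[idx - 1])
        if idx < len(positions) - 1:  # neighbor on the C-terminal side
            gaps.append(positions[idx + 1] - pos)
        result.append((pos, min(gaps) if gaps else None))
    return result
```

In a classifier such values would typically be combined with the other feature types (PSSM profile, predicted secondary structure) into one feature vector per cysteine; how RSCP encodes them is not stated in the abstract.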
Churkin, Alexander; Barash, Danny
2008-01-01
Background: RNAmute is an interactive Java application which, given an RNA sequence, calculates the secondary structure of all single point mutations and organizes them into categories according to their similarity to the predicted structure of the wild type. The secondary structure predictions are performed using the Vienna RNA package. A more efficient implementation of RNAmute is needed, however, to extend from the case of single point mutations to the general case of multiple point mutations, which may often be desired for computational predictions alongside mutagenesis experiments. But analyzing multiple point mutations, a process that requires traversing all possible mutations, becomes highly expensive since the running time is O(n^m) for a sequence of length n with m-point mutations. Using Vienna's RNAsubopt, we present a method that selects only those mutations, based on stability considerations, which are likely to cause a conformational rearrangement. The approach is best examined using the dot plot representation for RNA secondary structure. Results: Using RNAsubopt, the suboptimal solutions for a given wild-type sequence are calculated once. Then, specific mutations are selected that are most likely to cause a conformational rearrangement. For an RNA sequence of about 100 nts and 3-point mutations (n = 100, m = 3), for example, the proposed method reduces the running time from several hours or even days to several minutes, thus enabling the practical application of RNAmute to the analysis of multiple-point mutations. Conclusion: A highly efficient addition to RNAmute that is as user friendly as the original application but that facilitates the practical analysis of multiple-point mutations is presented. Such an extension can now be exploited prior to site-directed mutagenesis experiments by virologists, for example, who investigate the change of function in an RNA virus via mutations that disrupt important motifs in its secondary structure.
A complete explanation of the application, called MultiRNAmute, is available at [1]. PMID:18445289
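To make the cost of exhaustive traversal concrete, the sketch below counts and enumerates all m-point mutants of an RNA sequence. It is a plain brute-force illustration, not the MultiRNAmute selection method: the function names are hypothetical, the alphabet is assumed to be A/C/G/U, and each mutated site is assumed to take one of the 3 other bases, giving C(n, m) * 3^m mutants, which grows polynomially in n with degree m.

```python
from itertools import combinations, product
from math import comb  # requires Python 3.8+

BASES = "ACGU"  # assumed RNA alphabet


def count_point_mutants(n, m):
    """Number of distinct mutants with exactly m substituted positions
    in a sequence of length n: choose the sites, then 3 bases per site."""
    return comb(n, m) * 3 ** m


def enumerate_point_mutants(seq, m):
    """Yield every sequence that differs from seq at exactly m positions."""
    for sites in combinations(range(len(seq)), m):
        # For each chosen site, the bases that differ from the wild type.
        choices = [[b for b in BASES if b != seq[i]] for i in sites]
        for substitution in product(*choices):
            mutant = list(seq)
            for i, base in zip(sites, substitution):
                mutant[i] = base
            yield "".join(mutant)
```

Under these assumptions, `count_point_mutants(100, 3)` gives 4,365,900 candidate mutants, each of which would need its own structure prediction in an exhaustive search, which is consistent with the motivation for pre-selecting mutations.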
Ganesan, K; Parthasarathy, S
2011-12-01
Annotation of any newly determined protein sequence depends on the pairwise sequence identity with known sequences. However, for twilight zone sequences, which have only 15-25% identity, pairwise comparison methods are inadequate and annotation becomes a challenging task. Such sequences can be annotated by using methods that recognize their fold. Bowie et al. described a 3D1D profile method in which the amino acid sequences that fold into a known 3D structure are identified by their compatibility with that known 3D structure. We have improved this method by using predicted secondary structure information and employing it for fold recognition from twilight zone sequences. In our Protein Secondary Structure 3D1D (PSS-3D1D) method, a score (w) for the predicted secondary structure of the query sequence is included in finding the compatibility of the query sequence with the known fold 3D structures. In the benchmarks, the PSS-3D1D method shows a maximum of 21% improvement in correctly predicting the α + β class of folds from sequences with a twilight zone level of identity, when compared with the 3D1D profile method. Hence, the PSS-3D1D method could offer more clues than the 3D1D method for the annotation of twilight zone sequences. The web-based PSS-3D1D method is freely available in the PredictFold server at http://bioinfo.bdu.ac.in/servers/ .
Folding and Stabilization of Native-Sequence-Reversed Proteins
Zhang, Yuanzhao; Weber, Jeffrey K; Zhou, Ruhong
2016-01-01
Though the problem of sequence-reversed protein folding is largely unexplored, one might speculate that reversed native protein sequences should be significantly more foldable than purely random heteropolymer sequences. In this article, we investigate how the reverse-sequences of native proteins might fold by examining a series of small proteins of increasing structural complexity (α-helix, β-hairpin, α-helix bundle, and α/β-protein). Employing a tandem protein structure prediction algorithmic and molecular dynamics simulation approach, we find that the ability of reverse sequences to adopt native-like folds is strongly influenced by protein size and the flexibility of the native hydrophobic core. For β-hairpins with reverse-sequences that fail to fold, we employ a simple mutational strategy for guiding stable hairpin formation that involves the insertion of amino acids into the β-turn region. This systematic look at reverse sequence duality sheds new light on the problem of protein sequence-structure mapping and may serve to inspire new protein design and protein structure prediction protocols. PMID:27113844